Loading...
Please wait while we prepare your content
Please wait while we prepare your content
Solutions for Informatics Practices, Class 12, CBSE
Assertion. A quantile refers to equally distributed portion of a data set.
Reason. A median divides a distribution in 2 quantiles while a quartile divides a distribution in 4 quantiles.
Both A and R are true and R is the correct explanation of A.
Explanation
Quantile is a statistical term that refers to dividing the total distribution of given data into a given number of equal proportions. Each portion is called a quantile, and it represents a specific percentage of the data set. The median divides a distribution into two equal parts, representing the middle value of a data set when the data is sorted in ascending order. A quartile divides a distribution into four equal parts, with each quartile representing 25% of the data set.
Assertion. Data aggregation produces a summary statistics of a dataset.
Reason. Data aggregation summarizes data using statistical aggregation functions.
Both A and R are true and R is the correct explanation of A.
Explanation
Data aggregation produces a summary statistic of a dataset because it involves combining and summarizing data from multiple sources or data points, and then calculating a single summary statistic or a set of summary statistics. This process utilizes statistical aggregation functions, such as sum, average, count, minimum, and maximum.
Assertion. In a dataset, there can be missing values that cannot contribute to any computation.
Reason. In a dataset, NULL, NaN or None are considered the missing values.
Both A and R are true and R is the correct explanation of A.
Explanation
Missing values are the values that cannot contribute to any computation or we can say that missing values are the values that carry no computational significance. In a dataset, NULL, NaN or None are considered the missing values.
Assertion (A). The output of addition of two series will be NaN, if one of the elements or both the elements have no value(s).
Reason (R). While performing mathematical operations on a series, by default all missing values are filled in with 0.
A is true but R is false.
Explanation
When adding two pandas Series and encountering NaN values in either or both Series, the result of the addition operation will be NaN for those corresponding elements. Additionally, pandas does not automatically replace missing values with 0 during mathematical operations but instead with NaN.
Quantile is a process of dividing the total distribution of given data into a given number of equal proportions.
Quartile refers to the division of the total distribution of given data into a four equal proportions with each containing one fourth of the total population.
iterrows()
Reason — The iterrows()
method iterates over a DataFrame row-wise, where each horizontal subset is in the form of (row-index, Series) where Series contains all column values for that row-index.
iteritems()
Reason — The iteritems()
method in pandas iterates over a DataFrame column-wise, where each vertical subset is in the form of (column-index, Series) where Series contains all row values for that column-index.
add, radd
Reason — We can add two DataFrame objects using either (+) operator or using add()
function as per syntax : <DF1>.add(<DF2>)
which means <DF1>+<DF2>
or by using radd()
function i.e., reverse add as per syntax : <DF1>.radd(<DF2>)
which means <DF2>+<DF1>
.
sub, rsub
Reason — We can subtract two DataFrame objects using either (-) operator or using sub()
function as per syntax : <DF1>.sub(<DF2>)
which means <DF1>-<DF2>
or by using rsub()
function i.e., reverse subtract as per syntax : <DF1>.rsub(<DF2>)
which means <DF2>-<DF1>
.
div, rdiv
Reason — We can divide two DataFrame objects using either (/) operator or using div()
function as per syntax : <DF1>.div(<DF2>)
which means <DF1>/<DF2>
or by using rdiv()
function i.e., reverse division as per syntax : <DF1>.rdiv(<DF2>)
which means <DF2>/<DF1>
.
DF.div(DF1)
Reason — The div()
function with the syntax <DF>.div(<DF1>)
, performs element-wise division between DF
and DF1
, which is equivalent to <DF>/<DF1>
in terms of functionality and output.
length()
Reason — In Pandas, the length()
function is not a valid function for DataFrames. Instead, we can use the len()
function to get the length of a DataFrame. The other functions listed (count(), sum(), mad()) are valid functions for performing operations on DataFrames.
quantile
Reason — Quantile is a process of dividing the total distribution of given data into a given number of equal proportions.
quartile
Reason — Quartile refers to the division of the total distribution of given data into a four equal proportions with each containing one fourth of the total population.
quantile()
Reason — The quantile()
function is used to divide the total distribution of given data into two equal parts. For example, to divide the data of DataFrame df
into two equal parts, the statement would be df.quantile(q = 0.5)
.
quantile()
Reason — The quantile()
function is used to divide the total distribution of given data into four equal parts. For example, to divide the data of DataFrame ndf
into four equal parts, the statement would be df.quantile(q = 0.25)
.
quantile()
Reason — To divide the total distribution of given data into eight equal parts, we can use the quantile()
function with a quantile value of 0.125 (1/8). For example, to divide the data of DataFrame df1
into eight equal parts, the statement would be df.quantile(q = 0.125)
.
cumsum()
Reason — The cumsum()
function in pandas is used to calculate the cumulative sum of a column in a DataFrame i.e., in the output of this function, the value of each row is replaced by the sum of all prior rows, including this row.
head(), head(5)
Reason — The head()
function in pandas is used to get the top n
rows of a DataFrame, where n
is an optional argument. If n
is not provided, it defaults to 5. Hence, both head()
and head(5)
are correct and will return the top 5 rows of the DataFrame.
tail(3)
Reason — The tail()
function in pandas is used to get the bottom n
rows of a DataFrame, where n
is an optional argument. Hence, tail(3)
is correct and will return the bottom 3 rows of the DataFrame.
join()
Reason — The join()
function in pandas is a patching method, which means that it aligns the DataFrames based on a common index or set of keys, and then combines them.
False
Reason — The iteritems()
method in pandas iterates over a DataFrame column-wise, where each vertical subset is in the form of (column-index, Series) where Series contains all row values for that column-index.
True
Reason — The iteritems()
method in pandas iterates over a DataFrame column-wise, where each vertical subset is in the form of (column-index, Series) where Series contains all row values for that column-index.
False
Reason — The sub()
and rsub()
functions produce different results because they subtract the operands in a different order. The sub()
function performs element-wise subtraction between two DataFrames, subtracting the right operand from the left operand. The syntax is <DF1>.sub(<DF2>)
, which means <DF1> - <DF2>
. On the other hand, the rsub()
function performs element-wise subtraction with the right operand subtracted from the left operand. The syntax is <DF1>.rsub(<DF2>)
, which means <DF2> - <DF1>
.
True
Reason — When adding two DataFrame objects using the add()
function with the syntax <DF1>.add(<DF2>)
, it means <DF1> + <DF2>
. Similarly, using the radd()
function with the syntax <DF1>.radd(<DF2>)
means <DF2> + <DF1>
. Both operations produce the same results because addition is commutative, meaning the order of operands does not affect the result.
False
Reason — The div()
and rdiv()
functions do not produce the same result. The div()
function performs element-wise division between two DataFrames, where the left operand is the dividend and the right operand is the divisor. The syntax is <DF1>.div(<DF2>)
, which means <DF1> / <DF2>
. On the other hand, the rdiv()
function performs element-wise division with the right operand as the dividend and the left operand as the divisor. The syntax is <DF1>.rdiv(<DF2>)
, which means <DF2> / <DF1>
.
False
Reason — The info()
method in pandas provides basic information about a DataFrame, including the data types of each column, the number of rows, and the memory usage. It also displays the index name and type. On the other hand, the describe()
method provides detailed descriptive statistics about the numerical columns in a DataFrame, including count, mean, standard deviation, minimum value, 25th percentile, median (50th percentile), 75th percentile, and maximum value.
True
Reason — In pandas, both the '+' operator and the add()
function can be used to perform element-wise addition between two DataFrames, resulting in the same output. The syntax for the '+' operator is <DF1> + <DF2>
, while the syntax for the add()
function is <DF1>.add(<DF2>)
, which is equivalent to <DF1> + <DF2>
.
False
Reason — The rsub()
function and the '-' operator in pandas do not produce the same result because the rsub()
function performs reverse subtraction, which means it subtracts the left operand from the right, while the '-' subtracts the right operand from the left operand.
False
Reason — The '-' operator in pandas subtracts the right operand from the left operand, similar to the sub()
function. However, the rsub()
function subtracts the left operand from the right operand. Therefore, '-' and sub()
produce the same result, while rsub()
produces a different result due to the order of operands.
False
Reason — The function sum()
returns the sum of the values for the requested axis. On the other hand, cumsum()
calculates the cumulative sum. In the output of this function, the value of each row is replaced by the sum of all prior rows, including this row.
True
Reason — The fillna()
function in pandas can be used to fill individual missing values for different columns. This can be done by passing a dictionary with column names as keys and the corresponding filling values as values to the fillna()
function.
(i) A.add(B) — A + B
(ii) B.add(A) — B + A
(iii) A.sub(B) — A - B
(iv) B.sub(A) — B - A
(v) A.rsub(B) — B - A
(vi) B.mul(A) — B * A
(vii) A.rdiv(B) — B / A
(viii) B.div(A) — B / A
(ix) B.rdiv(A) — A / B
(x) A.div(B) — A / B
The sub()
and rsub()
functions produce different results because they subtract the operands in a different order. The sub()
function performs element-wise subtraction between two DataFrames, subtracting the right operand from the left operand. The syntax is <DF1>.sub(<DF2>)
, which means <DF1> - <DF2>
. On the other hand, the rsub()
function performs element-wise subtraction with the right operand subtracted from the left operand. The syntax is <DF1>.rsub(<DF2>)
, which means <DF2> - <DF1>
.
Write appropriate functions to perform the following on a DataFrame ?
(i) Calculate the sum
(ii) Count the values
(iii) Calculate the average
(iv) Calculate the most repeated value
(v) Calculate the median
(vi) Calculate the standard deviation
(vii) Calculate the variance
(viii) Calculate the maximum value
(ix) Calculate the standard deviation
(x) Calculate the variance
(i) Calculate the sum — sum() function
(ii) Count the values — count() function
(iii) Calculate the average — mean() function
(iv) Calculate the most repeated value — mode() function
(v) Calculate the median — median() function
(vi) Calculate the standard deviation — std() function
(vii) Calculate the variance — var() function
(viii) Calculate the maximum value — max() function
(ix) Calculate the standard deviation — std() function
(x) Calculate the variance — var() function
The info()
method in pandas provides basic information about a DataFrame, including the data types of each column, the number of rows, and the memory usage. It also displays the index name and type. On the other hand, the describe()
method provides detailed descriptive statistics about the numerical columns in a DataFrame, including count, mean, standard deviation, minimum value, 25th percentile, median (50th percentile), 75th percentile, and maximum value.
No, the sum()
and add()
functions are not the same in pandas. The sum()
function in pandas calculates the sum of values along a specified axis. On the other hand, the add()
function in pandas is used to add two objects element-wise.
The numeric_only = True
argument is passed to statistics functions in pandas to consider only the numeric values (int, float, boolean columns) for calculation.
When the mode()
function in pandas returns multiple values for a column (meaning there are multiple modes), but other columns have a single mode, pandas fills the non-mode columns with NaN values.
Quantile is a process of dividing the total distribution of given data into a given number of equal proportions.
Quartile refers to the division of the total distribution of given data into a four equal proportions with each containing one fourth of the total population.
The dropna()
method in pandas can be used to remove rows or columns with missing values from a DataFrame. However, this can also lead to a loss of non-null data, which may not be desirable for data analysis. To avoid losing non-null data, we can use the fillna()
method to fill missing values with a specified value. This can help to ensure that the dataset is complete and can be used for data analysis.
To iterate over a DataFrame in Python, we can use various methods depending on our specific needs. We can use the iterrows()
method to iterate over each row in the DataFrame. In this method, each horizontal subset is in the form of (row-index, series), where the series contains all column values for that row-index. Additionally, we can use the iteritems()
method to iterate over each column in the DataFrame. Here, each vertical subset is in the form of (column-index, series), where the series contains all row values for that column-index.
Binary operations refer to operations that require two values to perform and these values are picked element-wise. In a binary operation involving DataFrames, the data from the two DataFrames are aligned based on their row and column indexes. For matching row and column indexes, the specified operation is performed, while for non-matching row and column indexes, NaN values are stored in the result. The functions that perform binary operations on a DataFrame include add(), radd(), sub(), rsub(), mul(), div(), and rdiv().
A descriptive statistic is a summary statistic that quantitatively describes or summarizes features of a collection of information.
The functions that perform descriptive statistics on a DataFrame are min(), max(), idxmax(), idxmin(), mode(), mean(), median(), count(), sum(), quantile(), std(), var().
The info()
and describe()
functions in pandas are considered inspection functions because they provide valuable insights into the structure, content, and statistical summary of a DataFrame. The info()
method in pandas provides basic information about a DataFrame, including the data types of each column, the number of rows, and the memory usage. It also displays the index name and type. On the other hand, the describe()
method provides detailed descriptive statistics about the numerical columns in a DataFrame, including count, mean, standard deviation, minimum value, 25th percentile, median (50th percentile), 75th percentile, and maximum value.
The sum provides the total of all numbers in a sequence, while the cumulative sum represents the running total of numbers encountered up to a specific point in the sequence.
The sum()
function gives the total sum of values along specified axes, whereas the cumsum()
function provides the cumulative sum of values along specified axes, either column-wise or row-wise.
The cumulative functions provided by Pandas are cumsum(), cumprod(), cummax(), cummin().
cumsum() — It calculates cumulative sum i.e., in the output of this function, the value of each row is replaced by sum of all prior rows including this row. The syntax is <DF>.cumsum([axis = None])
.
cumprod() — It calculates cumulative product of values in a DataFrame object.
cummax() — It calculates cumulative maximum of value from a DataFrame object.
cummin() — It calculates cumulative minimum of value from a DataFrame object.
The head()
function in pandas retrieves the top n rows of a DataFrame, where n is an optional argument defaulting to 5 if not provided. It is used with the syntax <DF>.head()
. Similarly, the tail()
function in pandas fetches the bottom n rows of a DataFrame, where n is also optional and defaults to 5 if not specified. Its syntax is <DF>.tail()
.
When a column in a pandas DataFrame contains a mixture of integers and missing values (NaNs), the data type is automatically changed to a floating-point type to accommodate these missing values because in Python integer types cannot store NaN values.
The quantile()
function returns the values at the given quantiles over requested axis (axis 0 or 1). On the other hand, var()
function computes variance and returns the unbiased variance over the requested axis.
Quartile refers to the division of the total distribution of given data into a four equal proportions with each containing one fourth of the total population.
Quartiles specifically divide the dataset into four equal parts, while quantiles can divide the dataset into any desired number of equal parts.
Assume that required libraries (Pandas and Numpy) are imported and DataFrame ndf has been created as shown in solved problem 16. Predict the output produced by following code fragment :
print(ndf[ndf["age"] > 30])
print(ndf.head(2))
print(ndf.tail(3))
The following DataFrame ndf
is from solved problem 16 :
Name Sex Position City age Projects Budget
0 Rabina F Manager Bangalore 30 13 48
1 Evan M Programer New Delhi 27 17 13
2 Jia F Manager Chennai 32 16 32
3 Lalit M Manager Mumbai 40 20 21
4 Jaspreet M Programmer Chennai 28 21 17
5 Suji F Programmer Bangalore 32 14 10
Name Sex Position City age Projects Budget
2 Jia F Manager Chennai 32 16 32
3 Lalit M Manager Mumbai 40 20 21
5 Suji F Programmer Bangalore 32 14 10
Name Sex Position City age Projects Budget
0 Rabina F Manager Bangalore 30 13 48
1 Evan M Programer New Delhi 27 17 13
Name Sex Position City age Projects Budget
3 Lalit M Manager Mumbai 40 20 21
4 Jaspreet M Programmer Chennai 28 21 17
5 Suji F Programmer Bangalore 32 14 10
print(ndf[ndf["age"] > 30])
— The ndf["age"] > 30
creates a boolean mask where True indicates rows where the "age" column has values greater than 30. Then ndf[...]
uses this boolean mask to filter the DataFrame ndf
, returning only rows where the condition is True.
print(ndf.head(2))
— It prints the first 2 rows of the DataFrame ndf
.
print(ndf.tail(3))
— It prints the last 3 rows of the DataFrame ndf
.
(a) print(dfc1 + dfc2)
— This statement tries to add two DataFrames dfc1
and dfc2
. However, they have different shapes (dfc1
has 3 rows and 2 columns, while dfc2
has 2 rows and 3 columns), and their indices do not match. DataFrame addition requires both DataFrames to have the same shape and compatible indices, which is not the case here.
(b) print(dfc1.sub(dfc2))
— This statement attempts to subtract dfc1
from dfc2
using the sub()
method. Similar to addition, subtraction between DataFrames requires them to have the same shape and compatible indices, which is not satisfied here due to the mismatch in shapes and indices.
(c) print(dfc1 * dfc2)
— This statement tries to perform element-wise multiplication between dfc1
and dfc2
. Again, this operation requires both DataFrames to have the same shape and compatible indices, which is not the case here due to the mismatched shapes and indices.
Consider the following code that creates two DataFrames :
ore1 = pd.DataFrame(np.array([[20, 35, 25, 20], [11, 28, 32, 29]]),
columns = ['iron', 'magnesium', 'copper', 'silver'])
ore2 = pd.DataFrame(np.array([[14, 34, 26, 26], [33, 19, 25, 23]]),
columns = ['iron', 'magnesium', 'gold', 'silver'])
What will be the output produced by the following code fragments ?
(a) print(ore1 + ore2)
ore3 = ore1.radd(ore2)
print(ore3)
(b) print(ore1 - ore2)
ore3 = ore1.rsub(ore2)
print(ore3)
(c) print(ore1 * ore2)
ore3 = ore1.mul(ore2)
print(ore3)
(d) print(ore1 / ore2)
ore3 = ore1.rdiv(ore2)
print(ore3)
(a)
copper gold iron magnesium silver
0 NaN NaN 34 69 46
1 NaN NaN 44 47 52
copper gold iron magnesium silver
0 NaN NaN 34 69 46
1 NaN NaN 44 47 52
print(ore1 + ore2)
: This line attempts to add the DataFrames ore1
and ore2
using the '+' operator. When adding DataFrames with different shapes and column names, pandas aligns the DataFrames based on indices and columns, resulting in NaN values where elements are missing in either DataFrame.ore3 = ore1.radd(ore2)
: This line uses the radd()
method, which reverses the addition operation between DataFrames ore1
and ore2
.radd()
function and '+' operator produce the same result in pandas, as the order of operands does not affect addition due to its commutative property.(b)
copper gold iron magnesium silver
0 NaN NaN 6 1 -6
1 NaN NaN -22 9 6
copper gold iron magnesium silver
0 NaN NaN -6 -1 6
1 NaN NaN 22 -9 -6
print(ore1 - ore2)
: This line performs a subtraction operation between corresponding elements in DataFrames ore1
and ore2
. When subtracting DataFrames with different shapes and column names, pandas aligns the DataFrames based on indices and columns, resulting in NaN values where elements are missing in either DataFrame.ore3 = ore1.rsub(ore2)
: This line uses the rsub()
method, which reverses the subtraction operation between DataFrames ore1
and ore2
.rsub()
function and the '-' operator in pandas do not produce the same result because the rsub()
function performs reverse subtraction, which means it subtracts the left operand from the right, while the '-' subtracts the right operand from the left operand.(c)
copper gold iron magnesium silver
0 NaN NaN 280 1190 520
1 NaN NaN 363 532 667
copper gold iron magnesium silver
0 NaN NaN 280 1190 520
1 NaN NaN 363 532 667
print(ore1 * ore2)
: This line attempts to perform element-wise multiplication between the DataFrames ore1
and ore2
using the '*' operator. When multiplying DataFrames with different shapes and column names, pandas aligns the DataFrames based on indices and columns, resulting in NaN values where elements are missing in either DataFrame.ore3 = ore1.mul(ore2)
: This line uses the mul()
method to perform element-wise multiplication between DataFrames ore1
and ore2
.(d)
copper gold iron magnesium silver
0 NaN NaN 1.428571 1.029412 0.769231
1 NaN NaN 0.333333 1.473684 1.260870
copper gold iron magnesium silver
0 NaN NaN 0.7 0.971429 1.300000
1 NaN NaN 3.0 0.678571 0.793103
print(ore1 / ore2)
: This line attempts to perform element-wise division between the DataFrames ore1
and ore2
using the '/' operator. When dividing DataFrames with different shapes and column names, pandas aligns the DataFrames based on indices and columns, resulting in NaN values where elements are missing in either DataFrame.ore3 = ore1.rdiv(ore2)
: This line uses the rdiv()
method to perform reciprocal division between DataFrames ore1
and ore2
.Consider the DataFrame wdf as shown below :
minTemp | maxTemp | Rainfall | Evaporation | |
---|---|---|---|---|
0 | 2.9 | 8.0 | 24.3 | 0.0 |
1 | 3.1 | 14.0 | 26.9 | 3.6 |
2 | 6.2 | 13.7 | 23.4 | 3.6 |
3 | 5.3 | 13.3 | 15.5 | 39.8 |
4 | 6.3 | 17.6 | 16.1 | 2.8 |
5 | 5.4 | 18.2 | 16.9 | 0.0 |
6 | 5.5 | 21.1 | 18.2 | 0.2 |
7 | 4.8 | 18.3 | 17.0 | 0.0 |
8 | 3.6 | 20.8 | 19.5 | 0.0 |
9 | 7.7 | 19.4 | 22.8 | 16.2 |
10 | 9.9 | 24.1 | 25.2 | 0.0 |
11 | 11.8 | 28.5 | 27.3 | 0.2 |
12 | 13.2 | 29.1 | 27.9 | 0.0 |
13 | 16.8 | 24.1 | 30.9 | 0.0 |
14 | 19.4 | 28.1 | 31.2 | 0.0 |
15 | 21.6 | 34.4 | 32.1 | 0.0 |
16 | 20.4 | 33.8 | 31.2 | 0.0 |
17 | 18.5 | 26.7 | 30.0 | 1.2 |
18 | 18.8 | 32.4 | 32.3 | 0.6 |
19 | 17.6 | 28.6 | 33.4 | 0.0 |
20 | 19.7 | 30.3 | 33.4 | 0.0 |
(a) Write statement(s) to calculate minimum value for each of the columns.
(b) Write statement(s) to calculate maximum value for each of the rows.
(c) Write statement(s) to calculate variance for column Rainfall.
(d) Write statement(s) to compute mean , mode median for last 10 rows.
(a)
>>> wdf.min()
Minimum values for each column:
minTemp 2.9
maxTemp 8.0
Rainfall 15.5
Evaporation 0.0
dtype: float64
(b)
>>> wdf.max(axis=1)
Maximum values for each row:
0 24.3
1 26.9
2 23.4
3 39.8
4 17.6
5 18.2
6 21.1
7 18.3
8 20.8
9 22.8
10 25.2
11 28.5
12 29.1
13 30.9
14 31.2
15 34.4
16 33.8
17 30.0
18 32.4
19 33.4
20 33.4
dtype: float64
(c)
>>> wdf['Rainfall'].var()
Variance for column Rainfall: 38.852999999999994
(d)
>>> last_10_rows = wdf.tail(10)
>>> last_10_rows.mean()
>>> last_10_rows.mode()
>>> last_10_rows.median()
Mean for last 10 rows:
minTemp 17.78
maxTemp 29.60
Rainfall 30.97
Evaporation 0.20
dtype: float64
Mode for last 10 rows:
minTemp maxTemp Rainfall Evaporation
0 11.8 24.1 31.2 0.0
1 13.2 26.7 33.4 NaN
2 16.8 28.1 NaN NaN
3 17.6 28.5 NaN NaN
4 18.5 28.6 NaN NaN
5 18.8 29.1 NaN NaN
6 19.4 30.3 NaN NaN
7 19.7 32.4 NaN NaN
8 20.4 33.8 NaN NaN
9 21.6 34.4 NaN NaN
Median for last 10 rows:
minTemp 18.65
maxTemp 28.85
Rainfall 31.20
Evaporation 0.00
dtype: float64
Write a program to print a DataFrame one column at a time and print only first three columns.
import pandas as pd
data = {
'Name': ['Aliya', 'Hemanth', 'Charlie'],
'Age': [25, 30, 35],
'City': ['Bangalore', 'Chennai', 'Mumbai'],
'Salary': [50000, 60000, 70000]
}
df = pd.DataFrame(data)
first_three_columns = df.iloc[:, :3]
print("Each column:")
for column_name in first_three_columns:
column_data = first_three_columns[column_name]
print(column_name)
print(column_data)
Each column:
Name
0 Aliya
1 Hemanth
2 Charlie
Name: Name, dtype: object
Age
0 25
1 30
2 35
Name: Age, dtype: int64
City
0 Bangalore
1 Chennai
2 Mumbai
Name: City, dtype: object
Write a program to print a DataFrame one row at a time and print only first five rows.
import pandas as pd
data = {
'Name': ['Amruta', 'Harsh', 'Yogesh', 'Shreya', 'Zoya', 'Nyra'],
'Age': [25, 30, 35, 40, 45, 28],
'City': ['Chandigarh', 'Jaipur', 'Dehradun', 'Delhi', 'Vadodara', 'Guwahati']
}
df = pd.DataFrame(data)
first_five_rows = df.head(5)
print("Each row:")
for index, row in first_five_rows.iterrows():
print("Index:", index)
print(row)
Each row:
Index: 0
Name Amruta
Age 25
City Chandigarh
Name: 0, dtype: object
Index: 1
Name Harsh
Age 30
City Jaipur
Name: 1, dtype: object
Index: 2
Name Yogesh
Age 35
City Dehradun
Name: 2, dtype: object
Index: 3
Name Shreya
Age 40
City Delhi
Name: 3, dtype: object
Index: 4
Name Zoya
Age 45
City Vadodara
Name: 4, dtype: object
Write a program that performs count, sum, max, and min functions :
On rows
On columns
import pandas as pd
data = {
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
}
df = pd.DataFrame(data)
print("Operations on rows:")
print("Count:")
print(df.count(axis=1))
print("Sum:")
print(df.sum(axis=1))
print("Max:")
print(df.max(axis=1))
print("Min:")
print(df.min(axis=1))
print("\nOperations on columns:")
print("Count:")
print(df.count())
print("Sum:")
print(df.sum())
print("Max:")
print(df.max())
print("Min:")
print(df.min())
Operations on rows:
Count:
0 3
1 3
2 3
dtype: int64
Sum:
0 12
1 15
2 18
dtype: int64
Max:
0 7
1 8
2 9
dtype: int64
Min:
0 1
1 2
2 3
dtype: int64
Operations on columns:
Count:
A 3
B 3
C 3
dtype: int64
Sum:
A 6
B 15
C 24
dtype: int64
Max:
A 3
B 6
C 9
dtype: int64
Min:
A 1
B 4
C 7
dtype: int64
Take a DataFrame of your choice. Write a program to calculate count of values only in a selective column.
import pandas as pd
data = {
'Name': ['Deepika', 'Simran', 'Shivang', 'Anurag', 'Ankith'],
'Age': [25, 30, None, 40, 45],
'City': ['Goa', 'Kochi', 'Pondicherry', 'Rohtak', 'Kasol']
}
df = pd.DataFrame(data)
selected_column = 'Age'
count_values = df[selected_column].count()
print("Count of values in", selected_column, ":" , count_values)
Count of values in Age : 4
Give two identical DataFrames Sales16 and Sales17. But Sales17 has some values missing. Write code so that Sales17 fills its missing values from the corresponding entries of Sales16.
import pandas as pd
data_sales16 = {
'Product': ['A', 'B', 'C', 'D'],
'Sales': [100, 150, 120, 180]
}
Sales16 = pd.DataFrame(data_sales16)
data_sales17 = {
'Product': ['A', 'B', 'C', 'D'],
'Sales': [100, None, 120, None]
}
Sales17 = pd.DataFrame(data_sales17)
Sales17 = Sales16.fillna({'B': 150, 'D': 180})
print("Sales16:")
print(Sales16)
print("\nSales17 (after filling missing values):")
print(Sales17)
Sales16:
Product Sales
0 A 100
1 B 150
2 C 120
3 D 180
Sales17 (after filling missing values):
Product Sales
0 A 100
1 B 150
2 C 120
3 D 180
Write code that just produces single True/False as a result for the presence of missing values in whole DataFrame namely df.
import pandas as pd
data = {'A': [1, 2, None, 4], 'B': [None, 5, 6, 7], 'C': [8, 9, 10, 11]}
df = pd.DataFrame(data)
has_missing_values = False
for column in df.columns:
for value in df[column]:
if pd.isnull(value):
has_missing_values = True
break
print(has_missing_values)
True
Four Series objects Temp1, Temp2, Temp3 and Temp4 store the temperatures of week1, week2, week3 and week4 respectively. Create a DataFrame from these four series objects where the indexes should be 'Sunday', 'Monday', ... , 'Saturday', and columns should be 'Week1', 'Week2', 'Week3' and 'week4'.
import pandas as pd
weekdays = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
week1 = [25.4, 26.7, 27.2, 28.9, 29.8, 30.2, 31.5]
week2 = [22.3, 23.2, 24.6, 25.4, 26.8, 27.9, 28.4]
week3 = [20.1, 21.2, 22.4, 23.9, 24.7, 25.57, 26.78]
week4 = [18.4, 19.1, 20.3, 21.7, 22.67, 23.4, 24.09]
Temp1 = pd.Series(week1, index=weekdays)
Temp2 = pd.Series(week2, index=weekdays)
Temp3 = pd.Series(week3, index=weekdays)
Temp4 = pd.Series(week4, index=weekdays)
df = pd.DataFrame({'Week1': Temp1, 'Week2': Temp2, 'Week3': Temp3, 'Week4': Temp4})
print(df)
Week1 Week2 Week3 Week4
Sunday 25.4 22.3 20.10 18.40
Monday 26.7 23.2 21.20 19.10
Tuesday 27.2 24.6 22.40 20.30
Wednesday 28.9 25.4 23.90 21.70
Thursday 29.8 26.8 24.70 22.67
Friday 30.2 27.9 25.57 23.40
Saturday 31.5 28.4 26.78 24.09
Given a DataFrame that stores the details of past 25 years' monthly sales. Some old data is however missing. Write a script to calculate average :
Monthly sales across years
Yearly sales
Make sure that missing values do not hamper the overall result.
import pandas as pd
data = {
'Year': [],
'Month': [],
'Sales': []
}
n = int(input("Enter the number of rows in the DataFrame: "))
for i in range(n):
year = int(input("Enter the year: "))
month = int(input("Enter the month (1-12): "))
sales = float(input("Enter the sales amount: "))
data['Year'].append(year)
data['Month'].append(month)
data['Sales'].append(sales)
df = pd.DataFrame(data)
df.fillna(0, inplace=True)
avg_monthly_sales = df.groupby('Month')['Sales'].mean()
yearly_sales = df.groupby('Year')['Sales'].mean()
print("Average Monthly Sales Across Years:")
print(avg_monthly_sales)
print("\nYearly Sales:")
print(yearly_sales)
Average Monthly Sales Across Years:
Month
1 543.25
2 644.25
3 486.50
4 583.50
5 376.50
6 644.75
7 847.00
8 497.75
9 564.50
10 701.00
11 524.75
12 522.25
Name: Sales, dtype: float64
Yearly Sales:
Year
2000 781.082333
2001 480.030233
2002 390.530000
2003 682.917767
2004 421.250050
2005 408.333333
2006 509.757500
2007 634.333667
2008 907.250000
2009 876.222222
2010 940.757578
2011 690.900067
2012 623.250043
2013 890.088888
2014 491.500000
2015 601.919997
2016 623.250055
2017 489.086633
2018 523.750000
2019 789.910067
2020 456.250000
2021 481.083333
2022 590.750000
2023 618.916667
2024 621.250000
Name: Sales, dtype: float64
Given two identical DataFrames Sales16 and Sales17. But Sales17 has some values missing. Write code so that Sales17 fills its missing values from corresponding entries of Sales16.
import pandas as pd
data_sales16 = {
'Product': ['A', 'B', 'C', 'D'],
'Sales': [100, 150, 120, 180]
}
Sales16 = pd.DataFrame(data_sales16)
data_sales17 = {
'Product': ['A', 'B', 'C', 'D'],
'Sales': [100, None, 120, None]
}
Sales17 = pd.DataFrame(data_sales17)
Sales17 = Sales16.fillna({'B': 150, 'D': 180})
print("Sales16:")
print(Sales16)
print("\nSales17 (after filling missing values):")
print(Sales17)
Sales16:
Product Sales
0 A 100
1 B 150
2 C 120
3 D 180
Sales17 (after filling missing values):
Product Sales
0 A 100
1 B 150
2 C 120
3 D 180
Write code that just produces single True/False as a result for the presence of missing values in whole DataFrame namely df.
import pandas as pd
data = {'A': [1, 2, None, 4], 'B': [None, 5, 6, 7], 'C': [8, 9, 10, 11]}
df = pd.DataFrame(data)
has_missing_values = False
for column in df.columns:
for value in df[column]:
if pd.isnull(value):
has_missing_values = True
break
print(has_missing_values)
True