Solutions for Informatics Practices, Class 12, CBSE
Assertion (A). To use the Pandas library in a Python program, one must import it.
Reasoning (R). The only alias name that can be used with the Pandas library is pd.
A is true but R is false.
Explanation
In order to work with Pandas in Python, we need to import the Pandas library into our Python environment using the statement import pandas as pd. While pd is a common alias used with the Pandas library, it's not the only alias that can be used. We can import Pandas using other alias names as well.
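As a minimal sketch of this point, the snippet below imports Pandas under a different alias. The alias name pnd and the variable names are arbitrary choices made for this illustration, not part of the original question.

```python
import pandas as pnd  # "pnd" is an arbitrary alias; any valid identifier works

numbers = pnd.Series([1, 2, 3])
alias_total = int(numbers.sum())
print(alias_total)  # 6
```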
Assertion. A series is a 1D data structure which is value-mutable but size-immutable.
Reason. Every time you change the size of a series object, change does not take place in the existing series object, rather a new series object is created with the new size.
Both A and R are true and R is the correct explanation of A.
Explanation
A series is a one-dimensional data structure that is value-mutable but size-immutable. This means that we can modify the values within a series, but we cannot change its size once it's created. Every time we attempt to change the size of a series object by adding or dropping an element, internally a new series object is created with the new size.
Assertion. A dataframe is a 2D data structure which is value mutable and size mutable.
Reason. Every change in a dataframe internally creates a new dataframe object.
A is true but R is false.
Explanation
A DataFrame is a two-dimensional data structure that is both value-mutable and size-mutable. This means that we can modify the values within a DataFrame, change its size once it's created, and add or drop elements in an existing DataFrame object without creating a new DataFrame internally.
Assertion. A dataframe is value mutable and size-mutable.
Reason. All changes occur in-place in a dataframe.
Both A and R are true and R is the correct explanation of A.
Explanation
A DataFrame is a two-dimensional data structure that is both value-mutable and size-mutable. This means that we can modify the values within a DataFrame, change its size in place once it's created, and add or drop elements in an existing DataFrame object without creating a new DataFrame internally.
Assertion. A series object stores values of homogeneous types.
Reason. Even if values appear to be of different types, internally they are stored in a common datatype.
Both A and R are true and R is the correct explanation of A.
Explanation
A Series object in Pandas stores values of homogeneous types, meaning all values are of the same data type. Even if values appear to be of different types, internally they are stored in a common datatype.
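A small sketch of the common-datatype behaviour described above, using made-up values: a Series that mixes integers and floats is stored as float64, and one that mixes numbers and text is stored with the generic object dtype.

```python
import pandas as pd

# Integers mixed with a float are upcast to one common numeric type.
ints_and_floats = pd.Series([1, 2.5, 3])

# Numbers mixed with text fall back to the generic object dtype.
numbers_and_text = pd.Series([1, "two", 3.0])

print(ints_and_floats.dtype)   # float64
print(numbers_and_text.dtype)  # object
```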
Assertion. Arithmetic operations on two series objects take place on matching indexes.
Reason. Non-matching indexes are removed from the result of arithmetic operation on series objects.
A is true but R is false.
Explanation
Arithmetic operations on two Series objects take place on matching indexes. When performing operations on objects with non-matching indexes, Pandas aligns the indexes and adds values for matching indexes, resulting in NaN (Not a Number) for non-matching indexes in both objects.
Assertion. Arithmetic operations on two series objects take place on matching indexes.
Reason. For non-matching indexes of series objects in an arithmetic operation, NaN is returned.
Both A and R are true and R is the correct explanation of A.
Explanation
Arithmetic operations on two Series objects take place on matching indexes. When performing operations on objects with non-matching indexes, Pandas aligns the indexes and adds values for matching indexes, resulting in NaN (Not a Number) for non-matching indexes in both objects.
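The alignment rule can be checked with a short sketch. The two Series below are illustrative examples and do not come from the textbook question.

```python
import pandas as pd

first = pd.Series([1, 2, 3], index=["x", "y", "z"])
second = pd.Series([10, 20], index=["y", "z"])

# Values are added label by label; "x" exists only in first, so it becomes NaN.
total = first + second
print(total)
```

The label "x" is kept in the result with the value NaN rather than being removed, which is why the earlier reason about non-matching indexes being dropped is false.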
Assertion. While changing the values of a column in a dataframe, if the column does not exist, an error occurs.
Reason. If values are provided for a non-existing column in a dataframe, a new column is added with those values.
A is false but R is true.
Explanation
Changing the values of a column that does not exist in a dataframe does not cause an error. Instead, a new column with those values is added to the dataframe.
Assertion. .loc() is a label based data selecting method to select a specific row(s) or column(s) which we want to select.
Reason. .iloc() can not be used with default indices if customized indices are provided.
A is true but R is false.
Explanation
The .loc() method is a label-based method in Pandas used for selecting specific rows or columns based on their labels (indices). In contrast, .iloc() is used for integer-location based indexing, so it can still be used with the default 0-based integer positions even if customized indices are provided.
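The following sketch, built on a hypothetical one-column DataFrame with customized row labels, shows both selectors reaching the same value.

```python
import pandas as pd

# Hypothetical data with customized row labels r1, r2, r3.
df = pd.DataFrame({"marks": [70, 85, 90]}, index=["r1", "r2", "r3"])

by_label = df.loc["r2", "marks"]  # label-based selection
by_position = df.iloc[1, 0]       # position-based selection still works

print(by_label, by_position)
```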
Assertion. DataFrame has both a row and column index.
Reason. A DataFrame is a two-dimensional labelled data structure like a table of MySQL.
Both A and R are true and R is the correct explanation of A.
Explanation
A DataFrame in Pandas has both a row index and a column index. It is a two-dimensional labeled data structure, similar to a table in MySQL, each value is identifiable with the combination of row and column indices.
Pandas is the most popular library for data analysis. It offers data I/O, computations across rows/columns, subset selection, dataset merging, handling missing data, group-wise operations, data reshaping, time-series analysis, and integrates with visualization tools.
(i) size — It returns the number of elements in the underlying data.
(ii) itemsize — It returns the size of the dtype of the item of the underlying data.
(iii) nbytes — It returns the number of bytes in the underlying data.
NaN stands for 'Not a Number'. In Python libraries like NumPy and Pandas, NaN is the legal empty value used to represent missing or undefined values, and we can use np.NaN (with NumPy imported as np) to specify a missing value. NumPy 2.0 removed the np.NaN alias, so np.nan is the spelling that works across versions.
The inplace argument in the rename() function in pandas specifies whether to modify the dataframe in place or return a new dataframe with the changes. When inplace = True is set, the dataframe is modified directly, and the changes are applied to the existing dataframe. If inplace = False or not specified (default), a new dataframe with the changes is returned, leaving the original dataframe unchanged.
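A brief sketch of both behaviours follows. The DataFrame and the new column names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Default (inplace=False): a new DataFrame is returned, df is untouched.
renamed = df.rename(columns={"A": "alpha"})
columns_after_copy = list(df.columns)

# inplace=True: df itself is changed and the call returns None.
result = df.rename(columns={"B": "beta"}, inplace=True)
columns_after_inplace = list(df.columns)

print(columns_after_copy, columns_after_inplace, result)
```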
import pandas as pd
Reason — The syntax to import a library with an alias is import library_name as alias. Therefore, the statement import pandas as pd is used to import the pandas library in Python with the alias 'pd'.
pd.Series(data = array, dtype = numpy.int16)
Reason — The syntax to specify data type for a Series object is: <Series Object> = pandas.Series(data = None, index = None, dtype = None). Therefore, according to this syntax, pd.Series(data = array, dtype = numpy.int16) is correct.
itemsize
Reason — The itemsize attribute is used to know the number of bytes allocated to each data item in a Series object. The syntax is <Series object>.itemsize.
S[2]
Reason — The syntax to access individual elements of a Series object is <Series Object name>[<valid index>]. Therefore, according to this syntax, to display the third element of a Series object S with zero-based indexing, S[2] is correct.
S[:3]
Reason — The syntax to extract slices from a Series object is <Series Object>[start:end:step]. Therefore, according to this syntax, the correct slice notation to display the first three elements of a Series object S is S[:3].
tail(), tail(5)
Reason — The syntax to display the last n rows of a Series object is <Series Object>.tail([n]). Therefore, according to this syntax, tail(5) will display the last five rows of a Series object S. If the value of n is not specified, then tail() will return the last 5 rows of a Series object.
We can't change the index of the series
Reason — We can change or rename the indexes of a Series object by assigning a new index array to its index attribute. The syntax is <Object>.index = <new index array>.
Value Error
Reason — When specifying indexes explicitly using an index sequence, we must provide indexes equal to the number of values in the data array. Providing fewer or more indices will lead to an error, i.e., a ValueError.
0 0
1 0
2 0
Reason — The code creates a pandas Series object myser with three elements [0, 0, 0], and when we print the Series, it displays the index along with the corresponding values. Since the Series is created with default indexes (0, 1, 2), the output shows the index values (0, 1, 2) along with the corresponding values (0, 0, 0).
S.tail()
Reason — The syntax to display the last n rows of a Series object is <Series Object>.tail([n]). Therefore, according to this syntax, S.tail() will display the last five rows of a Series object S.
NaN
Reason — NaN stands for 'Not a Number'. In Python libraries like NumPy and Pandas, NaN is the legal empty value used to represent missing or undefined values, and we can use np.NaN to specify a missing value.
NaN
Reason — When performing mathematical operations on pandas Series objects, index matching is implemented (this is called data alignment in Pandas objects), and missing values are filled with NaN (Not a Number) by default.
print(Sequences.head(4))
Reason — The syntax to display the first n rows from a Series object is <Series object>.head([n]). Therefore, according to this syntax, the command to display the first 4 rows of Sequences is print(Sequences.head(4)).
Homogeneous tabular data structure
Reason — The pandas DataFrames can hold heterogeneous data, meaning each column can have a different data type.
DataFrame
Reason — A DataFrame is a two-dimensional, labelled, array-like Pandas data structure that stores an ordered collection of columns, which can hold data of different types.
Union of the keys of the dictionaries
Reason — When we create a DataFrame from a list of dictionaries, the column labels are formed by the union of the keys of the dictionaries.
inner dictionary's keys
Reason — When a DataFrame is created using a 2D dictionary, then the indexes/row labels are formed from keys of inner dictionaries.
outer dictionary's keys
Reason — When a DataFrame is created using a 2D dictionary, then the column labels are formed from keys of outer dictionaries.
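The labelling rules in the three answers above can be sketched as follows. The dictionaries are small hypothetical examples chosen only to make the labels easy to see.

```python
import pandas as pd

# Dictionary of dictionaries: outer keys become columns, inner keys become row labels.
nested = {"y1": {"Q1": 5, "Q2": 8}, "y2": {"Q1": 7, "Q2": 9}}
df_nested = pd.DataFrame(nested)

# List of dictionaries: the columns are the union of all keys,
# and a key missing from a dictionary gives NaN in that row.
records = [{"a": 1, "b": 2}, {"b": 3, "c": 4}]
df_records = pd.DataFrame(records)

print(df_nested)
print(df_records)
```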
All of these
Reason — We can create a DataFrame object by passing data in many different ways, such as two-dimensional dictionaries (i.e., dictionaries having lists or dictionaries or ndarrays or series objects etc), two-dimensional ndarrays, series type object and another DataFrame object.
D1.T
Reason — We can transpose a DataFrame by swapping its indexes and columns using the attribute T, with the syntax DataFrame.T. Therefore, D1.T is used to get the transpose of a DataFrame D1.
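As a quick illustration, the sketch below transposes a hypothetical DataFrame named D1. Its contents are assumed for the example.

```python
import pandas as pd

D1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["r1", "r2", "r3"])

# Row labels and column labels swap places.
transposed = D1.T
print(transposed)
```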
DF.iloc[6:10, 3:6]
Reason — To display a subset of a dataframe using row and column numeric index/position, iloc is used with the syntax <DF object>.iloc[<start row index>:<end row index>, <start col index>:<end col index>]. Therefore, according to this syntax, DF.iloc[6:10, 3:6] is the correct slice notation to display the 3rd, 4th and 5th columns from the 6th to 9th rows of a dataframe DF.
DF.iat[3, 5] = 35
Reason — The syntax to modify values using row and column position is <DataFrame>.iat[<row position>, <column position>]. Therefore, according to this syntax, DF.iat[3, 5] = 35 is used to change the 5th column's value at the 3rd row to 35 in dataframe DF.
All of these
Reason — We can create a DataFrame object in Pandas by passing data in many different ways, such as a scalar value, an ndarray and a Python dictionary.
Both (a) and (b)
Reason — NaN stands for "Not a Number" and is used in Pandas to represent missing or undefined values in a Series or DataFrame. A Series in Pandas is similar to a one-dimensional array or list in Python. It has an index and a corresponding array of data values. Series can be accessed, sliced, and manipulated in ways similar to arrays.
print(Data.iloc[0 : 4, 1 : 4])
Reason — To display a subset of a dataframe using row and column numeric index/position, iloc is used with the syntax <DF object>.iloc[<start row index>:<end row index>, <start col index>:<end col index>]. Therefore, according to this syntax, print(Data.iloc[0 : 4, 1 : 4]) is the correct statement to display the first four rows and second to fourth columns from a DataFrame Data.
Sudhanshu has written the following code to create a DataFrame with boolean index :
import numpy as np
import pandas as pd
df = pd.DataFrame(data = [[5, 6, 7]], index = [true, false, true])
print(df)
While executing the code, she is getting an error, help her to rectify the code :
df = pd.DataFrame(data = [5, 6, 7], index = [True, False, True])
Reason — The index values 'true' and 'false' should have the first letter capitalized to match Python's boolean values. Also, the 'data' parameter should contain the list of values to be included in the DataFrame. Hence, df = pd.DataFrame(data = [5, 6, 7], index = [True, False, True]) is correct.
True
Reason — When a Series object is used as a column in a DataFrame, it behaves like a column. However, when a Series is created with an index that matches the index of an existing DataFrame, it can behave like a row.
False
Reason — NumPy arrays can perform vectorized operations on two arrays only if their shapes match, while Series objects can perform vectorized operations on two Series objects even if their shapes differ, using NaN for non-matching indexes. Additionally, Series objects consume more memory compared to NumPy arrays. Hence, NumPy array and Series object are different.
True
Reason — A DataFrame can be thought of as a collection of multiple Series objects. This is because a DataFrame can be created using multiple Series objects. For example, in a 2D dictionary, the values can be represented as Series objects, and by passing this dictionary as an argument, a DataFrame object can be created.
False
Reason — If the inplace argument in the rename() function is set to True, then it makes changes in the original DataFrame. If it is set to False, then it returns a new DataFrame with the changes applied.
The significance of Python Pandas library is as follows:
A Series object in Pandas is both similar to and different from ndarrays (NumPy arrays).
Similarities:
Both Series and ndarrays store homogeneous data, meaning all elements must be of the same data type (e.g., integers, floats, strings).
Differences:
Series Object | ndarrays |
---|---|
It supports explicit indexing, i.e., we can programmatically choose, provide and change indexes in terms of numbers or labels. | It does not support explicit indexing, only supports implicit indexing whereby the indexes are implicitly given 0 onwards. |
It supports indexes of numeric as well of string types. | It supports indexes of only numeric types. |
It can perform vectorized operations on two series objects, even if their shapes are different by using NaN for non-matching indexes/labels. | It can perform vectorized operations on two ndarrays only if their shapes match. |
It takes more memory compared to a numpy array. | It takes lesser memory compared to a Series object. |
Packets = pandas.Series([125, 92, 104, 92, 85, 116, 87, 90], name = 'Packets')
Consider two objects x and y. x is a list whereas y is a Series. Both have values 20, 40, 90, 110.
What will be the output of the following two statements considering that the above objects have been created already ?
(a) print (x*2)
(b) print (y*2)
Justify your answer.
(a)
[20, 40, 90, 110, 20, 40, 90, 110]
In the first statement, x represents a list. When a list is multiplied by 2 (x*2), the whole list is repeated twice, i.e., its elements are concatenated with themselves; the individual values are not doubled.
(b)
0 40
1 80
2 180
3 220
dtype: int64
In the second statement, y represents a Series. When a Series is multiplied by a value, each element of the Series is multiplied by that value (here 2), as Series supports vectorized operations.
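The contrast between the two statements can be reproduced with the values from the question. The names repeated and doubled are introduced only for this sketch.

```python
import pandas as pd

x = [20, 40, 90, 110]   # a plain Python list
y = pd.Series(x)        # a Series with the same values

repeated = x * 2        # list repetition: the sequence appears twice
doubled = y * 2         # vectorized arithmetic: every element is doubled

print(repeated)
print(doubled)
```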
(a) df['C'] = np.NaN — This statement will add a new column 'C' to the dataframe and assign np.NaN (Not a Number) to all rows in this new column.
The updated dataframe will look like this:
A B D C
0 15 17 19 NaN
1 16 18 20 NaN
2 20 21 22 NaN
(b) df['C'] = [2, 5] — This statement will result in an error because the length of the list [2, 5] does not match the number of rows in the DataFrame df.
(c) df['C'] = [12, 15, 27] — This statement will add a new column 'C' to the dataframe and assign the values from the list [12, 15, 27] to the new column. This time, all rows in the new column will be assigned a value.
The updated dataframe will look like this:
A B D C
0 15 17 19 12
1 16 18 20 15
2 20 21 22 27
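The three cases can be tried together in the sketch below. The starting DataFrame is reconstructed from the outputs shown above, which is an assumption because the question's own definition of df is not reproduced here. The sketch uses np.nan, since the np.NaN alias is not available in NumPy 2.0 and later.

```python
import numpy as np
import pandas as pd

# Assumed starting data, rebuilt from the printed outputs above.
df = pd.DataFrame({"A": [15, 16, 20], "B": [17, 18, 21], "D": [19, 20, 22]})

# (a) A scalar is broadcast to every row of the new column.
df["C"] = np.nan
all_missing = bool(df["C"].isna().all())

# (b) A list whose length differs from the number of rows raises ValueError.
try:
    df["C"] = [2, 5]
    length_error = None
except ValueError as exc:
    length_error = exc

# (c) A list with one value per row is accepted.
df["C"] = [12, 15, 27]
print(df)
```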
(a)
>>> sales[['Item', 'Revenue']]
(b)
>>> sales.iloc[2:7]
(c)
>>> sales.Item[4]
The error in Hitesh's code is that the tail() function in pandas by default returns the last 5 rows of the dataframe. To display the last 4 rows, Hitesh needs to specify the number of rows he wants to display.
Here's the correct code:
df.tail(4)
The syntax to add a new column to a DataFrame is <DF object>[<column>] = <new value>. Therefore, according to this syntax, the statement to add a column named 'val' to a dataframe df with 10 rows is:
df['val'] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
(a)
>>> del df[<column_name>]
(b)
>>> df.drop(range(2, 6))
(c)
>>> df.isnull()
(d)
>>> df.fillna(999)
The statements to delete a column from a DataFrame are:
del <Df object>[<column name>]
OR
df.drop([<column name>], axis = 1).
For example, the statement to delete a column Population from a dataframe df is del df['Population'] or df.drop('Population', axis = 1).
iloc method | loc method |
---|---|
iloc is used for integer-based indexing. | loc is used for label-based indexing. |
It allows to access rows and columns using integer indices, where the first row or column has an index of 0. | It allows to access rows and columns using their labels (index or column names). |
With iloc , the end index/position in slices is excluded when given as start:end. | With loc , both the start label and end label are included when given as start:end. |
The syntax is df.iloc[row_index, column_index] . | The syntax is df.loc[row_label, column_label] . |
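The difference in slice end points can be seen in a short sketch. The DataFrame below is a hypothetical example with row labels a to d.

```python
import pandas as pd

df = pd.DataFrame({"p": [1, 2, 3, 4], "q": [5, 6, 7, 8]},
                  index=["a", "b", "c", "d"])

by_position = df.iloc[0:2]   # positions 0 and 1; the end position 2 is excluded
by_label = df.loc["a":"c"]   # labels a, b and c; the end label is included

print(by_position)
print(by_label)
```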
iat method | at method |
---|---|
iat is used for integer-based indexing. | at is used for label-based indexing. |
It allows to access a single value in the DataFrame by specifying the row and column indices using integers. | It allows to access a single value in the DataFrame by specifying the row and column labels (index or column names). |
The syntax is df.iat[row_index, col_index] . | The syntax is df.at[row_label, col_label] . |
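A minimal sketch of single-value access with both accessors, again on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"p": [1, 2, 3], "q": [4, 5, 6]}, index=["a", "b", "c"])

df.iat[1, 0] = 20      # row position 1, column position 0
df.at["c", "q"] = 60   # row label "c", column label "q"

print(df)
```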
To delete columns from a dataframe, we use the del statement or the drop() function with the syntax:
del <Df object>[<column name>]
OR
df.drop([<column name>], axis = 1).
For example, the statements to delete columns A and B from a dataframe df are del df['A'] followed by del df['B'], or df.drop(['A', 'B'], axis = 1).
To delete rows from a dataframe, we use the drop() function with the syntax: <DF>.drop(sequence of indexes). For example, the statement to delete the rows with indexes 2, 3, 4 from a dataframe df is df.drop([2, 3, 4]).
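The sketch below, on a hypothetical five-row DataFrame, shows one practical difference between the two approaches: drop() returns a new DataFrame by default and leaves the original unchanged, while del removes the column from the existing object.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                   "B": [6, 7, 8, 9, 10],
                   "C": [11, 12, 13, 14, 15]})

without_rows = df.drop([2, 3, 4])           # rows with labels 2, 3, 4 removed
without_cols = df.drop(["A", "B"], axis=1)  # columns A and B removed
shape_before_del = df.shape                 # df is still 5 rows x 3 columns

del df["C"]                                 # removes column C from df itself
print(df)
```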
Consider following Series object namely S :
0 0.430271
1 0.617328
2 -0.265421
3 -0.836113
dtype:float64
What will be returned by following statements ?
(a) S * 100
(b) S > 0
(c) S1 = pd.Series(S)
(d) S2 = pd.Series(S1) + 3
What will be the values of Series objects S1 and S2 created above ?
(a) S * 100
0 43.0271
1 61.7328
2 -26.5421
3 -83.6113
dtype: float64
(b) S > 0
0 True
1 True
2 False
3 False
dtype: bool
(c) S1 = pd.Series(S)
0 0.430271
1 0.617328
2 -0.265421
3 -0.836113
dtype: float64
(d) S2 = pd.Series(S1) + 3
0 3.430271
1 3.617328
2 2.734579
3 2.163887
dtype: float64
The values of the Series object S1 created above are as follows:
0 0.430271
1 0.617328
2 -0.265421
3 -0.836113
dtype: float64
The values of the Series object S2 created above are as follows:
0 3.430271
1 3.617328
2 2.734579
3 2.163887
dtype: float64
AMZN 0.430271
AAPL 0.617328
MSFT -0.265421
GOOG -0.836113
dtype: float64
0.430271
1.5
AMZN 1.500000
AAPL 0.617328
MSFT -0.265421
GOOG -0.836113
dtype: float64
The provided code fragment first changes the index labels of the Series S to ['AMZN', 'AAPL', 'MSFT', 'GOOG'], prints the modified Series S, and then proceeds to print and modify the value corresponding to the 'AMZN' index. Specifically, it prints the value at the 'AMZN' index before and after assigning a new value of 1.5 to that index. Finally, it prints the Series S again, showing the updated value at the 'AMZN' index.
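The question's code fragment is not reproduced here, so the following is only a sketch of the steps the explanation describes, reconstructed under that assumption from the values shown in the output. The names before and after are introduced for illustration.

```python
import pandas as pd

S = pd.Series([0.430271, 0.617328, -0.265421, -0.836113])

# Replace the default 0..3 index with labels of the same length.
S.index = ["AMZN", "AAPL", "MSFT", "GOOG"]

before = S["AMZN"]   # value looked up by label
S["AMZN"] = 1.5      # value changed by label
after = S["AMZN"]

print(before, after)
print(S)
```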
What will be the output produced by the following code ?
Stationery = ['pencils', 'notebooks', 'scales', 'erasers']
S = pd.Series([20, 33, 52, 10], index = Stationery)
S2 = pd.Series([17, 13, 31, 32], index = Stationery)
print(S + S2)
S = S + S2
print(S + S2)
pencils 37
notebooks 46
scales 83
erasers 42
dtype: int64
pencils 54
notebooks 59
scales 114
erasers 74
dtype: int64
The code creates two Pandas Series, S and S2. It then prints the result of adding these two Series element-wise based on their corresponding indices. After updating S by adding S and S2, it prints the result of adding the updated S and S2 again.
(a)
Series([], dtype: int64)
The slice S[1:1] starts at index 1 and ends at index 1, but because the end index is exclusive, it does not include any elements, resulting in an empty Series.
(b)
pencils 20
dtype: int64
The slice S[0:1] starts at index 0 and ends at index 1, but because the end index is exclusive, it includes only one element, i.e., the element at index 0.
(c)
pencils 20
notebooks 33
dtype: int64
The slice S[0:2] starts at index 0 and stops before index 2, because the end index is exclusive. Hence, it includes two elements, i.e., the elements at index 0 and 1.
(d)
pencils 12
notebooks 12
scales 52
erasers 10
dtype: int64
The statement S[0:2] = 12 assigns the value 12 to indices 0 and 1 in Series S, directly modifying those elements. The updated Series is then printed.
(e)
Index(['pencils', 'notebooks', 'scales', 'erasers'], dtype='object')
[20 33 52 10]
The code print(S.index) displays the indices of Series S, while print(S.values) displays the values of the Series.
Write a Python program to create a series object, country using a list that stores the capital of each country.
Note. Assume four countries to be used as index of the series object are India, UK, Denmark and Thailand having their capitals as New Delhi, London, Copenhagen and Bangkok respectively.
import pandas as pd
capitals = ['New Delhi', 'London', 'Copenhagen', 'Bangkok']
countries = ['India', 'UK', 'Denmark', 'Thailand']
country = pd.Series(capitals, index=countries)
print(country)
India New Delhi
UK London
Denmark Copenhagen
Thailand Bangkok
dtype: object
S2 = pd.Series([101, 102, 102, 104])
print(S2.index)
S2.index = [0, 1, 2, 3, 4, 5] #Error 1
S2[5] = 220
print(S2)
Error 1 — The Series S2 initially has four elements, so assigning a new index list of six elements ([0, 1, 2, 3, 4, 5]) to S2.index will raise a ValueError because the new index list length does not match the length of the Series.
The corrected code is:
S2 = pd.Series([101, 102, 102, 104])
print(S2.index)
S2.index = [0, 1, 2, 3]
S2[5] = 220
print(S2)
In the above code fragment, the data values should be enclosed in square brackets [] to form a list.
The corrected code is:
S = pd.Series([2, 3, 4, 5], index = range(4))
In the above code fragment, the data values should be enclosed in square brackets to form a list, and the specified index range(7) is too long for the provided data [1, 2, 3, 4]. Since there are only four data values, the index should have a length that matches the number of data values.
The corrected code is:
S1 = pd.Series([1, 2, 3, 4], index = range(4))
The error in the above code is in the line print(s[102, 103, 104]). When accessing elements in a pandas Series using square brackets, we should use a list of index values, not multiple separate index values separated by commas.
The corrected code is:
data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])
s = pd.Series(data, index = [100, 101, 102, 103, 104, 105])
print(s[[102, 103, 104]])
The code causes an error because the length of the data (range(1, 15, 3)) and the length of the index (list('abcd')) do not match. The range(1, 15, 3) generates the sequence [1, 4, 7, 10, 13], which has a length of 5. The list('abcd') generates the list ['a', 'b', 'c', 'd'], which has a length of 4. When creating a pandas Series, the length of the data and the length of the index must be the same.
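The mismatch can be reproduced with the data and index named in the explanation. The five-label index used for the fix is one possible correction, assumed here for illustration.

```python
import pandas as pd

data = range(1, 15, 3)   # 1, 4, 7, 10, 13 -> five values
index = list("abcd")     # four labels

try:
    pd.Series(data, index=index)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

# One possible fix: supply exactly as many labels as there are values.
fixed = pd.Series(data, index=list("abcde"))
print(fixed)
```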
The statement s1['ab'] causes an error because 'ab' is not a single key in the index. The index has individual keys 'a' and 'b', but not 'ab'.
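A sketch of this lookup failure follows. The Series s1 below is a stand-in with only the labels 'a' and 'b', since the question's own definition is not shown here. A list of labels is one way to fetch both values.

```python
import pandas as pd

# Hypothetical stand-in for the question's s1.
s1 = pd.Series([1, 2], index=["a", "b"])

try:
    s1["ab"]             # no label named "ab" exists
    lookup_error = None
except KeyError as exc:
    lookup_error = exc

pair = s1[["a", "b"]]    # a list of labels selects both elements
print(pair)
```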
The statements (a), (b), (c) and (d) are all used to view the values from a pandas Series object Ser. However, they differ in the number of values they display.
(a) print(Ser.head()): This statement will display the first 5 values from the Series Ser.
(b) print(Ser.head(8)): This statement will display the first 8 values from the Series Ser.
(c) print(Ser.tail()): This statement will display the last 5 values from the Series Ser.
(d) print(Ser.tail(11)): This statement will display the last 11 values from the Series Ser.
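The four calls can be compared on a hypothetical 20-element Series. The contents of Ser are assumed here, as the question does not fix them.

```python
import pandas as pd

Ser = pd.Series(range(20))   # assumed sample data: 0, 1, ..., 19

first_default = Ser.head()    # first 5 values
first_eight = Ser.head(8)     # first 8 values
last_default = Ser.tail()     # last 5 values
last_eleven = Ser.tail(11)    # last 11 values

print(len(first_default), len(first_eight), len(last_default), len(last_eleven))
```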
The advantages of using a DataFrame over a Series are as follows:
If there is similar data stored in multiple Series and a single DataFrame, I would prefer to use the DataFrame. This is because a DataFrame allows us to store and manipulate data in a more organized and structured way, and it allows us to perform operations on entire columns. Additionally, a DataFrame allows us to index data using both row and column labels, which makes it easier to access and manipulate data.
Create a DataFrame in Python from the given list :
[['Divya', 'HR', 95000], ['Mamta', 'Marketing', 97000], ['Payal', 'IT', 980000], ['Deepak', 'Sales', 79000]]
Also give appropriate column headings as shown below :
| | Name | Department | Salary |
---|---|---|---|
0 | Divya | HR | 95000 |
1 | Mamta | Marketing | 97000 |
2 | Payal | IT | 980000 |
3 | Deepak | Sales | 79000 |
import pandas as pd
data = [['Divya', 'HR', 95000], ['Mamta', 'Marketing', 97000], ['Payal', 'IT', 980000], ['Deepak', 'Sales', 79000]]
df = pd.DataFrame(data, columns=['Name', 'Department', 'Salary'])
print(df)
Name Department Salary
0 Divya HR 95000
1 Mamta Marketing 97000
2 Payal IT 980000
3 Deepak Sales 79000
Carefully observe the following code :
import pandas as pd
Year1 = {'Q1': 5000, 'Q2': 8000, 'Q3': 12000, 'Q4': 18000}
Year2 = {'A': 13000, 'B': 14000, 'C': 12000}
totSales = {1: Year1, 2: Year2}
df = pd.DataFrame(totSales)
print(df)
Answer the following :
(i) List the index of the DataFrame df.
(ii) List the column names of DataFrame df.
(i) The index of the DataFrame df is: ['Q1', 'Q2', 'Q3', 'Q4', 'A', 'B', 'C'].
(ii) The column names of the DataFrame df are: [1, 2].
Given :
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
print(df)
print(df1)
print(df2)
What will Python show the result as if you execute above code ?
one two
a 1.0 1.0
b 2.0 2.0
c 3.0 3.0
d NaN 4.0
one two
d NaN 4.0
b 2.0 2.0
a 1.0 1.0
two three
d 4.0 NaN
a 1.0 NaN
The given code creates three pandas DataFrames df, df1, and df2 using the same dictionary d with different index and column labels. The first DataFrame df is created using the dictionary d with index labels taken from the index of the Series objects in the dictionary. The resulting DataFrame has two columns 'one' and 'two' with index labels 'a', 'b', 'c', and 'd'. The values in the DataFrame are filled in according to the index and column labels. The second DataFrame df1 is created with the same dictionary d but with a custom index ['d', 'b', 'a']. The third DataFrame df2 is created with a custom index ['d', 'a'] and custom column labels ['two', 'three']. Since the dictionary d does not have a key three, all values in that column are NaN (Not a Number), indicating missing data.
From the DataFrames created in previous question, write code to display only row 'a' from DataFrames df, df1, and df2.
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
print(df.loc['a',:])
print(df1.loc['a',:])
print(df2.loc['a',:])
one 1.0
two 1.0
Name: a, dtype: float64
one 1.0
two 1.0
Name: a, dtype: float64
two 1.0
three NaN
Name: a, dtype: object
From the DataFrames created in previous question, write code to display only rows 0 and 1 from DataFrames df, df1, and df2.
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
print(df.iloc[0:2])
print(df1.iloc[0:2])
print(df2.iloc[0:2])
one two
a 1.0 1.0
b 2.0 2.0
one two
d NaN 4.0
b 2.0 2.0
two three
d 4.0 NaN
a 1.0 NaN
From the DataFrames created in previous question, write code to display only rows 'a' and 'b' for columns 1 and 2 from DataFrames df, df1 and df2.
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
print(df.loc['a' : 'b', :])
print(df1.loc['b' : 'a', :])
print(df2.loc['d' : 'a', :])
one two
a 1.0 1.0
b 2.0 2.0
one two
b 2.0 2.0
a 1.0 1.0
two three
d 4.0 NaN
a 1.0 NaN
From the DataFrames created in previous question, write code to add an empty column 'x' to all DataFrames.
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
df['x'] = None
df1['x'] = None
df2['x'] = None
print(df)
print(df1)
print(df2)
one two x
a 1.0 1.0 None
b 2.0 2.0 None
c 3.0 3.0 None
d NaN 4.0 None
one two x
d NaN 4.0 None
b 2.0 2.0 None
a 1.0 1.0 None
two three x
d 4.0 NaN None
a 1.0 NaN None
What will be the output of the following program ?
import pandas as pd
dic = {'Name' : ['Sapna', 'Anmol', 'Rishul', 'Sameep'], 'Agg' : [56, 67, 75, 76], 'Age' : [16, 18, 16, 19]}
df = pd.DataFrame(dic, columns = ['Name', 'Age'])
print(df)
(a)
Name Agg Age
101 Sapna 56 16
102 Anmol 67 18
103 Rishul 75 16
104 Sameep 76 19
(b)
Name Agg Age
0 Sapna 56 16
1 Anmol 67 18
2 Rishul 75 16
3 Sameep 76 19
(c)
Name
0 Sapna
1 Anmol
2 Rishul
3 Sameep
(d)
Name Age
0 Sapna 16
1 Anmol 18
2 Rishul 16
3 Sameep 19
Predict the output of following code (it uses below given dictionary my_di).
my_di = {"name" : ["Jiya", "Tim", "Rohan"],
"age" : np.array([10, 15, 20]),
"weight" : (75, 123, 239),
"height" : [4.5, 5, 6.1],
"siblings" : 1,
"gender" : "M"}
df = pd.DataFrame(my_di)
print(df)
name age weight height siblings gender
0 Jiya 10 75 4.5 1 M
1 Tim 15 123 5.0 1 M
2 Rohan 20 239 6.1 1 M
The given code creates a dictionary my_di. Then, a DataFrame df is created using the pd.DataFrame() constructor and passing the my_di dictionary. The print() function is used to display the DataFrame.
Consider the same dictionary my_di in the previous question (shown below), what will be the output produced by following code ?
my_di = {"name" : ["Jiya", "Tim", "Rohan"],
"age" : np.array([10, 15, 20]),
"weight" : (75, 123, 239),
"height" : [4.5, 5, 6.1],
"siblings" : 1,
"gender" : "M"}
df2 = pd.DataFrame(my_di, index = my_di["name"])
print(df2)
name age weight height siblings gender
Jiya Jiya 10 75 4.5 1 M
Tim Tim 15 123 5.0 1 M
Rohan Rohan 20 239 6.1 1 M
The given code creates a dictionary my_di. Then, a DataFrame df2 is created using the pd.DataFrame() constructor, passing the my_di dictionary and the my_di["name"] list as the index. The print() function is used to display the DataFrame.
Jiya 75
Tim 123
Rohan 239
Name: weight, dtype: int64
123
The given code creates a dictionary my_di. Then, a DataFrame df2 is created using the pd.DataFrame() constructor, passing the my_di dictionary and the my_di["name"] list as the index. The print() function is used to display the 'weight' column of the DataFrame df2 and the value of the 'weight' column for the row with index 'Tim'.
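The two lookups described above can be sketched as follows. The statements of the original question are not reproduced on this page, so the exact expressions below are an assumption; the names weight_col, tim_weight and tim_weight_loc are illustrative only.

```python
import numpy as np
import pandas as pd

my_di = {"name": ["Jiya", "Tim", "Rohan"],
         "age": np.array([10, 15, 20]),
         "weight": (75, 123, 239),
         "height": [4.5, 5, 6.1],
         "siblings": 1,
         "gender": "M"}
df2 = pd.DataFrame(my_di, index=my_di["name"])

weight_col = df2["weight"]                  # a Series indexed by name
tim_weight = df2["weight"]["Tim"]           # chained lookup: column first, then row label
tim_weight_loc = df2.loc["Tim", "weight"]   # the same cell through label-based .loc

print(weight_col)
print(tim_weight, tim_weight_loc)
```

Both single-cell lookups return the same value; .loc does it in one indexing step.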
name age weight height siblings gender IQ Married
Jiya Jiya 10 75 4.5 1 M 130 False
Tim Tim 15 123 5.0 1 M 105 False
Rohan Rohan 20 239 6.1 1 M 115 False
The code adds two new columns to DataFrame df2, "IQ" with values [130, 105, 115] and "Married" with the value False for all rows, then prints the DataFrame.
Assume that required libraries (pandas and numpy) are imported and dataframe df2 has been created as per questions 17 and 18 above. Predict the output produced by the following code fragment :
df2["College"] = pd.Series(["IIT"], index=["Rohan"])
print(df2)
name age weight height siblings gender College
Jiya Jiya 10 75 4.5 1 M NaN
Tim Tim 15 123 5.0 1 M NaN
Rohan Rohan 20 239 6.1 1 M IIT
The code snippet uses the pandas and numpy libraries in Python to create a DataFrame named df2 from a dictionary my_di. The DataFrame is indexed by names, and a new column "College" is added with "IIT" as the value only for the index named "Rohan"; the other rows receive NaN.
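The NaN values come from index alignment: when a Series is assigned as a column, pandas matches the Series labels against the DataFrame's index. A minimal sketch, using a cut-down hypothetical frame named df_demo rather than the full df2:

```python
import pandas as pd

# Hypothetical reduced frame: only the "age" column of df2 is reproduced.
df_demo = pd.DataFrame({"age": [10, 15, 20]}, index=["Jiya", "Tim", "Rohan"])

# The Series carries a value only for the label "Rohan";
# rows whose labels are absent from the Series are filled with NaN.
df_demo["College"] = pd.Series(["IIT"], index=["Rohan"])

print(df_demo)
print(df_demo["College"].isna().tolist())
```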
Assume that required libraries (pandas and numpy) are imported and dataframe df2 has been created as per questions 17 and 18 above. Predict the output produced by the following code fragment :
print(df2.loc["Jiya"])
print(df2.loc["Jiya", "IQ"])
print(df2.loc["Jiya":"Tim", "IQ":"College"])
print(df2.iloc[0])
print(df2.iloc[0, 5])
print(df2.iloc[0:2, 5:8])
name Jiya
age 10
weight 75
height 4.5
siblings 1
gender M
IQ 130
College NaN
Name: Jiya, dtype: object
130
IQ College
Jiya 130 NaN
Tim 105 NaN
name Jiya
age 10
weight 75
height 4.5
siblings 1
gender M
IQ 130
College NaN
Name: Jiya, dtype: object
M
gender IQ College
Jiya M 130 NaN
Tim M 105 NaN
print(df2.loc["Jiya"]): This line prints all columns of the row with the index "Jiya".
print(df2.loc["Jiya", "IQ"]): This line prints the value of the "IQ" column for the row with the index "Jiya".
print(df2.loc["Jiya":"Tim", "IQ":"College"]): This line prints a subset of rows and columns using labels, from "Jiya" to "Tim" for rows and from "IQ" to "College" for columns.
print(df2.iloc[0]): This line prints all columns of the first row using integer-based indexing (position 0).
print(df2.iloc[0, 5]): This line prints the value of the 6th column for the first row using integer-based indexing.
print(df2.iloc[0:2, 5:8]): This line prints a subset of rows and columns using integer-based indexing, selecting rows from position 0 to 1 and columns from position 5 to 7.
Original DataFrame
col1 col2 col3
0 1 6 9
1 4 7 0
2 3 8 1
New DataFrame :
col1 col2 col3
0 1 6 9
The code creates a DataFrame named df with three columns ('col1', 'col2', 'col3') and three rows of data. The DataFrame df is printed, and then a new DataFrame named dfn is created by dropping the rows at index positions 1 and 2 from the original DataFrame using df.drop(df.index[[1, 2]]). The resulting DataFrame, dfn, contains only the first row of df; the second and third rows are removed.
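A minimal sketch of that drop, with the column values read off the output above (the original program is not reproduced here, so the constructor call is an assumption):

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 4, 3], "col2": [6, 7, 8], "col3": [9, 0, 1]})

# df.index[[1, 2]] picks the labels at positions 1 and 2; drop() removes those rows.
dfn = df.drop(df.index[[1, 2]])

print(dfn)
print(len(df), len(dfn))   # drop() returns a new object, so df still has 3 rows
```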
Before
age name
1 20 Ruhi
2 23 Ali
3 22 Sam
After
age name Edu
1 20 Ruhi BA
2 23 Ali BE
3 22 Sam MBA
The code uses the pandas library in Python to create a DataFrame named df1 from a dictionary data. The df1 DataFrame is printed, showing the initial data. Then, a new column 'Edu' is added to the DataFrame using df1['Edu'] = ['BA', 'BE' , 'MBA']. The updated DataFrame is printed.
Consider the given DataFrame 'Genre' :
No | Type | Code |
---|---|---|
0 | Fiction | F |
1 | Non-fiction | NF |
2 | Drama | D |
3 | Poetry | P |
Write suitable Python statements for the following :
(i) Add a column called Num_Copies with the following data : [300, 290, 450, 760].
(ii) Add a new genre of type 'Folk Tale' having code as "FT" and 600 number of copies.
(iii) Rename the column 'Code' to 'Book_Code'.
(i)
Genre['Num_Copies'] = [300, 290, 450, 760]
(ii)
Genre = pd.concat([Genre, pd.DataFrame([{'Type': 'Folk Tale', 'Code': 'FT', 'Num_Copies': 600}])], ignore_index=True)  # DataFrame.append() was removed in pandas 2.0
(iii)
Genre.rename(columns = {'Code': 'Book_Code'}, inplace = True)
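The three statements can be checked together with a short script. This is only a sketch: it assumes that "No" in the table is the default integer index rather than a data column, and it uses pd.concat() for part (ii), which also works on pandas 2.x where DataFrame.append() is no longer available. The name new_row is illustrative.

```python
import pandas as pd

# Assumption: the "No" header of the table is the default integer index.
Genre = pd.DataFrame({"Type": ["Fiction", "Non-fiction", "Drama", "Poetry"],
                      "Code": ["F", "NF", "D", "P"]})

Genre["Num_Copies"] = [300, 290, 450, 760]                        # (i)

new_row = pd.DataFrame([{"Type": "Folk Tale", "Code": "FT", "Num_Copies": 600}])
Genre = pd.concat([Genre, new_row], ignore_index=True)            # (ii)

Genre = Genre.rename(columns={"Code": "Book_Code"})               # (iii)

print(Genre)
```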
Write a program in Python Pandas to create the following DataFrame batsman from a Dictionary :
B_NO | Name | Score1 | Score2 |
---|---|---|---|
1 | Sunil Pillai | 90 | 80 |
2 | Gaurav Sharma | 65 | 45 |
3 | Piyush Goel | 70 | 90 |
4 | Karthik Thakur | 80 | 76 |
Perform the following operations on the DataFrame :
(i) Add both the scores of a batsman and assign to column "Total".
(ii) Display the highest score in both Score1 and Score2 of the DataFrame.
(iii) Display the DataFrame.
import pandas as pd
data = {'B_NO': [1, 2, 3, 4], 'Name': ['Sunil Pillai', 'Gaurav Sharma', 'Piyush Goel', 'Karthik Thakur'], 'Score1': [90, 65, 70, 80], 'Score2': [80, 45, 90, 76]}
batsman = pd.DataFrame(data)
batsman['Total'] = batsman['Score1'] + batsman['Score2']
highest_score1 = batsman['Score1'].max()
highest_score2 = batsman['Score2'].max()
print("Highest score in Score1: ", highest_score1)
print("Highest score in Score2: ", highest_score2)
print(batsman)
Highest score in Score1: 90
Highest score in Score2: 90
B_NO Name Score1 Score2 Total
0 1 Sunil Pillai 90 80 170
1 2 Gaurav Sharma 65 45 110
2 3 Piyush Goel 70 90 160
3 4 Karthik Thakur 80 76 156
Consider the following dataframe, and answer the questions given below:
import pandas as pd
df = pd.DataFrame( { "Quarter1": [2000, 4000, 5000, 4400, 10000],
"Quarter2": [5800, 2500, 5400, 3000, 2900],
"Quarter3": [20000, 16000, 7000, 3600, 8200],
"Quarter4": [1400, 3700, 1700, 2000, 6000]})
(i) Write the code to find mean value from above dataframe df over the index and column axis.
(ii) Use sum() function to find the sum of all the values over the index axis.
(i)
import pandas as pd
df = pd.DataFrame( { "Quarter1": [2000, 4000, 5000, 4400, 10000],
"Quarter2": [5800, 2500, 5400, 3000, 2900],
"Quarter3": [20000, 16000, 7000, 3600, 8200],
"Quarter4": [1400, 3700, 1700, 2000, 6000]})
mean_over_columns = df.sum(axis=1) / df.count(axis=1)
print("Mean over columns: \n", mean_over_columns)
mean_over_rows = df.sum(axis=0) / df.count(axis=0)
print("Mean over rows: \n", mean_over_rows)
Mean over columns:
0 7300.0
1 6550.0
2 4775.0
3 3250.0
4 6775.0
dtype: float64
Mean over rows:
Quarter1 5080.0
Quarter2 3920.0
Quarter3 10960.0
Quarter4 2960.0
dtype: float64
(ii)
import pandas as pd
df = pd.DataFrame( { "Quarter1": [2000, 4000, 5000, 4400, 10000],
"Quarter2": [5800, 2500, 5400, 3000, 2900],
"Quarter3": [20000, 16000, 7000, 3600, 8200],
"Quarter4": [1400, 3700, 1700, 2000, 6000]})
sum_over_index = df.sum(axis=0)
print("Sum over index (columns):\n", sum_over_index)
Sum over index (columns):
Quarter1 25400
Quarter2 19600
Quarter3 54800
Quarter4 14800
dtype: int64
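The same figures can also be obtained with the built-in mean() method, which may make the role of the axis argument easier to see. The variable names below are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"Quarter1": [2000, 4000, 5000, 4400, 10000],
                   "Quarter2": [5800, 2500, 5400, 3000, 2900],
                   "Quarter3": [20000, 16000, 7000, 3600, 8200],
                   "Quarter4": [1400, 3700, 1700, 2000, 6000]})

col_means = df.mean(axis=0)        # collapse the index axis: one mean per column
row_means = df.mean(axis=1)        # collapse the column axis: one mean per row
col_sums = df.sum(axis="index")    # "index" is an alias for axis=0

print(col_means)
print(row_means)
print(col_sums)
```

The axis argument names the axis that is collapsed, so axis=0 yields one result per column and axis=1 one result per row.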
The rename() method of a pandas DataFrame is used to alter the names of columns or rows. It accepts various parameters, including mapper and axis, which can be used together to rename columns or rows based on a mapping dictionary. The mapper parameter takes a dict-like object mapping old names to new names, while axis specifies whether the renaming should occur along columns (axis=1) or rows (axis=0).
No, the mapper parameter and the columns parameter cannot be used together in a single call to the rename() method of a pandas DataFrame; pandas raises a TypeError if both are supplied. They are alternative ways of expressing the same operation: rename(mapper, axis=1) and rename(columns=mapper) both rename columns. Like mapper, the columns parameter takes a dict-like object (or a function) in which keys represent the old column names and values represent the new column names. It does not accept a plain list of new column names.
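How rename() treats these parameters can be checked with a small experiment. The frame and the names by_mapper, by_columns and combined_ok are hypothetical; the behaviour shown is what current pandas versions do as far as can be verified, and older releases may word the error differently.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# mapper + axis and the columns= keyword are two spellings of the same operation.
by_mapper = df.rename({"A": "a"}, axis=1)
by_columns = df.rename(columns={"A": "a"})

# Passing both a mapper and columns= in one call is rejected by pandas.
try:
    df.rename({"A": "a"}, columns={"B": "b"})
    combined_ok = True
except TypeError as err:
    combined_ok = False
    print("TypeError:", err)

print(list(by_mapper.columns), list(by_columns.columns), combined_ok)
```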
The error in the code is that topDf.del['Sec D'] is not valid syntax for deleting a row from a DataFrame in pandas. The correct way to delete a row is the drop() method, passing the index label of the row to be deleted.
The corrected code is:
>>> topDf.drop(['Sec D'])
RollNo Name Marks
Sec A 115 Pavni 97.5
Sec B 236 Rishi 98.0
Sec C 307 Preet 98.5
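A sketch of the corrected call is shown below. The values for Sec A to Sec C are taken from the output above; the Sec D row is not shown anywhere on this page, so its values here are made up purely for illustration.

```python
import pandas as pd

# Sec A-C values come from the printed output; the Sec D row is hypothetical.
topDf = pd.DataFrame({"RollNo": [115, 236, 307, 422],
                      "Name": ["Pavni", "Rishi", "Preet", "Paula"],
                      "Marks": [97.5, 98.0, 98.5, 98.0]},
                     index=["Sec A", "Sec B", "Sec C", "Sec D"])

trimmed = topDf.drop(["Sec D"])      # returns a new DataFrame without that row
print(trimmed)
print("Sec D" in topDf.index)        # the original frame is unchanged
```

Because drop() returns a new object by default, the result has to be assigned (or inplace=True passed) if topDf itself should lose the row.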
(i) The line topDf.rename(index=['a', 'b', 'c', 'd']) attempts to rename the index of the DataFrame topDf, but it doesn't assign the modified DataFrame back to topDf or use the inplace = True parameter to modify topDf directly. Additionally, rename() expects a mapping from the current index labels to the new ones (or a function); passing a plain list of new labels results in an error.
The corrected code is:
topDf.rename(index={'Sec A': 'a', 'Sec B': 'b', 'Sec C': 'c', 'Sec D': 'd'}, inplace = True)
(ii) The line topDf.rename(columns={}) attempts to rename columns in the DataFrame topDf, but it provides an empty dictionary {} for renaming, which will not perform any renaming. We need to provide a mapping dictionary with old column names as keys and new column names as values. To modify topDf directly, it should use the inplace = True parameter.
The corrected code is:
topDf.rename(columns={'RollNo': 'NewRollNo', 'Name': 'NewName', 'Marks': 'NewMarks'}, inplace = True)
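Both points, that rename() returns a new object unless told otherwise and that a plain list is not accepted as a mapping, can be seen in a short experiment. The two-row frame below is a hypothetical stand-in for topDf, and the names renamed and list_ok are illustrative.

```python
import pandas as pd

# Hypothetical two-row frame standing in for topDf.
topDf = pd.DataFrame({"RollNo": [115, 236], "Marks": [97.5, 98.0]},
                     index=["Sec A", "Sec B"])

renamed = topDf.rename(index={"Sec A": "a", "Sec B": "b"})
print(list(topDf.index))     # original labels: rename() returned a new object
print(list(renamed.index))

# A plain list is neither a mapping nor a function, so rename() fails on it.
try:
    topDf.rename(index=["a", "b"])
    list_ok = True
except TypeError:
    list_ok = False
print(list_ok)
```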
Write Python code to create a Series object Temp1 that stores temperatures of seven days in it. Take any random seven temperatures.
import pandas as pd
temperatures = [28.0, 30.4, 26.5, 29.4, 27.0, 31.2, 25.8]
Temp1 = pd.Series(temperatures)
print(Temp1)
0 28.0
1 30.4
2 26.5
3 29.4
4 27.0
5 31.2
6 25.8
dtype: float64
Write Python code to create a Series object Temp2 storing temperatures of seven days of week. Its indexes should be 'Sunday', 'Monday',... 'Saturday'.
import pandas as pd
temperatures = [28.9, 30.1, 26.2, 29.3, 27.5, 31.9, 25.5]
days_of_week = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
Temp2 = pd.Series(temperatures, index = days_of_week)
print(Temp2)
Sunday 28.9
Monday 30.1
Tuesday 26.2
Wednesday 29.3
Thursday 27.5
Friday 31.9
Saturday 25.5
dtype: float64
A series object (say T1) stores the average temperature recorded on each day of a month. Write code to display the temperatures recorded on :
(i) first 7 days
(ii) last 7 days.
import pandas as pd
T1 = pd.Series([25.6, 26.3, 27.9, 28.2, 29.1, 30.9, 31.2, 32.4, 33.2, 34.4, 33.3, 32.5, 31.4, 30.7, 29.6, 28.9, 27.0, 26.2, 25.32, 24.34, 23.4, 22.3, 21.6, 20.9, 19.8, 18.1, 17.2, 16.34, 15.5, 14.6])
first_7_days = T1.head(7)
print("Temperatures recorded on the first 7 days:")
print(first_7_days)
last_7_days = T1.tail(7)
print("\nTemperatures recorded on the last 7 days:")
print(last_7_days)
Temperatures recorded on the first 7 days:
0 25.6
1 26.3
2 27.9
3 28.2
4 29.1
5 30.9
6 31.2
dtype: float64
Temperatures recorded on the last 7 days:
23 20.90
24 19.80
25 18.10
26 17.20
27 16.34
28 15.50
29 14.60
dtype: float64
Series objects Temp1, Temp2, Temp3, Temp4 store the temperatures of days of week1, week2, week3, week4 respectively.
Write a script to
(a) print the average temperature per week.
(b) print average temperature of entire month.
import pandas as pd
Temp1 = pd.Series([28.0, 30.2, 26.1, 29.6, 27.7, 31.8, 25.9])
Temp2 = pd.Series([25.5, 24.5, 23.6, 22.7, 21.8, 20.3, 19.2])
Temp3 = pd.Series([32.4, 33.3, 34.1, 33.2, 32.4, 31.6, 30.9])
Temp4 = pd.Series([27.3, 28.1, 29.8, 30.6, 31.7, 32.8, 33.0])
Week_1 = sum(Temp1)
Week_2 = sum(Temp2)
Week_3 = sum(Temp3)
Week_4 = sum(Temp4)
print("Week 1 : Average Temperature is", Week_1 / 7, "degree Celsius")
print("Week 2 : Average Temperature is", Week_2 / 7, "degree Celsius")
print("Week 3 : Average Temperature is", Week_3 / 7, "degree Celsius")
print("Week 4 : Average Temperature is", Week_4 / 7, "degree Celsius")
total = Week_1 + Week_2 + Week_3 + Week_4
print("\nAverage temperature of entire month:", total / 28, "degree Celsius")
Week 1 : Average Temperature is 28.47142857142857 degree Celsius
Week 2 : Average Temperature is 22.514285714285712 degree Celsius
Week 3 : Average Temperature is 32.55714285714286 degree Celsius
Week 4 : Average Temperature is 30.47142857142857 degree Celsius
Average temperature of entire month: 28.503571428571426 degree Celsius
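An alternative sketch uses the mean() method of Series instead of dividing sums by hand, so the code does not depend on hard-coded counts such as 7 and 28. The names weeks and month are illustrative; printed digits may differ in the last decimal places from the output above because of floating-point rounding.

```python
import pandas as pd

Temp1 = pd.Series([28.0, 30.2, 26.1, 29.6, 27.7, 31.8, 25.9])
Temp2 = pd.Series([25.5, 24.5, 23.6, 22.7, 21.8, 20.3, 19.2])
Temp3 = pd.Series([32.4, 33.3, 34.1, 33.2, 32.4, 31.6, 30.9])
Temp4 = pd.Series([27.3, 28.1, 29.8, 30.6, 31.7, 32.8, 33.0])

weeks = [Temp1, Temp2, Temp3, Temp4]
for number, week in enumerate(weeks, start=1):
    print("Week", number, ": Average Temperature is", week.mean(), "degree Celsius")

# Join all four weeks into one 28-element Series and average it.
month = pd.concat(weeks, ignore_index=True)
print("\nAverage temperature of entire month:", month.mean(), "degree Celsius")
```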
Ekam, a Data Analyst with a multinational brand has designed the DataFrame df that contains the four quarters' sales data of different stores as shown below :
| | Store | Qtr1 | Qtr2 | Qtr3 | Qtr4 |
---|---|---|---|---|---|
0 | Store1 | 300 | 240 | 450 | 230 |
1 | Store2 | 350 | 340 | 403 | 210 |
2 | Store3 | 250 | 180 | 145 | 160 |
Answer the following questions :
(i) Predict the output of the following Python statement :
(a) print(df.size)
(b) print(df[1:3])
(ii) Delete the last row from the DataFrame.
(iii) Write Python statement to add a new column Total_Sales which is the addition of all the 4 quarter sales.
(i)
(a) print(df.size)
15
The size attribute of a DataFrame returns the total number of elements in the DataFrame df, that is, rows × columns (3 × 5 = 15).
(b) print(df[1:3])
Store Qtr1 Qtr2 Qtr3 Qtr4
1 Store2 350 340 403 210
2 Store3 250 180 145 160
This statement uses slicing to extract the rows at positions 1 and 2 from the DataFrame df; the stop position 3 is excluded.
(ii)
df = df.drop(2)
Store Qtr1 Qtr2 Qtr3 Qtr4
0 Store1 300 240 450 230
1 Store2 350 340 403 210
(iii)
df['Total_Sales'] = df['Qtr1'] + df['Qtr2'] + df['Qtr3'] + df['Qtr4']
Store Qtr1 Qtr2 Qtr3 Qtr4 Total_Sales
0 Store1 300 240 450 230 1220
1 Store2 350 340 403 210 1303
2 Store3 250 180 145 160 735
Consider the following DataFrame df and answer any four questions from (i)-(v):
rollno | name | UT1 | UT2 | UT3 | UT4 |
---|---|---|---|---|---|
1 | Prerna Singh | 24 | 24 | 20 | 22 |
2 | Manish Arora | 18 | 17 | 19 | 22 |
3 | Tanish Goel | 20 | 22 | 18 | 24 |
4 | Falguni Jain | 22 | 20 | 24 | 20 |
5 | Kanika Bhatnagar | 15 | 20 | 18 | 22 |
6 | Ramandeep Kaur | 20 | 15 | 22 | 24 |
Write down the command that will give the following output :
roll no 6
name Tanish Goel
UT1 24
UT2 24
UT3 24
UT4 24
dtype : object
(a) print(df.max)
(b) print(df.max())
(c) print(df.max(axis = 1))
(d) print(df.max, axis = 1)
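The options can be tried out by rebuilding the table as a DataFrame. The sketch below assumes rollno is an ordinary column; running it suggests that calling the method with parentheses, as in option (b), is what produces column-wise maxima like those listed in the question.

```python
import pandas as pd

df = pd.DataFrame({"rollno": [1, 2, 3, 4, 5, 6],
                   "name": ["Prerna Singh", "Manish Arora", "Tanish Goel",
                            "Falguni Jain", "Kanika Bhatnagar", "Ramandeep Kaur"],
                   "UT1": [24, 18, 20, 22, 15, 20],
                   "UT2": [24, 17, 22, 20, 20, 15],
                   "UT3": [20, 19, 18, 24, 18, 22],
                   "UT4": [22, 22, 24, 20, 22, 24]})

col_max = df.max()            # calling the method: one maximum per column
print(col_max)
print(callable(df.max))       # df.max without () is only the method object
```

For the name column the maximum is the alphabetically last string, which is why a name appears among the numeric maxima.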
Consider the following DataFrame df and answer any four questions from (i)-(v):
rollno | name | UT1 | UT2 | UT3 | UT4 |
---|---|---|---|---|---|
1 | Prerna Singh | 24 | 24 | 20 | 22 |
2 | Manish Arora | 18 | 17 | 19 | 22 |
3 | Tanish Goel | 20 | 22 | 18 | 24 |
4 | Falguni Jain | 22 | 20 | 24 | 20 |
5 | Kanika Bhatnagar | 15 | 20 | 18 | 22 |
6 | Ramandeep Kaur | 20 | 15 | 22 | 24 |
The teacher needs to know the marks scored by the student with roll number 4. Help her identify the correct set of statement/s from the given options:
(a) df1 = df[df['rollno'] == 4]
print(df1)
(b) df1 = df[rollno == 4]
print(df1)
(c) df1 = df.[df.rollno = 4]
print(df1)
(d) df1 = df[df.rollno == 4]
print(df1)
df1 = df[df.rollno == 4]
print(df1)
The statement df1 = df[df.rollno == 4] filters the DataFrame df to include only the rows where the roll number is equal to 4. This is accomplished using boolean indexing: the comparison df.rollno == 4 creates a boolean mask, rows where the mask is True are selected, and the others are excluded. The resulting DataFrame df1 contains only the row corresponding to roll number 4 from the original DataFrame df. Option (a), df[df['rollno'] == 4], selects the same column with bracket notation instead of attribute access and returns the same result, so it is equally valid.
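The boolean mask can be made visible with a small sketch. Only the first four rows and three columns of the table are reproduced, and the names mask and df1_brackets are illustrative.

```python
import pandas as pd

# Reduced copy of the table: first four rows, three columns.
df = pd.DataFrame({"rollno": [1, 2, 3, 4],
                   "name": ["Prerna Singh", "Manish Arora", "Tanish Goel", "Falguni Jain"],
                   "UT1": [24, 18, 20, 22]})

mask = df.rollno == 4                    # boolean Series, True only where rollno is 4
df1 = df[mask]                           # keeps the rows where the mask is True
df1_brackets = df[df["rollno"] == 4]     # bracket notation selects the same rows

print(mask.tolist())
print(df1)
```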
Consider the following DataFrame df and answer any four questions from (i)-(v):
rollno | name | UT1 | UT2 | UT3 | UT4 |
---|---|---|---|---|---|
1 | Prerna Singh | 24 | 24 | 20 | 22 |
2 | Manish Arora | 18 | 17 | 19 | 22 |
3 | Tanish Goel | 20 | 22 | 18 | 24 |
4 | Falguni Jain | 22 | 20 | 24 | 20 |
5 | Kanika Bhatnagar | 15 | 20 | 18 | 22 |
6 | Ramandeep Kaur | 20 | 15 | 22 | 24 |
Which of the following statement/s will give the exact number of values in each column of the dataframe ?
(I) print(df.count())
(II) print(df.count(0))
(III) print(df.count)
(IV) print((df.count(axis = 'index')))
Choose the correct option :
(a) both (I) and (II)
(b) only (II)
(c) (I), (II) and (III)
(d) (I), (II) and (IV)
(I), (II) and (IV)
In pandas, the statements df.count() and df.count(0) calculate the number of non-null values in each column of the DataFrame df. The statement df.count(axis='index') specifies the axis parameter as 'index', which is equivalent to axis=0, so it also counts the non-null values in each column. Statement (III), df.count without parentheses, only refers to the method and does not call it.
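A short sketch of the three working forms follows. The frame is hypothetical and deliberately contains one missing mark, to show that count() reports non-null values rather than the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing mark in UT1.
df = pd.DataFrame({"rollno": [1, 2, 3], "UT1": [24, np.nan, 20]})

counts_default = df.count()               # axis=0 is the default
counts_zero = df.count(0)                 # axis passed positionally
counts_index = df.count(axis="index")     # "index" is an alias for axis=0

print(counts_default)
print(callable(df.count))                 # without (), df.count is only the method
```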
Consider the following DataFrame df and answer any four questions from (i)-(v):
rollno | name | UT1 | UT2 | UT3 | UT4 |
---|---|---|---|---|---|
1 | Prerna Singh | 24 | 24 | 20 | 22 |
2 | Manish Arora | 18 | 17 | 19 | 22 |
3 | Tanish Goel | 20 | 22 | 18 | 24 |
4 | Falguni Jain | 22 | 20 | 24 | 20 |
5 | Kanika Bhatnagar | 15 | 20 | 18 | 22 |
6 | Ramandeep Kaur | 20 | 15 | 22 | 24 |
Which of the following command will display the column labels of the DataFrame ?
(a) print(df.columns())
(b) print(df.column())
(c) print(df.column)
(d) print(df.columns)
Consider the following DataFrame df and answer any four questions from (i)-(v):
rollno | name | UT1 | UT2 | UT3 | UT4 |
---|---|---|---|---|---|
1 | Prerna Singh | 24 | 24 | 20 | 22 |
2 | Manish Arora | 18 | 17 | 19 | 22 |
3 | Tanish Goel | 20 | 22 | 18 | 24 |
4 | Falguni Jain | 22 | 20 | 24 | 20 |
5 | Kanika Bhatnagar | 15 | 20 | 18 | 22 |
6 | Ramandeep Kaur | 20 | 15 | 22 | 24 |
Ms. Sharma, the class teacher wants to add a new column, the scores of Grade with the values, 'A', 'B', 'A', 'A', 'B', 'A' , to the DataFrame.
Help her choose the command to do so :
(a) df.column = ['A', 'B', 'A', 'A', 'B', 'A']
(b) df['Grade'] = ['A', 'B', 'A', 'A', 'B', 'A']
(c) df.loc['Grade'] = ['A', 'B', 'A', 'A', 'B', 'A']
(d) Both (b) and (c) are correct
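The difference between options (b) and (c) can be tried on a small frame. The two-row, two-column frame below is hypothetical; the sketch suggests that assigning through df['Grade'] adds a column, whereas assigning through df.loc['Grade'] adds a row labelled 'Grade', so only the former does what Ms. Sharma needs.

```python
import pandas as pd

# Hypothetical two-row, two-column frame.
df = pd.DataFrame({"rollno": [1, 2], "UT1": [24, 18]})

with_column = df.copy()
with_column["Grade"] = ["A", "B"]     # new column: one value per existing row

with_row = df.copy()
with_row.loc["Grade"] = ["A", "B"]    # new row labelled "Grade": one value per column

print(with_column.shape, with_row.shape)
```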
Write a program that stores the sales of 5 fast moving items of a store for each month in 12 Series objects, i.e., S1 Series object stores sales of these 5 items in 1st month, S2 stores sales of these 5 items in 2nd month, and so on.
The program should display the summary sales report like this :
Total Yearly Sales, item-wise (should display sum of items' sales over the months)
Maximum sales of item made : <name of item that was sold the maximum in whole year>
Maximum sales for individual items
Maximum sales of item 1 made : <month in which that item sold the maximum>
Maximum sales of item 2 made : <month in which that item sold the maximum>
Maximum sales of item 3 made : <month in which that item sold the maximum>
Maximum sales of item 4 made : <month in which that item sold the maximum>
Maximum sales of item 5 made : <month in which that item sold the maximum>
import pandas as pd
sales_data = {
'Month_1': pd.Series([300, 250, 200, 150, 350], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
'Month_2': pd.Series([380, 210, 220, 180, 320], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
'Month_3': pd.Series([320, 270, 230, 200, 380], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
'Month_4': pd.Series([310, 260, 210, 190, 360], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
'Month_5': pd.Series([290, 240, 220, 170, 340], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
'Month_6': pd.Series([300, 250, 400, 160, 350], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
'Month_7': pd.Series([310, 260, 230, 180, 370], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
'Month_8': pd.Series([320, 270, 240, 190, 380], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
'Month_9': pd.Series([330, 280, 250, 200, 400], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
'Month_10': pd.Series([340, 290, 260, 510, 420], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
'Month_11': pd.Series([350, 300, 270, 220, 440], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
'Month_12': pd.Series([360, 390, 280, 230, 260], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5'])
}
sales_df = pd.DataFrame(sales_data)
print("Total Yearly Sales, item-wise:")
total_sales = sales_df.sum(axis=1)   # sum across the 12 months for each item
print(total_sales)
max_sales_item = total_sales.idxmax()   # item label with the largest yearly total
print("\nMaximum sales of item made: ", max_sales_item)
print("\nMaximum sales for individual items:")
for item_num in range(1, 6):
    # idxmax() on an item's row returns the month label holding its largest value
    max_sales_month = sales_df.loc[f'Item_{item_num}'].idxmax()
    print("Maximum sales of item", item_num, "made: ", max_sales_month)
Total Yearly Sales, item-wise:
Item_1 3910
Item_2 3270
Item_3 3010
Item_4 2580
Item_5 4370
dtype: int64
Maximum sales of item made: Item_5
Maximum sales for individual items:
Maximum sales of item 1 made: Month_2
Maximum sales of item 2 made: Month_12
Maximum sales of item 3 made: Month_6
Maximum sales of item 4 made: Month_10
Maximum sales of item 5 made: Month_11
Three Series objects store the marks of 10 students in three terms. Roll numbers of students form the index of these Series objects. The Three Series objects have the same indexes.
Calculate the total weighted marks obtained by students as per following formula :
Final marks = 25% Term 1 + 25% Term 2 + 50% Term 3
Store the Final marks of students in another Series object.
import pandas as pd
term1 = pd.Series([80, 70, 90, 85, 75, 95, 80, 70, 85, 90], index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
term2 = pd.Series([85, 90, 75, 80, 95, 85, 90, 75, 80, 85], index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
term3 = pd.Series([90, 85, 95, 90, 80, 85, 95, 90, 85, 90], index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
final_marks = (term1 * 0.25) + (term2 * 0.25) + (term3 * 0.50)
print(final_marks)
1 86.25
2 82.50
3 88.75
4 86.25
5 82.50
6 87.50
7 90.00
8 81.25
9 83.75
10 88.75
dtype: float64
Write code to print all the information about a Series object.
import pandas as pd
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
print(s)
s.info()
a 1
b 2
c 3
d 4
dtype: int64
<class 'pandas.core.series.Series'>
Index: 4 entries, a to d
Series name: None
Non-Null Count Dtype
-------------- -----
4 non-null int64
dtypes: int64(1)
memory usage: 64.0+ bytes
Write a program to create three different Series objects from the three columns of a DataFrame df.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
s1 = df['A']
s2 = df['B']
s3 = df['C']
print(s1)
print(s2)
print(s3)
0 1
1 2
2 3
Name: A, dtype: int64
0 4
1 5
2 6
Name: B, dtype: int64
0 7
1 8
2 9
Name: C, dtype: int64
Write a program to create three different Series objects from the three rows of a DataFrame df.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
s1 = df.iloc[0]
s2 = df.iloc[1]
s3 = df.iloc[2]
print(s1)
print(s2)
print(s3)
A 1
B 4
C 7
Name: 0, dtype: int64
A 2
B 5
C 8
Name: 1, dtype: int64
A 3
B 6
C 9
Name: 2, dtype: int64
Write a program to create a Series object from an ndarray that stores characters from 'a' to 'g'.
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
S = pd.Series(data)
print(S)
0 a
1 b
2 c
3 d
4 e
5 f
6 g
dtype: object
Write a program to create a Dataframe that stores two columns, which store the Series objects of the previous two questions (12 and 13).
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
S1 = pd.Series(data)
arr = np.arange(1, 11)
S2 = pd.Series(arr * 5)
df = pd.DataFrame({'Characters': S1, 'Table of 5': S2})
print(df)
Characters Table of 5
0 a 5
1 b 10
2 c 15
3 d 20
4 e 25
5 f 30
6 g 35
7 NaN 40
8 NaN 45
9 NaN 50
Write a program to create a Dataframe storing salesmen details (name, zone, sales) of five salesmen.
import pandas as pd
salesmen = {'Name': ['Jahangir', 'Janavi', 'Manik', 'Lakshmi', 'Tanisha'], 'Zone': ['North', 'South', 'East', 'West', 'Central'], 'Sales': [5000, 7000, 3000, 8000, 6000]}
df = pd.DataFrame(salesmen)
print(df)
Name Zone Sales
0 Jahangir North 5000
1 Janavi South 7000
2 Manik East 3000
3 Lakshmi West 8000
4 Tanisha Central 6000
Four dictionaries store the details of four employees-of-the-month as (empno, name). Write a program to create a dataframe from these.
import pandas as pd
emp1 = {'empno': 1001, 'name': 'Ameesha'}
emp2 = {'empno': 1002, 'name': 'Akruti'}
emp3 = {'empno': 1003, 'name': 'Prithvi'}
emp4 = {'empno': 1004, 'name': 'Rajesh'}
employees = [emp1, emp2, emp3, emp4]
df = pd.DataFrame(employees)
print(df)
empno name
0 1001 Ameesha
1 1002 Akruti
2 1003 Prithvi
3 1004 Rajesh
A list stores three dictionaries each storing details, (old price, new price, change). Write a program to create a dataframe from it.
import pandas as pd
prices = [{'old_price': 10, 'new_price': 12, 'change': 2},
{'old_price': 20, 'new_price': 18, 'change': -2},
{'old_price': 30, 'new_price': 35, 'change': 5}]
df = pd.DataFrame(prices)
print(df)
old_price new_price change
0 10 12 2
1 20 18 -2
2 30 35 5