Solutions for Informatics Practices, Class 12, CBSE
Assertion. Python's Pandas library offers functionality to interact with a CSV file.
Reason. Pandas' read_csv() and to_csv() functions can read from and write to CSV files.
Both A and R are true and R is the correct explanation of A.
Explanation
Python's Pandas library offers functionality to interact with a CSV file. It provides two functions, read_csv() and to_csv(), which are used for reading data from a CSV file into a DataFrame and writing data from a DataFrame to a CSV file, respectively.
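This round trip can be sketched as follows (the data is made up for illustration, and an in-memory buffer stands in for a file on disk):

```python
import io
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"Item": ["Pen", "Pencil"], "Price": [10, 5]})

buf = io.StringIO()                 # stands in for a CSV file on disk
df.to_csv(buf, index=False)         # write the DataFrame out as CSV text
buf.seek(0)
df2 = pd.read_csv(buf)              # read the CSV text back into a DataFrame
print(df2.equals(df))               # True
```

With a real file, the buffer would simply be replaced by a file path such as `df.to_csv("data.csv")` and `pd.read_csv("data.csv")`.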
Assertion. The read_csv() function of Python Pandas can read data of a csv file into any of pandas data structures.
Reason. DataFrame is a compatible data structure for read_csv() function.
A is false but R is true.
Explanation
The read_csv() function of Python Pandas can only read data from a CSV file into the DataFrame data structure of Pandas. DataFrame is a compatible data structure for the read_csv() function, as it is the primary data structure designed to be created by read_csv().
Assertion. The read_csv() function reads a csv file's data into a DataFrame.
Reason. The to_csv() function writes a DataFrame on to a csv file.
Both A and R are true but R is not the correct explanation of A.
Explanation
The read_csv() function in Pandas is used to read data from a CSV file into a DataFrame. Conversely, the to_csv() function is used to write DataFrame data to a CSV file.
Assertion. The read_sql() function of Pandas can query upon any mysql database.
Reason. The read_sql() function can query upon only those databases that have a connection established through mysql database connector.
A is false but R is true.
Explanation
While the read_sql() function is capable of querying data from MySQL databases, it is limited to databases for which a connection has been established through a MySQL database connector. Without this connection, the read_sql() function cannot query data from the MySQL database.
Assertion. A DataFrame's data can be exported as a table of a mysql database.
Reason. Over an established connection to a mysql database, <DF>.to_sql() would write the data of the DataFrame <DF> as a table in the mysql database.
Both A and R are true and R is the correct explanation of A.
Explanation
In Pandas, a DataFrame's data can be exported as a table of a MySQL database. Over an established connection to a MySQL database, the DF.to_sql() method is used to write the data of the DataFrame DF as a table in the MySQL database.
no extra package required
Reason — Python's Pandas library offers two functions, read_csv() and to_csv(), that help bring data from a CSV file into a DataFrame and write a DataFrame's data to a CSV file. Therefore, we do not need to import any additional packages or modules other than pandas.
<DF> = pandas.read_csv(<filepath>)
Reason — The statement to read data from a CSV file into a DataFrame is <DF> = pandas.read_csv(<filepath>).
header = None
Reason — The header = None argument is to be given in read_csv() to suppress the first row being taken as the header.
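A minimal illustration of this behaviour (the CSV content is hypothetical, read from an in-memory buffer):

```python
import io
import pandas as pd

csv_text = "10,20\n30,40\n"                          # a file with no header row
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(df.columns))   # default integer column names: [0, 1]
print(df.iloc[0, 0])      # 10 -- the first row is kept as data, not as headers
```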
nrows = <n>
Reason — The argument nrows = <n> is to be given in read_csv() to read the specified number of rows from the CSV file.
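For example (with hypothetical CSV content in an in-memory buffer):

```python
import io
import pandas as pd

csv_text = "A,B\n1,2\n3,4\n5,6\n7,8\n"               # 1 header line + 4 data rows
df = pd.read_csv(io.StringIO(csv_text), nrows=2)     # read only the first 2 data rows
print(len(df))   # 2
```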
skiprows = 5
Reason — The skiprows = 5 argument in the read_csv() function is used to skip the first 5 rows of the CSV file and start reading from the 6th row onwards.
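A small sketch of this (hypothetical content; header=None is added so the skipped lines are easy to count as physical lines of the file):

```python
import io
import pandas as pd

# 1 header line followed by 6 data lines (hypothetical content).
csv_text = "A,B\n1,2\n3,4\n5,6\n7,8\n9,10\n11,12\n"

# skiprows=5 skips the first 5 physical lines, so reading starts at line 6.
df = pd.read_csv(io.StringIO(csv_text), skiprows=5, header=None)
print(df.values.tolist())   # [[9, 10], [11, 12]]
```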
skiprows = [1, 3, 5]
Reason — The argument to skip the 1st, 3rd and 5th rows of a CSV file in read_csv() is skiprows = [1, 3, 5].
index_col
Reason — The argument given in read_csv() to use a column's values as index labels is index_col. By specifying index_col with the name of the desired column, pandas will use the values from that column as the index labels for the DataFrame.
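For instance (column names and data are hypothetical):

```python
import io
import pandas as pd

csv_text = "Name,Marks\nAmit,90\nRiya,85\n"          # hypothetical data
df = pd.read_csv(io.StringIO(csv_text), index_col="Name")
print(list(df.index))            # ['Amit', 'Riya']
print(df.loc["Riya", "Marks"])   # 85 -- rows can now be looked up by name
```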
na_rep = NULL
Reason — The na_rep argument in the to_csv() method is used to specify the representation of NaN values in the resulting CSV file. Using na_rep = NULL means that NaN values will be represented as "NULL" in the CSV file.
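A quick demonstration (hypothetical data containing NaN, written to an in-memory buffer instead of a file):

```python
import io
import pandas as pd

df = pd.DataFrame({"A": [1.0, None], "B": [None, 4.0]})  # data with NaN cells
buf = io.StringIO()
df.to_csv(buf, index=False, na_rep="NULL")   # NaN cells are written as the text NULL
print(buf.getvalue())
```

Without na_rep, the NaN cells would simply be left empty in the CSV output.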
mysql.connector, PyMySQL, SQLAlchemy
Reason — The libraries mysql.connector, PyMySQL, and SQLAlchemy all enable us to establish connections with MySQL databases from within Python.
index = False
Reason — The index = False argument in to_sql() is used to suppress the creation of a column for the index labels of a DataFrame. This means that the index labels will not be included as a separate column in the database table.
if_exists = "append"
Reason — The if_exists = "append" argument in to_sql() is used to append the content of a DataFrame to an existing table in MySQL. This ensures that the data from the DataFrame is added to the specified table without overwriting or deleting any existing data.
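A runnable sketch of this (an in-memory SQLite database stands in for MySQL here, since to_sql() behaves the same way over any supported connection; the table and data are hypothetical):

```python
import sqlite3
import pandas as pd

# SQLite stands in for a MySQL connection for illustration.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"Brand": ["Alto"], "Price": [22000]}).to_sql("CARS", conn, index=False)

# Append more rows without touching the existing ones.
pd.DataFrame({"Brand": ["Zen"], "Price": [25000]}).to_sql(
    "CARS", conn, if_exists="append", index=False)

n = pd.read_sql("SELECT COUNT(*) AS n FROM CARS", conn)["n"][0]
print(n)   # 2 -- both the original and the appended row are present
conn.close()
```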
True
Reason — When we create a DataFrame from a CSV file using pandas, by default, the number of rows in the DataFrame will be equal to the number of rows in the CSV file.
False
Reason — Python's Pandas library provides two functions, read_csv() and to_csv(), which are used for reading data from a CSV file into a DataFrame and writing data from a DataFrame to a CSV file, respectively.
False
Reason — In Python, we don't need to import the CSV package specifically to store a DataFrame in a CSV file when using the Pandas library. The DataFrame.to_csv() method is provided by Pandas itself and handles the conversion of DataFrame data to a CSV file format.
True
Reason — In order to read data from a MySQL table into a DataFrame using pandas, the table must exist in the MySQL database. If the table doesn't exist, the read operation will fail because there would be no data to retrieve.
The advantages of CSV file formats are as follows:
It is a simple, compact and ubiquitous format for data storage.
It is a common format for data interchange.
It can be opened in popular spreadsheet packages like MS-Excel, Calc etc.
Nearly all spreadsheets and databases support import/export to csv format.
To read data from a CSV file into a DataFrame while providing our own column names, we can use the names argument in the read_csv() function. The syntax is: <DF> = pandas.read_csv(<filepath>, names = <sequence containing column names>).
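For example (the file content and column names below are hypothetical, read from an in-memory buffer):

```python
import io
import pandas as pd

csv_text = "1,ABC\n2,DEF\n"   # a file without a header row
df = pd.read_csv(io.StringIO(csv_text), names=["Rollno", "Name"])
print(list(df.columns))       # ['Rollno', 'Name'] -- our own column names
```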
To ensure that the top/first row's data is used as data and not as column headers in a DataFrame when using the read_csv() function, we need to use the header argument and set it to None. The syntax is: <DF> = pandas.read_csv(<filepath>, header = None).
The nrows argument can be used to read only the top 10 rows of data from a CSV file using the read_csv() function in pandas. The nrows argument specifies the number of rows of the file to read. The syntax is: df = pandas.read_csv(<filepath>, nrows = 10).
We need a connection to an SQL database in order to get data from a table because a database can be accessed by many programs simultaneously, and a connection represents a unique session with the database. This connection allows us to send SQL statements to the database and retrieve the results.
The if_exists argument in the pandas to_sql() function retains old data in a MySQL table. Using if_exists = 'append' appends the new data without deleting the existing data.
Let query
store the following SQL statement:
query = "SELECT * FROM EMPLOYEE WHERE department = 'Human Resource'"
import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host = "localhost",
user = "root",
passwd = "MyPass",
database = "Mydata")
query = "SELECT * FROM EMPLOYEE WHERE department = 'Human Resource'"
mdf = pd.read_sql(query, mycon)
print(mdf)
mycon.close()
Predict the output of the following code fragments one by one. For every next code fragment, consider that the changes made by the previous code fragment are in place. That is, for code fragment (b), changes made by code fragment (a) are persisting; for (c), changes by (a) and (b) are persisting, and so on.
(a)
import pandas as pd
columns=['2015', '2016', '2017', '2018']
index=['Messi', 'Ronaldo', 'Neymar', 'Hazard']
df = pd.DataFrame(columns = columns, index = index)
print(df)
df.to_csv("c:\one.csv")
(b)
df['2015']['Messi'] = 12
df['2016']['Ronaldo'] = 11
df['2017']['Neymar'] = 8
df['2018']['Hazard'] = 16
print(df)
df.to_csv("c:\\two.csv", sep = '@')
(c)
new_df = pd.read_csv('c:\one.csv', index_col = 0)
print(new_df)
(d)
new_df = pd.read_csv('c:\one.csv')
print(new_df)
(e)
new_df = pd.read_csv('c:\\two.csv')
print(new_df)
(f)
new_df = pd.read_csv('c:\\two.csv', sep = '@')
print(new_df)
(a)
2015 2016 2017 2018
Messi NaN NaN NaN NaN
Ronaldo NaN NaN NaN NaN
Neymar NaN NaN NaN NaN
Hazard NaN NaN NaN NaN
The DataFrame df is created with the defined columns and index labels. It is then printed, showing NaN values since no data is provided. Afterwards, df is written to a CSV file named one.csv.
The contents of the file one.csv are as follows:
,2015,2016,2017,2018
Messi,,,,
Ronaldo,,,,
Neymar,,,,
Hazard,,,,
(b)
2015 2016 2017 2018
Messi 12 NaN NaN NaN
Ronaldo NaN 11 NaN NaN
Neymar NaN NaN 8 NaN
Hazard NaN NaN NaN 16
The code assigns values to specific cells of the DataFrame df, such as setting the value of 'Messi' in the '2015' column to 12, and so on. After printing the modified DataFrame, it is saved to a CSV file named two.csv using '@' as the separator.
The contents of the file two.csv are as follows:
@2015@2016@2017@2018
Messi@12@@@
Ronaldo@@11@@
Neymar@@@8@
Hazard@@@@16
(c)
2015 2016 2017 2018
Messi NaN NaN NaN NaN
Ronaldo NaN NaN NaN NaN
Neymar NaN NaN NaN NaN
Hazard NaN NaN NaN NaN
The code reads the CSV file named one.csv into a new DataFrame new_df. The parameter index_col = 0 means the first column of the CSV file will be used as the index. It then prints the contents of new_df. Since the CSV file contains no data values, the new_df DataFrame contains only NaN values as well.
The contents of the file one.csv are as follows:
,2015,2016,2017,2018
Messi,,,,
Ronaldo,,,,
Neymar,,,,
Hazard,,,,
(d)
Unnamed: 0 2015 2016 2017 2018
0 Messi NaN NaN NaN NaN
1 Ronaldo NaN NaN NaN NaN
2 Neymar NaN NaN NaN NaN
3 Hazard NaN NaN NaN NaN
The code uses the pandas read_csv() function to extract data from the one.csv file. The retrieved data is stored in a DataFrame called new_df and then printed. Since the index_col parameter is not specified, the DataFrame uses default numerical indexes. The DataFrame displays "Unnamed: 0" as the header for the first column because that column's header field is empty in the CSV file.
The contents of the file one.csv are as follows:
,2015,2016,2017,2018
Messi,,,,
Ronaldo,,,,
Neymar,,,,
Hazard,,,,
(e)
@2015@2016@2017@2018
0 Messi@12@@@
1 Ronaldo@@11@@
2 Neymar@@@8@
3 Hazard@@@@16
The code uses the pandas read_csv() function to extract data from the two.csv file. The retrieved data is stored in a DataFrame called new_df and printed. However, since the actual separator in the two.csv file is "@" rather than the default comma (,), the code, which does not specify the sep parameter, cannot parse the data correctly. Pandas assumes the data is comma-separated, so each line is read as a single column and the correct structure of the data is not recovered.
The contents of the file two.csv are as follows:
@2015@2016@2017@2018
Messi@12@@@
Ronaldo@@11@@
Neymar@@@8@
Hazard@@@@16
(f)
Unnamed: 0 2015 2016 2017 2018
0 Messi 12.0 NaN NaN NaN
1 Ronaldo NaN 11.0 NaN NaN
2 Neymar NaN NaN 8.0 NaN
3 Hazard NaN NaN NaN 16.0
The code extracts data from the two.csv file using the read_csv() function and assigns it to the new_df DataFrame. It then prints new_df. The sep parameter ensures that the DataFrame is parsed correctly, and default indexes are used since index_col is not specified. The first column is labelled 'Unnamed: 0' because its header field is empty in the CSV file.
The contents of the file two.csv are as follows:
@2015@2016@2017@2018
Messi@12@@@
Ronaldo@@11@@
Neymar@@@8@
Hazard@@@@16
Yes, the two statements are the same. The reason is that when we don't explicitly specify the sep parameter in pd.read_csv(), pandas assumes the default separator to be a comma (,). So, both statements tell pandas to read the CSV file "zoo.csv" with comma-separated values.
The two codes are similar in that they both use pd.read_csv() to read a CSV file named 'data.csv' into a pandas DataFrame df. However, they differ in their usage of the nrows parameter. The first code uses the nrows parameter with a value of 5, indicating that it reads only the first 5 rows of the CSV file. On the other hand, the second code does not have the nrows parameter, so it reads the entire CSV file.
For code (i), the output will be a DataFrame containing the first 5 rows of the 'data.csv' file. For code (ii), the output will be a DataFrame containing all the rows of the 'data.csv' file.
The difference between the two statements is whether the DataFrame's index is included as a separate column in the resulting SQL table. By default, when we use to_sql() without specifying the index parameter, index = True is assumed, meaning that the DataFrame's index will be included in the SQL table, as in the first statement. Setting index = False explicitly excludes the DataFrame's index from being saved as a separate column in the SQL table, as in the second statement.
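The difference can be observed directly (an in-memory SQLite database stands in for MySQL here, and the table names are made up for illustration):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")   # SQLite stands in for MySQL here
df = pd.DataFrame({"x": [10, 20]})

df.to_sql("with_idx", conn)               # index=True by default -> extra 'index' column
df.to_sql("no_idx", conn, index=False)    # index labels suppressed

cols_with = [c[1] for c in conn.execute("PRAGMA table_info(with_idx)")]
cols_without = [c[1] for c in conn.execute("PRAGMA table_info(no_idx)")]
print(cols_with)      # ['index', 'x']
print(cols_without)   # ['x']
conn.close()
```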
Consider following code when conn is the name of established connection to MySQL database.
Cars = {'Brand': ['Alto', 'Zen', 'City', 'Kia'],
'Price': [22000, 25000, 27000, 35000]}
df = pd.DataFrame(Cars, columns= ['Brand', 'Price'])
df.to_sql('CARS', conn, if_exists = 'replace', index = False)
What will be the output of following query if executed on MySQL ?
SELECT * from CARS ;
+-------+-------+
| Brand | Price |
+-------+-------+
| Alto | 22000 |
| Zen | 25000 |
| City | 27000 |
| Kia | 35000 |
+-------+-------+
The code initializes a DataFrame df and writes its data to an SQL database table named 'CARS' using to_sql(). The SQL query SELECT * FROM CARS; retrieves all columns and rows from the 'CARS' table.
The DataFrame df includes all the rows (with all columns) where 'zone' equals 'central'.
The code executes an SQL query to select all columns (*) from the "Sales" table where the "zone" column equals "central". It then reads the results of the query into a pandas DataFrame df using pandas.read_sql(). Then, it returns the first five rows of df using df.head().
Write a program to read details such as Item, Sales made in a DataFrame and then store this data in a CSV file.
import pandas as pd
data = {'Item': ['Apple', 'Banana', 'Orange', 'Grapes'],
'Sales': [100, 150, 80, 120]}
df = pd.DataFrame(data)
df.to_csv('one.csv', index = False)
Item,Sales
Apple,100
Banana,150
Orange,80
Grapes,120
Let the contents of the file bike.csv
be the following:
Brand@Price
Honda@2500
Yamaha@2800
Suzuki@2100
Kawasaki@3200
Ducati@3500
BMW@4500
Harley-Davidson@5500
KTM@4000
Triumph@5300
Aprilia@4800
Indian@5700
Royal Enfield@3000
MV Agusta@4200
Moto Guzzi@4900
Victory@5600
Benelli@3900
Husqvarna@4800
Zero Motorcycles@6500
Energica@7500
Norton@5900
The program to read data from a CSV file is as follows:
import pandas as pd
d = pd.read_csv('bike.csv', sep = '@', header = None, nrows = 20)
print(d)
0 1
0 Brand Price
1 Honda 2500
2 Yamaha 2800
3 Suzuki 2100
4 Kawasaki 3200
5 Ducati 3500
6 BMW 4500
7 Harley-Davidson 5500
8 KTM 4000
9 Triumph 5300
10 Aprilia 4800
11 Indian 5700
12 Royal Enfield 3000
13 MV Agusta 4200
14 Moto Guzzi 4900
15 Victory 5600
16 Benelli 3900
17 Husqvarna 4800
18 Zero Motorcycles 6500
19 Energica 7500
Write a program to get following data in two DataFrames :
df1:

| Roll no | Name |
|---|---|
| 1 | ABC |
| 2 | DEF |

df2:

| Roll no | Marks 1 | Marks 2 | Marks 3 |
|---|---|---|---|
| 1 | 70 | 80 | 75 |
| 2 | 60 | 65 | 70 |
Store these DataFrames as two separate tables in the same database.
import pandas as pd
from sqlalchemy import create_engine
import pymysql
data1 = {'Roll no': [1, 2], 'Name': ['ABC', 'DEF']}
data2 = {'Roll no': [1, 2], 'Marks1' : [70, 60], 'Marks2': [80, 65],'Marks3' : [75, 70]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
engine = create_engine('mysql+pymysql://root:Mypass@localhost/database')
mycon = engine.connect()
df1.to_sql('first', mycon, if_exists = 'replace', index = False)
df2.to_sql('second', mycon, if_exists = 'replace', index = False)
First table:
+---------+------+
| Roll no | Name |
+---------+------+
| 1 | ABC |
| 2 | DEF |
+---------+------+
Second table:
+---------+--------+--------+--------+
| Roll no | Marks1 | Marks2 | Marks3 |
+---------+--------+--------+--------+
| 1 | 70 | 80 | 75 |
| 2 | 60 | 65 | 70 |
+---------+--------+--------+--------+
You have a database on MySQL namely school having three tables in it — Student, Subject, Teacher. Write a program to store these tables in three DataFrames.
import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host = "localhost",
user = "root",
passwd = "mypass",
database = "school")
mdf = pd.read_sql("SELECT * FROM student;", mycon)
ndf = pd.read_sql("SELECT * FROM subject;", mycon)
ldf = pd.read_sql("SELECT * FROM teacher;", mycon)
print("Student dataframe:\n")
print(mdf)
print("Subject dataframe:\n")
print(ndf)
print("Teacher dataframe:\n")
print(ldf)
Student dataframe:
NO NAME AGE DEPARTMENT FEE SEX
0 1 PANKAJ 24 COMPUTER 120 M
1 2 SHALINI 21 HISTORY 200 F
2 3 SANJAY 22 HINDI 300 M
3 4 SUDHA 25 HISTORY 400 F
4 5 ANANYA 23 COMPUTER 450 F
Subject dataframe:
Subject_id name Teacher_id
0 101 math 201
1 102 science 202
2 103 english 203
3 104 hindi 204
Teacher dataframe:
No Name Age Department Salary Sex Dateofjoin
0 1 Jugal 34 Computer 12000 M 1997-01-10
1 2 Sharmila 31 History 20000 F 1998-03-24
2 3 Sandeep 32 Maths 30000 M 1996-12-12
3 4 Sangeeta 35 History 40000 F 1999-07-01
4 5 Rakesh 42 Maths 25000 M 1997-09-05
5 6 Shyam 50 History 30000 M 1998-06-27
6 7 Shiv Om 44 Computer 21000 M 1997-02-25
7 8 Shalakha 33 Maths 20000 F 1997-07-31
The DataFrame SDF stores the sales records of 100 salesmen. Write a program to store this as a table in a database named "company" on MySQL.
import pandas as pd
from sqlalchemy import create_engine
import pymysql
def sales(sdf):
engine = create_engine('mysql+pymysql://root:Mypass@localhost/company')
mycon = engine.connect()
sdf.to_sql('Records', mycon, if_exists = 'replace', index = False)
Consider the SDF DataFrame storing the sales records of 100 salesmen. Write a program that stores only the first 25 rows of the DataFrame in a table on MySQL database.
import pandas as pd
from sqlalchemy import create_engine
import pymysql
def sales(sdf):
engine = create_engine('mysql+pymysql://root:Mypass@localhost/company')
mycon = engine.connect()
first_25_rows = sdf.head(25)
first_25_rows.to_sql('Records', mycon, if_exists = 'replace', index = False)
The sales table of the company database of MySQL stores the sales records of 100 salesmen. Write a program to load into a DataFrame only those records which have sales of more than ₹50000.
import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host = "localhost", user = "root", passwd = "mypass", database = "company")
df = pd.read_sql("SELECT * FROM sales WHERE Sales > 50000;", mycon)
print(df)