Chapter 11 Flashcards
Quiz-08 - Data Wrangling using pandas advanced
Consider the following pandas DataFrame:
student_data = pd.DataFrame({
    'school_code': ['s001', 's002', 's003', 's001', 's002', 's004'],
    'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],
    'name': ['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],
    'date_Of_Birth ': ['15/05/2002', '17/05/2002', '16/02/1999', '25/09/1998', '11/05/2002', '15/09/1997'],
    'age': [12, 12, 13, 13, 14, 12],
    'height': [173, 192, 186, 167, 151, 159],
    'weight': [35, 32, 33, 30, 31, 32],
    'address': ['street1', 'street2', 'street3', 'street1', 'street2', 'street4']},
    index=['S1', 'S2', 'S3', 'S4', 'S5', 'S6'])
Fill in the blank to convert the age column to a floating-point data type.
student_data['age'] = student_data['age'].______(float)
Answers:
astype
apply
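Both accepted answers can be checked with a short sketch. It uses a reduced copy of the DataFrame that keeps only the age column, and the names as_float and applied are illustrative, not part of the quiz.

```python
import pandas as pd

# Reduced version of the quiz DataFrame: only the column being converted.
student_data = pd.DataFrame(
    {"age": [12, 12, 13, 13, 14, 12]},
    index=["S1", "S2", "S3", "S4", "S5", "S6"],
)

# astype casts the whole column to float in one step.
as_float = student_data["age"].astype(float)

# apply calls float() on each element and arrives at the same values.
applied = student_data["age"].apply(float)

print(as_float.dtype)
print(applied.dtype)
```

astype is usually the more direct choice for a plain type conversion, since apply runs a Python function once per element.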
Consider the following pandas DataFrame:
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["apple", "banana", "cherry", "date"]})
Which command returns the third row of the dataframe? Select all correct answers.
Answers:
df.iloc[2]
df.loc[2]
(Both return the third row here because the default index labels are 0, 1, 2, 3.)
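A small sketch of why both selections match on this DataFrame, and where they would differ. The relabeled copy and its labels "w" to "z" are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["apple", "banana", "cherry", "date"]})

by_position = df.iloc[2]  # third row, selected by integer position
by_label = df.loc[2]      # row whose index label is 2

print(by_position.equals(by_label))

# Hypothetical copy with string labels: position 2 still exists,
# but there is no longer a row labeled 2, so relabeled.loc[2] would raise KeyError.
relabeled = df.set_index(pd.Index(["w", "x", "y", "z"]))
third = relabeled.iloc[2]
print(third["B"])
```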
Consider the following pandas DataFrame:
student_data = pd.DataFrame({
    'school_code': ['s001', 's002', 's003', 's001', 's002', 's004'],
    'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],
    'name': ['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],
    'date_Of_Birth ': ['15/05/2002', '17/05/2002', '16/02/1999', '25/09/1998', '11/05/2002', '15/09/1997'],
    'age': [12, 12, 13, 13, 14, 12],
    'height': [173, 192, 186, 167, 151, 159],
    'weight': [35, 32, 33, 30, 31, 32],
    'address': ['street1', 'street2', 'street3', 'street1', 'street2', 'street4']},
    index=['S1', 'S2', 'S3', 'S4', 'S5', 'S6'])
Fill in the blanks to get the mean, min, and max value of age for each school.
grouped_single = student_data.groupby('school_code').______({______: ['mean', 'min', 'max']})
Answers:
agg
"age"
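The completed statement can be tried on a reduced copy of the DataFrame that keeps only the two columns involved. This is a minimal sketch, and the lookup at the end is added only to show how the result is laid out.

```python
import pandas as pd

# Reduced version of the quiz DataFrame: grouping key and aggregated column only.
student_data = pd.DataFrame(
    {
        "school_code": ["s001", "s002", "s003", "s001", "s002", "s004"],
        "age": [12, 12, 13, 13, 14, 12],
    },
    index=["S1", "S2", "S3", "S4", "S5", "S6"],
)

# One row per school; columns form a two-level index: ("age", "mean"), ("age", "min"), ("age", "max").
grouped_single = student_data.groupby("school_code").agg({"age": ["mean", "min", "max"]})
print(grouped_single)

# s001 has ages 12 and 13, so its mean is 12.5.
print(grouped_single.loc["s001", ("age", "mean")])
```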
Consider the following pandas DataFrame:
import numpy as np
df = pd.DataFrame({"A": [1, 2, 1, 4, 2], "B": [1, 3, np.nan, 1, 4],
                   "C": [2, 1, 1, 2, 3], "D": [10, 20, 15, 25, 30]})
Note that np.nan represents a missing value (Not a Number) in the column. The column “B” has one such missing value.
When you apply the following command, what will be the output?
df2 = df.groupby("A", axis=0)["D"].mean()
print(df2.iloc[0])
Answer:
12.5
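The answer can be traced with the sketch below. It assumes the default axis=0 and therefore leaves the keyword out, since recent pandas releases deprecate passing it to groupby explicitly. The missing value in "B" plays no part, because only column "D" is averaged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 4, 2], "B": [1, 3, np.nan, 1, 4],
                   "C": [2, 1, 1, 2, 3], "D": [10, 20, 15, 25, 30]})

# Grouping on "A" gives the groups 1, 2 and 4, sorted by key.
# axis=0 (group rows) is the default, so it is omitted here.
df2 = df.groupby("A")["D"].mean()
print(df2)

# First group is A == 1, with D values 10 and 15: (10 + 15) / 2 = 12.5
print(df2.iloc[0])
```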
Consider the following pandas DataFrames:
student_data1 = pd.DataFrame({
    'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
    'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
    'marks': [200, 210, 190, 222, 199]})
student_data2 = pd.DataFrame({
    'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
    'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse', 'Kaiser William', 'Madeeha Preston'],
    'marks': [201, 200, 198, 219, 201]})
Fill in the blank to join the two dataframes along rows. Use pd.concat() function.
result_data = pd.concat(______)
Answers:
[student_data1,student_data2], axis=0
[student_data1,student_data2]
[student_data1, student_data2]
[student_data1, student_data2], axis=0
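A minimal sketch showing that the answers with and without axis=0 produce the same result, since axis=0 (stack rows) is the default for pd.concat. The renumbered variant is not asked for by the question and is included only to show the ignore_index option.

```python
import pandas as pd

student_data1 = pd.DataFrame({
    "student_id": ["S1", "S2", "S3", "S4", "S5"],
    "name": ["Danniella Fenton", "Ryder Storey", "Bryce Jensen", "Ed Bernal", "Kwame Morin"],
    "marks": [200, 210, 190, 222, 199]})
student_data2 = pd.DataFrame({
    "student_id": ["S4", "S5", "S6", "S7", "S8"],
    "name": ["Scarlette Fisher", "Carla Williamson", "Dante Morse", "Kaiser William", "Madeeha Preston"],
    "marks": [201, 200, 198, 219, 201]})

# Rows of the second frame are appended below the first.
result_data = pd.concat([student_data1, student_data2])
explicit_axis = pd.concat([student_data1, student_data2], axis=0)
print(result_data.equals(explicit_axis))

# Each frame keeps its own 0..4 labels, so the index repeats.
print(list(result_data.index))

# Optional: ignore_index=True renumbers the rows 0..9 instead.
renumbered = pd.concat([student_data1, student_data2], ignore_index=True)
print(list(renumbered.index))
```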