Pandas - Loading DataSet Files Flashcards

1
Q

Where to Download Data Set Files ?

A

Navigate to https://www.kaggle.com/

2
Q

How do you download the Titanic data set?

A

Navigate to https://www.kaggle.com/
Search for "titanic" in the search bar.
Download train.csv

3
Q

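How do you import a CSV file in Python?

A
Import the pandas module (import pandas as pd).
Call the pd.read_csv() function with the path of the file.
read_csv() is a function of the pandas module, not a DataFrame method; it returns a new DataFrame object.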
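
As a rough illustration of these steps, the sketch below reads CSV text from an in-memory buffer instead of a real file. The column names and values are made up for the example and are not taken from the Kaggle file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as train.csv.
csv_text = "PassengerId,Name,Age\n1,Alice,29\n2,Bob,41\n"

# read_csv accepts a file path or any file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df).__name__)
print(df.shape)
```

With a real file, the buffer would simply be replaced by the file path.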
4
Q

How to get current working directory ?

A

import os

print(os.getcwd())

5
Q

How do you list the contents of the current directory using Python?

A

import os

print(os.listdir())

6
Q

How to present first 10 rows of Dataset ?

A

df.head(10)

7
Q

How do you load a CSV file that has no header row?

A

Use the header=None argument.

df = pd.read_csv('titanic.csv', header=None)

8
Q

How do you tell read_csv() which line of the CSV file is the header?

A

Use the header=0 argument, which says that the first line holds the column labels.

df = pd.read_csv('dataSets/titanic.csv', header=0)
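
To make the difference between header=None and header=0 concrete, here is a small sketch on made-up CSV text (the names and values are assumptions for illustration only).

```python
import io
import pandas as pd

# Hypothetical CSV text whose first line is a header row.
csv_text = "Name,Age\nAlice,29\nBob,41\n"

# header=None: no line is treated as a header, so the labels become 0, 1, ...
# and the original header line is read as an ordinary data row.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# header=0: the first line supplies the column labels.
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

print(no_header.shape, list(no_header.columns))
print(with_header.shape, list(with_header.columns))
```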

9
Q

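How do you load a data set without its default column labels, and how do you add some text to the generated column labels?

A

Use the prefix argument when the column labels are generated automatically (header=None).
df = pd.read_csv('dataSets/titanic.csv', header=None, prefix='Col-')
Note: prefix was deprecated in pandas 1.4 and removed in pandas 2.0, so this call only works on older versions.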

10
Q

How do you assign column names manually while loading a CSV file?

A

Use the names and header arguments of read_csv(): create a list colNames and assign it to names. Pass header=0 so that the file's own header row is replaced instead of being read as data.
colNames = ['Col-1', 'Col-2', 'Col-3', 'Col-4']
df = pd.read_csv('dataSets/titanic.csv', header=0, usecols=[1, 3, 4, 5], names=colNames)
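
A minimal sketch of replacing a header row with your own names, using made-up CSV text. To keep the example simple it renames every column and leaves usecols out; the labels in col_names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV text with its own header row.
csv_text = "Id,Name,Age\n1,Alice,29\n2,Bob,41\n"

# One replacement label per column in the file.
col_names = ["Col-1", "Col-2", "Col-3"]

# header=0 marks the first line as a header to discard; names supplies the new labels.
df = pd.read_csv(io.StringIO(csv_text), header=0, names=col_names)

print(list(df.columns))
print(df.shape)
```

Without header=0, the line "Id,Name,Age" would be kept as the first data row.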

11
Q

How do you add your own text to the column labels while loading a data set?

A

Use the prefix option (available in pandas versions before 2.0, where it was removed).

df = pd.read_csv('dataSets/titanic.csv', header=None, prefix='Col-')
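
On pandas 2.0 and later, read_csv() no longer accepts prefix. One possible substitute, sketched here on made-up data, is to load with header=None and then call DataFrame.add_prefix(); this is an alternative technique, not the call shown on the card.

```python
import io
import pandas as pd

# Hypothetical CSV text without a header row.
csv_text = "Alice,29\nBob,41\n"

# Load with automatically generated integer labels 0, 1, ...
df = pd.read_csv(io.StringIO(csv_text), header=None)

# add_prefix puts the given text in front of every column label.
df = df.add_prefix("Col-")

print(list(df.columns))
```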

12
Q

How do you use a column's values as the row index?

A

Use the index_col argument of the read_csv() function.

df = pd.read_csv('dataSets/titanic.csv', header=0, index_col=3)
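
A small sketch of index_col on made-up data: the column at position 1 (Name, an assumed column) becomes the row index, so rows can then be looked up by name.

```python
import io
import pandas as pd

# Hypothetical CSV text; column position 1 is "Name".
csv_text = "PassengerId,Name,Age\n1,Alice,29\n2,Bob,41\n"

# index_col=1 turns the second column into the row index.
df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=1)

print(df.index.name)
print(df.loc["Bob", "Age"])
```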

13
Q

How do you create a pandas DataFrame from selected columns of a data set?

A

Use the usecols argument.

df = pd.read_csv('dataSets/titanic.csv', header=0, usecols=[0, 1, 2, 3, 4])

14
Q

How to Present Datatypes of Dataset ?

A

df.dtypes

15
Q

How do you override the existing data type of a data set column?

A

Pass the dtype argument to the read_csv() function.

dtype = {'Column Label': 'bool'}

16
Q

How do you typecast the values of a column in a data set?

A

Use the dtype argument of the read_csv() function; it takes a dictionary that maps column labels to types.

dtype = {'Column Label': 'bool'}
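
The sketch below shows the dtype dictionary inside a full read_csv() call. The data and the chosen types (float64 for Age, str for PassengerId) are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text; by default both numeric columns would be read as integers.
csv_text = "PassengerId,Name,Age\n1,Alice,29\n2,Bob,41\n"

# Map column labels to the types they should be read as.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Age": "float64", "PassengerId": str},
)

print(df.dtypes)
```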

17
Q

How do you present only the first four rows of a data set?

A

df.head(4)

(Called without an argument, head() returns the first five rows.)

18
Q

How do you present only the last four rows of a data set?

A

df.tail(4)

(Called without an argument, tail() returns the last five rows.)

19
Q

When presenting the data types of the data set columns, what does it mean if a column's type is reported as object?

A

It means the column holds generic Python objects, most commonly strings.

20
Q

Where to get online help for pandas module ?

A

Navigate to

https://pandas.pydata.org/pandas-docs/stable/reference/io.html

21
Q

What is the name of a data set that can be used to start learning data science?

A

The Titanic data set, available on Kaggle.

22
Q

How do you ignore the column labels while reading a data set?

A

Use the header=None argument.

df = pd.read_csv('dataSets/titanic.csv', header=None)

23
Q

How to present index attribute of dataFrame ?

A

df.index

24
Q

How to present column attribute of dataFrame ?

A

df.columns

25
Q

How to present values attribute of dataFrame ?

A

df.values

26
Q

How to preview data frame ?

A

df

27
Q

What is the value of empty cell of pandas data frame ?

A

NaN

28
Q

How to know how many Rows & Columns are in data Frame ?

A

We have to use shape attribute of pandas Data Frame object.

df.shape

29
Q

How to present details of Data Frame ?

A

Use the info() method of the pandas DataFrame.

df.info()

30
Q

How do you present the number of values in every column?

A

df.count()

count() returns the number of non-missing values in each column.

30
Q

How do you present the columns that have missing values?

A

df.count()

Any column whose count is lower than the number of rows contains missing values.

31
Q

How do you present the unique values of a column?

A

df['Sex'].unique()

32
Q

How do you present the frequency of each unique value in a given column?

A

df['Sex'].value_counts()

33
Q

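How do you normalize the value counts of a column?

A

Here, normalize means returning each value's share of the total (a fraction between 0 and 1, which can be read as a percentage) instead of its raw count.

df['Sex'].value_counts(normalize=True)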
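
A short sketch of raw counts versus normalized counts, on a made-up Sex column with four rows.

```python
import pandas as pd

# Hypothetical column with three "male" and one "female" entry.
df = pd.DataFrame({"Sex": ["male", "female", "male", "male"]})

# Raw frequency of each unique value.
counts = df["Sex"].value_counts()

# Share of each value as a fraction of all rows.
shares = df["Sex"].value_counts(normalize=True)

# Multiply by 100 to express the shares as percentages.
percent = shares * 100

print(counts)
print(percent)
```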

34
Q

How to transpose Pandas Data Frame ?

A

df2 = df.T

35
Q

What is the minimum age in the Titanic data set?

A

df['Age'].min()

36
Q

What is the maximum age in the Titanic data set?

A

df['Age'].max()

37
Q

What is the average age in the Titanic data set?

A

df['Age'].mean()

38
Q

What is the default behavior of pandas DataFrame aggregation methods when values are missing?

A

They exclude missing values (skipna=True is the default).
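
A minimal sketch of this default on a made-up Age column: mean() skips the missing entry unless skipna=False is passed.

```python
import pandas as pd

# Hypothetical ages with one missing value.
df = pd.DataFrame({"Age": [20.0, float("nan"), 40.0]})

# Default behavior: the missing value is ignored, so the mean is taken over 20 and 40.
mean_default = df["Age"].mean()

# With skipna=False the missing value propagates and the result is NaN.
mean_strict = df["Age"].mean(skipna=False)

print(mean_default)
print(mean_strict)
```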

39
Q

How do you present the mean of two columns of a data frame?

A

df[['Age', 'Fare']].mean()

40
Q

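How do you get the maximum of all columns in one expression?

A

df.max()

If the frame mixes text columns and missing values, this may fail on recent pandas versions; df.max(numeric_only=True) restricts the result to numeric columns.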

40
Q

How do you find out how many people survived?

A

Take the sum of all values in the Survived column (1 means survived, 0 means did not survive).
df['Survived'].sum()

print(f"Survived People : {df['Survived'].value_counts()[1]}")

40
Q

How do you find the total fare paid by the passengers?

A

df['Fare'].sum()

40
Q

What is the maximum age of the youngest 80 percent of passengers?

A

df['Age'].quantile(0.8)
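
To show what the 0.8 quantile means, here is a sketch on five made-up ages. It assumes pandas' default linear interpolation between neighboring values.

```python
import pandas as pd

# Hypothetical ages, already in ascending order for readability.
ages = pd.Series([10, 20, 30, 40, 50], name="Age")

# The 0.8 quantile is the age below which roughly 80% of the values fall.
q80 = ages.quantile(0.8)

# Fraction of passengers whose age is at or below that threshold.
share_at_or_below = (ages <= q80).mean()

print(q80)
print(share_at_or_below)
```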

40
Q

How to get Summary View of Data Set ?

A

df.describe()

40
Q

How do you describe a single column of a pandas DataFrame?

A

df['Age'].describe()

40
Q

What is method chaining?

A

Method chaining is a technique for making several method calls in a single expression, where each call operates on the object returned by the previous one.
df['Age'].dropna().mean()

40
Q

Drop the rows that have missing values and find the mean age, then compare it with the mean age of the original data frame.

A

df.dropna()['Age'].mean()

df['Age'].mean()
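
The two results can differ because dropna() removes a row when any of its columns is missing, not only Age. A sketch on made-up data with an assumed Cabin column:

```python
import pandas as pd

# Hypothetical frame: only the first row has no missing cell at all.
df = pd.DataFrame({
    "Age": [20.0, 30.0, float("nan"), 50.0],
    "Cabin": ["C85", None, "E46", None],
})

# Mean over every known age (the missing age is skipped by default).
mean_all = df["Age"].mean()

# Mean over the rows that survive dropna(), here only the first row.
mean_complete = df.dropna()["Age"].mean()

print(mean_all)
print(mean_complete)
```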

40
Q

How do you drop the rows that have missing values?

A

df.dropna()

41
Q

How to know the index range of Data Set ?

A

df.index

42
Q

How to know the columns in Data Set ?

A

df.columns

43
Q

How to know the values of DataFrame ?

A

df.values

44
Q

How to know the memory usage of Data Frame Object ?

A

We have to use info() method of pandas DataFrame.

df.info()

45
Q

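How do you check the number of missing values in a data set?

A

Use the count() method of the pandas DataFrame object. It returns the number of non-missing values per column, so subtract it from the number of rows.

len(df) - df.count()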

46
Q

How do you find the percentage of each value in a DataFrame column,

for example the percentage of males and females in the Sex column?

A

Use the value_counts() method of the pandas Series (column) object.
df['Sex'].value_counts(normalize=True)

47
Q

What is Transpose of Data Frame ?

A

Transposing a DataFrame swaps its column labels and its row index.
That is, the column labels are displayed as the row index and the row index as the column labels.
df2 = df.T
df2

48
Q

How do you select a column from the output of a DataFrame method?

A

df.dropna()['Age'].mean()

49
Q

What causes this error when running Python code?

TypeError: 'Index' object is not callable

A

This error occurs when an object attribute is called as if it were a method,

like df.columns(); columns is an attribute, not a method.
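
A small sketch that reproduces the error on a made-up frame and then reads the attribute correctly, without parentheses.

```python
import pandas as pd

# Hypothetical one-column frame.
df = pd.DataFrame({"Age": [22, 38]})

# Calling the attribute like a method raises the TypeError from the card.
try:
    df.columns()
except TypeError as exc:
    message = str(exc)

# Correct usage: access the attribute without parentheses.
labels = list(df.columns)

print(message)
print(labels)
```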

50
Q

How to know total missing values of every column by using method chaining ?

A

df.isnull().sum()

51
Q

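Why do we use the isnull() method, and where does it belong?

A

isnull() is a method of pandas DataFrame and Series objects. It returns a boolean mask that is True for every missing value; chaining .sum() onto it gives the number of missing values in each column.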

52
Q

What is the difference between the count() and isnull() methods?

A

count() returns the number of non-missing values in each column,

while isnull() flags each missing value, so isnull().sum() returns the number of missing values in each column.
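
A sketch comparing the two on made-up data: for every column, the non-missing count and the missing count add up to the number of rows.

```python
import pandas as pd

# Hypothetical frame with one missing age and two missing cabins.
df = pd.DataFrame({
    "Age": [22.0, float("nan"), 35.0],
    "Cabin": [None, None, "C85"],
})

# Number of non-missing values per column.
present = df.count()

# Number of missing values per column.
missing = df.isnull().sum()

# Total number of missing values in the whole frame.
total_missing = int(missing.sum())

print(present)
print(missing)
print(total_missing)
```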

53
Q

How to present total missing values of Data Set ?

A

We have to use method chaining for this ,

df.isnull().sum().sum()

54
Q

What is the alias of isnull method ?

A

isna

55
Q

How do you sort a pandas data set by a column?

A

df.sort_values('Age', ascending=False)

56
Q

How do you select columns after sorting a DataFrame?

A

df.sort_values('Age', ascending=False)[['Name', 'Age']]

57
Q

How do you print how many missing values there are in a data set?

A

df.isnull().sum().sum()

58
Q

How do you select Name and Age after sorting a DataFrame by the Age column, and present only the first 10 rows?

A

df.sort_values('Age', ascending=False)[['Name', 'Age']].head(10)

59
Q

Which methods are available for sorting a DataFrame?

A

sort_values
nlargest
nsmallest
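
The three methods listed above can be sketched side by side on a made-up frame; the names and ages are hypothetical.

```python
import pandas as pd

# Hypothetical passengers.
df = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cy", "Di"],
    "Age": [72, 35, 64, 8],
})

# Full sort in descending order, then keep the first two rows.
top2_sorted = df.sort_values("Age", ascending=False)[["Name", "Age"]].head(2)

# nlargest returns the same two rows without sorting the whole frame explicitly.
top2_nlargest = df.nlargest(2, "Age")[["Name", "Age"]]

# nsmallest returns the rows with the lowest values.
youngest = df.nsmallest(1, "Age")

print(top2_sorted)
print(youngest)
```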

60
Q

What is Boolean selection filtering?

A

Boolean selection means selecting rows by providing a Boolean value, True or False, for each row. These Boolean values are usually created by applying a Boolean condition to one or more columns of a DataFrame.

61
Q

How do you write a Boolean selection filtering condition?

A

condition = df['Age'] > 60

df[condition]

62
Q

How do you write a Boolean selection filtering condition and apply it with loc?

A

condition = df['Age'] > 60

df.loc[condition, ['Name', 'Age']]
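
A minimal sketch of Boolean selection on made-up data. It also combines two conditions with &, which is an extension beyond the card; each condition needs its own parentheses.

```python
import pandas as pd

# Hypothetical passengers.
df = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cy"],
    "Age": [72, 35, 64],
    "Sex": ["female", "male", "male"],
})

# One True/False value per row.
condition = df["Age"] > 60

# loc takes the row mask and the wanted columns in a single step.
older = df.loc[condition, ["Name", "Age"]]

# Two conditions combined with & (element-wise "and").
older_men = df[(df["Age"] > 60) & (df["Sex"] == "male")]

print(condition.tolist())
print(older)
```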

63
Q

Select the passengers whose age is greater than 60.

A

df[df['Age'] > 60]

64
Q

Select all female passengers in the Titanic data set.

A

df[df['Sex'] == 'female']

65
Q

Sort the Titanic data set by two columns and display only the first 10 rows, with Name, Age and Sex.

A

df.sort_values(['Age', 'Sex'], ascending=[False, True])[['Name', 'Age', 'Sex']].head(10)

66
Q

Present the ten rows with the highest age.

A

df.nlargest(10, 'Age')

67
Q

Present the ten rows with the lowest age, using the shortest notation.

A

df.nsmallest(10, 'Age')

68
Q

What is Aggregation ?

A

Aggregation is the process of grouping rows and reducing each group to a single value.

69
Q

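What is the average ticket price for male passengers compared to female passengers?

A

averageTicketPriceFemale = df[df['Sex'] == 'female']['Fare'].mean()
averageTicketPriceMale = df[df['Sex'] == 'male']['Fare'].mean()
print(f"Average Ticket Price of Female Passengers : {averageTicketPriceFemale:,.2f}")
print(f"Average Ticket Price of Male Passengers : {averageTicketPriceMale:,.2f}")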

70
Q

What are aggregation functions?

A

Aggregation functions reduce many items to a single value.

Examples of aggregation functions are mean(), sum(), max() and count().

71
Q

What is the average age of the people who survived?

A

df.groupby('Survived')['Age'].mean()

First we group by the Survived column, then we take the mean age of each group: 0 for passengers who did not survive and 1 for those who survived.
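
A sketch of the grouping step on four made-up passengers. Selecting the Age column before calling mean() keeps the aggregation away from any text columns.

```python
import pandas as pd

# Hypothetical passengers: two did not survive (0) and two survived (1).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [40.0, 20.0, 30.0, 50.0],
})

# One mean age per value of the Survived column.
mean_age = df.groupby("Survived")["Age"].mean()

print(mean_age)
```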

72
Q

How many people survived from each port where they boarded the ship?

A

df.groupby('Embarked')['Survived'].sum()

73
Q

How do you find the maximum age in a data set?

A

df['Age'].max()

74
Q

How to use multiple aggregate functions ?

A

df.groupby('Embarked')['Age'].agg(['count', 'mean', 'min', 'max'])
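
A sketch of agg() with several functions on made-up data; the result has one row per group and one column per aggregation function.

```python
import pandas as pd

# Hypothetical passengers from two ports.
df = pd.DataFrame({
    "Embarked": ["S", "S", "C"],
    "Age": [20.0, 40.0, 30.0],
})

# Apply four aggregation functions to the Age column of each group.
summary = df.groupby("Embarked")["Age"].agg(["count", "mean", "min", "max"])

print(summary)
```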

75
Q

What does NaN stand for?

A

Not a Number