Pandas - Loading DataSet Files Flashcards

1
Q

Where to Download Data Set Files ?

A

Navigate to https://www.kaggle.com/

2
Q

How to Download titanic Data Set ?

A

Navigate to https://www.kaggle.com/
Search for "titanic" in the search bar.
Download train.csv

3
Q

How to import csv file in python ?

A
Import the pandas module.
Call the read_csv() function of the pandas module (pd.read_csv) with the path of the CSV file.
read_csv() returns a DataFrame object.
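
A minimal sketch of these steps. The CSV text is a made-up stand-in (not real Titanic rows), read through io.StringIO so the example runs without a file on disk; with a downloaded file, its path would be passed instead.

```python
import io
import pandas as pd

# Made-up rows standing in for a downloaded CSV file (not real Titanic data).
csv_text = """PassengerId,Survived,Name,Sex,Age
1,0,Ann,female,22
2,1,Bob,male,38
3,1,Cid,male,26
"""

# read_csv is a function of the pandas module and returns a DataFrame.
# With a real file, pass its path instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df).__name__)
print(df.shape)
```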
4
Q

How to get current working directory ?

A

import os

print(os.getcwd())

5
Q

How to list directory contents of OS by using Python ?

A

import os

print(os.listdir())
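
The two os calls can be combined to check whether a CSV file sits in the working directory before loading it. This is only a sketch; the filter on the ".csv" extension is an illustrative choice.

```python
import os

cwd = os.getcwd()            # absolute path of the current working directory
entries = os.listdir(cwd)    # names of the files and folders inside it

# Keep only CSV files, so it is easy to see whether the data set is here.
csv_files = [name for name in entries if name.lower().endswith(".csv")]

print(cwd)
print(csv_files)
```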

6
Q

How to present first 10 rows of Dataset ?

A

df.head(10)

7
Q

How to Load csv file with no header ?

A

Pass the header=None argument.

df = pd.read_csv('titanic.csv', header=None)

8
Q

How to Load CSV file with a parameter to indicate header line in csv File ?

A

Pass the header=0 argument, which tells pandas that the first line holds the column labels.

df = pd.read_csv('dataSets/titanic.csv', header=0)
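
The difference between header=0 and header=None can be seen on a small made-up CSV (the two data rows are invented for illustration):

```python
import io
import pandas as pd

csv_text = "Name,Age\nAnn,22\nBob,38\n"   # made-up sample, first line is a header

with_header = pd.read_csv(io.StringIO(csv_text), header=0)
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(with_header.columns))   # labels taken from the first line
print(list(no_header.columns))     # automatically generated integer labels
print(len(with_header), len(no_header))
```

With header=None the first line is treated as data, so the second frame has one more row than the first.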

9
Q

How to load a data set without its own column labels and add some text to the generated labels?

A

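Use the prefix argument so that the automatically generated labels become Col-0, Col-1, and so on.
df = pd.read_csv('dataSets/titanic.csv', header=None, prefix='Col-')
Note: prefix was deprecated in pandas 1.4 and removed in pandas 2.0. On newer versions, call add_prefix('Col-') on the loaded DataFrame instead.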
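
Because the prefix argument of read_csv() is no longer available from pandas 2.0 onward, one possible alternative is to load with header=None and then call add_prefix(). The sketch below uses made-up data with no header line.

```python
import io
import pandas as pd

csv_text = "Ann,22\nBob,38\n"   # made-up data with no header line

# header=None generates integer labels 0, 1, ...; add_prefix prepends the text.
df = pd.read_csv(io.StringIO(csv_text), header=None).add_prefix("Col-")

print(list(df.columns))
```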

10
Q

How to assign column names manually while loading csv File ?

A

Use the names and header arguments of read_csv(): create a list colNames and assign it to names. With header=0, the file's own header line is replaced instead of being read as data.
colNames = ['Col-1', 'Col-2', 'Col-3', 'Col-4']
df = pd.read_csv('dataSets/titanic.csv', header=0, usecols=[1, 3, 4, 5], names=colNames)
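
A small sketch of renaming every column while loading. The CSV text and the replacement labels are invented for illustration.

```python
import io
import pandas as pd

csv_text = "Name,Sex,Age\nAnn,female,22\nBob,male,38\n"   # made-up sample
col_names = ["Col-1", "Col-2", "Col-3"]

# header=0 marks the first line as a header to discard;
# names supplies the labels that replace it.
df = pd.read_csv(io.StringIO(csv_text), header=0, names=col_names)

print(list(df.columns))
print(len(df))
```

Leaving out header=0 here would keep the original header line as the first data row.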

11
Q

How to add own values to column labels while Loading Data Set ?

A

Use the prefix option (available only before pandas 2.0).

df = pd.read_csv('dataSets/titanic.csv', header=None, prefix='Col-')

12
Q

How to use a column's values as the row index?

A

Use the index_col argument of the read_csv() function.

df = pd.read_csv('dataSets/titanic.csv', header=0, index_col=3)
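
A sketch of index_col on a made-up three-column CSV, where position 1 is the Name column. Once a column is the index, rows can be looked up by its values with loc.

```python
import io
import pandas as pd

csv_text = "PassengerId,Name,Age\n1,Ann,22\n2,Bob,38\n"   # made-up sample

# Column at position 1 (Name) becomes the row index.
df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=1)

print(df.index.name)
print(df.loc["Bob", "Age"])
```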

13
Q

How to create a Pandas Data Frame object by Selected columns from Data Set ?

A

Use the usecols argument with a list of column positions (or labels).

df = pd.read_csv('dataSets/titanic.csv', header=0, usecols=[0, 1, 2, 3, 4])

14
Q

How to Present Datatypes of Dataset ?

A

df.dtypes

15
Q

How to override existing data type of Data Set column ?

A

Pass the dtype argument to read_csv() with a dictionary that maps a column label to the desired type.

dtype={'Column Label': 'bool'}

16
Q

How to typecast the value of a column in Data Set ?

A

Use the dtype argument of the read_csv() function; it takes a dictionary that maps column labels to types.

dtype={'Column Label': 'bool'}
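
A sketch of the dtype dictionary in use. The sample CSV is made up, and float64 is chosen here (rather than bool) only because the conversion from whole numbers is easy to verify.

```python
import io
import pandas as pd

csv_text = "Name,Pclass,Age\nAnn,3,22\nBob,1,38\n"   # made-up sample

default_df = pd.read_csv(io.StringIO(csv_text))
typed_df = pd.read_csv(io.StringIO(csv_text), dtype={"Pclass": "float64"})

print(default_df["Pclass"].dtype)   # type inferred by pandas
print(typed_df["Pclass"].dtype)     # type forced through dtype
```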

17
Q

How to present only first four rows of Data Set?

A

df.head(4)
(df.head() with no argument returns the first five rows.)

18
Q

How to present only Last four rows of Data Set ?

A

df.tail(4)
(df.tail() with no argument returns the last five rows.)
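
head() and tail() return five rows unless a number is passed. A quick sketch on a made-up ten-row frame:

```python
import pandas as pd

df = pd.DataFrame({"n": range(1, 11)})   # made-up frame with rows 1..10

default_rows = len(df.head())            # no argument: five rows
first_four = df.head(4)["n"].tolist()
last_four = df.tail(4)["n"].tolist()

print(default_rows)
print(first_four)
print(last_four)
```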

19
Q

When listing the data types of the columns, what does it mean if a column's type is shown as object?

A

It means the column holds generic Python objects, most commonly strings.

20
Q

Where to get online help for pandas module ?

A

Navigate to

https://pandas.pydata.org/pandas-docs/stable/reference/io.html

21
Q

What is the name of data set which can be used to start learning data science ?

A

The Titanic data set, available on Kaggle.

22
Q

How to ignore Column Labels while reading data sets ?

A

Pass the header=None argument.

df = pd.read_csv('dataSets/titanic.csv', header=None)

23
Q

How to present index attribute of dataFrame ?

A

df.index

24
Q

How to present column attribute of dataFrame ?

A

df.columns

25
How to present values attribute of dataFrame ?
df.values
26
How to preview data frame ?
df
27
What is the value of empty cell of pandas data frame ?
NaN
28
How to know how many Rows & Columns are in data Frame ?
We have to use shape attribute of pandas Data Frame object. | df.shape
29
How to present details of Data Frame ?
Use the info() method of the pandas DataFrame. | df.info()
30
How to present the total number of non-missing values in every column?
df.count()
30
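How to present columns which have missing values?
df.isnull().any() | This returns True for every column that contains at least one missing value. df.count() also reveals them: a column whose count is lower than len(df) has missing values.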
31
How to present the unique values of a column?
df['Sex'].unique()
32
How to present frequency of unique values in given column ?
df['Sex'].value_counts()
33
How to normalize the value counts of a column?
Here, normalize means expressing each value count as a fraction of the total (multiply by 100 for a percentage). df['Sex'].value_counts(normalize=True)
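A sketch comparing plain and normalized value counts on a made-up Sex column of four entries:

```python
import pandas as pd

sex = pd.Series(["male", "male", "male", "female"], name="Sex")   # made-up values

counts = sex.value_counts()                  # absolute frequencies
shares = sex.value_counts(normalize=True)    # fractions that sum to 1

print(counts)
print(shares)
```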
34
How to transpose Pandas Data Frame ?
df2 = df.T
35
What is the minimum age in Titanic Data Set ?
df['Age'].min()
36
Q.What is the maximum age in Titanic Data Set ?
df['Age'].max()
37
Q. What is the average age in Titanic Data Set
df['Age'].mean()
38
What do pandas DataFrame aggregation methods do with missing values by default?
They exclude missing values (skipna=True is the default).
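A sketch of this default on a made-up Series with one missing value. skipna is the parameter of mean() that controls the behaviour.

```python
import pandas as pd

ages = pd.Series([20.0, 40.0, None])          # made-up values, one missing

mean_default = ages.mean()                    # missing value is skipped
mean_strict = ages.mean(skipna=False)         # missing value propagates as NaN
non_missing = ages.count()                    # count() also ignores missing values

print(mean_default, mean_strict, non_missing)
```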
39
Q. How to present the mean of two columns of a data frame ?
df[['Age','Fare']].mean()
40
Q.How to get max of all columns in one notation ?
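df.max() | If the frame has text columns with missing values, df.max(numeric_only=True) restricts the result to numeric columns and avoids type errors on recent pandas versions.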
40
Q. How to know how many people survived ?
Sum the values of the Survived column, since each survivor is marked with 1.
df['Survived'].sum()
print(f"Survived People : {df['Survived'].value_counts()[1]}")
40
How to know total fare given by passengers ?
df['Fare'].sum()
40
Q. What is the maximum age of 80 percent of passengers ?
df['Age'].quantile(0.8)
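A sketch of quantile() on single columns and on a selection of numeric columns. The ages and fares are made-up values, not Titanic data.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [10, 20, 30, 40, 50, 60],     # made-up ages
    "Fare": [5, 10, 15, 20, 25, 30],     # made-up fares
})

age_p80 = df["Age"].quantile(0.8)                 # one column, one number
both_p80 = df[["Age", "Fare"]].quantile(0.8)      # one value per selected column

print(age_p80)
print(both_p80)
```

Selecting the numeric columns explicitly keeps text columns such as Name out of the calculation.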
40
How to get Summary View of Data Set ?
df.describe()
40
Q. How to describe pandas Data Frame column only ?
df['Age'].describe()
40
What is Method Chaining ?
Method chaining is a technique for making several method calls in a row, each one applied to the result of the previous call. | df['Age'].dropna().mean()
40
Q. Drop rows which have missing value cells, find the mean of age, and compare it with the mean of the non-dropped data frame ?
df.dropna()['Age'].mean() | df['Age'].mean()
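The two means can differ because dropna() removes a row when any of its cells is missing, not only when Age is missing. A sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [20.0, 40.0, None, 90.0],      # made-up ages, one missing
    "Cabin": ["A", None, "B", "C"],       # made-up cabins, one missing
})

mean_all = df["Age"].mean()               # skips only the missing age
mean_dropped = df.dropna()["Age"].mean()  # also loses the row with a missing cabin

print(mean_all, mean_dropped)
```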
40
Q. How to drop rows which have missing value cells ?
df.dropna()
41
How to know the index range of Data Set ?
df.index
42
How to know the columns in Data Set ?
df.columns
43
How to know the values of DataFrame ?
df.values
44
How to know the memory usage of Data Frame Object ?
We have to use info() method of pandas DataFrame. | df.info()
45
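How to check the number of missing values in a data set ?
Use isnull().sum(), which counts the missing values in each column. | df.isnull().sum() | df.count() shows the opposite: the number of non-missing values in each column.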
46
How to know the percentage of values in a DataFrame column, | for example the percentage of males and females in the Sex column ?
Use the value_counts() method of the column (a pandas Series). | df['Sex'].value_counts(normalize=True)
47
What is Transpose of Data Frame ?
Transposing a DataFrame swaps its column labels and its row index: the column labels are displayed as the row index and the row index as the column labels. | df2 = df.T | df2
48
How to filter column from the output of dataFrame method ?
df.dropna()['Age'].mean()
49
What does this error mean when running Python code ? | TypeError: 'Index' object is not callable
This error occurs when an object attribute is used as a method, | like df.columns(); columns is an attribute, not a method.
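A sketch that reproduces the error on a made-up one-row frame and shows the correct attribute access next to it:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ann"], "Age": [22]})   # made-up row

labels = list(df.columns)      # correct: attribute access, no parentheses

try:
    df.columns()               # wrong: tries to call the Index object
except TypeError as exc:
    message = str(exc)

print(labels)
print(message)
```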
50
How to know total missing values of every column by using method chaining ?
df.isnull().sum()
51
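Why do we use the isnull() method, and where does it belong ?
isnull() returns a True/False mask that marks every missing cell; chaining .sum() onto it gives the number of missing values in each column. It is a method of both DataFrame and Series.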
52
What is the difference between the count and isnull methods ?
count() returns the number of non-missing values in each column, | while isnull() marks the missing cells, so isnull().sum() returns the number of missing values in each column.
53
How to present total missing values of Data Set ?
Use method chaining for this: | df.isnull().sum().sum()
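A sketch that puts count(), isnull().sum(), and isnull().sum().sum() side by side on a made-up frame with three missing cells:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [22.0, None, 26.0],          # one missing value
    "Cabin": [None, None, "C85"],       # two missing values
})

present = df.count()                    # non-missing values per column
missing = df.isnull().sum()             # missing values per column
total_missing = df.isnull().sum().sum() # missing values in the whole frame

print(present)
print(missing)
print(total_missing)
```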
54
What is the alias of isnull method ?
isna
55
How to sort a pandas DataFrame by a column ?
df.sort_values('Age',ascending=False)
56
How to Filter column after sorting of Data Frame ?
df.sort_values('Age',ascending=False)[['Name','Age']]
57
How to print,how many Missing values in a Data Set ?
df.isnull().sum().sum()
58
How to filter Name & Age,after sorting Data Frame by Age column,present first 10 rows only ?
df.sort_values('Age',ascending=False)[['Name','Age']].head(10)
59
Which methods are available for sorting a DataFrame ?
Three are covered here: sort_values, nlargest, nsmallest. (sort_index sorts by the index instead of by column values.)
60
What is Boolean Selection Filtering ?
Boolean selection refers to selecting rows by providing a Boolean value, True or False, for each row. These Boolean values are usually created by applying a Boolean condition to one or more columns of a DataFrame.
61
How to make Boolean Selection Filtering Condition ?
condition = df['Age'] > 60
df[condition]
62
How to make Boolean Selection Filtering Condition, & present it with loc method ?
condition = df['Age'] > 60
df.loc[condition, ['Name','Age']]
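A sketch of Boolean selection on made-up passengers. The last line combines two conditions; each comparison needs its own parentheses because the & operator binds more tightly than > and ==.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],          # made-up passengers
    "Sex": ["female", "male", "male", "female"],
    "Age": [65, 30, 71, 18],
})

condition = df["Age"] > 60                     # one True/False value per row
older = df.loc[condition, ["Name", "Age"]]     # rows by mask, columns by label

# Two conditions combined with & (element-wise "and").
older_women = df[(df["Age"] > 60) & (df["Sex"] == "female")]

print(condition.tolist())
print(older)
print(older_women)
```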
63
Select Passengers whose age is greater than 60 ?
df[df['Age']>60]
64
Select all Female Passengers in titanic Data Set?
df[df['Sex']=='female']
65
Sort titanic Data Set by two columns , & display first 10 selected rows only with Name,Age,Sex?
df.sort_values(['Age','Sex'],ascending=[False,True])[['Name','Age','Sex']].head(10)
66
Present ten rows which have maximum age ?
df.nlargest(10,'Age')
67
Present the ten rows which have the minimum age, using the shortest notation ?
df.nsmallest(10,'Age')
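nlargest() and nsmallest() are shortcuts for sorting and then taking the first rows. A sketch on made-up ages, using two rows instead of ten to keep the output short:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],   # made-up passengers
    "Age": [22, 71, 4, 65],
})

oldest = df.nlargest(2, "Age")
oldest_by_sort = df.sort_values("Age", ascending=False).head(2)
youngest = df.nsmallest(2, "Age")

print(oldest)
print(youngest)
```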
68
What is Aggregation ?
Aggregation is the process of grouping rows and reducing each group to a single value.
69
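What is the average ticket price for male compared to female passengers ?
averageTicketPriceFemale = df[df['Sex']=='female']['Fare'].mean()
averageTicketPriceMale = df[df['Sex']=='male']['Fare'].mean()
print(f"Average Ticket Price of Female Passengers : {averageTicketPriceFemale:,.2f}")
print(f"Average Ticket Price of Male Passengers : {averageTicketPriceMale:,.2f}")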
70
What are aggregation Functions ?
Aggregation functions reduce many items to one value. | Examples are mean(), sum(), max(), count(), etc.
71
What is the average age of survived people ?
df.groupby('Survived')['Age'].mean() | First we group by the Survived column, then we take the mean age within each group: 0 (did not survive) and 1 (survived). Selecting Age before calling mean() keeps text columns out of the calculation.
72
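How many people survived, by the port where they boarded the ship ?
df.groupby('Embarked')['Survived'].sum() | count() would return the number of passengers per port, not the number of survivors.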
73
How to find maximum age in Data Set ?
df['Age'].max() | Sorting first, as in df.sort_values('Age',ascending=True)['Age'].max(), gives the same result but is unnecessary.
74
How to use multiple aggregate functions ?
df.groupby('Embarked')['Age'].agg(['count','mean','min','max'])
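A sketch of grouped aggregation on made-up rows. It shows sum() counting survivors per port and agg() applying several functions to Age at once.

```python
import pandas as pd

df = pd.DataFrame({
    "Embarked": ["S", "S", "C", "C"],     # made-up ports
    "Survived": [0, 1, 1, 1],             # 1 marks a survivor
    "Age": [20.0, 40.0, 30.0, None],      # one missing age
})

survivors = df.groupby("Embarked")["Survived"].sum()
age_stats = df.groupby("Embarked")["Age"].agg(["count", "mean", "min", "max"])

print(survivors)
print(age_stats)
```

The result of agg() has one row per group and one column per function; the missing age is skipped in every statistic.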
75
What does NaN stand for ?
Not a Number