Pandas - Loading DataSet Files Flashcards
Where to Download Data Set Files ?
Navigate to https://www.kaggle.com/
How to Download titanic Data Set ?
Navigate to https://www.kaggle.com/
Now search for titanic in search bar,
Download train.csv
How to import csv file in python ?
Import the pandas module, then call its read_csv() function, which returns a DataFrame object.
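A minimal sketch of the steps above. To stay self-contained it reads a small made-up CSV from an in-memory string instead of a file on disk; the sample rows and the name csv_text are assumptions for illustration, not the real Titanic data.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a downloaded CSV file.
csv_text = """PassengerId,Survived,Name,Sex,Age
1,0,Alice,female,22
2,1,Bob,male,38
3,1,Carol,female,
"""

# read_csv() is a function of the pandas module and returns a DataFrame.
# With a real file you would pass its path instead, e.g. pd.read_csv('train.csv').
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
print(df.head())
```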
How to get current working directory ?
import os
print(os.getcwd())
How to list directory contents of OS by using Python ?
import os
print(os.listdir())
How to present first 10 rows of Dataset ?
df.head(10)
How to Load csv file with no header ?
We have to use the header=None argument.
df = pd.read_csv('titanic.csv',header=None)
How to Load CSV file with a parameter to indicate header line in csv File ?
We have to use the header=0 argument.
df = pd.read_csv('dataSets/titanic.csv',header=0)
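The difference between header=0 and header=None can be seen on a tiny example. This is only a sketch: the two-row CSV text below is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV whose first line holds the column labels.
csv_text = "Name,Age\nAlice,22\nBob,38\n"

# header=0: line 0 supplies the column labels.
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None: no line is treated as labels, so pandas numbers the
# columns 0, 1, ... and the label line is read as an ordinary data row.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(with_header.columns), len(with_header))
print(list(no_header.columns), len(no_header))
```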
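How to load a data set without its default column labels & add some text to the generated labels ?
We can use the prefix argument together with header=None, so the numbered labels get a prefix (pandas versions before 2.0 only; prefix was removed from read_csv() in pandas 2.0).
df = pd.read_csv('dataSets/titanic.csv',header=None,prefix='Col-')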
How to assign column names manually while loading csv File ?
We have to use the names & header arguments of read_csv(): create a list colNames & assign it to names. header=0 tells pandas that the first line holds the old labels, which are then replaced.
colNames=['Col-1','Col-2','Col-3','Col-4']
df = pd.read_csv('dataSets/titanic.csv',header=0,usecols=[1,3,4,5],names=colNames)
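A small sketch of replacing the labels found in a file with our own. The CSV text and the name col_names are assumptions made up for this example.

```python
import io
import pandas as pd

# Hypothetical CSV with a label line that we want to replace.
csv_text = "Name,Age\nAlice,22\nBob,38\n"

col_names = ['Col-1', 'Col-2']

# header=0 marks the first line as the existing labels, and names
# supplies the replacement labels, so the first line is not kept as data.
df = pd.read_csv(io.StringIO(csv_text), header=0, names=col_names)
print(df)
```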
How to add own values to column labels while Loading Data Set ?
We have to use the prefix option (available in pandas versions before 2.0).
df = pd.read_csv('dataSets/titanic.csv',header=None,prefix='Col-')
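In pandas 2.0 and later, read_csv() no longer accepts prefix. One possible substitute, shown here only as a sketch, is to load with header=None and then call the DataFrame method add_prefix(). The CSV text is made up for the example.

```python
import io
import pandas as pd

# Hypothetical CSV without a label line.
csv_text = "Alice,22\nBob,38\n"

# header=None gives the integer labels 0, 1, ...;
# add_prefix() turns them into 'Col-0', 'Col-1', ...
df = pd.read_csv(io.StringIO(csv_text), header=None).add_prefix('Col-')
print(list(df.columns))
```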
How to use a column values as row index ?
We have to use index_col argument of read_csv() method
df = pd.read_csv('dataSets/titanic.csv',header=0,index_col=3)
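A short sketch of index_col on made-up data. Column position 1 (Name) is chosen here only for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data.
csv_text = "PassengerId,Name,Age\n1,Alice,22\n2,Bob,38\n"

# index_col=1 uses the second column (Name) as the row index.
df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=1)

print(df.index.name)
# Rows can now be looked up by name with loc.
print(df.loc['Bob', 'Age'])
```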
How to create a Pandas Data Frame object by Selected columns from Data Set ?
We have to use the usecols=[] argument.
df = pd.read_csv('dataSets/titanic.csv',header=0,usecols=[0,1,2,3,4])
How to Present Datatypes of Dataset ?
df.dtypes
How to override existing data type of Data Set column ?
By passing the dtype argument to read_csv().
dtype={'Column Label':'bool'}
How to typecast the value of a column in Data Set ?
We have to use the dtype argument of the read_csv() function; a dictionary object maps column labels to data types.
dtype={'Column Label':'bool'}
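A sketch of overriding an inferred type with dtype. It uses a made-up Age column and the float64 type rather than bool, purely as an illustration; the CSV text is an assumption.

```python
import io
import pandas as pd

# Hypothetical sample data: Age holds whole numbers only.
csv_text = "Name,Age\nAlice,22\nBob,38\n"

# Without dtype, pandas infers an integer type for Age.
df_default = pd.read_csv(io.StringIO(csv_text))

# The dtype dictionary maps a column label to the wanted type.
df_typed = pd.read_csv(io.StringIO(csv_text), dtype={'Age': 'float64'})

print(df_default.dtypes)
print(df_typed.dtypes)
```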
How to present only first four rows of Data Set ?
df.head(4)
How to present only Last four rows of Data Set ?
df.tail(4)
While presenting datatypes of dataset columns ,if data type is identified as object type what it means ?
It means the column holds generic Python objects, most often strings.
Where to get online help for pandas module ?
Navigate to
https://pandas.pydata.org/pandas-docs/stable/reference/io.html
What is the name of data set which can be used to start learning data science ?
titanic data set,available on kaggle
How to ignore Column Labels while reading data sets ?
We have to use the header=None argument. The first line of the file is then read as a data row; add skiprows=1 if it should be dropped.
df = pd.read_csv('dataSets/titanic.csv',header=None)
How to present index attribute of dataFrame ?
df.index
How to present column attribute of dataFrame ?
df.columns
How to present values attribute of dataFrame ?
df.values
How to preview data frame ?
df
What is the value of empty cell of pandas data frame ?
NaN
How to know how many Rows & Columns are in data Frame ?
We have to use shape attribute of pandas Data Frame object.
df.shape
How to present details of Data Frame ?
We have to use the info() method of pandas DataFrame.
df.info()
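How to present total values of all columns ?
df.count()
How to present columns which have missing values ?
df.count()
Columns whose count is lower than the number of rows have missing values; df.isnull().any() marks them directly.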
How to present unique values of a column ?
df['Sex'].unique()
How to present frequency of unique values in given column ?
df['Sex'].value_counts()
How to normalize value counts of a column ?
Here normalize means that the counts are returned as fractions of the total (multiply by 100 for percentages).
df['Sex'].value_counts(normalize=True)
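A small sketch of value_counts() with and without normalize. The four values in the Series are made up for the example.

```python
import pandas as pd

# Hypothetical column with three males and one female.
sex = pd.Series(['male', 'female', 'male', 'male'], name='Sex')

counts = sex.value_counts()                # absolute frequencies
shares = sex.value_counts(normalize=True)  # fractions that add up to 1

print(counts)
print(shares * 100)  # the same shares expressed as percentages
```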
How to transpose Pandas Data Frame ?
df2 = df.T
What is the minimum age in Titanic Data Set ?
df['Age'].min()
Q.What is the maximum age in Titanic Data Set ?
df['Age'].max()
Q. What is the average age in Titanic Data Set ?
df['Age'].mean()
What is the default behaviour of pandas Data Frame methods such as mean() & sum() with missing values ?
They exclude missing values.
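A sketch showing that missing values are skipped by default. The three ages are made up, and skipna=False is used only to show the contrast.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value.
age = pd.Series([20.0, 30.0, np.nan], name='Age')

default_mean = age.mean()             # NaN is skipped: (20 + 30) / 2
strict_mean = age.mean(skipna=False)  # NaN is kept, so the result is NaN
non_missing = age.count()             # count() also ignores NaN

print(default_mean, strict_mean, non_missing)
```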
Q. How to present the mean of two columns of data frame ?
df[['Age','Fare']].mean()
Q.How to get max of all columns in one notation ?
df.max()
Q.How to know, How many people survived ?
We have to find out sum of all values in Survived Column
df['Survived'].sum()
print(f"Survived People : {(df['Survived'].value_counts())[1]}")
How to know total fare given by passengers ?
df['Fare'].sum()
Q.What is the maximum age of the youngest 80 percent of passengers ?
df['Age'].quantile(0.8)
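A sketch of the 0.8 quantile on five made-up ages. With pandas' default linear interpolation the result lies between the fourth and fifth value, and 80 percent of the ages are at or below it.

```python
import pandas as pd

# Hypothetical ages, already in ascending order for readability.
age = pd.Series([10, 20, 30, 40, 50], name='Age')

# The value below which 80 percent of the ages fall.
q80 = age.quantile(0.8)

# Share of ages that are at or below that value.
share_below = (age <= q80).mean()

print(q80, share_below)
```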
How to get Summary View of Data Set ?
df.describe()
Q. How to describe pandas Data Frame column only ?
df['Age'].describe()
What is Methods Chaining ?
Method chaining is a technique that is used for making multiple method calls on the same object.
df['Age'].dropna().mean()
Q. Drop rows which have missing value cells & find the mean of age; compare this with the non-dropped data frame ?
df.dropna()['Age'].mean()
df['Age'].mean()
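A sketch of why the two means can differ: df.dropna() removes a row when any of its cells is missing, not only when Age is missing. The four rows and the Cabin column values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the first row has no missing cell at all.
df = pd.DataFrame({
    'Age': [20.0, 40.0, np.nan, 60.0],
    'Cabin': ['C85', None, 'E46', None],
})

# mean() skips the missing age: (20 + 40 + 60) / 3
mean_all = df['Age'].mean()

# dropna() keeps only complete rows, so one age is left.
mean_complete = df.dropna()['Age'].mean()

print(mean_all, mean_complete)
```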
Q. How to drop rows which have missing value cells ?
df.dropna()
How to know the index range of Data Set ?
df.index
How to know the columns in Data Set ?
df.columns
How to know the values of DataFrame ?
df.values
How to know the memory usage of Data Frame Object ?
We have to use info() method of pandas DataFrame.
df.info()
How to check the number of missing values in the data set ?
count() returns the number of non-missing values of each column, so the missing values are the difference from the number of rows.
len(df) - df.count()
How to know the percentage of values in DataFrame column,
for example percentage of Males & Females in Sex column ?
We have to use the value_counts() method on the column (a pandas Series).
df['Sex'].value_counts(normalize=True)
What is Transpose of Data Frame ?
DataFrame Transpose is a technique that is used for swapping column labels & row indexes.
It means displaying column labels as row index & row index as column labels.
df2 = df.T
df2
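A small sketch of a transpose on made-up data with three rows and two columns.

```python
import pandas as pd

# Hypothetical data frame: 3 rows, 2 columns.
df = pd.DataFrame(
    {'Age': [22, 38, 26], 'Fare': [7.25, 71.28, 7.92]},
    index=['Alice', 'Bob', 'Carol'],
)

# T swaps the row index and the column labels.
df2 = df.T

print(df.shape, df2.shape)
print(df2)
```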
How to filter column from the output of dataFrame method ?
df.dropna()['Age'].mean()
What is the cause of this error while running Python code ?
TypeError: 'Index' object is not callable
This error comes when we call an object attribute as if it were a method,
like df.columns(); columns is an attribute, not a method.
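A sketch that reproduces the error on purpose and catches it. The one-column data frame is made up for the example.

```python
import pandas as pd

# Hypothetical one-column data frame.
df = pd.DataFrame({'Age': [22, 38]})

# Correct: columns is an attribute, so it is read without parentheses.
labels = list(df.columns)

# Wrong: adding parentheses tries to call the Index object.
try:
    df.columns()
except TypeError as err:
    message = str(err)

print(labels)
print(message)
```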
How to know total missing values of every column by using method chaining ?
df.isnull().sum()
Why do we use the isnull() method, & where does it belong ?
isnull() is a method of pandas DataFrame & Series objects. It returns True for every missing cell; chaining sum() gives the total missing values of every column.
What is the difference between count & isnull method ?
count method returns the number of non-missing values of each column,
while isnull method marks each missing cell as True, so isnull().sum() returns the total missing values of each column.
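A sketch comparing count() with isnull().sum() on made-up data. For every column the two numbers add up to the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing cells in both columns.
df = pd.DataFrame({
    'Age': [22.0, np.nan, 35.0],
    'Cabin': [None, None, 'C85'],
})

non_missing = df.count()                 # non-missing values per column
missing = df.isnull().sum()              # missing values per column
total_missing = df.isnull().sum().sum()  # missing values in the whole frame

print(non_missing)
print(missing)
print(total_missing)
```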
How to present total missing values of Data Set ?
We have to use method chaining for this ,
df.isnull().sum().sum()
What is the alias of isnull method ?
isna
How to sort a pandas Data Frame by a column ?
df.sort_values('Age',ascending=False)
How to filter columns after sorting a Data Frame ?
df.sort_values('Age',ascending=False)[['Name','Age']]
How to print,how many Missing values in a Data Set ?
df.isnull().sum().sum()
How to filter Name & Age,after sorting Data Frame by Age column,present first 10 rows only ?
df.sort_values('Age',ascending=False)[['Name','Age']].head(10)
How many methods are for Data Frame Sorting
sort_values
nlargest
nsmallest
What is Boolean Selection Filtering ?
Boolean Selection refers to selecting rows by providing a boolean value, True or False, for each row. These boolean values are usually created by applying a boolean condition to one or more columns in a Data Frame.
How to make Boolean Selection Filtering Condition ?
condition = df['Age']>60
df[condition]
How to make Boolean Selection Filtering Condition, & present it with loc method ?
condition = df['Age']>60
df.loc[condition,['Name','Age']]
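A sketch of boolean selection on three made-up passengers. The condition is itself a Series of True and False values, one per row.

```python
import pandas as pd

# Hypothetical passengers.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [22, 65, 71],
    'Sex': ['female', 'male', 'female'],
})

# One boolean per row: True where the age is above 60.
condition = df['Age'] > 60

# loc takes the row filter first, then the columns to keep.
older = df.loc[condition, ['Name', 'Age']]

print(condition.tolist())
print(older)
```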
Select Passengers whose age is greater than 60 ?
df[df['Age']>60]
Select all Female Passengers in titanic Data Set ?
df[df['Sex']=='female']
Sort titanic Data Set by two columns , & display first 10 selected rows only with Name,Age,Sex?
df.sort_values(['Age','Sex'],ascending=[False,True])[['Name','Age','Sex']].head(10)
Present ten rows which have maximum age ?
df.nlargest(10,'Age')
Present ten rows which have minimum age by using the shortest notation ?
df.nsmallest(10,'Age')
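A sketch of nlargest() and nsmallest() on four made-up rows, next to the longer sort_values() plus head() form that selects the same rows.

```python
import pandas as pd

# Hypothetical passengers.
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Age': [30, 70, 5, 50],
})

oldest = df.nlargest(2, 'Age')     # two highest ages, highest first
youngest = df.nsmallest(2, 'Age')  # two lowest ages, lowest first

# The longer equivalent of nlargest for this data.
oldest_sorted = df.sort_values('Age', ascending=False).head(2)

print(oldest)
print(youngest)
```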
What is Aggregation ?
Aggregation is the process of grouping rows & reducing each group to a single value.
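What is the average ticket price for male compared to female passengers ?
averageTicketPriceFemale = df[df['Sex']=='female']['Fare'].mean()
averageTicketPriceMale = df[df['Sex']=='male']['Fare'].mean()
print(f"Average Ticket Price of Female Passengers : {averageTicketPriceFemale:,.2f}")
print(f"Average Ticket Price of Male Passengers : {averageTicketPriceMale:,.2f}")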
What are aggregation Functions ?
Aggregation functions make one value from many items.
Examples are mean(), sum(), max(), count(), etc.
What is the average age of survived people ?
df.groupby('Survived')['Age'].mean()
First we group the rows by the Survived column, then we take the mean of Age within each group: 0 for passengers who did not survive & 1 for those who survived.
How many people survived, grouped by where they boarded the ship ?
df.groupby('Embarked')['Survived'].sum()
How to find maximum age in Data Set ?
df['Age'].max()
How to use multiple aggregate functions ?
df.groupby('Embarked')['Age'].agg(['count','mean','min','max'])
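A sketch of grouped aggregation on four made-up rows. It shows sum() counting survivors per port, since Survived holds 0 or 1, and agg() applying several functions at once.

```python
import numpy as np
import pandas as pd

# Hypothetical passengers from two ports.
df = pd.DataFrame({
    'Embarked': ['S', 'S', 'C', 'C'],
    'Survived': [0, 1, 1, 1],
    'Age': [20.0, 40.0, 30.0, np.nan],
})

# Adding up the 0/1 flags gives the number of survivors per port.
survivors = df.groupby('Embarked')['Survived'].sum()

# agg() takes a list of function names and returns one column per function.
stats = df.groupby('Embarked')['Age'].agg(['count', 'mean', 'min', 'max'])

print(survivors)
print(stats)
```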
What does NaN stand for ?
Not a Number