Pandas - Loading Dataset Files Flashcards
Where to download dataset files?
Navigate to https://www.kaggle.com/
How to download the Titanic dataset?
Navigate to https://www.kaggle.com/
Search for "titanic" in the search bar.
Download train.csv (the cards below refer to the saved file as titanic.csv).
How to import a CSV file in Python?
Import the pandas module and call its read_csv() function (pd.read_csv()); it reads the file and returns a new DataFrame object.
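A minimal sketch of the card above, assuming pandas is installed. So that it runs without the download, io.StringIO stands in for a file path such as 'dataSets/titanic.csv', and the rows are made-up values in a few Titanic column names.

```python
import io

import pandas as pd

# Made-up rows using a few Titanic column names; io.StringIO acts like an open file.
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,
4,0,3,male,35
"""

# read_csv is a function of the pandas module and returns a new DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df).__name__)  # DataFrame
print(df.shape)           # (4, 5)
```

With the real file, the call is simply pd.read_csv('dataSets/titanic.csv').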
How to get the current working directory?
import os
print(os.getcwd())
How to list the contents of a directory using Python?
import os
print(os.listdir())
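Before loading a file, os.getcwd() and os.listdir() can confirm that a relative path will resolve. A small sketch; the folder name dataSets is taken from the paths in these cards and may differ on your machine.

```python
import os

# Relative paths passed to pd.read_csv() are resolved against this directory.
cwd = os.getcwd()
print("Working directory:", cwd)

# os.listdir() with no argument lists the current working directory.
csv_files = sorted(name for name in os.listdir() if name.lower().endswith(".csv"))
print("CSV files here:", csv_files)

# Assumed location of the dataset, matching the paths used in these cards.
path = os.path.join("dataSets", "titanic.csv")
print(path, "exists:", os.path.isfile(path))
```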
How to show the first 10 rows of a dataset?
df.head(10)
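A quick check of head(n) on a hypothetical 20-row frame built in memory (the values are illustrative): it returns the first n rows as a new DataFrame.

```python
import pandas as pd

# Hypothetical frame with 20 rows.
df = pd.DataFrame({"PassengerId": range(1, 21), "Age": range(20, 40)})

first_ten = df.head(10)
print(first_ten)
print(len(first_ten))  # 10

# If n exceeds the number of rows, head() simply returns every row.
print(len(df.head(50)))  # 20
```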
How to load a CSV file with no header?
Use the header=None argument; pandas then labels the columns 0, 1, 2, ... and treats every line as data.
df = pd.read_csv('titanic.csv', header=None)
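A sketch of what header=None does, using io.StringIO with made-up rows instead of the real file: the columns are labeled 0, 1, 2, ..., and a label line in the file becomes the first data row.

```python
import io

import pandas as pd

csv_text = """PassengerId,Survived,Pclass
1,0,3
2,1,1
"""

df = pd.read_csv(io.StringIO(csv_text), header=None)

print(df.columns.tolist())  # [0, 1, 2]
print(df.shape)             # (3, 3): the label line counts as a row
print(df.iloc[0].tolist())  # ['PassengerId', 'Survived', 'Pclass']
```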
How to load a CSV file and indicate which line holds the column labels?
Use the header=0 argument: the first line of the file contains the labels (this matches the default behavior when no names are passed).
df = pd.read_csv('dataSets/titanic.csv', header=0)
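A sketch comparing header=0 with the default, on made-up rows read through io.StringIO: when no names are passed, pandas already takes the first line as labels, so both calls produce the same frame.

```python
import io

import pandas as pd

csv_text = """PassengerId,Survived,Pclass
1,0,3
2,1,1
"""

df_header0 = pd.read_csv(io.StringIO(csv_text), header=0)
df_default = pd.read_csv(io.StringIO(csv_text))

print(df_header0.columns.tolist())    # ['PassengerId', 'Survived', 'Pclass']
print(df_header0.equals(df_default))  # True
```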
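How to load a dataset without its own column labels and add some text to the generated labels?
Use the prefix argument together with header=None; pandas then generates the labels Col-0, Col-1, ... Note that prefix was deprecated in pandas 1.4 and removed in pandas 2.0; on newer versions, load with header=None and call df.add_prefix('Col-') instead.
df = pd.read_csv('dataSets/titanic.csv', header=None, prefix='Col-')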
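Because read_csv() no longer accepts prefix in pandas 2.0 and later, here is a version-independent sketch that produces the same labels with DataFrame.add_prefix(). The rows are made up and io.StringIO stands in for the file.

```python
import io

import pandas as pd

# Made-up rows with no header line.
csv_text = """1,0,3,male
2,1,1,female
"""

# Load with the default integer labels 0, 1, 2, 3 ...
df = pd.read_csv(io.StringIO(csv_text), header=None)

# ... then prepend the text to every column label.
df = df.add_prefix("Col-")
print(df.columns.tolist())  # ['Col-0', 'Col-1', 'Col-2', 'Col-3']
```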
How to assign column names manually while loading a CSV file?
Create a list colNames and pass it to the names argument of read_csv(). If the file already has a header row, also pass header=0 so that row is replaced instead of being read as data.
colNames = ['Col-1', 'Col-2', 'Col-3', 'Col-4']
df = pd.read_csv('dataSets/titanic.csv', header=0, usecols=[1, 3, 4, 5], names=colNames)
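A sketch of names together with header=0, on made-up rows read through io.StringIO. header=0 matters here: without it, the file's own label line is loaded as a data row under the new names.

```python
import io

import pandas as pd

csv_text = """PassengerId,Survived,Pclass,Sex
1,0,3,male
2,1,1,female
"""
colNames = ["Col-1", "Col-2", "Col-3", "Col-4"]

# header=0 marks line 0 as the header to replace; names supplies the new labels.
df = pd.read_csv(io.StringIO(csv_text), header=0, names=colNames)
print(df.columns.tolist())  # ['Col-1', 'Col-2', 'Col-3', 'Col-4']
print(len(df))              # 2

# Without header=0, the old label line becomes the first data row.
df_no_header = pd.read_csv(io.StringIO(csv_text), names=colNames)
print(len(df_no_header))    # 3
```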
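How to add your own text to column labels while loading a dataset?
Use the prefix argument together with header=None (pandas before 2.0 only; prefix was removed in pandas 2.0, where df.add_prefix('Col-') after loading does the same job).
df = pd.read_csv('dataSets/titanic.csv', header=None, prefix='Col-')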
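On pandas 2.0 and later, where prefix is gone, one option is to overwrite df.columns with a list comprehension. This sketch numbers the labels from 1, matching the colNames style used in these cards; the rows are made up and io.StringIO replaces the file.

```python
import io

import pandas as pd

csv_text = """1,0,3
2,1,1
"""

df = pd.read_csv(io.StringIO(csv_text), header=None)

# Build one label per column: Col-1, Col-2, ...
df.columns = [f"Col-{i}" for i in range(1, df.shape[1] + 1)]
print(df.columns.tolist())  # ['Col-1', 'Col-2', 'Col-3']
```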
How to use a column's values as the row index?
Use the index_col argument of read_csv(); it accepts a column position or a column label.
df = pd.read_csv('dataSets/titanic.csv', header=0, index_col=3)
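A sketch of index_col on made-up rows read via io.StringIO. The chosen column is taken out of the regular columns and its values label the rows, so df.loc can look a row up by that value.

```python
import io

import pandas as pd

csv_text = """PassengerId,Survived,Pclass,Sex
1,0,3,male
2,1,1,female
"""

# index_col accepts a position (0) or a label ("PassengerId").
df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0)
same = pd.read_csv(io.StringIO(csv_text), index_col="PassengerId")

print(df.index.name)        # PassengerId
print(df.columns.tolist())  # ['Survived', 'Pclass', 'Sex']
print(df.loc[2, "Sex"])     # female
print(df.equals(same))      # True
```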
How to create a pandas DataFrame from selected columns of a dataset?
Use the usecols argument with a list of column positions (or column labels).
df = pd.read_csv('dataSets/titanic.csv', header=0, usecols=[0, 1, 2, 3, 4])
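A sketch of usecols on made-up rows read via io.StringIO. Positions and labels select the same columns; pandas keeps the file's column order, not the order of the usecols list.

```python
import io

import pandas as pd

csv_text = """PassengerId,Survived,Pclass,Sex
1,0,3,male
2,1,1,female
"""

by_position = pd.read_csv(io.StringIO(csv_text), header=0, usecols=[3, 0])
by_label = pd.read_csv(io.StringIO(csv_text), usecols=["Sex", "PassengerId"])

print(by_position.columns.tolist())  # ['PassengerId', 'Sex']
print(by_position.equals(by_label))  # True
```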
How to show the data types of a dataset's columns?
df.dtypes
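A sketch of what df.dtypes reports after read_csv() infers types, on made-up rows read via io.StringIO. Integer text becomes int64, and a numeric column with an empty cell becomes float64 because NaN is a float; text columns are reported as object (newer pandas releases may show a dedicated string dtype instead).

```python
import io

import pandas as pd

csv_text = """PassengerId,Sex,Age
1,male,22
2,female,
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
# PassengerId: int64; Age: float64 (the empty cell is NaN); Sex: text
```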
How to override the inferred data type of a dataset column?
Pass the dtype argument to read_csv() with a dictionary that maps column labels to types.
dtype={'Column Label': 'bool'}
How to typecast the values of a column in a dataset?
Use the dtype argument of read_csv(); it takes a dictionary object mapping column labels to types.
dtype={'Column Label': 'bool'}
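A sketch of the dtype dictionary on made-up rows read via io.StringIO; the target types float64 and category are illustrative choices. astype() performs a similar conversion on a frame that is already loaded.

```python
import io

import pandas as pd

csv_text = """PassengerId,Pclass,Sex
1,3,male
2,1,female
"""

# Map column labels to the types pandas should use instead of its own inference.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"PassengerId": "float64", "Pclass": "category"},
)
print(df.dtypes)

# A similar change after loading, using astype().
df2 = pd.read_csv(io.StringIO(csv_text)).astype({"Pclass": "category"})
print(df2["Pclass"].dtype)  # category
```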
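How to show only the first four rows of a dataset?
df.head(4) (df.head() with no argument returns the first five rows)
How to show only the last four rows of a dataset?
df.tail(4) (df.tail() with no argument returns the last five rows)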
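A quick check on a hypothetical 10-row frame that head() and tail() default to five rows and that an explicit n changes the count.

```python
import pandas as pd

df = pd.DataFrame({"PassengerId": range(1, 11)})

print(len(df.head()))                      # 5 (default)
print(df.head(4)["PassengerId"].tolist())  # [1, 2, 3, 4]
print(df.tail(4)["PassengerId"].tolist())  # [7, 8, 9, 10]
```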
When the data types of a dataset's columns are shown, what does the object type mean?
It means the column holds generic Python objects; in practice this is usually a column of strings, or of mixed types.
Where to get online help for the pandas module?
Navigate to the pandas input/output API reference: https://pandas.pydata.org/pandas-docs/stable/reference/io.html
What is the name of a dataset that can be used to start learning data science?
The Titanic dataset, available on Kaggle.
How to ignore column labels while reading a dataset?
Use the header=None argument. The original label line is then read as the first data row; add skiprows=1 to drop it.
df = pd.read_csv('dataSets/titanic.csv', header=None)
How to show the index attribute of a DataFrame?
df.index
How to show the columns attribute of a DataFrame?
df.columns
How to show the values attribute of a DataFrame?
df.values (pandas recommends df.to_numpy() as the equivalent method)
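A sketch of the three attributes on a small hypothetical frame: index labels the rows, columns labels the columns, and values (or to_numpy()) returns the data as a 2-D NumPy array.

```python
import pandas as pd

df = pd.DataFrame({"Pclass": [3, 1, 3], "Age": [22.0, 38.0, 26.0]})

print(df.index)             # RangeIndex(start=0, stop=3, step=1)
print(df.columns)
print(df.values)            # 2-D array, one row per DataFrame row
print(df.to_numpy().shape)  # (3, 2)
```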
How to preview a DataFrame?
df (in a notebook; in a script, use print(df))
What value does an empty cell get in a pandas DataFrame?
NaN ("not a number"), the marker pandas uses for missing values.
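A sketch with made-up rows read via io.StringIO: the empty Age cell is loaded as NaN, which pd.isna() and Series.isna() detect. Comparing with == does not work, because NaN is not equal to itself.

```python
import io

import pandas as pd

csv_text = """Sex,Age
male,22
female,
"""

df = pd.read_csv(io.StringIO(csv_text))
value = df.loc[1, "Age"]

print(value)                   # nan
print(pd.isna(value))          # True
print(value == value)          # False: NaN never equals itself
print(df["Age"].isna().sum())  # 1
```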
How to find out how many rows and columns a DataFrame has?
Use the shape attribute of the DataFrame object; it is a (rows, columns) tuple.
df.shape
How to show the details of a DataFrame?
Use the info() method of the DataFrame; it prints the index range, each column's non-null count and dtype, and the memory usage.
df.info()
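A sketch of the shape and info() cards on made-up rows read via io.StringIO. info() prints by default; passing a text buffer through its buf argument captures the report as a string instead.

```python
import io

import pandas as pd

csv_text = """PassengerId,Sex,Age
1,male,22
2,female,
3,female,26
"""

df = pd.read_csv(io.StringIO(csv_text))

rows, cols = df.shape
print(rows, cols)  # 3 3

df.info()  # prints index range, non-null counts, dtypes, memory usage

# Capture the same report as text instead of printing it.
buf = io.StringIO()
df.info(buf=buf)
report = buf.getvalue()
```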
How to show the number of values in every column?
df.count() (counts the non-missing values in each column)
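How to find the columns that have missing values?
df.isnull().sum() (columns with a count above zero have missing values; df.count() gives the non-missing counts, which can be compared with len(df))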
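A sketch comparing the two approaches on made-up rows read via io.StringIO (Cabin is a Titanic column name; the values here are illustrative).

```python
import io

import pandas as pd

csv_text = """PassengerId,Sex,Age,Cabin
1,male,22,
2,female,38,C85
3,female,,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Missing cells per column, then keep only the columns with at least one.
missing = df.isnull().sum()
print(missing[missing > 0])  # Age 1, Cabin 2

# count() gives the opposite view: non-missing cells per column.
print(df.count())
print((df.count() < len(df)).sum())  # 2 columns have gaps
```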
How to show the unique values of a column?
df['Sex'].unique()
How to show the frequency of each unique value in a column?
df['Sex'].value_counts()
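A sketch on a hypothetical Sex column: unique() lists each distinct value once, in order of first appearance, and value_counts() counts them, most frequent first; normalize=True returns proportions instead of counts.

```python
import pandas as pd

df = pd.DataFrame({"Sex": ["male", "female", "female", "male", "male"]})

print(list(df["Sex"].unique()))  # ['male', 'female']

counts = df["Sex"].value_counts()
print(counts)                    # male 3, female 2

shares = df["Sex"].value_counts(normalize=True)
print(shares)                    # male 0.6, female 0.4
```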