Pandas Flashcards

Question 1

Q

DataFrame:

Create a dataframe from scratch

Answer

A

First create the lists:
import pandas as pd
age = [25,20,26,30]
height = [150,160,180,170]
names = ['Mārtiņš','Aiga','Kristiāns','Valters']
column_names = ['Age','Height','Names']

list_cols = [age, height, names]  # same order as column_names

zipped = list(zip(column_names,list_cols))

data = dict(zipped)

df = pd.DataFrame(data)

Question 2

Q

DataFrame:

Add a new column with values 0

Answer

A

df['Salary'] = 0

Question 3

Q

DataFrame:

Rename columns

Answer

A

df.columns = ['name', 'school', 'age']

Question 4

Q

DataFrame:

Rename indexes

Answer

A

df.index = ['A', 'B', 'C']

Question 5

Q

DataFrame:

Create a dictionary from lists

Answer

A

list1 = [  ]
list2 = [  ]
column_names = [  ]

columns = [list1, list2]

created_tuples = list(zip(column_names, columns))

created_dict = dict(created_tuples)

Question 6

Q

Get how many rows and columns the DataFrame has

Answer

A

my_dataframe.shape

Question 7

Q

Get the column names

Answer

A

my_dataframe.columns

Question 8

Q

DataFrame: 
Slice the dataframe.
1) first 5 rows
2) last 5 rows
3) columns 3 to 5 included
4) each 3rd row

Answer

A

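my_data.iloc[:5, :]     # 1) first 5 rows (end position is exclusive)
my_data.iloc[-5:, :]    # 2) last 5 rows
my_data.iloc[:, 3:6]    # 3) column positions 3, 4 and 5
my_data.iloc[::3, :]    # 4) every 3rd row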
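A small runnable check of the slices above, using a hypothetical 10-row, 7-column frame. It shows that iloc end positions are exclusive, which is why :5 returns exactly five rows.

```python
import pandas as pd

# Hypothetical 10x7 frame filled with consecutive integers
my_data = pd.DataFrame([[r * 7 + c for c in range(7)] for r in range(10)],
                       columns=list('abcdefg'))

first_five = my_data.iloc[:5, :]     # row positions 0..4
last_five = my_data.iloc[-5:, :]     # row positions 5..9
cols_3_to_5 = my_data.iloc[:, 3:6]   # column positions 3, 4, 5 -> d, e, f
every_third = my_data.iloc[::3, :]   # row positions 0, 3, 6, 9

print(first_five.shape, list(cols_3_to_5.columns), list(every_third.index))
```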

Question 9

Q

DataFrame:

see the first 10 rows quickly

Answer

A

my_data.head(10)

Question 10

Q

DataFrame:

see the last 8 rows quickly

Answer

A

my_data.tail(8)

Question 11

Q

DataFrame:

see the column names, their dtypes and non-null counts quickly

Answer

A

my_data.info()

Question 12

Q

DataFrame:

assign a value to some element in DataFrame

Answer

A

my_data.iloc[5,10] = 29

Question 13

Q

DataFrame:
assign NaN to every 3rd row in the last column.
Which rows will be affected?

Answer

A

import numpy as np
my_data.iloc[::3,-1] = np.nan

nan
unchanged
unchanged
nan
unchanged
unchanged
nan
...
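A quick check of which rows the assignment hits, on a hypothetical 7-row frame whose last column is y:

```python
import numpy as np
import pandas as pd

# Hypothetical 7-row frame; 'y' is the last column
my_data = pd.DataFrame({'x': range(7), 'y': [1.0] * 7})
my_data.iloc[::3, -1] = np.nan

# Row positions 0, 3 and 6 now hold NaN in the last column
nan_rows = list(my_data.index[my_data['y'].isna()])
print(nan_rows)  # [0, 3, 6]
```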

Question 14

Q

DataFrame:

transform DataFrame to numpy array

Answer

A

my_data.values
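pandas also offers DataFrame.to_numpy(), which the pandas documentation recommends over .values. A minimal sketch with a hypothetical two-column frame:

```python
import numpy as np
import pandas as pd

my_data = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
arr = my_data.to_numpy()   # same data as my_data.values, one row per DataFrame row
print(type(arr), arr.shape)
```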

Question 15

Q

DataFrame:

get the index column

Answer

A

my_data.index

Question 16

Q

DataFrame:

Create a new column with some values

Answer

A

df['new_col'] = 0

Question 17

Q

DataFrame:

Assign new names for the columns

Answer

A

df.columns = ['name', 'surname', 'age']

Question 18

Q

DataFrame: 
Fully define a pandas CSV import with:
1) custom column names,
2) multiple values that indicate invalid data,
3) the symbol that separates values,
4) a column transformed to datetime,
5) the first few rows skipped.

Answer

A

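path = '/folder/file.csv'
col_names = ['name', 'surname', 'age', 'date']  # 'date' is an example column
# header=None: no header row; names: 1); na_values: 2); sep: 3); parse_dates: 4); skiprows: 5)
pd.read_csv(path, header=None, names=col_names, na_values=['-1', '?'], sep=',',
            parse_dates=['date'], skiprows=3)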

Question 19

Q

DataFrame:

If a CSV has year, month and day in separate columns, how do you call read_csv so that it combines the 3 columns into one?

Answer

A

pd.read_csv(path, parse_dates=[[0, 1, 2]])  # nested-list form deprecated in pandas 2.2
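Combining columns through nested lists in parse_dates was deprecated in pandas 2.2. A version-independent sketch, assuming hypothetical columns named year, month and day, reads the file normally and lets pd.to_datetime assemble the date:

```python
import io
import pandas as pd

# Hypothetical CSV text with the date split over three columns
csv_text = "year,month,day,value\n2019,2,25,10\n2019,3,25,20\n"
df = pd.read_csv(io.StringIO(csv_text))

# pd.to_datetime assembles datetimes from columns named year, month, day
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
df = df.drop(columns=['year', 'month', 'day'])
print(df)
```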

Question 20

Q

DataFrame:

Get rid of one column in dataframe. How would you do it?

Answer

A

If the df has more columns than needed, define the meaningful column names:

meaningful_columns = ['name', 'surname']

then reassign the DataFrame to that subset of columns:

df = df[meaningful_columns]

Question 21

Q

DataFrame:

Export the DataFrame to CSV or Excel

Answer

A

path = 'my_file.csv'

df.to_csv(path)

path2 = 'my_file2.xlsx'

df.to_excel(path2)  # needs an Excel engine such as openpyxl

Question 22

Q

DataFrame:

If the data has a column with dates that you want to use as the index, how would you import that CSV?

Answer

A

pd.read_csv(path, index_col='dates', parse_dates=True)

Question 23

Q

DataFrame:

Show the names of the plot lines (a legend) on the plot.

Answer

A

import matplotlib.pyplot as plt

df['open'].plot(legend=True)
df['close'].plot(legend=True)
plt.show()

Question 24

Q

DataFrame:

Plot specific columns (not the index).

Answer

A

df.plot(x='Month', y=['salary', 'overhead'])

plt.show()

Question 25

Q

DataFrame:

Create a scatter plot with different sizes of dots.

Answer

A

df.plot(kind='scatter', x='year', y='age', s=df['size'])
plt.show()

Question 26

Q

DataFrame:

Plot two columns separately. Create a box plot.

Answer

A

df[['first', 'second']].plot(kind='box', subplots=True)

plt.show()

Question 27

Q

DataFrame:
Create CDF and PDF plots in two rows. Normalize the vertical axis. Make the horizontal bins finer. Set the horizontal range (from … to …).

Answer

A

fig, axes = plt.subplots(nrows=2, ncols=1)

Plot the PDF (density=True replaces normed=True, which newer Matplotlib versions removed)

df.fraction.plot(ax=axes[0], kind='hist', density=True, bins=30, range=(0, .3))

Plot the CDF

df.fraction.plot(ax=axes[1], kind='hist', density=True, cumulative=True, bins=30, range=(0, .3))
plt.show()

Question 28

Q

DataFrame:

Get the statistical information about the dataset.

Answer

A

df.describe()

Question 29

Q

DataFrame:

How many non-null entries are in a DataFrame column?

Answer

A

df['my_col'].count()

Question 30

Q

DataFrame:

Get the mean of a DataFrame column. What does it do to null values?

Answer

A

df['my_col'].mean()  # NaN values are skipped by default (skipna=True)

Question 31

Q

DataFrame:

Get the standard deviation of a DataFrame column. What does it do to null values?

Answer

A

df['my_col'].std()  # NaN values are skipped by default (skipna=True)
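A small sketch showing how mean() and std() treat NaN, using a hypothetical Series with one missing value:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
print(s.mean())              # 2.0: the NaN is skipped
print(s.mean(skipna=False))  # nan: the NaN propagates
print(s.std())               # sample std (ddof=1) of [1.0, 3.0], about 1.414
```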

Question 32

Q

DataFrame:

What is a median? How to get it for a DataFrame column?

Answer

A

The median is the middle value of the sorted data (the 50th percentile).

df['my_col'].median()

Question 33

Q

DataFrame: 
First quartile : 25%
Second quartile : ....?
Third quartile : ....?
Get the value of the 20th percentile.

Answer

A

Second quartile: 50% (the median)
Third quartile: 75%

df.quantile(0.2)

Question 34

Q

DataFrame:

Get the inter-quantile range between the 0.2 and 0.8 quantiles.

Answer

A

q = df.quantile([0.2, 0.8])
q.loc[0.8] - q.loc[0.2]  # the range is the difference between the two quantiles
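A worked example of quantiles and the range between them on the hypothetical Series 0..10, where the default linear interpolation lands exactly on whole numbers:

```python
import pandas as pd

s = pd.Series(range(11))        # 0, 1, ..., 10
q = s.quantile([0.2, 0.8])      # linear interpolation by default
spread = q.loc[0.8] - q.loc[0.2]
print(q.tolist(), spread)       # [2.0, 8.0] 6.0
```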

Question 35

Q

DataFrame:

Get the minimum and maximum values.

Answer

A

df.min()

df.max()

Question 36

Q

DataFrame:
Get the mean value calculated over
1) each row
2) each column

Answer

A

#mean calculated for each row
df.mean(axis = 1)

#mean calculated for each column
df.mean(axis = 0)
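A minimal sketch of the axis argument on a hypothetical 2x2 frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3], 'b': [5, 7]})
row_means = df.mean(axis=1)   # one value per row: 3.0, 5.0
col_means = df.mean(axis=0)   # one value per column: a=2.0, b=6.0
```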

Question 37

Q

DataFrame:

What information can you get with df['species'].describe() when the species column has categorical values?

Answer

A

df['species'].describe()
gives:
count - number of non-null values
unique - number of distinct values
top - the most frequent value
freq - occurrences of top
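A sketch of the output fields on a hypothetical categorical Series:

```python
import pandas as pd

species = pd.Series(['cat', 'dog', 'cat', 'bird'], name='species')
summary = species.describe()   # index: count, unique, top, freq
print(summary)
```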

Question 38

Q

DataFrame:

Get list of values in a column without repeating.

Answer

A

df['species'].unique()

Question 39

Q

DataFrame:

Add transparency to a plot

Answer

A

df.plot(alpha=0.8)

plt.show()

Question 40

Q

DataFrame:
Create two separate DataFrames from one by subsetting: one DF has the rows about 'Dog' and the other the rows about 'Cat'. The main DF has a column 'Animal'.

Answer

A

dog_indices = df['Animal'] == 'Dog'
cat_indices = df['Animal'] == 'Cat'
dog_df = df.loc[dog_indices]
cat_df = df.loc[cat_indices]
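A runnable sketch of boolean-mask subsetting, assuming a hypothetical frame with an 'Animal' column:

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Dog', 'Cat', 'Dog', 'Bird'], 'age': [3, 5, 1, 2]})
dog_indices = df['Animal'] == 'Dog'    # boolean Series, one value per row
dog_df = df.loc[dog_indices]           # keeps only rows where the mask is True
cat_df = df.loc[df['Animal'] == 'Cat']
```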

Question 41

Q

DataFrame:
The DataFrame has a datetime index.
1) Get rows in a specific datetime range from a specific column.
2) Get rows on a specific day from a specific column.

Answer

A

df.loc['2019-02-25':'2019-03-25', 'product']

df.loc['2019-02-25', 'product']

Question 43

Q

DataFrame:

Transform a list of datetime strings to pandas datetime objects

Answer

A

string = ['2019-02-25', '2019-03-25']

pd.to_datetime(string)

Question 44

Q

DataFrame:
Have a dataframe df1 adapt to a new datetime index from another dataframe df2.

What will be the result if df1 doesn't have data on some of the dates in df2?

Answer

A

df1.reindex(df2.index)

Dates in df2 that df1 has no data for are filled with NaN.

Question 45

Q

DataFrame:
Have a dataframe df1 adapt to a new datetime index from another dataframe df2. If df1 doesn't have data on all df2 dates, have the empty places be the nearest preceding entry in df1.

Answer

A

df1.reindex(df2.index, method='ffill')

Question 46

Q

DataFrame:
Have a dataframe df1 adapt to a new datetime index from another dataframe df2. If df1 doesn’t have data on all df2 dates, have the empty places be the nearest next entry in df1.

Answer

A

df1.reindex(df2.index, method='bfill')
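A small comparison of the three reindex variants, assuming hypothetical daily data where df1 is missing 2019-01-02:

```python
import pandas as pd

df1 = pd.DataFrame({'value': [1.0, 3.0]},
                   index=pd.to_datetime(['2019-01-01', '2019-01-03']))
df2 = pd.DataFrame(index=pd.to_datetime(['2019-01-01', '2019-01-02', '2019-01-03']))

plain = df1.reindex(df2.index)                     # 2019-01-02 -> NaN
forward = df1.reindex(df2.index, method='ffill')   # 2019-01-02 -> 1.0 (previous entry)
backward = df1.reindex(df2.index, method='bfill')  # 2019-01-02 -> 3.0 (next entry)
```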

Question 47

Q

DataFrame:
List contains this:
time_list = ['20100101 00:00', '20100101 01:00', '20100101 02:00']

How to transform it to datetime?

Answer

A

pd.to_datetime(time_list, format='%Y%m%d %H:%M')
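A runnable check of the format string. %Y%m%d matches the compact date part, which has no separators:

```python
import pandas as pd

time_list = ['20100101 00:00', '20100101 01:00', '20100101 02:00']
times = pd.to_datetime(time_list, format='%Y%m%d %H:%M')
print(times)
```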

Question 48

Q

DataFrame:
We have a dataframe df1 that has hourly entries.

Create a df2 that shows the daily mean, sum, max, etc.
How to get second, minute, hourly, daily, weekly, monthly, yearly calculations?

How do you get calculations over say 3 hour periods?

Answer

A

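df2 = df1.resample('D').mean()  # or .sum(), .max(), etc.

Frequency aliases (pandas 2.2+ spelling in parentheses):
Seconds S (s)
Minutes T or min (min)
Hours H (h)
Days D
Weeks W
Months M (ME)
Years A (YE)

To get 3-hour periods, prefix the alias with the multiple: 3H (3h in pandas 2.2+).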
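A sketch of resampling hypothetical hourly data. Offset objects such as pd.offsets.Hour(3) are used here instead of string aliases, which sidesteps the alias renaming in pandas 2.2:

```python
import pandas as pd

# Hypothetical hourly data: two days, values 0..47
idx = pd.date_range('2019-01-01', periods=48, freq=pd.offsets.Hour(1))
df1 = pd.DataFrame({'value': range(48)}, index=idx)

daily_mean = df1.resample('D').mean()                  # 2 rows, one per day
daily_sum = df1.resample('D').sum()
three_hourly = df1.resample(pd.offsets.Hour(3)).max()  # 16 rows of 3-hour bins
```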

Question 49

Q

DataFrame:

How do you get the maximum of the monthly means if the df has daily entries?

Answer

A

df.resample('M').mean().max()

Question 50

Q

DataFrame:

Get the rolling mean over a defined window of time.

Answer

A

df.rolling(window=24).mean()  # window of 24 rows, e.g. 24 hours of hourly data
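A count-based window counts rows, not units of time. On a DatetimeIndex, a time-based window can be passed instead. A sketch with hypothetical daily data comparing the two:

```python
import pandas as pd

idx = pd.date_range('2019-01-01', periods=5, freq='D')
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

# 3 rows; the first 2 results are NaN because the window is not yet full
count_based = s.rolling(window=3).mean()
# 3 days of time; partial windows at the start are allowed by default
time_based = s.rolling(window=pd.Timedelta(days=3)).mean()
```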

Question 51

Q

DataFrame:

Call read_csv and specify which column should be transformed to datetime objects.

Answer

A

pd.read_csv(path, parse_dates=['my dates'])

Question 52

Q

DataFrame:

Create a Series with a column's text in upper or lower case.

Answer

A

df['company'].str.upper()
df['company'].str.lower()

Question 53

Q

DataFrame:

Create a boolean Series that is True for rows containing the word 'ware'.

Answer

A

df['company'].str.contains('ware')

Question 54

Q

DataFrame:

Count the number of rows that contain the letter combination 'ware' in a column.

Answer

A

df['product'].str.contains('ware').sum()

The first part returns a Series of True/False values; sum() counts each True as 1 (True + True = 2, etc.).
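If the column contains missing values, str.contains returns NaN for them; passing na=False treats them as non-matches. A sketch with a hypothetical product column:

```python
import pandas as pd

product = pd.Series(['Software', 'Hardware', 'Food', None])
has_ware = product.str.contains('ware', na=False)   # missing values count as False
print(has_ware.sum())  # 2
```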

Question 55

Q

DataFrame:

Extract the value of hours from df datetime column.

Answer

A

df['my date'].dt.hour

Question 56

Q

DataFrame:
df has the population of the world for every decade. Upsample the df to every year and fill in the blanks with linearly changing values.

Answer

A

df.resample('A').first().interpolate('linear')  # 'A' (year end) is spelled 'YE' in pandas 2.2+
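If the decade labels are plain integer years rather than dates, an alternative (a swapped-in technique, not the resample approach) is to reindex to every year and interpolate. A sketch with hypothetical population values:

```python
import pandas as pd

# Hypothetical decade data indexed by integer year (values are made up)
pop = pd.Series([100.0, 200.0], index=[1950, 1960])

# Insert every year as NaN, then fill the gaps along a straight line
yearly = pop.reindex(range(1950, 1961)).interpolate('linear')
```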

Question 57

Q

DataFrame:

Strip the whitespaces from a df header.

Answer

A

df.columns = df.columns.str.strip()

Question 58

Q

DataFrame:

Two columns contain date and time. Create series to combine both and have the type of series as Datetime.

Answer

A

pd.to_datetime(la['Date (MM/DD/YYYY)'] + ' ' + la['Wheels-off Time'])
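A runnable sketch of the same idea with hypothetical column names date and time. Passing format makes the parsing explicit:

```python
import pandas as pd

# Hypothetical date and time columns stored as strings
df = pd.DataFrame({'date': ['01/01/2015', '01/02/2015'], 'time': ['06:15', '14:40']})
combined = pd.to_datetime(df['date'] + ' ' + df['time'], format='%m/%d/%Y %H:%M')
```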

Question 59

Q

DataFrame:
How to define line style in plots?
What are the options?

Answer

A

df.plot(style='k.-')

colors:
k - black
b - blue
c - cyan
r - red

marker type:
. dots
o circles
s squares
* stars

line type:
: dotted
- solid

Question 60

Q

DataFrame:

Fill everything below a plot line with color.

Answer

A

df.plot(kind='area')

Question 61

Q

DataFrame:
You have a df with columns:
index date temperature

Make the date column the index.

Answer

A

df.set_index('date', inplace=True)

Question 62

Q

DataFrame:
You have a string that contains all the column names.

String looks like this: cols = 'Wban,date,Time,StationType,sky_condition,sky_conditionFlag'

How would you make this the header of a DataFrame?

Answer

A

df_cols = cols.split(',')
df.columns = df_cols

Question 63

Q

DataFrame:

You have a list of column names that you want to remove from a df. How to do it?

Answer

A

df_dropped = df.drop(list_to_drop, axis = 1)
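The columns keyword does the same as axis=1 and reads more clearly. A sketch with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})
list_to_drop = ['b', 'c']                   # hypothetical columns to remove
df_dropped = df.drop(columns=list_to_drop)  # same as df.drop(list_to_drop, axis=1)
```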

Question 64

Q

DataFrame:

Convert a column to a different data type - a string

Answer

A

df_dropped['date'] = df_dropped['date'].astype(str)

Answer 65

A

my_series.reset_index()

Answer 66

A

df.corr()

Answer 67

A

df2 = df1.loc[df1['column name'] > 10]

Answer 68

A

names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

my_dict = {'country': names, 'drives_right': dr, 'cars_per_cap': cpc}

cars = pd.DataFrame(my_dict)

Answer 69

A

1) df.loc[['ru']]

2) df.iloc[[2]]

Answer 70

A

1a) df.loc[['ru', 'de'], ['country', 'capital']]
1b) df.loc[:, ['country', 'capital']]

2a) df.iloc[[1, 2], [2, 3]]
2b) df.iloc[:, [2, 3]]
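A sketch contrasting label-based loc and position-based iloc, using a subset of the countries from Answer 68 with hypothetical row labels:

```python
import pandas as pd

# Row labels 'us', 'ru', 'ma' are hypothetical
df = pd.DataFrame({'country': ['United States', 'Russia', 'Morocco'],
                   'cars_per_cap': [809, 200, 70]},
                  index=['us', 'ru', 'ma'])

by_label = df.loc[['ru'], ['country']]   # double brackets keep a DataFrame
by_pos = df.iloc[[1], [0]]               # the same cell, selected by position
single = df.loc['ru', 'country']         # single brackets return a scalar
```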

Answer 71

A

for lab, row in cars.iterrows():
    # lab is the row label, row is a Series with that row's values
    print(str(lab) + ': ' + str(row['cars_per_cap']))

Answer 72

A

cars['COUNTRY'] = cars['country'].apply(str.upper)
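The .str accessor gives a vectorized alternative to apply(str.upper) that also leaves NaN values as NaN instead of raising. A sketch with a hypothetical cars frame:

```python
import pandas as pd

cars = pd.DataFrame({'country': ['United States', 'Japan']})
cars['COUNTRY'] = cars['country'].str.upper()   # vectorized string method
```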
