Pandas Flashcards

1
Q

DataFrame:

Create a dataframe from scratch

A
First create the lists:
import pandas as pd

age = [25,20,26,30]
height = [150,160,180,170]
names = ['Mārtiņš','Aiga','Kristiāns','Valters']
column_names = ['Age','Height','Names']

Keep the lists in the same order as column_names:
list_cols = [age, height, names]

zipped = list(zip(column_names,list_cols))

data = dict(zipped)

df = pd.DataFrame(data)

2
Q

DataFrame:

Add a new column with values 0

A

df['Salary'] = 0

3
Q

DataFrame:

Rename columns

A

df.columns = ['name', 'school', 'age']

4
Q

DataFrame:

Rename indexes

A

df.index = ['A', 'B', 'C']

5
Q

DataFrame:

Create a dictionary from lists

A
list1 = [  ]
list2 = [  ]
column_names = [  ]

columns = [list1, list2]

created_tuples = list(zip(column_names, columns))

created_dict = dict(created_tuples)

6
Q

Get how many rows and columns the DataFrame has

A

my_dataframe.shape

7
Q

Get the column names

A

my_dataframe.columns

8
Q
DataFrame: 
Slice the dataframe.
1) first 5 rows
2) last 5 rows
3) columns 3 to 5 included
4) each 3rd row
A

my_data.iloc[:5,:]
my_data.iloc[-5:,:]
my_data.iloc[:,3:6]
my_data.iloc[::3,:]
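
A quick check of the four slices on a made-up frame (the 12x8 shape and the values are assumptions for illustration). The end of an iloc slice is exclusive, so the first five rows are `:5` and positions 3 to 5 are `3:6`.

```python
import numpy as np
import pandas as pd

# Hypothetical 12x8 frame filled with 0..95 so the slices are easy to inspect.
my_data = pd.DataFrame(np.arange(96).reshape(12, 8))

first_five = my_data.iloc[:5, :]     # rows at positions 0-4
last_five = my_data.iloc[-5:, :]     # rows at positions 7-11
cols_3_to_5 = my_data.iloc[:, 3:6]   # columns at positions 3, 4 and 5
every_third = my_data.iloc[::3, :]   # rows at positions 0, 3, 6, 9

print(first_five.shape, last_five.shape, cols_3_to_5.shape, every_third.shape)
```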

9
Q

DataFrame:

see the first 10 rows quickly

A

my_data.head(10)

10
Q

DataFrame:

see the last 8 rows quickly

A

my_data.tail(8)

11
Q

DataFrame:

see the column names, their types and count quickly

A

my_data.info()

12
Q

DataFrame:

assign a value to some element in DataFrame

A

my_data.iloc[5,10] = 29

13
Q

DataFrame:
assign NaN to every 3rd row in the last column.
Which rows will be affected?

A

import numpy as np
my_data.iloc[::3,-1] = np.nan

nan
unchanged
unchanged
nan
unchanged
unchanged
nan
...
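
A small sketch that confirms which rows change, using a made-up 7-row frame (the column names `a` and `b` are assumptions). The step slice `::3` starts at position 0, so positions 0, 3, 6, ... receive NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the last column is float so NaN fits without a dtype change.
my_data = pd.DataFrame({'a': range(7), 'b': np.arange(7, dtype=float)})

# Every 3rd row (positions 0, 3, 6), last column only.
my_data.iloc[::3, -1] = np.nan

affected = my_data.index[my_data['b'].isna()].tolist()
print(affected)
```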
14
Q

DataFrame:

transform DataFrame to numpy array

A

my_data.to_numpy()

my_data.values also works, but to_numpy() is the recommended form in newer pandas.

15
Q

DataFrame:

get the index column

A

my_data.index

16
Q

DataFrame:

Create a new column with some values

A

df['new_col'] = 0

17
Q

DataFrame:

Assign new names for the columns

A

df.columns = ['name', 'surname', 'age']

18
Q
DataFrame: 
Fully define a pandas csv import with
1) custom column names,
2) which values indicate invalid data,
3) what symbol separates values,
4) a column transformed to datetime,
5) the first few rows skipped.
A

path = '/folder/file.csv'
col_names = ['name', 'surname', 'age']
pd.read_csv(path, header = None, names = col_names, na_values = '-1')

na_values also accepts a list of markers. The remaining points map to sep (separator), parse_dates (datetime columns) and skiprows (rows to skip).
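
A sketch of one call that covers all five points. The file contents, the `;` separator, the `joined` column and the `'n/a'` marker are made up for illustration; an in-memory buffer stands in for the file path.

```python
import io
import pandas as pd

# Hypothetical file: two junk lines, then ';'-separated rows without a header.
raw = io.StringIO(
    "exported by some tool\n"
    "second junk line\n"
    "Anna;Berg;2019-02-25;31\n"
    "Janis;Ozols;2019-03-25;-1\n"
    "Ilze;n/a;2019-04-25;45\n"
)

col_names = ['name', 'surname', 'joined', 'age']  # assumed column names

df = pd.read_csv(
    raw,
    sep=';',                  # 3) symbol that separates values
    header=None,              # the file has no header row
    names=col_names,          # 1) custom column names
    na_values=['-1', 'n/a'],  # 2) several markers for invalid data
    parse_dates=['joined'],   # 4) convert this column to datetime
    skiprows=2,               # 5) skip the first two lines of the file
)
print(df.dtypes)
```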

19
Q

DataFrame:

If a csv has year, month, day in separate columns, how do you call read_csv so that it combines the 3 columns into one?

A

pd.read_csv(path, parse_dates = [[0 , 1 , 2]] )
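
To my knowledge, recent pandas releases (2.2 and later) deprecate combining columns through nested parse_dates lists. A version-independent alternative is to read the columns as they are and assemble the date afterwards. The column names and values below are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical csv with the date split over three columns.
raw = io.StringIO("year,month,day,value\n2019,2,25,10\n2019,3,25,12\n")
df = pd.read_csv(raw)

# to_datetime assembles a datetime from columns named year, month and day.
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
df = df.drop(columns=['year', 'month', 'day'])
print(df)
```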

20
Q

DataFrame:

Get rid of one column in dataframe. How would you do it?

A

If the df has more columns than needed, define the list of meaningful column names

meaningful_columns = ['name', 'surname']

and assign the selection back to df

df = df[meaningful_columns]

21
Q

DataFrame:

Export the dataframe to csv or excel

A

path = 'my_file.csv'

df.to_csv(path)

path2 = 'my_file2.xlsx'

df.to_excel(path2)

22
Q

DataFrame:

If the data has a column with dates that you want to use as the index, how would you import that csv?

A

pd.read_csv(path, index_col = 'dates', parse_dates = True)

23
Q

DataFrame:

Have the name of the plot lines on the plot.

A

import matplotlib.pyplot as plt

df['open'].plot(legend = True)
df['close'].plot(legend = True)
plt.show()

24
Q

DataFrame:

Plot specific columns (not the index).

A

df.plot(x = 'Month', y = ['salary', 'overhead'])

plt.show()

25
Q

DataFrame:

Create a scatter plot with different sizes of dots.

A

df.plot(kind = 'scatter', x = 'year', y = 'age', s = df['size'])
plt.show()

26
Q

DataFrame:

Plot two columns separately. Create a box plot.

A

df[['first', 'second']].plot(kind = 'box', subplots = True)

plt.show()

27
Q

DataFrame:
Create PDF and CDF plots in two rows. Normalize the vertical axis. Make the horizontal binning finer. Set the horizontal range (from ... to ... values).

A

fig, axes = plt.subplots(nrows=2, ncols=1)

Plot the PDF

df.fraction.plot(ax=axes[0], kind='hist', density=True, bins=30, range=(0, .3))

Plot the CDF

df.fraction.plot(ax=axes[1], kind='hist', density=True, cumulative=True, bins=30, range=(0, .3))
plt.show()

density replaces the older normed argument, which newer matplotlib versions no longer accept.

28
Q

DataFrame:

Get the statistical information about the dataset.

A

df.describe()

29
Q

DataFrame:

How many non-null entries are in a DataFrame column?

A

df['my_col'].count()

30
Q

DataFrame:

Get the mean of a DataFrame column. What does it do to null values?

A

df['my_col'].mean()

Null values are skipped by default (skipna = True).

31
Q

DataFrame:

Get the standard deviation of a DataFrame column. What does it do to null values?

A

df['my_col'].std()

Null values are skipped by default (skipna = True).

32
Q

DataFrame:

What is a median? How to get it for a DataFrame column?

A

The median is the middle value of the sorted data: half of the values lie below it and half above.

df['my_col'].median()

33
Q
DataFrame: 
First quartile: 25%
Second quartile: ....?
Third quartile: ....?
Get the value for the 20th percentile.
A

Second quartile: 50% (the median)
Third quartile: 75%

df.quantile(0.2)

34
Q

DataFrame:

Get the interquantile range between 0.2 and 0.8

A

df.quantile([0.2, 0.8])
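
quantile with a list returns the values at both quantiles; the range itself is their difference. A minimal sketch on a made-up column `x` holding 0..10:

```python
import pandas as pd

# Hypothetical data: the integers 0..10.
df = pd.DataFrame({'x': range(11)})

q = df['x'].quantile([0.2, 0.8])   # Series indexed by 0.2 and 0.8
spread = q.iloc[1] - q.iloc[0]     # width of the 0.2-0.8 range
print(q.tolist(), spread)
```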

35
Q

DataFrame:

Get the minimum and maximum values.

A

df.min()

df.max()

36
Q

DataFrame:
Get the mean value calculated over
1) each row
2) each column

A
#mean calculated for each row
df.mean(axis = 1)
#mean calculated for each column
df.mean(axis = 0)
37
Q

DataFrame:

What information can you get with df['species'].describe()? The species column has categorical values.

A
df['species'].describe()
gives
count
unique
top
freq - occurrences of top
38
Q

DataFrame:

Get list of values in a column without repeating.

A

df['species'].unique()

39
Q

DataFrame:

Add transparency to a plot

A

df.plot(alpha = 0.8)

plt.show()

40
Q

DataFrame:
Create two separate DataFrames from one by subsetting: one DF has the rows about 'Dog' and the other one the rows about 'Cat'. The main DF has a column Animal.

A
dog_indices = df['Animal'] == 'Dog'
cat_indices = df['Animal'] == 'Cat'
dog_df = df.loc[dog_indices]
cat_df = df.loc[cat_indices]
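
The comparison produces a boolean Series, and .loc keeps only the rows where it is True. A runnable sketch on a made-up frame (the Weight column and its values are assumptions):

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    'Animal': ['Dog', 'Cat', 'Dog', 'Cat', 'Dog'],
    'Weight': [20, 4, 25, 5, 30],
})

dog_mask = df['Animal'] == 'Dog'   # boolean Series, one entry per row
cat_mask = df['Animal'] == 'Cat'

dog_df = df.loc[dog_mask]
cat_df = df.loc[cat_mask]
print(len(dog_df), len(cat_df))
```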
41
Q

DataFrame:
The df has a datetime index.
1) Get rows in a specific datetime range from a specific column.
2) Get rows on a specific day from a specific column.

A

df.loc['2019-02-25':'2019-03-25', 'product']

df.loc['2019-02-25', 'product']

43
Q

DataFrame:

Transform a list of datetime strings to df datetime

A

string = ['2019-02-25', '2019-03-25']

pd.to_datetime(string)

44
Q

DataFrame:
Have a dataframe df1 adapt to a new datetime index from another dataframe df2.

What will be the result if df1 doesn't have data on some dates in df2?

A

df1.reindex(df2.index)

Dates that are missing from df1 are filled with NaN.

45
Q

DataFrame:
Have a dataframe df1 adapt to a new datetime index from another dataframe df2. If df1 doesn't have data on all df2 dates, fill the empty places with the nearest preceding entry in df1.

A

df1.reindex(df2.index, method = 'ffill')

46
Q

DataFrame:
Have a dataframe df1 adapt to a new datetime index from another dataframe df2. If df1 doesn't have data on all df2 dates, fill the empty places with the nearest next entry in df1.

A

df1.reindex(df2.index, method = 'bfill')
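
The three reindex variants side by side, on two made-up frames (column names and values are assumptions). Note that bfill still leaves NaN where df1 has no later entry.

```python
import pandas as pd

# Hypothetical frames: df1 has two dates, df2 has four.
df1 = pd.DataFrame(
    {'price': [1.0, 3.0]},
    index=pd.to_datetime(['2019-01-01', '2019-01-03']),
)
df2 = pd.DataFrame(
    {'volume': [10, 20, 30, 40]},
    index=pd.to_datetime(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04']),
)

plain = df1.reindex(df2.index)                     # missing dates become NaN
ffilled = df1.reindex(df2.index, method='ffill')   # take the preceding entry
bfilled = df1.reindex(df2.index, method='bfill')   # take the next entry

print(plain, ffilled, bfilled, sep='\n\n')
```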

47
Q

DataFrame:
List contains this:
time_list = ['20100101 00:00', '20100101 01:00', '20100101 02:00']

How to transform it to datetime?

A

pd.to_datetime(time_list, format = '%Y%m%d %H:%M')

The format string has to mirror the input exactly; these strings have no dashes in the date part.

48
Q

DataFrame:
We have a dataframe df1 that has hourly entries.

Create a df2 that shows the daily mean, sum, max, etc.
How to get second, minute, hourly, daily, weekly, monthly, yearly calculations?

How do you get calculations over, say, 3-hour periods?

A

df2 = df1.resample('D').mean()

Seconds S
Minutes T or min
Hours H
Days D
Weeks W
Months M
Years A

To get 3-hour periods, use 3H.

Newer pandas versions (2.2 and later) prefer s, min, h, ME and YE in place of S, T, H, M and A.
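
A runnable sketch of daily and 3-hour resampling on 48 made-up hourly values. It passes a pd.Timedelta in place of the '3H' string so the example does not depend on which alias spelling the installed pandas accepts; that substitution is my choice, not part of the card.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data: 48 hours holding the values 0..47.
idx = pd.date_range('2019-01-01', periods=48, freq=pd.Timedelta(hours=1))
df1 = pd.DataFrame({'value': np.arange(48, dtype=float)}, index=idx)

daily_mean = df1.resample('D').mean()                         # one row per day
three_hourly_sum = df1.resample(pd.Timedelta(hours=3)).sum()  # one row per 3 hours

print(daily_mean)
print(three_hourly_sum.head())
```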

49
Q

DataFrame:

How to get the maximum value of mean values in a month if the df has daily entries?

A

df.resample('M').mean().max()

50
Q

DataFrame:

Get the rolling mean values over defined period of time.

A

df.rolling(window = 24).mean()
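
A small sketch with a window of 3 on five made-up values. The first window - 1 results are NaN because the window is not yet full.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({'value': [1.0, 2.0, 3.0, 4.0, 5.0]})

# Each result is the mean of the current row and the two rows before it.
rolled = df.rolling(window=3).mean()
print(rolled)
```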

51
Q

DataFrame:

read_csv and specify which column should be transformed to datetime objects.

A

pd.read_csv(path, parse_dates = ['my dates'])

52
Q

DataFrame:

Create a df series that has its text in upper or lower case.

A

df['company'].str.upper()
df['company'].str.lower()

53
Q

DataFrame:

Create a df series that has True / False for rows that contain the word 'ware'.

A

df['company'].str.contains('ware')

54
Q

DataFrame:

Count the number of rows that have the letter combination 'ware' in a column.

A

df['product'].str.contains('ware').sum()

The first part returns a series of True / False. True + True = 2 etc.
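
A runnable version on a made-up column. The na=False argument is an addition of mine: it makes missing entries count as False, so a null value does not disturb the sum.

```python
import pandas as pd

# Hypothetical product names, including one missing value.
df = pd.DataFrame({'product': ['Software', 'Hardware', 'Service', None]})

mask = df['product'].str.contains('ware', na=False)  # True / False per row
count = mask.sum()                                   # each True counts as 1
print(count)
```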

55
Q

DataFrame:

Extract the value of hours from df datetime column.

A

df['my date'].dt.hour

56
Q

DataFrame:
df has the population of the world for every decade. Upsample the df to every year and fill in the blanks with linearly changing values.

A

df.resample('A').first().interpolate('linear')
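
A sketch of the same idea on made-up numbers (100, 200, 300 are placeholders, not real population figures). It passes a pd.offsets.YearEnd() object in place of the 'A' string, an assumption on my part to stay independent of the alias spelling.

```python
import pandas as pd

# Hypothetical decade data; the values are placeholders.
df = pd.DataFrame(
    {'population': [100.0, 200.0, 300.0]},
    index=pd.to_datetime(['1960-12-31', '1970-12-31', '1980-12-31']),
)

# Upsample to one row per year (NaN in between), then fill linearly.
yearly = df.resample(pd.offsets.YearEnd()).first().interpolate('linear')
print(yearly.head(12))
```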

57
Q

DataFrame:

Strip the whitespaces from a df header.

A

df.columns = df.columns.str.strip()

58
Q

DataFrame:

Two columns contain date and time. Create series to combine both and have the type of series as Datetime.

A

pd.to_datetime(la['Date (MM/DD/YYYY)'] + ' ' + la['Wheels-off Time'])

59
Q

DataFrame:
How to define line style in plots?
What are the options?

A

df.plot(style = 'k.-')

colors:
k - black
b - blue
c - cyan
r - red
marker type:
. dots
o circles
s squares
* stars

line type:
: dotted
- solid

60
Q

DataFrame:

Fill everything below a plot line with a color.

A

df.plot(kind = 'area')

61
Q

DataFrame:
You have a df with columns:
index date temperature

make the date column become the index column.

A

df.set_index('date', inplace = True)

62
Q

DataFrame:
You have a string, that contains all names for the columns.

The string looks like this: cols = 'Wban,date,Time,StationType,sky_condition,sky_conditionFlag'

How would you make this the header of a dataframe?

A
df_cols = cols.split(',')
df.columns = df_cols
63
Q

DataFrame:

You have a list of column names that you want to remove from a df. How to do it?

A

df_dropped = df.drop(list_to_drop, axis = 1)

64
Q

DataFrame:

Convert a column to a different data type - a string

A

df_dropped['date'] = df_dropped['date'].astype(str)

65
Q

DataFrame:

If you want the df index to be a regular 0, 1, 2 … etc.

A

df.reset_index()

The new index starts at 0. The old index is kept as a column unless you pass drop = True.

66
Q

DataFrame:

Find if there is a relationship between columns!

A

df.corr()

67
Q

DataFrame:

extract df2 from df1 such that df2 contains only the rows where df1 column’s values are above, say, 10.

A

df2 = df1.loc[df1['column name'] > 10]

68
Q

Create a DataFrame with 3 columns:

names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

A
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

my_dict = {'country': names, 'drives_right': dr, 'cars_per_cap': cpc}

cars = pd.DataFrame(my_dict)

69
Q

Extract a row from a dataframe as a dataframe, selecting it through the index

1) by a label
2) by number

A

1) df.loc[['ru']]

2) df.iloc[[2]]

70
Q

Extract multiple rows and columns by their column and row labels.

1a) specific columns and rows by label
1b) all rows and specific columns by label

2a) specific columns and rows by number
2b) all rows and specific columns by number

A

1a) df.loc[['ru', 'de'], ['country', 'capital']]
1b) df.loc[:, ['country', 'capital']]

2a) df.iloc[[1, 2], [2, 3]]
2b) df.iloc[:, [2, 3]]
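
A runnable comparison of label-based and position-based selection on a small made-up frame (the index labels and the currency column are assumptions). Double brackets keep the result a DataFrame; single brackets on one row give a Series.

```python
import pandas as pd

# Hypothetical frame indexed by country code.
df = pd.DataFrame(
    {
        'country': ['Russia', 'Germany', 'Latvia'],
        'capital': ['Moscow', 'Berlin', 'Riga'],
        'currency': ['ruble', 'euro', 'euro'],
    },
    index=['ru', 'de', 'lv'],
)

by_label = df.loc[['ru', 'de'], ['country', 'capital']]   # rows/columns by label
by_position = df.iloc[[0, 1], [0, 1]]                     # same cells by position

one_row_df = df.loc[['ru']]      # DataFrame with one row
one_row_series = df.loc['ru']    # Series

print(by_label)
```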

71
Q

given:
dataframe with index | country | cars_per_cap

output:
for each row print:
"country: cars_per_cap"

A

for lab, row in cars.iterrows():
    print(row['country'] + ': ' + str(row['cars_per_cap']))

72
Q

given:
dataframe with index | country

output:
append the dataframe with a new column COUNTRY that has country in upper case

A

cars['COUNTRY'] = cars['country'].apply(str.upper)