pandas Flashcards

1
Q

Importing Data

From a CSV file

A

pd.read_csv(filename)
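
A minimal sketch, assuming a small in-memory CSV wrapped in io.StringIO stands in for filename (read_csv also accepts a path or URL). The column names are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,score\nada,90\nbob,85\n"

# read_csv accepts a path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # score is parsed as an integer column
```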

2
Q

Importing Data

From a delimited text file (like TSV)

A

pd.read_table(filename)

3
Q

Importing Data

From an Excel file

A

pd.read_excel(filename)

4
Q

Importing Data

Read from a SQL table/database

A

pd.read_sql(query, connection_object)
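
A minimal sketch using Python's built-in sqlite3 module with an in-memory database. The people table and its rows are hypothetical. pandas accepts a sqlite3 connection directly, while other databases are usually reached through a SQLAlchemy engine:

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a hypothetical `people` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("ada", 36), ("bob", 41)])
conn.commit()

# The query result is materialized as a DataFrame.
df = pd.read_sql("SELECT name, age FROM people ORDER BY age", conn)
print(df)
conn.close()
```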

5
Q

Importing Data

Read from a JSON formatted string, URL or file.

A

pd.read_json(json_string)
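
A minimal sketch with a hypothetical JSON array of records. Since pandas 2.1, passing a literal JSON string to read_json is deprecated, so the string is wrapped in io.StringIO. A path or URL can be passed unchanged:

```python
import io

import pandas as pd

# Hypothetical JSON text: a list of row objects ("records" orientation).
json_string = '[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]'

# Wrapping the text in StringIO avoids the literal-string deprecation warning.
df = pd.read_json(io.StringIO(json_string), orient="records")
print(df)
```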

6
Q

Importing Data

Parses an HTML URL, string or file and extracts its tables into a list of DataFrames

A

pd.read_html(url)

7
Q

Importing Data

Takes the contents of your clipboard and passes it to read_csv()

A

pd.read_clipboard()

8
Q

Importing Data

From a dict: keys become column names, values (lists) become the column data

A

pd.DataFrame(dict)
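
A short illustration with hypothetical column names: each key becomes a column label and each list supplies that column's values, so the lists should all have the same length:

```python
import pandas as pd

# Keys become column labels; each list holds that column's values.
data = {"city": ["Oslo", "Lima"], "temp_c": [4.5, 19.0]}
df = pd.DataFrame(data)
print(df)
```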

9
Q

Exporting Data

Write to a CSV file

A

df.to_csv(filename)

10
Q

Exporting Data

Write to an Excel file

A

df.to_excel(filename)

11
Q

Exporting Data

Write to a SQL table

A

df.to_sql(table_name, connection_object)

12
Q

Exporting Data

Write to a file in JSON format

A

df.to_json(filename)

13
Q

Create Test Objects

Create a series from an iterable my_list

A

pd.Series(my_list)

14
Q

Create Test Objects

5 columns and 20 rows of random floats

A

pd.DataFrame(np.random.rand(20,5))

15
Q

Create Test Objects

Add a date index

A

df.index = pd.date_range('1900/1/30', periods=df.shape[0])
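
A sketch that builds the random test frame first and then attaches the index. date_range defaults to a daily frequency, so the 20 rows get consecutive dates starting from the given day:

```python
import numpy as np
import pandas as pd

# 20 rows x 5 columns of random floats in [0, 1).
df = pd.DataFrame(np.random.rand(20, 5))

# One timestamp per row, one day apart (freq defaults to "D").
df.index = pd.date_range("1900/1/30", periods=df.shape[0])
print(df.index[:3])
```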

16
Q

Viewing/Inspecting Data

First n rows of the DataFrame

A

df.head(n)

17
Q

Last n rows of the DataFrame

A

df.tail(n)

18
Q

Number of rows and columns

A

df.shape

19
Q

Index, Datatype and Memory information

A

df.info()

20
Q

Summary statistics for numerical columns

A

df.describe()

21
Q

View unique values and counts

A

s.value_counts(dropna=False)
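
A small sketch on a hypothetical Series with missing values, showing what dropna=False changes:

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "a", np.nan, "a", np.nan])

# dropna=False keeps a count for missing values instead of discarding them.
counts = s.value_counts(dropna=False)
print(counts)

# With the default (dropna=True) the NaN entries are not counted at all.
print(s.value_counts())
```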

22
Q

Unique values and counts for all columns

A

df.apply(pd.Series.value_counts)

23
Q

Returns column with label col as Series

A

df[col]

24
Q

Returns columns as a new DataFrame

A

df[[col1, col2]]

25
Q

Selection by position

A

s.iloc[0]

26
Q

Selection by index label

A

s.loc['index_one']
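
A sketch contrasting the two accessors on hypothetical data. They differ most visibly when the index holds integers that do not match the positions:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["index_one", "index_two", "index_three"])
print(s.iloc[0])           # by integer position -> 10
print(s.loc["index_one"])  # by index label -> 10

# Integer labels in reverse order: position 0 and label 0 are different rows.
s2 = pd.Series(["x", "y", "z"], index=[2, 1, 0])
print(s2.iloc[0])  # first position -> 'x'
print(s2.loc[0])   # label 0 -> 'z'
```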

27
Q

First row

A

df.iloc[0,:]

28
Q

First element of first column

A

df.iloc[0,0]

29
Q

Rename columns

A

df.columns = ['a', 'b', 'c']

30
Q

Checks for null values; returns a Boolean array

A

pd.isnull(df)

31
Q

Opposite of pd.isnull()

A

pd.notnull(df)

32
Q

Drop all rows that contain null values

A

df.dropna()

33
Q

Drop all columns that contain null values

A

df.dropna(axis=1)

34
Q

Drop all rows that have fewer than n non-null values

A

df.dropna(thresh=n)
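
A sketch on a hypothetical 3-by-3 frame: thresh counts non-null values, rows are checked by default, and axis=1 applies the same rule to columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, 6.0, 9.0],
})

# Keep rows with at least 2 non-null values (row 2 has only 1, so it is dropped).
kept_rows = df.dropna(thresh=2)

# Same rule applied to columns: column "a" has only 1 non-null value.
kept_cols = df.dropna(axis=1, thresh=2)
print(kept_rows)
print(kept_cols)
```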

35
Q

Replace all null values with x

A

df.fillna(x)

36
Q

Replace all null values with the mean (mean can be replaced with almost any function from the statistics section)

A

s.fillna(s.mean())
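
A minimal sketch, assuming a float Series with gaps. mean() skips NaN by default, so the fill value is the mean of the observed values only:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

# s.mean() ignores NaN: (1.0 + 3.0) / 2 = 2.0
filled = s.fillna(s.mean())
print(filled.tolist())  # [1.0, 2.0, 3.0, 2.0]
```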

37
Q

Convert the datatype of the series to float

A

s.astype(float)

38
Q

Replace all values equal to 1 with 'one'

A

s.replace(1, 'one')

39
Q

Replace all 1 with 'one' and 3 with 'three'

A

s.replace([1, 3], ['one', 'three'])
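
A sketch on a hypothetical Series: the two lists are matched pairwise, and values not listed are left unchanged. Because strings are mixed in, the result has object dtype:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 1])

# Pairwise mapping: 1 -> 'one', 3 -> 'three'; 2 has no match and stays as is.
out = s.replace([1, 3], ["one", "three"])
print(out)
print(out.dtype)  # object, since strings and integers are mixed
```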

40
Q

Mass renaming of columns

A

df.rename(columns=lambda x: x + 1)

41
Q

Selective renaming

A

df.rename(columns={'old_name': 'new_name'})

42
Q

Change the index

A

df.set_index('column_one')

43
Q

Mass renaming of index

A

df.rename(index=lambda x: x + 1)

44
Q

Rows where the column col is greater than 0.5

A

df[df[col] > 0.5]

45
Q

Rows where 0.7 > col > 0.5

A

df[(df[col] > 0.5) & (df[col] < 0.7)]
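
A sketch with a hypothetical column named col. Each comparison produces a Boolean Series, and & combines them element-wise. The parentheses are required because & binds more tightly than > and < in Python:

```python
import pandas as pd

df = pd.DataFrame({"col": [0.2, 0.55, 0.65, 0.9]})

# Boolean mask: True only where both conditions hold.
mask = (df["col"] > 0.5) & (df["col"] < 0.7)
between = df[mask]
print(between)
```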

46
Q

Sort values by col1 in ascending order

A

df.sort_values(col1)

47
Q

Sort values by col2 in descending order

A

df.sort_values(col2,ascending=False)

48
Q

Sort values by col1 in ascending order then col2 in descending order

A

df.sort_values([col1,col2],ascending=[True,False])
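
A sketch with hypothetical data showing how ties in col1 are broken by col2 in descending order:

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 1, 2, 1], "col2": [10, 30, 40, 20]})

# Primary key col1 ascending; within equal col1 values, col2 descending.
out = df.sort_values(["col1", "col2"], ascending=[True, False])
print(out)
```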

49
Q

Returns a groupby object for values from one column

A

df.groupby(col)

50
Q

Returns groupby object for values from multiple columns

A

df.groupby([col1,col2])

51
Q

Returns the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any function from the statistics section)

A

df.groupby(col1)[col2].mean()
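
A sketch with hypothetical data. Selecting col2 from the groupby object gives a SeriesGroupBy; a reduction such as mean() is what produces one value per group:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "b", "a", "b"], "col2": [1, 2, 3, 6]})

# Group rows by col1, take col2 in each group, and average it.
# Swap mean() for sum(), max(), etc. to get other statistics.
means = df.groupby("col1")["col2"].mean()
print(means)
```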

52
Q

Create a pivot table that groups by col1 and calculates the mean of col2 and col3

A

df.pivot_table(index=col1, values=[col2, col3], aggfunc='mean')
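
A sketch with hypothetical data, with aggfunc given as the string 'mean'. The same numbers could be obtained with df.groupby('col1')[['col2', 'col3']].mean(); pivot_table becomes more useful once a columns= key is added to spread a second grouping across the columns:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": [1.0, 3.0, 5.0],
    "col3": [10.0, 20.0, 30.0],
})

# One row per unique col1 value, holding the mean of col2 and col3.
table = df.pivot_table(index="col1", values=["col2", "col3"], aggfunc="mean")
print(table)
```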

53
Q

Find the average across all columns for every unique col1 group

A

df.groupby(col1).agg('mean')

54
Q

Apply the function np.mean() across each column

A

df.apply(np.mean)

55
Q

Apply the function np.max() across each row

A

df.apply(np.max, axis=1)

56
Q

Add the rows in df2 to the end of df1 (columns should be identical)

A

pd.concat([df1, df2])
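
DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so pd.concat is the portable way to stack rows. A sketch with hypothetical frames:

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df2 = pd.DataFrame({"x": [5], "y": [6]})

# Rows of df2 go below df1; ignore_index=True renumbers the result 0..n-1
# instead of keeping the overlapping original labels.
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
```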

57
Q

Add the columns in df2 to the right of df1 (row indexes should match)

A

pd.concat([df1, df2],axis=1)

58
Q

SQL-style join of df1 and df2, matching rows where column col has identical values; how can be one of 'left', 'right', 'outer', 'inner'

A

df1.merge(df2, on=col, how='inner')
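
A sketch with hypothetical key values. merge matches rows on equal values in a shared column. DataFrame.join with on= instead matches a column of the caller against the index of the other frame, so a join call that matches the same rows needs df2 re-indexed by the key first:

```python
import pandas as pd

df1 = pd.DataFrame({"col": ["k1", "k2", "k3"], "left_val": [1, 2, 3]})
df2 = pd.DataFrame({"col": ["k2", "k3", "k4"], "right_val": [20, 30, 40]})

# Inner join keeps only keys present in both frames (k2 and k3).
inner = df1.merge(df2, on="col", how="inner")

# Left join keeps every df1 row; k1 has no match, so right_val is NaN.
left = df1.merge(df2, on="col", how="left")

# join() variant: df1's "col" column is matched against df2's index.
joined = df1.join(df2.set_index("col"), on="col", how="inner")
print(inner)
print(left)
```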

59
Q

Summary statistics for numerical columns

A

df.describe()

60
Q

Returns the mean of all columns

A

df.mean()

61
Q

Returns the correlation between columns in a DataFrame

A

df.corr()

62
Q

Returns the number of non-null values in each DataFrame column

A

df.count()

63
Q

Returns the highest value in each column

A

df.max()

64
Q

Returns the lowest value in each column

A

df.min()

65
Q

Returns the median of each column

A

df.median()

66
Q

Returns the standard deviation of each column

A

df.std()