Pandas Flashcards

1
Q

How do you select multiple columns of a pandas dataframe?

A

df[['column 1', 'column 2']]

2
Q

What are the three ways to index data in pandas?

A

df[ ]
df.loc[ ]
df.iloc[ ]

3
Q

How do you select one column from a dataframe as a series?

A

df['food']

4
Q

How do you select one column from a dataframe as a dataframe?

A

df[['food']]

5
Q

How do you select multiple columns from a dataframe?

A

df[['color', 'food', 'score']]

6
Q

Can you change the column order when selecting columns?

A

Yes. The columns come back in the order you list them

7
Q

When selecting a column from a dataframe as a series what happens to the column label?

A

Becomes the name of the series
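
The column-selection cards above can be checked with a minimal sketch. The sample data is hypothetical; only the column names echo the cards.

```python
import pandas as pd

# Hypothetical sample data; column names borrowed from the cards above.
df = pd.DataFrame({
    "color": ["red", "blue", "green"],
    "food": ["apple", "melon", "pear"],
    "score": [4.5, 8.3, 9.0],
})

as_series = df["food"]              # a single label returns a Series
as_frame = df[["food"]]             # a list of one label returns a DataFrame
reordered = df[["score", "color"]]  # columns come back in the order listed

print(type(as_series).__name__, as_series.name)
print(type(as_frame).__name__, list(as_frame.columns))
print(list(reordered.columns))
```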

8
Q

How do you select rows and columns using .loc?

A

df.loc[row_selection, column_selection]

9
Q

How do you select multiple rows and columns using .loc?

A

df.loc[['Dean', 'Cornelia'], ['age', 'state', 'score']]
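
A small sketch of label-based selection with .loc. The frame, its values, and the variable names rows and cols are made up for illustration.

```python
import pandas as pd

# Hypothetical frame indexed by name.
df = pd.DataFrame(
    {"age": [32, 69, 28], "state": ["AK", "TX", "FL"], "score": [1.8, 9.5, 4.6]},
    index=["Dean", "Cornelia", "Niko"],
)

rows = ["Dean", "Cornelia"]   # lists of labels stored in variables
cols = ["age", "score"]

subset = df.loc[rows, cols]   # the variables are passed without quotes

# Label slices need no extra brackets and include both endpoints.
sliced = df.loc["Dean":"Cornelia", "age":"state"]

print(subset)
print(sliced)
```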

10
Q

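Do you need apostrophes (quotes) around the name of a list variable when using it to select rows / columns?

A

No. Only the string labels inside the list are quoted; the variable holding the list is passed unquoted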

11
Q

Does a .loc slice include the last item?

A

Yes

12
Q

What does .iloc index on?

A

Integer location (position), not label

13
Q

What does df.iloc[3] find?

A

The 4th row (positions start at 0)

14
Q

To select multiple rows by integer position with .iloc, what do you have to use?

A

A list: df.iloc[[5, 2, 4]]

15
Q

How do you slice rows using .iloc?

A

df.iloc[3:5] (no double bracket required)

16
Q

How do you select rows and columns using .iloc and integers?

A

df.iloc[[2, 3], [0, 4]]

17
Q

How do you select rows using a slice and columns using integers using iloc?

A

df.iloc[3:6, [1, 4]]
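
The .iloc cards above can be tried on a small numeric frame. This is a sketch with made-up data (7 rows, 5 columns named a to e).

```python
import numpy as np
import pandas as pd

# Hypothetical 7x5 frame filled with 0..34 so positions are easy to read.
df = pd.DataFrame(np.arange(35).reshape(7, 5), columns=list("abcde"))

fourth_row = df.iloc[3]        # single integer: the 4th row, as a Series
picked = df.iloc[[5, 2, 4]]    # list of integers: rows in the order listed
sliced = df.iloc[3:5]          # slice: rows 3 and 4 (end is excluded)
block = df.iloc[3:6, [1, 4]]   # rows 3-5, columns at positions 1 and 4

print(fourth_row)
print(picked)
print(sliced)
print(block)
```

Unlike .loc, an .iloc slice follows normal Python slicing and leaves out the end position.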

18
Q

What should you use the indexing operator for?

A
Columns
1) A string - returns a series
2) A list of strings - returns a dataframe
Rows
3) A slice
4) Booleans
19
Q

Can you use the indexing operator to select both rows and columns?

A

No

20
Q

Can you use the indexing operator to select rows?

A

Yes, but don't; use .loc or .iloc instead

21
Q

How do you set the index after reading in the csv?

A

df.set_index('col name')

22
Q

What can you use dot notation for?

A

Selecting a single column

23
Q

What 2 methods can you use for boolean selection?

A

[] and .loc

24
Q

What method can you use to test multiple conditions in the same column?

A

isin

25
Q

What method can find all missing values in a column?

A

isnull (an alias of isna)

26
Q

What logical operators are used in pandas boolean queries?

A

And (&), or (|), and not (~). Wrap each condition in parentheses

27
Q

When should you use [] and .loc when using boolean queries?

A

[] for just rows, .loc when both rows and columns
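
A sketch of the boolean-selection cards above, using made-up data. It combines conditions with & and ~, tests several values with isin, and uses .loc when columns are selected as well.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score.
df = pd.DataFrame({
    "state": ["AK", "TX", "FL", "TX"],
    "age": [32, 69, 28, 45],
    "score": [1.8, np.nan, 4.6, 9.5],
})

# Each condition is wrapped in parentheses before combining with &.
both = df[(df["age"] > 30) & (df["state"] == "TX")]

# isin tests one column against several allowed values.
either = df[df["state"].isin(["AK", "FL"])]

# ~ negates a boolean Series; here it keeps rows where score is present.
not_missing = df[~df["score"].isnull()]

# .loc takes the boolean rows and a column selection together.
ages_tx = df.loc[df["state"] == "TX", ["age"]]

# Summing a boolean Series counts the True values.
n_over_30 = (df["age"] > 30).sum()

print(both, either, not_missing, ages_tx, n_over_30, sep="\n\n")
```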

28
Q

When slicing using .loc do you need to put the slice in []?

A

No

29
Q

What is the difference between df.iloc[2, :] and df.iloc[[2], :]?

A

The first returns a series; the second returns a dataframe
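
A quick check of this card, assuming it refers to positional selection with .iloc (plain df[2, :] is not valid row selection). The data is made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

row_as_series = df.iloc[2, :]    # scalar row position: a Series
row_as_frame = df.iloc[[2], :]   # list with one position: a one-row DataFrame

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__, row_as_frame.shape)
```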

30
Q

When grouping data, how can you sort by value?

A

df.sort_values()

31
Q

When grouping data, how can you sort by index?

A

df.sort_index()

32
Q

When grouping data and sorting by value, how do you sort by decreasing value?

A

df.sort_values(by='col name', ascending=False)

33
Q

How do you sort by more than one column at a time?

A

df.sort_values(by=['col1', 'col2'])

34
Q

How do you use more than one calculation on a column?

A

df.groupby('name')['col'].agg(['min', 'max'])
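
A sketch that puts the grouping and sorting cards together on made-up data. The aggregations are passed as strings, which is the form current pandas versions prefer over the Python builtins.

```python
import pandas as pd

# Hypothetical data: 'name' is the grouping key, 'col' the value column.
df = pd.DataFrame({
    "name": ["a", "b", "a", "b", "c"],
    "col": [3, 10, 1, 7, 5],
})

# Two calculations on one column, one result column per calculation.
summary = df.groupby("name")["col"].agg(["min", "max"])

# Sort the grouped result by value, largest first.
by_max_desc = summary.sort_values(by="max", ascending=False)

# Sort it back by the group labels.
by_name = by_max_desc.sort_index()

print(summary)
print(by_max_desc)
```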

35
Q

How do you sort by more than one column?

A

df.sort_values(by=['col1', 'col2'])

36
Q

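What does size() do?

A

On a groupby, returns the number of rows in each group as a series. On a plain series, size is an attribute (no parentheses) giving the number of elements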

37
Q

What is the difference between size and count?

A

size includes NaN values; count does not
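
The difference shows up as soon as a group contains missing values. A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: team x has one missing value, team y has one too.
df = pd.DataFrame({
    "team": ["x", "x", "y", "y", "y"],
    "points": [1.0, np.nan, 3.0, 4.0, np.nan],
})

sizes = df.groupby("team")["points"].size()    # counts every row
counts = df.groupby("team")["points"].count()  # skips NaN

print(sizes)
print(counts)
```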

38
Q

What is the format for dtype?

A

No parentheses at the end, because it is an attribute rather than a method

39
Q

How do you change the type of data?

A

astype()

40
Q

How can you add up how many rows of a boolean query are true?

A

.sum()

41
Q

How can you count by category?

A

.value_counts()

42
Q

What method can you use to change a column name?

A

df.rename()

43
Q

How do you rename column names?

A

df.rename(columns={'original1': 'new1', 'original2': 'new2'})
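
A short sketch of renaming with a dictionary that maps old names to new ones. The column names are placeholders.

```python
import pandas as pd

df = pd.DataFrame({"original1": [1, 2], "original2": [3, 4]})

# rename returns a new DataFrame; df itself keeps its old column names.
renamed = df.rename(columns={"original1": "new1", "original2": "new2"})

print(list(df.columns))
print(list(renamed.columns))
```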

44
Q

What do you use when you want to add more rows to data when it has the same column names?

A

pd.concat([df1,df2])

45
Q

When using concat do you pass it the df names or a list?

A

a list

46
Q

What is shape?

A

An attribute

47
Q

shape() or shape

A

shape

48
Q

What is dtype

A

An attribute

49
Q

dtype() or dtype

A

dtype

50
Q

How do you tell the data type of one column?

A

df['col'].dtype

51
Q

How do you get the count of the number of columns?

A

df.shape[1]

52
Q

How do you get the data types of a dataframe?

A

df.dtypes

53
Q

How do you work out how many unique instances there are?

A

df['col'].nunique()

54
Q

How do you get summary statistics for a dataframe?

A

df.describe()

55
Q

With the default options, what does describe give you?

A

Summary statistics for just the numerical columns

56
Q

What do you need to include in describe to give you a summary of all the columns?

A

include='all'
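
The inspection cards above can be tried together. This sketch uses a made-up frame with one text column and one numeric column.

```python
import pandas as pd

df = pd.DataFrame({"col": ["a", "b", "a"], "n": [1, 2, 3]})

n_cols = df.shape[1]                     # shape is an attribute: (rows, columns)
col_dtype = df["col"].dtype              # dtype of one column, no parentheses
all_dtypes = df.dtypes                   # dtypes of every column
n_unique = df["col"].nunique()           # number of distinct values
counts = df["col"].value_counts()        # count by category

numeric_only = df.describe()             # default: numeric columns only
everything = df.describe(include="all")  # every column, text included

print(n_cols, col_dtype, n_unique)
print(counts)
print(everything)
```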

57
Q

What is the code to import a CSV?

A

pd.read_csv(file, sep='x')

58
Q

How do you import a tab separated file?

A

sep='\t'

59
Q

How do you divide elementwise in a data frame?

A

.div

60
Q

What does passing a dictionary ({}) to agg do?

A

Maps each column name to the aggregation to apply to that column
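
A minimal sketch of agg with a dictionary, on made-up data, where each column gets its own aggregation.

```python
import pandas as pd

# Hypothetical order data.
df = pd.DataFrame({
    "name": ["a", "a", "b"],
    "price": [2.0, 4.0, 10.0],
    "qty": [1, 2, 5],
})

# Keys are column names, values are the aggregation for that column.
result = df.groupby("name").agg({"price": "mean", "qty": "sum"})

print(result)
```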

61
Q

What does unstack do?

A

Pivots an index level into columns (by default the innermost level, i.e. the second level of a two-level index)

62
Q

How do you pivot the second-level index into columns?

A

.unstack()
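
A sketch of unstack on a grouped result with a two-level index. The data is made up.

```python
import pandas as pd

# Hypothetical sales by city and year.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "year": [2019, 2020, 2019, 2020],
    "sales": [10, 20, 30, 40],
})

# Grouping by two keys gives a Series with a two-level index (city, year).
grouped = df.groupby(["city", "year"])["sales"].sum()

# unstack moves the inner level (year) into the columns.
wide = grouped.unstack()

print(grouped)
print(wide)
```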

63
Q

How do you convert data to a time / date?

A

pd.to_datetime

64
Q

How do you remove a column of data?

A

drop

65
Q

How does drop work?

A

You pass the column name(s) to the method, e.g. df.drop(columns=['col']), rather than selecting the column before calling the method

66
Q

How do you remove blank columns?

A

dropna, but make sure axis=1 (add how='all' to drop only columns that are entirely blank)

67
Q

How do you join 2 dataframes together?

A

pd.concat([df1, df2])

68
Q

What can you do when joining 2 dataframes together to work out which came from each?

A

Pass keys=['x', 'y'] to pd.concat

69
Q

How do you generate random numbers?

A

np.random.randint

70
Q

What is an alternative to pd.concat?

A

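df1.append(df2) in older pandas. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so prefer pd.concat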

71
Q

How do you identify where the data came from when concatenating?

A

pd.concat([df1, df2], keys=['x', 'y'])

72
Q

What is an alternative to keys when merging 2 dataframes?

A

Instead of pd.concat([df1, df2]), pass a dictionary: pd.concat({'x': data1, 'y': data2})

73
Q

By default what is the way that pd.concat joins the data?

A

Row-wise (axis=0): it adds the new rows to the bottom

74
Q

How do you make pd.concat add data as columns?

A

pd.concat([df1, df2], axis=1)
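
The concat cards above, sketched on two small made-up frames. The labels 'x' and 'y' are arbitrary.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "val": ["a", "b"]})
df2 = pd.DataFrame({"id": [3, 4], "val": ["c", "d"]})

# Default: a list of frames is stacked row-wise.
stacked = pd.concat([df1, df2])

# keys adds an outer index level recording which frame each row came from.
with_keys = pd.concat([df1, df2], keys=["x", "y"])

# A dictionary does the same job: its keys become the outer level.
from_dict = pd.concat({"x": df1, "y": df2})

# axis=1 places the frames side by side as columns instead.
side_by_side = pd.concat([df1, df2], axis=1)

print(with_keys)
print(side_by_side)
```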

75
Q

What is the default behaviour of merge?

A

An inner join: it keeps only the rows whose IDs match in both DFs

76
Q

How do you get full outer join?

A

pd.merge(df1, df2, on='x', how='outer')

77
Q

When you merge different dataframes, how do you tell which dataframe each column came from?

A

By default pandas adds _x / _y to overlapping column names, but you can change this using the suffixes argument

78
Q

What does a right join do?

A

Takes all the entries from the 'right' table and returns matching entries from the 'left'

79
Q

What are the different options for merging data?

A

concat, merge, join

80
Q

What are the defaults for concat, merge and join?

A
concat = row wise, outer
merge = column wise, inner
join = column wise, left
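
A sketch of the merge and join defaults on two made-up frames that share the key column x and an overlapping column val. The suffix strings are arbitrary choices.

```python
import pandas as pd

left = pd.DataFrame({"x": [1, 2, 3], "val": ["a", "b", "c"]})
right = pd.DataFrame({"x": [2, 3, 4], "val": ["B", "C", "D"]})

# Default merge is an inner join; overlapping columns get _x / _y.
inner = pd.merge(left, right, on="x")

# Full outer join keeps keys from both sides.
outer = pd.merge(left, right, on="x", how="outer")

# Right join keeps every key from the right frame; custom suffixes.
right_join = pd.merge(left, right, on="x", how="right",
                      suffixes=("_left", "_right"))

# join works on the index and defaults to a left join.
joined = left.set_index("x").join(right.set_index("x"), rsuffix="_r")

print(inner, outer, right_join, joined, sep="\n\n")
```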
81
Q

How do concat and merge take DataFrames as arguments?

A

pd.concat([df1, df2]) takes a list

pd.merge(df1, df2) takes them as separate arguments

82
Q

Why choose merge over concat?

A

concat can only line the data up on the labels along the axis; with merge you specify which keys to match on and how

83
Q

What datatypes can pd.concat take?

A

Series or DataFrame objects, i.e. not NumPy arrays

84
Q

How do you read space separated data?

A

sep=r'\s+'
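
A sketch of the sep argument for whitespace-separated and tab-separated text. To stay self-contained it reads from in-memory strings via io.StringIO instead of files; the contents are made up.

```python
import io

import pandas as pd

# Hypothetical data with columns separated by runs of spaces.
raw = "name   age  score\nDean   32   1.8\nNiko   28   4.6\n"

# r'\s+' is a regular expression matching one or more whitespace characters.
space_df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Tab-separated input uses sep='\t'.
tab_df = pd.read_csv(io.StringIO("a\tb\n1\t2\n"), sep="\t")

print(space_df)
print(tab_df)
```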

85
Q

How do you read a string as a datetime?

A

from dateutil.parser import parse

parse('January 31, 2010')

86
Q

What are the different classes under datetime?

A

datetime, date, time, timedelta

87
Q

In datetime format what is the difference between %Y and %y?

A

%Y is the four-digit year (2020) and %y is the two-digit year (20)

88
Q

How do you convert a date to a string?

A

strftime

89
Q

How do you get the day of the week?

A

.weekday() (Monday is 0, Sunday is 6)
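
A sketch of the datetime cards using only the standard library. It swaps in datetime.strptime with an explicit format in place of dateutil's parse, and assumes an English locale for the month name.

```python
from datetime import datetime

# Parse a string into a datetime with an explicit format.
d = datetime.strptime("January 31, 2010", "%B %d, %Y")

long_year = d.strftime("%Y")       # four-digit year
short_year = d.strftime("%y")      # two-digit year
as_text = d.strftime("%Y-%m-%d")   # date converted back to a string
day_number = d.weekday()           # Monday is 0, Sunday is 6

print(long_year, short_year, as_text, day_number)
```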

90
Q

When you pass a dictionary to pd.DataFrame does it retain the original order of columns?

A

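Yes in current versions (Python 3.7+ with pandas 0.23 or later), because dictionaries keep insertion order. In older versions the columns came back in alphabetical order, as dictionaries were unordered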

91
Q

What is the syntax to create a dictionary

A

{'col': ['x', 'y', 'z'],
 'col2': [1, 2, 3]}

92
Q

How do you order the columns in pd.DataFrame?

A

pd.DataFrame(data, columns = [])

93
Q

pd.DataFrame or pd.Dataframe?

A

pd.DataFrame

94
Q

How do you set an index when using pd.DataFrame?

A

pd.DataFrame(data, columns = [], index = [])

The index argument takes a list of labels the same length as the data, not a column name. To use an existing column as the index, follow this with set_index()
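
A sketch of building a frame from a dictionary, choosing the column order and the index. The row labels r1 to r3 are made up.

```python
import pandas as pd

data = {"col": ["x", "y", "z"], "col2": [1, 2, 3]}

# columns sets the order; index supplies one label per row.
df = pd.DataFrame(data, columns=["col2", "col"], index=["r1", "r2", "r3"])

# To use an existing column as the index, call set_index afterwards.
indexed = pd.DataFrame(data).set_index("col")

# is_unique is an attribute of a Series or Index, so no parentheses.
labels_unique = indexed.index.is_unique

print(df)
print(indexed)
print(labels_unique)
```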

95
Q

How do you establish if there are any duplicate values in a series?

A

is_unique (without parentheses)

96
Q

Can you use is_unique on a df?

A

No

97
Q

is_unique or is_unique()

A

is_unique

98
Q

Does slicing .loc with label names include the last item or not?

A

It includes the last item