Pandas Flashcards
Explain what a series is in Pandas?
A one-dimensional labeled array whose elements share a single dtype.
How would you create a series from a list, dictionary? How do you specify the index?
pd.Series(['a', 'b', 'c'], index=[1, 11, 111])
pd.Series({1: 'a', 11: 'b', 111: 'c'})
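As a quick check of the two constructions above, this sketch builds the same Series both ways. The variable names are illustrative only.

```python
import pandas as pd

# From a list, with the index passed explicitly.
from_list = pd.Series(['a', 'b', 'c'], index=[1, 11, 111])

# From a dictionary: the keys become the index.
from_dict = pd.Series({1: 'a', 11: 'b', 111: 'c'})

# Both routes produce the same labels and values.
same = from_list.equals(from_dict)
```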
How do you get the values, index, shape and data type from a series?
s.values
s.index
s.shape
s.dtype
What is the underlying data structure of a Pandas Series?
A NumPy ndarray; some dtypes, such as category, are backed by pandas extension arrays instead.
What is the difference between a DataFrame and a Series?
A DataFrame is a two-dimensional table whose columns are Series sharing one index; a Series is a single one-dimensional column.
Construct a DataFrame from a list, series, list of lists, list of dictionaries, dictionary of list values, dictionary of series
pd.DataFrame(…
['a', 'b', 'c']
s1
[['a', 1, True], ['b', 2, False]]
[{'a': 1, 'b': 2}, {'a': 11, 'b': 22}]
{'a': [1, 2, 3], 'b': [True, True, False]}
{'a': s1, 'b': s2}
)
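A runnable sketch of the constructions listed above. The contents of s1 and s2 are assumptions chosen to match the dictionary-of-lists example; the card does not define them.

```python
import pandas as pd

# Hypothetical Series; the card does not say what s1 and s2 contain.
s1 = pd.Series([1, 2, 3])
s2 = pd.Series([True, True, False])

df_list = pd.DataFrame(['a', 'b', 'c'])                       # one column, three rows
df_rows = pd.DataFrame([['a', 1, True], ['b', 2, False]])     # each inner list is a row
df_records = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 11, 'b': 22}])   # dict keys become columns
df_cols = pd.DataFrame({'a': [1, 2, 3], 'b': [True, True, False]})  # each list is a column
df_series = pd.DataFrame({'a': s1, 'b': s2})                  # Series are aligned on their index
```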
How do you get the index, columns, values, shape from a DataFrame?
df.index
df.columns
df.values
df.shape
How would you create a pandas index object?
pd.Index([1,2,3])
What is the difference between .loc and .iloc?
.loc[ : , : ] selects by index and column labels; slices include the end label.
.iloc[ : , : ] selects by integer position along df.index/df.columns; slices exclude the end position.
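A small sketch of the slicing difference, using a made-up frame with string labels so that labels and positions cannot be confused.

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])

# Label-based: the slice 'a':'c' includes the end label 'c'.
by_label = df.loc['a':'c', 'x']

# Position-based: the slice 0:2 excludes position 2.
by_position = df.iloc[0:2, 0]
```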
How do you do Boolean Indexing?
df[mask]
where mask is a boolean Series with the same index as df, e.g. df[col] > 0
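For illustration, a mask built from a comparison on one column; the column names and values are invented.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, -2, 3], 'b': ['x', 'y', 'z']})

# The comparison yields one bool per row, indexed like df.
mask = df['a'] > 0

# Only rows where the mask is True are kept.
positive = df[mask]
```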
How do you select just the desired columns in pandas?
df[list_of_cols], e.g. df[['a', 'b']]; a list returns a DataFrame, a single label returns a Series.
What is a ufunc? Why should you use them?
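A universal function (ufunc) is a vectorized NumPy function, such as np.sqrt or np.add, that operates element-wise on whole arrays. On pandas objects the index is preserved, and binary operations align on it. Prefer ufuncs to Python loops because the work runs in compiled code.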
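A minimal sketch of both behaviours: a unary ufunc keeping the index, and a binary operation aligning on labels. The data is invented.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=['a', 'b', 'c'])

# Unary ufunc: applied element-wise, index preserved.
roots = np.sqrt(s)

# Binary operation: values are matched by label, not by position.
t = pd.Series([10.0, 20.0], index=['b', 'c'])
total = s + t   # 'a' has no partner in t, so the result there is NaN
```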
What is ‘broadcasting’ in pandas?
A scalar or lower-dimensional operand is stretched to match the shape of the other operand: series + 3 behaves like series + pd.Series([3,3,…]).
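A short sketch of broadcasting, first with a scalar and then with a Series of column means subtracted from every row of a DataFrame. All values are made up.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Adding a scalar gives the same result as adding a Series of repeated 3s.
plus_scalar = s + 3
plus_series = s + pd.Series([3, 3, 3])

# df.mean() is a Series indexed by column name; it is broadcast down the rows.
df = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})
centered = df - df.mean()
```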
How do you apply a function element-wise to a series or DataFrame?
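s.map(func) or s.apply(func) for a Series; df.map(func) for a DataFrame (df.applymap(func) before pandas 2.1). df.apply(func) passes each column (axis=0) or row (axis=1) to func as a Series, not each element.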
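A sketch contrasting per-element and per-column application. Note that DataFrame.apply hands whole columns (or rows) to the function, while Series.map and DataFrame.map work per element. The fallback to applymap is an assumption to cover pandas versions older than 2.1.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
squared = s.map(lambda x: x ** 2)          # called once per element

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Called once per column; each call receives a Series.
ranges = df.apply(lambda col: col.max() - col.min())

# Called once per cell. DataFrame.map exists from pandas 2.1; applymap before that.
times_ten = lambda x: x * 10
if hasattr(pd.DataFrame, 'map'):
    elementwise = df.map(times_ten)
else:
    elementwise = df.applymap(times_ten)
```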
Give two ways to handle missing data, explain when you might use either
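df.fillna(value), df.ffill() or df.bfill() (the fillna(method=…) form is deprecated since pandas 2.1)
df.fillna(df.mean()) or df.interpolate()
df.dropna(axis=0 or 1)
Fill when the rows are worth keeping and a plausible substitute exists (a neighbouring value in ordered data, a column mean). Drop when only a few rows or columns are affected, or when no substitute is defensible.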
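A sketch of the filling and dropping options on a tiny invented frame with one missing value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, 6.0]})

filled = df.fillna(df.mean())     # replace NaN with the column mean
ffilled = df.ffill()              # carry the previous row's value forward
interpolated = df.interpolate()   # linear interpolation between neighbours

dropped_rows = df.dropna(axis=0)  # drop rows containing NaN
dropped_cols = df.dropna(axis=1)  # drop columns containing NaN
```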
Give two ways of combining DataFrames, how do you specify if they go side by side, or on top of each other?
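pd.concat(list_of_dfs, axis=0) stacks them on top of each other (rows appended); axis=1 places them side by side (columns appended, rows aligned on the index).
df1.merge(df2, …) joins them side by side on key columns or the index.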
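A minimal sketch of both concat directions; the frames are invented, and ignore_index=True is an optional extra used here to renumber the stacked rows.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2]})
df2 = pd.DataFrame({'a': [3, 4]})
df3 = pd.DataFrame({'b': [5, 6]})

# axis=0: on top of each other; same columns, more rows.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1: side by side; same rows (aligned on index), more columns.
side_by_side = pd.concat([df1, df3], axis=1)
```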
What arguments does the df.merge() function take?
df1.merge(df2, on=col, or left_on=col1 / left_index=True together with right_on=col2 / right_index=True, how='inner'/'left'…)
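A sketch of an inner and a left join on differently named key columns. The frames and column names are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'l': ['a', 'b', 'c']})
right = pd.DataFrame({'id': [2, 3, 4], 'r': ['x', 'y', 'z']})

# Inner join: only keys present in both frames survive.
inner = left.merge(right, left_on='key', right_on='id', how='inner')

# Left join: every row of left is kept; unmatched rows get NaN on the right side.
left_join = left.merge(right, left_on='key', right_on='id', how='left')
```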
Explain what is meant by Split-Apply-Combine?
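This is how pandas carries out a groupby. Split the rows into groups by the distinct values of a key column, apply a function (for example an aggregation that reduces each group to a single row) to every group independently, then combine the per-group results into one object.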
How would you group the DataFrame based on a column and apply multiple aggregation functions to it?
df.groupby(column).agg({col1: [agg1, agg2], …})
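A sketch of multiple aggregations on one column, in the dictionary form used above and in the named-aggregation form. The data and the output names lo and hi are invented.

```python
import pandas as pd

df = pd.DataFrame({'team': ['x', 'x', 'y'], 'score': [1, 3, 10]})

# Dictionary form: the result has two-level columns, (column, function).
summary = df.groupby('team').agg({'score': ['min', 'max', 'mean']})

# Named aggregation: flat column names chosen by the caller.
named = df.groupby('team').agg(lo=('score', 'min'), hi=('score', 'max'))
```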
How would you implement the ‘Having’ statement from SQL in Pandas?
df.groupby(…).filter(lambda g: condition), where the lambda returns one bool per group; groups returning False are dropped.
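A sketch of a HAVING-style filter: keep only the rows of groups whose total exceeds a threshold. The data and the threshold of 5 are made up.

```python
import pandas as pd

df = pd.DataFrame({'team': ['x', 'x', 'y'], 'score': [1, 3, 10]})

# Roughly: SELECT * ... WHERE team IN (SELECT team ... GROUP BY team HAVING SUM(score) > 5)
big_teams = df.groupby('team').filter(lambda g: g['score'].sum() > 5)
```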
How can we join aggregate statistics back to our DataFrame?
df.merge(df.groupby(…).agg(…), left_on=…, right_index=True) or
df['new_col'] = df.groupby(…)[col].transform(agg)
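A sketch of the transform route, which returns one value per original row so the result can be assigned straight back as a column. The column name team_mean is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'team': ['x', 'x', 'y'], 'score': [1, 3, 10]})

# transform broadcasts each group's aggregate back to that group's rows.
df['team_mean'] = df.groupby('team')['score'].transform('mean')
```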
How is a pivot table useful? How would you implement it?
Useful for displaying an aggregation grouped by two keys in one table, with one key along the rows and the other along the columns. df.pivot_table(index=, columns=, values=, aggfunc=…)
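A small sketch of a pivot table over two keys; the city, year and sales columns are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['A', 'A', 'B', 'B'],
    'year': [2020, 2021, 2020, 2021],
    'sales': [1, 2, 3, 4],
})

# Rows are cities, columns are years, cells hold the summed sales.
table = df.pivot_table(index='city', columns='year', values='sales', aggfunc='sum')
```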
How do you change the index of a DataFrame?
df.set_index(…, inplace=True), or df = df.set_index(…)
How do you change the sampling frequency of the data?
df.resample(rule).agg(…) on a datetime-like index, where rule is a frequency string such as 'D' or 'W'.
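A sketch of downsampling daily data into two-day bins. The dates and values are made up, and a Series is used for brevity; a DataFrame with a DatetimeIndex resamples the same way.

```python
import pandas as pd

idx = pd.date_range('2024-01-01', periods=6, freq='D')
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Each bin covers two calendar days; sum the values falling in each bin.
two_day = ts.resample('2D').sum()
```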
How would you create a 30 day rolling average of a DataFrame?
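df[…].rolling(30).mean() for a window of 30 rows (30 days only if the data has one row per day); df[…].rolling('30D').mean() for a 30-day time window on a DatetimeIndex.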
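A sketch of the difference between a row-count window and a time-based window. A 3-step window on invented daily data stands in for the 30-day case to keep the example short.

```python
import pandas as pd

idx = pd.date_range('2024-01-01', periods=5, freq='D')
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

# Integer window: needs 3 rows before it produces a value, so the first two are NaN.
by_rows = ts.rolling(3).mean()

# Offset window: covers the last 3 days and averages whatever rows fall inside.
by_time = ts.rolling('3D').mean()
```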
Give examples of other Time series methods
.shift(1)
.diff(1)
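A sketch of both methods on an invented Series: shift moves the values down by one step, and diff subtracts the previous value from each value.

```python
import pandas as pd

ts = pd.Series([1, 4, 9])

lagged = ts.shift(1)    # first entry becomes NaN, the rest move down one step
change = ts.diff(1)     # equivalent to ts - ts.shift(1)
```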