Pandas Flashcards
Review key Pandas concepts
What are the commands to load and save CSV files?
pd.read_csv(path) and DataFrame.to_csv(path, index=False)
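A minimal round-trip sketch of the two calls. The sample data is made up, and an in-memory buffer stands in for a real file path so the example runs anywhere.

```python
import io

import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})

# An in-memory text buffer stands in for a file path here.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False keeps the row index out of the file
buffer.seek(0)                  # rewind so the buffer can be read from the start

loaded = pd.read_csv(buffer)
print(loaded)
```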
Name the three methods to combine DataFrames.
pd.concat([DataFrame1, DataFrame2], axis=0/1)
pd.merge(DataFrame1, DataFrame2, how='outer'/'inner'/'right'/'left', on=column_name)
DataFrame1.join(DataFrame2, how='outer'/'inner'/'right'/'left')
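A small sketch comparing the three calls on two toy DataFrames. The names left, right, and key are invented for this example.

```python
import pandas as pd

# Hypothetical toy frames sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# concat stacks the frames; axis=0 appends rows, axis=1 would place them side by side.
stacked = pd.concat([left, right], axis=0, ignore_index=True)

# merge matches rows on a column; "inner" keeps only keys present in both frames.
merged = pd.merge(left, right, how="inner", on="key")

# join matches rows on the index, so "key" is moved into the index first.
joined = left.set_index("key").join(right.set_index("key"), how="left")

print(stacked, merged, joined, sep="\n\n")
```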
What is the key difference (other than syntax) between merge and join?
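merge combines data based on one or more columns by default (named with on=).
join combines data based on the index by default.
(Note: a column can be turned into the index with DataFrame.set_index(column_name), or with the index_col parameter of pd.read_csv.)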
What does the inplace=True/False parameter do?
When set to True, the change is applied directly to the original DataFrame and the method returns None. When set to False (the default), the method returns a modified copy and leaves the original untouched.
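A short sketch of the difference, using sort_values as one example of a method that accepts inplace. The DataFrame contents are made up.

```python
import pandas as pd

# Hypothetical data, deliberately out of order.
df = pd.DataFrame({"b": [2, 1], "a": [3, 4]})

# Default (inplace=False): a sorted copy comes back and df is left alone.
result = df.sort_values("b")
order_after_copy = df["b"].tolist()

# inplace=True: df itself is reordered and the call returns None.
returned = df.sort_values("b", inplace=True)
order_after_inplace = df["b"].tolist()

print(order_after_copy, order_after_inplace, returned)
```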
When the axis parameter is used, what does each value refer to?
axis=0 = rows (the index, running vertically)
axis=1 = columns (running horizontally)
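A quick sketch of how the two axis values behave with an aggregation. The sum method and the sample numbers are chosen only for illustration.

```python
import pandas as pd

# Hypothetical two-by-two frame.
df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# axis=0 works down the rows, producing one total per column.
column_totals = df.sum(axis=0)

# axis=1 works across the columns, producing one total per row.
row_totals = df.sum(axis=1)

print(column_totals, row_totals, sep="\n\n")
```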
Describe the difference between the loc and iloc methods.
loc selects by index labels, meaning the names of rows and columns (which can be numbers, but don't have to be). iloc selects by integer position, counting from 0.
What would DataFrame.iloc[[1, 2], ['date', 'stock']] return?
An IndexError. The iloc method requires integer positions and raises an error when labels are given instead.
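A sketch contrasting label-based and position-based selection on the same rows and columns. The row labels r0 to r2 and the data are invented. The exception type noted in the comment reflects recent pandas versions and is treated as an assumption here.

```python
import pandas as pd

# Hypothetical frame with string row labels so labels and positions differ.
df = pd.DataFrame(
    {
        "date": ["2024-01-02", "2024-01-03", "2024-01-04"],
        "stock": ["AAA", "BBB", "CCC"],
        "price": [1.0, 2.0, 3.0],
    },
    index=["r0", "r1", "r2"],
)

by_label = df.loc[["r1", "r2"], ["date", "stock"]]  # names of rows and columns
by_position = df.iloc[[1, 2], [0, 1]]               # integer positions from 0
same = by_label.equals(by_position)

# Passing labels to iloc is rejected (an IndexError in recent pandas versions;
# ValueError is caught as well as a precaution).
raised = False
try:
    df.iloc[[1, 2], ["date", "stock"]]
except (IndexError, ValueError):
    raised = True

print(same, raised)
```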
Explain the groupby() method.
DataFrame.groupby() groups the rows that share a common value in a particular column. An aggregation (such as mean or median) can then be applied to another column within each group.
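A minimal sketch of grouping on one column and aggregating another. The team and points columns are hypothetical.

```python
import pandas as pd

# Hypothetical scores for two teams.
df = pd.DataFrame(
    {"team": ["red", "blue", "red", "blue"], "points": [10, 20, 30, 40]}
)

# Group rows by team, then take the mean of points within each group.
mean_points = df.groupby("team")["points"].mean()

print(mean_points)
```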
What method can you use to execute a custom function across the entirety of a DataFrame?
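DataFrame.apply(function, axis=0/1) runs the function on each column or each row. To run it on each value of a single column, use DataFrame[column_name].apply(function).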
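A sketch of apply on a single column and on a whole DataFrame. The helper double and the sample values are made up for this example.

```python
import pandas as pd

# Hypothetical numeric frame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})


def double(value):
    """Hypothetical custom function: doubles one value."""
    return value * 2


# On a single column, the function receives one value at a time.
doubled_column = df["a"].apply(double)

# On the whole frame with axis=0, the function receives each column as a Series.
column_ranges = df.apply(lambda col: col.max() - col.min(), axis=0)

# With axis=1, the function receives each row as a Series.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(doubled_column, column_ranges, row_sums, sep="\n\n")
```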
What does strftime stand for?
String Format Time
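For context, a small sketch of strftime in pandas, where it is available through the .dt accessor of a datetime Series. The dates and the day/month/year format string are chosen only for illustration.

```python
import pandas as pd

# Hypothetical dates, parsed into datetime values first.
dates = pd.Series(pd.to_datetime(["2024-01-15", "2024-12-03"]))

# Format each datetime as a day/month/year string.
formatted = dates.dt.strftime("%d/%m/%Y")

print(formatted.tolist())
```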
Describe how to format a loc call.
Write the DataFrame name followed by .loc, then put the selection in square brackets. Inside those brackets go two lists separated by a comma: the first lists the row labels and the second lists the column labels.
e.g. DataFrame.loc[[row1, row2, row3], ['col1', 'col2', 'col3']]
Write an example of a conditional DataFrame call.
DataFrame.loc[DataFrame['column'] > x]
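A runnable sketch of the same pattern, with a second example that combines two conditions. The column names, the threshold x, and the data are hypothetical.

```python
import pandas as pd

# Hypothetical frame and threshold.
df = pd.DataFrame({"column": [1, 5, 10], "label": ["low", "mid", "high"]})
x = 3

# The comparison builds a boolean Series; loc keeps the rows where it is True.
mask = df["column"] > x
filtered = df.loc[mask]

# Conditions are combined with & (and) or | (or), each wrapped in parentheses.
combined = df.loc[(df["column"] > x) & (df["label"] != "high")]

print(filtered, combined, sep="\n\n")
```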
Execute the .value_counts() function on the 'Name' column of a DataFrame. Return each distinct value's count as a proportion of the whole set.
DataFrame['Name'].value_counts(normalize=True)
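A short sketch showing raw counts next to normalized proportions. The names in the sample column are made up.

```python
import pandas as pd

# Hypothetical column with one repeated name.
df = pd.DataFrame({"Name": ["Ann", "Bob", "Ann", "Ann"]})

counts = df["Name"].value_counts()                # how many times each value occurs
shares = df["Name"].value_counts(normalize=True)  # each count divided by the total

print(counts, shares, sep="\n\n")
```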
Extract the month from the date column in the data DataFrame.
data['date'].dt.month
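A minimal sketch of the call. It assumes the date column starts out as strings, so it is converted with pd.to_datetime first, because the .dt accessor only works on datetime values. The dates are invented.

```python
import pandas as pd

# Hypothetical frame whose date column is stored as text.
data = pd.DataFrame({"date": ["2024-01-15", "2024-12-03"]})

# Convert to datetime so the .dt accessor becomes available.
data["date"] = pd.to_datetime(data["date"])

months = data["date"].dt.month

print(months.tolist())
```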
Explain what the .map() function does.
.map() replaces each value in a Series with its counterpart from a dictionary, another Series, or the return value of a function. Values with no match in the dictionary or Series become NaN.
import pandas as pd

# Sample Series
s = pd.Series(['apple', 'banana', 'cherry', 'date'])

# Mapping dictionary ('date' has no entry, so it maps to NaN)
fruit_codes = {'apple': 1, 'banana': 2, 'cherry': 3}

# Apply the mapping
coded_s = s.map(fruit_codes)
print(coded_s)

# Expected output
# 0    1.0
# 1    2.0
# 2    3.0
# 3    NaN
# dtype: float64