Pandas Flashcards
Primary data structures in pandas
Series: a one-dimensional labeled array that can hold any data type.
DataFrame: a two-dimensional labeled data structure (rows and columns) whose columns can hold different data types.
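Here is a minimal sketch of the two structures, using made-up values. A Series has one axis of labels. A DataFrame has row labels (the index) and column labels, and its columns may hold different types.

```python
import pandas as pd

# A Series: one-dimensional, with an index of labels (values are made up)
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame: two-dimensional, with an index and column labels;
# here "name" holds strings and "age" holds integers
df = pd.DataFrame({"name": ["Ann", "Bo"], "age": [31, 25]})

print(s.ndim, df.ndim)  # 1 2
print(s["b"])           # 20
```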
What can you create a DataFrame from?
Dictionary
NumPy array
Comma-separated values (CSV) file
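Here is a short sketch of the first two sources. The column names and values are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# From a dictionary: keys become column labels, values become column data
df_from_dict = pd.DataFrame({"label": ["p", "q"], "amount": [1.5, 2.0]})

# From a NumPy array: supply column labels, otherwise they default to 0, 1, ...
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(arr, columns=["x", "y"])

print(df_from_dict.shape)   # (2, 2)
print(df_from_array.shape)  # (3, 2)
```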
How do you import a CSV file using pandas (imported as pd)?
df = pd.read_csv('/file_path/file_name.csv')
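read_csv accepts a file-like object as well as a path. The sketch below therefore reads an in-memory string through io.StringIO instead of a real file. The csv_text contents are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file on disk
csv_text = "id,score\n1,88\n2,95\n"

# The first line becomes the column labels
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```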
What is an attribute?
A value associated with an object or class that is referenced by name using a dotted expression (a characteristic of the object, e.g. df.shape)
What are methods?
A method is a function that is defined inside a class body and typically performs an action
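Here is a small illustration of the difference on a toy DataFrame. An attribute is read without parentheses, and a method is called with them.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Attribute: accessed with dot notation, no parentheses
print(df.shape)    # (3, 2)

# Method: a function bound to the object, called with parentheses
print(df.head(2))  # first two rows
```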
Common DataFrame attributes:
columns: column labels
dtypes: data types of the columns in the DataFrame
iloc: accesses a group of rows and columns using integer-based (positional) indexing
loc: accesses a group of rows and columns by labels or a boolean array
shape: returns a tuple representing the dimensionality of the DataFrame (rows, columns)
values: returns a NumPy representation of the DataFrame
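This sketch exercises each attribute on a hypothetical three-row DataFrame. The rows have string labels so that label-based (loc) and position-based (iloc) access can be told apart.

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame(
    {"name": ["Ann", "Bo", "Cy"], "score": [88, 95, 70]},
    index=["r1", "r2", "r3"],
)

print(df.columns.tolist())       # ['name', 'score']
print(df.dtypes)                 # one dtype per column
print(df.iloc[0, 1])             # 88 (row position 0, column position 1)
print(df.loc["r2", "name"])      # Bo (row label 'r2', column label 'name')
print(df.loc[df["score"] > 80])  # boolean array keeps rows r1 and r2
print(df.shape)                  # (3, 2)
print(df.values)                 # NumPy array holding the data
```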
Common DataFrame methods:
apply: applies a function along an axis of the DataFrame
copy: makes a copy of the DataFrame's indices and data
describe: returns descriptive statistics of the DataFrame
drop: drops specified labels from rows or columns
groupby: groups the DataFrame so that a function can be applied to each group and the results combined (split-apply-combine)
head(n=5): returns the first n rows of a DataFrame (default n = 5)
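This sketch uses one line per method on a hypothetical scores table. The groupby call is followed by sum() because grouping alone only splits the data. The aggregation supplies the apply-and-combine step.

```python
import pandas as pd

# Hypothetical scores table
df = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "points": [10, 14, 7, 9],
    "assists": [3, 5, 2, 4],
})

# apply: run a function on each column (axis=0)
totals = df[["points", "assists"]].apply(lambda col: col.sum(), axis=0)

# copy: changes to the copy do not affect the original
backup = df.copy()
backup.loc[0, "points"] = 99

stats = df.describe()                          # count, mean, std, min, quartiles, max
no_assists = df.drop(columns=["assists"])      # drop a column by label
by_team = df.groupby("team")["points"].sum()   # split by team, sum, combine
first_two = df.head(2)                         # first 2 rows (default n=5)
```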
More DataFrame methods:
info: prints a concise summary of the DataFrame (column dtypes, non-null counts, memory usage)
isna: returns a same-sized boolean DataFrame indicating whether each value is missing (isnull() is an alias)
sort_values: sorts by the values along either axis
value_counts: returns a Series containing counts of unique rows in the DataFrame
where: replaces values where a given condition is False (with NaN by default)
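This sketch applies these methods to hypothetical data containing one missing price. value_counts is used here on a single column (Series.value_counts), which is the most common usage. DataFrame.value_counts counts unique rows instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price
df = pd.DataFrame({
    "fruit": ["apple", "pear", "apple", "fig"],
    "price": [1.2, np.nan, 0.9, 2.5],
})

df.info()                                   # prints dtypes, non-null counts, memory use
missing = df.isna()                         # True where a value is missing
by_price = df.sort_values(by="price")       # ascending; NaN placed last by default
counts = df["fruit"].value_counts()         # apple: 2, pear: 1, fig: 1
cheap = df["price"].where(df["price"] < 2)  # values failing the condition become NaN
```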
Selecting rows from a DataFrame
df.loc['row_1']: a single label returns a Series object
df.loc[['row_1']]: a list of labels returns a DataFrame object
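The difference comes from the brackets. A single label selects one row and returns it as a Series. A list of labels, even with one element, returns a DataFrame. The sketch below uses hypothetical row labels.

```python
import pandas as pd

# Hypothetical rows labeled row_1 and row_2
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["row_1", "row_2"])

one = df.loc["row_1"]     # single label -> Series
many = df.loc[["row_1"]]  # list of labels -> DataFrame with 1 row, 2 columns

print(type(one).__name__)   # Series
print(type(many).__name__)  # DataFrame
```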
How do you convert a date column stored as object (strings) to datetime?
df['date'] = pd.to_datetime(df['date'])
Column names go in quotation marks.
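Here is a minimal sketch. It assumes a column named date that holds ISO-formatted date strings.

```python
import pandas as pd

# Hypothetical date strings
df = pd.DataFrame({"date": ["2021-03-15", "2019-11-02"]})
print(df["date"].dtype)  # a string/object dtype before conversion

df["date"] = pd.to_datetime(df["date"])
print(df["date"].dtype)  # a datetime64 dtype after conversion
```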
How do you create a year column from a datetime column?
df['year'] = df['date'].dt.year
Column names go in quotation marks.
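This sketch assumes that date already has a datetime dtype. The .dt accessor exposes datetime properties such as year, month and day on a Series.

```python
import pandas as pd

# Hypothetical dates, already converted to datetime
df = pd.DataFrame({"date": pd.to_datetime(["2021-03-15", "2019-11-02"])})

# .dt.year extracts the year from each datetime value
df["year"] = df["date"].dt.year
print(df["year"].tolist())  # [2021, 2019]
```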
How do you take a sample from a DataFrame?
sample = df.sample(n=50, random_state=42)
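This sketch uses a hypothetical 100-row DataFrame. Fixing random_state makes the draw reproducible, so repeating the call returns the same rows. Rows are drawn without replacement by default.

```python
import pandas as pd

# Hypothetical 100-row DataFrame
df = pd.DataFrame({"value": range(100)})

sample = df.sample(n=50, random_state=42)  # 50 random rows, no replacement
again = df.sample(n=50, random_state=42)   # same seed -> same rows

print(len(sample))                       # 50
print(sample.index.equals(again.index))  # True
```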
Using the unicorn DataFrame, how do you create a years_till_unicorn column?
sample['years_till_unicorn'] = sample['Year Joined'] - sample['Year Founded']
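The unicorn data set itself is not included here. This sketch therefore builds a few hypothetical rows with Year Founded and Year Joined columns. The company names and years are made up.

```python
import pandas as pd

# Hypothetical rows standing in for the unicorn sample
sample = pd.DataFrame({
    "Company": ["Co1", "Co2", "Co3"],
    "Industry": ["Fintech", "Fintech", "Health"],
    "Year Founded": [2010, 2015, 2012],
    "Year Joined": [2018, 2019, 2021],
})

# Column arithmetic is vectorized: the subtraction happens row by row
sample["years_till_unicorn"] = sample["Year Joined"] - sample["Year Founded"]
print(sample["years_till_unicorn"].tolist())  # [8, 4, 9]
```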
Group the unicorn data by industry and, for each industry, get the maximum value in the years_till_unicorn column.
grouped = (sample[['Industry', 'years_till_unicorn']]
           .groupby('Industry')
           .max()
           .sort_values(by='years_till_unicorn')
          )
grouped
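To run the chain above end to end, this sketch uses hypothetical rows in place of the unicorn sample. Note that sort_values orders the result by the maximum value, not alphabetically by industry.

```python
import pandas as pd

# Hypothetical rows in place of the unicorn sample
sample = pd.DataFrame({
    "Industry": ["Fintech", "Fintech", "Health", "AI"],
    "years_till_unicorn": [8, 4, 2, 5],
})

grouped = (sample[["Industry", "years_till_unicorn"]]
           .groupby("Industry")                   # one group per industry
           .max()                                 # largest value in each group
           .sort_values(by="years_till_unicorn")  # ascending by that maximum
          )
print(grouped)
```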