Pandas Flashcards

1
Q

Primary data structures in pandas

A

Series: a one-dimensional labeled array that can hold any data type.

DataFrame: a two-dimensional labeled data structure with rows and columns.
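
Example (a minimal sketch; the labels and values are made up for illustration):

```python
import pandas as pd

# A Series: one-dimensional, each value paired with an index label.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame: two-dimensional, with row labels (index) and column labels.
df = pd.DataFrame(
    {"name": ["x", "y"], "score": [1.5, 2.5]},
    index=["row_1", "row_2"],
)

print(s["b"])                    # look up a Series value by label
print(df.loc["row_2", "score"])  # look up a DataFrame cell by row and column label
print(type(df["score"]))         # a single column of a DataFrame is a Series
```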

2
Q

What can a DataFrame be created from in pandas?

A

Dictionary

NumPy array

Comma-separated values (CSV) file
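
Example for the first two sources (a sketch with made-up column names "a" and "b"; reading a CSV file is covered in the next card):

```python
import numpy as np
import pandas as pd

# From a dictionary: keys become column labels, values become column data.
from_dict = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# From a NumPy array: rows of the array become rows of the DataFrame.
# Column labels are supplied separately.
arr = np.array([[1, 4], [2, 5], [3, 6]])
from_array = pd.DataFrame(arr, columns=["a", "b"])

print(from_dict)
print(from_array)
```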

3
Q

How do you import a CSV file using pandas (imported as pd)?

A

df = pd.read_csv('/file_path/file_name.csv')
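
Example (a sketch that reads CSV text from memory so it runs without a file on disk; the column names and values are hypothetical):

```python
import io

import pandas as pd

# read_csv accepts a file path or any file-like object.
# io.StringIO stands in for a real file here.
csv_text = "company,year_founded\nAcme,2010\nGlobex,2015\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```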

4
Q

What is an attribute?

A

A value associated with an object or class that is referenced by name using a dotted expression (like a characteristic of the object).

5
Q

What are methods?

A

A method is a function that is defined inside a class body and typically performs an action.
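
Example contrasting an attribute with a method (a sketch on a small made-up DataFrame):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Attribute: accessed by name with a dot, no parentheses.
shape = df.shape

# Method: a function attached to the object, called with parentheses.
first_two = df.head(2)

print(shape)
print(first_two)
```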

6
Q

Common DataFrame attributes

A

columns: the column labels

dtypes: the data types in the DataFrame

iloc: accesses a group of rows and columns using integer-based indexing

loc: accesses a group of rows and columns by labels or a boolean array

shape: returns a tuple representing the dimensionality of the DataFrame

values: returns a NumPy representation of the DataFrame
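
Example using each attribute (a sketch; the items, prices, and row labels are made up):

```python
import pandas as pd

df = pd.DataFrame(
    {"item": ["pen", "book", "lamp"], "price": [1.5, 12.0, 30.0]},
    index=["r1", "r2", "r3"],
)

print(df.columns)                # column labels
print(df.dtypes)                 # data type of each column
print(df.shape)                  # (number of rows, number of columns)
print(df.iloc[0, 1])             # first row, second column, by integer position
print(df.loc["r2", "item"])      # row "r2", column "item", by label
print(df.loc[df["price"] > 5])   # rows selected with a boolean array
print(df.values)                 # underlying data as a NumPy array
```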

7
Q

Common DataFrame methods

A

apply: applies a function along an axis of the DataFrame

copy: makes a copy of the DataFrame’s indices and data

describe: returns descriptive statistics of the DataFrame

drop: drops specified labels from rows or columns

groupby: splits the DataFrame, applies a function, and combines the results

head(n=5): returns the first n rows of the DataFrame (default n = 5)
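
Example calling each method (a sketch on a small made-up DataFrame):

```python
import pandas as pd

df = pd.DataFrame(
    {"group": ["a", "b", "a", "b"], "x": [1, 2, 3, 4], "y": [10, 20, 30, 40]}
)

# apply: with the default axis=0 the function receives each column in turn.
doubled = df[["x", "y"]].apply(lambda col: col * 2)

# copy: changes to the copy do not affect the original.
backup = df.copy()
backup.loc[0, "x"] = 99

stats = df.describe()                # count, mean, std, min, quartiles, max
no_y = df.drop(columns=["y"])        # drop a column by label
sums = df.groupby("group").sum()     # split by group, sum, combine
top = df.head(2)                     # first two rows

print(doubled, stats, no_y, sums, top, sep="\n\n")
```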

8
Q

Common DataFrame methods (part 2)

A

info: prints a concise summary of the DataFrame

isna: returns a same-sized boolean DataFrame indicating whether each value is null (isnull() is an alias)

sort_values: sorts by the values along a given axis

value_counts: returns a Series containing counts of unique rows in the DataFrame

where: replaces values in the DataFrame where a given condition is false
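
Example calling each method (a sketch; the columns "k" and "v" are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"k": ["a", "b", "a", "a"], "v": [3.0, np.nan, 1.0, 3.0]})

df.info()                         # prints column names, non-null counts, dtypes

nulls = df.isna()                 # True wherever a value is missing
ordered = df.sort_values(by="v")  # ascending; missing values go last by default
counts = df.value_counts()        # counts of unique rows; rows with NaN are dropped by default

# where: keep values where the condition is True, replace the rest with NaN.
masked = df[["v"]].where(df[["v"]] > 2)

print(nulls, ordered, counts, masked, sep="\n\n")
```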

9
Q

How do you select rows from a DataFrame by label?

A

print(df.loc['row_1']): returns a Series object.

print(df.loc[['row_1']]): returns a DataFrame object.
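
Example showing the difference (a sketch; the row labels and data are made up):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["row_1", "row_2"])

# A single label returns the row as a Series.
as_series = df.loc["row_1"]

# A list of labels (even a list of one) returns a DataFrame.
as_frame = df.loc[["row_1"]]

print(type(as_series))
print(type(as_frame))
```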

10
Q

How do you convert a date column of object dtype to datetime?

A

df['date'] = pd.to_datetime(df['date'])
Column names go in quotation marks.
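
Example (a sketch; the column name "date" and the date strings are made up):

```python
import pandas as pd

df = pd.DataFrame({"date": ["2021-01-15", "2019-07-04"]})
print(df["date"].dtype)   # the dates start out as strings

# Convert the string column to datetime and assign it back.
df["date"] = pd.to_datetime(df["date"])
print(df["date"].dtype)   # now a datetime64 dtype
```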

11
Q

How do you create a year column from a datetime column?

A

df['year'] = df['date'].dt.year
Column names go in quotation marks.
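
Example (a sketch; the column names and dates are made up). The .dt accessor only works on datetime columns, so the conversion comes first:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2021-01-15", "2019-07-04"])})

# .dt exposes datetime components such as year, month, and day.
df["year"] = df["date"].dt.year

print(df)
```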

12
Q

How do you take a sample from a data frame?

A

sample = df.sample(n=50, random_state=42)
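
Example (a sketch on a made-up 200-row DataFrame). Fixing random_state makes the same rows come back on every run:

```python
import pandas as pd

df = pd.DataFrame({"id": range(200)})

# Draw 50 rows at random, without replacement by default.
sample = df.sample(n=50, random_state=42)

# The same seed gives the same sample again.
again = df.sample(n=50, random_state=42)

print(len(sample))
print(sample.equals(again))
```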

13
Q

Using the unicorn DataFrame, how do you create a years_till_unicorn column?

A

sample['years_till_unicorn'] = sample['Year Joined'] - sample['Year Founded']
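
Example (a sketch; the three companies and their years are hypothetical stand-ins for the unicorn data, and both year columns are assumed to hold integers):

```python
import pandas as pd

# Hypothetical rows standing in for the sampled unicorn data.
sample = pd.DataFrame(
    {
        "Company": ["A", "B", "C"],
        "Year Founded": [2010, 2015, 2012],
        "Year Joined": [2018, 2019, 2021],
    }
)

# Subtracting two columns works row by row.
sample["years_till_unicorn"] = sample["Year Joined"] - sample["Year Founded"]

print(sample)
```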

14
Q

Group the unicorn data by industry and, for each industry, get the maximum value in the years_till_unicorn column.

A

grouped = (sample[['Industry', 'years_till_unicorn']]
           .groupby('Industry')
           .max()
           .sort_values(by='years_till_unicorn')
          )

grouped
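
Example (a sketch; the industries and values are hypothetical stand-ins for the unicorn data):

```python
import pandas as pd

# Hypothetical rows standing in for the sampled unicorn data.
sample = pd.DataFrame(
    {
        "Industry": ["Fintech", "Health", "Fintech", "AI", "Health"],
        "years_till_unicorn": [8, 4, 3, 6, 10],
    }
)

grouped = (
    sample[["Industry", "years_till_unicorn"]]
    .groupby("Industry")                    # one group per industry
    .max()                                  # largest value within each group
    .sort_values(by="years_till_unicorn")   # ascending by that maximum
)

print(grouped)
```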
