DataFrames Flashcards

1
Q

Array

A
  • one-dimensional
  • ordered collection
  • contains only one data type

Each item has an index and a value.
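A minimal sketch of these properties, assuming the card refers to a NumPy array (the card itself does not name a library). The values are invented for illustration.

```python
import numpy as np

# Invented sample values; every element shares one data type.
arr = np.array([10, 20, 30])

print(arr.ndim)    # number of dimensions
print(arr.dtype)   # the single data type shared by all elements
print(arr[1])      # look up a value by its integer position (index)
```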

2
Q

Three main components of tables

A

rows, columns, index

3
Q

Rows of a table

A

Each row is one “entry” or “observation”.

4
Q

Columns of a table

A

Each column of a table represents some attribute that entries (rows) have.

5
Q

Index of a table

A
  • Row labels, displayed as the leftmost column
  • Meaningful or arbitrary
  • Usually unique values
  • Identifies rows
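As a rough illustration, assuming pandas and an invented movies table: set_index promotes a column to the index, and its labels can then be used to identify rows.

```python
import pandas as pd

# Invented sample data for illustration only.
movies = pd.DataFrame({
    "Title": ["Alpha", "Beta", "Gamma"],
    "Year": [1948, 2001, 2010],
})

# Use the "Title" column as the index so each row is identified by its title.
by_title = movies.set_index("Title")

print(by_title.index.is_unique)        # whether every index label is distinct
print(by_title.loc["Beta", "Year"])    # look up one row by its label
```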
6
Q

Series

A
  • The most basic pandas object
  • Has two sections: the index and the values
  • Under the hood, the values of a Series are typically stored in a NumPy array
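A small sketch of the two sections of a Series, using invented values and labels:

```python
import numpy as np
import pandas as pd

# Invented sample values with a custom label index.
s = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

print(s.index)        # the index section
print(s.to_numpy())   # the values, exposed as a NumPy array
print(s["b"])         # look up a value by its index label
```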
7
Q

DataFrames

A
  • Pandas table object
  • Contains an Index, Rows, and Columns
  • Each column is a Series
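One way to see this, sketched with an invented movies table (the column names follow the later cards; the rows are made up):

```python
import pandas as pd

# Invented sample data for illustration only.
movies = pd.DataFrame({
    "Title": ["Alpha", "Beta", "Gamma"],
    "Year": [1948, 2001, 2010],
    "Studio": ["Fox", "Acme", "Fox"],
})

year = movies["Year"]   # selecting a single column yields a Series

print(type(year))
print(movies.shape)     # (number of rows, number of columns)
print(movies.index)     # the default integer index
```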
8
Q

How do you read a DataFrame from a CSV file?

A

pd.read_csv(filepath)
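A runnable sketch: an in-memory string stands in for the file so no real path is needed. The CSV contents are invented; with a real file you would pass its path instead.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file (invented contents).
csv_text = "Title,Year,Studio\nAlpha,1948,Fox\nBeta,2001,Acme\n"

# read_csv accepts a file path or a file-like object.
movies = pd.read_csv(io.StringIO(csv_text))

print(movies.head())
```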

9
Q

df.loc[]

A
  • Accesses rows/columns by label
  • loc slicing includes both endpoints (unlike ordinary Python slices, the right end is included)
  • Syntax: df.loc[A:B, C:D]
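A short sketch of label slicing on an invented frame; the row and column labels here are placeholders, not from the deck.

```python
import pandas as pd

# Invented frame with string row labels.
df = pd.DataFrame(
    {"w": [1, 2, 3, 4], "x": [5, 6, 7, 8], "y": [9, 10, 11, 12]},
    index=["a", "b", "c", "d"],
)

# Rows "b" through "c" and columns "w" through "x"; both end labels are included.
subset = df.loc["b":"c", "w":"x"]

print(subset)
```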
10
Q

Filtering using df.loc[]

A

Ex: movies.loc[movies["Year"] < 1950]

  • You can also filter by more than one condition by combining boolean masks with | (or) and & (and):
    condition1 = movies["Year"] >= 2000
    condition2 = movies["Studio"] == "Fox"
    filtered_or = movies.loc[condition1 | condition2]
    filtered_and = movies.loc[condition1 & condition2]
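The same filters, run end to end on an invented movies table so the results can be inspected:

```python
import pandas as pd

# Invented sample data for illustration only.
movies = pd.DataFrame({
    "Title": ["Alpha", "Beta", "Gamma", "Delta"],
    "Year": [1948, 2001, 2010, 1995],
    "Studio": ["Fox", "Acme", "Fox", "Acme"],
})

# A single condition produces a boolean mask; loc keeps rows where it is True.
before_1950 = movies.loc[movies["Year"] < 1950]

# Masks combine element-wise with | (or) and & (and).
condition1 = movies["Year"] >= 2000
condition2 = movies["Studio"] == "Fox"
filtered_or = movies.loc[condition1 | condition2]
filtered_and = movies.loc[condition1 & condition2]

print(filtered_or)
print(filtered_and)
```

When writing conditions inline rather than storing them in variables, wrap each one in parentheses, because & and | bind more tightly than comparison operators.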

11
Q

How do you assign columns to a DataFrame?

A

Using indexing/loc:
- df["column"] = data
Using df.assign():
- new_df = df.assign(label=data)
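A sketch contrasting the two approaches; the column names "Decade" and "IsOld" are hypothetical and the data is invented.

```python
import pandas as pd

# Invented sample data for illustration only.
movies = pd.DataFrame({
    "Title": ["Alpha", "Beta"],
    "Year": [1948, 2001],
})

# Indexing assignment adds the column to movies in place.
movies["Decade"] = movies["Year"] // 10 * 10

# assign returns a new DataFrame and leaves the original untouched.
with_flag = movies.assign(IsOld=movies["Year"] < 1950)

print(movies.columns.tolist())
print(with_flag.columns.tolist())
```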

12
Q

How do you sort DataFrames?

A

df.sort_values()

Ex: movies.sort_values("Studio", ascending=True)
Ex: movies.sort_values("Year", ascending=False)
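Both calls, sketched on an invented movies table. Note that sort_values returns a sorted copy by default and does not reorder the original.

```python
import pandas as pd

# Invented sample data for illustration only.
movies = pd.DataFrame({
    "Title": ["Alpha", "Beta", "Gamma"],
    "Year": [1948, 2001, 2010],
    "Studio": ["Fox", "Acme", "Fox"],
})

by_studio = movies.sort_values("Studio", ascending=True)    # A to Z
by_year_desc = movies.sort_values("Year", ascending=False)  # newest first

print(by_studio)
print(by_year_desc)
```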

13
Q

df.groupby()

A

Groups rows by the values in the given column(s); returns a GroupBy object that you then aggregate (e.g. with count() or sum())
Ex: df.groupby([col1, col2, …])
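A minimal sketch on invented data: groupby on its own returns a grouped object, and an aggregation such as count() turns it into a result.

```python
import pandas as pd

# Invented sample data for illustration only.
movies = pd.DataFrame({
    "Title": ["Alpha", "Beta", "Gamma"],
    "Studio": ["Fox", "Acme", "Fox"],
})

grouped = movies.groupby("Studio")   # a GroupBy object, not yet a table
counts = grouped["Title"].count()    # aggregate: number of titles per studio

print(counts)
```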

14
Q

Ways of grouping by 2 columns

A
  1. using df.groupby()
    Ex: movies.groupby(["Year", "Studio"])["Title"].count().to_frame()
  2. using df.pivot_table()
    Ex: pt = movies.pivot_table(values="Title", index="Year", columns="Studio", aggfunc="count")
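The two approaches side by side on invented data. The groupby version gives one row per (Year, Studio) pair; the pivot table puts years in rows and studios in columns, leaving a missing value where a combination has no movies.

```python
import pandas as pd

# Invented sample data for illustration only.
movies = pd.DataFrame({
    "Title": ["Alpha", "Beta", "Gamma", "Delta"],
    "Year": [2001, 2001, 2001, 2010],
    "Studio": ["Fox", "Fox", "Acme", "Fox"],
})

# Long format: a two-level index of (Year, Studio).
grouped = movies.groupby(["Year", "Studio"])["Title"].count().to_frame()

# Wide format: one row per Year, one column per Studio.
pt = movies.pivot_table(values="Title", index="Year", columns="Studio", aggfunc="count")

print(grouped)
print(pt)
```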
15
Q

Merging DataFrames

A

pd.merge()
Inner Join:
- Includes only rows with a match in both DataFrames.
Ex: pd.merge(adf, bdf, how="inner", on="x1")
Outer Join:
- Retains all rows from both DataFrames.
Ex: pd.merge(adf, bdf, how="outer", on="x1")
Left Join:
- Uses all rows from the first DataFrame.
Right Join:
- Uses all rows from the second DataFrame.
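All four joins, sketched on two small invented frames. The names adf, bdf, and x1 follow the card; the columns x2 and x3 and every value are assumptions made for illustration.

```python
import pandas as pd

# Invented sample frames sharing the key column "x1".
adf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
bdf = pd.DataFrame({"x1": ["A", "B", "D"], "x3": [True, False, True]})

inner = pd.merge(adf, bdf, how="inner", on="x1")   # keys found in both
outer = pd.merge(adf, bdf, how="outer", on="x1")   # keys found in either
left = pd.merge(adf, bdf, how="left", on="x1")     # every key from adf
right = pd.merge(adf, bdf, how="right", on="x1")   # every key from bdf

print(inner)
print(outer)
print(left)
print(right)
```

Rows without a match on the other side get missing values in the columns that come from that side.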

16
Q

df.apply()

A

Applies a function to a DataFrame or Series
Applying to a column:
- Returns a Series with the function applied to each value in the column
df[column].apply(function)
Applying to each row:
- Pass axis=1 so the function receives one row at a time
df.apply(function, axis=1)
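A sketch of both forms on invented data; the helper functions decade and label are hypothetical names introduced only for this example.

```python
import pandas as pd

# Invented sample data for illustration only.
movies = pd.DataFrame({
    "Title": ["Alpha", "Beta"],
    "Year": [1948, 2001],
})

def decade(year):
    """Round a year down to the start of its decade."""
    return year // 10 * 10

def label(row):
    """Build a display string from one row (passed in as a Series)."""
    return f"{row['Title']} ({row['Year']})"

# Column form: the function receives each value of the column in turn.
decades = movies["Year"].apply(decade)

# Row form: axis=1 passes each row to the function.
labels = movies.apply(label, axis=1)

print(decades.tolist())
print(labels.tolist())
```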