VA Session 7: Flashcards

1
Q

Subsetting & Slicing methods with pandas

A
  • loc: select by label
  • iloc: select by integer position
2
Q

df.loc[]

A

Access rows or columns by label or with a Boolean array; slices include both the start and the stop label

3
Q

df.loc[]:
- single element (row label & column label)
- row with label "A" as Series
- row with label "A" as DataFrame
- rows with labels "A" to "C" (stop label included)
- all rows & one named column

A
  • df.loc["row_name", "column_name"]
  • df.loc["A"]
  • df.loc[["A"]]
  • df.loc["A":"C"]
  • df.loc[:, "Nr-in_Alphabet"]
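A minimal sketch of the loc patterns above, assuming a small made-up DataFrame with row labels "A" to "D" (the data and the extra "Letter" column are hypothetical). It also shows the inclusive label slice and the Boolean selection described on the previous card.

```python
import pandas as pd

# Hypothetical example data: row labels "A".."D", column names made up for illustration
df = pd.DataFrame(
    {"Letter": ["a", "b", "c", "d"], "Nr-in_Alphabet": [1, 2, 3, 4]},
    index=["A", "B", "C", "D"],
)

element = df.loc["B", "Nr-in_Alphabet"]       # single element (row label, column label)
row_series = df.loc["A"]                      # one row as a Series
row_frame = df.loc[["A"]]                     # list of labels -> one-row DataFrame
label_slice = df.loc["A":"C"]                 # rows A, B, C: the stop label is included
one_column = df.loc[:, "Nr-in_Alphabet"]      # all rows, one named column (Series)
mask_rows = df.loc[df["Nr-in_Alphabet"] > 2]  # Boolean array keeps rows C and D

print(element, type(row_series).__name__, type(row_frame).__name__)
print(list(label_slice.index), list(mask_rows.index))
```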
4
Q

df.iloc[]

A
  • Access rows or columns by integer position (from 0 to length - 1) or with a Boolean array
  • Slices: the stop position is not included
  • Often faster than loc for selecting rows
5
Q
df.iloc[]:
- single element (row position & column position)
- row at position 1 as Series
- row at position 1 as DataFrame
- rows at positions 0 to 4
- all rows & first two columns
A
  • df.iloc[row_index, column_index]
  • df.iloc[1]
  • df.iloc[[1]]
  • df.iloc[0:5]
  • df.iloc[:, 0:2]
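A minimal sketch of the iloc patterns above on a hypothetical 6-row DataFrame. Note the excluded stop position in the slice; the Boolean example converts the mask to a NumPy array because iloc expects a plain array rather than a Series.

```python
import pandas as pd

# Hypothetical example data: 6 rows, default integer positions 0..5
df = pd.DataFrame({"x": range(6), "y": range(10, 16), "z": list("abcdef")})

element = df.iloc[1, 2]           # row position 1, column position 2 -> "b"
row_series = df.iloc[1]           # row at position 1 as a Series
row_frame = df.iloc[[1]]          # list of positions -> one-row DataFrame
first_five = df.iloc[0:5]         # positions 0..4: stop position 5 is excluded
first_two_cols = df.iloc[:, 0:2]  # all rows, columns at positions 0 and 1
last_row = df.iloc[-1]            # negative positions count from the end
mask_rows = df.iloc[(df["x"] > 3).to_numpy()]  # Boolean array keeps positions 4 and 5
```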
6
Q

Subset rows at positions 0 to 4

A

df[0:5]

7
Q

Subset columns

A
  • df["column_name"]
  • df[["col_name1", "col_name2"]]
8
Q

Subset rows & columns

A

df[0:5][["col_name1", "col_name2"]]
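A sketch of plain-bracket subsetting for the last three cards, assuming a hypothetical DataFrame whose columns use the placeholder names col_name1 to col_name3.

```python
import pandas as pd

# Hypothetical DataFrame with the placeholder column names from the cards
df = pd.DataFrame({
    "col_name1": range(8),
    "col_name2": range(10, 18),
    "col_name3": list("abcdefgh"),
})

rows = df[0:5]                                # slice -> rows at positions 0..4
one_col = df["col_name1"]                     # single name -> Series
two_cols = df[["col_name1", "col_name2"]]     # list of names -> DataFrame
subset = df[0:5][["col_name1", "col_name2"]]  # rows first, then columns

print(subset.shape)  # (5, 2)
```

Chaining brackets like this is fine for reading data; assigning through a chain (e.g. `df[0:5]["col_name1"] = 0`) may not modify `df`, which is one reason loc/iloc are preferred for writes.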

9
Q

Copying vs referencing of datasets

A
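  • Copying: df.copy() -> creates a new, independent copy of the DataFrame
  • Referencing: storing the DataFrame (df2 = df) or a subset of it under a new name can leave the new object referencing the original -> changing the original data may also change the "copy"; whether a subset is a view or a copy depends on the operation and the pandas version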
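A minimal sketch of the difference, using plain assignment as the referencing case because that behavior does not depend on the pandas version (the data is made up).

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})  # hypothetical data

alias = df                 # no copy: both names point to the same DataFrame object
alias.loc[0, "a"] = 99
print(df.loc[0, "a"])      # 99: the change is visible through df too

independent = df.copy()    # deep copy (deep=True is the default)
independent.loc[1, "a"] = -1
print(df.loc[1, "a"])      # 2: the original is unaffected
```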
10
Q

Filtering data: Subset a DataFrame's rows or columns according to the specified row or column labels

A

df.filter(like="culmen", axis=1)
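A sketch of `filter` on hypothetical penguin-style column names with made-up values. It matches on labels only (here column names, because `axis=1`); selecting rows by their values needs a Boolean mask instead.

```python
import pandas as pd

# Hypothetical penguin-style columns with made-up values
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo"],
    "culmen_length_mm": [39.0, 47.0],
    "culmen_depth_mm": [18.0, 15.0],
    "flipper_length_mm": [180, 215],
})

culmen = df.filter(like="culmen", axis=1)      # column labels containing "culmen"
mm_cols = df.filter(regex=r"_mm$", axis=1)     # column labels matching a regex
chosen = df.filter(items=["species"], axis=1)  # exact column labels

print(list(culmen.columns))  # ['culmen_length_mm', 'culmen_depth_mm']
```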

11
Q

Cleaning data - Checking & removing duplicates

A
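  • Duplicates either already exist in the stored data or are created when merging datasets
  • df.duplicated(): Flag rows that repeat an earlier row (row-wise check)
  • df.nunique(): Count distinct values per column (fewer than the number of rows means the column has repeated values)
  • df.drop_duplicates(): Remove duplicate rows (keeps the first occurrence by default)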
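A minimal sketch of the three checks on a hypothetical table containing one repeated row.

```python
import pandas as pd

# Hypothetical data: the row (2, "Bergen") appears twice
df = pd.DataFrame({"id": [1, 2, 2, 3], "city": ["Oslo", "Bergen", "Bergen", "Oslo"]})

flags = df.duplicated()          # True where a row repeats an earlier row
print(flags.tolist())            # [False, False, True, False]

counts = df.nunique()            # id: 3 distinct values, city: 2 (out of 4 rows)

deduped = df.drop_duplicates()                      # keeps first occurrence -> 3 rows
one_per_city = df.drop_duplicates(subset=["city"])  # judge duplicates on "city" only -> 2 rows
```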
12
Q

Cleaning data - Remapping values (e.g. because of faulty data or because the analysis requires a transformation)

A

df.replace()
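A sketch of both uses of `replace`, assuming hypothetical raw data where "." and -999 are faulty placeholders for missing values. The nested dict form `{column: {old: new}}` restricts each mapping to one column.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: "." and -999 are assumed placeholders for missing values
df = pd.DataFrame({"sex": ["MALE", "FEMALE", ".", "MALE"], "score": [10, -999, 30, 40]})

# Faulty data: turn the placeholders into real missing values, per column
cleaned = df.replace({"sex": {".": np.nan}, "score": {-999: np.nan}})

# Transformation: remap categories to the coding the analysis needs
recoded = cleaned.replace({"sex": {"MALE": "M", "FEMALE": "F"}})
```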

13
Q

Cleaning data - Dealing with text

A
  • Common text-quality issue: more categories than actually exist
  • e.g. "Copenhagen", "COPENHAGEN", "Copenhagen "
  • df["column"].str.strip(): Remove leading & trailing whitespace
  • df["column"].str.upper(): Change capitalization
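A minimal sketch showing how stripping and unifying case collapse spelling variants into one category (the values, including the leading-space variant, are hypothetical).

```python
import pandas as pd

# Hypothetical city column with spacing and capitalization variants
df = pd.DataFrame({"city": ["Copenhagen", "COPENHAGEN", "Copenhagen ", " copenhagen"]})
print(df["city"].nunique())   # 4 apparent categories

# Remove surrounding whitespace, then unify capitalization
df["city"] = df["city"].str.strip().str.upper()
print(df["city"].nunique())   # 1
```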
14
Q

Reshaping data - Unpivot DataFrame from wide to long format

A

pd.melt() / df.melt()
pd.wide_to_long()
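A sketch of both functions on hypothetical yearly data. `melt` works on any set of value columns; `wide_to_long` assumes column names built from a shared stub plus a suffix (here "score" plus a year).

```python
import pandas as pd

# Hypothetical wide data: one column per year
wide = pd.DataFrame({"id": [1, 2], "2020": [10, 20], "2021": [11, 21]})
long = wide.melt(id_vars="id", var_name="year", value_name="value")
print(long)  # 4 rows: one per (id, year) pair

# wide_to_long: column names are a stub ("score") followed by a numeric suffix
stubbed = pd.DataFrame({"id": [1, 2], "score2020": [10, 20], "score2021": [11, 21]})
long2 = pd.wide_to_long(stubbed, stubnames="score", i="id", j="year").reset_index()
```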

15
Q

Reshaping data - Change from long to wide format

A

df.pivot()
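A minimal sketch reversing a melt, assuming hypothetical long data with one value per (id, year) pair.

```python
import pandas as pd

# Hypothetical long data, e.g. the output of melt()
long = pd.DataFrame({
    "id": [1, 2, 1, 2],
    "year": ["2020", "2020", "2021", "2021"],
    "value": [10, 20, 11, 21],
})

# One row per id, one column per year
wide = long.pivot(index="id", columns="year", values="value")
print(wide)
```

`pivot` raises a ValueError if an (index, column) pair occurs more than once; `pivot_table` with an `aggfunc` aggregates such duplicates instead.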

16
Q

JavaScript Object Notation (JSON)

A
  • Common format for storing data as human-readable text (e.g. used by many web applications & servers)
  • pd.read_json("/path/file.json"): Read a JSON file
  • df.to_json("/path/file.json"): Write a DataFrame to a JSON file
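A sketch of a write/read round trip. It uses a temporary directory as a stand-in for "/path/file.json" and assumes `orient="records"` (a list of row objects), which is one of several JSON layouts pandas supports.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})  # hypothetical data

# orient="records" writes a list of {column: value} objects
text = df.to_json(orient="records")
print(text)  # [{"id":1,"name":"x"},{"id":2,"name":"y"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.json")         # temporary stand-in for "/path/file.json"
    df.to_json(path, orient="records")            # write
    back = pd.read_json(path, orient="records")   # read it back
```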
17
Q

Joining data by join

A
  • pd.merge(df1, df2, how="inner", left_on="id", right_on="id")
  • Joins datasets on key columns; can also join on row indexes instead (left_index=True, right_index=True)
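A sketch of column-key and index-key joins, assuming two hypothetical tables whose key columns have different names (when the names match, `on="id"` is enough).

```python
import pandas as pd

# Hypothetical tables whose key columns have different names
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
orders = pd.DataFrame({"customer_id": [2, 3, 4], "amount": [20, 30, 40]})

# Keys present in both tables only -> ids 2 and 3
inner = pd.merge(customers, orders, how="inner", left_on="id", right_on="customer_id")

# All customers kept; amount is NaN where no order matches (id 1)
left = pd.merge(customers, orders, how="left", left_on="id", right_on="customer_id")

# Same inner join, but on row indexes instead of key columns
by_index = pd.merge(
    customers.set_index("id"), orders.set_index("customer_id"),
    how="inner", left_index=True, right_index=True,
)
```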
18
Q

Joining data by union

A
  • pd.concat([df1, df2])
  • Stacks datasets that have the same columns (rows of df2 below rows of df1)
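A minimal sketch of stacking two hypothetical monthly tables with identical columns.

```python
import pandas as pd

jan = pd.DataFrame({"id": [1, 2], "sales": [5, 7]})  # hypothetical data
feb = pd.DataFrame({"id": [2, 3], "sales": [7, 8]})

stacked = pd.concat([jan, feb], ignore_index=True)  # 4 rows, new index 0..3
union = stacked.drop_duplicates()                   # 3 rows: identical row (2, 7) kept once
```

`concat` keeps repeated rows (like SQL `UNION ALL`), so a set-style union needs the extra `drop_duplicates()`. Columns that exist in only one input are filled with NaN.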
19
Q

Joining data by intersection

A

pd.merge(df1, df2, how="inner"): keeps only the rows whose keys appear in both datasets
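A sketch of an intersection, assuming two hypothetical tables with the same columns. Without `on=...`, `merge` joins on all shared column names, and `how="inner"` is the default, so only rows present in both remain.

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2, 3], "city": ["x", "y", "z"]})  # hypothetical data
b = pd.DataFrame({"id": [2, 3, 4], "city": ["y", "q", "w"]})

# Joins on the shared columns "id" and "city"; inner join is the default
common = pd.merge(a, b)
print(common)  # only the row (2, "y") appears in both
```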

20
Q

df[…] vs loc & iloc

A
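  • [] + slice (e.g. df[0:5]) -> selects rows
  • [] + column name(s) -> selects columns
  • df[slice][names] -> selects rows, then columns (two chained steps)
  • loc / iloc -> select rows and columns in one step, by label (loc) or by position (iloc)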
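A sketch comparing the three routes to the same subset, assuming a hypothetical DataFrame with string row labels so that labels and positions differ.

```python
import pandas as pd

# Hypothetical DataFrame with string row labels, so labels and positions differ
df = pd.DataFrame(
    {"a": range(6), "b": range(10, 16), "c": list("uvwxyz")},
    index=list("pqrstu"),
)

col = df["a"]                         # [] + name -> column (Series)
chained = df[0:3][["a", "b"]]         # [] + slice -> rows, then [] + names -> columns
by_iloc = df.iloc[0:3, 0:2]           # same subset by position, in one step
by_loc = df.loc["p":"r", ["a", "b"]]  # same subset by label; stop label "r" included

print(chained.equals(by_iloc), chained.equals(by_loc))  # True True
```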