VA Session 7 ^ Flashcards

Question 1

Q

Subsetting & Slicing methods with pandas

Question 2

Q

df.loc[]

Answer

A

Access rows or columns by labels or Boolean array; slices: start & stop included

Question 3

Q

df.loc[]:
- elements
- row with label 1 as series
- row with label 1 as Data Frame
- rows from start (0) to end (5)
- all rows & named column

Answer

A

df.loc["row_name", "column_name"]
df.loc["A"]
df.loc[["A"]]
df.loc["A":"C"]
df.loc[:, "Nr-in_Alphabet"]
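
The five loc patterns above can be tried on a small table. This is a minimal sketch, assuming a made-up DataFrame with letters as row labels; the data values are hypothetical, and only the column name comes from the card.

```python
import pandas as pd

# Hypothetical example data: letters as row labels, one named column.
df = pd.DataFrame(
    {"Nr-in_Alphabet": [1, 2, 3, 4]},
    index=["A", "B", "C", "D"],
)

element = df.loc["B", "Nr-in_Alphabet"]   # single element by row and column label
row_series = df.loc["A"]                  # one row, returned as a Series
row_frame = df.loc[["A"]]                 # one row, returned as a DataFrame
label_slice = df.loc["A":"C"]             # rows A, B and C (stop label is included)
column = df.loc[:, "Nr-in_Alphabet"]      # all rows, one named column

print(type(row_series).__name__, type(row_frame).__name__)
print(list(label_slice.index))
```

Passing a list of labels (double brackets) is what keeps the result a DataFrame rather than a Series.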

Question 4

Q

df.iloc[]

Answer

A

Access rows or columns by their integer position or with a Boolean array (positions go from 0 to length - 1)
slice: end number not included
faster for selecting rows than loc

Question 5

Q

df.iloc[]
elements
row at position 1 as series
row at position 1 as Data Frame
rows from index (0) to (4)
all rows & first two columns

Answer

A

df.iloc[row_index, column_index]
df.iloc[1]
df.iloc[[1]]
df.iloc[0:5]
df.iloc[:,0:2]
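
The same patterns with iloc, as a minimal sketch on a made-up DataFrame (column names a, b, c and the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical example data: six rows, three columns.
df = pd.DataFrame(
    {"a": range(6), "b": range(10, 16), "c": range(20, 26)}
)

element = df.iloc[1, 2]           # row position 1, column position 2
row_series = df.iloc[1]           # row at position 1 as a Series
row_frame = df.iloc[[1]]          # row at position 1 as a DataFrame
first_five = df.iloc[0:5]         # positions 0 to 4 (end position 5 is excluded)
first_two_cols = df.iloc[:, 0:2]  # all rows, first two columns

print(len(first_five), list(first_two_cols.columns))
```

Unlike a loc label slice, the iloc slice 0:5 stops before position 5, so it returns five rows.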

Question 6

Q

Subsetting rows with index 0 to 4

Question 7

Q

Subset columns

Answer

A

df["column_name"]
df[["col_name1", "col_name2"]]

Question 8

Q

Subset rows & columns

Answer

A

df[0:5][["col_name1", "col_name2"]]
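
A minimal sketch of the bracket form next to a single-step loc call, assuming a made-up DataFrame with the default integer index (the third column and all values are hypothetical):

```python
import pandas as pd

# Hypothetical example data with the default RangeIndex 0..5.
df = pd.DataFrame(
    {
        "col_name1": range(6),
        "col_name2": range(6, 12),
        "col_name3": range(12, 18),
    }
)

# Two steps: slice rows by position, then pick columns by name.
chained = df[0:5][["col_name1", "col_name2"]]

# One step with loc: labels 0 to 4 (stop label included) and the same columns.
single_step = df.loc[0:4, ["col_name1", "col_name2"]]

print(chained.shape)
print(chained.equals(single_step))
```

Both forms select the same cells here; the single-step form is usually the safer choice when the result is going to be assigned to.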

Question 9

Q

Subsetting vs referencing of datasets

Answer

A

Copying: df.copy() -> creates a new, independent copy of the DataFrame
Referencing: assigning a DataFrame to a new name (e.g. df2 = df) copies nothing, and a subset stored in a new variable may still share data with the original -> changing one can also change the other
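
A minimal sketch of the difference, assuming a made-up one-column DataFrame. It shows only the plain assignment case; whether a stored subset shares data with its parent depends on the pandas version and settings.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"x": [1, 2, 3]})

alias = df                # second name for the same object, nothing is copied
independent = df.copy()   # separate copy with its own data

df.loc[0, "x"] = 100      # change the original

print(alias.loc[0, "x"])        # follows the change
print(independent.loc[0, "x"])  # keeps the old value
```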

Question 10

Q

Filtering data: Subset a DataFrame’s rows or columns according to specified row or column labels

Answer

A

df.filter(like="culmen", axis=1)
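
A minimal sketch of filter() on made-up penguin-style columns (the column names and values are assumptions). Besides like=, filter() also accepts items= and regex=; it matches on labels only, never on cell values.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo"],
        "culmen_length_mm": [39.1, 46.1],
        "culmen_depth_mm": [18.7, 13.2],
        "body_mass_g": [3750, 4500],
    }
)

culmen = df.filter(like="culmen", axis=1)               # labels containing "culmen"
by_items = df.filter(items=["species", "body_mass_g"])  # exact column labels
by_regex = df.filter(regex="_mm$", axis=1)              # labels ending in "_mm"

print(list(culmen.columns))
```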

Question 11

Q

Cleaning data - Checking & removing duplicates

Answer

A

Duplicates: already exist in stored data or created when merging datasets
df.duplicated(): Check duplicates row-wise
df.nunique(): Count distinct values per column (a count below the number of rows means the column contains repeated values)
df.drop_duplicates(): Remove duplicate rows
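
A minimal sketch of the three calls on a made-up DataFrame in which the third row repeats the first (data is hypothetical):

```python
import pandas as pd

# Hypothetical example data: row 2 is an exact repeat of row 0.
df = pd.DataFrame(
    {
        "city": ["Copenhagen", "Aarhus", "Copenhagen"],
        "year": [2020, 2021, 2020],
    }
)

dup_mask = df.duplicated()       # True for rows that repeat an earlier row
unique_counts = df.nunique()     # number of distinct values in each column
deduped = df.drop_duplicates()   # keeps the first occurrence of each row

print(dup_mask.tolist())
print(len(df), len(deduped))
```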

Question 12

Q

Cleaning data - Remapping values (e.g. due to faulty data or analysis requires transformation)

Answer

A

df.replace()
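
A minimal sketch of remapping with replace(), assuming a made-up column in which "." marks a faulty entry (the column name, values and replacement codes are hypothetical):

```python
import pandas as pd

# Hypothetical example data with one faulty value.
df = pd.DataFrame({"sex": ["MALE", "FEMALE", ".", "MALE"]})

# Nested dict: {column: {old_value: new_value}} limits the remap to one column.
cleaned = df.replace({"sex": {".": "UNKNOWN"}})

# On a single column, a flat dict recodes several values at once.
recoded = cleaned["sex"].replace({"MALE": "M", "FEMALE": "F"})

print(cleaned["sex"].tolist())
print(recoded.tolist())
```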

Question 13

Q

Cleaning data - Dealing with text

Answer

A

Common issues in text data quality -> more categories than actually exist
e.g. "Copenhagen", "COPENHAGEN", " Copenhagen "
df["column"].str.strip(): Remove leading and trailing spaces
df["column"].str.lower()
df["column"].str.upper(): Change capitalization
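
A minimal sketch of how stripping spaces and unifying capitalization collapses the spurious categories, assuming a made-up city column:

```python
import pandas as pd

# Hypothetical example data: one city written three different ways.
df = pd.DataFrame({"city": ["Copenhagen", "COPENHAGEN", " Copenhagen "]})

before = df["city"].nunique()   # distinct spellings before cleaning

# Remove surrounding spaces, then put everything in one capitalization.
df["city"] = df["city"].str.strip().str.upper()

after = df["city"].nunique()    # distinct spellings after cleaning

print(before, after)
```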

Question 14

Q

Reshaping data - Unpivot DataFrame from wide to long format

Answer

A

df.melt() / pd.melt()
pd.wide_to_long()
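
A minimal sketch of both functions on a made-up wide table with one column per year (the column names, the "score" stub and the values are assumptions):

```python
import pandas as pd

# Hypothetical wide data: one row per id, one column per year.
wide = pd.DataFrame(
    {
        "id": [1, 2],
        "score2020": [10, 20],
        "score2021": [11, 21],
    }
)

# melt: every non-id column becomes a (variable, value) pair.
long_melt = pd.melt(wide, id_vars=["id"], var_name="variable", value_name="score")

# wide_to_long: splits names like "score2020" into the stub "score" and suffix 2020.
long_w2l = pd.wide_to_long(wide, stubnames="score", i="id", j="year").reset_index()

print(long_melt.shape)
print(sorted(long_w2l.columns))
```

melt() keeps the full old column name as a value, while wide_to_long() parses the suffix into its own column, which helps when column names follow a stub-plus-suffix pattern.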

Question 15

Q

Reshaping data - Change from long to wide format

Question 16

Q

JavaScript Object Notation (JSON)

Answer

A

common format for storing data as human-readable text (e.g. used by many web applications & servers)
pd.read_json("/path/file.json"): Read a JSON file
df.to_json("/path/file.json"): Write a JSON file
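
A minimal sketch of a write-then-read round trip, assuming a made-up DataFrame and a temporary directory in place of a real path. The file name and the orient="records" choice are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"name": ["A", "B"], "value": [1, 2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file.json")   # hypothetical file name
    df.to_json(path, orient="records")          # to_json is a DataFrame method
    restored = pd.read_json(path)               # read_json is a top-level pandas function

print(list(restored.columns))
print(restored["value"].tolist())
```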