Data Cleaning Flashcards

1
Q

“What does the .duplicated() method do in Pandas?”

A

“It returns a boolean Series with True for every row that repeats an earlier row. By default (keep="first") the first occurrence is marked False.”
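
As a rough illustration of the answer above, the sketch below builds a small made-up DataFrame (the values are assumptions chosen for the example) and compares the default behaviour with keep=False, which flags every member of a duplicate group.

```python
import pandas as pd

# Hypothetical sample data: the third row repeats the second.
df = pd.DataFrame({"A": [1, 2, 2, 3], "B": [4, 5, 5, 6]})

# Default keep="first": only the later repeat is flagged.
mask_first = df.duplicated()

# keep=False: every row that belongs to a duplicate group is flagged.
mask_all = df.duplicated(keep=False)

print(mask_first.tolist())
print(mask_all.tolist())
```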

2
Q

“How do you remove duplicate rows using .drop_duplicates() in Pandas?”

A

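“Call df.drop_duplicates(). It returns a new DataFrame that keeps the first occurrence of each repeated row by default; use subset= to compare only some columns and keep= to choose which occurrence survives.”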
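
A minimal sketch of the subset= and keep= parameters, using an invented id/score table (the column names and values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical data: id 2 appears twice with different scores.
df = pd.DataFrame({"id": [1, 2, 2, 3], "score": [10, 20, 25, 30]})

# No row is an exact repeat, so the default call drops nothing.
unchanged = df.drop_duplicates()

# Compare on "id" only and keep the first row seen for each id.
first_per_id = df.drop_duplicates(subset=["id"])

# Same comparison, but keep the last row for each id instead.
last_per_id = df.drop_duplicates(subset=["id"], keep="last")

print(first_per_id)
print(last_per_id)
```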

3
Q

“What is the output of the following code?”

df = pd.DataFrame({"A": [1, 2, 2, 3], "B": [4, 5, 5, 6]})
df[df.duplicated()]

A

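“It returns a DataFrame containing only the rows that repeat an earlier row: here the single row at index 2 (A=2, B=5). df.duplicated() on its own produces the boolean Series; indexing df with it filters the rows.”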

4
Q

“How do you rename columns in a Pandas DataFrame?”

A

“Use the .rename() method with the columns= argument, passing a dictionary with old column names as keys and new names as values.”
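
The sketch below shows two ways of passing columns= to .rename(): a dictionary and a callable. The column names are made up for the example. In both cases a new DataFrame comes back and the original is left as it was.

```python
import pandas as pd

# Hypothetical DataFrame with inconsistent column labels.
df = pd.DataFrame({"First Name": ["Ada"], "AGE": [36]})

# Dictionary form: only the listed columns are renamed.
renamed = df.rename(columns={"First Name": "first_name"})

# Callable form: the function is applied to every column label.
lowered = df.rename(columns=str.lower)

print(list(renamed.columns))
print(list(lowered.columns))
print(list(df.columns))  # the original labels are untouched
```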

5
Q

“What will be the result of the following?”

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df.rename(columns={"A": "X"})

A

“It returns a new DataFrame in which column A is renamed to X and the other columns are unchanged. df itself is not modified unless the result is assigned back.”

6
Q

“How can you change a column’s data type in Pandas?”

A

“Use the .astype() method to change the data type of a column. It returns a new object, so assign the result back to the column.”
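
As a small sketch, .astype() can also take a dictionary on a whole DataFrame to convert several columns in one call. The data and target dtypes below are assumptions picked for illustration.

```python
import pandas as pd

# Hypothetical data: column A holds numeric strings, column B holds integers.
df = pd.DataFrame({"A": ["10", "20", "30"], "B": [1, 2, 3]})

# Map each column name to the dtype it should be converted to.
typed = df.astype({"A": "int64", "B": "float64"})

print(typed.dtypes)
```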

7
Q

“What does the following code do?”

df = pd.DataFrame({"A": [1, 2, 3]})
df["A"] = df["A"].astype(str)

A

“It converts the values in column A from integers to strings.”

8
Q

“What is the output of the following?”

df = pd.DataFrame({"text": ["apple", "banana", "cherry"]})
df["text"].str.contains("an")

A

“It returns a boolean Series that is True where the string contains the substring "an": [False, True, False].”

9
Q

“How do you replace a substring in a column with a new substring?”

A

“Use the .str.replace() method to replace a substring in each element of the column.”
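
A minimal sketch of .str.replace() on a made-up price column, showing a literal replacement (regex=False) next to a pattern-based one (regex=True). The sample values are assumptions.

```python
import pandas as pd

# Hypothetical price strings that need cleaning before conversion.
prices = pd.Series(["$1,200", "$350", "$4,000"])

# Literal replacement: "$" is treated as a plain character, not a regex anchor.
cleaned = (
    prices.str.replace("$", "", regex=False)
          .str.replace(",", "", regex=False)
)
amounts = cleaned.astype(int)

# Pattern replacement: strip every character that is not a digit.
digits_only = prices.str.replace(r"[^0-9]", "", regex=True)

print(amounts.tolist())
print(digits_only.tolist())
```

Passing regex= explicitly avoids depending on the default, which has differed between pandas versions.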

10
Q

“What will be the output of the following?”

df = pd.DataFrame({"text": ["apple", "banana", "cherry"]})
df["text"].str.replace("a", "o")

A

“It returns a new Series with every "a" replaced by "o" in each string of the text column: ["opple", "bonono", "cherry"].”

11
Q

“What does the .apply() method do in Pandas?”

A

“The .apply() method applies a function along a DataFrame axis (to each column with axis=0, to each row with axis=1). Called on a Series, it applies the function to each value.”
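
To make the axis idea concrete, here is a short sketch on an invented two-column DataFrame. The lambdas are illustrative, not a recommended way to compute these particular values.

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# axis=0 (the default): the function receives each column as a Series.
range_per_column = df.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the function receives each row as a Series.
row_totals = df.apply(lambda row: row["A"] + row["B"], axis=1)

print(range_per_column.tolist())
print(row_totals.tolist())
```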

12
Q

“What is the output of the following code?”

df = pd.DataFrame({"A": [1, 2, 3]})
df["A"].apply(lambda x: x**2)

A

“It squares each element in column A and returns a new Series: [1, 4, 9].”

13
Q

“What is the difference between .map() and .apply()?”

A

“.map() works element-wise on a Series and accepts a function, a dictionary, or another Series as the mapping. .apply() on a DataFrame passes whole rows or columns to the function.”
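
A small sketch of the contrast, with made-up column names and values: .map() looks up each value of one Series, while .apply(axis=1) hands each full row to the function. Values with no dictionary key come back as missing.

```python
import pandas as pd

# Hypothetical data: a numeric code and a quantity per row.
df = pd.DataFrame({"code": [1, 2, 4], "qty": [10, 20, 30]})

# Series.map with a dictionary: element-wise lookup; 4 has no key, so it becomes NaN.
labels = df["code"].map({1: "one", 2: "two"})

# DataFrame.apply with axis=1: the function sees one whole row at a time.
scaled = df.apply(lambda row: row["code"] * row["qty"], axis=1)

print(labels.tolist())
print(scaled.tolist())
```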

14
Q

“What is the output of the following code?”

df = pd.DataFrame({"A": [1, 2, 3]})
df["A"].map({1: "one", 2: "two", 3: "three"})

A

“It returns a new Series with the mapped values: ["one", "two", "three"].”

15
Q

“What does the .applymap() method do in Pandas?”

A

“.applymap() applies a function to each element of a DataFrame and returns a new DataFrame of the same shape. In pandas 2.1 and later it is deprecated in favour of DataFrame.map().”
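
The sketch below applies a function to every cell of a made-up DataFrame. It assumes the installed pandas may be either older or newer than 2.1, so it picks DataFrame.map when that method exists and falls back to .applymap() otherwise.

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Prefer DataFrame.map (pandas 2.1+); fall back to applymap on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap

# The function is called once per cell and the shape is preserved.
shifted = elementwise(lambda x: x + 10)

print(shifted)
```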

16
Q

“What will be the result of this code?”

df = pd.DataFrame({"A": [1, 2, 3]})
df.applymap(lambda x: x+10)

A

“It returns a new DataFrame with 10 added to each element: a single column A containing 11, 12, 13.”

17
Q

“How would you convert a column A of strings like ["10", "20", "30"] into integers?”

A

“Use df["A"] = df["A"].astype(int). .astype() returns a new Series, so assign it back to keep the converted column.”

18
Q

“How do you remove rows with missing values in a DataFrame?”

A

“Use df.dropna() to remove rows containing NaN values.”
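
A short sketch of df.dropna() and two of its options, subset= and how=, on invented data with gaps:

```python
import pandas as pd

# Hypothetical data: None becomes NaN in these float columns.
df = pd.DataFrame({
    "A": [1.0, None, 3.0, None],
    "B": [4.0, 5.0, None, None],
})

# Default: drop every row that has at least one missing value.
complete_rows = df.dropna()

# subset=: only look at column A when deciding what to drop.
has_a = df.dropna(subset=["A"])

# how="all": drop a row only when every value in it is missing.
not_empty = df.dropna(how="all")

print(complete_rows.index.tolist())
print(has_a.index.tolist())
print(not_empty.index.tolist())
```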

19
Q

“What would be the result of the following?”

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df["A"].apply(lambda x: x*2)

A

“It will double each value in column A and return [2, 4, 6].”

20
Q

“How do you apply a function to a column and return a transformed result?”

A

“Use the .apply() method on a column, e.g., df["A"].apply(func).”

21
Q

“What is the difference between .replace() and .str.replace()?”

A

“.replace() swaps whole values in a Series or DataFrame (of any type) and only matches a string when the entire value equals it, unless regex=True is passed. .str.replace() replaces substrings inside each string of a Series.”
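
To see the two methods side by side, the sketch below runs both on the same made-up Series. With default arguments, .replace() only touches values that match in full, while .str.replace() rewrites the matching part of every string.

```python
import pandas as pd

# Hypothetical text values; "catalog" contains "cat" but is not equal to it.
s = pd.Series(["cat", "catalog", "dog"])

# Whole-value replacement: only the exact value "cat" changes.
whole_value = s.replace("cat", "feline")

# Substring replacement: "cat" is replaced wherever it occurs inside a string.
substring = s.str.replace("cat", "feline", regex=False)

print(whole_value.tolist())
print(substring.tolist())
```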

22
Q

“How can you apply a custom function to every element in a Series?”

A

“Use .map() to apply a function element-wise in a Series.”
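
A minimal sketch of passing a custom function to .map(). The clean_label helper and the sample labels are hypothetical; na_action="ignore" is used so that missing values are passed through instead of being handed to the function.

```python
import pandas as pd


def clean_label(text):
    """Hypothetical cleaner: trim surrounding whitespace and lowercase."""
    return text.strip().lower()


# Hypothetical labels, including one missing value.
labels = pd.Series(["  Apple", "BANANA ", None])

# The function runs once per non-missing element.
cleaned = labels.map(clean_label, na_action="ignore")

print(cleaned.tolist())
```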