Data Cleaning Flashcards
“What does the .duplicated() method do in Pandas?”
“It returns a boolean Series that marks each row repeating an earlier row as True; by default the first occurrence is marked False.”
“How do you remove duplicate rows using .drop_duplicates() in Pandas?”
“Call df.drop_duplicates(). It returns a new DataFrame that keeps the first occurrence of each duplicated row by default.”
“What is the output of the following code?
df = pd.DataFrame({"A": [1, 2, 2, 3], "B": [4, 5, 5, 6]})
df[df.duplicated()]
”
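“It returns a DataFrame containing only the rows that duplicate an earlier row: here, the single row at index 2 (A = 2, B = 5). The boolean Series comes from df.duplicated() itself; indexing df with it selects the True rows.”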
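A minimal sketch of the three steps side by side, using the same data as the card above. The variable names (mask, dupes, deduped) are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2, 3], "B": [4, 5, 5, 6]})

mask = df.duplicated()          # boolean Series: True where a row repeats an earlier row
dupes = df[mask]                # DataFrame holding only the repeated rows
deduped = df.drop_duplicates()  # new DataFrame with the first occurrences kept

print(mask.tolist())            # [False, False, True, False]
print(dupes.index.tolist())     # [2]
print(deduped["A"].tolist())    # [1, 2, 3]
```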
“How do you rename columns in a Pandas DataFrame?”
“Use the .rename() method, passing a dictionary to the columns parameter with old column names as keys and new names as values.”
“What will be the result of the following?
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df.rename(columns={"A": "X"})
”
“It returns a new DataFrame with column A renamed to X and column B unchanged. df itself is not modified unless the result is assigned back.”
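A short sketch of why the return value matters here. The names renamed and original_columns are illustrative, not part of the card.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

renamed = df.rename(columns={"A": "X"})  # returns a new DataFrame
original_columns = list(df.columns)      # df still has its old labels

print(list(renamed.columns))  # ['X', 'B']
print(original_columns)       # ['A', 'B']

# Assign the result back to keep the new name on df.
df = df.rename(columns={"A": "X"})
```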
“How can you change a column’s data type in Pandas?”
“Use the .astype() method to change the data type of a column.”
“What does the following code do?
df = pd.DataFrame({"A": [1, 2, 3]})
df["A"] = df["A"].astype(str)
”
“It converts the values in column A from integers to strings.”
“What is the output of the following?
df = pd.DataFrame({"text": ["apple", "banana", "cherry"]})
df["text"].str.contains("an")
”
“It returns a boolean Series where True marks rows whose string contains the substring an: here [False, True, False], since only banana matches.”
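In cleaning work the column often has missing entries, so a sketch with one None value may help. It assumes you want missing values treated as "no match", which the na=False argument does; regex=False makes the pattern a literal substring.

```python
import pandas as pd

df = pd.DataFrame({"text": ["apple", "banana", None]})

# na=False counts missing entries as non-matches, so the mask is safe for filtering.
has_an = df["text"].str.contains("an", regex=False, na=False)
matches = df[has_an]

print(has_an.tolist())           # [False, True, False]
print(matches["text"].tolist())  # ['banana']
```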
“How do you replace a substring in a column with a new substring?”
“Use the .str.replace() method to replace a substring in each element of the column.”
“What will be the output of the following?
df = pd.DataFrame({"text": ["apple", "banana", "cherry"]})
df["text"].str.replace("a", "o")
”
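“It returns a new Series with every a replaced by o in each string of the text column: ["opple", "bonono", "cherry"]. The original column is unchanged unless the result is assigned back.”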
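A hedged sketch of the same replacement with regex=False spelled out. The default for that argument has changed between pandas versions, so passing it explicitly avoids surprises with characters such as "." that are special in regular expressions. The codes Series is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"text": ["apple", "banana", "cherry"]})
replaced = df["text"].str.replace("a", "o", regex=False)
print(replaced.tolist())  # ['opple', 'bonono', 'cherry']

# With a literal dot, regex=False replaces only the dot itself.
codes = pd.Series(["a.b", "acb"])
literal = codes.str.replace(".", "-", regex=False)
print(literal.tolist())   # ['a-b', 'acb']

# With regex=True the dot matches any character.
pattern = codes.str.replace(".", "-", regex=True)
print(pattern.tolist())   # ['---', '---']
```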
“What does the .apply() method do in Pandas?”
“The .apply() method is used to apply a function along a DataFrame axis (rows or columns).”
“What is the output of the following code?
df = pd.DataFrame({"A": [1, 2, 3]})
df["A"].apply(lambda x: x**2)
”
“It squares each element in column A and returns a new Series: [1, 4, 9].”
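To make the "axis" wording concrete, here is a small sketch showing .apply() on a Series, on a DataFrame per column, and on a DataFrame per row. Column B and the lambdas are added for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# On a Series, the function receives one element at a time.
squared = df["A"].apply(lambda x: x ** 2)

# On a DataFrame with the default axis=0, the function receives each column as a Series.
col_ranges = df.apply(lambda col: col.max() - col.min())

# With axis=1, the function receives each row as a Series.
row_sums = df.apply(lambda row: row["A"] + row["B"], axis=1)

print(squared.tolist())     # [1, 4, 9]
print(col_ranges.tolist())  # [2, 2]
print(row_sums.tolist())    # [5, 7, 9]
```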
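“What is the difference between .map() and .apply()?”
“.map() works element-wise on a Series and accepts a function, a dictionary, or another Series as the mapping. .apply() on a Series also calls a function on each element, while on a DataFrame it calls the function once per column (axis=0) or once per row (axis=1).”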
“What is the output of the following code?
df = pd.DataFrame({"A": [1, 2, 3]})
df["A"].map({1: "one", 2: "two", 3: "three"})
”
“It returns a new Series with the mapped values: ["one", "two", "three"].”
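One detail worth a sketch: when .map() is given a dictionary, values with no matching key come back as missing. The value 4 and the "unknown" label below are illustrative assumptions.

```python
import pandas as pd

s = pd.Series([1, 2, 4])
names = {1: "one", 2: "two", 3: "three"}

mapped = s.map(names)                          # 4 has no key, so it maps to a missing value
labelled = s.map(names).fillna("unknown")      # replace the missing result with a label

print(mapped.isna().tolist())  # [False, False, True]
print(labelled.tolist())       # ['one', 'two', 'unknown']
```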
“What does the .applymap() method do in Pandas?”
“.applymap() applies a function to each element of a DataFrame, performing element-wise operations. Since pandas 2.1 it is deprecated in favor of DataFrame.map().”
“What will be the result of this code?
df = pd.DataFrame({"A": [1, 2, 3]})
df.applymap(lambda x: x+10)
”
“It will add 10 to each element and return a new DataFrame whose column A holds 11, 12, and 13.”
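A sketch that runs on both older and newer pandas, assuming DataFrame.map is available from pandas 2.1 onward and applymap is the fallback before that. For simple arithmetic like this, plain df + 10 gives the same values without calling a Python function per element.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# Prefer DataFrame.map where it exists; fall back to applymap on older pandas.
if hasattr(pd.DataFrame, "map"):
    shifted = df.map(lambda x: x + 10)
else:
    shifted = df.applymap(lambda x: x + 10)

# Vectorized alternative for simple arithmetic.
vectorized = df + 10

print(shifted["A"].tolist())     # [11, 12, 13]
print(vectorized["A"].tolist())  # [11, 12, 13]
```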
“How would you convert a column A of strings like ["10", "20", "30"] into integers?”
“Use df["A"].astype(int) to convert the column to integers.”
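.astype(int) raises an error if any string is not a valid integer. A possible fallback, sketched here with a made-up "n/a" entry, is pd.to_numeric with errors="coerce", which turns unparseable values into NaN instead of failing.

```python
import pandas as pd

clean = pd.Series(["10", "20", "30"])
as_int = clean.astype(int)
print(as_int.tolist())  # [10, 20, 30]

messy = pd.Series(["10", "n/a", "30"])
# messy.astype(int) would raise ValueError because "n/a" is not a number.
as_num = pd.to_numeric(messy, errors="coerce")  # unparseable entries become NaN
print(as_num.isna().tolist())  # [False, True, False]
```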
“How do you remove rows with missing values in a DataFrame?”
“Use df.dropna() to remove rows containing NaN values.”
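A small sketch of how the default compares with the subset and how parameters of dropna(). The data is invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

any_missing = df.dropna()            # drop rows with a NaN in any column
only_a = df.dropna(subset=["A"])     # consider only column A when deciding
all_missing = df.dropna(how="all")   # drop a row only if every value is NaN

print(any_missing.index.tolist())    # [0]
print(only_a.index.tolist())         # [0, 2]
print(all_missing.index.tolist())    # [0, 1, 2]
```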
“What would be the result of the following?
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df["A"].apply(lambda x: x*2)
”
“It will double each value in column A and return a new Series with the values [2, 4, 6]; column B and df itself are unchanged.”
“How do you apply a function to a column and return a transformed result?”
“Use the .apply() method on a column, e.g., df["A"].apply(func).”
“What is the difference between .replace() and .str.replace()?”
“.replace() swaps whole values that match exactly (of any type, in a Series or DataFrame) unless regex=True is passed, while .str.replace() replaces a substring or pattern inside each string of a Series.”
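The difference is easiest to see on a value that contains the search text without being equal to it. A minimal sketch with invented data:

```python
import pandas as pd

s = pd.Series(["cat", "catalog", "dog"])

# .replace() matches whole values, so "catalog" is left alone.
whole = s.replace("cat", "feline")

# .str.replace() matches substrings, so "catalog" is changed too.
sub = s.str.replace("cat", "feline", regex=False)

print(whole.tolist())  # ['feline', 'catalog', 'dog']
print(sub.tolist())    # ['feline', 'felinealog', 'dog']
```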
“How can you apply a custom function to every element in a Series?”
“Use .map() to apply a function element-wise in a Series.”