Pandas Flashcards
Which pandas indexer is used to subset rows by index label?
loc
loc accesses a group of rows and columns by label or by a boolean array.
From what number does Python start counting row positions?
From 0
Which indexer is used to get the second row in a DataFrame?
iloc, as in df.iloc[1]
What does using -1 with iloc do?
Gets the last row
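A minimal sketch of the loc and iloc cards above, assuming a small made-up DataFrame (the names and values are illustrative only). String index labels are used so that labels and positions cannot be confused.

```python
import pandas as pd

# Hypothetical data; the index labels are strings so labels and positions differ.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 45, 27]},
    index=["a", "b", "c"],
)

by_label = df.loc["b"]     # row whose index label is "b"
second_row = df.iloc[1]    # second row, because positions start at 0
last_row = df.iloc[-1]     # -1 counts back from the end
```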
What syntax selects all rows when subsetting columns with loc or iloc?
Colon (:)
A bare colon in the row position refers to all rows, as in df.loc[:, columns].
How do you subset a column by label using loc?
df.loc[:, ['column_name']]
How can you select the last column using iloc?
df.iloc[:, -1]
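The column-subsetting cards above can be sketched as follows, assuming a made-up DataFrame whose column names are purely illustrative. The colon in the row position keeps every row.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"country": ["Chile", "Ghana"], "year": [2007, 2007], "pop": [16, 22]}
)

one_column = df.loc[:, ["country"]]   # label-based; a list keeps the result a DataFrame
last_column = df.iloc[:, -1]          # position-based; -1 is the last column (a Series)
```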
How do you calculate the average life expectancy by year?
Group the data by year with groupby, then calculate the mean of the 'lifeExp' column
What method can be used to flatten the index of a grouped result back into regular columns?
reset_index
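A minimal sketch of the split-and-average step followed by reset_index. The column name 'year' and the values are assumptions made for illustration; only 'lifeExp' appears in the cards above.

```python
import pandas as pd

# Hypothetical data; 'year' is an assumed column name.
df = pd.DataFrame(
    {"year": [1952, 1952, 2007, 2007], "lifeExp": [40.0, 60.0, 70.0, 80.0]}
)

# Split by year, then average lifeExp within each group (Series indexed by year).
mean_by_year = df.groupby("year")["lifeExp"].mean()

# Move the 'year' index back into an ordinary column.
flat = mean_by_year.reset_index()
```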
What method is used to get counts of unique values on a pandas Series?
value_counts (nunique returns only the number of distinct values)
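Two Series methods are easy to mix up here: value_counts returns one count per distinct value, while nunique returns only how many distinct values there are. A short sketch on made-up data:

```python
import pandas as pd

# Hypothetical data for illustration.
colors = pd.Series(["red", "blue", "red", "green", "red", "blue"])

counts = colors.value_counts()   # one count per distinct value
n_distinct = colors.nunique()    # a single number: how many distinct values
```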
What is a histogram?
Vertical bar chart of frequencies
What type of graph is a frequency polygon?
Line graph of frequencies
What does an ogive represent?
Line graph of cumulative frequencies
What type of chart provides proportional representation for categories of a whole?
Pie Chart
What are the methods of visual presentation of data?
- Table
- Graphs
- Pie Chart
- Multiple bar chart
- Simple pictogram
What is a frequency distribution?
A summary of how often different values occur in a dataset.
What is the cumulative frequency?
The running total of frequencies up to a certain class interval.
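As a sketch of the two definitions above, a frequency distribution and its cumulative frequencies can be computed with pandas. The scores and class boundaries below are made up for illustration.

```python
import pandas as pd

# Hypothetical scores and class boundaries.
scores = pd.Series([52, 67, 71, 58, 83, 95, 64, 77])
bins = [50, 60, 70, 80, 90, 100]

# Assign each score to a class interval; right=False gives [50, 60), [60, 70), ...
classes = pd.cut(scores, bins=bins, right=False)

# Frequency distribution: how often values fall in each class, in class order.
freq = classes.value_counts().sort_index()

# Cumulative frequency: running total of the class frequencies.
cum_freq = freq.cumsum()
```

Plotting cum_freq as a line gives an ogive; plotting freq as a line gives a frequency polygon.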
What does a Pareto chart display?
Frequency of categories in descending order
What is the principle of excellent graphs regarding data distortion?
The graph should not distort the data
What should the scale on the vertical axis of a graph begin with?
Zero
What is considered ‘chart junk’?
Unnecessary adornments in a graph
True or False: All axes in a graph should be properly labeled.
True
Which graph should be used to represent a given set of data?
The simplest possible graph
What is a graphical error related to compressing the vertical axis?
Misleading representation of data
Fill in the blank: A frequency polygon is created by plotting the __________ against the class midpoints.
Frequency
What is the purpose of a scatter plot?
To show the relationship between two variables
What should a good presentation of data avoid?
Graphical errors
What is the command to install the pandas library using pip?
pip install pandas
True or False: Pandas is primarily used for data manipulation and analysis in Python.
True
Fill in the blank: To load a CSV file into a pandas DataFrame, you would use the function ___.
pd.read_csv()
What is the primary data structure used in pandas?
DataFrame
How do you access the first five rows of a DataFrame called ‘df’?
df.head()
What method would you use to view the last three rows of a DataFrame?
df.tail(3)
True or False: You can access a column in a DataFrame using the dot notation.
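True, provided the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.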
What is the command to access the ‘Age’ column from a DataFrame named ‘df’?
df['Age']
What syntax would you use to select rows based on a condition?
df[df['column_name'] condition], known as boolean indexing
How can you subset a DataFrame to include only rows where the ‘Salary’ is greater than 50000?
df[df['Salary'] > 50000]
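A minimal sketch of the Salary filter above on a made-up DataFrame. The inner comparison builds a boolean Series, and indexing with it keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cy"], "Salary": [42000, 58000, 50000]}
)

mask = df["Salary"] > 50000   # boolean Series, one value per row
high_earners = df[mask]       # keeps only rows where the mask is True
```

Note that a salary of exactly 50000 is excluded because the comparison is strict.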
What does the .iloc method do in pandas?
It allows indexing and selecting by integer position.
How do you select the first row of a DataFrame using .iloc?
df.iloc[0]
True or False: You can slice a DataFrame using .loc and .iloc.
True
What is the syntax to access a specific cell at row index 2 and column ‘Name’?
df.at[2, 'Name']
Fill in the blank: To select multiple columns, you can pass a list to the DataFrame like this: df[___].
['column1', 'column2']
What is the command to load an Excel file into a pandas DataFrame?
pd.read_excel()
How do you rename a column in a DataFrame?
df.rename(columns={'old_name': 'new_name'}, inplace=True)
True or False: Pandas can handle missing data.
True
What command would you use to check for missing values in a DataFrame?
df.isnull().sum()
What method is used to drop rows with missing values?
df.dropna()
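The two missing-data cards above can be sketched on a made-up DataFrame. None is used here to stand in for missing entries; pandas treats it as missing in both numeric and object columns.

```python
import pandas as pd

# Hypothetical data with one missing Age and one missing City.
df = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cy"],
        "Age": [31, None, 27],
        "City": ["Oslo", "Lima", None],
    }
)

missing_per_column = df.isnull().sum()   # count of missing values in each column
complete_rows = df.dropna()              # new DataFrame without rows that have any missing value
```

dropna returns a new DataFrame by default, so df itself is unchanged.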
How do you select rows with index labels 1 to 3 using .loc?
df.loc[1:3] (both endpoints are included)
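Slicing behaves differently for the two indexers: a loc slice includes both endpoint labels, while an iloc slice excludes the stop position, like ordinary Python slicing. A short sketch on made-up data with the default integer index:

```python
import pandas as pd

# Hypothetical data; the default index is 0, 1, 2, 3, 4.
df = pd.DataFrame({"x": [10, 20, 30, 40, 50]})

by_label = df.loc[1:3]       # labels 1, 2 and 3 (end label included)
by_position = df.iloc[1:3]   # positions 1 and 2 (stop position excluded)
```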
What does the .shape attribute return?
It returns a tuple (number of rows, number of columns) giving the dimensions of the DataFrame.
Fill in the blank: To filter a DataFrame based on multiple conditions, you can use ___ operators.
logical (& for and, | for or, ~ for not)
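A minimal sketch of filtering on several conditions, using made-up data. Each condition needs its own parentheses because & and | bind more tightly than comparison operators.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {
        "Department": ["Sales", "Sales", "IT"],
        "Salary": [40000, 60000, 70000],
    }
)

# Both conditions must hold.
sales_and_high = df[(df["Department"] == "Sales") & (df["Salary"] > 50000)]

# At least one condition must hold.
it_or_low = df[(df["Department"] == "IT") | (df["Salary"] < 45000)]
```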
What is the syntax to select the ‘Name’ and ‘Age’ columns from a DataFrame?
df[['Name', 'Age']]
True or False: You can use the .query() method to filter DataFrames using a query string.
True
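As a sketch of the query card above, the filter is written as a string in which column names appear as plain identifiers. The data is made up for illustration.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {
        "Department": ["Sales", "Sales", "IT"],
        "Salary": [40000, 60000, 70000],
    }
)

# Column names are used directly; string values need their own quotes.
result = df.query("Salary > 50000 and Department == 'Sales'")
```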
What do you use to reset the index of a DataFrame?
df.reset_index()
What function is used to concatenate two DataFrames?
pd.concat()
How can you access a specific row by its index using .loc?
df.loc[index]
What is the difference between .loc and .iloc?
.loc is label-based, while .iloc is position-based.
Fill in the blank: The command to save a DataFrame to a CSV file is df.to___('filename.csv').
csv
What is the method to group data in a DataFrame?
df.groupby()
How do you access rows where the ‘Department’ is ‘Sales’?
df[df['Department'] == 'Sales']
True or False: You can use .apply() to apply a function along an axis of the DataFrame.
True
What is the purpose of the .sort_values() method?
It sorts the DataFrame by the specified column(s).
How do you select a specific subset of rows and columns in a DataFrame?
df.loc[row_indices, ['column1', 'column2']]
What is the command to get descriptive statistics of a DataFrame?
df.describe()
Fill in the blank: You can create a new column in a DataFrame by assigning to df['___'].
new_column
What does the .info() method provide?
It provides a summary of the DataFrame including the data types and non-null counts.
How do you get the unique values in a column?
df['column_name'].unique()
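Note that unique returns an array of the distinct values rather than a filtered DataFrame. A short sketch on made-up data, with drop_duplicates shown as one way to keep whole rows instead:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Oslo"], "visits": [1, 2, 3]})

distinct_values = df["city"].unique()            # array of distinct values, in order of appearance
first_rows = df.drop_duplicates(subset="city")   # keeps the first row for each city
```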
What command would you use to drop a specific column from a DataFrame?
df.drop('column_name', axis=1, inplace=True)