Pandas Flashcards

1
Q

What indexer is used to subset rows by index label in pandas?

A

loc

loc is used to access a group of rows and columns by labels or a boolean array.

2
Q

From what number does Python (and pandas positional indexing) start counting rows?

A

From 0

3
Q

What indexer is used to get the second row of a DataFrame by position?

A

iloc, as in df.iloc[1]

4
Q

What does using -1 with iloc do?

A

Gets the last row

5
Q

What syntax is used with loc and iloc to subset columns in pandas?

A

Colon (:)

A colon is used to refer to all rows when subsetting columns.

6
Q

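How do you subset the first column using loc?

A

df.loc[:, [columns]]

Because loc is label-based, columns here stands for the label of the first column (for example, df.columns[0]).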

7
Q

How can you select the last column using iloc?

A

-1 in the column position: df.iloc[:, -1]

8
Q

How do you calculate the average life expectancy by year?

A

Split the data by year (groupby) and calculate the mean of the ‘lifeExp’ column
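
The split-then-average step can be sketched with groupby. This is a minimal illustration, not a reference solution: only the 'lifeExp' column name comes from the card, while the 'year' column name and all values are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical data: 'year' and the values are invented for illustration;
# only the 'lifeExp' column name appears on the card.
df = pd.DataFrame({
    "year": [1952, 1952, 2007, 2007],
    "lifeExp": [30.0, 50.0, 60.0, 80.0],
})

# Split the rows by year, then take the mean of 'lifeExp' within each group.
mean_life_exp = df.groupby("year")["lifeExp"].mean()
print(mean_life_exp)
```

The result is a Series indexed by year, with one mean per group.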

9
Q

What method can be used to flatten a DataFrame after a grouped calculation, moving the index back into regular columns?

A

reset_index
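
One way to picture "flattening" is to take a grouped result, whose group keys sit in the index, and turn it back into an ordinary table. The sketch below assumes a hypothetical DataFrame with 'year' and 'lifeExp' columns; the data is invented.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "year": [1952, 1952, 2007],
    "lifeExp": [30.0, 50.0, 70.0],
})

# After groupby, 'year' becomes the index of the resulting Series.
grouped = df.groupby("year")["lifeExp"].mean()

# reset_index moves 'year' back into a regular column and
# restores a default 0..n-1 integer index.
flat = grouped.reset_index()
print(flat)
```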

10
Q

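What method is used to get a count of unique values on a pandas Series?

A

nunique

nunique returns the number of distinct values; to count how often each value occurs, use value_counts.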
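
Counting distinct values and counting occurrences of each value are easy to confuse, so a small comparison may help. The Series below is made up for illustration.

```python
import pandas as pd

# Hypothetical Series for illustration.
s = pd.Series(["a", "b", "a", "c", "a"])

# Number of distinct values in the Series.
n_distinct = s.nunique()

# How many times each distinct value occurs.
counts = s.value_counts()

print(n_distinct)
print(counts)
```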

11
Q

What is a histogram?

A

Vertical bar chart of frequencies

12
Q

What type of graph is a frequency polygon?

A

Line graph of frequencies

13
Q

What does an ogive represent?

A

Line graph of cumulative frequencies

14
Q

What type of chart provides proportional representation for categories of a whole?

A

Pie Chart

15
Q

What are the methods of visual presentation of data?

A
  • Table
  • Graphs
  • Pie Chart
  • Multiple bar chart
  • Simple pictogram
16
Q

What is a frequency distribution?

A

A summary of how often different values occur in a dataset.

17
Q

What is the cumulative frequency?

A

The running total of frequencies up to a certain class interval.
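
A frequency distribution and its running total can be sketched in pandas as follows. As a simplifying assumption, each distinct value is treated as its own class rather than grouping values into class intervals; the scores are invented.

```python
import pandas as pd

# Hypothetical observations for illustration.
scores = pd.Series([1, 2, 2, 3, 3, 3])

# Frequency distribution: how often each value occurs, ordered by value.
freq = scores.value_counts().sort_index()

# Cumulative frequency: running total of the frequencies.
cum_freq = freq.cumsum()

print(freq)
print(cum_freq)
```

The last cumulative frequency equals the total number of observations.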

18
Q

What does a Pareto chart display?

A

Frequency of categories in descending order

19
Q

What is the principle of excellent graphs regarding data distortion?

A

The graph should not distort the data

20
Q

What should the scale on the vertical axis of a graph begin with?

A

Zero

21
Q

What is considered ‘chart junk’?

A

Unnecessary adornments in a graph

22
Q

True or False: All axes in a graph should be properly labeled.

A

True

23
Q

What is the simplest possible graph used for?

A

To represent a given set of data

24
Q

What is a graphical error related to compressing the vertical axis?

A

Misleading representation of data

25
Q

Fill in the blank: The method to create a frequency polygon is to plot the __________ against the class intervals.

A

Frequency

26
Q

What is the purpose of a scatter plot?

A

To show the relationship between two variables

27
Q

What should a good presentation of data avoid?

A

Graphical errors

28
Q

What is the command to install the pandas library using pip?

A

pip install pandas

29
Q

True or False: Pandas is primarily used for data manipulation and analysis in Python.

A

True

30
Q

Fill in the blank: To load a CSV file into a pandas DataFrame, you would use the function ___.

A

pd.read_csv()

31
Q

What is the primary data structure used in pandas?

A

The DataFrame, a two-dimensional labeled table whose columns are Series.

32
Q

How do you access the first five rows of a DataFrame called ‘df’?

A

df.head()

33
Q

What method would you use to view the last three rows of a DataFrame?

A

df.tail(3)

34
Q

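True or False: You can access a column in a DataFrame using the dot notation.

A

True (for example, df.Age), provided the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.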

35
Q

What is the command to access the ‘Age’ column from a DataFrame named ‘df’?

A

df['Age']

36
Q

What syntax would you use to select rows based on a condition?

A

df[df['column_name'] condition] (boolean indexing)

37
Q

How can you subset a DataFrame to include only rows where the ‘Salary’ is greater than 50000?

A

df[df['Salary'] > 50000]
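
The filter works in two steps: the comparison builds a boolean mask, and indexing with that mask keeps only the rows marked True. A minimal sketch, assuming a hypothetical DataFrame with 'Name' and 'Salary' columns and invented values:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Salary": [45000, 52000, 80000],
})

# Step 1: the comparison yields one True/False value per row.
mask = df["Salary"] > 50000

# Step 2: indexing with the mask keeps only the True rows.
high_earners = df[mask]
print(high_earners)
```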

38
Q

What does the .iloc indexer do in pandas?

A

It allows indexing and selecting by integer position.

39
Q

How do you select the first row of a DataFrame using .iloc?

A

df.iloc[0]

40
Q

True or False: You can slice a DataFrame using .loc and .iloc.

A

True

41
Q

What is the syntax to access a specific cell at row index 2 and column ‘Name’?

A

df.at[2, 'Name']

42
Q

Fill in the blank: To select multiple columns, you can pass a list to the DataFrame like this: df[___].

A

['column1', 'column2']

43
Q

What is the command to load an Excel file into a pandas DataFrame?

A

pd.read_excel()

44
Q

How do you rename a column in a DataFrame?

A

df.rename(columns={'old_name': 'new_name'}, inplace=True)

45
Q

True or False: Pandas can handle missing data.

A

True

46
Q

What command would you use to check for missing values in a DataFrame?

A

df.isnull().sum()

47
Q

What method is used to drop rows with missing values?

A

df.dropna()
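
Checking for and dropping missing values can be sketched together. The DataFrame below is hypothetical; note that dropna returns a new DataFrame and leaves the original unchanged unless the result is assigned back.

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", None],
    "Age": [30.0, None, 25.0],
})

# Count missing values per column.
missing_per_column = df.isnull().sum()

# Keep only rows with no missing values; df itself is not modified.
complete_rows = df.dropna()

print(missing_per_column)
print(complete_rows)
```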

48
Q

How do you select rows with index labels 1 to 3 using .loc?

A

df.loc[1:3]
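
A detail worth keeping in mind: a label slice with .loc includes the end label, while a position slice with .iloc excludes the end position. A small sketch with an invented DataFrame that uses the default integer index:

```python
import pandas as pd

# Hypothetical data with the default index 0..4.
df = pd.DataFrame({"Name": ["Ann", "Bob", "Cy", "Di", "Ed"]})

# Label-based slice: both endpoints are included (labels 1, 2, 3).
by_label = df.loc[1:3]

# Position-based slice: the end is excluded (positions 1, 2).
by_position = df.iloc[1:3]

print(by_label)
print(by_position)
```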

49
Q

What does the .shape attribute return?

A

It returns a tuple (number of rows, number of columns) giving the dimensions of the DataFrame.

50
Q

Fill in the blank: To filter a DataFrame based on multiple conditions, you can use ___ operators.

A

logical (& for and, | for or, ~ for not), with each condition wrapped in parentheses
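
Combining conditions can be sketched with the element-wise operators & and |. The parentheses around each comparison are needed because these operators bind more tightly than == and >. The column names and values below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "HR"],
    "Salary": [40000, 60000, 70000],
})

# Rows where both conditions hold.
sales_and_high = df[(df["Department"] == "Sales") & (df["Salary"] > 50000)]

# Rows where at least one condition holds.
sales_or_high = df[(df["Department"] == "Sales") | (df["Salary"] > 50000)]

print(sales_and_high)
print(sales_or_high)
```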

51
Q

What is the syntax to select the ‘Name’ and ‘Age’ columns from a DataFrame?

A

df[['Name', 'Age']]

52
Q

True or False: You can use the .query() method to filter DataFrames using a query string.

A

True

53
Q

What do you use to reset the index of a DataFrame?

A

df.reset_index()

54
Q

What function is used to concatenate two DataFrames?

A

pd.concat()

55
Q

How can you access a specific row by its index using .loc?

A

df.loc[index]

56
Q

What is the difference between .loc and .iloc?

A

.loc is label-based, while .iloc is position-based.
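
The distinction is easiest to see when the index labels are not integers. A minimal sketch with an invented DataFrame whose index holds string labels:

```python
import pandas as pd

# Hypothetical data with string index labels.
df = pd.DataFrame({"Age": [30, 41, 25]}, index=["ann", "bob", "cy"])

# Label-based: row labeled 'bob', column labeled 'Age'.
by_label = df.loc["bob", "Age"]

# Position-based: second row, first column (counting from 0).
by_position = df.iloc[1, 0]

# Negative positions count from the end, so -1 is the last row.
last_row = df.iloc[-1]

print(by_label, by_position)
print(last_row)
```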

57
Q

Fill in the blank: The command to save a DataFrame to a CSV file is df.to___('filename.csv').

A

_csv, giving df.to_csv('filename.csv')

58
Q

What is the method to group data in a DataFrame?

A

df.groupby()

59
Q

How do you access rows where the ‘Department’ is ‘Sales’?

A

df[df['Department'] == 'Sales']

60
Q

True or False: You can use .apply() to apply a function along an axis of the DataFrame.

A

True

61
Q

What is the purpose of the .sort_values() method?

A

It sorts the DataFrame by the specified column(s).

62
Q

How do you select a specific subset of rows and columns in a DataFrame?

A

df.loc[row_indices, ['column1', 'column2']]

63
Q

What is the command to get descriptive statistics of a DataFrame?

A

df.describe()

64
Q

Fill in the blank: You can create a new column in a DataFrame by assigning to df[‘___’].

A

new_column

65
Q

What does the .info() method provide?

A

It provides a summary of the DataFrame including the data types and non-null counts.

66
Q

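How do you get the unique values in a column of a DataFrame?

A

df['column_name'].unique()

This returns an array of the distinct values. To keep only the rows with distinct values in that column, use df.drop_duplicates(subset='column_name').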

67
Q

What command would you use to drop a specific column from a DataFrame?

A

df.drop('column_name', axis=1, inplace=True)