Introduction to Data Analysis using Python Flashcards
This section covers base Python, list and dict comprehensions, and pandas basics for data manipulation.
What is data analysis in Python?
The process of inspecting, cleaning, and modeling data to extract insights.
Python offers built-in functions and libraries like pandas and numpy.
Name two built-in Python functions useful for data analysis.
- sum()
- len()
Other useful built-ins: min(), max(), sorted(). Note that count() is a list and string method (e.g. values.count(3)), not a built-in function.
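A minimal sketch of these built-ins on a small list. The temperature values are invented for illustration:

```python
# Hypothetical sample data: daily temperatures (invented for this card).
temps = [21.5, 19.0, 23.5, 19.0, 22.0]

total = sum(temps)                 # add up all values
n = len(temps)                     # number of values
average = total / n                # mean computed from sum() and len()
lowest, highest = min(temps), max(temps)
ordered = sorted(temps)            # new sorted list; temps is unchanged

# count() is a method called on the list, not a built-in function.
times_19 = temps.count(19.0)
```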
True or False:
Base Python has built-in support for data frames.
False.
Data frames require external libraries like pandas.
Fill in the blank:
The function ____() returns a sequence of numbers.
range()
Example: list(range(5)) gives [0, 1, 2, 3, 4]. In Python 3, range(5) itself is a lazy range object, not a list.
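A short sketch of the three ways to call range(), wrapped in list() so the numbers are visible:

```python
# range(stop): counts from 0 up to, but not including, stop.
first_five = list(range(5))

# range(start, stop, step): every second number from 0 to 8.
evens = list(range(0, 10, 2))

# A negative step counts down.
countdown = list(range(3, 0, -1))
```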
Which module in Python provides mathematical functions for data analysis?
math
Example: math.sqrt(25) → 5.0 (the result is always a float).
What is list comprehension in Python?
A concise way to create lists using a single line of code.
Example: [x**2 for x in range(5)].
How do you create a dictionary comprehension?
{key: value for key, value in iterable}
Example: {x: x**2 for x in range(3)}.
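For comparison, a sketch of both comprehensions next to the loop they replace:

```python
# List comprehension: squares of 0..4.
squares = [x**2 for x in range(5)]

# The equivalent traditional loop.
squares_loop = []
for x in range(5):
    squares_loop.append(x**2)

# Dictionary comprehension: map each number to its square.
square_map = {x: x**2 for x in range(3)}
```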
True or False:
List comprehensions are usually faster than equivalent for loops that call append().
True.
They avoid the repeated lookup and call of append(), although the gain depends on the workload.
Fill in the blank:
[x for x in range(5) if x % 2 == 0] creates a list of ____ numbers.
Even
Output: [0, 2, 4].
How do you modify a list comprehension to include an else statement?
Put the if-else expression before the for clause.
Example: [x if x % 2 == 0 else "odd" for x in range(5)].
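A sketch contrasting the two positions of if. A trailing if filters items out, while if-else before for keeps every item and transforms it:

```python
# Trailing "if": a filter, so odd numbers are dropped.
evens_only = [x for x in range(5) if x % 2 == 0]

# "if-else" before "for": a conditional expression, so every item is kept
# and odd numbers are replaced by the string "odd".
labelled = [x if x % 2 == 0 else "odd" for x in range(5)]
```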
What function opens a file in Python?
open()
Example: f = open('data.txt', 'r').
How do you read all lines of a file at once?
.readlines()
Returns a list of lines.
True or False:
The with statement automatically closes a file.
True.
Example: with open('file.txt') as f: closes f when the block ends, even if an error occurs inside it.
Fill in the blank:
To write to a file, use the mode ______.
w
"w" creates a new file or overwrites an existing one.
What does file.write("Hello") do?
Writes "Hello" to the file.
Does not automatically add a newline.
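A minimal sketch of writing and then reading a file. It uses a temporary directory and a hypothetical file name so it runs without touching real data:

```python
import os
import tempfile

# Hypothetical file inside a temporary directory (assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "data.txt")

# Mode "w" creates the file (or overwrites it); "with" closes it afterwards.
with open(path, "w") as f:
    f.write("Hello")          # no newline is added automatically
    f.write("World\n")        # so this lands on the same line
    f.write("second line\n")

# Mode "r" reads; readlines() returns a list with one string per line.
with open(path, "r") as f:
    lines = f.readlines()

file_closed = f.closed        # the with block has already closed the file
```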
Which module helps read CSV files in base Python?
csv
Example: import csv.
How do you read a CSV file in Python?
Using csv.reader()
Example: csv.reader(file) yields each row as a list of strings.
True or False:
CSV files can only contain numerical data.
False.
They can contain text, dates, and more.
Fill in the blank:
csv.writer() is used to ____ data to a CSV file.
Write
Example: writer.writerow(['Name', 'Age']).
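A sketch of a full write-then-read round trip with the csv module. The file path and the rows are invented for illustration:

```python
import csv
import os
import tempfile

# Hypothetical CSV file in a temporary directory (assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "people.csv")

# newline="" is what the csv documentation recommends when opening CSV files.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Age"])   # header row
    writer.writerow(["Ada", 36])
    writer.writerow(["Grace", 45])

with open(path, newline="") as f:
    rows = list(csv.reader(f))         # every value comes back as a string
```

Note that the ages are read back as the strings "36" and "45", so convert them with int() before doing arithmetic.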
Which library makes reading CSV files easier than using csv?
pandas
Example: df = pd.read_csv('data.csv').
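A small sketch of pd.read_csv(). To stay self-contained it reads from an in-memory string instead of a file on disk; the CSV text is invented:

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory (assumption for this sketch).
csv_text = "name,age\nAda,36\nGrace,45\n"

# read_csv accepts a path or any file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

ages = df["age"].tolist()   # numeric columns are parsed as numbers
```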
What is Pandas?
A Python library for data manipulation and analysis.
Provides DataFrame and Series structures.
How do you import Pandas?
import pandas as pd
pd is the conventional alias.
What is a DataFrame in Pandas?
A 2D labeled data structure.
Similar to a table in SQL or Excel.
True or False:
Pandas requires NumPy to function.
True.
Pandas is built on top of NumPy.
Fill in the blank:
A Pandas Series is similar to a ______.
Column in a spreadsheet
Series is a 1D labeled array.
How do you access the first five rows of a DataFrame?
.head()
Example: df.head(). To get a specific number of rows, pass it as an argument, e.g. df.head(7).
What does df.loc[2] return?
The row whose index label is 2.
.loc[] is label-based.
True or False:
.iloc[] is label-based indexing.
False.
.iloc[] is position-based.
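A sketch of the difference between .loc[] and .iloc[]. The index labels are deliberately out of order so that label 2 and position 2 point to different rows; the data is invented:

```python
import pandas as pd

# Hypothetical frame with a shuffled integer index (assumption for this sketch).
df = pd.DataFrame(
    {"name": ["Ada", "Grace", "Linus"], "age": [36, 45, 28]},
    index=[2, 0, 1],
)

by_label = df.loc[2]       # row whose index label is 2 (the first row here)
by_position = df.iloc[2]   # third row by position, whatever its label
first_two = df.head(2)     # first two rows in stored order
```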
How do you filter rows where age > 30?
df[df['age'] > 30]
Boolean filtering.
Fill in the blank:
df['column'] accesses a ______.
Series
Example: df['name'].
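A sketch of column access and Boolean filtering together, using invented data:

```python
import pandas as pd

# Hypothetical sample data (assumption for this sketch).
df = pd.DataFrame({"name": ["Ada", "Grace", "Linus"], "age": [36, 45, 28]})

ages = df["age"]          # a single column is a Series
mask = df["age"] > 30     # Boolean Series: True where the condition holds
older = df[mask]          # keeps only the rows where mask is True
```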
How do you check for missing values?
.isnull()
Returns True for missing values.
What does .isnull().sum() do?
Counts the missing values in each column.
What does df.dropna() do?
Removes rows with missing values.
Default: drops any row with at least one missing value.
True or False:
.fillna() removes NaN values.
False.
.fillna() replaces NaNs with a specified value.
Fill in the blank:
.dropna(how='all') removes rows where ____ values are missing.
All
Keeps rows with at least one non-null value.
How do you replace missing values with the column mean?
df['col'] = df['col'].fillna(df['col'].mean())
fillna() returns a new Series, so assign the result back. Useful for numerical data.
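A sketch that runs the missing-value methods from these cards on one small frame. The column names and values are invented:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in both columns (assumption for this sketch).
df = pd.DataFrame({
    "col": [1.0, np.nan, 3.0, np.nan],
    "other": ["a", "b", None, None],
})

missing_per_column = df.isnull().sum()   # count of missing values per column
dropped_any = df.dropna()                # drops rows with any missing value
dropped_all = df.dropna(how="all")       # drops only rows that are fully missing

# mean() skips NaN, so the mean of "col" is (1.0 + 3.0) / 2.
filled = df["col"].fillna(df["col"].mean())
```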
What does df.groupby('column') do?
Groups rows by a column’s values.
Used for aggregating data.
Which method calculates the mean for each group?
.mean()
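Example: df.groupby('gender').mean(numeric_only=True). In pandas 2.0 and later, numeric_only=True is needed when the frame also has non-numeric columns.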
True or False:
Pivot tables are a form of aggregation.
True.
They reshape and summarize data.
How do you count occurrences of unique values?
.value_counts()
Example: df['city'].value_counts().
Fill in the blank:
.agg({'col': 'sum'}) applies ______.
A specific function to a column
Example: df.groupby('category').agg({'sales': 'sum'}).
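A sketch of the aggregation methods from these cards on one invented sales table. The pivot_table call is included only to show that it produces the same totals as the groupby:

```python
import pandas as pd

# Hypothetical sales data (assumption for this sketch).
df = pd.DataFrame({
    "category": ["A", "B", "A", "B", "A"],
    "sales": [10, 20, 30, 60, 50],
})

mean_sales = df.groupby("category")["sales"].mean()          # mean per group
total_sales = df.groupby("category").agg({"sales": "sum"})   # sum per group
counts = df["category"].value_counts()                       # rows per category

# A pivot table with aggfunc="sum" summarizes the same way.
pivot = df.pivot_table(values="sales", index="category", aggfunc="sum")
```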