Data Types & Structures Flashcards
This section covers common data types and data structures in Python. Many of the concepts also apply to other programming languages.
What is a data type?
A classification that specifies the type of data a variable can hold.
Common types include integers, floats, strings, and booleans.
List the four primary data types in Python.
- Integer
- Float
- String
- Boolean
Python is dynamically typed, meaning a variable can be reassigned to a value of a different type.
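A minimal sketch of dynamic typing; the variable name `value` is only illustrative.

```python
# The same name can be rebound to values of different types.
value = 5
first_type = type(value).__name__    # "int"

value = "five"
second_type = type(value).__name__   # "str"

print(first_type, second_type)
```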
What is the difference between an integer and a float?
An integer is a whole number, while a float is a number with a fractional (decimal) part.
Example: 5 (int) vs. 5.2 (float).
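A short sketch of how the two types behave in arithmetic; the variable names are illustrative.

```python
whole = 5          # int
fractional = 5.2   # float

quotient = 5 / 2   # true division always returns a float: 2.5
floored = 5 // 2   # floor division of two ints returns an int: 2
truncated = int(5.9)  # int() drops the fractional part: 5

print(type(whole).__name__, type(fractional).__name__)
```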
True or False:
A string can contain both letters and numbers.
True.
Strings are sequences of characters, including letters, numbers, and symbols.
Define a list in Python.
A list is an ordered, mutable collection of elements.
Example: [1, "apple", 3.14].
Mutable means that an object can be modified after it is created.
What is the key difference between a tuple and a list?
A tuple is immutable, while a list is mutable.
Tuples use () and lists use [].
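The difference in mutability can be sketched as follows; the sample values are illustrative.

```python
items = [1, "apple", 3.14]
items[1] = "banana"   # allowed: lists are mutable
items.append(42)

point = (1, "apple", 3.14)
try:
    point[1] = "banana"   # not allowed: tuples are immutable
    tuple_error = None
except TypeError as exc:
    tuple_error = type(exc).__name__

print(items, tuple_error)
```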
Which data structure in Python allows key-value pairs?
Dictionary (dict).
Example: {"name": "Alice", "age": 25}.
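A brief sketch of common dictionary operations; the "city" key and its value are made up for illustration.

```python
person = {"name": "Alice", "age": 25}

name = person["name"]          # look up a value by key
person["age"] = 26             # update an existing key
person["city"] = "Nairobi"     # add a new key (hypothetical value)
email = person.get("email")    # .get() returns None for a missing key

print(person)
```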
Fill in the blank:
A _______ is a collection of unique elements in Python.
Set.
Example: {1, 2, 3, 3} → {1, 2, 3} (duplicates removed).
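A small sketch of duplicate removal with sets; the sample data is illustrative.

```python
numbers = {1, 2, 3, 3}
count = len(numbers)   # the duplicate 3 is stored only once

# Converting a list to a set is a common way to drop duplicates.
unique_values = set([10, 20, 20, 30, 10])

print(numbers, unique_values)
```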
What is a DataFrame in pandas?
A two-dimensional, tabular data structure.
Think of it as an Excel spreadsheet in Python.
Which pandas function creates a DataFrame?
pd.DataFrame()
Accepts data such as lists, dictionaries, or NumPy arrays.
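One possible way to build a DataFrame is from a dictionary of lists, where each key becomes a column. The column names and values below are illustrative.

```python
import pandas as pd

data = {"name": ["Alice", "Bob"], "age": [25, 30]}
df = pd.DataFrame(data)

print(df)
```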
Define a Series in pandas.
A one-dimensional labeled array.
Example: pd.Series([1, 2, 3]).
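A sketch showing what "labeled" means in practice; the index labels "a", "b", "c" are assumptions for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

second = s["b"]             # access by label
labels = s.index.tolist()   # the labels attached to each value

print(s)
```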
True or False:
NumPy arrays are more memory-efficient than Python lists.
True.
NumPy arrays store elements of a single type in one contiguous block of memory, which makes them more compact and faster for numerical operations.
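A rough comparison, assuming CPython, where every list element is a separate integer object. Exact byte counts vary by Python version, so only the array size is stated.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# List cost: the list container plus each individual int object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# Array cost of the data buffer: 1000 elements * 8 bytes each.
array_bytes = np_array.nbytes

print(list_bytes, array_bytes)
```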
What does the .dtypes attribute in pandas return?
The data types of each column in a DataFrame.
Example output: int64, float64, object.
How do you convert a column to a different data type in pandas?
Using .astype()
Example: df["age"] = df["age"].astype(int).
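A minimal sketch of inspecting and converting column types. The sample data is made up, and "int64" is passed explicitly here (instead of `int`) so the result is the same on every platform.

```python
import pandas as pd

# "age" starts out as text; "height" is already numeric.
df = pd.DataFrame({"age": ["25", "30"], "height": [1.7, 1.8]})
print(df.dtypes)

df["age"] = df["age"].astype("int64")
print(df.dtypes)
```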
Which Python module is best suited for working with large numerical data efficiently?
NumPy
It provides powerful array operations. The name NumPy is derived from "Numerical Python".
What is the main advantage of using a tuple over a list?
Tuples are generally faster and more memory-efficient.
They are immutable, making them safer for data integrity.
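A sketch of two practical consequences, assuming CPython for the size comparison. Because tuples are immutable, they can also serve as dictionary keys, which lists cannot. The names below are illustrative.

```python
import sys

coords_tuple = (1, 2, 3)
coords_list = [1, 2, 3]

tuple_size = sys.getsizeof(coords_tuple)
list_size = sys.getsizeof(coords_list)

# A tuple of hashable items can be used as a dictionary key.
labels = {coords_tuple: "point A"}

try:
    bad = {coords_list: "point B"}   # lists are unhashable
    list_key_error = None
except TypeError as exc:
    list_key_error = type(exc).__name__

print(tuple_size, list_size, list_key_error)
```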
Fill in the blank:
A _____ is a high-performance multi-dimensional array in NumPy.
ndarray (N-dimensional array).
Example: np.array([[1, 2], [3, 4]]).
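A short sketch of basic ndarray properties and element-wise operations, using the same 2x2 example.

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])

dimensions = matrix.ndim   # number of axes
shape = matrix.shape       # size along each axis
doubled = matrix * 2       # arithmetic applies element by element
total = matrix.sum()

print(dimensions, shape, total)
```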
What function checks for missing values in a pandas DataFrame?
.isnull()
Returns a Boolean mask where True indicates missing values.
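A minimal sketch of the Boolean mask and a common follow-up, counting missing values per column. The column names and values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [1.0, np.nan], "grade": ["A", None]})

mask = df.isnull()                # True where a value is missing
missing_per_column = mask.sum()   # True counts as 1

print(mask)
print(missing_per_column)
```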
How do you get the shape of a pandas DataFrame?
Using .shape
Returns a tuple (rows, columns).
What is the purpose of the .info() function in pandas?
Displays a summary of the DataFrame, including column data types, non-null counts, and memory usage.
Helps understand dataset structure quickly.
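A sketch of reading a DataFrame's structure with .shape and .info(). The sample data is made up; .info() normally prints directly, so it is written to a buffer here only to capture the text.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob", None], "age": [25, 30, 35]})

rows, columns = df.shape

buffer = io.StringIO()
df.info(buf=buffer)        # write the summary into the buffer
summary = buffer.getvalue()

print(rows, columns)
print(summary)
```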
How do you select a specific column from a pandas DataFrame?
df["column_name"]
Alternative: df.column_name (only if the column name is a valid Python identifier, e.g. it contains no spaces).
Why is df["column_name"] preferred over df.column_name?
- Works with all column names, including those that contain spaces
- Avoids conflicts with pandas attributes in case a column name matches a built-in DataFrame attribute or method
- More consistent with indexing syntax, i.e. the same logic as dict and list
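The points above can be sketched with two deliberately awkward column names (both chosen for illustration): "count" clashes with the DataFrame.count method, and "first name" contains a space.

```python
import pandas as pd

df = pd.DataFrame({"count": [1, 2], "first name": ["Ann", "Ben"]})

by_bracket = df["count"]     # the column, as a Series
by_attribute = df.count      # the DataFrame.count method, not the column

first_names = df["first name"]   # no attribute-style equivalent exists

print(type(by_bracket).__name__, callable(by_attribute))
```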
True or False:
Sets in Python maintain the order of elements.
False.
Sets are unordered collections of unique elements.
What function combines two pandas DataFrames?
- pd.concat()
- merge()
concat() stacks vertically/horizontally; merge() joins based on keys.
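A minimal sketch contrasting the two; the frames and the "id" key are illustrative.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
right = pd.DataFrame({"id": [2, 3], "age": [30, 35]})
more_rows = pd.DataFrame({"id": [3], "name": ["Cara"]})

# concat: stack frames with the same columns on top of each other.
stacked = pd.concat([left, more_rows], ignore_index=True)

# merge: join on a key; the default is an inner join,
# so only ids present in both frames are kept.
joined = left.merge(right, on="id")

print(stacked)
print(joined)
```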
What is a sparse matrix?
A matrix with mostly zero values.
Used in machine learning for efficient storage.
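To illustrate the storage idea only, here is a pure-Python sketch that keeps just the non-zero entries in a dictionary keyed by (row, column). This is not how any particular library implements it; in practice, libraries such as SciPy provide optimized sparse formats.

```python
dense = [
    [0, 0, 3],
    [0, 0, 0],
    [4, 0, 0],
]

# Store only the non-zero values, keyed by their position.
sparse = {
    (r, c): value
    for r, row in enumerate(dense)
    for c, value in enumerate(row)
    if value != 0
}

def get(r, c):
    # Any position that is not stored is implicitly zero.
    return sparse.get((r, c), 0)

print(sparse, get(1, 1))
```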
How do you convert a pandas Series to a NumPy array?
- .to_numpy()
- .values
Example: df["column"].to_numpy().
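A short sketch of the conversion with made-up data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column": [1, 2, 3]})

arr = df["column"].to_numpy()

print(type(arr).__name__, arr)
```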
What is the difference between .iloc[] and .loc[]?
.iloc[] uses integer positions; .loc[] uses labels.
Example: df.iloc[0, 1] vs. df.loc[0, "column"].
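The difference is easier to see when the row labels are not the default 0, 1, 2. The labels "r1" and "r2" below are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob"], "age": [25, 30]},
    index=["r1", "r2"],
)

by_position = df.iloc[0, 1]     # first row, second column
by_label = df.loc["r1", "age"]  # row labeled "r1", column "age"

print(by_position, by_label)
```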
Which data type in pandas is best for categorical data?
category
Reduces memory usage and speeds up operations. E.g. df["gender"] = df["gender"].astype("category")
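A rough sketch of the memory effect on a column with few distinct values. The data is synthetic, and exact byte counts depend on the pandas version, so none are stated.

```python
import pandas as pd

df = pd.DataFrame({"gender": ["F", "M", "F", "M"] * 1000})

text_bytes = df["gender"].memory_usage(deep=True)

df["gender"] = df["gender"].astype("category")
category_bytes = df["gender"].memory_usage(deep=True)

print(text_bytes, category_bytes)
```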
What does pd.get_dummies() do?
Converts categorical variables into dummy variables.
Useful for machine learning models.
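A minimal sketch with an illustrative "color" column. Passing dtype=int is a choice made here to get 0/1 columns; depending on the pandas version, the default may be Boolean columns instead.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"]})

# One new column per category, named <column>_<category>.
dummies = pd.get_dummies(df, columns=["color"], dtype=int)

print(dummies)
```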
How do you find unique values in a column?
df[“column”].unique()
Returns an array of unique values.
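A short sketch with made-up data; .nunique() is shown alongside as a related way to get only the count.

```python
import pandas as pd

df = pd.DataFrame({"column": ["red", "blue", "red", "blue"]})

values = df["column"].unique()    # unique values in order of first appearance
how_many = df["column"].nunique() # number of unique values

print(values, how_many)
```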
Which pandas function converts JSON data into a DataFrame?
pd.read_json()
Used for reading JSON-formatted data.
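A minimal sketch that reads JSON from an in-memory string rather than a file, so it runs without any external data. The records are made up.

```python
import io
import pandas as pd

json_text = '[{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]'

# Wrap the string in StringIO so read_json treats it as a file-like object.
df = pd.read_json(io.StringIO(json_text))

print(df)
```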