Data Types & Structures Flashcards

This section covers common data types and data structures in Python. Many of them have close equivalents in other programming languages.

1
Q

What is a data type?

A

A classification that specifies the type of data a variable can hold.

Common types include integers, floats, strings, and booleans.

2
Q

List the four primary data types in Python.

A
  • Integer
  • Float
  • String
  • Boolean

Python is dynamically typed: a variable can be rebound to a value of a different type at any time.
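
A minimal sketch of the four types and of dynamic typing. The sample values and variable names are made up for illustration.

```python
# The four basic types named above, one sample value each.
samples = [7, 7.5, "seven", True]
type_names = [type(s).__name__ for s in samples]
print(type_names)  # ['int', 'float', 'str', 'bool']

# Dynamic typing: the same name can be rebound to a value of another type.
value = 42
type_before = type(value).__name__
value = "forty-two"
type_after = type(value).__name__
print(type_before, "->", type_after)  # int -> str
```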

3
Q

What is the difference between an integer and a float?

A

An integer is a whole number, while a float is a floating-point number that can hold a fractional part.

Example: 5 (int) vs. 5.2 (float).
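
A short sketch of how the two types behave in arithmetic. The variable names are illustrative only.

```python
whole = 5          # int
fractional = 5.2   # float
print(type(whole).__name__, type(fractional).__name__)  # int float

true_division = 7 / 2         # "/" always produces a float: 3.5
floor_division = 7 // 2       # "//" on two ints produces an int: 3
truncated = int(fractional)   # int() drops the fractional part: 5
promoted = whole + 0.5        # mixing int and float gives a float: 5.5
```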

4
Q

True or False:

A string can contain both letters and numbers.

A

True.

Strings are sequences of characters, which can include letters, digits, and symbols.

5
Q

Define a list in Python.

A

A list is an ordered, mutable collection of elements.

Example: [1, "apple", 3.14].
Mutable means that an object can be modified after it is created.
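
A small sketch of what "ordered" and "mutable" mean in practice, reusing the example list above. The name `items` is an assumption.

```python
items = [1, "apple", 3.14]

first = items[0]        # ordered: elements are reached by position
items[0] = 100          # mutable: replace an element in place
items.append("new")     # mutable: grow the list
items.remove("apple")   # mutable: delete an element by value

print(items)  # [100, 3.14, 'new']
```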

6
Q

What is the key difference between a tuple and a list?

A

A tuple is immutable, while a list is mutable.

Tuples use () and lists use [].
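
A quick sketch of the difference: the same item assignment succeeds on a list and raises `TypeError` on a tuple. The names `coords`, `point`, and `tuple_changed` are illustrative.

```python
coords = [3, 4]   # list
point = (3, 4)    # tuple

coords[0] = 10    # fine: lists can be modified in place

try:
    point[0] = 10          # tuples do not support item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(coords, point, tuple_changed)  # [10, 4] (3, 4) False
```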

7
Q

Which data structure in Python allows key-value pairs?

A

Dictionary (dict).

Example: {"name": "Alice", "age": 25}.
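
A brief sketch of common dictionary operations on the example above. The `city` and `email` keys are hypothetical additions.

```python
person = {"name": "Alice", "age": 25}

name = person["name"]          # look up a value by its key
person["age"] = 26             # update an existing key
person["city"] = "Paris"       # add a new key (hypothetical field)
email = person.get("email")    # .get() returns None when the key is absent

print(sorted(person.keys()))   # ['age', 'city', 'name']
```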

8
Q

Fill in the blank:

A _______ is a collection of unique elements in Python.

A

Set.

Example: {1, 2, 3, 3} → {1, 2, 3} (duplicates removed).
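
A minimal sketch of duplicate removal and basic set operations. The variable names are illustrative.

```python
numbers = {1, 2, 3, 3}        # the duplicate 3 is stored only once
size = len(numbers)           # 3

has_two = 2 in numbers        # membership test
deduplicated = set(["a", "b", "a"])   # build a set from a list with repeats

common = numbers & {2, 3, 4}  # intersection
combined = numbers | {4}      # union
```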

9
Q

What is a DataFrame in pandas?

A

A two-dimensional, tabular data structure.

Think of it as an Excel spreadsheet in Python.

10
Q

Which pandas function creates a DataFrame?

A

pd.DataFrame()

Accepts data such as lists, dictionaries, or NumPy arrays.
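
A minimal sketch of building a DataFrame from a dictionary of lists, assuming pandas is installed. The column names and values are made up.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
data = {"name": ["Alice", "Bob"], "age": [25, 30]}
df = pd.DataFrame(data)

column_names = list(df.columns)   # ['name', 'age']
n_rows = len(df)                  # 2
print(df)
```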

11
Q

Define a series in pandas.

A

A one-dimensional labeled array.

Example: pd.Series([1, 2, 3]).
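
A short sketch of what "labeled" means, assuming pandas is installed. The index labels "a", "b", "c" are illustrative; without an explicit index, pandas labels the values 0, 1, 2.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

by_label = s["b"]          # access by index label
labels = list(s.index)     # ['a', 'b', 'c']
total = s.sum()            # Series support vectorized operations
```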

12
Q

True or False:

NumPy arrays are more memory-efficient than Python lists.

A

True.

NumPy arrays store elements of a single data type in one contiguous block, which is more compact and enables faster vectorized operations.
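
A rough way to see the difference, assuming NumPy is installed and the code runs on CPython. The list estimate adds the size of the list object to the size of each int object it points to, so treat the numbers as approximate.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# List: the container plus one separate Python int object per element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# Array: n elements * 8 bytes each, stored contiguously.
array_bytes = np_array.nbytes

print(list_bytes, array_bytes)
```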

13
Q

What does the .dtypes attribute in pandas return?

A

The data type of each column in a DataFrame.

Example output: int64, float64, object.

14
Q

How do you convert a column to a different data type in pandas?

A

Using .astype()

Example: df["age"] = df["age"].astype(int).
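
A small sketch of checking and converting a column's type, assuming pandas is installed. The `age` values are made up.

```python
import pandas as pd

df = pd.DataFrame({"age": [25.0, 30.0, 41.0]})
dtype_before = str(df["age"].dtype)   # float64

# astype returns a new Series; assign it back to replace the column.
df["age"] = df["age"].astype(int)
is_integer_now = pd.api.types.is_integer_dtype(df["age"])

print(df.dtypes)
```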

15
Q

Which Python module is best suited for working with large numerical data efficiently?

A

NumPy

It provides powerful array operations. NumPy is short for Numerical Python.

16
Q

What is the main advantage of using a tuple over a list?

A

Tuples are generally faster and more memory-efficient than lists.

They are immutable, making them safer for data integrity.

17
Q

Fill in the blank:

A _____ is a high-performance multi-dimensional array in NumPy.

A

ndarray (N-dimensional array).

Example: np.array([[1, 2], [3, 4]]).
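
A minimal sketch of inspecting and using the example array, assuming NumPy is installed. The name `matrix` is illustrative.

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])

n_dims = matrix.ndim       # 2 dimensions
shape = matrix.shape       # (2, 2): 2 rows, 2 columns
doubled = matrix * 2       # element-wise arithmetic, no explicit loop
total = matrix.sum()       # 10
```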

18
Q

What function checks for missing values in a pandas DataFrame?

A

.isnull()

Returns a Boolean mask where True indicates missing values.
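
A short sketch of the Boolean mask and a common follow-up, counting missing values per column. It assumes pandas and NumPy are installed; the data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", None, "Cara"],
    "age": [25.0, np.nan, 41.0],
})

mask = df.isnull()                # same shape as df, True where a value is missing
missing_per_column = mask.sum()   # True counts as 1, so this counts missing values

print(missing_per_column)
```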

19
Q

How do you get the shape of a pandas DataFrame?

A

Using .shape

Returns a tuple (rows, columns).

20
Q

What is the purpose of the .info() function in pandas?

A

Displays a summary of the DataFrame, including column data types, non-null counts, and memory usage.

Helps understand dataset structure quickly.

21
Q

How do you select a specific column from a pandas DataFrame?

A

df["column_name"]

Alternative: df.column_name (only when the column name is a valid Python identifier, e.g. it contains no spaces).

22
Q

Why is df["column_name"] preferred over df.column_name?

A
  • Works with any column name, including names that contain spaces
  • Avoids conflicts when a column name matches a built-in DataFrame attribute or method
  • Consistent with the indexing syntax used for dicts and lists
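
The points above can be seen in a small sketch, assuming pandas is installed. The column names "unit price" and "count" are chosen to trigger both problems.

```python
import pandas as pd

df = pd.DataFrame({"unit price": [9.5, 12.0], "count": [3, 4]})

prices = df["unit price"]   # bracket syntax handles the space
counts = df["count"]        # bracket syntax returns the column

# Attribute syntax finds the built-in DataFrame.count method, not the column.
attribute = df.count
attribute_is_method = callable(attribute)

print(counts.tolist(), attribute_is_method)  # [3, 4] True
```
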
23
Q

True or False:

Sets in Python maintain the order of elements.

A

False.

Sets are unordered collections of unique elements.

24
Q

What function combines two pandas DataFrames?

A
  • pd.concat()
  • merge()

concat() stacks vertically/horizontally; merge() joins based on keys.
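
A minimal sketch contrasting the two, assuming pandas is installed. The frames `left`, `right`, and `more` and their contents are made up.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
more = pd.DataFrame({"id": [3], "name": ["Cara"]})
right = pd.DataFrame({"id": [2, 3], "age": [30, 41]})

# concat: stack rows of frames that share the same columns.
stacked = pd.concat([left, more], ignore_index=True)

# merge: join on a key column; the default inner join keeps only matching ids.
joined = left.merge(right, on="id")

print(stacked.shape, joined.shape)  # (3, 2) (1, 3)
```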

25
Q

What is a sparse matrix?

A

A matrix with mostly zero values.

Storing only the nonzero entries saves memory, which is why sparse formats are common in machine learning.
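
A pure-Python sketch of the storage idea: keep only the nonzero entries, keyed by (row, column). This illustrates the concept only; it is not how any particular library implements sparse matrices, and the helper `get` is hypothetical.

```python
dense = [
    [0, 0, 3],
    [0, 0, 0],
    [4, 0, 0],
]

# Store only nonzero values, keyed by their (row, column) position.
sparse = {
    (r, c): value
    for r, row in enumerate(dense)
    for c, value in enumerate(row)
    if value != 0
}

def get(r, c):
    """Return the stored value, or 0 for any position not stored."""
    return sparse.get((r, c), 0)

print(sparse)  # {(0, 2): 3, (2, 0): 4}
```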

26
Q

How do you convert a pandas Series to a NumPy array?

A
  • .to_numpy()
  • .values

Example: df["column"].to_numpy().

27
Q

What is the difference between .iloc[] and .loc[]?

A

.iloc[] uses integer positions; .loc[] uses labels.

Example: df.iloc[0, 1] vs. df.loc[0, "column"].
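
A short sketch that uses string row labels so the two accessors are easy to tell apart. It assumes pandas is installed; the labels and data are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Cara"], "age": [25, 30, 41]},
    index=["a", "b", "c"],
)

by_position = df.iloc[0, 1]       # first row, second column
by_label = df.loc["b", "age"]     # row labeled "b", column "age"

# Slices differ too: iloc excludes the end position, loc includes the end label.
position_slice = df.iloc[0:2]     # rows at positions 0 and 1
label_slice = df.loc["a":"b"]     # rows labeled "a" and "b"
```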

28
Q

Which data type in pandas is best for categorical data?

A

category

Reduces memory usage and can speed up operations when a column has few distinct values. E.g. df["gender"] = df["gender"].astype("category")
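
A rough sketch of the memory effect, assuming pandas is installed. The data is synthetic and exact byte counts depend on the pandas version, so only the comparison matters.

```python
import pandas as pd

# 5000 rows but only two distinct values.
df = pd.DataFrame({"gender": ["F", "M", "F", "F", "M"] * 1000})
bytes_as_text = df["gender"].memory_usage(deep=True)

df["gender"] = df["gender"].astype("category")
bytes_as_category = df["gender"].memory_usage(deep=True)

# Each distinct value is stored once; rows hold small integer codes.
categories = list(df["gender"].cat.categories)

print(bytes_as_text, bytes_as_category, categories)
```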

29
Q

What does pd.get_dummies() do?

A

Converts categorical variables into dummy (one-hot indicator) variables.

Useful for machine learning models.
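
A minimal sketch, assuming pandas is installed. The `color` column is made up. Whether the indicator columns hold booleans or 0/1 integers depends on the pandas version, so the sketch converts them to int before printing.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "red"]})

# One new column per distinct value, named <column>_<value>.
dummies = pd.get_dummies(df, columns=["color"])

dummy_columns = list(dummies.columns)               # ['color_green', 'color_red']
red_flags = dummies["color_red"].astype(int).tolist()

print(dummy_columns, red_flags)  # ['color_green', 'color_red'] [1, 0, 1]
```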

30
Q

How do you find unique values in a column?

A

df["column"].unique()

Returns an array of unique values.

31
Q

Which pandas function converts JSON data into a DataFrame?

A

pd.read_json()

Used for reading JSON-formatted data.
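
A small sketch that reads JSON from an in-memory string, assuming pandas is installed. The records are made up; wrapping the text in `StringIO` lets `pd.read_json()` treat it like a file.

```python
from io import StringIO

import pandas as pd

json_text = '[{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]'

# Each JSON object becomes a row; its keys become the columns.
df = pd.read_json(StringIO(json_text))

print(df.shape)  # (2, 2)
```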