Priority 5 Flashcards

1
Q

Which is faster for numerical operations, Python lists or NumPy arrays?

A

NumPy arrays

2
Q

Why are NumPy arrays faster than Python lists?

A

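NumPy arrays store elements of a single fixed type in one contiguous block of memory, and their operations run as vectorized loops in compiled C code. Python lists (which are also implemented in C in CPython) hold references to separate Python objects, so element-wise work goes through the interpreter one object at a time, with a type check on each.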

3
Q

What are the differences between Python lists and tuples?

3 bullet points

A
  • Lists are mutable whereas tuples are not.
  • Lists are defined using square brackets [] whereas tuples are defined using parentheses ().
  • Tuples are generally slightly faster to create and use less memory than lists, because their immutability and fixed size allow optimizations.
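Example (a minimal sketch of the mutability difference; the variable names are illustrative and not part of the original card):

```python
numbers_list = [1, 2, 3]
numbers_tuple = (1, 2, 3)

numbers_list.append(4)      # allowed: lists are mutable
numbers_list[0] = 10        # allowed: item assignment

try:
    numbers_tuple[0] = 10   # tuples do not support item assignment
    tuple_error = None
except TypeError as exc:
    tuple_error = type(exc).__name__

print(numbers_list)   # [10, 2, 3, 4]
print(tuple_error)    # TypeError
```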
4
Q

What are the similarities between Python lists and tuples?

3 bullet points

A
  • Both are collections of objects.
  • Both are written as comma-separated values.
  • Both are ordered.
5
Q

What is a Python set?

A

Unordered collection of unique objects

6
Q

What is the typical use case of Python sets?

A

Often used to store a collection of distinct objects and perform membership tests (i.e., to check if an object is in the set).

7
Q

How are Python sets defined?

A

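Curly braces, {}, around a comma-separated list of values, e.g., {1, 2, 3}. An empty set must be created with set(), because {} creates an empty dictionary.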

8
Q

What are the key properties of Python sets?

5 bullet points

A
  • Unordered
  • Elements are unique (duplicates are discarded)
  • Mutable
  • Not indexed/do not support slicing
  • Not hashable (cannot be used as keys in dictionaries or as elements in other sets); frozenset is the immutable, hashable alternative
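Example (a short sketch of these properties; the names `tags`, `lookup`, and `empty` are illustrative assumptions):

```python
# Duplicates are discarded when a set is built.
tags = {"python", "numpy", "python", "pandas"}
size_after_dedup = len(tags)     # 3

# Membership tests are the typical use case.
has_numpy = "numpy" in tags      # True

# Sets are mutable ...
tags.add("scipy")

# ... and therefore unhashable, so a set cannot be a dictionary key.
try:
    lookup = {tags: "value"}
    set_as_key_error = None
except TypeError as exc:
    set_as_key_error = type(exc).__name__

# frozenset is the immutable, hashable counterpart.
lookup = {frozenset(tags): "value"}

# {} creates an empty dict; use set() for an empty set.
empty = set()
```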
9
Q

What is the difference between Python split and join?

1 bullet point for each

A
  • split() is a string method that creates a list of substrings from a string, based on some delimiter (e.g., a space).
  • join() is a string method that concatenates an iterable of strings into a single string, placing the separator between the elements.
10
Q

Syntax: Python split

Include definition of any class objects and/or parameters

A
string.split(separator, maxsplit)
  • string: The string you want to split.
  • separator (optional): The delimiter used to split the string. If not specified, the string is split on runs of whitespace.
  • maxsplit (optional): The maximum number of splits to perform. If not specified (default -1), the string is split at every occurrence of the separator.
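Example (a minimal sketch; the sample strings are made up for illustration):

```python
line = "alpha beta  gamma"

# No separator: split on runs of whitespace.
words = line.split()          # ['alpha', 'beta', 'gamma']

# Explicit separator: every single space is a delimiter,
# so the double space produces an empty string.
pieces = line.split(" ")      # ['alpha', 'beta', '', 'gamma']

# maxsplit=1 performs at most one split, giving two parts.
key, value = "name=Ada=Lovelace".split("=", 1)   # 'name', 'Ada=Lovelace'
```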
11
Q

Syntax: Python join

Include definition of any class objects and/or parameters

A
separator.join(iterable)
  • separator: The string that will be used to separate the elements of the iterable in the resulting string.
  • iterable: An iterable object (e.g., a list, tuple, or string) whose elements will be joined together. Every element must be a string; otherwise a TypeError is raised.
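Example (a minimal sketch; the sample values are illustrative):

```python
words = ["data", "science", "flashcards"]

title = " ".join(words)        # 'data science flashcards'
slug = "-".join(words)         # 'data-science-flashcards'

# Elements must already be strings, so convert numbers first.
csv_row = ",".join(str(n) for n in [1, 2, 3])   # '1,2,3'

# split() undoes join() when the same separator is used.
round_trip = slug.split("-")   # ['data', 'science', 'flashcards']
```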
12
Q

What are the logical operators in Python? What are they used for?

A
  • and, or, not
  • Used to perform boolean operations. Operands are usually bool values, but any object can be used and is evaluated for truthiness.
13
Q

Logical operators in Python: and

A

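Returns True if both operands are True; otherwise, False. (With non-bool operands, it returns the first falsy operand, or the last operand if all are truthy.)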

14
Q

Logical operators in Python: or

A

Returns True if either operand is True; returns False if both operands are False. (With non-bool operands, it returns the first truthy operand, or the last operand if none is truthy.)

15
Q

Logical operators in Python: not

A

Returns True if the operand is False; returns False if the operand is True.
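Example covering all three operators (a minimal sketch; the variable names are illustrative):

```python
is_adult = True
has_ticket = False

can_enter = is_adult and has_ticket      # False
can_ask = is_adult or has_ticket         # True
lacks_ticket = not has_ticket            # True

# With non-bool operands, `and` / `or` return one of the operands.
display_name = "" or "anonymous"         # 'anonymous'
first_falsy = 0 and 42                   # 0

# Short-circuiting: items[0] is never evaluated for an empty list,
# so no IndexError is raised.
items = []
starts_big = len(items) > 0 and items[0] > 5   # False
```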

16
Q

What are the top 6 functions used for Python strings?

A
  1. len()
  2. strip()
  3. split()
  4. replace()
  5. upper()
  6. lower()
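Example using all six (a minimal sketch; the sample string is made up):

```python
raw = "  Hello, World!  "

length = len(raw)                            # 17
clean = raw.strip()                          # 'Hello, World!'
parts = clean.split(", ")                    # ['Hello', 'World!']
swapped = clean.replace("World", "NumPy")    # 'Hello, NumPy!'
loud = clean.upper()                         # 'HELLO, WORLD!'
quiet = clean.lower()                        # 'hello, world!'

# Strings are immutable: each method returns a new string
# and leaves `raw` unchanged.
```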
17
Q

Top 6 functions used for Python strings: len()

A

Returns the length of a string.

18
Q

Top 6 functions used for Python strings: strip()

A

Removes leading and trailing whitespace from a string (or, if an argument is given, any of the characters in that argument).

19
Q

Top 6 functions used for Python strings: split()

A

Splits a string into a list of substrings based on a delimiter.

20
Q

Top 6 functions used for Python strings: replace()

A

Returns a copy of the string in which all occurrences of a specified substring are replaced with another string.

21
Q

Top 6 functions used for Python strings: upper()

A

Converts a string to uppercase.

22
Q

Top 6 functions used for Python strings: lower()

A

Converts a string to lowercase.

23
Q

What is the pass keyword in Python? What is it used for?

A

pass is a null statement that does nothing. It is often used as a placeholder where a statement is required syntactically, but no action needs to be taken.

24
Q

What are some common use cases of the pass keyword in Python?

3 bullet points

A
  • Empty functions or classes: When you define a function/class but haven’t implemented any logic yet. Use pass to avoid syntax errors.
  • Conditional statements: If you need an if statement but don’t want to take any action in the if block, you can use pass.
  • Loops: You can use pass in loops when you don’t want to perform any action in a specific iteration.
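Example of the three use cases (a minimal sketch; `NotImplementedYet`, `todo`, and `processed` are hypothetical names):

```python
class NotImplementedYet:
    pass  # empty class body


def todo():
    pass  # empty function body; calling it returns None


processed = []
for n in range(5):
    if n % 2 == 0:
        pass                 # nothing to do for even numbers
    else:
        processed.append(n)

print(processed)             # [1, 3]
```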
25
Q

What is the use of the continue keyword in Python?

A

continue is used in a loop to skip the rest of the current iteration and move on to the next one.
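Example (a minimal sketch; the variable names are illustrative):

```python
odd_squares = []
for n in range(6):
    if n % 2 == 0:
        continue               # skip the rest of the body for even n
    odd_squares.append(n * n)  # only reached for odd n

print(odd_squares)             # [1, 9, 25]
```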

26
Q

Definition: immutable data type in Python

A

Object whose state cannot be modified after it is created.

27
Q

Definition: mutable data type in Python

A

Object whose state can be modified after it is created.

28
Q

Examples of immutable data types in Python

A
  • Numbers: int, float, complex
  • bool
  • str
  • Tuples
29
Q

Examples of mutable data types in Python

A
  • Lists
  • Dictionaries
  • Sets
30
Q

Because numbers are immutable data types in Python, what happens when you change the value of a number variable?

A

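The variable name is rebound to a new number object; the original object itself is never modified. The old object is garbage-collected, freeing the memory used to store it, once no other references to it remain.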
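Example (a minimal sketch; `x` and `y` are illustrative names):

```python
x = 1000
y = x                     # both names refer to the same int object
same_before = x is y      # True

x = x + 1                 # builds a new int object and rebinds x to it
same_after = x is y       # False

print(x, y)               # 1001 1000
```

Here the old object survives because `y` still refers to it; without that second reference it would become eligible for garbage collection.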

31
Q

Python variables versus objects

A
  • Variables are names that refer to or hold references to concrete objects.
  • Objects are concrete pieces of information that live at specific memory locations on the computer.
32
Q

Can you use sort() on tuples? Why or why not?

A

No. Tuples are immutable, so they have no sort() method. You would have to create a new sorted tuple from the original tuple, e.g., tuple(sorted(t)).
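Example (a minimal sketch; `scores` is an illustrative name):

```python
scores = (3, 1, 2)

has_sort_method = hasattr(scores, "sort")   # False

# sorted() accepts any iterable and returns a new list,
# which is then converted back to a tuple.
sorted_scores = tuple(sorted(scores))       # (1, 2, 3)

print(scores)                               # (3, 1, 2), unchanged
```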

33
Q

What are try except blocks used for in Python?

A

Exception handling

34
Q

try except blocks in Python: what is the try block?

A

Contains code that might cause an exception to be raised.

35
Q

try except blocks in Python: what is the except block?

A

Contains code that is executed if an exception is raised during the execution of a try block.
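Example of a try block with a matching except block (a minimal sketch; `safe_divide` is a hypothetical helper, not a built-in):

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when dividing by zero."""
    try:
        # Code that might raise an exception.
        return numerator / denominator
    except ZeroDivisionError:
        # Runs only if the try block raised ZeroDivisionError.
        return None


print(safe_divide(10, 4))   # 2.5
print(safe_divide(10, 0))   # None
```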

36
Q

What are the similarities between Python functions and methods?

3 bullet points

A
  • Both are blocks of code that perform a specific task.
  • Both can take input parameters and return a value.
  • Both are defined using the def keyword.
37
Q

What are the key differences between Python functions and methods?

4 bullet points

A
  • Functions are defined outside of classes; methods are functions that are associated with a specific object or class.
  • Functions can be called on a standalone basis; methods are called using the dot notation on an object of a class.
  • Functions perform general tasks; methods perform actions specific to the object they belong to.
  • Parameters are optional for functions; for methods, the first parameter is usually self, which refers to the instance of the class.
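Example (a minimal sketch; `area` and `Rectangle` are hypothetical names chosen for illustration):

```python
def area(width, height):
    """Standalone function: called by its name."""
    return width * height


class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        """Method: receives the instance as `self`."""
        return self.width * self.height


rect = Rectangle(3, 4)
function_result = area(3, 4)   # 12
method_result = rect.area()    # 12, called with dot notation on an object
```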
38
Q

How do functions help in code optimization?

4 high-level points

A
  1. Code reuse
  2. Improved readability
  3. Easier testing
  4. Improved performance
39
Q

Functions + code optimization: Code reuse

A

Allow you to reuse code by encapsulating it in a single place and calling it multiple times from different parts of your program. Reduces redundancy, making code more concise and easier to maintain.

40
Q

Functions + code optimization: Improved readability

A

Functions make your code more readable and easier to understand by dividing your code into logical blocks. This makes it easier to identify bugs and make changes.

41
Q

Functions + code optimization: Easier testing

A

Functions allow you to test individual blocks of code separately, which can make it easier to find and fix bugs.

42
Q

Functions + code optimization: Improved performance

A

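Functions let you call optimized library code instead of reimplementing it. In CPython, code inside a function also tends to run faster than the same code at module level, because local variable lookups are cheaper than global ones.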

43
Q

Why is NumPy often used for data science?

3 bullet points

A
  • Fast and efficient operations on arrays and matrices of numerical data versus Python’s built-in data structures. This is because it uses optimized C and Fortran code behind the scenes.
  • Large number of functions for performing mathematical and statistical operations on arrays and matrices.
  • Integrates well with other scientific computing libraries in Python, such as SciPy and pandas.
44
Q

Definition: list comprehension in Python

A

Shorter syntax for creating a new list from the values of an existing iterable, optionally filtering them with a condition.

45
Q

Syntax: Python list comprehension

A
new_list = [expression for item in iterable if condition]
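Example (a minimal sketch comparing the comprehension with the loop it replaces; names are illustrative):

```python
numbers = [1, 2, 3, 4, 5, 6]

# expression: n * n, item: n, iterable: numbers, condition: n % 2 == 0
even_squares = [n * n for n in numbers if n % 2 == 0]   # [4, 16, 36]

# Equivalent explicit loop
even_squares_loop = []
for n in numbers:
    if n % 2 == 0:
        even_squares_loop.append(n * n)
```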
46
Q

Definition: dict comprehension in Python

A

Concise way of creating dictionaries in Python

47
Q

Syntax: Python dict comprehension

A
{key: value for item in iterable}
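Example (a minimal sketch; `words` and `lengths` are illustrative names):

```python
words = ["list", "tuple", "set"]

lengths = {word: len(word) for word in words}
# {'list': 4, 'tuple': 5, 'set': 3}

# An optional condition filters items, as in a list comprehension.
long_words = {w: n for w, n in lengths.items() if n > 3}
# {'list': 4, 'tuple': 5}
```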
48
Q

Definition: global variable in Python

A

A variable that is defined outside of any function or class

49
Q

Definition: local variable in Python

A

A variable that is defined inside a function (names assigned in a class body become class attributes rather than local variables)

50
Q

Where can a Python global variable be accessed?

A

Can be read from anywhere in the module once it is defined; rebinding it inside a function requires the global keyword

51
Q

Where can a Python local variable be accessed?

A

Can only be accessed within the function in which it is defined

52
Q

What happens inside a Python function if you have a local variable and global variable with the same name?

A

The local variable takes precedence over (shadows) the global variable within the function in which it is defined; the global variable itself is left unchanged

53
Q

What will this code output?

x = 10

def func():
  x = 5
  print(x)

func()
print(x)
A

5
10

54
Q

Definition: Python ordered dictionary

A

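Subclass of the Python dictionary class that maintains the order in which elements were added. Since Python 3.7, regular dictionaries also preserve insertion order, so an ordered dictionary is mainly useful for its reordering methods and order-sensitive equality.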

55
Q

Python ordered dictionary class name

A

OrderedDict

56
Q

How do Python ordered dictionaries maintain the order of elements in the dictionary?

A

A doubly linked list of the keys, kept alongside the underlying dictionary
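Example of what an ordered dictionary adds over a regular one (a minimal sketch; `visits`, `a`, and `b` are illustrative names):

```python
from collections import OrderedDict

visits = OrderedDict()
visits["home"] = 1
visits["about"] = 2
visits["contact"] = 3

# Reordering helpers that regular dicts do not offer:
visits.move_to_end("home")              # move a key to the end
order_after_move = list(visits)         # ['about', 'contact', 'home']
oldest = visits.popitem(last=False)     # pop from the front: ('about', 2)

# Equality between OrderedDicts is order-sensitive.
a = OrderedDict([("x", 1), ("y", 2)])
b = OrderedDict([("y", 2), ("x", 1)])
order_matters = a != b                      # True
plain_ignores_order = dict(a) == dict(b)    # True
```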

57
Q

What do return and yield in Python have in common?

A

Both are keywords used to send values back from a function

58
Q

What is the functionality of the return keyword in Python?

A

Terminates the function and returns a value to the caller

59
Q

What is the functionality of the yield keyword in Python?

A

Pauses the function’s execution and returns a value to the caller but maintains the function’s state so that it can be resumed later

60
Q

What is the use case of the return keyword in Python?

A

Used in regular functions when you want to compute a single result and return it

61
Q

What is the use case of the yield keyword in Python?

A

Used to create generator functions that produce a sequence of values lazily, one at a time, as the caller asks for them
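Example contrasting return and yield (a minimal sketch; both function names are hypothetical):

```python
def first_squares_list(n):
    """Regular function: builds the whole result, then returns once."""
    return [i * i for i in range(n)]


def first_squares_gen(n):
    """Generator function: yields one value at a time, pausing in between."""
    for i in range(n):
        yield i * i


all_at_once = first_squares_list(3)   # [0, 1, 4]

gen = first_squares_gen(3)
first = next(gen)        # 0, then the function pauses
second = next(gen)       # 1, execution resumes where it left off
remaining = list(gen)    # [4]
```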

62
Q

Definition: Python lambda function

A

Small anonymous function that can take any number of arguments but can only have one expression

63
Q

Syntax: Python lambda function

A
lambda arguments: expression
64
Q

What will this code output?

x = lambda a: a + 10
print(x(5))
A

15

65
Q

How are Python lambda functions typically used in practice?

A

Often used as short inline arguments to higher-order functions, such as map(), filter(), and functools.reduce(), or as the key argument of sorted()
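Example (a minimal sketch; the data is made up for illustration):

```python
from functools import reduce   # reduce is not a built-in in Python 3

numbers = [1, 2, 3, 4, 5]

doubled = list(map(lambda n: n * 2, numbers))         # [2, 4, 6, 8, 10]
evens = list(filter(lambda n: n % 2 == 0, numbers))   # [2, 4]
total = reduce(lambda acc, n: acc + n, numbers, 0)    # 15

# Lambdas are also common as sort keys.
pairs = [("b", 2), ("a", 3), ("c", 1)]
by_count = sorted(pairs, key=lambda pair: pair[1])
# [('c', 1), ('b', 2), ('a', 3)]
```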

66
Q

What does the assert keyword in Python do?

A

Used to test a condition. If the condition is True, the program continues to execute. If the condition is False, then the program raises an AssertionError exception.

67
Q

What is the assert keyword in Python used for?

A

Used for debugging purposes and is not intended to be used as a way to handle runtime errors

68
Q

For exception handling within production Python code, should you use try-except or assert? Why?

A

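try-except
* Allows recovery and custom actions, whereas a failed assert terminates the program with an AssertionError unless it is caught
* Lets you catch and raise specific exception types with custom messages, whereas assert can only raise AssertionError
* Always runs, whereas assert statements are removed when Python is run with the -O flag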
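Example (a minimal sketch; `parse_age` is a hypothetical function used only to show the two tools side by side):

```python
def parse_age(text):
    """Convert user input to an int age, raising ValueError on bad input."""
    try:
        age = int(text)
    except ValueError:
        # Runtime error caused by bad input: handle it explicitly.
        raise ValueError(f"age must be a whole number, got {text!r}") from None
    if age < 0:
        raise ValueError("age cannot be negative")
    # Internal sanity check for developers; skipped under `python -O`.
    assert isinstance(age, int)
    return age


try:
    parse_age("abc")
    message = None
except ValueError as exc:
    message = str(exc)

print(parse_age("42"))   # 42
print(message)           # age must be a whole number, got 'abc'
```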

69
Q

What are decorators in Python?

A

Functions (or other callables) that wrap a function, method, or class to modify or extend its functionality without changing its source code

70
Q

Syntax: Python decorators

A

@decorator_function
def function_to_be_decorated():
    pass  # Function code here
71
Q

What does this code output?

def my_decorator(func):
    def wrapper():
        print("Something is happening before the function is called.")
        func()
        print("Something is happening after the function is called.")
    return wrapper

@my_decorator
def say_hello():
    print("Hello!")

say_hello()
A

Something is happening before the function is called.
Hello!
Something is happening after the function is called.
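A variation that also handles arguments and return values (a sketch only; `log_calls` and `add` are hypothetical names, while `functools.wraps` is the standard-library helper that preserves the wrapped function's name and docstring):

```python
import functools


def log_calls(func):
    """Hypothetical decorator that records the arguments of each call."""
    @functools.wraps(func)              # keep func.__name__ and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)    # pass the result back to the caller
    wrapper.calls = []
    return wrapper


@log_calls
def add(a, b):
    """Return a + b."""
    return a + b


result = add(2, b=3)    # 5
```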

72
Q

What is univariate analysis?

A

Used to analyze and describe the characteristics of a single variable

73
Q

Common steps when conducting univariate analysis on a numerical variable

4 bullet points

A
  • Calculate descriptive statistics, such as mean, median, mode, and standard deviation, to summarize the distribution of the data.
  • Visualize the distribution of the data using plots such as histograms, boxplots, or density plots.
  • Check for outliers and anomalies in the data.
  • Check for normality in the data using statistical tests or visualizations such as a Q-Q plot.
74
Q

Common steps when conducting univariate analysis on a categorical variable

4 bullet points

A
  • Calculate the frequency of each category in the data.
  • Calculate the percentage of each category in the data.
  • Visualize the distribution of the data using plots such as bar and pie charts.
  • Check for imbalances or abnormalities in the distribution of the data.
75
Q

Common ways to find outliers in a data set

3 bullet points

A
  • Visual Inspection: Identification via visual inspection of data using plots such as histograms, scatterplots, or boxplots.
  • Summary Statistics: Identification via calculating summary statistics, such as mean, median, or interquartile range (IQR). For example, a mean that differs markedly from the median could indicate the presence of outliers, and values more than 1.5 × IQR below the first quartile or above the third quartile are commonly flagged.
  • Z-Score: The z-score measures how many standard deviations a given data point is from the mean. Data points whose absolute z-score exceeds a threshold (e.g., 3 or 4) may be considered outliers.
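Example of the z-score and IQR rules (a minimal sketch using only the standard library; the data, the threshold of 3, and the 1.5 × IQR fences are illustrative assumptions):

```python
import statistics

data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]

# Z-score rule: flag points more than 3 standard deviations from the mean.
mean = statistics.mean(data)
stdev = statistics.pstdev(data)          # population standard deviation
z_outliers = [x for x in data if abs(x - mean) / stdev > 3]

# IQR rule: flag points more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = [x for x in data if x < lower or x > upper]

print(z_outliers, iqr_outliers)          # [95] [95]
```

In very small samples a single extreme value inflates the standard deviation, so the z-score rule can miss it; the IQR rule is less affected.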
76
Q

What are common methods to handle the missing values in a data set?

5 main points

A
  1. Drop rows
  2. Drop columns
  3. Imputation with mean or median
  4. Imputation with mode
  5. Imputation with a predictive model
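Example of dropping and simple imputation (a minimal sketch in plain Python, with None standing in for a missing value; the columns are made up, and in practice a library such as pandas is the usual tool):

```python
import statistics

ages = [25, 30, None, 35, 120, None]          # numeric column with gaps
cities = ["Paris", None, "Rome", "Paris"]     # categorical column with a gap

# Dropping: keep only the observed values.
observed_ages = [a for a in ages if a is not None]     # [25, 30, 35, 120]

# Imputation with the mean or the median of the observed values.
mean_age = statistics.mean(observed_ages)              # 52.5
median_age = statistics.median(observed_ages)          # 32.5
mean_imputed = [mean_age if a is None else a for a in ages]
median_imputed = [median_age if a is None else a for a in ages]

# Imputation with the mode for a categorical column.
mode_city = statistics.mode([c for c in cities if c is not None])   # 'Paris'
mode_imputed = [mode_city if c is None else c for c in cities]
```

The value 120 pulls the mean well above the median, which is one reason the median is often preferred for skewed columns.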
77
Q

Drop rows

Common methods to handle the missing values in a data set

Explanation + Pro/Con

A

Drop rows with null values
* Pro: Simple and fast
* Con: Can significantly reduce sample size and impact the statistical power of the analysis

78
Q

Drop columns

Common methods to handle the missing values in a data set

Explanation + Pro/Con

A

Drop columns with null values
* Pro: Can be a good option if many values are missing from the column or the column is irrelevant
* Con: Can result in omitted variable bias

79
Q

Imputation with mean or median

Common methods to handle the missing values in a data set

Explanation + Pro/Con

A

Replace null values with the mean or median of the non-null values in the column
* Pro: Good option if the data are missing at random and mean/median is a reasonable representation of the data
* Con: Introduces bias if the data are not missing at random

80
Q

Imputation with mode

Common methods to handle the missing values in a data set

Explanation + Pro/Con

A

Replace null values with the mode of the non-null values in the column
* Pro: Good option for categorical data where mode is a reasonable representation of the data
* Con: Introduces bias if the data are not missing at random

81
Q

Imputation with a predictive model

Common methods to handle the missing values in a data set

Explanation + Pro/Con

A

Use a predictive model to estimate the missing values based on other available data
* Pro: Can be more accurate/less biased if the data are not missing at random and there is a strong relationship between the missing values and other data
* Con: More complex/time-consuming

82
Q

Definition: skewness

A

Measure of the asymmetry of a distribution, i.e., how far it departs from a symmetric shape. A distribution is skewed if it is not symmetrical, with more data points concentrated on one side of the mean than the other.

83
Q

What are the different types of skewness?

A
  • Positive skewness
  • Negative skewness
84
Q

Positive skewness

Different types of skewness

3 bullet points

A
  • Long tail on the right side
  • Majority of data points concentrated on the left side of the mean
  • A few extreme values on the right side of the distribution that are pulling the mean to the right
85
Q

Negative skewness

Different types of skewness

3 bullet points

A
  • Long tail on the left side
  • Majority of data points concentrated on the right side of the mean
  • A few extreme values on the left side of the distribution that are pulling the mean to the left
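Example of positive and negative skewness (a minimal sketch; `skewness` is a hypothetical helper that computes the population skewness as the mean cubed z-score, and the data is made up):

```python
import statistics


def skewness(values):
    """Population skewness: the average of the cubed z-scores."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return sum(((v - mean) / stdev) ** 3 for v in values) / len(values)


right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]           # long tail on the right
left_tailed = [-v for v in right_tailed]           # mirror image

positive = skewness(right_tailed) > 0              # True
negative = skewness(left_tailed) < 0               # True

# The extreme value pulls the mean to the right of the median.
mean_above_median = (
    statistics.mean(right_tailed) > statistics.median(right_tailed)
)                                                  # True
```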
86
Q

What are the three main measures of central tendency?

A
  1. Mean
  2. Median
  3. Mode
87
Q

Mean

Three main measures of central tendency

3 bullet points

A
  • Arithmetic average of a dataset
  • Calculated by adding all the values in the dataset and dividing by the number of values
  • Sensitive to outliers
88
Q

Median

Three main measures of central tendency

3 bullet points

A
  • Middle value of the dataset when the values are arranged in order from smallest to largest
  • Arrange the values in order and find the middle value. If there is an odd number of values, the median is the middle value. If there is an even number of values, the median is the mean of the two middle values.
  • Not sensitive to outliers
89
Q

Mode

Three main measures of central tendency

3 bullet points

A
  • Value that occurs most frequently in a dataset
  • May have multiple modes or no modes at all
  • Not sensitive to outliers
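Example comparing the three measures with and without an outlier (a minimal sketch using the standard-library statistics module; the numbers are made up):

```python
import statistics

salaries = [40, 42, 45, 45, 48, 50]
with_outlier = salaries + [400]

mean_before = statistics.mean(salaries)          # 45
mean_after = statistics.mean(with_outlier)       # about 95.7: the mean jumps

median_before = statistics.median(salaries)      # 45.0
median_after = statistics.median(with_outlier)   # 45: the median does not move

mode_before = statistics.mode(salaries)          # 45
mode_after = statistics.mode(with_outlier)       # 45: the mode is unchanged

# A dataset can have more than one mode.
two_modes = statistics.multimode([1, 1, 2, 2, 3])   # [1, 2]
```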
90
Q

Definition: descriptive statistics

A

Used to summarize and describe a dataset by using measures of central tendency (mean, median, mode) and measures of spread (standard deviation, variance, range)

91
Q

Definition: inferential statistics

A

Used to make inferences about a population based on sample data using statistical models, hypothesis testing, and estimation

92
Q

What are the four key elements of an EDA report?

A
  1. Univariate analysis
  2. Bivariate analysis
  3. Missing data analysis
  4. Data visualization
93
Q

Univariate analysis

Four key elements of an EDA report

How does it contribute to understanding a dataset?

A

Helps understand the distribution of individual variables

94
Q

Bivariate analysis

Four key elements of an EDA report

How does it contribute to understanding a dataset?

A

Helps understand the relationship between variables

95
Q

Missing data analysis

Four key elements of an EDA report

How does it contribute to understanding a dataset?

A

Helps understand the quality of the data

96
Q

Data visualization

Four key elements of an EDA report

How does it contribute to understanding a dataset?

A

Provides a visual interpretation of the data

97
Q

Definition: central limit theorem

2 bullet points

A
  • As the sample size increases, the distribution of the sample mean approaches a normal distribution
  • True regardless of the shape of the underlying distribution from which the sample is drawn, provided the observations are independent and the distribution has finite variance
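Example simulation (a sketch under stated assumptions: an exponential population with mean 1 and standard deviation 1, samples of size 50, 2,000 repetitions, and a fixed seed; `sample_mean` is a hypothetical helper):

```python
import random
import statistics

random.seed(0)   # fixed seed so the sketch is repeatable


def sample_mean(n):
    """Mean of n draws from a right-skewed exponential distribution."""
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])


# 2,000 sample means, each computed from a sample of size 50.
means = [sample_mean(50) for _ in range(2000)]

center = statistics.mean(means)    # close to the population mean, 1.0
spread = statistics.stdev(means)   # close to 1 / sqrt(50), about 0.141
```

Although the individual draws are strongly skewed, a histogram of `means` would look roughly bell-shaped around 1.0.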
98
Q

What is the benefit of the central limit theorem?

A

Even if the individual data points in a sample are not normally distributed, we can use normal distribution-based methods to make inferences about the population by taking the average of a large enough number of data points

99
Q

Two main types of target variables for predictive modeling

A
  • Numeric variable
  • Categorical variable
100
Q

Numeric variable

Main types of target variables for predictive modeling

2 bullet points

A
  • Quantifiable characteristic whose values are numbers
  • May be continuous or discrete
101
Q

Categorical variable

Main types of target variables for predictive modeling

A
  • Variable that takes on one of a limited, usually fixed, number of possible values
102
Q

Definition: binary categorical variable

A

Categorical variable that can take on exactly two values

103
Q

Definition: polytomous categorical variable

A

Categorical variable with more than two possible values

104
Q

When will the mean, median, and mode be the same for a given dataset?

A

Symmetric unimodal distribution: symmetrically distributed with a single peak

105
Q

Definition: model variance

A

Error from sensitivity to small fluctuations in training data

106
Q

Definition: model bias

A

Error from overly simplistic assumptions (e.g., data is linear when it’s not, omitted variable bias)

107
Q

What will be the result of a model with low bias and high variance?

A

Overfitting: model will be too sensitive to noise and random fluctuations in the data, failing to generalize well to new data

108
Q

What will be the result of a model with high bias and low variance?

A

Underfitting: model will miss important relationships in the data

109
Q

What are the types of errors in hypothesis testing?

A
  • Type I error
  • Type II error
110
Q

Type I error

Types of errors in hypothesis testing

4 bullet points

A
  • False positive
  • Null hypothesis is true but is rejected
  • Its probability is denoted by the Greek letter α (the significance level)
  • α is usually set at a level of 0.05, meaning there is a 5% chance of making a Type I error when the null hypothesis is true
111
Q

Type II error

Types of errors in hypothesis testing

4 bullet points

A
  • False negative
  • Null hypothesis is false but is not rejected
  • Its probability is denoted by the Greek letter β
  • Related to the power of the test, 1 - β, which is the probability of correctly rejecting the null hypothesis when it is false
112
Q

Definition: confidence interval

A

Range of values expected to contain the true population parameter with a specific level of confidence

113
Q

What is the most common confidence level for a confidence interval?

A

95%

114
Q

What is the primary difference between correlation and covariance?

A
  • Correlation is the normalized version of covariance, meaning correlation adjusts for the scales of the variables
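Example showing that rescaling a variable changes the covariance but not the correlation (a minimal sketch; `covariance` and `correlation` are hypothetical helpers using population formulas, and the data is made up):

```python
import math


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


height_m = [1.5, 1.6, 1.7, 1.8, 1.9]
weight_kg = [50, 58, 65, 72, 85]
height_cm = [h * 100 for h in height_m]       # same data, different unit

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)     # 100 times larger
corr_m = correlation(height_m, weight_kg)
corr_cm = correlation(height_cm, weight_kg)   # unchanged by the rescaling
```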
115
Q

Definition: correlation

A

Strength and direction of a linear relationship between two variables

116
Q

Equation: correlation

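A

$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$

where cov(X, Y) is the covariance of X and Y, and σ_X and σ_Y are their standard deviations.
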
117
Q

What is the range and meaning of different values of correlation?

A

Between -1 and 1, inclusive
* +1: Perfect positive linear relationship
* -1: Perfect negative linear relationship
* 0: No linear relationship

118
Q

What are the units of correlation?

A

Unitless

119
Q

Definition: covariance

A

Measures the degree to which two random variables change together. Indicates the direction of the linear relationship between variables

120
Q

Equation: covariance

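A

$\operatorname{cov}(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right]$

where μ_X and μ_Y are the means of X and Y.
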