Priority 5 Flashcards

Question

What is the use of the `continue` keyword in Python?

Answer 1

`continue` is used in a loop to skip over the current iteration and move on to the next one.
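A minimal sketch of this behavior. The list `numbers` and the even-number condition are illustrative assumptions, not part of the card.

```python
# Collect only the odd numbers. `continue` skips the rest of the loop body
# for even values and jumps straight to the next iteration.
numbers = [1, 2, 3, 4, 5]
odds = []
for n in numbers:
    if n % 2 == 0:
        continue  # skip even numbers
    odds.append(n)

print(odds)  # [1, 3, 5]
```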

Answer 2

Object whose state cannot be modified after it is created.

Answer 3

Object whose state can be modified after it is created.

Answer 4

* Numbers: `int`, `float`, `complex`
* `bool`
* `str`
* Tuples

Answer 5

* Lists
* Dictionaries
* Sets

Answer 6

The old value gets garbage-collected once nothing else references it, freeing the memory that was used to store the object.

Answer 7

* Variables are names that refer to (hold references to) concrete objects.
* Objects are concrete pieces of information that live at specific memory locations on the computer.
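The name/object distinction can be seen in a short sketch. All names here (`a`, `b`, `x`, `y`) are illustrative assumptions.

```python
# Two names bound to the same list object.
a = [1, 2]
b = a              # b refers to the same object as a; no copy is made
b.append(3)        # mutating through b is visible through a
same_object = a is b

# Rebinding a name to a new value leaves other names untouched.
x = 10
y = x
x = x + 1          # x now refers to a different int object; y still refers to 10

print(a, same_object, x, y)  # [1, 2, 3] True 11 10
```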

Answer 8

No. Tuples are immutable. You would have to create a new sorted tuple from the original tuple.
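A minimal sketch of building a new sorted tuple. The variable names are illustrative.

```python
original = (3, 1, 2)

# sorted() accepts any iterable and returns a new list;
# tuple() converts that list back into a tuple.
sorted_copy = tuple(sorted(original))

print(original)     # (3, 1, 2), unchanged
print(sorted_copy)  # (1, 2, 3)
```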

Answer 9

Exception handling

Answer 10

Contains code that might cause an exception to be raised.

Answer 11

Contains code that is executed if an exception is raised during the execution of a `try` block.
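The `try` and `except` blocks described above might be combined as in this sketch. The helper `safe_divide` is a hypothetical name chosen for illustration.

```python
def safe_divide(a, b):
    """Return a / b, or None when b is zero (hypothetical helper)."""
    try:
        # Code that might raise an exception goes in the try block.
        return a / b
    except ZeroDivisionError:
        # Runs only if the try block raised ZeroDivisionError.
        return None

print(safe_divide(6, 3))  # 2.0
print(safe_divide(1, 0))  # None
```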

Answer 12

* Both are blocks of code that perform a specific task.
* Both can take input parameters and return a value.
* Both are defined using the `def` keyword.

Answer 13

* Functions are defined outside of classes; methods are functions that are associated with a specific object or class.
* Functions can be called on a standalone basis; methods are called using the dot notation on an object of a class.
* Functions perform general tasks; methods perform actions specific to the object they belong to.
* Parameters are optional for functions; for methods, the first parameter is usually `self`, which refers to the instance of the class.
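The differences listed above can be sketched side by side. The names `shout` and `Greeter` are hypothetical and exist only for this example.

```python
# A standalone function: defined outside any class, called directly.
def shout(text):
    return text.upper() + "!"

class Greeter:
    def __init__(self, name):
        self.name = name

    # A method: defined inside a class; `self` is the instance it is called on.
    def greet(self):
        return f"Hello, {self.name}"

print(shout("hi"))             # HI!
print(Greeter("Ada").greet())  # Hello, Ada (dot notation on an object)
```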

Answer 14

1. Code reuse
2. Improved readability
3. Easier testing
4. Improved performance

Answer 15

Allow you to reuse code by encapsulating it in a single place and calling it multiple times from different parts of your program. Reduces redundancy, making code more concise and easier to maintain.

Answer 16

Functions make your code more readable and easier to understand by dividing your code into logical blocks. This makes it easier to identify bugs and make changes.

Answer 17

Functions allow you to test individual blocks of code separately, which can make it easier to find and fix bugs.

Answer 18

Functions allow you to use optimized code libraries and/or allow the Python interpreter to optimize the code more effectively.

Answer 19

* Fast and efficient operations on arrays and matrices of numerical data versus Python's built-in data structures. This is because it uses optimized C and Fortran code behind the scenes.
* Large number of functions for performing mathematical and statistical operations on arrays and matrices.
* Integrates well with other scientific computing libraries in Python, such as SciPy and pandas.

Answer 20

Shorter syntax when creating a new list based on the values of an existing list.

Answer 21

```
new_list = [expression for item in iterable if condition]
```
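A concrete instance of the syntax above, with illustrative data:

```python
numbers = [1, 2, 3, 4, 5, 6]

# expression: n * n, item: n, iterable: numbers, condition: n is even
even_squares = [n * n for n in numbers if n % 2 == 0]

print(even_squares)  # [4, 16, 36]
```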

Answer 22

Concise way of creating dictionaries in Python

Answer 23

```
{key: value for item in iterable}
```
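A concrete instance of the syntax above. The word list is an illustrative assumption.

```python
words = ["apple", "kiwi", "banana"]

# key: the word itself, value: its length
lengths = {word: len(word) for word in words}

print(lengths)  # {'apple': 5, 'kiwi': 4, 'banana': 6}
```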

Answer 24

A variable that is defined outside of any function or class

Answer 25

A variable that is defined inside a function or class

Answer 26

Can be accessed from anywhere in the program

Answer 27

Can only be accessed within the function or class in which it is defined

Answer 28

The local variable will take precedence over the global variable within the function or class in which it is defined
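A short sketch of this precedence rule. The names `counter`, `show_local`, and `read_global` are hypothetical.

```python
counter = 10  # global variable, defined outside any function


def show_local():
    counter = 99  # local variable with the same name; shadows the global here
    return counter


def read_global():
    return counter  # no local definition, so the global is used


print(show_local())   # 99
print(read_global())  # 10
print(counter)        # 10, the global was never modified
```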

Answer 29

Subclass of Python's `dict` class that maintains the order in which elements were added

Answer 30

`OrderedDict`

Answer 31

A doubly linked list
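A minimal sketch of `OrderedDict` from the standard `collections` module, as described in the cards above. The keys and values are illustrative.

```python
from collections import OrderedDict

scores = OrderedDict()
scores["a"] = 1
scores["b"] = 2
scores["c"] = 3

# Iteration follows insertion order.
insertion_order = list(scores)   # ['a', 'b', 'c']

# move_to_end() reorders an existing key without re-inserting it.
scores.move_to_end("a")
order_after_move = list(scores)  # ['b', 'c', 'a']

print(insertion_order, order_after_move)
```

Since Python 3.7, plain `dict` objects also preserve insertion order; `OrderedDict` remains useful for order-specific operations such as `move_to_end()`.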

Answer 32

Both are keywords used to send values back from a function

Answer 33

Terminates the function and returns a value to the caller

Answer 34

Pauses the function's execution and returns a value to the caller but maintains the function's state so that it can be resumed later

Answer 35

Used in regular functions when you want to compute a single result and return it

Answer 36

Used to create generator functions that produce a sequence of values over time
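The contrast between `return` and `yield` in the cards above can be sketched as follows. Both function names are hypothetical.

```python
def squares_list(n):
    # return: computes the full result, then terminates the function.
    return [i * i for i in range(n)]


def squares_gen(n):
    # yield: hands back one value at a time and pauses, keeping its state.
    for i in range(n):
        yield i * i


gen = squares_gen(3)
first = next(gen)   # 0, the function pauses after the first yield
second = next(gen)  # 1, execution resumes where it left off
rest = list(gen)    # [4], the remaining values

print(squares_list(3), first, second, rest)
```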

Answer 37

Small anonymous function that can take any number of arguments but can only have one expression

Answer 38

```
lambda arguments : expression
```

Answer 39

Often used in combination with higher-order functions, such as `map()`, `filter()`, and `reduce()`
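A short sketch of lambdas passed to these higher-order functions. Note that `reduce()` lives in the standard `functools` module in Python 3; the data is illustrative.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

doubled = list(map(lambda x: x * 2, numbers))        # apply to every element
evens = list(filter(lambda x: x % 2 == 0, numbers))  # keep matching elements
total = reduce(lambda acc, x: acc + x, numbers)      # fold into a single value

print(doubled, evens, total)  # [2, 4, 6, 8] [2, 4] 10
```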

Answer 40

Used to test a condition. If the condition is `True`, the program continues to execute. If the condition is `False`, then the program raises an `AssertionError` exception.

Answer 41

Used for debugging purposes and is not intended to be used as a way to handle runtime errors
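A minimal sketch of `assert` used as a debugging check. The function `average` is a hypothetical example.

```python
def average(values):
    # Debugging aid: documents and checks an assumption made by the code.
    # Assertions are skipped when Python runs with the -O flag, which is
    # one reason they should not be used to handle runtime errors.
    assert len(values) > 0, "values must not be empty"
    return sum(values) / len(values)


print(average([2, 4]))  # 3.0
```

Calling `average([])` in a normal (non-optimized) run raises `AssertionError` with the message shown.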

Answer 42

`try-except`
* Allows recovery and custom actions versus termination with `AssertionError`
* Fully customizable exception messages versus limited to raising `AssertionError`

Answer 43

Used to modify or extend the functionality of a function, method, or class without changing its source code

Answer 44

```
@decorator_function
def function_to_be_decorated():
    # Function code here
    pass
```

Answer 45

Something is happening before the function is called.
Hello!
Something is happening after the function is called.
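Output of this shape is what a decorator like the one sketched below would print. The card's own code is not part of this section, so `my_decorator` and `say_hello` are assumed names.

```python
def my_decorator(func):
    # wrapper adds behavior before and after the call to func,
    # without changing func's source code.
    def wrapper():
        print("Something is happening before the function is called.")
        func()
        print("Something is happening after the function is called.")
    return wrapper


@my_decorator  # equivalent to: say_hello = my_decorator(say_hello)
def say_hello():
    print("Hello!")


say_hello()
```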

Answer 46

Used to analyze and describe the characteristics of a single variable

Answer 47

* Calculate descriptive statistics, such as mean, median, mode, and standard deviation, to summarize the distribution of the data.
* Visualize the distribution of the data using plots such as histograms, boxplots, or density plots.
* Check for outliers and anomalies in the data.
* Check for normality in the data using statistical tests or visualizations such as a Q-Q plot.

Answer 48

* Calculate the frequency of each category in the data.
* Calculate the percentage of each category in the data.
* Visualize the distribution of the data using plots such as bar and pie charts.
* Check for imbalances or abnormalities in the distribution of the data.

Answer 49

* **Visual Inspection:** Identification via visual inspection of data using plots such as histograms, scatterplots, or boxplots.
* **Summary Statistics:** Identification via calculating summary statistics, such as mean, median, or interquartile range. For example, if the mean is significantly different from the median, it could indicate the presence of outliers.
* **Z-Score:** The z-score measures how many standard deviations a given data point is from the mean. Data points whose absolute z-score exceeds a threshold (e.g., 3 or 4) may be considered outliers.
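The z-score approach can be sketched with the standard `statistics` module. The function name `zscore_outliers`, the default threshold, and the sample data are assumptions made for illustration; the population standard deviation is used here, which is one of several reasonable choices.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs((v - mean) / stdev) > threshold]


data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13, 12, 95]
outliers = zscore_outliers(data)
print(outliers)  # [95]
```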

Answer 50

1. Drop rows
2. Drop columns
3. Imputation with mean or median
4. Imputation with mode
5. Imputation with a predictive model

Answer 51

Drop rows with null values
* Pro: Simple and fast
* Con: Can significantly reduce sample size and impact the statistical power of the analysis

Answer 52

Drop columns with null values
* Pro: Can be a good option if many values are missing from the column or the column is irrelevant
* Con: Can result in omitted variable bias

Answer 53

Replace null values with the mean or median of the non-null values in the column
* Pro: Good option if the data are missing at random and mean/median is a reasonable representation of the data
* Con: Introduces bias if the data are not missing at random
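A plain-Python sketch of mean/median imputation, with `None` standing in for null values. The helper `impute` and the sample column are hypothetical; in practice a data-frame library would usually handle this.

```python
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


column = [1.0, None, 3.0, 100.0, None]
median_filled = impute(column, strategy="median")
mean_filled = impute(column, strategy="mean")
print(median_filled)  # [1.0, 3.0, 3.0, 100.0, 3.0]
```

With the extreme value 100.0 in the column, the mean fill is pulled far above the median fill, which illustrates why the median is often preferred for skewed data.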

Answer 54

Replace null values with the mode of the non-null values in the column
* Pro: Good option for categorical data where mode is a reasonable representation of the data
* Con: Introduces bias if the data are not missing at random

Answer 55

Use a predictive model to estimate the missing values based on other available data
* Pro: Can be more accurate/less biased if the data are not missing at random and there is a strong relationship between the missing values and other data
* Con: More complex/time-consuming

Answer 56

Measure of the asymmetry of a distribution, i.e., how far it departs from a symmetric shape. A distribution is skewed if it is not symmetrical, with more data points concentrated on one side of the mean than the other.

Answer 57

* Positive skewness
* Negative skewness

Answer 58

* Long tail on the right side
* Majority of data points concentrated on the left side of the mean
* A few extreme values on the right side of the distribution that are pulling the mean to the right

Answer 59

* Long tail on the left side
* Majority of data points concentrated on the right side of the mean
* A few extreme values on the left side of the distribution that are pulling the mean to the left

Answer 60

1. Mean
2. Median
3. Mode

Answer 61

* Arithmetic average of a dataset
* Calculated by adding all the values in the dataset and dividing by the number of values
* Sensitive to outliers

Answer 62

* Middle value of the dataset when the values are arranged in order from smallest to largest
* Arrange the values in order and find the middle value. If there is an odd number of values, the median is the middle value. If there is an even number of values, the median is the mean of the two middle values.
* Not sensitive to outliers

Answer 63

* Value that occurs most frequently in a dataset
* May have multiple modes or no modes at all
* Not sensitive to outliers
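The three measures of central tendency described above can be computed with the standard `statistics` module. The dataset is an illustrative assumption that includes one outlier (40).

```python
import statistics

data = [2, 3, 3, 5, 7, 40]

mean = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7 + 40) / 6 = 10
median = statistics.median(data)  # even count: mean of 3 and 5 = 4.0
mode = statistics.mode(data)      # 3 occurs most often

# multimode() returns every most-frequent value, covering the multi-mode case.
modes = statistics.multimode([1, 1, 2, 2, 3])  # [1, 2]

print(mean, median, mode, modes)
```

The outlier pulls the mean up to 10 while the median stays at 4.0, matching the sensitivity notes on the cards.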

Answer 64

Used to summarize and describe a dataset by using measures of central tendency (mean, median, mode) and measures of spread (standard deviation, variance, range)

Answer 65

Used to make inferences about a population based on sample data using statistical models, hypothesis testing, and estimation

Answer 66

1. Univariate analysis
2. Bivariate analysis
3. Missing data analysis
4. Data visualization

Answer 67

Helps understand the distribution of individual variables

Answer 68

Helps understand the relationship between variables

Answer 69

Helps understand the quality of the data

Answer 70

Provides a visual interpretation of the data

Answer 71

* As sample size increases, the distribution of the sample mean will approach a normal distribution
* True regardless of the underlying distribution from which the sample is drawn (provided it has finite variance)

Answer 72

Even if the individual data points in a sample are not normally distributed, we can use normal distribution-based methods to make inferences about the population by taking the average of a large enough number of data points
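A small simulation can illustrate the idea. This is a sketch under assumptions: the exponential distribution (strongly skewed, mean 1, standard deviation 1), the sample size of 50, and the 2000 repetitions are all arbitrary choices, and a fixed seed keeps the run repeatable.

```python
import random
import statistics

rng = random.Random(0)  # seeded generator so the run is repeatable


def sample_mean(n):
    # Mean of n draws from an exponential distribution with rate 1.
    return statistics.mean(rng.expovariate(1.0) for _ in range(n))


means = [sample_mean(50) for _ in range(2000)]
center = statistics.mean(means)   # expected to be close to 1
spread = statistics.stdev(means)  # expected to be close to 1 / sqrt(50), about 0.141

print(round(center, 2), round(spread, 2))
```

Although individual draws are far from normal, the sample means cluster symmetrically around 1 with a spread near the value the theorem predicts.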

Answer 73

* Numeric variable
* Categorical variable

Answer 74

* Quantifiable characteristic whose values are numbers
* May be continuous or discrete

Answer 75

* Values can take on one of a limited, usually fixed, number of possible values

Answer 76

Categorical variable that can take on exactly two values

Answer 77

Categorical variable with more than two possible values

Answer 78

Symmetric unimodal distribution: symmetrically distributed with a single peak

Answer 79

Error from sensitivity to small fluctuations in training data

Answer 80

Error from overly simplistic assumptions (e.g., data is linear when it's not, omitted variable bias)

Answer 81

Overfitting: model will be too sensitive to noise and random fluctuations in the data, failing to generalize well to new data

Answer 82

Underfitting: model will miss important relationships in the data

Answer 83

* Type I error
* Type II error

Answer 84

* False positive
* Null hypothesis is true but is rejected
* Probability denoted by the Greek letter α
* Usually set at a level of 0.05, meaning there is a 5% chance of making a Type I error when the null hypothesis is true

Answer 85

* False negative
* Null hypothesis is false but is not rejected
* Probability denoted by the Greek letter β
* Related to the power of the test, 1 - β, which is the probability of correctly rejecting the null hypothesis when it is false

Answer 86

Range of values expected to contain the true population parameter with a specific level of confidence

Answer 87

* Correlation is the normalized version of covariance, meaning correlation adjusts for the scales of the variables

Answer 88

Strength and direction of a linear relationship between two variables

Answer 89

-1 and 1
* +1: Perfect positive linear relationship
* -1: Perfect negative linear relationship
* 0: No linear relationship

Answer 90

Measures the degree to which two random variables change together. Indicates the direction of the linear relationship between variables

Answer 91

Any value, positive, negative, or zero
* Positive: When X increases, Y tends to increase
* Negative: When X increases, Y tends to decrease
* 0: No linear relationship

Answer 92

Product of the units of the two variables
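The covariance and correlation cards above can be tied together in a sketch. The helper functions and the height/weight data are assumptions for illustration; the sample (n - 1) form of covariance is used.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


heights_m = [1.5, 1.6, 1.7, 1.8]
weights_kg = [50.0, 58.0, 63.0, 72.0]
heights_cm = [h * 100 for h in heights_m]  # same data, different unit

cov_m = covariance(heights_m, weights_kg)    # units: m * kg
cov_cm = covariance(heights_cm, weights_kg)  # units: cm * kg, 100 times larger
r_m = correlation(heights_m, weights_kg)
r_cm = correlation(heights_cm, weights_kg)   # unchanged by the rescaling

print(cov_m, cov_cm, r_m, r_cm)
```

Changing the unit of one variable rescales the covariance but leaves the correlation untouched, which is what "normalized" means here.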

Answer 93

Statistical method to determine whether there is enough evidence in a sample of data to support or reject a stated assumption (hypothesis) about a population

Answer 94

* Can make decisions based on statistical evidence, rather than relying on assumptions or opinions.
* Formal, standardized approach, making results interpretable and reproducible.
* Allows for clear and credible communication of findings.

Answer 95

* A/B Testing: Evaluate if new feature, design, or change has a significant impact
* Feature Selection: Test the significance of variables in statistical or ML models
* Model Significance: Assess the significance of a predictive model

Answer 96

Tabular format used to display the frequencies (counts) of data points across two or more categorical variables

Answer 97

Statistical test used to determine whether there is a significant association between two categorical variables in a contingency table

Answer 98

* Null Hypothesis (*H_0*): The two variables are independent (no association).
* Alternative Hypothesis (*H_a*): The two variables are not independent (there is an association).

Answer 99

1. Calculate the chi-square statistic
2. Compare the computed *χ^2* statistic to a critical value from the chi-square distribution table (based on df and significance level, e.g., 0.05)
3. If *χ^2* exceeds the critical value, reject the null hypothesis
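The steps above can be sketched for a 2x2 contingency table. The function name and the observed counts are hypothetical; the critical value 3.841 is the standard table entry for df = 1 at a 0.05 significance level.

```python
def chi_square_statistic(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


observed = [[30, 10],
            [20, 40]]

stat = chi_square_statistic(observed)  # about 16.67
CRITICAL_VALUE = 3.841  # df = (2 - 1) * (2 - 1) = 1, significance level 0.05
reject_null = stat > CRITICAL_VALUE

print(round(stat, 2), reject_null)  # 16.67 True
```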

Answer 100

Probability, assuming the null hypothesis is true, of obtaining a test statistic as extreme as or more extreme than the one observed

Answer 101

Significance level, or a predetermined threshold representing the maximum acceptable probability of making a Type I error (i.e., rejecting a true null hypothesis). Criterion against which the p-value is compared to decide whether to reject the null hypothesis

Answer 102

1. Simple random sampling
2. Stratified random sampling
3. Cluster sampling
4. Systematic sampling

Answer 103

Each member of the population has an equal chance of being selected for the sample

Answer 104

Involves dividing the population into subgroups (or strata) based on certain characteristics and selecting a random sample from each stratum

Answer 105

Involves dividing the population into smaller groups (or clusters) and then selecting a random sample of clusters

Answer 106

Involves selecting every kth member of the population to be included in the sample
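The four sampling methods described above can be sketched with the standard `random` module. The population of 20 numbered members, the odd/even strata, the clusters of five, and `k = 4` are all assumptions chosen to keep the example small.

```python
import random

rng = random.Random(42)          # seeded so the run is repeatable
population = list(range(1, 21))  # members numbered 1..20

# Simple random sampling: every member has an equal chance of selection.
simple = rng.sample(population, 5)

# Stratified sampling: split into strata, then sample within each stratum.
strata = {
    "odd": [p for p in population if p % 2 == 1],
    "even": [p for p in population if p % 2 == 0],
}
stratified = [m for group in strata.values() for m in rng.sample(group, 2)]

# Cluster sampling: split into clusters, then randomly select whole clusters.
clusters = [population[i:i + 5] for i in range(0, len(population), 5)]
cluster_sample = [m for cluster in rng.sample(clusters, 2) for m in cluster]

# Systematic sampling: every kth member, starting from a random offset.
k = 4
start = rng.randrange(k)
systematic = population[start::k]

print(simple, stratified, cluster_sample, systematic, sep="\n")
```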


(141 cards)