Review Flashcards

Question 1

Q

Discrete metrics

Answer

A

Discrete metrics are metrics that count distinct, individual events or items. These metrics do not deal with fractional or continuous values.

Ex:
On-time delivery: This metric records whether each delivery was on time (yes or no). It is discrete because every delivery falls into exactly one of two categories, “on-time” or “not on-time,” and the results are tallied as whole-number counts.

Question 2

Q

Continuous metrics

Answer

A

Continuous metrics take on any value within a range and are not limited to whole numbers.

Ex:
Delivery time and package weight are continuous because they can take any value within a range (e.g., 1.5 hours, 10.2 kg).

Question 3

Q

Categorical (Nominal) Data

Answer

A

Data that is divided into categories that are mutually exclusive, meaning items can only belong to one category, and the categories do not have any order or ranking.

Key Characteristics: No quantitative value, no inherent order.

Ex:
Customer region: North America, Europe, Asia
Favorite fruit: Apple, Banana, Orange

Question 4

Q

Ordinal Data

Answer

A

Data that can be ranked or ordered based on a relative scale, but the differences between data points are not necessarily equal. There is no measurable distance between categories.

Key Characteristics: Order matters, but the intervals between data points are not consistent or measurable.

Ex:
Customer satisfaction rating: Poor, Average, Good, Excellent
Education level: High school, Bachelor’s degree, Master’s degree, Doctorate

Question 5

Q

Interval Data

Answer

A

Data where the differences between values are consistent and measurable, but there is no true zero point. Ratios between values are not meaningful.

Key Characteristics: Equal intervals between data points, but no true zero point (so ratios like “twice as much” don’t make sense).

Ex:
Temperature: 20°C, 30°C, 40°C (the difference between 20°C and 30°C is the same as between 30°C and 40°C, but 0°C does not mean “no temperature”)
Time of day: 2:00 PM, 4:00 PM, 6:00 PM (the intervals between times are consistent, but time does not have a true zero point)
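A small worked example of why ratios are not meaningful on an interval scale. The temperatures are the ones from the example above; the helper name `celsius_to_kelvin` is introduced here for illustration only.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15

low_c, high_c = 20.0, 40.0

# Differences survive the change of scale: both are 20 degrees.
diff_celsius = high_c - low_c
diff_kelvin = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)

# Ratios do not: 40/20 is 2 in Celsius, but only about 1.07 in Kelvin,
# so "40°C is twice as hot as 20°C" is not a meaningful statement.
ratio_celsius = high_c / low_c
ratio_kelvin = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)
```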

Question 6

Q

Ratio Data

Answer

A

Data that has a meaningful zero point, and both differences and ratios between data points are meaningful. This type of data allows for all arithmetic operations, including calculating ratios.

Key Characteristics: Equal intervals between data points, true zero point (which represents the absence of the variable).

Ex:
Height: 150 cm, 180 cm (height has a true zero, and you can say someone is twice as tall as someone else)
Salary: $30,000, $60,000 (you can say one person earns twice as much as another, and $0 represents no salary)

Question 7

Q

Hierarchy of data

Answer

A

From lowest to highest level of measurement: nominal, ordinal, interval, ratio

Question 8

Q

Conditional Probability

Answer

A

Conditional probability is the probability of an event A occurring given that another event B has already occurred. This is denoted as P(A | B) and is calculated as P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0.
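A minimal sketch of a conditional probability, assuming one roll of a fair six-sided die. The events and the helper names `prob` and `conditional` are illustrative, not part of the card.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (illustrative assumption).
OUTCOMES = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # event A: the roll is even
B = {4, 5, 6}  # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

def conditional(a, b):
    """P(a | b) = P(a and b) / P(b); undefined when P(b) is 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

# Knowing the roll is greater than 3 raises the chance of "even" from 1/2 to 2/3.
p_a = prob(A)
p_a_given_b = conditional(A, B)
```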

Question 9

Q

Joint Probability

Answer

A

P(A ∩ B) = P(A | B) × P(B)

If A and B are independent, this simplifies to P(A ∩ B) = P(A) × P(B).

Joint probability refers to the probability of two (or more) events occurring together. It is the likelihood that both events happen.
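A minimal sketch, assuming one card drawn from a standard 52-card deck, that counts a joint probability directly and compares it with the product P(A) × P(B). The two match only when the events are independent. The event names and the helper `prob` are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def prob(predicate):
    """Fraction of cards in the deck that satisfy the predicate."""
    return Fraction(sum(1 for card in DECK if predicate(card)), len(DECK))

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

def is_king(card):
    return card[0] == "K"

# Independent events: suit tells you nothing about rank, so the product works.
joint_heart_face = prob(lambda c: is_heart(c) and is_face(c))  # 3/52
product_heart_face = prob(is_heart) * prob(is_face)            # 1/4 * 3/13

# Dependent events: every king is a face card, so the product is too small.
joint_king_face = prob(lambda c: is_king(c) and is_face(c))    # 4/52
product_king_face = prob(is_king) * prob(is_face)              # 1/13 * 3/13
```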

Question 10

Q

Union Probability

Answer

A

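P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

If A and B are mutually exclusive, P(A ∩ B) = 0 and this simplifies to P(A ∪ B) = P(A) + P(B).

Union probability refers to the probability that at least one of two (or more) events will occur. In other words, it’s the probability that either event A, event B, or both events occur.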
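A minimal sketch, assuming one roll of a fair six-sided die, showing why the overlap must be subtracted when the two events can happen together. The events and the helper `prob` are illustrative.

```python
from fractions import Fraction

OUTCOMES = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (illustrative assumption)
A = {2, 4, 6}                  # event A: the roll is even
B = {4, 5, 6}                  # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

# Counting the union directly: {2, 4, 5, 6} is 4 of 6 outcomes.
p_union_direct = prob(A | B)

# Adding P(A) and P(B) counts the overlap {4, 6} twice, so subtract it once.
p_naive_sum = prob(A) + prob(B)
p_union_formula = prob(A) + prob(B) - prob(A & B)
```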

Question 11

Q

Normalization Methods

Answer

A

Z-score normalization
Min-Max normalization
Normalization by decimal scaling
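A minimal sketch of the three methods side by side using only the standard library. The sample values and function names are illustrative, and the sketch assumes the values are not all identical (otherwise the z-score and min-max formulas would divide by zero).

```python
import statistics

values = [200.0, 300.0, 400.0, 600.0, 1000.0]  # made-up sample data

def z_score(xs):
    """Rescale to mean 0 and standard deviation 1: (x - mean) / std."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)  # population standard deviation
    return [(x - mu) / sigma for x in xs]

def min_max(xs):
    """Rescale to the range [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def decimal_scaling(xs):
    """Divide by 10**j, with j the smallest integer that makes max(|x|) < 1."""
    peak = max(abs(x) for x in xs)
    j = 0
    while peak / 10**j >= 1:
        j += 1
    return [x / 10**j for x in xs]

z_scaled = z_score(values)
mm_scaled = min_max(values)
dec_scaled = decimal_scaling(values)
```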

Question 12

Q

Z-score Normalization

Answer

A

Z-score normalization is a technique used to rescale data so that it has a mean of 0 and a standard deviation of 1. Each value is transformed as z = (x - μ) / σ, where μ is the mean and σ is the standard deviation. This makes values from different scales comparable, because every value is expressed as a number of standard deviations from the mean.

Uses:
- Standardizing Data
- Outlier Detection - Extreme z-scores (z < -3 or z > 3)
- Handling Data on Different Scales - Features (or variables) are often measured on different scales. For instance, one feature may represent age in years (which might range from 0 to 100), while another might represent income in dollars (which could range from 10,000 to 1,000,000). If features are left on different scales, a machine learning algorithm might be biased toward the feature with the larger numerical range, leading to suboptimal performance.
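A minimal sketch of the outlier-detection use, assuming the common cutoff of 3 standard deviations and the population standard deviation. The data and the function name `z_score_outliers` are made up for illustration.

```python
import statistics

def z_score_outliers(xs, threshold=3.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)  # assumes the values are not all identical
    return [x for x in xs if abs((x - mu) / sigma) > threshold]

# Nineteen readings near 50 and one reading far above the rest (made-up data).
readings = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49,
            51, 50, 48, 52, 50, 49, 51, 50, 50, 120]

outliers = z_score_outliers(readings)
```

Note that in a small data set a single extreme value inflates the standard deviation, so its own z-score may never reach 3; the cutoff works best with a reasonably large sample.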

Question 13

Q

Median

Answer

A

The middle value of a sorted data set (the average of the two middle values when the number of values is even)

Question 14

Q

Mode

Answer

A

The most frequently occurring value (a data set can have more than one mode)

Question 15

Q

Mean

Answer

A

The arithmetic average: the sum of the values divided by the number of values

Question 16

Q

Point estimate

Answer

A

A single value used to estimate a population parameter, such as the sample mean (x̄) used as an estimate of the population mean (μ).

Doesn't give any indication of its precision or reliability.

Question 17

Q

Confidence Interval

Answer

A

A confidence interval surrounds the point estimate with a margin of error, creating a range within which we believe the true population parameter lies.

A range for a population characteristic based on a sample
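A minimal sketch of a confidence interval for a mean, assuming a reasonably large sample so that the normal approximation applies (a small sample would normally call for the t-distribution instead). The function name and the sample values are illustrative.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) = point estimate +/- margin of error."""
    n = len(sample)
    point_estimate = statistics.fmean(sample)              # sample mean
    std_error = statistics.stdev(sample) / math.sqrt(n)    # standard error
    z = NormalDist().inv_cdf(0.5 + confidence / 2)         # about 1.96 for 95%
    margin = z * std_error
    return point_estimate - margin, point_estimate + margin

# Made-up delivery times in hours.
delivery_hours = [1.5, 2.1, 1.8, 2.4, 1.9, 2.0, 1.7, 2.2, 1.6, 2.3,
                  1.9, 2.0, 2.1, 1.8, 2.2, 1.7, 2.0, 1.9, 2.1, 2.0,
                  1.8, 2.3, 1.9, 2.0, 2.2, 1.6, 2.1, 1.9, 2.0, 1.8]

lower_95, upper_95 = mean_confidence_interval(delivery_hours, 0.95)
lower_99, upper_99 = mean_confidence_interval(delivery_hours, 0.99)
```

Asking for a higher confidence level widens the interval, since a larger margin of error is needed to capture the parameter more often.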

Question 18

Q

Confidence Level

Answer

A

The long-run proportion of confidence intervals, each built the same way from a fresh random sample, that contain the true population parameter.

A 95% confidence level means that if we were to take 100 random samples and calculate a confidence interval for each, we would expect about 95 of those intervals to contain the true population mean.
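The repeated-sampling idea can be checked with a small simulation. This is a sketch under stated assumptions: a normally distributed population with a known mean (so coverage can be checked), samples of 50, and normal-approximation intervals. All constants are illustrative.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 100.0, 15.0  # hypothetical population
SAMPLE_SIZE, TRIALS = 50, 1000
z = NormalDist().inv_cdf(0.975)   # two-sided 95% critical value

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    mean = statistics.fmean(sample)
    margin = z * statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
    # Count the interval as a hit if it contains the true mean.
    if mean - margin <= TRUE_MEAN <= mean + margin:
        hits += 1

coverage = hits / TRIALS  # expected to land near 0.95
```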

Question 19

Q

Types of probability

Answer

A

Joint
Conditional
Marginal
Empirical
Union
Mutually Exclusive
Independent
Bayesian
Complementary

Question 20

Q

Which are measures of central tendency?

Answer

A

Mean
Median
Mode
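A minimal sketch computing all three with Python's standard `statistics` module. The data set is made up, and it is chosen so that an extreme value pulls the mean above the median and so that there are two modes.

```python
import statistics

data = [1, 2, 2, 3, 4, 4, 12]  # made-up values; 12 is an extreme value

mean_value = statistics.mean(data)      # sum 28 divided by 7 values
median_value = statistics.median(data)  # middle of the sorted values
modes = statistics.multimode(data)      # every value tied for most frequent
```

Here the mean is 4, the median is 3, and the modes are 2 and 4.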

Question 21

Q

Collective outliers

Answer

A

A group of data points that, when considered together, deviate significantly from the overall pattern of the dataset. However, when you look at these data points individually, they may not appear unusual or extreme. It’s the collective behavior of the group that makes them stand out as outliers.
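One simple way to illustrate the idea is a sliding-window check. This is a hypothetical heuristic, not a standard detection method: the window size, the threshold, and the data are all assumptions chosen for the example. Every reading is either 8 or 12, so no single value is unusual, but a run of consecutive 12s breaks the alternating pattern.

```python
import statistics

# Readings normally alternate 8, 12; a run of extra 12s is inserted in the middle.
series = [8, 12] * 10 + [12] * 6 + [8, 12] * 10

WINDOW = 5        # assumed window length
THRESHOLD = 1.5   # assumed allowed gap between window mean and overall mean

overall_mean = statistics.fmean(series)

# Start indices of windows whose mean drifts too far from the overall mean.
flagged_starts = [
    start
    for start in range(len(series) - WINDOW + 1)
    if abs(statistics.fmean(series[start:start + WINDOW]) - overall_mean) > THRESHOLD
]

# No individual value falls outside the normal 8 to 12 range.
individually_unusual = [x for x in series if x < 8 or x > 12]
```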

Question 22

Q