Review Flashcards

1
Q

Discrete metrics

A

Discrete metrics are metrics that count distinct, individual events or items. These metrics do not deal with fractional or continuous values.

Ex:
On-time delivery: each delivery is recorded as either “on-time” or “not on-time” (yes or no), and the metric counts those outcomes. It is discrete because each outcome is a countable event with no fractional values.

2
Q

Continuous metrics

A

Continuous metrics take on any value within a range and are not limited to whole numbers.

Ex:
Delivery time and package weight are continuous because they can take any value within a range (e.g., 1.5 hours, 10.2 kg).

3
Q

Categorical (Nominal) Data

A

Data that is divided into categories that are mutually exclusive, meaning items can only belong to one category, and the categories do not have any order or ranking.

Key Characteristics: No quantitative value, no inherent order.

Ex:
Customer region: North America, Europe, Asia
Favorite fruit: Apple, Banana, Orange

4
Q

Ordinal Data

A

Data that can be ranked or ordered based on a relative scale, but the differences between data points are not necessarily equal. There is no measurable distance between categories.

Key Characteristics: Order matters, but the intervals between data points are not consistent or measurable.

Ex:
Customer satisfaction rating: Poor, Average, Good, Excellent
Education level: High school, Bachelor’s degree, Master’s degree, Doctorate

5
Q

Interval Data

A

Data where the differences between values are consistent and measurable, but there is no true zero point. Ratios between values are not meaningful.

Key Characteristics: Equal intervals between data points, but no true zero point (so ratios like “twice as much” don’t make sense).

Ex:
Temperature: 20°C, 30°C, 40°C (the difference between 20°C and 30°C is the same as between 30°C and 40°C, but 0°C does not mean “no temperature”)
Time of day: 2:00 PM, 4:00 PM, 6:00 PM (the intervals between times are consistent, but time does not have a true zero point)
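A minimal sketch of why ratios are not meaningful on an interval scale, using the temperature example. The conversion to Kelvin (a scale with a true zero) is not part of the original card and is added only for illustration; the function and variable names are hypothetical.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius to Kelvin, a scale whose zero is a true zero."""
    return celsius + 273.15


cool_c, warm_c = 20.0, 40.0

# Differences are meaningful on an interval scale.
difference_c = warm_c - cool_c

# The naive Celsius ratio suggests "twice as hot", which is misleading
# because 0 degrees Celsius does not mean "no temperature".
naive_ratio = warm_c / cool_c

# On the Kelvin (ratio) scale the same two temperatures differ far less.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(difference_c, naive_ratio, round(kelvin_ratio, 3))
```

The Celsius ratio of 2.0 depends on where the scale puts its zero, which is why “twice as much” statements are reserved for ratio data.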

6
Q

Ratio Data

A

Data that has a meaningful zero point, and both differences and ratios between data points are meaningful. This type of data allows for all arithmetic operations, including calculating ratios.

Key Characteristics: Equal intervals between data points, true zero point (which represents the absence of the variable).

Ex:
Height: 150 cm, 180 cm (height has a true zero, and you can say someone is twice as tall as someone else)
Salary: $30,000, $60,000 (you can say one person earns twice as much as another, and $0 represents no salary)

7
Q

Hierarchy of data

A

Nominal, ordinal, interval, ratio (from the lowest to the highest level of measurement)

8
Q

Conditional Probability

A

Conditional probability is the probability of an event A occurring given that another event B has already occurred. It is denoted P(A | B) and, when P(B) > 0, is computed as P(A | B) = P(A ∩ B) / P(B).
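A minimal sketch of computing P(A | B) as P(A ∩ B) / P(B). The fair six-sided die and the two events are hypothetical choices made for illustration; they do not come from the original card.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (an assumed example).
outcomes = set(range(1, 7))
event_a = {n for n in outcomes if n % 2 == 0}  # roll is even
event_b = {n for n in outcomes if n > 3}       # roll is greater than 3


def probability(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


p_b = probability(event_b)
p_a_and_b = probability(event_a & event_b)

# Conditioning on B restricts attention to the outcomes inside B.
p_a_given_b = p_a_and_b / p_b

print(p_a_given_b)
```

Knowing the roll is greater than 3 raises the chance of an even roll from 1/2 to 2/3, because two of the three remaining outcomes are even.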

9
Q

Joint Probability

A

P(A ∩ B) = P(A | B) × P(B)
If A and B are independent, this simplifies to P(A ∩ B) = P(A) × P(B).

Refers to the probability of two (or more) events occurring at the same time or in conjunction with one another. It is the likelihood that both events happen simultaneously.
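The product P(A) × P(B) gives the joint probability only when the events are independent. The sketch below checks this on a hypothetical fair six-sided die; the events and helper names are assumptions chosen for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair die (assumed example)


def probability(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


even = {2, 4, 6}
greater_than_3 = {4, 5, 6}
at_most_2 = {1, 2}

# Dependent pair: the plain product does NOT match the joint probability.
joint_dependent = probability(even & greater_than_3)
product_dependent = probability(even) * probability(greater_than_3)

# Independent pair: the plain product matches the joint probability.
joint_independent = probability(even & at_most_2)
product_independent = probability(even) * probability(at_most_2)

print(joint_dependent, product_dependent)
print(joint_independent, product_independent)
```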

10
Q

Union Probability

A

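P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
If A and B are mutually exclusive, P(A ∩ B) = 0 and this simplifies to P(A ∪ B) = P(A) + P(B).

Refers to the probability that at least one of two (or more) events will occur. In other words, it’s the probability that either event A, event B, or both events occur.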
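Simply adding P(A) and P(B) double-counts outcomes that belong to both events, so the overlap has to be subtracted once. A short sketch on a hypothetical fair six-sided die (the events and names are illustrative assumptions):

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair die (assumed example)


def probability(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


even = {2, 4, 6}
greater_than_3 = {4, 5, 6}
one_or_three = {1, 3}

# Overlapping events: subtract the intersection so it is counted once.
union_direct = probability(even | greater_than_3)
union_formula = (
    probability(even)
    + probability(greater_than_3)
    - probability(even & greater_than_3)
)

# Mutually exclusive events: the intersection is empty, so plain addition works.
exclusive_direct = probability(even | one_or_three)
exclusive_sum = probability(even) + probability(one_or_three)

print(union_direct, union_formula, exclusive_direct, exclusive_sum)
```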

11
Q

Normalization Methods

A

Z-score normalization
Min-Max normalization
Normalization by decimal scaling
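As a rough illustration of two of these methods, the sketch below implements min-max normalization, (x − min) / (max − min), and decimal scaling, x / 10^j with j the smallest integer that brings every absolute value below 1. The function names and the income figures are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(v - lowest) / (highest - lowest) for v in values]


def decimal_scaling_normalize(values):
    """Divide by the smallest power of 10 that makes every |value| < 1."""
    max_abs = max(abs(v) for v in values)
    exponent = 0
    while max_abs / 10**exponent >= 1:
        exponent += 1
    return [v / 10**exponent for v in values]


incomes = [10_000, 55_000, 100_000]  # made-up sample data

print(min_max_normalize(incomes))
print(decimal_scaling_normalize(incomes))
```

Min-max pins the smallest value to 0 and the largest to 1, while decimal scaling only shifts the decimal point, so the relative spacing of the values is preserved in both cases.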

12
Q

Z-score Normalization

A

A technique used to rescale data so that it has a mean of 0 and a standard deviation of 1, computed as z = (x − μ) / σ, where μ is the mean and σ is the standard deviation of the data. This helps in comparing values from different scales, because each transformed value can be read as a number of standard deviations from the mean.

Uses:
- Standardizing Data
- Outlier Detection - Extreme z-scores (z < -3 or z > 3) flag values far from the mean
- Handling Features on Different Scales - Features (or variables) are often measured on different scales. For instance, one feature may represent age in years (ranging from about 0 to 100), while another represents income in dollars (ranging from 10,000 to 1,000,000). Left unscaled, a machine learning algorithm might be biased toward the feature with the larger numerical range, leading to suboptimal performance.
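A minimal sketch of z-score normalization and the |z| > 3 outlier rule of thumb. It assumes the population standard deviation (`statistics.pstdev`); using the sample standard deviation instead is an equally common choice. The function names and the age values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (x - mean) / standard deviation for every value."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mu) / sigma for v in values]


def extreme_values(values, threshold=3.0):
    """Values whose z-score lies beyond +/- threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


ages = [20, 30, 40, 50, 60]  # made-up sample data
standardized = z_scores(ages)

print([round(z, 3) for z in standardized])
print(extreme_values(ages))
```

After the transformation the values have mean 0 and standard deviation 1, so an age and an income standardized this way can be compared on the same footing.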

13
Q

Median

A

Middle value of a sorted data set (the average of the two middle values when the number of values is even)

14
Q

Mode

A

Most frequently occurring value (a data set can have more than one mode)

15
Q

Mean

A

Average of the numbers: the sum of the values divided by the number of values

16
Q

Point estimate

A

A single value used to estimate a population parameter, such as the sample mean (x̄) used as an estimate of the population mean (μ).

  • Doesn’t give any indication of its precision or reliability
17
Q

Confidence Interval

A

A confidence interval surrounds the point estimate with a margin of error, creating a range within which we believe the true population parameter lies.

A range for a population characteristic based on a sample
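A minimal sketch of building a confidence interval as point estimate ± margin of error. It assumes a normal approximation with the critical value 1.96 for roughly 95% confidence; for small samples a t critical value would give a somewhat wider interval. The function name and the delivery times are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_interval(sample, z=1.96):
    """Return (lower, upper) bounds around the sample mean.

    z=1.96 is the normal critical value for about 95% confidence
    (an assumption of this sketch, not a universal choice).
    """
    point_estimate = mean(sample)
    standard_error = stdev(sample) / sqrt(len(sample))
    margin_of_error = z * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error


delivery_hours = [1.5, 2.0, 1.8, 2.2, 1.9, 2.1, 1.7, 2.0]  # made-up data
lower, upper = confidence_interval(delivery_hours)

print(round(lower, 3), round(upper, 3))
```

A larger sample shrinks the standard error and therefore narrows the interval around the same point estimate.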

18
Q

Confidence Level

A

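The long-run proportion of confidence intervals, each built by the same procedure from a new random sample, that contain the true population parameter. It describes the reliability of the procedure, not the probability that one already-computed interval contains the parameter.

A 95% confidence level means that if we were to take 100 random samples and calculate a confidence interval for each, about 95 of those intervals would contain the true population mean.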
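The repeated-sampling idea can be checked with a small simulation. This is only a sketch under stated assumptions: the population is normal with a known standard deviation, the true mean, sample size, and number of trials are arbitrary, and the seed is fixed so the run is repeatable.

```python
import random
from math import sqrt
from statistics import mean

random.seed(0)  # fixed seed so the simulation is repeatable

TRUE_MEAN, TRUE_SD = 50.0, 10.0  # hypothetical population parameters
SAMPLE_SIZE, TRIALS = 30, 1000
Z_95 = 1.96  # normal critical value for about 95% confidence

# With a known population SD the margin of error is the same for every sample.
margin = Z_95 * TRUE_SD / sqrt(SAMPLE_SIZE)

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    center = mean(sample)
    # Count the intervals that actually contain the true mean.
    if center - margin <= TRUE_MEAN <= center + margin:
        hits += 1

coverage = hits / TRIALS
print(coverage)
```

The printed coverage should land close to 0.95, though any single run will differ slightly because of sampling noise.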

19
Q

Types of probability

A

Joint
Conditional
Marginal
Empirical
Union
Mutually Exclusive
Independent
Bayesian
Complementary

20
Q

Which are the measures of central tendency?

A

Mean
Median
Mode
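A short sketch computing all three measures with Python's standard `statistics` module (`multimode` requires Python 3.8 or later). The data values are made up for illustration.

```python
from statistics import mean, median, multimode

scores = [1, 3, 3, 5, 7, 7, 9]  # made-up sample data

average = mean(scores)      # sum of the values divided by their count
middle = median(scores)     # middle value of the sorted data
modes = multimode(scores)   # every value tied for most frequent

print(average, middle, modes)
```

`multimode` is used instead of `mode` because a data set can have more than one most frequent value, as this one does.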

21
Q

Collective outliers

A

Group of data points that, when considered together, deviate significantly from the overall pattern of the dataset. However, when you look at these data points individually, they may not appear unusual or extreme. It’s the collective behavior of the group that makes them stand out as outliers.
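One possible way to make this concrete is to compare whole windows of consecutive values rather than single points. The sketch below is a hypothetical heuristic, not a standard algorithm: the readings, the window size of 4, and the z-score threshold of 2 are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up readings that alternate 8, 12, with a stretch stuck at 12 in the middle.
readings = [8, 12] * 10 + [12] * 6 + [8, 12] * 10
WINDOW, THRESHOLD = 4, 2.0  # assumed settings for this sketch

# Point view: no single reading is unusual, since 8 and 12 both occur often.
point_outliers = [
    i for i, z in enumerate(z_scores(readings)) if abs(z) > THRESHOLD
]

# Collective view: the mean of each window of consecutive readings.
window_means = [
    mean(readings[i:i + WINDOW]) for i in range(len(readings) - WINDOW + 1)
]
flagged_windows = [
    i for i, z in enumerate(z_scores(window_means)) if abs(z) > THRESHOLD
]

print(point_outliers)
print(flagged_windows)
```

No individual reading is flagged, but the windows that fall entirely inside the stuck stretch are, which matches the idea that the group is unusual even though its members are not.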
