Quiz 2 Flashcards

1
Q

What is Cluster Analysis?

A

Segments observations into similar groups based on observed variables.

Used in market segmentation and identifying outliers.

2
Q

What type of machine learning does Cluster Analysis fall under?

A

Unsupervised Machine Learning

There is no dependent variable to predict.

3
Q

What are some uses of Cluster Analysis?

A
  • Market segmentation
  • Identifying outliers

Applications include fraud detection and finding anomalies in data sets.

4
Q

What is Hierarchical Clustering?

A

A clustering method that starts with each observation in its own cluster and merges similar clusters iteratively.

Forms a dendrogram.
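
The merge process can be sketched as follows. This is a minimal illustration that assumes one-dimensional data and single linkage; the function name `agglomerate` and the sample values are hypothetical, not from the course material.

```python
def agglomerate(points):
    """Merge the two closest clusters repeatedly (single linkage, 1-D data)."""
    clusters = [[p] for p in points]  # start: every observation is its own cluster
    merges = []                       # merge history, i.e. the dendrogram's levels
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = agglomerate([1.0, 2.0, 6.0, 7.5])
for left, right, distance in merges:
    print(left, "+", right, "at distance", distance)
```

Each recorded merge corresponds to one join in the dendrogram, with the merge distance as its height.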

5
Q

What are the methods for measuring similarity in Hierarchical Clustering?

A
  • Single Linkage
  • Complete Linkage
  • Group Average Linkage
  • Median Linkage
  • Centroid Linkage

Each method uses different approaches to measure cluster similarity.
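
A small sketch of how four of these linkages differ for the same pair of clusters. It assumes Euclidean distance between points; the function name and sample clusters are hypothetical, and median linkage is left out to keep the example short.

```python
import math
from itertools import product


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def linkage_distances(cluster_a, cluster_b):
    """Distance between two clusters under several linkage rules."""
    pairwise = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]
    return {
        "single": min(pairwise),                          # closest pair
        "complete": max(pairwise),                        # farthest pair
        "group_average": sum(pairwise) / len(pairwise),   # mean over all pairs
        "centroid": math.dist(centroid(cluster_a), centroid(cluster_b)),
    }


cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]
distances = linkage_distances(cluster_a, cluster_b)
print(distances)
```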

6
Q

What is K-Means Clustering?

A

A clustering method that requires predefining k clusters and iteratively assigns observations to these clusters.

After initialization, it alternates assignment and update steps until the cluster assignments stop changing.
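
The three steps can be sketched as follows. This is a simplified illustration on one-dimensional data; using the first k observations as starting centroids is an assumption made here for reproducibility (random starts are common in practice), and the function name is hypothetical.

```python
def k_means_1d(values, k, max_iter=100):
    """Minimal k-means on 1-D data."""
    # Initialization: take the first k observations as centroids (an assumption).
    centroids = list(values[:k])
    assignments = None
    for _ in range(max_iter):
        # Assignment: each observation joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_assignments == assignments:
            break  # converged: no observation changed cluster
        assignments = new_assignments
        # Update: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


centroids, labels = k_means_1d([1.0, 1.5, 2.0, 10.0, 11.0, 12.0], k=2)
print(centroids, labels)  # [1.5, 11.0] [0, 0, 0, 1, 1, 1]
```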

7
Q

What is the difference between Hierarchical Clustering and K-Means Clustering?

A
  • Hierarchical: Better for small datasets (≤500 obs.), forms a dendrogram
  • K-Means: Better for large datasets (>500 obs.), predefined number of clusters

K-Means creates distinct clusters while Hierarchical captures nested clusters.

8
Q

What is Euclidean Distance?

A

Measures straight-line distance between points.

It is affected by scale differences.
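
A short sketch of the calculation and of the scale problem. The two observations (age in years, income in dollars) are made-up values for illustration.

```python
import math


def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean_distance((0, 0), (3, 4)))  # 5.0

# Hypothetical observations: (age in years, income in dollars).
person_a = (25, 50_000)
person_b = (55, 50_500)
d = euclidean_distance(person_a, person_b)
print(d)  # dominated by the $500 income gap, not the 30-year age gap
```

Because income is measured in much larger units than age, it contributes almost all of the distance even though the age difference is arguably the more meaningful one.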

9
Q

What is the solution to the scale differences issue in Euclidean Distance?

A

Use z-scores for standardization.
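
A minimal sketch of z-score standardization, assuming the sample standard deviation is used (some texts use the population version). The income values are made up for illustration.

```python
import statistics


def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    return [(v - mean) / sd for v in values]


incomes = [40_000, 50_000, 60_000]
z = z_scores(incomes)
print(z)
```

After standardizing every variable this way, each one contributes to the distance on a comparable scale.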

10
Q

What is Manhattan Distance?

A

Measures grid-based distance, like navigating city blocks.

More robust to outliers compared to Euclidean Distance.
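
For comparison with the straight-line measure, a minimal sketch (the function name is illustrative):

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


print(manhattan_distance((0, 0), (3, 4)))  # 7
```

The same pair of points is 5 apart in Euclidean distance; because Manhattan distance does not square the differences, one very large difference inflates it less.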

11
Q

What are Categorical Data Similarity Measures?

A
  • Matching Coefficient: Proportion of binary variables on which two observations match (both 1 or both 0)
  • Jaccard’s Coefficient: Ignores matching zero entries

Jaccard’s is generally preferred when a shared 0 (joint absence) carries little information.
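
A small sketch of both measures for two observations coded as 0/1 vectors. The function names and sample vectors are hypothetical.

```python
def matching_coefficient(u, v):
    """Share of variables on which both observations agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)


def jaccard_coefficient(u, v):
    """Share of 1-1 matches among variables where at least one observation is 1."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either_one = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    return both_one / either_one if either_one else 0.0


u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]
print(matching_coefficient(u, v))  # 0.8
print(jaccard_coefficient(u, v))   # 0.5
```

The three shared zeros raise the matching coefficient but are ignored by Jaccard's coefficient.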

12
Q

What factors influence the choice between Hierarchical and K-Means Clustering?

A

Dataset size, cluster relationships, and computational resources.

13
Q

What is Probability?

A

A numerical measure of the likelihood of an event occurring.

14
Q

What is a Random Experiment?

A

A process that generates uncertain outcomes.

15
Q

Define Sample Space.

A

The set of all possible outcomes.

16
Q

What is a Random Variable?

A

A variable whose value is unknown, or a function that assigns a value to each of an experiment’s outcomes.

A numerical representation of an experiment’s outcome.

17
Q

What are the types of Random Variables?

A
  • Discrete Random Variable
  • Continuous Random Variable

Discrete takes specific values; continuous can take any value in an interval.

18
Q

What does a Discrete Probability Distribution describe?

A

Range and likelihood of values for a discrete random variable.

19
Q

What is the formula for Expected Value?

A

E(X) = Σ x · f(x), the sum of each value times its probability.

Measures the central tendency of a probability distribution.
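
A minimal sketch of the calculation for a discrete distribution, here stored as a dictionary mapping each value to its probability (a representation chosen for this example):

```python
def expected_value(distribution):
    """Probability-weighted average: sum of value * probability."""
    return sum(x * p for x, p in distribution.items())


fair_die = {face: 1 / 6 for face in range(1, 7)}
ev = expected_value(fair_die)
print(ev)  # approximately 3.5
```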

20
Q

What is Variance?

A

Measures how spread out a random variable’s values are around its expected value.
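
For a discrete distribution this is the probability-weighted average of squared deviations from the mean. A minimal sketch, again using a value-to-probability dictionary as an assumed representation:

```python
def variance(distribution):
    """Sum of probability * (value - mean) squared."""
    mu = sum(x * p for x, p in distribution.items())
    return sum(p * (x - mu) ** 2 for x, p in distribution.items())


coin = {0: 0.5, 1: 0.5}  # hypothetical: 1 for heads, 0 for tails
var = variance(coin)
print(var)  # 0.25
```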

21
Q

What characterizes a Discrete Uniform Distribution?

A

All values in the sample space are equally likely.

22
Q

What is the Binomial Distribution used for?

A

Models repeated independent trials with two outcomes.

23
Q

What does the Poisson Distribution model?

A

The number of occurrences in a fixed interval.

24
Q

What distinguishes Continuous Probability Distributions from Discrete?

A

Continuous distributions use Probability Density Functions (PDFs).

25
Q

What is the Normal Distribution?

A

A bell-shaped curve defined by mean (μ) and standard deviation (σ).

26
Q

What is the empirical rule for Normal Distribution?

A
  • About 68% of data falls within 1 standard deviation of the mean
  • About 95% within 2 standard deviations
  • About 99.7% within 3 standard deviations
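
These percentages can be checked against the standard normal cumulative distribution. A quick sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)
}
for k, prob in coverage.items():
    print(k, round(prob, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```
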
27
Q

What are some applications of Probability Distributions?

A
  • Market Analysis
  • Finance
  • Operations Management
  • Medical Research

Used for predicting demand fluctuations, modeling stock price movements, etc.

28
Q

What is the Excel function for Binomial Distribution?

A

BINOM.DIST(x, n, p, cumulative).
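
For intuition, a rough Python counterpart of the calculation behind this function. It is a sketch that mirrors the argument order above, not Excel's actual implementation; `binom_dist` is a name invented here.

```python
import math


def binom_dist(x, n, p, cumulative):
    """P(X = x) if cumulative is False, otherwise P(X <= x), for X ~ Binomial(n, p)."""
    def pmf(k):
        return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)


print(binom_dist(2, 4, 0.5, False))  # 0.375
print(binom_dist(2, 4, 0.5, True))   # 0.6875
```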

29
Q

What is the Excel function for Poisson Distribution?

A

POISSON.DIST(x, lambda, cumulative).
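
A comparable Python sketch of the underlying calculation. `poisson_dist` is a hypothetical name that mirrors the argument order above; it is not Excel's implementation.

```python
import math


def poisson_dist(x, lam, cumulative):
    """P(X = x) if cumulative is False, otherwise P(X <= x), for X ~ Poisson(lam)."""
    def pmf(k):
        return math.exp(-lam) * lam ** k / math.factorial(k)

    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)


# Hypothetical example: calls arrive at an average of 2 per hour.
print(poisson_dist(0, 2.0, False))  # probability of no calls in an hour
print(poisson_dist(1, 2.0, True))   # probability of at most one call
```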

30
Q

What is the Excel function for Normal Distribution?

A

NORM.DIST(x, mean, std_dev, cumulative).
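
A rough Python counterpart built on `statistics.NormalDist`. The wrapper name `norm_dist` is invented here to mirror the argument order above; with `cumulative` false it returns the density (height of the curve), not a probability.

```python
from statistics import NormalDist


def norm_dist(x, mean, std_dev, cumulative):
    """CDF value P(X <= x) if cumulative is True, otherwise the density at x."""
    dist = NormalDist(mu=mean, sigma=std_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)


print(norm_dist(0, 0, 1, True))             # 0.5
print(round(norm_dist(0, 0, 1, False), 4))  # 0.3989
```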

31
Q

What is an example use of a Discrete Uniform Distribution?

A

Rolling a fair die.

32
Q

What is an example use of a Binomial Distribution?

A

Success/failure in repeated trials (e.g., pass/fail test results).

33
Q

What is an example use of a Poisson Distribution?

A

Number of calls per hour.

34
Q

What is an example use of a Uniform Distribution?

A

Randomly generated wait times.

35
Q

What is an example use of a Normal Distribution?

A

Heights, test scores, stock returns.