Quiz 2 Flashcards
What is Cluster Analysis?
Segments observations into similar groups based on observed variables.
Used in market segmentation and identifying outliers.
What type of machine learning does Cluster Analysis fall under?
Unsupervised Machine Learning
There is no dependent variable to predict.
What are some uses of Cluster Analysis?
- Market segmentation
- Identifying outliers
Applications include fraud detection and finding anomalies in data sets.
What is Hierarchical Clustering?
A clustering method that starts with each observation in its own cluster and merges similar clusters iteratively.
Forms a dendrogram.
What are the methods for measuring similarity in Hierarchical Clustering?
- Single Linkage
- Complete Linkage
- Group Average Linkage
- Median Linkage
- Centroid Linkage
Each method uses different approaches to measure cluster similarity.
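A minimal sketch of four of the linkage measures above, assuming Euclidean distance between points. The function names and the sample clusters are illustrative, not from the course material; median linkage is omitted.

```python
from itertools import product
from math import dist


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance between every cross-cluster pair of points."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    # Distance between the two closest members of the clusters.
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    # Distance between the two farthest members of the clusters.
    return max(pairwise_distances(cluster_a, cluster_b))


def group_average_linkage(cluster_a, cluster_b):
    # Mean of all cross-cluster pairwise distances.
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


def centroid(cluster):
    # Coordinate-wise mean of the points in the cluster.
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def centroid_linkage(cluster_a, cluster_b):
    # Distance between the two cluster centroids.
    return dist(centroid(cluster_a), centroid(cluster_b))


# Made-up clusters for illustration.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]
print(single_linkage(cluster_a, cluster_b))
print(complete_linkage(cluster_a, cluster_b))
print(group_average_linkage(cluster_a, cluster_b))
print(centroid_linkage(cluster_a, cluster_b))
```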
What is K-Means Clustering?
A clustering method that requires predefining k clusters and iteratively assigns observations to these clusters.
It consists of an initialization step followed by repeated assignment and update steps.
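A rough sketch of the K-Means loop, assuming Euclidean distance and caller-supplied starting centroids (real tools usually pick them at random). The function name k_means and the sample points are hypothetical.

```python
from math import dist


def k_means(points, initial_centroids, max_iter=100):
    """Return (centroids, clusters) after iterating until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]  # initialization step
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


# Made-up data with k = 2.
points = [(1, 1), (1, 2), (8, 8), (9, 8)]
centroids, clusters = k_means(points, initial_centroids=[(1, 1), (9, 8)])
print(centroids)
print(clusters)
```

The result depends on the starting centroids, which is why K-Means is often run several times with different initializations.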
What is the difference between Hierarchical Clustering and K-Means Clustering?
- Hierarchical: Better for small datasets (≤500 obs.), forms a dendrogram
- K-Means: Better for large datasets (>500 obs.), predefined number of clusters
K-Means creates distinct clusters while Hierarchical captures nested clusters.
What is Euclidean Distance?
Measures straight-line distance between points.
It is affected by scale differences.
What is the solution to the scale differences issue in Euclidean Distance?
Use z-scores for standardization.
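A small illustration of the scale problem and the z-score fix. The income and age figures are made up, and the population standard deviation is an assumption (a course may use the sample version instead).

```python
from math import dist
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical observations: income in dollars, age in years.
incomes = [30000, 50000, 70000]
ages = [25, 35, 45]

raw = list(zip(incomes, ages))
scaled = list(zip(z_scores(incomes), z_scores(ages)))

# On raw data the distance is dominated by income, the larger-scaled variable.
raw_distance = dist(raw[0], raw[1])
# After standardization both variables contribute equally.
scaled_distance = dist(scaled[0], scaled[1])
print(raw_distance, scaled_distance)
```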
What is Manhattan Distance?
Measures grid-based distance, like navigating city blocks.
More robust to outliers than Euclidean Distance because differences are not squared.
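A short sketch comparing the two measures on one made-up pair of points. The helper name manhattan_distance is illustrative.

```python
from math import dist


def manhattan_distance(p, q):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


p, q = (1, 2), (4, 6)
manhattan = manhattan_distance(p, q)  # |1 - 4| + |2 - 6|
euclidean = dist(p, q)                # sqrt(3**2 + 4**2)
print(manhattan, euclidean)
```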
What are Categorical Data Similarity Measures?
- Matching Coefficient: Proportion of binary variables on which two observations match (both 1 or both 0)
- Jaccard’s Coefficient: Same proportion, but ignores matching zero entries
Jaccard’s is more appropriate when a shared 0 does not indicate real similarity.
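A minimal sketch of both coefficients for two observations coded as 0/1 lists. The function names and the sample vectors are hypothetical.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    matches = sum(a == b for a, b in zip(u, v))
    return matches / len(u)


def jaccard_coefficient(u, v):
    """Share of 1-1 matches among variables that are not 0-0 for both."""
    both_one = sum(a == 1 and b == 1 for a, b in zip(u, v))
    both_zero = sum(a == 0 and b == 0 for a, b in zip(u, v))
    considered = len(u) - both_zero
    # If every variable is 0-0, nothing is left to compare; return 0.0 by convention here.
    return both_one / considered if considered else 0.0


# Two made-up observations over five binary variables.
u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]
print(matching_coefficient(u, v))
print(jaccard_coefficient(u, v))
```

The three shared zeros raise the matching coefficient but are left out of Jaccard's.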
What factors influence the choice between Hierarchical and K-Means Clustering?
Dataset size, cluster relationships, and computational resources.
What is Probability?
A numerical measure of the likelihood of an event occurring.
What is a Random Experiment?
A process that generates uncertain outcomes.
Define Sample Space.
The set of all possible outcomes.
What is a Random Variable?
A numerical representation of an experiment’s outcome: a function that assigns a value to each outcome, so its value is unknown until the experiment is run.
What are the types of Random Variables?
- Discrete Random Variable
- Continuous Random Variable
A discrete random variable takes specific, countable values; a continuous one can take any value in an interval.
What does a Discrete Probability Distribution describe?
Range and likelihood of values for a discrete random variable.
What is the formula for Expected Value?
E(X) = Σ x · P(x), the probability-weighted average of all possible values.
It measures the central tendency of a probability distribution.
What is Variance?
Measures how spread out values are around the expected value: Var(X) = Σ (x − E(X))² · P(x).
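A worked sketch of expected value and variance for a discrete distribution, using a fair six-sided die as the example. The function names are illustrative; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def expected_value(distribution):
    """Probability-weighted average of the values: sum of x * P(x)."""
    return sum(x * p for x, p in distribution.items())


def variance(distribution):
    """Probability-weighted average squared deviation from the expected value."""
    mu = expected_value(distribution)
    return sum((x - mu) ** 2 * p for x, p in distribution.items())


# Fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
die_mean = expected_value(die)
die_variance = variance(die)
print(die_mean, die_variance)
```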
What characterizes a Discrete Uniform Distribution?
All values in the sample space are equally likely.
What is the Binomial Distribution used for?
Models the number of successes in a fixed number of independent trials, each with two outcomes and the same probability of success.
What does the Poisson Distribution model?
The number of occurrences in a fixed interval of time or space.
What distinguishes Continuous Probability Distributions from Discrete?
Continuous distributions use Probability Density Functions (PDFs); probabilities are areas under the curve, so any single exact value has probability zero.
What is the Normal Distribution?
A bell-shaped curve defined by mean (μ) and standard deviation (σ).
What is the empirical rule for Normal Distribution?
- About 68% of data falls within 1 standard deviation of the mean
- About 95% within 2 standard deviations
- About 99.7% within 3 standard deviations
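The three percentages can be checked with Python's standard library, assuming a standard normal distribution (mean 0, standard deviation 1). The helper name share_within is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Probability of falling within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```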
What are some applications of Probability Distributions?
- Market Analysis
- Finance
- Operations Management
- Medical Research
Used for predicting demand fluctuations, modeling stock price movements, etc.
What is the Excel function for Binomial Distribution?
BINOM.DIST(x, n, p, cumulative).
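A hypothetical Python analogue of the Excel call, to show what the cumulative argument changes. binom_dist is an illustrative name, not a library function.

```python
from math import comb


def binom_dist(x, n, p, cumulative):
    """P(X = x) when cumulative is False, P(X <= x) when it is True."""
    def pmf(k):
        # Probability of exactly k successes in n trials.
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)


# Made-up example: 4 trials with success probability 0.5.
exactly_two = binom_dist(2, 4, 0.5, False)
at_most_two = binom_dist(2, 4, 0.5, True)
print(exactly_two, at_most_two)
```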
What is the Excel function for Poisson Distribution?
POISSON.DIST(x, lambda, cumulative).
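A hypothetical Python analogue of the Excel call, where lam is the average number of occurrences per interval. poisson_dist is an illustrative name, not a library function.

```python
from math import exp, factorial


def poisson_dist(x, lam, cumulative):
    """P(X = x) when cumulative is False, P(X <= x) when it is True."""
    def pmf(k):
        # Probability of exactly k occurrences when the mean is lam.
        return exp(-lam) * lam ** k / factorial(k)

    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)


# Made-up example: an average of 2 calls per hour.
no_calls = poisson_dist(0, 2, False)
at_most_one_call = poisson_dist(1, 2, True)
print(no_calls, at_most_one_call)
```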
What is the Excel function for Normal Distribution?
NORM.DIST(x, mean, std_dev, cumulative).
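A hypothetical Python analogue built on the standard library's NormalDist. With cumulative set to True it returns the area to the left of x; with False it returns the height of the density curve, which is not itself a probability. norm_dist is an illustrative name.

```python
from statistics import NormalDist


def norm_dist(x, mean, std_dev, cumulative):
    """Normal CDF at x when cumulative is True, otherwise the PDF at x."""
    distribution = NormalDist(mu=mean, sigma=std_dev)
    return distribution.cdf(x) if cumulative else distribution.pdf(x)


# Made-up example: test scores with mean 100 and standard deviation 15.
below_mean = norm_dist(100, 100, 15, True)
below_115 = norm_dist(115, 100, 15, True)
density_at_mean = norm_dist(100, 100, 15, False)
print(below_mean, below_115, density_at_mean)
```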
What is an example use of a Discrete Uniform Distribution?
Rolling a fair die.
What is an example use of a Binomial Distribution?
Success/failure in repeated trials (e.g., pass/fail test results).
What is an example use of a Poisson Distribution?
Number of calls per hour.
What is an example use of a Continuous Uniform Distribution?
Randomly generated wait times.
What is an example use of a Normal Distribution?
Heights, test scores, stock returns.