Midterm 2 Flashcards
Know how sample distribution is related to the population distribution
Population Distribution: Frequency distribution of all elements in a population, described by parameters such as population mean (μ) and standard deviation (σ).
Sample Distribution: Distribution derived from a sample, described by sample statistics such as sample mean (x̄) and standard deviation (s).
Parameters (μ, σ) describe the population, while statistics (x̄, s) describe the sample.
Know the difference between statistics (e.g., sample mean x̄, sample variance s^2, sample correlation r) and parameters (e.g., population mean μ, population variance σ^2, population correlation ρ). (Lesson 6-1 slides, pp.14-19)
Population distribution - frequency distribution of all elements (people) of the population
- A smooth line
- Population Mean denoted by the Greek letter μ
- Population Standard Deviation denoted by the Greek letter σ
- μ and σ are called parameters; they are unknown, so we can only estimate them
Sample distribution - frequency distribution (histogram) of all elements (people) in your sample
It is known exactly once you have your sample
- Not a smooth line
- Sample Mean denoted by 𝑋̅ (XBAR)
- Sample Standard Deviation denoted by s
- 𝑋̅ (XBAR) and s are called statistics; they are known – we use them to guess about the population distribution
Know why we use the sample distribution to study the population distribution in statistics. (Lesson 6-1 slides, pp.11-13)
- Descriptive statistics describes data in a sample
- Inferential statistics uses data from samples to make inferences/generalizations about a population
Know the concept of outliers and how to identify outliers. (Lesson 6-1 slides, pp.34, 39)
- Something unusual or rare
How to identify outliers?
Sorting the data to find extreme values (when the data set is small)
Using z-scores (only if the population distribution is normal): flag observations with z > +3 or z < -3
Graphing the data
Histogram
Scatter plots
Boxplots
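The z-score rule above can be sketched in Python. This is only an illustration: the function name `z_score_outliers`, the data, and the choice to treat the data as the whole population (so the population standard deviation is used) are assumptions, not something from the slides.

```python
from statistics import mean, pstdev

def z_score_outliers(data, threshold=3.0):
    """Return the values whose z-score is above +threshold or below -threshold."""
    mu = mean(data)
    sigma = pstdev(data)  # assumption: data is treated as the full population
    return [x for x in data if abs((x - mu) / sigma) > threshold]

# Hypothetical data: 20 typical values and one extreme value.
values = [50, 52, 49, 51, 48, 50, 53, 47, 50, 51,
          49, 52, 50, 48, 51, 50, 49, 52, 50, 51, 120]
print(z_score_outliers(values))  # prints [120]
```

Note that in very small data sets a single extreme value inflates the standard deviation so much that no z-score can reach 3, which is one reason sorting or graphing is also listed.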
What is normal distribution?
The Normal Distribution is a Probability Distribution
An example of population distribution
Which parameters are used to define a normal distribution?
The mean and standard deviation determine the curve: the mean sets the center and the standard deviation sets the spread
Mean = μ, Standard Deviation = σ
Knowing μ and σ, you can compute probability values
Normal Distribution is denoted as N(μ, σ^2)
What is the shape of normal distribution? (Lesson 6-1 slides, pp.22)
“Bell shaped” and Symmetric
Know how to compute Z score.
What is the distribution of Z score? (Lesson 6-1 slides, pp.30-32)
Intuition: measures how far an observation is from the mean
How many standard deviations an observation is away from the mean
A way of measuring how unusual an observation is
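The notes give the intuition but not the formula. The standard definition is z = (x - μ) / σ, and when the observations are normally distributed, Z follows the standard normal distribution N(0, 1). A minimal sketch, where the exam-score numbers are made up for illustration:

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical example: exam scores assumed to follow N(70, 10^2).
z = z_score(85, mu=70, sigma=10)

# If the scores are normal, Z follows the standard normal N(0, 1),
# so its CDF gives the share of observations below this score.
share_below = NormalDist(0, 1).cdf(z)
print(z, round(share_below, 4))  # prints 1.5 0.9332
```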
What is sampling frame? (Lesson 6-1 slides, pp.44)
Sampling Frame: the source material or device from which the sample may be drawn
o Working population
o Mailing lists – Database Marketers
o Phone book
What are the three types of sampling errors? (Lesson 6-1 slides, pp.43-45)
- Sampling Frame Error
o A frame error occurs when the wrong sub-population is used to select a sample
o Example: in the 1936 presidential election between Franklin D. Roosevelt and Alf Landon, the Literary Digest poll favored Landon, while Gallup predicted Roosevelt's win with a small sample
o How to reduce: understand the research question before selecting the sample
- Random Sampling Error
o The error caused by a particular sample not being representative of the population of interest due to random variation
o Even randomized samples will have some sampling error, since a sample is only an approximation of the whole population
o How to reduce: random selection, increase the sample size
- Nonresponse Error
o Happens when there is a significant difference between those who responded to the survey and those who did not
o How to reduce: design a better survey (e.g., funnel approach, projective techniques, counter-biasing statement, etc.)
What are the commonly used sampling techniques (e.g., 4 probability sampling techniques, 4 nonprobability sampling techniques)?
- Simple Random Sampling
o Everyone gets equal likelihood of being selected
- Systematic Sampling
o A probability sampling method in which sample members from a larger population are selected according to a random starting point but with a fixed, periodic interval.
- Stratified Sampling
o A method of sampling from a population which can be partitioned into subpopulations (strata). The strata should be collectively exhaustive and mutually exclusive: every element in the population must be assigned to one and only one stratum. A random sample is drawn from every stratum, which ensures each subgroup within the population receives proper representation within the sample.
- Cluster Sampling
o The total population is divided into clusters, a random sample of clusters is selected, and data are then collected from the members of the selected clusters.
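The four probability techniques can be sketched with Python's `random` module. All function names and the toy population below are hypothetical. Two simplifying assumptions: the stratified version draws the same number from every stratum, and the cluster version is one-stage (whole clusters are picked at random and every member of a chosen cluster is kept).

```python
import random

def simple_random_sample(population, n, rng):
    # Every element has the same chance of being selected.
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    k = len(population) // n      # fixed, periodic interval
    start = rng.randrange(k)      # random starting point within the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping stratum name -> list of members (a partition of the population)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then keep every member of the chosen ones.
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)  # fixed seed so the run is repeatable
population = list(range(100))
strata = {"A": list(range(0, 60)), "B": list(range(60, 100))}
clusters = {"c1": [1, 2, 3], "c2": [4, 5], "c3": [6, 7, 8, 9]}

print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))
print(stratified_sample(strata, 5, rng))
print(cluster_sample(clusters, 2, rng))
```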
_________________________________________________________
* Convenience Sampling
o Obtaining the people that are most conveniently available
o E.g., mall intercept interviews
* Judgement/Expert Sampling
o Experienced individual selects the sample based on judgment about appropriate characteristics
o Used when the population of interest is very small or specific
* Quota Sampling
o Selects sample such that various subgroups are represented. Similar to Stratified sampling, but with non-probability sampling within each quota
* Snowball
o An initial group of respondents is first selected.
o After being interviewed, these respondents are asked to identify others who belong to the target population of interest.
o Subsequent respondents are selected based on the referrals.
o Commonly used in network marketing, e.g., who are members of your network? Who among your friends influences your decision the most?
Given the acceptable margin of error and acceptable confidence level, how to determine the sample size required for the study? (Lesson 6-1 slides, pp.51-58)
Factors: Acceptable margin of error, confidence level, and population variability.
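Example formula: for a 90% confidence level with known standard deviation σ, n = (z × σ / E)^2 with z = 1.645, where E is the acceptable margin of error; round n up to the next whole number.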
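A small sketch of the sample-size calculation for estimating a mean, using the standard formula n = (z × σ / E)^2, where E is the margin of error and z is the critical value for the chosen confidence level. The function name and the numbers (σ = 15, E = 2) are made up for illustration; the slides may present the formula differently.

```python
import math
from statistics import NormalDist

def required_sample_size(margin_of_error, sigma, confidence=0.90):
    """Smallest n such that z * sigma / sqrt(n) <= margin_of_error."""
    # Two-sided critical value, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Hypothetical inputs: sigma = 15, acceptable margin of error = 2.
print(required_sample_size(margin_of_error=2, sigma=15, confidence=0.90))  # prints 153
print(required_sample_size(margin_of_error=2, sigma=15, confidence=0.95))  # prints 217
```

A higher confidence level, a smaller margin of error, or more population variability each push the required sample size up.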
Know the 4 steps to transform responses to numeric data, especially how to code responses in open-ended and closed-ended questions. (Lesson 7-1 slides, pp.6-11)
Validation & Editing:
Check for fraud, screening, procedural adherence, and completeness.
Coding:
Closed-ended: Assign numerical values to responses.
Open-ended: Group responses, assign codes, and tag.
Data Capture:
Convert responses into machine-readable format (e.g., SPSS).
Logical Cleaning:
Use software to detect errors and inconsistencies.
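The coding step can be illustrated with a short sketch. The codebooks, keywords, and the "other" code 99 below are hypothetical; in practice open-ended answers are usually grouped by a human reader rather than by keyword matching.

```python
# Hypothetical codebook for a closed-ended 5-point agreement question.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

# Hypothetical categories found after reading the open-ended answers.
OPEN_ENDED_CODES = {"price": 1, "quality": 2, "service": 3}
OTHER = 99  # catch-all code for answers that fit no category

def code_closed(response):
    """Assign the numeric value for a closed-ended answer (None flags an invalid answer)."""
    return LIKERT_CODES.get(response.strip().lower())

def code_open(response):
    """Tag an open-ended answer with every category whose keyword it mentions."""
    text = response.lower()
    codes = [code for keyword, code in OPEN_ENDED_CODES.items() if keyword in text]
    return codes or [OTHER]

print(code_closed(" Agree "))                    # prints 4
print(code_open("Great quality and low price"))  # prints [1, 2]
print(code_open("I like the colour"))            # prints [99]
```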
Know which type of chart needs to be used under different circumstances. (Lesson 7-1 slides, pp.15-17)
Line Chart: Time trends.
Pie Chart: Proportions.
Bar Chart: Comparisons between groups.
Stacked/Clustered Bar Chart: Multiple group comparisons.
What is sampling distribution?
In statistics, a sampling distribution or finite-sample distribution is the probability distribution of a given random-sample-based statistic.
Know central limit theorem (i.e., when sample size n is large enough, according to CLT, the sampling distribution of the sample mean approximately follows a normal distribution N(μ, σ^2/n), where μ is the population mean and σ^2 is the population variance, no matter what shape the population distribution is.) (Lesson 8-1 slides, pp.10-13, 15-17)
For large sample sizes, the sampling distribution of the sample mean approaches normality regardless of population shape.
Mean = μ, variance = σ^2/n.
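A quick simulation can make this concrete. The sketch below draws many samples from a strongly skewed population, the exponential distribution with rate 1 (which has μ = 1 and σ^2 = 1), and checks that the sample means center on μ with variance close to σ^2/n. The choice of population, n = 50, and 5000 replicates are assumptions made for illustration.

```python
import random

def simulate_sample_means(n, replicates, rng):
    """Means of `replicates` samples, each of size n, drawn from Exponential(1)."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(replicates)]

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 50
means = simulate_sample_means(n, replicates=5000, rng=rng)

center = sum(means) / len(means)
spread = sum((m - center) ** 2 for m in means) / len(means)

# Compare with the CLT prediction: mean = mu = 1, variance = sigma^2 / n = 0.02.
print(round(center, 3), round(spread, 4))
```

A histogram of `means` would look roughly bell shaped even though the population itself is heavily right-skewed.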
Know the difference between point estimate and interval estimate (i.e., confidence interval).
Point Estimate: Single value estimate (e.g., x̄).
Interval Estimate: Range of values with a confidence level.
What is the advantage of interval estimate compared to point estimate? (Lesson 8-1 slides, pp.19-22)
Advantage: Accounts for sampling variability.
Know how to construct a confidence interval for population mean at 99%, 95% and 90% confidence level.
Interpretation: “We are X% confident that the interval contains the true mean.”
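A minimal sketch of the construction, assuming the z-based interval x̄ ± z × σ / √n (population standard deviation known, or n large enough to rely on the CLT). The critical values are about 1.645 (90%), 1.96 (95%), and 2.576 (99%). The function name and the numbers are made up for illustration.

```python
import math
from statistics import NormalDist

def confidence_interval(x_bar, sigma, n, confidence):
    """z-based confidence interval for the population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z * sigma / math.sqrt(n)                   # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical sample: x_bar = 100, sigma = 15, n = 36.
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(x_bar=100, sigma=15, n=36, confidence=level)
    print(f"{level:.0%}: ({low:.2f}, {high:.2f})")
```

The interval widens as the confidence level rises: more confidence costs precision unless the sample size is increased.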