Exam 1 Flashcards
Population
All subjects being studied in a given area (e.g., all people with schizophrenia in the United States)
Sample
Data from a subset of the population (sample size = n)
Biased Sample
A sample that systematically overestimates or systematically underestimates a characteristic of the population
Random Sample
Every member of the population has the same chance of being included in the sample, and the members of the sample are chosen independently of one another (meaning that the chance of a given member of the population being chosen does not depend on which other members are chosen)
Sampling Error
Chance effects causing discrepancy between sample and population.
Sampling Bias
Systematic tendency for some individuals of the population to be selected more readily than others
Nonresponse bias
Bias caused by persons not responding to some of the questions in a survey or not returning a written survey
Systematic Sampling
Every nth member of the population is included in the sample, so members next to one another will never both be picked
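A minimal sketch of the "every nth member" idea, next to a simple random sample for contrast. The population of 100 numbered subjects, the step of 10, and the helper name `systematic_sample` are assumptions made up for illustration.

```python
import random

def systematic_sample(population, step, start=None):
    """Return every `step`-th member, starting at offset `start` (random if omitted)."""
    if start is None:
        start = random.randrange(step)  # random starting point in [0, step)
    return population[start::step]

# Hypothetical population: subjects labeled 1..100
population = list(range(1, 101))

# Systematic sample: start at the 4th subject, then take every 10th one
sys_sample = systematic_sample(population, step=10, start=3)
print(sys_sample)  # [4, 14, 24, 34, 44, 54, 64, 74, 84, 94]

# Simple random sample of the same size: neighbors CAN both be chosen here
simple_sample = random.Random(0).sample(population, 10)
print(sorted(simple_sample))
```

In the systematic sample, any two chosen subjects are exactly 10 positions apart, which is why neighbors never appear together.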
Stratified Random
The population is divided into subpopulations (strata), each made up of a more homogeneous collection of subjects, and a random sample is drawn from each stratum (e.g., a sample from the state of New York taken out of a larger whole)
Subsample size
The number of subjects sampled from each stratum
Statistical Inference
Making conclusions about a population based on a sample
Dotplot
A simple graph that can be used to show the distribution of a numeric variable when the sample size is small
Histogram
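A bar graph version of a dotplot: values are grouped into bins, and the height of each bar shows how many observations fall in that bin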
Skewed to the right
Values are more concentrated to the left side in the distribution, so the right side has a longer tail
Population Mean
Population Standard Deviation
μ = population mean; σ = population standard deviation
Response vs. Explanatory Variable
Response Variable: focus of a question in a study or experiment
Explanatory Variable: explains changes in the response variable
Qualitative vs. Quantitative Variables
Quantitative: measurements that can be recorded as numbers; discrete or continuous
Qualitative: values that place subjects into categories rather than measuring them numerically; nominal or ordinal
What is the difference between a statistic and a parameter?
Statistic: a numerical measure that describes a sample
Parameter: a numerical measure that describes a population
What is noise in statistical terms?
A term for recognized amounts of unexplained variation in a sample.
Efficiency
Takes full advantage of all the information in the data, so fewer observations or data points are needed to get the same performance
What are 4 ways to measure the spread of your data?
Range, Average Deviation, Variance, and Standard Deviation
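A small sketch computing the four measures on a made-up sample. The data values are hypothetical, and "average deviation" is taken here to mean the mean absolute deviation from the sample mean, which is an assumption about the intended definition.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, n = 8

n = len(data)
mean = statistics.mean(data)  # 5

# Range: largest value minus smallest value
data_range = max(data) - min(data)  # 7

# Average deviation: mean absolute distance from the sample mean (assumed definition)
avg_deviation = sum(abs(y - mean) for y in data) / n  # 1.5

# Sample variance and standard deviation (both use the n - 1 denominator)
variance = statistics.variance(data)  # 32/7, about 4.57
std_dev = statistics.stdev(data)      # square root of the variance, about 2.14

print(data_range, avg_deviation, variance, std_dev)
```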
What is a significant result?
A result that would be unlikely to occur by random chance alone if the null hypothesis were true, suggesting that a relationship between two or more variables is caused by something other than chance. Statistical hypothesis testing is used to determine whether a result is statistically significant.
(A) U (B) - Union or intersection?
Union. Union = one or the other or both events occur; intersection = both events occur
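The two operations can be sketched with Python sets. The events A and B (outcomes of one roll of a six-sided die) are hypothetical examples.

```python
# Hypothetical events for one roll of a six-sided die
A = {2, 4, 6}  # event A: the roll is even
B = {4, 5, 6}  # event B: the roll is greater than 3

union = A | B         # A or B or both occur
intersection = A & B  # both A and B occur

print(sorted(union))         # [2, 4, 5, 6]
print(sorted(intersection))  # [4, 6]
```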
Type I Error
We reject the null hypothesis when H0 is true. We claim that the data provide evidence that significantly supports HA when H0 is actually true.
Type I Error
We reject the null hypothesis when H0 is true (finding significant evidence for HA when H0 is true). The opposite mistake, where HA is true but we do not observe sufficient evidence to support HA, is a Type II error.
Type II Error
We fail to reject the null hypothesis when it is actually false (not finding significant evidence for HA when HA is true)
P-Value of a hypothesis test
Probability, under the condition that H0 is true, of the test statistic being at least as extreme as the observed value (a measure of compatibility between the data and H0)
Random variable
Takes on numerical values that depend on the outcome of a chance operation
QQ Plot
Compares the sample values to theoretical values (i.e., compares our distribution to the normal distribution) to see how well they fit, and therefore how close to normal our data are
Sampling Variability
Variability among random samples from the same population
Sampling Distribution
A probability distribution that characterizes some aspect of sampling variability
Central Limit Theorem
No matter what distribution Y may have in the population, if the sample size is large enough, then the sampling distribution of the sample mean (y bar) will be approximately a normal distribution
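A rough simulation of the theorem, under stated assumptions: the population is taken to be exponential with mean 1 (strongly right-skewed), the sample size is 50, and 5000 samples are drawn. All of these choices, and the fixed seed, are illustrative only.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed exponential population (mean 1, SD 1)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    return sum(draws) / n

n = 50
means = [sample_mean(n) for _ in range(5000)]

center = statistics.mean(means)   # should be near the population mean, 1
spread = statistics.stdev(means)  # should be near 1 / sqrt(50), about 0.14

# For a normal distribution, about 68% of values lie within one SD of the center
within_1sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(center, spread, within_1sd)
```

Even though individual observations are far from normal, the sample means cluster around the population mean in a roughly bell-shaped pattern.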
Beta
The probability of committing a Type II Error
Power
1 - Beta
Power = 1 - Pr(failing to reject H0 when it is false)
A test is more powerful when it is more likely to correctly reject the null hypothesis when the alternative hypothesis is true
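Power can be estimated by simulation: generate many samples for which H0 is false and count how often the test rejects it. The sketch below is illustrative only. It assumes a two-sided z test of H0: mu = 0 with known sigma = 1 (used instead of a t test so that only the standard library is needed), a significance level alpha = 0.05, and a true mean of 0.5. The function name `estimated_power` is hypothetical.

```python
import random
from statistics import NormalDist

def estimated_power(true_mean, n, alpha=0.05, sigma=1.0, trials=4000, seed=1):
    """Share of simulated samples in which a two-sided z test rejects H0: mu = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    se = sigma / n ** 0.5                         # standard error of the sample mean
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        ybar = sum(sample) / n
        if abs(ybar / se) > z_crit:  # test statistic falls in the rejection region
            rejections += 1
    return rejections / trials

# H0 is false in both cases (true mean is 0.5, not 0); only n differs
power_small = estimated_power(true_mean=0.5, n=10)
power_large = estimated_power(true_mean=0.5, n=40)
print(power_small, power_large)
```

Under these assumptions, the larger sample rejects the false H0 far more often, so beta is smaller and power is higher.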
ts (test statistic)
A measure of how far the observed difference between the sample means (y bar) is from the difference we would expect to see if H0 were true, expressed relative to the SE of the difference (the amount of variation we expect to see in differences of means from random samples).
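In symbols, a sketch for the common two-sample case. It assumes H0 says the two population means are equal (so the expected difference is 0) and that the SE is the unpooled standard error; the subscripts 1 and 2 and the symbols s (sample SD) and n (sample size) are notation assumed here.

```latex
t_s = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{SE_{\bar{Y}_1 - \bar{Y}_2}},
\qquad
SE_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
```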
P-value
The area under Student's t curve in the two tails beyond -ts and +ts
Why do we divide the variance by n-1 instead of n?
We want the average of the sample variances over all possible samples to equal the population variance
Using n - 1 as the denominator produces an unbiased estimate of σ²
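A quick simulation can illustrate the claim. The setup is assumed for illustration only: a normal population with variance 4, samples of size 5, and 20000 repetitions with a fixed seed.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable
sigma2 = 4.0            # true population variance (population SD = 2), chosen for illustration
n = 5
trials = 20000

total_div_n = 0.0
total_div_n_minus_1 = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    ybar = sum(sample) / n
    ss = sum((y - ybar) ** 2 for y in sample)  # sum of squared deviations
    total_div_n += ss / n
    total_div_n_minus_1 += ss / (n - 1)

avg_div_n = total_div_n / trials                  # tends toward sigma2 * (n - 1) / n = 3.2
avg_div_n_minus_1 = total_div_n_minus_1 / trials  # tends toward sigma2 = 4.0
print(avg_div_n, avg_div_n_minus_1)
```

Dividing by n underestimates the population variance on average, while dividing by n - 1 averages out close to the true value.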