Test 1 Flashcards
Descriptive Statistics
Graphs and numerical summaries
Inferential Statistics
Draw conclusions about a population from sample data
Process of Statistics:
- Present question
- Gather data
- Summarize data
- Draw conclusions
Categorical/Qualitative Variable
Places an individual into one of several groups or categories based on some quality of the individual
Quantitative Variable
Takes numerical values for which arithmetic operations such as adding and averaging make sense
- Usually recorded with a unit of measurement
Discrete Variable
A quantitative variable that takes on only a finite or countable number of values
- Often when something is counted
- Cannot be meaningfully subdivided
Continuous Variable
A quantitative variable that can take on any real numerical value over an interval
- Often when things can be measured
- Can include decimals
Nominal Variable
A categorical variable in which the categories cannot be ordered
- Independent
- Ex.) Color
Ordinal Variable
A categorical variable in which the categories can be ordered, ranked, or have a relationship to one another
Experiment
A study in which the researcher imposes conditions on the subjects of the study
Observational Study
A study in which the researcher collects data without imposition of specific conditions
Sampling Design
Describes exactly how to choose a sample from a population
Probability Design
A sample chosen by chance
Simple Random Sample (SRS)
A sample of n individuals from the population, chosen in such a way that:
- Every individual in the population has an equal chance of being selected
- Every possible subset of size n has the same chance of being selected
How to take an SRS?
- 1) Assign each member of the population a unique numerical label
- 2) Use a random number generator to select individuals
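The two steps above can be sketched in R with sample(). This is a minimal illustration, not a prescribed method: the population of 20 labels, the sample size of 5, and the seed are all assumptions made for the example.

```r
# Hypothetical population of 20 individuals (the size is an assumption)
set.seed(1)                      # fixed seed only so the example is repeatable
population <- 1:20               # step 1: unique numerical labels
n <- 5                           # desired sample size

# Step 2: draw n labels at random, without replacement
srs <- sample(population, size = n, replace = FALSE)
srs
```

Sampling without replacement ensures no individual is selected twice.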
Stratified Sample
- Divide the population into subgroups (strata) that have some common characteristic
- For each stratum, obtain an SRS whose size is proportional to the size of the stratum
- Use all individuals obtained in step 2 as the sample
- Use when the individuals within each stratum are homogeneous (similar to one another)
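A stratified sample with proportional allocation might look like the following R sketch. The population (60 first-years and 40 seniors), the column names, and the total sample size of 10 are hypothetical values chosen for illustration.

```r
set.seed(2)  # fixed seed only so the example is repeatable

# Hypothetical population: 100 students in two strata
pop <- data.frame(id = 1:100,
                  stratum = rep(c("first_year", "senior"), times = c(60, 40)),
                  stringsAsFactors = FALSE)
total_n <- 10

# Proportional allocation: each stratum's share of the sample
# matches its share of the population (6 and 4 here)
counts <- table(pop$stratum)
alloc <- setNames(as.integer(round(total_n * counts / nrow(pop))), names(counts))

# Take an SRS inside each stratum, then combine the selected ids
picked <- unlist(lapply(names(alloc), function(s) {
  ids <- pop$id[pop$stratum == s]
  sample(ids, size = alloc[[s]])
}))
strat_sample <- pop[pop$id %in% picked, ]
table(strat_sample$stratum)
```

With strata that do not divide evenly, rounding can make the allocated sizes sum to something other than the intended total, so the allocation should be checked.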
Cluster Sample
- Divide the population into subgroups (clusters) that share some common characteristic
- Obtain an SRS of the clusters (the group in entirety)
- Use all members of the clusters selected in step 2 as the sample
- Use when the individuals within each cluster are heterogeneous (each cluster resembles the population)
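For comparison, a cluster sample could be sketched in R as follows. The eight classrooms of five students each are a made-up population, and choosing two clusters is an arbitrary assumption.

```r
set.seed(3)  # fixed seed only so the example is repeatable

# Hypothetical population: 8 classrooms (clusters) of 5 students each
pop <- data.frame(id = 1:40,
                  cluster = rep(paste0("room", 1:8), each = 5),
                  stringsAsFactors = FALSE)

# Take an SRS of whole clusters, not of individuals
chosen <- sample(unique(pop$cluster), size = 2)

# Every member of each chosen cluster enters the sample
cluster_sample <- pop[pop$cluster %in% chosen, ]
cluster_sample
```

Unlike the stratified design, randomness enters only when choosing which clusters to keep.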
Response Variable (Dependent)
Measures the outcome of the study
- What changes in response to the explanatory variable
Explanatory Variable (Independent)
May explain or influence changes in the response variable
- What is being adjusted in order to measure the outcome
Observations can only reveal what?
Associations
Well-designed experiments can reveal what?
Causation
Lurking Variable
A variable that is not an explanatory or response variable but still may impact the relationship between the explanatory and response variables
- Ex.) A scientist is studying the effect of a good diet and exercise on heart rate and blood pressure. Whether the person being studied smokes, and their stress level, could be lurking variables.
Confounding Variables
When the effects of two variables on the response variable cannot be distinguished from each other
Factor(s)
The explanatory variables controlled by the experimenter
Treatment
Any specific experimental condition applied to the subjects
- If an experiment has more than one factor, a treatment is a combination of the specific values of each factor
Designs must do what?
Compare something
- A control and experimental group
Replication
The repetition of an experimental condition (treatment) so that the variability associated with the phenomenon can be estimated
- The # of replicates is the number of experimental units to which the treatment is applied
Randomized Comparative Experiment
An experiment that uses both comparison of two or more treatments and random assignment of subjects to treatments
Statistically Significant
An observed effect so large that it would rarely occur by chance
- Evidence that the result seen in the sample also exists in the population
Block
A group of individuals with some common characteristic thought to have a significant impact on the response
Completely Randomized Designs
All of the individuals are allocated at random among the treatments
- It is not necessary, but often done, to assign the same number of individuals to each treatment
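A completely randomized design can be sketched in R by shuffling treatment labels across all subjects. The 12 subject labels and the two treatment names below are hypothetical, and equal group sizes are a choice, not a requirement.

```r
set.seed(4)  # fixed seed only so the example is repeatable

subjects   <- paste0("S", 1:12)        # hypothetical subject labels
treatments <- c("control", "drug")     # hypothetical treatment names

# Build a balanced vector of labels (6 of each) and shuffle it,
# so every subject is assigned purely by chance
assignment <- data.frame(subject = subjects,
                         treatment = sample(rep(treatments, each = 6)),
                         stringsAsFactors = FALSE)
table(assignment$treatment)
```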
Randomized Complete Block Design
The random assignment of individuals to treatments is carried out separately within each block
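Randomizing separately inside each block might be sketched in R as below. The three blocks of four subjects and the treatment names are assumptions for illustration only.

```r
set.seed(5)  # fixed seed only so the example is repeatable

# Hypothetical units: 3 blocks (e.g., age groups) with 4 subjects each
units <- data.frame(subject = 1:12,
                    block = rep(c("A", "B", "C"), each = 4),
                    stringsAsFactors = FALSE)
treatments <- c("control", "drug")     # hypothetical treatment names

# Shuffle the treatment labels separately within each block
units$treatment <- NA_character_
for (b in unique(units$block)) {
  rows <- which(units$block == b)
  units$treatment[rows] <- sample(rep(treatments, length.out = length(rows)))
}
table(units$block, units$treatment)
```

Because the shuffle happens block by block, every block ends up with both treatments represented, which a completely randomized design does not guarantee.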
Matched Pairs Design
Uses a form of blocking to compare just two treatments (often control vs. treatment)
- Pairs of subjects (experimental units) are chosen such that they are as closely matched as possible
Distribution
Tells us what values the variable takes and how often it takes those values
Categorical/Qualitative Graphs
Bar graphs and Pie charts
Single-Variable Quantitative Graphs
- Boxplots
- Histograms
- Dot-plot
- Stem-plot
Two-Variable Quantitative Graphs
- Scatterplots
- Time Series Plots
Right Skew
The tail extends to the right; most of the data sit on the left
Left Skew
The tail extends to the left; most of the data sit on the right
Center
Described using the mean and median
Spread (Variation)
Described using range, IQR, and variance/standard deviation
Mode
The value in a dataset that occurs most often
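Base R has no built-in function for the statistical mode (its mode() function reports a storage type instead), so one possible sketch counts values with table(). The data vector is hypothetical.

```r
x <- c(2, 3, 3, 5, 7, 3, 5)            # hypothetical data

counts <- table(x)                      # frequency of each distinct value

# Keep every value whose count equals the largest count
# (a dataset can have more than one mode)
modes <- as.numeric(names(counts)[as.vector(counts) == max(counts)])
modes   # 3
```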
Storing in R
x = c(#, #, #, ..., #)
Mean in R
mean(x)
Median in R
median(x)
Resistance
When a statistic is not sensitive to the influence of a few extreme outliers
Is the median resistant to outliers?
Yes
Is the mean resistant to outliers?
No
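The difference in resistance can be seen with a small R example. The numbers are made up; the only point is what happens when one extreme value is added.

```r
x     <- c(10, 12, 13, 15, 16)   # hypothetical data
x_out <- c(x, 200)               # same data plus one extreme value

mean(x)        # 13.2
median(x)      # 13

mean(x_out)    # about 44.33: pulled strongly toward the outlier
median(x_out)  # 14: barely moves
```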
Range
The difference between the maximum and minimum values in the dataset
Interquartile Range (IQR)
The range for the center half of the data
- The difference between the third and first quartiles
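Range and IQR can be computed in R as sketched below, using a hypothetical data vector. Note that quantile() and IQR() use R's default quartile method, which may give slightly different quartiles than a by-hand textbook method.

```r
x <- c(1, 3, 5, 7, 9, 11, 13)        # hypothetical data

# range(x) returns c(min, max), so the range as a single number is the difference
data_range <- max(x) - min(x)         # 12

q   <- quantile(x, c(0.25, 0.75))     # first and third quartiles: 4 and 10
iqr <- IQR(x)                         # 10 - 4 = 6
```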
Sample Variance
Used to find the variation about the mean
- $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Sample Standard Deviation
Measures the spread about the mean and should only be used when the mean is chosen as the measure of center
- The square root of the sample variance
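As a check on the definitions, the sample variance can be computed by hand in R and compared with the built-in var() and sd(), which both divide by n - 1. The data vector is hypothetical.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)       # hypothetical data
n <- length(x)

# Sum of squared deviations from the mean, divided by n - 1
s2_manual <- sum((x - mean(x))^2) / (n - 1)   # 32 / 7, about 4.571
s_manual  <- sqrt(s2_manual)                  # about 2.138

# Built-in equivalents should agree with the manual values
var(x)
sd(x)
```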
Empirical Rule
For a symmetrical, ‘bell-shaped’ distribution, approximately 68%, 95%, and 99.7% of the observations fall within one, two, and three standard deviations of the mean, respectively
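The three percentages can be checked in R against the normal distribution, taken here as the model for a bell-shaped curve. This is a sketch using pnorm(), the standard normal cumulative distribution function.

```r
k <- 1:3                               # number of standard deviations

# Area under the standard normal curve between -k and +k
within_k <- pnorm(k) - pnorm(-k)

round(100 * within_k, 1)               # 68.3 95.4 99.7
```

Real data that are only roughly bell-shaped will match these percentages only approximately.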