Week 2 Flashcards
Key components to a statistical investigation
Planning the study, examining the data, inferring from the data, drawing conclusions, distributional thinking
Examining the data
The most fundamental principle of statistics is that data vary. The pattern of that variation is crucial to capture and understand. Analyzing the pattern of variation in a variable's values, called the distribution of the variable, often reveals insights
Statistical significance
Even when patterns in data are found, there is often still uncertainty in various aspects of the data. There may be potential for measurement errors (even body temperature can fluctuate by almost 1 degree over the course of the day). We may only have a “snapshot” of observations from a longer-term process, or only a small subset of individuals from the population of interest
P-value
The probability of observing a particular outcome in a sample, or more extreme, under a conjecture about the larger population or process. Tells you how often a random process would give a result at least as extreme as what was found in the actual study, assuming there was nothing other than random chance at play
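The definition above can be illustrated with a small simulation. This is a minimal sketch, not a definitive method: the scenario (15 successes in 20 trials, with a 50% success chance under the "random chance only" conjecture) and the function name simulated_p_value are hypothetical, chosen only for illustration.

```python
import random

def simulated_p_value(observed_successes, n_trials, chance_prob=0.5,
                      n_simulations=10_000, seed=0):
    """Estimate a one-sided p-value by simulating the chance-only model.

    Counts how often pure chance produces a result at least as extreme
    (at least as many successes) as the one actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    at_least_as_extreme = 0
    for _ in range(n_simulations):
        # One simulated study: n_trials independent chance outcomes
        successes = sum(rng.random() < chance_prob for _ in range(n_trials))
        if successes >= observed_successes:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_simulations

# Hypothetical data: 15 successes observed in 20 trials
p = simulated_p_value(observed_successes=15, n_trials=20)
print(p)
```

The printed value is the proportion of simulated studies that were at least as extreme as the observed one. A small proportion suggests that chance alone rarely produces such a result.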
Level of significance
A result is statistically significant if it is unlikely to arise by chance alone. The level of significance is the cut-off value for the p-value: if the p-value is smaller than this cut-off, we reject the hypothesis that only random chance was at play
Sample
The collection of individuals on which we collect data. We sample from a larger group of individuals (the population) in such a way that conclusions from the sample can be generalized to the larger population
Generalized
Related to whether the results from the sample can be generalized to a larger population
Population
A larger collection of individuals that we would like to generalize our results to
Random sample
Using a probability-based method to select a subset of individuals for the sample from the population. One approach involves numbering every member of the population and then using a computer to randomly select the subjects to be surveyed
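The numbering-and-selecting procedure described above might look like the following. This is a rough sketch: the population of 100 placeholder names and the function name simple_random_sample are assumptions made up for the example.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Select sample_size members so that every member is equally likely."""
    rng = random.Random(seed)
    members = list(population)
    # Number every member of the population from 1 to N
    numbers = range(1, len(members) + 1)
    # Let the computer pick sample_size distinct numbers at random
    chosen_numbers = rng.sample(numbers, sample_size)
    # Look up the members that carry the chosen numbers
    return [members[number - 1] for number in chosen_numbers]

# Hypothetical population of 100 people
population = [f"person_{i}" for i in range(1, 101)]
sample = simple_random_sample(population, sample_size=10, seed=42)
print(sample)
```

Because selection depends only on the random numbers, no segment of the population is favoured by the method itself.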
Margin of error
The expected amount of random variation in a statistic; often defined for 95% confidence level
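As one concrete illustration, a common approximation for the 95% margin of error of a sample proportion is 1.96 times the standard error. The sketch below assumes a simple random sample that is reasonably large; the poll figures (50% support among 1,000 respondents) are hypothetical.

```python
import math

def margin_of_error(sample_proportion, sample_size, z=1.96):
    """Approximate margin of error for a sample proportion.

    z = 1.96 corresponds to a 95% confidence level under the
    normal approximation (an assumption of this sketch).
    """
    standard_error = math.sqrt(
        sample_proportion * (1 - sample_proportion) / sample_size
    )
    return z * standard_error

# Hypothetical poll: 50% support among 1,000 respondents
moe = margin_of_error(sample_proportion=0.5, sample_size=1000)
print(round(moe, 3))  # about 0.031, i.e. roughly 3 percentage points
```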
Non-random samples
Often subject to bias, meaning the sampling method systematically over-represents some segments of the population and under-represents others. Consider other sources of bias, such as individuals not responding honestly
Cause and effect
Related to whether we can say one variable is causing changes in the other variable, versus other variables that may be related to these two variables
Randomly assigning
Using a probability-based method to divide a sample into treatment groups. We can again apply a probability model to approximate a p-value, but the model will be a bit different from the one used for random sampling
Operational definitions
How researchers specifically measure a concept
Independent variable
The variable the researcher manipulates and controls in an experiment
Dependent variable
The variable the researcher measures but does not manipulate in an experiment
Random assignment
Using a probability-based method to divide a sample into treatment groups. Critical to experimentation because if the only difference between the two groups is the independent variable, we can infer that the independent variable is the cause of any observable difference between the two groups
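A minimal sketch of random assignment, assuming two equally sized groups: the participant labels and the function name randomly_assign are hypothetical.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two treatment groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original order is untouched
    rng.shuffle(shuffled)          # chance alone decides the order
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical sample of 20 participants
participants = [f"participant_{i}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=1)
print(groups["treatment"])
print(groups["control"])
```

Since chance alone decides who lands in which group, pre-existing differences between participants tend to balance out across the groups on average.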
Confounds
Things that could undermine your ability to draw causal inferences
How to prevent confounds
Using a double-blind procedure, which guards against confounds such as participant expectations (placebo effects) and experimenter expectations
Double-blind procedure
Neither the participant nor the experimenter knows which condition the participant is in
Correlational designs
When scientists passively observe and measure phenomena, without intervening to change behaviour as we do in experiments. These designs identify patterns of relationship, but usually cannot tell us what causes what