Final (stats) Flashcards
Table
used to present many numerical values
Figure
used to show patterns, trends, or relationships
Qualities of a good table
Should be understandable on its own
Includes appropriate title in proper location
Logical format
Justified numbers → decimal points line up
Good / consistent spacing
Legend
Qualities of a good figure
Understandable on its own
Axes labels (with units)
Appropriate scaling of axis
Symbols
Customized (not the excel default)
No need for box borders around graph
Trendline should be thicker and clear
Figure legends (caption)
The key to understanding a figure
A good figure legend includes:
Title
Materials and methods (description of techniques used)
Results (further explanation of the data)
Definitions (of symbols, patterns, lines, abbreviations, etc).
Monty Hall Problem
It involves a scenario with three doors and a prize behind one of them, so you have a 1/3 chance of initially choosing the door with the prize. The host, who knows where the prize is, then reveals one of the other doors with no prize. Your original door still has a 1/3 chance, so the remaining unopened door carries the other 2/3. By switching doors, you capitalize on the new information and increase your chances of winning to 2/3.
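A quick simulation can make the 1/3 versus 2/3 split easier to believe. The sketch below is only an illustration: the function names, the seed, and the number of trials are assumptions chosen for this example, and the host is assumed to always open a non-prize door that the player did not pick.

```python
import random

def play_round(switch, rng):
    """Play one Monty Hall round; return True if the player wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # Assumption: the host always opens a door that is neither the pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Switch to the single remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the winning probability for a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play_round(switch, rng) for _ in range(trials))
    return wins / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```

With enough trials, the staying rate should settle near 1/3 and the switching rate near 2/3.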
Probability
The degree of certainty or chance that something will happen.
Statistics
Help us…
Reduce and describe data
Quantify relationships among data
Determine if sets of data are similar / different
Goals of a data analysis
Data reduction (and description)
Reduce measures to make them more meaningful
Averages, spread, bar chart / plots / histograms (descriptive)
Easier and more meaningful to read than all the individual data.
Establish relationships
Descriptive – describe relationship between two observations
Relationship between height and weight
Causal – did one thing cause the other
Intervention → caused some response
Inference
Infer outcome from sample to population
Is what we see in sample true in population
Purpose of sampling
to approximate a larger population on characteristics relevant to the research question.
Histograms
Graphical representations
Mainly represent frequency (# of subjects that fall into a range).
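As a rough illustration of what a histogram counts, the sketch below groups values into equal-width ranges and tallies how many fall into each. The heights and the bin width of 10 are made-up values for this example.

```python
from collections import Counter

# Hypothetical heights in cm (illustrative data only).
heights_cm = [152, 158, 161, 163, 164, 167, 168, 171, 172, 179]
bin_width = 10

# Map each value to the lower edge of its bin, then count values per bin.
counts = Counter((h // bin_width) * bin_width for h in heights_cm)

# Print a simple text histogram: one '#' per subject in the range.
for start in sorted(counts):
    print(f"{start}-{start + bin_width - 1}: {'#' * counts[start]}")
```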
Measures of central tendency
Mean
average
x̄ = ΣX / N
Median
middle of distribution
Mode
most frequently occurring value
Range
difference between high and low values in a data set
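The mean, median, mode, and range can be checked on a small data set with Python's standard `statistics` module. The scores below are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
import statistics

# Hypothetical scores (illustrative data only).
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)        # sum of values / number of values = 30 / 6
median = statistics.median(scores)    # middle of the sorted values: (3 + 5) / 2
mode = statistics.mode(scores)        # most frequently occurring value
data_range = max(scores) - min(scores)  # high value minus low value

print(mean, median, mode, data_range)
```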
Confidence interval
interval estimate of the population mean (using SEM)
Standard Error of the Mean (equation)
standard deviation / √sample size
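A minimal sketch of the equation above, using a hypothetical sample. It assumes the sample standard deviation (with n - 1 in the denominator) is the one intended.

```python
import math
import statistics

# Hypothetical sample (illustrative data only).
sample = [4.0, 6.0, 8.0, 10.0, 12.0]

sd = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
sem = sd / math.sqrt(len(sample))    # standard deviation / sqrt(sample size)

print(f"SD = {sd:.4f}, SEM = {sem:.4f}")
```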
Normal distribution
probability distribution that is symmetric about the mean
Kurtosis
measure of outliers in a distribution
High kurtosis → heavy tails or outliers (leptokurtic)
Low kurtosis → light tails or no outliers (platykurtic)
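One common way to put a number on kurtosis is excess kurtosis: the fourth central moment divided by the squared variance, minus 3 (so a normal distribution scores about 0). The sketch below uses that population-moment definition as an assumption; statistical packages may apply a sample-size correction. Both data sets are made up.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0

# Hypothetical data: evenly spread values versus mostly identical values with two outliers.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
with_outliers = [5, 5, 5, 5, 5, 5, 5, 5, -20, 30]

print(f"flat: {excess_kurtosis(flat):.3f}")                    # negative: light tails
print(f"with outliers: {excess_kurtosis(with_outliers):.3f}")  # positive: heavy tails
```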
Standard deviation
Measure of the spread of data around the mean
Typical amount by which values vary from the mean
How tightly values in dataset are bunched around the mean
Variability of individual observations around a single sample mean
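To see where the number comes from, the sketch below computes the standard deviation step by step on hypothetical values. It shows both the population form (divide by n) and the sample form (divide by n - 1); which one applies depends on whether the data are the whole population or a sample from it.

```python
import math
import statistics

# Hypothetical values (illustrative data only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)
squared_devs = [(x - mean) ** 2 for x in values]  # squared distance of each value from the mean

population_sd = math.sqrt(sum(squared_devs) / len(values))    # divide by n
sample_sd = math.sqrt(sum(squared_devs) / (len(values) - 1))  # divide by n - 1

print(f"population SD = {population_sd:.3f}, sample SD = {sample_sd:.3f}")
```

The standard library gives the same results through `statistics.pstdev(values)` and `statistics.stdev(values)`.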
Central limit theorem
when many sufficiently large samples are drawn from a population, the means of these samples tend to be normally distributed, even if the population itself is not.
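A small simulation can illustrate the idea. The sketch below assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), draws many samples, and looks at the distribution of their means. The sample size, number of samples, and seed are arbitrary choices for this example.

```python
import math
import random
import statistics

rng = random.Random(42)
sample_size = 50     # observations per sample (assumed value)
num_samples = 5000   # number of samples drawn (assumed value)

# Each sample comes from a skewed exponential population with mean 1 and SD 1.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / math.sqrt(sample_size)  # population SD / sqrt(sample size)

print(f"mean of sample means = {mean_of_means:.3f}")
print(f"SD of sample means = {sd_of_means:.3f} (expected about {expected_sd:.3f})")
```

The sample means should cluster around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.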
Empirical rule
for a normal distribution, nearly all data fall within three standard deviations of the mean (about 68% within one, 95% within two, and 99.7% within three).
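The proportions behind this rule can be checked with `statistics.NormalDist` from the standard library. The helper name `fraction_within` is made up for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
```

The three values come out at roughly 0.68, 0.95, and 0.997.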
Standard error of the mean (SEM)
how close the sample mean is likely to be to the true population mean
also shows how accurately the sample average reflects the population it was drawn from
essentially compares the experimental (sample) mean to the true population mean
SEM will always be lower than SDEV when the sample has more than one observation
the larger the sample, the lower the SEM which is good
Confidence intervals
give an estimate of how well the sample mean represents the population mean.
Range of likely values for population parameter
Uses a reliability coefficient (such as a z or t value) and the SEM
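As a sketch of how these pieces fit together, the interval below is built as sample mean ± coefficient × SEM. It assumes a 95% confidence level and a normal-based coefficient of about 1.96; with small samples a t value would typically be used instead. The measurements are hypothetical.

```python
import math
import statistics

# Hypothetical measurements (illustrative data only).
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]

n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(n)  # standard deviation / sqrt(sample size)

# Assumption: 95% confidence with a normal-based coefficient (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

lower = mean - z * sem
upper = mean + z * sem
print(f"mean = {mean:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

A larger sample lowers the SEM, which narrows the interval around the sample mean.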
Statistical hypothesis testing
Applies the scientific method to data with random fluctuation
The null hypothesis (H0)
The effect seen in the data does not represent a real effect but is merely a result of random fluctuation.
Hypothesis that there will be no difference or relationship between variables.
Alternative hypothesis (Ha)
hypothesis formulated based on existing knowledge, theories, or observations.
Difference between variables is specified (one group is greater or less than the other)