Lesson 0.2: Statistical Analysis Flashcards
Descriptive statistics _________ data. They do not seek __________ within it.
Describe, relationships
What two kinds of measures are used in descriptive statistics?
Measures of central tendency and measures of dispersion
What do measures of central tendency do?
Estimate the center position of values in a data set
What do measures of dispersion do?
Describe how spread out the values of the data are
What is discrete data?
Numerical data restricted to certain (usually integer) values
Example: rolling a die can only yield 1, 2, 3, 4, 5, or 6; you can’t get a 5.6
What is continuous data?
Numerical data not restricted to certain number values
Example: the mass of a person can be 63 kg, 62.6 kg, or 62.6523782 kg
What is a uniform distribution?
A type of continuous probability distribution where every value in the range is equally likely
Example: date/time of birth
What is a normal distribution?
A type of continuous probability distribution with a bell curve shape
Example: heights of adult Canadian females
All normal distributions have the same properties. Name the 3 properties
1) They have a bell shape and are symmetrical
2) The mean is in the center of the distribution
3) The area under the curve is 1
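Worked example: a minimal Python sketch that checks the three properties numerically. The mean (162 cm) and standard deviation (7 cm) are made-up values for illustration, not real height statistics.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
MU, SIGMA = 162.0, 7.0
heights = NormalDist(mu=MU, sigma=SIGMA)

# Property 1: symmetry. The curve has equal height at equal
# distances on either side of the mean.
left = heights.pdf(MU - 5.0)
right = heights.pdf(MU + 5.0)

# Property 2: the mean is in the center, so half of the
# probability lies below it.
below_mean = heights.cdf(MU)

# Property 3: the total area is 1. Approximate it with a midpoint
# sum over mean +/- 8 standard deviations.
n_steps = 10_000
start = MU - 8 * SIGMA
step = (16 * SIGMA) / n_steps
area = sum(heights.pdf(start + (i + 0.5) * step) * step for i in range(n_steps))

print(left, right)   # the two densities match
print(below_mean)    # 0.5
print(round(area, 6))
```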
The Y axis in a continuous probability distribution is the …
Frequency (strictly, the probability density)
The X axis in a continuous probability distribution is the …
Variable of interest (e.g., mass)
What is an advantage and disadvantage of using the mean?
Pro: it takes all values into account and can thus help minimize error
Con: it takes into account outliers, which can dramatically skew the mean
What does x̄ represent?
Sample mean
What does µ represent?
Population mean
What is the median?
The middle value of an ordered set; the 50th percentile
In what type of data set are the mean and median the same?
In a symmetric distribution
Which measure(s) of central tendency can be used with nominal data sets?
Mode
Which measure(s) of central tendency can be used with ordinal data sets?
Mode, median
Which measure(s) of central tendency can be used with interval data sets?
Mode, median, mean
Which measure(s) of central tendency can be used with ratio data sets?
Mode, median, mean
What is the most appropriate measure of central tendency for interval or ratio data that are skewed or contain outliers?
Median
What is the most appropriate measure of central tendency for non-skewed data?
Mean
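Worked example: a small Python sketch of why the median is preferred when outliers are present. The income figures are invented for illustration.

```python
from statistics import mean, median, mode

# Hypothetical incomes in thousands of dollars (invented data).
incomes = [30, 32, 35, 35, 38, 40]
with_outlier = incomes + [400]  # add one extreme value

mean_before = mean(incomes)            # 35
median_before = median(incomes)        # 35.0
mode_before = mode(incomes)            # 35

mean_after = mean(with_outlier)        # pulled far upward by the outlier
median_after = median(with_outlier)    # still 35
mode_after = mode(with_outlier)        # still 35

print(mean_before, median_before, mode_before)
print(mean_after, median_after, mode_after)
```

A single extreme value more than doubles the mean, while the median and mode do not move.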
Measures of dispersion describe …
How spread out the data is
How is the range calculated?
Subtracting the smallest value in a set from the largest value
The first quartile Q1 is larger than _____ of the observations
25%
The third quartile Q3 is larger than ____ of the observations
75%
How do you calculate the interquartile range (IQR)?
IQR = Q3 - Q1
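Worked example: a short Python sketch of the range and IQR calculations. The data are made up, and the "inclusive" quartile method is an assumption; textbooks and software use several slightly different quartile conventions, so other tools may give slightly different Q1 and Q3 values.

```python
from statistics import quantiles

# Hypothetical ordered data set.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Range: largest value minus smallest value.
data_range = max(scores) - min(scores)

# Quartiles: n=4 splits the data into four parts and returns Q1, Q2, Q3.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

# Interquartile range: the spread of the middle 50% of the data.
iqr = q3 - q1

print(data_range)   # 8
print(q1, q2, q3)   # 3.0 5.0 7.0
print(iqr)          # 4.0
```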
What is standard deviation?
A statistical measure of variability that indicates, roughly, the average amount by which the numbers in a set deviate from their mean
What is variance?
The square of the standard deviation
What does s represent?
Standard deviation for a sample
What does σ represent?
The standard deviation of a population
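Worked example: a Python sketch contrasting s and σ on the same made-up numbers. The sample versions divide by n - 1 and the population versions divide by n, so s comes out slightly larger.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(data)          # population standard deviation (divide by n)
sigma_sq = pvariance(data)    # population variance
s = stdev(data)               # sample standard deviation (divide by n - 1)
s_sq = variance(data)         # sample variance

print(sigma, sigma_sq)        # 2.0 and 4
print(round(s, 3), round(s_sq, 3))

# In both cases the variance is the square of the standard deviation.
print(abs(sigma ** 2 - sigma_sq) < 1e-9, abs(s ** 2 - s_sq) < 1e-9)
```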
What is considered an outlier?
Data that is above Q3 + 1.5 × IQR or below Q1 - 1.5 × IQR
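Worked example: a Python sketch of the 1.5 × IQR rule on made-up data. The "inclusive" quartile method is an assumption; a different quartile convention would shift the cutoffs slightly.

```python
from statistics import quantiles

# Hypothetical data with one suspiciously large value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

q1, _, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Cutoffs ("fences") beyond which a point counts as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(q1, q3, iqr)               # 3.25 7.75 4.5
print(lower_fence, upper_fence)  # -3.5 14.5
print(outliers)                  # [30]
```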
What causes random error and which measure does it decrease?
Caused by human or instrumental error; it decreases precision
What causes systematic error and which measure does it decrease?
Caused by observer, instrument, or subject bias and decreases accuracy
Which type of error is consistent? Random or systematic?
Systematic
What does accuracy measure?
How close the data points are to the actual value
What does precision measure?
How close the data points are to each other (how well they cluster)
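Worked example: a Python sketch that separates accuracy from precision. The three scales and their readings are hypothetical. Accuracy is summarized here as bias (mean reading minus the true value) and precision as the standard deviation of the readings; both summaries are illustrative choices.

```python
from statistics import fmean, stdev

TRUE_VALUE = 100.0  # hypothetical known mass in grams

# Scale A: accurate and precise.
scale_a = [99.8, 100.1, 100.0, 99.9, 100.2]
# Scale B: systematic error (consistently about 5 g too high) but precise.
scale_b = [104.8, 105.1, 105.0, 104.9, 105.2]
# Scale C: random error (scattered readings) but centered on the true value.
scale_c = [97.0, 103.0, 100.0, 98.5, 101.5]

bias_a = fmean(scale_a) - TRUE_VALUE
bias_b = fmean(scale_b) - TRUE_VALUE
bias_c = fmean(scale_c) - TRUE_VALUE

spread_a = stdev(scale_a)
spread_b = stdev(scale_b)
spread_c = stdev(scale_c)

print(round(bias_a, 3), round(spread_a, 3))  # small bias, small spread
print(round(bias_b, 3), round(spread_b, 3))  # large bias, small spread
print(round(bias_c, 3), round(spread_c, 3))  # small bias, large spread
```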
What does a correlation coefficient of 0 indicate?
That there is no LINEAR relationship
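Worked example: a Python sketch showing that a correlation coefficient of 0 rules out only a linear relationship. Here y is completely determined by x (y = x squared), yet Pearson's r is 0. The helper pearson_r is written out by hand for illustration.

```python
from math import sqrt
from statistics import fmean

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

xs = [-3, -2, -1, 0, 1, 2, 3]

# A perfect but non-linear (U-shaped) relationship.
ys_curved = [x ** 2 for x in xs]
r_curved = pearson_r(xs, ys_curved)

# A perfect linear relationship, for comparison.
ys_linear = [2 * x + 1 for x in xs]
r_linear = pearson_r(xs, ys_linear)

print(r_curved)   # 0.0
print(r_linear)   # 1.0 (up to floating-point rounding)
```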
What is the difference between correlation and simple linear regression?
Correlation measures the strength and direction of a linear association but does not establish which variable is causing the other. Simple linear regression is an extension of correlation that describes how one variable (the response) changes with another (the predictor)
What is a residual and how are they calculated?
A residual is the difference between an observed value of the response variable (DV) and the predicted value.
residual = y(observed) - y(predicted)
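Worked example: a Python sketch that fits a least-squares line by hand and then computes the residuals. The data points are made up, and the slope and intercept formulas are the standard least-squares ones.

```python
from statistics import fmean

# Hypothetical observations: x is the predictor (IV), y is the response (DV).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y = fmean(xs), fmean(ys)

# Least-squares slope and intercept.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# residual = y(observed) - y(predicted)
predicted = [intercept + slope * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print(round(slope, 3), round(intercept, 3))   # 0.6 2.2
print([round(r, 3) for r in residuals])       # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding).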
What is a chi-square test?
A test used to calculate p-values when all variables are categorical
Example: are people who watch action movies more likely to buy popcorn?
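Worked example: a Python sketch of a chi-square test on a 2 × 2 table, computed by hand. The counts are invented, and no continuity correction is applied. The p-value shortcut used here is valid only for 1 degree of freedom; larger tables would need a chi-square distribution from a statistics library.

```python
from math import erfc, sqrt

# Hypothetical counts. Rows: action movie, other movie.
# Columns: bought popcorn, did not buy popcorn.
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all cells, where
# expected = row total * column total / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi_square += (obs - expected) ** 2 / expected

# Degrees of freedom = (rows - 1) * (columns - 1) = 1.
# For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi_square / 2))

print(chi_square)          # 4.0
print(round(p_value, 4))
```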
What is a t-test?
A test used to calculate p-values and compare the average values of a quantitative variable between two categorical groups
Example: is life expectancy different between Canadians and Americans?
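Worked example: a Python sketch of the test statistic for a two-sample t-test. The ages are invented and are not real life-expectancy data. The unequal-variance (Welch) form of the statistic is an assumption. Turning t into a p-value requires the t distribution, which is not in the Python standard library, so that step is left out.

```python
from math import sqrt
from statistics import fmean, variance

# Hypothetical ages at death for two small samples (invented data).
group_a = [82, 81, 83, 80, 84]
group_b = [79, 78, 80, 77, 81]

mean_a, mean_b = fmean(group_a), fmean(group_b)
var_a, var_b = variance(group_a), variance(group_b)  # sample variances

# Standard error of the difference between the two sample means.
std_error = sqrt(var_a / len(group_a) + var_b / len(group_b))

# t statistic: the difference in means measured in standard errors.
t_stat = (mean_a - mean_b) / std_error

print(mean_a, mean_b)      # 82.0 79.0
print(round(t_stat, 3))    # 3.0
```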
What is ANOVA?
A test similar to a t-test but for more than two groups
Example: is the life expectancy different between Canadians, Americans and Mexicans?
What is a confidence interval?
An estimated range of values that is likely to include an unknown population parameter at a given confidence level
What is the level of confidence?
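The probability that the interval estimate contains the population parameter; that is, the long-run proportion of intervals, built the same way from repeated samples, that would contain it (e.g., 95%)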
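Worked example: a Python sketch of a 95% confidence interval for a mean. The sample is made up, and the normal (z) critical value is an assumption made to keep the example within the standard library; with a sample this small, a t critical value would normally be used and would give a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

# Hypothetical sample.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
confidence = 0.95

sample_mean = fmean(sample)
std_error = stdev(sample) / sqrt(len(sample))

# Critical value: leaves (1 - confidence) / 2 in each tail.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z * std_error
lower, upper = sample_mean - margin, sample_mean + margin

print(round(z, 3))                       # about 1.96
print(round(lower, 3), round(upper, 3))  # interval centered on the sample mean
```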
What is a Type I error?
A false positive: the null hypothesis is rejected even though it is true
What is a Type II error?
A false negative: the null hypothesis is not rejected even though it is false
What is internal validity?
The degree to which the independent variable has been demonstrated to cause the dependent variable
What is a threat to internal validity?
Confounding variables
How can confounding variables be minimized?
Randomization
What is temporality?
The idea that, for variables to be causally related, the independent variable must occur before the dependent variable
What is external validity?
The ability of a research design to provide results that can be GENERALIZED to other situations, especially to natural (“real life”) situations
Name and describe the two factors external validity depends on.
1) The participants included in the sample: they should be representative of the population to which one wants to generalize
2) The physical realm of the research setting: it should be similar with respect to relevant and important characteristics of the natural situation to which one wants to generalize
There is a trade-off between ________ validity and ________ validity
Internal, external
What is the biopsychosocial (BPS) approach and what are the two central tenets of this model?
An approach to medicine that integrates psychology, sociology, and biology in diagnoses and treatments
Two central tenets:
1) illness is a product of more than biology (social and psychological factors)
2) illness has multiple causes (genetic, environmental, psychological)