1. Descriptive and inferential statistics Flashcards
Classify these variables as NOMINAL or CONTINUOUS:
A) Age
B) Gender
C) Height
A) Age = Continuous
B) Gender = Nominal
C) Height = Continuous
Describe what a confounding variable is
A variable that affects the outcome being measured as well as, or instead of, the independent variable.
- A confounding variable is unforeseen and unaccounted for, so it jeopardizes the reliability and validity of an experiment’s outcome
If a test is valid, what does this mean?
The test measures what it claims to measure
If a test is reliable what does this mean?
The test will give consistent results.
The discrepancy between the numbers used to represent something that we are trying to measure and the actual value of what we are measuring is called:
Measurement error
What is the ‘fit’ of the model?
The ‘fit’ of the model is the degree to which a statistical model represents the data collected
What is variance?
The variance is the average squared error between the mean and the observations made
A frequency distribution in which low scores are most frequent (i.e. bars on the graph are highest on the left hand side) is said to be:
Positively skewed
How can we compensate for practice effects?
Counterbalancing
How can we compensate for boredom effects?
Giving participants a break between tasks
Variation due to variables that have not been measured is known as:
Unsystematic variation
- Unsystematic variation results from random factors that exist between the experimental conditions (such as natural differences in ability, the time of day, etc.)
What is the assumption of homogeneity of variance?
That the variance within each of the populations is equal.
Variation due to the experimenter doing something in one condition but not in the other condition is known as:
Systematic variation
What does residual variance tell us?
Residual variance helps us confirm how well a regression line that we constructed fits the actual data set. The smaller the variance, the more accurate the predictions are
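A minimal sketch of how residual variance can be computed for a simple regression line. The data are hypothetical, and dividing the residual sum of squares by n - 2 is an assumption (the usual choice for a line with one predictor), not something the card specifies.

```python
# Hypothetical data: x is the predictor, y is the outcome.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept of the regression line.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed value minus the value predicted by the line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
ss_residual = sum(r ** 2 for r in residuals)

# Assumption: n - 2 degrees of freedom (two estimated parameters).
residual_variance = ss_residual / (n - 2)
print(round(residual_variance, 3))
```

A line that passed closer to the points would give smaller residuals and therefore a smaller residual variance.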
The purpose of a control condition is to
Allow inferences about cause
- A properly constructed control condition provides you with a reference point to determine what change (if any) occurred when a variable was modified
What helps to control for participant characteristics (thus minimize unsystematic variation)?
Randomization
How are Z scores calculated?
By subtracting the mean from the score and dividing the answer by the standard deviation
SCORE - MEAN = X
X / STDEV = Z-SCORE
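The two steps above can be sketched in Python. The scores are hypothetical, and using the population standard deviation (`statistics.pstdev`) rather than the sample version is an assumption made only to keep the numbers round.

```python
import statistics

# Hypothetical set of scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)
# Assumption: population standard deviation; use statistics.stdev
# instead if the scores are treated as a sample.
sd = statistics.pstdev(scores)


def z_score(score, mean, sd):
    """Subtract the mean from the score, then divide by the SD."""
    return (score - mean) / sd


z = z_score(9, mean, sd)
print(z)  # a score of 9 lies 2 standard deviations above the mean
```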
The standard deviation is the square root of the:
Variance
What is the coefficient of determination?
A measure of the amount of variability in one variable that is shared by the other
Calculated as:
correlation coefficient squared
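As a rough illustration, the sketch below computes Pearson's correlation coefficient by hand for two hypothetical variables and then squares it. The variable names and data are made up for the example.

```python
import math

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of cross-products and of squared deviations.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

# Pearson correlation coefficient, then its square.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 3))
```

Here r squared is about 0.6, i.e. roughly 60% of the variability in one variable is shared with the other.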
Complete the following sentence:
A large standard deviation (relative to the value of the mean itself)…
Indicates that the data points are distant from the mean
(i.e. the mean is a poor fit of the data).
The probability is p = 0.80 that a patient with a certain disease will be successfully treated with a new medical treatment. Suppose that the treatment is used on 40 patients. What is the “expected value” of the number of patients who are successfully treated?
32
because 80% of 40 patients is 32 (or 40 x .80 = 32)
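The same arithmetic can be checked in a few lines of Python. The long-hand check treats the number of successes as a binomial count, which assumes the 40 patients respond independently; the card does not state this explicitly.

```python
from math import comb

n = 40    # patients treated
p = 0.80  # probability that any one patient is successfully treated

# Shortcut used on the card: expected value = n * p.
expected = n * p

# Long-hand check (assumes independent patients): weight every possible
# number of successes k by its binomial probability and add them up.
expected_from_pmf = sum(
    k * comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)
)

print(expected)
```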
What is the Confusion of the inverse?
A logical fallacy in which a conditional probability is equated with its inverse
- that is, given two events A and B, the probability of A happening given that B has happened is assumed to be about the same as the probability of B given A, when there is actually no evidence for this assumption.
More formally, P(A|B) is assumed to be approximately equal to P(B|A).
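A small worked example shows how far apart the two conditional probabilities can be. All counts below are hypothetical and chosen only for illustration: A is "tests positive" and B is "has the disease".

```python
# Hypothetical screening results for 10,000 people.
sick_positive = 90       # have the disease and test positive
sick_negative = 10       # have the disease but test negative
healthy_positive = 495   # no disease but test positive
healthy_negative = 9405  # no disease and test negative

# P(positive | disease): of those who are sick, how many test positive?
p_pos_given_disease = sick_positive / (sick_positive + sick_negative)

# P(disease | positive): of those who test positive, how many are sick?
p_disease_given_pos = sick_positive / (sick_positive + healthy_positive)

print(round(p_pos_given_disease, 3), round(p_disease_given_pos, 3))
```

With these made-up counts P(A|B) is 0.9 while P(B|A) is only about 0.15, so treating one as the other would be badly misleading.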
The test statistics we use to assess a linear model are usually _______ based on the normal distribution.
Parametric tests
What are the assumptions of the general linear model?
Independence:
- The errors in your model should not be related to each other
Additivity/Linearity:
- If you have several predictors then their combined effect is best described by adding their effects together
- The outcome variable is, in reality, linearly related to any predictors
Normality:
- The core element of the assumption of normality asserts that the distribution of sample means (across independent samples) is normal.
(In technical terms, the assumption of normality claims that the sampling distribution of the mean is normal, i.e. that the distribution of means across samples is normal.)
Homogeneity of variance:
- When testing several groups of participants, samples should come from populations with the same variance
Finish the sentence
The further the values of skewness and kurtosis are from zero, the more likely…
…it is that the data are not normally distributed
Parameters are numbers that summarize data for…
an entire population
Statistics are numbers that summarize data from…
a sample
What are the measures of central tendency?
- Mean
- Median
- Mode
What are the measures of spread or dispersion?
- Range
- Variance
- Standard Deviation
What does kurtosis tell us?
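How heavy the tails of a distribution are compared with a normal distribution, and so how prone it is to extreme scores (it does not identify which data points are outliers)
Distributions:
Leptokurtic = relatively heavy tails (more extreme scores than a normal distribution)
Platykurtic = relatively light tails (fewer extreme scores than a normal distribution)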
Mesokurtic = same kurtosis as the normal distribution
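A minimal sketch of how skewness and kurtosis can be computed from the moments of a data set. The helper names and data are hypothetical, and the simple moment-based formulas are an assumption; statistical packages often apply small-sample corrections, so their values may differ slightly. Kurtosis is reported here as excess kurtosis, so a normal distribution scores zero.

```python
def central_moment(data, k):
    """Average of the k-th power of deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Positive when the long tail is on the right (positive skew)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Above 0: leptokurtic; below 0: platykurtic; near 0: mesokurtic."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


# Hypothetical samples.
low_scores_frequent = [1, 1, 1, 2, 2, 3, 4, 8]      # long right tail
flat = [1, 2, 3, 4, 5]                              # light tails
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]    # a few extreme scores

print(skewness(low_scores_frequent) > 0)  # True: positively skewed
print(excess_kurtosis(flat))              # negative: platykurtic
print(excess_kurtosis(heavy_tailed))      # positive: leptokurtic
```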
What is Gambler’s fallacy?
The mistaken belief that, if something happens more frequently than normal during some period, it will happen less frequently in the future, or that, if something happens less frequently than normal during some period, it will happen more frequently in the future
What is the Law of small numbers?
Exaggerated confidence in the validity of conclusions based on small samples.
- People misperceive a small sample to be indicative of the entire population
What does the Sum of squared errors (SS) indicate?
The total dispersion, or total deviance of scores from the mean
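The sketch below works through the chain from sum of squared errors to variance to standard deviation for a hypothetical set of scores. Dividing by N - 1 (the sample variance) is an assumption; dividing by N would give the population variance instead.

```python
import math
import statistics

# Hypothetical scores.
scores = [3, 5, 7, 9, 11]

mean = sum(scores) / len(scores)

# Sum of squared errors: total squared deviation of scores from the mean.
ss = sum((x - mean) ** 2 for x in scores)

# Assumption: sample variance, so divide by N - 1.
variance = ss / (len(scores) - 1)

# The standard deviation is the square root of the variance.
sd = math.sqrt(variance)

print(ss, variance, round(sd, 3))

# The standard library gives the same sample variance.
library_variance = statistics.variance(scores)
```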
How does an increasing number of participants affect the sampling distribution of the mean?
- Distribution becomes more normal
- Spread of the distribution decreases
What do confidence intervals tell us?
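A range of values around the sample estimate, constructed so that a stated percentage of such intervals (e.g. 95%) would contain the true population value.
There is a tradeoff between degree of certainty and width of the CI: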
- The more certain you want to be, the wider (larger) the interval needs to be
- The goal is to have a high level of confidence paired with a small interval.
- One way to help achieve this is to have less variability in your sample (i.e. a smaller standard error of the mean)
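A rough sketch of the tradeoff, using hypothetical samples. It builds the interval as mean plus or minus a critical value times the standard error, and takes the critical value from the normal distribution; that is a simplifying assumption, since small samples would normally use the t distribution.

```python
import math
import statistics
from statistics import NormalDist


def confidence_interval(data, confidence=0.95):
    """Return (lower, upper) bounds for the mean of `data`."""
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    # Assumption: normal critical value (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * se, mean + z * se


# Hypothetical samples with the same mean but different variability.
variable_sample = [90, 110, 85, 115, 95, 105, 80, 120, 100, 100]
tight_sample = [98, 102, 97, 103, 99, 101, 96, 104, 100, 100]

low95, high95 = confidence_interval(variable_sample, 0.95)
low99, high99 = confidence_interval(variable_sample, 0.99)
tight_low, tight_high = confidence_interval(tight_sample, 0.95)

print(high95 - low95)          # width at 95% confidence
print(high99 - low99)          # wider: more certainty costs width
print(tight_high - tight_low)  # narrower: less variability in the sample
```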
Sum of squares, variance and standard deviation represent the same things.
What do they represent?
- The ‘fit’ of the mean to the data
- The variability in the data
- How well the mean represents the observed data
- Error
What does standard error tell you?
It is the standard deviation of the sampling distribution of a statistic
It indicates how accurate the mean of any given sample from that population is likely to be compared to the true population mean.
When the standard error increases, i.e. the means are more spread out, it becomes more likely that any given mean is an inaccurate representation of the true population mean.
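The idea can be illustrated with a small simulation: draw many samples from a population, record each sample mean, and measure how spread out those means are. The population values (mean 100, standard deviation 15), the sample sizes and the seed are all hypothetical choices for the sketch.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 100, 15  # hypothetical normal population


def sd_of_sample_means(n, n_samples=2000):
    """Standard deviation of the means of many samples of size n."""
    means = []
    for _ in range(n_samples):
        sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        means.append(sum(sample) / n)
    return statistics.pstdev(means)


observed_25 = sd_of_sample_means(25)
observed_100 = sd_of_sample_means(100)

# Standard error predicted by the formula SD / sqrt(n).
predicted_25 = POP_SD / math.sqrt(25)    # 3.0
predicted_100 = POP_SD / math.sqrt(100)  # 1.5

print(round(observed_25, 2), predicted_25)
print(round(observed_100, 2), predicted_100)
```

The simulated spread of the sample means should land close to SD / sqrt(n), and it shrinks as the sample size grows, so larger samples give means that sit closer to the true population mean.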
Which t-test has more power to find an effect given that everything else is equal?
Repeated measures vs independent measures
repeated measures t-test:
- When the same participants are used across conditions the unsystematic variance (often called the error variance) is reduced dramatically, making it easier to detect any systematic variance
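To see why, the sketch below computes both t statistics by hand for the same hypothetical scores, once treating them as paired (the same five participants in both conditions) and once as two independent groups. The data are invented, and comparing t values on one data set is only an illustration, not a formal power calculation.

```python
import math
import statistics

# Hypothetical scores in two conditions.
condition_a = [10, 12, 14, 16, 18]
condition_b = [11, 14, 15, 18, 20]
n = len(condition_a)

mean_diff = statistics.mean(condition_b) - statistics.mean(condition_a)

# Repeated measures: work with each participant's difference score, so
# stable differences between participants drop out of the error term.
diffs = [b - a for a, b in zip(condition_a, condition_b)]
se_paired = statistics.stdev(diffs) / math.sqrt(n)
t_paired = mean_diff / se_paired

# Independent measures: the error term includes all of the variation
# between participants within each group (pooled variance).
pooled_var = (statistics.variance(condition_a)
              + statistics.variance(condition_b)) / 2
se_independent = math.sqrt(pooled_var * (2 / n))
t_independent = mean_diff / se_independent

print(round(t_paired, 2), round(t_independent, 2))
```

The mean difference is identical in both analyses; only the error term changes, which is why the paired t comes out much larger here.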