Final Exam Flashcards
What is the concept of a population?
An entire group of individuals
Ex. All voters in the United States
What is the concept of a sample?
- Usually populations are too large to examine the entire group, so a smaller sample is taken to represent the population
- Goal is to use sample results to answer questions about a population (inferential statistics)
- Sample statistics are not perfect representatives of population parameters (this discrepancy is called sampling error)
Know how to define and identify a nominal variable
- Unordered set of categories that are identified by their different names
- Measurements can label and categorize observations, but do not make any quantitative distinctions between observations
- Only determination you can make is whether two individuals are the same or different on that variable
Ex. favorite ice cream flavor
Know how to define and identify an ordinal variable
- An ordered set of categories
- Tells you the direction of difference between two individuals (but not the size of said difference)
Ex. class rank, place in a race, large vs. small drink
Know how to define and identify an interval variable
- An ordered series of equal-sized categories
- Identifies the direction and magnitude of a difference
- Zero point is located arbitrarily, so a score of zero does not mean none of the thing being measured
Ex. Temperature in °C or °F
Know how to define and identify a ratio variable.
- Ordered series of equal sized categories
- Can identify the direction and magnitude of a difference
- Zero indicates none of the thing
Ex. Temperature in K, distance
Know how to define and identify a correlational study
- Goal is to determine the strength and direction of the relationship between two variables
- Uses observations of the two variables as they exist naturally
- Correlation cannot determine causation
Know how to define and identify an experimental study
- Examine the relationship between 2 or more variables by changing one variable and observing the effects on the other variable
- To establish a cause and effect relationship between the two variables, an experiment attempts to control all other variables to prevent them from influencing the results
Know how to define and identify a nonexperimental study
- Compare groups of scores but do not use a manipulated variable to differentiate groups
- Therefore, no causal determinations can be made
When given a dataset, know how to compute: the mode
- The mode is the most frequently occurring score or class interval in the distribution
- In a frequency distribution graph, the mode corresponds to the high point of the distribution
- Can be measured for data measured on any scale of measurement; is the only measure of central tendency that can be used for data measured on a nominal scale
- The term is also used more generally for a peak in a distribution that is not necessarily the highest point (major mode at the highest peak, minor mode at a secondary peak; used when the distribution is clearly humped)
- Possible to have more than one mode: 2 = bimodal, 3+ = multimodal
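A minimal sketch of finding the mode, assuming made-up scores; the helper name `modes` is just for illustration. It returns every score tied for the highest frequency, so a bimodal set gives two values, and it also works on nominal data.

```python
from collections import Counter

def modes(scores):
    """Return every score tied for the highest frequency, in sorted order."""
    counts = Counter(scores)            # frequency of each distinct score
    highest = max(counts.values())      # the tallest point of the distribution
    return sorted(score for score, count in counts.items() if count == highest)

scores = [2, 3, 3, 5, 7, 7, 8]          # made-up data: 3 and 7 each occur twice
print(modes(scores))                    # [3, 7], so this set is bimodal

flavors = ["vanilla", "chocolate", "vanilla"]   # nominal data has a mode too
print(modes(flavors))                   # ['vanilla']
```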
When given a dataset, know how to compute: the median
- The median divides the scores so that 50% have values equal to or less than the median
- If scores are listed smallest to largest, the median is the midpoint of the list
- Requires scores that can be placed in rank order and measured on an ordinal, interval, or ratio scale
- If odd # scores, median is the middle score… if even # scores, median is the sum of the 2 middle scores divided by 2
- The median is relatively unaffected by extreme scores, so it tends to stay in the center of the distribution even when there are a few extreme scores or the distribution is very skewed. In these situations, the median serves as a good alternative to the mean
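The odd/even rule can be sketched as follows, assuming made-up scores; the function name `median` is illustrative, not taken from any course material.

```python
def median(scores):
    """Middle score of the ranked list; average of the two middle scores if n is even."""
    ordered = sorted(scores)            # list the scores smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                      # odd number of scores: take the middle one
        return ordered[middle]
    # even number of scores: sum of the 2 middle scores divided by 2
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([3, 1, 7, 5, 9]))          # 5
print(median([3, 1, 7, 5]))             # 4.0
print(median([3, 1, 7, 5, 900]))        # 5, the extreme score does not move it
```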
When given a dataset, know how to compute: the mean
- The mean is calculated by computing the sum of the entire set of scores, and dividing this sum by the number of scores
- Most commonly used measure of central tendency
- Can be used for ordinal, interval, or ratio scales (best for interval and ratio)
- Conceptually, the mean can also be defined as the balance point of the distribution (sum of the distances below the mean is exactly equal to the sum of the distances above the mean)
- Changing the value of any score will always change the mean. Discarding or adding new scores will almost always change the mean (unless you discard or add a score that is equal to the mean)
- If a constant value is added to or subtracted from every score, the mean changes by that same constant. Likewise, if every score is multiplied or divided by a constant, the mean is multiplied or divided by that constant
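A short sketch of the mean and the properties listed above, assuming made-up scores. It checks the balance-point idea and what happens when a constant is added to or multiplied into every score.

```python
def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

scores = [2, 4, 6, 8]                   # made-up data
m = mean(scores)
print(m)                                # 5.0

# Balance point: total distance below the mean equals total distance above it.
below = sum(m - x for x in scores if x < m)   # 3 + 1
above = sum(x - m for x in scores if x > m)   # 1 + 3
print(below == above)                   # True

# Adding 3 to every score adds 3 to the mean; doubling every score doubles it.
print(mean([x + 3 for x in scores]))    # 8.0
print(mean([x * 2 for x in scores]))    # 10.0
```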
Know under what circumstances the mean does not provide a representative value
- When the distribution contains a few extreme scores (like US income)
- Or is very skewed… the mean will be pulled toward the tail or toward the extreme scores (the mean will not provide a central value)
- In a definitively humped distribution, the mean score may actually represent nobody in the distribution
- With data from a nominal scale, it is impossible to compute a mean
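To see these cases with numbers, here is a small sketch using Python's standard `statistics` module. The income figures and the two-humped set are made up for illustration.

```python
import statistics

# Hypothetical incomes in thousands of dollars (made-up numbers).
incomes = [30, 35, 40, 45, 50]
with_extreme = incomes + [1000]         # add one extreme score

# Without the extreme score, mean and median are both 40.
print(statistics.mean(incomes), statistics.median(incomes))
# With it, the mean is pulled up to 200 while the median only moves to 42.5.
print(statistics.mean(with_extreme), statistics.median(with_extreme))

# A two-humped set: the mean lands in the gap where no score exists.
humped = [1, 2, 2, 9, 9, 10]
humped_mean = statistics.mean(humped)   # 33 / 6 = 5.5
print(humped_mean in humped)            # False
```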
Know how to identify the different shapes of distribution graphs: Symmetrical
- Left side is roughly a mirror image of the right (can be a normal curve, can be bimodal if the two peaks are mirror images)
Know how to identify Positive and Negative skew
- Skewed distribution: scores pile up on one side of the distribution
- Leave a “tail” of a few extreme values on the other side
- Positive skew: scores pile up on the left side with the tail pointing right
- Negative skew: scores pile up on the right side with the tail pointing left
Know how to identify bimodal distribution graph
- Clearly has two humps, or two peaks
Relationships between mean, median, and mode
- The three are often systematically related because they all measure central tendency
- Ex. In a symmetrical distribution, mean=median=mode (if there is one mode)
- Ex. in skewed distribution: mode located at the peak on one side, mean usually displaced toward the tail on other side, median usually located between the mean and the mode
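A quick sketch of these relationships, using the standard `statistics` module and two made-up datasets (one symmetrical, one positively skewed). The helper name `summarize` is illustrative.

```python
import statistics

def summarize(scores):
    """Return (mean, median, mode) for a set of scores with a single mode."""
    return (statistics.mean(scores),
            statistics.median(scores),
            statistics.mode(scores))

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]         # made-up, mirror image around 3
positive_skew = [1, 1, 1, 2, 2, 3, 4, 6, 16]    # made-up, tail pointing right

print(summarize(symmetric))        # mean, median, and mode are all 3

mean, median, mode = summarize(positive_skew)
# Mode (1) sits at the peak, mean (4) is pulled toward the tail, median (2) is between.
print(mode < median < mean)        # True
```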
What do the notations S and σ mean?
- S=sample standard deviation
- σ=population standard deviation
Know how to calculate the sum of squares given a dataset
- Find the mean of the dataset
- Subtract the mean from each score (find the deviations)
- Square each deviation
- Add up the squared deviations (this is the sum of squares)!
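The steps above can be sketched as follows, assuming made-up scores; the function name `sum_of_squares` is illustrative.

```python
def sum_of_squares(scores):
    """SS: the sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)            # step 1: find the mean
    deviations = [x - mean for x in scores]     # step 2: score minus mean
    squared = [d ** 2 for d in deviations]      # step 3: square each deviation
    return sum(squared)                         # step 4: add them up

scores = [1, 3, 5, 7]   # made-up data with mean 4
# deviations are -3, -1, 1, 3 and squared deviations are 9, 1, 1, 9
print(sum_of_squares(scores))   # 20.0
```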
Know how to calculate the variance given a dataset
- Find the sum of squares
- Divide the sum of squares by n-1 (sample) or N (population)
- This value is the variance (s squared for a sample, σ squared for a population)
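A minimal sketch of the variance calculation, assuming made-up scores. The `sample` flag is a hypothetical parameter added here to switch between dividing by n-1 and dividing by N.

```python
def variance(scores, sample=True):
    """Variance = SS / (n - 1) for a sample, SS / N for a population."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)   # sum of squares
    return ss / (n - 1) if sample else ss / n

scores = [1, 3, 5, 7]                    # made-up data, SS = 20
print(variance(scores))                  # sample variance: 20 / 3, about 6.67
print(variance(scores, sample=False))    # population variance: 20 / 4 = 5.0
```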
What is the difference between the standard deviation for a sample vs. a population?
- For samples, we divide the SS by n-1 when finding the variance
- For populations, we divide the SS by N when finding the variance
- In both cases, the standard deviation is the square root of the variance
- Dividing by n-1 inflates the estimate of variance, because dividing SS by n would typically underestimate the population variance (this bias is stronger with smaller samples, and dividing by df = n-1 corrects more strongly there)
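As a rough check of the difference, the sketch below computes both standard deviations by hand for made-up scores and compares them with `statistics.stdev` (divides by n-1) and `statistics.pstdev` (divides by N) from Python's standard library.

```python
import math
import statistics

scores = [1, 3, 5, 7]                    # made-up data, mean = 4, SS = 20
n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)

s = math.sqrt(ss / (n - 1))              # sample SD, about 2.58
sigma = math.sqrt(ss / n)                # population SD, about 2.24

# The standard library agrees with the hand calculation.
assert math.isclose(s, statistics.stdev(scores))
assert math.isclose(sigma, statistics.pstdev(scores))

print(s > sigma)                         # True: dividing by n-1 gives the larger value
```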