psyc test 3 Flashcards
What is statistics?
A branch of mathematics devoted to the collection, compilation, display and interpretation of numerical data.
What are the two types of statistics?
Inferential and descriptive
What is descriptive statistics?
Presenting, organizing and summarizing data
What is inferential statistics?
Drawing conclusions about a population based on data observed in a sample (hypothesis testing)
What do we need statistics for?
Description
Prediction
Causation
What is description?
aims to describe the prevalence of something (prevalence of heavy drinking)
What is Prediction?
aims to forecast likely outcomes
What is Causation?
aims to establish cause and effect
What are units? Where are they found?
the objects we are studying (people, companies, students)
Usually the rows in the data set
What are variables? Where are they found?
measurements that vary across people (height, weight)
Usually the columns
What is the dependent variable? What are other words for it?
The variable to be explained (outcome variable, response variable, primary endpoint)
What is the independent variable? What are other words for it?
Determinants of the dependent variable (explanatory variable, predictor variable, covariate)
What is the control variable? What is another word for it?
Any other variable that may plausibly alter the relationship between the IV and the DV (covariate)
What are the 4 types of measurements/variables?
Nominal, ordinal, continuous-interval, continuous-ratio
What are nominal variables and give an example?
The data are categorized with no inherent order (hair colour)
What are ordinal variables and give an example?
The data are categorized and ranked (score on a Likert scale, year of university)
What is interval data and give an example?
Data that is continuous, ranked and evenly spaced (test scores)
What is ratio data and give an example?
Data is continuous, evenly spaced and has a natural zero (height)
What is the first step to describing data?
Categorize and sort it
What is a frequency table? And how are they formatted?
A table that lists each value and the number of times it appears
Tables are listed from highest to lowest value and each value is included even if its frequency is zero
What is relative frequency?
A proportion: divide the frequency by the total n
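A rough sketch of how the frequency and relative frequency columns could be built in Python. The scores are made up for illustration, the function name frequency_table is an assumption (not from the course), and the code assumes whole-number scores.

```python
from collections import Counter

def frequency_table(scores):
    """Return rows of (value, frequency, relative frequency), highest value first."""
    counts = Counter(scores)
    n = len(scores)
    rows = []
    # Walk from the highest to the lowest value so that values
    # with a frequency of zero still get a row.
    for value in range(max(scores), min(scores) - 1, -1):
        f = counts.get(value, 0)
        rows.append((value, f, f / n))  # relative frequency = f / n
    return rows

# Hypothetical quiz scores, invented for illustration.
scores = [5, 3, 4, 5, 1, 3, 5, 4]
for value, f, rel in frequency_table(scores):
    print(value, f, round(rel, 3))
```

The relative frequencies in the last column should add up to 1.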
What are grouped frequency tables and why are they used?
Used when the data are too spread out (too many zero frequencies). We create grouped intervals that have equal width and always start at a multiple of the width
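One possible way to build a grouped frequency table in Python, assuming whole-number scores. The scores, the width of 10, and the function name grouped_frequency_table are all illustrative assumptions.

```python
from collections import Counter

def grouped_frequency_table(scores, width):
    """Count scores in equal-width intervals whose lower limits are multiples of width."""
    # Floor division maps each score to the lower limit of its interval.
    counts = Counter((s // width) * width for s in scores)
    lowest = (min(scores) // width) * width
    highest = (max(scores) // width) * width
    rows = []
    # List intervals from highest to lowest, keeping empty ones.
    for lower in range(highest, lowest - 1, -width):
        rows.append((lower, lower + width - 1, counts.get(lower, 0)))
    return rows

# Hypothetical exam scores, invented for illustration.
scores = [52, 67, 71, 88, 93, 58, 74, 79]
for lower, upper, f in grouped_frequency_table(scores, width=10):
    print(f"{lower}-{upper}: {f}")
```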
What is a measure of central tendency? what are the measures of central tendency?
The value of a “typical” observation. Mean, mode, median
What is the mode?
most common value
What are the 4 types of modes you can have?
unimodal, bimodal, multimodal, amodal
What type of data can you use for median?
Since the data have to be ordered, it can only be used for ordinal, interval or ratio data
What type of data can you use for mean?
interval or ratio scales
What types of data are the mean, median and mode most appropriate for?
Mean is best for continuous
Median is best for ordinal
Mode is best for nominal
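A small sketch of the three measures of central tendency using Python's built-in statistics module. The scores are invented for illustration.

```python
import statistics

# Hypothetical scores, invented for illustration.
scores = [2, 3, 3, 4, 5, 7, 11]

mean = statistics.mean(scores)        # sum of scores divided by n
median = statistics.median(scores)    # middle value once the scores are sorted
modes = statistics.multimode(scores)  # list of the most common value(s)

print(mean, median, modes)

# A bimodal example: two values tie for most common.
print(statistics.multimode([1, 1, 2, 2, 3]))
```

multimode returns a list, so a unimodal set gives one value and a bimodal set gives two.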
What is a histogram?
Plots frequency data for continuous variables. The bars all touch and a space means there is no data for that value
What is a bar chart? And what are they used for?
Used for ordinal or nominal data
Bars do not touch and represent categories. Shows the frequency. The order doesn’t matter for nominal
When will the mode, median and mean all be the same?
When the data is symmetrical and unimodal
What happens if you have a skewed distribution?
The mean will be pulled towards the skew and the median will be between the mode and the mean
What happens to the mean if there is an extreme skew (extreme outliers)?
It is biased towards the outliers
What type of skew extends out to the left?
negative
What type of skew extends out to the right?
positive
What are the measures of variability?
Range, SD, interquartile range
What is the range?
Difference between the lowest and the highest score
What is the inter-quartile range (IQR)?
A measure of variability suited to non-normally distributed data. It separates the data into 4 equal parts and takes the difference between the lower and upper quartiles
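A minimal sketch of the range and IQR in Python. The data are invented, and statistics.quantiles uses one particular quartile convention (its default "exclusive" method), so other software or hand calculations may give slightly different quartiles.

```python
import statistics

# Hypothetical scores, invented for illustration (note the one extreme value).
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 40]

data_range = max(data) - min(data)

# n=4 splits the data into 4 equal parts and returns the 3 cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

The extreme score of 40 stretches the range but leaves the IQR untouched, since the IQR only looks at the middle half of the data.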
What is standard deviation?
measure of how close values are to the mean
What does a low SD mean?
Values are close to the mean
What does a high SD mean?
values are spread out
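To illustrate low versus high SD, here is a short Python sketch with two invented data sets that share the same mean. It uses statistics.stdev, which is the sample SD (dividing by n - 1); whether the course uses n or n - 1 is not stated, so treat that choice as an assumption.

```python
import statistics

# Two hypothetical data sets with the same mean of 10.
close_to_mean = [9, 10, 11]
spread_out = [0, 10, 20]

low_sd = statistics.stdev(close_to_mean)   # sample SD (n - 1 in the denominator)
high_sd = statistics.stdev(spread_out)

print(statistics.mean(close_to_mean), low_sd)
print(statistics.mean(spread_out), high_sd)
```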
What is an outlier? Are all extreme scores bad?
An extreme score that is much higher or lower than the rest of the scores. Some may be important to our data
What can cause outliers?
errors, misunderstandings, equipment failures
How are the mean, median and mode affected by outliers?
Mean is most sensitive and will be pulled towards the outlier
Median is not really impacted unless there are many outliers
Mode is not impacted
How are the range and SD affected by outliers?
SD will be larger
Range will be greatly impacted
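The effect of a single outlier on each statistic can be checked with a quick Python sketch. The scores and the helper name summarize are illustrative assumptions, and the SD here is the sample SD.

```python
import statistics

def summarize(scores):
    """Collect the measures of central tendency and variability for one data set."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "range": max(scores) - min(scores),
        "sd": statistics.stdev(scores),
    }

# Hypothetical scores, then the same scores plus one extreme value.
typical = [4, 5, 5, 6, 7]
with_outlier = typical + [50]

before = summarize(typical)
after = summarize(with_outlier)
for key in before:
    print(key, round(before[key], 2), "->", round(after[key], 2))
```

In this example the mean, range and SD jump, the median barely moves, and the mode does not change.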
What is a z-score? What are the units?
How far an individual score is from the mean. Measures exactly how many standard deviations above or below the mean a data point is.
Z-score is standardized so it has no units
What can Z scores be used for?
to determine if a value is an outlier or not
What is the formula for Z-score?
z = (X - M) / SD
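A worked example of the z-score formula as a Python sketch. The numbers are made up, and the sample SD from statistics.stdev is an assumption about which SD is intended.

```python
import statistics

def z_score(x, mean, sd):
    """z = (X - M) / SD: how many SDs a score sits above or below the mean."""
    return (x - mean) / sd

# A score of 85 when the mean is 70 and the SD is 10.
print(z_score(85, 70, 10))  # 1.5

# z-scores for a whole (hypothetical) set of scores.
scores = [60, 70, 80]
m = statistics.mean(scores)
sd = statistics.stdev(scores)
z_values = [z_score(x, m, sd) for x in scores]
print(z_values)
```

A positive z means the score is above the mean and a negative z means it is below.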
Why is the normal distribution important?
Statistical test assumptions
Method selection between parametric and non-parametric methods
Data transformations
What tests assume that data follows a normal distribution?
t-test, ANOVA
What are parametric vs non-parametric methods?
Parametric methods assume normality; non-parametric methods do not
What is data transformation?
transforming data to meet normality assumptions
What percentage of data is within 1 SD, 2SD, or 3 SD?
68% within 1SD
95% within 2SD
99.7% within 3SD
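These percentages can be checked against the standard normal curve with Python's statistics.NormalDist. This is only a quick check; the helper name share_within is an assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k) * 100:.1f}%")
```

The exact values are roughly 68.3%, 95.4% and 99.7%, which the flashcard rounds to 68, 95 and 99.7.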
What is skewness?
Whether the data is distributed symmetrically around the mean. Describes asymmetry of distribution
What does 0 represent for skewness?
What does positive represent for skewness?
What does negative represent for skewness?
Draw Skewness
0= perfect symmetry
negative = left skew
positive = right skew
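A minimal sketch of a skewness score in Python, using the simple moment-based formula (third central moment divided by the SD cubed). This is an assumption about the formula; statistics packages often apply a small-sample adjustment, so their values can differ slightly. The data sets are invented.

```python
def skewness(data):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]           # long tail to the right
left_skewed = [-x for x in right_skewed]     # mirror image: long tail to the left

print(round(skewness(symmetric), 2))     # zero: symmetric
print(round(skewness(right_skewed), 2))  # positive: right skew
print(round(skewness(left_skewed), 2))   # negative: left skew
```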
What is kurtosis?
Whether data is peaked or flat. The heaviness of a distribution's tails relative to a normal distribution
When is kurtosis high? when is it low?
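High when the distribution is sharply peaked with heavy tails (scores cluster near the mean, with more extreme values than a normal distribution)
Low when the distribution is flat with light tails (scores are spread more evenly)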
What is platykurtic?
What is mesokurtic?
What is leptokurtic?
Platykurtic: kurtosis < 3 = flat
Mesokurtic: kurtosis = 3 = normal
Leptokurtic: kurtosis > 3 = tall
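A rough sketch of a kurtosis score in Python on the scale where a normal distribution equals 3 (fourth central moment divided by the squared variance). The data are invented. Note that some software reports excess kurtosis (this value minus 3), so a normal distribution shows up as 0 there.

```python
def kurtosis(data):
    """Fourth central moment divided by the squared variance (normal = 3)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2

flat = [1, 2, 3, 4, 5]                               # evenly spread, no tails
heavy_tailed = [-10, -1, 0, 0, 0, 0, 0, 0, 1, 10]    # peak at 0 plus two extremes

print(round(kurtosis(flat), 2))          # below 3: platykurtic
print(round(kurtosis(heavy_tailed), 2))  # above 3: leptokurtic
```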
What are the 2 ways to assess normality?
Visually or with statistical analyses
What are the two ways to visually assess normality?
Using a histogram or Q-Q-plots
What do Q-Q plots show?
If data is normal, the points in a Q-Q plot will follow the diagonal line. If it is skewed, they will deviate from it
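To show what a Q-Q plot is built from, here is a sketch that computes the plotted points in Python without drawing them. The data, the function name qq_points, and the plotting position (i - 0.5) / n are assumptions; software packages use slightly different conventions.

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    n = len(data)
    # Normal distribution with the same mean and SD as the sample.
    fitted = NormalDist(mean(data), stdev(data))
    points = []
    for i, observed in enumerate(sorted(data), start=1):
        p = (i - 0.5) / n  # plotting position: one common convention
        points.append((fitted.inv_cdf(p), observed))
    return points

# Hypothetical measurements, invented for illustration.
data = [4.1, 5.0, 5.2, 5.9, 6.3, 7.0, 7.5]
for expected, observed in qq_points(data):
    print(round(expected, 2), observed)
```

If the data are close to normal, each expected value is close to the observed value, which is what "following the diagonal line" means on the plot.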
What are types of statistical analyses that can assess normality?
Skewness score
Kurtosis score
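Shapiro-Wilk test for normality
What is the Shapiro-Wilk test for normality?
A statistical test of whether the data differ significantly from a normal distribution. A result of p > .05 means the data are not significantly different from a normal distribution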
What happens to power when there are more participants? What is the risk?
Power increases with more participants, so it becomes more likely to detect small or subtle effects. The risk is the greater sensitivity to any signal: with a very large sample, even trivial differences or random noise can reach statistical significance and be mistaken for a meaningful effect.
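The link between sample size and power can be sketched in Python with a simplified case: a two-sided one-sample z-test with a known SD. The test choice, the effect size of 0.2 SD, the alpha of .05 and the function name z_test_power are all assumptions made for illustration.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test (SD assumed known).

    effect_size is the true difference from the null value, in SD units.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value, about 1.96 for alpha = .05
    shift = effect_size * n ** 0.5      # how far the true effect moves the test statistic
    # Probability of landing beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

# Power to detect a small effect (0.2 SD) at different sample sizes.
for n in (25, 100, 400):
    print(n, round(z_test_power(0.2, n), 2))
```

Power climbs steadily as n grows, so with enough participants even a very small effect becomes likely to be detected.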
What should determine your distribution model?
The nature of the data and the phenomenon it represents