Descriptive questions: Data Flashcards
What is descriptive statistics?
Descriptive statistics describe and summarize the structure of the data you have (a sample or an entire population), for example with frequency tables, graphs, and summary measures.
Keywords: related, correlated, difference
What is testing statistics?
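Testing (inferential) statistics is about the population: a sample is taken from the entire population, and the sample results are used to draw conclusions (test hypotheses) about the whole population.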
What are the 3 questions of Stephen Toulmin’s model of argumentation?
Claim: What is your decision or conclusion?
Grounds: On what data is your decision or conclusion based?
Warrant: Why is your decision justified, given the data you presented?
Approach for analyzing and creating effective arguments
Numerical methods when describing datasets
What is a frequency table?
- The basis for describing any data
- Can be used for any variable on any measurement level
- Not very useful for variables with many categories (or many distinct values)
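A minimal sketch of a frequency table in Python, assuming hypothetical answers on a 5-point scale. The function name `frequency_table` is illustrative only. Cumulative percentages are only meaningful from the ordinal level upward; for nominal data the sort order is arbitrary.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency, percentage, cumulative percentage) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        freq = counts[value]
        cumulative += freq
        rows.append((value, freq, 100 * freq / n, 100 * cumulative / n))
    return rows

# Hypothetical answers on a 5-point ordinal scale
answers = [3, 4, 4, 2, 5, 4, 3, 3, 4, 1]
for value, freq, pct, cum_pct in frequency_table(answers):
    print(f"{value:>5} {freq:>4} {pct:6.1f}% {cum_pct:6.1f}%")
```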
Numerical methods when describing datasets
What are the measures of central tendency and variability?
Central tendency: mean, median, mode
Variability: range, variance, standard deviation
Characteristics to summarize information
Measurement levels
What are the measurement levels?
Nominal: distinct categories without order; with only 2 categories: binary
Ordinal: distinct categories with a ranked order
Interval: ranked order with equal spacing between values, but no true zero
Ratio: ranked order, equal spacing, and a true zero
Nominal and ordinal = categorical
Interval and ratio = quantitative
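Because each level has all the properties of the levels below it, the levels can be treated as an ordered scale. The sketch below assumes that ordering and takes the minimum level for each measure from the central tendency and dispersion cards in this deck; `Level`, `MINIMUM_LEVEL`, and `allowed_measures` are hypothetical names.

```python
from enum import IntEnum

class Level(IntEnum):
    # Ordered so that each level has all the properties of the levels below it
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Lowest measurement level at which each measure may be used (as stated on these cards)
MINIMUM_LEVEL = {
    "mode": Level.NOMINAL,
    "median": Level.ORDINAL,
    "range": Level.ORDINAL,
    "IQR": Level.ORDINAL,
    "mean": Level.INTERVAL,
    "variance": Level.INTERVAL,
    "standard deviation": Level.INTERVAL,
}

def allowed_measures(level):
    """List the measures that may be used for a variable at the given level."""
    return [measure for measure, minimum in MINIMUM_LEVEL.items() if level >= minimum]

def is_quantitative(level):
    """Interval and ratio variables are quantitative; nominal and ordinal are categorical."""
    return level >= Level.INTERVAL

print(allowed_measures(Level.ORDINAL))
```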
Measurement levels
What are continuous and discrete data?
Continuous: can take any value within a range, so there are infinitely many possible values (e.g. height, time)
Discrete: can take only separate, countable values (often whole numbers, e.g. number of children)
Measurement levels
What is a measurement error?
A discrepancy between the numbers we use to represent the thing we’re measuring and the actual value of the thing we’re measuring
Central tendency
What are the Mode, Median, and Mean?
Mode: the score that occurs most frequently; can be used for any measurement level (two modes = bimodal, several modes = multimodal)
Median: the middle value when the values are sorted in ascending order (with an even number of values, the mean of the two middle values); can be used for ordinal, interval, and ratio variables
Mean: the average of a quantitative dataset (the sum of all values divided by their number); can be used for interval and ratio variables; strongly influenced by outliers
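A small sketch with Python's standard `statistics` module (Python 3.8+ for `multimode`), using hypothetical scores. It shows a bimodal dataset and how a single outlier moves the mean much more than the median.

```python
import statistics

scores = [2, 3, 3, 4, 5, 5, 6]  # hypothetical scores: 3 and 5 both occur twice
print(statistics.multimode(scores))  # all modes, so a bimodal set returns two values
print(statistics.median(scores))     # middle (4th) value of the sorted list
print(statistics.mean(scores))

# Adding one extreme value: the median shifts slightly, the mean jumps
with_outlier = scores + [60]
print(statistics.median(with_outlier))  # even count: mean of the two middle values
print(statistics.mean(with_outlier))
```

Here the outlier moves the median from 4 to 4.5, while the mean nearly triples.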
Dispersion (variability)
What are the Range, Interquartile range (IQR), Variance, and Standard deviation?
Range: difference between the largest and smallest value (there is an observed and a theoretical range); can be used for ordinal, interval, and ratio variables
Interquartile range: difference between Q3 and Q1 (the spread of the middle 50% of the values); only usable when a median can be determined, so for ordinal, interval, and ratio variables
Variance: roughly the average squared deviation of the observed values from the mean. First calculate the mean, subtract it from each observed value, square these deviations, and add them up (the sum of squares). For a sample, divide the sum of squares by the degrees of freedom: s² = Σ(xᵢ − x̄)² / (n − 1). Can be used for interval and ratio variables
Standard deviation: the square root of the variance, expressed in the original units; requires a mean, so only for interval and ratio variables
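A sketch of the four dispersion measures for a hypothetical sample, computing the variance by hand with the steps on this card and checking it against `statistics.variance`. Note that textbooks and software use slightly different quartile definitions; `statistics.quantiles` defaults to its "exclusive" method, so Q1, Q3, and the IQR may differ a little from a hand calculation.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical sample

# Range: largest minus smallest observed value
data_range = max(data) - min(data)

# IQR: Q3 - Q1 (quantiles() returns the three cut points Q1, Q2, Q3 for n=4)
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Variance by hand: mean, deviations, squares, sum of squares, divide by n - 1
mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]
sum_of_squares = sum(squared_deviations)
variance = sum_of_squares / (len(data) - 1)  # degrees of freedom for a sample

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

print(data_range, iqr, variance, std_dev)
```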
Shape of the distribution: skewness
When do you have a positive/negative skew?
Positive skew: long tail to the right; the mean is usually higher than the median
Negative skew: long tail to the left; the mean is usually lower than the median
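As an illustration, the sketch below compares mean and median for a hypothetical right-tailed dataset and computes a moment-based skewness coefficient (g1 = m3 / m2^1.5, without small-sample correction). This coefficient is one common choice, assumed here; software packages may apply a bias correction.

```python
import statistics

def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 4, 10]  # hypothetical: one long tail to the right
print(statistics.mean(right_tail), statistics.median(right_tail))
print(skewness(right_tail))  # positive: the tail lies to the right
```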
Shape of the distribution: kurtosis
What is kurtosis?
A measure of the shape of a distribution: how heavy its tails are compared with a normal distribution
Positive kurtosis: many scores in the tails, pointy peak: leptokurtic distribution
Negative kurtosis: few scores in the tails, flatter: platykurtic distribution
Mesokurtic distribution: kurtosis equal to that of the normal distribution (excess kurtosis = 0)
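A sketch of moment-based excess kurtosis (g2 = m4 / m2² − 3, so a normal distribution scores 0), applied to two hypothetical datasets. As with skewness, statistical software may report a bias-corrected variant.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]            # evenly spread: platykurtic (negative)
heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 10]  # peaked centre, extreme tails: leptokurtic (positive)
print(excess_kurtosis(flat), excess_kurtosis(heavy_tails))
```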
Chebyshev and the empirical rule
What is Chebyshev’s rule?
Chebyshev's rule applies to any distribution, including a skewed one for which you cannot say exactly how many observations lie within one standard deviation of the mean. It only gives lower bounds: at least 1 − 1/k² of the observations lie within k standard deviations of the mean.
At least 0% of the observations lie within 1 standard deviation of the mean
At least 75% of the observations lie within 2 standard deviations of the mean
At least 88.9% of the observations lie within 3 standard deviations of the mean
(The shape of the distribution does not matter)
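A sketch that computes Chebyshev's lower bound 1 − 1/k² and checks it against a hypothetical, strongly right-skewed dataset. It uses the population standard deviation (`statistics.pstdev`) of the data; `chebyshev_bound` and `share_within` are illustrative names.

```python
import statistics

def chebyshev_bound(k):
    """Minimum share of observations within k standard deviations of the mean (meaningful for k > 1)."""
    return 1 - 1 / k ** 2

def share_within(values, k):
    """Actual share of values within k (population) standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(abs(x - mean) <= k * sd for x in values) / len(values)

skewed = [1, 1, 1, 2, 2, 3, 4, 6, 9, 25]  # hypothetical, strongly right-skewed
for k in (2, 3):
    print(k, round(chebyshev_bound(k), 3), share_within(skewed, k))
```

The actual share is always at least the bound, whatever the shape of the data.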
Chebyshev and the empirical rule
What is the empirical rule?
The empirical rule only applies when the distribution is (approximately) normal: symmetric and bell-shaped
About 68% of the observations lie within 1 standard deviation of the mean
About 95% of the observations lie within 2 standard deviations of the mean
About 99.7% of the observations lie within 3 standard deviations of the mean
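The rounded percentages can be checked with `statistics.NormalDist` (Python 3.8+), which gives the exact share of a normal distribution between mean − k·SD and mean + k·SD. The function name `normal_share_within` is hypothetical.

```python
from statistics import NormalDist

def normal_share_within(k, dist=NormalDist(mu=0.0, sigma=1.0)):
    """Exact share of a normal distribution within k standard deviations of its mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

for k in (1, 2, 3):
    print(f"within {k} SD: {normal_share_within(k):.2%}")
```

The result does not depend on the mean or standard deviation chosen, which is why the rule can be stated for any normal distribution.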