Introduction and Descriptive Statistics Flashcards
What is Descriptive Statistics?
A way of describing the data distribution derived from the sample
Used in tabulating, summarizing, and describing data
What models are used to capture and simplify the sample data distribution?
Central tendency
Variance
Shape
Can the models be used to describe the population from which the data were sampled?
Yes, if the sample is representative of the population and is sufficiently large
What are statistical models?
They are simplifications of reality
They are imperfect, but can be valuable for understanding and prediction
What are the goals of statistics?
Summary of salient characteristics (description) - central tendency (expected value), variability (variance), shape of distribution (skew)
Estimation - infer an unknown parameter of a population using sample data via a probability function
Hypothesis testing - differences among groups (comparative) and relationships among variables (associative)
Are population parameters constants at a fixed point in time?
Yes
Statistics are estimates that vary from one sample to another, even when the samples come from the same population
Observations from the sample, as well as the summary statistics generated from the sample, are assumed to be random variates
What is a random variate?
A particular outcome of a random process; repeated draws from the same process give different outcomes (the source of the margin of error)
What are inferential statistics?
Used to estimate characteristics (parameters) of a population based on data measured in a (representative) sample from the population
Is the standard deviation the square root of the variance?
Yes
What does capital sigma (Σ) mean?
Summing a set of values
What are the measures of central tendency?
Mode (most frequently observed)
Mean (sum of scores divided by number of scores)
Median (middle score when scores are in rank order, 50th percentile)
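The three measures above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `scores` list is a made-up sample, not data from the course.

```python
import statistics

# Hypothetical sample of 7 scores (illustration only)
scores = [2, 3, 3, 5, 7, 8, 14]

mode = statistics.mode(scores)      # most frequently observed score
mean = statistics.mean(scores)      # sum of scores divided by number of scores
median = statistics.median(scores)  # middle score when scores are in rank order

print("mode:", mode, "mean:", mean, "median:", median)
```

In this sample the single large score (14) pulls the mean above the median, while the mode stays at the most common value.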
What are the measures of variability?
Range
Interquartile range (IQR)
Sum of squares (variance and standard deviation)
Coefficient of variation
What is range?
Maximum - minimum scores
Very gross descriptor, but typically reported for comparative purposes
What is interquartile range?
75th percentile (P75) - 25th percentile (P25)
Boundaries of the middle 50% of the distribution
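A short sketch of the range and the interquartile range, using a made-up `scores` list. Percentiles can be computed under several conventions; this example assumes the default ("exclusive") method of `statistics.quantiles`, so other software may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical sample of 11 scores (illustration only)
scores = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 21]

# Range: maximum score minus minimum score
score_range = max(scores) - min(scores)

# Quartiles: P25, P50 (median), P75 under the default "exclusive" method
q1, q2, q3 = statistics.quantiles(scores, n=4)

# IQR: P75 minus P25, the width of the middle 50% of the distribution
iqr = q3 - q1

print("range:", score_range, "P25:", q1, "P75:", q3, "IQR:", iqr)
```

The extreme score of 21 inflates the range, but it does not affect the IQR, which depends only on the middle half of the data.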
What do the variance and the standard deviation tell you?
How much the scores vary (spread out) around the mean
Both are computed from deviations of the scores from the mean
What is the standard deviation?
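The square root of the variance, i.e. the square root of the average squared difference between the scores and the mean
Interpreted as the typical distance of scores from the mean, in the original units of measurement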
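The path from the sum of squares to the variance and the standard deviation can be traced step by step. This is a sketch on a made-up `scores` list; it assumes the sample formulas, which divide by n - 1 rather than n.

```python
import math
import statistics

# Hypothetical sample of 8 scores (illustration only)
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n

# Sum of squares: squared deviations from the mean, added together
sum_of_squares = sum((x - mean) ** 2 for x in scores)

# Sample variance divides the sum of squares by n - 1
sample_variance = sum_of_squares / (n - 1)

# Standard deviation is the square root of the variance
sample_sd = math.sqrt(sample_variance)

# Cross-check against the standard library's sample formulas
print(sample_variance, statistics.variance(scores))
print(sample_sd, statistics.stdev(scores))
```

Because the deviations are squared before averaging, the standard deviation is not the same quantity as the mean absolute deviation, although both describe spread around the mean.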
What do narrow confidence intervals mean?
The narrower the interval, the more precisely we have estimated the population parameter
The width of the confidence interval is inversely related to the size of the sample: larger samples give narrower intervals
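The link between sample size and interval width can be illustrated with a small sketch. It assumes a confidence interval for a mean built from the normal approximation (mean ± z × SD / √n) with the SD held fixed; `ci_width` is a hypothetical helper written for this example.

```python
import math
from statistics import NormalDist

def ci_width(sd, n, confidence=0.95):
    """Width of a normal-approximation confidence interval for a mean.

    Hypothetical helper: assumes the interval is mean +/- z * sd / sqrt(n).
    """
    # Critical z value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sd / math.sqrt(n)

# Same SD, two sample sizes (illustrative numbers)
width_25 = ci_width(sd=10, n=25)
width_100 = ci_width(sd=10, n=100)

print(round(width_25, 2), round(width_100, 2))
```

Under these assumptions the width shrinks with the square root of n, so quadrupling the sample size halves the interval.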
What are the degrees of freedom?
The number of independent values that can be estimated in a statistical analysis
How many values can be freely chosen before a constraint must be put in place
If a data set has 10 values and its mean is fixed, 9 of the values are free to vary, but the 10th value is determined
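The 10-value example can be made concrete. This sketch assumes the constraint is a fixed sample mean (the usual case for the variance); the numbers are made up.

```python
# Assumed constraint: the mean of all 10 values must equal 6
fixed_mean = 6
n = 10

# 9 values chosen freely (illustrative numbers)
free_values = [4, 8, 6, 5, 3, 7, 9, 5, 6]

# The 10th value is forced: the total must equal fixed_mean * n
last_value = fixed_mean * n - sum(free_values)

all_values = free_values + [last_value]
print("10th value:", last_value, "mean:", sum(all_values) / n)
```

Any other choice for the 10th value would change the mean, which is why only n - 1 = 9 degrees of freedom remain.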
What is the coefficient of variation?
Unit-free measure of the precision of an estimate
Useful for comparing the degree of variation (precision) from one distribution to another, even if means are very different
Ratio of the SD to the mean times 100
CV = (SD / x̄) × 100, where x̄ is the sample mean
Has the study with the smaller coefficient of variation estimated the population mean more precisely?
Yes
The data used to calculate the mean are less variable
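A sketch of the coefficient of variation applied to two made-up studies whose means differ by a factor of ten. `coefficient_of_variation` is a hypothetical helper, and the sample SD (n - 1 denominator) is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """CV as a percentage: (sample SD / mean) * 100. Hypothetical helper."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Two hypothetical studies measured on very different scales
study_a = [48, 50, 52, 49, 51]       # mean 50
study_b = [480, 520, 500, 450, 550]  # mean 500

cv_a = coefficient_of_variation(study_a)
cv_b = coefficient_of_variation(study_b)

print(round(cv_a, 2), round(cv_b, 2))
```

Because the CV is unit-free, the two studies can be compared directly: study A has the smaller CV, so its data are less variable relative to its mean.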
What is a symmetric distribution?
Mean=median=mode
What is a positive skew?
Mode<median<mean
Mean is most sensitive to skew
Tail is to the right
What is a negative skew?
Mean<median<mode
Tail to the left
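The ordering of mode, median, and mean under skew can be checked on small made-up data sets. The orderings hold for these examples; real data do not always follow them exactly.

```python
import statistics

def summarize(values):
    """Return (mode, median, mean) for a list of scores."""
    return (statistics.mode(values),
            statistics.median(values),
            statistics.mean(values))

# Positive skew: a few large scores form a tail to the right
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# Negative skew: mirror image of the data above, tail to the left
left_skewed = [16 - x for x in right_skewed]

r_mode, r_median, r_mean = summarize(right_skewed)
l_mode, l_median, l_mean = summarize(left_skewed)

print("positive skew:", r_mode, r_median, r_mean)
print("negative skew:", l_mean, l_median, l_mode)
```

In both cases the mean moves furthest toward the tail, which is why it is the measure most sensitive to skew.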
What is a normal distribution curve?
Also known as a Gaussian curve
Most important distribution in statistics
Many physical measures result in approximately normal distributions (height, weight, reaction times, etc.)
Many statistical procedures assume normality, so problems arise if the distribution is not normal
What characteristics do normal distributions have?
Unimodal
Symmetric
Can be described with 2 parameters (mu - population mean, and sigma - population SD)
Have tails that asymptotically approach the x axis: they get ever closer to it but never touch it
Do all normal density curves satisfy the empirical rule?
Yes, approximately:
About 68% of the observations fall within 1 standard deviation of the mean: between mu-1sigma and mu+1sigma
About 95% of the observations fall within 2 standard deviations of the mean: between mu-2sigma and mu+2sigma
About 99.7% of the observations fall within 3 standard deviations of the mean: between mu-3sigma and mu+3sigma
In normal distributions, do almost all values lie within 3 SDs of the mean?
Yes
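The empirical rule can be checked numerically with the normal cumulative distribution function. This sketch uses `statistics.NormalDist` from the standard library; `share_within` is a hypothetical helper, and the values mu = 100, sigma = 15 are illustrative only.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution within k SDs of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # The same proportions hold for any mu and sigma
    standard = share_within(k)
    rescaled = share_within(k, mu=100, sigma=15)
    print(k, round(standard * 100, 2), round(rescaled * 100, 2))
```

The proportions come out close to 68%, 95%, and 99.7% regardless of the mean and SD chosen, so almost all values lie within 3 SDs of the mean.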
What do z scores mean?
Observations expressed in terms of the number of standard deviation units from the mean
What are mu and sigma for the standard normal distribution?
mu = 0
sigma = 1
How do you compute the z score?
(xi - mu)/sigma
This gives the number of SDs from the mean; a further step (a standard normal table or the normal cumulative distribution function) is needed to convert it to a percentage
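Both steps, the z score and its conversion to a percentage, can be sketched as follows. The score of 130 and the parameters mu = 100, sigma = 15 are made-up values, and `z_score` is a hypothetical helper.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Step 1: express the observation in SD units
z = z_score(130, mu=100, sigma=15)

# Step 2: convert z to the percentage of the distribution below it,
# using the standard normal distribution (mu = 0, sigma = 1)
percent_below = NormalDist().cdf(z) * 100

print("z:", z, "percent below:", round(percent_below, 2))
```

Here the observation sits 2 SDs above the mean, which places it above roughly 97.7% of a normally distributed population.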
What do plots allow us to do?
View characteristics of the data
Detect oddities in the data
Understand relationships among variables
Make predictions