14. Descriptive Statistics Flashcards
descriptive statistics
The branch of statistics dealing with how to describe and summarize data.
How can I communicate the important characteristics of my data?
frequency distribution
a table or chart showing the unique values of a data set, along with how often each value occurs in the data set
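A frequency distribution can be sketched in a few lines of Python. This is only an illustration: the `colors` list is made-up data, and `Counter` is just one convenient way to tally values.

```python
from collections import Counter

# Hypothetical sample data, invented for this illustration.
colors = ["red", "blue", "red", "green", "blue", "red"]

# Counter maps each unique value to the number of times it occurs.
frequency = Counter(colors)

# List the unique values with their frequencies, most frequent first.
for value, count in frequency.most_common():
    print(value, count)
```

The frequencies always add up to the number of observations, which is a quick sanity check on any frequency table.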
bar graph
used to depict a frequency distribution of CATEGORICAL variables (space between bars)
histogram
used to depict a frequency distribution of a QUANTITATIVE variable (no space between bars)
mean
sum of all values divided by number of values
M = Σx / n (the mean is also written X̄)
median
centermost value when the set is ordered
mode
most frequent value in a set
mean, median & mode
measures of central tendency
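The three measures of central tendency can be compared on one small data set. The sketch below uses Python's standard `statistics` module; the `scores` list is hypothetical.

```python
import statistics

# Hypothetical data set, already in order to make the median easy to see.
scores = [2, 3, 3, 5, 7]

mean = statistics.mean(scores)      # sum of values / number of values = 20 / 5
median = statistics.median(scores)  # centermost value of the ordered set
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)
```

Here the mean is 4 while the median and mode are both 3, because the single large value (7) pulls the mean upward.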
nominal
a variable that can be CATEGORIZED, but not quantified.
ordinal
a variable that can be RANKED, but not quantified.
interval
a variable that can be QUANTIFIED, without a true relationship to 0
ratio
a variable that can be QUANTIFIED, where 0 indicates absence of quantity
variance
measure of spread: the average SQUARED distance of values from the mean, so it is expressed in squared units
standard deviation
measure of spread: the typical distance of values from the mean, in the original units (square root of the variance)
variance (formula)
Σ(x - M)^2 / n
standard deviation (formula)
(Σ(x - M)^2 / n)^(1/2)
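As a worked example of the two formulas, the sketch below computes the variance and standard deviation by hand, dividing by n as on the cards. The `values` list is hypothetical. Note that dividing by n gives the population version; many textbooks divide by n - 1 for a sample.

```python
import math

# Hypothetical data set with a mean of 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = sum(values) / n  # M = 40 / 8 = 5

# Variance: sum of squared distances from the mean, divided by n.
# Squared distances: 9, 1, 1, 1, 0, 0, 4, 16 (sum = 32), so 32 / 8 = 4.
variance = sum((x - mean) ** 2 for x in values) / n

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(variance, std_dev)
```

The standard library offers the same calculations as `statistics.pvariance` and `statistics.pstdev`.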
normal distribution
a symmetric, bell-shaped distribution in which about 68% of values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3
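The 68 / 95 / 99.7 figures can be checked numerically. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` is available; the helper name `share_within` is made up for the illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The printed proportions are roughly 0.68, 0.95 and 0.997, matching the rule on the card.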
unstandardized difference between means
compare two data sets
by finding the difference between the data set means, in natural units.
(i.e., M1 - M2)
Cohen’s d
compare two data sets
by finding the difference between the data set means, in standardized units.
i.e., (M1 - M2) / SD
note: the SD can be that of set 1 or set 2 (a pooled SD of both sets is also commonly used)
0.2 = small
0.5 = medium
0.8 = large
thresholds of effect size for interpreting Cohen’s d
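A minimal sketch of Cohen's d and its interpretation follows. Several things here are assumptions rather than part of the cards: the two data sets are made up, the SD of the second set is used as the standardizer (the card allows either set), and the thresholds are treated as lower bounds of ranges.

```python
import math

def mean(values):
    return sum(values) / len(values)

def population_sd(values):
    """Standard deviation using the divide-by-n formula."""
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

def cohens_d(set1, set2):
    """(M1 - M2) / SD, with the SD of set 2 as the standardizer (an assumption)."""
    return (mean(set1) - mean(set2)) / population_sd(set2)

def effect_size_label(d):
    """Treat 0.2 / 0.5 / 0.8 as lower bounds for small / medium / large (an assumption)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

# Hypothetical data: set 1 is set 2 shifted up by one unit.
set2 = [2, 4, 4, 4, 5, 5, 7, 9]    # mean 5, SD 2
set1 = [3, 5, 5, 5, 6, 6, 8, 10]   # mean 6

d = cohens_d(set1, set2)           # (6 - 5) / 2
print(d, effect_size_label(d))
```

The unstandardized difference is 1 unit; dividing by the SD of 2 gives d = 0.5, a medium effect under these thresholds.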
effect size
magnitude of relationship between two variables
Pearson correlation coefficient
value in the range [-1, 1] that describes the magnitude (absolute value) and direction (sign) of the linear relationship between two variables, with no other variables controlled for.
r = Σ(Zx Zy) / n
- only valid for linear relationships
- plot the data in a scatterplot first
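The z-score formula for the correlation coefficient can be sketched directly. The function names and the data below are made up for illustration, and the z-scores use the divide-by-n standard deviation so that the result stays within [-1, 1].

```python
import math

def z_scores(values):
    """Convert each value to a z-score using the divide-by-n standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]

def pearson_r(xs, ys):
    """Average product of paired z-scores: sum(Zx * Zy) / n."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical data: ys rises in a perfectly straight line with xs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

r_up = pearson_r(xs, ys)          # close to +1: perfect positive relationship
r_down = pearson_r(xs, ys[::-1])  # close to -1: perfect negative relationship
print(r_up, r_down)
```

The sign gives the direction and the absolute value gives the magnitude, as the card describes.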
partial correlation coefficient
value in the range [-1, 1] describing the magnitude and direction of the relationship between two variables while one or more other variables are controlled for.
curvilinear regression
technique used to determine the nature of the relationship between variables that have a curvilinear relationship (e.g., elliptical)
regression analysis
using one or more independent variables (IVs) to predict the values of a dependent variable (DV)
regression analysis (appropriate cases)
predict dependent variable values when both the IV and the DV are quantitative
ANOVA (appropriate cases)
predict dependent variable values with categorical IV, quantitative DV
ANCOVA (appropriate cases)
predict dependent variable values with mixed (categorical and quantitative) IVs and a quantitative DV
simple regression
regression analysis in which only one IV is used as a predictor
Y = mx + b
Y = dependent variable value
m = slope = regression coefficient
x = the single IV value
b = y-intercept of the regression line
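To show where m and b can come from, here is a minimal sketch that fits Y = mx + b by ordinary least squares. The least-squares method is an assumption (the card does not name a fitting method), and the data and the function name `fit_line` are made up.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line Y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the line passes through the point (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b

# Hypothetical data lying exactly on the line Y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

m, b = fit_line(xs, ys)
predicted = m * 5 + b  # predict the DV for a new IV value of 5
print(m, b, predicted)
```

With this data the fit recovers a slope of 2 and an intercept of 1, so the prediction for x = 5 is 11.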
multiple regression
regression analysis in which more than one IV is used as a predictor
Y = m1x1 + m2x2 + … + b
Y = dependent variable value
m1 = first regression coefficient
x1 = first x value
m2 = second regression coefficient
x2 = second x value
... etc.
b = y-intercept of the regression line
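The multiple regression equation can be read as "multiply each IV value by its own coefficient, add the products, then add the intercept". The sketch below only evaluates the equation; the coefficients are hypothetical numbers, not the result of fitting real data, and the function name `predict` is made up.

```python
def predict(coefficients, x_values, intercept):
    """Evaluate Y = m1*x1 + m2*x2 + ... + b for one observation."""
    return sum(m * x for m, x in zip(coefficients, x_values)) + intercept

# Hypothetical model with two IVs: Y = 2.0*x1 + 0.5*x2 + 1.0
coefficients = [2.0, 0.5]
intercept = 1.0

# One observation with x1 = 3 and x2 = 4: 2.0*3 + 0.5*4 + 1.0
y = predict(coefficients, [3, 4], intercept)
print(y)
```

With a single coefficient the same function reduces to the simple regression equation Y = mx + b.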
contingency table
used to compare the relationship of categorical variables.
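if the percentages are calculated across each row (each row sums to 100%), compare down the columns; if they are calculated down each column, compare across the rows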
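A small contingency table with row percentages can be sketched as follows. The observations are invented, with `group` and `answer` standing in for any two categorical variables.

```python
from collections import Counter

# Hypothetical observations: (group, answer) pairs.
observations = [
    ("A", "yes"), ("A", "yes"), ("A", "no"), ("A", "no"),
    ("B", "yes"), ("B", "no"), ("B", "no"), ("B", "no"),
]

# Count how often each combination of categories occurs.
counts = Counter(observations)
groups = sorted({group for group, _ in observations})
answers = sorted({answer for _, answer in observations})

# Row percentages: each group's counts divided by that group's total.
row_percent = {}
for group in groups:
    total = sum(counts[(group, answer)] for answer in answers)
    row_percent[group] = {
        answer: 100 * counts[(group, answer)] / total for answer in answers
    }

for group in groups:
    print(group, row_percent[group])
```

Because the percentages run across each row, the comparison is made down a column: 50% of group A answered "yes" versus 25% of group B.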