statistics Flashcards
population
the entire set of individuals or objects of interest
sample
a portion, a selected part, of the population
individuals
the minimum unit that can be studied
statistic
a quantity calculated from the sample data that serves as an estimate of the corresponding population parameter
simple random (probability) sampling
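number the units of the population from 1 to N, then select units using random numbers drawn from 1 to N, so every unit has the same chance of being chosen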
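A minimal sketch of drawing random unit numbers from 1 to N, using only Python's standard library. The function name `simple_random_sample`, the population size of 100, and the seed are illustrative assumptions, not part of the flashcard.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)  # seed is only there to make the demo repeatable
    # range(1, N + 1) numbers the units 1..N; sample() draws without replacement
    return rng.sample(range(1, population_size + 1), sample_size)

# Hypothetical population of N = 100 units, sample of n = 10
chosen = simple_random_sample(population_size=100, sample_size=10, seed=0)
print(sorted(chosen))
```

Sampling without replacement means no unit can be selected twice, which is the usual choice for a finite population.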
cluster (probability) sampling
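divide the population into clusters (naturally occurring groups), take a simple random sample of the clusters, and study the units in the selected clusters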
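A small sketch of the idea: the random draw is made over cluster names, and each selected cluster is kept whole. The `cluster_sample` helper and the school data are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters whole clusters at random; return them with their units."""
    rng = random.Random(seed)
    # The sampling unit here is the cluster, not the individual
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}

# Hypothetical population: students grouped by school
clusters = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1", "d2"],
}
selected = cluster_sample(clusters, n_clusters=2, seed=0)
print(selected)
```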
stratified (probability) sampling
- divide the population into strata (non-overlapping groups)
- take a simple random sample within each stratum
- the strata may have different sample sizes (or not)
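The steps above can be sketched as follows. The helper `stratified_sample`, the two strata, and the per-stratum sample sizes are hypothetical values chosen for the example.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a simple random sample of sizes[name] units inside each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical population of 100 students split into two strata
strata = {
    "first_year": list(range(1, 61)),     # units 1..60
    "second_year": list(range(61, 101)),  # units 61..100
}
# Sample sizes may differ between strata; here they follow the stratum sizes
sizes = {"first_year": 6, "second_year": 4}

picked = stratified_sample(strata, sizes, seed=0)
print(picked)
```

Unlike cluster sampling, every stratum contributes units to the sample.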
convenience (non-probability) sampling
when units are selected for inclusion in the sample because they are the easiest for the researcher to access
snowball (non-probability) sampling
a recruitment technique in which research participants are asked to assist researchers in identifying other potential subjects.
quota (non-probability) sampling
it relies on the non-random selection of a predetermined number or proportion of units.
frequency
the number of observations for each group
absolute frequencies
counting the observations
relative frequencies
percentage (or fraction) of observations in each group
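A short sketch of both kinds of frequency for a made-up list of blood groups: `collections.Counter` does the counting, and dividing by the total turns counts into fractions.

```python
from collections import Counter

# Hypothetical observations, one group label per individual
observations = ["A", "B", "A", "O", "A", "B", "AB", "O"]

absolute = Counter(observations)  # absolute frequencies: counts per group
total = sum(absolute.values())    # number of observations
relative = {group: count / total for group, count in absolute.items()}

print(dict(absolute))  # {'A': 3, 'B': 2, 'O': 2, 'AB': 1}
print(relative)        # {'A': 0.375, 'B': 0.25, 'O': 0.25, 'AB': 0.125}
```

The relative frequencies always sum to 1 (or 100%).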
measures of centrality
trying to summarize the data by identifying the central position of the data
mean (or average)
the sum of the data divided by the number of observations
median
the middle value once the observations are ordered by size (the average of the two middle values when the number of observations is even)
mode
most frequent observation
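The three measures of centrality can be compared on one small, made-up dataset with Python's `statistics` module:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical observations

mean = statistics.mean(data)      # sum 30 divided by 6 observations -> 5
median = statistics.median(data)  # average of the two middle values 3 and 5 -> 4.0
mode = statistics.mode(data)      # 3 occurs twice, more than any other value -> 3

print(mean, median, mode)
```

The three values differ here because the data are skewed toward the larger observations.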
dispersion
informs about the variability in the data
variance
the average of the squared deviations of the observations from the mean; it measures how spread out the data are around the mean
standard deviation
a measure of the dispersion of a dataset around its mean, calculated as the square root of the variance and expressed in the same units as the data
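A worked sketch on made-up data. The flashcards do not say whether the divisor is n (population version) or n - 1 (sample version), so both are shown; treat the choice as an assumption to check against your course material.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

# Squared deviations from the mean: 9, 1, 1, 1, 0, 0, 4, 16 (sum = 32)
pop_variance = statistics.pvariance(data)    # 32 / 8 = 4
pop_sd = statistics.pstdev(data)             # sqrt(4) = 2.0
sample_variance = statistics.variance(data)  # 32 / 7, about 4.571

print(pop_variance, pop_sd, sample_variance)
```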
boxplot
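a univariate descriptive plot: the box marks the quartiles and the median, and the whiskers show the spread of the remaining data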
multivariate descriptive statistics
shows the relation between two or more variables, which can be of different types
statistical inference
using sample data to draw conclusions about the underlying probability distribution of the population
hypothesis testing
a procedure in which an analyst uses sample data to test an assumption about a population parameter
null hypothesis
a statistical hypothesis (H0) proposing that there is no effect or no difference in the population, so that any pattern in the observations is due to chance alone
false positive
the investigator rejects a null hypothesis that is actually true in the population (Type I error). It is usually considered the more problematic of the two errors
false negative
the investigator fails to reject a null hypothesis that is actually false in the population (Type II error).
probability (alpha)
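the significance level: the probability of a false positive that the analyst is willing to accept, fixed before the test (0.05 is a common choice); a result is called statistically significant when the p-value falls below it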
p-value
probability, under the null hypothesis, of sampling a test statistic at least as extreme as that which was observed
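As an illustration, suppose a coin lands heads 9 times in 10 flips and H0 says the coin is fair. The sketch below computes a one-sided p-value exactly from the binomial distribution; the scenario and the helper `binomial_p_value` are assumptions made for this example.

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) when X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# "At least as extreme" here means 9 or 10 heads: (10 + 1) / 1024
p = binomial_p_value(successes=9, trials=10)
print(round(p, 4))  # 0.0107
```

With a significance level of 0.05, this p-value would lead to rejecting H0.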
low p-value
the p-value is below alpha: reject H0 in favor of H1
high p-value
the p-value is at or above alpha: cannot reject H0, so H1 is not accepted (this does not prove that H0 is true)
homogeneity of contingency tables
when the distribution of observations across the rows (or columns) is the same in every column (or row), so that any differences could be explained by random sampling
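The flashcard does not name a test. One common choice is Pearson's chi-square statistic, which compares each observed count with the count expected if rows and columns were unrelated. The sketch below computes only the statistic, on a made-up 2x2 table; `chi_square_statistic` is a hypothetical helper.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in cell (i, j) if the table were homogeneous
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts: rows are two groups, columns are two outcomes
table = [[20, 30], [30, 20]]
chi2 = chi_square_statistic(table)
print(chi2)  # 4.0, every expected count is 25
```

For a 2x2 table there is 1 degree of freedom, and the 5% critical value is about 3.841, so 4.0 would count against homogeneity at that level (no continuity correction is applied in this sketch).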
Shapiro-Wilk
normality test, usually recommended for small samples (n < 50)
Kolmogorov-Smirnov
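test that compares one sample against a reference distribution, or two samples against each other, based on the largest gap between their cumulative distribution functions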
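A sketch of the one-sample statistic only (no p-value): the largest vertical distance between the sample's empirical cumulative distribution and a reference cumulative distribution. The helper names, the four data points, and the Uniform(0, 1) reference are assumptions for illustration.

```python
def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of sample and a reference cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def uniform_cdf(x):
    """CDF of the Uniform(0, 1) distribution."""
    return min(max(x, 0.0), 1.0)

d = ks_statistic([0.1, 0.4, 0.7, 0.9], uniform_cdf)
print(round(d, 3))  # 0.2, reached at x = 0.7
```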
95% confidence interval
an interval computed from the sample by a procedure that, over repeated sampling, contains the true parameter value in 95% of the samples
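One way to read the 95% figure is as a long-run coverage rate: if the sampling is repeated many times, about 95% of the intervals built this way contain the true parameter. The simulation below sketches this for the mean of normal data with known standard deviation; the function name `coverage`, the population values, and the seed are arbitrary assumptions.

```python
import random
from statistics import NormalDist, mean

def coverage(true_mean=10.0, sigma=2.0, n=30, repetitions=2000, seed=1):
    """Fraction of simulated 95% intervals that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)   # about 1.96
    half_width = z * sigma / n**0.5   # known-sigma interval for the mean
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = mean(sample)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / repetitions

rate = coverage()
print(rate)  # expected to land close to 0.95
```

Each single interval either contains the true mean or it does not; the 95% describes the procedure across repetitions.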