stats 3 Flashcards
methods for summarizing data in order to describe its main features.
Descriptive statistics
the type of value that a variable takes on
Measurement metric
Categorical variables
- Nominal variables
- Ordinal variables
- Binary variables
Numerical variables
- Continuous variables
- Discrete variables
Categorical variables (def)
represent categories or groups and do not have a numeric value
Nominal variables (def)
categorical variables with no inherent order or ranking among the categories
Ordinal variables (def)
categorical variables that have a meaningful order or ranking, but the intervals between the categories are not necessarily equal
Binary variables (def)
a special type of categorical variable that has only two possible values (a.k.a. dichotomous or dummy variables)
Numerical variables (def)
represent quantities and can be measured on a numeric scale.
Continuous variables
can take any value within a range and can be subdivided into finer increments with equal unit distances
Discrete variables
can only take specific, distinct values, often counts or integers.
Rank statistics
a class of statistics used to describe the variation of continuous variables based on their ranking from lowest to highest values
Median value
the value of the case that sits at the exact center of the cases when we rank the values of a single variable from the smallest to the largest observed value (with an even number of cases, the average of the two middle values)
Range
the difference between the minimum and maximum value of a variable
Quartile
one of the three cut points that divide a ranked dataset into four equal parts, each part containing 25% of the data
Interquartile range (IQR)
the difference between the variable's values at the 75% and the 25% ranks (the third quartile minus the first quartile)
The interquartile range is a measure of the ___ or spread of values
dispersion
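The rank statistics defined above can be computed with Python's standard library. This is a minimal sketch on made-up values. The "inclusive" quartile method is an assumption, since the cards do not say which quartile convention the course uses, and other conventions can give slightly different cut points.

```python
import statistics

# Hypothetical sample values (not from the course material).
values = [2, 4, 4, 5, 7, 9, 10, 12, 15]

# Rank the cases from the smallest to the largest observed value.
ranked = sorted(values)

# Median: the value of the case at the exact center of the ranking.
median = statistics.median(ranked)

# Range: maximum minus minimum.
value_range = ranked[-1] - ranked[0]

# Quartile cut points Q1, Q2, Q3. The "inclusive" method is an assumed convention.
q1, q2, q3 = statistics.quantiles(ranked, n=4, method="inclusive")

# Interquartile range: value at the 75% rank minus value at the 25% rank.
iqr = q3 - q1

print(median, value_range, q1, q3, iqr)
```

Note that the second quartile is the median, so `q2` and `median` agree here.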
Box-whisker plot
a graphical representation of data that displays the median, quartiles, and potential outliers, using a box to show the interquartile range and “whiskers” to indicate the range of the data.
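The numbers a box-whisker plot draws can be sketched without any plotting library. The helper `box_plot_summary` below is hypothetical, and it assumes the common 1.5 x IQR rule for flagging potential outliers, which the card itself does not specify. Under that rule the whiskers stop at the most extreme values that are not flagged.

```python
import statistics

def box_plot_summary(values):
    """Return the values a box-whisker plot would display (illustrative only)."""
    ranked = sorted(values)
    # Box: first quartile, median, third quartile (assumed "inclusive" convention).
    q1, median, q3 = statistics.quantiles(ranked, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: points beyond 1.5 x IQR from the box are potential outliers.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in ranked if lower_fence <= v <= upper_fence]
    outliers = [v for v in ranked if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value that is not an outlier
        "whisker_high": inside[-1],  # largest value that is not an outlier
        "outliers": outliers,
    }

# Hypothetical data with one unusually large value.
summary = box_plot_summary([2, 4, 4, 5, 7, 9, 10, 12, 40])
print(summary)
```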
___ summarize data based on the order of values, while moments provide numerical measures that describe the shape and distribution of the data.
Rank statistics
Moments
numerical measures derived from the data values themselves and their positions relative to the mean or origin. They provide information about the shape of the data distribution, including measures such as mean (first moment), variance (second moment), skewness (third moment), and kurtosis (fourth moment).
Mean (first moment)
the average value of a variable
The zero-sum property of the mean
if you subtract the mean of a dataset from each data point, the sum of these deviations will always be zero.
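A quick check of the zero-sum property on made-up values. The data are hypothetical, and the comparison uses a small tolerance because floating-point arithmetic can leave a tiny rounding error on other datasets.

```python
import math
import statistics

# Hypothetical data.
values = [3, 5, 7, 9, 11]
mean = statistics.fmean(values)

# Deviation of each data point from the mean.
deviations = [v - mean for v in values]

# The deviations cancel out: negative and positive differences balance.
total_deviation = sum(deviations)

print(deviations, total_deviation)
print(math.isclose(total_deviation, 0.0, abs_tol=1e-12))
```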
The ___ tells us that the mean is the value that minimizes the sum of squared differences between itself and each data point
least-squares property of the mean
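The least-squares property can be illustrated by comparing the sum of squared differences around the mean with the same sum around other candidate values. This is a sketch on hypothetical data. The helper `sum_of_squares` and the candidate list are illustrative, and checking a handful of candidates demonstrates the property rather than proving it.

```python
import statistics

def sum_of_squares(values, center):
    """Sum of squared differences between each data point and a chosen center."""
    return sum((v - center) ** 2 for v in values)

# Hypothetical data where the mean and the median differ.
values = [1, 2, 2, 3, 12]
mean = statistics.fmean(values)
median = statistics.median(values)

# Candidate centers to compare against the mean (arbitrary choices).
candidates = [median, 3.5, 4.5, 6.0]

at_mean = sum_of_squares(values, mean)
at_candidates = {c: sum_of_squares(values, c) for c in candidates}

print(at_mean, at_candidates)
```

Every candidate gives a larger sum of squares than the mean does, including the median.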
The mean of a variable is often called its ___ because it is the value you would expect the variable to take on average.
expected value
Variance (second moment)
a measure of the dispersion of a variable around its mean.
Standard deviation
another measure of the dispersion of a variable around its mean: the square root of the variance, expressed in the same units as the variable
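A minimal sketch of how variance and standard deviation relate, on hypothetical data. The cards do not say whether the population formula (divide by n) or the sample formula (divide by n - 1) is intended, so both are shown and the choice is left open.

```python
import math
import statistics

# Hypothetical data.
values = [3, 5, 7, 9, 11]
mean = statistics.fmean(values)

# Population variance: the average squared deviation from the mean.
variance = sum((v - mean) ** 2 for v in values) / len(values)

# Standard deviation: the square root of the variance,
# which puts the measure back in the variable's own units.
std_dev = math.sqrt(variance)

# Sample versions divide by n - 1 instead of n.
sample_variance = statistics.variance(values)
sample_std_dev = statistics.stdev(values)

print(variance, std_dev, sample_variance, sample_std_dev)
```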
We can use a ___ to depict the dispersion of a variable (its variance and standard deviation)
histogram
___ in histograms are the intervals or ranges into which data are grouped to show how frequently values fall within each range
Bins
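To make binning concrete, here is a sketch that groups values into equal-width bins and counts how many fall in each. The helper `bin_counts` is hypothetical, it assumes the data contain at least two distinct values, and placing the maximum in the last bin is one convention among several.

```python
def bin_counts(values, n_bins):
    """Group values into n_bins equal-width intervals and count each interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical data.
values = [1, 2, 2, 3, 5, 6, 8, 9, 9, 10]

# The same data look different depending on how many bins are used.
edges_3, counts_3 = bin_counts(values, 3)
edges_9, counts_9 = bin_counts(values, 9)

print(edges_3, counts_3)
print(edges_9, counts_9)
```

Fewer bins give a coarser picture, while more bins show more detail but also more empty or sparse intervals.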
___ smooth out histograms.
Kernel density plots
Kernel density plot
a visual depiction of the distribution of a single variable based on a smoothed calculation of the density of cases across the range of values
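The smoothed density behind such a plot can be sketched by hand. The example below assumes a Gaussian kernel and a bandwidth of 1.0, both of which are illustrative choices not given in the cards. The function `gaussian_kde` is a hypothetical helper, not a library call.

```python
import math

def gaussian_kde(x, values, bandwidth):
    """Estimated density at x: the average of Gaussian bumps centered on each case."""
    total = 0.0
    for v in values:
        z = (x - v) / bandwidth
        total += math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return total / (len(values) * bandwidth)

# Hypothetical data.
values = [1, 2, 2, 3, 5]

# Evaluate the density on a grid of points across the range of values.
grid = [i * 0.5 for i in range(13)]  # 0.0, 0.5, ..., 6.0
densities = [gaussian_kde(x, values, bandwidth=1.0) for x in grid]

for x, d in zip(grid, densities):
    print(f"{x:4.1f}  {d:.3f}")
```

A larger bandwidth gives a smoother curve, and a smaller one follows the individual cases more closely, much like choosing wider or narrower histogram bins.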
Skewness (third moment)
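a measure of the asymmetry of the variable’s distribution around the mean (zero for a symmetric distribution, positive when the right tail is longer, negative when the left tail is longer)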
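One common way to compute skewness is as the average cubed standardized deviation from the mean. The sketch below assumes that population-style formula, since the cards do not give one, and software packages often apply small-sample corrections that change the number slightly. The helper `skewness` and the data are hypothetical.

```python
import statistics

def skewness(values):
    """Average cubed standardized deviation (assumed population-style formula)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation; must be nonzero
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical datasets.
symmetric = [3, 5, 7, 9, 11]      # balanced around the mean
right_skewed = [1, 2, 2, 3, 12]   # one large value stretches the right tail
left_skewed = [-12, -3, -2, -2, -1]

print(skewness(symmetric))
print(skewness(right_skewed))
print(skewness(left_skewed))
```

The symmetric data give a value of essentially zero, the right-skewed data a positive value, and the left-skewed data a negative value.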