descriptive statistics Flashcards
observations on one variable may be shown visually by putting the values of the variable on one axis and the frequency on the other
visual presentation of data
they are best used to interpret the frequency distribution visually
histogram:
y axis - number of units (frequency)
x axis - measurement level
bars are visually proportional to each other
frequency polygon:
a shorthand presentation of a histogram
a dot is placed at the midpoint of the top of each bar, then the dots are connected = polygon (must be shaded)
shows the shape of the data more clearly
the graph starts and ends at zero to “close” the shape
line graph:
can illustrate more than one data set in one graph
- arithmetic line graph
- semilogarithmic line graph
differences between histogram, frequency polygon and line graph
histogram - data distribution
frequency polygon - connects the midpoints of the histogram bars with lines
line graph - trends/ changes over time
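A minimal sketch of how a histogram's bar heights and the matching frequency polygon points could be computed. The measurements, the starting value and the interval width are all made-up assumptions, not values from these cards.

```python
# Hypothetical measurements (e.g. heights in cm); not from the source.
values = [150, 152, 155, 158, 160, 161, 163, 165, 168, 172]

bin_start = 150   # lower edge of the first class interval (assumed)
bin_width = 5     # width of every class interval (assumed)
n_bins = 5        # intervals: 150-155, 155-160, ..., 170-175

# Histogram: count how many values fall in each interval [start, start + width).
counts = [0] * n_bins
for v in values:
    index = (v - bin_start) // bin_width
    counts[index] += 1

# Frequency polygon: one point at the midpoint of each bar's top,
# plus a zero-frequency point before the first bar and after the last one
# so that the shape is "closed".
midpoints = [bin_start + bin_width * (i + 0.5) for i in range(n_bins)]
polygon = (
    [(bin_start - bin_width / 2, 0)]
    + list(zip(midpoints, counts))
    + [(bin_start + bin_width * (n_bins + 0.5), 0)]
)

print(counts)    # [2, 2, 3, 2, 1]
print(polygon)
```

The counts are the bar heights of the histogram; the polygon list holds the (x, frequency) points that would be joined with straight lines.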
briefly explain
arithmetic line graph:
semilogarithmic line graph:
arithmetic line graph:
both the x and y axes have an arithmetic (linear, numerical) scale
semilogarithmic line graph: the y axis has a logarithmic scale, while the x axis stays arithmetic
arithmetic - evenly spaced intervals
semilogarithmic - the scale increases by multiples, which suits exponential changes (e.g. bacterial growth)
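A small sketch of why exponential data suits a semilogarithmic graph, assuming a hypothetical bacterial count that doubles every hour (the numbers are invented for illustration).

```python
import math

# Hypothetical bacterial counts that double every hour (assumed data).
hours = [0, 1, 2, 3, 4]
counts = [100 * 2 ** h for h in hours]   # 100, 200, 400, 800, 1600

# Arithmetic scale: the step between successive points keeps growing.
arithmetic_steps = [b - a for a, b in zip(counts, counts[1:])]

# Logarithmic scale: the step between successive points is constant,
# so exponential growth plots as a straight line.
log_counts = [math.log10(c) for c in counts]
log_steps = [b - a for a, b in zip(log_counts, log_counts[1:])]

print(arithmetic_steps)                    # [100, 200, 400, 800]
print([round(s, 4) for s in log_steps])    # [0.301, 0.301, 0.301, 0.301]
```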
it describes how the observations in a data set are distributed
frequency distribution
frequency distribution from ____ data is defined by …
continuous data
types of descriptors aka parameters
what are the types of parameters of a frequency distribution
central tendency
dispersion
it is defined as the value used to represent the center or the middle of a set of data values
central tendency
it locates observations on a measurement scale
central tendency
it describes the spread of values in a given data set
dispersion
it suggests how widely spread out the observations are
dispersion
high SD =
low SD =
high SD = scattered data, spread out far from the mean
low SD = clumped data, close around the mean
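A quick illustration with two made-up data sets that share the same mean but differ in spread. It assumes the population SD (dividing by N), computed with Python's standard `statistics` module.

```python
import statistics

clumped = [9, 10, 10, 10, 11]     # values close to the mean
scattered = [0, 5, 10, 15, 20]    # values far from the mean

# Both data sets have the same mean...
print(statistics.mean(clumped), statistics.mean(scattered))

# ...but very different population standard deviations.
sd_clumped = statistics.pstdev(clumped)
sd_scattered = statistics.pstdev(scattered)
print(round(sd_clumped, 2), round(sd_scattered, 2))   # 0.63 7.07
```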
it is the average: the sum (∑) of all observed values (xi) divided by the total number of observations (N)
mean, x̄
it has the most useful mathematical properties and is the most representative value of the data set, provided there are no outliers.
mean, x̄
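A minimal sketch of the mean as the sum of the values divided by N, using invented observations, and of how a single outlier can pull it away from the bulk of the data.

```python
values = [4, 5, 6, 7, 8]            # hypothetical observations
mean = sum(values) / len(values)    # sum of xi divided by N

with_outlier = values + [90]        # one extreme value added
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print(mean)               # 6.0
print(mean_with_outlier)  # 20.0
```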
for the median, the values are arranged from ____ to ____ before the middle value is taken
highest to lowest (lowest to highest gives the same middle value)
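A short sketch with made-up values showing that the sorting direction does not change the middle value, and how the median is usually taken when the number of values is even.

```python
import statistics

odd = [7, 1, 5, 3, 9]
even = [7, 1, 5, 3, 9, 11]

# Sorting in either direction gives the same middle value.
ascending = sorted(odd)
descending = sorted(odd, reverse=True)
middle = len(odd) // 2
print(ascending[middle], descending[middle])   # 5 5

# With an even number of values, the median is the mean of the two middle values.
print(statistics.median(even))                 # 6.0
```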
median is
frequently used in -
rarely -
- healthcare and economics
- used to make inferential conclusion from
it is the most commonly observed value
mode
true or false:
mode is frequently used in statistics
false - it is seldom used
arithmetic mean:
weighted mean:
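arithmetic mean: calculated from each individual observation, all counted equally
weighted mean: calculated by multiplying each value by the weight associated with that outcome, adding the products, and dividing by the sum of the weights (e.g. a grading system)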
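A sketch comparing the two means for a hypothetical grading scheme. The scores, weights and component names are assumptions chosen only for illustration.

```python
# Hypothetical grading scheme: (score, weight) for each component.
components = [
    (80, 0.2),   # quizzes
    (70, 0.3),   # midterm
    (90, 0.5),   # final exam
]

# Arithmetic mean: every score counts equally.
scores = [score for score, _ in components]
arithmetic_mean = sum(scores) / len(scores)

# Weighted mean: each score is multiplied by its weight,
# and the total is divided by the sum of the weights.
weighted_sum = sum(score * weight for score, weight in components)
total_weight = sum(weight for _, weight in components)
weighted_mean = weighted_sum / total_weight

print(arithmetic_mean)           # 80.0
print(round(weighted_mean, 2))   # 82.0
```

The weighted mean is higher here because the highest score carries the largest weight.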
what are the downsides of using the mode as a measure of central tendency
may have no mode
have more than one mode
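Both downsides can be seen with `statistics.multimode` from the Python standard library (Python 3.8 or later); the data sets are made up.

```python
import statistics

one_mode = [1, 2, 2, 3, 4]
two_modes = [1, 1, 2, 3, 3]
no_repeats = [1, 2, 3, 4]

# multimode returns every value tied for the highest count.
print(statistics.multimode(one_mode))    # [2]
print(statistics.multimode(two_modes))   # [1, 3]

# When no value repeats, every value is tied, so no single mode stands out.
print(statistics.multimode(no_repeats))  # [1, 2, 3, 4]
```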
it is a statistical measurement of the spread between numbers in a data set
variance
The difference between the observed value of a data point and the expected value is known as deviation in statistics.
mean deviation
it is the average deviation of a data point from the mean, median or mode of the data set.
It measures how far each number in the set is from the mean, and thus from every other number in the set
variance
it is the average amount of variability in your dataset.
SD
mean deviation is aka
mean absolute deviation
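A worked sketch tying deviation, mean absolute deviation, variance and SD together on one invented data set. It assumes deviations are taken from the mean and that the population formulas (dividing by N) are used; a sample variance would divide by N - 1 instead.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data set
mean = statistics.mean(values)      # 5

# Deviation: observed value minus the mean.
deviations = [x - mean for x in values]

# Mean (absolute) deviation: average of the absolute deviations.
mean_abs_deviation = sum(abs(d) for d in deviations) / len(values)

# Population variance: average of the squared deviations.
variance = sum(d ** 2 for d in deviations) / len(values)

# Standard deviation: square root of the variance.
sd = variance ** 0.5

print(mean_abs_deviation)  # 1.5
print(variance)            # 4.0
print(sd)                  # 2.0
```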
values that split sorted data or a probability distribution into equal parts
quantiles
a statistical term that describes a division of observations into four defined intervals based on the values of the data and how they compare to the entire set of observations.
quartiles
lower quartile (Q1)
median quartile (Q2)
upper quartile (Q3)
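A sketch of the three quartile cut points using `statistics.quantiles` on made-up data. Several conventions for interpolating quartiles exist; this relies on the function's default ("exclusive") method, so other tools may give slightly different values.

```python
import statistics

values = [7, 1, 9, 3, 11, 5, 2, 8, 4, 10, 6]   # hypothetical, unsorted

# n=4 splits the sorted data into four parts and returns the 3 cut points.
lower_q, median_q, upper_q = statistics.quantiles(values, n=4)

print(lower_q, median_q, upper_q)   # 3.0 6.0 9.0
```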
how to calculate percentiles
data ordered from lowest to highest
divided into 100 equal parts
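The same idea for percentiles, sketched with `statistics.quantiles` and its default interpolation method. The scores 1 to 100 are an invented example.

```python
import statistics

scores = list(range(1, 101))   # hypothetical scores 1..100

# n=100 divides the ordered data into 100 parts: 99 cut points.
percentiles = statistics.quantiles(scores, n=100)

p50 = percentiles[49]   # 50th percentile (index 49 because lists start at 0)
p90 = percentiles[89]   # 90th percentile

print(len(percentiles))   # 99
print(p50)                # 50.5
print(p90)
```

The 50th percentile is the same value as the median.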
how to find range
highest value minus lowest value
In descriptive statistics, the range of a set of data is the size of the narrowest interval which contains all the data.
range
IQR
it is defined as the difference between the third and the first quartiles (Q3 - Q1).
IQR
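A sketch of the range and the IQR on made-up data, including what happens when one extreme value is added. Quartiles come from `statistics.quantiles` with its default method, which is an assumption about the quartile convention.

```python
import statistics

values = [7, 1, 9, 3, 11, 5, 2, 8, 4, 10, 6]   # hypothetical data

# Range: highest value minus lowest value.
data_range = max(values) - min(values)

# IQR: third quartile minus first quartile.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Add one extreme value and recompute both.
with_outlier = values + [100]
range_outlier = max(with_outlier) - min(with_outlier)
q1_o, _, q3_o = statistics.quantiles(with_outlier, n=4)
iqr_outlier = q3_o - q1_o

print(data_range, iqr)             # 10 6.0
print(range_outlier, iqr_outlier)  # 99 6.5
```

The range reacts strongly to the single extreme value, while the IQR barely moves.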
a measure of asymmetry of a distribution
explain why it is asymmetric
skewness (horizontal imbalance)
because the left and right sides of the distribution are not mirror images
it is used to help measure how data disperse between a distribution’s center and tails, with larger values indicating a data distribution may have “heavy” tails that are thickly concentrated with observations or that are long with extreme observations
kurtosis (vertical imbalance)
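A rough sketch of skewness and kurtosis computed from central moments. The helper names and data sets are hypothetical, and this uses the simple population (moment-based) definitions; statistical packages often apply bias corrections or report "excess" kurtosis (kurtosis minus 3, so that a normal distribution scores 0).

```python
def central_moment(values, k):
    """Average of (x - mean)**k over all values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    # Third central moment scaled by the SD cubed.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    # Fourth central moment scaled by the variance squared.
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]        # mirror image around 3
right_skewed = [1, 1, 2, 2, 10]    # long tail to the right

print(skewness(symmetric))         # 0.0
print(skewness(right_skewed) > 0)  # True
print(kurtosis(symmetric))
print(kurtosis(right_skewed))
```

A skewness of 0 means the two sides mirror each other; a positive value means the longer tail is on the right.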