Descriptive Stats / Intro Flashcards
A frequency distribution table is a summary table that shows the number of occurrences (frequency) of different values or ranges of values in a dataset.
A frequency distribution table
A graphical representation of the distribution of a dataset, displaying the frequencies of data values within specific intervals or bins.
A histogram
A table that shows the frequencies or proportions up to a certain point in a dataset, providing a running total of the frequencies.
A cumulative distribution table
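As an illustration of the two table cards above, here is a minimal sketch that builds a frequency table with a running (cumulative) total. The function name `frequency_table` and the sample scores are made up for this example.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Return rows of (value, frequency, cumulative frequency), sorted by value."""
    counts = Counter(values)              # how often each value occurs
    ordered = sorted(counts)              # distinct values in ascending order
    freqs = [counts[v] for v in ordered]
    cumulative = list(accumulate(freqs))  # running total of the frequencies
    return list(zip(ordered, freqs, cumulative))


scores = [2, 3, 3, 1, 2, 3, 5, 2, 3]  # hypothetical data
for value, freq, cum in frequency_table(scores):
    print(value, freq, cum)
```

The last cumulative frequency always equals the number of observations, which is a quick way to check the table.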
A distribution with kurtosis equal to that of the normal distribution, indicating moderate peakedness and tail behaviour.
Mesokurtic kurtosis (normal kurtosis)
A distribution with a higher peak and heavier tails than the normal distribution, indicating more extreme values.
Leptokurtic kurtosis (positive kurtosis)
A distribution with a lower peak and lighter tails than the normal distribution, indicating fewer extreme values.
Platykurtic kurtosis (negative kurtosis)
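The three kurtosis cards can be made concrete with a small calculation. This is a sketch under one assumption: it uses the simple moment-based formula (fourth central moment divided by the squared variance, minus 3), not the bias-corrected version some statistics packages report. The name `excess_kurtosis` and both datasets are invented for illustration.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: about 0 for normal data,
    positive for leptokurtic data, negative for platykurtic data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5]           # evenly spread, no extreme values
peaked = [0] * 8 + [-10, 10]     # mostly identical, two extreme values

print(excess_kurtosis(flat))     # negative: platykurtic
print(excess_kurtosis(peaked))   # positive: leptokurtic
```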
What is the difference between inferential and descriptive statistics?
Descriptive statistics summarise and describe data, while inferential statistics make predictions or inferences about a population based on a sample.
What are descriptive statistics?
Descriptive statistics are methods used to summarise and describe the main aspects of a dataset, such as central tendency, variability, and distribution.
What are the main aspects of a dataset that descriptive statistics summarise?
Central tendency
Variability
Distribution
If the numbering scheme is arbitrary, then it’s probably best to use the _____ as a measure of central tendency.
Mode
If your data are ordinal scale, you’re more likely to want to use the _____ as a measure of central tendency.
median
(The median only makes use of the order information in your data (i.e., which numbers are bigger) but doesn’t depend on the precise numbers involved. That’s exactly the situation that applies when your data are ordinal scale. The mean, on the other hand, makes use of the precise numeric values assigned to the observations, so it’s not really appropriate for ordinal data.)
The _____ has the advantage that it uses all the information in the data (which is useful when you don’t have a lot of data). But it’s very sensitive to extreme, outlying values.
mean
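A short sketch of the three measures of central tendency, using Python's standard `statistics` module. The numbers are made up; the point is to show how one extreme value pulls the mean while the median and mode barely move.

```python
from statistics import mean, median, mode

values = [20, 22, 22, 25, 27]   # hypothetical data
with_outlier = values + [500]   # same data plus one extreme value

# The mean uses every value, so the outlier drags it upward.
print(mean(values), mean(with_outlier))

# The median depends only on the ordering, so it shifts very little.
print(median(values), median(with_outlier))

# The mode is the most frequent value and is unchanged here.
print(mode(values), mode(with_outlier))
```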
_____ of the data. That is, how “spread out” are the data? How “far” away from the mean or median do the observed values tend to be?
variability
The 50th percentile is the same as the _____ value.
median
The _____ _____ (___) is like the range, but instead of taking the difference between the biggest and smallest values, it takes the difference between the 25th percentile and the 75th percentile.
interquartile range (IQR)
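The range and the interquartile range can be compared directly. This sketch uses `statistics.quantiles` with its default settings; other quantile conventions exist and can give slightly different quartiles on small samples, so treat the exact cut points as one reasonable choice. The helper names and data are made up.

```python
from statistics import quantiles


def data_range(values):
    """Difference between the biggest and smallest value."""
    return max(values) - min(values)


def iqr(values):
    """Difference between the 75th and 25th percentiles."""
    q1, _median, q3 = quantiles(values, n=4)  # three cut points: quartiles
    return q3 - q1


values = list(range(1, 12))  # hypothetical data: 1, 2, ..., 11
print(data_range(values))
print(iqr(values))
```

Because the IQR ignores the top and bottom quarters of the data, a single extreme value changes the range much more than it changes the IQR.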
Variability
Mean absolute deviation
The absolute deviations from the mean, added up and averaged.
What is the RMSD?
The “root mean squared deviation”: the square root of the average squared deviation from the mean.
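The two deviation-based measures above can be sketched side by side. Both functions here divide by the number of observations N, which is an assumption of this example; the sample standard deviation reported by many packages divides by N - 1 instead. Function names and data are invented.

```python
from math import sqrt


def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


def rmsd(values):
    """Square root of the average squared deviation from the mean."""
    m = sum(values) / len(values)
    return sqrt(sum((x - m) ** 2 for x in values) / len(values))


values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
print(mean_absolute_deviation(values))
print(rmsd(values))
```

Squaring gives large deviations extra weight, so the RMSD is never smaller than the mean absolute deviation for the same data.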
Properties of distributions.
- What the central tendency is (mean, median or mode).
- How symmetrical the data are on either side of the mean (skew).
- How variable the data are (e.g. data range, standard deviation and kurtosis).
- Whether it is a “normal distribution”.
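Skew, listed among the properties above, can also be computed. This is a minimal sketch using the simple moment-based formula (third central moment divided by the variance raised to the power 1.5), which is an assumption of this example and not a bias-corrected estimate. The name `skewness` and the data are made up.

```python
def skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive when the
    long tail is on the right, negative when it is on the left."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]      # mirror image around the mean
right_tailed = [1, 2, 3, 4, 10]  # one large value stretches the right tail

print(skewness(symmetric))
print(skewness(right_tailed))
```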