Descriptive Statistics Flashcards
Which comes first: descriptive or inferential statistics?
1. Descriptive
2. Inferential
Descriptive Statistics?
Descriptive statistics is a means of describing features of a data set by generating summaries about data samples.
Inferential Statistics?
Inferential statistics use measurements from the sample of subjects in the experiment to compare the treatment groups and make generalizations about the larger population of subjects.
What are the measures of central tendency?
- Arithmetic Mean
- Median
- Mode
Arithmetic Mean?
The average value: the sum of all values divided by the number of values
What is arithmetic mean suitable for?
Suitable for symmetrical distributions
Symmetrical graph shape and distribution?
Bell-shaped & Normal Distribution
Asymmetrical graph shape and distribution?
Skewed distribution, for example skewed to the right (positively skewed), with a long tail toward high values
What affects the mean of asymmetrical graphs?
The outliers
Median?
The median is the value in the middle of a data set, meaning that 50% of data points have a value smaller or equal to the median and 50% of data points have a value higher or equal to the median.
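As a small illustration of this definition, the sketch below uses Python's standard `statistics` module on two made-up samples (the numbers are hypothetical, chosen only to show the odd and even cases):

```python
from statistics import median

# Hypothetical samples; median() sorts the data internally.
odd_sample = [7, 1, 5, 3, 9]   # sorted: 1, 3, 5, 7, 9
even_sample = [7, 1, 5, 3]     # sorted: 1, 3, 5, 7

# Odd count: the single middle value.
odd_median = median(odd_sample)

# Even count: the average of the two middle values, (3 + 5) / 2.
even_median = median(even_sample)

print(odd_median, even_median)  # 5 4.0
```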
Why is median not affected by outliers?
The median is a robust measure: it depends only on the middle position of the ordered data, so it is not affected by outliers
What is median ideal for?
Asymmetrical Distribution
Why is median ideal for asymmetrical data?
It is a robust measurement and not affected by outliers
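To make the robustness point concrete, here is a minimal sketch with hypothetical numbers: one extreme value is added to a small sample, and the mean and median are compared before and after.

```python
from statistics import mean, median

# Hypothetical sample of five similar values.
values = [10, 11, 12, 13, 14]

# The same sample with one extreme value (an outlier) added.
values_with_outlier = values + [100]

mean_before = mean(values)                  # 60 / 5 = 12
mean_after = mean(values_with_outlier)      # 160 / 6, pulled toward 100

median_before = median(values)              # middle value: 12
median_after = median(values_with_outlier)  # (12 + 13) / 2 = 12.5

print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
```

The mean more than doubles, while the median moves by only half a unit.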
Measures of central tendency?
Central tendency is a descriptive summary of a dataset through a single value that reflects the center of the data distribution. Along with the variability (dispersion) of a dataset, central tendency is a branch of descriptive statistics. It is one of the most fundamental concepts in statistics.
Which measure of central tendency do you use for symmetrical and asymmetrical data?
Symmetrical data: arithmetic mean
Asymmetrical data: median
Mode?
Most frequent value
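A short sketch of finding the mode with Python's standard `statistics` module; the sample values are made up, and `multimode` is used for the case where more than one value ties for most frequent:

```python
from statistics import mode, multimode

# Hypothetical sample: the value 3 appears most often (three times).
scores = [2, 3, 3, 5, 3, 7, 5]
most_frequent = mode(scores)

# When two values tie, multimode() returns all of them
# in the order they first appear in the data.
tied_sample = [1, 1, 2, 2, 3]
all_modes = multimode(tied_sample)

print(most_frequent, all_modes)  # 3 [1, 2]
```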
Why is mode not affected by outliers?
It is a robust measurement
Robust measurement?
Robust measures of scale are methods that quantify the statistical dispersion in a sample of numerical data while resisting outliers. The most common such robust statistics are the interquartile range (IQR) and the median absolute deviation (MAD).
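The median absolute deviation can be sketched in a few lines. The helper name `median_absolute_deviation` and the sample data are assumptions made for illustration; this is the unscaled MAD (no consistency constant applied).

```python
from statistics import median

def median_absolute_deviation(data):
    """Unscaled MAD: the median of absolute deviations from the median."""
    center = median(data)
    return median([abs(x - center) for x in data])

# Hypothetical samples that differ only in their largest value.
clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

mad_clean = median_absolute_deviation(clean)           # deviations 2,1,0,1,2
mad_outlier = median_absolute_deviation(with_outlier)  # deviations 2,1,0,1,97

print(mad_clean, mad_outlier)  # 1 1
```

Replacing 5 with 100 leaves the MAD unchanged in this example, which is what "resisting outliers" means in practice.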
How can we know that a graph is symmetrically distributed?
The measures of central tendency (mean, median, mode) will be close together.
In an asymmetrical distribution, the measures of central tendency are spread apart.
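A rough way to check this numerically is to compute all three measures and compare them. The two samples below are hypothetical, constructed so that one is symmetrical and the other is skewed to the right:

```python
from statistics import mean, median, mode

# Hypothetical symmetrical sample: values mirror each other around 3.
symmetric = [1, 2, 3, 3, 3, 4, 5]

# Hypothetical right-skewed sample: one large value stretches the tail.
right_skewed = [1, 1, 1, 2, 3, 4, 16]

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(name, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

For the symmetrical sample all three measures equal 3. For the skewed sample the mode is 1, the median is 2, and the mean is 4, so they are spread apart.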
Measures of spread?
- Variance and standard deviation
- Range
- Interquartile Range
Variance?
Average squared distance from the mean
Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value.
Why don’t we use variance?
Variance is expressed in squared units, so it is not easy to interpret or apply
Standard Deviation?
Square root of the variance
The standard deviation is the average amount of variability in your data set. It tells you, on average, how far each score lies from the mean.
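The relationship between variance and standard deviation can be sketched as follows. The data are hypothetical, and the sketch assumes the population formulas (dividing by n); sample formulas divide by n - 1 instead and are available as `variance` and `stdev` in the same module.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Variance by hand: the average squared distance from the mean.
center = mean(data)
squared_distances = [(x - center) ** 2 for x in data]
variance_by_hand = sum(squared_distances) / len(data)   # 32 / 8 = 4

# Standard deviation: the square root of the variance,
# which brings the result back to the original units.
sd_by_hand = sqrt(variance_by_hand)                     # 2.0

# The library functions give the same population values.
print(variance_by_hand, pvariance(data))
print(sd_by_hand, pstdev(data))
```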
Why is standard deviation ideal to use?
- Expressed in the same units as the data (no squared units)
- Caveat: it is still affected by outliers
Which kind of distribution do we use variance and standard deviation for?
We use the standard deviation together with the mean for symmetrical distributions.
We do not use them for asymmetrical distributions.
When are statistics stronger for variance and standard deviation?
When the standard deviation is low, meaning the values cluster closely around the mean
What is standard deviation a good descriptor of?
Good descriptor of spread for normal, symmetrical distributions
Range?
The range is the difference between the maximum and minimum values
Range equation?
Range = Maximum value - Minimum value
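Applied to a small made-up sample, the range equation looks like this in Python:

```python
# Hypothetical sample.
values = [4, 9, 1, 7]

# Range = maximum value - minimum value.
value_range = max(values) - min(values)

print(value_range)  # 8
```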
Range or interquartile range: which is preferred?
Interquartile range
Interquartile range?
- The difference between the 75th and 25th percentiles (Q3 - Q1)
- Covers the middle 50% of values
- The interquartile range (IQR) is a measure of statistical dispersion, that is, the spread of the data.
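One way to compute the IQR is with `statistics.quantiles` from the Python standard library, sketched below on hypothetical data. Note that several quartile conventions exist; this sketch relies on the function's default (`method="exclusive"`), and other tools or conventions may give slightly different quartiles for the same data.

```python
from statistics import quantiles

# Hypothetical sample of eleven values.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the data into quarters and returns the three cut points:
# Q1 (25th percentile), Q2 (median), Q3 (75th percentile).
q1, q2, q3 = quantiles(data, n=4)

# IQR = Q3 - Q1, the width of the middle 50% of the data.
iqr = q3 - q1

print(q1, q3, iqr)  # 3.0 9.0 6.0
```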
Nominal Data:
- Mode
- Median, IQR, Range
- Mean, SD
Mode: Yes
Median: No (can’t establish a logical order)
Mean, SD: No
Ordinal Data:
- Mode
- Median, IQR, Range
- Mean, SD
Mode: Yes
Median: Yes (data has order)
Mean, SD: No
Interval Data:
- Mode
- Median, IQR, Range
- Mean, SD
Mode: Yes
Median: Yes
Mean, SD: Yes
Ratio Data:
- Mode
- Median, IQR, Range
- Mean, SD
Mode: Yes
Median: Yes
Mean, SD: Yes
Why do we prefer to use the IQR rather than the range?
When measuring variability, statisticians prefer using the interquartile range instead of the full data range because extreme values and outliers affect it less. Typically, use the IQR with a measure of central tendency, such as the median, to understand your data’s center and spread.
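The difference in sensitivity can be seen in a small sketch. The data and the helper name `spread_summary` are hypothetical, and the quartiles use the default convention of `statistics.quantiles`, which is only one of several in use.

```python
from statistics import quantiles

def spread_summary(data):
    """Return (range, IQR) for a list of numbers."""
    q1, _, q3 = quantiles(data, n=4)
    return max(data) - min(data), q3 - q1

# Hypothetical sample, and the same sample with its largest value
# replaced by an extreme outlier.
clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]

clean_range, clean_iqr = spread_summary(clean)
outlier_range, outlier_iqr = spread_summary(with_outlier)

print("range:", clean_range, "->", outlier_range)
print("IQR:  ", clean_iqr, "->", outlier_iqr)
```

In this example the range jumps from 10 to 999, while the IQR stays at 6.0, because the outlier lies outside the middle 50% of the data.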