Descriptive Statistics and Descriptive Research Flashcards
What is the difference between continuous data and categorical data?
Continuous data take numeric values measured on a scale (age, weight, height), while categorical data take values that are group labels (gender, race, exposure/disease status).
Categorical data can be divided into ________ or __________. What is the difference?
- Nominal: numerals are category labels with no inherent order
- Ordinal: numbers indicate rank order
List these as categorical or continuous:
- Age
- Sex
- Race
- Education
- Language
- STOFHLA
- Age = continuous
- Sex = categorical
- Race = categorical
- Education = categorical
- Language = categorical
- STOFHLA = continuous
What 3 things do we do to describe continuous data?
- Check its distribution (symmetric, skewed left or right, bimodal, multimodal)
- Measure its center
- Measure its spread
Draw a __________ to check the distribution of a continuous variable.
histogram
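As a rough sketch of what a histogram does, the Python below groups some ages into 10-year bins and prints one asterisk per value in each bin. The ages and the bin width are made up for illustration and are not from the flashcards.

```python
from collections import Counter

# Hypothetical ages, invented for illustration only
ages = [22, 25, 27, 31, 34, 35, 36, 38, 41, 43, 44, 47, 52, 58, 63]
bin_width = 10  # assumed bin width


def bin_counts(values, width):
    """Count how many values fall in each bin [start, start + width)."""
    return Counter((v // width) * width for v in values)


counts = bin_counts(ages, bin_width)

# One row per bin; the row length shows the frequency in that bin
for start in sorted(counts):
    print(f"{start}-{start + bin_width - 1}: {'*' * counts[start]}")
```

The tallest row sits in the middle and the rows shrink toward both ends, which is the kind of shape to look for when checking whether a variable is roughly symmetric.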
The normal distribution is a probability distribution that is:
- _________ about the center
- data near the center are more _________
- a ____ curve in graph form
- symmetric
- frequent
- bell
What are some non-normal distributions of data?
- Bimodal
- Skewed to the right
- Skewed to the left
- Multimodal
What are 3 ways to measure the data “center”? Which is the most commonly used measure of central tendency?
- mean (most common)
- median
- mode
When is median better than mean?
When the data contain an outlier or are skewed.
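A small sketch of why the median holds up better: the Python below compares mean and median before and after adding one extreme value. The weights are hypothetical, and the 200 kg value is an assumed outlier.

```python
from statistics import mean, median

# Hypothetical weights in kg, invented for illustration
weights = [60, 62, 65, 67, 70]
# Same data plus one assumed outlier
weights_with_outlier = weights + [200]

print("without outlier:", mean(weights), median(weights))
print("with outlier:   ", mean(weights_with_outlier), median(weights_with_outlier))
```

In this example the single outlier pulls the mean from about 65 up to about 87, while the median only moves from 65 to 66.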
Fill in mean, median, and mode in the correct order:
- Symmetric data: ____ = ______ = ______
- Data skewed to the right: _____ < ______ < _____
- Data skewed to the left: ______ < _____ < _____
- Symmetric data: mean = median = mode
- Data skewed to the right: mode < median < mean
- Data skewed to the left: mean < median < mode
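The ordering can be checked on a toy example. The sketch below uses a made-up right-skewed list and its mirror image as a left-skewed list. The ordering is a rule of thumb: it holds for these invented numbers but is not guaranteed for every skewed dataset.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: a long tail of large values
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image of the same data, giving a long tail of small values
left_skewed = [16 - x for x in right_skewed]


def center(values):
    """Return (mode, median, mean) for a list of numbers."""
    return mode(values), median(values), mean(values)


print("right-skewed (mode, median, mean):", center(right_skewed))
print("left-skewed  (mode, median, mean):", center(left_skewed))
```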
What are 3 ways to measure the data “spread”?
- Standard deviation (SD)
- Range
- Interquartile range (IQR)
- What is standard deviation?
- Standard deviation is the most common measure of the spread of the data around the _____.
- square root of [(sum of squared deviations from the mean) / (total number of values - 1)]
- mean
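The formula above can be sketched step by step in Python and compared with the standard library's `statistics.stdev`, which also divides by n - 1. The heights are hypothetical, and `sample_sd` is an illustrative helper name.

```python
from math import sqrt
from statistics import stdev


def sample_sd(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    avg = sum(values) / n
    squared_deviations = [(x - avg) ** 2 for x in values]
    return sqrt(sum(squared_deviations) / (n - 1))


# Hypothetical heights in cm, invented for illustration
heights = [150, 160, 165, 170, 180]

print(sample_sd(heights))  # manual formula
print(stdev(heights))      # standard library, same n - 1 denominator
```

For these values the mean is 165, the squared deviations sum to 500, and 500 / 4 = 125, so both calls should give the square root of 125, about 11.18.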
- What is range?
- Range is a measure of spread commonly reported alongside the _____.
- When can range be misleading?
- max-min
- median
- in data with an outlier
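To see how one outlier distorts the range, the sketch below computes max - min for a made-up set of ages with and without an extreme value. The data and the helper name `value_range` are illustrative assumptions.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical ages, invented for illustration
ages = [30, 32, 35, 37, 40]
# Same data plus one assumed outlier
ages_with_outlier = ages + [95]

print("range without outlier:", value_range(ages))
print("range with outlier:   ", value_range(ages_with_outlier))
```

Here the range jumps from 10 to 65 even though five of the six values still lie within 10 years of each other.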
- What is IQR?
- What is meant by Q1,Q2,Q3?
- IQR measures spread better than range when the data has an ________.
- IQR = Q3 - Q1
- Q1 = the value at the first quarter mark of the ordered data (25th percentile)
- Q2 = the value at the second quarter mark (50th percentile) = median
- Q3 = the value at the third quarter mark (75th percentile)
- outlier
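A minimal sketch of the quartiles and IQR, using `statistics.quantiles` from the Python standard library. Several conventions exist for locating quartiles, and this uses the library's default ("exclusive") method, so other tools may give slightly different values. The scores are hypothetical.

```python
from statistics import quantiles


def quartiles_and_iqr(values):
    """Return (Q1, Q2, Q3, IQR) using the library's default quartile method."""
    q1, q2, q3 = quantiles(values, n=4)
    return q1, q2, q3, q3 - q1


# Hypothetical scores, invented for illustration
scores = [30, 32, 35, 37, 40, 42, 45]
# Same data with the largest value replaced by an assumed outlier
scores_with_outlier = [30, 32, 35, 37, 40, 42, 95]

for label, data in (("no outlier", scores), ("outlier", scores_with_outlier)):
    q1, q2, q3, iqr = quartiles_and_iqr(data)
    print(f"{label}: Q1={q1} Q2={q2} Q3={q3} IQR={iqr} range={max(data) - min(data)}")
```

In this example the outlier raises the range from 15 to 65, while Q1, Q3, and the IQR are unchanged.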
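- For normally distributed data, about __% of the data falls within 1 standard deviation of the mean.
- For normally distributed data, about __% of the data falls within 2 standard deviations of the mean.
- For normally distributed data, about __% of the data falls within 3 standard deviations of the mean.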
- 68%
- 95%
- 99.7%
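These percentages apply to the normal distribution and can be checked with `statistics.NormalDist` from the Python standard library. The sketch below computes the share of a standard normal distribution lying within k standard deviations of the mean. `share_within` is an illustrative helper name.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Probability that a normal value lies within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The printed values should come out close to 68.3%, 95.4%, and 99.7%, which the flashcards round to 68%, 95%, and 99.7%.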