STATS Lec-1 Descriptive stats Flashcards
1
Q
Types of statistics
A
- Descriptive: describing data sets. This lets us take a lot of data and summarise it so that it can be understood by many people
- Inferential: making inferences about the general population from samples of data (looking for similarities or differences in a data set), i.e. was this pattern due to chance or a real effect?
2
Q
Types of descriptive statistics
A
- Measures of central tendency- Where is the middle of the data set, or the most common trend
- Measures of dispersion: How variable is the data? The spread can be very broad or narrow, and this helps to describe and define a data set we have collected
- Measures of skewness and kurtosis: Is the data set symmetrical around the central tendency? (Skewness measures asymmetry; kurtosis describes how peaked or heavy-tailed the distribution is)
- Graphical representations
- Raw data
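
A minimal sketch of how skewness and kurtosis could be computed, assuming the common moment-based definitions (the average cubed and fourth-power z-scores). The function names and the data sets are made up for illustration and are not from the lecture.

```python
from statistics import mean, pstdev

def skewness(scores):
    """Average cubed z-score: about 0 if symmetrical, > 0 if the long tail is on the right."""
    m, sd = mean(scores), pstdev(scores)  # assumes the scores are not all identical (sd > 0)
    return mean([((x - m) / sd) ** 3 for x in scores])

def excess_kurtosis(scores):
    """Average fourth-power z-score minus 3 (3 is the value for a normal distribution)."""
    m, sd = mean(scores), pstdev(scores)
    return mean([((x - m) / sd) ** 4 for x in scores]) - 3

symmetric = [1, 2, 3, 4, 5]            # hypothetical, evenly spread around 3
right_skewed = [1, 1, 2, 2, 3, 10]     # hypothetical, one large score on the right

print(skewness(symmetric))             # essentially zero
print(skewness(right_skewed))          # positive
print(excess_kurtosis(symmetric))      # negative: flatter than a normal distribution
```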
3
Q
3 measures of central tendency- Mean
A
- What we normally think of as the average
- The arithmetic average
- Add up all of the scores and divide by the number of scores
+ Takes into account the value of all scores
- BUT is affected by anomalies or extremes of value
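
A short Python sketch of the mean and its sensitivity to extreme values. The scores are invented for illustration.

```python
from statistics import mean

scores = [4, 5, 5, 6, 5]          # made-up scores
with_outlier = scores + [95]      # the same scores plus one extreme value

# Add up all of the scores and divide by the number of scores.
manual_mean = sum(scores) / len(scores)
library_mean = mean(scores)
outlier_mean = mean(with_outlier)

print(manual_mean)     # 5.0
print(library_mean)    # same value, from the standard library
print(outlier_mean)    # one extreme score drags the mean up to 20
```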
4
Q
3 measures of central tendency- Median
A
- The middle score if all the scores are put in rank order of size
+ Less affected by outliers (anomalies)
- Provides less detail than the mean
- NB in a data set {1,2,2,3,3,4} median = (2+3) / 2 = 2.5
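
The rank-and-take-the-middle rule can be sketched in Python as below. `median_by_rank` is a hypothetical helper written for this card, and `odd_set` is made-up data; the standard library's `statistics.median` gives the same answers.

```python
from statistics import median

def median_by_rank(scores):
    """Put the scores in rank order and return the middle one (or the mean of the middle two)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                       # odd count: a single middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the middle pair

even_set = [1, 2, 2, 3, 3, 4]    # the data set from the card
odd_set = [1, 2, 2, 3, 50]       # hypothetical set with an outlier

print(median_by_rank(even_set))  # 2.5
print(median(even_set))          # 2.5
print(median_by_rank(odd_set))   # 2, unaffected by the extreme score 50
```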
5
Q
3 measures of central tendency- Mode
A
- The most common score
+ Not at all affected by outliers
- Very crude- doesn’t give a lot of detail about the data set
- Often mode is used for non-numerical data
- Qualitative data
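
A small sketch of finding the mode of non-numerical data in Python. The eye-colour list is invented for illustration.

```python
from collections import Counter
from statistics import mode

eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]  # made-up data

counts = Counter(eye_colours)            # frequency of each category
most_common_colour = mode(eye_colours)   # the most common score

print(counts.most_common())   # [('brown', 3), ('blue', 2), ('green', 1)]
print(most_common_colour)     # brown
```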
6
Q
Nominal data
A
- Data that are neither measured nor ordered; subjects are merely allocated to distinct categories
- We can only use the mode
7
Q
Ordinal data
A
- Data with naturally ordered categories, where the distances between the categories are not known
- e.g. exam results, counted as the number of students that got each result
- You can use the median or the mode
8
Q
Categorical data: Uniform distribution
A
- A uniform distribution is when every category occurs with roughly the same frequency
- Measures of central tendency are often useless here: there is no central tendency because the frequencies are all similar
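
A rough Python illustration of why the mode tells you little about a uniform distribution. The fruit data are made up; `multimode` (Python 3.8+) returns every category tied for the highest frequency.

```python
from collections import Counter
from statistics import multimode

# Made-up categorical data: four categories, five observations each.
uniform = ["apple", "pear", "plum", "kiwi"] * 5
# The same data with a single extra "apple".
near_uniform = uniform + ["apple"]

uniform_modes = multimode(uniform)            # every category ties for most common
near_uniform_modes = multimode(near_uniform)  # "apple" wins, but only by one

counts = Counter(near_uniform)
frequency_gap = max(counts.values()) - min(counts.values())

print(uniform_modes)        # ['apple', 'pear', 'plum', 'kiwi']
print(near_uniform_modes)   # ['apple']
print(frequency_gap)        # 1
```

In the second data set the mode exists, but its frequency is only just above the others, so reporting it alone would be misleading.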
9
Q
Appropriate measures for different distributions: central tendency
A
- The mean is sensitive to outliers, so it is not always a good choice when outliers or extreme scores are present; the median may be a better measure
- Neither the mean nor the median is useful for nominal (unordered categorical) data; the mode is more appropriate
- The mode can be misleading if its frequency is only just greater than that of other scores or categories
10
Q
Appropriate measures for different distributions: spread of data
A
- Range: maximum score minus minimum score
- The range is suitable for ordinal or interval data
- The range is limited in its descriptive power
- Interquartile range: split the ranked data into quartiles and calculate the difference between the 3rd and 1st quartiles
- The interquartile range is useful when the median is used as the measure of central tendency
- Looked at carefully, the quartiles can give you clues as to the shape of the distribution
- The most powerful measures are the variance and the standard deviation (measures of deviation from the mean)
- These are mathematical measures, most appropriate for (roughly) normally distributed data
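
A minimal Python sketch of the measures of spread above, using made-up scores. Two assumptions to note: quartiles can be computed under several conventions (this uses the default method of `statistics.quantiles`, so other tools may give slightly different values), and the variance here is the population version (`pvariance`); use `variance` and `stdev` instead when treating the scores as a sample.

```python
from statistics import mean, pstdev, pvariance, quantiles

scores = [2, 4, 4, 4, 5, 5, 7, 9]     # made-up scores

# Range: maximum score minus minimum score.
score_range = max(scores) - min(scores)

# Interquartile range: difference between the 3rd and 1st quartiles.
q1, q2, q3 = quantiles(scores, n=4)   # q2 is the median
iqr = q3 - q1

# Variance: average squared deviation from the mean (population version).
m = mean(scores)
manual_variance = sum((x - m) ** 2 for x in scores) / len(scores)
pop_variance = pvariance(scores)

# Standard deviation: square root of the variance.
pop_sd = pstdev(scores)

print(score_range)       # 7
print(iqr)               # depends on the quartile convention used
print(manual_variance)   # 4.0
print(pop_sd)            # 2.0
```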