Descriptive statistics and representing your data
What is the difference between mean, median, and mode?
Mean (X̄): the total of all scores divided by the number of scores obtained
Median (Md): the middle value in a series of values (put the scores in rank order and find the point that splits them into two equal halves)
Mode (Mo): the most frequently occurring value of a variable
The three averages will usually be similar, but not when there is:
- a small number of participants
- extreme high/low scores (outliers)
- data that isn’t ‘normally’ distributed
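As a quick illustration of how one extreme score pulls the three averages apart, here is a minimal sketch using Python's standard `statistics` module. The scores are made up for the example and are not taken from any real sample.

```python
import statistics

# Hypothetical scores: nine typical values plus one extreme high score (40).
scores = [4, 5, 5, 5, 6, 6, 7, 7, 8, 40]

mean_score = statistics.mean(scores)      # total of scores / number of scores
median_score = statistics.median(scores)  # middle value once scores are ranked
mode_score = statistics.mode(scores)      # most frequently occurring value

print(mean_score, median_score, mode_score)  # 9.3 6.0 5
```

The single score of 40 drags the mean up to 9.3, while the median and mode stay close to the typical scores.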
What is central tendency?
Central tendency refers to the middle of a data set or where the scores in a data set tend to fall
Central tendency measures give us an idea of the typical score in a sample.
Another important aspect is how dispersed (spread out) the scores are from the average - a little vs. a lot?
(i.e. how much variation there is in a sample of scores?)
We can examine this:
- visually
- by calculating some more descriptive statistics!!
How do you visualise dispersion?
Histograms show:
- the frequency of each score obtained
- a quick visual of central tendency
- the extent of dispersion
- extreme high/low scores
A density plot (a smoothed curve) can be overlaid to show the overall shape of the distribution.
What is meant by dispersion and how do you calculate it?
The extent to which the data vary around the average
(i.e. how good is your average?)
Variance: s² = ∑(x − x̄)² / (n − 1)
(the sum of the squared deviations from the mean, divided by the number of participants − 1)
Standard deviation: s = √variance
Interquartile range (IQR): 75th percentile − 25th percentile (Q3 − Q1)
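The three dispersion measures above can be worked through step by step. This is a sketch with made-up scores; note that `statistics.quantiles` uses one particular convention for quartiles, and other software may give slightly different quartile values for small samples.

```python
import math
import statistics

# Hypothetical scores from eight participants.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)

mean_score = sum(scores) / n

# How far each score is from the mean, squared so negatives do not cancel out.
squared_deviations = [(x - mean_score) ** 2 for x in scores]

# Variance: sum of squared deviations divided by (number of participants - 1).
variance = sum(squared_deviations) / (n - 1)

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

# Interquartile range: 75th percentile minus 25th percentile.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(round(variance, 2), round(sd, 2), iqr)
```

The hand calculation agrees with the library functions `statistics.variance` and `statistics.stdev`, which also divide by n − 1.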
Mean average: the total sum of all scores divided by the number of scores.
Variance: calculated by determining how much each score differs from the mean average, squaring each difference, adding the squared differences up and then dividing by the number of scores − 1.
Std Deviation: calculated by finding the square root of the variance. You can then describe how your data are dispersed around the mean in a comparable unit of measurement, e.g. ±1 SD, ±2 SD etc. Any score more than 3 SD above or below the mean is commonly treated as a ‘statistical outlier’.
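The 3 SD rule of thumb can be sketched by converting each score into the number of SDs it lies from the mean. The data below are invented for illustration. One caveat worth knowing: in very small samples no score can ever reach 3 SD, so the example uses 20 scores.

```python
import statistics

# Hypothetical sample: 19 scores clustered around 50, plus one extreme score.
scores = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49,
          51, 52, 48, 50, 51, 49, 50, 52, 48, 120]

mean_score = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample SD (divides by n - 1)

# Distance of each score from the mean, in SD units.
sd_units = [(x - mean_score) / sd for x in scores]

# Flag scores lying more than 3 SD above or below the mean.
outliers = [x for x, z in zip(scores, sd_units) if abs(z) > 3]

print(outliers)  # [120]
```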
What are outliers?
Outliers are extreme high or low scores. They are NOT representative of the group, so they can be removed from the data, transformed in some way, or analysed with non-parametric statistics instead!
What are the common ways to visualise data?
- Histograms
- Scattergrams or scatterplots
- Boxplots
- Error bars/bar charts/line graphs
Histogram
A graph showing the frequency of the scores (per group, in this example).
Here it is overlaid with a density curve, which smooths out the overall trend of the distribution.
A histogram is a graphical display of data using bars of different heights. In a histogram, each bar groups numbers into ranges. Taller bars show that more data falls in that range. A histogram displays the shape and spread of continuous sample data.
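To make the idea of grouping numbers into ranges concrete, here is a small sketch that counts how many scores fall into each range and prints a text version of a histogram. The scores and the bin width of 10 are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical scores.
scores = [12, 15, 21, 22, 25, 27, 31, 33, 34, 35, 38, 41, 44, 52]
bin_width = 10  # assumed width of each range

# Map each score to the start of its range (e.g. 27 -> 20) and count them.
bin_counts = Counter((s // bin_width) * bin_width for s in scores)

# One row per range; a longer row of # means more data falls in that range.
for start in sorted(bin_counts):
    print(f"{start}-{start + bin_width - 1}: {'#' * bin_counts[start]}")
```

Here the 30-39 range has the longest row, which is the text equivalent of the tallest bar.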
Scattergrams or scatterplots
- Each dot is a pair of scores from one participant
- The dots together show the pattern of association
- Line of best fit?
A scatterplot is a type of data display that shows the relationship between two numerical variables. Each member of the dataset is plotted as a point whose (x, y) coordinates relate to its values for the two variables.
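A line of best fit can be calculated from the pairs of scores. The sketch below uses ordinary least squares, which is one common way to fit the line; the variable names and data (hours studied and marks) are hypothetical.

```python
# Hypothetical pairs of scores: one (hours, marks) pair per participant.
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(marks) / n

# How x and y vary together, and how x varies on its own.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks))
sxx = sum((x - mean_x) ** 2 for x in hours)

slope = sxy / sxx                     # change in marks per extra hour
intercept = mean_y - slope * mean_x   # predicted marks at zero hours

print(round(slope, 2), round(intercept, 2))  # 4.1 47.7
```

A positive slope like this one corresponds to dots that rise from left to right on the scatterplot.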
Boxplots
- The thick line in the middle is the median
- The box contains the middle 50% of the data (between the lower and upper quartiles)
- Whiskers are the highest/lowest scores not calculated as ‘extreme’
(such scores are depicted as separate dots if present)
A Box and Whisker Plot (or Box Plot) is a convenient way of visually displaying the data distribution through their quartiles. It is often used in exploratory data analysis.
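The numbers behind a boxplot can be sketched as follows. The cards do not say how a score gets classed as ‘extreme’; this example assumes the common convention of anything more than 1.5 × IQR beyond the box, and the scores are made up.

```python
import statistics

# Hypothetical scores, sorted for readability.
scores = sorted([3, 5, 6, 7, 7, 8, 9, 10, 11, 25])

# Quartiles: the box runs from q1 to q3, with the median line inside it.
q1, median, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Assumed rule: scores beyond 1.5 * IQR from the box count as 'extreme'.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the lowest/highest scores that are not extreme.
inside = [x for x in scores if lower_fence <= x <= upper_fence]
lower_whisker, upper_whisker = min(inside), max(inside)

# Extreme scores would be drawn as separate dots.
extreme = [x for x in scores if x < lower_fence or x > upper_fence]

print(lower_whisker, q1, median, q3, upper_whisker, extreme)
```

With these scores the whiskers run from 3 to 11 and the score of 25 is plotted as a separate dot.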
Error bars/bar charts/line graphs
- Similar to a box plot…
- The dot/top of column/peak of line is the mean average
- The error whiskers typically show the standard deviation
Error bars are graphical representations of the variability of data and are used on graphs to indicate the error or uncertainty in a reported measurement. When standard error (SE) bars do not overlap, you still cannot be sure that the difference between two means is statistically significant. In the example (experiment 1), the error bars do not overlap, yet the difference is not statistically significant (P = 0.09 by unpaired t test).
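Since error bars may show either the standard deviation or the standard error, it helps to see how the two differ. This is a sketch with invented scores; it assumes the usual definition of the standard error of the mean as SD divided by the square root of the number of scores.

```python
import math
import statistics

# Hypothetical scores for one group.
group = [10, 12, 9, 11, 13, 10, 12, 11]

mean_score = statistics.mean(group)
sd = statistics.stdev(group)        # spread of the individual scores
se = sd / math.sqrt(len(group))     # uncertainty in the mean itself

# Ends of the error bars around the mean for each choice.
sd_bar = (mean_score - sd, mean_score + sd)
se_bar = (mean_score - se, mean_score + se)

print(round(sd, 2), round(se, 2))
```

SE bars are always narrower than SD bars for the same data, so a graph should state which one its whiskers show.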
A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent.
A line graph, also known as a line chart, is a type of chart used to visualize the value of something over time
The average scores (mean, median and mode) will be similar, but not if there is:
- a small number of participants
- extreme high/low scores (outliers)
- data that isn’t ‘normally’ distributed
Misleading graphs
- no labels on the horizontal and vertical axes, so we do not know what the bars represent
- missing title, so we do not know anything about the data
- vertical axis has an uneven scale, which makes it appear that the first bar is closer to the second bar than it is