descriptive statistics Flashcards

Question 1

Q

Categorical Data (factor, nominal)

Answer

A

No particular relationship between the different possibilities
Example: what prison sentence does someone have?
Answers might be suspended, determinate, indeterminate
Can’t average them or do maths with them
Doesn’t make sense to talk about “average” prison sentence
But you could talk about the most / least frequently occurring prison sentence
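
A minimal R sketch of the point above, assuming made-up sentence data (the values are invented for illustration and do not come from any real dataset). Counting categories is meaningful; averaging them is not.

```r
# Made-up prison sentence data (hypothetical values for illustration only)
sentence <- factor(c("suspended", "determinate", "determinate",
                     "indeterminate", "determinate", "suspended"))

# Count how often each category occurs
sentence_counts <- table(sentence)
print(sentence_counts)

# Most and least frequently occurring categories
most_common  <- names(sentence_counts)[which.max(sentence_counts)]
least_common <- names(sentence_counts)[which.min(sentence_counts)]
print(most_common)
print(least_common)

# Calling mean(sentence) would only give NA with a warning,
# because averaging is not defined for categories
```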

Question 2

Q

Continuous data (interval)

Answer

A

Goes in a specific order
Example: what is a patient’s weight today?
Can do maths with interval data
It would make sense to talk about average weight or weight increase or decrease
It’s meaningful to say that someone who weighs 60kg is 10kg heavier than someone who weighs 50kg

Question 3

Q

Ordinal Data (ordered categorical, ordered factor, Likert scale)

Answer

A

• Like categorical but there is an order to the sequence
• Example: how tired are you feeling right now? Pick one of the following options
1. Very tired
2. Tired
3. Alert
4. Very alert
• Can’t do maths with ordinal data
• Like categorical data, we can talk about the most chosen and least chosen options, but not the average tiredness
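
One way to represent the tiredness example in R is an ordered factor. This is only a sketch with invented responses; the variable names are assumptions, not part of the course materials.

```r
# Made-up responses to the tiredness question (hypothetical data)
responses <- c("Tired", "Very tired", "Alert", "Tired", "Very alert", "Tired")

# ordered = TRUE tells R that the levels form a sequence
tiredness <- factor(responses,
                    levels = c("Very tired", "Tired", "Alert", "Very alert"),
                    ordered = TRUE)

# Order comparisons are meaningful for ordered factors:
# is "Alert" (3rd response) further along the scale than "Tired" (1st response)?
is_later <- tiredness[3] > tiredness[1]
print(is_later)

# Counts per option, and the most chosen option
option_counts <- table(tiredness)
most_chosen <- names(option_counts)[which.max(option_counts)]
print(option_counts)
print(most_chosen)
```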

Question 4

Q

Mode

Answer

A

• The score / value / number / response that happens most often

• You can have more than one mode
• One mode = unimodal
• Two modes = bimodal
• Can take modes of continuous data too
• Example: what is the mode of the variable bdi.8m? A score of 0 is the mode (it has 7 appearances or “counts” in the data)
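
Base R has no built-in function for the statistical mode (its mode() function reports the storage type instead), so a small helper is needed. The sketch below uses a hypothetical helper called get_modes and invented scores, not the real bdi.8m data. Note that it returns the modes as text labels.

```r
# Hypothetical helper: returns every value tied for the highest count
get_modes <- function(x) {
  counts <- table(x)
  names(counts)[as.vector(counts == max(counts))]
}

# Made-up scores (not the real bdi.8m data)
scores <- c(0, 0, 0, 2, 5, 5, 9, 14)
get_modes(scores)        # one mode ("0"): unimodal

two_peaks <- c(1, 1, 4, 4, 7)
get_modes(two_peaks)     # two modes ("1" and "4"): bimodal
```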

Question 5

Q

median

Answer

A

The middle number
Only useful for continuous data
Sort data in ascending order
Find the number in the middle of the dataset
You cannot have more than one median
If there is no single middle value (an even number of datapoints), take the average of the two middle values
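
The steps above can be checked in R with sort() and median(). The numbers are made up for illustration.

```r
# Made-up data with an odd number of values
odd_values <- c(7, 3, 9, 1, 5)
sort(odd_values)        # 1 3 5 7 9
median(odd_values)      # 5, the middle value

# Made-up data with an even number of values
even_values <- c(7, 3, 9, 1)
sort(even_values)       # 1 3 7 9
median(even_values)     # 5, the average of the two middle values (3 and 7)
```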

Question 6

Q

mean

Answer

A

What most people mean when they say “average”
Only useful for continuous data
It is the sum (total) of all the values divided by the number of values

 Would need to know more about the measurement scale used
Let’s formalise that in a formula: $\bar{X} = \frac{\sum x}{n}$
The mean is sensitive to outliers
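
A short R sketch of the definition (sum of the values divided by the number of values), using invented numbers. It also shows how a single extreme value pulls the mean much more than the median.

```r
# Made-up data
values <- c(2, 4, 6, 8)

# The mean by hand: sum of the values divided by how many there are
sum(values) / length(values)   # 5
mean(values)                   # 5, the built-in function gives the same answer

# Add one extreme value and compare the mean with the median
with_outlier <- c(values, 100)
mean(with_outlier)             # 24
median(with_outlier)           # 6
```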

Question 7

Q

variability

Answer

A

So far we have talked about centres or “averageness” in the data
Another type of statistic we need to calculate to understand our data: measures of variability
How spread out the data are
How far away from the mean or median the datapoints tend to be
Example: the mean of bdi.pre was 23.33. How near to this value do most of the patients’ depression scores tend to be?

Question 8

Q

range

Answer

A

Simply the highest value minus the lowest value
Example: 14 - 0 = 14
Tells us the boundaries of our data
Useful to detect outliers or data input errors
But doesn’t tell you how common really high or low numbers are
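
In R, range() returns the lowest and highest values, and the range statistic is their difference. The scores below are invented so that the minimum and maximum match the 14 - 0 example.

```r
# Made-up scores whose lowest and highest values match the example above
scores <- c(0, 3, 5, 8, 14)

range(scores)                 # 0 14 (lowest and highest value)
max(scores) - min(scores)     # 14
diff(range(scores))           # 14, the same thing in one step
```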

Question 9

Q

interquartile range

Answer

A

Split our data into quarters
Each quarter contains 25% of the datapoints
To do that we need to find the quartiles
The three points that split the data into the quarters
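
A sketch of finding the three quartiles in R with quantile(), using invented data. Textbooks and software use slightly different rules for locating quartiles, so hand calculations may differ a little from R's default.

```r
# Made-up data
scores <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# The three cut points that split the sorted data into quarters
quartiles <- quantile(scores, probs = c(0.25, 0.50, 0.75))
print(quartiles)   # 25% = 3, 50% = 5 (the median), 75% = 7
```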

Question 10

Q

interpreting the IQR

Answer

A

Example: the first 13 people in the variable bdi.8m have a depression score IQR of 9
The IQR is the range (max - min) of the middle 50% of the data
The IQR plays a key role in data visualization (boxplots)
The IQR is useful because it is less affected by outliers than the measures on the following cards (variance and standard deviation)
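
To illustrate the robustness of the IQR, the sketch below (invented data) replaces the largest value with an extreme one. The plain range changes a lot, while the IQR stays the same.

```r
# Made-up data
scores <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
IQR(scores)                  # 4 (third quartile 7 minus first quartile 3)
diff(range(scores))          # 8

# Same data, but the largest value is now an outlier
with_outlier <- c(1, 2, 3, 4, 5, 6, 7, 8, 90)
IQR(with_outlier)            # still 4
diff(range(with_outlier))    # 89
```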

Question 11

Q

variance

Answer

A

How far numbers are spread out from the mean
Big number that isn’t useful on its own
Feeds into other statistics that we’ll use lots
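
A worked sketch of how the variance is built up, using invented numbers. One assumption to flag: R's var() divides by n - 1 (the sample variance); some courses introduce the version that divides by n first.

```r
# Made-up data with a mean of 5
values <- c(2, 4, 4, 6, 9)

# Step 1: distance of each value from the mean
deviations <- values - mean(values)    # -3 -1 -1  1  4

# Step 2: square the distances so negatives do not cancel positives
squared <- deviations^2                #  9  1  1  1 16

# Step 3: add them up and divide by n - 1
by_hand <- sum(squared) / (length(values) - 1)
by_hand                                # 7
var(values)                            # 7, the built-in function agrees
```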

Question 12

Q

interpreting the variance

Answer

A

Variance is
A really big number
Not in the original units (it is in squared units)
It is difficult to interpret, which is a problem with using this measure
It is not useful to say that the spread of depression scores before treatment was 89.12 around the mean
So we need to ‘undo’ the squaring that was done when calculating the variance, by taking the square root
This means the spread becomes interpretable in the same units as the data
Which gives us the standard deviation

Question 13

Q

standard deviation

Answer

A

Just the square root of the variance
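
A quick check in R that the standard deviation is the square root of the variance. The data are invented; the last line applies the same step to the variance of 89.12 quoted on the earlier card.

```r
# Made-up data
values <- c(2, 4, 4, 6, 9)

sqrt(var(values))    # square root of the variance
sd(values)           # the built-in standard deviation gives the same number

# Applying the same step to the variance quoted earlier
sqrt(89.12)          # about 9.44, back in the original units of the scores
```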

Question 14

Q

interpreting standard deviation

Answer

A

Roughly the average distance between the values in the dataset and the mean of that dataset
Most often used to understand the variability in continuous / ordinal data

Question 15

Q

Calculating skew: Pearson’s coefficient of skewness

Answer

A

Negative number means data are negatively skewed
Positive number means data are positively skewed
Symmetrical data has skew of zero

μ = mean
ν = median 
σ = standard deviation
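
The formula itself is not shown on this card. Given the three symbols listed, it is presumably Pearson's second (median) coefficient of skewness, 3(μ - ν)/σ; treat that as an assumption. A sketch in R, with a hypothetical helper name pearson_skew and invented data:

```r
# Hypothetical helper, assuming the formula 3 * (mean - median) / standard deviation
pearson_skew <- function(x) {
  3 * (mean(x) - median(x)) / sd(x)
}

# Made-up examples
symmetric  <- c(1, 2, 3, 4, 5)      # mean equals median
right_tail <- c(1, 2, 3, 4, 20)     # mean pulled above the median
left_tail  <- c(-20, 1, 2, 3, 4)    # mean pulled below the median

pearson_skew(symmetric)     # 0
pearson_skew(right_tail)    # positive number: positively skewed
pearson_skew(left_tail)     # negative number: negatively skewed
```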

Question 16

Q

RStudio: mean and standard deviation

Answer

A

describe(data, mean = mean(dataset), stdev = sd(dataset))

Question 17

Q

RStudio: descriptive statistics arranged by group

Answer

A

describe(data, by = group, mean = mean(dataset), stdev = sd(dataset))  # group is a placeholder for the grouping variable

Question 18

Q

RStudio: min and max

Answer

A

describe(data = data, mean_dataset = mean(dataset), SD_dataset = sd(dataset), max_dataset = max(dataset), min_dataset = min(dataset), by = Condition)

Question 19

Q

RStudio: example

Answer

A

describe(data = tetris, mean_intrusion = mean(Intrusion), SD_Intrusion = sd(Intrusion), max_Intrusion = max(Intrusion), min_Intrusion = min(Intrusion), by = Condition)
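
The describe() call above does not appear to be base R and presumably comes from an add-on package used on the course (an assumption). If that package is not available, a similar grouped summary can be sketched in base R with aggregate(). The data frame below is an invented stand-in: the column names follow the card, but the values and condition labels are made up.

```r
# Made-up stand-in for the tetris data (values and condition labels are invented)
tetris_demo <- data.frame(
  Condition = c("Control", "Control", "Control", "Tetris", "Tetris", "Tetris"),
  Intrusion = c(4, 6, 8, 1, 2, 3)
)

# Compute each statistic separately for every level of Condition
summary_by_condition <- aggregate(
  Intrusion ~ Condition,
  data = tetris_demo,
  FUN = function(x) c(mean = mean(x), sd = sd(x), max = max(x), min = min(x))
)
print(summary_by_condition)
```

The result has one row per condition, with the four statistics stored side by side under the Intrusion column.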

Question 20

Q

RStudio: correlation

Answer

A

cor(data)

gives you something like:
              exercise_mins stai_state
exercise_mins     1.0000000 -0.3985458
stai_state       -0.3985458  1.0000000
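
A small sketch of how such a matrix is produced and read, using an invented data frame (the column names follow the card, the values are made up). Each cell is the correlation between the row variable and the column variable, so the diagonal is always 1 and the matrix is symmetric.

```r
# Made-up data (not the dataset behind the output above)
demo <- data.frame(
  exercise_mins = c(10, 20, 30, 40, 50),
  stai_state    = c(50, 45, 47, 38, 35)
)

# Correlation matrix for every pair of columns
cor_matrix <- cor(demo)
print(cor_matrix)

# Pull out the single correlation of interest by row and column name
r <- cor_matrix["exercise_mins", "stai_state"]
print(r)

# Equivalent: correlate the two columns directly
cor(demo$exercise_mins, demo$stai_state)
```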

Question 21

Q

Calculate the variance explained in RStudio

Answer

A

r_exercise_anxiety = -0.3985458
r_exercise_anxiety^2  # variance explained (r squared), about 0.16

(21 cards)