Week Three - Data Types/Variables/Descriptive Statistics Flashcards

1
Q

What are the 4 main data types?

A

categorical/nominal
ordinal
interval
ratio

2
Q

Define Categorical/Nominal Variables

A

Discrete

An arbitrary label (e.g., male, non-smoker)

Label can be a name or a number
Name: vanilla, chocolate, strawberry
Number: 1, 18, 7 (the numbers act only as labels)

Only valid mathematical operation is counting

3
Q

Define key characteristics of an Ordinal Scale

A

Discrete

Inherent order (ranks)

Some information about quantity

Movement along the scale indicates a change in amount, but doesn’t indicate how much change

Can perform logical comparisons (greater than, less than) on this scale

(e.g., kinder, primary, high school, college, bachelor's, master's, PhD)

4
Q

Define key characteristics of an Interval Scale

A

Interval Scales

Order + equal intervals

Continuous (though measurement may not be)

Mathematical operations (addition, subtraction)

How much more (or less) of something is there?

Does not have true zero

If the scale has zero in it, 0 does not mean absence of the thing.

E.g., temperature (Celsius)
0° vs 5°; 25° vs 30°: the difference is 5° in both cases (0 °C does not mean no heat)

5
Q

Define the key characteristics of Ratio Scales

A

Order, equal intervals + a true zero

Physical quantities are ratio scale (mass, length, time, etc.)

0 kg = absence of mass; 0 meters = absence of length

Can calculate ratios of different values

50 kg is twice as much as 25 kg
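
A minimal sketch of why ratios only make sense with a true zero. The values and the conversion factor are illustrative assumptions, not part of the original card. The mass ratio survives a change of units, while the Celsius "ratio" changes when the same temperatures are expressed in Kelvin. Differences on the interval scale stay the same.

```python
# Ratio scale: mass has a true zero, so ratios survive a change of units.
mass_a_kg, mass_b_kg = 50.0, 25.0
KG_TO_LB = 2.20462  # approximate conversion factor (assumption for illustration)
ratio_kg = mass_a_kg / mass_b_kg
ratio_lb = (mass_a_kg * KG_TO_LB) / (mass_b_kg * KG_TO_LB)

# Interval scale: Celsius has an arbitrary zero, so ratios depend on the unit.
temp_a_c, temp_b_c = 30.0, 15.0
ratio_celsius = temp_a_c / temp_b_c                        # 2.0
ratio_kelvin = (temp_a_c + 273.15) / (temp_b_c + 273.15)   # about 1.05

# Differences are meaningful on an interval scale: same in both units.
diff_celsius = temp_a_c - temp_b_c
diff_kelvin = (temp_a_c + 273.15) - (temp_b_c + 273.15)

print(ratio_kg, ratio_lb)
print(ratio_celsius, ratio_kelvin)
print(diff_celsius, diff_kelvin)
```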

6
Q

What are the 2 forms of discrete variables?

A

Categorical and ordinal

7
Q

What is the Mode?

A

Most commonly occurring value in a set

Sample can have more than one mode
Bimodal = two modal values
Multimodal = more than two modal values
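
As a quick illustration, Python's standard library can return every modal value. The scores below are made up for the example.

```python
from statistics import multimode

scores = [2, 3, 3, 5, 7, 7, 9]  # illustrative data

# multimode returns all values tied for the highest count,
# in the order they first appear in the data.
modes = multimode(scores)
print(modes)  # [3, 7] -> this sample is bimodal
```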

8
Q

What is the Median?

A

Middle value of the ordered data: the same number of observations lie below and above it

9
Q

What is the Mean?

A

Arithmetic average (sum of the scores divided by the number of scores); the value around which scores are distributed
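
A short sketch comparing the mean and the median on the same made-up scores. The single large score pulls the mean up but leaves the median unchanged.

```python
from statistics import mean, median

scores = [2, 4, 4, 5, 10]  # illustrative data

average = mean(scores)   # sum of scores / number of scores = 25 / 5
middle = median(scores)  # middle value of the sorted scores

print(average)  # 5
print(middle)   # 4
```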

10
Q

What are the 3 most commonly used measures of spread/dispersion?

A

Range
IQR
Sample standard deviation (SD)

11
Q

Define the ‘Range’. What happens if a range score is an outlier?

A

Maximum - Minimum

If min and/or max is an outlier, the range overestimates variability in the data

Range tends to increase as sample size increases
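
A minimal sketch of how a single outlier inflates the range. The helper name `value_range` and the data are hypothetical, chosen only for illustration.

```python
def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

typical = [12, 14, 15, 15, 17, 18]  # illustrative data
with_outlier = typical + [60]       # same data plus one extreme value

print(value_range(typical))       # 6
print(value_range(with_outlier))  # 48
```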

12
Q

Define ‘quartiles’

A

Quartiles are the three cut points (Q1, Q2, Q3) that divide the ordered data into four equal-sized groups

13
Q

What is the lower quartile?

What is the upper quartile?

A

Lower quartile (Q1) = 25th percentile
Upper quartile (Q3) = 75th percentile

14
Q

What is the IQR?
What does it measure?
Bigger IQR = ?

A

The difference between Q3 and Q1

IQR measures how spread out the middle 50% of the data is

Bigger IQR = greater dispersion
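
A small sketch of computing quartiles and the IQR with Python's standard library. The data are made up, and note that statistical packages use slightly different quartile conventions, so results can differ a little between tools.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # illustrative data

# n=4 asks for the three cut points Q1, Q2 (median), Q3.
# The default method is "exclusive"; other conventions exist.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3)  # 3.0 6.0 9.0
print(iqr)         # 6.0
```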

15
Q

What is variance? What does it measure?

A

A measurement of the spread between numbers in a data set.

It measures how far each number in the set is from the mean and therefore from every other number in the set.

Variance is roughly the average of the squared differences from the mean
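
The "roughly" can be made concrete with a short sketch on made-up data. Dividing the summed squared differences by n gives the population variance. The sample variance divides by n - 1 instead, which is what `statistics.variance` computes.

```python
from statistics import mean, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data

m = mean(data)                                 # 5
squared_deviations = [(x - m) ** 2 for x in data]

# Exact average of squared differences (divide by n).
population_variance = sum(squared_deviations) / len(data)
# Sample variance (divide by n - 1).
sample_variance = sum(squared_deviations) / (len(data) - 1)

print(population_variance)  # 4.0
print(sample_variance, variance(data))
```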

16
Q

Which descriptive statistics can be used with ordinal data?

A

Median
Quartiles
IQR

17
Q

Data values tend to be symmetrically clustered around which descriptive statistic (DS)?

What distribution does this tend to have?

A

The mean

Normal bell-shaped distribution

18
Q

Characteristics of Positive Skewness

A

right-skewed = data has high values more spread out than low values (the mean is dragged to the right end)

19
Q

Characteristics of Negative Skewness

A

left-skewed = data has low values more spread out than high values (the mean is dragged to the left end)
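
A minimal sketch of the mean being dragged toward the long tail, using made-up data. The left-skewed set is simply the mirror image of the right-skewed one.

```python
from statistics import mean, median

# Right-skewed: one high value far from the rest (illustrative data).
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image of the data above, so the long tail is on the left.
left_skewed = [21 - x for x in right_skewed]

print(mean(right_skewed), median(right_skewed))  # 4.75 3.0  -> mean > median
print(mean(left_skewed), median(left_skewed))    # 16.25 18.0 -> mean < median
```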

20
Q

What type of graph can be used to assess skewness?

A

histograms

21
Q

For symmetric data, mean and median are usually what?

A

approx equal

22
Q

What is Kurtosis?

A

A measure of the shape of a distribution's two tails (how heavy they are)

23
Q

Long and fat tails mean? (kurtosis)

A

high kurtosis (leptokurtic), often with a sharper peak

24
Q

Flatter distribution with short, thin tails refers to?

A

low kurtosis (platykurtic)

25
Q

What is EDA (Exploratory Data Analysis)?

A

Refers to procedures designed to present data in an informative way, using graphical, pictorial and summary methods

eg graphs and tables

26
Q

What are the 2 ways to present/summarise data for one categorical variable?

A

Frequency tables and pie charts

27
Q

Define frequency

A

Frequency represents the count of observations in each category

28
Q

What is relative frequency?

A

Refers to the proportion of the whole represented by the counts in a category
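
Frequency and relative frequency can be sketched together with a counter. The flavour data below are made up for illustration.

```python
from collections import Counter

# Illustrative categorical data.
flavours = ["vanilla", "chocolate", "vanilla", "strawberry",
            "vanilla", "chocolate", "vanilla", "chocolate"]

# Frequency: count of observations in each category.
frequency = Counter(flavours)
# Relative frequency: each count as a proportion of the whole.
total = len(flavours)
relative_frequency = {category: count / total
                      for category, count in frequency.items()}

print(frequency["vanilla"], relative_frequency["vanilla"])        # 4 0.5
print(frequency["strawberry"], relative_frequency["strawberry"])  # 1 0.125
```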

29
Q

Describe a pie chart

A

graphical representations used for a single categorical variable with, typically, few categories

30
Q

What can be used to summarise data from one or more categorical variables?

A

Bar/column graphs

31
Q

What 3 things can be used to summarise data for one continuous variable?

A

Stem and leaf plots
histograms
box plots

32
Q

Describe Stem-and-leaf plots

A

Group data into intervals of equal length

Actual values of the variable are retained, possibly in a rounded form

Each observation is represented by its last digit (the leaf)
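
A rough sketch of building a stem-and-leaf plot, assuming non-negative whole numbers with the tens as the stem and the last digit as the leaf. The function name `stem_and_leaf` and the data are hypothetical.

```python
def stem_and_leaf(values):
    """Return the plot as a list of text lines (stem | leaves)."""
    leaves_by_stem = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # tens digit(s) and last digit
        leaves_by_stem.setdefault(stem, []).append(leaf)

    lines = []
    # Include empty stems so the intervals keep equal length.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

lines = stem_and_leaf([23, 12, 41, 15, 28, 21, 23])  # illustrative data
print("\n".join(lines))
# 1 | 25
# 2 | 1338
# 3 | 
# 4 | 1
```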

33
Q

Describe Histograms

A

Group the data, usually into equal-sized intervals

The area of each box in a histogram is proportional to the frequencies of the intervals of values

The intervals covered by the boxes in a histogram are often called ‘bins’
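
The grouping step behind a histogram can be sketched as counting values into equal-width intervals (bins). The function name `histogram_counts`, its parameters, and the data are assumptions made for this example.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values falling in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # ignore values outside the binned range
            counts[index] += 1
    return counts

values = [1, 3, 4, 7, 8, 8, 12, 15, 19]  # illustrative data
# Four bins of width 5: [0,5), [5,10), [10,15), [15,20)
counts = histogram_counts(values, start=0, width=5, n_bins=4)
print(counts)  # [3, 3, 1, 2]
```

With equal-width bins, bar height and bar area are both proportional to these counts.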

34
Q

Describe Box Plots

A

Way of presenting continuous data and giving a picture of how the data are distributed
Focus is on the central 50 per cent of the data (median and IQR)
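Whiskers cover the remaining data, extending to the most extreme values within 1.5*IQR of the box (maximum length = 1.5*IQR); points beyond that are usually plotted individually as outliers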
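
A minimal sketch of the numbers behind a box plot, assuming the common 1.5*IQR whisker convention and made-up data. Plotting software may use a different quartile method, so exact values can vary.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]  # illustrative data

q1, q2, q3 = quantiles(data, n=4)  # box edges (Q1, Q3) and median (Q2)
iqr = q3 - q1

# Whiskers may reach at most 1.5 * IQR beyond the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr)            # 3.0 6.0 9.0 6.0
print(whisker_low, whisker_high)  # 1 10
print(outliers)                   # [30]
```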

35
Q

What can be used to summarise data from one continuous and one categorical variable?

A

Histograms, box plots, bar graphs and other graphs and plots can be used to compare continuous data across different categories (levels) of a categorical variable

Plots must be constructed using the same scale for the continuous variable

36
Q

What can be used to summarise data from more than one continuous variable?

A

scatterplots

37
Q

Describe scatter plots

A

Used to consider the relationship between two quantitative variables

They are particularly useful for understanding the relationship between a response and an explanatory variable

38
Q

What are some principles of good graphs?

A

Images are clear

Lines are smooth and sharp

Font is legible and simple

Units of measurement are provided

Axes are clearly labeled

39
Q

Examples of bad graphs include?

A

Graphs containing outright mistakes (such as the percentages in a pie chart not summing to 100)

Three-dimensional graphs in which the third dimension does not represent anything and obscures or distorts the information that should be represented

Bar charts with a scale starting above zero