Week 3 (Data Types) Flashcards

1
Q

What are the two main data types?

A

Categorical or nominal: things that can be counted

Measurement: things that can be measured

2
Q

What are categorical/nominal variables?

A

Discrete: a fixed number of categories
Labels can be names or numbers (vanilla, strawberry, chocolate; 1, 25, 18)
Only valid mathematical operation is counting

3
Q

Ordinal scales?

A
Discrete: a fixed number of levels
Inherent order (ranks)
Some information about quantity
Steps may not be equal
Movement along the scale indicates a change in amount but doesn’t indicate how much change
Can’t calculate means etc.
4
Q

Interval scales?

A

Order and equal intervals

Continuous

Mathematical operations - addition and subtraction

No true 0

0 doesn’t mean absence

5
Q

Ratio scales?

A
Order 
Equal intervals 
True 0 = absence 
Physical quantities (mass, length, time) 
Can calculate ratios of different values
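
A small sketch of why ratios need a true zero. The temperature readings are made up for illustration: Celsius stands in for an interval scale and Kelvin for a ratio scale.

```python
# Hypothetical readings: the same two temperatures on two scales.
celsius = [10.0, 20.0]                    # interval scale: 0 °C is an arbitrary point
kelvin = [c + 273.15 for c in celsius]    # ratio scale: 0 K is a true zero

# Differences agree on both scales because the intervals are equal.
diff_celsius = celsius[1] - celsius[0]
diff_kelvin = kelvin[1] - kelvin[0]

# Ratios only carry meaning on the scale with a true zero.
ratio_celsius = celsius[1] / celsius[0]   # 2.0, yet 20 °C is not "twice as hot"
ratio_kelvin = kelvin[1] / kelvin[0]      # roughly 1.035

print(f"difference: {diff_celsius:.1f} °C vs {diff_kelvin:.1f} K")
print(f"ratio: {ratio_celsius:.3f} (Celsius) vs {ratio_kelvin:.3f} (Kelvin)")
```
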
6
Q

Continuous variables?

A

Theoretically infinite resolution between minimum and maximum

Can be converted to discrete variables but not vice versa: conversion causes loss of information/precision

A construct can be continuous but the method of quantifying it may be discrete
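
A minimal sketch of the one-way conversion, using made-up heights and hypothetical band cut-offs (160 cm and 170 cm are assumptions chosen only for the example).

```python
# Hypothetical continuous measurements (heights in cm).
heights_cm = [151.2, 158.9, 160.1, 169.8, 170.3, 184.6]

def to_band(height):
    """Map a continuous height onto one of three ordered bands."""
    if height < 160:
        return "short"
    if height < 170:
        return "medium"
    return "tall"

bands = [to_band(h) for h in heights_cm]
print(bands)

# Six distinct measurements collapse into three labels, so the
# original values cannot be recovered from the bands alone.
print(len(set(heights_cm)), "distinct values ->", len(set(bands)), "bands")
```
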

7
Q

What are the 3 measures of central tendency?

A

Mean
Median
Mode

8
Q

What is the mode?

A

The only measure of central tendency that can be used for categorical (nominal) data
Most commonly occurring value in a set

Sample can have more than one mode
Bimodal: two modal values
Multimodal: more than two modal values

If there are no values that occur more than once, there isn’t a mode for the data set
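
The rules above could be sketched as follows. The helper name `modes` and the sample data are hypothetical. Note that Python's built-in `statistics.multimode` returns every value when nothing repeats, so this sketch handles that case itself to match the "no mode" convention on this card.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, or [] when nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []   # no value occurs more than once, so no mode
    return [value for value, count in counts.items() if count == top]

one_mode = modes(["vanilla", "chocolate", "vanilla", "strawberry"])
two_modes = modes([1, 1, 2, 2, 3])   # bimodal sample
no_mode = modes([1, 2, 3])
print(one_mode, two_modes, no_mode)
```
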

9
Q

What is the median?

A

All scores ordered in increasing value and the middle score is the median.

Same number of observations below and above the median
Odd sample size: middle score
Even sample size: average of the two middle scores
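
A quick illustration of the odd and even cases with made-up scores, using Python's standard `statistics.median`.

```python
import statistics

odd_scores = [7, 3, 9, 1, 5]    # sorted: 1 3 5 7 9, middle score is 5
even_scores = [7, 3, 9, 1]      # sorted: 1 3 7 9, middle pair is 3 and 7

median_odd = statistics.median(odd_scores)
median_even = statistics.median(even_scores)   # (3 + 7) / 2

print(median_odd, median_even)
```
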

10
Q

What is the mean?

A

Most commonly used

Add all values and divide by total number
Value around which scores are distributed
Won’t systematically over- or underestimate: deviations above and below the mean cancel out
Isn’t biased
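
A short sketch of the calculation with made-up scores. It also shows that the deviations from the mean sum to zero, which is one way to read "value around which scores are distributed".

```python
scores = [2, 4, 6, 8, 10]   # hypothetical sample

# Add all values and divide by the total number of values.
mean = sum(scores) / len(scores)

# Distances above and below the mean cancel out exactly.
deviations = [x - mean for x in scores]
total_deviation = sum(deviations)

print(mean, deviations, total_deviation)
```
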

11
Q

What happens with extreme values?

A

If a value isn’t obviously an outlier, be careful about discarding it

The median is unaffected by the end points, whereas the mean uses all of the data, so in some cases the mean is not representative

Outliers can sometimes be seen using visual inspection
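
A small sketch of the effect, using invented values with one extreme score added.

```python
import statistics

typical = [30, 32, 35, 38, 40]     # hypothetical scores
with_outlier = typical + [400]     # same scores plus one extreme value

mean_before = statistics.mean(typical)
median_before = statistics.median(typical)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean is dragged toward the extreme value; the median barely moves.
print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```
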

12
Q

What measures are used to measure spread or dispersion?

A

Range

Interquartile range

Sample standard deviation

13
Q

What is range?

A

Maximum - minimum

Depends entirely on the two extreme scores: if either is an outlier, the range overestimates variability in the data

Range tends to increase as sample size increases, because the bigger the sample, the more opportunities there are to get extreme values. Large samples give a good look at the extremes
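
A minimal sketch with made-up scores: the range is recomputed as each new observation arrives, and it can only stay the same or grow.

```python
def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

scores = [12, 15, 11, 19, 14, 25, 9]   # hypothetical observations, in arrival order

# Range of the first 2 scores, the first 3 scores, and so on.
running_range = [value_range(scores[:n]) for n in range(2, len(scores) + 1)]
print(running_range)
```
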

14
Q

Explain quartiles?

A

Group the ordered data into 4 equally sized groups

Q1 lower quartile: 25% are below, 75% are above
Q3 upper quartile: 75% below, 25% above

Interquartile range (IQR): Q3 - Q1
How much spread there is in the middle 50% of scores
The bigger the IQR, the bigger the dispersion
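
A sketch using Python's standard `statistics.quantiles` on made-up data. The `method="inclusive"` setting is an assumption; textbooks and software use several slightly different quartile conventions, so other tools may give slightly different cut points.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical ordered scores

# n=4 splits the data into four groups and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1   # spread of the middle 50% of scores

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```
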

15
Q

Variance and standard deviation?

A

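Variance is roughly the average of the squared differences from the mean

Calculate how far away each score is from the mean: some are below and some are above, so the differences are squared before averaging

Standard deviation is the square root of the variance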
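
A step-by-step sketch with made-up scores. It assumes the sample versions (dividing by n - 1 rather than n, which is why the variance is only "roughly" an average) and takes the standard deviation as the square root of the variance. The result is checked against Python's `statistics` module.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample

mean = sum(scores) / len(scores)

# How far each score is from the mean, squared so that
# negative and positive distances do not cancel.
squared_deviations = [(x - mean) ** 2 for x in scores]

sample_variance = sum(squared_deviations) / (len(scores) - 1)
sample_sd = math.sqrt(sample_variance)

print(f"variance={sample_variance:.3f}, sd={sample_sd:.3f}")
print(statistics.variance(scores), statistics.stdev(scores))
```
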

16
Q

How do we choose descriptive statistics?

A

Nominal data should not be summarised using the mean, median or interquartile range; counts and the mode can be used

Ordinal data can be summarised with some descriptive statistics, including the median, quartiles and interquartile range

17
Q

Explain distributional shape?

A

In a normal distribution, data values tend to be symmetrically clustered around the mean

18
Q

Skewness of distributional shape?

A

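Instead of being symmetrical, the tail is spread out much further on one side. This is not a normal distribution

Positive: right skewed. The long tail points toward the high values, so most scores sit at the low end

Negative: left skewed. The long tail points toward the low values, so most scores sit at the high end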

Histograms can be used to assess skewness
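
Skewness can also be put into a number. The sketch below uses one common moment-based definition (third central moment divided by the 1.5 power of the second); the function name and data are hypothetical, and statistical packages may apply small-sample corrections that change the value slightly.

```python
def skewness(values):
    """Moment-based skewness: 0 for symmetric data,
    positive for a long right tail, negative for a long left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 10]           # one very high score
left_tail = [-x for x in right_tail]       # mirror image

skew_symmetric = skewness(symmetric)
skew_right = skewness(right_tail)
skew_left = skewness(left_tail)
print(skew_symmetric, skew_right, skew_left)
```
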

19
Q

What is Kurtosis?

A

Is about the shape of the two tails

Heavy (fat) tails, often with a sharper peak: high kurtosis
Light (thin) tails, often with a flatter peak: low kurtosis

It is typically measured in relation to the normal distribution

Skewness is more important

20
Q

EDA?

A

Exploratory data analysis

Refers to procedures designed to present data in an informative way using graphical, pictorial and summary methods
Graphs and tables help to organise, explore and present data and highlight its features

21
Q

What can you do with one categorical variable to turn it into visual information?

A

Frequency table: shows counts in each category

Contingency table (two-way table) of frequencies: row or column percentages are a means of summarising the relationship between two variables

Pie charts: graphical representations used for a single categorical variable with typically a few categories
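
A minimal sketch of a frequency table and a two-way table built with `collections.Counter`. The flavour and size data are invented for illustration.

```python
from collections import Counter

# Hypothetical observations: one flavour and one size per customer.
flavours = ["vanilla", "chocolate", "vanilla", "strawberry", "vanilla", "chocolate"]
sizes = ["small", "large", "small", "small", "large", "large"]

# Frequency table: counts (and percentages) in each category.
frequency = Counter(flavours)
percent = {k: 100 * c / len(flavours) for k, c in frequency.items()}

# Two-way (contingency) table: counts for each flavour/size pair.
two_way = Counter(zip(flavours, sizes))

for flavour, count in frequency.items():
    print(f"{flavour:<12}{count:>3}{percent[flavour]:>7.1f}%")
print(two_way[("vanilla", "small")], two_way[("vanilla", "large")])
```
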

22
Q

What can we do with one or more categorical variables to turn them into visual information?

A

Bar (column) graphs

Can be used for either one or two variables

23
Q

How do we turn one continuous variable into visual information?

A

Stem and leaf plots: group data into intervals of equal length

Histograms: group the data into equally sized intervals
The area of each box in a histogram is proportional to the frequency of values in that interval

Box plots:
A way of presenting continuous data that gives a picture of how the data are distributed. The box shows the median and the interquartile range, and the whiskers cover the remaining data out to the minimum and the maximum
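
The grouping step behind a histogram could be sketched as below, as plain text output with no plotting library. The values and the interval width of 2.0 are assumptions chosen for the example.

```python
from collections import Counter

values = [1.2, 2.5, 2.7, 3.1, 4.8, 5.0, 5.5, 7.9, 8.3, 9.6]   # hypothetical data
bin_width = 2.0   # every interval has the same width

# Assign each value to the interval [k * width, (k + 1) * width).
bin_counts = Counter(int(v // bin_width) for v in values)

# With equal widths, bar length is proportional to frequency.
for k in sorted(bin_counts):
    low, high = k * bin_width, (k + 1) * bin_width
    print(f"{low:>4.1f} to {high:>4.1f} | {'*' * bin_counts[k]}")
```
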

24
Q

How can we present more than one continuous variable in a visual way?

A

Scatter plots

Used to consider the relationship between two quantitative variables, e.g. price and thickness of textbooks

Can also be constructed with the inclusion of categorical variables to differentiate the relationship, e.g. price vs thickness differentiated by cover type

25
Q

Principles of good graphs?

A

Clear images
Smooth and sharp lines
Legible and simple font
Measurement units are provided
Clearly labelled axes
Elements within figure are clearly labelled or explained
Error bars included when graphing descriptive statistics

26
Q

Bad graphs?

A

Obscure or misrepresent information

Graphs containing outright mistakes
Three-dimensional graphs in which the third dimension does not represent anything
Bar charts with the scale starting above zero
Scatterplots with the Y and X scales not restricted to the range of the data
Fanciful plots that result in optical illusions that are known to mislead