Week 3 (Data Types) Flashcards by Alicia Lowe

What are the two main data types?

Categorical or nominal: things that can be counted

Measurement: things that can be measured

What are categorical/nominal variables?

Discrete: a fixed number of categories
Labels can be names or numbers (vanilla, strawberry, chocolate; 1, 25, 18)
The only valid mathematical operation is counting

Ordinal scales?

Discrete: a fixed number of values
Inherent order (ranks)
Some information about quantity
Steps may not be equal
Movement along the scale indicates a change in amount but doesn’t indicate how much change
Can’t meaningfully calculate means etc.

Interval scales?

Order and equal intervals

Continuous

Mathematical operations - addition and subtraction

No true 0

0 doesn’t mean absence

Ratio scales?

Order 
Equal intervals 
True 0 = absence 
Physical quantities (mass, length, time) 
Can calculate ratios of different values
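
The difference between "no true 0" (interval) and "true 0" (ratio) can be sketched with a small example. Temperature in Celsius and Kelvin is an assumed illustration, not taken from the cards: differences agree on both scales, but a ratio only means something on the scale with a true zero.

```python
import math

def celsius_to_kelvin(celsius):
    """Convert an interval-scale reading (Celsius) to a ratio-scale one (Kelvin)."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Addition and subtraction are valid on an interval scale:
# the 10-degree gap is the same size on both scales.
difference_c = warm_c - cool_c
difference_k = warm_k - cool_k

# Ratios are not: 20 C looks like "twice" 10 C only because of where 0 C sits.
ratio_c = warm_c / cool_c   # 2.0, but not a meaningful statement
ratio_k = warm_k / cool_k   # roughly 1.035 on the scale with a true zero

# Mass has a true zero, so the ratio survives a change of units.
light_kg, heavy_kg = 2.0, 4.0
ratio_kg = heavy_kg / light_kg
ratio_g = (heavy_kg * 1000) / (light_kg * 1000)
```

Here `ratio_kg` and `ratio_g` are both 2.0, whereas the Celsius and Kelvin ratios disagree.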

Continuous variables?

Theoretically infinite resolution between minimum and maximum

Can be converted to discrete variables but not vice versa: the conversion causes loss of information/precision

A construct can be continuous but the method of quantifying it may be discrete
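
A minimal sketch of the one-way conversion described above. The reaction times and the cut-offs (300 ms and 500 ms) are made up for illustration; the point is only that different continuous values collapse into the same category and cannot be recovered from it.

```python
def to_speed_band(reaction_time_ms):
    """Collapse a continuous reaction time into one of three discrete bands.

    The 300 ms and 500 ms cut-offs are arbitrary, chosen for this example.
    """
    if reaction_time_ms < 300:
        return "fast"
    if reaction_time_ms < 500:
        return "medium"
    return "slow"

reaction_times_ms = [212.4, 310.4, 489.9, 655.0]
bands = [to_speed_band(t) for t in reaction_times_ms]

# 310.4 and 489.9 are almost 180 ms apart, yet both become "medium".
# Given only the label "medium", the original value cannot be reconstructed.
```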

What are the 3 measures of central tendency?

Mean
Median
Mode

What is the mode?

The only measure of central tendency that can be used for categorical data (it can also be used with other data types)
Most commonly occurring value in a set

Sample can have more than one mode
Bimodal: two modal values
Multimodal: more than two modal values

If there are no values that occur more than once, there isn’t a mode for the data set
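
The rules on this card can be sketched as a small function. `find_modes` is a hypothetical helper written for this example; it follows the card's convention that a data set with no repeated values has no mode.

```python
from collections import Counter

def find_modes(values):
    """Return the most frequently occurring value(s), or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs once, so by the card's convention there is no mode.
        return []
    # Keep every value tied for the highest count (two = bimodal, more = multimodal).
    return [value for value, count in counts.items() if count == highest]

flavours = ["vanilla", "chocolate", "vanilla", "strawberry", "chocolate"]
flavour_modes = find_modes(flavours)      # bimodal: vanilla and chocolate
no_repeat_modes = find_modes([3, 7, 9])   # no value repeats
```

Because it only counts, the same function works for category labels and for numbers.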

What is the median?

All scores ordered in increasing value and the middle score is the median.

Same number of observations below and above the median
Odd sample size: the middle score
Even sample size: the average of the two middle scores
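
The odd and even cases can be written out as a short sketch. The function name `median_score` and the example scores are illustrative only.

```python
def median_score(scores):
    """Return the middle score after sorting; average the two middle scores if n is even."""
    if not scores:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd sample size: one score sits exactly in the middle.
        return ordered[middle]
    # Even sample size: average the two scores either side of the middle.
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_median = median_score([5, 1, 3])       # ordered: 1, 3, 5
even_median = median_score([4, 1, 3, 2])   # ordered: 1, 2, 3, 4
```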

What is the mean?

Most commonly used

Add all values and divide by total number
Value around which scores are distributed
Won’t systematically over- or underestimate the scores: the deviations above and below it cancel out
In that sense it isn’t biased
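
One way to see why the mean sits "in the middle" of the scores is that the deviations from it sum to zero. A minimal sketch with made-up scores:

```python
scores = [2, 4, 6, 8, 10]

# Add all values and divide by the total number of values.
mean = sum(scores) / len(scores)

# Distance of each score from the mean: negative below it, positive above it.
deviations = [score - mean for score in scores]

# The negative and positive deviations cancel out exactly.
total_deviation = sum(deviations)
```

With these scores the mean is 6.0 and the deviations are -4, -2, 0, 2 and 4.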

What happens with extreme values?

If a value isn’t obviously an outlier, be careful about discarding it

The median is unaffected by extreme end points, whereas the mean uses all of the data, so in some cases the mean is not representative

Outliers can sometimes be seen using visual inspection
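
The effect of a single extreme value on the mean and the median can be checked with Python's standard `statistics` module. The reaction times below are invented for illustration.

```python
from statistics import mean, median

reaction_times_ms = [310, 320, 330, 340, 350]
with_outlier = reaction_times_ms + [2000]   # one extreme score added

mean_before = mean(reaction_times_ms)
median_before = median(reaction_times_ms)

mean_after = mean(with_outlier)      # pulled strongly towards the outlier
median_after = median(with_outlier)  # barely moves
```

The mean jumps from 330 to over 600, a value that none of the typical scores are near, while the median only shifts from 330 to 335.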

What measures are used to describe spread or dispersion?

Range

Interquartile range

Sample standard deviation

What is range?

Maximum - minimum

Depends entirely on the two most extreme scores: if either is an outlier, the range overestimates the variability in the data

Range tends to increase as sample size increases, because the bigger the sample, the more opportunities there are to get extreme values. Large samples therefore give a better picture of the extremes
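
A rough sketch of both points, using simulated scores. The normal distribution with mean 100 and SD 15, the seed, and the sample sizes are arbitrary assumptions made for the example.

```python
import random

def value_range(scores):
    """Range = maximum - minimum."""
    return max(scores) - min(scores)

# One outlier is enough to inflate the range.
range_with_outlier = value_range([1, 2, 3, 50])

# Draw 1000 simulated scores, then look at the first 10, 100 and 1000 of them.
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [rng.gauss(100, 15) for _ in range(1000)]
ranges_by_sample_size = {n: value_range(draws[:n]) for n in (10, 100, 1000)}

# Each larger sample contains the smaller one, so its range can only
# stay the same or grow as more scores are added.
```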

Explain quartiles?

Group the data into 4 ordered, equal groups

Q1 lower quartile: 25% are below, 75% are above
Q3 upper quartile: 75% below, 25% above

Interquartile range (IQR): difference between Q3 and Q1 (Q3 - Q1)
How much spread there is in the middle 50% of scores
The bigger the IQR, the bigger the dispersion
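
Quartiles and the IQR can be computed with `statistics.quantiles` from the Python standard library. The scores are made up, and note that textbooks and software use slightly different interpolation rules for quartiles, so other tools may give slightly different cut points for the same data.

```python
from statistics import quantiles

scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]

# n=4 asks for the three cut points that split the ordered data into quarters.
q1, q2, q3 = quantiles(scores, n=4)

# Spread of the middle 50% of scores.
iqr = q3 - q1
```

For these 11 scores the default method gives Q1 = 4.0, Q2 (the median) = 7.0 and Q3 = 10.0, so the IQR is 6.0.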

Variance and standard deviation?

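Variance is roughly the average of the squared differences from the mean

Calculate how far away each score is from the mean (some are below and some are above), square each difference so that they don’t cancel out, then average the squares

Standard deviation is the square root of the variance, which puts it back in the original units of the scores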
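
The calculation can be sketched step by step. This version divides by n - 1, the usual convention for a sample, which is why the variance is only "roughly" the average squared difference. The scores are invented for illustration.

```python
import math

def sample_variance(scores):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("sample variance needs at least two scores")
    mean = sum(scores) / n
    # Squaring stops the negative and positive differences cancelling out.
    squared_differences = [(score - mean) ** 2 for score in scores]
    return sum(squared_differences) / (n - 1)

def sample_sd(scores):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(scores))

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5, squared differences sum to 32
variance = sample_variance(scores)   # 32 / 7
sd = sample_sd(scores)
```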

How do we choose descriptive statistics?

Nominal data should not be summarised using the mean, median or interquartile range; the mode (with counts) is the appropriate summary

Ordinal data can be summarised with some descriptive statistics, including the median, quartiles and interquartile range

Explain distributional shape?

In a normal distribution, data values tend to be symmetrically clustered around the mean

Skewness of distributional shape?

Instead of being symmetrical, the tail is spread out much further on one side. This is not a normal distribution

Positive: right skewed. Long tail of high values; most scores sit at the low end

Negative: left skewed. Long tail of low values; most scores sit at the high end

Histograms can be used to assess skewness
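
Besides looking at a histogram, skewness can be put into a number. The sketch below uses the simple moment-based coefficient (third central moment divided by the second central moment to the power 1.5); statistical packages often apply a small-sample correction, so their values may differ slightly. The data sets are invented.

```python
def skewness(scores):
    """Moment-based skewness: positive when the long tail is on the high side.

    Assumes the scores are not all identical (otherwise the spread is zero).
    """
    n = len(scores)
    mean = sum(scores) / n
    second_moment = sum((x - mean) ** 2 for x in scores) / n
    third_moment = sum((x - mean) ** 3 for x in scores) / n
    return third_moment / second_moment ** 1.5

right_skewed = [1, 1, 2, 2, 3, 10]   # tail stretches towards high values
left_skewed = [0, 7, 8, 8, 9, 9]     # tail stretches towards low values
symmetric = [1, 2, 3, 4, 5]

skew_right = skewness(right_skewed)  # positive
skew_left = skewness(left_skewed)    # negative
skew_sym = skewness(symmetric)       # zero
```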

What is Kurtosis?

Is about the shape of the two tails

Heavy (fat) tails, often with a sharper peak: high kurtosis
Light (thin) tails, often with a flatter peak: low kurtosis

It is typically measured in relation to the normal distribution

Skewness is more important

EDA?

Exploratory data analysis

Refers to procedures designed to present data in an informative way using graphical, pictorial and summary methods
Graphs and tables help to organise, explore and present data and highlight its features

What can you do with one categorical variable to turn it into visual information?

Frequency: represents counts in each category

Contingency table (two-way table) of frequencies: when a second categorical variable is added, row or column percentages are a means of summarising the relationship between the two variables

Pie charts: graphical representations used for a single categorical variable with typically a few categories
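
Frequency counts and a two-way table with row percentages can be sketched using `collections.Counter`. The age groups and flavour choices are invented for illustration.

```python
from collections import Counter

# Each observation is (age group, favourite flavour).
observations = [
    ("child", "vanilla"), ("child", "chocolate"), ("child", "chocolate"),
    ("adult", "vanilla"), ("adult", "vanilla"), ("adult", "strawberry"),
]

# Frequency table for one categorical variable: counts in each category.
flavour_counts = Counter(flavour for _, flavour in observations)

# Contingency (two-way) table: counts for each combination of the two variables.
cell_counts = Counter(observations)

# Row percentages: each cell as a percentage of its age group's total.
row_totals = Counter(group for group, _ in observations)
row_percentages = {
    (group, flavour): 100 * count / row_totals[group]
    for (group, flavour), count in cell_counts.items()
}
```

Here two thirds of the children chose chocolate while two thirds of the adults chose vanilla, which is the kind of relationship row percentages make visible.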

What can we do with one or more categorical variables to turn them into visual information?

Bar (column) graphs

Can be used for either one or two variables

How do we turn one continuous variable into visual information?

Stem and leaf plots: group data into intervals of equal length

Histograms: group the data into equally sized intervals
The area of each bar in a histogram is proportional to the frequency of values in that interval

Box plots:
A way of presenting continuous data that gives a picture of how the data are distributed. The box shows the median and the interquartile range, and the whiskers cover the remaining data out to the minimum and the maximum
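
The numbers behind a histogram and a box plot can be sketched without any plotting library. `histogram_counts` is a hypothetical helper, the heights are invented, and the quartiles depend on the interpolation method used (here the default of `statistics.quantiles`).

```python
from statistics import quantiles

def histogram_counts(values, low, width, n_bins):
    """Count how many values fall in each of n_bins equal-width intervals."""
    counts = [0] * n_bins
    high = low + width * n_bins
    for value in values:
        if not low <= value <= high:
            continue  # values outside the plotted range are ignored
        # The top edge itself is counted in the last interval.
        index = min(int((value - low) // width), n_bins - 1)
        counts[index] += 1
    return counts

heights_cm = [150, 152, 158, 161, 163, 167, 169, 171, 178, 185]

# Histogram: intervals 150-160, 160-170, 170-180, 180-190.
bin_counts = histogram_counts(heights_cm, low=150, width=10, n_bins=4)

# Box plot: minimum, Q1, median, Q3 and maximum.
q1, q2, q3 = quantiles(heights_cm, n=4)
five_number_summary = (min(heights_cm), q1, q2, q3, max(heights_cm))
```

The box would be drawn from Q1 to Q3 with a line at the median, and the whiskers would run out to the minimum and maximum.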

How can we present more than one continuous variable in a visual way?

Scatter plots

Used to consider the relationship between two quantitative variables, e.g. price and thickness of textbooks

Can also be constructed with the inclusion of categorical variables to differentiate the relationship, e.g. price vs thickness differentiated by cover type

Principles of good graphs?

Clear images
Smooth and sharp lines
Legible and simple font
Measurement units are provided
Clearly labelled axes
Elements within the figure are clearly labelled or explained
Error bars included when graphing descriptive statistics

Bad graphs?

Obscure or misrepresent information
Graphs containing outright mistakes
Three-dimensional graphs in which the third dimension does not represent anything
Bar charts with the scale starting above zero
Scatterplots with the Y and X scales not restricted to the range of the data
Fanciful plots that result in optical illusions that are known to mislead

(26 cards)