WEEK 1 Flashcards

1
Q

2 types of Data

A

Categorical and Scalar

2
Q

Categorical has 2 types, which are?

A

Nominal and Ordinal

3
Q

Scalar has 2 types, which are? What does it work well with?

A

Continuous and Discrete

It works well with the median, range and interquartile range (IQR).

It doesn’t work well with the mode unless you group the data, because the frequency table would have too many different values for continuous data.

4
Q

What is Nominal?

A

Data that has no numerical value and can only be placed in a suitable category, such as gender or yes/no questions. The values are labels, such as College or Breakfast in the example.

5
Q

What is Ordinal?

A

Ordinal data can be arranged in a meaningful order, such as confidence with numbers (agree, disagree, etc.). The data includes the idea of order but is still categorical, and a bar chart is generally the best chart to use.

The distance between categories (for example, between disagree and strongly disagree) is not measured and cannot be assumed equal, whereas with a variable such as weight we can measure the distance.

(Categorical variables with 2 different categories are called Dichotomous)

6
Q

What is Continuous?

A

Measured on a scale such as temperature or weight

7
Q

What is Discrete?

A

Data that takes on whole-number values, usually obtained by counting, e.g. the number of defective items

8
Q

What is the mode? In a bar chart too

A

The most frequent value in the data set. In a bar chart, the tallest bar is the mode.
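
A small Python sketch of finding the mode from a frequency count. The scores are made up for illustration and are not from the lecture.

```python
from collections import Counter

# Made-up scores for illustration (not from the lecture).
scores = ["B", "A", "B", "C", "B", "A"]

# Frequency table: how many times each value appears.
counts = Counter(scores)

# The mode is the value with the highest frequency (the tallest bar).
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)
```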

9
Q

What is spread? In a bar chart too

A

How many different categories we have. In a bar chart, this is the number of different bars.

10
Q

What is the median number calculation?

A

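First order the data. The median position is (n + 1) divided by 2. For example, with 159 data points, (159 + 1) / 2 = 80, so the median is the 80th value.

If there is an even number of data points, the position falls between two observations. With discrete or continuous data, take the 2 middle observations, add them and divide by 2.

With categorical data you cannot average two categories, so you need to be lucky: both middle observations must fall in the same category.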
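
A minimal Python sketch of the median calculation, assuming numeric (discrete or continuous) data. The function names and the sample values are made up for illustration.

```python
import math


def median_position(n):
    """Position of the median in the ordered data, counting from 1."""
    return (n + 1) / 2


def median(values):
    ordered = sorted(values)                 # always order the data first
    pos = median_position(len(ordered))
    low = ordered[math.floor(pos) - 1]       # observation at or just below the position
    high = ordered[math.ceil(pos) - 1]       # observation at or just above the position
    return (low + high) / 2                  # same value twice when n is odd


print(median_position(159))
print(median([5, 1, 3]))
print(median([4, 1, 3, 2]))
```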

11
Q

Scalar is?

A

Examples are height, weight and the guessing variable. Scalar data adds the idea of distance, not just order.

12
Q

What is the Interquartile range?

A

Describes the middle 50% of values when ordered from lowest to highest

13
Q

How to find IQR?

A

Find the median (middle value) of the lower half and of the upper half of the data. These are the LQ and UQ, and IQR = UQ - LQ.

14
Q

What is Range?

A

The highest value (Maximum) - the lowest value (Minimum)

15
Q

IQR CALCULATION

A

The LOWER QUARTILE (LQ) is the value at position (n + 1) divided by 4. The UPPER QUARTILE (UQ) is the value at position 3 x (n + 1) divided by 4. Then IQR = UQ - LQ.
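
A rough Python sketch of the quartile positions and the IQR. The data is made up, and the interpolation used when a position is not a whole number is an assumption (the notes do not say how to handle that case).

```python
def quartile_positions(n):
    """Positions (counting from 1) of the LQ and UQ in ordered data."""
    lq_pos = (n + 1) / 4
    uq_pos = 3 * (n + 1) / 4
    return lq_pos, uq_pos


def value_at(ordered, pos):
    """Value at a 1-based position in ordered data.

    Assumption: for a fractional position, interpolate between neighbours.
    Very small data sets (position below 1 or beyond n) are not handled.
    """
    whole = int(pos)
    frac = pos - whole
    if frac == 0:
        return ordered[whole - 1]
    return ordered[whole - 1] + frac * (ordered[whole] - ordered[whole - 1])


# Made-up data with 11 observations, so the positions are whole numbers.
data = sorted([2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15])
lq_pos, uq_pos = quartile_positions(len(data))   # 3rd and 9th positions
lq = value_at(data, lq_pos)
uq = value_at(data, uq_pos)
iqr = uq - lq
print(lq_pos, uq_pos, lq, uq, iqr)
```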

16
Q

What is five number summary?

A

It’s a set of descriptive statistics that provides information about a dataset, and it is what a box plot shows. With no skew, the sections of the box plot should all be an equal distance apart.

1) the sample minimum (smallest observation)
2) the lower quartile or first quartile: position is (total + 1) divided by 4
3) the median (the middle value): position is (total + 1) divided by 2
4) the upper quartile or third quartile: position is (total + 1) divided by 4, then multiplied by 3
5) the sample maximum (largest observation)
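
A short Python sketch of the five number summary. It assumes Python 3.8 or later, where statistics.quantiles with method="exclusive" places the quartiles at the (n + 1) / 4 positions used above. The data is made up.

```python
import statistics


def five_number_summary(values):
    ordered = sorted(values)
    # method="exclusive" uses positions (n + 1) / 4, (n + 1) / 2 and 3(n + 1) / 4.
    lq, med, uq = statistics.quantiles(ordered, n=4, method="exclusive")
    return ordered[0], lq, med, uq, ordered[-1]


# Made-up data with 11 observations.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15, 18, 20]
print(five_number_summary(data))
```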

17
Q

AVERAGE for SCALAR DATA: the mean of the data. How is it calculated?

A

Add all the values and divide by the total number of observations to find the mean

18
Q

THE IDEA OF DISTANCE, what is it?

A

Standard deviation measures the typical distance of the data from the mean. It does not matter if the value of an observation is above or below the mean, because the differences are squared (only the distance matters).

The variance is the standard deviation squared. The variance is sometimes more useful than the standard deviation.

19
Q

STANDARD DEVIATION CALCULATION

A

First calculate the mean (mu) by adding all the values and dividing by n (the number of observations).

x minus mu: x reflects each individual height. Pick the first individual height and subtract the average (the mean).

Square the result. Then go to the second individual, subtract the mean from the height and square it.

Calculate the squared differences for every observation and add them up (1:40 in the first lecture).

Divide by N at the end. This gives the variance (the inner part of the formula).

Take the square root. This gives the spread as the standard deviation (sigma, a Greek letter).
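
The steps above can be sketched in Python as follows. The heights are made up for illustration, and the calculation divides by N as in the notes (the population formula).

```python
import math

# Made-up heights in cm (not from the lecture).
heights = [150, 160, 170, 180, 190]

n = len(heights)
mu = sum(heights) / n                              # step 1: the mean

squared_diffs = [(x - mu) ** 2 for x in heights]   # steps 2-3: (x - mu) squared
variance = sum(squared_diffs) / n                  # step 4: add up, divide by N
sigma = math.sqrt(variance)                        # step 5: square root

print(mu, variance, sigma)
```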

20
Q

What is trimmed mean?

A

There might be some very high or very low heights (outliers). The trimmed mean cuts the lowest 5% and the highest 5% of the data, if there are any, before calculating the mean, so the answer is less affected by extreme values.
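
A minimal Python sketch of a 5% trimmed mean. The function name, the percent parameter and the heights are made up for illustration, and rounding the number of trimmed values down is an assumption.

```python
def trimmed_mean(values, percent=5):
    """Mean after cutting `percent` % of the values from each end."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100        # how many values to cut per end
    trimmed = ordered[k:len(ordered) - k]
    return sum(trimmed) / len(trimmed)


# Made-up heights: 18 typical values plus one very low and one very high.
heights = [100] + [165] * 6 + [170] * 6 + [175] * 6 + [250]

ordinary_mean = sum(heights) / len(heights)
print(ordinary_mean, trimmed_mean(heights))
```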

21
Q

Standard deviation?

A

1 Find the mean (mu) by adding all the numbers and dividing by how many there are

2 Subtract the mean from each of the numbers given

3 Square each result (the bracket with the power of 2)

4 The sigma part (Σ): add them all up and divide by n (how many data points there are)

5 Take the square root to get the standard deviation

22
Q

What is the best measure of spread?

A

Variance (the value before taking the square root to get the standard deviation)

23
Q

Standardisation

A

z = (x - mean) divided by the standard deviation

x is the number we want to standardise

Standardising puts values measured in different units onto the same scale
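
A tiny Python sketch of standardising. The means and standard deviations are made-up numbers, chosen only to show that values in different units end up on the same scale.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Made-up figures: a height in cm and a weight in kg.
height_z = z_score(180, mean=170, sd=10)
weight_z = z_score(80, mean=70, sd=5)
print(height_z, weight_z)
```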

24
Q

When comparing spread of 2 or more distributions we should?

A

compare the coefficients of variation for each, as these take into account differences in the means

25
Q

coefficient of variation

A

CV = standard deviation (sigma) divided by the mean

if the dispersion around the mean is large, there is more uncertainty and lower accuracy in the data

can be positive or negative (because the mean can be negative)
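
A short Python sketch comparing the spread of two distributions with the coefficient of variation. Both data sets are made up, and statistics.pstdev is used because the notes divide by N.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation (dividing by N) divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


# Made-up data: same standard deviation, very different means.
a = [10, 20, 30]
b = [1010, 1020, 1030]

cv_a = coefficient_of_variation(a)
cv_b = coefficient_of_variation(b)
print(cv_a, cv_b)
```

The two standard deviations are equal, but relative to its mean the first data set is far more spread out.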

26
Q

what are 2 relative measures?

A

coefficient of variation

and idea of standardising data

27
Q

Different statistical measures

A

all of them are absolute measures

28
Q

IQR uses the middle 50% and is

A

less influenced by extreme values

29
Q

WHAT ARE 3 MEASURES OF AVERAGE (CENTRALITY)

A

MEAN MEDIAN AND MODE

30
Q

median if there’s a sales and frequency table

A

n = the number of data points = all frequencies added together

then we do (n + 1) divided by 2

this gives a position, e.g. the 24th

then we count through the frequencies to find which value is the 24th

the ordered data can be written down like 0 x 5, 1 x 16, 2 x 12, etc. to find the 24th data point
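
A Python sketch of counting through the frequencies to find the median. The first three rows follow the 0 x 5, 1 x 16, 2 x 12 example; the remaining rows are made up so that n = 47 and the median position is the 24th.

```python
import math


def value_at_position(table, position):
    """Count through the frequencies until the running total reaches position."""
    running_total = 0
    for value, frequency in table:
        running_total += frequency
        if position <= running_total:
            return value
    raise ValueError("position is beyond the number of data points")


def median_from_frequency_table(table):
    ordered = sorted(table)                        # order by value
    n = sum(frequency for _, frequency in ordered)
    pos = (n + 1) / 2
    low = value_at_position(ordered, math.floor(pos))
    high = value_at_position(ordered, math.ceil(pos))
    return (low + high) / 2                        # same value twice when n is odd


# (value, frequency) pairs; the rows for 3 and 4 are made up.
sales = [(0, 5), (1, 16), (2, 12), (3, 9), (4, 5)]
print(median_from_frequency_table(sales))
```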

31
Q

five number summary TIPS

A

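LQ = find the data point at the LQ position; that value is the answer

UQ = find the data point at the UQ position; that value is the answer

DON’T SUBTRACT THEM (subtracting gives the IQR, which is not part of the five number summary)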

32
Q

inter quartile range (IQR)

A

order the data from small to large

use the formula to find the positions: (n + 1) divided by 4 for the LQ, and that result times 3 for the UQ

then find the value that sits at each position, e.g. the 9th value and the 3rd value

then subtract the actual values from the data set (UQ - LQ) and you have the IQR

33
Q

If all data points decrease by 7, the IQR decreases by 7. True or false?

A

FALSE: the IQR is a measure of spread, not centrality, so it doesn’t change when the whole dataset shifts
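
A quick Python check of this, using made-up data and assuming Python 3.8 or later for statistics.quantiles (method="exclusive" uses the (n + 1) / 4 positions).

```python
import statistics


def iqr(values):
    lq, _, uq = statistics.quantiles(values, n=4, method="exclusive")
    return uq - lq


# Made-up data, then the same data with every point decreased by 7.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15]
shifted = [x - 7 for x in data]

print(iqr(data), iqr(shifted))                                # spread is unchanged
print(statistics.median(data), statistics.median(shifted))   # centre moves by 7
```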

34
Q

IQR is not affected by outliers, why?

A

Because it only uses UQ - LQ (the middle 50% of the data), so extreme data points at either end do not affect it
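
A small Python illustration with made-up data: replacing the largest value with an outlier changes the range a lot but leaves the IQR alone. It assumes Python 3.8 or later for statistics.quantiles.

```python
import statistics


def iqr(values):
    lq, _, uq = statistics.quantiles(values, n=4, method="exclusive")
    return uq - lq


def data_range(values):
    return max(values) - min(values)


# Made-up data, then the same data with the top value turned into an outlier.
data = [3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 20]
with_outlier = data[:-1] + [200]

print(data_range(data), data_range(with_outlier))
print(iqr(data), iqr(with_outlier))
```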

35
Q

what are outliers?

A

they are points that are far away from other data points

36
Q

What is a boxplot?

A

it demonstrates skew in the data

if all sides are equal = no skew, as there is balance

skew = unequal sides or the median not in the middle of the data; the distribution is more concentrated on the left or right side

it’s basically the 5 number summary

37
Q

MEAN if the data is 1, 2, 3, 4

A

sum all then divide by 4 (the number of data points)

38
Q

Mean if there is x and frequency table

A

find the mean by multiplying each x value by its frequency, then add all the results together

then divide by n (the number of data points)

n = all of the frequencies added together
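
A minimal Python sketch of the mean from a frequency table. The (x, frequency) pairs are made up for illustration.

```python
def mean_from_frequency_table(table):
    """table: list of (x, frequency) pairs."""
    total = sum(x * frequency for x, frequency in table)   # x times frequency, added up
    n = sum(frequency for _, frequency in table)           # all frequencies added together
    return total / n


# Made-up table: the value 1 appears twice, 2 three times, 3 five times.
table = [(1, 2), (2, 3), (3, 5)]
print(mean_from_frequency_table(table))
```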