WEEK 1 Flashcards

Question 1

Q

What are the 2 types of data?

Answer

A

Categorical and Scalar

Question 2

Q

Categorical data has 2 types. What are they?

Answer

A

Nominal and Ordinal

Question 3

Q

Scalar data has 2 types. What are they, and which measures does it work well with?

Answer

A

Continuous and Discrete

Scalar data works well with the median, range and interquartile range (IQR).

It doesn’t work well with the mode unless you group the data, because the frequency table would have too many different values for continuous data.

Question 4

Q

What is Nominal?

Answer

A

Data that does not have a numerical value and can only be placed in a suitable category, such as gender or yes/no questions. The values are just labels, such as College or Breakfast in the example.

Question 5

Q

What is Ordinal?

Answer

A

Ordinal data can be arranged in some meaningful order, such as confidence with numbers (agree, disagree, etc.). It is categorical data that includes the idea of order, and a bar chart is generally the best chart to use.

The distance between categories (e.g. between disagree and strongly disagree) cannot be measured, it can only be assumed to be equal; with a variable like weight we can measure the distance.

(Categorical variables with 2 different categories are called Dichotomous)

Question 6

Q

What is Continuous?

Answer

A

Measured on a scale such as temperature or weight

Question 7

Q

What is Discrete?

Answer

A

Data that takes on whole values, usually obtained by counting, e.g. the number of defective items

Question 8

Q

What is the mode? How does it show in a bar chart?

Answer

A

The most frequent score in our data set. In a bar chart, the tallest bar is the mode.
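
A minimal sketch of finding the mode by counting, as a frequency table would. The survey answers are made-up illustrative data, not from the lecture.

```python
from collections import Counter

# Hypothetical survey answers (illustrative data only)
answers = ["agree", "disagree", "agree", "neutral", "agree", "disagree"]

# Frequency table: category -> how many times it appears
counts = Counter(answers)

# The most common category is the mode (the tallest bar in a bar chart)
mode, frequency = counts.most_common(1)[0]
print(mode, frequency)
```

If two categories tie for the highest count, this sketch reports only one of them.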

Question 9

Q

What is spread? How does it show in a bar chart?

Answer

A

The number of different categories we have, which is the number of bars in the bar chart

Question 10

Q

How is the median calculated?

Answer

A

First order the data. Then take (n + 1) divided by 2 to find the position of the median, e.g. with 159 data points, (159 + 1) / 2 = the 80th data point.

If there is an even number of data points, the position falls between 2 observations. With discrete or continuous data, add those 2 observations and divide by 2.

With categorical data you need to be lucky to find it (both middle observations must fall in the same category).
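
A small sketch of the (n + 1) / 2 position rule for numeric data. The function name `median_by_position` is made up for illustration.

```python
def median_by_position(values):
    """Median of numeric data using the (n + 1) / 2 position rule."""
    ordered = sorted(values)            # always order the data first
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position of the median
    lower = ordered[int(position) - 1]  # observation at (or just below) the position
    if position.is_integer():           # odd n: the position lands on one observation
        return lower
    upper = ordered[int(position)]      # even n: the next observation up
    return (lower + upper) / 2          # average the 2 middle observations


print(median_by_position([7, 1, 5]))      # odd number of data points
print(median_by_position([4, 1, 3, 2]))   # even number of data points
```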

Question 11

Q

Scalar is?

Answer

A

Examples are height, weight and the guessing variable. Scalar data adds the idea of distance, not just order.

Question 12

Q

What is the Interquartile range?

Answer

A

It describes the middle 50% of values when ordered from lowest to highest

Question 13

Q

How to find IQR?

Answer

A

Find the median (middle value) of the lower half and of the upper half of the data. These are the LQ and UQ, and IQR = UQ - LQ

Question 14

Q

What is Range?

Answer

A

The highest value (Maximum) - the lowest value (Minimum)

Question 15

Q

IQR CALCULATION

Answer

A

The LOWER QUARTILE (LQ) is the (n + 1) / 4 th data point. The UPPER QUARTILE (UQ) is the 3 x (n + 1) / 4 th data point. Then IQR = UQ - LQ
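
A sketch of the position rule in code. The data set is made up, and the helper names are hypothetical. One assumption that goes beyond these notes: when a position is not a whole number, the sketch interpolates between the two neighbouring data points.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in ordered data.

    Assumption: a fractional position is resolved by linear
    interpolation between the two neighbouring data points.
    """
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)


def iqr(values):
    """Return (LQ, UQ, IQR) using the (n + 1) / 4 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 data points for this rule")
    lq = value_at_position(ordered, (n + 1) / 4)       # lower quartile
    uq = value_at_position(ordered, 3 * (n + 1) / 4)   # upper quartile
    return lq, uq, uq - lq


# 11 made-up data points: LQ is the 3rd value, UQ is the 9th value
data = [2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 20]
lq, uq, spread = iqr(data)
print(lq, uq, spread)
```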

Question 16

Q

What is the five number summary?

Answer

A

It’s a set of descriptive statistics that provides information about a dataset, e.g. for a BOX PLOT (with no skew, the parts should all be an equal distance):

1) the sample minimum (smallest observation)
2) the lower quartile or first quartile: (TOTAL + 1) DIVIDED BY 4
3) the median (the middle value): (TOTAL + 1) DIVIDED BY 2
4) the upper quartile or third quartile: ((TOTAL + 1) DIVIDED BY 4) MULTIPLIED BY 3
5) the sample maximum (largest observation)

Question 17

Q

AVERAGE - SCALAR DATA: how do you calculate the mean of the data?

Answer

A

ADD all values and divide by the total number of observations to find the mean

Question 18

Q

THE IDEA OF DISTANCE, what is it?

Answer

A

Standard deviation measures the typical distance of the data from the mean. It does not matter whether an observation is above or below the mean, because of the squaring (only the distance matters).

The variance is the standard deviation squared. Variance is sometimes more useful than the standard deviation.

Question 19

Q

STANDARD DEVIATION CALCULATION

Answer

A

First calculate the mean (mu) by adding all the values and dividing by n (the number of observations).

x minus mu: x reflects each individual height. We pick the first individual height and subtract the average (the mean),

then square the result. We go to the second individual, subtract the mean from the height and square it, and so on.

Calculate the squared differences and add them up (1:40 min in the first lecture).

We divide by N at the end, which gives us the variance (the inner part of the formula).

Taking the square root gives us the spread, the standard deviation (sigma, a Greek letter).
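
The steps can be sketched in code as follows. The data values are made up for illustration. Dividing by N, as in these notes, gives the population standard deviation; the sample version divides by n - 1 instead.

```python
import math


def population_std(values):
    """Standard deviation, dividing by N as in the steps above."""
    n = len(values)
    mean = sum(values) / n                              # step 1: the mean (mu)
    squared_diffs = [(x - mean) ** 2 for x in values]   # (x - mu) squared, per value
    variance = sum(squared_diffs) / n                   # add them up, divide by N
    return math.sqrt(variance)                          # square root gives sigma


values = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data: mean 5, variance 4
print(population_std(values))
```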

Question 20

Q

What is the trimmed mean?

Answer

A

There might be very high or very low heights (outliers), so the trimmed mean cuts the lowest 5% of the data and the highest 5% of the data, if there are any, to give a more accurate mean.
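
A minimal sketch of a 5% trimmed mean. The heights are made-up data, and rounding the number of cut values down to a whole number is an assumption of this sketch, not something stated in the notes.

```python
def trimmed_mean(values, proportion=0.05):
    """Mean after cutting the given proportion from each end of the ordered data."""
    ordered = sorted(values)
    # Assumption: the number of values cut from each end is rounded down
    k = int(len(ordered) * proportion)
    trimmed = ordered[k:len(ordered) - k]
    return sum(trimmed) / len(trimmed)


# 20 made-up heights in cm with one very low and one very high value
heights = [120] + [170] * 18 + [260]
print(sum(heights) / len(heights))   # ordinary mean, pulled up by the 260
print(trimmed_mean(heights))         # mean after cutting 1 value from each end
```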

Question 21

Q

Standard deviation?

Answer

A

1 Find the mean (mu, the u-shaped symbol) by adding all the numbers and dividing by how many there are

2 Subtract the mean from each of the numbers given

3 That gives the inside of the bracket, which we then square (the power of 2)

4 The sigma part (the symbol that looks like an E): add them all up and divide by n (how many data points)

5 Then take the square root to get the standard deviation

Question 22

Q

What is the best measure of spread?

Answer

A

Variance (the value we have before taking the square root to get the standard deviation)

Question 23

Q

Standardisation

Answer

A

z = (x - mean) divided by the standard deviation

x is the number we want to standardise

This puts values on the same scale when they are measured in different units
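
A short sketch of standardising two values measured in different units. The means and standard deviations are made-up numbers for illustration.

```python
def standardise(x, mean, std):
    """z-score: how many standard deviations x lies from the mean."""
    return (x - mean) / std


# Made-up example: a height in cm and a weight in kg
z_height = standardise(180, mean=170, std=10)
z_weight = standardise(80, mean=70, std=20)
print(z_height, z_weight)
```

Both z values are unit-free, so they can be compared directly: here the height is further above its mean than the weight is.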

Question 24

Q

When comparing the spread of 2 or more distributions, what should we do?

Answer

A

Compare the coefficients of variation for each, as these take into account differences in the means

Question 25

Q

Coefficient of variation

Answer

A

CV = standard deviation (sigma) divided by the mean

If the dispersion around the mean is large, there is more uncertainty and lower accuracy in the data

It can be positive or negative
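
A sketch comparing two made-up data sets with Python's standard `statistics` module. `pstdev` divides by N, matching the standard deviation steps in these notes.

```python
import statistics


def coefficient_of_variation(values):
    """CV = standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


small = [2, 4, 4, 4, 5, 5, 7, 9]       # made-up data: mean 5, sigma 2
large = [x * 10 for x in small]        # same shape, 10 times bigger

print(statistics.pstdev(small), statistics.pstdev(large))   # absolute spreads differ
print(coefficient_of_variation(small), coefficient_of_variation(large))
```

The standard deviations differ by a factor of 10, but the CV is the same for both, because it measures spread relative to the mean.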

Question 26

Q

What are the 2 relative measures?

Answer

A

coefficient of variation

and idea of standardising data

Question 27

Q

Different statistical measures

Answer

A

all of them are absolute measures

Question 28

Q

IQR uses the middle 50% and is

Answer

A

less influenced by extreme values

Question 29

Q

WHAT ARE 3 MEASURES OF AVERAGE (CENTRALITY)

Answer

A

MEAN MEDIAN AND MODE

Question 30

Q

Median if there’s a sales and frequency table

Answer

A

n = the number of data points = all frequencies added together

Then we do (n + 1) divided by 2

This gives a position, e.g. the 24th data point

Then we count through the frequencies to find which value is the 24th

The ordered data can be written down like 0 x 5, 1 x 16, 2 x 12 etc. to find the 24th data point
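
A sketch of counting through the frequencies. The first three rows follow the example above (0 x 5, 1 x 16, 2 x 12); the last two rows are made up so that n = 47 and the median position is the 24th data point. The function names are hypothetical.

```python
import math


def value_at(table, position):
    """Value of the data point at a 1-based position, counting through frequencies."""
    running = 0
    for value, freq in sorted(table):
        running += freq                 # cumulative frequency so far
        if running >= position:
            return value
    raise ValueError("position is beyond the last data point")


def median_from_frequency_table(table):
    """table is a list of (value, frequency) pairs."""
    n = sum(freq for _, freq in table)  # n = all frequencies added together
    position = (n + 1) / 2
    # If the position falls between 2 data points, average those 2 values
    low = value_at(table, math.floor(position))
    high = value_at(table, math.ceil(position))
    return (low + high) / 2


# (value, frequency); rows for 3 and 4 are made up for illustration
table = [(0, 5), (1, 16), (2, 12), (3, 9), (4, 5)]
print(median_from_frequency_table(table))
```

Counting through: 5 data points are 0, the next 16 take us to the 21st, so the 24th falls among the 2s.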

Question 31

Q

five number summary TIPS

Answer

A

LQ = we find the data point at the LQ position, and that value is the answer

UQ = we find the data point at the UQ position, and that value is the answer

DON’T SUBTRACT THEM

Question 32

Q

inter quartile range (IQR)

Answer

A

Order the data from small to large

Find the positions of the LQ and UQ with the formula: (n + 1) divided by 4 for the LQ, and that result times 3 for the UQ. This tells us which data point each one is

Then for each position find the number in the data set that corresponds to it, e.g. the 9th and the 3rd number

Then subtract the actual numbers from the data set (UQ - LQ) and you have the IQR

Question 33

Q

If all data points decrease by 7, the IQR decreases by 7. True or false?

Answer

A

FALSE, BECAUSE the IQR is a measure of spread, not centrality, so it doesn’t change as the whole dataset moves
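
A quick check of this with made-up data, using `statistics.quantiles` from the Python standard library (Python 3.8+). Its default "exclusive" method places the quartiles at positions based on n + 1, which lines up with the (n + 1) / 4 rule in these notes.

```python
import statistics


def iqr(values):
    """IQR = UQ - LQ, using the library's quartile cut points."""
    lq, _median, uq = statistics.quantiles(values, n=4)
    return uq - lq


data = [2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 20]   # made-up data
shifted = [x - 7 for x in data]                  # every data point decreases by 7

print(iqr(data), iqr(shifted))   # the two values are the same
```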

Question 34

Q

Why is the IQR not affected by outliers?

Answer

A

Because it only measures UQ - LQ (the middle 50% of the data), so extreme data points do not affect it

Question 35

Q

what are outliers?

Answer

A

they are points that are far away from other data points

Question 36

Q

What is a boxplot?

Answer

A

It demonstrates skew in the data

If all sides are equal = no skew, as there is balance

Skew = sides not equal or median not in the middle of the data; the distribution is more concentrated on the left or right side

It’s basically the 5 number summary

Question 37

Q

MEAN IF the data is 1, 2, 3, 4

Answer

A

sum all then divide by 4 (the number of data points)

Question 38

Q

Mean if there is an x and frequency table

Answer

A

Multiply each x value by its frequency, then add all the results together

Then divide by n (the number of data points)

n = all of the frequencies added together
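
A minimal sketch of the same calculation in code, with a made-up frequency table. The function name is hypothetical.

```python
def mean_from_frequency_table(table):
    """table is a list of (x, frequency) pairs."""
    n = sum(freq for _, freq in table)          # n = all frequencies added together
    total = sum(x * freq for x, freq in table)  # each x times its frequency, summed
    return total / n


# Made-up table: x = 1 appears 2 times, x = 2 appears 3 times, x = 3 appears 5 times
table = [(1, 2), (2, 3), (3, 5)]
print(mean_from_frequency_table(table))   # (2 + 6 + 15) / 10
```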