Stats Flashcards

1
Q

Discrete data vs continuous data

A

Discrete data can only take certain values; here we use things like bar charts.

Continuous data can take any value over a range. The possible values can't be listed, because there are infinitely many of them and some would always be missed out.

2
Q

Remember not to let any classes overlap with discrete data.
And how do you choose intervals?
But what about continuous data?

A

For continuous data you can use inequalities (for example 10 ≤ x < 20), so no value falls into two classes.

Use the intervals that give the best feel for the data, without making them too small.

3
Q

Why are stem-and-leaf diagrams good?

A

You lose the raw data when grouping into a frequency table, but a stem-and-leaf diagram preserves the raw data whilst organising it so you can see the distribution.

It makes it easy to see the mode and calculate the median too.
And you can compare two data sets side by side (back to back).

And the length of each line gives the shape of each distribution, and the data is sorted too.

4
Q

How to represent discrete/categorical data

A

Use bar charts, frequency diagrams and pie charts.

A pie chart could be better because it shows the distribution as proportions, which are easy to see: everything is scaled to 360°, so two charts are easy to compare side by side.

But a pie chart loses the raw data, whereas a bar chart preserves it.

5
Q

Why use a vertical line chart?

Or a dot plot?

A

Again, it represents discrete data. A thin line is good because with bar charts the width of the bar is sometimes confused with a range of values, even though it just represents one value.

A dot plot allows you to see the data much more easily, as you can give a key to the dots, and then comparing data is very easy.

6
Q

Why do we use a histogram instead of plotting frequency against class?

A

Plotting frequency against class gives a distorted picture of the data: if a class width is big, the area of its bar is huge, making it seem like there's more data there when there isn't.

Thus we make the area represent the frequency, maybe with a scale factor too.

Here frequency doesn't have to be equal to the area, just proportional to it, so valid comparisons can be made.

To do this, frequency density (frequency ÷ class width) should be plotted against the classes.
This also lets the classes be more specific.
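
A quick sketch of the frequency density calculation, in Python. The class boundaries and frequencies are made up for illustration, and `frequency_densities` is just a helper name chosen here.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 12), (40, 80, 8)]

def frequency_densities(classes):
    """Return frequency divided by class width for each class."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

densities = frequency_densities(classes)
for (lower, upper, freq), fd in zip(classes, densities):
    # The bar's height is the density, so its area (height x width) matches the frequency
    print(f"{lower}-{upper}: frequency {freq}, frequency density {fd}")
```

Notice that 10-20 and 20-40 have the same frequency, but the wider class gets a bar half as tall, so the areas stay equal.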

7
Q

Skews of data

A

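If most of the data is at the higher values (the tail points towards the low end), it's negatively skewed.
If most of the data is at the lower values (the tail points towards the high end), it's positively skewed.
And if both sides are the same, it's symmetrical.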

8
Q

What is the midrange?

A

Midrange is the highest and lowest values added together and divided by 2, whereas range is highest − lowest.
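
A tiny Python sketch of the two calculations, using made-up numbers. The function names are just chosen for this example.

```python
def midrange(values):
    """(highest + lowest) / 2"""
    return (max(values) + min(values)) / 2

def value_range(values):
    """highest - lowest"""
    return max(values) - min(values)

data = [3, 7, 8, 12, 15]   # hypothetical data
print(midrange(data))      # (15 + 3) / 2 = 9.0
print(value_range(data))   # 15 - 3 = 12
```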

9
Q

Mean, median, mode

A

Mean can be easily influenced by outliers, as it takes in all the data values.

Median is typically not influenced by outliers and gives a good representation of the average, so it is the better choice when there are outliers.

Mode is the most frequent value; only good if some values occur with high frequency, otherwise pointless.

Midrange is also susceptible to outliers and assumes the data is symmetrical, so we typically don't use it.
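
To see the effect of one outlier on each average, here is a small Python sketch using the standard `statistics` module. The data values are invented for illustration.

```python
import statistics

def midrange(values):
    """(highest + lowest) / 2"""
    return (max(values) + min(values)) / 2

data = [4, 5, 5, 6, 7]        # hypothetical data, no outlier
with_outlier = data + [60]    # same data plus one extreme value

for label, values in [("no outlier", data), ("with outlier", with_outlier)]:
    print(label,
          "mean:", statistics.mean(values),
          "median:", statistics.median(values),
          "mode:", statistics.mode(values),
          "midrange:", midrange(values))
```

The mean and midrange jump a lot once 60 is added, while the median and mode barely move.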

10
Q

How to find mean, median, mode and range in frequency tables

A

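Mean = sum of (frequency × x), divided by the total frequency.
Median: the data is already ordered, so add one to the total frequency and divide by 2 to get the position. If the total is even (say 20), the position is 10.5, so the median is between the 10th and 11th values: find those and take their mean. The rest is easy.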
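
A rough Python sketch of both calculations for an ungrouped frequency table. The table and the helper names (`freq_mean`, `value_at`, `freq_median`) are made up for this example.

```python
# Hypothetical frequency table: value x -> frequency f (total frequency 20)
table = {1: 4, 2: 6, 3: 7, 4: 3}

def freq_mean(table):
    """Sum of f * x divided by the total frequency."""
    total = sum(table.values())
    return sum(x * f for x, f in table.items()) / total

def value_at(table, position):
    """Value at a 1-based position in the ordered data."""
    running = 0
    for x in sorted(table):
        running += table[x]
        if position <= running:
            return x
    raise ValueError("position is beyond the end of the data")

def freq_median(table):
    n = sum(table.values())
    if n % 2 == 1:
        # Odd total: (n + 1) / 2 is a whole-number position
        return value_at(table, (n + 1) // 2)
    # Even total: (n + 1) / 2 ends in .5, so average the two neighbouring values
    return (value_at(table, n // 2) + value_at(table, n // 2 + 1)) / 2

print(freq_mean(table))
print(freq_median(table))   # mean of the 10th and 11th values
```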

11
Q

But for grouped data?

A

Mean is the sum of (midpoint × frequency), divided by the total frequency.

Median: the data is grouped, so DON'T ADD ONE.
Just divide the total frequency by 2, then linearly interpolate and calculate.
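
A short Python sketch of the grouped mean, using class midpoints as an estimate of each value. The classes below are invented, and `grouped_mean` is a name chosen here.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]

def grouped_mean(classes):
    """Estimate the mean as sum(midpoint * frequency) / total frequency."""
    total = sum(freq for _, _, freq in classes)
    weighted = sum((lower + upper) / 2 * freq for lower, upper, freq in classes)
    return weighted / total

print(grouped_mean(classes))   # (5*4 + 15*10 + 25*6) / 20
```

This is only an estimate, because the raw values inside each class are unknown.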

12
Q

Remember what to do for ages

A

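If the data are ages, the upper class boundary is the next year up, because 29-year-olds stay 29 until they turn 30. So a class of 20-29 really runs from 20 up to 30, and its midpoint is 25 (the midpoints land on 5s, or whatever the class width gives).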

13
Q

How to linearly interpolate

A

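For grouped data:
Divide the total frequency by 2 (don't add one).
Use the cumulative frequency to find how far into the median class that position is.
Divide by the class frequency, multiply by the class width, and add it on to the lower class boundary.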
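
The steps above could be sketched in Python like this. The classes are made up, and `grouped_median` is a helper name chosen for the example.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]

def grouped_median(classes):
    """Estimate the median of grouped data by linear interpolation."""
    n = sum(freq for _, _, freq in classes)
    target = n / 2                 # grouped data: no "+ 1"
    cumulative = 0                 # cumulative frequency before the current class
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            into_class = target - cumulative          # how far into this class
            return lower + into_class / freq * (upper - lower)
        cumulative += freq
    raise ValueError("no data")

print(grouped_median(classes))   # 10 + (10 - 4) / 10 * 10
```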

14
Q

Box plot

A

Plots the median, the lowest value, the highest value, the lower quartile (25%) and the upper quartile (75%).

Find the quartiles by first finding the median: if the number of values is odd, discard the median and find the medians of the two halves.

But if it's even, keep every value and find the medians of the two halves as well.
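
A minimal Python sketch of this quartile method (other textbooks and software use slightly different conventions, so answers can differ). `quartiles` is a name chosen here and it assumes at least two data values.

```python
import statistics

def quartiles(values):
    """Return (lower quartile, median, upper quartile) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Odd count: skip the middle value. Even count: keep everything.
    upper_half = data[half + 1:] if n % 2 == 1 else data[half:]
    return (statistics.median(lower_half),
            statistics.median(data),
            statistics.median(upper_half))

print(quartiles([1, 3, 5, 7, 9, 11, 13]))   # odd number of values
print(quartiles([1, 3, 5, 7, 9, 11]))       # even number of values
```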

15
Q

How to compare box plots

A

Compare them by:
- stating the range
- higher median = higher average
- the interquartile range shows the spread of the middle 50% of the data, so if it's bigger then there is greater spread

You can also use box plots to detect skew: if the median is closer to the upper quartile, that means most of the data is at the higher values = negatively skewed.

16
Q

Outliers with quartiles

A

An outlier is anything more than 1.5 × the IQR above the upper quartile or below the lower quartile.
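
A small Python sketch of the 1.5 × IQR rule. The data and quartiles are invented for illustration, and `iqr_outliers` is a helper name chosen here (the quartiles are passed in, since methods for finding them vary).

```python
def iqr_outliers(values, q1, q3):
    """Return values more than 1.5 * IQR below q1 or above q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [2, 10, 11, 12, 13, 14, 30]      # hypothetical data
# Assumed quartiles for this data: Q1 = 10, Q3 = 14, so IQR = 4 and the fences are 4 and 20
print(iqr_outliers(data, q1=10, q3=14))
```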

17
Q

Cumulative frequency curves

A

Plot the cumulative frequencies against the upper class boundaries, and then you can use the curve to find the median (50% of the data), the quartiles, etc.

And if it's discrete data, the cumulative frequency means "less than or equal to" the value, so make sure to consider that.

18
Q

Standard deviation

A

It is used alongside the mean, just as the IQR is used with the median.

Deviation is the distance between a value and the mean, and it can be negative.
But the deviations all add up to 0, so there's no point in summing them.
Instead, if we square them, that gets rid of the negatives.

However, to compare two samples of data, the number of data points must be taken into account.
So divide by the number of values. But there are only n − 1 independent measures, as you can deduce the last one if you know the others (for example, the first 4 out of 5), since they add to 0.

So divide the sum of squared deviations by n − 1 and you have the variance.
Square root it and you have the standard deviation.
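
The reasoning above can be followed step by step in Python. The data values are made up, and the last line compares against `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics

data = [2, 4, 4, 6, 9]                        # hypothetical sample
n = len(data)
mean = sum(data) / n

deviations = [x - mean for x in data]         # some are negative
print(sum(deviations))                        # always adds up to 0 (up to rounding)

squared = [d ** 2 for d in deviations]        # squaring removes the negatives
variance = sum(squared) / (n - 1)             # divide by n - 1, not n
sd = math.sqrt(variance)

print(variance, sd)
print(statistics.stdev(data))                 # library value for comparison
```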

19
Q

Formula for standard deviation

A

Sum of x², minus n × the mean squared, all over n − 1, and then SQUARE ROOT the whole thing:

$$s = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}}$$
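
A quick Python check that this formula matches the library's sample standard deviation, using invented data. `sd_shortcut` is just a name chosen for this sketch.

```python
import math
import statistics

def sd_shortcut(values):
    """sqrt((sum of x^2 - n * mean^2) / (n - 1))"""
    n = len(values)
    mean = sum(values) / n
    sum_x2 = sum(x ** 2 for x in values)
    return math.sqrt((sum_x2 - n * mean ** 2) / (n - 1))

data = [2, 4, 4, 6, 9]            # hypothetical sample
print(sd_shortcut(data))          # sqrt((153 - 5 * 25) / 4)
print(statistics.stdev(data))     # should agree
```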

20
Q

But if you're using a frequency table, how do you find the standard deviation?

A

Use the sum of f × x² in place of the sum of x² (with n as the total frequency), and everything else is the same!

Then square root.
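
A sketch of the frequency-table version in Python, assuming n is the total frequency and the mean is sum of f × x over n. The table is made up, and the result is checked against the expanded raw data.

```python
import math
import statistics

# Hypothetical frequency table: value x -> frequency f
table = {1: 2, 2: 3, 3: 5}

def freq_sd(table):
    """Sample standard deviation from a frequency table."""
    n = sum(table.values())                          # total frequency
    mean = sum(f * x for x, f in table.items()) / n
    sum_fx2 = sum(f * x ** 2 for x, f in table.items())
    return math.sqrt((sum_fx2 - n * mean ** 2) / (n - 1))

# Expand the table back into raw data to compare with the library result
raw = [x for x, f in table.items() for _ in range(f)]
print(freq_sd(table))
print(statistics.stdev(raw))
```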

21
Q

Finally, using standard deviation for outliers?

A

All data values more than 2 standard deviations above or below the mean are outliers.

With the IQR it was 1.5 × IQR beyond the quartiles.
So yes: ± 2 standard deviations from the mean value!
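
A short Python sketch of the 2 standard deviation rule, with invented data. `sd_outliers` is a name chosen for the example and it uses the n − 1 standard deviation from the `statistics` module.

```python
import statistics

def sd_outliers(values):
    """Return values more than 2 standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > 2 * sd]

data = [10, 11, 12, 12, 13, 14, 40]   # hypothetical data, mean 16
print(sd_outliers(data))
```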

22
Q

Remember with correlations on scatter graphs, what should you be wary about when making conclusions?

A

Be wary that whether the correlation is low or even high, it only SUGGESTS a relationship, it never PROVES one.
Correlation never implies causation.

23
Q

What is a discrete uniform distribution?

A

Where all outcomes have the SAME CHANCE OF BEING SELECTED
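
As an illustration, a fair six-sided die is one example of this. A minimal Python sketch, using exact fractions (the die is an assumed example, not from the card):

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]     # assumed example: a fair six-sided die

# Every outcome gets the same probability, 1 / (number of outcomes)
probabilities = {o: Fraction(1, len(outcomes)) for o in outcomes}

for outcome, p in probabilities.items():
    print(outcome, p)
print(sum(probabilities.values()))   # probabilities add up to 1
```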