Stats Flashcards
Discrete data vs continuous data
Discrete data can only take certain values, and here we use things like bar charts
Continuous data can take any value over a range, and the values can't be listed without some being missed out, as there are infinitely many
Remember not to let any classes overlap when grouping discrete data
And how to choose intervals?
But for continuous data?
So, as below,
You can use inequalities to define the classes, e.g. 10 ≤ x < 20
Use the intervals that give the best feel for the data, without making them too small
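A minimal sketch of grouping continuous data with inequality classes, so no value can land in two classes. The function names, the class width and the heights are all made up for illustration; each class means lower <= x < upper.

```python
from collections import Counter

def class_of(x, width, start=0):
    """Return the (lower, upper) class containing x, meaning lower <= x < upper."""
    index = int((x - start) // width)
    lower = start + index * width
    return (lower, lower + width)

def group(values, width, start=0):
    """Count how many values fall in each class."""
    return Counter(class_of(v, width, start) for v in values)

heights = [150.0, 152.5, 159.9, 160.0, 171.2]  # made-up data
table = group(heights, width=10, start=150)
for (lower, upper), freq in sorted(table.items()):
    print(f"{lower} <= x < {upper}: {freq}")
```

Note that 160.0 goes into 160 <= x < 170 only, which is the point of using an inequality on one side.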
Why is a stem and leaf diagram good?
You lose the raw data when grouping in a frequency table, but a stem and leaf diagram preserves the raw data whilst organising it so you can see the distribution
Makes it easy to see the mode and calculate the median too
And you can compare two data sets side by side
And the length of each line gives the shape of each distribution, and the data is sorted too
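A rough sketch of building a stem and leaf diagram, assuming non-negative whole numbers with the tens as the stem and the units as the leaf. The function name and the marks are made up.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by stem (tens) with sorted leaves (units)."""
    rows = defaultdict(list)
    for v in sorted(values):  # sorting first keeps each row's leaves in order
        rows[v // 10].append(v % 10)
    return dict(sorted(rows.items()))

marks = [23, 31, 27, 45, 31, 38, 20]  # made-up data
for stem, leaves in stem_and_leaf(marks).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
print("Key: 2 | 3 means 23")
```

Every original value can still be read back off the rows, and the longest row shows where most of the data sits.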
How to represent discrete/ categorical data
Use bar charts, frequency diagrams and pie charts
A pie chart could be better as it shows you the distribution as proportions, which are easy to see, and as it is scaled to 360° it is easy to compare side by side
But it loses the raw data, whereas a bar chart preserves it
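A small sketch of how frequencies scale to the 360° of a pie chart. The function name and the colour counts are made up.

```python
def pie_angles(frequencies):
    """Scale each frequency to its share of 360 degrees."""
    total = sum(frequencies.values())
    return {label: 360 * f / total for label, f in frequencies.items()}

colours = {"red": 5, "blue": 10, "green": 5}  # made-up data
angles = pie_angles(colours)
print(angles)  # {'red': 90.0, 'blue': 180.0, 'green': 90.0}
```

The angles only show proportions: a pie for 20 people and a pie for 2000 people with the same split look identical, which is how the raw data gets lost.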
Why use a vertical line chart or a dot plot?
Again they represent discrete data, but a thin line is good because with bar charts the width of the bar is sometimes confused with a range of values, even though it just represents one value
A dot plot lets you see the data much more easily, as you can give a key to the dots, which makes comparing data very easy
Why do we use a histogram instead of plotting frequency against class?
Plotting frequency gives a distorted picture of the data: if a class width is big, the area of its bar is huge, making it seem like there's more data when there isn't
So we let area represent frequency, maybe with a scale factor too
Here frequency doesn't have to be equal to area but proportional to it, so valid comparisons can be made
And to do this, frequency density (frequency ÷ class width) should be plotted against the classes, with each bar as wide as its class
And be more specific too
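A quick sketch of the frequency density calculation, assuming a scale factor of 1 so that area equals frequency. The table is made up: three classes with the same frequency but different widths.

```python
def frequency_densities(classes):
    """classes: list of (lower, upper, frequency). Returns (lower, upper, density)."""
    return [(lo, hi, f / (hi - lo)) for lo, hi, f in classes]

table = [(0, 10, 20), (10, 15, 20), (15, 35, 20)]  # made-up data
densities = frequency_densities(table)
for lo, hi, d in densities:
    # bar height is the density; width times height gives the frequency back
    print(f"{lo} <= x < {hi}: density {d}, area {(hi - lo) * d}")
```

All three bars have area 20, but the heights are 2.0, 4.0 and 1.0, so the wide class no longer looks bigger than it is.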
Skews of data
If more of the data is at the higher values then it's negatively skewed
If more of the data is at the lower values then it's positively skewed
And if it's the same on both sides it's symmetrical
What is midrange
Midrange is the highest and lowest values added together and divided by 2, whereas range is highest - lowest
Mean, median, mode
Mean can be easily influenced by outliers as it takes in all the data values
Median is typically not influenced by outliers, so it gives a good representation of the average when there are outliers; the mean should only be used if there are no outliers
Mode is the most frequent value, only good if some values have a high frequency, otherwise pointless
Midrange is also susceptible to outliers, and it assumes the data is symmetrical, so we typically don't use it
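A small sketch comparing the averages with and without an outlier, using Python's standard `statistics` module. The `midrange` helper and the data are made up for illustration.

```python
import statistics

def midrange(values):
    """(highest + lowest) / 2"""
    return (max(values) + min(values)) / 2

data = [2, 3, 3, 4, 5]      # made-up data
with_outlier = data + [50]  # same data plus one outlier

for label, d in [("no outlier", data), ("with outlier", with_outlier)]:
    print(label,
          "mean", round(statistics.mean(d), 2),
          "median", statistics.median(d),
          "mode", statistics.mode(d),
          "midrange", midrange(d))
```

The outlier drags the mean from 3.4 to about 11.17 and the midrange from 3.5 to 26.0, while the median only moves from 3 to 3.5 and the mode stays at 3.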
How to find mean, median, mode and range in frequency tables
Mean = sum of (frequency × x), then over the total frequency
Median is easy to find, as the data is already ordered: add one to the total frequency and divide by 2. If the total is even (e.g. 20), it's going to be between the 10th and 11th values, so basically find those and take their mean. The rest is easy
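A sketch of the mean and median from an ungrouped frequency table, assuming the table is stored as a dict of value to frequency. The function names and the table are made up; the total frequency is 20, so the median sits between the 10th and 11th values.

```python
def table_mean(table):
    """Sum of (x * frequency) over the total frequency."""
    total = sum(table.values())
    return sum(x * f for x, f in table.items()) / total

def table_median(table):
    """Find the (n + 1) / 2 position; for even n, average the two middle values."""
    total = sum(table.values())
    lower_pos = (total + 1) // 2  # 10 when total is 20
    upper_pos = (total + 2) // 2  # 11 when total is 20

    def value_at(position):
        running = 0  # cumulative frequency so far
        for x in sorted(table):
            running += table[x]
            if running >= position:
                return x

    return (value_at(lower_pos) + value_at(upper_pos)) / 2

goals = {0: 4, 1: 6, 2: 5, 3: 5}  # made-up data: value -> frequency
print(table_mean(goals))    # 1.55
print(table_median(goals))  # 1.5
```

Here the cumulative frequencies are 4, 10, 15, 20, so the 10th value is 1 and the 11th is 2, giving a median of 1.5.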
But for grouped data?
Mean is the sum of (midpoint × frequency) over the total frequency (an estimate)
Median: the data is grouped, so DON'T ADD ONE,
just divide the total frequency by 2, then linearly interpolate and calculate
Remember for ages what to do
If the classes are ages, the upper boundary is the next year up, because 29 year olds are 29 until they turn 30, so e.g. a 20 to 29 class really runs from 20 up to 30 and the midpoint would be 25 (midpoints in 5s or whatever)
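A sketch of the grouped mean estimate using age classes, assuming ages recorded as 20 to 29, 30 to 39 and 40 to 49 are entered with their real boundaries 20 to 30, 30 to 40 and 40 to 50. The function name and the frequencies are made up.

```python
def grouped_mean(classes):
    """classes: list of (lower, upper, frequency). Estimate using class midpoints."""
    total = sum(f for _, _, f in classes)
    return sum((lo + hi) / 2 * f for lo, hi, f in classes) / total

# Ages "20 to 29" run from 20 up to (not including) 30, so the midpoint is 25
ages = [(20, 30, 5), (30, 40, 10), (40, 50, 5)]  # made-up data
print(grouped_mean(ages))  # 35.0
```

Using 29 as the upper boundary instead would give midpoints of 24.5, 34.5 and 44.5 and pull the estimate down by half a year.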
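How to linearly interpolate
Grouped data
= just divide the total frequency by 2
Find how far into the median class that is, using the cumulative frequency
Divide by the class frequency, multiply by the class width and add it on to the lower class boundary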
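The steps above can be sketched as follows, assuming the grouped table is a list of (lower boundary, upper boundary, frequency). The function name and the table are made up.

```python
def interpolated_median(classes):
    """Estimate the median of grouped data by linear interpolation."""
    total = sum(f for _, _, f in classes)
    target = total / 2  # grouped data: no "+ 1"
    cumulative = 0      # cumulative frequency before the current class
    for lo, hi, f in classes:
        if f > 0 and cumulative + f >= target:
            # how far into this class the target is, as a fraction of its frequency
            fraction = (target - cumulative) / f
            return lo + fraction * (hi - lo)
        cumulative += f

table = [(0, 10, 4), (10, 20, 8), (20, 30, 8)]  # made-up data
print(interpolated_median(table))  # 17.5
```

Here the target is 20 / 2 = 10, which is 6 values into the 10 to 20 class, so the estimate is 10 + (6 / 8) × 10 = 17.5.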
Box plot
Plots the median, the lowest value, the highest value, the lower quartile (25%) and the upper quartile (75%) too
Find these by finding the median: if the number of values is odd, discard the median and find the medians of the two halves
But if it's even, keep all the values and find the medians of the two halves as well
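A sketch of that quartile method, assuming at least two values. Other conventions for quartiles exist and can give slightly different answers; the function name and data are made up.

```python
import statistics

def five_number_summary(values):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # odd n: skip the middle value; even n: the halves split cleanly
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))  # (1, 3, 7, 11, 13)
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))  # (1, 2.5, 4.5, 6.5, 8)
```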
How to compare box plots
Compare them by
- comparing the range
- higher median = higher average
And the interquartile range shows the spread of the middle 50% of the data, so if it's bigger then there's greater spread
You can also use box plots to detect skew: if the median is closer to the upper quartile then that means most of the data is at the higher values = negatively skewed
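A rough sketch of reading range, interquartile range and skew off a box plot's five values. The function name and both sets of numbers are made up, and the skew rule is only the quartile comparison described above.

```python
def describe_box_plot(lowest, q1, median, q3, highest):
    """Summarise a box plot from its five plotted values."""
    if q3 - median < median - q1:
        skew = "negative"  # median closer to the upper quartile
    elif q3 - median > median - q1:
        skew = "positive"  # median closer to the lower quartile
    else:
        skew = "symmetrical"
    return {"range": highest - lowest, "iqr": q3 - q1, "median": median, "skew": skew}

class_a = describe_box_plot(10, 20, 35, 40, 50)  # made-up data
class_b = describe_box_plot(15, 25, 30, 35, 45)  # made-up data
print(class_a)
print(class_b)
```

Class A has the higher median (35 vs 30) and the bigger interquartile range (20 vs 10), so it has a higher average but a greater spread in the middle 50%, and it is negatively skewed while class B is symmetrical.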