2.1, 2.2 Visualizing Numerical Data Flashcards
UD
It doesn't have CVs; rather, it's one graph for ONE of the NV headers (its values and frequencies)
No other variables to compare it with (correlational), just observing the variability within itself (descriptive research): the WHAT
Organize through different patterns of sample for better visualization + conclusion
- dot plots, histograms, stemplots
- summarize: shape- symmetry?
Center- most common value?
Spread- any data values farther from the rest?
Analyze distributions (dot plot)
CHECK what each dot represents
Questions will center around the observation of the numerical values- patterns, density (popular values)
EX: what percentage are at least 68 mm? The researcher wants the share of sampled measurements that fall in that range (maybe it's useful for their TOI)
Just because there are more dots on one value than the rest, watch out for high but spread-out variability outside of that value, because those scattered dots add up in comparison to the obvious stack
To determine this accurately: obvious dots / total dots
A dot plot shows each individual (each person or each country) as a dot stacked above the numerical value it takes (18 or 20)
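The "obvious dots / total dots" step can be sketched in Python. This is only a rough illustration: the measurements are made up for the example, and `proportion_at_least` is a hypothetical helper name, not something from the course.

```python
# Hypothetical sample: each number is one dot on the dot plot (lengths in mm).
measurements_mm = [64, 65, 66, 66, 67, 68, 68, 68, 69, 70]


def proportion_at_least(values, cutoff):
    """Return the share of values that are greater than or equal to cutoff."""
    matching = [v for v in values if v >= cutoff]  # dots at or above the cutoff
    return len(matching) / len(values)             # matching dots / total dots


share = proportion_at_least(measurements_mm, 68)
print(f"{share:.0%} of the sample is at least 68 mm")  # prints: 50% of the sample is at least 68 mm
```

"At least 68" uses `>=`, so the dots sitting exactly on 68 are counted.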
Categories
We are looking at numerical distributions of the numerical variable for a conclusion of TOI
You can't put categories on a dot plot, but DO put categories underneath or beside the graph
Outlier
Not like other dots
Find true (not obvious) mode
Similar to dot plot card
1. Count dots of inquired numerical variable (age 18- dots)
2. Divide by total dots (all students on each age category)
3. Smaller than 0.5%? NOT MODE
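Steps 1 and 2 can be sketched in Python, assuming the dots are available as a plain list. The ages below are invented for illustration; `collections.Counter` does the dot counting.

```python
from collections import Counter

# Hypothetical ages: each entry is one dot on the dot plot.
ages = [18, 18, 18, 19, 19, 20, 20, 20, 20, 21]

counts = Counter(ages)            # how many dots are stacked on each age
total = len(ages)                 # all dots on the graph

share_of_18 = counts[18] / total  # steps 1 and 2 for age 18

# The mode is the value with the tallest stack of dots.
mode_value, mode_count = counts.most_common(1)[0]
share_of_mode = mode_count / total

print(mode_value, share_of_18, share_of_mode)
```

In this made-up sample the tallest stack is at age 20, not 18, which is why counting beats eyeballing.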
Relative frequency
Proportion in DECI form - frequency/ total
Proportion of the observations that exhibit the relevant characteristic (<- numerical variable)
Shown on frequency table (IMP- this frequency table has NUMERICAL HEADERS) - standard deviations too
Shows how FREQUENTly each numerical value shows up on a graph
Frequency-> counts
Bimodal distribution
When a distribution shows 2 OBVIOUS peaks (more than 2 is multimodal)-> 2 different categorical wholes that are mostly polarized because of CIRCUMSTANCES attached to each
Eyeball-> if there is an obvious peak, but unlike unimodal peaks, the rest of the bins don’t cascade down in size, ONE is brave enough to stick out
EX: westerners are guaranteed less variability in our life expectancy
“More than 100” (dot plot)
Dots on 100 do NOT count
Center
most obviously the center numerical variable, not mean or typical value
N=
When it says “n=”, this gives the total number of dots on the graph
ALSO is the total number of numbers for the mean
You can also deduce whether there are significantly fewer of one CW by looking at how much less often they show up in comparison to the other CW in a study with two graphs (gender)
Eyeballing variability
measure variability by eyeballing it
1. Find center
- least variability= MOST dots/ values in center (EVERY girl has this!- nobody’s different smh)
- most variability= MOST spread-out dots/ values (balance= variability)
Histogram rules
Have frequency indicator on side IN PLACE of the dots that we could easily count before (for gauging like we did with dots)
Running number line so bars have to touch each other.
Rules:
- a bin includes its first (left) post but not its second (right) post
- any data value that lands ON a post automatically belongs in the bin to its right (therefore it is represented by this bin + classified by it)
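The bin rule above can be sketched in Python, assuming equal-width bins. The helper names `bin_index` and `bin_counts`, the data, and the bin edges are all made up for illustration.

```python
def bin_index(value, start, width):
    """Index of the bin [start + i*width, start + (i+1)*width) holding value.

    Floor division sends a value that lands exactly on a post
    to the bin on its right.
    """
    return int((value - start) // width)


def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each of n_bins equal-width bins."""
    counts = [0] * n_bins
    for v in values:
        i = bin_index(v, start, width)
        if 0 <= i < n_bins:        # ignore values outside the plotted range
            counts[i] += 1
    return counts


# Hypothetical data with bins 10-15, 15-20, 20-25.
values = [10, 12, 15, 15, 19, 20, 24]
counts = bin_counts(values, start=10, width=5, n_bins=3)
print(counts)  # prints: [2, 3, 2]
```

Both 15s land in the 15-20 bar and the 20 lands in the 20-25 bar, matching the "belongs to the bin on its right" rule.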
Relative frequency histograms
How much a percent is representative of a whole- frequency/ total
Y axis takes the frequency (count) of the numerical VALUES in each bin, then divides that by the total number of entries
Percentage of how much Numerical data falls into this bar (the one that matches it in height) using the total as reference
EX: bar at 5 counts -> 5/28 -> it's now at about 0.18 (18%) (SAME HEIGHT)
Find percentage of the relative frequencies accounted for within a certain amount of numerical values-> add the two relative frequencies you find
TIP- 4/100
We do this so we can see it through its accurate LIKELIHOOD of showing up (counts aren’t enough)
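A minimal Python sketch of turning counts into relative frequencies. Only the "5 out of 28" bar comes from the example above; the other bar counts are invented so that the total is 28.

```python
# Hypothetical bar heights (counts); the second bar is the "5 counts" example.
counts = [3, 5, 9, 7, 4]
total = sum(counts)                        # 28 entries in all

# Each bar keeps its relative height; only the y-axis scale changes.
rel_freqs = [c / total for c in counts]

print(round(rel_freqs[1], 2))              # prints: 0.18

# Share of the data covered by two bars: add their relative frequencies.
combined = rel_freqs[1] + rel_freqs[2]     # (5 + 9) / 28
print(round(combined, 2))                  # prints: 0.5
```

The relative frequencies of all bars add up to 1, which is a quick check that nothing was missed.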
Histograms on StatCrunch
Making bins too narrow-> too many bars (too much detail) to look at
Making bins too wide-> distorts data because a lot of information is hidden inside each TOO wide bar
Average on histogram
Locate bins according to what they ask for AND the bin rules
Measure the frequency of those bars (height)
Add up all the frequencies within your range
Then divide by total (will be given or just add up frequencies)
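The four steps above can be sketched in Python, assuming each bar is stored as (left post, right post, frequency). The bars and the helper name `proportion_in_range` are hypothetical.

```python
# Hypothetical histogram: (left post, right post, frequency) for each bar.
bins = [(0, 10, 4), (10, 20, 9), (20, 30, 12), (30, 40, 3)]


def proportion_in_range(bins, low, high):
    """Share of all observations in bars lying entirely within [low, high)."""
    total = sum(freq for _, _, freq in bins)        # add up every frequency
    in_range = sum(freq for left, right, freq in bins
                   if left >= low and right <= high)  # bars inside the range
    return in_range / total


share = proportion_in_range(bins, 10, 30)
print(share)  # prints: 0.75
```

Here the bars from 10 to 30 hold 9 + 12 = 21 of the 28 observations. The sketch only handles ranges that start and end on bin posts, since a histogram cannot show where values sit inside a bar.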