Week 5: Data Exploration + Pre-Processing Flashcards
Mean
Average of all values: their sum divided by how many values there are
Median
Middle value of a sorted set (with an even count, the average of the two middle values)
Mode
Number that occurs most often within a set
Range
Difference between highest and lowest values
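A minimal Python sketch of the four summary statistics above, using a small hypothetical list of numbers made up for illustration. It relies on the standard-library `statistics` module for the median and mode.

```python
import statistics

# Hypothetical sample data, for illustration only
data = [2, 3, 3, 5, 7, 10]

mean = sum(data) / len(data)          # sum of values divided by the count
median = statistics.median(data)      # even count: average of the two middle values (3 and 5)
mode = statistics.mode(data)          # value that occurs most often
value_range = max(data) - min(data)   # highest value minus lowest value

print(mean, median, mode, value_range)  # prints: 5.0 4.0 3 8
```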
Standard Deviation is a measure used to
quantify the amount of variation (spread) of data values around the mean
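A sketch computing standard deviation straight from its definition (the square root of the average squared distance from the mean) and checking it against the standard library. The data are hypothetical. Two variants are common: the population version divides by n, the sample version divides by n - 1.

```python
import math
import statistics

# Hypothetical sample data, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_devs = sum((x - mean) ** 2 for x in data)

pop_sd = math.sqrt(squared_devs / len(data))            # population SD (divide by n)
sample_sd = math.sqrt(squared_devs / (len(data) - 1))   # sample SD (divide by n - 1)

print(pop_sd, statistics.pstdev(data))      # both 2.0 for this data
print(sample_sd, statistics.stdev(data))    # both about 2.14
```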
Histogram (2 points)
- is similar to a bar chart, but groups numbers into ranges (bins)
- gives a rough sense of the density of the data's distribution
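A text-only sketch of how a histogram bins values, using hypothetical exam scores and an assumed bin width of 10. Each bin's share of all values is its relative frequency; with equal-width bins this is proportional to the density.

```python
from collections import Counter

# Hypothetical exam scores, for illustration only
scores = [52, 58, 61, 64, 67, 68, 71, 73, 75, 78, 82, 85, 91]
bin_width = 10  # assumed bin width

# Assign each score to the bin [start, start + bin_width) and count per bin
counts = Counter((s // bin_width) * bin_width for s in scores)

# Print every bin from the lowest to the highest occupied one (empty bins show no bar)
for start in range(min(counts), max(counts) + bin_width, bin_width):
    share = counts[start] / len(scores)  # fraction of all scores in this bin
    print(f"{start}-{start + bin_width - 1}: {'#' * counts[start]} ({share:.0%})")
```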
Name the distribution
Normal (symmetric, bell-shaped)
Name the distribution
Right-skewed (skew is named for the side the long tail extends toward)
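A sketch of two quick numeric checks for right skew, on hypothetical income data with a long right tail. The tail typically pulls the mean above the median (a rule of thumb, not a guarantee), and the moment-based skewness coefficient comes out positive.

```python
import statistics

# Hypothetical incomes in thousands: most are small, two large values form a right tail
incomes = [30, 32, 35, 36, 38, 40, 42, 45, 120, 250]

mean = statistics.mean(incomes)
median = statistics.median(incomes)

# Moment-based skewness: third central moment over the second to the power 1.5
n = len(incomes)
m2 = sum((x - mean) ** 2 for x in incomes) / n
m3 = sum((x - mean) ** 3 for x in incomes) / n
skewness = m3 / m2 ** 1.5  # positive means the tail extends to the right

print(f"mean={mean}, median={median}, skewness={skewness:.2f}")
```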
Name the distribution type
Multimodal (more than one peak)
Draw scatter plots showing:
- positive linear association
- negative linear association
- non-linear association
- no association
Scatter plots show…
how one variable relates to (varies with) another; each point is one observation's pair of values
Correlations show
how strongly, and in which direction, pairs of variables are related
What is the measure of correlation?
correlation coefficient r
1 is perfect positive correlation
0 is no linear correlation
-1 is perfect negative correlation
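A from-scratch sketch of the Pearson correlation coefficient r (covariance of x and y divided by the product of their spreads). The helper name `pearson_r` and the sample lists are hypothetical. The last case shows that r measures only linear association: y is fully determined by x, yet r is 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0: perfect positive linear relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0: perfect negative linear relationship
print(pearson_r(x, [4, 1, 0, 1, 4]))   # 0.0: a U-shaped, non-linear relationship
```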
An outlier is
an observation that lies an abnormal distance from other values in a random sample
How do you identify outliers? (BPRD)
- box plot
- probability plot
- Rosner's test
- Dixon's test
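A sketch of the box plot rule for flagging outliers: values more than 1.5 × IQR below the first quartile or above the third quartile (Tukey's fences, the usual whisker limits). The data are hypothetical, and quartile values depend on the interpolation method; this uses the standard library's "exclusive" method.

```python
import statistics

# Hypothetical sample with one suspicious value
data = [12, 13, 13, 14, 15, 15, 16, 17, 18, 45]

# First and third quartiles (the middle cut point is the median, unused here)
q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

# Points beyond the fences would be drawn individually on a box plot
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)  # prints: [45]
```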