Chapter 2 Flashcards

1
Q

Scatter plots

A

Graphs that provide a case-by-case view of data for two numerical variables.
The x-axis shows the explanatory variable; the y-axis shows the response variable.
Can be helpful for quickly spotting associations between the variables.

2
Q

Dot plot

A

A one-variable version of a scatter plot that shows the distribution of the data. The mean is often marked on the plot.

3
Q

Histograms

A

Graph of data density. Groups data into bins and displays the bin counts as bars, rather than plotting individual data points.

When the data trail off on one side, the distribution is said to be skewed in that direction. For example, if the bars trail off to the right, it is right skewed, also described as having a long right tail.
Distributions with roughly equal tails on each side are called symmetric.
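
A minimal Python sketch of the binning step, assuming made-up data and a bin width of 10. The helper name bin_counts is hypothetical and only for illustration.

```python
from collections import Counter

def bin_counts(values, width):
    """Count how many values fall in each bin [start, start + width)."""
    # Label each bin by its left edge, e.g. 12 -> 10 when width is 10.
    counts = Counter((x // width) * width for x in values)
    return dict(sorted(counts.items()))

# Made-up data for illustration only.
data = [1, 3, 4, 7, 12, 15, 22, 38]
counts = bin_counts(data, 10)

# Crude text histogram: one row per non-empty bin.
for start, count in counts.items():
    print(f"{start:>3} | {'#' * count}")
```

The bars get shorter toward the larger bins, so this made-up sample would be described as right skewed.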

4
Q

Mode

A

A mode is a prominent peak in a distribution (as seen in a histogram).
Unimodal - only one prominent peak
Bimodal - 2 prominent peaks
Multimodal - 3 or more prominent peaks

5
Q

Deviation & variance

A

Deviation = the distance of an observation from the mean.
Deviation = x minus the mean of x

Variance is roughly the average of the squared deviations. Squaring gets rid of negatives.
Sample variance = (sum of the squared deviations) / (n - 1)
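
The formula can be checked by hand with a short Python sketch. The data values are made up and the function name sample_variance is hypothetical.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]  # x minus the mean of x
    return sum(d ** 2 for d in deviations) / (n - 1)

# Made-up data: mean is 5, deviations are -3, -1, -1, 1, 4.
data = [2, 4, 4, 6, 9]
print(sample_variance(data))  # (9 + 1 + 1 + 1 + 16) / 4 = 7.0
```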

6
Q

Standard deviation

A

The square root of the variance. Represents the typical deviation from the mean.
Usually about 70% of the data fall within 1 standard deviation of the mean and about 95% within 2, though these percentages are not strict rules.
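
A rough way to check this rule of thumb on a data set, sketched in Python with the standard statistics module. The data are made up and fraction_within is a hypothetical helper name.

```python
import statistics

def fraction_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)

# Made-up data: mean 5, standard deviation sqrt(7), about 2.65.
data = [2, 4, 4, 6, 9]
print(fraction_within(data, 1))  # 0.6
print(fraction_within(data, 2))  # 1.0
```

With only five made-up values the fractions land near, not exactly on, 70% and 95%.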

7
Q

Box plots

A

Summarize data using 5 statistics and also plot outliers.
Median - thick line that splits the data in half
Box - spans the interquartile range (IQR), which measures the variability of the data. (The more variable the data, the larger the standard deviation and IQR.)
Q1 - first quartile, the 25th percentile, meaning 25% of the data fall below it
Q3 - third quartile, the 75th percentile
IQR = Q3 - Q1, which spans the middle 50% of the data
Whiskers capture data outside the box. They never reach more than 1.5 * IQR beyond the box.
Upper limit is Q3 + 1.5 * IQR. Lower limit is Q1 - 1.5 * IQR.
Whiskers stop at the highest or lowest data point within these limits.
Outliers are data points beyond the whiskers. They are useful for identifying strong skew in the distribution, possible data collection or entry errors, and interesting properties of the data.
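
The pieces of a box plot can be computed without drawing anything. This is a sketch in Python under two assumptions: the data are made up, and quartiles use the "inclusive" method of statistics.quantiles (other quartile conventions give slightly different Q1 and Q3). The name box_plot_summary is hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Return the numbers a box plot draws, using the 1.5 * IQR rule."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other conventions differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme data points within the limits.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in data if x < lower_limit or x > upper_limit],
    }

# Made-up data with one extreme value.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Here the upper limit is 7.75 + 1.5 * 4.5 = 14.5, so the upper whisker stops at 9 and the value 30 is plotted as an outlier.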

8
Q

Robust statistics

A

Statistics whose values are little affected by outliers, such as the median and IQR.

The mean and standard deviation are highly influenced by extreme observations.
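
A small Python sketch that makes the contrast concrete by changing one value into an extreme observation. The data are made up, the iqr helper is hypothetical, and quartiles assume the "inclusive" method of statistics.quantiles.

```python
import statistics

def iqr(values):
    """Interquartile range, assuming the 'inclusive' quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]  # same data, last value made extreme

for name, data in [("clean", clean), ("with outlier", with_outlier)]:
    print(
        name,
        "| median:", statistics.median(data),
        "| IQR:", iqr(data),
        "| mean:", statistics.mean(data),
        "| sd:", round(statistics.stdev(data), 2),
    )
```

The median and IQR are identical for both lists, while the mean and standard deviation jump once the extreme value is included.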

9
Q

Transformation of data

A

Rescaling of data using a function. Helpful for strongly skewed data where much of the data is clustered near zero.
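
One common choice of function is a logarithm. The sketch below is in Python and assumes made-up, strictly positive data (a log transformation is undefined for zero or negative values).

```python
import math

# Made-up, strongly right-skewed data: most values sit near zero.
skewed = [1, 2, 3, 5, 10, 100, 1000, 10000]

# Assumption: every value is strictly positive, as log10 requires.
logged = [math.log10(x) for x in skewed]

print("original range:   ", max(skewed) - min(skewed))
print("transformed range:", round(max(logged) - min(logged), 2))
```

After the transformation the large values are pulled in and the small values are spread out, which makes the clustered part of the data easier to see.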

10
Q

Intensity maps

A

Graphic of geographic data that uses colors to show the values of a variable. Not helpful for reading off precise values; more useful for seeing trends.

11
Q

Contingency tables & plots

A

Summarizes data for 2 categorical variables. Each value in the table is the number of times a particular combination of variable outcomes occurred.
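
A sketch of how the counts in such a table arise, written in Python. The cases and the variable names (email type, format) are made up for illustration.

```python
from collections import Counter

# Made-up cases: each tuple is one observation of (email type, format).
cases = [
    ("spam", "text"),
    ("spam", "html"),
    ("not spam", "html"),
    ("not spam", "html"),
    ("spam", "html"),
    ("not spam", "text"),
]

# Each table cell counts how often one combination of outcomes occurred.
table = Counter(cases)

for (email_type, fmt), count in sorted(table.items()):
    print(f"{email_type:<8} {fmt:<4} {count}")
```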

12
Q

Row or column proportions in contingency tables

A

Row or column proportions give a fractional breakdown of one variable within another.
Row proportions are computed as the counts divided by their row totals, so each cell becomes the proportion (or percentage) of its row that falls in that cell.
Column proportions are computed the same way, using column totals instead.
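
Both calculations, sketched in Python on a made-up table of counts stored as nested dictionaries. The function names are hypothetical.

```python
def row_proportions(table):
    """Divide each count by the total of its row."""
    return {
        row: {col: count / sum(cells.values()) for col, count in cells.items()}
        for row, cells in table.items()
    }

def column_proportions(table):
    """Divide each count by the total of its column."""
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    return {
        row: {col: count / col_totals[col] for col, count in cells.items()}
        for row, cells in table.items()
    }

# Made-up counts: rows are email type, columns are format.
table = {
    "spam": {"text": 10, "html": 30},
    "not spam": {"text": 40, "html": 20},
}

rows = row_proportions(table)
cols = column_proportions(table)
print(rows["spam"])  # share of spam emails in each format
print(cols["spam"])  # share of each format that is spam
```

The same cell answers two different questions: 10 of 40 spam emails are text (row proportion 0.25), while 10 of 50 text emails are spam (column proportion 0.2).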

13
Q

Stacked bar plots or side-by-side bar plots

A

Graphical displays of contingency table information.
A stacked bar plot shows two variables in one bar.

14
Q

Mosaic plots

A

Visualization technique suitable for contingency tables. Resembles a standardized stacked bar plot, with the added benefit that the relative group sizes of the primary variable are still visible.

15
Q

Comparing numerical data across groups?

A

Use side-by-side box plots or hollow histograms.
