Chapter 2-Summarizing Data Flashcards
Non-linear
Plots in which the data points show visible curvature rather than following a straight line
Scatter plot
A plot of two numerical variables that shows every observation as its own point
Dot plot
A one-variable scatter plot
Mean
Also commonly referred to as the average. It is a way to measure the center of the distribution of the data. Compute it by adding up all the individual data points and dividing by the number of observations.
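A minimal sketch of the mean computation described on this card. The five values are hypothetical and chosen only for illustration.

```python
# Hypothetical sample of five observations (not taken from the flashcards).
data = [2, 4, 4, 5, 10]

# Mean = sum of the observations divided by the number of observations.
mean = sum(data) / len(data)
print(mean)
```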
Histogram
A type of plot that groups the data into bins (categories) and shows the frequency within each bin. Provides a view of the data density. Useful for showing the shape of the data distribution.
Data density
Frequency of data in a given category. What you see in a histogram.
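As a rough illustration of what a histogram counts, the sketch below groups values into equal-width bins and prints one row of stars per bin. The data and the bin width of 5 are assumptions made for the example.

```python
from collections import Counter

# Hypothetical observations (illustrative only).
data = [1, 2, 2, 3, 3, 3, 4, 7, 8, 12]
bin_width = 5  # assumed bin width

# Map each value to the left edge of its bin, then count values per bin.
counts = Counter((x // bin_width) * bin_width for x in data)

# Text histogram: each row is a bin, each star is one observation.
for start in sorted(counts):
    print(f"{start:>2} to {start + bin_width:>2}: {'*' * counts[start]}")
```

The counts thin out toward the larger values, which is the kind of shape a histogram makes easy to see.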
Right or Left Skewed
When the data trails off (diminishes) towards the right, with a long thin tail on the right, it is right skewed.
When the data trails off towards the left, with a long thin tail on the left, it is left skewed.
Mode
Prominent peak in the distribution. A distribution can be unimodal (1 peak), bimodal (2 peaks), or multimodal (more than 2 peaks).
Deviation
How far an observation/data point is from the mean
Standard deviation
How far the typical data point/observation is from the mean. It differs from deviation in that it describes the “typical” observation, while a deviation describes an individual data point. It is the square root of the variance.
About 70% of the data fall within 1 standard deviation of the mean and about 95% within 2 standard deviations. This is a rule of thumb, not a hard rule.
Variance
Average squared distance from the mean.
Variance = (sum of all (deviation)^2) / (# observations - 1)
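The deviation, variance, and standard deviation cards can be tied together in a short sketch. The data are hypothetical; the division by the number of observations minus 1 follows the formula on the variance card.

```python
import math
import statistics

# Hypothetical sample (illustrative only).
data = [2, 4, 4, 5, 10]
n = len(data)
mean = sum(data) / n

# Deviation: how far each observation is from the mean.
deviations = [x - mean for x in data]

# Variance: sum of squared deviations divided by (n - 1).
variance = sum(d ** 2 for d in deviations) / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(deviations, variance, std_dev)

# The standard library's sample variance uses the same (n - 1) divisor.
print(statistics.variance(data), statistics.stdev(data))
```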
Box plot
Summarizes a data set using 5 statistics (lower whisker, first quartile, median, third quartile, upper whisker) while also plotting unusual observations (outliers). The length of the box is the interquartile range.
Median
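Splits the data in half: 50% of the observations fall below it and 50% above. Often confused with the mean. With the data sorted, it is the value of the observation that lands in the middle (or the average of the two middle observations when the number of observations is even).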
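A small sketch of the median using Python's statistics module, with made-up lists. The last list is included to show why the median and the mean should not be confused: one extreme value moves the mean a long way but leaves the median alone.

```python
import statistics

# Hypothetical lists (illustrative only).
odd = [7, 1, 3, 9, 5]        # sorted: 1 3 5 7 9, middle value is 5
even = [7, 1, 3, 9]          # sorted: 1 3 7 9, middle pair is 3 and 7
skewed = [1, 2, 3, 4, 100]   # one very large observation

median_odd = statistics.median(odd)
median_even = statistics.median(even)   # average of the two middle values

# Same data, two different notions of "center".
mean_skewed = statistics.mean(skewed)
median_skewed = statistics.median(skewed)

print(median_odd, median_even, mean_skewed, median_skewed)
```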
Interquartile range
The width of the range covered by the middle 50% of the data/observations (the length of the box in a box plot). IQR for short.
IQR=Q3-Q1
First quartile
The 25th percentile: 25% of the observations fall below this value
Third quartile
The 75th percentile: 75% of the observations fall below this value and the last 25% fall above it
Whiskers
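Lines extending from the box to the farthest data points that are still within 1.5xIQR of the box (below Q1 or above Q3)
Outliers
Data beyond the whiskers, i.e. more than 1.5xIQR below Q1 or above Q3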
Purposes of examining outliers:
- Identify strong skew in the distribution
- Identify possible data collection or data entry errors
- Gain insight into interesting properties of the data
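The box plot cards (quartiles, IQR, whiskers, outliers) can be worked through numerically as below. The data are hypothetical. Quartile conventions differ between textbooks and software, so the use of `statistics.quantiles` with `method="inclusive"` is an assumption of this sketch, not the only valid choice.

```python
import statistics

# Hypothetical sample with one unusually large value (illustrative only).
data = [2, 4, 5, 6, 7, 8, 9, 10, 25]

# Q1, median, Q3 (assumed convention: the "inclusive" quantile method).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

# IQR = Q3 - Q1
iqr = q3 - q1

# Whiskers may reach at most 1.5 x IQR beyond the box.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside those limits.
inside = [x for x in data if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

# Anything beyond the limits is plotted individually as an outlier.
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, median, q3, iqr)
print(whisker_low, whisker_high, outliers)
```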
Transformations
Rescaling the data using a function (for example, the logarithm) so that strongly skewed data becomes easier to view and model, without destroying its statistical integrity
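As one possible illustration, the sketch below applies a base-10 log transformation to made-up, strongly right-skewed values. The choice of the log function and the data are assumptions for the example.

```python
import math
import statistics

# Hypothetical strongly right-skewed data (illustrative only).
data = [1, 10, 100, 1000, 10000]

# Log transformation: rescales the data but keeps the observations in the same order.
logged = [math.log10(x) for x in data]

# Before: the mean is pulled far above the median by the large values.
print(statistics.mean(data), statistics.median(data))

# After: the values are evenly spread, so mean and median agree.
print(statistics.mean(logged), statistics.median(logged))
```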
Intensity Map
A map with colors to show variations of intensity
Contingency Table
Summarizes data for two (categorical) variables. Each value in the table represents the number of times that particular combination of variable outcomes occurred.
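A contingency table can be sketched by counting (row, column) combinations. The group and outcome labels below are hypothetical and used only for illustration.

```python
from collections import Counter

# Hypothetical (group, outcome) pairs, one per observation.
observations = [
    ("treatment", "improved"),
    ("treatment", "improved"),
    ("treatment", "not improved"),
    ("control", "improved"),
    ("control", "not improved"),
    ("control", "not improved"),
]

# Each table cell is the number of times a combination occurred.
table = Counter(observations)

rows = sorted({group for group, _ in observations})
cols = sorted({outcome for _, outcome in observations})

# Print the table with one row per group and one column per outcome.
print(f"{'':<10}" + "".join(f"{c:>14}" for c in cols))
for r in rows:
    print(f"{r:<10}" + "".join(f"{table[(r, c)]:>14}" for c in cols))
```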
Stacked bar plot
Graphical display of contingency table.
Most useful when one variable is the explanatory variable and the other is the response.
Pie chart
A circular chart whose slices show the proportion of observations in each category; it displays the same information as a bar plot. Useful for giving a high-level overview.
Side-by-side box plot
Traditional tool for comparing across groups: one box plot per group, drawn next to each other on a shared scale
Null hypothesis (H0)
Represents the status quo: a general statement or default position that there is no difference between two measured phenomena, or that the two samples come from the same general population
Ex: H0: p ≥ 0.6
Alternative hypothesis (HA)
The position that something is happening, for example that a new theory should be preferred over an old one.
Ex: HA: p < 0.6