Tables and graphics Flashcards
Nate Silver
The Signal and the Noise: 90% of all data was collected in the last 2 years
Charles Babbage
1791-1871
Errors using inadequate data are much less than those using no data at all.
John Wilder Tukey
1915-2000
Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.
Quantitative data reporting through figures
Aims to communicate information without distorting the underlying results.
Three main applications in science
- Experimental design
- Exploratory Data Analysis (EDA)
- Presentation/publication of results
Tables
For complex data and when the actual values are needed.
Don’t tabulate what can easily be said in the text.
Figures
For showing trends and patterns and for highlighting differences.
Constructing tables
Always tabulate vertically so you read down, not across.
Figure axes
X (horizontal) axis = what we manipulate (the independent variable).
Y (vertical) axis = the response (the dependent variable).
Figure variability
Include a measure of variation (e.g. SD or SE error bars) and the sample size.
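A minimal Python sketch of the numbers behind an error bar, assuming a single group of measurements (the helper name `error_bar_summary` and the data are hypothetical). It reports n alongside the mean, the sample SD and the standard error, because a figure caption should state which measure the bars show.

```python
import math
import statistics


def error_bar_summary(sample):
    """Mean, sample SD, standard error and n for one group of measurements."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least 2 observations to estimate variation")
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample SD (n - 1 denominator)
    se = sd / math.sqrt(n)         # standard error of the mean
    return {"n": n, "mean": mean, "sd": sd, "se": se}


summary = error_bar_summary([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data
print(f"mean = {summary['mean']:.2f} ± {summary['se']:.2f} SE (n = {summary['n']})")
```

SD describes the spread of the data; SE describes the precision of the mean and shrinks as n grows, so the two give very different-looking bars for the same sample.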
Exploratory Data Analysis (EDA)
What variation occurs within and between my variables? Plot the data before doing statistics.
Helps to identify underlying structure and decide on the most parsimonious model.
Identifies outliers.
Anscombe Quartet
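4 datasets of paired variables (x, y), 11 observations in each, with nearly identical means, variances, fitted regression lines and R^2, but very different patterns when plotted.
Things are not always as they seem: summary statistics alone can mislead, so plot the data.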
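The quartet can be checked with a short Python sketch. The data are Anscombe's values; the helper name `summarize` is hypothetical. Every dataset gives mean x = 9, mean y ≈ 7.50, sample variance of x = 11, a fitted line of roughly y = 3.00 + 0.500x and R^2 ≈ 0.67.

```python
# Anscombe's quartet: datasets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Means, sample variance of x, least-squares line and R^2 for one dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # equals r squared for simple linear regression
    return {"mean_x": mean_x, "mean_y": mean_y, "var_x": sxx / (n - 1),
            "slope": slope, "intercept": intercept, "r2": r2}


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(f"{name:>3}: mean x={s['mean_x']:.2f} mean y={s['mean_y']:.2f} "
          f"y={s['intercept']:.2f}+{s['slope']:.3f}x R^2={s['r2']:.2f}")
```

The printed rows are almost indistinguishable, yet scatter plots of the four datasets show a noisy straight line, a smooth curve, a tight line with one outlier, and a vertical stack of points with one high-leverage point.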
Linear model assumptions
- Linearity between response and predictors
- Residuals are normally distributed
- Residuals have equal variances
- No overly influential points
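One common way to check the last assumption is Cook's distance. The Python sketch below computes it for a simple (one-predictor) linear regression, using dataset III of Anscombe's quartet as example data; the function name `cooks_distance` is hypothetical, and flagging D > 1 is only a rule of thumb (4/n is another). The other assumptions are usually checked by plotting the residuals, e.g. residuals against fitted values and a normal Q-Q plot.

```python
def cooks_distance(xs, ys):
    """Cook's distance for each point of a simple (one-predictor) linear regression."""
    n = len(xs)
    p = 2  # parameters estimated: intercept and slope
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage h_i of each point in simple linear regression
    leverage = [1 / n + (x - mean_x) ** 2 / sxx for x in xs]
    return [e ** 2 / (p * mse) * h / (1 - h) ** 2
            for e, h in zip(residuals, leverage)]


# Anscombe's dataset III: a tight line plus one outlying point at x = 13
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
distances = cooks_distance(xs, ys)
for x, y, d in zip(xs, ys, distances):
    flag = "  <- influential (D > 1)" if d > 1 else ""
    print(f"x={x:>2} y={y:>5.2f} D={d:.2f}{flag}")
```

A point with leverage 1, such as the x = 19 point in dataset IV (all other x values are equal), makes this formula divide by zero; such a point alone determines the slope and should be flagged by inspection.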
Classes of data
Categorical (nominal), ordinal (ranked, ordered), measurement (ratio, with a true zero, or interval)
Categorical graph type
Bar graph
Quantitative graph type
One variable - Box-plot
Two variables - maps
Many - Icon
Mixed graph type
Two variables - Bar graph
Many - Bar graph
Histograms
For frequency distributions of continuous variables.
Bars drawn touching, with no gaps between them.
X = classes (bins), Y = frequency.
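A minimal Python sketch of how histogram bars are counted, assuming equal-width classes spanning the minimum to the maximum (the function name and data are hypothetical; plotting libraries apply their own binning rules). Changing `n_bins` changes the counts and can change the apparent shape of the distribution.

```python
def histogram(data, n_bins):
    """Counts of values in n_bins equal-width classes spanning min..max."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are equal; choose class limits by hand")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for value in data:
        # Classes are [edge, next edge); the maximum goes in the last class
        index = min(int((value - lo) / width), n_bins - 1)
        counts[index] += 1
    return edges, counts


edges, counts = histogram([1, 2, 2, 3, 3, 3, 4, 4, 5], n_bins=4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:.1f}-{right:.1f}: {'#' * count}")
```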
Frequency polygon
Frequency distribution, similar to a histogram but drawn as lines joining the frequency at each class midpoint instead of bars
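Given histogram class edges and counts, the points of a frequency polygon can be sketched in Python as below. This assumes equal-width classes and adds an empty class at each end so the line returns to zero, a common but not universal convention; `frequency_polygon` is a hypothetical helper.

```python
def frequency_polygon(edges, counts):
    """(midpoint, frequency) points for a frequency polygon from equal-width classes."""
    width = edges[1] - edges[0]
    midpoints = [(left + right) / 2 for left, right in zip(edges, edges[1:])]
    # Pad with a zero-frequency class on each side so the polygon meets the x-axis
    xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
    ys = [0] + list(counts) + [0]
    return list(zip(xs, ys))


points = frequency_polygon([1.0, 2.0, 3.0, 4.0, 5.0], [1, 2, 3, 3])
print(points)
```

Joining these points with straight lines gives the polygon; because it is just a line, several groups can be overlaid on one set of axes more easily than several histograms.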
Dot plots
Density plot showing every observation (the full dataset), so not affected by the subjective choice of number/width of bars.
Box-plots
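50% of data in the box (the interquartile range, IQR, with the median marked); whiskers here cover 90% of data (5th to 95th percentiles). Conventions vary: Tukey-style whiskers extend to the most extreme points within 1.5 × IQR of the box, and points beyond are drawn individually as outliers.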
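The numbers behind a box plot can be sketched with Python's standard `statistics.quantiles`, using its `method="inclusive"` option (other software may compute quartiles slightly differently). The sketch returns both whisker conventions: Tukey's 1.5 × IQR fences and the 5th/95th percentiles that bound 90% of the data. The helper name and data are hypothetical.

```python
import statistics


def box_plot_stats(data):
    """Quartiles, whiskers and outliers behind a box plot."""
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Tukey convention: whiskers reach the most extreme points within 1.5 * IQR
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    # 90% convention: whiskers at the 5th and 95th percentiles
    percentiles = statistics.quantiles(values, n=20, method="inclusive")
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "tukey_whiskers": (min(inside), max(inside)),
        "outliers": outliers,
        "p5_p95_whiskers": (percentiles[0], percentiles[-1]),
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])  # hypothetical data
print(stats)
```

With the extreme value of 100, the Tukey whiskers stop at 1 and 9 and 100 is drawn as a separate point, while the 95th-percentile whisker is pulled out to about 59, so the caption should say which convention is used.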
Notched box-plots
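Notch shows the median with an approximate 95% confidence interval; if the notches of two boxes do not overlap, that suggests their medians differ.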
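A sketch of the notch calculation in Python, assuming the common approximation median ± 1.58 × IQR / √n (the constant used by R's `boxplot.stats`) with quartiles taken from `statistics.quantiles`. The helper names and the two groups are hypothetical.

```python
import math
import statistics


def notch_interval(data, k=1.58):
    """Approximate 95% confidence interval for the median: median ± k * IQR / sqrt(n)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    half_width = k * (q3 - q1) / math.sqrt(len(data))
    return median - half_width, median + half_width


def notches_overlap(a, b):
    """True if the notch intervals of two groups overlap."""
    low_a, high_a = notch_interval(a)
    low_b, high_b = notch_interval(b)
    return low_a <= high_b and low_b <= high_a


group_a = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]  # hypothetical data
group_b = [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
print(notch_interval(group_a), notch_interval(group_b))
print("overlap:", notches_overlap(group_a, group_b))
```

With small samples the notch can extend beyond the box itself, a reminder that the interval is only a rough guide.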
Violin plots
Similar to box plots but give a fuller picture of the distribution, revealing features such as multiple peaks that a box plot hides. Width proportional to frequency (estimated density). Widest point is the mode.
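Violin widths come from a smoothed density estimate, commonly a kernel density estimate. A minimal Python sketch with a Gaussian kernel follows; the bandwidth (0.5) and the two-cluster data are hypothetical. The bandwidth plays the same subjective role as histogram bar width, so the estimated shape and mode can change with it.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return f(x), a Gaussian kernel density estimate of the data."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


def violin_profile(data, bandwidth, n_points=200):
    """(value, density) pairs; density sets the violin's half-width at each value."""
    lo = min(data) - 3 * bandwidth
    hi = max(data) + 3 * bandwidth
    step = (hi - lo) / (n_points - 1)
    density = gaussian_kde(data, bandwidth)
    return [(lo + i * step, density(lo + i * step)) for i in range(n_points)]


sample = [0.9, 1.0, 1.0, 1.1, 1.2, 4.9, 5.0, 5.0, 5.05, 5.1, 5.2]  # two clusters
profile = violin_profile(sample, bandwidth=0.5)
mode, density_at_mode = max(profile, key=lambda point: point[1])  # widest point
print(f"estimated mode ~ {mode:.2f}")
```

A box plot of the same sample would show a single box spanning the gap between the two clusters, whereas the violin narrows to almost nothing around 3.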
Pie charts
Rarely used in science - consider stacked bar chart instead
Bar graphs
NOT the same as histograms. Bars are separate and represent summary data (e.g. the mean and a measure of variation).
Don’t reveal the distribution.
Scatter plots
Show degree of association between two continuous variables
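The degree of linear association seen in a scatter plot is often summarised by Pearson's correlation coefficient r, which runs from -1 to +1. A minimal Python sketch follows; `pearson_r` is a hypothetical helper, and as Anscombe's quartet shows, r should accompany the plot rather than replace it.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # perfect positive association
print(pearson_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))   # perfect negative association
```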
SPLOM
Scatter plot matrix
Useful for EDA.
Shows shape and distribution of all variables and bivariate relationships between them.
Category plots
Scatter plots stratified by a third (categorical or ordinal) variable
Line graphs
Time series or temporal data