LM 01 Flashcards
Definition of a data matrix
A convenient way to store data. Examples include tables and spreadsheets. Each row is a unique case (observational unit). Each column corresponds to a variable.
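A minimal sketch of a data matrix in Python, assuming a made-up set of three students (the names `data_matrix`, `study_hours`, `year`, and `gpa` are illustrative and not from the course data). Each dict is one row (case) and each key is one column (variable).

```python
# Hypothetical data matrix: one dict per row (case), one key per column (variable).
data_matrix = [
    {"student": "A", "study_hours": 10, "year": "first", "gpa": 3.1},
    {"student": "B", "study_hours": 4, "year": "second", "gpa": 2.6},
    {"student": "C", "study_hours": 15, "year": "first", "gpa": 3.8},
]

# The variables are the column names shared by every row.
variables = list(data_matrix[0].keys())

# Pulling out one column gives every case's value for a single variable.
study_hours_column = [row["study_hours"] for row in data_matrix]

print(variables)
print(study_hours_column)
```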
Types of variables
Numerical or categorical.
Classifications of numerical variables
Numerical variables can be discrete or continuous.
Classifications of categorical variables
Categorical variables can be ordinal (ordered) or nominal (unordered).
Explanatory and response variables
Explanatory variables might affect response variables. For example, hours of study per week (explanatory) might affect GPA (response).
Types of data collection
Observational studies: researchers collect data passively; they merely observe.
Experiments: researchers actively control the data collection, trying to establish causation.
Sample versus population
A sample is a subset of a population. For example, if the population is all the people of interest, the sample is the group of people selected from it.
Simple random sample
Cases are randomly selected from the population, each with an equal chance of being chosen. Example: cars passing through intersections in Kelowna.
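A short sketch of simple random sampling using Python's standard `random` module. The population of 100 numbered cars is invented for illustration, and the fixed seed is only there so the sketch is repeatable.

```python
import random

# Hypothetical population: 100 cars identified by number (illustrative data only).
population = [f"car_{i}" for i in range(100)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simple random sample: every car has the same chance of being selected,
# and no car is selected twice (sampling without replacement).
srs = rng.sample(population, k=10)

print(srs)
```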
Stratified sample
Cases are grouped into strata, then a simple random sample is taken within each stratum.
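Stratified sampling can be sketched as follows, assuming three made-up strata of cars. The helper `stratified_sample` and the strata names are hypothetical; the sketch draws the same number of cases from every stratum, which is only one possible allocation.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical strata: cars grouped by the area where they were observed.
strata = {
    "downtown": [f"downtown_{i}" for i in range(50)],
    "suburban": [f"suburban_{i}" for i in range(30)],
    "rural": [f"rural_{i}" for i in range(20)],
}

def stratified_sample(strata, per_stratum, rng):
    """Take a simple random sample of `per_stratum` cases from every stratum."""
    return {name: rng.sample(cases, k=per_stratum) for name, cases in strata.items()}

sample = stratified_sample(strata, per_stratum=5, rng=rng)

for name, cases in sample.items():
    print(name, cases)
```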
Cluster sample
The population is divided into clusters, some clusters are selected at random, and every case in each selected cluster is sampled. Example: all cars at three intersections.
Multistage sampling
Clusters are selected at random, then cases are randomly sampled within each selected cluster. Example: cars are randomly sampled at three intersections.
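The difference between cluster sampling and multistage sampling can be sketched side by side. The ten intersections, the 40 cars at each, and the helper names `cluster_sample` and `multistage_sample` are all assumptions made for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical clusters: 10 intersections, each observed with 40 cars.
clusters = {
    f"intersection_{i}": [f"i{i}_car_{j}" for j in range(40)]
    for i in range(10)
}

def cluster_sample(clusters, n_clusters, rng):
    """Pick clusters at random, then keep EVERY case in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    return [case for name in chosen for case in clusters[name]]

def multistage_sample(clusters, n_clusters, per_cluster, rng):
    """Pick clusters at random, then take a random sample WITHIN each chosen cluster."""
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    return [
        case
        for name in chosen
        for case in rng.sample(clusters[name], k=per_cluster)
    ]

all_cars = cluster_sample(clusters, n_clusters=3, rng=rng)
some_cars = multistage_sample(clusters, n_clusters=3, per_cluster=5, rng=rng)

print(len(all_cars), len(some_cars))
```

Both samples come from three intersections; the cluster sample keeps all cars there, while the multistage sample keeps only a random few from each.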
Scatterplot
Provides a case-by-case view of data. Can visualize the relationship between two numerical variables.
Dot plot
Visualize one numerical variable.
Histograms
Provides a view of the data density, i.e., the data distribution.
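A rough text-only sketch of what a histogram computes: values are grouped into equal-width bins and the count in each bin is shown. The data values and the bin width of 1 are arbitrary choices for illustration.

```python
from collections import Counter

# Illustrative data only; the single large value gives a long right tail.
data = [1.2, 1.9, 2.1, 2.4, 2.8, 3.3, 3.5, 3.6, 4.7, 7.9]
bin_width = 1.0

# Assign each value to the bin [k * width, (k + 1) * width) and count per bin.
counts = Counter(int(x // bin_width) for x in data)

# Print one row per bin, including empty bins, so the shape is visible.
for k in range(min(counts), max(counts) + 1):
    left = k * bin_width
    print(f"[{left:.0f}, {left + bin_width:.0f}) " + "#" * counts[k])
```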
Unimodal
A single prominent peak.
Bimodal/multimodal
Several prominent peaks
Uniform
No apparent peaks
Right skewed
The tail of the distribution is on the right-hand side.
Left skewed
The tail of the distribution is on the left-hand side.
Small variance
Sharp, narrow peak
Large variance
Wide peak
Deviation
Distance of an observation from the mean (the observation minus the mean).
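A small sketch linking deviations to variance, assuming the usual sample variance with an n - 1 divisor (the flashcards do not state which divisor the course uses). The data values are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

mean = statistics.mean(data)

# Deviation: each observation minus the mean (negative when below the mean).
deviations = [x - mean for x in data]

# Sample variance: sum of squared deviations divided by n - 1 (assumed convention).
sample_variance = sum(d ** 2 for d in deviations) / (len(data) - 1)

# Standard deviation: square root of the variance, in the same units as the data.
sample_sd = sample_variance ** 0.5

print(deviations)
print(sample_variance, sample_sd)
```

Large deviations on either side of the mean produce a large variance, which is what a wide peak in a histogram reflects.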
How to draw a box plot
1) Draw a thick line for the median, Q2.
2) Draw a rectangle bounded by Q1 and Q3.
3) Draw dotted lines at Q1 - 1.5 × IQR and Q3 + 1.5 × IQR.
4) Label outliers and draw T-shaped upper and lower whiskers. Each whisker only goes as far as the highest or lowest point that is not an outlier.
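The numbers needed for the box plot steps above can be computed as in this sketch. It assumes Python's `statistics.quantiles` with `method="inclusive"` for the quartiles; quartile conventions differ between textbooks and software, so the exact Q1 and Q3 may not match a hand calculation. The data values are made up.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # illustrative values only

# Steps 1 and 2: quartiles (Q2 is the median; Q1 and Q3 bound the box).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Step 3: the dotted lines (fences) at Q1 - 1.5 * IQR and Q3 + 1.5 * IQR.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Step 4: points beyond the fences are outliers; whiskers stop at the
# most extreme points that are still inside the fences.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]
lower_whisker, upper_whisker = min(inside), max(inside)

print("Q1, median, Q3:", q1, q2, q3)
print("fences:", lower_fence, upper_fence)
print("whiskers:", lower_whisker, upper_whisker)
print("outliers:", outliers)
```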
Robust statistics
Median and IQR are more robust to extreme observations than mean and standard deviation.
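A quick sketch of what robustness means here: adding one extreme value moves the mean and standard deviation a lot, while the median and IQR barely change. The values are invented, the helper `summary` is hypothetical, and the quartiles use `statistics.quantiles` with `method="inclusive"`, which is only one of several conventions.

```python
import statistics

values = [30, 32, 35, 36, 38, 40, 41]  # illustrative values only
with_outlier = values + [400]          # same data plus one extreme observation

def summary(xs):
    """Return the centre and spread statistics compared on the flashcard."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "mean": statistics.mean(xs),
        "sd": statistics.stdev(xs),
        "median": statistics.median(xs),
        "iqr": q3 - q1,
    }

before = summary(values)
after = summary(with_outlier)

# Show how far each statistic moved when the extreme value was added.
for name in before:
    print(name, before[name], "->", after[name])
```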
Common practice
For symmetric distributions, use the mean and standard deviation.
For skewed distributions, use the median and IQR.