Lecture 1 Flashcards
What is a variable?
a characteristic of a unit that may vary for different observations
What are the two main types of variables (they each go by 2 terms)?
qualitative (categorical) & quantitative (numerical)
Qualitative uses which 2 scales of measurement?
nominal & ordinal
Nominal
order does not matter e.g. gender
Ordinal
order does matter e.g. education levels
Quantitative uses which 2 scales of measurement?
interval & ratio
Interval
differences between quantities are meaningful, but ratios are not because there is no true zero, e.g. temperature in °C
Ratio
both differences and ratios of quantities are meaningful because there is a true zero, e.g. height or weight
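A quick arithmetic check of the interval vs. ratio distinction, as a minimal sketch. The two temperatures are made-up illustrative values, and the Kelvin scale is brought in here only as an example of a scale with a true zero.

```python
# Made-up temperatures in degrees Celsius (interval scale).
temp_a_c = 10.0
temp_b_c = 20.0

# Differences are meaningful on an interval scale.
diff_c = temp_b_c - temp_a_c

# The ratio in Celsius is 2.0, but 20 C is not "twice as hot" as 10 C,
# because 0 C is not a true zero.
ratio_c = temp_b_c / temp_a_c

# Kelvin has a true zero, so ratios are meaningful there.
temp_a_k = temp_a_c + 273.15
temp_b_k = temp_b_c + 273.15
diff_k = temp_b_k - temp_a_k
ratio_k = temp_b_k / temp_a_k

print("difference:", diff_c, "C,", round(diff_k, 2), "K")
print("ratio:", ratio_c, "in C,", round(ratio_k, 3), "in K")
```

The difference is the same on both scales, while the ratio changes with the choice of zero point, which is why ratios are not compared on an interval scale.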
What is an observational study?
the investigator observes a variable of interest in an existing sample, without manipulating any factors, in order to draw conclusions
What is an experimental study?
the investigator manipulates one or more factors and examines how a response variable behaves, in order to determine the effect of those factors on the response
Cross-sectional data
data collected at the same or approximately the same point in time
Time series data
data collected over several time periods
Spatio-temporal data
data collected at different locations over several time periods
Statistical sampling
the procedure for selecting a subset (sample) of a statistical population such that the subset is representative of the population
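One common way to aim for a representative subset is simple random sampling without replacement. The card does not name a specific method, so this choice is an assumption, and the population of unit IDs below is hypothetical.

```python
import random

# Hypothetical population: unit IDs 1..100.
population = list(range(1, 101))

# Fixed seed so the draw is reproducible.
rng = random.Random(0)

# Simple random sample of 10 units, drawn without replacement.
sample = rng.sample(population, k=10)

print(sample)
```

Every unit has the same chance of being selected, which is what makes the sample representative on average.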
Frequency for a particular category
the number of times the category appears in the data set
Relative frequency for a particular category
the fraction or proportion of the time that the category appears in the data set
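A minimal sketch of frequency and relative frequency, using made-up categorical data. The variable names are illustrative only.

```python
from collections import Counter

# Made-up categorical data: highest degree of 8 respondents.
data = ["BSc", "MSc", "BSc", "PhD", "BSc", "MSc", "BSc", "PhD"]

# Frequency: how many times each category appears.
frequency = Counter(data)

# Relative frequency: frequency divided by the number of observations.
n = len(data)
relative_frequency = {cat: count / n for cat, count in frequency.items()}

for cat in frequency:
    print(cat, frequency[cat], relative_frequency[cat])
```

The relative frequencies always sum to 1, which is a handy check on a hand-built frequency table.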
How are qualitative (categorical) variables typically summarized/visualized?
frequency table, bar chart & pie chart
Frequency table
displays the possible categories along with the associated frequencies or relative frequencies
How are quantitative (numerical) variables typically summarized/visualized?
stem-and-leaf plot, histogram & box-and-whisker plot
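A stem-and-leaf plot can be built as plain text. The sketch below assumes made-up two-digit values, with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict

# Made-up two-digit data values.
data = [12, 15, 21, 23, 23, 30, 34, 38, 41]

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)  # tens digit = stem, units digit = leaf
    stems[stem].append(leaf)

# One text line per stem, leaves in increasing order.
lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]

for line in lines:
    print(line)
```

Turned on its side, the plot has the same shape as a histogram with bins of width 10, but it keeps the individual values.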
What does a measure of center attempt to do?
report a typical value for the variable e.g. mean, median & mode
What is it called when a measure of center is calculated with sample data?
statistic
What is it called when a measure of center is calculated with population data (e.g. census data)?
parameter
What is the population mean, how is it denoted & what is its formula?
denoted by mu_x, it is the sum of all the population values divided by the size of the population (N): mu_x = (1/N) * sum_{i=1}^{N} x_i
What is the sample mean, how is it denoted & what is its formula?
denoted by Xbar, it is the sum of all the sample values divided by the sample size (n): Xbar = (1/n) * sum_{i=1}^{n} x_i
Median
the value separating the higher half from the lower half of a data sample
Mode
the value that appears most frequently in the data set
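The three measures of center can be checked on a small made-up sample. This sketch uses Python's standard `statistics` module for the median and mode and computes the mean directly from its definition.

```python
import statistics

# Made-up sample of n = 6 values.
data = [2, 3, 3, 5, 7, 10]

# Sample mean: sum of the values divided by the sample size.
mean = sum(data) / len(data)

# Median: with an even n, the average of the two middle values.
median = statistics.median(data)

# Mode: the most frequent value.
mode = statistics.mode(data)

print(mean, median, mode)
```

The mean (5.0) is pulled above the median (4.0) by the large value 10, which shows how the two can disagree on skewed data.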
What are the measures of spread?
range, variance/standard deviation & interquartile range (IQR)
Range
the difference between the largest and smallest values in a dataset
What is the sample standard deviation, how is it denoted & what is its formula?
denoted by s, it measures how much the data values vary around the sample mean: s = sqrt( sum_{i=1}^{n} (x_i - Xbar)^2 / (n - 1) )
How is the sample variance denoted, what is its relationship to the sample standard deviation & what is its formula?
denoted by s^2, it is the sample standard deviation squared: s^2 = sum_{i=1}^{n} (x_i - Xbar)^2 / (n - 1)
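A minimal sketch of the sample variance and sample standard deviation, assuming the usual definition with n - 1 in the denominator. The data are made up, and the result is cross-checked against `statistics.variance` and `statistics.stdev`, which use the same definition.

```python
import math
import statistics

# Made-up sample of n = 8 values.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
xbar = sum(data) / n

# s^2: sum of squared deviations from Xbar, divided by (n - 1).
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)

# s: square root of the sample variance.
s = math.sqrt(s2)

# Cross-check against the standard library.
print(math.isclose(s2, statistics.variance(data)))
print(math.isclose(s, statistics.stdev(data)))
```

Here the squared deviations sum to 32, so s^2 = 32/7.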
The sample standard deviation can be used as the estimate of the…
population standard deviation
Population standard deviation symbol
sigma
Population variance
sigma^2
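For comparison with the sample versions, the sketch below treats a made-up data set as a complete population and assumes the usual population definition, which divides by N rather than N - 1. `statistics.pvariance` and `statistics.pstdev` implement that definition.

```python
import math
import statistics

# Made-up values, treated here as a complete population.
population = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(population)
mu = sum(population) / N

# sigma^2: sum of squared deviations from mu, divided by N.
sigma2 = sum((x - mu) ** 2 for x in population) / N
sigma = math.sqrt(sigma2)

print(sigma2, statistics.pvariance(population))
print(sigma, statistics.pstdev(population))
```

Dividing by N gives a smaller value than dividing by N - 1 would for the same data.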
IQR
Q_3 - Q_1
Q_1
the median of the lower half of the data (lower quartile)
Q_3
the median of the upper half of the data (upper quartile)
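A minimal sketch of Q_1, Q_3 and the IQR using the "median of each half" definition. One detail is an assumption: when n is odd, the overall median is left out of both halves (some textbooks include it). The helper name `quartiles` and the data are illustrative.

```python
import statistics


def quartiles(values):
    """Return (Q_1, Q_3) as the medians of the lower and upper halves.

    Assumption: for odd n the middle value belongs to neither half.
    Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return statistics.median(lower), statistics.median(upper)


# Made-up sample of n = 9 values; the median 7 is excluded from both halves.
data = [1, 3, 4, 6, 7, 9, 12, 15, 20]
q1, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q3, iqr)
```

Software packages often interpolate quartiles differently, so small discrepancies with calculator or library output are expected.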
Percentile
the pth percentile is a value such that at least p% of the data are less than or equal to it and at least (100 - p)% are greater than or equal to it (e.g. 25th percentile = Q_1)
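One simple way to pick a percentile from a data set is the nearest-rank rule: take the smallest data value with at least p% of the data less than or equal to it. This is only one of several conventions, so it is an assumption here, and it can differ slightly from a Q_1 computed as the median of the lower half. The function name `percentile_nearest_rank` is illustrative.

```python
def percentile_nearest_rank(values, p):
    """Smallest data value with at least p% of the data <= it.

    p is an integer between 0 and 100; values must be non-empty.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Integer ceiling of p * n / 100, with a minimum rank of 1.
    rank = max((p * n + 99) // 100, 1)
    return ordered[rank - 1]


# Made-up sample of n = 8 values.
data = [1, 3, 4, 6, 7, 9, 12, 15]

p25 = percentile_nearest_rank(data, 25)
p50 = percentile_nearest_rank(data, 50)
p100 = percentile_nearest_rank(data, 100)

print(p25, p50, p100)
```

With 8 values, the 25th percentile under this rule is the 2nd smallest value, since 2 of 8 values (25%) are less than or equal to it.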
Lower Fence (LF)
Q_1 - 1.5 * IQR
Upper Fence (UF)
Q_3 + 1.5 * IQR
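The fences are commonly used to flag potential outliers: values below the lower fence or above the upper fence. The sketch below uses made-up data and assumes quartiles are the medians of the lower and upper halves, with the middle value excluded when n is odd.

```python
import statistics

# Made-up sample of n = 9 values; 40 looks unusually large.
data = [1, 3, 4, 6, 7, 9, 12, 15, 40]
ordered = sorted(data)
half = len(ordered) // 2

# Assumption: quartiles are medians of the two halves (middle value excluded).
q1 = statistics.median(ordered[:half])
q3 = statistics.median(ordered[len(ordered) - half:])
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers.
outliers = [x for x in ordered if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence, outliers)
```

In a box-and-whisker plot, the whiskers typically extend to the most extreme data values that still lie inside the fences, and flagged values are drawn as individual points.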
Scatterplot
a useful tool for graphically displaying the relationship between 2 numerical variables (each dot represents one observation)