Introduction and Basics- Lecture 1 Flashcards
3 main roles in statistics
- Designing experiments
- Analysing data
- Drawing conclusions (understanding results)
Definitions:
1. Data
2. Statistics
3. Population
4. Sample
- consists of information that comes from observations, measurements, responses
- science of collecting, analysing, organising and interpreting data
- collection of all outcomes, responses, measurements that are of interest
- subset of a population
Descriptive statistics
Involves organisation, summarisation and display of data, e.g. words, graphs, captions, numbers
Inferential statistics
Using results from a sample to draw conclusions about the population
Design of a statistical study
- Identify variable of interest and population of the study
- Develop a detailed plan to collect data; if using a sample, ensure it is representative of the population
- Collect data
- Describe data
- Interpret data and make decisions about population using inferential statistics- drawing conclusions
- Identify possible errors
Methods of data collection - Observational study?
Researcher observes and measures characteristics of interest of part of population
Methods of data collection - experiment?
Treatment is applied to part of a population, and responses are observed
Methods of data collection - simulation?
use of a mathematical or physical model to reproduce the conditions of a situation or process
Methods of data collection - survey
investigation of one or more characteristics of a population
1. census = measurement of an entire population
2. sampling = measurement of part of population
Definition - stratified sample?
members from each segment of a population, to ensure each segment is represented
Definition - cluster samples?
all members from randomly selected segments of a population
Definition - systematic samples?
each member of the population is assigned a number. Starting number is randomly selected and sample members are selected at regular intervals
Definition - convenience samples?
consists only of available members of the population (can be used as a pilot study, but it is not representative of the whole population and will be biased)
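A minimal Python sketch of the systematic, stratified and cluster schemes defined above. The population of 100 numbered members, the segments "A", "B", "C" and the function names are all made up for illustration; they are not from the lecture.

```python
import random

def systematic_sample(population, step, rng):
    """Pick a random starting index, then take every `step`-th member."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, per_stratum, rng):
    """Draw `per_stratum` members at random from every segment (stratum)."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole segments (clusters) and keep all their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: members numbered 1..100, split into 3 segments
population = list(range(1, 101))
groups = {
    "A": list(range(1, 41)),
    "B": list(range(41, 71)),
    "C": list(range(71, 101)),
}

sys_sample = systematic_sample(population, 10, rng)   # every 10th member
strat_sample = stratified_sample(groups, 5, rng)      # 5 from each segment
clus_sample = cluster_sample(groups, 2, rng)          # 2 whole segments
```

Note the contrast: the stratified sample contains some members of every segment, while the cluster sample contains every member of some segments.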
Discrete variable
consists of separate, indivisible categories, e.g. class size, number of children in a family
Continuous variable
infinitely divisible into whatever units e.g time, weight. Time can be measured to the nearest minute, second, half - second etc.
Measuring variables
Requires a set of categories (the scale of measurement) and a process that classifies each individual into one category
4 Types of Measurement scales
1. Nominal scale
2. Ordinal scale
3. Interval scale
4. Ratio scale
- unordered set of categories identified only by name
- ordered set of categories
- ordered series of equal-sized categories
- interval scale where a value of zero indicates none of the variable being measured
correlational study
determines whether there is a relationship between 2 variables and describes that relationship; the 2 variables are observed as they exist naturally
manipulated variable
independent variable
observed variable
dependent variable
central value characterisation of whole set of data
measures of central value, e.g. mean or median, must be coupled with measures of data dispersion (e.g. average distance from the mean) to indicate how well the central value characterises the data as a whole. The narrower the spread of the data, the better the central value represents it.
center measurement definition
summary measure of overall level of dataset e.g mode, mean, median
median sensitivity?
median is less sensitive to outliers (extreme scores) than the mean, thus better measure than the mean for highly skewed distributions
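A small worked example of this, using made-up scores: adding one extreme value moves the mean a long way but barely moves the median.

```python
import statistics

scores = [2, 3, 3, 4, 5]          # hypothetical scores
with_outlier = scores + [100]     # same scores plus one extreme value

mean_before = statistics.mean(scores)            # 17 / 5 = 3.4
mean_after = statistics.mean(with_outlier)       # 117 / 6 = 19.5
median_before = statistics.median(scores)        # middle value = 3
median_after = statistics.median(with_outlier)   # (3 + 4) / 2 = 3.5
```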
variability (dispersion) measures what?
amount of scatter in a dataset with methods used to represent this: range, variance, IQR, coefficient of variation. Most common is standard deviation
what is variance?
variance of set of observations is the average of the squares of the deviations of the observations from their mean
standard deviation
square root of the variance; shows how much the data vary around the mean, in the same units as the data. A large standard deviation indicates data points are far from the mean
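The two definitions above can be worked through step by step. This sketch uses made-up data and divides by N, as in the definition given here; many textbooks and software packages divide by N - 1 for a sample instead, which is shown on the last line for comparison.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations

mean = sum(data) / len(data)                          # 40 / 8 = 5.0
squared_deviations = [(x - mean) ** 2 for x in data]  # 9, 1, 1, 1, 0, 0, 4, 16

# Variance: average of the squared deviations from the mean
variance = sum(squared_deviations) / len(data)        # 32 / 8 = 4.0

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)                         # 2.0

# Sample version (divide by N - 1), for comparison
sample_variance = sum(squared_deviations) / (len(data) - 1)  # 32 / 7
```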
why does the way data are collected and grouped (the choice of N) make a difference?
if there are 9 measurements, they can be treated as 1 dataset with N=9
BUT they can also be treated as 3 datasets from 3 independent studies, N=3
the mean remains the same but the standard deviation changes
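A sketch of the N=9 versus N=3 point with made-up numbers. It assumes the 3 studies have equal size (3 measurements each), which is what keeps the overall mean identical in both treatments.

```python
import statistics

# Hypothetical data: 3 independent studies with 3 measurements each
studies = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]

# Treatment 1: pool everything into one dataset, N = 9
pooled = [x for study in studies for x in study]
mean_pooled = statistics.mean(pooled)        # 8
sd_pooled = statistics.stdev(pooled)         # sqrt(7.5), about 2.74

# Treatment 2: one mean per study, N = 3
study_means = [statistics.mean(s) for s in studies]   # 5, 8, 11
mean_of_studies = statistics.mean(study_means)        # 8
sd_of_studies = statistics.stdev(study_means)         # 3.0
```

The mean is 8 either way, but the standard deviation is not, so how the data were grouped should always be reported alongside N.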
standard error
standard deviation of sample means and a measure of how representative a sample is likely to be of the population
large standard error?
a lot of variability between the means of different samples, thus sample might not be representative of population
small standard error?
most sample means are similar to the population mean, thus sample is accurate reflection of population
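In practice the standard error of the mean is usually estimated from a single sample as s / sqrt(n), where s is the sample standard deviation and n the sample size. That formula is standard but is not stated in these cards, and the data below are made up.

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample

s = statistics.stdev(sample)        # sample standard deviation (N - 1 divisor)
n = len(sample)

# Estimated standard error of the mean
standard_error = s / math.sqrt(n)

# With the same spread, a sample 4 times larger halves the standard error
standard_error_4n = s / math.sqrt(4 * n)
```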
frequency distribution- best visualisation of data?
Histogram, but the number of bins is important. Histograms are not good when you don't have enough data. (Too many bins = noisy; too few bins can mask important features.)
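A rough sketch of the effect of bin count, using equal-width bins on made-up data. The helper `histogram_counts` is written for this illustration only and assumes the values are not all identical.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of `n_bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]   # hypothetical data with one far-out value

too_few = histogram_counts(data, 2)     # [8, 2]: peak and outlier are hidden
moderate = histogram_counts(data, 4)    # [3, 5, 1, 1]
many = histogram_counts(data, 8)        # [1, 2, 3, 2, 1, 0, 0, 1]: gap before the outlier shows
```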
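1. normal distribution, 2. skewed distribution, 3. bimodal distribution (modality)
- symmetric, bell-shaped curve centred on the mean
- asymmetric: peak shifted to the left with a long right tail (positive skew), or peak shifted to the right with a long left tail (negative skew)
- 2 effective central values, suggesting 2 populations of responses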
z scores?
used to standardise any normal distribution such that:
- mean = 0
- standard deviation = 1
important z score: ±1.96 (cuts off the most extreme 2.5% of data in each tail, 5% in total)
z score calculation?
z = (X − X̄) / s, where X is a score, X̄ is the mean and s is the standard deviation
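A short sketch of the calculation on made-up scores, using the sample standard deviation for s. After conversion the z scores have mean 0 and standard deviation 1.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores

mean = statistics.mean(scores)       # 5
s = statistics.stdev(scores)         # sample standard deviation

# z = (X - mean) / s for every score
z_scores = [(x - mean) / s for x in scores]

z_mean = statistics.mean(z_scores)   # 0 (up to rounding error)
z_sd = statistics.stdev(z_scores)    # 1 (up to rounding error)
```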
null hypothesis?
nothing is happening
alternate hypothesis?
what you’re expecting to happen is happening, trying to disprove null hypothesis
p- value?
Probability of obtaining a statistic equal to or more extreme than the observed result if H0 is true.
Used to decide at which point there is enough evidence against the null hypothesis to support the alternate hypothesis.
smaller p value?
swinging against null hypothesis, further towards end of bell curve, stronger evidence against null hypothesis
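A minimal sketch linking z scores to p-values, assuming the test statistic follows a standard normal distribution under H0 and that the test is two-sided. The function name `two_sided_p` is made up for this example.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal H0."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

p_at_0 = two_sided_p(0.0)      # 1.0: the result sits exactly at the centre
p_at_196 = two_sided_p(1.96)   # approximately 0.05
p_at_3 = two_sided_p(3.0)      # smaller still: further into the tail
```

The further the statistic lies towards the end of the bell curve, the smaller the p-value and the stronger the evidence against the null hypothesis.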