Lecture 8 Flashcards
What is a variable?
Any quantity that can be measured
Finish the sentence “In a dataset there will be ___ of a variable for each individual in the sample”
Observations
What’s central tendency?
The typical value of a variable
What is dispersion?
How far from the typical value the individual observations of a variable are
What is an association?
How a variable relates to another variable
What are inferential statistics?
Statistics used to make inferences (predictions) about population parameters from sample data
What are parameters?
Characteristics of the population (e.g. the population mean)
What estimates the parameters?
Statistics computed from a sample
What is probability?
The chance that a particular event will occur
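What is a sampling distribution?
The probability distribution of a statistic (e.g. the mean) across all possible samples of a given size; it tells us how likely we are to obtain the values observed in our sample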
What is hypothesis testing?
Using sample data to test whether they support our beliefs (hypotheses) about the population
When do we use descriptive statistics?
To summarise sample data
What do we use statistical inference for?
To generalise about population parameters
What determines or influences what statistical methods we can apply?
The level of measurement of the data
What’s descriptive statistics for?
To summarise the key features of data.
- To make it understandable for human readers
- To identify characteristics
- To identify patterns
- To provide basis for further analysis
What are the measures of central tendency?
Mean (x̄), median (M), mode (Z)
What is a measure of central tendency?
Single number that represents the ‘typical’ value of a variable (an average: mean, median, mode)
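Worked example: a minimal sketch of the three averages using Python's standard `statistics` module. The `scores` list is made up purely for illustration.

```python
import statistics

# Hypothetical sample: made-up values for illustration only.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # sum of the values divided by how many there are
median = statistics.median(scores)  # middle of the sorted values (average of the two middle ones here)
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)
```

Here the mean (5) sits above the median (4) because the single high value of 10 pulls it upwards.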
How would you visualise data?
In frequency tables and charts, e.g. bar charts and histograms
What is skewness?
Distributions that have a relatively higher proportion of values at the low (left) or high (right) end of the range (on the graph)
Where can you visualise skewness best?
In histograms, and by comparing the values of the mean, median and mode
What does a normal distribution look like?
Evenly spread above and below the mean (bell shape)
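Which side does a positive skew lean towards?
Right (the long tail points right; most values sit at the low end)
Which way does a negative skew lean towards?
Left (the long tail points left; most values sit at the high end)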
What is the mean the best representation of?
The average in most cases of continuous data
What does the median identify?
The central point
What is the median useful for?
Skewed data, or when continuous variables are measured on subjective scales
What is the mode suitable for?
Nominal data or grouped data
What does dispersion measure?
How far, on average, each observation is from the central tendency (mean)
What does the dispersion figure represent?
The variation in values within a variable
What do lower values of dispersion indicate?
That the central tendency (mean) is a better representation of the ‘typical value’ (more accurate)
What does the range and interquartile range provide?
A basic measure, useful for visualisation and identifying outliers
Why should we use variance and standard deviation?
They are more statistically powerful measures
What’s the interquartile range?
The range of the middle 50% of values: the upper quartile minus the lower quartile (the medians of the upper and lower halves)
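Worked example: a sketch of the "median of the halves" approach described on this card. The function name `iqr_by_halves` and the data are hypothetical. Statistical packages may use slightly different quartile rules, so their results can differ a little from this one.

```python
import statistics

def iqr_by_halves(values):
    """Return (Q1, Q3, IQR) using the medians of the lower and upper halves."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]    # lower half (the middle value is left out when n is odd)
    upper = data[-half:]   # upper half
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return q1, q3, q3 - q1

# Made-up data for illustration.
q1, q3, iqr = iqr_by_halves([1, 3, 4, 6, 8, 9, 12, 15])
print(q1, q3, iqr)
```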
What is variance?
The mean of the squared differences between each data point and the mean
What is standard deviation?
Square root of the variance (most common measure of dispersion)
What can measures of dispersion not be applied to?
Nominal variables
What is a good visual form for understanding dispersion of a variable and identifying outliers?
Box plots
What is a plot outlier?
Values that lie outside the box plot limits (beyond the whiskers)
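Worked example: the cards do not say where the box plot limits are drawn. A common convention (assumed here, not taken from the lecture) places them 1.5 × IQR beyond the quartiles. The function name `tukey_outliers` and the data are hypothetical.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]; needs at least two values."""
    data = sorted(values)
    half = len(data) // 2
    q1 = statistics.median(data[:half])    # median of the lower half
    q3 = statistics.median(data[-half:])   # median of the upper half
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr  # assumed limits (k = 1.5 by convention)
    return [x for x in data if x < low or x > high]

# Made-up data with one unusually large value.
outliers = tukey_outliers([1, 3, 4, 6, 8, 9, 12, 40])
print(outliers)
```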
How do you calculate variance?
Mean of the squared differences between each value in the dataset and the mean
How do you calculate standard deviation?
Square root of the variance
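Worked example: a step-by-step sketch of the calculation on made-up values. It divides by n, matching the "mean of the squared differences" wording on the card. When estimating from a sample, the sum is usually divided by n − 1 instead.

```python
import math

# Made-up values for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)                   # step 1: the mean
squared_diffs = [(x - mean) ** 2 for x in values]  # step 2: squared distance of each value from the mean
variance = sum(squared_diffs) / len(values)        # step 3: mean of those squared distances
sd = math.sqrt(variance)                           # step 4: square root of the variance

print(mean, variance, sd)
```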
What does the measure of association consider?
The relationship between two variables
What does Kurtosis mean?
How peaked or flat a distribution is (including how heavy its tails are)
What is a large SD? (Standard deviation)
Flat distribution
What is a small SD? (Standard deviation)
Narrow distribution
What does standard deviation tell us about in terms of distribution?
The spread of the distribution: a larger SD gives a wider, flatter curve
What’s the statistic for categorical data?
Chi-squared (χ²)
What’s the statistic for continuous data?
Pearson’s correlation coefficient (r)
What does Chi-Squared measure?
The association between two categorical variables
What does chi-squared compare?
The expected frequency if there was no relationship with the observed frequency in the sample data
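Worked example: a sketch of the comparison on a made-up 2 × 2 table. The expected frequency for each cell is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. The function name `chi_squared` is hypothetical.

```python
def chi_squared(observed):
    """Return (chi-squared statistic, expected frequencies) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Frequencies we would expect if the two variables were unrelated.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, expected

# Made-up counts: rows are one categorical variable, columns the other.
statistic, expected = chi_squared([[30, 10], [20, 40]])
print(statistic, expected)
```

The larger the statistic, the further the observed counts are from what "no relationship" would predict.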
Correlation (r)
Strength (magnitude) and direction of the association between two variables
What does a positive correlation mean?
Increase in one variable associated with an increase in the other
What does a negative correlation mean?
Increase in one variable associated with a decrease in the other
What does 0 correlation mean?
No linear association
What does Chi-Squared rely on?
Testing for statistical significance
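What is statistical significance?
A result is statistically significant when it would be unlikely to occur by chance alone if the null hypothesis were true
What is a critical value?
How far from the expected centre of the distribution a test statistic must be before we call the result unusual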
Correlation is used for?
Ordinal and scale data (continuous)
Chi-Squared is used for?
Nominal (Categorical) data
What is covariance?
The degree to which two variables deviate from their expected values (mean) in similar ways
What does a positive covariance indicate?
Variables that tend to ‘move together’ away from their means: if we observe a high value of x, we also expect to see a high value of y
What’s a scatter plot good for?
Checking if there is a linear relationship between two variables
What does negative covariance indicate?
Variables that move in opposite directions: if we observe a high value of x, we expect to see a value of y below its mean
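Worked example: a sketch of covariance on made-up data, dividing by n as in the variance card (sample estimates usually divide by n − 1). Rescaling the same sum of products by the spread of each variable gives Pearson's r. The function name `covariance_and_r` is hypothetical.

```python
import math

def covariance_and_r(xs, ys):
    """Return (covariance, Pearson's r) for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: positive when x and y move together.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    covariance = sxy / n
    r = sxy / math.sqrt(sxx * syy)  # standardised, so always between -1 and 1
    return covariance, r

x = [1, 2, 3, 4, 5]
cov_pos, r_pos = covariance_and_r(x, [2, 4, 5, 4, 5])  # y tends to rise with x
cov_neg, r_neg = covariance_and_r(x, [5, 4, 3, 2, 1])  # y falls as x rises
print(cov_pos, r_pos, cov_neg, r_neg)
```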
What is a strong correlation’s r value?
r = ±0.8 (or closer to ±1)
What is a weak correlation’s r value?
r = ±0.3 (or closer to 0)
What is an omitted variable?
A third factor, left out of the analysis, that could lead to changes in both X and Y
What is reverse causality?
A change in Y leads to the change in X, rather than the other way round
What is Sample selection bias?
When the individuals sampled have a different tendency to show the association than the whole population
What’s a measurement error?
Values in the data that differ from the true value of the variable
What does association NOT imply?
Causation
What are alternative reasons for finding a relationship?
Omitted variables, reverse causality, sample selection bias and measurement error
What does the test we use depend on?
Data meeting certain assumptions
What does the statistical inference process rely on?
Estimating the probability of obtaining our sample results, based on the distribution of sample statistics and population parameters.
What’s the standard normal (Z) distribution’s mean and SD value?
Mean = 0
SD = 1
What does a value’s Z-score represent?
How many standard deviations it lies from the mean
What does a higher Z score mean?
A lower probability of observing a value that far (or further) from the mean
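Worked example: a sketch of a Z-score and the matching tail probability using `statistics.NormalDist` from the standard library. The mean of 100 and SD of 15 are assumed values for illustration, and the variable is assumed to be normally distributed.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """How many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd

z = z_score(130, mean=100, sd=15)  # (130 - 100) / 15

# Probability of a value at least this far above the mean
# under the standard normal (mean 0, SD 1) distribution.
upper_tail = 1 - NormalDist().cdf(z)

print(z, round(upper_tail, 4))
```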
What is the central limit theorem?
The sampling distribution of the mean will be approximately normally distributed, whatever the shape of the population, as long as the sample size is large enough
What is the ‘sampling distribution of the sample means’?
The distribution consisting of the means of all random samples of a given size (n) that can be drawn from a population.
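Worked example: a small simulation sketch of this idea. It draws many samples from a deliberately skewed population (an exponential distribution with mean 1 and SD 1, chosen here for illustration) and collects their means. The seed, sample size and number of samples are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(8)  # fixed seed so the run is repeatable (arbitrary choice)

SAMPLE_SIZE = 50    # n: observations per sample (assumed for illustration)
NUM_SAMPLES = 2000  # how many random samples we draw

def draw_sample_mean():
    # Skewed population: exponential with mean 1 and SD 1.
    sample = [rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    return statistics.mean(sample)

# An approximation to the sampling distribution of the sample means.
sample_means = [draw_sample_mean() for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(round(mean_of_means, 3), round(sd_of_means, 3))
```

The sample means should cluster around the population mean of 1, with a spread close to 1/√50 ≈ 0.14, even though the population itself is far from bell-shaped.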
What does the alternative hypothesis (H1) mean?
What we believe the data will support
What’s the null hypothesis (H0)?
It covers all states we want to disprove
What is the statistical significance level?
Conventionally 0.05 (a significance level of 5%)
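Worked example: a sketch of how the 5% level is applied, using a two-sided one-sample Z-test. That test is assumed here for illustration, since the cards do not name a specific one. All numbers (null mean, population SD, sample mean, n) are made up.

```python
import math
from statistics import NormalDist

ALPHA = 0.05          # significance level from the card

# Hypothetical set-up: H0 says the population mean is 100.
null_mean = 100.0
population_sd = 15.0  # assumed known
sample_mean = 106.0
n = 36

standard_error = population_sd / math.sqrt(n)         # SD of the sampling distribution of the mean
z = (sample_mean - null_mean) / standard_error        # test statistic

critical_value = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96 for a two-sided 5% test
p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # chance of a result this extreme if H0 is true

reject_null = abs(z) > critical_value
print(round(z, 2), round(critical_value, 2), round(p_value, 4), reject_null)
```

Comparing |z| with the critical value and comparing the p-value with 0.05 are two views of the same decision.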
Give me 4 levels of measurement data
Nominal, ordinal, interval and ratio data
What do larger samples provide?
More reliable results