Finals Flashcards
What does the Central Limit Theorem prove?
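The sampling distribution of the sample mean is approximately normal, with mean μ and standard deviation σ/√n, once n is sufficiently large, whatever the shape of the population distribution (as long as the population variance σ² is finite).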
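A small simulation can make the card concrete. This is a sketch under assumed settings (not part of the original card): the population is exponential with rate 1 (mean 1, σ = 1, strongly skewed), each sample has n = 50, and 5,000 samples are drawn. If the CLT holds, the sample means should centre on 1, spread by about σ/√n, and fall within ±1.96 standard errors about 95% of the time.

```python
import math
import random
from statistics import mean, stdev

N = 50             # size of each sample (assumed value)
N_SAMPLES = 5_000  # number of repeated samples (assumed value)

rng = random.Random(1)
# Population: exponential with rate 1 (mean 1, sd 1), which is strongly right-skewed.
means = [mean(rng.expovariate(1.0) for _ in range(N)) for _ in range(N_SAMPLES)]

se = 1.0 / math.sqrt(N)  # sigma / sqrt(n) with sigma = 1
# Share of sample means within 1.96 standard errors of the population mean.
share = sum(abs(m - 1.0) <= 1.96 * se for m in means) / N_SAMPLES

print(f"mean of sample means: {mean(means):.3f} (population mean 1)")
print(f"sd of sample means:   {stdev(means):.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within 1.96 se: {share:.3f} (about 0.95 if approximately normal)")
```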
What is a problem about the Central Limit Theorem? What can we use instead?
The standard deviation of the population (σ), which we need in order to apply the CLT (e.g. in a z-test), is often not known. Instead, we can perform a one-sample t-test, which uses the sample standard deviation s in place of σ.
What is a one-sample t-test? When should you use it? +Formula
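It is used to compare a sample mean to an expected (hypothesised) value μ.
You should use this test when:
- You do not know the population mean or standard deviation (μ is only a hypothesised value).
- You have one sample of independent observations (two independent samples call for a two-sample t-test instead).
Formula: t = (x̄ − μ) / (s / √n), with n − 1 degrees of freedom, where x̄ is the sample mean, s the sample standard deviation and n the sample size.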
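The formula can be computed directly with Python's standard library. A minimal sketch: `one_sample_t` is a hypothetical helper name and the five data points are made up. It returns only the t statistic and degrees of freedom; turning t into a p-value needs the t-distribution's CDF, which libraries such as SciPy provide (`scipy.stats.ttest_1samp`).

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu):
    """Return (t, df) for H0: population mean == mu."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # stdev() is the sample standard deviation s (divides by n - 1)
    standard_error = stdev(sample) / math.sqrt(n)
    t = (mean(sample) - mu) / standard_error
    return t, n - 1

t, df = one_sample_t([5, 6, 7, 8, 9], mu=5)
print(round(t, 3), df)  # 2.828 4
```

Here x̄ = 7, s = √2.5 and s/√n = √0.5, so t = 2/√0.5 = 2√2.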
What can you tell me about Exploratory data analysis (EDA)? (2 bullet points)
- EDA is for seeing what the data can tell us beyond the formal modelling or hypothesis testing task.
- It is also the best paradigm to make statements about both validity and reliability.
What scales of data are there? Explain and give an example. (a lot of text)
- Categorical (Nominal)
○ uses labels to classify cases into classes; the labels have no order
○ e.g. gender, nationality, residence, car brand
- Ordinal
○ permissible transformations: any monotonic increasing function
○ e.g. if X > Y then log(X) > log(Y)
○ PRESERVES ORDER, NOT MAGNITUDE
○ ratings and rankings
○ Example: not at all, slightly, fairly, much, very much
- Interval
○ permissible transformations: Y = aX + b (a > 0)
○ differences between values are meaningful, but there is no true zero
○ e.g. What is the exact temperature in your city? (in °C or °F)
- Ratio
○ permissible transformations: Y = aX (a > 0)
○ difference to ordinal: it gives not only the order but also the size of the differences between values, plus a true zero (which interval scales lack)
○ e.g. How many children do you have? (a count: 0, 1, 2, 3, ...)
○ IT HAS A NATURAL ZERO POINT (total absence of the variable of interest, e.g. not having any children)
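The permissible transformations above can be checked numerically. This is an illustrative sketch with made-up values: log2 stands in for "any monotonic increasing function", Celsius to Fahrenheit (a = 1.8, b = 32) for an interval transformation, and metres to centimetres (a = 100) for a ratio transformation. It shows what each transformation keeps (order, ratios of differences, ratios of values) and what it loses.

```python
import math

def ranks(xs):
    """Rank of each value (0 = smallest); assumes there are no ties."""
    order = sorted(xs)
    return [order.index(x) for x in xs]

def difference_ratio(xs):
    """Ratio of consecutive differences (x2 - x1) / (x1 - x0)."""
    return (xs[2] - xs[1]) / (xs[1] - xs[0])

# Ordinal: a monotonic increasing function keeps the order...
ordinal = [1.0, 2.0, 4.0]
ordinal_t = [math.log2(x) for x in ordinal]
print(ranks(ordinal) == ranks(ordinal_t))
# ...but not the relative size of the differences (2.0 vs 1.0).
print(difference_ratio(ordinal), difference_ratio(ordinal_t))

# Interval: Y = aX + b keeps ratios of differences...
celsius = [10.0, 20.0, 40.0]
fahrenheit = [1.8 * c + 32 for c in celsius]
print(difference_ratio(celsius), difference_ratio(fahrenheit))
# ...but not ratios of values: 20 °C is not "twice as warm" as 10 °C.
print(celsius[1] / celsius[0], fahrenheit[1] / fahrenheit[0])

# Ratio: Y = aX keeps ratios of values, thanks to the true zero.
metres = [1.0, 2.0, 4.0]
centimetres = [100 * m for m in metres]
print(metres[1] / metres[0], centimetres[1] / centimetres[0])
```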
What are the properties of a reliable research tool?
A reliable research tool is consistent, stable, predictable and accurate.
What does the parallel forms reliability do? When do you use it?
It measures the correlation between two equivalent versions of a test. You use it when you have two different assessment tools or sets of questions designed to measure the same thing.
What does the test - retest reliability do?
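It measures the consistency of a test over time: the same test is given to the same people on two occasions and the two sets of scores are compared (usually by correlating them).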
What is the split-half technique? What would ensure an acceptable level of reliability in the measurements?
It is a method for checking a measuring instrument: the items are split into two halves (e.g. odd- and even-numbered items), each half is scored separately, and the scores on one half are correlated with the scores on the other half. A correlation coefficient of 0.9 or higher would ensure an acceptable level of reliability in the measurements.
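What is inter-rater reliability? How is it calculated?
It is the extent to which two or more raters agree. For two raters it is calculated with COHEN'S KAPPA (Formula: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e the proportion of agreement expected by chance). For more than two raters, Fleiss' kappa is commonly used.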
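The kappa formula translates almost line by line into code. A minimal sketch for two raters; `cohens_kappa` is a hypothetical helper name and the ratings are invented. p_e is computed from each rater's own label frequencies, which is how Cohen's kappa defines chance agreement.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # p_o: observed proportion of items on which the raters agree
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: chance agreement from each rater's label frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings of ten items by two raters
a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "no"]
b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]
print(round(cohens_kappa(a, b), 2))  # 0.6
```

The raters agree on 8 of 10 items (p_o = 0.8), and each says "yes" half the time (p_e = 0.5 · 0.5 + 0.5 · 0.5 = 0.5), so κ = 0.3 / 0.5 = 0.6.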
What shape does the standard normal distribution have? Give its 2 parameters and their values.
It is bell-shaped and symmetric around its mean.
The parameters are the mean (μ = 0) and the standard deviation (σ = 1).
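Python's standard library has the standard normal distribution built in. A short sketch using `statistics.NormalDist`, whose defaults are exactly the two parameter values on this card. It prints the peak of the bell (1/√(2π)) and the familiar share of about 68% within one standard deviation of the mean.

```python
from statistics import NormalDist

z = NormalDist()  # defaults: mu=0.0, sigma=1.0
print(z.mean, z.stdev)                    # 0.0 1.0
print(z.pdf(0))                           # peak of the bell curve, 1/sqrt(2*pi)
print(z.pdf(-1.5) == z.pdf(1.5))          # symmetric around the mean
print(z.cdf(1) - z.cdf(-1))               # share within ±1 sd, about 0.683
```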
What is the difference between a T-distribution and a normal distribution?
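A T-distribution is not itself a normal distribution, but like the normal distribution it is symmetric and bell-shaped. It has heavier tails (more probability far from the mean), and its exact shape depends on the degrees of freedom: as the degrees of freedom increase, it approaches the standard normal distribution.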
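The heavier tails can be seen by evaluating both densities. This sketch codes the textbook density of Student's t-distribution by hand (the helper name `t_pdf` is hypothetical) and uses `math.lgamma` so that large degrees of freedom do not overflow. For few degrees of freedom, the t density is lower at 0 and higher at 3 than the standard normal; with many degrees of freedom the two nearly coincide.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # log of Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), computed via lgamma
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

normal = NormalDist()
for df in (3, 30, 1000):
    print(f"t, df={df:>4}: f(0)={t_pdf(0, df):.4f}  f(3)={t_pdf(3, df):.5f}")
print(f"standard normal: f(0)={normal.pdf(0):.4f}  f(3)={normal.pdf(3):.5f}")
```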
What do you know about the Monte Carlo method?
● MC can be used to solve problems that might be deterministic in principle. It relies on repeated random sampling in order to obtain a good estimate or approximation of the exact p-value.
● MC is used when the data set does not meet the requirements necessary for parametric or asymptotic methods.
● An exact p-value can be computed with exact tests or randomization tests, but only for small data sets, because the number of possible rearrangements of the data grows very quickly. MC also works with large data sets, since it only samples some of those rearrangements.
● The Monte Carlo method tells you:
○ The range of possible events that could or will happen,
○ The (estimated) probability of each possible outcome.
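One common way MC approximates an exact p-value is a Monte Carlo permutation test. The sketch below assumes a two-group comparison of means (a setting the card does not specify); `monte_carlo_p_value` and its parameters are hypothetical. Instead of enumerating every way to split the pooled data into two groups, it shuffles the data a fixed number of times and counts how often the shuffled difference is at least as large as the observed one.

```python
import random
from statistics import mean

def monte_carlo_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Approximate the two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # one random relabelling of the data
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # add-one correction so the estimate is never exactly zero
    return (extreme + 1) / (n_resamples + 1)

print(monte_carlo_p_value([1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16]))
```

For these two groups only 2 of the 924 possible splits are as extreme as the observed one (exact p ≈ 0.002), so the estimate should land close to that; with large data sets, where full enumeration is impossible, the same loop still runs in fixed time.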
What are regularisation techniques? + Examples
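They are techniques used in the Bias–Variance Trade-Off: they deliberately introduce a small amount of bias by penalising large coefficients, which slightly shrinks the slope of the regression line, in exchange for lower variance. Ridge regression (L2 penalty) and Lasso regression (L1 penalty) are examples.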
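For a single predictor, both penalties have closed-form slopes, which makes the shrinkage easy to see. A sketch under stated assumptions: the objective is the sum of squared errors plus λ·slope² (Ridge) or λ·|slope| (Lasso), the intercept is not penalised, and the data are made up. Libraries scale the penalty differently, so λ here is not directly comparable to, say, scikit-learn's `alpha`.

```python
from statistics import mean

def _centred_sums(xs, ys):
    """Return Sxy and Sxx computed on mean-centred data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy, sxx

def ridge_slope(xs, ys, lam):
    """Minimises SSE + lam * slope**2 (intercept not penalised)."""
    sxy, sxx = _centred_sums(xs, ys)
    return sxy / (sxx + lam)

def lasso_slope(xs, ys, lam):
    """Minimises SSE + lam * |slope| (intercept not penalised)."""
    sxy, sxx = _centred_sums(xs, ys)
    # soft-thresholding: shrink |Sxy| by lam / 2, stopping at zero
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
for lam in (0, 4, 10, 12):
    print(lam, round(ridge_slope(xs, ys, lam), 3), round(lasso_slope(xs, ys, lam), 3))
```

With λ = 0 both reduce to the ordinary least-squares slope (0.6). As λ grows, Ridge shrinks the slope smoothly towards zero, while Lasso can set it exactly to zero (here at λ = 12).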
Explain the “Bias–Variance Trade-Off”. What are the techniques used here called?
● The bias–variance trade-off is the property of a model that the variance of its parameter estimates across samples can be reduced by increasing the bias in the estimated parameters.
● Regularisation techniques are used (e.g. Ridge and Lasso regression).
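The trade-off can be demonstrated by refitting a model on many simulated samples. This is a sketch under assumed settings that are not from the card: a true slope of 2, fixed x-values 1 to 5, Gaussian noise with sd 2, and a one-predictor Ridge fit with λ = 5 compared against ordinary least squares (λ = 0). Across samples, the Ridge slope should show a clear bias towards zero but a smaller variance.

```python
import random
from statistics import mean, pvariance

TRUE_SLOPE = 2.0                  # assumed true parameter
XS = [1.0, 2.0, 3.0, 4.0, 5.0]    # fixed design (assumed)

def fitted_slope(xs, ys, lam):
    """One-predictor Ridge slope; lam = 0 gives ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / (sxx + lam)

def simulate(lam, n_samples=2_000, noise_sd=2.0, seed=42):
    """Return (bias, variance) of the fitted slope across simulated samples."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        ys = [TRUE_SLOPE * x + rng.gauss(0.0, noise_sd) for x in XS]
        slopes.append(fitted_slope(XS, ys, lam))
    return mean(slopes) - TRUE_SLOPE, pvariance(slopes)

for lam in (0.0, 5.0):
    bias, var = simulate(lam)
    print(f"lambda={lam}: bias={bias:+.3f}, variance={var:.3f}")
```

Both runs use the same seed, so they see identical noise; the only difference in the results comes from the penalty.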