Applied Quantitative Methods Flashcards
Variable
A characteristic or measurement that can be determined for each
member of the population. Variables are usually denoted by capital
letters such as X or Y.
Numerical variable
Takes on numerical values.
Continuous variable
Measured rather than counted, so it can take any value within an interval. Examples: distance, height, GDP in kr., value of cars sold in kr.
Categorical variable
Also known as qualitative data: each observation falls into a category (smoking vs. non-smoking, vote yes/no). Any numbers attached to the categories serve purely for identification (e.g., shirt numbers: Ronaldo no. 7, Christian Eriksen no. 10).
Population
Collection of persons, things or objects under study.
Sampling
Selecting a subset (or portion) of the population in order to gain information about the population.
Sample
The data that result from sampling a population.
Statistic
A number that represents a property of the sample (e.g., sample mean, sample variance).
Parameter
A numerical characteristic of the whole population (e.g.,
population mean, population variance).
Simple Random Sample
A sample of n objects chosen from a population of N objects by a process under which each member of the
population has the same probability of being selected; more precisely, every possible sample of size n is equally likely to be chosen.
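A minimal sketch of drawing a simple random sample, assuming the population can be listed in memory. Python's `random.sample` draws without replacement so that every subset of size n is equally likely; the population of 100 labelled members and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Hypothetical helper: draw n distinct members, every n-subset equally likely."""
    rng = random.Random(seed)          # seeded generator for reproducibility
    return rng.sample(population, n)   # sampling without replacement

population = list(range(1, 101))       # hypothetical population of N = 100 members
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```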
Sampling Distributions
The population parameter (e.g., the mean µ or the variance σ²) is a fixed (but
unknown) number.
However, each sample drawn from a population has its own values of the mean and
variance. If you draw many samples and calculate the mean (and variance) of each one, the sample means (and variances) vary from sample to sample, so they
can be treated as a random variable with a probability distribution: the sampling distribution.
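The idea of a sampling distribution can be illustrated by simulation. The sketch below assumes a hypothetical normal population of 10,000 values, repeatedly draws samples of size n = 25, and records each sample mean; the means differ from sample to sample, while their average stays close to the fixed population mean µ.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population of N = 10,000 values; its mean mu is one fixed number.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)

# Draw many samples of size n and record the mean of each one.
n, n_samples = 25, 2_000
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(n_samples)]

print(f"population mean mu    = {mu:.2f}")
print(f"first five sample means: {[round(m, 2) for m in sample_means[:5]]}")
print(f"mean of sample means  = {statistics.fmean(sample_means):.2f}")
print(f"sd of sample means    = {statistics.stdev(sample_means):.2f}")
```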
Law of large numbers
States that, for a random sample of size n drawn from a population,
the sample mean X̄ approaches the population mean µ_X as the sample size n
becomes large.
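A quick simulation of the law of large numbers, assuming a hypothetical population given by a fair six-sided die (so µ = 3.5). The gap between X̄ and µ tends to shrink as n grows; this is a statement about probability, so the gap need not shrink at every single step.

```python
import random
import statistics

rng = random.Random(0)
mu = 3.5  # population mean of a fair die (hypothetical population)

for n in (10, 100, 1_000, 100_000):
    rolls = [rng.randint(1, 6) for _ in range(n)]   # random sample of size n
    x_bar = statistics.fmean(rolls)
    print(f"n = {n:>7}: sample mean = {x_bar:.4f}, |x_bar - mu| = {abs(x_bar - mu):.4f}")
```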
Central Limit Theorem
States that the mean of a random sample drawn from a population with any probability distribution (with finite variance σ²) will be approximately normally distributed, with mean µ and variance σ²/n, given a large enough sample size n.
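The sketch below checks the Central Limit Theorem by simulation, assuming a deliberately skewed hypothetical population (exponential with rate 1, so µ = σ = 1) and samples of size n = 50. The sample means should have mean close to µ, standard deviation close to σ/√n, and roughly 95% of them should fall within µ ± 1.96 σ/√n, as a normal distribution predicts.

```python
import math
import random
import statistics

rng = random.Random(7)
# Hypothetical skewed population: Exponential(rate 1), so mu = sigma = 1.
mu, sigma = 1.0, 1.0
n, n_samples = 50, 5_000

# Sampling distribution of the mean: n_samples means, each from a sample of size n.
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(n_samples)]

se = sigma / math.sqrt(n)  # CLT: standard deviation of the sample mean
# If the means are approximately normal, about 95% lie within mu +/- 1.96 * se.
coverage = sum(abs(m - mu) <= 1.96 * se for m in means) / n_samples

print(f"mean of sample means = {statistics.fmean(means):.3f} (CLT: {mu})")
print(f"sd of sample means   = {statistics.stdev(means):.3f} (CLT: {se:.3f})")
print(f"share within mu +/- 1.96*se = {coverage:.3f} (normal: 0.950)")
```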
Acceptance Interval
An interval within which the sample mean has a high probability of falling, given that we know the population mean and variance. If the sample mean falls within that interval, then we can accept
the conclusion that the random sample came from the population with the known mean and variance.
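A minimal sketch of a two-sided acceptance interval, assuming σ is known and the sample mean is normally distributed (a normal population, or a large n via the Central Limit Theorem), so the interval is µ ± z_{α/2} σ/√n. The population values (µ = 100, σ = 15, n = 36) and the helper name `acceptance_interval` are hypothetical.

```python
import math
from statistics import NormalDist

def acceptance_interval(mu, sigma, n, alpha=0.05):
    """Hypothetical helper: mu +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for alpha = 0.05
    half_width = z * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width

# Hypothetical population: mu = 100, sigma = 15; samples of size n = 36.
low, high = acceptance_interval(100, 15, 36)
print(f"95% acceptance interval: ({low:.2f}, {high:.2f})")

for x_bar in (103.0, 106.0):
    inside = low <= x_bar <= high
    print(f"sample mean {x_bar}: {'inside' if inside else 'outside'} the interval")
```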
Distribution of sample proportion
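Assume we are dealing with a qualitative (categorical) variable.
For example, we investigate a characteristic (e.g., smoker/non-smoker) and record 1 if an individual has this characteristic and 0 otherwise. The (unknown) proportion of ones in the population is denoted P. We have a sample of n values, each 0 or 1, and the sample proportion p̂ is the fraction of ones in the sample. The sampling distribution of p̂ has mean P and variance P(1 − P)/n, and is approximately normal when n is large.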
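A small sketch of computing the sample proportion from 0/1 data, assuming a hypothetical sample with 30 smokers out of n = 200. Because P is unknown, the standard deviation of p̂, √(P(1 − P)/n), is estimated by plugging in p̂.

```python
import math

# Hypothetical sample: 1 = smoker, 0 = non-smoker (30 smokers out of n = 200).
data = [1] * 30 + [0] * 170
n = len(data)

p_hat = sum(data) / n                        # sample proportion, estimates P
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated sd of p_hat, using p_hat for P

print(f"p_hat = {p_hat:.3f}, estimated standard error = {se_hat:.4f}")
```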
Chi-Square Distribution
If we can assume that the underlying population distribution is
normal, then it can be shown that the sample variance s² and the
population variance σ² are related through a chi-square distribution: $\frac{(n-1)s^2}{\sigma^2}$ follows a chi-square distribution with n − 1 degrees of freedom.
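The relation can be checked by simulation, assuming a hypothetical normal population with σ = 2 and samples of size n = 10. A chi-square variable with k degrees of freedom has mean k and variance 2k, so the simulated values of (n − 1)s²/σ² should have mean near 9 and variance near 18.

```python
import random
import statistics

rng = random.Random(3)
# Hypothetical normal population with sigma = 2; samples of size n = 10.
sigma, n, n_samples = 2.0, 10, 10_000

chi2_values = []
for _ in range(n_samples):
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)             # sample variance, divisor n - 1
    chi2_values.append((n - 1) * s2 / sigma**2)  # should follow chi-square with n - 1 df

# Chi-square with k degrees of freedom: mean k, variance 2k.
print(f"mean     = {statistics.fmean(chi2_values):.2f} (expected {n - 1})")
print(f"variance = {statistics.variance(chi2_values):.2f} (expected {2 * (n - 1)})")
```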
Student’s t Distribution
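When the population standard deviation σ is unknown (and the population is normal), σ is replaced by the sample standard deviation s:
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
This random variable follows a member of a family of distributions called Student's t distributions, namely the one with n − 1 degrees of freedom.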
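A minimal sketch of computing the t statistic and its degrees of freedom for a hypothesised mean µ0. The measurements, µ0 = 12 and the helper name `t_statistic` are hypothetical; the standard library has no t distribution CDF, so the p-value is left to a table or a statistics package.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Hypothetical helper: return (t, df), using s in place of the unknown sigma."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)              # sample standard deviation (divisor n - 1)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical measurements; hypothesised population mean mu0 = 12.
sample = [12.1, 11.8, 12.4, 12.0, 11.6, 12.3, 12.2, 11.9]
t, df = t_statistic(sample, 12.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```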
Sample Size for Population Proportion
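Whatever the outcome, p̂(1 − p̂) cannot be bigger than 0.25 (its maximum, reached when the
sample proportion is 0.5).
Since the margin of error is $ME = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, substituting 0.25 for p̂(1 − p̂) gives the largest sample size needed to achieve a given margin of error, ME:
$n = \frac{0.25\, z_{\alpha/2}^2}{ME^2}$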
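The conservative sample-size formula can be sketched as follows, assuming a two-sided confidence level 1 − α and rounding up so the margin of error is not exceeded. The helper name `sample_size_proportion` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_proportion(me, alpha=0.05):
    """Hypothetical helper: conservative n using p_hat(1 - p_hat) <= 0.25."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return math.ceil(0.25 * z**2 / me**2)     # round up to a whole number of observations

print(sample_size_proportion(0.03))  # 95% confidence, ME = 0.03 -> 1068
print(sample_size_proportion(0.05))  # 95% confidence, ME = 0.05 -> 385
```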
Null hypothesis and alternative hypothesis
We start with a hypothesis about the parameter, called the null hypothesis (H0),
that holds unless there is strong evidence against it.
If we reject the null hypothesis, then the second hypothesis, called the
alternative hypothesis (H1), is accepted.
P-value
Computing the p-value is the most common way of carrying out a test of the null hypothesis.
The p-value is the probability of obtaining a value of the test statistic as extreme
as, or more extreme than, the value actually obtained, assuming the null hypothesis is true.
Equivalently, the p-value is the smallest significance level at which the null hypothesis can be rejected, given the observed sample statistic.
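A minimal sketch of computing p-values, assuming the test statistic is standard normal when H0 is true (a z test). The helper name `p_value_z` and its `alternative` argument are hypothetical; for a t or chi-square statistic the corresponding distribution would be needed instead.

```python
from statistics import NormalDist

def p_value_z(z, alternative="two-sided"):
    """Hypothetical helper: p-value for an observed z that is N(0, 1) under H0."""
    phi = NormalDist().cdf
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z)))   # both tails at least as extreme as |z|
    if alternative == "greater":
        return 1 - phi(z)              # upper tail
    if alternative == "less":
        return phi(z)                  # lower tail
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

print(f"{p_value_z(1.96):.4f}")            # two-sided, close to 0.05
print(f"{p_value_z(2.5, 'greater'):.4f}")  # upper-tail p-value
```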
Significance level
In practice, we must decide at what p-value we are going to
reject H0.
This decision is made by choosing in advance a so-called α-level, known
as the significance level of the test.
We reject H0 if the p-value is less than or equal to α.
We typically use 5% or 1% significance levels.
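The decision rule can be written in a few lines. The sketch assumes a hypothetical p-value of 0.0124 and shows that the same evidence leads to rejection at the 5% level but not at the 1% level; `reject_h0` is a hypothetical name.

```python
def reject_h0(p_value, alpha):
    """Hypothetical helper: reject H0 when the p-value is less than or equal to alpha."""
    return p_value <= alpha

p = 0.0124  # hypothetical p-value from some test
for alpha in (0.05, 0.01):
    decision = "reject H0" if reject_h0(p, alpha) else "do not reject H0"
    print(f"alpha = {alpha:.0%}: {decision}")
```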
Tests of the difference between two population proportions
We consider the situation where we have two qualitative samples and we
investigate whether a given property is present or not:
The proportion of population 1 that has the property is $P_x$, which is estimated by $\hat{p}_x$
based on a sample of size $n_x$.
The proportion of population 2 that has the property is $P_y$, which is estimated by $\hat{p}_y$
based on a sample of size $n_y$.
We are interested in the difference $P_y - P_x$, which is estimated by $d = \hat{p}_y - \hat{p}_x$.
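One common way to test H0: P_x = P_y is a large-sample z test that uses the pooled proportion p̂0 = (x_x + x_y)/(n_x + n_y) in the standard error. The sketch below follows that choice; the counts (40 of 100 and 50 of 100) and the helper name `two_proportion_z` are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z(count_x, n_x, count_y, n_y):
    """Hypothetical helper: d, z and two-sided p-value for H0: P_y - P_x = 0 (pooled SE)."""
    p_x, p_y = count_x / n_x, count_y / n_y
    d = p_y - p_x                                    # estimate of P_y - P_x
    p0 = (count_x + count_y) / (n_x + n_y)           # pooled proportion under H0
    se = math.sqrt(p0 * (1 - p0) * (1 / n_x + 1 / n_y))
    z = d / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # normal approximation
    return d, z, p_value

# Hypothetical counts: 40 of 100 in sample x and 50 of 100 in sample y have the property.
d, z, p = two_proportion_z(40, 100, 50, 100)
print(f"d = {d:.2f}, z = {z:.3f}, p-value = {p:.3f}")
```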
Regression
Regressions are typically used to test whether two or more variables are
statistically related.
In basic statistics, regression is used to explore the relationship between two variables.
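For two variables, the usual starting point is the least-squares line ŷ = b0 + b1·x, with b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. The sketch below computes it directly; the data and the helper name `least_squares` are hypothetical.

```python
import statistics

def least_squares(x, y):
    """Hypothetical helper: intercept b0 and slope b1 of the least-squares line."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:.3f} + {b1:.3f} x")
```

On Python 3.10 or later, `statistics.linear_regression(x, y)` returns the same slope and intercept.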
Cross-sectional data
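Data collected on many units (e.g., individuals, firms or countries) at a single point in time.
With cross-sectional data, we can use both numerical variables and qualitative (or
categorical) variables in regression models.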
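A categorical variable usually enters a regression as a 0/1 dummy variable. The sketch below assumes hypothetical cross-sectional data (one row per individual) and shows that, in a simple regression on a single dummy, the intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means. `least_squares` is a hypothetical helper repeated here so the block runs on its own.

```python
import statistics

def least_squares(x, y):
    """Hypothetical helper: intercept b0 and slope b1 of the least-squares line."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

# Hypothetical cross-sectional data: one row per individual, same point in time.
smoker = [0, 0, 0, 1, 1, 1]                      # categorical variable coded as a dummy
outcome = [10.0, 12.0, 11.0, 20.0, 22.0, 21.0]   # hypothetical numerical outcome

b0, b1 = least_squares(smoker, outcome)
mean_non = statistics.fmean(o for s, o in zip(smoker, outcome) if s == 0)
mean_smk = statistics.fmean(o for s, o in zip(smoker, outcome) if s == 1)
print(f"b0 = {b0:.2f} (non-smoker mean {mean_non:.2f})")
print(f"b1 = {b1:.2f} (difference in means {mean_smk - mean_non:.2f})")
```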