Week 2 | Desirable properties of point estimators, parametric and nonparametric techniques, and the assumption of normality Flashcards
Regarding the four properties that make point estimators easier to work with and that are possessed by 'good' point estimators:
What is the first property?
Beta-hat is said to be a linear estimator of beta if it is a linear function of the sample observations.
E.g. the sample mean X-bar is a linear estimator of the population mean mu.
The sample variance s^2 is a quadratic function of the sample observations X_i, so it is a nonlinear estimator of the population variance.
What is the second property that makes point estimators easier to work with and is possessed by 'good' point estimators?
Suppose the sampling distribution of beta_1-hat is centered around beta while that of beta_2-hat is not. Which one is the unbiased estimator?
Beta-hat is said to be an unbiased estimator of beta if E(beta-hat) = beta,
that is, if the expected value of beta-hat equals beta, so the sampling distribution of beta-hat is centered around beta.
When the sampling distribution of beta_1-hat is centered around beta while that of beta_2-hat is not:
beta_1-hat is an unbiased estimator,
whereas beta_2-hat is a biased estimator.
On average, beta_1-hat estimates beta more accurately than beta_2-hat.
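A minimal sketch of unbiasedness, under assumptions that are not in the notes: the population values are hypothetical, beta_1-hat is taken to be the sample mean, and beta_2-hat is taken to be the sample maximum. Enumerating every equally likely sample gives the exact expected value of each estimator.

```python
from itertools import product
from statistics import mean

# Hypothetical population, chosen only for illustration.
population = [2.0, 4.0, 6.0, 8.0]
mu = mean(population)  # population mean, the parameter being estimated

n = 2  # sample size
# Every equally likely sample of size n drawn with replacement.
samples = list(product(population, repeat=n))

# beta_1-hat: the sample mean (assumed example of an unbiased estimator).
beta1_values = [mean(s) for s in samples]
# beta_2-hat: the sample maximum (assumed example of a biased estimator).
beta2_values = [max(s) for s in samples]

# The expected value is the average over the whole sampling distribution.
expected_beta1 = mean(beta1_values)
expected_beta2 = mean(beta2_values)

print("mu            =", mu)
print("E(beta_1-hat) =", expected_beta1)
print("E(beta_2-hat) =", expected_beta2)
```

The first expected value coincides with mu, while the second sits above it, which is what "centered around beta" versus "not centered" means.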
What is the third property that makes point estimators easier to work with and is possessed by 'good' point estimators? What about the variances?
Beta-hat is an efficient estimator of beta within some well-defined class of estimators if its variance is smaller than, or at least not greater than, that of any other estimator of beta in the same class.
What does BLUE stand for, with respect to the population mean?
BLUE stands for Best Linear Unbiased Estimator. X-bar is the BLUE of the population mean mu: 'best' means X-bar has the smallest variance in the class of linear unbiased estimators of mu, hence it is an efficient estimator.
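A small sketch of why X-bar is 'best' among linear unbiased estimators, assuming independent observations with a common variance sigma2 (the value used is hypothetical). A linear estimator sum(w_i * X_i) is unbiased for mu when the weights sum to 1, and its variance is sigma2 * sum(w_i^2). The helper name `linear_estimator_variance` is made up for this illustration.

```python
def linear_estimator_variance(weights, sigma2):
    """Variance of sum(w_i * X_i) for independent X_i with variance sigma2."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1 for the estimator to be unbiased")
    return sigma2 * sum(w * w for w in weights)

sigma2 = 4.0  # hypothetical population variance
n = 4

equal_weights = [1.0 / n] * n           # this is X-bar
unequal_weights = [0.4, 0.3, 0.2, 0.1]  # another linear unbiased estimator
first_only = [1.0, 0.0, 0.0, 0.0]       # uses only the first observation

var_xbar = linear_estimator_variance(equal_weights, sigma2)
var_unequal = linear_estimator_variance(unequal_weights, sigma2)
var_first = linear_estimator_variance(first_only, sigma2)

print("Var(X-bar)           =", var_xbar)
print("Var(unequal weights) =", var_unequal)
print("Var(first obs only)  =", var_first)
```

All three estimators are linear and unbiased, but the equal-weight one (X-bar) has the smallest variance, sigma2 / n.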
What is the fourth property that makes point estimators easier to work with and is possessed by 'good' point estimators?
Beta-hat is called a consistent estimator of beta if its sampling distribution collapses into a vertical straight line at the point beta as the sample size n goes to infinity.
That is, as the sample size increases, the sampling distributions become narrower and more tightly concentrated around beta.
If beta-hat is an unbiased estimator, then consistency requires the variance of its sampling distribution to go to zero as n increases.
For example, X-bar is a consistent estimator of mu.
However, if beta-hat is a biased estimator, then consistency requires both its variance and its bias to go to zero as n increases.
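A short numerical sketch of consistency, assuming a random sample with a hypothetical population variance sigma2, so that Var(X-bar) = sigma2 / n. The biased estimator X-bar + 1/n is invented purely for illustration: its bias 1/n and its variance both shrink to zero, so it is still consistent.

```python
sigma2 = 9.0  # hypothetical population variance

def xbar_variance(n):
    """Variance of X-bar for a random sample of size n."""
    return sigma2 / n

def shifted_bias(n):
    """Bias of the hypothetical estimator X-bar + 1/n."""
    return 1.0 / n

for n in (10, 100, 1000, 10000):
    # Both columns head towards zero as n grows.
    print(n, xbar_variance(n), shifted_bias(n))
```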
What are parametric techniques concerned with?
Parametric techniques:
a) are concerned with population parameters, and
b) are based on certain assumptions about the sampled population or about the sampling distribution of some point estimator
What do parametric techniques assume? (the requirements)
i. The sample has been randomly selected
ii. The variable of interest is quantitative and continuous
iii. The variable is measured on an interval or ratio scale
What is a nonparametric test?
Procedures that are either
- not concerned with some population parameter, or
- based on relatively weaker assumptions than their parametric counterparts, and hence require less information about the sampled population
What are some ways we can check normality?
Name the graphs for the first method,
the 4 sample statistics for the second (and what conditions indicate normality or not),
the 3 methods for the last
i) Visually
Using a histogram or Q-Q plot
If the histogram is skewed: not normal
If the points on the Q-Q plot deviate systematically from the straight line: not normal
ii) Sample statistics:
1. Mean and 2. median
if mean > median: right skewed
if mean < median: left skewed
3. Skewness (SK)
SK > 0: right skewed; SK < 0: left skewed; SK = 0: symmetric
4. Kurtosis (K)
A distribution whose tails are relatively long, and which thus has more outliers, is called leptokurtic (sharp peak with long tails; lepto means thin, fine)
A distribution whose tails are relatively short, and which thus has fewer outliers, is called platykurtic (platus means broad, flat)
K = 3: mesokurtic, as for the normal distribution
K > 3: leptokurtic
K < 3: platykurtic
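The sample statistics above can be sketched as follows. This assumes the simple moment-based definitions (SK = m3 / m2^1.5 and K = m4 / m2^2, with m_k the k-th central sample moment); statistical software may apply small-sample corrections, so its printed values can differ slightly. The data are hypothetical.

```python
from statistics import mean, median

def central_moment(data, k):
    """k-th central sample moment (divisor n)."""
    m = mean(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    """Moment-based skewness SK; 0 for a symmetric sample."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Moment-based kurtosis K; about 3 for normal data."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

# Hypothetical sample with one large value in the right tail.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]

print("mean     =", mean(data))
print("median   =", median(data))
print("skewness =", skewness(data))
print("kurtosis =", kurtosis(data))
```

For this sample the mean exceeds the median, SK is positive (right skewed) and K exceeds 3 (leptokurtic), so all four statistics point away from normality.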
iii) Formal hypothesis tests (testing for normality)
Shapiro-Wilk test
H_0: the data come from a normally distributed population
H_A: the data come from a non-normally distributed population
What are the 2 shortcomings of the Shapiro-Wilk test?
i) At small sample sizes (n < 20), when the normality assumption can be crucial, it has little power to reject H_0 even if the population is indeed not normally distributed (risk of a Type II error)
ii) At large sample sizes (n > 100), when violation of normality is far less critical in practice, it becomes too sensitive to the slightest signs of non-normality in the sample and often rejects H_0 even when the population is, for practical purposes, normal
What does it mean if the SK value in the R printout is positive?
The sample distribution of diff is skewed to the right
If skew.2SE > 1 (in absolute value), the distribution of diff is unlikely to be normal
K-hat - 3 = kurtosis reported in the printout (excess kurtosis)
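A rough sketch of how a skew.2SE-style ratio can be computed, assuming it is the skewness divided by twice its standard error, with the commonly used formula SE = sqrt(6n(n-1) / ((n-2)(n+1)(n+3))). The exact skewness definition behind the R printout may differ, so treat the numbers as illustrative; the data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean

def skewness(data):
    """Moment-based sample skewness (divisor n)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def skewness_se(n):
    """Assumed standard error of the skewness for a sample of size n."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def skew_2se(data):
    """Skewness divided by twice its standard error."""
    return skewness(data) / (2 * skewness_se(len(data)))

# Hypothetical sample with one large value in the right tail.
diff = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]

ratio = skew_2se(diff)
print("skewness =", skewness(diff))
print("skew.2SE =", ratio)
print("unlikely normal" if abs(ratio) > 1 else "no strong evidence of skewness")
```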
For quantitative data, the two most useful and popular measures of central location are the mean and the median. What advantages does each of these measures have?
Mean advantages:
- it is a comprehensive measure because it is computed from all available data points, whereas the median is based on at most 2 data points
- the mean is used far more extensively in inferential statistics than the median
Median advantages:
- the median depends only on the middle values, so it is robust to outliers, whereas the mean is unduly influenced by outliers
- the median exists even if the measurement scale is ordinal, but the mean does not
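A quick illustration of the robustness point, using hypothetical values: adding a single extreme observation moves the mean a long way but barely changes the median.

```python
from statistics import mean, median

# Hypothetical data, chosen only for illustration.
values = [30, 32, 35, 38, 40]
with_outlier = values + [400]  # one extreme observation added

print("mean   without / with outlier:", mean(values), mean(with_outlier))
print("median without / with outlier:", median(values), median(with_outlier))
```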
When should one use a nonparametric test?
When the mean does not exist, or is not an ideal measure of central location for the population because of outliers
When the t-test becomes inappropriate because the normality assumption is violated
What two alternative nonparametric tests are there?
Name their requirements and hypotheses
(One sample) Sign test for the median (eta)
i. The data is a random sample of independent observations
ii. The variable of interest is qualitative or quantitative
iii. The measurement scale is at least ordinal
It does not assume anything about the distribution of the sampled population
Hypotheses:
H_0: eta = eta_0 vs H_A: eta < eta_0, eta > eta_0, or eta not equal to eta_0
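A minimal sketch of an exact sign test, where eta denotes the population median and eta0 its hypothesized value. It assumes that observations equal to eta0 are dropped and that, under H_0, the number of positive signs follows a Binomial(m, 0.5) distribution. The function name, its `alternative` argument, and the data are hypothetical.

```python
from math import comb

def sign_test(data, eta0, alternative="two-sided"):
    """Exact sign test for H_0: median = eta0.

    Returns (number of plus signs, number of non-zero deviations, p-value).
    """
    # Observations equal to eta0 carry no sign and are dropped.
    plus = sum(1 for x in data if x > eta0)
    minus = sum(1 for x in data if x < eta0)
    m = plus + minus

    def upper_tail(k):
        """P(S >= k) for S ~ Binomial(m, 0.5)."""
        return sum(comb(m, i) for i in range(k, m + 1)) / 2 ** m

    def lower_tail(k):
        """P(S <= k) for S ~ Binomial(m, 0.5)."""
        return sum(comb(m, i) for i in range(0, k + 1)) / 2 ** m

    if alternative == "greater":   # H_A: median > eta0
        p = upper_tail(plus)
    elif alternative == "less":    # H_A: median < eta0
        p = lower_tail(plus)
    else:                          # H_A: median != eta0
        p = min(1.0, 2 * min(upper_tail(plus), lower_tail(plus)))
    return plus, m, p

# Hypothetical sample; test H_0: eta = 10 against H_A: eta > 10.
data = [12, 15, 9, 14, 11, 13, 16, 10, 17, 18]
plus, m, p_value = sign_test(data, 10, alternative="greater")
print("plus signs:", plus, "of", m, "p-value:", p_value)
```

Only the signs of the deviations enter the calculation, which is why the test works on ordinal data.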
(One sample) Wilcoxon signed ranks test for the median (eta)
Also known as the Wilcoxon signed rank sum test. The sign test is based entirely on the signs of the deviations from eta_0.
The Wilcoxon signed ranks test is a more sensitive and powerful alternative because it also takes the magnitudes of the deviations into consideration
i. Data is a random sample of independent observations
ii. The variable of interest is quantitative and continuous
iii. The measurement scale is interval or ratio
iv. The distribution of the sampled population is symmetric (so the mean and the median coincide: mu = eta)
Hypotheses:
H_0: eta = 10 vs H_A: eta > 10
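A sketch of the signed rank calculation for a small sample, testing a hypothesized median of 10 against the 'greater than' alternative. It assumes zero deviations are dropped, tied absolute deviations receive average ranks, and the p-value is obtained by enumerating all sign patterns, which is only practical for small samples. The function names and data are hypothetical.

```python
from itertools import product

def signed_rank_statistic(data, eta0):
    """Return (T_plus, ranks): sum of ranks of positive deviations, all ranks."""
    devs = [x - eta0 for x in data if x != eta0]  # drop zero deviations
    order = sorted(range(len(devs)), key=lambda i: abs(devs[i]))
    ranks = [0.0] * len(devs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute deviations starting at position i.
        j = i
        while j + 1 < len(order) and abs(devs[order[j + 1]]) == abs(devs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    t_plus = sum(r for d, r in zip(devs, ranks) if d > 0)
    return t_plus, ranks

def exact_upper_p(t_plus, ranks):
    """P(T+ >= t_plus) under H_0: each rank is positive with probability 1/2."""
    count = 0
    total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        total += 1
        if sum(r for s, r in zip(signs, ranks) if s) >= t_plus:
            count += 1
    return count / total

# Hypothetical sample; H_0: eta = 10 vs H_A: eta > 10.
data = [12, 9, 14, 16, 7, 15, 18, 19]
t_plus, ranks = signed_rank_statistic(data, 10)
p_value = exact_upper_p(t_plus, ranks)
print("T+ =", t_plus, "p-value =", p_value)
```

Unlike the sign test, large deviations get large ranks here, so their magnitudes influence the statistic.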
Under what conditions would you reject the null hypothesis for the sign test and the Wilcoxon signed rank sum test?
Sign test:
p-value < alpha: reject the null hypothesis
Wilcoxon signed rank sum test:
p-value < alpha: reject the null hypothesis