V-Sampling/Estimation-4 Flashcards

1
Q

definition:

simple random sampling

A

Every member of the population has an equal probability of being selected into the sample.
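
As a minimal sketch of the definition above, Python's standard `random.sample` draws without replacement and gives every member the same chance of selection. The population of 100 integer IDs and the helper name `simple_random_sample` are illustrative assumptions, not part of the original card.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; each member is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(population, n)

population = list(range(1, 101))  # hypothetical population of 100 IDs
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```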

2
Q

definition:

stratified random sampling

A

The population is divided into several strata (subgroups).

A random sample is then drawn from each stratum in proportion to that stratum's share of the population.
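
The proportional allocation described above can be sketched as follows. The three strata, their sizes, and the helper name `stratified_sample` are hypothetical; the sketch also assumes the proportional counts round cleanly, which real data may not.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """strata maps a stratum name to its list of members.

    Each stratum contributes a share of total_n equal to its share of the
    population. Rounding can make the grand total differ from total_n.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    picked = {}
    for name, members in strata.items():
        k = round(total_n * len(members) / population_size)
        picked[name] = rng.sample(members, k)  # simple random sample within the stratum
    return picked

strata = {  # hypothetical population of 100 items in three strata
    "large_cap": list(range(0, 50)),
    "mid_cap": list(range(50, 80)),
    "small_cap": list(range(80, 100)),
}
picked = stratified_sample(strata, total_n=10, seed=1)
print({name: len(members) for name, members in picked.items()})
```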

3
Q

definition:

sampling error

A

the difference between a sample statistic and the corresponding population parameter

For example:

sampling error of the mean = sample mean - population mean

4
Q

definition:

Sampling Distribution of a Statistic

A

The probability distribution of a statistic computed over all possible samples of the same size drawn at random from the population.
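
For a tiny population the sampling distribution can be enumerated exactly. The sketch below assumes a made-up population of four values and samples of size 2 drawn without replacement; it lists every possible sample mean, its probability, and its sampling error (sample mean minus population mean).

```python
from collections import Counter
from itertools import combinations

population = [2, 4, 6, 8]  # hypothetical population
sample_size = 2
population_mean = sum(population) / len(population)

# Mean of every possible sample of the given size.
sample_means = [sum(s) / len(s) for s in combinations(population, sample_size)]

# Sampling distribution of the mean: each distinct value with its probability.
counts = Counter(sample_means)
sampling_distribution = {
    value: count / len(sample_means) for value, count in sorted(counts.items())
}

# Sampling error of the mean for each possible sample.
sampling_errors = [m - population_mean for m in sample_means]

print(sampling_distribution)
print(sampling_errors)
```

In this example the sampling errors cancel out across all possible samples, so the average of the sample means equals the population mean.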

5
Q

two different forms of data

A

Time-series data

Cross-sectional data

6
Q

Central Limit Theorem

Essence

2 conditions

2 conclusions

A

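Essence: it describes which distribution the means of all possible samples follow.

Conditions: (1) the sample size is large (n >= 30), and (2) the population has mean μ and a finite variance σ².

Conclusions: (1) the sampling distribution of the sample mean is approximately normal;

(2) that normal distribution has a mean equal to the population mean μ and a variance equal to σ²/n.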
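
A small simulation can illustrate both conclusions. The sketch assumes a skewed exponential population with mean 1 and variance 1 (an arbitrary choice for illustration) and draws 5,000 samples of size n = 30; the mean of the sample means should land near 1 and their variance near 1/30.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30                  # observations per sample
num_samples = 5000      # number of repeated samples

# Exponential population with rate 1: mean 1, variance 1, clearly non-normal.
sample_means = []
for _ in range(num_samples):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = sum(sample_means) / num_samples
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / num_samples

print(f"mean of sample means:     {mean_of_means:.4f} (population mean = 1)")
print(f"variance of sample means: {var_of_means:.4f} (sigma^2 / n = {1 / n:.4f})")
```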

7
Q

Standard Error

Definition and essence

Formula

A

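Standard error: the standard deviation of the sampling distribution of the sample mean. In essence, it measures how far the mean of a sample of n observations tends to deviate from the population mean; the larger n is, the smaller the deviation.

known population variance:

σx̄ = σ/√n

unknown population variance:

sx̄ = s/√n

(i.e. the sample standard deviation s is used as an estimate of the population standard deviation σ)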
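
Both formulas can be sketched in one helper. The function name `standard_error` and the nine-observation sample are assumptions for illustration; `statistics.stdev` computes the sample standard deviation s with an n - 1 denominator.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample, population_sd=None):
    """Standard error of the sample mean.

    Uses sigma / sqrt(n) when the population standard deviation is supplied,
    otherwise s / sqrt(n) with s estimated from the sample.
    """
    n = len(sample)
    sd = population_sd if population_sd is not None else stdev(sample)
    return sd / sqrt(n)

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical observations, n = 9

print(standard_error(sample, population_sd=3.0))  # known sigma: 3 / sqrt(9)
print(standard_error(sample))                     # unknown sigma: s / sqrt(9)
```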

8
Q

meaning of "estimator"

3 properties of a good estimator

A
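  • estimator: the estimation formula, i.e. a rule for computing an estimate from a sample. Because its value changes from sample to sample, it has a distribution.
  • Unbiasedness: the expected value of the estimator equals the true value of the population parameter, e.g. E(x̄) = μ
  • Efficiency: among unbiased estimators, the one with the smallest variance
  • Consistency: as the sample size increases, accuracy improves and the standard error shrinks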
9
Q

2 approaches for

estimation

A
  • Point Estimate: a single value estimates population parameter
  • Confidence Interval Estimate: a range of values that is expected to contain the population parameter with a stated probability
10
Q

Confidence Interval Estimation:

Formula

A

confidence interval: x̄ ± k·(σ/√n), where k is the reliability factor

[Point Estimate ± k·Standard Error]

(Derived from the confidence intervals of the normal distribution and the central limit theorem.)
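
The formula can be applied directly when σ is known. In this sketch the sample mean of 5%, σ of 20%, and n = 100 are made-up inputs, and the helper name `z_confidence_interval` is an assumption; the reliability factor k comes from the standard normal distribution via `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Point estimate +/- reliability factor * standard error (sigma known)."""
    alpha = 1 - confidence
    k = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided reliability factor
    margin = k * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=0.05, sigma=0.20, n=100)
print(f"95% confidence interval: [{low:.4f}, {high:.4f}]")
```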

11
Q

width of confidence interval

and

how to decrease width

A

width = 2 × reliability factor × standard error

The width decreases when:

the sample variance decreases,

the number of observations increases

12
Q

definition:

Level of significance

Degree of Confidence

α

A
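  • Level of significance: 1 minus the degree of confidence, e.g. 1 - 95% = 5%
  • Degree of confidence: e.g. in "the probability of falling within μ ± 1.96 standard deviations is 95%", it is the 95%
  • α: the symbol for the level of significance, here 5%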
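
The link between the degree of confidence, α, and the 1.96 in the example can be checked numerically. This is a sketch for the two-sided case under the standard normal distribution; the helper name `reliability_factor` is an assumption.

```python
from statistics import NormalDist

def reliability_factor(confidence):
    """Two-sided z value that leaves alpha / 2 in each tail."""
    alpha = 1 - confidence  # level of significance
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    z = reliability_factor(confidence)
    print(f"confidence = {confidence:.0%}, alpha = {alpha:.0%}, z = {z:.3f}")
```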
13
Q

Student t-distribution

shape and 5 properties

A
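  1. skewness = 0 (symmetric)
  2. lower peak and fatter tails than the normal distribution (fat tails mean positive excess kurtosis, i.e. leptokurtic)
  3. Parameter: degrees of freedom, df = n - 1
  4. As the degrees of freedom increase, the shape tightens, the peak rises, and the distribution approaches the normal distribution
  5. At the same level of confidence, a confidence interval based on the t-distribution is wider than one based on the z-distribution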
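
Properties 4 and 5 can be seen by comparing critical values. The Python standard library has no t-distribution, so this sketch hardcodes a few two-sided 95% values as they appear in a standard t-table and compares them with the z value from `statistics.NormalDist`.

```python
from statistics import NormalDist

# Two-sided 95% critical values copied from a standard t-table (df -> t).
T_CRITICAL_95 = {5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}

z_critical_95 = NormalDist().inv_cdf(0.975)  # about 1.960

for df, t_value in T_CRITICAL_95.items():
    ratio = t_value / z_critical_95
    print(f"df = {df:>3}: t = {t_value:.3f}, interval is {ratio:.2f}x as wide as with z")
```

With the same standard error, the interval width is proportional to the critical value, so the ratio shrinks toward 1 as the degrees of freedom grow.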
14
Q
  1. constructing confidence intervals under the z-distribution, the t-distribution, and the z-alternative
  2. reliability factor?
A

1. z-alternative: large sample, unknown population variance

the interval uses z together with the sample standard deviation s: x̄ ± z·(s/√n)

2. reliability factor: i.e. the z value

15
Q

In confidence interval estimation,

when should the z-distribution be used and when the t-distribution?

3 points

A
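  • Use z when the population variance is known; use t when it is unknown
  • With a large sample, z can be used even when the variance is unknown (t has already converged toward z)
  • Small sample from a non-normal population: no reliable interval can be constructed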
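
The three rules above can be written as a small decision helper. The function name and the use of n >= 30 as the cut-off for a "large" sample are assumptions for illustration (the cut-off matches the one on the central limit theorem card).

```python
def choose_reliability_factor(normal_population, variance_known, n, large_n=30):
    """Return "z", "t", "t (z acceptable)", or None when no interval is available."""
    large_sample = n >= large_n
    if not large_sample and not normal_population:
        return None  # small sample from a non-normal population
    if variance_known:
        return "z"
    if large_sample:
        return "t (z acceptable)"  # t has nearly converged to z
    return "t"  # small sample, normal population, unknown variance

print(choose_reliability_factor(normal_population=True, variance_known=True, n=15))
print(choose_reliability_factor(normal_population=True, variance_known=False, n=15))
print(choose_reliability_factor(normal_population=False, variance_known=False, n=200))
print(choose_reliability_factor(normal_population=False, variance_known=False, n=15))
```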
16
Q

What is the difference between

looking up the z-table and the t-table?

A

The t-table requires the degrees of freedom; the z-table does not.

17
Q

factors that affect the width of a confidence interval

A
  1. reliability factor: choice of z or t distribution; choice of degree of confidence
  2. standard error: sample size
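
Both factors can be varied in a short sketch. It assumes a z-based interval with width 2·k·s/√n; the standard deviation of 0.20 and the sample sizes are made-up inputs, and `interval_width` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(sd, n, confidence=0.95):
    """Width of a two-sided z-based interval: 2 * k * sd / sqrt(n)."""
    k = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # reliability factor
    return 2 * k * sd / sqrt(n)

base = interval_width(sd=0.20, n=100)
more_data = interval_width(sd=0.20, n=400)                          # 4x the observations
more_confidence = interval_width(sd=0.20, n=100, confidence=0.99)   # higher confidence

print(f"n = 100, 95%: {base:.4f}")
print(f"n = 400, 95%: {more_data:.4f}")
print(f"n = 100, 99%: {more_confidence:.4f}")
```

Quadrupling the sample size halves the width, while raising the degree of confidence widens the interval.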
18
Q

what may be downsides of

bigger sample size?

A
  1. the extra observations may come from another population
  2. cost considerations
19
Q

5 types of bias

A
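  • Data-mining bias: treating a chance pattern as a real one, e.g. when the same database is used both to find the pattern and to test the hypothesis. The result has statistical significance only.
  • Sample selection bias: some data are unavailable and therefore excluded
  • Survivorship bias
  • Look-ahead bias: the analysis relies on data that were not yet available at the point in time being analyzed
  • Time-period bias: the sample period is too long or too short (related to data-mining bias)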
20
Q

A statistically significant result might not be

economically meaningful if you account for

A

the risk, transaction costs, and applicable taxes