Week 10: Survey Analysis Flashcards

Question 1

Q

What is the purpose of power calculations in research design?

Answer

A

Power calculations determine the minimum sample size needed to detect a program’s impact with adequate precision, and they inform data collection planning, budget, and timeline. They can also be used to compute power or the minimum detectable effect size for a given sample size.

Question 2

Q

How do you calculate power for comparing two means?

Answer

A

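<power twomeans mean1 mean2, sd1(x) sd2(y) n1(x) n2(y) alpha(x)>
Given the group sizes, Stata reports the power. To obtain the required sample size instead, leave out <n1()> and <n2()> and specify <power()>, which defaults to 0.8 (80%). <alpha()> only needs to be specified when using a significance level other than the default of 0.05.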
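
As a rough sketch of both uses, the commands below first compute power for given group sizes and then the required sample size for a target power. All means, standard deviations, and group sizes are made-up illustrative values, not taken from any study.

```stata
* Power, given 60 observations per group (illustrative numbers)
power twomeans 10 12, sd1(4) sd2(5) n1(60) n2(60)

* Required sample size for the default 80% power at the default alpha of 0.05
power twomeans 10 12, sd1(4) sd2(5)

* Same sample-size calculation at a stricter significance level
power twomeans 10 12, sd1(4) sd2(5) alpha(0.01)
```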

Question 3

Q

If alpha is reduced from 0.05 to 0.01, how does it affect power?

Answer

A

Lowering alpha decreases power because the probability of a Type II error increases (assuming all else is equal), reducing the likelihood of detecting a true effect
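
One way to see this in Stata is to give <alpha()> a list of values, so that the reported power can be compared row by row. The means, SD, and sample size here are illustrative assumptions only.

```stata
* Same design evaluated at two significance levels;
* Stata prints one table row per alpha value
power twomeans 0 0.5, sd(1) n(100) alpha(0.05 0.01)
```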

Question 4

Q

How does increasing sample size affect power?

Answer

A

Increasing sample size generally increases power, reducing the chance of a Type II error
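
A minimal sketch of this relationship, assuming an illustrative difference of 0.5 and a common SD of 1: passing several total sample sizes to <n()> produces a table of the corresponding power values.

```stata
* Power at total sample sizes of 40, 80, 160 and 320 (split equally by default)
power twomeans 0 0.5, sd(1) n(40 80 160 320)
```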

Question 5

Q

What is the command to perform a power test for two proportions?

Answer

A

<power twoproportions p1 p2, n1(x) n2(y)>

Question 6

Q

What is the impact of unequal group sizes on power?

Answer

A

Power decreases when sample sizes are unequal, even if the total sample size remains constant
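
This can be checked by running the same calculation with a balanced and an unbalanced split of the same total. The numbers below are illustrative assumptions (equal SDs, total of 200 observations).

```stata
* Balanced allocation: 100 + 100
power twomeans 0 0.5, sd(1) n1(100) n2(100)

* Unbalanced allocation with the same total: 150 + 50
power twomeans 0 0.5, sd(1) n1(150) n2(50)
```

Comparing the reported power from the two runs shows the cost of the unbalanced split.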

Question 7

Q

How do you set survey design in Stata?

Answer

A

<svyset psuid [pweight=finalwgt], strata(stratid) singleunit(centered)>
The option <singleunit> allows for different ways of handling a single PSU in a stratum. There are usually at least two PSUs in each stratum. If there is only one PSU in a stratum (due to missing data or a subpopulation specification, for example), the variance cannot be estimated in that stratum. If the default option <missing> is used, Stata reports no SEs when it encounters a single PSU in a stratum. The <centered> option centres strata with a single sampling unit at the grand mean instead of the stratum mean
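
A sketch of the full sequence on one of Stata's example datasets. It assumes that <webuse nhanes2> is available and that its design variables are named psu, strata, and finalwgt (different from the psuid/stratid names used above); substitute the names in your own data.

```stata
* Load example survey data (dataset and variable names are assumptions)
webuse nhanes2, clear

* Declare PSU, sampling weight and strata; centre single-PSU strata at the grand mean
svyset psu [pweight=finalwgt], strata(strata) singleunit(centered)

* With no arguments, svyset displays the current survey settings
svyset
```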

Question 8

Q

How do you obtain the mean and standard deviation of a variable considering survey design?

Answer

A

<svy: mean var_name>

<estat sd>
Once the survey design has been set, the <svy: > prefix can be used to calculate descriptive statistics, and <estat sd> run straight after <svy: mean> reports the standard deviation (the prefix also works with <svy: tab cat_var1 cat_var2>, for which Stata also gives a p value from a design-based chi-square test)
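
A short sketch of these commands in sequence, assuming Stata's nhanes2 example data and its variable names (psu, strata, finalwgt, age, sex, race), none of which come from the flashcards themselves:

```stata
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Design-based mean with its standard error
svy: mean age

* Standard deviation of the variable, run directly after svy: mean
estat sd

* Two-way table with a design-based test of independence
svy: tabulate sex race
```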

Question 9

Q

How does ignoring survey design affect statistical estimates?

Answer

A

Ignoring the survey design can bias point estimates and typically understates their standard errors, increasing the risk of Type I errors.
The sampling weights affect the calculation of the point estimates, and the stratification/clustering affects the calculation of the standard errors.

Question 10

Q

How do you calculate the mean of a variable considering a categorical variable?

Answer

A

<svy, over(cat_var): mean var_name>
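
As an illustration on Stata's nhanes2 example data (dataset and variable names are assumptions), the sketch below writes <over()> as an option of <mean>, after the comma, which is the form given in Stata's documentation for <mean>:

```stata
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Mean age estimated separately for each category of sex
svy: mean age, over(sex)
```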

Question 11

Q

What happens to sample size requirements as the desired detectable difference decreases?

Answer

A

Smaller differences require larger sample sizes to achieve the same power and significance level
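
A quick way to see this is to ask for the required sample size at several candidate differences in one call. The control mean of 0, the SD of 1, and the three alternative means are illustrative assumptions.

```stata
* Required sample size for 80% power as the difference shrinks from 0.8 to 0.2
power twomeans 0 (0.8 0.5 0.2), sd(1) power(0.8)
```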

Question 12

Q

How does the SE change when using survey design in Stata?

Answer

A

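SEs usually increase once survey design features (e.g., weights, clustering) are taken into account, although stratification on its own tends to reduce them. Either way, the design-based SEs provide more accurate confidence intervals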

Question 13

Q

What is the built-in programme for calculating power?

Answer

A

<power>
The command is best used for simple randomisations with no clustering. It performs power and sample-size analysis for studies that use hypothesis testing to make inferences about population parameters. You can compute sample size given power and effect size, or the minimum detectable effect size and the corresponding target parameter given power and sample size.

Question 14

Q

If the power of the analysis is 0.8611 at the 0.05 level, what does that mean for the difference between means?

Answer

A

The power is 86.1%, meaning there is roughly an 86% probability of finding a statistically significant difference between the two means at the 0.05 level if a true difference of the assumed size exists

Question 15

Q

What happens to sample size requirements if we increase power?

Answer

A

Sample size increases

Question 16

Q

What happens if we do not have prior data/mean values that we can use to estimate sample size?

Answer

A

If we know the difference we want to detect, we can use this to obtain required sample size
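
A hedged sketch of this approach: <power twomeans> accepts a <diff()> option in place of the second mean. The reference mean of 0, the difference of 0.5, and the SD of 1 are illustrative assumptions; an assumed SD is still needed even without prior means.

```stata
* Sample size needed to detect a difference of 0.5 with 80% power,
* specifying the difference directly instead of the second group's mean
power twomeans 0, diff(0.5) sd(1) power(0.8)
```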

Question 17

Q

What is the use of <nratio> in <power> calculations?

Answer

A

To specify the ratio of the group sizes, N2/N1. For example, <nratio(0.5)> means group 2 is half the size of group 1
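
Two ways this option might be used, with illustrative means and SD: once to solve for the sample sizes under a fixed allocation ratio, and once to compute power when only the first group's size is given.

```stata
* Required sample sizes when group 2 is half the size of group 1
power twomeans 0 0.5, sd(1) nratio(0.5)

* Power when group 1 has 100 observations and group 2 has half as many
power twomeans 0 0.5, sd(1) n1(100) nratio(0.5)
```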

Question 18

Q

Why is it important not to rely on regular procedures in statistical software?

Answer

A

Regular procedures (not designed for survey data) analyse data as if the data were collected using SRS. Very few surveys use a simple random sample to collect data. We need to tell the software to consider the differences between the design used and SRS.

Question 19

Q

What are common features of many sampling designs?

Answer

A

Sampling weights
Strata
PSU

Question 20

Q

What are sampling weights?

Answer

A

Generally used to weight the sample back to the population from which the sample was drawn. Often, the weights incorporate corrections for unit non-response or errors in the sampling frame (sometimes called non-coverage). Because these corrections are built into the probability weight supplied with the dataset, it is often inadvisable to modify the sampling weights (such as trying to standardise them for a particular variable, e.g., age)

Question 21

Q

What are strata?

Answer

A

Stratification breaks down the population into different groups, often by demographic variables. Each element in the population must belong to one, and only one, stratum. Once the strata have been defined, samples are taken from each stratum as if it were independent of all the other strata. The purpose of stratification is to reduce the standard error of the estimates, and stratification works most effectively when the variance of the dependent variable is smaller within the strata than in the sample as a whole

Question 22

Q

What is a primary sampling unit (PSU)?

Answer

A

The first unit that is sampled in the design. For example, school districts from California may be sampled and then schools within districts may be sampled. The school district is the PSU. Accounting for clustering in the data (i.e., using PSUs) will increase SEs of the point estimates. Conversely, ignoring PSUs will tend to yield SEs that are too small leading to false positives in significance tests

Question 23

Q

How can we get more information on strata and PSUs after setting survey design?

Answer

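A

<svydescribe>
This lists each stratum with its number of PSUs and the number of observations per PSU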

Question 24

Q

How can you use <over> with the <svy: > prefix?

Answer

A

<svy, over(cat_var): mean varname>
<svy: tab cat_var1 cat_var2, col>

Question 25

Q

Can we use <svy: > with t-tests?

Answer

A

No. However, <svy: mean> is an estimation command and allows for the use of <test> and <lincom> post-estimation commands

Question 26

Q

How can we use the <test> command to test if two means are different?

Answer

A

<svy, over(cat_var): mean var_name, coeflegend>

<coeflegend> is used to see the labels Stata has assigned to the values in the output
Then, we can use the <test> command with those labels:
<test _b[label1] = _b[label2]>
Here label1 and label2 stand for the two labels shown in the legend
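
A sketch of the whole workflow on Stata's nhanes2 example data (dataset and variable names are assumptions). The exact text of the coefficient labels depends on the Stata version, so the <test> and <lincom> lines are left as commented templates to be completed with the labels printed in the legend.

```stata
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Show the _b[...] label Stata assigns to each group mean
svy: mean age, over(sex) coeflegend

* Copy the two labels from the legend into test, for example:
* test _b[<label for group 1>] = _b[<label for group 2>]

* lincom reports the estimated difference with its SE and confidence interval:
* lincom _b[<label for group 1>] - _b[<label for group 2>]
```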

Question 27

Q

Can you run linear regression models once Stata knows about the survey design?

Answer

A

Yes:
<svy: regress y x1 x2 x3>
The output of this regression will include <pweight>, <strata>, and <psu> variables
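
For instance, on Stata's nhanes2 example data (the dataset, the outcome bpsystol, and the predictors are assumptions chosen only for illustration):

```stata
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Design-based linear regression of systolic blood pressure on age, weight and height
svy: regress bpsystol age weight height
```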

Question 28

Q

What else can the <svy: > prefix be used for?

Answer

A

Making graphs - but there may be important differences between the “survey-weighted” and the “regular” commands