Data Analysis in Real Life Flashcards by Mark Analyst

What are some methods for giving reports a clearer narrative?

eliminating jargon and focusing on interpretability
critiquing significant effects by coming up with potential alternate explanations
focusing on simpler models and parsimony

What is version control software used for?

Keeping track of checked-in versions of code, data, and reports

What are some easy things to double-check in reports?

verifying that the signs of effects are in the expected direction
checking magnitude of effects by comparison with other known effects
putting units on graphs and coefficients and generally keeping track of units

How do reproducible report-writing tools like knitr and IPython help?

by automating the report-writing process
by organizing one's thinking by blending the code and the narrative into a single document
by documenting the analysis code with the project narrative
by advancing the goal of reproducibility

What two components of good final data products are ubiquitous across all settings?

making the report reproducible and

making the report and code version controlled.

A null result may be due to

low power

the null hypothesis actually being correct

A study with a very low sample size will likely have

low power

Calculating power after the study has been done and analyzed is

problematic and should only be done by people well versed in the issues

Ideally, a surrogate variable for a variable of interest will

have a known or estimable variance around the desired measurement variable
be unbiased

What is the idea of power?

Power is the probability of rejecting the null hypothesis when it’s false.
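
As a rough illustration of this definition, the sketch below computes the power of a one-sided z-test for a mean shift with known standard deviation. The function name z_test_power and the example numbers (effect 0.5, sigma 1) are assumptions made for this example, not values from the cards.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = 0 against H1: mu = effect > 0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)     # rejection threshold under H0
    shift = effect * n ** 0.5 / sigma          # mean of the test statistic under H1
    return 1 - std_normal.cdf(z_crit - shift)  # P(reject H0 | H1 is true)

# Hypothetical effect size and sample sizes, chosen only for illustration.
for n in (5, 20, 80):
    print(n, round(z_test_power(effect=0.5, sigma=1.0, n=n), 3))
```

Under these assumptions power rises with the sample size, which is why a study with a very small sample is likely to be underpowered.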

What can you do in the absence of any calibration data to evaluate your surrogate?

Either model via assumptions or run sensitivity analyses.

What should you do if your surrogate variable is too unreliable an estimate of your actual outcome?

You may have to conclude that it is better not to conduct the study at all.

Potential problems with testing lots of hypotheses until a significant one is found include

Declaring effects significant when they arose only by chance

Misrepresenting the strength of the findings
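
A small simulation can make the first problem concrete. The sketch below assumes that every null hypothesis is true (so each p-value is uniform on [0, 1]) and that the tests are independent. The helper names search_until_significant and chance_of_spurious_finding are hypothetical, written only for this illustration.

```python
import random

def search_until_significant(n_tests, alpha, rng):
    """Return True if any of n_tests null p-values falls below alpha."""
    # Under a true null hypothesis, a p-value is uniformly distributed on [0, 1].
    return any(rng.random() < alpha for _ in range(n_tests))

def chance_of_spurious_finding(n_tests, alpha=0.05, n_sims=20_000, seed=0):
    """Estimate how often the search 'finds' an effect when none exists."""
    rng = random.Random(seed)
    hits = sum(search_until_significant(n_tests, alpha, rng) for _ in range(n_sims))
    return hits / n_sims

for n_tests in (1, 5, 20):
    print(n_tests, chance_of_spurious_finding(n_tests))
```

The more hypotheses the search is allowed to try, the more often it ends with a "significant" result that reflects nothing but chance.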

What is comparing your effects to familiar ones useful for?

Mentally calibrating the size of an effect or its significance when a variable under study is not well understood

Negative control analyses are useful for

evaluating processes to see if spurious effects are obtained

as a validity check of an effect of interest by looking to see if similar effects occur with the same analysis on variables where an effect is known not to be present

A good negative control analysis will

have a negative control that is known not to have an effect but is otherwise similar to the variable under study

What is hypothesis testing?

In hypothesis testing, we use a statistic to decide between two hypotheses. We set one as the default hypothesis (null hypothesis) and the other as the alternative.

The result of a hypothesis test is summarized with a p-value. What is a p-value?

A small p-value (close to 0) is evidence for the alternative, while a large one (close to 1) is consistent with the null. To control the probability of incorrectly rejecting the null at 5%, we reject the null when the p-value is less than 0.05.
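
As a minimal sketch of this decision rule, the code below computes a two-sided p-value for a z statistic, assuming the statistic is standard normal under the null. The function names and the example z values are illustrative assumptions, not part of the cards.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def reject_null(z, alpha=0.05):
    """Reject the null when the p-value falls below the chosen level."""
    return two_sided_p_value(z) < alpha

# Hypothetical test statistics, chosen only for illustration.
for z in (0.5, 1.96, 3.0):
    print(z, round(two_sided_p_value(z), 4), reject_null(z))
```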

What are potential problems with multiple comparisons?

The probability of seeing apparently significant findings simply by chance, even though no real effect is present, increases with the number of comparisons.
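
For independent tests of true null hypotheses, this probability has a simple closed form, 1 - (1 - alpha)^m for m tests. The sketch below evaluates it and also shows the Bonferroni correction, which the cards do not mention and which is included here only as one common remedy. Both function names are assumptions made for the example.

```python
def family_wise_error_rate(n_tests, alpha=0.05):
    """P(at least one false positive) across independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

for m in (1, 5, 20):
    corrected = family_wise_error_rate(m, bonferroni_threshold(m))
    print(m, round(family_wise_error_rate(m), 3), round(corrected, 3))
```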

In A/B testing, what is randomization of a treatment used for?

To make groups as comparable as possible

To attempt to balance potential unobserved confounding variables
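
One simple way to randomize is to shuffle the units and split the shuffled list in half. The sketch below assumes an equal two-arm split; the function name randomize and the unit IDs are hypothetical.

```python
import random

def randomize(units, seed=None):
    """Randomly split units into treatment (A) and control (B) groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # the order no longer depends on any unit attribute
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Ten hypothetical unit IDs; the seed only makes the example repeatable.
group_a, group_b = randomize(range(10), seed=42)
print(group_a, group_b)
```

Because assignment depends only on the shuffle, any unobserved characteristic is equally likely to land in either group, which is what balances confounders on average.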

Three strategies to combat sampling bias are

Random sampling

Modeling

Weighting
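
The weighting strategy can be sketched with a toy example. It assumes the population share of each group is known from an outside source; the groups, shares, and values below are made up for illustration, and weighted_mean is a hypothetical helper.

```python
def weighted_mean(sample, population_share):
    """Post-stratified mean: average the group means using population shares."""
    total = 0.0
    for group, values in sample.items():
        group_mean = sum(values) / len(values)
        total += population_share[group] * group_mean
    return total

# Made-up sample that over-represents the "young" group (80% vs. an assumed 50%).
sample = {"young": [10.0] * 80, "old": [20.0] * 20}
population_share = {"young": 0.5, "old": 0.5}

all_values = [v for values in sample.values() for v in values]
naive = sum(all_values) / len(all_values)
adjusted = weighted_mean(sample, population_share)
print(naive, adjusted)
```

The naive mean is pulled toward the over-sampled group, while the weighted mean restores each group to its assumed share of the population.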

It is generally a good idea to consider possible confounders when interpreting a significant effect

TRUE

It’s possible for a regression effect to reverse itself after the inclusion of another variable into the model

TRUE
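
A toy data set shows how such a reversal can happen. In the sketch below, the numbers are invented so that the slope is positive within each group but negative when the grouping variable is ignored; slope is a hypothetical helper for a one-predictor least-squares fit.

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Invented (x, y) data for two groups.
group_a = ([1, 2, 3], [10, 11, 12])
group_b = ([6, 7, 8], [2, 3, 4])

pooled_x = group_a[0] + group_b[0]
pooled_y = group_a[1] + group_b[1]

print("within A:", slope(*group_a))
print("within B:", slope(*group_b))
print("pooled:  ", slope(pooled_x, pooled_y))
```

Fitting within each group plays the role of including the group variable in the model, and the sign of the estimated effect flips relative to the pooled fit.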

What are blocking and adjustment used for?

Accounting for variables that potentially affect the estimation of the effect of interest.

If we see an association between two variables, it would be a good idea to

consider the possibility that the association is explained by a confounding third variable.

You see an effect of ice cream sales on the number of heat exhaustion cases. The effect is likely due to:

The hot weather as a confounder.

Associations can imply causality

under a set of strict assumptions, often as a result of design choices.

What is confounding?

Confounding occurs when you want to compare two things and a third gets in the way.

Define causal effects

We define a causal effect as the difference between the outcome for a subject under a particular treatment and the outcome for that subject under the control.

What is causal inference?

The study of how to estimate causal effects using data is called causal inference.

What do the residuals measure?

The difference between the response and the fitted value.
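
As a minimal sketch, the code below fits a straight line by ordinary least squares and computes each residual as the response minus the fitted value. The data are invented and fit_line is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for a single predictor."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Invented data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
fitted = [b0 + b1 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]  # response minus fitted value
print([round(r, 3) for r in residuals])
```

With an intercept in the model, least-squares residuals sum to zero up to rounding, so the pattern of the residuals rather than their total is what diagnostics look at.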

Summary tables should include

Quantiles
Means
Standard deviations
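
A summary of this kind can be sketched with Python's standard statistics module. The variable name, the data, and the summarize helper below are assumptions made for the example; the last value is a deliberately planted data-entry error.

```python
import statistics

def summarize(values):
    """Return quartiles, mean, and standard deviation for one variable."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "q1": q1,
        "median": median,
        "q3": q3,
    }

# Invented heights in cm; 1750 is a planted typo for 175.
heights_cm = [150, 160, 165, 170, 172, 180, 1750]
for name, value in summarize(heights_cm).items():
    print(f"{name:>6}: {value:.1f}")
```

Here the mean lands far above the third quartile, the kind of mismatch that a summary table makes easy to spot without inspecting every record.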

In data that follow Benford's law, the leading digits (1-9) are all equally likely

FALSE
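
Benford's law gives the probability of leading digit d as log10(1 + 1/d) for d from 1 to 9. The short sketch below tabulates these probabilities; the function name benford_probability is an assumption made for the example.

```python
import math

def benford_probability(digit):
    """Probability that the leading digit equals `digit` under Benford's law."""
    return math.log10(1 + 1 / digit)

probs = {d: benford_probability(d) for d in range(1, 10)}
for d, p in probs.items():
    print(d, round(p, 3))
```

The probabilities decrease steadily from digit 1 to digit 9, so the leading digits are far from equally likely.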

Merging (linking two datasets via a common index) errors can have a strong impact on subsequent analyses

TRUE
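
One inexpensive guard is to inspect the keys before merging. The sketch below uses only the standard library and flags keys missing from either side as well as duplicated keys; check_merge_keys and the two example ID lists are hypothetical.

```python
from collections import Counter

def check_merge_keys(left_keys, right_keys):
    """Report key problems that commonly corrupt a merge of two datasets."""
    left_counts, right_counts = Counter(left_keys), Counter(right_keys)
    return {
        "only_in_left": sorted(set(left_counts) - set(right_counts)),
        "only_in_right": sorted(set(right_counts) - set(left_counts)),
        "duplicated_in_left": sorted(k for k, c in left_counts.items() if c > 1),
        "duplicated_in_right": sorted(k for k, c in right_counts.items() if c > 1),
    }

# Invented ID columns from two datasets that share a common index.
patients = [101, 102, 103, 104]
lab_results = [101, 102, 102, 105]
print(check_merge_keys(patients, lab_results))
```

Unmatched keys silently drop or blank out rows, and duplicated keys multiply them, so either problem can distort every analysis that follows the merge.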

How do you keep on top of data quality without being in the trenches?

The construction of summary tables
Regression diagnostics
Residuals

Data Analysis in Real Life Flashcards

(35 cards)