Exploring Data - Topic 1: Design of Experiments Flashcards

1
Q

What is a data scientist?

A

A data scientist is someone who can interpret data, unlock insights from it and use those insights to tell stories about it.

This requires ever-developing statistical thinking and computational skills, alongside collaboration, curiosity and clear communication.

2
Q

What are the two possible types of data scientists?

A

Popular Data Scientist
Professional Data Scientist

3
Q

What are the important qualities/characteristics of a modern data scientist?

A
  1. Ability to develop skills in maths and statistics
  2. Good programming skills and the development of a programming database
  3. Domain knowledge and soft skills
  4. The ability to communicate and visualise concepts well
4
Q

What are data scientists bound by in terms of ethics and privacy?

A

Data scientists will need to comply with Australian legal and regulatory frameworks (ANDS).

5
Q

How should a data scientist respect ethics and privacy?

A

Complying with the Australian legal and regulatory frameworks

Developing a transparent plan for data collection, storage, exchange, access and reporting.

Most IMPORTANTLY, results acquired from research need to be non-identifiable, especially for ‘personal data’

6
Q

What is domain knowledge?

A

Domain knowledge is the background context information that helps you understand the data.

(This is important because it ensures that the data isn’t taken out of context and that you know what it actually addresses.)

7
Q

What are some examples of domain knowledge which may be required?

A

Understanding what type of depression can be caused by pills.

Understanding what type of acne can be caused by pills.

8
Q

How should different pieces of evidence be weighed up?

A

Different evidence should be weighed up according to how reliable it is (e.g. a personal testimony would be given less weight than a reputable research paper).

9
Q

What are the characteristics of a personal testimony/observation?

A

It can only suggest a more general finding, and the source(s) behind a media article are often poorly cited.

We do not discount this sort of data, but we must be careful when approaching and interpreting it because it can produce biased results.

10
Q

What are the characteristics of a reputable research paper? What is the new approach with journals?

A

Every stage of a statistical study (design, data collection, statistical methods, conclusions) should be documented and checked in the review process.

Increasingly, journals require reproducible research, which requires ‘data sets and software to be made available for verifying published findings and conducting alternative analyses’.

11
Q

What are common limitations of research papers?

A

Many research papers base their conclusions on ASSUMPTIONS.

If a research paper doesn’t tell you that its conclusions rest on assumptions or have limitations –> need to be wary

12
Q

What method of comparison can identify whether a certain treatment is effective (or answer other data science questions)?

A

Create a controlled experiment

13
Q

What is a controlled experiment?

A

It is an experiment that runs 2 parallel experiments which differ only in whether or not the treatment is administered.

14
Q

What is the issue with attempting to create a controlled experiment?

A

There are various complications involved in separating subjects into the treatment and control groups.

15
Q

What is bias?

A

Bias is anything which affects the ability of the data to accurately measure the treatment effect.

16
Q

What are examples of some bias?

A

Selection bias, observer bias and confounding

Consent bias, survivor bias, adherer bias

17
Q

What is confounding?

A

Confounding (or confusion) occurs when the treatment and control groups differ by a third (often hidden) variable which influences the response being studied.

These confounding variables are often ‘lurking’ (hidden) and are hard to identify. They are problematic because they often lead to misleading associations.

18
Q

What are the two problems which hinder a controlled experiment?

A

Selection bias and observer bias

19
Q

What is selection bias?

A

This occurs when some subjects are more likely to be chosen to be in a study than others.

20
Q

What is an example of selection bias?

A

An example of this was the Portacaval shunt (1966). A portacaval shunt is an operation used in some patients with cirrhosis of the liver; however, the operation is long and dangerous.

A study in 1966 seemed to imply that the operation was worth the risk, given the increased life expectancy of patients who had it compared with those who didn’t. However, this was biased because healthier patients tended to have the surgery –> selection bias towards healthier patients

21
Q

How is selection bias remedied?

A

Selection bias is remedied by using a ‘Randomised Controlled Trial’ (RCT), where subjects are randomly allocated to either a treatment group or a control group. This helps make the two groups as comparable as possible.
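A minimal Python sketch of why random allocation helps, using a made-up model loosely based on the shunt example: survival depends only on a hypothetical `healthy` flag, so the operation has no real effect. The function name `survival_rates` and all probabilities are illustrative assumptions, not data from the 1966 study.

```python
import random


def survival_rates(allocation, n=50_000, seed=2):
    """Return (survival rate with operation, survival rate without) in a
    hypothetical model where survival depends ONLY on the patient's health,
    so the operation itself has no effect."""
    rng = random.Random(seed)
    survived = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for _ in range(n):
        healthy = rng.random() < 0.5
        if allocation == "chosen":
            # Selection bias: healthier patients are more likely to get the operation.
            operated = rng.random() < (0.8 if healthy else 0.2)
        elif allocation == "random":
            # RCT: a fair coin decides, independently of health.
            operated = rng.random() < 0.5
        else:
            raise ValueError("allocation must be 'chosen' or 'random'")
        survives = rng.random() < (0.7 if healthy else 0.3)
        survived[operated] += survives
        total[operated] += 1
    return survived[True] / total[True], survived[False] / total[False]


for scheme in ("chosen", "random"):
    with_op, without_op = survival_rates(scheme)
    print(f"{scheme:>6}: operated {with_op:.2f}, not operated {without_op:.2f}")
```

Under these assumptions the expected survival rates are 0.62 (operated) versus 0.38 (not operated) when healthier patients are chosen for surgery, but 0.50 for both groups under random allocation, so the apparent benefit in the first case comes entirely from selection bias.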

22
Q

What is a control group?

A

A control group is the standard against which comparisons are made in an experiment. Its subjects typically DON’T receive the treatment.

23
Q

What is observer bias?

A

It occurs when the subjects or investigators know which group (control or treatment) each subject is in. This can bias the responses or evaluations: participants may deliberately or subconsciously report more or less favourable results depending on their group, or investigators might adjust their evaluations to favour the treatment.

24
Q

What is a placebo?

A

It is a ‘pretend’ / ‘fake’ treatment, designed to be neutral and indistinguishable from the actual treatment. This often creates the PLACEBO EFFECT

25
Q

What is the placebo effect?

A

The placebo effect is a response that occurs because the subject believes they have received the treatment.

26
Q

What is the solution to addressing the observer bias?

A

Conducting a ‘Randomised Controlled Double-Blind Trial’. The placebo should also be designed to resemble the treatment process as closely as possible (e.g. given by injection or as a pill, and identical in colour, appearance, smell and taste).

27
Q

What is a Randomised Controlled Double-Blind Trial? What does it involve?

A

This is where neither the subjects nor the investigators know which group (placebo or treatment) each subject is in. Blinding only the subjects makes a trial “single blind”; blinding the investigators as well makes it “double blind”. Additionally, a placebo should be used.

It involves:

  • Having a 3rd party administer the treatment and placebo
  • Designing the placebo to mimic the treatment as much as possible
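One hypothetical way the 3rd-party step could look in code: the third party randomly splits the subjects into treatment and placebo and hands out neutral kit codes, keeping the code-to-group key sealed, so neither the subjects nor the investigators can tell the groups apart. The function `third_party_allocation` and the `KIT-` code format are invented for illustration.

```python
import random


def third_party_allocation(subject_ids, seed=None):
    """Hypothetical third-party step: randomly assign half the subjects to
    treatment and the rest to placebo, then label everyone with neutral codes.

    Returns (kit_codes, key): kit_codes maps subject -> code (all that subjects
    and investigators see); key maps code -> group and stays sealed with the
    third party until analysis."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    groups = {sid: ("treatment" if i < half else "placebo") for i, sid in enumerate(ids)}
    # Codes are shuffled separately so a code's number says nothing about the group.
    codes = [f"KIT-{k:03d}" for k in range(len(ids))]
    rng.shuffle(codes)
    kit_codes = dict(zip(ids, codes))
    key = {kit_codes[sid]: group for sid, group in groups.items()}
    return kit_codes, key


subjects = [f"S{i}" for i in range(1, 11)]
kit_codes, key = third_party_allocation(subjects, seed=7)
# Subjects and investigators only ever see labels like this one;
# the sealed `key` is used to unblind the groups at analysis time.
print(kit_codes["S1"])
```
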
28
Q

What is the gold standard for the best method of comparison?

A

The gold standard is the double-blind randomised controlled trial (RCT)

29
Q

What is consent bias?

A

This occurs when subjects choose whether or not to take part in the experiment, and the individuals who decide to participate differ systematically from those who choose not to. This can introduce bias into the sample, as the characteristics of those who consent to participate may not be representative of the larger population.

For example, individuals who are more enthusiastic about the topic of the study or who have a particular interest in the research question may be more likely to volunteer, leading to a sample that is not truly representative of the broader population. This potential bias can impact the external validity or generalizability of the study’s findings.

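This can be countered by using RCTs: randomly allocating the consenting subjects makes the treatment and control groups comparable with each other, although the results may still not generalise to people who would not consent.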

30
Q

What is survivor bias?

A

This occurs when only certain types of subjects finish the study.

For example, an observed ‘improvement’ might be due to the dropout of the sickest subjects.
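A toy example with made-up numbers: the treatment does nothing on average, but if the sickest patients drop out, the completers alone appear to improve. The patient data and the dropout cut-off `DROPOUT_SEVERITY` are assumptions for illustration.

```python
from statistics import mean

# Hypothetical patients: (baseline severity 0-10, change in score; positive = better).
# Made-up numbers in which the average change across everyone enrolled is zero.
patients = [
    (2, +3), (3, +2), (4, +1), (5, +1), (6, 0),
    (7, 0), (8, -1), (9, -2), (9, -2), (10, -2),
]

DROPOUT_SEVERITY = 8  # assumption: patients this sick leave before the end

everyone = [change for _, change in patients]
completers = [change for severity, change in patients if severity < DROPOUT_SEVERITY]

print(f"mean change, everyone enrolled: {mean(everyone):+.2f}")
print(f"mean change, completers only:   {mean(completers):+.2f}")
```

With these numbers the mean change is 0 for everyone enrolled but about +1.17 for the completers, an ‘improvement’ produced purely by who finished the study.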

31
Q

What is adherer bias?

A

This occurs when certain types of subjects (adherers) keep taking the treatment (or placebo), as opposed to the non-adherers. As a result, an observed ‘improvement’ may really be due to the adherers being healthier and more compliant.

32
Q

What is an observational study?

A

It is one in which the investigator cannot use randomisation to allocate subjects to groups. In this scenario, the assignment of subjects is outside the control of the investigator.

33
Q

Why are observational studies required?

A

Although many researchers aim for the gold standard, some questions require an observational study. For example, to study the effects of smoking, investigators can’t choose which subjects will be in the treatment group (e.g. they can’t force people to smoke or not to smoke).

34
Q

Do the conclusions of observational studies require great care?

A

Yes.

35
Q

What do researchers have to be wary of with observational studies? (3)

A

Observational studies CAN’T establish causation

Observational studies may be presented as an RCT when in reality they are not

Observational studies with a confounding variable can lead to Simpson’s Paradox

36
Q

What is meant by ‘Observational studies can’t establish causation’?

A

This means that observational studies can only establish ASSOCIATION (that one thing is linked to another), which might SUGGEST causation.

HOWEVER association DOESN’T prove causation

37
Q

Why can’t observational studies establish causation?

A

Unlike RCTs, in observational studies, researchers can’t assign study subjects into treatment or control groups using a random mechanism, which makes it very difficult to draw a causal relationship between the treatment and the observed outcomes.

They are UNABLE TO CONTROL ALL VARIABLES –> potential for confounders

38
Q

What is meant by ‘Observational studies may present as an RCT’? Provide an example

A

This is a warning that a study may be presented as an RCT but in reality be an observational study. There may be confounding variables which are hard to identify and which mislead investigators about a cause-and-effect relationship.

An example is when the control group is historical rather than contemporaneous: investigators might compare the effect of a new medication on current patients with the effect of an old medication on past patients. Time then becomes a confounding variable, because the treatment group (new drug) and the historical control group (old drug) may differ in aspects besides the treatment that the investigators can’t control –> this is effectively an observational study.

In this scenario, too many factors (lurking variables) are outside the investigator’s control. Even though the study might look like an RCT on paper, and the control group might appear comparable, it is actually an observational study, because the assignment of subjects is outside the investigator’s control.
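A toy illustration of the historical-control problem, under made-up assumptions: recovery rates rise by 2 percentage points per year because of better general care, and the old and new drugs are equally effective. The function `recovery_rate` and all numbers are hypothetical.

```python
def recovery_rate(year, base_year=2000, base_rate=0.50, yearly_gain=0.02):
    """Hypothetical recovery rate that improves steadily over time because of
    better general care; it does NOT depend on which drug is given."""
    return base_rate + yearly_gain * (year - base_year)


historical_control = recovery_rate(2005)  # old drug, past patients
current_treatment = recovery_rate(2015)   # new drug, current patients
apparent_benefit = current_treatment - historical_control
print(f"apparent benefit of the new drug: {apparent_benefit:.2f}")
```

Both drugs are equally effective in this model, yet the comparison shows an apparent benefit of 0.20 that comes entirely from the time confounder. A contemporaneous, randomised control group would be treated in the same period, so the time effect would cancel out.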

39
Q

What are some strategies for dealing with confounders?

A

Sometimes we can make the 2 groups more comparable by dividing them into subgroups with respect to the confounder.

For example, if alcohol consumption is a potential confounder for smoking’s effect on liver cancer, we can divide our subjects into 3 groups: heavy drinkers, medium drinkers and light drinkers.
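A minimal sketch of this subgroup strategy, assuming invented counts: smoking has no effect on liver cancer within any drinking level, but smokers are concentrated among heavy drinkers. The data and the helper names `crude_rates` and `stratum_rates` are hypothetical.

```python
# Hypothetical counts, made up for illustration:
# drinking level -> group -> (liver cancer cases, number of subjects)
data = {
    "heavy": {"smoker": (60, 600), "non-smoker": (20, 200)},
    "medium": {"smoker": (15, 300), "non-smoker": (15, 300)},
    "light": {"smoker": (1, 100), "non-smoker": (5, 500)},
}


def crude_rates(table):
    """Pool all drinking levels together, ignoring the confounder."""
    totals = {}
    for groups in table.values():
        for group, (cases, n) in groups.items():
            prev_cases, prev_n = totals.get(group, (0, 0))
            totals[group] = (prev_cases + cases, prev_n + n)
    return {group: cases / n for group, (cases, n) in totals.items()}


def stratum_rates(table):
    """Compare smokers with non-smokers WITHIN each drinking level."""
    return {
        level: {group: cases / n for group, (cases, n) in groups.items()}
        for level, groups in table.items()
    }


print("pooled:", crude_rates(data))
for level, rates in stratum_rates(data).items():
    print(f"{level:>6}:", rates)
```

With these invented counts the pooled rates are 7.6% for smokers and 4.0% for non-smokers, yet within every drinking level the two rates are identical: the pooled difference comes entirely from the confounder.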

40
Q

What is Simpson’s Paradox?

A

It is also called the reversal paradox. It occurs when there is a clear trend in individual groups of data that disappears or reverses when the groups are pooled together.

41
Q

Why does Simpson’s paradox occur?

A

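It occurs when the relationship between two variables is distorted by a confounding (lurking) variable that is distributed unevenly across the groups being compared.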

42
Q

How can observational studies with a confounding variable lead to Simpson’s Paradox?

A

Since observational studies do not involve randomisation, the treatment and control groups are often very different from each other. For example, smokers may be more likely to be young and older people less likely to smoke, so the mortality statistics may show more deaths among non-smokers simply because they are older, while the younger smokers have better health.

Confounders can mix up our results: these differences can make it look like the treatment is causing the response when it actually isn’t.

43
Q

What is an example of Simpson’s Paradox?

A

For example, a study showed that smokers had a lower mortality rate than non-smokers. However, when the data was split up by age, it showed that the non-smokers group contained more old people, which led to more recorded deaths, whereas the smokers group contained mostly young people, who are less likely to die.
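The reversal can be checked with a few lines of Python. The counts below are invented (they are not from the study described above) and were chosen so that smokers have a higher death rate within each age group but a lower one overall, because most smokers are young. The helper names are hypothetical.

```python
# Hypothetical (made-up) counts: age group -> group -> (deaths, people)
counts = {
    "young": {"smokers": (40, 800), "non-smokers": (6, 200)},
    "old": {"smokers": (100, 200), "non-smokers": (320, 800)},
}


def mortality(deaths, people):
    return deaths / people


def pooled_mortality(table, group):
    """Death rate for `group` after pooling all age groups together."""
    deaths = sum(groups[group][0] for groups in table.values())
    people = sum(groups[group][1] for groups in table.values())
    return deaths / people


for age, groups in counts.items():
    s = mortality(*groups["smokers"])
    ns = mortality(*groups["non-smokers"])
    print(f"{age:>6}: smokers {s:.1%} vs non-smokers {ns:.1%}")

print(f"pooled: smokers {pooled_mortality(counts, 'smokers'):.1%} "
      f"vs non-smokers {pooled_mortality(counts, 'non-smokers'):.1%}")
```

This prints 5.0% vs 3.0% for the young, 50.0% vs 40.0% for the old, and 14.0% vs 32.6% when pooled: pooling hides the age confounder and reverses the comparison.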

44
Q

What is a control?

A

A subject who did not get the treatment

45
Q

What is a controlled experiment?

A

A study where the investigators allocate subjects to the 2 groups

46
Q

What is controlling for confounders?

A

Trying to reduce the influence of confounding variables

47
Q

What is reproducible research?

A

Reproducible research is the system of documenting and publishing results of an impact evaluation. At the very least, reproducibility allows other researchers to analyze the same data to get the same results as the original study, which strengthens the conclusions of the original study.

A data report which can be checked by a third party
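One small ingredient of reproducibility, sketched under assumptions: if every random step in an analysis is driven by a recorded seed, a third party running the same code with the same seed gets exactly the same numbers. The toy simulated study below (function name `simulated_study`, outcome distributions, seed value) is entirely hypothetical.

```python
import json
import random
from statistics import mean


def simulated_study(seed, n_per_group=50):
    """Toy simulated study with made-up outcome distributions. Every random
    draw comes from random.Random(seed), so re-running with the same seed
    reproduces the results exactly."""
    rng = random.Random(seed)
    treatment = [rng.gauss(11.0, 2.0) for _ in range(n_per_group)]
    control = [rng.gauss(10.0, 2.0) for _ in range(n_per_group)]
    return {
        "seed": seed,
        "n_per_group": n_per_group,
        "treatment_mean": round(mean(treatment), 3),
        "control_mean": round(mean(control), 3),
    }


report = simulated_study(seed=2024)
# Publishing the code, the seed and this report lets a third party check it.
print(json.dumps(report, indent=2))

# Re-running the same code with the published seed gives identical numbers.
assert simulated_study(seed=2024) == report
```
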