Exploring Data - Topic 1: Design of Experiments Flashcards
What is a data scientist?
A data scientist is someone who can interpret data and draw insights from it in order to tell stories about it.
This requires ever-developing statistical thinking and computational skills, alongside collaboration, curiosity and clear communication.
What are the two possible types of data scientists?
Popular Data Scientist
Professional Data Scientist
What are the important qualities/characteristics of a modern data scientist?
- Ability to develop skills in maths and statistics
- Good programming skills and the ability to develop a programming database
- Domain knowledge and soft skills
- Ability to communicate and visualise concepts well
What are data scientists bound by in terms of ethics and privacy?
Data scientists will need to comply with Australian legal and regulatory frameworks (ANDS).
How should a data scientist respect ethics and privacy?
Complying with the Australian legal and regulatory frameworks.
Developing a transparent plan for data collection, storage, exchange, access and reporting.
Most IMPORTANTLY, results acquired from research need to be non-identifiable, especially for ‘personal data’.
What is domain knowledge?
Domain knowledge is the background context information that helps you understand the data.
(Important as it ensures that the data isn’t taken out of context and that you know what it is actually addressing)
What are some examples of domain knowledge which may be required?
What type of depression is caused by pills?
What type of acne is caused by pills?
How should different pieces of evidence be weighed up?
Different pieces of evidence should be weighed according to how reliable they are (e.g. a personal testimony would carry less weight than a reputable research paper).
What are the characteristics of a personal testimony/observation?
Can only suggest a more generalised finding. The source(s) behind a media article are often poorly cited.
We do not discount this sort of data but must be careful when approaching it and trying to interpret it because it can produce biased results.
What are the characteristics of a reputable research paper? What is the new approach with journals?
In a reputable research paper, every stage of the statistical study (design, data collection, statistical methods, conclusion) should be documented and checked in the review process.
Increasingly, journals are requiring more reproducible research, which requires ‘data sets and software to be made available for verifying published findings and conducting alternative analyses’.
What are common limitations of research papers?
Many research papers base their conclusions on ASSUMPTIONS.
If a research paper doesn’t state that its conclusions rest on assumptions, or that the conclusions have limitations, you need to be wary.
What method of comparison is used to identify whether a certain treatment is effective (or to answer similar data science questions)?
Create a controlled experiment
What is a controlled experiment?
It is an experiment that runs 2 parallel experiments, which differ only in whether the treatment is administered or not.
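The comparison between the two parallel experiments can be sketched as a difference in average response. This is a minimal illustration only: the function name `treatment_effect` and the response scores are hypothetical, invented for the example, and a real analysis would also need to account for chance variation.

```python
from statistics import mean

def treatment_effect(treatment_responses, control_responses):
    """Estimate the treatment effect as the difference in group means."""
    return mean(treatment_responses) - mean(control_responses)

# Hypothetical response scores, invented purely for illustration.
treatment = [7, 8, 6, 9]
control = [5, 6, 5, 4]

effect = treatment_effect(treatment, control)
print(f"Estimated treatment effect: {effect}")
```

The difference is only meaningful if the two groups are alike in every respect other than the treatment.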
What is the issue with attempting to create a controlled experiment?
There are different complications involved with separating groups into both the treatment and the control groups.
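One common way to split subjects without the investigator choosing who goes where is random assignment. The flashcards do not name this technique, so treat the sketch below as an assumption: `randomly_assign` and the subject labels are hypothetical.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two halves.

    Returns (treatment_group, control_group). The seed is optional and
    only makes the shuffle repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject labels, invented for illustration.
subjects = [f"subject_{i}" for i in range(10)]
treatment_group, control_group = randomly_assign(subjects, seed=1)
```

Because chance decides the allocation, the investigator cannot (knowingly or not) steer particular kinds of subjects into one group.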
What is bias?
Bias is anything that affects the ability of the data to accurately measure the treatment effect.
What are examples of some bias?
Selection bias, observer bias and confounding
Consent bias, survivor bias, adherer bias
What is confounding?
Confounding (or confusion) occurs when the treatment and control groups differ by a third (often hidden) variable which influences the response being studied.
These confounding variables are often ‘lurking’ (hiding) and are hard to identify. They are a problem because they often lead to misleading associations.
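A small made-up data set can show how a lurking variable produces a misleading association. In this sketch the records, the `age_band` confounder and the helper `mean_difference` are all hypothetical: older subjects are both more likely to be in the treatment group and have a higher response regardless of treatment.

```python
from statistics import mean

# Hypothetical records: (group, age_band, response). Invented for illustration.
records = [
    ("treatment", "older", 8), ("treatment", "older", 8), ("treatment", "older", 8),
    ("treatment", "younger", 4),
    ("control", "older", 8),
    ("control", "younger", 4), ("control", "younger", 4), ("control", "younger", 4),
]

def mean_difference(rows):
    """Mean response of the treatment rows minus mean response of the control rows."""
    treated = [response for group, _, response in rows if group == "treatment"]
    control = [response for group, _, response in rows if group == "control"]
    return mean(treated) - mean(control)

# Ignoring age, the treatment appears to raise the response.
naive_difference = mean_difference(records)

# Comparing like with like (within each age band), the apparent effect vanishes.
within_band = {
    band: mean_difference([row for row in records if row[1] == band])
    for band in ("older", "younger")
}

print(naive_difference)
print(within_band)
```

The overall gap comes entirely from the groups differing in age, not from the treatment itself.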
What are the two problems which hinder a controlled experiment?
Selection bias and observer bias