DATA ANALYSIS Flashcards

1
Q

A/B TEST

A

TYPE OF INFERENTIAL ANALYSIS

2
Q

DESCRIPTIVE ANALYSIS

A

Descriptive analysis lets us describe, summarize, and visualize data so that patterns can emerge. Sometimes we’ll only do a descriptive analysis, but most of the time a descriptive analysis is the first step in our analysis process.

Descriptive analyses include measures of central tendency (e.g., mean, median, mode) and spread (e.g., range, quartiles, variance, standard deviation, distribution), which are referred to as descriptives or summary statistics.

Typically, data visualization is also included in descriptive analysis.
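
The summary statistics named above can be computed with Python's standard `statistics` module. This is a minimal sketch: the `views` list is made-up illustrative data, not from any real dataset.

```python
import statistics

# Hypothetical sample: daily page views for one week (made-up numbers).
views = [120, 135, 135, 150, 160, 175, 400]

# Measures of central tendency
mean_views = statistics.mean(views)
median_views = statistics.median(views)
mode_views = statistics.mode(views)

# Measures of spread
value_range = max(views) - min(views)
q1, q2, q3 = statistics.quantiles(views, n=4)  # quartile cut points
variance = statistics.variance(views)          # sample variance (n - 1 denominator)
std_dev = statistics.stdev(views)              # sample standard deviation

print("mean:", round(mean_views, 1), "median:", median_views, "mode:", mode_views)
print("range:", value_range, "quartiles:", (q1, q2, q3))
print("variance:", round(variance, 1), "std dev:", round(std_dev, 1))
```

The single large value (400) pulls the mean above the median, which is one reason to report more than one measure of central tendency.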

3
Q

EXPLORATORY ANALYSIS

A

Exploratory analyses show us underlying patterns and relationships within datasets.

Exploratory analyses cannot determine causation.

3
Q

INFERENTIAL ANALYSIS

A

Inferential analysis lets us test a hypothesis on a sample of a population and then extend our conclusions to the whole population.
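
An A/B test is one type of inferential analysis. As a rough sketch, the example below runs a two-proportion z-test on two samples of visitors. The function name `two_proportion_z_test` and the visitor counts are hypothetical, and the normal approximation it relies on assumes reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one conversion rate."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that A and B do not differ.
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical samples: 2000 visitors saw each version of a page.
z, p_value = two_proportion_z_test(success_a=200, n_a=2000, success_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference seen in the samples is unlikely under the null hypothesis, which is the basis for extending the conclusion to the whole population.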

4
Q

CAUSAL ANALYSIS

A

CORRELATION != CAUSATION

Correlation does not equal causation.
Proving causation is tricky and generally requires very careful experimental design.
Experiments that support causal analysis rely on replication, randomization, and control, the key components of good experimental design.

5
Q

PREDICTIVE ANALYSIS

A
6
Q

DATA ANALYSIS

A
7
Q

SUMMARY STATISTICS

A
8
Q

CENTRAL TENDENCY

A

INCLUDED IN DESCRIPTIVE ANALYSIS
e.g., mean, median, mode
EXAMPLE OF SUMMARY STATISTICS

9
Q

SPREAD

A

INCLUDED IN DESCRIPTIVE ANALYSIS
e.g., range, quartiles, variance, standard deviation, distribution
EXAMPLE OF SUMMARY STATISTICS

10
Q

UNSUPERVISED LEARNING

A
11
Q

CLUSTERING ALGORITHMS

A
12
Q

PRINCIPAL COMPONENT ANALYSIS

A
13
Q

K-MEANS CLUSTERING

A
14
Q

Rand statistic

A
15
Q

GOOD EXPERIMENTAL DESIGN

A

REPLICATION
RANDOMIZATION
CONTROL

16
Q

REPLICATION

A

GATHER ENOUGH SUBJECTS (REPLICATES) TO SUPPORT STATISTICAL ANALYSIS

17
Q

RANDOMIZATION

A

ASSIGN SUBJECTS RANDOMLY INTO TREATMENT GROUPS, SO EACH SUBJECT HAS AN EQUAL CHANCE TO BE IN ANY TREATMENT GROUP
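
One possible way to carry out random assignment is to shuffle the subjects and deal them into groups in turn. This is only a sketch: `assign_groups`, the subject names, and the group labels are hypothetical, and the fixed seed is there just to make the example reproducible.

```python
import random

def assign_groups(subjects, groups, seed=None):
    """Shuffle subjects, then deal them round-robin into the given groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, subject in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(subject)
    return assignment

# Hypothetical subjects and treatment groups.
subjects = [f"subject_{i}" for i in range(1, 11)]
assignment = assign_groups(subjects, ["treatment", "control"], seed=42)
for group, members in assignment.items():
    print(group, members)
```

Because the order is shuffled before dealing, every subject has the same chance of landing in either group, and the groups end up the same size when the subject count divides evenly.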

18
Q

CONTROL

A

CONTROL ALL FACTORS THAT ARE NOT THE EXPERIMENT’S FOCUS BUT COULD INFLUENCE THE OUTCOME

19
Q

Causal inference with observational data

A

requires:

Advanced techniques to identify a causal effect

Meeting very strict conditions

Appropriate statistical tests

20
Q

SUPERVISED MACHINE LEARNING

A

Supervised machine learning algorithms are trained with labeled data and predict the likelihood of future outcomes.

Supervised machine learning algorithms can only be as good as the data used to train them.
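
As a minimal illustration of training on labeled data, the sketch below fits a simple regression line by ordinary least squares and uses it to predict an unseen case. The function `fit_line` and the hours/scores data are made up for illustration; real projects would typically use a dedicated library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled training data: hours studied (input) and exam score (label).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # prediction for an unseen input
print(f"score = {slope:.1f} * hours + {intercept:.1f}; predicted for 6 hours: {predicted:.1f}")
```

If the labels in `scores` were wrong or unrepresentative, the fitted line and every prediction made from it would inherit that error.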

21
Q

POPULAR SUPERVISED MACHINE LEARNING TECHNIQUES

A

REGRESSION MODELS
SUPPORT VECTOR MACHINES
DEEP LEARNING CNN

22
Q

REGRESSION MODELS

A
23
Q

SUPPORT VECTOR MACHINES

A
24
Q

DEEP LEARNING CONVOLUTIONAL NEURAL NETWORKS

A
25
Q

GARBAGE IN

A

GARBAGE OUT

26
Q

LOW RISK PREDICTION

A
27
Q

HIGH RISK PREDICTION

A
28
Q

AUTOMATION BIAS

A

Automation bias stems from the idea that computers or machines are more trustworthy than humans because they are more objective. Automation bias is at the root of why people follow their GPS into trouble, even when contradictory information is available.

29
Q

BIAS

A

SYSTEMATIC ERRORS IN THINKING INFLUENCED BY CULTURAL AND PERSONAL EXPERIENCE.

BIASES DISTORT OUR PERCEPTION AND CAUSE US TO MAKE INCORRECT DECISIONS.

30
Q

SELECTION/SAMPLING BIAS

A

Selection bias occurs when study subjects (i.e., the sample) are not representative of the population. Selection bias can be due to poor study design if the sample is too small or is not randomized. Selection bias can also crop up when the only data available are influenced by historical bias.
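
A small simulation can make the effect concrete. In this sketch the population is a made-up list of scores from 1 to 1000; one sample is drawn at random and the other takes only the top scorers, so all names and numbers here are hypothetical.

```python
import random
from statistics import mean

# Hypothetical population: scores from 1 to 1000.
population = list(range(1, 1001))

rng = random.Random(0)  # fixed seed only so the example is reproducible
random_sample = rng.sample(population, 100)  # randomized sample
biased_sample = sorted(population)[-100:]    # non-random: top 100 scores only

population_mean = mean(population)
random_mean = mean(random_sample)
biased_mean = mean(biased_sample)

print("population mean:", population_mean)
print("random sample mean:", random_mean)
print("biased sample mean:", biased_mean)
```

The non-randomized sample badly overstates the population mean, while the randomized sample is expected to land much closer to it.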

31
Q

HISTORICAL BIAS

A

Systematic influence based on historical social and cultural beliefs.

32
Q

ALGORITHMIC BIAS

A

Algorithmic bias arises when an algorithm produces systematic and repeatable errors that lead to unfair outcomes, such as privileging one group over another. Algorithmic bias can be initiated through selection bias and then reinforced and perpetuated by other bias types.

33
Q

EVALUATION BIAS

A

Testing an algorithm with a non-representative dataset leads to evaluation bias. A non-representative benchmarking dataset can give high overall accuracy scores even when the algorithm is inaccurate for certain groups.
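
The sketch below shows how an overall accuracy score can hide poor performance for an underrepresented group. The benchmark rows, the group names, and the `accuracy` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical benchmark rows: (group, true_label, predicted_label).
# group_a makes up 90% of the benchmark, group_b only 10%.
results = (
    [("group_a", 1, 1)] * 90    # 90 correct predictions for group_a
    + [("group_b", 1, 1)] * 4   # 4 correct predictions for group_b
    + [("group_b", 1, 0)] * 6   # 6 wrong predictions for group_b
)

def accuracy(rows):
    """Fraction of rows where the prediction matches the true label."""
    return sum(true == pred for _, true, pred in rows) / len(rows)

overall = accuracy(results)

rows_by_group = defaultdict(list)
for row in results:
    rows_by_group[row[0]].append(row)
per_group = {group: accuracy(rows) for group, rows in rows_by_group.items()}

print("overall accuracy:", overall)
print("per-group accuracy:", per_group)
```

Reporting accuracy per group alongside the overall score is one simple check for this kind of bias.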

34
Q

BLACK BOX

A

When algorithms are proprietary, they become “black boxes”. In addition to not knowing what data were used to train and test them, we can’t know how they were designed or how they work. As a result, it’s impossible to evaluate the algorithms themselves.

35
Q

CONFIRMATION BIAS

A

Confirmation bias is our tendency to seek out information that supports our views. It influences data analysis when we consciously or unconsciously interpret results in a way that supports our original hypothesis. To limit confirmation bias, clearly state hypotheses and goals before starting an analysis, and then honestly evaluate how they influenced our interpretation and reporting of results.

36
Q

OVERGENERALIZATION BIAS

A

Overgeneralization bias is inappropriately extending observations made with one dataset to other datasets, leading to overinterpreted results and unjustified extrapolation. To limit overgeneralization bias, be thoughtful when interpreting data, extend results beyond the dataset used to generate them only when it is justified, and extend results only to the proper population.

37
Q

REPORTING BIAS

A

Reporting bias is the human tendency to report or share only results that affirm our beliefs or hypotheses, also known as “positive” results. Editors, publishers, and readers are also subject to reporting bias, as positive results are published, read, and cited more often. To limit reporting bias, report negative results and cite others who do, too.

38
Q

NOMINAL

A
39
Q

ORDINAL

A