DATA ANALYSIS Flashcards
(40 cards)
A/B TEST
TYPE OF INFERENTIAL ANALYSIS
DESCRIPTIVE ANALYSIS
Descriptive analysis lets us describe, summarize, and visualize data so that patterns can emerge. Sometimes we’ll only do a descriptive analysis, but most of the time a descriptive analysis is the first step in our analysis process.
Descriptive analyses include measures of central tendency (e.g., mean, median, mode) and spread (e.g., range, quartiles, variance, standard deviation, distribution), which are referred to as descriptives or summary statistics.
Typically, data visualization is also included in descriptive analysis.
EXPLORATORY ANALYSIS
Exploratory analyses show us underlying patterns and relationships within datasets.
Exploratory analyses cannot determine causation.
INFERENTIAL ANALYSIS
Inferential analysis lets us test a hypothesis on a sample of a population and then extend our conclusions to the whole population.
CAUSAL ANALYSIS
CORRELATION =! CAUSATION
Experiments that support causal analysis:
Correlation does not equal causation.
Proving causation is tricky and generally requires very careful experimental design.
Replication, randomization, and control are key components of good experimental design.
PREDICTIVE ANALYSIS
DATA ANALYSIS
SUMMARY STATISTICS
CENTRAL TENDENCY
INCLUDED IN DESCRIPTIVE ANALYSIS
e.g., mean, median, mode
EX OF SUMMARY STATISTIC
SPREAD
INCLUDED IN DESCRIPTIVE ANALYSIS
(e.g., range, quartiles, variance, standard deviation, distribution
EX OF SUMMARY STATISTIC
UNSUPERVISED LEARNING
CLUSTERING ALGORITHMS
PRINCIPAL COMPONENT ANALYSIS
K-MEANS CLUSTERING
Rand statistic
GOOD EXPERIMENTAL DESIGN
REPLICATION
RANDOMIZATION
CONTROL
REPLICATION
GATHER ENOUGH SUBJECTS (REPLICATES) TO SUPPORT STATISTICAL ANALYSIS
RANDOMIZATION
ASSIGN SUBJECTS RANDOMLY INTO TREATMENT GROUPS, SO EACH SUBJECT HAS AN EQUAL CHANCE TO BE IN ANY TREATMENT GROUP
CONTROL
CONTROL ALL FACTORS THAT ARE NOT THE EXPERIMENT’S FOCUS BUT COULD INFLUENCE THE OUTCOME
Causal inference with observational data
requires:
Advanced techniques to identify a causal effect
Meeting very strict conditions
Appropriate statistical tests
SUPERVISED MACHINE LEARNING
Supervised machine learning algorithms are trained with labeled data and predict the likelihood of future outcomes.
Supervised machine learning algorithms can only be as good as the data used to train them.
POPULAR SUPERVISED MACHINE LEARNING TECHNIQUES
REGRESSION MODELS
SUPPORT VECTOR MACHINES
DEEP LEARNING CCN
REGRESSION MODELS
SUPPORT VECTOR MACHINES