Data Analytics Flashcards
Conditional probability
Probability of event A occurring, given that event B has occurred: P(A | B) = P(A ∩ B) / P(B)
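A minimal Python sketch of P(A | B) = P(A ∩ B) / P(B). It uses exact fractions and a single fair die roll as the example. `conditional_probability` is a hypothetical helper name, not a library function.

```python
from fractions import Fraction


def conditional_probability(p_a_and_b, p_b):
    """Return P(A | B) = P(A and B) / P(B); undefined when P(B) is 0."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_a_and_b / p_b


# One roll of a fair die: A = "roll is 6", B = "roll is even".
# Only a 6 is both, so P(A and B) = 1/6; three of six faces are even, so P(B) = 3/6.
print(conditional_probability(Fraction(1, 6), Fraction(3, 6)))  # 1/3
```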
Association analysis
Task of finding interesting relationships in large datasets
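Association analysis is commonly summarised with support, confidence and lift for a rule such as {bread} → {milk}. The sketch below computes all three over a toy list of shopping baskets. The function name and the basket data are illustrative assumptions.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)          # baskets containing A
    count_c = sum(1 for t in transactions if c <= t)          # baskets containing C
    count_ac = sum(1 for t in transactions if (a | c) <= t)   # baskets containing both
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


baskets = [  # hypothetical transactions
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(rule_metrics(baskets, {"bread"}, {"milk"}))  # support 0.5, confidence ~0.667, lift ~0.889
```

A lift below 1, as in this toy data, means the two items appear together slightly less often than they would if they were independent. A lift above 1 points to a positive association.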
Hadoop
Open-source, Java-based framework that allows distributed processing of large datasets across clusters of computers
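Hadoop's original processing model is MapReduce. A map step emits key-value pairs, a shuffle groups them by key, and a reduce step aggregates each group. The single-process Python sketch below imitates that flow for a word count. It does not use Hadoop itself, and the function names are illustrative. On a real cluster the map and reduce tasks run in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["Big data big clusters", "data across clusters"]  # hypothetical input
mapped = chain.from_iterable(map_phase(line) for line in lines)
word_counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(word_counts)  # {'big': 2, 'data': 2, 'clusters': 2, 'across': 1}
```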
3 V’s of Big Data
Volume; Variety; Velocity
Kano Analysis
Technique for classifying product or service features by their impact on customer satisfaction
SaaS
Software as a Service
Statistical significance
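A result is statistically significant when its p-value falls below the chosen significance level (α, e.g., 0.05). The null hypothesis is then rejected; otherwise we fail to reject it (it is never "accepted")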
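This minimal sketch shows the significance decision for a two-sided z-test using only Python's standard library. The p-value for a standard normal statistic z is erfc(|z| / √2), and the null hypothesis is rejected when p falls below α. The 0.05 level and the example z = 2.1 are illustrative assumptions.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z."""
    # For Z ~ N(0, 1), P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(p_value, alpha=0.05):
    """Reject H0 when p < alpha; otherwise fail to reject it (never 'accept' it)."""
    return p_value < alpha


p = two_sided_p_value(2.1)
print(round(p, 4), is_significant(p))  # 0.0357 True
```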
Quantitative
Number-based; countable and measurable
Qualitative
Interpretation-based; descriptive; expressed in words rather than numbers
Type I error
False positive: rejecting the null hypothesis when it is true
Type II error
False negative: failing to reject the null hypothesis when it is false
Type III error
Correctly rejecting the null hypothesis for the wrong reason
Nominal data
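Categories with no natural order; e.g., male vs. female
Ordinal data
Categories with a meaningful order but no fixed interval between them; e.g., 1st, 2nd, 3rd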
Mann-Whitney U test
Nonparametric test of whether two independent samples are likely to come from the same population
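The U statistic can be computed from ranks. Pool both samples and rank them, giving tied values the average rank. Then sum the ranks R1 of the first sample and take U1 = R1 - n1(n1 + 1)/2, with U2 = n1·n2 - U1. The sketch below computes only the statistic. Turning it into a p-value needs critical-value tables or a library routine such as SciPy's `scipy.stats.mannwhitneyu`. Helper names and sample data are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    ordered = sorted(values)
    # A value first seen at 0-based position p and repeated c times occupies
    # ranks p+1 .. p+c, whose average is p + (c + 1) / 2.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for independent samples x and y."""
    ranks = average_ranks(list(x) + list(y))
    n_x, n_y = len(x), len(y)
    rank_sum_x = sum(ranks[:n_x])
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    return u_x, n_x * n_y - u_x


store_a = [12, 15, 11, 18]  # hypothetical daily sales, store A
store_b = [20, 17, 22, 19]  # hypothetical daily sales, store B
print(mann_whitney_u(store_a, store_b))  # (1.0, 15.0)
```

A U value near 0 (or near n1·n2) means one sample's values sit almost entirely above the other's. Values near n1·n2/2 suggest heavy overlap.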
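Wilcoxon signed-rank test
Nonparametric test used to compare two related (paired) samples; the Wilcoxon rank-sum test, which compares two independent samples, is equivalent to the Mann-Whitney U test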
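The signed-rank form of the Wilcoxon test applies to paired observations, for example the same units measured before and after a change. Drop zero differences, rank the absolute differences, then sum the ranks of the positive and negative differences separately. The sketch below computes those two sums, W+ and W-. The smaller one is compared with critical values, and SciPy offers `scipy.stats.wilcoxon` for the full test. Names and data are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus) for paired samples."""
    diffs = [b - a for a, b in zip(before, after) if b != a]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


before = [10, 12, 9, 15, 11]  # hypothetical scores before training
after = [12, 15, 9, 14, 16]   # the same staff after training
print(wilcoxon_signed_rank(before, after))  # (9.0, 1.0)
```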
IMPACT cycle
Identify questions
Master the data
Perform test plan
Address and refine results
Communicate insights
Track outcomes
ETL process
Extract, Transform and Load data.
Goal is to identify and obtain the data needed to solve the problem
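Here is a minimal ETL sketch using only Python's standard library. It extracts rows from CSV text and transforms them by dropping a row with a missing amount, casting types and standardising region names. It then loads the result into an in-memory SQLite table. The column names, cleaning rules and data are hypothetical.

```python
import csv
import io
import sqlite3

RAW_CSV = """invoice_id,amount,region
1001, 250.00 ,north
1002,,south
1003,175.5,NORTH
"""  # hypothetical source data


def extract(text):
    """Extract: read rows from a CSV source (an in-memory string here)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: skip rows missing an amount, cast types, standardise region."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        clean.append((int(row["invoice_id"]), float(amount), row["region"].strip().lower()))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices (id INTEGER PRIMARY KEY, amount REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
totals = conn.execute("SELECT region, SUM(amount) FROM invoices GROUP BY region").fetchall()
print(totals)  # [('north', 425.5)]
```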
Multiple series with closely related data - what graph?
Line graph
Single data series - what graph?
Bar graph
Two data series - what graph?
Combo chart
Relationship between 2 data series and determining their correlation - what graph?
Scatter plot
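A scatter plot is often paired with the Pearson correlation coefficient, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²), which ranges from -1 to 1. The sketch below computes r directly, and on Python 3.10+ `statistics.correlation` gives the same result. The data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Numerator: sum of co-deviations; denominator: product of root sums of squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


ad_spend = [1, 2, 3, 4, 5]  # hypothetical spend, in $000
sales = [2, 4, 5, 4, 5]     # hypothetical sales, in $000
print(round(pearson_r(ad_spend, sales), 3))  # 0.775
```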
Variance analysis (explaining how the actual result differs from budget) - what graph?
Waterfall chart
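A waterfall chart starts from the budget figure and draws each variance as a floating bar that rises or falls from the running total. It ends at the actual result. The sketch below computes those bar positions, which any plotting tool could then draw. The variance categories and amounts are hypothetical.

```python
def waterfall_bars(start, variances):
    """Return (label, bottom, top) for each floating bar, plus the ending total."""
    bars = []
    running = start
    for label, delta in variances:
        new_total = running + delta
        # A positive delta rises from the running total; a negative one falls.
        bars.append((label, min(running, new_total), max(running, new_total)))
        running = new_total
    return bars, running


budget_profit = 100
variances = [("Sales volume", 30), ("Selling price", -10), ("Material cost", -15)]
bars, actual_profit = waterfall_bars(budget_profit, variances)
for label, bottom, top in bars:
    print(f"{label:<14} {bottom:>4} -> {top:>4}")
print("Actual profit:", actual_profit)  # Actual profit: 105
```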
Distribution of dataset - what graph?
Histogram
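A histogram groups values into equal-width bins and counts how many fall into each. This minimal sketch uses half-open bins [start + i·width, start + (i + 1)·width). The bin settings and data are hypothetical, and values outside the covered range are simply ignored here.

```python
def histogram_counts(data, start, bin_width, n_bins):
    """Count values per half-open bin; values outside the bins are skipped."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // bin_width)  # index of the bin x falls into
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


days_to_pay = [3, 7, 12, 15, 18, 21, 22, 29]  # hypothetical invoice data
width = 10
counts = histogram_counts(days_to_pay, start=0, bin_width=width, n_bins=3)
for i, count in enumerate(counts):
    low = i * width
    print(f"{low:>2}-{low + width - 1:<2} {'#' * count}")  # text bar per bin
```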
Descriptive Analytics
Tells you what happened in the past
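Descriptive analytics typically summarises historical data with measures such as counts, totals, averages and ranges. Here is a minimal sketch using Python's `statistics` module on hypothetical monthly sales figures.

```python
import statistics

monthly_sales = [120, 135, 128, 150, 142, 160]  # hypothetical past six months

# Summary measures describing what already happened.
summary = {
    "months": len(monthly_sales),
    "total": sum(monthly_sales),
    "mean": round(statistics.mean(monthly_sales), 2),
    "median": statistics.median(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}
print(summary)
```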