Introduction to data analytics Flashcards

1
Q

What are the 4 Vs of Big Data?

A

Volume
Velocity
Variety
Veracity

2
Q

What paradigm do big data scientists use?

A

Retrospective data mining with multiple hypotheses

Looking for patterns without a particular hunch

3
Q

Types of Data?

A

Structured Data

  • SQL Database
  • Excel Spreadsheet / CSV File

Unstructured Data

  • Free Text Responses
  • Doctor’s notes
4
Q

Typical Data Structures

A
Typical Datasets
- CSV
- eXtensible Markup Language (XML)
- JavaScript Object Notation (JSON)
   - Nested JSON in CSV
- SQL
- Excel Data Formats
Other
- .txt files for text
- RGB data for images
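
A minimal sketch of the "nested JSON in CSV" case, using only Python's standard `csv` and `json` modules. The sample data, the `profile` column and the `load_rows` helper are assumptions made up for illustration.

```python
import csv
import io
import json

# Hypothetical CSV export: the "profile" column holds a JSON object per row.
# Inside a quoted CSV field, a literal double quote is written as "".
RAW_CSV = '''id,name,profile
1,Ada,"{""age"": 36, ""tags"": [""maths"", ""engines""]}"
2,Alan,"{""age"": 41, ""tags"": [""logic""]}"
'''

def load_rows(text):
    """Parse CSV text and decode the nested JSON column into Python objects."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["id"] = int(row["id"])                 # CSV values arrive as strings
        row["profile"] = json.loads(row["profile"])  # decode the embedded JSON
        rows.append(row)
    return rows

rows = load_rows(RAW_CSV)
print(rows[0]["profile"]["tags"])
```

The flat columns stay tabular, while the JSON column becomes a nested dict that can be flattened later if needed.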
5
Q

What is Quantitative Data?

A

Numbers, such as:

  • Temperature
  • Heart rate
  • Likert ratings
6
Q

What is Qualitative Data?

A

Text, such as:

  • Surveys
  • Interviews
  • App comments / User Feedback
7
Q

What is Quantitative Analysis?

A

Quantitative Statistical Analysis:

  • Descriptive statistics: Mean and SD
  • Inferential Statistics: t-tests
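
A rough sketch of both kinds of statistic with Python's standard `statistics` module. The heart-rate samples are invented, and Welch's variant of the t-test is an assumption, since the card does not say which t-test is meant. Only the t statistic is computed here; a p-value additionally needs the t distribution, for example via `scipy.stats.ttest_ind(..., equal_var=False)`.

```python
import math
import statistics

# Hypothetical resting heart rates (beats per minute) for two groups.
group_a = [62, 65, 70, 68, 64]
group_b = [72, 75, 71, 78, 74]

def describe(sample):
    """Descriptive statistics: mean and sample standard deviation."""
    return statistics.mean(sample), statistics.stdev(sample)

def welch_t(x, y):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

mean_a, sd_a = describe(group_a)
mean_b, sd_b = describe(group_b)
t_stat = welch_t(group_a, group_b)
print(mean_a, sd_a, mean_b, sd_b, t_stat)
```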
8
Q

What is Qualitative Analysis?

A

Thematic Analysis
Advanced methods:
- Text analytics: word embeddings
- Review mining
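
As a very simple starting point for review mining, the sketch below counts terms across a few invented app reviews. This is plain term counting, not word embeddings or thematic analysis; the reviews, the `STOPWORDS` set and the `term_counts` helper are all assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical app reviews; real review mining would use a far larger corpus.
reviews = [
    "Love the app, but login is slow.",
    "Login fails after the update.",
    "Great design, slow sync though.",
]

# Tiny hand-picked stopword list, just enough for this sample.
STOPWORDS = {"the", "but", "is", "after", "though"}

def term_counts(texts):
    """Count lower-cased word tokens across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

counts = term_counts(reviews)
print(counts.most_common(2))
```

Frequent terms such as "login" and "slow" hint at candidate themes that a human analyst would then inspect.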

9
Q

Gartner Analytic Continuum

A
Descriptive Analytics
- Hindsight
Diagnostic Analytics
- Insight
Predictive Analytics
- Foresight
Prescriptive Analytics

Increasing difficulty and value

10
Q

Typical Data Analytics Process

A

Data gathering/wrangling/linking -> data cleansing -> exploratory data analysis [EDA] -> supervised machine learning

EDA

  • Data visualisation
  • Association mining
  • Unsupervised machine learning

Supervised Machine Learning

  • Feature engineering
  • Model building
  • Model optimisation
  • Model evaluation
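
The stages above can be sketched end to end on a toy dataset. Everything here is an assumption for illustration: the records, the body-mass-index feature, the "low"/"high" labels and the nearest-centroid model. Model optimisation is left out to keep the sketch short.

```python
import statistics

# Hypothetical raw records: (height_cm, weight_kg, label); None = missing value.
raw = [
    (170, 65, "low"), (160, 90, "high"), (180, None, "low"),
    (175, 70, "low"), (165, 95, "high"), (158, 85, "high"),
    (182, 75, "low"), (168, 92, "high"),
]

# 1. Data cleansing: drop records with missing values.
clean = [r for r in raw if None not in r]

# 2. Feature engineering: derive body-mass index from height and weight.
def bmi(height_cm, weight_kg):
    return weight_kg / (height_cm / 100) ** 2

data = [(bmi(h, w), label) for h, w, label in clean]

# 3. EDA: compare the engineered feature across labels.
summary = {
    label: statistics.mean(x for x, y in data if y == label)
    for label in {"low", "high"}
}

# 4. Model building: nearest-centroid classifier fitted on a training split.
train, test = data[:5], data[5:]
centroids = {
    label: statistics.mean(x for x, y in train if y == label)
    for label in {y for _, y in train}
}

def predict(x):
    """Return the label whose training centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# 5. Model evaluation: accuracy on the held-out split.
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(summary, accuracy)
```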
11
Q

Machine Learning

A

Supervised ML

  • Labelled data
  • Deep learning (DL), support vector machines (SVM), logistic regression (logit), decision trees, k-nearest neighbours (k-NN)

Unsupervised ML

  • Unlabelled data
  • Clustering, association rule mining

Semi-supervised ML
- Some labelled data
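
To make the labelled/unlabelled distinction concrete, here is a small pure-Python sketch: k-NN uses the labels, while k-means sees only the raw values. The data points, the starting centres and both helper functions are assumptions for illustration; semi-supervised learning is not shown.

```python
# Hypothetical 1-D measurements; labels exist only for the supervised case.
labelled = [(1.0, "a"), (1.2, "a"), (0.8, "a"),
            (5.0, "b"), (5.3, "b"), (4.9, "b")]
unlabelled = [x for x, _ in labelled]

def knn_predict(train, x, k=3):
    """Supervised: majority vote among the k nearest labelled points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def kmeans_1d(points, centres, iterations=10):
    """Unsupervised: Lloyd's k-means from given starting centres; no labels."""
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            # Assign each point to its nearest current centre.
            idx = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            clusters[idx].append(p)
        # Move each centre to the mean of its cluster (keep it if empty).
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters

prediction = knn_predict(labelled, 4.5)
centres, clusters = kmeans_1d(unlabelled, centres=[0.0, 6.0])
print(prediction, centres, clusters)
```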

12
Q

The 5 Tribes of ML & the No free lunch theorem

A

Tribe | Focus | Typical methods
Symbolists | Structure Inference | Production Rule Systems & Inverse Deduction
Connectionists | Estimating Parameters | Backpropagation & Deep Learning
Bayesians | Weighing Evidence | HMMs & Graphical Models
Evolutionaries | Structure Learning | Genetic Algorithms & Evolutionary Programming
Analogisers | Mapping to Novelty | kNN & SVM

13
Q

The Neat and Scruffy Data Scientist

A

Neat: they care about the details and the ML methods
Scruffy: they care about the results and are somewhat ignorant of details and the methods
