Week 1 - Supervised ML + Linear Regression Flashcards

1
Q

How do we describe data?

A

The 4 Vs of Big Data
Velocity - streaming data (sensors etc)
Veracity - uncertainty of data (poor quality)
Variety - different forms
Volume - scale

2
Q

Structured vs semi-structured vs unstructured

A

Structured - adheres to a data model (tabular format, e.g. SQL tables), which makes it easier to contextualise and understand
Semi-structured - doesn’t follow the tabular structure but does contain tags and metadata to separate semantic elements and establish hierarchies of records and fields (e.g. XML)
Unstructured - information that is not arranged according to a preset data model or schema (e.g. text and audio)
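
As a rough illustration of the three categories, the sketch below holds the same fact in each form. The record (Ada, London) and all variable names are made up for this example; only the standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: rows and columns that follow a fixed schema.
structured = "id,name,city\n1,Ada,London\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: no table, but tags mark the fields and their nesting.
semi_structured = "<person id='1'><name>Ada</name><city>London</city></person>"
person = ET.fromstring(semi_structured)

# Unstructured: free text with no schema, so fields have to be inferred.
unstructured = "Ada moved to London last year."

print(rows[0]["city"])            # field looked up by column name
print(person.find("city").text)   # field looked up by tag
print("London" in unstructured)   # only a plain text search is available
```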

3
Q

What is data integration?

A

Consolidating data from heterogeneous sources into a single coherent data source

4
Q

What are the 5 data integration techniques?

A

Uniform data access
Common data storage
Application based integration
Common user interface
Middleware data integration

5
Q

What is uniform data access?

A

A technique that retrieves and uniformly displays data but leaves it in its original source.

Use to automate and translate communications between systems and allow for more complicated analysis

6
Q

What is common data storage?

A

An approach that retrieves and uniformly displays data but also makes a copy of the data and stores it.

Use to create and store a copy of original data and present uniformly for sophisticated data analysis
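
One way to picture the difference between uniform data access and common data storage is a toy sketch like the one below. The two sources (`crm`, `billing`), their field names and the `unified_view` helper are all hypothetical; real integration tools work on databases and services, not in-memory lists.

```python
# Two hypothetical sources that describe customers with different field names.
crm = [{"customer": "Ada", "town": "London"}]
billing = [{"name": "Bob", "city": "Leeds"}]

def unified_view():
    """Uniform data access: read the sources on every call and store nothing."""
    for rec in crm:
        yield {"name": rec["customer"], "city": rec["town"]}
    for rec in billing:
        yield {"name": rec["name"], "city": rec["city"]}

# Common data storage: keep a copy of the data in the uniform format.
warehouse = list(unified_view())

# A later change in a source shows up through the view,
# but not in the stored copy until that copy is refreshed.
crm.append({"customer": "Cy", "town": "York"})
live_count = len(list(unified_view()))
stored_count = len(warehouse)
print(live_count, stored_count)
```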

7
Q

What is application based integration?

A

Software applications locate, retrieve and integrate data by making data from different sources and systems compatible with one another.

Use to automate and translate communications between legacy and modern systems

8
Q

What is common user interface?

A

The user manually conducts all phases of the integration, from retrieval to presentation.

Use to merge a small number of data sources for basic analysis

9
Q

What is middleware data integration?

A

Middleware is a type of software that facilitates communication between legacy systems and modern systems.

Use to automate and translate communications between legacy and modern systems.

10
Q

Supervised vs unsupervised vs semi-supervised

A

Supervised - uses data with labelled outcomes
Unsupervised - uses data without labelled outcomes
Semi-supervised - uses both data with labelled outcomes and without labelled outcomes

11
Q

Two types of supervised ML?

A

Regression
Classification

12
Q

Parameters vs hyperparameters?

A

Parameters: the values that change as the model learns from the data. (e.g. regression coefficients)

Hyperparameters: settings that are not learned directly from the data but relate to implementation, i.e. how we train our ML model.
(e.g. in simple linear regression, whether to include the intercept in the model)
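
A minimal sketch of the distinction, assuming an ordinary least-squares fit of a straight line: the `fit_intercept` flag is a hyperparameter we choose before training, while the returned slope and intercept are parameters learned from the data. The function `fit_line` and the toy data are made up for illustration.

```python
def fit_line(xs, ys, fit_intercept=True):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if fit_intercept:
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx = sum((x - mean_x) ** 2 for x in xs)
        slope = sxy / sxx
        return slope, mean_y - slope * mean_x
    # Without an intercept the line is forced through the origin.
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return slope, 0.0

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1

# Same data, different hyperparameter choice, different learned parameters.
slope_a, intercept_a = fit_line(xs, ys, fit_intercept=True)
slope_b, intercept_b = fit_line(xs, ys, fit_intercept=False)
print(slope_a, intercept_a)
print(slope_b, intercept_b)
```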

13
Q

Difference between regression and classification

A

Regression refers to any time we are trying to predict a numeric value.

Classification is when the outcome variable is categorical.
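
As a small illustration, the same input feature can feed either task; what changes is the type of the outcome. The data and the deliberately naive baselines below (predict the mean, predict the most common class) are invented for this sketch.

```python
from collections import Counter
from statistics import mean

hours_studied = [1, 2, 3, 4]
exam_score = [52.0, 61.0, 70.0, 79.0]      # numeric outcome: regression
passed = ["fail", "pass", "pass", "pass"]  # categorical outcome: classification

# Naive baselines: a regression model outputs a number,
# a classification model outputs one of the class labels.
baseline_score = mean(exam_score)
baseline_label = Counter(passed).most_common(1)[0][0]
print(baseline_score, baseline_label)
```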

14
Q

What is the loss function?

A

A quantitative measure of how close the prediction yp was to the true value y. The update rule determines how to update the model parameters, i.e. it finds the parameters that minimise this loss function.
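
To make the loss and the update rule concrete, here is a sketch that assumes mean squared error as the loss and plain gradient descent as the update rule for a line y = w * x + b. Both choices, the learning rate and the toy data are assumptions for illustration; the card does not prescribe them.

```python
def mse(w, b, xs, ys):
    """Loss: mean squared difference between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_step(w, b, xs, ys, lr):
    """Update rule: move w and b a small step against the loss gradient."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1

w, b = 0.0, 0.0
initial_loss = mse(w, b, xs, ys)
for _ in range(2000):
    w, b = gradient_step(w, b, xs, ys, lr=0.05)
final_loss = mse(w, b, xs, ys)
print(initial_loss, final_loss, w, b)
```

The loss shrinks as the parameters approach the values that generated the data, which is what "find parameters that minimise the loss" means in practice.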

15
Q

What is the Pearson correlation?

A

A measure of the strength and direction of the linear relationship between two variables, ranging from -1 to 1
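
A minimal sketch of the calculation, using the usual definition (covariance divided by the product of the standard deviations). The helper name `pearson_r` and the sample values are made up; the last case shows that a strong but non-linear relationship can still give a correlation of zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

x = [1.0, 2.0, 3.0, 4.0]
r_increasing = pearson_r(x, [2.0, 4.0, 6.0, 8.0])   # perfectly linear, rising
r_decreasing = pearson_r(x, [8.0, 6.0, 4.0, 2.0])   # perfectly linear, falling
r_curved = pearson_r([-2.0, -1.0, 0.0, 1.0, 2.0],
                     [4.0, 1.0, 0.0, 1.0, 4.0])     # y = x^2, not linear
print(r_increasing, r_decreasing, r_curved)
```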
