Modelling - Past Questions Flashcards

1
Q

What is semantic segmentation?

A

A deep learning algorithm that labels or categorises every pixel in an image.

2
Q

When you are trying to find items that are similar what algorithm would you use?

A

K-nearest neighbour

3
Q

What does the linear learner algorithm show?

A

How a change in an independent variable affects a dependent variable.

4
Q

What type of problem is Random Cut Forest predominantly used for?

A

Anomaly detection

5
Q

What SageMaker algorithm supports recommendations?

A

Factorisation Machines

6
Q

What SageMaker algorithm supports regression?

A

Linear Learner

7
Q

What 4 types of problem can XGBoost be used to solve?

A

Regression, Binary Classification, Multi-class classification and Ranking

8
Q

What format should the training data be in for XGBoost?

A

CSV or LibSVM

9
Q

What is Random Cut Forest used for?

A

To identify anomalies in data (e.g. to detect fraud)

10
Q

How does Random Cut Forest find an anomaly?

A

It provides a score for each data point. A low score = similar to most of the data, high score = anomaly

11
Q

What format should training data for Random Cut Forest be in?

A

CSV or x-recordio-protobuf format

12
Q

For online testing what type of data should you use?

A

live data

13
Q

For offline testing what sort of data should you use?

A

historical data

14
Q

When you perform offline testing of your models which endpoints should you deploy your trained models to?

A

alpha endpoints

15
Q

When using online testing which endpoint should you deploy your trained models to?

A

SageMaker endpoint

16
Q

When trying to select the correct trained model for real-time ML, what steps would you take?

A

Deploy your models to a SageMaker endpoint, then send a portion of live data to each model, and finally evaluate each model.
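
The steps above can be sketched in plain Python. This is only an illustration of splitting live traffic between candidate models and comparing them, not SageMaker's own routing; `split_traffic`, `accuracy`, the two lambda "models" and the synthetic `live` data are all hypothetical stand-ins.

```python
import random


def split_traffic(requests, weights, seed=0):
    """Route each request to one model variant according to `weights`."""
    rng = random.Random(seed)
    names = list(weights)
    routed = {name: [] for name in names}
    for request in requests:
        chosen = rng.choices(names, weights=[weights[n] for n in names])[0]
        routed[chosen].append(request)
    return routed


def accuracy(model, labelled_requests):
    """Fraction of (features, label) pairs the model predicts correctly."""
    if not labelled_requests:
        return 0.0
    hits = sum(1 for x, y in labelled_requests if model(x) == y)
    return hits / len(labelled_requests)


# Hypothetical stand-ins for two deployed models.
models = {
    "model_a": lambda x: x > 0.5,
    "model_b": lambda x: x > 0.3,
}

# Synthetic "live" observations: (feature, true label).
live = [(i / 100, i / 100 > 0.5) for i in range(100)]

# Send roughly half of the traffic to each model, then evaluate each one.
routed = split_traffic(live, {"model_a": 0.5, "model_b": 0.5})
scores = {name: accuracy(models[name], routed[name]) for name in models}
```

The model with the better score on its share of live traffic would be the one to keep.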

17
Q

What is object detection used for?

A

to identify all instances of an object within an image

18
Q

How does object detection give the location of a particular object?

A

It uses a bounding box

19
Q

What type of ML algorithm is Object detection?

A

Supervised

20
Q

What format is recommended for Object Detection training data?

A

Apache MXNet RecordIO

21
Q

What is incremental training?

A

You seed the training of a new model with the artefacts of a previously trained model.

22
Q

When would object detection not be a good idea?

A

For problems at scale

23
Q

What is Latent Dirichlet Allocation used for?

A

Discovering the topics in a set of documents

24
Q

What algorithm would you use to classify millions of high-resolution images?

A

SageMaker built-in Image Classification

25
How does SageMaker's built-in Image Classification work?
It uses a convolutional neural network (CNN) to classify images, and it supports multi-label classification.
26
What is a Factorisation Machine primarily used for?
Detecting interactions between features, e.g. reactions to ads on a web page or item recommendations
27
What types of problem are Factorisation Machines used for?
Classification and regression
28
If you want to find all elements of an item in an image and surround it with a bounding box what algorithm would you use?
Object Detection Algorithm
29
What is a Neural Topic Model algorithm used for?
to group documents into topics using the statistical distribution of words in the documents
30
What do you use XGBoost for?
Predicting a target variable very quickly and efficiently
31
What does XGBoost do with redundant features?
It includes them, which can lead to performance drag
32
Why is removing redundant features outright a bad idea?
There is a risk of information loss
33
How would you solve the issue of redundant features most efficiently and quickly?
Principal Component Analysis
34
How does Principal component analysis work?
It finds composites of features that are uncorrelated
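
As a rough illustration of "composites of features that are uncorrelated", here is a minimal NumPy sketch of PCA via the eigendecomposition of the covariance matrix. The function name `pca` and the synthetic data (two nearly redundant columns plus one independent column) are assumptions made for the example, not part of any SageMaker API.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    cov = np.cov(Xc, rowvar=False)               # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: covariance is symmetric
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]               # directions of largest variance
    return Xc @ components, eigvals[order]


rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# Column 1 is almost a copy of column 0 (a redundant feature).
X = np.hstack([
    base,
    2 * base + 0.01 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

Z, variances = pca(X, n_components=2)
# The columns of Z are uncorrelated: their covariance matrix is diagonal
# up to floating-point error.
z_cov = np.cov(Z, rowvar=False)
```

Three input features, two of them redundant, are reduced to two uncorrelated composites without dropping either original column outright.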
35
What is online learning?
the process of training your model incrementally by giving it data observations as individual observations or in mini-batches
36
What technique can you use within SageMaker to expedite the deployment and operation of your model?
Transfer learning
37
What is transfer learning?
You start with an off-the-shelf trained model and apply it to your different but similar observations.
38
What is incremental learning?
You begin with an existing model you have already trained and extend it with new data.
39
When do you use Out-of-core learning?
When training with huge datasets that you can't load into your server's memory.
40
How does Out-of-core learning work?
The algorithm loads some of the data, trains on that subset, loads another subset of observations, trains on that subset and repeats
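
The load-train-repeat loop can be sketched as follows. This is a minimal illustration using a linear model trained by gradient descent; the `batches` generator is a hypothetical stand-in for reading successive pieces of a file too large for memory, and the true weights `[3.0, -2.0]` and bias `1.0` are made up for the example.

```python
import numpy as np


def batches(n_rows, batch_size, seed=0):
    """Yield (X, y) chunks, standing in for reading pieces of a large file."""
    rng = np.random.default_rng(seed)
    for start in range(0, n_rows, batch_size):
        size = min(batch_size, n_rows - start)
        X = rng.normal(size=(size, 2))
        y = X @ np.array([3.0, -2.0]) + 1.0
        yield X, y


def train_out_of_core(chunks, n_features, learning_rate=0.1, steps_per_chunk=20):
    """Fit a linear model while holding only one chunk in memory at a time."""
    w = np.zeros(n_features)
    b = 0.0
    for X, y in chunks:                      # load a subset
        for _ in range(steps_per_chunk):     # train on that subset
            error = X @ w + b - y
            w -= learning_rate * (X.T @ error) / len(y)
            b -= learning_rate * error.mean()
    return w, b                              # the model has seen every chunk


w, b = train_out_of_core(batches(5000, 500), n_features=2)
```

Only one chunk of 500 rows is ever held at once, yet the parameters are updated using all 5000 rows.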
41
What does the early_stopping hyperparameter do?
Decides whether the algorithm is allowed to stop training early when further training is not necessary
42
What does the learning_rate hyperparameter do?
Decides how quickly the model adapts to new or changing data. Values range from 0.0 to 1.0
43
What does a learning_rate close to 1.0 do?
The model will learn quickly and take into account new observations quickly
44
What does a learning_rate close to 0.0 do?
The model will learn slowly and take into account new observations slowly
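
The effect of a high versus low learning rate can be seen in a toy online update. This sketch is an assumption-laden illustration (a running estimate nudged towards each new observation), not the implementation behind any particular SageMaker `learning_rate` hyperparameter; `online_estimate` and the `stream` values are hypothetical.

```python
def online_estimate(stream, learning_rate, start=0.0):
    """Move the estimate towards each observation by a fraction `learning_rate`."""
    estimate = start
    history = []
    for observation in stream:
        estimate += learning_rate * (observation - estimate)
        history.append(estimate)
    return history


# The data changes abruptly from 0 to 10 halfway through.
stream = [0.0] * 5 + [10.0] * 5

fast = online_estimate(stream, learning_rate=0.9)  # close to 1.0
slow = online_estimate(stream, learning_rate=0.1)  # close to 0.0

# After five observations of the new value, the fast learner is nearly at 10
# while the slow learner is still below 5.
```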
45
What does the use_pretrained_model hyperparameter do?
Defines if you want a pre-trained model to be loaded in before training.
46
What are the three steps needed for deploying a model using Amazon SageMaker Hosting services?
1. Create a model in Amazon SageMaker, including the S3 path where the model artefacts are stored and the Docker registry path for the inference image. 2. Create an endpoint configuration for an HTTPS endpoint. 3. Create an HTTPS endpoint.
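
The three steps map onto three calls of the boto3 SageMaker client. The sketch below assumes `sm_client` is created with `boto3.client("sagemaker")`; the function name `deploy_model`, the `-config` and `-endpoint` naming scheme, the variant name and the default instance type are illustrative choices, not requirements.

```python
def deploy_model(sm_client, name, image_uri, model_data_url, role_arn,
                 instance_type="ml.m5.large"):
    """Create a model, an endpoint configuration, and an HTTPS endpoint."""
    # Step 1: register the model (S3 artefacts + inference image in a Docker registry).
    sm_client.create_model(
        ModelName=name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )

    # Step 2: describe how the endpoint should host the model.
    config_name = name + "-config"
    sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }],
    )

    # Step 3: create the HTTPS endpoint from that configuration.
    endpoint_name = name + "-endpoint"
    sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
    return endpoint_name


# Usage (requires AWS credentials and the boto3 package):
#   import boto3
#   deploy_model(boto3.client("sagemaker"), "demo", image_uri, model_data_url, role_arn)
```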
47
What does IoT Core do?
Allows you to send IoT messages to AWS services without managing infrastructure
48
What does IoT Greengrass do?
Helps you quickly build edge device software and remotely deploy and manage it.
49
What is IoT Analytics specifically built for?
Analysing and enriching highly unstructured IoT data
49
What are Inference Pipelines used for?
to define and deploy pre-trained SageMaker algorithms
49
Can Inference pipelines be used with IoT devices?
No, Inference Pipelines do not have an IoT integration
50
If you wanted to enrich data using Kinesis Data Streams would you need any additional steps?
Yes, you would need Lambda functions to perform the enrichment steps.
51
Which Amazon ML services/features would you use to manage multiple experiments at scale?
Amazon SageMaker model tracking capability
52
What is Amazon SageMaker Inference pipeline used for?
To deploy pre-trained SageMaker algorithms packaged in Docker containers.
53
What can you search for in the Amazon SageMaker model tracking capability?
Key model attributes, e.g. hyperparameter values, algorithms used, and tags associated with the models.
54
What does Amazon SageMaker model experiments capability do?
It does not exist
55
What does Amazon SageMaker model containers capability do?
It does not exist
56
What format must the labelling file be in when using AWS Glue FindMatches ML Transform?
CSV
57
How should the labelling file be structured when using AWS Glue FindMatches ML Transform?
The first two columns are the labeling_set_id and the label. Then the rest should match the schema of the data to be processed.
58
What happens if AWS Glue FindMatches ML Transform can't find a match for a record?
it is assigned a unique label
59
How should the labelling file be encoded when using AWS Glue FindMatches ML Transform?
UTF-8 without BOM
60
Does SageMaker support GPU instances for the Random Cut Forest Algorithm?
No, it does not. It supports only CPU instances.
61
What is K-means?
An unsupervised learning algorithm. It attempts to find discrete groupings within data, where members of a group are as similar as possible to one another.
61
What is the difference between KNN and K-means?
K-Means is unsupervised and KNN is supervised.
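
The difference shows up in what each algorithm is given. In this minimal NumPy sketch, `kmeans` receives only the points, while `knn_predict` also needs their labels. Both functions, the tiny dataset and the first-k-points initialisation are simplifications assumed for illustration.

```python
import numpy as np


def kmeans(X, k, iterations=10):
    """Unsupervised: group points using only their positions (no labels)."""
    centroids = X[:k].copy()  # simplistic initialisation: the first k points
    for _ in range(iterations):
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assignment = distances.argmin(axis=1)
        for j in range(k):
            members = X[assignment == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return assignment, centroids


def knn_predict(X_train, y_train, x, k=3):
    """Supervised: predict the majority label among the k nearest labelled points."""
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances)[:k]
    return int(np.bincount(y_train[nearest]).argmax())


X = np.array([[0.0, 0.0], [5.0, 5.0], [0.2, 0.1],
              [5.1, 4.9], [0.1, 0.3], [4.8, 5.2]])
y = np.array([0, 1, 0, 1, 0, 1])  # labels are used by KNN only

clusters, centroids = kmeans(X, k=2)
prediction = knn_predict(X, y, np.array([4.9, 5.0]))
```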
62
When do you use logistic regression?
When doing supervised classification and the decision boundary is linear.
63
You are building a binary classifier with highly unbalanced data. What three things can you do to improve model performance?
- Collect more data for the class with less data
- Oversample the class with less data
- Create more samples using algorithms such as SMOTE
64
How does SMOTE work?
It uses a k-nearest-neighbours approach to exclude members of the majority class while creating synthetic examples similar to the minority class.
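
A simplified sketch of the idea: each synthetic example is placed on the line between a minority point and one of its nearest minority neighbours, so majority points never take part. The function `smote`, its parameters and the sample data are assumptions for illustration; a production project would more likely use an established library implementation.

```python
import numpy as np


def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating between neighbours."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(minority))
        # Neighbours are searched among minority points only.
        distances = np.linalg.norm(minority - minority[i], axis=1)
        neighbours = np.argsort(distances)[1:k + 1]  # index 0 is the point itself
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)


minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [1.1, 1.1]])
new_points = smote(minority, n_synthetic=6)
```

Because every new point is an interpolation between two minority points, the synthetic samples stay inside the region the minority class already occupies.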
65
What is the easiest