Modelling - Past Questions Flashcards
What is semantic segmentation?
A deep learning algorithm that labels or categorises every pixel in an image.
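To make "labels every pixel" concrete, here is a minimal sketch of the final step only: given per-class scores for each pixel, pick the highest-scoring class. The score values and the two-class setup are made up for illustration, and the network that would produce such scores is not shown.

```python
def label_pixels(scores):
    """Return a class index per pixel.

    scores[row][col] is a list of per-class scores for that pixel
    (hypothetical values standing in for a segmentation network's output).
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]

# A made-up 2x2 image with two classes (0 = background, 1 = object).
scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.3, 0.7]],
]
mask = label_pixels(scores)
```

The result has the same height and width as the image, with one class label per pixel.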
When you are trying to find items that are similar, what algorithm would you use?
K-nearest neighbours (k-NN)
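As a minimal sketch of the idea (not SageMaker's k-NN implementation), the following ranks items by Euclidean distance to a query and keeps the k closest. The item names and feature vectors are hypothetical.

```python
import math

def k_nearest(query, items, k=2):
    """Return the names of the k items whose vectors are closest to query."""
    # Rank item names by Euclidean distance between their vector and the query.
    ranked = sorted(items, key=lambda name: math.dist(query, items[name]))
    return ranked[:k]

# Hypothetical catalogue: item name -> feature vector.
catalogue = {
    "red shirt": (1.0, 0.0),
    "pink shirt": (0.9, 0.1),
    "blue jeans": (0.0, 1.0),
}

nearest = k_nearest((1.0, 0.05), catalogue, k=2)
```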
What does the linear learner algorithm show?
How a change in an independent variable affects a dependent variable.
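The relationship can be illustrated with an ordinary least squares fit on one variable, where the slope is the change in the dependent variable per unit change in the independent variable. This is only a sketch: the data is made up, and it uses the closed-form solution rather than whatever optimiser Linear Learner uses internally.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data generated from y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Here the fitted slope is 2, meaning each unit increase in x raises y by 2.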
What type of problem is Random Cut Forest predominantly used for?
Anomaly detection
What SageMaker algorithm supports recommendations?
Factorisation Machines
What SageMaker algorithm supports regression?
Linear Learner
What 4 types of problem can XGBoost be used to solve?
Regression, Binary Classification, Multi-class classification and Ranking
What format should the training data be in for XGBoost?
CSV or LibSVM
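The two formats can be sketched as follows. The assumptions here are that the CSV has the label in the first column and no header row (which is how the SageMaker built-in algorithm is generally documented to expect it), and that LibSVM lines use 1-based feature indices with zero values omitted. The helper names are hypothetical.

```python
def to_csv_line(label, features):
    """Label first, then the feature values, comma-separated, no header."""
    return ",".join(str(v) for v in [label, *features])

def to_libsvm_line(label, features):
    """'label index:value ...' with zero-valued features left out."""
    # Assumption: 1-based feature indices, as in the classic LIBSVM convention.
    pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label), *pairs])

# One made-up training row: label 1 with three features.
csv_line = to_csv_line(1, [0.5, 0, 3])
libsvm_line = to_libsvm_line(1, [0.5, 0, 3])
```

LibSVM is the more compact choice when most feature values are zero.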
What is Random Cut Forest used for?
To identify anomalies in data (e.g. to find fraud)
How does Random Cut Forest find an anomaly?
It assigns an anomaly score to each data point. A low score means the point is similar to most of the data; a high score indicates an anomaly.
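How high is "high" is a judgment call. A common rule of thumb, assumed here rather than taken from the flashcard, is to flag scores more than three standard deviations above the mean score. The sketch below applies that cutoff to made-up scores; it does not implement Random Cut Forest itself.

```python
import statistics

def flag_anomalies(scores, num_std=3.0):
    """Return the indices of scores above mean + num_std * standard deviation."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    cutoff = mean + num_std * std
    return [i for i, score in enumerate(scores) if score > cutoff]

# Hypothetical anomaly scores: 19 ordinary points and one outlier at the end.
scores = [1.0] * 19 + [5.0]
anomalies = flag_anomalies(scores)
```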
What format should training data for Random Cut Forest be in?
CSV or x-recordio-protobuf format
For online testing, what type of data should you use?
Live data
For offline testing, what sort of data should you use?
Historical data
When you perform offline testing of your models, which endpoints should you deploy your trained models to?
Alpha endpoints
When using online testing, which endpoint should you deploy your trained models to?
SageMaker endpoint
When trying to select the best trained model for real-time ML, what steps would you take?
Deploy your models to a SageMaker endpoint, send a portion of the live data to each model, and then evaluate each model.
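The "send a portion of live data to each model" step can be simulated with weighted random routing. This is a local sketch only: the variant names and weights are hypothetical, and on a real SageMaker endpoint the split would be handled by the endpoint's production variants rather than by code like this.

```python
import random
from collections import Counter

def route_requests(num_requests, weights, seed=0):
    """Assign each request to a model variant in proportion to its weight."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=num_requests)
    return Counter(picks)

# Hypothetical 50/50 split between two candidate models.
counts = route_requests(1000, {"model-a": 0.5, "model-b": 0.5})
```

Each model's predictions on its share of traffic can then be compared on the same metric to pick a winner.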
What is object detection used for?
To identify all instances of an object within an image
How does object detection give the location of a particular object?
It uses a bounding box
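A bounding box is usually given by its corner coordinates. The sketch below assumes corners in the order (xmin, ymin, xmax, ymax), normalised to the range 0 to 1 by the image size, and converts them to pixel coordinates. The box values and image size are made up, and the coordinate layout is an assumption to check against the algorithm's output format.

```python
def to_pixels(box, width, height):
    """Convert a normalised (xmin, ymin, xmax, ymax) box to pixel coordinates."""
    xmin, ymin, xmax, ymax = box
    return (
        round(xmin * width),
        round(ymin * height),
        round(xmax * width),
        round(ymax * height),
    )

# Hypothetical detection in a 640x480 image.
pixel_box = to_pixels((0.25, 0.5, 0.75, 1.0), width=640, height=480)
```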
What type of ML algorithm is Object detection?
Supervised
What format is recommended for Object Detection training data?
Apache MXNet RecordIO
What is incremental training?
You seed a new training job with the artifacts of a previously trained model instead of training from scratch.
When would object detection not be a good idea?
For problems at scale
What is Latent Dirichlet Allocation used for?
Discovering the topics present in a collection of documents
What algorithm would you use to classify millions of high-resolution images?
SageMaker built-in Image Classification
How does SageMaker’s built-in Image Classification work?
It uses a convolutional neural network (CNN) to classify images, and it supports multi-label classification.
What is a Factorisation Machine primarily used for?
Detecting interactions between features, e.g. reactions to ads on a web page or item recommendations
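"Interactions between features" can be made concrete with the standard second-order factorisation machine score: a bias, a linear term per feature, and a pairwise term whose weight is the dot product of the two features' latent vectors. The sketch below uses made-up parameters and the plain double loop for readability; it is not the SageMaker implementation.

```python
def fm_predict(x, w0, w, v):
    """Second-order factorisation machine score for one feature vector x.

    w0 is the bias, w[i] the linear weight of feature i, and v[i] its
    latent vector. All values passed in below are hypothetical.
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            # Interaction strength is the dot product of the latent vectors.
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise

score = fm_predict(
    x=[1.0, 1.0, 0.0],
    w0=0.5,
    w=[0.1, 0.2, 0.3],
    v=[[1.0, 0.0], [0.5, 2.0], [3.0, 1.0]],
)
```

Only the pair of active features (the first two) contributes an interaction term here, because the third feature is zero.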
What are Factorisation Machines used for?
Classification and regression