PAST QUESTIONS Flashcards
Which Python libraries are best for transforming raw feature vectors into a format suited to a SageMaker batch transform job that generates a forecast?
Pandas + Scikit-learn
What is the best Python library for wrangling and manipulating tabular data such as CSV?
Pandas
What Python library is best for transforming raw feature vectors into a format suitable for downstream estimators?
Scikit-learn
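The division of labour between the two libraries can be sketched as follows. This is a minimal illustration, not a prescribed pipeline: the column names and values are hypothetical, and the headerless CSV output assumes the downstream model accepts CSV rows without a header or index.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw records; the column names are illustrative only.
raw = pd.DataFrame(
    {
        "store": ["a", "b", "a", "c"],
        "units_sold": [10.0, None, 7.0, 3.0],
        "price": [2.5, 3.0, 2.5, 4.0],
    }
)

# Pandas does the wrangling: fill the missing value with the column median.
clean = raw.fillna({"units_sold": raw["units_sold"].median()})

# Scikit-learn turns the cleaned columns into numeric feature vectors.
preprocess = ColumnTransformer(
    [
        ("numeric", StandardScaler(), ["units_sold", "price"]),
        ("category", OneHotEncoder(handle_unknown="ignore"), ["store"]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)
features = preprocess.fit_transform(clean)

# Serialise the feature vectors as CSV rows without header or index.
csv_payload = pd.DataFrame(features).to_csv(header=False, index=False)
```

Here `features` has one row per input record, with two scaled numeric columns followed by one column per store category.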
What Python libraries would you use for data visualisation (no data transformation)?
Matplotlib + Plotly
What Python library is used to interface with AWS services such as S3, DynamoDB, SQS, etc.?
Boto3
Does Boto3 have data transformation functions?
No, it merely interfaces with AWS services
What is best used for text tagging, classification and tokenisation but not manipulating data?
Natural Language Toolkit (NLTK)
What is the best Python library for crawling websites to gather structured data?
Scrapy
What hyperparameter setting would you use to get SageMaker Linear Learner algorithm to produce discrete results?
Set predictor_type to binary_classifier
When using XGBoost, what hyperparameter would you set, and to what value, to perform logistic regression?
Set objective to reg:logistic
What hyperparameter setting would you use to get SageMaker Linear Learner algorithm to produce quantitative results?
Set predictor_type to regressor
What would you use Kinesis Data Streams Naive Bayes Classifier for?
You wouldn’t as it does not exist. KDS has no machine learning capabilities.
When using XGBoost, what hyperparameter would you set, and to what value, to produce quantitative answers?
Set objective to reg:squarederror (called reg:linear in older XGBoost versions, where that name is now deprecated)
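The Linear Learner and XGBoost settings from the cards above can be written as plain hyperparameter dictionaries. This is only a sketch: the variable names are made up for illustration, the `num_round` value is an arbitrary placeholder, and `reg:squarederror` is the current name for the objective that older XGBoost releases called `reg:linear`.

```python
# Linear Learner: predictor_type selects the kind of output.
linear_learner_discrete = {"predictor_type": "binary_classifier"}
linear_learner_quantitative = {"predictor_type": "regressor"}

# XGBoost: objective selects the learning task.
xgboost_logistic = {"objective": "reg:logistic"}
xgboost_quantitative = {"objective": "reg:squarederror"}  # formerly "reg:linear"

# A fuller XGBoost configuration also needs a number of boosting rounds.
# The value 100 is a placeholder, not a recommendation.
xgboost_job_hyperparameters = {**xgboost_quantitative, "num_round": 100}
```

Dictionaries like these would typically be passed to whatever training API is in use when the job is configured.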
Does Kinesis Data Analytics provide nearest neighbour?
No, but it does provide Hotspots on streams which detect higher than normal activity using the distance between hotspot and its nearest neighbour. It does not provide ML model update categories.
Which algorithm would work well for near-real-time updates to the model?
Kinesis Data Analytics Random Cut Forest
When would you use SageMaker Random Cut Forest?
Large batch data sets where you don’t need to update the model frequently
How would you use AWS Glue in the best way to build a data schema?
Use Glue crawlers to crawl your ride share data
The Rekognition model is not able to recognise visitors to a building. What might be the issue?
The contents of the face collection. Store multiple images of the same person, with different positions, glasses and poses, to make matching more successful.
What are Face landmarks?
Face landmarks are a set of salient points, usually located at the corners, tips and midpoints of key facial components such as the eyes, lips and nose, which Amazon Rekognition uses.
Could the Face Landmarks filter sharpness impact Rekognition's success?
No. Face landmarks have no sharpness parameter
How could setting the confidence threshold too low impact Rekognition performance?
It could cause a failure in Rekognition
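The two Rekognition points above (several reference images per person, and a confidence threshold on matches) can be sketched with the `index_faces` and `search_faces_by_image` operations. This is a hedged outline rather than a production setup: the helper names, bucket, keys and the default threshold of 90.0 are assumptions, and `client` is expected to be a Rekognition client such as the one returned by `boto3.client("rekognition")`.

```python
def index_visitor(client, collection_id, bucket, visitor_id, image_keys):
    """Add several reference images of one person to a face collection."""
    face_ids = []
    for key in image_keys:
        response = client.index_faces(
            CollectionId=collection_id,
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            ExternalImageId=visitor_id,  # ties every image to the same person
            MaxFaces=1,
        )
        face_ids.extend(rec["Face"]["FaceId"] for rec in response["FaceRecords"])
    return face_ids


def find_visitor(client, collection_id, bucket, image_key, threshold=90.0):
    """Return the ExternalImageId of the best match, or None if nothing matches."""
    response = client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": bucket, "Name": image_key}},
        FaceMatchThreshold=threshold,  # matches below this similarity are dropped
        MaxFaces=1,
    )
    matches = response["FaceMatches"]
    return matches[0]["Face"]["ExternalImageId"] if matches else None
```

Passing the client in as an argument keeps the helpers easy to exercise with a stub, without AWS credentials.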
What Amazon service could you use to produce a dashboard instead of coding a React or Angular UI?
Amazon QuickSight
You are using a regression decision tree. As you train your model you see it is overfitting to your training data. How can you improve your situation and get better training results more efficiently?
Use a random forest by building multiple randomised decision trees and averaging their outputs to get the predictions.
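A small scikit-learn comparison illustrates the idea. The data here is synthetic and the settings (400 samples, 100 trees, noise level) are arbitrary choices for the sketch; it shows a single unpruned tree memorising its training set, and confirms that the forest's prediction is the average of its individual trees' outputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=400)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single unpruned tree fits the training noise almost exactly.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# A forest trains many randomised trees on bootstrap samples.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# The forest prediction is the mean of the individual trees' predictions.
per_tree = np.stack([t.predict(X_test) for t in forest.estimators_])
manual_average = per_tree.mean(axis=0)

print("tree   train/test R^2:", tree.score(X_train, y_train), tree.score(X_test, y_test))
print("forest train/test R^2:", forest.score(X_train, y_train), forest.score(X_test, y_test))
```

Comparing the gap between training and test scores for the two models gives a rough sense of how much averaging reduces overfitting on this data.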