Misc. Items from wrong answers Flashcards
Can you use AWS Rekognition for Semantic Segmentation?
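No. Rekognition handles image classification and object detection, returning labels and bounding boxes rather than pixel-level masks. Semantic segmentation needs a different tool, such as the SageMaker built-in semantic segmentation algorithm.
What SageMaker Data Wrangler scaler is best used to deal with outliers?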
Robust Scaler
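To see why a robust scaler tolerates outliers, here is a minimal sketch of the general technique: subtract the median and divide by the interquartile range. It is an illustration of the idea only, not Data Wrangler's implementation; the function name `robust_scale` and the sample data are made up for this card.

```python
from statistics import median, quantiles

def robust_scale(values):
    """Scale each value as (x - median) / IQR, where IQR = Q3 - Q1."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the values cannot be scaled")
    med = median(values)
    return [(x - med) / iqr for x in values]

# One extreme outlier (1000.0) among otherwise small values.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
scaled = robust_scale(data)
print(scaled)
```

Because the median and IQR barely move when the outlier is added, the five ordinary values stay within about one unit of zero, while the outlier remains clearly separated instead of compressing everything else.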
What is the three-step order to create and share a feature group in SageMaker Feature Store?
Create a feature group.
Associate the feature group with an online store.
Ingest the data into the offline store in streaming mode.
What is the SageMaker Feature Store option Online store for?
It provides low-latency, high-availability, real-time lookups of features, which makes it a good fit for online inference.
What is the SageMaker Feature Store option Offline store for?
Storing historical data when sub-second retrieval is not needed. It supports batch jobs and is used for exploration, model training, and batch inference.
Can a SageMaker Feature Store feature group support both online and offline modes?
Yes. When both stores are enabled, records written to the online store are also replicated to the offline store, so the two stay in sync.
What are the two ways that SageMaker Feature Store ingests data?
Streaming and batch.
What process happens when you use SageMaker Feature Store Streaming Mode and you update a feature?
The records are written to the feature group with a synchronous PutRecord API call.
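A rough sketch of what a streaming-mode write can look like from Python. `put_record` on the boto3 `sagemaker-featurestore-runtime` client takes a feature group name and a list of `FeatureName`/`ValueAsString` pairs; the helper names `to_record` and `put_features`, the feature group name, and the feature values are hypothetical.

```python
def to_record(features):
    """Convert a plain dict into the Record list shape that PutRecord expects."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in features.items()
    ]

def put_features(client, feature_group_name, features):
    """Write one record; the call blocks until the service responds."""
    return client.put_record(
        FeatureGroupName=feature_group_name,
        Record=to_record(features),
    )

# Hypothetical usage (needs AWS credentials and an existing feature group):
# import boto3
# client = boto3.client("sagemaker-featurestore-runtime")
# put_features(client, "customers-fg", {
#     "customer_id": "42",
#     "event_time": "2024-01-01T00:00:00Z",
#     "spend_30d": 129.5,
# })
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.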
What does the Split Data function in SageMaker Data Wrangler do?
It splits your dataset into two or three datasets.
What does the Randomized Split function in SageMaker Data Wrangler do?
Each split is a random, non-overlapping sample of the original dataset.
What does the Ordered Split function in SageMaker Data Wrangler do?
Splits the dataset based on the sequential order of the observations.
What does the Stratified Split function in SageMaker Data Wrangler do?
Splits the dataset so that each category in the input column is proportionally represented in every split.
Which split is good for time series data?
Ordered Split
Which split is good for preventing data leakage?
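Ordered Split, for time-ordered data: training on earlier observations and evaluating on later ones keeps future information out of training. When rows share an entity ID, Split by key also helps by keeping each key in a single split.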
Which split is good for reducing bias?
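Stratified Split. Keeping each category proportionally represented in every split reduces sampling bias, which matters most for imbalanced classes.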
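The three split strategies can be compared side by side with a small stdlib-only sketch. This illustrates the ideas, not Data Wrangler's code; the function names, the 80/20 fraction, and the toy dataset (10% "fraud" labels) are assumptions.

```python
import random
from collections import defaultdict

def ordered_split(rows, train_frac=0.8):
    """Keep the original order: the first part trains, the rest tests."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

def randomized_split(rows, train_frac=0.8, seed=0):
    """Shuffle a copy, then cut; the two parts never overlap."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return ordered_split(shuffled, train_frac)

def stratified_split(rows, key, train_frac=0.8, seed=0):
    """Split each category separately so proportions are preserved."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    train, test = [], []
    for members in groups.values():
        part_train, part_test = randomized_split(members, train_frac, seed)
        train += part_train
        test += part_test
    return train, test

# Toy data: (timestamp, label), with 10 "fraud" rows out of 100.
rows = [(i, "fraud" if i % 10 == 0 else "ok") for i in range(100)]

o_train, o_test = ordered_split(rows)
s_train, s_test = stratified_split(rows, key=lambda r: r[1])
print(max(t for t, _ in o_train), min(t for t, _ in o_test))
print(sum(label == "fraud" for _, label in s_test), len(s_test))
```

In the ordered split every training timestamp precedes every test timestamp, which is the property that matters for time series. In the stratified split the test set keeps the same 10% fraud rate as the full dataset.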
What is the best correlation metric to investigate non-linear relationships between numeric features?
Spearman (a rank correlation, so it captures monotonic relationships even when they are not linear).
What is the chi-square correlation metric?
The chi-square test is a statistical test that assesses association between categorical variables.
What is the Phi correlation metric?
The phi coefficient assesses correlation between binary variables.
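For two binary variables the phi coefficient and the chi-square statistic are closely related: on a 2x2 contingency table, chi-square (without continuity correction) equals n times phi squared. A small sketch, with a made-up table and a hypothetical helper name:

```python
from math import sqrt

def phi_and_chi2(a, b, c, d):
    """2x2 table [[a, b], [c, d]] -> (phi, chi-square without continuity correction)."""
    n = a + b + c + d
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a row or column total is zero")
    phi = (a * d - b * c) / denom
    chi2 = n * phi ** 2
    return phi, chi2

# Rows: feature X is 0/1; columns: feature Y is 0/1 (illustrative counts).
phi, chi2 = phi_and_chi2(30, 10, 10, 30)
print(phi, chi2)
```

Phi ranges from -1 to 1 like a correlation coefficient, while chi-square grows with sample size and is compared against a chi-square distribution to test for association.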
What correlation metric should be used for linear relationships of numeric features?
Pearson
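The difference between the two metrics shows up on a relationship that is monotonic but not linear. The sketch below implements both from scratch (Spearman as Pearson applied to ranks) and compares them on y = x^3; the data is invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

xs = list(range(1, 11))
ys = [x ** 3 for x in xs]  # monotonic but strongly non-linear
print(pearson(xs, ys), spearman(xs, ys))
```

Spearman reports a perfect relationship because the ordering of x and y agrees everywhere, while Pearson falls short of 1 because the points do not lie on a straight line.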
Is Random Cut Forest a good algorithm for fraud detection?
Yes. RCF can identify outliers or unusual patterns within large datasets without the need for labeled data.
Is F1 good to use for classification models?
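Yes. F1 is the harmonic mean of precision and recall, which makes it especially useful when classes are imbalanced and accuracy alone can be misleading.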
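A short worked example of how F1 is computed and why it can tell a different story than accuracy. The labels and predictions below are invented, and `f1_score` here is a hand-written helper, not a library call.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Imbalanced toy labels: 3 positives out of 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = f1_score(y_true, y_pred)
print(accuracy, f1)
```

Here the model finds only one of the three positives and raises one false alarm, so accuracy still looks decent at 0.7 while F1 is 0.4.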