AWS practice test questions Flashcards

1
Q

An analytics company wants to use a fully managed service that automatically scales to handle the transfer of its Apache web logs, syslogs, text, and videos from its web server to Amazon S3 with minimal transformation.
What service can be used for this process?

  1. Kinesis Data Streams
  2. Kinesis Firehose
  3. Kinesis Data Analytics
  4. Amazon Kinesis Video Streams
A
  1. Kinesis Firehose*
  2. Kinesis Data Streams (Kinesis Data Streams requires you to build and manage consumer applications to deliver the data to Amazon S3, so it does not meet the fully managed requirement of this question.)
  3. Kinesis Data Analytics (Kinesis Data Analytics is not an appropriate service here, because it is used for streaming analytics.)
  4. Amazon Kinesis Video Streams (Kinesis Video Streams is not an appropriate service here, because it securely streams video from devices.)
2
Q

A video streaming company wants to analyze its VPC flow logs to build a real-time anomaly detection pipeline. The pipeline must be minimally managed and enable the business to build a near real-time dashboard.

What combination of AWS service and algorithm can the company use for this pipeline?

  1. Amazon SageMaker with RandomCutForest
  2. Kinesis Data Analytics with RandomCutForest
  3. Amazon QuickSight with ML Insights
  4. Apache Spark on Amazon EMR with MLLib
A
  1. Kinesis Data Analytics with RandomCutForest*
  2. Amazon SageMaker with RandomCutForest (Amazon SageMaker cannot be used with streamed data.)
  3. Amazon QuickSight with ML Insights (Amazon QuickSight can only be used with structured datasets that need to be stored in Amazon S3 or a database.)
  4. Apache Spark on Amazon EMR with MLLib (Amazon EMR can be used for this use case, but it doesn’t satisfy the minimally managed requirement.)
3
Q

A data and analytics company is expanding its platform on AWS. The company wants to build a serverless product that preprocesses large structured data, while minimizing the cost for data storage and compute. The company also wants to integrate the new product with an existing ML product that uses Amazon EMR with Spark.

What solution should the company use to build this new product?

  1. Use AWS Lambda for data preprocessing. Save the data in Amazon S3 in CSV format.
  2. Use AWS Glue for data preprocessing. Save the data in Amazon S3 in CSV format.
  3. Use AWS Glue for data preprocessing. Save the data in Amazon S3 in Parquet format.
  4. Use AWS Lambda for data preprocessing. Save the data in Amazon S3 in Parquet format.
A
  1. Use AWS Glue for data preprocessing. Save the data in Amazon S3 in Parquet format.*
  2. Use AWS Lambda for data preprocessing. Save the data in Amazon S3 in CSV format. (AWS Lambda has a 15-minute runtime limit, making it less than ideal for this particular situation. Additionally, saving the data in CSV format will not meet the question’s cost requirements.)
  3. Use AWS Glue for data preprocessing. Save the data in Amazon S3 in CSV format. (AWS Glue will work as a solution for data preprocessing, but saving the data in CSV format does not fulfill the company’s cost requirements.)
  4. Use AWS Lambda for data preprocessing. Save the data in Amazon S3 in Parquet format. (Lambda has a 15-minute runtime limit, making it less than ideal for this particular situation.)
4
Q

A financial organization uses multiple ML models to detect irregular patterns in its data to combat fraudulent activity such as money laundering. They use a TensorFlow-based Docker container on GPU-enabled Amazon EC2 instances to concurrently train the multiple models for this workload.
However, they want to automate the batch data preprocessing and ML training aspects of this pipeline, scheduling them to take place automatically every 24 hours.
What AWS service can they use to do this?
1. AWS Glue
2. AWS Batch
3. Amazon EMR
4. Kinesis Data Analytics

A
  1. AWS Batch*
  2. AWS Glue (AWS Glue cannot import a Docker container with TensorFlow to be used in the pipeline.)
  3. Amazon EMR (Amazon EMR can be used for this use case, but the pipeline will not be automatically scheduled.)
  4. Kinesis Data Analytics (Kinesis Data Analytics can be used only on streaming data.)
5
Q

A real estate startup wants to use ML to predict the value of homes in various cities. To do so, the startup’s data science team is joining real estate price data with other variables such as weather, demographic, and standard of living data.

However, the team is having problems with slow model convergence. Additionally, the model includes large weights for some features, which is causing degradation in model performance.

What kind of data preprocessing technique should the team use to more effectively prepare this data?

  1. Standard scaler
  2. Normalizer
  3. Max absolute scaler
  4. One hot encoder
A
  1. Standard scaler* (Standard scaler is the best option, because it performs scaling and shifting/centering.)
  2. Normalizer (This would perform row normalization. This situation requires column normalization.)
  3. Max absolute scaler (This would scale each column by its max absolute value, but would not shift/center the data.)
  4. One hot encoder (There is no symbolic/string data mentioned here on which to perform one-hot encoding.)
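For reference, a minimal scikit-learn sketch (with made-up feature values) showing how a standard scaler both centers and scales each column, which is what helps with slow convergence and oversized weights:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: [price, temperature, population]
X = np.array([
    [350_000.0, 12.5,  80_000],
    [420_000.0, 18.0, 150_000],
    [275_000.0,  9.0,  40_000],
])

scaler = StandardScaler()            # subtracts the column mean, divides by the column std
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))         # ~0 for every column (centered)
print(X_scaled.std(axis=0))          # ~1 for every column (scaled)
```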
6
Q

A Data Scientist at a retail company is using Amazon SageMaker to classify social media posts that mention the company into one of two categories: Posts that require a response from the company, and posts that do not. The Data Scientist is using a training dataset of 10,000 posts, which contains the timestamp, author, and full text of each post.

However, the Data Scientist is missing the target labels that are required for training.

Which approach can the Data Scientist take to create valid target label data? (Select TWO.)

  1. Ask the social media handling team to review each post using Amazon SageMaker Ground Truth and provide the label
  2. Use the sentiment analysis natural language processing library to determine whether a post requires a response
  3. Use Amazon Mechanical Turk to publish Human Intelligence Tasks that ask Turk workers to label the posts
  4. Use the a priori probability distribution of the two classes. Then, use Monte-Carlo simulation to generate the labels
  5. Use K-Means to cluster posts into various groups, and pick the most frequent word in each group as its label
A
  1. Ask the social media handling team to review each post using Amazon SageMaker Ground Truth and provide the label*
  2. Use Amazon Mechanical Turk to publish Human Intelligence Tasks that ask Turk workers to label the posts*
  3. Use the sentiment analysis natural language processing library to determine whether a post requires a response (Sentiment analysis would not directly create a binary label.)
  4. Use the a priori probability distribution of the two classes. Then, use Monte-Carlo simulation to generate the labels (It’s not clear how this approach would assign the binary classification label that is required by this question.)
  5. Use K-Means to cluster posts into various groups, and pick the most frequent word in each group as its label (This creates labels, but those labels will not align to the required two categories mentioned in this question.)
7
Q

A Data Scientist wants to include “month” as a categorical column in a training dataset for an ML model that is being built. However, the ML algorithm gives an error when the column is added to the training data.
What should the Data Scientist do to add this column?

  1. Convert the “month” column to 12 different columns, one for each month, by using one-hot encoding.
  2. Map the “month” column data to the numbers 1 to 12 and use this new numerical mapped column.
  3. Scale the months using StandardScaler.
  4. Use pandas fillna() to convert the column to numerical data.
A
  1. Convert the “month” column to 12 different columns, one for each month, by using one-hot encoding.*
  2. Map the “month” column data to the numbers 1 to 12 and use this new numerical mapped column. (This is not a good option, because the numerical mapping of the months would imply magnitudes of difference between the months; for instance, April could be treated as twice February.)
  3. Scale the months using StandardScaler. (StandardScaler is used to scale numerical data. It will not work with categorical data, which is what this question is about.)
  4. Use pandas fillna() to convert the column to numerical data. (This approach, which deals with missing data, is not relevant to this question.)
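A minimal pandas sketch of the one-hot approach (the column names and values are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"month": ["Jan", "Feb", "Apr", "Jan"], "sales": [120, 98, 143, 110]})

# One-hot encode "month": one binary column per month value, with no implied ordering
df_encoded = pd.get_dummies(df, columns=["month"], prefix="month")
print(df_encoded)
```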
8
Q

A Data Scientist for a credit card company is creating a solution to predict credit card fraud at the time of transaction. To that end, the Data Scientist is looking to create an ML model to predict fraud and will do so by training that model on an existing dataset of credit card transactions. That dataset contains 1,000 examples of transactions in total, only 50 of which are labeled as fraud.

How should the Data Scientist deal with this class imbalance?

  1. Use the Synthetic Minority Oversampling Technique (SMOTE) to oversample the fraud records
  2. Undersample the non-fraudulent records to improve the class imbalance
  3. Use K-fold cross validation when training the model
  4. Drop all the fraud examples, and use a One-Class SVM to classify
A
  1. Use the Synthetic Minority Oversampling Technique (SMOTE) to oversample the fraud records* (Instead of undersampling the major class, SMOTE is oversampling the minor class synthetically, which makes it the best solution for this situation.)
  2. Undersample the non-fraudulent records to improve the class imbalance (This approach essentially requires throwing away data, which is definitely not a good solution given the small dataset in this question.)
  3. Use K-fold cross validation when training the model (This is a good evaluation technique, but will not improve the model’s capability to differentiate between the classes.)
  4. Drop all the fraud examples, and use a One-Class SVM to classify (This artificially throws away real data, and one-class methods are useful for anomaly detection, not for the binary classification required here.)
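A minimal sketch of SMOTE using the imbalanced-learn library; the synthetic dataset below is a stand-in for the 1,000-transaction dataset with roughly 50 fraud labels:

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Stand-in for the transactions dataset: 1,000 rows, ~5% fraud
X, y = make_classification(n_samples=1_000, weights=[0.95, 0.05], random_state=42)
print(Counter(y))            # heavily imbalanced, e.g. {0: 950, 1: 50}

# Synthetically oversample the minority (fraud) class
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(Counter(y_res))        # both classes now balanced
```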
9
Q

An ML Engineer at a real estate startup wants to use a new quantitative feature for an existing ML model that predicts housing prices. Before adding the feature to the cleaned dataset, the Engineer wants to visualize the feature in order to check for outliers and overall distribution and skewness of the feature.

What visualization technique should the ML Engineer use? (Select TWO.)

  1. Box Plot
  2. Histogram
  3. Scatterplot
  4. Heatmap
  5. T-SNE
A
  1. Box Plot*
  2. Histogram*
  3. Scatterplot (A scatterplot can help check for outliers, but it won’t show the skewness of the data.)
  4. Heatmap (Heatmaps show relationships between two variables, but are not enough to check for overall distribution or skewness in the data.)
  5. T-SNE (t-SNE is used to reduce the dimensionality of the data. It is not used to visualize outliers.)
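A minimal matplotlib sketch of both visualizations on a hypothetical feature with a few injected outliers:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical new feature (e.g. lot size) with two extreme outliers appended
feature = np.concatenate([np.random.lognormal(mean=8, sigma=0.4, size=500), [60_000, 75_000]])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.boxplot(feature)                 # outliers appear as points beyond the whiskers
ax1.set_title("Box plot: outliers")
ax2.hist(feature, bins=40)           # shape of the histogram reveals distribution and skew
ax2.set_title("Histogram: distribution / skewness")
plt.show()
```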
10
Q

A company is using its genomic data to classify how different human DNA affects cell growth, so that they can predict a person’s chances of getting cancer. Before creating and preparing the training and validation datasets for the model, the company wants to reduce the high dimensionality of the data.

What technique should the company use to achieve this goal? (Select TWO.)

  1. Use seaborn distribution plot (distplot) to visualize the correlated data. Remove the unrelated features.
  2. Use T-SNE to reduce the dimensionality of the data. Visualize the data using matplotlib.
  3. Use Principal Component Analysis (PCA) to reduce the dimensionality of the data. Visualize the data using matplotlib.
  4. Calculate the eigenvectors. Use a scatter matrix to choose the best features.
  5. Use L2 regularization to reduce the features used in the data. Visualize the data using matplotlib.
A
  1. Use T-SNE to reduce the dimensionality of the data. Visualize the data using matplotlib.*
  2. Use Principal Component Analysis (PCA) to reduce the dimensionality of the data. Visualize the data using matplotlib.*
  3. Use seaborn distribution plot (distplot) to visualize the correlated data. Remove the unrelated features. (A distribution plot does not show correlation between features, and removing features by hand does not perform dimensionality reduction.)
  4. Calculate the eigenvectors. Use a scatter matrix to choose the best features. (Calculating eigenvectors on its own does not reduce the dimensionality of the data, and a scatter matrix is a visualization rather than a reduction technique.)
  5. Use L2 regularization to reduce the features used in the data. Visualize the data using matplotlib. (L2 regularization is not a feature reduction technique, although for linear models, L1 regularization can act like one.)
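A minimal scikit-learn sketch of both selected techniques on random stand-in data (the shapes are made up for illustration):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in for high-dimensional genomic features: 300 samples x 500 columns
X = np.random.rand(300, 500)

X_pca = PCA(n_components=2).fit_transform(X)                    # linear projection to 2 components
X_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)   # nonlinear 2-D embedding

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(X_pca[:, 0], X_pca[:, 1]); ax1.set_title("PCA")
ax2.scatter(X_tsne[:, 0], X_tsne[:, 1]); ax2.set_title("t-SNE")
plt.show()
```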
11
Q

A Data Scientist wants to create a linear regression model to train on a housing dataset to predict home prices. As part of that process, the Data Scientist created a correlation matrix between the dataset’s features and the target variable. The correlations between the target and two of the features, feature 3 and feature 7, are 0.64 and -0.85, respectively.

Which feature has a stronger correlation with the target variable?

  1. Feature 3
  2. Feature 7
  3. There is not sufficient enough data to determine which variable has a stronger correlation to the target
  4. Feature 7 and feature 3 both have weak correlations to the target
A
  1. Feature 3*
  2. Feature 7 (Feature 7 has a negative correlation with the target variable; even though its magnitude is higher than that of feature 3, this question treats “stronger” as the stronger positive correlation.)
  3. There is not sufficient enough data to determine which variable has a stronger correlation to the target (There is sufficient data; the question treats the more positive correlation as the stronger one.)
  4. Feature 7 and feature 3 both have weak correlations to the target (This is not true, as 0.64 is not considered a weak correlation.)
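A minimal pandas sketch of building a correlation matrix against the target (the column names and values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "feature_3": [1200, 1500, 1700, 2100, 2500],
    "feature_7": [45, 38, 30, 22, 15],
    "price":     [210_000, 250_000, 270_000, 330_000, 380_000],
})

corr = df.corr()                      # Pearson correlation matrix
print(corr["price"].drop("price"))    # correlation of each feature with the target
```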
12
Q

A video streaming company is looking to create a personalized experience for its customers on its platform. The company wants to provide recommended videos to stream based on what other similar users watched previously. To this end, it is collecting its platform’s clickstream data using an ETL pipeline and storing the logs and syslogs in Amazon S3.
What kind of algorithm should the company use to create the simplest solution in this situation?

  1. Regression
  2. Classification
  3. Recommender system
  4. Reinforcement learning
A
  1. Recommender system* (A recommender system is a subclass of information filtering system that seeks to predict the “rating” or “preference” a user would give to an item. It is ideal for the situation in this question.)
  2. Regression (Regression will not provide personalized recommendations to customers, because regression can only predict a number or classify based on historical data.)
  3. Classification (Classification cannot deliver a personalized recommendation for every user.)
  4. Reinforcement learning (Reinforcement learning is a relatively new field and, while there are solutions that use it, it is not the simplest solution, since historical data is already available.)
13
Q

A security and networking company wants to use ML to flag certain IP addresses that have been known to send spam and phishing information. The company wants to build an ML model based on previous user feedback indicating whether specific IP addresses have been connected to a website designed for spam and phishing.
What is the simplest solution that the company can implement?

  1. Regression
  2. Classification
  3. Natural language processing (NLP)
  4. A rule-based solution should be used instead of ML
A
  1. A rule-based solution should be used instead of ML*
  2. Regression (Regression requires a historical dataset with a numerical output, which does not match the requirement of this use case.)
  3. Classification (Classification can work in this situation, but it’s not the simplest solution.)
  4. Natural language processing (NLP) (ML for natural language processing and text analytics is used to
    understand the meaning of text documents. It can be one part of the solution in this context, but it
    is not the simplest solution.)
14
Q

What factors lead to the wide adoption of neural networks in the last decade? (Select THREE.)

  1. Efficient algorithms
  2. Cheaper GPUs
  3. An orders of magnitude increase in data collected
  4. Cheaper CPUs
  5. Wide adoption of cloud-based services
A
  1. Efficient algorithms*
  2. Cheaper GPUs*
  3. An orders of magnitude increase in data collected* (Over the last two decades, the amount of available data of all sorts and the power of our data storing and processing machines (GPUs) have exponentially increased. Combined with better computing capabilities, the massive increases in the amount of available data to train models have allowed the creation of larger, deeper neural networks, which just perform better than smaller ones.)
  4. Cheaper CPUs (GPUs are needed to train neural networks efficiently, so cheaper CPUs don’t have much to do with the wide adoption of neural networks in the last decade.)
  5. Wide adoption of cloud-based services (While cloud-based services made it easy for everyone to do machine learning, they build on the underlying factors, such as efficient algorithms and cheaper GPUs.)
15
Q

An online news organization wants to expand its reach globally by translating some of its most commonly read articles into different languages using ML. The organization’s data science team is gathering all the news articles that they have published in both English and at least one other language. They want to use this data to create one machine learning model for each non-English language that the organization is targeting. The models should only require minimum management.

What approach should the team use to building these models?

  1. Use Amazon SageMaker Object2Vec to create a vector. Use the SockEye model in Amazon SageMaker using Building Your Own Containers (BYOC)
  2. Use Amazon SageMaker Object2Vec to create a vector. Use the Amazon SageMaker built-in Sequence to Sequence model (Seq2Seq)
  3. Use Amazon SageMaker Object2Vec to create a vector. Use Amazon EC2 instances with the Deep Learning Amazon Machine Image (AMI) to create a language encoder-decoder model
  4. Use Amazon SageMaker Object2Vec to create a vector. Then use a Long Short-term Memory (LSTM) model using Building Your Own Containers (BYOC)
A
  1. Use Amazon SageMaker Object2Vec to create a vector. Use the Amazon SageMaker built-in Sequence to Sequence model (Seq2Seq)* (This is the best answer, because Amazon SageMaker takes care of the management and heavy lifting of model training and deployment.)
  2. Use Amazon SageMaker Object2Vec to create a vector. Use the SockEye model in Amazon SageMaker using Building Your Own Containers (BYOC) (BYOC requires more management of the model training process, because you have to maintain the containers and code.)
  3. Use Amazon SageMaker Object2Vec to create a vector. Use Amazon EC2 instances with the Deep Learning Amazon Machine Image (AMI) to create a language encoder-decoder model (This solution is not ideal, given the situation outlined in the question, because it requires you to manage the model training and deployment yourself.)
  4. Use Amazon SageMaker Object2Vec to create a vector. Then use a Long Short-term Memory (LSTM) model using Building Your Own Containers (BYOC) (BYOC requires more management of the model training process, because you have to maintain containers and code.)
16
Q

An ad tech company is using an XGBoost model to classify its clickstream data. The company’s Data Scientist is asked to explain how the model works to a group of non-technical colleagues.

What is a simple explanation the Data Scientist can provide?

  1. XGBoost is an Extreme Gradient Boosting algorithm that is optimized for boosted decision trees
  2. XGBoost is a state-of-the-art algorithm that uses logistic regression to split each feature of the data based
    on certain conditions
  3. XGBoost is a robust, flexible, scalable algorithm that uses logistic regression to classify data into buckets
  4. XGBoost is an efficient and scalable neural network architecture.
A
  1. XGBoost is an Extreme Gradient Boosting algorithm that is optimized for boosted decision trees*
  2. XGBoost is a state-of-the-art algorithm that uses logistic regression to split each feature of the data based
    on certain conditions (XGBoost is an implementation of gradient boosted decision trees designed
    for speed and performance.)
  3. XGBoost is a robust, flexible, scalable algorithm that uses logistic regression to classify data into buckets
    (XGBoost uses decision trees to perform both regression and classification.)
  4. XGBoost is an efficient and scalable neural network architecture. (XGBoost is not a neural network but
    a tree boosting algorithm.)
17
Q

An ML scientist has built a decision tree model using scikit-learn with 1,000 trees. The training accuracy for the model was 99.2% and the test accuracy was 70.3%.

Should the Scientist use this model in production?

  1. Yes, because it is generalizing well on the training set
  2. No, because it is generalizing well on the training set
  3. No, because it is not generalizing well on the test set
  4. Yes, because it is not generalizing well on the test set
A
  1. No, because it is not generalizing well on the test set* (This is correct, because the model is not generalizing well, as illustrated by the difference in accuracy scores between training and testing. Therefore, the model should not be used in production.)
  2. Yes, because it is generalizing well on the training set (This is incorrect, because the model is not generalizing well, as illustrated by the difference in accuracy scores between training and testing.)
  3. No, because it is generalizing well on the training set (This is incorrect, because the model is not generalizing well, as illustrated by the difference in accuracy scores between training and testing.)
  4. Yes, because it is not generalizing well on the test set (This is correct in that the model is not generalizing well, but as a result, the scientist shouldn’t use the model in production.)
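As a quick illustration of the gap that signals overfitting, here is a scikit-learn sketch on a synthetic dataset, using a 1,000-tree random forest as a stand-in for the ensemble described in the question:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=1_000, random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train={train_acc:.3f} test={test_acc:.3f}")  # a large gap indicates overfitting
```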
18
Q

A Machine Learning Engineer wants to use Amazon SageMaker and the built-in XGBoost algorithm for model training. The training data is currently stored in CSV format, with the first 10 columns representing features and the 11th column representing the target label.

What should the ML Engineer do to prepare the data for use in an Amazon SageMaker training job?

  1. The target label should be changed to the first column. The data should be split into training, validation, and test sets. Finally, the datasets should be uploaded to Amazon S3.
  2. The dataset should be uploaded directly to Amazon S3. Amazon SageMaker can then be used to split the data into training, validation, and test sets.
  3. The data should be split into training, validation, and test sets. The datasets should then be uploaded to Amazon S3.
  4. The target label should be changed to the first column. The dataset should then be uploaded to Amazon S3. Finally, Amazon SageMaker can be used to split the data into training, validation, and test sets.
A
  1. The target label should be changed to the first column. The data should be split into training, validation, and test sets. Finally, the datasets should be uploaded to Amazon S3.* (For training data in CSV format, the XGBoost algorithm assumes that the target variable is in the first column and that the file does not have a header record. Refer to https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html#InputOutput-XGBoost for more information.)
  2. The dataset should be uploaded directly to Amazon S3. Amazon SageMaker can then be used to split the data into training, validation, and test sets. (You should split the data before you upload the train, test and validation datasets to Amazon S3. Amazon S3 cannot split the data automatically.)
  3. The data should be split into training, validation, and test sets. The datasets should then be uploaded to Amazon S3. (Splitting the data before uploading to Amazon S3 is right, but Amazon SageMaker expects the first column to be the target variable which has to be done before upload.)
  4. The target label should be changed to the first column. The dataset should then be uploaded to Amazon S3. Finally, Amazon SageMaker can be used to split the data into training, validation, and test sets. (Amazon SageMaker cannot split the data automatically.)
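A minimal sketch of these preparation steps; the file name, column name, and bucket name are hypothetical:

```python
import boto3
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("clickstream.csv")          # assumed: 10 feature columns plus a "label" column

# Move the target label to the first column, as the built-in XGBoost algorithm expects for CSV
cols = ["label"] + [c for c in df.columns if c != "label"]
df = df[cols]

train, rest = train_test_split(df, test_size=0.3, random_state=42)
val, test = train_test_split(rest, test_size=0.5, random_state=42)

s3 = boto3.client("s3")
for name, split in [("train", train), ("validation", val), ("test", test)]:
    path = f"/tmp/{name}.csv"
    split.to_csv(path, header=False, index=False)   # no header record for CSV input
    s3.upload_file(path, "my-example-bucket", f"xgboost/{name}/{name}.csv")
```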
19
Q

A navigation and transportation company is using satellite images to model weather around the world in order to create optimal routes for its ships and planes. The company is using Amazon SageMaker training jobs to build and train its models.
However, during training, it takes too long to download the company’s 100 GB data from Amazon S3 to the training instance before the training starts.

What should the company do to speed up its training jobs while keeping the costs low?

  1. Increase the instance size for training
  2. Increase the batch size in the model
  3. Change the input mode to Pipe
  4. Create an Amazon EBS volume with the data on it and attach it to the training job
A
  1. Change the input mode to Pipe* (With Pipe input mode, your dataset is streamed directly to your training instances instead of being downloaded first. This means that your training jobs start sooner, finish quicker, and need less disk space.)
  2. Increase the instance size for training (Increasing the instance size may increase network throughput a little, but it won’t speed up the training job, because the job still has to wait for the whole dataset to download to the instance.)
  3. Increase the batch size in the model (Increasing the batch size doesn’t help reduce the data-download time.)
  4. Create an Amazon EBS volume with the data on it and attach it to the training job (An EBS volume could help with data-access speed, but you cannot attach an existing EBS volume to a training job.)
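A minimal SageMaker Python SDK sketch showing where the input mode is set; the image URI, role, instance type, and S3 path are placeholders:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()

estimator = Estimator(
    image_uri="<training-image-uri>",        # container/algorithm that supports Pipe mode
    role="<execution-role-arn>",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    input_mode="Pipe",                       # stream data from S3 instead of downloading it first
    sagemaker_session=session,
)

estimator.fit({"train": TrainingInput("s3://my-example-bucket/satellite/train/")})
```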
20
Q

A Data Scientist wants to tune the hyperparameters of a machine learning model to improve the model’s F1 score.
What technique can be used to achieve this desired outcome on Amazon SageMaker? (Select TWO)

  1. Grid Search
  2. Random Search
  3. Breadth First Search
  4. Bayesian optimization
  5. Depth first search
A
  1. Random Search* (Random Search replaces the exhaustive enumeration of all combinations by selecting them randomly. It can outperform Grid Search, especially when only a small number of hyperparameters affects the final performance of the machine learning algorithm.)
  2. Bayesian optimization* (Bayesian optimization builds a probabilistic model of the function mapping from hyperparameter values to the objective evaluated on a validation set. In practice, Bayesian optimization has been shown to obtain better results in fewer evaluations than grid search and random search, because it can reason about the quality of experiments before they are run. Amazon SageMaker supports Bayesian hyperparameter optimization.)
  3. Grid Search (The traditional way of performing hyperparameter optimization has been grid search, or a parameter sweep: an exhaustive search through a manually specified subset of the hyperparameter space, guided by a performance metric typically measured by cross-validation on the training set or evaluation on a held-out validation set.)
  4. Breadth First Search (Breadth-first search is not an algorithm for hyperparameter optimization; it is an algorithm for traversing or searching tree or graph data structures.)
  5. Depth first search (Depth-first search is likewise a tree and graph traversal algorithm, not a hyperparameter optimization method.)
21
Q

A Data Scientist is using stochastic gradient descent (SGD) as the gradient optimizer to train a machine learning model. However, the model training error is taking longer to converge to the optimal solution than desired.

What optimizer can the Data Scientist use to improve training performance? (Select THREE)

  1. Adam
  2. Adagrad
  3. Gradient Descent
  4. RMSProp
  5. Mini-batch gradient descent
  6. Xavier
A
  1. Adam* (Adam stands for adaptive momentum which can help the model converge faster and get out of being stuck in local minima.)
  2. Adagrad* (Adagrad is an algorithm for gradient-based optimization that adapts the learning rate to the parameters by performing smaller updates and, in turn, helps with convergence.)
  3. RMSProp* (RMSProp uses a moving average of squared gradients to normalize the gradient itself, which helps with faster convergence.)
  4. Gradient Descent (Gradient descent will take longer to converge than SGD does since it needs the whole dataset for every step calculation.)
  5. Mini-batch gradient descent (Mini batch gradient descent will suffer from some of the same problems as SGD.)
  6. Xavier (Xavier is an initialization technique and not an optimization technique.)
22
Q

A Data Scientist wants to use the Amazon SageMaker hyperparameter tuning job to automatically tune a
random forest model.

What API does the Amazon SageMaker SDK use to create and interact with the Amazon SageMaker hyperparameter tuning jobs?

  1. HyperparameterTunerJob()
  2. HyperparameterTuner()
  3. HyperparameterTuningJobs()
  4. Hyperparameter()
A
  1. HyperparameterTuner()* (This is the correct class for creating and interacting with Amazon SageMaker hyperparameter tuning jobs, as well as deploying the resulting model(s). It takes an estimator to obtain configuration information for the training jobs that are created as the result of a hyperparameter tuning job. Refer to https://sagemaker.readthedocs.io/en/stable/tuner.html for more information.)
  2. HyperparameterTunerJob()
  3. HyperparameterTuningJobs()
  4. Hyperparameter()
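A minimal SageMaker Python SDK sketch of the HyperparameterTuner API; the image URI, role, objective metric name, and hyperparameter names are placeholders that depend on the actual training container:

```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

estimator = Estimator(
    image_uri="<training-image-uri>",
    role="<execution-role-arn>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:f1",     # metric the tuning job tries to maximize
    objective_type="Maximize",
    hyperparameter_ranges={
        "max_depth": IntegerParameter(3, 12),
        "eta": ContinuousParameter(0.01, 0.3),
    },
    strategy="Bayesian",                       # "Random" is also supported
    max_jobs=20,
    max_parallel_jobs=2,
)

tuner.fit({"train": "s3://my-example-bucket/train/",
           "validation": "s3://my-example-bucket/validation/"})
```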
23
Q

A Machine Learning Engineer is creating a regression model for forecasting company revenue based on an internal dataset made up of past sales and other related data.

What metric should the Engineer use to evaluate the ML model?

  1. Cross-entropy log loss
  2. Sigmoid
  3. Root Mean squared error (RMSE)
  4. Precision
A
  1. Root Mean squared error (RMSE)* (Residuals are a measure of how far from the regression line data
    points are; RMSE is a measure of how spread out these residuals are. The RMSE is the square root of the variance of the residuals. It indicates the absolute fit of the model to the data, or, put another way, how close the observed data points are to the model’s predicted values.)
  2. Cross-entropy log loss (Cross-entropy log loss is generally used for classification.)
  3. Sigmoid (Sigmoid maps an input value to an output between 0 and 1. It is an activation function, not an evaluation metric for this use case.)
  4. Precision (Precision is the percentage of positive predictions that are correct; it is a classification metric, not one used for regression problems.)
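A minimal sketch of computing RMSE with scikit-learn and NumPy (the revenue values are made up):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([120.0, 98.5, 143.0, 110.0])   # actual revenue
y_pred = np.array([115.0, 102.0, 150.0, 108.0])  # model forecasts

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # square root of the mean squared residual
print(f"RMSE = {rmse:.2f}")
```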
24
Q

A Data Scientist at a credit card company trained a classification model to predict fraud at the time of a transaction. The Data Scientist used a confusion matrix to evaluate the performance of the model.
Using the confusion matrix below, determine the percent of positive records that were classified correctly.

Choose the answer that also labels this evaluation metric correctly.

  1. 80%; Recall
  2. 52.6%; Recall
  3. 80%; Precision
  4. 52.6%; Precision
A
  1. 80%; Recall* (In this context, recall is also referred to as the true positive rate. It equals true positives divided by the sum of true positives and false negatives; with the values in the matrix, that works out to 0.8, or 80%.)
  2. 52.6%; Recall (Recall is the right metric to use, but the answer is wrong.)
  3. 80%; Precision (Precision is not the right metric to use.)
  4. 52.6%; Precision (Precision is also referred to as the positive predictive value (PPV). Other related
    measures used in classification include true negative rate and accuracy. True negative rate is also called specificity.)
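For reference, a short sketch of both metrics computed from confusion-matrix counts; the counts below are illustrative values chosen to reproduce the 80% recall and 52.6% precision figures, not the card's actual matrix:

```python
# Illustrative confusion-matrix counts (not the exact values from the card's matrix)
tp, fn, fp = 100, 25, 90

recall = tp / (tp + fn)        # true positive rate: share of actual positives caught
precision = tp / (tp + fp)     # positive predictive value: share of flagged records that are truly positive

print(f"recall = {recall:.1%}")        # 80.0%
print(f"precision = {precision:.1%}")  # 52.6%
```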
25
Q

A financial planning company is using the Amazon SageMaker endpoint with an Auto Scaling policy to serve its forecasting model to the company’s customers to help them plan for retirement. The team wants to update the endpoint with its latest forecasting model, which has been trained using Amazon SageMaker training jobs. The team wants to do this without any downtime and with minimal change to the code.

What steps should the team take to update this endpoint?

  1. Use a new endpoint configuration with the latest model Amazon S3 path in the UpdateEndpoint API.
  2. De-register the endpoint as a scalable target. Update the endpoint using a new endpoint configuration with the latest model Amazon S3 path. Finally, register the endpoint as a scalable target again.
  3. Update the endpoint using a new configuration with the latest model Amazon S3 path. Then, register the endpoint as a scalable target.
  4. Create a new endpoint using a new configuration with the latest model. Then, register the endpoint as a scalable target.
A
  1. De-register the endpoint as a scalable target. Update the endpoint using a new endpoint configuration with the latest model Amazon S3 path. Finally, register the endpoint as a scalable target again.*
  2. Use a new endpoint configuration with the latest model Amazon S3 path in the UpdateEndpoint API. (Using a new endpoint configuration will not have Auto Scaling enabled.)
  3. Update the endpoint using a new configuration with the latest model Amazon S3 path. Then, register the endpoint as a scalable target. (Before you can update the endpoint, you need to deregister the endpoint as a scalable target.)
  4. Create a new endpoint using a new configuration with the latest model. Then, register the endpoint as a scalable target. (Creating a new endpoint will mean that they have to update the code every time the model changes, which isn’t scalable.)
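A minimal boto3 sketch of the deregister / update / re-register sequence; the endpoint, variant, and configuration names are placeholders:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")
sm = boto3.client("sagemaker")

resource_id = "endpoint/forecast-endpoint/variant/AllTraffic"

# 1. Deregister the endpoint variant as a scalable target
autoscaling.deregister_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
)

# 2. Update the endpoint with a new endpoint configuration that points at the latest model
sm.update_endpoint(EndpointName="forecast-endpoint", EndpointConfigName="forecast-config-v2")

# 3. Re-register the variant as a scalable target once the update completes
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)
```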
26
Q

An advertising and analytics company uses machine learning to predict user response to online advertisements using a custom XGBoost model. The company wants to improve its ML pipeline by porting its training and inference code, written in R, to Amazon SageMaker, and do so with minimal changes to the existing code.

How should the company set up this new pipeline?

  1. Use the Amazon pre-built R container option and port the existing code over to the container. Register the container in Amazon Elastic Container Registry (Amazon ECR). Finally, run the training and inference jobs using this container.
  2. Use Amazon in-built algorithms to run their training and inference jobs.
  3. Use the Build Your Own Container (BYOC) Amazon SageMaker option. Create a new Docker container with the existing code. Register the container in Amazon Elastic Container Registry (ECR). Finally, run the training and inference jobs using this container.
  4. Create a new Amazon SageMaker notebook instance. Copy the existing code into an Amazon SageMaker notebook. Then, run the pipeline from this notebook.
A
  1. Use the Build Your Own Container (BYOC) Amazon SageMaker option. Create a new Docker container with the existing code. Register the container in Amazon Elastic Container Registry (ECR). Finally, run the training and inference jobs using this container.*
  2. Use the Amazon pre-built R container option and port the existing code over to the container. Register the container in Amazon Elastic Container Registry (Amazon ECR). Finally, run the training and inference jobs using this container. (Amazon SageMaker’s pre-built containers do not include a container supporting code written in R.)
  3. Use Amazon in-built algorithms to run their training and inference jobs. (Amazon SageMaker built-in algorithms do not support using custom code.)
  4. Create a new Amazon SageMaker notebook instance. Copy the existing code into an Amazon SageMaker notebook. Then, run the pipeline from this notebook. (Amazon SageMaker notebook instances support code written in R, but the notebook won’t be scalable or used in a pipeline.)
27
Q

A multi-national banking organization provides loan services to customers worldwide. Many of its customers still submit loan applications in paper form in one of the bank’s branch locations. The bank wants to speed up the loan approval process for this set of customers by using machine learning. More specifically, it wants to create a process in which customers submit the application to the clerk, who scans and uploads it to the system. The system then reads and provides an approval or denial of the application in a matter of minutes.

What can the bank use to read and extract the necessary data from the loan applications without needing to manage the process?

  1. A custom CNN model
  2. An LSTM model
  3. Amazon Textract
  4. Amazon Personalize
A
  1. Amazon Textract*
  2. A custom CNN model (You will need to manage the process if you are using a convolutional neural network model, which goes against one of the requirements of this question.)
  3. An LSTM model (An LSTM model is not the right model to use for this use case, because it generally works with sequences rather than images.)
  4. Amazon Personalize (Amazon Personalize cannot read and extract data from images.)
28
Q

A video streaming company wants to create a searchable video library that provides a personalized searching experience and automated content moderation for its users, so that when the users search for a keyword, they get all the videos that map to that keyword. The company wants to do this with minimal cost and limited need for management.

What approach should the company take to building this solution?

  1. Use Amazon SageMaker to create an ML model that extracts metadata from the videos
  2. Use Amazon Rekognition Video to extract metadata from the videos
  3. Use Amazon Kinesis Video Streams to stream the videos to Amazon EMR in order to create an ML model
  4. Use AWS Batch to transform a batch of video files into metadata
A
  1. Use Amazon Rekognition Video to extract metadata from the videos*
  2. Use Amazon SageMaker to create an ML model that extracts metadata from the videos (Amazon SageMaker requires management of the pipeline, which does not satisfy the company’s requirements.)
  3. Use Amazon Kinesis Video Streams to stream the videos to Amazon EMR in order to create an ML model
    (Amazon EMR requires management of the pipeline, but the company wants to avoid management.)
  4. Use AWS Batch to transform a batch of video files into metadata (AWS Batch does not have a function
    to transform video files to metadata, which is required in this situation.)
29
Q

A ride-share company wants to create intelligent conversational chatbots that will serve as first responders to customers who call to report an issue with their ride. The company wants these chatbot-customer calls to mimic natural conversations that provide personalized experiences for the customers.

What combination of AWS services can the company use to create this workflow without a lot of ongoing management?

  1. Amazon Lex to parse the utterances and intent of customer comments, Amazon Polly to reply to the customers
  2. Amazon Polly to parse the utterances and intent of customer comments, Amazon Lex to reply to the customers
  3. Amazon Transcribe to parse the utterances and intent of customer comments, Amazon Lex to reply to the customers
  4. Amazon Transcribe to parse the utterances and intent of customer comments, Amazon Polly to reply to the customers
A
  1. Amazon Lex to parse the utterances and intent of customer comments, Amazon Polly to reply to the customers* (This is the right blend of services.)
  2. Amazon Polly to parse the utterances and intent of customer comments, Amazon Lex to reply to the customers (Amazon Polly should be used to reply to customers, rather than to parse utterances and intent. Amazon Lex can’t reply to customers.)
  3. Amazon Transcribe to parse the utterances and intent of customer comments, Amazon Lex to reply to the customers (Amazon Transcribe converts speech to text, but it does not parse utterances and intent. Amazon Lex can’t reply to customers.)
  4. Amazon Transcribe to parse the utterances and intent of customer comments, Amazon Polly to reply to the customers (Amazon Polly is the right choice to reply to customers, but Amazon Transcribe does not parse utterances.)
30
Q

A healthcare organization has an application that takes in sensitive user data. This data is encrypted at rest and stored in an Amazon S3 bucket using customer-managed encryption with AWS Key Management Service (AWS KMS). A Data Scientist in the organization wants to use this encrypted data as features in a Amazon SageMaker training job. However, the following error continues to occur: “Data download failed.”

What should the Data Scientist do to fix this issue?

  1. Make sure the AWS Identity and Access Management (IAM) role used for Amazon S3 access has permissions to encrypt and decrypt the data with the AWS KMS key.
  2. Add “S3:*” to the IAM role that is attached to the Amazon SageMaker training job.
  3. Specify the “VolumeKmsKeyId” in the Amazon SageMaker training job.
  4. Add “EnableKMS” to the Amazon SageMaker training job. Then, specify the Amazon S3 bucket that includes the data.
A
  1. Make sure the AWS Identity and Access Management (IAM) role used for Amazon S3 access has permissions to encrypt and decrypt the data with the AWS KMS key.*
  2. Add “S3:*” to the IAM role that is attached to the Amazon SageMaker training job. (Adding the “S3:*” wildcard is not a good security practice, and it will not enable reading AWS KMS-encrypted data into Amazon SageMaker.)
  3. Specify the “VolumeKmsKeyId” in the Amazon SageMaker training job. (VolumeKmsKeyId helps in encrypting data on the training job instance storage, not on Amazon S3.)
  4. Add “EnableKMS” to the Amazon SageMaker training job. Then, specify the Amazon S3 bucket that includes the data. (There is no option to enable AWS KMS that will read at-rest data in an encrypted Amazon S3 bucket.)
31
Q

A log analytics company wants to provide a history of Amazon SageMaker API calls made on its client’s account for security analysis and operational troubleshooting purposes.

What must be done in the client’s account to ensure that the company can analyze the API calls?

  1. Use IAM roles. “logs:*” are added to those IAM roles.
  2. Enable AWS CloudTrail.
  3. Enable CloudWatch logs.
  4. Use the Amazon SageMaker SDK to call the ‘sagemaker_history()’ function.
A
  1. Enable AWS CloudTrail.*
  2. Use IAM roles. “logs:*” are added to those IAM roles. (The “logs:*” permission in IAM roles allows writing to Amazon CloudWatch Logs, but it does not record API calls.)
  3. Enable CloudWatch logs. (CloudWatch logs will enable logs from the training instances, but not show API calls.)
  4. Use the Amazon SageMaker SDK to call the ‘sagemaker_history()’ function. (There is no function that shows the Amazon SageMaker API calls in the Amazon SageMaker SDK.)
32
Q

A team of Data Scientists wants to use Amazon SageMaker training jobs to run two different versions of the same model in parallel to compare the long-term effectiveness of the different versions in reaching the related business outcome.

How should the team deploy these two model versions with minimum management?

  1. Create a Lambda function that preprocesses the incoming data, calls the two Amazon SageMaker endpoints for the two models, and finally returns the prediction.
  2. Create an endpoint configuration with production variants for the two models with equal weights.
  3. Create an endpoint configuration with production variants for the two models with a weight ratio of 90:10.
  4. Create a Lambda function that downloads the models from Amazon S3 and calculates and returns the predictions of the two models.
A
  1. Create an endpoint configuration with production variants for the two models with equal weights.*
  2. Create a Lambda function that preprocesses the incoming data, calls the two Amazon SageMaker endpoints for the two models, and finally returns the prediction. (Creating a Lambda function and having two Amazon SageMaker endpoints will require more management than an ideal solution.)
  3. Create an endpoint configuration with production variants for the two models with a weight ratio of 90:10.
    (Creating a 90:10 variant will give more prediction power to one model than the other model, which
    might skew the result.)
  4. Create a Lambda function that downloads the models from Amazon S3 and calculates and returns the predictions of the two models. (AWS Lambda can be used for prediction as well, but it has a 15-minute runtime limit and does not offer GPU instances.)
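A minimal boto3 sketch of an endpoint configuration with two equally weighted production variants; the model, configuration, and endpoint names are placeholders:

```python
import boto3

sm = boto3.client("sagemaker")

# Two production variants with equal weights: traffic is split 50/50 between the model versions
sm.create_endpoint_config(
    EndpointConfigName="ab-test-config",
    ProductionVariants=[
        {
            "VariantName": "model-a",
            "ModelName": "model-version-a",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,
        },
        {
            "VariantName": "model-b",
            "ModelName": "model-version-b",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,
        },
    ],
)

sm.create_endpoint(EndpointName="ab-test-endpoint", EndpointConfigName="ab-test-config")
```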
33
Q

A Data Scientist at an ad-tech startup wants to update an ML model that uses an Amazon SageMaker endpoint using the canary deployment methodology, in which the production variant 1 is the production model and the production variant 2 is the updated model.

How can the Data Scientist efficiently configure this endpoint configuration to deploy the two different versions of the model while monitoring the Amazon CloudWatch invocations?

  1. Create an endpoint configuration with production variants for the two models with equal weights.
  2. Create two Amazon SageMaker endpoints and change the endpoint URL after testing the new endpoint.
  3. Create an endpoint configuration with production variants for the two models with a weight ratio of 0:1. Update the weights periodically.
  4. Create an endpoint configuration with production variants for the two models with a weight ratio of 10:90.
A
  1. Create an endpoint configuration with production variants for the two models with a weight ratio of 0:1. Update the weights periodically.*
  2. Create an endpoint configuration with production variants for the two models with equal weights.
    (Creating equal weights is not how a canary deployment works. Canary deployment adds the new model or new deployment in small iterations.)
  3. Create two Amazon SageMaker endpoints and change the endpoint URL after testing the new endpoint.
    (Creating two Amazon SageMaker endpoints will entail manual load to compare metrics.)
  4. Create an endpoint configuration with production variants for the two models with a weight ratio of 10:90.
    (Creating an endpoint configuration with a weight ratio of 10:90 will not satisfy the canary deployment technique, because a canary deployment should start with either 100:0 or 90:10. This is done so that the original production model takes most of the load while you test the new model in production to see whether there are any errors.)
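A minimal boto3 sketch of shifting a small share of traffic to the updated variant while watching the per-variant invocation metrics in Amazon CloudWatch; the endpoint name, variant names, and weights are illustrative:

```python
import boto3

sm = boto3.client("sagemaker")

# Gradually shift traffic toward the updated model (variant-2) as it proves healthy
sm.update_endpoint_weights_and_capacities(
    EndpointName="ad-ranking-endpoint",
    DesiredWeightsAndCapacities=[
        {"VariantName": "variant-1", "DesiredWeight": 0.9},  # current production model
        {"VariantName": "variant-2", "DesiredWeight": 0.1},  # updated (canary) model
    ],
)
```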
34
Q

A healthcare company using the AWS Cloud has access to a variety of data types, including raw and preprocessed data. The company wants to start using this data for its ML pipeline, but also wants to make sure the data is highly available and located in a centralized repository.

What approach should the company take to achieve the desired outcome?

  1. Create a data lake using Amazon S3 as the data storage layer
  2. Store unstructured data in Amazon DynamoDB and structured data in Amazon RDS
  3. Use Amazon FSx to host the data for training
  4. Use Amazon Elastic Block Store (Amazon EBS) volumes to store the data with data backup
A
  1. Create a data lake using Amazon S3 as the data storage layer*
  2. Store unstructured data in Amazon DynamoDB and structured data in Amazon RDS (Having two storage
    layers like this breaks the centralized repository requirement in this question.)
  3. Use Amazon FSx to host the data for training (Amazon FSx is a high-performance file system intended for compute workloads, and it is too costly to serve as the permanent, centralized storage solution.)
  4. Use Amazon Elastic Block Store (Amazon EBS) volumes to store the data with data backup (Amazon EBS data backups are not highly available, which is one of the requirements in this question.)
35
Q

A Data Scientist wants to implement a near-real-time anomaly detection solution for routine machine maintenance. The data is currently streamed from connected devices by AWS IoT to an Amazon S3 bucket and then sent downstream for further processing in a real-time dashboard.

What service can the Data Scientist use to achieve the desired outcome with minimal change to the pipeline?

  1. Amazon CloudWatch
  2. Amazon SageMaker
  3. Amazon EMR with Spark
  4. Amazon Kinesis Data Analytics
A
  1. Amazon Kinesis Data Analytics*
  2. Amazon CloudWatch (CloudWatch does not offer an anomaly detection solution.)
  3. Amazon SageMaker (An Amazon SageMaker training job needs processed data stored in Amazon S3 to train the model. It cannot train the model on streaming data.)
  4. Amazon EMR with Spark (Amazon EMR would require more than just a minimal change in the pipeline to stream the data to an EMR instance.)
36
Q

A transportation company currently uses Amazon EMR with Apache Spark for some of its data transformation workloads. It transforms columns of geographical data (like latitudes and longitudes) and adds columns to segment the data into different clusters per city to attain additional features for the k-nearest neighbors algorithm being used.

The company wants less operational overhead for their transformation pipeline. They want a new solution that does not make significant changes to the current pipeline and only requires minimal management.

What AWS services should the company use to build this new pipeline?

  1. Use Amazon EMR to transform files. Use Amazon S3 as the destination.
  2. Use Lambda to transform files. Use Amazon EMR HDFS as the destination.
  3. Use AWS Glue to transform files. Use Amazon S3 as the destination.
  4. Use AWS Glue to transform files. Use Amazon EMR HDFS as the destination.
A
  1. Use AWS Glue to transform files. Use Amazon S3 as the destination.*
  2. Use Amazon EMR to transform files. Use Amazon S3 as the destination. (Amazon EMR is a good option, but the service still requires management of security settings and other management tasks, so this solution doesn’t meet the company’s requirements.)
  3. Use Lambda to transform files. Use Amazon EMR HDFS as the destination. (Lambda can transform the data, but that would require rewriting the Spark code for Lambda, which increases the work instead of decreasing it.)
  4. Use AWS Glue to transform files. Use Amazon EMR HDFS as the destination. (Amazon EMR HDFS requires spinning up and maintaining an Amazon EMR cluster to store the data, and the service still requires management of security settings and other management tasks, so this solution doesn’t meet the company’s requirements.)
37
Q

A Machine Learning Engineer is creating and preparing data for a linear regression model. However, while preparing the data, the Engineer notices that about 20% of the numerical data contains missing values in the same two columns. The shape of the data is 500 rows by 4 columns, including the target column.

How could the Engineer handle the missing values in the data? (Select TWO.)

  1. Remove the rows containing the missing values
  2. Remove the columns containing the missing values
  3. Fill the missing values with zeros
  4. Impute the missing values using regression
  5. Add regularization to the model
A
  1. Fill the missing values with zeros*
  2. Impute the missing values using regression*
  3. Remove the rows containing the missing values (The dataset is small enough that removing 20% of the rows might result in the loss of valuable information inside those rows.)
  4. Remove the columns containing the missing values (This approach causes the loss of data features, and, in this case, there are only three feature columns.)
  5. Add regularization to the model (Adding regularization helps with overfitting, not missing data.)
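A minimal scikit-learn/pandas sketch of both selected approaches on a hypothetical frame with missing values in two columns:

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical feature frame with missing values in two of the columns
df = pd.DataFrame({
    "sqft":  [1200, 1500, np.nan, 2100, 1800],
    "age":   [10, np.nan, 25, 5, np.nan],
    "rooms": [3, 4, 4, 5, 4],
})

# Option 1: fill missing values with zeros
filled_zero = df.fillna(0)

# Option 2: impute missing values using regression on the other columns
imputer = IterativeImputer(random_state=0)
filled_reg = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
```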
38
Q

A social networking organization wants to analyze all the comments and likes from its users to flag offensive language on the site. The organization’s data science team wants to use a Long Short-term Memory (LSTM) architecture to classify the raw sentences from the comments into one of two categories: offensive and non- offensive.

What should the team do to prepare the data for the LSTM?

  1. Convert the individual sentences into sequences of words. Use those as the input.
  2. Convert the individual sentences into numerical sequences starting from the number 1 for each word in a sentence. Use the sentences as the input.
  3. Vectorize the sentences. Transform them into numerical sequences. Use the sentences as the input.
  4. Vectorize the sentences. Transform them into numerical sequences with a padding. Use the sentences as the input.
A
  1. Vectorize the sentences. Transform them into numerical sequences with a padding. Use the sentences as the input.*
  2. Convert the individual sentences into sequences of words. Use those as the input. (It is more effective to vectorize the sentences to capture relationships across words than it is to convert the sentences into sequences of words.)
  3. Convert the individual sentences into numerical sequences starting from the number 1 for each word in a sentence. Use the sentences as the input. (Using a numerical sequence for each word in a sentence will retain the placing of the word in the sentence, but will lose the actual word itself, which needs to be coded in.)
  4. Vectorize the sentences. Transform them into numerical sequences. Use the sentences as the input. (The sequences still need to be padded, because the algorithm expects a fixed vector length and each sentence will not be the same length.)
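A minimal Keras preprocessing sketch of vectorizing and padding raw sentences; the example comments, vocabulary size, and sequence length are made up:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

comments = ["great post", "this is offensive and rude", "thanks for sharing"]

tokenizer = Tokenizer(num_words=10_000)       # map words to integer indices
tokenizer.fit_on_texts(comments)
sequences = tokenizer.texts_to_sequences(comments)

# Pad every sequence to the same fixed length expected by the LSTM
padded = pad_sequences(sequences, maxlen=20, padding="post")
print(padded.shape)                           # (3, 20)
```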
39
Q

A Data Scientist created a correlation matrix between nine variables and the target variable. The correlation coefficient between two of the numerical variables, variable 1 and variable 5, is -0.95.

How should the Data Scientist interpret the correlation coefficient?

  1. As variable 1 increases, variable 5 increases
  2. As variable 1 increases, variable 5 decreases
  3. Variable 1 does not have any influence on variable 5
  4. The data is not sufficient to make a well-informed interpretation
A
  1. As variable 1 increases, variable 5 decreases*
  2. As variable 1 increases, variable 5 increases (The question’s correlation coefficient indicates a negative correlation between the two variables. This answer option represents a positive correlation.)
  3. Variable 1 does not have any influence on variable 5 (The question’s correlation coefficient indicates a
    relationship between the two variables: in this case, a negative correlation.)
  4. The data is not sufficient to make a well-informed interpretation (There is sufficient data to draw a
    conclusion here, which is that there is a negative correlation between the two variables.)
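
For reference, a correlation matrix like the one described can be produced with pandas; the file and column names are illustrative:

  # Build a Pearson correlation matrix and read off the coefficient for two variables.
  import pandas as pd

  df = pd.read_csv("features.csv")               # hypothetical dataset with nine variables plus the target
  corr = df.corr(numeric_only=True)              # Pearson correlation by default
  print(corr.loc["variable_1", "variable_5"])    # e.g. -0.95: a strong negative correlation
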
40
Q

A real estate company wants to provide its customers with a more accurate prediction of the final sale price for houses they are considering in various cities. To do this, the company wants to use a fully connected neural network trained on data from the previous ten years of home sales, as well as other features.

What kind of machine learning problem does this situation represent?

  1. Regression
  2. Classification
  3. Recommender system
  4. Reinforcement learning
A
  1. Regression* (Regression analysis is the right answer, because the company wants to predict a continuous value, the final house price (the dependent variable), from features such as the city and historical sales data (the independent variables).)
  2. Classification (Classification cannot be used for this, because the company wants to predict a number for the sales price rather than a category.)
  3. Recommender system (A recommender system doesn’t fit this use case, because a number needs to be predicted.)
  4. Reinforcement learning (Reinforcement learning doesn’t fit the use case in this question, because we already have historical data.)
41
Q

A manufacturing company wants to increase the longevity of its factory machines by predicting when a machine part is about to stop working, jeopardizing the health of the machine. The company’s team of Data Scientists will build an ML model to accomplish this goal. The model will be trained on data made up of consumption metrics from similar factory machines, and will span a time frame from one hour before a machine part broke down to five minutes after the part degraded.

What kind of machine learning algorithm should the company use to build this model?

  1. Amazon SageMaker DeepAR
  2. SciKit Learn Regression
  3. Convolutional neural network (CNN)
  4. Scikit Learn Random Forest
A
  1. Amazon SageMaker DeepAR* (Amazon SageMaker DeepAR is a supervised learning algorithm designed for time series forecasting problems. Given the situation laid out in this question, this is the ideal algorithm to use.)
  2. SciKit Learn Regression (This is a linear regression algorithm, which does not fit well in this question given that it’s a time series forecasting problem.)
  3. Convolutional neural network (CNN) (A CNN is a class of deep neural networks most commonly applied to analyzing visual imagery. It would not be appropriate on its own for a time series forecasting problem.)
  4. Scikit Learn Random Forest (Random forest is a very popular tree-based algorithm used for either classification or regression problems, but not for time series forecasting.)
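
A sketch of how the built-in DeepAR algorithm could be configured with the SageMaker Python SDK; the role ARN, S3 path, and hyperparameter values are placeholders, not part of the question:

  # Train the built-in DeepAR forecasting algorithm on the consumption-metric time series.
  import sagemaker
  from sagemaker import image_uris
  from sagemaker.estimator import Estimator

  session = sagemaker.Session()
  image = image_uris.retrieve("forecasting-deepar", session.boto_region_name)

  estimator = Estimator(
      image_uri=image,
      role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder execution role
      instance_count=1,
      instance_type="ml.m5.xlarge",
      sagemaker_session=session,
  )
  estimator.set_hyperparameters(
      time_freq="5min",          # sampling frequency of the metrics
      context_length=12,         # how much history the model conditions on
      prediction_length=12,      # how far ahead it forecasts
      epochs=20,
  )
  estimator.fit({"train": "s3://example-bucket/deepar/train/"})
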
42
Q

A Data Scientist working for an autonomous vehicle company is building an ML model to detect and label people and various objects (for instance, cars and traffic signs) that may be encountered on a street. The Data Scientist has a dataset made up of labeled images, which will be used to train their machine learning model.
What kind of ML algorithm should be used?

  1. Image classification
  2. Instance segmentation
  3. Image localization
  4. Semantic segmentation
A
  1. Instance segmentation*
  2. Image classification (Image classification will not detect each distinct object in the image. It will only assign a single class to the whole image.)
  3. Image localization (Object or image localization tries to locate the main (or most visible) object in an
    image, but won’t detect each distinct object.)
  4. Semantic segmentation (Semantic segmentation is the process of linking each pixel in an image to a class label, such as person, car, flower, or piece of furniture. Think of semantic segmentation as image classification at the pixel level. Like image classification, semantic segmentation will not detect each distinct object in the image.)
43
Q

A Data Scientist is training a convolutional neural network model to detect incoming employees at the company’s front gate using a camera, so that the gate opens for them automatically. However, the model is taking too long to converge and the error oscillates for more than 10 epochs.

What should the Data Scientist do to improve upon this situation? (Select TWO.)

  1. Normalize the images before training
  2. Add batch normalization
  3. Add more epochs
  4. Increase batch size
  5. Decrease weight decay
A
  1. Normalize the images before training*
  2. Add batch normalization*
  3. Add more epochs (The model already is suffering from convergence issues, so increasing the epochs won’t help with convergence.)
  4. Increase batch size (Increasing the batch size generally makes convergence worse.)
  5. Decrease weight decay (Weight decay is generally used for regularization to reduce overfitting and, therefore, won’t help with convergence issues.)
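
A minimal Keras sketch of both fixes, scaling the pixel values and adding batch normalization layers; the architecture, shapes, and data are illustrative:

  # Normalize images to [0, 1] and add BatchNormalization to stabilize convergence.
  import numpy as np
  from tensorflow.keras import layers, models

  x_train = np.random.randint(0, 256, size=(100, 64, 64, 3)).astype("float32")
  x_train /= 255.0                                  # normalize the images before training

  model = models.Sequential([
      layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
      layers.BatchNormalization(),                  # add batch normalization
      layers.Conv2D(32, 3, activation="relu"),
      layers.BatchNormalization(),
      layers.GlobalAveragePooling2D(),
      layers.Dense(1, activation="sigmoid"),
  ])
  model.compile(optimizer="adam", loss="binary_crossentropy")
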
44
Q

A Data Scientist at a waste recycling company trained a CNN model to classify waste at the company’s sites. Incoming waste was classified as either trash, compost, or recyclable to make it easier for the machines to split the incoming waste into the appropriate bins.

During model testing, the F1 score was 0.918. The company’s senior leadership originally asked the Data Scientist to reach an F1 score of at least 0.95.

What should the Data Scientist do to improve this score without spending too much time optimizing the model?

  1. Use Amazon SageMaker tuning jobs to tune the hyperparameters used
  2. Increase the batch size to improve the score in the Amazon SageMaker training job
  3. Use momentum to improve the training in the Amazon SageMaker training job
  4. Run the Amazon SageMaker training job for more epochs
A
  1. Use Amazon SageMaker tuning jobs to tune the hyperparameters used*
  2. Increase the batch size to improve the score in the Amazon SageMaker training job (Increasing batch size may or may not help with improving the F1 score.)
  3. Use momentum to improve the training in the Amazon SageMaker training job (Increasing momentum generally helps with convergence but may not help with increasing your F1 score.)
  4. Run the Amazon SageMaker training job for more epochs (Running more epochs will overfit the model and will not help with increasing the testing F1 score.)
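
A sketch of a SageMaker hyperparameter tuning job around the built-in image classification algorithm; the role, objective metric, ranges, hyperparameter values, and S3 paths are assumptions:

  # Launch an automatic model tuning (hyperparameter optimization) job.
  import sagemaker
  from sagemaker import image_uris
  from sagemaker.estimator import Estimator
  from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

  session = sagemaker.Session()
  image = image_uris.retrieve("image-classification", session.boto_region_name)

  estimator = Estimator(
      image_uri=image,
      role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder execution role
      instance_count=1,
      instance_type="ml.p3.2xlarge",
      sagemaker_session=session,
  )
  estimator.set_hyperparameters(num_classes=3, num_training_samples=10000, epochs=10)  # illustrative

  tuner = HyperparameterTuner(
      estimator=estimator,
      objective_metric_name="validation:accuracy",
      objective_type="Maximize",
      hyperparameter_ranges={
          "learning_rate": ContinuousParameter(1e-4, 1e-1),
          "mini_batch_size": IntegerParameter(16, 128),
      },
      max_jobs=20,
      max_parallel_jobs=2,
  )
  tuner.fit({"train": "s3://example-bucket/train/", "validation": "s3://example-bucket/validation/"})
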
45
Q

A Machine Learning Engineer created a pipeline for training an ML model using an Amazon SageMaker training job. The training job began successfully, but then failed after running for five minutes.

How should the Engineer begin to debug this issue? (Select TWO.)

  1. Log into the Amazon SageMaker training job instance and check the job history
  2. Call the DescribeJob API to check the FailureReason option
  3. Go to Amazon CloudWatch logs and check the logs for the given training job
  4. Check the error in the given training job directly in the Amazon SageMaker console
  5. Check AWS CloudTrail logs to check the error that caused the training to fail
A
  1. Call the DescribeJob API to check the FailureReason option*
  2. Go to Amazon CloudWatch logs and check the logs for the given training job*
  3. Log into the Amazon SageMaker training job instance and check the job history (You cannot log into an Amazon SageMaker training job instance.)
  4. Check the error in the given training job directly in the Amazon SageMaker console (The Amazon SageMaker console doesn’t give you insight into what happens with a specific training job.)
  5. Check AWS CloudTrail logs to check the error that caused the training to fail (AWS CloudTrail logs the API calls for Amazon SageMaker, but will not log the error.)
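
A sketch of both debugging steps with boto3; the “DescribeJob” option corresponds to the DescribeTrainingJob API, and the job name is a placeholder:

  # Check the FailureReason on the training job, then pull its CloudWatch log streams.
  import boto3

  sm = boto3.client("sagemaker")
  job = sm.describe_training_job(TrainingJobName="my-training-job")   # placeholder job name
  print(job["TrainingJobStatus"])        # e.g. "Failed"
  print(job.get("FailureReason"))        # human-readable reason for the failure

  logs = boto3.client("logs")
  streams = logs.describe_log_streams(
      logGroupName="/aws/sagemaker/TrainingJobs",
      logStreamNamePrefix="my-training-job",
  )
  for stream in streams["logStreams"]:
      events = logs.get_log_events(
          logGroupName="/aws/sagemaker/TrainingJobs",
          logStreamName=stream["logStreamName"],
      )
      for event in events["events"]:
          print(event["message"])
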
46
Q

A news organization wants to extract metadata from its articles and blogs and index that metadata in Amazon Elasticsearch Service (Amazon ES) to enable faster searches.

What AWS service can the organization use to achieve this goal?

  1. Amazon Comprehend
  2. Amazon Personalize
  3. Amazon Textract
  4. Amazon Rekognition Image
A
  1. Amazon Comprehend*
  2. Amazon Personalize (Amazon Personalize is not the right service for this use case, because it is a service that creates recommendations for customers.)
  3. Amazon Textract (Amazon Textract extracts data, but not metadata, from images and PDFs using optical character recognition (OCR).)
  4. Amazon Rekognition Image (Amazon Rekognition Image does not extract metadata from articles and blogs, but from images.)
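
A sketch of extracting entities and key phrases from an article with the Amazon Comprehend API via boto3; the sample text is a placeholder:

  # Use Amazon Comprehend to pull metadata (entities, key phrases) out of article text.
  import boto3

  comprehend = boto3.client("comprehend")
  article_text = "Placeholder article text about a product launch in Seattle."

  entities = comprehend.detect_entities(Text=article_text, LanguageCode="en")
  key_phrases = comprehend.detect_key_phrases(Text=article_text, LanguageCode="en")

  # The extracted metadata could then be indexed into Amazon ES to speed up searches.
  print([e["Text"] for e in entities["Entities"]])
  print([p["Text"] for p in key_phrases["KeyPhrases"]])
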
47
Q

A Machine Learning Specialist is evaluating an ML model using a custom Deep Learning Amazon Machine Image (AMI) with Anaconda installed to run workloads through the terminal. Unfortunately, the ML Specialist does not have any experience with the Deep Learning AMI and wants to log into the instance and create an ipython notebook (*.ipynb), but cannot access the notebook interface.

After creating the AMI instance, what steps should the ML Specialist take to create a notebook?

  1. SSH into the Deep Learning AMI instance, start a new Flask interface application, and create a new ipython notebook
  2. SSH into the Deep Learning AMI instance with port forwarding at port 8888, start a Jupyter notebook application, and create a new ipython notebook
  3. SSH into the Deep Learning AMI instance with port forwarding at port 8888 and start a python3.6 application to create a new ipython notebook
  4. SSH into the Deep Learning AMI instance with port forwarding at port 8080 and start a Zeppelin application to create a new ipython notebook
A
  1. SSH into the Deep Learning AMI instance with port forwarding at port 8888, start a Jupyter notebook application, and create a new ipython notebook*
  2. SSH into the Deep Learning AMI instance, start a new Flask interface application, and create a new ipython notebook (SSHing into the Deep Learning AMI instance will work, but Flask won’t create a new notebook.)
  3. SSH into the Deep Learning AMI instance with port forwarding at port 8888 and start a python3.6 application to create a new ipython notebook (SSHing into the Deep Learning AMI instance with port forwarding is the right approach, but the Python application will open a Python terminal instead of a GUI that creates the *.ipynb notebook.)
  4. SSH into the Deep Learning AMI instance with port forwarding at port 8080 and start a Zeppelin application to create a new ipython notebook (SSHing into the Deep Learning AMI instance will work, but a Zeppelin application doesn’t create a *.ipynb notebook.)
48
Q

A machine translation company is deploying its language translation models behind an Amazon SageMaker endpoint. The company wants to deploy a solution directly on its website so that users can input text in one language and have it translated into a second language. The company wants to reach a solution with minimal maintenance and latency for spiky traffic times.

How should the company architect this solution?

  1. Use Amazon SageMaker InvokeEndpoint with API Gateway
  2. Use Lambda to call InvokeEndpoint. Use the Amazon API Gateway URL to call the AWS Lambda function.
  3. Create a function on an Amazon EC2 instance that uses CURL to call the InvokeEndpoint API. Call the Amazon EC2 instance from the website.
  4. Install the sagemaker-runtime library on the web server. Call InvokeEndpoint from the webserver.
A
  1. Use Lambda to call InvokeEndpoint. Use the Amazon API Gateway URL to call the AWS Lambda function.*
  2. Use Amazon SageMaker InvokeEndpoint with API Gateway (The Amazon SageMaker model cannot be called directly using API Gateway, but needs a compute resource like Lambda in between to call the endpoint.)
  3. Create a function on an Amazon EC2 instance that uses CURL to call the InvokeEndpoint API. Call the Amazon EC2 instance from the website. (Using Amazon EC2 will require more maintenance than the requirement in the question states.)
  4. Install the sagemaker-runtime library on the web server. Call InvokeEndpoint from the webserver. (Calling InvokeEndpoint from the web server puts the load of preprocessing language data on the web server, which can slow the website.)
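
A sketch of the AWS Lambda handler that the API Gateway URL would invoke; the endpoint name and payload shape are assumptions:

  # Lambda function: forward the user's text to the SageMaker endpoint and return the translation.
  import json
  import boto3

  runtime = boto3.client("sagemaker-runtime")

  def lambda_handler(event, context):
      body = json.loads(event["body"])                    # text posted from the website
      response = runtime.invoke_endpoint(
          EndpointName="translation-endpoint",            # placeholder endpoint name
          ContentType="application/json",
          Body=json.dumps({"text": body["text"]}),
      )
      translation = response["Body"].read().decode("utf-8")
      return {"statusCode": 200, "body": translation}
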
49
Q

A Machine Learning team has several large CSV datasets in Amazon S3. Historically, models built with the Amazon SageMaker Linear Learner algorithm have taken hours to train on similar-sized datasets. The team’s leaders need to accelerate the training process.

What can a Machine Learning Specialist do to address this concern?

A. Use Amazon SageMaker Pipe mode.
B. Use Amazon Machine Learning to train the models.
C. Use Amazon Kinesis to stream the data to Amazon SageMaker.
D. Use AWS Glue to transform the CSV dataset to the JSON format.

A

A – Amazon SageMaker Pipe mode streams the data directly to the container, which improves the performance of training jobs. (Refer to this link for supporting information.) In Pipe mode, your training job streams data directly from Amazon S3. Streaming can provide faster start times for training jobs and better throughput. With Pipe mode, you also reduce the size of the Amazon EBS volumes for your training instances. B would not apply in this scenario. C is a streaming ingestion solution, but is not applicable in this scenario. D transforms the data structure.
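
A sketch of enabling Pipe mode on a Linear Learner training job with the SageMaker Python SDK; the role, S3 path, and hyperparameter values are placeholders:

  # Stream the CSV training data from S3 with Pipe mode instead of downloading it first.
  import sagemaker
  from sagemaker import image_uris
  from sagemaker.estimator import Estimator
  from sagemaker.inputs import TrainingInput

  session = sagemaker.Session()
  image = image_uris.retrieve("linear-learner", session.boto_region_name)

  estimator = Estimator(
      image_uri=image,
      role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder execution role
      instance_count=1,
      instance_type="ml.m5.xlarge",
      input_mode="Pipe",                                      # stream data directly to the container
      sagemaker_session=session,
  )
  estimator.set_hyperparameters(predictor_type="regressor", feature_dim=10)  # illustrative values
  estimator.fit({"train": TrainingInput("s3://example-bucket/train/", content_type="text/csv")})
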

50
Q

A term frequency–inverse document frequency (tf–idf) matrix using both unigrams and bigrams is built from a text corpus consisting of the following two sentences:

  1. Please call the number below.
  2. Please do not call us.

What are the dimensions of the tf–idf matrix?

A. (2, 16)
B. (2, 8)
C. (2, 10)
D. (8, 10)

A

A – There are 2 sentences, 8 unique unigrams, and 8 unique bigrams, so the result would be (2,16). The phrases are “Please call the number below” and “Please do not call us.” Each word individually (unigram) is “Please,” “call,” ”the,” ”number,” “below,” “do,” “not,” and “us.” The unique bigrams are “Please call,” “call the,” ”the number,” “number below,” “Please do,” “do not,” “not call,” and “call us.” The tf–idf vectorizer is described at this link.
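
The shape can be verified with scikit-learn’s TfidfVectorizer (its default tokenizer lowercases and drops punctuation):

  # Build a tf-idf matrix over unigrams and bigrams and confirm its shape.
  from sklearn.feature_extraction.text import TfidfVectorizer

  corpus = ["Please call the number below.", "Please do not call us."]
  vectorizer = TfidfVectorizer(ngram_range=(1, 2))   # unigrams and bigrams
  matrix = vectorizer.fit_transform(corpus)
  print(matrix.shape)                                # (2, 16): 2 sentences, 8 unigrams + 8 bigrams
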

51
Q

A company is setting up a system to manage all of the datasets it stores in Amazon S3. The company would like to automate running transformation jobs on the data and maintaining a catalog of the metadata concerning the datasets. The solution should require the least amount of setup and maintenance.

Which solution will allow the company to achieve its goals?

A. Create an Amazon EMR cluster with Apache Hive installed. Then, create a Hive metastore and a script to run transformation jobs on a schedule.
B. Create an AWS Glue crawler to populate the AWS Glue Data Catalog. Then, author an AWS Glue ETL job, and set up a schedule for data transformation jobs.
C. Create an Amazon EMR cluster with Apache Spark installed. Then, create an Apache Hive metastore and a script to run transformation jobs on a schedule.
D. Create an AWS Data Pipeline that transforms the data. Then, create an Apache Hive metastore and a script to run transformation jobs on a schedule.

A

B – AWS Glue is the correct answer because this option requires the least amount of setup and maintenance since it is serverless, and it does not require management of the infrastructure. Refer to this link for supporting information. A, C, and D are all solutions that can solve the problem, but require more steps for configuration, and require higher operational overhead to run and maintain.
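
A sketch of this setup with boto3; the crawler name, role, database, schedules, and job name are placeholders:

  # Create a crawler to populate the Glue Data Catalog and a scheduled trigger for an ETL job.
  import boto3

  glue = boto3.client("glue")

  glue.create_crawler(
      Name="datasets-crawler",
      Role="arn:aws:iam::123456789012:role/GlueServiceRole",   # placeholder service role
      DatabaseName="datasets_catalog",
      Targets={"S3Targets": [{"Path": "s3://example-bucket/datasets/"}]},
      Schedule="cron(0 2 * * ? *)",                             # crawl daily at 02:00 UTC
  )

  glue.create_trigger(
      Name="nightly-transform",
      Type="SCHEDULED",
      Schedule="cron(0 3 * * ? *)",                             # run the ETL job after the crawl
      Actions=[{"JobName": "transform-datasets"}],              # an existing Glue ETL job
      StartOnCreation=True,
  )
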

52
Q

A Data Scientist is working on optimizing a model during the training process by varying multiple parameters. The Data Scientist observes that, during multiple runs with identical parameters, the loss function converges to different, yet stable, values.

What should the Data Scientist do to improve the training process?

A. Increase the learning rate. Keep the batch size the same.
B. Reduce the batch size. Decrease the learning rate.
C. Keep the batch size the same. Decrease the learning rate.
D. Do not change the learning rate. Increase the batch size.

A

B – It is most likely that the loss function is very curvy and has multiple local minima where the training is getting stuck. Decreasing the batch size would help the Data Scientist stochastically escape the local minima and saddle points. Decreasing the learning rate would prevent overshooting the global minimum of the loss function. Refer to the paper at this link for an explanation.

53
Q

A Data Scientist is evaluating different binary classification models. A false positive result is 5 times more expensive (from a business perspective) than a false negative result.

The models should be evaluated based on the following criteria:

1) Must have a recall rate of at least 80%
2) Must have a false positive rate of 10% or less
3) Must minimize business costs

After creating each binary classification model, the Data Scientist generates the corresponding confusion matrix.

Which confusion matrix represents the model that satisfies the requirements?

A. TN = 91, FP = 9, FN = 22, TP = 78
B. TN = 99, FP = 1, FN = 21, TP = 79
C. TN = 96, FP = 4, FN = 10, TP = 90
D. TN = 98, FP = 2, FN = 18, TP = 82

A

D – The following calculations are required:

TP = True Positive, FP = False Positive, FN = False Negative, TN = True Negative

Recall = TP / (TP + FN)

False Positive Rate (FPR) = FP / (FP + TN)

Cost = 5 * FP + FN

Options C and D have a recall greater than 80% and an FPR less than 10%, but D is the most cost effective. For supporting information, refer to this link.
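
A short worked check of all four options:

  # Compute recall, false positive rate, and business cost (5 * FP + FN) for each matrix.
  matrices = {
      "A": dict(TN=91, FP=9, FN=22, TP=78),
      "B": dict(TN=99, FP=1, FN=21, TP=79),
      "C": dict(TN=96, FP=4, FN=10, TP=90),
      "D": dict(TN=98, FP=2, FN=18, TP=82),
  }
  for name, m in matrices.items():
      recall = m["TP"] / (m["TP"] + m["FN"])
      fpr = m["FP"] / (m["FP"] + m["TN"])
      cost = 5 * m["FP"] + m["FN"]
      print(name, round(recall, 2), round(fpr, 2), cost)
  # A: recall 0.78 (fails criterion 1); B: recall 0.79 (fails criterion 1);
  # C: recall 0.90, FPR 0.04, cost 30; D: recall 0.82, FPR 0.02, cost 28 -> D is cheapest.
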

54
Q

A Data Scientist uses logistic regression to build a fraud detection model. While the model accuracy is 99%, 90% of the fraud cases are not detected by the model.

What action will definitively help the model detect more than 10% of fraud cases?

A. Using undersampling to balance the dataset
B. Decreasing the class probability threshold
C. Using regularization to reduce overfitting
D. Using oversampling to balance the dataset

A

B – Decreasing the class probability threshold makes the model more sensitive and, therefore, marks more cases as the positive class, which is fraud in this case. This will increase the likelihood of fraud detection. However, it comes at the price of lowering precision. This is covered in the Discussion section of the paper at this link.
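
A minimal sketch of lowering the threshold on predicted probabilities, using synthetic data for illustration; the 0.2 threshold is an assumption:

  # Lowering the class probability threshold flags more cases as fraud (the positive class).
  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression

  X, y = make_classification(n_samples=2000, weights=[0.99, 0.01], random_state=0)  # ~1% fraud
  clf = LogisticRegression(max_iter=1000).fit(X, y)

  probabilities = clf.predict_proba(X)[:, 1]           # probability of the fraud class
  default_preds = (probabilities >= 0.5).astype(int)   # default threshold misses most fraud
  lowered_preds = (probabilities >= 0.2).astype(int)   # lower threshold catches more fraud
  print(default_preds.sum(), lowered_preds.sum())      # more positives at the lower threshold
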

55
Q

A company is interested in building a fraud detection model. Currently, the Data Scientist does not have a sufficient amount of information due to the low number of fraud cases.

Which method is MOST likely to detect the GREATEST number of valid fraud cases?

A. Oversampling using bootstrapping
B. Undersampling
C. Oversampling using SMOTE
D. Class weight adjustment

A

C – With datasets that are not fully populated, the Synthetic Minority Over-sampling Technique (SMOTE) adds new information by adding synthetic data points to the minority class. This technique would be the most effective in this scenario. Refer to Section 4.2 at this link for supporting information.
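
A minimal sketch using SMOTE from the third-party imbalanced-learn package, on synthetic data:

  # Generate synthetic minority-class (fraud) samples so the classes are balanced.
  from imblearn.over_sampling import SMOTE
  from sklearn.datasets import make_classification

  X, y = make_classification(n_samples=2000, weights=[0.99, 0.01], random_state=0)  # rare fraud class
  X_resampled, y_resampled = SMOTE(random_state=0).fit_resample(X, y)
  print(sum(y == 1), sum(y_resampled == 1))   # many more fraud examples after SMOTE
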

56
Q

A Machine Learning Engineer is preparing a data frame for a supervised learning task with the Amazon SageMaker Linear Learner algorithm. The ML Engineer notices the target label classes are highly imbalanced and multiple feature columns contain missing values. The proportion of missing values across the entire data frame is less than 5%.

What should the ML Engineer do to minimize bias due to missing values?

A. Replace each missing value by the mean or median across non-missing values in the same row.
B. Delete observations that contain missing values because these represent less than 5% of the data.
C. Replace each missing value by the mean or median across non-missing values in the same column.
D. For each feature, approximate the missing values using supervised learning based on other features.

A

D – Use supervised learning to predict missing values based on the values of other features. Different supervised learning approaches might have different performances, but any properly implemented supervised learning approach should provide the same or better approximation than mean or median approximation, as proposed in responses A and C. Supervised learning applied to the imputation of missing values is an active field of research. Refer to this link for an example.

57
Q

A company has collected customer comments on its products, rating them as safe or unsafe, using decision trees. The training dataset has the following features: id, date, full review, full review summary, and a binary safe/unsafe tag. During training, any data sample with missing features was dropped. In a few instances, the test set was found to be missing the full review text field.

For this use case, which is the most effective course of action to address test data samples with missing features?

A. Drop the test samples with missing full review text fields, and then run through the test set.
B. Copy the summary text fields and use them to fill in the missing full review text fields, and then run through the test set.
C. Use an algorithm that handles missing data better than decision trees.
D. Generate synthetic data to fill in the fields that are missing data, and then run through the test set.

A

B – In this case, a full review summary usually contains the most descriptive phrases of the entire review and is a valid stand-in for the missing full review text field.
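
A short pandas sketch of this backfill; the file and column names are assumptions based on the features listed in the question:

  # Fill missing full review text with the corresponding review summary.
  import pandas as pd

  test_df = pd.read_csv("test_set.csv")   # hypothetical test set
  test_df["full_review"] = test_df["full_review"].fillna(test_df["full_review_summary"])
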

58
Q

An insurance company needs to automate claim compliance reviews because human reviews are expensive and error-prone. The company has a large set of claims and a compliance label for each. Each claim consists of a few sentences in English, many of which contain complex related information. Management would like to use Amazon SageMaker built-in algorithms to design a machine learning supervised model that can be trained to read each claim and predict if the claim is compliant or not.

Which approach should be used to extract features from the claims to be used as inputs for the downstream supervised task?

A. Derive a dictionary of tokens from claims in the entire dataset. Apply one-hot encoding to tokens found in each claim of the training set. Send the derived feature space as inputs to an Amazon SageMaker built-in supervised learning algorithm.
B. Apply Amazon SageMaker BlazingText in Word2Vec mode to claims in the training set. Send the derived feature space as inputs for the downstream supervised task.
C. Apply Amazon SageMaker BlazingText in classification mode to labeled claims in the training set to derive features for the claims that correspond to the compliant and non-compliant labels, respectively.
D. Apply Amazon SageMaker Object2Vec to claims in the training set. Send the derived feature space as inputs for the downstream supervised task.

A

D – Amazon SageMaker Object2Vec generalizes the Word2Vec embedding technique for words to more complex objects, such as sentences and paragraphs. Since the supervised learning task is at the level of whole claims, for which there are labels, and no labels are available at the word level, Object2Vec needs to be used instead of Word2Vec.