Past paper 3 Flashcards
What is Apache Spark?
distributed processing framework and programming model that helps you do machine learning, stream processing, or graph analytics using Amazon EMR clusters.
What is Amazon EMR?
a web service that makes it easy for you to process and analyze vast amounts of data using applications in the Hadoop ecosystem
What could you use to transform data into RecordIO-Protobuf format?
Apache Spark
What is AWS Glue?
serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development
Can AWS Glue transform data into RecordIO-Protobuf format?
No, not natively; RecordIO-Protobuf is not among its built-in output formats
What is AWS Step Functions?
a low-code visual workflow service used to orchestrate AWS services, automate business processes and build serverless applications
What is Lambda not suited for?
Long-running processes such as transforming large datasets
What is Kinesis Firehose used for?
capture, transform, and load streaming data into Amazon repositories
Which Amazon repositories can Kinesis Firehose load streaming data into?
Amazon S3, Amazon Redshift, Amazon Elasticsearch Service (now Amazon OpenSearch Service), and Splunk
What type of processing should Kinesis Firehose not be used for?
Batch processing
What does a VPC endpoint allow connections between?
a virtual private cloud (VPC) and supported services, such as SageMaker, without requiring that you use an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.
What is an interface endpoint?
an elastic network interface with a private IP address from the IP address range of your subnet.
What traffic does an interface endpoint serve?
traffic destined for a service that is owned by AWS or owned by an AWS customer or partner.
What is a gateway endpoint used for?
traffic destined for either Amazon S3 or DynamoDB
What is the AWS Key Management Service used for?
to create and control the cryptographic keys used to protect your data
What is a permissions boundary?
an advanced feature for using a managed policy to set the maximum permissions that an identity-based policy can grant an IAM entity
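A toy sketch of the "maximum permissions" idea: the actions an entity can actually use are those allowed by both its identity-based policy and its boundary. The two sets below are made-up examples, and this deliberately ignores explicit denies, resource-based policies, and SCPs, so it is an illustration rather than IAM's real evaluation logic.

```python
# Hypothetical sets of allowed actions (illustrative only).
identity_policy_allows = {"s3:GetObject", "s3:PutObject", "ec2:RunInstances"}
permissions_boundary_allows = {"s3:GetObject", "s3:PutObject", "sagemaker:CreateTrainingJob"}

# An action is effective only if BOTH the policy and the boundary allow it.
effective = identity_policy_allows & permissions_boundary_allows

print(sorted(effective))  # ['s3:GetObject', 's3:PutObject']
```

Note that the boundary grants nothing on its own: sagemaker:CreateTrainingJob is in the boundary but not in the identity policy, so it is not effective.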
What would you use to encrypt data at rest?
AWS Key Management Service to manage encryption keys
How would you limit the permissions of the root user?
AWS Organizations service control policy (SCP)
How would you connect to the SageMaker API or the SageMaker Runtime?
through an interface endpoint in your VPC.
Define residual
the error between the predicted value and the observed actual value.
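A minimal worked example, assuming the usual sign convention of observed minus predicted. The data values and the helper name `residuals` are invented for illustration.

```python
# Hypothetical observed values and model predictions (illustrative data only).
observed = [3.0, 5.0, 7.5, 9.0]
predicted = [2.5, 5.5, 7.0, 10.0]


def residuals(y_true, y_pred):
    """Return observed minus predicted for each pair of values."""
    return [actual - estimate for actual, estimate in zip(y_true, y_pred)]


res = residuals(observed, predicted)
print(res)  # [0.5, -0.5, 0.5, -1.0]
```

A positive residual means the model under-predicted that point; a negative one means it over-predicted.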
When is a linear model not suitable for a problem? (residuals)
When the variance of the residuals is not constant. (Residuals do not form a zero-centred bell curve.)
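One rough way to spot non-constant variance is to compare the spread of the residuals for low and high predicted values. The sketch below is a crude check on made-up data, not a formal statistical test; the helper name `spread_by_half` is hypothetical.

```python
from statistics import pvariance


def spread_by_half(predicted, residuals):
    """Return residual variance for the lower and upper half of predictions."""
    pairs = sorted(zip(predicted, residuals))  # order residuals by predicted value
    middle = len(pairs) // 2
    lower = [r for _, r in pairs[:middle]]
    upper = [r for _, r in pairs[middle:]]
    return pvariance(lower), pvariance(upper)


# Hypothetical data: residuals fan out as the predicted value grows.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
residuals = [0.1, -0.1, 0.1, -1.0, 1.0, -1.0]

low_var, high_var = spread_by_half(predicted, residuals)
print(low_var, high_var)
```

If the two variances differ widely, as they do here, the constant-variance assumption is doubtful and a plain linear model may be a poor fit.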
You are training a neural network and notice it is scoring highly on the training data but not on the test data. What do you do?
- Use dropout
- Use early stopping while training
- Add parameter regularization
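Early stopping can be sketched without any deep learning framework: watch the validation loss and stop once it has not improved for a set number of epochs. The function name, the `patience` parameter, and the loss values below are assumptions for illustration, not a specific library's API.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs. Epochs are 0-indexed.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through every epoch.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improving at first, then rising (overfitting).
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch, best_epoch)  # 4 2
```

In practice you would keep the model weights saved at the best epoch rather than those from the epoch where training stopped.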
What is regularization?
a set of techniques that lower the complexity of a neural network model during training, and thus prevent overfitting
What does L1 (lasso) regularization do?
unimportant features get weights of zero
What does L2 (ridge) regularization do?
unimportant features' weights are forced NEAR zero (not zero)
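The difference between the two penalties shows up clearly in the textbook special case of orthonormal features, where each unregularized weight is shrunk independently: L1 soft-thresholds it, L2 scales it down. The sketch below assumes that special case; the function names, weights, and penalty strength `lam` are invented for illustration.

```python
def lasso_shrink(weight, lam):
    """L1 (soft-thresholding): weights with magnitude <= lam become exactly 0."""
    if weight > lam:
        return weight - lam
    if weight < -lam:
        return weight + lam
    return 0.0


def ridge_shrink(weight, lam):
    """L2: every weight is scaled toward zero but never reaches it."""
    return weight / (1.0 + lam)


# Hypothetical unregularized weights; the last two belong to weak features.
unregularized = [4.0, -2.0, 0.5, -0.25]
lam = 1.0

l1_weights = [lasso_shrink(w, lam) for w in unregularized]
l2_weights = [ridge_shrink(w, lam) for w in unregularized]

print(l1_weights)  # [3.0, -1.0, 0.0, 0.0]
print(l2_weights)  # [2.0, -1.0, 0.25, -0.125]
```

The weak features end up at exactly zero under L1, which is why lasso doubles as feature selection, while under L2 they stay small but non-zero.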