Compute Flashcards

Question 1

Q

Explain the differences between On-demand, Spot, and Reserved instances, beyond just cost.

Answer

A

On-demand: Flexible, pay-as-you-go, ideal for unpredictable workloads.
Spot: Spare EC2 capacity at the deepest discount. AWS can reclaim an instance with a two-minute warning, so Spot fits fault-tolerant, stateless, or short-lived tasks. Plan for interruptions (checkpointing, retries, graceful shutdown).
Reserved: A one- or three-year commitment in exchange for a lower rate than On-demand, suited to steady workloads such as always-on applications and databases. Payment options (No Upfront, Partial Upfront, All Upfront) trade more upfront payment for a larger discount.
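
Interruption handling on Spot can be sketched as follows. The instance metadata service exposes an interruption notice (under `spot/instance-action`) as a small JSON document with an `action` and a `time`. This sketch only parses such a document and computes the time left. The function name and the sample values are assumptions for illustration, and the HTTP call to the metadata service is left out so the example runs anywhere.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_body: str, now: datetime) -> float:
    """Return the seconds left before the Spot instance is reclaimed."""
    notice = json.loads(notice_body)
    # The notice carries a UTC timestamp such as 2024-01-01T08:22:00Z.
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


# Hypothetical notice body and clock reading, for illustration only.
sample_notice = '{"action": "terminate", "time": "2024-01-01T08:22:00Z"}'
now = datetime(2024, 1, 1, 8, 20, 0, tzinfo=timezone.utc)
remaining = seconds_until_interruption(sample_notice, now)
```

A worker could poll for the notice and use the remaining time to checkpoint its progress and stop taking new work.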

Question 2

Q

Apart from cost, name at least two other factors to consider when choosing an EC2 instance type.

Answer

A

Compute, memory, and storage performance; network performance; availability of the instance type in your Region and Availability Zones; operating system support; and security features (such as encryption support). Match these to the specific requirements of your application.

Question 3

Q

How does AWS Auto Scaling work with services like EMR? What are the benefits?

Answer

A

EMR can add or remove EC2 instances (core and task nodes) in your cluster as workload demand changes, using either EMR managed scaling or custom automatic scaling policies based on CloudWatch metrics. Benefits include:

Efficiency: Matches cluster capacity to the current processing load.
Cost optimization: Scales down when demand is low, saving costs.
Fault tolerance: Automatically replaces unhealthy instances.

Question 4

Q

Describe the role of Master Node, Core Node, and Task Node in an EMR cluster, highlighting their differences.

Answer

A

Master Node: Manages the cluster, tracks job progress, and schedules tasks. A cluster has one Master Node by default; newer EMR releases (5.23.0 and later) can run three for high availability.
Core Node: Stores data in HDFS (Hadoop Distributed File System) and runs tasks. Contributes to both storage and processing.
Task Node: Only runs tasks, does not store HDFS data. Useful for increasing processing capacity without adding more storage.
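
The three roles map onto the instance groups you describe when creating a cluster. Below is a minimal sketch of that configuration as a plain Python structure, in the shape boto3's EMR `run_job_flow` call accepts under `Instances.InstanceGroups`. The helper name, instance types, and counts are assumptions for illustration; placing task nodes on Spot is one common choice (they hold no HDFS data), not a requirement.

```python
from typing import Any, Dict, List


def build_instance_groups(core_count: int, task_count: int) -> List[Dict[str, Any]]:
    """Describe the master, core, and optional task groups of an EMR cluster."""
    groups = [
        {   # Coordinates the cluster; holds no HDFS data blocks.
            "Name": "Master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
            "InstanceType": "m5.xlarge", "InstanceCount": 1,
        },
        {   # Stores HDFS data and runs tasks.
            "Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
            "InstanceType": "m5.xlarge", "InstanceCount": core_count,
        },
    ]
    if task_count > 0:
        groups.append({
            # Compute only: losing a task node does not lose HDFS data.
            "Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
            "InstanceType": "m5.xlarge", "InstanceCount": task_count,
        })
    return groups


groups = build_instance_groups(core_count=2, task_count=4)
roles = [group["InstanceRole"] for group in groups]
```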

Question 5

Q

What is AWS Graviton designed for? What are its advantages?

Answer

A

Graviton is a family of Arm-based processors designed by AWS for optimal performance and cost-efficiency on the AWS cloud. Advantages include:

Price-performance: Often provides a better price-to-performance ratio than x86-based instances.
Energy efficiency: Can lead to lower energy consumption and a reduced carbon footprint.
Wide service support: Powers a growing number of AWS services.

Question 6

Q

Compare the different Graviton instance families (M, C, R, X, I, G) and provide example use cases for each.

Answer

A

M (General purpose): Web servers, microservices, application servers (e.g., M7g; the burstable T4g is also general purpose).
C (Compute optimized): High-performance computing (HPC), batch processing, gaming (e.g., C7g, C6g).
R (Memory optimized): In-memory databases, real-time analytics (e.g., R7g).
X (Memory optimized, with a higher memory-to-vCPU ratio than R): Large-scale data analysis and in-memory databases with very large memory needs (e.g., X2gd).
I (Storage optimized): NoSQL databases, transactional workloads, data warehousing (e.g., Im4gn, Is4gen).
G (Accelerated computing): Machine learning (ML) inference, graphics-intensive applications, game streaming (e.g., G5g).

Question 7

Q

What does “serverless” truly mean in the context of AWS Lambda? What are the key benefits of serverless architectures?

Answer

A

“Serverless” does not mean there are no servers; it means you don’t provision or manage servers, operating systems, or infrastructure. AWS handles all of that. Benefits:

Focus on code: Developers can concentrate on writing code instead of managing infrastructure.
Automatic scaling: Lambda automatically scales your application based on demand.
Cost-efficiency: Pay only for the compute time you consume.
Reduced operational overhead: No server maintenance or OS patching. You still monitor your functions, but not the underlying hosts.

Question 8

Q

Describe how Lambda integrates with other AWS services, using at least two specific examples.

Answer

A

Lambda integrates seamlessly with many AWS services:

With API Gateway: Create serverless APIs where Lambda functions handle requests.
With S3: Trigger Lambda functions when objects are uploaded or modified in S3 buckets.
With Kinesis: Process streaming data in real-time using Lambda.
With DynamoDB: Use Lambda to process data changes in DynamoDB tables (e.g., trigger a function when a new item is added).
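
As an illustration of the S3 integration, here is a minimal sketch of a handler that reads the bucket and object key from an S3 event notification. The bucket name, key, and sample event are hypothetical, and the sample is trimmed to the fields the handler reads.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Return the (bucket, key) pairs named in an S3 event notification."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (for example, spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append((bucket, key))
    return uploaded


# Hypothetical event, reduced to the fields used above.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "reports/q1+summary.csv"}}}
    ]
}
result = handler(sample_event, None)
```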

Question 9

Q

What are some best practices for optimizing Lambda function performance?

Answer

A

Minimize package size: Smaller deployment packages lead to faster cold starts.
Optimize memory allocation: Choose the right memory setting for your function’s needs. Lambda allocates CPU in proportion to memory, so more memory can also shorten execution time.
Use environment variables: Store configuration data in environment variables instead of hardcoding them.
Implement efficient code: Write code that executes quickly and avoids unnecessary operations.
Leverage concurrency controls: Manage concurrency to avoid overloading downstream resources.

Question 10

Q

Explain the concept of “cold starts” in Lambda. How can you mitigate their impact?

Answer

A

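A cold start happens when Lambda has to create a new execution environment: on the first invocation, after a period of inactivity, or when traffic scales beyond the environments already running. The invocation takes longer because AWS must download the code, start the runtime, and run the function’s initialization code. Mitigation strategies:

Provisioned concurrency: Keep a configured number of execution environments initialized (“warm”) and ready to respond.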
Optimize function code: Reduce initialization time by minimizing dependencies and optimizing code.
Keep function packages small: Smaller packages download and initialize faster.
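
One way to reduce initialization cost per request is to do expensive setup at module scope, where it runs once per execution environment and is reused by later (warm) invocations. The sketch below simulates this locally. `_load_expensive_resources` and `INIT_COUNT` are hypothetical names that stand in for work such as creating SDK clients or loading configuration.

```python
import time

INIT_COUNT = 0  # counts how often the expensive setup has run


def _load_expensive_resources():
    """Stand-in for slow setup such as creating clients or loading config."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"loaded_at": time.time()}


# Module scope: executed once when the execution environment is created.
RESOURCES = _load_expensive_resources()


def handler(event, context):
    # Warm invocations reuse RESOURCES instead of rebuilding it.
    return {"init_count": INIT_COUNT, "loaded_at": RESOURCES["loaded_at"]}


first = handler({}, None)
second = handler({}, None)
```

Both calls report the same `loaded_at` value because the setup ran only once.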

Question 11

Q

What are some common challenges and pitfalls when working with Lambda? How can you address them?

Answer

A

Cold starts: (See previous card for mitigation)
Timeouts: Functions have a maximum execution time of 15 minutes. Break down long-running tasks or use alternative services like ECS or EC2.
State management: Lambda functions are stateless. Use external services like DynamoDB or S3 to store state.
Debugging and monitoring: Can be more challenging than traditional applications. Use CloudWatch Logs, X-Ray, and other monitoring tools.
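
For the timeout pitfall, a function can check how much time it has left and stop cleanly instead of being cut off mid-task. The sketch below relies on the Lambda context method `get_remaining_time_in_millis()`. `FakeContext`, the safety margin, and the event shape are assumptions so the example runs locally.

```python
SAFETY_MARGIN_MS = 10_000  # assumed buffer; tune for your workload


class FakeContext:
    """Hypothetical stand-in for the Lambda context object, for local runs."""

    def __init__(self, remaining_ms_sequence):
        self._values = iter(remaining_ms_sequence)

    def get_remaining_time_in_millis(self):
        return next(self._values)


def handler(event, context):
    """Process items until time runs short, then report what is left."""
    items = event["items"]
    done = []
    for index, item in enumerate(items):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Hand back the unprocessed tail so a later invocation can resume.
            return {"processed": done, "remaining": items[index:]}
        done.append(item.upper())
    return {"processed": done, "remaining": []}


result = handler({"items": ["a", "b", "c"]},
                 FakeContext([30_000, 20_000, 5_000]))
```

The returned `remaining` list could be re-queued or stored externally, in line with the state management point above.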

Question 12

Q

Why is the COPY command the preferred method for loading data into Amazon Redshift?

Answer

A

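COPY is optimized for high-throughput data loading: it reads from multiple files in parallel across the cluster’s compute nodes, which makes it significantly faster than INSERT statements, especially for large datasets. It supports several data formats (such as CSV, JSON, and Parquet) and sources (Amazon S3, Amazon EMR, DynamoDB, and remote hosts over SSH).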
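
A COPY statement for a CSV load from S3 might look like the one this sketch builds. The helper name, table name, bucket, and role ARN are hypothetical placeholders, and the string formatting is only suitable for trusted, hard-coded identifiers.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for CSV files that have a header row."""
    # No escaping is done here: pass only trusted, hard-coded values.
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


statement = build_copy_statement(
    table="sales",
    s3_uri="s3://example-bucket/sales/",  # a prefix lets COPY load many files
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
```

Pointing COPY at a prefix that holds several files, rather than one large file, gives it more to load in parallel.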

Question 13

Q

How does Lambda interact with Kinesis streams? What are some considerations for processing Kinesis data with Lambda?

Answer

A

Lambda processes Kinesis data in batches.

Batch size: Configure the batch size to balance throughput and processing time. Large batches can cause timeouts.
Payload limit: Batches may be split if they exceed Lambda’s payload limit (6 MB).
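Error handling: By default, Lambda retries a failed batch until it succeeds or the records expire, which blocks the rest of that shard. Limit the impact with a maximum retry count, batch bisection on error, or an on-failure destination, and spread load across multiple shards.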
Synchronous processing: Lambda processes shard data synchronously, meaning it waits for processing to complete before fetching the next batch.

Question 14

Q

Explain the factors that contribute to the cost of running Lambda functions.

Answer

A

Requests: You are charged for the number of times your function is invoked.
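Compute time: The duration of your function’s execution, measured in milliseconds and billed in GB-seconds (duration multiplied by allocated memory).
Memory allocated: The amount of memory you allocate to your function. Lambda allocates CPU in proportion to memory, so a higher setting usually means faster execution at a higher price per millisecond.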
Data transfer: Data transfer in and out of your function can incur costs.
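
A rough estimate of the request and compute charges can be worked out as below. The default rates are illustrative placeholders, not authoritative pricing; check the current price list for your Region and architecture. The sketch ignores the free tier and data transfer.

```python
def estimate_monthly_cost(requests, avg_duration_ms, memory_mb,
                          price_per_million_requests=0.20,
                          price_per_gb_second=0.0000166667):
    """Estimate request plus compute charges (rates are assumed placeholders)."""
    request_cost = requests / 1_000_000 * price_per_million_requests
    # Compute is billed in GB-seconds: duration (s) x allocated memory (GB).
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost


# One million invocations, 100 ms each, with 512 MB allocated.
cost = estimate_monthly_cost(1_000_000, 100, 512)
```

With these assumed rates the example comes to roughly 1.03 (0.20 for requests plus about 0.83 for 50,000 GB-seconds of compute).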

Question 15

Q

How does Lambda handle errors? What are the implications for function design?

Answer

A

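Error handling depends on how the function is invoked. For asynchronous invocations, Lambda retries a failed execution twice by default (three attempts in total). Synchronous invocations are not retried by Lambda, so the caller must retry. Stream sources such as Kinesis retry a failed batch until it succeeds or the data expires. Retries help with transient errors. For persistent errors, implement proper error handling and logging, make handlers idempotent, and configure a dead-letter queue (DLQ) or on-failure destination so failed events are captured instead of being lost or retried endlessly.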
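
Because a failed invocation may be retried, the same event can reach a function more than once, so handlers should be idempotent. A minimal sketch follows, assuming each event carries a unique `id`. The in-memory set stands in for a durable store such as a DynamoDB table, and all names are hypothetical.

```python
PROCESSED_IDS = set()      # stand-in for a durable store (e.g., DynamoDB)
BALANCE = {"total": 0}     # stand-in for the side effect being protected


def handler(event, context):
    """Apply an event's amount once, even if the event is delivered again."""
    event_id = event["id"]
    if event_id in PROCESSED_IDS:
        return "duplicate"  # already applied: a retry changes nothing
    BALANCE["total"] += event["amount"]
    PROCESSED_IDS.add(event_id)
    return "applied"


first = handler({"id": "evt-1", "amount": 25}, None)
retry = handler({"id": "evt-1", "amount": 25}, None)
```

In a real function the check and the update would need to be atomic in the external store, since separate execution environments do not share memory.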

Question 16

Q

What is the purpose of the concurrency limit in Lambda? How can you manage concurrency for your functions?

Answer

A

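Concurrency is the number of requests a function is handling at the same time. Lambda enforces an account-level concurrency limit per Region (1,000 by default, adjustable on request) that all functions share, so the limit and per-function settings keep one busy function from consuming excessive resources and impacting other applications. You can:

Reserve concurrency: Set aside part of the account limit for your function. The reserved value is also the function’s maximum concurrency.
Use provisioned concurrency: Ensure a minimum level of concurrency to reduce cold starts.
Implement throttling mechanisms: Control the rate of requests to your function.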
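
Reserved concurrency is set through the Lambda API operation `PutFunctionConcurrency`, exposed in boto3 as `put_function_concurrency`. The sketch below takes the client as a parameter so it can run offline. `RecordingClient`, `cap_concurrency`, and the function name are hypothetical; in practice you would pass `boto3.client("lambda")`.

```python
class RecordingClient:
    """Hypothetical stand-in for a boto3 Lambda client; records calls only."""

    def __init__(self):
        self.calls = []

    def put_function_concurrency(self, **kwargs):
        self.calls.append(kwargs)
        return kwargs


def cap_concurrency(client, function_name, limit):
    """Reserve (and thereby cap) concurrency for one function."""
    if limit < 0:
        raise ValueError("limit must be zero or greater")
    return client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )


client = RecordingClient()
cap_concurrency(client, "order-processor", 50)
```

A low cap is one way to protect a downstream resource, such as a database with a limited number of connections.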

Question 17

Q

What are the key differences between AWS Fargate and AWS Lambda? When would you choose one over the other?

Answer

A

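Fargate: Runs containers without you managing the underlying instances, provides more control over the runtime environment, and has no execution time limit. Suitable for long-running applications and microservices.
Lambda: Runs individual functions in response to events, billed per request and per millisecond of execution, with a 15-minute maximum run time. Ideal for event-driven and short-lived tasks.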

Question 18

Q

Compare and contrast Amazon ElastiCache, Amazon RDS, and Amazon DynamoDB. What types of applications are they best suited for?

Answer

A

ElastiCache: In-memory data store (Redis, Memcached), ideal for caching, session management, and real-time analytics.
RDS: Managed relational databases (MySQL, PostgreSQL, Oracle, etc.), suitable for transactional workloads and applications requiring ACID properties.
DynamoDB: Managed NoSQL key-value and document database, highly scalable and available, good for applications with flexible data models and high read and write throughput.

Question 19

Q

Explain the use cases for Amazon OpenSearch Service, AWS Data Pipeline, AWS Glue, Amazon Redshift, Amazon S3, and Amazon Kinesis.

Answer

A

OpenSearch Service: Search and analytics, log analytics, real-time application monitoring.
Data Pipeline: Automating data movement and transformation between different data sources.
Glue: Data integration, ETL (Extract, Transform, Load), data cataloging.
Redshift: Data warehousing, large-scale data analysis, business intelligence.
S3: Object storage, data lakes, backups, archives, content distribution.
Kinesis: Real-time data streaming, log ingestion, application activity tracking.

(19 cards)