Compute Flashcards

1
Q

Explain the differences between On-demand, Spot, and Reserved instances, beyond just cost.

A
  • On-demand: Flexible, pay-as-you-go, ideal for unpredictable workloads.
  • Spot: Deepest discount, but EC2 can reclaim the capacity at short notice (a two-minute warning). Suited to fault-tolerant or short-lived tasks that handle interruption gracefully.
  • Reserved: One- or three-year commitment, lower cost than On-demand, suitable for stable applications and databases. Offers different payment options (No Upfront, Partial Upfront, All Upfront) with varying discounts.
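
Spot interruption handling usually starts from the notice EC2 publishes in instance metadata (under spot/instance-action) shortly before reclaiming an instance. A minimal sketch, assuming the notice body has already been fetched; the function name and the sample notice are illustrative only:

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_body: str, now: datetime) -> float:
    """Return the seconds left before the action named in a Spot notice."""
    notice = json.loads(notice_body)
    # The notice carries a UTC timestamp such as "2024-01-01T12:02:00Z".
    deadline = datetime.strptime(
        notice["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


# Hypothetical notice and clock reading, for illustration only.
sample_notice = '{"action": "terminate", "time": "2024-01-01T12:02:00Z"}'
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
remaining = seconds_until_interruption(sample_notice, now)
```

A worker could use the remaining time to checkpoint its progress and stop taking new work.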
2
Q

Apart from cost, name at least two other factors to consider when choosing an EC2 instance type.

A

Performance (CPU, memory, storage), network bandwidth, availability of the type in your target Availability Zones, operating system and processor architecture support, and security features (such as encryption support). Match these to the specific requirements of your application.

3
Q

How does AWS Auto Scaling work with services like EMR? What are the benefits?

A

Auto Scaling dynamically adjusts the number of EC2 instances in your EMR cluster based on workload demands. Benefits include:

  • Efficiency: Ensures optimal resource allocation for processing.
  • Cost optimization: Scales down when demand is low, saving costs.
  • Fault tolerance: Automatically replaces unhealthy instances.
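
The scale-out and scale-in idea can be sketched as a simple threshold rule on a cluster metric. This is not EMR's actual algorithm; the thresholds, node limits, and function name are assumptions, and the metric is modeled on the YARNMemoryAvailablePercentage CloudWatch metric:

```python
def desired_task_nodes(current, yarn_memory_available_pct,
                       min_nodes=1, max_nodes=10):
    """Return the task-node count a simple threshold rule would ask for."""
    if yarn_memory_available_pct < 15:
        target = current + 1   # memory is scarce: add a node
    elif yarn_memory_available_pct > 75:
        target = current - 1   # memory is mostly idle: remove a node
    else:
        target = current       # within the comfortable band: no change
    # Never leave the configured minimum and maximum.
    return max(min_nodes, min(max_nodes, target))
```

Bounding the result keeps the rule from scaling to zero or growing without limit.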
4
Q

Describe the role of Master Node, Core Node, and Task Node in an EMR cluster, highlighting their differences.

A
  • Master Node: Manages the cluster, tracks job progress, and schedules tasks. A cluster has one Master Node by default; EMR 5.23.0 and later can run three for high availability.
  • Core Node: Stores data in HDFS (Hadoop Distributed File System) and runs tasks. Contributes to both storage and processing.
  • Task Node: Only runs tasks, does not store HDFS data. Useful for increasing processing capacity without adding more storage.
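
The three roles map onto instance groups when a cluster is defined. A minimal sketch that only builds the configuration, shaped like the InstanceGroups parameter of boto3's EMR run_job_flow call; the group names, default instance type, and helper name are assumptions, and nothing is sent to AWS:

```python
def instance_groups(core_count, task_count, instance_type="m5.xlarge"):
    """Build an InstanceGroups-style list with one group per node role."""
    groups = [
        # Master: coordinates the cluster.
        {"Name": "Master", "InstanceRole": "MASTER",
         "InstanceType": instance_type, "InstanceCount": 1},
        # Core: runs tasks and stores HDFS data.
        {"Name": "Core", "InstanceRole": "CORE",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task: extra processing capacity with no HDFS storage.
        groups.append({"Name": "Task", "InstanceRole": "TASK",
                       "InstanceType": instance_type,
                       "InstanceCount": task_count})
    return groups


groups = instance_groups(core_count=2, task_count=4)
```

Because Task Nodes hold no HDFS data, the task group is the one that can be resized or dropped with the least risk.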
5
Q

What is AWS Graviton designed for? What are its advantages?

A

Graviton is a family of Arm-based processors designed by AWS for optimal performance and cost-efficiency on the AWS cloud. Advantages include:

  • Price-performance: Often provides a better price-to-performance ratio than x86-based instances.
  • Energy efficiency: Can lead to lower energy consumption and a reduced carbon footprint.
  • Wide service support: Powers a growing number of AWS services.
6
Q

Compare the different Graviton instance families (M, C, R, X, I, G) and provide example use cases for each.

A
  • M (General purpose): Web servers, microservices, application servers (M7g; the burstable T4g is also general purpose).
  • C (Compute optimized): High-performance computing (HPC), batch processing, gaming (C7g, C6g).
  • R (Memory optimized): In-memory databases, real-time analytics (R7g, R6g).
  • X (Memory optimized): HPC, large-scale data analysis, in-memory databases that need more memory per vCPU than the R family (X2gd).
  • I (Storage optimized): NoSQL databases, transactional workloads, data warehousing (Im4gn, Is4gen).
  • G (Accelerated computing): Machine learning (ML) inference, graphics-intensive applications, game streaming (G5g).
7
Q

What does “serverless” truly mean in the context of AWS Lambda? What are the key benefits of serverless architectures?

A

“Serverless” means you don’t have to manage servers, operating systems, or infrastructure. AWS handles all of that. Benefits:

  • Focus on code: Developers can concentrate on writing code instead of managing infrastructure.
  • Automatic scaling: Lambda automatically scales your application based on demand.
  • Cost-efficiency: Pay only for the compute time you consume.
  • Reduced operational overhead: No server maintenance or OS patching. You still monitor your own functions (for example, with CloudWatch).
8
Q

Describe how Lambda integrates with other AWS services, using at least two specific examples.

A

Lambda integrates seamlessly with many AWS services:

  • With API Gateway: Create serverless APIs where Lambda functions handle requests.
  • With S3: Trigger Lambda functions when objects are uploaded or modified in S3 buckets.
  • With Kinesis: Process streaming data in real-time using Lambda.
  • With DynamoDB: Use Lambda to process data changes in DynamoDB tables (e.g., trigger a function when a new item is added).
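
As an illustration of the S3 integration, a handler typically reads the bucket and object key out of each record in the event. A minimal sketch, assuming the standard S3 notification event shape; the bucket, key, and return value are made up for the example:

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect (bucket, key) pairs from an S3 notification event."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded, with spaces written as "+".
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append((bucket, key))
    return uploaded


# Hypothetical event containing a single uploaded object.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "reports/2024+summary.csv"}}}
    ]
}
result = handler(sample_event, None)
```

Decoding the key first avoids "object not found" errors when names contain spaces or special characters.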
9
Q

What are some best practices for optimizing Lambda function performance?

A
  • Minimize package size: Smaller deployment packages lead to faster cold starts.
  • Optimize memory allocation: Choose the right memory setting for your function’s needs.
  • Use environment variables: Store configuration data in environment variables instead of hardcoding them.
  • Implement efficient code: Write code that executes quickly and avoids unnecessary operations.
  • Leverage concurrency controls: Manage concurrency to avoid overloading downstream resources.
10
Q

Explain the concept of “cold starts” in Lambda. How can you mitigate their impact?

A

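A cold start happens when Lambda has to create a new execution environment: on the first invocation, after a period of inactivity, or when scaling out to serve more concurrent requests. It takes longer because AWS has to set up the environment and run your initialization code. Mitigation strategies: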

  • Provisioned concurrency: Keep functions “warm” by provisioning a minimum number of instances.
  • Optimize function code: Reduce initialization time by minimizing dependencies and optimizing code.
  • Keep function packages small: Smaller packages download and initialize faster.
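
One way to apply the "optimize function code" advice is to do expensive setup once at module scope, so that warm invocations reuse it. A minimal sketch; `_load_config` stands in for any costly initialization and is hypothetical:

```python
import time

_INIT_COUNT = 0


def _load_config():
    """Pretend to do expensive setup (load config, open connections)."""
    global _INIT_COUNT
    _INIT_COUNT += 1
    return {"loaded_at": time.time()}


# Module scope runs once per execution environment, during the cold start.
CONFIG = _load_config()


def handler(event, context):
    # Warm invocations reuse CONFIG instead of rebuilding it.
    return {"init_count": _INIT_COUNT, "echo": event}


first = handler({"n": 1}, None)
second = handler({"n": 2}, None)
```

Both calls report the same initialization count, which is the behavior to expect from invocations that land on the same warm environment.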
11
Q

What are some common challenges and pitfalls when working with Lambda? How can you address them?

A
  • Cold starts: (See previous card for mitigation)
  • Timeouts: Functions have a maximum execution time of 15 minutes. Break down long-running tasks or use alternative services like ECS or EC2.
  • State management: Lambda functions are stateless. Use external services like DynamoDB or S3 to store state.
  • Debugging and monitoring: Can be more challenging than traditional applications. Use CloudWatch Logs, X-Ray, and other monitoring tools.
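
For the timeout pitfall, a function can check how much time it has left and hand back unfinished work rather than being cut off. A minimal sketch using the context object's get_remaining_time_in_millis method; the event shape, safety margin, and FakeContext test double are assumptions:

```python
def handler(event, context, safety_margin_ms=10_000):
    """Process items until the remaining time drops below the margin."""
    items = list(event["items"])
    done = []
    while items and context.get_remaining_time_in_millis() > safety_margin_ms:
        done.append(items.pop(0) * 2)  # placeholder for real work
    # Unprocessed items are returned so the caller can requeue them.
    return {"done": done, "remaining": items}


class FakeContext:
    """Test double that replays a fixed series of remaining-time readings."""

    def __init__(self, readings):
        self._readings = iter(readings)

    def get_remaining_time_in_millis(self):
        return next(self._readings)


result = handler({"items": [1, 2, 3]}, FakeContext([30_000, 20_000, 5_000]))
```

The leftover items could be sent to a queue or passed to a follow-up invocation.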
12
Q

Why is the COPY command the preferred method for loading data into Amazon Redshift?

A

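COPY is optimized for high-throughput loading: it reads multiple files in parallel across the cluster’s node slices, so it’s significantly faster than INSERT statements, especially for large datasets. It handles several formats (such as CSV, JSON, Avro, and Parquet) and sources (Amazon S3, Amazon EMR, DynamoDB, or remote hosts over SSH).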
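
A typical COPY from S3 names the target table, the S3 prefix, and an IAM role. A minimal sketch that only builds the statement text; the table, bucket, and role ARN are placeholders, and the CSV options are one possible choice:

```python
def build_copy_statement(table, s3_prefix, iam_role_arn):
    """Return a Redshift COPY statement for CSV files under an S3 prefix.

    Values are interpolated directly, so pass trusted inputs only.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


# Placeholder names for illustration only.
statement = build_copy_statement(
    "sales",
    "s3://example-bucket/sales/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
```

Pointing COPY at a prefix holding several files, rather than one large file, is what lets the load run in parallel.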

13
Q

How does Lambda interact with Kinesis streams? What are some considerations for processing Kinesis data with Lambda?

A

Lambda processes Kinesis data in batches.

  • Batch size: Configure the batch size to balance throughput and processing time. Large batches can cause timeouts.
  • Payload limit: Batches may be split if they exceed Lambda’s payload limit (6 MB).
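  • Error handling: By default Lambda retries a failed batch until it succeeds or the records expire, which stalls that shard. Limit retry attempts, split failing batches, or send failures to an on-failure destination; handle bad records in code and spread load across multiple shards.
  • Synchronous processing: Lambda invokes the function synchronously for each shard and, by default, waits for one batch to finish before sending the next batch from that shard.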
14
Q

Explain the factors that contribute to the cost of running Lambda functions.

A
  • Requests: You are charged for the number of times your function is invoked.
  • Compute time: The duration of your function’s execution, measured in milliseconds.
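  • Memory allocated: The amount of memory you allocate to your function. Compute is billed in GB-seconds (memory × duration). CPU is allocated in proportion to memory, so more memory usually means faster execution at a higher per-millisecond rate.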
  • Data transfer: Data transfer in and out of your function can incur costs.
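
The request and compute components combine into a simple estimate. A rough sketch that ignores the free tier and data transfer; the rates are passed in as parameters, and the figures in the example call are placeholders to be checked against current pricing for your Region:

```python
def lambda_monthly_cost(invocations, avg_duration_ms, memory_mb,
                        price_per_million_requests, price_per_gb_second):
    """Estimate request plus compute charges for one month."""
    request_cost = invocations / 1_000_000 * price_per_million_requests
    # Compute is billed in GB-seconds: duration times allocated memory.
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * price_per_gb_second


# Placeholder workload and rates, for illustration only.
cost = lambda_monthly_cost(
    invocations=2_000_000,
    avg_duration_ms=500,
    memory_mb=1024,
    price_per_million_requests=0.20,
    price_per_gb_second=0.0000166667,
)
```

Because duration and memory multiply, doubling memory pays for itself only if it cuts execution time by at least half.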
15
Q

How does Lambda handle errors? What are the implications for function design?

A

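Retry behavior depends on how the function is invoked. For asynchronous invocations, Lambda retries a failed execution twice by default (three attempts in total). Synchronous invocations are not retried by Lambda; the caller decides whether to retry. Stream sources such as Kinesis retry the batch until it succeeds or the records expire. Retries help with transient errors. For persistent errors, implement proper error handling and logging, and configure a dead-letter queue (DLQ) or on-failure destination so failed events are captured instead of being lost or retried repeatedly.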
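
One design consequence is that a handler should decide which errors are worth retrying. A minimal sketch, assuming two hypothetical exception classes and a made-up `process` function: permanent errors are logged and swallowed, while transient ones propagate so Lambda's retry and DLQ handling apply:

```python
class TransientError(Exception):
    """A failure that may succeed on retry (for example, a brief outage)."""


class PermanentError(Exception):
    """A failure that retrying cannot fix (for example, invalid input)."""


def process(event):
    """Hypothetical business logic used only to trigger both error paths."""
    if "order_id" not in event:
        raise PermanentError("order_id is required")
    if event.get("simulate_outage"):
        raise TransientError("downstream service unavailable")
    return {"status": "ok", "order_id": event["order_id"]}


def handler(event, context):
    try:
        return process(event)
    except PermanentError as exc:
        # Retrying will not help: log the problem and finish normally.
        print(f"rejecting event: {exc}")
        return {"status": "rejected", "reason": str(exc)}
    # TransientError is not caught, so the invocation fails and is retried.
```

Because a retried event may be delivered more than once, the work done in `process` should be safe to repeat.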

16
Q

What is the purpose of the concurrency limit in Lambda? How can you manage concurrency for your functions?

A

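The concurrency limit caps how many function instances run at the same time. It protects the account’s shared pool (1,000 concurrent executions per Region by default) and keeps one function from starving others or overloading downstream resources. You can:

  • Reserve concurrency: Set aside part of the account’s concurrency for a function. The reserved amount is also that function’s maximum.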
  • Use provisioned concurrency: Ensure a minimum level of concurrency to reduce cold starts.
  • Implement throttling mechanisms: Control the rate of requests to your function.
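
Reserved and provisioned concurrency are both set through the Lambda API. A minimal sketch that calls the two boto3 Lambda client methods on whatever client is passed in; here a recording stub replaces the real client so nothing is sent to AWS, and the function name, alias, and numbers are placeholders:

```python
def apply_concurrency_settings(lambda_client, function_name, alias,
                               reserved, provisioned):
    """Set reserved concurrency, then provisioned concurrency on an alias."""
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=reserved,
    )
    lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,
        ProvisionedConcurrentExecutions=provisioned,
    )


class RecordingClient:
    """Stub that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_function_concurrency(self, **kwargs):
        self.calls.append(("put_function_concurrency", kwargs))

    def put_provisioned_concurrency_config(self, **kwargs):
        self.calls.append(("put_provisioned_concurrency_config", kwargs))


client = RecordingClient()
apply_concurrency_settings(client, "orders-handler", "live",
                           reserved=50, provisioned=5)
```

In real use the client would come from boto3.client("lambda"), and the provisioned value must fit within the reserved one.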
17
Q

What are the key differences between AWS Fargate and AWS Lambda? When would you choose one over the other?

A
  • Fargate: Runs containers, provides more control over the environment, suitable for long-running applications and microservices.
  • Lambda: Runs individual functions, billed per request and per millisecond of execution, ideal for event-driven and short-lived tasks (up to 15 minutes per invocation).
18
Q

Compare and contrast Amazon ElastiCache, Amazon RDS, and Amazon DynamoDB. What types of applications are they best suited for?

A
  • ElastiCache: In-memory data store (Redis, Memcached), ideal for caching, session management, and real-time analytics.
  • RDS: Managed relational databases (MySQL, PostgreSQL, Oracle, etc.), suitable for transactional workloads and applications requiring ACID properties.
  • DynamoDB: NoSQL database, highly scalable and available, good for applications with flexible data models and high write throughput.
19
Q

Explain the use cases for Amazon OpenSearch Service, AWS Data Pipeline, AWS Glue, Amazon Redshift, Amazon S3, and Amazon Kinesis.

A
  • OpenSearch Service: Search and analytics, log analytics, real-time application monitoring.
  • Data Pipeline: Automating data movement and transformation between different data sources.
  • Glue: Data integration, ETL (Extract, Transform, Load), data cataloging.
  • Redshift: Data warehousing, large-scale data analysis, business intelligence.
  • S3: Object storage, data lakes, backups, archives, content distribution.
  • Kinesis: Real-time data streaming, log ingestion, application activity tracking.