Compute Flashcards
Explain the differences between On-demand, Spot, and Reserved instances, beyond just cost.
- On-demand: Flexible, pay-as-you-go, ideal for unpredictable workloads.
- Spot: Lowest price, can be interrupted, perfect for fault-tolerant or short-lived tasks. Requires careful consideration of interruption handling.
- Reserved: Long-term commitment, lower cost than On-demand, suitable for stable applications and databases. Offers different payment options (No Upfront, Partial Upfront, All Upfront) with varying discounts.
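As a rough illustration of how the purchase option is expressed in practice, the sketch below requests a Spot instance with boto3. The AMI ID, instance type, and interruption behavior are placeholder assumptions, not a recommendation.

```python
import boto3

ec2 = boto3.client("ec2")

# Request a Spot instance through the standard RunInstances call.
# The AMI ID and instance type below are placeholders.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical AMI ID
    InstanceType="m5.large",
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            # Terminate (rather than stop or hibernate) on interruption;
            # the workload must tolerate losing the instance at any time.
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
print(response["Instances"][0]["InstanceId"])
```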
Apart from cost, name at least two other factors to consider when choosing an EC2 instance type.
Performance characteristics (vCPU, memory, storage), network performance, availability of the instance type in your target Region and Availability Zone, operating system and architecture support, and security features (such as encryption support). Match these factors to the specific requirements of your application.
How does AWS Auto Scaling work with services like EMR? What are the benefits?
Auto Scaling dynamically adjusts the number of EC2 instances in your EMR cluster based on workload demands. Benefits include:
- Efficiency: Ensures optimal resource allocation for processing.
- Cost optimization: Scales down when demand is low, saving costs.
- Fault tolerance: Automatically replaces unhealthy instances.
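One way to enable this is EMR managed scaling. The sketch below attaches a managed scaling policy with boto3; the cluster ID and capacity limits are illustrative assumptions.

```python
import boto3

emr = boto3.client("emr")

# Attach a managed scaling policy so EMR adds or removes capacity
# between the configured limits as the workload changes.
emr.put_managed_scaling_policy(
    ClusterId="j-EXAMPLECLUSTER",  # hypothetical cluster ID
    ManagedScalingPolicy={
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": 2,
            "MaximumCapacityUnits": 10,
        }
    },
)
```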
Describe the role of Master Node, Core Node, and Task Node in an EMR cluster, highlighting their differences.
- Master Node: Manages the cluster, tracks job progress, and schedules tasks. A cluster typically has a single Master Node.
- Core Node: Stores data in HDFS (Hadoop Distributed File System) and runs tasks. Contributes to both storage and processing.
- Task Node: Only runs tasks, does not store HDFS data. Useful for increasing processing capacity without adding more storage.
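These roles map directly onto instance groups when a cluster is created. A minimal sketch with boto3 follows; the release label, instance types, counts, and role names are assumptions.

```python
import boto3

emr = boto3.client("emr")

# Create a cluster with explicit MASTER, CORE, and TASK instance groups.
# Instance types, counts, and the release label are illustrative only.
response = emr.run_job_flow(
    Name="example-cluster",
    ReleaseLabel="emr-6.15.0",
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",   # HDFS storage + tasks
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
            {"Name": "Task", "InstanceRole": "TASK",   # tasks only, no HDFS
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```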
What is AWS Graviton designed for? What are its advantages?
Graviton is a family of Arm-based processors designed by AWS for optimal performance and cost-efficiency on the AWS cloud. Advantages include:
- Price-performance: Often provides a better price-to-performance ratio than x86-based instances.
- Energy efficiency: Can lead to lower energy consumption and a reduced carbon footprint.
- Wide service support: Powers a growing number of AWS services.
Compare the different Graviton instance families (M, C, R, X, I, G) and provide example use cases for each.
- M (General purpose): Web servers, microservices, application servers (M7g, M6g, T4g).
- C (Compute optimized): High-performance computing (HPC), batch processing, gaming (C7g, C6g).
- R (Memory optimized): In-memory databases, real-time analytics (R7g, R6g).
- X (Memory optimized, highest memory per vCPU): HPC, large-scale data analysis, in-memory databases requiring high memory bandwidth (X2gd).
- I (Storage optimized): NoSQL databases, transactional workloads, data warehousing (Im4gn, Is4gen).
- G (Accelerated computing): Machine learning (ML) inference, graphics-intensive applications, game streaming (G5g).
What does “serverless” truly mean in the context of AWS Lambda? What are the key benefits of serverless architectures?
“Serverless” means you don’t have to manage servers, operating systems, or infrastructure. AWS handles all of that. Benefits:
- Focus on code: Developers can concentrate on writing code instead of managing infrastructure.
- Automatic scaling: Lambda automatically scales your application based on demand.
- Cost-efficiency: Pay only for the compute time you consume.
- Reduced operational overhead: No server maintenance, patching, or monitoring.
Describe how Lambda integrates with other AWS services, using at least two specific examples.
Lambda integrates seamlessly with many AWS services:
- With API Gateway: Create serverless APIs where Lambda functions handle requests.
- With S3: Trigger Lambda functions when objects are uploaded or modified in S3 buckets.
- With Kinesis: Process streaming data in real-time using Lambda.
- With DynamoDB: Use Lambda to process data changes in DynamoDB tables (e.g., trigger a function when a new item is added).
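For instance, the S3 integration above can be as small as the handler sketch below. The event shape follows the standard S3 notification format; the processing logic is a placeholder.

```python
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")  # created once, outside the handler, so it is reused


def lambda_handler(event, context):
    """Invoked by an S3 object-created notification."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        obj = s3.get_object(Bucket=bucket, Key=key)
        body = obj["Body"].read()
        # Placeholder: replace with real processing of the uploaded object.
        print(json.dumps({"bucket": bucket, "key": key, "size": len(body)}))
```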
What are some best practices for optimizing Lambda function performance?
- Minimize package size: Smaller deployment packages lead to faster cold starts.
- Optimize memory allocation: Choose the right memory setting for your function’s needs.
- Use environment variables: Store configuration data in environment variables instead of hardcoding them.
- Implement efficient code: Write code that executes quickly and avoids unnecessary operations.
- Leverage concurrency controls: Manage concurrency to avoid overloading downstream resources.
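A short sketch of two of these practices, reusing clients across invocations and reading configuration from environment variables. The table and variable names are assumptions.

```python
import os

import boto3

# Initialization outside the handler runs once per execution environment,
# so clients and configuration are reused across warm invocations.
TABLE_NAME = os.environ["TABLE_NAME"]          # hypothetical environment variable
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)


def lambda_handler(event, context):
    # Keep per-invocation work minimal: one write, no redundant setup.
    table.put_item(Item={"pk": event["id"], "payload": event.get("payload", "")})
    return {"status": "ok"}
```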
Explain the concept of “cold starts” in Lambda. How can you mitigate their impact?
A cold start happens when a Lambda function is invoked for the first time or after a period of inactivity. It takes longer because AWS has to set up the execution environment. Mitigation strategies:
- Provisioned concurrency: Keep functions “warm” by provisioning a minimum number of instances.
- Optimize function code: Reduce initialization time by minimizing dependencies and optimizing code.
- Keep function packages small: Smaller packages download and initialize faster.
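Provisioned concurrency is configured per function version or alias. A minimal sketch with boto3, where the function name, alias, and capacity are assumptions:

```python
import boto3

lambda_client = boto3.client("lambda")

# Keep 5 execution environments initialized for the "prod" alias so those
# invocations avoid cold starts. Name, alias, and capacity are illustrative.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="prod",
    ProvisionedConcurrentExecutions=5,
)
```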
What are some common challenges and pitfalls when working with Lambda? How can you address them?
- Cold starts: (See previous card for mitigation)
- Timeouts: Functions have a maximum execution time. Break down long-running tasks or use alternative services like ECS or EC2.
- State management: Lambda functions are stateless. Use external services like DynamoDB or S3 to store state.
- Debugging and monitoring: Can be more challenging than traditional applications. Use CloudWatch Logs, X-Ray, and other monitoring tools.
Why is the COPY command the preferred method for loading data into Amazon Redshift?
COPY is optimized for high-throughput loading: it reads data in parallel from sources such as Amazon S3 and distributes the work across the cluster's compute nodes, making it significantly faster than INSERT statements for large datasets. It also supports a variety of data formats and sources.
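A sketch of a typical COPY from S3, issued here through the Redshift Data API. The table, bucket, IAM role, and cluster identifiers are assumptions.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# COPY pulls the S3 files in parallel across the cluster's nodes,
# which is why it outperforms row-by-row INSERT statements.
copy_sql = """
    COPY analytics.events
    FROM 's3://example-bucket/events/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS PARQUET;
"""

redshift_data.execute_statement(
    ClusterIdentifier="example-cluster",  # hypothetical cluster
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)
```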
How does Lambda interact with Kinesis streams? What are some considerations for processing Kinesis data with Lambda?
Lambda processes Kinesis data in batches.
- Batch size: Configure the batch size to balance throughput and processing time. Large batches can cause timeouts.
- Payload limit: Batches may be split if they exceed Lambda’s payload limit (6 MB).
- Error handling: Lambda retries failed batches, which can stall the shard. Use appropriate error handling and multiple shards.
- Per-shard ordering: Lambda processes records from each shard in order and waits for a batch to finish before fetching the next batch from that shard.
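A minimal handler sketch for a Kinesis event source: each record's data arrives base64-encoded, and the per-record processing here is a placeholder.

```python
import base64
import json


def lambda_handler(event, context):
    """Invoked with a batch of records from one Kinesis shard."""
    for record in event["Records"]:
        # Kinesis record payloads are base64-encoded bytes.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)  # assumes the producer wrote JSON
        # Placeholder: replace with real per-record processing.
        print(message)
    # Raising an exception here causes the whole batch to be retried,
    # so handle per-record errors deliberately.
    return {"batchSize": len(event["Records"])}
```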
Explain the factors that contribute to the cost of running Lambda functions.
- Requests: You are charged for the number of times your function is invoked.
- Compute time: The duration of your function’s execution, measured in milliseconds.
- Memory allocated: The amount of memory you allocate to your function. Higher memory usually means faster execution but higher cost.
- Data transfer: Data transfer in and out of your function can incur costs.
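A back-of-the-envelope sketch of how requests, duration, and memory combine into a bill. The rates below are illustrative placeholders, not current pricing.

```python
# Illustrative monthly estimate for a single function. The rates are
# placeholders; check the current Lambda price list for real numbers.
invocations_per_month = 5_000_000
avg_duration_ms = 120
memory_mb = 512

price_per_million_requests = 0.20   # assumed rate, USD
price_per_gb_second = 0.0000167     # assumed rate, USD

gb_seconds = invocations_per_month * (avg_duration_ms / 1000) * (memory_mb / 1024)
duration_cost = gb_seconds * price_per_gb_second
request_cost = (invocations_per_month / 1_000_000) * price_per_million_requests

print(f"GB-seconds: {gb_seconds:,.0f}")
print(f"Estimated monthly cost: ${duration_cost + request_cost:,.2f}")
```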
How does Lambda handle errors? What are the implications for function design?
How Lambda handles errors depends on the invocation type: synchronous callers receive the error directly, asynchronous invocations are retried up to two more times by default, and stream-based sources such as Kinesis keep retrying the failed batch. Automatic retries help with transient errors; for persistent errors, implement proper error handling, logging, and dead-letter queues (DLQs) so failed events are captured instead of being retried indefinitely, as in the sketch below.
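One way to wire up a dead-letter queue for asynchronous invocations; the function name and queue ARN are assumptions, and the function's execution role needs permission to send messages to the queue.

```python
import boto3

lambda_client = boto3.client("lambda")

# Send events that still fail after the automatic retries to an SQS queue
# so they can be inspected and replayed instead of being lost.
lambda_client.update_function_configuration(
    FunctionName="my-function",  # hypothetical function name
    DeadLetterConfig={
        "TargetArn": "arn:aws:sqs:us-east-1:123456789012:my-function-dlq"
    },
)
```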