S3, Databases and Analytics Flashcards

1
Q

What is Amazon S3, and how are files stored?

A

Amazon S3 (Simple Storage Service) is an object storage service that stores files (objects) in “buckets”, top-level containers that play a role similar to directories.

2
Q

What are the naming conventions for S3 buckets?

A

S3 buckets must:

  • Have a globally unique name
  • Be 3-63 characters long
  • Not contain uppercase letters or underscores
  • Not start with the prefix xn--
  • Not end with -s3alias
  • Start with a lowercase letter or number
  • Not be an IP address
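The rules above can be sketched as a small local check. This is a minimal sketch, not an official validator: `is_valid_bucket_name` is a hypothetical helper, the allowed character set (lowercase letters, digits, dots, hyphens) is an assumption beyond what the card lists, and global uniqueness cannot be checked locally.

```python
import re

# Matches dotted-quad strings such as 192.168.0.1
_IP_ADDRESS = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_valid_bucket_name(name: str) -> bool:
    """Check a bucket name against the rules listed on this card (hypothetical helper)."""
    if not 3 <= len(name) <= 63:
        return False
    # Assumed character set: lowercase letters, digits, dots, hyphens.
    # This rejects uppercase letters and underscores.
    if not re.fullmatch(r"[a-z0-9.-]+", name):
        return False
    # Must start with a lowercase letter or a number.
    if not name[0].isalnum():
        return False
    if name.startswith("xn--") or name.endswith("-s3alias"):
        return False
    if _IP_ADDRESS.match(name):
        return False
    return True
```

For example, `is_valid_bucket_name("my-bucket-01")` passes, while `"My_Bucket"` and `"192.168.0.1"` are rejected.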
3
Q

What is the maximum size of an S3 object?

A

The maximum size of an S3 object is 5 TB. A single upload request is limited to 5 GB, so larger files must use multipart upload.
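The arithmetic behind a multipart upload can be sketched as follows. This is a rough sketch under stated assumptions: the limits of 10,000 parts and 5 MiB to 5 GiB per part are commonly documented values that are not on this card, binary units are assumed, and `plan_multipart` is a hypothetical helper, not an AWS API.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TIB  # from the card (binary units assumed)
MIN_PART_SIZE = 5 * MIB    # assumption: commonly documented minimum part size
MAX_PART_SIZE = 5 * GIB    # assumption: commonly documented maximum part size
MAX_PARTS = 10_000         # assumption: commonly documented part-count limit


def plan_multipart(object_size: int, preferred_part_size: int = 100 * MIB) -> tuple[int, int]:
    """Return (part_size, part_count) for an object of object_size bytes."""
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    # Smallest part size that still fits the object into MAX_PARTS parts (ceiling division).
    smallest_allowed = -(-object_size // MAX_PARTS)
    part_size = max(preferred_part_size, MIN_PART_SIZE, smallest_allowed)
    part_size = min(part_size, MAX_PART_SIZE)
    part_count = -(-object_size // part_size)
    return part_size, part_count
```

With the default 100 MiB preference, a 1 GiB object is split into 11 parts; for a 5 TiB object the part size grows so that the count stays within 10,000.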

4
Q

What is the difference between the key and object in Amazon S3?

A

The key is the full path to the object within its bucket. The object is the content or file stored in the bucket. Example: in s3://my-bucket/my_folder/my_file.txt, the bucket is my-bucket and the key is my_folder/my_file.txt.
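As an illustration, an S3 URI can be split into its bucket name and key with the standard library. This is a minimal sketch; `split_s3_uri` is a hypothetical helper name, not part of any AWS SDK.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the host part; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://my-bucket/my_folder/my_file.txt")
```

Here `bucket` is `my-bucket` and `key` is `my_folder/my_file.txt`.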

5
Q

What are the types of metadata that can be associated with S3 objects?

A

S3 objects can have:

  • Metadata (system or user-defined key-value pairs)
  • Tags (Unicode key-value pairs, up to 10 per object)
  • Version ID (if versioning is enabled)
6
Q

What are the security options for Amazon S3?

A
  • User-based: IAM policies that control API access.
  • Resource-based: Bucket policies, Object ACLs, and Bucket ACLs.
  • Encryption: Server-side and client-side encryption.
7
Q

What can an S3 bucket policy control?

A

S3 bucket policies (JSON-based) can:

  • Grant public access to a bucket
  • Force encryption of objects during upload
  • Grant cross-account access
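As a sketch of the second bullet, the snippet below builds a bucket policy document that denies uploads sent without a server-side encryption header. The bucket name `my-bucket` and the statement ID are placeholders, and the policy follows a commonly used pattern rather than anything prescribed by this card.

```python
import json

BUCKET_NAME = "my-bucket"  # placeholder bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedUploads",  # placeholder statement ID
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET_NAME}/*",
            # Deny when the request carries no server-side encryption header.
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
```

The resulting `policy_json` string is the JSON document that would be attached to the bucket.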
8
Q

What is S3 versioning and its benefits?

A

S3 versioning allows multiple versions of the same file to exist. Benefits include protection against accidental deletions and easy rollback to previous versions.
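The behaviour can be illustrated with a toy in-memory model. This is only a sketch of the idea, not how S3 is implemented: `VersionedBucket` and its version IDs are hypothetical, and a delete is modelled as adding a delete marker on top of the existing versions.

```python
import itertools


class VersionedBucket:
    """Toy model of a versioned bucket (hypothetical, in-memory only)."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body); body None = delete marker
        self._ids = itertools.count(1)

    def put(self, key, body):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        # A plain delete hides the object behind a delete marker; old versions remain.
        return self.put(key, None)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            # Latest version wins, unless it is a delete marker.
            if not versions or versions[-1][1] is None:
                raise KeyError(key)
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not None:
                return body
        raise KeyError((key, version_id))
```

After a `put`, a second `put`, and a `delete` on the same key, a plain `get` fails, but both earlier versions can still be read by version ID, which is what makes rollback possible.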

9
Q

What are the two types of replication in S3?

A
  • Cross-Region Replication (CRR): Replicates objects to a bucket in a different region.
  • Same-Region Replication (SRR): Replicates objects within the same region.
    Both require versioning to be enabled.
10
Q

What are the use cases for CRR and SRR in S3?

A
  • CRR: Compliance, lower latency, replication across accounts.
  • SRR: Log aggregation, replication between production and test accounts.
11
Q

Name the main S3 storage classes and their use cases.

A
  • S3 Standard: For frequently accessed data (e.g., big data analytics).
  • S3 Standard-IA: For infrequently accessed data (e.g., disaster recovery).
  • S3 One Zone-IA: For infrequently accessed data in a single AZ.
  • S3 Glacier: For archival data with varying retrieval times.
  • S3 Intelligent-Tiering: For automatic cost optimization.
12
Q

What is server-side encryption in Amazon S3, and how is it different from client-side encryption?

A

Server-side encryption: S3 encrypts objects when it receives them, before storing them, and decrypts them when they are accessed.

Client-side encryption: The client encrypts the data before sending it to S3 and is responsible for managing the encryption keys.

13
Q

What is the AWS Snow Family?

A

The AWS Snow Family includes offline devices (Snowcone, Snowball, Snowmobile) for data migration to S3 and edge computing, used when transferring large amounts of data is impractical over the network.

14
Q

What is AWS Storage Gateway?

A

AWS Storage Gateway is a hybrid cloud service that connects on-premises environments with S3, providing file, volume, and tape gateway options for backup, disaster recovery, and tiered storage.

15
Q

What is S3 Standard used for?

A

General-purpose storage class for frequently accessed and updated data with high durability and fast access.

16
Q

How many Availability Zones does S3 Standard store data in?

A

A minimum of three Availability Zones.

17
Q

Can S3 Standard host static websites?

A

Yes, it can host static websites.

18
Q

What is S3 Standard-Infrequent Access (S3 Standard-IA) designed for?

A

Infrequently accessed data with lower storage costs but higher retrieval costs.

19
Q

What is S3 Standard-IA commonly used for?

What is the minimum storage duration for S3 Standard-IA?

A

Common use: long-term storage and backup.

Minimum storage duration: 30 days.

20
Q

Where does S3 One Zone-Infrequent Access store data?

A

In a single AWS Availability Zone for cost savings.

21
Q

What kind of data is S3 One Zone-IA suitable for?

A

Less frequently accessed data and backups.

22
Q

What is S3 Intelligent-Tiering best suited for?

A

Data with unknown or changing access patterns.

23
Q

What is the additional cost associated with S3 Intelligent-Tiering?

A

A small monthly monitoring and automation fee per object.

24
Q

What is S3 Glacier Instant Retrieval used for, and how quickly can you retrieve objects?

A

Archiving rarely accessed data that still needs fast retrieval. Objects can be retrieved within a few milliseconds.

25
Q

What is S3 Glacier Flexible Retrieval designed for?

A

Low-cost archiving with retrieval times ranging from minutes to hours.

26
Q

What is S3 Glacier Deep Archive used for, and how long does it take to retrieve data?

A

The lowest-cost archival storage class, ideal for long-term archiving of rarely accessed data.
Retrieval takes 12 to 48 hours.

27
Q

What is S3 Outposts?

A

A service that allows S3 to run on-premises using AWS Outposts for local storage with access to AWS services.

28
Q

What does Amazon S3 Transfer Acceleration do?

A

It speeds up reads and writes to S3 over long geographic distances by routing transfers through AWS edge locations.

29
Q

What are the four factors that S3 pricing is based on?

A
  • Total amount of data stored (in GB).
  • Storage class (e.g., S3 Standard, Glacier).
  • Data transferred out of AWS from S3.
  • Number of requests to S3.
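The four factors combine into a simple sum, sketched below. `estimate_monthly_cost` is a hypothetical helper and every rate passed to it is a made-up placeholder, not an actual AWS price; the storage class enters the calculation through the per-GB storage rate.

```python
def estimate_monthly_cost(
    gb_stored: float,
    storage_rate_per_gb: float,      # depends on the storage class
    gb_transferred_out: float,
    transfer_rate_per_gb: float,
    request_count: int,
    rate_per_1000_requests: float,
) -> float:
    """Combine the four pricing factors into one monthly figure (hypothetical helper)."""
    storage_cost = gb_stored * storage_rate_per_gb
    transfer_cost = gb_transferred_out * transfer_rate_per_gb
    request_cost = request_count / 1000 * rate_per_1000_requests
    return storage_cost + transfer_cost + request_cost


# Placeholder rates for illustration only, NOT real AWS prices.
example_cost = estimate_monthly_cost(
    gb_stored=100,
    storage_rate_per_gb=0.02,
    gb_transferred_out=10,
    transfer_rate_per_gb=0.10,
    request_count=5000,
    rate_per_1000_requests=0.005,
)
```

Moving the same data to a cheaper storage class lowers only `storage_rate_per_gb`; the transfer and request terms are computed separately.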
30
Q

What is Amazon RDS (Relational Database Service), and what is one disadvantage?

A

Managed database service offering continuous backups, restore, monitoring, and Multi-AZ setups.

Disadvantage of RDS: you cannot SSH into the underlying database instances.

31
Q

What is Amazon Aurora?

A

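A high-performance relational (SQL) database built by AWS, compatible with MySQL and PostgreSQL. A serverless option (Aurora Serverless) is available.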

32
Q

What is Amazon ElastiCache?

A

In-memory database with low latency, used to reduce load on databases for read-intensive workloads.

33
Q

What is Amazon DynamoDB?
How does DynamoDB integrate with AWS security?

A

A fully managed NoSQL, serverless, key/value database with low latency.

DynamoDB integrates with IAM for security.

34
Q

What is DynamoDB Accelerator (DAX)?

A

Fully managed in-memory cache for DynamoDB, providing up to a 10x performance improvement.

35
Q

What is Amazon Redshift?

A

A PostgreSQL-based data warehouse and OLAP database with columnar storage.
Data is typically loaded in batches (for example, once an hour) rather than continuously every second.

36
Q

What is Amazon EMR used for?

A

Creates Hadoop clusters for big data analysis, machine learning, and large-scale data processing.

37
Q

What does Amazon Athena do?

A

Serverless query service for performing analytics against S3 objects using SQL.

38
Q

What is Amazon QuickSight?

A

A service to create dashboards and visualize data, integrated with RDS, Aurora, and Redshift.

39
Q

What is Amazon DocumentDB?

A

A fully managed NoSQL database compatible with MongoDB for storing and querying JSON data.

40
Q

What is Amazon Neptune?

A

A fully managed graph database, ideal for applications like social networks. It is highly available across 3 AZs.

41
Q

What is Amazon Timestream?

A

A fast, serverless time-series database that scales automatically.

42
Q

What is Amazon QLDB (Quantum Ledger Database)?

A

A managed ledger database for financial transactions that provides immutable and cryptographically verifiable records.

43
Q

What is Amazon Managed Blockchain?

A

A service for building decentralized applications where multiple parties execute transactions without a central authority.

44
Q

What is AWS Glue?

A

A managed extract, transform, and load (ETL) service to prepare data for analytics.

45
Q

What is AWS DMS (Database Migration Service)?

A

A service for migrating databases while keeping the source database available. It supports both homogeneous and heterogeneous migrations.

46
Q

What type of database is Amazon DynamoDB?

A

A key/value NoSQL database.

47
Q

What is the difference between RDS & Aurora vs Redshift?

A

RDS & Aurora are used for Online Transaction Processing (OLTP), while Redshift is used for data warehousing and analytics (OLAP).
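The difference in query style can be sketched with SQLite from the Python standard library. SQLite is used here purely as a local stand-in, not as a model of RDS, Aurora, or Redshift, and the `orders` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.0), ("alice", 15.0)],
)
conn.commit()

# OLTP-style access: read or write one row at a time by its key.
single_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = ?", (2,)
).fetchone()

# OLAP-style access: scan many rows and aggregate them.
totals_by_customer = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```

The first query touches a single row, which is the typical transactional pattern; the second aggregates over the whole table, which is the pattern columnar warehouses are built for.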

48
Q

What does Amazon EMR provide?

A

Hadoop clusters for analyzing big data and machine learning.

49
Q

What type of database is Amazon Neptune?

A

A graph database.

50
Q

What type of database is Amazon Timestream?

A

A time-series database.

51
Q

What is the primary use case of Amazon QLDB?

A

For financial transactions with an immutable and verifiable ledger.