Databases Flashcards

1
Q

Questions to ask yourself when choosing a database service

A
  1. TYPE OF WORKLOAD: Is your workload read-heavy, write-heavy, or balanced? What are your throughput needs? Will they change, fluctuate, or need to scale over time?
  2. SIZE OF DATA: How much data do you store, and for how long? Will it grow? What is your average object size (small, average, or large)? How is the data accessed? Are there security needs around it?
  3. DURABILITY: Do you need your data to be there for a week or forever? Is your database going to be the source of truth for all your data sets?
  4. LATENCY: What are your latency requirements? How many concurrent users will you get?
  5. DATA MODEL: How will you query your data? Do you query by primary key? Do you join? Is it structured or semi-structured? Do you need to search it?
  6. SCHEMA: Do you need a strong schema or more flexibility? Do you need reporting or search? Do you want an RDBMS or NoSQL?
  7. LICENSE: Is there any license cost? If you pay license costs (say, on Oracle), can you switch to a cloud-native database such as Aurora?
2
Q

database types on AWS

A
  1. RDBMS (SQL, OLTP = online transaction processing): RDS, Aurora. Great for joins; think of these any time you see data in tabular form.
  2. NoSQL databases: DynamoDB (~JSON), ElastiCache (key-value pairs, high performance), Neptune (graphs). You can’t really do joins and there is no SQL language to query the database; the data is organized differently, and you get performance benefits out of it.
  3. Object Store: S3 (big objects), Glacier (archive)
  4. Data Warehouse (= SQL analytics, BI): Redshift, Athena. Redshift is OLAP (online analytical processing). Athena queries your data in S3 and can be considered a data warehouse for analytics and BI purposes.
  5. Search: ElasticSearch (JSON): free-text and unstructured searches
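As a rough memory aid, the categories in this card can be condensed into a lookup table. This is only a sketch: the need labels and the `suggest_service` helper are hypothetical names made up for illustration, and a real choice also depends on the workload questions from card 1.

```python
# Hypothetical lookup table summarising the categories in this card.
# The keys are made-up labels, not AWS terminology.
SERVICES_BY_NEED = {
    "relational": ["RDS", "Aurora"],
    "document": ["DynamoDB"],
    "cache": ["ElastiCache"],
    "graph": ["Neptune"],
    "object": ["S3", "Glacier"],
    "warehouse": ["Redshift", "Athena"],
    "search": ["ElasticSearch"],
}


def suggest_service(need):
    """Return the candidate services for a need, or an empty list if unknown."""
    return SERVICES_BY_NEED.get(need.lower(), [])


print(suggest_service("graph"))
print(suggest_service("warehouse"))
```

An unknown label returns an empty list rather than raising, so the helper can be used for quick self-quizzing.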
3
Q

RDS operations

A

There is a very small downtime when a failover happens and when maintenance happens.

Scaling reads (adding read replicas), changing the EC2 instance, and restoring EBS volumes all imply manual intervention, so we still have to do a few operations on our RDS databases.

When one of these changes happens, we may have to make an application change as well.

4
Q

RDS security

A

AWS is responsible for OS security and for the security of the underlying EC2 instance. We are responsible for enabling KMS encryption, configuring the security groups correctly, setting up the IAM policies correctly, authorizing users in our database, and enforcing SSL encryption.

5
Q

RDS reliability

A

There is a Multi-AZ feature: failover is done automatically for us in case of failure, which makes RDS particularly reliable. If you don’t use Multi-AZ, you run the risk of outages.

6
Q

RDS performance

A

Performance depends on the EC2 instance type you provisioned for your RDS instance, as well as the EBS volume type (gp2 or io1).

To scale reads, you need to add Read Replicas. RDS does not auto-scale in that case, so it is not especially cloud native: we still have to provision and scale it manually, and adapt it to our workload.
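Because each replica is reached through its own endpoint, the application has to decide where each query goes. A minimal sketch of that routing, assuming made-up hostnames and a deliberately naive rule that only statements starting with SELECT are reads (`ReplicaRouter` is a hypothetical helper, not part of any AWS SDK):

```python
import itertools


class ReplicaRouter:
    """Send writes to the primary and spread reads over the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica has been provisioned.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Naive classification: only statements starting with SELECT are reads.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._readers)
        return self.primary


# Hypothetical hostnames, for illustration only.
router = ReplicaRouter(
    primary="primary.example.internal",
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)
```

Adding a replica here means changing the list passed to the router, which is the kind of application change that manual scaling tends to bring with it.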

7
Q

RDS cost

A

You pay per hour, based on the provisioned EC2 instance type and the EBS volume.

8
Q

Aurora operations

A

Compared to RDS, there are fewer operations, and storage scales automatically.

We basically don’t need to think too much about Aurora: we set up the Aurora database, maybe set up auto scaling for read replicas, and we are done.

We end up with a database that does not need many operations.

9
Q

Aurora security

A

Same security as RDS.

AWS is responsible for OS security and for the security of the underlying EC2 instance. We are responsible for enabling KMS encryption, configuring the security groups correctly, setting up the IAM policies correctly, authorizing users in our database, and enforcing SSL encryption.

10
Q

Aurora reliability

A

More reliable than RDS.

It is Multi-AZ and highly available, with six copies of the data.

There is even an Aurora Serverless option if you want even more reliability.

11
Q

Aurora performance

A

Up to five times the performance (AWS’s claim, relative to MySQL on RDS), thanks to architectural optimizations in how Aurora was designed. We can have up to 15 read replicas versus five for RDS, so we get much better performance and much better scaling on Aurora than on RDS.

12
Q

Aurora cost

A

You pay per hour based on the EC2 instance type and the storage usage, but it is possibly a much lower cost than an enterprise-grade database such as Oracle, for even better performance.

13
Q

ElastiCache operations

A

Operations are exactly the same as for RDS.

14
Q

ElastiCache security

A

The same as RDS, except that this time we don’t get IAM authentication to ElastiCache.

We can use Redis AUTH if we want to.

15
Q

ElastiCache Reliability

A

Clustering feature, sharding feature, and Multi AZ.

16
Q

ElastiCache performance

A

It’s really good as a cache: in memory, with sub-millisecond performance, plus read replicas and sharding for scaling.

It’s a very popular cache option.

If you see “sub-millisecond performance” or “in memory” at the exam, think ElastiCache.
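One common way an application uses such a cache is the cache-aside pattern: look in the cache first and only go to the database on a miss. The sketch below uses an in-process dictionary as a stand-in for Redis so that it runs anywhere; `TTLCache`, `get_user`, and `load_from_db` are hypothetical names for illustration, not ElastiCache APIs.

```python
import time


class TTLCache:
    """In-process stand-in for a cache such as Redis (illustration only)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expiry_timestamp, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # miss, or the entry has expired
        return entry[1]

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)


def get_user(user_id, cache, load_from_db):
    """Cache-aside: try the cache first, fall back to the database on a miss."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    value = load_from_db(user_id)  # the slow path
    cache.set(user_id, value)
    return value
```

The second lookup for the same key is served from memory, which is where the latency benefit comes from.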

17
Q

ElastiCache cost

A

Similar pricing to RDS: we pay per hour based on the EC2 instance type that we provision and the storage usage.

18
Q

Athena operations

A

It’s serverless, which is the holy grail in AWS: you don’t have any operations.

19
Q

Athena security

A

IAM + S3 security, so you need to remember that there is an S3 security component in there, usually through bucket policies.

20
Q

Athena Reliability

A

It’s a managed service that uses Presto as its engine, which is a very high-performance engine. On top of that, all queries run in a highly available fashion, so you can be pretty sure they will succeed.

21
Q

Athena performance

A

Queries scale with the data size: if you expect a big chunk of data to be analyzed, Athena will scale accordingly.

22
Q

Athena cost

A

You pay per query, priced per terabyte of data scanned, so you pay only for actual usage; that is also what makes it a serverless offering.
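The per-terabyte model makes cost estimates simple arithmetic. A minimal sketch, assuming a placeholder price of 5.0 per TB and a binary terabyte; both are assumptions for illustration, so check current regional pricing before relying on the numbers.

```python
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte


def athena_query_cost(bytes_scanned, price_per_tb=5.0):
    """Estimate the cost of one query from the amount of data it scans.

    price_per_tb is an illustrative placeholder, not a quoted price.
    """
    return bytes_scanned / BYTES_PER_TB * price_per_tb


full_scan = athena_query_cost(2 * BYTES_PER_TB)      # query reads all 2 TB
partial_scan = athena_query_cost(BYTES_PER_TB // 4)  # query reads 0.25 TB

print(full_scan, partial_scan)
```

Since the bill follows bytes scanned, anything that reduces the data a query has to read reduces the cost proportionally.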

23
Q

S3 operations

A

You don’t need to do anything. It’s there and it’s available all the time. You don’t need to provision servers.

24
Q

S3 security

A

It’s up to you to manage it:

  1. define your IAM policies correctly,
  2. define your Bucket Policies and ACLs,
  3. make sure encryption is done correctly based on your requirements, server-side and client-side,
  4. make sure you’re using SSL for encryption in flight.
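For the last point, one widely used approach is a bucket policy that denies any request not made over TLS, using the `aws:SecureTransport` condition key. The sketch below only builds and prints the policy document; `example-bucket` is a hypothetical bucket name, and attaching the policy to a bucket is left out.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

deny_plain_http = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # aws:SecureTransport is "false" when the request did not use TLS.
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}

policy_json = json.dumps(deny_plain_http, indent=2)
print(policy_json)
```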
25
Q

S3 Reliability

A

Huge: 99.999999999% (eleven nines) durability and 99.99% availability, which makes it a really reliable store for your data.

It is also Multi-AZ: by default, all your data is replicated across multiple AZs,

and you get CRR (Cross-Region Replication) if you want to copy all your bucket contents into another region just in case.
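To get a feel for the durability figure, assume the commonly quoted S3 Standard design target of eleven nines (99.999999999%) per object per year and a hypothetical bucket holding ten million objects. The arithmetic below uses `Decimal` to avoid floating-point rounding; it works out to an expected loss of one object every 10,000 years.

```python
from decimal import Decimal

# Assumption: eleven nines of durability, per object per year.
DURABILITY = Decimal("0.99999999999")
objects_stored = 10_000_000  # hypothetical bucket size

annual_loss_probability = Decimal(1) - DURABILITY
expected_losses_per_year = objects_stored * annual_loss_probability
years_per_lost_object = 1 / expected_losses_per_year

print(int(years_per_lost_object))
```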

26
Q

S3 Performance

A

Amazing: you can scale to thousands of reads and writes per second. You can use Transfer Acceleration, which goes through CloudFront edge locations, and multipart upload for big files to make sure they are reliably put into S3.
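Multipart upload works by splitting an object into numbered parts that are uploaded (and retried) independently. The sketch below only plans the byte ranges and makes no AWS calls; `plan_parts` is a hypothetical helper and the 8 MiB default part size is an arbitrary choice, while the 5 MiB minimum part size and the 10,000-part limit are S3's documented constraints.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_parts(total_size, part_size=8 * 1024 * 1024):
    """Return (part_number, start_byte, end_byte_exclusive) tuples."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    start = 0
    while start < total_size:
        end = min(start + part_size, total_size)
        parts.append((len(parts) + 1, start, end))  # part numbers start at 1
        start = end
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


parts = plan_parts(100 * 1024 * 1024)  # a 100 MiB object
print(len(parts), parts[0], parts[-1])
```

If one part fails, only that byte range needs to be sent again, which is what makes large uploads more reliable.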

27
Q

S3 cost

A

You pay only for the storage you actually use, so you don’t need to think about how much storage to provision. That makes S3 an effectively infinite store.

You also pay for network cost, that is, the bandwidth to transfer and retrieve the data.

Finally, if you do a lot of requests on S3, you get billed for those as well.

28
Q

Which database helps you store data in a relational format, with SQL language compatibility and capability of processing transactions?

A

RDS

29
Q

Which database do you suggest to have caching capability with a Redis compatible API?

A

ElastiCache can create a Redis cache or a Memcached cache

30
Q

You are looking to perform OLTP, and would like to have the underlying storage with the maximum amount of replication and auto-scaling capability. What do you recommend?

A

Aurora

31
Q

As a solution architect, you plan on creating a social media website where users can be friends with each other, and like each other’s posts. You plan on performing some complicated queries such as “What are the number of likes on the posts that have been posted by the friends of Mike?”. What database do you suggest?

A

Neptune

This is AWS’ managed graph database
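To see why this question is a graph problem, here is the same traversal (person, then friends, then posts, then likes) written in plain Python over made-up sample data. It is not a Neptune query, only a sketch of the hops a graph database is built to follow efficiently; all names and numbers are hypothetical.

```python
# Hypothetical sample data: friendships, authorship, and like counts.
friends = {"Mike": ["Ana", "Raj"], "Ana": ["Mike"], "Raj": ["Mike"]}
posts_by_author = {"Ana": ["p1", "p2"], "Raj": ["p3"], "Mike": ["p4"]}
likes = {"p1": 3, "p2": 0, "p3": 5, "p4": 7}


def likes_on_friends_posts(person):
    """Walk person -> friends -> posts -> likes and sum the like counts."""
    total = 0
    for friend in friends.get(person, []):
        for post in posts_by_author.get(friend, []):
            total += likes.get(post, 0)
    return total


print(likes_on_friends_posts("Mike"))
```

In a relational database each hop would be a join; in a graph database the relationships are stored directly, which is why this kind of query points to Neptune.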

32
Q

You would like to store big objects of 100 MB into a reliable and durable Key Value store. What do you recommend?

A

S3 is indeed a key value store! (where the key is the full path of the object in the bucket)

33
Q

You would like to have a database which is efficient at performing analytical queries on large sets of columnar data. You would like to connect that Data Warehouse to a reporting and dashboard tool such as Amazon Quicksight. Which technology do you recommend?

A

Redshift

34
Q

Your log data is currently stored in S3 and you would like to perform a quick analysis, serverless if possible, to filter the logs and find a user who may have completed an unauthorized action. Which technology do you recommend?

A

Athena

35
Q

Your gaming website is currently running on top of DynamoDB. Users have been asking for a search feature to find other gamers by name, with partial matches if possible. Which technology do you recommend to implement that feature?

A

Anytime you see “search”, think ElasticSearch