Databases Flashcards
questions to ask yourself when choosing a database service
- TYPE OF WORKLOAD: is your workload read-heavy, write-heavy, or balanced? What are your throughput needs? Will they fluctuate, or will they need to scale over time?
- SIZE OF DATA: how much data do you store, and for how long? Will it grow? What’s your average object size: really small, really big, or average? How is your data accessed? Are there security needs around it?
- DURABILITY: do you need your data to be there for a week or forever? Is your database going to be the source of truth for all your data sets?
- LATENCY: what are your latency requirements? How many concurrent users will you get?
- DATA MODEL: how will you query your data? Do you query by primary key? Do you join? Is it structured or semi-structured? Do you need to search it?
- SCHEMA: do we need a strong schema or more flexibility? Do we need reporting or search? Do you want an RDBMS or NoSQL?
- LICENSE: is there any license cost, say on Oracle? If so, can you switch to a cloud-native database such as Aurora?
database types on AWS
- RDBMS (SQL, OLTP = online transaction processing): RDS, Aurora. Great for joins; think of it any time you see data in tabular form.
- NoSQL databases: DynamoDB (~JSON), ElastiCache (key-value pairs, high performance), Neptune (graphs). We can’t really do joins and there’s no SQL language to query the database; the data is organized in a different way, and you get performance benefits out of it.
- Object Store: S3 (big objects), Glacier (archive)
- Data Warehouse (= SQL analytics, BI): Redshift, Athena. Redshift is OLAP (online analytical processing). Athena can query your data in S3 and can be considered a data warehouse for analytics and BI purposes.
- Search: ElasticSearch (JSON): free-text and unstructured searches
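The categories above can be kept as a small lookup table for revision. This is only a sketch: the function name, its flags, and the order in which the flags are checked are assumptions made for illustration, and a real choice would weigh all the questions from the previous card.

```python
# Hypothetical lookup table summarizing the categories on this card.
DATABASE_TYPES = {
    "relational": ["RDS", "Aurora"],
    "nosql": ["DynamoDB", "ElastiCache", "Neptune"],
    "object_store": ["S3", "Glacier"],
    "data_warehouse": ["Redshift", "Athena"],
    "search": ["ElasticSearch"],
}


def candidate_services(needs_joins, free_text_search=False,
                       analytics=False, large_objects=False):
    """Return the service family suggested by a few yes/no answers.

    The priority order below is a simplification chosen for this sketch.
    """
    if free_text_search:
        return DATABASE_TYPES["search"]
    if analytics:
        return DATABASE_TYPES["data_warehouse"]
    if large_objects:
        return DATABASE_TYPES["object_store"]
    if needs_joins:
        return DATABASE_TYPES["relational"]
    return DATABASE_TYPES["nosql"]
```

For example, `candidate_services(needs_joins=True)` returns the relational family, and leaving every flag false falls through to the NoSQL family.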
RDS operations
we have a very small downtime when a failover happens and when maintenance happens.
Scaling reads (adding read replicas), changing the EC2 instance, and an EBS restore all imply manual intervention, so we still have to do a few operations on our RDS databases.
And when one of these changes happens, we may have to change the application as well.
RDS security
AWS is responsible for OS security and for EC2 instance security, but we are responsible for setting up KMS encryption, configuring the security groups correctly, setting up the IAM policies correctly, authorizing users in our database, and enforcing SSL encryption.
RDS reliability
there is the Multi-AZ feature: failover is done automatically for us in case of failure, and that makes RDS particularly reliable. If you don’t use Multi-AZ, you obviously risk having some outages.
RDS performance
depends on the EC2 instance type you provision for your RDS instance, as well as the EBS volume type: do you want gp2 or io1?
If you want to scale reads, you need to add Read Replicas, and RDS doesn’t auto-scale them. So it’s not super cloud native: we still have to provision and scale it manually, and adapt it to our workload.
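Adding read replicas usually means the application has to decide which endpoint each statement goes to. A minimal sketch of that idea, assuming hypothetical endpoint names and a deliberately naive rule (anything starting with SELECT is a read):

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads over read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica has been provisioned.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Naive classification, good enough for a sketch only.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


# Hypothetical endpoints, not real RDS hostnames.
router = ReadWriteRouter(
    primary="primary.example.internal",
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)
```

Reads rotate round-robin over the replicas while every other statement goes to the primary, which is the kind of application change the RDS operations card hints at.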
RDS cost
you pay per hour, based on the provisioned EC2 instance type and the EBS volume.
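The pricing model reduces to simple arithmetic: instance hours plus provisioned storage. A rough sketch, where the function name and every rate are placeholders rather than real AWS prices, and 730 is just an approximate number of hours in a month:

```python
def rds_monthly_cost(instance_hourly_usd, storage_gb,
                     storage_usd_per_gb_month, hours=730):
    """Instance hours plus provisioned storage; all rates are caller inputs."""
    return instance_hourly_usd * hours + storage_gb * storage_usd_per_gb_month


# Placeholder rates chosen only to make the arithmetic easy to follow.
estimate = rds_monthly_cost(
    instance_hourly_usd=0.10,
    storage_gb=100,
    storage_usd_per_gb_month=0.10,
)
```

The point of the sketch is that the bill is driven by what you provision, not by how many queries you run.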
Aurora operations
compared to RDS we get fewer operations, and storage auto-scales.
So we basically don’t need to think too much about Aurora: we set up the Aurora database, maybe set up auto scaling for the read replicas, and we are done.
We end up with a database that does not need many operations.
Aurora security
same security as RDS
AWS is responsible for OS security and for EC2 instance security, but we are responsible for setting up KMS encryption, configuring the security groups correctly, setting up the IAM policies correctly, authorizing users in our database, and enforcing SSL encryption.
Aurora reliability
more reliable than RDS
it’s Multi-AZ and highly available, and we have six replicas of the data
there is even an Aurora Serverless option if you want something even more reliable
Aurora performance
up to five times the performance, thanks to the architectural optimizations in how Aurora was designed, and we can have up to 15 read replicas versus five for RDS. So we get much better performance and much better scaling on Aurora than we do on RDS.
Aurora cost
pay per hour based on the EC2 instance type and the storage usage, but the cost is possibly much lower than an enterprise-grade database such as Oracle, for even better performance.
ElastiCache operations
operations are exactly the same as RDS
ElastiCache security
the same thing as RDS, except this time we don’t get IAM authentication to ElastiCache.
We can use Redis AUTH if we want to.
ElastiCache Reliability
Clustering feature, sharding feature, and Multi AZ.
ElastiCache performance
it’s really good as a cache:
in memory, with sub-millisecond performance,
and you have read replicas for sharding.
So it’s a very popular cache option.
If you see sub-millisecond, in-memory performance on the exam, think ElastiCache.
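One common way to use an in-memory cache in front of a database is the cache-aside pattern. The sketch below is an assumption for illustration, not something this card prescribes: a plain dictionary stands in for ElastiCache, and `load_from_db` is a hypothetical callable that performs the slow database read.

```python
import time


class CacheAside:
    """Check the cache first; on a miss, read the database and cache the result."""

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value); stand-in for ElastiCache

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]  # cache hit: no database call
        value = self._load(key)  # cache miss or expired entry: go to the database
        self._store[key] = (time.monotonic() + self._ttl, value)
        return value
```

Repeated reads of the same key within the TTL are served from memory, which is where the latency benefit comes from.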
ElastiCache cost
similar pricing to RDS:
we pay per hour based on the EC2 instance type that we provision and the storage usage.
Athena operations
It’s serverless, which is the holy grail in AWS: you don’t have any operations.
Athena security
IAM + S3 security, so you need to remember that there is an S3 security component in there, usually through bucket policies.
Athena Reliability
it’s a managed service that uses Presto as its engine, which is a very high-performance engine. On top of that, all the queries run in a highly available fashion, so you can be pretty sure that they will succeed.
Athena performance
the queries scale based on the data size, so you can expect a big chunk of data to be analyzed and Athena will scale accordingly
Athena cost
you pay per query, priced per terabyte of data scanned, so you only pay for actual usage, and that is also what makes it a serverless offering.
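Since the bill follows the amount of data scanned, the cost of a query can be estimated as terabytes scanned times the per-terabyte rate. In this sketch the function name is made up and the rate is a placeholder parameter, so check current pricing before relying on any number:

```python
def athena_query_cost(tb_scanned, usd_per_tb):
    """Cost of one query: terabytes scanned times the per-terabyte rate."""
    return tb_scanned * usd_per_tb


# Placeholder rate; the same query over less data costs proportionally less.
full_scan = athena_query_cost(tb_scanned=2.0, usd_per_tb=5.0)
narrow_scan = athena_query_cost(tb_scanned=0.5, usd_per_tb=5.0)
```

Anything that reduces the data a query has to scan reduces the cost in the same proportion, and a query that scans nothing costs nothing under this model.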
S3 operations
you don’t need to do anything. It’s there and it’s available all the time. You don’t need to provision servers.
S3 security
up to you to manage it,
- define your IAM policies correctly,
- your Bucket Policies, your ACL,
- make sure that encryption is done correctly based on your requirements, on the server side and the client side,
- and then make sure you’re using SSL for encryption in flight.
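One way to enforce SSL for encryption in flight is a bucket policy that denies any request not sent over HTTPS, using the `aws:SecureTransport` condition key. The sketch below only builds the policy document; the helper name and the bucket name `example-bucket` are placeholders, and attaching the policy to a bucket is left out.

```python
import json


def deny_insecure_transport_policy(bucket_name):
    """Build a bucket policy that rejects requests made without SSL/TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "example-bucket" is a placeholder name.
policy_json = json.dumps(deny_insecure_transport_policy("example-bucket"), indent=2)
```

Because the statement is an explicit Deny, it applies regardless of what the IAM policies allow.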