Databases Flashcards
questions to ask yourself when choosing a database service
- TYPE OF WORKLOAD: Is your workload read-heavy, write-heavy, or balanced? What are your throughput needs? Will they change, fluctuate, or need to scale over time?
- SIZE OF DATA: How much data do you store, and for how long? Will it grow? What is your average object size: very small, very large, or in between? How is your data accessed? Are there any security requirements around it?
- DURABILITY: Do you need your data to be kept for a week or forever? Is your database going to be the source of truth for all your data sets?
- LATENCY: What latency do you need, and how many concurrent users will you get?
- DATA MODEL: How will you query your data? Do you query by primary key? Do you need joins? Is the data structured or semi-structured? Do you need to search it?
- SCHEMA: Do you need a strong schema or more flexibility? Do you need reporting or search? Do you want an RDBMS or NoSQL?
- LICENSE: Are there any license costs? If so (for example with Oracle), can you switch to a cloud-native database such as Aurora?
database types on AWS
- RDBMS (SQL, OLTP = online transaction processing): RDS, Aurora. Great for joins, and a good fit whenever your data is tabular.
- NoSQL databases: DynamoDB (~JSON), ElastiCache (key/value pairs, high performance), Neptune (graphs). There are no joins and no SQL language to query the database, so the data is organized differently, and in return you get performance benefits.
- Object store: S3 (big objects), Glacier (archive)
- Data warehouse (= SQL analytics, BI): Redshift, Athena. Redshift is an OLAP (online analytical processing) database. Athena queries your data in S3 and can serve as a data warehouse for analytics and BI purposes.
- Search: ElasticSearch (JSON): free-text and unstructured search
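The list above can be read as a lookup from data model to candidate services. A minimal Python sketch, assuming hypothetical category keys and a hypothetical `candidate_services` helper (not an AWS API); the checks run in a fixed, somewhat arbitrary order and are rules of thumb only:

```python
# Hypothetical lookup mirroring the "database types on AWS" list.
# Category keys are illustrative labels, not AWS terminology.
AWS_DATABASE_TYPES = {
    "rdbms": ["RDS", "Aurora"],                       # SQL / OLTP, joins, tabular data
    "nosql": ["DynamoDB", "ElastiCache", "Neptune"],  # no joins, no SQL
    "object_store": ["S3", "Glacier"],                # big objects, archive
    "data_warehouse": ["Redshift", "Athena"],         # SQL analytics / BI (OLAP)
    "search": ["ElasticSearch"],                      # free-text, unstructured search
}


def candidate_services(needs_joins: bool, free_text_search: bool,
                       analytics: bool, large_objects: bool) -> list[str]:
    """Return candidate services for a workload, checking needs in a fixed order."""
    if free_text_search:
        return AWS_DATABASE_TYPES["search"]
    if analytics:
        return AWS_DATABASE_TYPES["data_warehouse"]
    if large_objects:
        return AWS_DATABASE_TYPES["object_store"]
    if needs_joins:
        return AWS_DATABASE_TYPES["rdbms"]
    return AWS_DATABASE_TYPES["nosql"]


print(candidate_services(needs_joins=True, free_text_search=False,
                         analytics=False, large_objects=False))
# prints ['RDS', 'Aurora']
```

A real decision also weighs the other questions (durability, latency, license cost), which this sketch ignores.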
RDS operations
There is a small amount of downtime when a failover happens and when maintenance happens.
Scaling reads (adding Read Replicas), scaling the EC2 instance, and restoring EBS volumes all require manual intervention, so we still have to perform some operations on our RDS databases ourselves.
When something changes (for example, a new Read Replica endpoint), we may have to change the application as well.
RDS security
AWS is responsible for OS security and the security of the underlying EC2 instance, but we are responsible for enabling KMS encryption, configuring security groups correctly, setting up IAM policies correctly, authorizing users in our database, and enforcing SSL encryption.
RDS reliability
There is a Multi-AZ feature, and failover happens automatically in case of failure, which makes RDS quite reliable. If you don't use Multi-AZ, you obviously run the risk of outages.
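With Multi-AZ, RDS keeps a synchronously replicated standby in another AZ, and on failover the DNS endpoint is pointed at the standby, so the application keeps the same connection string. A toy Python model of that behaviour; the endpoint name and AZ names are hypothetical, and real RDS also replaces the failed instance, which this sketch reduces to swapping roles:

```python
from dataclasses import dataclass


@dataclass
class MultiAZDeployment:
    """Toy model: one endpoint name, a primary, and a standby in another AZ."""
    endpoint: str
    primary_az: str
    standby_az: str

    def resolve(self) -> str:
        # The application always connects to the same endpoint name;
        # it resolves to whichever AZ currently hosts the primary.
        return f"{self.endpoint} -> {self.primary_az}"

    def failover(self) -> None:
        # Promote the standby; the old primary's AZ becomes the standby location.
        self.primary_az, self.standby_az = self.standby_az, self.primary_az


db = MultiAZDeployment("mydb.example", "us-east-1a", "us-east-1b")
print(db.resolve())  # mydb.example -> us-east-1a
db.failover()
print(db.resolve())  # mydb.example -> us-east-1b
```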
RDS performance
Performance depends on the EC2 instance type you provision for your RDS instance, as well as the EBS volume type (for example, gp2 or io1).
To scale reads, you add Read Replicas. RDS does not auto-scale them, so it is not especially cloud native: we still have to provision and scale manually and adapt to our workload.
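Scaling reads with Read Replicas means the application must send writes to the primary and spread reads over the replica endpoints. A minimal Python sketch of that routing, assuming hypothetical endpoint names and a deliberately naive rule (statements starting with `SELECT` are reads). Replication to RDS Read Replicas is asynchronous, so reads routed this way may be slightly stale:

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads round-robin over Read Replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        # Fall back to the primary when no replicas have been provisioned.
        self._cycle = itertools.cycle(self.replicas or [primary])

    def route(self, sql: str) -> str:
        # Naive classification: only SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._cycle) if is_read else self.primary


router = ReadWriteRouter("primary-endpoint", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM orders"))           # replica-1
print(router.route("SELECT * FROM users"))            # replica-2
print(router.route("INSERT INTO orders VALUES (1)"))  # primary-endpoint
```

Because RDS does not add replicas for you, adding one here means provisioning it and rebuilding the router with the new endpoint, which is the manual step the card describes.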
RDS cost
You pay per hour based on the provisioned EC2 instance type, plus the provisioned EBS storage.
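A rough monthly estimate follows directly from that pricing model. A minimal Python sketch, assuming storage is billed per GB-month and using made-up prices purely for illustration (check the AWS pricing pages for real rates):

```python
HOURS_PER_MONTH = 730  # common approximation used for monthly estimates


def rds_monthly_cost(instance_hourly_usd: float, storage_gb: float,
                     storage_usd_per_gb_month: float,
                     hours: float = HOURS_PER_MONTH) -> float:
    """Instance hours times the hourly rate, plus provisioned storage."""
    compute = instance_hourly_usd * hours
    storage = storage_gb * storage_usd_per_gb_month
    return round(compute + storage, 2)


# Hypothetical prices: $0.10/hour instance, 100 GB at $0.115 per GB-month.
print(rds_monthly_cost(0.10, 100, 0.115))  # 84.5 (73.0 compute + 11.5 storage)
```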
Aurora operations
Compared to RDS, there are fewer operations, and storage auto-scales.
We basically don't need to think much about Aurora: we set up the Aurora database, maybe configure auto scaling for Read Replicas, and we are done.
The result is a database that needs few operations.
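Aurora replica auto scaling works in the spirit of target tracking: keep a metric such as average replica CPU near a target by adding or removing replicas. A simplified proportional Python sketch, not the exact algorithm AWS uses; the function name and defaults are hypothetical, and the cap of 15 comes from the Aurora performance card:

```python
import math

MAX_AURORA_REPLICAS = 15  # replica limit noted in the Aurora performance card


def desired_replicas(current: int, avg_cpu_pct: float, target_cpu_pct: float,
                     min_replicas: int = 1,
                     max_replicas: int = MAX_AURORA_REPLICAS) -> int:
    """Scale the replica count in proportion to load versus target, within limits."""
    if current == 0:
        return min_replicas
    raw = math.ceil(current * avg_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(2, avg_cpu_pct=90, target_cpu_pct=60))   # 3 (scale out)
print(desired_replicas(4, avg_cpu_pct=30, target_cpu_pct=60))   # 2 (scale in)
print(desired_replicas(10, avg_cpu_pct=100, target_cpu_pct=50))  # 15 (capped)
```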
Aurora security
Same security model as RDS:
AWS is responsible for OS security and the security of the underlying EC2 instance, but we are responsible for enabling KMS encryption, configuring security groups correctly, setting up IAM policies correctly, authorizing users in our database, and enforcing SSL encryption.
Aurora reliability
More reliable than RDS:
It is Multi-AZ and highly available, with six copies of the data spread across three AZs.
There is also an Aurora Serverless option, which scales capacity automatically.
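The six copies are used with quorums: a write must be acknowledged by 4 of the 6 copies, and a quorum read needs 3 of 6. That is why Aurora can lose two copies (a whole AZ) without losing write availability and three copies without losing read availability. A small Python sketch of that arithmetic; the constant and function names are illustrative:

```python
COPIES = 6        # six copies of the data across three AZs (two per AZ)
WRITE_QUORUM = 4  # a write needs 4 of 6 copies to acknowledge
READ_QUORUM = 3   # a quorum read needs 3 of 6 copies


def can_write(healthy_copies: int) -> bool:
    return healthy_copies >= WRITE_QUORUM


def can_read(healthy_copies: int) -> bool:
    return healthy_copies >= READ_QUORUM


# Losing a whole AZ (2 copies) leaves 4: writes and reads both continue.
print(can_write(COPIES - 2), can_read(COPIES - 2))  # True True
# Losing an AZ plus one more copy leaves 3: writes stop, reads continue.
print(can_write(COPIES - 3), can_read(COPIES - 3))  # False True
# Quorums overlap, so every read quorum sees the latest acknowledged write.
print(READ_QUORUM + WRITE_QUORUM > COPIES)          # True
```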
Aurora performance
Up to five times the throughput of standard MySQL, thanks to architectural optimizations in how Aurora is designed, and up to 15 Read Replicas versus five for RDS, so we get much better performance and scaling on Aurora than on RDS.
Aurora cost
You pay per hour based on the EC2 instance type, plus storage usage, but it can cost much less than an enterprise-grade database such as Oracle while delivering even better performance.
ElastiCache operations
Operations are the same as for RDS.
ElastiCache security
The same as RDS, except that we don't get IAM authentication for ElastiCache.
We can use Redis AUTH if we want to.
ElastiCache Reliability
Clustering feature, sharding feature, and Multi AZ.
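Sharding in ElastiCache for Redis (cluster mode enabled) follows Redis Cluster: every key maps to one of 16384 hash slots via CRC16(key) mod 16384, and each shard owns a range of slots. A {hash tag} in the key restricts hashing to the tag, so related keys land on the same shard. A minimal Python sketch of the slot calculation; the helper names are illustrative, and assigning slot ranges to shards is left to the cluster:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0), the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of 16384 slots, honouring a non-empty {hash tag}."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # only a non-empty tag replaces the key
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384


# Keys sharing the tag {1000} hash to the same slot, hence the same shard.
print(key_slot("user:{1000}:name") == key_slot("user:{1000}:email"))  # True
```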