Section 3 - Cosmos DB Essentials Flashcards
What are the Advantages of a Globally Distributed DB?
When we talk about global data distribution with Cosmos DB, we’re talking about the ability to use multiple data centers all around the globe and have the data replicated to all of those data centers.
This allows me to access the data center that is closest to my geographical location, which greatly reduces the latency of reads and writes.
Describe the Cosmos DB Architecture
- A web source or Azure Function could kick off a request for data.
- That request would then generally pass through a traffic manager.
- The traffic manager is important because it is responsible for understanding where you are and then routing you to the fastest database location.
- Within each one of these regions, you are going to have an application gateway, a web tier, a load balancer, and a middle tier.
- Each region then ties in to a Cosmos DB instance, which replicates information across all of the regions.
- From Cosmos DB, another action might be performed. So we might have another Azure function that’s kicked off, or data might run through Databricks for transformation, or information might be sent to Power BI and a report generated.
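The routing step performed by the traffic manager can be sketched as "pick the region with the lowest latency". This is only an illustration: the function name `pick_region`, the region names, and the latency numbers are all made up, and a real traffic manager measures latency itself.

```python
def pick_region(latencies_ms: dict[str, float]) -> str:
    """Return the region with the lowest measured latency (hypothetical helper)."""
    if not latencies_ms:
        raise ValueError("no regions available")
    return min(latencies_ms, key=latencies_ms.get)


# Assumed latencies, in milliseconds, from one client to each region.
latencies = {"East US": 18.0, "West US": 74.0, "West Europe": 95.0}
print(pick_region(latencies))
```

With these assumed numbers, a client would be routed to East US.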
What is CAP Theorem?
The CAP theorem essentially states that a distributed system has three properties and can only fully guarantee two of them at the same time:
- Consistency
- Availability
- Partition tolerance
Consistency means that every node contains the same data at the same time.
Availability means that every request gets a response: at least one node must be available to serve data at all times.
Partition tolerance means that the system keeps working even when a network failure cuts off communication between some of its nodes.
So when a partition happens, it’s impossible for every node to contain the same data at the same time and also to have nodes that are always available to serve data.
Because of this, we have to strike a balance.
What are the 5 Levels of Consistency?
Strong
Bounded Staleness
Session
Consistent Prefix
Eventual
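The five levels run from the strongest guarantee to the weakest, which a small ordered enum can make explicit. This is a study aid only: the class name and the numeric values are assumptions chosen for this sketch, not identifiers from any SDK.

```python
from enum import IntEnum


class ConsistencyLevel(IntEnum):
    """Hypothetical ordering: a higher value means a stronger guarantee."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def strongest_first() -> list[ConsistencyLevel]:
    """List the levels from strongest to weakest."""
    return sorted(ConsistencyLevel, reverse=True)


for level in strongest_first():
    print(level.name)
```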
What is the Most Used Consistency Level?
Session Consistency
It provides write latencies, availability and read throughput comparable to that of Eventual Consistency but also provides the consistency guarantees that suit the needs of applications written to operate in the context of a user.
What is Strong Consistency?
Strong Consistency - every write is replicated across all of the nodes before the next one is accepted, but the availability of those nodes suffers. This means that reads are guaranteed to see the most recent write, so writes are seen in the same order across all of the regions.
More Detail
Strong consistency says we’re going to play a note. Every time we play a note, that is, every time we write to Cosmos DB, we stop before playing another note and make sure that every single region has received that note.
So we play a note, we make sure that it’s carried across all of the different regions, and we make sure that all of them have it.
Once that happens, we move on to the next note and play that, and so on, all the way up the scale. That’s strong consistency: everybody has the same view of the data.
To get that, we pay in two ways: cost and availability.
Costs are higher because we have to make sure that every write has been carried all the way across.
And if we don’t increase the performance of Cosmos DB, our availability is going to be terrible. On an underpowered system that has to process all of these carryovers, you can’t play the next note until the first one has been carried to every region.
What is Eventual Consistency?
Eventual Consistency - much more available, but the data isn’t replicated as often. Eventually all of the writes will be replicated, and reads and writes will match.
So basically Microsoft says that, in the absence of further writes, the replicas within the group will eventually converge.
More Detail
With eventual consistency, we just start playing notes.
Eventually all of them will carry through to the different regions, but it may take a while. So here we have availability, but very low consistency.
This is going to be a much cheaper option with a more available system, but a read in one region may not yet see the latest write.
Why is Cosmos DB Reliable?
Because the data is replicated across many regions, it’s very easy to offer a high SLA within each region.
Each partition is protected, with all writes replicated and distributed across as many as 10 to 20 fault domains.
So if your Cosmos DB account is distributed in N regions,
there would be N times 4 copies of your data.
For example, let’s say that I’m using the East and West Coast U.S. regions. I would then have 2 times 4, or 8, copies of my data.
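The N times 4 arithmetic is simple enough to check in a few lines. The constant and function names below are made up for this sketch.

```python
REPLICAS_PER_REGION = 4  # figure taken from the flashcard above


def total_copies(region_count: int) -> int:
    """Copies of the data for an account distributed across region_count regions."""
    if region_count < 1:
        raise ValueError("an account needs at least one region")
    return region_count * REPLICAS_PER_REGION


print(total_copies(2))  # East and West Coast U.S.: 2 * 4 = 8
```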
How Does Cosmos DB Scale?
Cosmos DB is a very elastic system that uses horizontal partitioning for scalability.
You can scale from thousands to hundreds of millions of requests per second around the globe and only pay for the throughput and storage you need.
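Horizontal partitioning means each item is assigned to one of several partitions based on a key, so adding partitions adds capacity. The sketch below hashes a partition key to pick a partition. The hash function and the name `partition_for` are assumptions for illustration and do not reproduce the hashing Cosmos DB actually uses.

```python
import hashlib


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a key to a partition index in [0, partition_count) (illustrative only)."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# The same key always lands on the same partition.
for key in ["customer-1", "customer-2", "customer-3"]:
    print(key, "->", partition_for(key, 4))
```

Because the mapping is deterministic, all items that share a partition key are stored together, which is why the choice of key matters for spreading load evenly.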
Describe SQL API
- SQL API is the default API for Cosmos DB.
- This is the document database. A document database is a non-relational database that’s designed to store documents that have no fixed structure. For example, we could have our own key column and then a document column, which contains JSON.
- Using the SQL API allows us to use a relational database query language in a non-relational data store.
- The SQL API is the best fit for new projects that are being created from scratch and need a document database store.
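To make "documents with no fixed structure" and "a relational query language" concrete, here is a small sketch. The sample documents and the helper `run_locally` are invented. The query string is SQL-style text of the kind the SQL API accepts, but here it is only shown for comparison and is evaluated by plain Python, not sent to any service.

```python
import json

# Three documents in one collection; note that their fields differ.
documents = [
    {"id": "1", "name": "Ada", "city": "Seattle", "tags": ["admin"]},
    {"id": "2", "name": "Lin", "city": "Austin"},
    {"id": "3", "name": "Sam", "city": "Seattle", "phone": "555-0100"},
]

# SQL-style query text, shown for comparison only.
QUERY = "SELECT c.id, c.name FROM c WHERE c.city = @city"


def run_locally(docs: list[dict], city: str) -> list[dict]:
    """Local stand-in for QUERY: filter on city, project id and name."""
    return [
        {"id": d["id"], "name": d["name"]}
        for d in docs
        if d.get("city") == city
    ]


print(QUERY)
print(json.dumps(run_locally(documents, "Seattle"), indent=2))
```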
Define the Gremlin API
- Gremlin API is your graph-based choice
- Graph databases use nodes and edges
- The graph database is great for defining relationships
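Nodes and edges can be pictured with a plain list of labeled edges. The data and the helper `out_neighbors` are made up for this sketch. The lookup is roughly what a Gremlin traversal such as g.V('alice').out('knows') expresses, though no Gremlin is executed here.

```python
# Each edge is (source node, edge label, destination node).
edges = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "works_at", "contoso"),
]


def out_neighbors(edge_list: list[tuple[str, str, str]], node: str, label: str) -> list[str]:
    """Follow outgoing edges with the given label from one node."""
    return [dst for src, lbl, dst in edge_list if src == node and lbl == label]


print(out_neighbors(edges, "alice", "knows"))
```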
Define Cassandra API
- Cassandra is our second API choice. It uses a partitioned row store.
- The primary use for Cassandra would be if your team is already using Cassandra in another application.
Define MongoDB API
- MongoDB also stores data in the document format, similar to SQL API
- MongoDB would be much like Cassandra in that this would be the primary solution if you’re already using MongoDB and just need to migrate
Describe the Azure Table API
Azure Table API is the go-to solution to provide support for applications that are written for Azure Table Storage and need high availability, elasticity, and global distribution.