Explore Cosmos DB Flashcards
What is Cosmos DB?
- Fully managed NoSQL database designed to provide low latency, elastic scalability of throughput, well-defined semantics for data consistency, and high availability
What are the benefits of Cosmos DB global distribution?
- can achieve low latency by placing data in the regions closest to your users
- can add or remove regions associated with an account at any time
- the app doesn’t need to be paused or redeployed to add or remove a region
- with multi-region writes, every region supports both reads and writes, with 99.999% availability
- reads and writes are guaranteed to be served in less than 10 ms at the 99th percentile
- Cosmos DB internally handles data replication between regions, with consistency-level guarantees
- if one region goes down, the other regions pick up the load
What is a Cosmos DB account?
- fundamental unit of global distribution and high availability
- contains a unique DNS name
- managed via the Azure portal, the Azure CLI, or the SDKs
- can add or remove regions to your account at any time
- can create up to 50 accounts under a subscription (a soft limit that can be raised through a support request)
What is a Cosmos DB container?
- fundamental unit of scalability
- you can have virtually unlimited provisioned throughput (RU/s) and storage on a container
- Cosmos DB transparently partitions the container using the logical partition key that you specify, so that provisioned throughput and storage scale elastically
- a container is a schema-agnostic container of items
What is the Cosmos DB hierarchy?
- accounts -> databases -> containers -> items
- containers also hold stored procedures, user-defined functions, and triggers
What is a Cosmos DB database?
- unit of management for a set of Azure Cosmos DB containers
How is a container partitioned?
- Horizontally partitioned and then replicated across multiple regions
- items you add are automatically grouped into logical partitions based on the partition key, and the logical partitions are distributed across physical partitions
- throughput is evenly distributed across physical partitions
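The grouping described above can be sketched in a few lines of Python. This is an illustration only: the SHA-256 hash with a modulo is a stand-in chosen for this sketch (the service's real placement logic is internal), and the names `physical_partition_for`, `group_items`, and `rus_per_physical_partition` are hypothetical.

```python
import hashlib
from collections import defaultdict


def physical_partition_for(partition_key: str, physical_partition_count: int) -> int:
    """Map a partition key value to a physical partition (illustrative hash only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_partition_count


def group_items(items, key_field, physical_partition_count):
    """Return {physical partition: {logical partition key: [items]}}."""
    layout = defaultdict(lambda: defaultdict(list))
    for item in items:
        key = str(item[key_field])
        physical = physical_partition_for(key, physical_partition_count)
        # Items sharing a partition key always land in the same logical partition.
        layout[physical][key].append(item)
    return layout


def rus_per_physical_partition(provisioned_rus, physical_partition_count):
    """Provisioned throughput is split evenly across physical partitions."""
    return provisioned_rus / physical_partition_count
```

Every item with the same partition key value ends up in the same logical partition, and a logical partition never spans two physical partitions.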
How is throughput on a container configured?
- dedicated provisioned throughput mode: the throughput provisioned on a container is reserved exclusively for that container and is backed by the SLAs
- shared provisioned throughput mode: the container shares throughput with the other containers in the same database
What is a Cosmos DB item?
- depending on which API you use, an item can be a document in a collection, a row in a table, or a node or edge in a graph
- can have arbitrary schemas
- by default all items that you add to a container are automatically indexed without requiring explicit index or schema management
How does Cosmos DB approach data consistency?
- as a spectrum of choices
- strong consistency and eventual consistency are at the two ends of the spectrum
- the further you move from strong towards eventual, the higher the availability and throughput and the lower the latency
- consistency levels are region agnostic: they hold regardless of which region serves the reads and writes
- Cosmos DB guarantees that 100% of read requests meet the consistency guarantee for the chosen consistency level
What are the levels of data consistency?
- strong
- bounded staleness
- session
- consistent prefix
- eventual
How can consistency models be used?
- each level suits specific real-world scenarios
- you can configure the default consistency level on your Azure Cosmos DB account at any time
- it applies to all Cosmos DB databases and containers under that account
What is the strong consistency level of data consistency?
- offers a linearizability guarantee (linearizability refers to serving requests concurrently)
- reads are guaranteed to return the most recent committed version of an item
- client never sees an uncommitted or partial write
What is the bounded staleness level of data consistency?
- reads are guaranteed to honour the consistent-prefix guarantee
- reads might lag behind writes by at most X versions (updates) of an item or by a time interval Y, whichever is reached first
- X and Y are the two configurable staleness bounds
- for a single-region account, the minimum values of X and Y are 10 write operations or 5 seconds
- for a multi-region account, the minimum values are 100,000 write operations or 300 seconds
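As a rough sketch of the two staleness bounds, the Python below checks a configured bound against the minimums quoted in this card and tests whether a read's lag stays inside the bound. The class and function names are hypothetical and only model the rule; they are not part of any Cosmos DB SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StalenessBound:
    max_versions: int  # X: maximum number of item versions a read may lag
    max_seconds: int   # Y: maximum time interval a read may lag


# Minimum configurable values, as stated in the card above.
MIN_SINGLE_REGION = StalenessBound(max_versions=10, max_seconds=5)
MIN_MULTI_REGION = StalenessBound(max_versions=100_000, max_seconds=300)


def bound_is_allowed(bound: StalenessBound, multi_region: bool) -> bool:
    """A bound is configurable only if it is at least the minimum for the account type."""
    minimum = MIN_MULTI_REGION if multi_region else MIN_SINGLE_REGION
    return (bound.max_versions >= minimum.max_versions
            and bound.max_seconds >= minimum.max_seconds)


def read_is_within_bound(bound: StalenessBound, versions_behind: int, seconds_behind: float) -> bool:
    """The lag is acceptable only while neither limit has been reached."""
    return (versions_behind <= bound.max_versions
            and seconds_behind <= bound.max_seconds)
```

Because the limit that is reached first wins, a read that is only 3 versions behind but 6 seconds stale falls outside a (10 versions, 5 seconds) bound.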
What is the session consistency level of data consistency?
- within a single client session, reads are guaranteed to honour the consistent-prefix, monotonic reads, monotonic writes, read-your-writes, and write-follows-reads guarantees
- assumes a single writer session, or sharing the session token among multiple writers
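The read-your-writes part of this level can be illustrated with a toy model, assuming a session token that is simply the version number of the session's last write. `Replica` and `Session` are hypothetical classes for this sketch; real session tokens are opaque values managed by the SDK.

```python
class Replica:
    """A toy replica holding items and the version of the last write it has applied."""

    def __init__(self):
        self.version = 0
        self.items = {}

    def apply(self, version, key, value):
        self.version = version
        self.items[key] = value


class Session:
    """Tracks a session token: the version of this session's most recent write."""

    def __init__(self):
        self.token = 0

    def write(self, replica, key, value):
        replica.apply(replica.version + 1, key, value)
        self.token = replica.version  # remember how far this session has written

    def read(self, replicas, key):
        # Only a replica that has caught up to the session token may serve the read.
        for replica in replicas:
            if replica.version >= self.token:
                return replica.items.get(key)
        raise RuntimeError("no replica has caught up to this session yet")
```

A second client that copies the first session's token gets the same read-your-writes behaviour, which is the "sharing the session token" case; a client with a fresh token may still be served by a lagging replica.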
What is the consistent-prefix level of data consistency?
- updates made as single doc writes see eventual consistency
- updates made as a batch within a transaction are returned consistent to the transaction in which they were committed
- write operations within a transaction of multiple docs are visible together
- assume transaction T1 writes v1 of DOC1 and DOC2, and transaction T2 then writes v2 of both
- a client read sees either DOC1 v1 and DOC2 v1, or DOC1 v2 and DOC2 v2, but never DOC1 v1 and DOC2 v2, or DOC1 v2 and DOC2 v1
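The DOC1/DOC2 example can be checked with a small sketch. It assumes that T1 writes v1 of both documents and T2 then writes v2 of both, and that a replica always holds some prefix of the committed transactions; `state_after_prefix` and `observable_states` are hypothetical helper names.

```python
def state_after_prefix(transactions, applied_count):
    """State of a replica that has applied the first `applied_count` transactions."""
    state = {}
    for writes in transactions[:applied_count]:
        # Each transaction's writes become visible together, never one at a time.
        state.update(writes)
    return state


def observable_states(transactions):
    """Every state a reader can observe: one per prefix of the transaction log."""
    return [state_after_prefix(transactions, n) for n in range(len(transactions) + 1)]


T1 = {"DOC1": "v1", "DOC2": "v1"}
T2 = {"DOC1": "v2", "DOC2": "v2"}
STATES = observable_states([T1, T2])
```

`STATES` contains the empty state (a replica that has applied nothing yet), the v1/v1 state, and the v2/v2 state. A mixed state such as DOC1 v1 with DOC2 v2 cannot appear, because no prefix of the log produces it.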
What is the eventual level of data consistency?
- no ordering guarantees for reads
- replicas eventually converge
- weakest form of consistency, because a client may read values that are older than the ones it read before
- ideal when app doesn’t require ordering guarantees
- e.g. retweets, likes or nonthreaded comments
What APIs does Cosmos DB offer?
- NoSQL
- MongoDB
- PostgreSQL
- Apache Cassandra
- Table
- Apache Gremlin
What are the benefits of Cosmos DB offering multiple APIs?
- allows us to model real-world data using document, key/value, graph, and column data models
- allows apps to treat Cosmos DB as if it were various other database technologies, without the overhead of their management and scaling approaches
Which of the APIs are native to Cosmos DB?
- API for NoSQL
- the rest implement the wire protocols of open-source database engines and are best suited if:
– you have existing apps using those technologies
– you don’t want to rewrite your entire data access layer
– you want to use the open-source developer ecosystem
What does the API for NoSQL provide?
- stores data in doc format
- best end-to-end experience, because Microsoft has full control over the interface, service, and SDKs
- any new features rolled out for Cosmos DB are available here first
What does the API for MongoDB provide?
- stores data in a document structure, using the BSON format
- doesn’t use any native MongoDB-related code
- combines the MongoDB ecosystem with Cosmos DB features
What does the API for PostgreSQL provide?
- managed service for running PostgreSQL at any scale
- stores data either on a single node or distributed across a multi-node configuration
What does the API for Apache Cassandra provide?
- stores data in a column-oriented schema
- highly distributed, horizontally scaling approach to storing large volumes of data, while offering a flexible approach to a column-oriented schema
What does the API for Apache Gremlin provide?
- allows users to make graph queries and stores data as edges and vertices
- useful for scenarios involving dynamic data, data with complex relationships, or a desire to use Gremlin
What does the API for Table provide?
- stores data in key/value format
- if you use Azure Table storage, you may see some limitations in latency, scaling, and throughput
- this API overcomes these limitations
What do we pay for in Cosmos DB?
- the throughput you provision and the storage you consume, billed on an hourly basis
- throughput must be provisioned to ensure sufficient system resources are available for your DB at all times
What is a Request Unit (RU)?
- the cost of all Cosmos DB database operations is normalised and expressed in RUs
- an RU represents the system resources, such as CPU, IOPS, and memory, that are required to perform the database operations supported by Cosmos DB
How much do operations cost in terms of RUs?
- a point read (fetching a single item by its ID and partition key value) of a 1 KB item costs 1 RU
- all other operations are similarly assigned a cost in RUs
- CRUD operations other than point reads consume a variable number of RUs, depending on the complexity of the operation
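To make the arithmetic concrete, here is a minimal Python sketch for estimating the RU/s a workload needs. The only figure taken from the card is 1 RU per 1 KB point read; the per-operation costs passed in for other operations are placeholders you would measure yourself, and `estimate_rus_per_second` is a hypothetical helper.

```python
POINT_READ_RU = 1  # cost of a point read of a 1 KB item, per the card above


def estimate_rus_per_second(point_reads_per_second, other_operations=()):
    """Estimate the RU/s needed for a workload.

    other_operations: iterable of (operations_per_second, measured_ru_cost) pairs.
    The RU cost of non-point-read operations varies, so it must be supplied.
    """
    total = point_reads_per_second * POINT_READ_RU
    for operations_per_second, ru_cost in other_operations:
        total += operations_per_second * ru_cost
    return total
```

For example, 500 point reads per second plus 100 writes per second at an assumed 5.5 RUs each gives `estimate_rus_per_second(500, [(100, 5.5)])`, which is 500 + 550 = 1050 RU/s.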
What are the modes we can create a Cosmos DB account in?
- provisioned throughput mode
- serverless mode
- autoscale mode
What is the provisioned throughput mode for a Cosmos DB account?
- you provision the number of RUs for your app on a per-second basis, in increments of 100 RUs per second
- you can increase or decrease the RUs at any time to scale the provisioned throughput for your app
- can make changes programmatically or via the Azure portal
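Because throughput is provisioned in increments of 100 RUs per second, an estimated requirement has to be rounded up to the next multiple of 100. A small sketch, with `provisionable_rus` as a hypothetical helper name:

```python
RU_INCREMENT = 100  # provisioned throughput is set in steps of 100 RU/s


def provisionable_rus(required_rus: int) -> int:
    """Round a required RU/s figure up to the next multiple of 100."""
    if required_rus <= 0:
        raise ValueError("required_rus must be positive")
    return (required_rus + RU_INCREMENT - 1) // RU_INCREMENT * RU_INCREMENT
```

A workload estimated at 1050 RU/s would therefore be provisioned at 1100 RU/s, while one that needs exactly 400 RU/s stays at 400.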
What is the serverless mode for a Cosmos DB account?
- you don’t have to provision any throughput when creating resources in your Cosmos DB account
- at the end of the billing period, you are billed for the number of RUs that your database operations consumed
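Serverless billing is driven by consumption rather than provisioning, which the following sketch expresses. The price is passed in as a parameter because these cards do not state one; `serverless_charge` and the per-million-RU pricing unit are assumptions made for illustration.

```python
def serverless_charge(consumed_rus: int, price_per_million_rus: float) -> float:
    """Charge for a billing period based only on the RUs actually consumed.

    price_per_million_rus is a placeholder: look up the current price for your region.
    """
    if consumed_rus < 0:
        raise ValueError("consumed_rus cannot be negative")
    return consumed_rus / 1_000_000 * price_per_million_rus
```

With a made-up price of 0.25 per million RUs, consuming 4,000,000 RUs in a period costs 1.0, and an idle period costs nothing for throughput.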
What is the autoscale mode for a Cosmos DB account?
- you can automatically and instantly scale the throughput (RU/s) of your database or container based on usage
- scaling doesn’t affect the availability, latency, throughput, or performance of the workload
- well suited for mission-critical workloads that have variable or unpredictable traffic patterns and require SLAs on high performance and scale
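A rough model of autoscale is a clamp on demand. The sketch below assumes that throughput scales between 10% of the configured maximum and the maximum itself; that range is not stated in these cards, so treat it as an assumption to verify. `autoscaled_rus` is a hypothetical helper.

```python
def autoscaled_rus(max_rus: int, demanded_rus: int) -> int:
    """Throughput the service would scale to for a given demand.

    Assumes an autoscale range of 10% of max_rus up to max_rus.
    """
    floor = max_rus // 10  # assumed lower end of the autoscale range
    return max(floor, min(demanded_rus, max_rus))
```

With a maximum of 4000 RU/s, light traffic sits at the 400 RU/s floor, moderate traffic is served at what it demands, and spikes are capped at 4000 RU/s.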