AZ-204 Storage Flashcards

1
Q

What are the 2 types of partitions?

A

A logical partition consists of a set of items that have the same partition key. For example, in a container that contains data about food nutrition, all items contain a foodGroup property. You can use foodGroup as the partition key for the container.

A container is scaled by distributing data and throughput across physical partitions. Internally, one or more logical partitions are mapped to a single physical partition. Typically smaller containers have many logical partitions but they only require a single physical partition.

The number of physical partitions in your container depends on the following:

The amount of throughput provisioned (each individual physical partition can provide up to 10,000 request units per second). The 10,000 RU/s limit for physical partitions implies that logical partitions also have a 10,000 RU/s limit, as each logical partition is only mapped to one physical partition. For example, a container provisioned with 30,000 RU/s needs at least three physical partitions.

The total data storage (each individual physical partition can store up to 50 GB of data).

2
Q

Consistency levels

A

Strong: Strong consistency offers a linearizability guarantee: the system behaves as if operations were applied one at a time, in order. Reads are guaranteed to return the most recent committed version of an item, and a client never sees an uncommitted or partial write. Users are always guaranteed to read the latest committed write.

Bounded Staleness: In bounded staleness consistency, the reads are guaranteed to honor the consistent-prefix guarantee. The reads might lag behind writes by at most “K” versions (that is, “updates”) of an item or by “T” time interval, whichever is reached first. In other words, when you choose bounded staleness, the “staleness” can be configured in two ways:

The number of versions (K) of the item
The time interval (T) reads might lag behind the writes
For a single-region account, the minimum value of K and T is 10 write operations or 5 seconds. For multi-region accounts, the minimum value of K and T is 100,000 write operations or 300 seconds.

Session: Consistency guarantees are scoped to a single client session. Within a session, reads honor the consistent-prefix, monotonic-reads, read-your-writes, and write-follows-reads guarantees.

Consistent Prefix: May not see latest write operation, but reads are never out of order

Assume two write operations are performed on documents Doc1 and Doc2, within transactions T1 and T2. When a client reads from any replica, it will see either "Doc1 v1 and Doc2 v1" or "Doc1 v2 and Doc2 v2", but never "Doc1 v1 and Doc2 v2" or "Doc1 v2 and Doc2 v1" for the same read or query operation.

Eventual: In eventual consistency, there’s no ordering guarantee for reads. In the absence of any further writes, the replicas eventually converge.

Eventual consistency is the weakest form of consistency because a client may read the values that are older than the ones it had read before. Eventual consistency is ideal where the application does not require any ordering guarantees. Examples include count of Retweets, Likes, or non-threaded comments

3
Q

Core API + Gremlin

A
4
Q

Features of each aspect of the Cosmos Hierarchy?

Cosmos account -> Cosmos databases -> Cosmos containers -> Cosmos items

A

Cosmos Account: The Azure Cosmos DB account is the fundamental unit of global distribution and high availability. Your Azure Cosmos DB account contains a unique DNS name and you can manage an account by using the Azure portal or the Azure CLI, or by using different language-specific SDKs.

Cosmos DBs: You can create one or multiple Azure Cosmos DB databases under your account. A database is analogous to a namespace. A database is the unit of management for a set of Azure Cosmos DB containers.

Cosmos containers: An Azure Cosmos DB container is the unit of scalability both for provisioned throughput and storage. A container is horizontally partitioned and then replicated across multiple regions. The items that you add to the container are automatically grouped into logical partitions, which are distributed across physical partitions, based on the partition key.

Cosmos Items: Depending on which API you use, an Azure Cosmos DB item can represent either a document in a collection, a row in a table, or a node or edge in a graph.

5
Q

Classes + methods of blob storage API (BlobClient, BlobClientOptions, BlobContainerClient, BlobServiceClient, BlobUriBuilder)

A

BlobClient: The BlobClient allows you to manipulate Azure Storage blobs.
BlobClientOptions: Provides the client configuration options for connecting to Azure Blob Storage.
BlobContainerClient: The BlobContainerClient allows you to manipulate Azure Storage containers and their blobs.
BlobServiceClient: The BlobServiceClient allows you to manipulate Azure Storage service resources and blob containers. The storage account provides the top-level namespace for the Blob service.
BlobUriBuilder: The BlobUriBuilder class provides a convenient way to modify the contents of a Uri instance to point to different Azure Storage resources like an account, container, or blob.
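
A minimal sketch of how these classes compose, from service (account) to container to blob; the connection string and names are placeholders:

using System;
using Azure.Storage.Blobs;

// Sketch: drill down from the service (account) to a container to a blob.
BlobServiceClient service = new BlobServiceClient("<connection-string>");
BlobContainerClient container = service.GetBlobContainerClient("sample-container");
BlobClient blob = container.GetBlobClient("sample.txt");

// Upload a local file to the blob, overwriting it if it already exists.
await blob.UploadAsync("local-file.txt", overwrite: true);
Console.WriteLine(blob.Uri); // the blob's full https address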

6
Q

Blob metadata types

A

System properties: System properties exist on each Blob storage resource. Some of them can be read or set, while others are read-only. Under the covers, some system properties correspond to certain standard HTTP headers. The Azure Storage client library for .NET maintains these properties for you.

User-defined metadata: User-defined metadata consists of one or more name-value pairs that you specify for a Blob storage resource. You can use metadata to store additional values with the resource. Metadata values are for your own purposes only, and do not affect how the resource behaves.

7
Q

Blob storage access tiers

A

The Hot access tier, which is optimized for frequent access of objects in the storage account. The Hot tier has the highest storage costs, but the lowest access costs. New storage accounts are created in the hot tier by default.

The Cool access tier, which is optimized for storing large amounts of data that is infrequently accessed and stored for at least 30 days. The Cool tier has lower storage costs and higher access costs compared to the Hot tier.

The Archive tier, which is available only for individual block blobs. The archive tier is optimized for data that can tolerate several hours of retrieval latency and will remain in the Archive tier for at least 180 days. The archive tier is the most cost-effective option for storing data, but accessing that data is more expensive than accessing data in the hot or cool tiers.

8
Q

Transition blobs to different tiers

A

Consider a scenario where data gets frequent access during the early stages of the lifecycle, but only occasionally after two weeks. Beyond the first month, the data set is rarely accessed. In this scenario, hot storage is best during the early stages. Cool storage is most appropriate for occasional access. Archive storage is the best tier option after the data ages over a month.
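
Tiers can also be changed per blob from the .NET SDK; a minimal sketch using SetAccessTierAsync (the connection string and blob names are placeholders):

using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Sketch: move an aging block blob down the tiers as access drops off.
BlobClient blob = new BlobClient("<connection-string>", "logs", "2023-01-01.log");

await blob.SetAccessTierAsync(AccessTier.Cool);    // occasional access after two weeks
await blob.SetAccessTierAsync(AccessTier.Archive); // rarely accessed after a month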

9
Q

What is a Blob storage lifecycle policy?

A

A lifecycle management policy is a collection of rules in a JSON document. Each rule definition within a policy includes a filter set and an action set. The filter set limits rule actions to a certain set of objects within a container, or to objects with certain name prefixes. The action set applies the tier or delete actions to the filtered set of objects.
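
A minimal sketch of such a policy document (the rule name and container prefix are hypothetical); it tiers aging block blobs down and eventually deletes them:

{
  "rules": [
    {
      "name": "agingRule",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "sample-container/" ]
        },
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": 30 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 90 },
            "delete": { "daysAfterModificationGreaterThan": 365 }
          }
        }
      }
    }
  ]
}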

10
Q

What are the two throughput modes for Cosmos DB containers?

A

Dedicated provisioned throughput mode: The throughput provisioned on a container is exclusively reserved for that container and it is backed by the SLAs.

Shared provisioned throughput mode: These containers share the provisioned throughput with the other containers in the same database (excluding containers that have been configured with dedicated provisioned throughput). In other words, the provisioned throughput on the database is shared among all the “shared throughput” containers.
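
In the .NET SDK, the difference is simply where the throughput parameter is supplied; a sketch with placeholder endpoint, key, and names:

using Microsoft.Azure.Cosmos;

CosmosClient client = new CosmosClient("<endpoint>", "<key>");

// Shared: throughput provisioned on the database is shared by its containers.
Database sharedDb = await client.CreateDatabaseIfNotExistsAsync("sharedDb", throughput: 400);

// Dedicated: throughput provisioned on the container itself, backed by SLAs.
Container dedicated = await sharedDb.CreateContainerIfNotExistsAsync(
    id: "dedicatedContainer",
    partitionKeyPath: "/pk",
    throughput: 400);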

11
Q

When to use each Cosmos DB API?

A

Cassandra, MongoDB, PostgreSQL, Table: Use these APIs when integrating existing databases or applications that already use these services.

Cosmos core: The Azure Cosmos DB API for NoSQL stores data in document format. NoSQL accounts provide support for querying items using the Structured Query Language (SQL) syntax.

Gremlin: Used for modeling data as graphs. Choose it for workloads:
- Involving dynamic data
- Involving data with complex relations
- Involving data that is too complex to be modeled with relational databases
- Where you want to use the existing Gremlin ecosystem and skills

12
Q

What is a Cosmos DB request unit?

A

The cost of all database operations is normalized by Azure Cosmos DB and is expressed by request units (or RUs, for short). A request unit represents the system resources such as CPU, IOPS, and memory that are required to perform the database operations supported by Azure Cosmos DB.
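
The SDK reports the charge on every response; a sketch assuming an existing Container named container and the SalesOrder type used in the SDK cards below:

using Microsoft.Azure.Cosmos;

// Sketch: read an item and inspect how many RUs the operation consumed.
ItemResponse<SalesOrder> response =
    await container.ReadItemAsync<SalesOrder>("[id]", new PartitionKey("[partition-key]"));

// A 1 KB point read costs roughly 1 RU; larger or more complex operations cost more.
Console.WriteLine($"Request charge: {response.RequestCharge} RUs");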

13
Q

What are the 3 Cosmos DB account modes?

A

Provisioned throughput mode: In this mode, you provision the number of RUs for your application on a per-second basis in increments of 100 RUs per second. To scale the provisioned throughput for your application, you can increase or decrease the number of RUs at any time in increments or decrements of 100 RUs. You can make your changes either programmatically or by using the Azure portal. You can provision throughput at container and database granularity level.

Serverless mode: In this mode, you don't have to provision any throughput when creating resources in your Azure Cosmos DB account. At the end of your billing period, you get billed for the number of request units consumed by your database operations.

Autoscale mode: In this mode, you can automatically and instantly scale the throughput (RU/s) of your database or container based on its usage. This mode is well suited for mission-critical workloads that have variable or unpredictable traffic patterns, and require SLAs on high performance and scale.
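
In the .NET SDK these modes map to ThroughputProperties factory methods; a sketch assuming an existing Database named database (IDs and RU values are placeholders):

using Microsoft.Azure.Cosmos;

// Manual (standard) provisioned throughput: a fixed 400 RU/s.
ThroughputProperties manual = ThroughputProperties.CreateManualThroughput(400);

// Autoscale: scales between 10% and 100% of the configured max (400-4,000 RU/s here).
ThroughputProperties autoscale = ThroughputProperties.CreateAutoscaleThroughput(4000);

Container container = await database.CreateContainerIfNotExistsAsync(
    new ContainerProperties(id: "autoscaleContainer", partitionKeyPath: "/pk"),
    autoscale);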

14
Q

What are the blob storage account recommendations?

A

Standard general purpose v2: Standard storage account type for blobs, file shares, queues, and tables. Recommended for most scenarios using Azure Storage. If you want support for NFS file shares in Azure Files, use the premium file shares account type.

Premium block blobs: Premium storage account type for block blobs and append blobs. Recommended for scenarios with high transaction rates, or scenarios that use smaller objects or require consistently low storage latency.

Premium page blobs: Premium storage account type for page blobs only.

Premium file shares: Premium storage account type for file shares only.

15
Q

What are the blob storage resource types?

A

The storage account: A storage account provides a unique namespace in Azure for your data. Every object that you store in Azure Storage has an address that includes your unique account name. The combination of the account name and the Azure Storage blob endpoint forms the base address for the objects in your storage account.

A container in the storage account: A container organizes a set of blobs, similar to a directory in a file system. A storage account can include an unlimited number of containers, and a container can store an unlimited number of blobs. The container name must be lowercase.

A blob in a container:
- Block blobs store text and binary data, up to about 190.7 TB. Block blobs are made up of blocks of data that can be managed individually.
- Append blobs are made up of blocks like block blobs, but are optimized for append operations. Append blobs are ideal for scenarios such as logging data from virtual machines.
- Page blobs store random access files up to 8 TB in size. Page blobs store virtual hard drive (VHD) files and serve as disks for Azure virtual machines.

16
Q

What are the considerations for picking a blob storage access tier?

A
  • The access tier can be set on a blob during or after upload.
  • Only the hot and cool access tiers can be set at the account level. The archive access tier can only be set at the blob level.
  • Data in the cool access tier has slightly lower availability, but still offers high durability, with retrieval latency and throughput characteristics similar to hot data.
  • Data in the archive access tier is stored offline. The archive tier offers the lowest storage costs but also the highest access costs and latency.
  • The hot and cool tiers support all redundancy options. The archive tier supports only LRS, GRS, and RA-GRS.
  • Data storage limits are set at the account level and not per access tier. You can choose to use all of your limit in one tier or across all three tiers.
17
Q

What are the rule parameters for a blob storage lifecycle policy?

A

name: A rule name can include up to 256 alphanumeric characters. Rule name is case-sensitive. It must be unique within a policy.
enabled: An optional boolean to allow a rule to be temporarily disabled. Default value is true if it's not set.
type: The current valid type is Lifecycle.
definition: Each definition is made up of a filter set and an action set.

18
Q

What is the CLI command to add a lifecycle policy?

A

az storage account management-policy create \
  --account-name <storage-account> \
  --policy @policy.json \
  --resource-group <resource-group>

19
Q

How to get blob metadata using REST?

A

The GET and HEAD operations both retrieve metadata headers for the specified container or blob. These operations return headers only; they do not return a response body. The URI syntax for retrieving metadata headers is as follows:

for containers:
GET/HEAD https://myaccount.blob.core.windows.net/mycontainer?restype=container&comp=metadata

for blobs:
GET/HEAD https://myaccount.blob.core.windows.net/mycontainer/myblob?comp=metadata

20
Q

How to modify blob metadata using SDK?

A

To set metadata, add name-value pairs to an IDictionary object, and then call one of the following methods of the BlobContainerClient class to write the values:
SetMetadata / SetMetadataAsync

To retrieve metadata, call GetProperties / GetPropertiesAsync and read the Metadata property of the returned properties object (there is no dedicated GetMetadata method).
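
A minimal sketch against a container (connection string, names, and metadata keys are placeholders):

using System;
using System.Collections.Generic;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

BlobContainerClient container = new BlobContainerClient("<connection-string>", "sample-container");

// Write user-defined metadata (name-value pairs) on the container.
IDictionary<string, string> metadata = new Dictionary<string, string>
{
    { "docType", "textDocuments" },
    { "category", "guidance" }
};
await container.SetMetadataAsync(metadata);

// Read it back: metadata is returned as part of the container properties.
BlobContainerProperties properties = await container.GetPropertiesAsync();
foreach (KeyValuePair<string, string> pair in properties.Metadata)
    Console.WriteLine($"{pair.Key}: {pair.Value}");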

21
Q

How to set blob metadata using REST?

A

The PUT operation sets metadata headers on the specified container or blob, overwriting any existing metadata on the resource. Calling PUT without any headers on the request clears all existing metadata on the resource.

for containers:
PUT https://myaccount.blob.core.windows.net/mycontainer?comp=metadata&restype=container

for blobs:
PUT https://myaccount.blob.core.windows.net/mycontainer/myblob?comp=metadata
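
The metadata values themselves travel as HTTP headers of the form x-ms-meta-<name>. A sketch of a set-metadata request (the version, date, signature, and metadata names are placeholders):

PUT https://myaccount.blob.core.windows.net/mycontainer/myblob?comp=metadata HTTP/1.1
x-ms-version: 2021-08-06
x-ms-date: <date>
x-ms-meta-docType: textDocuments
x-ms-meta-category: guidance
Authorization: SharedKey myaccount:<signature>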

22
Q

What are the partition key best practices?

A

Be a property that has a value which does not change. If a property is your partition key, you can’t update that property’s value.

Have a high cardinality. In other words, the property should have a wide range of possible values.

Spread request unit (RU) consumption and data storage evenly across all logical partitions. This ensures even RU consumption and storage distribution across your physical partitions.

23
Q

What are the 2 ways of choosing a partition key?

A

- For read-heavy containers, choose a property that appears frequently as a filter in your queries.
- Use the item ID; it is unique and has high cardinality, so it distributes writes evenly.

24
Q

What is a synthetic key and how can you construct one?

A

It’s the best practice to have a partition key with many distinct values, such as hundreds or thousands. The goal is to distribute your data and workload evenly across the items associated with these partition key values. If such a property doesn’t exist in your data, you can construct a synthetic partition key.

1. Concatenate multiple properties of an item
2. Use a partition key with a random suffix
3. Use a partition key with a pre-calculated suffix
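
A sketch of the first option, assuming a hypothetical item shape where a state code alone would be too coarse a partition key:

// Sketch: a synthetic partition key that concatenates two properties,
// so "CA-2023-06" spreads data more evenly than "CA" alone would.
public class OrderItem
{
    public string Id { get; set; }
    public string State { get; set; }
    public string Month { get; set; }

    // Map this property to the container's partition key path, e.g. /partitionKey.
    public string PartitionKey => $"{State}-{Month}";
}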

25
Q

What are containers and items in the cosmos db sdk?

A

A container can be a collection, graph, or table. An item can be a document, edge/vertex, or row, and is the content inside a container.

26
Q

What are the common/useful Db methods of the cosmos sdk?

A

Create CosmosClient:
CosmosClient client = new CosmosClient(endpoint, key);

Create db:
// An object containing relevant information about the response
DatabaseResponse databaseResponse = await client.CreateDatabaseIfNotExistsAsync(databaseId, 10000);

Read db by ID:
DatabaseResponse readResponse = await database.ReadAsync();

Delete DB:
await database.DeleteAsync();

27
Q

What are the common/useful container methods of the cosmos sdk?

A

Create container:
// Set throughput to the minimum value of 400 RU/s
ContainerResponse simpleContainer = await database.CreateContainerIfNotExistsAsync(
    id: containerId,
    partitionKeyPath: partitionKey,
    throughput: 400);

get container by ID:
Container container = database.GetContainer(containerId);
ContainerProperties containerProperties = await container.ReadContainerAsync();

Delete Container:
await database.GetContainer(containerId).DeleteContainerAsync();

28
Q

What are the common/useful item methods of the cosmos sdk?

A

Create item:
ItemResponse<SalesOrder> response = await container.CreateItemAsync(salesOrder, new PartitionKey(salesOrder.AccountNumber));

Read item:
string id = "[id]";
string accountNumber = "[partition-key]";
ItemResponse<SalesOrder> response = await container.ReadItemAsync<SalesOrder>(id, new PartitionKey(accountNumber));

Query item:
QueryDefinition query = new QueryDefinition(
    "select * from sales s where s.AccountNumber = @AccountInput")
    .WithParameter("@AccountInput", "Account1");

FeedIterator<SalesOrder> resultSet = container.GetItemQueryIterator<SalesOrder>(
    query,
    requestOptions: new QueryRequestOptions()
    {
        PartitionKey = new PartitionKey("Account1"),
        MaxItemCount = 1
    });

29
Q

What is the blob storage change feed?

A

The purpose of the change feed is to provide transaction logs of all the changes that occur to the blobs and the blob metadata in your storage account. The change feed provides an ordered, guaranteed, durable, immutable, read-only log of these changes.
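
The feed can be read from .NET with the Azure.Storage.Blobs.ChangeFeed package; a minimal sketch (the connection string is a placeholder):

using System;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.ChangeFeed;

BlobServiceClient service = new BlobServiceClient("<connection-string>");
BlobChangeFeedClient changeFeed = service.GetChangeFeedClient();

// Enumerate every recorded change, oldest first.
await foreach (BlobChangeFeedEvent change in changeFeed.GetChangesAsync())
{
    Console.WriteLine($"{change.EventTime}: {change.EventType} on {change.Subject}");
}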

30
Q

What is a blob lease?

A

The Lease Blob operation creates and manages a lock on a blob for write and delete operations. The lock duration can be 15 to 60 seconds, or can be infinite.
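
From the .NET SDK, leases are managed through BlobLeaseClient in Azure.Storage.Blobs.Specialized; a minimal sketch (names are placeholders):

using System;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;

BlobClient blob = new BlobClient("<connection-string>", "sample-container", "sample.txt");
BlobLeaseClient lease = blob.GetBlobLeaseClient();

// Acquire a 30-second lock; writes and deletes now require the lease ID.
await lease.AcquireAsync(TimeSpan.FromSeconds(30));

// ... do protected work, then release early instead of waiting for expiry.
await lease.ReleaseAsync();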