Implementing Data Storage Solutions: Non-Relational Flashcards

Question

Partitioning Definitions

Footnote:
1. Partitioning
2. Partition Keys (rem. imp. fct.)
3. Logical Partition
4. Physical Partition (rem. imp. fct.)
5. Composite Key (add. term)
6. Partition Restrictions

Answer 1

**Partitioning:** items in a container are divided into distinct subsets called logical partitions.

**Partition Key:** the value by which Azure organizes your data into logical partitions. The partition key cannot be changed after the database or container is created. It should be distinctive enough that data is evenly distributed across logical partitions, but not so unique that you create an excessive number of partitions, which hurts read and write throughput.

**Logical Partitions:** subsets of your data divided by the partition key.

**Physical Partitions:** the physical machines that house the different logical partitions. A logical partition is **never** divided across multiple physical partitions.

**Composite Key:** multiple unique identifiers combined to create a single partition key, further subdividing data into smaller units.

**Restrictions:**
1. Each document cannot exceed 2 MB
2. Each logical partition cannot exceed 20 GB
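The relationship between partition keys, logical partitions, and physical partitions can be sketched in a few lines of Python. This is only an illustration: the helper names (`composite_key`, `physical_partition`, `place_items`) are hypothetical, and a SHA-256 hash stands in for whatever hashing the service uses internally.

```python
import hashlib


def composite_key(*parts: str) -> str:
    """Join several identifiers into one partition key value (hypothetical helper)."""
    return "|".join(parts)


def physical_partition(partition_key: str, physical_count: int) -> int:
    """Map a partition key value to a physical partition index.

    Assumption: a stable SHA-256 hash stands in for the service's internal hash.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_count


def place_items(items, key_field, physical_count):
    """Group items into logical partitions by key value, then assign each
    logical partition to exactly one physical partition."""
    placement = {}
    for item in items:
        key = item[key_field]
        entry = placement.setdefault(
            key,
            {"physical": physical_partition(key, physical_count), "items": []},
        )
        entry["items"].append(item)
    return placement
```

Because the physical index is derived from the key value alone, every item with the same key lands on the same physical partition, which is the "never divided" rule in the definition above.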

Answer 2

When you define throughput at the database level:

**Shared:** the provisioned throughput is shared among all containers in the database (recommended).

**Dedicated:** throughput is defined for an individual container. Throughput defined at the container level is dedicated by default.

Answer 3

not enough RUs for the logical partition, while other logical partitions have plenty of available RUs (good practice to create partition keys that evenly distribute data across logical partitions)
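One way to spot a key that concentrates load is to count requests per partition key value. The sketch below is an assumption-laden illustration: `find_hot_partitions` and its 50% threshold are hypothetical, not a service feature.

```python
from collections import Counter


def find_hot_partitions(request_keys, threshold=0.5):
    """Return partition key values that receive more than `threshold`
    (a fraction between 0 and 1) of all requests."""
    counts = Counter(request_keys)
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(key for key, n in counts.items() if n / total > threshold)
```

For example, `find_hot_partitions(["a"] * 8 + ["b", "c"])` flags `"a"`, while an evenly spread key flags nothing.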

Answer 4

**Single:** the query can find all of its data in a single logical partition (most efficient).

**Cross:** the query has to look across multiple logical partitions to find its data. Also called a **fan out** query.
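The difference can be sketched with an in-memory model in which each logical partition is a list of items. The `query` function and its return shape are hypothetical; the point is only how many partitions get scanned.

```python
def query(partitions, predicate, partition_key=None):
    """Return (matching_items, partitions_scanned).

    partitions: dict mapping a partition key value to its list of items.
    If partition_key is given, only that logical partition is scanned
    (single-partition query); otherwise every partition is scanned (fan out).
    """
    if partition_key is not None:
        targets = [partition_key] if partition_key in partitions else []
    else:
        targets = list(partitions)
    results = [
        item
        for key in targets
        for item in partitions[key]
        if predicate(item)
    ]
    return results, len(targets)
```

Supplying the partition key keeps the scan count at one no matter how many logical partitions exist.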

Answer 5

columns with values that are unique or very uncommon

Answer 6

the cost to run each query against your data

Answer 7

* the time period for the data to be active before it is deleted
* set the time to live value under settings
* by default, deletion consumes only leftover RUs, so if other workloads are running, time to live deletion will be delayed
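A minimal sketch of the expiry check, assuming the usual Cosmos DB conventions as I understand them: TTL is in seconds, a container-level default of `None` means TTL is off, `-1` means "never expire", and an item-level value overrides the container default. The function names are hypothetical, and the actual deletion is done by the service in the background.

```python
def effective_ttl(container_default, item_ttl=None):
    """Return the TTL in seconds that applies to an item, or None if it never expires.

    Assumptions: None at the container level disables TTL entirely (item values
    are then ignored); -1 means the item never expires.
    """
    if container_default is None:
        return None
    ttl = item_ttl if item_ttl is not None else container_default
    return None if ttl == -1 else ttl


def is_expired(last_modified, now, container_default, item_ttl=None):
    """True if the item has outlived its TTL. Timestamps are epoch seconds."""
    ttl = effective_ttl(container_default, item_ttl)
    return ttl is not None and now - last_modified >= ttl
```

Note that an expired item may still be present for a while, since deletion waits for leftover RUs.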

Answer 8

**Definition** Data can be replicated globally and read from any selected region. Storage and throughput are copied into each selected region.

**Paired Regions** Two geographic centers with a high-speed connection. Used for disaster recovery and business continuity purposes.

**Multi-Region Write (Multi-Master Write)** Users from two separate regions (e.g. Japan & US) update data at the same time. Conflict resolution options:
* last write wins (must define the value to compare, e.g. a timestamp)
* merge procedure (define the procedure)
* merge procedure (don't define)
  * the conflicting actions are stored, and you manually define the stored procedure later
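The first two conflict-resolution options can be sketched as follows. This is an illustration only: `resolve_conflict` is a hypothetical function, and the `_ts` field is assumed to hold the write timestamp used for last write wins.

```python
def resolve_conflict(versions, merge=None, path="_ts"):
    """Pick the surviving version of an item written in several regions.

    versions: list of dicts, one per conflicting write.
    merge:    optional user-defined merge procedure; if given, it decides.
    path:     field compared for last write wins (assumed to be a timestamp).
    """
    if merge is not None:
        return merge(versions)
    # Last write wins: the version with the highest value at `path` survives.
    return max(versions, key=lambda v: v[path])
```

With writes from Japan at `_ts=100` and the US at `_ts=105`, last write wins keeps the US version, while a custom `merge` could instead combine both.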

Answer 9

Applies when there is only one write-enabled center.

**Manual:** the user chooses the next write-enabled center.

**Automatic:** the next write-enabled center is decided ahead of time, before a natural disaster occurs.

Replication occurs automatically in either scenario as long as a global backup center has been identified.

Answer 10

In general, there is a trade-off between consistency and availability.

**Strong:** always read the most up-to-date data, no dirty reads. High latency, highest cost.
* Even the user who writes data cannot see changes until they are committed and synchronized.

**Bounded Staleness:** dirty reads are only possible within a bounded timeframe.

**Session:** no dirty reads within a session. Once the session ends, dirty reads are possible. There are no dirty reads for writers in the same session, but dirty reads are possible for other users.
* Session refers to the user's session, i.e. time on the computer.
* The user can read the value they wrote within that session. Only the same user within the same session is guaranteed to read the value written.

**Consistent Prefix:** dirty reads are possible, but updates are never seen out of order. Data is always read in order, although it may not be the most recent data.

**Eventual:** responds to requests immediately, so dirty reads are possible and the data read may be out of order. Eventually everything will be updated to the correct data.

Answer 11

clients can relax the consistency level to a weaker one at connection time or for an individual request, but cannot raise it (**Strong** is the highest consistency level)
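The "lower only" rule can be sketched by ordering the five levels from weakest to strongest. The enum and the `effective_consistency` function are hypothetical names used for illustration, not part of any SDK.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """Consistency levels ordered from weakest (1) to strongest (5)."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def effective_consistency(account_default, requested=None):
    """Return the level a request runs at.

    A client may ask for a weaker level than the account default,
    but asking for a stronger one is rejected.
    """
    if requested is None:
        return account_default
    if requested > account_default:
        raise ValueError("a request can only relax the account's consistency level")
    return requested
```

For instance, an account set to `SESSION` can serve a request at `EVENTUAL`, but a request for `STRONG` raises `ValueError`.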

Answer 12

**Azure Storage**
* how to provision an account
* replication options (LRS, GRS, ZRS, GZRS, RA-GRS, RA-GZRS)
* blob storage

**Data Lake**
* evolution from Blob & distinctions
* security options

**Cosmos DB (largest area of this portion)**
* features
  * multi-model
  * consistency levels
  * databases & containers
  * throughput & request
  * partitioning & horizontal scaling
  * global distribution
  * multi-master write
  * failover
  * time to live
* CLI (code to create an account)
* security
* pricing

Answer 13

**RBAC:** access controls based on users and groups of users. Also referred to as Identity & Access Management. Access does not automatically expire; it needs to be manually revoked.

**Roles:**
1. Owner: full access, and can grant others access
2. Account Reader: read access only
3. Backup Operator: restore the system
4. Contributor: read, write, delete, account management
5. Operator: provisions databases and accounts, but cannot create access keys

Answer 14

whitelist the domain names that are allowed to make requests to Cosmos DB

Answer 15

* provide access to all of the administrative resources
* primary and secondary access keys can be used to grant different read and write access
* keys can be changed to limit unauthorized access to the system

Answer 16

Data Lake is a combination of Blob Storage and Hadoop HDFS stored in the cloud. It is optimized for big data analytics by handling the need for increased processing speed and a wide variety of data types.

**Blob v. Data Lake Similarities**
* available in every region
* local and global redundancy

**Data Lake Specific Features**
* optimized for big data analytics
* allows for hierarchical namespace
* supports multiple integrations
* compatible with Hadoop

**Blob Specific Features**
* more features than Data Lake, general purpose data storage
* processing performance limits

Answer 17

* all accounts have 2 storage keys (each gives full administrative access)

Switching keys:
* client applications use key one
* shift applications to key two
* regenerate key one to create a new key one
* shift applications back to key one
* regenerate key two to create a new key two
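The rotation order above can be sketched with a toy model. Everything here is hypothetical (`StorageAccount`, `rotate_keys`, the random hex keys); a real rotation would call the storage service's key regeneration operation and update application configuration instead.

```python
import secrets


class StorageAccount:
    """Toy stand-in for an account with two access keys."""

    def __init__(self):
        self.keys = {"key1": secrets.token_hex(16), "key2": secrets.token_hex(16)}

    def regenerate(self, name):
        """Replace the named key with a fresh value and return it."""
        self.keys[name] = secrets.token_hex(16)
        return self.keys[name]

    def is_valid(self, key):
        """True if the given key value is currently one of the account's keys."""
        return key in self.keys.values()


def rotate_keys(account, apps):
    """Rotate both keys without ever leaving an app on a regenerated key.

    apps: dict mapping an application name to the key name it uses.
    """
    for app in apps:
        apps[app] = "key2"       # shift applications to key two
    account.regenerate("key1")   # old key one stops working
    for app in apps:
        apps[app] = "key1"       # shift applications to the new key one
    account.regenerate("key2")   # old key two stops working
```

At every step the applications hold a key that is still valid, which is why two keys are needed.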

Answer 18

it creates users and user groups and feeds that information into the role-based access controls

Answer 19

* gives users the minimum set of permissions needed to perform their task
* automatically expires after 2 months

**How is it set up?**
1. specify the permissions and the range of time for access to be granted
2. identify the IP address you want to grant access to
3. identify the account key to utilize

Answer 20

* sets up permissions for files and folders
* service principal: defines the access policy and permissions for users/apps within an instance of an active directory
* 3 types of service principals:
  * application: permissions at the app level
  * managed identity: automated credentials management
  * legacy: app created before registration

Answer 21

a network security device that monitors incoming and outgoing traffic, deciding whether to allow or block the specific traffic

Answer 22

**RPO:** the maximum amount of data loss, measured in time, that a business can tolerate; it determines how frequently backups must occur

**RTO:** the amount of downtime a business can tolerate
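As a rough worked example, the two objectives can be checked against a backup plan. The function name and the simplifying assumption (worst-case data loss equals the backup interval) are mine, not part of any Azure tooling.

```python
def meets_objectives(backup_interval_hours, restore_hours, rpo_hours, rto_hours):
    """Check a backup plan against recovery objectives.

    Simplifying assumption: in the worst case a failure happens just before
    the next backup, so data loss equals the backup interval.
    """
    worst_case_data_loss = backup_interval_hours
    return worst_case_data_loss <= rpo_hours and restore_hours <= rto_hours
```

Backups every 4 hours with a 1 hour restore satisfy an RPO of 4 hours and an RTO of 2 hours, but not an RPO of 1 hour.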

Answer 23

* backups are completed automatically with no RU cost
* default settings (can be changed):
  * interval: backups every 4 hours
  * retention: available for 8 hours
  * maximum 2 backups
* backups for Cosmos DB are stored separately in Blob storage
* initially stored in the same region (lower latency), then in the paired region
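The default figures are consistent with each other: an 8 hour retention divided by a 4 hour interval leaves 2 backups. A small sketch with hypothetical helper names makes the arithmetic explicit.

```python
def retained_backups(interval_hours, retention_hours):
    """Number of backups kept at any time for a given interval and retention."""
    return retention_hours // interval_hours


def available_backups(now_hour, interval_hours, retention_hours):
    """Hours (counted from 0) at which still-retained backups were taken,
    assuming the first backup ran at hour 0."""
    return [
        t
        for t in range(0, now_hour + 1, interval_hours)
        if now_hour - t < retention_hours
    ]
```

With the defaults, `retained_backups(4, 8)` is 2, and at hour 13 the backups from hours 8 and 12 are the ones still available.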

Answer 24

* solution that does not restrict attributes to a specific vendor/customer/entity
* different types of products can have different attributes
* some products can have different columns populated
* not all columns in the existing database are used

Answer 25

**Table:** Language Integrated Query (LINQ)

**MongoDB:** JavaScript

**Core SQL:** SQL

**Cassandra:** Cassandra Query Language (CQL)

**Gremlin:** graph traversal language (Apache TinkerPop)

Answer 26

**Elastic Pools**
* solution for managing and scaling multiple databases
* databases in a pool are on a single server and share a set number of resources at a set price
* used when you have to provision databases for different customer groups, where the data comes from a similar source (remember data warehouse architecture)

**Data Sync & Sync Groups**
* sync group: group of databases to synchronize (hub & spoke)
* used when data needs to be kept updated across several databases
* not the preferred strategy for disaster recovery

**Elastic Jobs**
* automate tasks on a set of Azure SQL servers or SQL databases
* tasks that need to run regularly on a schedule, or run
* administrative tasks, maintenance, and scheduled transaction queries

Answer 27

* orchestration platform to move data across different data stores via data pipelines; can run scheduled pipelines but is not suited for administrative tasks
* orchestration is the automated configuration, management, and coordination of computer systems, apps, and services

Answer 28

* provide granular access to Cosmos DB while limiting access to administrative tasks
* safe alternative to a master key

Answer 29

full administrative access to storage accounts

Answer 30

**File Permissions Hierarchy (order matters)**
1. Owner
2. Owner Group
3. Everyone Else

**POSIX**
1. Read Only: 4
2. Write Only: 2
3. Execute Only: 1
4. No Access: 0
5. Read + Write: 4 + 2 = 6
6. Read + Execute: 4 + 1 = 5
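The arithmetic above can be sketched as a small helper that adds the permission values and then strings the three digits together in hierarchy order (owner, owner group, everyone else). The names `octal_digit` and `mode` are hypothetical.

```python
PERMISSION_BITS = {"r": 4, "w": 2, "x": 1}


def octal_digit(perms: str) -> int:
    """Sum the values of the given permission letters, e.g. "rw" -> 6.

    An empty string means no access (0). Repeated letters count once.
    """
    return sum(PERMISSION_BITS[p] for p in set(perms))


def mode(owner: str, group: str, other: str) -> str:
    """Build the three-digit mode in hierarchy order: owner, group, everyone else."""
    return f"{octal_digit(owner)}{octal_digit(group)}{octal_digit(other)}"
```

For example, `mode("rwx", "rx", "")` gives `"750"`: full access for the owner, read + execute for the owner group, and no access for everyone else.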

(55 cards)