Storage & Databases Flashcards
What is the difference between Block Storage and Object Storage?
- Block storage is fixed-size raw storage capacity
- Block storage stores data in volumes that can be shared and mounted; SAN, iSCSI and local disks
- Block storage is most common for applications and databases
- Object storage does not require a guest OS; it is accessible via APIs
- Object storage grows as needed
- Object storage is redundant and can be replicated
- Unstructured data like music, images, and video
- Log files and database dumps
- Large data sets
- Archive files
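A minimal CLI sketch of the difference (bucket, disk, and VM names are hypothetical): object storage is reached through the Cloud Storage API with nothing to mount, while block storage is a raw volume you attach to a VM and format inside the guest OS.
# Object storage: no guest OS needed, accessed via the API/CLI
gsutil mb gs://example-media-bucket
gsutil cp song.mp3 gs://example-media-bucket/
# Block storage: a raw volume attached to a VM, then formatted and mounted by the guest OS
gcloud compute disks create example-data-disk --size=200GB --zone=us-central1-a
gcloud compute instances attach-disk example-vm --disk=example-data-disk --zone=us-central1-a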
What are all the Google Cloud ‘Storage Options’?
- Cloud Storage - not structured, no mobile SDK
- Cloud Storage for Firebase - not structured, needs mobile SDK
- BigQuery - structured, analytics, read-only
- Cloud Bigtable - structured, analytics, updates with low latency
- Cloud Datastore - structured, not analytics, non-relational, no mobile SDK
- Cloud Firestore for Firebase - structured, not analytics, non-relational, needs mobile SDK
- Cloud SQL - structured, not analytics, relational, no horizontal scaling
- Cloud Spanner - structured, not analytics, relational, needs horizontal scaling
What are the three blocks that the Internet Assigned Numbers Authority (IANA) has reserved for private internets?
- 10.0.0.0 - 10.255.255.255 (10/8 prefix)
- 172.16.0.0 - 172.31.255.255 (172.16/12 prefix)
- 192.168.0.0 - 192.168.255.255 (192.168/16 prefix)
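For illustration, a subnet can be carved from one of these private blocks when you create a custom-mode VPC (network and subnet names are hypothetical):
gcloud compute networks create example-net --subnet-mode=custom
gcloud compute networks subnets create example-subnet --network=example-net --region=us-central1 --range=10.0.0.0/16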
What is Persistent Disk, its features, and what is it good/used for?
- Fully managed block storage for VMs and containers
- Good for Compute Engine and Kubernetes Engine
- Good for snapshots of data backup
- Used for VM disks
- Used for sharing read-only data across multiple VMs
Features:
- Durable, independent volumes; up to 64 TB per disk; online resizing
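A short sketch of the disk lifecycle with gcloud (disk and snapshot names are hypothetical):
gcloud compute disks create example-disk --size=500GB --type=pd-ssd --zone=us-central1-a
# Online resize: disks can only grow, never shrink
gcloud compute disks resize example-disk --size=1000GB --zone=us-central1-a
# Snapshot for backup
gcloud compute disks snapshot example-disk --snapshot-names=example-snap --zone=us-central1-a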
What is Cloud Storage and what is it good for?
- A scalable, fully-managed, highly reliable, and cost-efficient object / blob store.
- Good for: Images, pictures, and videos, Objects and blobs, Unstructured data
- Workloads: storing and streaming multimedia, storing data for custom analytics pipelines
- Archive, backup, and disaster recovery
What are the different storage classes for Cloud Storage and what are they good for?
- Multi-Regional: geo-redundant storage for frequently accessed data, replicated across geographic regions
- Regional: ideal for compute, analytics, and ML workloads in a particular region
- Nearline: backups, low-cost, once a month access
- Coldline: archive, lowest-cost, once a year access
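As a sketch, the class can be set at bucket creation with gsutil mb (bucket names are hypothetical; note that Multi-Regional/Regional were later folded into the Standard class):
gsutil mb -c regional -l us-central1 gs://example-analytics-bucket
gsutil mb -c nearline -l us gs://example-backup-bucket
gsutil mb -c coldline -l us gs://example-archive-bucket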
What is Bigtable?
- Massively scalable NoSQL
- Single table that can scale to billions of rows and thousands of columns
- Stores terabytes or petabytes of data
- Ideal for single-keyed data with very low latency
- Ideal data source for MapReduce operations
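A minimal sketch with the cbt CLI (project, instance, table, and row key are hypothetical; cbt is installed as a gcloud SDK component):
cbt -project example-project -instance example-instance createtable sensor-data
cbt -project example-project -instance example-instance createfamily sensor-data stats
cbt -project example-project -instance example-instance set sensor-data device1#1700000000 stats:cpu=0.75
cbt -project example-project -instance example-instance read sensor-data prefix=device1
Note the row key design (device ID plus timestamp), which suits Bigtable's single-keyed, time-series access pattern.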
What is Bigtable good for?
Cloud Bigtable is ideal for applications that need very high throughput and scalability for non-structured key/value data, where each value is typically no larger than 10 MB. Cloud Bigtable also excels as a storage engine for batch MapReduce operations, stream processing/analytics, and machine-learning applications.
You can use Cloud Bigtable to store and query all of the following types of data:
- Marketing data such as purchase histories and customer preferences.
- Financial data such as transaction histories, stock prices, and currency exchange rates.
- Internet of Things data such as usage reports from energy meters and home appliances.
- Time-series data such as CPU and memory usage over time for multiple servers.
What is Cloud Spanner?
- Fully managed, horizontally scalable, distributed relational database service
- Handles massive transactional loads
- Uses the Paxos algorithm to synchronously replicate shards of data across data centers
- Mission-critical relational database service with transactional consistency, global scale, and high availability
- Cloud Spanner is ideal for relational, structured, and semi-structured data that requires high availability, strong consistency, and transactional reads and writes.
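A minimal provisioning sketch (instance, database, and table names are hypothetical):
gcloud spanner instances create example-instance --config=regional-us-central1 --description="Example" --nodes=1
gcloud spanner databases create example-db --instance=example-instance
gcloud spanner databases ddl update example-db --instance=example-instance --ddl='CREATE TABLE Users (UserId INT64 NOT NULL, Name STRING(100)) PRIMARY KEY (UserId)'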
What is Cloud Datastore?
- highly scalable NoSQL ‘document’ database for your applications
- non-relational
- automatic sharding and replication
- highly-available and durable, scales automatically to handle load
- ACID transactions, SQL-like queries (GQL), indexes, etc.
- RESTful interfaces
What is Dataproc?
Dataproc is Google's managed service for running Apache Hadoop and Apache Spark workloads.
Apache Hadoop software is an open source framework that allows for the distributed storage and processing of large datasets across clusters of computers using simple programming models. Hadoop is designed to scale from a single computer to thousands of clustered machines, each offering local computation and storage. In this way, Hadoop can efficiently store and process large datasets ranging in size from gigabytes to petabytes.
What are the different storage classes for GCP?
Standard, Nearline, Coldline, Archive
Why use nearline?
Nearline storage is a low-cost, highly durable storage service for storing infrequently accessed data. Nearline storage is a better choice than Standard storage in scenarios where slightly lower availability, a 30-day minimum storage duration, and costs for data access are acceptable trade-offs for lowered at-rest storage costs.
Nearline storage is ideal for data you plan to read or modify on average once per month or less. For example, if you want to continuously add files to Cloud Storage and plan to access those files once a month for analysis, Nearline storage is a great choice.
Why use regional over multi-regional?
- Lower cost.
- To comply with specific legal restrictions.
- Data only needs to be read by a specific VM in a region.
- Trade-off: multi-regional offers higher availability than regional (historically a 99.95% vs. 99.9% availability SLA).
T/F: You cannot change a bucket from multi-regional to regional.
True.
You permanently set a geographic location for storing your object data when you create a bucket.
- You cannot change a bucket’s location after it’s created, but you can move your data to a bucket in a different location.
You can select from the following location types:
A region is a specific geographic place, such as São Paulo.
A dual-region is a specific pair of regions, such as Tokyo and Osaka.
A multi-region is a large geographic area, such as the United States, that contains two or more geographic places.
All Cloud Storage data is redundant across at least two zones within at least one geographic place as soon as you upload it.
Additionally, objects stored in a multi-region or dual-region are geo-redundant. Objects that are geo-redundant are stored redundantly in at least two separate geographic places separated by at least 100 miles.
Default replication is designed to provide geo-redundancy for 99.9% of newly written objects within a target of one hour. Newly written objects include uploads, rewrites, copies, and compositions.
Turbo replication provides geo-redundancy for all newly written objects within a target of 15 minutes. Applicable only for dual-region buckets.
Cloud Storage stores object data in the selected location in accordance with the Service Specific Terms.
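A hedged sketch of creating a dual-region bucket and enabling turbo replication, using newer gsutil versions (bucket name is hypothetical; asia1 is the Tokyo/Osaka dual-region pair):
gsutil mb -l asia1 gs://example-dr-bucket
gsutil rpo set ASYNC_TURBO gs://example-dr-bucket
gsutil rpo get gs://example-dr-bucket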
How do you change the storage class of an object?
The storage class set for an object affects the object’s availability and pricing model.
- You can change the storage class of an existing object either by rewriting the object or by using Object Lifecycle Management.
- gsutil rewrite -s nearline -r gs://bucket
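For the lifecycle route, a minimal sketch (bucket name hypothetical): a JSON rule that moves objects to Nearline after 30 days, applied with gsutil lifecycle set.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 30}
    }
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://example-bucket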
How do you set the default storage class of a bucket?
What if you don’t set it?
When you create a bucket, you can specify a default storage class for the bucket. When you add objects to the bucket, they inherit this storage class unless explicitly set otherwise.
If you don’t specify a default storage class when you create a bucket, that bucket’s default storage class is set to Standard storage.
Changing the default storage class of a bucket does not affect any of the objects that already exist in the bucket.
When you upload an object to the bucket without specifying a storage class, the object is assigned the bucket's default storage class. There are two ways to change a bucket's default storage class, gcloud and gsutil.
Use the gcloud storage buckets update command:
gcloud storage buckets update gs://BUCKET_NAME --default-storage-class=STORAGE_CLASS
Use the gsutil defstorageclass set command:
gsutil defstorageclass set STORAGE_CLASS gs://BUCKET_NAME
Example: gsutil defstorageclass set nearline gs://help_bucket
Where:
- STORAGE_CLASS is the new storage class you want for your bucket. For example, nearline.
- BUCKET_NAME is the name of the relevant bucket. For example, my-bucket.
The response looks like the following example:
Setting default storage class to “nearline” for bucket gs://my-bucket
Can you share a disk between VMs?
You can attach an SSD persistent disk in multi-writer mode to up to two N2 virtual machine (VM) instances simultaneously so that both VMs can read and write to the disk.
To enable multi-writer mode for new persistent disks, create a new persistent disk and specify the --multi-writer flag in the gcloud CLI or the multiWriter property in the Compute Engine API.
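A sketch of the workflow (disk and VM names are hypothetical; depending on your SDK version the flag may require the gcloud beta track):
gcloud compute disks create example-shared-disk --size=100GB --type=pd-ssd --multi-writer --zone=us-central1-a
gcloud compute instances attach-disk example-vm-1 --disk=example-shared-disk --zone=us-central1-a
gcloud compute instances attach-disk example-vm-2 --disk=example-shared-disk --zone=us-central1-a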
What are some of the different storage options compute engine instances?
- Zonal persistent disk: Efficient, reliable block storage.
- Regional persistent disk: Regional block storage replicated in two zones.
- Local SSD: High performance, transient, local block storage.
- Cloud Storage buckets: Affordable object storage.
- Filestore: High performance file storage for Google Cloud users.
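Filestore, the NFS option in this list, can be provisioned as in this hedged sketch (instance and share names are hypothetical):
gcloud filestore instances create example-nfs --zone=us-central1-c --tier=BASIC_HDD --file-share=name=vol1,capacity=1TB --network=name=default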
If you are not sure which option to use, the most common solution is to add a persistent disk to your instance.
When you configure a persistent disk, you can select one of the following disk types.
- Standard persistent disks (pd-standard) are backed by standard hard disk drives (HDD).
- Balanced persistent disks (pd-balanced) are backed by solid-state drives (SSD). They are an alternative to SSD persistent disks that balance performance and cost.
- SSD persistent disks (pd-ssd) are backed by solid-state drives (SSD).
- Extreme persistent disks (pd-extreme) are backed by solid-state drives (SSD). With consistently high performance for both random access workloads and bulk throughput, extreme persistent disks are designed for high-end database workloads. Unlike other disk types, you can provision your desired IOPS. For more information, see Extreme persistent disks.
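As an illustration, the disk type is chosen at creation time; pd-extreme additionally accepts provisioned IOPS (names and values are hypothetical):
gcloud compute disks create example-balanced-disk --type=pd-balanced --size=200GB --zone=us-central1-a
gcloud compute disks create example-db-disk --type=pd-extreme --size=500GB --provisioned-iops=50000 --zone=us-central1-a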
How can you share a persistent disk across VMs?
Share a zonal persistent disk between VM instances
- Connect your instances to Cloud Storage.
- Connect your instances to Filestore.
- Create a network file server on Compute Engine.
- Create a persistent disk with multi-writer mode enabled and attach it to up to two instances.
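For the read-only option, a brief sketch (disk and VM names are hypothetical); a disk can be attached to many VMs as long as every attachment is read-only:
gcloud compute instances attach-disk example-vm-1 --disk=example-data-disk --mode=ro --zone=us-central1-a
gcloud compute instances attach-disk example-vm-2 --disk=example-data-disk --mode=ro --zone=us-central1-a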
How do you create a HA File Server with two GCE Instances and regional disks?
Database HA configurations typically have at least two VM instances. Preferably these instances are part of one or more managed instance groups:
- A primary VM instance in the primary zone
- A standby VM instance in a secondary zone
A primary VM instance has at least two persistent disks: a boot disk, and a regional persistent disk. The regional persistent disk contains database data and any other mutable data that should be preserved in another zone in case of an outage.
A standby VM instance requires a separate boot disk to be able to recover from configuration-related outages, which could result from an operating system upgrade, for example. You cannot force attach a boot disk to another VM during a failover.
The primary and standby VM instances are configured to use a load balancer with the traffic directed to the primary VM based on health check signals. This configuration is also known as a hot standby.
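A hedged sketch of the regional-disk piece of this setup (disk and VM names are hypothetical):
gcloud compute disks create example-regional-disk --region=us-central1 --replica-zones=us-central1-a,us-central1-b --size=200GB --type=pd-ssd
# During failover, force-attach the regional disk to the standby VM
gcloud compute instances attach-disk example-standby-vm --disk=example-regional-disk --disk-scope=regional --force-attach --zone=us-central1-b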
What is the difference between stopping and suspending an instance?
Please have a look at the documentation Suspending and resuming an instance:
> Suspending an instance differs from stopping an instance in the following ways:
- Suspended instances preserve the guest OS memory, device state, and application state.
- Google charges for the storage necessary to save instance memory.
- You can only suspend an instance for up to 60 days. After 60 days, the instance is automatically moved to the TERMINATED state.
See also the article Stopping and starting an instance.
What are the different states of an instance?
- PROVISIONING: resources are allocated for the VM. The VM is not running yet.
- STAGING: resources are acquired, and the VM is preparing for first boot.
- RUNNING: the VM is booting up or running.
- STOPPING: the VM is being stopped. You requested a stop, or a failure occurred. This is a temporary status after which the VM enters the TERMINATED status.
- REPAIRING: the VM is being repaired. Repairing occurs when the VM encounters an internal error or the underlying machine is unavailable due to maintenance. During this time, the VM is unusable. If repair succeeds, the VM returns to one of the above states.
- TERMINATED: the VM is stopped. You stopped the VM, or the VM encountered a failure. You can restart or delete the VM.
- SUSPENDING: The VM is in the process of being suspended. You suspended the VM.
- SUSPENDED: The VM is in a suspended state. You can resume the VM or delete it.
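You can check the current state with a one-liner (VM name is hypothetical):
gcloud compute instances describe example-vm --zone=us-central1-a --format='value(status)'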
What is the difference between the stopped, suspended, and reset states?
Why would you want to stop a VM?
You might want to stop a VM for several reasons:
- You no longer need the VM but want to keep the resources attached to it, such as its internal IPs, MAC address, and persistent disks.
- You don’t need to preserve the guest OS memory, device state, or application state.
- You want to change certain properties of the VM that require you to first stop the VM.
Why would you want to suspend a VM?
You might want to suspend a VM for the following reasons:
- You want to stop paying for the core and memory costs of running a VM and pay the comparatively cheaper cost of storage to preserve the state of your VM instead.
- You don’t need the VM at this time but want to be able to bring it back up quickly with its OS and application state where you left it.
You can resume a suspended VM when you need to use it again.
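The corresponding lifecycle commands, as a sketch (VM name is hypothetical; suspend/resume may require a recent SDK version):
gcloud compute instances stop example-vm --zone=us-central1-a
gcloud compute instances start example-vm --zone=us-central1-a
gcloud compute instances suspend example-vm --zone=us-central1-a
gcloud compute instances resume example-vm --zone=us-central1-a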
Here are three things you should consider as you address storage needs:
First, consider data replication requirements.
Second, consider that GCP offers replication across zones, even within a single region.
Third, if storing in a single region poses a risk for disaster recovery, you should consider multiregional replication.
Where do persistent disks attach to?
Persistent disks do not attach directly to a server. Rather, they attach over the network to the server hosting the virtual machine. Data on a locally attached disk (such as a Local SSD) is lost when the VM is terminated, but data on a persistent disk remains intact when an instance is terminated.
Two types of persistent disks are available:
Solid-state drive (SSD) and hard disk drive (HDD). Select an SSD when you require high throughput and consistent performance.
HDDs have longer latencies and cost less. An HDD is the preferred choice for large data ingest and batch operations, where sensitivity to latency variability is lower.
Persistent disks allow for several features. What are they?
First, a persistent disk can be mounted read-only on multiple virtual machines at once, providing shared storage. Second, snapshots of persistent disks can be created quickly, supporting rapid VM provisioning. Third, when a disk is mounted to a single virtual machine instance, full read/write operations are permitted.
What does Memorystore do?
If you are looking for storage that can hold user session data, maintain short-lived web and mobile applications data, or handle gaming data at speed and scale, Cloud Memorystore is the storage option to consider. Cloud Memorystore is a managed Redis service, which is an open source cache solution. Memorystore offers a fully managed in-memory data store with features such as scalability, a well-built security posture, and high availability, all managed by Google.
Configuration varies upon accessing the Memorystore form. When you access the menu, you have two choices:
- Redis: in-memory data structure store that can be used as a database, cache, and message broker
- Memcached: in-memory key-value store intended exclusively for caching data
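A minimal provisioning sketch for each engine (instance names and sizes are hypothetical):
gcloud redis instances create example-cache --size=1 --region=us-central1 --redis-version=redis_6_x
gcloud memcache instances create example-memcache --node-count=1 --node-cpu=1 --node-memory=1GB --region=us-central1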
Object Storage
Object storage is a strategy to manage and manipulate data storage as a distinct unit, called an object. Each object can be stored in a single storage unit instead of being embedded into files or folders.
Google Cloud Platform has three broad categories of storage:
object, relational, and nonrelational. The database platforms vary in size, scale, and capability. Nonrelational databases consist of platforms that support NoSQL as well as alternative solutions developed by Google, such as Cloud Firestore and Firebase. These two platforms are mobile NoSQL solutions.
What is the problem with automated backups?
Backing Up a Database
Backups can be created at any time with GCP. For example, if you are about to complete a risky task, you'll want to back up your database or storage system. For these occasions, you can use on-demand backups, since you do not have to wait for the backup window to create a copy. Unlike automated backups, on-demand backups are not deleted automatically; you must delete them yourself, or you will keep accruing storage charges.
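For Cloud SQL, for instance, the on-demand flow looks like this sketch (instance name is hypothetical):
gcloud sql backups create --instance=example-instance --description="pre-migration"
gcloud sql backups list --instance=example-instance
# On-demand backups persist until you delete them
gcloud sql backups delete BACKUP_ID --instance=example-instance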
Dataproc Deployment and Management
Dataproc is Google’s managed Apache Spark and Hadoop service. Like BigQuery, Dataproc is designed for big data applications. You should be aware that Spark is intended for analysis and machine learning, whereas Hadoop is appropriate for batching data, with emphasis on big data applications.
For the exam, you need to be familiar with creating Dataproc clusters and storage facilities as well as know how to submit jobs that run in those clusters.
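A minimal cluster-and-job sketch (cluster, bucket, and script names are hypothetical):
gcloud dataproc clusters create example-cluster --region=us-central1 --num-workers=2
gcloud dataproc jobs submit pyspark gs://example-bucket/wordcount.py --cluster=example-cluster --region=us-central1
gcloud dataproc clusters delete example-cluster --region=us-central1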