Services Flashcards
Cloud Logging
other modules
Cloud Logging is a real-time log-management system with storage, search, analysis, and monitoring support. Cloud Logging automatically collects log data from Google Cloud resources.
View and analyze your log data by using the Google Cloud console, either with the Logs Explorer or the Log Analytics pages.
When you want to troubleshoot and analyze the performance of your services and applications, we recommend that you use the Logs Explorer.
When you're interested in performing aggregate operations on your logs, for example, to compute the average latency for HTTP requests issued to a specific URL over time, use the Log Analytics interface.
You can configure Cloud Logging to notify you when certain kinds of events occur in your logs.
You don't have to configure the location where logs are stored, and you can route logs to any of the following supported destinations:
- Cloud Logging bucket
- BigQuery dataset
- Cloud Storage bucket
- Pub/Sub topic
- Google Cloud project
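Routing is done with sinks. As a minimal sketch using the google-cloud-logging Python client (the sink name, filter, project, and dataset below are illustrative, not from the source):

```python
from google.cloud import logging

client = logging.Client()  # uses Application Default Credentials

# Route all ERROR-and-above entries to a BigQuery dataset.
sink = client.sink(
    "error-logs-to-bq",  # hypothetical sink name
    filter_="severity>=ERROR",
    destination="bigquery.googleapis.com/projects/my-project/datasets/audit_logs",
)
if not sink.exists():
    sink.create()
```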
https://cloud.google.com/logging/docs/overview
Dedicated Interconnect
networking
Dedicated Interconnect provides direct physical connections between your on-premises network and Google’s network. Dedicated Interconnect enables you to transfer large amounts of data between networks
Your network must physically meet Google’s network in a colocation facility. You must provide your own routing equipment
When you create a VLAN attachment, you associate it with a Cloud Router. This Cloud Router creates a BGP session for the VLAN attachment and its corresponding on-premises peer router. The Cloud Router receives the routes that your on-premises router advertises. These routes are added as custom dynamic routes in your VPC network. The Cloud Router also advertises routes for Google Cloud resources to the on-premises peer router.
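A rough sketch of creating such a Cloud Router with the google-cloud-compute Python client (project, region, network name, and the private ASN are placeholder values; a Dedicated Interconnect VLAN attachment would then reference this router):

```python
from google.cloud import compute_v1

# Create a Cloud Router to hold the BGP session for a VLAN attachment.
# All names and the ASN below are illustrative.
router = compute_v1.Router(
    name="interconnect-router",
    network="projects/my-project/global/networks/my-vpc",
    bgp=compute_v1.RouterBgp(asn=65001),  # private ASN for the Cloud Router side
)
operation = compute_v1.RoutersClient().insert(
    project="my-project", region="us-central1", router_resource=router
)
operation.result()  # block until the router is created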
Supported circuits are 10-Gbps circuits over single-mode fiber (10GBASE-LR, 1310 nm) or 100-Gbps circuits over single-mode fiber (100GBASE-LR4).
https://cloud.google.com/network-connectivity/docs/interconnect/concepts/dedicated-overview
Cloud Data Loss Prevention
other modules
Sensitive Data Protection (which includes the former Cloud Data Loss Prevention) helps you discover, classify, and protect your most sensitive data.
Sensitive Data Protection includes data discovery, inspection, de-identification, data risk analysis, and the DLP API.
- Automated sensitive data discovery and classification
- Sensitive data intelligence for security assessments
- De-identification, masking, tokenization, and bucketing
- Powerful and flexible masking of your AI/ML workloads
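A short sketch of de-identification with the google-cloud-dlp Python client (the project ID and sample text are placeholders):

```python
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
parent = "projects/my-project"  # placeholder project ID

response = client.deidentify_content(
    request={
        "parent": parent,
        # Find email addresses...
        "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
        # ...and mask every matched character with "#".
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [{
                    "primitive_transformation": {
                        "character_mask_config": {"masking_character": "#"}
                    }
                }]
            }
        },
        "item": {"value": "Contact me at jane.doe@example.com"},
    }
)
print(response.item.value)  # the email address is replaced by "#" characters
```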
https://cloud.google.com/security/products/dlp?hl=en
Cloud Shell
other modules
Cloud Shell provisions a Compute Engine virtual machine running a Debian-based Linux operating system for your temporary use. This virtual machine is owned and managed by Google Cloud, so it will not appear within any of your Google Cloud projects.
Cloud Shell instances are provisioned on a per-user, per-session basis. The instance persists while your Cloud Shell session is active; after an hour of inactivity, your session terminates and its VM is discarded.
Persistent disk storage:
Cloud Shell provisions 5 GB of free persistent disk storage mounted as your $HOME directory on the virtual machine instance. This storage is on a per-user basis and is available across projects. Unlike the instance itself, this storage does not time out on inactivity. All files you store in your home directory, including installed software, scripts and user configuration files like .bashrc and .vimrc, persist between sessions. Your $HOME directory is private to you and can’t be accessed by other users.
Root user:
When you set up a Cloud Shell session, you get a regular Unix user account with a username based on your email address. With this access, you have full root privileges on your allocated VM and can even run sudo commands, if you need to.
https://cloud.google.com/shell/docs/how-cloud-shell-works
Cloud VPN
networking
Cloud VPN securely extends your peer network to your Virtual Private Cloud (VPC) network through an IPsec VPN connection. The VPN connection encrypts traffic traveling between the networks, with one VPN gateway handling encryption and the other handling decryption. This process protects your data during transmission. You can also connect two VPC networks together by connecting two Cloud VPN instances. You cannot use Cloud VPN to route traffic to the public internet; it is designed for secure communication between private networks.
Each Cloud VPN tunnel supports approximately 1 Gbps to 3 Gbps of bandwidth.
Google Cloud offers two types of Cloud VPN gateways:
HA VPN is a high-availability (HA) Cloud VPN solution that lets you securely connect your on-premises network to your VPC network through an IPsec VPN connection. Based on the topology and configuration, HA VPN can provide an SLA of 99.99% or 99.9% service availability.
Classic VPN gateways have a single interface, a single external IP address, and support tunnels that use static routing (policy-based or route-based). Classic VPN gateways provide an SLA of 99.9% service availability.
https://cloud.google.com/network-connectivity/docs/vpn/concepts/overview
Sustained use discounts
Compute Engine offers sustained use discounts (SUDs) on resources that are used for more than 25% of a billing month and are not receiving any other discounts. Whenever you use an applicable resource for more than a fourth of a billing month, you automatically receive a discount for every incremental hour that you continue to use that resource. The discount increases incrementally with usage and you can get up to a 30% net discount off of the resource cost for virtual machine (VM) instances that run the entire month.
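The incremental schedule can be made concrete with a little arithmetic. A sketch using the N1 schedule from the docs, where each successive quarter of the month is billed at 100%, 80%, 60%, then 40% of the base rate (other machine series use different schedules):

```python
# Effective price multiplier under sustained use discounts (N1 schedule).
N1_RATES = [1.00, 0.80, 0.60, 0.40]  # per quarter of the billing month

def sud_multiplier(fraction_of_month: float, rates=N1_RATES) -> float:
    billed, remaining = 0.0, fraction_of_month
    for rate in rates:
        used = min(remaining, 0.25)
        billed += used * rate
        remaining -= used
    return billed / fraction_of_month

print(sud_multiplier(1.0))   # 0.70 -> a 30% net discount for a full month
print(sud_multiplier(0.25))  # 1.00 -> no discount at 25% usage or less
```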
Limitations
Sustained use discounts have the following limitations:
- Only self-serve (or online) Cloud Billing accounts are eligible for receiving SUDs.
- SUDs don't apply to resource usage that is already covered by committed use discounts (CUDs).
- SUDs don't apply to VMs created using the App Engine (standard and flexible) environments or Dataflow. Only VMs created by Google Kubernetes Engine and Compute Engine are eligible for SUDs.
- Sustained use discounts apply only to the machine series listed in the Eligible resources and discount percentages section.
Eligible resources and discount percentages
The following resources are eligible to receive sustained use discounts:
- The vCPUs and memory for general-purpose N1, N2, and N2D custom and predefined machine types
- The vCPUs and memory for compute-optimized C2 machine types
- The vCPUs and memory for memory-optimized M1 and M2 machine types
- The vCPUs and memory for sole-tenant nodes
- The premium cost for sole-tenant nodes, even if the vCPUs and memory in those nodes are covered by CUDs
- All GPU devices with the exception of NVIDIA H100, A100, and L4 GPU types
https://cloud.google.com/compute/docs/sustained-use-discounts
Resource-based committed use discounts
other modules
Compute Engine provides resource-based committed use discounts (CUDs) for your predictable workloads to help you cut costs on resources that you need. You can purchase and renew resource-based committed use contracts or commitments in return for heavily discounted prices for VM usage.
Resource-based commitments are ideal for predictable and steady state usage. These commitments require no upfront costs. Compute Engine lets you purchase the following categories of resource-based commitments:
- Hardware commitments: You can purchase hardware commitments for a specific machine series and commit to resources available for that machine series, such as vCPUs, memory, GPUs, Local SSD disks, and sole-tenant nodes. For more information, see Purchase commitments without attached reservations and Purchase commitments with attached reservations.
- Software license commitments: You can purchase license commitments for applicable premium operating system (OS) licenses. For more information, see Purchase commitments for licenses.
https://cloud.google.com/compute/docs/instances/signing-up-committed-use-discounts
PCI DSS
The PCI Security Standards Council is a global forum for the ongoing development, enhancement, storage, dissemination, and implementation of security standards for account data protection. The Standards Council was established by the major credit card associations (Visa, MasterCard, American Express, Discover, JCB) as a separate organization to define appropriate practices that merchants and service providers should follow to protect cardholder data. It is this council of companies that created the Payment Card Industry (PCI) Data Security Standards (DSS).
https://cloud.google.com/security/compliance/pci-dss?hl=en
https://cloud.google.com/architecture/pci-dss-and-gke-guide?hl=en
https://cloud.google.com/architecture/limiting-compliance-scope-pci-environments-google-cloud
https://cloud.google.com/architecture/gke-pci-dss-blueprint
SLIs, SLOs, SLAs
It's very important that you understand the level of service that you offer your users. It's almost impossible to manage a service well if you don't know what is important for that service and how to measure its behavior. To that end, it's important that you understand service level indicators (SLIs), service level objectives (SLOs), and service level agreements (SLAs). You may not see a question on the exam that asks you to define SLIs and SLOs, but you may be presented with a data point or a requirement indicating that the business is looking to maintain or achieve a higher SLA with a new architecture. SLA seems to be a broad term that the industry uses to represent a variety of meanings, so let's break down these three terms here.
A service level indicator is a quantitative measure of a chosen characteristic of the level of service that is provided from a product or service. If the characteristic is availability, the SLI could be a percentage of time, often expressed in the number of "nines" (for example, 99.99 percent is "four nines"). Remember that SLA does not mean service level availability—it means service level agreement, even though the availability is oftentimes expressed in the agreement. The actual number itself, or the range of numbers, is the service level objective. If we're expecting 99.999 percent availability for a system, "five nines" is our objective and the availability is our measure. What happens if we don't meet these requirements is described in our service level agreement. The SLA is a contract that describes the expectations and consequences of meeting or missing an SLO. For example, if Google Cloud Storage doesn't meet its availability target of 99.95 percent (for a given storage class—other storage classes have other SLOs), customers are eligible to receive financial credits of a percentage of the monthly bill for the service.
In short, the SLI is the indicator, the measure itself—availability, error rate, and so on. The SLO is the objective, the numerical value that describes the expectation of the measure—99.99 percent availability, 2 to 5 percent error rate, and so on. And the SLA is the agreement, the contracted expectations of using the product or service and what happens if the objectives aren't met. Remember how important reliability is to a successful product and business, and think about how you can use these acronyms effectively in your business meetings.
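One way to make those "nines" concrete is to compute the downtime each objective allows. A quick sketch:

```python
from datetime import timedelta

def allowed_downtime(slo: float, window: timedelta = timedelta(days=30)) -> timedelta:
    """Maximum downtime an availability SLO permits over a window."""
    return window * (1 - slo)

for slo in (0.999, 0.9995, 0.9999, 0.99999):
    print(f"{slo * 100:.3f}% -> {allowed_downtime(slo)}")
# 99.900% allows about 43 minutes per 30 days; 99.999% allows about 26 seconds.
```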
Resumable uploads
This page discusses resumable uploads in Cloud Storage. Resumable uploads are the recommended method for uploading large files, because you don’t have to restart them from the beginning if there is a network failure while the upload is underway.
A resumable upload lets you resume data transfer operations to Cloud Storage after a communication failure has interrupted the flow of data. Resumable uploads work by sending multiple requests, each of which contains a portion of the object you’re uploading. This is different from a single-request upload, which contains all of the object’s data in a single request and must restart from the beginning if it fails part way through.
Use a resumable upload if you are uploading large files or uploading over a slow connection. For recommended file-size cutoffs for using resumable uploads, see upload size considerations.
- A resumable upload must be completed within a week of being initiated, but can be cancelled at any time.
- Only a completed resumable upload appears in your bucket and, if applicable, replaces an existing object with the same name. The creation time for the object is based on when the upload completes.
- Object metadata set by the user is specified in the initial request and is applied to the object once the upload completes. The JSON API also supports setting custom metadata in the final request if you include headers prefixed with X-Goog-Meta- in that request.
- A completed resumable upload is considered one Class A operation.
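With the google-cloud-storage Python client, setting a chunk size forces the multi-request resumable behavior described above. A minimal sketch (bucket and file names are placeholders):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder bucket name

# A non-default chunk_size makes the client upload in multiple requests,
# resuming from the last committed chunk after a transient failure.
# chunk_size must be a multiple of 256 KiB.
blob = bucket.blob("backups/disk-image.bin", chunk_size=16 * 1024 * 1024)
blob.upload_from_filename("disk-image.bin")
```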
https://cloud.google.com/storage/docs/resumable-uploads
Signed URLs
A signed URL is a URL that provides limited permission and time to make a request. Signed URLs contain authentication information in their query string, allowing users without credentials to perform specific actions on a resource.
In some scenarios, you might not want to require your users to have a Google account in order to access Cloud Storage, but you still want to control access using your application-specific logic. The typical way to address this use case is to provide a signed URL to a user, which gives the user read, write, or delete access to that resource for a limited time. You specify an expiration time when you create the signed URL. Anyone who knows the URL can access the resource until the expiration time for the URL is reached or the key used to sign the URL is rotated.
The most common uses for signed URLs are uploads and downloads, because in such requests, object data moves between requesters and Cloud Storage. In most other cases, such as copying objects, composing objects, deleting objects, or editing metadata, creating a signed URL and giving it to someone to use is an unnecessary extra step. Instead, you should consider a design in which the entity responsible for creating the signed URL directly makes the desired request to Cloud Storage.
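Generating a V4 signed URL with the Python client looks roughly like this (it requires credentials that can sign, such as a service account key; bucket and object names are placeholders):

```python
from datetime import timedelta
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").blob("reports/q3.pdf")  # placeholder names

url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(minutes=15),  # link stops working after 15 minutes
    method="GET",                      # read-only access
)
print(url)  # hand this to a user who has no Google account
```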
https://cloud.google.com/storage/docs/access-control/signed-urls
Identity-Aware Proxy
IAP lets you establish a central authorization layer for applications accessed by HTTPS, so you can use an application-level access control model instead of relying on network-level firewalls.
IAP policies scale across your organization. You can define access policies centrally and apply them to all of your applications and resources. When you assign a dedicated team to create and enforce policies, you protect your project from incorrect policy definition or implementation in any application.
How IAP works
When an application or resource is protected by IAP, it can only be accessed through the proxy by principals, also known as users, who have the correct Identity and Access Management (IAM) role. When you grant a user access to an application or resource by IAP, they’re subject to the fine-grained access controls implemented by the product in use without requiring a VPN. When a user tries to access an IAP-secured resource, IAP performs authentication and authorization checks.
https://cloud.google.com/iap/docs/concepts-overview
GKE cluster architecture
A Kubernetes cluster consists of a control plane plus a set of worker machines, called nodes, that run containerized applications. Every cluster needs at least one worker node in order to run Pods.
The worker node(s) host the Pods that are the components of the application workload. The control plane manages the worker nodes and the Pods in the cluster. In production environments, the control plane usually runs across multiple computers and a cluster usually runs multiple nodes, providing fault-tolerance and high availability.
The control plane manages what runs on all of the cluster’s nodes. The control plane schedules workloads and manages the workloads’ lifecycle, scaling, and upgrades. The control plane also manages network and storage resources for those workloads. The control plane and nodes communicate with each other using Kubernetes APIs.
https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture
https://kubernetes.io/docs/concepts/architecture/
VPN topologies
Sample Classic VPN topology
The documentation linked below includes a diagram of a sample VPN connection between a Classic VPN gateway and your peer VPN gateway.
Redundancy and failover options
Option 1: Move to HA VPN
If your peer VPN gateway supports BGP, the recommended option is to move to a highly available (HA) Cloud VPN gateway.
Option 2: Use a second peer VPN gateway
For Classic VPN, if your on-premises side is hardware based, having a second peer VPN gateway provides redundancy and failover on that side of the connection. A second physical gateway lets you take one of the gateways offline for software upgrades or other scheduled maintenance. It also protects you in case of an outright failure in one of the devices.
Increased throughput and load balancing options
There are three options for scaling a Cloud VPN configuration:
Option 1: Scale the on-premises VPN gateway.
Option 2: Scale the Cloud VPN gateway. If your on-premises VPN gateway’s throughput capabilities are higher, and you want to scale higher throughput from the Cloud VPN gateway, you can set up a second Cloud VPN gateway.
Option 3: Scale both the on-premises VPN gateway and the Cloud VPN gateway.
https://cloud.google.com/network-connectivity/docs/vpn/concepts/overview
https://cloud.google.com/network-connectivity/docs/vpn/concepts/classic-topologies
BigQuery: manage tables
Update table properties:
Expiration time
Description
Schema definition
Labels
Default rounding mode
Rename (copy) a table
Copy a table
Delete a table
Restore a deleted table
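For instance, updating a table's expiration time with the Python client (the table ID is a placeholder):

```python
from datetime import datetime, timedelta, timezone
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.my_dataset.my_table")  # placeholder ID

# Expire the table 30 days from now; only the listed field is updated.
table.expires = datetime.now(timezone.utc) + timedelta(days=30)
client.update_table(table, ["expires"])
```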
https://cloud.google.com/bigquery/docs/managing-tables#updating_a_tables_expiration_time
BigQuery partition expiration
Set the partition expiration
When you create a table partitioned by ingestion time or time-unit column, you can specify a partition expiration. This setting specifies how long BigQuery keeps the data in each partition. The setting applies to all partitions in the table, but is calculated independently for each partition based on the partition time.
A partition’s expiration time is calculated from the partition boundary in UTC. For example, with daily partitioning, the partition boundary is at midnight (00:00:00 UTC). If the table’s partition expiration is 6 hours, then each partition expires at 06:00:00 UTC the following day. When a partition expires, BigQuery deletes the data in that partition.
You can also specify a default partition expiration at the dataset level. If you set the partition expiration on a table, then the value overrides the default partition expiration. If you don’t specify any partition expiration (on the table or dataset), then partitions never expire.
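A sketch creating a time-unit column partitioned table with a 90-day partition expiration (the table ID and schema are illustrative):

```python
from google.cloud import bigquery

client = bigquery.Client()
table = bigquery.Table(
    "my-project.my_dataset.events",  # placeholder table ID
    schema=[
        bigquery.SchemaField("event_ts", "TIMESTAMP"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",                          # partition by this column
    expiration_ms=90 * 24 * 60 * 60 * 1000,    # drop partitions after ~90 days
)
client.create_table(table)
```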
https://cloud.google.com/bigquery/docs/partitioned-tables
https://cloud.google.com/bigquery/docs/managing-partitioned-tables#partition-expiration
BigQuery clustered tables
Clustered tables in BigQuery are tables that have a user-defined column sort order using clustered columns. Clustered tables can improve query performance and reduce query costs.
In BigQuery, a clustered column is a user-defined table property that sorts storage blocks based on the values in the clustered columns. The storage blocks are adaptively sized based on the size of the table. Colocation occurs at the level of the storage blocks, and not at the level of individual rows; for more information on colocation in this context, see Clustering.
A clustered table maintains the sort properties in the context of each operation that modifies it. Queries that filter or aggregate by the clustered columns only scan the relevant blocks based on the clustered columns, instead of the entire table or table partition. As a result, BigQuery might not be able to accurately estimate the bytes to be processed by the query or the query costs, but it attempts to reduce the total bytes at execution.
When you cluster a table using multiple columns, the column order determines which columns take precedence when BigQuery sorts and groups the data into storage blocks. The documentation illustrates this by comparing an unclustered table with a table clustered only by a Country column and a table clustered by multiple columns, Country and Status.
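Creating such a table with the Python client (IDs and schema are illustrative):

```python
from google.cloud import bigquery

client = bigquery.Client()
table = bigquery.Table(
    "my-project.my_dataset.orders",  # placeholder table ID
    schema=[
        bigquery.SchemaField("country", "STRING"),
        bigquery.SchemaField("status", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
    ],
)
# Order matters: blocks are sorted by country first, then by status.
table.clustering_fields = ["country", "status"]
client.create_table(table)
```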
https://cloud.google.com/bigquery/docs/clustered-tables
GKE cluster autoscaling
GKE’s cluster autoscaler automatically resizes the number of nodes in a given node pool, based on the demands of your workloads. When demand is low, the cluster autoscaler scales back down to a minimum size that you designate. This can increase the availability of your workloads when you need it, while controlling costs. You don’t need to manually add or remove nodes or over-provision your node pools. Instead, you specify a minimum and maximum size for the node pool, and the rest is automatic.
If resources are deleted or moved when autoscaling your cluster, your workloads might experience transient disruption. For example, if your workload consists of a controller with a single replica, that replica’s Pod might be rescheduled onto a different node if its current node is deleted. Before enabling cluster autoscaler, design your workloads to tolerate potential disruption or ensure that critical Pods are not interrupted.
Cluster autoscaler works per node pool. When you configure a node pool with cluster autoscaler, you specify a minimum and maximum size for the node pool.
Why use horizontal Pod autoscaling
When you first deploy your workload to a Kubernetes cluster, you may not be sure about its resource requirements and how those requirements might change depending on usage patterns, external dependencies, or other factors. Horizontal Pod autoscaling helps to ensure that your workload functions consistently in different situations, and lets you control costs by only paying for extra capacity when you need it.
It’s not always easy to predict the indicators that show whether your workload is under-resourced or under-utilized. The Horizontal Pod Autoscaler can automatically scale the number of Pods in your workload based on one or more metrics of the following types:
- Actual resource usage: when a given Pod's CPU or memory usage exceeds a threshold. This can be expressed as a raw value or as a percentage of the amount the Pod requests for that resource.
- Custom metrics: based on any metric reported by a Kubernetes object in a cluster, such as the rate of client requests per second or I/O writes per second. This can be useful if your application is prone to network bottlenecks, rather than CPU or memory.
- External metrics: based on a metric from an application or service external to your cluster. For example, your workload might need more CPU when ingesting a large number of requests from a pipeline such as Pub/Sub. You can create an external metric for the size of the queue, and configure the Horizontal Pod Autoscaler to automatically increase the number of Pods when the queue size reaches a given threshold, and to reduce the number of Pods when the queue size shrinks.
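A sketch of the first case (CPU-based scaling) using the official Kubernetes Python client; the Deployment name, namespace, and thresholds are placeholders, and the same object is usually written as YAML:

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig pointing at your GKE cluster

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web",  # placeholder
        ),
        min_replicas=2,
        max_replicas=10,
        target_cpu_utilization_percentage=60,  # scale out above 60% CPU
    ),
)
client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```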
https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler
https://cloud.google.com/kubernetes-engine/docs/concepts/horizontalpodautoscaler
BigQuery roles
BigQuery Data Viewer
(roles/bigquery.dataViewer)
When applied to a table or view, this role provides permissions to:
Read data and metadata from the table or view.
This role cannot be applied to individual models or routines.
When applied to a dataset, this role provides permissions to list all of the resources in the dataset (such as tables, views, snapshots, models, and routines) and to read their data and metadata with applicable APIs and in queries.
When applied at the project or organization level, this role can also enumerate all datasets in the project. Additional roles, however, are necessary to allow the running of jobs.
Lowest-level resources where you can grant this role:
Table
View
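Granting BigQuery Data Viewer on a single table with the Python client might look like this (the table ID and principal are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # placeholder table ID

policy = client.get_iam_policy(table_id)
policy.bindings.append({
    "role": "roles/bigquery.dataViewer",
    "members": {"user:analyst@example.com"},  # placeholder principal
})
client.set_iam_policy(table_id, policy)
```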
BigQuery Job User
(roles/bigquery.jobUser)
Provides permissions to run jobs, including queries, within the project.
Lowest-level resources where you can grant this role:
Project
https://cloud.google.com/bigquery/docs/access-control
StatefulSets in GKE
StatefulSets represent a set of Pods with unique, persistent identities, and stable hostnames that GKE maintains regardless of where they are scheduled. The state information and other resilient data for any given StatefulSet Pod is maintained in persistent volumes associated with each Pod in the StatefulSet. StatefulSet Pods can be restarted at any time.
StatefulSets are valuable for applications that require one or more of the following:
- Stable, unique network identifiers.
- Stable, persistent storage.
- Ordered, graceful deployment and scaling.
- Ordered, automated rolling updates.
https://cloud.google.com/kubernetes-engine/docs/concepts/statefulset
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
Cloud CDN
Cloud CDN (Content Delivery Network) uses Google’s global edge network to serve content closer to users, which accelerates your websites and applications.
Cloud CDN works with the global external Application Load Balancer or the classic Application Load Balancer to deliver content to your users. The external Application Load Balancer provides the frontend IP addresses and ports that receive requests and the backends that respond to the requests.
In Cloud CDN, these backends are also called origin servers. Responses from origin servers that run on virtual machine (VM) instances flow through an external Application Load Balancer before being delivered by Cloud CDN. In this situation, the Google Front End (GFE) comprises Cloud CDN and the external Application Load Balancer.
Cache hit ratio
The cache hit ratio is the percentage of times that a requested object is served from the cache. If the cache hit ratio is 60%, it means that the requested object is served from the cache 60% of the time and must be retrieved from the origin 40% of the time.
For information about how cache keys can affect the cache hit ratio, see Using cache keys. For troubleshooting information, see Cache hit ratio is low.
Cache keys
Each cache entry in a Cloud CDN cache is identified by a cache key. When a request comes into the cache, the cache converts the URI of the request into a cache key, and then compares it with keys of cached entries. If it finds a match, the cache returns the object associated with that key.
For backend services, Cloud CDN defaults to using the complete request URI as the cache key. For example, https://example.com/images/cat.jpg is the complete URI for a particular request for the cat.jpg object. This string is used as the default cache key. Only requests with this exact string match. Requests for http://example.com/images/cat.jpg or https://example.com/images/cat.jpg?user=user1 don’t match.
For backend buckets, the default is for the cache key to consist of the URI without the protocol or host. By default, only query parameters that are known to Cloud Storage are included as part of the cache key (for example, “generation”).
Thus, for a given backend bucket, the following URIs resolve to the same cached object:
- http://example.com/images/cat.jpg
- https://example.com/images/cat.jpg
- https://example.com/images/cat.jpg?user=user1
- http://example.com/images/cat.jpg?user=user1
- https://example.com/images/cat.jpg?user=user2
- https://media.example.com/images/cat.jpg
- https://www.example.com/images/cat.jpg
You can change which parts of the URI are used in the cache key. While the filename and path must always be part of the key, you can include or omit any combination of protocol, host, or query string when customizing your cache key. Using cache keys describes how to customize your cache keys.
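The backend-bucket default can be mimicked to build intuition. An illustrative stdlib sketch (not Cloud CDN's actual implementation) that keeps only the path plus the query parameters known to Cloud Storage:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

KNOWN_STORAGE_PARAMS = {"generation"}  # kept by default for backend buckets

def bucket_style_cache_key(uri: str) -> str:
    parts = urlsplit(uri)  # protocol and host are ignored
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in KNOWN_STORAGE_PARAMS]
    return parts.path + ("?" + urlencode(kept) if kept else "")

# All of the sample URIs above collapse to the same key:
for uri in ("http://example.com/images/cat.jpg",
            "https://media.example.com/images/cat.jpg?user=user1"):
    print(bucket_style_cache_key(uri))  # /images/cat.jpg
```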
https://cloud.google.com/cdn/docs/overview
Google Cloud Observability
Observability is a holistic approach to gathering and analyzing telemetry data in order to understand the state of your environment. Telemetry data is metrics, logs, traces, and other data generated by your applications and the application infrastructure that provide information about application health and performance.
Metrics
Metrics are numeric data about health or performance that you measure at regular intervals over time, such as CPU utilization and request latency. Unexpected changes to a metric might indicate an issue to investigate. Over time, you can also analyze metric patterns to better understand usage patterns and anticipate resource needs.
Logs
A log is a generated record of system or application activity over time. Each log is a collection of time stamped log entries, and each log entry describes an event at a specific point in time. A log often contains rich, detailed information that helps you understand what happened with a specific part of your application. However, logs don't provide good information about how a change in one component of your application relates to activity in another component. Traces can help to bridge that gap.
Traces
Traces represent the path of a request across the parts of your distributed application. A metric or log entry in one application component that triggered an alert notification might be a symptom of a problem that originates in another component. Traces let you follow the flow of a request and examine latency data to help you to identify the root cause of an issue.
Other data
You can gain additional insights by analyzing metrics, logs, and traces in the context of other data. For example, a label for the severity of an alert or the customer ID associated with a request in logs provides context that can be useful for troubleshooting and debugging.
Monitoring, debugging, and troubleshooting distributed applications can be difficult because there are many systems and software components involved, often with a mix of open source and commercial software.
Observability tools help you to navigate this complexity by collecting meaningful data and providing features to explore, analyze, and correlate the data. An observable environment helps you to:
- Proactively detect issues before they impact your users
- Troubleshoot both known and new issues
- Debug applications during development
- Plan for and understand the impacts of changes to your applications
- Explore data to discover new insights
In short, an observable environment helps you to maintain application reliability. An application is reliable when it meets your current objectives for availability and resilience to failures.
Install the Ops Agent during VM creation
This document describes how the Google Cloud console can automatically install the Ops Agent for you when you create a VM instance. During the installation process, the Compute Engine VM Manager creates an Ops Agent OS policy that installs the agent and reinstalls it when necessary. The VM Manager helps you get the Ops Agent running on your VM and ensures that the agent is always installed.
https://cloud.google.com/stackdriver/docs
https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/install-agent-vm-creation
Load Balancing in Kubernetes Engine
Load balancers distribute traffic and control access to a cluster. Services in Kubernetes Engine can use external load balancers, internal load balancers, and HTTP(S) load balancers.
External load balancers are used when a service needs to be reachable from outside the cluster and outside the VPC. GKE will provision a network load balancer for the service and configure firewall rules to allow connections to the service from outside the cluster.
Internal load balancers are used when a service needs to be reachable from within the VPC network but not from the public internet. In this case an internal TCP/UDP load balancer is used. The load balancer uses an IP address from the VPC subnet instead of an external IP address.
Container-native load balancing makes use of network endpoint groups (NEGs). The endpoints consist of an IP address and a port that specify a Pod and a container port.
To allow HTTP(S) traffic from outside the VPC, we use a Kubernetes Ingress resource. The Ingress maps URLs and hostnames to services in the cluster. Services are configured to use both ClusterIP and NodePort service types.
If further constraints on the network are needed, you can specify network policies to limit access to Pods based on labels, IP address ranges, and port numbers. These policies create Pod-level firewall rules. You can restrict external load balancers by specifying loadBalancerSourceRanges in a service configuration, as in the sketch below. To limit access to HTTP(S) load balancers, use Cloud Armor security policies and the Identity-Aware Proxy service.
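A sketch of that restriction with the Kubernetes Python client (the service name, selector, and CIDR range are placeholders; the equivalent YAML sets spec.loadBalancerSourceRanges):

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig pointing at your GKE cluster

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="web-lb"),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",
        selector={"app": "web"},  # placeholder Pod label
        ports=[client.V1ServicePort(port=80, target_port=8080)],
        load_balancer_source_ranges=["203.0.113.0/24"],  # only this CIDR may connect
    ),
)
client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```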
https://cloud.google.com/kubernetes-engine/docs/concepts/service-load-balancer
Bucket locations
You can select from the following location types:
- A region is a specific geographic place, such as São Paulo.
- A dual-region is a specific pair of regions, such as Tokyo and Osaka. Dual-region pairings can be predefined or configurable.
- A multi-region is a large geographic area, such as the United States, that contains two or more geographic places.
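The location is fixed when the bucket is created. A minimal sketch with the Python client (the bucket name is a placeholder):

```python
from google.cloud import storage

client = storage.Client()
# Location can be a region ("SOUTHAMERICA-EAST1"), a predefined
# dual-region ("ASIA1"), or a multi-region ("US").
bucket = client.create_bucket("my-unique-bucket-name", location="US")
print(bucket.location)
```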
https://cloud.google.com/storage/docs/locations