Misc Flashcards
What is ingestion?
Ingestion is the process of reading data from data sources, and it is one of the most critical steps in building a data pipeline. It typically happens either in batches or through streaming.
Batch ingestion collects records and extracts them as a group. It is sequential and processes records according to criteria set by developers. Streaming, the alternative ingestion paradigm, automatically passes individual records along one at a time. Organizations typically use streaming only when they need near-real-time data for applications or analytics.
GCP offers various ingestion services to batch-load or stream data from different sources and to build further pipelines as required.
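The difference between the two paradigms can be sketched in a few lines of plain Python. This is only an illustration, not a GCP API: the function names `batch_ingest` and `stream_ingest` and the sample records are made up for this card.

```python
from typing import Iterable, Iterator, List


def batch_ingest(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Collect records and hand them downstream one group at a time."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, group
        yield batch


def stream_ingest(records: Iterable[dict]) -> Iterator[dict]:
    """Pass each record downstream individually, as soon as it is read."""
    for record in records:
        yield record


# Hypothetical source data for the illustration.
source = [{"id": i} for i in range(5)]
batches = list(batch_ingest(source, batch_size=2))
events = list(stream_ingest(source))

print([len(b) for b in batches])  # [2, 2, 1]
print(len(events))                # 5
```

In the batch case the consumer sees nothing until a group is full; in the streaming case every record is available immediately, which is what makes near-real-time use possible.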
What are various Explore and Visualize services?
Datalab
Data Studio
Google Sheets
What are the different NoSQL database types?
The relational database (RDBMS) model completely dominated database technology for over 20 years. Today this “one size fits all” stability has been disrupted by a relatively recent explosion of new database technologies. These paradigm-busting technologies are powering the “Big Data” and “NoSQL” revolutions, as well as forcing fundamental changes in databases across the board.
Whether your business is early in its journey or well on its way to digital transformation, Google Cloud’s solutions and technologies help chart a path to success. In Google Cloud Databases you can migrate, manage, and modernize data with secure, reliable, and highly available databases. This article discusses the differences between services that are provided by GCP and the critical factors to consider when choosing a cloud NoSQL database for your new and upcoming projects.
What are the strengths of Datastore?
Let’s begin by understanding Datastore’s strengths.
- Highly structured data
1.A. It’s a document-oriented database, so it’s ideal for highly structured data such as XML or HTML.
1.B. It is typically used in high-scale web-serving applications, which require very fast reads of reference data but have relatively little need for interactivity.
1.C. Datastore has a transactional mode, so we can use it where ACID support is important.
- Multi-tenancy
2.A. Separate data partitions for each client organization. Datastore was originally meant for web apps where incoming data might come from several different organizations, and its multi-tenancy feature creates a separate data partition for each one.
2.B. We can have the same schema for data from all clients but vary the values.
2.C. Specified via a namespace (inside which kinds and entities can exist). Kinds, entities, and the other parts of the Datastore data model exist within namespaces. In this way, the data in different namespaces is kept separate and isolated.
- Ancestors
3.A. Entities are arranged hierarchically within a namespace. The most intuitive way to make sense of Datastore’s data model is to imagine entities as hierarchically arranged documents within a namespace.
3.B. Like document files in a directory system.
3.C. Each entity can have a designated parent or be a root entity. This idea of an entity having an ancestor makes Datastore’s data model very different from a relational one.
- Schema-less NoSQL
4.A. Entities of the same kind can have different properties.
4.B. In relational terms: rows of the same table can have different columns.
4.C. Different entities can have properties of the same name but different value types.
4.D. In relational terms: different rows can have columns of the same name but different types.
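The namespace, ancestor, and schema-less ideas above can be pictured with a toy model. This is a plain-Python sketch for intuition only; `Key` and `Entity` here are hypothetical classes written for this card and are not the Datastore client library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class Key:
    """Toy key: a namespace, a kind, a name, and an optional parent key."""
    namespace: str
    kind: str
    name: str
    parent: Optional["Key"] = None

    def path(self) -> Tuple[Tuple[str, str], ...]:
        """Ancestor path from the root entity down to this key."""
        prefix = self.parent.path() if self.parent else ()
        return prefix + ((self.kind, self.name),)


@dataclass
class Entity:
    """Toy entity: a key plus free-form properties (no fixed schema)."""
    key: Key
    properties: Dict[str, Any] = field(default_factory=dict)


# A root entity and two children inside one tenant's namespace.
customer = Entity(Key("tenant-a", "Customer", "alice"), {"email": "alice@example.com"})
order1 = Entity(Key("tenant-a", "Order", "o-1", parent=customer.key), {"total": 42.5})
# Same kind, different properties, and "total" holds a different value type.
order2 = Entity(Key("tenant-a", "Order", "o-2", parent=customer.key),
                {"total": "pending", "gift": True})

# The same kind and name in another namespace is a separate, isolated key.
other_tenant_key = Key("tenant-b", "Customer", "alice")

print(order1.key.path())                 # (('Customer', 'alice'), ('Order', 'o-1'))
print(customer.key == other_tenant_key)  # False
```

The path reads like a directory path, which is the "document files in a directory system" analogy from the Ancestors point.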
When should you avoid Cloud Datastore?
1. With relational data needing full SQL support.
2. Unstructured data such as blobs.
3. When running a lot of interactive queries.
What is Cloud Datastore?
Datastore is a schema-less, NoSQL, document-oriented database service. It provides auto-scaling, ACID transactions, and a SQL-like query language called GQL.
It’s probably fair to say that Datastore hasn’t gotten all of the adoption that Google would have hoped for, and its future roadmap is a little uncertain; there is talk of folding it into Google Cloud Firestore. Even so, Datastore is a powerful and scalable product. It is widely used in existing applications, particularly App Engine applications, which were very popular in the early days of Google Cloud.
What are documents, collections, and fields in Firestore?
Firestore is a NoSQL, document-oriented database. Unlike a SQL database, there are no tables or rows. Instead, you store data in documents, which are organized into collections.
All documents must be stored in collections. Documents can contain sub-collections and nested objects, both of which can include primitive fields like strings or complex objects like lists.
Collections and documents are created implicitly in Firestore. Simply assign data to a document within a collection. If either the collection or document does not exist, Firestore creates it.
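The implicit creation of collections and documents can be imitated with a small in-memory model. This is a hedged sketch for intuition, not the Firestore SDK: `ToyStore`, its `set` and `get` methods, and the slash-separated paths are assumptions made for this card.

```python
from collections import defaultdict
from typing import Any, Dict, Optional


class ToyStore:
    """In-memory stand-in: collection path -> document id -> fields."""

    def __init__(self) -> None:
        self._collections: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)

    @staticmethod
    def _split(path: str):
        parts = path.split("/")
        # A document path alternates collection/document, so it has an even length.
        if len(parts) % 2 != 0:
            raise ValueError("documents must live inside a collection")
        return "/".join(parts[:-1]), parts[-1]

    def set(self, path: str, data: Dict[str, Any]) -> None:
        """Assign data; a missing collection or document is created on the fly."""
        collection, doc_id = self._split(path)
        self._collections[collection][doc_id] = dict(data)

    def get(self, path: str) -> Optional[Dict[str, Any]]:
        """Return the document's fields, or None if it does not exist."""
        collection, doc_id = self._split(path)
        return self._collections.get(collection, {}).get(doc_id)


store = ToyStore()
# Neither "users" nor "alice" exists yet; writing creates both.
store.set("users/alice", {"name": "Alice", "tags": ["admin", "beta"]})
# A sub-collection under the document, created the same way.
store.set("users/alice/orders/o-1", {"total": 42.5})

print(store.get("users/alice")["name"])  # Alice
print(store.get("users/bob"))            # None
```

Note that a path such as `users` alone is rejected, mirroring the rule that every document must be stored in a collection.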
What is Cloud Firestore ideal for?
Cloud Firestore is ideal for building client-side mobile and web applications, gaming leaderboards, and user presence on a global scale.
What is the architecture for Bigtable?
Cloud Bigtable
Finally, let’s define the third cloud database service, Cloud Bigtable. The massively scalable Cloud Bigtable database can help you work with data sets from gigabytes to petabytes and build a foundation for your application. It’s a fully managed NoSQL database service for large analytical and operational workloads. Its key features are listed below.
- High throughput at low latency
Bigtable is ideal for storing very large amounts of data in a key-value store and supports high read and write throughput at low latency for fast access to large amounts of data. Throughput scales linearly—you can increase QPS (queries per second) by adding Bigtable nodes.
- Cluster resizing without downtime
Scale seamlessly from thousands to millions of reads/writes per second. Bigtable throughput can be dynamically adjusted by adding or removing cluster nodes without restarting, meaning you can increase the size of a Bigtable cluster for a few hours to handle a large load, then reduce the cluster’s size again—all without any downtime.
- Flexible, automated replication to optimize any workload
Write data once and automatically replicate where needed with eventual consistency, giving you control over high availability and isolation of read and write workloads. No manual steps are needed to ensure consistency, repair data, or synchronize writes and deletes.
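Because throughput scales linearly with node count, sizing a cluster is simple arithmetic. The sketch below assumes a per-node figure of 10,000 QPS purely for illustration; the real figure depends on the workload and storage type, and `nodes_needed` is a hypothetical helper, not part of any Bigtable tool.

```python
def nodes_needed(target_qps: int, qps_per_node: int) -> int:
    """Smallest node count whose combined throughput covers the target,
    assuming throughput grows linearly with the number of nodes."""
    if target_qps < 0 or qps_per_node <= 0:
        raise ValueError("target_qps must be >= 0 and qps_per_node must be > 0")
    # Integer ceiling division; a cluster always has at least one node.
    return max(1, (target_qps + qps_per_node - 1) // qps_per_node)


ASSUMED_QPS_PER_NODE = 10_000  # assumption for illustration only

baseline = nodes_needed(50_000, ASSUMED_QPS_PER_NODE)
peak = nodes_needed(250_000, ASSUMED_QPS_PER_NODE)

print(baseline)         # 5
print(peak)             # 25
print(peak - baseline)  # 20 nodes to add for the peak, then remove afterwards
```

Under these assumptions, handling a load five times larger means running five times as many nodes for the duration of the peak and then shrinking the cluster again.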
What is Cloud Bigtable ideal for?
Cloud Bigtable is ideal for large analytical and operational workloads that store very large amounts of data in a key-value store and need high read and write throughput at low latency.
_________________________________________
Thinking of migrating to GCP NoSQL databases? We hope this article helps you understand the key features of each service provided by Google Cloud Platform, so that you can decide which one best suits your project’s needs.
How does the number of nodes help Bigtable scale?
Scale seamlessly from thousands to millions of reads/writes per second. Bigtable throughput can be dynamically adjusted by adding or removing cluster nodes without restarting, meaning you can increase the size of a Bigtable cluster for a few hours to handle a large load, then reduce the cluster’s size again—all without any downtime.
- Flexible, automated replication to optimize any workload
Write data once and automatically replicate where needed with eventual consistency, giving you control over high availability and isolation of read and write workloads. No manual steps are needed to ensure consistency, repair data, or synchronize writes and deletes.
What are the INGEST Services?
App Engine
Compute Engine
Kubernetes Engine
Cloud Pub/Sub
Stackdriver Logging
Cloud Transfer Service
Transfer Appliance
What are the parts of the Datastore database?