Course1-M3 Flashcards

Question 1

Q

What are some of the tools used for data ingestion that support both batch and streaming modes?

Ref

Answer

A

Google Cloud Dataflow
IBM Streams
IBM Streaming Analytics on Cloud
Amazon Kinesis
Apache Kafka
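
To make the two modes concrete, here is a minimal sketch in plain Python. It is not tied to any of the tools listed above; the function names and the sample events are hypothetical, chosen only to contrast handing records over in chunks (batch) with handing them over one at a time (streaming).

```python
from typing import Iterable, Iterator, List


def ingest_batch(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Batch mode: collect records and hand them over in fixed-size chunks."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def ingest_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Streaming mode: hand each record over as soon as it arrives."""
    for record in records:
        yield record


# Hypothetical source events, made up for illustration.
events = [{"id": i} for i in range(5)]
batches = list(ingest_batch(events, batch_size=2))
streamed = list(ingest_stream(events))
```

With five events and a batch size of 2, the batch path produces three chunks, while the streaming path yields the same five records individually.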

Question 2

Q

The Storage and Integration layer in a data platform needs to:
* Store data for processing and long-term use.
* Transform and merge extracted data, either logically or physically.
* Make data available for processing in both streaming and batch modes.

The storage layer needs to be reliable, scalable, high-performing, and cost-efficient. What are some of the popular relational databases?

Ref

Answer

A

IBM Db2
Microsoft SQL Server
MySQL
Oracle Database
PostgreSQL
are some of the popular relational databases.

Question 3

Q

Cloud-based relational databases, also referred to as Database-as-a-Service, have gained great popularity in recent years. Examples include: ?

Among the NoSQL, or non-relational, database systems on the cloud, we have: ?

Ref

Answer

A

IBM Db2 on Cloud
Amazon Relational Database Service (RDS)
Google Cloud SQL
SQL Azure (now Azure SQL Database)

NoSQL:

IBM Cloudant
Redis
MongoDB
Cassandra
Neo4j

Question 4

Q

Tools for integration include: ?

Open-source tools such as ? are also very popular integration tools.

There are a number of vendors offering cloud-based Integration Platform as a Service (or iPaaS). For example: ?

Ref

Answer

A

Integration tools
* IBM’s Cloud Pak for Data and Cloud Pak for Integration
* Talend’s Data Fabric
* Open Studio

Open source
* Dell Boomi
* SnapLogic

Cloud-based
* Adeptia Integration Suite
* Google Cloud’s Cooperation 534
* IBM’s Application Integration Suite on Cloud
* Informatica’s Integration Cloud

Question 5

Q

Once the data has been ingested, stored, and integrated, it needs to be processed. Data validations, transformations, and applying business logic to the data are some of the things that need to happen in this layer.

There are a host of tools available for performing these transformations on data, selected based on the data size, structure, and specific capabilities of the tool. Such as ?

Ref

Answer

A

Spreadsheets
OpenRefine
Google DataPrep
Watson Studio Refinery
Trifacta Wrangler
Python and R also offer several libraries and packages that are explicitly created for processing data.
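
As a rough illustration of the validation and transformation steps this card describes, the sketch below uses only Python's standard library. The sample data, the column names, and the `clean_rows` helper are hypothetical and made up for this example.

```python
import csv
import io

# Hypothetical raw extract, made up for illustration.
RAW = """order_id,amount,country
1, 19.99 ,us
2,not-a-number,de
3,5.00,FR
"""


def clean_rows(text):
    """Validate and transform rows; return (valid_rows, rejected_rows)."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            amount = float(row["amount"].strip())  # validation: must be numeric
        except ValueError:
            rejected.append(row)
            continue
        valid.append({
            "order_id": int(row["order_id"]),
            "amount": amount,
            "country": row["country"].strip().upper(),  # transformation: normalize case
        })
    return valid, rejected


valid, rejected = clean_rows(RAW)
```

Rows that fail validation are kept in a separate list rather than silently dropped, so they can be inspected later.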

Question 6

Q

Are storage and processing always performed in separate layers?

Ref

Answer

A

It’s important to note that storage and processing are not always performed in separate layers. For example, in relational databases, storage and processing can occur in the same layer, while in Big Data systems, data can first be stored in the Hadoop Distributed File System (HDFS) and then processed in a data processing engine such as Spark. The data processing layer can also precede the data storage layer, with transformations applied before the data is loaded, or stored, in the database.
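
Two of the arrangements described above can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions: the `sales` table and its values are hypothetical, and SQLite stands in for a relational database in general.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Processing before storage: transform the raw values, then load them.
raw = [(" north ", "10.5"), ("South", "4.5"), ("north", "2.0")]
transformed = [(region.strip().lower(), float(amount)) for region, amount in raw]
conn.executemany("INSERT INTO sales VALUES (?, ?)", transformed)

# Storage and processing in the same layer: the database aggregates the data it stores.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall())
conn.close()
```

The cleanup happens in application code before the insert, while the aggregation runs inside the database engine itself.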

Question 7

Q

Note:

The Analysis and User Interface Layer delivers processed data to data consumers. Data consumers can include:
* Business Intelligence Analysts and business stakeholders, who consume this data through interactive visual representations such as dashboards and analytical reports.
* Data Scientists and Data Analysts, who further process this data for specific use cases.
* Other applications and services that may need this data as input for further use.

The Analysis and UI Layer needs to support:
* Querying tools and programming languages, for example SQL for querying relational databases and SQL-like querying tools for non-relational databases, such as CQL for Cassandra.
* Programming languages such as Python, R, and Java.
* APIs that can be used to run reports on data for both online and offline processing.

Ref

Question 8

Q

Overlaying the Data Ingestion, Data Storage and Integration, and Data Processing layers is the __?__ layer with the Extract, Transform, and Load tools. This layer is responsible for implementing and maintaining a continuously flowing data pipeline.

Ref

Answer

A

Data Pipeline
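
A minimal sketch of the Extract, Transform, and Load steps chained into one pipeline, in plain Python. The function names, the sample records, and the in-memory `warehouse` list are hypothetical stand-ins; a real pipeline would read from source systems and write to an actual data store.

```python
def extract():
    """Extract: pull raw records from a (hypothetical) source system."""
    return [{"name": " Ada ", "score": "91"}, {"name": "Grace", "score": "88"}]


def transform(records):
    """Transform: clean names and convert scores to integers."""
    return [{"name": r["name"].strip(), "score": int(r["score"])} for r in records]


def load(records, store):
    """Load: write the transformed records into the target store."""
    store.extend(records)
    return len(records)


warehouse = []  # stand-in for a real data store
loaded = load(transform(extract()), warehouse)
```

Keeping each stage as its own function makes it possible to test or replace one stage without touching the others.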

Question 9

Q

There are a number of data pipeline solutions available, most popular among them being ____ and ____.

Ref

Answer

A

Apache Airflow
Dataflow

Question 10

Q

Some of the primary considerations for designing a data store are: ?

Ref

Answer

A

The type of data you want to store
Volume of data
Intended use of data
Storage considerations
Privacy, security, and governance needs of your organization

Intended use of data: The number of transactions, frequency of updates, type of operations performed on the data, response time, and backup and recovery requirements all need to be provisioned for in the design of a data store.

Storage considerations: Whether you need to use the data store for recording high-volume transactional data or executing complex queries for analytical purposes, your processing and storage needs will differ.

Question 11

Q

Non-relational databases, based on the type of data and how you want to query the data, are of four different types: ?

Ref

Answer

A

key-value
document
column
graph-based
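
The four types can be pictured through the shape of the data each one holds. The sketch below models them with ordinary Python structures; it is a simplified illustration with made-up values, not how any particular NoSQL product stores data internally.

```python
# Key-value: an opaque value looked up by a unique key.
key_value = {"session:42": "user=ada;theme=dark"}

# Document: a self-describing, possibly nested record.
document = {"_id": 1, "name": "Ada", "orders": [{"sku": "A1", "qty": 2}]}

# Column-based: values grouped by column, so one column can be read on its own.
column = {"name": ["Ada", "Grace"], "country": ["UK", "US"]}

# Graph-based: nodes plus the relationships (edges) between them.
graph = {"nodes": ["Ada", "Grace"], "edges": [("Ada", "KNOWS", "Grace")]}

# Each shape favors a different access pattern.
session = key_value["session:42"]          # direct lookup by key
first_sku = document["orders"][0]["sku"]   # navigate nested fields
countries = column["country"]              # read a single column
friends = [dst for src, rel, dst in graph["edges"]
           if src == "Ada" and rel == "KNOWS"]  # follow relationships
```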

Question 12

Q

Transactional systems, that is, systems used for capturing high-volume transactions, need to be designed for ____, ____, and ____ operations.
Analytical systems, on the other hand, need complex queries to be applied to large amounts of historical data aggregated from transactional systems. They need faster ____ to complex queries.

Ref

Answer

A

high-speed read, write, and update
response times
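
The difference between the two workloads can be sketched with Python's built-in `sqlite3` module. The `orders` table and its rows are hypothetical, and a single in-memory database is used only to keep the example short; in practice the two workloads usually run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Transactional workload: many small, fast writes, updates, and point reads.
conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("ada", 20.0))
conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("grace", 35.0))
conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (25.0, 1))
one_order = conn.execute("SELECT amount FROM orders WHERE id = ?", (1,)).fetchone()

# Analytical workload: one query that scans and aggregates the accumulated history.
summary = conn.execute(
    "SELECT COUNT(*), SUM(amount), AVG(amount) FROM orders"
).fetchone()
conn.close()
```

The transactional statements each touch a single row, while the analytical query reads every row in the table.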

Question 13

Q

Normalization of the database is another important consideration at the design stage. Normalization is ____. Done right, it helps in the optimal use of storage space, makes database maintenance easier, and provides faster access to data. Normalization is important for systems that handle ____ data. But for systems designed for handling analytical queries, normalization can lead to ____ issues.

Ref

Answer

A

the process of efficiently organizing data in a database.
transactional
performance
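
The trade-off can be sketched with Python's built-in `sqlite3` module. The two-table schema and its rows are hypothetical: customer details are stored once, which makes updates simple, but an analytical question then needs a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized design: customer details are stored once and referenced by key.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada', 'London');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 35.0);
""")

# A change of city touches exactly one row, which keeps maintenance simple.
conn.execute("UPDATE customers SET city = 'Paris' WHERE id = 1")

# An analytical question now requires a join across the two tables.
rows = conn.execute("""
    SELECT c.city, SUM(o.amount)
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.city
""").fetchall()
conn.close()
```

On large historical datasets, joins like this one are where the performance cost of normalization for analytical queries tends to show up.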

Question 14

Q

The architecture of a data platform can be seen as a set of layers, or functional components, each one performing a set of specific tasks. These layers include:

Data Ingestion or Data Collection Layer, responsible for bringing data from source systems into the data platform.
Data Storage and Integration Layer, responsible for storing and merging extracted data.
Data Processing Layer, responsible for validating, transforming, and applying business rules to data.
Analysis and User Interface Layer, responsible for delivering processed data to data consumers.
Data Pipeline Layer, responsible for implementing and maintaining a continuously flowing data pipeline.

Ref

Data Platforms, Data Stores, and Security