5 Vs of big data Flashcards

https://explore.skillbuilder.aws/learn/course/19747/data-engineering-on-aws-foundations;lp=2195

1
Q

Which big data characteristic deals with having accurate, precise and trusted data?

A

Veracity

2
Q

Which big data characteristic deals with the speed at which data is generated, distributed, and collected?

A

Velocity

3
Q

Which big data characteristic deals with the speed at which data is generated, distributed, and collected?

A

Velocity

4
Q

Type of volume data source used for ingestion and storage that includes the following: customer information, online product purchases, and service contracts

A

Transactional data

5
Q

Type of volume data source for ingestion and storage, such as an internet browser cache

A

Temporary Data

6
Q

Type of volume data source for ingestion and storage, such as images, text messages, and email messages

A

objects

7
Q

Data stored as documents or key-value pairs

A

semi-structured

8
Q

Data stored as files

A

unstructured

9
Q

Data stored in tables

A

structured

10
Q

A PDF is which type of data: unstructured, structured, or semi-structured?

A

unstructured.

11
Q

Structured data are stored within ______

A

A relational database management system (RDBMS), typically queried with SQL. The goal is optimized storage.

12
Q

A fixed schema is a weakness of which database management system?

A

RDBMS. You need to consider the data types used, as well as storage and hardware capabilities. Storing unstructured data can be problematic.

13
Q

Good transactional latency is a strength of which database management technology?

A

RDBMS

14
Q

Type of non-relational database that stores data in a single table, where each value is associated with a specific key.

A

Key-value pair

15
Q

Database type built to store semi-structured data for rapid collection and retrieval.

A

Non-relational (NoSQL) database, often storing data as documents or key-value pairs.

16
Q

The ability to link keys directly with values, without having to index or join, is a strength of which database type?

A

Non-relational, key-value pair

17
Q

Difficulty querying values that are stored as a single blob is a weakness of which non-relational database type?

A

Key-value pair.
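
The strength and weakness of key-value stores described in the last two cards can be illustrated with a minimal Python sketch. The `store` dictionary and the `put`, `get`, and `find_by_field` helpers are hypothetical names made up for this example, not any product's API.

```python
import json

# Hypothetical in-memory key-value store: each value is an opaque blob (bytes).
store: dict[str, bytes] = {}

def put(key: str, record: dict) -> None:
    """Serialize the record and store it as a single blob under the key."""
    store[key] = json.dumps(record).encode("utf-8")

def get(key: str) -> dict:
    """Strength: one direct lookup by key, with no index or join."""
    return json.loads(store[key])

def find_by_field(field: str, wanted) -> list[str]:
    """Weakness: the store cannot see inside blobs, so every value
    must be read and decoded in order to filter on a field."""
    return [k for k, blob in store.items() if json.loads(blob).get(field) == wanted]

put("cust:1", {"name": "Ana", "city": "Lima"})
put("cust:2", {"name": "Ben", "city": "Oslo"})
put("cust:3", {"name": "Cy", "city": "Lima"})

customer = get("cust:2")                   # direct key lookup
lima_keys = find_by_field("city", "Lima")  # full scan of all blobs
```

Looking up by key touches one entry, while filtering on a field inside the blob has to decode every stored value.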

18
Q

Which non-relational database type stores semi-structured data in the form of files, such as JSON or XML documents?

A

Document stores

19
Q

Data not being instantly updated is a weakness of which database type: relational or non-relational?

A

Non-relational (NoSQL) databases. Many favor low latency and accept that an update may not be visible everywhere immediately.

20
Q

Is OLAP columnar data stored row by row or column by column?

A

Column by column

21
Q

Comparing OLTP with OLAP, which is better at sequential reads and writes: columnar or row-based?

A

columnar

22
Q

Comparing relational with non-relational databases, which type is represented as a collection of documents or a single table with keys and values?

A

Non-relational

23
Q

Comparing OLTP with OLAP, which database type handles horizontal scaling?

A

Columnar

23
Q

Which AWS RDBMS service provides serverless, scalable, highly available (HA) databases compatible with MySQL and PostgreSQL?

A

Amazon Aurora

24
Q

Which AWS service is used for cloud-based data warehousing with ML?

Amazon Redshift, Amazon ElastiCache, Amazon DynamoDB

A

Amazon Redshift

25
Q

Which AWS database service can be used for durable, in-memory, ultra-fast database performance?
Amazon ElastiCache, Amazon DocumentDB, or Amazon MemoryDB for Redis

A

Amazon MemoryDB for Redis

26
Q

Which AWS service provides a serverless graph database?

Amazon Neptune, Amazon Timestream, or Amazon Quantum Ledger Database

A

Amazon Neptune

27
Q

Which AWS database service is a fast, flexible, and highly scalable NoSQL database?

Amazon RDS, Amazon DynamoDB, Amazon Redshift

A

Amazon DynamoDB

28
Q

Which AWS service is a fully managed, scalable JSON document database?

Amazon Keyspaces, Amazon DocumentDB, Amazon MemoryDB, Amazon Neptune

A

Amazon DocumentDB

29
Q

Which AWS service is a scalable, highly available, serverless, managed Apache Cassandra-compatible database service?

Amazon DocumentDB, Amazon MemoryDB, Amazon Keyspaces

A

Amazon Keyspaces (for Apache Cassandra)

30
Q

Which database service is a fast, scalable, serverless time-series database?

Amazon Neptune, AWS DMS, Amazon Timestream

A

Amazon Timestream

31
Q

Which AWS service is a fully managed, cryptographically verifiable ledger database?

AWS Database Migration Service, Amazon Keyspaces, Amazon Quantum Ledger Database

A

Amazon Quantum Ledger Database (Amazon QLDB)

32
Q

Which type of data requires the least amount of preparation before it can be analyzed?

Structured, Semi-structured, Unstructured

A

Structured
Structured (relational) data is the most convenient to analyze.

33
Q

Which type of database should be used to store data in the form of blob objects and does not require a predefined schema?

Relational, Document store, OLTP, Key-value

A

A key-value database is best for storing data in the form of blob objects and does not require a predefined schema.

34
Q

Which type of database should AnyCompany use when it needs to find the total number of items sold on a specific date?

A

OLAP. It should be used when you need to find aggregations of column values.

35
Q

Which type of database storage method is best suited for returning full rows of data based on a key?

Graph, Vector, OLTP, OLAP

A

OLTP is best for returning full rows of data based on a key.
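
The difference between the two storage methods in the last two cards can be sketched in a few lines of Python. The sales records and the `get_order` and `total_sold_on` helpers are hypothetical and exist only for illustration; real OLTP and OLAP engines are far more elaborate.

```python
# Hypothetical sales data, used only for illustration.
rows = [
    {"order_id": 1, "date": "2024-01-05", "item": "pen", "qty": 3},
    {"order_id": 2, "date": "2024-01-05", "item": "ink", "qty": 1},
    {"order_id": 3, "date": "2024-01-06", "item": "pen", "qty": 2},
]

# Row-oriented layout (OLTP style): whole rows stored together, indexed by key.
row_store = {row["order_id"]: row for row in rows}

# Column-oriented layout (OLAP style): each column stored as its own sequence.
column_store = {name: [row[name] for row in rows] for name in rows[0]}

def get_order(order_id: int) -> dict:
    """Return a full row by key with one lookup in the row store."""
    return row_store[order_id]

def total_sold_on(date: str) -> int:
    """Aggregate one column, reading only the 'date' and 'qty' columns."""
    dates, qtys = column_store["date"], column_store["qty"]
    return sum(q for d, q in zip(dates, qtys) if d == date)

order = get_order(2)
total = total_sold_on("2024-01-05")
```

The row store answers "give me order 2" in one step, while the column store answers "how many items were sold on a date" without reading the `item` or `order_id` columns at all.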

36
Q

Type of batch processing that represents data that is processed in a very large volume on a regularly scheduled basis.

Scheduled, Periodic, Near real-time

A

Scheduled

37
Q

Type of batch processing where data is processed at irregular times. These workloads are often run after a certain amount of data has been collected. This can make them unpredictable and hard to plan.

Scheduled, Periodic, Near real-time, Real-time

A

Periodic
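
A periodic batch that runs once a certain amount of data has been collected might look like the following Python sketch. `ThresholdBatcher` and its `threshold` parameter are hypothetical names chosen for this example.

```python
from typing import Callable

class ThresholdBatcher:
    """Hypothetical helper: buffers records and runs a batch job
    whenever `threshold` records have been collected."""

    def __init__(self, threshold: int, job: Callable[[list], None]) -> None:
        self.threshold = threshold
        self.job = job
        self.buffer: list = []

    def add(self, record) -> None:
        """Collect one record and trigger the job if the buffer is full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        """Process whatever is buffered, then start a new batch."""
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.job(batch)

batch_sizes: list[int] = []
batcher = ThresholdBatcher(threshold=3, job=lambda batch: batch_sizes.append(len(batch)))
for record in range(7):
    batcher.add(record)
```

Because the job fires on volume rather than on a clock, the time between runs depends entirely on how fast records arrive, which is why such workloads are hard to plan.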

38
Q

Type of processing that represents streaming data that is processed in small individual batches

Scheduled, Periodic, Near real-time, Real-time.

A

Near real-time

39
Q

Type of processing that represents streaming data that is processed in very small individual batches. These batches are continuously collected and then processed within milliseconds of the data generation.

Scheduled, Periodic, Near real-time, Real-time.

A

Real-time

40
Q

Which type of data processing focuses on analyzing continuous data streams in real time without storing the data first?

Batch processing, Transactional processing, Stream processing, Interactive processing.

A

Stream Processing.
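
As a rough illustration of analyzing a stream without storing it first, the Python sketch below keeps only a running count and sum. The `sensor_readings` generator is a hypothetical stand-in for a real event source.

```python
from typing import Iterable, Iterator

def running_average(stream: Iterable[float]) -> Iterator[float]:
    """Yield the average so far after each event, keeping only a count
    and a sum in memory rather than the events themselves."""
    count, total = 0, 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count

def sensor_readings() -> Iterator[float]:
    """Hypothetical event source standing in for a real data stream."""
    yield from (10.0, 20.0, 30.0)

averages = list(running_average(sensor_readings()))
```

Each reading updates the result as it arrives and is then discarded, so memory use stays constant no matter how long the stream runs.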

41
Q

Data _________ is contingent on the integrity of the data.

Velocity, Veracity, Volume

A

Veracity.

Data integrity is all about making sure your data is trustworthy. You need to make sure it has integrity and that the entire data chain is secure and free from compromise.

42
Q

________, ___________, and _________ is the process of collecting data from raw data sources and transforming that data into a common type.

ELT, ETT, ETL

A

ETL (extract, transform and load)

New data is loaded into a final location to be available for analysis and inspection.

42
Q

Data analysts may need to determine the __________ of the data sources and make adjustments to account for any integrity deficiencies.

velocity, integrity, volume

A

Integrity

Data can come from both internal and external sources. During the data integrity check process, data analysts look for potential sources of data integrity problems.

43
Q

The following are purposes of which process: ______, ______, and ________?

ETL, ELT

“To ensure the data has the required accuracy, precision and depth”
“To bring together data from different sources to gain a complete picture”
“To build purpose-built data sets to answer key business questions”

A

ETL

44
Q

_____________ phase of the ETL process, where you 1) identify where all of the source data resides, 2) plan when the extraction will take place, 3) plan where the data will be stored during processing, and 4) plan how often the extraction must be repeated.

Transform, Extract, Load

A

Extract data phase.

The most important of all phases. The data required for most analytics transformations will likely come from multiple locations and be of multiple types.

45
Q

__________ your data into a uniform, queryable format is really the heart of the ETL process.

This phase involves using a series of rules and algorithms to massage the data into its final form. Data cleansing also occurs during this part of the process.

Extract, Transform, Load

A

Transform

Can be basic, such as cleaning data to update formats or to perform data substitutions.

Can be more advanced, including applying business rules to the data to calculate new values. Filtering, complex join operations, aggregating rows, splitting columns, and data validation are all very common types of transformations applied at this phase.

The changes can have a huge impact on the usefulness of this data to analysts later, in the visualisation process.

46
Q

_____________ is the final phase of the ETL process, where you store the newly transformed data. The planning steps you took in the extraction phase will dictate the form the final data store must take. After the process has successfully completed, the data in this location is ready to be analysed.

A

Load.
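
The three ETL phases covered in the cards above can be sketched end to end in Python. This is a minimal sketch under stated assumptions: the CSV text, the `sales` table, and the `extract`, `transform`, and `load` function names are all hypothetical, and an in-memory SQLite database stands in for the final data store.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: CSV text with inconsistent formatting.
RAW_CSV = "name,amount\n alice ,10.5\nBOB,3\n,7\n"

def extract(text: str) -> list[dict]:
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records: list[dict]) -> list[tuple[str, float]]:
    """Transform: cleanse names, convert types, and drop invalid rows."""
    cleaned = []
    for rec in records:
        name = rec["name"].strip().title()
        if not name:  # data validation: skip rows without a name
            continue
        cleaned.append((name, float(rec["amount"])))
    return cleaned

def load(rows: list[tuple[str, float]], conn: sqlite3.Connection) -> None:
    """Load: store the transformed rows in the final location."""
    conn.execute("CREATE TABLE sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
```

Only cleaned, typed rows ever reach the `sales` table, which is the defining trait of ETL: the transformation happens before the load.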

47
Q

The ________, __________, and _______ approach loads data as it is. It then transforms it at a later stage, depending on the use case and analytics requirements.

ETL, ELT, ELTL

A

ELT

By contrast, ETL is the process that requires more definition at the beginning: analytics must be involved from the start to define target data types, structures, and relationships.

48
Q

What are the three steps of ELT?

A

Extract: pull raw data from various sources.
Load: load it in its natural state into a data warehouse or data lake.
Transform: transform it as needed while in the target system.
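
The three ELT steps can be sketched in Python as follows. An in-memory SQLite database stands in for the target data warehouse, and the `raw_sales` and `sales_by_customer` table names are hypothetical.

```python
import sqlite3

# Hypothetical raw records, loaded exactly as they arrive (all text).
raw_rows = [(" alice ", "10.5"), ("BOB", "3"), ("bob", "4.5")]

conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw data untouched in the target system.
conn.execute("CREATE TABLE raw_sales (name TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)

# Transform: done later, inside the target, and repeatable as needs change.
conn.execute(
    """
    CREATE TABLE sales_by_customer AS
    SELECT lower(trim(name)) AS customer,
           SUM(CAST(amount AS REAL)) AS total
    FROM raw_sales
    GROUP BY lower(trim(name))
    """
)

totals = conn.execute(
    "SELECT customer, total FROM sales_by_customer ORDER BY customer"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
```

The raw table is left intact after the transformation, so a different transformation can be run against the same data later.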

49
Q

With _______ all data cleansing, transformation and enrichment occur within the data warehouse. You can interact with and transform the raw data as many times as needed.

ETL, ELT

A

ELT

50
Q

You have structured data stored in a relational database that you want to analyze. Where does data transformation happen in your modern cloud-based environment?

Source relational database
Staging tables
Final tables
Target data warehouse

A

Target data warehouse

51
Q

Before making decisions, it is important to extract the __________ from your data.

Velocity, value

A

Value. The process of extracting, filtering, and customizing your data can be accomplished by creating queries. Value can be derived from data by querying the data and generating meaningful reports.

52
Q

______________ reporting is used to transform data into actionable information that empowers organisations to make informed decisions, optimize processes, and achieve strategic objectives.

Transactional, Relational, Analytical

A

Analytical Reporting

There are a few steps:
1. Gather the data, facts, action items and conclusions
2. Identify the audience, the expectations they have, and the proper method of delivery.
3. Identify the visualisation styles and report style that will best fit the needs of the audience.
4. Create the reports and dashboards.

53
Q

______________ makes complex data more accessible and understandable, helping users quickly identify trends, patterns and anomalies.

Reporting, Visualisation, Analytics

A

Visualisation.

With reporting tools you can create visual representations of data such as charts, graphs and dashboards.

54
Q

When creating reports and dashboards, use charts, tables, and graphs to _______________

A

Answer questions. The clearer the questions, the better the answers the report or dashboard will provide.

55
Q

__________ reports, interactive reports, and dashboards are the three broad types of visual reports.

dynamic, static

A

Static. The three types are static reports, interactive reports, and dashboards.

56
Q

Type of report found in the form of PDFs and PowerPoint slides that can often be accessed through web portals and software.

Dynamic reports, Static reports, Interactive reports, and Dashboards

A

Static reports.

57
Q

Type of report that generally falls under the heading of self-service business intelligence. These reports often take on a print-based report style but have the advantage that consumers can apply filters to charts and graphs, change the scales, and even group and sort values within the reports.

Dynamic reports, Static reports, Interactive reports

A

Interactive Reports.

A consumer can then tell their own story using the foundation laid by the report builder.

These reports often take on a print-based report style but have the advantage that consumers can apply filters to charts and graphs, change the scales, and even group and sort values within the reports.

58
Q

Type of visualisation that is a very popular reporting tool, where you can focus on high-level roll-ups of key business factors.

Static Reports, Interactive Reports, Dashboards

A

Dashboards