5 Vs of big data Flashcards
https://explore.skillbuilder.aws/learn/course/19747/data-engineering-on-aws-foundations;lp=2195
Which big data characteristic deals with having accurate, precise and trusted data?
Veracity
Which big data characteristic deals with the speed at which data is generated, distributed, and collected?
Velocity
Which type of volume data source used for ingestion and storage includes customer information, online product purchases, and service contracts?
Transactional data
Which type of volume data source for ingestion and storage includes items such as an internet browser cache?
Temporary data
Which type of volume data source for ingestion and storage includes images, text messages, and email messages?
Objects
Data stored as documents or key-value pairs
semi-structured
Data stored as files
unstructured
Data stored in tables
structured
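To make the three categories concrete, here is a minimal Python sketch showing the same hypothetical customer information as a structured table row, a semi-structured JSON document, and an unstructured text message. The record and its fields are invented for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, and every row follows the same schema (like a table).
structured_csv = "customer_id,name,city\n1,Ana,Lisbon\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing keys; fields can vary or nest per document.
semi_structured = json.loads(
    '{"customer_id": 1, "name": "Ana", "address": {"city": "Lisbon"}, "tags": ["vip"]}'
)

# Unstructured: no schema; pulling out fields would need parsing or ML.
unstructured = "Ana from Lisbon emailed to ask about her order."

print(rows[0]["city"])                     # look up a column by name
print(semi_structured["address"]["city"])  # navigate nested keys
print("Lisbon" in unstructured)            # only crude text search is possible
```

Note that the amount of preparation grows from top to bottom, which matches the later card on structured data needing the least preparation.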
A PDF is which type of data: unstructured, structured, or semi-structured?
Unstructured
Structured data are stored within ______
RDBMS (relational database management systems), which are queried with SQL. The goal is optimized storage.
A fixed schema is a weakness of which type of database management system?
RDBMS. You need to consider the data types used, as well as storage and hardware capabilities, and storing unstructured data can be problematic.
Good transactional latency is a strength of which database management technology?
RDBMS
Which type of non-relational database stores data in a single table, with each value associated with a specific key?
Key-value pair
Which database type is built to store semi-structured data for rapid retrieval and collection?
Non-relational (NoSQL) database, often storing data as documents or key-value pairs.
The ability to link keys directly with values without having to index or join is a strength of which database type?
Non-relational, key-value pair
Difficulty querying values stored in a single blob is a weakness of which non-relational database type?
Key-value pair
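A minimal Python sketch of both sides of the key-value trade-off, assuming a hypothetical in-memory store built on a dict (not any real database API): lookup by key is direct, but finding a value inside the blobs forces a scan that deserializes every blob.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical in-memory key-value store: each key maps to one opaque blob.
store: Dict[str, bytes] = {}


def put(key: str, value: Dict[str, Any]) -> None:
    # The value is serialized into a single blob; the store never sees its fields.
    store[key] = json.dumps(value).encode("utf-8")


def get(key: str) -> Optional[Dict[str, Any]]:
    # Strength: direct lookup by key, with no index or join.
    blob = store.get(key)
    return json.loads(blob) if blob is not None else None


def find_by_field(field: str, wanted: Any) -> List[str]:
    # Weakness: querying a value inside the blobs means fetching and
    # deserializing every blob (a full scan).
    return [k for k, blob in store.items() if json.loads(blob).get(field) == wanted]


put("user#1", {"name": "Ana", "city": "Lisbon"})
put("user#2", {"name": "Ben", "city": "Oslo"})
print(get("user#1"))
print(find_by_field("city", "Oslo"))
```

Real key-value services add features such as secondary indexes to soften this weakness; the sketch shows only the basic model.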
Which non-relational database type stores semi-structured data in the form of documents, such as JSON or XML?
Document stores
Data not being instantly updated is a weakness of which database type: relational or non-relational?
Non-relational (NoSQL) databases. They have low transactional latency.
How is OLAP columnar data stored?
Row by row, or column by column
Column by column
Comparing OLTP with OLAP, which is better at sequential reads and writes: columnar or row-based?
columnar
Comparing OLAP with OLTP, which storage type does this key characteristic describe: collection of documents, single table with keys and values?
Columnar
Comparing OLTP with OLAP, which storage type handles horizontal scaling?
Columnar
Which AWS RDBMS service provides serverless, scalable, highly available (HA) databases for MySQL and PostgreSQL?
Amazon Aurora
Which AWS service is used for cloud-based data warehousing with machine learning (ML)?
Amazon Redshift, Amazon ElastiCache, Amazon DynamoDB
Amazon Redshift
Which AWS database service provides a durable, in-memory database with ultra-fast performance?
Amazon ElastiCache, Amazon DocumentDB, or Amazon MemoryDB for Redis
Amazon MemoryDB for Redis
Which AWS service provides a serverless graph database?
Amazon Neptune, Amazon Timestream, or Amazon Quantum Ledger Database (QLDB)
Amazon Neptune
Which AWS database service provides a fast, flexible, and highly scalable NoSQL database?
Amazon RDS, Amazon DynamoDB, Amazon Redshift
Amazon DynamoDB
Which AWS service is a fully managed, scalable JSON document database?
Amazon Keyspaces, Amazon DocumentDB, Amazon MemoryDB, Amazon Neptune
Amazon DocumentDB
Which AWS service is a scalable, highly available, serverless, managed Apache Cassandra-compatible database service?
Amazon DocumentDB, Amazon MemoryDB, Amazon Keyspaces
Amazon Keyspaces (for Apache Cassandra)
Which AWS database service is a fast, scalable, serverless time series database?
Amazon Neptune, AWS Database Migration Service (DMS), Amazon Timestream
Amazon Timestream
Which AWS service is a fully managed, cryptographically verifiable ledger database?
AWS Database Migration Service, Amazon Keyspaces, Amazon Quantum Ledger Database
Amazon Quantum Ledger Database (QLDB)
Which type of data requires the least amount of preparation before it can be analyzed?
Structured, semi-structured, unstructured
Structured
Structured (relational) data is the most convenient to analyze.
Which type of database should be used to store data in the form of blob objects without requiring a predefined schema?
Relational, document store, OLTP, key-value
A key-value database is best for storing data as blob objects without a predefined schema.
Which type of database should AnyCompany use when it needs to find the total number of items sold on a specific date?
OLAP. Use it when you need to find aggregations of column values.
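A minimal Python sketch of why columnar (OLAP) storage suits this kind of question, assuming a hypothetical sales table stored column by column as one list per column: answering "total items sold on a date" reads only the two columns involved, sequentially.

```python
from datetime import date

# Hypothetical sales table stored column by column (one list per column),
# similar to how a columnar OLAP store lays data out.
sales_columns = {
    "order_id": [101, 102, 103, 104],
    "sale_date": [date(2024, 5, 1), date(2024, 5, 1), date(2024, 5, 2), date(2024, 5, 1)],
    "product": ["pen", "ink", "pen", "paper"],
    "quantity": [3, 1, 5, 2],
}


def total_items_sold(columns: dict, day: date) -> int:
    # Only the sale_date and quantity columns are scanned;
    # order_id and product are never read.
    dates = columns["sale_date"]
    quantities = columns["quantity"]
    return sum(q for d, q in zip(dates, quantities) if d == day)


print(total_items_sold(sales_columns, date(2024, 5, 1)))  # 6
```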
Which type of database storage method is best suited for returning full rows of data based on a key?
Graph, Vector, OLTP, OLAP
OLTP is best for returning full rows of data based on a key.
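For contrast with the columnar sketch, here is a minimal Python sketch of row-based (OLTP) storage, assuming a hypothetical table where each record is stored together as one tuple and a primary-key index maps each key to its row. One lookup returns the full row.

```python
from typing import Dict, Optional, Tuple

# Hypothetical row-oriented table: (order_id, customer, product, quantity).
Row = Tuple[int, str, str, int]

rows = [
    (101, "Ana", "pen", 3),
    (102, "Ben", "ink", 1),
    (103, "Cai", "pen", 5),
]

# Primary-key index: order_id -> position of the row in the table.
primary_key_index: Dict[int, int] = {row[0]: pos for pos, row in enumerate(rows)}


def get_order(order_id: int) -> Optional[Row]:
    # A single index lookup returns every column of the record at once.
    pos = primary_key_index.get(order_id)
    return rows[pos] if pos is not None else None


print(get_order(102))  # (102, 'Ben', 'ink', 1)
```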
Which type of batch processing represents data that is processed in a very large volume on a regularly scheduled basis?
Scheduled, periodic, near real-time
Scheduled
Which type of batch processing processes data at irregular times? These workloads often run after a certain amount of data has been collected, which can make them unpredictable and hard to plan.
Scheduled, Periodic, Near real-time, Real-time
Periodic
Which type of processing represents streaming data that is processed in small individual batches?
Scheduled, Periodic, Near real-time, Real-time.
Near real-time
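A minimal Python sketch of the near real-time idea, assuming a hypothetical stream of click counts: events are grouped into small batches and each batch is processed as soon as it fills. This sketch triggers on batch size only; real streaming systems typically also flush a batch when a time interval elapses.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    # Group a continuous stream into small batches; each batch is handed on
    # as soon as it fills, instead of waiting for one large scheduled run.
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical stream of click counts arriving one at a time.
stream = [4, 1, 7, 2, 9, 3, 5]
for batch in micro_batches(stream, batch_size=3):
    print(batch, "->", sum(batch))
```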
Which type of processing represents streaming data that is processed in very small individual batches? These batches are continuously collected and then processed within milliseconds of the data being generated.
Scheduled, Periodic, Near real-time, Real-time.
Real-time
Which type of data processing focuses on analyzing continuous data streams in real time without storing the data first?
Batch processing, Transactional processing, Stream processing, Interactive processing.
Stream processing.
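A minimal Python sketch of stream processing, assuming a hypothetical sensor feed: each reading updates a running average the moment it arrives, and only a count and a total are kept, so the stream itself is never stored.

```python
from typing import Iterable, Iterator, Tuple


def running_average(readings: Iterable[float]) -> Iterator[Tuple[float, float]]:
    # Process each reading as it arrives; keep only a count and a running total.
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield value, total / count


def sensor_feed() -> Iterator[float]:
    # Hypothetical, finite stand-in for a real (unbounded) stream source.
    for value in (20.0, 22.0, 24.0, 30.0):
        yield value


for value, avg in running_average(sensor_feed()):
    print(f"reading={value} running_avg={avg:.2f}")
```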
Data _________ is contingent on the integrity of the data.
Velocity, Veracity, Volume
Veracity.
Data integrity is all about making sure your data is trustworthy. You need to make sure it has integrity and that the entire data chain is secure and free from compromise.
________, ___________ and _________ is the process of collecting data from raw data sources and transforming that data into a common type.
ELT, ETT, ETL
ETL (extract, transform and load)
The transformed data is then loaded into a final location, where it is available for analysis and inspection.
Data analysts may need to determine the __________ of the data sources and make adjustments to account for any integrity deficiencies.
velocity, integrity, volume
Integrity
Integrity problems can come from both internal and external sources. During data integrity checks, data analysts look for potential sources of data integrity problems.
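A minimal Python sketch of a data integrity check, assuming hypothetical customer records and illustrative rules (email required, age between 0 and 120, id unique). It flags problems rather than fixing them, so an analyst can decide how to adjust for each one.

```python
from typing import Dict, List

# Hypothetical records gathered from internal and external sources.
records = [
    {"id": 1, "email": "ana@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 3, "email": "cai@example.com", "age": -5},
    {"id": 1, "email": "ana@example.com", "age": 34},
]


def integrity_issues(rows: List[Dict]) -> List[str]:
    # Flag missing values, out-of-range values, and duplicate keys.
    # These rules are assumptions for illustration, not a standard.
    issues = []
    seen_ids = set()
    for row in rows:
        if row["id"] in seen_ids:
            issues.append(f"id {row['id']}: duplicate record")
        seen_ids.add(row["id"])
        if row["email"] is None:
            issues.append(f"id {row['id']}: missing email")
        if not 0 <= row["age"] <= 120:
            issues.append(f"id {row['id']}: age out of range ({row['age']})")
    return issues


for issue in integrity_issues(records):
    print(issue)
```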
The following are purposes of which process: ______, ______ and ________?
ETL, ELT
“To ensure the data has the required accuracy, precision and depth”
“To bring together data from different sources to gain a complete picture”
“To build purpose-built data sets to answer key business questions”
ETL
_____________ phase of the ETL process, where you: 1) identify where all of the data sources reside; 2) plan when the extraction will take place; 3) plan where the data will be stored during processing; 4) plan how often the extraction must be repeated.
Transform, Extract, Load
Extract data phase.
The most important of all the phases. The data required for most analytics transformations will likely come from multiple locations and be of multiple types.
__________ your data into a uniform, queryable format is really the heart of the ETL process.
This phase involves using a series of rules and algorithms to massage the data into its final form. Data cleansing also occurs during this part of the process.
Extract, Transform, Load
Transform
Transformations can be basic, such as cleaning data to update formats or to perform data substitutions.
They can also be more advanced, including applying business rules to the data to calculate new values. Filtering, complex join operations, aggregating rows, splitting columns and data validation are all very common types of transformations applied in this phase.
The changes can have a huge impact on the usefulness of this data to analysts later, in the visualisation process.
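A minimal Python sketch of several of these transformation types applied to hypothetical raw rows: validation (dropping an unparseable amount), format cleaning (upper-casing country codes), splitting a column, a business rule (an assumed "large order" threshold of 100), and aggregation by country.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical extracted rows with inconsistent formats, as raw data often has.
raw = [
    {"full_name": "Ana Silva", "country": "pt", "amount": "120.50"},
    {"full_name": "Ben Berg", "country": "NO", "amount": "80"},
    {"full_name": "Cai Wu", "country": "pt", "amount": "not-a-number"},
]


def transform(rows: List[Dict]) -> List[Dict]:
    out = []
    for row in rows:
        # Validation: drop rows whose amount cannot be parsed.
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        # Splitting a column: full_name -> first_name, last_name.
        first, _, last = row["full_name"].partition(" ")
        out.append({
            "first_name": first,
            "last_name": last,
            # Cleaning: normalize country codes to upper case.
            "country": row["country"].upper(),
            "amount": amount,
            # Business rule (assumed threshold): 100 or more is a large order.
            "is_large_order": amount >= 100,
        })
    return out


def total_by_country(rows: List[Dict]) -> Dict[str, float]:
    # Aggregation: roll rows up to one total per country.
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


clean = transform(raw)
print(clean[0])
print(total_by_country(clean))
```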
_____________ is the final phase of the ETL process, where you store the newly transformed data. The planning steps you took in the extraction phase dictate the form the final data store must take. After the process has successfully completed, the data in this location is ready to be analysed.
Load.
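The three ETL phases can be sketched end to end in Python, assuming a hypothetical orders CSV as the source and an in-memory SQLite database (via the standard `sqlite3` module) standing in for the final data store. The key point is that data is transformed before it is loaded.

```python
import csv
import io
import sqlite3

# Hypothetical raw CSV exported from a source system.
SOURCE_CSV = """order_id,order_date,amount
1,2024/05/01,19.99
2,2024/05/01,5.00
3,2024/05/02,12.50
"""


def extract(text: str) -> list:
    # Extract: read raw rows from the source.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    # Transform before loading: cast types and normalize dates to ISO format.
    return [
        (int(r["order_id"]), r["order_date"].replace("/", "-"), float(r["amount"]))
        for r in rows
    ]


def load(rows: list, conn: sqlite3.Connection) -> None:
    # Load: write the already-transformed rows into the final table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the target data store
load(transform(extract(SOURCE_CSV)), conn)
result = conn.execute(
    "SELECT order_date, ROUND(SUM(amount), 2) FROM orders "
    "GROUP BY order_date ORDER BY order_date"
).fetchall()
print(result)
```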
The ________, __________ and _______ approach loads data as it is. It then transforms it at a later stage, depending on the use case and analytics requirements.
ETL, ELT, ELTL
ELT
By contrast, ETL is a process that requires more definition at the beginning: analytics must be involved from the start to define target data types, structures and relationships.
Three steps of ELT:
Extract - raw data from various sources
Load - it in its natural state into a data warehouse or data lake
Transform - it as needed while in the target system.
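A minimal Python sketch of the ELT order, assuming hypothetical raw order records and an in-memory SQLite database standing in for the data warehouse. The raw data is loaded untouched, then transformed with SQL inside the target system; because the raw table survives, it can be re-transformed for other use cases.

```python
import sqlite3

# Hypothetical raw records, kept exactly as extracted (all text, mixed date formats).
raw_rows = [
    ("1", "2024/05/01", "19.99"),
    ("2", "2024-05-01", "5.00"),
    ("3", "2024/05/02", "12.50"),
]

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse or data lake

# Load: store the raw data in its natural state, with no transformation yet.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, order_date TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: done afterwards, inside the target system, with its own SQL engine.
conn.execute("""
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER)     AS order_id,
           REPLACE(order_date, '/', '-') AS order_date,
           CAST(amount AS REAL)          AS amount
    FROM raw_orders
""")

print(conn.execute(
    "SELECT order_date, COUNT(*) FROM orders GROUP BY order_date ORDER BY order_date"
).fetchall())
```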
With _______ all data cleansing, transformation and enrichment occur within the data warehouse. You can interact with and transform the raw data as many times as needed.
ETL, ELT
ELT
You have structured data stored in a relational database that you want to analyze. Where does data transformation happen in your modern cloud-based environment?
Source relational database
Staging tables
final tables
target data warehouse
Target data warehouse
Before making decisions, it is important to extract the __________ from your data.
Velocity, value
Value. The process of extracting, filtering and customizing your data can be accomplished by creating queries. Value can be derived from data by querying it and generating meaningful reports.
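A minimal Python sketch of deriving value with a query, assuming a hypothetical sales table in an in-memory SQLite database: the query extracts the relevant rows, filters out one product line, and customizes the result into a small per-region revenue report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "pen", 120.0),
        ("North", "ink", 40.0),
        ("South", "pen", 90.0),
        ("South", "paper", 15.0),
    ],
)

# Extract, filter and customize: revenue per region, excluding the paper line,
# sorted so the strongest region appears first.
query = """
    SELECT region, SUM(revenue) AS total
    FROM sales
    WHERE product <> 'paper'
    GROUP BY region
    ORDER BY total DESC
"""
for region, total in conn.execute(query):
    print(f"{region:<6} {total:>8.2f}")
```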
______________ reporting is used to transform data into actionable information that empowers organisations to make informed decisions, optimize processes, and achieve strategic objectives.
Transactional, Relational, Analytical
Analytical Reporting
There are a few steps:
1. Gather the data, facts, action items and conclusions
2. Identify the audience, their expectations and the proper method of delivery.
3. Identify the visualisation styles and report style that will best fit the needs of the audience.
4. Create the reports and dashboards.
______________ makes complex data more accessible and understandable, helping users quickly identify trends, patterns and anomalies.
Reporting, Visualisation, Analytics
Visualisation.
With reporting tools you can create visual representations of data such as charts, graphs and dashboards.
When creating reports and dashboards, use charts, tables and graphs to _______________
Answer Questions. The clearer the questions the better the answers the report or dashboard will provide.
__________ reports, interactive reports and dashboards are the three broad types of visual reports.
Dynamic, static
Static. The three types are static reports, interactive reports and dashboards.
Which type of report is found in the form of PDFs and PowerPoint slides, and can often be accessed through web portals and software?
Dynamic reports, Static reports, Interactive reports, Dashboards
Static reports.
Which type of report generally falls under the heading of self-service business intelligence? These reports often take on a print-based report style, but consumers can apply filters to charts and graphs, change the scales, and even group and sort values within the reports.
Dynamic reports, Static Reports, Interactive Report
Interactive Reports.
A consumer can then tell their own story using the foundation laid by the report builder.
Which type of visualisation is a very popular reporting tool that lets you focus on high-level roll-ups of key business factors?
Static Reports, Interactive Reports, Dashboards
Dashboards