Analytics Flashcards
In analytics, there are 4 types of analysis you can perform. What are their names?
-Descriptive analytics
-Diagnostic analytics
-Predictive analytics
-Prescriptive analytics
What are Descriptive Analytics?
Descriptive analytics focuses on analyzing past and present data to determine what has happened and what is happening now.
What are Diagnostic Analytics?
Diagnostic analytics focuses on analyzing data to determine why something happened.
What are Predictive Analytics?
Predictive analytics focuses on determining what might happen.
What are Prescriptive Analytics?
Prescriptive analytics is similar to predictive analytics, but instead of only predicting what might happen, it also suggests actions to take and what their consequences would be.
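As a rough illustration of how the four types differ, the sketch below applies each one to a tiny made-up sales series. The data, the function names, and the restock rule are all hypothetical, and the predictive step is a plain least-squares line chosen only for brevity.

```python
from statistics import mean

# Hypothetical daily unit sales and whether a promotion ran that day.
sales = [10, 12, 14, 16, 18, 20]
promo = [False, False, True, True, True, True]


def descriptive(values):
    """Descriptive: summarize what happened."""
    return {"total": sum(values), "average": mean(values)}


def diagnostic(values, flags):
    """Diagnostic: compare groups to look for a reason behind the numbers."""
    with_flag = [v for v, f in zip(values, flags) if f]
    without_flag = [v for v, f in zip(values, flags) if not f]
    return mean(with_flag) - mean(without_flag)


def predictive(values):
    """Predictive: fit a least-squares line and extrapolate one step ahead."""
    n = len(values)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


def prescriptive(forecast, stock):
    """Prescriptive: turn the forecast into a suggested action."""
    return "restock" if forecast > stock else "hold"
```

Calling `prescriptive(predictive(sales), stock=20)` chains the last two steps: the forecast feeds the recommendation.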
Explain Amazon CodeWhisperer
Amazon CodeWhisperer is an AWS AI service that generates and comments code using LLM technology.
True or False: Amazon CodeWhisperer can detect security vulnerabilities in your code
True
What are the 5 big Vs of big data and what do they mean?
-Volume: The amount of data being ingested
-Variety: The number and types of data sources
-Velocity: The speed with which new data is processed and stored
-Veracity: The degree to which the data can be trusted
-Value: The amount of information that can be extracted from the data
What are the 3 most frequent characterizations of data based on their format?
-Structured Data
-Semi-structured Data
-Unstructured Data
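The three categories can be illustrated with Python's standard library. The sample records below are invented for the example; the point is only how much schema each format carries.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema.
structured = "order_id,amount\n1,19.90\n2,5.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, but fields may vary per record.
semi_structured = '[{"order_id": 1, "tags": ["gift"]}, {"order_id": 2}]'
records = json.loads(semi_structured)

# Unstructured: free text with no schema; any structure must be inferred.
unstructured = "Customer called to say order 2 arrived late."
words = unstructured.split()
```

Note that the second JSON record has no `tags` field at all, which a CSV schema would not allow without an empty column.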
What are the 4 data processing velocities?
-Scheduled
-Periodic
-Near real-time
-Real-time
What is AWS Lake Formation?
A service that simplifies ingesting, cleaning, cataloging, transforming, and securing data on S3 Data Lakes.
AWS’ main data warehousing solution is called Amazon __________
Redshift
Complete the following statement regarding ETL on AWS:
When looking at a standard, simplified ETL pipeline on AWS, one should use ________. For customized processes, however, one should use __________.
-Glue
-EMR
What are the 4 main functions of a Data Lake?
-Ingest and store data
-Catalog data for searches
-Secure and protect data
-Allow analytics and insights to be run
What are the 6 stages of an Analytics pipeline?
-Data Source
-Ingestion
-Data Store
-Cataloging and processing stage
-Search and analytics stage
-Visualization stage
What are the 3 main challenges in maintaining a data lake?
-Data governance
-Data quality
-Security
What are AWS Lake Formation’s 4 main features?
-Automate building the data lake environment (collecting, moving, cleansing data, etc)
-Store metadata from raw and processed datasets
-Orchestrate ETL jobs, crawlers and triggers using AWS Glue
-Centralize access control to the data lake
The Lake Formation security model consists of 3 security roles used to manage the lake. What are they?
-Lake Formation administrator
-Database Creator
-Table Creator
What permissions does the Lake Formation administrator have?
-Has full read access to resources
-Has data location permissions
-Can grant or revoke access to resources, including self
-Can create databases
-Can grant permission to create databases
What permissions does the Lake Formation Database Creator have?
-Has all database permissions on databases that they create
-Has permissions on tables that they create
-Can use console or API to designate database creators
What permissions does the Lake Formation Table Creator have?
-Has permissions on tables that they create
-Can grant permissions on tables that they create
-Can view databases containing the tables that they create
The AWS Service for Data Mesh is called __________
DataZone
What are the main Kinesis services?
-Kinesis Data Streams
-Kinesis Video Streams
-Kinesis Firehose
-Kinesis Analytics
What are the default and the max Kinesis Data Streams data retention?
-Default: 24 hours
-Max: 365 days
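Retention is set in hours in the Kinesis API (the `RetentionPeriodHours` parameter), so the limits above translate to 24 and 8760 hours. A minimal sketch of that conversion follows; `retention_hours` is a hypothetical helper, not an SDK function.

```python
DEFAULT_RETENTION_HOURS = 24
MAX_RETENTION_HOURS = 365 * 24  # 8760 hours


def retention_hours(days=None):
    """Convert a retention period in days to hours, enforcing the limits.

    With no argument, returns the default of 24 hours.
    """
    if days is None:
        return DEFAULT_RETENTION_HOURS
    hours = days * 24
    if not DEFAULT_RETENTION_HOURS <= hours <= MAX_RETENTION_HOURS:
        raise ValueError(
            f"retention must be between {DEFAULT_RETENTION_HOURS} "
            f"and {MAX_RETENTION_HOURS} hours, got {hours}"
        )
    return hours
```

The result could then be passed as `RetentionPeriodHours` when changing a stream's retention through an SDK.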
True or False: Kinesis Data Streams data can only be deleted using the AWS SDK
False. Records cannot be deleted manually at all; they are only removed when the retention period expires
What are the 2 Kinesis Data Streams provisioning types?
-On-demand
-Provisioned
Kinesis Data Streams data is stored inside ______ that make up _______
-Shards
-A data stream
What are the parts of a Kinesis Data Streams record?
-Partition key: Determines which shard the record is written to
-Sequence number: Identifies the position of the record within its shard
-Data: Contains up to 1MB of data
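To illustrate how the partition key places a record: Kinesis Data Streams applies an MD5 hash to the partition key to get a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below assumes, for simplicity, that the shards divide the hash space evenly; `hash_key` and `shard_for` are hypothetical helper names, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard that would receive the record.

    Assumption: shards split the hash space into equal ranges. A real
    stream can have uneven ranges after splits and merges.
    """
    return hash_key(partition_key) * shard_count // HASH_SPACE
```

Because the mapping depends only on the key, all records with the same partition key land in the same shard, which is what preserves their relative order.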
What are the Kinesis Data Streams data producers?
-The AWS SDK
-The AWS Kinesis Agent
-The Kinesis Producer Library (KPL)
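True or False: A Kinesis Data Stream shard can have multiple producers and consumers
True. Any number of producers can write to a stream, and the partition key (not the producer) decides which shard a record lands in; multiple consumers can read from the same shard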
What are the Kinesis Data Streams data consumers?
-AWS SDK
-Kinesis Client Library
-Lambda
How much data can you write to Kinesis Data Streams per second?
-1MB/s or 1000 records/s per shard
How much data can you read from Kinesis Data Streams per second?
-2MB/s or 5 read API calls (GetRecords) per second per shard
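These per-shard limits can be turned into a quick sizing estimate: the stream needs enough shards to satisfy whichever limit is tightest. The function below is a back-of-the-envelope sketch using only the numbers quoted above; `shards_needed` is a hypothetical name, and it ignores enhanced fan-out and uneven key distribution.

```python
import math

WRITE_MB_PER_SHARD = 1         # MB/s written per shard
WRITE_RECORDS_PER_SHARD = 1000  # records/s written per shard
READ_MB_PER_SHARD = 2          # MB/s read per shard (shared by consumers)


def shards_needed(write_mb_s: float, records_s: float, read_mb_s: float) -> int:
    """Smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )
```

For example, a workload writing 0.5 MB/s as 5000 small records per second is bound by the record-count limit rather than by bandwidth.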
True or False: With enhanced fan-out consumers on Kinesis Data Streams, you do not poll for data with the GetRecords API, since it’s a push model
True
AWS Kinesis Firehose can send data to 3 kinds of destination: AWS services, 3rd-party partners, or custom (any HTTP endpoint). What are the possible AWS destinations for Kinesis Firehose data?
-S3
-Redshift
-OpenSearch
What’s the minimum latency for Kinesis Firehose to write data to its destination?
60s
True or False: Kinesis Firehose updates data in real time
False, near real time (60s delay at least)
What AWS Service can you use to perform custom transformations on AWS Kinesis Firehose data? (Select one):
-Glue
-Lambda
-EMR
-Sagemaker
Lambda
How do Kinesis Firehose buffers work?
The buffer has 2 main parameters: BufferSize (in MB) and BufferTime (in seconds). Firehose writes the buffered data to the destination as soon as either threshold is reached, whichever comes first.
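The flush rule can be sketched as a small simulation. This is a simplified model, not Firehose's implementation: the class and parameter names are hypothetical, timestamps are passed in explicitly, and the time threshold is only checked when a record arrives.

```python
class FirehoseBufferSketch:
    """Toy buffer that flushes on a size or a time threshold."""

    def __init__(self, buffer_size_mb: float, buffer_time_s: float):
        self.buffer_size_mb = buffer_size_mb
        self.buffer_time_s = buffer_time_s
        self.pending_mb = 0.0
        self.opened_at = None  # arrival time of the first pending record

    def add(self, record_mb: float, now_s: float):
        """Add a record; return the flushed batch size in MB, or None."""
        if self.opened_at is None:
            self.opened_at = now_s
        self.pending_mb += record_mb

        size_reached = self.pending_mb >= self.buffer_size_mb
        time_reached = now_s - self.opened_at >= self.buffer_time_s
        if size_reached or time_reached:
            flushed = self.pending_mb
            self.pending_mb = 0.0
            self.opened_at = None
            return flushed
        return None
```

With a 5 MB / 60 s buffer, three 2 MB records arriving within 20 seconds flush on size, while two 1 MB records arriving 65 seconds apart flush on time.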
True or False: Kinesis Firehose stores data that passes through it for 7 days
False, it does not store data at all
What are the main use cases for Kinesis Data Analytics?
-Streaming ETL (only simple transformations)
-Continuous Metric Generation
-Responsive analytics for certain metrics
What languages does Kinesis Data Analytics accept?
Flink and SQL
True or False: You can use AWS Lambda to pre-process Kinesis Data Analytics data
True
What’s the name of the AWS Service used to run Apache Kafka inside of AWS?
Amazon Managed Streaming for Apache Kafka (Amazon MSK)
True or False: Amazon MSK has both provisioned and serverless settings
True
What are the differences between Kinesis Data Streams and Amazon MSK regarding:
-Message Size
-Data organization
-Structure resizing
-Cryptography
-KDS has a max message size of 1MB, while MSK has a default size of 1MB that can be increased up to 10MB
-KDS has data streams with shards, while MSK has Kafka Topics with Partitions
-KDS accepts shard splitting and merging, while MSK can only add partitions to a topic
- Both support KMS encryption at rest and TLS encryption in flight, but MSK also allows plaintext (unencrypted) in-flight traffic
What are the accepted data consumers for Amazon MSK?
-Lambda
-Glue
-Kinesis Analytics
-Custom applications running on EC2, ECS, etc
What are the EMR use cases?
Big Data ML, Big Data Processing, etc
True or False: EMR means Elastic MapReduce, and it creates Fargate Hadoop clusters to analyze and process vast amounts of data
False, EMR runs on EC2 clusters
What are the EMR node types and what are their functions?
- Master Node: Manage the cluster, coordinate, manage health
- Core Node: Run tasks and store data
- Task Node (Optional): Only runs tasks and stores no data; usually Spot Instances
What are the EMR purchasing options?
-On-demand
-Reserved (Min 1 year)
-Spot Instances
What are the types of EMR Instance Groups?
-Uniform instance groups: All nodes have same instance type and configurations
-Instance fleet: select target capacity, mix instance types and purchasing options
True or False: Neither type of EMR Instance Group supports auto-scaling
False, Uniform Instance Groups have Auto-Scaling
True or False: AWS Glue is fully serverless
True
What is Amazon QuickSight?
It’s a BI tool offered by AWS to visualize data from multiple different sources
To control access to dashboards, QuickSight uses _________ and ________
Users and Groups