Analytics Flashcards by Luiz Martins

In analytics, there are 4 types of analysis you can perform. What are their names?

-Descriptive analytics
-Diagnostic analytics
-Predictive analytics
-Prescriptive analytics

What are Descriptive Analytics?

Descriptive analytics focuses on analyzing past and present data to determine what has happened and what is happening now.

What are Diagnostic Analytics?

Diagnostic analytics focuses on analyzing data to determine why something happened.

What are Predictive Analytics?

Predictive analytics focuses on determining what might happen in the future.

What are Prescriptive Analytics?

Prescriptive analytics is similar to predictive analytics, but instead of only predicting what might happen, it also suggests actions to take and describes their likely consequences.

Explain Amazon CodeWhisperer

Amazon CodeWhisperer is an AWS AI service that generates and comments code using LLM technology.

True or False: Amazon CodeWhisperer can detect security vulnerabilities in your code

True

What are the 5 big Vs of big data and what do they mean?

-Volume: The amount of data being ingested
-Variety: The number and types of data sources
-Velocity: The speed with which new data is processed and stored
-Veracity: The degree to which the data can be trusted
-Value: The amount of information that can be extracted from the data

What are the 3 most frequent characterizations of data based on their format?

-Structured Data
-Semi-structured Data
-Unstructured Data

What are the 4 data processing velocities?

-Scheduled
-Periodic
-Near real-time
-Real-time

What is AWS Lake Formation?

A service that simplifies ingesting, cleaning, cataloging, transforming, and securing data on S3 Data Lakes.

AWS's main data warehousing solution is called Amazon __________

Redshift

Complete the following statement regarding ETL on AWS:
When looking at a standard, simplified ETL pipeline on AWS, one should use ________. For customized processes, however, one should use __________.

-Glue
-EMR

What are the 4 main functions of a Data Lake?

-Ingest and store data
-Catalog data for searches
-Secure and protect data
-Allow analytics and insights to be run

What are the 6 stages of an analytics pipeline?

-Data Source
-Ingestion
-Data Store
-Cataloging and processing stage
-Search and analytics stage
-Visualization stage

What are the 3 main challenges in maintaining a data lake?

-Data governance
-Data quality
-Security

What are AWS Lake Formation’s 4 main features?

-Automate building the data lake environment (collecting, moving, cleansing data, etc)
-Store metadata from raw and processed datasets
-Orchestrate ETL jobs, crawlers and triggers using AWS Glue
-Centralize access control to the data lake

The Lake Formation security model consists of 3 security roles used to manage the lake. What are they?

-Lake Formation Administrator
-Database Creator
-Table Creator

What permissions does the Lake Formation administrator have?

-Has full read access to resources
-Has data location permissions
-Can grant or revoke access to resources, including self
-Can create databases
-Can grant permission to create databases

What permissions does the Lake Formation Database Creator have?

-Has all database permissions on databases that they create
-Has permissions on tables that they create
-Is designated by an administrator using the console or API

What permissions does the Lake Formation Table Creator have?

-Has permissions on tables that they create
-Can grant permissions on tables that they create
-Can view databases containing the tables that they create

The AWS Service for Data Mesh is called __________

DataZone

What are the main Kinesis services?

-Kinesis Data Streams
-Kinesis Video Streams
-Kinesis Firehose
-Kinesis Analytics

What are the default and maximum data retention periods for Kinesis Data Streams?

-Default: 24 hours
-Max: 365 days

True or False: Kinesis Data Streams data can only be deleted using the AWS SDK

False, it cannot be deleted manually at all. Records are removed only when the retention period expires

What are the 2 Kinesis Data Streams provisioning types?

-On-demand
-Provisioned

Kinesis Data Streams data is stored inside ______ that make up _______

-Shards
-Streams

What are the parts of a Kinesis Data Streams record?

-Partition key: Determines which shard the record is written to
-Sequence number: Indicates the record's position inside the shard
-Data: Contains up to 1MB of data
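
As a rough illustration of how a partition key selects a shard: Kinesis hashes the partition key with MD5 into a 128-bit hash key, and each shard owns a range of that hash space. The sketch below assumes the shards split the hash space evenly, which holds for a freshly created stream but not necessarily after splits and merges. `shard_for_key` is a hypothetical helper, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the (hypothetical) shard that owns this partition key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    # Assumption: shard i owns an equal slice of the 128-bit hash space.
    return hash_key * shard_count // HASH_SPACE


# Records with the same partition key always land on the same shard.
print(shard_for_key("user-42", 4))
print(shard_for_key("user-42", 4))
```

Because the mapping depends only on the key, a partition key that is much more frequent than the others concentrates traffic on a single shard.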

What are the Kinesis Data Streams data producers?

-The AWS SDK
-The AWS Kinesis Agent
-The Kinesis Producer Library (KPL)

True or False: A Kinesis Data Streams shard can have multiple producers and consumers

True, multiple producers can write to the same shard and multiple consumers can read from it

What are the Kinesis Data Streams data consumers?

-AWS SDK
-Kinesis Client Library (KCL)
-Lambda

How much data can you write to Kinesis Data Streams per second?

-1MB/s or 1000 messages/s per shard

How much data can you read from Kinesis Data Streams per second?

-2MB/s or 5 API calls per second per shard
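
The per-shard write and read limits above can be turned into a quick capacity estimate. The sketch below is only a back-of-the-envelope calculation under those limits (1 MB/s or 1,000 records/s in, 2 MB/s out per shard); `shards_needed` and its parameter names are hypothetical, and it ignores the API-call limit and uneven partition keys.

```python
import math


def shards_needed(write_mb_per_s: float, records_per_s: float, read_mb_per_s: float) -> int:
    """Estimate the minimum shard count from the per-shard limits on the cards above."""
    by_write_bytes = math.ceil(write_mb_per_s / 1.0)    # 1 MB/s written per shard
    by_write_records = math.ceil(records_per_s / 1000)  # 1,000 records/s written per shard
    by_read_bytes = math.ceil(read_mb_per_s / 2.0)      # 2 MB/s read per shard
    # The tightest limit decides; a stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


# 3.5 MB/s in, 2,500 records/s in, 10 MB/s out: the read side dominates.
print(shards_needed(3.5, 2500, 10))  # prints 5
```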

True or False: When using Enhanced Fan-Out consumers with Kinesis Data Streams, you do not poll for data with API calls, since it's a push model

True

AWS Kinesis Firehose can send data to 3 possible destinations: AWS, 3rd party partner or Custom (Any HTTP endpoint). What are the possible AWS destinations for Kinesis Firehose Data?

-S3
-Redshift
-OpenSearch

What's the minimum latency for Kinesis Firehose to write data to its destination?

60s

True or False: Kinesis Firehose updates data in real time

False, near real time (60s delay at least)

What AWS Service can you use to perform custom transformations on AWS Kinesis Firehose data? (Select one): Glue, Lambda, EMR, SageMaker

-Lambda

How do Kinesis Firehose buffers work?

The buffer has 2 main parameters: BufferSize (in MB) and BufferTime (in seconds). Firehose accumulates incoming data and only writes it to the destination when one of those two thresholds is reached, whichever comes first
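
The "whichever threshold is hit first" behavior can be sketched with a small in-memory model. This is not Firehose code: `BufferSketch` and its fields are hypothetical, sizes are in bytes rather than MB to keep the example small, and the time check only runs when a record arrives, whereas the real service also flushes on its own timer. In the Firehose API these two settings are the buffering hints `SizeInMBs` and `IntervalInSeconds`.

```python
from dataclasses import dataclass, field


@dataclass
class BufferSketch:
    """Toy model of a delivery buffer with a size limit and a time limit."""
    size_limit_bytes: int
    interval_seconds: float
    _records: list = field(default_factory=list)
    _bytes: int = 0
    _opened_at: float = 0.0

    def add(self, record: bytes, now: float):
        """Buffer a record; return the flushed batch if a threshold is reached, else None."""
        if not self._records:
            self._opened_at = now  # the interval starts with the first buffered record
        self._records.append(record)
        self._bytes += len(record)
        size_hit = self._bytes >= self.size_limit_bytes
        time_hit = now - self._opened_at >= self.interval_seconds
        if size_hit or time_hit:
            batch, self._records, self._bytes = self._records, [], 0
            return batch
        return None


buf = BufferSketch(size_limit_bytes=10, interval_seconds=60)
print(buf.add(b"abcd", now=0))   # below both thresholds, nothing is written
print(buf.add(b"efghij", now=1)) # size threshold reached, batch is flushed
```

A larger buffer means fewer, bigger writes to the destination at the cost of higher latency, which is why Firehose is described as near real time.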

True or False: Kinesis Firehose stores data that passes through it for 7 days

False, it does not store data at all

What are the main use cases for Kinesis Data Analytics?

-Streaming ETL (only simple transformations)
-Continuous metric generation
-Responsive analytics for certain metrics

What languages does Kinesis Data Analytics accept?

Flink and SQL

True or False: You can use AWS Lambda to pre-process Kinesis Data Analytics data

True

What's the name of the AWS Service used to run Apache Kafka inside of AWS?

Amazon Managed Streaming for Apache Kafka (Amazon MSK)

True or False: Amazon MSK has both provisioned and serverless options

True

What are the differences between Kinesis Data Streams (KDS) and Amazon MSK regarding message size, data organization, structure resizing, and encryption?

-Message size: KDS has a max message size of 1MB, while MSK has a default size of 1MB that can be increased up to 10MB
-Data organization: KDS has data streams with shards, while MSK has Kafka topics with partitions
-Structure resizing: KDS supports shard splitting and merging, while MSK can only add partitions to a topic
-Encryption: Both support KMS and TLS, but MSK also allows plaintext (unencrypted) in-flight traffic

What are the accepted data consumers for Amazon MSK?

-Lambda
-Glue
-Kinesis Data Analytics
-Custom applications running on EC2, ECS, etc

What are the EMR use cases?

Big Data ML, Big Data Processing, etc

True or False: EMR means Elastic MapReduce, and it creates Fargate Hadoop clusters to analyze and process vast amounts of data

False, EMR runs on EC2 clusters

What are the EMR node types and what are their functions?

-Master Node: Manages the cluster, coordinates work, and monitors health
-Core Node: Runs tasks and stores data
-Task Node (optional): Only runs tasks, usually on Spot Instances

What are the EMR purchasing options?

-On-demand
-Reserved (min 1 year)
-Spot Instances

What are the types of EMR Instance Groups?

-Uniform instance groups: All nodes have the same instance type and configuration
-Instance fleets: Select a target capacity and mix instance types and purchasing options

True or False: Neither type of EMR Instance Group supports auto-scaling

False, Uniform Instance Groups have Auto-Scaling

True or False: AWS Glue is fully serverless

True

What is Amazon QuickSight?

It's a BI tool offered by AWS to visualize data from multiple different sources

To control access to dashboards, QuickSight uses _________ and ________

Users and Groups
