Analytics Flashcards

1
Q

In Analytics there are 4 types of analysis you can make. What are their names?

A

-Descriptive analytics
-Diagnostic analytics
-Predictive analytics
-Prescriptive analytics

2
Q

What are Descriptive Analytics?

A

Descriptive analytics focuses on analyzing past and present data to determine what has happened and what is happening now.

3
Q

What are Diagnostic Analytics?

A

Diagnostic analytics focuses on analyzing data to determine why something happened.

4
Q

What are Predictive Analytics?

A

Predictive analytics focuses on using historical data to determine what might happen in the future.

5
Q

What are Prescriptive Analytics?

A

Prescriptive analytics is similar to predictive analytics, but instead of only predicting what might happen, it also suggests actions to take and describes their likely consequences.
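
As a rough illustration of the four types of analytics covered in cards 1 to 5, the sketch below applies each one to a tiny, made-up series of daily sales. The data, the function names, the event log, and the restocking rule are all hypothetical; real analyses would use far richer data and models.

```python
from statistics import mean

# Hypothetical daily unit sales; the last day shows a drop.
sales = [100, 102, 104, 106, 80]


def descriptive(series):
    """What is happening? Summarize past and present data."""
    return {"mean": mean(series), "latest": series[-1]}


def diagnostic(series, events):
    """Why did it happen? Look up known events on days where sales dropped."""
    drops = [i for i in range(1, len(series)) if series[i] < series[i - 1]]
    return {day: events.get(day, "unknown") for day in drops}


def predictive(series):
    """What might happen? Naive forecast: last value plus the average daily change."""
    changes = [b - a for a, b in zip(series, series[1:])]
    return series[-1] + mean(changes)


def prescriptive(forecast, stock):
    """What should we do? Suggest an action together with its consequence."""
    if forecast > stock:
        return f"restock {forecast - stock:.0f} units to avoid a shortage"
    return "no restock needed; current stock covers the forecast"


forecast = predictive(sales)
print(descriptive(sales))
print(diagnostic(sales, {4: "supplier outage"}))
print(forecast)
print(prescriptive(forecast, stock=50))
```

Each function answers a different question about the same data, which is the distinction the cards draw: summarize, explain, forecast, then recommend.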

6
Q

Explain Amazon CodeWhisperer

A

Amazon CodeWhisperer is an AWS AI service that generates and comments code using LLM technology.

7
Q

True or False: Amazon CodeWhisperer can detect security vulnerabilities in your code

A

True

8
Q

What are the 5 big Vs of big data and what do they mean?

A

-Volume: The amount of data being ingested
-Variety: The number and types of data sources
-Velocity: The speed with which new data is processed and stored
-Veracity: The degree to which the data can be trusted
-Value: The amount of information that can be extracted from the data

9
Q

What are the 3 most common characterizations of data based on its format?

A

-Structured Data
-Semi-structured Data
-Unstructured Data
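
To make the three formats concrete, here is a minimal sketch that loads one sample of each kind. The sample values and variable names are invented for illustration; the point is only how much structure each format gives the reader up front.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema.
structured = "id,name,amount\n1,Ana,9.50\n2,Bo,3.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, but fields may be nested or optional.
semi_structured = '{"id": 1, "name": "Ana", "tags": ["vip"], "address": {"city": "Lisbon"}}'
doc = json.loads(semi_structured)

# Unstructured: no schema at all; any structure has to be inferred.
unstructured = "Ana called to say the delivery arrived late."
words = unstructured.split()

print(rows[0]["name"], doc["address"]["city"], len(words))
```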

10
Q

What are the 4 data processing velocities?

A

-Scheduled
-Periodic
-Near real-time
-Real-time

11
Q

What is AWS Lake Formation?

A

A service that simplifies ingesting, cleaning, cataloging, transforming, and securing data on S3 Data Lakes.

12
Q

AWS's main data warehousing solution is called Amazon __________

A

Redshift

13
Q

Complete the following statement regarding ETL on AWS:
When looking at a standard, simplified ETL pipeline on AWS, one should use ________. For customized processes, however, one should use __________.

A

-Glue
-EMR

14
Q

What are the 4 main functions of a Data Lake?

A

-Ingest and store data
-Catalog data for searches
-Secure and protect data
-Allow analytics and insights to be run

15
Q

What are the 6 stages of an analytics pipeline?

A

-Data Source
-Ingestion
-Data Store
-Cataloging and processing stage
-Search and analytics stage
-Visualization stage

16
Q

What are the 3 main challenges in maintaining a data lake?

A

-Data governance
-Data quality
-Security

17
Q

What are AWS Lake Formation’s 4 main features?

A

-Automate building the data lake environment (collecting, moving, cleansing data, etc)
-Store metadata from raw and processed datasets
-Orchestrate ETL jobs, crawlers and triggers using AWS Glue
-Centralize access control to the data lake

18
Q

The Lake Formation security model consists of 3 security roles used to manage the lake. What are they?

A

-Lake Formation administrator
-Database Creator
-Table Creator

19
Q

What permissions does the Lake Formation administrator have?

A

-Has full read access to resources
-Has data location permissions
-Can grant or revoke access to resources, including to themselves
-Can create databases
-Can grant permission to create databases

20
Q

What permissions does the Lake Formation Database Creator have?

A

-Has all database permissions on databases that they create
-Has permissions on tables that they create
-Can use console or API to designate database creators

21
Q

What permissions does the Lake Formation Table Creator have?

A

-Has permissions on tables that they create
-Can grant permissions on tables that they create
-Can view databases containing the tables that they create

22
Q

The AWS Service for Data Mesh is called __________

A

DataZone

23
Q

What are the main Kinesis services?

A

-Kinesis Data Streams
-Kinesis Video Streams
-Kinesis Firehose
-Kinesis Data Analytics

24
Q

What are the default and maximum data retention periods for Kinesis Data Streams?

A

-Default: 24 hours
-Max: 365 days
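
Retention is usually configured in hours rather than days, so the two bounds above work out to 24 and 8760 hours. The helper below is a small sketch of that conversion with a bounds check; the function and constant names are hypothetical, and it does not call any AWS API.

```python
DEFAULT_RETENTION_HOURS = 24      # default retention from the card above
MAX_RETENTION_HOURS = 365 * 24    # 365 days expressed in hours (8760)


def retention_hours(days):
    """Convert a retention period in days to hours, rejecting out-of-range values."""
    hours = days * 24
    if not DEFAULT_RETENTION_HOURS <= hours <= MAX_RETENTION_HOURS:
        raise ValueError(f"retention must be between 1 and 365 days, got {days}")
    return hours


print(retention_hours(7))    # one week of retention, in hours
print(retention_hours(365))  # the maximum, in hours
```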

25
Q

True or False: Kinesis Data Streams data can only be deleted using the AWS SDK

A

False. Records cannot be deleted at all; they are immutable and only expire when the retention period ends

26
Q

What are the 2 Kinesis Data Streams provisioning types?

A

-On-demand
-Provisioned

27
Q

Kinesis Data Streams records are stored inside ______ that make up a _______

A

-Shards
-Data stream

28
Q

What are the parts of a Kinesis Data Streams record?

A

-Partition key: Determines which shard the record is written to
-Sequence number: Identifies the record's position within the shard
-Data blob: Contains up to 1MB of data
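
The role of the partition key can be made concrete. Kinesis hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below reproduces that idea under one simplifying assumption: the hash space is split evenly across shards, which need not hold for a real stream after splits or merges. All function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count):
    """Assumption: split the hash space evenly into inclusive (start, end) ranges."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the shard whose range contains the key's hash."""
    key = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= key <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


ranges = even_shard_ranges(4)
for pk in ["user-1", "user-2", "user-1"]:
    print(pk, "-> shard", shard_for(pk, ranges))
```

Records with the same partition key always hash to the same value, so they land on the same shard and keep their relative order there.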

29
Q

What are the Kinesis Data Streams data producers?

A

-The AWS SDK
-The AWS Kinesis Agent
-The Kinesis Producer Library (KPL)

30
Q

True or False: A Kinesis Data Stream shard can have multiple producers and consumers

A

True. Many producers can write to the same shard, and many consumers can read from it

31
Q

What are the Kinesis Data Streams data consumers?

A

-AWS SDK
-Kinesis Client Library
-Lambda

32
Q

How much data can you write to Kinesis Data Streams per second?

A

-1MB/s or 1000 messages/s per shard

33
Q

How much data can you read from Kinesis Data Streams per second?

A

-2MB/s per shard (shared by all consumers), with at most 5 GetRecords API calls per second per shard
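
The per-shard write and read limits from cards 32 and 33 turn shard sizing into simple arithmetic: take whichever limit demands the most shards. The function below is a back-of-the-envelope sketch with hypothetical names; it assumes traffic is spread evenly across shards and that all consumers share the 2MB/s read limit.

```python
import math

WRITE_MB_PER_SHARD = 1          # MB/s written per shard
WRITE_RECORDS_PER_SHARD = 1000  # records/s written per shard
READ_MB_PER_SHARD = 2           # MB/s read per shard, shared by all consumers


def shards_needed(write_mb_s, records_s, read_mb_s):
    """Estimate the minimum shard count that satisfies every per-shard limit."""
    return max(
        1,
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )


# 5 MB/s in, 3000 records/s, 12 MB/s total read across consumers.
print(shards_needed(5, 3000, 12))
```

In this example the read side is the bottleneck: 12 MB/s of total reads needs 6 shards, while the write limits alone would need only 5.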

34
Q

True or False: When using enhanced fan-out consumers with Kinesis Data Streams, you do not poll for data with GetRecords, since it is a push model (records are delivered over SubscribeToShard)

A

True

35
Q

AWS Kinesis Firehose can send data to 3 possible destinations: AWS, 3rd party partner or Custom (Any HTTP endpoint). What are the possible AWS destinations for Kinesis Firehose Data?

A

-S3
-Redshift
-OpenSearch

36
Q

What's the minimum latency for Kinesis Firehose to write data to its destination?

A

60s

37
Q

True or False: Kinesis Firehose updates data in real time

A

False, near real time (60s delay at least)

38
Q

What AWS service can you use to perform custom transformations on AWS Kinesis Firehose data? (Select one):
-Glue
-Lambda
-EMR
-SageMaker

A

-Lambda

39
Q

How do Kinesis Firehose buffers work?

A

The buffer has 2 main parameters: BufferSize (in MB) and BufferTime (in seconds). Firehose only writes the buffered data to the destination when one of those two limits is reached, whichever comes first
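
The size-or-time rule can be sketched as a toy buffer that flushes as soon as either limit is hit. This is an illustration of the rule only, not how Firehose is implemented; the class name, method names, and the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow) are all assumptions.

```python
class BufferSketch:
    """Toy size-or-time buffer; flushes when either limit is reached."""

    def __init__(self, max_bytes, max_seconds):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.opened_at = None  # arrival time of the first buffered record
        self.flushed = []      # batches "delivered" to the destination

    def put(self, record, now):
        """Buffer one record (bytes) that arrived at time `now` (seconds)."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        self._maybe_flush(now)

    def tick(self, now):
        """Call periodically so the time limit can fire without new records."""
        self._maybe_flush(now)

    def _maybe_flush(self, now):
        if not self.records:
            return
        size_hit = self.size >= self.max_bytes
        time_hit = now - self.opened_at >= self.max_seconds
        if size_hit or time_hit:
            self.flushed.append(self.records)
            self.records, self.size, self.opened_at = [], 0, None


buf = BufferSketch(max_bytes=10, max_seconds=60)
buf.put(b"aaaa", now=0)      # 4 bytes buffered, nothing delivered yet
buf.put(b"bbbbbbb", now=1)   # 11 bytes >= 10, so the size limit triggers a flush
buf.put(b"c", now=2)         # a small record starts a new buffer
buf.tick(now=62)             # 60 seconds elapsed, so the time limit triggers a flush
print(len(buf.flushed))
```

High-traffic streams tend to hit the size limit first, while low-traffic streams wait for the time limit, which is why delivery is described as near real time.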

40
Q

True or False: Kinesis Firehose stores data that passes through it for 7 days

A

False, it does not store data at all

41
Q

What are the main use cases for Kinesis Data Analytics?

A

-Streaming ETL (only simple transformations)
-Continuous Metric Generation
-Responsive analytics for certain metrics

42
Q

What languages does Kinesis Data Analytics accept?

A

Flink and SQL

43
Q

True or False: You can use AWS Lambda to pre-process Kinesis Data Analytics data

A

True

44
Q

What's the name of the AWS service used to run Apache Kafka inside of AWS?

A

Amazon Managed Streaming for Apache Kafka (Amazon MSK)

45
Q

True or False: Amazon MSK has both provisioned and serverless settings

A

True

46
Q

What are the differences between Kinesis Data Streams and Amazon MSK regarding:
-Message Size
-Data organization
-Structure resizing
-Cryptography

A

-KDS has a max message size of 1MB, while MSK has a default size of 1MB that can be increased up to 10MB
-KDS has data streams with shards, while MSK has Kafka Topics with Partitions
-KDS accepts shard splitting and merging, while MSK can only add partitions to a topic
-Both support KMS encryption at rest and TLS encryption in flight, but MSK also allows plaintext (unencrypted) traffic in flight

47
Q

What are the accepted data consumers for Amazon MSK?

A

-Lambda
-Glue
-Kinesis Analytics
-Custom applications running on EC2, ECS, etc

48
Q

What are the main EMR use cases?

A

Big data processing, machine learning on big data, etc.

49
Q

True or False: EMR means Elastic MapReduce, and it creates Fargate Hadoop clusters to analyze and process vast amounts of data

A

False, EMR runs on EC2 clusters

50
Q

What are the EMR node types and what are their functions?

A

-Master Node: Manages the cluster, coordinates work, and monitors node health
-Core Node: Runs tasks and stores data
-Task Node (optional): Only runs tasks; usually Spot Instances

51
Q

What are the EMR purchasing options?

A

-On-demand
-Reserved (Min 1 year)
-Spot Instances

52
Q

What are the types of EMR Instance Groups?

A

-Uniform instance groups: All nodes have same instance type and configurations
-Instance fleet: select target capacity, mix instance types and purchasing options

53
Q

True or False: Neither type of EMR instance group supports auto-scaling

A

False, Uniform Instance Groups have Auto-Scaling

54
Q

True or False: AWS Glue is fully serverless

A

True

55
Q

What is Amazon QuickSight?

A

It's a BI tool offered by AWS to visualize data from multiple sources

56
Q

To control access to dashboards, QuickSight uses _________ and ________

A

Users and Groups