Data and Analytics Flashcards
What is Amazon Athena?
A serverless query service used to analyse data stored in Amazon S3 with standard SQL queries
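As a rough illustration of the answer above, an Athena query is submitted as a SQL string together with a database name and an S3 location for the results. The sketch below only builds the arguments for boto3's `start_query_execution` call and does not contact AWS. The helper `build_athena_request` and the database, table and bucket names are hypothetical placeholders.

```python
def build_athena_request(sql, database, output_s3_uri):
    """Build keyword arguments for boto3's Athena start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


# Hypothetical names, used only for illustration.
request = build_athena_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="example_db",
    output_s3_uri="s3://example-athena-results/",
)

# With AWS credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
```

Nothing is provisioned before the query runs, which is the "serverless" part: the data stays in S3 and only the query results are written to the output location.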
What are federated queries?
Queries that can be run against data sources other than S3, such as relational, non-relational, object and custom data sources, using data source connectors that run on AWS Lambda
What are three methods that can be used to increase the performance of Athena?
Partitioning
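Using columnar data formats (for example Parquet or ORC), so that only the columns a query needs are scanned
Using larger files (roughly 128 MB or more), because many small files add per-file overhead for Athena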
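Partitioning helps because a query that filters on the partition columns only has to read the objects stored under the matching prefixes. The sketch below imitates that pruning in plain Python. It is a simplified model, not how Athena is implemented, and the helper names and object keys are hypothetical.

```python
def partition_prefix(table_root, year, month):
    """Return the Hive-style prefix for one year/month partition."""
    return f"{table_root}/year={year}/month={month:02d}/"


def prune(keys, table_root, year, month):
    """Keep only the object keys that live under the requested partition."""
    prefix = partition_prefix(table_root, year, month)
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical object keys for a table partitioned by year and month.
keys = [
    "logs/year=2023/month=12/part-0000.parquet",
    "logs/year=2024/month=01/part-0000.parquet",
    "logs/year=2024/month=01/part-0001.parquet",
    "logs/year=2024/month=02/part-0000.parquet",
]

# A filter on year = 2024 and month = 1 touches two of the four objects.
january = prune(keys, "logs", 2024, 1)
```

Since Athena charges by the amount of data scanned, skipping whole partitions reduces cost as well as query time.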
What is Amazon Redshift used for?
Data warehousing and analytics
Is Redshift columnar or row-based?
Columnar
What engine is Redshift based on?
PostgreSQL
What are the two snapshot modes of Redshift and what are the differences?
Automated and manual.
Automated snapshots are taken on a schedule and retained for a period that the user sets, whereas a manual snapshot is kept until the user deletes it.
What are the two node types within a Redshift cluster?
Leader and compute
What is Redshift Spectrum?
A feature of Redshift that allows the user to query data that is already in S3 without having to load it into the cluster
What is the principal benefit of Redshift Spectrum?
It lets the user draw on far more computing power than their provisioned cluster has, and it avoids having to load the S3 data into Redshift first
What is OpenSearch?
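A managed search and analytics service that allows the user to search any field of the indexed data, including partial matches. It is commonly used alongside another database rather than in place of one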
What is EMR?
Elastic MapReduce - a service that allows the user to create Hadoop clusters for big data analytics
How does EMR scale?
Automatically, by adding or removing instances (nodes) within the cluster
What are the node types within an EMR cluster?
Master, core and task.
The master node manages the cluster and co-ordinates the other nodes. By default there is one per cluster.
Core nodes run tasks and store data.
Task nodes are optional and just run tasks but don’t store data.
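The three node types correspond to the `InstanceRole` values `MASTER`, `CORE` and `TASK` that boto3's EMR `run_job_flow` call accepts in its `InstanceGroups` list. The sketch below only builds that list and does not create a cluster. The helper `build_instance_groups`, the group names and the instance type are hypothetical choices for illustration.

```python
def build_instance_groups(core_count, task_count=0, instance_type="m5.xlarge"):
    """Build an InstanceGroups list with one master, N core and optional task nodes."""
    groups = [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",  # manages the cluster
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "Core",
            "InstanceRole": "CORE",  # runs tasks and stores data
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    if task_count > 0:
        groups.append(
            {
                "Name": "Task",
                "InstanceRole": "TASK",  # runs tasks only, stores no data
                "InstanceType": instance_type,
                "InstanceCount": task_count,
            }
        )
    return groups


small = build_instance_groups(core_count=2)
burst = build_instance_groups(core_count=2, task_count=4)
```

Because task nodes hold no data, they are the natural group to grow and shrink when the workload changes.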
What service would be used to make ML-powered interactive dashboards?
QuickSight