Technical Interview 5 Flashcards

1
Q

Analytics: What is Deep Learning?

A

Deep learning is a subset of machine learning that uses multi-layered artificial neural networks to learn from data and make decisions. It needs a large amount of training data, and the machine learns the features from the data it is provided rather than relying on hand-coded ones.

Amazon Rekognition Image is based on deep learning technology; it uses deep neural network models to detect and label thousands of objects and scenes in users’ images.

2
Q

Analytics: What is Streaming Data?

A

It’s data that is generated continuously by thousands of data sources, which typically send the data records simultaneously and in small sizes.

(You can use Kinesis Data Streams or Firehose)

  • telemetry from connected devices (Sensors in transportation vehicles)
  • log files generated by customers using web applications
  • ecommerce transactions
  • information from social networks (clickstream records)
  • geospatial services.
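
A minimal sketch of the idea, assuming a hypothetical `sensor_stream` generator that stands in for a real source such as a Kinesis stream. Small records arrive one at a time and a running aggregate is updated as each one arrives, instead of waiting for a complete dataset. The device names and speeds are made up.

```python
def sensor_stream():
    """Hypothetical source: yields small telemetry records one at a time."""
    readings = [("truck-1", 61.0), ("truck-2", 58.5), ("truck-1", 64.0), ("truck-2", 59.5)]
    for device_id, speed in readings:
        yield {"device_id": device_id, "speed": speed}


def running_average(stream):
    """Update a per-device average as each record arrives (no batch load)."""
    totals = {}
    counts = {}
    for record in stream:
        key = record["device_id"]
        totals[key] = totals.get(key, 0.0) + record["speed"]
        counts[key] = counts.get(key, 0) + 1
        yield key, totals[key] / counts[key]


latest = {}
for device, avg in running_average(sensor_stream()):
    latest[device] = avg  # always holds the most recent average per device
```

The consumer never holds the full dataset in memory; it only keeps the small amount of state needed for the aggregate.
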
3
Q

Analytics: What is Hadoop?

A

Example: Amazon EMR (Elastic MapReduce)

Hadoop is an open source framework used to efficiently store and process large datasets ranging in size from gigabytes to petabytes. Instead of using one large computer to store and process the data, Hadoop clusters multiple computers to analyze massive datasets in parallel more quickly.
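
Hadoop's processing model is MapReduce. The sketch below is a single-process toy version of a word count, not Hadoop itself: each string in `chunks` stands in for a block of a large file held on a different machine, and on a real cluster the map tasks would run in parallel on those machines. All function names are illustrative.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of the input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into one result."""
    return {word: sum(counts) for word, counts in groups.items()}


# Each chunk stands in for a block of a large file held on a different node.
chunks = ["big data big clusters", "data nodes process data"]
word_counts = reduce_phase(shuffle(map_phase(c) for c in chunks))
```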

4
Q

Compute - Serverless: What is Serverless computing?

A

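It’s being able to build and run applications without having to provision or manage servers (like EC2 instances). The servers still exist, but the cloud provider handles provisioning, scaling, and maintenance. It is not the same as software as a service (SaaS): you still write and deploy your own code.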

Compute:

  • AWS Lambda
  • AWS Fargate

Integration:

  • Amazon SQS
  • Amazon SNS

Data Store:

  • S3
  • DynamoDB
  • Aurora Serverless
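
As an illustration of the compute side, here is a minimal sketch of a Python function in the shape AWS Lambda expects: a handler that receives an `event` and a `context`. The `name` field in the event and the response layout are assumptions chosen for the example, and the last line calls the handler locally only to show the flow.

```python
import json


def lambda_handler(event, context):
    """Entry point in the shape AWS Lambda expects for Python functions."""
    name = event.get("name", "world")  # "name" is a hypothetical input field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local invocation for illustration; in Lambda the service calls the handler.
response = lambda_handler({"name": "interviewer"}, None)
```

Nothing in the code refers to a server, port, or process lifecycle; those concerns are left to the platform.
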
5
Q

Storage & Content Delivery: What is File type storage?

A

Example: Amazon Elastic File System (EFS)

It’s a hierarchical storage methodology used to organize and store data on a computer hard drive or on a network-attached storage (NAS) device. Data is kept in files, which are organized into folders and located by path.

6
Q

Storage & Content Delivery: What is Block type storage?

A

Example: Amazon Elastic Block Store (EBS)

Block-level storage is a technology used to store data on storage area networks (SANs) or in cloud-based storage environments. It’s usually used in computing situations that require fast, efficient, and reliable data transport.

Block storage breaks up data into blocks and then stores those blocks as separate pieces, each with a unique identifier. The SAN places those blocks of data wherever it is most efficient. That means it can store those blocks across different systems and each block can be configured (or partitioned) to work with different operating systems.
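
The splitting step can be sketched as follows. This is a toy model, assuming a plain dict as the backing store and a deliberately tiny `BLOCK_SIZE`; real block devices use much larger blocks and manage placement themselves.

```python
import uuid

BLOCK_SIZE = 4  # bytes; tiny on purpose, real block sizes are far larger


def write_blocks(data: bytes, store: dict) -> list:
    """Split data into fixed-size blocks and store each under a unique ID."""
    block_ids = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = str(uuid.uuid4())
        store[block_id] = data[offset:offset + BLOCK_SIZE]
        block_ids.append(block_id)
    return block_ids


def read_blocks(block_ids: list, store: dict) -> bytes:
    """Reassemble the original data by reading blocks in order."""
    return b"".join(store[block_id] for block_id in block_ids)


store = {}
ids = write_blocks(b"hello block storage", store)
restored = read_blocks(ids, store)
```

Because each block is addressed only by its ID, the store is free to place blocks wherever it likes; the ordered list of IDs is what ties them back together.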

7
Q

Storage & Content Delivery: What is Object type storage?

A

Example: Amazon S3

Object-based storage is a data storage architecture for handling large amounts of unstructured data, that is, data that does not conform to, or cannot easily be organized into, a traditional relational database with rows and columns. S3 in particular is a key-based object store.

Objects are discrete units of data that are stored in a structurally flat data environment. There are no folders, directories, or complex hierarchies as in a file-based system. Each object is a simple, self-contained repository that includes the data, metadata (descriptive information associated with the object), and a unique identifier (instead of a file name and file path). This information enables an application to locate and access the object. You can aggregate object storage devices into larger storage pools and distribute these storage pools across locations. This allows for virtually unlimited scale, as well as improved data resiliency and disaster recovery.
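
A minimal sketch of that model, assuming a hypothetical in-memory `ObjectStore` class rather than any real service API: every object bundles its data and metadata and is found through a unique ID in one flat namespace.

```python
import uuid


class ObjectStore:
    """Toy flat object store: no folders, every object is looked up by ID."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, metadata: dict) -> str:
        """Store data plus metadata and return the generated unique ID."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = {"data": data, "metadata": dict(metadata)}
        return object_id

    def get(self, object_id: str) -> dict:
        """Return the object (data and metadata) for the given ID."""
        return self._objects[object_id]


store = ObjectStore()
photo_id = store.put(b"\x89PNG...", {"content_type": "image/png", "owner": "alice"})
photo = store.get(photo_id)
```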

8
Q

Storage & Content Delivery: What is a CDN?

A

Example: Amazon CloudFront

A content delivery network (CDN) enables users to get content from servers that are geographically closest to them. In the case of CloudFront, these are the edge locations, which may in turn fetch content from the regional edge caches before going back to the origin.

9
Q

Database: What is a Relational Database?

A

(Examples: Aurora, Oracle, MySQL)

A relational database is a collection of data items with pre-defined relationships between them.

The items are organized as a set of tables with columns and rows.

  • Tables (made up of many objects and their fields)
    • Rows (a collection of related values for one object)
    • Columns (a field holding a certain kind of data)

Primary and foreign keys: each row in a table can be marked with a unique identifier called a primary key, and rows among multiple tables can be related using foreign keys. This data can be accessed in many different ways without reorganizing the database tables themselves.
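
The key concepts can be tried out with Python's built-in `sqlite3` module. This is a sketch with made-up `customers` and `orders` tables; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(1, 20.0), (1, 5.5)],
)

# Relate rows across tables through the foreign key, without changing either table.
rows = conn.execute(
    "SELECT c.name, SUM(o.total) FROM customers c"
    " JOIN orders o ON o.customer_id = c.id GROUP BY c.id"
).fetchall()

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (99, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```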

10
Q

Database: What is a NoSQL database?

A

Example: DynamoDB

A nonrelational database: it does not use the relational model and instead opts for a design that allows a flexible schema. It is a good fit when data is unstructured or unpredictable, and its tables are not related to one another the way relational tables are.
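
To illustrate what a flexible schema means in practice, the sketch below uses a plain dict as a stand-in for a key-value table. The `pk` key name and the item attributes are hypothetical and do not reflect any particular database's API.

```python
# A plain dict stands in for a key-value/document table; keys are hypothetical.
table = {}


def put_item(item: dict) -> None:
    """Store an item under its key; no fixed set of columns is enforced."""
    table[item["pk"]] = item


put_item({"pk": "user#1", "name": "Ada", "email": "ada@example.com"})
put_item({"pk": "user#2", "name": "Lin", "devices": ["phone", "tablet"]})

# Items in the same table can carry different attributes.
attribute_sets = {pk: sorted(item.keys()) for pk, item in table.items()}
```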

11
Q

Database: What is a data warehouse?

A

Example - Redshift

A central repository of information that can be analyzed to make more informed decisions.

Data flows into a data warehouse from transactional systems, relational databases, and other sources, typically on a regular cadence.

Business analysts, data engineers, data scientists, and decision makers access the data through business intelligence (BI) tools, SQL clients, and other analytics applications.

12
Q

Database: What is a data lake?

A

Example - sending data to be stored in S3

A data lake is a centralized repository for all data, including structured, semi-structured, and unstructured data. The data may or may not be curated, so raw data can be stored as-is.

13
Q

Deployment & Management: What is Code Management?

A

Example is CodeCommit

Tracking the modifications to code. Tracking modifications assists development and collaboration by providing a running history of development and helping to resolve conflicts when merging contributions from multiple sources.

14
Q

Administration and Security: What is Monitoring?

A

Logging, reporting, and analysis of logs to provide visibility and security insights.

15
Q

Administration and Security: What is Identity and Access Management?

A

Helps define and manage user identities, access policies, and entitlements. Helps enforce business governance, including user authentication, authorization, and single sign-on.

16
Q

Storage & Content: What is columnar storage?

A

Columnar storage stores the values of a single column sequentially in a storage block, rather than storing each row’s fields together. Take a simplified example with 3 rows that each fill one block in row-wise storage: reading one column for all 3 rows means reading 3 blocks, whereas in columnar storage those 3 values sit in 1 block, so the read uses about a third of the I/O. This is extremely beneficial when querying specific columns, and because each block contains the same type of data, the compression scheme can be chosen to suit that data type.
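
The 3-row example can be sketched in plain Python. Lists stand in for storage blocks here, which is a simplification: real blocks are fixed-size units on disk, and the sample rows are made up.

```python
rows = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Lin", "age": 41},
    {"id": 3, "name": "Sam", "age": 29},
]

# Row-wise layout: one block per row, so every block holds all three fields.
row_blocks = [list(row.values()) for row in rows]

# Columnar layout: one block per column, holding that column for all rows.
column_blocks = {
    column: [row[column] for row in rows] for column in ("id", "name", "age")
}

# Average age: row-wise touches all three blocks, columnar touches one.
blocks_read_row_wise = len(row_blocks)
ages = column_blocks["age"]
blocks_read_columnar = 1
average_age = sum(ages) / len(ages)
```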

17
Q

Analytics: What is Machine Learning?

A

Machine learning (ML) is a subset of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values.

Enables machines to make decisions on their own, based on past data. Compared with deep learning, it can work with a smaller amount of training data, but most features need to be identified in advance and manually coded.

Examples of things that are built upon machine learning: Rekognition, Transcribe, Macie
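
As a small illustration of using historical data to predict new output values, the sketch below fits a straight line with ordinary least squares. The feature (hours studied) is chosen by hand, matching the point about manually identified features, and the data values are made up.

```python
# Historical data: (hours studied, exam score). Values are made up.
history = [(1.0, 52.0), (2.0, 54.0), (3.0, 56.0), (4.0, 58.0)]

n = len(history)
mean_x = sum(x for x, _ in history) / n
mean_y = sum(y for _, y in history) / n

# Ordinary least squares for a single feature.
slope = sum((x - mean_x) * (y - mean_y) for x, y in history) / sum(
    (x - mean_x) ** 2 for x, _ in history
)
intercept = mean_y - slope * mean_x


def predict(hours: float) -> float:
    """Predict a new output value from the fitted line."""
    return intercept + slope * hours


predicted = predict(5.0)
```

No rule for the score was written by hand; the slope and intercept come entirely from the historical examples.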

18
Q

What is a key-based object store?

A

Amazon S3 - When you store data, you assign a unique object key that can later be used to retrieve the data. Keys can be any string, and they can be constructed to mimic hierarchical attributes. Alternatively, you can use S3 Object Tagging to organize your data across all of your S3 buckets and/or prefixes.
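
A minimal sketch of keys that mimic a hierarchy, using a plain dict in place of a bucket. The key names are made up, and `list_keys` is a hypothetical helper that is only similar in spirit to listing objects by prefix.

```python
# Flat key -> value mapping; "/" in a key is just a character, not a folder.
bucket = {
    "logs/2024/01/app.log": b"jan",
    "logs/2024/02/app.log": b"feb",
    "images/logo.png": b"png",
}


def list_keys(prefix: str) -> list:
    """Return keys starting with the prefix, similar in spirit to a prefix listing."""
    return sorted(key for key in bucket if key.startswith(prefix))


january = bucket["logs/2024/01/app.log"]  # retrieval is always by the full key
log_keys = list_keys("logs/2024/")
```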

19
Q

Database: What is ACID? And what does each letter mean?

A

Atomic - a transaction either succeeds completely or fails completely (all or nothing)

Consistent - data written to the DB must adhere to all defined rules and restrictions (constraints, cascades, triggers)

Isolated - make sure each transaction is independent unto itself. No transaction will be affected by any other transaction that hasn’t been completed.

Durable - once a transaction is committed, it will remain in the system – even if there’s a system crash immediately following the transaction.
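
Atomicity and consistency can be observed with Python's built-in `sqlite3` module. In this sketch (the `accounts` table and amounts are made up), a transfer's second step violates a CHECK constraint, so the whole transaction is rolled back, including the first step that had already succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

try:
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
        # This violates the CHECK constraint (100 - 150 < 0) and raises.
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    pass

# Both balances are unchanged: the credit to bob was undone with the failed debit.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```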

20
Q

Database: What is an index?

A

An index is a pointer to data in a table. Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed.

An index in a database is very similar to an index in the back of a book.

21
Q

Database: What is a table scan?

A

A table scan is the reading of every row in a table, typically caused by queries that don’t properly use indexes. Table scans on large tables take an excessive amount of time and cause performance problems.
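
One way to see the difference is to ask SQLite how it plans to run a query before and after an index exists. The sketch uses `EXPLAIN QUERY PLAN`; the exact wording of the plan text varies between SQLite versions, so the comments describe what to expect rather than quoting it. The `users` table and index name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)

query = "SELECT id FROM users WHERE email = 'user500@example.com'"


def plan(sql: str) -> str:
    """Return SQLite's description of how it would run the query."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_without_index = plan(query)  # expected to describe a full scan of users
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_with_index = plan(query)  # expected to mention idx_users_email
```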

22
Q

Database: What are the benefits of using a data warehouse?

A

Benefits of a data warehouse include the following:

Informed decision making

Consolidated data from many sources

Historical data analysis

Data quality, consistency, and accuracy

Separation of analytics processing from transactional databases, which improves performance of both systems

23
Q

Data Warehouse: OLTP vs OLAP?

A

Online Transaction Processing (OLTP) - Production DB with simple live transactions, like Amazon RDS, Aurora, DynamoDB

Online Analytical Processing (OLAP) - Data warehouse, used for complex queries and data analysis, like Redshift and Amazon EMR
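
The two workload shapes can be contrasted in a few lines. SQLite is used for both halves purely for illustration, with a made-up `sales` table; in practice the analytical queries would run on a separate system such as a data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: many small writes and lookups that touch one row at a time.
with conn:
    conn.execute("INSERT INTO sales (region, amount) VALUES ('east', 10.0)")
    conn.execute("INSERT INTO sales (region, amount) VALUES ('west', 30.0)")
    conn.execute("INSERT INTO sales (region, amount) VALUES ('east', 15.0)")
one_sale = conn.execute("SELECT amount FROM sales WHERE id = 2").fetchone()

# OLAP-style work: one query that aggregates across the whole table.
totals_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
```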

24
Q

Deployment & Management: What is Version Control?

A

Example is CodeCommit

Version control, also known as source control, is the practice of tracking and managing changes to software code. Version control systems are software tools that help software teams manage changes to source code over time.