Data Platforms Flashcards

1
Q

Data-Driven Innovation

A

Refers to the use of analytics to drive innovation and business value from data

2
Q

Analytics

A

In this context, we mean the different types of business intelligence initiatives

3
Q

Advanced Analytics

A

Semi-autonomous examination of data to get deeper insights (Machine Learning)

4
Q

Augmented Analytics

A

Using AI to augment how people explore and analyze data

5
Q

Database

A

Structured and persistent collection of information that supports efficient retrieval and modification (e.g. relational databases)

6
Q

Data Warehouse

A

Subject-oriented collection of data that supports decision-making processes

7
Q

OLTP

A

Online Transaction Processing: constant small queries and updates, short-term data retention (e.g. accounting database, online retail transactions)

8
Q

OLAP

A

Online Analytical Processing: periodic large updates, complex queries for reporting and decision support
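
The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. The `orders` table and its values are made up for illustration; real OLAP systems typically run on a separate warehouse, not on the transactional database.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: many small transactions, each touching a single row.
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("north", 20.0))
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("south", 35.5))
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("north", 10.0))
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (40.0, 2))

# OLAP-style work: one query that scans and aggregates many rows for a report.
report = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(report)
```

Each OLTP statement reads or writes one row; the OLAP query reads the whole table to produce a summary.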

9
Q

Data Lake

A

Central repository where data is kept in its original format (unstructured, semi-structured, or structured) and queried only when needed.

Supports storage, processing and analysis

10
Q

What kind of users use Data Warehouses vs Data Lakes?

A

Data Warehouses: business analysts

Vs

Data Lakes: data scientists, data developers, and business analysts

12
Q

Data Platform

A

Meets end-to-end data needs such as acquisition, storage, preparation, delivery, governance, and security, so that users can focus ONLY on functional aspects

13
Q

How do we prevent a data platform (DP) from becoming a swamp?

A

We MUST govern data transformations and leverage metadata and maintenance to keep control over data

14
Q

What are the 5 areas of data management? (Mnemonic PCPED: "Plankton chokes Patrick every day")

A
  1. Data provenance
  2. Compression
  3. Data profiling
  4. Entity resolution
  5. Data versioning
15
Q

Data Provenance

A

Description of the origin of data and the process by which it arrived

16
Q

Data Provenance Granularity

A

Fine-grained (instance level)
Coarse-grained (schema level)

Tracking items vs dataset transformations

17
Q

Three levels of data provenance (EAA)

A

Entity (physical/conceptual thing)
Activity (what generated the thing)
Agent (associated with the activity)
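
A minimal sketch of how the three concepts relate, assuming a simple in-memory model. The class fields, the `lineage` helper, and the example names are all hypothetical and not taken from any provenance library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str  # who or what is responsible for an activity


@dataclass(frozen=True)
class Entity:
    name: str  # the physical or conceptual thing (e.g. a dataset)


@dataclass(frozen=True)
class Activity:
    name: str
    used: tuple        # input entities consumed by the activity
    generated: Entity  # entity produced by the activity
    agent: Agent       # agent associated with the activity


def lineage(entity, activities):
    """Walk back from an entity and list the activities that led to it."""
    steps = []
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for act in activities:
            if act.generated == current:
                steps.append(act.name)
                frontier.extend(act.used)
    return steps


raw = Entity("raw_sales.csv")
clean = Entity("clean_sales")
report = Entity("monthly_report")
history = [
    Activity("clean", (raw,), clean, Agent("etl_job")),
    Activity("aggregate", (clean,), report, Agent("analyst")),
]
print(lineage(report, history))  # most recent activity first
```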

18
Q

Compression

A

Concise representation of a dataset in a comprehensible manner

19
Q

Data profiling

A

Analyzing the structure and quality of a dataset

The dataset is scanned for metadata such as completeness and uniqueness of columns, keys, and foreign keys
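
A minimal profiling sketch in plain Python, assuming the dataset is a list of dicts with identical keys. The `profile` function and the sample rows are made up for illustration.

```python
def profile(rows):
    """Return completeness and a candidate-key flag for each column."""
    stats = {}
    for col in rows[0].keys():
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None]
        stats[col] = {
            # Share of rows where the column has a value.
            "completeness": len(present) / len(values),
            # A column can act as a key only if it has no gaps and no duplicates.
            "candidate_key": len(set(present)) == len(values),
        }
    return stats


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "a@example.com"},
    {"id": 4, "email": "b@example.com"},
]
result = profile(rows)
print(result)
```

Here `id` comes out complete and unique, while `email` has a missing value and a duplicate, which flags it for cleansing.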

20
Q

Two things data profiling can help with

A
  1. Optimizing queries
  2. Cleansing (errors in data)
21
Q

Entity resolution

A

Find records that refer to the same entity
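
One simple way to approximate this is to normalize names and compare them with a string similarity score. The sketch below uses `difflib.SequenceMatcher` from the standard library; the `normalize` and `same_entity` helpers and the 0.85 threshold are assumptions chosen for illustration, not a standard method.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop simple punctuation, and collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def same_entity(a, b, threshold=0.85):
    """Guess whether two names refer to the same entity (threshold is arbitrary)."""
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return score >= threshold


print(same_entity("Acme Corp.", "ACME  corp"))
print(same_entity("Jon Smith", "John Smith"))
print(same_entity("Acme Corp", "Globex Inc"))
```

Real entity resolution usually compares several attributes (name, address, date of birth) and groups matching records, but the core step is the same pairwise comparison.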

22
Q

Version Control

A

Managing changes to computer programs or data collections, with each change identified by a code (the version number).

23
Q

Data versioning

A

Version control extended to data: covers data models, model parameter tracking, and performance comparison
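
A minimal sketch of the idea, assuming the version id is a content hash of the dataset and each training run is logged against it. The helper names, parameters, and accuracy numbers are made up for illustration; dedicated data versioning tools work differently in detail.

```python
import hashlib
import json


def dataset_version(rows):
    """Derive a short version id from the content (same content, same id)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


runs = []  # one record per training run


def log_run(rows, params, accuracy):
    """Record which data version and parameters produced which result."""
    runs.append(
        {"data_version": dataset_version(rows), "params": params, "accuracy": accuracy}
    )


def best_run():
    """Compare performance across logged runs."""
    return max(runs, key=lambda r: r["accuracy"])


v1_rows = [{"id": 1, "amount": 20.0}]
v2_rows = [{"id": 1, "amount": 25.0}]
log_run(v1_rows, {"depth": 3}, accuracy=0.81)  # made-up numbers
log_run(v2_rows, {"depth": 5}, accuracy=0.86)  # made-up numbers
best = best_run()
print(best["data_version"], best["params"])
```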

24
Q

Data lakehouse

A

Combines the flexibility of data lakes with the structure of data warehouses (e.g. ACID transactions) to support both BI and ML

Vendor lock in…?

25
Q

Data Platform Engineer job description

A

Implements cloud technologies within the data structure of the business; in charge of purchasing decisions for cloud services and approval of data architectures

26
Q

DevOps

A

Enables software development (Dev) and operations (Ops) teams to accelerate delivery through collaboration and iterative improvement

27
Q

DataOps

A

Uses automation to shorten the data analytics lifecycle

28
Q

Data fabric

A

Seamless data access and sharing in a distributed environment

Fabric is a smooth, unified surface

29
Q

Data mesh

A

Decentralized approach with distributed governance, where domains own their data products.

Mesh is a grid-like surface with interconnected “nodes”/“domains”