Core data concepts (15% - 20%) Flashcards

1
Q

What is data?

A

A collection of facts, such as numbers, descriptions and observations, used to record information and to support decision making.

2
Q

What are the three types of data?

A

Structured (e.g. tabular data, CSV files), semi-structured (e.g. JSON documents) and unstructured (e.g. documents, images, audio and video).

3
Q

What is structured data?

A

Structured data is typically tabular data, represented by columns and rows in a database.

4
Q

What are relational databases?

A

Databases that store data in tables of rows and columns, with relationships between the tables defined by keys.

5
Q

What is stream processing?

A

Processing data as it arrives.

6
Q

What is batch processing?

A

Collecting data over a period of time and then processing it together in groups (or batches).

7
Q

What are the differences between batch and stream processing?

A

Batch processing can process all the data in a dataset. Stream processing only processes the newest piece of data, or data within a rolling time window (e.g. the last 30 seconds).

Batch processing is more suitable for processing large datasets. Stream processing handles individual records or micro-batches containing only a few records.

Batch processing can take a few hours to complete, so its latency is high. Stream processing is near-instantaneous, with latency measured in milliseconds or seconds.

Batch processing is typically used for complex analytic workloads. Stream processing is used for simple response functions, aggregates and calculations like rolling averages.
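A minimal Python sketch of the contrast, using made-up readings: the batch function waits for the complete dataset, while the stream function emits a rolling average each time a reading arrives. The window here is a fixed number of records rather than a time span such as 30 seconds, and the function names are illustrative.

```python
from collections import deque

def batch_average(readings):
    """Batch: process the whole dataset at once, after all of it has been collected."""
    return sum(readings) / len(readings)

def stream_rolling_averages(readings, window_size):
    """Stream: process each reading as it arrives, keeping only the last `window_size` readings."""
    window = deque(maxlen=window_size)  # the oldest reading drops out automatically
    for reading in readings:
        window.append(reading)
        yield sum(window) / len(window)

readings = [10, 12, 14, 16, 18]
print(batch_average(readings))                     # 14.0
print(list(stream_rolling_averages(readings, 3)))  # [10.0, 11.0, 12.0, 14.0, 16.0]
```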

8
Q

How are batch processes run? Give an example.

A

A batch process is usually triggered by an event, such as a certain amount of data having been collected, a scheduled time (e.g. 3pm every Tuesday) or some other trigger.
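As a rough sketch of a count-based trigger, the hypothetical `BatchCollector` class below buffers records and runs a batch job once enough have arrived; its `flush` method stands in for what a scheduler would call at a set time. Real systems would typically rely on a scheduler or orchestration service rather than a toy class like this.

```python
class BatchCollector:
    """Hypothetical helper: buffers records and runs a batch job once `batch_size` have arrived."""

    def __init__(self, batch_size, process_batch):
        self.batch_size = batch_size
        self.process_batch = process_batch  # the batch job to run
        self.buffer = []

    def add(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:  # trigger: enough data collected
            self.flush()

    def flush(self):
        """Process whatever is buffered; a scheduler could also call this at a set time."""
        if self.buffer:
            self.process_batch(self.buffer)
            self.buffer = []

processed = []
collector = BatchCollector(batch_size=3, process_batch=lambda batch: processed.append(sum(batch)))
for value in [1, 2, 3, 4, 5]:
    collector.add(value)
print(processed)   # [6]  (records 4 and 5 are still waiting in the buffer)
collector.flush()  # e.g. a scheduled run
print(processed)   # [6, 9]
```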

9
Q

What is a transactional system?

A

A system that records transactions. (E.g. an online store has a transactional system that records orders and payments.)

10
Q

What is a transaction?

A

A small, discrete unit of work.

11
Q

What is data ingestion?

A

The process of capturing raw data for immediate use or storage in a database.

Depending on the source, the data may arrive as a stream or in batches.

12
Q

What is data processing?

A

The cleaning and conversion of raw data into a more useful format (tables, graphs, documents, and so on).
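A small illustration of cleaning and conversion, assuming raw comma-separated text with stray spaces, inconsistent capitalisation and a missing value; the `clean` function and the field names are hypothetical.

```python
raw_rows = [
    " Alice , 34 ,London",
    "Bob,,Paris",          # missing age
    "carol,29, berlin ",
]

def clean(row):
    """Trim whitespace, normalise capitalisation and convert text to typed values."""
    name, age, city = (field.strip() for field in row.split(","))
    return {
        "name": name.title(),
        "age": int(age) if age else None,  # convert to a number; keep gaps explicit as None
        "city": city.title(),
    }

cleaned = [clean(row) for row in raw_rows]
```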

13
Q

What is data transformation?

A

The process of converting data from one format or structure to another (e.g. from CSV to JSON).
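For example, converting CSV text into JSON with Python's standard `csv` and `json` modules (the product data is illustrative). Numeric fields are converted explicitly, because `csv.DictReader` returns every value as a string.

```python
import csv
import io
import json

csv_text = "id,product,price\n1,Keyboard,49.99\n2,Mouse,19.99\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))  # one dict per CSV row, all values as text
for row in rows:
    row["id"] = int(row["id"])
    row["price"] = float(row["price"])

json_text = json.dumps(rows)
print(json_text)
# [{"id": 1, "product": "Keyboard", "price": 49.99}, {"id": 2, "product": "Mouse", "price": 19.99}]
```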

14
Q

What are seven of the common tools Database Administrators use?

A
SSMS (SQL Server Management Studio)
pgAdmin for PostgreSQL systems
MySQL Workbench for MySQL databases
Azure Data Studio for a number of database engines
The Azure portal
The Azure CLI
sqlcmd
15
Q

What are some of the common tools a Data Engineer uses?

A

SQL
Azure Synapse Studio
Azure CLI

16
Q

What are some of the tools Data Analysts use?

A

Power BI, which includes: Power BI Desktop, the Power BI service, Power BI Embedded and Power BI Report.

17
Q

What does a Database Administrator do?

A

A database administrator (DBA) manages, maintains and optimises databases, manages security, grants users and applications access, backs up the database and monitors database performance.

18
Q

What is an entity?

A

An entity is a thing about which information needs to be known or held.

19
Q

What are the main characteristics of a relational database?

A

1) All data is tabular
2) All rows in the same table have the same set of columns
3) A table can contain any number of rows.
4) A primary key uniquely identifies each row in a table. No two rows can share the same PK.
5) A foreign key references rows in another, related table. For every value in the FK column, there should be a row with the same value in the corresponding PK column of the other table.
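The primary key and foreign key rules can be sketched with Python's built-in `sqlite3` module and an in-memory database; the `customers` and `orders` tables and the `violates_constraint` helper are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is enabled

conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        amount      REAL
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")  # OK: customer 1 exists

def violates_constraint(sql):
    """Return True if the statement is rejected by a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

print(violates_constraint("INSERT INTO customers VALUES (1, 'Bob')"))    # True: duplicate PK
print(violates_constraint("INSERT INTO orders VALUES (101, 99, 10.0)"))  # True: no customer 99
```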

20
Q

What is normalisation?

A

Normalisation is the process of organising data in a database.

This includes creating tables and establishing relationships between those tables according to rules designed to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.
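A simplified illustration of the idea, using Python dictionaries as stand-ins for tables. It shows redundancy being removed by splitting data into related tables, not a full walk through the normal forms; the order data and field names are hypothetical.

```python
# Denormalised: customer details are repeated on every order row
order_rows = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Alice", "city": "London", "amount": 25.0},
    {"order_id": 2, "customer_id": 10, "customer_name": "Alice", "city": "London", "amount": 40.0},
    {"order_id": 3, "customer_id": 11, "customer_name": "Bob", "city": "Paris", "amount": 15.0},
]

customers = {}  # customer_id -> customer details, stored once
orders = []     # each order refers to its customer by key only
for row in order_rows:
    customers[row["customer_id"]] = {"name": row["customer_name"], "city": row["city"]}
    orders.append({"order_id": row["order_id"], "customer_id": row["customer_id"], "amount": row["amount"]})

# A change of address is now a single update instead of one update per order
customers[10]["city"] = "Leeds"
```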

21
Q

What is OLTP?

A

Online Transaction Processing (OLTP) is a category of data processing focused on transaction-oriented tasks.

It typically involves CRUD (create, read, update and delete) operations on small amounts of data in a database, and handles a large number of transactions from a large number of users.
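A sketch of the kind of small CRUD operations an OLTP system runs, using Python's `sqlite3` module with a hypothetical `products` table. Each statement touches a single row identified by its key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, stock INTEGER)")

# Create: add a new row
conn.execute("INSERT INTO products (product_id, name, stock) VALUES (?, ?, ?)", (1, "Keyboard", 10))

# Update: record one unit sold
conn.execute("UPDATE products SET stock = stock - 1 WHERE product_id = ?", (1,))

# Read: look up a single row by its primary key
stock = conn.execute("SELECT stock FROM products WHERE product_id = ?", (1,)).fetchone()[0]

# Delete: remove the row
conn.execute("DELETE FROM products WHERE product_id = ?", (1,))
conn.commit()
```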

22
Q

What does a data analyst do?

A

A data analyst provides insights into the data, creates visual reports, models data for analysis and combines data visualization and analysis.

23
Q

What does a data engineer do?

A

A data engineer builds data pipelines and processes, manages data ingestion and storage, and prepares data for analytical processing.

24
Q

What is ETL?

A

ETL (Extract, Transform and Load) is a process in which data is read from source systems (Extract), converted into the format and structure its final destination requires, usually in a separate staging or processing area (Transform), and then written to that destination (Load).
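A minimal ETL sketch in Python; `extract`, `transform` and `load` are hypothetical functions, the source data is hard-coded, and a dictionary stands in for the destination store. The point to notice is that the data is reshaped before it is loaded.

```python
def extract():
    """Extract: read raw records from a source system (hard-coded here for illustration)."""
    return ["2024-01-05,keyboard,49.99", "2024-01-05,mouse,19.99", "2024-01-06,keyboard,49.99"]

def transform(raw_records):
    """Transform: parse and aggregate the data before it reaches the destination."""
    totals = {}
    for record in raw_records:
        date, _product, price = record.split(",")
        totals[date] = round(totals.get(date, 0.0) + float(price), 2)
    return totals

def load(totals, destination):
    """Load: write the already-transformed data into the destination store."""
    destination.update(totals)

warehouse = {}  # stand-in for the destination data store
load(transform(extract()), warehouse)
print(warehouse)  # {'2024-01-05': 69.98, '2024-01-06': 49.99}
```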

25
Q

What is ELT?

A

ELT (Extract, Load and Transform) is a process in which data is read from source systems (Extract), loaded into the destination store in its raw form (Load), and then transformed within that store, using its own processing capabilities (Transform).
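For contrast, a minimal ELT sketch: the raw records are loaded unchanged into a store (an in-memory SQLite database standing in for, say, a data warehouse), and the transformation then runs inside that store as SQL. Table and column names are hypothetical.

```python
import sqlite3

# Raw records as extracted from a source system (prices are still text)
raw_records = [
    ("2024-01-05", "keyboard", "49.99"),
    ("2024-01-05", "mouse", "19.99"),
    ("2024-01-06", "keyboard", "49.99"),
]

store = sqlite3.connect(":memory:")  # stand-in for the destination data store

# Load: copy the raw data into the store unchanged
store.execute("CREATE TABLE raw_sales (sale_date TEXT, product TEXT, price TEXT)")
store.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_records)

# Transform: the store's own SQL engine builds a cleaned, aggregated table from the raw one
store.execute("""
    CREATE TABLE daily_sales AS
    SELECT sale_date, ROUND(SUM(CAST(price AS REAL)), 2) AS total
    FROM raw_sales
    GROUP BY sale_date
""")
daily = store.execute("SELECT sale_date, total FROM daily_sales ORDER BY sale_date").fetchall()
print(daily)  # [('2024-01-05', 69.98), ('2024-01-06', 49.99)]
```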

26
Q

What is the purpose of descriptive analytics?

A

To help answer questions about what has happened, based on historical data.
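A tiny example of descriptive analytics: summarising historical figures (made up for illustration) to describe what happened.

```python
import statistics

monthly_sales = {"Jan": 120, "Feb": 135, "Mar": 90, "Apr": 150}  # illustrative historical data

total = sum(monthly_sales.values())
average = statistics.mean(monthly_sales.values())
best_month = max(monthly_sales, key=monthly_sales.get)  # month with the highest sales
print(f"Total: {total}, average: {average}, best month: {best_month}")
# Total: 495, average: 123.75, best month: Apr
```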

27
Q

What is the purpose of diagnostic analytics?

A

To help answer questions about why things have happened.

28
Q

What is the purpose of predictive analytics?

A

To help answer questions about what will happen in future.
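One of the simplest predictive techniques is fitting a straight-line trend to past values and extrapolating it. The sketch below does an ordinary least-squares fit by hand; `linear_forecast` is a hypothetical helper, and real forecasting would usually use richer models.

```python
def linear_forecast(values, steps_ahead=1):
    """Fit y = a + b*x by ordinary least squares (x = 0, 1, 2, ...) and extrapolate.

    Needs at least two values, otherwise the slope is undefined.
    """
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)

print(linear_forecast([100, 110, 120, 130]))  # 140.0
```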

29
Q

What is the purpose of prescriptive analytics?

A

To help answer questions about what should be done to make something happen.

30
Q

What is the purpose of cognitive analytics?

A

To help draw inferences from existing data and patterns.

31
Q

What is data visualisation?

A

The graphical representation of data and information.

32
Q

What is a pie chart good for representing?

A

How much each category contributes to a whole (the parts of a total).

33
Q

What is a line chart good for representing?

A

The overall shape of an entire series of values, usually over time.

34
Q

What is reporting?

A

The process of organising data into informational summaries to monitor how different areas of an organisation are performing.

35
Q

What is BI?

A

BI (Business Intelligence) is the term for the technology, applications and practices for the collection, integration, analysis and presentation of business information.

36
Q

What is a matrix chart good for representing?

A

A summary of data in a table-like grid that can group and aggregate values across both rows and columns.

37
Q

What is the purpose of BI?

A

To support better decision making.

38
Q

What is OLAP?

A

Online Analytical Processing (OLAP) is a category of data processing focused on analysis-oriented tasks, typically performed in a data warehouse.
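An illustration of analysis-oriented queries, using an in-memory SQLite database as a stand-in for a data warehouse; the `sales` table and its figures are hypothetical. Unlike OLTP operations on single rows, these queries read and aggregate many rows at once.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
warehouse.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount INTEGER)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", 2023, 100), ("North", 2024, 120), ("South", 2023, 80), ("South", 2024, 95)],
)

# Aggregate across many rows, first by one dimension, then by two
by_region = warehouse.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
by_region_and_year = warehouse.execute(
    "SELECT region, year, SUM(amount) FROM sales GROUP BY region, year ORDER BY region, year"
).fetchall()
print(by_region)  # [('North', 220), ('South', 175)]
```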

39
Q

What is real-time processing?

A

The processing of unbounded streams of input data with very short latency requirements, measured in seconds or milliseconds.

40
Q

What is the difference between real-time processing and stream processing?

A

Real-time processing is about reacting to data quickly; stream processing is about the actions taken on data as it flows in.

41
Q

What is a key influencer chart good for representing?

A

The major contributors to a selected result or value.

42
Q

What is a Treemap good for representing?

A

The relative size of items within a hierarchy, shown as nested rectangles (e.g. video game sales by genre and developer).

43
Q

What is a scatter chart good for representing?

A

The relationship between two numerical values.

44
Q

What is a bubble chart?

A

A scatter chart that replaces data points with bubbles, with bubble size representing an additional third data dimension.

45
Q

What is a dot chart?

A

A chart similar to scatter and bubble charts, except that it can plot numerical or categorical data along the x-axis.

46
Q

What is a Filled map good for representing?

A

How a value differs in proportion across geography or region.

47
Q

What is denormalized data?

A

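Data in which the same values are stored redundantly in several places in a database or data store (e.g. customer details repeated on every order row), rather than being split into separate related tables.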

48
Q

What is Big Data?

A

Data with high volume and velocity, in a variety of structures from various sources, whose veracity (accuracy and trustworthiness) may be uncertain.

49
Q

How is data stored?

A

Files

Databases (relational, non-relational)

50
Q

OLTP systems enforce transactions that adhere to ACID. What does this mean?

A

Atomicity: Each transaction is treated as a single unit, which either succeeds completely or fails completely.
Consistency: A transaction must take the system from one valid state to another.
Isolation: Concurrent transactions cannot interfere with one another.
Durability: Once a transaction has been committed, it will remain committed.
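Atomicity (and consistency, via a `CHECK` constraint) can be demonstrated with Python's `sqlite3` module: using the connection as a context manager commits on success and rolls back if an exception is raised. The `accounts` table and the `transfer` and `balances` functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(source, target, amount):
    """Move money as one transaction: both updates are kept, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, target))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, source))
        return True
    except sqlite3.IntegrityError:  # CHECK failed: the balance would go negative
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

transfer("alice", "bob", 30)   # succeeds
transfer("alice", "bob", 500)  # fails; the credit to bob from the first UPDATE is rolled back
print(balances())  # {'alice': 70, 'bob': 80}
```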