Understanding Database Analytics v2 Flashcards

1
Q

CRM stands for

A

Customer relationship management

2
Q

This database is a resource containing all client information collected, governed, transformed, and shared across an organization. It includes marketing and sales reporting tools, which are useful for leading sales and marketing campaigns and increasing customer engagement.

A

Customer relationship management database

3
Q

SCM stands for

A

Supply Chain Management

4
Q

The management of the flow of goods, data, and finances related to a product or service, from the procurement of raw materials to the delivery of the product at its final destination.

A

Supply Chain Management

5
Q

ETL stands for

A

Extract, Transform and Load

6
Q
  • Also known as an enterprise data warehouse, is a system used for reporting and data analysis and is considered a core component of business intelligence.
  • Are central repositories of integrated data from one or more different sources.
A

Data Warehouse (DW)

7
Q

DW stands for

A

Data warehouse

8
Q

A centralized repository designed to store, process, and secure large amounts of structured, semistructured, and unstructured data. It can store data in its native format and process any variety of it, regardless of size.

A

Data lake

9
Q

OLAP stands for

A

Online Analytical Processing

10
Q
  • A computing method that enables users to easily and selectively extract and query data in order to analyze it from different points of view.
  • _____ business intelligence queries often aid in trends analysis, financial reporting, sales forecasting, budgeting and other planning purposes.
A

OLAP

11
Q

OLTP stands for

A

Online Transaction Processing

12
Q

A type of data processing that consists of executing a number of transactions occurring concurrently—online banking, shopping, order entry, or sending text messages, for example.

A

Online Transaction Processing

13
Q

GSP stands for

A

Generalized Sequential Pattern

14
Q

An algorithm used for sequence mining. The algorithms for solving sequence mining problems are mostly based on the Apriori (level-wise) algorithm. One way to use the level-wise paradigm is to first discover all the frequent items in a level-wise fashion.

A

GSP algorithm (Generalized Sequential Pattern algorithm)
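
The level-wise idea on this card can be sketched in Python as follows. This is a simplified illustration rather than the full GSP algorithm: each sequence element is a single item (not an itemset), candidates are grown by appending one frequent item, and the names `is_subsequence`, `support`, `frequent_sequences`, and the sample `visits` data are all assumptions made for the example.

```python
def is_subsequence(candidate, sequence):
    """True if the candidate's items occur in the sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in candidate)


def support(candidate, sequences):
    """Number of sequences that contain the candidate as a subsequence."""
    return sum(is_subsequence(candidate, s) for s in sequences)


def frequent_sequences(sequences, min_support, max_len=3):
    # Level 1: find the frequent single items first.
    items = sorted({x for s in sequences for x in s})
    level = [(i,) for i in items if support((i,), sequences) >= min_support]
    result = {c: support(c, sequences) for c in level}
    frequent_items = [c[0] for c in level]
    # Level k: extend each frequent (k-1)-sequence by one frequent item.
    for _ in range(max_len - 1):
        candidates = [c + (i,) for c in level for i in frequent_items]
        level = [c for c in candidates if support(c, sequences) >= min_support]
        result.update({c: support(c, sequences) for c in level})
        if not level:
            break
    return result


visits = [
    ["home", "search", "cart", "pay"],
    ["home", "cart", "pay"],
    ["search", "home", "cart"],
    ["home", "search", "pay"],
]
result = frequent_sequences(visits, min_support=3)
```

With a minimum support of 3, the only frequent two-step patterns in this toy data are `("home", "cart")` and `("home", "pay")`; no three-step pattern reaches the threshold.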

15
Q

An algorithm for frequent item set mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database.

A

Apriori algorithm
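
The "extend frequent items to larger item sets" step described on this card might look like the sketch below. It is a minimal illustration, not a tuned implementation; the `apriori` function name, the use of an absolute count for `min_support`, and the sample `baskets` are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count the transactions containing each candidate; keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent individual items.
    items = {frozenset([i]) for t in transactions for i in t}
    current = count(items)
    frequent = dict(current)

    k = 2
    while current:
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
result = apriori(baskets, min_support=3)
```

In this toy data all four single items are frequent, but `{bread, milk}` is the only pair that appears in at least three baskets, so the search stops at level 2.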

16
Q

An integrated system or database that integrates data by subject, enabling users to instantly analyze, from multiple points of view and without separate programming, the internal and external data generated over time by an enterprise's operational systems.

A

Data warehouse

17
Q

Characteristics of the data warehouse (4)

A
  • Subject-oriented
  • Integrated
  • Time variant
  • Non-volatile
18
Q

Characteristics of the data warehouse

Of the many types of operational system data managed by individual business functions, only the data on specific subjects needed for enterprise-level decision-making is stored; other data are excluded.

A

Subject-oriented

19
Q

Characteristics of the data warehouse

  • The structure of a data warehouse is characterized by data consistency and physical unity through company-wide data standardization.
  • When data are obtained from the operational systems, a series of data conversion tasks is performed to integrate them.
A

Integrated

20
Q

Characteristics of the data warehouse

  • To analyze past and present trends and forecast the future, a data warehouse retains data for a long time in the form of a series of snapshots.
  • Users can understand the process of data change over time using the data history.
A

Time variant

21
Q

Characteristics of the data warehouse

  • A data warehouse is a read-only database: once data have been loaded from the operational system database, they cannot be deleted or modified.
  • When operational system data are modified, the existing values there are overwritten, whereas the data warehouse keeps the history of the data at each point in time.
A

Non-volatile
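
One way to picture the contrast on this card is an append-only table that receives dated snapshots. The sketch below is only an illustration; the `WarehouseTable` class, its methods, and the customer data are hypothetical names invented for the example.

```python
from datetime import date


class WarehouseTable:
    """Append-only store: rows are added with a snapshot date and never updated."""

    def __init__(self):
        self._rows = []

    def load_snapshot(self, snapshot_date, records):
        # Loading only appends; there is no update or delete method at all.
        for key, value in records.items():
            self._rows.append((snapshot_date, key, value))

    def history(self, key):
        return [(d, v) for d, k, v in self._rows if k == key]


operational = {"cust-1": "Seoul"}   # operational system: current state only
warehouse = WarehouseTable()
warehouse.load_snapshot(date(2024, 1, 1), operational)

operational["cust-1"] = "Busan"     # the old value is overwritten at the source
warehouse.load_snapshot(date(2024, 2, 1), operational)

trail = warehouse.history("cust-1")
```

After the second load, the operational dictionary holds only the latest city, while `trail` still contains both dated values.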

22
Q

Refers to a data modeling technique that, from a data analysis perspective, enables users to analyze large-scale data from various viewpoints.

A

Data warehouse modeling

23
Q

Are generally organized into fact tables and dimension tables so that end users or analysts can easily analyze information.

24
Q

Components of Data warehouse modeling (2)

A
  • Fact table
  • Dimension table
25
  • **A core table composed of a set of highly relevant measures.**
  • A measure is measurement data, such as an amount, a count, or a time, through which the goal of the information analysis can be observed.
Fact table
26
  • A sub-table that provides a perspective for analyzing each fact.
  • **A ______ has multiple attributes, thus allowing data analysis from diverse perspectives.**
Dimension table
27
  • **This schema has data duplication but is easy to understand.**
  • A modeling technique for designing data by separating it into fact tables and dimension tables.
  • Data duplication occurs because dimension table data are not normalized.
  • The schema is easy to understand and has few joins, thus improving query performance, but data consistency problems may occur.
Star schema
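
A small star schema can be tried out with Python's built-in `sqlite3` module. The sketch below is illustrative only; the table names `fact_sales` and `dim_product`, their columns, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: not normalized, so the category name repeats per product.
    CREATE TABLE dim_product (
        product_id    INTEGER PRIMARY KEY,
        product_name  TEXT,
        category_name TEXT
    );
    -- Fact table: measures plus a foreign key to the dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity   INTEGER,
        amount     REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Espresso', 'Beverages'),
        (2, 'Latte',    'Beverages'),
        (3, 'Bagel',    'Bakery');
    INSERT INTO fact_sales VALUES
        (1, 2, 6.0), (2, 1, 4.5), (3, 3, 7.5), (1, 1, 3.0);
""")

# A single join is enough to analyze the measure by category.
rows = conn.execute("""
    SELECT d.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category_name
    ORDER BY d.category_name
""").fetchall()
```

Note that `'Beverages'` is stored twice in `dim_product`: that repetition is the data duplication the card mentions, traded for a query that needs only one join.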
28
  • **In this schema, data are normalized to limit data duplication.**
  • A modeling technique that completely normalizes the dimension tables of the star schema.
  • Data duplication is rare and less storage space is used owing to the normalization of the dimension tables, but performance may degrade because more joins are needed than in the star schema.
Snowflake schema
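
The effect of normalizing a dimension can be seen in a small `sqlite3` sketch. It is illustrative only; the tables `fact_sales`, `dim_product`, and `dim_category`, their columns, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The category is split out of the product dimension (normalized).
    CREATE TABLE dim_category (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id  INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );
    INSERT INTO dim_category VALUES (10, 'Beverages'), (20, 'Bakery');
    INSERT INTO dim_product VALUES
        (1, 'Espresso', 10), (2, 'Latte', 10), (3, 'Bagel', 20);
    INSERT INTO fact_sales VALUES (1, 6.0), (2, 4.5), (3, 7.5), (1, 3.0);
""")

# Each category name is stored once, but the query now needs two joins.
totals = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

The extra join on `dim_category` is the cost the card refers to when it mentions possible performance degradation.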
29
**Refers to the entire process by which data are extracted from the source system and stored in the data warehouse after cleansing and conversion.** It plays the role of maintaining data consistency and integrity among the components of the data warehouse.
ETL
30
ETL is also called
ETT (Extraction, Transformation, Transportation).
31
  • **The phase in which data are extracted from the original files or operational system databases so that they can be stored in the data warehouse.**
  • In the past, data were extracted on a daily or monthly basis, but in some recent cases data are extracted in real time from database logs, according to business requirements.
Extraction/Extract
32
  • **A phase in which extracted data are cleaned and converted into a data format suitable for the data warehouse.**
  • In the event of data quality problems, data are cleansed according to the reference data or business rules.
  • The original data format is converted into a data format suitable for the data warehouse.
Transformation/Transform
33
  • **A phase in which converted data are sent to the warehouse for storage, and the necessary indexes are generated.**
  • Full and partial update techniques are available.
Loading/Load
34
E-COMMERCE (CUSTOMER DATA INTEGRATION) ETL EXAMPLE
  • ____ - The transformed data is loaded into a centralized CRM system or data warehouse for personalized marketing campaigns and customer analytics.
  • ____ - The data is cleaned (removing duplicates), formatted into a standard structure, and enriched by adding customer segmentation based on purchase history.
  • ____ - An online retailer collects customer data from multiple sources: website purchases, mobile app activity, and third-party marketing tools.
  1. Load
  2. Transform
  3. Extract
35
HEALTHCARE (PATIENT RECORDS MANAGEMENT) ETL EXAMPLE
  • ____ - The data is standardized into a uniform format, errors are corrected (e.g., duplicate patient entries), and missing values are handled.
  • ____ - A hospital gathers patient information from different departments (lab tests, doctor visits, billing records).
  • ____ - The cleaned data is stored in an Electronic Health Records (EHR) system, enabling doctors to access a patient's complete history for better treatment decisions.
  1. Transform
  2. Extract
  3. Load
36
BANKING & FINANCE (FRAUD DETECTION AND RISK ANALYSIS) ETL EXAMPLE
  • ____ - A bank collects transaction data from ATMs, mobile banking apps, and online banking portals.
  • ____ - The processed data is stored in a fraud detection system, alerting the security team to prevent unauthorized transactions.
  • ____ - The system analyzes the data, categorizes transactions, and flags suspicious activity using machine learning models.
  1. Extract
  2. Load
  3. Transform
37
SOCIAL MEDIA (USER ENGAGEMENT ANALYTICS) ETL EXAMPLE
  • ____ - The final dataset is loaded into an analytics dashboard for insights on user engagement trends.
  • ____ - A social media company gathers user activity data from different platforms (likes, comments, shares).
  • ____ - The data is aggregated, sorted by user demographics, and cleaned for errors (e.g., bot activity removal).
  1. Load
  2. Extract
  3. Transform
38
RETAIL (INVENTORY MANAGEMENT) ETL EXAMPLE
  • ____ - A supermarket pulls inventory data from POS systems, supplier databases, and warehouse stock reports.
  • ____ - The data is reconciled to check for discrepancies, outdated items are removed, and stock levels are updated.
  • ____ - The final data is pushed to the inventory management system, ensuring accurate stock tracking and restocking alerts.
  1. Extract
  2. Transform
  3. Load
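
The three phases of the retail example can be sketched as three small Python functions. This is a toy illustration under stated assumptions: the in-memory lists stand in for the POS and warehouse sources, an in-memory SQLite table stands in for the inventory management system, and every name (`extract`, `transform`, `load`, `inventory`, the SKU values) is hypothetical.

```python
import sqlite3


def extract():
    # Stand-ins for rows pulled from the POS system and the warehouse stock report.
    pos = [{"sku": "A1", "sold": 3}, {"sku": "B2", "sold": 1}]
    stock = [
        {"sku": "A1", "on_hand": "10", "discontinued": False},
        {"sku": "B2", "on_hand": " 4", "discontinued": False},
        {"sku": "C3", "on_hand": "7", "discontinued": True},
    ]
    return pos, stock


def transform(pos, stock):
    sold = {row["sku"]: row["sold"] for row in pos}
    cleaned = []
    for row in stock:
        if row["discontinued"]:
            continue  # remove outdated items
        on_hand = int(row["on_hand"].strip())  # convert text to a clean integer
        cleaned.append((row["sku"], on_hand - sold.get(row["sku"], 0)))
    return cleaned


def load(rows, conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inventory (sku TEXT PRIMARY KEY, on_hand INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO inventory VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
inventory = conn.execute("SELECT sku, on_hand FROM inventory ORDER BY sku").fetchall()
```

Here the discontinued item `C3` is dropped during the transform step, and the stock levels of the remaining items are reduced by the units sold before loading.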
39
Refers to the process by which the end user accesses multi-dimensional information without an intermediary or medium, and then analyzes the information interactively and uses it for decision-making. That is, when the operational data extracted and converted by ETL are stored in the data warehouse or data mart, the end user analyzes them using _____.
OLAP
40
**Provides various search techniques that allow end users to analyze data from diverse perspectives and summary levels.**
OLAP
41
Main OLAP search techniques (6)
  1. Drill down
  2. Roll up
  3. Drill across
  4. Pivot
  5. Slice
  6. Dice
42
  • **A search technique that approaches a specific analysis topic in phases from a high summary level to a low (detail) summary level.**
  • E.g., Time Dimension: Year → month → day
Drill Down
43
  • Concept opposite to Drill Down.
  • **A search technique that approaches a specific analysis topic in phases from a low summary level to a high summary level.**
  • E.g., Time Dimension: Day → month → year
Roll up
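
Rolling up and drilling down along a time dimension amounts to aggregating the same measure at different levels. The sketch below is a minimal illustration; the `aggregate` function and the sample `sales` rows are assumptions made for the example.

```python
from collections import defaultdict

# (date, amount) rows at the most detailed level: day.
sales = [
    ("2024-01-05", 100),
    ("2024-01-20", 50),
    ("2024-02-03", 70),
    ("2025-01-10", 30),
]


def aggregate(rows, level):
    """Sum amounts at the chosen level of the time dimension."""
    width = {"year": 4, "month": 7, "day": 10}[level]  # prefix length of YYYY-MM-DD
    totals = defaultdict(int)
    for day, amount in rows:
        totals[day[:width]] += amount
    return dict(totals)


by_day = aggregate(sales, "day")
by_month = aggregate(sales, "month")  # roll up: day -> month
by_year = aggregate(sales, "year")    # roll up: month -> year
```

Reading the three results in the opposite order (`by_year`, then `by_month`, then `by_day`) corresponds to drilling down.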
44
**A search technique that uses a certain analysis viewpoint on one analysis topic to approach another analysis topic.**
Drill across
45
**A search technique that changes the axis of the analysis perspective on a specific analysis topic.**
Pivot
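
Changing the axis of analysis can be pictured as swapping the rows and columns of a cross-tabulation. The sketch below is illustrative only; the `cross_tab` helper and the sample `sales` tuples are assumptions made for the example.

```python
# (year, region, amount) records.
sales = [
    ("2024", "East", 100),
    ("2024", "West", 80),
    ("2025", "East", 120),
    ("2025", "West", 90),
]


def cross_tab(rows, row_axis, col_axis):
    """Build {row_value: {col_value: total}} for the chosen axes (0 = year, 1 = region)."""
    table = {}
    for rec in rows:
        r, c, amount = rec[row_axis], rec[col_axis], rec[2]
        table.setdefault(r, {})
        table[r][c] = table[r].get(c, 0) + amount
    return table


by_year = cross_tab(sales, 0, 1)    # rows: year, columns: region
by_region = cross_tab(sales, 1, 0)  # pivot: rows: region, columns: year
```

Both tables hold the same totals; only the axis used to read them has changed.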
46
**A search technique that creates subsets by selecting specific values for the members at one level or the members above that level.**
Slice
47
**A search technique that creates subsets by slicing two or more dimensions.**
Dice
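
Slice and dice can both be viewed as filters on the dimensions of a cube: a slice fixes a value on one dimension, while a dice restricts several dimensions at once. The sketch below is illustrative only; the `select` helper and the sample `cube` rows are assumptions made for the example.

```python
cube = [
    {"year": 2024, "region": "East", "product": "Latte", "amount": 100},
    {"year": 2024, "region": "West", "product": "Latte", "amount": 80},
    {"year": 2025, "region": "East", "product": "Bagel", "amount": 60},
    {"year": 2025, "region": "West", "product": "Latte", "amount": 90},
]


def select(rows, **conditions):
    """Keep rows whose value for each named dimension is in the allowed set."""
    return [
        r for r in rows
        if all(r[dim] in allowed for dim, allowed in conditions.items())
    ]


# Slice: fix one dimension to a single value.
slice_2024 = select(cube, year={2024})

# Dice: restrict several dimensions at once to get a smaller sub-cube.
dice = select(cube, year={2024, 2025}, region={"West"}, product={"Latte"})
```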
48
Refers to **a series of processes that identify a systematic statistical rule or pattern among a large amount of data, convert it into meaningful information, and apply it to corporate decision-making.**
Data mining
49
Major data mining algorithms (4)
  1. Association
  2. Sequence
  3. Classification
  4. Clustering
50
  • **An analysis algorithm that discovers patterns from combinations of highly related items in transaction data and similar sources.**
  • E.g., Apriori algorithm.
  • Mainly used to plan product placement in offline stores and to recommend related products automatically in online shopping malls.
Association
51
  • An analysis algorithm that searches for the correlation of items over time by adding the concept of time to association analysis.
  • **The possibility of a given transaction occurring in the future is forecast by performing time series analysis on transaction history data.**
  • Apriori algorithm, Generalized Sequential Patterns (GSP), etc.
Sequence
52
  • **An analysis algorithm that, given a dataset, analyzes it to create a tree-type model that classifies the values (category values) of a specific categorical attribute.**
  • Decision tree algorithm, etc.
Classification
53
A structure that includes a root node, branches, and leaf nodes. Every internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node holds a class label.
Decision tree
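
The structure on this card maps naturally onto nested dictionaries. The sketch below only shows how a ready-made tree classifies a record; it does not learn the tree from data. The `tree` contents, the attribute names, and the `classify` function are assumptions made for the example.

```python
# Internal nodes are dicts that test one attribute; leaves are plain class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "stay in", "normal": "play"},
        },
        "overcast": "play",
        "rainy": {
            "attribute": "windy",
            "branches": {True: "stay in", False: "play"},
        },
    },
}


def classify(node, record):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node


label_a = classify(tree, {"outlook": "sunny", "humidity": "normal"})
label_b = classify(tree, {"outlook": "rainy", "windy": True})
```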
54
  • **An analysis algorithm that groups records with similar attributes, by considering several attributes of given records (customers, products).**
  • K-Means algorithm, EM algorithm, etc.
Clustering
55
A technique for data clustering that may be used for unsupervised machine learning. It groups unlabeled data into a predetermined number (k) of clusters based on similarity.
K-Means algorithm
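
The assign-then-update loop behind k-means can be sketched in plain Python. This is a minimal illustration for 2-D points; taking the first k points as initial centroids is a naive choice made only for this sketch, and the `k_means` function and sample `points` are assumptions.

```python
def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def k_means(points, k, iterations=100):
    centroids = list(points[:k])  # naive initialization (assumption of this sketch)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, k=2)
```

On this toy data the loop settles on one cluster around the three points near (1, 1) and another around the three points near (8, 8).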
56
An approach for maximum likelihood estimation in the presence of latent variables. It is a general technique for finding maximum likelihood estimators in latent variable models.
Expectation-Maximization Algorithm (EM Algorithm)
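
As a concrete case, EM can fit a mixture of two 1-D Gaussians, where the latent variable is the component each point came from. The sketch below is a minimal illustration, not a robust implementation; the starting guesses, the variance floor, the `em_two_gaussians` function, and the sample `data` are all assumptions made for the example.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, iterations=50):
    # Crude starting guesses (an assumption of this sketch).
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(2)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate the parameters from the weighted points.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6,  # floor to keep a component from collapsing
            )
    return weights, means, variances


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_two_gaussians(data)
```

On this well-separated toy data the estimated means end up close to 1 and 5, with roughly equal weights for the two components.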