Wk3:Chap3 - Data management, big data analytics, and records management Flashcards

1
Q

Databases?

A
  • Collections of data sets or records stored in a systematic way.
  • Store data generated by business apps, sensors, operations, and transaction-processing systems (TPS).
2
Q

Data Warehouses

A

Integrate data from multiple databases and data silos, and organize them for complex analysis, knowledge discovery, and decision support.

3
Q

Data Marts

A
  • Small-scale data warehouses that support a single function or one department.
  • Enterprises that cannot afford to invest in data warehousing may start with one or more data marts.
4
Q

Business intelligence (BI)

A

Tools and techniques that process data and conduct statistical analysis for insight and discovery.

5
Q

Database Management System (DBMS)

A
  • Integrates with data collection systems such as TPS and business applications.
  • Stores data in an organized way.
  • Provides facilities for accessing and managing data.
6
Q

Relational Database Management System (RDBMS)

A

Provides access to data using a declarative language.

7
Q

Declarative Language

A
  • Simplifies data access by requiring that users specify only what data they want to access, without defining how it will be retrieved.
  • Structured Query Language (SQL) is an example of a declarative language:
    SELECT column_name(s)
    FROM table_name
    WHERE condition
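
A minimal sketch of the declarative idea, using Python's built-in sqlite3 module. The `customers` table, its rows, and the function name are hypothetical and only for illustration; the query states what rows are wanted and leaves the retrieval plan to the DBMS.

```python
import sqlite3


def customers_in_city(city):
    """Return the names of customers in `city`, sorted alphabetically."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    # Hypothetical table and sample rows, not from the flashcards.
    conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("Ana", "Lisbon"), ("Ben", "Oslo"), ("Cara", "Lisbon")],
    )
    # Declarative: we state WHAT we want, not HOW to scan or index the table.
    rows = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]
```

Calling `customers_in_city("Lisbon")` returns the two Lisbon customers; the caller never says how the table should be searched.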
8
Q

DBMS Functions

A
  • Data filtering and profiling: Checking for errors, inconsistencies, and redundancies
  • Data integrity and maintenance: Verifying consistency
  • Data synchronization: Integrating data from different sources
  • Data security: Controlling data integrity over time
  • Data access: Providing authorised access
9
Q

Latency

A

The delay or time elapsed between when data is created and when it is available for reporting.
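
As a small worked example of this definition, latency can be computed as the difference between two timestamps. The function and parameter names are illustrative assumptions.

```python
from datetime import datetime


def latency_seconds(created_at, available_at):
    """Seconds elapsed between data creation and availability for reporting."""
    return (available_at - created_at).total_seconds()


# Hypothetical record created at 09:00 and reportable at 09:15.
created = datetime(2024, 1, 1, 9, 0, 0)
available = datetime(2024, 1, 1, 9, 15, 0)
```

Here `latency_seconds(created, available)` gives 900.0, that is, 15 minutes of latency.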

10
Q

Online Transaction Processing (OLTP)

A
  • DBMSs that record and process transactions and support queries.
  • Designed to manage volatile transaction data and to break down complex information into simpler data tables, striking a balance between transaction-processing efficiency and query efficiency.
11
Q

Online Analytical Processing (OLAP)

A
  • A means of organizing large business databases.
  • Divided into one or more cubes that fit the way business is conducted.
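
One way to picture a cube is as a measure aggregated over every combination of chosen dimensions. The sketch below is a toy, in-memory illustration only; the `SALES` rows, the dimension names, and `build_cube` are all hypothetical, and real OLAP servers store and index cubes very differently.

```python
from collections import defaultdict


def build_cube(facts, dimensions, measure):
    """Sum `measure` for each combination of values of `dimensions`."""
    cube = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)  # one cell of the cube
        cube[key] += row[measure]
    return dict(cube)


# Hypothetical sales facts with three dimensions and one measure.
SALES = [
    {"region": "East", "product": "A", "quarter": "Q1", "amount": 100.0},
    {"region": "East", "product": "B", "quarter": "Q1", "amount": 50.0},
    {"region": "West", "product": "A", "quarter": "Q1", "amount": 70.0},
    {"region": "East", "product": "A", "quarter": "Q2", "amount": 30.0},
]
```

`build_cube(SALES, ["region"], "amount")` rolls sales up by region, while `build_cube(SALES, ["region", "product"], "amount")` drills down one level further.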

12
Q

Dirty Data

A
  • Lacks integrity/validation and reduces user trust.
  • Incomplete, out of context, outdated, inaccurate, inaccessible, or overwhelming.
  • Calls for integrity checks.
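
A minimal sketch of what an integrity check might look like, assuming records are plain dictionaries. It flags only two kinds of dirty data (incomplete and duplicated records); the function name and field names are hypothetical.

```python
def find_dirty_records(records, required_fields):
    """Return (index, reason) pairs for records failing simple checks."""
    problems = []
    seen = set()
    for i, rec in enumerate(records):
        # Incomplete: a required field is missing or empty.
        missing = [f for f in required_fields if rec.get(f) in (None, "")]
        if missing:
            problems.append((i, "incomplete: " + ", ".join(missing)))
        # Duplicate: same values in the required fields as an earlier record.
        key = tuple(rec.get(f) for f in required_fields)
        if key in seen:
            problems.append((i, "duplicate"))
        seen.add(key)
    return problems
```

Checks for outdated or inaccurate values would need reference data or business rules, which this sketch does not attempt.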
13
Q

Data Life Cycle: Model illustrating how data travels throughout an organisation

A
  1. Principle of Diminishing Data Value
    - The value of data diminishes as they age.
    - Blind spots (lack of data availability) of 30 days or longer inhibit peak performance.
    - Global financial services institutions rely on near-real-time data for peak performance.
  2. Principle of 90/90 Data Use
    - As much as 90 percent of data is seldom accessed after 90 days (except for auditing purposes).
    - Roughly 90 percent of data lose most of their value after 3 months.
  3. Principle of data in context
    - The capability to capture, process, format, and distribute data in near real time or faster requires a huge investment in data architecture.
    - The investment can be justified on the principle that data must be integrated, processed, analyzed, and formatted into “actionable information.”
14
Q

Master Reference File and Data Entities:

A

As data volumes explode, database performance degrades.
Solution: master data and master data management (MDM) (see chapter 2).
MDM processes integrate data from a variety of sources to create a more complete view of an entity.

15
Q

Market share

A

Percentage of total sales in a market captured by a brand, product, or company.
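
The definition translates directly into a ratio. A minimal sketch, with hypothetical function and parameter names:

```python
def market_share(company_sales, total_market_sales):
    """Company's share of total market sales, as a percentage."""
    if total_market_sales <= 0:
        raise ValueError("total market sales must be positive")
    return 100.0 * company_sales / total_market_sales
```

For example, a company with sales of 25 in a market with total sales of 200 has `market_share(25, 200)` of 12.5 percent.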

16
Q

Operating Margin

A
  • A measure of the percent of a company’s revenue left over after paying variable costs: wages, raw materials, etc.
  • Increased margins mean earning more per dollar of sales.
  • The higher the operating margin, the better.
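
Following the definition on this card (revenue left over after variable costs, as a percent of revenue), the calculation can be sketched as below. The function and parameter names are illustrative assumptions.

```python
def operating_margin(revenue, variable_costs):
    """Percent of revenue left over after paying variable costs."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return 100.0 * (revenue - variable_costs) / revenue
```

With revenue of 500 and variable costs of 400, `operating_margin(500, 400)` is 20.0, meaning 20 cents kept per dollar of sales.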
17
Q

Enterprise data warehouses (EDW)

A
  • Data warehouses that pull together data from disparate sources and databases across an entire enterprise.
  • Warehouses are the primary source of cleansed data for analysis, reporting, and Business Intelligence (BI).
  • Their high costs can be subsidized by using Data marts.
18
Q

3 Procedures to Prepare EDW Data for Analytics

A
  • Extract from designated databases.
  • Transform by standardizing formats, cleaning the data, and integrating them.
  • Load into a data warehouse.
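
The three steps above (extract, transform, load, or ETL) can be sketched as three small functions. This is a toy illustration under stated assumptions: the source is a list of dictionaries, the "warehouse" is a plain Python list, and the field names `name` and `amount` are hypothetical.

```python
def extract(source_rows):
    """Extract: copy rows out of the designated source."""
    return list(source_rows)


def transform(rows):
    """Transform: standardize formats and drop rows with no name."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()  # standardize name format
        if not name:
            continue  # cleaning: skip incomplete rows
        cleaned.append({"name": name, "amount": round(float(row["amount"]), 2)})
    return cleaned


def load(rows, warehouse):
    """Load: append the cleaned rows to the warehouse; return rows loaded."""
    warehouse.extend(rows)
    return len(rows)
```

A full run chains the steps: `load(transform(extract(source)), warehouse)`.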
19
Q

Change Data Capture (CDC)

A

Minimises the resources required by ETL by focusing primarily on data changes.
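
To illustrate the idea of moving only the changes, the sketch below compares two snapshots keyed by primary key and returns just the differences. Snapshot comparison is a simplifying assumption made here; production CDC tools often read database transaction logs instead. The function name is hypothetical.

```python
def capture_changes(previous, current):
    """Return only the rows inserted, updated, or deleted between snapshots."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}
```

Only the returned changes would then need to pass through the transform and load steps, rather than the full data set.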

20
Q

Active Data Warehouse (ADW)

A
  • Real-time data warehousing and analytics.
  • Transform by standardizing formats, cleaning the data, and integrating them.

21
Q

Hadoop

A

An Apache processing platform that places no conditions on the structure of the data it processes.
Distributes computing problems across a number of servers.

22
Q

MapReduce

A

Provides a reliable, fault-tolerant software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware.

23
Q

Map stage

A

Breaks up huge data sets into subsets, then distributes them across several servers for processing.

24
Q

Reduce stage

A

Recombines partial results and makes them available to analytical tools.
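
The Map and Reduce stages can be sketched with the classic word-count example. This is a single-process illustration in plain Python, not the Hadoop API: each "chunk" stands in for a subset that would be sent to a separate server, and all function names are hypothetical.

```python
from collections import defaultdict


def map_stage(chunk):
    """Map: turn one subset of the data into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_stage(mapped_chunks):
    """Reduce: recombine the partial results into total counts per word."""
    totals = defaultdict(int)
    for pairs in mapped_chunks:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)


def word_count(chunks):
    """Run the map stage on every chunk, then reduce the partial results."""
    return reduce_stage(map_stage(chunk) for chunk in chunks)
```

In a real cluster the map calls would run in parallel on different nodes, and the framework would handle distribution and fault tolerance.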

25
Q

Data Mining

A

Software that enables users to analyze data from various dimensions or angles, categorize them, and find correlative patterns among fields in the data warehouse.

26
Q

Text Mining

A

A broad category that involves interpreting words and concepts in context (for example, tracking what is said about my company).

27
Q

Sentiment Analysis

A

Trying to understand consumer intent.

28
Q

Text Analytics (Mining) Procedure

A
  1. Exploration
    - Simple word counts
    - Topics consolidation
  2. Preprocessing
    - Standardization
    - May be 80% of processing time
    - Grammar and spell checking
  3. Categorizing and Modelling
    - Create business rules and train models for accuracy and precision
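
A minimal sketch of the standardization and simple word-count steps from this procedure. It lowercases text and keeps alphabetic tokens only; the function names and sample sentences are hypothetical, and grammar checking, spell checking, and model training are out of scope here.

```python
import re
from collections import Counter


def preprocess(text):
    """Standardize: lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(documents, n=3):
    """Exploration: count words across documents, return the n most common."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts.most_common(n)
```

For instance, `top_words(["Great service, great price!", "Service was slow."], n=2)` surfaces "great" and "service" as the most frequent terms.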