session 6: data knowledge mgmt Flashcards

1
Q

database

A

is a collection of related data files or tables that contain data

2
Q

difficulties in managing data (10)

A

1) data increase exponentially with time
2) data are scattered throughout org.
3) multiple sources of data
4) data become outdated
5) data media rot (storage media degrade over time)
6) data security/quality/integrity may be compromised
7) new sources of data
8) legal requirements need to be met with appropriate data-storage methods
9) legacy IT systems/functional requirements may result in redundancy or inconsistency
10) high volumes of big data + variety of data collected increase complexity

3
Q

sources of data

A

internal sources: corporate database, company docs…
personal sources: personal thoughts, opinions…
external sources: commercial database, gov. reports, corporate website…
new sources: blogs, podcasts, tweets, etc.

4
Q

clickstream data

A

data that visitors and customers produce when they visit a website and click on hyperlinks
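
As a rough illustration of what clickstream data can look like, the sketch below treats each click as a (visitor, page) pair and derives two simple views from it. The event list, page paths and function names are made up for this example and do not come from any particular analytics tool.

```python
from collections import Counter

# Hypothetical clickstream events: (visitor_id, page_clicked), in click order.
clicks = [
    ("v1", "/home"),
    ("v1", "/products"),
    ("v2", "/home"),
    ("v1", "/cart"),
    ("v2", "/products"),
]


def clicks_per_page(events):
    """Count how many times each page was clicked."""
    return Counter(page for _, page in events)


def path_for(events, visitor_id):
    """Return the ordered list of pages one visitor clicked through."""
    return [page for vid, page in events if vid == visitor_id]


page_counts = clicks_per_page(clicks)
v1_path = path_for(clicks, "v1")
```

Counting clicks per page and replaying one visitor's path are only two of many possible views; real clickstream records would usually also carry timestamps and referrers.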

5
Q

Data governance (subset of IT governance)

A

an approach to managing info across an entire organization

6
Q

data governance objective

A

enable available, transparent, useful data => single version of the truth

7
Q

data governance involves…

A

provides a planned approach to data mgmt for all types of data
includes a formal set of business processes for data handling
requires well-defined, unambiguous rules that address creating, collecting, handling and protecting data

8
Q

master data mgmt

A

process that spans all of an organization’s business processes and applications

9
Q

master data mgmt goal

A

goal: effectively store, maintain, exchange and synchronize master data
provide consistent, accurate, timely, up-to-date master data

10
Q

master data def

A

set of core data such as customer, product, employee, vendor, etc.
stored in a master file or as tables as part of the database

11
Q

transactional data def

A

generated and captured by operational systems; describes the business’s activities
represents activities or events (payroll cheques, customer invoices, etc.)
stored in transaction files or as tables in the database
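
To make the master/transactional split concrete, here is a minimal sketch in which a small customer table plays the role of master data and a list of invoices plays the role of transactional data. The customer IDs, names, amounts and the `total_by_customer` helper are all hypothetical, chosen only for illustration.

```python
# Master data: core entities that change rarely (hypothetical customers).
customers = {
    "C1": {"name": "Acme Ltd"},
    "C2": {"name": "Globex"},
}

# Transactional data: one record per business event (hypothetical invoices).
invoices = [
    {"invoice_id": 1, "customer_id": "C1", "amount": 120.0},
    {"invoice_id": 2, "customer_id": "C2", "amount": 75.5},
    {"invoice_id": 3, "customer_id": "C1", "amount": 30.0},
]


def total_by_customer(master, transactions):
    """Sum invoice amounts per customer name by looking up the master record."""
    totals = {}
    for t in transactions:
        name = master[t["customer_id"]]["name"]
        totals[name] = totals.get(name, 0.0) + t["amount"]
    return totals


totals = total_by_customer(customers, invoices)
```

Each invoice stores only a customer ID, so the customer's name lives in one place. Keeping that single copy consistent is roughly what master data mgmt is concerned with.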

12
Q

big data def

A

collection of data that is so large and complex that it is difficult to manage using traditional database mgmt systems

13
Q

characteristics of big data

A

exhibit variety
include unstructured/structured/semi-structured data
generated at high velocity with an uncertain pattern
do not fit neatly into traditional, structured, relational databases
can be captured, processed, transformed and analyzed in a reasonable amount of time only by sophisticated information systems

14
Q

sources of big data

A

traditional enterprise data (customer info, web store transactions…)
machine-generated/sensor data (smart meters, manufacturing sensors…)
social data (feedback comments…)
images captured by billions of devices

15
Q

big data 3V

A

volume
velocity
variety

16
Q

Issues with big data

A

can come from untrusted sources
big data is dirty (inaccurate, incomplete, incorrect, etc.)
big data changes, especially in data streams

17
Q

data warehouses def

A

repository of historical data organized by subject to support decision makers

18
Q

data mart def

A

low-cost, scaled-down versions of a data warehouse designed for end-user needs in a strategic business unit

19
Q

Query by example (QBE)

A

method of creating database queries that allows users to search for documents based on an example in the form of a selected string of text

20
Q

characteristics of data warehouses and data marts (6)

A

organized by business dimension or subject
use online analytical processing
integrated
time variant
nonvolatile
multidimensional

21
Q

ETL

A

extract, transform, load
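
The three ETL steps can be sketched as three small functions. This is only a toy illustration under stated assumptions: the source is an in-memory CSV string, the target is an in-memory SQLite table, and the column names, sample rows and cleaning rules are invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: raw sales rows with messy formatting.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,7.25
3,alice,
"""


def extract(text):
    """Extract: read raw rows from the source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, convert types, drop rows with no amount."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record: skipped in this sketch
        customer = row["customer"].strip().lower()
        clean.append((int(row["order_id"]), customer, float(amount)))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE sales (order_id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT customer, amount FROM sales ORDER BY order_id").fetchall()
```

Dropping the incomplete row is one possible cleansing rule; a real pipeline might instead flag or repair such records.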

22
Q

generic data warehouse environment

A

source systems : provide data to the warehouse or mart
data integration technology and processes: prepare data for use
storing data: handled by variety of architectures
metadata: data about data
data quality issues: data cleansing needs to be used to ensure data meet users’ needs
BI governance: establishing people, committees and processes to maintain data warehouses
users: business value for users rises when data can be accessed quickly

23
Q

data lakes def

A

central repository that stores all of an organization’s data, regardless of its source or format

24
Q

information silo def

A

an info system that does not communicate with other related info systems in an organization

25
Q

how companies can use big data to gain a competitive advantage

A

strategies:

make big data available
use big data to conduct experiments
micro-segmentation of customers
creating new business models

use in functional areas of org. :
human resources (employee benefits, hiring..)
product development (capture customer preferences…)
operations (analyze data to make operations more efficient)
marketing (better understanding customers…)
gov operations

26
Q

architectures for data mart and data warehouses

A

one central enterprise data warehouse (without data marts)
independent data marts: data marts store data for a single application or a few
hub and spoke: contains a central data warehouse that stores the data plus multiple dependent data marts that source their data from the central repository

27
Q

benefits of data warehousing

A
  • end users can access needed data quickly and easily through web browsers because these data are located in one place
  • end users can conduct extensive analysis with data in ways that were not previously possible
  • end users can obtain a consolidated view of organizational data
28
Q

data warehouse and data lakes differences

A

data:
warehouse:
relational data from transactional systems, operational databases and line-of-business applications
lakes:
non-relational and relational from IoT devices, websites, mobile apps, social media and corporate applications

schema:
warehouse:
designed prior to the DW implementation (schema-on-write)
lake:
written at the time of analysis (schema-on-read)

price/performance
warehouse:
fastest query results using higher cost storage
lake:
query results getting faster using low-cost storage

data quality:
warehouse:
highly curated data that serves as the central version of the truth
lakes:
any data that may or may not be curated

users:
warehouse:
business analysts
lakes:
data scientists, data developers, business analysts

analytics:
warehouse: batch reporting, BI and visualizations
lake: machine learning, predictive analytics, data discovery and profiling
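
The schema row above (schema-on-write vs schema-on-read) is perhaps easiest to see side by side. The sketch below is a simplified illustration, not how any specific warehouse or lake product works: SQLite stands in for the warehouse, a list of raw JSON strings stands in for the lake, and the device readings and field names are hypothetical.

```python
import json
import sqlite3

# Schema-on-write (warehouse style): the table structure is fixed before loading.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device TEXT NOT NULL, temp_c REAL NOT NULL)")
conn.execute("INSERT INTO readings VALUES (?, ?)", ("d1", 21.5))
warehouse_rows = conn.execute("SELECT device, temp_c FROM readings").fetchall()

# Schema-on-read (lake style): raw records are stored as-is, in whatever shape.
raw_lake = [
    '{"device": "d1", "temp_c": 21.5}',
    '{"device": "d2", "temperature": "19.0", "unit": "C"}',
    '{"device": "d3"}',
]


def read_temps(raw_records):
    """Apply a schema at analysis time: pick out a temperature where one exists."""
    result = {}
    for line in raw_records:
        record = json.loads(line)
        value = record.get("temp_c", record.get("temperature"))
        if value is not None:
            result[record["device"]] = float(value)
    return result


lake_temps = read_temps(raw_lake)
```

In the warehouse half, a row that does not match the table definition is rejected at load time. In the lake half, every record is kept and the analysis code decides which fields matter.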