Session 6: Data and Knowledge Mgmt Flashcards
database
a collection of related files or tables that contain data
difficulties in managing data (10)
1) data increase exponentially with time
2) data are scattered throughout org.
3) multiple sources of data
4) data become outdated
5) data rot: storage media degrade over time
6) data security/quality/integrity may be compromised
7) new sources of data
8) legal requirements need to be met with appropriate data-storage methods
9) legacy IT systems/functional requirements may result in redundancy or inconsistency
10) high volumes of big data + variety of data collected increase complexity
sources of data
internal sources: corporate database, company docs…
personal sources: personal thoughts, opinions…
external sources: commercial databases, gov. reports, corporate websites…
new sources: blogs, podcasts, tweets, etc.
clickstream data
data that visitors and customers produce when they visit a website and click on hyperlinks
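A minimal sketch of what clickstream records might look like and how they could be summarized. The field names (`visitor_id`, `url`, `timestamp`) and the sample values are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream events: one record per hyperlink click.
clicks = [
    {"visitor_id": "v1", "url": "/home", "timestamp": 1},
    {"visitor_id": "v1", "url": "/products", "timestamp": 2},
    {"visitor_id": "v2", "url": "/home", "timestamp": 3},
    {"visitor_id": "v1", "url": "/cart", "timestamp": 4},
]

def page_counts(events):
    """Count how many clicks each page received."""
    return Counter(e["url"] for e in events)

def paths_by_visitor(events):
    """Reconstruct each visitor's click path in time order."""
    paths = defaultdict(list)
    for e in sorted(events, key=lambda e: e["timestamp"]):
        paths[e["visitor_id"]].append(e["url"])
    return dict(paths)
```

Calling `paths_by_visitor(clicks)` groups the clicks into one ordered path per visitor, which is the kind of raw material clickstream analysis starts from.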
Data governance (subset of IT governance)
an approach to managing info across an entire organization
data governance objective
enable available, transparent, useful data => single version of the truth
data governance involves…
provides a planned approach to data mgmt for all types of data
includes a formal set of business processes for data handling
requires well-defined, unambiguous rules => which address creating, collecting, handling, protecting data
master data mgmt
process that spans all of an organization’s business processes and applications
master data mgmt goal
goal: effectively store, maintain, exchange and synchronize master data
provide consistent, accurate, timely, up-to-date master data
master data def
set of core data such as customer, product, employee, vendor, etc.
stored in a master file or as tables as part of the database
transactional data def
generated and captured by operational systems; describes the business’s activities
represents activities or events (payroll cheques, customer invoices, etc.)
stored in transaction files or as tables in the database
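The split between master data and transactional data can be sketched with two tables. This is an illustrative example only: the table names (`customer`, `invoice`), columns, and sample rows are assumptions, and SQLite stands in for whatever database an organization actually uses.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with one master table and one transaction table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Master data: core entities shared across applications.
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- Transactional data: one row per business event.
        CREATE TABLE invoice (
            invoice_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            amount REAL NOT NULL
        );
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO invoice VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
    """)
    return conn

def total_per_customer(conn):
    """Join transactions to master data and total invoice amounts per customer."""
    rows = conn.execute("""
        SELECT c.name, SUM(i.amount)
        FROM invoice AS i
        JOIN customer AS c ON c.customer_id = i.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """).fetchall()
    return dict(rows)
```

Each invoice row refers to a customer by `customer_id` rather than repeating the customer's name, so the master record is stored once and many transactions point to it.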
big data def
collection of data that is so large and complex that it is difficult to manage using traditional database mgmt systems
characteristics of big data
exhibit variety
include unstructured/structured/ semi-structured data
generated at high velocity with an uncertain pattern
do not fit neatly into traditional, structured, relational databases
can be captured, processed, transformed and analyzed in a reasonable amount of time only by sophisticated information systems
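A rough sketch of the structured / semi-structured / unstructured distinction. The sample values and helper names are made up for illustration; real big data would be far larger and would not be handled with in-memory strings like this.

```python
import csv
import io
import json

# Structured: fixed columns, fits a relational table.
structured = "customer_id,amount\n1,250.0\n2,75.0\n"

# Semi-structured: self-describing, fields may vary per record.
semi_structured = '[{"id": 1, "tags": ["vip"]}, {"id": 2, "note": "new"}]'

# Unstructured: free text with no predefined fields.
unstructured = "Great service, but delivery was late."

def parse_structured(text):
    """Parse CSV text; every row has the same columns."""
    return list(csv.DictReader(io.StringIO(text)))

def field_names(json_text):
    """List the keys used by each semi-structured record."""
    return [sorted(rec.keys()) for rec in json.loads(json_text)]

def word_count(text):
    """Unstructured text has no fields to read; a crude word count is one simple measure."""
    return len(text.split())
```

The two JSON records carry different keys, which is why such data does not fit neatly into a single fixed-column relational table.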
sources of big data
traditional enterprise data (customer info, web store transactions…)
machine-generated/sensor data (smart meters, manufacturing sensors…)
social data (feedback comments…)
images captured by billions of devices
big data 3V
volume
velocity
variety