Topic 3 - Data and Knowledge Management Flashcards
Difficulties of managing data
- Amount of data is increasing exponentially.
- Data is scattered throughout organizations.
- Data is generated from multiple sources.
- New sources of data are constantly being developed.
- Data becomes less current over time.
- Data rot
- Data security, quality, and integrity are critical, yet they are easily jeopardized.
- Federal government regulations require companies to account for how information is being managed within their organizations.
- Companies are drowning in data, much of which is unstructured.
- Big data
Data silo
A collection of data held by one group that is not easily accessible by other groups. Data silos hinder the process of gaining actionable insight from organizational data and create barriers to an overall view of the enterprise and its data.
Data streams
Data that are continuously generated by sources such as point-of-sale systems, website clickstreams, social media, and sensors.
Data rot
Refers primarily to problems with the media on which the data are stored.
- Temperature, humidity, and exposure to light can cause physical problems with storage media and make it difficult to access data.
- Another aspect is finding the machines needed to access the data.
Data governance
An approach to managing information across an entire organization.
- Involves a formal set of business processes and policies that are designed to ensure that data are handled in a certain, well-defined fashion.
- The objective is to make information available, transparent, and useful for the people who are authorized to access it, from the moment it enters an organization until it becomes outdated and is deleted.
Master data management
A process that spans all of an organization’s business processes and applications.
- Strategy for implementing data governance.
- Provides companies the ability to store, maintain, exchange, and synchronize a consistent, accurate, and timely “single version of the truth” for the company’s master data.
Master data
A set of core data (e.g., customer, product, employee, vendor, geographic location) that span the enterprise’s information systems.
Transactional data
Data generated and captured by operational systems that describe the business’s activities, or transactions.
Data file
A collection of logically related records.
Database systems minimize the following problems:
- Data redundancy – same data stored in multiple locations.
- Data isolation – applications cannot access data associated with other applications.
- Data inconsistency – various copies of the data do not agree.
Database systems also maximize the following benefits:
- Data security – Because data are stored in one place in a database, there is a risk of losing a lot of data at once. Databases must have extremely high security measures in place to minimize mistakes and deter attacks.
- Data integrity – Data meet certain constraints (e.g., there are no letters in a SIN, or Social Insurance Number).
- Data independence – Applications and data are independent of one another (they are not linked to each other), so all applications are able to access the same data.
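The SIN example of data integrity can be enforced by the DBMS itself rather than by each application. A minimal sketch using Python's built-in `sqlite3` module; the `person` table, its column names, and the rule that a SIN is exactly nine digits are assumptions made for illustration.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `person` table."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint rejects any SIN that is not exactly 9 digits,
    # so letters can never be stored in this field.
    conn.execute(
        """
        CREATE TABLE person (
            sin  TEXT PRIMARY KEY
                 CHECK (length(sin) = 9 AND sin NOT GLOB '*[^0-9]*'),
            name TEXT NOT NULL
        )
        """
    )
    return conn

def try_insert(conn: sqlite3.Connection, sin: str, name: str) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO person (sin, name) VALUES (?, ?)", (sin, name))
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_db()
print(try_insert(conn, "123456789", "Ana"))  # True
print(try_insert(conn, "12345678A", "Ben"))  # False: contains a letter
```

Because the rule lives in the database, every application that writes to `person` gets the same check.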
A bit (binary digit)
the smallest unit of data a computer can process. Binary means a bit can hold only a 0 or a 1.
A byte
a group of 8 bits that represents a single character, which can be a letter, number, or symbol.
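To make the bit/byte relationship concrete, the sketch below prints the 8-bit pattern that represents a character. It assumes ASCII encoding, where each character fits in one byte; other encodings such as UTF-8 can use more than one byte for some characters.

```python
def char_to_bits(ch: str) -> str:
    """Return the 8-bit pattern (one byte) for a single ASCII character."""
    code = ord(ch)
    if code > 127:
        # Assumption of this sketch: only ASCII, so one character = one byte.
        raise ValueError("this sketch only handles ASCII characters")
    return format(code, "08b")  # pad to 8 binary digits

for ch in "A7?":
    print(ch, char_to_bits(ch))
# A 01000001
# 7 00110111
# ? 00111111
```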
Field
a characteristic of interest that describes an entity. It is a logical grouping of characters into a word, a small group of words, or an identification number.
Record
a logical grouping of related fields.
Table
a logical grouping of related records. AKA a data file.
Database
a logical grouping of related files.
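The hierarchy of field, record, table, and database can be mirrored with ordinary Python data structures. A minimal sketch; the `Student` and `Course` types, their fields, and the sample values are hypothetical, and a real DBMS stores these structures on disk rather than in Python lists.

```python
from dataclasses import dataclass

@dataclass
class Student:
    """One record: a logical grouping of related fields."""
    student_number: str  # field
    name: str            # field
    major: str           # field

@dataclass
class Course:
    """One record in a second, related table."""
    code: str
    title: str

# A table (data file): a logical grouping of related records.
students = [
    Student("1001", "Ana", "Mathematics"),
    Student("1002", "Ben", "Computer Science"),
]
courses = [Course("MIS101", "Intro to Information Systems")]

# A database: a logical grouping of related tables (files).
database = {"students": students, "courses": courses}

print(len(database), "tables;", len(database["students"]), "student records")
# 2 tables; 2 student records
```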
Database management system (DBMS)
A set of programs that provide users with tools to create and manage a database.
- Managing a database refers to the processes of adding, deleting, accessing, modifying, and analyzing data that are stored in a database.
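The management tasks listed above (adding, deleting, accessing, modifying, and analyzing) map onto ordinary SQL statements that a DBMS executes. A minimal sketch with Python's built-in `sqlite3`; the `product` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway database for illustration
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Adding data
conn.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [("Keyboard", 50.0), ("Mouse", 20.0), ("Monitor", 200.0)],
)
# Modifying data
conn.execute("UPDATE product SET price = 25.0 WHERE name = 'Mouse'")
# Deleting data
conn.execute("DELETE FROM product WHERE name = 'Keyboard'")
# Accessing data
rows = conn.execute("SELECT name, price FROM product ORDER BY name").fetchall()
# Analyzing data (a simple aggregate)
(avg_price,) = conn.execute("SELECT AVG(price) FROM product").fetchone()
conn.commit()

print(rows)       # [('Monitor', 200.0), ('Mouse', 25.0)]
print(avg_price)  # 112.5
```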
Relational database model
Data model based on the simple concept of tables in order to capitalize on characteristics of rows and columns of data.
Data model
A diagram that represents entities in the database and their relationships
Entity
A person, a place, a thing, or an event about which an organization maintains information.
- A record generally describes an entity.
Instance
Each row in a relational table, which is a specific, unique representation of the entity.
Attribute
Each characteristic or quality of a particular entity.
Primary key
A field (or attribute) of a record that uniquely identifies that record so that it can be retrieved, updated, and sorted (e.g., student number)
Secondary key
A field that has some identifying information, but typically does not uniquely identify a record with complete accuracy (e.g., Student’s major if a user wanted to identify all of the students majoring in a particular field of study).
Foreign key
A field (or a group of fields) in one table that uniquely identifies a row of another table.
- Used to establish and enforce a link between two tables.
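The three kinds of keys can be seen together in a small schema. A minimal sketch using Python's built-in `sqlite3`; the `student` and `enrollment` tables, their columns, the course code, and the sample rows are hypothetical. Two details are SQLite behaviour rather than general rules: foreign keys are only enforced when `PRAGMA foreign_keys` is turned on, and the index on `major` only speeds up secondary-key lookups (it does not make `major` unique).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.executescript("""
CREATE TABLE student (
    student_number TEXT PRIMARY KEY,   -- primary key: uniquely identifies a record
    name           TEXT NOT NULL,
    major          TEXT                -- secondary key: many students can share a major
);
CREATE INDEX idx_student_major ON student(major);

CREATE TABLE enrollment (
    student_number TEXT NOT NULL REFERENCES student(student_number),  -- foreign key
    course_code    TEXT NOT NULL,
    PRIMARY KEY (student_number, course_code)
);
""")

conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("1001", "Ana", "Finance"), ("1002", "Ben", "Marketing"), ("1003", "Cal", "Finance")],
)
conn.commit()

def enroll(student_number: str, course_code: str) -> bool:
    """Add an enrollment row; return False if a key constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO enrollment VALUES (?, ?)", (student_number, course_code))
        return True
    except sqlite3.IntegrityError:
        return False

# Primary key lookup: returns at most one record
print(conn.execute("SELECT name FROM student WHERE student_number = ?", ("1002",)).fetchone())  # ('Ben',)
# Secondary key lookup: can return many records
print(conn.execute("SELECT name FROM student WHERE major = ? ORDER BY name", ("Finance",)).fetchall())  # [('Ana',), ('Cal',)]
print(enroll("1001", "MIS101"))  # True
print(enroll("9999", "MIS101"))  # False: no student 9999, so the foreign key check fails
```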
Structured data
highly organized data in fixed fields in a data repository such as a relational database that must be defined in terms of field name and type (e.g., alphanumeric, numeric, and currency).
Unstructured data
Data that do not reside in a traditional relational database (e.g., email messages, word processing documents, videos).
Big Data
A collection of data that is so large and complex that it is difficult to manage using traditional database management systems.
Characteristics of Big Data
1. Volume – mass amounts of data.
- A single jet engine can generate 10 terabytes of data in 30 minutes.
2. Velocity – the rate at which data flow into an organization is rapidly increasing.
- Critical because it increases the speed of the feedback loop between a company, its customers, its suppliers, and its business partners.
3. Variety – Big Data formats change rapidly.
- Includes satellite imagery, broadcast audio streams, digital music files, web page content, scans of government documents, and comments posted on social media networks.
Big Data generally consists of the following:
- Traditional enterprise data (e.g., customer relationship management systems, operations data, etc.).
- Machine-generated/sensor data (e.g., smart meters, manufacturing sensors, and sensors integrated into smartphones and automobiles).
- Social data (e.g., customer feedback comments, microblogging sites such as Twitter, and social media sites).
- Images captured by billions of devices located throughout the world, from digital cameras and camera phones to medical scanners and security cameras.
Issues with Big Data
- Big Data can come from untrusted sources.
- Big Data is dirty. Dirty data are inaccurate, duplicate, or erroneous data (e.g., misspelled words, and duplicate data such as retweets or company press releases that appear multiple times in social media).
- Big Data changes.
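Duplicate posts such as retweets are one of the simpler kinds of dirty data to detect. A minimal sketch in Python; the normalization rules (lowercasing, collapsing whitespace, and stripping an assumed `RT @user:` retweet prefix) and the sample posts are hypothetical. Catching misspellings would need additional techniques, such as fuzzy matching, that are not shown here.

```python
import re

# Assumed retweet format for this sketch: "RT @user: original text"
RETWEET_PREFIX = re.compile(r"^rt\s+@\w+:\s*")

def normalize(post: str) -> str:
    """Reduce a post to a comparison key (hypothetical cleaning rules)."""
    text = " ".join(post.lower().split())  # lowercase and collapse whitespace
    return RETWEET_PREFIX.sub("", text)    # treat a retweet as a copy of the original

def deduplicate(posts: list[str]) -> list[str]:
    """Keep the first occurrence of each post, comparing normalized text."""
    seen: set[str] = set()
    unique = []
    for post in posts:
        key = normalize(post)
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique

posts = [
    "Acme launches new product line",
    "RT @acme: Acme launches new product line",
    "acme launches  new product line",
    "Acme reports Q3 earnings",
]
print(deduplicate(posts))
# ['Acme launches new product line', 'Acme reports Q3 earnings']
```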
Massively parallel processing
the coordinated processing of an application by multiple processors that work on different parts of the application, with each processor utilizing its own operating system and memory.
Cold data
Relatively inactive data that do not have to be accessed frequently or rapidly.
Hot data
Data that must be accessed frequently and rapidly.
Data warehouse
A repository of historical data that are organized by subject to support decision makers within the organization.
Data mart
a low-cost, scaled-down version of a data warehouse that is designed for the end-user needs in a strategic business unit or an individual department.
Knowledge management (KM)
a process that helps organizations manipulate important knowledge that comprises part of the organization’s memory, usually in an unstructured format.
Explicit knowledge
The more objective, rational, and technical types of knowledge.
Tacit knowledge
The cumulative store of subjective or experiential learning, which is highly personal and hard to formalize.
Knowledge management systems (KMSs)
the use of modern technologies – the internet, intranets, extranets, and databases – to systemize, enhance, and expedite knowledge management both within one firm and among multiple firms.
The KMS Cycle
- Create knowledge – Knowledge is created as people determine new ways of doing things or develop know-how. Sometimes external knowledge is brought in.
- Capture knowledge – New knowledge must be identified as valuable and be presented in a reasonable way.
- Refine knowledge – New knowledge must be placed in context so that it is actionable.
- Store knowledge – useful knowledge must be stored in a reasonable format in a knowledge repository so that other people in the organization can access it.
- Manage knowledge – the knowledge must be kept current. Must be reviewed regularly to verify that it is relevant and accurate.
- Disseminate knowledge – knowledge must be made available in a useful format to anyone in the organization who needs it, anywhere and anytime.
Structured query language (SQL)
The most popular query language for requesting information from a relational database.
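A short example of an SQL query, run through Python's built-in `sqlite3` so it is self-contained. The `customer` and `orders` tables and their data are hypothetical; the query itself (a join plus `GROUP BY`) is standard SQL. The statement describes what data are wanted, and the DBMS decides how to retrieve them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customer(id), amount REAL);
INSERT INTO customer VALUES (1, 'Ana', 'Toronto'), (2, 'Ben', 'Halifax');
INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 20.0), (3, 2, 15.0);
""")

# Total order amount per customer, largest first
query = """
SELECT c.name, SUM(o.amount) AS total
FROM customer AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY total DESC
"""
for name, total in conn.execute(query):
    print(name, total)
# Ana 50.0
# Ben 15.0
```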
Query by example (QBE)
A method to obtain information from a relational database by filling out a grid or template – also known as a form – to construct a sample or a description of the data desired.
Entity-relationship modelling
The process of designing a database by organizing data entities to be used and identifying the relationships among them.
Entity-relationship (ER) diagram
A document that shows data entities, their attributes, and the relationships among them.
Data dictionary
A collection of definitions of data elements; data characteristics that use the data elements; and the individuals, business functions, applications, and reports that use these data elements.
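Part of a data dictionary (the names, types, and constraints of data elements) is already recorded in the DBMS catalog, while the business side (which people, functions, applications, and reports use each element) is documented separately in this sketch. A minimal example of combining the two with Python's `sqlite3`, assuming a hypothetical `employee` table and a hand-maintained `BUSINESS_NOTES` mapping; `PRAGMA table_info` is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    salary      REAL
)
""")

# Hypothetical business metadata: what each element means and who uses it.
# The DBMS does not store this, so it is maintained by hand here.
BUSINESS_NOTES = {
    "salary": {"description": "Annual base salary", "used_by": ["Payroll", "HR reports"]},
}

def build_data_dictionary(conn: sqlite3.Connection, table: str) -> dict:
    """Combine column metadata from the DBMS catalog with hand-written notes.

    `table` is assumed to be a trusted name, since it is interpolated into the PRAGMA.
    """
    entries = {}
    # Each PRAGMA table_info row: (cid, name, type, notnull, default, pk)
    for _cid, name, col_type, notnull, _default, pk in conn.execute(f"PRAGMA table_info({table})"):
        entry = {"type": col_type, "not_null": bool(notnull), "primary_key": pk > 0}
        entry.update(BUSINESS_NOTES.get(name, {}))
        entries[name] = entry
    return entries

for column, info in build_data_dictionary(conn, "employee").items():
    print(column, info)
```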
Normalization
a method for analyzing and reducing a relational database to its most streamlined form to ensure minimum redundancy, maximum data integrity, and optimal processing performance.
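A simplified illustration of the idea behind normalization: customer details that repeat on every order row are moved into their own table, keyed by `customer_id`, so each fact is stored once. This is plain Python with hypothetical data; it does not walk through the formal normal forms (1NF, 2NF, 3NF).

```python
# Unnormalized rows: customer details are repeated on every order (redundancy).
flat_orders = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Ana", "city": "Toronto", "amount": 30.0},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Ana", "city": "Toronto", "amount": 20.0},
    {"order_id": 3, "customer_id": "C2", "customer_name": "Ben", "city": "Halifax", "amount": 15.0},
]

def normalize_orders(rows: list[dict]) -> tuple[dict, list[dict]]:
    """Split flat rows into a customer table (keyed by customer_id) and an order table."""
    customers: dict[str, dict] = {}
    orders: list[dict] = []
    for row in rows:
        details = {"name": row["customer_name"], "city": row["city"]}
        existing = customers.setdefault(row["customer_id"], details)
        if existing != details:
            # The flat data disagree with themselves: a data inconsistency.
            raise ValueError(f"inconsistent details for customer {row['customer_id']}")
        # The order keeps only a foreign key to the customer, not the customer's details.
        orders.append({"order_id": row["order_id"], "customer_id": row["customer_id"], "amount": row["amount"]})
    return customers, orders

customers, orders = normalize_orders(flat_orders)
customers["C1"]["city"] = "Ottawa"  # one update now applies to every order for C1
print(customers)
# {'C1': {'name': 'Ana', 'city': 'Ottawa'}, 'C2': {'name': 'Ben', 'city': 'Halifax'}}
```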