Scavenger Hunt 8 Flashcards

1
Q

What is the rate of data growth?

A

Doubling every 6 months. This rate is unprecedented and will continue regardless of budget constraints
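
To make the doubling claim concrete, here is a minimal arithmetic sketch. The function name `projected_size` and the 1 TB starting volume are illustrative assumptions, not figures from the course material.

```python
def projected_size(initial_tb: float, months: float, doubling_months: float = 6.0) -> float:
    """Data volume after `months`, assuming it doubles every `doubling_months`."""
    return initial_tb * 2 ** (months / doubling_months)

# Hypothetical starting point of 1 TB.
one_year = projected_size(1.0, 12)     # two doublings
three_years = projected_size(1.0, 36)  # six doublings
print(one_year, three_years)
```

Under this assumption the volume grows 4x in one year and 64x in three years.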

2
Q

What is “information overload”? What is its alleged impact?

A

Vast flood of info is disruptive, like drinking from a fire hydrant
People can’t do their jobs with so much data

3
Q

By contrast, what is “information abundance” and what are its implications for knowledge workers?

A

Take advantage of the presence of all the new data to learn how to make meaningful info from it and gain managerial knowledge
Geek up!

4
Q

Business Intelligence

A

Combining aspects of reporting, data exploration & ad hoc queries, & sophisticated data modeling & analysis

5
Q

Analytics

A

Statistical and quantitative analysis of data
Explanatory and predictive models, and fact-based management to drive decisions and actions

6
Q

Data

A

Raw facts and figures
Tells you nothing alone

7
Q

Information

A

Data that has been presented in such a way that it answers questions or supports decision making

8
Q

Knowledge

A

Insight derived from experience and enterprise-savvy information

9
Q

Structured Data

A

Organized data that conforms to a model so that it can be searched, analyzed, and queried using traditional analysis tools

10
Q

Unstructured Data

A

Not organized, no schema
Ex. Text, email, Facebook pages, news stories
Binary

11
Q

Table

A

Organized collection of data made up of records and fields

12
Q

Record

A

Part of table
Row of data
Individual observation

13
Q

Fields

A

Part of table
Column
Attribute for data (fixed schema-textual data)
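
The table, record, and field cards can be tied together with a small sketch using Python's built-in sqlite3 module. The `customers` table and its contents are made up for illustration.

```python
import sqlite3

# In-memory database; the table and its rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "Boston"), (2, "Raj", "Denver")],
)

cursor = conn.execute("SELECT * FROM customers ORDER BY customer_id")
fields = [col[0] for col in cursor.description]  # columns = fields (attributes)
records = cursor.fetchall()                      # rows = records (observations)

print(fields)
print(records)
```

Each tuple in `records` is one row, and each name in `fields` is one column of the table.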

14
Q

Relational Database

A

A database that correlates data from multiple tables

15
Q

Relational Database Benefits

A

Combines and simplifies data

16
Q

Relational Database Key Field

A

One of the fields in a table marked with a key so that the data items are unique for that row
Never repeat
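
A minimal sketch of the "never repeat" rule, assuming a hypothetical `products` table whose `sku` field is the key. SQLite rejects a second row with the same key value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `sku` is declared as the key field, so its values must be unique.
conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products VALUES ('A100', 'Widget')")

try:
    # Same key value again: the database refuses the insert.
    conn.execute("INSERT INTO products VALUES ('A100', 'Gadget')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(duplicate_rejected, row_count)
```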

17
Q

Relational Database Valid Relationship Types

A

One:One -> Each key value occurs exactly once in Table A and matches exactly one record in Table B
One:Many -> Each key value occurs once in its own table but can match many records in other tables
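
As a rough illustration of a one-to-many relationship, the sketch below assumes hypothetical `customers` and `orders` tables: each customer ID appears once in `customers` but may appear many times in `orders`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the reference
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER REFERENCES customers(customer_id),
           total REAL)"""
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Raj")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)],
)

# One customer row matches many order rows.
orders_per_customer = conn.execute(
    """SELECT c.name, COUNT(o.order_id)
       FROM customers AS c
       JOIN orders AS o ON o.customer_id = c.customer_id
       GROUP BY c.customer_id
       ORDER BY c.customer_id"""
).fetchall()
print(orders_per_customer)
```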

18
Q

Relational Database Views

A

Display data from multiple related tables by combining them for reporting
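
A view can be sketched as a saved query that joins related tables. The table names, the view name `order_report`, and the data below are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Raj")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 25.0), (11, 2, 15.0)])

# The view stores the join; it holds no data of its own.
conn.execute(
    """CREATE VIEW order_report AS
       SELECT o.order_id, c.name, o.total
       FROM orders AS o
       JOIN customers AS c ON c.customer_id = o.customer_id"""
)

# Reporting code can now query the view as if it were a single table.
report_rows = conn.execute("SELECT * FROM order_report ORDER BY order_id").fetchall()
print(report_rows)
```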

19
Q

SQL

A

Structured Query Language
Most common language for creating & manipulating databases
Reigning champion of database languages in the business world

20
Q

TPS

A

Transaction Processing Systems
Ex. ATM, retail sales transactions, websites, searches

21
Q

(TPS) What is a “transaction?” What are its two key characteristics?

A

Any business exchange
1. Standardized-schema
2. Occurs repeatedly

22
Q

(TPS) Point of Sale system

A

Retail computer systems that collect sales data and are hooked directly into the store’s inventory-control system
Scan barcode, transaction happens

23
Q

(TPS) How do loyalty cards generate valuable data?

A

Membership program in which the company pays you, through bonuses, for data about you that you otherwise would not give them

24
Q

Enterprise Software

A

CRM, ERP, SCM
Applications that address the needs of multiple users throughout an organization or work group

25
Q

CRM

A

Customer Relationship Management System
Every sales call, every customer inquiry, every follow up call=data

26
Q

ERP

A

Enterprise Resource Planning System
Paychecks, invoices, payments=business transactions/data to seek insights

27
Q

SCM

A

Supply Chain Management System
Each order for finished goods/raw materials=transactions

28
Q

Business operations — examples

A

Health care patient data
Michigan tags cows at birth which gives lifetime stream of data for each cow raised in the state
Transportation industry: Plane engine produces 10 TB of data every 30 min (sensors on aircraft)
Swiss Rails: 100 data items a second (sensors on train & track)

29
Q

Sources of customer-provided data

A

Customer surveys: customer insight on products/services they received

Product registration cards: data about customer income, where they live, highest education, hobbies

Contests: cheap data where thousands of people apply, giving the company new data from all the entries

30
Q

Data aggregator

A

A company whose sole job is to collect data from a wide variety of sources, organize it, clean it, link the pieces together, and then sell access to it to others

31
Q

Data silos

A

Data collections completely separated with no possibility of communication or sharing

32
Q

How do data silos come into being?

A

Company may have some data trapped inside of obsolete legacy systems

33
Q

Why are data silos a problem?

A

Incompatible systems mean missed opportunities to see patterns, trends, and correlations, and to develop new insights to answer questions and make decisions

34
Q

How do inconsistent data formats impact a business?

A

Makes it hard to sort data
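
A small sketch of why mixed formats break sorting, and one way to normalize them first. The three date formats and the helper `normalize_date` are assumptions chosen for illustration.

```python
from datetime import datetime

# Formats we assume the different source systems use.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")

def normalize_date(raw: str) -> str:
    """Convert a date in any known format to ISO form (YYYY-MM-DD)."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognized date format: {raw!r}")

raw_dates = ["03/15/2024", "2024-01-02", "07.02.2024"]
sorted_as_text = sorted(raw_dates)  # text order, not chronological order
sorted_dates = sorted(normalize_date(d) for d in raw_dates)
print(sorted_as_text)
print(sorted_dates)
```

Sorting the raw strings orders them by their characters, while the normalized ISO strings sort chronologically.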

35
Q

Operational data

A

Data that is produced by your organization’s day-to-day operations
Things like customer, inventory, and purchase data

36
Q

How does the analysis of operational data compete with customers?

A

Analysis queries add a significant load to the operational system during business hours, causing delays and lost sales (best if we do not query operational data directly)

37
Q

What can a company do about the operational data vs customers problem?

A

Separate data repository:
- One for operational data
- One for reporting and analytics

Combine data from many sources and clean it

Historical data builds as months and days go by; used as a resource to see trends

Periodic import from operational systems allows analytical system to be up to date enough to come up with inferences
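
The periodic import can be sketched roughly as follows. Two in-memory SQLite databases stand in for the operational and analytical systems; the table names and the helper `import_new_sales` are hypothetical.

```python
import sqlite3

# Stand-in for the operational system that serves customers.
operational = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sold_on TEXT, amount REAL)")
operational.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2024-01-01", 20.0), (2, "2024-01-02", 35.0)],
)

# Stand-in for the separate reporting and analytics repository.
analytics = sqlite3.connect(":memory:")
analytics.execute("CREATE TABLE sales_history (sale_id INTEGER PRIMARY KEY, sold_on TEXT, amount REAL)")

def import_new_sales(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Copy rows the analytics store has not seen yet; return how many were copied."""
    last_id = target.execute("SELECT COALESCE(MAX(sale_id), 0) FROM sales_history").fetchone()[0]
    new_rows = source.execute(
        "SELECT sale_id, sold_on, amount FROM sales WHERE sale_id > ?", (last_id,)
    ).fetchall()
    target.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", new_rows)
    target.commit()
    return len(new_rows)

first_run = import_new_sales(operational, analytics)   # initial load
operational.execute("INSERT INTO sales VALUES (3, '2024-01-03', 12.5)")
second_run = import_new_sales(operational, analytics)  # picks up only the new sale

# Analysis runs against the copy, not the operational system.
history_total = analytics.execute("SELECT SUM(amount) FROM sales_history").fetchone()[0]
print(first_run, second_run, history_total)
```

In practice the import would run on a schedule outside business hours, so reporting queries never compete with customers for the operational system.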

38
Q

What is a Data Warehouse? What are its characteristics?

A

Collection of databases that supports decision making
- Many sources
- Operational systems-periodic transfer
- Historical data
- Fast Queries
- Exploration

39
Q

How is a Data Mart different from a Data Warehouse?

A

Same thing, different scale
Data Mart looks at specific problem/unit rather than the entire enterprise

40
Q

What three characteristics are necessary for something to be “Big Data”? (three V’s) Explain each

A

Volume:
The notion that data is “too big” to be analyzed with traditional methods (hundreds of millions of data items)

Velocity:
Rapid arrival & feedback loop. Data is too fast. Cannot react fast enough.

Variety:
text, images, sound, video, human input, sensors, servers, so many types of data

41
Q

What is Hadoop?

A

Open-source system designed to consume ANY data you want (structured, unstructured, all data types)

Distributed computing platform

Scalable, cost effective, flexible, fault-tolerant

42
Q

Examples of big data - How do you see the Three V’s in each?

A

Predictive Policing in LA
(historical crime data)

Tesco grocery chain
(optimized fridge costs with in store fridges providing 70M data points per store per year; proactive maintenance reduced maintenance costs by ID’ing problems before they happen)

Actions speak louder than words
(Veteran Therapy; military suicide prevention in the US; uses pattern recognition to identify signs and types of psychological distresses through video measurements)

43
Q

Canned Reports + Pros & Cons

A

Reports that provide regular summaries of information in a predetermined format
Answer specific questions
Pros: Easy & useful
Cons: Inflexible & IT overhead

44
Q

Ad-Hoc Reporting Tools + Pros & Cons

A

Tools that put users in control so that they can create custom reports on an as-needed basis by selecting fields, ranges, summary conditions, and other parameters
Pros: Users define their own reports, Powerful/flexible
Cons: Demanding of user, Potentially steep learning curve, Business knowledge, Understand data schema
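
A minimal sketch of the ad hoc idea: the user picks the fields and a range, and the tool builds the query. The `sales` table, the `ALLOWED_FIELDS` set, and the function `ad_hoc_report` are illustrative assumptions, not a real reporting product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Widget", 120.0), ("West", "Gadget", 80.0), ("East", "Gadget", 45.0)],
)

# The user must know which fields exist: this is the "understand data schema" cost.
ALLOWED_FIELDS = {"region", "product", "amount"}

def ad_hoc_report(conn: sqlite3.Connection, fields: list, min_amount: float = 0.0) -> list:
    """Build a report from user-selected fields and a minimum amount."""
    unknown = set(fields) - ALLOWED_FIELDS
    if not fields or unknown:
        raise ValueError(f"Invalid field selection: {sorted(unknown)}")
    columns = ", ".join(fields)  # safe: every name was checked against the allowed set
    sql = f"SELECT {columns} FROM sales WHERE amount >= ? ORDER BY amount DESC"
    return conn.execute(sql, (min_amount,)).fetchall()

big_sales = ad_hoc_report(conn, ["region", "amount"], min_amount=50)
print(big_sales)
```

A canned report, by contrast, would hard-code both the field list and the filter.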

45
Q

Dashboards

A

Graphic view of what is happening inside the software system
Some customization
A picture is worth a thousand words

46
Q

OLAP + Pros & Cons

A

Online analytical processing

The manipulation of information to create business intelligence in support of strategic decision making

used for enormous amounts of data

Pros: Huge data, Pre-processed + Summarized, User reports fast
Cons: No access to details; user only sees summary
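
The pre-processed, summarized nature of OLAP can be sketched with a tiny in-memory "cube". The sales figures and the `lookup` helper are made up for illustration; real OLAP engines are far more elaborate.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, quarter, amount).
sales = [
    ("East", "Q1", 100.0),
    ("East", "Q2", 150.0),
    ("West", "Q1", 80.0),
    ("West", "Q2", 120.0),
]

# Pre-process once: store totals for every combination, with "ALL" as the roll-up.
cube = defaultdict(float)
for region, quarter, amount in sales:
    cube[(region, quarter)] += amount
    cube[(region, "ALL")] += amount
    cube[("ALL", quarter)] += amount
    cube[("ALL", "ALL")] += amount

def lookup(region: str = "ALL", quarter: str = "ALL") -> float:
    """Answer a summary question from the pre-computed totals (no detail scan)."""
    return cube.get((region, quarter), 0.0)

print(lookup("East"), lookup(quarter="Q1"), lookup())
```

Queries are fast because they only read stored totals, which is also why the individual transactions are not visible to the user.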

47
Q

Data Mining

A

The process of analyzing data to extract information not offered by the raw data alone

Enormous historical datasets
Identify patterns
Build Models
Predict Future
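
The pattern, model, predict sequence can be sketched with a very simple stand-in: a straight-line fit over hypothetical monthly sales. Real data mining uses far larger datasets and richer models; the numbers and the helper `fit_line` are assumptions.

```python
def fit_line(xs: list, ys: list) -> tuple:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Historical data (hypothetical): units sold in months 1 through 5.
months = [1, 2, 3, 4, 5]
units_sold = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, units_sold)  # identify the pattern, build the model
forecast_month_6 = slope * 6 + intercept         # predict the future
print(slope, intercept, forecast_month_6)
```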