Lecture 1 Flashcards

1
Q

The 5 V’s of Big Data

A

The 5 V’s: Volume, Variety, Velocity, Value, and Veracity

2
Q

Caveat 1: Big Data and Thick data

A

“There are many reasons for Nokia’s downfall, but one of the biggest reasons that I witnessed in person was that the company over-relied on numbers. They put a higher value on quantitative data, they didn’t know how to handle data that wasn’t easily measurable, and that didn’t show up in existing reports.”
Tricia Wang, 2016

• Message
  • Beware of Quant Bias or Quant Addiction
  • Big Data in many cases needs to be supported with Thick Data
  • Thick Data (Emotion, Context, Meaning, ...)
    • Ethnography: people’s way of living, culture
  • Improve the use of big data or analytics by seeing the whole picture of decision making.

3
Q

Why “big data” NOW?

A

• Availability of massive amounts of digital data
• Combination of technical developments and societal needs
• A philosophical view
  • Rationalism vs empiricism
• The discovery of the power of data

4
Q

Why Big Data now: Technical developments

A

Radical changes in:
• The way elementary data are captured
  • Sensors (automated) vs keyboard (human)
• The way data is stored
  • Main memory and cloud vs hard disk
• The way data is analyzed
  • Data-driven methods vs sampling
• The way data is provided to users
  • Data logistics vs data integration
• The way data is presented
  • Graphical interactive visualizations vs management reports
• The way knowledge (business rules, models) is created
  • Learning/mining vs (labor-intensive) knowledge acquisition

5
Q

Three types of learning

A

Supervised:

  • Labeled data
  • Focused outcomes
  • Assess/Predict

Unsupervised:

  • No initial focus
  • No feedback
  • Clustering

Reinforcement:

  • Action/Results
  • Reward function
  • Learning for planning or action
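
To make the contrast between the first two settings concrete, here is a minimal sketch in Python. The data values, the "churn"/"stay" labels, and the function names are hypothetical and chosen only for illustration; the nearest-neighbour rule and the tiny two-cluster routine are stand-ins for whatever methods the lecture has in mind. Reinforcement learning is left out because it needs an environment and a reward loop.

```python
# Toy data: hours of weekly product usage per customer (hypothetical values).
usage = [1.0, 2.0, 1.5, 8.0, 9.0, 8.5]

# Supervised: labels are given, and we learn a rule to predict them.
labels = ["churn", "churn", "churn", "stay", "stay", "stay"]


def predict_nearest(x, xs, ys):
    """1-nearest-neighbour: return the label of the closest labeled example."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[nearest]


# Unsupervised: no labels and no feedback; we only look for structure.
def two_means(xs, iterations=10):
    """Tiny 1-D k-means with k=2, initialised at the smallest and largest value.

    Assumes xs contains at least two distinct values.
    """
    c0, c1 = min(xs), max(xs)
    for _ in range(iterations):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return c0, c1


print(predict_nearest(7.0, usage, labels))  # label of the closest labeled customer
print(two_means(usage))                     # the two cluster centres found without labels
```

The supervised function needs the labels to answer anything; the clustering function never sees them and can only report that the customers fall into two groups.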
6
Q

Machine Learning learning

A

We don’t solve problems with Machine Learning; we solve problems with the rules and knowledge that ML builds.

7
Q

Steps in ML

A
  1. Data and Analytics
  2. Machine Learning
  3. Reasoning
  4. Partnerships

Each step supports the next

8
Q

Data science definition (Provost)

A

Data science is a set of fundamental principles that support and guide the principled extraction of information and knowledge from data.
o Sometimes referred to as “Applied AI”

9
Q

Data mining definition (Provost)

A

Data mining is the actual extraction of knowledge from data, via technologies that incorporate the principles of data science.

10
Q

Data-driven Decision Making (DDD) definition (Provost)

A

Data-driven decision making (DDD) is the practice of basing decisions on the analysis of data, rather than purely on intuition.

11
Q

Data Science principles (Provost)

A

• Entities that are similar with respect to known features or attributes are often similar with respect to unknown features or attributes
• Work with incomplete information as far as it goes, instead of assuming that anything missing is false
o Cf. the old “closed-world” view in traditional databases: “not in the DB, then false”
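
Both principles can be illustrated with a short Python sketch. The customer records, the `responded` attribute, and all function names are hypothetical; k-nearest-neighbour voting is just one simple way to act on the similarity principle, and the features are left unscaled to keep the example small.

```python
from math import dist

# Hypothetical customers: (age, monthly_spend) is known for everyone;
# whether they responded to an offer is known only for past customers.
known = [
    ((25, 40.0), True),
    ((27, 45.0), True),
    ((60, 10.0), False),
    ((58, 12.0), False),
    ((30, 42.0), True),
]


def knn_predict(features, examples, k=3):
    """Predict an unknown attribute by majority vote of the k most similar examples."""
    nearest = sorted(examples, key=lambda ex: dist(ex[0], features))[:k]
    votes = sum(1 for _, label in nearest if label)
    return votes * 2 > k


# Closed-world vs. open-world treatment of a fact that may be missing.
purchases = {"alice": True, "bob": False}


def bought_closed_world(name):
    """Traditional database view: not in the DB means False."""
    return purchases.get(name, False)


def bought_open_world(name):
    """Missing information stays unknown (None) instead of becoming False."""
    return purchases.get(name)


print(knn_predict((26, 41.0), known))  # similar to the young, high-spend group
print(bought_closed_world("carol"), bought_open_world("carol"))
```

The last line shows the difference in the two views: for a customer who is not in the table, the closed-world lookup answers False, while the open-world lookup admits that the answer is unknown.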

12
Q

Data Science Principles (1)

A

Extracting useful knowledge from data to solve business problems can be treated systematically by following a process with reasonably well-defined stages. The Cross-Industry Standard Process for Data Mining (CRISP-DM) is one codification of this process.

13
Q

Data Science Principles (2)

A

• If you look too hard at a set of data, you will find something, but it might not generalize beyond the data you’re looking at (problem of overfitting)
• To draw causal conclusions, one must pay very close attention to the presence of confounding factors, possibly unseen ones (observation vs intervention)
• When using AI heuristics to find some optimum you may end up in a local maximum.
• The relationship between the business problem and the analytics solution often can be decomposed into tractable subproblems via the framework of analyzing expected value.
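
Two of these points lend themselves to a small worked sketch in Python: the expected-value decomposition and the local-maximum trap. All numbers, the `bumpy` objective, and the function names are hypothetical. The expected-value formula used is the common form EV = p * v_R + (1 - p) * v_NR for a single targeting decision, which splits the problem into estimating a probability (a modeling task) and estimating two values (a business task); greedy hill climbing stands in for "AI heuristics" in general.

```python
def expected_value_of_targeting(p_response, value_response, value_no_response):
    """EV = p * v_R + (1 - p) * v_NR for targeting one customer."""
    return p_response * value_response + (1 - p_response) * value_no_response


def hill_climb(f, start, step=1):
    """Greedy hill climbing on integers: move to a better neighbour until none exists."""
    x = start
    while True:
        best = max((x - step, x + step), key=f)
        if f(best) <= f(x):
            return x  # no neighbour improves on x: a (possibly local) maximum
        x = best


def bumpy(x):
    """Hypothetical objective with a small peak at x=2 and the global peak at x=8."""
    table = {0: 0, 1: 2, 2: 3, 3: 1, 4: 0, 5: 1, 6: 4, 7: 7, 8: 9, 9: 5}
    return table.get(x, -1)


# Hypothetical offer: profit of 99 if the customer responds, loss of 1 otherwise.
ev = expected_value_of_targeting(0.05, 99.0, -1.0)
print(ev > 0)                # target the customer only if the expected value is positive

print(hill_climb(bumpy, 0))  # 2: stuck on the small peak
print(hill_climb(bumpy, 5))  # 8: a different start reaches the global peak
```

The two hill-climbing runs use the same heuristic and the same objective; only the starting point differs, which is why restarts from several starting points are a common remedy.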

14
Q

Types of Business Analytics research

A
  1. Applying analytics to answer a business question
    • Problem-oriented. Management Science. Result is an insight with practical value
    • Scientific value depends on genericity
    • Example: “analyze effects of social media intensity on program item redemption”
  2. Developing an analytics-based tool/process/method
    • Design Science research (DSR): result is an IS application
    • Example: “association rule-based anomaly detection for event logs”
  3. Improving upon analytical techniques/tools
    • Technical Design research (CS): result is a new or improved algorithm
  4. Other types of research: identify and address legal challenges, economic consequences, …
15
Q

Final remarks lecture 1

A

• Data Science is a response to Big Data by adopting AI.
• AI is making big steps because of Big Data
• A data science solution must be embedded in the business. This is not a simple step.
• The increasing use of AI in business imposes new challenges:
  • Human-centered AI: how to make best use of humans and machines
  • Responsible AI