Literature & Lectures Flashcards

Final Exam 10-06-2020

1
Q

What is a HiPPO?

A

the highest-paid person’s opinion.

2
Q

What is the idea of ‘Mute the HiPPO’?

A

Businesses should rely more on data instead of HiPPOs

3
Q

The 5 management challenges by McAfee?

A
  1. Leadership: Establish leadership teams that set clear goals, define what success looks like, and ask the right questions.
  2. Talent Management: As data become cheaper, the complements to data become more valuable
  3. Technology: The tools available to handle the volume, velocity and variety of big data.
  4. Decision-Making: Put information and the relevant decision rights in the same location.
  5. Company culture: Move away from acting solely on hunches and instinct, and break bad habits
4
Q

Three analytics capability levels proposed by LaValle

A
  1. Aspirational: Use analytics to justify actions
  2. Experienced: Use analytics to guide actions
  3. Transformed: Use analytics to prescribe actions
5
Q

5 recommendations of LaValle

A
  1. Focus on the biggest and highest-value opportunities
  2. Within each opportunity, start with questions, not data
  3. Embed insights to drive actions and deliver value
  4. Keep existing capabilities while adding new ones
  5. Use an information agenda to plan for the future
6
Q

What does Hadoop do?

A

combines commodity hardware with open-source software. It takes incoming streams of data and distributes them onto cheap disks; it also provides tools for analyzing the data.

7
Q

What is a relational database?

A

a database structured to recognize relations among stored items of information.

8
Q

What is PoS Data?

A

Point of Sale Data is data collected by a business when a transaction happens.

9
Q

What is preventive maintenance?

A

uses sensor data to monitor a system, then continuously evaluates it against historical trends to predict failure before it occurs.
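A minimal sketch of the idea above, assuming a simple rule that the card itself does not specify: a new sensor reading is flagged when it lies more than k standard deviations from the historical mean. The function name, the threshold k and the temperature values are all hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history, reading, k=3.0):
    """Return True if the reading deviates from the historical mean
    by more than k sample standard deviations (assumed rule)."""
    mu = mean(history)
    sigma = stdev(history)
    return abs(reading - mu) > k * sigma


# Hypothetical historical temperature readings from one machine.
history = [70.1, 69.8, 70.4, 70.0, 69.9, 70.2]

print(is_anomalous(history, 70.3))  # within the usual range
print(is_anomalous(history, 75.0))  # far outside the usual range
```

Real systems usually look at trends over time rather than a single reading; the threshold check only illustrates comparing live data against history.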

10
Q

3 Business Models enabled with Big Data

A
  1. Differentiating creates new experiences
  2. Brokering augments the value of information
  3. Delivery networks enable the monetization of data
11
Q

The differences between hierarchical and relational databases as pointed out by Lake & Crowther (2013)?

A

In a hierarchical database, segments are implicitly joined with each other.

In a relational database, this relationship between tables is captured by foreign keys and primary keys.
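As a small illustration of primary and foreign keys, here is a sketch using Python's built-in sqlite3 module. The table and column names (customer, purchase, customer_id) are made up for this example and do not come from Lake & Crowther.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Each table has a primary key; purchase.customer_id is a foreign key
# that points at customer.id.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE purchase (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(id),
           amount REAL NOT NULL)"""
)

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (1, 9.5)")
conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (1, 20.0)")

# The relationship is explicit: the join follows the key columns.
rows = conn.execute(
    """SELECT customer.name, purchase.amount
       FROM purchase JOIN customer ON customer.id = purchase.customer_id
       ORDER BY purchase.id"""
).fetchall()
print(rows)

# A purchase that refers to a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (99, 1.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```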

12
Q

a hierarchical database

A

segments are implicitly joined with each other

13
Q

a relational database

A

this relationship between tables is captured by foreign keys and primary keys.

14
Q

SQL database

A

relational database (handle structured data)

15
Q

NoSQL database

A

non-relational database (handle unstructured data)

16
Q

4 Pros of SQL database

A
  1. Fits structured data readily into well-organized tables
  2. Are more mature & represent huge investments by vendors and users
  3. Convenient for transactions that require great precision (supports ACID)
  4. Offers a big feature set and data integrity
17
Q

4 Cons of SQL database

A
  1. Scaling. They aren’t designed to function with data partitioning
  2. Complexity. When data doesn’t fit easily into a table, the database’s structure can be complex, difficult and slow to work with.
  3. Can entail large amounts of complex code and doesn’t work well with modern, agile development.
  4. Large feature set. Users may not need all the features, yet still bear the cost and complexity they add.
18
Q

What does ACID stand for?

A

Atomicity means an update is performed completely or not at all

Consistency means no part of a transaction will be allowed to break a database’s rules

Isolation means each application runs transactions independently of other applications

Durability means that completed transactions will persist
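Atomicity and consistency can be seen in a few lines with Python's built-in sqlite3 module. This is only a sketch: the account table, the CHECK rule and the transfer helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the database rule that must never be broken.
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The debit would make alice's balance negative, so the whole transfer
# is undone, including the credit to bob that had already been applied.
failed = transfer(conn, "alice", "bob", 500)
after_failed = dict(conn.execute("SELECT name, balance FROM account"))

succeeded = transfer(conn, "alice", "bob", 30)
after_success = dict(conn.execute("SELECT name, balance FROM account"))
print(failed, after_failed)
print(succeeded, after_success)
```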

19
Q

4 Pros of NoSQL database

A
  1. Efficient handling of unstructured data
  2. Easier to work with for developers that are not familiar with SQL
  3. Enable better performance
  4. Process data faster than relational databases
20
Q

4 Cons of NoSQL database

A
  1. May cause problems for applications that require great precision (ACID)
  2. Require manual query programming (is fast for simple tasks but is difficult and time-consuming for others)
  3. Don’t offer the same degree of reliability because they don’t support ACID
  4. May compromise consistency because they don’t support ACID
21
Q

What is SQL?

A

Is Structured Query Language, which is a computer language for storing, manipulating and retrieving data stored in a relational database.

22
Q

Why would you choose NoSQL?

A

To get a cloud-ready, easily scalable, real-time database with shorter processing times at a relatively lower cost than RDBMSs

23
Q

Why Hadoop?

A

(1) It is scalable: you can add more nodes on the fly

(2) It is fault-tolerant: if nodes go down, data gets processed by another node

24
Q

What are the 3 core components of Hadoop?

A
  1. Hadoop Distributed File System (HDFS)
    The storage layer of Hadoop
  2. Map-Reduce
    The data processing layer of Hadoop. It processes data in two phases (1) Map Phase and (2) Reduce Phase
  3. Yet Another Resource Negotiator (YARN)
    The resource management layer of Hadoop
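The two phases of Map-Reduce can be imitated in plain Python with a word count. This is a single-process sketch, not the Hadoop API: the function names are hypothetical, and the shuffle step that groups values by key is added here because a real framework performs it between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input line into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data big value", "data beats opinion"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In Hadoop the map and reduce calls run on different nodes, with HDFS holding the input and YARN assigning the resources.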
25
Q

Name 6 reasons why Hadoop is important

A
  1. Hadoop has a better Career Scope
  2. A Maturing Technology
  3. Data managing
  4. Omnipresent
  5. Open for all (Hadoop is easy to manage)
  6. Professionals shortage
26
Q

What is a Database Management System? (DBMS)

A

A software system that enables users to define, create, maintain and control access to databases.

27
Q

2 Logical structures for database models

A
  1. Navigational models (e.g. hierarchical-, network-, and graph database models)
  2. Relational models
28
Q

What is understood when we speak about a table?

A

a collection of records

29
Q

What does cardinality tell us?

A

Describes the expected number of related occurrences between the two entities in a relationship

30
Q

What is an Entity-Relationship (ER) diagram?

A

Schematic representation of the database

31
Q

What is a normalisation process in programming?

A

Fitting messy ‘real-life’ data into a homogeneous, uniform database, making sure that the database is accurate, scalable, and easy to update and query.

32
Q

What does a normalisation process prevent?

A

redundancy, confusion, improper keys, wasted storage, incorrect/outdated data
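A rough sketch of how normalisation removes redundancy, using plain Python dictionaries in place of tables. The order records and field names are invented for illustration.

```python
# Flat, un-normalised records: customer details are repeated on every order.
flat_orders = [
    {"order_id": 1, "customer": "Ada", "city": "London", "amount": 9.5},
    {"order_id": 2, "customer": "Ada", "city": "London", "amount": 20.0},
    {"order_id": 3, "customer": "Bob", "city": "Paris", "amount": 5.0},
]

customers = {}  # one entry per customer, keyed by name
orders = []     # orders refer to customers by id only

for row in flat_orders:
    name = row["customer"]
    if name not in customers:
        customers[name] = {
            "customer_id": len(customers) + 1,
            "name": name,
            "city": row["city"],
        }
    orders.append(
        {
            "order_id": row["order_id"],
            "customer_id": customers[name]["customer_id"],
            "amount": row["amount"],
        }
    )

print(list(customers.values()))
print(orders)
```

After the split, a customer's city is stored once, so updating it cannot leave outdated copies behind on older orders.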

33
Q

5 stages of the Big Data Life Cycle

(Most companies focus only on the modeling/analysis part but focusing on all processes will result in higher business rewards)

A
  1. Data acquisition: produce, derive and collect data
  2. Information extraction and cleaning: Pull out information and express it in a structured form suitable for analysis
  3. Data integration, aggregation and representation: Collection of heterogeneous data from multiple sources
  4. Modeling and analysis: Methods for querying and mining big data
  5. Interpretation: A decision-maker has to interpret the results of the analysis
34
Q

The inherent challenges of Big Data

A
  1. Heterogeneity
  2. Inconsistency and incompleteness
  3. Scale
  4. Timeliness
  5. Privacy and data ownership
35
Q

7 fundamental concepts of Big Data by Provost & Fawcett

A
  1. Extracting useful knowledge from data by following a process with reasonably well-defined stages.
    The Cross-Industry Standard Process for Data Mining (CRISP-DM)
  2. Evaluating data-science results requires careful consideration of the context in which they will be used.
  3. The relationship between the business problem and the analytics solution often can be decomposed into tractable subproblems via the framework of analyzing the expected value
  4. Information technology can be used to find informative data items from within a large body of data
  5. Entities that are similar with respect to known features or attributes often are similar with respect to unknown features or attributes
  6. If you look too hard at a set of data, you will find something, but it might not generalize beyond the data you’re observing (‘overfitting’)
  7. To draw causal conclusions, one must pay very close attention to the presence of confounding factors, possibly unseen ones
36
Q

4 mistakes most managers make with analytics by Lambrecht & Tucker

A
  1. Not understanding the issues of integration
  2. Not realizing the limits of unstructured data
  3. Assuming correlations mean something
  4. Underestimating the labor skills needed
37
Q

Two key pieces of advice from Frank

A
  1. Start small: define a few relatively simple analytics, this allows the organization to see what the data can do. Also, the results are easier to test.
  2. Targeted prototyping
    Capture only the data you need to perform the test, instead of dealing with all of the data available.

This is a lower-risk way to see what big data can do for your firm and to test your firm’s readiness to use it.

38
Q

The top benefits of AI

A

Make better decisions

Optimize internal operations

Optimize external operations

Free workers to be more creative

Enhance current products

39
Q

4 Factors that drove the AI wave

A
  1. New algorithms for machine learning
  2. The internet and the cloud
  3. Big data
  4. Moore’s Law
40
Q

What is Natural Language Processing? (NLP)

A

NLP is a term for everything from speech recognition to language generation, each requiring different techniques (such as chatbots and translations)

41
Q

5 fundamental methods and techniques for AI

A
  1. Heuristics
    Heuristics are a way to employ a practical method to find a solution that is not guaranteed to be optimal, but one that is sufficient for the immediate goals (such as navigation)
  2. Support Vector Machine
    Classification problems where there is no straight rule for identifying the classes. (such as a spam filter or identifying handwriting or characters)
  3. Artificial Neural Networks
    Understanding complex relationships between features of a certain item (such as image or speech recognition)
  4. Markov Decision Process
    Find a policy for the decision-maker, tell him which particular action should be taken at which state. Solving complex decision-making problems (such as inventory planning)
  5. Natural Language Processing
    NLP is a term for everything from speech recognition to language generation, each requiring different techniques (such as chatbots and translations)
42
Q

What is Cognitive Computing used for?

A

The ability to understand

43
Q

What does a Linear Regression do?

A

Linear Regression allows us to map numeric inputs to numeric outputs, fitting a line into the data points.
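Fitting a line can be sketched with ordinary least squares in plain Python. The data points are made up, and fit_line is a hypothetical helper rather than a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical numeric inputs and outputs that roughly follow y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

The fitted line does not pass through every point; it follows the dominant trend in the data.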

44
Q

What is the essence of an algorithm?

A

To capture the dominant trend and fit our line within that trend.

Note:

We always want to find the trend, not fit the line to all the data points!!!

45
Q

What does a bias-variance trade-off mean?

A

Machine Learning models fulfill their purpose when they generalize well. Generalization is bound by the two undesirable outcomes — high bias and high variance.

A situation with low bias and low variance represents the desired situation: generalization.

This trade-off is the most integral aspect of Machine Learning model training. Detecting whether the model suffers from either one is the sole responsibility of the model developer.

46
Q

Explain underfitting

A

!! High Bias !!

Underfitting is the case where the model has “not learned enough” from the training data, resulting in low generalization and unreliable predictions.

47
Q

Explain overfitting

A

!! High Variance !!

Overfitting is the case where the overall cost is really small, but the generalization of the model is unreliable. This is due to the model learning “too much” from the training data set.
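The contrast between underfitting and overfitting can be sketched with three toy models on made-up data that follows y = 2x plus noise. All names and numbers here are assumptions: a constant model stands in for high bias, a least-squares line for the well-generalizing case, and a nearest-point memorizer for high variance.

```python
def mse(model, xs, ys):
    """Mean squared error (the cost) of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_constant(xs, ys):
    """High bias: always predicts the average training output."""
    avg = sum(ys) / len(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Least-squares line: captures the dominant trend."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """High variance: returns the output of the closest training point."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


# Training data: y = 2x with alternating noise of +1 and -1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0]
# Unseen data on the true trend y = 2x.
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [0.8, 2.8, 4.8, 6.8, 8.8]

models = {
    "constant": fit_constant(train_x, train_y),
    "line": fit_line(train_x, train_y),
    "memorizer": fit_memorizer(train_x, train_y),
}
for name, model in models.items():
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

The memorizer reaches zero cost on the training set yet does worse than the line on the unseen points, while the constant model does poorly on both.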