Prelims - honpritz Flashcards

honpritz

1
Q

Data encoders, gatherers

A

Collector

2
Q

Treats and prepares data

A

Data engineer

3
Q

Performs the modeling, testing, and validation

A

Modeler or Data Scientist

4
Q

Does the decision making

A

Business analyst

5
Q

Data Steward

A

Collector

6
Q

Modeler

A

Data Scientist

7
Q

It is a multidisciplinary field that uses scientific methods, processes, algorithms, computations, and systems to extract understanding and insights from structured and/or unstructured data.

A

Data Science

8
Q

____ is the mother of invention.

A

NECESSITY

9
Q

What era:

REPORT WRITING
Goal: Automation

A

1970s

10
Q

What era:

CENTRALIZED SYSTEM
Goal: ERP (Enterprise Resource Planning)/ MIS (Management Info System)

A

1980s

11
Q

Goals of the 1980s Centralized system

A
  • Enterprise Resource Planning
  • Management Info System
12
Q

What era:

Business Intelligence
Goal: Apps for everyone
Applications for personal use were invented and made to share (not YET to analyze)

A

1990s

13
Q

Goal: Apps for everyone
Applications for personal use were invented and made to share (not YET to analyze)

A

1990s Business Intelligence

14
Q

What era:

INTERNET & DATA MINING

A

2000s

15
Q

What era:

BIG DATA & DATA SCIENCE (used for real-time analysis)

A

2010s

16
Q

The value in the data haystack is guided by your knowledge of the ____ - not the ___ or ____

A

domain; tools or techniques

17
Q

The combination of all skill sets needed to find the value in the data

A

Analytics

18
Q

Data under Business Intelligence

A
  • Standard reports (What happened?)
  • Ad Hoc, Drill down (Where exactly is the problem?)
  • Alerts (What needs attention?)
19
Q

Data under Predictive analytics

A

Predictive modeling
“What is the next best action?”

20
Q

Data under Prescriptive analytics

A

Optimization
“What is the best thing that can happen?”

21
Q

Evolution of analytics

A

Descriptive → Diagnostic → Predictive → Prescriptive → Cognitive

22
Q

What happened? Describes historical data: Helps understand how things are going

A

Descriptive

23
Q

Why did it happen?
Helps understand unique drivers; Segmentation, Statistical, & Sensitivity analysis

A

Diagnostic

24
Q

What could happen? Forecast future performance, events, and results

A

Predictive

25
Q

How to make it happen?
Analysis that suggests a prescribed action

A

Prescriptive

26
Q

What to do, why & how?
Proactive action
Learn at scale
Reason with purpose
Interact naturally

A

Cognitive

27
Q

Data Science & Analytics:
in health care

A
  • Medical Image analysis
  • Machine Learning in Disease Diagnosis
  • Genetics & Genomics
  • Drug Development
  • Virtual assistance for patients and customer support
28
Q

Finding useful patterns in data.

A

Data Mining

29
Q

It is the process of knowledge discovery, machine learning, and predictive analytics.

A

Data Mining

30
Q

Data Mining

A
  • Extracting Meaningful Patterns.
  • Building Representative Models.
  • Combination of Statistics, Machine Learning, and Computing
  • Algorithms
31
Q

DATA MINING: Types of Learning Models

A
  • Supervised
  • Unsupervised
31
Q

Data Mining is NOT about:

A
  • Descriptive statistics.
  • Exploratory visualization.
  • Dimensional slicing
  • Hypothesis testing
  • Queries
32
Q

Directed data mining

A

Supervised Learning Model

33
Q

The model generalizes the relationship between the input and output variables.

A

Supervised Learning Model
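
As a rough illustration of "generalizing the relationship between input and output variables", the sketch below fits a straight line to labeled examples with ordinary least squares and then predicts the output for an unseen input. The data (hours studied vs. exam score) and the function names `fit_line` and `predict` are hypothetical and chosen only for this example; a linear model is just one of many supervised techniques.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned relationship to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: input = hours studied, output (label) = exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

model = fit_line(hours, scores)
prediction = predict(model, 6)  # an input the model has not seen
print(prediction)
```

The known outputs (`scores`) are what make this "directed": the model is told what the right answer looks like for each training input.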

34
Q

Undirected data mining

A

Unsupervised Learning Model

35
Q

The objective of this class of data mining techniques is to find patterns in data based on the relationship between data points themselves

A

Unsupervised Learning Model
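
To make "patterns based on the relationship between data points themselves" concrete, here is a minimal sketch of k-means clustering on one-dimensional data. No labels are supplied; groups emerge only from how close the values are to each other. The function name `kmeans_1d`, the deterministic choice of starting centers, and the sample values are assumptions made for this illustration.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group numbers into k clusters (k >= 2) using only distances between them."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (unchanged if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


# Hypothetical unlabeled measurements with two visible groups.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
groups = kmeans_1d(points, k=2)
print(groups)
```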

36
Q

DATA MINING: Groups of Learning Models

A
  • Classification Models
  • Regression Models
  • Clustering Models
  • Anomaly Detection
  • Time Series Forecasting
  • Association
  • Text and Sentiment Analysis
37
Q

DATA MINING: Steps

A
  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Testing and Evaluation
  • Deployment
38
Q

the process of preparing data for analysis by removing or modifying incorrect, incomplete, irrelevant, duplicated, or improperly formatted data.

A

Data Cleaning
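
A minimal plain-Python sketch of the cleaning steps named in this card: fixing improperly formatted text, dropping incomplete rows, and removing duplicates. The record layout, the `required` field names, and the function name `clean` are hypothetical; this is not RapidMiner code.

```python
def clean(records, required=("name", "age")):
    """Trim whitespace, drop incomplete rows, and remove exact duplicates
    (the first occurrence is kept)."""
    cleaned, seen = [], set()
    for rec in records:
        # Improperly formatted text: strip stray whitespace from string values.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        # Incomplete rows: a required field is missing or empty.
        if any(row.get(field) in (None, "") for field in required):
            continue
        # Duplicates: skip a row whose contents were already seen.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical raw data.
raw = [
    {"name": " Ana ", "age": 31},
    {"name": "Ana", "age": 31},    # duplicate once whitespace is trimmed
    {"name": "Ben", "age": None},  # incomplete
    {"name": "Cara", "age": 27},
]
result = clean(raw)
print(result)
```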

39
Q

variables of a given data set; Represented by columns

A

Attributes

40
Q

Cases or observations of a given data set

Represented by rows

41
Q

Functions or building blocks that create processes for data analysis

42
Q

Parts of the RapidMiner Interface

A
  • Canvas / Process Panel
  • Repository / Source Tabs
  • Operators / Analysis Tabs
  • Parameter Tabs
  • Description Tabs
43
Q

Working area for building processes

A

Canvas or the Process Panel

44
Q

Storage within RapidMiner Studio for data and RapidMiner processes

A

Repository / Source Tabs

45
Q

Building blocks used to create RapidMiner processes

A

Operators / Analysis Tabs

46
Q

Settings that modify operator behavior

A

Parameter Tabs

47
Q

context-sensitive help for selected operator

48
Q

work area for accessing specific functionality

49
Q

Methods of Importing Data

A
  • From Repository
  • “Read Excel” Operator
50
Q

many different string values (for example: red, green, blue, yellow)

A

polynominal

51
Q

exactly two values (for example: true/false, yes/no)

52
Q

a fractional number (for example: 11.23 or -0.0001).

53
Q

a whole number (for example: 23, -5, or 11,024,768).

54
Q

both date and time (for example: 23.12.2014 17:59).

55
Q

Operator used for filtering cases

A

Filter Examples

56
Q

Operator used for removing all cases with missing values

A

Filter Examples
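
The two uses of filtering in the cards above (keeping cases that match a condition, and removing cases with missing values) can be sketched in plain Python as follows. This only mirrors the idea behind the operator; the helper `filter_rows` and the sample rows are hypothetical, and `None` stands in for a missing value.

```python
def filter_rows(rows, condition):
    """Keep only the rows (cases) for which the condition holds."""
    return [r for r in rows if condition(r)]


# Hypothetical data set; None marks a missing value.
rows = [
    {"id": 1, "city": "Manila", "sales": 120.0},
    {"id": 2, "city": "Cebu", "sales": None},
    {"id": 3, "city": "Manila", "sales": 80.0},
]

# Filter cases by an attribute condition.
manila_only = filter_rows(rows, lambda r: r["city"] == "Manila")

# Remove every case that has at least one missing value.
complete_only = filter_rows(rows, lambda r: all(v is not None for v in r.values()))

print([r["id"] for r in manila_only])
print([r["id"] for r in complete_only])
```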

57
Q

Operator used for imputing missing data

A

Replace Missing Values

58
Q

To remove “white spaces” in the encoding, use the _____ operator.

59
Q

To remove “duplicates” in the encoding, use the _________ operator.

A

Remove Duplicates

60
Q

To recode miscoded values, use the ______ operator.

61
Q

Use the ________ operator to select the attributes that you need for analysis.

A

Select Attributes

62
Q

Use the _____ operator to tag the attribute that will be used as the label (Target Variable) or to assign any other role it will play in the analysis.

63
Q

If two data sets need to be merged for an analysis, use the ____ operator.

64
Q

Joining Two Data Sets:

In the parameter tab, use _____ as join type.
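
For intuition about what a join type controls, the sketch below merges two data sets on a shared key using an inner join, which keeps only rows whose key appears in both. Choosing an inner join here is an assumption made for illustration and is not meant as the answer to the blank above; the function `inner_join` and the sample tables are hypothetical.

```python
def inner_join(left, right, key):
    """Combine rows from both data sets whose key value appears in each."""
    # Index the right-hand rows by key for quick lookup.
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    joined = []
    for l in left:
        for r in index.get(l[key], []):
            # Merge the attributes of the matching rows into one row.
            joined.append({**l, **r})
    return joined


# Hypothetical data sets sharing the "id" attribute.
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}, {"id": 3, "name": "Cara"}]
orders = [{"id": 1, "total": 250}, {"id": 3, "total": 90}, {"id": 4, "total": 40}]

merged = inner_join(customers, orders, "id")
print(merged)
```

Other join types differ in what happens to unmatched rows: a left join, for example, would keep Ben with a missing `total`.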

65
Q

graphical representation of data

A

Data Visualization

66
Q

techniques used to communicate insights from data through visual representation.

A

Data Visualization

67
Q

to distill large datasets into visual graphics to allow for easy understanding of complex relationships within the data

A

Data Visualization

68
Q

to analyze massive amounts of information and make data-driven decisions.

A

Data Visualization

69
Q

Visualization Technique:

to compare counts, percentages, or other measures (such as averages) for different discrete categories of data

70
Q

Visualization Technique: to observe trend

A

Line Graph

71
Q

Visualization Technique:

shows the relative contribution of different categories to an overall total

72
Q

Visualization Technique:

shows the frequency distribution of a continuous attribute

73
Q

(Bar vs Histo)

represents a categorical attribute

74
Q

(Bar vs Histo)

represents a numerical attribute

75
Q

(Bar vs Histo)

represents a numerical attribute

76
Q

(Bar vs Histo)

have spaces between bars

77
Q

(Bar vs Histo)

do not have spaces between bars
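
The bar-versus-histogram distinction comes down to how the counts behind each chart are computed. The sketch below, using only the standard library, counts one bar per distinct category and, separately, bins continuous values into adjacent equal-width intervals (which is why histogram bars touch). The function names and sample data are hypothetical.

```python
from collections import Counter


def bar_counts(categories):
    """One count per distinct category (categorical attribute)."""
    return dict(Counter(categories))


def histogram_counts(values, bin_width, start=0):
    """Counts per contiguous, equal-width bin (numerical attribute)."""
    bins = Counter()
    for v in values:
        # Bin i covers the interval [start + i*bin_width, start + (i+1)*bin_width).
        i = int((v - start) // bin_width)
        bins[i] += 1
    return {
        (start + i * bin_width, start + (i + 1) * bin_width): n
        for i, n in sorted(bins.items())
    }


# Hypothetical data.
colors = ["red", "blue", "red", "green", "red"]
heights = [150.5, 152.0, 161.2, 168.9, 169.0, 171.4]

bars = bar_counts(colors)
hist = histogram_counts(heights, bin_width=10, start=150)
print(bars)
print(hist)
```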

78
Q

Visualization Technique:

plots two numerical attributes

A

Scatterplot

78
Q

Visualization Technique:

graphical representation of the quartiles
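
The quartiles that such a plot draws can be computed directly. This sketch uses `statistics.quantiles` from the Python standard library with its default ("exclusive") method; other tools may use slightly different interpolation rules, so values can differ at the margins. The sample data is hypothetical.

```python
import statistics

# Hypothetical numerical attribute.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

# n=4 splits the data into four equal-probability groups,
# returning the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: the spread of the middle half

print(min(data), q1, q2, q3, max(data))
print(iqr)
```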

79
Q

process performed to decide which examples are kept and which are removed

A

Data filtering

79
Q

Visualization Technique:

a graphical representation of data where the individual values contained in a matrix (map) are represented as colors.

80
Q

replaces missing values by the attribute’s minimum, maximum, or average value.

A

Missing Value Imputation
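
A minimal sketch of the three imputation choices named in this card (minimum, maximum, or average of the attribute). It is plain Python rather than the RapidMiner operator; the function name `impute`, the method strings, and the use of `None` for a missing value are assumptions for illustration.

```python
def impute(values, method="average"):
    """Replace None entries with the minimum, maximum, or average of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        return list(values)  # nothing to compute a fill value from
    fill = {
        "minimum": min(known),
        "maximum": max(known),
        "average": sum(known) / len(known),
    }[method]
    return [fill if v is None else v for v in values]


# Hypothetical attribute with two missing values.
ages = [20, None, 30, 40, None]

by_average = impute(ages, "average")
by_minimum = impute(ages, "minimum")
by_maximum = impute(ages, "maximum")
print(by_average, by_minimum, by_maximum)
```

Unlike filtering out incomplete cases, imputation keeps every row, at the cost of inserting values that were never observed.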

81
Q

The imputation method is selected in the ____.

82
Q

Use the ______ operator to create a RapidMiner data set from the process

83
Q

Use the ______ operator to store the data in a format you want.