final exam Flashcards

1
Q

Append

A

Add records from one dataset to another

2
Q

Merge

A

Add fields from one dataset to another
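
Cards 1 and 2 are easy to confuse, so here is a minimal sketch of the difference in plain Python. The datasets, field names, and the `append` / `merge` helpers are hypothetical illustrations, not the API of any particular tool.

```python
# Two small hypothetical datasets, each a list of records (dicts).
customers_2023 = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
customers_2024 = [{"id": 3, "name": "Cy"}]
purchases = [{"id": 1, "total": 40.0}, {"id": 2, "total": 15.5}]

def append(first, second):
    """Append: stack the records; the set of fields stays the same."""
    return first + second

def merge(left, right, key):
    """Merge: add the fields of `right` to the matching records of `left`."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]

appended = append(customers_2023, customers_2024)  # more records
merged = merge(customers_2023, purchases, "id")    # more fields
```

Append makes the dataset longer (3 records, still 2 fields); merge makes it wider (2 records, now 3 fields).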

3
Q

Rectangular Data

A

Product of records and fields

4
Q

Stages of CRISP-DM

A

Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment

5
Q

SuperNode

A

Condensing several nodes into a single node

6
Q

Histogram is used for which fields

A

Continuous Fields

7
Q

Distribution is used for which fields

A

Categorical Fields

8
Q

Direct link between 2 variables

A

Causation

9
Q

2 variables change at a certain rate in relation to each other

A

Correlation

10
Q

Data point that deviates far from the other observations

A

Outlier/Extreme

11
Q

Values 3-5 SD from the mean

A

Outlier

12
Q

Values more than 5 SD away from the mean

A

Extreme (Outlier)
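
The thresholds on cards 11 and 12 can be sketched as a small classifier. The `classify` function and the sample numbers below are hypothetical; the cutoffs (3 to 5 SD for an outlier, more than 5 SD for an extreme) are the ones stated on the cards.

```python
def classify(value, mean, sd):
    """Label a value by its distance from the mean, in standard deviations."""
    distance = abs(value - mean) / sd
    if distance > 5:
        return "extreme"
    if distance >= 3:
        return "outlier"
    return "normal"

# Hypothetical field with mean 100 and standard deviation 10.
labels = [classify(v, 100, 10) for v in (104, 135, 160)]
# 104 is 0.4 SD away, 135 is 3.5 SD away, 160 is 6 SD away.
```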

13
Q

2 or more categories that can be ranked (School ranking)

A

Ordinal Ranking

14
Q

2 or more categories that have no inherent order (people's favorite color)

A

Nominal Ranking

15
Q

All the numbers added together, then divided by how many numbers there are (average)

A

Mean

16
Q

The number in the middle when you arrange a set from least to greatest

A

Median

17
Q

The number that appears most often in the set

A

Mode
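
The three measures on cards 15 to 17 can be checked with Python's standard `statistics` module. The list of values is a made-up example.

```python
import statistics

values = [7, 3, 5, 3, 2]  # hypothetical data

mean = statistics.mean(values)      # sum of the values divided by their count
median = statistics.median(values)  # middle value once sorted: 2, 3, 3, 5, 7
mode = statistics.mode(values)      # value that appears most often
```

Here the mean is 4 (20 divided by 5), while the median and the mode are both 3.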

18
Q

1) Rank
2) Fractional rank as %
3) Sum of case weights
4) Savage score
5) Fractional rank

A

Options for ranking models

19
Q

Estimates and compares models for continuous numeric range outcomes

A

AutoNumeric

20
Q

Estimates and compares models for either nominal or binary targets

A

AutoClassifier

21
Q

When testing multiple models, this node presents the results for all models on both partitions, so you can easily determine which model performed best

A

Running an Analysis Node

22
Q

1) Undefined values represented as $null$
2) White space
3) Values that are not in the allowed set of values

A

Invalid Values

23
Q

Define an area of certain size (Space-time-box)

24
Q

Models in this category predict a target field, using one or more predictors

A

Supervised Models

25
Q

These models create groups of records with similar values on the input fields

A

Unsupervised Models (Segmentation)

26
Q

The process of extracting valuable insights from large datasets. It helps organizations make data-driven decisions and understand customer behavior. By uncovering patterns and relationships in data, businesses can gain a competitive edge and improve efficiency.

A

Data Mining

27
Q

Modify unit of analysis, remove duplicates, create a dataset with one record per customer

A

Distinct Node

28
Q

Only data from records present in all source datasets will be merged

A

Inner Join

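An inner join can be sketched in plain Python as "keep only the keys that appear in every dataset". The `inner_join` helper, the datasets, and the field names are hypothetical and only illustrate the idea.

```python
def inner_join(datasets, key):
    """Keep only keys found in every dataset, then combine their fields."""
    shared = set.intersection(*({row[key] for row in data} for data in datasets))
    joined = {k: {} for k in sorted(shared)}
    for data in datasets:
        for row in data:
            if row[key] in joined:
                joined[row[key]].update(row)
    return list(joined.values())

demographics = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 3, "age": 28}]
orders = [{"id": 2, "total": 15.5}, {"id": 3, "total": 80.0}, {"id": 4, "total": 9.0}]

result = inner_join([demographics, orders], "id")
```

Ids 1 and 4 are dropped because each appears in only one of the two datasets.
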
29
Q

Automatically create new nominal fields based on the values of one or more existing continuous fields

A

Binning Node

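A minimal sketch of fixed-width binning, one common way to turn a continuous field into a nominal one. The `bin_fixed_width` helper and the age values are hypothetical; a real binning tool typically offers other methods as well.

```python
def bin_fixed_width(value, width, start=0):
    """Return a nominal label such as '[20, 30)' for a continuous value."""
    low = start + int((value - start) // width) * width
    return f"[{low}, {low + width})"

ages = [23, 37.5, 41, 58]  # hypothetical continuous field
age_bands = [bin_fixed_width(a, 10) for a in ages]
```

Each label names a half-open interval, so 30 itself would fall in "[30, 40)" rather than "[20, 30)".
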
30
Q

1) First n: returns/discards the first n records in the dataset
2) 1-in-n: every nth record is selected/discarded
3) Random %: each record has an r% probability of being selected/discarded

A

Sample Node

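The three sampling methods on this card can be sketched as follows. The function names are hypothetical, and only the "select" mode is shown; "discard" would keep the complement.

```python
import random

def first_n(records, n):
    """Select the first n records."""
    return records[:n]

def one_in_n(records, n):
    """Select every nth record (the nth, 2nth, 3nth, ...)."""
    return records[n - 1::n]

def random_percent(records, percent, seed=None):
    """Select each record independently with the given percent probability."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < percent / 100]

records = list(range(1, 11))  # hypothetical records numbered 1 to 10
head = first_n(records, 3)
every_third = one_in_n(records, 3)
roughly_half = random_percent(records, 50, seed=0)
```

Note that Random % gives a sample whose size varies from run to run, unlike the other two methods.
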
31
Q

Measures how far values typically spread from the center (mean)

A

Standard Deviation

32
Q

Sample from groups of records rather than from individual records

A

Clustered Sample

33
Q

Sample independently within subgroups

A

Stratified Sample

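Clustered and stratified sampling are easy to mix up, so here is a minimal sketch of both. The helper names, the `store` field, and the data are hypothetical.

```python
import random
from collections import defaultdict

def group_by(records, field):
    """Collect records into lists keyed by the value of `field`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[field]].append(r)
    return groups

def stratified_sample(records, field, per_group, seed=None):
    """Sample independently inside every subgroup."""
    rng = random.Random(seed)
    sample = []
    for members in group_by(records, field).values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample

def clustered_sample(records, field, n_clusters, seed=None):
    """Pick whole groups at random and keep all of their records."""
    rng = random.Random(seed)
    groups = group_by(records, field)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [r for key in chosen for r in groups[key]]

# Hypothetical data: 3 stores with 4 customers each.
records = [{"store": s, "customer": i} for s in "ABC" for i in range(4)]
strat = stratified_sample(records, "store", per_group=2, seed=1)
clust = clustered_sample(records, "store", n_clusters=1, seed=1)
```

The stratified sample contains 2 customers from every store; the clustered sample contains all 4 customers from a single store.
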
34
Q

Relationship between 2 Categorical Fields

A

Matrix, Distribution

35
Q

Relationship between 1 Categorical Field and 1 Continuous Field

A

Means, Histogram

36
Q

Relationship between 2 Continuous Fields

A

Statistics, Plot

37
Q

Process of reading or specifying information such as measurement levels and values for a field

A

Instantiating Data

38
Q

A field has unknown storage

A

Uninstantiated

39
Q

K-Means, Kohonen, Two-Step

A

3 segmentation methods