Final Exam Flashcards

1
Q

Append

A

Add records from one dataset to another

2
Q

Merge

A

Add fields from one dataset to another
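The difference between Append and Merge can be sketched in plain Python. This is only an illustration, not any tool's actual implementation: the datasets, the `id` key, and the helper names `append` and `merge` are hypothetical.

```python
# Hypothetical datasets: each record is a dict, each dict key is a field.
customers_a = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
customers_b = [{"id": 3, "name": "Cy"}]
ages = {1: 34, 2: 51}  # an extra field, keyed by record id


def append(first, second):
    """Append: add records; the set of fields stays the same."""
    return first + second


def merge(records, extra, field):
    """Merge: add a field to each record by matching on the id key."""
    return [{**r, field: extra.get(r["id"])} for r in records]


appended = append(customers_a, customers_b)  # more records, same fields
merged = merge(customers_a, ages, "age")     # same records, more fields
```

Append makes the dataset longer, while Merge makes it wider.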

3
Q

Rectangular Data

A

Product of records and fields

4
Q

Stages of CRISP-DM

A

Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment

5
Q

SuperNode

A

Condensing several nodes into a single node

6
Q

A Histogram is used for which fields?

A

Continuous Fields

7
Q

A Distribution is used for which fields?

A

Categorical Fields

8
Q

Direct link between 2 variables

A

Causation

9
Q

2 variables change at a certain rate in relationship to each other

A

Correlation

10
Q

Data point that deviates markedly from the other observations

A

Outlier/Extreme

11
Q

Values 3-5 SD from the mean

A

Outlier

12
Q

Values more than 5 SD away from the mean

A

Extreme (Outlier)
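The two thresholds on these cards (3 to 5 SD for an outlier, more than 5 SD for an extreme) can be sketched as a small function. The function name `classify` and the choice to treat exactly 3 and exactly 5 SD as outliers are assumptions made for illustration.

```python
def classify(value, mean, sd):
    """Label a value by its distance from the mean in standard deviations."""
    z = abs(value - mean) / sd  # how many SDs the value is from the mean
    if z > 5:
        return "extreme"
    if z >= 3:
        return "outlier"
    return "normal"


# Hypothetical field with mean 0 and standard deviation 2.
labels = [classify(v, mean=0, sd=2) for v in (4, 7, 10, 11)]
```

With these inputs the distances are 2, 3.5, 5, and 5.5 SD respectively.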

13
Q

2 or more categories that can be ranked (School ranking)

A

Ordinal Ranking

14
Q

2 or more categories that cannot be ranked because they have no order (people's favorite color)

A

Nominal Ranking

15
Q

All the numbers added together, then divided by how many numbers there are (average)

A

Mean

16
Q

The number in the middle when you arrange a set from least to greatest

A

Median

17
Q

The number that appears most often in the set

A

Mode
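Mean, median, and mode can be checked on a small made-up set of numbers with Python's standard `statistics` module. The values below are hypothetical and chosen so that the three measures are easy to verify by hand.

```python
from statistics import mean, median, mode

values = [2, 3, 3, 5, 12]  # hypothetical data, already sorted

avg = mean(values)      # (2 + 3 + 3 + 5 + 12) / 5
middle = median(values) # the third of five sorted values
common = mode(values)   # the value that appears most often
```

Note how the single large value (12) pulls the mean above the median.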

18
Q

1) Rank
2) Fractional rank as %
3) Sum of case weights
4) Savage score
5) Fractional rank

A

Options for ranking models

19
Q

Estimates and compares models for continuous numeric range outcomes

A

AutoNumeric

20
Q

Estimates and compares models for either nominal or binary targets

A

AutoClassifier

21
Q

When testing multiple models, this node presents the results for all models and for both partitions, so you can easily determine which model performed best

A

Running an Analysis Node

22
Q

1) Undefined values represented as $null$
2) White space
3) Values that are not in the allowed set of values

A

Invalid Values

23
Q

Define an area of certain size (Space-time-box)

A

Geohash

24
Q

Models in this category predict a target field, using one or more predictors

A

Supervised Models

25
Q

These models create groups of records with similar values on the input fields

A

Unsupervised Models (Segmentation)

26
Q

The process of extracting valuable insights from large datasets. It helps organizations make data-driven decisions and understand customer behavior. By uncovering patterns and relationships in data, businesses can gain a competitive edge and improve efficiency.

A

Data Mining

27
Q

Modify unit of analysis, remove duplicates, create a dataset with one record per customer

A

Distinct Node

28
Q

Only data from records present in all source datasets will be merged

A

Inner Join
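An inner join can be sketched with two dictionaries keyed by a shared id. This is a generic illustration with hypothetical data and a hypothetical helper `inner_join`, not the merge logic of any particular tool.

```python
# Hypothetical source datasets, keyed by customer id.
customers = {1: "Ann", 2: "Bob", 3: "Cy"}
orders = {2: 40.0, 3: 15.5, 4: 99.0}


def inner_join(left, right):
    """Keep only the keys present in both datasets, pairing their values."""
    return {key: (left[key], right[key]) for key in left if key in right}


joined = inner_join(customers, orders)
```

Ids 1 and 4 are dropped because each appears in only one of the two sources.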

29
Q

Automatically create new nominal fields based on the values of one or more existing continuous fields

A

Binning Node
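The idea of binning, turning a continuous value into a nominal category, can be sketched with fixed cut points. The cut points, labels, and the helper `bin_value` are hypothetical, and a real binning node may offer other methods (for example equal-width or equal-count bins).

```python
from bisect import bisect_right

edges = [18, 65]                       # hypothetical cut points for an age field
labels = ["minor", "adult", "senior"]  # one more label than there are edges


def bin_value(value, edges, labels):
    """Return the label of the bin that the continuous value falls into."""
    # bisect_right counts how many edges are <= value, which is the bin index.
    return labels[bisect_right(edges, value)]


age_bins = [bin_value(age, edges, labels) for age in (17, 18, 30, 65)]
```

A value equal to a cut point falls into the higher bin in this sketch.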

30
Q

1) First m: returns/discards the first m records in the dataset
2) 1-in-n: every nth record is selected/discarded
3) Random %: each record has an r% probability of being selected/discarded

A

Sample Node (Simple method)
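The three options on this card can be sketched as plain Python functions. The function names, the `seed` parameter, and the sample data are assumptions for illustration; only the "select" behavior is shown, and "discard" would be the complement.

```python
import random


def first_m(records, m):
    """First m: keep the first m records."""
    return records[:m]


def one_in_n(records, n):
    """1-in-n: keep every nth record (the nth, 2nth, 3nth, ...)."""
    return records[n - 1::n]


def random_pct(records, pct, seed=None):
    """Random %: keep each record independently with probability pct%."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < pct / 100]


records = list(range(1, 11))  # hypothetical records numbered 1 to 10
head = first_m(records, 3)
every_third = one_in_n(records, 3)
about_half = random_pct(records, 50, seed=42)
```

Unlike the first two options, the random option does not return a fixed number of records, because each record is kept or dropped independently.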

31
Q

Measures how far values typically lie from the center (mean)

A

Standard Deviation

32
Q

Sample from groups of records rather than from individual records

A

Clustered Sample

33
Q

Sample independently within subgroups

A

Stratified Sample
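The contrast between clustered and stratified sampling can be sketched as follows. The `region` field, the helper names, and the fixed seeds are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def group_by(records, key):
    """Collect records into lists keyed by the value of the given field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return groups


def stratified_sample(records, key, per_group, seed=0):
    """Stratified: sample independently within every subgroup."""
    rng = random.Random(seed)
    sample = []
    for members in group_by(records, key).values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


def clustered_sample(records, key, n_clusters, seed=0):
    """Clustered: pick whole groups at random and keep all their records."""
    rng = random.Random(seed)
    groups = group_by(records, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [record for cluster in chosen for record in groups[cluster]]


# Hypothetical data: nine records in three regions.
records = [{"id": i, "region": r} for i, r in enumerate("AAABBBCCC")]
strat = stratified_sample(records, "region", per_group=1)
clust = clustered_sample(records, "region", n_clusters=1)
```

The stratified sample contains one record from every region, while the clustered sample contains every record from a single region.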

34
Q

Relationship between 2 Categorical Fields

A

Matrix, Distribution

35
Q

Relationship between 1 Categorical Field and 1 Continuous Field

A

Means, Histogram

36
Q

Relationship between 2 Continuous Fields

A

Statistics, Plot

37
Q

Process of reading or specifying information such as measurement levels and values for a field

A

Instantiating Data

38
Q

A field has unknown storage

A

Uninstantiated

39
Q

K-Means, Kohonen, TwoStep

A

3 segmentation methods