Final Exam Flashcards
Append
Add records from one dataset to another
Merge
Add fields from one dataset to another
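The difference between Append and Merge can be sketched in plain Python by treating each dataset as a list of records (one dict of field -> value per record). The function names append_records and merge_fields and the sample data are hypothetical. The merge shown is an inner join on one key field. This illustrates the idea only and is not how SPSS Modeler implements its Append or Merge nodes.

```python
# Hypothetical helpers: each dataset is a list of records (dicts of field -> value).

def append_records(first, second):
    """Append: stack the records of `second` under the records of `first`."""
    return first + second


def merge_fields(left, right, key):
    """Merge: add the fields of `right` to the matching records of `left`.

    Assumes `key` is unique in `right`; records without a match are dropped
    (an inner join).
    """
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = lookup.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


jan = [{"id": 1, "sales": 100}, {"id": 2, "sales": 80}]
feb = [{"id": 3, "sales": 120}]
regions = [{"id": 1, "region": "East"}, {"id": 2, "region": "West"}]

print(append_records(jan, feb))          # 3 records, still 2 fields each
print(merge_fields(jan, regions, "id"))  # 2 records, now 3 fields each
```

Append grows the number of records; Merge grows the number of fields.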
Rectangular Data
Data laid out as a table of records (rows) by fields (columns)
Stages of CRISP-DM
Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
SuperNode
Condensing several nodes into a single node
Histogram is used for which fields
Continuous Fields
Distribution is used for which fields
Categorical Fields
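The counting behind the two charts can be sketched like this: a Histogram groups a continuous field into equal-width bins, while a Distribution counts each category of a categorical field. The function names, the choice of equal-width bins, and the sample values are assumptions for illustration.

```python
from collections import Counter


def histogram_counts(values, bins):
    """Count continuous values in `bins` equal-width bins (Histogram idea).

    Assumes the values contain at least two distinct numbers.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


def distribution_counts(categories):
    """Count how often each category occurs (Distribution idea)."""
    return Counter(categories)


amounts = [1, 2, 2, 3, 9, 10]
colors = ["red", "blue", "red"]
print(histogram_counts(amounts, bins=3))
print(distribution_counts(colors))
```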
A direct cause-and-effect link between 2 variables (one variable changes the other)
Causation
2 variables change together at a consistent rate in relation to each other
Correlation
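One common way to measure correlation is Pearson's r, sketched below with hypothetical data. A value near 1 or -1 means the two variables move together linearly; on its own it does not show causation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]
score = [2, 4, 6, 8, 10]
print(pearson_r(hours, score))  # 1.0: perfect positive linear relationship
```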
A data point that deviates markedly from the other observations
Outlier/Extreme
Values 3-5 SD from the mean
Outlier
Values more than 5 SD away from the mean
Extreme (Outlier)
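The 3 SD and 5 SD cut-offs can be sketched as a small labelling function. Estimating the mean and SD from reference data with the sample standard deviation, and counting exactly 3 SD as an outlier, are assumptions of this sketch; the function name is hypothetical.

```python
import statistics


def classify_by_sd(value, mean, sd):
    """Label a value by its distance from the mean in standard deviations.

    3-5 SD is an outlier and more than 5 SD is an extreme, as on the cards.
    Treating exactly 3 SD as an outlier is an assumption.
    """
    distance = abs(value - mean) / sd
    if distance > 5:
        return "extreme"
    if distance >= 3:
        return "outlier"
    return "normal"


# Mean and sample SD estimated from hypothetical reference data.
reference = [9, 10, 11, 10, 10]
m = statistics.mean(reference)
s = statistics.stdev(reference)
for v in [10.5, 13.0, 20.0]:
    print(v, classify_by_sd(v, m, s))
```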
2 or more categories that can be ranked (School ranking)
Ordinal Ranking
2 or more categories that have no natural order and cannot be ranked (people's favorite color)
Nominal Ranking
All the numbers added together, then divided by how many numbers there are (average)
Mean
The number in the middle when you arrange a set from least to greatest (with an even count, the average of the two middle numbers)
Median
The number that appears most often in the set
Mode
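The three definitions can be written out directly in Python as a quick check on a hypothetical data set. Breaking ties in the mode by taking the first value seen is an assumption of this sketch.

```python
from collections import Counter


def mean(values):
    # Add all the numbers, then divide by how many there are.
    return sum(values) / len(values)


def median(values):
    # Sort, then take the middle value (the average of the two middle
    # values when the count is even).
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    # Most frequent value; on a tie, Counter returns the first one seen.
    return Counter(values).most_common(1)[0][0]


data = [7, 3, 10, 2, 3, 5]
print(mean(data), median(data), mode(data))  # 5.0 4.0 3
```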
1) Rank
2) Fractional rank as %
3) Sum of case weights
4) Savage score
5) Fractional rank
Options for ranking models
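The five ranking options above can be sketched for a single field, assuming no tied values and a case weight of 1 for every case (so the sum of case weights is simply the number of cases). The Savage score uses the textbook exponential-based formula; how SPSS handles ties and weights may differ. The function name and data are hypothetical.

```python
def rank_options(values):
    """Compute the five ranking options for one field.

    Assumes no ties and a case weight of 1 for every case.
    """
    n = len(values)  # sum of case weights when every weight is 1
    order = sorted(range(n), key=lambda i: values[i])
    rank = [0] * n
    for position, i in enumerate(order, start=1):
        rank[i] = position  # 1 = smallest value
    fractional = [r / n for r in rank]
    percent = [f * 100 for f in fractional]
    # Savage score: sum of 1/(n - j + 1) for j = 1..rank, minus 1.
    savage = [sum(1 / (n - j + 1) for j in range(1, r + 1)) - 1 for r in rank]
    return {
        "rank": rank,
        "fractional_rank": fractional,
        "fractional_rank_pct": percent,
        "sum_of_case_weights": [n] * n,
        "savage": savage,
    }


result = rank_options([30, 10, 20])
for name, column in result.items():
    print(name, column)
```

With no ties, the Savage scores of all cases add up to 0.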
Estimates and compares models for continuous numeric range outcomes
AutoNumeric
Estimates and compares models for either nominal or binary targets
AutoClassifier
When testing multiple models, this node presents the results for all models on both partitions (training and testing), so you can easily determine which model performed best
Running an Analysis Node
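The core comparison an Analysis node makes can be sketched as accuracy per model per partition. The field names (partition, actual, model_a, model_b) and records are hypothetical, and the real node reports more than accuracy.

```python
from collections import defaultdict


def accuracy_by_partition(records, model_fields, target="actual", partition="partition"):
    """Share of correct predictions for each (model, partition) pair."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        for model in model_fields:
            key = (model, rec[partition])
            totals[key] += 1
            if rec[model] == rec[target]:
                correct[key] += 1
    return {key: correct[key] / totals[key] for key in totals}


scored = [
    {"partition": "training", "actual": "yes", "model_a": "yes", "model_b": "no"},
    {"partition": "training", "actual": "no", "model_a": "no", "model_b": "no"},
    {"partition": "testing", "actual": "yes", "model_a": "yes", "model_b": "yes"},
    {"partition": "testing", "actual": "no", "model_a": "no", "model_b": "yes"},
]
acc = accuracy_by_partition(scored, ["model_a", "model_b"])
for (model, part), value in sorted(acc.items()):
    print(f"{model:8} {part:9} {value:.0%}")
```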
1) Undefined values represented as $null$
2) White space
3) Values that are not in the allowed set of values
Invalid Values
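The three kinds of invalid values can be checked with one small function. Using Python's None as a stand-in for $null$, and counting the empty string as white space, are assumptions of this sketch; the function name and sample values are hypothetical.

```python
def is_invalid(value, allowed=None):
    """Return True if `value` is invalid under the three rules on the card.

    None stands in for $null$, and an empty string counts as white space
    (both assumptions of this sketch).
    """
    if value is None:  # 1) undefined value ($null$)
        return True
    if isinstance(value, str) and value.strip() == "":  # 2) white space only
        return True
    if allowed is not None and value not in allowed:  # 3) not an allowed value
        return True
    return False


sizes = ["S", "M", "  ", None, "XXL", "L"]
allowed_sizes = {"S", "M", "L"}
print([v for v in sizes if is_invalid(v, allowed_sizes)])
```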
Defines an area of a certain size (Space-Time-Box)
Geohash
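A geohash encodes a latitude/longitude pair as a short string, and each string names a rectangular cell. The sketch below is the standard public geohash encoding (base-32 alphabet, alternating longitude and latitude bisection), not SPSS Modeler's internal code. Each extra character shrinks the cell, so the hash length sets the size of the area.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=11):
    """Encode a point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=5))  # u4pru
```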
Models in this category predict a target field, using one or more predictors
Supervised Models
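A minimal example of a supervised model: fit a straight line that predicts a numeric target from one predictor by ordinary least squares. This is just one simple member of the category, chosen for illustration; the helper names and data are hypothetical.

```python
def fit_line(predictor, target):
    """Learn a slope and intercept from labelled examples (least squares)."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(target) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, target))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # target values are known for training
print(model, predict(model, 5))  # (2.0, 1.0) 11.0
```

The target is known for every training record; that is what makes the model supervised.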