Exam1 Flashcards

Question

One challenge of using AI for predictions is that AI uses _____ data

Answer 1

Historical (e.g., how would an AI model handle an unanticipated large event like COVID?)

Answer 2

limits of technological maturity (memory space, computational power)

Answer 3

the end goal in mind: who will use this model and why

Answer 4

data collected in its original form, prior to any processing or adjustments

Answer 5

- Descriptive
- Predictive
- Prescriptive

Answer 6

Predictive analytics only predicts the future (forecasts, etc.); prescriptive analytics changes the future (control, optimization, etc.)

Answer 7

- numeric vs. non-numeric
- categorical data (e.g., fault or no-fault)
- structured vs. unstructured
- temporal, spatial, spatio-temporal
- experimental vs. operational

Answer 8

experimental data isolates a single variable (or a few) from the other variables, while operational data is much more affected by the surrounding environment, which was not controlled

Answer 9

data that challenges the current capabilities of a single computing unit

Answer 10

- metered data
- sub-metering
- communications
- measured data
- data storage

Answer 11

Cross-Industry Standard Process for Data Mining (CRISP-DM)

Answer 12

an instance

Answer 13

the science of analyzing raw data to draw insights and make conclusions from that data

Answer 14

1. Access data
2. Detect duty cycles
3. Remove outliers
4. Sanitize gaps
5. Check process limits
6. Analyze data...

Answer 15

the joint variability of 2 variables (how they vary together)

Answer 16

no apparent linear statistical dependence between the 2 variables

Answer 17

varies inversely with the other variable

Answer 18

- normalized to the range -1 to +1
- unitless

Answer 19

cov(A, B) / (std_dev(A) * std_dev(B))
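
A minimal standard-library sketch of covariance and the correlation formula on this card. It assumes the sample (n - 1) denominator; because that denominator appears in the covariance and in both standard deviations, it cancels in the correlation. The function names `covariance` and `correlation` are illustrative, not from any specific library.

```python
import math


def covariance(a, b):
    """Sample covariance (n - 1 denominator) of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def correlation(a, b):
    """Pearson correlation: cov(A, B) / (std_dev(A) * std_dev(B)), unitless, in [-1, 1]."""
    # cov(A, A) is the variance of A, so its square root is std_dev(A)
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))


a = [1, 2, 3, 4]
b = [2, 4, 6, 8]
print(covariance(a, b), correlation(a, b))
```

Multiplying `a` by 100 scales the covariance by 100 but leaves the correlation unchanged, which is what "unitless" means in practice.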

Answer 20

data points that are significantly different from the rest of the data set

Answer 21

Z-score, where Z is the standardized equivalent of the data value: Z = (x - x_mean) / std_dev
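
A sketch of Z-score outlier removal, assuming the sample standard deviation and a cutoff of 3 standard deviations from the mean. `z_scores` and `remove_outliers` are hypothetical helper names, and constant data (standard deviation of 0) is not handled.

```python
import statistics


def z_scores(values):
    """Standardize each value: z = (x - mean) / std_dev (sample standard deviation)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


def remove_outliers(values, threshold=3.0):
    """Keep only values whose |z| is at most `threshold` standard deviations."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) <= threshold]


data = [10.0] * 19 + [50.0]
print(remove_outliers(data))  # the 50.0 reading is dropped
```

With the sample standard deviation, |z| can never exceed (n - 1)/sqrt(n), so a cutoff of 3 cannot flag anything in a data set of fewer than 11 points.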

Answer 22

Minimum Covariance Determinant (MCD)

Answer 23

remove outliers from multivariate samples (minimum covariance determinant)
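
To show what the Minimum Covariance Determinant is, here is a small exhaustive sketch for 2-D data. It tries every subset of h points, keeps the subset whose covariance matrix has the smallest determinant, and flags points whose squared Mahalanobis distance from that robust fit exceeds a chi-square cutoff. Several choices here are assumptions: h = (n + p + 1) // 2, a population covariance, and a 97.5% cutoff. The code also assumes no h points are collinear, since a zero determinant cannot be inverted. All function names are hypothetical. Brute force is only practical for tiny data sets. Real use would rely on something like scikit-learn's `sklearn.covariance.MinCovDet`, which uses the FAST-MCD algorithm and adds correction and reweighting steps.

```python
import itertools
import math


def mean_cov_2d(pts):
    """Mean and population covariance entries (sxx, syy, sxy) of 2-D points."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    return (mx, my), (sxx, syy, sxy)


def mahalanobis_sq(pt, mean, cov):
    """Squared Mahalanobis distance via the closed-form inverse of a 2x2 covariance."""
    sxx, syy, sxy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = pt[0] - mean[0], pt[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def exact_mcd_outliers(pts, quantile=0.975):
    """Exhaustive MCD: pick the h-subset with the smallest covariance determinant,
    then flag every point that lies far from that robust mean and covariance."""
    n, p = len(pts), 2
    h = (n + p + 1) // 2  # assumed subset size (maximum breakdown point)

    def det_of(subset):
        _, (sxx, syy, sxy) = mean_cov_2d([pts[i] for i in subset])
        return sxx * syy - sxy ** 2

    support = min(itertools.combinations(range(n), h), key=det_of)
    mean, cov = mean_cov_2d([pts[i] for i in support])
    cutoff = -2 * math.log(1 - quantile)  # chi-square quantile, 2 degrees of freedom
    flagged = [i for i, pt in enumerate(pts) if mahalanobis_sq(pt, mean, cov) > cutoff]
    return support, flagged


data = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0.5, 0), (40, 10)]
support, flagged = exact_mcd_outliers(data)
print(support, flagged)
```

The point (40, 10) is never part of the support and is always flagged. With a sample this small, the raw estimate (no correction or reweighting) underestimates the scatter, so it may also flag an inlier that was left out of the support.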

Answer 24

The process of identifying missing data, then creating a substitute

Answer 25

- missing data is generally not allowed in training data sets
- throwing out entire data points could throw out useful data
- statistical techniques could be biased by missing data

Answer 26

3 (i.e., 3 standard deviations from the mean)

Answer 27

Bias into subsequent modeling

Answer 28

1. Throw it out
2. Fill in the gap

Answer 29

- simple statistics (mean, median, or a constant)
- multivariate imputation with Bayesian statistics
- k-nearest neighbor imputation
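
A standard-library sketch of two of the gap-filling options listed above: simple-statistic imputation (mean or median) and k-nearest-neighbor imputation. Missing values are assumed to be stored as `None`. The kNN version measures Euclidean distance only over the features the incomplete row actually has, then averages the neighbors' values for each missing feature. Both function names are hypothetical, and Bayesian multivariate imputation is not shown. For real work, scikit-learn provides `SimpleImputer` and `KNNImputer` in `sklearn.impute`.

```python
import math
import statistics


def impute_simple(column, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = statistics.mean(observed) if strategy == "mean" else statistics.median(observed)
    return [fill if v is None else v for v in column]


def impute_knn(rows, k=2):
    """Fill None entries from the k nearest complete rows.

    Distance is Euclidean over the features the incomplete row does have;
    each missing feature gets the mean of that feature across the neighbors.
    """
    complete = [r for r in rows if None not in r]
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        neighbors = sorted(
            complete,
            key=lambda other: math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed)),
        )[:k]
        filled.append([
            v if v is not None else statistics.mean(n[j] for n in neighbors)
            for j, v in enumerate(row)
        ])
    return filled


print(impute_simple([1.0, None, 3.0]))
print(impute_knn([[1.0, 10.0], [2.0, 20.0], [10.0, 100.0], [1.5, None]], k=2))
```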

Answer 30

- does the data include info that can predict the target?
- does the granularity of the training and prediction data match?
- is there labeled data?
- is the data accurate? Do you know where it came from?
- is it easily accessible and readable?
- are the missing values a small percentage of the fields of interest?

Answer 31

a sequence of explicit instructions which perform a specific task

Answer 32

asymptotic

Answer 33

Machine Learning

Answer 34

the study and use of algorithms and statistical models that computer systems use to learn how to perform specific tasks without explicit instructions

Answer 35

Deep Learning

Answer 36

Comp Sci; Optimization; Statistics

Answer 37

Clustering

Answer 38

has an associated category assigned to a specific set of features in the data set

Answer 39

belongs to only 1 cluster

Answer 40

k-means, hierarchical

Answer 41

soft clustering

Answer 42

- exploratory data analysis
- dimensional (feature) reduction
- image segmentation
- anomaly detection
- data mining

Answer 43

d = sqrt( ( x1 - x2)^2 + (y1-y2)^2 )
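
The same formula generalizes to any number of dimensions by summing squared differences over every coordinate. `euclidean` is an illustrative name. Python 3.8+ also provides `math.dist`, which computes the same distance.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean((1, 2), (4, 6)))        # 3-4-5 right triangle
print(euclidean((0, 0, 0), (1, 2, 2)))  # three dimensions
```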

Answer 44

the arithmetic mean of the points in each dimension

Answer 45

a smaller amount of data

Answer 46

- % drop in SSE falls below a threshold
- hard stop limit on iterations, to avoid infinite iteration, and/or a known goal
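
A minimal sketch of Lloyd's k-means that uses both stopping rules from the card above: a relative SSE drop below a threshold, and a hard iteration limit. Each centroid is updated to the arithmetic mean of its points in each dimension. The initial centroids are k randomly sampled data points; the card does not specify an initialization, so this is an assumption. The parameter names `max_iter`, `min_rel_drop`, and `seed` are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, min_rel_drop=1e-4, seed=0):
    """Lloyd's k-means on a list of coordinate tuples.

    Stops when the relative drop in SSE falls below min_rel_drop, or after
    max_iter iterations (hard stop), whichever comes first.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k data points as initial centroids
    prev_sse = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: arithmetic mean per dimension; an empty cluster keeps its old centroid
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        sse = sum(math.dist(p, centroids[j]) ** 2
                  for j, cl in enumerate(clusters) for p in cl)
        # Relative-drop stopping rule (skipped on the first pass)
        if prev_sse is not None and prev_sse - sse <= min_rel_drop * prev_sse:
            break
        prev_sse = sse
    return centroids, clusters, sse


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters, sse = kmeans(pts, 2)
print(sorted(centroids), sse)
```

Because the result depends on the starting centroids, k-means can settle in a local minimum of the SSE. In practice it is often rerun with several seeds, keeping the result with the lowest SSE.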

Answer 47

dendrogram
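
A dendrogram records the order in which hierarchical (agglomerative) clustering merges clusters, and the distance at which each merge happens. The sketch below produces that merge history. It assumes single linkage (distance between the closest members of two clusters), which is only one of several linkage choices. `single_linkage` is a hypothetical name, and the brute-force search is only meant for small data sets.

```python
import itertools
import math


def single_linkage(points):
    """Agglomerative clustering with single linkage.

    Returns the merge history as (cluster_a, cluster_b, height) tuples, where
    height is the linkage distance at which the two clusters were joined.
    """
    clusters = {i: [i] for i in range(len(points))}

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members
        return min(math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])

    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters, then record the merge height
        a, b = min(itertools.combinations(clusters, 2), key=lambda ab: linkage(*ab))
        merges.append((sorted(clusters[a]), sorted(clusters[b]), linkage(a, b)))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


for merge in single_linkage([(0, 0), (0, 1), (5, 0), (5, 2)]):
    print(merge)
```

Cutting the dendrogram at a chosen height (for example, between 2 and 5 here) yields that many clusters, so the number of clusters does not need to be fixed in advance.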

Answer 48

probabilistic technique

Answer 49

arithmetic mean

Answer 50

too complex, possibly with too many predictors

Answer 51

the model is more complex than the data

Answer 52

- fault detection
- predictive maintenance
- speech recognition

Answer 53

loss function

Answer 54

w_i = (1/dist_i) / (sum over j = 1..k of 1/dist_j)
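
A sketch of distance-weighted k-NN classification using this weight formula. Each of the k nearest neighbors adds w_i to its label's vote, and the label with the largest total wins. `knn_predict` and its list of `(features, label)` pairs are hypothetical. A query that exactly matches a training point simply takes that point's label, since 1/0 is undefined.

```python
import math
from collections import defaultdict


def knn_predict(train, query, k=3):
    """Distance-weighted k-NN: neighbor i votes with w_i = (1/d_i) / sum_j (1/d_j)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    dists = [math.dist(feats, query) for feats, _ in nearest]
    if dists[0] == 0:
        return nearest[0][1]  # exact match: inverse distance undefined, use its label
    inv = [1 / d for d in dists]
    total = sum(inv)
    votes = defaultdict(float)
    for w, (_, label) in zip(inv, nearest):
        votes[label] += w / total
    return max(votes, key=votes.get)


train = [((0, 0), "ok"), ((0, 1), "ok"), ((5, 5), "fault"), ((6, 5), "fault")]
print(knn_predict(train, (1, 1)), knn_predict(train, (4, 4)))
```

Weighting matters when the closest neighbor is outvoted by farther ones. With training points `((0, 0), "fault")`, `((3, 0), "ok")`, `((3.5, 0), "ok")` and the query `(0.5, 0)`, a plain majority vote would say "ok", but the weighted vote returns "fault".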

Answer 55

formula for the hypotenuse of a right triangle

Answer 56

- simple algorithm with flexible options (distance calculation method, value of k)
- considered a benchmark for other classification methods

Answer 57

- sensitive to outliers and erroneous labels
- memory intensive with larger k, more points, and more features (giant distance matrices)

Answer 58

the error just on the training set

Answer 59

- can handle non-linear responses
- excellent with categorical variables
- easy to understand for a small number of features
- once the model is built, classifying new data is computationally quick since it is just a series of binary decisions

Answer 60

- struggles with a large number of features and a small data set
- difficult to understand for a large number of features

Answer 61

probabilistic

Answer 62

- Governing (Risk Management, Standards, Responsibility)
- Designing (Automation, Sustainability, Design)
- Enabling (Data, Incentives, Education)

Answer 63

between 92 and 173 trillion by 2050

Answer 64

- Renewable power generation and demand forecasting
- Grid optimization and operation
- Management of energy demand and DER
- Materials discovery and innovation

Answer 65

to a local minimum of the SSE rather than the global minimum

Answer 66

hierarchical


(97 cards)