Exam 1 Flashcards

1
Q

What year was the term AI coined?

A

1955, in the proposal for the Dartmouth summer workshop (held in 1956)

2
Q

What is the definition of AI

A

Any system that exhibits behavior that could be interpreted as human intelligence

3
Q

Weak AI is also called

A

Narrow AI

4
Q

____ AI is good for systems that have predefined patterns to eliminate impossible options

A

Planning

4
Q

Strong AI is also called

A

General AI

4
Q

Definition of Weak AI

A

model that is confined to a narrow task

5
Q

What are some examples of weak AI tasks

A

Language to text processing; picture sorting

6
Q

Is Siri an example of weak or strong AI?

A

weak

7
Q

Definition of strong AI

A

the machine displays all the person-like behavior you’d expect from an artificial human (emotions, humor, etc.)

8
Q

What was an early name for the nodes in neural networks?

A

Perceptrons (Rosenblatt at Cornell)

9
Q

When did the term “Deep Learning” become popular?

A

1990s

9
Q

Reasons that machine learning has accelerated

A

Availability of data
Moore’s Law
IoT
Automated SW coding (sensors and controllers)

10
Q

In supervised learning, training and test data are

A

labeled

11
Q

3 categories of supervised learning

A
  • Binary classification
  • Multiclass classification
  • Regression analysis
12
Q

If you have massive amounts of unlabeled data, ____ algorithm could be a good choice

A

k-means clustering

13
Q

Bagging, boosting, and stacking are examples of

A

ensemble modeling

14
Q

Definition of bagging

A

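Train several versions of the same ML algorithm in parallel, each on a random bootstrap sample (drawn with replacement) of the training data (e.g., decision trees with different root nodes), then combine their results by averaging (regression) or voting (classification)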
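A minimal sketch of bagging (bootstrap aggregating) in plain Python, assuming 1-D regression data and a decision stump (a depth-1 tree) as the base model. `fit_stump` and `bagging` are hypothetical helper names, not a library API. Each stump sees a different bootstrap sample, so the stumps differ slightly, and their predictions are averaged.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (decision stump) on 1-D inputs.

    Tries each distinct x value (except the largest) as a threshold and keeps
    the split with the lowest sum of squared errors (SSE).
    """
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # only one distinct x value: predict the overall mean
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def bagging(xs, ys, n_models=25, seed=0):
    """Bootstrap aggregating: train stumps on resampled data, then average them."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n indices drawn with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    # Bagged prediction = average of the individual stumps' predictions
    return lambda x: mean(m(x) for m in models)

# Usage on a noiseless step function
xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5
predict = bagging(xs, ys)
print(predict(0), predict(9))
```

For classification, the average would be replaced by a majority vote; scikit-learn's `BaggingClassifier` and `BaggingRegressor` package the same idea.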

14
Q

Definition of Boosting

A

Train several models in sequence, where each new model focuses on the errors of the models before it (model 2 learns from model 1’s mistakes, etc.), to boost the accuracy of the combined result
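One common form of boosting is gradient boosting. The sketch below assumes squared-error loss, 1-D inputs, and decision stumps as the weak learners (`fit_stump` and `boost` are hypothetical names). Each round fits a new stump to the residual errors the ensemble still makes, which is the "model 2 learns from model 1" idea. AdaBoost, which instead reweights misclassified points, is another boosting family not shown here.

```python
def fit_stump(xs, ys):
    """Depth-1 regression tree: the single threshold split with the lowest SSE."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        m = sum(ys) / len(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting with squared loss: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)  # starting prediction: the mean of y
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Model k+1 learns from the mistakes (residuals) of models 1..k
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5
predict = boost(xs, ys)
print(round(predict(0), 3), round(predict(9), 3))
```

The `learning_rate` (an assumed value of 0.3 here) shrinks each stump's contribution, so more rounds are needed, but each step corrects the previous ones more cautiously.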

15
Q

Definition of Stacking

A

Combine several different ML algorithms by training a final (meta) model on their predictions to boost accuracy (e.g., k-NN on top of Naive Bayes)

16
Q

For abstract reasoning, a _____ reasoning system may be best

A

symbolic

17
Q

Definition of bias

A

the gap between the (average) predicted value and the actual outcome

18
Q

Definition of variance

A

how widely scattered the predicted values are (their +/- spread around the average prediction)
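A small sketch of bias and variance for a single input, assuming we have several predictions of the same point from models trained on different data samples. The numbers are made up for illustration, and `bias_and_variance` is a hypothetical helper.

```python
from statistics import mean, pvariance

def bias_and_variance(predictions, actual):
    """Bias: gap between the average prediction and the actual outcome.
    Variance: how scattered the predictions are around their own average."""
    avg = mean(predictions)
    return avg - actual, pvariance(predictions)

# Hypothetical predictions for one input from models trained on different samples
preds = [9.0, 11.0, 10.0, 12.0]
bias, var = bias_and_variance(preds, actual=10.0)
print(bias, var)  # 0.5 1.25
```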

19
Q

What is the Turing test

A

Can the machine fool a human into thinking it’s a human if it’s behind a wall?

20
Q

Big data is

A

unstructured data

21
Q

One challenge of using AI for predictions is that AI uses _____ data

A

Historical (e.g., how would an AI model handle an unanticipated large event like COVID-19?)

22
Q

One of the reasons AI didn’t take off in the 1960s and 1970s was

A

limits of technological maturity (memory space, computational power)

23
Q

When building an AI model, keep _____ in mind

A

the end goal: who will use this model and why

24
Q

raw data is

A

data collected in its original form, prior to any processing or adjustments

25
Q

3 types of Data analytics

A
  • Descriptive
  • Predictive
  • Prescriptive
26
Q

Difference between predictive and prescriptive models

A

Predictive models only predict the future (forecasts, etc.); prescriptive models change the future (control, optimization, etc.)

27
Q

Examples of types of data

A
  • numeric vs non-numeric
  • categorical data (ex fault or no-fault)
  • structured vs unstructured
  • temporal, spatial, spatio-temporal
  • experimental vs operational
28
Q

Experimental data differs from operational data critically in that ___

A

experimental data isolates a single variable (or a few) from the other variables, while operational data is much more affected by the surrounding, uncontrolled environment

29
Q

Definition of Big Data

A

data that challenges the current capabilities of a single computing unit

30
Q

What types of data would we encounter in energy systems

A
  • metered data
  • sub-metering
  • communications
  • measured data
  • data storage
31
Q

What does CRISP-DM stand for

A

Cross industry standard process for data mining

32
Q

An input is also sometimes referred to as __

A

an instance

33
Q

Definition of Data Analytics

A

the science of analyzing raw data to draw insights and make conclusions from that data

34
Q

Linear data cleaning workflow

A
  1. Access Data
  2. Detect Duty Cycles
  3. Remove Outliers
  4. Sanitize Gaps
  5. Check Process Limits
  6. Analyze data…
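A sketch of steps 4 and 5 of this workflow on a single time series, assuming gaps are stored as `None` and the process limits are known constants (the readings and limits below are illustrative, and the function names are hypothetical). Steps 1 to 3 and 6 are data- and equipment-specific and are left out; in practice, step 3 would replace each removed outlier with `None` so that step 4 fills it.

```python
def sanitize_gaps(series):
    """Step 4: fill interior gaps (None) by linear interpolation between known neighbors."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = out[a] + (i - a) / (b - a) * (out[b] - out[a])
    return out  # leading/trailing gaps have only one neighbor and stay None

def check_process_limits(series, low, high):
    """Step 5: indices of readings outside the (assumed) physical/process limits."""
    return [i for i, v in enumerate(series) if v is not None and not low <= v <= high]

# Hypothetical temperature readings (deg C) with gaps and one impossible value
readings = [20.0, None, 24.0, 25.0, None, None, 31.0, 500.0]
filled = sanitize_gaps(readings)
print(filled)
print(check_process_limits(filled, low=-40.0, high=150.0))  # [7]
```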
35
Q

In univariate stats, variance =

A

std_dev^2

36
Q

Covariance is…

A

a measure of how 2 variables vary together (their joint variability)

37
Q

Positive Covariance: variable A increases as variable B

A

increases

38
Q

In weak covariance, there is…

A

no apparent linear statistical dependence between the 2 variables

39
Q

Negative covariance, each variable “varies” ….

A

inversely to the other variable

40
Q

Unlike covariance, correlation is…

A
  • normalized to -1 to +1
  • unitless
41
Q

Correlation of A&B =

A

covariance of A&B / (std.devA*std.devB)
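A plain-Python sketch of variance, covariance, and correlation using the sample (n - 1) definitions; the function names are illustrative. The n - 1 factor cancels in the correlation, which is why correlation ends up unitless and bounded to -1..+1. Python 3.10+ also ships `statistics.covariance` and `statistics.correlation`.

```python
import math

def variance(xs):
    """Sample variance: the square of the sample standard deviation."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def covariance(xs, ys):
    """Sample covariance: how two variables vary together (sign gives the direction)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """cov(A, B) / (std_dev(A) * std_dev(B)): unitless and bounded to [-1, +1]."""
    return covariance(xs, ys) / (math.sqrt(variance(xs)) * math.sqrt(variance(ys)))

a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]  # rises with a: positive covariance
c = [10, 8, 6, 4, 2]  # falls as a rises: negative covariance
print(covariance(a, b), covariance(a, c))  # 5.0 -5.0
print(correlation(a, b), correlation(a, c))
```

Note that `covariance(a, a)` equals `variance(a)`, which is why the diagonal of a covariance matrix holds the variances.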

42
Q

____ increase modeling risk

A

outliers

43
Q

Outliers are:

A

data points that are significantly different from the rest of the data set

44
Q

What is the simplest outlier detection technique for univariate samples

A

Z-score: Z = (x - x_mean) / std_dev, the standardized equivalent of the data value
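A minimal sketch of Z-score outlier detection, assuming the population standard deviation and a cutoff of 3; `zscore_outliers` is a hypothetical helper. One caveat worth knowing: with the population standard deviation, |Z| can never exceed (n - 1)/sqrt(n), so in samples of 10 or fewer points a cutoff of 3 can never flag anything.

```python
from statistics import mean, pstdev

def zscore_outliers(values, cutoff=3.0):
    """Return the values whose |Z| = |x - mean| / std_dev exceeds the cutoff."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [x for x in values if abs(x - mu) / sigma > cutoff]

data = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
        11, 9, 10, 12, 10, 9, 11, 10, 10, 100]
print(zscore_outliers(data))  # [100]
```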

45
Q

MCD stands for

A

Minimum Covariance Determinant

46
Q

MCD can be used to

A

remove outliers from multivariate samples
(minimum covariance determinant)

47
Q

Definition of imputation

A

The process of identifying missing data, then creating a substitute

48
Q

Why is it important to impute data sets

A
  • missing data is generally not allowed in training data sets
  • throwing out entire data points could throw out useful data
  • statistical techniques could be biased by missing data
49
Q

The covariance and correlation matrices are

A

symmetric

50
Q

A typical Z score for outlier cutoff would be

A

3 (i.e., 3 standard deviations away from the mean)

51
Q

Imputed data inherently introduces

A

Bias into subsequent modeling

52
Q

What are the 2 options to deal with missing data

A
  1. throw it out
  2. fill in the gap
53
Q

What are some ways to impute?

A
  • simple statistics (use mean, median, a constant)
  • Multivariate imputation with Bayesian statistics
  • k-nearest neighbor imputation
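A plain-Python sketch of two of these options, simple-statistics imputation and k-nearest-neighbor imputation, assuming missing values are stored as `None`. The function names are hypothetical; scikit-learn offers `SimpleImputer`, `KNNImputer`, and the experimental `IterativeImputer` (a multivariate imputer) for the same ideas.

```python
import math
from statistics import mean, median

def impute_simple(column, strategy="mean", fill_value=None):
    """Replace None entries with the column mean, median, or a constant."""
    present = [v for v in column if v is not None]
    if strategy == "mean":
        fill = mean(present)
    elif strategy == "median":
        fill = median(present)
    elif strategy == "constant":
        fill = fill_value
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in column]

def impute_knn(rows, k=2):
    """Fill each missing entry with the mean of that feature over the k nearest rows.

    Distance is Euclidean over the features present in both rows.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_i, other in enumerate(rows):
                if other_i == i or other[j] is None:
                    continue  # a donor row must have this feature present
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if shared:
                    d = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                    candidates.append((d, other[j]))
            candidates.sort()
            if candidates:
                filled[i][j] = mean(v for _, v in candidates[:k])
    return filled

print(impute_simple([1.0, None, 3.0]))                     # [1.0, 2.0, 3.0]
print(impute_simple([1, None, 3, 10], strategy="median"))  # [1, 3, 3, 10]
rows = [[1.0, 10.0], [1.1, None], [1.2, 12.0], [9.0, 90.0]]
print(impute_knn(rows))
```

The k-NN variant fills row 2 from its two closest rows rather than the whole column, so the distant row `[9.0, 90.0]` does not pull the imputed value upward the way a column mean would.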
54
Q

What are some initial questions to ask when prepping data for ML

A
  • does the data include info that can predict the target?
  • does the granularity of the training and prediction match?
  • is there labeled data?
  • is the data accurate? Do you know where it came from?
  • is it easily accessible and readable?
  • are the missing values a small percentage of the fields of interest?
55
Q

Definition of an algorithm (comp sci)

A

a sequence of explicit instructions which perform a specific task

56
Q

_____ analysis is used to simplify complexity analysis

A

asymptotic

57
Q

_____ is a subset of AI

A

Machine Learning

58
Q

Definition of Machine Learning

A

the study and use of algorithms and statistical models that computer systems use to learn how to perform specific tasks without explicit instructions

59
Q

____ is a subset of Machine Learning

A

Deep Learning

60
Q

Machine Learning applies the fields of

A

Comp Sci; Optimization; Statistics

61
Q

Unsupervised ML models can be used for

A

Clustering

62
Q

Labeled data is data which ___

A

has an associated category assigned to a specific set of features in the data set

63
Q

In hard clustering, each data point…

A

belongs to only 1 cluster

64
Q

What clustering techniques are examples of hard clustering

A

k-means, hierarchical

65
Q

Gaussian Mixture Modeling is an example of

A

soft clustering

66
Q

What are some applications of clustering?

A
  • exploratory data analysis
  • dimensional (feature) reduction
  • image segmentation
  • anomaly detection
  • data mining
67
Q

Formula for euclidean distance between 2 pts with 2 features

A

d = sqrt( ( x1 - x2)^2 + (y1-y2)^2 )
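The same formula generalizes to any number of features by summing one squared difference per feature. A minimal sketch (`euclidean` is a hypothetical helper; the standard library's `math.dist`, Python 3.8+, computes the same thing):

```python
import math

def euclidean(p, q):
    """d = sqrt(sum over features of (p_i - q_i)^2); with 2 features this is
    sqrt((x1 - x2)^2 + (y1 - y2)^2)."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((1, 2), (4, 6)))  # 5.0
print(math.dist((1, 2), (4, 6)))  # 5.0
```

The 3-4-5 example also shows why 2-D Euclidean distance is the hypotenuse of a right triangle whose legs are the per-feature differences.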

68
Q

In K-means clustering, a centroid is…

A

the arithmetic mean of the points in each dimension

69
Q

Hierarchical clustering can be preferable over k-means when dealing with

A

a smaller amount of data

70
Q

What are some convergence criteria you could set for k-means

A
  • % reduction (drop) in SSE between iterations falls below a threshold
  • hard stop limit on the number of iterations, to avoid infinite iteration, and/or reaching a known goal
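A minimal k-means sketch with both convergence criteria: a relative SSE drop below `tol` and a hard `max_iter` limit. Initializing centroids with k randomly chosen data points is an assumption (one simple option among several), and `kmeans` is a hypothetical helper. Because the result depends on that initialization, running it with several seeds and keeping the lowest SSE is a common guard against local minima.

```python
import math
import random

def kmeans(points, k, max_iter=100, tol=1e-4, seed=0):
    """Plain k-means (hard clustering) with two stopping rules:
    a relative SSE drop below `tol`, or a hard stop after `max_iter` iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k randomly chosen data points
    prev_sse = math.inf
    for _ in range(max_iter):  # hard stop limit: never loop forever
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        sse = sum(math.dist(p, centroids[c]) ** 2 for p, c in zip(points, labels))
        # Convergence check: stop once SSE falls by less than `tol` (as a fraction)
        if prev_sse != math.inf and prev_sse - sse <= tol * prev_sse:
            break
        prev_sse = sse
        # Update step: each centroid moves to the arithmetic mean of its points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels, sse

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels, sse = kmeans(pts, k=2)
print(sorted(centroids), labels, sse)
```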
71
Q

Hierarchical clustering creates a _____

A

dendrogram

72
Q

Gaussian Mixture Modeling is a

A

probabilistic technique

73
Q

In GMM, the center of the cluster is the

A

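mean of that cluster’s Gaussian component (a probability-weighted average of all points, rather than a plain arithmetic mean)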

74
Q

A model with overfitting is

A

too complex, possibly with too many predictors

75
Q

You could have overfitting when

A

the model is more complex than the data

76
Q

overfitting is ____ common than underfitting with AI models

A

more

77
Q

What are some applications of classification?

A
  • fault detection
  • predictive maintenance
  • speech recognition
78
Q

Classification error is quantified by a

A

loss function

79
Q

What is the formula for inverse distance weighting

A

w_i = (1/dist_i) / sum_{j=1..k} (1/dist_j)
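A sketch of inverse distance weighting inside a k-NN classifier, assuming Euclidean distance and a weighted vote; `idw_weights` and `knn_predict` are hypothetical helpers. Closer neighbors get a larger share of the vote, so one very close neighbor can outweigh two distant ones.

```python
import math
from collections import defaultdict

def idw_weights(distances):
    """Inverse distance weights: w_i = (1/d_i) / sum_j (1/d_j); they sum to 1."""
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]

def knn_predict(train_x, train_y, query, k=3):
    """k-NN classification where closer neighbors get a larger share of the vote."""
    nearest = sorted(zip((math.dist(x, query) for x in train_x), train_y))[:k]
    for d, label in nearest:
        if d == 0:
            return label  # exact match: 1/d is undefined, so use its label directly
    weights = idw_weights([d for d, _ in nearest])
    votes = defaultdict(float)
    for w, (_, label) in zip(weights, nearest):
        votes[label] += w
    return max(votes, key=votes.get)

train_x = [(0, 0), (3, 0), (3, 1), (8, 8)]
train_y = ["a", "b", "b", "b"]
print(idw_weights([1.0, 2.0, 4.0]))
print(knn_predict(train_x, train_y, (0.1, 0), k=3))  # a (a plain majority vote would say b)
```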

80
Q

Euclidean distance in 2D is the same as

A

the formula for the hypotenuse of a right triangle

81
Q

What are some advantages of k-NN?

A
  • simple algorithm, with flexible options (distance calc method, # of k)
  • considered a benchmark for other classification methods
82
Q

What are some disadvantages of K-NN

A
  • sensitive to outliers and erroneous labels
  • memory intensive with larger k, pts, and features (giant distance matrices)
83
Q

Resubstitution loss is…

A

the error just on the training set

84
Q

What are advantages to decision trees?

A
  • can handle non-linear responses
  • excellent with categorical variables
  • easy to understand for a small number of features
  • once you build the model, classification of new data is computationally quick since it is just binary decisions
85
Q

Disadvantages of decision trees

A
  • struggles with a large number of features with smaller data size
  • difficult to understand for a large number of features
86
Q

Naive Bayes is a ____ classification technique

A

probabilistic

87
Q

What are the 3 AI for energy transition principles

A
  • Governing (Risk Management, Standards, Responsibility)
  • Designing (Automation, Sustainability, Design)
  • Enabling (Data, Incentives, Education)
88
Q

How much investment does BNEF expect to need for a net-zero scenario

A

between $92 trillion and $173 trillion by 2050

89
Q

What are the 4 main fields where AI could be used in Energy Systems

A
  • Renewable power gen. and demand forecasting
  • Grid optimization and operation
  • Management of energy demand and DER
  • Materials discovery and innovation
90
Q

K-means clustering has an inherent risk that the initial clusters converge….

A

to a local minimum of SSE, rather than the global minimum

91
Q

Is K-means sensitive to outliers?

A

yes

92
Q

Which has a higher time complexity, k-means or hierarchical clustering?

A

hierarchical
