Expert Systems Flashcards

1
Q

Expert System

A

Interacts with the user to collect facts and support a decision process.

An expert system is a computer system that can make decisions that normally only human experts can make.

Subtype of intelligent system.

E.g. DENDRAL, which identified the structure of organic molecules, or MYCIN, which advised physicians on antibacterial therapy (attaching a certainty factor to its recommendations).
Usages: medical, crime, …

Its decisions can be checked for correctness.

2
Q

Expert system vs Intelligent System vs Decision Support System

A

Vs IS:
An expert system is only software; it is not embedded in the real world. So it can be seen as a subcategory of intelligent systems.

Vs Decision Support System:
● An expert system combines knowledge with reasoning and makes the decision for you, using the information you give it. It can also explain its decision to the user.

● A decision support system helps you process data (data-analytics dashboards, etc.) and helps you make your own decision.

3
Q

Components of an Expert System

A

● The user makes a query, which is sent to an inference engine that interacts with the knowledge base (composed of rules and facts determined in the knowledge-engineering process).
● The user receives advice back, and perhaps an explanation of why that particular advice was given.

• Interface: for example a chat interface or a question list.
• Knowledge base: built by domain experts together with data scientists (or similar roles).
The knowledge base and inference engine use rules (if-then statements) and facts (data about specific cases/instances) to perform inference. Possible knowledge representations include description logics, ontologies, and non-monotonic logics.
4
Q

Inference Methods

A
Inference methods that may be used:
● If-then rules (e.g. MYCIN)
● Bayes' rule and naive Bayes
● Graphical models
● Factor graphs
● Markov networks
● Bayesian networks

In general, exact inference is NP-complete. But we can, for example, marginalise: determine the node whose probability you want, then sum over all values the other variables could possibly take (including outside factors that influence the probability of the node).
- Variable elimination
- Loopy belief propagation
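
The brute-force version of marginalising can be sketched in a few lines of Python; the joint distribution and its numbers below are hypothetical, chosen only to illustrate the summation:

```python
# Toy joint distribution P(A, B) over two binary variables
# (hypothetical numbers, summing to 1).
joint = {
    (0, 0): 0.3,  # P(A=0, B=0)
    (0, 1): 0.2,  # P(A=0, B=1)
    (1, 0): 0.1,  # P(A=1, B=0)
    (1, 1): 0.4,  # P(A=1, B=1)
}

def marginal(joint, index, value):
    """P(X_index = value): sum the joint over all other variables."""
    return sum(p for assignment, p in joint.items()
               if assignment[index] == value)

print(marginal(joint, 0, 1))  # P(A=1) = 0.1 + 0.4 = 0.5
```

This enumerates the whole joint, which is exactly what becomes infeasible as the number of variables grows; variable elimination and loopy belief propagation avoid that blow-up.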

5
Q

Knowledge Representations

A

Description logics, ontologies, non-monotonic logics.

6
Q

Advantages

A

● Consistency: it will make the same decision given the same data.
● Memory: it won't forget a rule or fact; humans forget things all the time.
● Logic: no sentimentality clouds its decision-making.
● Infinitely reproducible: it doesn't get tired and can be copied to other places.

7
Q

Disadvantages

A

● Common sense is a problem: it is hard to program common sense.
● No creativity: it can't find new solutions to new problems.
● Hard to maintain: the knowledge base, with potentially thousands of facts, needs to be updated manually (this is the real killer).

8
Q

Probabilistic Reasoning

A

Used when you have uncertainty and want to model that uncertainty as part of the model. Example: the facts are P(A = student) = 0.6 and P(B = heads) = 0.5; reasoning then derives quantities such as P(A|B).

9
Q

Bayes Rule

A

P(A|B) = P(B|A) ∙ P(A) / P(B). In general, too much data is needed to estimate the full joint distribution of a target variable directly.
Example: assume a lab test for a disease has a 98% chance of giving a positive result if the disease is present, and a 97% chance of giving a negative result if the disease is absent. Assume 0.8% of the population has this disease. Given a positive result, what is the probability that the disease is present?
P(DIS|POS) = P(POS|DIS) ∙ P(DIS) / P(POS) = 0.98 ∙ 0.008 / (0.98 ∙ 0.008 + 0.03 ∙ 0.992) ≈ 0.21
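
The arithmetic for the lab-test example can be checked with a short Python sketch (the variable names are ours, not from any library):

```python
# Bayes' rule for the lab-test example.
p_pos_given_dis = 0.98   # sensitivity: P(positive | disease)
p_neg_given_no = 0.97    # specificity: P(negative | no disease)
p_dis = 0.008            # prevalence:  P(disease)

# Total probability of a positive test (disease or false positive).
p_pos = p_pos_given_dis * p_dis + (1 - p_neg_given_no) * (1 - p_dis)

# Posterior: P(disease | positive).
p_dis_given_pos = p_pos_given_dis * p_dis / p_pos
print(round(p_dis_given_pos, 3))  # ≈ 0.209
```

Despite the accurate test, a positive result still leaves roughly a 79% chance of no disease, because the disease is rare.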

10
Q

Naive Bayes

A

An approximation of Bayes' rule. It assumes that, given the value of the class, all attributes are independent (conditional independence). It is much more feasible than using the full joint distribution, but makes extreme assumptions.
We therefore have two extremes: either we estimate the full joint probability distribution (which would yield the optimal classifier but is infeasible in practice), or we use naive Bayes (which is much more feasible but makes too-strong assumptions). We want something in between → factor graphs.
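
A minimal naive Bayes sketch in Python, assuming a toy spam/ham task with two binary attributes; all priors and likelihoods below are hypothetical:

```python
# Naive Bayes: P(class | attrs) proportional to P(class) * prod_i P(attr_i | class).
priors = {"spam": 0.4, "ham": 0.6}
# P(attribute_i = 1 | class), one entry per attribute.
likelihoods = {
    "spam": [0.8, 0.7],
    "ham":  [0.1, 0.3],
}

def posterior(attrs):
    """Score each class under conditional independence, then normalise."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for p, x in zip(likelihoods[cls], attrs):
            score *= p if x == 1 else 1 - p
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

print(posterior([1, 1]))  # spam: 0.4*0.8*0.7 vs ham: 0.6*0.1*0.3
```

The product over attributes is exactly where the conditional-independence assumption enters; with correlated attributes the true joint would not factorise this way.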

11
Q

Marginalizing vs Maximization

A

Marginalising means finding a probability value, P(V = v).

Maximisation finds the most probable value, v* = argmax_v P(V = v) → can be combined with Bayes' rule to classify (what is the most probable value of the target/class?).
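
The contrast can be shown on a toy distribution (hypothetical numbers):

```python
# A small distribution over a single variable V.
p_v = {"red": 0.2, "green": 0.5, "blue": 0.3}

marginal_green = p_v["green"]          # marginalising: a probability value, P(V=green)
most_probable = max(p_v, key=p_v.get)  # maximisation: argmax_v P(V=v)
print(marginal_green, most_probable)   # 0.5 green
```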

12
Q

Factor Graphs

A

Graphs that group random variables into conditionally independent 'cliques': a bipartite graph representing the factorization of a function.

A factor graph is a type of probabilistic graphical model. A factor graph has two types of nodes:

Variables, which can be either evidence variables when their value is known, or query variables when their value should be predicted.

Factors, which define the relationships between variables in the graph. Each factor can be connected to many variables and comes with a factor function defining the relationship between these variables. For example, if a factor node is connected to two variable nodes A and B, a possible factor function could be imply(A, B), meaning that if the random variable A takes value 1, then so must the random variable B. Each factor function has a weight associated with it, which describes how much influence the factor has on its variables in relative terms.
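
A tiny factor graph can be sketched directly in Python: two binary variables, one unary factor on A and one soft "A implies B" factor, with the joint proportional to the product of the factors. The factor functions and numbers are hypothetical:

```python
from itertools import product

def f1(a):
    # Unary factor on A (prior-like preference for A=1).
    return 0.6 if a == 1 else 0.4

def f2(a, b):
    # Soft implication: heavily penalise the assignment A=1, B=0.
    return 0.1 if (a == 1 and b == 0) else 1.0

# The joint distribution is proportional to the product of all factors.
unnorm = {(a, b): f1(a) * f2(a, b) for a, b in product([0, 1], repeat=2)}
z = sum(unnorm.values())            # partition function (normalising constant)
joint = {k: v / z for k, v in unnorm.items()}
print(joint)
```

Note how the "implies" factor shows up in the result: the assignment (A=1, B=0) gets far less mass than (A=1, B=1), which is exactly the weighted-constraint behaviour described above.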

13
Q

Markov Models

A

An undirected model: the joint distribution factorises over the cliques (fully connected subgraphs) of an undirected graph.

14
Q

Bayesian Network

A
A directed model that uses a directed graph and conditional probability tables of each variable given its parents. It can also be viewed as a factor graph (one factor per conditional probability table).
15
Q

Marginalising through variable elimination

A

Variable elimination: an exact inference method that determines P_X(Y|Z) by getting rid of every random variable x in X \ Y, multiplying all factors in which x appears and:
- filling in the observed value of x if x is in Z;
- summing out over all possible values of x if x is not in Z.
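
A minimal sketch of variable elimination on a chain A → B → C of binary variables, with no evidence, so both A and B are summed out in turn. The conditional probability tables are hypothetical:

```python
# Factors of the chain A -> B -> C, stored as dicts over assignments.
p_a = {0: 0.3, 1: 0.7}
p_b_given_a = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}  # key (a, b)
p_c_given_b = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.7}  # key (b, c)

# Eliminate A: multiply the factors mentioning A, sum A out.
# tau_b(b) = sum_a P(a) * P(b | a)
tau_b = {b: sum(p_a[a] * p_b_given_a[(a, b)] for a in (0, 1)) for b in (0, 1)}

# Eliminate B: P(c) = sum_b tau_b(b) * P(c | b)
p_c = {c: sum(tau_b[b] * p_c_given_b[(b, c)] for b in (0, 1)) for c in (0, 1)}
print(p_c)
```

Each elimination produces a new, smaller factor (here `tau_b`), which is why the method avoids ever materialising the full joint over A, B, C.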

16
Q

Marginalising through loopy belief propagation

A

Nodes communicate potential values (messages) to one another, and these are used to compute the marginals: a message-passing algorithm with damped updates. If it converges, the marginal of X is the product of the converged messages sent to X.

17
Q

State of the art: adding probabilities to logic

A
  • Statistical relational AI: StaRAI
  • Distribution semantics: ProbLog
  • Causal probabilities: CP-logic (making things a bit simpler).
18
Q

Distribution Semantics

A

A graph-based approach to finding the probability of joint events. It uses probabilistic predicates and logical rules: basically first-order logic (logic with variables), where ProbLog adds probabilities to logic rules ("the probability that this predicate is true"). It assumes marginal independence between ground atoms.

In ProbLog, to determine whether an edge is present: draw a random number in [0, 1]; the edge is included if the number falls below the edge's probability.

19
Q

CP Logic: Causal Probabilites

A

Joint probabilities can be hard for human experts to estimate; we are much better at predicting causal probabilities.