Expert Systems Flashcards
Expert System
Interacts with the user to collect facts and helps with a decision process.
an expert system is a computer system that can make decisions that normally only experts can make
subtype of intelligent system
e.g. DENDRAL, which inferred the structure of organic molecules, or MYCIN, which advised physicians on antibacterial therapy (giving a certainty factor with its advice)
usages: medical, crime, …
Can be checked for correctness
Expert system vs Intelligent System vs Decision Support System
Vs IS
An expert system is only software. It’s not
embedded in the real world. So you could
say it’s a subcategory of intelligent systems
Vs Decision Support System
● An expert system combines knowledge with
reasoning, and makes the decision for
you by using the information you give it. It can also explain its decision to the user.
● A decision support system helps you
process data (data analytics dashboard,
etc), and helps you make your own
decision
Components of an Expert System
● The user makes a query, which is sent to an inference engine that interacts with the knowledge base (composed of rules and facts determined in the knowledge engineering process)
● The user receives back advice, and perhaps an explanation of why that particular advice was given
- Interface: for example a chat box or a question list.
- Knowledge Base: built by domain experts together with data scientists / knowledge engineers
The KB and inference engine use rules (if-then statements) and facts (data about specific cases/instances) to infer conclusions. Possible knowledge representations include description logics, ontologies and non-monotonic logics.
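A minimal forward-chaining sketch of how an inference engine could combine if-then rules and facts; the rule contents and fact names here are made-up illustrations, not from a real system.

```python
# Minimal forward-chaining sketch: facts are known statements,
# rules are (premises, conclusion) pairs. All names are hypothetical.
facts = {"fever", "cough"}
rules = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected"}, "advise_rest"),
]

changed = True
while changed:                      # keep firing rules until nothing new is derived
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)   # rule fires: its conclusion becomes a new fact
            changed = True

print(facts)  # now also includes the derived facts 'flu_suspected' and 'advise_rest'
```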
Inference Methods
Inference methods that may be used:
● If-then rules (e.g. MYCIN)
● Bayes’ rule and Naive Bayes
● Graphical models
● Factor graphs
● Markov networks
● Bayesian networks
In general, exact inference is NP-complete. But we can, for example, do marginalizing → determine the node that you want the probability for, then sum the product of the factors that influence that node over all possible values of the other variables.
- variable elimination
- loopy belief propagation
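As an illustration of marginalizing by brute-force summation, a small sketch over a made-up joint distribution (the variables and numbers are assumptions for the example):

```python
# Hypothetical joint distribution P(A, B) over two binary variables, as a table.
joint = {
    (0, 0): 0.3, (0, 1): 0.1,
    (1, 0): 0.2, (1, 1): 0.4,
}

# Marginalize out B to get P(A): for each value of A, sum over every value of B.
p_a = {a: sum(joint[(a, b)] for b in (0, 1)) for a in (0, 1)}
print(p_a)  # approximately {0: 0.4, 1: 0.6}
```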
Knowledge Representations
Description logics, ontologies, non-monotonic logics
Advantages
● Consistency. It will make the same decision
given the same data
● Memory. It won’t forget a rule or a fact; humans forget things all the time.
● Logic. No sentimentality that clouds its
decision making.
● Infinitely reproducible. Doesn’t get tired, can
be copy-pasted to other places
Disadvantages
● Common sense is a problem. It’s hard to
program common sense
● No creativity (it can’t find new solutions to
new problems)
● Hard to maintain. Knowledge base, with
potentially 1000s of facts, needs to be
updated manually (this is the real killer)
Probabilistic Reasoning
Used when you have uncertainty and you want to model that uncertainty as part of the model. Example: the facts are probabilities such as P(A=student) = 0.6 and P(B=heads) = 0.5; reasoning then means computing a quantity such as P(A|B) = x.
Bayes Rule
P(A|B) = P(B|A) ∙ P(A) / P(B). In general, too much data would be needed to estimate a target variable directly from the full joint distribution.
Example: assume some lab test for a disease has 98% chance of giving positive result if the disease is present,
and 97% chance of giving a negative result if the disease is absent. Assume 0.8% of the population has this
disease. Given a positive result, what is the probability that the disease is present?
P(DIS|POS) = P(POS|DIS) ∙ P(DIS) / P(POS)
= 0.98 ∙ 0.008 / (0.98 ∙ 0.008 + 0.03 ∙ 0.992) ≈ 0.21
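A quick sketch that checks this calculation with Bayes’ rule (all numbers come from the example above):

```python
# Bayes' rule for the lab-test example above.
sensitivity = 0.98      # P(POS | DIS)
false_pos   = 0.03      # P(POS | no DIS), i.e. 1 - specificity
prior       = 0.008     # P(DIS): 0.8% of the population

p_pos = sensitivity * prior + false_pos * (1 - prior)   # P(POS), total probability
p_dis_given_pos = sensitivity * prior / p_pos           # Bayes' rule
print(round(p_dis_given_pos, 3))  # ~0.209
```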
Naive Bayes
An approximation of Bayes’ rule. It assumes that, given the value of the class, all the attributes are independent (conditional independence). It is much more feasible than Bayes’ rule, but makes extreme assumptions.
Therefore we have two extremes: either we estimate the joint probability distribution (which would yield the optimal classifier, but is infeasible in practice) or we use Naïve Bayes (which is much more feasible, but makes too-strong assumptions). We want something in between → factor graphs.
Marginalizing vs Maximization
Marginalizing means finding a probability value, P(V = v).
Maximization finds the most probable value, V = argmax_v P(V = v) → this can be used with Bayes’ rule to classify (what is the most probable target/class value?).
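A tiny illustration of the difference, on a made-up distribution over V:

```python
# Made-up distribution over a variable V.
p_v = {"a": 0.2, "b": 0.5, "c": 0.3}

print(p_v["b"])                  # marginal query: a probability value, P(V = 'b')
print(max(p_v, key=p_v.get))     # maximization: argmax_v P(V = v) -> 'b'
```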
Factor Graphs
Graphs that group random variables into conditionally independent ‘cliques’: a bipartite graph representing the factorization of a function.
A factor graph is a type of probabilistic graphical model. A factor graph has two types of nodes:
● Variables, which can be either evidence variables when their value is known, or query variables when their value should be predicted.
● Factors, which define the relationships between variables in the graph. Each factor can be connected to many variables and comes with a factor function that defines the relationship between these variables. For example, if a factor node is connected to two variable nodes A and B, a possible factor function could be imply(A, B), meaning that if the random variable A takes value 1, then so must the random variable B. Each factor function has a weight associated with it, which describes how much influence the factor has on its variables in relative terms.
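A minimal sketch of that imply(A, B) factor as code; the log-linear scoring, the weight and the names are illustrative assumptions, not a specific library’s API:

```python
import math

# Two binary variables A and B, and one weighted factor encoding "A implies B".
def imply(a, b):
    return 0 if (a == 1 and b == 0) else 1   # violated only when A=1, B=0

factors = [{"vars": ("A", "B"), "fn": imply, "weight": 2.0}]

def unnormalized_score(assignment):
    """Product over factors of exp(weight * factor_value) (log-linear form)."""
    score = 1.0
    for f in factors:
        args = [assignment[v] for v in f["vars"]]
        score *= math.exp(f["weight"] * f["fn"](*args))
    return score

print(unnormalized_score({"A": 1, "B": 1}))  # high score: implication satisfied
print(unnormalized_score({"A": 1, "B": 0}))  # low score: implication violated
```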
Markov Models
An undirected model that uses an undirected graph, with factors defined over cliques (fully connected subgraphs).
Bayesian Network
A directed model that uses a directed graph and conditional probability tables (CPTs) of each variable given its parents. It can also be written as a factor graph, with one factor per CPT.
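A tiny Bayesian-network sketch with two nodes; the Rain → WetGrass structure and the CPT numbers are made up for illustration:

```python
# Hypothetical two-node Bayesian network: Rain -> WetGrass.
p_rain = {True: 0.2, False: 0.8}                        # P(Rain)
p_wet_given_rain = {True:  {True: 0.9, False: 0.1},     # P(WetGrass | Rain)
                    False: {True: 0.2, False: 0.8}}

def joint(rain, wet):
    # The joint is the product of each node's CPT entry given its parents.
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# P(Rain=True | WetGrass=True) by enumerating the joint (Bayes' rule).
num = joint(True, True)
den = joint(True, True) + joint(False, True)
print(round(num / den, 3))  # ~0.529
```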
Marginalising through variable elimination
Variable elimination: an exact inference method that determines P(Y|Z) by getting rid of all random variables x in X\Y. For each such x, multiply all factors in which x appears and:
- Fill in the observed value of x if x is in Z
- Sum out over all possible values of x if x is not in Z.
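A compact sketch of variable elimination on a made-up chain A → B → C with binary variables; the factor representation and the numbers are assumptions for the example:

```python
from itertools import product

# Made-up chain A -> B -> C, all binary (0/1).
# A factor is (variables, table): the table maps a tuple of values to a number.
f_a  = (("A",),     {(0,): 0.6, (1,): 0.4})                     # P(A)
f_ba = (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3,
                     (1, 0): 0.2, (1, 1): 0.8})                 # P(B | A)
f_cb = (("B", "C"), {(0, 0): 0.9, (0, 1): 0.1,
                     (1, 0): 0.4, (1, 1): 0.6})                 # P(C | B)

def multiply(f, g):
    """Multiply two factors into one over the union of their variables."""
    fv, ft = f
    gv, gt = g
    out_vars = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for vals in product((0, 1), repeat=len(out_vars)):
        assign = dict(zip(out_vars, vals))
        table[vals] = (ft[tuple(assign[v] for v in fv)]
                       * gt[tuple(assign[v] for v in gv)])
    return out_vars, table

def sum_out(f, var):
    """Eliminate var from a factor by summing over its values."""
    fv, ft = f
    keep = tuple(v for v in fv if v != var)
    table = {}
    for vals, p in ft.items():
        key = tuple(val for v, val in zip(fv, vals) if v != var)
        table[key] = table.get(key, 0.0) + p
    return keep, table

# Eliminate A, then B, to obtain the marginal P(C).
f_b = sum_out(multiply(f_a, f_ba), "A")     # a factor over B only: P(B)
f_c = sum_out(multiply(f_b, f_cb), "B")     # a factor over C only: P(C)
print(f_c[1])  # approximately {(0,): 0.65, (1,): 0.35}
```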