Midterm 2 Flashcards
Hume & causality
•causal connections are a product of observation
- spatial/temporal contiguity
- temporal succession
- constant conjunction
•relation between experiences not between facts (IN MIND)
Where does causal knowledge emerge from?
non causal input
Causal inference
infer causal relations from patterns of data
Why is causal inference difficult?
- probabilistic and incomplete data
- small samples
- different models can generate same data
Dominant theory of causal relations
people estimate the strength of causal relations on the basis of covariation between events
Contingency tables
Represent outcomes of numerous trials in which cause C is present/absent and effect E is present/absent
Delta-P rule
ΔP = P(E|C) - P(E|~C)
P(E|C) = cell1/(cell1 + cell2), P(E|~C) = cell3/(cell3 + cell4), where cells 1-4 of the contingency table count C&E, C&~E, ~C&E, ~C&~E trials
when ΔP > 0 --> C = generative cause
when ΔP < 0 --> C = preventative cause
when ΔP = 0 --> C = independent of E (non-causal)
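A minimal Python sketch of the ΔP computation, assuming the four contingency-table cells are relabelled a (C&E), b (C&~E), c (~C&E), d (~C&~E); the function name and example counts are illustrative:

```python
# Minimal sketch: Delta-P from a 2x2 contingency table.
# Cell counts (illustrative labels): a = C&E, b = C&~E, c = ~C&E, d = ~C&~E.
def delta_p(a, b, c, d):
    p_e_given_c = a / (a + b)         # P(E|C)
    p_e_given_not_c = c / (c + d)     # P(E|~C)
    return p_e_given_c - p_e_given_not_c

# example: effect on 18/20 trials with the cause, on 4/20 trials without it
print(delta_p(18, 2, 4, 16))  # 0.7 > 0 --> C looks like a generative cause
```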
Common causality mistakes
- often people only compare when cause is present
- only compare when effect is present
Motivated Reasoning experiment
- liberal democrats were more likely to correctly identify what the data showed in the crime-decreases condition
- conservative republicans were more likely to correctly identify what the data showed in the crime-increases condition
Simplicity in understanding causes
Occam’s razor: simpler explanation is better (parsimony)
- causal structure (like contingency) must be inferred from input
Alien Disease Experiment
Alien with symptoms S1 and S2
- either has Tritchet’s Syndrome (S1+S2), Morad’s D (S1), Humel I (S2)
- most people said alien had Tritchet’s
Alien Disease Experiment with probability info
majority still chose D1 (Tritchet's) even though the combination of D2 & D3 is mathematically more likely
- people need disproportionate evidence in favour of complex explanation before it can rival simpler one
Deductive Reasoning
- conclusion follows logically from premises
- conclusion guaranteed to be true
Inductive Reasoning
- conclusion likely based on premises
- involves degree of uncertainty
Deductive Inference Rules
1) if premises are true, conclusion is true
2) premises provide conclusive evidence for conclusion
3) impossible for premises to be true and conclusion false
4) logically inconsistent to assert premises but deny conclusion
Modus Ponens
if p then q
p
therefore q
Modus Tollens
if p then q
~q
therefore ~p
Wason Selection task
"if card has vowel on one side then it has an even number on the other" E K 4 7 most people say to flip E and 4 correct answer actually E and 7 (apply modus tollens)
More Concrete version of Wason Task
if person is drinking beer then person must be over 21
‘drinking beer’ ‘drinking coke’ ‘16 yo’ ‘22 yo’
people find the correct answer
Syllogistic Reasoning
all A are B (major premise)
all B are C (minor premise)
therefore all A are C (conclusion)
- logical validity of the conclusion is determined entirely after accepting the premises as true
- often subject to belief bias
ideological belief bias in syllogistic reasoning
liberals are better at identifying flawed arguments supporting conservative beliefs and vice versa
Mental Models
- postulated by Craik
- models constructed in working memory as a result of perception, comprehension of discourse, or imagination
- mental representations
- can underlie reasoning
- used to formulate conclusions & test strength of conclusions
- alternative to view that depends on formal rules of inference
What do mental models represent?
They represent explicitly what is true but not what is false
–> unexpected consequence = illusory inferences (belief bias)
mental model (Frenchmen & gourmets example)
all frenchmen are gourmets
some gourmets are wine drinkers
people say: some frenchmen are wine drinkers
construct model consistent with both premises
replace with Italians in last premise:
no one draws the same conclusion, different mental model
Wason’s 2-4-6 task
have to find the rule, given 2 4 6 is an ascending sequence
70% offer incorrect rule on first announcement
Dual Goal Wason task
correct and incorrect sequences labelled as DAX and MED
60% induced rule correctly
–> people do better when contrasting two viable alternatives
What’s special about thinking?
- structure-sensitive
- -> reasoning, etc. depends on capacity to represent and manipulate relational knowledge
- flexible in way in which knowledge is accessed
- -> apply old knowledge to new situations
Relational thinking across species: Match-to-sample task
B or C more like A?
chimpanzees answer differently than humans
Relational/Analogical Inference
- Inductive in nature
- analogical inference: generalizing properties/relations from one domain to another
- analogical transfer: solving problem in one domain based on solution in another domain (ex: fortress/radiation problem)
Gick/Holyoak radiation problem
control: no base problem no hint, 20%
base problem no hint: 30%
base problem + hint: 75%
Analogical transfer steps
Recognition (identify possible analog or base domain)
Abstraction (abstract general principle from base problem)
Mapping (apply principle to target)
Analogical inference
knowledge about base domain can be used to reason about target domain
–> structure mapping
Relations
- can be represented as a proposition which specifies which elements fill the roles of the predicate
- can be nested within other relations (higher-order relations)
structured relational representations
attribute: big(sun)
lower-order relation: bigger(sun, planets)
higher-order relation: CAUSE[bigger(sun, planets), revolves around(planet, sun)]
analogy
when two conceptual domains share relational similarity
- one-to-one mapping: sun –> nucleus
- parallel connectivity: sun –> nucleus & planets –> electrons
constraints on analogical mapping
systematicity: deeply nested relational structures make better analogies
Manipulation of irrelevant superficial features (WWII vs. Vietnam)
subjects’ preferred policy was significantly more interventionist when scenario contained WWII features than Vietnam features
Structural Alignment
- helps people align objects based on relational positions rather than superficial similarity
- surface or structural similarity?
Relational reasoning in children
when given triads that show the same relational pattern across different dimensions, young children have difficulty recognizing the shared pattern
Theory of progressive alignment
comparison of highly similar before less similar items fosters re-representation of relevant relations
–> children more able to recognize
Near vs. Far transfer
near transfer: apply knowledge from a closely related base domain to the target domain (ex: water pump to steam engine)
far transfer: apply knowledge from a seemingly distant base domain to the target domain (ex: velcro)
Formal systems
system of axioms (propositions assumed true) + inference rules (allow for other conclusions to be derived)
Completeness
for every statement in the system, either it or its negation can be proved
Consistency
there is no statement in the system such that both it and its negation are derivable
- an inconsistent system is trivially complete (everything becomes derivable)
Logicism
- Frege
- provide logical foundations for mathematics
- -> for ontological and epistemic reasons
- major flaw in the system -> Russell's paradox showed it was inconsistent
Barber Paradox
the barber shaves all and only those who don't shave themselves
but who shaves the barber?
Russell’s paradox
involves self-reference
consider the set of all sets that do not contain themselves -> does it contain itself?
Gödel’s incompleteness theorem
any consistent axiomatic system strong enough to carry out much of arithmetic is incomplete
- -> some thought this showed mechanism is false
- -> not true, because the argument assumes humans can always see whether a system is consistent, which we cannot
Problem theory
• general theory of problem solving as search through a space
• 4 elements:
- initial state
- goal state
- operators
- path constraints
State Spaces and Search
initial state: where problem solving begins
goal state: what you want to reach
operators: actions to be taken that serve to alter current state
path constraints: ex, finding solution in least possible steps
Problem space
set of all states that can potentially be reached by applying the available operators
Search Trees
paths from the initial state to the goal state
Search strategies considerations
completeness (does it always find the goal state)
optimality (does it find the shortest path)
time complexity (how long it takes)
space complexity (memory needed to keep track of which states you've visited)
Factors that affect time and space complexity
B (branching factor/breadth)
D (depth in tree of goal state)
Brute force search strategy
- systematically consider all possible action sequences to find a path
- only uses info available in problem definition
- problem: number of paths increases exponentially with depth; many such problems are NP-hard (combinatorial explosion)
advantages: guaranteed to find a solution, good for simple problems
Breadth-first search
- try shortest paths first: expand all states at a given depth before going deeper (basically go through all the options level by level)
- finds the shortest path
Depth-first search
- follow one path as deep as possible before backtracking
- conserves memory (only the current path needs to be stored)
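A sketch contrasting the two strategies on a toy state graph; the graph, state names, and the `search` helper are illustrative assumptions, not from the notes:

```python
# Minimal sketch: breadth-first vs. depth-first search over an explicit state graph.
# graph maps each state to its successors (the states its operators can reach).
from collections import deque

def search(graph, start, goal, breadth_first=True):
    frontier = deque([[start]])           # frontier holds whole paths
    visited = {start}
    while frontier:
        path = frontier.popleft() if breadth_first else frontier.pop()
        state = path[-1]
        if state == goal:
            return path                   # breadth-first returns a shortest path (in steps)
        for nxt in graph.get(state, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "G"], "D": ["G"]}
print(search(graph, "A", "G", breadth_first=True))   # ['A', 'C', 'G']
print(search(graph, "A", "G", breadth_first=False))  # a path, not necessarily the shortest
```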
Heuristic search techniques
- focus on promising areas
- uses evaluation function to score states in tree
advantages: good for complex problems with large search spaces
Hill-Climbing
always choose the next state with the lowest score
BUT search may halt without success in a local minimum
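A sketch of hill-climbing with a lower-is-better evaluation function; the toy problem (minimize (x - 7)^2 over the integers) is an illustrative assumption:

```python
# Minimal sketch: hill-climbing with a lower-is-better evaluation function.
def hill_climb(start, score, neighbours, max_steps=100):
    current = start
    for _ in range(max_steps):
        best = min(neighbours(current), key=score)
        if score(best) >= score(current):
            return current        # no neighbour improves: a (possibly local) minimum
        current = best
    return current

score = lambda x: (x - 7) ** 2        # illustrative evaluation function
neighbours = lambda x: [x - 1, x + 1]  # move one step left or right
print(hill_climb(0, score, neighbours))  # 7
```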
Best-first search
like brute-force search, but expand lower-scoring (more promising) states first
Forward vs. Backward search
forward: applying operators to the current state to generate new states
backward: finding operators that could produce the current state, working back from the goal
–> backward search allows elimination of useless or spurious paths
Means-ends analysis
- mix of forward and backward search
- search is guided by detection of differences between current and goal states
1. compare current and goal state
2. select operator that would reduce differences
3. set new subgoal if operator cannot be applied
4. return to step 1
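A toy sketch of the loop above; the "get to the airport" problem, the operator names, and the state representation (sets of facts with add/delete effects, STRIPS-style) are illustrative assumptions:

```python
# Minimal sketch of means-ends analysis on a toy planning problem.
# States are sets of facts; each operator has preconditions, facts it adds, facts it deletes.
operators = {
    "drive_to_airport": {"pre": {"have_car_keys"}, "add": {"at_airport"}, "del": set()},
    "find_keys":        {"pre": set(), "add": {"have_car_keys"}, "del": set()},
}

def means_ends(state, goal, depth=5):
    """Return (plan, resulting_state), or None if no plan is found within depth."""
    if goal <= state:
        return ([], state)
    if depth == 0:
        return None
    difference = next(iter(goal - state))                    # 1. pick a difference
    for name, op in operators.items():
        if difference in op["add"]:                          # 2. operator that reduces it
            sub = means_ends(state, op["pre"], depth - 1)    # 3. subgoal: satisfy preconditions
            if sub is None:
                continue
            sub_plan, sub_state = sub
            new_state = (sub_state | op["add"]) - op["del"]  # apply the operator
            rest = means_ends(new_state, goal, depth - 1)    # 4. return to step 1
            if rest is not None:
                rest_plan, final_state = rest
                return (sub_plan + [name] + rest_plan, final_state)
    return None

plan, _ = means_ends({"at_home"}, {"at_airport"})
print(plan)  # ['find_keys', 'drive_to_airport']
```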
Sussman Anomaly
achieving a new sub-goal may entail reversing/undoing a goal that was already achieved
STRIPS “Stanford Research Institute Problem Solver”
- simple reasonably expressive planning language
- actions connect before and after world states
- SHAKEY the robot
- -> the simplicity of the kinds of states SHAKEY can represent limits its problem-solving capabilities
Frame Problem
the only effects an operator has on the world are those specified by its 'add' and 'delete' lists
–> in real world planning, hard assumption to make; can never be certain of extent of effects of an action
Constraint Satisfaction Problems (CSPs)
states and goal test conform to a standard, structured and simple representation
set of variables
set of constraints
goal
Neural Networks
- alternative to traditional processing models
- aka PDP (parallel distributed processing) or connectionist model
- biological plausibility
(unit/node = neuron)
Key components of a unit
- set of synapses (INPUTS) brings activations from other neurons
- processing unit sums up inputs, applies activation function
- output line transmits result to other neurons
Units
activation: activity of unit
weight: strength of connection between two units
learning: changing weight
Total input of units
net input to unit i = sum over units j of (activation of j × weight between i and j)
Perceptron
one layer of input neurons feeding into one output layer of McCulloch-Pitts neurons with full connectivity
- can compute any linearly separable function
Boolean AND
sum > 1.5 to activate
Boolean OR
sum > 0.5 to activate
Boolean NOT
sum > -0.5 to activate (input weight is -1)
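A sketch of one threshold unit computing the three gates with the thresholds above; the +1 weights for AND/OR and the -1 weight for NOT are assumptions consistent with those thresholds:

```python
# Minimal sketch: a single McCulloch-Pitts-style threshold unit computing AND, OR, NOT.
# Inputs are 0/1; the unit fires (outputs 1) when the weighted sum exceeds the threshold.
def unit(inputs, weights, threshold):
    total = sum(a * w for a, w in zip(inputs, weights))
    return 1 if total > threshold else 0

AND = lambda x, y: unit([x, y], [1, 1], 1.5)   # both inputs needed to exceed 1.5
OR  = lambda x, y: unit([x, y], [1, 1], 0.5)   # either input exceeds 0.5
NOT = lambda x:    unit([x],    [-1],  -0.5)   # weight -1: fires only when the input is 0

print([AND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
print([OR(a, b)  for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 1]
print([NOT(a)    for a in (0, 1)])                               # [1, 0]
```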
Multi-layered networks
activation flows from input units –> hidden units –> output units
weights determine how input patterns mapped to output patterns
Backpropagation
common weight-adjustment algorithm
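A minimal sketch of backpropagation in a tiny two-layer network learning XOR, using plain numpy; the architecture, learning rate, and training data are illustrative, and convergence depends on the random initialization:

```python
# Minimal sketch: backpropagation on a small 2-4-1 network learning XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden weights (4 hidden units)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights
sigmoid = lambda z: 1 / (1 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    # forward pass: activation flows input -> hidden -> output
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: propagate the error and adjust weights to reduce it
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # typically approaches [0, 1, 1, 0]; depends on the initialization
```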
Two learning methods of Hebbian learning
- unsupervised: network tries to discern regularities in input patterns
- supervised: input is associated with correct output and network’s job is to learn this input-output mapping (ex: NETtalk)
localist representation
each unit represents one item (ex: phoneme outputs in NETtalk)
distributed representation
each unit involved in representation of multiple items
- efficient
- even if some units don't work, info is still preserved
Catastrophic Interference
training for new rule increases error on old rule
Concurrent training
all items to be learned included in single training set
Sequential training
first learn one rule then the next
–> catastrophic interference
Deep neural networks
- many hidden layers
- capture more regularities in data and generalize better
- activity can flow from input to output and vice-versa
Generative Adversarial Net (GAN)
generator: learns to generate plausible data
discriminator: learns to distinguish generator’s fake data from real data
Can a general purpose algorithm outperform specialized algorithms in a task?
No
AI set
set of tasks that people and animals are good at
Machine Learning
study of algorithms that:
- improve their performance P
- at some task T
- with experience E
Traditional programming vs Machine learning
traditional programming: data + program –> computer –> output
machine learning: data + output –> computer –> program
Tasks best solved by machine learning
- recognizing patterns
- generating patterns
- recognizing anomalies
- prediction/recommendation
overfitting problem (regression)
perfect fit to sample but not good for making predictions
basically fitting noise
test set method
- randomly choose 30% of data to be test set
- remainder is training set
- perform regression on training set
- estimate future performance with test set
* imposes penalty for unnecessary complexity
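A sketch of the test-set method catching overfitting in polynomial regression; the synthetic data, split sizes, and polynomial degrees are illustrative assumptions:

```python
# Minimal sketch: the test-set method exposing an overfit polynomial regression.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 30)
y = 2 * x + rng.normal(0, 0.2, 30)      # true relation is linear, plus noise

idx = rng.permutation(30)               # randomly hold out ~30% of the data as a test set
test, train = idx[:9], idx[9:]

for degree in (1, 9):
    coeffs = np.polyfit(x[train], y[train], degree)   # fit on the training set only
    pred = lambda xs: np.polyval(coeffs, xs)
    train_mse = np.mean((pred(x[train]) - y[train]) ** 2)
    test_mse = np.mean((pred(x[test]) - y[test]) ** 2)
    print(degree, round(train_mse, 3), round(test_mse, 3))
# the degree-9 fit has lower training error but typically higher test error: it fit the noise
```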
Classification
learn f(x) to predict y given x
y is categorical
model learns a criterion
linear classifiers
- linear function to separate classes
- does not always work well
k-Nearest neighbours (KNN)
test items assigned to class most common among k nearest neighbours
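A sketch of k-NN classification in plain Python; the training points and labels are made up for illustration:

```python
# Minimal sketch: k-nearest-neighbours classification.
def knn_predict(train, query, k=3):
    """train is a list of ((x1, x2), label) pairs; return the majority label among the k nearest."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))   # squared Euclidean distance
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_predict(train, (1, 1)))   # 'A'
print(knn_predict(train, (5, 4)))   # 'B'
```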
Clustering
given x1, x2, …
output hidden structure underlying x’s
ex: grouping individuals by genetic similarity
number of clusters dictated by K
K
too big: creates artificial boundaries within real data clusters
too small: disjoint groups of data are forced together
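A sketch of k-means clustering as one way to recover the hidden groups once K is fixed; the 1-D data and the choice of k-means specifically are assumptions, since the notes don't name an algorithm:

```python
# Minimal sketch: k-means clustering on synthetic 1-D data with two hidden groups.
import numpy as np

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 0.5, 20), rng.normal(5, 0.5, 20)])  # two hidden groups

def k_means(x, k, steps=20):
    centres = rng.choice(x, size=k, replace=False)        # initial centres from the data
    for _ in range(steps):
        assign = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)  # nearest centre
        centres = np.array([x[assign == j].mean() for j in range(k)])      # recompute centres
    return centres, assign

centres, assign = k_means(x, k=2)
print(np.sort(centres).round(2))  # roughly [0, 5]: the two underlying groups
```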
Dimensionality reduction
- determine source signals given only mixture
ex: blind source separation, aka the cocktail party problem
Assumptions in dimensionality reduction
- source signals are statistically independent
- -> independent component analysis method
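A sketch of blind source separation via independent component analysis; it assumes scikit-learn's FastICA is available, and the two synthetic sources and the mixing matrix are illustrative:

```python
# Minimal sketch: recovering independent source signals from their mixtures with ICA.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                      # source 1: sinusoid
s2 = np.sign(np.cos(3 * t))             # source 2: square wave
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5], [0.4, 1.0]])  # unknown mixing matrix
X = S @ A.T                             # two "microphone" recordings, each a mixture

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(X)        # relies on the sources being statistically independent
print(recovered.shape)                  # (2000, 2): estimated sources (up to order and scale)
```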
Spurious correlation
to avoid drawing inference from spurious correlation –> employ test set method
Machine Learning Problems
              SUPERVISED                        UNSUPERVISED
DISCRETE      classification/categorization     clustering
CONTINUOUS    regression                        dimensionality reduction