Daniel Dennett and Transformers Flashcards
Daniel Dennett
a philosopher who advocated computational and functionalist views of the mind. He argued that mental states such as intelligence and sentience arise from physical, Turing-computable brain processes. LLMs suggest that machines can exhibit complex, human-like behavior without being sentient.
Deep Learning
Mainly tensor algebra. GPT-3, the model behind the original ChatGPT, has 96 transformer layers and 175 billion parameters, roughly three orders of magnitude fewer than the number of synapses in the human brain. It was trained on a huge dataset of roughly 499 billion tokens.
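A quick sanity check on those numbers, using GPT-3's published width (d_model = 12288) and the rough rule that each transformer layer holds about 12·d_model² weights (~4·d² for the attention projections, ~8·d² for the feed-forward net); embeddings and biases are ignored:

```python
d_model, n_layers = 12288, 96          # GPT-3 width and depth
total = n_layers * 12 * d_model**2     # per-layer weights times depth
print(f"{total:.2e}")                  # ~1.74e+11, close to the quoted 175 billion
```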
Transformer models map an input tensor space to an output tensor space via a sequence of linear transformations interleaved with nonlinearities. The parameters are learned by training the model over thousands of GPU-days against an appropriate loss (error) function.
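A minimal NumPy sketch of one such layer: a single attention head followed by a feed-forward network, with layer normalization omitted. All names and shapes are illustrative, not the actual GPT implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def transformer_block(X, Wq, Wk, Wv, W1, W2):
    """X has shape (seq_len, d_model); the W matrices hold the learned parameters."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # linear maps into query/key/value spaces
    A = softmax(Q @ K.T / np.sqrt(Wk.shape[1]))   # attention weights: a key nonlinearity
    X = X + A @ V                                 # mix value vectors, residual connection
    H = np.maximum(0.0, X @ W1)                   # feed-forward expansion with ReLU
    return X + H @ W2                             # project back down, second residual

seq_len, d = 8, 16
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d))
Ws = [rng.normal(size=s) * 0.1 for s in [(d, d), (d, d), (d, d), (d, 4 * d), (4 * d, d)]]
print(transformer_block(X, *Ws).shape)  # (8, 16): same tensor space, new values
```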
Distributional Semantics
When a word occurs in a text, its context is the set of words that appear nearby. With this approach we don't need to construct word meanings; we infer or induce the meaning from the surrounding text, which makes LLMs self-supervised.
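A toy sketch of the idea: a word's "meaning" is induced purely from co-occurrence counts in raw text, with no hand-built definitions; the function name and window size are illustrative choices.

```python
from collections import Counter

def cooccurrence(tokens, window=2):
    """Count how often each word pair occurs within `window` positions of
    each other; the counts stand in for word meaning, and the 'labels'
    come from the text itself, which is what makes this self-supervised."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts

tokens = "the cat sat on the mat while the dog sat on the rug".split()
print(cooccurrence(tokens)[("sat", "on")])  # 2: 'on' appears near 'sat' twice
```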
Kolmogorov Complexity
The Kolmogorov complexity of a problem is the length in bits of the shortest possible program, in some fixed language, that solves the problem and then halts. It is a form of algorithmic entropy.
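The standard formalization (stated for a string x rather than a problem) fixes a universal machine U and asks for the shortest program p, measured in bits as |p|, that prints x and then halts:

$$ K_U(x) = \min \{\, |p| : U(p) = x \,\} $$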
Brain Capacity, Complexity and Intelligence
There is little difference between a human brain neuron and that of an ape. The number of neurons is only one factor; brain complexity (structure, cortical folding, etc.) matters as well.
Deep learning models have become more and more capable and now exhibit behavior that was not thought possible. Intelligence may simply be an emergent property of the structure and complexity of our brains; we call this the capacity-and-complexity argument.
Emergent Properties in AI
Capabilities that arise from interactions within a complex system but are not present in its individual components. In AI, these capabilities are not explicitly programmed by a human. Human intelligence itself might be such an emergent property, produced as evolution shaped our brains.
Note that as model size grows (more parameters and layers), the challenge of understanding how the model works grows with it.
Calling these phenomena 'emergent properties' is not necessarily a cover for lack of understanding; it acknowledges the complexity and non-linearity of these systems, in which directly tracing specific outputs back to their causes becomes impractically difficult.
We are able to create complex systems, yet still cannot fully understand their mechanics.
The observation that larger models more readily exhibit these emergent properties is a reflection of the current state of AI.
Why is Model Complexity Exploding?
Capacity for learning: Larger models, with more layers and parameters, have a greater capacity to learn from data.
Richer Representations: With more parameters, larger models can develop more intricate representations of the data.
Scaling Laws in Language Models: Research has shown that larger language models improve in performance in a predictable manner as they scale up, and alongside this smooth quantitative trend come qualitative jumps in capability (aka emergent properties); see the sketch after this list.
Complex Interactions: The complex interactions between the many layers and parameters of large models can lead to behaviours that are difficult to predict.
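As a sketch of the scaling-law item above, test loss is often modeled as a power law in parameter count N; the constants below are roughly those reported by Kaplan et al. (2020) and should be treated as illustrative rather than authoritative.

```python
def predicted_loss(n_params, n_c=8.8e13, alpha=0.076):
    """Power-law scaling of language-model test loss with parameter count:
    L(N) = (N_c / N) ** alpha. Smooth, predictable quantitative gains like
    this coexist with abrupt qualitative jumps in capability."""
    return (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1.75e11):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.2f}")
```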
The Computational Theory of Mind
CTM posits that the brain is a universal computer and the mind is a program running on it. It is a version of functionalism, since a computer is characterized by relationships between states rather than by its physical components. The theory is rooted in the idea that cognitive processes are computations, i.e. operations on symbolic representations. Critics argue that the theory doesn't adequately account for non-rational aspects of human cognition such as emotion, intuition and sentience.
Geoffrey Hinton Interview
He thinks AI will in time have consciousness and surpass humans.
We know roughly what goes on inside these models, but once it gets complicated we no longer do.
We designed the learning algorithm, much as one might design the principle of evolution, but we don't understand how the resulting models actually learn.
If we talk about functional intelligence
then LLMs provide some evidence for CTM, since they achieve intelligent-seeming behavior through pure computation.
If human intelligence were not Turing-computable
it would have significant implications for our understanding of cognition. It would suggest that the brain has super-Turing computational power, meaning it could compute functions that no Turing machine can.
This would refute the Church-Turing thesis, which is foundational to computing.
Transformers for Astrophysics
Transformer models have tremendous potential for applications in astrophysics: they can handle long-range dependencies in the data and parallelize training efficiently.
Vision Transformers (ViTs), which adapt the Transformer architecture to process image patches instead of text tokens, demonstrate the versatility of transformer models beyond text.
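A minimal sketch of the patch step, assuming a (height, width, channels) image and a random projection standing in for the learned embedding matrix:

```python
import numpy as np

def image_to_patch_tokens(image, patch=16, d_model=768):
    """ViT-style preprocessing: cut an (H, W, C) image into non-overlapping
    patch x patch squares, flatten each square, and linearly project it to
    d_model. The projected patches are the 'tokens' fed to a Transformer."""
    h, w, c = image.shape
    d_in = patch * patch * c
    W = np.random.randn(d_in, d_model) * 0.02   # stand-in for the learned projection
    patches = (image.reshape(h // patch, patch, w // patch, patch, c)
                    .transpose(0, 2, 1, 3, 4)   # group the two patch-grid axes
                    .reshape(-1, d_in))         # one flattened row per patch
    return patches @ W

tokens = image_to_patch_tokens(np.random.rand(224, 224, 3))
print(tokens.shape)  # (196, 768): a 14 x 14 grid of patches, as in ViT-Base
```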
Graph Transformers apply the principles of Transformer architectures to graph-structured data. They combine the strengths of graph neural networks (GNNs) and Transformer models by leveraging attention mechanisms to model interactions between nodes in a graph.
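A sketch of how that combination can look: Transformer-style attention computed over node features but masked to the graph's edges, so each node attends only to its neighbors. X, A, and the weight matrices are illustrative; A is assumed to include self-loops.

```python
import numpy as np

def graph_attention(X, A, Wq, Wk, Wv):
    """Edge-masked attention over node features X (n, d): attention weights
    play the role of GNN message passing, restricted to pairs with A[i, j] > 0.
    A should include self-loops so every row has at least one valid entry."""
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(Wk.shape[1])
    scores = np.where(A > 0, scores, -1e9)        # mask out non-edges
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)          # row-wise softmax over neighbors
    return w @ (X @ Wv)                           # aggregate neighbors' value vectors
```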