Information Theory 1 Flashcards
What is the fundamental problem of communication introduced by Shannon? (in the context of analogue telephone communication)
According to Shannon, what was irrelevant to the engineering problem in communication?
-reproducing at one point, either exactly or approximately, a message selected at another point
-the semantic aspects of communication (the meaning of the words), because the semantics of a message vary between individuals
What is surprise?
a measure of the information carried by a specific outcome (e.g. heads/tails)
If biased coin, 80% heads, what is the surprise and information like for both outcomes?
heads: low surprise, low information (the more probable outcome)
tails: high surprise, high information (the less probable outcome)
What is the surprise and information like for the outcomes of a coin with heads on both sides?
heads is the only possible outcome
zero surprise, zero information
(no information is transferred)
What is the equation to calculate information/surprise using probability?
I(outcome) = -log2(P(outcome))
If the probability of a biased coins for heads is P(heads)=0.8, what is the surprise I(heads)= ?
What are the units for this ^?
I(heads) = -log2(0.8) ≈ 0.32
bits
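A minimal sketch of this calculation in Python (standard library only; the function name surprise is my own):

```python
import math

def surprise(p: float) -> float:
    """Surprise (information) in bits of an outcome with probability p."""
    return -math.log2(p)

# Biased coin: P(heads) = 0.8, P(tails) = 0.2
print(surprise(0.8))  # ~0.32 bits: likely outcome, low surprise
print(surprise(0.2))  # ~2.32 bits: unlikely outcome, high surprise
```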
What is entropy?
Units?
-entropy is the average surprise, i.e. the probability-weighted average of the information transferred
bits
How do you calculate entropy?
H(C) = Σ P(outcome) × I(outcome): sum the probability times the surprise over every outcome
e.g.
H(C) = 0.8 · 0.32 + 0.2 · 2.32 ≈ 0.72
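A minimal sketch of the same entropy calculation in Python (standard library only; the helper name entropy is my own), weighting each outcome's surprise by its probability:

```python
import math

def entropy(probs: list[float]) -> float:
    """Entropy in bits: the probability-weighted average surprise."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Biased coin from above: P(heads) = 0.8, P(tails) = 0.2
print(entropy([0.8, 0.2]))  # ~0.72 bits
```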
What does the curve look like on a graph with probability on the x axis and the entropy on the y axis?
Why?
a hill-shaped, inverted-U curve (similar to a parabola), peaking at 1 bit when P = 0.5
because entropy peaks when the two outcomes are equally likely: a fair coin is maximally random and hardest to predict, while a biased coin is more predictable, so its average surprise (entropy) is lower; at P = 0 or P = 1 the outcome is certain and entropy is zero
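A small sketch (Python, standard library) that evaluates the binary entropy at a few probabilities to show the hill shape:

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a coin with P(heads) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"P(heads) = {p:.1f}  ->  H = {binary_entropy(p):.3f} bits")
# H peaks at 1.000 bit when p = 0.5 and falls to 0 at p = 0 and p = 1
```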
What are the units of surprise and entropy?
-bits
To calculate surprise or entropy in Information Theory, why do we use log base 2?
because the outcomes considered here are binary (only two possible options), and log base 2 measures information in bits: one fair binary choice carries exactly 1 bit
Generally as entropy increases, what happens to surprise?
average surprise increases, because the system is more random/uncertain and less predictable
What are second-order Markov character models of the English language?
What is Zipf’s Law?
-models that predict the next letter from the letters before it (second order: the previous two); e.g. the probability of any letter following q is essentially zero, apart from u
-in every language, some words are used far more frequently than others: a word's frequency is roughly inversely proportional to its frequency rank (the most common word occurs about twice as often as the second most common, and so on; see the sketch below)
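A minimal sketch (Python, standard library) of checking Zipf's law on a corpus; the filename corpus.txt is a hypothetical placeholder for any large plain-text file:

```python
from collections import Counter

# Hypothetical corpus file; any large plain-text file will do
with open("corpus.txt", encoding="utf-8") as f:
    words = f.read().lower().split()

counts = Counter(words)

# Zipf's law predicts frequency ∝ 1 / rank, so rank * frequency
# should stay roughly constant across the top-ranked words
for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
    print(f"rank {rank:2d}: {word!r} x{freq}  (rank*freq = {rank * freq})")
```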
What are LLMs?
large language models (e.g. ChatGPT): models with billions of parameters, trained on essentially all available written text, that generate convincing text
What does Information Theory allow us to do?
quantify the amount of information transmitted in a channel