Week 7 - Information Theory (Entropy) Flashcards
What is Entropy (Information Theory)?
ΔS = - log2 Pr(X=x)
Break this down so I understand it.
ΔS (delta S) means the change in our uncertainty, i.e. how much our uncertainty is reduced, measured in bits.
The minus sign is there because a probability lies between 0 and 1, so its log2 is zero or negative; the minus sign turns this into a positive number of bits. It also means that as the probability of an event increases, the amount of information gained from observing that event decreases: a rare fact tells us more than a common one. So when trying to ID a person, the more we find out about them, the more our uncertainty drops and the more CERTAIN we are.
log2 (log base 2) is used because information is measured in bits, and a bit has 2 possible values (0 or 1). log2(n) is the number of bits needed to pick out one option from n equally likely options.
Pr means probability, so Pr(X=x) means the probability that the random variable X takes on a specific value x. For example, if X is a person's birthday and x is one particular date, Pr(X=x) = 1/365 (ignoring leap years).
In plain English, this formula says that learning a particular fact about a person reduces our uncertainty by ΔS bits, where ΔS is -log2 of the probability that the fact would be true of a randomly chosen person.
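A minimal Python sketch of the formula, assuming the probability of the fact is already known. The function name `bits_of_information` is hypothetical, chosen for illustration:

```python
import math

def bits_of_information(probability: float) -> float:
    """Return ΔS = -log2 Pr(X = x): how many bits of uncertainty a fact removes."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in the range (0, 1]")
    return -math.log2(probability)

# A fact that is true of half of all people removes exactly 1 bit.
print(bits_of_information(1 / 2))  # 1.0

# A rarer fact (a specific birthday, ignoring leap years) removes more.
print(round(bits_of_information(1 / 365), 2))  # 8.51
```

A fact that is true of everyone (probability 1) gives 0 bits: it tells us nothing new.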
Why is Information Theory important for investigators?
Information theory was originally developed by Claude Shannon in 1948. It is relevant to a lot of areas, but we are going to focus on its importance for the 3 purposes below:
- As a tool for identifying people who want to stay anonymous
- To keep ourselves safe and anonymous online when conducting OSINT, to avoid compromise (operational and personal)
- To find information about anything, regardless of how little we know about it
Using Information Theory in terms of identifying a person
- Information theory is about PIECES OF DATA (independent bits of info that together can uniquely ID someone).
- Every bit of info I get reduces my uncertainty about who they are.
- There are currently about 8 billion people on the planet, which is just under 2 to the power of 33 (about 8.6 billion). Applying log base 2 gives log2(8 billion) ≈ 32.9, which we round up to 33 bits of information needed to uniquely ID someone.
- In this context, entropy is a measure of how close a particular fact brings us towards uniquely identifying someone. It is measured in bits. Essentially, each fact we learn reduces our overall uncertainty about something.
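The arithmetic behind the 33-bit figure can be checked in a few lines of Python. `WORLD_POPULATION` is simply the approximate figure used in these notes:

```python
import math

WORLD_POPULATION = 8_000_000_000  # approximate figure used in these notes

# Bits needed to single out one person among everyone on the planet.
bits_needed = math.log2(WORLD_POPULATION)
print(round(bits_needed, 1))   # 32.9
print(math.ceil(bits_needed))  # 33

# Sanity check: 33 bits can distinguish 2**33 people, which is more than 8 billion.
print(2 ** 33 >= WORLD_POPULATION)  # True
```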
Entropy - an example to ID an unknown person
I start with nothing known.
OSINT gradually starts giving me bits of info.
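I find their sex. Uncertainty ΔS is reduced by -log2(1/2) = 1 bit.
I find their star sign. Uncertainty ΔS is reduced by -log2(1/12) = 3.58 bits.
I find their date of birth (day and month). On its own this is worth -log2(1/365) = 8.51 bits, but a birthday already tells me the star sign, so it only adds 8.51 - 3.58 = 4.93 new bits.
I find that they live in Dublin. My ΔS is reduced by -log2(1.6m/8.0b) = 12.29 bits (the population of Dublin divided by the world population).
I already have about 21.8 bits of info (1 + 3.58 + 4.93 + 12.29)! Simply adding all four figures gives 25.38, but that counts the star sign twice: bits only add up when the facts are independent.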
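A small Python sketch that tallies this example using the same probabilities. The only assumption beyond the notes is which facts are treated as independent: sex, birthday and location are, but the star sign is left out of the second total because a day-and-month birthday already determines it. The helper name `bits` is hypothetical:

```python
import math

def bits(probability: float) -> float:
    """ΔS = -log2 Pr(X = x), in bits."""
    return -math.log2(probability)

# Facts from the worked example (probabilities as given in the notes).
sex = bits(1 / 2)             # 1.00 bit
star_sign = bits(1 / 12)      # 3.58 bits
birthday = bits(1 / 365)      # 8.51 bits
dublin = bits(1.6e6 / 8.0e9)  # 12.29 bits

# Naive total: treats every fact as independent.
naive_total = sex + star_sign + birthday + dublin
print(round(naive_total, 2))  # 25.38

# A birthday already fixes the star sign, so once the birthday is known
# the star sign adds nothing new; count only the independent facts.
independent_total = sex + birthday + dublin
print(round(independent_total, 2))  # 21.8
```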
Think of this theory as quantifying what I already do at work to ID online offenders.
** Think of targeted ads online. Small, seemingly irrelevant pieces of data can identify you: even if Google doesn’t know your name, it knows enough to serve appropriate ads to you if you accidentally leak data. **
Information Theory & LE: A Real-Life Example - Graham Dwyer
- Irish architect, married with family – found guilty of the murder of Elaine O’Hara in 2015.
- He murdered Elaine in August 2012.
- The case involved a sado-masochistic relationship. Many of the communications recovered from Elaine’s phone/computer were with the offender, who gave away tiny bits of information about himself each time, such as his habitual clothing, the birth of his child, the name of the child, tattoo enquiries, annual leave from work, a committee meeting, car repairs, a flying competition, a salary cut, pay day, business with the Polish Embassy and his time of arrival home.
- The prosecution could evidence all these facts about the defendant, DWYER, using various methods, and argued that together they attributed the phone to DWYER.
Remember, even things like spelling errors, routine phrases and language use can help ID someone.
What does this mean for our OSINT Investigations - Risk of Compromise
- IP address
- Cookies
- User agent strings
- Browser fingerprinting. Particularly important if using LE-issued equipment: think screen resolution, installed language pack, installed web browser.
Use coveryourtracks.eff.org to test your browser’s uniqueness; we don’t want to be unique! How many bits of identifying info did you leak? How close to 33?
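Fingerprinting results are often expressed as "one in N browsers share this value". A minimal Python sketch for converting that into bits with the same formula (since -log2(1/N) = log2(N)) and comparing it against the 33-bit threshold. The figure below is hypothetical, for illustration only, and the helper name `bits_from_one_in` is an assumption. A test site compares you against its own sample of tested browsers, not the whole world, so treat the result as a rough guide:

```python
import math

BITS_TO_IDENTIFY_ANYONE = 33  # from log2(8 billion), rounded up

def bits_from_one_in(n: float) -> float:
    """Convert 'one in n browsers share this value' into bits: -log2(1/n) = log2(n)."""
    return math.log2(n)

# Hypothetical overall result, for illustration only (not a real measurement).
overall_one_in = 250_000
leaked = bits_from_one_in(overall_one_in)
print(f"{leaked:.2f} bits leaked, "
      f"{BITS_TO_IDENTIFY_ANYONE - leaked:.2f} bits short of 33")
```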
Smart criminals will have full statistical logging on their websites, so assume every visit records these details.
How can we keep safe?
- One option: use TAILS (The Amnesic Incognito Live System). A free, open source Linux distribution. It boots from a DVD, USB stick or SD card and runs in RAM (volatile memory), which is wiped when you shut down, so it LEAVES NO TRACE ON THE HDD unless you explicitly tell it to. Security focused, private and anonymous, and includes cryptographic tools such as PGP to store data safely. Easy to use.
- Another option: Whonix. A free, open source desktop operating system with 2 parts: the Whonix Gateway (acts as a Tor gateway anonymising your internet traffic; all IP traffic goes through it, so everything goes through Tor) and the Whonix Workstation (isolated from direct internet access, so it can only connect through the Gateway; designed for day-to-day use). The two parts run on separate VMs, and different activities can be compartmentalised on different VMs. SECURITY THROUGH ISOLATION.
- Use Tor Browser. Hides your IP and minimises your browser fingerprint.
- Use a VPN, or a disposable, non-attributable pay-as-you-go device to hotspot your internet connection from.
- Consider using a user agent string switcher add-on.