Specialization IB - question 4 (Probability) Flashcards

1
Q

Building blocks of probability

A
  • Random experiment - a process with an uncertain outcome
  • Sample space - the set of all possible outcomes of the experiment
  • Sample point - an element of the sample space
  • Event - a subset of the sample space, i.e. one or more outcomes of the random experiment
  • Events can be manipulated with set operations (A ∪ B, Co-A = the complement of A)
2
Q

Building blocks by example

A

Random experiment - a roll of a (six-sided) die
The sample space is {1, 2, 3, 4, 5, 6}
An elementary event (sample point), e.g. 1
An event of the random experiment, e.g. we roll an even number (the subset {2, 4, 6})

3
Q

Properties of set operations

A

Commutativity - A ∪ B = B ∪ A
Associativity - A ∪ (B ∪ C) = (A ∪ B) ∪ C
Distributivity - A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Identity - A ∪ ∅ = A, A ∩ S = A
Complement - A ∪ Co-A = S
De Morgan's laws - Co-(A ∪ B) = Co-A ∩ Co-B

4
Q

Discrete probability space

A

A triple (S, F, P)
* S = sample space
* F = the power set of S (2^S)
* P = probability function F -> [0, 1] with the conditions that P(S) = 1 and P is countably additive (P(⋃ A_i) = ∑ P(A_i) for pairwise disjoint A_i)

The sample space is finite or countable (so probabilities can be summed)

5
Q

Continuous probability space

A

A triple (S, F, P)
* S = sample space
* F = a sigma-algebra on S; F is a subset of 2^S
* P = probability function F -> [0, 1] with the condition that P(S) = 1

6
Q

Properties of a sigma-algebra

A

F is a subset of 2^S

  • Contains the empty set (∅ ∈ F)
  • Closed under complement - if A ∈ F, then Co-A ∈ F
  • Closed under countable unions - if A_1, A_2, … ∈ F, then ⋃_{i=1}^{∞} A_i ∈ F
7
Q

Conditional probability

A

The probability of event A given that event B has occurred

P(A|B) = P(A ∩ B) / P(B)
<=>
* P(A ∩ B) = P(B) * P(A|B) ; if P(B) != 0
* P(A ∩ B) = P(A) * P(B|A) ; if P(A) != 0
* P(A ∩ B) = 0; if P(B) = P(A) = 0
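
A minimal sketch (my own, not from the source) that checks the formula by enumerating a small uniform sample space; two fair dice are assumed:

```python
from itertools import product

# Sample space: all ordered rolls of two fair dice, each equally likely.
S = list(product(range(1, 7), repeat=2))

A = {s for s in S if s[0] + s[1] == 8}  # event A: the sum is 8
B = {s for s in S if s[0] == 6}         # event B: the first die shows 6

# In a uniform space, P(A|B) = P(A ∩ B) / P(B) = |A ∩ B| / |B|
print(len(A & B) / len(B))  # 1/6 ≈ 0.1667
```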

8
Q

Independent events?

A

P(A|B) = P(A)
P(B|A) = P(B)

P(A ∩ B) = P(A) * P(B)

(Venn diagram)

9
Q

Law of total probability

A
  • B_1, …, B_n partition S
  • A is an event

P(A) = ∑_i P(A|B_i) * P(B_i)

Use: when we have disjoint cases (B_i) covering all possibilities and want the overall probability of some property (A) across all cases, as in the sketch below.
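
A minimal numeric sketch (hypothetical numbers, not from the source): pick one of two coins at random, then flip it:

```python
# Cases B_i: which coin we picked; event A: the flip comes up heads.
p_B = {"fair": 0.5, "biased": 0.5}          # P(B_i)
p_A_given_B = {"fair": 0.5, "biased": 0.9}  # P(A | B_i)

# Law of total probability: P(A) = sum of P(A|B_i) * P(B_i) over the cases
p_A = sum(p_A_given_B[b] * p_B[b] for b in p_B)
print(p_A)  # 0.7
```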

10
Q

Bayes’ rule

A

Use: we know the overall probabilities and want to infer the probability of a specific case (subcase) given an observation; see the sketch below.

P(B_j | A) = P(A|B_j) * P(B_j) / P(A)
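
Continuing the (hypothetical) two-coin sketch from the previous card, Bayes' rule recovers the case from the observation:

```python
p_B = {"fair": 0.5, "biased": 0.5}          # P(B_j): which coin we picked
p_A_given_B = {"fair": 0.5, "biased": 0.9}  # P(A | B_j): heads probability
p_A = sum(p_A_given_B[b] * p_B[b] for b in p_B)  # P(A) via total probability

# P(biased | heads) = P(heads | biased) * P(biased) / P(heads)
print(p_A_given_B["biased"] * p_B["biased"] / p_A)  # ≈ 0.643
```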

11
Q

Random variables

A

A function X: S -> R that assigns a real number X(s) to each s ∈ S

12
Q

Uses of random variables

A

In the probability mass function, the cumulative distribution function, the expected value, the variance, etc.

13
Q

Image - discrete random variable

A

The set Im(X) = { X(s) | s ∈ S }

14
Q

Inverse image of a random variable - discrete random variable

A

The event [X = x] = { s ∈ S | X(s) = x }

15
Q

Probability mass function - discrete random variable

A

The probability mass function is a function p_X : R -> [0, 1]

p_X(x) = P(X = x) = ∑ P(s) over all s ∈ S with X(s) = x

16
Q

Cumulative distribution function - discrete random variable

A

A function F_X: R -> [0, 1]

F_X(t) = P(X ≤ t) = ∑_{x ≤ t} p_X(x)

17
Q

Examples of discrete distributions

A
  • Uniform - every outcome has the same probability (e.g., a die roll)
  • Bernoulli - the outcome is 0 or 1 (e.g., a coin flip)
  • Geometric - the number of Bernoulli trials until the first success (e.g., flipping a coin until heads comes up)
  • Binomial - the number of successes k in n independent trials (e.g., the number of heads in 10 coin flips)
  • Poisson - the number of occurrences of an event per unit of time
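
A minimal sampling sketch (my own illustration) of the first four distributions, built from a single Bernoulli trial:

```python
import random

def bernoulli(p):
    """One Bernoulli trial: 1 with probability p, otherwise 0."""
    return 1 if random.random() < p else 0

def geometric(p):
    """Number of Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while bernoulli(p) == 0:
        trials += 1
    return trials

def binomial(n, p):
    """Number of successes in n independent Bernoulli(p) trials."""
    return sum(bernoulli(p) for _ in range(n))

print(random.randint(1, 6))  # uniform: one die roll
print(bernoulli(0.5))        # Bernoulli: one fair coin flip
print(geometric(0.5))        # geometric: flips until the first heads
print(binomial(10, 0.5))     # binomial: heads in 10 flips
```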
18
Q

Probability mass function - Uniform

A

p_X (x_i) = 1/n ; if x_i ∈ Im(X), where n = |Im(X)|
p_X (x_i) = 0 ; otherwise

19
Q

Probability mass function - Bernoulli

A

p_X (x) = p ; if x = 1
p_X (x) = 1-p ; if x = 0
p_X (x) = 0 ; otherwise

20
Q

Probability mass function - Geometric

A

p_X (a) = p * (1-p) ^ (a-1) ; if a ∈ N
p_X (a) = 0 ; otherwise

21
Q

Probability mass function - Binomial

A

p_X (k) = (n choose k) * p^k * (1-p)^(n-k) ; if 0 ≤ k ≤ n
p_X (k) = 0 ; otherwise
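
A direct translation of the formula into Python (the standard-library math.comb computes (n choose k)):

```python
from math import comb

def binomial_pmf(k, n, p):
    """p_X(k) = (n choose k) * p^k * (1-p)^(n-k) for X ~ Binomial(n, p)."""
    if 0 <= k <= n:
        return comb(n, k) * p**k * (1 - p)**(n - k)
    return 0.0

# Probability of exactly 5 heads in 10 fair coin flips:
print(binomial_pmf(5, 10, 0.5))  # ≈ 0.2461
```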

22
Q

Cumulative distribution - Uniform

A

F_X (t) = 0 ; t < x_1
F_X (t) = i/n ; x_i ≤ t < x_(i+1)
F_X (t) = 1 ; t ≥ x_n

(assuming the values are ordered x_1 < x_2 < … < x_n)

23
Q

Cumulative distribution - Bernoulli

A

F_X (t) = 0 ; t < 0
F_X (t) = 1-p ; 0 ≤ t < 1
F_X (t) = 1 ; t ≥ 1

24
Q

Cumulative distribution - Geometric

A

F_X (t) = ∑_{a=1}^{⌊t⌋} p * (1-p)^(a-1) = 1 - (1-p)^⌊t⌋ ; for t ≥ 1 (and 0 for t < 1)

25
Q

Cumulative distribution - Binomial

A

F_X (t) = ∑_{k=0}^{⌊t⌋} (n choose k) * p^k * (1-p)^(n-k)

26
Q

Continuous random variable

A

Has no PMF - instead a density function f_X (probability density)

F_X (x) = ∫_{-∞}^{x} f_X (t) dt

27
Q

Joint discrete probability mass function, cumulative distribution function

A

p_{X,Y} (x, y) = P(X=x ∧ Y=y)

Equivalently:

p_{X,Y} (x, y) = P(Y=y | X=x) * P(X=x) = P(X=x | Y=y) * P(Y=y)

Its values lie in [0, 1]

28
Q

Examples of continuous distributions

A
  • Uniform - all values in an interval are equally likely
  • Normal - e.g., the distribution of IQ
  • Exponential - e.g., waiting time until an event
29
Q

Discrete random vector

A

A vector of random variables (X_1, … , X_r)

Most often used with two variables, to compare them

30
Q

Expected value

A

The expected (mean) value; for a uniform distribution it is simply the average of the values

E(X) = ∑ x * p_X(x) ; x ∈ Im(X)
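
A minimal sketch of the definition, with the PMF represented as a dict (my own convention):

```python
def expected_value(pmf):
    """E(X) = sum of x * p_X(x) over Im(X); pmf maps each x to p_X(x)."""
    return sum(x * p for x, p in pmf.items())

die = {x: 1/6 for x in range(1, 7)}  # fair six-sided die
print(expected_value(die))           # 3.5
```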

31
Q

Expected value of the uniform distribution

A

E(X) = ∑ x * (1/n) = ( ∑ x ) / n

32
Q

Expected value of the Bernoulli distribution

A

E(X) = 0 * (1-p) + 1*p = p

33
Q

Expected value of the binomial distribution

A

E(X) = np

34
Q

Expected value of the geometric distribution

A

E(X) = 1/p

36
Q

Properties of expected values

A
  • Linearity
    1. E(c * X) = c * E(X)
    2. E(X + Y) = E(X) + E(Y)
  • If X, Y are independent -> E(X * Y) = E(X) * E(Y)
  • Conditional expectation: X, Y discrete random variables.
    1. E(X | Y=y) = ∑ x * P( X=x | Y=y )
  • Markov inequality - X is a non-negative random variable with finite expected value. Then for every t > 0:
    P(X ≥ t) ≤ E(X) / t

The Markov inequality gives an upper bound on a probability using only the expected value (see the numeric check below).
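
A small numeric check of the Markov bound (fair-die example, my own numbers):

```python
die = {x: 1/6 for x in range(1, 7)}     # PMF of a fair die
e = sum(x * p for x, p in die.items())  # E(X) = 3.5

t = 5
exact = sum(p for x, p in die.items() if x >= t)  # P(X >= 5) = 2/6 ≈ 0.333
bound = e / t                                     # Markov bound: 3.5/5 = 0.7
print(exact <= bound)  # True: the bound holds (though it is not tight)
```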

37
Q

Variance

A

The second central moment
VAR(X) = E([X - E(X)]^2) = ∑ (x - E(X))^2 * p_X(x)
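
The same dict-based PMF convention as in the expected-value sketch:

```python
def variance(pmf):
    """VAR(X) = E[(X - E(X))^2]; pmf maps each x to p_X(x)."""
    e = sum(x * p for x, p in pmf.items())
    return sum((x - e) ** 2 * p for x, p in pmf.items())

die = {x: 1/6 for x in range(1, 7)}
print(variance(die))  # ≈ 2.9167 for a fair die
```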

38
Q

Stochastic processes

A

A collection of random variables { X_t | t ∈ T }

  • T = time
  • X_t = the state of X at time t

A sequence of experiments tracking evolution over time (e.g., CPU temperature, population size) - in the discrete case we take values only at discrete points in time (e.g., one measurement per hour)

39
Q

Markov chains

A

Stochastic processes such that:

P(X_(t+1) = a | X_t = b) - the future value depends only on the current value

In other words, they express the probability of transitioning to the next state - so they can be represented by an automaton or a matrix (the transition matrix).

E.g., from state 'A' we may move to states 'B', 'C', 'D', but the probabilities of all transitions out of 'A' must sum to 1 (100%): P('A', 'B') + P('A', 'C') + P('A', 'D') + P('A', 'A') = 1

Used in NLP, e.g.: given a set of sentences, we analyze the word sequences and build an automaton: it starts with the word I, then e.g. 30% don't, 70% like, and so on (see the sketch below).
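
A minimal simulation of the NLP example (the transition probabilities are made up for illustration):

```python
import random

# transition matrix as a dict: state -> list of (next state, probability)
transitions = {
    "I":     [("don't", 0.3), ("like", 0.7)],
    "don't": [("like", 1.0)],
    "like":  [("I", 1.0)],
}

def step(state):
    """Choose the next state; the weights out of each state sum to 1."""
    words, probs = zip(*transitions[state])
    return random.choices(words, weights=probs)[0]

state = "I"
for _ in range(5):
    state = step(state)
    print(state)
```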

40
Q

Information theory - entropy

A

H(X) = - ∑ p_X(x) * log p_X(x) ; p_X(x) is the probability mass function

Entropy 0 = we know what will happen
The uniform distribution has the highest entropy

It is expressed in bits (of information). It gives the minimum expected number of bits needed to transmit the given information (expected - essentially an expected value)

log_b - the base b is chosen by unit: log_2 for bits, log_256 for bytes
H(1/(mn), …) = H(1/m, …) + H(1/n, …) ; when m and n are independent
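
A minimal sketch of the formula (base 2, so the result is in bits):

```python
from math import log2

def entropy(pmf):
    """H(X) = -sum of p(x) * log2 p(x); terms with p(x) = 0 contribute 0."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

print(entropy({"H": 0.5, "T": 0.5}))        # fair coin: 1.0 bit
print(entropy({"H": 1.0}))                  # certain outcome: 0.0 bits
print(entropy({x: 1/6 for x in range(6)}))  # uniform die: ≈ 2.585 bits
```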

41
Q

Coding theory - Huffman codes

A

Often used for lossless data compression

It works by giving the most frequent symbol the shortest codeword (0), and conversely - the symbol x ∈ X with the smallest probability gets the longest codeword, e.g.:

a_1 = 1/2 … 0
a_2 = 1/4 … 10
a_3 = 1/8 … 110
a_4 = 1/8 … 111
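
A compact sketch of the construction using a min-heap (standard-library heapq); it reproduces the code lengths above, possibly with 0/1 swapped:

```python
import heapq

def huffman_codes(probs):
    """Build Huffman codewords for a dict symbol -> probability."""
    # heap entries: (probability, unique tiebreaker, {symbol: codeword so far})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

print(huffman_codes({"a_1": 0.5, "a_2": 0.25, "a_3": 0.125, "a_4": 0.125}))
# {'a_1': '0', 'a_2': '10', 'a_3': '110', 'a_4': '111'} (up to 0/1 swaps)
```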

42
Q

Coding theory - capacity of a noisy channel

A

Rate: R = log_2(M) / n (bits per channel use - throughput)

Channel capacity:
C = max I(X; Y) = max [H(Y) - H(Y|X)], maximized over the input distribution, where
* the random variable X is the input
* the random variable Y is the output

43
Q

Standard Deviation

A

σ(X) = SQRT( VAR(X) ) // variance comes out in squared units (e.g., m^2); the standard deviation restores the original unit

44
Q

Properties of variance

A
  • VAR(aX + b) = a^2 * VAR( X )
  • VAR(X) = E(X^2) - E(X)^2
45
Q

Covariance

A

COV(X, Y) = E[ (X - E(X)) * (Y - E(Y)) ]

Measures linear dependence:
* Cov < 0 - anticorrelated
* Cov = 0 - no linear dependence (uncorrelated)
* Cov > 0 - positively correlated

46
Q

Law of large numbers

A

The longer the sequence of independent, identically distributed random variables (experiments), the closer their average gets to the expected value (strong law of large numbers)

47
Q

Chebyshev inequality

A

P(|X - E(X)| ≥ t) ≤ VAR(X) / t^2

  • gives an upper bound on the probability of deviating from the mean, based on the variance
48
Q

Continuous-time Markov chain (CTMC)

A

While a discrete-time Markov chain makes transitions at fixed (periodic) time steps, the continuous-time variant allows an arbitrary amount of time to pass before a transition occurs

49
Q

Joint Entropy

A

The entropy of a pair (an n-tuple)
X, Y - may be correlated
H(X, Y) = - ∑_x ∑_y p(x, y) * log p(x, y)

50
Q

Conditional Entropy

A

The measure of uncertainty H(X|Y) about the value of X given the value of Y

H(X|Y) = 0 if X is completely determined by Y
H(X|Y) = H(X) if X and Y are independent

H(X|Y) = - ∑_{x,y} P(X=x, Y=y) * log P(X=x | Y=y)

51
Q

General Chain Rule

A

H(X_1, X_2, … X_n) = H(X_1) + H(X_2 | X_1) + … + H(X_n | X_1, …, X_(n-1))

Similarly:
D(p(x, y) || q(x, y)) = D(p(x) || q(x)) + D(p(y|x) || q(y|x))

H(X, Y) = H(Y) + H(X|Y)

52
Q

Venn diagram (entropy)

A

Circle A = H(X)
Circle B = H(Y)

The intersection of A and B = I(X; Y)

The union of A and B = H(X, Y)

H(X|Y) = H(X) - I(X;Y)
H(Y|X) = H(Y) - I(X;Y)

53
Q

Relative entropy

A

Expresses how the probability distribution p differs from the probability distribution q (Kullback-Leibler divergence)

D(p || q) = ∑ p(x) * log ( p(x) / q(x) )

54
Q

Mutual Information

A

Measures the amount of information that the variables X and Y share

I(X; Y) = 0 - independent
I(X; Y) = ∑ p(x, y) * log { p(x, y) / ( p(x) * p(y) ) }

I(X;Y) = H(X) + H(Y) - H(X,Y)
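
A minimal sketch computing I(X;Y) from a joint PMF given as a dict (x, y) -> p (my own convention):

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) = sum of p(x,y) * log2( p(x,y) / (p(x) * p(y)) )."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p  # marginal distribution of X
        py[y] = py.get(y, 0) + p  # marginal distribution of Y
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# X = Y (a fair coin copied): shares everything, I = H(X) = 1 bit
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))  # 1.0
# two independent fair coins: share nothing, I = 0
print(mutual_information({(a, b): 0.25 for a in (0, 1) for b in (0, 1)}))  # 0.0
```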

55
Q

Conditional Mutual Information

A

I(X;Y|Z) = H(X|Z) - H(X|Y,Z)

Venn diagram: the region shared only by X and Y (excluding Z)

56
Q

Probability of an error for the code

A

We have a code (M, f, g)
* M = number of codewords
* f = encoding function f: {1, 2, 3, …, M} -> X^n
* g = decoding function g: Y^n -> {1, 2, 3, …, M}

The channel is then defined as (X, p(y|x), Y)
* X is the input alphabet
* p(y|x) is the channel's probability function
* Y is the output alphabet

λ_i = ∑_{y^n : g(y^n) ≠ i} p( y^n | x^n(i) ) (the probability of decoding wrongly when message i was sent)

Maximum probability of error: λ_max = max_i λ_i

57
Q

Symmetric channel

A

The channel is defined as (X, p(y|x), Y)
* X is the input alphabet
* p(y|x) is the channel's probability function
* Y is the output alphabet

Mapping (binary symmetric channel):
0 -> 0: 1-p
0 -> 1: p
1 -> 0: p
1 -> 1: 1-p

i.e., the probability that a bit is transmitted correctly is the same for both inputs.

58
Q

Memoryless channel

A

Memoryless: the output depends only on the current input

p( y_k | x^k, y^(k-1) ) = p( y_k | x_k )