Flashcards: Until the End
TRIE
Difference between PATRICIA and PAT-tree
PAT-Trees preprocess a text for searching patterns.
Sistring (semi-infinite string)
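A rough illustration of sistrings (my own toy example, not from the card; for a finite text they are simply the suffixes starting at each position):

```python
text = "home"
sistrings = [text[i:] for i in range(len(text))]  # semi-infinite strings = suffixes
print(sistrings)                                  # ['home', 'ome', 'me', 'e']
```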
PageRank
- value between 0 and 1
- likelihood of being visited by a person randomly surfing the web and following links
- weighted combination (damping factor d) of two components:
- 1/N: probability of reaching A when a document is picked at random out of all documents
- sum of the ranks of all documents that link to A, each divided by its number of outgoing links (formula below)
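Put together, these bullets correspond to the usual PageRank formula (notation assumed here: d is the damping factor, N the number of documents, B_A the set of documents linking to A, C(T) the number of outgoing links of T):

\[ PR(A) = \frac{1-d}{N} + d \sum_{T \in B_A} \frac{PR(T)}{C(T)} \]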
Computing PageRank
repeatedly plug the ranks of the latest iteration into the formula until the values converge (see the sketch below)
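A minimal sketch of this iteration (my own example; function and variable names are not from the lecture):

```python
def pagerank(links, d=0.85, tol=1e-6):
    """links: dict mapping each document to the list of documents it links to."""
    docs = list(links)
    n = len(docs)
    rank = {doc: 1.0 / n for doc in docs}          # start with a uniform guess
    while True:
        new_rank = {}
        for doc in docs:
            # sum of the ranks of all documents linking to doc,
            # each divided by its number of outgoing links
            incoming = sum(rank[t] / len(links[t]) for t in docs if doc in links[t])
            new_rank[doc] = (1 - d) / n + d * incoming
        if max(abs(new_rank[doc] - rank[doc]) for doc in docs) < tol:
            return new_rank
        rank = new_rank

print(pagerank({"A": ["B"], "B": ["A", "C"], "C": ["A"]}))
```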
Data Encryption Standard
- the 64-bit key is reduced to 56 bits by ignoring every 8th bit (the parity bits)
- permutation table: old position in the table -> new position in the table
- round keys: dividing the 56 bits into two 28-bit halves and left shifting them in a cyclic way (see the sketch after this list)
- different bits of the key are used in each of the 16 rounds of DES, which is one of the strengths of this algorithm
- the number of weak keys, however, is quite small (about 64)
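A rough sketch of the key-schedule idea (my own illustration; the PC-1/PC-2 permutation tables of real DES are omitted, so this is not a complete implementation):

```python
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]  # left shifts per round

def round_keys(key64):
    """key64: list of 64 bits (0/1); returns 16 round keys (56 bits each here)."""
    # drop every 8th bit (the parity bits) -> 56 effective key bits
    key56 = [b for i, b in enumerate(key64) if (i + 1) % 8 != 0]
    c, d = key56[:28], key56[28:]              # two 28-bit halves
    keys = []
    for s in SHIFTS:
        c, d = c[s:] + c[:s], d[s:] + d[:s]    # cyclic left shift of both halves
        keys.append(c + d)                     # real DES now applies PC-2 (56 -> 48 bits)
    return keys
```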
Hexadecimal
DES
S-box
- nonlinear
- the main source of DES's security
- each S-box is a 4x16 table
- in DES there are 8 fixed S-boxes
- an S-box reduces the number of bits: input 6 bits -> output 4 bits
Proof that the S-box is nonlinear
- first and last bit = row
- bits in between = column
- start counting at 0
Linearity would allow for attacks with simple linear algebra (lookup sketch below).
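A toy lookup sketch (my own example; the 4x16 table below contains placeholder values, not one of the 8 real DES S-boxes):

```python
# placeholder 4x16 table; each row happens to be a permutation of 0..15
SBOX = [[(3 * col + row) % 16 for col in range(16)] for row in range(4)]

def sbox_lookup(bits):
    """bits: list of six 0/1 values, returns a 4-bit value (0..15)."""
    row = (bits[0] << 1) | bits[5]                                    # first and last bit -> row
    col = (bits[1] << 3) | (bits[2] << 2) | (bits[3] << 1) | bits[4]  # middle 4 bits -> column
    return SBOX[row][col]

print(sbox_lookup([1, 0, 1, 1, 0, 1]))   # row 0b11 = 3, column 0b0110 = 6
```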
RSA
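The back of this card is not shown above; as a generic textbook illustration with tiny numbers (my own example, using the well-known small primes p = 61, q = 53):

```python
p, q = 61, 53
n = p * q                      # public modulus: 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent (2753); requires Python >= 3.8
m = 65                         # message, must be smaller than n
c = pow(m, e, n)               # encryption: c = m^e mod n
assert pow(c, d, n) == m       # decryption: m = c^d mod n
```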
Explain how a search engine would proceed in order to return the relevant documents for our query and write down the mathematical formulas.
- eliminate all documents which do not contain any of the query terms
- count the number of times each query term occurs in each document: the so-called term frequency tf_ij (with i being the term and j being the document)
- inverse document frequency factor idf_i: reduces the weight of terms that occur very frequently in the collection and increases the weight of terms that occur rarely
- obtained by dividing the number of all documents (here: n) by the number of documents containing the term (document frequency df_i); usually the logarithm of this ratio is taken (formulas below)
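Written out, a common form of the weighting these bullets describe (the exact variant used in the lecture may differ) is:

\[ idf_i = \log\frac{n}{df_i}, \qquad w_{ij} = tf_{ij} \cdot idf_i \]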
Take into consideration that some documents might be (a lot) larger than others. What problem might occur regarding our query terms and how can we prevent this problem?
- we need to normalize this count in order to prevent a bias towards longer documents and to give a measure of the term's importance
- denominator: the maximum term frequency of any term occurring in the document d_j (formula below)
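As a formula (one common variant; freq_ij denotes the raw count of term i in document j):

\[ tf_{ij} = \frac{freq_{ij}}{\max_{k} freq_{kj}} \]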
What could you do in order to improve the relevance of one of your documents? Which actions are considered manipulation?
search engine optimization (SEO)
on-/off-page factors
punish manipulations by giving the document a low relevance
- term/keyword frequency: use certain keywords very often (legitimate: clever writing; manipulation: spamming/doorway pages)
- link structure: documents linking to your document improve its rating (manipulation: link farms)
Cryptology
- Confusion: obscures the relationship between plaintext and ciphertext characters (e.g. substitution)
- Diffusion: dissipates the redundancy of the plaintext (e.g. permutation); diffusion alone is easily cracked
- One-Time Pad: the only provably secure encryption technique; a random key of the same length as the plaintext is generated and used only once (sketch below)
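A minimal one-time-pad sketch (my own toy example):

```python
import os

def otp_encrypt(plaintext: bytes):
    key = os.urandom(len(plaintext))                       # random key, same length, used only once
    cipher = bytes(p ^ k for p, k in zip(plaintext, key))  # encryption = bitwise XOR
    return cipher, key

cipher, key = otp_encrypt(b"secret message")
plain = bytes(c ^ k for c, k in zip(cipher, key))          # decryption = same XOR with the key
print(plain)                                               # b'secret message'
```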
What are the requirements for a public key cryptosystem?