Lecture 2 Flashcards

1
Q

Tokenization

A

Process of breaking text down into tokens (smaller chunks, i.e. words, including punctuation)
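A minimal sketch using Python's standard `re` module (an assumption: the lecture may use a library tokenizer such as NLTK's `word_tokenize` instead). `simple_tokenize` is a hypothetical helper that keeps each punctuation mark as its own token:

```python
import re

def simple_tokenize(text):
    # \w+ grabs runs of letters/digits (words);
    # [^\w\s] grabs each punctuation mark as a separate token.
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("Hello, world! NLP is fun."))
# ['Hello', ',', 'world', '!', 'NLP', 'is', 'fun', '.']
```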

2
Q

Each language has its own stop words

A

true

3
Q

Stemming

A

Chopping off letters from the end of a word until the stem is reached
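A toy sketch of "chop until the stem is reached", assuming a small hypothetical suffix list (real stemmers such as Porter or Lancaster use far more elaborate rule sets). `chop_stem` keeps stripping suffixes until none applies:

```python
# Hypothetical suffix list; longer suffixes are listed before shorter ones.
SUFFIXES = ["ness", "ing", "ed", "ly", "es", "s"]

def chop_stem(word):
    """Repeatedly chop a known suffix off the end until none applies."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            # Keep at least 3 letters so short words are not destroyed.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                changed = True
                break
    return word

for w in ["jumping", "kindnesses", "boxes", "cats"]:
    print(w, "->", chop_stem(w))
# jumping -> jump, kindnesses -> kind, boxes -> box, cats -> cat
```

"kindnesses" shows the repeated chopping: "es" is removed first, then "ness".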

4
Q

Lancaster stemming

A

Concerned with chopping off as much of the word as possible (an aggressive stemmer that sacrifices accuracy for speed)

5
Q

Lemmatization

A

Takes into consideration the root (dictionary form, or lemma) of the word

6
Q

A lemmatizer lets you consider the part of speech of a word, unlike a stemmer (lemmatizing a noun is different from lemmatizing a verb)

A

True
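A toy illustration of why the part of speech matters, assuming a tiny hypothetical lexicon keyed by (word, POS). A real lemmatizer (for example one backed by WordNet) looks words up in a full dictionary; `LEXICON` and `toy_lemmatize` are made up for this sketch:

```python
# Hypothetical mini-lexicon: (word, part of speech) -> lemma.
LEXICON = {
    ("meeting", "n"): "meeting",  # "a meeting": stays a noun
    ("meeting", "v"): "meet",     # "we are meeting": verb lemma is "meet"
    ("better", "a"): "good",      # comparative adjective -> base form
    ("better", "v"): "better",    # "to better oneself"
}

def toy_lemmatize(word, pos="n"):
    # Fall back to the word itself when the lexicon has no entry.
    return LEXICON.get((word.lower(), pos), word)

print(toy_lemmatize("meeting", "n"))  # meeting
print(toy_lemmatize("meeting", "v"))  # meet
print(toy_lemmatize("better", "a"))   # good
```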

7
Q

A text is a list of sentences, and each sentence is a sequence of words

A

true
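A sketch of that nested structure in plain Python, assuming a naive sentence splitter (break after `.`, `!` or `?` followed by whitespace); real sentence tokenizers handle abbreviations and other edge cases this one ignores:

```python
import re

text = "The cat sat. The dog barked! Did the bird sing?"

# Naive sentence split: break after ., ! or ? followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", text)

# Each sentence becomes a sequence (list) of words.
doc = [re.findall(r"\w+", s) for s in sentences]

print(doc)
# [['The', 'cat', 'sat'], ['The', 'dog', 'barked'], ['Did', 'the', 'bird', 'sing']]
```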

8
Q

A pattern is something that repeats itself; finding patterns means finding and analyzing repetitions

A

true

9
Q

To find out what a text is about, first find the words that are repeated and are not stop words

A

true
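A minimal sketch of that idea with `collections.Counter`, assuming a tiny hypothetical stop-word list (real lists, such as NLTK's, are much longer) and a made-up sample text:

```python
import re
from collections import Counter

# Hypothetical, tiny English stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "is", "of", "to", "in", "it"}

text = ("The market opened low. The market rallied in the afternoon, "
        "and the market closed high. Traders said it is a volatile market.")

words = re.findall(r"[a-z]+", text.lower())
content = [w for w in words if w not in STOP_WORDS]  # drop stop words
top = Counter(content).most_common(1)                # most repeated word
print(top)  # [('market', 4)]
```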

10
Q

Patterns point to the words the conversation will be centered on

A

true

11
Q

Regular expressions

A

A language for finding patterns in text

12
Q

Disjunction: square brackets [ ], where the characters inside are treated as alternatives (OR)

A

true
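A short illustration with Python's `re` (assumed here as the regex engine):

```python
import re

text = "Woodchucks and woodchucks are the same animal."
# [wW] is a disjunction: either "w" or "W" may appear at that position.
matches = re.findall(r"[wW]oodchucks", text)
print(matches)  # ['Woodchucks', 'woodchucks']
```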

13
Q

[A-Z]

A

Any uppercase letter of the alphabet (A to Z)
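A quick check of the range with Python's `re`:

```python
import re

# [A-Z] is a range: any single uppercase letter from A to Z.
capitals = re.findall(r"[A-Z]", "Natural Language Processing")
print(capitals)  # ['N', 'L', 'P']
```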

14
Q

[^abc] means any character that is not lowercase a, b, or c

A

true
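A small sketch of negation with Python's `re`; note that uppercase `A` still matches because only lowercase a, b, c are excluded:

```python
import re

# A caret as the FIRST character inside [...] negates the class.
others = re.findall(r"[^abc]", "abcdAbc")
print(others)  # ['d', 'A']
```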

15
Q

[a^b]

A

a, ^, or b (the caret negates only when it is the first character inside the brackets)
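A check with Python's `re` showing the caret treated as a literal character when it is not first:

```python
import re

# ^ is not first inside the brackets, so it is just a literal caret.
hits = re.findall(r"[a^b]", "a^b c")
print(hits)  # ['a', '^', 'b']
```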

16
Q

^the

A

Matches the pattern "the" only at the start of a line (^ outside brackets is an anchor)
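A sketch of the anchor in Python's `re`, where `^` means "start of string" by default and "start of any line" with the `re.MULTILINE` flag:

```python
import re

text = "the cat\nsee the dog\nthe end"
# Without flags, ^ only anchors at the very start of the string.
start_only = re.findall(r"^the", text)
# With re.MULTILINE, ^ also anchors right after every newline.
line_starts = re.findall(r"^the", text, re.MULTILINE)
print(start_only)   # ['the']
print(line_starts)  # ['the', 'the']
```

The "the" in "see the dog" is never matched, because it is not at the start of a line.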

17
Q

abc?

A

Matches abc or ab (? makes the preceding c optional)

18
Q

[abc]?

A

Matches a, b, or c, or nothing at all (? makes the whole bracket optional)
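A comparison of the two optional patterns using `re.fullmatch`, which succeeds only when the whole string fits:

```python
import re

optional_c = {s: bool(re.fullmatch(r"abc?", s)) for s in ["abc", "ab", "abcc"]}
optional_class = {s: bool(re.fullmatch(r"[abc]?", s)) for s in ["a", "b", "c", "", "ab"]}
print(optional_c)      # {'abc': True, 'ab': True, 'abcc': False}
print(optional_class)  # {'a': True, 'b': True, 'c': True, '': True, 'ab': False}
```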

19
Q

* : 0 or more occurrences of the preceding item
+ : 1 or more occurrences of the preceding item
. : wildcard (any single character)

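A side-by-side sketch of `*`, `+` and `.` in Python's `re`, on a made-up sample string:

```python
import re

text = "baa! baaaa! b! bxa!"
star = re.findall(r"ba*!", text)  # zero or more a's
plus = re.findall(r"ba+!", text)  # at least one a
dot = re.findall(r"b.a!", text)   # any one character between b and a!
print(star)  # ['baa!', 'baaaa!', 'b!']
print(plus)  # ['baa!', 'baaaa!']
print(dot)   # ['baa!', 'bxa!']
```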
20
Q

+ and * are greedy: they match as much text as possible
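A sketch of greedy versus lazy matching in Python's `re`; adding `?` after a quantifier (`+?`, `*?`) makes it match as little as possible:

```python
import re

html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.+>", html)   # runs to the LAST '>' in the string
lazy = re.findall(r"<.+?>", html)    # stops at the first '>' each time
print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```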

21
Q

$

A

Matches the end of a line (or of the string)
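A hedged illustration of the `$` anchor with Python's `re` (assumed to be the lecture's regex library):

```python
import re

at_end = bool(re.search(r"end$", "the end"))           # "end" is last
not_at_end = bool(re.search(r"end$", "the end game"))  # "end" is not last
# With re.MULTILINE, $ also matches just before each newline.
last_words = re.findall(r"\w+$", "one\ntwo\nthree", re.MULTILINE)
print(at_end, not_at_end, last_words)  # True False ['one', 'two', 'three']
```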

22
Q

search

A

Scans the string until it finds the first match
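A contrast between `re.search` and `re.match` in Python:

```python
import re

text = "The year 2023 followed 2022."
first = re.search(r"\d+", text)    # scans the string, stops at the first hit
at_start = re.match(r"\d+", text)  # match only tries the start of the string
print(first.group())  # 2023
print(at_start)       # None
```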

23
Q

The results of match and search must be accessed with .group() to show the matched text
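A sketch of reading a Python `Match` object; parentheses in the pattern create numbered groups, and the made-up email address is only sample data:

```python
import re

m = re.search(r"(\w+)@(\w+)\.com", "Contact: alice@example.com")
if m:                  # search/match return None when nothing matches
    print(m)           # a Match object, not the matched text itself
    print(m.group())   # alice@example.com  (same as m.group(0))
    print(m.group(1))  # alice
    print(m.group(2))  # example
```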

24
Q

split splits a string based on a character (or, with re.split, a pattern)
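A comparison of `str.split` (one fixed separator) with `re.split` (any separator matching a pattern):

```python
import re

text = "a,b;c d"
by_char = text.split(",")              # only splits on commas
by_pattern = re.split(r"[,; ]", text)  # splits on comma, semicolon or space
print(by_char)     # ['a', 'b;c d']
print(by_pattern)  # ['a', 'b', 'c', 'd']
```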

25
split can work like tokenize, but it depends on the text
true
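A sketch of the "it depends": plain whitespace splitting leaves punctuation glued to words, while a regex tokenizer (the pattern here is an assumption, not a standard) separates it but also splits contractions:

```python
import re

sentence = "Don't stop, it's fine."
whitespace_split = sentence.split()
regex_tokens = re.findall(r"\w+|[^\w\s]", sentence)
print(whitespace_split)  # ["Don't", 'stop,', "it's", 'fine.']
print(regex_tokens)      # ['Don', "'", 't', 'stop', ',', 'it', "'", 's', 'fine', '.']
```

Which output is "right" depends on what the later processing steps expect.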
26
Positive lookahead | What follows must match
Negative lookahead | Matches only if what follows does not match
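A sketch of both lookaheads in Python's `re`, where `(?=...)` is positive and `(?!...)` is negative. A lookahead only checks what follows without consuming it, so `\w+` still reads the surname:

```python
import re

text = "Isaac Newton, Isaac Asimov, Isaac Brock"
positive = re.findall(r"Isaac (?=Asimov)\w+", text)  # only if "Asimov" follows
negative = re.findall(r"Isaac (?!Asimov)\w+", text)  # only if it does not
print(positive)  # ['Isaac Asimov']
print(negative)  # ['Isaac Newton', 'Isaac Brock']
```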
27
A regex stemmer lets you stem whatever you want (you choose the suffixes)
true
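A minimal regex stemmer sketch; the suffix set is a hypothetical choice you would edit to stem "whatever you want":

```python
import re

# Hypothetical suffix set; the lazy (.*?) keeps the stem as short as
# possible, so the optional suffix group gets the longest ending it can.
STEM_RE = re.compile(r"^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$")

def regex_stem(word):
    return STEM_RE.match(word).group(1)

print([regex_stem(w) for w in ["processing", "processes", "quickly"]])
# ['process', 'process', 'quick']
```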
28
You can always tokenize using regex
true
29
Dynamic programming
Tabular computation: solve each subproblem once and store its result in a table
30
Minimum edit distance
Measuring how similar two strings are (the minimum number of edits that turn one into the other)
31
An edit may be a deletion, an insertion (addition), or a substitution
true
32
Minimum edit distance
For a typo, the intended word is the candidate that is least far away (smallest edit distance)
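A sketch tying the last few cards together: minimum edit distance computed by dynamic programming (filling a table), then used to pick the closest word for a typo. It assumes every edit costs 1 (plain Levenshtein); some textbook versions charge 2 for a substitution. The candidate list is hypothetical:

```python
def edit_distance(source, target):
    """Minimum edit distance via a DP table (all edit costs = 1)."""
    n, m = len(source), len(target)
    # D[i][j] = distance between source[:i] and target[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        D[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j] + 1,        # deletion
                D[i][j - 1] + 1,        # insertion
                D[i - 1][j - 1] + sub,  # substitution (free if chars match)
            )
    return D[n][m]

print(edit_distance("kitten", "sitting"))  # 3

# Spelling correction: the closest candidate is taken as the intended word.
candidates = ["giraffe", "graft", "grail"]  # hypothetical dictionary
best = min(candidates, key=lambda w: edit_distance("graffe", w))
print(best)  # giraffe
```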
33
If two DNA sequences can be aligned, this implies they come from the same source
true
34
Align
Exactly the same