Chapter 1 Flashcards

Contains concepts from chapter 1 of Manning's book.

1
Q

Natural Language Processing

A

Concerned with processing natural languages such as English and Mandarin. Involves translating natural language into data that a computer can use to learn about the world.

2
Q

NLP system

A

Referred to as a pipeline because it involves several processing stages: natural language flows in one end and processed output flows out the other.

3
Q

FST (Finite State Transducer)

A

A finite state machine (FSM) that outputs a sequence of new symbols as it runs is called a finite state transducer.
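
A minimal sketch of an FST in Python, assuming a toy task that is not from the book: collapsing runs of spaces into a single space. The names `TRANSITIONS` and `transduce` are hypothetical. Each transition maps (state, input symbol) to (next state, whether to emit the input character), so the machine produces an output sequence as it consumes its input.

```python
# Transition table: (current state, input class) -> (next state, emit the character?)
TRANSITIONS = {
    ("TEXT", "space"): ("SPACE", True),    # first space after text is kept
    ("TEXT", "other"): ("TEXT", True),
    ("SPACE", "space"): ("SPACE", False),  # repeated spaces emit nothing
    ("SPACE", "other"): ("TEXT", True),
}


def transduce(text: str) -> str:
    state = "TEXT"
    output = []
    for ch in text:
        symbol = "space" if ch == " " else "other"
        state, emit = TRANSITIONS[(state, symbol)]
        if emit:
            output.append(ch)
    return "".join(output)


print(transduce("hi   there  you"))
```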

4
Q

Formal languages

A

A subset of natural language. A formal grammar can be used to generate many natural language statements.

5
Q

Regular expressions

A

Regular expressions use a special kind of formal language grammar called a regular grammar.

6
Q

Regular grammars

A

Have predictable, provable behavior, yet are flexible enough to power some sophisticated dialog engines and chatbots.

7
Q

DFA (Deterministic Finite Automaton)

A

A formal mathematical object that processes a regular language is called a finite state machine (FSM) or deterministic finite automaton (DFA).
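
A minimal DFA sketch, assuming an illustrative language that is not from the book: strings of one or more "ha" (laughter), the same language as the regular expression (ha)+. `TRANSITIONS`, `ACCEPTING` and `accepts` are hypothetical names. A missing transition is treated as a dead state, and the loop at the end compares the DFA against Python's `re.fullmatch` on a few inputs.

```python
import re

# States: 0 = start, 1 = just read "h", 2 = just read "a" (accepting)
TRANSITIONS = {(0, "h"): 1, (1, "a"): 2, (2, "h"): 1}
ACCEPTING = {2}


def accepts(s: str) -> bool:
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING


# The DFA and the equivalent regular expression should agree on every input
for s in ["ha", "haha", "hah", "ah", ""]:
    print(repr(s), accepts(s), bool(re.fullmatch(r"(ha)+", s)))
```

Because each (state, symbol) pair has at most one next state, the machine never has to backtrack, which is what makes its behavior predictable and provable.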

8
Q

Regular expression notation

A

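| - OR: matches either the expression on its left or the one on its right
* - the preceding character (or group) can occur 0 or more times
[] - used to specify a character class, e.g. [a-z] matches any one lowercase letter
[a-z]* - a character class followed by * matches any number of consecutive characters from that class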
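
The notation above can be tried with Python's built-in `re` module. This is a minimal sketch, assuming a simple greeting pattern written for illustration (not copied from the book): `|` chooses between greeting words, `[ ]*` allows any number of spaces, and `[a-z]*` captures a run of letters as the name.

```python
import re

# (hi|hello|hey): "|" means OR, captured as group 1
# [ ]*: a character class holding one space, repeated 0 or more times
# ([a-z]*): any number of consecutive letters, captured as group 2
greeting = re.compile(r"(hi|hello|hey)[ ]*([a-z]*)", flags=re.IGNORECASE)

for text in ["Hello Rosa", "hey   there", "hiya", "good morning"]:
    match = greeting.match(text)  # match() only looks at the start of the string
    if match:
        print(text, "->", match.group(1), match.group(2))
    else:
        print(text, "-> no match")
```

Note that "hiya" matches as "hi" followed by the name "ya": a pattern this loose accepts more than intended, which is why real chatbot patterns are usually more restrictive.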

9
Q

Computational Theory of Mind

A

CTM assumes that human-like NLP can be accomplished with a finite set of logical rules that are processed in series.

10
Q

Distance Metrics (Levenshtein, Jaccard and Euclidean distance)

A

Useful for applications such as spelling correctors and proper noun recognition, where the algorithm calculates the distance between words to find likely spelling errors.
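
A minimal sketch of the three distance metrics in plain Python, assuming token sets for Jaccard and numeric vectors for Euclidean distance (one common choice among several). The helper `correct` is a hypothetical toy spelling corrector that picks the vocabulary word with the smallest Levenshtein distance; `math.dist` needs Python 3.8 or later.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A & B| / |A | B| over the sets of whitespace-separated tokens."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    if not union:
        return 0.0
    return 1 - len(set_a & set_b) / len(union)


def euclidean_distance(u, v) -> float:
    """Straight-line distance between two equal-length numeric vectors."""
    return math.dist(u, v)


def correct(word, vocabulary):
    """Toy spelling corrector: the vocabulary word closest in edit distance."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))


print(levenshtein("kitten", "sitting"))
print(correct("helllo", ["help", "hello", "world"]))
print(jaccard_distance("the cat sat", "the cat ran"))
print(euclidean_distance((0, 0), (3, 4)))
```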

11
Q

Document Representation

A

Can be represented as a vector: a sequence of integers with one entry for each word or token (for example, how many times that word occurs in the document).
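
A minimal sketch of turning documents into integer vectors, assuming whitespace tokenization, lowercasing and raw word counts (real pipelines use proper tokenizers). `build_vocabulary` and `doc_to_vector` are hypothetical helper names. Every document gets a vector of the same length, one position per vocabulary word.

```python
def build_vocabulary(docs):
    """Sorted list of every distinct lowercase token across all documents."""
    return sorted({token for doc in docs for token in doc.lower().split()})


def doc_to_vector(doc, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    tokens = doc.lower().split()
    return [tokens.count(word) for word in vocabulary]


docs = ["The cat sat on the mat", "The dog sat"]
vocab = build_vocabulary(docs)
vectors = [doc_to_vector(d, vocab) for d in docs]
print(vocab)
print(vectors)
```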

12
Q

Vector space

A

Different ways that words could be combined to create vectors. Relationships between these vectors make up our model, which tries to predict combinations of words occurring in a collection of documents. These vectors can be represented using a Counter in Python.
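
A minimal sketch of word vectors as Python `Counter` objects, assuming cosine similarity as the way to measure the relationship between two vectors (one common choice, not the only one). `bag_of_words` and `cosine_similarity` are hypothetical helper names; a `Counter` returns 0 for a missing word, so words absent from one text contribute nothing to the dot product.

```python
import math
from collections import Counter


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


v1 = bag_of_words("the cat sat on the mat")
v2 = bag_of_words("the cat sat")
v3 = bag_of_words("dogs bark loudly")
print(cosine_similarity(v1, v2))  # shared words give a high score
print(cosine_similarity(v1, v3))  # no shared words give 0.0
```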

13
Q

Disadvantage with bag of words

A

Does not work well for interpreting the meaning of sentences in which word order is important, because a bag of words discards order.
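
A two-line illustration, assuming whitespace tokenization: two sentences with opposite meanings produce exactly the same bag of words.

```python
from collections import Counter

a = Counter("dog bites man".split())
b = Counter("man bites dog".split())
print(a == b)  # True: the bags are identical although the meanings differ
```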

14
Q

Disadvantage with one-hot vectors

A

High-dimensional space: each vector needs one dimension per word in the vocabulary, and almost every entry is zero.
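
A minimal sketch showing why one-hot vectors get large, assuming a hypothetical 10,000-word vocabulary padded with placeholder words "w0" to "w9996". `one_hot_encode` is a hypothetical helper; a three-word sentence becomes a 3 x 10,000 table containing only three 1s.

```python
def one_hot_encode(tokens, vocabulary):
    """One row per token, with a single 1 in that token's vocabulary column."""
    index = {word: i for i, word in enumerate(vocabulary)}
    rows = []
    for token in tokens:
        row = [0] * len(vocabulary)
        row[index[token]] = 1
        rows.append(row)
    return rows


# Hypothetical vocabulary: 3 real words plus 9,997 placeholders = 10,000 words
vocabulary = ["the", "cat", "sat"] + [f"w{i}" for i in range(9997)]
matrix = one_hot_encode(["the", "cat", "sat"], vocabulary)
print(len(matrix), len(matrix[0]))  # 3 rows, 10000 columns
print(sum(map(sum, matrix)))        # only 3 non-zero entries out of 30,000
```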

15
Q

SyntaxNet and spaCy

A

Two libraries that provided natural language syntax tree parsers, making it possible to extract syntactic and logical relationships.

16
Q

Chatbot processing stages

A


1) Parse – Extract features, structured numerical data, from natural language text (SOTA: tokenizers, regular expressions, tagging, NER, information extraction)
2) Analyze – Generate and combine features by scoring text for sentiment, grammaticality and semantics; a database is typically used (SOTA: LSTM)
3) Generate – Compose possible responses using templates, search or language models (SOTA: search, templates, MCMC, LSTM, FSM)
4) Execute – Plan statements based on conversation history and objectives, and select the next response

A feedback loop (between stages 1 and 3) passes the generated text responses back through the pipeline, so they can be processed with the same algorithms used to process user statements.
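
A toy sketch of the four stages as plain Python functions, assuming deliberately simple stand-ins: a regex tokenizer for parsing, a tiny hypothetical sentiment word list for analysis, fixed templates for generation, and a selection rule for execution. All names (`parse`, `analyze`, `generate`, `execute`, `respond`, `POSITIVE`, `NEGATIVE`) are hypothetical. The feedback loop is shown by running each candidate response back through `parse` and `analyze`, the same functions used on the user's statement.

```python
import re

# Hypothetical word lists standing in for a real sentiment model
POSITIVE = {"good", "great", "love", "glad", "thanks"}
NEGATIVE = {"bad", "hate", "awful", "sorry"}


def parse(text):
    """Stage 1: extract simple features (lowercase word tokens)."""
    return re.findall(r"[a-z']+", text.lower())


def analyze(tokens):
    """Stage 2: score the tokens; here a crude sentiment count."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return {"tokens": tokens, "sentiment": score}


def generate(features):
    """Stage 3: compose candidate responses from fixed templates."""
    return ["Glad to hear it!", "Sorry to hear that.", "Tell me more."]


def sign(x):
    return (x > 0) - (x < 0)


def execute(candidates, user_features, history):
    """Stage 4: select a reply whose own tone matches the user's tone."""
    for reply in candidates:
        # Feedback loop: analyze the generated text with the same stages 1 and 2
        if sign(analyze(parse(reply))["sentiment"]) == sign(user_features["sentiment"]):
            history.append(reply)
            return reply
    history.append(candidates[-1])
    return candidates[-1]


def respond(text, history=None):
    history = [] if history is None else history
    features = analyze(parse(text))
    return execute(generate(features), features, history)


print(respond("I love this"))
print(respond("This is awful"))
```

This sketch records the conversation history but does not yet use it to plan; a fuller Execute stage would consult it, for example to avoid repeating a reply.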

17
Q

Layers for feature extraction and analysis

A

Characters -> Tokens -> Tagged tokens (produced by a POS tagger) -> Syntax tree -> Entity relationships -> Knowledge base (fed to a logical compiler and information extractor)
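
A minimal sketch of the first three layers (characters, tokens, tagged tokens), assuming a hypothetical six-word lexicon in place of a trained POS tagger; unknown tokens get the placeholder tag "X". Building syntax trees, entity relationships and a knowledge base would need a real parser and is not attempted here.

```python
import re

# Hypothetical miniature lexicon standing in for a trained POS tagger
LEXICON = {"the": "DET", "a": "DET", "cat": "NOUN", "mat": "NOUN", "sat": "VERB", "on": "ADP"}


def tokenize(characters):
    """Characters -> tokens: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", characters)


def tag(tokens):
    """Tokens -> tagged tokens, using the lexicon lookup."""
    return [(token, LEXICON.get(token.lower(), "X")) for token in tokens]


tokens = tokenize("The cat sat on the mat.")
print(tokens)
print(tag(tokens))
```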

18
Q

Inferences

A

Logical extrapolations from a set of conditions detected in an environment

19
Q

Fuzzy regular expressions

A

Find the closest grammar match among the possible grammar rules instead of requiring an exact match. Effective for question-answering systems and task-execution assistant bots.
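
A stdlib approximation of the idea, assuming a hypothetical task bot whose "grammar" is a small set of canonical command phrases. This swaps in `difflib.SequenceMatcher` string similarity instead of a true fuzzy regular expression engine: the bot picks the closest phrase, and rejects the input only if even the best score falls below an assumed threshold of 0.6.

```python
from difflib import SequenceMatcher

# Hypothetical command phrases mapped to intents
COMMANDS = {
    "set an alarm": "ALARM",
    "what is the weather": "WEATHER",
    "play some music": "MUSIC",
}


def closest_command(utterance, threshold=0.6):
    """Return the intent of the most similar phrase, or None if nothing is close."""
    best_phrase, best_score = None, 0.0
    for phrase in COMMANDS:
        score = SequenceMatcher(None, utterance.lower(), phrase).ratio()
        if score > best_score:
            best_phrase, best_score = phrase, score
    return COMMANDS[best_phrase] if best_score >= threshold else None


print(closest_command("what's the wether"))
print(closest_command("play musc"))
print(closest_command("zzzz"))
```

An exact regular expression would reject both misspelled inputs; the closest-match approach still routes them to the intended command.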