Chapter 1 Flashcards

Question 1

Q

Natural Language Processing

Answer

A

Concerned with processing natural languages such as English and Mandarin. Involves translating natural language into data that a computer can use to learn about the world.

Question 2

Q

NLP system

Answer

A

Referred to as a pipeline because it involves several processing stages: natural language flows in one end and processed output flows out the other.

Question 3

Q

FST (Finite State Transducer)

Answer

A

A finite state machine (FSM) that outputs a sequence of new symbols as it runs.
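
As a rough illustration of the definition above, here is a minimal sketch of a transducer in Python. The transition table, the state names (`even`, `odd`), and the `run_fst` helper are all hypothetical and chosen only for this example: the machine reads a string of bits and, after each one, emits the parity of the 1s seen so far.

```python
def run_fst(transitions, start, text):
    """Run a finite state transducer over `text` and return its output.

    `transitions` maps (state, input_symbol) -> (next_state, output_symbol).
    """
    state = start
    output = []
    for symbol in text:
        # Each step both changes state and emits one output symbol.
        state, emitted = transitions[(state, symbol)]
        output.append(emitted)
    return "".join(output)


# Hypothetical toy machine: emits "1" while an odd number of 1s has been read.
PARITY = {
    ("even", "0"): ("even", "0"),
    ("even", "1"): ("odd", "1"),
    ("odd", "0"): ("odd", "1"),
    ("odd", "1"): ("even", "0"),
}

print(run_fst(PARITY, "even", "1101"))  # prints 1001
```

A plain FSM would only track the state; the second element of each table entry is what makes this a transducer.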

Question 4

Q

Formal languages

Answer

A

A language defined by a precise set of grammar rules. A formal grammar can be used to match or generate many natural language statements.

Question 5

Q

Regular expressions

Answer

A

Use a special kind (class) of formal language grammar called a regular grammar.

Question 6

Q

Regular grammars

Answer

A

Have predictable, provable behavior, yet are flexible enough to power some sophisticated dialog engines and chatbots.

Question 7

Q

DFA (Deterministic Finite Automaton)

Answer

A

A formal mathematical object that processes a regular language. Also called a Finite State Machine (FSM).

Question 8

Q

Regular expression notation

Answer

A

| - OR
* - preceding character can occur 0 or more times
[] - used to specify a character class
* (after a character class) - matches any number of consecutive characters from that class
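
The notation above can be tried out with Python's built-in `re` module. This is a small sketch only: the greeting pattern and the `parse_greeting` helper are made up for illustration, not taken from the deck.

```python
import re

# "|" separates alternatives, "[ ]" and "[a-z]" are character classes,
# and "*" lets the preceding item repeat zero or more times.
GREETING = re.compile(r"(hi|hello|hey)[ ]*([a-z]*)")


def parse_greeting(text):
    """Return (greeting, name) if `text` starts with a greeting, else None."""
    match = GREETING.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_greeting("hello rosa"))  # prints ('hello', 'rosa')
print(parse_greeting("good day"))    # prints None
```

Because `[a-z]*` may match zero characters, a bare "hey" still matches and yields an empty name.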

Question 9

Q

Computational Theory of Mind

Answer

A

CTM assumes human-like NLP can be accomplished with a finite set of logical rules that are processed in series.

Question 10

Q

Distance Metrics (Levenshtein, Jaccard and Euclidean distance)

Answer

A

Useful for applications like spelling correctors and proper-noun recognizers, where the algorithm calculates the distance between words to detect spelling errors and find the closest match.
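
To make the idea concrete, here is a sketch of two of the named metrics in plain Python, plus a toy spelling corrector built on them. The function names and the small vocabulary are assumptions made for this example; a real corrector would use a far larger word list and likely an optimized library.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete char_a
                            curr[j - 1] + 1,      # insert char_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def jaccard_distance(a, b):
    """1 minus the overlap ratio between two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return 1 - len(set_a & set_b) / len(set_a | set_b)


def closest_word(word, vocabulary):
    """Pick the vocabulary entry with the smallest edit distance to `word`."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))


print(levenshtein("kitten", "sitting"))  # prints 3
print(closest_word("speling", ["spelling", "spending", "sampling"]))  # prints spelling
```

Levenshtein works on character sequences, so it suits typos; Jaccard ignores order and is more often applied to sets of tokens or character n-grams.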

Question 11

Q

Document Representation

Answer

A

A document can be represented as a vector: a sequence of integers, one for each word or token in the document.

Question 12

Q

Vector space

Answer

A

The different ways that words can be combined to create vectors. The relationships between these vectors make up our model, which tries to predict combinations of words occurring in a collection of various words. These vectors can be represented with a Counter in Python.
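
A minimal sketch of the `Counter` representation mentioned above. The `bag_of_words` helper and the example sentence are illustrative assumptions, and whitespace splitting stands in for a proper tokenizer.

```python
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercased, whitespace-separated token occurs."""
    return Counter(text.lower().split())


bow = bag_of_words("The faster Harry got to the store the faster Harry would get home")
print(bow["the"], bow["faster"], bow["home"])  # prints 3 2 1
```

Each distinct token acts as one dimension of the vector, and its count is the value along that dimension.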

Question 13

Q

Disadvantage with bag of words

Answer

A

Does not work well for interpreting sentences whose meaning depends on word order, because a bag of words discards the order of the words.
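
A quick way to see this limitation, using a `Counter` as the bag of words (the two sentences and the `same_bag` helper are made up for illustration):

```python
from collections import Counter


def same_bag(sentence_a, sentence_b):
    """True when two sentences contain exactly the same word counts."""
    return Counter(sentence_a.split()) == Counter(sentence_b.split())


# Opposite meanings, identical bags: the word order has been thrown away.
print(same_bag("dog bites man", "man bites dog"))  # prints True
```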

Question 14

Q

Disadvantage with one-hot vectors

Answer

A

High-dimensional space: each vector needs one dimension for every word in the vocabulary.
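
The growth in dimensions is easy to see in a small sketch. The `one_hot_vectors` helper and the example sentence are assumptions for illustration; the vocabulary is simply the sorted set of tokens.

```python
def one_hot_vectors(tokens):
    """Return (vocabulary, rows) with one row per token, one column per word."""
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    rows = []
    for token in tokens:
        row = [0] * len(vocab)  # every row is as wide as the vocabulary
        row[index[token]] = 1   # a single 1 marks which word this token is
        rows.append(row)
    return vocab, rows


vocab, rows = one_hot_vectors("the cat sat on the mat".split())
print(vocab)    # prints ['cat', 'mat', 'on', 'sat', 'the']
print(rows[0])  # prints [0, 0, 0, 0, 1]
```

Six tokens over a five-word vocabulary already produce a 6 x 5 table that is mostly zeros; with a realistic vocabulary of tens of thousands of words, each row becomes that wide.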

Question 15

Q

SyntaxNet and spaCy

Answer

A

Two libraries that provide natural language syntax tree parsers, making it possible to extract syntactic and logical relationships from text.

Question 16

Q

Chatbot processing stages

Answer

A

(Mnemonic: PAGE, for Parse, Analyze, Generate, Execute)

1) Parse – Extract features (structured numerical data) from natural language text (SOTA: tokenizers, regular expressions, tagging, NER, information extraction)
2) Analyze – Generate and combine features by scoring text for sentiment, grammaticality, and semantics; a database is typically used (SOTA: LSTM)
3) Generate – Compose possible responses using templates, search, or language models (search, templates, MCMC, LSTM, FSM)
4) Execute – Plan statements based on conversation history and objectives, and select the next response

A feedback loop (between 1 and 3) is applied to generated text responses so that they can be processed with the same algorithms used to process user statements.

Question 17

Q

Layers for feature extraction and analysis

Answer

A

Characters -> Tokens -> Tagged tokens -> Syntax tree (fed into POS tagger) -> Entity relationships -> Knowledge base (fed to logical compiler, info extractor)

Question 18

Q

Inferences

Answer

A

Logical extrapolations from a set of conditions detected in an environment

Question 19

Q

Fuzzy regular expressions

Answer

A

Find the closest grammar match among the possible grammar rules instead of requiring an exact match. Effective for question answering systems and task-execution assistant bots.
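
The closest-match idea can be approximated with the standard library. Note that this sketch swaps in `difflib.get_close_matches`, which scores string similarity, rather than a true fuzzy regular expression engine; the list of known commands and the `closest_command` helper are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical set of phrases a task-execution bot knows how to handle.
KNOWN_COMMANDS = ["what time is it", "set an alarm", "play some music"]


def closest_command(utterance, cutoff=0.6):
    """Return the most similar known command, or None if nothing is close."""
    matches = get_close_matches(utterance, KNOWN_COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(closest_command("what tme is it"))  # prints what time is it
print(closest_command("xyzzy"))           # prints None
```

The `cutoff` parameter controls how tolerant the matching is: lower values accept looser matches at the risk of picking the wrong command.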

Contains concepts from Chapter 1 of Manning's book (19 cards)