Lesson 1 Intro to NLP Flashcards
What is Natural language
Natural language refers to the language that is used for communication between humans
What is NLP
NLP is the process of using computers to extract meaning from text
What are the 2 components in NLP
Natural language understanding (NLU)
Natural language generation (NLG)
What is AI-complete in NLP
- Requires all the types of knowledge and context awareness humans possess
- AI-complete problems are the most difficult problems in the field of AI
What are the 3 fields that interact with one another to form NLP
- Computer Science
- Artificial Intelligence
- Linguistics
What are the ambiguities and challenges of language (6 points)
- Synonymy: different words with the same meaning
- Polysemy: same word, different meaning, different usage
- Text and speech are unstructured data.
- No fixed structure in sentence format
- No fixed schema: grammar
- Sometimes dirty: misspellings, slang, abbreviations
What are the 3 stages of history of NLP
- 1950s-1980s: linguistics methods and rules
- 1980s-Now: Statistical + Machine learning methods
- Now-Future: Deep learning
How has NLP evolved over the years
1950s-1980s: linguistics methods and rules
o Approach focused on
Linguistics: grammar rules, sentence structure parsing
Handwritten rules: huge sets of logical (if/else) statements
Phase structure grammar: conversion of sentences into forms that computers can understand
o Problems:
Too complex to maintain
Cannot scale
Cannot generalize
1980s-Now: Statistical + Machine learning methods
o Approach shifted from linguistics to data driven
o Increasing computational power and easier access to text
Web page
Digital archives
o NLP starts using statistical and probabilistic models
Data mining -> text mining
o Generic machine learning algorithms applied to NLP tasks
Sentiment analysis using logistic regression
Language models with Markov models
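The Markov-model language models above can be sketched in a few lines. This is a toy example (the corpus and words are made up, not from the lesson): a bigram Markov model that estimates the probability of the next word given only the current word, using raw counts.

```python
from collections import defaultdict

# Tiny made-up corpus; real models are trained on millions of sentences.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (bigram counts).
counts = defaultdict(lambda: defaultdict(int))
for current, nxt in zip(corpus, corpus[1:]):
    counts[current][nxt] += 1

def next_word_prob(current, nxt):
    """Maximum-likelihood estimate of P(nxt | current)."""
    total = sum(counts[current].values())
    return counts[current][nxt] / total if total else 0.0

print(next_word_prob("the", "cat"))  # "the" is followed by cat 2 of 4 times -> 0.5
```

The Markov assumption (only the current word matters) is what keeps the model simple, and also what limits it: it cannot use longer context the way later neural models can.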
What can we expect for the future of NLP
Now-Future: Deep learning
o More advances in computing power with parallelization (GPU)
o Availability of large datasets becomes the norm
o Neural Networks
Learned word representations with finite dimensions
Capture semantics and relationships among words
o RNN (recurrent neural network) / LSTM (Long Short-Term Memory)
Allow sequential processing and learning of text
Applied to machine translation tasks and question-answering systems
o Attention-based model
A way to place varying degrees of focus (attention) on different parts of the text
Breakthrough in machine translation and text generation tasks
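The core idea of attention-based models can be sketched numerically. This is a toy illustration (the 2-d word vectors are invented; real models learn them in hundreds of dimensions): attention weights come from a softmax over similarity scores between a query and each word's vector.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-d word vectors (for illustration only).
words = {"bank": [1.0, 0.0], "river": [0.9, 0.1], "money": [0.1, 0.9]}
query = [1.0, 0.0]  # what the model is currently "looking for"

# Dot-product similarity between the query and each word vector.
scores = [sum(q * k for q, k in zip(query, vec)) for vec in words.values()]
weights = softmax(scores)  # higher weight = more attention on that word
```

Here the query is most similar to "bank", so "bank" receives the largest attention weight; the weights always sum to 1.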
What is Natural Language Understanding (NLU)
Converting text/speech into a concept space / computer-readable format is Natural Language Understanding (NLU)
What is Natural Language Generation (NLG)
Process of going from the concept space back to either speech or text is Natural Language Generation (NLG)
What are the 6 examples of NLU applications
Document classification
o Classify documents into categories
o Classify emails as spam or not spam
o Classify product reviews as positive or negative
o Assign labels to documents
Document Recommendation
o Choosing the most relevant document based on some information or 'fingerprint'
o Choosing the most relevant webpages based on query to search engine
o Recommend news articles based on past articles liked or read
o Recommend restaurants based on restaurant reviews
Topic modelling
o Breaking a set of documents into topics at the word level
o See how prevalence of certain topics covered in a magazine changes over time
o Find documents belonging to a certain topic
Intent Matching
o Understanding that there are many ways to say or ask for the same thing
o Use in dialog systems
Natural Language Search
o Speaking or typing into a device using everyday language rather than keywords
o Natural language question answering
o Chatbot
Language Identification
o Determining which natural language given content is in
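The document classification example above (spam vs. not spam) can be illustrated with a deliberately crude sketch. This is not a real classifier, just a keyword-counting stand-in (the keyword list is invented); in practice a trained model such as logistic regression would learn these signals from labeled data.

```python
# Hypothetical spam keyword list (for illustration only).
SPAM_WORDS = {"free", "winner", "prize", "click"}

def classify_email(text):
    """Label an email 'spam' if it contains two or more spam keywords."""
    tokens = text.lower().split()
    hits = sum(1 for t in tokens if t in SPAM_WORDS)
    return "spam" if hits >= 2 else "not spam"

print(classify_email("click now free prize inside"))  # spam
print(classify_email("Meeting moved to 3pm today"))   # not spam
```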
What are the 4 examples of NLG applications
Machine Translation
o Automatically translate text between languages
Document Summarization
o Automatically generate text summaries of documents
Text generation
Question and answering
o Concerned with building unsupervised models/systems that provide answers to questions based on large and diverse text sources
Image Captioning
What are Large Language Models (LLMs)
- Deep learning models trained to produce text
- Backbone of modern Natural Language Processing
- Pre-trained by academic institutions and big tech companies such as OpenAI
- As the number of parameters increases, the model can acquire more granular knowledge and improve its predictions
What are the 10 steps in training an LLM from scratch
Data collection
o Gather a vast and diverse text corpus from the internet: websites, social media platforms, academic sources, etc. The quality and quantity of data are crucial
Data pre-processing
o Clean and format the data, including tokenization and handling special characters
Model Architecture
o Choose a Transformer-based model and design its structure, including the number of layers, attention mechanisms, and other hyperparameters
Initialization
o Initialize model parameters with random values or pre-trained weights from a similar model
Training objective
o The objective of LLMs is to enable the model to predict the next word or sequence of words in a sentence based on input data
Training
o Use backpropagation and optimization algorithms (e.g. Adam) to update model parameters, minimizing the chosen loss function
Regularization
o Apply techniques like dropout and layer normalization to prevent overfitting
Hyperparameter Tuning
o Fine-tune various hyperparameters to optimize model performance
Evaluation
o Assess the model’s performance using appropriate metrics and validation datasets
Inference
o Once trained, the LLM can be used for various natural language processing tasks, such as text generation, translation, and summarization
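The training objective in the steps above (predict the next word) can be made concrete with a toy loss computation. This is a sketch, not a real model: the vocabulary and predicted probabilities are invented, and the loss shown is the standard cross-entropy, i.e. the negative log-probability the model assigned to the true next word.

```python
import math

# Hypothetical 4-word vocabulary (for illustration only).
vocab = ["the", "cat", "sat", "mat"]

# Pretend the model, after seeing "the cat", output these probabilities.
predicted = [0.1, 0.1, 0.7, 0.1]
true_next = "sat"

# Cross-entropy loss: penalize low probability on the correct next word.
loss = -math.log(predicted[vocab.index(true_next)])
print(round(loss, 3))  # -ln(0.7) -> 0.357; lower loss = better prediction
```

Backpropagation (step 6) adjusts the model's parameters to push this loss down across the whole training corpus.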