Lesson 1 Intro to NLP Flashcards

1
Q

What is natural language

A

Natural language refers to the language that is used for communication between humans

2
Q

What is NLP

A

NLP is the process of using computers to extract meaning from text

3
Q

What are the 2 components in NLP

A

Natural language understanding (NLU)
Natural language generation (NLG)

4
Q

What is AI-complete in NLP

A
  • Requires all types of knowledge and context awareness that humans possess
  • AI-complete problems are the most difficult problems in the field of AI
5
Q

What are the 3 fields that interact with one another to form NLP

A
  • Computer Science
  • Artificial Intelligence
  • Linguistics
6
Q

What are the ambiguities of language (6 points)

A
  • Synonymy: different words with the same meaning
  • Polysemy: same word, different meanings, different usage
  • Text and speech are unstructured data
  • No fixed structure in sentence format
  • No fixed schema: grammar
  • Sometimes dirty: misspellings, slang, abbreviations
7
Q

What are the 3 stages in the history of NLP

A
  • 1950s-1980s: linguistics methods and rules
  • 1980s-Now: Statistical + Machine learning methods
  • Now-Future: Deep learning
8
Q

How has NLP evolved over the years

A

1950s-1980s: linguistics methods and rules

o Approach focused on
 Linguistics: grammar rules, sentence structure parsing
 Handwritten rules: huge sets of logical (if/else) statements
 Phase structure grammar: conversion of sentences into forms that computers can understand
o Problems:

 Too complex to maintain
 Cannot scale
 Cannot generalize

1980s-Now: Statistical + Machine learning methods
o Approach shifted from linguistics to data driven
o Increasing computational power and easier access to text
 Web page
 Digital archives

o NLP starts using statistical and probabilistic models
 Data mining -> text mining

o Generic machine learning algorithms applied to NLP tasks
 Sentiment analysis using logistic regression
 Language models with Markov models

9
Q

What can we expect for the future of NLP

A

Now-Future: Deep learning

o More advances in computing power with parallelization (GPU)

o Availability of large datasets becomes the norm

o Neural Networks
 Learnt word representations with finite dimensions
 Capture semantics and relationships among words

o RNN (recurrent neural network) / LSTM (Long Short-Term Memory)
 Allows sequential processing and learning of text
 Applied to machine translation tasks and question-answering systems

o Attention-based model
 A way to place varying degrees of focus (attention) on different parts of the text
 A breakthrough in machine translation and text generation tasks
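The attention idea above can be sketched with toy 2-d vectors: a minimal, hypothetical dot-product attention, not any particular model's implementation:

```python
import math

def softmax(scores):
    """Normalize raw scores into attention weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Dot-product attention: weight each value by how well its key matches the query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Toy 2-d "word vectors": the query matches the first key most strongly,
# so the output is pulled towards the first value.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out = attend([2.0, 0.0], keys, values)
print(out)  # closer to [10, 0] than to [0, 10]
```

This is the "various degrees of focus" in miniature: the softmax weights decide how much each part of the input contributes to the output.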

10
Q

What is Natural Language Understanding (NLU)

A

The conversion of text/speech into a concept space (computer-readable format) is Natural Language Understanding (NLU)

11
Q

What is Natural Language Generation (NLG)

A

Process of going from the concept space back to either speech or text is Natural Language Generation (NLG)

12
Q

What are 6 examples of NLU applications

A

Document classification
o Classify documents into categories
o Classify emails as spam and not spam
o Classify products as positive and negative
o Assign labels to documents

Document Recommendation
o Choosing the most relevant document based on some information or ‘fingerprint’
o Choosing the most relevant webpages based on query to search engine
o Recommend news articles based on past articles liked or read
o Recommend restaurants based on restaurant reviews

Topic modelling
o Breaking a set of documents into topics at the word level
o See how prevalence of certain topics covered in a magazine changes over time
o Find documents belonging to a certain topic

Intent Matching
o Understanding that there are many ways to say or ask for the same thing
o Use in dialog systems

Natural Language Search
o Speaking or typing into a device using everyday language rather than keywords
o Natural language question answering
o Chatbot

Language Identification
o Determining which natural language given content is in
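Language identification, the simplest of the applications above, can be sketched with a hypothetical stopword-overlap heuristic (the tiny word lists here are illustrative; real systems use much larger language profiles):

```python
# Hypothetical mini stopword lists per language.
STOPWORDS = {
    "english": {"the", "is", "and", "of", "to", "in"},
    "french": {"le", "la", "et", "de", "est", "un"},
    "spanish": {"el", "las", "y", "de", "es", "una"},
}

def identify_language(text):
    """Pick the language whose stopwords overlap most with the text's words."""
    words = set(text.lower().split())
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))

print(identify_language("the cat is in the garden"))  # english
```

Production systems typically use character n-gram statistics instead of stopwords, but the "match against known language profiles" idea is the same.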

13
Q

What are 5 examples of NLG applications

A

Machine Translation
o Automatically translate text between languages

Document Summarization
o Automatically generate text summaries of documents

Text generation

Question and answering
o Concerned with building unsupervised models/systems that provide answers to questions based on large and diverse text sources

Image Captioning

14
Q

What are Large Language Models (LLMs)

A
  • Deep learning models trained to produce text
  • Backbone of modern Natural Language Processing
  • Pre-trained by academic institutions and big tech companies such as OpenAI
  • As the number of parameters increases, the model can acquire more granular knowledge and improve its predictions
15
Q

What are the 10 steps in training an LLM from scratch

A

Data collection
o Gather a vast and diverse text corpus from the internet: websites, social media platforms, academic sources, etc. The quality and quantity of data are crucial

Data pre-processing
o Clean and format the data, including tokenization and handling special characters

Model Architecture
o Choose a Transformer-based model and design its structure, including the number of layers, attention mechanisms, and other hyperparameters

Initialization
o Initialize model parameters with random values or pre-trained weights from a similar model

Training objective
o The objective of LLMs is to enable the model to predict the next word or sequence of words in a sentence based on input data

Training
o Use backpropagation and optimization algorithms (e.g. Adam) to update model parameters, minimizing the chosen loss function

Regularization
o Apply techniques like dropout and layer normalization to prevent overfitting

Hyperparameter Tuning
o Fine-tune various hyperparameters to optimize model performance

Evaluation
o Assess the model’s performance using appropriate metrics and validation datasets

Inference
o Once trained, the LLM can be used for various natural language processing tasks, such as text generation, translation, and summarization
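Step 2 (data pre-processing) can be sketched with a minimal, assumed tokenizer. Real LLM pipelines use subword tokenizers such as BPE, but the cleaning-and-splitting idea is the same:

```python
import re

def preprocess(text):
    """Minimal cleaning + tokenization: lowercase, strip special
    characters, split into word tokens."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()

tokens = preprocess("Hello, World! LLMs are trained on LOTS of text...")
print(tokens)
# ['hello', 'world', 'llms', 'are', 'trained', 'on', 'lots', 'of', 'text']
```

The resulting tokens are what the training objective (step 5) operates on: given a sequence of tokens, predict the next one.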

16
Q

Describe the life cycle of a language model

A
  • The LLM landscape is competitive, innovative and short-lived
  • At the ‘upstream’, the model is pre-trained
  • After release, the model is adopted and deployed in ‘downstream’ applications. Extra fine-tuning is required for specific domains and tasks
  • Soon, a better model comes around the corner
17
Q

Describe fine-tuning

A
  • Language modelling is a powerful upstream task
  • NLP is mostly used for more targeted downstream tasks, such as sentiment analysis, question answering and information extraction
  • During fine-tuning, a portion of the model is ‘frozen’ and the rest is further trained with domain- or task-specific data
  • In GPT-3, a prompt-based approach is adopted: the learning happens dynamically during the prediction phase rather than in a dedicated training phase
18
Q

What are upstream and downstream tasks

A

Upstream tasks refer to tasks that happen before the model is trained (e.g. pre-training itself), and downstream tasks refer to tasks that happen after the training of the model