Natural Language Processing (Powerpoint) Flashcards

Question 1

Q

What is PageRank?

Answer

A

An algorithm that ranks web pages based on their importance and relevance.

Importance is measured by the number and quality of the links that point to the page.

An underlying assumption is that more important web pages are linked to from more other web pages.

Question 2

Q

How does the PageRank algorithm work?

Answer

A

The web is represented as a directed graph

Each web page is initially assigned an equal PageRank value
The PageRank of each page is then updated iteratively based on the PageRank of the other pages that link to it
There is also a damping factor, which ensures that the PageRank distribution converges
- It is based on the probability that a user keeps clicking on links
Convergence means that the algorithm keeps updating the PageRank values until the changes between iterations fall below a given threshold, at which point the ranking has stabilized.
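
The iterative update can be sketched in a few lines of Python. This is only an illustration under assumptions that are not on the card: the function name `pagerank`, the `links` dictionary, the damping value 0.85 (a commonly used default), the tolerance, and the choice to spread the rank of pages without outgoing links evenly over all pages.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """links maps each page to the list of pages it links to (hypothetical input format)."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # every page starts with an equal value

    for _ in range(max_iter):
        # Assumption: rank held by pages with no outgoing links is shared by all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes an equal share of its rank to every page it links to.
        for page in pages:
            targets = links.get(page, [])
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)

        # Convergence check: total change between two iterations.
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Toy graph: C is linked to by A, B and D, so it should end up ranked highest.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
```

In this toy graph, page D has no incoming links, so it only ever receives the baseline share `(1 - damping) / n`.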

Question 3

Q

What is Natural Language Processing?

Answer

A

Understanding and generation of natural (human) language.

Subfield of AI that focuses on the interaction between computers and humans through natural language. The goal of NLP is to enable computers to understand, interpret, and respond to human language in a way that is both meaningful and useful.

Question 4

Q

What are the main tasks of NLP?

Answer

A

- Text Analysis
- Speech Recognition
- Natural Language Generation
- Machine Translation
- Sentiment Analysis
  ○ Determine the sentiment or emotion expressed in a piece of text
- Named Entity Recognition
  ○ Identify and classify key elements in text, such as names of people, organizations, locations, etc.
- Question Answering
- Text Summarization

Question 5

Q

Key Stages of Processing Language (Traditional)

Answer

A

Lexical analysis
Syntactic Analysis
Semantic Analysis
Pragmatic Analysis

Question 6

Q

What does Lexical Analysis involve?

Answer

A

Breaks down the text into its basic units of meaning, known as tokens
Tokens can be words, phrases or other meaningful elements
Identify and classify parts of speech (nouns, verbs, adjectives, etc.)
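
As a rough illustration of the tokenization step, the sketch below splits text into word and punctuation tokens with a regular expression. The pattern and the name `tokenize` are assumptions made for this example. Part-of-speech tagging is not covered here, since it normally relies on a trained tagger.

```python
import re

# Assumed pattern: a word (optionally with an apostrophe part such as "didn't"),
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Lowercase the text and return its tokens in order of appearance."""
    return TOKEN_PATTERN.findall(text.lower())


tokens = tokenize("The cat didn't sit on the mat.")
# tokens == ['the', 'cat', "didn't", 'sit', 'on', 'the', 'mat', '.']
```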

Question 7

Q

Syntactic Analysis

Answer

A

Checking for correct word order and hierarchical organization

Question 8

Q

Semantic Analysis

Answer

A

Understanding meaning of words and sentences.
Goes beyond the structure to interpret the actual meaning
Disambiguation; determining relationships between words and phrases

Question 9

Q

Pragmatic Analysis

Answer

A

Understanding language in context and interpreting the intended meaning based on situational factors
Identifying speech acts
Understanding deixis, implications, and inferred meanings behind the text

Question 10

Q

What is Bag-of-Words?

Answer

A

Technique in natural language processing for: Text Representation and Feature Extraction

Converts text into numerical feature vectors.

Question 11

Q

What does the BoW model focus on?

Answer

A

Focuses solely on the frequency of words in the text

Question 12

Q

How does the BoW model work?

Answer

A

Corpus Creation
- Collect a set of documents to analyze
Vocabulary Building
- Create a vocabulary of all unique words. Each unique word becomes a feature in the model
Tokenization
- Tokenize each document in the corpus into individual words/tokens
Vectorization
- Create a vector for each document in which every dimension corresponds to a word in the vocabulary.
- The value in each dimension is the frequency of the corresponding word in the document
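
The four steps can be sketched with the standard library alone. The two-sentence corpus, the whitespace tokenization, and the helper names `build_vocabulary` and `vectorize` are assumptions made for illustration.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Every unique token becomes one feature; sorting fixes the column order."""
    return sorted({token for doc in tokenized_docs for token in doc})


def vectorize(tokens, vocabulary):
    """Return the word counts of one document, one entry per vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Corpus creation (toy example)
corpus = ["the cat sat on the mat", "the dog sat"]
# Tokenization (assumption: lowercase and split on whitespace)
tokenized = [doc.lower().split() for doc in corpus]
# Vocabulary building
vocabulary = build_vocabulary(tokenized)
# Vectorization
vectors = [vectorize(doc, vocabulary) for doc in tokenized]

# vocabulary == ['cat', 'dog', 'mat', 'on', 'sat', 'the']
# vectors    == [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```

Word order is lost in the vectors: only the counts remain, which is why the model is called a "bag" of words.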

Question 13

Q

What is the BoW model used for?

Answer

A

- Text Classification
- Sentiment Analysis
- Information Retrieval
- Document Clustering

Question 14

Q

What are some algorithms for content analysis of documents?

Answer

A

Latent Semantic Analysis
Text Summarization
Named Entity Recognition (NER)
Sentiment Analysis

Question 15

Q

What is Latent Semantic Analysis (LSA)?

Answer

A

Algorithm used for analyzing the content of a document.

Analyzes the relationship between words and phrases in a document to identify the underlying concepts
- Used to identify related terms, find synonyms and group similar documents together

Question 16

Q

What are the applications of LSA?

Answer

A

Information retrieval
- Improve search accuracy by retrieving documents that are semantically related to the query, even if they do not contain the exact query terms
Document similarity
- Used to measure the similarity between documents by comparing their positions in the reduced concept space
Text Summarization
- Helps in identifying the main topics or concepts in a collection of documents, which can be used for summarization
Topic Modeling
- Reveal underlying topics in a set of documents based on the patterns of term usage

Question 17

Q

What are the steps of LSA?

Answer

A

Steps of LSA
1. Document-Term Matrix Creation
a. Rows - documents
b. Columns - Words
c. Each cell in the matrix represents the frequency of a term in a document
2. Term Weighting
a. The raw frequency counts in the document-term matrix are typically transformed using term weighting schemes to reflect the importance of terms in documents
3. Singular Value Decomposition (SVD)
4. Dimensionality Reduction
5. Concept space representation
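
The five steps can be sketched with NumPy. This is a toy illustration, not a production pipeline: the four-document corpus, the TF-IDF weighting formula `log(N / df)`, and the choice of `k = 2` concepts are all assumptions made for the example.

```python
import numpy as np

# Toy corpus (assumption): two documents about pets, two about finance.
docs = [
    "cat dog pet",
    "dog cat animal",
    "stock market trade",
    "market stock price",
]
tokenized = [d.split() for d in docs]
vocab = sorted({t for doc in tokenized for t in doc})

# 1. Document-term matrix: rows are documents, columns are words.
counts = np.array([[doc.count(t) for t in vocab] for doc in tokenized], dtype=float)

# 2. Term weighting (assumed scheme: raw count times log(N / document frequency)).
doc_freq = (counts > 0).sum(axis=0)
weighted = counts * np.log(len(docs) / doc_freq)

# 3. Singular value decomposition: weighted = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(weighted, full_matrices=False)

# 4. Dimensionality reduction: keep only the k largest singular values.
k = 2

# 5. Concept space representation: one k-dimensional vector per document.
doc_vectors = U[:, :k] * S[:k]


def cosine(a, b):
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


same_topic = cosine(doc_vectors[0], doc_vectors[1])   # both about pets
cross_topic = cosine(doc_vectors[0], doc_vectors[2])  # pets vs. finance
```

In the reduced space, the two pet documents should come out with a similarity close to 1 even though each contains a word the other lacks, while the pet and finance documents should come out close to 0.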

Question 18

Q

What is Named Entity Recognition (NER)?

Answer

A

Algorithm used to analyze the content of documents.

Locate and classify named entities mentioned in unstructured text into pre-defined categories

E.g., persons, organizations, monetary values, etc.
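
A very small rule-based sketch shows the idea of locating entities and assigning them to pre-defined categories. It uses a hand-written lookup table and a regular expression, both hypothetical. Practical NER systems usually rely on trained statistical or neural models rather than fixed lists.

```python
import re

# Hypothetical lookup table (gazetteer) of known names and their categories.
GAZETTEER = {
    "Ada Lovelace": "PERSON",
    "IBM": "ORGANIZATION",
    "Stockholm": "LOCATION",
}
# Assumed pattern for dollar amounts such as $1,500 or $12.50.
MONEY_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")


def find_entities(text):
    """Return (entity text, category) pairs in order of appearance."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.group(), label))
    for match in MONEY_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "MONEY"))
    return [(entity, label) for _, entity, label in sorted(found)]


entities = find_entities("Ada Lovelace joined IBM in Stockholm for $1,500.")
# entities == [('Ada Lovelace', 'PERSON'), ('IBM', 'ORGANIZATION'),
#              ('Stockholm', 'LOCATION'), ('$1,500', 'MONEY')]
```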

Question 19

Q

What is Sentiment Analysis?

Answer

A

Algorithm used for analyzing the content of a document.

Also known as Opinion Mining or Emotion AI

Uses NLP and text analysis to systematically identify, extract, quantify, and study affective states and subjective information

Goal is to answer the question: “What do people feel about a certain topic?”
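
One simple way to approach this is a word-list (lexicon) scorer, sketched below. The word lists, the function names, and the three-way labels are hypothetical. Real systems typically use much larger lexicons or trained classifiers, and handle negation and context.

```python
import re

# Hypothetical mini-lexicons; real sentiment lexicons contain thousands of entries.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}


def sentiment_score(text):
    """Number of positive words minus number of negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def classify(text):
    """Map the score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = classify("I love this great phone")  # 'positive'
```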

Question 20

Q

What are some examples of text analysis software and their functions?

Answer

A

IBM Watson
- Question answering computer system capable of answering questions posed in natural language
Gavagai Explorer
- Extract meaning, follow topic threads, find relationships, and target specific audiences
Lexalytics
- A text analysis platform that can help extract meaning, follow topic threads, find relationships, and target specific audiences.
Google Cloud Natural Language
- A cloud service for text analysis that can help extract meaning, follow topic threads, find relationships, and target specific audiences.

Question 21

Q

What are some algorithms that measure relationships in a web of documents (web pages)?

Answer

A

PageRank
HITS
SALSA
SimRank
TrustRank

Question 22

Q

What is Bayes' theorem?

Answer

A

A theorem in probability theory that is used to determine conditional probabilities
- Used to combine collected statistical data with other sources of information
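
As a worked example, Bayes' theorem states P(H | E) = P(E | H) · P(H) / P(E). The sketch below applies it to a made-up spam-filter scenario. The function name, parameter names, and all numbers are assumptions chosen for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) = P(E | H) * P(H) / P(E), for a yes/no hypothesis H.

    prior               -- P(H), belief before seeing the evidence
    likelihood          -- P(E | H), chance of the evidence if H is true
    false_positive_rate -- P(E | not H), chance of the evidence if H is false
    """
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)  # P(E)
    return likelihood * prior / evidence


# Made-up numbers: 20% of mail is spam, the word "offer" appears in 60% of spam
# and in 5% of legitimate mail. How likely is a mail containing "offer" to be spam?
p_spam_given_offer = posterior(prior=0.2, likelihood=0.6, false_positive_rate=0.05)
# 0.12 / (0.12 + 0.04) = 0.75
```

The prior plays the role of the "other source of information", and the observed word frequencies are the collected statistical data being combined with it.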

Question 23

Q

Which algorithms are used for feature extraction?

Answer

A

Bag-of-Words (BoW)

(23 cards)