Natural Language Processing (Powerpoint) Flashcards

1
Q

What is PageRank?

A

An algorithm that ranks web pages based on their importance and relevance.

Importance is measured by the number and quality of links pointing to the page.

An underlying assumption is that more important web pages are linked to by more other web pages.

2
Q

How does the PageRank algorithm work?

A

The web is represented as a directed graph.

  • Each web page is initially assigned an equal PageRank value
  • The PageRank of each page is then updated iteratively, based on the PageRank of the pages that link to it
  • A damping factor helps ensure that the PageRank distribution converges
    • It is based on the probability that a user keeps clicking on links
  • Convergence means the algorithm keeps updating the PageRank values until the changes between iterations fall below a given threshold, at which point the ranking has stabilized
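
The iterative update can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `pagerank`, the damping value 0.85, the tolerance, the handling of pages without outgoing links, and the toy graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, tol=1e-6, max_iter=100):
    """Power-iteration sketch. `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # every page starts with an equal value

    for _ in range(max_iter):
        # Assumption: rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes its rank on in equal shares to the pages it links to.
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                new_rank[q] += damping * rank[p] / len(targets)

        # Stop once the total change between iterations falls below the threshold.
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-page web: C is linked to by A, B and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
```

In this toy graph, page C collects links from three pages and ends up with the highest value, while D, which nothing links to, keeps only the baseline share.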
3
Q

What is Natural Language Processing?

A

Understanding and generation of natural (human) language.

A subfield of AI that focuses on the interaction between computers and humans through natural language. The goal of NLP is to enable computers to understand, interpret, and respond to human language in a way that is both meaningful and useful.

4
Q

What are the main tasks of NLP?

A
  • Text Analysis
  • Speech Recognition
  • Natural Language Generation
  • Machine Translation
  • Sentiment Analysis
    ○ Determine the sentiment or emotion expressed in a piece of text
  • Named Entity Recognition
    ○ Identify and classify key elements in text, such as names of people, organizations, and locations
  • Question Answering
  • Text Summarization
5
Q

Key Stages of Processing Language (Traditional)

A
  1. Lexical analysis
  2. Syntactic Analysis
  3. Semantic Analysis
  4. Pragmatic Analysis
6
Q

What does Lexical Analysis involve?

A
  • Breaks down the text into its basic units of meaning, known as tokens
  • Tokens can be words, phrases or other meaningful elements
  • Identify and classify parts of speech (nouns, verbs, adjectives, etc.)
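
As a rough illustration of the tokenization step, the sketch below splits text into lowercase word tokens with a regular expression. The function name `tokenize` and the pattern are assumptions chosen for the example; part-of-speech tagging is not shown, since it normally relies on a trained tagger.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens (keeps contractions like "didn't")."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


tokens = tokenize("The cat didn't sit on the mat.")
```

Punctuation is dropped and each remaining word becomes one token, which is the unit later stages work with.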
7
Q

Syntactic Analysis

A

Checking for correct word order and hierarchical organization

8
Q

Semantic Analysis

A
  • Understanding meaning of words and sentences.
  • Goes beyond the structure to interpret the actual meaning
  • Disambiguation, determine relationships between words and phrases
9
Q

Pragmatic Analysis

A
  • Understanding language in context and interpreting the intended meaning based on situational factors
  • Identifying speech acts
  • Understanding deixis, implications, and inferred meanings behind the text
10
Q

What is Bag-of-Words?

A

A technique in natural language processing for text representation and feature extraction.

Converts text into numerical feature vectors.

11
Q

What does the BoW model focus on?

A

Focuses solely on the frequency of words in the text, disregarding grammar and word order.

12
Q

How does the BoW model work?

A
  1. Corpus Creation
    - Collect a set of documents to analyze
  2. Vocabulary Building
    - Create a vocabulary of all unique words. Each unique word becomes a feature in the model
  3. Tokenization
    - Tokenize each document in the corpus into individual words/tokens
  4. Vectorization
    - Create a vector for each document in which each dimension corresponds to a word in the vocabulary
    - The value in each dimension is the frequency of the corresponding word in the document
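
The four steps can be sketched in plain Python as follows. The two-sentence corpus and the helper names `tokenize` and `bag_of_words` are made up for illustration; in this sketch the documents are tokenized first so that the vocabulary can be built from the tokens.

```python
import re
from collections import Counter


def tokenize(document):
    """Split a document into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", document.lower())


def bag_of_words(corpus):
    """Return the vocabulary and one word-count vector per document."""
    tokenized = [tokenize(doc) for doc in corpus]
    # Every unique word in the corpus becomes one feature (one dimension).
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # The value in each dimension is how often that word occurs in the document.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Hypothetical toy corpus.
corpus = ["the cat sat on the mat", "the dog sat"]
vocabulary, vectors = bag_of_words(corpus)
```

Here the vocabulary is `['cat', 'dog', 'mat', 'on', 'sat', 'the']`, and the first document becomes `[1, 0, 1, 1, 1, 2]`. Word order is lost; only the counts remain.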
13
Q

What is the BoW model used for?

A
  • Text Classification
  • Sentiment Analysis
  • Information Retrieval
  • Document Clustering
14
Q

What are some algorithms for content analysis of documents?

A
  1. Latent Semantic Analysis
  2. Text Summarization
  3. Named Entity Recognition (NER)
  4. Sentiment Analysis
15
Q

What is Latent Semantic Analysis (LSA)?

A

Algorithm used for analyzing the content of a document.

Analyzes the relationship between words and phrases in a document to identify the underlying concepts
- Used to identify related terms, find synonyms and group similar documents together

16
Q

What are the applications of LSA?

A
  1. Information retrieval
    - Improve search accuracy by retrieving documents that are semantically related to the query, even if they do not contain the exact query terms
  2. Document similarity
    - Used to measure the similarity between documents by comparing their positions in the reduced concept space
  3. Text Summarization
    - Helps in identifying the main topics or concepts in a collection of documents, which can be used for summarization
  4. Topic Modeling
    - Reveal underlying topics in a set of documents based on the patterns of term usage
17
Q

What are the steps of LSA?

A

  1. Document-Term Matrix Creation
    - Rows: documents
    - Columns: words (terms)
    - Each cell in the matrix represents the frequency of a term in a document
  2. Term Weighting
    - The raw frequency counts in the document-term matrix are typically transformed using term weighting schemes to reflect the importance of terms in documents
  3. Singular Value Decomposition (SVD)
  4. Dimensionality Reduction
  5. Concept Space Representation
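
In equations, the steps above could be written roughly as follows. The card does not name a weighting scheme, so TF-IDF is used here as an assumed example, and the symbols (X, N, k and so on) are notation introduced for this sketch.

```latex
% X is the m x n document-term matrix (m documents as rows, n terms as columns).

% Step 2 (assumed scheme: TF-IDF). tf_{d,t} is the count of term t in document d,
% N is the number of documents, df_t is the number of documents containing t.
w_{d,t} = \mathrm{tf}_{d,t} \cdot \log\frac{N}{\mathrm{df}_t}

% Step 3: singular value decomposition of the weighted matrix.
X = U \Sigma V^{\top}

% Step 4: keep only the k largest singular values (rank-k approximation).
X_k = U_k \Sigma_k V_k^{\top}

% Step 5: each row of D_k places one document in the k-dimensional concept space.
D_k = U_k \Sigma_k
```

Documents whose rows in D_k lie close together (for example by cosine similarity) are treated as semantically related, even when they share few exact terms.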

18
Q

What is Named Entity Recognition (NER)?

A

Algorithm used to analyze the content of documents.

Locate and classify named entities mentioned in unstructured text into pre-defined categories

E.g. persons, organizations, monetary values, etc.

19
Q

What is Sentiment Analysis?

A

Algorithm used for analyzing the content of a document.

Also known as Opinion Mining or Emotion AI

Uses NLP and text analysis to systematically identify, extract, quantify, and study affective states and subjective information

Goal is to answer the question: “What do people feel about a certain topic?”
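
One simple way to approach this is a lexicon-based score, sketched below. This is only one of several possible techniques and is not taken from the card: the tiny `LEXICON`, its weights, and the function names are hypothetical, and real systems use much larger lexicons or trained models.

```python
import re

# Hypothetical toy lexicon: positive words get positive weights, negative words negative.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}


def sentiment_score(text):
    """Sum the lexicon weights of all words in the text (unknown words count as 0)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)


def sentiment_label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = sentiment_label("I love this great phone")
```

A word-counting approach like this ignores negation and sarcasm, which is one reason practical systems go beyond plain lexicons.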

20
Q

Examples of text analytics software and their functions?

A
  1. IBM Watson
    - A question-answering computer system capable of answering questions posed in natural language
  2. Gavagai Explorer
    - Extract opinions, follow topic trails, find relationships, and target specific audiences
  3. Lexalytics
    - A text analytics platform that can help extract opinions, follow topic trails, find relationships, and target specific audiences
  4. Google Cloud Natural Language
    - A cloud service for text analysis that can help extract opinions, follow topic trails, find relationships, and target specific audiences
21
Q

What are some algorithms that measure relationships in a web of documents (web pages)?

A
  1. PageRank
  2. HITS
  3. SALSA
  4. SimRank
  5. TrustRank
22
Q

What is Bayes' theorem?

A
  • A theorem in probability theory used to determine conditional probabilities
    • Used to combine collected statistical data with other sources of information
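
For reference, the theorem in its standard form, for two events A and B (the event names are just notation for this sketch):

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```

Here P(A) is the prior (what is believed before seeing the data), P(B | A) is the likelihood of the observed data given A, and P(A | B) is the updated (posterior) probability.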
23
Q

Which algorithms are used for feature extraction?

A

Bag-of-Words (BoW)