Linguistic and Representational Concepts Flashcards

1
Q

Difference between a root and a lemma

A

In natural language processing, the root (or stem) of a word is what remains after inflections and affixes are stripped away. For example, the stem of “running” is “run” and the stem of “cats” is “cat”. Because stemming simply chops affixes off, the result need not be a valid word: a stemmer may reduce “studies” to “studi”.

The lemma of a word, by contrast, is its canonical dictionary form, the form used for dictionary lookup and inflectional analysis. Producing it requires knowledge of the vocabulary and the word’s part of speech, so the lemma can differ from the stem: the lemma of “studies” is “study”, and the lemma of “ran” is “run”, even though “ran” cannot be reduced to “run” by stripping an affix.

Overall, the key difference between a root and a lemma is that the root/stem is obtained by stripping inflections and affixes (and may not be a real word), while the lemma is the dictionary form of the word used for lookup and inflectional analysis.
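
As a rough illustration of the difference, here is a minimal NLTK sketch (assuming the WordNet data has been downloaded via nltk.download('wordnet')) comparing a stemmer’s output with a lemmatizer’s:

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()  # requires nltk.download('wordnet')

print(stemmer.stem("studies"))                   # 'studi' -- a stem need not be a real word
print(lemmatizer.lemmatize("studies"))           # 'study' -- the dictionary form
print(stemmer.stem("running"))                   # 'run'
print(lemmatizer.lemmatize("running", pos="v"))  # 'run'   -- lemmatization uses the part of speech
```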

2
Q

Difference between inflectional and derivational morphology

A

Inflectional morphology and derivational morphology are two types of word formation processes in natural language. Inflectional morphology involves the addition of inflections or affixes to a word to indicate grammatical features such as tense, person, gender, and number. Derivational morphology, on the other hand, involves the creation of new words by adding affixes or combining words in different ways.

Inflectional morphology marks grammatical features required by the grammar, such as tense, number, or agreement. For example, the suffix -s is added to the base form of a verb to mark the third-person singular present tense, as in the word “runs”. Inflectional morphology is regular and predictable, it does not change the word’s part of speech, and it does not change the basic meaning of the word.

Derivational morphology, on the other hand, is used to create new words with new meanings, and often with a new part of speech. For example, the prefix un- can be added to a word to produce the opposite meaning, as in “unhappy”, and the suffix -ness turns an adjective into a noun, as in “happiness”. Derivational morphology is less regular and predictable than inflectional morphology, and it can change the basic meaning of the word.

3
Q

What is a lexical compound? In what context would it come up?

A

A lexical compound is a type of word that is formed by combining two or more words or word parts. Lexical compounds are common in many languages, and they can be found in a variety of contexts.

There are several types of lexical compounds, depending on the structure and meaning of the words that are combined. Some common types of lexical compounds include the following:

Compound nouns: These are words that consist of two or more nouns, such as “bookcase” or “software”.

Compound verbs: These are words that combine two or more words into a single verb, such as “to babysit” or “to proofread”.

Compound adjectives: These are words that consist of two or more adjectives, such as “blue-green” or “well-intentioned”.

Compound adverbs: These are words formed by combining two words into a single adverb, such as “sometimes” or “everywhere”.

Lexical compounds can be found in many different contexts, including everyday language, technical language, and literature. They are often used to create new words that are more precise or expressive than the individual words that make up the compound.

Overall, lexical compounds are an important part of the vocabulary of many languages, and they can be found in a variety of contexts.

4
Q

What is Part-of-Speech and where is it used?

A

Part-of-speech is a grammatical category that is assigned to each word in a sentence, based on its syntactic function. In natural language processing, part-of-speech tagging is the process of automatically assigning a part-of-speech to each word in a sentence, using a set of grammar rules or a probabilistic model.

There are several common parts of speech, including nouns, verbs, adjectives, adverbs, and pronouns. Each part of speech has a specific role in the sentence, and the correct assignment of part-of-speech to each word is important for understanding the meaning of the sentence.

Part-of-speech tagging is used in many natural language processing tasks, such as parsing and sentiment analysis. In these tasks, the part-of-speech tags provide important information about the syntactic structure of the sentence and the meaning of the words, which can be used to make predictions or generate output.

Overall, part-of-speech tagging is an important step in natural language processing, as it allows us to analyze the syntactic structure of a sentence and to extract the meaning of the words in it.
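
For example, a quick sketch with NLTK’s off-the-shelf tagger (assuming the tokenizer and tagger data have been downloaded):

```python
import nltk

# Assumes the required data is available, e.g.:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
tokens = nltk.word_tokenize("The cat sat on the mat")
print(nltk.pos_tag(tokens))
# Illustrative output (Penn Treebank tags):
# [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]
```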

5
Q

Difference between Open-class Words and Closed-class Words

A

Open-class words and closed-class words are two categories of words in a language. Open-class categories, such as nouns, verbs, adjectives, and adverbs, readily accept new members as the language evolves. Closed-class categories, such as prepositions, conjunctions, and articles, have an essentially fixed membership to which new words are very rarely added.

Open-class words are typically the most common and productive words in a language, and they are used to convey the meaning of a sentence. They are often the focus of language learning and language teaching, as they are the words that are most frequently used and that have the most variation in meaning.

Closed-class words, on the other hand, are typically less common and less productive than open-class words. They are used to indicate the syntactic structure of a sentence, rather than its meaning. They are often learned implicitly, as they are not the focus of language learning and teaching.

Overall, the key difference between open-class words and closed-class words is that open-class categories readily gain new members, while closed-class categories are essentially fixed in the vocabulary of a language.

6
Q

When is a grammar context-free?

A

In formal language theory, a context-free grammar is a type of formal grammar that consists of a set of productions, or rules, for generating strings of symbols. A grammar is context-free if the left-hand side of every production is a single non-terminal symbol, while the right-hand side may be any sequence of terminal and non-terminal symbols.

Such a grammar is called context-free because a non-terminal can be rewritten by its productions regardless of the symbols that surround it; whether a rule applies does not depend on the context in which the non-terminal appears.

Context-free grammars are used in natural language processing to describe the syntax of a language, and they are often used to generate or parse sentences in that language. They are also used in computer science and other fields to model the structure of systems and processes.

Overall, a grammar is context-free if every production rewrites a single non-terminal symbol, independently of the context in which that symbol appears.
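
As a small illustration, here is a toy context-free grammar in NLTK, where every rule has exactly one non-terminal on its left-hand side:

```python
import nltk

# A toy grammar (not a full English grammar): each rule rewrites a single non-terminal.
grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> Det N
    VP -> V PP
    PP -> P NP
    Det -> 'the'
    N  -> 'cat' | 'mat'
    V  -> 'sat'
    P  -> 'on'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cat sat on the mat".split()):
    print(tree)
# (S (NP (Det the) (N cat)) (VP (V sat) (PP (P on) (NP (Det the) (N mat)))))
```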

7
Q

What are terminal and non-terminal (phrasal) categories?

A

In formal language theory, terminal and non-terminal symbols are the two kinds of symbols a grammar uses to generate strings. Terminal symbols are the basic building blocks: in a grammar written over words they are the words or tokens themselves (e.g. “cat”, “sat”), while in a grammar written over part-of-speech tags the lexical categories play the role of terminals. Non-terminal symbols represent categories that can be rewritten further, up to phrases and larger units of the sentence.

Lexical (pre-terminal) categories such as noun, verb, adjective, and adverb label individual words, the smallest units of the sentence; they rewrite directly to the words themselves and cannot be broken down further within the grammar.

Phrasal categories, on the other hand, are non-terminal symbols that represent phrases or larger units of structure. Examples in English include noun phrases, verb phrases, and prepositional phrases. A phrasal category is composed of one or more lexical or phrasal categories, and it can be broken down into smaller units.

Overall, terminal symbols correspond to the actual words or tokens of a sentence, lexical categories label individual words, and phrasal (non-terminal) categories label phrases or larger units of structure.

8
Q

What are bounded and unbounded dependencies?

A

In syntax and natural language processing, bounded and unbounded dependencies are two kinds of dependencies between words in a sentence. A bounded dependency holds within a local domain, typically a single clause, so the related words can only be a limited distance apart. An unbounded (long-distance) dependency can hold across an arbitrary number of intervening words and clause boundaries.

Bounded dependencies are relatively easy to model and analyze, because the words involved are confined to a local domain and can be easily identified. For example, in the sentence “The cat sat on the mat”, the subject “cat” and the verb “sat” stand in a bounded dependency: subject-verb relations (and agreement) are established within the clause.

Unbounded dependencies are more challenging to model and analyze, because the related words may be separated by arbitrarily much material. A classic case is wh-movement: in “Which mat did you say the cat sat on?”, the fronted phrase “which mat” is understood as the object of the preposition “on” even though it appears at the start of the sentence, and further clauses can be stacked in between (“Which mat did you claim that she said the cat sat on?”).

Overall, bounded dependencies hold within a local domain and are relatively easy to model, while unbounded dependencies can span arbitrary distances and are more challenging.

9
Q

What is dependency syntax?

A

Dependency syntax is a type of syntactic analysis that focuses on the dependencies between words in a sentence, rather than on the hierarchical structure of the sentence. In dependency syntax, each word in the sentence is treated as a node, and the dependencies between the words are represented as directed edges between the nodes.

In dependency syntax, the head of a phrase is the word that the other words in the phrase depend on. For example, in the noun phrase “the cat”, the noun “cat” is the head of the phrase, as it determines the meaning of the phrase and the other words in the phrase (i.e. the article “the”) depend on it.

Dependency syntax is used in natural language processing to model the relationships between words in a sentence, and to extract information about the meaning and structure of the sentence. It is often used in combination with other types of syntactic analysis, such as phrase structure grammar and constituent structure, to provide a more complete picture of the sentence.

Overall, dependency syntax is a type of syntactic analysis that focuses on the dependencies between words in a sentence, and it is used to model the relationships between words and to extract information about the meaning and structure of the sentence.
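
As a brief illustration using spaCy (assuming the small English model en_core_web_sm has been installed), each token’s head and dependency label can be read directly off the parse:

```python
import spacy

# Assumes the model has been installed with: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the mat")
for token in doc:
    print(f"{token.text:<5} --{token.dep_}--> {token.head.text}")
# Illustrative output: "cat" is the nsubj (nominal subject) of "sat",
# and "sat" is the root of the sentence (its head is itself).
```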

10
Q

What are head words in Syntax?

A

In syntax, a head word is the central word in a phrase that determines the grammatical properties of the phrase. For example, in the noun phrase “the big red ball,” “ball” is the head word because it makes the phrase a noun phrase, and the other words depend on it: the determiner “the” and the adjectives “big” and “red” modify the noun. In a verb phrase, the head word is typically the main verb, and in an adjective phrase, the head word is typically the adjective. The concept of a head word is important in syntactic analysis because it allows us to understand the structure and function of phrases in a sentence.

11
Q

What is a synonym?

A

A synonym is a word or phrase that has the same or nearly the same meaning as another word or phrase. For example, “big” and “large” are synonyms.

12
Q

What is a hypernym?

A

A hypernym is the opposite of a hyponym; it is a more general term that encompasses a group of more specific terms. For example, “mammal” is a hypernym of the more specific term “dog.”

13
Q

What is a hyponym?

A

A hyponym is a word or phrase that is more specific than a more general term. For example, “dog” is a hyponym of the more general term “mammal.”
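
Both relations can be looked up in a lexical database such as WordNet; here is a minimal sketch using NLTK’s WordNet interface (assuming the WordNet data has been downloaded):

```python
from nltk.corpus import wordnet as wn

# Assumes the WordNet corpus is available: nltk.download('wordnet')
dog = wn.synset('dog.n.01')
print([s.name() for s in dog.hypernyms()])     # more general terms, e.g. 'canine.n.02'
print([s.name() for s in dog.hyponyms()][:5])  # more specific terms, e.g. 'puppy.n.01'
```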

14
Q

What is the Distributional Hypothesis?

A

The Distributional Hypothesis is a linguistic principle that states that words that are used in similar contexts tend to have similar meanings. This principle is based on the idea that the meaning of a word can be inferred from the words that surround it and the contexts in which it is used. For example, if we see the word “big” often used in sentences with the word “large,” we can infer that the two words have similar meanings.

The Distributional Hypothesis has been a central principle in the field of natural language processing (NLP), where it is used to develop algorithms for tasks such as word sense disambiguation and machine translation. It is also used in the development of word embedding models, which are used to represent words in a continuous vector space in a way that captures the semantic relationships between words.
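
As a toy illustration of the hypothesis (a self-contained sketch over a made-up three-sentence corpus, not a real embedding model), words can be represented by counts of the words they co-occur with, and then compared with cosine similarity:

```python
import numpy as np

corpus = [
    "the big dog ran fast",
    "the large dog ran fast",
    "the small cat slept quietly",
]
vocab = sorted({w for sent in corpus for w in sent.split()})
index = {w: i for i, w in enumerate(vocab)}

# Count how often each pair of distinct words occurs in the same sentence.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    words = sent.split()
    for w in words:
        for c in words:
            if w != c:
                counts[index[w], index[c]] += 1

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "big" and "large" occur in near-identical contexts, so their vectors are very similar.
print(cosine(counts[index["big"]], counts[index["large"]]))  # close to 1.0
print(cosine(counts[index["big"]], counts[index["cat"]]))    # much lower
```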

15
Q

Examples of Open-class and Closed-class words

A

Here are some examples of open-class and closed-class words in English:

Open-class words: book, run, happy, quickly

Closed-class words: of, and, the, with

16
Q

How can we model and represent unbounded dependencies?

A
17
Q

What is reference, coreference, and anaphora?

A

In natural language, reference is the act of using a word or phrase to refer to something or someone. For example, in the sentence “The cat sat on the mat,” the word “cat” is a reference to a specific feline, and the word “mat” is a reference to a specific object.

Coreference is the relationship between two or more words or phrases in a text that refer to the same thing or person. For example, in “The cat sat on the mat. She was very happy,” the pronoun “she” is coreferential with the noun “cat” because both refer to the same feline.

Anaphora is a type of coreference in which a word or phrase (the anaphor) refers back to something mentioned earlier in the text (its antecedent). For example, in “The cat sat on the mat. She was very happy,” the pronoun “she” is an anaphor that refers back to the noun “cat” in the preceding sentence.

Reference, coreference, and anaphora are important concepts in natural language processing because they are used to understand the relationships between words and phrases in a text, and they can be challenging to model due to the complexity of human language.

18
Q

What are lambda expressions in formal linguistics?

A

In formal linguistics, lambda expressions are used to represent meanings compositionally: they provide the basic building blocks of a semantic representation and the rules for combining them into the meanings of larger expressions.

The notation comes from the lambda calculus, a formal system of functions and function application that is also used to describe the syntax and semantics of programming languages. A lambda expression is an abstraction: a function that takes one or more arguments and returns a result. For example, the lambda expression λx.x represents a function that takes an argument x and returns x. It can be applied to an argument, as in (λx.x) y, which reduces to y.

In formal semantics, this machinery lets linguists assign each word a function-like meaning and formalize how those meanings combine to form the meaning of a sentence; for instance, the meaning of “sleeps” can be written as λx.sleep(x), which applied to the meaning of “John” yields sleep(John).
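
A rough way to get a feel for this, using ordinary Python lambdas rather than a linguistic formalism (the predicate names below are invented for illustration):

```python
# A toy sketch of lambda-style abstraction and application.
identity = lambda x: x          # corresponds to λx.x
print(identity("y"))            # applying (λx.x) to y gives y

# In compositional semantics, word meanings can be functions awaiting arguments:
sleeps = lambda x: f"sleep({x})"               # λx.sleep(x)
loves = lambda y: lambda x: f"love({x},{y})"   # λy.λx.love(x,y)

print(sleeps("john"))            # sleep(john)        -- "John sleeps"
print(loves("mary")("john"))     # love(john,mary)    -- "John loves Mary"
```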

19
Q

Constraints and preferences on coreference resolution

A

Coreference resolution is the task of identifying and interpreting the relationships between words and phrases in a text that refer to the same thing or person. In order to perform coreference resolution, a model must consider a number of constraints and preferences that are specific to the language and the context in which the text is written.

Some of the constraints and preferences that can affect coreference resolution include the following:

Syntactic constraints and preferences: The syntactic structure of a sentence provides clues about possible antecedents. For example, a pronoun must agree with its antecedent in number and gender, and it tends to prefer an antecedent that appears earlier and fills a parallel syntactic role (e.g. subject or object).

Semantic constraints: The meaning of a word or phrase can provide clues about its potential antecedents (the words or phrases that it might refer to). For example, a pronoun is more likely to be coreferential with a noun that has a similar meaning.

World knowledge: In some cases, knowledge about the real world can help to disambiguate potential antecedents. For example, if a text mentions “the president” and “he,” it is likely that “he” refers to the president, rather than to someone else mentioned in the text.

Discourse context: The surrounding discourse also provides clues about the relationships between words and phrases. For example, entities that were mentioned recently or that are salient in the discourse (such as the subject of the previous sentence) are preferred antecedents for a pronoun.

In order to perform coreference resolution effectively, a model must consider these constraints and preferences, as well as others that are specific to the language and the context in which the text is written.
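
As a purely illustrative sketch of how a hard constraint (agreement) and a preference (recency) might combine, here is a toy resolver over hand-built mention records (the feature values are invented, not produced by any real coreference system):

```python
# Toy mention records: each has invented gender/number features and a position in the text.
mentions = [
    {"text": "the cat",  "gender": "fem",  "number": "sg", "position": 0},
    {"text": "the mats", "gender": "neut", "number": "pl", "position": 1},
]
pronoun = {"text": "she", "gender": "fem", "number": "sg", "position": 2}

def resolve(pronoun, mentions):
    # Hard constraint: the antecedent must precede the pronoun and agree in gender and number.
    compatible = [m for m in mentions
                  if m["gender"] == pronoun["gender"]
                  and m["number"] == pronoun["number"]
                  and m["position"] < pronoun["position"]]
    # Preference: among compatible candidates, prefer the most recently mentioned one.
    return max(compatible, key=lambda m: m["position"], default=None)

print(resolve(pronoun, mentions))  # picks "the cat"
```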

20
Q

What is a Winograd schema and what are they used for?

A

A Winograd schema is a type of sentence pair designed to test the ability of a natural language processing (NLP) system to understand and interpret language. The two sentences are identical except for one or two words, and each contains an ambiguous pronoun whose correct referent changes depending on that word; the goal of a Winograd schema challenge is to determine what the pronoun refers to in each sentence, which typically requires commonsense knowledge rather than surface cues.

For example, consider the following Winograd schema:

1) The trophy would not fit in the brown suitcase because it was too big.
2) The trophy would not fit in the brown suitcase because it was too small.

In this example, both sentences are well-formed, but the pronoun resolves differently: in sentence 1, “it” refers to the trophy (the trophy is too big to fit), while in sentence 2, “it” refers to the suitcase (the suitcase is too small to hold the trophy). Resolving the pronoun correctly requires commonsense reasoning about sizes and containment, not just grammar.

Winograd schemas are used to test the ability of NLP systems to understand and interpret language in a way that is similar to how humans do. This is important because the ability to understand language is essential for many NLP tasks, such as machine translation, question answering, and summarization. By testing NLP systems on Winograd schemas, researchers can evaluate their performance on these tasks and identify areas for improvement.

21
Q

What is expressivity in a formal grammar?

A

In formal language theory, the term expressivity refers to the ability of a formal grammar to describe and generate the strings of a language. A grammar is said to be more expressive if it can describe a broader range of strings, while a grammar with less expressivity can only describe a more limited set of strings.

Expressivity is determined by the form of the rules the grammar is allowed to use, not simply by the number of rules or symbols. In the Chomsky hierarchy, for example, regular grammars are less expressive than context-free grammars, which are in turn less expressive than context-sensitive and unrestricted grammars.

In general, the expressivity of a grammar can have a big impact on its usefulness and applicability in a given situation. A more expressive grammar may be better able to capture the nuances and subtleties of a language, while a less expressive grammar may be better suited to simpler tasks and is typically cheaper to parse.

22
Q

What is verifiability of a meaning representation?

A

Verifiability means that the meaning representation of a sentence can be compared against a model of the world (or a knowledge base) to determine whether the sentence is true.

23
Q

Soundness and Completeness of formal systems

A

In formal logic, soundness and completeness are two important properties that can be used to evaluate the correctness of a formal system, such as a logical inference system or a parser for a formal grammar.

A formal system is said to be sound if it only produces valid conclusions. In other words, a sound system never produces a false statement or a false proof of a statement.

On the other hand, a formal system is said to be complete if it can produce a proof for any statement that is actually true. In other words, a complete system is able to prove all true statements within its domain.

Together, soundness and completeness form a powerful combination. A formal system that is both sound and complete is considered to be a “correct” system, as it is able to produce only valid conclusions and can prove all true statements within its domain.

24
Q

Pros and Cons of Distributional semantics

A

Pro: A useful way to represent the meanings of words
Pro: Can deal with notions of similarity
Con: Not clear how to deal with compositionality
Con: Unclear how to do inference - though inference is mainly a concern of sentential rather than lexical semantics.

25
Q

A Meaning representation should be… (5)

A

Verifiable - can be checked against a model of the world or a knowledge base.

Unambiguous - produces one output for each sense of a sentence; a sentence with multiple senses should have a different representation for each sense.

Has a canonical form - if two different sentences have the same meaning, they should have the same meaning representation.

Supports inference - we should be able to verify sentences both directly and by drawing conclusions from the meaning representation together with facts in a knowledge base.

Expressive - ideally able to express the meaning of any natural language sentence; in practice we may use smaller meaning representations that cover much of what we want.

26
Q

What does it mean for a meaning representation to have a canonical form?

A

A meaning representation has a canonical form if different sentences with the same meaning are mapped to the same representation. For example, “The cat chased the mouse” and “The mouse was chased by the cat” should receive the same meaning representation.