VIII: Natural Language Processing Flashcards
What is natural language processing?
Natural language processing (NLP) is about interacting and communicating with computers in written and spoken form. The ultimate goal of NLP is to build software that analyzes, understands, and generates human language naturally. Traditionally, machine learning has been used for NLP tasks.
What steps are involved in NLP?
Analysis and interpretation: deriving meaning from an input in order to provide useful information as an output. This processing includes parsing sentences to determine their structure and meaning, extracting syntax and semantics, and generating an output that is useful for an end user.
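As a minimal sketch of the very first analysis step, the snippet below splits raw text into sentences and words before any deeper parsing. The choice of NLTK here is an assumption for illustration; any tokenizer would do.

```python
# Minimal sketch of the first analysis step: splitting raw text into
# sentences and words before deeper parsing. Assumes NLTK is installed
# and its "punkt" tokenizer data has been downloaded.
import nltk

nltk.download("punkt", quiet=True)  # tokenizer models (one-time download)

text = "The dog chased the cat. The cat ran away."
sentences = nltk.sent_tokenize(text)                  # split into sentences
tokens = [nltk.word_tokenize(s) for s in sentences]  # split into words

print(sentences)  # ['The dog chased the cat.', 'The cat ran away.']
print(tokens)     # [['The', 'dog', 'chased', 'the', 'cat', '.'], ...]
```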
When did work on NLP start?
In 1950, Alan Turing proposed the imitation game as a way of assessing intelligence in a computer, which inspired the work that led to NLP. Other famous historic moments are the creation of ELIZA, the early chatbot, and Marvin Minsky's frames, scripted schemata that represent stereotyped situations as sequential information.
What were some of the early attempts at NLP?
One early attempt to handle natural language was the finite state machine (FSM), which accepted input and carried out simple commands, such as opening and closing doors. An FSM can be used to recognize and generate strings: the nodes of a graph are the states of the recognizer or generator, and the edges are transitions between them.
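A minimal sketch of such a machine in Python, with hypothetical states and commands mirroring the door example:

```python
# A minimal finite state machine for the door example: states are the
# nodes of a graph, and input commands move the machine between them.
# The states and commands here are hypothetical illustrations.
TRANSITIONS = {
    ("closed", "open"):  "opened",
    ("opened", "close"): "closed",
}

def run_fsm(commands, state="closed"):
    """Consume a sequence of commands and return the final state.

    Unknown (state, command) pairs leave the state unchanged,
    i.e. that input is not accepted from that state."""
    for command in commands:
        state = TRANSITIONS.get((state, command), state)
    return state

print(run_fsm(["open", "close", "open"]))  # opened
```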
What is parsing and generation?
Parsing and generation are essential when dealing with conversations between humans and computer systems. Parsing means breaking sentences down and identifying their structure, while generation means assembling constituents into meaningful output such as text or audio.
What is a parser?
A parser is a program that works out the grammatical structure of sentences by using the relationships between words. Parsing a text means analyzing its words for grammar, either to check correctness or to use the words for a particular purpose.
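As an illustration of those word-to-word relationships, a dependency parser such as spaCy's links each word to its grammatical head. The library choice is an assumption; this sketch presumes spaCy and its small English model en_core_web_sm are installed.

```python
# Dependency parsing with spaCy: each word is linked to its grammatical
# head, making the relationships between words explicit.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The dog chased the cat")

for token in doc:
    # word, its dependency relation, and the word it attaches to
    print(f"{token.text:10} {token.dep_:10} -> {token.head.text}")
# e.g. "dog" is the nominal subject (nsubj) of "chased"
```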
Why are subfields important?
When working with NLP, understanding language by dividing it into subtasks is important. Syntax is the structure of sentence building. Morphology is the structure of words, built from the most primitive meaningful grammatical units of a language (morphemes). The lexicon is a dictionary of the language's words, and pragmatics deals with meaning that depends on the situation.
What is part-of-speech tagging?
Each word in a sentence is tagged with its grammatical category (noun, verb, determiner, and so on) in order to analyze the sentence and identify the role each word plays in it.
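A minimal sketch using NLTK's off-the-shelf tagger (an assumed tool choice; the tagger data must be downloaded once):

```python
# Part-of-speech tagging with NLTK's default (perceptron) tagger.
# Assumes NLTK is installed; the tokenizer and tagger data are
# downloaded on first use.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("The dog chased the cat")
print(nltk.pos_tag(tokens))
# [('The', 'DT'), ('dog', 'NN'), ('chased', 'VBD'),
#  ('the', 'DT'), ('cat', 'NN')]
```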
Backus-Naur Form:
Backus-Naur Form (BNF) is a notation used to describe the syntax of a programming language or formal grammar. It consists of production rules that define the structure of valid sentences in the language. BNF uses terminal and non-terminal symbols to represent the elements of the language and specifies how these elements can be combined through production rules. BNF is widely used in computer science and plays a fundamental role in the design of programming languages and the development of parsers and compilers.
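As a sketch, here is a tiny, made-up BNF grammar for signed integers, with a hand-written Python recognizer whose functions mirror the production rules one-to-one:

```python
# A tiny, hypothetical BNF grammar for signed integers:
#   <integer> ::= "-" <digits> | <digits>
#   <digits>  ::= <digit> | <digit> <digits>
#   <digit>   ::= "0" | "1" | ... | "9"
# The recognizer below follows these production rules directly.

def is_integer(s: str) -> bool:
    # <integer> ::= "-" <digits> | <digits>
    if s.startswith("-"):
        return is_digits(s[1:])
    return is_digits(s)

def is_digits(s: str) -> bool:
    # <digits> ::= <digit> | <digit> <digits>  (one or more digits)
    return len(s) > 0 and all(c in "0123456789" for c in s)

print(is_integer("-42"))  # True
print(is_integer("4x2"))  # False
```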
Context-free grammar:
Context-free grammar (CFG) is a formal grammar that describes the syntax of a language. It consists of a set of production rules that define how symbols can be combined to form valid sentences. In a CFG, each production rule consists of a non-terminal symbol on the left-hand side and a sequence of symbols (terminals or non-terminals) on the right-hand side. The non-terminal symbols can be replaced with other symbols based on the production rules, allowing for recursive expansion. CFGs are widely used in various areas of computer science, including parsing, natural language processing, and compiler design.
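A minimal sketch using NLTK's CFG support; the toy grammar itself is an invented example:

```python
# Parsing with a toy context-free grammar in NLTK. Each rule has a
# non-terminal on the left and terminals/non-terminals on the right.
import nltk

grammar = nltk.CFG.fromstring("""
  S   -> NP VP
  NP  -> Det N
  VP  -> V NP
  Det -> 'the'
  N   -> 'dog' | 'cat'
  V   -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased the cat".split()):
    print(tree)
# (S (NP (Det the) (N dog)) (VP (V chased) (NP (Det the) (N cat))))
```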
Definite clause grammar
Definite Clause Grammar (DCG) is a grammar formalism used for specifying the syntax and structure of natural languages. It extends context-free grammars by incorporating logical rules and constraints. DCG rules consist of a head and a body, where the head represents a linguistic category and the body defines the rules and conditions associated with that category; for example, the Prolog-style rule sentence --> noun_phrase, verb_phrase states that a sentence consists of a noun phrase followed by a verb phrase. DCG allows for the specification of context-dependent rules and supports the integration of logic programming features. It is commonly used in natural language processing and computational linguistics to parse and generate sentences based on a set of grammar rules.
What are the limitations of NLP?
Pretrained models in Natural Language Processing (NLP) have certain limitations. First, they may not adequately capture the nuances and specific context of a particular domain or task since they are trained on general text corpora. Fine-tuning or domain-specific training might be necessary to achieve optimal performance. Second, pretrained models heavily rely on the quality and representativeness of the training data, which can introduce biases and inaccuracies. It’s crucial to be mindful of these biases and perform careful evaluation and testing.
Regarding sentiment analysis, it also has limitations. First, sentiment analysis models may struggle with sarcasm, irony, or nuanced expressions, often misinterpreting them. Second, sentiment analysis relies heavily on context, and minor changes in phrasing or negation can alter the sentiment classification. Third, sentiment analysis models might not account for cultural or individual differences in language usage and interpretation of emotions. Lastly, sentiment analysis can be influenced by data biases, leading to inaccurate or unfair predictions for certain groups. Addressing these limitations requires ongoing research, training on diverse data, and considering a broader range of contextual factors in sentiment analysis models.
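To illustrate the negation point, a small sketch with NLTK's VADER analyzer (an assumed tool choice) shows how a single word can flip the predicted sentiment:

```python
# Sentiment analysis with NLTK's VADER lexicon: a single negation
# flips the compound score from positive to negative. Assumes NLTK
# is installed and the vader_lexicon data is downloaded.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

print(sia.polarity_scores("This movie was good."))
# compound score is positive, e.g. ~0.44
print(sia.polarity_scores("This movie was not good."))
# compound score is negative, e.g. ~-0.34
# Sarcasm ("Oh great, another delay...") would still score as positive.
```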