File 16.0, 16.3-16.5: Language and computers (F) Flashcards
Word spotting
The program focuses on the words it knows and ignores the ones it doesn’t
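A minimal sketch of word spotting, assuming a small hand-picked vocabulary for a travel domain. The names KNOWN_WORDS and spot_words are hypothetical and only serve the illustration; a real system would work on recognized speech rather than typed text.

```python
# Hypothetical vocabulary for a travel-booking domain (an assumption for illustration).
KNOWN_WORDS = {"flight", "hotel", "boston", "chicago", "monday", "friday"}


def spot_words(utterance, vocabulary=KNOWN_WORDS):
    """Return the known words in the utterance, in order, ignoring everything else."""
    tokens = utterance.lower().split()
    # Strip simple punctuation so "Friday," still matches "friday".
    cleaned = [token.strip(".,!?") for token in tokens]
    return [token for token in cleaned if token in vocabulary]


print(spot_words("Um, I would like a flight to Boston on Friday, please."))
```

Only the words in the vocabulary survive; filler such as "um" and "please" is dropped.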
Limited domain
Programs perform better when their use is restricted to a limited domain
Spoken-language dialogue system
Some make use of less complex types of systems, such as interactive systems that produce but don’t understand speech, or systems that present options verbally but require the user to answer by pressing buttons.
Isolated speech
The user speaks the input clearly and without extraneous words
Continuous speech
The input can be more like normal speech
Barge-in
The user interrupts and talks over the computer. Systems differ as to whether they can deal with this.
Automatic speech recognition
Involves the use of computers to transform spoken language into written (or computer-understandable) language
Language processing and understanding
Often a deep analysis is required, including building syntax trees to figure out the input’s structure
Parsing
Analyzing sentences syntactically is known as parsing
Dialogue management
The system needs to understand the intentional structure of the conversation
For example, the main intention is to schedule a travel itinerary, but this goal can be achieved by accomplishing certain subtasks.
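The idea of a main goal broken into subtasks can be sketched as a simple slot-filling loop. This is an illustration only: the slot names, the questions, and the function next_prompt are assumptions, not part of any particular system.

```python
from typing import Optional

# Hypothetical subtasks for the main goal "schedule a travel itinerary".
# Each subtask is a slot to fill, paired with the question that fills it.
SUBTASKS = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
    "date": "What day do you want to travel?",
}


def next_prompt(filled: dict) -> Optional[str]:
    """Return the question for the first unfinished subtask, or None when all are done."""
    for slot, question in SUBTASKS.items():
        if slot not in filled:
            return question
    return None


print(next_prompt({"origin": "Boston"}))
```

Once every slot is filled, next_prompt returns None and the main goal can be carried out.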
Error recovery
Getting the conversation back on track after a misunderstanding
Text generation
Involves the use of computers to respond to humans using natural language by creating sentences that convey the relevant information
Speech synthesis
The words that make up the generated text must be converted into a sequence of sounds
Wizard of Oz simulation
A test setup in which users think they are interacting with the actual computer system but, in fact, a hidden human controller simulates some aspects of the system
This is used to test the system and change any defaults.
Translation
The task of converting a text written in one language (the source language) into a text in another language (the target language)
Machine translation
The use of computers to carry out translation
Fully automatic high-quality translation (FAHQT)
The main aim of machine translation
Direct translation
The system is designed for bilingual, unidirectional translation: every word is translated, and then some reordering is performed based on morphological and syntactic rules of the target language in order to produce the finished text.
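The two steps of direct translation (word-by-word lookup, then reordering) can be sketched as follows. The five-word English-to-Spanish lexicon and the single adjective-noun rule are assumptions made up for this illustration; a real system would need far larger resources.

```python
# Toy English-to-Spanish lexicon: word -> (translation, part of speech).
# Hypothetical data, chosen only to illustrate the technique.
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
    "is": ("es", "VERB"),
    "fast": ("rápido", "ADJ"),
}


def direct_translate(sentence):
    # Step 1: translate every word; unknown words pass through unchanged.
    words = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]
    # Step 2: reorder with a target-language rule: an adjective follows its noun.
    i = 0
    while i < len(words) - 1:
        if words[i][1] == "ADJ" and words[i + 1][1] == "NOUN":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return " ".join(word for word, _ in words)


print(direct_translate("the red car is fast"))
```

Because the lexicon and rules only go from English to Spanish, the sketch is unidirectional, as the definition describes.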
Interlingua method
The source language is first translated into an intermediate abstract representation that contains sufficient information to allow the creation of a target language text. Allows the creation of multilingual systems with relative ease
Transfer method
The source text is analyzed to produce a source language intermediate representation, which is then transferred to a target language intermediate representation and then the target language text is generated.
Hybrid or mixed systems
Nowadays, multiple different systems are combined to translate text
Corpus
A collected body of text
Corpus linguistics
Involves the design and the annotation of corpus materials that are required for specific purposes.
Balanced corpora
Corpora that try to remain balanced among different genres
Reference corpus
A frozen corpus: once a specified amount of text has been collected and annotated, the corpus is complete.
Monitor corpus
As new texts continue to be written or spoken, a monitor corpus continues to grow, gathering more and more data.
Hansard corpus
Contains French and English versions of the same Canadian parliamentary sessions
Multext corpus
Contains more than two languages
Bi-text
Texts that contain the same sentences written in different languages
Parallel corpus
Contains bi-texts
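One way to picture a parallel corpus is as a list of aligned sentence pairs. The sketch below assumes a tiny invented English-French bi-text (not taken from any real corpus) and a hypothetical helper, find_translations, that looks up how sentences containing a given word were translated.

```python
# Invented English-French sentence pairs, for illustration only.
BITEXT = [
    ("The session is open.", "La séance est ouverte."),
    ("The session is closed.", "La séance est levée."),
    ("Thank you.", "Merci."),
]


def find_translations(word, bitext=BITEXT):
    """Return the target-language sentences aligned with source sentences containing word."""
    word = word.lower()
    matches = []
    for source, target in bitext:
        source_words = [w.strip(".,!?") for w in source.lower().split()]
        if word in source_words:
            matches.append(target)
    return matches


print(find_translations("session"))
```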
Levels of annotation
Corpora can be made to show different kinds of linguistically relevant information, called representations. Each representation receives a label called an annotation (such as noun, verb, or word function)
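As a small illustration of annotation, the sketch below reads text in which each word carries a part-of-speech label after a slash. The word/TAG layout, the tag names, and the function parse_annotated are assumptions for this example; real corpora use a variety of annotation formats.

```python
def parse_annotated(text):
    """Split 'word/TAG' items into (word, tag) pairs."""
    pairs = []
    for item in text.split():
        # rpartition splits on the last slash, so words containing "/" still work.
        word, _, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs


annotated = "the/DET dog/NOUN barks/VERB"
for word, tag in parse_annotated(annotated):
    print(word, tag)
```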