Dialogue Systems Flashcards
Dialogue, speech acts
A dialogue is a sequence of turns. Each turn is a single contribution from one speaker.
Each utterance in a dialogue performs an action, called a speech act.
Speech acts fall into four main types:
- constative: answering, claiming, confirming, denying, disagreeing, stating
- directive: advising, asking, forbidding, inviting, ordering, requesting
- commissive: promising, planning, vowing, betting
- acknowledgment: apologizing, greeting, thanking, accepting an acknowledgment
Dialogue systems: grounding
A dialogue is a collective act where participants exchange information. To this end, participants need to establish a common ground.
In this process, the hearer very often acknowledges that she has understood the speaker; this acknowledgment process is called grounding.
Chatbots: main classes
Three main classes of chatbots:
- rule-based systems: use hand-written regular expressions
- corpus-based systems
- hybrid systems
Corpus-based chatbots and techniques
Corpus-based systems mine large datasets of human-human conversations.
Once a chatbot has been deployed, the human turns it acquires can be used as additional data for fine-tuning.
Two main neural techniques to provide a response to a user turn:
- response by retrieval: BERT [CLS]
- response by generation: encoder-decoder
Corpus-based chatbots: response by retrieval
Let C be a corpus of conversations. The main idea is to view a user turn as a query q, and to retrieve from C the response r* that is most similar to q.
We use a bi-encoder model, in which we train two separate encoders to encode the user query and the candidate response.
We can implement this through BERT’s [CLS] token: encode the query q and each candidate response r with the two separate encoders, take the [CLS] embeddings as vector representations, and return the candidate whose embedding has the highest dot product with the query embedding:
h_q = BERT_Q(q)[CLS]
h_r = BERT_R(r)[CLS]
response(q, C) = argmax_{r ∈ C} h_q · h_r
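A minimal retrieval sketch of this bi-encoder idea, assuming Hugging Face transformers and PyTorch; the model names and the helper functions (cls_embedding, respond_by_retrieval) are illustrative choices, not part of the original material, and in practice the two encoders would be fine-tuned on conversation data:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
query_encoder = AutoModel.from_pretrained("bert-base-uncased")     # plays the role of BERT_Q
response_encoder = AutoModel.from_pretrained("bert-base-uncased")  # plays the role of BERT_R

def cls_embedding(encoder, text):
    """Encode text and return the [CLS] vector (first token of the last layer)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = encoder(**inputs)
    return outputs.last_hidden_state[:, 0, :].squeeze(0)  # h = BERT(x)[CLS]

def respond_by_retrieval(query, corpus_responses):
    """Return the candidate r* with the highest dot product h_q · h_r."""
    h_q = cls_embedding(query_encoder, query)
    scores = [torch.dot(h_q, cls_embedding(response_encoder, r)) for r in corpus_responses]
    return corpus_responses[int(torch.stack(scores).argmax())]

print(respond_by_retrieval("What time do you open?",
                           ["We open at 9am.", "Our pizzas are vegan."]))
```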
Corpus-based systems: response by generation
The main idea is to think of response production as an encoder-decoder task, transducing from the user’s prior turn to the system’s turn.
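A minimal sketch of response by generation, assuming a pretrained encoder-decoder chat model from Hugging Face; the model name and helper function are illustrative, and any seq2seq model fine-tuned on (user turn, system turn) pairs plays the same role:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/blenderbot-400M-distill"  # assumption: any encoder-decoder dialogue model works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def respond_by_generation(user_turn):
    """Encode the user's prior turn and decode the system's turn."""
    inputs = tokenizer(user_turn, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=60)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(respond_by_generation("I've been feeling tired all week."))
```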
Digital assistants, architectures
Digital assistants have the goal of helping a user to solve specific tasks.
We distinguish two main architectures for digital assistants:
- frame-based architecture: one of the very early architectures, still in use in medium-scale systems
- dialogue-state architecture: more advanced, used in modern, large-scale industrial systems
Frame-based dialogue systems, tasks
A frame is a kind of knowledge structure representing the information and intentions that the system needs to extract from the user’s sentences; it is defined as a collection of (slot, value) pairs.
The system goal is to fill the slots in the frames with the appropriate values.
3 general tasks:
- Domain classification: in case of multi-domain dialogue systems, detect the appropriate domain.
- Intent determination: given the domain, which goal is the user trying to accomplish?
- Slot filling: extract the particular slots and fillers needed to carry out the user intent.
example of frame at slide 28 pdf 14…
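The slide referenced above is not reproduced here; as a stand-in, a hypothetical flight-booking frame (domain, intent and slot names chosen purely for illustration) might look like this:

```python
# Illustrative flight-booking frame: a collection of slots to be filled with
# values extracted from the user's turns.
flight_frame = {
    "domain": "air-travel",
    "intent": "book_flight",
    "slots": {
        "ORIGIN_CITY": "Boston",       # "I want to fly from Boston ..."
        "DESTINATION_CITY": "Denver",  # "... to Denver"
        "DEPARTURE_DATE": None,        # still unfilled -> the system should ask for it
        "DEPARTURE_TIME": None,
    },
}

# The system's goal is to drive the conversation until every required slot is filled.
missing = [slot for slot, value in flight_frame["slots"].items() if value is None]
print("Slots still to ask about:", missing)
```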
Dialogue-state architecture, components
Dialogue-state architecture is a more advanced version of the frame-based architecture.
A typical dialogue-state system is based on 6 components (a control-flow sketch follows the list):
- Automatic speech recognition transcribes the user’s speech into text
- Natural language understanding component extracts slot fillers from the user utterances using machine learning.
- Dialogue state tracker maintains the current state of the dialogue
- Dialogue policy component decides what to do next: answer a question, ask a clarification, make a suggestion, and so on.
- Natural language generation component can condition on the exact dialogue context, to produce turns that seem much more natural
- Text-to-speech synthesizes the system’s response as speech
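A schematic sketch of how the six components might be wired together; every function body below is a trivial placeholder invented for illustration, and only the control flow is meant to be meaningful:

```python
def automatic_speech_recognition(audio):
    return "book a table for two at 7pm"                      # stub ASR output

def nlu(text):
    return "INFORM", {"party_size": "two", "time": "7pm"}     # stub dialogue act + slot fillers

def dialogue_state_tracker(state, user_act, fillers):
    new_state = dict(state)
    new_state.update(fillers)
    new_state["last_user_act"] = user_act
    return new_state

def dialogue_policy(state):
    return "CONFIRM" if "time" in state else "REQUEST(time)"  # stub decision

def nlg(system_act, state):
    if system_act == "CONFIRM":
        return f"So that's a table for {state['party_size']} at {state['time']}?"
    return "What time would you like?"

def text_to_speech(text):
    return text                                               # stub: would synthesize audio

def handle_turn(audio_in, state):
    text = automatic_speech_recognition(audio_in)             # 1. ASR
    user_act, fillers = nlu(text)                             # 2. NLU
    state = dialogue_state_tracker(state, user_act, fillers)  # 3. state tracking
    system_act = dialogue_policy(state)                       # 4. policy
    reply = nlg(system_act, state)                            # 5. NLG
    return text_to_speech(reply), state                       # 6. TTS

audio_out, state = handle_turn("<audio>", {})
print(audio_out)
```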
Dialogue-state architecture: natural language understanding component
This component exploits sequence labeling and sentence classification techniques from previous lectures to solve the following three tasks:
- domain classification
- intent extraction
- slot filling
We need a training set that associates each sentence with the correct domain, intent, and set of slots.
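As an illustration of what such a training set contains, here is a hypothetical annotated example (domain, intent and slot names are made up for the sketch), with slot filling encoded as per-token BIO tags:

```python
# One training example: a sentence labeled with its domain, its intent,
# and per-token BIO slot tags.
example = {
    "sentence": ["show", "flights", "from", "boston", "to", "denver", "on", "tuesday"],
    "domain":   "air-travel",
    "intent":   "find_flight",
    "bio_tags": ["O", "O", "O", "B-ORIGIN_CITY", "O", "B-DESTINATION_CITY", "O", "B-DEPARTURE_DATE"],
}

# Recovering slot fillers from the tags (single-token spans only; multi-token
# spans would also need the I- tags to be merged).
fillers = {tag[2:]: tok
           for tok, tag in zip(example["sentence"], example["bio_tags"])
           if tag.startswith("B-")}
print(fillers)  # {'ORIGIN_CITY': 'boston', 'DESTINATION_CITY': 'denver', 'DEPARTURE_DATE': 'tuesday'}
```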
Dialogue acts in dialogue-state systems
Dialogue-state systems make use of dialogue acts, which:
- represent the interactive function of the conversation turn
- carry out the functions of speech acts and of grounding
Depending on the domain, each system uses a specific set of dialogue act categories to classify its dialogue acts.
Dialogue-state architecture: dialogue state tracker component, correction acts
A dialogue state consists of:
- the entire frame at a given point of the dialogue
- the user’s most recent dialogue act
The dialogue state tracker computes the current dialogue state, that is:
- updates the running frame
- classifies the user’s most recent dialogue act
If a dialogue system misunderstands an utterance, the user will generally correct the error by reformulating the utterance. This is called a correction act.
The dialogue state tracker is also in charge of detecting correction acts, and it interacts with the slot-filling component to decide which slot value is being changed.
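A minimal sketch of a state-tracker update under these assumptions (the data layout and act names are illustrative, not the original material); a detected correction act overwrites the slot value the system got wrong:

```python
def update_state(state, user_act, slot_fillers):
    """Update the running frame and record the user's most recent dialogue act."""
    frame = dict(state.get("frame", {}))
    if user_act == "CORRECTION":
        # The user is re-stating a slot the system misunderstood: overwrite it.
        frame.update(slot_fillers)
    else:
        # Otherwise only fill slots that are still empty.
        for slot, value in slot_fillers.items():
            frame.setdefault(slot, value)
    return {"frame": frame, "last_user_act": user_act}

state = update_state({}, "INFORM", {"DESTINATION_CITY": "Boston"})
state = update_state(state, "CORRECTION", {"DESTINATION_CITY": "Austin"})
print(state)  # {'frame': {'DESTINATION_CITY': 'Austin'}, 'last_user_act': 'CORRECTION'}
```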
Dialogue-state architecture: dialogue policy component
This component decides what dialogue act the system should generate at step i, based on the entire dialogue state.
Â_i = argmax_{A_i ∈ A} P(A_i | Frame_{i-1}, A_{i-1}, U_{i-1})
where A_i denotes an act from the system and U_i an act from the user.
Probabilities can be estimated by a neural classifier, using neural representations of the slot fillers and the utterances.
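A minimal sketch of such a neural policy classifier in PyTorch; the architecture, the encoding dimensions and the size of the act inventory are assumptions made for illustration:

```python
import torch
import torch.nn as nn

NUM_ACTS = 8   # illustrative size of the dialogue-act inventory A
ENC_DIM = 128  # illustrative size of each input encoding

class DialoguePolicy(nn.Module):
    """Score every possible system act given encodings of the dialogue state."""
    def __init__(self):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(3 * ENC_DIM, 256), nn.ReLU(), nn.Linear(256, NUM_ACTS)
        )

    def forward(self, frame_enc, prev_sys_act_enc, prev_user_utt_enc):
        x = torch.cat([frame_enc, prev_sys_act_enc, prev_user_utt_enc], dim=-1)
        return self.scorer(x)  # unnormalized log P(A_i | Frame_{i-1}, A_{i-1}, U_{i-1})

policy = DialoguePolicy()
logits = policy(torch.randn(ENC_DIM), torch.randn(ENC_DIM), torch.randn(ENC_DIM))
best_act = int(logits.argmax())  # Â_i = the highest-scoring act in the inventory
print(best_act)
```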
Dialogue-state architecture: natural language generation component
This component generates the text of a response to the user, once the policy has decided which dialogue act to generate.
This task is modelled in two stages:
- content planning: what to say?
- sentence realization: how to say it? translates from the dialogue act and its arguments to text sentences.
The sentence realizer is trained on (representation, sentence) pairs from a large corpus of labeled dialogues.
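A minimal sketch of sentence realization using hand-written templates (the act names, slot names and templates are invented for illustration); a trained realizer learns this mapping from (representation, sentence) pairs instead of relying on hand-written templates:

```python
# Content planning has already chosen a dialogue act with its arguments;
# realization maps the act to a delexicalized sentence and fills in the slots.
templates = {
    "CONFIRM": "So you want a restaurant serving {food} food near {area}?",
    "REQUEST": "What kind of {slot} would you like?",
}

def realize(dialogue_act, args):
    return templates[dialogue_act].format(**args)

print(realize("CONFIRM", {"food": "Italian", "area": "the city centre"}))
print(realize("REQUEST", {"slot": "food"}))
```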
Dialogue systems: evaluation
Chatbots are evaluated by humans, who assign a score.
- Participant evaluation: evaluation is carried out by the human who talked to the chatbot
- Observer evaluation: evaluation is carried out by a third party who reads a transcript of a human/chatbot conversation