Extra questions Flashcards
Cognitive Behavioural Therapy (CBT)
Cognitive Behavioural Therapy (CBT) is considered one of the most effective methods of addressing stress, as it is simple to apply and gives good results.
The therapy involves understanding a person's behaviour and mindset in their everyday life.
With the help of CBT, therapists help people overcome stress and live a happier life.
What is a Chatbot?
A chatbot is a computer program that is designed to simulate human conversation through voice commands, text chats, or both. E.g.: Mitsuku Bot, Jabberwacky, etc.
Syntax:
Syntax refers to the grammatical structure of a sentence.
Semantics:
It refers to the meaning of the sentence.
Which package is used for Natural Language Processing in Python programming?
Natural Language Toolkit (NLTK). NLTK is one of the leading platforms for building Python programs that can work with human language data.
What is inverse document frequency?
To understand inverse document frequency, first we need to understand document frequency.
Document Frequency is the number of documents in which the word occurs irrespective of how many times it has occurred in those documents.
For inverse document frequency, the document frequency goes in the denominator while the total number of documents goes in the numerator (in practice, the logarithm of this ratio is usually taken).
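A small worked illustration of the definitions above, in plain Python (the tiny corpus and the base-10 logarithm are illustrative choices, not a fixed convention):

```python
# Compute document frequency (df) and inverse document frequency (idf)
# for each word in a toy corpus of three documents.
import math

documents = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

N = len(documents)  # total number of documents
vocab = {word for doc in documents for word in doc.split()}

for word in sorted(vocab):
    # df: number of documents containing the word, regardless of
    # how many times it occurs within each document.
    df = sum(1 for doc in documents if word in doc.split())
    # idf: total documents in the numerator, df in the denominator.
    idf = math.log10(N / df)
    print(f"{word}: df={df}, idf={idf:.3f}")
```

Note how "the", which occurs four times in total but in only two documents, gets the same document frequency as "cat", which occurs twice.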
Does the vocabulary of a corpus remain the same before and after text normalization? Why?
No, the vocabulary of a corpus does not remain the same before and after text normalization. Reasons are –
● During normalization the text is reduced to a minimum vocabulary, since the machine does not require grammatically complete statements but only the essence of the text.
● During normalization, stop words, special characters and numbers are removed.
● During stemming the affixes of words are removed and the words are converted to their base form. So, after normalization, we get a reduced vocabulary.
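The vocabulary reduction can be checked directly; a quick sketch (the stop-word list here is a tiny illustrative sample, not a standard list):

```python
# Show that lowercasing plus stop-word/punctuation removal shrinks
# the vocabulary of a small corpus.
stop_words = {"the", "is", "a", "of"}  # illustrative subset only

corpus = "The cat is a pet . The dog is a pet .".split()

before = set(corpus)  # raw vocabulary, case-sensitive, with punctuation
after = {t.lower() for t in corpus
         if t.isalnum() and t.lower() not in stop_words}

print(len(before), "->", len(after))
```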
Explain the relation between occurrence and value of a word.
As the occurrence of a word drops, its value rises.
As shown in the graph, the occurrence and the value of a word are inversely proportional.
The words which occur most frequently (like stop words) have negligible value.
Why are human languages complicated for a computer to understand? Explain.
Communication between machines is very basic and structured, whereas human communication is complex. There are multiple characteristics of human language that are easy for a human to understand but extremely difficult for a computer.
Arrangement of the words and meaning - Human languages follow grammatical rules: there are nouns, verbs, adverbs and adjectives, and the same word can be a noun in one sentence and an adjective in another. This ambiguity creates difficulty for computers during processing.
What are the steps of text normalization? Explain them in brief.
- Sentence Segmentation - Under sentence segmentation, the whole corpus is divided into sentences.
- Tokenisation - After segmenting the sentences, each sentence is further divided into tokens. A token is any word, number or special character occurring in a sentence.
- Removing Stop words, Special Characters and Numbers - In this step, the tokens which are not necessary are removed from the token list.
- Converting text to a common case - After stop-word removal, the whole text is converted to the same case, preferably lower case. This ensures that the machine does not treat the same word as two different words just because of a difference in case.
- Stemming - In this step, the remaining words are reduced to their root words. Stemming is the process in which the affixes of words are removed and the words are converted to their base form; the result may not be a meaningful word.
- Lemmatization - Like stemming, lemmatization removes affixes, but the word we get after affix removal (known as the lemma) is always a meaningful one.