LUDE midterm 1 Flashcards
what is phonology?
how sounds are organized in languages
what is morphology?
how words and word forms are built
what is syntax?
how sentences are built from words
what is semantics?
meaning of words and sentences
what is pragmatics?
how meaning works in context
what are the 2 subfields of phonetics?
the sounds that the human vocal tract can produce // the gestures that sign languages use
what is NLP?
Natural language processing (NLP) is a subfield of computer science and artificial intelligence (AI) that helps computers understand and communicate in human language.
what are the goals of NLP?
NLP allows computers and digital devices to recognize, understand and generate text and speech.
what are the three types of writing systems?
Alphabetic systems
Syllabic systems
Logographic systems
what language is an example of the alphabetic system?
English and Korean
what language is an example of the syllabic system?
Japanese
what language is an example of the logographic system?
Chinese
how are the 3 types of writing systems differentiated?
the content represented by the symbols/characters in the written language
how is the alphabetic system split up?
phonemic, abjads and phonetic
What is the phonemic alphabet?
A set of letters in a fixed order, in which each letter represents a phoneme
What is an abjad?
Also known as a consonant alphabet. It has independent letters for consonants and may indicate vowels using some of the consonant letters and/or diacritics.
e.g. Arabic
what is the phonetic alphabet?
a set of symbols that each stand for one speech sound, e.g. the IPA
What is the syllabic system?
each symbol represents a syllable, a building block of speech, often with a structure such as CV or CVC
What is an abugida?
the main element is a consonant symbol that carries an inherent vowel, so each unit is read as a syllable
What is an example language written with an abugida?
Hindi, Cree, Dene
what is the importance of diacritics in an abugida?
they change or mute the inherent vowel
What is a syllabary?
A syllabary has a different glyph for each syllable.
what is transliteration?
a conversion of the characters in one writing system to another system
why is the IPA important and helpful?
The IPA accurately describes pronunciation. It eliminates the ambiguities of spelling by assigning a unique symbol to each distinct sound.
What is the logographic system?
each symbol represents a unit of meaning (a word or morpheme), e.g. Chinese
What is the pictographic system?
symbols are pictures of the items they refer to, e.g. traffic symbols
what is a bit?
binary digit
how many bits are there in a byte?
1 byte = 8 bits
can you explain a byte?
A group of eight 0s and 1s is a byte.
If we have 8 slots and each of them can be 1 or 0, we have 2^8 (= 256) unique combinations.
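The count of 256 can be checked by listing every 8-bit pattern. This is a minimal Python sketch; the variable name `patterns` is only illustrative.

```python
from itertools import product

# Every way to fill 8 slots with a 0 or a 1.
patterns = ["".join(bits) for bits in product("01", repeat=8)]

print(len(patterns))   # how many distinct bytes exist
print(patterns[0])     # the smallest pattern
print(patterns[-1])    # the largest pattern
print(2 ** 8)          # the same count, computed directly
```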
what is ascii?
The American Standard Code for Information Interchange (ASCII) is a common character encoding format for text data in computers and on the internet.
how many symbols can ASCII encode?
128 symbols, 33 of them non-printable
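The ASCII numbers can be explored with Python's built-in `ord` and `chr`. This is a small sketch; it assumes the usual convention that printable ASCII runs from space (32) to tilde (126), and the list names are illustrative.

```python
# ASCII assigns each character a code from 0 to 127.
print(ord("A"))   # code for the letter A
print(chr(97))    # character stored under code 97

all_chars = [chr(code) for code in range(128)]

# Printable ASCII runs from space (32) through tilde (126);
# everything else is a control (non-printable) character.
printable = [c for c in all_chars if 32 <= ord(c) <= 126]
control = [c for c in all_chars if not 32 <= ord(c) <= 126]

print(len(all_chars), len(printable), len(control))
```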
what is unicode?
a standard that aims to represent the characters of ALL writing systems
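how many bytes does UTF-8 use per character?
1 to 4
In UTF-8, the byte for a one-byte (ASCII) character begins with a…
0 (lead bytes of longer sequences begin with 110, 1110, or 11110, and continuation bytes begin with 10)
In a multi-byte sequence, the number of 1s before the first 0 in the lead byte tells the computer…
how many bytes are in one symbol.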
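The leading bits can be inspected directly. This is a rough Python sketch with two hypothetical helper functions (`utf8_bits`, `leading_ones`). Note that a one-byte character starts with 0, so it has no leading 1s at all.

```python
def utf8_bits(char):
    """Return the UTF-8 bytes of one character as 8-bit binary strings."""
    return [format(byte, "08b") for byte in char.encode("utf-8")]

def leading_ones(bits):
    """Count the 1s that come before the first 0 in a bit string."""
    return len(bits) - len(bits.lstrip("1"))

# The last entry is an emoji, written as an escape code.
for ch in ["A", "é", "中", "\U0001F600"]:
    bits = utf8_bits(ch)
    print(ch, bits, "leading 1s in first byte:", leading_ones(bits[0]))
```

For the multi-byte characters, the number of leading 1s in the first byte matches the length of the printed list.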
Binary (Base-2) system is represented by
only 0s and 1s
Decimal (Base-10) system is represented by
the digits 0-9
Hexadecimal (Base-16) is represented by both…
letters (A-F) and digits (0-9)
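The three number systems can be compared in Python, which has built-in conversions between them. This is a short sketch; the value 255 is just an example.

```python
n = 255

print(bin(n))   # base 2, prefixed with 0b
print(n)        # base 10
print(hex(n))   # base 16, prefixed with 0x

# int() with an explicit base converts a string back to a number.
print(int("11111111", 2))
print(int("ff", 16))
```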
what is the main difference between UTF-8 and UTF-32?
UTF-8 is variable-width (1 to 4 bytes per character), while UTF-32 always uses 4 bytes per character
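The size difference between the two encodings can be checked by encoding a few characters both ways. This is a minimal sketch; it uses Python's `utf-32-be` codec so that no byte-order mark is added to the count.

```python
for ch in ["A", "é", "中"]:
    utf8_len = len(ch.encode("utf-8"))
    utf32_len = len(ch.encode("utf-32-be"))  # "-be" avoids the byte-order mark
    print(ch, "UTF-8 bytes:", utf8_len, "UTF-32 bytes:", utf32_len)
```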
what's the difference between vowels and consonants?
vowels are produced with an open vocal tract, while consonants are produced with the vocal tract closed or partially closed
Consonants have low amplitude while vowels have high amplitude
what's the difference between voiced and voiceless consonants?
whether or not the vocal cords vibrate
what is acoustic phonetics?
the study of the physical properties of speech sounds, such as the amplitude of waveforms and the frequencies in the spectrum
what is a sample rate?
the number of discrete points (samples) recorded per second
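To see what a sample rate means in practice, the sketch below samples a pure tone in Python. All three values (8000 Hz, 0.01 s, 440 Hz) are arbitrary illustrative choices, not values from the course.

```python
import math

sample_rate = 8000   # samples recorded per second (Hz), illustrative
duration = 0.01      # seconds of audio
frequency = 440      # Hz, the tone being sampled

# Total number of discrete points = sample rate x duration.
num_samples = round(sample_rate * duration)

# One amplitude value per sample point.
samples = [
    math.sin(2 * math.pi * frequency * t / sample_rate)
    for t in range(num_samples)
]

print(num_samples)
print(samples[:3])
```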
what are the key concepts of acoustic phonetics?
Frequency, Amplitude, Formant
what is frequency?
cycles per second (Hz); from an auditory perspective it is heard as pitch (high and low notes)
What is amplitude?
the size of the wave's vibration; heard as loudness
what is a formant?
a concentration of acoustic energy around a particular frequency in the speech wave
how can F1 identify a vowel?
F1 corresponds to the height of the vowel (how open the mouth is)
how can F2 identify a vowel?
F2 corresponds to the frontness or backness of the vowel (the position of the tongue)
why is spoken language harder for a computer to process than written language?
Different vocal tracts
Dental alignment and oral anatomy
Different pronunciations
Dialects, variations
Speech sound disorders
what is ASR?
automatic speech recognition: the processing of human speech into a written format
What is used to train a machine learning-based ASR system (what it learns from)?
Audio input, typically paired with transcriptions: the computer looks at the spectrogram (frequency in Hz, formants) and learns from it
how did speech recognition work before machine learning?
Matching spectrogram data with templates.
Speaker-dependent machines
why are ASR technologies important for endangered language documentation?
there is a lack of textual data, so ASR converts speech data to text
what is parametric speech synthesis?
speech is generated from parameters such as pitch, duration, and formants
what is neural speech synthesis?
a neural network generates raw audio waveforms from text
what are the four approaches in computational linguistics?
Rule-based approach
Statistical approach
Machine learning approach
Hybrid approach
what are three reasons why consistent spelling is important?
Faster reading;
Efficient communication;
Easy access to information;
what are the 3 types of spelling error?
typos, nonword errors, and real-word errors
what's a typographical error?
we pressed the wrong key
what's a nonword error?
a string that is not a word in the language: misspelled words, unrecognized names, inserted or deleted letters, phonetic spellings
what is a morpheme?
The smallest meaningful unit
what's a free morpheme?
it can stand alone as an independent word and doesn't need to be attached to other morphemes, e.g. cat
what's a bound morpheme?
it cannot stand alone as an independent word and must be attached to a free morpheme (a base or root word), e.g. un- in unhappy
what's an inflectional affix?
an affix that attaches to a word without changing its word class, e.g. like -> likes is still a verb
What is a derivational affix?
an affix that attaches to a word and typically changes its word class, e.g. happy (adjective) -> happiness (noun)
What is the correct order of the spell-checker workflow?
- text processing
- non word error detection
- generation of candidates
- suggestions
- user decision or auto correct
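The workflow above can be sketched end to end in Python. This is a toy version under stated assumptions: `DICTIONARY` is a made-up word list, all function names are hypothetical, and candidates are limited to dictionary words one operation away. A letter swap is treated here as a single operation for convenience.

```python
import re

# Hypothetical mini-dictionary; a real checker would load a large word list.
DICTIONARY = {"the", "cat", "is", "bigger", "than", "mine", "this"}
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def tokenize(text):
    """Step 1, text processing: lowercase the text and pull out word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def is_nonword(token):
    """Step 2, nonword error detection: the token is not in the dictionary."""
    return token not in DICTIONARY

def candidates(token):
    """Step 3, candidate generation: dictionary words one operation away."""
    splits = [(token[:i], token[i:]) for i in range(len(token) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    subs = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    return sorted((deletes | swaps | subs | inserts) & DICTIONARY)

def check(text):
    """Step 4, suggestions: map each nonword to its candidate corrections.
    Step 5 (user decision or auto-correct) is left to the caller."""
    return {tok: candidates(tok) for tok in tokenize(text) if is_nonword(tok)}

print(check("Teh cat is biger than mine"))
```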
what is tokenization?
splitting a text into words;
what is stemming?
removing inflectional suffixes
what are the 2 Possible Causes of Spelling Errors?
Language-specific issue, & Technology-related factors
what is POS tagging?
part-of-speech tagging: labeling each word with its word class (noun, verb, adjective, etc.)
what's an example of user input?
the full sentence that you type in
e.g. this cat is bigger than mine
what's an example of tokenization?
the full sentence split into individual words
what's an example of stemming?
removing inflectional suffixes - this cat be big than i
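Both steps can be tried in Python. This is a deliberately crude sketch: `tokenize` only splits on whitespace, and `crude_stem` is a hypothetical suffix stripper with an invented suffix list. It is not a real stemming algorithm, and it cannot handle forms like "is" or "bigger".

```python
def tokenize(sentence):
    """Split a sentence into lowercase word tokens on whitespace."""
    return sentence.lower().split()

# Illustrative suffix list, checked in this order.
SUFFIXES = ["ing", "ed", "es", "s"]

def crude_stem(word):
    """Strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(tokenize("This cat is bigger than mine"))
print([crude_stem(w) for w in ["cats", "walked", "walking", "boxes"]])
```

A rule this simple also over-strips some words (for example, it would cut "this" down to "thi"), which is one reason practical stemmers use more careful rules.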
what are two reasons why dictionary methods of spell-checking are not always the most effective?
Long word lists that keep growing as new words are added
Unit of entry: it is unclear whether inflected forms need separate entries (cat -> cats)
whats an n-gram?
N-grams are sequences of “n” items from a given text or speech. These items can be words, syllables, letters, or phonemes.
How do you count the number of word/character n-grams?
Identify N: Decide on the value of “n” (e.g., 2 for bigrams, 3 for trigrams).
Split the text: Break the sentence or paragraph into individual words (or characters, for character n-grams).
Form the N-grams: Group the items in overlapping sequences of “n”.
Count them: a text with k items has k - n + 1 n-grams.
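The steps above can be sketched as a short Python function. The name `ngrams` is illustrative. The same function covers word n-grams (pass a list of words) and character n-grams (pass a list of letters).

```python
def ngrams(items, n):
    """Return every overlapping run of n consecutive items."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = "this cat is bigger than mine".split()

print(ngrams(words, 2))        # word bigrams
print(len(ngrams(words, 2)))   # 6 words give 6 - 2 + 1 bigrams
print(ngrams(list("cat"), 2))  # character bigrams
```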
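what does the Soundex system do?
words that sound alike are given the same key and stored in the same bin; a misspelt word with a matching key pulls its candidate corrections from that bin
how do you convert a word to Soundex?
keep the first letter, replace the remaining consonants with digits (b f p v = 1; c g j k q s x z = 2; d t = 3; l = 4; m n = 5; r = 6), drop vowels and h, w, y, merge adjacent repeats of the same digit, then pad with zeros or cut to one letter plus three digits (or use a Soundex calculator)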
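For reference, the conversion can also be done in a few lines of Python. This is a sketch of the common American Soundex rules as I understand them, not necessarily the exact variant used in the course: keep the first letter, code the remaining consonants as digits, skip vowels, merge adjacent repeats, and pad to four characters. The function name `soundex` is illustrative.

```python
# Letter groups that share a Soundex digit.
GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}

def soundex(word):
    """Return the 4-character Soundex key for a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate repeats
        code = CODES.get(ch, "")  # vowels and y have no code
        if code and code != previous:
            key += code
        previous = code  # a vowel resets this, so a repeat after it is kept
    return (key + "000")[:4]

for name in ["Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"]:
    print(name, soundex(name))
```

Robert and Rupert end up with the same key, which is the "same bin" behaviour that spell-checkers rely on.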
how does the confusion matrix work?
A confusion matrix is a visualization of how well a classification model is performing. It shows the actual vs. predicted results for your model, helping you see where it’s making correct predictions and where it’s getting things wrong.
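A confusion matrix can be built by counting (actual, predicted) pairs. The labels below are made-up illustrative data for a detector that marks each word as an "error" or "ok"; they are not results from any real system.

```python
from collections import Counter

# Made-up example labels: what each word really was vs. what the model said.
actual    = ["error", "error", "ok", "ok",    "ok", "error"]
predicted = ["error", "ok",    "ok", "error", "ok", "error"]

# Each cell of the matrix is the count of one (actual, predicted) pair.
matrix = Counter(zip(actual, predicted))

labels = ["error", "ok"]
for a in labels:
    print(a, [matrix[(a, p)] for p in labels])
```

Each printed row is one actual label, and the columns are the predicted labels in the same order, so correct predictions sit on the diagonal.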
what are the rules (costs) for edit distance?
substitution = 1, deletion = 1, insertion = 1, transposition = 2
3 possible operations in dynamic programming are…
delete, insert, substitute
what is the goal of the dynamic programming method?
A technical solution for finding the most efficient route: the cheapest sequence of edits that turns one word into another
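A minimal Python sketch of the dynamic programming table for edit distance is shown below. It assumes a cost of 1 each for insertion, deletion, and substitution. A swap of two neighbouring letters is not a separate operation here, so it comes out as 2. The function name `edit_distance` is illustrative.

```python
def edit_distance(source, target):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]

    # Turning a prefix into the empty string (or back) costs its length.
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j

    # Each cell reuses the three neighbouring cells already filled in.
    for i in range(1, rows):
        for j in range(1, cols):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # delete
                dist[i][j - 1] + 1,             # insert
                dist[i - 1][j - 1] + sub_cost,  # substitute (or match)
            )
    return dist[-1][-1]

print(edit_distance("kitten", "sitting"))
print(edit_distance("form", "from"))
```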
what is a real-word error?
a real-word error is a word that's spelt correctly but is the wrong word for the context, e.g. their is 4 swans
why are real-word mistakes more difficult for computers to fix than nonword mistakes?
because real-word errors are in the dictionary, so they can only be detected from the context
whats a syntactic tree?
a syntactic tree is a way of organizing a sentence into phrasal categories
what are the 2 techniques that grammar checkers use?
relaxation-based techniques and mal-rules
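what is a relaxation-based technique for grammar checking?
grammar constraints (such as agreement between nouns and verbs) are relaxed so that a sentence with a mistake can still be parsed, and the relaxed constraint shows what the mistake is
what is the mal-rule technique for grammar checking?
a person writes rules that describe specific expected errors, and the computer flags any input that matches one of those rules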
what is the drawback of the mal-rule technique?
because every error rule has to be entered by hand in advance
how do you calculate probability?
look at slide 19 on 6.2
what is WordNet?
WordNet is a lexical database that records the SEMANTIC relationships between words (synonyms, hypernyms, etc.)
what is a learner’s language corpus?
a collection of written or spoken texts produced by language learners, used to study their language patterns, errors, and development.
what are 2 reasons why large language models (LLM) are better in real word mistake detection?
they have a better understanding of context, can catch agreement mistakes between clauses and can adapt to writing preferences
what is CALL?
Computer-Assisted Language Learning
what is ICALL?
Intelligent CALL: it uses linguistic analysis of the learner's input to make CALL better
what is a frame-based CALL system?
anything multiple choice or fill in the blank, where answers are matched against stored ones
what is a positive transfer model?
the language being learned is syntactically similar to the known language, so the known language's rules carry over correctly
what is a negative transfer model?
the language being learned is NOT syntactically similar to the known language, so learners make errors when they apply the known language's rules to it