LUDE midterm 1 Flashcards

1
Q

what is phonology?

A

how sounds are organized in languages

2
Q

what is morphology?

A

how words and word forms are built

3
Q

what is syntax?

A

how to build sentences

4
Q

what is semantics?

A

meaning of words and sentences

5
Q

what is pragmatics?

A

how meaning works in context

6
Q

what are the 2 sub fields of phonetics?

A

the sounds that the human vocal tract can produce // the gestures that sign languages use

7
Q

what is NLP

A

Natural language processing is a subfield of computer science and artificial intelligence (AI) that helps computers understand and communicate in human language.

8
Q

what are the goals of NLP?

A

NLP allows computers and digital devices to recognize, understand and generate text and speech.

9
Q

what are the three types of writing systems?

A

Alphabetic systems
Syllabic systems
Logographic systems

10
Q

what language is an example of the alphabetic system?

A

English and Korean

11
Q

what language is an example of the syllabic system?

A

Japanese

12
Q

what language is an example of the logographic system?

A

Chinese

13
Q

how are the 3 types of writing systems differentiated?

A

the content represented by the symbols/characters in the written language

14
Q

how is the alphabetic system split up?

A

phonemic, abjads and phonetic

15
Q

What is the phonemic alphabet?

A

Sets of letters arranged in a specific way, each letter represents a phoneme

16
Q

What is an abjad?

A

Also known as consonant alphabets. They have independent letters for consonants and may indicate vowels using some of the consonant letters and/or diacritics.
e.g., Arabic

17
Q

what is the phonetic alphabet?

A

a set of symbols in which each symbol stands for one speech sound, e.g., the IPA (International Phonetic Alphabet)

18
Q

What is the Syllabic system?

A

symbols represent syllables, the building blocks of speech, usually with a structure of CVC

19
Q

What is the Abugidas system?

A

the main element is the syllable: each basic symbol is a consonant with an inherent vowel

20
Q

What is an example language in the Abugidas system?

A

Hindi, Cree, Dene

21
Q

what is the importance of diacritics in the Abugidas system?

A

they change or mute the inherent vowel

22
Q

What is the syllabary system?

A

A syllabary has a different glyph for each syllable.

23
Q

what is transliteration?

A

a conversion of the characters in one writing system to another system

24
Q

why is IPA important & why is it helpful?

A

IPA accurately describes pronunciation. It eliminates the ambiguities of spelling by assigning a unique symbol to each distinct sound.

25
What is the logographic system?
each symbol represents a unit of meaning (a word or morpheme), e.g., Chinese
26
What is the pictograph system?
pictures of the items to which they refer, Traffic symbol systems
27
what is a bit?
binary digit
28
how many bits are there in a byte?
1 byte = 8 bits
29
can you explain a byte?
A group of eight 0s and 1s is a byte. If we have 8 slots and each of them can be 1 or 0, it means we have 2^8 (= 256) unique combinations
30
what is ascii?
The American Standard Code for Information Interchange (ASCII), a common character encoding format for text data in computers and on the internet.
31
how many symbols ASCII can encode
128 symbols, of which 33 are non-printable control characters
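The counts on this card can be checked with a short sketch. It assumes the usual split of ASCII into control codes (0 to 31, plus 127) and printable characters (32 to 126); the variable names are only for illustration.

```python
# ASCII covers code points 0..127 (7 bits).
control = [c for c in range(128) if c < 32 or c == 127]   # non-printable
printable = [c for c in range(128) if 32 <= c < 127]      # letters, digits, punctuation, space

print(len(control) + len(printable))  # total number of ASCII symbols
print(len(control))                   # number of non-printables
print(ord("A"), chr(97))              # symbol -> code, code -> symbol
```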
32
what is unicode?
a standard that aims to represent the characters of ALL writing systems
33
how many bytes does UTF-8 use per symbol?
1-4
34
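Each UTF-8 sequence of bytes begins with a...
0 if the symbol is a single byte (ASCII); a multi-byte sequence begins with one 1 per byte, followed by a 0 (110, 1110 or 11110)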
35
The number of 1s before the first 0 in the leading byte tells the computer...
how many bytes are in one symbol (no leading 1s means a single byte).
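A rough sketch of the leading-byte rule, assuming standard UTF-8: no leading 1s means one byte, and otherwise the count of leading 1s is the number of bytes. The function name `utf8_length_from_lead_byte` is made up for this example.

```python
def utf8_length_from_lead_byte(lead: int) -> int:
    """Count the 1 bits before the first 0 in the leading byte."""
    ones = 0
    mask = 0b10000000
    while lead & mask:
        ones += 1
        mask >>= 1
    if ones == 0:
        return 1  # 0xxxxxxx: a single-byte (ASCII) symbol
    if ones == 1 or ones > 4:
        # 10xxxxxx is a continuation byte, not the start of a sequence
        raise ValueError("not a valid UTF-8 leading byte")
    return ones

for ch in ["A", "é", "中", "😀"]:
    encoded = ch.encode("utf-8")
    bits = " ".join(f"{b:08b}" for b in encoded)
    print(ch, utf8_length_from_lead_byte(encoded[0]), bits)
```

Each printed line shows the symbol, the byte count read off the first byte, and the full bit pattern, so the leading 1s can be compared by eye.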
36
Binary (Base-2) system is represented by
only 0s and 1s
37
Decimal (Base-10) system is represented by
decimal uses 0-9
38
Hexadecimal (Base-16) is represented by both...
numbers (0-9) and letters (A-F)
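The three number systems can be compared on one value. This is only an illustration using Python's built-in conversions; the value 255 is chosen because it is the largest number that fits in one byte.

```python
n = 255  # decimal (base 10)

print(bin(n))              # same value in binary (base 2)
print(hex(n))              # same value in hexadecimal (base 16)
print(int("11111111", 2))  # binary string back to decimal
print(int("ff", 16))       # hexadecimal string back to decimal
print(2 ** 8)              # number of combinations of eight 0/1 slots
```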
39
what is the main difference between UTF-8 and UTF-32
UTF-8 is variable-width (1 to 4 bytes per symbol), while UTF-32 always uses 4 bytes per symbol
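A small sketch comparing byte counts in the two encodings. It relies on the standard behaviour that UTF-8 takes 1 to 4 bytes per symbol while UTF-32 always takes 4; `byte_lengths` is a hypothetical helper, and the `utf-32-be` codec is used so that no byte-order mark is added to the count.

```python
def byte_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 length, UTF-32 length) in bytes for one symbol."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-32-be"))

for ch in ["A", "é", "中", "😀"]:
    print(ch, byte_lengths(ch))
```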
40
whats the difference between vowels and consonants
vowels require the vocal tract to be open, while consonants have the vocal tract closed or partially closed. Consonants have low amplitude while vowels have high amplitude
41
whats the difference between voiced and voiceless consonants
whether or not the vocal cords vibrate
42
what is acoustic phonetics?
study of the physical properties of speech sounds, such as the amplitude of waveforms and the frequencies in the spectrum
43
what is a sample rate?
the number of discrete points (samples) recorded per second
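As a quick worked example, the number of points in a recording is the sample rate times the duration. The 44,100 Hz rate and the 2-second duration below are assumed values for illustration, not figures from the course.

```python
def num_samples(sample_rate_hz: int, duration_s: float) -> int:
    """Total discrete points recorded = samples per second * seconds."""
    return int(sample_rate_hz * duration_s)

print(num_samples(44100, 2))  # two seconds at an assumed 44,100 samples per second
```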
44
what are the key concepts of acoustic phonetics?
Frequency, Amplitude, Formant
45
what is frequency?
cycles per second (Hz); from an auditory perspective it is heard as pitch (high and low notes)
46
What is Amplitude?
loudness
47
what is formant?
a concentration of acoustic energy around a particular frequency in the speech wave
48
how can f1 identify a vowel?
F1 corresponds to the height of the vowel, openness of the mouth
49
how can f2 identify a vowel?
F2 corresponds to the frontness or backness of the vowel, position of the tongue
50
why is spoken language harder to ‘adapt’ for computers in comparison to written language?
Different vocal tracts; dental alignment and oral anatomy; different pronunciations; dialects and variations; speech sound disorders
51
what is ASR
automatic speech recognition: processing of human speech into a written format
52
What is used to train a machine learning-based ASR system (what it learns from)?
We give it audio input; the computer looks at the spectrogram (frequency in Hz and formants) and learns from it
53
how did speech recognition work before machine learning
Matching spectrogram data against templates. The machines were speaker-dependent
54
why are ASR technologies important for endangered language documentation?
there's a lack of textual data, so ASR converts speech data into text
55
what is parametric speech synthesis
speech is based on pitch, duration and formants
56
what is neural speech synthesis
speech is based on raw audio waveforms from text
57
what are the four approaches in computational linguistics?
Rule-based approach; Statistical approach; Machine learning approach; Hybrid approach
58
what are three reasons why consistent spelling is important?
Faster reading; Efficient communication; Easy access to information;
59
what are the 3 types of spelling error?
typos, nonword errors, & real word errors
60
whats a typographical error?
we pressed the wrong key
61
what's a nonword error?
a string that is not a word in the dictionary: misspelled words, unrecognized names, insertions and deletions, phonetic spellings
62
what is a morpheme?
The smallest meaningful unit
63
whats a free morpheme
they can stand alone as independent words. They don't need to be attached to other morphemes, e.g., cat
64
whats a bound morpheme?
cannot stand alone as independent words. They must be attached to a free morpheme (a base or root word) (un-, unhappy)
65
whats an inflectional affix
a segment will attach to the word but it won't change the word type, e.g., like --> likes is still a verb
66
What is a derivational affix?
a segment will attach to the word and it usually WILL change the word type or meaning, e.g., happy --> happiness turns an adjective into a noun
67
What is the correct order of the spell-checker workflow?
1. text processing 2. non word error detection 3. generation of candidates 4. suggestions 5. user decision or auto correct
68
what is tokenization?
splitting a text into words;
69
what is stemming?
removing inflectional suffixes
70
what are the 2 Possible Causes of Spelling Errors?
Language-specific issue, & Technology-related factors
71
what is POS tagging
labelling each word with its type (part of speech), e.g., noun, verb, adjective
72
what's an example of user input?
the full sentence that you type in, e.g., this cat is bigger than mine
73
whats an example of tokenization
full sentence into individual words
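A minimal tokenization sketch using the example sentence from these cards. The rule used here (lowercase the text, then keep runs of letters, digits and apostrophes) is an assumption for illustration; real tokenizers handle punctuation and other languages with more care.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("This cat is bigger than mine."))
```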
74
whats an example of stemming
removing inflectional suffixes - this cat be big than i
75
what are two reasons why dictionary methods of spell-checking are not always the most effective?
1. Long wordlists, and new words keep being added. 2. Unit of entry: it is unclear whether different forms of a word need separate entries (cat --> cats)
76
whats an n-gram?
N-grams are sequences of "n" items from a given text or speech. These items can be words, syllables, letters, or phonemes.
77
How do you count the number of word/character n-grams
Identify N: Decide on the value of "n" (e.g., 2 for bigrams, 3 for trigrams). Split the text: Break the sentence or paragraph into individual words. Form the N-grams: Group the words in sequences of "n".
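The steps above can be sketched as follows. A sequence of k items contains k - n + 1 n-grams (for k at least n). The helper name `ngrams` is made up for this example, and the same function works for words or characters.

```python
from collections import Counter

def ngrams(items, n):
    """Slide a window of size n over the items; yields len(items) - n + 1 groups."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = "this cat is bigger than mine".split()  # 6 words
word_bigrams = ngrams(words, 2)
char_trigrams = ngrams("cats", 3)

print(len(word_bigrams), word_bigrams)
print(len(char_trigrams), char_trigrams)

# Counting how often each n-gram occurs in a longer text:
counts = Counter(ngrams("the cat saw the cat".split(), 2))
print(counts[("the", "cat")])
```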
78
what does the Soundex system do?
words that sound alike get the same key and go in the same bin; for a misspelt word, candidates with the same key are pulled from that bin
79
how do you convert a word to soundex
use the calculator or ask chatgpt
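For reference, here is a sketch of the conversion done by hand. It assumes the common American Soundex rules (keep the first letter, code the remaining consonants as digits, skip vowels, collapse repeated codes, pad to four characters); the course's calculator may use a slightly different variant.

```python
# Consonant groups and their Soundex digits.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit

def soundex(word: str) -> str:
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()        # the first letter is kept as a letter
    prev = CODES.get(word[0], "")   # but its code still blocks a repeat
    for ch in word[1:]:
        if ch in "hw":
            continue                # h and w do not separate equal codes
        code = CODES.get(ch, "")    # vowels (and y) have no code
        if code and code != prev:
            result += code
        prev = code                 # a vowel resets prev, allowing a repeat
    return (result + "000")[:4]     # pad or cut to letter + 3 digits

for name in ["Robert", "Rupert", "Tymczak", "Pfister"]:
    print(name, soundex(name))
```

Names that share a key, such as Robert and Rupert, land in the same bin, which is what lets a spell-checker pull similar-sounding candidates.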
80
how does the confusion matrix work?
A confusion matrix is a visualization of how well a classification model is performing. It shows the actual vs. predicted results for your model, helping you see where it's making correct predictions and where it's getting things wrong.
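A toy sketch of the actual vs. predicted idea. The labels and the two lists are invented data for illustration; each cell of the matrix is just a count of one (actual, predicted) pair.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))

# Hypothetical labels from a spell-checker deciding if a word is an error.
actual    = ["error", "error", "ok", "ok", "ok"]
predicted = ["error", "ok",    "ok", "ok", "error"]

matrix = confusion_matrix(actual, predicted)
for a in ["error", "ok"]:
    print(a, [matrix[(a, p)] for p in ["error", "ok"]])

# Correct predictions sit on the diagonal (actual == predicted).
accuracy = sum(n for (a, p), n in matrix.items() if a == p) / len(actual)
print(accuracy)
```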
81
what are the rules for edit distance?
substitution = 1, deletion = 1, insertion = 1, transposition = 2
82
3 possible operations in dynamic programming are....
delete, insert, substitute
83
what is the goal of the dynamic programming method?
Technical solution to finding the most efficient route
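A sketch of how dynamic programming finds the cheapest route between two words. It assumes the three operations each cost 1, so swapping two neighbouring letters comes out as 2; the function name `edit_distance` is only illustrative.

```python
def edit_distance(source: str, target: str) -> int:
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i              # delete every letter of source[:i]
    for j in range(cols):
        dist[0][j] = j              # insert every letter of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # delete
                dist[i][j - 1] + 1,             # insert
                dist[i - 1][j - 1] + sub_cost,  # substitute (or match)
            )
    return dist[-1][-1]

print(edit_distance("teh", "the"))
print(edit_distance("kitten", "sitting"))
```

Each cell reuses the answers for shorter prefixes instead of trying every possible route again, which is what makes the method efficient.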
84
what is a real word error?
a real word error is a word that's spelt correctly but the meaning isn't right, e.g., "their is 4 swans" (their instead of there)
85
why are real word mistakes more difficult for computers to fix than non-word mistakes?
because real word errors are spelt correctly but their intended meaning is wrong
86
whats a syntactic tree?
a syntactic tree is a way of organizing a sentence into phrasal categories
87
what are the 2 techniques that grammar checkers use?
relaxation-based techniques and mal-rules
88
what is a relaxation-based technique for grammar checking?
grammar rules are relaxed so the checker can still analyze a sentence with mistakes in it, typically improper use of verbs/nouns (e.g., agreement)
89
what is the mal-rule technique for grammar checking?
a person enters rules that describe specific mistakes into the computer, and the computer flags text that matches those rules
90
why does mal rule suck
because you have to enter all the rules by hand
91
how do you calculate probability?
look at slide 19 on 6.2
92
what is wordnet?
WordNet is a lexical database that records the SEMANTIC relationships between words
93
what is a learner’s language corpus
collection of written or spoken texts produced by language learners used to study their language patterns, errors, and development.
94
what are 2 reasons why large language models (LLM) are better in real word mistake detection?
they have a better understanding of context, can catch agreement mistakes between clauses and can adapt to writing preferences
95
what is call?
Computer-Assisted Language Learning
96
what is icall?
ICALL (Intelligent CALL) uses linguistic properties to make CALL better
97
what is a frame-based call system?
anything multiple choice or fill in the blank
98
what is a positive transfer model?
syntactically, the language being learned is similar to the known language
99
what is a negative transfer model?
syntactically, the language being learned is NOT similar to the known language, and learners try to apply the known language's rules when speaking or using the new language