Summary Flashcards

1
Q

Most fundamental qualities of sound

A

Pitch (frequency, which is inversely related to wavelength) and loudness (amplitude)

2
Q

The larynx is formed by 4 main cartilages

A
  • Thyroid
  • Cricoid
  • 2 Arytenoids
3
Q

Vocal folds and Vocal Tract

A

Vocal folds are two bands of muscle that are located within the larynx (voice box). They vibrate when air is pushed through them, producing sound.

The vocal tract is the area of the body which includes the vocal folds and all of the other structures involved in producing sound, such as the mouth, nose, and throat.

4
Q

When do the vocal folds shorten and when do they lengthen?

A

Shorten: the thyroarytenoid muscle contracts –> the arytenoids slide towards the thyroid –> the distance between the vocal processes and the thyroid prominence decreases

Lengthen: the cricothyroid muscle contracts –> the thyroid and cricoid rotate relative to each other –> the distance between the vocal processes and the thyroid prominence increases

5
Q

What does contraction of the laryngeal muscles (which move the cartilages) do?

A

changes the length of the vocal folds, and produces abduction (vocal folds move apart) and adduction (vocal folds move together)

6
Q

The 3 systems involved in speech production

A
  1. sub-glottal system (initiation phase –> breathing)
  2. glottal system (phonation phase –> Bernoulli effect and movement of the laryngeal cartilages)
  3. supra-glottal system (articulation phase –> oral and pharyngeal cavities)
7
Q

Characterizing vowels and consonants

A

vowels:
- tongue location (front, central, back) –> front means higher F2
- tongue height (high, mid, low) –> high means lower F1
- lip position (rounded or unrounded)

consonants:
- place of articulation
- manner of articulation
- voicing (voiced or voiceless)

8
Q

speech characteristics

A
  1. Periodicity –> voiced
  2. local maximum –> vowel
  3. silence and pre voicing –> plosive
  4. noise –> fricatives
  5. burst –> plosive
  6. change in amplitude –> change in sound
  7. change in sound structure –> change in mouth position
9
Q

Coarticulation

A

the influence of neighboring sounds on each other's articulation, so that one sound blends into the next
- anticipatory (u influences word onset in stew)
- carryover (u influences consonant in use)

10
Q

prosodic features
- properties of larger units of speech that reflect elements of language not encoded by grammar or choice of vocabulary
- To convey meaning and emotion

A
  • intonation (use of pitch to convey meaning in speech)
  • stress (emphasis placed on certain syllables of a word or phrase)
  • Tone (the emotion or attitude in speech)
11
Q

Two parts of the Fourier spectrum

A
  • Amplitude spectrum
  • Phase spectrum
12
Q

Fourier transform

A

The Fourier transform is a mathematical technique used to transform a signal from its time domain into its frequency domain.
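
A minimal sketch of this idea in Python, using only the standard library. The naive `dft` function and the test signal are illustrative assumptions; real code would normally call an FFT routine instead. It also shows the two parts of the spectrum: amplitude and phase.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform: one complex coefficient per frequency bin."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

# Hypothetical test signal: a cosine with amplitude 3 that completes
# exactly 2 cycles in 16 samples.
n_samples = 16
signal = [3.0 * math.cos(2 * math.pi * 2 * t / n_samples) for t in range(n_samples)]

spectrum = dft(signal)
amplitude = [abs(c) / n_samples for c in spectrum]  # amplitude spectrum
phase = [cmath.phase(c) for c in spectrum]          # phase spectrum (radians)

# The energy sits in bin 2 and in its mirror image, bin 14 (half of 3 in each).
peak_bin = max(range(n_samples // 2), key=lambda k: amplitude[k])
print(peak_bin, round(amplitude[peak_bin], 3))  # 2 1.5
```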

13
Q

Explain briefly how the functionality of the cochlea is similar to Fourier Analysis

A

The cochlea resembles Fourier analysis in that it breaks a sound down into its frequency components. Different places along the basilar membrane respond to different frequencies (high frequencies near the base, low frequencies near the apex), so the cochlea effectively separates the sound into frequency bands. Hair cells at each place then convert the vibration into electrical signals, allowing the auditory system to interpret the sound.

14
Q

Path of sound

A

Ear canal –> eardrum –> ossicles –> cochlea

15
Q

Three small bones (ossicles) in middle ear and function

A

Malleus, incus and stapes
to transmit tiny sound vibrations to the cochlea

16
Q

Function and parts of inner ear

A
  • Cochlea
  • Basilar membrane
  • oval window and round window are openings

responsible for converting sound waves into electrical signals that can be interpreted by the brain

The cochlea also acts as a bank of band-pass filters: each place along the basilar membrane responds best to its own band of frequencies.

17
Q

Outer ear, parts and function

A
  • auricle (outside)
  • ear canal (connects to middle ear)

funneling the acoustic wave into ear canal

18
Q

middle ear, parts and function

A

transfers vibrations of air particles into vibrations of mechanical structures

  • Eardrum
  • ossicles (malleus, incus, stapes)
19
Q

What does the acoustic reflex do?

A

The stapedius muscle spans the space between the stapes and the wall of the middle ear; when it contracts, it reduces the motion of the stapes

  • protects ear from loud noise
20
Q

Otitis media with effusion

A

An infection in which the middle-ear cavity fills up with fluid, so that the middle ear can no longer act as an impedance bridge between the air-filled ear canal and the fluid-filled cochlea.

21
Q

Mel scale frequency

A

a perceptual frequency scale (roughly linear below about 1 kHz and logarithmic above) used to measure the perceived pitch of a sound
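
As an illustration, one widely used conversion formula is mel = 2595 · log10(1 + f / 700). The sketch below assumes that particular formula (several variants exist, and the card does not say which one the course uses); the function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (assumed formula, see lead-in)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The same 100 Hz step is a large perceived change at low frequencies
# and a small one at high frequencies.
low_step = hz_to_mel(200) - hz_to_mel(100)
high_step = hz_to_mel(4100) - hz_to_mel(4000)
print(round(low_step, 1), round(high_step, 1))
```

With this formula 1000 Hz maps to roughly 1000 mel, which is how the scale is anchored.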

22
Q

Basic idea of Fourier transform

A

any signal can be approximated by sum of cosines
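
A small sketch of this idea, assuming a square wave as the example signal (the choice of signal and the function names are illustrative). An even square wave has the cosine series (4/π)(cos t − cos 3t / 3 + cos 5t / 5 − …), and the approximation improves as more cosines are added.

```python
import math

def square_wave_approx(t, n_terms):
    """Partial Fourier series of an even square wave: a weighted sum of cosines."""
    total = 0.0
    for i in range(n_terms):
        k = 2 * i + 1                    # only odd harmonics contribute
        sign = -1.0 if i % 2 else 1.0    # the weights alternate in sign
        total += sign * math.cos(k * t) / k
    return 4.0 / math.pi * total

def mean_abs_error(n_terms, n_points=1000):
    """Average distance between the approximation and the true square wave."""
    err = 0.0
    for j in range(n_points):
        t = -math.pi + 2 * math.pi * (j + 0.5) / n_points
        target = 1.0 if abs(t) < math.pi / 2 else -1.0
        err += abs(square_wave_approx(t, n_terms) - target)
    return err / n_points

for n in (1, 5, 50):
    print(n, round(mean_abs_error(n), 4))  # error shrinks as n grows
```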

23
Q

VoCoder

A
  • Encoder coding the speech
  • Decoder re-synthesizing speech

technique for coding speech more efficiently for long-distance phone calls

24
Q

A3 Scrambling

A

to encode longer distance radio-telephone calls
- frequency bands were rearranged and inverted
- intercepted and decoded by Germans
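
The inversion step can be illustrated digitally. This is only a sketch of frequency inversion in general, not a reconstruction of the A3 equipment, and the rearranging of bands is left out. It relies on the fact that negating every other sample mirrors the spectrum, so a low tone becomes a high one; all function names are hypothetical.

```python
import cmath
import math

def invert_spectrum(samples):
    """Mirror the spectrum by multiplying sample n by (-1)^n."""
    return [x if n % 2 == 0 else -x for n, x in enumerate(samples)]

def strongest_bin(samples):
    """Index of the strongest frequency bin between 0 and half the sample rate."""
    n = len(samples)
    mags = [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]
    return max(range(len(mags)), key=lambda k: mags[k])

# Hypothetical test tone: 3 cycles in 32 samples (bin 3 of 0..16).
tone = [math.cos(2 * math.pi * 3 * t / 32) for t in range(32)]
scrambled = invert_spectrum(tone)

print(strongest_bin(tone), strongest_bin(scrambled))  # 3 13
```

Applying `invert_spectrum` a second time restores the original samples, which also shows why inversion alone is weak protection: anyone who knows the trick can undo it.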

25
Q

SIGSALY (Project X or Green Hornet)

A

based on the Vocoder
- encryption key: white noise stored on 2 vinyl phonograph records
- special turntables to keep the records synchronized in time

26
Q

Concatenation

A

process of splicing together pieces of pre-recorded speech

27
Q

Signal processing modification

A

process of changing a pre-recorded signal to produce a desired sound

28
Q

Advantages and disadvantages of concatenation

A

Advantages:
- ability to produce natural-sounding speech
- flexibility in creating new words
- speed of production

Disadvantages:
- lack of control over the sound of the speech
- susceptibility to errors
- inability to produce continuous speech

29
Q

Advantages and disadvantages of signal processing modification

A

Advantages:
- gives a greater degree of control over the sound of the output

Disadvantages:
- more computationally intensive
- more difficult to create new words or phrases with this technique

30
Q

Challenges for speech perception

A
  1. Lack of invariance problem
    • phonetic environment
    • differing speech conditions (tempo)
    • speaker variation (dialects)
  2. perceptual constancy and normalization
    • ability to recognize and interpret speech sounds regardless of the context
    • map signals to independent category
  3. speech segmentation problem
    • difficult to identify and segment individual speech sounds
31
Q

First generation speech synthesis

A

generated by explicit model

  • articulatory synthesis –> uses physiological models that simulate the movement of the vocal tract and articulators.
  • source-filter models –> two components combined, a source (vocal folds) with a filter (vocal tract)
  • formant synthesizers –> source-filter synthesizers whose filter is a set of resonators tuned to the formant frequencies, controlled by rules rather than recordings
32
Q

Cochlear implants (application of the vocoder principle behind SIGSALY)

A

neuroprosthetic device that bypasses the normal acoustic hearing process by electrically stimulating the auditory nerve

33
Q

Generations of speech synthesis

A

first –> source waveform is generated by explicit model
second –> source waveform is generated by data
third –> source waveform is learned from the data

34
Q

second generation speech synthesis

A

tradeoff between processing speed and memory
- model based
- sample based

35
Q

third generation of speech synthesis

A

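input is Mel frequency cepstral coefficients (MFCCs), computed as follows:
- divide the signal into frames of 20-40 ms
- compute the power spectrum of each frame
- apply a mel filter bank (determine the filter bank energies)
- take the log of the energies
- compute the discrete cosine transform (DCT)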
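
The steps above can be sketched in plain Python. This is a simplified illustration under several assumptions that are not on the card: non-overlapping frames without windowing, a naive DFT for the power spectrum, the mel formula 2595 · log10(1 + f / 700), triangular filters, and hypothetical function names and parameter values. Practical implementations use overlapping windowed frames and an FFT.

```python
import cmath
import math

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def split_frames(signal, sample_rate, frame_ms=25):
    """Step 1: cut the signal into non-overlapping frames of frame_ms milliseconds."""
    size = int(sample_rate * frame_ms / 1000)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def power_spectrum(frame):
    """Power per frequency bin (naive DFT, bins 0 .. n/2)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2 / n
        for k in range(n // 2 + 1)
    ]

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / sample_rate) for f in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):      # rising slope
            filt[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # falling slope
            filt[k] = (hi - k) / (hi - mid)
        bank.append(filt)
    return bank

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
        for c in range(n_coeffs)
    ]

def mfcc_frame(frame, sample_rate, n_filters=10, n_coeffs=5):
    spec = power_spectrum(frame)
    bank = mel_filter_bank(n_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]
    log_energies = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return dct2(log_energies, n_coeffs)

# Hypothetical input: 100 ms of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]
features = [mfcc_frame(f, sample_rate) for f in split_frames(signal, sample_rate)]
print(len(features), len(features[0]))  # 4 frames, 5 coefficients each
```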

36
Q

Unit selection

A
  • Generating speech using a database of pre-recorded speech samples and selecting the most appropriate units of speech from the database

++ more natural speech
– less generalizable and more recordings needed

37
Q

diphones

A

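a unit that runs from the middle of one phone to the middle of the next, capturing the transition between two adjacent phones; diphones are combined to form words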
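
A small sketch of how a phone sequence maps onto diphone units. The phone symbols, the `_` silence marker and the function name are illustrative assumptions, not a fixed notation.

```python
def to_diphones(phones):
    """Pair each phone with its successor, padding with silence ('_') at both ends."""
    padded = ["_"] + list(phones) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

# Hypothetical phone sequence for the word "cat".
print(to_diphones(["k", "ae", "t"]))  # ['_-k', 'k-ae', 'ae-t', 't-_']
```

A synthesizer would then look up a recording for each of these units and concatenate them.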

38
Q

advantages and disadvantages of third generation speech synthesis

A

Advantages:
- trained automatically, so hand-written rules are avoided
- high-quality synthesis and compact

Disadvantages:
- speech has to be generated by a parametric model, so the final quality depends on the parameter-to-speech technique used

39
Q

applications of text to speech

A
  1. people with visual impairments to listen to text
  2. listening to text while driving
  3. travel information in public transport
40
Q

components of a text to speech synthesizer

A
  • text analysis
    • identify tokens
    • tokenizing (split in smaller chunks)
    • normalization (determine spoken variant of each token)
  • linguistic analysis
    • phonemes
    • prosodic information (intonation, duration, stress, rhythm)
  • waveform generation (first, second or third generation synthesis)
41
Q

Corpus

A

a collection of texts with some unifying characteristics

42
Q

regular expression

A

sequence of characters that define a search pattern in strings of text such as words, phrases and numbers
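
A few illustrative patterns using Python's built-in `re` module. The example sentence and the patterns are made up for this sketch, and each is deliberately simple rather than robust.

```python
import re

text = "Call 555-0142 before 17:30 on 3 May 2024."

# \d matches a digit, {n} repeats n times, \b marks a word boundary.
phones = re.findall(r"\b\d{3}-\d{4}\b", text)
times = re.findall(r"\b\d{1,2}:\d{2}\b", text)
years = re.findall(r"\b(?:19|20)\d{2}\b", text)

# [A-Z] is a character class; + means "one or more".
capitalized = re.findall(r"\b[A-Z][a-z]+\b", text)

print(phones, times, years, capitalized)
```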

43
Q

Major uses of corpora?

A
  • applicative (develop nlp tools)
  • analytical (empirical basis on the distribution of constructions and language phenomena)
44
Q

what regular expressions are used for

A
  • normalizing text (converting it to a standard form)
  • tokenization (splitting text into words)
  • lemmatization (mapping words that share a root to their dictionary form)
  • stemming (stripping affixes to get a crude root)
  • sentence segmentation (splitting text into sentences)
  • comparing words and strings
44
Q

dimensions of variation

A
  • multiple languages (code switching)
  • genre (source of the text)
  • demographic characteristics of the writer
  • language changes over time
45
Q

datasheet properties

A

motivation
situation
language variety
collection process
annotation process
distribution

46
Q

normalization process

A
  1. tokenizing
    • token learner
    • token segmenter
  2. normalizing word formats
    • case folding (lower case)
    • lemmatization
    • morphological parsing
    • stemming
  3. segmenting sentences
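
The steps above can be sketched with simple rules. This is a toy illustration: the regular expressions, the suffix list in `crude_stem` and all function names are assumptions for the example, and real systems use trained tokenizers and proper stemmers or lemmatizers.

```python
import re

def segment_sentences(text):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Words (with an optional internal apostrophe) or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

def case_fold(tokens):
    return [t.lower() for t in tokens]

def crude_stem(token):
    """Toy suffix stripping; far simpler than a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

text = "The cats were chasing mice. Dogs barked!"
for sentence in segment_sentences(text):
    tokens = case_fold(tokenize(sentence))
    print([crude_stem(t) for t in tokens])
```

Note that stemming produces non-words such as "chas", whereas lemmatization would return the dictionary form "chase".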
47
Q

Homophones and homographs

A

homophones –> same sound, different spelling (e.g. two and too)

homographs –> same spelling, different sound (e.g. lead the metal and to lead)

48
Q

Semantic relations

A

synonymy, antonymy, hypernymy/hyponymy, meronymy/holonymy, co-hyponyms

49
Q

synonymy

A

house - villa

same sense, different word

50
Q

antonymy

A

good - bad (opposite meaning)

51
Q

hypernymy/ hyponymy

A

“dog” is a hyponym of the word “animal”, and “animal” is a hypernym of “dog”,
because animal is less specific

52
Q

meronymy / holonymy

A

finger is a meronym of hand because it is a part of the hand

hand is the holonym of finger because it is the whole

54
Q

co-hyponyms

A

cat and dog are co-hyponyms because both are hyponyms of the word animal

55
Q

associated words

A

cup and coffee are associated because they belong to the same semantic field

56
Q

Connotation / evaluation

A

positive (happy) negative (sad) connotation

pos (great). neg (terrible) evaluation

57
Q

important dimensions of affective meaning

A

1 valence (negative or positive)
2 arousal (excited or not)
3 dominance (control or not)

58
Q

sentiment

A

positive or negative evaluation expressed in language

59
Q

the two most commonly used models in vector semantics

A

tf-idf and word2vec

60
Q

tf-idf

A

measure the importance of a term in a document relative to other documents in a corpus
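
A minimal sketch, assuming one common variant: term frequency as a proportion of the document's tokens, multiplied by log10(N / document frequency). Other weightings exist (for example log-scaled term frequency); the toy documents and the function name are made up for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log10(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
weights = tf_idf(docs)

# "the" occurs in every document, so its weight is 0;
# "mat" occurs only in the first document, so it scores highest there.
print(weights[0]["the"], round(weights[0]["mat"], 4), round(weights[0]["cat"], 4))
```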

61
Q

word2vec

A

methods used to represent words in a vector space in order to capture semantic and syntactic relationships between words

62
Q

cosine similarity

A

measure of similarity between two vectors, which is calculated by taking the cosine of the angle between the vectors
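
As a formula, this is the dot product of the two vectors divided by the product of their lengths. A minimal sketch (the function name and example vectors are illustrative):

```python
import math

def cosine_similarity(u, v):
    """cos(angle) = (u . v) / (|u| |v|); undefined if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction: close to 1
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal: 0
```

Because only the angle matters, a vector and a scaled copy of it have similarity 1, which makes the measure insensitive to raw word frequency.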

63
Q

PMI (pointwise mutual information)

A

measures whether a word appears together with another word more often than expected if the two were independent
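
A minimal sketch from raw counts, assuming the usual definition PMI(w, c) = log2(P(w, c) / (P(w) · P(c))). The function below computes the positive variant (PPMI), which replaces negative values with 0; the function name and the counts are made up for the example.

```python
import math

def ppmi(pair_count, word_count, context_count, total):
    """Positive PMI from counts: max(0, log2(P(w, c) / (P(w) * P(c))))."""
    if pair_count == 0:
        return 0.0
    p_wc = pair_count / total
    p_w = word_count / total
    p_c = context_count / total
    return max(0.0, math.log2(p_wc / (p_w * p_c)))

# Hypothetical counts out of 1000 observed word-context pairs.
print(round(ppmi(10, 20, 50, 1000), 3))   # co-occur 10x more than chance: 3.322
print(ppmi(1, 100, 100, 1000))            # less than chance: clipped to 0.0
```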

64
Q

Skip-gram vs CBOW

A

two word2vec methods used to learn vector representations of words

  • CBOW predicts a target word given a set of context words
  • Skip-gram predicts the context words given a target word
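
The difference is easiest to see in the training examples each method builds from the same text: skip-gram takes the target word as input and predicts each context word, while CBOW takes the context words as input and predicts the target. The sketch below only generates those examples (no model is trained); the function names and the window size are assumptions for the illustration.

```python
def context_window(tokens, i, size):
    """Words within `size` positions of tokens[i], excluding tokens[i] itself."""
    return tokens[max(0, i - size):i] + tokens[i + 1:i + 1 + size]

def skipgram_pairs(tokens, size=1):
    """(input: target word, output: one context word) pairs."""
    return [
        (tokens[i], c)
        for i in range(len(tokens))
        for c in context_window(tokens, i, size)
    ]

def cbow_examples(tokens, size=1):
    """(input: all context words, output: target word) examples."""
    return [(context_window(tokens, i, size), tokens[i]) for i in range(len(tokens))]

tokens = "she wrote a book".split()
print(skipgram_pairs(tokens))
print(cbow_examples(tokens))
```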
65
Q

two kinds of similarity

A

first-order co-occurrence (wrote and book):
the words occur near each other
second-order co-occurrence (wrote and said):
the words have similar neighbors

66
Q

aims when identifying opinions

A
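  1. SO polarity (is the text subjective or objective?)
  2. PN polarity (is the opinion positive or negative?)
  3. strength of PN polarity (how positive or negative is it?)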
  4. extracting opinions
67
Q

Balanced corpus

A

big in size
mixed language
full texts
different domains and genres
range of text categories
well documented

68
Q

classifying corpora

A

1 mode (written, spoken, mixed…)
2 representativeness (balanced, specialized)
3 time (diachronic, synchronic)
4 language (mono, multi, parallel, comparable)
5 sampling (full documents, sample)
6 mark up (raw, annotated)