LING330: Quiz #5 Flashcards
What do you have to do when you have interacting forces?
Add amplitudes of positive forces
Subtract amplitudes of the negative forces
What info do we have to know to define a sinusoid wave?
Frequency + amplitude (not phase, as it's not important for speech sounds)
A graph of frequency and amplitude of different waves (with the algebraic sum of the waves) is called a…
Spectrum
Is a complex wave sinusoidal?
No, but it is periodic, as its cycle repeats regularly (in a pattern)
What is the basic frequency (the rate at which the pattern repeats) of a complex wave called?
Fundamental frequency (f0)
What determines the pitch of the sound wave?
Its fundamental frequency (f0)
Harmonics
Component frequencies; their different frequencies and amplitudes are what give a sound its quality (why the same note on a piano and a violin sounds different)
Fundamental frequency is always equal to…
The greatest common factor of the component frequencies (this is also the point where the cycles of the different component waves line up and start over together)
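A minimal Python sketch of this relationship (the function name is my own, not from the course):

```python
from functools import reduce
from math import gcd

def fundamental_frequency(components):
    """f0 of a complex wave = the greatest common factor
    of its component frequencies (in Hz)."""
    return reduce(gcd, components)

# Components at 200, 300, and 500 Hz share a fundamental of 100 Hz:
print(fundamental_frequency([200, 300, 500]))  # -> 100
```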
**the more sinusoids you add together, the faster-changing and more complex a pattern you can create
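The algebraic summing of sinusoids can be sketched like this (frequencies and amplitudes below are illustrative, not from the course):

```python
import math

def complex_wave(t, components):
    """Algebraically sum sinusoidal components at time t (seconds).
    components: list of (frequency_hz, amplitude) pairs."""
    return sum(a * math.sin(2 * math.pi * f * t) for f, a in components)

# A 100 Hz + 200 Hz mix repeats every 1/100 s (its fundamental period),
# so the value at t and at t + 0.01 s is the same:
comps = [(100, 1.0), (200, 0.5)]
wave = [complex_wave(n / 8000, comps) for n in range(160)]
```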
General description of how sound-recording devices work
Transfer patterns of speech vibration from the air to a more durable medium
Describe Edison’s phonograph (1877)
Used a stylus to magnify sound waves that came from sound vibrations and etch them into a revolving wax cylinder (later onto a plastic disk)
Stylus ran back over grooves=vibration replicated
Magnetic recorders (invented in 1898 and continued to improve over following decades)
Microphone membrane converts sound vibrations into voltage variation in an electric current
This electric current was then used to create a varying magnetic field
Metal wire or tape passed through the field was magnetized in the corresponding pattern
Playback=running tape back through the magnetic “heads” of the recorder, which converted the magnetic field back to electricity back to membrane variations in a speaker
THUS specific sound events could be preserved and replayed
Kymograph
Kymo=Greek word for wave
Talker speaks into mask connected to a tube
Other end of tube=pressure sensitive membrane connected to a stylus
Air pressure variations of speech caused the membrane and stylus to vibrate
Stylus rested on revolving drum covered with smoked paper
As drum revolved, stylus etched out a white line that directly recorded air pressure variations
What is and what isn't visible on Jones's kymograph?
Can see:
Periodic vibrations of vowels
Weak vibration of voiced sounds
Differences in duration of sounds
Can’t see:
Complexity of vocalic waveform
Oscilloscopes and sound spectrographs
Like tape recorders, used a microphone to transfer patterns of vibration in the air into patterns of variation in electrical current
Sound spectrograph:
Used principle similar to Edison’s revolving wax cylinder, but used an electronic filter to separate frequency bands
-could only analyze about 2 seconds of speech at a time (2 or 3 words)
-short speech sample recorded onto magnetic disk then sample replayed multiple times
-each time sample replayed = output passed through variable electronic filter (set to let pass only a specific range of frequencies, like the bass/treble knob on a radio)
-instead of releasing sound, output of electronic filter fed into a moving electric stylus that would etch a dark line onto chemically treated paper attached to a revolving drum
-darkness of burned line=amount of electricity coming through filter=amount of speech energy within that specific frequency range
How was acoustic analysis done at the beginning of the 21st century?
By computer (fast, accurate, easily portable on a laptop). Disadvantage: computers can't handle analog signals directly; speech waves=analog signals, and computers can only process info represented digitally (as numbers)
Analog signal
Continuously varying wave (like the second hand sweeping smoothly around an old-fashioned clock face)
Speech waves=analog signals
Computers can’t process these
How is analog to digital (A to D) conversion done?
Through SAMPLING
Aka taking repeated measurements at regular intervals (ex: collecting temp every hour and connecting the dots, approximating the analog wave)
In speech sampling: microphone converts sound pressure wave into variation in electric current (with strength of current proportional to air pressure)
-sound card component in computer=measures the voltage of electric current at regular intervals and records the measurements
-record of measurements=digital representation of speech wave
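The A-to-D steps above can be simulated in a short Python sketch (the sampling rate and the 440 Hz "analog" signal are assumptions for illustration; a real system measures microphone voltage, not a math function):

```python
import math

SAMPLING_RATE = 8000  # samples per second (illustrative, not a course value)

def sample(signal, duration_s, rate=SAMPLING_RATE):
    """Measure an 'analog' signal (a function of time in seconds) at regular
    intervals; the resulting list of numbers is the digital representation."""
    n = int(duration_s * rate)
    return [signal(i / rate) for i in range(n)]

# A 440 Hz sine stands in for the varying voltage from a microphone:
digital = sample(lambda t: math.sin(2 * math.pi * 440 * t), 0.01)
print(len(digital))  # 0.01 s at 8000 samples/s -> 80 measurements
```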
What two questions must be addressed to get a high-quality signal for a sample?
1- how often to sample (SAMPLING RATE)
2- how precisely to measure (QUANTIZATION)
The higher the sampling rate…
The more info the digital representation will contain
How fast do you have to sample to detect the presence of a sinusoidal component in a complex wave?
At least TWICE as fast as the highest frequency you want to measure
This is because you have to capture a measurement twice within its period (once in positive phase and once in negative phase)
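A quick Python sketch of why one sample per period isn't enough (frequencies chosen for illustration): sampling a 1000 Hz sine at only 1000 Hz hits the same phase on every measurement, so the vibration is invisible; sampling faster than twice its frequency catches both phases.

```python
import math

def sample_sine(freq_hz, rate_hz, n=8):
    """Take n samples of a sine wave at the given sampling rate."""
    return [round(math.sin(2 * math.pi * freq_hz * i / rate_hz), 6)
            for i in range(n)]

# One sample per period: every measurement identical -> looks like silence
print(sample_sine(1000, 1000))
# Four samples per period: alternating values reveal the vibration
print(sample_sine(1000, 4000))
```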
The Nyquist limit
Highest frequency that can be captured at a given sampling rate
(Ex: Nyquist limit for a sampling rate of 44,000 Hz is 22,000 Hz)
Explain aliasing
If frequencies above the Nyquist limit are present in the signal being sampled, the wrong shape will appear: the info between the sampled points is lost, and the connected dots take the shape of a much lower frequency (a wave with a much longer period)
Basically: high frequency masquerades as a lower frequency
Result=distortion of digital signal
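This masquerading can be verified numerically (the 44,000 Hz rate matches the card's example; the 30,000 Hz tone is my own illustration):

```python
import math

RATE = 44000  # Hz; Nyquist limit = 22000 Hz

def samples(freq_hz, n=6):
    """First n sample values of a cosine at freq_hz, taken at RATE."""
    return [round(math.cos(2 * math.pi * freq_hz * i / RATE), 6)
            for i in range(n)]

# A 30,000 Hz tone (above the Nyquist limit) produces exactly the same
# sample values as a 14,000 Hz tone (44,000 - 30,000 = its alias):
print(samples(30000) == samples(14000))  # -> True
```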
What kind of wave do you get when you combine two simple sinusoids?
A complex wave
How is aliasing avoided?
By removing all frequencies above the Nyquist limit from the sound signal before the analog-to-digital conversion takes place
Done automatically by a program when the signal is at the electrical stage by passing the current through a low-pass filter (all frequencies above Nyquist limit are blocked)
Quantization
How precise each measurement is (the higher the precision, aka the more bits per sample, the more space it takes up on a computer)
Must decide how much rounding error can be accepted
Computer audio systems will default to 16 bits per sample, but 8 bits doesn’t sound bad
Why are digital recordings better quality than analog tape recordings?
In analog recordings: plastic or metal tape stretches and distorts + noise from turning cogs and the hiss of tape travelling through the heads could never be completely eliminated
Signal-to-noise ratio (SNR)
Goal is to maximize this in recordings
Recording should be as clean and clear as possible (more signal, less noise)
How do you get the most signal possible in a recording?
Take full advantage of the system’s DYNAMIC RANGE (adjust the range to match the sound)
Quantization error
Background noise in a recording due to representation of the continuous analog signal as a series of discrete levels
**the higher the bit rate, the more levels available and the lower the quantization error
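The bit-depth/noise trade-off can be sketched in Python (the quantizer and test signal are my own simplified illustration, not the course's formulation):

```python
import math

def quantize(x, bits):
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bits discrete levels."""
    levels = 2 ** (bits - 1)  # levels per polarity
    return round(x * levels) / levels

def max_error(bits, n=1000):
    """Largest rounding (quantization) error over one cycle of a sine."""
    return max(abs(math.sin(2 * math.pi * i / n)
                   - quantize(math.sin(2 * math.pi * i / n), bits))
               for i in range(n))

# More bits -> more levels -> smaller error -> less background noise:
print(max_error(8) > max_error(16))  # -> True
```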
Clipping
When the recording level is turned up too far for the dynamic range and the amplitude peaks are cut off in a recording
Result: distortion
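A toy Python sketch of clipping (the gain values and sample wave are illustrative assumptions):

```python
def record(samples, gain, limit=1.0):
    """Amplify samples; values outside the dynamic range are cut off flat."""
    return [max(-limit, min(limit, s * gain)) for s in samples]

wave = [0.0, 0.5, 0.9, 0.5, 0.0, -0.5, -0.9, -0.5]
# Gain turned up too far: the peaks flatten at the limit (distortion):
print(record(wave, gain=2.0))
# -> [0.0, 1.0, 1.0, 1.0, 0.0, -1.0, -1.0, -1.0]
```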
Factors to remember when recording to fully utilize the dynamic range without clipping
- outside noise in the environment
- speakers raise and lower their voices, turn their heads, shift their bodies
- papers crinkle with scripts
- *head-mounted microphone set to the side of the speaker's lips can reduce variation + watch the level meter
Uni-directional vs omni-directional microphones
Uni: designed for single talker
Omni: pick up sound from all directions so best for recording multiple speakers on one channel
The most basic representation of a speech file
A waveform
Aka a graph of changes in air pressure (amplitude) over time
What type of sound has the highest relative amplitude in a waveform?
Vowels (bc mouth is open)
Also complex repeating pattern (periodic)
**diffs between absolute amplitude in vowels of different waveforms are just due to variation in how loudly each utterance is spoken
Sonorant consonants like nasals and laterals look like vowels in waveforms. What's the diff?
Lower amplitude
Less complexity
Waveforms of voiced stops
Periodic
Lower amplitude than vowels and sonorants (sound of the vocal folds opening and closing beating through a closed vocal tract)
Transient burst when closure is released into the vowel
In American English: [b, d, g]=periodic energy dies down during the closure unless stop is between other voiced sounds
Voiceless fricatives (waveform)
No repeating pattern, appear as random noise
Strident fricatives=high amplitude
Non-strident=may have very low amplitude, may be hard to distinguish from voiceless stops (clue: fricatives are not followed by a burst)
Voiceless stops (waveform)
Easy to spot bc silent during closure phase (no amplitude) so appear as flat line in waveform (unless there is background noise)
Usually followed by a burst
Aspirated stops: followed by aspiration noise
Voiced fricatives (waveform)
Combine periodicity and noise
Voiced stops: periodicity can die out toward the end of the consonant
Marking off segments based on points of closure and release etc in a waveform
Segmentation
Speech analysis programs allow for this
How do you see the difference between aspirated and unaspirated consonants?
In differences in VOT (voice onset time) aka the amount of time that elapses between the release of the consonant and the onset of periodicity for the vowel
Spectral analysis
Allows us to analyze segment quality (allows us to quantify, visualize and analyze component frequencies and thus to quantify, visualize and analyze the details of sound quality)
Involves algorithms which mathematically analyze the signal in order to accomplish what the electronic filters in the sound spectrograph did: to test the strength of diff frequencies that might be present
Waveform of glottal vibration
A "sawtooth" wave, aka a steep upslope (because of the pressure increase when the vocal folds are blown open) followed by a gradual downslope (as they're pulled back together by the Bernoulli effect)
Periodic pattern
How do harmonic frequencies of vocal fold vibration relate to the fundamental frequency?
Harmonic frequencies will always occur at integer multiples of the fundamental frequency (the period of each sub-vibration has to fit exactly into the period of the fundamental)
Ex: voice with f0 of 100 Hz, harmonics occur at 200 Hz, 300 Hz, 400 Hz, etc.
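The integer-multiple relationship, and the next card's point about harmonic density, can be sketched in Python (the 1000 Hz ceiling is an arbitrary illustration):

```python
def harmonics(f0, ceiling_hz=1000):
    """Harmonic frequencies occur at integer multiples of the fundamental,
    listed here up to a ceiling."""
    return [n * f0 for n in range(1, ceiling_hz // f0 + 1)]

print(harmonics(100))  # -> [100, 200, ..., 1000]: ten harmonics below 1 kHz
print(harmonics(200))  # -> only five: a higher f0 means sparser harmonics
```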
The lower the voice = the more ___ the harmonics
The more dense the harmonics
Why are women’s voices disadvantaged in spectral analysis?
Typically higher f0 = fewer harmonics present + more breathiness = harmonics that are present may have lower amplitude (especially at higher frequencies)
White noise
APERIODIC sound
Pressure variations are totally random
Wide/broad-band spectrogram vs a narrow band spectrogram
Wide/broad-band spectrogram: formant frequencies (regions of high-amplitude energy, reflecting changes in resonance frequencies as vocal tract articulators change position) show up as broad bands; spectra are taken from short windows of the speech signal at frequent intervals, so changes over short time periods are evident
Narrow-band spectrogram: uses longer windows at less frequent intervals; individual harmonics can be distinguished, but the time dimension is less precise