Chapter 4: Data Processing Flashcards
Where does data come from?
Agent’s sensors
What does a higher sampling frequency mean?
Less information is lost between samples, so the signal is captured in more detail
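What is aliasing?
When a signal is sampled too slowly (below twice its highest frequency, the Nyquist rate), different signals produce identical samples. They become aliases of each other, and the original signal can no longer be reconstructed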
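A minimal sketch of aliasing, assuming a sampling rate of 10 Hz chosen only for illustration. A 9 Hz cosine is above the 5 Hz Nyquist frequency, so its samples are identical to those of a 1 Hz cosine.

```python
import math

FS = 10.0  # sampling frequency in Hz (illustrative choice)

def sample_cosine(freq_hz, n_samples, fs=FS):
    """Return n_samples of cos(2*pi*f*t) taken at t = n / fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]

low = sample_cosine(1.0, 20)   # 1 Hz: below the Nyquist frequency fs/2 = 5 Hz
high = sample_cosine(9.0, 20)  # 9 Hz: above Nyquist, aliases to |9 - 10| = 1 Hz

# The two sample sequences match to floating-point precision, so after
# sampling there is no way to tell which signal was the original.
max_diff = max(abs(a - b) for a, b in zip(low, high))
print(f"largest difference between samples: {max_diff:.2e}")
```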
What is noise?
Unwanted variation that obscures features in the data
What is an artifact?
Something in the data that makes it appear as if a feature exists when it does not
Is the DFT based on the idea that any signal in time can be seen as a sum of sine functions?
True
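A naive O(N^2) sketch of the DFT and its inverse (real code would use an FFT library). The test signal, made of two sines, is an illustrative assumption. The DFT shows energy only at those two frequencies (plus their mirror bins), and the inverse DFT rebuilds the signal as a sum of sinusoids.

```python
import cmath
import math

def dft(x):
    """Naive DFT: X[k] = sum_t x[t] * exp(-2j*pi*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT: rebuild the signal as a sum of complex sinusoids."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

N = 16
# Two sines: 3 cycles and 5 cycles per 16 samples
x = [math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 5 * t / N)
     for t in range(N)]

X = dft(x)
print([round(abs(c), 6) for c in X])  # non-zero only in bins 3, 5, 11 and 13

rebuilt = [c.real for c in idft(X)]
print(max(abs(a - b) for a, b in zip(x, rebuilt)))  # only floating-point round-off
```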
What is PSD (Power Spectral Density)?
Shows how the power of a signal is distributed across frequencies. It is proportional to the squared magnitude of the DFT.
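A sketch of the simplest PSD estimate, the periodogram, assuming the two-sided scaling P[k] = |X[k]|^2 / (fs * N) (other texts scale differently, e.g. one-sided). The 4 Hz test sine and 32 Hz sampling rate are illustrative. Summing the PSD over frequency recovers the mean power of the signal.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def periodogram(x, fs):
    """Two-sided periodogram: returns (frequencies in Hz, PSD in power per Hz)."""
    n = len(x)
    freqs = [k * fs / n for k in range(n)]
    psd = [abs(c) ** 2 / (fs * n) for c in dft(x)]
    return freqs, psd

fs = 32.0  # Hz, illustrative
x = [math.sin(2 * math.pi * 4 * t / fs) for t in range(32)]  # 4 Hz sine, 1 s long
freqs, psd = periodogram(x, fs)

# Look for the peak in the first half (the second half mirrors it)
peak = max(range(len(psd) // 2), key=lambda k: psd[k])
print(f"peak at {freqs[peak]} Hz")  # peak at 4.0 Hz

# PSD summed over frequency (times the bin spacing) equals the mean power
df = fs / len(x)
print(sum(psd) * df, sum(v * v for v in x) / len(x))
```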
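What is a modified periodogram?
A periodogram computed with a window other than the rectangular one, i.e. the window is not simply a vector of ones with the same length as the data (for example, a Hamming or Hann window)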
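A sketch of a modified periodogram, assuming a symmetric Hann window and normalization by the window's power (fs * sum(w^2)). With a rectangular window of ones this reduces to the ordinary periodogram. The 4.5 Hz test sine falls between DFT bins, so power leaks into other bins; the Hann window leaks far less at a distant bin.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def modified_periodogram(x, fs, window):
    """PSD of the windowed signal, normalised by the window's power.

    With window = [1.0] * len(x) this is the ordinary |X[k]|^2 / (fs * N)."""
    xw = [a * w for a, w in zip(x, window)]
    scale = fs * sum(w * w for w in window)
    return [abs(c) ** 2 / scale for c in dft(xw)]

fs, n = 32.0, 32
x = [math.sin(2 * math.pi * 4.5 * t / fs) for t in range(n)]  # between bins 4 and 5

rect = modified_periodogram(x, fs, [1.0] * n)
hann_psd = modified_periodogram(x, fs, hann(n))

# Far from the peak (bin 12 = 12 Hz) the Hann window leaks much less power
print(f"rectangular: {rect[12]:.2e}  hann: {hann_psd[12]:.2e}")
```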
What is Welch's method?
- Divide the signal into (usually overlapping) segments
- Compute a periodogram of each segment
- Average all the periodograms
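The three steps above can be sketched as follows, assuming Hann-windowed segments with 50% overlap (common defaults, not the only choice). On white noise with a fixed seed, averaging 15 segments gives a much flatter (lower-variance) estimate than a single periodogram of the whole signal.

```python
import cmath
import math
import random
import statistics

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def welch(x, fs, seg_len, overlap=0.5):
    """Two-sided Welch PSD estimate (a sketch)."""
    step = max(1, int(seg_len * (1 - overlap)))
    w = hann(seg_len)
    scale = fs * sum(v * v for v in w)
    psds = []
    # 1. split into overlapping segments, 2. periodogram of each windowed segment
    for start in range(0, len(x) - seg_len + 1, step):
        seg = [a * b for a, b in zip(x[start:start + seg_len], w)]
        psds.append([abs(c) ** 2 / scale for c in dft(seg)])
    # 3. average the periodograms bin by bin
    return [sum(p[k] for p in psds) / len(psds) for k in range(seg_len)]

def rel_spread(p):
    """Coefficient of variation across frequency bins (lower = smoother)."""
    return statistics.pstdev(p) / statistics.fmean(p)

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(256)]  # white noise: flat true PSD

raw = welch(noise, 1.0, seg_len=256)    # one segment = a single periodogram
smooth = welch(noise, 1.0, seg_len=32)  # 15 overlapping segments averaged
print(f"single periodogram: {rel_spread(raw):.2f}  welch: {rel_spread(smooth):.2f}")
```

Shorter segments mean more averaging (lower variance) but fewer frequency bins (coarser resolution).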
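What is the purpose of Welch's method?
A smoother PSD estimate with reduced variance (at the cost of frequency resolution, since each segment is shorter than the full signal)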
In what domain are filters applied?
Frequency domain
What is frequency response?
Describes how a filter or system affects signals in the frequency domain, in terms of its amplitude response and phase response
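A sketch that evaluates the frequency response of an FIR filter directly from its coefficients, H(f) = sum_n h[n] * exp(-j*2*pi*f*n/fs), and splits it into amplitude and phase. The 5-point moving average and the 100 Hz sampling rate are illustrative assumptions.

```python
import cmath
import math

def frequency_response(h, f, fs):
    """Return (amplitude |H(f)|, phase angle(H(f)) in radians) for FIR taps h."""
    H = sum(c * cmath.exp(-2j * math.pi * f * n / fs) for n, c in enumerate(h))
    return abs(H), cmath.phase(H)

fs = 100.0        # Hz, illustrative
h = [1 / 5] * 5   # 5-point moving average

for f in (0.0, 5.0, 10.0, 30.0):
    amp, phase = frequency_response(h, f, fs)
    print(f"{f:5.1f} Hz  amplitude {amp:.3f}  phase {phase:+.3f} rad")
```

The amplitude falls as frequency rises, and the phase changes linearly with frequency (a constant delay of 2 samples for this symmetric 5-tap filter).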
What is an FIR filter?
A Finite Impulse Response filter: its impulse response is finite because it has no feedback loop
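A minimal sketch of an FIR filter as a plain convolution, assuming the input is zero before the first sample. The 3-tap coefficients are illustrative. Feeding in a single impulse returns the coefficients followed by zeros, which is why the impulse response is finite.

```python
def fir_filter(h, x):
    """y[n] = sum_k h[k] * x[n - k], treating x[n] as 0 for n < 0.

    Only past *inputs* are used, never past outputs: there is no feedback loop."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

h = [0.25, 0.5, 0.25]          # illustrative 3-tap filter
impulse = [1.0] + [0.0] * 7
print(fir_filter(h, impulse))  # [0.25, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
```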
What type of filter is a moving average filter?
Low-pass filter
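A sketch of the moving average acting as a low-pass filter. The signal is an illustrative assumption: a slow sine plus a fast alternating +/-1 component at the highest digital frequency. A 4-point average cancels the fast part completely and leaves the slow part almost unchanged.

```python
import math

def moving_average(x, m):
    """Causal m-point moving average, y[n] = mean(x[n-m+1 .. n]).

    Output starts once a full window is available (len(x) - m + 1 values)."""
    return [sum(x[n - m + 1:n + 1]) / m for n in range(m - 1, len(x))]

n = 64
slow = [math.sin(2 * math.pi * t / 64) for t in range(n)]  # 1 cycle per 64 samples
fast = [(-1) ** t for t in range(n)]                       # alternates every sample
noisy = [a + 0.5 * b for a, b in zip(slow, fast)]

smoothed = moving_average(noisy, 4)
reference = moving_average(slow, 4)  # what the slow part alone looks like

# Any 4 consecutive +/-1 values sum to 0, so the fast component vanishes
print(max(abs(a - b) for a, b in zip(smoothed, reference)))
```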
Why normalize a filter?
So the magnitude of the output does not depend on the size of the filter: when the coefficients sum to 1, a constant input passes through unchanged
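A small sketch of why normalization matters, assuming box (moving average) coefficients of illustrative lengths 3 and 7. A constant input into an FIR filter settles at level * sum(h), so unnormalized taps scale the output with the filter length, while normalized taps keep it at the input level.

```python
def normalise(h):
    """Scale FIR coefficients so they sum to 1 (unity gain at 0 Hz)."""
    total = sum(h)
    return [c / total for c in h]

def filter_constant(h, level):
    """Steady-state output of an FIR filter fed a constant input: level * sum(h)."""
    return sum(c * level for c in h)

for m in (3, 7):
    raw = [1.0] * m
    print(m, filter_constant(raw, 2.0), filter_constant(normalise(raw), 2.0))
```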
What is the output delay of an FIR filter?
For a symmetric FIR filter, the output delay is (window size - 1) / 2 samples
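A sketch that checks the delay formula, assuming an illustrative symmetric 5-tap filter. A spike at n = 10 comes out with its peak at n = 12, a delay of (5 - 1) / 2 = 2 samples.

```python
def fir_filter(h, x):
    """y[n] = sum_k h[k] * x[n - k], treating x[n] as 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

h = [1 / 9, 2 / 9, 3 / 9, 2 / 9, 1 / 9]  # symmetric 5-tap filter (illustrative)
x = [0.0] * 30
x[10] = 1.0                               # a single spike at n = 10

y = fir_filter(h, x)
peak = max(range(len(y)), key=lambda n: y[n])
print(peak, peak - 10, (len(h) - 1) / 2)  # 12 2 2.0
```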
Why do we need filters other than the moving average filter?
- A moving average filter must be long to remove a lot of noise
- A longer filter causes more delay
- The transition band of a moving average filter is very wide
What is a matched filter?
An FIR filter whose coefficients match a known signal pattern (template), used to detect that pattern in a noisy signal
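A sketch of a matched filter implemented as sliding correlation with the template (equivalent to FIR filtering with the time-reversed template). The template, noise level, seed and hiding position are all hypothetical. The correlation peaks where the known pattern was hidden.

```python
import random

def matched_filter(x, template):
    """Correlation of the template with x at every offset n."""
    m = len(template)
    return [sum(template[k] * x[n + k] for k in range(m))
            for n in range(len(x) - m + 1)]

random.seed(1)
template = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0]  # hypothetical known pattern
x = [random.gauss(0.0, 0.1) for _ in range(100)]
for k, v in enumerate(template):  # hide the pattern starting at n = 40
    x[40 + k] += v

scores = matched_filter(x, template)
print(max(range(len(scores)), key=lambda n: scores[n]))  # 40
```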
What are the problems with a matched filter?
- Very sensitive to changes in the signal's shape
- Tuned to one specific signal, so it does not work well for others
What type of output does a median filter give?
A piecewise-constant (box-like) output: spikes are removed while sharp edges are preserved
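A sketch of a 3-point median filter, assuming the ends are padded by repeating the first and last samples (one of several edge choices). The isolated spike disappears and the step edge stays sharp, giving the flat, box-like output; a moving average would smear both.

```python
import statistics

def median_filter(x, width=3):
    """Replace each sample by the median of the (odd) width-sample window around it."""
    half = width // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [statistics.median(padded[n:n + width]) for n in range(len(x))]

signal = [0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0]  # spike at 3, edge at 6
print(median_filter(signal))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0]
```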
What is the purpose of feature selection?
- Faster training
- Simpler ML models, which are less likely to overfit and easier to interpret
- Better generalization and potentially improved accuracy
What does variance-based feature selection do?
Keeps the features with the most information (highest variance) and removes features that barely vary
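A sketch of variance-based selection, assuming a hypothetical 4 x 3 data set and a threshold of 0.01. Because variance depends on units, this assumes the features are on comparable scales.

```python
import statistics

def variance_select(rows, threshold):
    """Return indices of columns whose population variance exceeds threshold."""
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if statistics.pvariance(col) > threshold]

# Hypothetical data: 4 samples x 3 features; feature 1 is almost constant
X = [
    [1.0, 5.0, 0.2],
    [2.0, 5.0, 0.9],
    [3.0, 5.1, 0.4],
    [4.0, 5.0, 0.7],
]
print(variance_select(X, threshold=0.01))  # [0, 2]
```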
What does correlation-based feature selection do?
Removes features that are highly correlated with (very similar to) other features, since they add little new information
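A sketch of correlation-based selection, assuming a greedy rule (keep a feature only if its absolute Pearson correlation with every already-kept feature is at most 0.9) and hypothetical data. Height in inches duplicates height in centimetres, so it is dropped.

```python
import math

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def drop_correlated(columns, threshold=0.9):
    """Keep column j only if |r| <= threshold with every column kept so far."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept

height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
height_in = [59.1, 63.0, 66.9, 70.9, 74.8]  # same information in other units
age = [30.0, 22.0, 45.0, 35.0, 28.0]
print(drop_correlated([height_cm, height_in, age]))  # [0, 2]
```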
What does univariate-based feature selection do?
Evaluates each feature individually against the output (target) to see which ones are most important
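A sketch of univariate selection, assuming the absolute Pearson correlation with the target as the per-feature score (other univariate scores, such as an F-test or mutual information, are also common). The data are hypothetical.

```python
import math

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def rank_features(columns, target):
    """Score each feature on its own against the target; best first."""
    scores = [abs(pearson(col, target)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)

y = [1.0, 2.0, 3.0, 4.0, 5.0]
f0 = [2.0, 4.0, 6.0, 8.0, 10.5]  # strongly related to y
f1 = [3.0, 1.0, 4.0, 1.0, 5.0]   # weakly related
f2 = [5.0, 4.0, 3.0, 2.0, 1.0]   # perfectly (negatively) related
print(rank_features([f0, f1, f2], y))  # [2, 0, 1]
```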
What does sequential-based feature selection do?
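Uses a greedy algorithm to pick locally optimal features, in one of two directions:
- Forward SFS: start with no features and add the best feature at each step
- Backward SFS: start with all features and remove the least useful feature at each step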
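A sketch of forward and backward SFS, assuming a hypothetical toy score (how many "information items" the chosen features cover, minus 0.5 per feature) in place of a real model's validation score, and stopping when no single step improves the score. Ties go to the feature listed first. The two directions end at different subsets, which shows that greedy search only finds local optima.

```python
# Hypothetical: which information items each feature carries
INFO = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2}, "d": {5}}

def toy_score(subset):
    covered = set().union(*(INFO[f] for f in subset))
    return len(covered) - 0.5 * len(subset)

def forward_sfs(features, score):
    selected, best = [], score([])
    while len(selected) < len(features):
        candidates = [f for f in features if f not in selected]
        f_best = max(candidates, key=lambda f: score(selected + [f]))
        new = score(selected + [f_best])
        if new <= best:  # no single addition helps: stop
            break
        selected.append(f_best)
        best = new
    return selected

def backward_sfs(features, score):
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        f_worst = max(selected, key=lambda f: score([g for g in selected if g != f]))
        new = score([g for g in selected if g != f_worst])
        if new <= best:  # no single removal helps: stop
            break
        selected.remove(f_worst)
        best = new
    return selected

print(forward_sfs(list(INFO), toy_score))   # ['a', 'b', 'd']
print(backward_sfs(list(INFO), toy_score))  # ['b', 'c', 'd']
```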
What is the variable deletion technique for handling missing data?
Remove variables that have a large share of missing values (e.g. more than 50-60%)
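A sketch of variable deletion, assuming missing values are stored as None, columns are held in a dict, and the threshold is 50%. All data are hypothetical.

```python
def drop_sparse_variables(table, max_missing=0.5):
    """Keep only columns whose fraction of None values is at most max_missing."""
    kept = {}
    for name, values in table.items():
        missing = sum(v is None for v in values) / len(values)
        if missing <= max_missing:
            kept[name] = values
    return kept

data = {
    "temperature": [21.0, 22.5, None, 23.0, 22.0],  # 20% missing: kept
    "humidity":    [None, None, None, 40.0, None],  # 80% missing: dropped
    "pressure":    [1012, None, 1010, None, 1011],  # 40% missing: kept
}
print(list(drop_sparse_variables(data, max_missing=0.5)))  # ['temperature', 'pressure']
```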
What is the curse of dimensionality?
As the number of dimensions grows, the size of the space to learn or search grows exponentially, so far more data is needed to cover it
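A quick arithmetic illustration, assuming each dimension is split into 10 bins (an arbitrary resolution): the number of cells that data must fill is 10^d.

```python
BINS_PER_AXIS = 10  # illustrative resolution per dimension

def cells_needed(bins, dims):
    """Number of grid cells when each of `dims` axes is split into `bins` bins."""
    return bins ** dims

for d in (1, 2, 3, 5, 10):
    print(f"{d:2d} dims -> {cells_needed(BINS_PER_AXIS, d):,} cells to cover")
```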
What can proper dimensionality reduction lead to?
- Less complex processing
- Faster processing
- Easier visualization
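What are the steps of the PCA algorithm?
- Normalize the data
- Calculate the covariance matrix
- Calculate the eigenvectors and eigenvalues of the covariance matrix
- Sort the eigenvectors by eigenvalue, from highest to lowest
- Keep as many of the top eigenvectors (dimensions) as needed and project the data onto them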
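A sketch of the PCA steps for two features only, so the eigenvectors of the 2x2 covariance matrix can be found in closed form instead of with a linear algebra library. It assumes "normalize" means centring each feature on zero mean (dividing by the standard deviation as well is common when units differ). The points are hypothetical and lie close to the line y = 2x.

```python
import math

def pca_2d(points):
    n = len(points)
    # 1. Normalize: centre each feature on zero mean
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    # 2. Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centred) / (n - 1)
    b = sum(x * y for x, y in centred) / (n - 1)
    c = sum(y * y for _, y in centred) / (n - 1)
    # 3. Eigenvalues and unit eigenvectors of a symmetric 2x2 matrix
    if abs(b) < 1e-12:  # already diagonal: the axes are the eigenvectors
        pairs = [(a, (1.0, 0.0)), (c, (0.0, 1.0))]
    else:
        mid = (a + c) / 2
        rad = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
        pairs = []
        for lam in (mid + rad, mid - rad):
            norm = math.hypot(b, lam - a)
            pairs.append((lam, (b / norm, (lam - a) / norm)))
    # 4. Sort by eigenvalue, highest first
    pairs.sort(key=lambda p: p[0], reverse=True)
    return centred, pairs

def project(centred, pairs, k):
    """5. Keep the top k components and project the centred data onto them."""
    comps = [v for _, v in pairs[:k]]
    return [[x * vx + y * vy for vx, vy in comps] for x, y in centred]

points = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.05), (3.0, 5.95), (4.0, 8.1), (5.0, 9.9)]
centred, pairs = pca_2d(points)
(lam1, v1), (lam2, _) = pairs
print(f"first component {v1}, explains {lam1 / (lam1 + lam2):.1%} of the variance")
reduced = project(centred, pairs, k=1)  # 2 dimensions reduced to 1
```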
What does a Pearson correlation of 0 mean?
The signals show no linear relationship to each other (a nonlinear relationship may still exist)
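A sketch showing why r = 0 only rules out a linear relationship, assuming hypothetical values of x placed symmetrically around 0. y = x^2 is completely determined by x, yet its Pearson correlation with x is 0.

```python
import math

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
print(pearson(x, [2 * v + 1 for v in x]))  # 1.0: perfect linear relationship
print(pearson(x, [v * v for v in x]))      # 0.0, even though y depends fully on x
```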
What does Independent Component Analysis (ICA) do?
It is used for source separation: it splits mixed signals into components by maximizing their statistical independence
What conditions are needed for ICA to work?
- The number of observations (mixed signals) must be greater than or equal to the number of sources
- The sources must be statistically independent
- The sources must combine additively (linear mixing)
What are the problems with ICA?
- The order of the output components cannot be determined
- The separation is not perfect
- It is computationally expensive (iterative algorithm)
Why is ICA called “blind” source separation (BSS)?
Because little is known in advance about the sources or about how they were mixed