Part 1: Data Acquisition and Characteristics Flashcards
Analogue to Digital conversion involves
Sampling and Quantisation.
Sampling
Ascertain the momentary value of (an analogue signal) many times a second so as to convert the signal to digital form.
Quantisation
The process of mapping a large set of input values to a (countable) smaller set.
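The two steps above can be sketched in a few lines of Python. This is only an illustration: the helper names sample and quantise, the 8 evenly spaced levels, and the [-1, 1] range are assumptions made for the example, not part of the flashcards.

```python
import math

def sample(signal, rate_hz, duration_s):
    """Read the signal's momentary value rate_hz times per second."""
    n = int(rate_hz * duration_s)
    return [signal(k / rate_hz) for k in range(n)]

def quantise(value, levels=8, lo=-1.0, hi=1.0):
    """Map a real value to the nearest of `levels` evenly spaced values in [lo, hi].

    Uniform spacing and clipping to [lo, hi] are assumptions of this sketch.
    """
    step = (hi - lo) / (levels - 1)
    clipped = min(max(value, lo), hi)
    index = round((clipped - lo) / step)
    return lo + index * step

# A 1 Hz sine wave sampled 8 times a second for one second,
# then mapped onto 8 allowed amplitude values.
samples = sample(lambda t: math.sin(2 * math.pi * t), rate_hz=8, duration_s=1.0)
digital = [quantise(s) for s in samples]
```

Sampling makes the signal discrete in time; quantisation makes it discrete in amplitude.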
Nyquist Shannon Sampling Theorem
If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.
Nyquist Shannon Sampling Theorem (Layman's)
If the highest frequency in the signal is f(max), the sampling rate must be at least 2f(max).
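A small worked example of why the rate matters, as a sketch only. The function names are hypothetical, and apparent_frequency assumes a single pure tone, using the standard folding rule that a sampled tone is indistinguishable from one shifted by a whole multiple of the sampling rate.

```python
def min_sample_rate(f_max_hz):
    """Lowest sampling rate permitted by the rule above: 2 * f(max)."""
    return 2 * f_max_hz

def apparent_frequency(f_hz, sample_rate_hz):
    """Frequency a pure tone of f_hz appears to have once sampled.

    Assumes a single sinusoid; the tone folds to the nearest
    multiple of the sampling rate.
    """
    nearest_multiple = round(f_hz / sample_rate_hz) * sample_rate_hz
    return abs(f_hz - nearest_multiple)

print(min_sample_rate(7))          # 14
print(apparent_frequency(3, 10))   # 3: sampled fast enough, unchanged
print(apparent_frequency(7, 10))   # 3: sampled too slowly, looks like a 3 Hz tone
```

With a 10 Hz sampling rate, a 7 Hz tone and a 3 Hz tone produce sample sets that cannot be told apart, which is why 7 Hz needs at least 14 Hz.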
Valid distance measure D(a,b) has properties
- Non-negative: D(a,b) ≥ 0
- Reflexive: D(a,a) = 0
- Symmetric: D(a,b) = D(b,a)
- Satisfies the Triangle Inequality: D(a,c) ≤ D(a,b) + D(b,c)
Minkowski Distance of order p (p-norm distance) is defined as
D(x,y) = (Σ|x(i) - y(i)|^p)^(1/p)
When p=1, 1-norm distance, Minkowski
(aka Manhattan)
D(x,y) = Σ|x(i) - y(i)|
When p=2, 2-norm distance, Minkowski
(aka Euclidean)
D(x,y) = ((x-y)^T(x-y))^(1/2)
When p=∞, ∞-norm distance, Minkowski
(aka Chebyshev)
D(x,y) = max(|x1-y1|, |x2-y2|,…,|xn-yn|)
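The three special cases can be checked against one general function. A minimal sketch, assuming plain Python sequences of numbers; the function name minkowski is chosen for this example.

```python
def minkowski(x, y, p):
    """p-norm distance between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        # Chebyshev: the largest single coordinate difference.
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 1))             # 7.0  (Manhattan: 3 + 4)
print(minkowski(x, y, 2))             # 5.0  (Euclidean: sqrt(9 + 16))
print(minkowski(x, y, float("inf")))  # 4    (Chebyshev: max(3, 4))
```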
Time series
Successive measurements made over a time interval
(Numerical Time Series) P-norm Distances
- Can only compare time series of the same length
- Are very sensitive to signal transformations:
  - Shifting
  - Uniform Amplitude Scaling
  - Non-Uniform Amplitude Scaling
  - Uniform Time Scaling
Dynamic Time Warping (Berndt and Clifford, 1994)
- Replaces the Euclidean one-to-one alignment with a many-to-one alignment
- Recognises similar shapes, even in the presence of shifting and/or scaling
- X = (x0,…,xn) and Y = (y0,…,ym) and Rest(X) = (x1,…,xn)
- DTW(X,Y) = D(x0,y0) + min{DTW(X, Rest(Y)), DTW(Rest(X), Y), DTW(Rest(X), Rest(Y))}
- Solved Efficiently using dynamic programming by building an nxm distance matrix
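The dynamic-programming idea can be sketched as follows. This version fills the matrix over prefixes rather than recursing on Rest(X) and Rest(Y), which yields the same minimum. Two things are assumptions of this example rather than statements from the flashcards: the point distance D is the absolute difference, and the boundary conditions are that two empty series cost 0 while an empty series against a non-empty one costs infinity.

```python
def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Dynamic Time Warping cost between two numeric sequences.

    cost[i][j] holds the best alignment cost of x[:i] against y[:j].
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0  # assumed base case: empty vs empty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist(x[i - 1], y[j - 1]) + min(
                cost[i - 1][j],      # x[i-1] also matches an earlier y
                cost[i][j - 1],      # y[j-1] also matches an earlier x
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]

# Series of different lengths with the same shape align at zero cost.
print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Each cell is computed once, so the work is proportional to the size of the matrix instead of growing exponentially as the plain recursion would.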
(Symbolic Distance) in text can be
- Syntactic
- Semantic
Syntactic
- Defined over symbolic data of the same length
- Measures the number of substitutions required to change one string/number into another
Syntactic e.g. Hamming Distance
Returns the number of positions at which two equal-length strings differ; the maximum is the string length.
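A minimal sketch of the Hamming distance; the function name and the example strings are chosen for illustration.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of the same length")
    return sum(ca != cb for ca, cb in zip(a, b))

print(hamming("karolin", "kathrin"))  # 3 (r/t, o/h, l/r)
```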
Syntactic e.g. Edit Distance
Measures the minimum number of ‘operations’ required to transform one sequence to another
Syntactic e.g. Edit Distance - Operations
- Insertion
- Substitution
- Deletion
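The three operations above can be combined in a short dynamic-programming sketch. It assumes every operation costs 1 (the Levenshtein variant of edit distance); the flashcards do not state the costs, so treat that as an assumption of the example.

```python
def edit_distance(a, b):
    """Minimum number of unit-cost insertions, deletions and substitutions
    needed to turn sequence a into sequence b."""
    # prev[j] = distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```

Unlike the Hamming distance, this works on sequences of different lengths because insertions and deletions are allowed.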
Semantic
Built on top of a hierarchy of word semantics, e.g. WordNet (Princeton)
Semantic tool WordNet
Contains 117,000 synsets (synset: set of one or more synonyms that are interchangeable in some context)
Semantic e.g. WUP (Wu and Palmer Distance, 1994)
WUP finds the path length from the LCS (Least Common Subsumer) to the root node; the value is scaled by the sum of the path lengths from the original concepts to the root.
Semantic e.g. WUP Equation
WUP(C1,C2) = (2N3)/(N1+N2+(2N3))
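A worked example of the equation on a toy hierarchy. Several things here are assumptions: N1 and N2 are read as the path lengths from C1 and C2 up to the LCS, N3 as the path length from the LCS to the root, path lengths count edges, and the PARENT table is a made-up is-a hierarchy rather than real WordNet data.

```python
# Hypothetical is-a hierarchy: child -> parent. "entity" is the root.
PARENT = {
    "furniture": "entity",
    "bed": "furniture",
    "chair": "furniture",
    "animal": "entity",
    "dog": "animal",
}

def path_to_root(concept):
    """List of concepts from `concept` up to the root, inclusive."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def wup(c1, c2):
    """WUP(C1,C2) = 2*N3 / (N1 + N2 + 2*N3), counting edges.

    Undefined (division by zero) when both concepts are the root.
    """
    path1, path2 = path_to_root(c1), path_to_root(c2)
    lcs = next(node for node in path1 if node in path2)  # deepest shared ancestor
    n1 = path1.index(lcs)            # edges from c1 up to the LCS
    n2 = path2.index(lcs)            # edges from c2 up to the LCS
    n3 = len(path_to_root(lcs)) - 1  # edges from the LCS up to the root
    return (2 * n3) / (n1 + n2 + 2 * n3)

print(wup("bed", "chair"))  # 0.5: LCS is "furniture", N1 = N2 = N3 = 1
print(wup("bed", "dog"))    # 0.0: LCS is the root, so N3 = 0
```

Since N1 + N3 and N2 + N3 are the path lengths from each concept to the root, the denominator is exactly the sum of those two lengths.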
WordNet Relationships - Hyponymy
(is-a relationship) e.g. furniture -> bed
WordNet Relationships - Meronymy
(part-of relationship) e.g. chair -> seat