Part 1 : Data Acquisition and Characteristics Flashcards by Holly May Baker

Analogue to Digital conversion involves

Sampling and Quantisation.

Sampling

Measuring the instantaneous value of an analogue signal many times a second, at regular intervals, so as to convert the signal to digital form.

Quantisation

The process of mapping a large set of input values to a (countable) smaller set.
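
As a rough illustration of this mapping, the sketch below implements a uniform quantiser. The function name `quantise`, the input range [-1, 1] and the number of levels are assumptions chosen for the example, not part of the card.

```python
def quantise(x, levels, lo=-1.0, hi=1.0):
    """Map x to the nearest of `levels` evenly spaced values in [lo, hi].

    Assumed setup: a uniform quantiser over a fixed input range.
    """
    x = min(max(x, lo), hi)          # clip to the representable range
    step = (hi - lo) / (levels - 1)  # spacing between adjacent levels
    index = round((x - lo) / step)   # index of the nearest level
    return lo + index * step


print(quantise(0.3, levels=5))   # 0.5  (levels are -1, -0.5, 0, 0.5, 1)
print(quantise(2.0, levels=5))   # 1.0  (out-of-range input is clipped)
```

Every input in the continuous range ends up on one of only five values, which is the "large set to smaller set" mapping described above.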

Nyquist-Shannon Sampling Theorem

If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.

Nyquist-Shannon Sampling Theorem (Layman's)

If the highest frequency in the signal is f(max), the sampling rate must be at least 2f(max).
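
A small sketch of what the rule means in practice: it computes the minimum sampling rate and the frequency at which a pure tone appears when sampled too slowly (aliasing). The function names and the example frequencies are illustrative assumptions.

```python
def min_sampling_rate(f_max):
    """Minimum sampling rate (Hz) for a signal whose highest frequency is f_max."""
    return 2.0 * f_max


def apparent_frequency(f, fs):
    """Frequency (Hz) at which a pure tone of f Hz appears when sampled at fs Hz."""
    f = f % fs               # sampling cannot distinguish f from f + k*fs
    return min(f, fs - f)    # fold into the range [0, fs/2]


print(min_sampling_rate(3.0))        # 6.0
print(apparent_frequency(3.0, 8.0))  # 3.0  (sampled fast enough, tone preserved)
print(apparent_frequency(3.0, 4.0))  # 1.0  (undersampled, 3 Hz looks like 1 Hz)
```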

Valid distance measure D(a,b) has properties

Non-negative: D(a,b) ≥ 0
Reflexive: D(a,a) = 0
Symmetric: D(a,b) = D(b,a)
Satisfies the triangle inequality: D(a,c) ≤ D(a,b) + D(b,c)

Minkowski Distance of order p (p-norm distance) is defined as

D(x,y) = (Σ|x(i) - y(i)|^p)^(1/p)

When p=1, 1-norm distance, Minkowski

(aka Manhattan)

D(x,y) = Σ|x(i) - y(i)|

When p=2, 2-norm distance, Minkowski

(aka Euclidean)

D(x,y) = ((x-y)^T(x-y))^(1/2)

When p=∞, ∞-norm distance, Minkowski

(aka Chebyshev)

D(x,y) = max(|x1-y1|, |x2-y2|,…,|xn-yn|)
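
The three special cases can be covered by a single function. This is a minimal sketch; the name `minkowski` and the sample points are assumptions made for the example.

```python
import math


def minkowski(x, y, p):
    """Minkowski (p-norm) distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)                       # Chebyshev: largest difference
    return sum(d ** p for d in diffs) ** (1.0 / p)


x, y = (0, 0), (3, 4)
print(minkowski(x, y, 1))         # 7.0  Manhattan
print(minkowski(x, y, 2))         # 5.0  Euclidean
print(minkowski(x, y, math.inf))  # 4    Chebyshev
```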

Time series

Successive measurements made over a time interval

(Numerical Time Series) Limitations of p-norm distances

Can only compare time series of the same length
Very sensitive to signal transformations:
Shifting
Uniform Amplitude Scaling
Non-Uniform Amplitude Scaling
Uniform Time Scaling

Dynamic Time Warping (Berndt and Clifford, 1994)

Replaces Euclidean one-to-one with many-to-one
Recognises similar shapes, even in the presence of shifting and/or scaling
X = (x0,…,xn) and Y = (y0,…,ym), with Rest(X) = (x1,…,xn) and Rest(Y) = (y1,…,ym)
DTW(X,Y) = D(x0,y0) + min{DTW(X, Rest(Y)), DTW(Rest(X), Y), DTW(Rest(X), Rest(Y))}
Solved efficiently using dynamic programming by building an n×m distance matrix
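
A minimal dynamic-programming sketch of DTW follows. It fills the table over prefixes rather than suffixes, which gives the same result as the recursion on the card; the function name `dtw` and the default point distance |a - b| are assumptions for the example.

```python
def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Dynamic Time Warping distance between sequences x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] holds the DTW distance between x[:i] and y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist(x[i - 1], y[j - 1]) + min(
                cost[i - 1][j],      # x[i-1] also matched with an earlier y
                cost[i][j - 1],      # y[j-1] also matched with an earlier x
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]


print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0  same shape, different lengths
print(dtw([1, 2, 3], [2, 3, 4]))     # 2.0  (the 1-norm distance would be 3)
```

Unlike the p-norm distances, the two sequences do not need to have the same length.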

(Symbolic Distance) in text could be

- Syntactic
- Semantic

Syntactic

- Defined over symbolic data of the same length
- Measures the number of substitutions required to change one string/number into another

Syntactic e.g. Hamming Distance

Returns the number of positions at which two equal-length sequences differ; the maximum value is the sequence length
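
A short sketch of the mismatch count; the function name `hamming` and the example strings are chosen for illustration.

```python
def hamming(a, b):
    """Number of positions at which equal-length sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


print(hamming("karolin", "kathrin"))  # 3
print(hamming("1011101", "1001001"))  # 2
```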

Syntactic e.g. Edit Distance

Measures the minimum number of ‘operations’ required to transform one sequence to another

Syntactic e.g. Edit Distance - Operations

Insertion
Substitution
Deletion
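
With these three operations, and assuming each costs 1 (the Levenshtein variant; the card does not state the costs), the edit distance can be sketched with dynamic programming. The function name `edit_distance` is an assumption for the example.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Assumption: every operation has unit cost (Levenshtein distance).
    """
    prev = list(range(len(b) + 1))         # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                         # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion of ca
                curr[j - 1] + 1,           # insertion of cb
                prev[j - 1] + (ca != cb),  # substitution (free if symbols match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Unlike the Hamming distance, the two sequences may have different lengths.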

Semantic

Built on top of a hierarchy of word semantics, e.g. WordNet (Princeton)

Semantic tool WordNet

Contains 117,000 synsets (synset: set of one or more synonyms that are interchangeable in some context)

Semantic e.g. WUP (Wu and Palmer Distance, 1994)

WUP finds the path length from the LCS (Least Common Subsumer) to the root node; the value is scaled by the sum of the path lengths from the original concepts to the root.

Semantic e.g. WUP Equation

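WUP(C1,C2) = (2·N3) / (N1 + N2 + 2·N3), where N1 and N2 are the path lengths from C1 and C2 to the LCS, and N3 is the path length from the LCS to the root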
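
A worked example of the formula, assuming N1 and N2 are the path lengths from each concept to the LCS and N3 is the path length from the LCS to the root. The path lengths used here are hypothetical numbers, not taken from WordNet.

```python
def wup(n1, n2, n3):
    """Wu-Palmer score from three path lengths.

    Assumed meaning: n1, n2 = concept-to-LCS path lengths,
    n3 = LCS-to-root path length.
    """
    return (2 * n3) / (n1 + n2 + 2 * n3)


# Hypothetical hierarchy: the LCS sits 3 steps below the root,
# and the two concepts are 1 and 2 steps below the LCS.
print(round(wup(1, 2, 3), 3))  # 0.667
print(wup(0, 0, 3))            # 1.0  (a concept compared with itself)
```

The score approaches 1 as the two concepts move closer to their common ancestor, and falls as the ancestor moves closer to the root.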

WordNet Relationships - Hyponymy

(is-a relationship) e.g. furniture -> bed

WordNet Relationships - Meronymy

(part-of relationship) e.g. chair -> seat

WordNet Relationships - Troponymy

[for verb hierarchies] (specific manner) e.g. communicate -> talk -> whisper

WordNet Relationships - Antonymy

(strong contrast) e.g. wet <-> dry

Mean - 1D

µ = (1/N) Σ x(i)

Variance(Spread) - 1D

σ^2 = (1/(N-1)) Σ (x(i) - µ)^2

Standard Deviation - 1D

σ = ((1/(N-1)) Σ (x(i) - µ)^2)^(1/2)
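
The three 1D statistics can be checked on a small data set. The function names and the sample values are assumptions made for the example; the variance uses the (N-1) divisor, as on the card.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)  # (N-1) divisor


def sample_std(xs):
    return math.sqrt(sample_variance(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))             # 5.0
print(sample_variance(data))  # 32/7, roughly 4.571
print(sample_std(data))       # roughly 2.138
```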

Mean - Multi-Dimensional

Calculated independently for each dimension

Variance - Multi-Dimensional

Computed along each dimension using a Covariance Matrix

Covariance Matrix

- Variances on the diagonal
- Off-diagonal entries (covariances) also measure correlation

Covariance Matrix (Correlation)

- Positive covariance indicates a proportional relationship between the variables
- Negative covariance indicates an inversely proportional relationship
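
A small pure-Python sketch of a covariance matrix, using the (N-1) divisor. The function name `covariance_matrix` and both data sets are made up for illustration.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of data given as a list of d-dimensional points."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


rising = [(1, 2), (2, 4), (3, 6)]    # y grows with x
falling = [(1, 6), (2, 4), (3, 2)]   # y shrinks as x grows
print(covariance_matrix(rising))     # [[1.0, 2.0], [2.0, 4.0]]
print(covariance_matrix(falling))    # [[1.0, -2.0], [-2.0, 4.0]]
```

The diagonal holds the per-dimension variances; the sign of the off-diagonal entry shows the direction of the relationship.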

Eigenvectors and Eigenvalues

Av = λv
- v -> Eigenvector
- λ -> Eigenvalue

Characteristic Equation

|A − λI| = 0
- I -> Identity Matrix
- A -> the matrix; |·| denotes the determinant

Determinant of a Matrix

|A| = ad − bc, for a 2×2 matrix A with rows (a, b) and (c, d)

Eigenvectors define principal axes

- Major Axis: eigenvector corresponding to the larger eigenvalue
- Minor Axis: eigenvector corresponding to the smaller eigenvalue
- Represented using the major and minor axes of an ellipse
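
For a 2×2 matrix with rows (a, b) and (c, d), the characteristic equation expands to λ^2 − (a+d)λ + (ad − bc) = 0, so the eigenvalues can be found with the quadratic formula. The sketch below assumes real eigenvalues and b ≠ 0; the function names and the example matrix are illustrative.

```python
import math


def eigenvalues_2x2(a, b, c, d):
    """Roots of |A - λI| = 0 for A = [[a, b], [c, d]], larger first.

    Assumption: the eigenvalues are real (true for any covariance matrix).
    """
    trace = a + d
    det = a * d - b * c
    disc = math.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


def eigenvector_2x2(a, b, lam):
    """An (unnormalised) eigenvector for eigenvalue lam, assuming b != 0."""
    return (b, lam - a)


major, minor = eigenvalues_2x2(2, 1, 1, 2)
print(major, minor)                   # 3.0 1.0
print(eigenvector_2x2(2, 1, major))   # (1, 1.0)   major axis direction
print(eigenvector_2x2(2, 1, minor))   # (1, -1.0)  minor axis direction
```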

Data Normalisation Methods

- Rescaling
- Standardisation (aka z-score)
- Scaling to unit length

(Data Normalisation) Rescaling

x' = (x - min(x)) / (max(x) - min(x))

(Data Normalisation) Standardisation

x' = (x-µ) / σ

(Data Normalisation) Scaling to Unit Length

x' = x / ||x||
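
The three normalisation methods side by side, as a minimal sketch. The function names are assumptions, ||x|| is taken to be the Euclidean norm, σ uses the (N-1) divisor, and the inputs are assumed not to be constant or all zero.

```python
import math


def rescale(xs):
    """Min-max rescaling to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standardise(xs):
    """z-score: subtract the mean, divide by the sample standard deviation."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))
    return [(x - mu) / sigma for x in xs]


def unit_length(xs):
    """Divide a vector by its Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in xs))
    return [x / norm for x in xs]


print(rescale([1, 2, 3]))      # [0.0, 0.5, 1.0]
print(standardise([1, 2, 3]))  # [-1.0, 0.0, 1.0]
print(unit_length([3, 4]))     # [0.6, 0.8]
```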

Data Outliers

A small number of points with values significantly different from those of the other points; not always easy to remove

Median

- Difficult to combine: the median of the union of two sets cannot be defined in terms of the individual medians

Note - Sample Variance Vs. Variance

Sample statistics are only estimates of the true values; dividing by (N-1) rather than N gives an unbiased estimate of the variance

Normal Distribution

N(µ, σ^2)

Normal Distribution - Standard Deviation

- 1sd - 68%
- 2sd - 95%
- 3sd - 99.7%
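
The coverage figures can be checked numerically: for a normal distribution, the fraction of values within k standard deviations of the mean is erf(k/√2). The function name `within_k_sd` is an assumption for the example.

```python
import math


def within_k_sd(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```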

Normal Distribution - Multi-Dimensional

Part 1 : Data Acquisition and Characteristics Flashcards

(47 cards)