Part 1: Data Acquisition and Characteristics Flashcards

1
Q

Analogue to Digital conversion involves

A

Sampling and Quantisation.

2
Q

Sampling

A

Measuring the instantaneous value of an analogue signal many times a second, so that the signal can be converted to digital form.

3
Q

Quantisation

A

The process of mapping a large set of input values to a (countable) smaller set.
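
As a rough illustration of the two steps, the sketch below samples a 5 Hz sine tone and then quantises each sample to one of 8 evenly spaced levels. The function names, the tone, the sampling rate and the level count are all assumptions chosen for the example, not part of the cards.

```python
import math

def sample(signal, duration_s, rate_hz):
    """Read the signal's instantaneous value rate_hz times per second."""
    n = int(duration_s * rate_hz)
    return [signal(k / rate_hz) for k in range(n)]

def quantise(value, levels, lo=-1.0, hi=1.0):
    """Map a continuous value in [lo, hi] to the nearest of `levels` evenly spaced values."""
    step = (hi - lo) / (levels - 1)
    clipped = min(max(value, lo), hi)      # keep the value inside the range
    index = round((clipped - lo) / step)   # nearest level index
    return lo + index * step

def tone(t):
    """Hypothetical analogue input: a 5 Hz sine wave."""
    return math.sin(2 * math.pi * 5 * t)

samples = sample(tone, duration_s=1.0, rate_hz=40)   # sampling
digital = [quantise(s, levels=8) for s in samples]   # quantisation
```

Each quantised value differs from its sample by at most half a step, which is the quantisation error introduced by the smaller output set.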

4
Q

Nyquist-Shannon Sampling Theorem

A

If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.

5
Q

Nyquist-Shannon Sampling Theorem (Layman's Terms)

A

If the highest frequency in the signal is f(max), the sampling rate must be at least 2·f(max).
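
A small sketch of what goes wrong when the rule is broken: a 7 Hz cosine sampled at only 10 Hz produces exactly the same samples as a 3 Hz cosine, so the two cannot be told apart afterwards (aliasing). The frequencies and helper names are assumptions picked for illustration.

```python
import math

def min_sampling_rate(f_max_hz):
    """Lowest sampling rate allowed by the rule: twice the highest frequency."""
    return 2 * f_max_hz

def sample_cosine(freq_hz, rate_hz, n):
    """Take n samples of a unit-amplitude cosine at the given sampling rate."""
    return [math.cos(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]

rate_hz = 10                               # too slow for a 7 Hz signal (needs 14 Hz)
high = sample_cosine(7, rate_hz, 20)       # undersampled 7 Hz cosine
alias = sample_cosine(3, rate_hz, 20)      # properly sampled 3 Hz cosine
indistinguishable = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, alias)
)
```

Here `indistinguishable` evaluates to `True`: once sampled, the 7 Hz component has folded down to 3 Hz.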

6
Q

Valid distance measure D(a,b) has properties

A
  • Non-negative: D(a,b) ≥ 0
  • Reflexive: D(a,a) = 0
  • Symmetric: D(a,b) = D(b,a)
  • Satisfies the triangle inequality: D(a,c) ≤ D(a,b) + D(b,c)
7
Q

Minkowski distance of order p (p-norm distance) is defined as

A

D(x,y) = (Σ|x(i) - y(i)|^p)^(1/p)

8
Q

When p=1, 1-norm distance, Minkowski

A

(aka Manhattan)

D(x,y) = Σ|x(i) - y(i)|

9
Q

When p=2, 2-norm distance, Minkowski

A

(aka Euclidean)

D(x,y) = ((x-y)^T(x-y))^(1/2)

10
Q

When p=∞, ∞-norm distance, Minkowski

A

(aka Chebyshev)

D(x,y) = max(|x1-y1|, |x2-y2|,…,|xn-yn|)
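
The three special cases can be checked with one small function. This is a minimal sketch; the function name and the test points are assumptions, and p=∞ is handled as the limiting case (the largest coordinate difference).

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)                      # Chebyshev: limit as p -> infinity
    return sum(d ** p for d in diffs) ** (1 / p)

x, y = (0, 0), (3, 4)
manhattan = minkowski(x, y, 1)          # |3| + |4| = 7
euclidean = minkowski(x, y, 2)          # sqrt(9 + 16) = 5
chebyshev = minkowski(x, y, math.inf)   # max(3, 4) = 4
```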

11
Q

Time series

A

Successive measurements made over a time interval

12
Q

(Numerical Time Series) Limitations of p-norm distances

A
  • Can only compare time series of the same length
  • Are very sensitive to signal transformations:
      - Shifting
      - Uniform amplitude scaling
      - Non-uniform amplitude scaling
      - Uniform time scaling
13
Q

Dynamic Time Warping (Berndt and Clifford, 1994)

A
  • Replaces the one-to-one point alignment of Euclidean distance with a many-to-one alignment
  • Recognises similar shapes, even in the presence of shifting and/or scaling
  • X = (x0,…,xn), Y = (y0,…,ym) and REST(X) = (x1,…,xn)
  • DTW(X,Y) = D(x0,y0) + min{DTW(X, REST(Y)), DTW(REST(X), Y), DTW(REST(X), REST(Y))}
  • Solved efficiently using dynamic programming by building a distance matrix with one entry per pair (xi, yj)
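
A minimal dynamic-programming sketch of the recursion on this card. It fills the cost table from the front of both series rather than recursing on REST, which reaches the same optimum. The function name, the default point distance |a - b| and the example series are assumptions.

```python
def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Dynamic Time Warping cost between two sequences (lengths may differ)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i points of x with the first j of y
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(
                cost[i - 1][j],       # x[i-1] shares y[j-1] with the previous x point
                cost[i][j - 1],       # y[j-1] shares x[i-1] with the previous y point
                cost[i - 1][j - 1],   # both series advance together
            )
            cost[i][j] = dist(x[i - 1], y[j - 1]) + best_previous
    return cost[n][m]

same_shape_shifted = dtw([0, 0, 1, 0], [0, 1, 0, 0])   # 0: the peak is simply re-aligned
different_lengths = dtw([1, 2, 3], [1, 2, 2, 3])       # 0: the repeated 2 maps many-to-one
```

Unlike a p-norm distance, this accepts series of different lengths and gives a zero cost to a shape that has only been shifted in time.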
14
Q

(Symbolic Distance) in text could be

A
  • Syntactic
  • Semantic

15
Q

Syntactic

A
  • Defined over symbolic data of the same length
  • Measures the number of substitutions required to change one string/number into another

16
Q

Syntactic e.g. Hamming Distance

A

Returns the number of positions at which two equal-length strings differ (mismatches); the maximum value is the string length.
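
A short sketch of the idea, assuming plain Python strings as the symbolic data; the function name and the example words are illustrative.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of the same length")
    return sum(ca != cb for ca, cb in zip(a, b))

mismatches = hamming("karolin", "kathrin")   # r/t, o/h, l/r differ -> 3
worst_case = hamming("abc", "xyz")           # every position differs -> 3 (the length)
```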

17
Q

Syntactic e.g. Edit Distance

A

Measures the minimum number of ‘operations’ required to transform one sequence to another

18
Q

Syntactic e.g. Edit Distance - Operations

A
  • Insertion
  • Substitution
  • Deletion
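
One common way to compute this is sketched below, assuming every insertion, substitution and deletion costs 1 (the Levenshtein variant; the cards do not state the costs, so this is an assumption). The function name and example words are illustrative.

```python
def edit_distance(a, b):
    """Minimum number of unit-cost insertions, substitutions and deletions turning a into b."""
    # prev[j] = distance between the processed prefix of a and the first j symbols of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]                                 # deleting all i symbols of a so far
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # deletion of ca
                curr[j - 1] + 1,                   # insertion of cb
                prev[j - 1] + (ca != cb),          # substitution (free if symbols match)
            ))
        prev = curr
    return prev[-1]

steps = edit_distance("kitten", "sitting")   # 3: k->s, e->i, insert g
```

Unlike Hamming distance, the two sequences may have different lengths, because insertions and deletions are allowed.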
19
Q

Semantic

A

Built on top of a hierarchy of word semantics, e.g. WordNet (Princeton)

20
Q

Semantic tool WordNet

A

Contains 117,000 synsets (synset: set of one or more synonyms that are interchangeable in some context)

21
Q

Semantic e.g. WUP (Wu and Palmer Distance, 1994)

A

WUP finds the path length from the LCS (Least Common Subsumer) of the two concepts to the root node; this value is scaled by the sum of the path lengths from the original concepts to the root.

22
Q

Semantic e.g. WUP Equation

A

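WUP(C1,C2) = (2·N3) / (N1 + N2 + 2·N3)

  • N1, N2 -> path lengths from C1 and C2 to their LCS
  • N3 -> path length from the LCS to the root node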
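
A worked sketch of the equation on a made-up taxonomy. The hierarchy below is hypothetical (it is not WordNet data), path lengths are counted in edges, and N1 and N2 are taken as the distances from each concept to the LCS with N3 the distance from the LCS to the root; real toolkits may count nodes instead, which changes the numbers.

```python
# Hypothetical toy taxonomy (child -> parent); "entity" is the root.
PARENT = {
    "furniture": "entity",
    "bed": "furniture",
    "chair": "furniture",
    "animal": "entity",
    "dog": "animal",
}

def path_to_root(concept):
    """List of concepts from `concept` up to the root, inclusive."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def wup(c1, c2):
    """Wu and Palmer score: 2*N3 / (N1 + N2 + 2*N3), with lengths counted in edges."""
    path1, path2 = path_to_root(c1), path_to_root(c2)
    lcs = next(node for node in path1 if node in path2)   # deepest shared ancestor
    n1 = path1.index(lcs)                 # edges from c1 up to the LCS
    n2 = path2.index(lcs)                 # edges from c2 up to the LCS
    n3 = len(path_to_root(lcs)) - 1       # edges from the LCS up to the root
    denominator = n1 + n2 + 2 * n3
    return 1.0 if denominator == 0 else (2 * n3) / denominator

siblings = wup("bed", "chair")   # LCS = furniture: 2*1 / (1 + 1 + 2*1) = 0.5
unrelated = wup("bed", "dog")    # LCS = entity (the root): N3 = 0, so the score is 0.0
```

A higher score means the concepts are more closely related, with 1.0 for identical concepts.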

23
Q

WordNet Relationships - Hyponymy

A

(is-a relationship) e.g. furniture -> bed

24
Q

WordNet Relationships - Meronymy

A

(part-of relationship) e.g. chair -> seat

25
Q

WordNet Relationships - Troponymy

A

[for verb hierarchies] (specific manner) e.g. communicate -> talk -> whisper

26
Q

WordNet Relationships - Antonymy

A

(strong contrast) e.g. wet <-> dry

27
Q

Mean - 1D

A

µ = (1/N) Σ x(i)

28
Q

Variance(Spread) - 1D

A

σ^2 = (1/(N-1)) Σ (x(i) - µ)^2

29
Q

Standard Deviation - 1D

A

σ = ((1/(N-1)) Σ (x(i) - µ)^2)^(1/2)
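
The three 1D statistics can be written directly from their formulas. This is a plain sketch with assumed function names and a made-up data set; note the N-1 divisor in the variance.

```python
import math

def mean(xs):
    """µ = (1/N) Σ x(i)"""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """σ^2 = (1/(N-1)) Σ (x(i) - µ)^2"""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)

def sample_std(xs):
    """σ = square root of the sample variance"""
    return math.sqrt(sample_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up measurements
mu = mean(data)                    # 40 / 8 = 5.0
var = sample_variance(data)        # 32 / 7
sigma = sample_std(data)
```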

30
Q

Mean - Multi-Dimensional

A

Calculated independently for each dimension

31
Q

Variance - Multi-Dimensional

A

Computed along each dimension using a Covariance Matrix

32
Q

Covariance Matrix

A
  • Variances on the diagonal
  • Covariances off the diagonal, which also measure correlation between dimensions

33
Q

Covariance Matrix (Correlation)

A
  • Positive covariance means the variables tend to increase together (a proportional relationship)
  • Negative covariance means one variable tends to decrease as the other increases (an inversely proportional relationship)
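
A small sketch of a covariance matrix built by hand, using the N-1 divisor from the 1D variance card. The function name and the three-dimensional data points are assumptions: the second coordinate rises with the first and the third falls as the first rises.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length data points."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]   # mean per dimension
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]

points = [(1, 2, 9), (2, 4, 7), (3, 6, 5), (4, 8, 3)]   # made-up (x, y, z) data
cov = covariance_matrix(points)
variance_x = cov[0][0]     # diagonal entry: variance of the first dimension
cov_xy = cov[0][1]         # positive: y increases with x
cov_xz = cov[0][2]         # negative: z decreases as x increases
```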
34
Q

Eigenvectors and Eigenvalues

A

Av = λv

  • v -> EigenVector
  • λ -> EigenValue
35
Q

Characteristic Equation

A

|A − λI| = 0

  • A -> Square matrix
  • I -> Identity Matrix
  • |·| -> Determinant
36
Q

Determinant of a Matrix

A

|A| = ad − bc, for a 2×2 matrix A with rows (a, b) and (c, d)
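
For a 2×2 matrix, expanding |A − λI| = 0 with the determinant rule gives the quadratic λ^2 − (a + d)λ + (ad − bc) = 0, which can be solved directly. The sketch below assumes real eigenvalues (true for symmetric matrices such as covariance matrices); the function names and the example matrix are illustrative.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c

def eigenvalues_2x2(m):
    """Roots of the characteristic equation, largest first."""
    (a, b), (c, d) = m
    trace = a + d
    discriminant = trace ** 2 - 4 * det2(m)
    if discriminant < 0:
        raise ValueError("eigenvalues are complex for this matrix")
    root = math.sqrt(discriminant)
    return ((trace + root) / 2, (trace - root) / 2)

A = ((2, 1), (1, 2))
larger, smaller = eigenvalues_2x2(A)   # 3.0 and 1.0

# Check Av = λv for the eigenvector v = (1, 1), which belongs to λ = 3.
v = (1, 1)
Av = tuple(row[0] * v[0] + row[1] * v[1] for row in A)   # (3, 3) = 3 * v
```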

37
Q

Eigenvectors define principal axes

A
  • Major axis: eigenvector corresponding to the larger eigenvalue
  • Minor axis: eigenvector corresponding to the smaller eigenvalue
  • Represented using the major and minor axes of an ellipse
38
Q

Data Normalisation Methods

A
  • Rescaling
  • Standardisation (aka z-score)
  • Scaling to unit length
39
Q

(Data Normalisation) Rescaling

A

x’ = (x - min(x)) / (max(x) - min(x))

40
Q

(Data Normalisation) Standardisation

A

x’ = (x-µ) / σ

41
Q

(Data Normalisation) Scaling to Unit Length

A

x’ = x / ||x||
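
The three normalisation methods side by side, as a minimal sketch. The function names and data are assumptions; standardisation here uses the N-1 sample standard deviation, and none of the functions guard against a zero denominator (constant data or a zero vector).

```python
import math

def rescale(xs):
    """x' = (x - min) / (max - min): maps the data onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardise(xs):
    """x' = (x - µ) / σ: zero mean and unit standard deviation (z-score)."""
    mu = sum(xs) / len(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / (len(xs) - 1))
    return [(x - mu) / sigma for x in xs]

def unit_length(v):
    """x' = x / ||x||: scales the vector to Euclidean length 1."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

rescaled = rescale([2, 4, 6])      # [0.0, 0.5, 1.0]
z_scores = standardise([2, 4, 6])  # [-1.0, 0.0, 1.0]
unit = unit_length([3, 4])         # [0.6, 0.8]
```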

42
Q

Data Outliers

A

A small number of points with values significantly different from those of the other points; not always easy to remove.

43
Q

Median

A
  • Difficult to combine: the median of the union of two sets cannot be computed from the two individual medians alone
44
Q

Note - Sample Variance Vs. Variance

A

Both are only estimates computed from a sample; dividing by (N-1) instead of N gives an unbiased estimate of the variance.

45
Q

Normal Distribution

A

N(µ, σ^2)

46
Q

Normal Distribution - Standard Deviation

A
  • 1sd - 68%
  • 2sd - 95%
  • 3sd - 99.7%
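
The coverage figures can be checked numerically: for a normal distribution, the probability of falling within k standard deviations of the mean is erf(k / √2). The sketch below uses only the standard library; the function name is an assumption.

```python
import math

def coverage_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

one_sd = coverage_within(1)     # about 0.683
two_sd = coverage_within(2)     # about 0.954
three_sd = coverage_within(3)   # about 0.997
```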
47
Q

Normal Distribution - Multi-Dimensional

A

N(µ, Σ), where µ is the mean vector and Σ is the covariance matrix