Discrete & Continuous Data Flashcards

1
Q

Types of Attribute

A

Continuous
Ordinal
Nominal

2
Q

Instances aren’t labelled

A

Unsupervised ML

3
Q

Not enough instances are labelled

A

Semi-supervised ML

4
Q

instances are all labelled

A

Supervised ML

5
Q

instances are ordered

A

sequence learning

6
Q

nominal learners

A

NB
1-R
DT

7
Q

continuous learners

A

KNN
NP
SVM

8
Q

Nominal Attributes, but Numeric Learner

A

(1) For k-NN and NP: Hamming distance
(2) randomly assign numbers to attribute values
• If scale is constant between attributes, this is not as bad an idea as it sounds! (But still undesirable)
• Worse with higher-arity attributes (more attribute values)
• Imposes an attribute ordering which may not exist
(3) one-hot encoding

9
Q

Hamming distances

A

the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other
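
A minimal sketch of computing Hamming distance between two equal-length attribute vectors (the function name and example values are illustrative, not from the cards):

    def hamming_distance(a, b):
        # number of positions at which two equal-length sequences differ
        if len(a) != len(b):
            raise ValueError("sequences must have equal length")
        return sum(x != y for x, y in zip(a, b))

    # e.g. two instances described by three nominal attributes
    print(hamming_distance(["sunny", "hot", "high"], ["sunny", "mild", "normal"]))  # 2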

10
Q

one-hot encoding

A

If a nominal attribute takes m values, replace it with m Boolean attributes

Example:
hot = [1, 0, 0]
mild = [0, 1, 0]
cool = [0, 0, 1]
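
A small Python sketch of one-hot encoding a nominal attribute by hand (function and variable names are illustrative):

    def one_hot(value, values):
        # map a nominal value onto m Boolean (0/1) attributes, one per possible value
        return [1 if value == v else 0 for v in values]

    temps = ["hot", "mild", "cool"]
    print(one_hot("hot", temps))   # [1, 0, 0]
    print(one_hot("mild", temps))  # [0, 1, 0]
    print(one_hot("cool", temps))  # [0, 0, 1]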

11
Q

Pros & Cons of one-hot encoding

A

Pro: solves the problem of using nominal attributes with a continuous learner

Con: massively increases the feature space

12
Q

Numeric Attributes, but Nominal Learner

A

(1) NB
(2) DT
(3) 1-R

Discretisation

13
Q

Types of Naive Bayes

A

• Multivariate NB: attributes are nominal, and can take any (fixed) number of values

• Binomial (or Bernoulli) NB: attributes are binary (a special case of multivariate NB)

• Multinomial NB: attributes are natural numbers, corresponding to frequencies

• Gaussian NB: attributes are real numbers; use a Probability Density Function
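
For reference, scikit-learn provides classifiers that roughly correspond to these variants; a minimal sketch assuming scikit-learn is installed (the toy data is illustrative):

    from sklearn.naive_bayes import CategoricalNB, BernoulliNB, MultinomialNB, GaussianNB

    # CategoricalNB  ~ multivariate NB (nominal attributes, integer-encoded)
    # BernoulliNB    ~ binomial/Bernoulli NB (binary attributes)
    # MultinomialNB  ~ multinomial NB (count/frequency attributes)
    # GaussianNB     ~ Gaussian NB (real-valued attributes, per-class normal PDFs)

    X = [[1.2, 0.5], [0.8, 0.3], [3.1, 2.2], [2.9, 1.8]]  # toy real-valued features
    y = [0, 0, 1, 1]
    print(GaussianNB().fit(X, y).predict([[1.0, 0.4]]))   # -> [0]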

14
Q

numeric attributes for DT

A

(1) Binarisation

(2) Range

15
Q

Binarisation

A

Each node is labelled with a_k and has two branches: one branch is a_k ≤ m, one branch is a_k > m.

Info Gain/Gain Ratio must be calculated for each non-trivial “split point” for each attribute
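
A rough sketch of enumerating candidate split points for one numeric attribute and scoring each by information gain (the entropy helper, function name and toy data are illustrative):

    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def best_binary_split(values, labels):
        # try a_k <= m vs a_k > m at each midpoint between adjacent distinct sorted values
        pairs = sorted(zip(values, labels))
        base = entropy(labels)
        best = None
        for i in range(1, len(pairs)):
            if pairs[i - 1][0] == pairs[i][0]:
                continue  # not a valid split point
            m = (pairs[i - 1][0] + pairs[i][0]) / 2
            left = [l for v, l in pairs if v <= m]
            right = [l for v, l in pairs if v > m]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
            if best is None or gain > best[1]:
                best = (m, gain)
        return best  # (split point m, info gain)

    print(best_binary_split([64, 65, 68, 70, 71], ["no", "no", "yes", "yes", "yes"]))  # (66.5, ~0.97)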

16
Q

Con of Binarisation

A

leads to arbitrarily large trees

17
Q

Discretisation

A

the translation of continuous attributes into nominal attributes

Steps:

  1. decide on the interval (out-of-scope)
  2. map each continuous value onto a discrete value

Types:

  1. Unsupervised (does not know/use the class label)
  2. Supervised (knows/uses the class label)

18
Q

Unsupervised Discretisation

A

(1) Naive
(2) Equal Size
(3) Equal Frequency
(4) K-Means Clustering

19
Q

Naive Unsupervised Discretisation

A

treat each unique value as a discrete nominal value

20
Q

Pros & Cons of Naive Unsupervised Discretisation

A

Advantages:
• simple to implement

Disadvantages:
• loss of generality
• no sense of ordering
• describes the training data, but nothing more (overfitting)

21
Q

Equal Size Unsupervised Discretisation

A

Identify the upper and lower bounds and partition the overall space into n equal intervals = equal width

min = 64
max = 83
intervals (n = 3): 64-70, 71-76, 77-83
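
A quick sketch of equal-width binning (the function name, bin count and values are illustrative):

    def equal_width_bins(values, n):
        # assign each value a bin index 0..n-1 over n equal-width intervals
        lo, hi = min(values), max(values)
        width = (hi - lo) / n
        return [min(int((v - lo) / width), n - 1) for v in values]  # clamp the max into the last bin

    temps = [64, 65, 68, 70, 71, 75, 80, 83]
    print(equal_width_bins(temps, 4))  # [0, 0, 0, 1, 1, 2, 3, 3]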

22
Q

Pros & Cons of Equal Size Unsupervised Discretisation

A

Advantages:
• simple

Disadvantages:
• badly affected by outliers
• arbitrary n

23
Q

Equal Frequency Unsupervised Discretisation

A

Sort the values, and identify breakpoints which produce n (roughly) equal-sized partitions = equal frequency

1st bin: 1st-4th instances
2nd bin: 5th-8th instances
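
A rough sketch of equal-frequency binning (the function name and values are illustrative):

    def equal_frequency_bins(values, n):
        # sort the values and cut them into n (roughly) equal-sized groups
        order = sorted(range(len(values)), key=lambda i: values[i])
        size = len(values) / n
        bins = [0] * len(values)
        for rank, i in enumerate(order):
            bins[i] = min(int(rank / size), n - 1)
        return bins

    temps = [83, 64, 70, 71, 68, 75, 80, 65]
    print(equal_frequency_bins(temps, 2))  # the 4 smallest values land in bin 0, the rest in bin 1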

24
Q

Pros & Cons of Equal Frequency Unsupervised Discretisation

A

Advantages:
• simple

Disadvantages:
• arbitrary n

25
Q

K-Means Clustering

A

(1) Select k points at random (or otherwise) to act as seed clusters
(2) Assign each instance to the cluster with the nearest centroid
(3) Recompute the seed points as the centroids of the clusters of the current partition (the centroid is the centre, i.e. mean point, of the cluster)
(4) Go back to (2); stop when there are no reassignments (convergence)

It may or may not converge, but fast convergence is fairly typical. One typical improvement runs k-means multiple times (with random seeds), looking for a common clustering, and simply ignores runs which don’t converge within τ iterations.
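
A compact sketch of this loop on 1-D data (pure Python; names are illustrative, and a real implementation would also handle empty clusters and multiple restarts):

    import random

    def k_means_1d(xs, k, max_iter=100):
        centroids = random.sample(xs, k)                        # (1) pick k seed points
        for _ in range(max_iter):
            clusters = [[] for _ in range(k)]
            for x in xs:                                        # (2) assign to nearest centroid
                clusters[min(range(k), key=lambda j: abs(x - centroids[j]))].append(x)
            new_centroids = [sum(c) / len(c) if c else centroids[j]
                             for j, c in enumerate(clusters)]   # (3) recompute centroids (means)
            if new_centroids == centroids:                      # (4) stop when nothing moves
                break
            centroids = new_centroids
        return centroids, clusters

    print(k_means_1d([64, 65, 68, 70, 71, 80, 81, 83], k=2))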

26
Q

Pros & Cons of K-Means Clustering

A

Strengths:
• relatively efficient: O(tkn), where n is # instances, k is # clusters, and t is # iterations; normally k, t ≪ n

Weaknesses:
• tends to converge to a local minimum; sensitive to seed instances
• need to specify k in advance
• not able to handle non-convex clusters
• “mean” is ill-defined for nominal attributes

27
Q

Supervised Discretisation

A

(1) Naive
(2) v1 improvement
(3) v2 improvement

28
Q

Naive Supervised Discretisation

A

“Group” values into class-contiguous intervals

Steps:
1. Sort the values, and identify breakpoints in class membership
2. Reposition any breakpoints where there is no change in numeric value
3. Set the breakpoints midway between the neighbouring values
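
A rough sketch of steps 1–3 on a toy value/label list (the function name and data are illustrative; breakpoints with no change in numeric value are simply skipped here rather than repositioned):

    def class_breakpoints(values, labels):
        # sort by value, find class-membership changes, place breakpoints midway between neighbours
        pairs = sorted(zip(values, labels))
        breaks = []
        for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
            if c1 != c2 and v1 != v2:
                breaks.append((v1 + v2) / 2)
        return breaks

    print(class_breakpoints([64, 65, 68, 70, 71], ["no", "no", "yes", "yes", "no"]))  # [66.5, 70.5]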

29
Q

Pros & Cons of Naive Supervised Discretisation

A

Advantages:
• simple to implement

Disadvantages:
• no sense of ordering
• usually creates too many categories (overfitting)

30
Q

Improvement on Naive Supervised Discretisation

A

v1: delay inserting a breakpoint until each “cluster” contains at least n instances of the majority class

v2: merge neighbouring clusters until they reach a certain size / contain at least n instances of the majority class

31
Q

Probability Mass Function (PMF)

A

For a discrete random variable X that takes on a finite or countably infinite number of possible values, we determine P(X = x) for all of the possible values of X, and call this the probability mass function.

32
Q

Probability Density Function (PDF)

A

For continuous random variables, the probability that X takes on any particular value x is 0. That is, finding P(X = x) for a continuous random variable X is not going to work. Instead, we'll need to find the probability that X falls in some interval (a, b), that is, we'll need to find P(a < X < b). We'll do that using a probability density function.

33
Q

a popular PDF

A

Gaussian/normal distribution

34
Q

Gaussian distribution

A

• symmetric about the mean
• area under the curve = 1
• to estimate the probability, we need the mean µ and standard deviation σ of a distribution X
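
A minimal sketch of the normal density f(x) = 1/(σ√(2π)) · exp(-(x - µ)² / (2σ²)), i.e. the PDF a Gaussian NB would evaluate (the example values are illustrative):

    from math import sqrt, pi, exp

    def gaussian_pdf(x, mu, sigma):
        # normal density with mean mu and standard deviation sigma
        return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

    print(gaussian_pdf(73, mu=73, sigma=6.2))  # density is highest at the mean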

35
Q

Why Gaussians?

A

• In practice, a normal distribution is a reasonable approximation for many events
• This is a consequence of the Central Limit Theorem
• More careful analysis shows that the mean is almost always normally distributed, but outliers can wreak havoc on our probability estimates