Exam 1 Flashcards

1
Q

How does the conda package manager help?

A

It automatically resolves and installs a package's dependencies for you.

2
Q

How does the conda environment manager help?

A

It lets us create isolated environments, each with its own dependencies, for different projects.

3
Q

What two things does data consist of?

A

Objects and attributes.

4
Q

Properties of attributes

A
  1. Distinctness
  2. Order
  3. Meaningful differences
  4. Meaningful ratios
5
Q

Names of the attribute types, in order

A
  1. Nominal
  2. Ordinal
  3. Interval
  4. Ratio
6
Q

Characteristics of a dataset

A

Dimensionality – Number of attributes in a dataset.
Size – Number of objects (rows).
Sparsity – Fraction of values that are zero or missing.
Resolution – Level of detail in the data.
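
As a rough illustration, three of these characteristics can be computed directly from a small table. The dataset below is made up, and treating `None` as "missing" is an assumption of this sketch.

```python
# Hypothetical toy dataset: each row is an object, each column an attribute.
# None marks a missing value (an assumption of this sketch).
data = [
    [5.1, 0.0, 3, None],
    [4.9, 0.0, 0, 1.2],
    [6.3, 2.5, 0, 0.0],
]

size = len(data)               # number of objects (rows)
dimensionality = len(data[0])  # number of attributes (columns)

# Sparsity here: fraction of cells that are zero or missing.
cells = [value for row in data for value in row]
empty = sum(1 for value in cells if value is None or value == 0)
sparsity = empty / len(cells)

print(size, dimensionality, sparsity)
```

Resolution is not computed here because it depends on how the data was recorded (for example, hourly vs. daily readings), not on the table's shape.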

7
Q

Issues with data

A

Noise – Random errors or meaningless variations.
Outliers – Values that are very different from the rest.
Missing values – Data that isn’t recorded.
Duplicate data – Repeated entries in the dataset.
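
Two of these issues are easy to check for in code. A minimal sketch, assuming made-up records where `None` stands for a missing value:

```python
# Hypothetical records: (name, age). None marks a missing value.
rows = [
    ("alice", 34),
    ("bob", None),
    ("alice", 34),  # exact duplicate of the first row
    ("carol", 29),
]

# Missing values: rows where any field was not recorded.
missing = [row for row in rows if any(value is None for value in row)]

# Duplicate data: keep only the first occurrence of each row.
seen = set()
unique = []
for row in rows:
    if row not in seen:
        seen.add(row)
        unique.append(row)

print(missing)
print(unique)
```

Noise and outliers usually need a statistical rule or domain knowledge to detect, so they are left out of this sketch.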

8
Q

Distance and similarity measures and their uses

A

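Euclidean distance – Straight-line distance; used for continuous variables.
Manhattan distance – Sum of absolute differences; models grid-based movement (a taxi driving the street grid in Manhattan).
Chebyshev distance – Largest absolute difference along any attribute; equals the number of moves a chess king needs between two squares.
Mahalanobis distance – Accounts for attribute scale and correlation by using the covariance matrix.
SMC (Simple Matching Coefficient) – Similarity for binary attributes; counts both 1-1 and 0-0 matches.
Jaccard distance – One minus the Jaccard similarity of two sets (e.g., shared words in two documents); for binary attributes it ignores 0-0 matches.
Correlation – Measures the linear relationship between variables.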
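
Several of these measures fit in a line or two of Python. The function names and sample inputs below are illustrative only, and the sketch assumes equal-length vectors and non-empty sets.

```python
import math

def euclidean(x, y):
    # Straight-line distance between two points.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # Sum of absolute differences along each attribute.
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    # Largest absolute difference along any single attribute.
    return max(abs(a - b) for a, b in zip(x, y))

def smc(x, y):
    # Fraction of positions where two binary vectors agree (1-1 or 0-0).
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)

def jaccard_similarity(a, b):
    # Shared items divided by all distinct items; distance is 1 minus this.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

p, q = (0, 0), (3, 4)
print(euclidean(p, q))   # 5.0
print(manhattan(p, q))   # 7
print(chebyshev(p, q))   # 4
print(smc([1, 0, 0, 1], [1, 0, 1, 1]))  # 0.75
print(jaccard_similarity({"data", "mining", "rocks"},
                         {"data", "mining", "rules"}))  # 0.5
```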

9
Q

Minkowski distance

A

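A general distance formula with a parameter r: d(x, y) = (sum over k of |x_k - y_k|^r)^(1/r). It includes several distances as special cases: r = 1 gives Manhattan, r = 2 gives Euclidean, and r → ∞ gives Chebyshev.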
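
A minimal Python sketch of the Minkowski distance, where the parameter `r` (a name chosen here for illustration) selects the special case: 1 for Manhattan, 2 for Euclidean, and infinity for Chebyshev.

```python
def minkowski(x, y, r):
    # r = infinity is handled as the limiting case: the largest difference.
    if r == float("inf"):
        return max(abs(a - b) for a, b in zip(x, y))
    # General case: r-th root of the sum of r-th powers of the differences.
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1 / r)

p, q = (0, 0), (3, 4)
print(minkowski(p, q, 1))             # Manhattan: 7.0
print(minkowski(p, q, 2))             # Euclidean: 5.0
print(minkowski(p, q, float("inf")))  # Chebyshev: 4
```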

10
Q

Define machine learning

A

Machine learning is the process of making computers learn patterns from data without being explicitly programmed.

11
Q

General Strategy of Machine Learning

A

Collect data.
Train a model using data.
Evaluate performance.
Use the model for predictions.
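
The four steps can be sketched end to end in plain Python. This is only an illustration: the data is synthetic (it follows y = 2x + 1 exactly), the model is a straight line fitted by least squares, and the helper names are made up.

```python
# 1. Collect data: hypothetical (x, y) pairs that follow y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(10)]
train, test = data[:7], data[7:]  # hold out the last 3 points for evaluation

# 2. Train a model: fit slope and intercept by ordinary least squares.
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train)

def predict(x):
    return slope * x + intercept

# 3. Evaluate performance: mean absolute error on the held-out points.
mae = sum(abs(predict(x) - y) for x, y in test) / len(test)

# 4. Use the model for predictions on a new input.
print(slope, intercept, mae, predict(20))
```

Evaluating on points the model never saw during training is what makes step 3 meaningful; real data would not fit this perfectly.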

12
Q

Supervised vs. unsupervised learning

A

Supervised learning – Has labeled data (e.g., spam detection).
Unsupervised learning – No labels; finds patterns (e.g., clustering customers).

13
Q

Classification vs. Regression

A

Classification – Predicts categories (e.g., cat vs. dog).
Regression – Predicts continuous values (e.g., stock prices).

14
Q

Explain decision trees

A

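A model that makes decisions by repeatedly splitting the data on attribute values, forming a tree of tests that ends in predictions at the leaves.
Used mainly for classification; also works for regression.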
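
The smallest possible tree is a single split, sometimes called a decision stump. The sketch below is a simplification: it picks the threshold with the fewest misclassified training points, whereas real decision tree learners typically score splits with measures such as Gini impurity or entropy. The weights and labels are made up.

```python
def fit_stump(values, labels):
    # Try every observed value as a threshold and every left/right labeling;
    # keep the split that misclassifies the fewest training points.
    classes = sorted(set(labels))
    best = None
    for threshold in sorted(set(values)):
        for left in classes:
            for right in classes:
                if left == right:
                    continue
                errors = sum(
                    1 for v, label in zip(values, labels)
                    if (left if v <= threshold else right) != label
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, left, right)
    return best[1:]

# Hypothetical training data: animal weight in kg and its class.
weights = [3.0, 4.0, 5.0, 20.0, 25.0, 30.0]
animals = ["cat", "cat", "cat", "dog", "dog", "dog"]

threshold, left, right = fit_stump(weights, animals)

def classify(weight):
    # One test on one attribute: the whole "tree" is a single split.
    return left if weight <= threshold else right

print(threshold, classify(4.5), classify(22.0))  # 5.0 cat dog
```

A full tree repeats this search inside each side of the split until the groups are pure enough or a depth limit is reached.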

15
Q

Explain support vector machines (SVM)

A

A model that finds the boundary (hyperplane) separating the classes with the largest margin.
Used, for example, in image recognition.

16
Q

Explain clustering

A

Groups similar objects together.
Example: Finding customer segments in a business.
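
One common clustering method is k-means, which the card does not name; it is used here only as an example. The sketch works on a single attribute, and the spending figures and starting centroids are made up.

```python
def kmeans_1d(points, centroids, iterations=10):
    # Repeat: assign each point to its nearest centroid, then move each
    # centroid to the mean of the points assigned to it.
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            sum(c) / len(c) if c else centroids[i]  # keep empty clusters in place
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical monthly spend per customer; two obvious segments.
spend = [10, 12, 14, 100, 110, 120]
centroids, clusters = kmeans_1d(spend, centroids=[10, 120])

print(centroids)  # [12.0, 110.0]
print(clusters)   # [[10, 12, 14], [100, 110, 120]]
```

No labels are given to the algorithm; the low-spend and high-spend segments emerge from the distances alone.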