Exam 1 Flashcards

Question 1

Q

How does the conda package manager help?

Answer

A

It automatically resolves and installs the dependencies that each package needs.

Question 2

Q

How does the conda environment manager help?

Answer

A

It lets us create isolated environments, each with its own dependencies, for different projects.

Question 3

Q

What two things does data consist of?

Answer

A

Objects and attributes.

Question 4

Q

Properties of attributes

Answer

A

Distinctness
Order
Meaningful differences
Meaningful ratios

Question 5

Q

Names of the types of attributes, in order

Answer

A

Nominal
Ordinal
Interval
Ratio

Question 6

Q

Characteristics of a dataset

Answer

A

Dimensionality – Number of attributes in a dataset.
Size – Number of objects (rows).
Sparsity – Fraction of values that are zero or missing.
Resolution – Level of detail in the data.
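
As a rough illustration, three of these characteristics can be computed directly from a table. The sketch below uses a made-up dataset (the values are hypothetical) and treats sparsity as the fraction of zero entries; resolution depends on how the data was measured, so it is not computed here.

```python
# Hypothetical toy dataset: each row is an object, each column an attribute.
rows = [
    [5.1, 0.0, 3.0],
    [0.0, 0.0, 1.5],
    [4.7, 2.0, 0.0],
    [0.0, 0.0, 0.0],
]

size = len(rows)                 # number of objects (rows)
dimensionality = len(rows[0])    # number of attributes (columns)

# Sparsity, taken here as the fraction of entries that are zero.
total_values = size * dimensionality
zero_values = sum(1 for row in rows for value in row if value == 0)
sparsity = zero_values / total_values

print(size, dimensionality, round(sparsity, 3))
```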

Question 7

Q

Issues with data

Answer

A

Noise – Random errors or meaningless variations.
Outliers – Values that are very different from the rest.
Missing values – Data that isn’t recorded.
Duplicate data – Repeated entries in the dataset.
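
A minimal sketch of how three of these issues might be detected, using hypothetical records. The outlier rule (more than 3 median absolute deviations from the median) is an assumption chosen for illustration, not the only possible definition. Noise is not shown because it generally cannot be identified from the values alone.

```python
from collections import Counter
from statistics import median

# Hypothetical (name, age) records; None marks a missing value.
records = [
    ("ann", 34),
    ("bob", None),
    ("ann", 34),   # repeated entry
    ("cy", 29),
    ("dee", 31),
    ("eli", 120),  # suspiciously large age
]

# Missing values: records whose age was not recorded.
missing_names = [name for name, age in records if age is None]

# Duplicate data: records that appear more than once.
duplicates = [rec for rec, count in Counter(records).items() if count > 1]

# Outliers: ages far from the median, measured in median absolute deviations.
ages = [age for _, age in records if age is not None]
center = median(ages)
mad = median([abs(age - center) for age in ages])
outliers = [age for age in ages if abs(age - center) > 3 * mad]

print(missing_names, duplicates, outliers)
```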

Question 8

Q

Mapping distance measures

Answer

A

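Euclidean distance – Straight-line distance; used for continuous variables.
Manhattan distance – Sum of absolute differences along each axis; models grid-based movement (a taxi driving the street grid in Manhattan).
Chebyshev distance – Largest difference along any single axis; equals the number of moves a chess king needs.
Mahalanobis distance – Accounts for correlations between attributes and for their differing scales.
SMC (Simple Matching Coefficient) – Used for binary attributes; counts both 1-1 and 0-0 matches.
Jaccard coefficient – Measures similarity of sets (e.g., common words in two documents); Jaccard distance is 1 minus the coefficient.
Correlation – Measures the linear relationship between variables.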
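
Several of these measures are short enough to write out. The sketch below uses the standard textbook definitions; the function names and sample inputs are illustrative. Mahalanobis distance and correlation are left out because they need covariance estimates from a larger sample.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute differences along each axis."""
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    """Largest absolute difference along any single axis."""
    return max(abs(a - b) for a, b in zip(x, y))

def smc(x, y):
    """Simple Matching Coefficient for equal-length binary vectors."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)

def jaccard_similarity(s, t):
    """Shared items divided by all items; assumes the union is non-empty."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)

p, q = (0, 0), (3, 4)
print(euclidean(p, q), manhattan(p, q), chebyshev(p, q))

doc_a = {"data", "mining", "is", "fun"}
doc_b = {"data", "science", "is", "fun"}
print(smc([1, 0, 0, 1], [1, 1, 0, 0]), jaccard_similarity(doc_a, doc_b))
```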

Question 9

Q

Minkowski distance

Answer

A

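A general formula for distance calculations, d(x, y) = (sum over k of |x_k - y_k|^r)^(1/r), that includes multiple types as special cases: r = 1 gives Manhattan distance, r = 2 gives Euclidean distance, and r → ∞ gives Chebyshev distance.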
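
A small sketch of the Minkowski distance with parameter r. The function name and the sample points are illustrative; the r = infinity case is handled separately as the limit of the formula.

```python
import math

def minkowski(x, y, r):
    """Minkowski distance of order r between two equal-length points."""
    if r == math.inf:
        # Limit as r grows: only the largest difference survives.
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1 / r)

p, q = (0, 0), (3, 4)
manhattan_d = minkowski(p, q, 1)          # r = 1
euclidean_d = minkowski(p, q, 2)          # r = 2
chebyshev_d = minkowski(p, q, math.inf)   # r -> infinity
print(manhattan_d, euclidean_d, chebyshev_d)
```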

Question 10

Q

Define machine learning

Answer

A

Machine learning is the process of making computers learn patterns from data without being explicitly programmed.

Question 11

Q

General Strategy of Machine Learning

Answer

A

Collect data.
Train a model using data.
Evaluate performance.
Use the model for predictions.
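
The four steps can be sketched end to end with a very small model. The example below assumes a hypothetical dataset of (hours studied, exam score) pairs and uses a least-squares line as the model; any other model would follow the same collect, train, evaluate, predict sequence.

```python
# 1. Collect data: hypothetical (hours studied, exam score) pairs.
data = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 70), (6, 73)]
train, held_out = data[:4], data[4:]

# 2. Train a model: fit a straight line by ordinary least squares.
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(train)

def predict(x):
    return slope * x + intercept

# 3. Evaluate performance: mean absolute error on data not used for training.
mae = sum(abs(predict(x) - y) for x, y in held_out) / len(held_out)

# 4. Use the model for predictions on a new input.
new_prediction = predict(7)

print(slope, intercept, mae, new_prediction)
```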

Question 12

Q

Supervised vs. unsupervised learning

Answer

A

Supervised Learning – Has labeled data (e.g., spam detection).
Unsupervised Learning – No labels; finds patterns (e.g., clustering customers).

Question 13

Q

Classification vs. Regression

Answer

A

Classification – Predicts categories (e.g., cat vs. dog).
Regression – Predicts continuous values (e.g., stock prices).

Question 14

Q

Explain decision trees

Answer

A

A model that makes decisions by repeatedly splitting the data based on attribute values, forming a tree of tests.
Commonly used for classification; it can also be used for regression.
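
To make the splitting idea concrete, here is a sketch of a single split (a one-level tree, often called a decision stump). It assumes Gini impurity as the split criterion, which is one common choice and not something the card specifies; the animal weights are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try midpoints between sorted values; return (threshold, impurity)."""
    best = None
    candidates = sorted(set(values))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        # Weighted average impurity of the two groups after the split.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical training data: animal weight in kg and its label.
weights = [3, 4, 5, 20, 25, 30]
animals = ["cat", "cat", "cat", "dog", "dog", "dog"]

threshold, impurity = best_split(weights, animals)

def classify(weight):
    return "cat" if weight <= threshold else "dog"

print(threshold, impurity, classify(6), classify(18))
```

A full decision tree would apply the same search recursively to each side of the split until the groups are pure or some stopping rule is met.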

Question 15

Q

Explain support vector machines (SVM)

Answer

A

A model that finds the boundary between classes with the largest margin, that is, the greatest distance to the nearest points of each class.
Used, for example, in image recognition.
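
The sketch below illustrates what "best boundary" means in terms of margin. It does not train an SVM; it only compares two hypothetical candidate lines on made-up points and keeps the one whose closest point is farther away. A real SVM searches over all possible boundaries for the largest such margin.

```python
import math

# Hypothetical 2-D points with class labels -1 and +1.
points = [((1, 1), -1), ((2, 1), -1), ((4, 4), +1), ((5, 4), +1)]

def margin(w, b):
    """Smallest signed distance from any point to the line w.x + b = 0.

    A positive result means every point is on its correct side.
    """
    norm = math.sqrt(sum(wi ** 2 for wi in w))
    return min(
        label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, label in points
    )

# Two candidate boundaries, chosen by hand for illustration.
candidates = {
    "A": ((1, 1), -5.0),   # the line x + y = 5
    "B": ((1, 0), -3.5),   # the line x = 3.5
}

margins = {name: margin(w, b) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, best)
```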

Question 16

Q

Explain clustering

Answer

A

Groups similar objects together.
Example: Finding customer segments in a business.
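
As one possible illustration, the customer-segment example can be sketched with k-means, a common clustering algorithm (the card itself does not name one). The spending figures and the starting centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small k-means for one-dimensional data."""
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical monthly spend per customer, with no labels attached.
spend = [12, 15, 14, 80, 85, 90]

# Starting centers are an assumption; here the smallest and largest values.
centers, clusters = kmeans_1d(spend, centers=[12, 90])
print(centers, clusters)
```

The algorithm is never told which customers are "low" or "high" spenders; the two groups emerge from the distances alone, which is what makes this unsupervised.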
