Intro Flashcards

1
Q

What is machine learning?

A

Machine learning is programming a computer to optimize a performance criterion using sample data or past experience.

2
Q

How is machine learning defined formally?

A

“A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.” (Tom Mitchell)

3
Q

Suppose your email program watches which email you do or do not mark as spam, and based on that learns how to better filter spam. What is task T in this setting?

A

1) Classifying email as spam or not: this is the task T.
2) Watching you label email as spam or not: this is the experience E.
3) The number (or fraction) of emails correctly classified as spam or not: this is the performance measure P.

4
Q

Data preparation steps applied before ML

A

1) Data Cleaning
2) Data Pre-processing
3) Data transformation/Normalization
4) Data Integration
5) Multidimensional data issues

5
Q

Data Cleaning

A

Data cleansing, or data cleaning, is the process of detecting and correcting corrupt or inaccurate records in a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.
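
A minimal sketch of the idea, assuming a small list of hand-made records (the field names `id`, `age`, and `city` and the validity rules are invented for illustration). It deletes duplicate and invalid records and tidies the rest:

```python
# Hypothetical raw records; an empty string marks a missing value.
raw_records = [
    {"id": 1, "age": "34", "city": "Pune"},
    {"id": 2, "age": "", "city": "Delhi"},        # incomplete: age missing
    {"id": 3, "age": "-5", "city": "Mumbai"},     # inaccurate: negative age
    {"id": 1, "age": "34", "city": "Pune"},       # duplicate of id 1
    {"id": 4, "age": "41", "city": " chennai "},  # untidy formatting
]


def clean(records):
    """Drop duplicate and invalid rows, and tidy the remaining values."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        if rec["id"] in seen_ids:
            continue  # delete a duplicate record
        try:
            age = int(rec["age"])
        except ValueError:
            continue  # delete a record whose age is missing or not a number
        if age < 0:
            continue  # delete a record with an impossible age
        seen_ids.add(rec["id"])
        cleaned.append(
            {"id": rec["id"], "age": age, "city": rec["city"].strip().title()}
        )
    return cleaned


cleaned_records = clean(raw_records)
```

Only records 1 and 4 survive here. Real cleaning pipelines often repair values instead of deleting rows; deletion just keeps the sketch short.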

6
Q

Data Pre-processing

A

Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues. Data preprocessing prepares raw data for further processing.

Examples: data preprocessing is used in database-driven applications such as customer relationship management and in machine learning applications such as neural networks.

7
Q

Data transformation/Normalization

A

Data transformation is the process of converting data from one format or structure into another format or structure. It is a fundamental aspect of most data integration and data management tasks such as data wrangling, data warehousing, data integration and application integration.
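
The card title also mentions normalization, a common transformation that rescales numeric values to a shared range. A minimal sketch of min-max normalization to [0, 1], assuming a plain list of numbers (the salary figures are invented):

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them all to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


salaries = [30000, 45000, 60000, 90000]  # invented example data
scaled = min_max_normalize(salaries)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Min-max scaling is only one option; z-score standardization is another common choice.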

8
Q

Data Integration

A

Data integration is the process of combining data from different sources into a single, unified view. Integration begins with the ingestion process, and includes steps such as cleansing, ETL mapping, and transformation.
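
A minimal sketch of combining two sources into one view, assuming two hypothetical in-memory tables that name their key column differently (`customer_id` versus `cust`). All names and values are invented for illustration:

```python
# Two hypothetical sources describing the same customers.
crm_customers = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ravi"},
]
billing_rows = [
    {"cust": 1, "total_spent": 120.0},
    {"cust": 3, "total_spent": 75.5},
]


def integrate(customers, billing):
    """Build one unified record per customer id found in either source."""
    unified = {}
    for row in customers:
        unified[row["customer_id"]] = {
            "customer_id": row["customer_id"],
            "name": row["name"],
            "total_spent": None,  # unknown until billing data is merged in
        }
    for row in billing:
        # Map the billing key name ("cust") onto the unified key name.
        entry = unified.setdefault(
            row["cust"],
            {"customer_id": row["cust"], "name": None, "total_spent": None},
        )
        entry["total_spent"] = row["total_spent"]
    return [unified[key] for key in sorted(unified)]


unified_view = integrate(crm_customers, billing_rows)
```

Customers that appear in only one source keep `None` in the missing field, which is roughly what an outer join would produce.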

9
Q

Multidimensional data issues

A

In statistics, econometrics, and related fields, multidimensional analysis (MDA) is a data analysis process that groups data into two categories: data dimensions and measurements. For example, a data set consisting of the number of wins for a single football team at each of several years is a single-dimensional (in this case, longitudinal) data set. A data set consisting of the number of wins for several football teams in a single year is also a single-dimensional (in this case, cross-sectional) data set. A data set consisting of the number of wins for several football teams over several years is a two-dimensional data set.

10
Q

Association rule mining

A

Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.

11
Q

Market basket analysis

A

Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items.

12
Q

method of association rule mining

A

Apriori algorithm
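
A simplified sketch of the Apriori idea for finding frequent itemsets, assuming transactions are small lists of item names (the baskets below are invented). It grows itemsets level by level and prunes any candidate that has an infrequent subset. It is not an optimized implementation:

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = {item for basket in baskets for item in basket}
    current = {frozenset([item]) for item in items}
    current = {c for c in current if support(c) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


transactions = [  # invented example baskets
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "milk", "butter"],
]
frequent_itemsets = apriori(transactions, min_support=0.6)
```

With a minimum support of 0.6, the frequent itemsets are {bread}, {milk}, {butter}, {bread, milk}, and {bread, butter}. The triple {bread, milk, butter} is pruned because {milk, butter} is not frequent.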

13
Q

frequent patterns

A

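Frequent patterns answer questions such as: what items are frequently purchased together in your SaveMart?
A typical association rule: Bread -> Milk, with confidence
P(Milk | Bread) = 0.7
That is, 70% of the baskets that contain bread also contain milk.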
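
A minimal sketch of how such a conditional probability can be estimated from transaction data. The baskets are invented so that 7 of the 10 baskets containing bread also contain milk:

```python
def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent | antecedent) as a fraction of baskets."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0  # the rule never fires, so there is nothing to estimate
    both = sum(consequent in b for b in with_antecedent)
    return both / len(with_antecedent)


# Invented data: 10 baskets with bread (7 also have milk), 2 with milk only.
baskets = [{"bread", "milk"}] * 7 + [{"bread"}] * 3 + [{"milk"}] * 2

print(confidence(baskets, "bread", "milk"))  # 0.7
```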

14
Q

types of learning

A

1) supervised

2) unsupervised

15
Q

supervised learning

A

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.

16
Q

unsupervised learning

A

Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or grouping in data.

17
Q

Classification and Label Prediction

A
Construct models based on some training examples
Describe and distinguish classes or concepts for future prediction
Predict some unknown class labels
18
Q

Typical methods of classification

A

Decision trees, Naïve Bayes, SVM, Neural Networks

19
Q

Applications of classification

A

Sentiment Classification
Spam Classification
Disease Prediction
Match result prediction

20
Q

Credit Card Approval

A
An example of classification.
A credit card company receives thousands of applications for new cards. Each application contains information about the applicant:
Age
Marital status
Annual salary
Outstanding debts
Credit rating
etc.
Problem: decide whether an application should be approved or not.
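
A minimal sketch of learning such a decision from labeled examples, assuming a single feature (annual salary) and a one-level decision tree, also called a decision stump. The training data is invented, and a real system would use many features:

```python
def fit_stump(feature_values, labels):
    """Pick the threshold t for which 'value >= t means approved' makes the fewest errors."""
    best_threshold, best_errors = None, len(labels) + 1
    for t in sorted(set(feature_values)):
        # Count training examples that the rule 'value >= t' gets wrong.
        errors = sum((v >= t) != y for v, y in zip(feature_values, labels))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


# Invented training data: annual salary in thousands, and whether the card was approved.
salaries = [20, 25, 40, 55, 60, 80]
approved = [False, False, False, True, True, True]

threshold = fit_stump(salaries, approved)


def predict(salary):
    """Classify a new application using the learned threshold."""
    return salary >= threshold


print(threshold)    # 55
print(predict(70))  # True
print(predict(30))  # False
```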
21
Q

Loan Application Example

A

classification example

22
Q

Regression

A

Linear regression is a supervised machine learning algorithm that performs a regression task: it models a continuous target value as a function of one or more independent variables. It is mostly used for finding relationships between variables and for forecasting. Regression models differ in the kind of relationship they assume between the dependent and independent variables and in the number of independent variables they use.

23
Q

regression example

A

Predicting the price of a house given its area.
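
A minimal sketch of this example using ordinary least squares with one input variable. The areas and prices are invented, and chosen so that price is exactly 3 times the area:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


areas = [50, 80, 100, 120]     # square metres (invented)
prices = [150, 240, 300, 360]  # thousands (invented)

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90 + intercept  # estimate for a 90 square metre house
```

For this data the fitted slope is 3 and the intercept is 0, so the estimate for 90 square metres is 270.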

24
Q

Clustering

A

Unsupervised learning
Class label is unknown
Group data to form clusters

25
Q

Clustering principle

A

Maximize inter-class difference and minimize intra-class difference

26
Q

Methods of clustering

A

K-Means Clustering

Cobweb
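
A minimal sketch of K-Means (Lloyd's algorithm) on 2-D points, assuming the initial centroids are supplied by the caller. The points are invented, and practical implementations usually pick the starting centroids at random or with a seeding heuristic:

```python
def k_means(points, centroids, max_iterations=10):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: the assignments are stable
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]  # invented data
final_centroids, final_clusters = k_means(points, centroids=[(1, 1), (8, 8)])
```

The three points near the origin end up in one cluster and the three points near (8, 8) in the other.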

27
Q

Applications of clustering

A

Document clustering

Finding groups of similar people

28
Q

Outlier Detection

A

In data mining, anomaly detection is the identification of rare items, events, or observations that raise suspicion by differing significantly from the majority of the data.

29
Q

Outlier analysis

A

Also known as Anomaly Detection

An outlier is a data object that does not comply with the general behavior of the data.
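
One simple way to make "does not comply with the general behavior" concrete is a z-score rule: flag values that lie more than a chosen number of standard deviations from the mean. A minimal sketch, assuming numeric data and an arbitrary threshold of 2.0 (the transaction amounts are invented):

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds threshold standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


amounts = [20, 22, 19, 21, 20, 23, 18, 500]  # invented transaction amounts
print(z_score_outliers(amounts))  # [500]
```

The z-score rule is only one of many outlier detection methods, and it works poorly when the outliers themselves distort the mean and standard deviation.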

30
Q

Applications of outlier detection

A

Rare event analysis

Fraud detection

31
Q

Why Machine Learning?

A

The explosive growth of data: from terabytes to petabytes

1 TB = 1024 GB, 1 PB = 1024 TB

32
Q

Why machine learning? Major sources of data

A

Business: web, e-commerce, transactions, stocks, …
Science: remote sensing, bioinformatics, …
Society: news, YouTube, digital cameras

33
Q

A typical view of the ML process

A
INPUT ->
DATA PREPROCESSING ->
MACHINE LEARNING ->
POST PROCESSING ->
PATTERN / INFORMATION / KNOWLEDGE
34
Q

DATA PREPROCESSING

A

1) DATA INTEGRATION
2) NORMALIZATION
3) FEATURE SELECTION
4) DIMENSION REDUCTION

35
Q

MACHINE LEARNING

A

1) PATTERN DISCOVERY
2) ASSOCIATION AND CORRELATION
3) CLASSIFICATION
4) CLUSTERING
5) OUTLIER ANALYSIS

36
Q

POST PROCESSING

A

1) PATTERN EVALUATION
2) PATTERN SELECTION
3) PATTERN INTERPRETATION
4) PATTERN VISUALIZATION