Intro Flashcards by FA17-BSE-095 HANIA AHMER

What is machine learning?

Machine learning is programming a computer to optimize a performance criterion using sample data or past experience.

How is machine learning formally defined?

“A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.” (Tom Mitchell)

Suppose your email program watches which email you do or do not mark as spam, and based on that learns how to better filter spam. What is task T in this setting?

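1) Classifying email as spam or not.
2) Watching you label email as spam or not.
3) The number (or fraction) of emails correctly classified as spam or not.

Answer: 1). Option 2 is the experience E, and option 3 is the performance measure P.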
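
As a rough illustration of a performance measure P for this setting, the fraction of correctly classified emails can be computed as below. This is only a sketch; the function name `accuracy` and the sample labels are made up for the example.

```python
def accuracy(predicted, actual):
    """Fraction of emails whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for four emails.
predicted = ["spam", "ham", "spam", "ham"]
actual = ["spam", "ham", "ham", "ham"]
print(accuracy(predicted, actual))  # 0.75
```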

Steps applied to data before ML:

1) Data Cleaning
2) Data Pre-processing
3) Data transformation/Normalization
4) Data Integration
5) Multidimensional data issues

Data Cleaning

Data cleansing, or data cleaning, is the process of detecting and correcting corrupt or inaccurate records in a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting that dirty or coarse data.
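
A minimal sketch of what such cleaning might look like in practice. The record fields (`name`, `age`) and the plausible age range of 0 to 120 are assumptions made for this example, not part of the definition above.

```python
def clean_records(records):
    """Drop incomplete, implausible, or duplicate rows and tidy text fields."""
    cleaned = []
    seen = set()
    for row in records:
        name = (row.get("name") or "").strip().title()
        age = row.get("age")
        # Incomplete record: a required field is missing.
        if not name or age is None:
            continue
        # Inaccurate record: age outside the assumed plausible range.
        if not 0 <= age <= 120:
            continue
        # Duplicate record: the same name and age were already kept.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


raw = [
    {"name": " alice ", "age": 30},
    {"name": "Bob", "age": None},    # incomplete
    {"name": "Carol", "age": 430},   # implausible
    {"name": "Alice", "age": 30},    # duplicate once the name is tidied
]
print(clean_records(raw))  # [{'name': 'Alice', 'age': 30}]
```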

Data Pre-processing

Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues. Data preprocessing prepares raw data for further processing.

Examples: Data preprocessing is used in database-driven applications such as customer relationship management, and in rule-based applications (like neural networks).

Data transformation/Normalization

Data transformation is the process of converting data from one format or structure into another format or structure. It is a fundamental aspect of most data integration and data management tasks such as data wrangling, data warehousing, data integration and application integration.
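
One common transformation is min-max normalization, which rescales values to the range 0 to 1. The sketch below is only one of several normalization schemes, and the salary figures are invented for illustration.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


salaries = [20000, 35000, 50000]  # hypothetical annual salaries
print(min_max_normalize(salaries))  # [0.0, 0.5, 1.0]
```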

Data Integration

Data integration is the process of combining data from different sources into a single, unified view. Integration begins with the ingestion process, and includes steps such as cleansing, ETL mapping, and transformation.
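
A small sketch of combining two sources into one unified view. The two sources (a customer list and an order list) and their field names are hypothetical; real integration pipelines also handle ingestion, cleansing, and schema mapping.

```python
def integrate(customers, orders):
    """Join customers and orders on the customer id into a single view."""
    totals = {}
    for order in orders:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0) + order["amount"]
    return [
        {"id": c["id"], "name": c["name"], "total_spent": totals.get(c["id"], 0)}
        for c in customers
    ]


customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [
    {"customer_id": 1, "amount": 40},
    {"customer_id": 1, "amount": 60},
]
print(integrate(customers, orders))
```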

Multidimensional data issues

In statistics, econometrics, and related fields, multidimensional analysis (MDA) is a data analysis process that groups data into two categories: data dimensions and measurements. For example, a data set consisting of the number of wins for a single football team at each of several years is a single-dimensional (in this case, longitudinal) data set. A data set consisting of the number of wins for several football teams in a single year is also a single-dimensional (in this case, cross-sectional) data set. A data set consisting of the number of wins for several football teams over several years is a two-dimensional data set.

Association rule mining

Association rule learning (also called association rule mining) is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.

Market basket analysis

Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items.

Method of association rule mining

Apriori algorithm
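
A compact sketch of the Apriori idea: find frequent single items first, then grow candidate itemsets level by level, keeping only candidates whose smaller subsets are all frequent. The function name, the sample baskets, and the minimum support of 0.5 are assumptions for illustration; this is not an optimized implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk"},
]
for itemset, sup in sorted(apriori(baskets, 0.5).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), sup)
```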

Frequent patterns

What items are frequently purchased together in your SaveMart?
A typical association rule: Bread → Milk, with confidence
P(Milk | Bread) = 0.7
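
The conditional probability above is the confidence of the rule "bread implies milk": among baskets containing bread, the fraction that also contain milk. A sketch of estimating it from transactions follows; the baskets are invented so that the result comes out to 0.7.

```python
def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from a list of baskets (sets)."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent <= t)
    return both / len(with_antecedent)


# Hypothetical data: 10 baskets contain bread, 7 of those also contain milk.
baskets = [{"bread", "milk"}] * 7 + [{"bread"}] * 3 + [{"milk"}] * 2
print(confidence(baskets, {"bread"}, {"milk"}))  # 0.7
```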

Types of learning

1) Supervised
2) Unsupervised

Supervised learning

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.

unsupervised learning

Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or grouping in data.

Classification and Label Prediction

Construct models based on some training examples
Describe and distinguish classes or concepts for future prediction
Predict some unknown class labels

Typical methods of classification

Decision trees, Naïve Bayes, SVM, Neural Networks

Applications of classifications

Sentiment Classification
Spam Classification
Disease Prediction
Match result prediction

Credit Card Approval

Example of classification.
A credit card company receives thousands of applications for new cards. Each application contains information about the applicant:
Age
Marital status
Annual salary
Outstanding debts
Credit rating
etc.
Problem: decide whether an application should be approved or not.
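
To make the approval problem concrete, here is a sketch of a one-level decision tree (a decision stump) that learns a single salary threshold from labeled past applications. The field name `annual_salary`, the training rows, and the use of only one feature are all assumptions for illustration; a real system would use many of the attributes listed above.

```python
def fit_stump(rows, labels, feature):
    """Pick the threshold on `feature` that misclassifies the fewest rows.

    The stump predicts "approve" when the feature value is >= the threshold.
    """
    best_threshold, best_errors = None, len(rows) + 1
    for threshold in sorted({r[feature] for r in rows}):
        errors = sum(
            (r[feature] >= threshold) != (y == "approve")
            for r, y in zip(rows, labels)
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(row, feature, threshold):
    return "approve" if row[feature] >= threshold else "reject"


# Hypothetical labeled training examples.
rows = [
    {"annual_salary": 20000},
    {"annual_salary": 30000},
    {"annual_salary": 45000},
    {"annual_salary": 60000},
]
labels = ["reject", "reject", "approve", "approve"]

threshold = fit_stump(rows, labels, "annual_salary")
print(threshold)  # 45000
print(predict({"annual_salary": 50000}, "annual_salary", threshold))  # approve
```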

Loan Application Example

classification example

Regression

Linear regression is a supervised machine learning algorithm that performs a regression task. Regression models a target value based on independent variables. It is mostly used for finding relationships between variables and for forecasting. Regression models differ in the kind of relationship they assume between the dependent and independent variables and in the number of independent variables used.

regression example

Predicting the price of a house given its area.
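
A sketch of this example using ordinary least squares with one independent variable. The areas and prices are invented (they lie exactly on the line price = 2 * area + 10) so that the fitted values are easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: area (square metres) and price (thousands).
areas = [50, 80, 100, 120]
prices = [110, 170, 210, 250]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90 + intercept  # price estimate for a 90 m^2 house
print(slope, intercept, predicted_price)
```

With this data the fitted slope is 2 and the intercept is 10, so the predicted price for an area of 90 is 190.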

Clustering

Unsupervised learning
Class label is unknown
Group data to form clusters

Clustering principle

Maximize inter-class (between-cluster) difference and minimize intra-class (within-cluster) difference.

Methods of clustering

K-Means Clustering
Cobweb
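
K-means, the first method listed above, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The sketch below works on one-dimensional data with caller-supplied starting centroids; the function name, the data, and the starting centroids are assumptions for illustration.

```python
def k_means_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data; returns (centroids, clusters)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each point joins the cluster of its nearest centroid.
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters


points = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(points, centroids=[1, 12])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1, 2, 3], [10, 11, 12]]
```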

Applications of clustering

Document clustering
Finding groups of similar people

Outlier Detection

In data mining, anomaly detection is the identification of rare items, events, or observations which raise suspicions by differing significantly from the majority of the data.

Outlier analysis

Also known as anomaly detection.
An outlier is a data object that does not comply with the general behavior of the data.
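
One simple way to flag such objects is a z-score test: mark values that lie more than a chosen number of standard deviations from the mean. This is only one of many outlier detection techniques, and the threshold of 2.0 and the sample amounts are assumptions for illustration.

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds threshold * std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts with one unusually large value.
amounts = [10, 12, 11, 13, 12, 11, 95]
print(z_score_outliers(amounts))  # [95]
```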

Applications of outlier detection

Rare event analysis
Fraud detection

Why Machine Learning?

The explosive growth of data: from terabytes to petabytes
1 TB = 1024 GB, 1 PB = 1024 TB

Why Machine Learning? Major sources of data

Business: web, e-commerce, transactions, stocks, …
Science: remote sensing, bioinformatics, …
Society: news, YouTube, digital cameras

A typical view of the ML process

```
INPUT -> DATA PREPROCESSING -> MACHINE LEARNING -> POST PROCESSING -> PATTERN, INFORMATION, KNOWLEDGE
```

DATA PREPROCESSING

1) DATA INTEGRATION
2) NORMALIZATION
3) FEATURE SELECTION
4) DIMENSION REDUCTION

MACHINE LEARNING

1) PATTERN DISCOVERY
2) ASSOCIATION AND CORRELATION
3) CLASSIFICATION
4) CLUSTERING
5) OUTLIER ANALYSIS

POST PROCESSING

1) PATTERN EVALUATION
2) PATTERN SELECTION
3) PATTERN INTERPRETATION
4) PATTERN VISUALIZATION

Intro Flashcards

(36 cards)