Intro Flashcards
Machine learning?
Machine learning is programming a computer to optimize a performance criterion using sample data or past experience.
Machine learning is?
“A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.”
Suppose your email program watches which email you do or do not mark as spam, and based on that learns how to better filter spam. What is task T in this setting?
1) Classifying email as spam or not.
2) Watching you label email as spam or not.
3) The number (or fraction) of emails correctly classified as spam or not.
Data preparation steps before applying ML:
1) Data Cleaning
2) Data Pre-processing
3) Data transformation/Normalization
4) Data Integration
5) Multidimensional data issues
Data Cleaning
Data cleansing, or data cleaning, is the process of detecting and correcting corrupt or inaccurate records in a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.
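A minimal sketch of the detect-then-modify-or-delete idea, assuming a small made-up record set. The field names, the "negative age is invalid" rule, and the `clean` helper are illustrative assumptions, not part of the card.

```python
# Hypothetical raw records; None and negative ages stand in for dirty data.
raw_records = [
    {"name": "Ann", "age": 34},
    {"name": "  bob ", "age": 29},   # messy whitespace and casing
    {"name": "Cy", "age": -5},       # inaccurate: negative age
    {"name": None, "age": 41},       # incomplete: missing name
    {"name": "Ann", "age": 34},      # duplicate of the first record
]

def clean(records):
    """Return records with names tidied and dirty rows removed."""
    cleaned, seen = [], set()
    for rec in records:
        name, age = rec.get("name"), rec.get("age")
        if name is None or age is None or age < 0:
            continue  # delete incomplete or inaccurate records
        name = name.strip().title()  # modify: fix whitespace and casing
        key = (name, age)
        if key in seen:
            continue  # delete exact duplicates
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned

cleaned_records = clean(raw_records)
print(cleaned_records)
```

Only the two valid, distinct records (Ann and Bob) survive. Real cleaning rules depend on the data set.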
Data Pre-processing
Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues. Data preprocessing prepares raw data for further processing.
Examples: Data preprocessing is used in database-driven applications such as customer relationship management and in learning-based applications such as neural networks.
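One common preprocessing step for incomplete data is filling in missing values. A minimal sketch, assuming made-up readings where `None` marks a gap; mean imputation is just one possible choice, and the `impute_mean` helper is hypothetical.

```python
from statistics import mean

# Hypothetical sensor readings; None marks a missing value.
readings = [20.0, None, 22.0, 24.0, None]

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

prepared = impute_mean(readings)
print(prepared)  # [20.0, 22.0, 22.0, 24.0, 22.0]
```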
Data transformation/Normalization
Data transformation is the process of converting data from one format or structure into another format or structure. It is a fundamental aspect of most data integration and data management tasks such as data wrangling, data warehousing, data integration and application integration.
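The card title also mentions normalization. A minimal sketch, assuming "normalization" here means min-max scaling to the range [0, 1] (other definitions exist); the data and the `min_max_normalize` helper are made up for illustration.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical heights in centimetres.
heights_cm = [150, 160, 170, 180, 190]
normalized = min_max_normalize(heights_cm)
print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```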
Data Integration
Data integration is the process of combining data from different sources into a single, unified view. Integration begins with the ingestion process, and includes steps such as cleansing, ETL mapping, and transformation.
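A minimal sketch of combining two sources into one unified view, assuming a hypothetical CRM table and billing list that share a customer id. All names and values are invented for illustration.

```python
# Two hypothetical sources that describe the same customers differently.
crm = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
billing = [
    {"customer_id": 1, "total": 120.0},
    {"customer_id": 3, "total": 75.5},
]

def integrate(crm_rows, billing_rows):
    """Build one record per customer id, keeping gaps visible as None."""
    unified = {
        cid: {"name": row["name"], "total": None}
        for cid, row in crm_rows.items()
    }
    for row in billing_rows:
        # Customers seen only in billing still get an entry.
        entry = unified.setdefault(row["customer_id"], {"name": None, "total": None})
        entry["total"] = row["total"]
    return unified

unified_view = integrate(crm, billing)
for cid, record in sorted(unified_view.items()):
    print(cid, record)
```

Customer 2 has no billing total and customer 3 has no name, so the unified view shows where the sources disagree.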
Multidimensional data issues
In statistics, econometrics, and related fields, multidimensional analysis (MDA) is a data analysis process that groups data into two categories: data dimensions and measurements. For example, a data set consisting of the number of wins for a single football team at each of several years is a single-dimensional (in this case, longitudinal) data set. A data set consisting of the number of wins for several football teams in a single year is also a single-dimensional (in this case, cross-sectional) data set. A data set consisting of the number of wins for several football teams over several years is a two-dimensional data set.
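The football example can be sketched as a table keyed by two dimensions (team, year) with wins as the measurement. Team names and win counts below are made up, and `slice_by` is a hypothetical helper.

```python
# Measurement: wins. Dimensions: team and year (two-dimensional data set).
wins = {
    ("Lions", 2021): 8, ("Lions", 2022): 10,
    ("Tigers", 2021): 6, ("Tigers", 2022): 9,
}

def slice_by(data, team=None, year=None):
    """Fix one dimension to get a single-dimensional view of the data."""
    return {
        key: value
        for key, value in data.items()
        if (team is None or key[0] == team) and (year is None or key[1] == year)
    }

one_team = slice_by(wins, team="Lions")  # longitudinal: one team over years
one_year = slice_by(wins, year=2022)     # cross-sectional: many teams, one year
print(one_team)
print(one_year)
```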
Association rule mining
Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.
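Two widely used measures of interestingness are support and confidence. The card does not name them, so treat this as an illustrative assumption. A minimal sketch over made-up transactions:

```python
# Hypothetical transactions, one set of items per purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def count(itemset, txns):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in txns if items <= t)

def support(itemset, txns):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, txns) / len(txns)

def confidence(antecedent, consequent, txns):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = count(set(antecedent) | set(consequent), txns)
    return both / count(antecedent, txns)

rule_support = support({"bread", "milk"}, transactions)         # 3 of 5 transactions
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # 3 of 4 bread transactions
print(rule_support, rule_confidence)
```

A rule is usually called strong when both values exceed thresholds chosen by the analyst.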
Market basket analysis
Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items.
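"More (or less) likely" can be quantified with lift, the ratio of how often two items are bought together to how often they would be if purchases were independent. Lift is not named on the card; it is brought in here as one standard way to express the idea, using made-up baskets.

```python
# Hypothetical baskets, made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def lift(a, b, txns):
    """Observed co-occurrence divided by the co-occurrence expected under independence."""
    n = len(txns)
    n_a = sum(1 for t in txns if a in t)
    n_b = sum(1 for t in txns if b in t)
    n_ab = sum(1 for t in txns if a in t and b in t)
    return (n * n_ab) / (n_a * n_b)

bread_butter = lift("bread", "butter", baskets)  # above 1: together more often than chance
bread_milk = lift("bread", "milk", baskets)      # below 1: together less often than chance
print(bread_butter, bread_milk)  # 1.25 0.9375
```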
method of association rule mining
Apriori algorithm
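A compact sketch of the Apriori idea: find frequent single items, then repeatedly join frequent (k-1)-itemsets into k-item candidates and keep only those that meet a minimum support. The transactions and the 0.5 threshold are made up, and this is a teaching sketch, not an optimized implementation.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    txns = [frozenset(t) for t in transactions]
    n = len(txns)

    def frequent(candidates):
        # Keep only candidates whose support reaches the threshold.
        result = {}
        for c in candidates:
            s = sum(1 for t in txns if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    level = frequent({frozenset([item]) for t in txns for item in t})
    all_frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent

# Hypothetical transactions and threshold.
sample = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
frequent_itemsets = apriori(sample, min_support=0.5)
for itemset, s in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With these baskets, bread, milk, and the pair {bread, milk} are frequent; butter and eggs fall below the threshold and are never extended.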
frequent patterns
What items are frequently purchased together in your SaveMart?
A typical association rule
Bread → Milk, with confidence P(Milk | Bread) = 0.7 (70% of purchases that include bread also include milk)
types of learning
1) supervised
2) unsupervised
supervised
Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.
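A minimal sketch of inferring a function from labeled input-output pairs, using a 1-nearest-neighbor rule as one simple example of a supervised learner. The training pairs and the `fit_nearest_neighbor` helper are made up for illustration.

```python
def fit_nearest_neighbor(examples):
    """Return a function that labels an input like its closest training example."""
    def predict(x):
        _, nearest_label = min(examples, key=lambda pair: abs(pair[0] - x))
        return nearest_label
    return predict

# Hypothetical labeled training data: (input, output) pairs.
training = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]

classify = fit_nearest_neighbor(training)
print(classify(2.5))  # small
print(classify(7.0))  # large
```

The learned `classify` function maps inputs it has never seen to outputs, which is the point of learning from labeled examples.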