ch1 Flashcards
big data
data sets that are so voluminous and complex that traditional data-processing application software is inadequate to deal with them
machine learning
a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data or to perform other kinds of decision making under uncertainty
the long tail
the property of a data set in which a few instances are highly common/likely while most other instances are quite rare
this property apparently shows up in data generated across a wide variety of domains (why?)
supervised learning
the machine learning task of inferring a function from labeled training data
Formally put, the task is to learn a mapping from inputs x to outputs y, given a labeled set of input-output pairs D = {(x_i, y_i)} for i = 1, ..., N.
Here, D is the training set, and N is the number of training examples.
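As a minimal, self-contained illustration (the toy data and the nearest-neighbour prediction rule are hypothetical, not from the card):

```python
# Toy labeled training set D = {(x_i, y_i)}: 2-D feature vectors as
# inputs, class labels as outputs. The data is made up for illustration.
D = [([1.0, 2.0], "spam"),
     ([0.9, 1.8], "spam"),
     ([4.0, 5.0], "ham"),
     ([4.2, 4.8], "ham")]

def predict(x):
    """Predict y for a new input x by 1-nearest-neighbour lookup in D."""
    sq_dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, y = min(D, key=lambda pair: sq_dist(pair[0], x))
    return y

print(predict([1.1, 2.1]))  # -> "spam"
```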
features, attributes, covariates
the inputs x_i in a supervised learning problem; in the simplest setting, each x_i is a fixed-length vector of numbers (the input-output pairs together form the training set)
response variable
output as a function of input in a supervised learning problem
categorical/nominal variable
a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property
classification vs regression
When the response variable is categorical (i.e., takes values from a finite set), supervised learning is called classification. When the response variable is real-valued, the problem is known as regression.
ordinal regression
supervised learning when the response variable takes values from a finite set that has some natural ordering (e.g., grades A-F)
descriptive/unsupervised learning, or knowledge discovery
learning where only inputs D = {x_i} for i = 1, ..., N are provided and the goal is to find “interesting patterns” in the data.
A much less well-defined problem, since we aren’t told what kinds of patterns to look for and no obvious error metric exists
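One common instantiation is clustering. A minimal sketch, assuming scikit-learn is available (the data is made up for illustration):

```python
from sklearn.cluster import KMeans

# Unlabeled inputs only: D = {x_i}, no outputs y_i are given.
X = [[1.0, 2.0], [0.9, 1.8],
     [4.0, 5.0], [4.2, 4.8]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # e.g. [0 0 1 1]: a grouping discovered from x alone
```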
reinforcement learning
learning how to act or make decisions when given only occasional reward or punishment signals
binary classification
classification between two classes
multiclass classification
classification between more than two classes
multi-label classification
classification where class labels aren’t mutually exclusive
often best viewed as predicting multiple related binary class labels from the same input (multiple output model)
multiple output model
model of multi-label classification that treats the problem as predicting multiple related binary class labels from the same input; see the sketch below
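A sketch of this view: one independent yes/no decision per label, made from the same input. The label names and keyword rules below are hypothetical stand-ins for real binary classifiers:

```python
LABELS = ["politics", "sports", "finance"]

def predict_labels(text):
    """Return the set of labels for text; may be empty or contain several."""
    keyword_for = {"politics": "election",
                   "sports": "match",
                   "finance": "market"}
    # One binary decision per label (here: a trivial keyword test).
    return {lbl for lbl in LABELS if keyword_for[lbl] in text.lower()}

print(predict_labels("Election coverage moved the market today"))
# -> {'politics', 'finance'}: the labels are not mutually exclusive
```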
function approximation
formalization of classification w/ the assumption y = f(x) for some unknown function f; the goal is to estimate f given a labeled training set and then predict using ŷ = f̂(x)
maximum a posteriori probability (MAP) estimate
an estimate of an unknown quantity that equals the mode of the posterior distribution
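In standard notation (not from the card itself), and dropping the normalizer p(D) since it does not depend on θ:

```latex
\hat{\theta}_{\text{MAP}}
  = \operatorname*{argmax}_{\theta} \, p(\theta \mid \mathcal{D})
  = \operatorname*{argmax}_{\theta} \, p(\mathcal{D} \mid \theta)\, p(\theta)
```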
posterior probability
the conditional probability that is assigned after the relevant evidence or background is taken into account.
posterior probability distribution
the probability distribution of an unknown quantity, treated as a random variable, conditional on the evidence obtained from an experiment or survey
central idea of bayesian statistics
that probability is orderly opinion, and that inference from data is nothing other than the revision of such opinion in the light of relevant new information
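Concretely, that revision of opinion is carried out by Bayes’ rule, which maps a prior p(θ) to a posterior p(θ | D) given data D:

```latex
p(\theta \mid \mathcal{D})
  = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}
```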
bag-of-words / vector space model
a simplifying representation used in natural language processing and information retrieval (IR), also known as the vector space model. In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity.
Documents and queries are represented as vectors. Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero.
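A minimal sketch of the count-vector construction (the documents and the vocabulary ordering are made up for illustration):

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat"]

# Fixed vocabulary: one vector dimension per distinct term.
vocab = sorted({w for d in docs for w in d.split()})
# -> ['cat', 'dog', 'mat', 'on', 'sat', 'the']

def bow_vector(doc):
    """Count vector: keeps multiplicity, discards grammar and word order."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]

print(bow_vector(docs[0]))  # -> [1, 0, 1, 1, 1, 2]
print(bow_vector(docs[1]))  # -> [0, 1, 0, 0, 1, 1]
```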
tf–idf or TFIDF, short for term frequency–inverse document frequency
often used as a weighting factor in information retrieval searches, text mining, and user modeling. The tf-idf value increases proportionally with the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps adjust for the fact that some words appear more frequently in general.
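A minimal sketch of one common weighting variant, tf-idf(t, d) = tf(t, d) · log(N / df(t)), where tf is the raw count of the term in the document and df is the number of documents containing it (other variants exist; the corpus is made up for illustration):

```python
import math
from collections import Counter

docs = [d.split() for d in
        ["the cat sat on the mat", "the dog sat", "the cat ran"]]
N = len(docs)

def tf_idf(term, doc):
    tf = Counter(doc)[term]            # how often the term occurs in this doc
    df = sum(term in d for d in docs)  # how many docs contain the term
    return tf * math.log(N / df)       # rare-in-corpus terms weigh more

print(tf_idf("the", docs[0]))  # 0.0: "the" occurs in every document
print(tf_idf("mat", docs[0]))  # ~1.099: rare term, weighted up
```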