AI Flashcards
What are the 2 types of data ?
Numerical Data and Categorical Data.
What kind of value does Numerical or Continuous data accept ?
Can accept any value within a finite or infinite interval (e.g., height, weight, temperature, blood glucose, …).
What are the 2 types of Numerical or Continuous data ?
Interval and ratio.
Describe data on an interval scale.
Can be added and subtracted but cannot be meaningfully multiplied or divided because there is no true zero. For example, we cannot say that one day is twice as hot as another day.
Describe data on a ratio scale.
Has true zero and can be added, subtracted, multiplied or divided (e.g., weight).
A categorical or discrete variable is one that has ….. .
two or more categories (values).
What are the 2 types of categorical variables ?
Nominal and ordinal.
Describe Nominal variables.
Has no intrinsic ordering to its categories. For example, gender is a categorical variable having two categories (male and female) with no intrinsic ordering to the categories.
Describe Ordinal variables.
Has a clear ordering. For example, temperature as a variable with three orderly categories (low, medium and high).
What is a frequency table ?
Is a way of counting how often each category of the variable in question occurs. It may be enhanced by the addition of percentages that fall into each category.
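Sketch: a frequency table with percentages can be built with a plain counter. This is a minimal illustration, not a reference implementation; the helper name `frequency_table` and the `outlook` sample values are hypothetical and made up for the example.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (count, percentage)} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

# Hypothetical sample data, for illustration only.
outlook = ["sunny", "rainy", "sunny", "overcast",
           "sunny", "rainy", "overcast", "sunny"]

table = frequency_table(outlook)
for category, (count, percent) in sorted(table.items()):
    print(f"{category:10s} {count:3d} {percent:6.1f}%")
```

Here "sunny" occurs 4 times out of 8, so its row carries a count of 4 and a percentage of 50.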
What is Encoding or continuization ?
Is the transformation of categorical variables to binary or numerical counterparts. An example is to treat male or female for gender as 1 or 0. Categorical variables must be encoded in many modeling methods (e.g., linear regression, SVM, neural networks).
What are the 2 types of encoding ?
Binary and Target-based.
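Sketch: the two encodings side by side. The cards do not define target-based encoding, so this example assumes one common reading: each category is replaced by the mean of a numeric (0/1) target within that category. The function names and the `gender` / `bought` sample data are hypothetical.

```python
from collections import defaultdict

def binary_encode(values, positive):
    """Map a two-category variable to 1/0: 1 where the value equals `positive`."""
    return [1 if v == positive else 0 for v in values]

def target_mean_encode(values, targets):
    """Assumed form of target-based encoding: replace each category with
    the mean of the numeric target observed for that category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    means = {v: sums[v] / counts[v] for v in sums}
    return [means[v] for v in values]

# Hypothetical sample data, for illustration only.
gender = ["male", "female", "female", "male"]
bought = [1, 0, 1, 1]

binary = binary_encode(gender, positive="male")
target_based = target_mean_encode(gender, bought)
print(binary)        # [1, 0, 0, 1]
print(target_based)  # [1.0, 0.5, 0.5, 1.0]
```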
What is Binning or discretization ?
Is the process of transforming numerical variables into categorical counterparts.
An example is to bin values for Age into categories such as 20-39, 40-59, and 60-79.
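Sketch: binning Age into the three categories from the example above. The helper `bin_age` is hypothetical, and it assumes whole-number ages; values outside 20-79 are mapped to `None` because the example does not say how to treat them.

```python
def bin_age(age):
    """Return the label of the age bin, or None if the age is outside all bins."""
    bins = [(20, 39), (40, 59), (60, 79)]  # inclusive bounds, integer ages assumed
    for low, high in bins:
        if low <= age <= high:
            return f"{low}-{high}"
    return None

# Hypothetical sample data, for illustration only.
ages = [23, 41, 59, 60, 78, 19]
binned = [bin_age(a) for a in ages]
print(binned)
# ['20-39', '40-59', '40-59', '60-79', '60-79', None]
```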
Numerical variables are usually discretized in the modeling methods based on ….. .
frequency tables (e.g., decision trees).
Binning may improve accuracy of the predictive models by ….. or ….. .
reducing noise, reducing non-linearity.
What is a Dataset ?
Is a collection of data, usually presented in a tabular form. Each column represents a particular variable, and each row corresponds to a given member of the dataset.
Alternatives for columns: ….., ….., ….. .
Fields, Attributes, Variables.
Alternatives for rows: ….., ….., ….., ….., ….., ….. .
Records, Objects, Cases, Instances, Examples, Vectors.
Alternatives for values: ….. .
Data.
In predictive modeling, ….. or ….. are the input variables .
predictors, attributes.
In predictive modeling, ….. or ….. is the output variable .
target, class attribute.
In predictive modeling, the output variable value is determined by ….. and ….. .
the values of the predictors, the function of the predictive model.
Pattern recognition predicts the future by ….. .
means of modeling.
What is Predictive modeling ?
Is the process by which a model is created to predict an outcome.
If the outcome is categorical it is called ….. .
Classification.
If the outcome is numerical it is called ….. .
Regression.
What is Descriptive modeling or clustering ?
Is the assignment of observations into clusters so that observations in the same cluster are similar.
What is Classification ?
Is predicting the value of a categorical variable (target or class) by building a model based on one or more numerical and/or categorical variables (predictors or attributes).
What is ZeroR classifier ?
Is the simplest classification method which relies on the target and ignores all predictors.
ZeroR classifier simply predicts the ….. .
Majority category (class).
Although there is no predictive power in ZeroR, it is useful for ….. .
determining a baseline performance as a benchmark for other classification methods.
What is the ZeroR classifier Algorithm ?
Construct a frequency table for the target and select its most frequent value.
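Sketch: ZeroR following the algorithm on the card above, i.e. count the target values and always predict the most frequent one. The function name `zero_r_fit` and the `play` sample data are hypothetical; how ties are broken is not specified by the card, so here the first value to reach the top count wins.

```python
from collections import Counter

def zero_r_fit(targets):
    """Build a frequency table of the target and return its most frequent value."""
    counts = Counter(targets)
    return counts.most_common(1)[0][0]

# Hypothetical sample data, for illustration only.
play = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes"]

majority = zero_r_fit(play)

# ZeroR ignores all predictors, so every prediction is the majority class.
predictions = [majority] * len(play)
baseline_accuracy = sum(p == t for p, t in zip(predictions, play)) / len(play)

print(majority)           # yes
print(baseline_accuracy)  # 6 of 9 correct
```

The accuracy obtained this way is the baseline that other classifiers are expected to beat.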
What is OneR classifier ?
Short for “One Rule”, is a simple classification algorithm that generates one rule for each predictor in the data, then selects the rule with the smallest total error as its “one rule”.
To create a rule for a predictor, we ….. .
construct a frequency table for each predictor against the target.
OneR produces rules only slightly less accurate than state-of-the-art classification algorithms while ….. .
producing rules that are simple for humans to interpret.
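Sketch: OneR as described on the cards above. For each predictor, a frequency table against the target is built, each predictor value is mapped to its most frequent class, and the predictor whose rule makes the fewest total errors is kept. The function name `one_r_fit`, the row layout (a list of dicts), and the sample data are hypothetical, and tie-breaking is an assumption because the cards do not cover it.

```python
from collections import Counter, defaultdict

def one_r_fit(rows, predictors, target):
    """Return (best_predictor, rule, total_errors) for a list of dict rows."""
    best = None
    for p in predictors:
        # Frequency table: predictor value -> counts of each target value.
        table = defaultdict(Counter)
        for row in rows:
            table[row[p]][row[target]] += 1
        # Rule: each predictor value predicts its most frequent target value.
        rule = {value: counts.most_common(1)[0][0]
                for value, counts in table.items()}
        # Errors: rows whose target differs from what the rule predicts.
        errors = sum(sum(counts.values()) - counts[rule[value]]
                     for value, counts in table.items())
        # Keep the predictor with the smallest total error (first wins on ties).
        if best is None or errors < best[2]:
            best = (p, rule, errors)
    return best

# Hypothetical sample data, for illustration only.
rows = [
    {"outlook": "sunny",    "windy": False, "play": "no"},
    {"outlook": "sunny",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "overcast", "windy": True,  "play": "yes"},
    {"outlook": "rainy",    "windy": False, "play": "yes"},
    {"outlook": "rainy",    "windy": True,  "play": "no"},
]

predictor, rule, errors = one_r_fit(rows, ["outlook", "windy"], "play")
print(predictor, errors)  # outlook 1
```

In this sample the "outlook" rule misclassifies one row and the "windy" rule misclassifies two, so "outlook" becomes the one rule.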