EDA Flashcards
Exploratory Data Analysis
What is the relationship between the dependent and independent variables in a machine learning task?
For any machine learning algorithm, we use the independent variables to predict the dependent variable.
What are input and output variables?
Statistical perspective
Those columns that are the inputs are referred to as input variables.
The column of data that you may not always have, and that you would like to predict for new input data in the future, is called the output variable. It is also called the response variable.
Computer science perspective
A row often describes an entity (like a person) or an observation about an entity. As such, the columns for a row are often referred to as attributes of the observation. When modeling a problem and making predictions, we may refer to input attributes and output attributes.
output attribute = program(input attributes)
Another name for columns is features, used for the same reason as attribute, where a feature describes some property of the observation. This is more common when working with data where features must be extracted from the raw data in order to construct an observation.
Another computer science phrasing refers to a row of data, or an observation, as an instance. This is used because a row may be considered a single example or single instance of data observed or generated by the problem domain.
prediction = program(instance)
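As a minimal sketch of the vocabulary above, assume a tiny made-up table of people. The column names (age, income, buys) and the hand-written rule inside program are hypothetical and only there to show how rows, attributes, and instances map onto code.

```python
# Hypothetical dataset: each dict is one row (an instance / observation).
rows = [
    {"age": 25, "income": 40000, "buys": "no"},
    {"age": 47, "income": 82000, "buys": "yes"},
]

input_attributes = ["age", "income"]   # features used to make a prediction
output_attribute = "buys"              # the value we would like to predict


def program(instance):
    """Toy stand-in for a learned predictor: a hand-written rule."""
    return "yes" if instance["income"] > 60000 else "no"


# Keep only the input attributes of each row, then predict for each instance.
instances = [{name: row[name] for name in input_attributes} for row in rows]
predictions = [program(instance) for instance in instances]
print(predictions)
```

In a real task the rule inside program would be learned from data rather than written by hand.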
What is an input vector?
Typically, you have more than one input variable. In this case the group of input variables is referred to as the input vector.
How do dependent and independent variables relate to input and output variables?
A statistics text may talk about the input variables as independent variables and the output variable as the dependent variable. This is because, in the phrasing of the prediction problem, the output is dependent on, or a function of, the input (independent) variables.
dependent variable = f(independent variables)
The data is described using a shorthand in equations and descriptions of machine learning algorithms. The standard shorthand used in the statistical perspective is to refer to the input variables as capital “x” (X) and the output variables as capital “y” (Y).
Y = f(X)
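The X and Y notation can be sketched in a few lines of Python. The numbers and the particular function f below are assumptions chosen for illustration; in practice f is unknown and is what a learning algorithm tries to approximate.

```python
# Hypothetical data: each inner list is the input vector for one observation.
X = [
    [1.0, 2.0],
    [3.0, 0.5],
    [0.0, 4.0],
]


def f(x):
    """Assumed mapping for illustration: a fixed weighted sum of the inputs."""
    return 2.0 * x[0] + 1.0 * x[1]


# Y holds one output (dependent) value for each input vector in X.
Y = [f(x) for x in X]
print(Y)
```

Here each row of X is an input vector of independent variables, and each entry of Y is the dependent variable computed from it.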
Models and algorithms
There is one final note of clarification that is important, and that is the difference between algorithms and models.
This can be confusing because the terms algorithm and model are often used interchangeably.
A perspective that I like is to think of the model as the specific representation learned from data and the algorithm as the process for learning it.
model = algorithm(data)
For example, a decision tree or a set of coefficients is a model, and C5.0 and Least Squares Linear Regression are the algorithms used to learn those respective models.
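The distinction can be illustrated with a small sketch of least squares for a single input variable, written in plain Python. The function names least_squares and predict and the data points are hypothetical; the point is only that the algorithm is the fitting procedure and the model is the pair of coefficients it returns.

```python
def least_squares(data):
    """Algorithm: fit y = intercept + slope * x by ordinary least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The returned coefficients are the model learned from the data.
    return {"intercept": intercept, "slope": slope}


def predict(model, x):
    """Use the learned coefficients to predict an output for a new input."""
    return model["intercept"] + model["slope"] * x


# Made-up points that lie exactly on the line y = 1 + 2x.
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
model = least_squares(data)   # model = algorithm(data)
print(model)
print(predict(model, 4.0))
```

Running the same algorithm on different data would produce a different model, which is why the two terms are worth keeping apart.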
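In Python and R, do you need to import libraries for machine learning?
In R, basic modeling functions such as lm() and glm() are available without loading anything, but other packages must be loaded with library(). In Python, you need to import libraries such as pandas and scikit-learn.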