Flashcards for exam
Linear Regression (y = mx + c)
It’s a basic form of regression analysis. ‘m’ represents the slope of the line, indicating how much ‘y’ changes for a unit change in ‘x’. ‘c’ is the y-intercept, showing where the line crosses the y-axis.
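The slope and intercept above can be computed with ordinary least squares. A minimal pure-Python sketch with made-up example points (not from any real dataset):

```python
# Least-squares fit of y = m*x + c (illustrative sketch, toy data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.1, 5.9, 8.2, 9.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope m = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
c = mean_y - m * mean_x  # the fitted line passes through (x_bar, y_bar)

print(round(m, 2), round(c, 2))
```

The fitted slope here is close to 2, as the toy points were chosen to lie near y = 2x.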
Residual (ith value)
The difference between the observed value (yi) and the predicted value (ŷi). It’s a measure of the error in predictions.
Unsupervised Learning Examples
Clustering (like K-means), Association (like Apriori algorithm), and Dimensionality Reduction (like PCA).
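As a sketch of the K-means idea, here is a tiny one-dimensional version with k = 2 and toy numbers (in practice one would use a library implementation such as scikit-learn's KMeans):

```python
# Tiny 1-D k-means sketch (k = 2) on toy data; naive initialisation,
# fixed number of update rounds, no empty-cluster handling.
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids = [points[0], points[-1]]

for _ in range(10):
    clusters = [[], []]
    for p in points:
        # assign each point to its nearest centroid
        idx = 0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1
        clusters[idx].append(p)
    # recompute each centroid as the mean of its cluster
    centroids = [sum(cl) / len(cl) for cl in clusters]

print(centroids)  # two cluster centres, one per group of points
```

On this data the centroids converge to the means of the two obvious groups.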
Applications of Computer Vision
Object detection, facial recognition, medical image analysis, autonomous vehicles, and surveillance.
Logistic Regression (Sigmoid Curve)
Used for binary classification problems. The sigmoid function outputs a value between 0 and 1, representing the probability of a particular class.
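The sigmoid and the 0.5 decision threshold can be sketched in a few lines (the score z = 2.0 is an arbitrary example, not output from a trained model):

```python
import math

# Sigmoid squashes any real score z into (0, 1); comparing the
# probability to 0.5 yields a binary class label.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

prob = sigmoid(2.0)
label = 1 if prob >= 0.5 else 0
print(round(prob, 2), label)
```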
Examples of Unstructured Data
Text files, images, videos, social media posts, and emails.
Range for Classification
In binary classification, the output is typically in the range of 0 to 1, indicating the probability of belonging to a certain class.
Image Extraction
Involves processing and analyzing images to derive meaningful information from them.
Relationship Between AI and ML
Machine Learning is a subset of Artificial Intelligence. AI is the broader concept of machines carrying out tasks in a ‘smart’ way; ML is a current application of AI based on the idea of giving machines access to data and letting them learn for themselves.
Steps in Machine Learning
Typically include data collection, data preprocessing, model selection, training the model, model evaluation, and model tuning/deployment.
Data Collection for Machine Learning
This is often referred to as ‘Data Gathering’ or ‘Data Acquisition’. (Note: ‘Data Mining’ is a different step — extracting patterns from data that has already been collected.)
Improving Facial Recognition Accuracy
Techniques include using more diverse datasets, applying robust algorithms, and incorporating 3D facial recognition technologies.
Turing Test
A test of a machine’s ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human.
Regression Models
Examples include Linear Regression, Logistic Regression, and Polynomial Regression.
Classification Model
A type of model that is used to separate data into different classes. This can be binary classification (like spam detection) or multi-class classification (like image categorization).
Linear Function vs. Logistic Regression
Linear functions are used for predicting continuous values (e.g., the relationship between height and weight), whereas logistic regression is used for binary classification (yes or no). The choice depends on the nature of the problem (regression vs. classification).
Supervised Machine Learning
Involves training a model on a labeled dataset. Steps include data collection, data cleaning, choosing a model, training the model, and evaluating its performance.
Process of Classification
It involves taking some kind of input, processing it, and categorizing it into a certain class or category.
Confusion Matrix Terms
True Positive: Correctly predicted positive class.
True Negative: Correctly predicted negative class.
False Positive (Type I Error): Incorrectly predicted positive class.
False Negative (Type II Error): Incorrectly predicted negative class.
Confusion Matrix and Accuracy
A tool to measure the performance of a classification model. Accuracy is calculated as (True Positives + True Negatives) / Total number of samples.
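The accuracy formula applied to toy confusion-matrix counts (the numbers are purely illustrative):

```python
# Accuracy from confusion-matrix counts (toy numbers).
tp, tn, fp, fn = 40, 45, 5, 10   # true/false positives and negatives
total = tp + tn + fp + fn
accuracy = (tp + tn) / total
print(accuracy)  # (40 + 45) / 100 = 0.85
```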
ML for Natural Language Processing (NLP)
Involves applying ML techniques to understand and manipulate human language.
Syntax and Semantics in NLP
Syntax refers to the arrangement of words in a sentence to make grammatical sense. Semantics refers to the meaning conveyed by a text.
Statistical Machine Translation
An approach to machine translation that uses statistical models based on bilingual text corpora.
Linear Regression Interpretation on a Graph
Part 1 (Data Cleaning): It’s the process of correcting or removing inaccurate records from a dataset, improving its quality.
Part 2 (Line Interpretation): In y = mx + c, ‘m’ represents the slope (rise over run), and ‘c’ the y-intercept. The line of best fit minimizes the sum of the squares of the vertical distances of the points from the line.
Point Residual
The difference between an observed value and the value predicted by a model.
Mean Absolute Error
Average of the absolute errors between predicted and actual values.
Root Mean Square Error
Square root of the average of squared differences between prediction and actual observation.
Coefficient of Determination (R²)
The proportion of the variance in the dependent variable that is explained by the model; values closer to 1 indicate a better fit.
Relative Squared Error
Sum of the squared differences between the actual and predicted values, normalized by the total variation in the dataset.
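The four error metrics above, computed side by side on toy actual/predicted values (illustrative numbers, not from a real model):

```python
import math

# MAE, RMSE, R², and RSE on toy data.
actual    = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]

n = len(actual)
errors = [a - p for a, p in zip(actual, predicted)]
mean_a = sum(actual) / n

mae  = sum(abs(e) for e in errors) / n             # Mean Absolute Error
rmse = math.sqrt(sum(e ** 2 for e in errors) / n)  # Root Mean Square Error
ss_res = sum(e ** 2 for e in errors)               # residual sum of squares
ss_tot = sum((a - mean_a) ** 2 for a in actual)    # total variation
r2  = 1 - ss_res / ss_tot                          # Coefficient of Determination
rse = ss_res / ss_tot                              # Relative Squared Error

print(mae, rmse, r2, rse)
```

Note that R² and RSE are complementary here: R² = 1 − RSE.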