Data Science Flashcards
deep learning
subset of machine learning that can learn unsupervised from unstructured data (the “deep” refers to the depth of a network’s layers)
machine learning
subset of AI that automates analytical model building. Emphasizes that systems can learn from data, identify patterns, and make decisions
artificial intelligence
theory and development of computer systems able to perform tasks that normally require human intelligence
3 types of machine learning
supervised learning, unsupervised learning, reinforcement learning
data science
field of study that combines information technology, modeling, and business management to extract insights from data. Machine learning is a subset
describe process of node in neural net deciding to fire?
takes the number passed from each of the nodes connected below it, multiplies each number by that connection’s weight, and sums the results. If the sum exceeds a threshold value, the node fires along its outgoing connections to the nodes above
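A minimal sketch of the firing rule on this card. The function name and the input, weight, and threshold values are made up for illustration.

```python
def node_fires(inputs, weights, threshold):
    """Return True if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum > threshold


# Hypothetical values, chosen only for illustration.
inputs = [1.0, 0.5, 0.0]
weights = [0.4, 0.6, 0.9]

# The weighted sum is about 0.7 (1.0*0.4 + 0.5*0.6 + 0.0*0.9).
print(node_fires(inputs, weights, threshold=0.5))  # sum is above the threshold
print(node_fires(inputs, weights, threshold=1.0))  # sum is below the threshold
```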
describe neural net training briefly
weights and thresholds are initially set to random values. Training data is fed in at the bottom layer (input layer) and passes up through successive layers (getting multiplied and added) until it arrives, transformed, at the output layer. Weights and thresholds are continually adjusted until training data with the same labels consistently yield similar outputs
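A rough sketch of that loop for a single node, assuming the classic perceptron update rule as the adjustment step (real multi-layer nets typically use backpropagation with gradient descent instead). The function names, learning rate, and the AND-gate training set are illustrative assumptions.

```python
import random


def predict(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of the inputs exceeds the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0


def train(examples, learning_rate=0.1, max_epochs=1000, seed=0):
    """Train one node with the perceptron rule; returns (weights, threshold)."""
    rng = random.Random(seed)
    n_inputs = len(examples[0][0])
    # Start from random weights and a random threshold.
    weights = [rng.uniform(-1, 1) for _ in range(n_inputs)]
    threshold = rng.uniform(-1, 1)
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, label in examples:
            error = label - predict(inputs, weights, threshold)
            if error != 0:
                mistakes += 1
                # Nudge each weight toward the correct answer.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                # A higher threshold makes firing harder, so it moves the other way.
                threshold -= learning_rate * error
        if mistakes == 0:
            break  # every training example now gets its labelled output
    return weights, threshold


# Hypothetical labelled training data: the logical AND of two inputs.
and_examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, threshold = train(and_examples)
for inputs, label in and_examples:
    print(inputs, "label:", label, "output:", predict(inputs, weights, threshold))
```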
Who first proposed neural nets?
McCulloch and Pitts in 1944 at the University of Chicago (they later moved to MIT as founding members of what is sometimes called the first cognitive science department)
neural net (and concerns)
means of doing machine learning, in which a computer learns to perform some task by analyzing training examples, usually hand-labelled in advance. The main concern is model transparency
Rank decision tree, linear regression, random forest in descending order of interpretability, then accuracy
Interpretability: 1. Linear regression 2. Decision tree 3. Random forest
Accuracy: 1. Random forest 2. Decision tree 3. Linear regression
Examples of open source machine learning libraries
scikit-learn (sklearn), TensorFlow, Keras, H2O
EvalML
Feature Labs product that searches across many popular libraries to find the best model to use
Featuretools
Feature Labs product for constructing a single table of features from multiple tables
overfitting
capturing noise and patterns that don’t generalize well to unseen data (opposite is underfitting)
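A small sketch of the idea, using made-up data: the training points follow y = 2x with hand-picked noise of +1 or -1, and the test points follow y = 2x exactly. The "memorizer" model (which simply returns the label of the nearest training point) and the straight-line model are illustrative choices, not a standard benchmark.

```python
def mse(model, points):
    """Mean squared error of a model over (x, y) points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


# Hypothetical data: y = 2x, with noise added to the training labels only.
train_points = [(1, 3), (2, 3), (3, 7), (4, 7), (5, 11)]
test_points = [(1.2, 2.4), (2.2, 4.4), (3.2, 6.4), (4.2, 8.4)]


def memorizer(x):
    """Overfit model: return the label of the nearest training point, noise and all."""
    _, nearest_y = min(train_points, key=lambda p: abs(p[0] - x))
    return nearest_y


# Simple model: least-squares line through the origin, y = slope * x.
slope = sum(x * y for x, y in train_points) / sum(x * x for x, _ in train_points)


def line(x):
    return slope * x


for name, model in [("memorizer", memorizer), ("line", line)]:
    print(name,
          "train MSE:", round(mse(model, train_points), 3),
          "test MSE:", round(mse(model, test_points), 3))
```

The memorizer is perfect on the training points because it has captured their noise, but it does worse than the line on the unseen test points.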
supervised learning
uses labelled data: the training data includes the “correct answer” you’re looking for