Machine Learning Flashcards
What is Machine Learning?
Finds patterns in data, then uses those patterns to make predictions about new data
What is the “learning” in Machine Learning?
Identifying patterns and then recognizing those patterns when you see them again
What is the simple Machine Learning workflow?
- Feed data that contains patterns
- Machine learning algorithms find patterns
- The algorithm outputs a “model” that recognizes those patterns
- Applications use the model to get the probability of a match
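The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the messages, the `train` and `predict_spam_probability` helpers, and the word-frequency "algorithm" are all hypothetical and far simpler than a real learning algorithm.

```python
from collections import Counter

# Hypothetical labeled data: (message text, is_spam)
training_data = [
    ("win a free prize now", True),
    ("free money win big", True),
    ("meeting at noon tomorrow", False),
    ("lunch tomorrow at noon", False),
    ("free lunch at the meeting", False),
]

def train(examples):
    """Find a pattern: the share of messages containing each word that are spam."""
    spam_counts, total_counts = Counter(), Counter()
    for text, is_spam in examples:
        for word in set(text.split()):
            total_counts[word] += 1
            if is_spam:
                spam_counts[word] += 1
    # The "model" is just the learned per-word spam rate.
    return {word: spam_counts[word] / total_counts[word] for word in total_counts}

def predict_spam_probability(model, text, default=0.5):
    """Average the learned spam rates of the words in a new message."""
    rates = [model.get(word, default) for word in text.split()]
    if not rates:
        return default
    return sum(rates) / len(rates)

# Feed data -> algorithm finds patterns -> model -> application asks for a probability
model = train(training_data)
p_spam = predict_spam_probability(model, "win free money")
p_ham = predict_spam_probability(model, "meeting at noon")
print(round(p_spam, 2), round(p_ham, 2))
```

The averaged score is a rough stand-in for a probability, not a calibrated one.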
Good machine learning requires what?
- Lots of data
- Lots of computing power
- Effective algorithms
Who cares about Machine Learning?
- Business leaders - want solutions to problems
- Software devs - create better applications
- Data Scientists
What is a Data Scientist?
- Someone familiar with statistics
- Comfortable with machine learning software (and able to write code for it)
- Knowledgeable in some problem domain (ideally)
Who are some Machine Learning vendors?
- SAS Analytics
- RapidMiner Studio
- Alteryx Analytics
- “Megavendors” and cloud providers: IBM, SAP, Oracle, MS Azure, Amazon
What is R?
- Open source programming language and environment for statistical computing, widely used for machine learning
- Very popular, with many available packages
- Has been around a long time, since the 1990s
What are the 3 tenets of Machine Learning?
- Are you asking the right question?
- Do you have the data you need to answer that question?
- How do you measure success? How do you know when you are done?
What is the Machine Learning pipeline?
- Start with raw data and someone with domain knowledge
- Pre-process the raw data into prepared data, often over many iterations, until it is ready
- Apply learning algorithm(s)
- Get a candidate model and iterate to find the best model
- Deploy the model
- Recreate the model regularly based on new data and a changing world
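The pipeline steps can be sketched as plain Python functions. Everything here is an assumption made for illustration: the study-hours data, the `preprocess`, `accuracy`, and `fit_threshold` helpers, and the choice of a single threshold as the "model".

```python
# Hypothetical raw data: (hours studied as text, passed the exam?)
raw = [("1", False), ("2", False), (None, True), ("3", False), ("4", True),
       ("5", True), ("6", True), ("2.5", False), ("4.5", True)]

def preprocess(raw_rows):
    """Raw data -> prepared data: drop incomplete rows, convert text to numbers."""
    prepared = []
    for hours, passed in raw_rows:
        if hours is None or passed is None:
            continue
        prepared.append((float(hours), bool(passed)))
    return prepared

def accuracy(threshold, rows):
    """Fraction of rows where 'hours >= threshold' matches the real outcome."""
    correct = sum((hours >= threshold) == passed for hours, passed in rows)
    return correct / len(rows)

def fit_threshold(rows, candidates):
    """Learning step: evaluate each candidate model and keep the most accurate."""
    return max(candidates, key=lambda t: accuracy(t, rows))

prepared = preprocess(raw)
train_rows, test_rows = prepared[:6], prepared[6:]  # hold some data back for checking

best_threshold = fit_threshold(train_rows, candidates=[2.0, 3.5, 5.0])
test_accuracy = accuracy(best_threshold, test_rows)

def deployed_model(hours):
    """Deploy: the application calls this; re-run the steps above as new data arrives."""
    return hours >= best_threshold

print(best_threshold, test_accuracy, deployed_model(4.0))
```

Measuring accuracy on rows that were not used for fitting is one simple way to decide whether a candidate model is good enough to deploy.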
What is Training Data?
- Prepared data used to create a model
- Creating a model is “training” a model
What is Supervised vs Unsupervised Learning?
- Supervised - the value you want to predict is in the data; the data is “labeled”
- Unsupervised - the value you want to predict is not in the data; the data is “not labeled”
- Supervised learning is the most common
What are Features in Machine Learning?
- “Features” are basically columns of the data
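A small sketch of features as columns, and of the difference between labeled and unlabeled data. The house-sales table and the names `feature_names`, `label_name`, `X`, and `y` are assumptions chosen for illustration.

```python
# Hypothetical table of house sales: each dict is a row, each key is a column.
rows = [
    {"sqft": 1400, "bedrooms": 3, "price": 240000},
    {"sqft": 2000, "bedrooms": 4, "price": 330000},
    {"sqft": 900, "bedrooms": 2, "price": 150000},
]

feature_names = ["sqft", "bedrooms"]  # the input columns (features)
label_name = "price"                  # the value to predict; it is present, so the data is labeled

# Split each row into its feature values and its label.
X = [[row[name] for name in feature_names] for row in rows]
y = [row[label_name] for row in rows]

# Without the label column the same rows are unlabeled data,
# which is the setting for unsupervised learning.
unlabeled_rows = [{k: v for k, v in row.items() if k != label_name} for row in rows]

print(X)
print(y)
```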
What are categories of Machine Learning problems?
- Regression (supervised) - how many units will I sell next month?
- Classification (supervised) - is this credit card transaction fraudulent? Often returns a probability
- Clustering (unsupervised) - what are our customer segments?
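As a minimal sketch of the regression case, the snippet below fits a straight line to past sales with ordinary least squares and forecasts the next month. The sales figures and the `fit_line` helper are hypothetical, and a real forecast would rarely be this simple.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled data: month number (feature) and units sold (label)
months = [1, 2, 3, 4, 5]
units_sold = [100, 120, 140, 160, 180]

slope, intercept = fit_line(months, units_sold)
forecast = slope * 6 + intercept  # predicted units for month 6
print(slope, intercept, forecast)
```

A classification model would instead return a category, or a probability of belonging to one, rather than a number like `forecast`.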
What are some types of Machine Learning algorithms?
- Decision tree
- Neural network
- Bayesian
- K-Means (for clustering)
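To make K-Means concrete, here is a minimal one-dimensional sketch: assign each value to its nearest centroid, then move each centroid to the mean of its cluster, and repeat. The spending figures, the starting centroids, and the `k_means_1d` helper are assumptions for illustration; library implementations handle many dimensions and choose starting points more carefully.

```python
def k_means_1d(values, centroids, iterations=10):
    """Basic K-Means on single numbers: alternate assignment and centroid update."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data: monthly spend per customer
spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = k_means_1d(spend, centroids=[0, 50])
print(centroids)
print(clusters)
```

No labels are used anywhere, which is what makes this an unsupervised approach to finding customer segments.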