Data Science Essentials Flashcards
What is a Random Variable?
A random variable assigns a numerical value to each possible outcome of a random experiment.
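As a small illustration, the sketch below assumes the experiment "toss a fair coin twice" and defines a random variable that counts heads. The experiment and the name `number_of_heads` are chosen for this example only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of the assumed experiment: two tosses of a fair coin.
outcomes = list(product("HT", repeat=2))

def number_of_heads(outcome):
    """Random variable X: maps each outcome to a number."""
    return outcome.count("H")

# Distribution of X, assuming all four outcomes are equally likely.
counts = Counter(number_of_heads(o) for o in outcomes)
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}
print(distribution)
```

Each outcome such as ('H', 'T') is not a number itself; the random variable is the rule that turns it into one.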
What are the 5 Vs of Big Data?
- Velocity
- Veracity
- Variety (sometimes given as Variability)
- Volume
- Value
What does machine learning include?
Machine Learning is a computing technique that has its origins in artificial intelligence (AI) and statistics. Machine Learning solutions include:
- Classification - Predicting a category for an entity with a given set of features (in the simplest, binary case, a Boolean true/false value).
- Regression - Predicting a real numeric value for an entity with a given set of features.
- Clustering - Grouping entities with similar features.
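The three kinds of solution differ mainly in what they output. The sketch below assumes scikit-learn is installed and uses tiny made-up data; the particular estimators (`LogisticRegression`, `LinearRegression`, `KMeans`) are just one possible choice for each task.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

# One feature per entity (made-up values).
X = np.array([[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]])

# Classification: the target is a category (here True/False).
is_large = np.array([False, False, False, True, True, True])
clf = LogisticRegression().fit(X, is_large)
predicted_classes = clf.predict([[2.5], [9.5]])

# Regression: the target is a real number.
price = np.array([10.0, 20.0, 30.0, 80.0, 90.0, 100.0])
reg = LinearRegression().fit(X, price)
predicted_price = reg.predict([[5.0]])

# Clustering: no target at all, only grouping by similarity.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_labels = km.labels_
```

Classification and regression need labelled examples to learn from, while clustering works on the features alone.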
What does the five-number summary contain?
- Min
- Q1 (first quartile)
- Q2 (median)
- Q3 (third quartile)
- Max
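A minimal sketch of computing these five values with Python's standard `statistics` module. The sample data is made up, and the "inclusive" quartile method is one convention among several; other methods can give slightly different Q1 and Q3.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # made-up sample

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = {
    "min": min(data),
    "Q1": q1,
    "Q2 (median)": q2,
    "Q3": q3,
    "max": max(data),
}
print(five_number_summary)
```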
Python: merging DataFrames (link to good examples)
http://chrisalbon.com/python/pandas_join_merge_dataframe.html
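For quick reference, a short sketch of the basic join types with `pandas.merge`. The two tables and their column names are invented for illustration and are not taken from the linked page.

```python
import pandas as pd

# Two small, made-up tables that share the key column "id".
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"id": [2, 3, 4], "amount": [25.0, 40.0, 10.0]})

inner = pd.merge(customers, orders, on="id", how="inner")  # ids present in both
left = pd.merge(customers, orders, on="id", how="left")    # every customer kept
outer = pd.merge(customers, orders, on="id", how="outer")  # ids present in either

print(inner)
print(left)
print(outer)
```

Rows without a match on the other side get NaN in the columns that come from that side.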
What is one of the first steps of machine learning?
In general, the first step in machine learning is to figure out how to represent your data as a vector.
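One possible way to do this, sketched under assumptions: the record, its field names, and the fixed list `CITIES` are hypothetical. Numeric fields are copied as floats and the categorical field is one-hot encoded.

```python
# Hypothetical record with two numeric fields and one categorical field.
record = {"age": 34, "income": 52000.0, "city": "Paris"}

# Assumed fixed vocabulary for the categorical field.
CITIES = ["London", "Paris", "Tokyo"]

def to_vector(rec):
    """Turn a record into a flat list of floats."""
    one_hot = [1.0 if rec["city"] == c else 0.0 for c in CITIES]
    return [float(rec["age"]), float(rec["income"])] + one_hot

vector = to_vector(record)
print(vector)
```

Every record maps to a vector of the same length, which is what most learning algorithms expect as input.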
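What is the CRISP-DM process?
See image. CRISP-DM (Cross-Industry Standard Process for Data Mining) has six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.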
What do summary statistics generally contain?
Summary statistics generally include the mean, the median and quartiles of the data. This gives you a first quick look at the distribution of data values.
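A quick sketch using `pandas.Series.describe`, assuming pandas is available. The data is made up and deliberately contains one large value, to show how the mean and the median can disagree.

```python
import pandas as pd

# Made-up sample with a single large outlier.
values = pd.Series([3, 4, 4, 5, 6, 7, 48])

# describe() reports count, mean, std, min, the quartiles and max.
summary = values.describe()
print(summary)
```

Here the mean is pulled well above the median by the outlier, a first hint that the distribution is skewed to the right.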
What is the benefit of a scatter plot matrix?
A scatter plot matrix quickly produces a single overall view of the relationships in a dataset: it shows a scatter plot for every pair of variables, so you can examine many relationships at once.
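A minimal sketch with `pandas.plotting.scatter_matrix`, assuming pandas, NumPy and matplotlib are installed. The three columns are synthetic: `y` is built from `x` so that one panel shows a clear relationship, and `z` is unrelated noise.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; omit this line in a notebook

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
x = rng.normal(size=100)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=100),  # strongly related to x
    "z": rng.normal(size=100),                     # unrelated noise
})

# One panel per pair of columns, with histograms on the diagonal.
axes = scatter_matrix(df, figsize=(6, 6), diagonal="hist")
```

To view the result, save it with `axes[0, 0].figure.savefig("scatter_matrix.png")` or display it interactively. The x/y panels should show a tight upward band, while the panels involving z should look like shapeless clouds.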
The data science process includes the following activities:
- Data selection.
- Preprocessing.
- Transformation.
- Data Mining.
- Interpretation and evaluation.
What is a discrete random variable?
A discrete random variable has a countable number of possible outcomes, for example the number of heads in ten coin tosses.
What are some aspects of Data Analytic Thinking?
- Replace intuition with data-driven analytical decisions
- Transform raw data into a valuable asset
- Increase the pace of action
What is Data Science?
Data Science is the exploration and quantitative analysis of all available structured and unstructured data to develop understanding, extract knowledge, and formulate actionable results.
What is a continuous variable?
A continuous variable can take any value within a range, so it has uncountably many possible values (for example, height or temperature). This contrasts with a discrete variable, which can take only separate, countable values.
Types of Machine Learning algorithms?
- Linear Regression
- Logistic Regression
- Decision Tree
- SVM
- Naive Bayes
- KNN
- K-Means
- Random Forest
- Dimensionality Reduction Algorithms
- Gradient Boosting and AdaBoost