Definitions Flashcards
Data Science is the art of turning data into actions.
Combines: Domain Expertise, Statistics, and Computing Skills
Flows back and forth between deductive and inductive reasoning
Relatively new discipline in which methodologies and frameworks are still being solidified
Inter-related concepts of Data Science
Analytics, Business Analytics, Data Science, Business Intelligence, Data Analytics, Big Data, Statistical Learning.
Deductive Reasoning
Theory driven: start from a hypothesis, then test it with analytics (Hypothesis -> Analytics).
Inductive Reasoning
Empirically driven: start from analytics on the data, then form a hypothesis (Analytics -> Hypothesis).
Big Data:
Data in which the volume, variety, or velocity of information prohibits analysis via conventional desktop or server scale tools.
Distributed Processing (or computing):
A solution to the big data problem: platforms that let many individual machines work simultaneously on a single large problem (e.g. Hadoop).
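The split-then-combine idea behind distributed processing can be sketched on a single machine. This is only an illustration, not Hadoop's API: the helper names (`split_into_chunks`, `map_count`, `reduce_counts`) and the sample lines are made up, and each "chunk" merely stands in for the share of data one machine would handle.

```python
from collections import Counter
from functools import reduce

def split_into_chunks(items, n_chunks):
    """Divide the data into roughly equal pieces, one per (pretend) machine."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]

def map_count(chunk):
    """Work done independently on one chunk: count the words it contains."""
    return Counter(word for line in chunk for word in line.lower().split())

def reduce_counts(left, right):
    """Combine two partial results into one."""
    return left + right

# Hypothetical input; on a real cluster this would be far too large for one machine.
lines = ["big data big tools", "data beats opinion", "big ideas"]

chunks = split_into_chunks(lines, n_chunks=3)
partials = [map_count(chunk) for chunk in chunks]  # each call could run on its own machine
total = reduce(reduce_counts, partials, Counter())

print(total.most_common(2))
```

The key property is that `map_count` needs only its own chunk, so the chunks can be processed at the same time and merged afterwards.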
Machine Learning:
Most closely associated with Inductive reasoning. Algorithms that allow computers to learn from data without explicit instructions from the operator.
Supervised Learning:
Machine learning in which the outcome is defined by the operator. Think of it as predicting a known outcome from labeled examples.
Unsupervised Learning:
Machine learning in which the outcome is not defined. Think of it as grouping similar observations (clustering) or reducing the number of dimensions.
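One common unsupervised task is clustering, where groups are found without any labels being supplied. A minimal sketch, assuming one-dimensional data and exactly two groups; the function `two_means` and the purchase amounts are hypothetical, and the method shown is a tiny k-means with k = 2:

```python
def two_means(values, iterations=10):
    """Split numbers into two groups with a tiny k-means (k = 2)."""
    # Deterministic start: use the smallest and largest values as initial centers.
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to whichever center is closer.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Made-up purchase amounts; no label says which are "small" or "large".
purchases = [12, 15, 14, 90, 95, 11, 88]
centers, groups = two_means(purchases)
print(centers, groups)
```

No outcome column is passed in: the two groups emerge from the values alone.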
Regression:
A class of problems in which the objective is to predict the numeric value of an outcome.
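As a rough illustration of a regression problem, the sketch below fits a straight line by ordinary least squares and uses it to predict a numeric outcome. The names `fit_line` and `predict` and the study-hours data are invented for the example; least squares is just one of many regression methods.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # assumes the x values are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Predict the numeric outcome for a new x."""
    slope, intercept = model
    return intercept + slope * x

# Made-up data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

model = fit_line(hours, scores)
print(model, predict(model, 6))
```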
Classification:
A class of problems in which the objective is to predict which group or “class” an observation is likely to belong to.
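A classification problem can be sketched with a nearest-centroid rule: average the labeled examples of each class, then assign a new observation to the class whose average is closest. This is an illustrative choice of method, and `fit_centroids`, `classify`, and the sample points are hypothetical.

```python
from collections import defaultdict

def fit_centroids(points, labels):
    """Compute the mean (x, y) position of each labeled class."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), label in zip(points, labels):
        entry = sums[label]
        entry[0] += x
        entry[1] += y
        entry[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def classify(centroids, point):
    """Return the class whose centroid is nearest to the point."""
    px, py = point
    return min(
        centroids,
        key=lambda label: (centroids[label][0] - px) ** 2
        + (centroids[label][1] - py) ** 2,
    )

# Made-up labeled observations.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
labels = ["small", "small", "large", "large"]

centroids = fit_centroids(points, labels)
print(classify(centroids, (2.0, 2.0)), classify(centroids, (7.0, 9.0)))
```

Unlike regression, the prediction here is a class name rather than a number.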
Parametric Techniques:
Techniques in which there are specific assumptions about the nature and/or shape of relationships between variables. E.g. linear regression assumes a straight-line relationship and fits the slope of that line.
Non-parametric Techniques:
Techniques in which there are no specific assumptions about the nature and/or shape of the relationships between variables. E.g. decision trees.
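To make the decision-tree example concrete, here is a minimal sketch of a one-split tree (a "stump") for predicting a number. It assumes no line or curve; it only searches for the threshold that best separates the data. The names `fit_stump` and `predict_stump` and the data are hypothetical, and real decision trees repeat this split recursively.

```python
def fit_stump(xs, ys):
    """Find the single threshold on x that minimizes squared error
    when each side predicts the mean of its own y values."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((y - left_mean) ** 2 for y in left)
        error += sum((y - right_mean) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    # Assumes at least two distinct x values, so that some split exists.
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

def predict_stump(stump, x):
    """Predict by checking which side of the threshold x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean

# Made-up data with a jump that a straight line would fit poorly.
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 7, 20, 21, 22]

stump = fit_stump(xs, ys)
print(stump, predict_stump(stump, 2.5), predict_stump(stump, 11.5))
```

The shape of the relationship comes from the data (where the split lands), not from a formula chosen in advance.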
Unstructured Data:
Data that has no easily identified structure (e.g. free-form text responses)
Types of Analytics
Descriptive Analytics: What is or has been?
Predictive Analytics: What is likely to happen?
Prescriptive Analytics: What should you do?