Proctor Exam 2- Secondary Data Syndicated and Big Data Flashcards
Machine learning can be applied anywhere there is a
need for quick automatic decisions based on ongoing feedback from patterns in the environment.
Problems that lend themselves to machine learning:
- The data causes problems for traditional analytic techniques, such as where variables
are highly correlated, data is non-linear, or where there are far more variables than
records (so-called “wide and shallow” datasets)
- Accuracy is more important than understanding
- Potential outputs are defined, but the action is dependent on conditions which themselves cannot be easily predicted or identified before the event happens.
- Rules and associations might be perceived or deduced, but are not easily described by logical rules
What makes this a good machine-learning problem is that the decisions and
the variables constantly change, and both the value of one variable and the right decision may depend
on the values of many more variables. Humans instinctively make these assessments, but it is
impossible to discretely list every rule and situation for a computer to look up and evaluate.
Machines learn by studying data to detect patterns or by applying known rules (algorithms) to:
Categorize like or unlike people or things
Identify patterns and relationships that were unknown before analysis
Predict likely outcomes or actions based on identified patterns
Detect anomalous or unexpected behaviors
Machines learn through what is essentially an exhaustive process of trial and error: sifting through information, comparing the information to a goal, making adjustments, and trying again.
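The trial-and-error cycle can be sketched in a few lines of Python. This is only an illustration, not a method taken from the course material: it assumes the "goal" is a set of observed values, the "adjustment" is a small gradient-descent step, and the names fit_slope, learning_rate, and rounds are made up for the example.

```python
def fit_slope(xs, ys, learning_rate=0.01, rounds=500):
    """Fit y ~ w * x by repeated guess, compare, adjust cycles."""
    w = 0.0  # initial guess for the slope
    for _ in range(rounds):
        # Compare the current predictions to the goal (the observed ys)
        # and measure which direction reduces the squared error.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Adjust the guess a little, then try again on the next pass.
        w -= learning_rate * gradient
    return w


# Illustrative data only: every y is exactly twice its x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
slope = fit_slope(xs, ys)
print(round(slope, 4))
```

Each pass through the loop is one "try"; the guess settles near 2 because that slope leaves the smallest gap between prediction and goal.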
Within machine learning: the traditional advanced analytic techniques you will learn about later in this course are not well
suited for the unstructured nature of some big data.
Machine learning, however, takes advantage of a computer’s ability to
follow rules and execute swift comparisons as a fast way to understand patterns and meaning in data.
The algorithms automatically sort data, testing and comparing what they have seen in the past to what they are seeing in the present. The learning may lead to a new understanding of behavior, or it might serve as automatic input to an action executed by another computer process.
Supervised learning always has a predetermined outcome provided by the programmer. The machine seeks faster, more efficient, or more accurate ways to meet the goal based on the data and the programmer's input.
Supervised Learning
Process
Machine is given pre-classified data and discovers a pattern associated with the classification. As more data becomes available, the machine adjusts its associations and gets better at classifying
Example
Sorting out junk emails from wanted content
Limitations
Only works on one task at a time.
User may not be able to interpret the associations behind the sorting.
Traditional Statistics Tool
Regression
Classification
Decision Trees
Random Forests
Bayesian Statistics
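The junk-email example can be sketched as a small naive Bayes classifier, one simple instance of the Bayesian statistics listed above. This is a minimal sketch under stated assumptions: the four training messages are invented, the labels "junk" and "wanted" are chosen for the example, and the functions train and classify are hypothetical helpers, not part of any library.

```python
import math
from collections import Counter


def train(labeled_messages):
    """Count word frequencies per label from pre-classified (text, label) pairs."""
    word_counts = {"junk": Counter(), "wanted": Counter()}
    label_counts = Counter()
    for text, label in labeled_messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the higher naive Bayes log-score."""
    vocabulary = set(word_counts["junk"]) | set(word_counts["wanted"])
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("junk", "wanted"):
        # Start from how common the label is in the training data.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented, pre-classified training data (assumes both labels are present).
training_data = [
    ("win a free prize now", "junk"),
    ("free money click now", "junk"),
    ("meeting agenda for monday", "wanted"),
    ("lunch on monday with the team", "wanted"),
]
word_counts, label_counts = train(training_data)
print(classify("free prize", word_counts, label_counts))
print(classify("monday meeting", word_counts, label_counts))
```

Adding more labeled messages to training_data and retraining is how the associations get adjusted as more data becomes available.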
Unsupervised Learning
Here the data determines the outcome. The algorithm’s mission is to extract structure from the data, and to present the structure in a way that is useful to us. Data is segmented and scored based on what the computer itself decides is relevant or related.
Unsupervised Learning
Process
Machine is given a lot of data and told to hunt for patterns and “clusters” of things related to each other. It draws its own conclusions about relationships.
Example
Recommendation engines
Loyalty card targeting
Limitation
Usually requires human input after the fact
Traditional Statistics Tool
Factor Analysis
Cluster Analysis
Multidimensional Scaling
Principal Component Analysis
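The hunt for "clusters" can be illustrated with a bare-bones k-means routine, one common form of cluster analysis. This is a sketch only: the shopper data (visits per month, average basket spend) is invented, the number of clusters is chosen by hand, and starting from the first k points is a simplification made to keep the example deterministic.

```python
def k_means(points, k, rounds=10):
    """Group 2-D points into k clusters by repeatedly reassigning and re-centering."""
    centroids = list(points[:k])  # simple deterministic start: the first k points
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        for x, y in points:
            # Assign each point to its nearest centroid (squared distance).
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c
            else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Invented loyalty-card data: (visits per month, average basket spend).
shoppers = [(1, 20), (9, 95), (2, 25), (8, 90), (1, 22), (10, 100)]
centroids, clusters = k_means(shoppers, k=2)
print(clusters)
```

No labels are supplied; the routine only reports which shoppers sit close together. Deciding what each group means (for example, occasional versus frequent shoppers) is the human input that usually follows.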
Reinforcement Learning
This type of learning has no supervisor, but instead it has a reward signal that defines success. Similar to human learning, when success is rewarded, the machine tries to learn the patterns that result in receiving the reinforcement signal. The machine’s decisions affect the subsequent data it receives.
Reinforcement Learning
Process
Machine not only analyzes data but uses the output to improve efficiency or create new strategies. Learns how to apply a set of rules toward an outcome in the most efficient way.
Example
Game playing bots
War Game Simulations
Limitation
Strategies may not be understandable by humans so may be limited to one situation.
Traditional Statistics Tool
Game Theory
Linear Programming
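A very small reinforcement-learning loop can be sketched as an epsilon-greedy "bandit": the machine picks an action, receives a reward, and shifts toward actions that have paid off. This is an illustrative sketch, not a technique named in the course material. The actions "A" and "B", their win probabilities, and the function run_bandit are all assumptions made for the example.

```python
import random


def run_bandit(win_probability, rounds=5000, epsilon=0.1, seed=0):
    """Learn which action pays off most, using only a win/lose reward signal."""
    rng = random.Random(seed)
    actions = list(win_probability)
    estimates = {a: 0.0 for a in actions}  # running average reward per action
    pulls = {a: 0 for a in actions}
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: occasionally try an action at random.
            action = actions[int(rng.random() * len(actions))]
        else:
            # Exploit: pick the action with the best average reward so far.
            action = max(actions, key=lambda a: estimates[a])
        # The environment hands back a reward; the learner never sees the true odds.
        reward = 1.0 if rng.random() < win_probability[action] else 0.0
        pulls[action] += 1
        # Update the running average for the chosen action.
        estimates[action] += (reward - estimates[action]) / pulls[action]
    return estimates


# Hypothetical game: action "B" wins far more often than action "A".
estimates = run_bandit({"A": 0.2, "B": 0.8})
best_action = max(estimates, key=estimates.get)
print(best_action)
```

The learner's own choices determine which rewards it observes next, which is the sense in which its decisions affect the subsequent data it receives.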
Unsupervised learning works well if we have little or limited knowledge of the data. The best examples of this application are so-called targeting engines or recommendation engines. When a
supermarket checkout machine issues you a coupon at checkout
The best examples of reinforcement machine learning are machines that play games. Typically the
machine is taught the rules of the game and given a goal to win.
Big Data is “The record of all interactions with people, institutions, and things recorded and stored digitally.”
Big data, then, is the digital trail left by humans and their connected machines.
The 7 V’s of Big Data
Volume, Velocity, Variety, Variability, Visualization, Veracity, Value
Within the healthcare industry, pharmaceutical syndicated services track sales, price, and
distribution of most pharmaceuticals.
The majority of pharmaceutical data comes from patient billing and processing at every point of
the drug supply chain.
Unlike most CPG categories, however, the pharmaceutical sales and
distribution chain involves many government regulations, physicians, other providers, and
insurance companies as mediators of sales to the patient as the ultimate consumer. As we have
said, drug products historically have not been easily tracked using industry-standard digital codes
such as the UPC.
Big Data Researcher Skills
These are the most common skills found on big data analytic teams:
Programming
Data Manipulation
Exploratory Data Analytics
Mathematics
Statistics
Business Skills / Domain Expertise
People Skills / Communication Skills
The evolution of big data has produced an entirely new field called data science, an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms.
In addition to the typical and historical skills of qualitative techniques and traditional statistical skills of quantitative surveys and analytics, modern researchers are concerned with:
Data Curation
Data Governance
Data Provenance
Data Curation
Right data is assembled for the right question
Data Governance
Data is secure & accurate
Data Provenance
Data from reputable sources & tracked through all potential uses
Machine learning is also being used to understand the return on investment of marketing itself
(MROI), that is, measuring how much money is generated by investing in marketing.
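As a rough illustration of the arithmetic behind MROI, the sketch below uses one common convention: incremental revenue attributed to marketing, minus marketing spend, divided by marketing spend. That formula is an assumption, not a definition given in the course material, and the function name mroi and the dollar figures are invented for the example.

```python
def mroi(incremental_revenue, marketing_spend):
    """Return marketing ROI as a ratio (assumed convention: net gain / spend)."""
    if marketing_spend <= 0:
        raise ValueError("marketing_spend must be positive")
    return (incremental_revenue - marketing_spend) / marketing_spend


# Hypothetical campaign: 100,000 spent, 150,000 in revenue attributed to it.
example = mroi(150_000, 100_000)
print(example)
```

The hard part in practice is estimating the incremental revenue, which is where machine learning on the marketing-spend datasets comes in; the division itself is trivial.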
The datasets
that track marketing spending are large, have more variables than cases, and contain
relationships that are often non-linear mixed with responses that are well behaved and
straightforward.
Prediction and understanding are related but independent goals of market research. Whether one goal is favored over the other is simply a business decision about the
problem at hand.
Striking the right balance between prediction and understanding is still required.
The complexity of big data
and the algorithms that read it have increased the frequency with which predictive models take
precedence as more and more marketing occurs in digital environments where automation is
possible and desirable.
Data that is collected for any purpose other than to meet the needs of your
particular study, and subsequently used in research, is called “secondary” data.
Secondary data in some detail:
Data collected for any purpose other than to meet the needs of your particular study.
Data collected for non-specific research purposes, called “syndicated” or multi-client data.
Data collected for another purpose and subsequently used in research.
The second most important question in your research design, just behind the original purpose of
the research, is “What is already known about your research goal?”
To reiterate and re-emphasize: All market research designs for every research project should
begin with an assessment of what is already known about your research problem. The answer to
that question almost always involves the search for and use of secondary data in its many forms,
whether you are merely searching the Internet or buying needed data from a broker.
The advantage of each imagined example of secondary research is that it costs
less time, money, and effort.
No single organization outside of governments could fund entire population studies such as censuses
or large-scale public health studies, for example.