Big data analytics and data science role Flashcards
What are the 5 Vs?
Volume
Velocity
Variety
Value
Veracity
What does Veracity mean?
How far the data can be trusted: the confidence that the data is accurate and of good quality
What are the 2 types of data about data?
Metadata and paradata
What does metadata mean?
The minimum you should know about the data (e.g. its source, format and fields)
What does paradata mean?
How the data has been processed
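A minimal Python sketch of the difference, assuming a dataset is held as a plain dict. The keys `metadata` and `paradata` and the helper `record_step` are hypothetical names used only for illustration. Metadata describes what the data is, while paradata is a log that grows with each processing step.

```python
from datetime import datetime, timezone

# Hypothetical dataset wrapper: the key names are illustrative only.
dataset = {
    "rows": [{"id": 1, "temp_c": 21.5}, {"id": 2, "temp_c": None}],
    # Metadata: the minimum you should know about the data itself.
    "metadata": {"source": "sensor_feed", "format": "JSON", "columns": ["id", "temp_c"]},
    # Paradata: a log of how the data has been processed.
    "paradata": [],
}


def record_step(ds, description):
    """Append one processing step (with a UTC timestamp) to the paradata log."""
    ds["paradata"].append(
        {"step": description, "at": datetime.now(timezone.utc).isoformat()}
    )


# Example processing step: drop rows with missing readings, then log it.
dataset["rows"] = [r for r in dataset["rows"] if r["temp_c"] is not None]
record_step(dataset, "dropped rows with missing temp_c")

print(dataset["metadata"]["columns"])
print([p["step"] for p in dataset["paradata"]])
```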
What are the 4 data structures?
structured
semi-structured
quasi-structured
unstructured
What are the data repositories?
data islands
data warehouses
analytic sandbox
What is a data island?
Isolated data marts: record keeping in spreadsheets and low-volume DBMSs
What is a data warehouse?
A centralised data repository that supports BI and reporting
What is an analytic sandbox?
A workspace that brings together data assets from multiple sources, ready for analysis
What are the three big data project success factors?
timely decision making
processing throughput
flexibility
In what three ways does an analytic sandbox support the big data success factors?
provides high-performance analysis
ingests data from different sources
owned by the data science team rather than IT
What are the business drivers of big data/data science?
optimise business processes
predict new business opportunities
mitigate business risk
meet legal and regulatory requirements
What are the four parts of the big data ecosystem?
data devices
data collectors
data aggregators
data users/buyers
What are data devices in the big data ecosystem?
Devices that continuously gather data about the world (e.g. phones)
What are data collectors in the big data ecosystem?
Entities that collect data from devices and users: people interact with many organisations and institutions and provide them with information in order to access their services
What are data aggregators in the big data ecosystem?
They take data from multiple sources, combine and enrich it, and provide it to consumers
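Aggregation can be sketched in Python as a join plus enrichment. This assumes two hypothetical sources keyed on `customer_id`. The `aggregate` function and the 40.0 `high_value` threshold are illustrative choices, not a standard method.

```python
# Hypothetical source 1: customer records from one data collector.
crm = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
# Hypothetical source 2: purchase events from another collector.
purchases = [
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 1, "amount": 12.5},
    {"customer_id": 2, "amount": 8.0},
]


def aggregate(customers, events, threshold=40.0):
    """Combine two sources on customer_id and enrich each record with derived fields."""
    totals = {}
    for e in events:
        totals[e["customer_id"]] = totals.get(e["customer_id"], 0.0) + e["amount"]
    enriched = []
    for c in customers:
        spend = totals.get(c["customer_id"], 0.0)
        # Enrichment: fields that neither source holds on its own.
        enriched.append({**c, "total_spend": spend, "high_value": spend >= threshold})
    return enriched


for row in aggregate(crm, purchases):
    print(row)
```

The join key and the derived fields would differ in a real aggregator; the point is that the output carries more than either input alone.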
What are data users/buyers in the big data ecosystem?
Users who consume data from their own sensor net and data collectors, along with data acquired from data aggregators, to support data-driven decision making
Which looks at the past and which looks to the future: BI or DS?
BI (business intelligence) - the past
DS (data science) - the future
What are the 4 key roles within DS?
analytical talent
data savvy professionals
technology enablers
knowledge engineers
What does a knowledge engineer do?
Wrangles the data so it is ready for projects to consume
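A small wrangling sketch in plain Python, assuming messy hypothetical sales records. It normalises city names, parses numbers stored as text, and drops missing values and duplicates. `wrangle` is a hypothetical helper; real projects would often use a library such as pandas instead.

```python
raw = [
    {"city": " London ", "sales": "1,200"},
    {"city": "london", "sales": "1,200"},   # duplicate once normalised
    {"city": "Leeds", "sales": ""},         # missing value
    {"city": "LEEDS", "sales": "300"},
]


def wrangle(rows):
    """Normalise text, parse numbers, drop missing values and duplicates."""
    seen = set()
    clean = []
    for r in rows:
        city = r["city"].strip().title()    # " London " and "london" -> "London"
        if not r["sales"]:
            continue                        # drop rows with no sales figure
        sales = int(r["sales"].replace(",", ""))
        key = (city, sales)
        if key in seen:
            continue                        # drop duplicates after normalisation
        seen.add(key)
        clean.append({"city": city, "sales": sales})
    return clean


print(wrangle(raw))  # [{'city': 'London', 'sales': 1200}, {'city': 'Leeds', 'sales': 300}]
```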
What are the 5 things a data scientist should be?
Quantitative - able to do the maths
Curious and creative
Technical - able to code
Skeptical - questions things
Communicative and collaborative
What are the 3 original V words?
Volume
Variety
Velocity
Describe semi-structured data.
Textual data with a discernible pattern that enables parsing, e.g. self-describing XML files
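Because XML tags describe each field, standard-library tools can parse it without a separate table schema. A minimal sketch using Python's `xml.etree.ElementTree`, with a made-up orders document:

```python
import xml.etree.ElementTree as ET

# Hypothetical self-describing XML document: the tags name each field.
doc = """
<orders>
  <order id="A1"><item>Laptop</item><qty>1</qty></order>
  <order id="A2"><item>Mouse</item><qty>3</qty></order>
</orders>
"""

root = ET.fromstring(doc.strip())
# The discernible tag pattern is what makes the records parseable.
orders = [
    {"id": o.get("id"), "item": o.findtext("item"), "qty": int(o.findtext("qty"))}
    for o in root.findall("order")
]
print(orders)
```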
Describe quasi-structured data.
Textual data with erratic formats that can be parsed with effort, tools and time, e.g. web clickstream data
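Quasi-structured data takes extra effort: each known format needs its own parsing rule, and lines that fit none are set aside. A sketch in Python with `re`, assuming two hypothetical clickstream log formats (the patterns and field names are made up for illustration):

```python
import re

# Hypothetical clickstream lines: the same kind of event in inconsistent formats.
lines = [
    "2024-03-01 10:15:02 user=42 GET /products?id=7",
    "01/03/2024 10:15:09|user:42|GET|/cart",
    "garbage line from a misconfigured tracker",
]

# One pattern per known format (different date styles and delimiters).
PATTERNS = [
    re.compile(r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\S+) user=(?P<user>\d+) GET (?P<url>\S+)$"),
    re.compile(r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>[^|]+)\|user:(?P<user>\d+)\|GET\|(?P<url>\S+)$"),
]


def parse(line):
    """Return the user and URL if any known pattern matches, else None."""
    for pattern in PATTERNS:
        m = pattern.match(line)
        if m:
            return {"user": int(m["user"]), "url": m["url"]}
    return None  # unknown format: needs more effort before it can be used


parsed = [parse(line) for line in lines]
print(parsed)
```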
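Big data uses ELT. What does it mean?
Extract, Load, Transform: raw data is loaded into the target store first and transformed there afterwards, rather than before loading as in ETL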
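A minimal ELT sketch using Python's built-in `sqlite3`, with an in-memory database standing in for the target store or sandbox (an assumption for illustration; big data platforms use other engines). The raw text is loaded unchanged and only transformed afterwards with SQL:

```python
import sqlite3

# Hypothetical raw extract: amounts arrive as text with currency symbols.
raw_rows = [("2024-03-01", "£12.50"), ("2024-03-01", "£7.50"), ("2024-03-02", "£20.00")]

conn = sqlite3.connect(":memory:")  # stands in for the target store

# Extract + Load: land the data as-is, with no cleaning on the way in.
conn.execute("CREATE TABLE raw_sales (day TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)

# Transform: clean and aggregate afterwards, inside the target store.
conn.execute(
    """
    CREATE TABLE daily_sales AS
    SELECT day, SUM(CAST(REPLACE(amount, '£', '') AS REAL)) AS total
    FROM raw_sales
    GROUP BY day
    """
)
print(conn.execute("SELECT day, total FROM daily_sales ORDER BY day").fetchall())
```

Keeping `raw_sales` untouched means the data can be transformed again later in a different way, which is the flexibility ELT trades for.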
What is a data savvy professional?
Someone with an introductory understanding of DS
What is an analytical talent?
Someone with training in quantitative methods
What is a technology enabler?
Someone who provides technical support for the hardware and software behind analytics projects