Lecture 1 Flashcards
data science
What is the role of a data scientist? (responsible data analytics from a data scientist perspective)
Data scientist:
Technical tools:
* has statistical tools for data analytics
* has the fundamentals of machine learning for data analytics
* makes the design choices
Responsible analysis
* accounts for data bias and bias mitigation
* accounts for other stakeholders and “non-customers”
* decides on which design choices are made
What are the four flavours/parts in data analytics?
Descriptive analytics
Diagnostic analytics
Predictive analytics
Prescriptive analytics
What is descriptive analytics?
(Main question and tools)
Main question: What is happening?
Tools: Visualization, Statistics
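As a small illustration of "what is happening?", the sketch below summarizes a data set with basic statistics from Python's standard library. The `daily_sales` values are made up for this example and are not from the lecture.

```python
import statistics

# Hypothetical daily sales figures, invented for illustration only.
daily_sales = [12, 15, 11, 30, 14, 13, 15]

# Descriptive analytics: summarize what the data looks like,
# without explaining why or predicting anything.
summary = {
    "count": len(daily_sales),
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "min": min(daily_sales),
    "max": max(daily_sales),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The mean is pulled up by the single large value (30) while the median is not, which is the kind of observation descriptive analytics is meant to surface.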
What is Diagnostic analytics?
(Main question and tools)
Main question: Why did it happen?
Tools: Advanced statistics, clustering
What is Predictive analytics?
(Main question and tools)
Main question: What is likely to happen?
Tools: Supervised and unsupervised machine learning
What is Prescriptive analytics?
(Main question and tools)
Main question: What should I do about it?
Tools: Monitoring, Stakeholder analysis
What are the data science aspects?
- Proper Data Usage
- Data Nature
- Data Type
- Data Visualization
- Modelling
- Validation
What are the goals of data science?
- To have an overview and terminology
- To know where to look for answers
- To ask the “right” questions
- To give the “right” answers
- data value
- opportunities
- challenges
(My interpretation: to understand the value of the data and to add value to it.)
What does data science consist of? (Data as integral part)
- collecting
- curating
- cleaning
of the data
Collecting: gathering the data.
Curating: selecting, organizing, and looking after the data.
Cleaning: fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data in the data set that has been collected and curated.
Once collected, curated, and cleaned, the data can be visualized, analysed, and modelled (slide 37, week 1).
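The cleaning step can be sketched in plain Python. This is a minimal example under assumptions: the records, the field names `name` and `age`, and the `clean` helper are all hypothetical, and real cleaning rules depend on the data set.

```python
# Hypothetical raw records, invented for illustration only.
raw_records = [
    {"name": "Alice", "age": "34"},
    {"name": "alice ", "age": "34"},    # duplicate with inconsistent formatting
    {"name": "Bob", "age": "unknown"},  # incorrect value
    {"name": "", "age": "28"},          # incomplete record
    {"name": "Carol", "age": " 41 "},   # incorrectly formatted
]


def clean(records):
    """Return records with formatting fixed and bad or duplicate rows removed."""
    cleaned = []
    seen = set()
    for record in records:
        # Fix formatting: trim whitespace and normalize capitalization.
        name = record["name"].strip().title()
        age_text = record["age"].strip()
        # Remove incomplete or incorrect rows.
        if not name or not age_text.isdigit():
            continue
        key = (name, int(age_text))
        # Remove duplicates (compared after normalization).
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": int(age_text)})
    return cleaned


cleaned_records = clean(raw_records)
print(cleaned_records)
```

Here rows are removed rather than repaired when a value is missing or invalid; whether to fix or drop such rows is itself one of the design choices a data scientist has to make.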
Modelling questions (to ask yourself)
- Why do I want to model?
- What is useful to model?
- What can I model?
- How will the model be used?
- Who is going to use the model?
Data questions (to ask yourself)
- What data do I need?
- What data do I have?
- How hard is it to get the data?
What is the essence of data science?
To refine the questions.
(Slide 48 is a bit vague, but I think it means: as a data scientist you ask questions, get responses (from customers who do not know much about data science), and use those responses to refine the question and ask new, more specific questions.)
What is data?
- “Factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation”
- “Information in digital form that can be transmitted or processed”
- “Information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful”
(Merriam-Webster Dictionary)
Types of Data
There are many different types:
* Transport
* Geographical
* Cultural
* Scientific
* Financial
* Statistical
* Meteorological (about weather)
* Natural (nature)
Types of data structures
- Structured data
- Semi-structured data
- Unstructured data
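To make the three structure types concrete, here is a minimal sketch using only Python's standard library. The sample values (cities, temperatures, the sentence) are invented for illustration; CSV, JSON, and free text are common examples of each type, not the only ones.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (e.g. a CSV table).
structured_text = "city,temperature_c\nDelft,18\nUtrecht,21\n"
structured_rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: self-describing keys, but fields may differ per record (e.g. JSON).
semi_structured_text = (
    '[{"city": "Delft", "temperature_c": 18},'
    ' {"city": "Utrecht", "notes": ["sunny"]}]'
)
semi_structured_records = json.loads(semi_structured_text)

# Unstructured: free text with no predefined fields.
unstructured_text = "It was a warm afternoon in Delft, around eighteen degrees."

print(structured_rows[0]["city"])
print(sorted(semi_structured_records[1].keys()))
print(unstructured_text)
```

Note that the second JSON record has a `notes` field and no `temperature_c`, something a fixed-column table would not allow, while the free-text sentence carries similar information with no fields at all.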