Lecture 3 – Visualising statistics Flashcards
What is the difference between data analysts, data scientists and data engineers?
- Data analysts are primarily people who develop insights with data …
- Data scientists are primarily people who develop data models and products that in turn produce insights …
- Data engineers are primarily people who manage data infrastructure, automate data processing and deploy models at scale …
Explain the different analytic levels
Descriptive analytics: gain insight from historical data
* plot sales results by region and product category
* correlate with advertising revenue per region
Predictive analytics: make predictions using statistical and machine learning techniques
* predict next quarter’s sales results using economic projections and advertising targets
Prescriptive analytics: recommend decisions using optimisation, simulation, etc.
* recommend which regions to advertise in given a fixed budget
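The prescriptive example can be sketched as a small optimisation. This is only an illustration, not a method from the lecture: the region names, costs, uplift figures and the function `recommend_regions` are all hypothetical, and a brute-force search over subsets stands in for a real optimiser.

```python
from itertools import combinations

# Hypothetical figures for illustration only:
# region -> (advertising cost, predicted sales uplift)
regions = {
    "North": (40, 90),
    "South": (30, 60),
    "East": (50, 100),
    "West": (20, 45),
}
budget = 100  # assumed fixed advertising budget


def recommend_regions(regions, budget):
    """Return the affordable subset of regions with the highest total predicted uplift."""
    best_choice, best_uplift = (), 0
    names = sorted(regions)
    # Try every non-empty subset of regions (fine for a handful of regions).
    for size in range(1, len(names) + 1):
        for choice in combinations(names, size):
            cost = sum(regions[name][0] for name in choice)
            uplift = sum(regions[name][1] for name in choice)
            if cost <= budget and uplift > best_uplift:
                best_choice, best_uplift = choice, uplift
    return best_choice, best_uplift


choice, uplift = recommend_regions(regions, budget)
print(choice, uplift)
```

The predicted uplift per region would come from a predictive model. The prescriptive step is the search for the best decision under the budget constraint.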
Which of the following is a prescriptive analytics task (as opposed to a predictive analytics task)?
A. Suggesting a traffic route based on prior data for the time of day and incident reports.
B. Predicting travel time of multiple traffic routes
C. Estimating the student enrolment number of FIT5145 in 2023 Sem 1
D. Measuring the likelihood of a student getting HD in the final exam of FIT5145
A. Suggesting a traffic route based on prior data for the time of day and incident reports.
What are influence diagrams?
A method for modelling data and decision making.
Influence diagrams (a.k.a. decision graphs):
* are directed graphical models with 4 types of nodes:
- chance nodes, known variable nodes, action/decision nodes and objective/utility nodes
* model the “influences”, “causes”, random (“chance”) outcomes, “actions” and “goals” involved in a decision problem
* provide a coarse abstraction, a conceptual model
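As a rough sketch, an influence diagram can be held as a set of typed nodes plus directed arcs. The umbrella decision problem, the node names and the helper `parents` below are hypothetical and chosen only to show one node of each type.

```python
from enum import Enum


class NodeType(Enum):
    CHANCE = "chance"
    KNOWN = "known variable"
    DECISION = "action/decision"
    UTILITY = "objective/utility"


# Hypothetical decision problem: whether to take an umbrella.
nodes = {
    "weather": NodeType.CHANCE,          # random outcome, not observed directly
    "forecast": NodeType.KNOWN,          # observed before deciding
    "take_umbrella": NodeType.DECISION,  # the action under our control
    "comfort": NodeType.UTILITY,         # the goal we want to maximise
}

# Directed arcs (source, destination): "source influences destination".
arcs = [
    ("weather", "forecast"),
    ("forecast", "take_umbrella"),
    ("weather", "comfort"),
    ("take_umbrella", "comfort"),
]


def parents(node):
    """Return the nodes that directly influence the given node."""
    return sorted(src for src, dst in arcs if dst == node)


print(parents("comfort"))
```

Here the utility node depends on both a chance outcome and a decision, which is the typical shape of a decision problem.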
Explain the node types of an influence diagram
An Influence Diagram:
A. is a model giving possible situations or outcomes.
B. consists of nodes and arcs.
C. is an alternative to a decision tree.
D. consists of nodes and arcs and is an alternative to a decision tree.
D. consists of nodes and arcs and is an alternative to a decision tree.
Name the four growth laws
Explanations about change in IT and society:
- Moore’s Law
- Koomey’s Law
- Bell’s Law
- Zimmermann’s Law
What does Moore’s Law say?
==> capability and size of IT
Number of transistors per chip doubles every 2 years (starting from 1975)
Transistor count translates to:
* more memory
* bigger CPUs
* faster memory, CPUs (smaller==faster)
Pace currently slowing
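To get a feel for what doubling every 2 years means, the growth factor after a given number of years can be computed directly. This is a minimal sketch: the function name `moore_growth_factor` is hypothetical, and it assumes the idealised constant doubling period rather than the slowing pace noted above.

```python
def moore_growth_factor(years, doubling_period=2.0):
    """Idealised multiplier for transistors per chip after `years` years."""
    return 2 ** (years / doubling_period)


print(moore_growth_factor(10))  # five doublings in a decade
print(moore_growth_factor(20))  # ten doublings in two decades
```

Under this assumption a decade gives a 32-fold increase and two decades roughly a thousand-fold increase.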
What does Koomey’s Law say?
==> capability and size of IT
- Corollary of Moore’s Law
- Amount of battery needed for a fixed computing load will fall by a factor of 100 every decade
- Leads to ubiquitous computing
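A factor of 100 per decade can be converted into a halving period, which makes it easier to compare with Moore’s Law. The sketch below assumes a constant exponential rate; the function name `halving_period` is hypothetical.

```python
import math


def halving_period(factor_per_decade=100.0, years=10.0):
    """Years needed for the battery requirement to halve, assuming a constant rate."""
    # A fall by `factor_per_decade` equals log2(factor_per_decade) halvings.
    return years / math.log2(factor_per_decade)


period = halving_period()
print(round(period, 1))
```

So a 100-fold fall per decade corresponds to the battery requirement halving roughly every 1.5 years.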
What does Bell’s Law say?
==> purpose of IT
- Corollary of Moore’s Law and Koomey’s Law
- “Roughly every decade a new, lower priced computer class forms based on a new programming platform, network, and interface resulting in new usage and the establishment of a new industry.”
e.g., PCs -> mobile computing -> cloud -> internet-of-things
What does Zimmermann’s Law say?
==> relationship between privacy and IT
- Zimmermann is the creator of Pretty Good Privacy (PGP), an early encryption system
- “Surveillance is constantly increasing”
- Privacy constantly decreasing
Growth, business, and business models
As information technology develops and with more data collected, businesses utilise it and incorporate it in their business models (–> innovation)
Definition of a business model:
A business model describes the rationale of how an organization creates, delivers, and captures value, in economic, social, cultural or other contexts.
What kinds of businesses do we have operating in the Data Science world?
Information brokering service: buys and sells data/information for others.
Information-based differentiation: satisfies customers by providing a differentiated service built on the data/information.
Information-based delivery network: delivers data/information for others.
Information provider: sells the data/information it collects.
The Bloomberg Terminal:
* a computer system provided by Bloomberg L.P.
* enables professionals to monitor and analyse real-time financial market data
* also place trades on the electronic trading platform
* is a proprietary secure network
Amazon.com
* An assembly line for the retail industry, with support for embedded online retailers.
* Huge stock of books, DVDs, CDs, etc. easily searchable.
* Extensive customer reviews.
–> Information-based differentiation: satisfies customers by providing a differentiated service (superior information (reviews), range)
–> Information-based delivery network:
- they deliver information for others;
- retailers in the Amazon marketplace get customers directed to them, along with other retailer support
LexisNexis
- provides the world’s largest electronic database of legal and public-records-related information.
What is statistics?
“The practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample”.
Two main statistical analytical methods:
* descriptive statistics – explaining data
* inferential statistics – finding regularities in irregular data
Mode, median, mean, variance, standard deviation
mode: the most common value.
median: the middle value of the sorted data (the average of the two middle values when the number of values is even).
mean: the average value.
variance: the average of the squared differences between the values and the mean.
standard deviation: the square root of the variance.
Example
Data: 2, 4, 4, 4, 5, 5, 7, 9
Mode: 4
Median: 4.5
Mean: 5
var = ((2-5)^2 + (4-5)^2 + (4-5)^2 + … + (9-5)^2)/8 = 32/8 = 4
sd = 4^0.5 = 2
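The worked example can be checked with Python’s standard `statistics` module. Note one assumption in this sketch: because the example divides by 8 (the number of values), it uses the population variance, so `pvariance` and `pstdev` are the matching functions; `variance` and `stdev` divide by n-1 and would give different results.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mode = statistics.mode(data)            # most common value
median = statistics.median(data)        # average of the two middle values (4 and 5)
mean = statistics.mean(data)            # sum of values / number of values
variance = statistics.pvariance(data)   # population variance: divides by n, as in the example
std_dev = statistics.pstdev(data)       # square root of the population variance

print(mode, median, mean, variance, std_dev)
```

These reproduce the values above: mode 4, median 4.5, mean 5, variance 4 and standard deviation 2.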