Module 1 - Data and Foundations Flashcards
The 2 types of data
Quant: continuous (e.g. temperature on a continuum) or discrete (e.g. integer-valued variables)
Qual: logical/state (e.g. toggles that switch between controls) or categorical (e.g. red buttons)
“Event”?
Event = a set of one or more outcomes from the sample space, to which a probability is assigned
Axioms of probability
P(S) = 1 (some outcome in the sample space S is certain to happen)
0 <= P(E) <= 1 for any event E
If I have 2 MUTUALLY EXCLUSIVE events: P(E1 U E2) = P(E1) + P(E2)
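The axioms can be checked on a small example. The following is a minimal sketch that assumes a single roll of a fair six-sided die with equally likely outcomes. The sample space S, the helper P, and the events E1 and E2 are all hypothetical choices made for illustration. Fraction keeps the arithmetic exact.

```python
from fractions import Fraction

# Sample space S for one roll of a fair six-sided die (assumed example).
S = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event (a subset of S), assuming equally likely outcomes."""
    return Fraction(len(event & S), len(S))

E1 = {1, 2}  # "roll a 1 or a 2"
E2 = {5, 6}  # "roll a 5 or a 6"; shares no outcomes with E1

assert P(S) == 1                          # axiom: P(S) = 1
assert all(0 <= P({s}) <= 1 for s in S)   # every probability lies in [0, 1]
assert E1.isdisjoint(E2)                  # E1 and E2 are mutually exclusive
assert P(E1 | E2) == P(E1) + P(E2)        # additivity for mutually exclusive events
print(P(E1 | E2))  # 2/3
```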
Random variables
A random variable attaches a numerical label to each outcome
Example of random variable
If the outcome is ‘grey’, the random variable assigns it the number 4
T is the random variable for any temperature value on a continuum scale
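A random variable can be read as a function from outcomes to numbers. This minimal sketch encodes the ‘grey’ → 4 card as a lookup table. The other colour labels and their numbers are hypothetical. A continuous variable like T would map to real numbers instead of a fixed table.

```python
# A random variable maps each outcome to a number.
# Only 'grey' -> 4 comes from the card; the other labels are hypothetical.
colour_to_number = {"red": 1, "green": 2, "blue": 3, "grey": 4}

def X(outcome):
    """Discrete random variable: map a categorical outcome to its numerical label."""
    return colour_to_number[outcome]

outcomes = ["grey", "red", "grey", "blue"]
print([X(o) for o in outcomes])  # [4, 1, 4, 3]
```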
Cumulative distribution function (CDF) for discrete random variable
F_X(k) = P(X <= k) = SUM(P_X(xi)) for all i = 0 –> k (with xi = i)
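The discrete CDF is a running sum of the probability mass function, not of xi*P_X(xi). This minimal sketch assumes a hypothetical PMF for X taking values xi = i = 0..3. The probabilities are made up for illustration.

```python
from fractions import Fraction

# Hypothetical PMF for a discrete random variable X with values x_i = i = 0..3.
pmf = [Fraction(1, 10), Fraction(3, 10), Fraction(4, 10), Fraction(2, 10)]

def cdf(k):
    """F_X(k) = P(X <= k): sum P_X(x_i) for i = 0..k (k assumed to be an integer)."""
    if k < 0:
        return Fraction(0)
    # Slicing past the end of the list simply returns the whole PMF, so F_X(k) = 1 there.
    return sum(pmf[: k + 1], Fraction(0))

print([str(cdf(k)) for k in range(4)])  # ['1/10', '2/5', '4/5', '1']
```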
Probability density function (PDF) for continuous random variable
f_X(a) = dF_X(x)/dx at x = a; f_X(a) is a density, not a probability: P(a - δ < X < a + δ) ≈ f_X(a)*2δ for small δ
Cumulative distribution function (CDF) for continuous random variable
F_X(k) = P(X <= k) = INT(f_X(x) dx) for x = -inf –> k
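The PDF/CDF relationship can be checked numerically. This minimal sketch assumes a standard normal random variable as the example. It integrates the PDF with the trapezoid rule and uses a finite lower limit of -10 as a stand-in for -inf. The result is compared against the closed-form normal CDF built from math.erf. The last lines show that f_X(a) times a small window width approximates the probability of landing in that window.

```python
import math

def f_X(x):
    """PDF of a standard normal random variable (assumed example distribution)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cdf(k, lower=-10.0, n=100_000):
    """F_X(k) = P(X <= k) = integral of f_X from -inf to k, via the trapezoid rule.
    The finite lower limit stands in for -inf; the tail below -10 is negligible."""
    h = (k - lower) / n
    total = 0.5 * (f_X(lower) + f_X(k))
    total += sum(f_X(lower + i * h) for i in range(1, n))
    return total * h

exact = 0.5 * (1 + math.erf(1 / math.sqrt(2)))  # closed-form normal CDF at k = 1
print(cdf(1.0), exact)

# f_X(a) is a density, not a probability; for a small delta,
# P(a - delta < X < a + delta) is approximately f_X(a) * 2 * delta.
a, delta = 0.5, 1e-3
window = cdf(a + delta) - cdf(a - delta)
print(window, f_X(a) * 2 * delta)
```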
Expected value for discrete cases
E{X} = SUM(xi*P_X(xi)) for all xi
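The discrete expected value is a probability-weighted sum over all possible values. This minimal sketch assumes a fair six-sided die as the example. The helper expected_value is hypothetical and rejects a PMF that does not sum to 1.

```python
from fractions import Fraction

def expected_value(values, probs):
    """E{X} = SUM(x_i * P_X(x_i)) over all x_i of a discrete random variable."""
    probs = list(probs)
    if sum(probs) != 1:
        raise ValueError("PMF must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

# Fair six-sided die (assumed example): each face has probability 1/6.
faces = range(1, 7)
probs = [Fraction(1, 6)] * 6
print(expected_value(faces, probs))  # 7/2
```

Note that E{X} = 7/2 is not a value the die can ever show. The expected value is a long-run average, not a most-likely outcome.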
Expected value for continuous cases
E{X} = INT(x*f_X(x) dx) for x = -inf –> inf
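The continuous expected value replaces the sum with an integral of x*f_X(x). This minimal sketch assumes an exponential random variable with rate 2, whose mean is known to be 1/rate = 0.5. The integral is approximated with the trapezoid rule over [0, 50], and that finite upper limit stands in for inf. Both the distribution and the helper names are illustrative assumptions.

```python
import math

RATE = 2.0  # assumed rate parameter of the exponential example

def f_X(x):
    """PDF of an exponential random variable (assumed example): RATE * exp(-RATE * x), x >= 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def expected_value(pdf, lower, upper, n=100_000):
    """E{X} = integral of x * f_X(x) dx, approximated with the trapezoid rule.
    Finite limits stand in for the infinite ones; pick them so the tails are negligible."""
    def integrand(x):
        return x * pdf(x)

    h = (upper - lower) / n
    total = 0.5 * (integrand(lower) + integrand(upper))
    total += sum(integrand(lower + i * h) for i in range(1, n))
    return total * h

print(expected_value(f_X, 0.0, 50.0), 1 / RATE)  # numerical estimate vs exact mean
```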
Linear operator conditions for expected value
Additive: E{X1 + X2} = E{X1} + E{X2}
Mult. by scalar: E{kX} = kE{X}
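Both linearity properties can be verified exactly by enumerating a joint distribution. This minimal sketch assumes two independent fair dice X1 and X2 and a hypothetical scalar k = 3. Additivity does not actually require independence, but independence keeps this example's joint probabilities simple (1/36 each).

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice (assumed example); each joint outcome has probability 1/36.
faces = range(1, 7)
joint = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}

def E(g):
    """Expected value of g(X1, X2) over the joint distribution."""
    return sum(g(a, b) * p for (a, b), p in joint.items())

E_X1 = E(lambda a, b: a)
E_X2 = E(lambda a, b: b)
k = 3  # hypothetical scalar

assert E(lambda a, b: a + b) == E_X1 + E_X2  # E{X1 + X2} = E{X1} + E{X2}
assert E(lambda a, b: k * a) == k * E_X1     # E{kX} = kE{X}
print(E(lambda a, b: a + b), E(lambda a, b: k * a))  # 7 21/2
```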
Human-centred insights
Ethnography
________ ________ brings in human-centred insights, which is another kind of _____.
Design thinking
Data
Anthropogenic
Made by humans
Problems with using surveillance tech to collect data from the public
- Social policies bring in questions of privacy
- Who owns the tech? Are they ethical?
- Misidentification, inappropriate conclusions drawn from data
- Gaps in monetary network (too expensive)
We need a ___________ data set to feed a model so it is trained for the appropriate task
Representative
In engineering, we use data to help develop solutions that lie at the intersection of…
Technical feasibility, social desirability, financial viability
Sources of variability
Ambient fluctuations
Instrumentation and measurement fluctuations
(For chem processes) Models and system/process representation
Ambient fluctuations
Disturbances within a process or physical system
Human interventions
External disturbances
Instrumentation and measurement fluctuations
Electronic noise
Physical location of instrument
Models and system/process representation
Assuming a system is well-mixed when it is actually spatially distributed
Associated with simplification in model form
Active collection
Make a series of planned changes (moves) to the process
Increases info content; lets you establish “causality” (“cause and effect”) relationships
Passive collection
Record process values without actively intervening
“Historical” databases = passively collected data, e.g. browser history
Role of statistical methods
- Decision-making under uncertainty
- Categorizing and modelling variability (e.g. Poisson distr. w/ mean, var)
- Basis for “variability accounting” –> how it propagates
- Data “microscope”
- Effective presentation of results in graph/quant forms
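As one illustration of variability accounting, variances from independent sources add: Var(X + Y) = Var(X) + Var(Y). This minimal sketch simulates a measured value affected by two of the sources listed earlier, an ambient fluctuation and instrument noise. Both sources are assumed to be independent and normally distributed. The true value, the standard deviations, the sample size, and the seed are all hypothetical choices for illustration.

```python
import random
import statistics

# Hypothetical values for two independent sources of variability.
TRUE_VALUE = 50.0    # hypothetical true process value
SIGMA_AMBIENT = 2.0  # ambient fluctuations (e.g. external disturbances)
SIGMA_SENSOR = 0.5   # instrumentation/measurement noise (e.g. electronic noise)

rng = random.Random(0)  # fixed seed so the run is repeatable
readings = [
    TRUE_VALUE + rng.gauss(0, SIGMA_AMBIENT) + rng.gauss(0, SIGMA_SENSOR)
    for _ in range(50_000)
]

observed_var = statistics.pvariance(readings)
predicted_var = SIGMA_AMBIENT**2 + SIGMA_SENSOR**2  # variances of independent sources add
print(observed_var, predicted_var)
```

The sample variance will not match 4.25 exactly, because it is itself estimated from random data. It should land close to that value.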