Machine Learning & AI Flashcards
9 breakthrough technologies 2022
- unhackable Internet: entangled particles transmitted end to end, unable to be read w/o disrupting content
- hyper-personalized medicine: takes large team to develop treatment for rare condition; solutions using digital speed
- digital currency: has potential if backed by real currency
- anti-aging drugs: senolytics - remove senescent cells that create low-level inflammation, creating toxicity, and also benefits of
- AI-discovered molecules: 10^60 chemical molecules possible; use machine learning to speed up process of finding possible drugs
- satellite mega-constellations: blanket the world with high speed internet or junk-ridden minefield
- tiny AI: lower carbon emission; increases speed; increases privacy (local storage)
- differential privacy: inject noise into user data to increase privacy
- climate change attribution: improving techniques to link weather to climate change; disentangling factors
adversarial inputs
- aim at misclassification in order to avoid detection
- e.g. - malicious documents designed to evade antivirus, and email attempting to evade spam filters
mutated input: attackers actively minimize the classifier's detection rate using undetectable payloads. To develop detection systems:
- limit information leakage: don’t provide error codes or confidence values
- limit probing: limit how many payloads they can test (e.g. CAPTCHA)
- ensemble learning: combine various detection methods
- attempt to make machine learning available to people without strong expertise in the field,
- automate repetitive tasks which enables a data scientist to focus more on the problem rather than the models
- automate data pipeline components - helps to avoid errors that might slip in with manual processes
baseline models
- calculate
- accuracy
- efficiency
- cost
- examples
- naive bayes
- linear regression
- Markov Model (NLP)
batch normalization
- normalize each layer's inputs to roughly N(0, 1)
- this is the range where activation functions are the most dynamic
- do this to every layer
- exponential-smoothed average
- removes the need for bias
- need batch and running mean and variance
- regularizes like noise injection because it adds error term to norm
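A minimal NumPy sketch of the points above, assuming 2-D inputs with one row per sample. The function names, `momentum`, and `eps` are illustrative choices, not from the card. The learned shift `beta` is what makes a separate bias term unnecessary.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, running_mean, running_var,
                     momentum=0.9, eps=1e-5):
    """Normalize a batch and update the running statistics."""
    mu = x.mean(axis=0)           # batch mean per feature
    var = x.var(axis=0)           # batch variance per feature
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Exponentially smoothed averages, used later at prediction time.
    running_mean = momentum * running_mean + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var
    return gamma * x_hat + beta, running_mean, running_var

def batch_norm_predict(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """At prediction time, use the running statistics instead of the batch's."""
    x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
gamma, beta = np.ones(4), np.zeros(4)
out, run_mean, run_var = batch_norm_train(x, gamma, beta, np.zeros(4), np.ones(4))
```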
bayes rule
- p(y|x)=p(x|y)p(y)/p(x)
- likelihood*prior/normalizer
- prior p(y)=instances/total
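A small worked example of the rule, using made-up toy data (five messages, one binary feature). The labels and feature values are purely illustrative.

```python
from collections import Counter

# Toy data: class label and whether a (hypothetical) trigger word is present.
labels = ["spam", "ham", "ham", "spam", "ham"]
has_word = [True, False, True, True, False]

counts = Counter(labels)
# prior p(y) = instances / total
prior = {y: n / len(labels) for y, n in counts.items()}

# likelihood p(x = True | y): fraction of class-y messages containing the word
likelihood = {
    y: sum(1 for lab, w in zip(labels, has_word) if lab == y and w) / counts[y]
    for y in counts
}

# normalizer p(x = True) = sum over y of p(x | y) p(y)
evidence = sum(likelihood[y] * prior[y] for y in counts)

# posterior p(y | x = True) = likelihood * prior / normalizer
posterior = {y: likelihood[y] * prior[y] / evidence for y in counts}
```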
bidirectional RNN image classification
- transpose
- vertical RNN and horizontal RNN
- global maxpool on both
bidirectional RNN
- make another recurrent unit but read in reverse
- concatenate the two hidden states to get h(t) of size 2M
- h(t) = [h→(t), h←(t)]
- many to one case
- output is O = [h→(T), h←(1)]
- could also take max
- is able to predict first item in sequence
bidirectional RNN architecture
black swan AI attack
- sooner or later, an attack will throw off your classifier
- develop incident response process
- have necessary controls to delay or halt processing when debugging
- know who to call
- use transfer learning to protect new products
- pretrained models or public datasets
- leverage anomaly detection
- e.g. - abuse of free tier to mine data (changes of uses of platform)
cognitive computing
- perform specific, human-like tasks in an intelligent way using machine learning
- simulate human thought processes using a computerized model
- imitate the way the human brain functions
collaborative filtering
- predicted rating: s(i,j) = µ_i + ∑_i' w_ii' (r_i'j - µ_i') / ∑_i' |w_ii'|
- weights are Pearson correlations between users' ratings, basically cosine similarity of mean-centered ratings
- w_ii' = ∑_j (r_ij - µ_i)(r_i'j - µ_i') / √(∑_j (r_ij - µ_i)² ∑_j (r_i'j - µ_i')²)
- Ψ_ii': the numerator sums over items rated by both users
- Ψ_i, Ψ_i': each denominator sum runs over the respective user's rated items
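A rough sketch of user-user collaborative filtering as described above, assuming a dense ratings matrix with NaN for unrated items. The function name `predict` and the toy matrix are illustrative, and a practical system would use sparse data structures and a neighbor limit.

```python
import numpy as np

def predict(R, i, j):
    """Predict user i's rating of item j; R has NaN where a rating is missing."""
    rated = ~np.isnan(R)
    mu = np.array([R[u, rated[u]].mean() for u in range(R.shape[0])])
    num, den = 0.0, 0.0
    for u in range(R.shape[0]):
        if u == i or not rated[u, j]:
            continue
        common = rated[i] & rated[u]            # items rated by both users
        if not common.any():
            continue
        # Numerator over common items; denominators over each user's own items.
        cov = ((R[i, common] - mu[i]) * (R[u, common] - mu[u])).sum()
        norm = np.sqrt(((R[i, rated[i]] - mu[i]) ** 2).sum()
                       * ((R[u, rated[u]] - mu[u]) ** 2).sum())
        if norm == 0:
            continue
        w = cov / norm                          # Pearson-style weight
        num += w * (R[u, j] - mu[u])            # neighbor's deviation from own mean
        den += abs(w)
    return mu[i] if den == 0 else mu[i] + num / den

nan = np.nan
R = np.array([[5.0, 4.0, nan, 1.0],
              [4.0, 5.0, 4.0, 1.0],
              [1.0, 2.0, 1.0, 5.0]])
pred = predict(R, 0, 2)   # user 0 has not rated item 2
```

Note that a negatively correlated neighbor who dislikes an item pushes the prediction up, which is why the weight keeps its sign in the numerator but not in the denominator.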
collaborative filtering formula
concatenation trick
- the weights of the input and hiddens can be concatenated
- size Mx(M+D)
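A quick numerical check of the trick, assuming D input features and M hidden units with column-vector conventions. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 3, 4
W_x = rng.normal(size=(M, D))      # input-to-hidden weights
W_h = rng.normal(size=(M, M))      # hidden-to-hidden weights
x = rng.normal(size=D)
h_prev = rng.normal(size=M)

# Two separate matrix products...
separate = W_h @ h_prev + W_x @ x

# ...equal one product with the concatenated weights and inputs.
W = np.concatenate([W_h, W_x], axis=1)            # shape (M, M + D)
combined = W @ np.concatenate([h_prev, x])
```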
- acts as a filter; blurring, edge detection, etc..
- denoted I*K(image convoluted with filter/kernel)
- a(t)*w(t) = ∫a(τ)w(t-τ)dτ (continuous) or (a*w)[x] = Σ_x' a[x-x']w[x'] (discrete)
- signal and kernel interchangeable
- can replace x with x’
- filters are the pattern we're looking for; spikes in the output mark occurrences
convolution versus cross-correlation
convolution: (f*g)[n] = ∑f[m]g[n-m]
cross-correlation: (f♦g)[n] = ∑f[m]g[n+m]
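A small sketch comparing the two operations on finite sequences. The helper `convolve_full` is written out by hand for illustration and is checked against NumPy's `np.convolve` and `np.correlate`. Up to argument order and index offset, cross-correlation is convolution with one input flipped.

```python
import numpy as np

def convolve_full(f, g):
    """(f*g)[n] = sum over m of f[m] * g[n-m], for every n with a non-empty sum."""
    out = np.zeros(len(f) + len(g) - 1)
    for m, f_m in enumerate(f):
        for k, g_k in enumerate(g):
            out[m + k] += f_m * g_k   # n = m + k, so g is indexed at n - m
    return out

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])

conv = convolve_full(f, g)
# Flipping the kernel turns convolution into cross-correlation.
corr = convolve_full(f, g[::-1])
```

Deep learning libraries typically compute cross-correlation in their "convolution" layers; since the filter is learned, the flip makes no practical difference.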
convolutional neural network
- imitates the visual cortex in the brain
- invariant to shifts in features of image (translational invariance)
- convolution->pooling->convolution->pooling->fully connected layer
- 3d image has 3d filter
- multiply filter by input elementwise; pooling loses a dimension
- the features that are found actually resemble things in input (layer 1 shape, layer 2 body part, layer 3 face)
- use for classification
- use for autoencoder
- entropy and variance are correlated with unpredictability
- -T.mean(T.log(y[T.arange(y.shape[0]), t]))
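The expression above looks like Theano code (with `T` presumably being `theano.tensor`) for the mean cross-entropy of the true classes. A NumPy equivalent, assuming `y` holds one row of predicted class probabilities per sample and `t` holds integer class labels:

```python
import numpy as np

def cross_entropy(y, t):
    """Mean negative log-probability assigned to the correct class."""
    # y[np.arange(N), t] picks, for each row n, the probability of class t[n].
    return -np.mean(np.log(y[np.arange(y.shape[0]), t]))

y = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
t = np.array([0, 1])
loss = cross_entropy(y, t)
```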
data lake/warehouse
- lake: large-scale pool of raw data without a concrete purpose
- data warehouse: repository for structured, filtered data that has already been processed for a specific purpose.
- data lakes were born out of the need to harness big data and benefit from the raw, granular structured and unstructured data for machine learning
- data warehouses for analytics used by business users.
data pipeline
- encapsulate a number of processing steps required to prepare data for machine learning
- performing “data prep” operations such as cleansing data and handling missing data and outliers
- also transforming data into a form better suited for machine learning
- includes training or fitting a model and determining its accuracy
- automated so their steps may be performed on a continued basis
data poisoning
- feeding adversarial data into the classifier's training set to pollute it so the model performs worse
model skewing:
- attackers pollute training data to shift learned boundary
- use sensible data sampling
- don’t allow one user too much influence to system
- use weight decay
- compare classifier to previous
- dark launch: compare two outputs on same traffic
- backtesting: A/B testing of fraction of traffic
- build golden standard dataset: classifier must accurately predict
feedback weaponization: use feedback systems to attack legitimate users and content
- verify feedback before making decisions
- don’t assume benefactor is responsible: hackers cover tracks to penalize users
deeper is better
deeper networks can use fewer hidden units per layer and achieve better performance
DeepMind wavenet
- text to speech
- uses CNN
- is used by Google Assistant
dis/advantage full batch
- won't work on big dataset due to size
- maximize likelihood over entire set
distributed ML pipeline tools
- Apache Spark
- Apache Airflow
- Kubeflow
docker containers
- lightweight, isolated user-level environment (similar to a small virtual machine, but sharing the host kernel) that helps data scientists build, install, and run code
- built from a script (a Dockerfile)
- ability to version control a data science environment
does data influence training time?
- training time does not depend on amount of data
- only the creation time increases
dropout regularization
- drops random nodes during training
- hidden layer units rely on multiple input
- emulates ensembles
- approximates 2^(# neurons) networks
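A minimal sketch of dropout on a layer's activations, assuming the common "inverted dropout" variant, which rescales the surviving units at training time so nothing changes at prediction time. The function name and `p_keep` are illustrative.

```python
import numpy as np

def dropout(a, p_keep, rng, training=True):
    """Randomly zero units during training; pass through unchanged otherwise."""
    if not training:
        return a
    mask = rng.random(a.shape) < p_keep   # keep each unit with probability p_keep
    return a * mask / p_keep              # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
a = np.ones((1000, 50))
train_out = dropout(a, p_keep=0.8, rng=rng)
test_out = dropout(a, p_keep=0.8, rng=rng, training=False)
```

Each training step samples a different mask, so every step effectively trains a different sub-network, which is where the ensemble analogy comes from.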
edge analytics
- data collection and analysis where an analytical computation is performed on data at the point of collection (e.g. a sensor) instead of waiting for the data to be sent back to a centralized data store
- IoT model of connected devices has become more established
- filter for what information is worth sending to a central data store for later use.
ensemble network
- majority vote; better accuracy than one model; two methods (different features, same features)
- 100 features 1 million data points
- 10 networks of 10 features
exponentially-smoothed average
- z(t) = (1-α)z(t-1) + α*cost(t), 0 < α < 1
- bias-corrected: y(t) = z(t) / (1 - decay^(t+1)), with decay = 1-α
- more recent values matter more
- thus better for non-stationary data
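A short sketch of the smoothed average with bias correction, assuming the running value starts at zero, time starts at t = 0, and the decay equals one minus the smoothing weight (called `alpha` here). The function name is illustrative.

```python
def smoothed_average(values, alpha):
    """Exponentially smoothed average with bias correction."""
    decay = 1 - alpha
    z = 0.0
    out = []
    for t, x in enumerate(values):
        z = decay * z + alpha * x                  # recent values weigh more
        out.append(z / (1 - decay ** (t + 1)))     # correct the bias from z starting at 0
    return out

costs = [10.0, 10.0, 10.0, 2.0]
avg = smoothed_average(costs, alpha=0.5)
```

Without the correction, the first few outputs would be pulled toward zero because the running value starts there.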
face map
Face maps provide the ability to combine different types of relationships into one that you can do math on
THEOREM is that if you combine relationships that are null or simpler, you reconstruct the larger system
- Co-limits
- Validating: measure the low dimensional entropy and the high dimensional one
- Can iteratively combine low dimensional bad ones with high dimensional good ones until you find the right one
fancy method
- NLP task
- max pool hidden y in the RNN
- determine if some sequence is in the data
- has 1 output
fastest AI
gated recurrent unit
- similar to rate, you have to choose b/w
- taking the old value
- or taking the new value
- if reset gate r(t) = 0, it's like beginning a new sequence
- but, h(t) will be a combo of h(t-1) and hhat(t)
- has same form as update gate
- is essentially a forgetting factor
- update gate weights previous
- z(t) = act(x(t)W_xz + h(t-1)W_hz + b_z)
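A sketch of one GRU step in NumPy, assuming sigmoid gates, a tanh candidate state, and the convention h(t) = (1 - z) * h(t-1) + z * hhat(t). Some references swap the roles of z and 1 - z. The parameter names and sizes are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU time step; p is a dict of weight matrices and bias vectors."""
    z = sigmoid(x_t @ p["Wxz"] + h_prev @ p["Whz"] + p["bz"])   # update gate
    r = sigmoid(x_t @ p["Wxr"] + h_prev @ p["Whr"] + p["br"])   # reset gate
    # With r = 0 the candidate ignores h_prev, like starting a new sequence.
    h_hat = np.tanh(x_t @ p["Wxh"] + (r * h_prev) @ p["Whh"] + p["bh"])
    # Blend the old value and the new candidate.
    return (1 - z) * h_prev + z * h_hat

rng = np.random.default_rng(0)
D, M = 3, 4                      # input size, hidden size
params = {}
for gate in ("z", "r", "h"):
    params["Wx" + gate] = rng.normal(scale=0.1, size=(D, M))
    params["Wh" + gate] = rng.normal(scale=0.1, size=(M, M))
    params["b" + gate] = np.zeros(M)

h = np.zeros(M)
for x_t in rng.normal(size=(5, D)):   # a toy sequence of length 5
    h = gru_step(x_t, h, params)
```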
geospatial analytics
- handle geographic information system (GIS) data (e.g. GPS data) and imagery (e.g. satellite photographs)
- uses geographic coordinates as well as identifier variables such as street address and zip code
- create geographic models and data visualizations for more accurate modeling and predictions.
GPU acceleration
- use GPUs alongside the CPU to speed up computation
- GPU database accelerates certain database operations
- represents a connection between a collection of entities (e.g. spending habits of consumers)
- vertex (nodes): node attributes such as age and height, and number of neighbors
- edge: the relationship between customer and product; edge attributes include identity and weights
graph database
- uses “graph theory” to store, map and query relationships of data elements
- collection of nodes and edges
- node represents an entity such as a product or customer
- have: a unique identifier, a set of outgoing edges and/or incoming edges, in addition to a set of key/value pairs
- edge represents a connection or relationship between two nodes
- have: unique identifier, a starting-place and/or an ending-place node along with a set of properties
graph database image
grid search
exhaustive method that loops through all possible combinations of the variables (e.g. hyperparameter values)
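A minimal sketch of the loop using only the standard library. The scoring function `toy_score` is a stand-in for training and validating a model, and the parameter names are hypothetical.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Try every combination in the grid and return the best-scoring one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(lr, hidden):
    # Stand-in for a validation score; it peaks at lr=0.01, hidden=64.
    return -(lr - 0.01) ** 2 - 1e-6 * (hidden - 64) ** 2

grid = {"lr": [0.1, 0.01, 0.001], "hidden": [32, 64, 128]}
best, best_score = grid_search(toy_score, grid)
```

The number of combinations is the product of the list lengths (9 here), so the cost grows quickly as variables are added.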
GRU architecture
how to fix overrepresentation of end
- only add end token n % of the time
- otherwise stop on second to last word
how to prevent or cause same prediction every time
- don't model the initial word probability distribution
- p(w(0)) = softmax(f(‘start’)), w(0) = randint(V, p=p(w(0)))
- output probability instead of deterministic output
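A sketch of the difference between sampling the first word and always taking the most likely one. The logits are random placeholders standing in for the model output on the start token, and `V` is a toy vocabulary size.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())   # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
V = 5                                # toy vocabulary size
logits = rng.normal(size=V)          # placeholder for f('start')
p_w0 = softmax(logits)

# Sampling gives a different first word from run to run...
samples = rng.choice(V, size=1000, p=p_w0)
# ...while argmax gives the same first word every time.
greedy = int(np.argmax(p_w0))
```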
increase CNN invariance
- modify training data
- orientation,
- color,
- size, etc…
- can be used to increase one portion of data size if there is class imbalance
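A rough sketch of augmenting one image along the lines listed above, assuming square H x W x 3 float images in [0, 1]. The particular transforms and ranges (flip, 90-degree rotation, brightness scaling) are illustrative choices.

```python
import numpy as np

def augment(img, rng):
    """Return a randomly flipped, rotated, and brightness-scaled copy."""
    out = img
    if rng.random() < 0.5:
        out = out[:, ::-1]                          # horizontal flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # orientation: 0/90/180/270 degrees
    scale = rng.uniform(0.8, 1.2)                   # color: simple brightness change
    return np.clip(out * scale, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
augmented = [augment(img, rng) for _ in range(4)]
```

Applying this only to examples of an under-represented class is one way to grow that class when the data is imbalanced.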
indirect encoding
- field of neuroevolution
- analogous to pruning in lottery ticket
- expressive while reducing parameters