Machine Learning & AI Flashcards

1
Q

9 breakthrough technologies 2022

A
  1. unhackable Internet: entangled particles transmitted end to end, unable to be read w/o disrupting content
  2. hyper-personalized medicine: currently takes a large team to develop a treatment for a rare condition; digital tools can deliver solutions at speed
  3. digital currency: has potential if backed by real currency
  4. anti-aging drugs: senolytics - remove senescent cells that create low-level inflammation and toxicity
  5. AI-discovered molecules: 10^60 chemical molecules possible; use machine learning to speed up process of finding possible drugs
  6. satellite mega-constellations: blanket the world with high speed internet or junk-ridden minefield
  7. tiny AI: lower carbon emission; increases speed; increases privacy (local storage)
  8. differential privacy: inject noise into user data to increase privacy
  9. climate change attribution: improving techniques to link weather to climate change; disentangling factors
2
Q

adversarial inputs

A
  • aim at misclassification in order to avoid detection
  • e.g. - malicious documents designed to evade antivirus, and emails attempting to evade spam filters
  • mutated input: attackers actively minimize the classifier’s detection rate using undetectable payloads. To harden detection systems:
    1. limit information leakage: don’t provide error codes or confidence values
    2. limit probing: limit how many payloads they can test (e.g. CAPTCHA)
    3. ensemble learning: combine multiple detection methods
3
Q

self-attention

A
4
Q

AutoML

A
  • attempt to make machine learning available to people without strong expertise in the field,
  1. automate repetitive tasks which enables a data scientist to focus more on the problem rather than the models
  2. automate data pipeline components - helps to avoid errors that might slip in with manual processes
5
Q

baseline models

A
  • calculate
    • accuracy
    • efficiency
    • cost
  • examples
    • naive bayes
    • linear regression
    • Markov Model (NLP)
6
Q

batch normalization

A
  • normalize activations to N(0, 1)
  • this is where activations are the most dynamic
  • do this to every layer
  • keep an exponentially-smoothed average of the statistics
  • removes the need for bias
  • need both batch and running mean and variance
  • regularizes like noise injection because it adds an error term to the norm
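
A minimal numpy sketch of the idea (names and the momentum value are illustrative, not from the card): normalize with batch statistics during training, keep exponentially-smoothed running statistics for inference, and learn a scale/shift (gamma, beta):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, running_mean, running_var,
                      train=True, momentum=0.9, eps=1e-5):
    """Normalize activations toward N(0, 1), then scale/shift with gamma/beta."""
    if train:
        mu = x.mean(axis=0)                       # batch mean
        var = x.var(axis=0)                       # batch variance
        # exponentially-smoothed running statistics, used later at inference time
        running_mean = momentum * running_mean + (1 - momentum) * mu
        running_var = momentum * running_var + (1 - momentum) * var
    else:
        mu, var = running_mean, running_var       # use running stats at test time
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta, running_mean, running_var
```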
7
Q

bayes rule

A
  1. p(y|x) = p(x|y)·p(y) / p(x)
    1. likelihood × prior / normalizer
  2. prior: p(y) = (instances of class y) / (total instances)
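
A tiny worked example of the formula, with made-up spam-filter numbers just to exercise it:

```python
# hypothetical likelihoods and priors
p_x_given_spam, p_spam = 0.8, 0.3
p_x_given_ham,  p_ham  = 0.1, 0.7

p_x = p_x_given_spam * p_spam + p_x_given_ham * p_ham   # normalizer p(x)
p_spam_given_x = p_x_given_spam * p_spam / p_x          # posterior ~= 0.774
```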
8
Q

bidirectional RNN image classification

A
  • transpose
    • vertical RNN and horizontal RNN
  • global maxpool on both
9
Q

bidirectional RNN

A
  • make another recurrent unit that reads the sequence in reverse
  • concatenate the hidden states to get h(t) of size 2M
    • h(t) = [h→(t), h←(t)]
  • many-to-one case
    • output is o = [h→(T), h←(1)]
    • could also take the max
  • is able to predict the first item in the sequence
10
Q

bidirectional RNN architecture

A
11
Q

black swan AI attack

A
  • sooner or later, an attack will throw off your classifier
  1. develop incident response process
    1. have necessary controls to delay or halt processing when debugging
    2. know who to call
  2. use transfer learning to protect new products
    1. pretrained models or public datasets
  3. leverage anomaly detection
    1. e.g. - abuse of the free tier to mine data (a change in how the platform is used)
12
Q

cognitive computing

A
  • perform specific, human-like tasks in an intelligent way using machine learning
  • simulate human thought processes using a computerized model
  • imitate the way the human brain functions
13
Q

collaborative filtering

A
  • user-based score: s(i,j) = µi + ∑i′ wii′(ri′j − µi′) / ∑i′ |wii′|
  • pearson correlation (correlation b/w variables) - basically mean-centered cosine similarity
    • wii′ = ∑(xi − µx)(yi − µy) / √(∑(xi − µx)² ∑(yi − µy)²)
    • Ψii′: numerator is over items rated by both users
    • Ψi, Ψi′: denominator is over each respective user’s ratings
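
A rough numpy sketch of the user-user version of the card’s formula; the ratings matrix R uses NaN for unrated items, all names are hypothetical, and edge cases (e.g. users with no co-rated items) are ignored:

```python
import numpy as np

def pearson(r_i, r_j):
    """Pearson weight between two users over items both have rated (NaN = unrated)."""
    both = ~np.isnan(r_i) & ~np.isnan(r_j)
    x = r_i[both] - np.nanmean(r_i)
    y = r_j[both] - np.nanmean(r_j)
    return (x @ y) / np.sqrt((x @ x) * (y @ y))

def score(R, i, j):
    """s(i, j) = mu_i + sum_i' w_ii' (r_i'j - mu_i') / sum_i' |w_ii'|."""
    mu_i = np.nanmean(R[i])
    num = den = 0.0
    for k in range(R.shape[0]):
        if k == i or np.isnan(R[k, j]):
            continue
        w = pearson(R[i], R[k])
        num += w * (R[k, j] - np.nanmean(R[k]))
        den += abs(w)
    return mu_i + (num / den if den else 0.0)
```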
14
Q

collaborative filtering formula

A
15
Q

concatenation trick

A
  • the weights of the input and hiddens can be concatenated
  • size Mx(M+D)
16
Q

convolution

A
  • acts as a filter; blurring, edge detection, etc.
  • denoted I*K (image convolved with filter/kernel)
  • continuous: (a*w)(t) = ∫a(τ)w(t−τ)dτ; discrete: (a*w)[n] = ∑m a[m]w[n−m]
  • signal and kernel are interchangeable (convolution is commutative)
  • can replace x with x’
  • filters are the patterns we’re looking for; spikes in the output mark where they occur
17
Q

convolution versus cross-correlation

A

convolution: (f*g)[n] = ∑f[m]g[n-m]

cross-correlation: (f♦g)[n] = ∑f[m]g[n+m]
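The only difference is whether the kernel is flipped; numpy exposes both as a quick check (toy arrays, illustrative only):

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])

conv = np.convolve(f, g, mode='full')     # (f*g)[n] = sum_m f[m] g[n-m]  (kernel flipped)
corr = np.correlate(f, g, mode='full')    # (f.g)[n] = sum_m f[m] g[n+m]  (no flip)

# convolution equals cross-correlation with a reversed kernel
assert np.allclose(conv, np.correlate(f, g[::-1], mode='full'))
```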

18
Q

convolutional neural network

A
  • imitates the visual cortex in the brain
  • invariant to shifts in features of image (translational invariance)
  • convolution->pooling->convolution->pooling->fully connected layer
  • a 3D image has a 3D filter
  • multiply filter by input elementwise; pooling downsamples (reduces spatial size)
  • the features that are found actually resemble things in the input (layer 1: shapes, layer 2: body parts, layer 3: faces)
19
Q

cross-entropy

A
  • use for classification
  • use for autoencoders
  • entropy and variance both measure unpredictability
  • -T.mean(T.log(y[T.arange(y.shape[0]), t]))
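
A numpy equivalent of the Theano expression above, assuming y_pred holds softmax outputs and t holds integer class labels:

```python
import numpy as np

def cross_entropy(y_pred, t):
    """Mean negative log-likelihood of the correct class.
    y_pred: (N, K) softmax outputs; t: (N,) integer labels.
    Mirrors -T.mean(T.log(y[T.arange(y.shape[0]), t]))."""
    return -np.mean(np.log(y_pred[np.arange(y_pred.shape[0]), t]))
```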
20
Q

d(sigmoid(x))/dx

A

y*(1-y)

21
Q

d(tanh(x))/dx

A

1-y**2

22
Q

data lake/warehouse

A
  • lake: large-scale pool of raw data without a concrete purpose
  • data warehouse: repository for structured, filtered data that has already been processed for a specific purpose.
  • data lakes were born out of the need to harness big data and benefit from the raw, granular structured and unstructured data for machine learning
  • data warehouses for analytics used by business users.
23
Q

data pipeline

A
  • encapsulate a number of processing steps required to prepare data for machine learning
  • performing “data prep” operations such as cleansing data and handling missing data and outliers
  • also transforming data into a form better suited for machine learning
  • includes training or fitting a model and determining its accuracy
  • automated so their steps may be performed on a continuing basis
24
Q

data poisoning

A
  • feeding adversarial training data to the classifier to pollute the training set so it performs worse
  • model skewing:
    • attackers pollute training data to shift the learned boundary
    1. use sensible data sampling
      • don’t allow any one user too much influence over the system
      • use weight decay
    2. compare the classifier to previous versions
      1. dark launch: compare the two outputs on the same traffic
      2. backtesting: A/B testing on a fraction of traffic
    3. build a golden-standard dataset that the classifier must predict accurately
  • feedback weaponization: use feedback systems to attack legitimate users and content
    1. verify feedback before making decisions
    2. don’t assume the beneficiary is responsible: hackers cover their tracks to get users penalized
25
Q

deeper is better

A

a deeper network can use fewer hidden units per layer and achieve better performance

26
Q

DeepMind wavenet

A
  • text to speech
  • uses CNN
  • is used by Google Assistant
27
Q

dis/advantage full batch

A
  • won’t work on a big dataset due to memory size
  • maximizes likelihood over the entire set
28
Q

distributed ML pipeline tools

A
  • Apache Spark
  • Apache Airflow
  • Kubeflow
29
Q

docker containers

A
  • small, user-level virtual machine that helps data scientists build, install, and run code
  • built from a script
  • ability to version control a data science environment
30
Q

does data influence training time?

A
  • training time does not depend on amount of data
  • only the creation time increases
31
Q

dropout regularization

A
  • drops random nodes during training
  • hidden layer units learn to rely on multiple inputs
  • emulates ensembles
  • approximates 2^(# neurons) networks
  • MULTIPLY PREDICTION-TIME ACTIVATIONS BY P(KEEP) AT THE END
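
A short sketch of the standard (non-inverted) scheme the card describes: mask units during training, then multiply by p(keep) at prediction time:

```python
import numpy as np

def dropout_layer(a, p_keep, train=True):
    """Standard dropout: random mask during training, rescale at prediction time."""
    if train:
        mask = np.random.rand(*a.shape) < p_keep   # keep each unit with probability p_keep
        return a * mask
    return a * p_keep                              # match expected activation at test time
```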
32
Q

edge analytics

A
  • data collection and analysis where an analytical computation is performed on data at the point of collection (e.g. a sensor) instead of waiting for the data to be sent back to a centralized data store
  • IoT model of connected devices has become more established
  • filter for what information is worth sending to a central data store for later use.
33
Q

ensemble network

A
  • majority vote; better accuracy than one model; two methods (different features, same features)
  • 100 features 1 million data points
  • 10 networks of 10 features
34
Q

exponentially-smoothed average

A
  • z(t) = (1 − α)·z(t−1) + α·cost(t), (0 < α < 1)
  • bias correction: y(t) = z(t) / (1 − decay^(t+1)), with decay = 1 − α
  • more recent values matter more
  • thus better for non-stationary data
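
A short sketch of the smoothed average with bias correction, writing decay = 1 − α (function name illustrative):

```python
def smoothed_costs(costs, decay=0.99):
    """Exponentially-smoothed average of a cost sequence, with bias correction."""
    z, out = 0.0, []
    for t, c in enumerate(costs):
        z = decay * z + (1 - decay) * c           # z(t) = (1 - a)*z(t-1) + a*cost(t)
        out.append(z / (1 - decay ** (t + 1)))    # bias-corrected estimate
    return out
```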
35
Q

face map

A

Face maps provide the ability to combine different types of relationships into one that you can do math on

  • THEOREM is that if you combine relationships that are null or simpler, you reconstruct the larger system
    • Co-limits
  • Validating: measure the low dimensional entropy and the high dimensional one
  • Can iteratively combine low dimensional bad ones with high dimensional good ones until you find the right one
36
Q

fancy method

A
  • NLP task
  • max pool hidden y in the RNN
  • determine if some sequence is in the data
  • has 1 output
37
Q

fastest AI

A
38
Q

gated recurrent unit

A
  • similar to the rated RNN: you have to choose b/w
    • taking the old value
    • or taking the new value
  • if the reset gate r(t) = 0, it’s like beginning a new sequence
    • but h(t) will be a combo of h(t-1) and ĥ(t)
    • r(t) has the same form as the update gate
    • is essentially a forgetting factor
  • the update gate weights previous vs. new state
    • z(t) = act(x(t)Wxz + h(t-1)Whz + bz)
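
A rough numpy sketch of one GRU step; the weight names follow the card’s notation and everything else (shapes, toy values) is hypothetical:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step; p maps names like 'Wxz' to weight arrays."""
    z = sigmoid(x_t @ p['Wxz'] + h_prev @ p['Whz'] + p['bz'])             # update gate
    r = sigmoid(x_t @ p['Wxr'] + h_prev @ p['Whr'] + p['br'])             # reset gate
    h_hat = np.tanh(x_t @ p['Wxh'] + (r * h_prev) @ p['Whh'] + p['bh'])   # candidate state
    return (1 - z) * h_prev + z * h_hat                                   # blend old and new

# toy shapes: D = 4 inputs, M = 3 hidden units
rng = np.random.default_rng(0)
D, M = 4, 3
p = {k: rng.standard_normal((D, M)) for k in ('Wxz', 'Wxr', 'Wxh')}
p.update({k: rng.standard_normal((M, M)) for k in ('Whz', 'Whr', 'Whh')})
p.update({k: np.zeros(M) for k in ('bz', 'br', 'bh')})
h = gru_step(rng.standard_normal(D), np.zeros(M), p)
```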
39
Q

geospatial analytics

A
  • handle geographic information system (GIS) data (e.g. GPS data) and imagery (e.g. satellite photographs)
  • uses geographic coordinates as well as identifier variables such as street address and zip code
  • create geographic models and data visualizations for more accurate modeling and predictions.
40
Q

GPU acceleration

A
  • use GPUs alongside the CPU to speed up computation
  • GPU database accelerates certain database operations
41
Q

graph

A
  • represents a connection between a collection of entities (e.g. spending habits of consumers)
  • vertex (nodes): node attributes such as age and height, and number of neighbors
  • edge: the relationship between customer and product - edge identity and weights
42
Q

graph database

A
  • uses “graph theory” to store, map and query relationships of data elements
  • collection of nodes and edges
  • node represents an entity such as a product or customer
    • have: a unique identifier, a set of outgoing edges and/or incoming edges, in addition to a set of key/value pairs
  • edge represents a connection or relationship between two nodes
    • have: unique identifier, a starting-place and/or an ending-place node along with a set of properties
43
Q

graph database image

A
44
Q

grid search

A

exhaustive method that loops through all possible combinations of variables (e.g. hyperparameter values)
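
For example, with scikit-learn’s GridSearchCV (the grid values and estimator are arbitrary):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# every combination of C and kernel is tried exhaustively with 5-fold CV
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
search = GridSearchCV(SVC(), param_grid, cv=5)
# search.fit(X_train, y_train)    # X_train / y_train assumed to exist
# search.best_params_, search.best_score_
```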

45
Q

GRU architecture

A
46
Q

how to fix overrepresentation of end

A
  • only add end token n % of the time
  • otherwise stop on second to last word
47
Q

how to prevent or cause same prediction every time

A
  • don’t model the initial word probability distribution (causes the same first prediction every time)
  • p(w(0)) = softmax(f(‘start’)), w(0) = randint(V, p=p(w(0)))
  • output probability instead of deterministic output
48
Q

increase CNN invariance

A
  • modify training data
    • orientation,
    • color,
    • size, etc…
  • can be used to increase one portion of data size if there is class imbalance
49
Q

indirect encoding

A
  • field of neuroevolution
  • analogous to pruning in lottery ticket
  • expressive while reducing parameters
50
Q

item-item vs user-user

A
  • item-item
    • choose items for user b/c liked similar items in the past
  • user-user
    • choose items for user b/c liked by similar users
  • uses the same algorithms just transpose the ratings matrix
  • comparing items opposed to users provides more data, also it is faster
51
Q

k-fold cross validation

A
  1. split the data into k groups
  2. train on groups [1:k], test on group [0]
  3. train on groups [0]+[2:k], test on group [1]
  4. etc…
  5. use a t-test to compare per-fold results between models
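
A minimal scikit-learn sketch on toy data; the per-fold scores here are placeholders:

```python
from sklearn.model_selection import KFold
import numpy as np

X = np.arange(20).reshape(10, 2)      # 10 toy samples, 2 features
y = np.arange(10) % 2

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    # fit a model on X[train_idx], y[train_idx]; evaluate on the held-out fold
    scores.append(len(test_idx))      # placeholder for a real per-fold score
# collect the k per-fold scores from two models, then compare them with a t-test
```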
52
Q

KL divergence

A
  • used to compare the similarity of 2 distributions
  • gradient is the same as cross entropy / negative log-likelihood
  • KL and cross entropy are interchangeable for backpropagation
53
Q

L1 regularization

A

encourages sparsity (=0)

54
Q

L2 regularization

A

encourages small weights (~=0);

55
Q

latency & sensors in python

A
  • write in pytorch or tensorflow
  • rewrite model in C++
    • either rewrite completely or serialize it
      • PyTorch tracing function
      • TensorFlow graph mode
56
Q

linear regression regularization

A
  • J = ∑(y − ŷ)² + 0.5·λ(‖W1‖²F + ‖W2‖²F + … + ‖WL‖²F)
    • ‖W‖²F = squared Frobenius norm
    • each W is a weight matrix
  • punishes complexity
  • improves error on unseen data
57
Q

local minimum

A
  • is more likely a saddle point, therefore not really a problem
  • gradient descent slides off a saddle point
  • very low probability that all dimensions curve upward at the same point
58
Q

logarithmic sampling

A

sample hyperparameters on a log scale so you get breadth across orders of magnitude instead of closely clustered values
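
E.g., sampling a learning rate uniformly in log space (bounds are arbitrary):

```python
import numpy as np

# uniform in log10 space between 1e-5 and 1e-1, then exponentiate,
# so small and large magnitudes are covered evenly
log_lr = np.random.uniform(np.log10(1e-5), np.log10(1e-1), size=10)
learning_rates = 10 ** log_lr
```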

59
Q

lottery ticket

A
  • find a lucky sparse subnetwork inside a dense network
  • present in LSTMs and transformers
  • can fit on a small device
60
Q

low-code/no-code

A
  • ML applications w/ drag and drop components
  • connect components to create a finished application
  • many enterprise Business Intelligence (BI) platforms fall into this platform category
61
Q

LSTM algorithm

A
62
Q

LSTM architecture

A
63
Q

LSTM params

A
  • Input gate:
    • params: Wxi, Whi, Wci, bi
    • depends on: x(t), h(t-1), c(t-1)
  • Candidate cell:
    • params: Wxc, Whc, bc
    • depends on: x(t), h(t-1)
  • Forget gate:
    • params: Wxf, Whf, Wcf, bf
    • depends on: x(t), h(t-1), c(t-1)
  • Output gate:
    • params: Wxo, Who, Wco, bo
    • depends on: x(t), h(t-1), c(t-1)
64
Q

machine learning security

A

malicious manipulation of models to degrade their output

65
Q

main problem with AI

A
  • don’t have common sense
  • networks find correlation, not causation
  • example research: causal bayesian network
66
Q

major AI conferences

A
  1. NeurIPS (NIPS): neural networks, but not exclusively
  2. International Conference for Machine Learning (ICML): general machine learning
  3. International Conference on Learning Representations (ICLR): really the first conference focused on deep learning.
  4. Association for the Advancement of Artificial Intelligence (AAAI): more application based
  5. IEEE conferences
    1. International Joint Conference on NN (IJCNN)
    2. IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)
    3. IEEE Congress on Evolutionary Comput (CEC)
67
Q

Markov decision process

A
  1. set of all states: measurements
  2. set of all actions: actions the agent can do
  3. set of all rewards: received at each step
  4. state transition probabilities
  5. discount factor (gamma)
68
Q

mixed data

A

multiple independent data types

  • Numeric/continuous
  • Categorical
  • Image
69
Q

model stealing

A
  • steal prediction models and spam filters in order to optimize against them
  • e.g. - blackbox probing for stock market predictions
  • model reconstruction: recreate the model by probing the public API and refining one’s own model using it as an oracle
  • membership leakage: attacker builds shadow models that are used to determine whether a given record was used to train the model
70
Q

MSE versus cross-entropy

A
  • MSE for regression
    • assumes target continuous and normally distributed
    • doesn’t punish misclassification enough
    • vanishing gradient
  • cross-entropy for classification
    • maximizes likelihood
    • decision boundary is large
    • converge faster
71
Q

naive bayes

A
  • generative classifier (models p(x|y) instead of the discriminative p(y|x))
  • p(x|y)=Πp(xn|y)
  • naive because it assumes no covariance between features
  • the independence assumption is more valid if you use PCA first
72
Q

natural graphs

A

DEFINITION: a set of points that have an inherent relationship between them

  • Co-occurrence: capture user behavior based on interactions with data

EXAMPLES:

  1. Citation Graph: capture relationship between articles to other articles using citations
  2. Natural Language: node represent entity and edges represent relationships between pairs of entities
73
Q

neural structured learning (NSL)

A

GOAL: optimized supervised AND neighbor loss to keep structural similarity to learn the structure; (bonus) requires less data

  • Graph Regularization: idea is to train ANN with graph-regularized objective harnessing labeled and unlabeled data
  • Adversarial learning: generate adversarial neighbors by keeping structure similarity to other samples. Why important?
  • Adversarial structures: as opposed to graphs, they are implicitly inferred;
      • Use similarity between instances (using a pretrained embedder); else, if you don’t have similar structures, create adversarials intended to mislead the neural network into producing the incorrect classification
      • Usually perturbations generated by reverse gradient
74
Q

neuro-symbolic

A
  • symbolics: better at abstraction and reasoning
  • ml: better at scalability and pattern recognition
  • hybrid: understanding causal relationships
75
Q

neurons in brain

A
  • 10^11 neurons in the human brain
  • each is connected to 1,000 to up to 10,000 other neurons
76
Q

node classification methods

A
  • deepwalk: derive embeddings from truncated random walk from graph data
  • graph CNN: Use Convolution on linear layers; each layer equals expanding of network
  • graph BERT: removed dependency on links
77
Q

noise injection

A
  • type of regularization
  • N ~ (0, var << 1)
  • injection to inputs and weights
78
Q

NSL Architecture

A
79
Q

olfactory machine learning

A
  • uses 4 layers
  • finds the maximum signal strength
  • one neuron can represent multiple smells
    • as opposed to the visual cortex, where one neuron corresponds to one pixel
  • attempts the cocktail party problem
    • narrows in on particular sound signals in short period
    • disentangle conversations
80
Q

one-hot encoding

A

represent each category as a binary vector with a single 1 (assigning arbitrary integers is label encoding and implies a false ordering)

i.e. red = [1, 0, 0], blue = [0, 1, 0], and green = [0, 0, 1]
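
A small numpy illustration (toy labels):

```python
import numpy as np

colors = ['red', 'blue', 'green']
labels = {'red': 0, 'blue': 1, 'green': 2}                 # integer (label) encoding
one_hot = np.eye(len(labels))[[labels[c] for c in colors]]
# red   -> [1, 0, 0]
# blue  -> [0, 1, 0]
# green -> [0, 0, 1]
```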

81
Q

open-source libraries

A
  • Tensorflow (Google): symbolic math library; high-level Keras API; C++ support
  • Pytorch (Facebook): ML/DL
  • Scikit-learn: classification, regression, and clustering algorithms
  • Jax: basically NumPy with ML support and GPU/TPU acceleration
  • PySpark: extremely fast cluster computing for Python using Spark on a standalone cluster
    • Spark SQL for dataframes
    • MLlib for ML (for Spark clusters - runs on Hadoop, Apache Mesos, Kubernetes)
82
Q

overfit

A
  • a really complex model with many parameters may not generalize well
  • bias variance tradeoff
  • simpler models overfit less
  • regularization
    • noise injection
    • dropout
    • batch normalization
83
Q

p(drop/keep)

A
  1. probability of keeping or dropping a node
  2. typically p(keep) 0.8 for input layers and p(drop) 0.5 for dropout layers
  3. multiply by mask
84
Q

parity problem

A
  • application: data transmission in communication systems
  • sender appends a parity bit and sends the bitstream to the receiver
  • if the number of 1’s is odd, add parity bit 1
  • if the number of 1’s is even, add parity bit 0
  • if the received word has odd parity, there is an error
85
Q

penalty term

A

term that grows as the weight grows

86
Q

pooling

A
  • a method used in CNNs to downsample (reduce the size of features)
  • take the maximum/average in a block
87
Q

pretrained models

A
  • bigger model with more comprehensive pretraining data
  • performs better at multiple downstream tasks
  • fewer training examples
  • saves money collecting more annotated samples
88
Q

rated recurrent neural network (RRNN)

A
  • Z = rate
  • weight two things
    • f(x(t), h(t-1)): output of RNN
    • h(t-1): previous value of hidden state
  • element-wise multiplication
    • h(t) = (1-Z)°h(t-1) + Z°f(x(t), h(t-1))
  • acts like a low pass filter
  • Z can be calculated many ways
    • weight param
    • function of X
      • z(t) = f(Wxz·x(t) + Whz·h(t-1) + bz)
89
Q

recommendation interfaces

A
  • news feed
  • product feed
  • similar product
  • associated products
  • search engine
  • advertisements
90
Q

recurrent unit (Elman unit)

A
  • h(t) = f(Wh^T·h(t-1) + Wx^T·x(t) + bh)
  • y(t) = softmax(Wo^T·h(t) + bo)
  • relies on all previous states, no Markov assumption
    • x(t): D, Wx: D x M
    • h(t): M, Wh: M x M
    • y(t): K, Wo: M x K
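
A rough numpy sketch of one Elman step using the shapes above (names hypothetical):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def elman_step(x_t, h_prev, Wx, Wh, bh, Wo, bo):
    """One Elman step: x_t (D,), h_prev (M,), Wx (D, M), Wh (M, M), Wo (M, K)."""
    h_t = np.tanh(h_prev @ Wh + x_t @ Wx + bh)   # new hidden state
    y_t = softmax(h_t @ Wo + bo)                 # output distribution over K classes
    return h_t, y_t
```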
91
Q

recurrent vs markov model

A
  • using softmax with RNN is about optimizing joint probability
  • longer sequences don’t go to 0
  • joint probability is implicit in model
92
Q

ReLU

A
  • rectified linear unit
  • biologically plausible due to asymmetry
  • better gradient propagation (no pretraining prep needed for deep networks)
  • however, a dying ReLU can occur when a unit’s activations are always zero
93
Q

same/half padding

A
  • padding the input with 0 (~half the length of the filter) to allow the output size to equal input size
  • filters usually odd size to allow both sides of input to have equal padding
94
Q

saturated

A

the region of a function (such as sigmoid) that are not dynamic

95
Q

selecting CNN layers

A
  • depth increases through layers, height and width decrease (reverse for type 2 - image in output)
  • convolutional on the side of image
  • fully-connected on the side of vector
  • fully connected layers can all be the same size (research has found that it does not overfit)
96
Q

self attention diagram 1

A
97
Q

self-attention diagram 2

A
98
Q

solving class imbalance

A

sample the smaller class with replacement until it is the same size as the larger class
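
A quick numpy sketch of upsampling the minority class with replacement (toy arrays):

```python
import numpy as np

rng = np.random.default_rng(0)
X_minority = np.arange(6).reshape(3, 2)     # 3 minority samples (toy data)
X_majority = np.zeros((10, 2))              # 10 majority samples

idx = rng.integers(0, len(X_minority), size=len(X_majority))  # sample with replacement
X_minority_upsampled = X_minority[idx]      # now the same size as the majority class
```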

99
Q

spiking neural networks

A

activation potentials resemble biological neuron

100
Q

stochastic gradient descent (SGD)

A
  • assume all data IID
  • very slow
  • error improves long-term (although it may or may not immediately)
  • sample with replacement (sometimes)
101
Q

structured v. unstructured data

A
  • tabular data versus raw data
  • data that looks like a SQL table, with inputs and IDs neatly defined, versus unstructured data like an image
102
Q

supervised learning for recommender systems

A
  • input demographics
    • age, gender, religion
    • occupation, education
    • location, race
  • predict the users reaction
    • did they buy item
    • did they click on the ad
103
Q

truncated back propagation

A
  • long sequences take a lot of time
  • stop at certain number of time steps
104
Q

types of RNN

A
  1. label for sequence
  2. label for each layer
  3. no labels
105
Q

unrolled RNN

A
106
Q

valid/full mode

A
  1. valid mode: output n-k+1
  2. full mode: output n+k-1
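
np.convolve illustrates both output sizes (toy arrays):

```python
import numpy as np

x, k = np.ones(5), np.ones(3)              # n = 5, kernel size k = 3
valid = np.convolve(x, k, mode='valid')    # length n - k + 1 = 3
full = np.convolve(x, k, mode='full')      # length n + k - 1 = 7
```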
107
Q

vanishing gradient/exploding gradient

A
  • the sigmoid derivative is at most 0.25
  • a^n → 0 or ∞ as n grows
  • the network learns very slowly or not at all
108
Q

weight initialization

A
  1. tanh: variance 1/M1 or 1/(M1+M2)
  2. relu: variance 2/M1
  • draw from N(0, 1) and scale to the chosen variance
  • should normalize the input: X = (X - µ)/std
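
A small numpy sketch following the card’s rule of thumb (variance values as listed above; function name illustrative):

```python
import numpy as np

def init_weights(M1, M2, activation='relu', rng=np.random.default_rng(0)):
    """Draw from N(0, 1) and scale so the variance matches the rule of thumb above."""
    if activation == 'relu':
        var = 2.0 / M1                # ReLU rule
    else:
        var = 1.0 / (M1 + M2)         # tanh rule
    return rng.standard_normal((M1, M2)) * np.sqrt(var)
```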
109
Q

why initialize weights randomly

A

prevents symmetry (like having one unit!)

110
Q

why sigmoid?

A
  • monotonically increasing
  • is the output of binary logistic regression (neuron like logistic)
111
Q

zero-day inputs

A

not very often, but they happen

  • new product or feature launch: new functionalities open new attack surfaces
  • increased incentive: e.g. abusing Google Cloud compute to mine bitcoin
112
Q

image-as-graph

A

each pixel represents a 3d RGB vector connected to 8 “adjacent” neighbors

113
Q

Text-as-graph

A

Represent text as a sequence of tokens in a directed graph

114
Q

Examples of Graphs

A
  • Social networks: people as nodes and their relationship as an edge
  • molecules: Atoms as nodes and covalent bonds as edges
  • Citation networks: each paper is a node and an edge is a citation between one paper and another
    • Can even do word embedding of abstract as information about one node
115
Q

Graph vs. Node vs. Edge TASK

A
  • Graph: predict a property of the entire graph (e.g. smell of a molecule, a label)
  • Node: predict the identity or role of each node (e.g. member loyal to John or Jane, image segmentation, POS tagging)
  • Edge: determine the relationship between objects in images
116
Q

representing graphs in ML

A
  1. Tabular: often sparse
    • node: feature matrix N x D (nodes x features)
    • connectivity: adjacency matrix N x N
  2. Lists: more computationally efficient
    • Nodes = [[0, 1], [1, 0], [0, 0]]
    • Edges = [[2, 1], [1, 1]]
    • Adjacency list = [[1, 0], [2, 0]]
117
Q

graph neural network

A

an optimizable transformation on all attributes of the graph (nodes, edges, global-context)

  • graph-in, graph-out: architecture accepts graph as an input, with information loaded into its nodes, edges, and global-context, and progressively transforms the embeddings, without changing the connectivity of the input graph
118
Q

simplest GNN

A
  • Learn embeddings for graph attributes (nodes, edges, global) W/O changing connectivity
  • Multilayer perceptron on each component
    • N(n_f|n_0), E(e_f|e_0), G(g_f|g_0)
119
Q

Large Language Generative Models Sizes

A
  • params: NVidia-Turing (570 b), GPT3 & AI21 Jurassic (175 b)
  • tokens: GPT3 (300 b), chinchilla (1.4 t)
  • human feedback training
120
Q

areas for improvement generative models 2022

A
  1. memory - recall previous information
  2. summarize - decompose info into elements
  3. Model/Data quality assurance -
    1. 4.6-17.2 T tokens (books, literature, articles)
    2. NOT blogs, webpages, social media, spoken content
  4. bigger is not better = more cost, impossible to beat, …
  5. text-to-video
121
Q

affective AI

A
  • emotion detection in image, video, text, sound waves
  • current technologies average samples which leads to possible bias (e.g. - cultural and sex differences)
122
Q

2024 predictions

A
  1. humanoid robots - maybe a foundational model
  2. ML Ops - Weights & Biases (raised $200 million at a $1 billion valuation)
  3. ethics and law tossed in generative AI mix
    1. still lacking basic knowledge
  4. AI search using generative AI
  5. mixed modal foundational models
123
Q

6 openAI competitors

A

whoever can run cheapest and best for the job

  1. AdeptAI - ~660M (Greylock) - 20 employees - browser extension for instructional surfing
  2. AI21 - 750 million - 160 employees - customize models for automated copywriting, summarizing documents
  3. Anthropic - 4 billion - 80 employees - general-purpose LLMs free of bias and toxicity
  4. Character - 18 employees - create and interact with chatbots that can role play
  5. Cohere - ~1 bil (Tiger Global) - summarizing documents, copywriting, and search
  6. Inflection - 1.23B (Greylock) - communicate with machines to relay our thoughts and ideas
124
Q

How to find bad labelled data?

A
  1. Sort by model loss to find ambiguous samples, AND sort by confidence where the model and the ground truth disagree
  2. Confident Learning: pruning noisy data based on prediction intervals (worse than the aforementioned method)