Machine Learning and Image Recognition Flashcards
Week 9 Lecture 1
What is machine learning?
The science of programming computers such that they can learn from data without being explicitly programmed to perform the task.
What problems is ML good for?
- Where existing solutions require many rules or hand-tuning of parameters
- Complex problems without an existing solution
- Noisy environments
- Insights from large volumes of data
Types of machine learning
- Supervised: The system is trained with data and a label/desired output. The goal is to map the data to the desired output.
- Unsupervised: The system is trained only with the data. The goal is to create a representation of the data.
How do we build a simple linear ML model?
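- Start with a straight-line model (the hypothesis) y = mx + c, where m is the gradient and c is the intercept
- Evaluate the hypothesis at each value of x and find the difference (error) between the hypothesis and the data points
- Average the squared errors to get the mean squared error (MSE), which serves as the loss function
- Use gradient descent to step m and c downhill on the loss surface towards a minimum (for this model the MSE is convex, so the local minimum is also the global one)
- When the loss stops changing between steps, the gradient is approximately zero and further updates no longer improve the fit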
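The steps above can be sketched in plain Python as gradient descent on the MSE. This is a minimal illustration, not the lecture's code: the helper names `mse` and `fit_line`, the learning rate `lr`, the iteration cap, the stopping tolerance `tol` and the toy data are all assumptions.

```python
# Minimal gradient-descent fit of the straight-line model y = m*x + c.
# Assumptions (not from the lecture): lr, max_iters, tol and the toy data
# are illustrative choices; mse and fit_line are hypothetical helper names.

def mse(m, c, xs, ys):
    """Mean squared error between the hypothesis m*x + c and the data."""
    n = len(xs)
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys)) / n


def fit_line(xs, ys, lr=0.05, max_iters=10_000, tol=1e-12):
    """Fit m and c by gradient descent on the MSE loss."""
    m, c = 0.0, 0.0
    n = len(xs)
    prev_loss = mse(m, c, xs, ys)
    for _ in range(max_iters):
        # Residuals: hypothesis minus data at each x
        errors = [m * x + c - y for x, y in zip(xs, ys)]
        # dL/dm = (2/n) * sum(error * x),  dL/dc = (2/n) * sum(error)
        grad_m = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_c = 2.0 / n * sum(errors)
        # Step downhill on the loss surface
        m -= lr * grad_m
        c -= lr * grad_c
        loss = mse(m, c, xs, ys)
        # Stop once the loss no longer changes (converged)
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return m, c


if __name__ == "__main__":
    # Hypothetical toy data lying exactly on y = 2x + 1
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    m, c = fit_line(xs, ys)
    print(f"m = {m:.3f}, c = {c:.3f}")  # should be close to m = 2, c = 1
```

The learning rate matters: too large a step makes the updates overshoot and diverge, while too small a step makes convergence slow.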
What is bioimage informatics?
- A subfield of bioinformatics and computational biology
- Using computational techniques to analyse:
1. High throughput imaging data
2. Microscopy data for patient tissue samples, cell populations with drug treatments, molecular interactions etc.
- Typical computational tools used are computer vision, machine learning, clustering etc.
Cryo-electron microscopy (cryo-EM)
- Aims to identify candidate molecules (particles) in cryo-EM data
- Very noisy images
- Each image is a 2D projection of a 3D object
- Many projections can be combined to reconstruct the 3D structure of a protein and possibly its different conformational states
High content screening
- Using high throughput microscopy to image cells under different conditions, e.g. genetic/chemical perturbations
- Use informatics methods to extract information from the images and compare between perturbations
Example of bioimage informatics in drug discovery
- Take and label images of drug-treated cells, and extract an image "fingerprint" for each drug
- Using an in vitro assay of the same drugs, learn a mapping from the image fingerprint to the protein activity
- Use this mapping to predict the likely change in protein activity from the image fingerprint alone
- More than 200 targets informed, leading to boosted hit rates and diversity in drug discovery projects
Digital pathology
Automated diagnostics based on image data:
- Data from digital slides of patient samples and the associated medical metadata
- Disease diagnosis
- Prediction of therapeutic interventions
Multi-dimensional image data
Biological image datasets often use multiple markers
- Typically fluorescent reporters for a molecule of interest
- An image may contain multiple channels, each reporting on a different marker
- Each channel represents a marker for the same set of cells
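A multi-channel image can be pictured as a stack of 2D intensity planes, one per marker, all covering the same cells. The sketch below assumes a channel-first layout `image[channel][row][col]`; the channel names, the 3×3 pixel values and the helpers `pixel_profile` and `channel_means` are hypothetical.

```python
# A multi-channel image stored channel-first: image[channel][row][col].
# Assumptions: channel names and pixel values are hypothetical examples.

channels = ["DAPI (nuclei)", "GFP (protein of interest)"]

image = [
    # Channel 0: nuclear stain
    [[0, 10, 0],
     [10, 200, 10],
     [0, 10, 0]],
    # Channel 1: fluorescent reporter for the molecule of interest
    [[5, 50, 5],
     [50, 120, 50],
     [5, 50, 5]],
]


def pixel_profile(img, row, col):
    """One value per channel for the same pixel (same location in the cells)."""
    return [img[ch][row][col] for ch in range(len(img))]


def channel_means(img):
    """Mean intensity of each channel over the whole image."""
    means = []
    for plane in img:
        values = [v for row_vals in plane for v in row_vals]
        means.append(sum(values) / len(values))
    return means


if __name__ == "__main__":
    print(pixel_profile(image, 1, 1))  # [200, 120]
    for name, mean in zip(channels, channel_means(image)):
        print(f"{name}: {mean:.1f}")
```

Because every channel is aligned to the same pixel grid, a per-pixel profile across channels describes how all markers behave at one location.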
Common machine-learning tasks
- Detection: locating objects of interest (e.g. cells) in an image
- Classification: assignment of a single label (e.g. "cell") to a whole image
- Segmentation: assignment of a label to every pixel in an image
- Feature extraction
- Image restoration and enhancement (e.g. denoising)
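To contrast the outputs of classification and segmentation, the sketch below labels a tiny image both ways. A fixed intensity threshold stands in for a trained model (an assumption for illustration only; this is not machine learning), and the names `segment`, `classify`, `THRESHOLD` and `min_pixels` are hypothetical.

```python
# Classification returns one label per image; segmentation returns one
# label per pixel. A fixed threshold stands in for a trained model here.

THRESHOLD = 100  # hypothetical intensity cut-off


def segment(img, threshold=THRESHOLD):
    """Per-pixel label: 1 = foreground (e.g. cell), 0 = background."""
    return [[1 if v > threshold else 0 for v in row] for row in img]


def classify(img, threshold=THRESHOLD, min_pixels=2):
    """Single image-level label based on how many pixels are foreground."""
    mask = segment(img, threshold)
    n_fg = sum(sum(row) for row in mask)
    return "cell" if n_fg >= min_pixels else "background"


if __name__ == "__main__":
    img = [[0, 150, 160],
           [0, 140, 0],
           [0, 0, 0]]
    print(segment(img))   # [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
    print(classify(img))  # cell
```

The segmentation output has the same height and width as the input, whereas classification collapses the whole image to one label.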
Activation functions
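- Map the input to a specific range of values
- e.g. you may wish to only allow positive activations
- Transform the input to an output that typically increases monotonically with the input
- Add non-linearity to the overall function of the network; without them, a stack of linear layers collapses into a single linear function
- Need to be differentiable (at least almost everywhere; ReLU is not differentiable at 0) so that gradients can be backpropagated during training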
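Three commonly used activation functions and their derivatives can be sketched as follows. ReLU is the one named in the lecture's CNN card; sigmoid and tanh are added here as other standard examples, and the helper names are hypothetical.

```python
import math


def relu(x):
    """ReLU: passes positive inputs, zeroes negatives; range [0, inf)."""
    return max(0.0, x)


def relu_grad(x):
    """ReLU is not differentiable at 0; by convention use 0 there."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    """Sigmoid: squashes any input into the range (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form that avoids overflow in exp for large negative x
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    """math.tanh maps inputs into (-1, 1); its derivative is 1 - tanh(x)**2."""
    return 1.0 - math.tanh(x) ** 2


if __name__ == "__main__":
    for x in [-2.0, 0.0, 2.0]:
        print(x, relu(x), round(sigmoid(x), 3), round(math.tanh(x), 3))
```

All three are monotonic and have simple derivatives, which is what backpropagation needs.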
Convolutional Neural Net
- Convolution: element-wise multiplication of an image patch with a kernel, summed (a dot product), followed by addition of a bias; repeated at every position in the image
- Activation function: maps the output of the conv layer to a new range (e.g. ReLU)
- Pool: downsamples the output by combining neighbouring pixels (e.g. taking the maximum of each 2×2 block)
- Conv, ReLU, and Pool layers are typically stacked repeatedly to form a deep neural network
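One convolution followed by ReLU can be sketched in pure Python. This assumes a single-channel image, one kernel, stride 1 and no padding; the kernel and bias values and the helper names `conv2d` and `relu_map` are hypothetical. As in most deep-learning libraries, the kernel is not flipped, so strictly this computes cross-correlation.

```python
# One conv -> ReLU step on a single-channel image (stride 1, no padding).
# Assumptions: the kernel and bias below are hypothetical example values.

def conv2d(img, kernel, bias=0.0):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise product of the image patch and the kernel, summed
            acc = sum(img[i + a][j + b] * kernel[a][b]
                      for a in range(kh) for b in range(kw))
            row.append(acc + bias)
        out.append(row)
    return out


def relu_map(fmap):
    """Apply ReLU to every value in a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


if __name__ == "__main__":
    img = [[1, 2, 3, 0],
           [4, 5, 6, 0],
           [7, 8, 9, 0],
           [0, 0, 0, 0]]
    diag = [[1, 0], [0, -1]]  # hypothetical diagonal-difference kernel
    fmap = conv2d(img, diag, bias=-1.0)
    print(fmap)            # [[-5.0, -5.0, 2.0], [-5.0, -5.0, 5.0], [6.0, 7.0, 8.0]]
    print(relu_map(fmap))  # [[0.0, 0.0, 2.0], [0.0, 0.0, 5.0], [6.0, 7.0, 8.0]]
```

A 4×4 input with a 2×2 kernel gives a 3×3 feature map; in a real network the kernel weights and bias are learned rather than hand-chosen.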
Pooling operations
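- Pooling operations reduce the spatial size (resolution) of the feature maps
- With the common 2×2 pooling window and stride 2, the height and width are each halved after every pool
- The next layer of convolutions therefore has a larger effective receptive field: each of its pixels summarises a larger region of the original image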
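2×2 max pooling with stride 2 can be sketched as below. The window size and stride are the common choice assumed here, not the only option, and `max_pool2x2` is a hypothetical helper name.

```python
def max_pool2x2(fmap):
    """2x2 max pooling with stride 2: keep the largest value in each block.
    An odd trailing row or column is dropped."""
    h, w = len(fmap) // 2, len(fmap[0]) // 2
    return [[max(fmap[2 * i][2 * j], fmap[2 * i][2 * j + 1],
                 fmap[2 * i + 1][2 * j], fmap[2 * i + 1][2 * j + 1])
             for j in range(w)]
            for i in range(h)]


if __name__ == "__main__":
    fmap = [[1, 3, 2, 0],
            [4, 2, 1, 5],
            [0, 1, 7, 2],
            [3, 0, 1, 1]]
    print(max_pool2x2(fmap))  # [[4, 5], [3, 7]]
    # Spatial size halves with every pool: 256 -> 128 -> 64 -> 32
    size = 256
    for k in range(1, 4):
        size //= 2
        print(f"after pool {k}: {size}x{size}")
```

After k such pools, each output pixel summarises a 2^k × 2^k block of the input, which is why later convolutions see a larger region of the original image.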
Define ‘loss’ in ML
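- A measure of the error the model makes on the task; training aims to minimise it
- For classification, the loss function is commonly cross-entropy (for regression, e.g. mean squared error, as in the linear model)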
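Cross-entropy for a single classification example can be sketched as follows, assuming the network's raw scores are converted to probabilities with a softmax (a common choice, not stated in the lecture). The helper names and the example scores are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Cross-entropy for one example with integer class label `target`:
    minus the log of the probability assigned to the correct class."""
    return -math.log(probs[target])


if __name__ == "__main__":
    probs = softmax([2.0, 0.5, -1.0])  # hypothetical scores for 3 classes
    print([round(p, 3) for p in probs])
    print(round(cross_entropy(probs, 0), 3))  # correct class likely: low loss
    print(round(cross_entropy(probs, 2), 3))  # correct class unlikely: high loss
```

The loss is 0 only when the model assigns probability 1 to the correct class, and it grows sharply as that probability approaches 0; over a dataset the per-example losses are usually averaged.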