Bio 5 - M06 - AI and big data in biology Flashcards
*DeepMind’s AlphaFold is known for its breakthrough in which area?
Protein folding. It is a technology that predicts protein structures with very high accuracy.
DEEPMIND’S ALPHAFOLD = PREDICTING PROTEIN STRUCTURES FROM GIVEN SEQUENCES OF AMINO ACIDS
*Which data type often requires AI and big data techniques for analysis in proteomics?
- Proteomics involves large amounts of data, so AI and big data techniques are crucial for analysing it.
- Two data types are especially useful for analysing proteomics:
1. Mass Spectrometry (MS) data
Provides information about protein fragments based on their mass-to-charge ratios
2. Quantitative Proteomics Data
MASS SPECTROMETRY DATA CAN BE ANALYSED WITH AI AND BIG DATA TECHNIQUES TO UNDERSTAND PROTEIN COMPOSITIONS AND THEIR FUNCTIONS
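To make "mass-to-charge ratio" concrete, here is a minimal sketch that computes m/z for a peptide ion that has picked up one or more protons. The function name and the 1000.0 Da example mass are illustrative assumptions, not values from the course material.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons (Da)

def mass_to_charge(neutral_mass: float, charge: int) -> float:
    """Return m/z for a peptide of the given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

# Hypothetical peptide with a neutral mass of 1000.0 Da
for z in (1, 2, 3):
    print(z, round(mass_to_charge(1000.0, z), 4))
```

The same peptide appears at a different m/z for every charge state, which is one reason MS datasets grow large and benefit from automated analysis.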
*How do neural networks help in predicting protein structures?
Neural networks can learn complex patterns and relationships from large datasets, and can therefore make predictions based on these datasets.
NEURAL NETWORKS HELP PREDICT PROTEIN STRUCTURES BY IDENTIFYING PATTERNS IN AMINO ACID SEQUENCES THAT CORRELATE WITH SPECIFIC STRUCTURAL CONFIGURATIONS.
*Why is big data crucial for protein-protein interaction studies?
The Protein Data Bank (PDB) is a comprehensive repository that stores 3D structures of proteins, nucleic acids, and complex assemblies. Researchers and AI models often use this database to retrieve known protein structures for analysis and modeling.
BIG DATA IS CRUCIAL FOR UNDERSTANDING PROTEIN-PROTEIN INTERACTIONS SINCE PROTEINS CAN HAVE MULTIPLE POTENTIAL INTERACTION PARTNERS. BIG DATA HELPS MANAGE, ANALYZE, AND INTERPRET THIS VAST INTERACTION NETWORK.
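As a small illustration of an "interaction network", the sketch below stores interactions as a graph and counts each protein's partners. The protein names P1 to P5 and the pairs are hypothetical. Real interaction datasets are far larger, which is where big data tooling comes in.

```python
from collections import defaultdict

# Hypothetical interaction pairs, invented for illustration
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]

def build_network(pairs):
    """Map each protein to the set of proteins it interacts with."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network

network = build_network(interactions)
partner_counts = {protein: len(partners) for protein, partners in network.items()}
hub = max(partner_counts, key=partner_counts.get)  # protein with the most partners
print(hub, partner_counts[hub])  # P1 3
```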
*Which of the following best describes the main challenge of protein folding that AI aims to tackle?
The protein folding problem concerns predicting the three-dimensional structure of a protein solely from its amino acid sequence. The 3D structure is crucial as it determines the protein’s function.
BIGGEST CHALLENGE OF PROTEIN FOLDING = PREDICTING THE 3D STRUCTURE SOLELY FROM THE AMINO ACID SEQUENCE. THE 3D STRUCTURE IS IMPORTANT SINCE IT DETERMINES THE FUNCTION OF THE PROTEIN
*What feature do many deep learning models in protein science leverage for sequence pattern recognition?
Many deep learning models in protein science leverage the primary amino acid sequence of proteins for sequence pattern recognition.
SPOT SMALL BUT IMPORTANT PATTERNS IN DATA: FIND SPECIFIC AMINO ACID PATTERNS THAT MATTER FOR HOW PROTEINS WORK OR ARE SHAPED
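One common way to spot short patterns in a sequence is to slide a small filter along it, as a convolutional layer does. The sketch below one-hot encodes a sequence and scores every window against a fixed motif. The sequence, the motif, and the hand-set filter are assumptions for illustration; a real model would learn its filters from data.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def one_hot(sequence):
    """Encode each residue as a 20-element vector containing a single 1."""
    return [[1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS] for residue in sequence]

def scan(sequence, motif):
    """Slide a motif-shaped filter along the sequence and score each window.

    The score is the number of positions where the window matches the motif.
    """
    encoded = one_hot(sequence)
    kernel = one_hot(motif)
    width = len(motif)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = encoded[start:start + width]
        score = sum(w * k for w_row, k_row in zip(window, kernel)
                    for w, k in zip(w_row, k_row))
        scores.append(score)
    return scores

scores = scan("MKTAYIAKQR", "AKQ")
best_start = max(range(len(scores)), key=scores.__getitem__)
print(scores)      # [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 3.0, 0.0]
print(best_start)  # 6
```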
*Which of the following statements best describes the relationship between Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL)?
“Artificial Intelligence is the broader concept that encompasses Machine Learning, which in turn is a subset of AI. Deep Learning is a specific type of Machine Learning that utilizes neural networks with multiple layers to learn intricate patterns from data.”
Artificial intelligence is the overarching field. Machine learning is a subset of AI. Deep learning is a subfield of machine learning.
AI -> ML -> DL
AI: BROAD FIELD AIMING TO CREATE INTELLIGENT MACHINES
ML: SUBSET OF AI WHERE ALGORITHMS LEARN FROM DATA
DL: SUBSET OF ML THAT EMPLOYS DEEP NEURAL NETWORKS
*Transfer learning, a method where pre-trained models are fine-tuned for a new task, has been adapted in protein prediction tasks. What’s the main advantage of this approach?
Transfer learning, used in machine learning, is the reuse of a pre-trained model on a new problem. In transfer learning, a machine exploits the knowledge gained from a previous task to improve generalization about another.
The main advantage of using transfer learning in protein prediction tasks is that it allows leveraging knowledge gained from pre-trained models on large datasets to improve performance on new, related tasks with smaller datasets. This approach can significantly reduce the amount of labeled data needed for training, speed up the training process, and improve the accuracy of predictions, especially in cases where limited labeled data is available for the specific protein prediction task.
Q: How is transfer learning utilized in protein prediction tasks?
A: Transfer learning involves fine-tuning pre-trained models for new tasks in protein prediction, providing a significant advantage by leveraging knowledge learned from previous tasks, thus requiring less data and computational resources for training.
TRANSFER LEARNING: USES KNOWLEDGE FROM ONE DOMAIN OR TASK TO IMPROVE PERFORMANCE ON A NEW, RELATED TASK. SAVES TIME AND IMPROVES PERFORMANCE.
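A toy sketch of one form of transfer learning: keep a "pre-trained" layer frozen and train only a small new output layer on a handful of labeled examples. Everything here is hypothetical, including the hard-coded `PRETRAINED_WEIGHTS` (a stand-in for weights learned on a large dataset) and the four training samples.

```python
import math

# Hypothetical stand-in for weights learned earlier on a large dataset (kept frozen)
PRETRAINED_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0]]

def frozen_features(x):
    """Apply the frozen layer (linear map followed by ReLU). Never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, lr=0.5, epochs=200):
    """Train only the new output layer with batch gradient descent."""
    features = [frozen_features(x) for x in samples]
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for f, y in zip(features, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    f = frozen_features(x)
    return 1 if sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5 else 0

# Tiny labeled dataset for the new task (invented for illustration)
samples = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.8]]
labels = [1, 1, 0, 0]
w, b = train_head(samples, labels)
print([predict(x, w, b) for x in samples])  # [1, 1, 0, 0]
```

Only the three head parameters are trained, which is why this approach needs less data and compute than training a whole model from scratch.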
*What is a potential pitfall when training AI models on protein databases that have bias towards well-studied proteins?
If a model is predominantly trained on well-studied proteins, it may not perform well on less-studied or rare proteins. The model’s predictions could be biased toward the properties of the more common proteins.
RISK WHEN AI MODELS ARE TRAINED ON PROTEIN DATABASES THAT ARE BIASED TOWARD WELL-STUDIED PROTEINS = THEY MAY NOT PERFORM WELL ON LESS-STUDIED OR RARE PROTEINS. PREDICTIONS COULD BE BIASED.
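One simple way to notice this pitfall is to report accuracy separately for each group of proteins instead of as a single overall number. The sketch below does that bookkeeping. The group names, labels, and predictions are invented for illustration and are not real results.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}

# Hypothetical predictions, made up to show the calculation
records = [
    ("well_studied", "helix", "helix"),
    ("well_studied", "sheet", "sheet"),
    ("well_studied", "helix", "helix"),
    ("well_studied", "sheet", "helix"),
    ("rare", "helix", "sheet"),
    ("rare", "sheet", "helix"),
    ("rare", "helix", "helix"),
    ("rare", "sheet", "helix"),
]
print(accuracy_by_group(records))  # {'well_studied': 0.75, 'rare': 0.25}
```

The overall accuracy here is 0.5, which hides how much worse the model does on the rare group.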
*Which of the following statements accurately differentiates between Supervised, Unsupervised, and Reinforcement Learning?
These are three primary learning paradigms. Supervised Learning uses labeled data, Unsupervised Learning uncovers hidden structures in unlabeled data, and Reinforcement Learning is about decision-making through interaction with environments.
Supervised learning is like teaching with answers already provided, unsupervised learning is exploring without a guide, and reinforcement learning is learning by trying and being rewarded for good actions.
SUPERVISED LEARNING: USES LABELED DATA
UNSUPERVISED LEARNING: UNCOVERS HIDDEN STRUCTURES IN UNLABELED DATA
REINFORCEMENT LEARNING: DECISION MAKING THROUGH INTERACTION WITH ENVIRONMENTS.
*Which of the following best describes Neural Networks in the context of Machine Learning?
A neural network is a method in artificial intelligence that teaches computers to process data in a way that is inspired by the human brain. It is a type of machine learning process, called deep learning, that uses interconnected nodes or neurons in a layered structure that resembles the human brain.
NEURAL NETWORKS = COMPUTATIONAL MODELS INSPIRED BY THE HUMAN BRAIN’S NETWORK OF NEURONS. DESIGNED TO DETECT PATTERNS AND RELATIONSHIPS IN DATA.
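A minimal sketch of "interconnected nodes in a layered structure": a two-layer network whose forward pass detects whether its two inputs differ. The weights are hand-picked assumptions so the example stays readable. A real network would learn them from data.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One layer: each node takes a weighted sum of its inputs, then applies an activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical hand-picked weights (not learned)
HIDDEN_W, HIDDEN_B = [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]
OUTPUT_W, OUTPUT_B = [[1.0, 1.0]], [-0.5]

def forward(x):
    hidden = dense(x, HIDDEN_W, HIDDEN_B, relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B, sigmoid)[0]

for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    print(x, round(forward(x), 3))
```

The output is above 0.5 only when the two inputs differ, a pattern that a single weighted sum without a hidden layer cannot represent.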
*Which of the following best describes the phenomenon of overfitting in machine learning models?
Overfitting occurs when a model learns the training data too well, including its noise and outliers. This compromises its ability to generalize to new, unseen data.
OVERFITTING: MODEL LEARNS THE TRAINING DATA TOO WELL, INCLUDING ITS NOISE AND OUTLIERS, SO IT GENERALIZES POORLY TO NEW DATA.
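A small sketch of overfitting, using made-up one-dimensional data in which the true rule is "label 1 when x is at least 5" and one training point is deliberately mislabeled. A memorizing model (nearest neighbour) is compared with a fixed threshold rule that stands in for a simpler model that captured only the overall trend. Both models and all data points are assumptions for illustration.

```python
def nearest_neighbor_predict(train, x):
    """Memorizing model: copy the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def threshold_predict(x, threshold=5.0):
    """Simple model: a single cut-off that ignores individual noisy points."""
    return 1 if x >= threshold else 0

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

# (4, 1) is the deliberately mislabeled (noisy) training point
train_set = [(1, 0), (2, 0), (3, 0), (4, 1), (6, 1), (7, 1), (8, 1), (9, 1)]
test_set = [(1.5, 0), (2.5, 0), (3.6, 0), (4.4, 0), (5.5, 1), (6.5, 1), (7.5, 1), (8.5, 1)]

def memorizer(x):
    return nearest_neighbor_predict(train_set, x)

print("memorizer:", accuracy(memorizer, train_set), accuracy(memorizer, test_set))
# memorizer: 1.0 0.75
print("threshold:", accuracy(threshold_predict, train_set), accuracy(threshold_predict, test_set))
# threshold: 0.875 1.0
```

The memorizer is perfect on the training set because it also learned the noisy point, and that is exactly what hurts it on unseen data.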
*In machine learning, why is data typically split into training, validation, and test sets?
The training set is used to train the model, the validation set is used to tune hyperparameters and prevent overfitting, and the test set provides an unbiased evaluation of the model’s performance.
TRAINING: TRAIN THE MODEL
VALIDATION: TUNE HYPERPARAMETERS AND PREVENT OVERFITTING
TEST: UNBIASED EVALUATION OF THE MODEL’S PERFORMANCE
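A minimal sketch of such a split using only the Python standard library. The 80/10/10 proportions and the fixed seed are assumptions chosen for illustration. Other ratios are equally common.

```python
import random

def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle once, then cut into training, validation, and test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 80 10 10
```

No example appears in more than one set, so the test set stays unseen until the final evaluation.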
*In the context of deep learning, particularly in models like Transformers, what role does the “attention” mechanism play?
The attention mechanism lets models weigh different parts of the input differently, giving “attention” to more relevant parts, especially in tasks like sequence-to-sequence prediction, which Transformers excel at.
ATTENTION MECHANISM: WEIGHS DIFFERENT PARTS OF THE INPUT DIFFERENTLY. EXAMPLE: TRANSFORMER MODEL (CHATGPT)
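A minimal sketch of scaled dot-product attention, the form used in Transformers, for a single query. The query, key, and value vectors are made-up numbers. In a real model they are produced by learned projections of the input.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Weigh each value by how well its key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical vectors for three input positions
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attention([2.0, 0.0], keys, values)
print([round(w, 3) for w in weights])
print([round(o, 3) for o in output])
```

Positions whose keys align with the query receive larger weights, so their values dominate the output.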
*In the context of deep learning, how does the “denoising diffusion” model operate?
A “denoising diffusion” model is a generative model. During training, noise is added to the data step by step, and the model learns to reverse each of those steps.
To generate a new sample, the model starts from pure noise and refines it step by step until a clear, high-quality result such as an image emerges.
DENOISING DIFFUSION MODEL: ADDS NOISE TO DATA PROGRESSIVELY AND LEARNS TO REVERSE THE PROCESS. USED TO GENERATE HIGH-QUALITY IMAGES.
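A sketch of the "adding noise progressively" half only. It uses the closed form x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * noise, where a_t is the running product of (1 - beta) over the steps so far. The linear beta schedule, the 1000 steps, and the toy four-number "image" are assumed values for illustration. The reverse, denoising half is a trained neural network and is not shown.

```python
import math
import random

def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal that survives after each step."""
    alpha_bars, running = [], 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar, rng):
    """Jump straight to a noisy version of x0 at the given noise level."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]

alpha_bars = alpha_bar_schedule()
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # toy "image" of four pixel values
for t in (0, 499, 999):
    noisy = add_noise(x0, alpha_bars[t], rng)
    print(t, round(alpha_bars[t], 4), [round(v, 3) for v in noisy])
```

At the first step the data is almost untouched. By the last step almost no signal remains, and that is the pure-noise starting point for generation.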