BIO6 - AI and Big Data Flashcards
1) DeepMind’s AlphaFold is known for its breakthrough in which area?
- Protein structure prediction.
2) Which data type often requires AI and big data techniques for analysis in proteomics?
- Large datasets (sequential data)
3) How do neural networks help in predicting protein structures?
- Neural networks are a subset of machine learning and artificial intelligence (AI) designed to recognize patterns and make decisions from data. In protein structure prediction they learn the mapping from amino acid sequences to 3D structure by picking up patterns across large sets of known structures; the same pattern-recognition ability is used for image and speech recognition, natural language processing, and autonomous decision-making.
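A minimal sketch of the pattern-recognition idea (not AlphaFold's actual architecture; the 20-dimensional input encoding and all weights are made up for illustration): a tiny two-layer network mapping a numeric encoding of a sequence to a prediction.

```python
# Illustrative sketch only (not AlphaFold): a tiny two-layer neural network
# in NumPy that maps a numeric encoding of a sequence to a prediction.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def forward(x, W1, b1, W2, b2):
    """One forward pass: input features -> hidden patterns -> output score."""
    h = relu(x @ W1 + b1)       # hidden layer learns intermediate patterns
    return h @ W2 + b2          # output layer combines them into a prediction

# Toy input: a 20-dimensional encoding (e.g., amino-acid composition of a sequence)
x = rng.random(20)

# Randomly initialised weights; in practice these are learned from known structures
W1, b1 = rng.normal(size=(20, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

print(forward(x, W1, b1, W2, b2))   # a single (untrained) prediction
```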
4) Why is big data crucial for protein-protein interaction studies?
Because protein interactions are so complex, large amounts of varied data are needed.
Proteins rarely act alone; they form complex interaction networks, with each protein potentially interacting with numerous others. These networks can have thousands of proteins and millions of interactions.
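A toy sketch of how interaction data is typically represented as a graph (the protein names are invented); real interactomes with thousands of proteins and millions of edges are what push this into big-data territory.

```python
# Toy illustration (made-up protein names): protein-protein interactions are
# naturally represented as a graph of nodes (proteins) and edges (interactions).
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("ProtA", "ProtB"),
    ("ProtA", "ProtC"),
    ("ProtB", "ProtD"),
    ("ProtC", "ProtD"),
    ("ProtD", "ProtE"),
])

print(G.number_of_nodes(), "proteins,", G.number_of_edges(), "interactions")
# Hub proteins (many interaction partners) are often functionally important
print(sorted(G.degree(), key=lambda nd: nd[1], reverse=True))
```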
5) Which of the following databases is commonly used for retrieving protein structures for AI modeling?
AlphaFold Protein Structure Database: Created by DeepMind, this database provides predicted protein structures using AlphaFold, an AI-based model. The database has made it easier to access predicted structures for almost every protein in the human proteome and many other organisms.
6) Which of the following best describes the main challenge of protein folding that AI aims
to tackle?
The main challenge of protein folding that AI aims to tackle is predicting the three-dimensional structure of a protein from its amino acid sequence. Proteins can fold into highly complex structures, and the specific 3D shape of a protein is crucial for its function. However, predicting how a sequence will fold is extremely challenging due to the vast number of possible configurations.
7) What feature do many deep learning models in protein science leverage for sequence
pattern recognition?
Attention Mechanisms
8) Which of the following statements best describes the relationship between Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL)?
Deep Learning is a subset of Machine Learning, which is in turn a subset of Artificial Intelligence: AI is the broad goal of intelligent behavior, ML achieves it by learning from data, and DL does so with multi-layer neural networks.
9) Transfer learning, a method where pre-trained models are fine-tuned for a new task, has
been adapted in protein prediction tasks. What’s the main advantage of this approach?
The main advantage of transfer learning in protein prediction tasks is that it leverages knowledge from models pre-trained on large protein datasets and fine-tunes them for new, related tasks with limited data. This reduces the need for extensive data and training time while still giving accurate, generalized predictions: the model has been trained on many datasets during pre-training, yet it must still treat new inputs as genuinely new rather than overfitting.
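A minimal sketch of the idea, assuming a frozen "pretrained" feature extractor and a small synthetic labelled set; only the small new head is trained, which is the point of transfer learning when data is scarce.

```python
# Sketch of transfer learning with synthetic data: the "pretrained" projection
# is frozen and only a small logistic-regression head is trained on top of it.
import numpy as np

rng = np.random.default_rng(1)

# Pretend this projection was learned on a large protein corpus; we freeze it.
W_pretrained = rng.normal(size=(50, 8))

def features(x):
    return np.tanh(x @ W_pretrained)    # frozen feature extractor

# Small labelled dataset for the new task
X = rng.normal(size=(40, 50))
y = (X[:, 0] > 0).astype(float)

# Train only a tiny head on top of the frozen features (gradient descent)
w, b = np.zeros(8), 0.0
for _ in range(500):
    z = features(X) @ w + b
    p = 1.0 / (1.0 + np.exp(-z))        # predicted probabilities
    grad_w = features(X).T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

print("training accuracy of the fine-tuned head:", np.mean((p > 0.5) == y))
```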
10) What is a potential pitfall when training AI models on protein databases that have bias
towards well-studied proteins?
The AI model will inherit the same bias as the data it was trained on: garbage in, garbage out. Overfitting to the well-studied proteins is again a risk.
11) Which of the following statements accurately differentiates between Supervised,
Unsupervised, and Reinforcement Learning?
Supervised learning involves training a model using labeled data, where the input-output pairs are known, to predict outcomes for new, unseen data. For example, predicting house prices based on historical data.
Unsupervised learning works with unlabeled data to find patterns, structures, or groupings, such as clustering customers based on their purchasing behavior.
Reinforcement learning trains an agent to make sequential decisions by interacting with an environment and receiving feedback in the form of rewards or penalties, such as teaching a robot to navigate a maze.
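A short sketch contrasting the first two paradigms on synthetic data (reinforcement learning is left out because it needs an interactive environment); the scikit-learn models and the toy numbers are placeholders, not a recommended setup.

```python
# Hedged sketch: supervised learning on labelled pairs vs. unsupervised
# clustering of unlabelled points, both on small synthetic datasets.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Supervised: labelled pairs (house size -> price), fit a predictor
sizes = rng.uniform(50, 200, size=(30, 1))
prices = 3000 * sizes[:, 0] + rng.normal(0, 20000, size=30)
reg = LinearRegression().fit(sizes, prices)
print("predicted price for 120 m^2:", reg.predict([[120.0]])[0])

# Unsupervised: unlabelled customer data, discover groupings
customers = np.vstack([rng.normal(0, 1, size=(20, 2)),
                       rng.normal(5, 1, size=(20, 2))])
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(customers)
print("cluster assignments:", clusters)
```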
12) Which of the following best describes Neural Networks in the context of Machine
Learning?
Neural networks can draw conclusions from (sequential) data on their own, i.e., arrive at something new by themselves rather than following explicitly programmed rules.
13) Which of the following best describes the phenomenon of overfitting in machine
learning models?
Overfitting is when the model fits the training data too closely, capturing noise rather than the underlying pattern, so it performs poorly on new data.
14) In machine learning, why is data typically split into training, validation, and test sets?
To avoid overfitting and to ensure a low generalization error: the model is fit on the training set, tuned on the validation set, and evaluated once on the held-out test set.
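A sketch with synthetic data showing how the splits expose overfitting: an unconstrained model scores well on the training set but worse on the held-out validation set, and the test set is only touched once at the end.

```python
# Synthetic-data sketch of train/validation/test splits and overfitting.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=200)

# Hold out validation and test sets before any model fitting
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# An unconstrained tree memorizes the training data (overfits) ...
deep = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
# ... a depth-limited tree generalizes better; the depth is chosen on the validation set
shallow = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

print("deep tree    train/val R^2:", deep.score(X_train, y_train), deep.score(X_val, y_val))
print("shallow tree train/val R^2:", shallow.score(X_train, y_train), shallow.score(X_val, y_val))
# Only after model selection is the test set used once, for the final estimate
print("final test R^2 (shallow):", shallow.score(X_test, y_test))
```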
15) In the context of deep learning, particularly in models like Transformers, what role does
the “attention” mechanism play?
The mechanism selects and prioritizes which information is relevant and focuses specifically on it: contextual understanding, self-attention, and parallelization (positions are processed in parallel, which speeds things up).
By focusing on the most important parts of the input and capturing complex dependencies, attention mechanisms have led to significant improvements in the performance of deep learning models, especially in NLP tasks.
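A minimal NumPy sketch of scaled dot-product self-attention, the core computation behind the mechanism described above; the projections are random here rather than learned, and the sequence length and dimensions are arbitrary.

```python
# Scaled dot-product self-attention in NumPy: each position scores every other
# position, and the softmax weights decide which parts of the sequence to focus on.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # relevance of each position to each other
    weights = softmax(scores, axis=-1)     # attention weights sum to 1 per query
    return weights @ V, weights            # weighted mix of values, plus the weights

rng = np.random.default_rng(4)
seq_len, d_model = 5, 16
X = rng.normal(size=(seq_len, d_model))    # embeddings for 5 sequence positions

# Learned projections in a real model; random here for illustration
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)

print(attn.round(2))   # each row shows where that position "attends"
```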