Exam June 2021 Flashcards
What characterizes model parameters? Explain
Model parameters stay constant over time; they include the rate constants k, the initial conditions x(0), and any scaling parameters in the measurement equation for yhat.
What characterizes model states? Explain.
Model states change over time; they are the variables x1, x2, etc. that the ODEs describe.
Ordinary differential equations (ODEs) are often hard to solve analytically. Describe a
method/methods to instead numerically solve ODEs. Include in your answer what we need to
know about parameters and show with a real example what a computation can look like. Feel
free to use drawings.
The Euler method is a numerical solution method for ordinary differential equations. It uses the formula:
x(t + Δt) = x(t) + dx/dt(t) * Δt
Start from the known initial value x(0) with a small time step such as 0.1; depending on the slope of the curve you can take smaller or bigger steps. The idea is that at each step we follow a tangent line that approximates the theoretical model. All parameter values (e.g. rate constants) must be known or fixed before the simulation, since they are needed to evaluate the derivative dx/dt.
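A minimal Python sketch of forward Euler for a hypothetical one-state decay model dx/dt = -k*x (the model, the parameter value k = 0.5 and the step size are assumptions chosen only for illustration):

    k = 0.5     # assumed parameter value; must be known before we can simulate
    x = 1.0     # initial condition x(0)
    dt = 0.1    # step size; smaller steps give a more accurate approximation
    t = 0.0
    for _ in range(100):       # simulate from t = 0 to t = 10
        dxdt = -k * x          # evaluate the derivative at the current state
        x = x + dxdt * dt      # one Euler step along the tangent line
        t = t + dt
    print(t, x)                # can be compared with the analytical solution exp(-k*t)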
How do we evaluate if a model is a good explanation of some experimental data? What can
we do if there is a bad agreement between a model and experimental data for a specific set of
model parameters?
We start with a visual inspection to see if the model fits the data mean well. We can then do a chi-square (χ²) goodness-of-fit test to see whether the model can be rejected or not. H0 means there is no difference between model and data (the residuals are small); H1 means there is a difference (the residuals are large).
If there is bad agreement for a specific set of model parameters, we can try other parameter values to see if the agreement improves.
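A minimal sketch of how such a chi-square test can be computed in Python, assuming we already have experimental means, their standard errors, and the corresponding simulated model values (all numbers below are made-up placeholders):

    import numpy as np
    from scipy.stats import chi2

    y_data = np.array([1.0, 1.8, 2.9, 3.6])   # assumed experimental means
    sem    = np.array([0.2, 0.2, 0.3, 0.3])   # assumed standard errors of the mean
    y_sim  = np.array([1.1, 1.9, 2.7, 3.8])   # assumed model simulation at the same time points

    cost = np.sum(((y_data - y_sim) / sem) ** 2)   # weighted sum of squared residuals
    dof = len(y_data)                              # degrees of freedom (here: number of data points)
    limit = chi2.ppf(0.95, dof)                    # rejection threshold at significance level 0.05
    print("reject the model" if cost > limit else "cannot reject the model")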
What is an optimization algorithm, how is such an algorithm used and what different kinds
of algorithms are there?
An optimization algorithm searches the parameter space for the parameter set that minimizes a cost function, i.e. the parameters that make the model fit the data best / give the lowest cost (smallest residuals).
There are global and local algorithms. Global optimization searches both uphill and downhill, so it can escape local minima, while local optimization only searches downhill but is better at finding the exact bottom of a minimum. The two are therefore often used together (a global search followed by local refinement), as sketched below.
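A minimal sketch combining a global and a local optimizer from scipy on a stand-in cost function (the quadratic cost and the bounds are assumptions; in a real project the cost would be the chi-square between model simulation and data):

    from scipy.optimize import minimize, differential_evolution

    def cost(p):                                   # stand-in cost with its minimum at p = (1, 2)
        return (p[0] - 1.0) ** 2 + (p[1] - 2.0) ** 2

    bounds = [(-5, 5), (-5, 5)]                    # assumed parameter bounds
    global_fit = differential_evolution(cost, bounds)   # global: can move uphill out of local minima
    local_fit = minimize(cost, x0=global_fit.x)          # local: refines the best point found
    print(local_fit.x, local_fit.fun)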
Explain the cross-validation test in relation to modeling in systems biology. What do we
test and what happens if the test leads to a rejection? What is the next step?
Cross-validation is a test of whether we have overfitted the model to the data: the model is fitted to a training part of the data and then evaluated on held-out validation data. H0 means the model is not overfitted and H1 means it is overfitted. If the test leads to a rejection we can simplify the model (take away a few parameters), or redo the split of the data points, and then do a new cross-validation to see if it is better.
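A minimal sketch of the idea in Python, using a simple straight-line fit as a stand-in for the model and an every-other-point split of made-up data (all values and the split strategy are assumptions for illustration):

    import numpy as np

    t = np.linspace(0, 10, 20)                      # assumed time points
    y = 2.0 * t + np.random.normal(0, 1, t.size)    # assumed noisy data

    train = np.arange(t.size) % 2 == 0              # simple split: every other point is training data
    coeffs = np.polyfit(t[train], y[train], deg=1)  # "fit the model" to the training data only
    y_pred = np.polyval(coeffs, t[~train])          # predict the held-out validation points
    val_cost = np.sum((y[~train] - y_pred) ** 2)    # large validation cost indicates overfitting
    print(val_cost)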
Describe the steps of a systems biology modeling project and explain the different outcomes
and reflect on those outcomes
Based on literature, prior knowledge, data and experiments we formulate models and equations and then perform statistical tests. First we do a chi-square test to see if we can accept the model, or whether we need to change some parameters or reject the model structure. If we have multiple models and want to find the best one we can do a likelihood ratio test (sketched below), and if we suspect that the model is overfitted to the data we can do cross-validation. Once a model is accepted we can use it to make predictions, to design new experiments, or to explain the biological mechanisms behind the reactions.
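A minimal sketch of a likelihood ratio test between two nested models, assuming Gaussian measurement noise so that the test statistic is simply the difference in chi-square cost (the cost values and the number of extra parameters are made-up placeholders):

    from scipy.stats import chi2

    cost_small = 25.3      # assumed chi-square cost of the model with fewer parameters
    cost_large = 18.1      # assumed chi-square cost of the model with extra parameters
    extra_params = 2       # difference in number of parameters between the models

    lr_stat = cost_small - cost_large              # LR statistic under the Gaussian-noise assumption
    limit = chi2.ppf(0.95, extra_params)
    print("larger model is significantly better" if lr_stat > limit else "keep the smaller model")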
Choose a biological network of choice, define what is a node in this particular
network, what interactions do exist, and what types are the underlying
interactions (motivate your answer)
Nodes: Each node in the network represents a unique protein, which may be involved in a variety of different biological processes. Nodes are typically labeled with the name or identifier of the protein they represent.
Edges: Each edge in the network represents an interaction between two proteins, which may take a variety of forms. For example, an edge may represent a physical binding interaction between two proteins, or it may represent a functional interaction in which one protein regulates the activity of another.
Underlying reactions: The interactions between proteins in the network are often based on underlying biochemical reactions, such as protein-protein binding or enzyme-substrate interactions. These reactions can be represented as edges in the network, with the nodes representing the proteins or other molecules involved in the reaction.
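A minimal networkx sketch of how such a protein-protein interaction network can be represented, with invented proteins and interaction types purely for illustration:

    import networkx as nx

    ppi = nx.Graph()                                         # undirected: physical binding has no direction
    ppi.add_edge("ProteinA", "ProteinB", kind="binding")     # hypothetical physical binding interaction
    ppi.add_edge("ProteinB", "ProteinC", kind="regulation")  # hypothetical functional/regulatory interaction
    print(ppi.nodes())
    print(ppi.edges(data=True))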
8b : Draw the graph of the network defined by the following adjacency matrix (1p)
Draw this
Q8: Is the network directed/weighted/signed?
Directed network: if the network is directed, the adjacency matrix is asymmetric (A is not equal to its transpose).
Weighted network: if the network is weighted, the adjacency matrix has entries other than 0 and 1 that represent the strength or weight of the connections between nodes.
Signed network: if the network is signed, the adjacency matrix contains both positive and negative entries (for example activation versus inhibition).
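A minimal numpy sketch of how these properties can be read off an adjacency matrix; the matrix below is a made-up example, not the one from the exam:

    import numpy as np

    A = np.array([[0, 1, 0],
                  [0, 0, 2],
                  [-1, 0, 0]])                                # made-up adjacency matrix

    directed = not np.array_equal(A, A.T)                     # asymmetric matrix => directed network
    weighted = bool(np.any((A != 0) & (A != 1) & (A != -1)))  # entries other than 0/±1 => weighted
    signed = bool(np.any(A < 0))                              # negative entries => signed
    print(directed, weighted, signed)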
Q8: Is the network connected?
A network is connected if every node can be reached from every other node; in a completely connected network all nodes have direct connections to each other. A network can have cliques, where parts of the network are completely connected. One can also calculate the global clustering coefficient to determine the global transitivity (how clustered the network is):
transitivity = closed triplets / (closed + open triplets) = 3 * (number of triangles) / (number of connected triplets)
Each closed triplet (triangle) counts as three, while each open triplet counts as one.
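A minimal networkx sketch checking connectivity and computing the global clustering coefficient (transitivity) for a small made-up graph:

    import networkx as nx

    g = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])   # made-up undirected network
    print(nx.is_connected(g))    # True if every node can reach every other node
    print(nx.transitivity(g))    # 3 * triangles / connected triplets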
Q8: What is the average shortest path of this network?
average shortest path = (sum of shortest path distances for all node pairs) / (total number of node pairs)
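A minimal networkx sketch of that calculation on a small made-up graph (the edge list is an assumption; the real answer of course depends on the network given in the exam):

    import networkx as nx

    g = nx.Graph([(1, 2), (2, 3), (3, 4)])            # made-up connected network
    print(nx.average_shortest_path_length(g))         # mean shortest-path distance over all node pairs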
Consider the human protein-protein interaction network. (tot 5p)
a. Sketch the degree distribution. (1,5p)
The degree distribution of a protein-protein network in humans is expected to follow a power-law distribution, also known as a scale-free distribution. This means that there are a few highly connected proteins (hubs) in the network, while most proteins have relatively few connections.
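A minimal sketch of how such a degree distribution can be computed and plotted on log-log axes, where a scale-free network appears as roughly a straight line; a generated Barabási-Albert network is used here as a stand-in for the real human PPI network:

    import networkx as nx
    import matplotlib.pyplot as plt
    from collections import Counter

    g = nx.barabasi_albert_graph(1000, 2)             # scale-free stand-in for the PPI network
    counts = Counter(dict(g.degree()).values())       # how many nodes have each degree k
    k, n_k = zip(*sorted(counts.items()))
    plt.loglog(k, [c / g.number_of_nodes() for c in n_k], "o")   # fraction of nodes P(k) vs degree k
    plt.xlabel("degree k")
    plt.ylabel("P(k)")
    plt.show()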
In the human protein-protein interaction network: Where are we expected to find the highest fraction of disease-associated
genes, please motivate why this is likely.
The disease module hypothesis states that complex diseases are often not due to the malfunction of a single gene but of a disease module, i.e. a group of densely connected nodes. This means that multiple genes and pathways are affected, which causes the disease.
Degree correlates with lethality: a node with a high degree has a higher correlation with lethality and with disease-associated genes. This is because a gene that is used a lot and involved in many pathways will, if something goes wrong, cause problems in many places, giving higher lethality and more disease association.
Please compare degree with more complex measures of centrality. What pros and cons do the different measures have in the context of identifying the most important genes?
There are a few different ways of measuring centrality such as:
Degree
Closeness
Eigenvector
Betweenness
The simplest centrality measure is the degree centrality, which is defined by the number of connections attached to each node.
In-degree represents the number of directed connections reaching a node, while out-degree represents the number of directed edges leaving a node.
Closeness centrality is based on the average distance of the node to all others (formally, the inverse of that average distance). A central node, with high closeness, should therefore be close to all other nodes in the network in terms of their shortest-path distances.
Eigenvector centrality ranks a node highly if it is linked to many other important nodes: a node's centrality score depends on the centrality scores of the nodes it is connected to.
Betweenness centrality measures the number of shortest paths going through the node.
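Degree is cheap to compute but purely local, while the other measures take more of the global network structure into account at a higher computational cost. A minimal networkx sketch computing all four measures on the same small made-up graph, which makes it easy to see that they can rank nodes differently:

    import networkx as nx

    g = nx.Graph([(1, 2), (1, 3), (1, 4), (4, 5), (5, 6)])   # made-up network
    print(nx.degree_centrality(g))        # fraction of nodes each node is directly connected to
    print(nx.closeness_centrality(g))     # inverse of the average shortest-path distance to all others
    print(nx.eigenvector_centrality(g))   # high if connected to other high-scoring nodes
    print(nx.betweenness_centrality(g))   # fraction of shortest paths passing through the node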