Exam June 2021 Flashcards
What characterizes model parameters? Explain
Model parameters stay constant over time; they include the rate constants k, the initial conditions x(0), and any scaling parameters in the measurement equation for yhat.
What characterizes model states? Explain.
Model states change over time; they are the variables x1, x2, etc. that the ODEs describe.
Ordinary differential equations (ODEs) are often hard to solve analytically. Describe a
method/methods to instead numerically solve ODEs. Include in your answer what we need to
know about parameters and show with a real example what a computation can look like. Feel
free to use drawings.
The Euler method is a numerical solution method for ordinary differential equations. It uses the formula:
x(t + Δt) = x(t) + dx/dt(t) * Δt
Start from the known initial value x(0) with a small time step such as 0.1; depending on the slope of the curve you can take smaller or bigger steps. The idea is that at each step we follow a tangent line that approximates the theoretical model. All parameter values (e.g. rate constants) must be known or fixed before the simulation, since they are needed to evaluate the derivative dx/dt.
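A minimal Python sketch of forward Euler for a hypothetical one-state decay model dx/dt = -k*x (the model, the parameter value k = 0.5 and the step size are assumptions chosen only for illustration):

    k = 0.5     # assumed parameter value; must be known before we can simulate
    x = 1.0     # initial condition x(0)
    dt = 0.1    # step size; smaller steps give a more accurate approximation
    t = 0.0
    for _ in range(100):       # simulate from t = 0 to t = 10
        dxdt = -k * x          # evaluate the derivative at the current state
        x = x + dxdt * dt      # one Euler step along the tangent line
        t = t + dt
    print(t, x)                # can be compared with the analytical solution exp(-k*t)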
How do we evaluate if a model is a good explanation of some experimental data? What can
we do if there is a bad agreement between a model and experimental data for a specific set of
model parameters?
We start with a visual inspection to see if the model fits the data mean well. We can then do a chi-square (χ²) goodness-of-fit test to see whether the model can be rejected or not. H0 means there is no difference between model and data (the residuals are small); H1 means there is a difference (the residuals are large).
If there is bad agreement for a specific set of model parameters, we can try other parameter values to see if the agreement improves.
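A minimal sketch of how such a chi-square test can be computed in Python, assuming we already have experimental means, their standard errors, and the corresponding simulated model values (all numbers below are made-up placeholders):

    import numpy as np
    from scipy.stats import chi2

    y_data = np.array([1.0, 1.8, 2.9, 3.6])   # assumed experimental means
    sem    = np.array([0.2, 0.2, 0.3, 0.3])   # assumed standard errors of the mean
    y_sim  = np.array([1.1, 1.9, 2.7, 3.8])   # assumed model simulation at the same time points

    cost = np.sum(((y_data - y_sim) / sem) ** 2)   # weighted sum of squared residuals
    dof = len(y_data)                              # degrees of freedom (here: number of data points)
    limit = chi2.ppf(0.95, dof)                    # rejection threshold at significance level 0.05
    print("reject the model" if cost > limit else "cannot reject the model")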
What is an optimization algorithm, how is such an algorithm used and what different kinds
of algorithms are there?
An optimization algorithm searches the parameter space for the parameter set that minimizes a cost function, i.e. the parameters that make the model fit the data best / give the lowest cost (smallest residuals).
There are global and local algorithms. Global optimization searches both uphill and downhill, so it can escape local minima, while local optimization only searches downhill but is better at finding the exact bottom of a minimum. The two are therefore often used together (a global search followed by local refinement), as sketched below.
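A minimal sketch combining a global and a local optimizer from scipy on a stand-in cost function (the quadratic cost and the bounds are assumptions; in a real project the cost would be the chi-square between model simulation and data):

    from scipy.optimize import minimize, differential_evolution

    def cost(p):                                   # stand-in cost with its minimum at p = (1, 2)
        return (p[0] - 1.0) ** 2 + (p[1] - 2.0) ** 2

    bounds = [(-5, 5), (-5, 5)]                    # assumed parameter bounds
    global_fit = differential_evolution(cost, bounds)   # global: can move uphill out of local minima
    local_fit = minimize(cost, x0=global_fit.x)          # local: refines the best point found
    print(local_fit.x, local_fit.fun)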
Explain the cross-validation test in relation to modeling in systems biology. What do we
test and what happens if the test leads to a rejection? What is the next step?
Cross-validation is a test of whether we have overfitted the model to the data: the model is fitted to a training part of the data and then evaluated on held-out validation data. H0 means the model is not overfitted and H1 means it is overfitted. If the test leads to a rejection we can simplify the model (take away a few parameters), or redo the split of the data points, and then do a new cross-validation to see if it is better.
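A minimal sketch of the idea in Python, using a simple straight-line fit as a stand-in for the model and an every-other-point split of made-up data (all values and the split strategy are assumptions for illustration):

    import numpy as np

    t = np.linspace(0, 10, 20)                      # assumed time points
    y = 2.0 * t + np.random.normal(0, 1, t.size)    # assumed noisy data

    train = np.arange(t.size) % 2 == 0              # simple split: every other point is training data
    coeffs = np.polyfit(t[train], y[train], deg=1)  # "fit the model" to the training data only
    y_pred = np.polyval(coeffs, t[~train])          # predict the held-out validation points
    val_cost = np.sum((y[~train] - y_pred) ** 2)    # large validation cost indicates overfitting
    print(val_cost)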
Describe the steps of a systems biology modeling project and explain the different outcomes
and reflect on those outcomes
Based on literature, prior knowledge, data and experiments we formulate models and equations and then perform statistical tests. First we do a chi-square test to see if we can accept the model, or whether we need to change some parameters or reject the model structure. If we have multiple models and want to find the best one we can do a likelihood ratio test (sketched below), and if we suspect that the model is overfitted to the data we can do cross-validation. Once a model is accepted we can use it to make predictions, to design new experiments, or to explain the biological mechanisms behind the reactions.
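A minimal sketch of a likelihood ratio test between two nested models, assuming Gaussian measurement noise so that the test statistic is simply the difference in chi-square cost (the cost values and the number of extra parameters are made-up placeholders):

    from scipy.stats import chi2

    cost_small = 25.3      # assumed chi-square cost of the model with fewer parameters
    cost_large = 18.1      # assumed chi-square cost of the model with extra parameters
    extra_params = 2       # difference in number of parameters between the models

    lr_stat = cost_small - cost_large              # LR statistic under the Gaussian-noise assumption
    limit = chi2.ppf(0.95, extra_params)
    print("larger model is significantly better" if lr_stat > limit else "keep the smaller model")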
Choose a biological network of choice, define what is a node in this particular
network, what interactions do exist, and what types are the underlying
interactions (motivate your answer)
Nodes: Each node in the network represents a unique protein, which may be involved in a variety of different biological processes. Nodes are typically labeled with the name or identifier of the protein they represent.
Edges: Each edge in the network represents an interaction between two proteins, which may take a variety of forms. For example, an edge may represent a physical binding interaction between two proteins, or it may represent a functional interaction in which one protein regulates the activity of another.
Underlying reactions: The interactions between proteins in the network are often based on underlying biochemical reactions, such as protein-protein binding or enzyme-substrate interactions. These reactions can be represented as edges in the network, with the nodes representing the proteins or other molecules involved in the reaction.
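A minimal networkx sketch of how such a protein-protein interaction network can be represented, with invented proteins and interaction types purely for illustration:

    import networkx as nx

    ppi = nx.Graph()                                         # undirected: physical binding has no direction
    ppi.add_edge("ProteinA", "ProteinB", kind="binding")     # hypothetical physical binding interaction
    ppi.add_edge("ProteinB", "ProteinC", kind="regulation")  # hypothetical functional/regulatory interaction
    print(ppi.nodes())
    print(ppi.edges(data=True))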
8b : Draw the graph of the network defined by the following adjacency matrix (1p)
Draw this
Q8: Is the network directed/weighted/signed?
Directed network: if the network is directed, the adjacency matrix is asymmetric (A is not equal to its transpose).
Weighted network: if the network is weighted, the adjacency matrix has entries other than 0 and 1 that represent the strength or weight of the connections between nodes.
Signed network: if the network is signed, the adjacency matrix contains both positive and negative entries (for example activation versus inhibition).
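A minimal numpy sketch of how these properties can be read off an adjacency matrix; the matrix below is a made-up example, not the one from the exam:

    import numpy as np

    A = np.array([[0, 1, 0],
                  [0, 0, 2],
                  [-1, 0, 0]])                                # made-up adjacency matrix

    directed = not np.array_equal(A, A.T)                     # asymmetric matrix => directed network
    weighted = bool(np.any((A != 0) & (A != 1) & (A != -1)))  # entries other than 0/±1 => weighted
    signed = bool(np.any(A < 0))                              # negative entries => signed
    print(directed, weighted, signed)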
Q8: Is the network connected?
A network is connected if every node can be reached from every other node; in a completely connected network all nodes have direct connections to each other. A network can have cliques, where parts of the network are completely connected. One can also calculate the global clustering coefficient to determine the global transitivity (how clustered the network is):
transitivity = closed triplets / (closed + open triplets) = 3 * (number of triangles) / (number of connected triplets)
Each closed triplet (triangle) counts as three, while each open triplet counts as one.
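A minimal networkx sketch checking connectivity and computing the global clustering coefficient (transitivity) for a small made-up graph:

    import networkx as nx

    g = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])   # made-up undirected network
    print(nx.is_connected(g))    # True if every node can reach every other node
    print(nx.transitivity(g))    # 3 * triangles / connected triplets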
Q8: What is the average shortest path of this network?
average shortest path = (sum of shortest path distances for all node pairs) / (total number of node pairs)
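A minimal networkx sketch of that calculation on a small made-up graph (the edge list is an assumption; the real answer of course depends on the network given in the exam):

    import networkx as nx

    g = nx.Graph([(1, 2), (2, 3), (3, 4)])            # made-up connected network
    print(nx.average_shortest_path_length(g))         # mean shortest-path distance over all node pairs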
Consider the human protein-protein interaction network. (tot 5p)
a. Sketch the degree distribution. (1,5p)
The degree distribution of a protein-protein network in humans is expected to follow a power-law distribution, also known as a scale-free distribution. This means that there are a few highly connected proteins (hubs) in the network, while most proteins have relatively few connections.
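A minimal sketch of how such a degree distribution can be computed and plotted on log-log axes, where a scale-free network appears as roughly a straight line; a generated Barabási-Albert network is used here as a stand-in for the real human PPI network:

    import networkx as nx
    import matplotlib.pyplot as plt
    from collections import Counter

    g = nx.barabasi_albert_graph(1000, 2)             # scale-free stand-in for the PPI network
    counts = Counter(dict(g.degree()).values())       # how many nodes have each degree k
    k, n_k = zip(*sorted(counts.items()))
    plt.loglog(k, [c / g.number_of_nodes() for c in n_k], "o")   # fraction of nodes P(k) vs degree k
    plt.xlabel("degree k")
    plt.ylabel("P(k)")
    plt.show()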
In the human protein-protein interaction network: Where are we expected to find the highest fraction of disease-associated
genes, please motivate why this is likely.
The disease module hypothesis states that complex diseases are often not due to the malfunction of a single gene but of a disease module, i.e. a group of densely connected nodes. This means that multiple genes and pathways are affected, which causes the disease.
Degree correlates with lethality: a node with a high degree has a higher correlation with lethality and with disease-associated genes. This is because a gene that is used a lot and involved in many pathways will, if something goes wrong, cause problems in many places, giving higher lethality and more disease association.
Please compare degree with more complex measures of centrality. What pros and cons do the different measures have in the context of identifying the most important genes?
There are a few different ways of measuring centrality such as:
Degree
Closeness
Eigenvector
Betweenness
The simplest centrality measure is the degree centrality, which is defined by the number of connections attached to each node.
In-degree represents the number of directed connections reaching a node, while out-degree represents the number of directed edges leaving a node.
Closeness centrality is based on the average distance of the node to all others (formally, the inverse of that average distance). A central node, with high closeness, should therefore be close to all other nodes in the network in terms of their shortest-path distances.
Eigenvector centrality ranks a node highly if it is linked to many other important nodes: a node's centrality score depends on the centrality scores of the nodes it is connected to.
Betweenness centrality measures the number of shortest paths going through the node.
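Degree is cheap to compute but purely local, while the other measures take more of the global network structure into account at a higher computational cost. A minimal networkx sketch computing all four measures on the same small made-up graph, which makes it easy to see that they can rank nodes differently:

    import networkx as nx

    g = nx.Graph([(1, 2), (1, 3), (1, 4), (4, 5), (5, 6)])   # made-up network
    print(nx.degree_centrality(g))        # fraction of nodes each node is directly connected to
    print(nx.closeness_centrality(g))     # inverse of the average shortest-path distance to all others
    print(nx.eigenvector_centrality(g))   # high if connected to other high-scoring nodes
    print(nx.betweenness_centrality(g))   # fraction of shortest paths passing through the node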