Chapter 16: Fundamentals of Bioinformatics Flashcards

Question 1

Q

Describe what a confusion matrix is

Answer

A

A table that visualizes the performance of a supervised classification algorithm. Metrics such as accuracy and precision can be computed from it. It also shows whether the model confuses two classes (mislabels one as the other), which stays informative even when the data set is unbalanced.

Each row of the matrix represents the instances in an actual class, and each column represents the instances in a predicted class.

For a binary classifier, the table contains the number of true positives, false positives, false negatives and true negatives.
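A minimal sketch of building a binary confusion matrix by hand and reading accuracy and precision off it. It assumes labels are 1 (positive) and 0 (negative), follows the rows = actual / columns = predicted convention described above, and uses made-up example labels.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Return [[TP, FN], [FP, TN]]: rows = actual class, columns = predicted class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # correctly predicted positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative
    return [[tp, fn], [fp, tn]]


def accuracy(matrix):
    (tp, fn), (fp, tn) = matrix
    return (tp + tn) / (tp + fn + fp + tn)


def precision(matrix):
    (tp, _), (fp, _) = matrix
    return tp / (tp + fp)


# Hypothetical example: 3 actual positives, 5 actual negatives.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0]
matrix = confusion_matrix(actual, predicted)
print(matrix)                       # [[2, 1], [1, 4]]
print(accuracy(matrix))             # 0.75
print(round(precision(matrix), 3))  # 0.667
```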

Question 2

Q

What is a motif?

Answer

A

A characteristic type of spatial arrangement of atoms. In DNA and RNA, it is governed by the type of secondary structure formed by a repeated nucleoside or nucleotide sequence in the primary structure.

For proteins, it is a small region of 3D structure or amino acid sequence that is conserved among different proteins and may (or may not) be associated with a specific chemical or biological function.
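A sequence motif can be searched for as a pattern. The sketch below assumes the commonly cited PROSITE N-glycosylation pattern N-{P}-[ST]-{P} (N, then any residue except P, then S or T, then any residue except P) and translates it into a Python regular expression. The protein sequence and the function name find_motif are hypothetical, chosen only for illustration.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P} written as a regex. The lookahead (?=...)
# makes the match zero-width, so overlapping occurrences are also reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(pattern, sequence):
    """Return (0-based start, matched residues) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


hits = find_motif(N_GLYCOSYLATION, "MNGSANKTPNLS")
print(hits)  # [(1, 'NGSA')]
```

"NKTP" is rejected because the fourth residue is P, and the final "NLS" is too short to match.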

Question 3

Q

What is supervised/unsupervised machine learning?

Answer

A

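Supervised learning algorithms are trained using labeled data: each training example comes with the correct output (such as its class), and the algorithm learns to predict that output for new inputs. Unsupervised learning algorithms are trained using unlabeled data and must find structure (such as clusters) in the data on their own.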
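A minimal sketch contrasting the two on the same made-up 1D data. The supervised part (a nearest-centroid classifier) uses the labels to learn one mean per class; the unsupervised part (2-means clustering) never sees the labels and only groups the points. Both algorithms and all function names here are illustrative choices, not taken from the chapter.

```python
def nearest_centroid_fit(points, labels):
    """Supervised: use the labels to compute one centroid (mean) per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def nearest_centroid_predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def two_means_1d(points, iterations=10):
    """Unsupervised: split unlabeled points into two clusters (k-means, k = 2)."""
    centers = [min(points), max(points)]  # deterministic start for reproducibility
    assignment = []
    for _ in range(iterations):
        # Assign each point to its closest center, then move centers to cluster means.
        assignment = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in points]
        for c in (0, 1):
            members = [x for x, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assignment


points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["a", "a", "a", "b", "b", "b"]

centroids = nearest_centroid_fit(points, labels)
print(nearest_centroid_predict(centroids, 1.1))  # a  (class name learned from labels)
print(two_means_1d(points))                      # [0, 0, 0, 1, 1, 1]  (cluster ids only)
```

The clustering recovers the same grouping but can only number the clusters, because it never saw the class names.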

Question 4

Q

How would you evaluate the success of classification?

Answer

A

Looking strictly at the data:
I would use likelihood-based statistics to evaluate how well the machine learning classifier fits its data set, for example with a hierarchical Bayesian model, and I would evaluate the uncertainty of the neural model through the probability distribution over its parameters, approximated with variational inference.
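The hierarchical Bayesian and variational inference setup is too large for a short example, but its basic ingredient, the likelihood of the true labels under the classifier's predicted probabilities, can be sketched. The sketch assumes a binary classifier that outputs P(label = 1) for each test example; the probabilities and labels are made up. A mean log-likelihood closer to 0 means the model assigned higher probability to what actually happened. Accuracy at an assumed 0.5 threshold is shown alongside for comparison.

```python
import math


def mean_log_likelihood(probs_positive, labels):
    """Average log-likelihood of binary labels (1/0) under predicted P(label = 1)."""
    total = 0.0
    for p, y in zip(probs_positive, labels):
        total += math.log(p if y == 1 else 1.0 - p)
    return total / len(labels)


def accuracy_at_threshold(probs_positive, labels, threshold=0.5):
    predicted = [1 if p >= threshold else 0 for p in probs_positive]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


# Hypothetical held-out test set.
probs = [0.9, 0.2, 0.8, 0.4]
truth = [1, 0, 1, 1]
print(round(mean_log_likelihood(probs, truth), 3))
print(accuracy_at_threshold(probs, truth))  # 0.75
```

Unlike accuracy, the likelihood also penalizes a correct prediction made with low confidence and rewards well-calibrated probabilities.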

Question 5

Q

What is a regression problem?

Answer

A

The problem of predicting a continuous, real-valued output quantity, i.e. approximating a mapping function from the input variables to a continuous output variable.

Question 6

Q

What is a classification problem?

Answer

A

The problem of predicting a discrete class label (rather than a real-valued quantity) for an observation, i.e. approximating a mapping function from the input variables to discrete output categories. Each observation is assigned to one class rather than to the other classes; many classifiers also output a probability for each class, which expresses the uncertainty of the assignment.
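A minimal classification sketch, using k-nearest neighbours as an illustrative classifier (not one named in the chapter). It returns a discrete label plus the fraction of neighbour votes per class, which gives a rough measure of uncertainty. The 2D training points and the choice k = 3 are assumptions for the example.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Return (predicted label, vote fraction per class) from the k nearest points."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label = votes.most_common(1)[0][0]
    probabilities = {lab: count / k for lab, count in votes.items()}
    return label, probabilities


# Hypothetical labeled training data: ((feature1, feature2), class).
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_classify(training, (1.1, 1.0)))  # ('A', {'A': 1.0})
print(knn_classify(training, (4.9, 5.0)))  # ('B', {'B': 1.0})
```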

Question 7

Q

What are the similarities between a perceptron and a neuron?

Answer

A

The perceptron is a mathematical model of a biological neuron. While in actual neurons the dendrite receives electrical signals from the axons of other neurons, in the perceptron these electrical signals are represented as numerical values.

At the synapses between axons and dendrites, electrical signals are modulated by varying amounts. The perceptron models this by multiplying each input value by a value called the weight.

A neuron produces an output signal only when the total strength of its input signals exceeds a certain threshold. The perceptron mirrors this by computing the weighted sum of its inputs and applying a step function: it outputs 1 if the sum exceeds the threshold and 0 otherwise.
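A minimal sketch of the perceptron's forward pass as described above: weighted sum of the inputs followed by a step function. The weights (1, 1) and threshold 1.5 are hand-picked assumptions that make it behave like a logical AND, i.e. it "fires" only when both inputs are active.

```python
def perceptron(inputs, weights, threshold):
    """Step-function perceptron: 1 if the weighted input sum exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))  # synapse-like weighting
    return 1 if weighted_sum > threshold else 0                 # firing threshold


weights = [1.0, 1.0]
threshold = 1.5
for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inputs, perceptron(inputs, weights, threshold))
# (0, 0) 0
# (0, 1) 0
# (1, 0) 0
# (1, 1) 1
```

In a trained perceptron the weights and threshold would be learned from data rather than set by hand.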

Question 8

Q

Describe the meaning of likelihood in machine learning.

Answer

A

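The likelihood quantifies how well a machine learning model, with a given set of parameter values, explains the data that has been observed: it is the probability of the observed data viewed as a function of the parameters (for example, of a hypothesized mean of the distribution the data came from). For binomial data with k successes in n trials, varying the probability of success p gives the binomial likelihood function L(p) = C(n, k) · p^k · (1 − p)^(n − k), which shows for which values of p the observed result is likely or unlikely.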
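A minimal sketch of the binomial likelihood function: the observed data (a hypothetical 7 successes in 10 trials) stay fixed while p is varied over a grid from 0 to 1. The value of p with the highest likelihood (the maximum-likelihood estimate) is k / n = 0.7.

```python
import math


def binomial_likelihood(p, k, n):
    """L(p) = C(n, k) * p^k * (1 - p)^(n - k) for k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


k, n = 7, 10
grid = [i / 100 for i in range(101)]
likelihoods = [binomial_likelihood(p, k, n) for p in grid]
best_p = grid[likelihoods.index(max(likelihoods))]
print(best_p)                                        # 0.7
print(binomial_likelihood(0.2, k, n) < max(likelihoods))  # True: p = 0.2 makes the result unlikely
```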

Question 9

Q

How do you evaluate the success of structural prediction?

Answer

A

Sequence motifs are derived by searching profiles: patterns are found from database alignments of sequences, and numerical scores describing characteristic structures are computed from a scoring matrix, with the help of hierarchical clustering, a statistical method that groups related sequences so they can be linked to a motif.
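A minimal sketch of the "profile plus scoring matrix" step, assuming a position-specific scoring matrix (PSSM) built as log2-odds against a uniform background with a pseudocount of 1. These are common choices but are assumptions here, and the hierarchical clustering step is not shown. The aligned sequences and the target sequence are made up; the PSSM scores every window of the target, and the best-scoring window is reported as the motif hit.

```python
import math


def build_pssm(aligned, alphabet="ACGT", pseudocount=1.0):
    """Log2-odds score for each base at each aligned position (uniform background)."""
    background = 1.0 / len(alphabet)
    pssm = []
    for i in range(len(aligned[0])):
        column = [seq[i] for seq in aligned]
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pssm


def score_windows(pssm, sequence):
    """Sum the PSSM scores for every window of the motif's width."""
    width = len(pssm)
    return [
        sum(pssm[j][sequence[i + j]] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]


aligned = ["TATAAT", "TATGAT", "TACAAT", "TATATT"]  # hypothetical alignment
target = "GGCTATAATGG"
scores = score_windows(build_pssm(aligned), target)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, target[best:best + 6])  # 3 TATAAT
```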

Question 10

Q

What is an a priori / a posteriori distribution in machine learning?

Answer

A

The posterior (a posteriori) distribution summarizes what you know about the parameters after the data has been observed: it is the prior distribution updated with the likelihood of the observed data (by Bayes' theorem, posterior ∝ likelihood × prior).

The prior (a priori) distribution, by contrast, is a probability distribution that expresses one's beliefs about a quantity before any evidence is taken into account, i.e. before inference is performed.
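A minimal sketch of going from prior to posterior on a grid of possible values for a success probability p, assuming binomial data (a hypothetical 7 successes in 10 trials). Each prior value is multiplied by the likelihood and the result is normalized so it sums to 1. With a flat prior the posterior peaks at 0.7. A prior that favors small p (here assumed proportional to (1 − p)^4) pulls the peak down to 0.5.

```python
import math


def posterior_on_grid(prior, grid, k, n):
    """Posterior ∝ likelihood × prior, normalized over the grid."""
    unnormalized = [
        pr * math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for pr, p in zip(prior, grid)
    ]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]


grid = [i / 100 for i in range(101)]

flat_prior = [1 / len(grid)] * len(grid)
posterior = posterior_on_grid(flat_prior, grid, k=7, n=10)
print(grid[posterior.index(max(posterior))])  # 0.7

raw = [(1 - p) ** 4 for p in grid]            # belief: small p is more plausible
skeptical_prior = [r / sum(raw) for r in raw]
posterior2 = posterior_on_grid(skeptical_prior, grid, k=7, n=10)
print(grid[posterior2.index(max(posterior2))])  # 0.5
```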