Bioinformatics definitions: flashcards by Frauke Bradt

database

a collection of related data
–> data = known facts
–> implicit meaning: suggested or understood without being directly stated
–> assumed to be computerized
–> structured, with some degree of interaction

tables in a database

= data types

each table has several

records

a type is described by several

attributes

a specific attribute that allows each record to be uniquely identified

= key identifier

related data records are linked through

foreign keys
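
The database cards above (tables, records, attributes, key identifier, foreign keys) can be sketched with Python's built-in sqlite3 module. The table and column names (organism, protein, organism_id, protein_id) are made up for illustration and are not taken from the cards.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Two tables (data types). Each column is an attribute.
# The PRIMARY KEY column is the key identifier of each record.
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE protein (
    protein_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id)  -- foreign key
);
""")

# Each inserted row is one record.
conn.execute("INSERT INTO organism VALUES (1, 'Homo sapiens')")
conn.execute("INSERT INTO protein VALUES (10, 'insulin', 1)")

# Related records are linked through the foreign key.
rows = conn.execute("""
    SELECT protein.name, organism.name
    FROM protein
    JOIN organism ON protein.organism_id = organism.organism_id
""").fetchall()
print(rows)
```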

an alignment

= arrangement of sequences that shows where they are similar and where they differ
–> optimal alignment: the arrangement that gives the best score for a given substitution matrix and gap penalty

similarity

sequences are comparable on a set of criteria (can be measured)

homology

sequences have a common ancestor (yes or no, not a matter of degree)

gaps

= the way insertions and deletions are represented in alignments

substitution matrices

necessary for a biologically relevant calculation of the quality of the alignment

PAM matrix

= point accepted mutations matrix

–> based on a model of protein evolution
–> provides a measure of the probability of an amino acid substitution, based on counting the substitutions actually observed in a large database of evolutionarily related sequences

1PAM unit

= evolutionary period required to produce an average of one accepted point mutation per 100 amino acids

PAM 1 matrix

= matrix with the probabilities of each amino acid replacement during 1 PAM unit

mutation score

log (observed mutation freq / expected mutation freq)

–> neg numbers: observed less frequently than expected by chance
–> pos numbers: observed more frequently than expected by chance
–> zero: observed as frequently as expected by chance
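
A minimal sketch of the log-odds score described above. The card does not give the base of the logarithm, so base 2 is an assumption here, and the frequencies are made-up numbers chosen only to show the three cases.

```python
import math

def mutation_score(observed_freq, expected_freq):
    """Log-odds score: log2(observed / expected). Base 2 is an assumption."""
    return math.log2(observed_freq / expected_freq)

# Made-up frequencies, for illustration only.
more_than_chance = mutation_score(0.04, 0.01)   # positive
less_than_chance = mutation_score(0.005, 0.01)  # negative
as_expected = mutation_score(0.01, 0.01)        # zero

print(more_than_chance, less_than_chance, as_expected)
```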

BLOSUM

= blocks substitution matrix
–> protein BLAST uses this type of matrix (BLOSUM62 by default)
–> mutation frequencies are calculated from a much larger dataset than for PAM
–> based on the blocks database

blocks database

families of proteins with similar biochemical functions
–> family members were aligned and blocks of high similarity were considered
–> within the blocks, sequences more similar than a threshold were clustered
–> BLOSUM62: threshold = 62%

BLAST mechanism

BLAST breaks the query seq down into short words using a sliding window
–> default word size: typically 3 for protein seq, 11 for nucleotide seq
for each word in the query seq, BLAST generates a list of neighboring words. Similarity is determined using a scoring matrix (e.g. BLOSUM62), and only words that score above a certain threshold are considered neighbors
BLAST searches the database for occurrences of the neighboring words. Efficiency –> it looks for matches of short words rather than aligning the entire query
extension: when a match is found in the database, BLAST tries to extend the alignment in both directions to see whether a high-scoring alignment can be formed –> local alignment in the regions surrounding the match
if the extended alignment score exceeds a pre-defined threshold, it is reported as a significant match or hit. Hits are further analyzed and ranked using the bit score and E-value
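
The word and extension steps above can be sketched roughly as follows. This is a simplified illustration, not BLAST itself: it seeds on exact word matches instead of scored neighboring words, extends only to the right without gaps, and uses a made-up match/mismatch score in place of BLOSUM62. The function names and the x_drop cutoff are assumptions.

```python
def words(seq, w=3):
    """Slide a window of size w over seq and return (position, word) pairs."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]

def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos) pairs where a query word occurs in the subject."""
    index = {}
    for j, word in words(subject, w):
        index.setdefault(word, []).append(j)
    return [(i, j) for i, word in words(query, w) for j in index.get(word, [])]

def extend_right(query, subject, i, j, w=3, match=2, mismatch=-1, x_drop=3):
    """Extend a seed to the right without gaps; stop once the score drops
    more than x_drop below the best score seen so far."""
    score = best = w * match      # the seed itself is w exact matches
    best_len = k = w
    while i + k < len(query) and j + k < len(subject):
        score += match if query[i + k] == subject[j + k] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score > x_drop:
            break
    return best, best_len

query, subject = "ACDEFGHIK", "WWCDEFGHWW"
seeds = find_seeds(query, subject)
best_score, length = extend_right(query, subject, *seeds[0])
print(seeds, best_score, query[seeds[0][0]:seeds[0][0] + length])
```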

Raw score (S)

= the sum of substitution and gap scores
–> little to no meaning on its own, because it depends on the scoring system
–> the same alignment scored with a different substitution matrix will give a different S
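
A small sketch of how a raw score is summed and why it depends on the scoring system. It uses a simple match/mismatch score and a fixed penalty per gap position instead of a real substitution matrix. Those simplifications and all the numbers are assumptions for illustration.

```python
def raw_score(aligned_a, aligned_b, match=2, mismatch=-1, gap=-2):
    """Sum per-column scores of two aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            score += gap          # insertion or deletion
        elif x == y:
            score += match        # identical residues
        else:
            score += mismatch     # substitution
    return score

# The same alignment under two different scoring systems gives different S.
s1 = raw_score("ACD-F", "ACEGF")
s2 = raw_score("ACD-F", "ACEGF", match=1, mismatch=-1, gap=-1)
print(s1, s2)
```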

Bit scores (S’)

= normalized raw scores

S’ = (λS - lnK)/ ln2
–> λ & K = scale parameters that depend on the substitution matrix and gap penalties
–> can be used to compare alignment scores from different searches
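
The bit-score formula above, written out as a short function. The λ and K values below are illustrative assumptions (values of the kind reported for BLOSUM62 with gap penalties); in practice they are taken from the BLAST output for the scoring system actually used.

```python
import math

def bit_score(raw, lam, k):
    """S' = (lambda * S - ln K) / ln 2"""
    return (lam * raw - math.log(k)) / math.log(2)

LAMBDA = 0.267  # assumed value, for illustration only
K = 0.041       # assumed value, for illustration only

print(round(bit_score(100, LAMBDA, K), 1))
```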

E-value

= expectation value

E = mn 2^(-S’)
–> m = length of the query
–> n = total length of the sequences in the database

–> E depends on the size of the database
–> number of different alignments with a score equal to or better than S that are expected to occur in a database search by chance

–> the lower the E-value, the more significant the score
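
A short sketch of the E-value formula above, showing that the same bit score is less significant in a larger database. The query and database lengths are made-up numbers.

```python
def e_value(query_len, db_len, bits):
    """E = m * n * 2^(-S')"""
    return query_len * db_len * 2.0 ** (-bits)

# Same query and bit score, two hypothetical database sizes.
small_db = e_value(250, 10**6, 50)
large_db = e_value(250, 10**9, 50)
print(small_db, large_db)  # E grows with the size of the database
```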

node degree

= total number of edges a node has

node closeness

= how close a node is to all other nodes in the network

node betweenness

= how often a node lies on the shortest paths between other nodes in the network –> the node's role as a bridge

hub

= node with a very high degree
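
Degree and closeness from the cards above can be computed on a toy network as sketched below. The graph is invented, and closeness is taken here as the reciprocal of the mean shortest-path distance to the other nodes, which is one common definition and an assumption about what the card intends.

```python
from collections import deque

# Toy undirected network as an adjacency dict (invented for illustration).
graph = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D"},
}

def degree(g, node):
    """Number of edges attached to the node."""
    return len(g[node])

def closeness(g, node):
    """(reachable nodes) / (sum of shortest-path distances), via breadth-first search."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# The hub is the node with the highest degree.
hub = max(graph, key=lambda n: degree(graph, n))
print(hub, degree(graph, hub), closeness(graph, "A"), closeness(graph, "E"))
```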

text mining

search the literature for reported protein-protein interactions

orthology

predict interactions based on orthologous pairs in other species

domain pairs

predict interactions based on domains that interact in other proteins

ontology

formal representation of concepts within a domain and the relationships between those concepts

GO structure

3 parts
--> cellular component
--> molecular function
--> biological process
a molecular function term refers to a single reaction or activity
sets of functions make up a biological process
a gene product may have several function or process terms

accession ID

unique identifier

alternate ID

when two or more terms are identical in meaning, they are merged into a single term
--> the IDs of all merged terms are preserved so that no information is lost

ICD

= International Statistical Classification of Diseases and Related Health Problems
--> international standard for reporting diseases and health conditions
--> used to keep track of safety and quality guidelines

GSEA

= gene set enrichment analysis
--> given a list of genes found to be differentially expressed in an experiment comparing a phenotype to a control: which biological processes, cellular components and molecular functions are implicated in this phenotype?

motif

= short DNA or protein sequence pattern that is associated with a biological function

motif discovery

you want to find out whether your sequences share a common motif

motif finding

you have a known motif and want to know whether it occurs in your sequence
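
Motif finding can be sketched as a pattern search. The example uses the N-glycosylation pattern N-{P}-[ST]-{P} (N, then anything but P, then S or T, then anything but P) written as a regular expression. The test sequence is made up, and a regex scan is only one simple way to do this.

```python
import re

# N-{P}-[ST]-{P} as a regex; the lookahead lets overlapping matches be reported.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")

def find_motif(seq):
    """Return (start position, matched residues) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq)]

hits = find_motif("MKNGSAANPTQ")  # made-up sequence
print(hits)
```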