Bioinformatics definitions Flashcards

1
Q

database

A

a collection of related data
–> data = known facts
–> implicit meaning: suggested/understood without being directly stated
–> assume computerized
–> structured with some degree of interaction

2
Q

tables in a database

A

= data types

3
Q

each table has several

A

records

4
Q

a type is described by several

A

attributes

5
Q

a specific attribute that allows each record to be uniquely identified

A

= key identifier

6
Q

related data records are linked through

A

foreign keys
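
The database cards above (tables, records, attributes, key, foreign keys) can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, column names and the two inserted records are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE gene (
        gene_id INTEGER PRIMARY KEY,   -- key: uniquely identifies each record
        symbol  TEXT NOT NULL          -- attribute
    )""")
conn.execute("""
    CREATE TABLE protein (
        protein_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        gene_id    INTEGER NOT NULL REFERENCES gene(gene_id)  -- foreign key
    )""")

# Each INSERT adds one record to a table.
conn.execute("INSERT INTO gene VALUES (1, 'TP53')")
conn.execute("INSERT INTO protein VALUES (10, 'p53', 1)")

# Follow the foreign key to link related records from both tables.
rows = conn.execute("""
    SELECT protein.name, gene.symbol
    FROM protein JOIN gene ON protein.gene_id = gene.gene_id
""").fetchall()
print(rows)

# A record pointing at a non-existent gene is rejected.
try:
    conn.execute("INSERT INTO protein VALUES (11, 'orphan', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan record rejected:", rejected)
```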

7
Q

an alignment

A

= arrangement of sequences that shows where they are similar and where they differ
–> arrangement that results in an optimal score given a substitution matrix and gap penalty

8
Q

similarity

A

sequences are comparable on a set of criteria (can be measured)

9
Q

homology

A

sequences have a common ancestor (they either are or are not homologous)

10
Q

gaps

A

= representation of insertions and deletions in alignments

11
Q

substitution matrices

A

necessary for a biologically relevant calculation of the quality of the alignment

12
Q

PAM matrix

A

= point accepted mutation matrix

–> based on a model of protein evolution
–> provides a measure of the probability of an amino acid substitution, based on counting observed substitutions in a large database of evolutionarily similar sequences

13
Q

1 PAM unit

A

= evolutionary period required to produce an average of one accepted point mutation per 100 amino acids

14
Q

PAM 1 matrix

A

= matrix with probabilities for replacements during 1 PAM unit

15
Q

mutation score

A
log(measured mutation freq / expected mutation freq)

–> neg numbers: observed less freq than expected by chance
–> pos numbers: observed more freq than expected by chance
–> zero: observed as freq as expected
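
A minimal sketch of this log-odds score is shown below. The card does not state the base of the logarithm, so base 2 is an assumption here, and the frequencies are made-up numbers for illustration only.

```python
import math

def mutation_score(observed_freq, expected_freq, base=2):
    """Log-odds score: log(observed / expected). Base 2 is an assumption."""
    return math.log(observed_freq / expected_freq, base)

# Made-up frequencies, only to show the sign of the score.
more_than_chance = mutation_score(0.04, 0.01)   # positive
less_than_chance = mutation_score(0.005, 0.01)  # negative
as_expected = mutation_score(0.01, 0.01)        # zero
print(more_than_chance, less_than_chance, as_expected)
```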

16
Q

BLOSUM

A

= blocks substitution matrix
–> BLAST uses this matrix (BLOSUM62 for protein searches)
–> based on a much larger dataset than PAM to calculate mutation freq
–> based on the blocks database

17
Q

blocks database

A

families of proteins with similar biochemical functions
–> family members were aligned and blocks of high similarity were considered
–> within the blocks, sequences with similarity higher than a threshold were clustered
–> BLOSUM62: threshold = 62%

18
Q

BLAST mechanism

A
  1. breaks the query seq down into short words using a sliding window
    –> default word size: typically 3 for prot seq, 11 for nucleotide seq
  2. for each word in the query seq, BLAST generates a list of neighboring words. Similarity is determined using a scoring matrix (e.g. BLOSUM62), and only words that score above a certain threshold are considered neighbors
  3. BLAST searches the database for occurrences of neighboring words. Efficiency –> it looks for matches of short words rather than aligning the entire query
  4. extension: when a match is found in the database, BLAST attempts to extend the alignment in both directions to see if a high-scoring alignment can be formed –> local alignment in the regions surrounding the match
  5. if the extended alignment score exceeds a pre-defined threshold, it is reported as a significant match or hit. Hits are further analyzed and ranked using the bit score/E-value.
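
The first four steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not BLAST's actual implementation: a flat match/mismatch score stands in for BLOSUM62, the extension is ungapped and stops at the first mismatch, and all function names, thresholds and sequences are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def words(seq, k=3):
    """Step 1: slide a window of size k over the sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def score(a, b, match=2, mismatch=-1):
    """Toy stand-in for a substitution matrix such as BLOSUM62."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))

def neighbors(word, threshold=3):
    """Step 2: all words of the same length scoring >= threshold against `word`."""
    return [
        "".join(cand)
        for cand in product(AMINO_ACIDS, repeat=len(word))
        if score(word, cand) >= threshold
    ]

def seed_hits(query, subject, k=3, threshold=3):
    """Step 3: (query_pos, subject_pos) pairs where a neighboring word occurs."""
    index = {}
    for j, w in enumerate(words(subject, k)):
        index.setdefault(w, []).append(j)
    hits = []
    for i, w in enumerate(words(query, k)):
        for n in neighbors(w, threshold):
            hits.extend((i, j) for j in index.get(n, []))
    return hits

def extend(query, subject, i, j, k=3):
    """Step 4 (simplified): ungapped extension in both directions while residues match."""
    left = 0
    while i - left > 0 and j - left > 0 and query[i - left - 1] == subject[j - left - 1]:
        left += 1
    right = k
    while (i + right < len(query) and j + right < len(subject)
           and query[i + right] == subject[j + right]):
        right += 1
    return query[i - left:i + right], subject[j - left:j + right]

query, subject = "MKTAYIAK", "GGMKTVYIAKGG"
hits = seed_hits(query, subject)
print(hits)
print(extend(query, subject, 4, 6))
```

With these toy scores, a 3-letter word has itself plus every single-substitution variant as neighbors, which is why a seed can be found even where query and subject differ by one residue.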
19
Q

Raw score (S)

A

= the sum of substitution and gap scores
–> little to no meaning, is dependent on the scoring system
–> identical alignments with a diff subst matrix will yield a diff S

19
Q

Bit scores (S’)

A

= normalized raw scores

S’ = (λS - ln K) / ln 2
–> λ & K = scale parameters depending on the subst matrix and gap penalties
–> can be used to compare alignment scores from diff searches
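
The bit-score formula can be checked with a short sketch. The λ and K values used below are placeholder numbers chosen for illustration; real values depend on the substitution matrix and gap penalties of the search.

```python
import math

def bit_score(raw_score, lam, k):
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Placeholder parameters (assumed, not taken from a real search).
example = bit_score(raw_score=100, lam=0.267, k=0.041)
print(round(example, 1))
```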

20
Q

E-value

A

= expectation value

E = m · n · 2^(-S’)
–> m = length of the query
–> n = total length of sequences in the database

–> E depends on the size of the dataset
–> number of diff alignments with a score equivalent to or better than S that are expected to occur in a database search by chance

–> the lower the E value the more significant the score
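
A small sketch of the E-value formula, showing that the same bit score becomes less significant as the database grows. The lengths and the bit score are made-up numbers.

```python
def e_value(bit_score, query_len, db_len):
    """E = m * n * 2^(-S'), with m = query length and n = total database length."""
    return query_len * db_len * 2 ** (-bit_score)

# Same hit (bit score 30, query of 100 residues) against two database sizes.
small_db = e_value(30, 100, 1_000_000)
large_db = e_value(30, 100, 100_000_000)
print(small_db, large_db)
```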

21
Q

node degree

A

= total number of edges a node has

22
Q

node closeness

A

= how close a node is to all other nodes in the network

23
Q

node betweenness

A

measures how often a node lies on the shortest paths between other nodes in the network –> node's role as bridge

24
hub
= node with a very high degree
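
The node measures above (degree, closeness, betweenness, hub) can be computed by hand on a small network. The sketch below uses plain Python and a hypothetical five-node undirected network; it assumes the network is connected, uses (n - 1) / (sum of shortest-path distances) as the closeness definition, and counts betweenness over unordered node pairs without normalization.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy interaction network (undirected), as an adjacency list.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D"},
}

def degree(g, node):
    """Total number of edges a node has."""
    return len(g[node])

def shortest_paths(g, source):
    """BFS: distance to every node and the number of shortest paths reaching it."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                paths[v] += paths[u]
    return dist, paths

def closeness(g, node):
    """(n - 1) / sum of distances to all other nodes (connected network assumed)."""
    dist, _ = shortest_paths(g, node)
    return (len(g) - 1) / sum(dist.values())

def betweenness(g, node):
    """Sum over pairs (s, t) of the fraction of shortest s-t paths passing through node."""
    dist_v, paths_v = shortest_paths(g, node)
    total = 0.0
    for s, t in combinations([n for n in g if n != node], 2):
        dist_s, paths_s = shortest_paths(g, s)
        if dist_s[node] + dist_v[t] == dist_s[t]:
            total += paths_s[node] * paths_v[t] / paths_s[t]
    return total

hub = max(graph, key=lambda n: degree(graph, n))
print(hub, degree(graph, hub), closeness(graph, hub), betweenness(graph, hub))
```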
25
text mining
search literature for reference prot-prot interactions
26
orthology
predict interactions based on orthologous pairs in other species
27
domain pairs
predict interactions based on domains interacting in other prot
28
ontology
formal representation of concepts within a domain and the relationships between those concepts
29
GO structure
3 parts
--> cellular component
--> molecular function
--> biological process
a molecular function term refers to a single reaction or activity
sets of functions make up a biological process
a gene product may have several function or process terms
30
accession ID
unique identifier
31
alternate ID
when 2 or more terms are identical in meaning, they are merged into a single term --> all term IDs are preserved so that no info is lost
32
ICD
= International Statistical Classification of Diseases and Related Health Problems --> international standard for reporting diseases/health conditions --> used to keep track of safety and quality guidelines
33
GSEA
= gene set enrichment analysis --> given a list of genes found to be differentially expressed in an experiment comparing a phenotype to a control, what are the biological processes, cellular components and molecular functions that are implicated in this phenotype?
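
One simple way to approach this question for a plain gene list is an over-representation test based on the hypergeometric distribution. The sketch below shows that simpler stand-in, not the rank-based GSEA algorithm itself; the function name and the gene counts are hypothetical.

```python
from math import comb

def enrichment_p_value(n_background, n_in_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution.

    n_background: all genes measured
    n_in_term:    background genes annotated with the term (e.g. a GO term)
    n_selected:   differentially expressed genes
    n_overlap:    differentially expressed genes annotated with the term
    """
    upper = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    return sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    ) / total

# Made-up counts: 20000 genes, 200 in the term, 500 selected, 15 in both.
p = enrichment_p_value(20000, 200, 500, 15)
print(p)
```

A small p-value suggests the term is annotated to more of the selected genes than expected by chance; in practice the p-values are also corrected for testing many terms.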
34
motif
= short DNA/prot seq that is associated with biological functions
35
motif discovery
you want to find out whether there is a common motif in your seq
36
motif finding
you have a known motif and want to know if it is in your seq
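
Motif finding in its simplest form can be sketched as a pattern scan with a regular expression. The motif pattern and the DNA sequence below are made up for illustration, and real tools typically use position weight matrices rather than exact patterns.

```python
import re

def find_motif(sequence, pattern):
    """Return 0-based start positions of every (possibly overlapping) match."""
    # The lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence)]

# Illustrative pattern: TATA, then A or T, then A.
positions = find_motif("GGTATAAAGCTATATAT", "TATA[AT]A")
print(positions)
```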
37