Bioinformatics Flashcards

1
Q

What is a tool in bioinformatics?

A

A program that performs a specific task, for example reading, converting or filtering SAM, BAM or FASTQ files (these are file formats, not tools).

2
Q

How is the FASTQ format structured?

A

Each sequence is represented by four lines:
* Line 1 always begins with a ‘@’ character and is followed by a
sequence identifier and various information from the sequencing run
(this information is optional and varies between datasets).
* Line 2 is the raw sequence letters.
* Line 3 begins with a ‘+’ character and is optionally followed by the
same sequence identifier (and any description) again.
* Line 4 encodes the quality values for the sequence in Line 2, and
must contain the same number of symbols as letters in the
sequence.
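
A minimal Python sketch of reading the four-line records described above. It assumes unwrapped records (exactly four lines each) and Phred+33 quality encoding, the common modern convention; `parse_fastq` and `phred33` are hypothetical helper names, not a library API. For real data an established parser (for example Biopython's `SeqIO`) is the safer choice.

```python
import io

def parse_fastq(handle):
    """Yield (identifier, sequence, quality) for each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")    # line 2: raw sequence letters
        plus = handle.readline().rstrip("\r\n")   # line 3: '+' (optional repeat of ID)
        qual = handle.readline().rstrip("\r\n")   # line 4: quality string
        header = header.rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"line 1 must start with '@': {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"line 3 must start with '+': {plus!r}")
        if len(qual) != len(seq):
            raise ValueError("quality string must be as long as the sequence")
        yield header[1:], seq, qual

def phred33(qual):
    """Convert a quality string to Phred scores, assuming Phred+33 (ASCII offset 33)."""
    return [ord(ch) - 33 for ch in qual]

# Hypothetical single-record example.
example = "@read1 lane=1\nACGT\n+\nII#I\n"
records = list(parse_fastq(io.StringIO(example)))
```

Here `records` holds one tuple, and `phred33("II#I")` gives the per-base scores for line 4.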

3
Q

What problems can you run into in transcriptome alignment?

A

When exons are spliced together in the mRNA, reads that span an exon-exon junction contain sequence that does not occur contiguously in the genome. Such reads may fail to align to the chromosome, or they align with large gaps (corresponding to the introns).

4
Q

How can you avoid gaps in the alignment of sequences?

A

Start aligning a sequence; at the point where it no longer aligns, the aligner stops and splits the alignment in two, so that it can search for the remaining part somewhere else (split-read or spliced alignment).
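
The split-and-search idea can be sketched in Python as a greedy search for the longest prefix of the remaining read that occurs exactly in the reference, loosely resembling the "maximal mappable prefix" seed search used by spliced aligners such as STAR. This is an illustrative brute-force sketch: `split_align` and the toy sequences are hypothetical, and real aligners use a genome index and tolerate mismatches.

```python
def split_align(read, reference, min_len=4):
    """Greedy split-read alignment.

    Returns a list of (read_offset, ref_pos, length) segments, or None if some
    part of the read cannot be placed with at least min_len exact bases.
    """
    segments = []
    start = 0
    while start < len(read):
        best = None
        # Try the longest remaining piece first, then shorten it one base at a time.
        for length in range(len(read) - start, min_len - 1, -1):
            pos = reference.find(read[start:start + length])
            if pos != -1:
                best = (start, pos, length)
                break
        if best is None:
            return None
        segments.append(best)
        start += best[2]  # continue with the part that has not aligned yet
    return segments

# Hypothetical toy gene: exon1 - intron - exon2.
exon1, intron, exon2 = "AAACCCGGGT", "GTATATATATAG", "TTTGGGCCCA"
reference = exon1 + intron + exon2
read = exon1 + exon2  # spliced read spanning the exon-exon junction
```

The whole read is not found contiguously (`reference.find(read)` is -1), but `split_align(read, reference)` places it as two segments, and the distance between them in the reference (12 bases) is the intron.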

5
Q

Why do you index a genome?

A

Indexing a genome can be explained by analogy with indexing a book. If you want to know on which page a certain word appears or a chapter begins, it is much faster to look it up in a pre-built index than to go through every page of the book until you find it. The same goes for alignments: indices allow the aligner to narrow down the potential origin of a query sequence within the genome, saving both time and memory.
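
A toy illustration of the index analogy, assuming a simple hash table of k-mers. This is a swapped-in, simplified structure; aligners such as BWA and Bowtie use compressed Burrows-Wheeler/FM-index structures instead. The index is built once, and afterwards only the positions listed for the read's first k-mer are checked rather than the whole genome. `build_kmer_index` and `lookup` are hypothetical names.

```python
from collections import defaultdict

def build_kmer_index(genome, k):
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index

def lookup(index, genome, read, k):
    """Return start positions where the read matches the genome exactly.

    Only positions that share the read's first k-mer (the 'seed') are verified.
    A read shorter than k has no full seed and returns no hits.
    """
    seed = read[:k]
    return [pos for pos in index.get(seed, [])
            if genome[pos:pos + len(read)] == read]

genome = "ACGTACGTTTGACGTA"  # hypothetical toy genome
idx = build_kmer_index(genome, 4)
```

Here `lookup(idx, genome, "ACGTT", 4)` checks only the three positions of the seed `ACGT` instead of every position in the genome.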

6
Q

How do you perform sequence alignment to a reference?

A
  • Align to reference genome
    (DNA or RNA) or
    transcriptome (RNA)
  • Find out where your
    sequences match the
    reference.
  • Analyse the genomic regions
    where sequences
    accumulate
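
As a minimal sketch of the last step, the Python below computes per-base coverage from alignment intervals and reports regions where coverage reaches a chosen depth. It assumes the alignments have already been reduced to 0-based, half-open `(start, end)` intervals on one chromosome; `coverage`, `regions_above` and the `min_depth` threshold are hypothetical.

```python
def coverage(alignments, chrom_length):
    """Per-base read depth from (start, end) intervals using a difference array."""
    diff = [0] * (chrom_length + 1)
    for start, end in alignments:
        diff[start] += 1   # a read begins here
        diff[end] -= 1     # and stops covering bases from here on
    cov, running = [], 0
    for i in range(chrom_length):
        running += diff[i]
        cov.append(running)
    return cov

def regions_above(cov, min_depth):
    """Return (start, end) half-open regions where depth >= min_depth."""
    regions, start = [], None
    for i, depth in enumerate(cov):
        if depth >= min_depth and start is None:
            start = i
        elif depth < min_depth and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(cov)))
    return regions

cov = coverage([(2, 6), (3, 7), (4, 8)], 10)  # hypothetical alignments
```

With these three overlapping reads, `regions_above(cov, 2)` returns the stretch where at least two reads pile up.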
7
Q

What is featureCounts?

A

featureCounts:
* A software program for counting mapped reads that overlap genomic
features such as genes, exons, promoters and genomic bins
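
The core idea of read summarization can be sketched as interval-overlap counting. This is a simplified illustration, not featureCounts' implementation: it assumes 0-based half-open coordinates, checks every feature for every read (featureCounts is far more efficient), and leaves reads overlapping more than one feature unassigned, similar in spirit to featureCounts' default handling of ambiguous reads. `count_reads` and the example features are hypothetical.

```python
def count_reads(features, reads):
    """Count reads per feature.

    features: dict name -> (chrom, start, end); reads: list of (chrom, start, end).
    Returns (counts, n_ambiguous, n_no_feature).
    """
    counts = {name: 0 for name in features}
    ambiguous = no_feature = 0
    for chrom, rstart, rend in reads:
        # Half-open intervals overlap if each starts before the other ends.
        hits = [name for name, (fchrom, fstart, fend) in features.items()
                if fchrom == chrom and rstart < fend and fstart < rend]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            ambiguous += 1      # overlaps several features: not counted
        else:
            no_feature += 1     # overlaps no annotated feature
    return counts, ambiguous, no_feature

features = {"geneA": ("chr1", 100, 200),
            "geneB": ("chr1", 150, 300),
            "geneC": ("chr2", 0, 50)}
reads = [("chr1", 110, 140), ("chr1", 160, 190), ("chr1", 250, 280),
         ("chr2", 10, 20), ("chr3", 0, 10)]
result = count_reads(features, reads)
```

The second read falls in the region where geneA and geneB overlap, so it is reported as ambiguous rather than counted twice.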

8
Q

What is important to keep in mind in experimental design?

A
  • Typical experiment has two conditions:
    Control and Experimental (for example
    samples from healthy and diseased
    individuals)
  • Technical replicate: Same biological
    sample in different runs
  • Biological replicate: Sample from
    different biological source (for example
    different patient) in different runs
  • Due to the good technical
    reproducibility of RNA-Seq, biological
    replicates are more important than
    technical replicates
  • At least three biological replicates are
    recommended for proper statistical
    testing
9
Q

What is borrowing variance?

A
  • Use the Negative Binomial Distribution
    (NBD)
  • Problem: In the NBD the variance is not
    determined by the mean alone; it also depends
    on a dispersion parameter that is hard to
    estimate per feature from few replicates
  • Trick: Use the variance of other
    features in the dataset with similar
    expression level to estimate the
    variance for each feature
  • Borrow variance from similar features
  • Solution to the overdispersion problem
    for datasets with few replicates
  • Leads to more robust estimates of
    significance. Fewer false positives.
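
Under the NBD, a common parameterization is Var(Y) = μ + αμ², where α is the dispersion; with α = 0 this reduces to the Poisson case, where the variance equals the mean. The Python below is a deliberately simplified sketch of borrowing: it estimates α per gene by the method of moments and then pulls it partway towards the average of the `k` genes with the most similar mean expression. This neighbour-averaging scheme and the names `borrowed_dispersions`, `k` and `weight` are hypothetical choices; tools such as DESeq2 and edgeR instead use empirical Bayes shrinkage towards a fitted mean-dispersion trend.

```python
import math
from statistics import mean, variance

def raw_dispersion(counts):
    """Method-of-moments NB dispersion: Var = mu + alpha*mu^2, so alpha = (Var - mu)/mu^2.

    Needs at least two replicates; negative estimates are clipped to 0.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    return max((variance(counts) - mu) / mu ** 2, 0.0)

def borrowed_dispersions(count_table, k=2, weight=0.5):
    """Shrink each gene's dispersion towards that of genes with similar mean expression."""
    genes = list(count_table)
    mus = {g: mean(count_table[g]) for g in genes}
    raw = {g: raw_dispersion(count_table[g]) for g in genes}
    result = {}
    for g in genes:
        # Rank the other genes by similarity of (log) mean expression.
        others = sorted((h for h in genes if h != g),
                        key=lambda h: abs(math.log1p(mus[h]) - math.log1p(mus[g])))
        neighbours = others[:k]
        prior = mean(raw[h] for h in neighbours) if neighbours else raw[g]
        result[g] = weight * raw[g] + (1 - weight) * prior
    return result

# Hypothetical counts for four genes in three replicates.
count_table = {"g1": [10, 10, 10], "g2": [100, 102, 98],
               "g3": [100, 160, 40], "g4": [10, 12, 8]}
disp = borrowed_dispersions(count_table)
```

Gene `g3` looks extremely variable on its own, but its similarly expressed neighbour `g2` is not, so the borrowed estimate is pulled down, which is how borrowing reduces false positives driven by a noisy per-gene estimate.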
10
Q

What is PCA?

A

Used for multi-dimensional experimental data (many variables, e.g. genes, per sample)
Find the directions with the most variation in the data (PCs: Principal Components)
Project the data onto the plane (axes) defined by the first two principal components (PC1 and PC2)
Gives a map of your data and of the relations between samples and variables
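
A dependency-free Python sketch of these steps, assuming rows are samples and columns are variables (e.g. genes): centre the data, form the covariance matrix, find its top eigenvectors by power iteration with deflation, and project each sample onto them. Power iteration is a swapped-in choice for illustration; in practice PCA is usually computed with an SVD routine (for example `numpy.linalg.svd` or R's `prcomp`). All function names are hypothetical.

```python
import math
import random

def center(data):
    """Subtract each column's mean so the data is centred on the origin."""
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in data]

def covariance(X):
    """Sample covariance matrix of centred data X (rows = samples)."""
    n, m = len(X), len(X[0])
    return [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1) for b in range(m)]
            for a in range(m)]

def top_eigenvector(C, iters=500, seed=0):
    """Power iteration: repeated multiplication by C converges to its dominant eigenvector."""
    rng = random.Random(seed)
    m = len(C)
    v = [rng.random() for _ in range(m)]
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(C[a][b] * v[b] for b in range(m)) for a in range(m))
    return v, eigenvalue

def pca(data, n_components=2):
    """Return component directions (sign is arbitrary) and each sample's scores."""
    X = center(data)
    C = covariance(X)
    components = []
    for _ in range(n_components):
        v, lam = top_eigenvector(C)
        components.append(v)
        # Deflation: remove the variance already explained by this component.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(len(C))] for a in range(len(C))]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components] for row in X]
    return components, scores

# Hypothetical 4 samples x 2 variables; most variation lies along the first variable.
components, scores = pca([[2, 0], [-2, 0], [0, 1], [0, -1]])
```

Plotting the two columns of `scores` against each other gives the PC1/PC2 map of the samples.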

11
Q

What is a GSEA test?

A

Single sample GSEA (ssGSEA): GSEA
performed on each individual sample in a
dataset
* Rank genes in each sample according to
expression value (highest first)
* Positive score: Geneset genes enriched at the
top of ranked list
* Negative score: Geneset genes enriched at the
bottom of ranked list
* A positive score indicates that the sample has
the property the geneset represents
* Here: Enriched for Non-Canonical Wnt-pathway
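
The ranking-and-scoring idea can be sketched with the classic unweighted running-sum enrichment score: walk down the ranked list, step up at geneset genes and down at other genes, and report the maximum deviation from zero. This is a simplified stand-in, not the exact ssGSEA statistic (which, as commonly implemented, weights genes by rank and sums over the whole walk); `enrichment_score` is a hypothetical name and the expression values are made up.

```python
def enrichment_score(expression, gene_set):
    """Running-sum enrichment score for one sample.

    expression: dict gene -> expression value; gene_set: set of gene names.
    Positive when geneset genes sit at the top of the ranking, negative at the bottom.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)  # highest first
    in_set = [g in gene_set for g in ranked]
    n_hit = sum(in_set)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = max_dev = 0.0
    for hit in in_set:
        # Steps are scaled so the walk ends at zero after all genes.
        running += 1.0 / n_hit if hit else -1.0 / n_miss
        if abs(running) > abs(max_dev):
            max_dev = running
    return max_dev

sample = {"A": 10.0, "B": 9.0, "C": 1.0, "D": 0.5}  # hypothetical sample
```

For this sample, a geneset of the two most highly expressed genes scores positive and a geneset of the two lowest scores negative, matching the sign interpretation on the card.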

12
Q

What is the concept behind ChIP-seq?

A
  • Crosslink cells using formaldehyde ("freeze" the
    protein-DNA interactions)
  • Fragment the DNA (200-500 bp fragment length)
  • Use an antibody against the transcription factor (TF) of interest to "fish" for
    DNA fragments bound by the TF
  • Wash off the proteins and isolate the DNA fragments
  • Sequence the DNA fragments
  • Single-end sequencing
  • 75-150 bp reads are the current standard
  ChIP-seq profiles:
  • Library of tags (reads) with constant lengths
  • Typical number of tags in current studies: 5-50 million
  • Sequenced tags are aligned to the reference genome
  • Tags are enriched (form peaks) at genomic positions where the DNA was bound by the TF of interest
  • Can generate from 100 to 50 000 peaks, depending on the factor and the experiment
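
A minimal Python sketch of the peak idea: count aligned tag positions in fixed-size bins and report bins with at least `fold` times the genome-wide average count. The bin size, fold threshold and `call_peaks` are hypothetical illustration choices; dedicated peak callers such as MACS model the local background statistically instead of using a fixed fold cut-off.

```python
def call_peaks(tag_positions, genome_length, bin_size=100, fold=3.0):
    """Return (start, end, count) for bins enriched over the average tag density.

    Assumes every tag position is a 0-based coordinate in [0, genome_length).
    """
    n_bins = (genome_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in tag_positions:
        counts[pos // bin_size] += 1
    background = len(tag_positions) / n_bins  # average tags per bin
    return [(i * bin_size, min((i + 1) * bin_size, genome_length), c)
            for i, c in enumerate(counts)
            if c > 0 and c >= fold * background]

# Hypothetical data: 20 tags pile up in one region, plus 3 scattered tags.
tags = list(range(300, 320)) + [10, 550, 870]
peaks = call_peaks(tags, genome_length=1000)
```

Only the bin holding the pile-up exceeds three times the average density (2.3 tags per bin here), so it is the single reported peak.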