Data analysis Flashcards

Question

What is the most appropriate test used to analyse data?

Answer 1

Pairwise analysis using t-tests or ANOVA is the most appropriate

Answer 2

Determining the fold-change cutoffs (up or down) to decide what is truly significant

Answer 3

Ranking genes according to the evidence of difference in gene expression
Score the differences using fold changes, t-statistics or a combination
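The ranking described in this answer can be sketched in a few lines of Python. This is only an illustration: the gene names, the replicate values, the pseudocount and the choice of Welch's t-statistic are all assumptions, not something the card specifies.

```python
import math
from statistics import mean, variance

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))

def welch_t(treated, control):
    """Welch's t-statistic for two groups with possibly unequal variances."""
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se

def rank_genes(expression):
    """Return (gene, log2FC, t) rows, strongest evidence of difference first."""
    scored = [
        (gene, log2_fold_change(treated, control), welch_t(treated, control))
        for gene, (treated, control) in expression.items()
    ]
    return sorted(scored, key=lambda row: abs(row[2]), reverse=True)

# Hypothetical toy data: gene -> (treated replicates, control replicates)
toy = {
    "geneA": ([100.0, 110.0, 105.0], [50.0, 55.0, 52.0]),
    "geneB": ([20.0, 25.0, 22.0], [21.0, 24.0, 23.0]),
    "geneC": ([5.0, 7.0, 6.0], [30.0, 28.0, 33.0]),
}

for gene, lfc, t in rank_genes(toy):
    print(f"{gene}\tlog2FC={lfc:+.2f}\tt={t:+.2f}")
```

Sorting by the absolute t-statistic is one possible scoring choice; sorting by absolute fold change, or by a combination of the two, would fit the card equally well.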

Answer 4

Heat maps
Volcano plots

Answer 5

Determines how deleterious the mutation is
The smaller the p-value, the more deleterious

Answer 6

The magnitude and direction of the fold-change

Answer 7

A law that describes how computing power doubles roughly every two years (strictly, the number of transistors on a chip)
Technologies that keep pace with Moore's law are cutting edge

Answer 8

No
The computer power is weaker than the sequencing power
We have the sequence, but don't know how to interpret it

Answer 9

Genomic footprinting
Epigenetic profiling
Whole genome sequencing

Answer 10

Transcriptome
RNA footprinting
Transcriptome expression profiling

Answer 11

Blood exome

Answer 12

Looking at individual tissue types with certain diseases to infer the function of mutated genes
Since post-transcriptional changes in RNA are not present in DNA

Answer 13

To integrate and interpret all the information together
To look for the clinical application of data

Answer 14

Primary analysis
Secondary analysis
Tertiary analysis

Answer 15

Determine the run/sample quality by looking at the quality values of the visual information (colours)

Answer 16

Determine the sample/information quality by aligning the sequences

Answer 17

Data interpretation

Answer 18

Top line = identifier: the position on the flow cell and the read it comes from
Second line = the sequence itself
Third line = a "+" separator, optionally followed by the identifier again
Bottom line = quality scores
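A minimal reader for this four-line layout might look like the sketch below. It assumes the standard FASTQ convention in which the first line starts with "@" and the third line starts with "+"; the record contents and the names `FastqRecord` and `read_fastq` are made up for illustration.

```python
import io
from typing import Iterator, NamedTuple

class FastqRecord(NamedTuple):
    identifier: str  # line 1 without the leading "@"
    sequence: str    # line 2
    quality: str     # line 4, one character per base

def read_fastq(handle) -> Iterator[FastqRecord]:
    """Yield records from a text handle, four lines at a time."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)

# Hypothetical two-record file held in memory
example = io.StringIO("@read1\nGATTACA\n+\nIIIIII#\n@read2\nACGT\n+read2\n5555\n")
for record in read_fastq(example):
    print(record.identifier, record.sequence, record.quality)
```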

Answer 19

How confident the software is that this is the specific base
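The card does not name the scale, but base-call confidence is usually reported as a Phred score, Q = -10 log10(P), where P is the probability that the call is wrong. Assuming that convention, the conversion can be sketched as:

```python
import math

def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

def error_probability_to_phred(p: float) -> float:
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

for q in (10, 20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:.4f}, accuracy {100 * (1 - p):.2f}%")
```

Under this scale, every 10 points of quality corresponds to a tenfold drop in the chance of a wrong base.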

Answer 20

Count number
Number of sequences that line up with the reference

Answer 21

Numerical values

Answer 22

Count number
Counting the number of reads that map to each gene using programs
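In practice this counting is done by dedicated programs such as HTSeq-count or featureCounts. The idea itself can be sketched in plain Python, assuming half-open coordinates and a simple rule (an assumption here) that a read is counted only when it overlaps exactly one gene. The gene and read coordinates are invented.

```python
def count_reads_per_gene(reads, genes):
    """Count reads overlapping each gene.

    reads: iterable of (chrom, start, end) tuples, half-open coordinates
    genes: dict of gene name -> (chrom, start, end)
    Reads hitting no gene or more than one gene are skipped (assumed rule).
    """
    counts = {name: 0 for name in genes}
    for chrom, start, end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start < g_end and end > g_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts

# Hypothetical annotation and aligned reads
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 300, 400)}
reads = [
    ("chr1", 110, 160),  # inside geneA
    ("chr1", 190, 240),  # overlaps the end of geneA
    ("chr1", 250, 290),  # intergenic, not counted
    ("chr1", 350, 400),  # inside geneB
    ("chr2", 110, 160),  # different chromosome, not counted
]
print(count_reads_per_gene(reads, genes))
```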

Answer 23

Text-based formats for storing both a biological sequence and its corresponding quality scores

Answer 24

Character showing the sequence letter and quality score
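To make the pairing of sequence letters and quality characters concrete, the sketch below decodes each quality character into a number. It assumes the common Phred+33 ASCII encoding (score = character code minus 33); older files may use a different offset, and the function name is hypothetical.

```python
def per_base_qualities(sequence: str, quality: str, offset: int = 33):
    """Pair each base with its numeric quality score.

    offset=33 assumes Phred+33 encoding, which is an assumption here.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings must be the same length")
    return [(base, ord(char) - offset) for base, char in zip(sequence, quality)]

# Hypothetical read: 'I' decodes to 40, '5' to 20 and '#' to 2
for base, score in per_base_qualities("GAT", "I5#"):
    print(base, score)
```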

Answer 25

1. Base calling
2. Variant calling
3. Annotation
4. Filtering
5. Reporting

Answer 26

Aligning the sequence to the reference to compare

Answer 27

QC alignment
Alignment

Answer 28

Percentage of reads properly or uniquely mapped
Among the mapped reads, the percentage of reads in exonic, intronic and intergenic regions
5' or 3' bias

Answer 29

Integrative Genomics Viewer (IGV)
Software that lets us visualise the reads on the genome, highlighting potential variants

Answer 30

Expression levels in IGV

Answer 31

The ratio of the commonality of alterations in the host cell
If it is a SNP = 50/50
If it is a mutation = less frequent
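The 50/50 rule of thumb can be illustrated by computing the fraction of reads that carry the alternative allele at a site. The window used below to decide what counts as "about 50/50" is an arbitrary assumption, and the labels are hypothetical.

```python
def variant_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that carry the alternative allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total

def interpret_fraction(fraction: float, het_low: float = 0.4, het_high: float = 0.6) -> str:
    """Rough label for an allele fraction; the 0.4-0.6 window is an assumption."""
    if het_low <= fraction <= het_high:
        return "about 50/50, consistent with a SNP"
    if fraction < het_low:
        return "less frequent, consistent with a mutation"
    return "above the 50/50 range"

# Hypothetical read counts: (reference reads, alternative reads)
for ref, alt in [(24, 26), (45, 5)]:
    fraction = variant_allele_fraction(ref, alt)
    print(f"ref={ref} alt={alt} fraction={fraction:.2f}: {interpret_fraction(fraction)}")
```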

Answer 32

The gene expression levels
The length of the gene

Answer 33

Longer genes naturally have more reads, so the read count does not always reflect the expression level

Answer 34

RPKM
Reads per kilobase per million mapped reads

Answer 35

Counts of mapped fragments / (total mapped fragments (millions) × exon length of transcript (kb))
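As a worked example of this normalisation, the sketch below divides a raw count by both the transcript length in kilobases and the library size in millions. The function name and the numbers are illustrative assumptions.

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    length_kb = exon_length_bp / 1_000
    library_millions = total_mapped_reads / 1_000_000
    return read_count / (length_kb * library_millions)

# Hypothetical library of 10 million mapped reads.
# The longer gene has twice the reads but the same normalised expression.
print(rpkm(500, 2_000, 10_000_000))    # short gene
print(rpkm(1_000, 4_000, 10_000_000))  # long gene
```

Both calls give 25.0, which shows why raw read counts alone would overstate the expression of the longer gene.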

Answer 36

Fold changes in protein expression
If the gene is expressed at higher levels than the protein = splice variant

Answer 37

Rather than looking at the level of expression, we look at somatic mutations

Answer 38

Allele counting
Probabilistic methods - use a Bayesian model to statistically quantify the number of allelic variants
Heuristic approach - based on thresholds

Answer 39

Looks at the number of reads to determine whether there is a variant or not
Heuristic approach
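A threshold-based call of this kind can be sketched as follows. The three cutoffs (minimum depth, minimum number of supporting reads, minimum allele fraction) are arbitrary assumptions chosen for illustration, not values from any particular tool.

```python
def call_variant_heuristic(
    ref_reads: int,
    alt_reads: int,
    min_depth: int = 10,            # assumed cutoff
    min_alt_reads: int = 3,         # assumed cutoff
    min_alt_fraction: float = 0.2,  # assumed cutoff
) -> str:
    """Call a site from read counts using fixed thresholds."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "no-call"  # too few reads to decide either way
    if alt_reads >= min_alt_reads and alt_reads / depth >= min_alt_fraction:
        return "variant"
    return "reference"

# Hypothetical sites: (reference reads, alternative reads)
for ref, alt in [(5, 2), (30, 10), (38, 2)]:
    print(ref, alt, call_variant_heuristic(ref, alt))
```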

Answer 40

If tumour and normal match the reference = reference
If tumour and normal do not match the reference = germline

Answer 41

Calculate the significance of the allele frequency difference by Fisher's exact test
If the difference is significant:
- if normal matches reference = somatic
- if normal is heterozygous = LOH
- if normal and tumour are both variants and different = unknown
If the difference is not significant:
- combine the tumour and normal read counts for each allele, recalculate the p-value and call germline
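The decision tree in this answer can be sketched with a hand-written two-sided Fisher's exact test. Several things here are assumptions made for illustration: the genotype thresholds, the 0.05 significance level and all function names. The "combine and recalculate" step is also simplified to a direct germline call.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one
    total = sum(p for p in map(prob, range(low, high + 1)) if p <= observed * (1 + 1e-7))
    return min(1.0, total)

def genotype(ref: int, alt: int, min_alt: float = 0.2, hom_alt: float = 0.8) -> str:
    """Crude genotype from the alt-allele fraction (thresholds are assumptions)."""
    fraction = alt / (ref + alt)
    if fraction < min_alt:
        return "ref"
    return "hom" if fraction >= hom_alt else "het"

def classify_site(n_ref, n_alt, t_ref, t_alt, alpha: float = 0.05) -> str:
    """Classify one site from normal and tumour read counts."""
    normal, tumour = genotype(n_ref, n_alt), genotype(t_ref, t_alt)
    if normal == "ref" and tumour == "ref":
        return "reference"
    if normal == tumour:
        return "germline"  # both carry the same variant genotype
    p_value = fisher_exact_two_sided(t_ref, t_alt, n_ref, n_alt)
    if p_value >= alpha:
        return "germline"  # simplified: the source recalculates on pooled counts
    if normal == "ref":
        return "somatic"
    if normal == "het":
        return "LOH"
    return "unknown"

# Hypothetical read counts: normal ref/alt, tumour ref/alt
print(classify_site(40, 0, 25, 15))
print(classify_site(20, 20, 2, 38))
print(classify_site(9, 1, 7, 3))
```

The test is written out with `math.comb` only to keep the sketch free of third-party dependencies; a statistics library would normally be used instead.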

Answer 42

Deletion
Novel sequence insertion
Mobile-element insertion
Interspersed duplication
Tandem duplication
Inversion
Translocation

Answer 43

SeattleSeq
Oncotator
ANNOVAR

Answer 44

SNVs
Small indels
Both common and novel

Answer 45

Human genomic point mutations and indels with relevant data to cancer researchers

Answer 46

Genetic variants detected from diverse genomes

Answer 47

Annotates the SNPs and informs as to how likely they are to be deleterious

(71 cards)