Exam 3 Lec 4: The Cancer Genome Atlas Flashcards
What is the biggest data project and what is its purpose?
The NIH's Cancer Genome Atlas (TCGA). Its purpose is to help us understand cancer by comparing samples from many pts.
Is cBioPortal info processed or unprocessed?
LOTS of processing
It is a public website, so it can't have a whole pt genome.
What are raw sequencing files and where can you access them?
They require high cybersecurity because they could be used to identify a person. You can access them on the Genomic Data Commons, which requires an application.
Explain this Article with Dr. B in it: Big genes are big mutation targets: a connection to cancerous spherical cells.
When a gene is large, you get a lot of mutations. He compares high and low mutation frequency (large and small genes) and found that a lot of large genes had more mutations, but a lot of these are common mutations that don't actually cause the cancer. Mutagenesis is relatively random. Therefore, large genes (like the ones that encode cytoskeletal proteins) can have lots of mutations.
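The "bigger target" idea can be sketched with simple arithmetic. This is only an illustration, assuming a uniform per-base mutation rate; the gene names, lengths, and rate are made up and are not taken from the article.

```python
# Hypothetical sketch: if mutagenesis is roughly random, the expected number
# of mutations in a gene scales with the length of its coding region.

def expected_mutations(coding_length_bp, rate_per_bp):
    """Expected mutation count under an assumed uniform per-base rate."""
    return coding_length_bp * rate_per_bp

# Illustrative values only (assumptions, not data from the article).
RATE_PER_BP = 1e-5
genes = {"small_gene": 1_000, "large_cytoskeletal_gene": 100_000}

for name, length in genes.items():
    print(name, expected_mutations(length, RATE_PER_BP))
```

A gene that is 100 times longer is expected to collect about 100 times more mutations, whether or not any of them drive the cancer.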
Article w/Dr. B in it
What type of TCGA files were analyzed in the article?
Unprocessed
Article w/Dr. B in it
What principle or process of mutagenesis is revealed by the ratios of silent to amino acid altering mutations in Table 1 of the article?
The ratio asks whether a mutation changed the amino acid or was silent (no amino acid change), meaning that it likely didn't cause the cancer.
HIGH FREQ = more passenger mutations (higher ratio of silent mutations to amino acid changes)
LOW FREQ = mutations happening in exactly the right spot (all the mutations needed to cause cancer)
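A minimal sketch of how a silent-to-amino-acid-altering ratio could be computed for a group of mutations. The labels ("silent", "missense", "nonsense") and the two example lists are hypothetical; they are not the values in Table 1.

```python
from collections import Counter

def silent_to_altering_ratio(mutation_types):
    """Ratio of silent mutations to amino-acid-altering mutations."""
    counts = Counter(mutation_types)
    silent = counts["silent"]
    # Treat missense and nonsense as amino-acid-altering (an assumption).
    altering = counts["missense"] + counts["nonsense"]
    if altering == 0:
        raise ValueError("no amino-acid-altering mutations in this group")
    return silent / altering

# Made-up example groups, for illustration only.
high_freq_group = ["silent", "silent", "missense", "silent", "missense"]
low_freq_group = ["missense", "missense", "silent", "nonsense"]

print(silent_to_altering_ratio(high_freq_group))
print(silent_to_altering_ratio(low_freq_group))
```

A higher ratio means a larger share of the mutations are silent, which fits the idea of more passenger mutations.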
Article w/Dr. B in it
What is the potential medical or clinical significance of large coding region mutations in cancer?
Large coding regions are more commonly mutated.
Article w/Dr. B in it
Table 1: Ratio of silent mutation to AA changes. What can be concluded?
You can see there are a larger number of "high frequency mutations", but among these there are more silent mutations.
Silent mutation = nucleotide change with no change in the AA; it did not cause the cancer.
Article w/Dr. B in it
Table 2: Ratios of high to low frequency mutation groups, for average number of mutated genes in the various gene sets. What can be concluded?
As seen for colon cancer, out of 53, 9.1 were oncoproteins. The oncoprotein mutations are over-represented in the lower frequency groups for colon and lung cancer.
Mutations have a very large random component. When there are lots of mutations, that doesn't necessarily mean there will be a phenotypic effect (it doesn't cause cancer).
If you have very few mutations, a higher proportion of those mutations will be driving the cancer.
How does one obtain access to information available for the large collection of cancer samples?
Through processed and unprocessed data
What is processed data?
Lots of types of data representing tumor biopsies (no normal tissue) and patient info. You get it from cBioPortal and download it as Excel or text files.
- PUBLIC ACCESS, download Excel to look carefully
- Can see displays and recover the processed data
- Only tumor biopsy data is available; not data on normal tissue
- **Can't get SNPs** out of processed data because the normal tissue is not available
You can't compare the genome of blood vs. the genome of cancer with processed data.
What is unprocessed data?
Reads from sequencing machines, representing DNA or RNA; **tumors and normal tissue**. You can find it on the Genomic Data Commons.
- NEED TO APPLY, controlled access and download raw data from NIH
- If looking for a splice variant, would need to look at the unprocessed data
- To see the original sequencing data, you need the unprocessed data
- Need to be able to use code or write code to help look through unprocessed data
You get the actual reads.
Processed (or curated or annotated): What are the types of data?
- Somatic amino acid substitutions
-ONLY CODON, not the surrounding nucleotides (that's controlled)
- Transcriptome (RNASeq values, NOT reads); RNA microarray results (NOT NUMBER OF READS)
- Whole genome methylation of cytosines (methylome)
- Copy number variation (keep in mind N-myc in neuroblastoma and DHFR)
-WILL NOT GET SEQUENCE INFORMATION
-MYC = ONCOPROTEIN
-DHFR = generating thymine
- microRNA
- Clinical information (e.g., the tumor came from a person who smoked, age, tx, etc.)
YOU GET FROM http://www.cbioportal.org/
What are the limitations to processed data?
- No processed data from whole genome sequences; can’t see intron mutations (it may be interesting to see if a particular mutation affects splicing)
- Some holding back of information to protect patient security
- NIH does not provide a raw DNA sequence for cancers; however, the amino acid sequences are freely available
- Nothing in the processed data represents an original DNA or RNA sequence because that information could be used to identify a patient
Tumor suppressor genes
MSI = microsatellite instability = VNTR instability (more repeats due to strand slippage not being repaired by MMR) = MMR defect.
MSI refers to measuring MMR defects.
If there is a mismatch repair defect, these repeats are altered during strand slippage.
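A toy sketch of the idea behind MSI: count how many back-to-back copies of a repeat unit appear in a normal sequence vs. a tumor sequence and see whether the number changed. The sequences and the "CA" repeat unit are made up, and real MSI testing is more involved than this.

```python
import re

def longest_repeat_run(sequence, unit):
    """Largest number of back-to-back copies of `unit` found in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

# Hypothetical microsatellite locus: same flanks, different repeat counts.
normal_seq = "GGT" + "CA" * 10 + "TTG"
tumor_seq = "GGT" + "CA" * 13 + "TTG"

normal_repeats = longest_repeat_run(normal_seq, "CA")
tumor_repeats = longest_repeat_run(tumor_seq, "CA")

# A changed repeat count at the locus suggests unrepaired strand slippage.
repeat_count_changed = normal_repeats != tumor_repeats
print(normal_repeats, tumor_repeats, repeat_count_changed)
```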
What does this figure represent?
Small section of the results from cBioPortal
Each bar: tumor sample
Red - MSI high = MMR defect
Dark green - missense mutation, putative driver = drives cancer
Light green - putative passenger: DOES NOT cause cancer
What does this figure represent?
A portion of an Excel file from the patient order download, showing the first several patients with MSI (microsatellite instability).
The last number, 407, is the number of different pts with MSI (so with an MMR defect).
What does this figure represent?
A portion of an Excel file from cBioPortal, Colon adenocarcinoma mutations (DFCI). You can basically see what mutations each pt got, compare them, and form a hypothesis. Here you see the patient number on the left side, genes on the top, and mutations in the middle of the table. You COMPARE ALL THE DATA IN AN EXCEL FILE AND TEST A SPECIFIC HYPOTHESIS.
So MSI = MMR defect, and you can download an Excel file, see what actual mutations it caused, and form a hypothesis. A large number of ppl have this mutation; can it be related to a specific cancer?
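A minimal sketch of going through such a table in code instead of by eye. It assumes a tab-delimited text export with one row per patient and one column per gene; the column names, patient IDs, and mutation labels below are invented placeholders, not real cBioPortal output.

```python
import csv
import io

# Made-up stand-in for a downloaded table (patients on the left, genes on top).
SAMPLE_TSV = (
    "PATIENT_ID\tGENE_A\tGENE_B\tGENE_C\n"
    "P1\tQ50*\t\tA10V\n"
    "P2\t\tG20D\t\n"
    "P3\tR75*\tG20V\t\n"
)

def mutated_patient_counts(tsv_text, id_column="PATIENT_ID"):
    """Count, for each gene column, how many patients have any mutation listed."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    genes = [name for name in reader.fieldnames if name != id_column]
    counts = {gene: 0 for gene in genes}
    for row in reader:
        for gene in genes:
            if row[gene].strip():  # a non-empty cell means a mutation is listed
                counts[gene] += 1
    return counts

counts = mutated_patient_counts(SAMPLE_TSV)
print(counts)
```

Counts like these are a starting point for a hypothesis, e.g. whether a gene is mutated more often in one group of pts than another.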
Hypothesis we want to test out about OSTEOGENESIS IMPERFECTA: DO CYTOSKELETAL PROTEINS HAVE MORE TRUNCATING MUTATIONS? If so, do the shapes differ greatly from those in mismatch repair patients?
Yes, there are a lot of truncating mutations, and they did cause the cell to be a different shape. If a truncating mutation occurs earlier, so that half of the collagen protein is missing, you would expect the phenotypic outcome to be more deleterious.
For BRAF, melanoma (SKCM TCGA legacy), a ____ tab is available with the results from cBioPortal.
Survival. You can search for skin cutaneous melanoma and see which genes are mutated, as well as the survival rates, and download the Excel sheet.
What does this screenshot represent?
cBioPortal "oncoprint", showing mutations for BRAF only. So 52% of ppl with this cancer (melanoma) have this mutation. We can then compare their survival rate vs. the survival rate of everyone who didn't have a BRAF mutation.
Survival curve for BRAF mutations
Pts who do have BRAF mutations (red) do better than those who do not
Why? They have quickly growing tumors. Due to this they are more susceptible to chemo
We target rapidly dividing cells. More vulnerable to apoptosis.
The same goes for pts who have oncoprotein mutations.
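A survival curve like the one in the survival tab can be sketched with a Kaplan-Meier estimate. This is a generic textbook-style sketch, not cBioPortal's own code, and the follow-up times and death/censoring flags are invented.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with a death.

    times: follow-up time per patient.
    events: 1 if the death was observed, 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up data (months) for one group of patients.
example_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(example_curve)
```

Running the same function on the mutated group and the non-mutated group gives two curves that can be compared.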
Study with Dr. B: TCGA: increased oncoprotein coding region mutations correlate with greater expression of apoptosis-effector genes and a positive outcome for stomach adenocarcinoma.
The data reported in this article is a reflection of the fact that proliferation genes and apoptosis genes share transactivators (TA).
The more oncoproteins, the more activation of apoptosis genes. Thus, such cancers can be less deadly.
There is a "fail-safe" because the TAs that activate pro-proliferation genes also activate apoptosis genes (to control). So if a cell recognizes that it's messed up, it will activate the apoptosis genes and kill itself.
- When a significant number of mutations accumulate for oncoproteins, one is bound to land on a cell apoptosis gene
- Cells that divide more quickly are more sensitive to chemotherapy
- Also, when cells carry out too much DNA replication, apoptosis is induced in the cell
The more oncoprotein (more pro-proliferative protein), the more chance the cell will kill itself = the more sensitive to chemo = the more chance of survival for the pt.
What is included in unprocessed data?
- The actual reads that come off of the sequencing machines, for tumor specimens: exomes, WGS, RNAseq files.
- Whole genome sequences, WGS (there is no processed version of WGS mutations for either tumor or normal tissue) (NOT PUBLIC)
- All patient-normal tissue sequences, e.g., exomes, WGS.
- Normal RNASeq (transcriptome), very rare
YOU GET FROM https://portal.gdc.cancer.gov/repository
Blood exomes are used to establish what is normal and what is mutant (compare blood exomes to cancerous tissue).
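The blood vs. tumor comparison can be sketched as a set difference: anything seen in the tumor but not in the pt's blood is treated as a tumor (somatic) mutation. The variant tuples below are invented, and real variant calling from reads is far more involved than this.

```python
def somatic_variants(tumor_variants, normal_variants):
    """Variants present in the tumor but absent from the matched normal (blood)."""
    return set(tumor_variants) - set(normal_variants)

# Hypothetical variants as (chromosome, position, reference base, altered base).
blood = {("chr1", 100, "A", "G"), ("chr2", 250, "C", "T")}
tumor = {("chr1", 100, "A", "G"), ("chr2", 250, "C", "T"), ("chr7", 900, "T", "A")}

somatic = somatic_variants(tumor, blood)
print(somatic)
```

Variants shared by blood and tumor are the pt's inherited differences, which is why processed, tumor-only data can't be used this way.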