papers Flashcards
the format for metadata storage in arrayexpress?
MAGE-TAB format. It consists of two spreadsheets: the Investigation Description Format (IDF) file and the Sample and Data Relationship Format (SDRF) file.
what is investigation description format in arrayexpress?
The IDF contains an overview of the whole experiment, including the title, the submitter’s contact details, publication information, protocols and the experimental variables.
The SDRF format in arrayexpress?
The SDRF describes all the sample characteristics (e.g. cell type) or any treatment that the sample has been subjected to (e.g. growth in low oxygen conditions), and links each sample to its corresponding data file. The structure of the SDRF, i.e. the order of the columns, reflects the experimental workflow from source material, through intermediate steps (e.g. labelling of nucleic acids, preparation of sequencing libraries, running of sequencing assays) to raw and processed data.
what are CEL files?
the raw data files for Affymetrix microarray experiments
how to specify the exp factor like cold and the organism in the arrayexpress search?
efv: cold AND organism: "Oryza sativa"
what are the dyes for control and treatment in the microarray?
Cy3: control
Cy5: treatment
DREB/CBF is a part of which type of TFs?
AP2/ERF (DREB/CBF proteins form a subfamily of the AP2/ERF transcription factor family)
what does the green text in the linux shell imply?
the computer is ready to accept our commands
what is working directory in linux?
One important concept to understand is that the shell has a notion of a default location in which any file operations will take place. This is its working directory. If you try to create new files or directories, view existing files, or even delete them, the shell will assume you’re looking for them in the current working directory unless you take steps to specify otherwise.
what are the slashes in the folder addresses?
directory separators
the function for creating a list in python?
list()
if no arguments are passed it would return an empty list
the code for installing packages in R?
install.packages("package name") for CRAN packages
BiocManager::install("limma") for Bioconductor packages such as limma
what’s the code for converting the data into a dataframe?
x = as.data.frame(data)
what’s the main difference b/w lists and tuples?
lists are mutable; tuples are immutable
what is a csv file?
a comma-separated values file: a plain-text file in which each line is a record and the values within it are separated by commas
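example: a minimal sketch of reading comma-separated text with Python's built-in csv module. The sample text and its column names are made up for illustration, and io.StringIO stands in for a real file.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from open("file.csv").
csv_text = "gene,log_fc\ngeneA,2.5\ngeneB,-1.2\n"

# DictReader treats the first row as the column names.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

for row in rows:
    # Every value is read as a string; convert explicitly if numbers are needed.
    print(row["gene"], float(row["log_fc"]))
```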
what does tuple unpacking mean?
splitting tuple elements into individual variables
a,b=(1,2)
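example: a short sketch of a few common unpacking patterns. The variable names are illustrative only.

```python
point = (1, 2)
a, b = point  # one variable per tuple element

# A starred name collects the leftover elements into a list.
first, *rest = (10, 20, 30)

# Swapping works because the right-hand side is packed into a tuple first.
a, b = b, a

print(a, b, first, rest)
```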
the code for importing matplotlib.
import matplotlib.pyplot as plt
the code for getting a metabolite by id in cobrapy?
model.metabolites.get_by_id('metabolite id')
what are the types of boundary reactions and why are they called pseudo-reactions?
There are three types: exchange, demand and sink reactions. All of them are unbalanced pseudo-reactions, meaning they fulfill a function for modeling by adding metabolites to or removing them from the model system but are not based on real biology.
the definitions of exchange, demand and sink reactions?
An exchange reaction is a reversible reaction that adds to or removes an extracellular metabolite from the extracellular compartment. A demand reaction is an irreversible reaction that consumes an intracellular metabolite. A sink is similar to an exchange but specifically for intracellular metabolites, i.e., a reversible reaction that adds or removes an intracellular metabolite.
the code for printing out the exchange, demand and sink reactions of a model?
print("exchanges", model.exchanges)
print("demands", model.demands)
print("sinks", model.sinks)
the code for reading excel files in python?
how to read only a single column?
import pandas as pd
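df = pd.read_excel(r'C:\Users\Ron\Desktop\name of the file.xlsx', usecols=['gene ID'])
the r prefix makes it a raw string, so the backslashes in the Windows path are not read as escape sequences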
how to convert a dataframe to a list? how to do it with specific values?
genes_list = genes.values.tolist()[0:9]
the slice [0:9] keeps only the first nine rows
how to generally convert something to the list?
model_gene_list=list(rice_model.genes)
how to convert the list items to strings?
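strings = [str(i) for i in my_list]
for cobrapy objects such as genes, use their id attribute: gene_ids = [g.id for g in model_gene_list]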
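example: a minimal sketch of turning list items into strings. The Gene class below is a hypothetical stand-in for any object that carries an id attribute (as cobrapy gene objects do); it is not the cobrapy class itself.

```python
class Gene:
    """Hypothetical stand-in for a model gene object with an .id attribute."""

    def __init__(self, gene_id):
        self.id = gene_id


model_gene_list = [Gene("g1"), Gene("g2")]

# Objects with an id: collect the id strings.
gene_ids = [gene.id for gene in model_gene_list]

# General case: str() works on any item.
mixed = [1, 2.5, "x"]
as_strings = [str(item) for item in mixed]

print(gene_ids, as_strings)
```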
the CAM metabolism overall definition?
In plants performing CAM photosynthesis, the stomata open at night and CO2 is fixed and stored in the vacuole in the form of a carboxylic acid such as malate, citrate, or isocitrate (Maclennan et al., 1963; Lüttge, 1990; Gawronska and Niewiadomska, 2015; Igamberdiev and Eprintsev, 2016). During the hot, dry daytime hours, the stomata can remain closed to minimize water loss, and the stored CO2 is remobilized for fixation by Rubisco in the chloroplast, accompanied by the accumulation of storage carbohydrates. Although this cycle is energetically expensive, it conserves precious water and is an efficient alternative to direct daytime CO2 fixation by Rubisco as in C3 photosynthesis.
how to know where the jupyter notebooks are being saved?
in a notebook cell we type pwd and it shows the working directory
what does io in python mean?
input/output (the io module provides tools for working with streams such as files)
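what is an iterable, and what is an iterator?
An iterable is any object you can loop over, i.e. one that can hand out an iterator (lists, tuples, strings, etc.). Iterators are objects that allow you to traverse through all the elements of a collection.
another def: an iterator doesn't give all the values at once but one value at a time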
how can an iterator be created?
You can create an iterator object by applying the iter() built-in function to an iterable.
You can use an iterator to manually loop over the iterable it came from. Repeatedly passing the iterator to the built-in function next() returns successive items in the stream. Once you have consumed an item from an iterator, it's gone. When no more data are available, a StopIteration exception is raised.
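example: a minimal sketch of looping manually with iter() and next(). The list and variable names are illustrative.

```python
numbers = [1, 2, 3]
it = iter(numbers)  # ask the iterable for an iterator

first = next(it)
second = next(it)
third = next(it)

# The iterator is now exhausted, so one more next() raises StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```

The original list is untouched; only the iterator is used up.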
what’s the iterator protocol?
The iterator objects are required to support the following two methods, which together form the iterator protocol:
iterator.__iter__()
Return the iterator object itself. This is required to allow both containers (also called collections) and iterators to be used with the for and in statements.
iterator.__next__()
Return the next item from the container. If there are no more items, raise the StopIteration exception.
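example: a small sketch of a class that implements both methods of the protocol. Countdown is a made-up class used only for illustration.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, so it can be used in a for loop.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that there are no more items
        value = self.current
        self.current -= 1
        return value


values = list(Countdown(3))
print(values)
```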
what is an iteration?
The act of going over a collection is called iteration. Collections are things like lists, tuples, etc.
An iterator is an object that can be used to iterate over a collection: iter() gives us the iterator, and next() gives us the next value from that iterator.
how to get a list of the methods belonging to an object?
dir(obj), where obj is the variable the object was assigned to
how to tell if something is iterable?
it should have a method called __iter__
we can check for it in the output of the dir() function
how to make a pandas dataframe?
df = pd.DataFrame(data, columns=['name', 'age'])
how to concatenate dataframes using pandas?
first we need to make a list of all the dataframes.
frames=[df1,df2,df3]
result=pd.concat(frames)
what is pandas.concat() function?
Concatenate pandas objects along a particular axis with optional set logic
how to get continuous indices when concatenating 2 dataframes?
we should add an additional argument:
df = pd.concat(data, ignore_index=True)
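example: a minimal sketch of the effect of ignore_index. The two small dataframes are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"city": ["Mumbai", "Delhi"], "temp": [32, 45]})
df2 = pd.DataFrame({"city": ["New York", "Chicago"], "temp": [21, 14]})

# Default: each frame keeps its own index, so labels repeat (0, 1, 0, 1).
stacked = pd.concat([df1, df2])

# ignore_index=True renumbers the rows from 0 upward.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(list(stacked.index), list(renumbered.index))
```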
how to get a textual index when merging pandas dataframes and in what way it can be useful?
df = pd.concat(data, keys=['India', 'US'])
we shouldn't combine this with ignore_index=True, because that discards the keys.
we can get the subset of the dataframe belonging to each key by using: df.loc['India']
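example: a minimal sketch of concatenating with keys and then pulling one group back out. The dataframes are made up for illustration.

```python
import pandas as pd

india = pd.DataFrame({"city": ["Mumbai", "Delhi"], "temp": [32, 45]})
us = pd.DataFrame({"city": ["New York", "Chicago"], "temp": [21, 14]})

# keys adds an outer index level that labels where each row came from.
df = pd.concat([india, us], keys=["India", "US"])

# .loc with a key returns only the rows of that source frame.
india_rows = df.loc["India"]

print(df)
print(india_rows)
```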
how to stick different pandas dataframes horizontally (adding to columns instead of rows)?
df=pd.concat(data,axis=1)
the default value is 0 which adds to the rows
how to add a new column as series to our pandas dataframe?
s = pd.Series(['humid', 'dry'], name='event')
df = pd.concat([data, s], axis=1)
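example: a minimal sketch of adding a Series as a new column. The contents of data are made up; the Series name becomes the column header.

```python
import pandas as pd

data = pd.DataFrame({"city": ["Mumbai", "Delhi"], "temp": [32, 45]})
s = pd.Series(["humid", "dry"], name="event")

# axis=1 places the Series next to the existing columns, aligned on the index.
df = pd.concat([data, s], axis=1)

print(df)
```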
the different types of merging?
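inner join: keeps only the rows whose key values are shared between the two dataframes.
outer join: keeps all the rows from both dataframes, filling in missing values where there is no match.
left join: keeps all the rows of the 1st dataframe plus the matching values from the 2nd.
right join: keeps all the rows of the 2nd dataframe plus the matching values from the 1st.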
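example: a minimal sketch of the four join types using pd.merge. The dataframes and the shared column name city are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({"city": ["Mumbai", "Delhi", "Pune"], "temp": [32, 45, 30]})
right = pd.DataFrame({"city": ["Delhi", "Pune", "Chennai"], "humidity": [60, 55, 80]})

inner = pd.merge(left, right, on="city", how="inner")       # shared cities only
outer = pd.merge(left, right, on="city", how="outer")       # every city from both
left_join = pd.merge(left, right, on="city", how="left")    # all cities in left
right_join = pd.merge(left, right, on="city", how="right")  # all cities in right

print(len(inner), len(outer), len(left_join), len(right_join))
```

Cells with no match (for example the humidity of Mumbai in the left join) are filled with NaN.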
the code for reading an SBML model in cobrapy?
from cobra.io import read_sbml_model
my_model = read_sbml_model('path to the file')
what are different rice gene IDs?
Rice (Oryza sativa) has more than one form of gene ID for its genome. The two main gene ID systems are RAP (the Rice Annotation Project) and MSU (the Rice Genome Annotation Project). All RAP rice gene IDs are of the form Os##g#######, and all MSU rice gene IDs are of the form LOC_Os##g#####, as explained on the respective project websites. SYMBOL rice gene IDs are the unique gene names used by NCBI (National Center for Biotechnology Information).
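example: a minimal sketch of telling the two ID forms apart with regular expressions, assuming each # in the patterns above stands for exactly one digit. The function name and the sample IDs are chosen only to illustrate the format.

```python
import re

# Assumption: every '#' in the documented forms is a single digit.
RAP_PATTERN = re.compile(r"Os\d{2}g\d{7}")
MSU_PATTERN = re.compile(r"LOC_Os\d{2}g\d{5}")


def classify_rice_id(gene_id):
    """Return 'RAP', 'MSU' or 'unknown' depending on the shape of the ID."""
    if RAP_PATTERN.fullmatch(gene_id):
        return "RAP"
    if MSU_PATTERN.fullmatch(gene_id):
        return "MSU"
    return "unknown"


for example_id in ["Os01g0100100", "LOC_Os01g01010", "AT1G01010"]:
    print(example_id, classify_rice_id(example_id))
```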
how to load a package in r?
library(biomaRt)
the code for getting the r version?
R.version
or in the linux terminal: R --version
how to update r?
the new version of R should be downloaded from r-project.org (by choosing a CRAN mirror) and installed
the next time you open RStudio, it will be working with the updated version.
the 4 arguments in the biomart search?
Attributes: the column headers that we want in our output.
Filters: the type of input data we are searching with (e.g. which kind of identifier).
Values: the actual identifiers, used along with the filters to limit our results.
Mart: the argument for database selection. It's the first thing we choose (we specify the database).
how to search a gene in a certain organism in NCBI?
name of the gene AND human [orgn]
how to get the annotation for a platform from GEO?
click on the platform; in the table at the bottom of the page you see the annotations, and to download them click on "download full table"
how to get the code for installing a certain package in r?
go to the Bioconductor website and search for the name of the package; its page shows the code for installing it
how to read raw CEL files in r?
how to customize the reading?
first set the working directory to the folder that the CEL files are in, then run the ReadAffy() function.
if we write widget=T in the function, a new window will open and we can select the files to read
what does the image function show and what should we decipher from it?
it shows the image of the microarray chip. The white dots are spots with expressed genes and the black dots are spots with genes that are not expressed. We should check the integrity of these dots across the chip.
how to see the range of expression values that each sample covers?
we should draw a boxplot
boxplot(data)
the code for drawing a histogram in r
hist(data)
how a histogram shows the quality of the samples?
for example, if the peaks of all the curves are at or around the same value, the quality of the data obtained from the samples is good
after normalization we should check again: if the problem is resolved it's fine; if not, we should remove that sample
how to check the quality of the RNA used for the microarray in R? what should the plot look like?
AffyRNAdeg(data)
RNA degradation typically starts at the 5' end, so probe intensities are expected to be lower at the 5' end than at the 3' end
the best case is a smooth, consistent trend across the samples, not a zigzag pattern
how to normalize the data?
normalized_data=rma(data)
2 ways of showing the normalized data?
one way: after opening the data normalized with rma and clicking on the assayData section, a line of code is written to the console that we can save. The other way is to use the exprs function.
what is justRMA () for?
it's for when the machine is low on RAM and the computer can't do the analysis with the ordinary RMA method
here it only shows the normalized data; it is not in the Large ExpressionSet form
how to see which methods exist for background correction and normalization of transcriptome data in r?
bgcorrect.methods() (takes no arguments)
normalize.methods(data)
how to manually choose for normalization and bg correction methods in one line of code in r?
data = expresso(trans_data, widget=T)
how to know the range of colors being recognized by R?
colors()
how to write a table in r? how to specify that the values are separated by tabs?
write.table(x, file="data.txt", quote=F, sep="\t")
how to get rid of Browse[1] in R?
type c in the console to leave the browser while the function continues running; type Q to exit both the browser and the function
how to paste a number in whole rows of a column in excel?
first write the number in another cell and copy that cell, then select the cells you want to have that number, and then click the arrow under Paste > Paste Special > Operation > Add
the function for filtering rows by a condition in r?
gene_up = subset(data, column_name > 2), where column_name is the column we want to apply the condition to
the shortcut key for renaming the files?
click on the file and press f2
how to check if a newer version of RStudio exists?
in RStudio open Help > Check for Updates
how to change the row names of a table?
row.names(data)=x
how to open the help for a function in r?
put the cursor on the function and press F1, or run ?function_name in the console
how to delete NA data from our variable in R?
data = na.omit(data)
it removes every row that contains an NA value
why do we specify upper and lower bound in FBA?
These bounds enforce thermodynamic reversibility and mechanistic (max uptake and secretion rate) constraints for the rxn.
how can GEM models further elucidate how changes in one component affect other pathways and cell phenotypes?
since these models connect genes to measurable cell phenotypes (e.g., growth, cell energetics, pathway fluxes, biosynthesis of cell components, byproduct secretion, etc.)
what are Model extraction methods (MEMs)?
Model extraction methods (MEMs) employ diverse algorithms to extract cell-line- or tissue-specific models from a GEM.
what does the column vector v specify?
it contains the unknown fluxes through each of the reactions of the S matrix.
which system is underdetermined and what does that mean?
A system of linear equations is established by multiplying the S matrix by a column vector, v. The product of this matrix multiplication must equal zero, S · v = 0 (Gianchandani et al., 2009). Because the resulting system is underdetermined (i.e., too few equations, too many unknowns), linear programming (LP) is used to optimize for a particular flux, Z, the objective function, subject to underlying constraints.
how is the objective function written?
The objective function typically takes on the form of:
Z = c · v
where c is a row vector of weights for each of the fluxes in column vector v, indicating how much each reaction in v contributes to the objective function, Z.
what is the task of FBA?
Thus, the task of FBA is to find a solution to v that lies within the bounded solution space and that optimizes the objective function at the same time.
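example: a minimal sketch of FBA as a linear program on a made-up three-reaction network (uptake -> A, A -> B, B -> export), assuming SciPy is available. The network, bounds and variable names are illustrative and do not come from any real model; scipy.optimize.linprog is used here in place of a dedicated FBA package.

```python
import numpy as np
from scipy.optimize import linprog

# Rows are metabolites (A, B); columns are reactions (uptake, A->B, export).
S = np.array([
    [1, -1, 0],   # A: made by uptake, used by A->B
    [0, 1, -1],   # B: made by A->B, used by export
])

# Objective weights: Z = c . v, here only the export flux counts.
c = np.array([0, 0, 1])

# Lower and upper bounds per reaction; uptake is capped at 10.
bounds = [(0, 10), (0, 1000), (0, 1000)]

# linprog minimizes, so maximize Z by minimizing -c . v subject to S . v = 0.
result = linprog(-c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)

v = result.x
Z = c @ v
print("fluxes:", v, "objective:", Z)
```

With two equations and three unknowns the steady-state condition alone does not fix v; the bounds and the objective are what single out a flux distribution.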
what does the GIMME algorithm guarantee?
The Gene Inactivity Moderated by Metabolism and Expression (GIMME) algorithm guarantees to both produce a functioning metabolic model based on gene expression levels and quantify the agreement between the model and the data.