Article Example
Expressed sequence tag dbEST is a division of GenBank established in 1992. As with GenBank, data in dbEST are submitted directly by laboratories worldwide and are not curated.
Cancer Genome Anatomy Project An early technique used by CGAP is digital differential display (DDD), which uses the Fisher exact test to compare libraries against each other and find significant differences between populations. CGAP ensured that DDD could compare all cDNA libraries in dbEST, not just those generated by CGAP.
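As a rough illustration of the kind of comparison DDD performs, the sketch below computes a two-sided Fisher exact test p-value for a 2x2 table of EST counts. It is a minimal standard-library version written for this example, not CGAP's implementation; the function name `fisher_exact_two_sided` and the library counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]].

    a: ESTs for the gene of interest in library A
    b: all other ESTs in library A
    c: ESTs for the gene of interest in library B
    d: all other ESTs in library B
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x gene hits falling in library A
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed one;
    # the small tolerance absorbs floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: 12 of 1000 ESTs in one library, 2 of 1000 in another.
p_value = fisher_exact_two_sided(12, 988, 2, 998)
print(f"p = {p_value:.4f}")
```

A small p-value suggests that the gene is represented differently in the two libraries. When many genes are tested, a multiple-testing correction would normally be applied as well.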
Cancer Genome Anatomy Project CGAP's initial goal was to establish a Tumor Gene Index (TGI) to store the expression profiles. The index would contribute to both new and existing databases, specifically two types of libraries: dbEST and, later, dbSAGE. This was performed in a series of steps:
Expressed sequence tag High-throughput analyses of ESTs often encounter similar data management challenges. A first challenge is that the tissue provenance of EST libraries is described in plain English in dbEST. This makes it difficult to write programs that can unambiguously determine that two EST libraries were sequenced from the same tissue. Similarly, disease conditions for the tissue are not annotated in a computationally friendly manner. For instance, the cancer origin of a library is often mixed with the tissue name (e.g., the tissue name "glioblastoma" indicates that the EST library was sequenced from brain tissue and that the disease condition is cancer). With the notable exception of cancer, the disease condition is often not recorded in dbEST entries. The TissueInfo project was started in 2000 to help with these challenges. The project provides curated data (updated daily) to disambiguate tissue origin and disease state (cancer/non-cancer), offers a tissue ontology that links tissues and organs by "is part of" relationships (i.e., it formalizes the knowledge that the hypothalamus is part of the brain, and that the brain is part of the central nervous system), and distributes open-source software for linking transcript annotations from sequenced genomes to tissue expression profiles calculated with data in dbEST.
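To make the "is part of" idea concrete, the sketch below stores the two relationships named above in a plain dictionary and walks them transitively. It is an illustrative toy, not TissueInfo's data format or software; the names `PART_OF`, `ancestors`, and `same_organ` are hypothetical.

```python
# "is part of" edges taken from the example in the text above.
PART_OF = {
    "hypothalamus": "brain",
    "brain": "central nervous system",
}


def ancestors(tissue, part_of=PART_OF):
    """Return every structure that contains `tissue`, nearest first."""
    chain = []
    seen = {tissue}
    while tissue in part_of:
        tissue = part_of[tissue]
        if tissue in seen:  # guard against accidental cycles in the data
            break
        seen.add(tissue)
        chain.append(tissue)
    return chain


def same_organ(tissue_a, tissue_b, organ, part_of=PART_OF):
    """True if both tissues are `organ` itself or are part of it."""
    def within(tissue):
        return tissue == organ or organ in ancestors(tissue, part_of)

    return within(tissue_a) and within(tissue_b)


print(ancestors("hypothalamus"))
print(same_organ("hypothalamus", "brain", "brain"))
```

With an ontology like this, a program can decide that a library labelled "hypothalamus" and one labelled "brain" both belong to the brain, which free-text tissue descriptions alone do not allow.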
Cancer Genome Anatomy Project The sequencing of cDNA will produce the entire mRNA transcript that generated it. In practice, only part of the sequence is required to uniquely identify the associated mRNA or protein. This partial sequence is termed the expressed sequence tag (EST) and is located at the end of the sequence, close to the poly(A) tail. EST data are stored in a database called dbEST. ESTs only need to be around 400 bases long, but with NGS sequencing techniques this will still produce low-quality reads. Therefore, an improved method called serial analysis of gene expression (SAGE) is also used. For each cDNA transcript molecule produced from a cell's gene expression, this method identifies a region only 10-14 bases long, anywhere along the read sequence, that is sufficient to uniquely identify that cDNA transcript. These bases are cut out and linked together, then incorporated into bacterial plasmids as mentioned above. SAGE libraries have better read quality and generate a larger amount of data when sequenced, and because transcripts are measured at absolute rather than relative levels, SAGE has the advantage of requiring no normalisation of data via comparison with a reference.
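The tag-counting idea behind SAGE can be sketched as follows. This is a deliberately simplified model: it assumes tags of a fixed length (10 bases, the low end of the range given above) joined end to end with nothing between them, whereas real protocols include additional sequence to delimit tags. The names `TAG_LENGTH` and `count_tags` and the example sequence are hypothetical.

```python
from collections import Counter

TAG_LENGTH = 10  # assumed fixed tag length; the text gives a range of 10-14 bases


def count_tags(concatemer, tag_length=TAG_LENGTH):
    """Split a sequenced concatemer into fixed-length tags and count each one."""
    # Ignore a trailing fragment that is shorter than one full tag.
    usable = len(concatemer) - len(concatemer) % tag_length
    tags = (concatemer[i:i + tag_length] for i in range(0, usable, tag_length))
    return Counter(tags)


# Made-up concatemer: three full tags plus a 3-base leftover.
concatemer = "AAAAAAAAAA" + "CCCCCCCCCC" + "AAAAAAAAAA" + "GGG"
counts = count_tags(concatemer)
for tag, n in counts.most_common():
    print(tag, n)
```

Each count is a direct tally of how often a tag was observed, which is the sense in which SAGE reports absolute rather than relative levels.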