Biological Databases

The completion of the Human Genome Project laid a foundation for systematically studying the human genome, from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, an increasing number of biological databases have been developed to aid human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges lie ahead in big data storage, processing, exchange and curation.

Human-related databases:

Decoding the human genome is of great significance both theoretically, in unveiling human evolutionary history, and practically, in exploring personalized medicine against diverse diseases. Considering the heterogeneity in data type, scope and curation, biological databases can be classified into multiple categories under different criteria as presented above, making it easier to characterize databases and identify the database(s) of interest. However, some databases become inaccessible over time, are poorly maintained or updated, or are never used. In this study, therefore, we assemble a collection of human-related databases that are widely used and currently accessible via the Internet. As database classification based on data type is informative and straightforward, we assign one major category to each database, although a database may correspond to multiple categories. In what follows, we focus on databases categorized under DNA, RNA, protein, expression, pathway and disease, respectively.
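The one-major-category scheme described above can be illustrated with a minimal Python sketch. It is not the curation procedure used in this study: the Database record, the group_by_major helper and the "other" category assignments for TCGA and ICGC are hypothetical and introduced only for illustration. Only the six category names and the two database names come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass

# The six data-type categories named in the text.
CATEGORIES = ("DNA", "RNA", "protein", "expression", "pathway", "disease")


@dataclass(frozen=True)
class Database:
    """Hypothetical record: one major category plus optional additional ones."""
    name: str
    major: str
    other: tuple = ()


def group_by_major(databases):
    """Group database names under their single major category."""
    groups = defaultdict(list)
    for db in databases:
        if db.major not in CATEGORIES:
            raise ValueError(f"unknown category for {db.name}: {db.major}")
        groups[db.major].append(db.name)
    return dict(groups)


# Illustrative entries; the 'other' categories are assumptions, not curated data.
catalog = [
    Database("TCGA", "disease", other=("DNA", "RNA")),
    Database("ICGC", "disease", other=("DNA", "RNA")),
]

print(group_by_major(catalog))
```

Keeping the major category as a single field and the additional categories as a separate tuple mirrors the choice made here: each database is listed once, under its major category, while its other data types remain recorded.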

Disease databases:

There are at least 200 forms of cancer, which together cause 14.6% of all human deaths (http://en.wikipedia.org/wiki/Cancer). Obtaining complete cancer genomes and identifying molecular mutations and abnormal genes can therefore provide new insights for cancer prevention, detection and, eventually, personalized treatment. Toward this end, there are two well-known cancer projects, viz., The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC). TCGA, founded in 2006 by the National Cancer Institute and the National Human Genome Research Institute at the National Institutes of Health, aims to collect a wide diversity of omics data (including exome, SNP, mRNA, miRNA and methylation) for more than 20 different types of human cancer (http://cancergenome.nih.gov). Unlike TCGA, ICGC is a voluntary collaborative organization initiated in 2008 and open to all cancer and genomic researchers in the world. It aims to obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes that are of clinical and societal importance across the globe (http://icgc.org).