Human Cancer Proteome Variation Database

Download CanProVar Data

CanProVar provides a human protein database for download in FASTA format, in which variation information is recorded in the header line of each sequence. The current version is based on Ensembl V54.
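Because the variation information lives in the header line, a reader of these files needs to keep each header together with its sequence. The following is a minimal sketch of a generic FASTA reader in Python, not part of the CanProVar tools. The function name read_fasta is hypothetical, and the sketch assumes only the standard FASTA convention that a header line starts with ">" and the sequence may span several lines.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    The header is returned without the leading ">" so that any
    variation annotation it carries can be inspected as plain text.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)
```

Any open text file can be passed in, for example `with open(path) as handle: records = list(read_fasta(handle))`, where path points to one of the downloaded files. The layout of the annotations inside each header is not described here, so the sketch leaves the header unparsed.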

The README file explains the contents of the following files:

Validated dbSNP_nsSNPs: dbSNP_validated_nsSNP_protein
Cancer related_nsSNPs: cancer_nsSNP_protein
Both: all_nsSNP_protein

Download MS-CanProVar Data and Tools

MS-CanProVar is a protein sequence database that includes variation information to facilitate peptide variant detection in shotgun proteomics. The database is available in two formats.

The .peff format follows the proteomics sequence database standard defined by HUPO. The current peff format has a few limitations. First, variations are indicated in the header line, not in the actual sequence, so existing search engines cannot use this information directly. Second, the current version of the obo file has only one term, "Variant" (DB:0001011), for describing variations, so it is not possible to distinguish SNPs from cancer-related mutations.

In the .fasta format, each variant peptide is included as an independent entry, variations are annotated in the header line, and they are labeled as "rs" for SNPs and "cs" for cancer-related mutations. The .fasta file also includes reverse sequences and contaminant sequences, so it can be used directly with different search engines.

Tools for analyzing the search results are implemented in Perl. For details about the MS-CanProVar database and associated tools, please refer to: Li et al., "A bioinformatics workflow for variant peptide detection in shotgun proteomics", MCP, 2011. The current version of MS-CanProVar is based on Ensembl V53.
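To illustrate how the "rs" and "cs" labels could be used downstream, the sketch below sorts header lines into SNP, cancer-related, or unannotated entries. It is written in Python and is not one of the Perl tools in the tarball. The exact header layout is an assumption: the sketch supposes that each label appears as "rs" or "cs" immediately followed by digits and is not preceded by a letter or digit. The example headers are invented for illustration, and entries such as reverse or contaminant sequences are not handled because their markers are not described here.

```python
import re

# Assumption: a variation label is "rs" or "cs" followed by digits,
# and is not glued to a preceding letter or digit.
VARIANT_LABEL = re.compile(r"(?<![A-Za-z0-9])(rs|cs)\d+")


def classify_header(header: str) -> str:
    """Return 'snp', 'cancer', 'both', or 'reference' for one header."""
    labels = {match.group(1) for match in VARIANT_LABEL.finditer(header)}
    if labels == {"rs"}:
        return "snp"
    if labels == {"cs"}:
        return "cancer"
    if labels:
        return "both"
    return "reference"  # no variation label found


# Hypothetical headers, not taken from the real database files.
examples = ["ENSP1 rs123:A10T", "ENSP1|cs45:G12D", "ENSP1"]
kinds = [classify_header(h) for h in examples]
```

The pattern should be checked against the README and the real header lines before it is applied to search results.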

Download the tarball containing the database files and related tools.

CanProVar is currently developed and maintained by Dexter Duncan and Bing Zhang at the Zhang Lab. The project was initiated by Jing Li and Bing Zhang in 2009.
Funding credits: National Institutes of Health (NIH)/National Cancer Institute (R01 CA126218); NIH/National Institute of General Medical Sciences (GM088822).

©2009 Jing Li, Bing Zhang