Download MS-CanProVar Data and Tools
MS-CanProVar is a protein sequence database that includes variation information to facilitate variant peptide detection in shotgun proteomics. The database is available in two formats, .peff and .fasta.

The .peff format follows the proteomics sequence database standard defined by HUPO. The current peff format has two limitations. First, variations are indicated in the header line rather than in the sequence itself, so existing search engines cannot use this information directly. Second, the current version of the obo file has only one term, "Variant" (DB:0001011), for describing variations, so SNPs cannot be distinguished from cancer-related mutations.

In the .fasta format, each variant peptide is included as an independent entry. Variations are annotated in the header line and labeled "rs" for SNPs and "cs" for cancer-related mutations. The .fasta file also includes reverse sequences and contaminant sequences, so it can be used directly with different search engines.

Tools for analyzing the search results are implemented in Perl. For details about the MS-CanProVar database and associated tools, please refer to Li et al., "A bioinformatics workflow for variant peptide detection in shotgun proteomics," MCP, 2011. The current version of MS-CanProVar is based on Ensembl V53.
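As a rough illustration of how the "rs" and "cs" header labels could be used, the Python sketch below tallies the entries of a FASTA file by label. It is not part of the distributed Perl tools. The exact header layout is not described here, so the pattern (a label "rs" or "cs" immediately followed by digits) and the function names read_fasta, classify_header, and count_entries are assumptions made for this example; adjust the pattern to match the real headers.

```python
import re
from collections import Counter

# ASSUMPTION: a variation label appears in the header as "rs" or "cs"
# directly followed by digits. The real header layout may differ.
VARIANT_TAG = re.compile(r"\b(rs|cs)\d+")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def classify_header(header):
    """Return a coarse label for one header line (hypothetical helper)."""
    tags = {match.group(1) for match in VARIANT_TAG.finditer(header)}
    if "cs" in tags:
        # Entries carrying both labels are counted as cancer-related here.
        return "cancer_mutation"
    if "rs" in tags:
        return "snp"
    # Reference, reverse, and contaminant entries all land here.
    return "unlabeled"


def count_entries(lines):
    """Count entries per label."""
    return Counter(classify_header(header) for header, _ in read_fasta(lines))
```

To run it on the database, open the unpacked .fasta file and pass the file object to count_entries, which reads it line by line. Entries without a recognized label, including reverse and contaminant sequences, are reported as "unlabeled".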
Download the tarball containing the database files and related tools.