Building a Mass Spectrometry Data Analysis Platform with XML File Formats
The analysis of tandem mass (MS/MS) data to identify and quantify proteins is hampered by the heterogeneity of file formats at the raw spectral data, peptide identification, and protein identification levels. Different mass spectrometers output their raw spectral data in a variety of proprietary formats, and alternative methods that assign peptides to MS/MS spectra and infer protein identifications from those peptide assignments each write their results in different formats. Here we describe an MS/MS analysis platform, the Trans-Proteomic Pipeline, which makes use of open XML file formats for storage of data at the raw spectral data, peptide, and protein levels. This platform enables uniform analysis and exchange of MS/MS data generated from a variety of different instruments, and assigned peptides using a variety of different database search programs. We demonstrate this by applying the pipeline to data sets generated by ThermoFinnigan LCQ, ABI 4700 MALDI-TOF/TOF, and Waters Q-TOF instruments, and searched in turn using SEQUEST, Mascot, and COMET.
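MS data analysis proceeds in roughly three steps: raw data analysis, peptide identification, and protein identification. Different mass spectrometers and different analysis software produce files in a variety of formats at each of these three stages, which has become a bottleneck in MS data analysis. In response, A. Keller and colleagues developed an MS data analysis platform that applies XML file formats at all three levels, greatly improving the efficiency of MS data analysis. The platform first converts the raw data produced by different mass spectrometers into a format called mzXML, which is used for subsequent analysis such as peptide database searching, so that the relevant software, such as SEQUEST, COMET, and Mascot, no longer has to account for differences between data sources. The files produced in this step are then converted to pepXML format for the subsequent protein-level analysis. Finally, protein information is stored in protXML format. The authors tested the platform on relevant data sets, showing that it can be applied to mass spectrometers such as the ThermoFinnigan LCQ, ABI 4700 MALDI-TOF/TOF, and Waters Q-TOF, and to analysis software such as SEQUEST, Mascot, and COMET.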
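To make concrete why a common XML format frees downstream tools from caring about the instrument, here is a minimal Python sketch that lists the MS/MS scans in an mzXML-style document using only the standard library. The embedded sample is a hand-written, heavily simplified fragment, not output from any real converter: actual mzXML files also carry an XML namespace, instrument metadata, and base64-encoded peak lists. The function name `list_msms_scans` is hypothetical and is not part of the Trans-Proteomic Pipeline.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment for illustration only (an assumption);
# real mzXML files include a namespace, metadata, and encoded peak data.
SAMPLE = """\
<mzXML>
  <msRun scanCount="3">
    <scan num="1" msLevel="1" peaksCount="120"/>
    <scan num="2" msLevel="2" peaksCount="45">
      <precursorMz>650.32</precursorMz>
    </scan>
    <scan num="3" msLevel="2" peaksCount="38">
      <precursorMz>712.88</precursorMz>
    </scan>
  </msRun>
</mzXML>
"""


def list_msms_scans(xml_text):
    """Return (scan number, precursor m/z) for every scan with msLevel 2."""
    root = ET.fromstring(xml_text)
    results = []
    # iter() walks the whole tree, so nested scans are found as well.
    for scan in root.iter("scan"):
        if scan.get("msLevel") != "2":
            continue  # skip survey (MS1) scans
        precursor = scan.find("precursorMz")
        mz = float(precursor.text) if precursor is not None else None
        results.append((int(scan.get("num")), mz))
    return results


msms_scans = list_msms_scans(SAMPLE)
print(msms_scans)
```

Because the same element and attribute names are used whatever instrument produced the data, a reader like this would not need a separate code path per vendor, which is the point the summary makes about SEQUEST, COMET, and Mascot.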
1999-2005 Bioinformation Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences