Figure 1. A new member of the Coronavirus family

Preliminary Analysis of the SARS Coronavirus Genome

Zhang Qipeng, Shi Lei, Rui Wei, Lu Ming
(Bioinformatics Group, Peking University Health Science Center, 100083)

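Background: Taxonomically, the SARS coronavirus belongs to the family Coronaviridae, in the order Nidovirales, among the "ssRNA positive-strand viruses" (see Taxonomy). It is a newly emerged subgroup of the coronavirus family (Figure 1). The genome is 29,736 bp long and has 11 known coding sequences (CDS). One of these, the putative orf1ab polyprotein, is structurally similar to its counterpart in murine hepatitis virus (MHV). Based on the MHV structural model, this CDS is inferred to encode 14 proteins (Figure 2, Table 1).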

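Objective: Using bioinformatics methods, we analyzed the complete genome sequence of the SARS coronavirus and, based on informative signals in its structure, offer a preliminary interpretation of the structural and functional characteristics of the virus.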

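Methods and Results: We used ClustalW to perform a multiple sequence alignment of all 11 SARS coronavirus genome sequences known at the time; this served as the within-group comparison (see Result 1). We then aligned the SARS coronavirus genome sequences with the genome sequences of other coronaviruses; this served as the between-group comparison (see Result 2). Sequence similarity was high within the group (align score > 89) and low between groups (align score < 28), and the difference between the two was significant (p < 0.01). On this basis, we ran blastp against GenBank for each of the 11 putative proteins and used the results to make a preliminary analysis of their structure and function (see Table 2).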
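The within-group versus between-group comparison can be illustrated with a minimal sketch. It assumes the sequences are already aligned (equal length, `-` for gaps) and uses a simple percent-identity score; this is not ClustalW's own scoring code, and the toy sequences and function names below are hypothetical.

```python
from itertools import combinations, product


def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_within(group):
    """Average score over all unordered pairs inside one group."""
    scores = [percent_identity(a, b) for a, b in combinations(group, 2)]
    return sum(scores) / len(scores)


def mean_between(group_a, group_b):
    """Average score over all pairs drawn from two different groups."""
    scores = [percent_identity(a, b) for a, b in product(group_a, group_b)]
    return sum(scores) / len(scores)


# Hypothetical toy alignment rows, NOT real viral sequences.
sars_like = ["ATGGCTAGCT", "ATGGCTAGCT", "ATGGCTAGTT"]
other = ["ATCCGTA-CA", "TTCCGAAGCA"]

if __name__ == "__main__":
    print("within :", round(mean_within(sars_like), 1))
    print("between:", round(mean_between(sars_like, other), 1))
```

With real data, the rows would come from the ClustalW alignment output, and the two sets of pairwise scores could then be passed to a statistical test of the authors' choosing.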

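Conclusions: From the multiple sequence alignments, we tentatively infer that:

  1. The SARS coronavirus is indeed a new variant of coronavirus, with its own relatively conserved genome sequence structure;
  2. The 11 sequenced genomes are almost identical, which indicates that the virus did not change substantially during the period over which they were sequenced.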

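The blastp results show that five of the 11 CDS have no similar sequences among those deposited in GenBank. We consider these five to be specific to the SARS coronavirus. They may be part of the material basis for the differences in pathogenesis and transmissibility between SARS and other coronaviruses, and studying these proteins may provide a reference for the prevention and diagnosis of SARS, research on its routes of transmission, vaccine development, and drug screening. The other six proteins have highly similar sequences in other viral genomes, which suggests that they may underlie the basic viral behavior of the SARS coronavirus.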

Table 1. Putative products of the putative orf1ab polyprotein

| Location | Product | Description | Protein Accession No. |
|---|---|---|---|
| 250..786 | putative leader protein | PL1-PRO cleavage product | NP_828860.1 |
| 787..2703 | putative counterpart of MHV p65 protein | | NP_828861.1 |
| 2704..9969 | putative coronavirus nsp1 | contains predicted phosphoesterase (similar to the Appr-1'-p processing enzyme) formerly known as 'X-domain', papain-like proteinase domain similar to that of MHV PLP-2, and hydrophobic domains | NP_828862.1 |
| 9970..10887 | putative coronavirus nsp2 (3CL-PRO) | presumably mediates cleavages downstream from nsp1; 3C-like proteinase | NP_828863.1 |
| 10888..11757 | putative coronavirus nsp3 (HD2) | hydrophobic domain | NP_828864.1 |
| 11758..12006 | putative coronavirus nsp4 | | NP_828865.1 |
| 12007..12600 | putative coronavirus nsp5 | | NP_828866.1 |
| 12601..12939 | putative coronavirus nsp6 | | NP_828867.1 |
| 12940..13356 | putative coronavirus nsp7 | formerly known as growth-factor-like protein | NP_828868.1 |
| 13357..13383,13383..16151 | putative coronavirus nsp9 (RdRp) | RNA-dependent RNA polymerase | NP_828869.1 |
| 16152..17954 | putative coronavirus nsp10 (MB, NTPase/HEL) | metal-binding domain, NTPase/helicase domain | NP_828870.1 |
| 17955..19535 | putative coronavirus nsp11 | | NP_828871.1 |
| 19536..20573 | putative coronavirus nsp12 | | NP_828872.1 |
| 20574..21467 | putative coronavirus nsp13 | | NP_828873.2 |

Figure 2. SARS Genome

Taxonomy

 

Table 2. Preliminary analysis of the structure and function of SARS proteins

| Location | Strand | Length (aa) | PID | Blast | Prosite | Product | Predicted Description |
|---|---|---|---|---|---|---|---|
| 250..21470 | + | 7074 | 29836505 | See Details | See Details | putative orf1ab polyprotein | Chain A, Structure Of Coronavirus Main Proteinase Reveals Combination Of A Chymotrypsin Fold With An Extra Alpha-Helical Domain |
| 250..13398 | + | 4383 | 29836495 | See Details | See Details | orf1a polyprotein | Chain A, Structure Of Coronavirus Main Proteinase Reveals Combination Of A Chymotrypsin Fold With An Extra Alpha-Helical Domain |
| 21477..25244 | + | 1256 | 29836496 | See Details | See Details | putative E2 glycoprotein precursor; putative spike glycoprotein | E2 glycoprotein; similar sequences found by alignment |
| 25253..26077 | + | 275 | 29836497 | See Details | See Details | putative uncharacterized protein | Unknown; definitely a new protein, with no similar sequence |
| 25674..26138 | + | 155 | 29836498 | See Details | See Details | putative uncharacterized protein | Unknown; definitely a new protein, with no similar sequence |
| 26102..26332 | + | 77 | 29836499 | See Details | See Details | putative small envelope protein E | envelope protein |
| 26383..27048 | + | 222 | 29836504 | See Details | See Details | putative protein M | aligns with matrix glycoprotein [porcine hemagglutinating encephalomyelitis virus] |
| 27059..27250 | + | 64 | 29836500 | See Details | See Details | putative uncharacterized protein | Unknown; definitely a new protein, with no similar sequence |
| 27258..27626 | + | 123 | 29836501 | See Details | See Details | putative uncharacterized protein | Unknown; definitely a new protein, with no similar sequence |
| 28105..29373 | + | 423 | 29836503 | See Details | See Details | putative nucleocapsid protein | aligns with nucleocapsid protein [Murine hepatitis virus] |
| 28115..28411 | + | 99 | 29836502 | See Details | See Details | putative uncharacterized protein | Unknown; definitely a new protein, with no similar sequence |

Note:
Location: the start and end positions of the CDS on the genome. Strand: the strand (+/-) used as the replication template. PID: the identifier of each protein in GenBank.
Blast: the blastp result. Prosite: the ScanProsite result. Product: the potential product of the CDS. Predicted Description: the predicted structure and function of the protein, based on sequence alignment.
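The Location values can be checked against the Length (aa) column with a short calculation. The sketch below parses a location string of the form `start..end`, with optional comma-separated segments as in Table 1, and divides the nucleotide span by three. The function names are hypothetical, and the reading that the listed lengths count the stop codon is an assumption drawn from the numbers themselves rather than something the source states.

```python
def cds_length_nt(location: str) -> int:
    """Total nucleotides covered by a location such as '13357..13383,13383..16151'."""
    total = 0
    for segment in location.split(","):
        start, end = (int(x) for x in segment.split(".."))
        total += end - start + 1  # coordinates are 1-based and inclusive
    return total


def codon_count(location: str) -> tuple[int, int]:
    """Return (whole codons, leftover nucleotides) for a location string."""
    return divmod(cds_length_nt(location), 3)


if __name__ == "__main__":
    for loc in ("21477..25244", "28115..28411", "13357..13383,13383..16151", "250..21470"):
        print(loc, codon_count(loc))
```

For the single-range rows checked here, the codon count matches the listed length (for example 1256 for 21477..25244). The plain range 250..21470 leaves two nucleotides over, which is consistent with position 13383 being read twice, as the two-segment location in Table 1 suggests; counting that position twice gives 7074 codons.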


Copyright © 2003 CMBI. All rights reserved.

2003-4-26