Gene Ontology (GO) Annotation
<h1 class="post-title">Gene Ontology (GO) Annotation</h1><div class="post-meta"><span class="post-time"><span class="post-meta-item-icon"><span class="fa fa-calendar-o"> <span class="post-meta-item-text">Posted on 2017-06-11 <span class="post-category"><span class="post-meta-divider">| <span class="post-meta-item-icon"><span class="fa fa-folder-o"> <span class="post-meta-item-text">In <span><span>Bioinformatics</span></span></span></span></span></span></span></span></span></span></span></div>
<div class="post-body">
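<p>Similar genes in different species often have conserved functions. A <strong>unified vocabulary</strong> is therefore needed to describe the functions of these cross-species homologous genes and their gene products. Without one, different laboratories would describe the function of the same gene in different ways, which would severely hinder scientific communication. The Gene Ontology (GO) project is the result of an effort to make the functional descriptions of genes and gene products consistent across databases.</p>
<p>GO is a standardized vocabulary of terms (GO terms) for annotating biological function. It describes gene function in three aspects:</p>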
<ul>
<li>the <strong>Molecular Function</strong> that a gene product performs</li>
<li>the <strong>Cellular Component</strong> in which it is located</li>
<li>the <strong>Biological Process</strong> in which it takes part</li>
</ul>
<p>GO terms are linked to one another in a directed acyclic graph (DAG), as shown in the figure below:</p>
<p><img src="https://hui-liu.github.io/blog/Gene-Ontology-GO-%E6%B3%A8%E9%87%8A/1.png"></p>
<p>The figure shows three types of relations between GO terms: <code>is_a</code>, <code>part_of</code>, and <code>regulates</code>.</p>
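<p>For example, <code>regulation of cell projection assembly</code> is a biological process that is a subtype of (<code>is_a</code>) <code>regulation of cell projection organization</code> and also regulates (<code>regulates</code>) <code>cell projection assembly</code>. Likewise, <code>cellular component assembly</code> is part of (<code>part_of</code>) <code>cellular component biogenesis</code>. Note that these relations are directional: they do not hold in reverse, and following them never leads back to the starting term, which is why the structure is called a directed acyclic graph.</p>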
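<p>The directed, typed relations described above can be illustrated with a small sketch. It stores only the three edges named in the text; the helper names <code>build_graph</code> and <code>reachable</code> are hypothetical and are not part of any GO library.</p>

```python
from collections import defaultdict

# Directed, typed edges (source term, relation, target term) from the text.
EDGES = [
    ("regulation of cell projection assembly", "is_a",
     "regulation of cell projection organization"),
    ("regulation of cell projection assembly", "regulates",
     "cell projection assembly"),
    ("cellular component assembly", "part_of",
     "cellular component biogenesis"),
]


def build_graph(edges):
    """Map each term to its outgoing (relation, target) pairs."""
    graph = defaultdict(list)
    for source, relation, target in edges:
        graph[source].append((relation, target))
    return graph


def reachable(graph, term, relations=("is_a", "part_of")):
    """Return every term reachable from `term` along the given relation types."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, target in graph.get(current, []):
            if relation in relations and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


graph = build_graph(EDGES)
# Forward direction: the child term reaches its parent.
print(reachable(graph, "regulation of cell projection assembly"))
# Reverse direction: the parent term does not reach the child.
print(reachable(graph, "regulation of cell projection organization"))
```

<p>Because edges are stored in one direction only, the second call returns an empty set, which mirrors the point that the relations cannot be read backwards.</p>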
<p>At present, GO annotation is mainly done in one of two ways:</p>
<ul>
<li>(1) <strong>Sequence similarity search (BLAST)</strong></li>
<li>(2) <strong>Domain similarity search (InterProScan)</strong></li>
</ul>
<p>Taking <strong>sequence similarity search</strong> as the example, the steps of GO annotation are briefly as follows:</p>
<ul>
<li>
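<p>Align the gene sequences against the Swiss-Prot protein database with BLAST (blastp for protein queries, blastx for nucleotide queries), which gives results such as the following:</p>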
<table>
<tbody>
<tr>
<td class="code">
<div class="line">c49_g1_i1 RNF13_MOUSE <span class="number">52.00 <span class="number">50 <span class="number">23 <span class="number">1 <span class="number">17 <span class="number">166 <span class="number">240 <span class="number">288 <span class="number">2e-11 <span class="number">65.5</span></span></span></span></span></span></span></span></span></span></div>
<div class="line">c72_g1_i1 RS25_NEUCR <span class="number">78.72 <span class="number">94 <span class="number">20 <span class="number">0 <span class="number">375 <span class="number">94 <span class="number">1 <span class="number">94 <span class="number">1e-32 <span class="number">116</span></span></span></span></span></span></span></span></span></span></div>
<div class="line">c75_g1_i1 POLX_TOBAC <span class="number">45.28 <span class="number">53 <span class="number">29 <span class="number">0 <span class="number">162 <span class="number">4 <span class="number">457 <span class="number">509 <span class="number">1e-08 <span class="number">55.1</span></span></span></span></span></span></span></span></span></span></div>
<div class="line">c86_g2_i1 POLX_TOBAC <span class="number">46.43 <span class="number">112 <span class="number">60 <span class="number">0 <span class="number">339 <span class="number">4 <span class="number">879 <span class="number">990 <span class="number">2e-30 <span class="number">120</span></span></span></span></span></span></span></span></span></span></div>
<div class="line">c91_g1_i1 BUB1_ARATH <span class="number">55.71 <span class="number">70 <span class="number">28 <span class="number">2 <span class="number">61 <span class="number">264 <span class="number">289 <span class="number">357 <span class="number">1e-14 <span class="number">73.6</span></span></span></span></span></span></span></span></span></span></div>
<div class="line">c143_g1_i1 STL1_YEAST <span class="number">31.98 <span class="number">172 <span class="number">85 <span class="number">4 <span class="number">6 <span class="number">518 <span class="number">407 <span class="number">547 <span class="number">6e-17 <span class="number">82.8</span></span></span></span></span></span></span></span></span></span></div>
<div class="line">c150_g1_i1 CST26_YEAST <span class="number">37.63 <span class="number">93 <span class="number">38 <span class="number">3 <span class="number">223 <span class="number">5 <span class="number">142 <span class="number">234 <span class="number">6e-10 <span class="number">58.2</span></span></span></span></span></span></span></span></span></span></div>
<div class="line">c150_g2_i1 YHOE_SCHPO <span class="number">42.67 <span class="number">75 <span class="number">41 <span class="number">1 <span class="number">227 <span class="number">3 <span class="number">54 <span class="number">126 <span class="number">5e-16 <span class="number">74.7</span></span></span></span></span></span></span></span></span></span></div>
<div class="line">c156_g2_i1 EXOL2_ARATH <span class="number">47.17 <span class="number">53 <span class="number">28 <span class="number">0 <span class="number">299 <span class="number">141 <span class="number">229 <span class="number">281 <span class="number">6e-06 <span class="number">47.0</span></span></span></span></span></span></span></span></span></span></div>
<div class="line">c169_g1_i1 SPT5_ASPFU <span class="number">60.98 <span class="number">82 <span class="number">31 <span class="number">1 <span class="number">20 <span class="number">262 <span class="number">725 <span class="number">806 <span class="number">2e-18 <span class="number">84.0</span></span></span></span></span></span></span></span></span></span></div>
</td>
</tr>
</tbody>
</table>
<blockquote>
<p>Here, the second column is the ID of the matching sequence in the Swiss-Prot protein database (the UniProtKB ID).</p>
</blockquote>
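<p>As a rough sketch of how such output might be read, the code below parses each line and keeps one hit per query. It assumes the rows follow the standard 12-column BLAST tabular layout (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), which matches the sample rows but is not stated in the original post. The e-value cutoff and the function names are also assumptions.</p>

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    query_id: str            # column 1: the gene/transcript being annotated
    subject_id: str          # column 2: UniProtKB ID of the Swiss-Prot hit
    percent_identity: float  # column 3
    evalue: float            # column 11
    bit_score: float         # column 12


def parse_blast_line(line):
    """Parse one whitespace-separated line of 12-column tabular BLAST output."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    return BlastHit(fields[0], fields[1], float(fields[2]),
                    float(fields[10]), float(fields[11]))


def best_hits(lines, max_evalue=1e-5):
    """Keep, per query, the hit with the highest bit score under the cutoff."""
    best = {}
    for line in lines:
        if not line.strip():
            continue
        hit = parse_blast_line(line)
        if hit.evalue > max_evalue:  # assumed cutoff, not from the post
            continue
        current = best.get(hit.query_id)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.query_id] = hit
    return best


# Two rows copied from the sample output above.
SAMPLE = [
    "c49_g1_i1 RNF13_MOUSE 52.00 50 23 1 17 166 240 288 2e-11 65.5",
    "c156_g2_i1 EXOL2_ARATH 47.17 53 28 0 299 141 229 281 6e-06 47.0",
]
hits = best_hits(SAMPLE)
for query_id, hit in hits.items():
    print(query_id, hit.subject_id, hit.evalue)
```

<p>Whether to keep only the best hit or every hit per query is a design choice; keeping every hit yields more GO terms per gene but also more weakly supported ones.</p>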
</li>
<li>
<p>Download <code>idmapping.tb.gz</code> from ftp://ftp.pir.georgetown.edu/databases/idmapping. The file has 22 tab-separated columns:</p>
<table>
<tbody>
<tr>
<td class="code">
<div class="line"><span class="selector-tag">Q6GZX4 001<span class="selector-tag">R_FRG3G 2947773 <span class="selector-tag">YP_031579<span class="selector-class">.1 81941549; 49237298 <span class="selector-tag">PF04947 <span class="selector-tag">GO<span class="selector-pseudo">:0006355; <span class="selector-tag">GO<span class="selector-pseudo">:0046782; <span class="selector-tag">GO<span class="selector-pseudo">:0006351 <span class="selector-tag">UniRef100_Q6GZX4 <span class="selector-tag">UniRef90_Q6GZX4 <span class="selector-tag">UniRef50_Q6GZX4 <span class="selector-tag">UPI00003B0FD4 654924 15165820 <span class="selector-tag">AY548484 <span class="selector-tag">AAT09660<span class="selector-class">.1</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></div>
</td>
</tr>
</tbody>
</table>
<blockquote>
<p>The columns are listed below (as the list shows, entries from many databases are already cross-referenced to GO):</p>
</blockquote>
<table>
<tbody>
<tr>
<td class="code">
<div class="line"><span class="bullet">1. UniProtKB accession</span></div>
<div class="line"><span class="bullet">2. UniProtKB ID</span></div>
<div class="line"><span class="bullet">3. EntrezGene</span></div>
<div class="line"><span class="bullet">4. RefSeq</span></div>
<div class="line"><span class="bullet">5. NCBI GI number</span></div>
<div class="line"><span class="bullet">6. PDB</span></div>
<div class="line"><span class="bullet">7. Pfam</span></div>
<div class="line"><span class="bullet">8. GO</span></div>
<div class="line"><span class="bullet">9. PIRSF</span></div>
<div class="line"><span class="bullet">10. IPI</span></div>
<div class="line"><span class="bullet">11. UniRef100</span></div>
<div class="line"><span class="bullet">12. UniRef90</span></div>
<div class="line"><span class="bullet">13. UniRef50</span></div>
<div class="line"><span class="bullet">14. UniParc</span></div>
<div class="line"><span class="bullet">15. PIR-PSD accession</span></div>
<div class="line"><span class="bullet">16. NCBI taxonomy</span></div>
<div class="line"><span class="bullet">17. MIM</span></div>
<div class="line"><span class="bullet">18. UniGene</span></div>
<div class="line"><span class="bullet">19. Ensembl</span></div>
<div class="line"><span class="bullet">20. PubMed ID</span></div>
<div class="line"><span class="bullet">21. EMBL/GenBank/DDBJ</span></div>
<div class="line"><span class="bullet">22. EMBL protein_id</span></div>
</td>
</tr>
</tbody>
</table>
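<p>Given the column list above, one line of the mapping file can be split into the UniProtKB ID (column 2) and its GO IDs (column 8). The following is a minimal sketch: the sample row is rebuilt from the example shown earlier, with the empty fields (PDB, PIRSF, IPI, PIR-PSD accession, MIM, UniGene, Ensembl) filled in as an assumption so that the row has 22 columns, and <code>parse_idmapping_line</code> is a hypothetical helper name.</p>

```python
ID_COLUMN = 1  # zero-based index of column 2, UniProtKB ID
GO_COLUMN = 7  # zero-based index of column 8, GO


def parse_idmapping_line(line):
    """Return (UniProtKB ID, list of GO IDs) for one tab-separated line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= GO_COLUMN:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    # GO IDs within the column are separated by "; " in the sample row.
    go_ids = [item.strip() for item in fields[GO_COLUMN].split(";") if item.strip()]
    return fields[ID_COLUMN], go_ids


# The example row, with empty columns restored (an assumption; see lead-in).
SAMPLE_ROW = "\t".join([
    "Q6GZX4", "001R_FRG3G", "2947773", "YP_031579.1", "81941549; 49237298",
    "", "PF04947", "GO:0006355; GO:0046782; GO:0006351", "", "",
    "UniRef100_Q6GZX4", "UniRef90_Q6GZX4", "UniRef50_Q6GZX4", "UPI00003B0FD4",
    "", "654924", "", "", "", "15165820", "AY548484", "AAT09660.1",
])

uniprot_id, go_ids = parse_idmapping_line(SAMPLE_ROW)
print(uniprot_id, go_ids)
```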
</li>
<li>
<p>Using <code>idmapping.tb.gz</code>, look up the <code>UniProtKB ID</code> of each BLAST hit and assign the GO IDs in column 8 to the corresponding gene.</p>
<table>
<tbody>
<tr>
<td class="code">
<div class="line">python UniProt2GO_annotate.py idmapping.tb.gz blastout outputfile</div>
</td>
</tr>
</tbody>
</table>
<p>The result looks like this:</p>
<table>
<tbody>
<tr>
<td class="code">
<div class="line"><span class="selector-tag">c93619_g2_i1 <span class="selector-tag">GO<span class="selector-pseudo">:0005506,<span class="selector-tag">GO<span class="selector-pseudo">:0016705,<span class="selector-tag">GO<span class="selector-pseudo">:0016021,<span class="selector-tag">GO<span class="selector-pseudo">:0004497,<span class="selector-tag">GO<span class="selector-pseudo">:0020037</span></span></span></span></span></span></span></span></span></span></span></div>
<div class="line"><span class="selector-tag">c93619_g2_i3 <span class="selector-tag">GO<span class="selector-pseudo">:0009733,<span class="selector-tag">GO<span class="selector-pseudo">:0020037,<span class="selector-tag">GO<span class="selector-pseudo">:0044550,<span class="selector-tag">GO<span class="selector-pseudo">:0016021,<span class="selector-tag">GO<span class="selector-pseudo">:0016020,<span class="selector-tag">GO<span class="selector-pseudo">:0016711,<span class="selector-tag">GO<span class="selector-pseudo">:0009813,<span class="selector-tag">GO<span class="selector-pseudo">:0005789,<span class="selector-tag">GO<span class="selector-pseudo">:0005506</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></div>
<div class="line"><span class="selector-tag">c70056_g1_i1 <span class="selector-tag">GO<span class="selector-pseudo">:0005737,<span class="selector-tag">GO<span class="selector-pseudo">:0019722,<span class="selector-tag">GO<span class="selector-pseudo">:0071889,<span class="selector-tag">GO<span class="selector-pseudo">:0005829,<span class="selector-tag">GO<span class="selector-pseudo">:0001077,<span class="selector-tag">GO<span class="selector-pseudo">:0006357,<span class="selector-tag">GO<span class="selector-pseudo">:0097720,<span class="selector-tag">GO<span class="selector-pseudo">:0000978,<span class="selector-tag">GO<span class="selector-pseudo">:0046872,<span class="selector-tag">GO<span class="selector-pseudo">:0005634,<span class="selector-tag">GO<span class="selector-pseudo">:0006874</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></div>
<div class="line"><span class="selector-tag">c93748_g1_i1 <span class="selector-tag">GO<span class="selector-pseudo">:0006729,<span class="selector-tag">GO<span class="selector-pseudo">:0008124</span></span></span></span></span></div>
<div class="line"><span class="selector-tag">c107639_g1_i1 <span class="selector-tag">GO<span class="selector-pseudo">:0009737,<span class="selector-tag">GO<span class="selector-pseudo">:0009738,<span class="selector-tag">GO<span class="selector-pseudo">:0005623,<span class="selector-tag">GO<span class="selector-pseudo">:0006970,<span class="selector-tag">GO<span class="selector-pseudo">:0009651,<span class="selector-tag">GO<span class="selector-pseudo">:0045454,<span class="selector-tag">GO<span class="selector-pseudo">:0009789</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></div>
<div class="line"><span class="selector-tag">c106424_g1_i1 <span class="selector-tag">GO<span class="selector-pseudo">:0043565,<span class="selector-tag">GO<span class="selector-pseudo">:0009555,<span class="selector-tag">GO<span class="selector-pseudo">:0003700,<span class="selector-tag">GO<span class="selector-pseudo">:0005634,<span class="selector-tag">GO<span class="selector-pseudo">:0009793,<span class="selector-tag">GO<span class="selector-pseudo">:0006351</span></span></span></span></span></span></span></span></span></span></span></span></span></div>
<div class="line"><span class="selector-tag">c66585_g1_i1 <span class="selector-tag">GO<span class="selector-pseudo">:0005737,<span class="selector-tag">GO<span class="selector-pseudo">:0003746,<span class="selector-tag">GO<span class="selector-pseudo">:0003924,<span class="selector-tag">GO<span class="selector-pseudo">:0005525</span></span></span></span></span></span></span></span></span></div>
<div class="line"><span class="selector-tag">c110618_g1_i8 <span class="selector-tag">GO<span class="selector-pseudo">:0015297,<span class="selector-tag">GO<span class="selector-pseudo">:0016021,<span class="selector-tag">GO<span class="selector-pseudo">:0015238</span></span></span></span></span></span></span></div>
<div class="line"><span class="selector-tag">c105249_g1_i5 <span class="selector-tag">GO<span class="selector-pseudo">:0046872,<span class="selector-tag">GO<span class="selector-pseudo">:0043161,<span class="selector-tag">GO<span class="selector-pseudo">:0005829,<span class="selector-tag">GO<span class="selector-pseudo">:0006915,<span class="selector-tag">GO<span class="selector-pseudo">:0032648,<span class="selector-tag">GO<span class="selector-pseudo">:0050691,<span class="selector-tag">GO<span class="selector-pseudo">:0005654,<span class="selector-tag">GO<span class="selector-pseudo">:0070936,<span class="selector-tag">GO<span class="selector-pseudo">:0061630,<span class="selector-tag">GO<span class="selector-pseudo">:0005634</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></div>
<div class="line"><span class="selector-tag">c134727_g1_i1 <span class="selector-tag">GO<span class="selector-pseudo">:0072546,<span class="selector-tag">GO<span class="selector-pseudo">:0030246,<span class="selector-tag">GO<span class="selector-pseudo">:0005783</span></span></span></span></span></span></span></div>
</td>
</tr>
</tbody>
</table>
</li>
</ul>
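<p>The post does not show the contents of <code>UniProt2GO_annotate.py</code>, so the following is only a sketch of what the mapping step could look like, not the author's script. It assumes the BLAST output is whitespace-separated with the gene in column 1 and the UniProtKB ID in column 2, that GO IDs sit in column 8 of the mapping file separated by semicolons, and that the output places a tab between the gene and its comma-separated GO IDs. All function names are hypothetical.</p>

```python
import gzip


def read_blast_pairs(blast_path):
    """Return (gene, UniProtKB ID) pairs from columns 1 and 2 of the BLAST output."""
    pairs = []
    with open(blast_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                pairs.append((fields[0], fields[1]))
    return pairs


def load_go_map(idmapping_path, wanted_ids):
    """Stream the gzipped mapping file, keeping GO IDs only for wanted_ids."""
    go_map = {}
    with gzip.open(idmapping_path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8 or fields[1] not in wanted_ids:
                continue
            go_ids = [item.strip() for item in fields[7].split(";") if item.strip()]
            if go_ids:
                go_map[fields[1]] = go_ids
    return go_map


def annotate(idmapping_path, blast_path, output_path):
    """Write one 'gene<TAB>GO:...,GO:...' line per gene that received GO IDs."""
    pairs = read_blast_pairs(blast_path)
    go_map = load_go_map(idmapping_path, {subject for _, subject in pairs})
    gene_to_go = {}
    for gene, subject in pairs:
        for go_id in go_map.get(subject, []):
            terms = gene_to_go.setdefault(gene, [])
            if go_id not in terms:  # avoid duplicates when a gene has several hits
                terms.append(go_id)
    with open(output_path, "w") as out:
        for gene, terms in gene_to_go.items():
            out.write(f"{gene}\t{','.join(terms)}\n")
    return gene_to_go
```

<p>Calling <code>annotate("idmapping.tb.gz", "blastout", "outputfile")</code> would mirror the argument order of the command shown in the steps above. Reading the BLAST hits first and filtering the mapping file while streaming it keeps memory use low, since only the entries that were actually hit are stored.</p>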
<p>Further reading:</p>
<ul>
<li>Ontology Relations</li>
<li>Frequently Asked Questions (FAQ)</li>
</ul>
</div><br><br>
Source: https://www.cnblogs.com/wangshicheng/p/11171033.html