Difference between revisions of "Reference annotation yeast Mbp1"

From "A B C"
 
(39 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
<div id="BIO">
 
<div id="BIO">
 
<div class="b1">
 
<div class="b1">
Mbp1 protein reference annotation
+
Yeast Mbp1 protein reference annotation
 
</div>
 
</div>
  
__NOTOC__
+
 
 +
&nbsp;
 +
 
 +
__TOC__
 +
 
  
 +
 +
<section begin=contents_summary />
 +
A reference annotation of the ''Saccharomyces cerevisiae'' Mbp1 protein sequence that integrates annotation sources we encounter throughout the course.
 +
<section end=contents_summary />
  
  
This is a reference annotation of the ''Saccharomyces cerevisiae'' Mbp1 protein sequence that integrates annotation sources we encounter throughout the course.
 
  
  
Line 23: Line 30:
  
  
* see also: [[Reference_APSES_domains|Reference sequences of APSES domains]]
+
* see also: [[Reference APSES domains (reference species)]]
 +
* see also: [[Reference APSES proteins (reference species)]]
  
  
Line 68: Line 76:
  
 
==Annotations==
 
==Annotations==
===NCBI CDD APSES domain boundaries===
+
===NCBI CDD APSES and KilA-N domain boundaries===
 
<section begin=CDD_APSES />
 
<section begin=CDD_APSES />
  
APSES domain boundaries can be derived from the results of a [http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=pfam04383 '''CDD search'''] with the ID <tt>1BM8_A</tt>. The KilA-N superfamily domain alignment is returned. This superfamily contains the APSES domains.
+
The APSES domain is a well-defined type of DNA-binding domain that is ubiquitous in fungi and unique to that kingdom. Structurally it is a member of the Winged Helix-Turn-Helix family. Recently it was found to be homologous to the somewhat shorter, prokaryotic KilA-N domain; thus the APSES domain was retired from [http://pfam.xfam.org/family/PF02292 '''Pfam'''] and its instances were merged into the [http://pfam.xfam.org/family/PF04383 '''KilA-N'''] family. However, InterPro has a [http://www.ebi.ac.uk/interpro/entry/IPR018004 KilA-N] entry but still recognizes the [http://www.ebi.ac.uk/interpro/entry/IPR003163 APSES domain].
 +
 
 +
 
 +
KilA-N domain boundaries in Mbp1 can be derived from the results of a [http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=pfam04383 '''CDD search'''] with the ID <tt>1BM8_A</tt> (the Mbp1 DNA binding domain crystal structure). The KilA-N superfamily domain alignment is returned.  
 +
 
  
 
:<small>(pfam 04383): KilA-N domain; The amino-terminal module of the D6R/N1R proteins defines a novel, conserved DNA-binding domain (the KilA-N domain) that is found in a wide range of proteins of large bacterial and eukaryotic DNA viruses. The KilA-N domain family also includes the previously defined APSES domain. The KilA-N and APSES domains may also share a common fold with the nucleic acid-binding modules of the LAGLIDADG nucleases and the amino-terminal domains of the tRNA endonuclease.</small>
 
:<small>(pfam 04383): KilA-N domain; The amino-terminal module of the D6R/N1R proteins defines a novel, conserved DNA-binding domain (the KilA-N domain) that is found in a wide range of proteins of large bacterial and eukaryotic DNA viruses. The KilA-N domain family also includes the previously defined APSES domain. The KilA-N and APSES domains may also share a common fold with the nucleic acid-binding modules of the LAGLIDADG nucleases and the amino-terminal domains of the tRNA endonuclease.</small>
Line 88: Line 100:
 
</div>
 
</div>
  
 +
Note that CDD and SMART are '''not consistent''' in how they apply <code>Pfam 04383</code> to the Mbp1 sequence. See [[#Consolidated Annotation|annotation]] below.
  
Note that this domain definition begins at position 16 of the domain. But virtually all fungal APSES domains have a longer N-terminus. Blindly applying this domain definition would lose important information. '''For most purposes we will prefer the sequence spanned by the <tt>1BM8_A</tt> structure.''' The sequence is given below, the <span style="color: #005522;">KilA-N domain is colored dark green</span>.
+
The CDD KilA-N domain definition begins at position 16 of the 1BM8 sequence. But virtually all fungal APSES domains have a longer, structurally defined, conserved N-terminus. Blindly applying the KilA-N domain definition to these proteins would lose important information. '''For most purposes we will prefer the sequence spanned by the <tt>1BM8_A</tt> structure.''' The sequence is given below, the <span style="color: #005522;">KilA-N domain is coloured dark green</span>. By this definition the APSES domain is 99 amino acids long and comprises residues 4 to 102 of the <code>NP_010227</code> sequence.
  
 
<div style="line-height: 0; white-space: pre; border: solid 1px #445577; background-color: #F4F6FF; font-family: 'Courier New', Courier, monospace; padding: 10px;">
 
<div style="line-height: 0; white-space: pre; border: solid 1px #445577; background-color: #F4F6FF; font-family: 'Courier New', Courier, monospace; padding: 10px;">
Line 100: Line 113:
 
<p style="line-height:0.3;">[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&doptcmdl=GenPept&db=Protein&term=1BM8A 1BM8A]        <span style="color: #888888"> 81 </span><span style="color: #005522">KQLAEKFSVY</span><span style="color: #2233CC;">DQLKPLFDF</span> <span style="color: #888888">99</span></p>
 
<p style="line-height:0.3;">[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&doptcmdl=GenPept&db=Protein&term=1BM8A 1BM8A]        <span style="color: #888888"> 81 </span><span style="color: #005522">KQLAEKFSVY</span><span style="color: #2233CC;">DQLKPLFDF</span> <span style="color: #888888">99</span></p>
 
</div>
 
</div>
 +
 +
 +
&nbsp;
 +
;Yeast APSES domain sequence in FASTA format
 +
 +
>APSES_MBP1 Residues 4-102 of S. cerevisiae Mbp1
 +
QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRI
 +
LEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDF
 +
 +
 +
&nbsp;
 +
 +
;Synopsis of ranges
 +
<table>
 +
 +
<tr class="sh">
 +
<td><b>Domain</b></td>
 +
<td><b>Link</b></td>
 +
<td><b>Length</b></td>
 +
<td><b>Boundary</b></td>
 +
<td><b>Range (Mbp1)</b></td>
 +
<td><b>Range (1BM8)</b></td>
 +
</tr>
 +
 +
<tr><td colspan="6" class="sp">&nbsp;</td></tr>
 +
 +
<tr class="s2">
 +
<td>KilA-N: pfam04383 (CDD)</td>
 +
<td>[http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?ascbin=8&maxaln=10&seltype=2&uid=252556&querygi=6320147&aln=4,18,2,29,48,31,19,67,65,20,87,89,6 CDD alignment]</td>
 +
<td>72</td>
 +
<td><tt>STGSI ... KFSVY</tt></td>
 +
<td>21 - 93</td>
 +
<td>18 - 90</td>
 +
</tr>
 +
 +
<tr class="s1">
 +
<td>KilA-N: pfam04383 (SMART)</td>
 +
<td>[http://smart.embl-heidelberg.de/smart/set_mode.cgi?NORMAL=1 Smart main page]</td>
 +
<td>79</td>
 +
<td><tt>IHSTG ... YDQLK</tt></td>
 +
<td>19 - 97</td>
 +
<td>16 - 94</td>
 +
</tr>
 +
 +
<tr class="s2">
 +
<td>KilA-N: SM01252 (SMART)</td>
 +
<td>[http://smart.embl-heidelberg.de/smart/set_mode.cgi?NORMAL=1 Smart main page]</td>
 +
<td>84</td>
 +
<td><tt>TGSIM ... DFTQT</tt></td>
 +
<td>22 - 105</td>
 +
<td>19 - 99...</td>
 +
</tr>
 +
 +
<tr class="s1">
 +
<td>APSES: Interpro IPR003163</td>
 +
<td>[http://www.ebi.ac.uk/interpro/protein/P39678 (Interpro)]</td>
 +
<td>130</td>
 +
<td><tt>QIYSA ... IRSAS</tt></td>
 +
<td>3 - 133</td>
 +
<td>1 - 99...</td>
 +
</tr>
 +
 +
<tr class="s2">
 +
<td>APSES (1BM8)</td>
 +
<td>&ndash;</td>
 +
<td>99</td>
 +
<td><tt>QIYSA ... PLFDF</tt></td>
 +
<td>4 - 102</td>
 +
<td>1 - 99</td>
 +
</tr>
 +
</table>
  
 
<section end=CDD_APSES />
 
<section end=CDD_APSES />
Line 242: Line 326:
 
== Orthologs ==
 
== Orthologs ==
  
The Mbp1 orthologs in the six fungal reference species.
+
The Mbp1 orthologs in the [[Reference_species_for_fungi|ten fungal reference species]].
* ''Saccharomyces cerevisiae'' (SACCE)
+
 
 
* ''Aspergillus nidulans'' (ASPNI)
 
* ''Aspergillus nidulans'' (ASPNI)
* ''Candida albicans'' (CANAL)
+
* ''Bipolaris oryzae'' (BIPOR)
 +
* ''Coprinopsis cinerea'' (COPCI)
 +
* ''Cryptococcus neoformans'' (CRYNE)
 
* ''Neurospora crassa'' (NEUCR)
 
* ''Neurospora crassa'' (NEUCR)
 +
* ''Puccinia graminis'' (PUCGR)
 +
* '''''Saccharomyces cerevisiae'' (SACCE)'''
 
* ''Schizosaccharomyces pombe'' (SCHPO)
 
* ''Schizosaccharomyces pombe'' (SCHPO)
 
* ''Ustilago maydis'' (USTMA)
 
* ''Ustilago maydis'' (USTMA)
 +
* ''Wallemia mellicola'' (WALME)
  
 +
Orthologs were determined by [[BLAST scripting|RBM]] to [http://www.ncbi.nlm.nih.gov/protein/NP_010227 '''NP_010227'''], residues 4 to 102. UniProt accession numbers were obtained from the [http://www.uniprot.org/uploadlists/ UniProt mapping service].
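The RBM criterion can be sketched in a few lines, assuming that RBM here stands for "reciprocal best match": a hit in the target species counts only if searching back with that hit returns the original query as the best match. The score tables, the subject names <code>targetA</code> and <code>targetB</code>, and the function names below are hypothetical placeholders for illustration; real scores would come from the BLAST searches described on the linked page.

```python
def best_hit(query, scores):
    """Return the subject with the highest score for query, or None if there are no hits."""
    hits = scores.get(query, {})
    if not hits:
        return None
    return max(hits, key=hits.get)


def reciprocal_best_match(query, forward, reverse):
    """Return the best forward hit if searching back with it returns the query, else None."""
    hit = best_hit(query, forward)
    if hit is not None and best_hit(hit, reverse) == query:
        return hit
    return None


# Hypothetical bit scores: yeast query -> target-species proteins ...
forward = {"Mbp1": {"targetA": 210.0, "targetB": 95.0}}
# ... and target-species proteins -> yeast proteins (made-up numbers).
reverse = {
    "targetA": {"Mbp1": 205.0, "Swi4": 120.0},
    "targetB": {"Swi4": 150.0, "Mbp1": 90.0},
}

print(reciprocal_best_match("Mbp1", forward, reverse))
```

With these made-up scores only <code>targetA</code> satisfies the criterion; <code>targetB</code> would be rejected even if it were the sole forward hit, because its best match in yeast is a different protein.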
  
 
+
<table width="80%">
<table width="60%">
 
  
 
<tr class="sh">
 
<tr class="sh">
Line 263: Line 352:
  
 
<tr class="s1">
 
<tr class="s1">
<td class="sc">''Saccharomyces cerevisiae''</td>
 
<td class="sc">SACCE</td>
 
<td class="sc">Mbp1p</td>
 
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/NP_010227 NP_010227] <small>[http://www.ncbi.nlm.nih.gov/protein/NP_010227?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
 
<td class="sc">[http://www.uniprot.org/uniprot/P39678 P39678]</td>
 
</tr>
 
 
<tr class="s2">
 
 
<td class="sc">''Aspergillus nidulans''</td>
 
<td class="sc">''Aspergillus nidulans''</td>
 
<td class="sc">ASPNI</td>
 
<td class="sc">ASPNI</td>
Line 278: Line 359:
 
</tr>
 
</tr>
  
 +
<tr class="s2">
 +
<td class="sc">''Bipolaris oryzae''</td>
 +
<td class="sc">BIPOR</td>
 +
<td class="sc">COCMIDRAFT_338</td>
 +
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XP_007682304 XP_007682304] <small>[http://www.ncbi.nlm.nih.gov/protein/XP_007682304?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
 +
<td class="sc">[http://www.uniprot.org/uniprot/W6ZM86 W6ZM86]</td>
 +
</tr>
  
 
<tr class="s1">
 
<tr class="s1">
<td class="sc">''Candida albicans''</td>
+
<td class="sc">''Coprinopsis cinerea''</td>
<td class="sc">CANAL</td>
+
<td class="sc">COPCI</td>
<td class="sc">potential DNA binding component of MBF</td>
+
<td class="sc">CC1G_01306</td>
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XP_723071 XP_723071] <small>[http://www.ncbi.nlm.nih.gov/protein/XP_723071?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
+
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XP_001837394 XP_001837394] <small>[http://www.ncbi.nlm.nih.gov/protein/XP_001837394?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
<td class="sc">[http://www.uniprot.org/uniprot/Q5ANP5 Q5ANP5]</td>
+
<td class="sc">[http://www.uniprot.org/uniprot/A8NYC6 A8NYC6]</td>
 
</tr>
 
</tr>
  
 
<tr class="s2">
 
<tr class="s2">
 +
<td class="sc">''Cryptococcus neoformans''</td>
 +
<td class="sc">CRYNE</td>
 +
<td class="sc">CND05520</td>
 +
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XP_570545 XP_570545] <small>[http://www.ncbi.nlm.nih.gov/protein/XP_570545?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
 +
<td class="sc">[http://www.uniprot.org/uniprot/Q5KHS0 Q5KHS0]</td>
 +
</tr>
 +
 +
<tr class="s1">
 
<td class="sc">''Neurospora crassa''</td>
 
<td class="sc">''Neurospora crassa''</td>
 
<td class="sc">NEUCR</td>
 
<td class="sc">NEUCR</td>
<td class="sc"> NCU07246 </td>
+
<td class="sc">NCU07246</td>
 
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XP_955821 XP_955821] <small>[http://www.ncbi.nlm.nih.gov/protein/XP_955821?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
 
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XP_955821 XP_955821] <small>[http://www.ncbi.nlm.nih.gov/protein/XP_955821?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
 
<td class="sc">[http://www.uniprot.org/uniprot/Q7RW59 Q7RW59]</td>
 
<td class="sc">[http://www.uniprot.org/uniprot/Q7RW59 Q7RW59]</td>
 +
</tr>
 +
 +
<tr class="s2">
 +
<td class="sc">''Puccinia graminis''</td>
 +
<td class="sc">PUCGR</td>
 +
<td class="sc">PGTG_08863</td>
 +
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XP_003327086 XP_003327086] <small>[http://www.ncbi.nlm.nih.gov/protein/XP_003327086?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
 +
<td class="sc">[http://www.uniprot.org/uniprot/E3KED4 E3KED4]</td>
 
</tr>
 
</tr>
  
 
<tr class="s1">
 
<tr class="s1">
 +
<td class="sc">''Saccharomyces cerevisiae''</td>
 +
<td class="sc">SACCE</td>
 +
<td class="sc">Mbp1</td>
 +
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/NP_010227 NP_010227] <small>[http://www.ncbi.nlm.nih.gov/protein/NP_010227?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
 +
<td class="sc">[http://www.uniprot.org/uniprot/P39678 P39678]</td>
 +
</tr>
 +
 +
<tr class="s2">
 
<td class="sc">''Schizosaccharomyces pombe''</td>
 
<td class="sc">''Schizosaccharomyces pombe''</td>
 
<td class="sc">SCHPO</td>
 
<td class="sc">SCHPO</td>
<td class="sc">MBF transcription factor complex subunit Res2</td>
+
<td class="sc">Res2</td>
 
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/NP_593032 NP_593032] <small>[http://www.ncbi.nlm.nih.gov/protein/NP_593032?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
 
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/NP_593032 NP_593032] <small>[http://www.ncbi.nlm.nih.gov/protein/NP_593032?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
 
<td class="sc">[http://www.uniprot.org/uniprot/P41412 P41412]</td>
 
<td class="sc">[http://www.uniprot.org/uniprot/P41412 P41412]</td>
 
</tr>
 
</tr>
  
<tr class="s2">
+
<tr class="s1">
 
<td class="sc">''Ustilago maydis''</td>
 
<td class="sc">''Ustilago maydis''</td>
 
<td class="sc">USTMA</td>
 
<td class="sc">USTMA</td>
<td class="sc">hypothetical protein UM06196.1</td>
+
<td class="sc">UM06196</td>
 
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XP_762343 XP_762343] <small>[http://www.ncbi.nlm.nih.gov/protein/XP_762343?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
 
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XP_762343 XP_762343] <small>[http://www.ncbi.nlm.nih.gov/protein/XP_762343?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
 
<td class="sc">[http://www.uniprot.org/uniprot/Q4P117 Q4P117]</td>
 
<td class="sc">[http://www.uniprot.org/uniprot/Q4P117 Q4P117]</td>
 +
</tr>
 +
 +
<tr class="s2">
 +
<td class="sc">''Wallemia mellicola''</td>
 +
<td class="sc">WALME</td>
 +
<td class="sc">WALSEDRAFT_59726</td>
 +
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XP_006957051 XP_006957051] <small>[http://www.ncbi.nlm.nih.gov/protein/XP_006957051?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
 +
<td class="sc">[http://www.uniprot.org/uniprot/I4YGC0 I4YGC0]</td>
 
</tr>
 
</tr>
  
 
</table>
 
</table>
  
 +
 +
 +
 +
;FASTA formatted sequences
 +
(Header lines were edited to begin with a code identifying the sequence as orthologous to Mbp1, and including the species shorthand code.)
 +
 +
 +
<source lang="text">
 +
>MBP1_ASPNI XP_660758 AN3154.2 [Aspergillus nidulans FGSC A4]
 +
MAAVDFSNVYSATYSSVPVYEFKIGTDSVMRRRSDDWINATHILKVAGFDKPARTRILEREVQKGVHEKV
 +
QGGYGKYQGTWIPLQEGRQLAERNNILDKLLPIFDYVAGDRSPPPAPKHTSAASKPRAPKINKRVVKEDV
 +
FSAVNHHRSMGPPSFHHEHYDVNTGLDEDESIEQATLESSSMIADEDMISMSQNGPYSSRKRKRGINEVA
 +
AMSLSEQEHILYGDQLLDYFMTVGDAPEATRIPPPQPPANFQVDRPIDDSGNTALHWACAMGDLEIVKDL
 +
LRRGADMKALSIHEETPLVRAVLFTNNYEKRTFPALLDLLLDTISFRDWFGATLFHHIAQTTKSKGKWKS
 +
SRYYCEVALEKLRTTFSPEEVDLLLSCQDSVGDTAVLVAARNGVFRLVDLLLSRCPRAGDLVNKRGETAS
 +
SIMQRAHLAERDIPPPPSSITMGNDHIDGEVGAPTSLEPQSVTLHHESSPATAQLLSQIGAIMAEASRKL
 +
TSSYGAAKPSQKDSDDVANPEALYEQLEQDRQKIRRQYDALAAKEAAEESSDAQLGRYEQMRDNYESLLE
 +
QIQRARLKERLASTPVPTQTAVIGSSSPEQDRLLTTFQLSRALCSEQKIRRAAVKELAQQRADAGVSTKF
 +
DVHRKLVALATGLKEEELDPMAAELAETLEFDRMNGKGVGPESPEADHKDSASLPFPGPVVSVDA
 +
 +
>MBP1_BIPOR XP_007682304 COCMIDRAFT_338 [Bipolaris oryzae ATCC 44560]
 +
MPPAPDGKIYSATYSNVPVYECNVNGHHVMRRRADDWINATHILKVADYDKPARTRILEREVQKGVHEKV
 +
QGGYGKYQGTWIPLEEGRGLAERNGVLDKMRAIFDYVPGDRSPPPAPKHATAASNRMKPPRQTAAAVAAA
 +
AVAAAAAAAAVANHNALMSNSRSQASEDPYENSQRSQIYREDTPDNETVISESMLGDADLMDMSQYSADG
 +
NRKRKRGMDQMSLLDQQHQIWADQLLDYFMLLDHEAAVSWPEPPPSINLDRPIDEKGHAAMHWAAAMGDV
 +
GVVKELIHRGARLDCLSNNLETPLMRAVMFTNNFDKETMPSMVKIFQQTVHRTDWFGSTVFHHIAATTSS
 +
SNKYVCARWYLDCIINKLSETWIPEEVTRLLNAADQNGDTAIMIAARNGARKCVRSLLGRNVAVDIPNKK
 +
GETADDLIRELNQRRRMHGRTRQASSSPFAPAPEHRLNGHVPHFDGGPLMSVPVPSMAVRESVQYRSQTA
 +
SHLMTKVAPTLLEKCEELATAYEAELQEKEAEFFDAERVVKRRQAELEAVRKQVAELQSMSKGLHIDLND
 +
EEAERQQEDELRLLVEEAESLLEIEQKAELRRLCSSMPQQNSDSSPVDITEKMRLALLLHRAQLERRELV
 +
REVVGNLSVAGMSEKQGTYKKLIAKALGEREEDVESMLPEILQELEEAETQERAEGLDGSPV
 +
 +
>MBP1_COPCI XP_001837394 CC1G_01306 [Coprinopsis cinerea okayama7#130]
 +
MPEAQIFKATYSGIPVYEMMCKGVAVMRRRSDSWLNATQILKVAGFDKPQRTRVLEREVQKGEHEKVQGG
 +
YGKYQGTWIPLERGMQLAKQYNCEHLLRPIIEFTPAAKSPPLAPKHLVATAGNRPVRKPLTTDLSAAVIN
 +
TRSTRKQVADGVGEESDHDTHSLRGSEDGSMTPSPSEASSSSRTPSPIHSPGTYHSNGLDGPSSGGRNRY
 +
RQSNDRYDEDDDASRHNGMGDPRSYGDQILEYFISDTNQIPPILITPPPDFDPNMAIDDDGHTSLHWACA
 +
MGRIRIVKLLLSAGADIFKVNKAGQTALMRSVMFANNYDVRKFPELYELLHRSTLNIDNSNRTVFHHVVD
 +
VAMSKGKTHAARYYMETILTRLADYPKELADVINFQDEDGETALTMAARCRSKRLVKLLIDHGADPKINN
 +
HDGKNAEDYILEDERFRSSPAPSSRVAAMSYRNAQVAYPPPGAPSTYSFAPANHDRPPLHYSAAAQKAST
 +
RCVNDMASMLDSLAASFDQELRDKERDMAQAQALLTNIQAEILESQRTVLQLRQQAEGLSQAKQRLADLE
 +
NALQDKMGRRYRLGFEKWIKDEETREKVIRDAANGDLVLTPATTSYTVDEDGDSDSGSNGDKNKGKRKAQ
 +
VQQEEVSDLVELYSNIPTDPEELRKQCEALREEVSQSRKRRKAMFDELVTFQAEAGTSGRMSDYRRLIAA
 +
GCGGLEPLEIDSVLGMLLETLEAEDPSSTSATWSGSKGQQTG
 +
 +
>MBP1_CRYNE XP_570545 CND05520 [Cryptococcus neoformans var. neoformans JEC21]
 +
MEPPSNPIQPPVTPSHHSLLSAISPALSEQTPAPIHTLPPHLRPSIPQPHIAPPRPSSVQPTMEEQQRMH
 +
HIQQHQQQQHFQQQQNDENVFGSVMGAPGHVPGHEAPMSTQPKVYASVYSGVPVFEAMIRGISVMRRASD
 +
SWVNATQILKVAGVHKSARTKILEKEVLNGIHEKIQGGYGKYQGTWVPLDRGRDLAEQYGVGSYLSSVFD
 +
FVPSASVIAALPVIRTGTPDRSGQQTPSGLPGHPNQRVISPFANHGQTTPHMPPPQFIHQGNEQMMNLPP
 +
HPSSLAYPTQPKPYFSMPLQHTVGPQYDERHEGMTMTPTMSMDGLAPPADIARMGFPYNPSDIYIDQYGQ
 +
PHATYQASPYGKESGHPSKRQRSDAEGSYIESGAAVQQHVEQDEEADDGLDNDSTASDDARDPPPLPSSM
 +
LLPHKPIRPKATPANGRIKSRLVQIFNVEGQVNLRSVFGLAPDQLPNFDIDMVIDDQGHSALHWACALAR
 +
LSIVQQLIELGADIHRGNYAGETPLIRAVLTSNHAEAGSFTDLLHLLSPSIRTLDHAYRTVLHHIALVAG
 +
VKGRVPAARTYMASVLEWVAREQQANNTHSITNPPNPADRNELAPINLRTLVDVQDVHGDTALNVAARVG
 +
NKGLVGLLLDAGADKTRANKLGLRPENFGLEIEALKISNGEAVMANLKSEVSKPERKSRDVQKNIATIFE
 +
SISSTFSSEMLAKQTKLNATEASVRHATRALADKRQHLHRAQEKLATMQLFEQRSENVRRIMDAIAAGTL
 +
LTPAEFTGRTQTMHEKSTGQLPPLAFRHVPGLALDASSQSQLNGAPPSTPLSVEDQEDIALPERDDPECL
 +
VKLRRMALWEDRIAEVLEDKIRAMEGEGVDRAVKYRKLVSVCAKVPVDKVDSMLDGLVAAVESEGQGLDF
 +
SRASNFVNRIKATKS
 +
 +
>MBP1_NEUCR XP_955821 NCU07246 [Neurospora crassa OR74A]
 +
MVKENVGGNPEPGIYSATYSGIPVWEYQFGVDLKEHVMRRRHDDWVNATHILKAAGFDKPARTRILEREV
 +
QKDTHEKIQGGYGRYQGTWIPLEQAEALARRNNIYERLKPIFEFQPGNESPPPAPRHASKPKAPKVKPAV
 +
PTWGSKSAKNANPPQPGTFLPPGRKGLPAQAPDYNDADTHMHDDDTPDNLTVASASYMAEDDRYDHSHFS
 +
TGHRKRKRDELIEDMTEQQHAVYGDELLDYFLLSRNEQPAVRPDPPPNFKPDWPIDNERHTCLHWASAMG
 +
DVDVMRQLKKFGASLDAQNVRGETPFMRAVNFTNCFEKQTFPQVMKELFSTIDCRDLSGCTVIHHAAVMK
 +
IGRVNSQSCSRYYLDIILNRLQETHHPEFVQQLLDAQDNDGNTAVHLAAMRDARKCIRALLGRGASTDIP
 +
NKQGIRAEELIKELNASISKSRSNLPQRSSSPFAPDTQRHDAFHEAISESMVTSRKNSQPNYSSDAANTV
 +
QNRITPLVLQKLKDLTATYDSEFKEKDDAEKEARRILNKTQSELKALTASIDDYNSRLDTDDVAAKTAAE
 +
MATARHKVLAFVTHQNRISVQEAVKQELAALDRANAVTNGTSTKSKSSSPSKKPKLSPIPDQKDKPPKDE
 +
NETESEAEHPDPPAAQAHQQQPGPSSQDTEVEDQDREEEEDDYTHRLSLAAELRSILQEQRSAENDYVEA
 +
RGMLGTGERIDKYKHLLMSCLPPDEQENLEENLEEMIKLMEQEDESVTDLPAGAVGGGGGGNAADGSGGG
 +
GQPSNGRRESVLPALRGGNGDGEMSRRGSRTAAAAAAQVDGEREINGRAGAERTERIQEIAAV
 +
 +
>MBP1_PUCGR XP_003327086 PGTG_08863 [Puccinia graminis f. sp. tritici CRL 75-36-700-3]
 +
MAYGGSIQPLRPPSRESATLHLHQPDLTVTSPPLSLTHCPPCVYSHFTHTPTSLIVIQVSLHSLLDQETY
 +
HLLPSRSPPTVSVRMGTTTIYKATYSGVPVLEMPCEGIAVMRRRSDSWLNATQILKVAGFDKPQRTRVLE
 +
REIQKGTHEKIQGGYGKYQGTWVPLDRGIDLAKQYGVDHLLSALFNFQPSSNESPPLAPKHVTALSTRVK
 +
VSKVSAASAARAARAVVPSLPSTSGLGGRNTNNSWSNFDSDNEPGLPPAASSRESNGNWATQSKLARSSN
 +
LARARANINNSHPEDLPVPAPDQLQASPLPSMQTADPENDNSLTPSELSLPSRTPSPIEDLPLTVNTASS
 +
QSTRNKGKSRDLPDDEDLSRGQKRKYDTSLVEDTSYSDGADDQYINGNPSNAASAKYAKLILDYFVSESS
 +
QIPNFLNDPPSDFDPNVVIDDDGHTALHWACAMGRIKIIKLLLTCGADIFRANNAGQTALMRAVMFTNNH
 +
DLRTFPELFESFSGSVINIDRTDRTVFHYVIDIALTKGKVPAARYYLETILSQLSEYPKELIDILNFQDE
 +
DGETALTLAARCRSKKLVKILLDHGANPKTANRDGKSAEDYILEDDKFRALSPTPCSSGPIRQLDQNSPG
 +
GTSNRSDFVDLVDPVPIDSNLIPQRSPNASPPHYSETGQRVTKQLLPEVTSMIELLATTFDTELQDKERD
 +
LDHAVGLLSNIEKEYLEGQRKILNYERMLSDFGEKKLALGDLEKELNDKLGKRYRFGWEKYVRDEEERAR
 +
RITEQRSKYLQELSIEDRKLLDSSNLRFADPSKQEVLMKLQADERENSDLLNLIRTNSTDVESECDLLRE
 +
SVQKLSEERERLFKEFINLSSENTGGENEEDDGANHTSANTSRLNNYRKLISLGCGGIGLDEVDEVIESL
 +
NEGIDVNELNDNGFLTEQDEELGNHQNYHNIHTQGR
 +
 +
>MBP1_SACCE NP_010227 Mbp1p [Saccharomyces cerevisiae S288c]
 +
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
 +
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET
 +
KRNNKKAEENQFQSSKILGNPTAAPRKRGRPVGSTRGSRRKLGVNLQRSQSDMGFPRPAIPNSSISTTQL
 +
PSIRSTMGPQSPTLGILEEERHDSRQQQPQQNNSAQFKEIDLEDGLSSDVEPSQQLQQVFNQNTGFVPQQ
 +
QSSLIQTQQTESMATSVSSSPSLPTSPGDFADSNPFEERFPGGGTSPIISMIPRYPVTSRPQTSDINDKV
 +
NKYLSKLVDYFISNEMKSNKSLPQVLLHPPPHSAPYIDAPIDPELHTAFHWACSMGNLPIAEALYEAGTS
 +
IRSTNSQGQTPLMRSSLFHNSYTRRTFPRIFQLLHETVFDIDSQSQTVIHHIVKRKSTTPSAVYYLDVVL
 +
SKIKDFSPQYRIELLLNTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLTANEIMNQQYEQM
 +
MIQNGTNQHVNSSNTDLNIHVNTNNIETKNDVNSMVIMSPVSPSDYITYPSQIATNISRNIPNVVNSMKQ
 +
MASIYNDLHEQHDNEIKSLQKTLKSISKTKIQVSLKTLEVLKESSKDENGEAQTNDDFEILSRLQEQNTK
 +
KLRKRLIRYKRLIKQKLEYRQTVLLNKLIEDETQATTNNTVEKDNNTLERLELAQELTMLQLQRKNKLSS
 +
LVKKFEDNAKIHKYRRIIREGTEMNIEEVDSSLDVILQTLIANNNKNKGAEQIITISNANSHA
 +
 +
>MBP1_SCHPO NP_593032 Res2 [Schizosaccharomyces pombe 972h-]
 +
MAPRSSAVHVAVYSGVEVYECFIKGVSVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQ
 +
GGYGKYQGTWVPFQRGVDLATKYKVDGIMSPILSLDIDEGKAIAPKKKQTKQKKPSVRGRRGRKPSSLSS
 +
STLHSVNEKQPNSSISPTIESSMNKVNLPGAEEQVSATPLPASPNALLSPNDNTIKPVEELGMLEAPLDK
 +
YEESLLDFFLHPEEGRIPSFLYSPPPDFQVNSVIDDDGHTSLHWACSMGHIEMIKLLLRANADIGVCNRL
 +
SQTPLMRSVIFTNNYDCQTFGQVLELLQSTIYAVDTNGQSIFHHIVQSTSTPSKVAAAKYYLDCILEKLI
 +
SIQPFENVVRLVNLQDSNGDTSLLIAARNGAMDCVNSLLSYNANPSIPNRQRRTASEYLLEADKKPHSLL
 +
QSNSNASHSAFSFSGISPAIISPSCSSHAFVKAIPSISSKFSQLAEEYESQLREKEEDLIRANRLKQDTL
 +
NEISRTYQELTFLQKNNPTYSQSMENLIREAQETYQQLSKRLLIWLEARQIFDLERSLKPHTSLSISFPS
 +
DFLKKEDGLSLNNDFKKPACNNVTNSDEYEQLINKLTSLQASRKKDTLYIRKLYEELGIDDTVNSYRRLI
 +
AMSCGINPEDLSLEILDAVEEALTREK
 +
 +
>MBP1_USTMA XP_762343 UM06196 [Ustilago maydis 521]
 +
MSGDKTIFKATYSGVPVYECIINNVAVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREIQKGIHEKVQG
 +
GYGKYQGTWIPLDVAIELAERYNIQGLLQPITSYVPSAADSPPPAPKHTISTSNRSKKIIPADPGALGRS
 +
RRATSIETESEVIGAAPNNVSEGSMSPSPSDISSSSRTPSPLPADRAHPLHANHALAGYNGRDANNHARY
 +
ADIILDYFVTENTTVPSLLINPPPDFNPDMSIDDDEHTALHWACAMGRIRVVKLLLSAGADIFRVNSNQQ
 +
TALMRATIFPNSLSSFTDPSLNIDRNDRTVFHHVVDLALSRGKPHAARYYMETMINRLADYGDQLADILN
 +
FQDDEGETPLTMAARARSKRLVRLLLEHGADPKIRNKEGKNAEDYIIEDERFRSSPSRTGPAGIELGADG
 +
LPVLPTSSLHTSEAGQRTAGRAVTLMSNLLHSLADSYDSEINTAEKKLTQAHGLLKQIQTEIEDSAKVAE
 +
ALHHEAQGVDEERKRVDSLQLALKHAINKRARDDLERRWSEGKQAIKRARLQAGLEPGALSTSNATNAPA
 +
TGDQKSKDDAKSLIEALPAGTNVKTAIAELRKQLSQVQANKTELVDKFVARAREQGTGRTMAAYRRLIAA
 +
GCGGIAPDEVDAVVGVLCELLQESHTGARAGAGGERDDRARDVAMMLKAFPVYSRCIVMNRQLAVTRYPC
 +
CRLLFYSLPCRTNMISGLWMQSDSVAAVLARSNAVLRISPCPKCARMSKLQAHLYEASAARLCGGKMLRR
 +
TLALFSEAARSSSSSSASAAASSSASILTSHLSKAHLPPSLARSAKPHKNLYQMLSTLPKDGVGARVRQR
 +
RWAAKGLDVSHDVDLKAHLAKLHHTGATKTNKDEGHLCYWEITKVRLKDGGNHGKAWGRFVWRERNAGVV
 +
KQGQAQAKLTKVCLSMVVAHPGKPITKAESGERIPGALKYCWDLAH
 +
 +
>MBP1_WALME XP_006957051 WALSEDRAFT_59726 [Wallemia mellicola CBS 633.66]
 +
MSAPPIYKACYSGVPVYEFNCKNVAVMKRRSDSWMNATQILKVANFDKPQRTRILEREVQKGTHEKVQGG
 +
YGKYQGTWIPMERSVELARQYRIELLLDPIINYLPGPQSPPLAPKHATNVGSRARKSTAPAAQTLPSTSK
 +
VFHPLSSTKHPAKLAAATNAKAEISDGEDASIPSSPSFKSNSSRTPSPIRINARKRKLEDEATIPSSAID
 +
GSISYEDIILDYFISESTQIPALLIHPPSDFNPNMSIDDEGHTAMHWACAMGKVRVVKLLLSAGADIFRV
 +
NHSEQTALMRSVMFSNNYDIRKFPQLYELLHRSTLNLDKHDRTVLHHIVDLALTKSKTHAARYYMECVLS
 +
KLANYPDELADVINFQDDEGESALTLAARARSKRLVKLLLEHGADSKLPNKDGKTAEDYILEDERFRQSP
 +
LLNSNHLRLHPPDTSIYAPPAHLFNSETSQNIANTSMSSVANLLESLAQSYDKEITQKERDYQQAQVILR
 +
NIKTDIVEAKSNIEKMTIDSSEFEHLKHKLRELEMKLEEHSNDVYNKGWEEYSRNVDDPAIDAPSDNVQE
 +
ECASLRNKIKDLQEKRISSMQELIKRQKEVGTGKKMSEYRKLISVGCGIPTTEIDAVLEMLLESLESENA
 +
NKKAALASGISGALSSTSSAPSQATTSAPTGVATPGAPVPASSEKAGLLPPAPVMQ
 +
 +
</source>
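Because the edited header lines follow a regular pattern (ortholog code, underscore, five-letter species code, accession, optional gene label, organism in square brackets), they can be taken apart programmatically. The following is a minimal sketch under that assumption; the regular expression and the function name <code>parse_header</code> are illustrative and not part of the course material.

```python
import re

# Pattern for headers such as:
# >MBP1_SCHPO NP_593032 Res2 [Schizosaccharomyces pombe 972h-]
HEADER_RE = re.compile(
    r"^>(?P<gene>\w+)_(?P<species>[A-Z]{5})\s+"  # e.g. MBP1_SCHPO
    r"(?P<accession>\S+)\s+"                     # e.g. NP_593032
    r"(?P<label>[^\[]*?)\s*"                     # gene or locus label, may be empty
    r"\[(?P<organism>[^\]]+)\]$"                 # organism name in brackets
)


def parse_header(line):
    """Split an edited FASTA header line into its named fields."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("unexpected header format: " + line)
    return match.groupdict()


fields = parse_header(">MBP1_SCHPO NP_593032 Res2 [Schizosaccharomyces pombe 972h-]")
print(fields["species"], fields["accession"], fields["organism"])
```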
  
  
 
===Ortholog APSES domains===
 
===Ortholog APSES domains===
  
The ortholog APSES domains can be aligned without gaps. They comprise the following sequences:
+
The ortholog APSES domains can be aligned (nearly) without gaps. They comprise the following sequences:
  
>Mbp1_SACCE/2-100 NP_010227
+
<source lang="text">
QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQG
+
CLUSTAL format alignment by MAFFT L-INS-i (v6.850b)
TWVPLNIAKQLAEKFSVYDQLKPLFDF
 
>Mbp1_ASPNI/2-100 XP_660758
 
NVYSATYSSVPVYEFKIGTDSVMRRRSDDWINATHILKVAGFDKPARTRILEREVQKGVHEKVQGGYGKYQG
 
TWIPLQEGRQLAERNNILDKLLPIFDY
 
>Mbp1_CANAL/2-100 XP_722925
 
QIYSATYSNVPAFEFVTSEGPIMRRKKDSWINATHILKIAKFPKAKRTRILEKDVQTGIHEKVQGGYGKYQG
 
TYVPLDLGAAIARNFGVYDVLKPIFEF
 
>Mbp1_NEUCR/5-103 XP_962967
 
TIYSATYSGVGVYEMEVNNVAVMRRQKDGWVNATQILKVANIDKGRRTKILEKEIQIGEHEKVQGGYGKYQG
 
TWIPFERGLEVCRQYGVEELLSKLLTH
 
>Mbp1_SCHPO/2-100 NP_593032
 
AVHVAVYSGVEVYECFIKGVSVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQGGYGKYQG
 
TWVPFQRGVDLATKYKVDGIMSPILSL
 
>Mbp1_USTMA/2-100 XP_762343
 
TIFKATYSGVPVYECIINNVAVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREIQKGIHEKVQGGYGKYQG
 
TWIPLDVAIELAERYNIQGLLQPITSY
 
  
 +
MBP1_ASPNI      NVYSATYSSVPVYEFKIG---TDSVMRRRSDDWINATHILKVAGFDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLQEGRQLAERNNILDKLLPIFDY
 +
MBP1_BIPOR      KIYSATYSNVPVYECNVN---GHHVMRRRADDWINATHILKVADYDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLEEGRGLAERNGVLDKMRAIFDY
 +
MBP1_NEUCR      GIYSATYSGIPVWEYQFGVDLKEHVMRRRHDDWVNATHILKAAGFDKPARTRILEREVQKDTHEKIQGGYGRYQGTWIPLEQAEALARRNNIYERLKPIFEF
 +
MBP1_COPCI      QIFKATYSGIPVYEMMCK---GVAVMRRRSDSWLNATQILKVAGFDKPQRTRVLEREVQKGEHEKVQGGYGKYQGTWIPLERGMQLAKQYNCEHLLRPIIEF
 +
MBP1_WALME      PIYKACYSGVPVYEFNCK---NVAVMKRRSDSWMNATQILKVANFDKPQRTRILEREVQKGTHEKVQGGYGKYQGTWIPMERSVELARQYRIELLLDPIINY
 +
MBP1_PUCGR      TIYKATYSGVPVLEMPCE---GIAVMRRRSDSWLNATQILKVAGFDKPQRTRVLEREIQKGTHEKIQGGYGKYQGTWVPLDRGIDLAKQYGVDHLLSALFNF
 +
MBP1_USTMA      TIFKATYSGVPVYECIIN---NVAVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREIQKGIHEKVQGGYGKYQGTWIPLDVAIELAERYNIQGLLQPITSY
 +
MBP1_SCHPO      AVHVAVYSGVEVYECFIK---GVSVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQGGYGKYQGTWVPFQRGVDLATKYKVDGIMSPILSL
 +
MBP1_SACCE      QIYSARYSGVDVYEFIHS---TGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDF
 +
MBP1_CRYNE      KVYASVYSGVPVFEAMIR---GISVMRRASDSWVNATQILKVAGVHKSARTKILEKEVLNGIHEKIQGGYGKYQGTWVPLDRGRDLAEQYGVGSYLSSVFDF
 +
                :. : **.: * *          :*:*  *.*:***:***...  *. **::**:::    ***:***:*:*****:*:: .  ** :      : .: .
 +
</source>
  
 +
 +
When considering the relatively high degree of conservation &ndash; e.g. there are 34 '''fully conserved''' positions (<code>*</code>) in this alignment of 102 positions &ndash; keep in mind that this collection of species represents [http://www.timetree.org/index.php?taxon_a=saccharomyces+cerevisiae&taxon_b=cryptococcus+neoformans&submit=Search on the order of '''a billion years'''] of divergent evolution from a common ancestor.
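Fully conserved positions are simply alignment columns in which every sequence carries the same residue. As a minimal sketch, the function below finds such columns; to keep the example short it is applied only to the first 18 columns of the ten rows of the alignment above (the part before the three-column gap), not to the whole alignment. The function name is an illustrative choice.

```python
def conserved_columns(rows):
    """Return the 1-based indices of columns in which all rows share one residue."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    conserved = []
    for index, column in enumerate(zip(*rows), start=1):
        residues = set(column)
        # A column counts only if it holds a single residue type and no gap.
        if len(residues) == 1 and "-" not in residues:
            conserved.append(index)
    return conserved


# First 18 alignment columns, in the row order of the alignment (ASPNI ... CRYNE).
EXCERPT = [
    "NVYSATYSSVPVYEFKIG",  # MBP1_ASPNI
    "KIYSATYSNVPVYECNVN",  # MBP1_BIPOR
    "GIYSATYSGIPVWEYQFG",  # MBP1_NEUCR
    "QIFKATYSGIPVYEMMCK",  # MBP1_COPCI
    "PIYKACYSGVPVYEFNCK",  # MBP1_WALME
    "TIYKATYSGVPVLEMPCE",  # MBP1_PUCGR
    "TIFKATYSGVPVYECIIN",  # MBP1_USTMA
    "AVHVAVYSGVEVYECFIK",  # MBP1_SCHPO
    "QIYSARYSGVDVYEFIHS",  # MBP1_SACCE
    "KVYASVYSGVPVFEAMIR",  # MBP1_CRYNE
]

print(conserved_columns(EXCERPT))
```

Passing the complete rows of the alignment to the same function would give the count for the full domain.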
  
 
&nbsp;
 
&nbsp;
  
 +
===Distant Homologs===
  
&nbsp;
+
APSES domains are a subfamily of [http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=pfam04383 KilA-N domains]; the latter include examples:
 +
* in Protists (e.g. [http://www.ncbi.nlm.nih.gov/protein/XP_001289194 ''Trichomonas vaginalis'' <code>XP_001289194</code>])
 +
* in DNA viruses (e.g. [http://www.ncbi.nlm.nih.gov/protein/NP_955235 ''Canarypox virus'' <code>NP_955235</code>])
 +
* in Gammaproteobacteria (e.g. [http://www.ncbi.nlm.nih.gov/protein/WP_032936777 ''Escherichia coli'' <code>WP_032936777</code> (KilA)])
 +
 
 +
{{Vspace}}
  
 
[[Category:Bioinformatics]]
 
[[Category:Bioinformatics]]
 
</div>
 
</div>

Latest revision as of 01:31, 9 October 2016

Yeast Mbp1 protein reference annotation


 



A reference annotation of the Saccharomyces cerevisiae Mbp1 protein sequence that integrates annotation sources we encounter throughout the course.



Links



 

FASTA sequences

>gi|6320147|ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c]
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET
KRNNKKAEENQFQSSKILGNPTAAPRKRGRPVGSTRGSRRKLGVNLQRSQSDMGFPRPAIPNSSISTTQL
PSIRSTMGPQSPTLGILEEERHDSRQQQPQQNNSAQFKEIDLEDGLSSDVEPSQQLQQVFNQNTGFVPQQ
QSSLIQTQQTESMATSVSSSPSLPTSPGDFADSNPFEERFPGGGTSPIISMIPRYPVTSRPQTSDINDKV
NKYLSKLVDYFISNEMKSNKSLPQVLLHPPPHSAPYIDAPIDPELHTAFHWACSMGNLPIAEALYEAGTS
IRSTNSQGQTPLMRSSLFHNSYTRRTFPRIFQLLHETVFDIDSQSQTVIHHIVKRKSTTPSAVYYLDVVL
SKIKDFSPQYRIELLLNTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLTANEIMNQQYEQM
MIQNGTNQHVNSSNTDLNIHVNTNNIETKNDVNSMVIMSPVSPSDYITYPSQIATNISRNIPNVVNSMKQ
MASIYNDLHEQHDNEIKSLQKTLKSISKTKIQVSLKTLEVLKESSKDENGEAQTNDDFEILSRLQEQNTK
KLRKRLIRYKRLIKQKLEYRQTVLLNKLIEDETQATTNNTVEKDNNTLERLELAQELTMLQLQRKNKLSS
LVKKFEDNAKIHKYRRIIREGTEMNIEEVDSSLDVILQTLIANNNKNKGAEQIITISNANSHA
>1BM8_A
QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIA
KQLAEKFSVYDQLKPLFDF
>1MB1_A
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH

Note: the sequence segments colored grey are disordered in the protein structure. This generally means they do not contribute significant energy to the fold of the domain. The six histidines at the C-terminus colored in firebrick were added for purification and are not part of the Mbp1 sequence.

>1L3G_A
SNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLN
IAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDKLAAALEHHHHHH

Note: here too the C-terminus is disordered and colored grey and the protein has a purification tag colored in firebrick. However, this being an NMR file, disordered segments are included in the PDB coordinate file and defining the extent of disorder required evaluating the superposed set of models. There are no more structured residues than in the 1MB1 structure, even though the sequence used in the experiment was longer.


>1SW6_A
NDDINKGPSGDNENNGTDDNDRTAGPIITFTHDLTSDFLSSPLKIMKALPSPVVNDNEQKMKLEAFLQRLLFPEIQEMPT
SLNNDSSNRNSEGGSSNQQQQHVSFDSLLQEVNDAFPNTQLNLNIPVDEHGNTPLHWLTSIANLELVKHLVKHGSNRLYG
DNMGESCLVKAVKSVNNYDSGTFEALLDYLYPCLILEDSMNRTILHHIIITSGMTGCSAAAKYYLDILMGWIVKKQNRPI
QSGTNEKESKPNDKNGERKDSILENLDLKWIIANMLNAQDSNGDTCLNIAARLGNISIVDALLDYGADPFIANKSGLRPV
DFGAGLE

Note: this sequence is a part of the Saccharomyces cerevisiae Swi6 protein, which is homologous to Mbp1 but does not contain an APSES domain. Its ankyrin domains have been structurally defined in the 1SW6 PDB file; they do not conform in all details to the canonical ankyrin domain structure. The sequence segments colored grey are disordered in the protein structure.

Annotations

NCBI CDD APSES and KilA-N domain boundaries


The APSES domain is a well-defined type of DNA-binding domain that is ubiquitous in fungi and unique to that kingdom. Structurally it is a member of the Winged Helix-Turn-Helix family. Recently it was found to be homologous to the somewhat shorter, prokaryotic KilA-N domain; thus the APSES domain was retired from Pfam and its instances were merged into the KilA-N family. However, InterPro has a KilA-N entry but still recognizes the APSES domain.


KilA-N domain boundaries in Mbp1 can be derived from the results of a CDD search with the ID 1BM8_A (the Mbp1 DNA binding domain crystal structure). The KilA-N superfamily domain alignment is returned.


(pfam 04383): KilA-N domain; The amino-terminal module of the D6R/N1R proteins defines a novel, conserved DNA-binding domain (the KilA-N domain) that is found in a wide range of proteins of large bacterial and eukaryotic DNA viruses. The KilA-N domain family also includes the previously defined APSES domain. The KilA-N and APSES domains may also share a common fold with the nucleic acid-binding modules of the LAGLIDADG nucleases and the amino-terminal domains of the tRNA endonuclease.


                         10        20        30        40        50        60        70        80
                 ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
1BM8A         16 IHSTGSIMKRKKDDWVNATHILKAANFAKaKRTRILEKEVLKETHEKVQ---------------GGFGKYQGTWVPLNIA 80
Cdd:pfam04383  3 YNDFEIIIRRDKDGYINATKLCKAAGETK-RFRNWLRLESTKELIEELSeennvdkseiiigrkGKNGRLQGTYVHPDLA 81

                         90
                 ....*....|....
1BM8A         81 KQLA----EKFSVY 90
Cdd:pfam04383 82 LAIAswisPEFALK 95

Note that CDD and SMART are not consistent in how they apply Pfam 04383 to the Mbp1 sequence. See annotation below.

The CDD KilA-N domain definition begins at position 16 of the 1BM8 sequence. But virtually all fungal APSES domains have a longer, structurally defined, conserved N-terminus. Blindly applying the KilA-N domain definition to these proteins would lose important information. For most purposes we will prefer the sequence spanned by the 1BM8_A structure. The sequence is given below; the KilA-N domain is coloured dark green. By this definition the APSES domain is 99 amino acids long and comprises residues 4 to 102 of the NP_010227 sequence.

                          10        20        30        40        50        60        70        80
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
1BM8A           1 QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIA 80

                          90
                  ....*....|....*....
1BM8A          81 KQLAEKFSVYDQLKPLFDF 99


 

Yeast APSES domain sequence in FASTA format
>APSES_MBP1 Residues 4-102 of S. cerevisiae Mbp1
QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRI
LEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDF
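
The domain definition above (residues 4 to 102 of NP_010227, 99 amino acids) can be checked with a few lines of code. The following is only a minimal sketch: the helper name `subsequence` is our own and not part of any library, and the sequence string is the first 120 residues of Mbp1 copied from this page.

```python
# N-terminal 120 residues of yeast Mbp1 (NP_010227), copied from this page.
MBP1_N_TERM = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLK"
    "ETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHA"
)


def subsequence(seq, first, last):
    """Return residues first..last of seq (1-based, both ends inclusive).

    Hypothetical helper for illustration; sequence positions on this page
    are 1-based, whereas Python string indices are 0-based.
    """
    if not 1 <= first <= last <= len(seq):
        raise ValueError("range lies outside the sequence")
    return seq[first - 1:last]


apses = subsequence(MBP1_N_TERM, 4, 102)
print(len(apses))             # 99
print(apses[:5], apses[-5:])  # QIYSA PLFDF
```

The only subtlety is the shift between 1-based residue numbers and 0-based string indices; an off-by-one error here would silently move the domain boundary.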


 

Synopsis of ranges
Domain                     Link                  Length  Boundary         Range (Mbp1)  Range (1BM8)
KilA-N: pfam04383 (CDD)    CDD alignment         72      STGSI ... KFSVY  21 - 93       18 - 90
KilA-N: pfam04383 (SMART)  Smart main page       79      IHSTG ... YDQLK  19 - 97       16 - 94
KilA-N: SM01252 (SMART)    Smart main page       84      TGSIM ... DFTQT  22 - 105      19 - 99...
APSES: Interpro            IPR003163 (Interpro)  130     QIYSA ... IRSAS  3 - 133       1 - 99...
APSES (1BM8)                                     99      QIYSA ... PLFDF  4 - 102       1 - 99
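
The two range columns differ only by a constant offset, because residue 1 of the 1BM8 sequence (Q) corresponds to residue 4 of Mbp1. A small sketch of the conversion, with function names that are our own and purely illustrative:

```python
# Residue 1 of the 1BM8 sequence is residue 4 of full-length Mbp1,
# so the two numbering schemes differ by a constant offset of 3.
OFFSET = 3


def mbp1_to_1bm8(position):
    """Convert an Mbp1 residue number to 1BM8 numbering (hypothetical helper)."""
    return position - OFFSET


def range_length(first, last):
    """Number of residues in an inclusive range first..last."""
    return last - first + 1


# APSES domain as defined by the 1BM8 structure: Mbp1 residues 4 to 102.
print(mbp1_to_1bm8(4), mbp1_to_1bm8(102))   # 1 99
print(range_length(4, 102))                 # 99

# KilA-N as applied by SMART: Mbp1 residues 19 to 97.
print(mbp1_to_1bm8(19), mbp1_to_1bm8(97))   # 16 94
print(range_length(19, 97))                 # 79
```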


 

NCBI CDD Ankyrin domain boundaries

Derived from the results of a CDD search with the RefSeq ID (NP_010227). There are two, partially overlapping alignments with the profile, which contains 4 ANK repeats each. The aligned sequence is the consensus sequence for the profile (see the CDD output documentation for details).
Alignment 1 - E-value = 1.69e-08
                          10        20        30        40        50        60        70        80
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
MBP1_SACCE     76 IDPELHTAFHWACSMGNLPIAEALYEAGTSIRSTNSQGQTPLMRSSLFHNsytrrtfPRIFQLLHETVFDIDSQS---QT 152
Cdd:cd00204     3 RDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTPLHLAAKNGH-------LEIVKLLLEKGADVNARDkdgNT 75

                          90       100       110       120       130
                  ....*....|....*....|....*....|....*....|....*....|....*....
MBP1_SACCE    153 VIHHIVKRKSTtpSAVYYLdvvLSKIKDfspqyriellLNTQDKNGDTALHIASKNGDV 211
Cdd:cd00204    76 PLHLAARNGNL--DVVKLL---LKHGAD----------VNARDKDGRTPLHLAAKNGHL 119


Alignment 2 - E-value=8.66e-05
                          10        20        30        40        50        60        70        80        
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*...
MBP1_SACCE    192 NTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLT----ANEIMNQQYEQMMIQNGTNQHV--NSSNTDLNIHVNTNNIET 273 
Cdd:cd00204     1 NARDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTplhlAAKNGHLEIVKLLLEKGADVNArdKDGNTPLHLAARNGNLDV 88


SMART Annotations

A SMART search with the yeast Mbp1 protein sequence retrieved the APSES domain and three regions of similarity to ankyrin domains, annotated a number of low-complexity regions and a stretch of coiled coil. Annotations have been consolidated below.

SAS Annotations

A SAS FASTA search with the yeast Mbp1 protein sequence retrieved the homologous ankyrin sequence from Swi6 (PDB: 1SW6), together with secondary structure annotations. This structural annotation is based on homology to a protein of known structure. Annotations have been consolidated below.


While CDD, SMART and SAS all annotate the same general regions, they disagree in details of the domain boundaries and on the precise alignment.

Consolidated Annotation

MBP1_SACCE
Annotations based on 
- CDD domain analysis,
- SAS structure annotation and
- literature data on binding region

Keys:

=   domain annotation
C   Coiled coil regions predicted by Coils2 program
x   Low complexity region
*   Proposed binding region
+   positively charged residues, oriented for possible DNA binding interactions
-   negatively charged residues, oriented for possible DNA binding interactions 

E   beta strand
H   alpha helix
t   beta turn
    Sequence that was invisible in the 1SW6 structure is listed in lowercase.


                  10         20         30         40         50         60 
          MSNQIYSARY SGVDVYEFIH STGSIMKRKK DDWVNATHIL KAANFAKAKR TRILEKEVLK
1MB1      ----EEEEEt t-EEEEEEEE t-EEEEEEtt ---EEHHHHH HH----HHHH HHHHhhhHHH
                                                               * *+**-+**** Proposed DNA binding 
pfam04383                    == ========== ========== ========== ========== (CDD alignment)
pfam04383                ====== ========== ========== ========== ========== (SMART alignment) 

                  70         80         90        100        110        120 
          ETHEKVQGGF GKYQGTWVPL NIAKQLAEKF SVYDQLKPLF DFTQTDGSAS PPPAPKHHHA
1MB1      ---EEE---- tt--EEEE-H HHHHHHHHH- --HHHHtt-                       
          **+*+***** ****                                                   Proposed DNA binding
pfam04383 ========== ========== ========== ===                              (CDD alignment)
pfam04383 ========== ========== ========== ========== ========== =          (SMART alignment) 

                 130        140        150        160        170        180 
          SKVDRKKAIR SASTSAIMET KRNNKKAEEN QFQSSKILGN PTAAPRKRGR PVGSTRGSRR
                                                                                      


                 190        200        210        220        230        240 
          KLGVNLQRSQ SDMGFPRPAI PNSSISTTQL PSIRSTMGPQ SPTLGILEEE RHDSRQQQPQ
Low compl.                                                            xxxxx (SMART SEG)


                 250        260        270        280        290        300 
          QNNSAQFKEI DLEDGLSSDV EPSQQLQQVF NQNTGFVPQQ QSSLIQTQQT ESMATSVSSS
Low compl.x                                        xx xxxxxxxxxx xxxxxxxxxx (SMART SEG)


                 310        320        330        340        350        360 
          PSLPTSPGDF ADSNPFEERF PGGGTSPIIS MIPRYPVTSR PQTSDINDKV NKYLSKLVDY
          xxxxxxx                                                           (SMART SEG)
Swi6                       GPII TFTHDLTSDF LSSPLKIMKA LPSPVVNDNE QKM--KL-EA (SAS alignment: 1SW6)
1SW6                       -EEE --tt---ttt ------EE-- ---t---HHH HHH--HH-HH (SAS 2° structure)


                                                370        380        390        400        410        420 
          FISNEMK-------------------------------SNK SLPQVLLHPP PHSAPYIDAP IDPELHTAFH WACSMGNLPI AEALYEAGTS
Swi6      FLQRLLFpeiqemptslnndssnrnseggssnqqqqhvSFD SLLQEVNDAF PNTQLNLNIP VDEHGNTPLH WLTSIANLEL VKHLVKHGSN (SAS alignment: 1SW6)
1SW6      HHHHHH-                               -HH HHHHHHHHH- t-----t--- --t----HHH HHHH--tHHH HHHHHH---- (SAS 2° structure)


                 430        440        450        460        470           480 
          IRSTNSQGQT PLMRSSLFHN SYTRRTFPRI FQLLHETVFD IDSQSQTVIH HIVKRKSTT---P
Swi6      RLYGDNMGES CLVKAVKSVN NYDSGTFEAL LDYLYPCLIL EDSMNRTILH HIIITSGMTGCSA (SAS alignment: 1SW6)
1SW6      t---tt---- HHHHHHH--H HHH---HHHH HHHHHHHHHE E-t----HHH HHHHHH--t--HH (SAS 2° structure)


                 490                                      500        510        520        530        540 
          SAVYYLDVVL-------------------------------SKIKDFSPQY RIELLLNTQD KNGDTALHIA SKNGDVVFFN TLVKMGALTT
Swi6      AAKYYLDILMGWIVKKQNRPIQSGtnekeskpndkngerkDSILENLDLKW IIANMLNAQD SNGDTCLNIA ARLGNISIVD ALLDYGADPF (SAS alignment: 1SW6)
1SW6      HHHHHHHHHHHHHHHHHH--EEE-                -HHHHHt-HHH HHHH------ t----HHHHH HHH--HHHHH HHHH----t- (SAS 2° structure)
 

                 550        560        570        580        590        600 
          ISNKEGLTAN EIMNQQYEQM MIQNGTNQHV NSSNTDLNIH VNTNNIETKN DVNSMVIMSP
Swi6      IANKSGLRPV DFGAG                                                 (SAS alignment: 1SW6)
1SW6      ---t----HH HH---                                                 (SAS 2° structure)


                 610        620        630        640        650        660 
          VSPSDYITYP SQIATNISRN IPNVVNSMKQ MASIYNDLHE QHDNEIKSLQ KTLKSISKTK
Coiled c.                                    CCCCCCCC CCCCCCCCCC CCCCC      (SMART COILS2)
 

                 670        680        690        700        710        720 
          IQVSLKTLEV LKESSKDENG EAQTNDDFEI LSRLQEQNTK KLRKRLIRYK RLIKQKLEYR
Low compl.                                          x xxxxxxxxxx xxxxxxx    (SMART SEG)

                 730        740        750        760        770        780 
          QTVLLNKLIE DETQATTNNT VEKDNNTLER LELAQELTML QLQRKNKLSS LVKKFEDNAK


                 790        800        810        820        830 
          IHKYRRIIRE GTEMNIEEVD SSLDVILQTL IANNNKNKGA EQIITISNAN SHA


Orthologs

The Mbp1 orthologs in the ten fungal reference species.

  • Aspergillus nidulans (ASPNI)
  • Bipolaris oryzae (BIPOR)
  • Coprinopsis cinerea (COPCI)
  • Cryptococcus neoformans (CRYNE)
  • Neurospora crassa (NEUCR)
  • Puccinia graminis (PUCGR)
  • Saccharomyces cerevisiae (SACCE)
  • Schizosaccharomyces pombe (SCHPO)
  • Ustilago maydis (USTMA)
  • Wallemia mellicola (WALME)

Orthologs were determined by reciprocal best match (RBM) to NP_010227, residues 4 to 102. UniProt accession numbers were obtained from the UniProt mapping service.
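
The reciprocal best match criterion can be sketched in a few lines, assuming RBM is read that way. This is a toy illustration only: the function names, the identifiers `candidate_A`, `candidate_B` and `other_yeast_protein`, and all scores are invented for the example and do not come from the actual searches behind the table below.

```python
def best_hit(scores):
    """Return the subject with the highest score, or None if there are no hits."""
    return max(scores, key=scores.get) if scores else None


def is_reciprocal_best_match(query, forward_scores, reverse_scores):
    """Check the reciprocal best match criterion for one query.

    forward_scores: {subject: score} for the query searched in the target species.
    reverse_scores: {subject: {yeast_protein: score}} for each subject searched
                    back in the query's own species.
    """
    candidate = best_hit(forward_scores)
    if candidate is None:
        return False
    return best_hit(reverse_scores.get(candidate, {})) == query


# Invented toy scores (not real search results).
forward = {"candidate_A": 150.0, "candidate_B": 60.0}
reverse = {"candidate_A": {"Mbp1": 140.0, "other_yeast_protein": 90.0}}

print(is_reciprocal_best_match("Mbp1", forward, reverse))  # True
```

A candidate is accepted only if the search succeeds in both directions; a forward best hit whose own best yeast hit is some other protein is rejected.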

Species                    Code   Name              RefSeq                UniProt
Aspergillus nidulans       ASPNI  AN3154            XP_660758 (FASTA)     Q5B8H6
Bipolaris oryzae           BIPOR  COCMIDRAFT_338    XP_007682304 (FASTA)  W6ZM86
Coprinopsis cinerea        COPCI  CC1G_01306        XP_001837394 (FASTA)  A8NYC6
Cryptococcus neoformans    CRYNE  CND05520          XP_570545 (FASTA)     Q5KHS0
Neurospora crassa          NEUCR  NCU07246          XP_955821 (FASTA)     Q7RW59
Puccinia graminis          PUCGR  PGTG_08863        XP_003327086 (FASTA)  E3KED4
Saccharomyces cerevisiae   SACCE  Mbp1              NP_010227 (FASTA)     P39678
Schizosaccharomyces pombe  SCHPO  Res2              NP_593032 (FASTA)     P41412
Ustilago maydis            USTMA  UM06196           XP_762343 (FASTA)     Q4P117
Wallemia mellicola         WALME  WALSEDRAFT_59726  XP_006957051 (FASTA)  I4YGC0



FASTA formatted sequences

(Header lines were edited to begin with a code identifying the sequence as orthologous to Mbp1, and including the species shorthand code.)
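
The header convention can be reproduced programmatically. The sketch below assumes that the species code is built from the first three letters of the genus and the first two letters of the species name; this rule is inferred from the ten codes on this page and is not stated anywhere as the official definition. The function names are our own.

```python
def species_code(binomial):
    """Build a five-letter code such as SACCE from a binomial name.

    Assumption: first three letters of the genus plus first two letters
    of the species epithet, upper-cased.
    """
    genus, species = binomial.split()[:2]
    return (genus[:3] + species[:2]).upper()


def ortholog_header(binomial, refseq_id, name, strain):
    """Compose a FASTA header in the style used for the Mbp1 orthologs."""
    return f">MBP1_{species_code(binomial)} {refseq_id} {name} [{binomial} {strain}]"


print(species_code("Saccharomyces cerevisiae"))
# SACCE
print(ortholog_header("Schizosaccharomyces pombe", "NP_593032", "Res2", "972h-"))
# >MBP1_SCHPO NP_593032 Res2 [Schizosaccharomyces pombe 972h-]
```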


>MBP1_ASPNI XP_660758 AN3154.2 [Aspergillus nidulans FGSC A4]
MAAVDFSNVYSATYSSVPVYEFKIGTDSVMRRRSDDWINATHILKVAGFDKPARTRILEREVQKGVHEKV
QGGYGKYQGTWIPLQEGRQLAERNNILDKLLPIFDYVAGDRSPPPAPKHTSAASKPRAPKINKRVVKEDV
FSAVNHHRSMGPPSFHHEHYDVNTGLDEDESIEQATLESSSMIADEDMISMSQNGPYSSRKRKRGINEVA
AMSLSEQEHILYGDQLLDYFMTVGDAPEATRIPPPQPPANFQVDRPIDDSGNTALHWACAMGDLEIVKDL
LRRGADMKALSIHEETPLVRAVLFTNNYEKRTFPALLDLLLDTISFRDWFGATLFHHIAQTTKSKGKWKS
SRYYCEVALEKLRTTFSPEEVDLLLSCQDSVGDTAVLVAARNGVFRLVDLLLSRCPRAGDLVNKRGETAS
SIMQRAHLAERDIPPPPSSITMGNDHIDGEVGAPTSLEPQSVTLHHESSPATAQLLSQIGAIMAEASRKL
TSSYGAAKPSQKDSDDVANPEALYEQLEQDRQKIRRQYDALAAKEAAEESSDAQLGRYEQMRDNYESLLE
QIQRARLKERLASTPVPTQTAVIGSSSPEQDRLLTTFQLSRALCSEQKIRRAAVKELAQQRADAGVSTKF
DVHRKLVALATGLKEEELDPMAAELAETLEFDRMNGKGVGPESPEADHKDSASLPFPGPVVSVDA

>MBP1_BIPOR XP_007682304 COCMIDRAFT_338 [Bipolaris oryzae ATCC 44560]
MPPAPDGKIYSATYSNVPVYECNVNGHHVMRRRADDWINATHILKVADYDKPARTRILEREVQKGVHEKV
QGGYGKYQGTWIPLEEGRGLAERNGVLDKMRAIFDYVPGDRSPPPAPKHATAASNRMKPPRQTAAAVAAA
AVAAAAAAAAVANHNALMSNSRSQASEDPYENSQRSQIYREDTPDNETVISESMLGDADLMDMSQYSADG
NRKRKRGMDQMSLLDQQHQIWADQLLDYFMLLDHEAAVSWPEPPPSINLDRPIDEKGHAAMHWAAAMGDV
GVVKELIHRGARLDCLSNNLETPLMRAVMFTNNFDKETMPSMVKIFQQTVHRTDWFGSTVFHHIAATTSS
SNKYVCARWYLDCIINKLSETWIPEEVTRLLNAADQNGDTAIMIAARNGARKCVRSLLGRNVAVDIPNKK
GETADDLIRELNQRRRMHGRTRQASSSPFAPAPEHRLNGHVPHFDGGPLMSVPVPSMAVRESVQYRSQTA
SHLMTKVAPTLLEKCEELATAYEAELQEKEAEFFDAERVVKRRQAELEAVRKQVAELQSMSKGLHIDLND
EEAERQQEDELRLLVEEAESLLEIEQKAELRRLCSSMPQQNSDSSPVDITEKMRLALLLHRAQLERRELV
REVVGNLSVAGMSEKQGTYKKLIAKALGEREEDVESMLPEILQELEEAETQERAEGLDGSPV

>MBP1_COPCI XP_001837394 CC1G_01306 [Coprinopsis cinerea okayama7#130]
MPEAQIFKATYSGIPVYEMMCKGVAVMRRRSDSWLNATQILKVAGFDKPQRTRVLEREVQKGEHEKVQGG
YGKYQGTWIPLERGMQLAKQYNCEHLLRPIIEFTPAAKSPPLAPKHLVATAGNRPVRKPLTTDLSAAVIN
TRSTRKQVADGVGEESDHDTHSLRGSEDGSMTPSPSEASSSSRTPSPIHSPGTYHSNGLDGPSSGGRNRY
RQSNDRYDEDDDASRHNGMGDPRSYGDQILEYFISDTNQIPPILITPPPDFDPNMAIDDDGHTSLHWACA
MGRIRIVKLLLSAGADIFKVNKAGQTALMRSVMFANNYDVRKFPELYELLHRSTLNIDNSNRTVFHHVVD
VAMSKGKTHAARYYMETILTRLADYPKELADVINFQDEDGETALTMAARCRSKRLVKLLIDHGADPKINN
HDGKNAEDYILEDERFRSSPAPSSRVAAMSYRNAQVAYPPPGAPSTYSFAPANHDRPPLHYSAAAQKAST
RCVNDMASMLDSLAASFDQELRDKERDMAQAQALLTNIQAEILESQRTVLQLRQQAEGLSQAKQRLADLE
NALQDKMGRRYRLGFEKWIKDEETREKVIRDAANGDLVLTPATTSYTVDEDGDSDSGSNGDKNKGKRKAQ
VQQEEVSDLVELYSNIPTDPEELRKQCEALREEVSQSRKRRKAMFDELVTFQAEAGTSGRMSDYRRLIAA
GCGGLEPLEIDSVLGMLLETLEAEDPSSTSATWSGSKGQQTG

>MBP1_CRYNE XP_570545 CND05520 [Cryptococcus neoformans var. neoformans JEC21]
MEPPSNPIQPPVTPSHHSLLSAISPALSEQTPAPIHTLPPHLRPSIPQPHIAPPRPSSVQPTMEEQQRMH
HIQQHQQQQHFQQQQNDENVFGSVMGAPGHVPGHEAPMSTQPKVYASVYSGVPVFEAMIRGISVMRRASD
SWVNATQILKVAGVHKSARTKILEKEVLNGIHEKIQGGYGKYQGTWVPLDRGRDLAEQYGVGSYLSSVFD
FVPSASVIAALPVIRTGTPDRSGQQTPSGLPGHPNQRVISPFANHGQTTPHMPPPQFIHQGNEQMMNLPP
HPSSLAYPTQPKPYFSMPLQHTVGPQYDERHEGMTMTPTMSMDGLAPPADIARMGFPYNPSDIYIDQYGQ
PHATYQASPYGKESGHPSKRQRSDAEGSYIESGAAVQQHVEQDEEADDGLDNDSTASDDARDPPPLPSSM
LLPHKPIRPKATPANGRIKSRLVQIFNVEGQVNLRSVFGLAPDQLPNFDIDMVIDDQGHSALHWACALAR
LSIVQQLIELGADIHRGNYAGETPLIRAVLTSNHAEAGSFTDLLHLLSPSIRTLDHAYRTVLHHIALVAG
VKGRVPAARTYMASVLEWVAREQQANNTHSITNPPNPADRNELAPINLRTLVDVQDVHGDTALNVAARVG
NKGLVGLLLDAGADKTRANKLGLRPENFGLEIEALKISNGEAVMANLKSEVSKPERKSRDVQKNIATIFE
SISSTFSSEMLAKQTKLNATEASVRHATRALADKRQHLHRAQEKLATMQLFEQRSENVRRIMDAIAAGTL
LTPAEFTGRTQTMHEKSTGQLPPLAFRHVPGLALDASSQSQLNGAPPSTPLSVEDQEDIALPERDDPECL
VKLRRMALWEDRIAEVLEDKIRAMEGEGVDRAVKYRKLVSVCAKVPVDKVDSMLDGLVAAVESEGQGLDF
SRASNFVNRIKATKS

>MBP1_NEUCR XP_955821 NCU07246 [Neurospora crassa OR74A]
MVKENVGGNPEPGIYSATYSGIPVWEYQFGVDLKEHVMRRRHDDWVNATHILKAAGFDKPARTRILEREV
QKDTHEKIQGGYGRYQGTWIPLEQAEALARRNNIYERLKPIFEFQPGNESPPPAPRHASKPKAPKVKPAV
PTWGSKSAKNANPPQPGTFLPPGRKGLPAQAPDYNDADTHMHDDDTPDNLTVASASYMAEDDRYDHSHFS
TGHRKRKRDELIEDMTEQQHAVYGDELLDYFLLSRNEQPAVRPDPPPNFKPDWPIDNERHTCLHWASAMG
DVDVMRQLKKFGASLDAQNVRGETPFMRAVNFTNCFEKQTFPQVMKELFSTIDCRDLSGCTVIHHAAVMK
IGRVNSQSCSRYYLDIILNRLQETHHPEFVQQLLDAQDNDGNTAVHLAAMRDARKCIRALLGRGASTDIP
NKQGIRAEELIKELNASISKSRSNLPQRSSSPFAPDTQRHDAFHEAISESMVTSRKNSQPNYSSDAANTV
QNRITPLVLQKLKDLTATYDSEFKEKDDAEKEARRILNKTQSELKALTASIDDYNSRLDTDDVAAKTAAE
MATARHKVLAFVTHQNRISVQEAVKQELAALDRANAVTNGTSTKSKSSSPSKKPKLSPIPDQKDKPPKDE
NETESEAEHPDPPAAQAHQQQPGPSSQDTEVEDQDREEEEDDYTHRLSLAAELRSILQEQRSAENDYVEA
RGMLGTGERIDKYKHLLMSCLPPDEQENLEENLEEMIKLMEQEDESVTDLPAGAVGGGGGGNAADGSGGG
GQPSNGRRESVLPALRGGNGDGEMSRRGSRTAAAAAAQVDGEREINGRAGAERTERIQEIAAV

>MBP1_PUCGR XP_003327086 PGTG_08863 [Puccinia graminis f. sp. tritici CRL 75-36-700-3]
MAYGGSIQPLRPPSRESATLHLHQPDLTVTSPPLSLTHCPPCVYSHFTHTPTSLIVIQVSLHSLLDQETY
HLLPSRSPPTVSVRMGTTTIYKATYSGVPVLEMPCEGIAVMRRRSDSWLNATQILKVAGFDKPQRTRVLE
REIQKGTHEKIQGGYGKYQGTWVPLDRGIDLAKQYGVDHLLSALFNFQPSSNESPPLAPKHVTALSTRVK
VSKVSAASAARAARAVVPSLPSTSGLGGRNTNNSWSNFDSDNEPGLPPAASSRESNGNWATQSKLARSSN
LARARANINNSHPEDLPVPAPDQLQASPLPSMQTADPENDNSLTPSELSLPSRTPSPIEDLPLTVNTASS
QSTRNKGKSRDLPDDEDLSRGQKRKYDTSLVEDTSYSDGADDQYINGNPSNAASAKYAKLILDYFVSESS
QIPNFLNDPPSDFDPNVVIDDDGHTALHWACAMGRIKIIKLLLTCGADIFRANNAGQTALMRAVMFTNNH
DLRTFPELFESFSGSVINIDRTDRTVFHYVIDIALTKGKVPAARYYLETILSQLSEYPKELIDILNFQDE
DGETALTLAARCRSKKLVKILLDHGANPKTANRDGKSAEDYILEDDKFRALSPTPCSSGPIRQLDQNSPG
GTSNRSDFVDLVDPVPIDSNLIPQRSPNASPPHYSETGQRVTKQLLPEVTSMIELLATTFDTELQDKERD
LDHAVGLLSNIEKEYLEGQRKILNYERMLSDFGEKKLALGDLEKELNDKLGKRYRFGWEKYVRDEEERAR
RITEQRSKYLQELSIEDRKLLDSSNLRFADPSKQEVLMKLQADERENSDLLNLIRTNSTDVESECDLLRE
SVQKLSEERERLFKEFINLSSENTGGENEEDDGANHTSANTSRLNNYRKLISLGCGGIGLDEVDEVIESL
NEGIDVNELNDNGFLTEQDEELGNHQNYHNIHTQGR

>MBP1_SACCE NP_010227 Mbp1p [Saccharomyces cerevisiae S288c]
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET
KRNNKKAEENQFQSSKILGNPTAAPRKRGRPVGSTRGSRRKLGVNLQRSQSDMGFPRPAIPNSSISTTQL
PSIRSTMGPQSPTLGILEEERHDSRQQQPQQNNSAQFKEIDLEDGLSSDVEPSQQLQQVFNQNTGFVPQQ
QSSLIQTQQTESMATSVSSSPSLPTSPGDFADSNPFEERFPGGGTSPIISMIPRYPVTSRPQTSDINDKV
NKYLSKLVDYFISNEMKSNKSLPQVLLHPPPHSAPYIDAPIDPELHTAFHWACSMGNLPIAEALYEAGTS
IRSTNSQGQTPLMRSSLFHNSYTRRTFPRIFQLLHETVFDIDSQSQTVIHHIVKRKSTTPSAVYYLDVVL
SKIKDFSPQYRIELLLNTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLTANEIMNQQYEQM
MIQNGTNQHVNSSNTDLNIHVNTNNIETKNDVNSMVIMSPVSPSDYITYPSQIATNISRNIPNVVNSMKQ
MASIYNDLHEQHDNEIKSLQKTLKSISKTKIQVSLKTLEVLKESSKDENGEAQTNDDFEILSRLQEQNTK
KLRKRLIRYKRLIKQKLEYRQTVLLNKLIEDETQATTNNTVEKDNNTLERLELAQELTMLQLQRKNKLSS
LVKKFEDNAKIHKYRRIIREGTEMNIEEVDSSLDVILQTLIANNNKNKGAEQIITISNANSHA

>MBP1_SCHPO NP_593032 Res2 [Schizosaccharomyces pombe 972h-]
MAPRSSAVHVAVYSGVEVYECFIKGVSVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQ
GGYGKYQGTWVPFQRGVDLATKYKVDGIMSPILSLDIDEGKAIAPKKKQTKQKKPSVRGRRGRKPSSLSS
STLHSVNEKQPNSSISPTIESSMNKVNLPGAEEQVSATPLPASPNALLSPNDNTIKPVEELGMLEAPLDK
YEESLLDFFLHPEEGRIPSFLYSPPPDFQVNSVIDDDGHTSLHWACSMGHIEMIKLLLRANADIGVCNRL
SQTPLMRSVIFTNNYDCQTFGQVLELLQSTIYAVDTNGQSIFHHIVQSTSTPSKVAAAKYYLDCILEKLI
SIQPFENVVRLVNLQDSNGDTSLLIAARNGAMDCVNSLLSYNANPSIPNRQRRTASEYLLEADKKPHSLL
QSNSNASHSAFSFSGISPAIISPSCSSHAFVKAIPSISSKFSQLAEEYESQLREKEEDLIRANRLKQDTL
NEISRTYQELTFLQKNNPTYSQSMENLIREAQETYQQLSKRLLIWLEARQIFDLERSLKPHTSLSISFPS
DFLKKEDGLSLNNDFKKPACNNVTNSDEYEQLINKLTSLQASRKKDTLYIRKLYEELGIDDTVNSYRRLI
AMSCGINPEDLSLEILDAVEEALTREK

>MBP1_USTMA XP_762343 UM06196 [Ustilago maydis 521]
MSGDKTIFKATYSGVPVYECIINNVAVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREIQKGIHEKVQG
GYGKYQGTWIPLDVAIELAERYNIQGLLQPITSYVPSAADSPPPAPKHTISTSNRSKKIIPADPGALGRS
RRATSIETESEVIGAAPNNVSEGSMSPSPSDISSSSRTPSPLPADRAHPLHANHALAGYNGRDANNHARY
ADIILDYFVTENTTVPSLLINPPPDFNPDMSIDDDEHTALHWACAMGRIRVVKLLLSAGADIFRVNSNQQ
TALMRATIFPNSLSSFTDPSLNIDRNDRTVFHHVVDLALSRGKPHAARYYMETMINRLADYGDQLADILN
FQDDEGETPLTMAARARSKRLVRLLLEHGADPKIRNKEGKNAEDYIIEDERFRSSPSRTGPAGIELGADG
LPVLPTSSLHTSEAGQRTAGRAVTLMSNLLHSLADSYDSEINTAEKKLTQAHGLLKQIQTEIEDSAKVAE
ALHHEAQGVDEERKRVDSLQLALKHAINKRARDDLERRWSEGKQAIKRARLQAGLEPGALSTSNATNAPA
TGDQKSKDDAKSLIEALPAGTNVKTAIAELRKQLSQVQANKTELVDKFVARAREQGTGRTMAAYRRLIAA
GCGGIAPDEVDAVVGVLCELLQESHTGARAGAGGERDDRARDVAMMLKAFPVYSRCIVMNRQLAVTRYPC
CRLLFYSLPCRTNMISGLWMQSDSVAAVLARSNAVLRISPCPKCARMSKLQAHLYEASAARLCGGKMLRR
TLALFSEAARSSSSSSASAAASSSASILTSHLSKAHLPPSLARSAKPHKNLYQMLSTLPKDGVGARVRQR
RWAAKGLDVSHDVDLKAHLAKLHHTGATKTNKDEGHLCYWEITKVRLKDGGNHGKAWGRFVWRERNAGVV
KQGQAQAKLTKVCLSMVVAHPGKPITKAESGERIPGALKYCWDLAH

>MBP1_WALME XP_006957051  [Wallemia mellicola CBS 633.66]
MSAPPIYKACYSGVPVYEFNCKNVAVMKRRSDSWMNATQILKVANFDKPQRTRILEREVQKGTHEKVQGG
YGKYQGTWIPMERSVELARQYRIELLLDPIINYLPGPQSPPLAPKHATNVGSRARKSTAPAAQTLPSTSK
VFHPLSSTKHPAKLAAATNAKAEISDGEDASIPSSPSFKSNSSRTPSPIRINARKRKLEDEATIPSSAID
GSISYEDIILDYFISESTQIPALLIHPPSDFNPNMSIDDEGHTAMHWACAMGKVRVVKLLLSAGADIFRV
NHSEQTALMRSVMFSNNYDIRKFPQLYELLHRSTLNLDKHDRTVLHHIVDLALTKSKTHAARYYMECVLS
KLANYPDELADVINFQDDEGESALTLAARARSKRLVKLLLEHGADSKLPNKDGKTAEDYILEDERFRQSP
LLNSNHLRLHPPDTSIYAPPAHLFNSETSQNIANTSMSSVANLLESLAQSYDKEITQKERDYQQAQVILR
NIKTDIVEAKSNIEKMTIDSSEFEHLKHKLRELEMKLEEHSNDVYNKGWEEYSRNVDDPAIDAPSDNVQE
ECASLRNKIKDLQEKRISSMQELIKRQKEVGTGKKMSEYRKLISVGCGIPTTEIDAVLEMLLESLESENA
NKKAALASGISGALSSTSSAPSQATTSAPTGVATPGAPVPASSEKAGLLPPAPVMQ


Ortholog APSES domains

The ortholog APSES domains can be aligned (nearly) without gaps. They comprise the following sequences:

CLUSTAL format alignment by MAFFT L-INS-i (v6.850b)

MBP1_ASPNI      NVYSATYSSVPVYEFKIG---TDSVMRRRSDDWINATHILKVAGFDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLQEGRQLAERNNILDKLLPIFDY
MBP1_BIPOR      KIYSATYSNVPVYECNVN---GHHVMRRRADDWINATHILKVADYDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLEEGRGLAERNGVLDKMRAIFDY
MBP1_NEUCR      GIYSATYSGIPVWEYQFGVDLKEHVMRRRHDDWVNATHILKAAGFDKPARTRILEREVQKDTHEKIQGGYGRYQGTWIPLEQAEALARRNNIYERLKPIFEF
MBP1_COPCI      QIFKATYSGIPVYEMMCK---GVAVMRRRSDSWLNATQILKVAGFDKPQRTRVLEREVQKGEHEKVQGGYGKYQGTWIPLERGMQLAKQYNCEHLLRPIIEF
MBP1_WALME      PIYKACYSGVPVYEFNCK---NVAVMKRRSDSWMNATQILKVANFDKPQRTRILEREVQKGTHEKVQGGYGKYQGTWIPMERSVELARQYRIELLLDPIINY
MBP1_PUCGR      TIYKATYSGVPVLEMPCE---GIAVMRRRSDSWLNATQILKVAGFDKPQRTRVLEREIQKGTHEKIQGGYGKYQGTWVPLDRGIDLAKQYGVDHLLSALFNF
MBP1_USTMA      TIFKATYSGVPVYECIIN---NVAVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREIQKGIHEKVQGGYGKYQGTWIPLDVAIELAERYNIQGLLQPITSY
MBP1_SCHPO      AVHVAVYSGVEVYECFIK---GVSVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQGGYGKYQGTWVPFQRGVDLATKYKVDGIMSPILSL
MBP1_SACCE      QIYSARYSGVDVYEFIHS---TGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDF
MBP1_CRYNE      KVYASVYSGVPVFEAMIR---GISVMRRASDSWVNATQILKVAGVHKSARTKILEKEVLNGIHEKIQGGYGKYQGTWVPLDRGRDLAEQYGVGSYLSSVFDF
                 :. : **.: * *          :*:*  *.*:***:***...  *. **::**:::    ***:***:*:*****:*:: .  ** :      : .: .


When considering the relatively high degree of conservation – e.g. there are 34 fully conserved positions (*) in this alignment of 102 positions – keep in mind that this collection of species represents on the order of a billion years of divergent evolution from a common ancestor.
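
The count of fully conserved positions can be reproduced directly from the alignment rows. This is a minimal sketch: the function name `conserved_columns` is our own, and "fully conserved" is taken to mean that all ten sequences have the same residue in a column and none has a gap, which is how the `*` symbol of the CLUSTAL consensus line is read here.

```python
# APSES-domain alignment rows, copied from the MAFFT alignment above.
ALIGNMENT = {
    "MBP1_ASPNI": "NVYSATYSSVPVYEFKIG---TDSVMRRRSDDWINATHILKVAGFDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLQEGRQLAERNNILDKLLPIFDY",
    "MBP1_BIPOR": "KIYSATYSNVPVYECNVN---GHHVMRRRADDWINATHILKVADYDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLEEGRGLAERNGVLDKMRAIFDY",
    "MBP1_NEUCR": "GIYSATYSGIPVWEYQFGVDLKEHVMRRRHDDWVNATHILKAAGFDKPARTRILEREVQKDTHEKIQGGYGRYQGTWIPLEQAEALARRNNIYERLKPIFEF",
    "MBP1_COPCI": "QIFKATYSGIPVYEMMCK---GVAVMRRRSDSWLNATQILKVAGFDKPQRTRVLEREVQKGEHEKVQGGYGKYQGTWIPLERGMQLAKQYNCEHLLRPIIEF",
    "MBP1_WALME": "PIYKACYSGVPVYEFNCK---NVAVMKRRSDSWMNATQILKVANFDKPQRTRILEREVQKGTHEKVQGGYGKYQGTWIPMERSVELARQYRIELLLDPIINY",
    "MBP1_PUCGR": "TIYKATYSGVPVLEMPCE---GIAVMRRRSDSWLNATQILKVAGFDKPQRTRVLEREIQKGTHEKIQGGYGKYQGTWVPLDRGIDLAKQYGVDHLLSALFNF",
    "MBP1_USTMA": "TIFKATYSGVPVYECIIN---NVAVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREIQKGIHEKVQGGYGKYQGTWIPLDVAIELAERYNIQGLLQPITSY",
    "MBP1_SCHPO": "AVHVAVYSGVEVYECFIK---GVSVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQGGYGKYQGTWVPFQRGVDLATKYKVDGIMSPILSL",
    "MBP1_SACCE": "QIYSARYSGVDVYEFIHS---TGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDF",
    "MBP1_CRYNE": "KVYASVYSGVPVFEAMIR---GISVMRRASDSWVNATQILKVAGVHKSARTKILEKEVLNGIHEKIQGGYGKYQGTWVPLDRGRDLAEQYGVGSYLSSVFDF",
}


def conserved_columns(rows):
    """Return 0-based indices of columns in which all rows share one residue."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("alignment rows differ in length")
    return [
        index
        for index, column in enumerate(zip(*rows))
        if len(set(column)) == 1 and column[0] != "-"
    ]


rows = list(ALIGNMENT.values())
conserved = conserved_columns(rows)
print(len(rows[0]), "columns,", len(conserved), "fully conserved")
```

Counting columns rather than relying on the consensus line makes it easy to repeat the comparison after adding or removing species from the alignment.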

 

Distant Homologs

APSES domains are a subfamily of KilA-N domains; the latter include examples: