Difference between revisions of "Reference APSES domains (reference species)"

From "A B C"
m
 
(54 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
<div id="BIO">
 +
<div class="b1">
 +
Reference APSES domains
 +
</div>
 +
 
__NOTOC__
 
  
;Multi FASTA file of all APSES domains in fungal proteins.
 
  
====Executing the PSI-BLAST search====
+
<div class="alert">
 +
The species used on this page are not the current set of [[Reference_species_for_fungi|reference species]]. Proceed with caution.
 +
</div>
 +
 
 +
<section begin=contents_summary />
 +
Sequences of APSES domains in the fungal reference species - domain definition, PSI-BLAST search, and header editing.
 +
<section end=contents_summary />
 +
 
 +
 
 +
The APSES domain proteins were determined with a PSI-BLAST search in the refseq database, using 1BM8_A as the search sequence, and restricting the search to the [[Reference species for fungi]].
  
The starting point of this list is a BLAST search with '''one''' known APSES domain sequence. This query sequence - the Mbp1 APSES domain - was defined as follows, based on [http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=66020 Pfam profile 02292: APSES].
 
  
>Yeast Mbp1 APSES domain (AA 24..102 of NP_010227)
 
SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY
 
QGTWVPLNIAKQLAEKFSVYDQLKPLFDF
 
  
A PSI-BLAST search was executed, searching in the '''nr''' subset of GenPept without further restrictions. The default parameters for PSI-BLAST were used, except for using the BLOSUM45 matrix and reducing the E-value threshold from 10.0 to 1.0.
 
  
The search converged after 6 iterations, i.e. PSI-BLAST found no new hits above the inclusion threshold E-value of 0.005. 164 sequences were found and contributed to the profile. However, some of these sequences are redundant, i.e. they are matches to the same amino acid sequence in different database entries, and some are from organisms other than the ones we are considering in the assignment. Even though these latter sequences are removed later, it was appropriate to include them initially: they contribute to the information in the PSI-BLAST search profile and improve the sensitivity and specificity of the search.
+
* see also: [[Reference APSES proteins (reference species)]]
  
It would certainly not be impossible - albeit somewhat tedious - to manually edit the list of proteins by checking/unchecking which hits to include. I have written a short Perl script for this task solely to be able to rename the sequences at the same time. This is not required; RefSeq / GenPept accession numbers will do just fine to name the sequences, but the final analysis is easier to do if the sequence labels actually tell us something about the organisms they came from and which other sequence they might be similar to.
 
  
After removing redundant sequences, sequence fragments that did not span the entire Mbp1 APSES domain, and sequences from fungi that are not in the list of organisms for this course, 69 sequences remained for analysis.
+
===Executing the PSI-BLAST search===
 +
====Defining the APSES Domain sequence====
  
<!--TODO: In the next version of the assignment, spend some time to carefully follow up on Xbp1 hits; I've left them out for now since a) they don't find APSES with RPS-BLAST at CDD, and b) this simplifies the phylogenetics... -->
+
;The APSES domain "proper"
  
====Constructing the multi-FASTA file====
+
#Navigate to the [http://www.ncbi.nlm.nih.gov/blast NCBI BLAST page];
 +
#Follow the link to '''protein BLAST''' and enter the yeast Mbp1 refseq ID NP_010227 into the input form;
 +
#Select the '''PHI-BLAST''' algorithm to search for domains in the sequence and '''Run BLAST''';
 +
#Click on the graphical summary of the result to access the '''CDD conserved domains''' report for the sequence;
 +
#Click on the (+) sign next to the link to [http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?ascbin=8&maxaln=10&seltype=2&uid=190963 KilA-N(pfam 04383)] domain to display the query/profile alignment. This is what it looks like:
  
A multi-FASTA file is the default input format for many MSA programs; it is simply a file that contains more than one FASTA-formatted sequence.
+
<table>
 +
<tr><td>
 +
<font color=#700777>                          10        20        30        40        50        60        70        80</font>
 +
<font color=#700777>                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|</font>
 +
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&doptcmdl=GenPept&db=Protein&term=6320147 gi 6320147]    <font color=#229922> 19 </font><font color=#2233CC>IHSTGS</font><font color=#FF4466>I</font><font color=#2233CC>MK</font><font color=#FF4466>R</font><font color=#2233CC>K</font><font color=#FF4466>KD</font><font color=#2233CC>DWV</font><font color=#FF4466>NAT</font><font color=#2233CC>HIL</font><font color=#FF4466>KAA</font><font color=#2233CC>NFAKAKRTRI</font><font color=#FF4466>L</font><font color=#2233CC>EK</font><font color=#FF4466>E</font><font color=#2233CC>VL</font><font color=#FF4466>KE</font><font color=#2233CC>TH</font><font color=#FF4466>E</font><font color=#2233CC>KVQ</font><font color=#888888>---------------</font><font color=#FF4466>G</font><font color=#2233CC>GF</font><font color=#FF4466>G</font><font color=#2233CC>KY</font><font color=#FF4466>QGT</font><font color=#2233CC>W</font><font color=#FF4466>V</font><font color=#2233CC>PLNI</font><font color=#FF4466>A</font> <font color=#229922>83</font>
 +
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&doptcmdl=GenPept&db=cdd&term=pfam04383 Cdd:pfam04383] <font color=#229922>  3 </font><font color=#2233CC>YNDFEI</font><font color=#FF4466>I</font><font color=#2233CC>IR</font><font color=#FF4466>R</font><font color=#2233CC>D</font><font color=#FF4466>KD</font><font color=#2233CC>GYI</font><font color=#FF4466>NAT</font><font color=#2233CC>KLC</font><font color=#FF4466>KAA</font><font color=#2233CC>GAKGKRFRNW</font><font color=#FF4466>L</font><font color=#2233CC>RL</font><font color=#FF4466>E</font><font color=#2233CC>ST</font><font color=#FF4466>KE</font><font color=#2233CC>LI</font><font color=#FF4466>E</font><font color=#2233CC>ELS</font><font color=#888888>kennpdkliiienrk</font><font color=#FF4466>G</font><font color=#2233CC>KG</font><font color=#FF4466>G</font><font color=#2233CC>RL</font><font color=#FF4466>QGT</font><font color=#2233CC>Y</font><font color=#FF4466>V</font><font color=#2233CC>HPDL</font><font color=#FF4466>A</font> <font color=#229922>82</font>
 +
 +
 +
<font color=#700777>                          90</font>
 +
<font color=#700777>                  ....*....|....</font>
 +
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&doptcmdl=GenPept&db=Protein&term=6320147 gi 6320147]    <font color=#229922> 84 </font><font color=#2233CC>KQL</font><font color=#FF4466>A</font><font color=#888888>----</font><font color=#2233CC>EK</font><font color=#FF4466>F</font><font color=#2233CC>SVY</font> <font color=#229922>93</font>
 +
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&doptcmdl=GenPept&db=cdd&term=pfam04383 Cdd:pfam04383] <font color=#229922> 83 </font><font color=#2233CC>LAI</font><font color=#FF4466>A</font><font color=#888888>swis</font><font color=#2233CC>PE</font><font color=#FF4466>F</font><font color=#2233CC>ALK</font> <font color=#229922>96</font>
 +
</td></tr>
 +
</table>
  
The PSI-BLAST search has already defined the sequences from each source protein that are similar to the APSES search profile. We only need to extract them in a convenient way from the search results. NCBI offers a number of options to format the result page; they are available from a link at the top of the results page, "Reformat these Results". The principal display options are:
+
This gives us the following APSES domain sequence:
 +
 
 +
>Yeast Mbp1 APSES domain (AA 19..93 of NP_010227)
 +
IHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQG
 +
GFGKYQGTWVPLNIAKQLAEKFSVY
 +
 
 +
====Searching for APSES domains====
 +
 
 +
 
 +
A PSI-BLAST search was executed, searching in the '''refseq''' subset of the NCBI protein database and restricting the species to the six fungal reference species plus ''Escherichia coli''. The latter was chosen to retrieve the KilA-N domain sequence which we need as an outgroup for phylogenetic analysis.
 +
 
 +
The search converged after 5 iterations. In each iteration, matches covering less than 80% of the query length were manually removed, even if they had low E-values. To avoid including false positives and thus corrupting the profile, hits with E > 10<sup>-4</sup> were also removed. The check-boxes next to the alignments were used to select sequences with > 80% coverage of the query, and only the highest-scoring KilA-N domain protein was kept. Clicking on '''Get selected sequences''' created a results page of 27 sequences. These were then displayed in FASTA (text) format and their headers were slightly edited to create the dataset [[Reference APSES proteins (reference species)]].
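
The manual selection rules stated above (query coverage above 80%, E-value no larger than 10<sup>-4</sup>) can be written down as a small Python sketch. The <code>Hit</code> record and the three sample hits are hypothetical and serve only to illustrate the thresholds; they are not taken from the actual search.

```python
from dataclasses import dataclass

MIN_COVERAGE = 80.0   # percent of the query length, as stated in the text
MAX_EVALUE = 1e-4     # hits with E > 10^-4 are dropped


@dataclass
class Hit:
    accession: str
    coverage: float   # query coverage in percent
    evalue: float


def keep_hit(hit: Hit) -> bool:
    """Return True if a hit passes both the coverage and the E-value filter."""
    return hit.coverage > MIN_COVERAGE and hit.evalue <= MAX_EVALUE


# Hypothetical hits, for illustration only.
hits = [
    Hit("hit_A", 96.0, 3e-19),
    Hit("hit_B", 60.0, 1e-30),   # low E-value, but covers too little of the query
    Hit("hit_C", 90.0, 5e-3),    # long enough, but E-value too high
]
selected = [h.accession for h in hits if keep_hit(h)]
print(selected)
```

Note that a short hit is rejected even when its E-value is very low, which mirrors the rule described in the text.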
 +
 
 +
===Constructing the multi-FASTA file===
 +
 
 +
 
 +
A multi-FASTA file is the default input format for many MSA programs; it is simply a file that contains more than one FASTA-formatted sequence. To generate the multi-FASTA file of APSES domains, we could have edited the full-length proteins manually, but there is a simpler way. The PSI-BLAST search has already defined the sequences from each source protein that are similar to the APSES search profile. We only need to extract them in a convenient way from the search results. NCBI offers a number of options to format the BLAST result page; they are available from the "Formatting options" link at the top of the BLAST results page. The principal options for the format are:
  
 
*'''Pairwise''': the default
 
Line 33: Line 74:
 
*'''hit-table''': this gives only the numerical parameters describing the quality of the matches.
 
  
When we select the '''flat-query anchored with/without identities''' option, it is reasonably straightforward to obtain the aligned sequences, copy and paste them into a Word document and convert that into a multi-FASTA format with a few Edit > Replace commands.
+
When we select the '''Flat-query anchored with letters for identities''' option, it is reasonably straightforward to obtain the aligned sequences, copy and paste them into a Word document and convert that into a multi-FASTA format with a few Edit > Replace commands.
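
As an alternative to Edit > Replace commands, the same conversion can be sketched in Python. This is a minimal sketch, assuming each alignment row consists of an identifier, a start position, and a gapped fragment without internal spaces; the function names and the sample rows are hypothetical, and real BLAST output may need additional handling.

```python
import re

# identifier, start position, then the gapped alignment fragment
ROW = re.compile(r"^(\S+)\s+(\d+)\s+([A-Za-z-]+)")


def collapse_blocks(lines):
    """Join the fragments of each identifier and strip the gap characters."""
    sequences = {}
    for line in lines:
        match = ROW.match(line)
        if match is None:
            continue                      # skip rulers, blank lines, etc.
        name, _start, fragment = match.groups()
        sequences[name] = sequences.get(name, "") + fragment.replace("-", "")
    return sequences


def to_multi_fasta(sequences, width=70):
    """Format a name -> sequence mapping as multi-FASTA text."""
    records = []
    for name, seq in sequences.items():
        chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append(">" + name + "\n" + "\n".join(chunks))
    return "\n".join(records) + "\n"


# Hypothetical rows in two alignment blocks, for illustration only.
rows = [
    "seq_1  19  IHSTGS-IMKR  28",
    "seq_2  24  ----GSVMRRR  30",
    "",
    "seq_1  29  KKDDW--VNAT  37",
    "seq_2  31  RSDDWLN--AT  39",
]
seqs = collapse_blocks(rows)
fasta = to_multi_fasta(seqs)
print(fasta)
```

Removing the gap characters yields unaligned domain sequences, which is what an MSA program expects as input.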
 +
 
 +
===Renaming sequences===
 +
 
 +
 
 +
To make the interpretation of alignments and gene trees easier, all ''Saccharomyces cerevisiae'' sequences were labelled with their gene name (e.g. <code>Sok2_SACCE</code>). Sequences that are presumed to be functionally equivalent orthologues to Mbp1 were identified through the ''Reciprocal Best Match'' (RBM) criterion and labelled as <code>Mbp1_NNNNN</code>. All other sequences were named <code>APS1_</code>, <code>APS2_</code>, <code>APS3_</code> ... as required (e.g. <code>APS1_USTMA</code>). There is no further significance in the numbers, ''i.e.'' <code>APS1_USTMA</code> is not necessarily an RBM to <code>APS1_SCHPO</code>. Note that such relabelling of sequences does not change the data or its interpretation; it merely makes the tree easier to interpret.
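
The labelling scheme can be illustrated with a short Python sketch. It assumes that the five-letter species code is built from the first three letters of the genus and the first two letters of the species name (which is consistent with codes such as <code>SACCE</code> and <code>USTMA</code>), and it writes prefixes in upper case as in the final listing. The function names and the accessions <code>acc_1</code> ... <code>acc_5</code> are hypothetical, and the numbers are assigned in order of appearance, so they need not match the numbers used on this page.

```python
from collections import defaultdict


def species_code(binomial: str) -> str:
    """Build a five-letter code, e.g. 'Ustilago maydis' -> 'USTMA'."""
    genus, species = binomial.split()[:2]
    return (genus[:3] + species[:2]).upper()


def label_sequences(records, rbm_accessions, gene_names):
    """Map each accession to a label.

    records        : list of (accession, binomial species name)
    rbm_accessions : accessions that are Reciprocal Best Matches to Mbp1
    gene_names     : accession -> known gene name (e.g. for yeast genes)
    """
    counters = defaultdict(int)           # per-species running number
    labels = {}
    for accession, binomial in records:
        code = species_code(binomial)
        if accession in rbm_accessions:
            prefix = "MBP1"
        elif accession in gene_names:
            prefix = gene_names[accession].upper()
        else:
            counters[code] += 1
            prefix = f"APS{counters[code]}"
        labels[accession] = f"{prefix}_{code}"
    return labels


# Hypothetical accessions, for illustration only.
records = [
    ("acc_1", "Saccharomyces cerevisiae"),
    ("acc_2", "Saccharomyces cerevisiae"),
    ("acc_3", "Ustilago maydis"),
    ("acc_4", "Ustilago maydis"),
    ("acc_5", "Ustilago maydis"),
]
labels = label_sequences(records, {"acc_1", "acc_3"}, {"acc_2": "Sok2"})
print(labels)
```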
 +
 
 +
===The final 27 APSES domain reference sequences===
 +
 
 +
 
 +
>KILA_ESCCO ZP_07189117 KilA-N domain protein
 +
IDGEIIHLRAKDGYINATSMCRTAGKLLSDYTRLKTTQEFFDELSRDMGIPISELIQSFKGGRPENQGTW
 +
VHPDIAINLAQ
 +
 +
>MBP1_SACCE NP_010227 Mbp1
 +
IHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAE
 +
KFSVY
 +
 +
>MBP1_USTMA XP_762343 UM06196
 +
IINNVAVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREIQKGIHEKVQGGYGKYQGTWIPLDVAIELAE
 +
RYNI
 +
 +
>MBP1_NEUCR XP_955821 NCU07246
 +
VMRRRHDDWVNATHILKAAGFDKPARTRILEREVQKDTHEKIQGGYGRYQGTWIPLEQAEALARRNNIY
 +
 +
>MBP1_ASPNI XP_660758.1  AN3154
 +
IGTDSVMRRRSDDWINATHILKVAGFDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLQEGRQLAER
 +
NNI
 +
 +
>MBP1_SCHPO NP_593032 MBF transcription factor complex subunit Res2
 +
IKGVSVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQGGYGKYQGTWVPFQRGVDLATK
 +
YKV
 +
 +
>MBP1_CANAL XP_723071 potential DNA binding component of MBF
 +
VTSEGPIMRRKKDSWINATHILKIAKFPKAKRTRILEKDVQTGIHEKVQGGYGKYQGTYVPLDLGAAIAR
 +
NFGVY
 +
 +
>APS1_NEUCR XP_962967 NCU07587
 +
VNNVAVMRRQKDGWVNATQILKVANIDKGRRTKILEKEIQIGEHEKVQGGYGKYQGTWIPFERGLEVCRQ
 +
YGV
 +
 +
>APS1_CANAL XP_712970 potential DNA binding component of SBF
 +
MMNESSIMRRCKDDWVNATQILKCCNFPKAKRTKILEKGVQQGLHEKVQGGFGRFQGTWIPLEDARKLAK
 +
TYGV
 +
 +
>APS1_SCHPO NP_595496 MBF transcription factor complex subunit Res1
 +
INGFPLMKRCHDNWLNATQILKIAELDKPRRTRILEKFAQKGLHEKIQGGCGKYQGTWVPSERAVELAHE
 +
YNVF
 +
 +
>APS2_ASPNI XP_664319 hypothetical protein AN6715
 +
VNGVAVMKRRSDGWLNATQILKVAGVVKARRTKTLEKEIAAGEHEKVQGGYGKYQGTWVNYQRGVELCRE
 +
YHV
 +
 +
>APS2_USTMA XP_761485 UM05338
 +
VRGIAVMRRRGDGWLNATQILKIAGIEKTRRTKILEKSILTGEHEKIQGGYGKFQGTWIPLQRAQQVAAE
 +
YNV
 +
 +
>SWI4_SACCE NP_011036 Swi4p
 +
TKIVMRRTKDDWINITQVFKIAQFSKTKRTKILEKESNDMQHEKVQGGYGRFQGTWIPLDSAKFLVNKYE
 +
I
 +
 +
>APS3_SCHPO NP_596132 MBF transcription factor complex subunit Cdc10
 +
GDNVALRRCPDSYFNISQILRLAGTSSSENAKELDDIIESGDYENVDSKHPQIDGVWVPYDRAISIAKR
 +
YGVY
 +
 +
>APS3_CANAL XP_714237 potential DNA binding regulator of filamentous growth
 +
NNVSVVRRADNNMINGTKLLNVAQMTRGRRDGILKSEKVRHVVKIGSMHLKGVWIPFERALAMAQREQI
 +
 +
>SOK2_SACCE NP_013729 Sok2p
 +
NGISVVRRADNDMVNGTKLLNVTKMTRGRRDGILKAEKIRHVVKIGSMHLKGVWIPFERALAIAQREKI
 +
 +
>APS3_ASPNI XP_663440 STUA CELL PATTERN FORMATION-ASSOCIATED PROTEIN
 +
GVCVARREDNGMINGTKLLNVAGMTRGRRDGILKSEKVRNVVKIGPMHLKGVWIPFDRALEFANKEKI
 +
 +
>PHD1_SACCE NP_012881 Phd1p
 +
NGISVVRRADNNMINGTKLLNVTKMTRGRRDGILRSEKVREVVKIGSMHLKGVWIPFERAYILAQREQI
 +
 +
>APS4_CANAL XP_710918 CaO19.5210
 +
LNNHWVIWDYETGWVHLTGIWKASLTIDGSNVSPSHLKADIVKLLESTPKEYQQYIKRIRGGFLKIQGTW
 +
LPYKLCKILARRFCYY
 +
 +
>APS3_NEUCR XP_960837 NCU01414
 +
GICVARREDNAMINGTKLLNVAGMTRGRRDGILKSEKVRHVVKIGPMHLKGVWIPFERALDFANKEKI
 +
 +
>APS5_CANAL XP_711513 potential DNA binding protein
 +
NILVSRREDTNYINGTKLLNVIGMTRGKRDGILKTEKIKNVVKVGSMNLKGVWIPFDRAYEIARNEGV
 +
 +
>APS4_ASPNI XP_663009 AN5405
 +
TVMWDYNIGLVRTTHLFKCNDYSKTTPAKMLNQNPGLRDICHSITGGALAAQGYWMPYEAAKAIAATFC
 +
 +
>APS3_USTMA XP_760925 UM04778
 +
VRGHTMMIDVDTSFVRFTSITQALGKNKVNFGRLVKTCPALDPHITKLKGGYLSIQGTWLPFDLAKELSR
 +
R
 +
 +
>APS4_SCHPO NP_596166
 +
HFLMRMAKDSSISATSMFRSAFPKATQEEEDLEMRWIRDNLNPIEDKRVAGLWVPPADALALAKDYSM
 +
 +
>APS6_CANAL XP_723412 potential transcriptional co-activator
 +
HGEIIVLRRVQDSFVNVTQLFQILIKLEVLPTSQVDNYFDNEILSNLKYFGSSSNTPQYLDLRKHQNIYL
 +
QGIWIPYDKAVNLALKFDIY
 +
 +
>APS4_NEUCR XP_962267 NCU06560
 +
FLMRRSQDGYISATGMFKATFPYASQEEEEAERKYIKSIPTTSSEETAGNVWIPPEQALILAEEYQI
 +
 +
>APS5_ASPNI XP_657766 AN0162
 +
TYFLMRRSKDGYVSATGMFKIAFPWAKLEEERSEREYLKTRPETSEDEIAGNVWISPVLALELAAEYKMY
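
A listing such as the one above can be read back into a program with a few lines of Python. The following is a minimal sketch, not a full FASTA parser: the function name is hypothetical, the first word of each header is used as the key, and wrapped sequence lines are joined. The two sample records are copied from the listing.

```python
def read_multi_fasta(text):
    """Parse multi-FASTA text into a dict: first header word -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                       # ignore blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]     # e.g. "MBP1_SACCE"
            records[name] = ""
        elif name is not None:
            records[name] += line          # join wrapped sequence lines
    return records


sample = """>MBP1_SACCE NP_010227 Mbp1
IHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAE
KFSVY

>MBP1_NEUCR XP_955821 NCU07246
VMRRRHDDWVNATHILKAAGFDKPARTRILEREVQKDTHEKIQGGYGRYQGTWIPLEQAEALARRNNIY
"""
records = read_multi_fasta(sample)
for name, seq in records.items():
    print(name, len(seq))
```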
 +
 
 +
 
 +
===Mbp1 orthologue reference alignment===
 +
 
 +
This is a reference alignment of the APSES domains of those proteins that fulfilled the ''Reciprocal Best Match'' criterion with yeast Mbp1.
 +
 
 +
CLUSTAL format alignment by MAFFT L-INS-1 (v6.850b)
 +
 +
 +
MBP1_SACCE      IHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVY
 +
MBP1_CANAL      VTSEGPIMRRKKDSWINATHILKIAKFPKAKRTRILEKDVQTGIHEKVQGGYGKYQGTYVPLDLGAAIARNFGVY
 +
MBP1_USTMA      IINNVAVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREIQKGIHEKVQGGYGKYQGTWIPLDVAIELAERYNI-
 +
MBP1_NEUCR      ------VMRRRHDDWVNATHILKAAGFDKPARTRILEREVQKDTHEKIQGGYGRYQGTWIPLEQAEALARRNNIY
 +
MBP1_ASPNI      -IGTDSVMRRRSDDWINATHILKVAGFDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLQEGRQLAERNNI-
 +
MBP1_SCHPO      -IKGVSVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQGGYGKYQGTWVPFQRGVDLATKYKV-
 +
 
 +
===Sample Phylip format===
 +
Here is a sample set of the APSES domain sequences to illustrate the '''phylip''' format. Sequences were aligned with MAFFT and edited in Jalview to remove gapped regions and frayed termini. The FASTA sequences were converted with [http://www-bimas.cit.nih.gov/molbio/readseq/ the Readseq server].
 +
 
 +
<pre>
 +
27 78
 +
KILA_ESCCO  DGEIIHLRAK DGYINATSMC RT-A-GKLLS DYTRLKLSRD M-GIPIS-IQ
 +
MBP1_SACCE  STGSIMKRKK DDWVNATHIL KA-A-NFAKA KRTRI-LEKE V-LKETH--E
 +
MBP1_USTMA  NNVAVMRRRS DDWLNATQIL KV-V-GLDKP QRTRV-LERE I-QKGIH--E
 +
MBP1_NEUCR  ----VMRRRH DDWVNATHIL KA-A-GFDKP ARTRI-LERE V-QKDTH--E
 +
MBP1_ASPNI  GTDSVMRRRS DDWINATHIL KV-A-GFDKP ARTRI-LERE V-QKGVH--E
 +
MBP1_SCHPO  KGVSVMRRRR DSWLNATQIL KV-A-DFDKP QRTRV-LERQ V-QIGAH--E
 +
MBP1_CANAL  SEGPIMRRKK DSWINATHIL KI-A-KFPKA KRTRI-LEKD V-QTGIH--E
 +
APS1_NEUCR  NNVAVMRRQK DGWVNATQIL KV-A-NIDKG RRTKI-LEKE I-QIGEH--E
 +
APS1_CANAL  NESSIMRRCK DDWVNATQIL KC-C-NFPKA KRTKI-LEKG V-QQGLH--E
 +
APS1_SCHPO  NGFPLMKRCH DNWLNATQIL KI-A-ELDKP RRTRI-LEKF A-QKGLH--E
 +
APS2_ASPNI  NGVAVMKRRS DGWLNATQIL KV-A-GVVKA RRTKT-LEKE I-AAGEH--E
 +
APS2_USTMA  RGIAVMRRRG DGWLNATQIL KI-A-GIEKT RRTKI-LEKS I-LTGEH--E
 +
SWI4_SACCE  -TKIVMRRTK DDWINITQVF KI-A-QFSKT KRTKI-LEKE S-NDMQH--E
 +
APS3_SCHPO  GDNVALRRCP DSYFNISQIL RL-A-GTSSS ENAKE-LDDI I-ESGDY--E
 +
APS3_CANAL  NNVSVVRRAD NNMINGTKLL NV-A-QMTRG RRDGI-LKSE ----KVR--H
 +
SOK2_SACCE  NGISVVRRAD NDMVNGTKLL NV-T-KMTRG RRDGI-LKAE ----KIR--H
 +
APS3_ASPNI  -GVCVARRED NGMINGTKLL NV-A-GMTRG RRDGI-LKSE ----KVR--N
 +
PHD1_SACCE  NGISVVRRAD NNMINGTKLL NV-T-KMTRG RRDGI-LRSE ----KVR--E
 +
APS4_CANAL  NNHWVIWDYE TGWVHLTGIW KA-SLSHLKA DIVKL-LEST PKEYQQY-IK
 +
APS3_NEUCR  -GICVARRED NAMINGTKLL NV-A-GMTRG RRDGI-LKSE ----KVR--H
 +
APS5_CANAL  -NILVSRRED TNYINGTKLL NV-I-GMTRG KRDGI-LKTE ----KIK--N
 +
APS4_ASPNI  ---TVMWDYN IGLVRTTHLF KC-N-DYSKT TPAKM-LNQN PGLRDIC--H
 +
APS3_USTMA  RGHTMMIDVD TSFVRFTSIT QA-L-GKNKV NFGRL-VKTC P-ALDPH-IT
 +
APS4_SCHPO  --HFLMRMAK DSSISATSMF RS-A-FPKAT QEEED-LEMR WIRDNLN---
 +
APS6_CANAL  GEIIVLRRVQ DSFVNVTQLF QILE-VLPTS QVDNY-FDNE I-LSNLKYLR
 +
APS4_NEUCR  ---FLMRRSQ DGYISATGMF KA-T-FPYAS QEEEE-AERK YIKSIPT---
 +
APS5_ASPNI  -TYFLMRRSK DGYVSATGMF KI-A-FPWAK LEEER-SERE YLKTRPE---
 +
 
 +
            SFKGGRPENQ GTWVHPDIAI NLAQ----
 +
            KVQGGFGKYQ GTWVPLNIAK QLAEKFSV
 +
            KVQGGYGKYQ GTWIPLDVAI ELAERYNI
 +
            KIQGGYGRYQ GTWIPLEQAE ALARRNNI
 +
            KVQGGYGKYQ GTWIPLQEGR QLAERNNI
 +
            KVQGGYGKYQ GTWVPFQRGV DLATKYKV
 +
            KVQGGYGKYQ GTYVPLDLGA AIARNFGV
 +
            KVQGGYGKYQ GTWIPFERGL EVCRQYGV
 +
            KVQGGFGRFQ GTWIPLEDAR KLAKTYGV
 +
            KIQGGCGKYQ GTWVPSERAV ELAHEYNV
 +
            KVQGGYGKYQ GTWVNYQRGV ELCREYHV
 +
            KIQGGYGKFQ GTWIPLQRAQ QVAAEYNV
 +
            KVQGGYGRFQ GTWIPLDSAK FLVNKYEI
 +
            NVDSKHPQID GVWVPYDRAI SIAKRYGV
 +
            VVKIGSMHLK GVWIPFERAL AMAQREQI
 +
            VVKIGSMHLK GVWIPFERAL AIAQREKI
 +
            VVKIGPMHLK GVWIPFDRAL EFANKEKI
 +
            VVKIGSMHLK GVWIPFERAY ILAQREQI
 +
            RIRGGFLKIQ GTWLPYKLCK ILARRFCY
 +
            VVKIGPMHLK GVWIPFERAL DFANKEKI
 +
            VVKVGSMNLK GVWIPFDRAY EIARNEGV
 +
            SITGGALAAQ GYWMPYEAAK AIAATFC-
 +
            KLKGGYLSIQ GTWLPFDLAK ELSRR---
 +
            --PIEDKRVA GLWVPPADAL ALAKDYSM
 +
            KHQNIY--LQ GIWIPYDKAV NLALKFDI
 +
            --TSSEETAG NVWIPPEQAL ILAEEYQI
 +
 
 +
</pre>
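
The layout of the sample above (a header line with the number of sequences and columns, names padded to a fixed width, blocks of 50 columns in groups of 10, and unnamed continuation blocks) can be reproduced with a short Python sketch. This is an illustration of the interleaved layout only, not the code used by the Readseq server; the function name, its parameters, and the two short sequences are hypothetical.

```python
def write_phylip_interleaved(alignment, block=50, group=10, name_width=12):
    """Format a name -> aligned-sequence mapping in interleaved PHYLIP layout."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    lines = [f"{len(names)} {length}"]
    for start in range(0, length, block):
        for n in names:
            segment = alignment[n][start:start + block]
            groups = " ".join(
                segment[i:i + group] for i in range(0, len(segment), group)
            )
            # names appear only in the first block
            prefix = n.ljust(name_width) if start == 0 else " " * name_width
            lines.append(prefix + groups)
        lines.append("")                   # blank line between blocks
    return "\n".join(lines)


# Hypothetical mini-alignment; block=20 keeps the example short.
alignment = {
    "SEQ_A": "ACDEFGHIKLMNPQRSTVWYACDE",
    "SEQ_B": "ACDEFGHIKL-NPQRSTVWY--DE",
}
phylip_text = write_phylip_interleaved(alignment, block=20)
print(phylip_text)
```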
 +
 
 +
 
 +
<!--
 +
 
 +
 
 +
===All APSES domains for all course species ===
 +
To construct a reference alignment for '''all''' APSES domains in the various course species, the following process was used:
 +
 
 +
*Open a protein BLAST input window.
 +
*Paste the yeast Mbp1 APSES domain sequence
 +
>Yeast Mbp1 APSES domain (AA 19..93 of NP_010227)
 +
IHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQG
 +
GFGKYQGTWVPLNIAKQLAEKFSVY
 +
* Select '''refseq_protein''' as the '''Database'''.
 +
* Paste the following organism restrictions into the '''Entrez query''' field. This includes all fungi we have worked with in the course, as well as ''Escherichia coli'' (for the KilA-N domain):
 +
 
 +
<pre>
 +
Ajellomyces dermatitidis [ORGN]
 +
OR Arthroderma benhamiae [ORGN]
 +
OR Arthroderma gypseum [ORGN]
 +
OR Ashbya gossypii [ORGN]
 +
OR Aspergillus clavatus [ORGN]
 +
OR Aspergillus fumigatus [ORGN]
 +
OR Aspergillus nidulans [ORGN]
 +
OR Aspergillus niger [ORGN]
 +
OR Aspergillus terreus [ORGN]
 +
OR Candida albicans [ORGN]
 +
OR Candida dubliniensis [ORGN]
 +
OR Candida glabrata [ORGN]
 +
OR Candida orthopsilosis [ORGN]
 +
OR Candida tropicalis [ORGN]
 +
OR Chaetomium globosum [ORGN]
 +
OR Clavispora lusitaniae [ORGN]
 +
OR Coccidioides immitis [ORGN]
 +
OR Coccidioides posadasii [ORGN]
 +
OR Debaryomyces hansenii [ORGN]
 +
OR Eremothecium cymbalariae [ORGN]
 +
OR Kazachstania africana [ORGN]
 +
OR Kluyveromyces lactis [ORGN]
 +
OR Komagataella pastoris [ORGN]
 +
OR Lachancea thermotolerans [ORGN]
 +
OR Lodderomyces elongisporus [ORGN]
 +
OR Magnaporthe oryzae [ORGN]
 +
OR Malassezia globosa [ORGN]
 +
OR Meyerozyma guilliermondii [ORGN]
 +
OR Millerozyma farinosa [ORGN]
 +
OR Myceliophthora thermophila [ORGN]
 +
OR Naumovozyma castellii [ORGN]
 +
OR Naumovozyma dairenensis [ORGN]
 +
OR Nectria haematococca [ORGN]
 +
OR Neosartorya fischeri [ORGN]
 +
OR Neurospora crassa [ORGN]
 +
OR Paracoccidioides sp. [ORGN]
 +
OR Puccinia graminis [ORGN]
 +
OR Pyrenophora teres [ORGN]
 +
OR Pyrenophora tritici-repentis [ORGN]
 +
OR Saccharomyces cerevisiae [ORGN]
 +
OR Scheffersomyces stipitis [ORGN]
 +
OR Schizosaccharomyces japonicus [ORGN]
 +
OR Schizosaccharomyces pombe [ORGN]
 +
OR Sclerotinia sclerotiorum [ORGN]
 +
OR Sordaria macrospora [ORGN]
 +
OR Talaromyces marneffei [ORGN]
 +
OR Talaromyces stipitatus [ORGN]
 +
OR Tetrapisispora blattae [ORGN]
 +
OR Tetrapisispora phaffii [ORGN]
 +
OR Thielavia terrestris [ORGN]
 +
OR Torulaspora delbrueckii [ORGN]
 +
OR Trichophyton rubrum [ORGN]
 +
OR Trichophyton verrucosum [ORGN]
 +
OR Uncinocarpus reesii [ORGN]
 +
OR Ustilago maydis [ORGN]
 +
OR Vanderwaltozyma polyspora [ORGN]
 +
OR Verticillium alfalfae [ORGN]
 +
OR Yarrowia lipolytica [ORGN]
 +
OR Zygosaccharomyces rouxii [ORGN]
 +
OR Zymoseptoria tritici [ORGN]
 +
OR Escherichia coli [ORGN]
 +
</pre>
 +
* Select '''PSI-BLAST''' as the algorithm.
 +
* '''BLAST''' this.
 +
* On the results page, select hits with >75% coverage and E values < 10<sup>-4</sup> and iterate (6 rounds) to convergence.
 +
* Open the '''Formatting options''' link and select '''Flat query anchored with letters for identities'''. The alignment then looks something like this:
 +
[...]
 +
XP_962267    81      P-SYFLMRRSQD----GYISATGMF---------K----------------------A  102
 +
XP_001212599  125    DK-EWLIMWDYNI----GLVRTTPLF---------R-------------S--------Q  148
 +
XP_003666082  80      P-SYFLMRRSED----GYVSATGMF---------K----------------------A  101
 +
XP_001398916  86        TYFLMRRSKD----GFVSATGMF---------K-------------I--------A  107
 +
XP_001527061  504      NILVSRREDT----NYINCTKLL---------N-------------V--------V  525
 +
XP_002417464  87    HN-EIIVLRRVQD----SFVNITQLFQILI-----K-------------L--------D  114
 +
XP_657766    86        TYFLMRRSKD----GYVSATGMF---------K-------------I--------A  107
 +
[...]
 +
* Copy all those sequences, and paste them into a text file called <tt>APSES_ali.txt</tt>
 +
* Copy the headers, and paste them into a separate text file called <tt>APSES_headers.txt</tt>; they look something like this:
 +
APSES transcription factor Xbp1 [Aspergillus clavatus NRRL 1] 85.9  85.9  94%  2e-19  26%  XP_001268422.1
 +
ABR055Cp [Ashbya gossypii ATCC 10895]                        86.3  86.3  96%  3e-19  26%  NP_983001.2 
 +
hypothetical protein PICST_67427 [Scheffersomyces stipitis]  85.6  85.6  96%  3e-19  24%  XP_001383609.2
 +
hypothetical protein PGUG_03651 [Meyerozyma guilliermondii]  85.2  85.2  96%  3e-19  24%  XP_001484270.1
 +
*Also, we should take the results from [http://biochemistry.utoronto.ca/steipe/abc/students/index.php/BCH441_2013_Assignment_7_RBM '''the RBM annotations'''] on the Student Wiki into account. I have copied these into a file called <tt>test.txt</tt> and then issued the following Unix command to extract the header lines into a separate file:
 +
grep '>' test.txt | sort > APSES_Mbp1_RBM.txt
 +
:... the result is...
 +
>Mbp1_AJEDE  XP_002623146.1
 +
>Mbp1_ASPFU XP_754232.1
 +
>Mbp1_ASPNI XP_660758.1
 +
>Mbp1_ASPTE XP_001213217.1
 +
>Mbp1_CANAL XP_723071.1
 +
>Mbp1_CANGA XP_445458.1   
 +
>Mbp1_CANOR XP_003867545.1
 +
>Mbp1_CHAGL XP_001224558.1
 +
>Mbp1_CLALU XP_002615371
 +
>Mbp1_COCPO XP_003066829.1
 +
>Mbp1_DEBHA XP_002770278
 +
>Mbp1_LACTH XP_002553316.1
 +
>Mbp1_MEYGU XP_001484708.1
 +
>Mbp1_MILFA XP_004204377.1
 +
>Mbp1_MYCTH XP_003662384.1
 +
>Mbp1_NECHA XP_003039845.1
 +
>Mbp1_SACCE NP_010227
 +
>Mbp1_SCHPO NP_593032
 +
>Mbp1_SCLSC XP_001598963.1
 +
>Mbp1_TETPH XP_003684194.1
 +
>Mbp1_TETRE XP_004182459.1
 +
>Mbp1_THITE XP_003650005.1
 +
>Mbp1_UNCRE XP_002540670.1
 +
>Mbp1_ZYGRO XP_002495259.1
 +
 
 +
====Processing the PSI-BLAST results====
 +
* We need to collapse the separate aligned sections, remove the profusion of gap characters, and replace the semantically meaningless GI numbers with something that we can use for interpreting alignments and trees. I could do this by hand for the ~300 sequences in about 2 hours. I chose to write some Perl code instead. It works on the copied alignments, the headers, and the RBM annotations.
 +
<pre>
 +
#!/usr/bin/perl
 +
# ProcessPSI-BLAST.pl
 +
# Read PSI-BLAST headers and flat query alignments from files.
 +
# Also read RBM annotations.
 +
# Collapse all alignments into single, ungapped strings.
 +
# Select which GI to use, construct meaningful header and print out
 +
# header in multiFASTA format.
 +
# BS Nov 2013
 +
use strict;
 +
use warnings;
 +
 
 +
my $headerFile = "APSES_headers.txt";
 +
my $aliFile = "APSES_ali.txt";
 +
my $RBMfile = "APSES_Mbp1_RBM.txt";
 +
my $MINCOVER = 75;    # Minimum required coverage (%)
 +
my $MAXEXPECT = 0.0001; # Maximum allowed E value
 +
 
 +
my %headers;  # Hash to hold the header data
 +
my %sequences; # Hash to hold the sequences
 +
 
 +
open IN, $headerFile or die "$!";
 +
while (my $line = <IN>) { # process all lines from this file
 +
    # use regular expression to parse information from header line.
 +
    if ($line =~ m/^\s*        # possibly match whitespace
 +
                  (\w+).*      # match and capture the first word (as $1: protein name)
 +
                  .*\[        # match and discard all characters until opening bracket
 +
                  (\w+)\s(\w+) # capture two words ($2 and $3: species)
 +
                  .*\]        # discard all characters until closing bracket
 +
                  \s+(\S+)    # discard whitespace, capture word ($4: max score)
 +
                  \s+(\S+)    # discard whitespace, capture word ($5: total score)
 +
                  \s+(\S+)%    # discard whitespace, capture word ($6: coverage)
 +
                  \s+(\S+)    # discard whitespace, capture word ($7: E value)
 +
                  \s+(\S+)    # discard whitespace, capture word ($8: Identity)
 +
                  \s+(\S+)\.  # discard whitespace, capture word ($9: accession, without version)
 +
                  /x ) {
 +
        if ($6 >= $MINCOVER && $7 <= $MAXEXPECT) {  # only if both conditions hold...
 +
            my $h  = substr($1,0,4) . "_";  # 4 characters of protein name, underscore
 +
              $h .= uc(substr($2,0,3)) . uc(substr($3,0,2));  # add species code
 +
            $headers{$9} = $h;  # put this into the hash
 +
        }
 +
    }
 +
}
 +
close IN;
 +
 
 +
# For all refseq IDs for which we have annotated Mbp1 RBMs, we replace the
 +
# header we interpolated above, with the one in the RBM annotation file.
 +
open IN, $RBMfile or die "$!";
 +
while (my $line = <IN>) { # process all lines from this file
 +
    # use regular expression to parse information about annotated Mbp1 RBMs
 +
    if ($line =~ m/^>(\S+)      # capture header string (as $1)
 +
                  \s+          # match and discard whitespace
 +
                  ([^\s.]+)    # capture accession ($2), with or without a trailing ".version"
 +
                  /x ) {
 +
        if (exists($headers{$2})) {
 +
            $headers{$2} = $1  # replace old with new string
 +
        }
 +
    }
 +
}
 +
close IN;
 +
 
 +
# Many of the protein codes are now not unique and this will
 +
# cause problems for alignment and phylogeny programs. We will
 +
# retain all headers that have a Aaa1_ABCDE or AaaA_ABCDE format,
 +
# and rename all others with Aps1_ABCDE, Aps2_ABCDE ... etc.
 +
 
 +
my %nonUnique; # hash to keep track of how often we have
 +
              # seen an entry
 +
foreach my $key (keys(%headers)) {
 +
    # if the value doesn't match the requested pattern ...
 +
    if ($headers{$key} !~ m/^[A-Z][a-z][a-z][A-Z0-9]_([A-Z]{5})/) {
 +
        $headers{$key} =~ m/_([A-Z]{5})/;
 +
 +
        my $code = $1;
 +
        $nonUnique{$code}++; # increment count in this hash (or create
 +
                          # entry if new)
 +
        my $h = "Aps" . $nonUnique{$code} . "_" . $code; # construct new string
 +
        $headers{$key} = $h;  # replace old string
 +
    }
 +
}
 +
 
 +
# concatenate all sequence blocks for each accession number
 +
open IN, $aliFile or die "$!";
 +
while (my $line = <IN>) { # process all lines from this file
 +
    # use regular expression to parse information from header line.
 +
    if ($line =~ m/^(.._\S+)\s+ # capture accession number (as $1)
 +
                  \d+\s+      # discard numbers and whitespace
 +
                  ([A-Z-]+)    # capture sequence ($2)
 +
                  /x ) {
 +
        my $key = $1;
 +
        my $val = $2;
 +
        $val =~ s/-//g; # remove all hyphens
 +
        $sequences{$key} .= $val;  # concatenate sequence fragment
 +
                                  # into hash (create entry if
 +
                                  # none exists yet).
 +
    }
 +
}
 +
close IN;
 +
 
 +
# Now iterate through all keys in %headers and print sequences in
 +
# multi FASTA format.
  
Of course, the sequences for which only partial matches were found needed to be completed "by hand" (as described above) to validate these sequences.
+
foreach my $key (keys(%headers)) {
 +
    print (">");
 +
    print ("$headers{$key} $key\n");
 +
    print ("$sequences{$key}\n");
 +
}
  
 +
exit();
  
 +
</pre>
  
====Renaming sequences====
+
====Alignment====
To support the interpretation of alignments and gene trees, the Mbp1 orthologues for all species were named accordingly (e.g. <code>MBP1_ASPFU</code>). All yeast genes were given the yeast gene name (e.g. <code>SOK2_SACCE</code>). All other sequences were named with the last four digits of their RefSeq ID and a five-character species code according to their species (e.g. <code>SOK2_SACCE</code>). This is a pain to do by hand, so I wrote a little Perl script to parse this information from the original BLAST report and modify the headers in the multi-FASTA file accordingly. Note, however, that renaming sequences does not change the data or its interpretation; it is merely helpful.
+
* The alignment was done at the EBI using MAFFT and written using FASTA output format.
 +
<pre>
 +
>Mbp1_USTMA XP_762343
 +
--------IIN-NVA-VMRRRSDDWLN---------------------------------
 +
--ATQILKV-------------VGLDK--------PQRTRV---------LEREIQKG--
 +
I-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLDVAIELAERYNI-
 +
>Aps2_MALGL XP_001730500
 +
--------IIK-DVA-VMRRRSDAWLN---------------------------------
 +
--ATQILKV-------------VGLDK--------SQRTRV---------LEKEVQKG--
 +
T-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPMDVAIALAEHYHI-
 +
>Mbp1_SCHPO NP_593032
 +
---------IK-GVS-VMRRRRDSWLN---------------------------------
 +
--ATQILKV-------------ADFDK--------PQRTRV---------LERQVQIG--
 +
A-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VPFQRGVDLATKYKV-
 +
>Aps2_SCHJA XP_002172253
 +
--------LIK-GVS-VMRRRHDSWLN---------------------------------
 +
--ATQILKV-------------ADFDK--------PQRTRI---------LEKEVQKG--
 +
H-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VPFKRGLELAVQFKV-
 +
>Aps2_PUCGR XP_003327086
 +
---------CE-GIA-VMRRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGFDK--------PQRTRV---------LEREIQKG--
 +
T-HE-----------KIQGG----------------YG---KYQ-------GTW------
 +
-------------VPLDRGIDLAKQYGV-
 +
>Aps2_YARLI XP_500257
 +
---------CK-NVA-VMRRKSDGWVN---------------------------------
 +
--ATHILKV-------------AGFDK--------PQRTRI---------LEKEVQKG--
 +
V-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VPLERAREIATLYDV-
 +
>Aps1_ARTBE XP_003012641
 +
----------------VMRRRVDDWVN---------------------------------
 +
--ATHILKA-------------AGLDK--------PSRTRI---------LERDVQRG--
 +
V-HE-----------KIQGG----------------YG---KYQ-------GTW------
 +
-------------IPLAEARALADKNNV-
 +
>Aps1_TRIVE XP_003024540
 +
----------------VMRRRVDDWVN---------------------------------
 +
--ATHILKA-------------AGLDK--------PSRTRI---------LERDVQRG--
 +
V-HE-----------KIQGG----------------YG---KYQ-------GTW------
 +
-------------IPLAEARALADKNNV-
 +
>Aps1_TRIRU XP_003238886
 +
----------------VMRRRVDDWVN---------------------------------
 +
--ATHILKA-------------AGLDK--------PSRTRI---------LERDVQRG--
 +
V-HE-----------KIQGG----------------YG---KYQ-------GTW------
 +
-------------IPLAEARALADKNNV-
 +
>Aps4_ARTGY XP_003176577
 +
----------------VMRRRVDDWVN---------------------------------
 +
--ATHILKA-------------AGLDK--------PSRTRI---------LEREVQRG--
 +
V-HE-----------KIQGG----------------YG---KYQ-------GTW------
 +
-------------IPLAEARALADKNGV-
 +
>Aps4_PYRTR XP_001940178
 +
----------N-GNH-VMRRRADDWIN---------------------------------
 +
--ATHILKV-------------ADYDK--------PARTRI---------LEREVQKG--
 +
V-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLEEGRHLAERNGV-
 +
>Aps5_PYRTE XP_003297289
 +
----------N-GNH-VMRRRADDWIN---------------------------------
 +
--ATHILKV-------------ADYDK--------PARTRI---------LEREVQKG--
 +
V-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLEEGRHLAERNGV-
 +
>Mbp1_ASPNI XP_660758
 +
--------------S-VMRRRSDDWIN---------------------------------
 +
--ATHILKV-------------AGFDK--------PARTRI---------LEREVQKG--
 +
V-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLQEGRQLAERNNI-
 +
>Mbp1_ASPTE XP_001213217
 +
--------------S-VMRRRADDWIN---------------------------------
 +
--ATHILKV-------------AGFDK--------PARTRI---------LEREVQKG--
 +
V-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLPEGRLLAERNNI-
 +
>Aps6_ASPNI XP_001400103
 +
--------------S-VMRRRSDDWIN---------------------------------
 +
--ATHILKV-------------AGFDK--------PARTRI---------LEREVQKG--
 +
V-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLPEGRMLAERNNI-
 +
>Aps5_ASPCL XP_001271352
 +
------------GES-VMRRRGDNWIN---------------------------------
 +
--ATHILKV-------------AGFDK--------PARTRI---------LEREVQKG--
 +
T-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLPEGRLLAERNNI-
 +
>Aps4_NEOFI XP_001263071
 +
------------GES-VMRRRGDNWIN---------------------------------
 +
--ATHILKV-------------AGFDK--------PARTRI---------LEREVQKG--
 +
T-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLPEGRLLAERNNI-
 +
>Mbp1_ASPFU XP_754232
 +
-----------------MRRRGDDWIN---------------------------------
 +
--ATHILKV-------------AGFDK--------PARTRI---------LEREVQKG--
 +
T-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLHEGRLLAERNNI-
 +
>Aps3_TALST XP_002479844
 +
------------GEC-LMRRRADDWIN---------------------------------
 +
--ATHILKV-------------AGFDK--------PSRTRI---------LEREVQKG--
 +
V-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLPEARLLAERNNI-
 +
>Aps4_TALMA XP_002143521
 +
------------GEC-LMRRRADDWIN---------------------------------
 +
--ATHILKV-------------AGFDK--------PSRTRI---------LEREVQKG--
 +
V-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLPEARLLAERNNI-
 +
>Mbp1_AJEDE XP_002623146
 +
----------------VMRRRADDWIN---------------------------------
 +
--ATHILKV-------------AGLDK--------PARTRI---------LEREVQKG--
 +
V-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VPLQEGRELAERNGI-
 +
>Aps1_ZYMTR XP_003857416
 +
----------------VMRRRSDDWIN---------------------------------
 +
--ATHILKV-------------AQYDK--------PARTRI---------LEREVQKG--
 +
V-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLPDGRLLAQKNSV-
 +
>Mbp1_UNCRE XP_002540670
 +
--------------S-VMRRRHDDWIN---------------------------------
 +
--ATHILKV-------------AGLDK--------PSRTRI---------LEREVQKG--
 +
T-HE-----------KIQGG----------------YG---KYQGTRHYTAGTW------
 +
-------------VPLPDGRHLAERNNV-
 +
>Mbp1_COCPO XP_003066829
 +
--------------S-VMRRRHDDWIN---------------------------------
 +
--ATHILKV-------------AGLDK--------PSRTRI---------LEREVQKG--
 +
T-HE-----------KIQGG----------------YG---KYQ-------GTW------
 +
-------------VPLADGRAVAERNKV-
 +
>Aps4_COCIM XP_001246304
 +
--------------S-VMRRRHDDWIN---------------------------------
 +
--ATHILKV-------------AGLDK--------PSRTRI---------LEREVQKG--
 +
T-HE-----------KIQGG----------------YG---KYQ-------GTW------
 +
-------------VPLADGRAVAERNKV-
 +
>Mbp1_CHAGL XP_001224558
 +
----------------VMRRREDNWIN---------------------------------
 +
--ATHILKA-------------AGFDK--------PARTRI---------LERDVQKD--
 +
V-HE-----------KIQGG----------------YG---KYQ-------GTW------
 +
-------------IPLEQGRALAQRNNIY
 +
>Mbp1_MYCTH XP_003662384
 +
----------------VMRRREDNWIN---------------------------------
 +
--ATHILKA-------------AGFDK--------PARTRI---------LERDVQKD--
 +
I-HE-----------KIQGG----------------YG---KYQ-------GTW------
 +
-------------IPLEHGEALAQRNNVY
 +
>Mbp1_SCLSC XP_001598963
 +
----------------VMRRRHDDWIN---------------------------------
 +
--ATHILKA-------------AGFDK--------PARTRI---------LEREVQKE--
 +
E-HE-----------KIQGG----------------YG---KYQ-------GTW------
 +
-------------VPLEKGQALAQRNNIY
 +
>Aps4_SORMA XP_003349090
 +
----------------VMRRRHDDWVN---------------------------------
 +
--ATHILKA-------------AGFDK--------PARTRI---------LEREVQKD--
 +
T-HE-----------KIQGG----------------YG---RYQ-------GTW------
 +
-------------IPLEQAEALARRNNIY
 +
>Aps4_NEUCR XP_955821
 +
----------------VMRRRHDDWVN---------------------------------
 +
--ATHILKA-------------AGFDK--------PARTRI---------LEREVQKD--
 +
T-HE-----------KIQGG----------------YG---RYQ-------GTW------
 +
-------------IPLEQAEALARRNNIY
 +
>Aps3_MAGOR XP_003715968
 +
----------------VMRRRVDDWIN---------------------------------
 +
--ATHILKA-------------AGFDK--------PARTRI---------LEREVQKD--
 +
Q-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLEAGEALAHRNNIF
 +
>Mbp1_THITE XP_003650005
 +
----------------VMRRREDNWIN---------------------------------
 +
--ATHILKA-------------AGFDK--------PARTRI---------LEREVQKE--
 +
A-HR-----------KIQGG----------------YG---KYQ-------GTW------
 +
-------------ISLEQGEVLARRNNVY
 +
>Aps5_VERAL XP_003007918
 +
----------------VMRRRQDNWIN---------------------------------
 +
--ATHILKA-------------AGFDK--------PARTRI---------LEREVQKE--
 +
K-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPLNQGQQLAQRNNCY
 +
>Mbp1_NECHA XP_003039845
 +
----------------VMRRRQDNWIN---------------------------------
 +
--ATHILKA-------------AGFDK--------PARTRI---------LERDVQKD--
 +
V-HE-----------KIQGG----------------YG---KYQ-------GTW------
 +
-------------IPLESGQALAERHSV-
 +
>Aps2_USTMA XP_761485
 +
---------VR-GIA-VMRRRGDGWLN---------------------------------
 +
--ATQILKI-------------AGIEK--------TRRTKI---------LEKSILTG--
 +
E-HE-----------KIQGG----------------YG---KFQ-------GTW------
 +
-------------IPLQRAQQVAAEYNV-
 +
>Aps3_MALGL XP_001728900
 +
------------GIA-LMRRRSDGYLN---------------------------------
 +
--ATQILKI-------------AGIEK--------ARRTRI---------LEKEILTG--
 +
E-HD-----------KVQGG----------------YG---TFQ-------GTW------
 +
-------------IPLQRAQELAISYNVY
 +
>Aps3_PUCGR XP_003320997
 +
------------GIG-VMRRRSDSYMN---------------------------------
 +
--ATQILKV-------------AGLDK--------SKRTRI---------LEREIIQG--
 +
E-HE-----------KIQGG----------------YG---RYQ-------GTW------
 +
-------------VPFTRAQELATQLNV-
 +
>Aps1_NEOFI XP_001261510
 +
----------N-GVA-VMKRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVVK--------ARRTKT---------LEKEIAAG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VNYQRGVELCREYHV-
 +
>Aps1_ASPFU XP_748947
 +
----------N-GVA-VMKRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVVK--------ARRTKT---------LEKEIAAG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VNYQRGVELCREYHV-
 +
>Aps8_ASPNI XP_001391313
 +
----------N-GVA-VMKRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVVK--------ARRTKT---------LEKEIAAG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VNYQRGVELCREYHV-
 +
>Aps1_ASPCL XP_001273399
 +
----------N-GVA-VMKRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVVK--------ARRTKT---------LEKEIAAG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VNYQRGVDLCREYHV-
 +
>Aps4_ASPTE XP_001215548
 +
----------N-GVA-VMKRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVVK--------ARRTKT---------LEKEIAAG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VNYQRGVDLCREYHV-
 +
>Aps9_ASPNI XP_664319
 +
----------N-GVA-VMKRRSDGWLN---------------------------------
 +
--ATQILKV-------------AGVVK--------ARRTKT---------LEKEIAAG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VNYQRGVELCREYHV-
 +
>Aps2_TALMA XP_002148693
 +
----------N-GIA-VMKRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVVK--------AKRTKT---------LEKEIAAG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VSYQRGVELCREYQV-
 +
>Aps2_TALST XP_002485546
 +
----------N-GIA-VMKRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVVK--------AKRTKT---------LEKEIAAG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VSYQRGVELCREYQV-
 +
>Aps2_UNCRE XP_002583286
 +
----------N-GVA-VMRRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVVK--------ARRTKT---------LEKEVASG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VSYQRGVELCRRYHV-
 +
>Aps4_COCPO XP_003067661
 +
----------N-GVA-VMRRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVVK--------ARRTKT---------LEKEVVSG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VSYQRGVELCRRYHV-
 +
>Aps2_ARTGY XP_003175012
 +
----------N-GVA-MMRRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVAK--------ARRTKT---------LEKEVAAG--
 +
D-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VSYERGLELCRRYQV-
 +
>Aps2_TRIVE XP_003020882
 +
----------N-GVA-MMRRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVAK--------ARRTKT---------LEKEVAAG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VSYERGLELCRRYQV-
 +
>Aps3_TRIRU XP_003236744
 +
----------N-GVA-MMRRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVAK--------ARRTKT---------LEKEVAAG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VSYERGLELCRRYQV-
 +
>Aps3_ARTBE XP_003013132
 +
-----------------MRRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVAK--------ARRTKT---------LEKEVAAG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VSYERGLELCRRYQV-
 +
>Aps1_AJEDE XP_002624235
 +
----------N-GVA-VMRRRSDSWLN---------------------------------
 +
--ATQILKV-------------AGVMK--------ARRTKT---------LEKEVAAG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VNYERGVELCRHYHVF
 +
>Aps1_PYRTE XP_003298893
 +
----------N-RVA-VMRRRSDGWLN---------------------------------
 +
--ATQILKV-------------AGVDK--------GKRTKV---------LEKEILTG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------INYRRGREFCRQYGV-
 +
>Aps3_PYRTR XP_001935618
 +
----------N-RVA-VMRRRSDGWLN---------------------------------
 +
--ATQILKV-------------AGVDK--------GKRTKV---------LEKEILTG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------INYRRGREFCRQYGV-
 +
>Aps3_ZYMTR XP_003848849
 +
---------VH-NVA-VMRRRSDGWLN---------------------------------
 +
--ATQILKV-------------AGVDK--------GKRTKV---------LEKEILPG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------ISYQRGREFCRQYGV-
 +
>Aps4_SCLSC XP_001590455
 +
----------N-RIA-VMRRRKDSWLN---------------------------------
 +
--ATQILKV-------------AGIEK--------GKRTKV---------LEKEILIG--
 +
D-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IRFERGVEFCKQYGV-
 +
>Aps1_SORMA XP_003347917
 +
----------N-NVA-VMRRQKDGWVN---------------------------------
 +
--ATQILKV-------------ANIDK--------GRRTKI---------LEKEIQIG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPFERGLEVCRQYGV-
 +
>Aps1_NEUCR XP_962967
 +
----------N-NVA-VMRRQKDGWVN---------------------------------
 +
--ATQILKV-------------ANIDK--------GRRTKI---------LEKEIQIG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPFERGLEVCRQYGV-
 +
>Aps3_CHAGL XP_001224444
 +
----------N-NVA-VMRRQTDGWLN---------------------------------
 +
--ATQILKV-------------AGVDK--------GRRTKI---------LEKEIQTG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPFERGFEVCRQYGV-
 +
>Aps3_MYCTH XP_003663630
 +
----------N-NVA-VMRRQADGWLN---------------------------------
 +
--ATQILKV-------------AGVDK--------GRRTKI---------LEKEIQTG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPFERGYEVCRQYGV-
 +
>Aps1_THITE XP_003653705
 +
----------N-NVA-VMRRQHDSWLN---------------------------------
 +
--ATQILKV-------------AGVDK--------GRRTKI---------LEKEIQTG--
 +
Q-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPFERGVEVCRQYGV-
 +
>Aps4_NECHA XP_003045061
 +
----------N-NIA-VMRRRNDSWLN---------------------------------
 +
--ATQILKV-------------AGVDK--------GKRTKI---------LEKEIQTG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------ITFDRGVQVCRQYGV-
 +
>Aps1_VERAL XP_003001507
 +
------------GVA-VMRRRNDSWLN---------------------------------
 +
--ATQILKV-------------AGVEK--------GKRTKI---------LEKEIQTG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IKFERAVEVCRQYGV-
 +
>Aps2_MAGOR XP_003720365
 +
----------N-GVA-VMKRIGDSKLN---------------------------------
 +
--ATQILKV-------------AGVEK--------GKRTKI---------LEKEIQTG--
 +
E-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IKYERALEVCRQYGV-
 +
>Aps3_YARLI XP_501770
 +
--------MAN-DVA-VMRRRTDSSLN---------------------------------
 +
--ATQILKV-------------AGVEK--------SKRTKI---------LEKEILTG--
 +
A-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------IPYERGVDLCRQYSVY
 +
>Res1_SCHPO NP_595496 MBF transcription factor complex subunit Res1
 +
---------IN-GFP-LMKRCHDNWLN---------------------------------
 +
--ATQILKI-------------AELDK--------PRRTRI---------LEKFAQKG--
 +
L-HE-----------KIQGG----------------CG---KYQ-------GTW------
 +
-------------VPSERAVELAHEYNVF
 +
>Aps1_SCHJA XP_002171963
 +
--------IVN-GVA-VMKRCRDGWLN---------------------------------
 +
--ATQILKV-------------AELDK--------PKRTRV---------LEKFAQRG--
 +
I-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VPLQRGVELAMEFQVH
 +
>Mbp1_MILFA XP_004204377
 +
--------VTS-EGP-IMRRKSDSWIN---------------------------------
 +
--ATHILKI-------------AKFPK--------AKRTRI---------LEKDVQTG--
 +
I-HE-----------KVQGG----------------YG---KYQ-------GTY------
 +
-------------VPLDLGAEIARSFGIY
 +
>Aps2_MILFA XP_004204934
 +
--------VTS-EGP-IMRRKSDSWIN---------------------------------
 +
--ATHILKI-------------AKFPK--------AKRTRI---------LEKDVQTG--
 +
I-HE-----------KVQGG----------------YG---KYQ-------GTY------
 +
-------------VPLELGAEIARSFGIY
 +
>Aps6_CLALU XP_002615371
 +
--------VTK-EGP-IMRRKSDSWIN---------------------------------
 +
--ATHILKI-------------AKFPK--------AKRTRI---------LEKDVQTG--
 +
I-HE-----------KVQGG----------------YG---KYQ-------GTY------
 +
-------------VPLDLGAEIAKSFGIF
 +
>Aps3_DEBHA XP_002770278
 +
--------VTS-EGP-IMRRKSDSWIN---------------------------------
 +
--ATHILKI-------------AKFPK--------AKRTRI---------LEKDVQTG--
 +
V-HE-----------KVQGG----------------YG---KYQ-------GTY------
 +
-------------VPLDLGADIAKNFGVF
 +
>Aps3_SCHST XP_001386821
 +
--------VTS-EGP-IMRRKSDSWIN---------------------------------
 +
--ATHILKI-------------AKFPK--------AKRTRI---------LEKDVQTG--
 +
V-HE-----------KVQGG----------------YG---KYQ-------GTY------
 +
-------------VPLELGRDIAKNFGVF
 +
>Mbp1_CANAL XP_723071
 +
--------VTS-EGP-IMRRKKDSWIN---------------------------------
 +
--ATHILKI-------------AKFPK--------AKRTRI---------LEKDVQTG--
 +
I-HE-----------KVQGG----------------YG---KYQ-------GTY------
 +
-------------VPLDLGAAIARNFGVY
 +
>Aps3_CANDU XP_002419323
 +
--------VTS-EGP-IMRRKKDSWIN---------------------------------
 +
--ATHILKI-------------AKFPK--------AKRTRI---------LEKDVQTG--
 +
I-HE-----------KVQGG----------------YG---KYQ-------GTY------
 +
-------------VPLDLGAAIAKNFGVY
 +
>Aps4_CANTR XP_002548345
 +
--------VTS-EGP-IMRRKSDSWIN---------------------------------
 +
--ATHILKI-------------AKFPK--------ARRTRI---------LEKDVQTG--
 +
V-HE-----------KVQGG----------------YG---KYQ-------GTY------
 +
-------------VPLELGATIAKNFGVY
 +
>Mbp1_MEYGU XP_001484708
 +
--------VTS-EGP-IMRRKLDSWIN---------------------------------
 +
--ATHILKI-------------ARFPK--------AKRTRI---------LEKDVQTG--
 +
I-HE-----------KVQGG----------------YG---KYQ-------GTY------
 +
-------------VPLNLGAEIAQSFGVY
 +
>Aps1_LODEL XP_001527262
 +
------------EGP-IMRRKLDSWIN---------------------------------
 +
--ATHILKI-------------AKLPK--------AKRTRI---------LEKDVQTG--
 +
I-HE-----------KVQGG----------------YG---KYQ-------GTY------
 +
-------------VPLELGEIIARNYDVY
 +
>Mbp1_CANOR XP_003867545
 +
--------VTS-EGP-IMRRKGDSWIN---------------------------------
 +
--ATHILKI-------------AKLPK--------AKRTRI---------LEKDVQTG--
 +
I-HE-----------KVQGG----------------YG---KYQ-------GTY------
 +
-------------VPLKLGEVIARNYDVY
 +
>Aps1_KAZAF XP_003958484
 +
--------IHP-TGS-IMKRKKDGWVN---------------------------------
 +
--ATHILKA-------------ANFAK--------AKRTRI---------LEKEVLPG--
 +
T-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------IPLESAIALAEKFAVY
 +
>Mbp1_LACTH XP_002553316
 +
--------IHP-TGS-IMKRKEDDWVN---------------------------------
 +
--ATHILKA-------------AKFAK--------AKRTRI---------LEKEVIKD--
 +
T-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------VPLDIARSLAAKFEV-
 +
>Aps3_ERECY XP_003645298
 +
--------IHP-TGS-IMKRKADDWVN---------------------------------
 +
--ATHILKA-------------AKFAK--------AKRTRI---------LEKEVIKD--
 +
I-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------VPLDIARRLAEKFDV-
 +
>Aps3_ASHGO NP_986147
 +
--------LHP-TGS-IMKRKADDWVN---------------------------------
 +
--ATHILKA-------------AKFAK--------AKRTRI---------LEKEVIKD--
 +
T-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------VPLDIARRLAQKFEV-
 +
>Aps1_TORDE XP_003681593
 +
--------IHP-TGS-VMKRKTDDWVN---------------------------------
 +
--ATHILKA-------------AKFAK--------AKRTRI---------LEKEVIKE--
 +
V-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------VPLDIATRLANKFDVY
 +
>Aps3_KLULA XP_454189
 +
--------IHP-TGS-IMKRKADNWVN---------------------------------
 +
--ATHILKA-------------AKFPK--------AKRTRI---------LEKEVITD--
 +
T-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------IPLELASKLAEKFEV-
 +
>Mbp1_CANGA XP_445458
 +
--------IHP-TGS-IMKRKNDGWVN---------------------------------
 +
--ATHILKA-------------ANFAK--------AKRTRI---------LEKEVLKE--
 +
M-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------VPLNIAINLAEKFDVY
 +
>Mbp1_SACCE NP_010227
 +
--------IHS-TGS-IMKRKKDDWVN---------------------------------
 +
--ATHILKA-------------ANFAK--------AKRTRI---------LEKEVLKE--
 +
T-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------VPLNIAKQLAEKFSVY
 +
>Aps4_NAUDA XP_003670000
 +
--------VHP-TGS-VMKRKSDDWVN---------------------------------
 +
--ATHILKV-------------ANFSK--------AKRTRI---------LEKEVLKE--
 +
T-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------VPMNIALNLAEKYGVY
 +
>Mbp1_ZYGRO XP_002495259
 +
--------IHP-TGS-VMKRRDDDWVN---------------------------------
 +
--ATHILKA-------------ARFAK--------AKRTRI---------LEKEVIKE--
 +
V-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------VPMDVARTLATKFGVH
 +
>Aps4_VANPO XP_001643445
 +
--------IHP-TGS-VMKRKLDNWVN---------------------------------
 +
--ATHILKA-------------ANFAK--------AKRTRI---------LEKEVIKE--
 +
T-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------VPLDIARKLAEKFGVH
 +
>Mbp1_TETPH XP_003684194
 +
--------LHS-TGS-VMKRKKDGWVN---------------------------------
 +
--ATHILKT-------------ANFAK--------AKRTRI---------LEKEVIQE--
 +
T-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------VPLSVAISLAQKFEVY
 +
>Aps5_NAUCA XP_003673193
 +
--------IHP-TGS-VMKRKKDDWVN---------------------------------
 +
--ATHILKA-------------ANFAK--------AKRTRI---------LDKEVMGR--
 +
K-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------VPLEIATELAMKFDVY
 +
>Mbp1_TETRE XP_004182459
 +
--------IHP-TGS-IMKRKIDGWVN---------------------------------
 +
--ATHILKA-------------AKFPK--------AKRTRI---------LEKEVIHE--
 +
I-HE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------VPTDIATRLSKKFGVF
 +
>Aps1_TETBL XP_004178121
 +
--------LHP-TGS-IMKRKTDNWVN---------------------------------
 +
--ATHILKA-------------AHLPK--------AKRTRI---------LERQILNN--
 +
NHHE-----------KVQGG----------------FG---KYQ-------GTW------
 +
-------------IPLEDAVALAREFGVY
 +
>Aps2_KOMPA XP_002491420
 +
--------VTP-LTS-VMRRKSDDWIN---------------------------------
 +
--ATHILKV-------------ADFPK--------AKRTRI---------LERDIQVG--
 +
T-HE-----------KVQGG----------------YG---KYQ-------GTW------
 +
-------------VPLESAVKIAETFDV-
 +
>Aps2_CANTR XP_002550287
 +
----------N-DSP-IMRRCKDDWVN---------------------------------
 +
--ATQILKC-------------CNFPK--------AKRTKI---------LEKGVQQG--
 +
L-HE-----------KVQGG----------------FG---RFQ-------GTW------
 +
-------------IPLEDARRLAETYGV-
 +
>Swi4_CANOR XP_003868155
 +
----------N-DSP-IMRRCKDDWVN---------------------------------
 +
--ATQILKC-------------CNFPK--------AKRTKI---------LEKGVQQG--
 +
L-HE-----------KVQGG----------------FG---RFQ-------GTW------
 +
-------------IPLEDARRLACTYGV-
 +
>Aps3_LODEL XP_001526754
 +
----------N-DSP-IMRRCKDDWVN---------------------------------
 +
--ATQILKC-------------CNFPK--------AKRTKI---------LEKGVQQG--
 +
V-HE-----------KIQGG----------------FG---RFQ-------GTW------
 +
-------------IPLEDARRLAATYGV-
 +
>Aps5_SCHST XP_001383745
 +
----------N-DSP-IMRRCKDDWVN---------------------------------
 +
--ATQILKC-------------CNFPK--------AKRTKI---------LEKGVQQG--
 +
L-HE-----------KVQGG----------------FG---RFQ-------GTW------
 +
-------------IPLPDAQRLATMYGV-
 +
>Aps1_DEBHA XP_457246
 +
----------N-NSP-IMRRCKDDWVN---------------------------------
 +
--ATQILKC-------------CNFPK--------AKRTKI---------LEKGVQQG--
 +
L-HE-----------KIQGG----------------YG---RFQ-------GTW------
 +
-------------IPLADAQRLAASYGV-
 +
>Aps7_MILFA XP_004194775
 +
----------N-NSP-IMRRCKDDWVN---------------------------------
 +
--ATQILKC-------------CNFPK--------AKRTKI---------LEKGVQQG--
 +
L-HE-----------KIQGG----------------YG---RFQ-------GTW------
 +
-------------IPLANAQKLAASYGV-
 +
>Aps9_MILFA XP_004195866
 +
----------N-NSP-IMRRCKDDWVN---------------------------------
 +
--ATQILKC-------------CNFPK--------AKRTKI---------LEKGVQQG--
 +
L-HE-----------KIQGG----------------YG---RFQ-------GTW------
 +
-------------IPLANAQKLAASYGV-
 +
>Aps1_CANDU XP_002416839
 +
--------IMN-DYS-IMRRCKDDWVN---------------------------------
 +
--ATQILKC-------------CNFPK--------AKRTKI---------LEKGVQQG--
 +
L-HE-----------KVQGG----------------FG---RFQ-------GTW------
 +
-------------IPLEDARRLAESYGV-
 +
>Aps2_CANAL XP_712970
 +
--------MMN-ESS-IMRRCKDDWVN---------------------------------
 +
--ATQILKC-------------CNFPK--------AKRTKI---------LEKGVQQG--
 +
L-HE-----------KVQGG----------------FG---RFQ-------GTW------
 +
-------------IPLEDARKLAKTYGV-
 +
>Aps6_CANAL XP_712876
 +
--------MMN-ESS-IMRRCKDDWVN---------------------------------
 +
--ATQILKC-------------CNFPK--------AKRTKI---------LEKGVQQG--
 +
L-HE-----------KVQGG----------------FG---RFQ-------GTW------
 +
-------------IPLEDARRLAKTYGV-
 +
>Aps5_CLALU XP_002618938
 +
-----------------MRRCKDDWVN---------------------------------
 +
--ATQILKL-------------CNFPK--------AKRTKI---------LEKGVQQG--
 +
L-HE-----------KVQGG----------------YG---RFQ-------GTW------
 +
-------------IPLADARRLADEYGI-
 +
>Aps3_MEYGU XP_001487394
 +
-----------------MRRVKDNWVN---------------------------------
 +
--ATQILKC-------------CNFPK--------AKRTKI---------LEKGVQQG--
 +
L-HE-----------KIQGG----------------YG---RFQ-------GTW------
 +
-------------IPLEDAQQLAANYGL-
 +
>Aps3_KAZAF XP_003955178
 +
--------LHPVAGS-IMKRRIDNWVN---------------------------------
 +
--ATHVLKI-------------ANFNK--------SKRLRL---------LEKEVIKAGK
 +
A-YE-----------KIQGG----------------SG---KYQ-------GTW------
 +
-------------VPLEVAKELAVKFEV-
 +
>Aps3_KOMPA XP_002489438
 +
--------ICN-TFP-LMRRCSDDWVN---------------------------------
 +
--VTQILKI-------------AQFPK--------AQRTKI---------LEKEVHDK--
 +
T-HQ-----------RIQGG----------------YG---RFQ-------GTW------
 +
-------------TPLDIARNLAMNYG--
 +
>Aps1_KLULA XP_454890
 +
----------------IMRRCNDNWLN---------------------------------
 +
--ITQVFKA-------------GSFTK--------AQRTKI---------LEKEANEI--
 +
K-HE-----------KIQGG----------------YG---RFQ-------GTW------
 +
-------------IPWESTKYLVEKYNI-
 +
>Aps2_KAZAF XP_003959931
 +
------------SHI-VMRRTRDDWIN---------------------------------
 +
--ITQVFKV-------------AKFSK--------NHRTKV---------LERESSNL--
 +
R-HE-----------KVQGG----------------YG---RFQ-------GTW------
 +
-------------IPLVDAKRLIAEYNI-
 +
>Aps1_ASHGO NP_986370
 +
--------------I-VMRRLHDDWVN---------------------------------
 +
--ITQVFKV-------------ATFSK--------TQRTKI---------LEKESADI--
 +
S-HE-----------KIQGG----------------YG---RFQ-------GTW------
 +
-------------IPLDSAKGLVAKYEI-
 +
>Aps4_ERECY XP_003647811
 +
--------------I-VMRRLHDDWVN---------------------------------
 +
--ITQVFKV-------------ASFTK--------TQRTKV---------LEKESTDI--
 +
N-HE-----------KIQGG----------------YG---RFQ-------GTW------
 +
-------------IPLLSAQNLVAKYCI-
 +
>Aps2_ZYGRO XP_002495118
 +
--------------I-VMRRTQDDWVN---------------------------------
 +
--ITQVFKI-------------AQFSK--------TQRTKV---------LEKESNDM--
 +
R-HE-----------KVQGG----------------YG---RFQ-------GTW------
 +
-------------IPLEDAKYMVTKYNI-
 +
>Aps3_TORDE XP_003680369
 +
--------------I-VMRRTADDWVN---------------------------------
 +
--ITQVFKI-------------AQFSK--------TQRTKV---------LEKESTDM--
 +
R-HE-----------KVQGG----------------YG---RFQ-------GTW------
 +
-------------IPLENAKYMVSKYNI-
 +
>Aps4_CANGL XP_444966
 +
--------------I-VMRRTMDDWVN---------------------------------
 +
--VTQVFKI-------------AQFSK--------TQRTKI---------LEKESTNM--
 +
K-HE-----------KVQGG----------------YG---RFQ-------GTW------
 +
-------------VPLEAAKFMTTKYNI-
 +
>Swi4_SACCE NP_011036
 +
------------TKI-VMRRTKDDWIN---------------------------------
 +
--ITQVFKI-------------AQFSK--------TKRTKI---------LEKESNDM--
 +
Q-HE-----------KVQGG----------------YG---RFQ-------GTW------
 +
-------------IPLDSAKFLVNKYEI-
 +
>Aps8_KAZAF XP_003959682
 +
--------------V-VMRRTRDDWVN---------------------------------
 +
--ITQVFKI-------------AQFSK--------TQRTKL---------LEKESMNI--
 +
Q-HE-----------KVQGG----------------YG---RFQ-------GTW------
 +
-------------VPLDAARDIAAKYSI-
 +
>Aps3_VANPO XP_001647430
 +
--------------I-VMRRTSNDWIN---------------------------------
 +
--ITQIFKL-------------ASFTK--------TKRTKV---------LEIESNNI--
 +
Q-HE-----------KVQGG----------------YG---RFQ-------GTW------
 +
-------------IPLNDAKNLVQKYNI-
 +
>Aps3_TETBL XP_004180077
 +
--------------I-VMRRTKNDWIN---------------------------------
 +
--ITQVFKL-------------ASFSK--------TKRTKI---------LEKESIDI--
 +
E-HE-----------KVQGG----------------YG---RFQ-------GTW------
 +
-------------IPLHYAKLLVNKYNI-
 +
>Aps5_TETPH XP_003685604
 +
--------------I-VMRRKNNDWVN---------------------------------
 +
--ITQVLKL-------------ASFSK--------TKRTKI---------IEKESMNM--
 +
E-HE-----------KVQGG----------------YG---RFQ-------GTW------
 +
-------------IPLSSTKELIEKYNI-
 +
>Aps6_NAUCA XP_003674387
 +
--------------I-VMRRTKDDWIN---------------------------------
 +
--VTQVFKI-------------ADFSK--------AHRTKV---------LEKESSDM--
 +
M-HE-----------KVQGG----------------YG---RFQ-------GTW------
 +
-------------IPLESALMLVQKYKI-
 +
>Aps2_LACTH XP_002552498
 +
--------------I-VMRRCMDNWVN---------------------------------
 +
--ITQVFKI-------------ASFSK--------TQRTKI---------LEKESNMV--
 +
K-HE-----------KIQGG----------------YG---RFQ-------GTW------
 +
-------------IPLENAHYLVQKYSV-
 +
>Aps5_VANPO XP_001645902
 +
--------------T-VMRRTLDDWIN---------------------------------
 +
--ITQVFKL-------------ASFSK--------TKRTKI---------LEKETKSI--
 +
D-HE-----------KIQGG----------------YG---RFQ-------GTW------
 +
-------------IPLICAKTIVIKYNI-
 +
>Aps3_NAUDA XP_003667554
 +
-------------KV-VMRRTRDDWIN---------------------------------
 +
--ITQVFKI-------------GKFSK--------AQRTKV---------LELEANEM--
 +
K-HE-----------KVQGG----------------YG---RFQ-------GTW------
 +
-------------IPLESAMFLAKKYTI-
 +
>Aps4_TETPH XP_003687643
 +
------------TKT-VMRKVSNDWVN---------------------------------
 +
--ATQIFKI-------------ANFTK--------NKRTRI---------LEREAKLI--
 +
K-HE-----------KIQGG----------------YG---RFQ-------GTW------
 +
-------------IPLDDAKMLVNKYEI-
 +
>Aps1_SCHST XP_001385235
 +
------------GVL-VSRREDTNFVN---------------------------------
 +
--GTKLLNV-------------IGMTR--------GKRDGI---------LKTEK-----
 +
T-RN-----------VVKVG----------------SM---NLK-------GVW------
 +
-------------IPFDRAFEIARNEGV-
 +
>Aps3_CANAL XP_711513
 +
------------NIL-VSRREDTNYIN---------------------------------
 +
--GTKLLNV-------------IGMTR--------GKRDGI---------LKTEK-----
 +
I-KN-----------VVKVG----------------SM---NLK-------GVW------
 +
-------------IPFDRAYEIARNEGV-
 +
>Aps4_CANDU XP_002418552
 +
------------NIL-VSRREDTNYIN---------------------------------
 +
--GTKLLNV-------------IGMTR--------GKRDGI---------LKTEK-----
 +
I-KN-----------VVKVG----------------SM---NLK-------GVW------
 +
-------------IPFDRAYEIARNEGV-
 +
>Aps5_CANTR XP_002547473
 +
------------NIL-VSRREDSNYIN---------------------------------
 +
--GTKLLNV-------------IGMTR--------GKRDGI---------LKTEK-----
 +
V-KN-----------VVKVG----------------SM---NLK-------GVW------
 +
-------------IPFDRAYEIARNEGV-
 +
>Aps4_LODEL XP_001527061
 +
------------NIL-VSRREDTNYIN---------------------------------
 +
--CTKLLNV-------------VGMTR--------GKRDGI---------LKTEK-----
 +
V-KQ-----------VVKVG----------------SM---NLK-------GVW------
 +
-------------IPFDRAYEIARNEGV-
 +
>Aps3_MILFA XP_004203535
 +
------------GIL-VSRREDTNFVN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GKRDGI---------LKTEK-----
 +
T-KS-----------VIKVG----------------TM---NLK-------GVW------
 +
-------------IPFERAAEIARNEGI-
 +
>Aps4_DEBHA XP_460447
 +
------------GIL-VSRREDTNYVN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GKRDGI---------LKTEK-----
 +
T-KS-----------VVKVG----------------AM---NLK-------GVW------
 +
-------------IPFERASEIARNEGI-
 +
>Efh1_CANOR XP_003867732
 +
----------N-EIL-VSRREDNNYIN---------------------------------
 +
--CTKLLNV-------------TGMSR--------GKRDGI---------LKTEK-----
 +
V-KD-----------VVKVG----------------TM---NLK-------GVW------
 +
-------------VPFDRAYEIARNEGV-
 +
>Aps2_MEYGU XP_001486611
 +
------------GVL-VSRREDTNYIN---------------------------------
 +
--GTKLLNV-------------AGMSR--------GKRDGI---------LKTEK-----
 +
D-RY-----------VVRAG----------------AM---SLK-------GVW------
 +
-------------IPYERAKEIARNEGV-
 +
>Aps4_CLALU XP_002618164
 +
-------------VV-VSRREKDDYVN---------------------------------
 +
--GTKLLNV-------------TGMSR--------GKRDGL---------LKTEK-----
 +
G-RI-----------VVRNG----------------PM---NLK-------GVW------
 +
-------------IPFHRASEIARNEGV-
 +
>Aps1_ASPNI XP_663440
 +
----------K-GVC-VARREDNGMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RN-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFDRALEFANKEKI-
 +
>Aps1_SCLSC XP_001590416
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
M-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALDFANKEKI-
 +
>Aps2_ARTBE XP_003013983
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
I-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALDFANKEKI-
 +
>Aps5_TRIRU XP_003238727
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
I-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALDFANKEKI-
 +
>Aps5_ARTGY XP_003176766
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
I-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALDFANKEKI-
 +
>Aps5_TALMA XP_002146488
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPYERALDFANKEKI-
 +
>Aps5_TALST XP_002478786
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPYERALDFANKEKI-
 +
>Aps1_COCIM XP_001247133
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALEFANKEKI-
 +
>Aps3_ASPNI XP_001390623
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALEFANKEKI-
 +
>Aps1_COCPO XP_003066203
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALEFANKEKI-
 +
>Aps3_ASPCL XP_001267726
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALEFANKEKI-
 +
>Aps3_NEOFI XP_001260304
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALEFANKEKI-
 +
>Aps4_ASPFU XP_755125
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALEFANKEKI-
 +
>Aps3_UNCRE XP_002541343
 +
----------K-GVC-VARREDNHMVN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
I-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALEFANKEKI-
 +
>Aps2_PYRTR XP_001932216
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
T-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALEFANKEKI-
 +
>Aps3_PYRTE XP_003306747
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
T-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALEFANKEKI-
 +
>Aps2_AJEDE XP_002621560
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RN-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALEFANKEKI-
 +
>Aps1_ASPTE XP_001218256
 +
----------K-GVC-VARREDNSMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
I-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALEFANKEKI-
 +
>Aps2_ZYMTR XP_003851453
 +
----------N-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
T-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFDRALDFANKEKI-
 +
>Aps1_MYCTH XP_003661163
 +
------------GIC-VARREDNSMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALDFANKEKI-
 +
>Aps2_NEUCR XP_960837
 +
------------GIC-VARREDNAMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALDFANKEKI-
 +
>Aps3_SORMA XP_003343963
 +
------------GIC-VARREDNAMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALDFANKEKI-
 +
>Aps1_MAGOR XP_003718315
 +
------------GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
M-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALDFANKEKI-
 +
>Aps4_VERAL XP_003008681
 +
------------GIC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
L-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALDFANKEKI-
 +
>Aps3_THITE XP_003648650
 +
------------GIC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
I-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPFERALDFANKEKI-
 +
>Aps4_CHAGL XP_001219797
 +
------------GIC-VARREDNAMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPYDRALDFANKEKI-
 +
>Aps3_NECHA XP_003051234
 +
------------GIC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------PM---HLK-------GVW------
 +
-------------IPYDRALDFANKEKI-
 +
>Aps3_TRIVE XP_003018714
 +
----------K-GVC-VARREDNHMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKSEK-----
 +
I-RH-----------VVKIG----------------PM---HLK-------GVWYVESLL
 +
FLTQKYPELTSRRIPFERALDFANKEKI-
 +
>Aps5_YARLI XP_502292
 +
------------GIC-VARREDNDMIN---------------------------------
 +
--GTKLLNV-------------AGMTR--------GRRDGI---------LKGEK-----
 +
L-RH-----------VVKAG----------------AM---HLK-------GVW------
 +
-------------IPYDRALEFANKEKI-
 +
>Aps4_YARLI XP_501102
 +
------------GVC-VARREDNNMIN---------------------------------
 +
--GTKLLNV-------------VGMTR--------GRRDGI---------LKTEK-----
 +
I-RH-----------VVKIG----------------AM---HLK-------GVW------
 +
-------------IPYERALAFAQRERI-
 +
>Aps1_NAUDA XP_003668432
 +
----------N-GVS-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------SKMTR--------GRRDGI---------LKAEK-----
 +
I-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERARIMAEKEKI-
 +
>Aps4_KAZAF XP_003954785
 +
----------N-GVS-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------TKMTR--------GRRDGI---------LKAEK-----
 +
I-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERARYMAEKEKI-
 +
>Aps1_ZYGRO XP_002499194
 +
----------N-GVS-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------AKITR--------GRRDGI---------LKAER-----
 +
I-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERAQVMAEREKI-
 +
>Aps2_TORDE XP_003679993
 +
----------N-GVS-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------AKITR--------GRRDGI---------LKAER-----
 +
I-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERAHAMAQREKI-
 +
>Aps1_LACTH XP_002553055
 +
----------N-GVS-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------AKMTR--------GRRDGI---------LKAEK-----
 +
I-RH-----------VVKVG----------------SM---HLK-------GVW------
 +
-------------IPFDRALAMAQREKI-
 +
>Aps2_ASHGO NP_983001
 +
----------N-GVS-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------AKMTR--------GRRDGI---------LKAEK-----
 +
V-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALALAQREKI-
 +
>Aps2_ERECY XP_003646434
 +
----------N-SVS-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------AKMTR--------GRRDGI---------LKAEK-----
 +
V-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALALAQREKI-
 +
>Sok2_SACCE NP_013729
 +
----------N-GIS-VVRRADNDMVN---------------------------------
 +
--GTKLLNV-------------TKMTR--------GRRDGI---------LKAEK-----
 +
I-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALAIAQREKI-
 +
>Aps2_KLULA XP_455299
 +
----------N-GVS-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------TRMTR--------GRRDGI---------LKAEK-----
 +
I-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALVMAQREKI-
 +
>Aps1_VANPO XP_001643248
 +
----------N-GVS-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------TKMTR--------GRRDGI---------LKAEK-----
 +
I-RH-----------VVKVG----------------SM---NLK-------GVW------
 +
-------------IPFERALLMAKKEKI-
 +
>Aps4_KOMPA XP_002490663
 +
----------N-GVS-VVRRADNNMIN---------------------------------
 +
--GTKLLNV-------------AKMTR--------GRRDGM---------LKSEK-----
 +
I-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFDRALAMAQKEHI-
 +
>Aps1_CANAL XP_714197
 +
----------N-NVS-VVRRADNNMIN---------------------------------
 +
--GTKLLNV-------------AQMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALAMAQREQI-
 +
>Aps5_CANAL XP_714237
 +
----------N-NVS-VVRRADNNMIN---------------------------------
 +
--GTKLLNV-------------AQMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALAMAQREQI-
 +
>Aps1_MEYGU XP_001484270
 +
----------N-NVS-VVRRADNNMIN---------------------------------
 +
--GTKLLNV-------------AQMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFDRALAMAQREGI-
 +
>Aps2_CLALU XP_002618588
 +
----------N-NVS-VVRRADNNMIN---------------------------------
 +
--GTKLLNV-------------AQMTR--------GRRDGI---------LKSEK-----
 +
I-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALAMAQREGI-
 +
>Aps5_MILFA XP_004202992
 +
----------N-NVS-VVRRADNNMIN---------------------------------
 +
--GTKLLNV-------------AQMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALAMAQREGI-
 +
>Aps2_SCHST XP_001383609
 +
----------N-NVS-VVRRADNNMIN---------------------------------
 +
--GTKLLNV-------------AQMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALAMAQREGI-
 +
>Aps8_MILFA XP_004202373
 +
----------N-NVS-VVRRADNNMIN---------------------------------
 +
--GTKLLNV-------------AQMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALAMAQREGI-
 +
>Aps5_DEBHA XP_459785
 +
----------N-NVS-VVRRADNNMIN---------------------------------
 +
--GTKLLNV-------------AQMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALAMAQREGI-
 +
>Aps6_CANDU XP_002422294
 +
----------N-NVS-VVRRADNNMIN---------------------------------
 +
--GTKLLNV-------------AQMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALVMAQREGI-
 +
>Efg1_CANOR XP_003870987
 +
----------N-NVS-VVRRADNNMIN---------------------------------
 +
--GTKLLNV-------------AQMTR--------GRRDGI---------LKSEK-----
 +
V-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALSMAQRENI-
 +
>Aps2_LODEL XP_001523544
 +
----------N-NVS-VVRRADNNMIN---------------------------------
 +
--GTKLLNV-------------AQMTR--------GRRDGI---------LKLEK-----
 +
V-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALTMAQRENI-
 +
>Aps2_NAUCA XP_003674209
 +
----------N-GVS-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------TKMTR--------GRRDGI---------LKSEK-----
 +
I-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------VPFERARLMAGREHI-
 +
>Phd1_SACCE NP_012881
 +
----------N-GIS-VVRRADNNMIN---------------------------------
 +
--GTKLLNV-------------TKMTR--------GRRDGI---------LRSEK-----
 +
V-RE-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERAYILAQREQI-
 +
>Aps6_KAZAF XP_003955575
 +
----------N-GVS-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------TKMTR--------GRRDGI---------LRGEK-----
 +
V-RN-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERAYLIAQREKI-
 +
>Aps3_CANGL XP_448847
 +
----------N-GVS-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------TKMTR--------GKRDGI---------LRSEK-----
 +
Y-RK-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFERALFIAKREKI-
 +
>Aps5_NAUDA XP_003672610
 +
----------N-SVS-VIRRADNDMIN---------------------------------
 +
--GTKLLNV-------------TKMTR--------GRRDGI---------LRTEK-----
 +
I-RK-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPFDRAYEIARREKI-
 +
>Aps2_TETPH XP_003688350
 +
----------N-GIS-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------TKMTR--------GRRDGI---------LKAEK-----
 +
T-RK-----------VVKMG----------------TL---NLK-------GVW------
 +
-------------IPFDRAYCIARREKI-
 +
>Aps1_NAUCA XP_003673416
 +
---------CN-GVA-VVRRADNDMIN---------------------------------
 +
--GTKLLNV-------------TKMTR--------GRRDGI---------LRAEK-----
 +
V-RS-----------VIKIG----------------SM---HLK-------GVW------
 +
-------------IPFDRALMMAKREKI-
 +
>Aps2_VANPO XP_001644666
 +
--------VVN-GIT-VLRRDDNNMIN---------------------------------
 +
--GTKLLNV-------------TKMTR--------GRRDRI---------LRAEK-----
 +
I-RH-----------VVKIG----------------SM---HLK-------GVW------
 +
-------------IPLERAKRMAQMENIY
 +
>Aps1_TETPH XP_003687180
 +
--------IAN-GVV-VLRRADNHMVN---------------------------------
 +
--GTKLLNV-------------TGMTR--------GRRDRM---------LRSEK-----
 +
E-RH-----------VVKVG----------------LM---HSK-------GVW------
 +
-------------IPLERARYLAEKTNI-
 +
>Aps2_CANGL XP_449680
 +
---------HN-GVT-VVRRADNDMVN---------------------------------
 +
--GTKLLNV-------------TGMTR--------GRRDGI---------LKNEP-----
 +
V-RD-----------VVKGG----------------PM---TLK-------GVW------
 +
-------------IPIDRARAIARQEGI-
 +
>Aps1_MALGL XP_001732538
 +
----------K-GVC-VARRHDNNMVN---------------------------------
 +
--GTKLLNV-------------CGMSR--------GKRDGI---------LKNEK-----
 +
E-RI-----------VVKVG----------------AM---HLK-------GVW------
 +
-------------IAFSRGKQLAEQHGI-
 +
>Aps4_PUCGR XP_003321545
 +
---------HK-GVT-VGRLKGSGLVN---------------------------------
 +
--GTKLLNL-------------AGISR--------GKRDGI---------LKNEK-----
 +
I-RK-----------VVKHG----------------TM---HLK-------GVW------
 +
-------------IAFDRAVFLAEQHSI-
 +
>Aps1_KOMPA XP_002493748
 +
--------VVQ-KIP-LSRRADNDYVN---------------------------------
 +
--ATKLLNL-------------TGMRR--------GRRDGI---------LKLEK-----
 +
Q-RQ-----------VVKTG----------------TI---DLK-------GVW------
 +
-------------VPLKRAIKLAKAEQVF
 +
>CdcA_SCHPO NP_596132
 +
------------GDNVALRRCPDSYFN---------------------------------
 +
--ISQILRL-------------AGTSS--------SENAKE---------LDDIIESG--
 +
D-YE-----------NVDSK----------------HP---QID-------GVW------
 +
-------------VPYDRAISIAKRYGVY
 +
>Aps3_SCHJA XP_002174002
 +
------------GKR-VLRRCSDSYVN---------------------------------
 +
--LSHVLQL-------------IGSSP--------MQIARE---------LDPIIAAG--
 +
D-FE-----------NVDGR----------------DA---ELN-------GVW------
 +
-------------VPLSRIGNICEKHGL-
 +
>Aps1_MILFA XP_004195060
 +
-------------VI-ILRRVQDSYVN---------------------------------
 +
--ISQLLSIL---------VKMGHFNQ--------TRLNNF---------LNNEIITN--
 +
P-QY-----------S--AE--EKGINVYVDWVDHEVR---QLR-------GLW------
 +
-------------IPYDKAVSLALKFDIY
 +
>Aps10_MILFA XP_004196154
 +
-------------VI-ILRRVQDSYVN---------------------------------
 +
--ISQLLSIL---------VKMGHFNQ--------TRLNNF---------LNNEIITN--
 +
P-QY-----------S--AD--EKGINVYVDWVDHEVK---QLR-------GLW------
 +
-------------ISYDKAVSLALKFDIY
 +
>Aps4_SCHST XP_001387125
 +
--------LDN-TVV-ILRRVQDSYVN---------------------------------
 +
--VTQLFGIL---------LKLGHFNE--------TQLNNF---------FNNEIVTN--
 +
I-QL-----------Q--GA--GTKNNHFLDLRKHENT---QLR-------GLW------
 +
-------------ISYDRAVALALQFDIY
 +
>Aps6_DEBHA XP_002770480
 +
---------DD-PIV-ILRRVQDSYIN---------------------------------
 +
--ISQLFSIL---------LKIGHLSE--------AQLTNF---------LNNEILTN--
 +
T-QY-----------L--SS--GGSNPQFNDLRNHEVR---DLR-------GLW------
 +
-------------IPYDRAVSLALKFDIY
 +
>Aps3_CANTR XP_002548922
 +
---------DE-ELI-ILRRVQDSFIN---------------------------------
 +
--VTQLFEIL---------VKLDLLTL--------SQLNNF---------FDNEILSN--
 +
L-KY-----------F--GS--STKNPQYLDLRSHENT---YIK-------GIW------
 +
-------------IPYDKAVELALKFDIY
 +
>Aps5_CANDU XP_002417464
 +
---------HN-EII-VLRRVQDSFVN---------------------------------
 +
--ITQLFQIL---------IKLDLLSA--------SQVNNY---------FDNEILSN--
 +
L-EY-----------F--GS--SSNTPQYLDLRKHQNT---FLQ-------GIW------
 +
-------------IPYDRAVNLALKFDVY
 +
>Aps7_CANAL XP_723412
 +
---------HG-EII-VLRRVQDSFVN---------------------------------
 +
--VTQLFQIL---------IKLEVLPT--------SQVDNY---------FDNEILSN--
 +
L-KY-----------F--GS--SSNTPQYLDLRKHQNI---YLQ-------GIW------
 +
-------------IPYDKAVNLALKFDIY
 +
>Aps1_CLALU XP_002617825
 +
---------DK-PIL-VLRRVQDSYVN---------------------------------
 +
--VSQMLEIL---------VLTGHFSK--------DQVSGF---------LRNEILHS--
 +
T-QY-----------LPRGN--PTHLASFNDFRTHAVE---QIR-------GLW------
 +
-------------IPYDKAVSIAVRFDLY
 +
>Swi6_CANOR XP_003866226
 +
------------EII-VLRRVQDSFIN---------------------------------
 +
--ASQLLKIL---------VRLHIVTP--------IQVKNY---------LNNEVLSN--
 +
L-EY-----------F--GNPVSKDNLQVLDYSKHENK---SLR-------GIW------
 +
-------------VPYNKGVKIALDFDVY
 +
>Aps5_MEYGU XP_001483939
 +
------------SLV-ILRRVQDSFVN---------------------------------
 +
--VSQLFSIL---------VRLGHSNP--------DQISSF---------LSNEILSS--
 +
S-HY-----------T--GS--IEGSVFYNDFRSHENP---MLQ-------GLW------
 +
-------------VSYDRAVALALRFDIY
 +
>Aps4_SCHPO NP_596166
 +
--------------HFLMRMAKDSSIS---------------------------------
 +
--ATSMFRSA---------FPKATQEE--------EDLEMR---------WIRDNLNP--
 +
I-ED-----------KRVA--------------------------------GLW------
 +
-------------VPPADALALAKDYSM-
 +
>Aps4_SCHJA XP_002172515
 +
------------NPHFLMRMAKNSHIS---------------------------------
 +
--ATSMFRSA---------FPKATPEE--------EEAEMS---------WIQQHLHP--
 +
V-EE-----------KQVS--------------------------------GLW------
 +
-------------VSPEDALALAKDYHM-
 +
>Aps2_ASPNI XP_657766
 +
-------------TYFLMRRSKDGYVS---------------------------------
 +
--ATGMFKIA---------FPWAKLEE--------ERSERE---------YLKTRPET--
 +
S-ED-----------EIAG--------------------------------NVW------
 +
-------------ISPVLALELAAEYKMY
 +
>Aps4_ASPNI XP_001398916
 +
-------------TYFLMRRSKDGFVS---------------------------------
 +
--ATGMFKIA---------FPWAKLDE--------ERSERE---------YLKTRTET--
 +
S-ED-----------EIAG--------------------------------NVW------
 +
-------------ISPLLALELAKEYQMY
 +
>Aps4_ASPCL XP_001274436
 +
-------------TYFLMRRSKDGYVS---------------------------------
 +
--ATGMFKIA---------FPWAKLEE--------EKAERE---------YLKSRDET--
 +
S-ED-----------EIAG--------------------------------NIW------
 +
-------------ISPTLALELAKEYQMY
 +
>Aps2_ASPFU XP_753510
 +
-------------TYFLMRRSKDGYVS---------------------------------
 +
--ATGMFKIA---------FPWAKLEE--------EKAERE---------YLKTREGT--
 +
S-ED-----------EIAG--------------------------------NIW------
 +
-------------VSPLLALELAKEYQMY
 +
>Aps5_NEOFI XP_001259554
 +
-------------TYFLMRRSKDGYVS---------------------------------
 +
--ATGMFKIA---------FPWAKLEE--------EKAERE---------YLKTREGT--
 +
S-ED-----------EIAG--------------------------------NIW------
 +
-------------VSPLLALELAKEYQMY
 +
>Aps3_ASPTE XP_001216355
 +
-------------TYFLM----DGYVS---------------------------------
 +
--ATGMFKIA---------FPWAKLDE--------ERSERE---------YLKSREET--
 +
S-ED-----------EIAG--------------------------------NVW------
 +
-------------ISPKLALELAGEYQMY
 +
>Aps3_TALMA XP_002144963
 +
-------------TYFLMRRSKDGYIS---------------------------------
 +
--ATGMFKIA---------FPWAKAEE--------EKTERE---------YVKSKTET--
 +
S-ID-----------ETAG--------------------------------NLW------
 +
-------------ISPLLALELAKEYQM-
 +
>Aps4_TALST XP_002340417
 +
-------------TYFLMRRSKDGYIS---------------------------------
 +
--ATGMFKIA---------FPWAKAEE--------EKAERE---------YVKSKTET--
 +
S-VD-----------ETAG--------------------------------NLW------
 +
-------------ISPMLALELAKEYQM-
 +
>Aps1_UNCRE XP_002584504
 +
-------------TYFLMRRSKDGYVS---------------------------------
 +
--ATGMFKIA---------FPWAKQAE--------EKGERE---------YLRGHPNT--
 +
S-SD-----------ETAG--------------------------------NLW------
 +
-------------ISPELALELAEEYKM-
 +
>Aps3_COCIM XP_001239522
 +
-------------TYFLMRRSKDGYVS---------------------------------
 +
--ATGMFKIA---------FPWAKLAD--------EKSERE---------YLRGLPET--
 +
S-PD-----------EVAG--------------------------------NLW------
 +
-------------ISPELALELAEEYRM-
 +
>Aps2_COCPO XP_003067108
 +
-------------TYFLMRRSKDGYVS---------------------------------
 +
--ATGMFKIA---------FPWAKLAD--------EKSERE---------YLRGLPET--
 +
S-PD-----------EVAG--------------------------------NLW------
 +
-------------ISPELALELAEEYRM-
 +
>Aps1_ARTGY XP_003175741
 +
-------------SYFLMRRSRDGHIS---------------------------------
 +
--ASGMFKIA---------FPWAKHSE--------ESDERD---------YLRTRPET--
 +
S-ED-----------EIAG--------------------------------NVW------
 +
-------------ISPELALELAREYGI-
 +
>Aps4_TRIRU XP_003234496
 +
-------------SYFLMRRSRDGHIS---------------------------------
 +
--ASGMFKIA---------FPWAKHSE--------EADERE---------YLRTRPET--
 +
S-ED-----------EIAG--------------------------------NVW------
 +
-------------ISPELALELAREYGI-
 +
>Aps1_CHAGL XP_001223374
 +
------------PSYFLMRRSHDGFVS---------------------------------
 +
--ATGMFKG-------------------------------------------HSLPST--
 +
S-HE-----------ETAG--------------------------------NVW------
 +
-------------IPPEEALVLAEEYNI-
 +
>Aps2_NECHA XP_003046455
 +
------------NSYFLMRRSFDGYVS---------------------------------
 +
--ATGMFKAT---------FPYAEAAD--------EEAERK---------FIKSLATT--
 +
S-PE-----------ETAG--------------------------------NIW------
 +
-------------IPPEQALALADEYQI-
 +
>Aps2_SORMA XP_003346507
 +
------------PSYFLMRRSQDGYIS---------------------------------
 +
--ATGMFKAT---------FPYASTEE--------EEAERK---------YIKSLPTT--
 +
S-HE-----------ETAG--------------------------------NVW------
 +
-------------IPPEQALILAEEYQI-
 +
>Aps3_NEUCR XP_962267
 +
------------PSYFLMRRSQDGYIS---------------------------------
 +
--ATGMFKAT---------FPYASQEE--------EEAERK---------YIKSIPTT--
 +
S-SE-----------ETAG--------------------------------NVW------
 +
-------------IPPEQALILAEEYQI-
 +
>Aps4_MYCTH XP_003666082
 +
------------PSYFLMRRSEDGYVS---------------------------------
 +
--ATGMFKAT---------FPYATQEE--------EEAERK---------YIKSLPST--
 +
S-PE-----------ETAG--------------------------------NVW------
 +
-------------IPPEQALILAEEYQI-
 +
>Aps2_THITE XP_003652670
 +
------------PSYFLMRRSVDGFVS---------------------------------
 +
--ATGMFKAT---------FPYATQEE--------EEAERK---------YIRSLSST--
 +
S-PE-----------ETAG--------------------------------NVW------
 +
-------------IPPEQALALAEDYKI-
 +
>Aps2_VERAL XP_003009662
 +
------------NSYFLMRRSHDGYVS---------------------------------
 +
--ATGMFKAT---------YPYAEAHE--------EETERR---------YIKSLPST--
 +
S-PE-----------ETAG--------------------------------NVW------
 +
-------------IPPDHALSLAEEYGV-
 +
>Aps4_MAGOR XP_003714678
 +
------------NAYFLMRRSSDGYVS---------------------------------
 +
--ATGMFKAT---------FPYADAED--------EEAERN---------YIKSLPAT--
 +
S-KE-----------ETAG--------------------------------NVW------
 +
-------------ISPDQALALAEEYSI-
 +
>Aps2_SCLSC XP_001590771
 +
-------------SYFLMRRSSDGYIS---------------------------------
 +
--ATGMFKAT---------FPYAEAAE--------EEMERR---------YIKSLPTT--
 +
S-VD-----------ETAG--------------------------------NVW------
 +
-------------IPPHHALELAEEYQI-
 +
>Aps4_ZYMTR XP_003849371
 +
--------------YFLMRRSSDGFIS---------------------------------
 +
--ATGMFKAA---------FPYAQQEE--------ELLEKD---------YIKSLPAA--
 +
S-SE-----------EVAG--------------------------------NVW------
 +
-------------IDAHKALELADEYGI-
 +
>Aps2_PYRTE XP_003304936
 +
-------------SYFLMRRSSDGYIS---------------------------------
 +
--ATGMFKAA---------FPWASLIE--------EDAERK---------YQKTFPSA--
 +
G-AE-----------EVAG--------------------------------SVW------
 +
-------------IAPEEALALSEEYGM-
 +
>Aps5_PYRTR XP_001939200
 +
-------------SYFLMRRSSDGYIS---------------------------------
 +
--ATGMFKAA---------FPWASLIE--------EDAERK---------YQKTFPSA--
 +
G-AE-----------EVAG--------------------------------SVW------
 +
-------------IAPEEALALSEEYGM-
 +
>Aps4_USTMA XP_760925
 +
-----------VRGHTMMIDVDTSFVR---------------------------------
 +
--FTSITQAL-------------GKNK--------VNFGRL---------VKTCP-ALDP
 +
H-IT-----------KLKGG----------------YL---SIQ-------GTW------
 +
-------------LPFDLAKELSRR----
 +
>Aps1_CANTR XP_002547216
 +
------------NNHWVIWDYETGWVH---------------------------------
 +
--LTGIWKASLNVE---EANVSPSHMK--------ADIVKL---------LESTPKEYQH
 +
Y-IK-----------RIRGG----------------FL---KIQ-------GTW------
 +
-------------LPYKLCKILARRFCYH
 +
>Aps2_CANDU XP_002418509
 +
------------NNHWVIWDYETGWVH---------------------------------
 +
--LTGIWKASLSTD---ESNVSPSHLK--------ADIVKL---------LESTPKEYQQ
 +
Y-IK-----------RIRGG----------------FL---KIQ-------GTW------
 +
-------------LPFKLCKILARRFCYY
 +
>Aps4_CANAL XP_710918
 +
------------NNHWVIWDYETGWVH---------------------------------
 +
--LTGIWKASLTID---GSNVSPSHLK--------ADIVKL---------LESTPKEYQQ
 +
Y-IK-----------RIRGG----------------FL---KIQ-------GTW------
 +
-------------LPYKLCKILARRFCYY
 +
>Aps1_CANOR XP_003866742
 +
------------NDHWVIWDYETGFVH---------------------------------
 +
--LTGIWKASLNVDG--EAPPCASHFK--------ADIVKL---------LESTPKQYQA
 +
Y-IK-----------RIRGG----------------FL---KIQ-------GTW------
 +
-------------LPFKLCKILARRFCY-
 +
>Aps2_DEBHA XP_002770462
 +
------------NNHWIIWDYETGFVH---------------------------------
 +
--LTGIWKASIN-----DEVNTHRNLK--------ADIVKL---------LESTPKQYHQ
 +
H-IK-----------RIRGG----------------FL---KIQ-------GTW------
 +
-------------LPFDLCKMLAKRFCYH
 +
>Aps4_MILFA XP_004202980
 +
------------NNQWIIWDYETSLVH---------------------------------
 +
--LTGIWKASFI-----DESSGSKSVK--------ADIMKL---------LESTPKQYHS
 +
N-IK-----------RIRGG----------------YL---KIQ-------GTW------
 +
-------------MPYGLCKVLARRFCYH
 +
>Aps6_MILFA XP_004202360
 +
------------NNQWIIWDYETGLVH---------------------------------
 +
--LTGIWKASFI-----DEQSGSKSVK--------ADIMKL---------LESTPKQYHS
 +
N-IK-----------RIRGG----------------FL---KIQ-------GTW------
 +
-------------MPYDLCKVLARRFCYH
 +
>Aps4_MEYGU XP_001484277
 +
------------NGQSIIWDYESGYVH---------------------------------
 +
--LTGIWKAAIHHP---DNDLPKSNSK--------ADIVKL---------LESTPRQHQA
 +
K-IK-----------RIRGG----------------FL---KIQ-------GTW------
 +
-------------LPYSLCRILARRFCYH
 +
>Aps1_YARLI XP_505499
 +
------------NNQWIIWDYHTGYVH---------------------------------
 +
--LTGLWKAI-------------GNSK--------ADIVKL---------IDNSP-DLEA
 +
V-IR-----------RVRGG----------------YL---KIQ-------GTW------
 +
-------------VPYDIARALASRTCYF
 +
>Aps3_CLALU XP_002618622
 +
-------------SQWIIWDHETGNVL---------------------------------
 +
--LTSLWRAAQQHSPQADHDKLRAPPK--------ADIVKL---------LESTPKELHA
 +
S-IK-----------RVRGG----------------FL---KIQ-------GTW------
 +
-------------VPHALCRRLARRFCYY
 +
>Aps1_PUCGR XP_003330006
 +
------------NGQYIMIDCETGMVH---------------------------------
 +
--FTGIWKAL-------------GHTK--------ADVVKL---------VESDP-TIAP
 +
Y-LR-----------KVRGG----------------YL---KIQ-------GTW------
 +
-------------LPFDTAQTLARR----
 +
>Aps1_TALMA XP_002145833
 +
------------KTWTMMWDYNIGLVR---------------------------------
 +
--TTHLFKCL-------------DYPK--------TTPAKM---------LNSNE-GLRD
 +
I-CH-----------SITGG----------------AL---AAQ-------GYW------
 +
-------------MPFETAKAVAATFC-Y
 +
>Aps1_TALST XP_002478097
 +
--------------WTIMWDYNIGLVR---------------------------------
 +
--TTHLFKCL-------------DYPK--------TTPAKM---------LNANE-GLRD
 +
I-CH-----------SITGG----------------AL---AAQ-------GYW------
 +
-------------MPFETAKAVAATFC-Y
 +
>Aps2_COCIM XP_001249063
 +
-----------DKIHTVMWDYNVGLVR---------------------------------
 +
--TTSLFKCN-------------NYPK--------TAPGKM---------LDANR-GLRE
 +
I-CH-----------SITGG----------------AL---AAQ-------GYW------
 +
-------------MPFEAAKAVAATFC--
 +
>Aps3_COCPO XP_003071043
 +
-----------DKIHTVMWDYNVGLVR---------------------------------
 +
--TTSLFKCN-------------NYPK--------TAPGKM---------LDANR-GLRE
 +
I-CH-----------SITGG----------------AL---AAQ-------GYW------
 +
-------------MPFEAAKAVAATFC--
 +
>Aps3_ARTGY XP_003173310
 +
-----------DKVYTVMWDYNIGLVR---------------------------------
 +
--TTSLFRCN-------------NYSK--------TAPAKM---------LNANP-GLRE
 +
I-CH-----------SITGG----------------AL---AAQ-------GYW------
 +
-------------MPFEAAKAVAATFC--
 +
>Aps2_TRIRU XP_003239491
 +
-----------DKVYTVMWDYNIGLVR---------------------------------
 +
--TTSLFRCN-------------NYSK--------TAPAKM---------LNANP-GLRE
 +
I-CH-----------SITGG----------------AL---AAQ-------GYW------
 +
-------------MPFEAAKAVAATFC--
 +
>Aps3_AJEDE XP_002620782
 +
-----------DKTYTVMWDYNIGLVR---------------------------------
 +
--TTSLFRCN-------------NYSK--------TAPAKM---------LNANP-GLRE
 +
I-CH-----------SITGG----------------AL---AAQ-------GYW------
 +
-------------MPFEAAKAVAATFC--
 +
>Aps2_NEOFI XP_001258507
 +
------------KEWIVMWDYNIGIVR---------------------------------
 +
--TTHLFKCN-------------DYSK--------TTPAKM---------LNANP-GLRE
 +
I-CH-----------SITGG----------------AL---AAQ-------GYW------
 +
-------------MPYEAAKAVAATFC--
 +
>Aps2_ASPCL XP_001268422
 +
------------KEWTVMWDYNIGLVR---------------------------------
 +
--TTHLFKCN-------------DYSK--------TTPAKM---------LNLNP-GLRE
 +
I-CH-----------SITGG----------------AL---AAQ-------GYW------
 +
-------------MPFEAAKAVAATFC--
 +
>Aps7_ASPNI XP_663009
 +
------------KQWTVMWDYNIGLVR---------------------------------
 +
--TTHLFKCN-------------DYSK--------TTPAKM---------LNQNP-GLRD
 +
I-CH-----------SITGG----------------AL---AAQ-------GYW------
 +
-------------MPYEAAKAIAATFC--
 +
>Aps3_ASPFU XP_751244
 +
------------KEWIVMWDYNIGLVR---------------------------------
 +
--TTHLFKCN-------------DYS-------------KM---------LNANP-GLRE
 +
I-CH-----------SITGG----------------AL---AAQ-------GYW------
 +
-------------MPYEAAKAVAATFC--
 +
>Aps2_ASPTE XP_001212599
 +
-----------DKEWLIMWDYNIGLVR---------------------------------
 +
--TTPLFRSQ-------------NYSK--------TTPAKV---------LDANP-GLRE
 +
I-SH-----------SITGG----------------AI---VAQDKP----GYW------
 +
-------------IPFEAAKAVAATFC--
 +
>Aps1_PYRTR XP_001933008
 +
-----------DKEYVVVWDYNIGLVR---------------------------------
 +
--MTPFFKSC-------------KYSK--------TIPAKA---------LRENP-GLKE
 +
I-SY-----------SITGG----------------AL---VCQ-------GYW------
 +
-------------MPYHAAKAIAATFC-Y
 +
>Aps4_PYRTE XP_003300482
 +
-----------DKEYVVVWDYNVGLVR---------------------------------
 +
--MTPFFKSC-------------KYSK--------TIPAKA---------LRENP-GLKE
 +
I-SY-----------SITGG----------------AL---VCQ-------GYW------
 +
-------------MPYHAARAIAATFC-Y
 +
>Aps1_NECHA XP_003046049
 +
-----------DTEYAVMWDYNVGLVR---------------------------------
 +
--MTPFFKCC-------------RYGK--------TIPAKM---------LGLNQ-GLKE
 +
I-TH-----------SITGG----------------SI---AAQ-------GYW------
 +
-------------MPYQCARAVCATFC-Y
 +
>Aps3_SCLSC XP_001597731
 +
-----------DKDYTVMWDYNVGLVR---------------------------------
 +
--ITPFFKCC-------------KYSK--------TTPAKM---------LGLNP-GLKE
 +
I-TH-----------SITGG----------------AL---AAQ-------GYW------
 +
-------------MPYSCALAVCTTFCSH
 +
>Aps3_VERAL XP_003009274
 +
----------VDAEFMVMWDYNIGLVR---------------------------------
 +
--MTPFFKCC-------------KYGKALLTGVLETVPAKM---------LSLNP-GLKD
 +
I-TH-----------SITGG----------------AI---LAQ-------GYW------
 +
-------------MPYNCAKAVCATFC-Y
 +
>Aps2_CHAGL XP_001223147
 +
-------------SYTVMWDYN--------------------------------------
 +
-----------------------------------TAPAKM---------LNLNP-GLKD
 +
I-TY-----------SITGG----------------SI---KAQ-------GYW------
 +
-------------MPYSCAKAVCATFC--
 +
>Aps2_MYCTH XP_003665914
 +
-----------DTDYTVMWDHNVGLVR---------------------------------
 +
--MTPFFKCR-------------GYSK--------TTPAKM---------LNLNP-GLKD
 +
I-TY-----------SITGG----------------SI---KAQ-------GYW------
 +
-------------MPYSCAKAVCATFC--
 +
>Aps5_ASPNI XP_001392970
 +
------------KTWVISWDYNVGLVL---------------------------------
 +
--TRSLFKCN-------------GHPK--------TAPAKV---------LKMNP-GLGD
 +
I-SH-----------SITGG----------------AL---VGQ-------GYW------
 +
-------------MPFRAAKALATTFC--
 +
>Aps2_NAUDA XP_003672783
 +
--------------SDLHWNNISSNIKNF-------------------------------
 +
--LCDSFKQY-----------LTKREN----------IPAE---------TLKNL-TLSM
 +
L-IQ-----------RIRGG----------------YI---KIQ-------GTW------
 +
-------------LPMEICRSLCLRFC--
 +
>Aps3_NAUCA XP_003677631
 +
--------------SDLHWNNMSPDLQKF-------------------------------
 +
--ITESFKKD-----------LIINKH----------CNEQ---------DLKDL-NLSN
 +
L-IQ-----------RIRGG----------------YI---KIQ-------GTW------
 +
-------------LPLEIARLLSLRFC--
 +
>Aps5_KAZAF XP_003958883
 +
-----------------HWNNLSKELKNL-------------------------------
 +
--ILKNFKDF-----------LINEKH----------LTEE---------NLLNY-NLNN
 +
L-IQ-----------RIRGG----------------YI---KIQ-------GTW------
 +
-------------LPMEIAKLICSRFC--
 +
>Xbp1_SACCE NP_012165
 +
---------------DFHWNNIKPELRDL-------------------------------
 +
--ICQSYKDF-----------LINELG----------PDQI---------DLPNL-NPAN
 +
F-TK-----------RIRGG----------------YI---KIQ-------GTW------
 +
-------------LPMEISRLLCLRFC--
 +
>Aps6_VANPO XP_001644581
 +
-----------------HWNNISNELKDF-------------------------------
 +
--LLITFKDY-----------LRIKRN----------LPES---------QLTNL-TIYD
 +
L-IQ-----------RIRGG----------------YI---KIQ-------GTW------
 +
-------------LPWEISRILCIRFC-Y
 +
>Aps3_TETPH XP_003684917
 +
-----------------HWANVSNYLKEE-------------------------------
 +
--LLIVFKNY-----------ILNGEN--------DGVNTD---------KMQNL-SIYD
 +
L-IN-----------RIRGG----------------YI---KIQ-------GTW------
 +
-------------LPWIMAKEICKRFC--
 +
>Aps4_NAUCA XP_003675086
 +
--------------KDFHWNNLPPILKEQ-------------------------------
 +
--AINHFRNI-----------LQMEKG----------ITSD---------YLASM-KDCD
 +
F-CQ-----------RIRGG----------------YI---KIQ-------GTW------
 +
-------------LPIEMAKLICTKFC--
 +
>Aps2_TETBL XP_004181697
 +
--------------------------KDT-------------------------------
 +
--LVDGYRAF-----------LCRQYP----------EHAE---------ELRHV-PFAS
 +
L-LQ-----------RIRGG----------------YI---KIQ-------GTW------
 +
-------------LPYEVSRQICTRFC--
 +
>Aps1_ERECY XP_003645620
 +
--------------TDVHWNQLDPAWKQQINPNNVILWDYKTGYVFFTGIWRLYQDVMRA
 +
MCLCQMFQEI-----------RKNMPR--------TGSSEH---------LDFTL-DFQD
 +
C-YKEEENSQKRLWQRIRGG----------------YICVKKIQ-------GTW------
 +
-------------LPLEISRQLCTRFC--
 +
>Aps4_ASHGO NP_983869
 +
--------------TDVHWNQVDPTWKQR-------------------------------
 +
--LCRLYQQ-----------------------------EKN---------LDFTP-EFQD
 +
C-YK-----------RIRGG----------------YI---KIQ-------GTW------
 +
-------------LPMEICKRLCIRFC--
 +
>Aps1_CANGL XP_446482
 +
---------------DFHWFDISEKVRSQ-------------------------------
 +
--IFEQFKQH-----------LEKDRN----------VDCS---------TIP---KAEE
 +
Y-IQ-----------RIRGG----------------YI---KIQ-------GTW------
 +
-------------VPWYIAKLICIRFC--
 +
>Aps7_KAZAF XP_003959346
 +
ISNKKSTLLRKDRYIELHWQNITATMKTQ-------------------------------
 +
--LFNEFKNY----------VLEHEPN----------VDAT---------LFQNY-NMAD
 +
L-IH-----------RIRGG----------------CI---KVQ-------GTW------
 +
-------------FPMELAKLFCIKF---
 +
>KilA_ESCCO WP_000200358
 +
-------------------RAKDGYIN---------------------------------
 +
--ATSMCRT-------------AGKLL--------SDYTRLKTTQEFFDELSRDMGIPIS
 +
ELIQ-----------SFKGG----------------RP---ENQ-------GTW------
 +
-------------VHPDIAINLAQ-----
 +
</pre>
  
====The final 74 sequences====
+
-->
  
>MBP1_SACCE NP_010227 024..107
+
[[Category:Bioinformatics]]
SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDG
+
</div>
>MBP1_YARLI XP_500257 022..105
 
AVMRRKSDGWVNATHILKVAGFDKPQRTRILEKEVQKGVHEKVQGGYGKYQGTWVPLERAREIATLYDVDSHLAPIFNYDDEDG
 
>5821_NEUCR XP_955821 037..118
 
VMRRRHDDWVNATHILKAAGFDKPARTRILEREVQKDTHEKIQGGYGRYQGTWIPLEQAEALARRNNIYERLKPIFEFQPGN
 
>9090_CRYNE XP_569090 036..117
 
AVMRRRSDAYLNATQILKVAGFDKPQRTRVLEREVQKGEHEKVQGGYGKYQGTWIPIERGLALAKQYGVEDILRPIIDYVPT
 
>MBP1_ASPNI XP_660758 028..110
 
SVMRRRSDDWINATHILKVAGFDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLQEGRQLAERNNILDKLLPIFDYVAGD
 
>MBP1_KLULA XP_454189 025..108
 
SIMKRKADNWVNATHILKAAKFPKAKRTRILEKEVITDTHEKVQGGFGKYQGTWIPLELASKLAEKFEVLDELKPLFDFTQQEG
 
>MBP1_GIBZE XP_384396 045..129
 
AVMRRRNDSWLNATQILKVAGVDKGKRTKILEKEIQTGEHEKVQGGYGKYQGTWIKFERGLQVCRQYGVEELLRPLLTYDMGQDG
 
>MBP1_ASPTE XP_001213217 028..110
 
SVMRRRADDWINATHILKVAGFDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLPEGRLLAERNNIIDKLRPIFDYVAGD
 
>MBP1_CANAL XP_723071 026..108
 
IMRRKKDSWINATHILKIAKFPKAKRTRILEKDVQTGIHEKVQGGYGKYQGTYVPLDLGAAIARNFGVYDVLKPIFEFQYIEG
 
>MBP1_CANGL XP_445458 024..107
 
SIMKRKNDGWVNATHILKAANFAKAKRTRILEKEVLKEMHEKVQGGFGKYQGTWVPLNIAINLAEKFDVYQDLKPLFDFSEENG
 
>1770_YARLI XP_501770 036..116
 
AVMRRRTDSSLNATQILKVAGVEKSKRTKILEKEILTGAHEKVQGGYGKYQGTWIPYERGVDLCRQYSVYDVLQPLLAFDP
 
>2974_MAGGR XP_362974 121..199
 
VMRRRVDDWINATHILKAAGFDKPARTRILEREVQKDQHEKVQGGYGKYQGTWIPLEAGEALAHRNNIFDRLRPIFEFS
 
>1485_USTMA XP_761485 182..262
 
AVMRRRGDGWLNATQILKIAGIEKTRRTKILEKSILTGEHEKIQGGYGKFQGTWIPLQRAQQVAAEYNVSHLLQPILEFDP
 
>MBP1_USTMA XP_762343 026..107
 
AVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREIQKGIHEKVQGGYGKYQGTWIPLDVAIELAERYNIQGLLQPITSYVPS
 
>0560_GIBZE XP_390560 040..120
 
VMRRRSDDWINATHILKAAGFDKPARTRILERDVQKDVHEKIQGGYGKYQGTWIPLESGQALAERHSVIDRLRPIFEYVQG
 
>4232_ASPFU XP_754232 001..081
 
MRRRGDDWINATHILKVAGFDKPARTRILEREVQKGTHEKVQGGYGKYQGTWIPLHEGRLLAERNNIIDKLRPIFDYVAGD
 
>MBP1_CRYNE XP_570545 133..214
 
SVMRRASDSWVNATQILKVAGVHKSARTKILEKEVLNGIHEKIQGGYGKYQGTWVPLDRGRDLAEQYGVGSYLSSVFDFVPS
 
>MBP1_NEUCR XP_962967 071..155
 
AVMRRQKDGWVNATQILKVANIDKGRRTKILEKEIQIGEHEKVQGGYGKYQGTWIPFERGLEVCRQYGVEELLSKLLTHNRGQEG
 
>MBP1_DEBHA XP_458784 027..109
 
IMRRKLDSWINATHILKIAKFPKAKRTRILEKDVQTGVHEKVQGGYGKYQGTYVPLDLGADIAKNFGVFDSLRPIFEFTYVEG
 
>2876_CANAL XP_712876 006..088
 
SIMRRCKDDWVNATQILKCCNFPKAKRTKILEKGVQQGLHEKVQGGFGRFQGTWIPLEDARRLAKTYGVTEELAPVLFLDFSD
 
>MBP1_MAGGR XP_365024 131..210
 
AVMKRIGDSKLNATQILKVAGVEKGKRTKILEKEIQTGEHEKVQGGYGKYQGTWIKYERALEVCRQYGVEELLRPLLEYN
 
>4319_ASPNI XP_664319 119..198
 
AVMKRRSDGWLNATQILKVAGVVKARRTKTLEKEIAAGEHEKVQGGYGKYQGTWVNYQRGVELCREYHVEELLRPLLEYD
 
>MBP1_ASPFU XP_748947 105..184
 
AVMKRRSDSWLNATQILKVAGVVKARRTKTLEKEIAAGEHEKVQGGYGKYQGTWVNYQRGVELCREYHVEELLRPLLEYD
 
>MBP1_SCHPO NP_593032 027..110
 
SVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQGGYGKYQGTWVPFQRGVDLATKYKVDGIMSPILSLDIDEG
 
>5548_ASPTE XP_001215548 007..086
 
AVMKRRSDSWLNATQILKVAGVVKARRTKTLEKEIAAGEHEKVQGGYGKYQGTWVNYQRGVDLCREYHVEELLRPLLEYD
 
>5496_SCHPO NP_595496 026..106
 
LMKRCHDNWLNATQILKIAELDKPRRTRILEKFAQKGLHEKIQGGCGKYQGTWVPSERAVELAHEYNVFDLIQPLIEYSGS
 
>7246_DEBHA XP_457246 028..109
 
IMRRCKDDWVNATQILKCCNFPKAKRTKILEKGVQQGLHEKIQGGYGRFQGTWIPLADAQRLAASYGVTPDLAPVLYLDASD
 
>MBP1_EREGO NP_986147 031..114
 
SIMKRKADDWVNATHILKAAKFAKAKRTRILEKEVIKDTHEKVQGGFGKYQGTWVPLDIARRLAQKFEVLEELRPLFDFTRRDG
 
>6370_EREGO NP_986370 043..124
 
VMRRLHDDWVNITQVFKVATFSKTQRTKILEKESADISHEKIQGGYGRFQGTWIPLDSAKGLVAKYEITDIVVLTVINFQPD
 
>SWI4_SACCE NP_011036 060..141
 
VMRRTKDDWINITQVFKIAQFSKTKRTKILEKESNDMQHEKVQGGYGRFQGTWIPLDSAKFLVNKYEIIDPVVNSILTFQFD
 
>4890_KLULA XP_454890 119..200
 
IMRRCNDNWLNITQVFKAGSFTKAQRTKILEKEANEIKHEKIQGGYGRFQGTWIPWESTKYLVEKYNINNKVVKRIVEFIPD
 
>4966_CANGL XP_444966 062..140
 
VMRRTMDDWVNVTQVFKIAQFSKTQRTKILEKESTNMKHEKVQGGYGRFQGTWVPLEAAKFMTTKYNIDNPVVNTILSF
 
>9785_DEBHA XP_459785 307..380
 
SVVRRADNNMINGTKLLNVAQMTRGRRDGILKSEKVRHVVKIGSMHLKGVWIPFERALAMAQREGIVDLLYPLF
 
>3009_ASPNI XP_663009 131..216
 
TVMWDYNIGLVRTTHLFKCNDYSKTTPAKMLNQNPGLRDICHSITGGALAAQGYWMPYEAAKAIAATFCWKIRFALTPLFGDNFPD
 
>SOK2_SACCE NP_013729 436..509
 
SVVRRADNDMVNGTKLLNVTKMTRGRRDGILKAEKIRHVVKIGSMHLKGVWIPFERALAIAQREKIADYLYPLF
 
>9680_CANGL XP_449680 143..216
 
TVVRRADNDMVNGTKLLNVTGMTRGRRDGILKNEPVRDVVKGGPMTLKGVWIPIDRARAIARQEGIEQWLYPLF
 
>3001_EREGO NP_983001 352..425
 
SVVRRADNDMINGTKLLNVAKMTRGRRDGILKAEKVRHVVKIGSMHLKGVWIPFERALALAQREKIVDMLFPLF
 
>4197_CANAL XP_714197 227..300
 
SVVRRADNNMINGTKLLNVAQMTRGRRDGILKSEKVRHVVKIGSMHLKGVWIPFERALAMAQREQIVDMLYPLF
 
>4237_CANAL XP_714237 228..301
 
SVVRRADNNMINGTKLLNVAQMTRGRRDGILKSEKVRHVVKIGSMHLKGVWIPFERALAMAQREQIVDMLYPLF
 
>8256_ASPTE XP_001218256 139..211
 
VARREDNSMINGTKLLNVAGMTRGRRDGILKSEKIRHVVKIGPMHLKGVWIPFERALEFANKEKITDLLYPLF
 
>3440_ASPNI XP_663440 152..224
 
VARREDNGMINGTKLLNVAGMTRGRRDGILKSEKVRNVVKIGPMHLKGVWIPFDRALEFANKEKITDLLYPLF
 
>2292_YARLI XP_502292 285..357
 
VARREDNDMINGTKLLNVAGMTRGRRDGILKGEKLRHVVKAGAMHLKGVWIPYDRALEFANKEKIIDLLFPLF
 
>1102_YARLI XP_501102 130..202
 
VARREDNNMINGTKLLNVVGMTRGRRDGILKTEKIRHVVKIGAMHLKGVWIPYERALAFAQRERIVDVLYPLF
 
>5125_ASPFU XP_755125 152..224
 
VARREDNHMINGTKLLNVAGMTRGRRDGILKSEKVRHVVKIGPMHLKGVWIPFERALEFANKEKITDLLYPLF
 
>PHD1_SACCE NP_012881 208..281
 
SVVRRADNNMINGTKLLNVTKMTRGRRDGILRSEKVREVVKIGSMHLKGVWIPFERAYILAQREQILDHLYPLF
 
>8847_CANGL XP_448847 224..297
 
SVVRRADNDMINGTKLLNVTKMTRGKRDGILRSEKYRKVVKIGSMHLKGVWIPFERALFIAKREKIVDLLYPLF
 
>5499_YARLI XP_505499 080..165
 
IIWDYHTGYVHLTGLWKAIGNSKADIVKLIDNSPDLEAVIRRVRGGYLKIQGTWVPYDIARALASRTCYFIRFALIPLFGQDFPGT
 
>5299_KLULA XP_455299 386..459
 
SVVRRADNDMINGTKLLNVTRMTRGRRDGILKAEKIRHVVKIGSMHLKGVWIPFERALVMAQREKIVDLLYALF
 
>0305_GIBZE XP_390305 226..298
 
VARREDNHMINGTKLLNVAGMTRGRRDGILKSEKVRHVVKIGPMHLKGVWIPYDRALDFANKEKITELLYPLF
 
>0837_NEUCR XP_960837 139..211
 
VARREDNAMINGTKLLNVAGMTRGRRDGILKSEKVRHVVKIGPMHLKGVWIPFERALDFANKEKITELLYPLF
 
>8552_MAGGR XP_368552 127..199
 
VARREDNHMINGTKLLNVAGMTRGRRDGILKSEKMRHVVKIGPMHLKGVWIPFERALDFANKEKITELLYPLF
 
>0447_DEBHA XP_460447 213..285
 
VSRREDTNYVNGTKLLNVAGMTRGKRDGILKTEKTKSVVKVGAMNLKGVWIPFERASEIARNEGIDGLLYPLF
 
>9978_GIBZE XP_389978 139..218
 
AVMWDYNIGLVRMTPFFKCRGYGKTIPAKMLGLNPGLKEITHSITGGSIAAQGYWMPYRCAKAICATFCHPIAGALIPIF
 
>1513_CANAL XP_711513 469..541
 
VSRREDTNYINGTKLLNVIGMTRGKRDGILKTEKIKNVVKVGSMNLKGVWIPFDRAYEIARNEGVDSLLYPLF
 
>6132_SCHPO NP_596132 088..165
 
LRRCPDSYFNISQILRLAGTSSSENAKELDDIIESGDYENVDSKHPQIDGVWVPYDRAISIAKRYGVYEILQPLISFN
 
>1244_ASPFU XP_751244 151..230
 
VMWDYNIGLVRTTHLFKCNDYSKMLNANPGLREICHSITGGALAAQGYWMPYEAAKAVAATFCWKIRHALTPLFGLDFPS
 
>0925_USTMA XP_760925 057..143
 
TMMIDVDTSFVRFTSITQALGKNKVNFGRLVKTCPALDPHITKLKGGYLSIQGTWLPFDLAKELSRRIAWEIRDHLVPLFGYDFPST
 
>2599_ASPTE XP_001212599 130..218
 
IMWDYNIGLVRTTPLFRSQNYSKTTPAKVLDANPGLREISHSITGGAIVAQDKPGYWIPFEAAKAVAATFCWRIRYALTPIFGLDFPSQ
 
>9773_DEBHA XP_459773 187..274
 
IIWDYETGFVHLTGIWKASINDEVNTHRNLKADIVKLLESTPKQYHQHIKRIRGGFLKIQGTWLPFDLCKMLAKRFCYHIRFQLIPIF
 
>0918_CANAL XP_710918 256..352
 
VIWDYETGWVHLTGIWKASLTIDGSNVSPSHLKADIVKLLESTPKEYQQYIKRIRGGFLKIQGTWLPYKLCKILARRFCYYLRYSLIPIFGTDFPDS
 
>9901_DEBHA XP_459901 067..158
 
ILRRVQDSYINISQLFSILLKIGHLSEAQLTNFLNNEILTNTQYLSSGGSNPQFNDLRNHEVRDLRGLWIPYDRAVSLALKFDIYELAKSLF
 
>7766_ASPNI XP_657766 089..163
 
LMRRSKDGYVSATGMFKIAFPWAKLEEERSEREYLKTRPETSEDEIAGNVWISPVLALELAAEYKMYDWVRALLD
 
>5459_GIBZE XP_385459 077..154
 
LMRRSYDGFVSATGMFKASFPYAEASDEDAERKYIKSLPTTSHEETAGNVWIPPEQALILAEEYKISPWIRALLDPTP
 
>2267_NEUCR XP_962267 085..162
 
LMRRSQDGYISATGMFKATFPYASQEEEEAERKYIKSIPTTSSEETAGNVWIPPEQALILAEEYQITPWIRALLDPSD
 
>3510_ASPFU XP_753510 089..163
 
LMRRSKDGYVSATGMFKIAFPWAKLEEEKAEREYLKTREGTSEDEIAGNIWVSPLLALELAKEYQMYDWVRALLD
 
>3762_MAGGR XP_363762 084..161
 
LMRRSSDGYVSATGMFKATFPYADAEDEEAERNYIKSLPATSKEETAGNVWISPDQALALAEEYSIATWIRALLDPTD
 
>3412_CANAL XP_723412 087..178
 
VLRRVQDSFVNVTQLFQILIKLEVLPTSQVDNYFDNEILSNLKYFGSSSNTPQYLDLRKHQNIYLQGIWIPYDKAVNLALKFDIYEITKKLF
 
>6166_SCHPO NP_596166 062..140
 
LMRMAKDSSISATSMFRSAFPKATQEEEDLEMRWIRDNLNPIEDKRVAGLWVPPADALALAKDYSMTPFINALLEASST
 
>XBP1_SACCE NP_012165 314..400
 
RDLICQSYKDFLINELGPDQIDLPNLNPANFTKRIRGGYIKIQGTWLPMEISRLLCLRFCFPIRYFLVPIFGPDFPKDCESWYLAHQ
 
>6355_ASPTE XP_001216355 084..167
 
TYFLMDGYVSATGMFKIAFPWAKLDEERSEREYLKSREETSEDEIAGNVWISPKLALELAGEYQMYNWVRALLDPTDIVQSPS
 
>9301_MAGGR XP_369301 092..188
 
EEYTVMWDYGCGLVRMTHFFKCRGYTKTVPGKVLNQNHGLKDITYSITGGSISAQESPNFGRMVIDRELVAHATREAESMYGRSMQAQAQQQGPLR
 
>5262_KLULA XP_455262 301..388
 
QQKWNKWFQRESFSTYIDLHWHKLNPTLSTLLGQSYDAKIPFERMVKRIRGGYIKIQGTWLPYPVSKELCSRFCYPLRYLLVPLFGPDFPEKCEYWY
 
>3869_EREGO NP_983869 277..365
 
YTDVHWNQVDPTWKQRLCRLYQQEKNLDFTPEFQDCYKRIRGGYIKIQGTWLPMEICKRLCIRFCFPIRYFLVPIFGEGFLQECHNWYF
 
>6482_CANGL XP_446482 300..390
 
SVNYLDFHWFDISEKVRSQIFEQFKQHLEKDRNVDCSTIPKAEEYIQRIRGGYIKIQGTWVPWYIAKLICIRFCFPIRYLLVPIFGEQFPV
 

Latest revision as of 06:51, 26 September 2020

Reference APSES domains



The species used on this page are not the current set of reference species. Proceed with caution.


Sequences of APSES domains in the fungal reference species - domain definition, PSI-BLAST search, and header editing.


The APSES domain proteins were determined with a PSI-BLAST search in the refseq database, using 1BM8_A as the search sequence, and restricting the search to the Reference species for fungi.




Executing the PSI-BLAST search

Defining the APSES Domain sequence

The APSES domain "proper"
  1. Navigate to the NCBI BLAST page;
  2. Follow the link to protein BLAST and enter the yeast Mbp1 refseq ID NP_010227 into the input form;
  3. Select the PHI-BLAST algorithm to search for domains in the sequence and Run BLAST;
  4. Click on the graphical summary of the result to access the CDD conserved domains report for the sequence;
  5. Click on the (+) sign next to the link to the KilA-N (pfam04383) domain to display the query/profile alignment. This is what it looks like:
                          10        20        30        40        50        60        70        80
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 6320147     19 IHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQ---------------GGFGKYQGTWVPLNIA 83
Cdd:pfam04383   3 YNDFEIIIRRDKDGYINATKLCKAAGAKGKRFRNWLRLESTKELIEELSkennpdkliiienrkGKGGRLQGTYVHPDLA 82


                          90
                  ....*....|....
gi 6320147     84 KQLA----EKFSVY 93
Cdd:pfam04383  83 LAIAswisPEFALK 96

This gives us the following APSES domain sequence:

>Yeast Mbp1 APSES domain (AA 19..93 of NP_010227)
IHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQG 
GFGKYQGTWVPLNIAKQLAEKFSVY
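
The step from the alignment to the FASTA record can also be done programmatically. The following is a minimal sketch, not the procedure used for this page: it takes the two query rows from the CDD alignment above, removes the gap characters, and checks that the resulting length agrees with the coordinates 19..93. The helper names `ungap` and `wrap` are made up for this illustration.

```python
# Query rows copied from the CDD alignment above (gi 6320147, residues 19..93).
query_rows = [
    "IHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQ---------------GGFGKYQGTWVPLNIA",
    "KQLA----EKFSVY",
]


def ungap(rows):
    """Join alignment rows and drop the gap characters ('-')."""
    return "".join(rows).replace("-", "")


def wrap(seq, width=50):
    """Split a sequence into lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


domain = ungap(query_rows)
start, end = 19, 93  # 1-based, inclusive coordinates taken from the alignment

# The ungapped query must span exactly the residues start..end.
assert len(domain) == end - start + 1

print(f">Yeast Mbp1 APSES domain (AA {start}..{end} of NP_010227)")
print("\n".join(wrap(domain)))
```

The length check is a cheap safeguard against copy/paste errors when alignment rows are transferred by hand.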

Searching for APSES domains

A PSI-BLAST search was executed, searching in the refseq subset of the NCBI protein database and restricting the species to the six fungal reference species plus Escherichia coli. E. coli was included to retrieve the KilA-N domain sequence, which we need as an outgroup for phylogenetic analysis.

The search converged after 5 iterations. In each iteration, matches that covered less than 80% of the query length were manually removed, even if they had low E-values. To avoid including false positives and thus corrupting the profile, hits with E > 10^-4 were also removed. The check-boxes next to the alignments were used to select sequences with > 80% coverage of the query, and only the highest-scoring KilA-N domain protein was kept. Clicking on Get selected sequences created a results page of 27 sequences. These were then displayed in FASTA (text) format and their headers were slightly edited to create the dataset Reference APSES proteins (reference species).
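
The manual selection criteria can be written down as a simple filter. This is only a sketch under stated assumptions: the `Hit` record, its field names, and the example hits are hypothetical and do not correspond to any NCBI data structure or to real search results. It illustrates the two thresholds given above (coverage of at least 80% of the query, E-value not larger than 10^-4); the choice of the single best KilA-N hit is not covered.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one PSI-BLAST hit (illustration only)."""
    accession: str
    evalue: float
    aligned_length: int  # number of query residues covered by the match


QUERY_LENGTH = 75    # Mbp1 APSES domain query, AA 19..93
MIN_COVERAGE = 0.80  # remove matches covering less than 80% of the query
MAX_EVALUE = 1e-4    # remove hits with E > 10^-4


def keep(hit, query_length=QUERY_LENGTH):
    """Return True if the hit passes both the coverage and the E-value threshold."""
    coverage = hit.aligned_length / query_length
    return coverage >= MIN_COVERAGE and hit.evalue <= MAX_EVALUE


# Made-up hits, not taken from the actual search.
hits = [
    Hit("HYPOTHETICAL_1", 1e-30, 75),  # full coverage, low E-value: kept
    Hit("HYPOTHETICAL_2", 1e-20, 40),  # low E-value but only 53% coverage: removed
    Hit("HYPOTHETICAL_3", 2e-3, 70),   # good coverage but E > 10^-4: removed
]

selected = [h.accession for h in hits if keep(h)]
print(selected)
```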

Constructing the multi-FASTA file

A multi-FASTA file is the default input format for many MSA programs; it is simply a file that contains more than one FASTA-formatted sequence. To generate the multi-FASTA file of APSES domains, we could have edited the full-length proteins manually, but there is a simpler way. The PSI-BLAST search has already defined the segments of each source protein that are similar to the APSES search profile; we only need to extract them from the search results in a convenient way. NCBI offers a number of options to format the BLAST results page, available from the "Formatting options" link at the top of the page. The principal format options are:

  • Pairwise: the default
  • Pairwise with identities: showing only differences to the query sequence
  • Query-anchored with/without identities: looks something like a multiple sequence alignment, with hyphens for gaps; insertions relative to the query are displayed below the sequence
  • Flat query-anchored with/without identities: this now looks like a multiple sequence alignment (in fact it is one: all sequences are aligned to the profile)
  • Hit table: this gives only the numerical parameters describing the quality of the matches

When we select the Flat query-anchored with letters for identities option, it is reasonably straightforward to obtain the aligned sequences, copy and paste them into a Word document, and convert that into multi-FASTA format with a few Edit > Replace commands.
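
As an alternative to manual Edit > Replace commands, the same conversion can be scripted. The sketch below is not the method used for this page. It assumes that each alignment line consists of four whitespace-separated fields (identifier, start position, aligned fragment, end position) and that gaps are written as hyphens; real BLAST output may deviate from this layout and would need adjusting. The function name `flat_to_fasta` and the input lines are invented for the illustration.

```python
def flat_to_fasta(text, width=70):
    """Convert flat query-anchored alignment text into multi-FASTA.

    Assumed line layout (an assumption, see above):
        identifier  start  aligned_fragment  end
    Lines that do not fit this layout are skipped.
    """
    fragments = {}  # identifier -> list of aligned fragments, in input order
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[1].isdigit() or not fields[3].isdigit():
            continue
        name, fragment = fields[0], fields[2]
        fragments.setdefault(name, []).append(fragment)

    records = []
    for name, parts in fragments.items():
        seq = "".join(parts).replace("-", "")  # join blocks, remove gap characters
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append(">" + name + "\n" + "\n".join(lines))
    return "\n\n".join(records) + "\n"


# Toy input in the assumed layout; identifiers and coordinates are made up.
sample = """
Query_1  1   IHSTGSIMKR  10
XP_TOY   5   ----VMRRRH  10
Query_1  11  KKDDW       15
XP_TOY   11  DDW--       13
"""

print(flat_to_fasta(sample))
```

Collecting the fragments per identifier before joining them handles alignments that are displayed in several consecutive blocks.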

Renaming sequences

To make the interpretation of alignments and gene trees easier, all Saccharomyces cerevisiae sequences were labelled with their gene name (e.g. SOK2_SACCE). Sequences that are presumed to be functionally equivalent orthologues of Mbp1 were identified through the Reciprocal Best Match (RBM) criterion and labelled as MBP1_NNNNN. All other sequences were named APS1_, APS2_, APS3_ ... as required (e.g. APS1_NEUCR). There is no further significance in the numbers, i.e. APS1_NEUCR is not necessarily an RBM to APS1_SCHPO. Note that such relabelling of sequences does not change the data or its interpretation; it merely makes the tree easier to read.
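
The naming rules can be summarized in a few lines of code. This is a hedged sketch only: the record layout, the function `label_sequences`, and the identifiers starting with `HYPO_` are invented for the illustration. The sketch numbers the APS labels sequentially per species, so it will not necessarily reproduce the numbers used on this page.

```python
from collections import defaultdict


def label_sequences(records):
    """Assign labels of the form PREFIX_SPECIES.

    records: list of (identifier, species_code, gene_name_or_None, is_mbp1_rbm)
    Rules: a known gene name wins; otherwise RBMs of Mbp1 become MBP1_;
    everything else is numbered APS1_, APS2_, ... per species (an assumption).
    """
    counters = defaultdict(int)
    labels = {}
    for identifier, species, gene, is_rbm in records:
        if gene:
            prefix = gene.upper()
        elif is_rbm:
            prefix = "MBP1"
        else:
            counters[species] += 1
            prefix = f"APS{counters[species]}"
        labels[identifier] = f"{prefix}_{species}"
    return labels


records = [
    ("NP_010227", "SACCE", "Mbp1", True),   # yeast gene name is known
    ("NP_013729", "SACCE", "Sok2", False),  # yeast gene name is known
    ("XP_762343", "USTMA", None, True),     # RBM to yeast Mbp1
    ("HYPO_0001", "USTMA", None, False),    # hypothetical identifier
    ("HYPO_0002", "USTMA", None, False),    # hypothetical identifier
]

for identifier, label in label_sequences(records).items():
    print(identifier, label)
```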

The final 27 APSES domain reference sequences

>KILA_ESCCO ZP_07189117 KilA-N domain protein
IDGEIIHLRAKDGYINATSMCRTAGKLLSDYTRLKTTQEFFDELSRDMGIPISELIQSFKGGRPENQGTW
VHPDIAINLAQ

>MBP1_SACCE NP_010227 Mbp1
IHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAE
KFSVY

>MBP1_USTMA XP_762343 UM06196
IINNVAVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREIQKGIHEKVQGGYGKYQGTWIPLDVAIELAE
RYNI

>MBP1_NEUCR XP_955821 NCU07246
VMRRRHDDWVNATHILKAAGFDKPARTRILEREVQKDTHEKIQGGYGRYQGTWIPLEQAEALARRNNIY

>MBP1_ASPNI XP_660758 AN3154
IGTDSVMRRRSDDWINATHILKVAGFDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLQEGRQLAER
NNI

>MBP1_SCHPO NP_593032 MBF transcription factor complex subunit Res2
IKGVSVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQGGYGKYQGTWVPFQRGVDLATK
YKV

>MBP1_CANAL XP_723071 potential DNA binding component of MBF
VTSEGPIMRRKKDSWINATHILKIAKFPKAKRTRILEKDVQTGIHEKVQGGYGKYQGTYVPLDLGAAIAR
NFGVY

>APS1_NEUCR XP_962967 NCU07587
VNNVAVMRRQKDGWVNATQILKVANIDKGRRTKILEKEIQIGEHEKVQGGYGKYQGTWIPFERGLEVCRQ
YGV

>APS1_CANAL XP_712970 potential DNA binding component of SBF
MMNESSIMRRCKDDWVNATQILKCCNFPKAKRTKILEKGVQQGLHEKVQGGFGRFQGTWIPLEDARKLAK
TYGV

>APS1_SCHPO NP_595496 MBF transcription factor complex subunit Res1
INGFPLMKRCHDNWLNATQILKIAELDKPRRTRILEKFAQKGLHEKIQGGCGKYQGTWVPSERAVELAHE
YNVF

>APS2_ASPNI XP_664319 hypothetical protein AN6715
VNGVAVMKRRSDGWLNATQILKVAGVVKARRTKTLEKEIAAGEHEKVQGGYGKYQGTWVNYQRGVELCRE
YHV

>APS2_USTMA XP_761485 UM05338
VRGIAVMRRRGDGWLNATQILKIAGIEKTRRTKILEKSILTGEHEKIQGGYGKFQGTWIPLQRAQQVAAE
YNV

>SWI4_SACCE NP_011036 Swi4p
TKIVMRRTKDDWINITQVFKIAQFSKTKRTKILEKESNDMQHEKVQGGYGRFQGTWIPLDSAKFLVNKYE
I

>APS3_SCHPO NP_596132 MBF transcription factor complex subunit Cdc10
GDNVALRRCPDSYFNISQILRLAGTSSSENAKELDDIIESGDYENVDSKHPQIDGVWVPYDRAISIAKR
YGVY

>APS3_CANAL XP_714237 potential DNA binding regulator of filamentous growth
NNVSVVRRADNNMINGTKLLNVAQMTRGRRDGILKSEKVRHVVKIGSMHLKGVWIPFERALAMAQREQI

>SOK2_SACCE NP_013729 Sok2p
NGISVVRRADNDMVNGTKLLNVTKMTRGRRDGILKAEKIRHVVKIGSMHLKGVWIPFERALAIAQREKI

>APS3_ASPNI XP_663440 STUA CELL PATTERN FORMATION-ASSOCIATED PROTEIN
GVCVARREDNGMINGTKLLNVAGMTRGRRDGILKSEKVRNVVKIGPMHLKGVWIPFDRALEFANKEKI

>PHD1_SACCE NP_012881 Phd1p
NGISVVRRADNNMINGTKLLNVTKMTRGRRDGILRSEKVREVVKIGSMHLKGVWIPFERAYILAQREQI

>APS4_CANAL XP_710918 CaO19.5210
LNNHWVIWDYETGWVHLTGIWKASLTIDGSNVSPSHLKADIVKLLESTPKEYQQYIKRIRGGFLKIQGTW
LPYKLCKILARRFCYY

>APS3_NEUCR XP_960837 NCU01414
GICVARREDNAMINGTKLLNVAGMTRGRRDGILKSEKVRHVVKIGPMHLKGVWIPFERALDFANKEKI

>APS5_CANAL XP_711513 potential DNA binding protein
NILVSRREDTNYINGTKLLNVIGMTRGKRDGILKTEKIKNVVKVGSMNLKGVWIPFDRAYEIARNEGV

>APS4_ASPNI XP_663009 AN5405
TVMWDYNIGLVRTTHLFKCNDYSKTTPAKMLNQNPGLRDICHSITGGALAAQGYWMPYEAAKAIAATFC

>APS3_USTMA XP_760925 UM04778
VRGHTMMIDVDTSFVRFTSITQALGKNKVNFGRLVKTCPALDPHITKLKGGYLSIQGTWLPFDLAKELSR
R

>APS4_SCHPO NP_596166
HFLMRMAKDSSISATSMFRSAFPKATQEEEDLEMRWIRDNLNPIEDKRVAGLWVPPADALALAKDYSM

>APS6_CANAL XP_723412 potential transcriptional co-activator
HGEIIVLRRVQDSFVNVTQLFQILIKLEVLPTSQVDNYFDNEILSNLKYFGSSSNTPQYLDLRKHQNIYL
QGIWIPYDKAVNLALKFDIY

>APS4_NEUCR XP_962267 NCU06560
FLMRRSQDGYISATGMFKATFPYASQEEEEAERKYIKSIPTTSSEETAGNVWIPPEQALILAEEYQI

>APS5_ASPNI XP_657766 AN0162
TYFLMRRSKDGYVSATGMFKIAFPWAKLEEERSEREYLKTRPETSEDEIAGNVWISPVLALELAAEYKMY


Mbp1 orthologue reference alignment

This is a reference alignment of the APSES domains of those proteins that fulfilled the Reciprocal Best Match criterion with yeast Mbp1.

CLUSTAL format alignment by MAFFT L-INS-1 (v6.850b)


MBP1_SACCE      IHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVY
MBP1_CANAL      VTSEGPIMRRKKDSWINATHILKIAKFPKAKRTRILEKDVQTGIHEKVQGGYGKYQGTYVPLDLGAAIARNFGVY
MBP1_USTMA      IINNVAVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREIQKGIHEKVQGGYGKYQGTWIPLDVAIELAERYNI-
MBP1_NEUCR      ------VMRRRHDDWVNATHILKAAGFDKPARTRILEREVQKDTHEKIQGGYGRYQGTWIPLEQAEALARRNNIY
MBP1_ASPNI      -IGTDSVMRRRSDDWINATHILKVAGFDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLQEGRQLAERNNI-
MBP1_SCHPO      -IKGVSVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQGGYGKYQGTWVPFQRGVDLATKYKV-
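
To get a quick impression of how strongly the domain is conserved among the Mbp1 orthologues, the alignment above can be scanned column by column. The following sketch is an illustration rather than part of the original analysis: it reports the columns that contain no gap and the same residue in all six sequences. The function name `conserved_columns` is made up.

```python
# Rows copied from the CLUSTAL alignment above.
alignment = {
    "MBP1_SACCE": "IHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVY",
    "MBP1_CANAL": "VTSEGPIMRRKKDSWINATHILKIAKFPKAKRTRILEKDVQTGIHEKVQGGYGKYQGTYVPLDLGAAIARNFGVY",
    "MBP1_USTMA": "IINNVAVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREIQKGIHEKVQGGYGKYQGTWIPLDVAIELAERYNI-",
    "MBP1_NEUCR": "------VMRRRHDDWVNATHILKAAGFDKPARTRILEREVQKDTHEKIQGGYGRYQGTWIPLEQAEALARRNNIY",
    "MBP1_ASPNI": "-IGTDSVMRRRSDDWINATHILKVAGFDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLQEGRQLAERNNI-",
    "MBP1_SCHPO": "-IKGVSVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQGGYGKYQGTWVPFQRGVDLATKYKV-",
}


def conserved_columns(rows):
    """Return 0-based indices of gap-free columns in which all rows share one residue."""
    return [
        i for i, column in enumerate(zip(*rows))
        if "-" not in column and len(set(column)) == 1
    ]


rows = list(alignment.values())

# All rows of an alignment must have the same length.
assert len({len(row) for row in rows}) == 1

columns = conserved_columns(rows)
print(len(columns), "of", len(rows[0]), "columns are identical in all sequences")
print("".join(rows[0][i] if i in columns else "." for i in range(len(rows[0]))))
```

The second line of output shows the invariant residues in place, with a dot for every variable or gapped column.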

Sample Phylip format

Here is a sample set of the APSES domain sequences to illustrate the PHYLIP format. Sequences were aligned with MAFFT and edited in Jalview to remove gapped regions and frayed termini. The FASTA sequences were converted with the Readseq server.

 27 78
KILA_ESCCO   DGEIIHLRAK DGYINATSMC RT-A-GKLLS DYTRLKLSRD M-GIPIS-IQ
MBP1_SACCE   STGSIMKRKK DDWVNATHIL KA-A-NFAKA KRTRI-LEKE V-LKETH--E
MBP1_USTMA   NNVAVMRRRS DDWLNATQIL KV-V-GLDKP QRTRV-LERE I-QKGIH--E
MBP1_NEUCR   ----VMRRRH DDWVNATHIL KA-A-GFDKP ARTRI-LERE V-QKDTH--E
MBP1_ASPNI   GTDSVMRRRS DDWINATHIL KV-A-GFDKP ARTRI-LERE V-QKGVH--E
MBP1_SCHPO   KGVSVMRRRR DSWLNATQIL KV-A-DFDKP QRTRV-LERQ V-QIGAH--E
MBP1_CANAL   SEGPIMRRKK DSWINATHIL KI-A-KFPKA KRTRI-LEKD V-QTGIH--E
APS1_NEUCR   NNVAVMRRQK DGWVNATQIL KV-A-NIDKG RRTKI-LEKE I-QIGEH--E
APS1_CANAL   NESSIMRRCK DDWVNATQIL KC-C-NFPKA KRTKI-LEKG V-QQGLH--E
APS1_SCHPO   NGFPLMKRCH DNWLNATQIL KI-A-ELDKP RRTRI-LEKF A-QKGLH--E
APS2_ASPNI   NGVAVMKRRS DGWLNATQIL KV-A-GVVKA RRTKT-LEKE I-AAGEH--E
APS2_USTMA   RGIAVMRRRG DGWLNATQIL KI-A-GIEKT RRTKI-LEKS I-LTGEH--E
SWI4_SACCE   -TKIVMRRTK DDWINITQVF KI-A-QFSKT KRTKI-LEKE S-NDMQH--E
APS3_SCHPO   GDNVALRRCP DSYFNISQIL RL-A-GTSSS ENAKE-LDDI I-ESGDY--E
APS3_CANAL   NNVSVVRRAD NNMINGTKLL NV-A-QMTRG RRDGI-LKSE ----KVR--H
SOK2_SACCE   NGISVVRRAD NDMVNGTKLL NV-T-KMTRG RRDGI-LKAE ----KIR--H
APS3_ASPNI   -GVCVARRED NGMINGTKLL NV-A-GMTRG RRDGI-LKSE ----KVR--N
PHD1_SACCE   NGISVVRRAD NNMINGTKLL NV-T-KMTRG RRDGI-LRSE ----KVR--E
APS4_CANAL   NNHWVIWDYE TGWVHLTGIW KA-SLSHLKA DIVKL-LEST PKEYQQY-IK
APS3_NEUCR   -GICVARRED NAMINGTKLL NV-A-GMTRG RRDGI-LKSE ----KVR--H
APS5_CANAL   -NILVSRRED TNYINGTKLL NV-I-GMTRG KRDGI-LKTE ----KIK--N
APS4_ASPNI   ---TVMWDYN IGLVRTTHLF KC-N-DYSKT TPAKM-LNQN PGLRDIC--H
APS3_USTMA   RGHTMMIDVD TSFVRFTSIT QA-L-GKNKV NFGRL-VKTC P-ALDPH-IT
APS4_SCHPO   --HFLMRMAK DSSISATSMF RS-A-FPKAT QEEED-LEMR WIRDNLN---
APS6_CANAL   GEIIVLRRVQ DSFVNVTQLF QILE-VLPTS QVDNY-FDNE I-LSNLKYLR
APS4_NEUCR   ---FLMRRSQ DGYISATGMF KA-T-FPYAS QEEEE-AERK YIKSIPT---
APS5_ASPNI   -TYFLMRRSK DGYVSATGMF KI-A-FPWAK LEEER-SERE YLKTRPE---

             SFKGGRPENQ GTWVHPDIAI NLAQ----
             KVQGGFGKYQ GTWVPLNIAK QLAEKFSV
             KVQGGYGKYQ GTWIPLDVAI ELAERYNI
             KIQGGYGRYQ GTWIPLEQAE ALARRNNI
             KVQGGYGKYQ GTWIPLQEGR QLAERNNI
             KVQGGYGKYQ GTWVPFQRGV DLATKYKV
             KVQGGYGKYQ GTYVPLDLGA AIARNFGV
             KVQGGYGKYQ GTWIPFERGL EVCRQYGV
             KVQGGFGRFQ GTWIPLEDAR KLAKTYGV
             KIQGGCGKYQ GTWVPSERAV ELAHEYNV
             KVQGGYGKYQ GTWVNYQRGV ELCREYHV
             KIQGGYGKFQ GTWIPLQRAQ QVAAEYNV
             KVQGGYGRFQ GTWIPLDSAK FLVNKYEI
             NVDSKHPQID GVWVPYDRAI SIAKRYGV
             VVKIGSMHLK GVWIPFERAL AMAQREQI
             VVKIGSMHLK GVWIPFERAL AIAQREKI
             VVKIGPMHLK GVWIPFDRAL EFANKEKI
             VVKIGSMHLK GVWIPFERAY ILAQREQI
             RIRGGFLKIQ GTWLPYKLCK ILARRFCY
             VVKIGPMHLK GVWIPFERAL DFANKEKI
             VVKVGSMNLK GVWIPFDRAY EIARNEGV
             SITGGALAAQ GYWMPYEAAK AIAATFC-
             KLKGGYLSIQ GTWLPFDLAK ELSRR---
             --PIEDKRVA GLWVPPADAL ALAKDYSM
             KHQNIY--LQ GIWIPYDKAV NLALKFDI
             --TSSEETAG NVWIPPEQAL ILAEEYQI