Relational database example 2-Perl interface

From "A B C"
Revision as of 16:12, 16 September 2012 by Boris (talk | contribs) (Created page with "<div id="APB"> <div class="b1"> Relational database example 3-Perl interface </div> {{fix}} Summary ... __TOC__ <table id="toc" class="toc"><tr><td><div id="toctitle"><...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Relational database example 3-Perl interface


The contents of this page has recently been imported from an older version of this Wiki. This page may contain outdated information, information that is irrelevant for this Wiki, information that needs to be differently structured, outdated syntax, and/or broken links. Use with caution!


Summary ...


Related Pages


 

Introductory reading



  In this continuation of our worked example, we will use a Perl program to further process the input data. An introduction to the use of the Perl DBI and DBD::mysql modules can be found in the Perl_MySQL_introduction. Recall that our MySQL database of E. coli interactions contains two tables - one for genes and one for interactions - that we parsed out from a tab-delimited file. A number of interactions were discarded - all those that did not have a gene name associated with them. Here we will use Perl code to parse the file, we can do much more elaborate processing in "real" code than in our concise one-line example. Then we will insert the missing information into the database.

Re-parsing the input

This code example opens the input file and looks for records that we discarded earlier. it prints the missing information to the terminal.

<perl>

  1. !/usr/bin/perl

use strict; use warnings;

my $source = "mpidbRaw.txt"; open(IN, $source) or die ("Aack! - $!"); my @a; my @interactions = ( ( ) ); my %genes;

while (my $line = <IN>) {

   $line =~ s/uniprotkb://g;
   @a = split("\t", $line);
   if(! $a[4] || !$a[5]) {
       push (@interactions, [reParse(@a)]); # push (geneA, geneB, weight) tuple
       $genes{$interactions[-1][0]}++;      # get gene A from last row of array
       $genes{$interactions[-1][1]}++;      # get gene B from last row of array
   }

}

print scalar(keys(%genes)), " Genes\n=========\n"; print join("\n", sort(keys(%genes))); print "\n\n", scalar(@interactions), " Interactions\n===============\n";

for (my $i=0; $i<scalar(@interactions); $i++) {

   my @int_i = @{$interactions[$i]};
   print join(", ", @int_i), "\n";

}

exit ();


sub reParse {

   my @a = @_;
   my @i;
   my @b;
   if(! $a[4]) { $i[0] = $a[0]; } else { $i[0] = $a[4]; }
   if(! $a[5]) { $i[1] = $a[1]; } else { $i[1] = $a[5]; }
   @b = split(":",$a[13]);
   $i[2] = scalar(@b)-1;
   
   return(@i);

} </perl>


Inserting into the database

We now have two datastructures that hold information we wish to insert into the database. We need to do the following:

  1. connect to the database
  2. if any of the genes is not yet in the gene table, insert it
  3. if any of the interactions is not yet in the interaction table, insert it
#!/usr/bin/perl
use strict;
use warnings;

<perl>use DBI;</perl>

my $source = "mpidbRaw.txt"; 
open(IN, $source) or die ("Aack! - $!");
my @a;
my @interactions = ( ( ) );
my %genes;

while (my $line = <IN>) {
    $line =~ s/uniprotkb://g;
    @a = split("\t", $line);
    if(! $a[4] || !$a[5]) {
        push (@interactions, [reParse(@a)]); # push (geneA, geneB, weight) tuple
        $genes{$interactions[-1][0]}++;      # get gene A from last row of array
        $genes{$interactions[-1][1]}++;      # get gene B from last row of array
    }
}

<perl>

  1. connect to the database

my $host = 'localhost'; # MySQL Host my $user = 'ecodev'; # your username my $password = 'Nixo$poil'; # your password my $database_name = 'ecoint'; # name of database

my $dbh = DBI->connect("dbi:mysql:database=$database_name;host=$host", $user, $password)

   or die "Couldn't connect to database: $DBI::errstr\n";

if (!$dbh->do("USE $database_name;")) {

   die "Panic - Database \"$database_name\" appears not to exist. Aborting.\n";

}

  1. if any of the genes is not yet in the gene table, insert it

my $countGenes=0; foreach my $gene (keys(%genes)) {

   my $sqlCommand = "SELECT * FROM gene WHERE Gene_Name='" . $gene . "';";
   
   if (! int($dbh->do($sqlCommand))) {
       $dbh->do("INSERT INTO gene (Gene_Name) VALUES ('" . $gene . "');");
       $countGenes++;
       # last;
   }

} print "Inserted $countGenes new genes into table.\n";


  1. insert each interaction into its table (using the statement handler method)

my $countIntx = 0; my $sth = $dbh->prepare('INSERT INTO inx (int_A,int_B,weight) VALUES(?,?,?);');

for (my $i=0; $i<scalar(@interactions); $i++) {

   my @int_i = @{$interactions[$i]};
   
   my $sqlCommand = "SELECT * FROM inx WHERE int_A='" . $int_i[0] .
                                      "' AND int_B='" . $int_i[1] .
                                    "' AND weight = " . $int_i[2] . ";";
   
       if (! int($dbh->do($sqlCommand))) {
               $sth->execute( $int_i[0], $int_i[1], $int_i[2] )
       or die ( "Error on INSERT of interaction: $DBI::errstr\n" );
       $countIntx++;
       # last;
   }

} print "Inserted $countIntx new interactions into table.\n";

exit(); </perl>

sub reParse {
    my @a = @_;
    my @i;
    my @b;

    if(! $a[4]) { $i[0] = $a[0]; } else { $i[0] = $a[4]; }
    if(! $a[5]) { $i[1] = $a[1]; } else { $i[1] = $a[5]; }
    @b = split(":",$a[13]);
    $i[2] = scalar(@b)-1;
    
    return(@i);
}

Further reading

Next:


   

Further reading and resources