Difference between revisions of "Relational database example 2-Perl interface"

Latest revision as of 12:51, 27 September 2015

Relational database example 3-Perl interface

The contents of this page has recently been imported from an older version of this Wiki. This page may contain outdated information, information that is irrelevant for this Wiki, information that needs to be differently structured, outdated syntax, and/or broken links. Use with caution!

Summary ...

Introductory reading

In this continuation of our worked example, we will use a Perl program to further process the input data. An introduction to the use of the Perl DBI and DBD::mysql modules can be found in the Perl_MySQL_introduction. Recall that our MySQL database of E. coli interactions contains two tables - one for genes and one for interactions - that we parsed out from a tab-delimited file. A number of interactions were discarded - all those that did not have a gene name associated with them. Here we will use Perl code to parse the file, we can do much more elaborate processing in "real" code than in our concise one-line example. Then we will insert the missing information into the database.

Re-parsing the input

This code example opens the input file and looks for records that we discarded earlier. it prints the missing information to the terminal.

<perl>

!/usr/bin/perl

use strict; use warnings;

my $source = "mpidbRaw.txt"; open(IN, $source) or die ("Aack! - $!"); my @a; my @interactions = ( ( ) ); my %genes;

while (my $line = <IN>) {

   $line =~ s/uniprotkb://g;
   @a = split("\t", $line);
   if(! $a[4] || !$a[5]) {
       push (@interactions, [reParse(@a)]); # push (geneA, geneB, weight) tuple
       $genes{$interactions[-1][0]}++;      # get gene A from last row of array
       $genes{$interactions[-1][1]}++;      # get gene B from last row of array
   }

}

print scalar(keys(%genes)), " Genes\n=========\n"; print join("\n", sort(keys(%genes))); print "\n\n", scalar(@interactions), " Interactions\n===============\n";

for (my $i=0; $i<scalar(@interactions); $i++) {

   my @int_i = @{$interactions[$i]};
   print join(", ", @int_i), "\n";

}

exit ();

sub reParse {

   my @a = @_;
   my @i;
   my @b;

   if(! $a[4]) { $i[0] = $a[0]; } else { $i[0] = $a[4]; }
   if(! $a[5]) { $i[1] = $a[1]; } else { $i[1] = $a[5]; }
   @b = split(":",$a[13]);
   $i[2] = scalar(@b)-1;
   
   return(@i);

} </perl>

Inserting into the database

We now have two datastructures that hold information we wish to insert into the database. We need to do the following:

connect to the database
if any of the genes is not yet in the gene table, insert it
if any of the interactions is not yet in the interaction table, insert it

#!/usr/bin/perl
use strict;
use warnings;

my $source = "mpidbRaw.txt"; 
open(IN, $source) or die ("Aack! - $!");
my @a;
my @interactions = ( ( ) );
my %genes;

while (my $line = <IN>) {
    $line =~ s/uniprotkb://g;
    @a = split("\t", $line);
    if(! $a[4] || !$a[5]) {
        push (@interactions, [reParse(@a)]); # push (geneA, geneB, weight) tuple
        $genes{$interactions[-1][0]}++;      # get gene A from last row of array
        $genes{$interactions[-1][1]}++;      # get gene B from last row of array
    }
}

<perl>

connect to the database

my $host = 'localhost'; # MySQL Host my $user = 'ecodev'; # your username my $password = 'Nixo$poil'; # your password my $database_name = 'ecoint'; # name of database

my $dbh = DBI->connect("dbi:mysql:database=$database_name;host=$host", $user, $password)

   or die "Couldn't connect to database: $DBI::errstr\n";

if (!$dbh->do("USE $database_name;")) {

   die "Panic - Database \"$database_name\" appears not to exist. Aborting.\n";

}

if any of the genes is not yet in the gene table, insert it

my $countGenes=0; foreach my $gene (keys(%genes)) {

   my $sqlCommand = "SELECT * FROM gene WHERE Gene_Name='" . $gene . "';";
   
   if (! int($dbh->do($sqlCommand))) {
       $dbh->do("INSERT INTO gene (Gene_Name) VALUES ('" . $gene . "');");
       $countGenes++;
       # last;
   }

} print "Inserted $countGenes new genes into table.\n";

insert each interaction into its table (using the statement handler method)

my $countIntx = 0; my $sth = $dbh->prepare('INSERT INTO inx (int_A,int_B,weight) VALUES(?,?,?);');

for (my $i=0; $i<scalar(@interactions); $i++) {

   my @int_i = @{$interactions[$i]};
   
   my $sqlCommand = "SELECT * FROM inx WHERE int_A='" . $int_i[0] .
                                      "' AND int_B='" . $int_i[1] .
                                    "' AND weight = " . $int_i[2] . ";";
   
       if (! int($dbh->do($sqlCommand))) {
               $sth->execute( $int_i[0], $int_i[1], $int_i[2] )
       or die ( "Error on INSERT of interaction: $DBI::errstr\n" );
       $countIntx++;
       # last;
   }

} print "Inserted $countIntx new interactions into table.\n";

exit(); </perl>

sub reParse {
    my @a = @_;
    my @i;
    my @b;

    if(! $a[4]) { $i[0] = $a[0]; } else { $i[0] = $a[4]; }
    if(! $a[5]) { $i[1] = $a[1]; } else { $i[1] = $a[5]; }
    @b = split(":",$a[13]);
    $i[2] = scalar(@b)-1;
    
    return(@i);
}

@@ Line 15: / Line 15: @@
 <table id="toc" class="toc"><tr><td><div id="toctitle"><span style="font-size:125%; ">Related Pages</span></div>
 <ul>
-<li><span class="toctext">[[Relational database principles]]</span></li>
+<li><span class="toctext">[[Data modelling]]</span></li>
 <li><span class="toctext">[[Relational database example 1]]</span></li>
 <li><span class="toctext">[[Relational database example 3-join tables]]</span></li>
@@ Line 30: / Line 30: @@
 &nbsp;
-In this continuation of our [[Database_example|worked example]], we will use a Perl program to further process the input data. An introduction to the use of the Perl DBI and DBD::mysql modules can be found in the [[Perl_MySQL_introduction]]. Recall that our MySQL database of ''E. coli'' interactions contains two tables - one for genes and one for interactions - that we parsed out from a tab-delimited file. A number of interactions were discarded - all those that did not have a gene name associated with them. Here we will use Perl code to parse the file, we can do much more elaborate processing in "real" code than in our concise one-line example. Then we will insert the missing information into the database.
+In this continuation of our [[Relational database example 1|worked example]], we will use a Perl program to further process the input data. An introduction to the use of the Perl DBI and DBD::mysql modules can be found in the [[Perl_MySQL_introduction]]. Recall that our MySQL database of ''E. coli'' interactions contains two tables - one for genes and one for interactions - that we parsed out from a tab-delimited file. A number of interactions were discarded - all those that did not have a gene name associated with them. Here we will use Perl code to parse the file, we can do much more elaborate processing in "real" code than in our concise one-line example. Then we will insert the missing information into the database.
 === Re-parsing the input ===
@@ Line 186: / Line 186: @@
 Next:
-*[[http://biochemistry.utoronto.ca/steipe/abc/index.php/Relational_database_example_3-join_tables|Creating a new table to fix the datamodel]]
+*[[Relational_database_example_3-join_tables|Creating a new table to fix the datamodel]]

Difference between revisions of "Relational database example 2-Perl interface"

Latest revision as of 12:51, 27 September 2015

Contents

Introductory reading

Re-parsing the input

Inserting into the database

Further reading

Further reading and resources

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Sections

Tools