Difference between revisions of "Relational database example 2-Perl interface"
m |
|||
Line 15: | Line 15: | ||
<table id="toc" class="toc"><tr><td><div id="toctitle"><span style="font-size:125%; ">Related Pages</span></div> | <table id="toc" class="toc"><tr><td><div id="toctitle"><span style="font-size:125%; ">Related Pages</span></div> | ||
<ul> | <ul> | ||
− | <li><span class="toctext">[[ | + | <li><span class="toctext">[[Data modelling]]</span></li> |
<li><span class="toctext">[[Relational database example 1]]</span></li> | <li><span class="toctext">[[Relational database example 1]]</span></li> | ||
<li><span class="toctext">[[Relational database example 3-join tables]]</span></li> | <li><span class="toctext">[[Relational database example 3-join tables]]</span></li> | ||
Line 30: | Line 30: | ||
| | ||
− | In this continuation of our [[ | + | In this continuation of our [[Relational database example 1|worked example]], we will use a Perl program to further process the input data. An introduction to the use of the Perl DBI and DBD::mysql modules can be found in the [[Perl_MySQL_introduction]]. Recall that our MySQL database of ''E. coli'' interactions contains two tables - one for genes and one for interactions - that we parsed out from a tab-delimited file. A number of interactions were discarded - all those that did not have a gene name associated with them. Here we will use Perl code to parse the file, we can do much more elaborate processing in "real" code than in our concise one-line example. Then we will insert the missing information into the database. |
=== Re-parsing the input === | === Re-parsing the input === | ||
Line 186: | Line 186: | ||
Next: | Next: | ||
− | *[[ | + | *[[Relational_database_example_3-join_tables|Creating a new table to fix the datamodel]] |
Latest revision as of 12:51, 27 September 2015
Relational database example 3-Perl interface
Summary ...
Contents
Related Pages
|
Introductory reading
In this continuation of our worked example, we will use a Perl program to further process the input data. An introduction to the use of the Perl DBI and DBD::mysql modules can be found in the Perl_MySQL_introduction. Recall that our MySQL database of E. coli interactions contains two tables - one for genes and one for interactions - that we parsed out from a tab-delimited file. A number of interactions were discarded - all those that did not have a gene name associated with them. Here we will use Perl code to parse the file, we can do much more elaborate processing in "real" code than in our concise one-line example. Then we will insert the missing information into the database.
Re-parsing the input
This code example opens the input file and looks for records that we discarded earlier. it prints the missing information to the terminal.
<perl>
- !/usr/bin/perl
use strict; use warnings;
my $source = "mpidbRaw.txt"; open(IN, $source) or die ("Aack! - $!"); my @a; my @interactions = ( ( ) ); my %genes;
while (my $line = <IN>) {
$line =~ s/uniprotkb://g; @a = split("\t", $line); if(! $a[4] || !$a[5]) { push (@interactions, [reParse(@a)]); # push (geneA, geneB, weight) tuple $genes{$interactions[-1][0]}++; # get gene A from last row of array $genes{$interactions[-1][1]}++; # get gene B from last row of array }
}
print scalar(keys(%genes)), " Genes\n=========\n"; print join("\n", sort(keys(%genes))); print "\n\n", scalar(@interactions), " Interactions\n===============\n";
for (my $i=0; $i<scalar(@interactions); $i++) {
my @int_i = @{$interactions[$i]}; print join(", ", @int_i), "\n";
}
exit ();
sub reParse {
my @a = @_; my @i; my @b;
if(! $a[4]) { $i[0] = $a[0]; } else { $i[0] = $a[4]; } if(! $a[5]) { $i[1] = $a[1]; } else { $i[1] = $a[5]; } @b = split(":",$a[13]); $i[2] = scalar(@b)-1; return(@i);
} </perl>
Inserting into the database
We now have two datastructures that hold information we wish to insert into the database. We need to do the following:
- connect to the database
- if any of the genes is not yet in the gene table, insert it
- if any of the interactions is not yet in the interaction table, insert it
#!/usr/bin/perl use strict; use warnings;
<perl>use DBI;</perl>
my $source = "mpidbRaw.txt"; open(IN, $source) or die ("Aack! - $!"); my @a; my @interactions = ( ( ) ); my %genes; while (my $line = <IN>) { $line =~ s/uniprotkb://g; @a = split("\t", $line); if(! $a[4] || !$a[5]) { push (@interactions, [reParse(@a)]); # push (geneA, geneB, weight) tuple $genes{$interactions[-1][0]}++; # get gene A from last row of array $genes{$interactions[-1][1]}++; # get gene B from last row of array } }
<perl>
- connect to the database
my $host = 'localhost'; # MySQL Host my $user = 'ecodev'; # your username my $password = 'Nixo$poil'; # your password my $database_name = 'ecoint'; # name of database
my $dbh = DBI->connect("dbi:mysql:database=$database_name;host=$host", $user, $password)
or die "Couldn't connect to database: $DBI::errstr\n";
if (!$dbh->do("USE $database_name;")) {
die "Panic - Database \"$database_name\" appears not to exist. Aborting.\n";
}
- if any of the genes is not yet in the gene table, insert it
my $countGenes=0; foreach my $gene (keys(%genes)) {
my $sqlCommand = "SELECT * FROM gene WHERE Gene_Name='" . $gene . "';"; if (! int($dbh->do($sqlCommand))) { $dbh->do("INSERT INTO gene (Gene_Name) VALUES ('" . $gene . "');"); $countGenes++; # last; }
} print "Inserted $countGenes new genes into table.\n";
- insert each interaction into its table (using the statement handler method)
my $countIntx = 0; my $sth = $dbh->prepare('INSERT INTO inx (int_A,int_B,weight) VALUES(?,?,?);');
for (my $i=0; $i<scalar(@interactions); $i++) {
my @int_i = @{$interactions[$i]}; my $sqlCommand = "SELECT * FROM inx WHERE int_A='" . $int_i[0] . "' AND int_B='" . $int_i[1] . "' AND weight = " . $int_i[2] . ";"; if (! int($dbh->do($sqlCommand))) { $sth->execute( $int_i[0], $int_i[1], $int_i[2] ) or die ( "Error on INSERT of interaction: $DBI::errstr\n" ); $countIntx++; # last; }
} print "Inserted $countIntx new interactions into table.\n";
exit(); </perl>
sub reParse { my @a = @_; my @i; my @b; if(! $a[4]) { $i[0] = $a[0]; } else { $i[0] = $a[4]; } if(! $a[5]) { $i[1] = $a[1]; } else { $i[1] = $a[5]; } @b = split(":",$a[13]); $i[2] = scalar(@b)-1; return(@i); }
Further reading
Next:
Further reading and resources