CGI GET



The contents of this page have recently been imported from an older version of this Wiki. This page may contain outdated information, information that is irrelevant for this Wiki, information that needs to be structured differently, outdated syntax, and/or broken links. Use with caution!


With the GET method of the CGI interface, a query string is passed to a program in the URL that is requested from the server.



GET

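The CGI interface provides the GET method: a simple way to invoke a CGI program by passing data via a "query string" that is attached to the actual URL of the program. Data is separated from the URL by a question mark. The elements of this so-called "query string" are then made available inside the program, e.g. in a Perl script in the special array @ARGV. Note that Web servers pass the query string on the command line only if it contains no unencoded "=" character; the complete string is also available in the environment variable QUERY_STRING.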

  1. A URL is formatted to request an executable script and contains additional input, separated from the script name by a question mark "?". This input is URL-encoded: "+" signs separate elements and reserved characters are replaced by a %-code.
  2. The Web server runs the script and makes the characters following the "?" available via @ARGV.
  3. The program runs, interprets the elements in @ARGV and writes to STDOUT.
  4. The Web server accepts the output and sends it to the requesting browser.
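
The decoding described in the first two steps can be sketched in a few lines of Perl. This is only an illustration of the encoding rules, not what any particular Web server does internally; the sub name decode_query and the example.org URL are made up for this example.

```perl
#!/usr/bin/perl
# Sketch: split a URL at "?" and decode the query string into elements.
# decode_query() is a hypothetical helper, not part of any library.

use strict;
use warnings;

sub decode_query {
    my ($url) = @_;
    # Everything after the first "?" is the query string.
    my (undef, $query) = split(/\?/, $url, 2);
    return () unless defined $query && length $query;
    # "+" separates elements; %XX encodes a reserved character.
    my @elements = split(/\+/, $query);
    foreach my $element (@elements) {
        $element =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    }
    return @elements;
}

my @args = decode_query("http://example.org/cgi-bin/revcomp.pl?atgagt+aaag%2Fgaga");
print join("|", @args), "\n";    # prints: atgagt|aaag/gaga
```

The two elements printed here correspond to what a CGI script would find in $ARGV[0] and $ARGV[1].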

Note, however, that there is a browser-dependent limit to the length of query strings. All browsers must handle 255 character long URLs, but modern browsers handle between 2 and 8 kB. That may not be enough for "real" data. Consider the POST method via "forms" as an alternative for transfer of up to 2 GB.


Code example

#!/usr/bin/perl
# file: revcomp.pl
# passing an argument through the URL query string
# Boris Steipe
       
use strict;
use warnings;

my $input= join(" ", @ARGV);
$input = uc($input);
$input =~ s/[^ACGTUMRWSYKVHDBN ]/X/g;

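# No "T" in the input: translate as RNA, otherwise as DNA.
# Note: "!" binds tighter than "=~", so we test with "!~".
# tr/// takes plain character lists; brackets are not needed.
if ($input !~ m/T/) {
       $input =~ tr/ACGUMRYKVHDB/UGCAKYRMBDHV/; }
else { $input =~ tr/ACGTMRYKVHDB/TGCAKYRMBDHV/; }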

$input = reverse($input);

print "Content-type: text/plain\n\n";
print "$input\n";

exit();


Notes on the code

  • The query string is passed through @ARGV. It may have several elements that were separated by "+", thus we join( ) all elements, separated by a blank space.
  • We use uc( ) to convert the entire string to uppercase.
  • We use s/ / / to replace all characters that are not nucleotide codes with an "X".
  • We use tr/ / / to replace all nucleotide codes with their complement. If there is no "T" in the input, we translate as RNA.
  • The code only returns the requested information, no markup or other structuring tags.
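
To try the translation logic without a Web server, it can be wrapped in a sub and run from the command line. The following is a minimal sketch of that idea; the sub name revcomp is chosen here for illustration and is not part of the original script.

```perl
#!/usr/bin/perl
# Sketch: the reverse-complement logic as a reusable sub.
# revcomp() is a hypothetical name chosen for this illustration.

use strict;
use warnings;

sub revcomp {
    my ($seq) = @_;
    $seq = uc($seq);
    $seq =~ s/[^ACGTUMRWSYKVHDBN ]/X/g;    # flag non-nucleotide characters
    if ($seq !~ m/T/) {                    # no "T": treat as RNA
        $seq =~ tr/ACGUMRYKVHDB/UGCAKYRMBDHV/;
    }
    else {                                 # otherwise treat as DNA
        $seq =~ tr/ACGTMRYKVHDB/TGCAKYRMBDHV/;
    }
    # scalar() forces string reversal even when called in list context.
    return scalar reverse($seq);
}

print revcomp("atgagtaaag"), "\n";    # prints: CTTTACTCAT
print revcomp("augc"), "\n";          # prints: GCAU
```

The explicit scalar( ) matters: in list context, such as the argument list of print, reverse( ) reverses the order of a list rather than the characters of a string.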


A mini-RPC example

Since there is no markup, the output can be used for a remote procedure call - a function that is executed on a remote machine rather than your own. "Real" remote procedure calls are best done through a Perl module such as XML::RPC, but using wget the call can be as simple as in the following example:

#!/usr/bin/perl
# file: miniRPC.pl
# Boris Steipe, 2008

use strict;
use warnings;

my $for ="atgagtaaag+gagaagaact+tttcactgga";
my $rev = `wget -qO - "http://biochemistry.utoronto.ca/steipe/test/revcomp.pl?$for"`;
$for =~ tr/+/ /;    # tr/// takes plain character lists, no brackets needed
chomp($rev);
print "Seq: 5'- $for -3'\n";
print "Rev: 5'- $rev -3'\n";

exit();

You can try this - the script lives on my server.

Notes on the code

  • Note the backticks that execute wget and capture its output.
  • The options passed to wget are: -q (quiet) to suppress its progress output, -O to write to a file and "-" as the filename, to write to STDOUT.
  • Scalar variables such as $for are expanded when the back-ticked string is executed.
  • Since we have separated blocks of 10 nucleotides with a "+" in the query string, we translate them to blanks before printing.
  • The output of revcomp.pl ends with "\n", which wget passes on unchanged, thus the need to remove it with chomp( ).
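
Where wget is not installed, the same call can be sketched with HTTP::Tiny, a module that has shipped with Perl since version 5.14. This is an alternative technique to the backtick approach above, not the method used in miniRPC.pl; the sub name remote_revcomp, the file name miniRPC2.pl and the URL in the usage comment are assumptions for illustration.

```perl
#!/usr/bin/perl
# Sketch: the same call without an external wget, using HTTP::Tiny.
# remote_revcomp() and the URL passed to it are assumptions for illustration.

use strict;
use warnings;
use HTTP::Tiny;

sub remote_revcomp {
    my ($script_url, $query) = @_;
    my $response = HTTP::Tiny->new->get("$script_url?$query");
    # Stop with the HTTP status if the request did not succeed.
    die "Request failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    my $result = $response->{content};
    chomp($result);    # the CGI script ends its output with "\n"
    return $result;
}

# Usage: perl miniRPC2.pl http://your.server/cgi-bin/revcomp.pl
if (@ARGV) {
    my $for = "atgagtaaag+gagaagaact+tttcactgga";
    my $rev = remote_revcomp($ARGV[0], $for);
    $for =~ tr/+/ /;
    print "Seq: 5'- $for -3'\n";
    print "Rev: 5'- $rev -3'\n";
}
```

Unlike the backtick version, this sketch reports a failed request explicitly instead of silently printing an empty result.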

The CGI "environment"

A running Perl program has a number of so-called environment variables automatically available in the hash %ENV, such as its own name, the path to it, the perl version that is running it, etc. CGI adds a few additional variables to the list. We'll use environment variables in a somewhat more verbose version of the code sample above: when no query string is defined, let's create an example string that the user can click on. This emulates the behaviour of many commands that print information on proper usage if they are run without arguments.

We need to capture and use the path and name of the running perl script.

Code example: listing all environment variables

#!/usr/bin/perl
# file: printEnv.pl
# Boris Steipe, 2008

use strict;
use warnings;

print "Content-type: text/plain\n\n";

foreach my $key (sort keys(%ENV)) {
    print "$key = $ENV{$key}\n";
}

exit();

Notes on the code

  • keys( ) returns a list (an array) of all keys that are defined in a hash.
  • sort( ) returns an array in sorted order.
  • foreach $var (@array) { ... } is a shorthand notation to iterate over all elements of an array and assign them to a variable. The equivalent longhand version would be: for ($i=0; $i<scalar(@array); $i++) {$var=$array[$i]; ... }
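
The two loop forms can be compared side by side in a small sketch. The array contents are arbitrary example values.

```perl
#!/usr/bin/perl
# Sketch: foreach and the C-style for loop visiting the same elements.

use strict;
use warnings;

my @array = sort qw(PATH HOME SHELL);
my (@short, @long);

# Shorthand: $var takes each element in turn.
foreach my $var (@array) {
    push(@short, $var);
}

# Longhand: index from 0 up to, but not including, the array length.
for (my $i = 0; $i < scalar(@array); $i++) {
    my $var = $array[$i];
    push(@long, $var);
}

print "foreach: @short\n";    # prints: foreach: HOME PATH SHELL
print "for:     @long\n";     # prints: for:     HOME PATH SHELL
```

One difference is worth knowing: in the foreach form $var is an alias for the array element, so assigning to $var changes the array itself, whereas the longhand form works on a copy.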


If you save this file and run it from the command-line, you will get all the variables that perl has defined for a normally running perl script. If you execute this file by typing its URL in a browser, the list of variables will change, as appropriate for a CGI script. Importantly, there is now a variable called SCRIPT_NAME. We'll use this in the code below.

Code example: a fancier version of revcomp.pl

#!/usr/bin/perl
# file: revcomp2.pl
# evaluating an argument passed through the URL- query string
# and using environment variables
# Boris Steipe, 2008
       
use strict;
use warnings;

my $SampleSeq = "atgagtaaag+gagaagaact+tttcactgga";
# SERVER_NAME is the host name of the server; REMOTE_ADDR would be the client's address.
my $url = "http://" . $ENV{"SERVER_NAME"} . $ENV{"SCRIPT_NAME"} . "?" . $SampleSeq;

print "Content-type: text/html\n\n";
print "<html>\n";
print "<head>\n";
print "<title>Reverse Complement</title>\n";
print "</head>\n";
print "<body>\n";
print "<p>\n";

if (! $ENV{"QUERY_STRING"}) {
   print "Enter a short nucleotide sequence as a query string ...";
   print "<p>\n";
   print "Example<br>\n";
   print "<tt><b><a href=\"$url\">$url</a></b></tt>\n";
}
else {
   my $input= join(" ", @ARGV);
   print"<tt>\n";   

   $input = uc($input);
   $input =~ s/[^ACGTUMRWSYKVHDBN ]/X/g;
   print "For: 5'- $input -3'<br>\n";

   if ($input =~ /T/) {
      $input =~ tr/ACGTUMRYKVHDB/TGCAAKYRMBDHV/;
   }
   else {
      $input =~ tr/ACGUMRYKVHDB/UGCAKYRMBDHV/;
   }
   
   $input = reverse($input);
   
   print "Rev: 5'- $input -3'<br>\n";
   print "</tt>\n";
}
# Close the page in both branches, not only when a query was processed.
print "</body>\n";
print "</html>\n";
exit();

Notes on the code

  • I like to define variables, such as the sample sequence or the URL at the beginning of the script, rather than hard-coding them into the code.
  • if (! $ENV{"QUERY_STRING"}) is TRUE (i.e. ! FALSE) if the environment variable is empty. Remember: an empty string is evaluated as FALSE.
  • We use $ENV{"QUERY_STRING"} to test whether any input was passed, then we process the elements from @ARGV. Both hold the same data, except that @ARGV has the elements of the query separated out.
  • <tt> is HTML markup for "teletype", i.e. courier or courier-like font.
  • <b> is HTML markup for "bold".
  • <a href=\"$url\">$url</a> is HTML markup to create a hyperlink to the complete URL, on the text of the URL itself.
  • Note that the Content-type is not text/plain, as in the previous examples, but text/html.
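
The truth test on the query string has one corner case: Perl also evaluates the string "0" as FALSE, so a query consisting of a single "0" would be treated like an empty one. The sketch below compares the simple test with a stricter one; has_query is a hypothetical helper written for this illustration, not part of revcomp2.pl.

```perl
#!/usr/bin/perl
# Sketch: which QUERY_STRING values count as "no input" under a plain ! test.
# has_query() is a hypothetical helper for this illustration.

use strict;
use warnings;

sub has_query {
    my ($value) = @_;
    # Defined and at least one character long, whatever that character is.
    return (defined $value && length $value) ? 1 : 0;
}

foreach my $value (undef, "", "0", "atg") {
    my $shown  = defined $value ? "'$value'" : "undef";
    my $simple = $value ? 1 : 0;    # the plain truth test on the value
    printf "%-6s simple=%d has_query=%d\n", $shown, $simple, has_query($value);
}
```

For a nucleotide sequence the simple test is sufficient, since "0" is not a valid input anyway.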



   

Further reading and resources