CGI

From "A B C"


The contents of this page have recently been imported from an older version of this Wiki. This page may contain outdated information, information that is irrelevant for this Wiki, information that needs to be structured differently, outdated syntax, and/or broken links. Use with caution!


CGI, the Common Gateway Interface, allows Web pages to interact with server-side programs.



The Common Gateway Interface (CGI) is a standard for connecting external applications (programs) with information servers, such as Web servers. A plain HTML document is static: it is a text file that doesn't change unless it is replaced. But often we want dynamic content: content that changes according to the needs of the client. Client-side execution of code can be achieved with scripts that the browser supports, such as JavaScript. However, access to your own databases must typically happen on the server. A CGI script is executed on the server; it can serve dynamic information and can access any resources on your laboratory server that you choose to give it. Essentially, instead of serving a Web-page file as it normally would, the Web server invokes the program and returns the program's output in response to the client's request. This makes building dynamic Web pages as easy as putting an executable script into the right directory. Many programming languages can be used to write programs that communicate through CGI.
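The division of labour between server and program can be made concrete with a small sketch. The script below plays both roles: `run_cgi` (a hypothetical helper, not part of Apache or of any Perl module) does in miniature what the Web server does, putting request data into the environment, running the program and capturing its standard output; the embedded toy program does what any CGI program does, reading its input from the environment and writing a header, a blank line and a body. It assumes a Unix-like system where Perl's list-form pipe `open` is available.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A toy CGI program, run as a separate process: it reads its input from
# environment variables and writes header, blank line, body to STDOUT.
my $toy_program = <<'END_PROGRAM';
my $method = defined $ENV{REQUEST_METHOD} ? $ENV{REQUEST_METHOD} : '';
my $query  = defined $ENV{QUERY_STRING}   ? $ENV{QUERY_STRING}   : '';
print "Content-type: text/plain\n\n";
print "Method: $method\n";
print "Query: $query\n";
END_PROGRAM

# Hypothetical helper: roughly what the Web server does for each request.
sub run_cgi {
    my ($code, %request) = @_;
    # Request data travels to the program in its environment.
    local @ENV{ keys %request } = values %request;
    # Run the program with the same perl interpreter ($^X); read its STDOUT.
    open(my $out, '-|', $^X, '-e', $code) or die "Cannot run program: $!";
    my $raw = do { local $/; <$out> };
    close($out);
    # The first blank line separates the header from the payload.
    my ($header, $body) = split /\r?\n\r?\n/, $raw, 2;
    return ($header, $body);
}

my ($header, $body) = run_cgi($toy_program,
    REQUEST_METHOD => 'GET',
    QUERY_STRING   => 'name=Occam',
);
print "Header: $header\n";
print "Body:\n$body";
```

The real server adds many more variables (the ones printed by the environment script further down this page), but the contract is the same: input in the environment, output on STDOUT.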

Apache and CGI

Apache needs to be configured to permit CGI execution. In a default installation, the ScriptAlias directive in httpd.conf maps a URL path to a directory whose contents Apache treats as CGI programs: it attempts to execute them rather than serve them as files.

$  grep " ScriptAlias " /usr/local/apache/conf/httpd.conf
   ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/"
$

Thus in order to execute a script ...

  • Apache will look for the requested program in the cgi-bin directory
  • the program must exist
  • the program must be executable by Apache
  • the output will be returned to the requesting Web client.
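The conditions in this list can be checked mechanically. The sketch below is a hypothetical pre-flight helper (not an Apache tool): it maps a request path under `/cgi-bin/` to a file in the ScriptAlias directory from the `grep` output above and reports which conditions fail; adjust both paths to your installation. Because Perl's `-x` test only reports on the user running the check, the helper inspects the permission bits for "others" instead, assuming Apache runs as an unprivileged user who is neither the owner of the file nor in its group.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Values taken from the ScriptAlias line above; adjust for your server.
my $URL_PREFIX = '/cgi-bin/';
my $SCRIPT_DIR = '/usr/local/apache/cgi-bin/';

# Hypothetical helper: map a URL path to a file and list the problems
# that would stop Apache from executing it.
sub check_cgi {
    my ($url_path, $url_prefix, $script_dir) = @_;
    return ("$url_path is not under $url_prefix")
        if index($url_path, $url_prefix) != 0;
    my $file = $script_dir . substr($url_path, length $url_prefix);
    return ("$file does not exist") unless -e $file;
    my @problems;
    push @problems, "$file is not a plain file" unless -f $file;
    my $mode = (stat $file)[2] & 07777;
    # A script must be readable (by the interpreter) as well as executable.
    push @problems, "$file is not readable by others"   unless $mode & 0004;
    push @problems, "$file is not executable by others" unless $mode & 0001;
    return @problems;
}

for my $url_path (@ARGV ? @ARGV : ('/cgi-bin/testscript.pl')) {
    my @problems = check_cgi($url_path, $URL_PREFIX, $SCRIPT_DIR);
    if (@problems) { print "$_\n" for @problems }
    else           { print "$url_path looks runnable\n" }
}
```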
More detail

In a default installation of Apache, if you type the name of a file outside the ScriptAlias directory into a URL, the file will be served, not executed, even if it is an executable file. To actually run it as a script, you have to configure Apache.

  • Navigate to the Apache configuration directory. For a default installation on Mac/UNIX, type

$ cd /usr/local/apache2/conf

On Ubuntu/Debian-style Linux, type

$ cd /etc/apache2

  • Make a backup copy of the file httpd.conf

$ sudo cp httpd.conf httpd.conf.01

  • Note: on Ubuntu/Debian systems the httpd.conf file in /etc/apache2/ is blank. You can add configuration settings there, but they apply globally. On these systems Apache's configuration is organized in apache2.conf, where settings are grouped by directory, with separate sections for the global environment, the main server, and virtual hosts. Handlers or directives added to httpd.conf may duplicate parameters in apache2.conf, which causes warnings when Apache starts or restarts. To address the warning "Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 for ServerName", add the line "ServerName (yourservername)" to httpd.conf or to the appropriate section of apache2.conf.

  • Open the original httpd.conf for editing. (You will need the appropriate privileges to do that.)

$ sudo pico httpd.conf

  • First, you need to allow execution of files. Apache allows this on a per-directory basis. Some administrators like to keep all executable files in a single directory called cgi-bin. Here, we simply allow execution of files anywhere in the Web-root directory. Look for the section that defines the Directory options for the Web-root directory. For a default installation this section begins with ...

<Directory "/usr/local/apache2/htdocs">

... and below this, there is a line that defines the allowed options. Add the option ExecCGI. In a default installation the line would then read

   Options Indexes FollowSymLinks ExecCGI

  • Note: on Ubuntu/Debian, CGI can be enabled for all users in the localhost (/var/www/) directory by adding "Options Indexes FollowSymLinks ExecCGI" to the section for that directory (/var/www/) in /etc/apache2/sites-available/default.


  • You then need to find the section that discusses the "AddHandler" directive and edit it to tell Apache to execute Perl scripts as CGI scripts. Un-comment the line that reads ...

# AddHandler cgi-script .cgi

... and change it to read .pl instead of .cgi:

AddHandler cgi-script .pl

  • Save the modified httpd.conf and exit the editor.
  • Confirm that you did not inadvertently change the owner or the permissions of the file.
  • Restart apache to use the new configuration

$ sudo /usr/local/apache2/bin/apachectl restart


Read more about details and alternatives, troubleshooting, and security issues in the Apache documentation!




The simplest example

As long as the output of the program looks like an HTML page, all is well. There is one subtle and important difference, however: while Apache will add the correct MIME type header to an .html file, based on its extension, it has no way of knowing what the output from your CGI program will be. HTML? Plain text? An audio clip? A Flash game? Thus you have to provide the correct header as the first line of your program output, followed by a blank line.
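If the payload is not HTML, only the header line changes. Here is a minimal sketch of a program that declares its output as plain text, so the browser displays it verbatim instead of rendering tags; `cgi_header` is a hypothetical helper name, not part of any module. It would be installed and run the same way as the HTML example that follows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: a CGI header block is the header line
# followed by one empty line.
sub cgi_header {
    my ($mime_type) = @_;
    return "Content-type: $mime_type\n\n";
}

print cgi_header('text/plain');
print "<h2> is printed literally here, because this is plain text.\n";
```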

$ cd /usr/local/apache/cgi-bin
$ sudo pico testscript.pl

Enter the following code, then save.

#!/usr/bin/perl -w
use strict;

print "Content-type: text/html\n";   # MIME header
print "\n";                          # Blank line: payload begins here
print "<html>\n";
print "<head>\n";
print "<title>Occam's Razor</title>\n";
print "</head>\n";
print "<body>\n";
print "<h2>Entia non sunt multiplicanda sine necessitate</h2>\n";
print "</body>\n";
print "</html>\n";

exit;

Then:

$ sudo chown root testscript.pl
$ sudo chmod 755 testscript.pl
$ ls -l testscript.pl
-rwxr-xr-x   1 root  wheel  368 Mar 22 18:31 testscript.pl
$

The program does not have to be owned by root, but this way nobody without root privileges can change it (after all, you are executing code on behalf of a complete stranger somewhere on the Web).
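The ownership argument can be checked as well. The sketch below uses a hypothetical helper, `tamper_risks`, built only on Perl's `stat`; it lists the reasons why someone other than root could modify a script that the Web server runs. It does not inspect the enclosing directory, although a directory that others can write to would also let them replace the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: list reasons why someone other than root could
# modify the script that the Web server will execute.
sub tamper_risks {
    my ($file) = @_;
    my ($mode, $uid) = (stat $file)[2, 4];
    die "Cannot stat $file: $!" unless defined $mode;
    my @risks;
    push @risks, 'not owned by root'     if $uid != 0;
    push @risks, 'writable by its group' if $mode & 0020;
    push @risks, 'writable by others'    if $mode & 0002;
    return @risks;
}

# Usage: perl tamper_risks.pl /usr/local/apache/cgi-bin/testscript.pl
for my $file (@ARGV) {
    my @risks = tamper_risks($file);
    print "$file: ", (@risks ? join(', ', @risks) : 'ok'), "\n";
}
```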

Now open your Web browser and type the following into the address field:

http://localhost/cgi-bin/testscript.pl

Voilà. The program executes and sends the result to the client. All it takes is a request for this script from the Web server. Of course, the request could just as well come from a clickable link in an HTML page, as in this example:

<a href="http://localhost/cgi-bin/testscript.pl">Click here</a>

Generating static content like this is perhaps not very useful, since we could have achieved the same by simply retrieving an HTML file; no program code needs to be involved. Here is an example of something actually happening.


A minimal CGI example

  1. a URL is formatted to request an executable program
  2. the Web server executes the program
  3. the program runs and sends data to STDOUT
  4. the Web server accepts this data and sends it to the requesting browser
Note
Since this requires an active role of the remote Web server, this will not work by simply opening an HTML file in your browser locally. You actually have to run a Web server.

Code example

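#!/usr/bin/perl
# serverTime.pl - CGI example script.
# Boris Steipe

use strict;
use warnings;

my $time = `date`;    # run the system command "date" and capture its output

print "Content-type: text/html\n\n";    # MIME header, then blank line
print "<html>\n";
print "<head>\n";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"10\">\n";    # reload every 10 s
print "<title>Current Server Time</title>\n";
print "</head>\n";
print "<body>\n";
print "<p><b>$time</b></p>\n";
print "</body>\n";
print "</html>\n";

exit();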

Notes on the code

  • date is a Unix command that converts the system's internal clock time (in seconds since the epoch) into formatted output.
  • date is enclosed in "backticks" (`), i.e. reverse slanted apostrophes. This is Perl's operator for running the enclosed string as a system command; whatever the command writes to STDOUT becomes the result of the operation and can be assigned to a variable.
  • Content-type: text/html defines the MIME type of the output that our program is writing. It must be followed by a blank line, therefore \n\n.
  • <META HTTP-EQUIV="Refresh" CONTENT="10"> causes the page to reload itself in 10-second intervals. This is the so called "client-pull" mechanism to refresh contents (as opposed to "server-push").
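Calling the external date command works, but it starts a separate process on every request. A minimal alternative, using only Perl's built-in `localtime` and the core `POSIX::strftime`, formats the time inside Perl; the strftime format string below is one arbitrary choice, and neither output is guaranteed to match date's default layout exactly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Seconds since the epoch, obtained from Perl itself rather than from `date`.
my $now = time();

# Two ways to format it without running an external command:
my $default   = scalar localtime($now);    # e.g. "Thu Mar 22 18:31:05 2012"
my $formatted = strftime('%Y-%m-%d %H:%M:%S %Z', localtime($now));

print "Content-type: text/plain\n\n";
print "$default\n$formatted\n";
```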

How to run this script:

    • Copy this code, edit it at leisure and save it e.g. under the name serverTime.pl.
    • Upload the file to a directory on your server. For example I have a directory called test in my Web-root directory where I can store code I am just playing around with and that I may delete at any time. You might put it into a directory called course, or whatever.
    • chmod the file to be executable. For example chmod 755 /usr/local/apache2/htdocs/test/serverTime.pl
    • Then you can paste the URL into a Web-browser http://localhost/test/serverTime.pl. The local time for your server should now be boldly displayed in the page that is returned.
    • Reload the page. You notice that the time changes. This is dynamically created content.


Recapitulate the process

  1. Your browser has sent a request to the server.
  2. The server has recognized that this is an executable cgi file.
  3. It has executed the script and has captured the output the script sent to STDOUT.
  4. It has sent the output back to the requesting browser.
  5. The browser has parsed the HTML and has formatted the output as a Web page.


Troubleshooting this script

  • The server reports: "File not found".
    • Are you sure the file is in the right place relative to document root? A file with the physical location /usr/local/apache2/htdocs/test/serverTime.pl has the URL http://localhost/test/serverTime.pl.
  • The server reports "Internal server error". This can mean the file could not be executed, or it did not produce a valid output.
    • Has the Directory directive been set as discussed?
    • Does the server (i.e. "others") have read privileges in the directory?
    • Does the server have execute privileges on the file? Set permissions as: chmod 755 filename.
    • Does the script have the extension .pl for which you have issued an AddHandler directive?
    • Does the script run at all? Does it run correctly when you run it from the command line?
    • Does the script output the correct header - Content-type: text/html\n\n - a declaration followed by a blank line?
  • The server prints the source code rather than executing the script. This means the server has treated the file as a text file, rather than a script.
    • Has the AddHandler directive been set in httpd.conf?
    • Did you restart Apache after editing httpd.conf?
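Several of these checks can be done from the command line before Apache is involved at all. The sketch below is a hypothetical helper, not part of any Apache tooling: it runs a script with the current Perl interpreter (ignoring the script's own #! line), captures its standard output, and applies the rule used on this page, namely that the output starts with a Content-type line and that the header ends with a blank line. It assumes a Unix-like system and a script that needs no request data to run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns an error message, or '' if the header looks valid.
sub header_problem {
    my ($output) = @_;
    return 'no output at all' if !defined $output || $output eq '';
    return 'first line is not a Content-type header'
        unless $output =~ /\AContent-type:\s*\S+/i;
    return 'header is not followed by a blank line'
        unless $output =~ /\r?\n\r?\n/;
    return '';
}

# Run a script with this perl and capture everything it writes to STDOUT.
sub run_script {
    my ($script) = @_;
    open(my $out, '-|', $^X, $script) or die "Cannot run $script: $!";
    my $output = do { local $/; <$out> };
    close($out);    # also waits for the script to finish
    return $output;
}

# Usage: perl check_header.pl serverTime.pl env2html.pl
for my $script (@ARGV) {
    my $problem = header_problem(run_script($script));
    print "$script: ", ($problem eq '' ? 'header OK' : $problem), "\n";
}
```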




A dynamic script

In order for client and server to communicate, they have to agree on a number of variables that describe the transaction and set up the process's environment. Here is a program that prints all the environment variables on the server side. This is easy, since they are available to the Perl process in the hash %ENV. Type ...

$ sudo cp testscript.pl env2html.pl
$ sudo pico env2html.pl

then enter the following program code:

#!/usr/bin/perl -w
use strict;

print"Content-type: text/html\n";   # MIME header
print "\n";                         # Blank line: payload begins here
print "<html>\n";
print "<head>\n";
print "<title>Current environment from %ENV</title>\n";
print "</head>\n";
print "<body>\n";
print "<h2>Environment Variables:</h2>\n";
print "<table border=\"1\">\n";
print "    <tr><td><b>Variable</b></td><td><b>Value</b></td></tr>\n";
    foreach my $key (keys %ENV) {
        print "    <tr><td>$key</td><td>$ENV{$key}</td></tr>\n";
    }
print "</table>\n";
print "</body>\n";
print "</html>\n";

exit;

The file (once you save) should still be owned by root, executable and in the cgi-bin directory. Then type

http://localhost/cgi-bin/env2html.pl

into the address field of your browser ...
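Two small changes make this table easier to read and safer to display. In the sketch below, which otherwise produces the same page, sorting the keys gives a stable alphabetical order, and escaping the values stops characters such as < or & (for example in a query string) from being interpreted as HTML. The `escape_html` and `env_rows` helpers are written out by hand to keep the sketch dependency-free; the CGI module used later on this page offers an equivalent escapeHTML function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace the characters that have special meaning in HTML.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;    # must come first
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build the table rows in alphabetical order of the variable names.
sub env_rows {
    my (%env) = @_;
    my $rows = '';
    for my $key (sort keys %env) {
        $rows .= '    <tr><td>' . escape_html($key) . '</td><td>'
               . escape_html($env{$key}) . "</td></tr>\n";
    }
    return $rows;
}

print "Content-type: text/html\n\n";
print "<html>\n<head>\n<title>Current environment from %ENV</title>\n</head>\n<body>\n";
print "<h2>Environment Variables:</h2>\n<table border=\"1\">\n";
print "    <tr><td><b>Variable</b></td><td><b>Value</b></td></tr>\n";
print env_rows(%ENV);
print "</table>\n</body>\n</html>\n";
```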

Next, we want to change the behaviour of the program, depending on input from the client. For this, we need to pass parameters into the program.

Passing parameters: GETting a QUERY_STRING

One of the environment variables is called QUERY_STRING and this constitutes one mechanism to pass parameters into the program: type the following into the browser address field:

http://localhost/cgi-bin/env2html.pl?My Special Data

You should notice two things: first, the QUERY_STRING variable now contains what you typed after the question mark; second, each blank space has been encoded as "%20". This URL encoding is a common mechanism to map arbitrary characters into the set of characters that are valid in Internet URLs. Using this string is simply a matter of retrieving it from %ENV, for example as

my $request = $ENV{'QUERY_STRING'};
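To see what the encoding involves, here is a minimal decoder written by hand purely for illustration. The hypothetical `url_decode` handles only %XX escapes and "+" (which HTML forms use for spaces), and `parse_query` splits name=value pairs on "&"; neither has the error handling of a real parser such as the one in the CGI module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode one URL-encoded component: '+' means space in form data,
# and %XX is a byte given as two hexadecimal digits.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Split a query string of the form name=value&name=value into pairs.
sub parse_query {
    my ($query) = @_;
    my @pairs;
    for my $pair (split /&/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        push @pairs, [ url_decode($name), url_decode(defined $value ? $value : '') ];
    }
    return @pairs;
}

my $request = defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : 'My%20Special%20Data';
print url_decode($request), "\n";
```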

However, this is not advisable: it is far better and safer to use the methods of the CGI module for such tasks. One more note: parameters passed in this way arrive through an HTTP GET request. GET requests are not supposed to modify the server state; repeating a GET request should always result in the same behaviour. However, nothing prevents you from writing CGI code that changes the server state. Imagine something like:

http://localhost/cgi-bin/env2html.pl?command=delete_user&name=Blaise%20Pascal

If a data item is changed in such a way, a second GET request for the same URL will fail. Nonetheless, the Web is full of such examples: for many developers, simplicity trumps correctness.
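The rule can be enforced in the script itself. A minimal sketch, assuming a hypothetical `command` parameter like the one in the URL above and a hypothetical list of state-changing commands: such requests are refused with HTTP status 405 unless they arrive as POST. A CGI program sets the status with a "Status:" header line, which the server turns into the HTTP status line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list of commands that modify data on the server.
my %CHANGES_STATE = map { $_ => 1 } qw(delete_user add_user);

# Decide how to answer; returns the complete header block.
sub response_header {
    my ($method, $command) = @_;
    if ($CHANGES_STATE{$command} && $method ne 'POST') {
        return "Status: 405 Method Not Allowed\nAllow: POST\nContent-type: text/plain\n\n";
    }
    return "Content-type: text/plain\n\n";
}

my $method = defined $ENV{'REQUEST_METHOD'} ? $ENV{'REQUEST_METHOD'} : 'GET';
my $query  = defined $ENV{'QUERY_STRING'}   ? $ENV{'QUERY_STRING'}   : '';
my ($command) = $query =~ /(?:^|&)command=([^&]*)/;
$command = '' unless defined $command;

my $header = response_header($method, $command);
print $header;
if ($header =~ /^Status: 405/) { print "Use POST for '$command'.\n" }
else                           { print "OK: '$command'\n" }
```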

See also: CGI GET on this wiki.

Passing parameters: POSTing a form

A (more correct and) more versatile way to pass parameters is to use HTML forms. For example, type the following into your text editor and save this as /usr/local/apache/htdocs/testform.html

See also: CGI POST on this wiki.

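<html>
<head><title>Form test</title></head>
<body>
<h2>My Form</h2>
<form action="/cgi-bin/env2html.pl" method="post">
<input type="text" name="text" size="40">
<br> Enter some text here. <p>
<input type="submit" value="Submit">
</form>
</body>
</html>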

Open this html file in your browser ...

http://localhost/testform.html

... and type something into the field and submit it. The output doesn't change a whole lot, but you can see that REQUEST_METHOD has changed to POST instead of GET and that CONTENT_TYPE is defined as application/x-www-form-urlencoded. This means we have uploaded some data to the server, but it would take a bit more programming to actually get at it. This is where we start using the CGI module. Change the program to the following, which includes a separate table for POSTed data:

#!/usr/bin/perl -w
use strict;
use CGI;

print"Content-type: text/html\n";   # MIME header
print "\n";                         # Blank line: payload begins here
print "<html>\n";
print "<head>\n";
print "<title>Current environment from %ENV</title>\n";
print "</head>\n";
print "<body>\n";
print "<h2>Environment Variables:</h2>\n";
print "<table border=\"1\">\n";
print "    <tr><td><b>Variable</b></td><td><b>Value</b></td></tr>\n";
    foreach my $key (keys %ENV) {
        print "    <tr><td>$key</td><td>$ENV{$key}</td></tr>\n";
    }    
print "</table>\n";

if (($ENV{'REQUEST_METHOD'} || '') eq 'POST') {   # eq, not ==: this is a string comparison
    print "<p>\n";
    print "<h2>POSTed Contents:</h2>\n";
    print "<table border=\"1\">\n";
    my $form = CGI->new();
    my @fields = $form->param(); # list of all posted parameters
    foreach my $field (@fields) {
        my $value = $form->param($field);
        print "    <tr>";
        print "<td>$field</td>";
        print "<td>$value</td>";
        print "</tr>\n";
    }
    print "</table>\n";
}

print "</body>\n";
print "</html>\n";

exit;

Then type something into your html form and submit it. Once this runs, we have all the elements for dynamic client-server transactions via HTTP in place:

  • accepting input in a form in a Web browser page
  • sending the input as a request to a server
  • accepting the input, invoking a program and passing it the input
  • analysing the input and using it in a program
  • generating output and returning it to the requesting client
  • displaying output in a Web browser

GBrowse does essentially nothing else.
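POSTed data does not arrive in QUERY_STRING: the server passes the form data to the program on standard input and announces its length in CONTENT_LENGTH (and its type in CONTENT_TYPE); this is what the CGI module reads for you. The sketch below is a hypothetical test harness, assuming a Unix-like system and the core IPC::Open2 and File::Temp modules: `post_to_cgi` imitates the server for one POST request so that a form handler can be exercised from the command line, and the demo handler it runs reads the body itself. CONTENT_LENGTH is taken from Perl's `length`, which counts bytes only for plain ASCII bodies like the one here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open2;
use File::Temp qw(tempfile);

# Hypothetical helper: run a CGI script as if a form had been POSTed to it.
sub post_to_cgi {
    my ($script, $body) = @_;
    local @ENV{qw(REQUEST_METHOD CONTENT_TYPE CONTENT_LENGTH)} =
        ('POST', 'application/x-www-form-urlencoded', length $body);
    my $pid = open2(my $from_script, my $to_script, $^X, $script);
    print {$to_script} $body;    # the form data goes to the script's STDIN
    close $to_script;
    my $output = do { local $/; <$from_script> };
    close $from_script;
    waitpid($pid, 0);
    return $output;
}

# A demo handler that reads exactly CONTENT_LENGTH bytes from STDIN.
my ($fh, $demo) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} <<'END_DEMO';
my $body = '';
read(STDIN, $body, $ENV{CONTENT_LENGTH} || 0);
print "Content-type: text/plain\n\n";
print "$ENV{REQUEST_METHOD} received: $body\n";
END_DEMO
close $fh;

print post_to_cgi($demo, 'text=Entia+non+sunt+multiplicanda');
```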
