Writing Dynamic Pages with CGI/Perl
- What is CGI?
- Dynamic Content
- Configuration on the Server-Side
- Running the Program
- The Whole Server-Side Process
- Possible Problems
- Debugging
- Summary
1. WHAT IS CGI?
CGI stands for "Common Gateway Interface" - a term you don't
really need to know. In short, CGI defines how a web server
hands a request (such as the information from an HTML form on a
web page) to a program, and how that program's output gets back
to the web browser. That's simplifying it, but you get the point.
In the broader
sense, however, the term 'CGI' is often used to mean "any
program that runs on a web server and interacts with a web
browser". You may hear someone ask, "Where can I get a CGI
script to handle this form?" or "Use CGI to do what you need".
What they are referring to is a program of some sort that runs
on your web server.
2. DYNAMIC CONTENT
Okay, so a web server spends most of its time answering
requests, loading the HTML page that a user is requesting, and
sending it to them. Nothing too complicated. But this isn't very
exciting, is it? What if I want the user to see something
different every time they load a page? What if I want to ask a
user for information, and save it to a database? What if I want
to display information from a file that may change 5 times a
day? In situations like this, loading a static (non-changing)
page just isn't good enough. We need the web server to run a
program, take some action, and then send a results page back to
the user's web browser. The results page might be different
every time the program is run.
Let's take an example...
You create a web page in your browser that has a form in it,
asking the user's name and email address. There is also a
'Submit' button. When the user presses submit, their information
should be saved in a file on the web server that you can view
later, and they should get a 'Thank You' screen back.
In your original HTML document, you will have a <FORM> and some
<INPUT> tags. For example:
<FORM METHOD="POST"
ACTION="http://www.server.com/cgi-bin/program.cgi">
Name: <INPUT NAME="name"><BR>
Email: <INPUT NAME="email"><BR>
<INPUT TYPE="SUBMIT" VALUE="Submit">
</FORM>
This is a basic form.
Webmonkey's HTML Tutorial is good if you need to learn more
about Forms in HTML. If you don't understand forms, read up on
them first. You can't really tackle CGI scripts without
understanding forms.
The <FORM> tag has two parameters that are important for us. The
METHOD parameter defines how the browser will send the information to
the server, and how the web server will send it to your program.
It can either be "POST" or "GET" - you will most often see
"POST". For a full explanation of the difference, you need a
longer tutorial or a book. The other parameter, ACTION, is the
URL of the program on the server that will process the
information sent from the form and do something with it.
When the user hits 'submit' the web browser makes a connection
to the server, requests the URL in the 'ACTION' parameter, and
also sends all the form values that the user entered. The web
server looks at the URL, realizes it is a program rather than a
static file, and runs it. The program then grabs all the data
sent to it, does something, and returns HTML back to the browser
as the response. That's it! That's the basic process that almost
all CGI scripts are going to go through.
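To make that process concrete, here is a minimal sketch of what program.cgi might look like for the name/email form above. It is only an illustration, not the one true way to do it: the data file path (/tmp/signups.txt) and the helper names (url_decode, parse_form, html_escape, handle_request) are made up for this example, and it assumes the form was sent with METHOD="POST", so the data arrives on standard input.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical data file; the web server's user must be able to write to it.
my $DATA_FILE = '/tmp/signups.txt';

# Decode one URL-encoded value: '+' means space, %XX is a hex-encoded byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split "name=Joe&email=joe%40test.com" into a hash of field => value.
sub parse_form {
    my ($raw) = @_;
    my %form;
    for my $pair (split /&/, $raw) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $form{ url_decode($key) } = url_decode(defined $value ? $value : '');
    }
    return %form;
}

# Escape characters that are special in HTML before echoing user input.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Read the POSTed form, save it, and send back a 'Thank You' page.
sub handle_request {
    my $raw = '';
    read(STDIN, $raw, $ENV{CONTENT_LENGTH} || 0);
    my %form = parse_form($raw);
    my $name  = defined $form{name}  ? $form{name}  : '';
    my $email = defined $form{email} ? $form{email} : '';

    open(my $out, '>>', $DATA_FILE) or die "Cannot open $DATA_FILE: $!";
    print {$out} "$name\t$email\n";
    close($out);

    print "Content-type: text/html\n\n";
    print "<HTML><BODY><H1>Thank You, ", html_escape($name),
          "!</H1></BODY></HTML>\n";
}

# The web server sets REQUEST_METHOD; only act on a POSTed form.
handle_request() if ($ENV{REQUEST_METHOD} || '') eq 'POST';
```

The script grabs the data, does something with it (appends a line to a file), and returns HTML. Escaping the name before printing it back keeps a user from injecting their own HTML into the 'Thank You' page.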
3. CONFIGURATION ON THE SERVER SIDE
In order for all this to happen, you need to make sure your
web server is set up to handle this whole thing. By default your
web hosting account is ready to go.
When you (your browser, actually) request a URL from a server,
the server needs to do some checking to find out what to do. How
does the server know if the URL you are requesting is a static
file it should just load and send, or if it's a program it
should run and send to you? This is typically decided by two
factors: Which directory the file is in, and its file extension.
First, let's look at the directory part. If you're reading this,
you've no doubt heard of a 'cgi-bin' directory, and noticed that
most CGI scripts need to be in this directory. Why? Well, this
is a server configuration issue. The server is set up to know
that any file in this directory is a program to run, and not a
static file to send to the browser. Usually, you can't even put
a regular HTML file in this directory, because when the server
tries to load it, it will try to run it as a program rather than
just send it as a file.
Are you curious where the name 'cgi-bin' came from? Well, it
goes back to the original days of the NCSA web server. By
default, this web server had two directories: cgi-src and
cgi-bin. The first contained source code for CGI programs that
could run on the server. The second contained the binaries
(compiled executables) of the programs, which could be run on
the server. Web servers typically don't have the cgi-src
directory anymore, but the name cgi-bin has stuck around as the
'default' place to put executable CGI programs on a web server.
Now let's look at the second factor to determine whether a web
server runs the file or loads it as a static file: the file
extension.
The extension of a file on the server - .html, .cgi, .pl, .txt,
etc - tells the server what kind of file it is and how to handle
it. It knows that .html and .txt files are plain text static
files that should just be sent to the browser, for example. You
can add your own file extensions through the web server's
configuration options, and tell it how to handle those files.
The .cgi extension is one example of an extension that the web
server is configured to recognize as a program it should run.
Okay, now let's take another look at the <FORM> line from our
example above:
<FORM METHOD="POST"
ACTION="http://www.server.com/cgi-bin/program.cgi">
When the web browser sends its request to the ACTION URL, the
web server sees that it is in the cgi-bin directory, and its
extension is .cgi - so it knows that this is a program that it
should run. So it hands off a request to the operating system
telling it to run the program, and also passes all the form data
to this program. Makes perfect sense, doesn't it?
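The decision described above can be sketched in a few lines of Perl. This is only a toy model of the two rules from this section (the function name is_cgi_program is made up, and a real web server makes this decision from its own configuration, which may differ from server to server):

```perl
use strict;
use warnings;

# Return 1 if the URL path would be run as a program under the two rules
# described in this section: it lives in /cgi-bin/ and ends in .cgi.
sub is_cgi_program {
    my ($path) = @_;
    my $in_cgi_bin  = $path =~ m{^/cgi-bin/};
    my $has_cgi_ext = $path =~ m{\.cgi$};
    return ($in_cgi_bin && $has_cgi_ext) ? 1 : 0;
}

print is_cgi_program('/cgi-bin/program.cgi') ? "run it\n" : "send it as a file\n";
```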
4. RUNNING THE PROGRAM
We're now at the point where the web server has decided it
should run the CGI program, and it has made the request to the
operating system to execute the file. This is where a lot of
problems start happening, because there are a lot of things that
need to be exactly correct in order for the program to run
successfully and send the output back to the web browser. Some
of these potential problems are specific to UNIX, and some are
specific to Windows NT (I won't go into other operating systems
because these two are the most common). I'll just go down the
list of things that need to be correct in order for this to
work.
1. The file needs to be executable
In Unix, files have attributes that don't exist in the Windows
NT world. One of these is the executable bit. Each file has a
setting that tells the operating system whether it can be
executed as a program or not, and whether it can be run by only
the file owner, only the group that the file owner is in, or by
everyone on the server. In order for the operating system to run
the file, it needs to be marked as 'executable' by Everyone.
This is what the 'chmod' command does. I won't go into detail
about how chmod works, but when you see an instruction that says
something like 'do a chmod 755 on the program.cgi file', what it
is telling you is to make the file executable by everyone on
your server, so it can be run from the web server. For more
information on permissions, check Setting File Permissions.
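If you would rather set and check the permissions from a Perl script than from the command line, Perl's built-in chmod and stat functions do the same job. The sketch below assumes a Unix server; the helper names make_executable and mode_of are made up for this example.

```perl
use strict;
use warnings;

# Make a script readable and executable by everyone, writable only by its
# owner. Equivalent to running: chmod 755 <file>
sub make_executable {
    my ($file) = @_;
    # The leading 0 makes this an octal number:
    # 7 = read/write/execute (owner), 5 = read/execute (group and everyone else).
    chmod(0755, $file) or die "chmod failed for $file: $!";
    return 1;
}

# Report a file's permission bits in the same octal form chmod uses.
sub mode_of {
    my ($file) = @_;
    my @info = stat($file) or die "Cannot stat $file: $!";
    return sprintf('%04o', $info[2] & 07777);
}
```

For example, after make_executable('program.cgi'), calling mode_of('program.cgi') should report 0755.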
2. The file needs to point to a valid executable
For .cgi files, the server knows to run it as a program, but it
needs to know HOW to run it. If it's a compiled executable,
there's no problem - it just runs it. But if it's a script using
a language like Perl, it needs to know where to find the Perl
program that will run the script. This is the function of the
first line of the file. For example:
#!/usr/local/bin/perl
In Unix, this points to an executable file (in this case, the
program is named 'perl') that will run your script. The first
two characters - #! - are called a shebang, and it's common Unix
syntax. If your script starts with the line above, and your
server doesn't have a program called /usr/local/bin/perl, the
whole thing will die and you'll get an error back. For Perl
scripts, the line above is typical, and most servers have
/usr/local/bin/perl.
But in some rare instances, things are configured differently
and you need to edit this first line to point to a valid program
to run.
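A quick way to check this yourself is to read the first line of the script and see whether the program it names really exists. Here is a small sketch of that check; the function names shebang_interpreter and shebang_is_valid are made up for this example.

```perl
use strict;
use warnings;

# Return the interpreter path named on a script's first line, or undef
# if the file does not start with "#!".
sub shebang_interpreter {
    my ($script) = @_;
    open(my $in, '<', $script) or die "Cannot open $script: $!";
    my $first_line = <$in>;
    close($in);
    return undef unless defined $first_line;
    return $first_line =~ m{^#!\s*(\S+)} ? $1 : undef;
}

# Return 1 if the script names an interpreter that exists and is executable.
sub shebang_is_valid {
    my ($script) = @_;
    my $interpreter = shebang_interpreter($script);
    return (defined $interpreter && -x $interpreter) ? 1 : 0;
}
```

If shebang_is_valid('program.cgi') comes back 0, the first line of the script needs to be edited to point at wherever Perl lives on your server.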
3. The program needs to return a valid response
Any CGI program that runs needs to return a valid response
to the browser. If it encounters a problem while running and
dies, however, it may output an error message instead. If you
were running the program in a normal window on NT or Unix, you
would simply see the error message. In the web world, however,
the program needs to hand its response back to the web server,
which then packages it up to send back to the browser. If the
program outputs an error message, the web server does not get
the response it expects and instead returns a general error (a
500 error, for example) back to the browser saying there was a
problem running the program.
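One way to avoid that general error is to catch problems inside the script, so that it always hands a valid response back to the server. The sketch below wraps the real work in an eval block; safe_response is a made-up name, and the code reference passed to it stands in for whatever your script actually does.

```perl
use strict;
use warnings;

# Run $build_page (a code reference that returns HTML) and always produce
# a valid CGI response, even if the code dies part-way through.
sub safe_response {
    my ($build_page) = @_;
    my $html = eval { $build_page->() };
    if (!defined $html) {
        my $error = $@ || "unknown error\n";
        # Keep the details out of the page; STDERR usually ends up in the
        # web server's error log.
        print STDERR "CGI error: $error";
        $html = "<HTML><BODY><H1>Sorry, something went wrong.</H1></BODY></HTML>\n";
    }
    return "Content-type: text/html\n\n" . $html;
}

# Usage: the page-building code dies, but the browser still gets a page.
print safe_response(sub { die "could not open data file\n" });
```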
5. THE WHOLE SERVER-SIDE PROCESS
Now that you understand how things need to be set up, it's a
good time to step through the whole process and see exactly what
happens when a CGI script is run. Going back to our original
example, here is the sequence of events (assuming a Unix
server):
1. The browser requests the URL in the ACTION parameter, and passes
all the data along with the request
2. The server recognizes that .cgi means that this file should
be run
3. It checks to make sure that CGI programs are allowed to run
on the server
4. It checks to make sure that CGI programs are allowed to run
in the /cgi-bin directory
5. It launches a sub-process to run the program in the operating
system
6. The operating system opens the file and looks at the first
line to see which program to use with the script
7. It runs this program and passes it the filename to run
8. The script runs, does whatever it needs to, and then returns
an HTML response, using print() statements, for example
9. The whole response is passed back to the server which then
packages it up in an HTTP response, including content length,
etc.
10. The server then passes the whole response back to the
browser, which displays it.
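Step 9 is the part you never see, so here is a rough sketch of it in Perl. This is not how any particular web server is written; package_response is a made-up function that only shows the idea: split the script's output into headers and body at the first empty line, add a status line and a content length, and send the result on.

```perl
use strict;
use warnings;

# Turn raw CGI output (headers, empty line, body) into a full HTTP
# response, roughly as the server does in step 9.
sub package_response {
    my ($cgi_output) = @_;
    my ($headers, $body) = split /\r?\n\r?\n/, $cgi_output, 2;

    # Without a Content-type header followed by an empty line, the output
    # is not a valid CGI response, so the browser gets a general error.
    return "HTTP/1.0 500 Internal Server Error\r\n\r\n"
        unless defined $body && $headers =~ /^Content-type:/im;

    $headers =~ s/\r?\n/\r\n/g;    # HTTP header lines end in CR LF
    return "HTTP/1.0 200 OK\r\n"
         . $headers . "\r\n"
         . "Content-Length: " . length($body) . "\r\n"
         . "\r\n"
         . $body;
}

print package_response("Content-type: text/html\n\n<P>Thank You</P>");
```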
6. POSSIBLE PROBLEMS
Of course, that whole process doesn't always go as planned,
and there are some things that can stand in the way of your
program running correctly.
1. File permissions
When the web server launches a sub-process to run the program
(Step 5 above) it does a trick. It changes the User ID of who it
is running as to a user that has very few or no permissions
to do anything on the web server. This is for security purposes
- so you can't write a script that over-writes important files
by accident, or deletes whole directory trees. But this also
creates a problem when your program tries to access files to
read and write. If you want your program to write to a file, you
need to make sure it has permissions set up correctly for this
user (usually a user named 'nobody') to write to it. Once again,
you need to use the 'chmod' command. A command like 'chmod 777
filename.txt' will give Read, Write, and Execute permissions for
the file for anyone on the server machine, so even when the
server changes to the new user it will still have access to the
file.
File permissions are an important thing to remember when trying
to set up someone's CGI script, and are often the cause of it not
working correctly. Make sure to follow instructions on which
file permissions are needed for which files in order to set up
the script correctly.
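When a script cannot write to its file, the operating system tells Perl why, and the reason ends up in the special variable $!. The sketch below (append_line is a made-up helper name) hands that reason back to the caller instead of dying, so a permissions problem shows up as a readable message such as "Permission denied".

```perl
use strict;
use warnings;

# Append one line to a file. Returns an empty string on success, or a
# human-readable reason on failure instead of dying.
sub append_line {
    my ($file, $line) = @_;
    open(my $out, '>>', $file)
        or return "Cannot open $file for writing: $!";
    print {$out} "$line\n";
    close($out)
        or return "Cannot close $file: $!";
    return '';
}
```

A script could call my $problem = append_line('filename.txt', 'some data'); and, if $problem is not empty, write it to STDERR or show a friendly error page.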
2. Content-type
The first thing a CGI script needs to output, assuming it's
giving an HTML response back to the user, is "Content-type:
text/html" followed by two returns (creating an empty line).
This is needed in any CGI script so that the web server knows
what kind of data is being sent back to the browser and can
handle it appropriately. The CGI script could actually return
any type of response it wanted to - it could be plain text, or a
PDF document, or a Microsoft Word file. But 99% of the time, the
result of any CGI script is going to be plain HTML.
If the script runs and it outputs something other than the
Content-type, the web server will return an error message to the
browser saying the script returned an invalid response.
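In Perl, the "two returns" are simply two \n characters after the Content-type line. Here is a small sketch with two made-up helpers: cgi_response builds a response of any type, and starts_with_content_type checks whether some output begins the way the web server expects, which is handy when you run a script by hand and want to see what it printed first.

```perl
use strict;
use warnings;

# Build a complete CGI response: the Content-type line, an empty line,
# then the body.
sub cgi_response {
    my ($content_type, $body) = @_;
    return "Content-type: $content_type\n\n" . $body;
}

# Return 1 if a script's output begins with a Content-type line followed
# by an empty line.
sub starts_with_content_type {
    my ($output) = @_;
    return $output =~ /\AContent-type:[^\n]+\r?\n\r?\n/i ? 1 : 0;
}

# A plain text response instead of HTML.
print cgi_response('text/plain', "Your information was saved.\n");
```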
7. DEBUGGING
Any time you have a script that doesn't work, you need to go
through a series of steps to figure out what is wrong. Most of
the time, you'll be setting up a script that someone else wrote,
so sometimes it can be difficult to figure out what is wrong.
But if you follow a few steps, it should be easier.
1. Make sure CGI scripts are in the right place
If you aren't the Webmaster in charge of the web server, the
very first thing to check is that your scripts are in the right
place. All your files need to be in your account www
folder, and scripts must be in your cgi-bin - no exceptions!
2. Make sure the file is executable
See above. Make sure the file is executable! (Setting File
Permissions)
3. Check shebang line
The first line of your script needs to be #!/usr/local/bin/perl.
4. Ask your Webmaster for help
If you've checked all these things, email your webmaster and see
if they can help you out. They may have some special things
set up, or they might be able to give you some clues. Also, be
sure to tell them you've tried the above steps - they will love
you! Nothing is more aggravating than receiving a request for
help from someone who has apparently done NOTHING to try to help
themselves.
5. Contact the script author for help
If all else fails, contact the script author for help. We
recommend you use this as a last resort - most folks who have
contributed free scripts to web sites explicitly state "no
support provided". Some offer to install their scripts for a
fee, and if you've chosen not to pay it's uncool to ask for free
help!
8. SUMMARY
Hopefully we've gone into enough detail in this tutorial to
help you out. If you want to know more, or really get into the
nitty-gritty of things, check out Sample Scripts and Sources and
learn through trial-and-error on your own account!
An important thing to remember is that this can be a
complicated subject and you shouldn't expect easy answers.
Programming and/or installing CGI scripts is more difficult and
involved than writing simple HTML. If you're trying to make the
jump from one to the other, make sure you've got the desire to
really learn it and the knowledge to make it happen.
Learning CGI programming and making scripts work on your site
can be a very satisfying experience. Hopefully this helped you
along your way, and you'll have much success with it! Good luck!
Helpful Server Paths
| Path to Perl: | #!/usr/local/bin/perl |
| Path to Sendmail: | /usr/lib/sendmail |
| Date command: | /usr/bin/date |
| Absolute path to HTML dir: | /home/username/domainname-www |
(example: with test.com and username joe, the path would be
/home/joe/test-www)