WebKit Install Guide

WebKit 0.7
Webware for Python 0.7

Table of Contents

Introduction
Python
Webware Version
Installing Webware
Operating Systems
Web Servers
      Apache
      IIS
      AOLserver
Architecture
Bad Marshal Data
Adapters
      Permissions
      CGI Adapter
            1. Review and fix permissions
            2. Set up WebKit.cgi
            3. Launch the servers
            4. Try it out
      Stopping the App Server
      Reloading Servlets
      OneShot Adapter
      FastCGI Adapter
      Mod_python Adapter
      ModSnake Adapter
      ModWebkit Adapter
      AOLserver Adapter
      Renaming Adapters
Configuration
      Sessions
      Activity Log
      E-mail Errors
Contexts
AppServer Notes
      ThreadedAppServer
      Monitoring
      Launching the AppServer at UNIX boot up
      Running ThreadedAppServer as a Windows NT/2000 Service
      Other Notes
Future
Credit

Introduction

WebKit provides an application server that runs on both UNIX and Windows in "CGI-" or "persistent-" mode, with several configuration options to change its behavior.

This install guide takes you through the installation process while showing you what options are available and how you would use them. There are notes for specific operating systems and web servers preceded with tags such as [UNIX], [WinNT], [IIS], etc.

WebKit is pretty easy to install, but there are some important things to know. You will be doing yourself a favor by reading through this guide from top to bottom (skipping sections that don't apply to you).

We recommend that you get a very simple configuration working first, focusing most on a convenient environment for your development. After that, if you still want to play with more advanced setups, then go for it.

The term deploy means to install your web application/site in a well known location for access by your users. You first develop your web application and then deploy it. You will see deployment specific notes throughout the guide since the issues for these two situations can be different. In a nutshell, you want convenience during development, and security and performance during deployment.

You can always e-mail webware-discuss@lists.sourceforge.net to give feedback, discuss features and get help using WebKit.

Python

WebKit 0.7 was tested with Python 2.0, 2.1.1, and 2.2. If you have any problems whatsoever with Python 2.0 and up, please let us know so we can fix them (webware-discuss@lists.sourceforge.net).

Your installation of Python must be multi-threaded, even if you are using the OneShot adapter (which logically doesn't require threads). It's not uncommon for third party web host providers to leave this disabled in Python because they don't expect that it's needed and/or because the default installation of Python does not turn threads on. If your Python installation is not multi-threaded, you will have to reinstall it. If you're using a third party host provider in this situation, you may be able to install it yourself into your home directory via a telnet or ssh account. See the Python site for instructions.

To determine if threading is enabled, start the Python interpreter from the command line and enter import thread. If you don't get an exception, all is well.
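The same check can be scripted, which is handy on a remote host where you would rather not type into an interactive session. This is only a minimal sketch: the function name threads_available is made up for illustration, and because the thread module was renamed to _thread in Python 3, the snippet tries both names.

```python
def threads_available():
    """Return True if this interpreter was built with thread support."""
    try:
        import thread  # module name on the Python 2 releases this guide targets
    except ImportError:
        try:
            import _thread  # the same module under its Python 3 name
        except ImportError:
            return False
    return True


print("threads enabled: %s" % threads_available())
```

If this reports False, the notes above about reinstalling Python apply.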

Webware Version

Unless you just downloaded Webware (and therefore WebKit), you should check to see if you have the latest version. This version you are looking at is 0.7. You can check for the latest version at http://webware.sourceforge.net.

If you're feeling adventurous, you can get the latest in-development source code from the public repository. Instructions are located at the Webware CVS page. You can find more information about CVS in general at http://www.cvshome.org.

Installing Webware

Webware's main directory contains an install.py script that should always be run first.

Note that the install program doesn't actually copy the Webware files to any separate directory. It copies files within the Webware directory tree, modifies permissions, generates documentation, etc.

Also, you can run the install program as many times as you like with no ill effect, so if you're not sure whether or not it has been run, feel free to do so again. If you're debugging a problem, you can run install.py -v for verbose output.

Operating Systems

WebKit actively supports UNIX, Windows (95/98/NT/2000) and their various flavors.

Note that you don't have to develop and deploy on the same platform. One of the WebKit developers develops everything on Windows 98 2nd edition and deploys his web sites on BSD (http://www.bsd.org/).

What follows are some OS specific notes:

[UNIX] Nothing special to report. Both Linux and BSD have been used with WebKit.

[WinNT] If you are using IIS as your web server, please see the IIS notes below.

[Win9x] Although a lot of the development of both WebKit and various web sites that use it has been done on Windows 9x, we don't recommend that you use this operating system to actually serve your users. UNIX and NT are generally more robust and secure, and therefore more appropriate for deployment.

[Win] Some Windows users are surprised that URLs passed to WebKit are case sensitive. While it's true that the various Windows file systems are case insensitive, the data structures that WebKit uses internally are not. This is on the to do list.

[Win] When editing path names in configuration files, keep in mind that you are editing a Python dictionary populated with Python strings. Therefore your paths must either use double backslashes (\\) (since the backslash is a special character), use forward slashes or use "r-strings". A bad example is 'C:\All\Web\'. Good examples include:
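As a sketch of the three forms just described, each of the following is a valid Python string for the same Windows directory. The directory itself is purely illustrative.

```python
# Three equivalent ways to write one Windows directory as a Python string.
escaped = 'C:\\All\\Web'   # doubled backslashes
forward = 'C:/All/Web'     # forward slashes
raw = r'C:\All\Web'        # raw string: backslashes are taken literally

# Note that even a raw string cannot end in a single backslash, so the
# trailing backslash of the bad example has to be dropped or doubled.
print(escaped == raw)
print(forward.replace('/', '\\') == raw)
```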

Web Servers

WebKit has been tested with both Apache and IIS, so chances are your web server has already been tested. For the most part, WebKit doesn't care what web server you use. The least common denominator is standard CGI, so as long as your web server supports that, you should be fine.

Apache

Apache is the world's most popular web server, which means it's also well tested. The Apache home page is located at http://www.apache.org.

In the simplest case, you can use the WebKit CGI adapters with Apache. For better performance, the other Apache adapters are the new mod_webkit adapter, the FastCGI adapter, the mod_python adapter and the mod_snake adapter.

IIS

Microsoft's Internet Information Server, or IIS, requires some special configuration due to its default non-standard CGI behavior. You have to do two things:

1) For IIS 4.0, configure it to handle PATH_INFO and PATH_TRANSLATED according to the CGI specification, by running the following command:

cscript adsutil.vbs SET W3SVC/1/AllowPathInfoForScriptMappings 1

This is lightly documented in http://support.microsoft.com/support/kb/articles/Q184/3/20.ASP.

2) Use Gordon McMillan's Standalone utility, which is part of his Installer package, to convert WebKit.cgi into an executable program, WebKit.exe. IIS apparently will not handle requests with additional path components after the name of the script, like http://localhost/WebKit.cgi/foo/bar where WebKit.cgi is a script, but will handle requests like http://localhost/WebKit.exe/foo/bar where WebKit.exe is an executable. In any case, using the Standalone utility probably results in a speed increase, so this is probably a good thing to do anyway.

In order to convert WebKit.cgi into WebKit.exe using the Standalone utility, you first have to rename it to WebKit.py, and add import socket to the top of the file (you could use a configuration file instead, but this is easier). After creating the .exe file, delete WebKit.pyc and rename WebKit.py to _WebKit.py. This will avoid conflicts with the .exe.

We hope to provide a native Windows executable CGI program written in C in the near future, as well as a DLL for Apache and an ISAPI module for IIS. If you would like to implement any of these, please do so. The mod_webkit adapter source includes all of the WebKit-specific code you'll need.

AOLserver

The PyWX project aims to meld Python and AOLserver and as part of that, they created an AOLserver "stay resident" WebKit adapter for tighter integration. More information can be found at http://pywx.idyll.org. If you find that this adapter does not work with the latest release of Webware, please notify the PyWX project.

Architecture

The WebKit architecture involves three main entities at the top level:

  1. The web browser
  2. The web server
  3. The app server

The browser will be something like Microsoft Internet Explorer or Netscape Communicator. The web server will be something like Apache or IIS. And finally, the app server will be WebKit, i.e. a Python process running the WebKit modules and your custom servlets.

The chronological order of events goes from top to bottom and then back up to top. In other words, the browser makes a request to the web server which in turn makes a request to the app server. The response then goes from the app server to the web server to the browser.

The key to installing WebKit is to get the web server and app server talking to each other. For this purpose, you must use a WebKit adapter. See below.

Bad Marshal Data

The most common installation problem is a Python exception appearing in your browser that says "bad marshal data". This is always caused by pointing the web browser to the app server:

http://localhost:8086/WebKit.cgi/

But the app server hides behind a web server so the correct URL would be:

http://localhost/WebKit.cgi/

That requires that web server and app server are set up to talk to each other. And that's what the next section is all about...
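As an aside, the error message is easier to understand once you see what happens when plain HTTP text is handed to Python's marshal module. The sketch below assumes, as the wording of the message suggests, that the app server decodes incoming bytes with marshal. It is not WebKit's actual code, and the request bytes are invented for the demonstration.

```python
import marshal

# Roughly what a browser sends when pointed straight at the app server port.
raw_http = b"GET /WebKit.cgi/ HTTP/1.0\r\n\r\n"

try:
    marshal.loads(raw_http)
except (ValueError, EOFError, TypeError) as exc:
    error_text = "%s: %s" % (type(exc).__name__, exc)
else:
    error_text = "no error"

print(error_text)
```

An adapter, by contrast, sends data that has been packed in the format the app server expects.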

Adapters

A WebKit adapter takes an HTTP request from a web server and, as quickly as possible, packs it up and ships it to the app server which subsequently sends the response back to the adapter for delivery to the web server and ultimately the web client. More concisely, an adapter is the go-between of the web server and the app server.
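The go-between role can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the real adapter protocol: the request is reduced to a dictionary holding the CGI environment, marshal is assumed as the wire format, and the names read_all, toy_app_server and adapter_transact are invented for the example. A socket pair stands in for the network connection between adapter and app server.

```python
import marshal
import socket
import threading


def read_all(sock):
    """Read from sock until the peer closes its sending side."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def toy_app_server(conn):
    """Stand-in for the app server: decode one request, send one response."""
    with conn:
        request = marshal.loads(read_all(conn))
        body = "Hello from %s" % request["environ"]["PATH_INFO"]
        conn.sendall(body.encode("utf-8"))


def adapter_transact(conn, environ):
    """Stand-in for an adapter: pack the request, ship it, return the reply."""
    with conn:
        conn.sendall(marshal.dumps({"environ": environ}))
        conn.shutdown(socket.SHUT_WR)  # signal that the request is complete
        return read_all(conn)


adapter_end, server_end = socket.socketpair()
server = threading.Thread(target=toy_app_server, args=(server_end,))
server.start()
response = adapter_transact(adapter_end, {"PATH_INFO": "/Examples/Welcome"})
server.join()
print(response.decode("utf-8"))
```

The adapter does no application work of its own. It only packs, forwards and relays, which is why it can stay small and fast.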

The current suite of adapters for WebKit are:

  1. CGI
  2. FastCGI
  3. OneShot
  4. mod_python
  5. mod_snake
  6. AOLserver
  7. mod_webkit

Permissions

Note that on UNIX and NT, regardless of which adapter you use, you may need to set some permissions so that the app server can write files to the WebKit directories Cache, ErrorMsgs, Logs and Sessions.

The easiest way to do this is to make these directories writeable by all (on UNIX, that would be cd Webware/WebKit; chmod -R a+rwX Cache ErrorMsgs Logs Sessions).

The most secure way to do this is to make sure the app server and its adapter are executed by a specific user and that the specific user has write permissions to these directories (but no one else). This requires some savvy with both your operating system and your web server. You may wish to take the easy approach first in order to bootstrap yourself and then come back to this.
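Whichever approach you take, a quick script can confirm that the four directories are writable. The sketch below is a hypothetical helper, not part of Webware. It must be run as the same user that runs the app server to be meaningful, and the demonstration uses a throwaway directory instead of a real installation.

```python
import os
import tempfile

# The directory names come from the text above.
WRITABLE_DIRS = ("Cache", "ErrorMsgs", "Logs", "Sessions")


def unwritable_dirs(webkit_dir):
    """Return the required directories that are missing or not writable."""
    problems = []
    for name in WRITABLE_DIRS:
        path = os.path.join(webkit_dir, name)
        if not (os.path.isdir(path) and os.access(path, os.W_OK)):
            problems.append(name)
    return problems


# Demonstration: only two of the four directories exist in this scratch area.
demo_root = tempfile.mkdtemp()
for name in ("Cache", "Logs"):
    os.mkdir(os.path.join(demo_root, name))

missing = unwritable_dirs(demo_root)
print(missing)
```

Against a real installation you would pass the path of your WebKit directory instead of demo_root.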

CGI Adapter

The "classic" WebKit setup uses the CGI adapter, known as WebKit.cgi. Almost all web servers support CGI so it was obvious that WebKit should, too.

In this configuration, the app server stays resident (similar to the web server) and is tapped for responses by a small CGI program that forwards the request to the app server.

Note that even though this approach uses CGI, such a WebKit application will run faster than the equivalent all-CGI solution. The reason is that the app server (and possibly your servlets) can do quite a bit of caching, and the CGI adapter is fairly small. On the other hand, this is one of the slower adapters out of the bunch.

To set this up, do the following:

1. Review and fix permissions

2. Set up WebKit.cgi

3. Launch the servers

4. Try it out

Stopping the App Server

The recommended method of stopping the AppServer is through the Application Control interface. This is a servlet located in the Admin context. A username and password are required, and these default to "admin" and "webware". (If your system is open to the internet, please change the password immediately. It's set in the WebKit/Configs/Application.config file.) This shutdown method is safer than pressing Control-C in a terminal, as described below.

On all OSs, stopping the app server may also be accomplished by simply going to its terminal/command window and hitting Control-C. The app server is designed to intercept this interruption and shut down as gracefully as possible. This includes saving session data to disk, if necessary.

On UNIX, a running app server may be stopped from a terminal by typing "AppServer stop".

[Windows] Control-C normally shuts the app server down gracefully, whereas Control-Break does not. Keep that in mind and use Control-C, unless the server is unresponsive to it.

[UNIX] If you don't have access to the terminal window of the app server (perhaps because you used rlogin, telnet or ssh to remotely access the machine), and "AppServer stop" doesn't work, you can use ps -ax | grep AppServer to get the pid and kill <pid> to effect a Control-C.

Reloading Servlets

As you develop your web application, you will change the code of your various Python classes, including your servlets. The WebKit app server will detect a change in the timestamp of a servlet's source file and automatically reload it for you.

However, reloading fails in two areas. The first is that WebKit doesn't check ancestor classes of servlets for modifications. So if you modify an abstract class (for example, SitePage, AccountPage, etc.), it won't be reloaded. The second is that WebKit can't check non-servlet classes. So if you modify a utility class (for example, ShoppingCart, Story, etc.), it won't be reloaded.
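The first limitation follows directly from a timestamp check that looks only at the servlet's own source file. The sketch below illustrates that idea. The class ServletCache and the file names are hypothetical and this is not WebKit's implementation.

```python
import os
import tempfile


class ServletCache(object):
    """Toy cache that remembers each servlet file's modification time."""

    def __init__(self):
        self._mtimes = {}

    def load(self, path):
        self._mtimes[path] = os.path.getmtime(path)

    def is_stale(self, path):
        """True if the servlet's own source file changed since load()."""
        return os.path.getmtime(path) != self._mtimes[path]


demo_dir = tempfile.mkdtemp()
servlet = os.path.join(demo_dir, "Account.py")    # hypothetical servlet
ancestor = os.path.join(demo_dir, "SitePage.py")  # hypothetical base class
for path in (servlet, ancestor):
    with open(path, "w") as f:
        f.write("# placeholder\n")

cache = ServletCache()
cache.load(servlet)

# Simulate editing only the ancestor by moving its timestamp forward.
later = os.path.getmtime(ancestor) + 10
os.utime(ancestor, (later, later))
stale_after_ancestor_edit = cache.is_stale(servlet)

# Now simulate editing the servlet itself.
os.utime(servlet, (later, later))
stale_after_servlet_edit = cache.is_stale(servlet)

print(stale_after_ancestor_edit, stale_after_servlet_edit)
```

Because nothing records that Account.py depends on SitePage.py, the edit to the ancestor goes unnoticed until the servlet file itself changes.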

You can deal with reloading problems by stopping the app server (Control-C in its terminal/command window) and restarting it. If this gets too bothersome during development (and it probably will), use the OneShot adapter.

A new alternative way of reloading ancestor classes is through the Application Control servlet, in the Admin context, which will provide a list of all of the currently loaded modules, and allow you to reload selected modules.

OneShot Adapter

The OneShot adapter is another CGI based adapter, but unlike WebKit.cgi, the app server launches, serves a response and shuts down for every single request via OneShot.cgi.

There are three major reasons to use the OneShot adapter:

1. When developing a web site.
OneShot avoids the reloading problems described in the previous section, because the app server is loaded fresh for every request.

2. When developing WebKit itself and/or plug-ins.
Restarting the app server is required in these circumstances, so again, OneShot becomes a convenient way to do this.

3. To ensure absolute stability.
Stability problems and memory leaks are unlikely to occur with OneShot since the lifetime of the app server is so short. However, the persistent app server is fairly well tested so this particular reason is somewhat weak.

Of course, the cost of these benefits is performance. OneShot is at least an order of magnitude slower than a persistent app server. For development, that's not a problem. But for a deployed site with multiple users it could be.

We generally recommend that you develop with OneShot and deploy with something else, like WebKit.cgi or FastCGI (see below).

To set up OneShot, follow the same instructions as for WebKit.cgi with two exceptions:

  1. Work with OneShot.cgi (not WebKit.cgi).
  2. Do not launch the app server (OneShot will not use it even if it's loaded).

FastCGI Adapter

The essence of the FastCGI protocol is to keep the CGI program resident. In our case, that program is a combination of the Python interpreter, several standard Python modules and the WebKit adapter code. By eliminating the constant reloading of these components, a substantial speedup can be obtained (about 2x in informal tests).

Your web server will have to be FastCGI enabled, which you may be able to accomplish by downloading software at http://www.FastCGI.com where you can also learn more about FastCGI.

The top of FCGIAdapter.py contains a doc string explaining its setup.

Note that to date, we have only gotten this to work on UNIX.

For development purposes, the FastCGI adapter doesn't offer anything substantial. However, for deploying your site for real use, the extra configuration effort could be well worth the performance boost.

Another adapter that provides the same type of benefit is the mod_python adapter, described below.

Mod_python Adapter

Mod_python is a module that embeds the Python interpreter into the Apache web server. This can make for a substantial speedup over WebKit.cgi which reloads the Python interpreter and various modules for each request.

mod_python itself and the adapter are both somewhat new, which makes this approach the least tested among the bunch.

The top of ModPythonAdapter.py contains a doc string explaining the setup.

This has been tested on both UNIX and Windows. On Windows, you should use mod_python 2.7.4 or later because there is a bug in earlier versions of mod_python for Windows that can cause Apache to crash or return incorrect responses under heavy load.

More information about mod_python can be found at http://www.modpython.org.

Another adapter that provides the same type of benefit is the FastCGI adapter described above.

ModSnake Adapter

This adapter is similar to the mod_python adapter, but written for the mod_snake Apache module. mod_snake is another method of embedding a Python interpreter into Apache. mod_snake may be preferable over mod_python if you need to work with Apache 2.0, or if you would like greater access to the internals of Apache than is available through mod_python.

A doc string is included at the top of ModSnakeAdapter.py which describes the configuration steps to use this adapter.

ModWebkit Adapter

This is a native Apache module designed solely to communicate with WebKit. It is written in C, and has been tested on Linux and Windows.

This adapter is the fastest of the adapters available for WebKit.

The source code and a README file describing how to configure and build mod_webkit are located in the Webware/WebKit/Native/mod_webkit directory.

AOLserver Adapter

This adapter comes with the Python-enhanced AOLserver created by the PyWX project and is described there.

Renaming Adapters

Adapters such as WebKit.cgi and OneShot.cgi do not rely on their name. Consequently, when you deploy your web site, you can rename the adapter to something like serve.cgi. This allows you to switch adapters later without affecting the URLs and therefore the bookmarks of your users (provided you're still using some form of CGI adapter).

Adapter Problems

There is one gotcha in setting up the adapters that don't rely on CGI. For mod_webkit, mod_python and mod_snake, the name that you give to the adapter location in your Apache configuration file must not actually exist in your Apache document root. Also, you may not have a file or directory in your document root with the same name as one of WebKit's contexts. So, you can't have a directory named Examples in your document root.

Configuration

In this section, we'll briefly touch on some of the configuration options related to installing and running WebKit. A full reference to these options can be found in the User's Guide under Configuration.

The settings referenced below are found in the configuration file, Configs/Application.config.

Sessions

WebKit provides a Session utility class for storing data on the server side that relates to an individual user's session with your site. The SessionStore setting determines where the data is stored and can currently be set to Dynamic or File.

Storing to the Dynamic session store is the fastest solution and is the default. This session storage method keeps the most recently used sessions in memory and moves older sessions to disk periodically. All sessions are moved to disk when the server is stopped. This storage mechanism works with both the persistent, long-running app servers and OneShot. Two settings in Application.config relate to this session store: MaxDynamicMemorySessions specifies the maximum number of sessions that can be in memory at any one time, and DynamicSessionTimeout specifies after what period of time sessions are moved from memory to file. (Note: this setting is unrelated to the SessionTimeout setting below. Sessions that the Dynamic session store moves to disk are not deleted.)

Storing to files is provided mainly in support of the OneShot adapter. It may also prove useful in the future in support of load balancing. In this scenario, each individual session is stored in its own file, loaded for every request and saved when delivering the corresponding response.

All on-disk session information is located in WebKit/Sessions.

Also, the SessionTimeout setting lets you set the number of minutes of inactivity before a user's session becomes invalid and is deleted. The default is 60. The SessionTimeout value can also be changed dynamically on a per-session basis.
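Put together, the session settings might appear in Application.config along the following lines. This is a sketch of a fragment of the configuration dictionary, written here as an assignment so that it can be run on its own. Only the SessionTimeout default of 60 comes from this guide. The other two numbers, and the unit assumed for DynamicSessionTimeout, are illustrative assumptions, so check the User's Guide before relying on them.

```python
# Fragment of the dictionary in Configs/Application.config (other keys omitted).
session_settings = {
    'SessionStore': 'Dynamic',          # or 'File'
    'SessionTimeout': 60,               # minutes of inactivity (documented default)
    'MaxDynamicMemorySessions': 10000,  # assumed value, for illustration only
    'DynamicSessionTimeout': 15,        # assumed value and unit, for illustration only
}

print(sorted(session_settings))
```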

Activity Log

Three options let you control activity logging.

See Configuration in the User's Guide for more information.

E-mail Errors

EmailErrors, ErrorEmailServer and ErrorEmailHeaders let you configure the app server so that uncaught exceptions land in your mailbox in real time. You should definitely set these options when deploying a web site.

See Configuration in the User's Guide for more information.
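As a rough sketch, the three settings might be filled in like this. The setting names come from the text above, but every value is a placeholder, and the shape of ErrorEmailHeaders (a dictionary of header names and values) is an assumption on our part. The User's Guide is the authority on the accepted values.

```python
# Fragment of the dictionary in Configs/Application.config (other keys omitted).
# Every value below is a placeholder assumption, not a documented default.
error_email_settings = {
    'EmailErrors': 1,                        # assumed: a true value turns it on
    'ErrorEmailServer': 'mail.example.com',  # hypothetical SMTP host
    'ErrorEmailHeaders': {                   # assumed shape: header name -> value
        'From': 'webkit@example.com',
        'To': ['admin@example.com'],
        'Subject': 'WebKit error',
    },
}

print(sorted(error_email_settings))
```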

Contexts

WebKit divides the world into contexts, each of which is a directory with its own files and servlets. WebKit will only serve files out of its list of known contexts.

Some of the contexts you will find out of the box are Examples, Documentation and Admin. When viewing either an example or admin page, you will see a sidebar that links to all the contexts.

Another way to look at contexts is a means for "directory partitioning". If you have two distinct web applications (for example, PythonTutor and DayTrader), you will likely put each of these in their own context.

To add a new context, add a new entry to the Contexts dictionary of Application.config. The key is the name of the context as it appears in the URL and the value is the path (absolute or relative to the WebKit directory). Often the name of the context and the name of the directory will be the same:

'DayTrader': '/All/Web/Apps/DayTrader',

The URL to access DayTrader would then be something like:

http://localhost/WebKit.cgi/DayTrader/

The special name default is reserved to specify what context is served when none is specified (as in http://localhost/WebKit.cgi/). Upon installation, this is the Examples context, which is convenient during development since it provides links to all the other contexts.
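The lookup described above, including the fallback to default, can be sketched as follows. The function resolve_context and the sample dictionary are illustrative only and are not WebKit's actual code. The first component of the path after the adapter name is treated as the context name if it is a known key.

```python
# Hypothetical Contexts dictionary in the shape described above.
contexts = {
    'default': 'Examples',
    'Examples': 'Examples',
    'DayTrader': '/All/Web/Apps/DayTrader',
}


def resolve_context(path_info, contexts):
    """Split a request path into (context name, directory, remaining path)."""
    parts = [p for p in path_info.split('/') if p]
    if parts and parts[0] in contexts:
        name, rest = parts[0], parts[1:]
    else:
        name, rest = 'default', parts  # no known context named in the URL
    return name, contexts[name], '/'.join(rest)


print(resolve_context('/DayTrader/Portfolio', contexts))
print(resolve_context('/', contexts))
```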

Note that a context can contain an __init__.py which will be executed when the context is loaded at app server startup. You can put any kind of initialization code you deem appropriate there.

AppServer Notes

ThreadedAppServer

WebKit uses a process called an AppServer to handle requests. The default AppServer is called ThreadedAppServer. It uses a fixed pool of threads to process requests, and the number of threads is configurable. As each request comes in, a thread from the pool is handed that request and is responsible for receiving it from the adapter, running it through the Application, and sending the response back to the adapter.
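The thread pool idea can be illustrated with the standard library alone. The following is a generic sketch of a fixed pool of worker threads fed from a queue, not the ThreadedAppServer source. The pool size, the handle function and the request strings are all invented for the example, and the queue module shown here is the Python 3 name (on Python 2 it is called Queue).

```python
import queue
import threading

NUM_THREADS = 4  # hypothetical pool size; the real number is configurable


def handle(request):
    """Stand-in for running a request through the Application."""
    return "response to %s" % request


def worker(requests, responses):
    """Each pool thread takes requests off the queue until told to stop."""
    while True:
        request = requests.get()
        if request is None:  # sentinel: shut this thread down
            break
        responses.put(handle(request))


requests = queue.Queue()
responses = queue.Queue()
pool = [threading.Thread(target=worker, args=(requests, responses))
        for _ in range(NUM_THREADS)]
for thread in pool:
    thread.start()

for n in range(10):
    requests.put("request %d" % n)
for _ in pool:
    requests.put(None)  # one sentinel per worker
for thread in pool:
    thread.join()

results = sorted(responses.get() for _ in range(10))
print(len(results))
```

A fixed pool caps how many requests are processed at once. Further requests wait in the queue instead of spawning more threads.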

Launching the AppServer at UNIX boot up

The script WebKit/webkit is a UNIX shell script that launches WebKit at boot time through the standard "init" mechanisms.

To start WebKit at boot time, make the necessary variable modifications in the script, like so:

WEBKIT_DIR=/opt/Webware/WebKit
PID_FILE=$WEBKIT_DIR/appserverpid.txt
LOG=/var/log/webkit
PYTHONPATH=
PYTHONPATH_ADDONS=

Now put this script in /etc/rc.d/init.d and do this:

> cd /etc/rc.d/init.d
> chmod a+rx webkit
> chkconfig webkit reset

That takes care of getting it started for future boots, but this first time you need to jump start it:

> ./webkit start

Additional Notes

You can use chkconfig to put this in at the right runlevels. By default, it is set up properly for the default Apache install, which is to say that the AppServer starts up before Apache and shuts down after Apache.

By default, the script is in runlevels 2-5 at start priority 75, stop priority 25.

You will notice the variables PYTHONPATH and PYTHONPATH_ADDONS, which are both set to blank. The first prevents inheriting PYTHONPATH from the environment (as security paranoia). You can comment this out if you prefer to inherit the Python path. The second allows you to specify additional Python path components (regardless of whether or not you inherited the path).

Running ThreadedAppServer as a Windows NT/2000 Service

ThreadedAppServerService is a version of ThreadedAppServer that runs as a Windows NT Service. This means it can be started and stopped from the Control Panel or from the command line using "net start" and "net stop", and it can be configured in the Control Panel to auto-start when the machine boots. This is the preferred way to deploy WebKit on a Windows NT/2000 platform.

ThreadedAppServerService requires the Python win32all extensions to have been installed.

To see the options for installing, removing, starting, and stopping the service, just run ThreadedAppServerService.py with no arguments. Typical usage is to install the service to run under a particular user account and start up automatically on reboot with:

python ThreadedAppServerService.py --username mydomain\myusername --password mypassword --startup auto install

Then, you can start the service from the Services applet in the Control Panel, where it will be listed as "WebKit Threaded Application Server". Or, from the command line, it can be started with either of the following commands:

net start WebKit
python ThreadedAppServerService.py start

The service can be stopped from the Control Panel or with:

net stop WebKit
python ThreadedAppServerService.py stop

And finally, to uninstall the service, stop it and then run:

python ThreadedAppServerService.py remove

Other Notes

See Stopping the App Server and Reloading Servlets above in Adapters.

Future

This section documents future items for this install guide. To learn about future developments for WebKit, see WebKit's Future document.

Credit

Chuck Esterbrook (primary author)
Geoff Talvola (IIS notes)
Jay Love (AppServer notes)