WebKit User's Guide

Version 0.7
Webware for Python 0.7

Table of Contents

Synopsis
Feedback
Introduction
      Python 9
      Overview
      Compared to CGI "apps"
Errors / Uncaught Exceptions
Configuration
      AppServer.config
      Application.config
Administration
Debugging
      print
      Raising Exceptions
      Restarting the Server
      Assertions
Naming Conventions
Actions
Plug-ins
How do I develop an app?
Known Bugs
Credit

Synopsis

WebKit provides Python classes for generating dynamic content from a web-based, server-side application. It is a significantly more powerful alternative to CGI scripts for application-oriented development.

Feedback

You can e-mail webware-discuss@lists.sourceforge.net to give feedback, discuss features and get help using WebKit.

Introduction

Python 9

A paper titled "Introduction to Webware" was accepted for Python 9, which runs the week of March 4th, 2001. The conference proceedings will eventually make their way to the web. At the time of this writing we don't know what the URL will be, but if you poke around on the conference site or on python.org you can probably find them.

Overview

The core concepts of the WebKit are the Application, Servlet, Request, Response and Transaction, for which there are one or more Python classes.

The application resides on the server-side and manages incoming requests in order to deliver them to servlets which then produce responses that get sent back to the client. A transaction is a simple container object that holds references to all of these objects and is accessible to all of them.

Content is normally served in HTML or XML format over an HTTP connection. However, applications can provide other forms of content and the framework is designed to allow new classes for supporting protocols other than HTTP.

A small program called an adapter connects the web server and the application server. It bundles a web browser request and sends it to the application server, which processes it and sends the response back to the adapter. The adapter then outputs the results for use by the web server. Adapters come in various flavors including CGI, FastCGI and Apache mod. See the Install Guide for more information.

At a more detailed level, the process looks like this:

  1. At some point, someone has configured and run both a web server (such as Apache) and the WebKit app server (WebKit/AppServer).
  2. A user requests a web page by typing a URL or submitting a form.
  3. The user's browser sends the request to the remote web server.
  4. The web server invokes the adapter.
  5. The adapter simply collects information about the request and sends it to the WebKit app server which is ready and waiting.
  6. The app server asks the Application object to dispatch the raw request.
  7. The application instantiates an HTTPRequest object and asks the appropriate Servlet (as determined by examining the URL) to process it.
  8. The servlet generates content into a given HTTPResponse object, whose content is then sent back by the app server to the adapter.
  9. The adapter sends the content through the web server and ultimately to the user's web browser.
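
Steps 6 through 8 can be pictured with a small sketch. The classes below are toy stand-ins written for illustration only; they are not WebKit's classes, and names such as dispatch() and respond() are assumptions. The sketch only shows how an application might build a request and response, wrap them in a transaction and hand that to a servlet chosen by URL.

```python
# Toy stand-ins for the core concepts; none of these are the real WebKit classes.

class Request:
    def __init__(self, url, fields=None):
        self.url = url
        self.fields = fields or {}


class Response:
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

    def content(self):
        return ''.join(self.chunks)


class Transaction:
    """Simple container that holds references to the other objects."""

    def __init__(self, application, request, response):
        self.application = application
        self.request = request
        self.response = response
        self.servlet = None


class HelloServlet:
    def respond(self, trans):
        # The servlet generates content into the response it was given.
        name = trans.request.fields.get('name', 'world')
        trans.response.write('<p>Hello, %s!</p>' % name)


class Application:
    def __init__(self, servlets):
        self.servlets = servlets  # hypothetical mapping: URL path -> servlet

    def dispatch(self, url, fields=None):
        request = Request(url, fields)
        response = Response()
        trans = Transaction(self, request, response)
        trans.servlet = self.servlets[url]  # chosen by examining the URL
        trans.servlet.respond(trans)
        return response.content()  # what would travel back to the adapter


app = Application({'/hello': HelloServlet()})
page = app.dispatch('/hello', {'name': 'Webware'})
```

The real framework adds sessions, caching and error handling around this loop, but the division of labor is the same.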

Compared to CGI "apps"

The alternative to a server-side application is a set of CGI scripts. However, a CGI script must always be launched from scratch, so many common tasks are performed repeatedly for each request: loading libraries, opening database connections, reading configuration files, etc.

With the server-side application, the majority of these tasks can be done once at launch time and important results can be easily cached. This makes the application significantly more efficient.

Of course, CGIs can still be appropriate for "one shot" deals or simple applications. Webware includes a CGI Wrapper if you'd like to encapsulate your CGI scripts with robust error handling, e-mail notifications, etc.

Errors / Uncaught Exceptions

One of the conveniences provided by WebKit is the handling of uncaught exceptions. The response to an uncaught exception is:

  1. Log the time, error, script name and traceback to AppServer's console.
  2. Display a web page containing an apologetic message to the user.
  3. Save a technical web page with debugging information so that developers can look at it after-the-fact. These HTML-based error messages are stored one-per-file, if the SaveErrorMessages setting is true (the default). They are stored in the directory named by the ErrorMessagesDir (defaults to 'ErrorMsgs').
  4. Add an entry to the error log, found by default in Logs/Errors.csv.
  5. E-mail the error message if the EmailErrors setting is true, using the settings ErrorEmailServer and ErrorEmailHeaders. You'll need to configure these to activate this feature.
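
As a rough illustration of steps 1, 2 and 4, here is a minimal sketch of an uncaught-exception handler. It is not WebKit's implementation: the function name handle_uncaught(), its parameters and the column layout of the log row are all assumptions made for the example, and in-memory files stand in for the console and Logs/Errors.csv.

```python
import csv
import io
import time
import traceback


def handle_uncaught(exc, script_name, console, error_log, user_message):
    """Hypothetical handler: log to the console, add a CSV entry, build the user page."""
    tb_text = ''.join(
        traceback.format_exception(type(exc), exc, exc.__traceback__))
    # Step 1: time, error, script name and traceback go to the console.
    console.write('[%s] Error in %s: %r\n%s'
                  % (time.asctime(), script_name, exc, tb_text))
    # Step 4: one row per error in the CSV error log (column choice is assumed).
    csv.writer(error_log).writerow(
        [time.asctime(), script_name, type(exc).__name__, str(exc)])
    # Step 2: the user only sees the apologetic message.
    return '<html><body><p>%s</p></body></html>' % user_message


console = io.StringIO()    # stands in for the AppServer's console
error_log = io.StringIO()  # stands in for Logs/Errors.csv
try:
    raise ValueError('bad input')
except ValueError as exc:
    page = handle_uncaught(exc, 'Main.py', console, error_log, 'Sorry!')
```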

Here is a sample error page.

Archived error messages can be browsed through the administration page.

Error handling behavior can be configured as described in Configuration.

Configuration

There are several configuration parameters through which you can alter how WebKit behaves. They are described below, including their default values. Note that you can override the defaults by placing config files in the Configs/ directory. A config file simply contains a Python dictionary containing the items you wish to override. For example:

{
      'SessionStore': 'Memory',
      'ShowDebugInfoOnErrors': 1
}
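
To make the override idea concrete, here is a minimal sketch of how a dictionary read from a config file could be layered over built-in defaults. It is an illustration only: merge_config() is a hypothetical helper, the use of ast.literal_eval is an assumption, and WebKit's own loading code may work differently.

```python
import ast

# A few documented defaults, used here only as sample data.
defaults = {
    'SessionStore': 'Dynamic',
    'SessionTimeout': 60,
    'ShowDebugInfoOnErrors': 1,
}

# The text of a config file: a single Python dictionary.
config_text = """
{
      'SessionStore': 'Memory',
      'ShowDebugInfoOnErrors': 1
}
"""


def merge_config(defaults, text):
    """Return the defaults with the items from the config text applied on top."""
    overrides = ast.literal_eval(text.strip())
    if not isinstance(overrides, dict):
        raise ValueError('a config file must contain a dictionary')
    settings = dict(defaults)  # copy, so the defaults stay untouched
    settings.update(overrides)
    return settings


settings = merge_config(defaults, config_text)
```

Settings that the file does not mention, such as SessionTimeout here, keep their default values.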

AppServer.config

PrintConfigAtStartUp   = 1
Does what it says. It's generally a good idea to leave this on.

Verbose   = 1
If true, then additional messages are printed while the AppServer runs, most notably information about each request such as size and response time.

Host   = 127.0.0.1
The host that the application server runs on. There is little reason to ever change this.

Port   = 8086
The port that the application server runs on. Change this if there is a conflict with another application on your server.

PlugIns   = []
Loads the plug-ins from the given locations when the application server starts up. This setting isn't as popular as PlugInDirs below since it requires an update for every new plug-in created.

PlugInDirs   = ['..']
A list of directories where WebKit plug-ins can be detected and loaded when the application server starts up. Relative paths are relative to the WebKit directory. Webware already ships with several plug-ins (hence the '..'). You can also create your own plug-ins.

A plug-in must have __init__.py and Properties.py files. You can disable a specific plug-in by placing a dontload file in it.

The advantage of a plug-in directory is that you can add new plug-ins to the app server simply by dropping them into the directory; no additional configuration is necessary.

ServerThreads   = 10
The maximum number of threads in the request handler thread pool, and therefore the maximum number of concurrent requests that can be served. Unless you have a serious load on a high-end machine, the default is generally sufficient.

CheckInterval   = 100
The number of virtual instructions after which Python will check for thread switches, signal handlers, etc. This is passed directly to sys.setcheckinterval(). Benchmarks have shown 100 to give a worthwhile performance boost with higher values resulting in little gain.

Application.config

AdminPassword   = 'webware'
The password that, combined with the admin id, allows access to the AppControl page of the Admin context. You should change this after installation so that outsiders cannot tamper with your app server.

DirectoryFile   = ['index', 'Main']
The list of basic filenames that WebKit searches for when serving up a directory. Note that the extensions are absent since WebKit autodetects extensions (for example, index.py, index.html, index.psp, etc.).

PrintConfigAtStartUp   = 1
Does what it says. It's generally a good idea to leave this on.

ExtensionsToIgnore   =
['.pyc', '.pyo', '.py~', '.bak']
This is a list of extensions that WebKit will ignore when autodetecting extensions. Note that this does not prevent WebKit from serving such a file if it is named explicitly in a URL.

ExtensionsToServe   = []
This is a list of extensions that WebKit will use exclusively when autodetecting extensions. Note that this does not prevent WebKit from serving such a file if it is named explicitly in a URL.

UseExtensionCascading   = 1
If true, extension cascading is used when autodetecting extensions.

ExtensionCascadeOrder   =
['.psp', '.py', '.html']
This is a list of extensions that WebKit will cascade through when autodetecting extensions. Note that this has no effect if the extension is named explicitly in a URL.
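
One plausible reading of how ExtensionsToIgnore and ExtensionCascadeOrder interact is sketched below. This is an assumption-laden illustration, not WebKit's lookup code: pick_file() is a hypothetical function, and returning None for an ambiguous match is a choice made for the example.

```python
import os


def pick_file(base, filenames, ignore, cascade_order):
    """Pick the file to serve for an extensionless name such as 'index'."""
    candidates = []
    for name in filenames:
        root, ext = os.path.splitext(name)
        if root == base and ext not in ignore:
            candidates.append(name)
    if len(candidates) <= 1:
        # Zero or one match: nothing to cascade through.
        return candidates[0] if candidates else None
    # Several matches: the first extension in the cascade order wins.
    for ext in cascade_order:
        if base + ext in candidates:
            return base + ext
    return None  # ambiguous, and no cascade entry settles it


ignore = ['.pyc', '.pyo', '.py~', '.bak']
cascade = ['.psp', '.py', '.html']

first = pick_file('index', ['index.py', 'index.pyc', 'index.html'], ignore, cascade)
second = pick_file('index', ['index.psp', 'index.py'], ignore, cascade)
```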

FilesToHide   =
['.*', '*~', '*bak', '*.tmpl', '*.pyc', '*.pyo', '*.config']
File patterns to protect from browsing. This affects all requests, not just requests with autodetected extensions.

FilesToServe   = []
File patterns to serve from exclusively. If the file being served for a particular request does not match one of these patterns, an HTTP 403 Forbidden error will be returned. This affects all requests, not just requests with autodetected extensions.
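
The combined effect of FilesToHide and FilesToServe can be sketched with shell-style pattern matching. The example assumes the patterns are globs matched against the bare file name, which the pattern syntax suggests but this guide does not state; is_servable() is a hypothetical helper.

```python
from fnmatch import fnmatchcase

FILES_TO_HIDE = ['.*', '*~', '*bak', '*.tmpl', '*.pyc', '*.pyo', '*.config']


def is_servable(filename, files_to_hide, files_to_serve):
    """Return True if the file may be served under the two pattern lists."""
    for pattern in files_to_hide:
        if fnmatchcase(filename, pattern):
            return False  # hidden files are never served
    if files_to_serve:
        # A non-empty FilesToServe list is exclusive.
        return any(fnmatchcase(filename, p) for p in files_to_serve)
    return True


allowed = is_servable('Main.py', FILES_TO_HIDE, [])
hidden = is_servable('Application.config', FILES_TO_HIDE, [])
```

A False result corresponds to the case where the request would be refused.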

LogActivity   = 1
If true, then the execution of each servlet is logged with useful information such as time, duration and whether or not an error occurred.

ActivityLogFilename   = 'Logs/Activity.csv'
This is the name of the file that servlet executions are logged to. This setting has no effect if LogActivity is 0. The path can be relative to the WebKit location, or an absolute path.

ActivityLogColumns   =
['request.remoteAddress', 'request.method', 'request.uri', 'response.size', 'servlet.name', 'request.timeStamp', 'transaction.duration', 'transaction.errorOccurred']
Specifies the columns that will be stored in the activity log. Each column can refer to an object from the set: [application, transaction, request, response, servlet, session] and then refer to its attributes using "dot notation". The attributes can be methods or instance attributes and can be qualified arbitrarily deep.
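
The "dot notation" can be resolved with a short loop over attribute names. The following is a sketch under assumptions: value_for_name() and the two Fake classes are invented for the example, and calling any callable attribute with no arguments is one simple way to treat methods and instance attributes alike.

```python
def value_for_name(objects, dotted_name):
    """Resolve a column such as 'request.method' against a dict of objects."""
    parts = dotted_name.split('.')
    value = objects[parts[0]]
    for attr in parts[1:]:
        value = getattr(value, attr)
        if callable(value):
            value = value()  # methods are called, plain attributes are used as-is
    return value


class FakeRequest:
    """Hypothetical request object with accessor methods."""

    def remoteAddress(self):
        return '127.0.0.1'

    def method(self):
        return 'GET'


class FakeTransaction:
    duration = 0.25  # a plain attribute rather than a method


objects = {'request': FakeRequest(), 'transaction': FakeTransaction()}
columns = ['request.remoteAddress', 'request.method', 'transaction.duration']
row = [value_for_name(objects, column) for column in columns]
```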

Contexts   =
{
    'default': 'Examples',
    'Admin': 'Admin',
    'Examples': 'Examples',
    'Docs': 'Docs',
    'Testing': 'Testing',
}

This dictionary maps context names to the directory holding the context content. Since the default contexts all reside in WebKit, the paths are simple and relative. The context name appears as the first path component of a URL; if none is specified, Contexts['default'] is used. When creating your own application, you will add a key such as 'MyApp' with a value such as '/home/apps/MyApp'. That directory will then contain content such as Main.py, SomeServlet.py, SomePage.psp, etc.
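
The mapping from URL to context can be sketched as follows. This is an illustration under assumptions, not WebKit's URL parser: resolve_context() is a hypothetical function that simply compares the first path component with the keys of the dictionary.

```python
contexts = {
    'default': 'Examples',
    'Admin': 'Admin',
    'Examples': 'Examples',
    'MyApp': '/home/apps/MyApp',
}


def resolve_context(url_path, contexts):
    """Return (context directory, remaining path) for a URL path."""
    parts = [part for part in url_path.split('/') if part]
    if parts and parts[0] in contexts:
        return contexts[parts[0]], '/'.join(parts[1:])
    # No context named in the URL: fall back to the default context.
    return contexts['default'], '/'.join(parts)


admin = resolve_context('/Admin/AppControl', contexts)
fallback = resolve_context('/Welcome', contexts)
```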

SessionStore   = 'Dynamic'
This setting determines which of the three session stores is used by the application: File, Dynamic or Memory. The file store always gets sessions from disk and puts them back when finished. Memory always keeps all sessions in memory, but will periodically back them up to disk. Dynamic is a good cross between the two, which pushes excessive or inactive sessions out to disk.

SessionTimeout   = 60
Determines the amount of time (expressed in minutes) that passes before a user's session will time out. When a session times out, all data associated with that session is lost.

IgnoreInvalidSession   = 1
If false, then an error message will be returned to the user if the user's session has timed out or doesn't exist. If true, then servlets will be processed with no session data.

UseAutomaticPathSessions   = 0
If true, then the app server will include the session ID in the URL by inserting a component of the form _SID_=8098302983 into the URL, and will parse the URL to determine the session ID. This is useful for situations where you want to use sessions, but it has to work even if the users can't use cookies.
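
Extracting such a component from a URL path might look like the sketch below. Where exactly the component is placed in the URL is not specified here, so the example accepts it at any position; split_session_id() is a hypothetical helper, not part of WebKit.

```python
SID_PREFIX = '_SID_='


def split_session_id(path):
    """Return (session id or None, path with the session component removed)."""
    sid = None
    kept = []
    for part in path.split('/'):
        if sid is None and part.startswith(SID_PREFIX):
            sid = part[len(SID_PREFIX):]
        else:
            kept.append(part)
    return sid, '/'.join(kept)


with_sid = split_session_id('/_SID_=8098302983/Examples/Welcome')
without_sid = split_session_id('/Examples/Welcome')
```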

MaxDynamicMemorySessions   = 10000
The maximum number of dynamic memory sessions that will be retained in memory. When this number is exceeded, the least recently used, excess sessions will be pushed out to disk. This setting can be used to help control memory requirements, especially for busy sites. This is used only if the SessionStore is set to Dynamic.

DynamicSessionTimeout   = 15
The number of minutes of inactivity after which a session is pushed out to disk. This setting can be used to help control memory requirements, especially for busy sites. This is used only if the SessionStore is set to Dynamic.
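
The least-recently-used behavior described for MaxDynamicMemorySessions can be sketched with an ordered dictionary. This is a toy model, not WebKit's Dynamic store: the class name is invented, a plain dictionary stands in for files on disk, and the inactivity timeout is left out.

```python
from collections import OrderedDict


class DynamicStoreSketch:
    """Keeps at most max_in_memory sessions in memory; the rest go to 'disk'."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self.memory = OrderedDict()  # least recently used session first
        self.disk = {}               # stands in for session files on disk

    def put(self, sid, data):
        self.disk.pop(sid, None)
        self.memory[sid] = data
        self.memory.move_to_end(sid)  # mark as most recently used
        while len(self.memory) > self.max_in_memory:
            old_sid, old_data = self.memory.popitem(last=False)
            self.disk[old_sid] = old_data  # push the excess session out

    def get(self, sid):
        if sid in self.disk:
            self.put(sid, self.disk[sid])  # bring it back into memory
        data = self.memory[sid]
        self.memory.move_to_end(sid)
        return data


store = DynamicStoreSketch(max_in_memory=2)
store.put('a', 1)
store.put('b', 2)
store.put('c', 3)  # 'a' is now the least recently used and moves to disk
```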

SessionPrefix   = None
This setting can be used to prefix the session IDs with a string. Possible values are None (don't use a prefix), "hostname" (use the hostname as the prefix), or any other string (use that string as the prefix). Why would you want to use this? It can be used along with some mod_rewrite magic to do simple load balancing with session affinity.

ExtraPathInfo   = 0
When enabled, this setting allows a servlet to be followed by additional path components which are accessible via HTTPRequest's extraURLPath(). For subclassers of Page, this would be self.request().extraURLPath(). This may degrade performance when turned on.

CacheServletClasses   = 1
When set to zero, the app server will not cache the classes that are loaded for servlets. This is for development and debugging.

CacheServletInstances   = 1
When set to zero, the app server will not cache the instances that are created for servlets. This is for development and debugging.

ClearPSPCacheOnStart   = 1
When set to zero, the app server will allow PSP instances to persist from one AppServer run to the next. If you have PSPs that take a long time to compile, this can give a speedup.

ShowDebugInfoOnErrors   = 1
If true, then uncaught exceptions will not only display a message for the user, but debugging information for the developer as well. This includes the traceback, HTTP headers, form fields, environment and process ids. You will most likely want to turn this off when deploying the site for users.

IncludeFancyTraceback   = 0
If true, then display a fancy, detailed traceback at the end of the error page. This makes use of a modified version of cgitb.py which is included with Webware. The original version was written by Ka-Ping Yee and is available here.

FancyTracebackContext   = 5
The number of lines of source code context to show if IncludeFancyTraceback is turned on.

UserErrorMessage   =
'The site is having technical difficulties with this page. An error has been logged, and the problem will be fixed as soon as possible. Sorry!'
This is the error message that is displayed to the user when an uncaught exception escapes a servlet.

ErrorLogFilename   = 'Logs/Errors.csv'
The name of the file where exceptions are logged. Each entry contains the date & time, filename, pathname, exception name & data, and the HTML error message filename (assuming there is one).

SaveErrorMessages   = 1
If true, then errors (e.g., uncaught exceptions) will produce an HTML file with both the user message and debugging information. Developers/administrators can view these files after the fact, to see the details of what went wrong.

ErrorMessagesDir   = 'ErrorMsgs'
This is the name of the directory where HTML error messages get stored.

EmailErrors   = 0
If true, error messages are e-mailed out according to the ErrorEmailServer and ErrorEmailHeaders settings. This setting defaults to false because ErrorEmailServer and ErrorEmailHeaders must be configured first.

EmailErrorReportAsAttachment   = 0
Set to 1 to e-mail HTML error reports as text with an HTML attachment, or to 0 to make the HTML the body of the message.

ErrorEmailServer   = 'mail.-.com'
The SMTP server to use for sending e-mail error messages.

ErrorEmailHeaders   =
{
    'From': '-@-.com',
    'To': ['-@-.com'],
    'Reply-to': '-@-.com',
    'Content-type': 'text/html',
    'Subject': 'Error'
}

The e-mail MIME headers used for e-mailing error messages. Be sure to configure 'From', 'To' and 'Reply-to' before turning EmailErrors on.

UnknownFileTypes   =
{
    'ReuseServlets': 1,

    # Technique choices:
    # serveContent, redirectSansAdapter
    'Technique': 'serveContent',

    # If serving content:
    'CacheContent': 1, # set to 0 to reduce memory use
    'CheckDate': 1,
}

This setting controls the manner in which WebKit serves "unknown extensions" such as .html, .gif, .jpeg, etc. The default settings specify that the servlet matching the file be cached in memory, that the contents of the file be cached in memory and that the file timestamp be checked on every request. This works well for most sites.

If your site has a large amount of static files being served via WebKit, you should consider changing 'CacheContent' to 0. If you are confident that your static files do not get updated while the app server is live, then you might consider changing 'CheckDate' to 0 for better performance.

The 'Technique' setting can be switched to 'redirectSansAdapter', but this is an experimental setting with some known problems.

RPCExceptionReturn   = 'traceback'
Determines how much detail an RPC servlet will return when an exception occurs on the server side. Can take the values, in order of increasing detail, 'occurred', 'exception' and 'traceback'. The first reports the string 'unhandled exception', the second reports the actual exception, and the third reports both the exception and the accompanying traceback. All return values are strings.
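
The three levels can be illustrated with a small function. It is a sketch only: rpc_exception_return() is a hypothetical name, and the exact strings WebKit produces for 'exception' and 'traceback' may differ from what the standard traceback module gives here.

```python
import traceback


def rpc_exception_return(exc, detail):
    """Return a string describing exc at the requested level of detail."""
    if detail == 'occurred':
        return 'unhandled exception'
    if detail == 'exception':
        return str(exc)
    if detail == 'traceback':
        lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
        return ''.join(lines)  # traceback followed by the exception itself
    raise ValueError('unknown RPCExceptionReturn value: %r' % (detail,))


try:
    1 / 0
except ZeroDivisionError as exc:
    caught = exc  # keep a reference for use after the except block

brief = rpc_exception_return(caught, 'occurred')
medium = rpc_exception_return(caught, 'exception')
full = rpc_exception_return(caught, 'traceback')
```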

Administration

WebKit has a built-in administration page that you can access via the Admin context. You can see a list of all contexts in the sidebar of any Example or Admin page.

The admin pages allow you to view WebKit's configuration, logs, and servlet cache, and perform actions such as clearing the cache, reloading selected modules and shutting down the app server.

More sensitive pages that give control over the app server require a user name and password, which defaults to admin/webware. You can change the password in WebKit/Configs/Application.config and should do so as soon as possible.

The administration scripts provide further examples of writing pages with WebKit, so you may wish to examine their source in WebKit/Admin/.

Debugging

As with all software development, you will need to debug your web application. The most popular techniques are detailed below.

print

The most common technique is the infamous print statement. The results of print statements go to the console where the WebKit application server was started (not to the HTML page as would happen with CGI). Prefixing the debugging output with a special tag (such as >>) is useful because it stands out on the console and you can search for the tag in source code to remove the print statements after they are no longer useful. For example:

print '>> fields =', self.request().fields()

Note that if you are using OneShot.cgi, then you will need to set ShowConsole to 1 in WebKit/Configs/OneShotAdapter.config.

Raising Exceptions

Uncaught exceptions are trapped at the application level where a useful error page is saved with information such as the traceback, environment, fields, etc. You can configure the application to automatically e-mail you this information. Here is an example error page.

When an application isn't behaving correctly, raising an exception can be useful because of the additional information that comes with it. Exceptions can be coupled with messages, thereby turning them into more powerful versions of the print statement. For example:

raise Exception('self = %s' % self)

Restarting the Server

When a servlet's source code changes, it is reloaded. However, ancestor classes of servlets are not. That is why web sites are often developed with the One Shot adapter and deployed with a more advanced, high performance adapter.

In any case, when having problems, consider restarting the app server.

Another option is to use the AppControl page of the Admin context to clear the servlet instance and class cache.

Assertions

Assertions are used to ensure that the internal conditions of the application are as expected. An assertion is equivalent to an if statement coupled with an exception. For example:

assert shoppingCart.total()>=0.0, 'shopping cart total is %0.2f' % shoppingCart.total()
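
To spell out the equivalence, the sketch below writes the same check twice: once as an assertion and once as an if statement that raises the exception itself. The ShoppingCart class is a hypothetical stand-in invented for the example.

```python
class ShoppingCart:
    """Hypothetical cart, just enough to have a total()."""

    def __init__(self, prices):
        self.prices = list(prices)

    def total(self):
        return sum(self.prices)


def check_with_assert(cart):
    assert cart.total() >= 0.0, 'shopping cart total is %0.2f' % cart.total()


def check_with_if(cart):
    # The same check written out by hand.
    if not cart.total() >= 0.0:
        raise AssertionError('shopping cart total is %0.2f' % cart.total())


def message_from(check, cart):
    """Run a check and return the error message, or None if it passed."""
    try:
        check(cart)
    except AssertionError as exc:
        return str(exc)
    return None


bad_cart = ShoppingCart([10.0, -15.0])
good_cart = ShoppingCart([10.0, 5.0])
```

One practical difference is that Python skips assert statements when run with the -O option, while the hand-written if statement always executes.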

Naming Conventions

Cookies and form values that are named with surrounding underscores (such as _sid_ and _action_) are reserved by WebKit for its own internal purposes. If you refrain from using surrounding underscores in your own names, then [a] you won't accidentally clobber an already existing internal name and [b] when new names are introduced by future versions of WebKit, they won't break your application.

Actions

Suppose you have a web page with a form and one or more buttons. Normally, when the form is submitted, a method such as Servlet's respondToPost() or Page's writeHTML(), will be invoked. However, you may find it more useful to bind the button to a specific method of your servlet such as new(), remove() etc. to implement the command, and reserve writeHTML() for displaying the page. Note that your "command methods" can then invoke writeHTML() after performing their task.

The action feature of Page lets you do this. The process goes like this:

1. Add buttons to your HTML form of type submit and name _action_. For example:

<input name=_action_ type=submit value=New>
<input name=_action_ type=submit value=Delete>

2. Add an actions() method to your class to state which actions are valid. This security requirement is important. Without it, hackers could invoke any servlet method they wanted! For example:

def actions(self): return SuperClass.actions(self) + ['New', 'Delete']

3. Unfortunately, the HTML submit button does not separate its value from its title/label. If your button labels don't match your method names, you will need to implement methodNameForAction() to provide the mapping. You could simply use a dictionary to create the mapping, or if you know there is some relationship you could write the logic for it. For example:

def methodNameForAction(self, name): return string.lower(name)

4. Now you implement your action methods.

The ListBox example shows the use of actions.
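
The four steps fit together roughly as in the following sketch. PageSketch is a toy base class written for this illustration, not WebKit's Page, and its respond() method is an assumption about how the dispatch could work; only the names actions(), methodNameForAction() and writeHTML() come from the steps above.

```python
class PageSketch:
    """Toy base class that dispatches the _action_ field to a method."""

    def actions(self):
        return []

    def methodNameForAction(self, name):
        return name

    def writeHTML(self):
        return 'page'

    def respond(self, fields):
        action = fields.get('_action_')
        if action in self.actions():  # only declared actions may be invoked
            method = getattr(self, self.methodNameForAction(action))
            return method()
        return self.writeHTML()


class ItemPage(PageSketch):
    def actions(self):
        return PageSketch.actions(self) + ['New', 'Delete']

    def methodNameForAction(self, name):
        return name.lower()  # button label 'New' -> method new()

    def new(self):
        return 'created'

    def delete(self):
        return 'deleted'

    def secret(self):
        return 'should never be reachable from a form'


page = ItemPage()
created = page.respond({'_action_': 'New'})
blocked = page.respond({'_action_': 'secret'})
```

Because 'secret' is not listed by actions(), the request falls through to writeHTML() instead of invoking the method.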

Plug-ins

A plug-in is a software component that is loaded by WebKit in order to provide additional WebKit functionality without necessarily having to modify WebKit's source.

The best-known plug-in is PSP (Python Server Pages), which ships with Webware.

Plug-ins often provide additional servlet factories, servlet subclasses, examples and documentation. Ultimately, it is the plug-in author's choice as to what to provide and in what manner.

Technically, plug-ins are Python packages that follow a few simple conventions in order to work with WebKit. More information can be found in PlugIn.py's doc strings. You can learn more about Python packages in the Python Tutorial, 6.4: "Packages".

How do I develop an app?

The answer to that question might not seem clear after being deluged with all the details. Here's a summary:

  1. Make sure you can run the WebKit AppServer. See the Install Guide for more information.
  2. Read the source to the examples (in WebKit/Examples), then modify one of them to get your feet wet.
  3. Create your own new example from scratch. Ninety-nine percent of the time you will be subclassing the Page class.
  4. Familiarize yourself with the class docs in order to take advantage of classes like Page, HTTPRequest, HTTPResponse and Session. Unfortunately, I couldn't get generated class docs working for this release, so you'll have to resort to breezing through the source code which is coupled with documentation strings. Read the examples first.
  5. With this additional knowledge, create more sophisticated pages.
  6. If you need to secure your pages using a login screen, you'll want to look at the SecurePage, LoginPage, and SecureCountVisits examples in WebKit/Examples. You'll need to modify them to suit your particular needs.
  7. Contribute enhancements and bug fixes back to the project.   :-)
  8. The Webware user community is quite supportive and friendly:
    http://lists.sourceforge.net/mailman/listinfo/webware-discuss
  9. Make sure you find out about new versions when they're released:
    http://lists.sourceforge.net/mailman/listinfo/webware-announce

Known Bugs

Known bugs, and future work in general, are documented in Future.html.

Credit

Authors: Chuck Esterbrook, Jay Love, Geoff Talvola

Many people, mostly on the webware-discuss mailing list, have provided feedback and testing.

The design was inspired by both Apple's WebObjects and Java's Servlets.