W3C Jigsaw

Jigsaw Resource Factory

The Jigsaw resource factory is a piece of software that runs behind the scenes and creates HTTPResource instances out of existing data. The factory currently knows about the files and directories of the underlying file system, but you can extend it to handle other kinds of objects.

This document describes when the factory is called, how it maps files or directories to resources, and provides a brief overview of the form-based configuration tool.

When is the factory invoked

Each running server has a resource factory attached to it (which it might share with other servers, but this is not relevant here). Any resource can call its server's factory in order to create a resource out of an existing object. Currently, the only resource that does so is the DirectoryResource, which is the one that exports existing directories.

When queried for a URL component at lookup time, the directory resource first checks its children resource store for a matching resource. If such a resource is found, it is returned as the target of the lookup. Otherwise, if the directory is flagged as extensible, the directory resource derives a file name from the resource's identifier and asks the resource factory for a wrapping resource instance. If the factory builds such a resource successfully, the directory resource installs it as one of its children resources and manages its persistence.

Let's walk through this algorithm with an example. Suppose there is a directory resource User which wraps an underlying file-system directory named User. This directory resource will usually be created empty (with no children resources). At some point, a client will ask for, say, User/Overview.html. The lookup process starts, and after some iterations comes to the point where it looks for Overview.html in the directory resource User. The directory resource looks into its children resources to find it. As none is found, it goes to the resource factory and asks it to construct a resource for the file Overview.html. If a resource is returned (which depends on the factory configuration), the directory resource plugs the newly created resource into its resource store and returns it as the target of the lookup.
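The lookup steps above can be sketched in a few lines of Java. This is a minimal illustration and not Jigsaw's actual code: SketchResource, SketchDirectory, and the function-based factory are hypothetical stand-ins for HTTPResource, DirectoryResource, and the resource factory, and persistence is reduced to an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in for a resource; not the real HTTPResource class.
class SketchResource {
    final String identifier;

    SketchResource(String identifier) {
        this.identifier = identifier;
    }
}

// Hypothetical stand-in for a DirectoryResource.
class SketchDirectory {
    // Stands in for the children resource store (in memory only here).
    private final Map<String, SketchResource> children = new HashMap<>();
    private final boolean extensible;
    // Stands in for the resource factory: it may decline to build a resource.
    private final Function<String, Optional<SketchResource>> factory;

    SketchDirectory(boolean extensible,
                    Function<String, Optional<SketchResource>> factory) {
        this.extensible = extensible;
        this.factory = factory;
    }

    Optional<SketchResource> lookup(String component) {
        // 1. A child that was already indexed is returned directly.
        SketchResource known = children.get(component);
        if (known != null) {
            return Optional.of(known);
        }
        // 2. A directory that is not extensible never creates new children.
        if (!extensible) {
            return Optional.empty();
        }
        // 3. Otherwise ask the factory, and install whatever it builds.
        Optional<SketchResource> created = factory.apply(component);
        created.ifPresent(resource -> children.put(component, resource));
        return created;
    }
}

class DirectoryLookupDemo {
    public static void main(String[] args) {
        int[] factoryCalls = {0};
        SketchDirectory user = new SketchDirectory(true, name -> {
            factoryCalls[0]++;
            // Assumed configuration: only .html files are indexed.
            return name.endsWith(".html")
                    ? Optional.of(new SketchResource(name))
                    : Optional.empty();
        });
        user.lookup("Overview.html");
        user.lookup("Overview.html");
        System.out.println("factory calls: " + factoryCalls[0]);
    }
}
```

The demo looks up Overview.html twice but reaches the factory only once, since the second lookup is served from the children store.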

One important remark here: as resources are persistent objects (they persist across Jigsaw invocations), resources that wrap existing objects are created only once in the whole lifetime of the server. This means that changing the factory configuration after a resource has been indexed has no effect on the resources that have already been created. This is one of the features that makes the server fast: indexing an existing object into a resource might be a costly process (it involves querying multiple databases, such as the extensions and directory templates databases). Caching the result of this operation allows the server to concentrate on its real work, which is to serve data back to clients. You may still, however, want to change the resource factory configuration and re-index part of your information space with the new options. The DirectoryResourceEditor lets you re-index files when needed. If you want the whole site to be re-indexed, the last resort is to stop the server, delete all the .jigidx files, and restart it. The server will then re-index the whole site as it runs.
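Before deleting index files by hand, it can help to see which ones exist. The following is a minimal sketch that only lists them and deletes nothing. It assumes that index files can be recognised by a file name ending in .jigidx, and JigidxFinder is a hypothetical helper, not part of Jigsaw.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Jigsaw.
class JigidxFinder {
    // Returns every regular file under root whose name ends with ".jigidx"
    // (assumed naming convention), sorted by path.
    static List<Path> findIndexFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jigidx"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // The server root is passed as the first argument; "." is a fallback.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path indexFile : findIndexFiles(root)) {
            System.out.println(indexFile);
        }
    }
}
```

Run it with the server stopped, and review the list before removing anything.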

How the factory creates resources

To index files and directories, the resource factory manages two databases, which are editable through a form-based interface (see the factory configuration section). The first database, known as the extension database, records how files with a given extension should be mapped to resources. The second database, known as the directory template database, records how directories are to be mapped to resources.

The extension database

When the factory is called to index a normal file, the first thing it does is split the file name into its raw name plus its set of extensions. So, for example, if the file to be indexed is foo.en.html.gz, the raw name will be foo, and the set of extensions will be {en, html, gz}.
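The split described above can be sketched as follows. This is an illustration under one assumption: the name is cut at the first dot, and everything after it is treated as a dot-separated list of extensions. FileNameSplitter is a hypothetical helper, not Jigsaw's own code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Jigsaw.
class FileNameSplitter {
    // Everything before the first dot, or the whole name if there is no dot.
    static String rawName(String fileName) {
        int dot = fileName.indexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    // The dot-separated parts after the first dot, in order of appearance.
    static List<String> extensions(String fileName) {
        int dot = fileName.indexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Collections.emptyList();
        }
        return Arrays.asList(fileName.substring(dot + 1).split("\\."));
    }

    public static void main(String[] args) {
        System.out.println(rawName("foo.en.html.gz"));    // prints foo
        System.out.println(extensions("foo.en.html.gz")); // prints [en, html, gz]
    }
}
```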

It then takes each extension's description record and checks whether it defines a resource class. In a typical setting, only the html extension will have an associated resource class, which is likely to be the FileResource class. This gives the factory the class of the resource to build for the given file, so the factory carries on by creating an empty instance of this class. It then creates a set of default attribute values, starting with a pre-defined set of attributes.

Then, for each of the file's extensions, it looks into the associated database record and fills in the remaining attributes. The html extension record, for example, might set the default value of content-type to text/html. The en extension record will probably set the content-language default value to en, and the gz extension record will probably state that the resource's content-encoding default value should be x-gzip. Once the set of default attribute values is constructed, the resource is initialized and returned.
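As a rough illustration of this two-step process (pick a class, then collect default attribute values), consider the sketch below. ExtensionRecord and ExtensionIndexer are hypothetical names, each record holds a single attribute for brevity, and the pre-defined attributes are left out. Two behaviours are assumptions rather than documented facts: the first extension that defines a class wins, and later extensions override earlier ones when they set the same attribute.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical extension record: an optional class name plus default values.
class ExtensionRecord {
    final String resourceClass; // null when the extension defines no class
    final Map<String, String> defaults;

    ExtensionRecord(String resourceClass, Map<String, String> defaults) {
        this.resourceClass = resourceClass;
        this.defaults = defaults;
    }
}

// Hypothetical in-memory model of the extension database.
class ExtensionIndexer {
    private final Map<String, ExtensionRecord> database = new HashMap<>();

    void define(String extension, String resourceClass,
                String attribute, String value) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put(attribute, value);
        database.put(extension, new ExtensionRecord(resourceClass, defaults));
    }

    // Assumption: the first extension that defines a class decides it.
    String resourceClassFor(List<String> extensions) {
        for (String extension : extensions) {
            ExtensionRecord entry = database.get(extension);
            if (entry != null && entry.resourceClass != null) {
                return entry.resourceClass;
            }
        }
        return null; // no class: the file is not indexed in this sketch
    }

    // Merges the defaults of every known extension, in extension order.
    Map<String, String> defaultsFor(List<String> extensions) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (String extension : extensions) {
            ExtensionRecord entry = database.get(extension);
            if (entry != null) {
                merged.putAll(entry.defaults);
            }
        }
        return merged;
    }
}

class ExtensionIndexerDemo {
    public static void main(String[] args) {
        ExtensionIndexer indexer = new ExtensionIndexer();
        indexer.define("html", "w3c.jigsaw.resources.FileResource",
                "content-type", "text/html");
        indexer.define("en", null, "content-language", "en");
        indexer.define("gz", null, "content-encoding", "x-gzip");

        List<String> extensions = Arrays.asList("en", "html", "gz");
        System.out.println(indexer.resourceClassFor(extensions));
        System.out.println(indexer.defaultsFor(extensions));
    }
}
```

For foo.en.html.gz, the class comes from the html record alone, while all three records contribute one default attribute value each.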

The directory templates database

When the factory is called to index a directory, it examines its directory templates database. This database allows the web admin to map directory names to specific sub-classes of resources. Directory templates can be generic, in which case they also apply to all directories below the named one.

For each directory template, the web admin first specifies an appropriate resource class. A typical setting might specify, for example, that all directories named Putable should be exported by an instance of PutableDirectory. Moreover, if the template is flagged as generic, then all directories below the Putable directory will also be exported as PutableDirectory instances.

The class attached to a directory template need not be a sub-class of DirectoryResource. You can specify, for example, that directories named CVS should be exported through a CvsDirectoryResource, which will provide you with a form-based interface to CVS.
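The effect of the generic flag can be illustrated with a small sketch. DirectoryTemplate and DirectoryTemplates are hypothetical names, and two points are assumptions: a template matching the directory's own name takes precedence over a generic template inherited from above, and the nearest generic ancestor wins. What the factory does when no template applies is not modelled here, so the sketch returns null in that case.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical directory template: a class name and the generic flag.
class DirectoryTemplate {
    final String resourceClass;
    final boolean generic;

    DirectoryTemplate(String resourceClass, boolean generic) {
        this.resourceClass = resourceClass;
        this.generic = generic;
    }
}

// Hypothetical in-memory model of the directory templates database.
class DirectoryTemplates {
    private final Map<String, DirectoryTemplate> byName = new HashMap<>();

    void define(String directoryName, String resourceClass, boolean generic) {
        byName.put(directoryName, new DirectoryTemplate(resourceClass, generic));
    }

    // path lists directory names from the root down to the directory being
    // indexed. Returns the class name to use, or null if no template applies.
    String resourceClassFor(List<String> path) {
        if (path.isEmpty()) {
            return null;
        }
        // Assumption: a template for the directory's own name comes first.
        DirectoryTemplate own = byName.get(path.get(path.size() - 1));
        if (own != null) {
            return own.resourceClass;
        }
        // Otherwise look for the nearest enclosing generic template.
        for (int i = path.size() - 2; i >= 0; i--) {
            DirectoryTemplate ancestor = byName.get(path.get(i));
            if (ancestor != null && ancestor.generic) {
                return ancestor.resourceClass;
            }
        }
        return null;
    }
}

class DirectoryTemplatesDemo {
    public static void main(String[] args) {
        DirectoryTemplates templates = new DirectoryTemplates();
        templates.define("Putable", "PutableDirectory", true);
        templates.define("CVS", "CvsDirectoryResource", false);

        System.out.println(
                templates.resourceClassFor(Arrays.asList("WWW", "Putable", "drafts")));
        System.out.println(
                templates.resourceClassFor(Arrays.asList("WWW", "CVS", "old")));
    }
}
```

The first call resolves to PutableDirectory because the Putable template is generic; the second finds no template, because the CVS template applies only to directories named CVS.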

Configuring the factory

Configuring the Jigsaw factory consists of editing the extensions and directory templates databases. This can be done entirely through forms. This section describes how this works; you might also want to check the configuration tutorial.

The Jigsaw release comes with a sample root directory that includes an Admin directory. This directory, in turn, provides two resources that allow you to edit the factory configuration databases through forms. The first one, usually named extensions, lets you edit the extensions database. The second one, DirectoryTemplates, does the same for the directory templates database.

Point your browser to your /Admin/extensions URL. This will show the sorted list of currently defined extensions. To remove an extension record, mark it by clicking on its check box, and press the OK button: the extension record is deleted from the database. To edit a particular extension record, click on it. This will bring up a form containing all the default attribute values for the extension. This form changes depending on the class that you have attached to the extension (an extension with no class applies to all resources, so its form lets you edit the HTTPResource attribute values). You can change any of these values, which will be provided as default attribute values for resources wrapping a file that matches this particular extension.

To define new extensions, click on the /Admin/extensions AddExtension link. This will pop up a form asking you for the extension name and the (optional) attached class. Let's say you want to define the extension ps for exporting application/postscript files. Type in the name of the extension (here ps), attach the w3c.jigsaw.resources.FileResource class to it, then click on the OK button. This will pop up the attribute editor. There, set the default value of content-type to application/postscript, and press the OK button. You are done: all files having the ps extension will be exported through a FileResource whose default value for the content-type attribute will be application/postscript.

Now, let's create some directory templates. Point your browser to /Admin/DirectoryTemplates. This will display the sorted list of currently defined templates. To remove a directory template, just mark it (by clicking its check box), and press the OK button. To edit the attributes of a directory template, click on its name. This will display the set of attributes for the directory template itself. If you want the template to be generic, set its generic flag to true (it will then apply to directories having the given name, but also to all directories below them).

You will also see a link named ShadowAttributes. By following this link, you will be able to edit the default attribute values for the resource to be created when this template is used. For example, if your template is attached to the DirectoryResource class, this will allow you to edit the default attributes of this resource class.



Anselm Baird-Smith
$Id: indexer.html,v 1.7 1997/02/06 22:45:24 abaird Exp $