Creating new profile

Profiles define which web servers and pages the crawler indexes, and how. To create a new profile, select New profile wizard and follow the online instructions.

The basic configuration variables for a profile are described below. The more advanced variables are described later, on the Advanced profile configuration page.

Profile id
A unique identifier for the profile. It should be a short, descriptive text and must not contain any spaces. For example: my_profile.
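
A minimal sketch of checking a proposed id against the two stated rules (non-empty, no spaces) before typing it into the wizard. The function name is hypothetical and this is not IntraSeek's own validation code; it also rejects tabs and other whitespace, which is slightly stricter than "no spaces".

```python
def is_valid_profile_id(profile_id: str) -> bool:
    """Hypothetical check: the id must be non-empty and contain no whitespace."""
    # any() scans every character; str.isspace() covers spaces, tabs and newlines.
    return bool(profile_id) and not any(ch.isspace() for ch in profile_id)


print(is_valid_profile_id("my_profile"))   # True
print(is_valid_profile_id("my profile"))   # False: contains a space
```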

Profile name
The profile name is shown to the outside world on the selection tab of the search page. For example: My test search.

Activated
Should be left at yes for now.

Storage directory
The path to a directory in your file system, ending with "/". By default it points to a directory that IntraSeek created automatically for storing the databases, but you can change it to any path in the file system.

Working directory
The path to a directory in your file system, ending with "/". This is where the crawlers store the data they gather. Because of the way the database works, this directory should preferably be on a fast disk; this can speed up the process by several hundred per cent.
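
Both directory fields expect a trailing "/". If you generate these values from a script, a small helper like the one below can normalize them. The function name and the example path are hypothetical; the sketch only enforces the trailing-slash convention described above.

```python
def as_directory_path(path: str) -> str:
    """Return the path with exactly one trailing '/' (hypothetical helper)."""
    if not path:
        raise ValueError("directory path must not be empty")
    # Strip any trailing slashes, then add back exactly one.
    return path.rstrip("/") + "/"


print(as_directory_path("/usr/local/intraseek/work"))    # /usr/local/intraseek/work/
print(as_directory_path("/usr/local/intraseek/work//"))  # /usr/local/intraseek/work/
```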

Startpages
Specifies a set of pages for the crawler to start at. It is usually sufficient to give the URL of the main page of the site you are about to index, since an IntraSeek crawler follows all links it finds. Put each URL on a separate line. For example: http://my.server.com/~sysadm/
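
A crawler that follows every link it finds reaches everything reachable from the start page, which is why one start URL is usually enough. The sketch below illustrates this with a generic breadth-first traversal; it is not IntraSeek's crawler. The `links` table is a hypothetical stand-in for fetching and parsing real pages, the `allowed` predicate stands in for the accept and avoid patterns, and all URLs except the manual's example are made up.

```python
from collections import deque


def crawl(start_pages, links, allowed):
    """Breadth-first traversal from the start pages, following every link.

    links:   dict mapping a URL to the URLs that page links to
             (a stand-in for downloading and parsing the page).
    allowed: predicate deciding whether a URL may be visited.
    """
    seen = set()
    order = []
    queue = deque(start_pages)
    while queue:
        url = queue.popleft()
        if url in seen or not allowed(url):
            continue
        seen.add(url)
        order.append(url)
        # Follow all links on the page; the seen set prevents revisits.
        queue.extend(links.get(url, []))
    return order


site = {
    "http://my.server.com/~sysadm/": [
        "http://my.server.com/~sysadm/a.html",
        "http://elsewhere.example/",
    ],
    "http://my.server.com/~sysadm/a.html": ["http://my.server.com/~sysadm/"],
}
pages = crawl(
    ["http://my.server.com/~sysadm/"],
    site,
    lambda u: u.startswith("http://my.server.com/"),
)
print(pages)
```

Without the `allowed` check, the traversal would also visit `http://elsewhere.example/` and anything it links to, which is the risk described under Accept pattern.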

Accept pattern
Specifies which pages the crawler accepts. There are some very important things to consider here:

    1. Always limit the crawler to your own site. Otherwise it will, without any warning, crawl out onto the World Wide Web.

    2. Since the accept and avoid patterns really are regexps, write ^http://www.foo.com/* rather than www.foo.com/* if you want to make sure that http://gazonk.www.foo.com/ is not indexed.

    3. Put each accept pattern on a separate line. For example: my.server.com/~webmaster/*.
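
The effect of the `^` anchor in point 2 can be checked with any regexp engine. This sketch uses Python's `re` module purely for illustration (IntraSeek does not necessarily use the same regexp dialect) and assumes, as the gazonk example implies, that a pattern counts as matching if it matches anywhere in the URL.

```python
import re

url_outside = "http://gazonk.www.foo.com/"
url_inside = "http://www.foo.com/docs/index.html"

loose = "www.foo.com/*"           # unanchored: may match anywhere in the URL
strict = "^http://www.foo.com/*"  # anchored to the start of the URL

# re.search looks for a match anywhere in the string, so the loose
# pattern also matches the gazonk host.
print(bool(re.search(loose, url_outside)))   # True
print(bool(re.search(strict, url_outside)))  # False
print(bool(re.search(strict, url_inside)))   # True
# In regexp syntax "." matches any character, not just a literal dot.
print(bool(re.search(strict, "http://wwwXfoo.com/")))  # True
```

The last line shows a related subtlety: in standard regexp syntax `.` matches any character, so writing the dots as `\.` would make the pattern stricter still, assuming IntraSeek's dialect treats `.` the same way.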

Avoid pattern
Specifies which pages the crawler avoids. By default, this lists file types containing information the crawler cannot index. If you want the crawler to index any of these file types, remove them from the list.

For example, if you specify */~webmaster/non-public/ here, the crawler will avoid ~webmaster/non-public/ on all servers. If you specify *my.server.com/~root/*, /~root/ will not be indexed on the server my.server.com.

Check for URLs with arguments, such as calls to CGI scripts. Some of these, directory listings for instance, can send the crawler into an infinite loop. If your site has such pages, it is recommended that you add *?* here.
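
The accept and avoid examples above can be tried out with the sketch below. It models the patterns with a deliberately simplified, hypothetical matcher: `*` matches any run of characters, a leading `^` anchors the pattern at the start of the URL, every other character is literal, and a pattern matches if it occurs anywhere in the URL. It also assumes that a URL is crawled when at least one accept pattern matches and no avoid pattern does. This is enough to reproduce the examples in this section, but it is not IntraSeek's actual matching code.

```python
import re


def compile_pattern(pattern: str) -> re.Pattern:
    """Hypothetical simplified matcher: '*' = any characters, leading '^' anchors."""
    anchored = pattern.startswith("^")
    if anchored:
        pattern = pattern[1:]
    # Escape the literal pieces and join them with '.*' where each '*' was.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(("^" if anchored else "") + body)


def parse_field(text: str) -> list:
    """One pattern per line, as in the profile form; blank lines are ignored."""
    return [compile_pattern(line.strip()) for line in text.splitlines() if line.strip()]


def should_crawl(url: str, accept: list, avoid: list) -> bool:
    """Crawl only if some accept pattern matches and no avoid pattern does."""
    return any(p.search(url) for p in accept) and not any(p.search(url) for p in avoid)


accept = parse_field("^http://my.server.com/*")
avoid = parse_field("""*/~webmaster/non-public/
*my.server.com/~root/*
*?*""")

print(should_crawl("http://my.server.com/~sysadm/", accept, avoid))           # True
print(should_crawl("http://my.server.com/~root/notes.html", accept, avoid))   # False
print(should_crawl("http://my.server.com/cgi-bin/ls?dir=/tmp", accept, avoid))  # False
```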

While the crawler is running, check its log file regularly to make sure it does not go into a loop, run amok, or otherwise misbehave.

Finally, on the last page of the New profile wizard, press OK to save the new profile. Technical notes: all profiles are saved in the text file ENGINE_HOME/profiles.txt. If no id is specified, a new unique id is generated.