The Store Layer

This layer implements the resource store with all of the handlers. I describe the modules in roughly logical order.

The Store Module

This is the front door to the resource store. It exports one function for the HTTP protocol module to deliver a request.

and deliver req =
let
    val Req.Request  {url, ...}  = req
    val URL.HTTP_URL {path, ...} = url

    val msg = Node.HTTPRequest {
                request = req,
                segs    = Cfg.URLPathToList path
                }
in
    Node.send (get_root()) msg
end

The request is packaged along with an abstraction of the list of segments in the path of the URL. This segment list is used for routing the message through the store. Then the message is sent on to the root node.

A node is implemented by a backbone thread as described in the section called Resource Store Nodes in Chapter 8. Here is the type definition.

datatype Node = Node of {
        name:       string,
        in_mbox:    NodeMsg Mailbox.mbox
        }

The input to the node is a mailbox which has unlimited buffering. This prevents a slow resource node from congesting the tree above it. I am relying on the server's connection limit to prevent these mailboxes from filling with a huge number of messages if a resource becomes congested. Each node also has a name which is the same as the last segment of the path that leads to the node. Message routing is done by comparing the segments in the message with the names of the nodes.

The root node is stored as a singleton object. The resource tree is built on demand by the build_node_tree function when the first request arrives.

and build_node_tree() : Node.Node =
let
    fun add_paths (c as Cfg.NodeConfig {path, ...}) = (path, c)

    val all_configs = Cfg.getNodeConfigs()
    val with_paths  = map add_paths all_configs
    val root        = build_level [] (Node.initOptions false) with_paths
in
    root
end

The build_node_tree function is just a wrapper around the build_level function which recurses to build all of the levels of the tree.

and build_level path options config_pairs : Node.Node =
let
    val () = Log.testInform Globals.TestStoreBuild Log.Debug
                (fn() => TF.L ["Installing resource ",
                               Cfg.listToString path]);

    type Pair = (string list * Cfg.NodeConfig)

    val table: Pair list STRT.hash_table = STRT.mkTable(101, NotFound)

    fun add (remainder, config) =
    (
        case remainder of
          [] => ()

        | (h::t) => 
        (
            case STRT.find table h of
              NONE     => STRT.insert table (h, [(t, config)])
            | SOME lst => STRT.insert table (h, (t, config)::lst)
        )
    )

    (*  If there is no config for this node then we fake a node
        that always rejects an attempt to access it.

        There may not be a config if this level is an intermediate segment
        of a config path.  E.g. node /cgi/env {...}
    *)
    val self_config =
        case Cfg.findNodeConfig path of
          NONE   => U.builtinConfig path "reject"
        | SOME c => c


    (*  Compute the option flags for this node.
    *)
    val self_options =
        let
            val Cfg.NodeConfig {options = formula, ...} = self_config
        in
            Node.performFormula options formula
        end


    fun build (name, pairs) =
    let
        val prefix = path @ [name]
    in
        build_level prefix self_options pairs
    end

    val () = app add config_pairs;
    val items: (string * Pair list) list = STRT.listItemsi table
    val children = map build items

    val the_node =
            NodeFactory.create {
                config    = self_config,
                children  = children,
                options   = self_options
                }
in
    getOpt(the_node, make_rejecter path)
end

The path argument is the path down to the node being built. The options argument is a set of node options derived from the options in the server configuration file. The inheritance of options between nodes is performed during this building process.

The config_pairs argument is a list of node configurations for all nodes that will be in the sub-tree below the node being built. The path from each configuration has been separated out into a pair with the configuration. The path is in the form of a list of strings and is relative to the node being built. So for example for a node with path /a/b/c at the initial call to build_level, the pair will be (["a", "b", "c"], config).

The algorithm is to sort the node configurations according to the first part of their paths and group them according to this part. For example for the paths /a/b/c, /a/d and /e I want these groups:

"a"     ->      [ (["b", "c"], config1), (["d"], config2) ]
"e"     ->      [ ([], config3) ]

This tells me that there will be two child nodes named "a" and "e". The "a" node will in turn have children with sub-paths of b/c and d. The "e" node will have no children. I've implemented the grouping using a hash table. It maps from the leading part to a list of those pairs that share the path. The add function inserts each configuration pair into the table. The listItemsi function extracts all of the entries from the table in the form of pairs of key and value where the key is the leading part and the value is the list of pairs. These are the groups. For each group I build a child node by recursing into build_level.

Once I have the child nodes I can build the node in question by calling the node factory (see the section called The Node Factory). The configuration for the node, in self_config, is looked up from the path being built. This checks that the path corresponds to a real node. For example if the server's configuration only contains the paths /a and /a/b/c then there is no node that corresponds to the path /a/b and it won't have a configuration. In this case a dummy node is built for the path /a/b that just rejects all requests to it. The builtinConfig function creates a dummy configuration for the rejecting node.

and make_rejecter path =
let
    val node =
        NodeFactory.create {
            config    = U.builtinConfig path "reject",
            children  = [],
            options   = Node.initOptions false
            }
in
    case node of
      NONE => (     (* something wrong if we can't do this *)
        Log.fatal ["Cannot create a rejecter node."];
        Common.fail()
        )

    | SOME n => n
end

If the factory fails to create the node and returns NONE then I try again to make a node with the dummy rejecting configuration. If this fails then the server is not able to make any kind of node and it gives up with a fatal error.

The Node Factory

The NodeFactory module encapsulates the creating of different kinds of resource nodes. The factory function is described by the following type from the Node module.

type NodeCreator = {
        config:     Config.NodeConfig,  (* URL path that reaches this node. *)
        children:   Node list,          (* child nodes *)
        options:    Options
        } ->
        Node option

It creates a node given its configuration and its children (so the tree of nodes is built from the bottom up). The options are flags that are derived from the node configuration (see the section called The Node Parameters in Chapter 8). The caller is responsible for performing the inheritance of options from parent nodes, for example see the store building in the section called The Store Module).

datatype Options = Options of {
        exec_cgi:       bool,
        follow_sym:     bool,
        with_subdirs:   bool
        }

It is in the NodeFactory module that the different kinds of nodes are assembled.

structure DirNode = GenericNodeFn(
                structure Handler = DirNodeHandler)

structure CgiNode = GenericNodeFn(
                structure Handler = CgiNodeHandler)

structure SimpleBuiltinNode = GenericNodeFn(
                structure Handler = SimpleBuiltinHandler)

Each kind is assembled from the generic node functor, which implements the backbone thread, and the handler which implements the handler thread. This allows the backbone thread to be specialised to the needs of the handler (see the section called The Generic Node).

Here is the create function of the factory.

fun create {config, children, options} =
let
    val Cfg.NodeConfig {path, kind, ...} = config

    val () = Log.testInform Globals.TestStoreBuild Log.Debug
            (fn() => TF.L ["Creating node ", Cfg.listToString path]);

    (*  This name is used for locating children from path segments.
        See GenericNodeFn.forward_child.
    *)
    val node_name = if null path then "/" else List.last path
in
    case kind of
      Cfg.NodeIsDir {path} =>
        DirNode.create {
            name     = node_name,
            arg      = path,
            config   = config,
            options  = options,
            factory  = create,
            children = children
            }

    | Cfg.NodeIsBuiltin {name} =>
        SimpleBuiltinNode.create {
            name     = node_name,
            arg      = name,
            config   = config,
            options  = options,
            factory  = create,
            children = children
            }

    | Cfg.NodeIsScript {path} =>
        CgiNode.create {
            name     = node_name,
            arg      = path,
            config   = config,
            options  = options,
            factory  = create,
            children = children
            }
end

What is note-worthy here is that the factory is itself passed down to the create functions for each kind of node. This gets around a problem of circular dependencies between modules. The directory node handler needs to be able to create nodes on demand for the sub-directories that it encounters. So it wants to call the factory. But the factory must be able to call the create function for the directory node. My solution to this is to pass the factory's create function down to the directory node handler so that it can call it back (call up).

The Generic Node

The GenericNodeFn generic node functor exports a create function to create a store node. Each kind of node takes its own type of extra arguments when creating a node. For example the CGI node needs the path to the CGI script. This means that the type of the create function varies with the kind of handler. This tells me to use a functor and specialise it with the handler.

functor GenericNodeFn(
    structure Handler: NODE_HANDLER
    ): GENERIC_NODE =
struct

Here is the signature that a handler module must export. The comments say it all.

signature NODE_HANDLER =
sig

    (*  A value of this type is passed to the create function for the
        handler.
    *)
    type CreateArg

    (*  Create the thread for the handler. Optionally a new
        node configuration can be returned to update the original
        configuration.  All security-related initialisation must be
        done in here so that the master node will be blocked until
        it is ready.

        The caller should be prepared to handle exceptions from here if
        the creation fails.
    *)
    val init:   CreateArg -> Node.HndMbox * (Config.SwerveConfig option)


    (*  This tests if the handler will take the last segment
        of the URL path. For example a directory node wants
        the last segment as a file name.
    *)
    val canTakeLast:    Config.NodeConfig -> bool


    (*  This tests if the handler will take all of the
        rest of the URL path if there are no child nodes.
    *)
    val canTakeRest:    Config.NodeConfig -> bool

end

Communication between the backbone and handler threads is done through a mailbox, which has unlimited buffering of messages. This ensures that a slow handler can't cause congestion by having messages back-up into the tree of backbone threads. The mailbox is created by the handler thread and returned to the backbone thread from the init function. Here are the types for a message to a handler and the reply. The protocol is described in more detail below.

datatype HndMsg = HndReq of {
            factory:    NodeCreator,
            config:     Config.NodeConfig,
            options:    Options,
            request:    HTTPMsg.Request,
            segs:       string list,    (* remaining path segments *)
            rchan:      HndReply CML.chan
            }
and HndReply =
        HndResponse of HndMsg * HTTPMsg.Response 
    |   HndSprout   of HndMsg * Node 

Directory nodes can have extra configuration parameters supplied in a .swerve file in the directory (see the section called The Node Parameters in Chapter 8). Typically these are authorisation parameters. If the init function for a directory node returns a SwerveConfig record then the generic code will incorporate it into its node information.

Here is the generic create function. (The "H" structure is an abbreviation for the Handler argument to the functor).

fun create {name, arg, config, factory, options, children} =
let
    val Cfg.NodeConfig {auth, ...} = config

    val in_mbox = M.mailbox()
    val node    = Node {name = name, in_mbox = in_mbox}

    val (h_mbox, h_config_opt) = H.init arg

    (*  Update the options and authorisation from the node
        handler.
    *)
    val (final_auth, final_options) =
        case h_config_opt of
          NONE     => (auth, options)
        | SOME cfg => merge_config cfg auth options

    val impl = NodeImpl {
                    name     = name,
                    in_mbox  = in_mbox,
                    hnd_mbox = h_mbox,
                    hnd_reply= CML.channel(),
                    config   = config,
                    auth     = final_auth,
                    options  = final_options,
                    factory  = factory
                    }

    val gstate = GenState {
                    children = children
                    }
in
    CML.spawn (node_server impl gstate);
    SOME node
end
handle x => (Log.logExn x; NONE)

The function produces three results. There is static data about a node in the NodeImpl record.

datatype NodeImpl = NodeImpl of {
    name:           string,
    in_mbox:        NodeMsg M.mbox,
    hnd_mbox:       HndMbox,            (* msgs to the handler *)
    hnd_reply:      HndReply CML.chan,  (* replies from the handler *)
    config:         Cfg.NodeConfig,     (* the original *)
    auth:           Cfg.NodeAuth,
    options:        Node.Options,
    factory:        Node.NodeCreator
    }

The options and auth fields may differ from the original configuration if the contents of a .swerve file was merged in.

The second result is the dynamic state for a node. At the moment this only contains the list of children for a node. Since directory nodes create sub-directory nodes on demand the list of children may change.

and GenState = GenState of {
    children:       Node list
    }

The third result is the Node value itself which is the input interface to the node. The static and dynamic node data are retained by the backbone thread (running the node_server function). The Node value is returned to the caller.

The server for the backbone thread is a simple dispatcher. It received messages from the parent node through its input mailbox and also receives replies from the handler thread.

and node_server impl gstate () =
let
    val NodeImpl {in_mbox, hnd_reply, ...} = impl

    fun loop (state: GenState) =
    let
        val new_state =
            CML.select[
                CML.wrap(M.recvEvt in_mbox,
                    MyProfile.timeIt "GenNode request"
                        (handle_request impl state)),

                CML.wrap(CML.recvEvt hnd_reply, handler_reply impl state)
                ]
    in
        loop new_state
    end
in
    loop gstate
end

Here is the body of the function for handling a request from the parent node.

and handle_request impl gstate in_msg : GenState =
let
    val HTTPRequest {request, segs} = in_msg

    val NodeImpl {name, config, factory, options,
                    hnd_mbox, hnd_reply, auth, ...} = impl

... omitted material ...
in
    NodeAuth.checkAuth auth request;    (* raises Respond on error *)

    case segs of
      []          => handle_it()
    | (key::rest) => forward_child key rest;

    gstate
end
handle
  Respond resp =>
    let
        val HTTPRequest {request, ...} = in_msg
    in
        U.return request resp;
        gstate
    end

| x =>
    let
        val HTTPRequest {request, ...} = in_msg
    in
        Log.logExn x;
        U.return request (U.mkServerFail());
        gstate
    end

The authorisation is always checked first. If this fails then a Node.Respond exception will be raised. This contains a HTTP response describing the failure. The exception handler passes the response back to the HTTP protocol handler using the ResponseUtils.return utility function.

If the request is authorised then it is time to see if it is destined for this node or a child node. If the list of remaining segments is empty then the request has reached its target node. The handle_it function passes the request to the handler thread.

fun handle_it() =
let
    val () = Log.testInform G.TestStoreProto Log.Debug
                (fn() => TF.L ["Node ", name, " handles it"])

    val msg = HndReq {
                factory     = factory,
                config      = config,
                options     = options,
                request     = request,
                segs        = segs,
                rchan       = hnd_reply
                }
in
    M.send(hnd_mbox, msg)
end

If the segment list is not empty then it may be destined for a child node. This is dealt with in the forward_child function.

and forward_child key rest =
let
    val () = Log.testInform G.TestStoreProto Log.Debug
        (fn() => TF.L [
            "Forwarding to child for key ", key,
            ", rest=", Cfg.listToString rest])

    val GenState {children, ...}  = gstate

    fun match (Node {name, ...}) = (name = key)
in
    case List.find match children of
      NONE       => no_child key rest
    | SOME child => pass_to child request rest 
end



and no_child key rest =
let
    val () = Log.testInform G.TestStoreProto Log.Debug
        (fn() => TF.L [
            "No child for key ", key,
            ", rest=", Cfg.listToString rest])
in
    if H.canTakeRest config orelse
        (null rest andalso H.canTakeLast config)
    then
        handle_it()
    else
        U.return request (U.mkNotFound())
end

If the head of the list matches the name of a child node then the request will be routed down to the child node along with the tail of the segment list.

If the request doesn't match a child node then it could be that the handler wants to trap it anyway. For example if the URL path is "/a/b" and "/a" is a directory then "b" may be a file in the directory. I don't create distinct resource nodes for each file. Instead the directory node handles all of the files in the directory. Directory handlers trap file names by returning true from the canTakeLast function. Other kinds of nodes may want to perform their own intepretation of the trailing part of a path, in which case they will return true from the canTakeRest function. The directory node also does this so that it can follow sub-directories. If neither a child node nor the handler wants the message then a "404 Not Found" response is generated.

Note carefully that an existing child node takes precedence when routing. A URL path will only be passed to a handler if there is no child node that could take it. The directory node handler in the section called The Directory Node Handler expects this. It may decide to create a new child node to implement a sub-directory. Subsequent requests for this sub-directory must be routed through to the child node.

Replies from the handler thread are dealt with in the handler_reply function.

and handler_reply impl gstate reply : GenState =
let
    val NodeImpl {name, config, factory, options, ...} = impl
in
    case reply of
      HndResponse (h_req, resp) =>
        let
            val HndReq {request, ...} = h_req
        in
            U.return request resp;
            gstate
        end

    | HndSprout (h_req, child) =>
        let
            val HndReq {request, segs, ...} = h_req

            val GenState {children} = gstate

            val new_gstate = GenState {
                    children = child::children
                    }

            val rest = if null segs then [] else tl segs
        in
            pass_to child request rest;
            new_gstate
        end

end

A reply from a handler could be either a response to a HTTP request or a request to sprout a new child node. HTTP responses are shipped off immediately to the HTTP protocol handler.

The directory node handler creates new child nodes for sub-directories on demand. These have to be added to the resource node tree. The HndSprout reply from the handler tells the backbone thread to add the child to the list of children for the node. The original HTTP request is then passed along to the new child node. Passing a HTTP request to a child node is just a matter of re-packaging.

and pass_to node req rest : unit =
let
    val new_msg = HTTPRequest {
                    request = req,
                    segs    = rest
                    }
in
    Node.send node new_msg
end

The Directory Node Handler

This implements resources that map to regular files in directories. If the WithSubDirs option is enabled it effectively mounts a directory tree onto the URL resource tree.

The story starts with some types.

datatype State = State of {
        dir:    string      (* the directory path *)
        }
    
type CreateArg  = string    (* the directory path *)


(*  We can take the rest in order to try to create a chain of 
    child directories.
*)
fun canTakeRest _ = true
fun canTakeLast _ = true

I've included the directory's disk path into a state type although it is really static. It avoids having a separate type and value being passed through the rest of the code. The CreateArg type is required by the module interface. It is the type for the disk path argument to the create function used by the node factory (in the section called The Node Factory). The canTakeRest and canTakeLast functions always return true to catch all URL paths that reach the node as explained in the section called The Generic Node.

The handler is initialised with the init function.

fun init dir_path =
let
    val file = Files.appendFile dir_path ".swerve"

    val opt_config = 
        if Files.exists file
        then
            Cfg.processNodeFile file
        else
            NONE

    val state = State {dir = dir_path}
    val mbox  = M.mailbox()
in
    CML.spawn (server mbox state);
    (mbox, opt_config)
end

This reads the .swerve file in the directory, if it exists. Then it starts the handler thread. The .swerve configuration and a mailbox for the thread are returned to the backbone thread. The thread runs a trivial server function that dispatches incoming messages to the handle_request function. Here is the body of the function.

and handle_request 
        (msg as HndReq {factory, config, options, request, segs, rchan})
        state =
let
    val Cfg.NodeConfig {path = node_path, ...} = config
    val Req.Request  {url, abort, ...} = request
    val URL.HTTP_URL {path = url_path, ...} = url

... omitted material ...
    (*  If we are not at the end of a path then we can only
        try to sprout a child directory.
    *)
    and do_segs []           = index_dir()
    |   do_segs [file]       = do_file false file
    |   do_segs (file::rest) = do_file true file

in
    do_segs segs;
    state
end

The segs argument in the message is the list of trailing segments for the URL path. For example if the URL path is "/a/b/c" and the node implements the path "/a" then the segment list will be ["b", "c"]. So if the segment list is empty I have the case of a URL path that leads to the directory with no file name. An index of the directory will be generated by the index_dir function. The index could be either the index.html file if it exists or a listing of the files in the directory. Indexing is described below.

If the segment list contains one element then it is probably the name of a file in the directory although it could be the name of a sub-directory. If there is more than one element then the first must definitely be the name of a directory. The do_file function handles these cases.

fun do_file dir_only file =
let
    val State {dir, ...} = state
    val Options {follow_sym, with_subdirs, ...} = options

    val file_path = Files.appendFile dir file

    val () = Log.testInform Globals.TestStoreProto Log.Debug
        (fn() => TF.L ["Looking at file ", file_path]);
in
    if not follow_sym andalso Files.isSym file_path
    then
        bad()
    else
    if not dir_only andalso Files.isReg file_path
    then
    (
        if Files.readableReg file_path
        then
            reply_response(send_file file_path request)
        else
            reply_response(U.mkForbidden())
    )
    else
    if Files.isDir file_path
    then
    (
        if with_subdirs andalso Files.accessibleDir file_path
        then
            let
                val new_node_path = node_path @ [file]
            in
                case sprout_child factory new_node_path file_path of
                  NONE       => reply_response(U.mkServerFail())
                | SOME child => reply_sprout child
            end
        else
            reply_response(U.mkForbidden())
    )
    else
        bad()
end

Whether the file argument is a regular file or a directory, its path on disk is built from the disk path of the node's directory. Then we check what kind of file it really is. Symbolic links are followed by default when opening files so, if they are not to be, I must filter them out first. If the path is a regular file and this is reasonable, because there are no more URL segments after the file name and the server has permission to read the file, then I can send it to the client.

If the file is a directory and the WithSubDirs configuration option has been specified then the sub-directory must be automatically made into a new node in the resource store. The new node is created by the sprout_child function and then sent back to the backbone thread to be inserted into the resource tree. Then the backbone thread will route the original request to the new node and the story starts all over again.

and sprout_child factory node_path dir_path : Node.Node option =
let
    val options = Options {
        exec_cgi        = false,
        follow_sym      = false,
        with_subdirs    = true
        }

    val child_config = Cfg.NodeConfig {
            path    = node_path,
            kind    = Cfg.NodeIsDir {path = dir_path},
            options = [],           (* passed directly to the factory *)
            auth    = Cfg.NodeNoAuth
            }
in
    factory {
        config    = child_config,
        children  = [],
        options   = options
        }
end

The sprout_child function is a wrapper around a call to the node factory. The sub-directory doesn't have an entry in the server configuration otherwise the node would already exist in the tree and there would be no sprouting. So I have to synthesise a configuration that describes a directory node. At the moment I am not doing any inheritance of options in the synthetic configuration except that the WithSubDirs option must be true all the way down to get the entire tree under the sub-directory. Although I've specified NodeNoAuth for the authorisation, if the sub-directory has a .swerve file that specifies some authorisation control it will be merged in when the new node is created (see the create function in the section called The Generic Node).

Here is the send_file function that returns a regular file to the client.

and send_file file_path req : Req.Response =
let
    val () = Log.testInform Globals.TestStoreProto Log.Debug
        (fn() => TF.L ["dir_node sends file ", file_path]);

    fun file_response() =
    let
        val entity = Entity.Entity {
                    info = Entity.emptyInfo,
                    body = Entity.fileProducer file_path
                    }
    in
        Req.Response {
            status  = Status.OK,
            headers = [],
            entity  = entity
            }
    end

in
    if Files.readableReg file_path
    then
        file_response()
    else
        U.mkForbidden()
end

All it has to do is create an entity that represents the file on disk and wrap it into a HTTP response record. The Entity.fileProducer function makes a producer that can deliver from disk. The producer will fill in the file length and last modification date so I don't have to do it here.

The indexing of directories is controlled by the index_dir function mentioned above.

and index_dir() =
let
    val State {dir, ...} = state
    val Options {follow_sym, ...} = options
    val Cfg.ServerConfig {dir_index, ...} = Cfg.getServerConfig()

    val file_path = Files.appendFile dir dir_index

    val () = Log.testInform Globals.TestStoreProto Log.Debug
        (fn() => TF.L ["Indexing directory ", file_path]);
in
    if not follow_sym andalso Files.isSym file_path
    then
        bad()
    else
    if Files.isReg file_path
    then
        reply_response(send_file file_path request)
    else
        reply_response(fancy_index abort url dir)
end

The function looks to see if there is an index.html file in the directory. (The name index.html actually comes from the server configuration). If the file exists and is readable then it is returned. If the file does not exist then the contents of the directory is listed and formatted as HTML and returned (again if accessible). The result is similar to Netscape's directory indexing. This is done by the fancy_index function.

and fancy_index abort url dir : Req.Response =
let
    val URL.HTTP_URL {host, port, userinfo, path = url_path, ...} = url
    val URL.URLPath  {segs, absolute} = url_path


    fun build entries =
    let
        val text = TF.C [header(), translate entries, trailer()]
    in
        U.mkHTML Status.OK text
    end

... omitted material ...
in
    Log.testInform Globals.TestStoreProto Log.Debug (fn() => TF.L [
                "dir_node accessibleDir of ", dir, 
                " is ", Bool.toString(Files.accessibleDir dir)
                ]);

    if Files.accessibleDir dir
    then
    (
        (build(FileIO.listDir abort dir))
            handle _ => U.mkServerFail()
    )
    else
        U.mkForbidden()
end

The listing of the directory requires reading from a file descriptor so it must go through the Open File Manager and may be aborted by a time-out. The building of the HTML is a messy bit of text formatting using the TextFrag module. The code assumes that there is a "/icons" URL path in the server to fetch icons from. I'll omit the gory details.

The CGI Node Handler

This module handles requests that run CGI scripts. It conforms fairly closely to the CGI version 1.1 specification. The differences are

The CGI interface has been tested with some simple Perl scripts using the CGI.pm module.

The initialisation of the node is similar to that of the directory node in the section called The Directory Node Handler so I won't repeat it here. Instead the story starts with the handle_request function.

and handle_request 
        (msg as HndReq {config, rchan, request, segs, ...})
        script =
let
    val env  = build_environ config request (length segs)
    val resp = run_script script env request
in
    CML.send(rchan, HndResponse(msg, resp))
end

This is simple enough: build the set of environment variables; run the script and send back the response. The build_environ function is large and I'll describe it in pieces.

and build_environ config request num_left =
let
    val Cfg.NodeConfig {auth, ...} = config

    val Req.Request {url, headers, method, protocol, client, ...}
                                                      = request
    val URL.HTTP_URL {path, query, fragment, ...} = url
    val URL.URLPath {segs, ...} = path

    val script_path = URL.URLPath {
                        segs = List.take(segs, length segs - num_left),
                        absolute = false
                        }

    val trail_path = URL.URLPath {
                        segs = List.drop(segs, length segs - num_left),
                        absolute = false
                        }

    (*  Copy across approved variables.
    *)
    fun copy n =
    (
        case OS.Process.getEnv n of
          NONE   => NONE
        | SOME v => SOME(concat[n, "=", v])
    )

    val copied = List.mapPartial copy
                    ["PATH", "HOSTNAME", "LANG", "LOGNAME", 
                     "HOME", "LD_LIBRARY_PATH", "SHELL"]

This first section unpacks the arguments and copies variables out of the server's environment. Only those variables that are likely to be useful to a script and that are reasonably safe are copied. The mapPartial function suppresses variables that aren't set in the server's environment. The result is a list of strings of the form "name=value".

This next section adds in the unconditional CGI variables. (The ^ is the infix string concatenation operator).

val Cfg.ServerConfig {server_name, listen_port, ...} =
        Cfg.getServerConfig()

val basics = [
        "SERVER_NAME="     ^ server_name,
        "SERVER_PORT="     ^ (Int.toString listen_port),
        "SERVER_SOFTWARE=" ^ Globals.cgi_version,
        "REQUEST_METHOD="  ^ (Req.methodToString method),
        "SERVER_PROTOCOL=" ^ protocol,
        "GATEWAY_INTERFACE=CGI/1.1",
        "PATH_INFO="       ^ (URL.pathToString trail_path),
        "SCRIPT_NAME="     ^ (URL.pathToString script_path),

        (*  We don't set REMOTE_HOST, the script can find
            it if it wants.
        *)
        "REMOTE_ADDR="     ^ (NetHostDB.toString client)
        ]

The next section builds the optional variables. Each value is a list containing a single "name=value" string. The list is empty if the variable is not being set. The variables can then be easily merged by concatenating the lists.

val auth_env : string list =
        case auth of
          Cfg.NodeNoAuth  => []
        | Cfg.NodeBasic _ => ["AUTH_TYPE=Basic"]

val user_env : string list =
        case Hdr.getAuth headers of
          NONE => []
        | SOME (Hdr.AuthBasic (opt_id, pwd)) =>
        (
            case opt_id of
              NONE => []
            | SOME id => ["REMOTE_USER=" ^ id]
        )

val ctype_env : string list =
        case Hdr.getContentType headers of
          NONE       => []
        | SOME mtype => ["CONTENT_TYPE=" ^
                (TF.toString TF.UseLf (E.formatType mtype))]

val clen_env : string list =
        case Hdr.getContentLength headers of
          NONE     => []
        | SOME len => ["CONTENT_LENGTH=" ^ (Int.toString len)]

val query_env : string list =
        case query of
          NONE   => []
        | SOME q => ["QUERY_STRING=" ^ q]

Next all request headers that haven't been covered must be translated to CGI variables. The translation converts a header name such as "User-Agent" to the variable name HTTP_USER_AGENT. The header has to be reconstituted as a string to get the name. Finally the headers are joined together to build the complete list.

fun hdr_copy (Hdr.HdrAuthorization _) = NONE
|   hdr_copy (Hdr.HdrConLen _)        = NONE
|   hdr_copy (Hdr.HdrConType _)       = NONE
|   hdr_copy (Hdr.HdrChallenge _)     = NONE
|   hdr_copy (Hdr.HdrBad _)           = NONE
|   hdr_copy header = 
let
    (*  Find the initial colon, split off any white space after it.
        Header names become uppercase with hyphens mapped to
        underscores.
    *)
    val text = SS.all(TF.toString TF.UseLf (Hdr.formatHeader header))
    val (left, right) = SS.splitl (isntVal #":") text

    fun cvt #"-" = "_"
    |   cvt c    = str(Char.toUpper c)

    val ename  = SS.translate cvt left
    val evalue = SS.dropl Char.isSpace (SS.triml 1 right)
in
    SOME(concat["HTTP_", ename, "=", SS.string evalue])
end

val other_headers = List.mapPartial hdr_copy headers

val final_headers = List.concat[copied, basics, auth_env,
                      user_env, ctype_env, clen_env, query_env,
                      other_headers]

Here is the run_script function.

and run_script script env request : Req.Response =
let
    (*  The Aborted exception can be raised in here. *)
    val Req.Request {abort, ...} = request


    fun talk holder =
    let
        val (proc, _) = ExecReader.get holder
        val ()        = send_entity abort proc request
        val headers   = get_headers abort proc script

        (*  We don't pass these to the client.
            The last four are handled by the Entity Info.
        *)
        fun select (Hdr.HdrStatus _)        = false
        |   select (Hdr.HdrConType _)       = false
        |   select (Hdr.HdrConLen _)        = false
        |   select (Hdr.HdrConEnc _)        = false
        |   select (Hdr.HdrLastModified _)  = false
        |   select _ = true


        val status =
            case Hdr.getStatus headers of
              NONE   => Status.OK
            | SOME s => s


        (*  This includes error responses from the script.
        *)
        fun normal_response() =
        let
            val () = Log.testInform Globals.TestCGIProto Log.Debug
                            (fn()=>TF.S "CGI normal_response")

            val entity = Entity.Entity {
                            info = Hdr.toEntityInfo headers,
                            body = Entity.procProducer holder
                            }
        in
            Req.Response {
                status  = status,
                headers = List.filter select headers,
                entity  = entity
                }
        end
    in
        normal_response()
    end
    handle _ =>
        (
            kill (#1(ExecReader.get holder));
            U.mkServerFail()        (* REVISIT - should be ReqTimeout *)
        )

in
    (*  The holder will be closed in procProducer after the response
        body has been delivered.  If there is an error then the
        holder will eventually be finalised.
    *)
    case ExecReader.openIt abort (script, [], env) of
      NONE        => U.mkServerFail()  (* error already reported *)
    | SOME holder => talk holder
end

The forking and execing of the script is handled by the ExecReader module which is described in the section called The Open File Manager. This module waits for enough file descriptors before proceeding. It provides for finalisation to kill and reap the child if there is a time-out.

If the script is successfully started then the talk function sends any entity body to the stdin of the script. Then it reads the headers that come back from the script on stdout and constructs a normal response. (The normal_response function is a left-over of more complex code that I simplified). The status and entity-specific headers are separated out. An Entity value is constructed to represent the body that may or may not be still waiting on stdout to be read. The body won't be read until the response is being written to the socket of the connection, as described in the section called The Connection Protocol in Chapter 8. The get_headers function can raise the local Aborted exception if it detects an abort condition. I make an attempt to ensure that the child process is killed quickly rather than wait for finalisation.

Here is the send_entity function.

and send_entity abort proc request =
let
    val Req.Request {entity, ...} = request
    val (_, ostream) = Unix.streamsOf proc

    val consumer = CML.channel()

    val () = Log.testInform Globals.TestCGIProto Log.Debug
                (fn()=>TF.S "CGI send_entity")

    fun send_it() =
    (
        case CML.recv consumer of
          E.XferInfo _ => send_it()

        | E.XferBytes vec => 
            (
                TextIO.output(ostream, Byte.bytesToString vec);
                send_it()
            )

        | E.XferDone  => done()
        | E.XferAbort => done()
    )

    and done() =
    (
        TextIO.closeOut ostream
    )
in
    E.startProducer abort entity consumer;
    CML.spawn send_it;
    ()
end

The sending has to be done in a separate thread because there is no guarantee that the CGI script will even read its stdin let alone consume it all strictly before attempting to write to stdout. If the script doesn't read its stdin then the sending thread will block indefinitely and will eventually be caught by the garbage collector after the child process has been reaped and all files closed. To send the entity the thread acts as a consumer of the transfer procotol.

Here is the get_headers function.

and get_headers abort proc script =
let
    val (istream, _) = Unix.streamsOf proc
    val () = Log.testInform Globals.TestCGIProto Log.Debug
                (fn()=>TF.S "CGI get_headers")

    (*  This must match Connect.readLine.
        We strip the terminating \r\n.
    *)
    fun readLine() = 
    (
        if Abort.aborted abort
        then
            NONE
        else
        (
            case TextIO.inputLine istream of
              ""   => NONE

            | line =>
                let
                    val l = size line
                in
                    if l > 1 andalso String.sub(line, l-2) = #"\r"
                    then
                        SOME(String.substring(line, 0, l-2))
                    else
                    if l > 0 andalso String.sub(line, l-1) = #"\n"
                    then
                        SOME(String.substring(line, 0, l-1))
                    else
                        SOME line
                end
        )
    )


    (*  Log any bad headers and discard them. *)
    fun check [] out = out
    |   check ((Hdr.HdrBad h)::rest) out =
    (
        Log.error ["CGI ", script, " returned bad header: ", h];
        check rest out
    )
    |   check (h::rest) out = check rest (h::out)


    (*  Try to read some headers. This will return early on
        an abort.
    *)
    val headers = Hdr.readAllHeaders readLine
in
    if Abort.aborted abort
    then
        raise Aborted
    else
        check headers []
end

This is mainly a wrapper around the common Hdr.readAllHeaders function. The messy bit is emulating the handling of CR-LF that the Connect module does. If a time-out happens while the script is running then it is most likely to be detected while waiting for the headers. I check for an abort condition before each header line and after the headers have been read. The Aborted exception breaks out of the run_script function.

The Builtin Node Handler

This module implements some simple built-in kinds of nodes. They are used for testing. If you were to use the server as a front-end for an SML application then the interface between the server and the application would be modeled on this module.

The code in this module is fairly generic. It consists of a framework for running a function that creates the response to a request. This is shown here.

and handle_request 
        (msg as HndReq {config, rchan, request, ...})
        =
let
    val Cfg.NodeConfig {kind, ...} = config

    fun reply response =
    (
        CML.send(rchan, HndResponse(msg, response))
    )
in
    case kind of
      Cfg.NodeIsBuiltin {name} =>
    (
        case get_maker name of
          NONE   => reply (U.mkServerFail())
        | SOME f => reply (f request)
    )

    | _ => raise InternalError "SimpleBuiltin,handleRequest"
end



and get_maker name =
(
    case name of
      "hw"      => SOME (fn _ => U.mkHelloWorld())
    | "reject"  => SOME (fn _ => U.mkNotFound())
    | "sleep"   => SOME sleep
    | _         => NONE
)

The get_maker function selects a response-building function depending on the kind of the built-in node as specified in the node's configuration. The hello world and reject nodes return fixed responses. The sleep node delays for a number of seconds specified by the value in the query. If you configure a node as follows:

Node /sleep
{
    #  Pass a timeout as a query e.g. /sleep?3
    BuiltIn = "sleep";
}

then the URL http://.../sleep?3 will return a response 3 seconds later. Here is the sleep function.

and sleep request =
let
    val Req.Request  {url, abort, ...} = request
    val URL.HTTP_URL {query, ...} = url

    val timeout = 
        case query of
          NONE => 1
        | SOME q => getOpt(Int.fromString q, 1)

    val t_evt = CML.timeOutEvt(Time.fromSeconds(Int.toLarge timeout))
in
    CML.select[
        CML.wrap(t_evt, fn _ => ()),
        CML.wrap(Abort.evt abort, fn _ => ())
        ];

    U.mkHTML Status.OK (TF.L [
            "<html><body><p>", 
            "Slept for ", Int.toString timeout, " seconds",
            "</body></html>"])
end

First it gets the time-out from the query string or defaults to 1 second if it isn't available or readable. Then it uses a CML.select to wait for the desired time-out. This must also abort on a request time-out. The mkHTML function builds a simple response containing some HTML. See the section called The ResponseUtils Module. Note that this handler is single-threaded. So if two requests come in at the same time the second one will start its delay after the first one has finished. The handler should process these requests concurrently but I only use this function for testing at the moment so I'm not bothered.

The ResponseUtils Module

This module contains a collection of miscellaneous functions, mainly for creating HTTP responses. Here's a simple one that returns plain text. It uses a textProducer to deliver the entity body out of a string in memory. TF is the TextFrag module described in the section called The Text Module.

and mkHelloWorld() : Req.Response =
let
    val info = Entity.Info {
                etype       = SOME (Entity.simpleType "text" "plain"),
                encoding    = NONE,
                length      = NONE,
                last_mod    = NONE
                }

    val entity = Entity.Entity {
                info = info,
                body = Entity.textProducer(TF.C
                            [TF.S "hello world", TF.Nl])
                }
in
    Req.Response {
        status  = Status.OK,
        headers = [],
        entity  = entity
        }
end

This next one has a little help to generate HTML.

and mkForbidden() =
(
    mkHTML Status.Forbidden 
        (lines_to_text [
            "<html><body>", 
            "<em>Access Denied</em>",
            "</body></html>"
            ])
)


and lines_to_text lst =
(
    TF.C(map (fn l => TF.C [TF.S l, TF.Nl]) lst)
)



and mkHTML status frag =
let
    val info = Entity.Info {
                etype       = SOME (Entity.simpleType "text" "html"),
                encoding    = NONE,
                length      = NONE,
                last_mod    = NONE
                }

    val entity = Entity.Entity {
                info = info,
                body = Entity.textProducer frag
                }
in
    Req.Response {
        status  = status,
        headers = [],
        entity  = entity
        }
end

The NodeAuth Module

This module checks the client's credentials. It only uses the Basic authorisation type. The implementation is quite simple-minded. The user name and password are looked up each time by reading through the authorisation files. This could be done more efficiently by caching the file contents in memory. But then I would have to have some control mechanism to reload the cache if I change a password or add a user.

The interface is a single function.

fun checkAuth auth (req: Req.Request) =
let
in
    case auth of
      Cfg.NodeNoAuth    => ()       (* pass *)
    | Cfg.NodeBasic au  => validate_basic au req
end

If the authorisation fails this function constructs a response and returns it via the Node.Respond exception. This is caught in the GenericNodeFn functor. See the section called The Generic Node.

Here is the top-level of the Basic validation.

and validate_basic
        (auth as {realm, user_file, group_file, users, groups})
        req
        : unit =
let
    val () = Log.testInform G.TestAuth Log.Debug (fn() => TF.L [
                "Basic auth for realm ", realm])

    val Req.Request {headers, abort, ...} = req


    (*  Generate a challenge response to prompt for a password.  *)
    fun challenge() =
    let
        val () = Log.testInform G.TestAuth Log.Debug (fn() => TF.L [
                    "Returning challenge for realm ", realm])

        val resp = Req.Response {
                status  = Status.UnAuth,
                headers = [Hdr.HdrChallenge(Hdr.ChallBasic realm)],
                entity  = Entity.None
                }
    in
        raise Node.Respond resp
    end


    fun reject() =
    let
        val resp = Req.Response {
                status  = Status.UnAuth,
                headers = [],
                entity  = Entity.None
                }
    in
        raise Node.Respond resp
    end

in
    case Hdr.getAuth headers of
      NONE => challenge()

    | SOME (Hdr.AuthBasic (opt_id, pwd)) =>
    (
        case opt_id of
          NONE    => reject()
        | SOME id => validate_user abort auth reject id pwd
    )
end

To get to this function the node must require (Basic) authorisation. So if the request does not have one then the response will contain a challenge header which will make a browser prompt the user for a password and resend the request. If the request has some authorisation then it must have both a user name and password and these are validated against the files. Since reading files takes time there must be a check for an aborted connection. The abort value is passed down through the validation code. The reject function raises a "401 UnAuthorized" reponse and is passed along to the validation routines.

Here is the top-level of the user validation.

and validate_user 
        abort
        {realm, user_file, group_file, users, groups}
        rejecter id pwd 
        : unit =
let
    val () = Log.testInform G.TestAuth Log.Debug (fn() => TF.L [
                "Validate user=", id, " for realm=", realm])

    val all_users = add_group_users abort users group_file groups
in
    if List.exists (isVal id) all_users
       andalso validate_pwd abort user_file id pwd
    then
        ()
    else
        rejecter()
end

The group names in the authorisation record are expanded by add_group_users to a list of user names and added to the user name list. Then if the user name is in this list the password must be checked. (The isVal function is in the Common module). I'll skip the add_group_users function which is a messy bit of file reading and go on to the validate_pwd function. This is a simpler bit of file reading.

and validate_pwd abort user_file id pwd : bool =
let
    val () = Log.testInform G.TestAuth Log.Debug (fn() => TF.L [
                "Validate pwd for user=", id, " pwd=", pwd])

    fun loop lnum strm =
    let
        val line = TextIO.inputLine strm
    in
        if line = ""
        then
            false           (* eof so failed *)
        else
            if check_line line lnum
            then
                true
            else
                loop (lnum+1) strm
    end


    and check_line line lnum =
    let
        val (left, right) =
            SS.splitl (isntVal #":") (SS.all line)

        fun clean s = SS.dropr Char.isSpace (SS.dropl Char.isSpace s)

        (*  Trim off leading and trailing white space from the names. *)
        val user     = SS.string(clean left)
        val password = SS.string(clean(SS.triml 1 right))

        val () = Log.testInform G.TestAuth Log.Debug (fn() => TF.L [
                "Found user=", user, " pwd=", password])
    in
        user = id andalso password = pwd
    end
in
    FileIO.withTextIn abort user_file false (loop 1)
end

The FileIO.withTextIn function takes care of opening and closing the file including waiting for a file descriptor to be available. It passes a text stream to the loop function which reads the lines. The false value is a default in case the file could not be read. A line is of the form "user: pwd" with white space allowed around the user name and the password. The clean function trims off this white space.