The Socket API

There is a fairly comprehensive API for socket programming in the compiler's Basis library. This is an SML/NJ extension that, as far as I know, has gone undocumented until now.

You can find the source for the API in the boot/Sockets directory of the compiler source. Start with the SOCKET signature in the socket-sig.sml file. The actual implementation starts with the shared material in the PreSock structure in the pre-sock.sml file.

The Generic Socket Types

In the Socket structure (with signature SOCKET) we have the following generic types.

type ('af, 'sock) sock
type 'af sock_addr

(* witness types for the socket parameter *)
type dgram
type 'a stream
type passive    (* for passive streams *)
type active     (* for active (connected) streams *)

The clever thing here is the use of type parameters to distinguish between different kinds of sockets. This lets the type checker catch some misuses of sockets at compile time. Internally a socket is represented by the following datatype in the PreSock structure.

(* the raw representation of a socket
   (a file descriptor for now) *)
type socket = int
datatype ('af, 'sock) sock = SOCK of socket

The type just includes the integer file descriptor for the socket. The type variables are not actually used in the definition of a socket. They are only a part of the logical framework of the program that is checked by the type checker at compile time[1].

The first type parameter to sock distinguishes the different address families. All of the functions in the Socket structure accept a socket type, such as ('a, 'b) sock, with any address family, as you would expect. Address families are used at the time sockets are created. See the section called A Simple TCP Client for an example. The Socket.AF structure defines a type for address families and some functions to obtain values of the type. Normally you would use the specialised types of the section called The Specific Socket Types.

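The second type parameter 'sock distinguishes between the different states of a socket. The possible types, built from the witness types shown above, are:

dgram            a datagram socket
passive stream   a stream socket that listens for connections
active stream    a stream socket that is connected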

Some functions in Socket only operate on passive or active streams. For example

val accept: ('a, passive stream) sock ->
            (('a, active stream) sock * 'a sock_addr)

val listen: (('a, passive stream) sock * int) -> unit

val sendVec: (('a, active stream) sock * Word8Vector.vector buf)
             -> int

The type parameters constrain you to ensure that you cannot call sendVec on the same socket value that you passed to accept or listen. You can however call sendVec on the value returned from accept.
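The technique can be reproduced in a few lines of plain SML. The following is a minimal sketch, not the library's code: the Toy structure and its listener, accept and send functions are hypothetical stand-ins that only mimic the shape of the real types. The state parameter never appears in the representation, yet the type checker still uses it.

```sml
(* A hypothetical toy "socket" that mimics the phantom type trick.
   Nothing here touches the network. *)
structure Toy :> sig
    type 'state sock
    type passive
    type active
    val listener : int -> passive sock
    val accept   : passive sock -> active sock
    val send     : active sock * string -> int
end =
struct
    (* 'state is not used on the right-hand side. *)
    datatype 'state sock = SOCK of int
    datatype passive = PASSIVE
    datatype active  = ACTIVE

    fun listener fd = SOCK fd
    fun accept (SOCK fd) = SOCK (fd + 1)    (* pretend a new descriptor *)
    fun send (SOCK _, msg) = size msg       (* pretend everything was sent *)
end

val l = Toy.listener 3
val c = Toy.accept l
val n = Toy.send (c, "hello")

(* This line would be rejected at compile time because
   l is a passive sock and send wants an active sock:

   val bad = Toy.send (l, "hello")
*)
```

The opaque signature match (:>) hides the representation so that callers cannot build a SOCK value with whatever state they like.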

Socket addresses are defined in Socket as being generic over address families. But you will use more specific types with their own functions for addresses in a specific family.

The Specific Socket Types

The socket types in the section called The Generic Socket Types are generic over the address family. What you will actually use are sets of socket types with the address family fixed. For example the INetSock structure defines a type of socket with the address family fixed at AF_INET for the internet protocols. The new types and values are:

datatype inet = INET

type 'a sock = (inet, 'a) Socket.sock
type 'a stream_sock = 'a Socket.stream sock
type dgram_sock = Socket.dgram sock

type sock_addr = inet Socket.sock_addr

Here a distinct type called inet has been defined, although it contains no data. Because it is defined with a datatype it is guaranteed to be different from any other type. This allows the type checker to ensure that you don't mix up sockets with different address families. The remaining types are specialisations for the inet family. The type variable in the stream_sock type will range over the types passive and active in the Socket structure.

The value INetSock.inetAF is an address family value, of type Socket.AF.addr_family, should you need to specify the family explicitly.

The UnixSock structure provides types equivalent to those in INetSock but with the address family fixed for Unix domain sockets.

Socket Addresses

The type sock_addr represents an address that you can bind a socket to. The generic address, Socket.sock_addr, is parameterised by the address family. If you look in the PreSock structure you will see that a socket address is represented internally by a byte vector.

type addr = Word8Vector.vector
datatype 'af sock_addr = ADDR of addr

For each particular address family there is a specialised address type. For example in the INetSock structure there is:

datatype inet = INET
type sock_addr = inet Socket.sock_addr

val toAddr   : (NetHostDB.in_addr * int) -> sock_addr
val fromAddr : sock_addr -> (NetHostDB.in_addr * int)
val any  : int -> sock_addr

The toAddr function will coerce an internet address and a port number to a socket address which is specialised for the inet address family. The fromAddr function will do the reverse. The any function uses the 0.0.0.0 internet address (the traditional INADDR_ANY) that you bind a server socket to if you want it to accept connections from any source address. Its argument is the port number.
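As a small illustration, here is a sketch that builds an address, takes it apart again and builds the wildcard address. It uses only the functions listed above; the port number 8080 is an arbitrary choice for the example.

```sml
(* Build a socket address for the loopback interface. *)
val localhost = valOf (NetHostDB.fromString "127.0.0.1")
val addr      = INetSock.toAddr (localhost, 8080)

(* Take it apart again. *)
val (host, port) = INetSock.fromAddr addr

(* The wildcard address that a server would bind to. *)
val wildcard = INetSock.any 8080

val () = print (concat [NetHostDB.toString host, ":",
                        Int.toString port, "\n"])
```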

To look up an internet address you use the functions in the NetHostDB structure. These provide the equivalent of the C library's gethostbyname and gethostbyaddr functions. The signature for this structure is:

signature NET_HOST_DB =
sig
    eqtype in_addr
    eqtype addr_family
    type entry
    val name     : entry -> string
    val aliases  : entry -> string list
    val addrType : entry -> addr_family
    val addr     : entry -> in_addr
    val addrs    : entry -> in_addr list
    val getByName    : string -> entry option
    val getByAddr    : in_addr -> entry option

    val getHostName : unit -> string

    val scan       : (char, 'a) StringCvt.reader ->
                     (in_addr, 'a) StringCvt.reader
    val fromString : string -> in_addr option
    val toString   : in_addr -> string
end

You use the getByName or getByAddr functions to fetch a database entry, equivalent to C's struct hostent. They return NONE if the entry is not found. The functions name through to addrs fetch the fields of an entry. The fromString function will parse an address in the numeric formats a.b.c.d, a.b.c, a.b or a. Where there is more than one number, the numbers on the left are 8 bit values and the last number takes up the rest of the address. Hex numbers are allowed with a 0x prefix, octal with a 0 prefix.
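Here is a sketch of both ideas. The showHost function uses only the NetHostDB functions from the signature above. The pack function is a hypothetical helper of my own, not part of the library; it only works through the arithmetic of the short numeric formats, assuming the parts have already been parsed into numbers.

```sml
(* Print the official name and every address of a host. *)
fun showHost host =
    case NetHostDB.getByName host of
      NONE => print (host ^ ": not found\n")

    | SOME entry =>
    (
        print (NetHostDB.name entry ^ "\n");
        app (fn a => print ("\t" ^ NetHostDB.toString a ^ "\n"))
            (NetHostDB.addrs entry)
    )

(* Hypothetical helper: combine the parsed parts of a numeric
   address into one 32 bit value. Each leading part fills 8 bits
   and the last part fills whatever is left. *)
fun pack [a]          = a
  | pack [a, b]       = Word32.orb (Word32.<< (a, 0w24), b)
  | pack [a, b, c]    = Word32.orb (Word32.<< (a, 0w24),
                        Word32.orb (Word32.<< (b, 0w16), c))
  | pack [a, b, c, d] = Word32.orb (Word32.<< (a, 0w24),
                        Word32.orb (Word32.<< (b, 0w16),
                        Word32.orb (Word32.<< (c, 0w8), d)))
  | pack _            = raise Fail "expected 1 to 4 parts"

(* "127.1" and "127.0.0.1" name the same address. *)
val short = pack [0w127, 0w1]
val long  = pack [0w127, 0w0, 0w0, 0w1]
```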

For the Unix address family you have in the UnixSock structure:

datatype unix = UNIX
type sock_addr = unix Socket.sock_addr

val toAddr   : string -> sock_addr
val fromAddr : sock_addr -> string

The string is the path to the socket in the file system.

A Simple TCP Client

This example program makes a TCP connection to a port and fetches one line of response and prints it. You can test it against a server such as the SMTP mail server on port 25 or the NNTP server on port 119. Here is the central function. It's fairly straightforward.

fun connect port =
let
    val localhost =
            valOf(NetHostDB.fromString "127.0.0.1")
    val addr = INetSock.toAddr(localhost, port)
    val sock = INetSock.TCP.socket()

    fun call sock =
    let
        val _    = Socket.connect(sock, addr)
        val msg  = Socket.recvVec(sock, 1000)
        val text = Byte.bytesToString msg
    in
        print text;
        Socket.close sock
    end
    handle x => (Socket.close sock; raise x)
in
    call sock
end
handle OS.SysErr (msg, _) => raise Fail (msg ^ "\n")

The recvVec function performs the C library recv() on the socket into a buffer of 1000 bytes. Since we are expecting a text response the bytesToString coerces the byte vector into a text string. I've wrapped the connection phase into a function to make it easier to wrap an exception handler around it. The handler closes the socket and reraises the exception. This is overkill for such a simple program but it shows you what you would need to do in a larger program. All errors from the socket functions raise OS.SysErr exceptions. The exception handler for these translates them into a simpler error message.
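A single recv() is not guaranteed to deliver a whole line. A more careful client could keep reading until it sees a newline. The following is a sketch of that loop, assuming that an empty vector signals that the peer has closed the connection. The recvLine function is my own illustration; it takes the receive operation as an argument so that the loop does not depend on the socket itself.

```sml
(* Read chunks until one of them contains a newline or the
   peer closes the connection. The result may include text
   that arrived after the newline in the final chunk. *)
fun recvLine recv =
let
    fun loop acc =
    let
        val chunk = Byte.bytesToString (recv 1000)
        val acc'  = acc ^ chunk
    in
        if chunk = "" orelse Char.contains chunk #"\n"
        then acc'
        else loop acc'
    end
in
    loop ""
end

(* In the client it would be called like this:

   val text = recvLine (fn n => Socket.recvVec (sock, n))
*)
```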

Here is the main program to call the connect function.

fun toErr msg = TextIO.output(TextIO.stdErr, msg)

fun main(arg0, argv) =
let
in
    case argv of
      [port] => 
        (case Int.fromString port of
          NONE => raise Fail "Invalid port number\n"

        | SOME p => connect p)

    | _ => raise Fail "Usage: simpletcp port\n";

    OS.Process.success
end
handle
  Fail msg => (toErr msg; OS.Process.failure)

| x =>
(
    toErr(concat["Uncaught exception: ",
                 exnMessage x, " from\n"]);
    app (fn s => (print "\t"; print s; print "\n"))
        (SMLofNJ.exnHistory x);
    OS.Process.failure
)

A Simple TCP Server

This example program complements the simple client of the previous section. It listens on a TCP socket and sends a simple text response to each client that connects. It is a single threaded server. Here is the serve function that runs the server.

fun serve port =
let
    fun run listener =
    let
        fun accept() =
        let
            val (conn, conn_addr) = Socket.accept listener
        in
            respond conn;
            accept()
        end

        and respond conn =
        let
            val msg = "hello world from tcpserver\n"
            val buf = {buf = Byte.stringToBytes msg,
                       i = 0, sz = NONE}
        in
            ignore(Socket.sendVec(conn, buf));
            Socket.close conn
        end
        handle x => (Socket.close conn; raise x)

    in
        Socket.Ctl.setREUSEADDR(listener, true);
        Socket.bind(listener, INetSock.any port);
        Socket.listen(listener, 9);
        accept()
    end
    handle x => (Socket.close listener; raise x)
in
    run (INetSock.TCP.socket())
end
handle OS.SysErr (msg, _) => raise Fail (msg ^ "\n")

Again I have used functions to isolate the scope of exception handlers as well as to implement the server loop. The run function sets up the socket to listen for connections and runs a loop to accept each one. The socket is bound to a given port but its address is set to 0.0.0.0 (INADDR_ANY) to accept from any host. The listen function takes an integer backlog parameter, the same as the C library listen() function.

Each accepted connection returns a new socket, called conn, and the address of the connecting peer which I ignore. The respond function builds a buffer to send to the client. The sendVec function performs the C library send() function and returns its result which will be the number of bytes successfully sent. In this simple server I ignore this. If there is actually an error then the OS.SysErr exception will be raised. The buffer argument to sendVec must be a record with this type:

type 'a buf = {buf : 'a, i : int, sz : int option}

val sendVec: (('a, active stream) sock * Word8Vector.vector buf)
             -> int

The type variable 'a will be either a vector or an array of bytes depending on the function you use. The signature of the sendVec function is shown. The i field is the offset into the buffer where the send is to start. The sz field is the optional length of the data to send. If it is NONE then the data extends to the end of the buffer. The standard Subscript exception is raised if the offset and length don't fit into the buffer.
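The i field and the returned byte count work together when a send is only partly completed. Here is a sketch of a loop that resends the remainder until the whole buffer has gone. The sendAll function is hypothetical, not part of the library; it takes the send operation as an argument and assumes the record form of the buffer described above.

```sml
(* Keep sending from offset i until every byte has been sent.
   The send argument must return the number of bytes it sent. *)
fun sendAll send bytes =
let
    val len = Word8Vector.length bytes

    fun loop i =
        if i >= len
        then ()
        else
        let
            val n = send {buf = bytes, i = i, sz = NONE}
        in
            loop (i + n)
        end
in
    loop 0
end

(* In the server it would be called like this:

   sendAll (fn buf => Socket.sendVec (conn, buf))
           (Byte.stringToBytes msg)
*)
```

A production version would also want to guard against a send that reports zero bytes, which would make this loop spin.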

The main function for this program is almost identical to the client. It just gets a port number from the command line.

Servers with Multiple Connections

If you want to write a server to handle multiple connections then you can either write it in a single-threaded manner using the poll functions in the OS.IO structure or you can use the Concurrent ML library for a multi-threaded style.

To use polling you will need the Socket.pollDesc function:

val pollDesc : ('a, 'b) sock -> OS.IO.poll_desc

This will obtain a descriptor from the socket suitable for use with OS.IO. Here is some example code for polling a set of sockets for reading or writing.

type ServerSock = Socket.active INetSock.stream_sock

datatype Handler = Handler of {
        socket: ServerSock,
        reader: ServerSock -> unit,
        writer: ServerSock -> unit
        }

fun poll (handlers: Handler list) =
let
    (*  Convert to a list annotated with iodesc. *)
    fun to_iodesc (Handler {socket, reader, writer}) = 
        (OS.IO.pollToIODesc(Socket.pollDesc socket),
            socket, reader, writer)

    val with_iodesc = map to_iodesc handlers

    (*  Generate a list of poll descriptors for reading
        and writing.
    *)
    fun to_poll (Handler {socket, ...}) = 
            (OS.IO.pollIn o OS.IO.pollOut o Socket.pollDesc)
            socket

    (*  Search for the matching handlers. *)
    fun check_info poll_info =
    let
        val info_iodesc = OS.IO.pollToIODesc(
                                OS.IO.infoToPollDesc poll_info)
        val handler = List.find
                      (fn arg => (#1 arg) = info_iodesc)
                      with_iodesc
    in
        case handler of
          NONE => raise Fail "polled a non-existent socket!"

        | SOME (iodesc, socket, reader, writer) =>
        (
            if OS.IO.isIn  poll_info then reader socket else ();
            if OS.IO.isOut poll_info then writer socket else ()
        )
    end

    val info_list = OS.IO.poll(map to_poll handlers, NONE)
in
    app check_info info_list
end

I've defined a record type for a handler that maps a socket to reader and writer functions. These functions will be called when the socket is ready for reading or writing respectively. My poll function takes a list of handlers and calls the readers and writers for each socket that is ready for I/O. The first step is to extend the handler data with an OS.IO.iodesc value. This is the only type of value used in OS.IO that supports the equality operator so that I can use it for looking up the handler. The Socket structure only provides for producing an OS.IO.poll_desc which I have to back-convert to an iodesc.

The to_poll function separately converts each socket to an OS.IO.poll_desc value. The pollIn and pollOut functions mark the descriptor for polling for input and output respectively. I then pass the descriptors to the OS.IO.poll function to get the list of resulting info records in info_list. I'm not using a timeout here.
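If you do want a timeout, the second argument to OS.IO.poll is an optional Time.time value. Here is a sketch of a hypothetical helper, readyWithin, that waits a limited time for a single descriptor to become readable. It assumes that poll returns an empty list when the timeout expires.

```sml
(* True if the descriptor becomes readable within the
   given number of seconds. *)
fun readyWithin (pd : OS.IO.poll_desc, seconds) =
let
    val timeout = SOME (Time.fromSeconds seconds)
    val ready   = OS.IO.poll ([OS.IO.pollIn pd], timeout)
in
    not (null ready)
end

(* With the pollDesc function above it would be called like this:

   if readyWithin (Socket.pollDesc conn, 5)
   then reader conn
   else print "timed out\n"
*)
```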

The check_info function examines each info record. First I extract the iodesc from the info record. Then I search the with_iodesc list for a record with the same iodesc. The argument to the predicate is an annotated tuple. I use the #1 notation to get the first member of the tuple which is the iodesc. The isIn function tests if the info record indicates a socket ready for reading. If so then I call the reader. Similarly for the writer.

Here is part of the modified serve function from the server in the section called A Simple TCP Server. It's just a trivial example of calling the poll function.

fun serve port =
let
    fun run listener =
    let
        fun accept() =
        let
            val (conn, conn_addr) = Socket.accept listener
        in
            poll [Handler {
                    socket = conn,
                    reader = reader,
                    writer = writer
                    }];
            accept()
        end

        and writer conn =
        let
            val msg = "hello world from tcpserver\n"
            val buf = {
                    buf = Byte.stringToBytes msg,
                    i = 0,
                    sz = NONE
                    }
        in
            print "responding to a client\n";
            ignore(Socket.sendVec(conn, buf));
            Socket.close conn
        end
        handle x => (Socket.close conn; raise x)

        and reader conn = ()

A serious server would need to maintain a data structure of current connections. This might be a list of records similar to Handler. However you will get a nicer result if you use Concurrent ML to write the server in a multi-threaded style. This will have a reader and a writer thread for each connection. See Chapter 6.

Notes

[1]

I've corrected the order of the type variables which is a typo in PreSock.sock that has no effect.