--- /dev/null
+Network Interface
+
+There are two layers to the network interface. syscall_net implements
+something vaguely representative of the BSD sockets API, which libc wraps to
+look much more like it should.
+
+The basic state is managed by the syscall_net socket layer. State transitions
+and events cause calls down into the network layer implementation, and from
+the implementation back up into the syscall_net.c code.
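+
+The exact set of hooks is easiest to see as code. The prototypes below match
+the native stack code in net_native.c further down; any other stack provides
+the same entry points:
+
+	/* Entry points a stack implementation provides. syscall_net
+	   calls these on socket state transitions and I/O */
+	int net_init(struct socket *s);    /* create: move to SS_UNCONNECTED */
+	int net_bind(struct socket *s);    /* user bind or autobind */
+	int net_listen(struct socket *s);  /* accept incoming connections */
+	int net_connect(struct socket *s); /* begin connecting */
+	void net_close(struct socket *s);  /* close and release resources */
+	arg_t net_read(struct socket *s, uint8_t flag);
+	arg_t net_write(struct socket *s, uint8_t flag);
+	arg_t net_ioctl(uint8_t op, void *p);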
+
+The goal of this separation is to support
+
+- A native TCP/IP stack running in userspace
+
+- The WizNET modules, which have an onboard TCP stack and 32K of buffers.
+  These are ideal for things like Ethernet on small microcontrollers.
+
+- DriveWire 4, which supports multiple incoming and outgoing TCP connections
+  over the DriveWire link to the DriveWire 4 server.
+
+- Emulator TCP/IP host stack interfaces such as those in uQLX, QPC2, etc.
+
+- A single socket fake AT interface as provided by some emulators' modem
+  emulation. This limits us to a single TCP connection, and requires that we
+  remember to do DNS lookups before opening the data socket.
+
+It cannot support devices that provide multiple TCP/IP data links but are
+unable to flow control or error recover the link properly (eg the standard
+ESP8266 firmware). It ought to be possible to write replacement ESP8266
+firmware that provides a sane serial interface.
+
+Native Stack
+
+The native stack consists of a kernel driver bound to syscall_net, a user mode
+daemon and a backing file (as implemented today, although for bigger machines
+that could become a memory buffer).
+
+Thus socket(), connect() and friends execute in the context of the
+originating process. They cause the syscall_net layer to make socket state
+transitions, and these transitions, along with the other needed messages, are
+passed back and forth between the daemon and the kernel.
+
+The backing file is used to avoid the kernel needing large amounts of buffer
+memory to make the networking even half usable. Instead the kernel and the
+daemon communicate by copying data to and from a single large file that backs
+the networking state. In the common fast path the data is effectively moved
+between the daemon and the application using the disk cache as the buffers.
+If more buffering is needed it will spill to hard disc and all will be good.
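+
+As an illustration, a native stack daemon might start up along these lines.
+The device node and backing file paths here are made up - use whatever your
+platform provides:
+
+	#include <fcntl.h>
+	#include <stdlib.h>
+	#include <sys/ioctl.h>
+	#include <unistd.h>
+
+	int bfd, nfd;
+
+	bfd = open("/var/spool/net/buffers", O_RDWR | O_CREAT, 0600);
+	nfd = open("/dev/net", O_RDWR);
+	if (bfd < 0 || nfd < 0)
+		exit(1);
+	/* Hand the backing file descriptor to the kernel: the NET_INIT
+	   ioctl fetches it as a word from the argument pointer */
+	if (ioctl(nfd, NET_INIT, &bfd) < 0)
+		exit(1);
+	/* ... then loop exchanging netevent messages, see below */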
+
+Currently the interface and the syscall_net layer have hardwired assumptions
+about TCP/IP that need pushing down into the low level implementations so
+that other protocols can also be supported.
+
+Note: accepting incoming connections is not yet handled. This will add some
+new messages.
+
+Native Stack Messages
+
+The daemon's read() call receives the next message from the kernel. The
+kernel passes events by providing the daemon with a copy of the relevant
+socket data structures, including the event masks.
+
+The events the kernel sends are
+
+NEVW_STATE: A state change is pending, eg a bind or connect() call
+ sd->newstate holds the requested new socket state
+
+ For moves to SS_UNCONNECTED the daemon responds with
+ an NE_INIT message giving the lcn (logical channel number)
+ it wishes to associate with the socket in ne.data and any
+ error code in ne.ret
+
+ For other state changes it acknowledges them by responding
+ with NE_NEWSTATE and the desired new state. ne.ret again
+ holds the error code if failing or 0 if good
+
+NEVW_READ: This is sent when the user application has read data from
+ the socket and there is more room in the buffer. No reply
+ is required
+
+NEVW_WRITE: This is sent when the user application has written data to
+ the socket and there is more data in the buffer. No reply
+ is needed
+
+
+The daemon sends the following events asynchronously by using the write()
+system call
+
+NE_EVENT:	An asynchronous state change has occurred, for example a
+		socket being reset or connecting. This does not clear any
+		wait for a synchronous state change. If an event wait is
+		pending then a further read will report the fact, and the
+		socket can, if need be, be sent a NEVW_STATE giving the
+		states. That is needed because NE_EVENT can cross
+		NEVW_STATE messages.
+
+NE_SETADDR: Set the kernel view of one of the address structures. This
+ is used when a connection is made for example.
+
+ FIXME: we don't yet handle attaching addresses to datagrams
+ in UDP unconnected mode.
+
+NE_ROOM: Inform the kernel that more space now exists. ne.data is the
+ new tbuf pointer indicating what was consumed.
+
+NE_DATA: More data has arrived, ne.data is the new rnext pointer
+
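+A minimal sketch of the daemon side of this exchange (nfd is the network
+device opened at start up). The layout of struct sockmsg is assumed here to
+be the copied struct sockdata followed by the copied struct socket, and the
+s_num index field and alloc_lcn() helper are made up for illustration:
+
+	struct sockmsg sm;
+	struct netevent ne;
+
+	while (read(nfd, &sm, sizeof(sm)) == sizeof(sm)) {
+		if (sm.sd.event & NEVW_STATE) {
+			ne.socket = sm.s.s_num;	/* hypothetical index field */
+			ne.ret = 0;
+			if (sm.sd.newstate == SS_UNCONNECTED) {
+				/* Socket creation: pick a logical channel */
+				ne.event = NE_INIT;
+				ne.data = alloc_lcn();
+			} else {
+				/* Acknowledge the requested state change */
+				ne.event = NE_NEWSTATE;
+				ne.data = sm.sd.newstate;
+			}
+			write(nfd, &ne, sizeof(ne));
+		}
+		/* NEVW_READ and NEVW_WRITE need no reply: they just tell
+		   us the ring state moved so we can send or ack data */
+	}
+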
+The actual data is held in the buffer file. For UDP we keep it simple and
+put one packet in each buffer slot (1K max packet size currently supported).
+For TCP we treat the buffer space as a 4K RX ring and a 4K TX ring. The
+daemon knows as the user eats data, so it can ACK correctly, and it also
+knows the range that is free, so it can use the buffer as an out of order
+cache for the socket.
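+
+A sketch of the resulting offset arithmetic, assuming the constants carry the
+values the 4K figures above imply (the real values live in the header):
+
+	#define RXBUFSIZ	4096			/* receive ring */
+	#define TXBUFSIZ	4096			/* transmit ring */
+	#define RXBUFOFF	RXBUFSIZ		/* TX follows RX */
+	#define SOCKBUFOFF	(RXBUFSIZ + TXBUFSIZ)	/* 8K per socket */
+
+	/* Receive bytes for socket sn live at */
+	off = sn * SOCKBUFOFF + sd->rbuf;
+	/* Transmit bytes for socket sn live at */
+	off = sn * SOCKBUFOFF + RXBUFOFF + sd->tnext;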
+
+Implementing For Your Platform
+
+Firstly pick the needed stack.
+
+The drivers/net/net_at implementation is a pretty minimal fake socket
+connection for an AT modem style interface. It's really only intended for
+debugging but can be used if there are no better choices. Note the FIXMEs
+around +++ handling however.
+
+The native stack is currently very much a work in progress.
+
+I have sketched out DriveWire 4 and WizNET stacks but they are far from
+complete or tested. If you are interested in finishing the job let me know.
+
+
+TODO
+
+- Push socket type checking into the implementation so we can handle non-TCP/IP
+stacks. That impacts how we store addresses so we may need a size ifdef or
+union to avoid burdening small boxes with full address objects.
+
+- Attach datagrams to an address. Might reduce our UDP datagram size a bit
+below 1K - do we care ? Needed on send and receive.
+
+- Allow platform to set sizes.
+
+- Implement binding ioctls, exit clean up, test of compatible struct and consts
+
+- Need to sort out the send/recv interface and address setting.
+
+- Push "local" address check out of syscall_net
+
+- net_ hook for configuration ioctls or somesuch ?
+
+- Libc needs a resolver and all the networking helper stuff BSD expects
+
+- Could we consume and undirty buffers with a slightly naughty readi hook ?
+
+- shutdown()
+
+- socket options
+
+- recvmsg/sendmsg (maybe - silly functions!)
--- /dev/null
+#include <kernel.h>
+#include <kdata.h>
+#include <netdev.h>
+#include <net_native.h>
+#include <printf.h>
+
+/* This holds the additional kernel context for the sockets */
+struct sockdata sockdata[NSOCKET];
+
+/* The last event read from the daemon. It doubles as the wakeup address
+   for the daemon rendezvous, so it must live at file scope */
+static struct netevent ne;
+
+static void wakeup_all(struct socket *s)
+{
+ wakeup(s);
+ wakeup(&s->s_data);
+ wakeup(&s->s_iflag);
+}
+
+/*
+ * The daemon has sent us a message. Process the message. Note that we
+ * don't have any interrupts in this stack because everything happens
+ * in a process context of some kind
+ *
+ * Is it worth having a single call take a batch of events ?
+ */
+int netdev_write(uint8_t flag)
+{
+	struct socket *s;
+	struct sockdata *sd;
+
+	used(flag);
+
+ /* Grab a message from the service daemon */
+ if (net_ino == NULL || udata.u_count != sizeof(ne) ||
+ uget(udata.u_base, &ne, sizeof(ne)) == -1 ||
+ ne.socket >= NSOCKET) {
+ udata.u_error = EINVAL;
+ return -1;
+ }
+
+ s = sockets + ne.socket;
+ sd = sockdata + ne.socket;
+
+ switch (ne.event) {
+ /* State change. Wakes up the socket having moved state */
+ case NE_NEWSTATE:
+ s->s_state = ne.data;
+ sd->ret = ne.ret;
+ /* A synchronous state change has completed */
+		sd->event &= ~NEVW_STATEW;
+ /* Review select impact for this v wakeup_all */
+ wakeup(s);
+ break;
+ /* Asynchronous state changing event */
+ case NE_EVENT:
+ s->s_state = ne.data;
+ s->s_err = ne.ret;
+ wakeup_all(s);
+ break;
+ /* Change an address */
+ case NE_SETADDR:
+ if (ne.data < 3)
+			memcpy(&s->s_addr[ne.data], ne.info,
+				sizeof(struct sockaddrs));
+ break;
+ /* Response to creating a socket. Initialize lcn */
+ case NE_INIT:
+ s->s_state = SS_UNCONNECTED;
+ s->s_lcn = ne.data;
+ sd->event = 0;
+ sd->ret = ne.ret;
+ sd->err = 0;
+ sd->rbuf = sd->rnext = 0;
+ sd->tbuf = sd->tnext = 0;
+ wakeup(s);
+ break;
+ /* Indicator of write room from the network agent */
+ case NE_ROOM:
+ sd->tbuf = ne.data;
+ wakeup(&s->s_iflag);
+ break;
+ /* Indicator of data from the network agent */
+ case NE_DATA:
+ sd->rnext = ne.data; /* More data available */
+ memcpy(sd->rlen, ne.info,
+ sizeof(uint16_t) * NSOCKBUF);
+		s->s_iflag |= SI_DATA;
+		/* Wake any reader sleeping on the data flag */
+		wakeup(&s->s_iflag);
+		break;
+ default:
+ kprintf("netbad %d\n", ne.event);
+ udata.u_error = EOPNOTSUPP;
+ return -1;
+ }
+ return udata.u_count;
+}
+
+/* When events are pending we simply hand all the structs to the server
+ as copies. It can then make any decisions it needs to make */
+static int netdev_report(struct sockdata *sd)
+{
+ uint8_t sn = sd - sockdata;
+ struct socket *s = sockets + sn;
+
+ if (uput(sd, udata.u_base, sizeof(*sd)) == -1 ||
+ uput(s, udata.u_base + sizeof(*sd), sizeof(*s)) == -1)
+ return -1;
+ sd->event &= ~NEVW_MASK;
+ return udata.u_count;
+}
+
+/*
+ * Scan the socket table for any socket with a pending event. Shovel
+ * the first one we find at the daemon. We should possibly round-robin
+ * these but it's not clear it's that important
+ */
+int netdev_read(uint8_t flag)
+{
+ if (net_ino == NULL || udata.u_count != sizeof(struct sockmsg)) {
+		udata.u_error = EINVAL;
+ return -1;
+ }
+ while(1) {
+ struct sockdata *sd = sockdata;
+ while (sd != sockdata + NSOCKET) {
+ if (sd->event)
+ return netdev_report(sd);
+ sd++;
+ }
+ if (psleep_flags(&ne, flag))
+ return -1;
+ }
+}
+
+/*
+ * The ioctl interface at the moment is simply the initialization
+ * function.
+ */
+static int netdev_ioctl(uarg_t request, char *data)
+{
+ int16_t fd;
+
+ switch(request) {
+ /* Daemon starting up, passing file handle of cache */
+ /* FIXME: Check sizes etc are valid via some kind of
+ passed magic hash */
+ case NET_INIT:
+ if (net_ino) {
+ udata.u_error = EBUSY;
+ return -1;
+ }
+ fd = ugetw(data);
+ if ((net_ino = getinode(fd)) == NULLINODE)
+ return -1;
+ i_ref(net_ino);
+ return 0;
+	}
+	udata.u_error = EINVAL;
+	return -1;
+}
+
+/*
+ * On a close of the daemon close down all the sockets we
+ * have opened.
+ */
+static int netdev_close(void)
+{
+ struct socket *s = sockets;
+ if (net_ino) {
+		i_deref(net_ino);
+		/* Forget the backing file so a fresh daemon can NET_INIT */
+		net_ino = NULL;
+ while (s < sockets + NSOCKET) {
+ if (s->s_state != SS_UNUSED) {
+ s->s_state = SS_CLOSED;
+ wakeup_all(s);
+ }
+ s++;
+ }
+ }
+	return 0;
+}
+
+/*
+ * We have received an event from userspace that requires us to wait
+ * until the network stack performs the relevant state change. Pass
+ * the wanted new state on to the daemon, then wait until our STATEW
+ * flag is cleared by a suitable message.
+ */
+static int netn_synchronous_event(struct socket *s, uint8_t state)
+{
+ uint8_t sn = s - sockets;
+ struct sockdata *sd = &sockdata[sn];
+
+	sd->event |= NEVW_STATE | NEVW_STATEW;
+ sd->newstate = state;
+ wakeup(&ne);
+
+ do {
+ psleep(s);
+	} while (sd->event & NEVW_STATEW);
+
+	udata.u_error = sd->ret;
+	return udata.u_error ? -1 : 0;
+}
+
+/*
+ * Flag an unsolicited event to the daemon. These are used to
+ * handshake the buffer status.
+ */
+static void netn_asynchronous_event(struct socket *s, uint8_t event)
+{
+ uint8_t sn = s - sockets;
+ struct sockdata *sd = &sockdata[sn];
+ sd->event |= event;
+ wakeup(&ne);
+}
+
+/*
+ * Queue data to a stream socket. We use the entire buffer space
+ * available as a ring buffer and write bytes to it. We then update
+ * our pointer and poke the daemon to send stuff.
+ */
+static uint16_t netn_queuebytes(struct socket *s)
+{
+ arg_t n = udata.u_count;
+ arg_t r = 0;
+ uint8_t sn = s - sockets;
+ struct sockdata *sd = &sockdata[sn];
+ /* Do we have room ? */
+ if (sd->tnext == sd->tbuf)
+ return 0;
+
+ udata.u_sysio = false;
+ udata.u_offset = sn * SOCKBUFOFF + RXBUFOFF + sd->tnext;
+
+ /* Wrapped part of the ring buffer */
+ if (n && sd->tnext > sd->tbuf) {
+ /* Write into the end space */
+		uint16_t spc = TXBUFSIZ - sd->tnext;
+		if (spc > n)
+			spc = n;
+ udata.u_count = spc;
+ /* FIXME: check writei returns and readi returns properly */
+ writei(net_ino, 0);
+ if (udata.u_error)
+ return 0xFFFF;
+ sd->tnext += spc;
+ n -= spc;
+ r = spc;
+ /* And wrap */
+		if (sd->tnext == TXBUFSIZ)
+ sd->tnext = 0;
+ }
+ /* If we are not wrapped or just did the overflow write lower */
+	if (n) {
+		uint16_t spc = sd->tbuf - sd->tnext;
+		if (spc > n)
+			spc = n;
+ udata.u_count = spc;
+ udata.u_offset = sn * SOCKBUFOFF + RXBUFOFF + sd->tnext;
+
+ /* FIXME: check writei returns and readi returns properly */
+ writei(net_ino, 0);
+ if (udata.u_error)
+ return 0xFFFF;
+ sd->tnext += spc;
+ r += spc;
+ }
+	/* Tell the network daemon there is more data in the ring */
+	netn_asynchronous_event(s, NEVW_WRITE);
+ return r;
+}
+
+/*
+ * Queue data to a datagram socket. At the moment we use the ring
+ * as a set of fixed sized buffers. That may want changing. We do
+ * however need to work out how to pass an address and size header
+ * in the buffers, while still getting the ring behaviour right if
+ * we changed this as well as avoiding partial writes of a datagram.
+ *
+ * FIXME: we need to attach an address to getbuf/putbuf cases because we
+ * may be using sendto/recvfrom
+ */
+static uint16_t netn_putbuf(struct socket *s)
+{
+ uint8_t sn = s - sockets;
+ struct sockdata *sd = &sockdata[sn];
+
+ if (udata.u_count > TXPKTSIZE) {
+ udata.u_error = EMSGSIZE;
+ return 0xFFFF;
+ }
+ if (sd->tnext == sd->tbuf)
+ return 0;
+
+ udata.u_sysio = false;
+	udata.u_offset = sn * SOCKBUFOFF + RXBUFOFF + sd->tnext * TXPKTSIZE;
+ /* FIXME: check writei returns and readi returns properly */
+ writei(net_ino, 0);
+ sd->tlen[sd->tnext++] = udata.u_count;
+ if (sd->tnext == NSOCKBUF)
+ sd->tnext = 0;
+ /* Tell the network stack there is another buffer to consume */
+	netn_asynchronous_event(s, NEVW_WRITE);
+ return udata.u_count;
+}
+
+/*
+ * Pull a packet from the receive buffer. We fetch the next ring buffer
+ * slot and then copy as much as is required into the user buffer. This
+ * side also needs to handle addressing better, and may make sense to
+ * use the ring buffer packing. Once done we poke the daemon so it knows
+ * space is freed.
+ */
+static uint16_t netn_getbuf(struct socket *s)
+{
+ uint8_t sn = s - sockets;
+ struct sockdata *sd = &sockdata[sn];
+
+ if (sd->rbuf == sd->rnext)
+ return 0;
+ udata.u_sysio = false;
+ udata.u_offset = sn * SOCKBUFOFF + sd->rbuf * RXPKTSIZE;
+	udata.u_count = min(udata.u_count, sd->rlen[sd->rbuf]);
+	/* FIXME: check writei returns and readi returns properly */
+	readi(net_ino, 0);
+	/* FIXME: be smarter when we send this */
+	if (++sd->rbuf == NSOCKBUF)
+		sd->rbuf = 0;
+	netn_asynchronous_event(s, NEVW_READ);
+ return udata.u_count;
+}
+
+/*
+ * Pull bytes from the receive ring buffer. We copy as many bytes as
+ * we can to fulfill the user request. Short reads are acceptable if
+ * the buffer contains some data but not enough.
+ * After reading we tell the daemon and it will adjust the TCP window
+ * and send an ack frame as appropriate as well as adjusting its
+ * copy of the ring state
+ */
+static uint16_t netn_copyout(struct socket *s)
+{
+ arg_t n = udata.u_count;
+ arg_t r = 0;
+ uint8_t sn = s - sockets;
+ struct sockdata *sd = &sockdata[sn];
+
+ if (sd->rnext == sd->rbuf)
+ return 0;
+
+ udata.u_sysio = false;
+ udata.u_offset = sn * SOCKBUFOFF + sd->rbuf;
+
+ /* Wrapped part of the ring buffer */
+ if (n && sd->rnext < sd->rbuf) {
+ /* Write into the end space */
+		uint16_t spc = RXBUFSIZ - sd->rbuf;
+		if (spc > n)
+			spc = n;
+ udata.u_count = spc;
+ /* FIXME: check writei returns and readi returns properly */
+ readi(net_ino, 0);
+ if (udata.u_error)
+ return 0xFFFF;
+ sd->rbuf += spc;
+ n -= spc;
+ r = spc;
+ /* And wrap */
+		if (sd->rbuf == RXBUFSIZ)
+ sd->rbuf = 0;
+ }
+	/* If we are not wrapped or just did the overflow read the lower
+	   part of the ring */
+	if (n) {
+		uint16_t spc = sd->rnext - sd->rbuf;
+		if (spc > n)
+			spc = n;
+		udata.u_count = spc;
+		udata.u_offset = sn * SOCKBUFOFF + sd->rbuf;
+ /* FIXME: check writei returns and readi returns properly */
+ readi(net_ino, 0);
+ if (udata.u_error)
+ return 0xFFFF;
+ sd->rbuf += spc;
+ r += spc;
+ }
+	/* Tell the network daemon there is more room in the ring */
+	/* FIXME: be smarter when we send this */
+	netn_asynchronous_event(s, NEVW_READ);
+ return r;
+}
+
+
+/*
+ * Called from the core network layer when a socket is being
+ * allocated. We can either move the socket to SS_UNCONNECTED,
+ * or error. In our case the daemon will reply with an NE_INIT,
+ * or a state change to set an error.
+ *
+ * This call is blocking but the BSD socket API users don't expect
+ * anything to block for long. Blocking here is however needed because
+ * some of the stacks (this one included) are asynchronous to the
+ * OS.
+ */
+int net_init(struct socket *s)
+{
+ if (!net_ino) {
+ udata.u_error = ENETDOWN;
+ return -1;
+ }
+ return netn_synchronous_event(s, SS_UNCONNECTED);
+}
+
+/*
+ * A bind has occurred. This might be a user triggering a bind but it
+ * could also be an autobind.
+ *
+ * FIXME: distinguish bind and autobind so we can push address picking
+ * into the stack implementation to cover non IP stacks
+ */
+int net_bind(struct socket *s)
+{
+ return netn_synchronous_event(s, SS_BOUND);
+}
+
+/*
+ * A listen has been issued by the user. Inform the underlying TCP
+ * stack that it should accept connections on this socket. A stack that
+ * lacks incoming connection support can error instead
+ */
+int net_listen(struct socket *s)
+{
+ return netn_synchronous_event(s, SS_LISTENING);
+}
+
+/*
+ * A connect has been issued by the user. This message tells the
+ * stack to begin connecting. It should put the socket state into
+ * SS_CONNECTING before returning, or it can error.
+ */
+int net_connect(struct socket *s)
+{
+ return netn_synchronous_event(s, SS_CONNECTING);
+}
+
+/*
+ * A socket is being closed by the user. Move the socket into a
+ * closed state and free the resources used. If the underlying
+ * implementation has longer lived resources (eg a TCP port moving
+ * into TIME_WAIT) then the socket and internal resources must be
+ * disconnected from one another.
+ */
+void net_close(struct socket *s)
+{
+ /* Caution here - the native tcp socket will hang around longer */
+ netn_synchronous_event(s, SS_CLOSED);
+}
+
+/*
+ * Read or recvfrom a socket. We don't yet handle message addresses
+ * sensibly and that needs fixing
+ */
+arg_t net_read(struct socket *s, uint8_t flag)
+{
+	uint16_t n = 0;
+	struct sockdata *sd = sockdata + (s - sockets);
+
+ if (sd->err) {
+ udata.u_error = sd->err;
+ sd->err = 0;
+ return -1;
+ }
+ while (1) {
+ if (s->s_state < SS_CONNECTED) {
+ udata.u_error = EINVAL;
+ return -1;
+ }
+
+ if (s->s_type != SOCKTYPE_TCP)
+ n = netn_getbuf(s);
+ else
+ n = netn_copyout(s);
+ if (n == 0xFFFF)
+ return -1;
+ if (n)
+ return n;
+ s->s_iflag &= ~SI_DATA;
+ /* Could do with using timeouts here to be clever for non O_NDELAY so
+ we aggregate data. For now assume a fifo */
+ if (psleep_flags(&s->s_iflag, flag))
+ return -1;
+ }
+}
+
+/*
+ * Write or sendto a socket. We don't yet handle message addresses
+ * sensibly and that needs fixing
+ */
+arg_t net_write(struct socket * s, uint8_t flag)
+{
+	uint16_t n = 0, t = 0;
+	struct sockdata *sd = sockdata + (s - sockets);
+
+ if (sd->err) {
+ udata.u_error = sd->err;
+ sd->err = 0;
+ return -1;
+ }
+
+ while (t < udata.u_count) {
+ if (s->s_state == SS_CLOSED) {
+ udata.u_error = EPIPE;
+ ssig(udata.u_ptab, SIGPIPE);
+ return -1;
+ }
+ if (s->s_type != SOCKTYPE_TCP)
+ n = netn_putbuf(s);
+ else
+ n = netn_queuebytes(s);
+ /* FIXME: buffer the error in this case */
+ if (n == 0xFFFF)
+			return t ? t : -1;
+
+ t += n;
+
+ if (n == 0) { /* Blocked */
+			netn_asynchronous_event(s, NEVW_WRITE);
+ if (psleep_flags(&s->s_iflag, flag))
+ return -1;
+ }
+ }
+ return udata.u_count;
+}
+
+/* Gunk we are still making up */
+struct netdevice net_dev = {
+ 0,
+ "net0",
+ IFF_POINTOPOINT
+};
+
+arg_t net_ioctl(uint8_t op, void *p)
+{
+ used(op);
+ used(p);
+ return -EINVAL;
+}
+
+void netdev_init(void)
+{
+}
+
+uint8_t use_net_r(void)
+{
+ return 1;
+}
+
+uint8_t use_net_w(void)
+{
+ return 1;
+}