From: Alan Cox
Date: Fri, 19 Feb 2016 23:02:01 +0000 (+0000)
Subject: networking: first cut at the native TCP/IP stack interface
X-Git-Url: https://git.ndcode.org/public/gitweb.cgi?a=commitdiff_plain;h=5d0a742ba68e58589e702e953775644650184054;p=FUZIX.git

networking: first cut at the native TCP/IP stack interface

This is far from finished or even building yet, but now has the right sort
of form. For the user space side the current plan is to plug it into
something like uIP, LwIP or Harry Kalogirou's ktcp for ELKS. In theory the
API is such that beyond "what fits" it shouldn't matter whether we bind a
teeny/tiny stack like A J Kroll's microcontroller TCP/IP or something a bit
more refined.
---

diff --git a/Kernel/README.NET b/Kernel/README.NET
new file mode 100644
index 00000000..ab1accd8
--- /dev/null
+++ b/Kernel/README.NET
@@ -0,0 +1,156 @@
+Network Interface
+
+There are two layers to the network interface. syscall_net implements something
+vaguely representative of the BSD sockets API, which libc wraps to look
+much more like it should.
+
+The basic state is managed by the syscall_net socket layer and state
+transitions and events cause calls into the network layer implementation and
+from the implementation back up to the syscall_net.c code.
+
+The goal of this separation is to support
+
+- A native TCP/IP stack running in userspace
+
+- The WizNET modules with an onboard TCP and 32K of buffers. These are ideal
+  for things like ethernet on small microcontrollers.
+
+- Drivewire 4, which supports multiple incoming and outgoing TCP connections
+  over the DriveWire link and the Drivewire 4 server.
+
+- Emulator TCP/IP host stack interfaces such as those in uQLX, QPC2 etc
+
+- A single socket fake AT interface as provided by some emulators' modem
+  emulation. This limits us to a single TCP connection, and requires we
+  remember to do DNS lookups before opening the data socket.
+
+It cannot support devices that provide multiple TCP/IP data links but are
+unable to flow control or error recover the link properly (eg the ESP8266
+standard firmware). It ought to be possible to write replacement ESP8266
+firmware to provide a sane serial interface.
+
+Native Stack
+
+The native stack consists of a kernel driver bound to syscall_net, a user mode
+daemon and a backing file (as implemented today, although for bigger machines
+that could become a memory buffer).
+
+Thus socket(), connect() and friends execute in the context of the originating
+process. They cause the syscall_net layer to make socket transitions and these
+transitions along with some other needed messages are passed back and forth
+between the daemon and the application.
+
+The backing file is used to avoid the problem of the kernel needing large
+amounts of buffer memory for half-usable networking. Instead the kernel and
+daemon communicate by copying data to and from a single large file that backs
+the networking state. In the fast path the data is effectively moved between
+the daemon and application using the disk cache as the buffers. However if
+more buffering is needed it will spill to hard disc and all will be good.
+
+Currently the interface and syscall_net layer have hardwired assumptions about
+TCP/IP that need pushing into the low level implementations so that other
+protocols can also be supported.
+
+Note: accepting incoming connections is not yet handled. This will add some
+new messages.
+
+Native Stack Messages
+
+The read() call receives the next message from the kernel. The kernel passes
+events by providing the daemon with a copy of the relevant socket data
+structures, including the event masks.
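The wire format of these messages is not spelled out in this patch; pieced together from how net_native.c touches the fields (ne.event, ne.socket, ne.data, ne.ret, ne.info), the daemon-to-kernel message looks roughly like the sketch below. The field widths and the size of info are assumptions for illustration, not the committed layout.

```c
#include <stdint.h>

/* Hypothetical layout of the netevent message the daemon write()s to the
   kernel device, inferred from field usage in net_native.c; the widths
   and the 16 byte info payload are guesses, not the committed layout */
struct netevent_sketch {
    uint8_t event;      /* NE_NEWSTATE, NE_EVENT, NE_SETADDR, NE_INIT,
                           NE_ROOM or NE_DATA */
    uint8_t socket;     /* index into the kernel socket table */
    uint16_t data;      /* new state, lcn, or ring pointer by event type */
    uint8_t ret;        /* error code handed back to the blocked process */
    uint8_t info[16];   /* payload: an address, or per-buffer lengths */
};
```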
+
+The events the kernel sends are
+
+NEVW_STATE:	A state change is pending, eg a bind or connect() call.
+		sd->newstate holds the requested new socket state.
+
+		For moves to SS_UNCONNECTED the daemon responds with
+		an NE_INIT message giving the lcn (logical channel number)
+		it wishes to associate with the socket in ne.data and any
+		error code in ne.ret.
+
+		For other state changes it acknowledges them by responding
+		with NE_NEWSTATE and the desired new state. ne.ret again
+		holds the error code if failing or 0 if good.
+
+NEVW_READ:	This is sent when the user application has read data from
+		the socket and there is more room in the buffer. No reply
+		is required.
+
+NEVW_WRITE:	This is sent when the user application has written data to
+		the socket and there is more data in the buffer. No reply
+		is needed.
+
+
+The daemon sends the following events asynchronously by using the write()
+system call
+
+NE_EVENT:	An asynchronous state change has occurred, for example a
+		socket being reset or connecting. This does not clear any
+		wait for a synchronous event change. If an event wait is
+		pending then a further read will report the fact and the
+		socket can, if need be, be sent a NEVW_STATE giving the
+		states. That is needed as NE_EVENT can cross NEVW_STATE
+		messages.
+
+NE_SETADDR:	Set the kernel view of one of the address structures. This
+		is used when a connection is made, for example.
+
+		FIXME: we don't yet handle attaching addresses to datagrams
+		in UDP unconnected mode.
+
+NE_ROOM:	Inform the kernel that more space now exists. ne.data is the
+		new tbuf pointer indicating what was consumed.
+
+NE_DATA:	More data has arrived, ne.data is the new rnext pointer.
+
+The actual data is held in the buffer file. For UDP we keep it simple and
+put one packet in each buffer (1K max packet size currently supported). For
+TCP we treat the buffer space as a 4K RX and a 4K TX buffer.
The daemon knows
+as the user eats data so it can ACK correctly, but also it knows the range
+that is free so it can use the buffer as an out of order cache for the socket.
+
+Implementing For Your Platform
+
+Firstly pick the needed stack.
+
+The drivers/net/net_at implementation is a pretty minimal fake socket
+connection for an AT modem style interface. It's really only intended for
+debugging but can be used if there are no better choices. Note the FIXMEs
+around +++ however.
+
+The native stack is currently very much a work in progress.
+
+I have sketched out DriveWire4 and WizNet stacks but they are far from
+complete or tested. If you are interested in finishing the job let me know.
+
+
+TODO
+
+- Push socket type checking into the implementation so we can handle non TCP/IP
+  stacks. That impacts how we store addresses so we may need a size ifdef or
+  union to avoid burdening small boxes with full address objects.
+
+- Attach datagrams to an address. Might reduce our UDP datagram size a bit
+  below 1K - do we care? Needed on send and receive.
+
+- Allow platform to set sizes.
+
+- Implement binding ioctls, exit clean up, test of compatible struct and consts.
+
+- Need to sort out the send/recv interface and address setting.
+
+- Push "local" address check out of syscall_net.
+
+- net_ hook for configuration ioctls or somesuch?
+
+- Libc needs a resolver and all the networking helper stuff BSD expects.
+
+- Could we consume and undirty buffers with a slightly naughty readi hook?
+
+- shutdown()
+
+- socket options
+
+- recvmsg/sendmsg (maybe - silly functions!)
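As a rough illustration of the reply rules described above, the core decision in a daemon's event handler could look like the sketch below. The structures are cut-down local stand-ins rather than the kernel's real ones, and the SS_UNCONNECTED value is an assumption; only the decision logic (NE_INIT for moves to SS_UNCONNECTED, NE_NEWSTATE for other state changes, no reply for NEVW_READ/NEVW_WRITE) comes from the description above.

```c
#include <stdint.h>
#include <string.h>

/* Cut-down stand-ins for the kernel structures; field subset and the
   SS_UNCONNECTED value are assumptions for illustration only */
#define NEVW_STATE	1
#define NE_NEWSTATE	1
#define NE_INIT		4
#define SS_UNCONNECTED	3

struct sockmsg_view {
	uint8_t socket;		/* socket table index */
	uint8_t event;		/* NEVW_* mask from the kernel */
	uint8_t newstate;	/* requested state when NEVW_STATE is set */
};

struct netevent_view {
	uint8_t event;		/* NE_* reply code */
	uint8_t socket;
	uint16_t data;		/* lcn or acknowledged state */
	uint8_t ret;		/* 0 on success, else an errno value */
};

/* Decide the reply for one kernel event: a move to SS_UNCONNECTED gets
   NE_INIT carrying the chosen lcn, any other state change is acknowledged
   with NE_NEWSTATE. Returns 1 if a reply should be written back. */
int daemon_handle(const struct sockmsg_view *m, uint8_t lcn,
		  struct netevent_view *out)
{
	if (!(m->event & NEVW_STATE))
		return 0;	/* NEVW_READ / NEVW_WRITE need no reply */
	memset(out, 0, sizeof(*out));
	out->socket = m->socket;
	if (m->newstate == SS_UNCONNECTED) {
		out->event = NE_INIT;
		out->data = lcn;	/* logical channel for this socket */
	} else {
		out->event = NE_NEWSTATE;
		out->data = m->newstate;
	}
	out->ret = 0;
	return 1;
}
```

A real daemon would wrap this in a loop that read()s sockmsg snapshots from the network device and write()s the replies back.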
diff --git a/Kernel/dev/net/net_native.c b/Kernel/dev/net/net_native.c
new file mode 100644
index 00000000..51fb1a3b
--- /dev/null
+++ b/Kernel/dev/net/net_native.c
@@ -0,0 +1,554 @@
+#include <kernel.h>
+#include <kdata.h>
+#include <printf.h>
+#include <netdev.h>
+#include <net_native.h>
+
+/* This holds the additional kernel context for the sockets */
+struct sockdata sockdata[NSOCKET];
+
+/* Message buffer shared with the daemon. Also the object the daemon
+   sleeps on in netdev_read and that event raisers wake */
+static struct netevent ne;
+
+static void wakeup_all(struct socket *s)
+{
+	wakeup(s);
+	wakeup(&s->s_data);
+	wakeup(&s->s_iflag);
+}
+
+/*
+ *	The daemon has sent us a message. Process the message. Note that we
+ *	don't have any interrupts in this stack because everything happens
+ *	in a process context of some kind
+ *
+ *	Is it worth having a single call take a batch of events ?
+ */
+int netdev_write(void)
+{
+	struct socket *s;
+	struct sockdata *sd;
+
+	/* Grab a message from the service daemon */
+	if (net_ino == NULL || udata.u_count != sizeof(ne) ||
+	    uget(udata.u_base, &ne, sizeof(ne)) == -1 ||
+	    ne.socket >= NSOCKET) {
+		udata.u_error = EINVAL;
+		return -1;
+	}
+
+	s = sockets + ne.socket;
+	sd = sockdata + ne.socket;
+
+	switch (ne.event) {
+	/* State change. Wakes up the socket having moved state */
+	case NE_NEWSTATE:
+		s->s_state = ne.data;
+		sd->ret = ne.ret;
+		/* A synchronous state change has completed */
+		sd->event &= ~NEVW_STATEW;
+		/* Review select impact for this v wakeup_all */
+		wakeup(s);
+		break;
+	/* Asynchronous state changing event */
+	case NE_EVENT:
+		s->s_state = ne.data;
+		s->s_err = ne.ret;
+		wakeup_all(s);
+		break;
+	/* Change an address */
+	case NE_SETADDR:
+		if (ne.data < 3)
+			memcpy(&s->s_addr[ne.data], ne.info,
+			       sizeof(struct sockaddrs));
+		break;
+	/* Response to creating a socket.
Initialize lcn */
+	case NE_INIT:
+		s->s_state = SS_UNCONNECTED;
+		s->s_lcn = ne.data;
+		sd->event = 0;
+		sd->ret = ne.ret;
+		sd->err = 0;
+		sd->rbuf = sd->rnext = 0;
+		sd->tbuf = sd->tnext = 0;
+		wakeup(s);
+		break;
+	/* Indicator of write room from the network agent */
+	case NE_ROOM:
+		sd->tbuf = ne.data;
+		wakeup(&s->s_iflag);
+		break;
+	/* Indicator of data from the network agent */
+	case NE_DATA:
+		sd->rnext = ne.data;	/* More data available */
+		memcpy(sd->rlen, ne.info,
+		       sizeof(uint16_t) * NSOCKBUF);
+		s->s_iflag |= SI_DATA;
+		break;
+	default:
+		kprintf("netbad %d\n", ne.event);
+		udata.u_error = EOPNOTSUPP;
+		return -1;
+	}
+	return udata.u_count;
+}
+
+/* When events are pending we simply hand all the structs to the server
+   as copies. It can then make any decisions it needs to make */
+static int netdev_report(struct sockdata *sd)
+{
+	uint8_t sn = sd - sockdata;
+	struct socket *s = sockets + sn;
+
+	if (uput(sd, udata.u_base, sizeof(*sd)) == -1 ||
+	    uput(s, udata.u_base + sizeof(*sd), sizeof(*s)) == -1)
+		return -1;
+	sd->event &= ~NEVW_MASK;
+	return udata.u_count;
+}
+
+/*
+ *	Scan the socket table for any socket with a pending event. Shovel
+ *	the first one we find at the daemon. We should possibly round-robin
+ *	these but it's not clear it's that important
+ */
+int netdev_read(uint8_t flag)
+{
+	if (net_ino == NULL || udata.u_count != sizeof(struct sockmsg)) {
+		udata.u_error = EINVAL;
+		return -1;
+	}
+	while (1) {
+		struct sockdata *sd = sockdata;
+		while (sd != sockdata + NSOCKET) {
+			if (sd->event)
+				return netdev_report(sd);
+			sd++;
+		}
+		if (psleep_flags(&ne, flag))
+			return -1;
+	}
+}
+
+/*
+ *	The ioctl interface at the moment is simply the initialization
+ *	function.
+ */
+static int netdev_ioctl(uarg_t request, char *data)
+{
+	int16_t fd;
+
+	switch (request) {
+	/* Daemon starting up, passing file handle of cache */
+	/* FIXME: Check sizes etc are valid via some kind of
+	   passed magic hash */
+	case NET_INIT:
+		if (net_ino) {
+			udata.u_error = EBUSY;
+			return -1;
+		}
+		fd = ugetw(data);
+		if ((net_ino = getinode(fd)) == NULLINODE)
+			return -1;
+		i_ref(net_ino);
+		return 0;
+	}
+	udata.u_error = EINVAL;
+	return -1;
+}
+
+/*
+ *	On a close of the daemon close down all the sockets we
+ *	have opened.
+ */
+static int netdev_close(void)
+{
+	struct socket *s = sockets;
+	if (net_ino) {
+		i_deref(net_ino);
+		net_ino = NULL;
+		while (s < sockets + NSOCKET) {
+			if (s->s_state != SS_UNUSED) {
+				s->s_state = SS_CLOSED;
+				wakeup_all(s);
+			}
+			s++;
+		}
+	}
+	return 0;
+}
+
+/*
+ *	We have received an event from userspace that requires us to wait
+ *	until the network stack performs the relevant state change. Pass
+ *	the wanted new state on to the daemon, then wait until our STATEW
+ *	flag is cleared by a suitable message.
+ */
+static int netn_synchronous_event(struct socket *s, uint8_t state)
+{
+	uint8_t sn = s - sockets;
+	struct sockdata *sd = &sockdata[sn];
+
+	sd->event |= NEVW_STATE | NEVW_STATEW;
+	sd->newstate = state;
+	wakeup(&ne);
+
+	do {
+		psleep(s);
+	} while (sd->event & NEVW_STATEW);
+
+	udata.u_error = sd->ret;
+	if (udata.u_error)
+		return -1;
+	return 0;
+}
+
+/*
+ *	Flag an unsolicited event to the daemon. These are used to
+ *	handshake the buffer status.
+ */
+static void netn_asynchronous_event(struct socket *s, uint8_t event)
+{
+	uint8_t sn = s - sockets;
+	struct sockdata *sd = &sockdata[sn];
+	sd->event |= event;
+	wakeup(&ne);
+}
+
+/*
+ *	Queue data to a stream socket. We use the entire buffer space
+ *	available as a ring buffer and write bytes to it. We then update
+ *	our pointer and poke the daemon to send stuff.
+ */
+static uint16_t netn_queuebytes(struct socket *s)
+{
+	arg_t n = udata.u_count;
+	arg_t r = 0;
+	uint8_t sn = s - sockets;
+	struct sockdata *sd = &sockdata[sn];
+	uint16_t spc;
+
+	/* Do we have room ? */
+	if (sd->tnext == sd->tbuf)
+		return 0;
+
+	udata.u_sysio = false;
+	udata.u_offset = sn * SOCKBUFOFF + RXBUFOFF + sd->tnext;
+
+	/* Wrapped part of the ring buffer */
+	if (n && sd->tnext > sd->tbuf) {
+		/* Write into the end space */
+		spc = TXBUFSIZ - sd->tnext;
+		if (spc > n)
+			spc = n;
+		udata.u_count = spc;
+		/* FIXME: check writei returns and readi returns properly */
+		writei(net_ino, 0);
+		if (udata.u_error)
+			return 0xFFFF;
+		sd->tnext += spc;
+		n -= spc;
+		r = spc;
+		/* And wrap */
+		if (sd->tnext == TXBUFSIZ)
+			sd->tnext = 0;
+	}
+	/* If we are not wrapped or just did the overflow write lower */
+	if (n) {
+		spc = sd->tbuf - sd->tnext;
+		if (spc > n)
+			spc = n;
+		udata.u_count = spc;
+		udata.u_offset = sn * SOCKBUFOFF + RXBUFOFF + sd->tnext;
+
+		/* FIXME: check writei returns and readi returns properly */
+		writei(net_ino, 0);
+		if (udata.u_error)
+			return 0xFFFF;
+		sd->tnext += spc;
+		r += spc;
+	}
+	/* Tell the network daemon there is more data in the ring */
+	netn_asynchronous_event(s, NEVW_WRITE);
+	return r;
+}
+
+/*
+ *	Queue data to a datagram socket. At the moment we use the ring
+ *	as a set of fixed sized buffers. That may want changing. We do
+ *	however need to work out how to pass an address and size header
+ *	in the buffers, while still getting the ring behaviour right if
+ *	we changed this as well as avoiding partial writes of a datagram.
+ *
+ * FIXME: we need to attach an address to getbuf/putbuf cases because we
+ * may be using sendto/recvfrom
+ */
+static uint16_t netn_putbuf(struct socket *s)
+{
+	uint8_t sn = s - sockets;
+	struct sockdata *sd = &sockdata[sn];
+
+	if (udata.u_count > TXPKTSIZ) {
+		udata.u_error = EMSGSIZE;
+		return 0xFFFF;
+	}
+	if (sd->tnext == sd->tbuf)
+		return 0;
+
+	udata.u_sysio = false;
+	udata.u_offset = sn * SOCKBUFOFF + RXBUFOFF + sd->tnext * TXPKTSIZ;
+	/* FIXME: check writei returns and readi returns properly */
+	writei(net_ino, 0);
+	if (udata.u_error)
+		return 0xFFFF;
+	sd->tlen[sd->tnext++] = udata.u_count;
+	if (sd->tnext == NSOCKBUF)
+		sd->tnext = 0;
+	/* Tell the network stack there is another buffer to consume */
+	netn_asynchronous_event(s, NEVW_WRITE);
+	return udata.u_count;
+}
+
+/*
+ *	Pull a packet from the receive buffer. We fetch the next ring buffer
+ *	slot and then copy as much as is required into the user buffer. This
+ *	side also needs to handle addressing better, and may make sense to
+ *	use the ring buffer packing. Once done we poke the daemon so it knows
+ *	space is freed.
+ */
+static uint16_t netn_getbuf(struct socket *s)
+{
+	uint8_t sn = s - sockets;
+	struct sockdata *sd = &sockdata[sn];
+
+	if (sd->rbuf == sd->rnext)
+		return 0;
+	udata.u_sysio = false;
+	udata.u_offset = sn * SOCKBUFOFF + sd->rbuf * RXPKTSIZ;
+	udata.u_count = min(udata.u_count, sd->rlen[sd->rbuf]);
+	/* FIXME: check writei returns and readi returns properly */
+	readi(net_ino, 0);
+	/* FIXME: be smarter when we send this */
+	if (++sd->rbuf == NSOCKBUF)
+		sd->rbuf = 0;
+	netn_asynchronous_event(s, NEVW_READ);
+	return udata.u_count;
+}
+
+/*
+ *	Pull bytes from the receive ring buffer. We copy as many bytes as
+ *	we can to fulfill the user request. Short reads are acceptable if
+ *	the buffer contains some data but not enough.
+ * After reading we tell the daemon and it will adjust the TCP window
+ * and send an ack frame as appropriate as well as adjusting its
+ * copy of the ring state
+ */
+static uint16_t netn_copyout(struct socket *s)
+{
+	arg_t n = udata.u_count;
+	arg_t r = 0;
+	uint8_t sn = s - sockets;
+	struct sockdata *sd = &sockdata[sn];
+	uint16_t spc;
+
+	if (sd->rnext == sd->rbuf)
+		return 0;
+
+	udata.u_sysio = false;
+	udata.u_offset = sn * SOCKBUFOFF + sd->rbuf;
+
+	/* Wrapped part of the ring buffer */
+	if (n && sd->rnext < sd->rbuf) {
+		/* Read from the end space */
+		spc = RXBUFSIZ - sd->rbuf;
+		if (spc > n)
+			spc = n;
+		udata.u_count = spc;
+		/* FIXME: check writei returns and readi returns properly */
+		readi(net_ino, 0);
+		if (udata.u_error)
+			return 0xFFFF;
+		sd->rbuf += spc;
+		n -= spc;
+		r = spc;
+		/* And wrap */
+		if (sd->rbuf == RXBUFSIZ)
+			sd->rbuf = 0;
+	}
+	/* If we are not wrapped or just did the overflow read lower */
+	if (n) {
+		spc = sd->rnext - sd->rbuf;
+		if (spc > n)
+			spc = n;
+		udata.u_count = spc;
+		udata.u_offset = sn * SOCKBUFOFF + sd->rbuf;
+		/* FIXME: check writei returns and readi returns properly */
+		readi(net_ino, 0);
+		if (udata.u_error)
+			return 0xFFFF;
+		sd->rbuf += spc;
+		r += spc;
+	}
+	/* Tell the network daemon there is more room in the ring */
+	/* FIXME: be smarter when we send this */
+	netn_asynchronous_event(s, NEVW_READ);
+	return r;
+}
+
+/*
+ *	Called from the core network layer when a socket is being
+ *	allocated. We can either move the socket to SS_UNCONNECTED,
+ *	or error. In our case the daemon will reply with an NE_INIT,
+ *	or a state change to set an error.
+ *
+ *	This call is blocking but the BSD socket API users don't expect
+ *	anything to block for long. Blocking here is however needed because
+ *	some of the stacks (this one included) are asynchronous to the
+ *	OS.
+ */
+int net_init(struct socket *s)
+{
+	if (!net_ino) {
+		udata.u_error = ENETDOWN;
+		return -1;
+	}
+	return netn_synchronous_event(s, SS_UNCONNECTED);
+}
+
+/*
+ *	A bind has occurred.
This might be a user triggering a bind but it
+ *	could also be an autobind.
+ *
+ *	FIXME: distinguish bind and autobind so we can push address picking
+ *	into the stack implementation to cover non IP stacks
+ */
+int net_bind(struct socket *s)
+{
+	return netn_synchronous_event(s, SS_BOUND);
+}
+
+/*
+ *	A listen has been issued by the user. Inform the underlying TCP
+ *	stack that it should accept connections on this socket. A stack that
+ *	lacks incoming connection support can error instead
+ */
+int net_listen(struct socket *s)
+{
+	return netn_synchronous_event(s, SS_LISTENING);
+}
+
+/*
+ *	A connect has been issued by the user. This message tells the
+ *	stack to begin connecting. It should put the socket state into
+ *	SS_CONNECTING before returning, or it can error.
+ */
+int net_connect(struct socket *s)
+{
+	return netn_synchronous_event(s, SS_CONNECTING);
+}
+
+/*
+ *	A socket is being closed by the user. Move the socket into a
+ *	closed state and free the resources used. If the underlying
+ *	implementation has longer lived resources (eg a TCP port moving
+ *	into TIME_WAIT) then the socket and internal resources must be
+ *	disconnected from one another.
+ */
+void net_close(struct socket *s)
+{
+	/* Caution here - the native tcp socket will hang around longer */
+	netn_synchronous_event(s, SS_CLOSED);
+}
+
+/*
+ *	Read or recvfrom a socket. We don't yet handle message addresses
+ *	sensibly and that needs fixing
+ */
+arg_t net_read(struct socket *s, uint8_t flag)
+{
+	uint16_t n = 0;
+	struct sockdata *sd = &sockdata[s - sockets];
+
+	if (sd->err) {
+		udata.u_error = sd->err;
+		sd->err = 0;
+		return -1;
+	}
+	while (1) {
+		if (s->s_state < SS_CONNECTED) {
+			udata.u_error = EINVAL;
+			return -1;
+		}
+
+		if (s->s_type != SOCKTYPE_TCP)
+			n = netn_getbuf(s);
+		else
+			n = netn_copyout(s);
+		if (n == 0xFFFF)
+			return -1;
+		if (n)
+			return n;
+		s->s_iflag &= ~SI_DATA;
+		/* Could do with using timeouts here to be clever for non
+		   O_NDELAY so we aggregate data.
For now assume a fifo */
+		if (psleep_flags(&s->s_iflag, flag))
+			return -1;
+	}
+}
+
+/*
+ *	Write or sendto a socket. We don't yet handle message addresses
+ *	sensibly and that needs fixing
+ */
+arg_t net_write(struct socket *s, uint8_t flag)
+{
+	uint16_t n = 0, t = 0;
+	struct sockdata *sd = &sockdata[s - sockets];
+
+	if (sd->err) {
+		udata.u_error = sd->err;
+		sd->err = 0;
+		return -1;
+	}
+
+	while (t < udata.u_count) {
+		if (s->s_state == SS_CLOSED) {
+			udata.u_error = EPIPE;
+			ssig(udata.u_ptab, SIGPIPE);
+			return -1;
+		}
+		if (s->s_type != SOCKTYPE_TCP)
+			n = netn_putbuf(s);
+		else
+			n = netn_queuebytes(s);
+		/* FIXME: buffer the error in this case */
+		if (n == 0xFFFF)
+			return t ? t : -1;
+
+		t += n;
+
+		if (n == 0) {	/* Blocked: poke the daemon and wait for
+				   an NE_ROOM to free up buffer space */
+			netn_asynchronous_event(s, NEVW_WRITE);
+			if (psleep_flags(&s->s_iflag, flag))
+				return -1;
+		}
+	}
+	return udata.u_count;
+}
+
+/* Gunk we are still making up */
+struct netdevice net_dev = {
+	0,
+	"net0",
+	IFF_POINTOPOINT
+};
+
+arg_t net_ioctl(uint8_t op, void *p)
+{
+	used(op);
+	used(p);
+	return -EINVAL;
+}
+
+void netdev_init(void)
+{
+}
+
+uint8_t use_net_r(void)
+{
+	return 1;
+}
+
+uint8_t use_net_w(void)
+{
+	return 1;
+}
diff --git a/Kernel/dev/net/net_native.h b/Kernel/dev/net/net_native.h
new file mode 100644
index 00000000..4492cbf1
--- /dev/null
+++ b/Kernel/dev/net/net_native.h
@@ -0,0 +1,50 @@
+#ifndef _DEV_NET_NET_NATIVE_H
+#define _DEV_NET_NET_NATIVE_H
+
+#define NSOCKBUF 4		/* 4 buffers per socket */
+
+struct sockdata {
+	uint8_t err;
+	uint8_t ret;
+	uint8_t event;		/* Waiting events to go to user space */
+#define NEVW_STATE	1
+#define NEVW_READ	2
+#define NEVW_WRITE	4
+#define NEVW_MASK	7
+#define NEVW_STATEW	128
+	uint8_t newstate;	/* Requested new state */
+	uint16_t rlen[NSOCKBUF];	/* TCP uses 0 as total space */
+	uint8_t rbuf;
+	uint8_t rnext;
+	uint16_t tlen[NSOCKBUF];	/* Not used by TCP */
+	uint16_t tbuf;		/* Next transmit buffer (pointer for tcp) */
+	uint16_t tnext;		/* Buffers of room (bytes if TCP) */
+};
+
+struct sockmsg {
+	struct socket s;
+	struct sockdata sd;
+};
+
+#define NE_NEWSTATE	1
+#define NE_EVENT	2
+#define NE_SETADDR	3
+#define NE_INIT		4
+#define NE_ROOM		5
+#define NE_DATA		6
+
+/* The buffer file is divided per socket and each socket's space is
+
+   [RX.0][RX.1]..[RX.n][TX.0][TX.1]...[TX.n]
+
+*/
+
+#define NSOCKBUF 4		/* 4 buffers per socket */
+#define TXPKTSIZ 1024
+#define RXPKTSIZ 1024
+#define TXBUFSIZ (NSOCKBUF * TXPKTSIZ)
+#define RXBUFSIZ (NSOCKBUF * RXPKTSIZ)
+
+#define RXBUFOFF TXBUFSIZ
+#define SOCKBUFOFF (RXBUFOFF + RXBUFSIZ)
+
+/* Total size is thus 8K * sockets - typically 64K for the file */
+
+#endif
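The backing file geometry follows from the README: a 4K RX area and a 4K TX area per socket, 8K per socket in total. The sketch below checks that arithmetic; it defines its own local copies of the macros, and NSOCKET is taken as 8 purely to match the "typically 64K" remark, which is an assumption rather than anything fixed by the patch.

```c
#include <stdint.h>

/* Local copies of the buffer geometry macros for checking the offset
   arithmetic; NSOCKET = 8 is an assumption matching the 64K comment */
#define NSOCKBUF 4
#define TXPKTSIZ 1024
#define RXPKTSIZ 1024
#define TXBUFSIZ (NSOCKBUF * TXPKTSIZ)	/* 4K TX area per socket */
#define RXBUFSIZ (NSOCKBUF * RXPKTSIZ)	/* 4K RX area per socket */
#define RXBUFOFF TXBUFSIZ
#define SOCKBUFOFF (RXBUFOFF + RXBUFSIZ)	/* 8K per socket */
#define NSOCKET 8

/* Byte offset of socket sn's receive area in the backing file */
long rx_area(int sn)
{
	return (long)sn * SOCKBUFOFF;
}

/* Byte offset of socket sn's transmit area, which follows the RX area */
long tx_area(int sn)
{
	return (long)sn * SOCKBUFOFF + RXBUFOFF;
}

/* Total backing file size: 8K per socket */
long file_size(void)
{
	return (long)NSOCKET * SOCKBUFOFF;
}
```

These are the same offsets net_native.c computes into udata.u_offset before calling readi()/writei() on net_ino.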