doc/ego/ic/ic2

   1 .NH 2
   2 Representation of complex data structures in a sequential file
   3 .PP
   4 Most programmers are quite used to deal with
   5 complex data structures, such as
   6 arrays, graphs and trees.
   7 There are some particular problems that occur
   8 when storing such a data structure
   9 in a sequential file.
  10 We call data that is kept in
  11 main memory
  12 .UL internal
  13 ,as opposed to
  14 .UL external
  15 data
  16 that is kept in a file outside the program.
  17 .sp
  18 We assume a simple data structure of a
  19 scalar type (integer, floating point number)
  20 has some known external representation.
  21 An
  22 .UL array
  23 having elements of a scalar type can be represented
  24 externally easily, by successively
  25 representing its elements.
  26 The external representation may be preceded by a
  27 number, giving the length of the array.
  28 Now, consider a linear, singly linked list,
  29 the elements of which look like:
  30 .DS
  31 record
  32         data: scalar_type;
  33         next: pointer_type;
  34 end;
  35 .DE
  36 It is significant to note that the "next"
  37 fields of the elements only have a meaning within
  38 main memory.
  39 The field contains the address of some location in
  40 main memory.
  41 If a list element is written to a file in
  42 some program,
  43 and read by another program,
  44 the element will be allocated at a different
  45 address in main memory.
  46 Hence this address value is completely
  47 useless outside the program.
  48 .sp
  49 One may represent the list by ignoring these "next" fields
  50 and storing the data items in the order they are linked.
  51 The "next" fields are represented \fIimplicitly\fR.
  52 When the file is read again,
  53 the same list can be reconstructed.
  54 In order to know where the external representation of the
  55 list ends,
  56 it may be useful to put the length of
  57 the list in front of it.
  58 .sp
  59 Note that arrays and linear lists have the
  60 same external representation.
  61 .PP
  62 A doubly linked, linear list,
  63 with elements of the type:
  64 .DS
  65 record
  66         data: scalar_type;
  67         next,
  68         previous: pointer_type;
  69 end
  70 .DE
  71 can be represented in precisely the same way.
  72 Both the "next" and the "previous" fields are represented
  73 implicitly.
  74 .PP
  75 Next, consider a binary tree,
  76 the nodes of which have type:
  77 .DS
  78 record
  79         data: scalar_type;
  80         left,
  81         right: pointer_type;
  82 end
  83 .DE
  84 Such a tree can be represented sequentially,
  85 by storing its nodes in some fixed order, e.g. prefix order.
  86 A special null data item may be used to
  87 denote a missing left or right son.
  88 For example, let the scalar type be integer,
  89 and let the null item be 0.
  90 Then the tree of fig. 3.1(a)
  91 can be represented as in fig. 3.1(b).
  92 .DS
  93 .ft 5
  94                         4
  95                       /   \e
  96                     9      12
  97                   /  \e    /  \e
  98                 12    3   4   6
  99                      / \e  \e  /
 100                      8  1  5 1
 101 .ft R
 102
 103 Fig. 3.1(a) A binary tree
 104
 105
 106 .ft 5
 107 4 9 12 0 0 3 8 0 0 1 0 0 12 4 0 5 0 0 6 1 0 0 0
 108 .ft R
 109
 110 Fig. 3.1(b) Its sequential representation
 111 .DE
 112 We are still able to represent the pointer fields ("left"
 113 and "right") implicitly.
 114 .PP
 115 Finally, consider a general
 116 .UL graph
 117 , where each node has a "data" field and
 118 pointer fields,
 119 with no restriction on where they may point to.
 120 Now we're at the end of our tale.
 121 There is no way to represent the pointers implicitly,
 122 like we did with lists and trees.
 123 In order to represent them explicitly,
 124 we use the following scheme.
 125 Every node gets an extra field,
 126 containing some unique number that identifies the node.
 127 We call this number its
 128 .UL id.
 129 A pointer is represented externally as the id of the node
 130 it points to.
 131 When reading the file we use a table that maps
 132 an id to the address of its node.
 133 In general this table will not be completely filled in
 134 until we have read the entire external representation of
 135 the graph and allocated internal memory locations for
 136 every node.
 137 Hence we cannot reconstruct the graph in one scan.
 138 That is, there may be some pointers from node A to B,
 139 where B is placed after A in the sequential file than A.
 140 When we read the node of A we cannot map the id of B
 141 to the address of node B,
 142 as we have not yet allocated node B.
 143 We can overcome this problem if the size
 144 of every node is known in advance.
 145 In this case we can allocate memory for a node
 146 on first reference.
 147 Else, the mapping from id to pointer
 148 cannot be done while reading nodes.
 149 The mapping can be done either in an extra scan
 150 or at every reference to the node.