2 Representation of complex data structures in a sequential file
4 Most programmers are quite used to deal with
5 complex data structures, such as
6 arrays, graphs and trees.
7 There are some particular problems that occur
8 when storing such a data structure
10 We call data that is kept in
16 that is kept in a file outside the program.
18 We assume a simple data structure of a
19 scalar type (integer, floating point number)
20 has some known external representation.
23 having elements of a scalar type can be represented
24 externally easily, by successively
25 representing its elements.
26 The external representation may be preceded by a
27 number, giving the length of the array.
28 Now, consider a linear, singly linked list,
29 the elements of which look like:
36 It is significant to note that the "next"
37 fields of the elements only have a meaning within
39 The field contains the address of some location in
41 If a list element is written to a file in
43 and read by another program,
44 the element will be allocated at a different
45 address in main memory.
46 Hence this address value is completely
47 useless outside the program.
49 One may represent the list by ignoring these "next" fields
50 and storing the data items in the order they are linked.
51 The "next" fields are represented \fIimplicitly\fR.
52 When the file is read again,
53 the same list can be reconstructed.
54 In order to know where the external representation of the
56 it may be useful to put the length of
57 the list in front of it.
59 Note that arrays and linear lists have the
60 same external representation.
62 A doubly linked, linear list,
63 with elements of the type:
68 previous: pointer_type;
71 can be represented in precisely the same way.
72 Both the "next" and the "previous" fields are represented
75 Next, consider a binary tree,
76 the nodes of which have type:
84 Such a tree can be represented sequentially,
85 by storing its nodes in some fixed order, e.g. prefix order.
86 A special null data item may be used to
87 denote a missing left or right son.
88 For example, let the scalar type be integer,
89 and let the null item be 0.
90 Then the tree of fig. 3.1(a)
91 can be represented as in fig. 3.1(b).
103 Fig. 3.1(a) A binary tree
107 4 9 12 0 0 3 8 0 0 1 0 0 12 4 0 5 0 0 6 1 0 0 0
110 Fig. 3.1(b) Its sequential representation
112 We are still able to represent the pointer fields ("left"
113 and "right") implicitly.
115 Finally, consider a general
117 , where each node has a "data" field and
119 with no restriction on where they may point to.
120 Now we're at the end of our tale.
121 There is no way to represent the pointers implicitly,
122 like we did with lists and trees.
123 In order to represent them explicitly,
124 we use the following scheme.
125 Every node gets an extra field,
126 containing some unique number that identifies the node.
127 We call this number its
129 A pointer is represented externally as the id of the node
131 When reading the file we use a table that maps
132 an id to the address of its node.
133 In general this table will not be completely filled in
134 until we have read the entire external representation of
135 the graph and allocated internal memory locations for
137 Hence we cannot reconstruct the graph in one scan.
138 That is, there may be some pointers from node A to B,
139 where B is placed after A in the sequential file than A.
140 When we read the node of A we cannot map the id of B
141 to the address of node B,
142 as we have not yet allocated node B.
143 We can overcome this problem if the size
144 of every node is known in advance.
145 In this case we can allocate memory for a node
147 Else, the mapping from id to pointer
148 cannot be done while reading nodes.
149 The mapping can be done either in an extra scan
150 or at every reference to the node.