Zettair can be compiled using two different build systems. The first
(and most commonly used) is GNU configure and make. The second is
Visual C++. Instructions on configuration, compilation and
installation using GNU configure and make are below. See the section
on Visual C++ below for instructions on compiling Zettair on Windows.

Note that in both cases, compilation of Zettair requires the presence
of the zlib compression library.

From inside the top-level directory created when you unpacked the
Zettair distribution, run the 'configure' script:

  $ ./configure

configure accepts a range of options. To get a list of available
options, run it with the '--help' flag:

  $ ./configure --help

The most useful option is --prefix, which specifies the directory
Zettair is going to be installed under. So, for instance, to install
Zettair in its own directory under '/usr/local', run:

  $ ./configure --prefix=/usr/local/zettair

The default Zettair installation directory is /usr/local. Make sure
to specify a directory into which you have, or can acquire, write
permission: for anything under /usr, this probably means you need to
be able to become the root user.

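If you are unsure whether a prefix is writable, a quick check before
running configure can save a failed 'make install' later. The sketch
below is only an illustration: prefix_writable is a helper name made
up for this example (it is not part of Zettair), and $HOME/zettair is
just a sample prefix.

```shell
#!/bin/sh
# Hypothetical helper, not part of Zettair: succeed if the current user
# could create files under the given prefix directory.
prefix_writable() {
    dir=$1
    # The prefix may not exist yet, so walk up to the nearest
    # ancestor that does.
    while [ ! -e "$dir" ]; do
        dir=$(dirname "$dir")
    done
    # That ancestor must be a directory we can write to.
    [ -d "$dir" ] && [ -w "$dir" ]
}

# Example prefix only; substitute the directory you plan to use.
if prefix_writable "$HOME/zettair"; then
    echo "writable: no root access needed for this prefix"
else
    echo "not writable: choose another prefix or install as root"
fi
```

Note that the root user passes this check for almost any directory,
so it is mainly useful when run as an ordinary user.
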
Once configure has finished running, the Zettair distribution is ready
to be compiled. Again from inside the top-level Zettair distribution
directory, run the 'make' program:

  $ make

After Zettair has been compiled, you can install it in the directory
you specified with the --prefix option to configure. To do this, run
make with the 'install' target:

  $ make install

Assuming that you selected '/usr/local/zettair' as your installation
prefix, this will create the directories '/usr/local/zettair/bin' and
'/usr/local/zettair/share'. The main Zettair executable, zet, and a
number of utilities will be installed in the former; configuration
files will be installed in the latter.

While it is possible to run Zettair directly from the directory
you compiled it in without performing an explicit installation,
this is not recommended, as you have to explicitly specify the
location of the configuration files each time you build an index.
If you want to run from the compilation directory, we suggest you
run configure with the argument '--prefix=.', then run 'make
install'.

To save yourself having to type the full path to the zet executable
every time you want to run it (such as '/usr/local/zettair/bin/zet'),
you might want to add Zettair's bin directory to your PATH. How you
do this depends on which shell you use. For instance, with the bash
shell, edit the file ~/.bash_profile, and add a line something like:

  PATH=/usr/local/zettair/bin:$PATH

This will probably not be necessary if you used the default prefix of
/usr/local.

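If your profile is read more than once (nested shells, for instance),
the one-line version adds the directory again each time. A slightly
more careful variant, offered as a sketch only, adds it just once.
ZETTAIR_BIN is a variable name chosen for this example, and the path
assumes the '/usr/local/zettair' prefix used above.

```shell
# Sketch for ~/.bash_profile: add Zettair's bin directory to PATH once.
# ZETTAIR_BIN is a name chosen for this example; adjust the path to
# match the prefix you gave to configure.
ZETTAIR_BIN=/usr/local/zettair/bin

case ":$PATH:" in
    *":$ZETTAIR_BIN:"*) ;;              # already present, leave PATH alone
    *) PATH="$ZETTAIR_BIN:$PATH" ;;     # otherwise put it at the front
esac
export PATH
```

After logging in again, 'command -v zet' should print the full path
to the executable if the directory was picked up.
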
4. Visual C++ configuration, compilation and installation
=========================================================

First, obtain the latest Zettair distribution and decompress it.
Obtain the latest zlib source distribution (do NOT download the
precompiled DLL) from http://www.zlib.net/ and decompress it into a
separate directory. Follow the zlib directions to build a
statically-linked zlib.lib.
Locate zlib.lib within your zlib directory tree, and copy it to the
root zettair-X.X/ directory.
In addition, copy zlib.h and zconf.h into zettair-X.X/src/include.

Load zettair-X.X/win32/visualc6/zettair.dsw into Visual C++.
Using the Build/Set Active Configuration menu option, select the
executable that you wish to build.
Build the executable by selecting Build/Rebuild All. (Repeat for any
further executables that you wish to build.) You may then copy the
created executables wherever you like, and use them.

Now you're ready to read the Zettair user manual, in the 'doc'
subdirectory of the Zettair distribution.

First, though, you might like to check that everything is installed
and works OK (or perhaps you're just impatient to take your new
purchase for a spin). To help with this, we've included the text of
Herman Melville's "Moby Dick" as a sample collection for you to play
around with. This can be found in the subdirectory
sampletext/mobydick.

Let's begin by indexing Moby Dick. To do this, change your current
directory to sampletext/mobydick. (You can index it from anywhere,
but this is simplest.) We'll assume that the 'zet' executable is in
your PATH; otherwise, substitute the full pathname to the executable
wherever you see 'zet' below. So, let's build this index:

  $ zet -i -t TREC mobydick.trec

The '-i' argument tells zet that we're building a new index. '-t
TREC' tells zet that the input documents are in TREC format. If you
don't know what the TREC format is, don't worry: it's just a
convenient way for us to store multiple documents in one file.
By default, zet treats each input file as a separate HTML document;
since we want to treat each paragraph of Moby Dick as a separate
document, that would mean a lot of small files.

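To give a rough idea of what such a file looks like, here is a small
sketch. It is based only on the <DOC> and <DOCNO> tags visible in the
cached document later in this tour; the file name sample.trec and the
document text are made up for illustration, and mobydick.trec itself
may contain markup not shown here.

```shell
# Write a two-document collection in TREC-style markup.
# sample.trec and its contents are invented for this example.
cat > sample.trec <<'EOF'
<DOC>
<DOCNO>Sample 1, Paragraph 1</DOCNO>
The first sample document mentions a whale.
</DOC>
<DOC>
<DOCNO>Sample 1, Paragraph 2</DOCNO>
The second sample document mentions a ship.
</DOC>
EOF

# One <DOC> line opens each document, so counting them counts documents.
grep -c '^<DOC>$' sample.trec

# With zet on your PATH, the file could then be indexed the same way
# as the Moby Dick collection:
#   zet -i -t TREC sample.trec
```

Each <DOC> ... </DOC> pair becomes one document in the index, and the
<DOCNO> line gives it the name that appears in the result listings.
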
Now, the text of Moby Dick is less than 1.3 MB in length, so this
won't take long to run--Zettair is more used to working with document
collections of 10 GB or more, but it won't complain. When it's
finished running, you should see two new files in the current
directory, one called 'index.v.0', the other 'index.map'. These are
Zettair's index files.

If you're interested, the former is the index proper, containing
the list of terms and where each term occurs; the latter is the
document map, providing information about each document indexed.
Don't open these files, or you'll violate the EULA! No, just
kidding, but they're in binary format, and so won't make a lot of
sense in your text editor.

So now we're ready to run some queries. To do this, we run zet again,
this time without any options:

  $ zet

Zettair will load up the index (very quickly, in this case), and then
prompt you for input. Let's test the rumour that Moby Dick has
something to say about whales by entering the query 'whale':

1. Chapter32,Paragraph21 (score 2.506133, docid 688)
2. Chapter32,Paragraph19 (score 2.411723, docid 686)
3. Chapter32,Paragraph23 (score 2.375970, docid 690)
4. Chapter32,Paragraph46 (score 2.344365, docid 713)
5. Chapter91,Paragraph17 (score 2.256999, docid 1807)
6. Chapter91,Paragraph18 (score 2.247493, docid 1808)
7. Chapter75,Paragraph10 (score 2.245327, docid 1552)
8. Chapter0,Paragraph74 (score 2.150178, docid 74)
9. Chapter0,Paragraph69 (score 2.145605, docid 69)
10. Chapter0,Paragraph40 (score 2.122386, docid 40)
11. Chapter32,Paragraph24 (score 2.119576, docid 691)
12. Chapter0,Paragraph92 (score 2.118144, docid 92)
13. Chapter36,Paragraph25 (score 2.080975, docid 773)
14. Chapter32,Paragraph8 (score 2.059031, docid 675)
15. Chapter0,Paragraph51 (score 2.054273, docid 51)
16. Chapter0,Paragraph63 (score 2.050327, docid 63)
17. Chapter79,Paragraph6 (score 2.048261, docid 1590)
18. Chapter64,Paragraph60 (score 2.046396, docid 1387)
19. Chapter49,Paragraph7 (score 2.042479, docid 1059)
20. Chapter55,Paragraph10 (score 2.039895, docid 1226)

20 results of 576 shown (took 0.001639 seconds)

This tells us that the word "whale" occurs in 576 documents in the
collection (which is to say, paragraphs in Moby Dick). Zettair thinks
the most pertinent paragraph is paragraph 21 of chapter 32. We can
ask Zettair to print out this document using the 'cache' directive and
specifying the document's docid:

<DOCNO>Chapter 32, Paragraph 21</DOCNO>
FOLIOS. Among these I here include the following chapters:--I. The
SPERM WHALE; II. the RIGHT WHALE; III. the FIN-BACK WHALE; IV. the
HUMP-BACKED WHALE; V. the RAZOR-BACK WHALE; VI. the SULPHUR-BOTTOM

Don't worry about the <DOC> and <DOCNO> tags: that's just part of the
TREC format we've used to mark up Moby Dick for indexing. You'll
notice that the word 'whale' occurs six times in little more than
three lines, which is why Zettair thinks this is probably the
paragraph you're looking for.

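If you ever want to pick docids out of a saved result listing rather
than reading them off the screen, a small text filter will do. The
following is a sketch that relies only on the line layout shown in
the listing above; it is not a Zettair feature, and the variable
names are chosen for this example.

```shell
# One result line, copied from the listing above.
line='1. Chapter32,Paragraph21 (score 2.506133, docid 688)'

# Keep only the digits between "docid " and the closing parenthesis.
docid=$(printf '%s\n' "$line" | sed -n 's/.*docid \([0-9][0-9]*\)).*/\1/p')

echo "$docid"
```

Lines that don't contain a docid, such as the "20 results of 576
shown" footer, produce no output from this filter.
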
You can, of course, query for more than one word at a time. Say we
were looking for a particular kind of whale:

1. Chapter36,Paragraph25 (score 6.672417, docid 773)
20. Chapter100,Paragraph31 (score 5.192795, docid 1963)

20 results of 652 shown (took 0.000801 seconds)

Hmm, 652 paragraphs--but "whale" only occurs in 576! Well, what
Zettair is reporting here is all the documents with either "white"
_or_ "whale" in them. We can specify that we only want documents
in which _both_ words occur:

1. Chapter128,Paragraph4 (score 4.778444, docid 2413)
20. Chapter100,Paragraph11 (score 3.977057, docid 1943)

20 results of 110 shown (took 0.001045 seconds)

or, probably more to the point, only documents in which the exact
phrase "white whale" occurs:

1. Chapter36,Paragraph25 (score 5.618426, docid 773)
20. Chapter59,Paragraph4 (score 4.402413, docid 1269)

20 results of 80 shown (took 0.000999 seconds)

This is great so far (or at least, we hope you think so), but it gets
tiresome having to individually request each document to see if it is
what we're looking for, especially if the documents are longer than a
single paragraph. What we really want is for the list of results to
include a summary of each document. Zettair can provide exactly that.

To do so, we'll have to restart Zettair. Hit "CONTROL-D", or whatever
key combination indicates end of input on your system, to end your
current session. This time, we'll run the zet executable with the
'-q' option to indicate that we'd like to see document summaries, and
what form we want these summaries to take. We'll also restrict
output to just the top 2 results:

  $ zet -q capitalise -n 2

Zettair can highlight your search terms within the document summaries
in a number of different ways, 'capitalise' being one of them. So,
let's try out some summaries:

1. Chapter48,Paragraph51 (score 9.602944, docid 1051)
Wet, drenched through, and shivering cold, despairing of SHIP or
boat, we lifted up our eyes as the dawn came on. The mist still
spread over the SEA, the empty lantern lay crushed in the bottom of
the boat. ...We all heard a faint creaking, as of ropes and yards
hitherto muffled by the STORM. ...Affrighted, we all sprang into the
SEA as the SHIP at last loomed into view, bearing right down upon us
within a distance of not much more than its length.
2. Chapter0,Paragraph92 (score 9.216548, docid 92)
"Oh, the rare old Whale, mid STORM and gale In his ocean home will
be A giant in might, where might is right, And King of the boundless

2 results of 526 shown (took 0.012296 seconds)

1. Chapter35,Paragraph11 (score 10.445776, docid 745)
"Roll on, thou deep and DARK BLUE OCEAN, roll! Ten thousand
blubber-hunters sweep over thee in vain."

1 results of 1 shown (took 0.002945 seconds)

And that concludes our tour.