RobIn for IOI I/O ===== === === === by Tom Verhoeff (Chair IOI Scientific Committee) July 2002, Version 0.7 (incomplete draft) Documents RobIn Version 0.7 Text bracketed by is still To Be Completed. Contents -------- Introduction RobIn Feature Summary Historical Intermezzo I/O in (Turbo/Free) Pascal and (Turbo) C/C++ IOI-Style Meta-Format Restrictions User Requirements for RobIn How to Use RobIn How to Interpret RobIn Messages Design and Implementation of RobIn Limitations of RobIn Other Tools Concluding Remarks Acronyms Sample Programs Introduction ------------ RobIn is a software package featuring a FreePascal Unit for ROBUST INPUT from text files with an IOI-style format (IOI = International Olympiad in Informatics). RobIn was originally developed for Turbo Pascal. In this document, I provide some motivation for RobIn and I explain its use and design. Just in case you want to see it immediately, the main implementation source file is named "RobIn.pas", its usage is explained in the section "How to Use RobIn". "Example.pas" and "ValidEx.pas" are programs that illustrate how to use RobIn. RobIn simplifies the development (in FreePascal) of * /input validators/, viz. programs that check the validity of the input files produced by the organizers, and * /output checkers/, viz. programs that check the format and/or correctness of output files produced by the competitors or their programs. The description of an IOI programming task defines the FORMAT of every input and output file involved. The IOI organizers are responsible for producing valid input files. The competitors may assume that the input files are valid, that is, their programs may crash on invalid input without being penalized for it. Likewise, the competitors are responsible for delivering programs that produce correct output files. When grading the programs, format restrictions are enforced in a more relaxed way for competitor-generated output files than for organizer-generated input files. In the IOI competition, the file formats are kept simple because * the IOI focuses on algorithms and not on I/O, which serves merely as an interface to the algorithm; * the format must be easy to read and write by (1) ALL of the IOI programming languages, and (2) by the competitors (the latter mainly through editors, e.g. built into the IDE for each programming language). The competitor programs must read the input files and must write the output files. We do not want competitors to waste their time on programming input parsers, when the task is really about some complex algorithm. The competitors most likely must also somehow put together various input files for testing and they must also inspect the resulting output files. This is often done "by hand" through an editor and dumping to the screen, or even printing on paper. The organizers of an IOI have the responsibility to define file formats and to verify them. Developing an input validator and an output checker forces one to think carefully about the format involved. This turns out to be a great help in formulating the format definitions. Development of programs for input validation and output checking must not be underestimated. Such programs must be prepared to handle arbitrary data in these files and they should not crash but instead report back any problems encountered in a clear fashion. In this document, we often use the term INPUT FILE for files that are to be read by competitor programs and that are created by (tools of) the organizers. We use the term OUTPUT FILE for files that are to be created by competitors or their programs, and which are read by (tools of) the organizers. Both kind of files are input files for RobIn. RobIn Feature Summary ----- ------- ------- RobIn reads and converts text file content, while checking * file I/O for errors * library calls for correct usage (parameters, call order) * format definition for adherence to meta-format rules * file content for correct format (incl. range check) * task-specific conditions through AssertRIF and IsAbortedRIF RobIn reports diagnostic messages in a log file: * Diagnostic messages are numbered and taken from message definition files to simplify translation * Level of diagnostic detail can be tuned * Statistics are collected (per file processed and overall) * Precise feedback is provided in terms of o coordinates within the format being processed o coordinates within the file being read Historical Intermezzo ---------- ---------- Traditionally, the algorithms developed by the IOI competitors are delivered as executable programs (since IOI2001 as source code) and they are graded by executing them for a (necessarily limited) set of test cases. They are not graded by inspecting the algorithm designs or program texts. In the scientific and engineering world, it has long been known that execution-based testing is a very poor way of measuring the performance characterictics of algorithms (characteristics such as correctness, speed, and memory usage). Still, there are some (compelling?) reasons for execution-based testing at the IOI: * the language barrier, * lack of agreement on alternatives, and * the engineering principle that executable products need to be demonstrated by execution. At the IOI, the competitors are high-school students and they are not expected to be capable of adequate technical communication with foreign organizers and evaluators, neither written nor oral. This is known as the language barrier. Therefore, the tasks are translated to the competitor's native language before the competition. At the IOI, a translation step in the opposite direction, from competitor to organizer, is avoided by requiring the delivery of machine-executable products. (By the way, I still think the IOI should look for ways of expanding the evaluation process to include non-execution-based methods of verification.) Another reason for execution-based testing at the IOI is that it apprears hard to agree upon alternatives. What constitutes a design of an algorithm? How should such a design be presented and judged? We have no international notation for algorithms, like they do in mathematics. (Though we should be able to get a long way; after all, the delegation leaders have little trouble discussing the tasks, and possible approaches and obstacles to solutions.) Finally, a correctness proof alone is not good enough for an executable product. Obviously, we need to see the thing "fly" to believe it. (In my opinion, the IOI hovers dangerously between theoretical computer science and software engineering.) For execution, the algorithms somehow need to be provided with various input parameters, and the resulting output needs to be observed. Since IOI'94 in Sweden, such I/O is done through text files. All input parameters are offered in a text file, even if this concerns just a single integer. All output must be written to a text file. The text file interface provides a convenient abstraction from the programming language used to express the algorithm. Note that the IOI has always allowed a choice among multiple programming languages (for lack of agreement on a single language). Alternatives to (the direct use of) text files are * task-specific libraries that competitor programs must use, * tasks that ask for parameterized routines instead of programs. The organizers of an IOI can develop special-purpose libraries to encapsulate I/O. Such a library provides a means of I/O on the language level: variables or parameters with native types can be directly used; there is no need for formatting and conversions. This use of libraries is now common practice for reactive programming tasks. There is still the issue of restrictions on the use of the library (its interface definition), in particular, also restrictions on the calling order of library routines. Another alternative is to require that the competitors develop a routine with input and output parameters in the programming language of their choice. For evaluation, the routine is embedded into an evalution framework to produce an executable program. You might say that here the competitors develop a library consisting of a single routine. This may work better in the case of source code submission than submission of object files. The major disadvantage of both alternative approaches is that the details of the interface now depend on the programming language. Even with only two olympic programming languages, this is quite a burden on the organization. (But this does not stop us from using it for reactive tasks.) It seems to me that the current situation at the IOI reflects the current state of computer science in general quite well. We have come a long way, but we are not "there" yet. I/O in (Turbo/Free) Pascal and (Turbo) C/C++ --- -- ---------- ------ --- ----- ----- Until IOI2000, the IOI programming languages were Turbo Pascal 7.0 and Turbo C/C++ 3.0. Since IOI2001, we have moved to other Pascal and C/C++ compilers, namely * FreePascal (www.freepascal.org) * GCC (Gnu Compiler Collecion: www.gnu.org/software/gcc/gcc.html). As far as I can tell, this change in compilers has had little effect on the I/O issues. Another change since IOI2001 is the introduction of the Linux platform for grading execution of submitted programs. Competitors may develop programs under either Linux or Windows. One effect of this move is that example input and output files must be prepared for both platforms, because of differing end-of-line conventions. However, the compilers provide a proper abstraction of these I/O details. Hence, the platform has had little influence on the I/O aspects of programming. Competitors will only notice the difference if they do I/O in a way that bypasses the abstraction mechanism. Let me consider some of the subtleties of text I/O in these languages, and in particular text input. First, some general remarks about text files on the Unix/Linux and MS-DOS/Windows platform are necessary. ASCII Character Set The character set used for Unix/Linux and MS-DOS/Windows text files is Extended ASCII (256 8-bit codes). Of these, the "Standard" ASCII codes (128 7-bit codes from 0 through 127, also called Low ASCII) include * 33 control codes (0 through 31, and 127), * 1 blank (SP, space, code 32), * 94 printing characters (Latin uppercase and lowercase letters, digits, and various symbols, with codes 33 through 126). Of the control codes, the following are worth mentioning: BS Backspace, Ctrl-H, ^H, code 8, '\b' HT Horizontal Tab, Tab Ctrl-I, ^I, code 9, '\t' LF Line Feed, New Line Ctrl-J, ^J, code 10, '\n' VT Vertical Tab Ctrl-K, ^K, code 11, '\v' FF Form Feed, New Page Ctrl-L, ^L, code 12, '\f' CR Carriage Return, Ctrl-M, ^M, code 13, '\r' SUB Substitute character, Ctrl-Z, ^Z, code 26 ESC Escape, Ctrl-[, ^[, code 27 DEL Delete, Rub out, code 127 The extended codes (128 through 255, also called High ASCII) cover adorned letters (with various accents for non-English languages), some graphics and special symbols. On the MS-DOS/Windows platform, the two-character combination CR/LF (codes 13, 10) encodes the transition to a new line (end-of-line in Pascal parlance, newline in C terminology). Note that under Unix/Linux, this is accomplished by a single LF, whereas under the MacOS, a single CR is used. (On the platform-less web you can often see the disastrous consequences of these differences.) Under MS-DOS/Windows, the end of a text file is encoded by SUB (Ctrl-Z, code 26). Under Unix/Linux, the end of a text file is not encoded by a special character, but is determined from its length. The other control codes, with the possible exception of the Tab (HT, Ctrl-I, code 9), nowadays have no generally accepted role in text files. The Tab is still used, but this is problematic, because there is no standardized way to define tab settings in a text file. A common default is to assume tab settings every 8th character position, but many programmers do not configure their editor that way. The characters LF (or CR/LF under MS-DOS/Windows), HT (tab), and SP (blank) are collectively known as /whitespace/. (Note: In C, also vertical tab and form feed are treated as whitespace, but we do not.) Text file I/O in FreePascal Here are some FreePascal text I/O details (inherited from Turbo Pascal, hence labeled TP below). For convenience, we assume the following variable declarations: var f: Text; c, d: Char; i: Integer; s: String; TP01. FreePascal does not implement the file window (in ISO Pascal denoted by f^) to look one character ahead into the input stream. It also lacks the get and put routines. All I/O must be done through the (standard) eof/eoln, read/readln, and write/writeln routines. TP02. Reading a single character c by read(f,c) does NOT skip leading whitespace. Leading blanks can be skipped, for example, by "repeat read(f,c) until c<>' '". A single leading blank can be skipped by "read(f,d,c)" and ignoring d. If the file window would have been implemented (cf. TP01), then leading blanks could have been skipped by "while f^=' ' do get(f)". TP03. Reading a single character when eoln(f) is true, returns chr(10) (=LF) under Unix/Linux, and chr(13) (=CR) under MS-DOS/Windows, instead of ' ' as required by the ISO Pascal Standard. Furthermore, under MS-DOS/Windows, the next character read after CR will be chr(10) (=LF). Note that a single call readln(f) consumes both CR and LF. TP04. Reading a single character when eof(f) is true, does NOT raise a run-time I/O error (as suggested by the ISO Pascal Standard). Under MS-DOS/Windows, it returns chr(26) (^Z). Under Unix/Linux, it is not so clear what happens (I have seen chr(26) returned, but also other behavior). TP05. Reading an integer by read(f,i) will skip all leading whitespace (blank, tab, end-of-line). The read then consumes ALL subsequent non-whitespace characters until another whitespace character is encountered. A run-time I/O error (106: invalid numeric format) is raised when the non-whitespace characters violate the (Pascal) numeric format. Trailing whitespace (incl. end-of-line) is never consumed, unless calling readln(f,i), which is equivalent to "read(f,i); readln(f)". TP06. Reading a variable-length string by read(f,s) does NOT skip leading whitespace. Leading blanks are harder to skip than for characters (TP02), because the file window is lacking (TP01). If the file window would have been implemented, then leading blanks could have been skipped by "while f^=' ' do get(f)". It is, of course, possible (but awkward) to read the string with leading whitespace and to trim it afterwards: "while (s<>'') and (s[1]=' ') do Delete(s,1,1)". Note, however, that the string size must be large enough to accommodate all such whitespace. TP07. The reading of characters for a variable-length string by read(f,s) is stopped by the maximum length of the string variable s or the first end-of-line encountered. Note that trailing whitespace on a line will be read into the string. The maximum length for a variable-length string is 255 characters. A trailing end-of-line is never consumed, unless calling readln(f,s). TP08. Readln(f) consumes upto and including the next end-of-line. Under Unix/Linux, this is the next LF. Under MS-DOS/Windows, this is the next CR, and subsequent LF if present. TP09. If the file's last line is not terminated by an end-of-line, then eof(f) and eoln(f) become true simultaneously, and NOT, as the ISO Pascal Standard requires, first eoln(f) and after readln(f) then eof(f). Under the Pascal Standard, a text file consists of a sequence of zero or more /lines/, terminated by /end-of-file/ (recognized by function eof), and each line consists of a sequence of zero or more /characters/, terminated by /end-of-line/ (recognized by function eoln). Thus, the only case where an end-of-file is not preceded by an end-of-line occurs in an empty file. Also, according to the Pascal Standard, eoln should not be called when eof returns true. The subtle difference between FreePascal and the Pascal Standard is illustrated by the two character-and-line counting programs in the appendix "Sample Programs". TP10. Under Unix/Linux, writeln(f) produces LF. Under MS-DOS/Windows, writeln(f) produces CR followed by LF. TP11. Under MS-DOS/Windows, closing an output file appends the ending chr(26) (=Ctrl-Z). Text files and editors Here are some details concerning the Turbo Pascal Editor (built into the Turbo Pascal IDE; the built-in Turbo C/C++ editor is very similar: IDE1. It is NOT possible to add blanks at the end of a line. (Hit the END key to find the "real" end of a line.) IDE2. When opening a file, any blanks at the end of lines are silently discarded (and lost when the file is saved). IDE3. The last line (accessible by the cursor) is NOT written with a terminating end-of-line when saving the file. IDE4. When opening a file with its last line (properly) terminated with an end-of-line, it will show up as a trailing empty line (accessible by the cursor). IDE5. Fixed tab stops are set every 8 positions in columns 1, 9, 17, etc. Hitting the TAB key will insert an "appropriate" number of blanks. To enter a real Tab (TAB character) in the file, use the sequence ^P^I (Ctrl-P, Ctrl-I). Tabs in a file are visually not directly discernable from blanks. IDE6. Control characters (other than HT, CR/LF) are shown as "funny" graphics symbols. Text file I/O in GCC Now let me deal with some GCC C/C++ peculiarities (formerly via Turbo C/C++, hence labeled TC below). First, we deal with I/O in C. This is agreement with the ISO/ANSI C Standard. For convenience, we assume the following variable declarations: #include FILE *f; char c; int i; char s[256]; TC01. Formatted input can easily be done through fscanf with a format string and a sufficient number of result arguments (or via fgets followed by sscanf). We consider fscanf in the following. TC02. Reading a single character by fscanf(f,"%c",&c) does NOT skip leading whitespace by itself. All leading whitespace can be skipped by including a blank directly preceding the format specifier: fscanf(f," %c",&c). TC03. Reading a single character (format "%c") when end-of-line is ahead, will return LF (code 10, '\n'), and it skips both CR (present under MS-DOS/Windows) and LF. Note that putting "\n" in the format string will skip ALL whitespace at that point. TC04. Function fscanf will NOT read the end-of-file character, instead it will return a count less than the number of arguments to read, indicating premature termination. TC05. Reading an integer by fscanf(f,"%d",&i) will skip all leading whitespace; the read consumes only characters that agree with the decimal integer syntax and stops at the first character that does not match. TC06. Reading a variable-length string by fscanf(f,"%s",s) will skip all leading whitespace. This is not the case when using gets(s) or fgets(f,s,i). Note that s should be big enough to accommodate all characters read AND the terminating null '\0'. Therefore it is recommended to use fscanf(f,"%255s",s) when s can accommodate 256 characters (255 non-nulls). TC07. Reading a variable-length string is stopped after reading all (at least one) non-whitespace characters, by the first whitespace character (which is not consumed). In particular, it will never skip trailing end-of-lines, and the returned string will never include blanks and will never be empty. Again, this is not the case for gets and fgets. (A subtle difference between gets and fgets is that the latter will put the terminating newline '\n' into the string returned, whereas gets reads the newline and discards it. TC08. Reading with fscanf(f,"\n") skips all whitespace at that point, including end-of-lines, but also blanks and tabs. Therefore, this is NOT equivalent to readln in Pascal. TC09. End-of-lines are faithfully reported as they appear in the file, as character '\n'. The end of the file may, but need not, be preceded by an end-of-line. Using fscanf it is not possible to distinghuish between end-of-file, an I/O error, or a format error. TC10. Calling fprintf(f,"\n") produces LF under Unix/Linux, and CR followed by LF under MS-DOS/Windows. TC11. Under MS-DOS/Windows, closing an output file appends the ending Ctrl-Z. Under C++, the default settings for I/O give rise to the following behavior. We assume the following declarations #include ifstream inp("task.in"); ofstream out("task.out">; char c; int i; char s[256]; TC12. By default, leading whitespace is always skipped when reading. TC13. A character is read by "inp>>c;". By default, a non-whitespace character is returned. TC14. An integer is read by "inp>>i;". Reading stops at the first character not in agreement with the syntax of a decimal integer. TC15. A variable-length string is read by "inp>>s;". Reading stops at the first whitespace character. A terminating '\0' is always appended to s. By default, a non-empty string is returned. TC16. An end-of-line is written by "out<<'\n';" or "out< 1 then write(' ') ; write ( L[i] ) end { for i } ; writeln or var i: 1..100; ... ; for i := 1 to N do begin write ( L[i] ) ; if i < N then write(' ') end { for i } ; writeln Here are three incorrect fragments (to be enclosed by "var i: 1..100;" and "; writeln"): ; for i := 1 to N do write ( L[i] : 2 ) ; for i := 1 to N do write ( ' ', L[i] ) ; for i := 1 to N do write ( L[i], ' ' ) All of them are incorrect because they produce one additional blank. The first two at the line begin, and the third one at the line end. An additional weakness of the first fragment, is that changing the range of the elements in the list L, say from 0..9 to -1..10, will cause -1 and 10 to miss a preceding blank. Note also that the output of the third approach appears to be good when viewed in a Turbo editor (because that editor discards blanks at the end of a line), or when dumped to the screen or printed on paper. At IOI'95 it happened that (at least) one competitor had opted for the third approach. This person apparently was bothered by the trailing blank and changed the finishing writeln into writeln(chr(8)), that is, included a backspace (BS) "to erase the superfluous blank". When dumped to the screen, the output does indeed look good. The IOI'95 evaluation software, however, did not accept this, because it rejected control codes other than end-of-line and end-of-file. The IOI'95 evaluation software would have ignored the trailing blank, because the output format was not strictly enforced (knowing that competitors make stupid I/O mistakes)! IOI-Style Meta-Format Restrictions --------- ----------- ------------ The restrictions on the format of text files given below are inspired by the language peculariaties explained in the preceding section. They are intended to make I/O easy for the built-in routines of the IOI programming languages, and for human use. Restrictions on output files can be less strict than on input files, because the competitors (develop the programs that) write them and the organizers must read them after the competition. However, it is encouraged that the format of output files adheres to the same restrictions as input files, so as to simplify the processing of these output files by competitors as well, in case they want to apply their own tools to it. The format of input and output text files is restricted by the following general /meta-format rules/: MF01. An IOI text file consists of a sequence of ONE or more NONempty /lines/. The sequence is terminated by end-of-file (Ctrl-Z under MS-DOS/Windows). MF02. Each line consists of a sequence of ONE or more /items/. This sequence is ALWAYS terminated by an end-of-line (LF under Unix/Linux, CR/LF under MS-DOS/Windows), in particular also the LAST line. Note that both MF01 and MF02 state that text files do not contain empty lines. In output files, however, empty lines could be tolerated. MF02 also states that every line, including the last line, is terminated by an end-of-line marker. In output files, a missing end-of-line on the last line could be tolerated. (Cf. IDE3, IDE4) MF03. Each item is one of the following: /character item/: A single printable NON-blank character (codes 33 through 126). (Cf. TC02) It is discouraged to use characters other than letters or digits. It is also discouraged to mix upper and lower case letters. /integer item/: An integer number in decimal notation, possibly negative (preceded by '-'), no leading '+', no redundant leading zeroes, at most 32 bits. In output files, a leading '+' and redundant leading zeroes could be tolerated. /string item/: a NON-empty variable-length string of at most 255 printable NON-blank characters. (Cf. TP07, TC07) Variable-length here means that the length possibly cannot easily be determined from the format description or the preceding text in the file. It is discouraged to use characters other than letters or digits. It is also discouraged to mix upper and lower case letters. Notes: * Text files do not contain control characters, other than those needed to encode the end-of-lines and the end-of-file. * An item never contains whitespace, in particular, it contains no blanks. * Character and string items should not be abused to encode 'ordinary' numeric values. * Care should be taken when string items are real words in some natural language (e.g. English), because they must not be translated and they could well be meaningless to competitors. * There is no real or float number item. Reals or floats could still be treated as a sequence of characters or a string, but the appearance of floating-point arithmetic in tasks is frowned upon in the IOI. * Fixed-length strings should be treated as variable-length strings. In particular, competitors programs should be able to read fixed-length strings with the facilities for variable-length strings. MF04. Two adjacent items on the same line are either /contiguous/ (that is, without any extra characters between them), or /separated/ from each other by a SINGLE blank. The definition of the format (in the Competition Rules and/or the Task Description) must clearly state the item composition convention. Additional restrictions on item composition are given below. (Cf. TP02, TP06) In output files, extra blanks between items can be tolerated. (In input files, extra blanks could be useful to align columns and thereby improve human readability, but for the IOI we will not do so.) Note: * Comma-separated values (CSV), and quoted strings are not allowed by MF04. MF05 through MF12 concern restrictions on item composition in input files. MF05. A character item must be separated by a single blank from a directly preceding integer item on the same line. (Cf. TP05) MF06. It is discouraged that a character item is preceded by a separating blank. (Cf. TP02) Note: * When adhering to MF05 and MF06, an integer cannot precede a character, because in that case these requirements would conflict: MF05 requires a separating blank, whereas MF06 forbids it. MF07. An integer item is always separated by a single blank from a directly preceding integer item on the same line. Otherwise, the boundary cannot easily be determined, neither in Pascal, nor in C/C++. MF08. It is encouraged that an integer item is always separated by a single blank from any directly preceding item. Reason: Human readability. A character or string item contiguous before an integer may be confusing, especially if the range of allowed characters includes digits, or a plus or minus sign. Note: * In view of MF05, MF06, and MF08, it is encouraged that character items and integer items do not appear on the same line. MF09. A variable-length string item always appears at the end of a line. (Cf. TP07) MF10. A variable-length string item must be separated by a single blank from a directly preceding integer item on the same line. (Cf. TP05) MF11. It is discouraged that a variable-length string item is preceded by other items. (Cf. TP06) In view of MF09, a string item preferably occurs on a line by itself. MF12a There are no blanks at the begin of a line. In output files, blanks before the first item on a line can be tolerated. MF12b There are no blanks at the end of a line. (Cf. IDE1, IDE2) In output files, blanks after the last item on a line can be tolerated. However, these extra blanks are less excusable than those under MF04 and MF12a. MF13. The format of input files must be such that the type of every item and the occurrence of every end-of-line and end-of-file can easily be deduced ---before they appear--- from the format definition or preceding text (possibly from another file). Explicit end-of-line or end-of-file checking should NEVER be necessary, not matter which IOI programming language one uses. Below are some examples to illustrate this rule. (TC This requirement does not apply to output files (produced by competitors or their programs). It may be an unnecessary burden to require that the number of items is first written to the output file. MF14. There are no tabs and other control codes (besides LF under Unix/Linux, or CR/LF and Ctrl-Z under MS-DOS/Windows) in text files. Notes: * Format definitions provided by the IOI organizers must respect the meta-format rules stated above, and all input files must be in /strict/ agreement with the promised input format. When checking output files produced by the competitor programs in order to determine scores, it is strongly recommended that the format is not enforced strictly, but rather in a /relaxed/ manner. * The Competition Rules for all IOIs up to IOI2000 have formulated some input format restrictions. However, those formulations have mostly been LESS restrictive than the meta-format rules that I have given above. And for one aspect MORE restricive: there has been a time when non-whitespace characters were resticted to letters and digits. It is good practice to pay extra attention to the use of characters other than letters and digits, and avoid those if that is easy to do. * It is easy for programs to read files whose format agrees with the meta-format rules formulated above. In Pascal programs that use read(...) for character, integer, and string items, and that do readln at every line end, no further attention needs to be paid to the line structure. Of course, the read(x) for the last item on a line can be combined with the subsequent readln, yielding readln(x). A readln occurring directly before the read of a character or string item must NOT be omitted, because whitespace is NOT skipped. Other occurrences of readln, i.e. before integer and at the end of the file, can be omitted, because whitespace before integers is skipped. In C programs that use scanf for input and that precede every character ('%c') and string ('%s') in the format string by a blank, no further attention needs to be paid to the line structure. In C++ programs that use the default settings for input streams (skipping whitespace before all items), no explicit attention needs to be paid to the line structure. Summary of Meta-Format Rules for Item Composition Below are two tables that express which items may follow which items on the same line, and what composition is allowed. For IOI organizer-generated (input) files: First|| Second Item Item || Int | Chr | Str -----++-----+-----+----- None|| C S | C S | C S -----++-----+-----+----- Int || S | (S) | - -----++-----+-----+----- Chr || S | C(S)| - -----++-----+-----+----- Str || - | - | - For IOI competitor-generated (output) files: First|| Second Item Item || Int | Chr | Str -----++-----+-----+----- None|| C S | C S | C S -----++-----+-----+----- Int || S |(C*)S|(C*)S -----++-----+-----+----- Chr || C S | C S |(C)S -----++-----+-----+----- Str || S | S | S Notes: S Separated (by one blank, or multiple blanks when relaxed) C Contiguous (without intervening whitespace) - Forbidden * The range of characters allowed contiguously after an integer should NOT include digits. ( ) Allowed but discouraged Examples for Meta-Format Rules Here are some examples to illustrate rule MF13. The following format definitions violate rule MF13. 1. Input file ... contains at least one and at most 20 lines. Each line contains three nonnegative integers less than 5000. 2. The first and only line of input file ... contains at least one and at most 1000 characters that are uppercase letters. There are no blanks between the characters. 3. The first line of input file ... contains at most 100 integers I with 10 <= I <= 99, followed by a single character '*'. The first and second format require end-of-file and end-of-line checking, respectively, to be read. In C/C++, the second format can be handled without explicit end-of-line checking, because C/C++ does not have the 255 character limit on variable-length strings. When using the so-called "extended syntax", Turbo Pascal can also read null-terminated strings (but this is not Standard Pascal, and I don't find it required IOI knowledge). The third format has the problem that it is not known what type of item to read after the first item (which is always an integer). Thus, it requires looking ahead into the input stream to detect the terminating 'A'. If that look-ahead character turns out to be digit than it must be combined with the following digits to make up another integer. In Pascal, such a look-ahead character cannot be pushed back into the input stream. In C/C++, this can be accomplished (though it is not recommended), but I would not consider that required IOI knowledge. The total length of the input line can exceed 255. Hence, reading it into a string is not an option for Turbo Pascal (in my opinion). Here are similar format definitions that do respect rule MF13. 1. Input file ... contains at least one and at most 21 lines. Each line, except the last, contains three nonnegative integers less than 5000. The last line of the file contains a single -1. 2. The first line of input file ... contains at least 1 and at most 1000 characters that are uppercase letters, followed by a single '*'. There are no blanks between the characters. 3. The first line of input file ... contains at most 100 integers I with 10 <= I <= 99, followed by a single 0. Although these formats are acceptable, their "readability" (by a program) can be further improved as follows: 1. The first line of input file ... contains one integer N with 1 <= N <= 20. The following N lines contain three nonnegative integers less than 5000. 2. The first line of input file ... contains one integer N with 1 <= N <= 1000. The second line contains N characters that are uppercase letters. There are not blanks between the characters. 3. The first line of input file ... contains one integer N with 0 <= N <= 100. If N is positive, then there is also a second line, containing exactly N integers I with 10 <= I <= 99. For the third format, now the question is raised whether the case with N = 0 really needs to be included at all. Furthermore, one could argue that the human writability of the formats has deteriorated, since the number of items needs to be counted explicitly. User Requirements for RobIn ---- ------------ --- ----- Here is a list of requirements that I had in mind when designing RobIn: UR01 RobIn shall NOT CRASH on errors, but shall report them. UR02 RobIn shall be EASY TO USE for developing programs that check text files for adherence to IOI-style formats. UR03 RobIn shall be FLEXIBLE enough to express all reasonable IOI-style formats that adhere to the meta-format rules. UR04 RobIn shall ENFORCE the IOI-style meta-format rules. UR05 RobIn shall have a RELAXED mode and a STRICT mode for checking a format. UR06 RobIn shall be usable for developing both INPUT VALIDATORS that report to the organizers, and OUTPUT CHECKERS that report to the contestants and their coaches. UR07 RobIn shall NUMBER all error messages uniquely and take them from a text file to simplify translation. UR08 RobIn shall support TASK-SPECIFIC checking and error messages. As a development requirement, I order the importance of some software qualities as follows: 1. First comes correctness 2. Second comes usability (simplicity, generality) 3. Third comes maintainability 4. Fourth comes portability (though I have not given it much thought) 5. Only in fifth place comes performance (speed, memory) How to Use RobIn --- -- --- ----- RobIn is a FreePascal Unit that provides routines for reading text files and extracting values from those files, while at the same time checking * that the format (especially, order and composition of items, and the line structure) adheres to the meta-format rules, and * that the file content adheres to the format (including range restrictions). RobIn is "robust" and does not crash on "bad" text files, but instead provides detailed diagnostic messages. The messages are numbered and taken from message definition files to simplify translation and use by non-English organizers, coaches, and competitors. RobIn is intended to be used by organizers to construct tools that, in turn, are used by organizers (to validate input files) and by competitors and coaches (to check the format/correctness of output files). This section is addressed to the organizer who wishes to construct a tool using RobIn. Coaches and competitors may want to skip to the section titled "How to Interpret RobIn Messages". Make sure that you have RobIn.ppu (obtained by compiling RobIn.pas with TP). Simply include "uses RobIn;" in your FreePascal program and away you go. Well, you need to know somewhat more. RobIn provides the following routines (here shown without parameters): BeginLogging To open the log file and load the messages EndLogging To close the log file and unload the messages OpenRIF To open the text file to be read and checked CloseRIF To terminate reading and close the text file ReadCharRIF To read a character item ReadIntegerRIF To read an integer item ReadStringRIF To read a variable-length string item ReadlnRIF To read an end-of-line IgnoreLineRIF To skip the remainder of the line AssertRIF To do a task-specific check IsAbortedRIF To find out whether a format error occurred EofRIF To detect end-of-file (ignoring whitespace) EolnRIF To detect end-of-line (ignoring blanks) NextCharRIF To inspect next non-whitespace character SetIndentation To set the amount of required indentation (default: 0) IntToStr To help construct composite identifiers for items The acronym RIF stands for "RobIn File". The routines ReadCharRIF, ReadIntegerRIF, and ReadStringRIF are collectively also denoted by ReadRIF. The routines EofRIF, EolnRIF, and NextCharRIF are for checking output files only. Some restrictions apply to the order in which RobIn routines are called. Violating these order restrictions will not crash RobIn, but may result in early termination, and an error message that may not explicitly mention the out-of-order calling. The order restrictions on calls to BeginLogging, EndLogging, and xxxRIF can be captured by the following regular expression: ( 1. BeginLogging 2. ( "format described by program fragment using xxxRIF" )* 3. EndLogging )* where ( ... )* means "zero or more times the enclosed". A /format/ is described by writing a Pascal program fragment containing calls to OpenRIF, CloseRIF, ReadRIF, and ReadlnRIF. Typically, the order of these calls should adhere to the following regular-expression when checking input files: 1. OpenRIF 2. ( 1. ( ReadRIF )+ 2. ReadlnRIF )+ 3. CloseRIF where ( ... )+ means "one or more times the enclosed". Instead of 2.1 and 2.2, it is also possible to have 1. ( ReadRIF )* 2. IgnoreLineRIF Anywhere between OpenRIF and the corresponding CloseRIF, the format-defining program fragment can call AssertRIF to check a task-specific condition. The program can call IsAbortedRIF any time after OpenRIF --- also after CloseRIF --- to find out whether reading of the file content had been aborted (due to a fatal error). Note that checking of the format itself against the meta-format rules is always continued until CloseRIF. No order restrictions apply to IntToStr. It is just a "loose" function, typically used to construct a parameter for ReadRIF when reading a sequence of items. As an example, consider the following format definition: The first line of file 'example.in' contains two integers N and K with 2 <= N <= 9 and 1 <= K <= N. The second line contains N lowerercase letters ('a'..'z', contiguous). The following K lines each contain one nonempty string of at most 20 uppercase letters ('A'..'Z'). This format is captured by the following program fragment (assuming appropriate constant definitions and variable declarations; the full program is Example.pas): OpenRIF ( inFile, 'example.in', ... ) { read first line } ; N := ReadIntegerRIF ( inFile, Separated, 'N', 2, 9 ) ; K := ReadIntegerRIF ( inFile, Separated, 'K', 1, N ) ; ReadlnRIF ( inFile ) { read second line } ; for i := 1 to N do begin L[i] := ReadCharRIF ( inFile, Contiguous, 'L[' + IntToStr(i,1) + ']', UpperCaseChars ) end { for i } ; ReadlnRIF ( inFile ) { read following K lines } ; for j := 1 to K do begin S[j] := ReadStringRIF ( inFile, Contiguous, 'S[' + IntToStr(j,1) + ']', LowerCaseChars, 1, 20 ) ; ReadlnRIF ( inFile ) end { for j } ; CloseRIF ( inFile ) Note that Separated, Contiguous, UpperCaseChars, and LowerCaseChars are predefined in the RobIn interface section. Here is a corresponding fragment of FreePascal program that reads the input data from the input file (again assuming appropriate variable declarations). Note the great similarity with the fragment above. Fragments in C and C++ are given further down. Assign ( inFile, 'example.in' ) ; Reset ( inFile ) { read first line } ; Read ( inFile, N ) ; Read ( inFile, K ) ; Readln ( inFile ) { read second line } ; for i := 1 to N do begin Read ( inFile, L[i] ) end { for i } ; Readln ( inFile ) { read following K lines } ; for j := 1 to K do begin Read ( inFile, S[j] ) ; Readln ( inFile ) end { for j } ; Close ( inFile ) or somewhat more compactly, as a competitor might code it: Assign ( inFile, 'example.in' ) ; Reset ( inFile ) ; Readln ( inFile, N, K ) ; for i := 1 to N do Read ( inFile, L[i] ) ; Readln ( inFile ) ; for j := 1 to K do Readln ( inFile, S[j] ) ; Close ( inFile ) Here is a C version (one of many possibilities) to read the format (assuming appropriate #include and variable declarations): inFile = fopen ( "example.in", "r" ); fscanf ( inFile, "%d%d\n", &N, &K ); /* \n needed (why?) */ for ( i=1; i<=N; ++i ) { fscanf ( inFile, "%c", &L[i] ); } fscanf ( inFile, "\n" ); /* not needed (homework: why?) */ for ( j=1; j<=K; ++j ) { fscanf ( inFile, "%s\n", &S[j] ); /* \n not needed (why?) */ } fclose ( inFile ); And here is a C++ version to read the format (assuming appropriate #include and variable declarations): ifstream inFile ( "example.in" ); inFile >> N >> K; for ( i=1; i<=N; ++i ) { inFile >> L[i]; } for ( j=1; j<=K; ++j ) { inFile >> S[j]; } RIF routines: rif, comp, itemId, and various bounds> To simplify reading a list of separated items contained on a single line, RobIn automatically adjusts the itemComp parameter to Contiguous when reading the first item on a line (more accurately, to the pair (minimum: indentation, extra: itemComp.extra)). Thus, the format The first line of ... contains one integer N with 1 <= N <= 20. The second line contains N integers X with 0 <= X <= 999. is correctly captured by (assuming appropriate variable declarations) N := ReadIntegerRIF ( inFile, Separated, 'N', 1, 20 ) ; ReadlnRIF ( inFile ) ; for i := 1 to N do begin X[i] := ReadIntegerRIF ( inFile, Separated, 'X', 0, 999 ) end { for i } ; ReadlnRIF ( inFile ) It is recommended that the format program captures bounds as close as possible to the way they are given in the task description. Consider the following two equivalent format definitions: The first line of ... contains two positive integers N and K at most 100 and with K <= N. The first line of ... contains two integers N and K with 1 <= K <= N <= 100. Here are two corresponding format-defining program fragments: N := ReadIntRIF ( inFile, 'N', 1, 100 ) ; K := ReadIntRIF ( inFile, 'K', 1, 100 ) { check K < N, if not report "K > N" } ; AssertRIF ( inFile, K <= N, AssertError, {M#}900, IntToStr ( K, 1 ) + ' > ' + IntToStr ( N, 1 ) ) and N := ReadIntRIF ( inFile, 'N', 1, 100 ) ; K := ReadIntRIF ( inFile, 'K', 1, N ) These fragments are equivalent, except for the kind of diagnostics they produce, and how the competitor can interpret the diagnostic, in view of the given the format definition. Consider the input file with first line "10 20". The first program will give an AssertError of the form "K > N (20 > 10)", and the second program will give a FormatError of the form "Integer not in range (20 not in [1..10])". Use of AssertRIF, IsAbortedRIF, SetIndentation Checking Output Files Note: There may be good reasons to allow formats in output files that are forbidden for input files. For example, if a program must be written that enumerates all solutions of some kind of puzzle, and it cannot be easily known in advance how many solutions there are, then it would be cumbersome to require that the first line in the output file states the number of solutions and that each next line states a solutions. Instead, it is better to use the format where the program outputs each solution on a single line in the output file. However, if there are no solutions at all, then an empty file would not be a good idea. In output files, empty lines should never be significant. Typical use of EofRIF, assuming (as is reasonable) that the output file is not be empty: repeat { read one line } until EofRIF ( ... ) Note: It is not allowed to call EofRIF when not at the start of a format line. The reason for this is that it is unclear how to handle the situation where the remainder of the line contains some blanks but no further items; that line must still be finished by a ReadlnRIF, but EofRIF will skip both Blanks and Eolns to see if there is an Eof or some printing character ahead. Typical use of EolnRIF when reading a line, which should not be empty: repeat { read one item } until EolnRIF ( ... ) NextCharRIF is useful when the type of the next item is not known in advance. For example, in an output format defined as follows: The first line of the the output file consists of N integers in [0 .. 999], each integer is optionally followed by an asterix ('*'). Each integer must be preceded by at least one blank and at most four blanks. Each asterix must NOT be preceded by any blanks, but should be contiguous to the preceding integer. In this case, NextCharRIF can be used to look ahead at the first character of the next item. Any intervening end-of-lines are skipped and flagged as errors or warnings (depending on whether the checking level is strict or relaxed respectively). Here is a program fragment to define and read this format: SeparatedPlus.extra := 4 ; for i := 1 to N do begin a [ i ] := ReadIntegerRIF ( inFile, SeparatedPlus, 'a[i]', 0, 999 ) ; if not EolnRIF ( inFile ) then begin { test on not EolnRIF is needed for last item on line only } b [ i ] := ( NextCharRIF ( inFile ) = '*' ) ; if b [ i ] then begin c := ReadCharRIF ( inFile, Contiguous, 'c', [ '*' ] ) { ignored } end { if } end { if } end { for i } ; ReadlnRIF ( inFile ) This is similar to the situation where the number of items on the line is not known in advance. Assuming that there is at least one integer on the output line, the format can be checked by: SeparatedPlus.extra := 4 ; i := 0 ; repeat i := i + 1 ; a [ i ] := ReadIntegerRIF ( inFile, SeparatedPlus, 'a[i]', 0, 999 ) ; if not EolnRIF ( inFile ) then begin b [ i ] := ( NextCharRIF ( inFile ) = '*' ) ; if b [ i ] then begin c := ReadCharRIF ( inFile, Contiguous, 'c', [ '*' ] ) { ignored } end { if } end { if } ; until EolnRIF ( inFile ) ; ReadlnRIF ( inFile ) The Structure of a Message Definition File Each diagnostic message is identified by a number in the range 0..900. The numbers in the range 0..899 are reserved for use within RobIn. The numbers in the range 900..999 are available for task-specific messages issued via AssertRIF by the using program and loaded by BeginLogging. The association between a message number and its text is recorded in a Message Definition File (typical extension .mdf). In a message definition file, empty lines and lines starting with '#' are ignored. These can be used for documentation. Other lines should be of the form Leading and trailing blanks are ignored. The format of the message definition file is checked when it is loaded (though in a way that differs from the checking that RobIn does for RobIn files :-) and warnings are generated when the message definition file is badly formatted. Diagnostic Categories Every message is issued within a category. Usually --- but not necessarily always --- a message occurs in one category only. However, some messages are more general and make (different) sense in different categories (e.g. "Empty string"). The following table summarizes the "philosphy" behind the various Error dianostics. As with all error reporting, your mileage may vary, since RobIn cannot know what the user's intentions were. Compare this to the art of interpreting compiler error messages. IOError --> need to correct problem with user file in OS its existence, its location or name, its permissions InternalError --> need to correct mistake inside RobIn report this to me, with all evidence you can supply ExternalError --> need to correct mistake in RobIn's runtime environment or configuration, e.g. missing message definition file UsageError --> need to correct mistake in program using RobIn MetaFormatError --> need to correct format definition FormatError --> need to correct file content (general isolated issue) AssertError --> need to correct file content (task-specific issue) The distinction between UsageError and MetaFormatError is not always as clearcut as one might hope. Generally speaking, the format definition need not change when correcting a UsageError. Nevertheless, the format may still turn out to be the real source of the problem. Consider the following format definition: The first line of input file ... contains two integers N and K with 1 <= N <= 100 and 3 <= K <= N. The straightforward format-defining program fragment is N := ReadIntegerRIF ( inFile, Separated, 'N', 1, 100 ) ; K := ReadIntegerRIF ( inFile, Separated, 'K', 3, N ) However, this may give rise to a UsageError "Empty integer range" on the second ReadIntegerRIF call. This is, for instance, the case when reading was aborted (because file does not exist), in which case only the meta-format is checked. The first ReadIntegerRIF then returns the lower bound 1, making the second range empty. This is clearly an inconsistent read request. What is the matter here? Well, 1 is not really the lower bound on N, instead the actual consistent minimum value for N is 3, because of the constraint 3 <= K <= N . It is advisable to change the format definition in such a case: The first line of input file ... contains two integers N and K with 3 <= N <= 100 and 3 <= K <= N. The corresponding program-defining program fragment is: N := ReadIntegerRIF ( inFile, Separated, 'N', 3, 100 ) ; K := ReadIntegerRIF ( inFile, Separated, 'K', 3, N ) This fragment will return N=3, when reading was somehow aborted. Consequently, the range for K in the next call will no longer be empty. Hence it will not produce a UsageError about the range for K. It will, however, produce a UsageWarning, because the range consists of a single element only (which would generally be suspicious). Warnings Warnings point to a "suspicious" situation or a discouraged practice. Sometimes they can easily be explained and subsequently be ignored. In other cases, it may turn out to be a "hard to catch" mistake. Compare this again to compiler warnings. Exception handling: * We distinguish the follow exceptions: - meta-format warnings, in case the format involves a discouraged practice - meta-format errors, in case the format violates one of the IOI format restrictions - format warnings, in case the content of the file being read violates the format - format errors * In principle, after the first error has occurred, it no longer makes sense to continue reading the file. However meta-format checking proceeds even if a Format Error has occurred. > How to Interpret RobIn Messages --- -- --------- ----- -------- RIF, because it may still be followed by AssertRIF for semantic checking Integer item values are reported in decimal Character item values are reported as 'c', '''', or chr(#) String item values are reported as "c...c" (embedded '"' appear singly!) consequences of limiting diagnostics> Design and Implementation of RobIn ------ --- -------------- -- ----- The implementation of RobIn resides in the file "RobIn.pas" and contains comments as documentation. Here are some very global additional remarks. Expressing the Format I have chosen to express the format as a (Pascal) program fragment that makes calls to the routines reading the elements in an order appropriate for the format. An alternative could have been to introduce a new notation for expressing formats and to store the format description in a separate file. I think that my approach is more flexible and easier to use directly (unless you hate Pascal). Type RobInFile The file being checked is read via a variable rif: RobInFile, in particular via the text file variable rif.f. The variable rif contains other fields besides f to capture the current state of the reading and checking process, including some statistics. The sequence of characters that "passes through" rif.nextChar is exactly the contents of the file, with two exceptions. The first exception is that, under Relaxed Checking, every TAB characters is replaced by a blank. The second exception is that, under Relaxed Checking again, if the last line in the file is not terminated by an explicit end-of-line, then and EolnChar is inserted in the sequence. The window rif.nextChar acts as a look-ahead character. It is needed, because Turbo/FreePascal lacks a (Standard Pascal) file window to look ahead into the input stream. The procedure StepNextChar advances the rif.nextChar "window" one position. The look-ahead character is always valid after opening a RobInFile. Procedure StepNextChar gets the next character into nextChar. When eof is encountered, nextChar is set to EofChar. When eoln is encountered, nextChar is set to EolnChar. RobIn parses items by reading characters. The end of an integer item is detected by hitting the end-of-line or by reading a non-digit. The latter cannot be pushed back into the input stream and is stored as look-ahead character in nextChar. The next higher level of abstraction views the file as a sequence of /lines/, where each line consists of a sequence of /items/. Each item is a represented in the text file by a non-empty sequence of non-whitespace characters. These items are numbered per line starting at 1. To get appropriate diagnostic messages, it is desirable that when AssertRIF is called after some ReadRIF, this AssertRIF will still refer back to the last item read. Thus, the item number and other information about the last item read, is kept around in the RobInFile descriptor rif. Once attention is turned to the next item (either via ReadRIF, or indirectly via EofRIF, EolnRIF, or NextCharRIF), is the item number incremented. The implementation of EofRIF, EolnRIF, and NextCharRIF is somewhat awkward. These functions should look ahead in the file, WITHOUT advancing the nextChar window, seeing through blanks and, for EofRIF and NextCharRIF, also through end-of-lines. The actual processing of the ignored blanks and end-of-lines can only be done when it is known what type of item to read next and how this item is composed to its predecessor. Because I could not find a convenient way to accomplish this kind of look ahead, I have chosen to advance nextChar, and to process the skipped blanks and end-of-lines as best as possible. When not possible, the reporting is delayed and relevant state information is maintained in rif (in particular, the fields blankCount, eolnCount, excessiveBlanksReported). Because of this implementation choice, some diagnostics are reported a little later in the stream then might have been desirable. The file coordinates in the diagnostic message do not point to the precise location in the file where the situation first arose. As an example, consider the occurence of the query EolnRIF(rif) in the format reading program fragment (for checking an output file which has a variable number of items on a line). If rif.nextChar equals a blank, then this should be the separating blank before the next item on the line. But it could well be an (erroneous) trailing blank at the end of the line. Which of these possibilities is actually the case, can only be determined after finding either the end-of-line after a couple of blanks, or a non-white character (the start of the next item). The subsequent ReadRIF will determine what composition applies. Only then will it be clear whether the skipped blanks are in agreement with the format. Diagnostic messages Diagnostic output is done through theLogFileP, which is a pointer to a text file. I use such a pointer for two reasons. First of all, that way the using program can "own" the file variable and still tell only once to RobIn which file to use. Second, it simplifies switching on-the-fly between log files without having to open and close them. You can just move the pointers around. For instance, to get diagnostic output temporarily to standard output, simply insert: theLogFileP := @ output { diagnostic output now goes to output } ... theLogFileP := @ logFile { diagnostic output now goes to logFile (logFile should be open for writing) } Similarly, the selection/suppression of diagnostic messages can be changed on-the-fly by assigning to theDiagSel. All diagnostic messages produced by RobIn are preceded by "RobIn: ". Subsequent lines are indented by two blanks (implementation string constant Indent). Diagnostic messages are formulated to state what is actually the case, not what should have been the case. For example, "Empty string" (when some empty string was encountered where it should not have been), rather than "Empty string not allowed". Some diagnostic messages are followed by a /parenthetic note/, which is NOT taken from a message file, but which is constructed depending on the current situation. For example, if a value is out of bounds, the value and bounds are shown. I have strived to use as little English as possible in the parenthetic remarks, and preferably only symbolic information, to avoid the need for translation. Note that only a limited set of diagnostic messages is relevant to the competitors. Message numbers in the range 1 .. 255 are reserved for Turbo Pascal run-time error messages. Message numbers in the range 256 .. 899 are general messages of RobIn. Type Compatibility Issue for ReadRIF The routines ReadRIF have been implemented as functions. They "should" have been implemented as procedures (like the predefined read in Pascal), because they have (major) side-effects. Here is a motivation for my choice. In case of read procedures, the result (the value read) would have to be communicated via a var-parameter. These var-parameters need to have the "widest" type for the kind of item read, that is, Char for characters, LongInt for integers, and String for variable-length strings. Turbo Pascal insists that the actual var-parameter has exactly the same type as the formal var-parameter (and not a subtype). There is no automatic type conversion when assigning the result value in the procedure to the var-parameter. (Note: The predefined read procedures form an exception to this rule!) Consequently, if you have "var N: 1..100" and want to read a value using a "procedure ReadInt ( var i: LongInt )", then "ReadInt ( N )" is rejected, and you have to resort to the use of an auxiliary variable: "var ri: LongInt; ... ReadInt ( ri ) ; N := ri". On the other hand, for functions, the result type only needs to be assign compatible --- not identical --- when used in an assignment statement, and the value is automatically converted. Thus, "var N: 1..100; ... function ReadInt: LongInt; ... N := ReadInt" is correct usage. To simplify the use of RobIn (avoiding auxiliary variables), I have sacrificed elegance (no side-effects in functions). In What Order to Check Diagnostic Conditions? The order in which routines check conditions to generate diagnostic output poses some dilemmas. Often, it does not make sense to continue processing if a UsageError condition occurs, such as an empty character range in ReadStringRIF. In fact, it may be dangerous or senseless to test for a UsageWarning condition if an error is present. For example, in ReadStringRIF, if range contains a single character and minLength = maxLength, then I find that a good reason to give a UsageWarning (because at this point, the input file can contain only a single value). As a parenthetic remark, I produce that single value. But what if minLength and maxLength are negative? In that case, the UsageError should take precedence, because it violates the precondition (and no string value in the file can be acceptable). But, always first checking for UsageError conditions and, if none occurred, then checking for UsageWarning conditions, reduces the amount of diagnostic information in the log file. It may be very helpful in locating an error to receive warnings as well, because they can point to overlooked situations. As a result, there are places where I first check for UsageWarning conditions, and also places where I first check for UsageError conditions. It all depends on the context. If a UsageWarning can safely and sensibly be issued before the UsageErrors, then I do so. Turbo Pascal Extensions Used in RobIn Here is a (complete?) list of Turbo extensions to Standard Pascal that were used in the implementation of RobIn. LongInt Inc, Dec UpCase String, + on strings, Val, Str, Delete Exit, RunError Assign, Close {$I+/-}, IOResult else in a case statement Testing RobIn < STILL INCOMPLETE > Limitations of RobIn ----------- -- ----- RobIn has (of course) some limitations. Here is a brief discussion of some of the known limitations. Diagnostics for Out-of-Order Calls The diagnostics for out-of-order calls to xxxRIF are implicit, rather than explicit. In particular, the following restrictions should be checked explicitly but are not (all calls concern the SAME variable rif: RobInFile): Routine Condition OpenRIF # OpenRIF = # CloseRIF (rif not open) CloseRIF # OpenRIF = # CloseRIF + 1 (rif not closed) ReadRIF same as CloseRIF ReadlnRIF same as CloseRIF AssertedRIF same as CloseRIF IsAbortedRIF # OpenRIF > 0 (rif was opened) Note that "rif is open" does not mean the same thing as "rif.f is open", because a call to OpenRIF may fail to open rif.f. In that case, format checking is aborted, but meta-format checking continues, and a call to CloseRIF is still needed to terminate the checking, before opening another file. Another thing that cannot be diagnosed by RobIn is a missing CloseRIF call. If the using program does not make the call, then RobIn will never know. Similar limitations apply to the pair BeginLogging and EndLogging. The reason that such protection is "hard" to implement in Pascal is that no variables (except file variables) are automatically initialized to some known value. Therefore, RobIn (which does not "own" the RobInFile variable that acts as parameter to xxxRIF) must always rely on the using program to take care of initialization, for example by calling an appropriate initialization routine (in our case, OpenRIF). (If only Pascal had had a single auto-initialized type for variables; one bit would have been enough. :-) Concerning the second limitation (missing call to CloseRIF), the thing is that Turbo Pascal has no way to guarantee that control goes back to RobIn when the program exits. EndLogging does a check, but the call to EndLogging may also be missing. (Delphi does have have that facility.) < STILL INCOMPLETE > Other Tools ----- ----- CheckMsg CheckMsg.pas is a program to help check to check the structure of message files used by RobIn, especially task-specific message files. CheckMsg checks the files named by each of its command-line parameters. If no command-line parameters are given, then it asks for a file name and checks it. Application Framework Normalizer The program normal.pas normalizes text files. Normalization involves: * removing whitespace (blanks, tabs) at line begin * replacing multiple whitespace between items by a single blank * removing whitespace at line end * removing invalid characters, outside chr(32) .. chr(126) * removing empty lines (incl. lines with only whitespace) * ensuring an explicit end-of-line on the last line During normalization, changes are reported on standard output. Concluding Remarks ---------- ------- Here are some open ends. Porting to Linux and/or C/C++. The initial TurboPascal version of RobIn has been adapted to FreePascal. This version works well under Linux. I will not be working on a C/C++ version of RobIn. But if you want to give it a try, look under "Design and Implementation of RobIn" for a list of Turbo extensions to Standard Pascal that were used. ReadRealRIF or ReadFloatRIF I will not put in a ReadRealRIF/ReadFloatRIF. For one thing, the use of floating point arithmetic at the IOI is controversial (for several reasons, such as rounding, numerical stability, etc., ask Gyula Horvath). Furthermore, the format for a real/float item must be chosen such that it is common to all IOI programming languages. Its definition is more complicated than that of the items supported now. Acronyms -------- BS Backspace (ASCII character 8) CR Carriage Return (ASCII character 13) FF Form Feed (ASCII character 12) HT (Horizontal) Tab (ASCII character 9) IDE Integrated Development Environment I/O Input/Output IOI International Olympiad in Informatics LF Line Feed (ASCII character 10) OS Operating System RobIn Robust Input module (FreePascal Unit) TC Turbo C/C++ by Borland (version 3.0) TP Turbo Pascal by Borland (version 7.0) VT Vertical Tab (ASCII character 11) Sample Programs ------ -------- This appendix contains some programs to illustrate certain details that I did not want to include in the main text. The layout for these programs has been kept compact to improve the overview, at the expense of their maintainability. End-of-Line and Turbo Pascal The following two programs count the number of characters and lines in standard input. Under the ISO Standard Pascal definition, these programs are equivalent, but not under Turbo Pascal. Try the following two possible inputs, each consisting of two lines (^Z denotes Ctrl-Z): abc def ^Z abc def^Z Program TextCounter1 (correctly) reports 6 characters on 2 lines in both cases. Program TextCounter2 reports the same for the first input, but 6 characters and 1 line for the second input, because the outer loop terminates early. This can be compensated by including the line ; if eoln then Inc(lines) after the loop in TextCounter2. Note, however, that the resulting program is no longer correct ISO Pascal (because eoln is inspected when eof is true)! program TextCounter1; var chars, lines: longint; c: char; begin writeln('Count # characters and lines in standard input.') ; writeln('Standard implementation.') ; chars := 0 ; lines := 0 ; while not eof do begin while not eoln do begin read(c) ; Inc(chars) end ; readln ; Inc(lines) end ; writeln('# chars = ', chars:6) ; writeln('# lines = ', lines:6) end. program TextCounter2; var chars, lines: longint; c: char; begin writeln('Count # characters and lines in standard input.') ; writeln('Alternative implementation.') ; chars := 0 ; lines := 0 ; while not eof do begin if eoln then begin readln ; Inc(lines) end else begin read(c) ; Inc(chars) end end { N.B. Turbo Pascal needs "; if eoln then Inc(lines)" here, but this would be wrong in Standard Pascal } ; writeln('# chars = ', chars:6) ; writeln('# lines = ', lines:6) end. ~~~~ NOT YET INCORPORATED ABOVE Item composition mistakes are * multiple consecutive blanks * blanks at begin of line or end of line * empty lines * missing blanks between separated items * missing end-of-lines * end-of-lines instead of blanks * blanks instead of end-of-lines * junk at end-of-file Existence of the file to be checked and limitations on its size are typically handled prior to reading the file with RobIn. In implementation of RobIn: Currently, RobIn does not check that when an integer item is preceded contiguously by a character item on the same line, the range for the character item does not include potentially "confusion" characters (such as digits, or '+', or '-'). Some meta-format checks have been disabled, because they seem too pedantic: * Character follows integer on same format line * Integer follows character on same format line * Integer is contiguous to preceding item (a.k.a. No blank prescribed before integer) * String not at begin of format line