TXPP -- A Preprocessor for T3X

Release 1

(3rd revised issue)

Copyright © 1998-2000 Nils M Holm

mail: nmh@t3x.org
home: http://www.t3x.org/

1. Purpose

Sometimes, it is desirable to share some information between different programs, or to inline portions of frequently used code. TXPP addresses these problems. It provides a mechanism for including the contents of a file into a T3X program and another one for assigning small portions of program text to symbolic names (macros) for easy reuse.

A typical situation for the use of include files, for example, is the T3X compiler itself. The Tcode instruction set is required by the compiler frontend (which generates Tcode), the code generator (which converts Tcode into native code), and by the disassembler (which decodes Tcode instructions). By placing the Tcode definitions into a single file and including it in each part of the compiler, possible changes and extensions to the Tcode language can be applied to one centralized copy of the set. Besides saving time, it makes sure that all modules actually use the same definitions. (Note: For compatibility reasons, this method is not actually used in the T3X source code.)

The use of macros is similar, but (unless combined with the inclusion mechanism) limited to a single file. When a portion of code has to be duplicated at various positions in the same program, but the overhead of defining a procedure is not acceptable, the portion may be inlined using a macro. The text which has to be duplicated is assigned to a symbolic name and wherever the text is required, a macro expansion request is placed instead of the full text. The advantage is the same as above: Changes to the macro text only have to be made at one single place and the same text is assured to be used at all positions where the macro expansion is requested.

2. The Preprocessor Language

TXPP processes its input on a purely textual basis. It recognizes the syntax of T3X only to a limited degree. It leaves space characters where they exist and does not change any line breaks. Therefore, the line numbering of the output is normally the same as that of the input file. When this is not possible, for example due to the inclusion of a file, TXPP generates an appropriate #L meta command to reset the line numbering to reflect the original sequence.

Only a few TXPP commands exist. They fulfill the following tasks: defining and deleting macros, including text conditionally, including files, and expanding macros.

Each TXPP command begins with a # sign and normally extends to the end of the line in which it begins. A command may also begin in the middle of a line, but this is discouraged and is not guaranteed to work in subsequent versions. The command name must follow the # sign immediately, with no white space between them. When the name is one of the predefined built-in commands, the command and its arguments (but not the line separation character) are removed from the input and the associated action is taken. When the name is that of a previously defined macro, the command is also removed and then replaced with the value (text) of that macro.

2.1 Built-in Commands

This is a summary of the built-in commands of the TXPP language. Some commands use symbol names. For TXPP, a symbol name is any sequence of upper and lower case letters and underscores (_). No other characters are allowed, and upper and lower case letters are treated as equal. (Outside of symbol names, the case of the processed text is preserved.)

The following predefined TXPP commands exist.

2.1.1 Macro Definitions

#define name text

Assign text to the symbol name. Text may be either a single word, which contains no white space characters, or a string delimited by double quote characters ("). When text is a single word, it is simply copied to the value field of the macro name. When it is a string, it may contain newline characters, thereby allowing macro texts to span multiple lines. These line break characters are removed from the string and replaced with blank characters. The delimiting double quotes are also removed from the text before it is stored.

When a macro with the given name already exists, a redefinition error will be reported.

After defining a macro, each occurrence of #name in the remainder of the file will be replaced with text.
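
As a brief illustration, the following sketch defines one single-word macro and one quoted multi-line macro and then requests their expansion. The macro names bufsize and swap and the variables A, B, and T are chosen for this example only.

```t3x
#define bufsize 512
#define swap "T := A;
              A := B;
              B := T"

var Buffer[#bufsize];
var A, B, T;

do
        A := 1;
        B := 2;
        #swap;
end
```

After preprocessing, the vector declaration contains the text 512 in place of the expansion request, and the request in the main block is replaced with the three assignments joined into a single line.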

2.1.2 Conditional Inclusion

#ifeq name value
conditional text
#end

Compare the value of the macro name against value. If the value of name is exactly the same string as value, the 'conditional text' between #ifeq and #end will be processed in the usual way. If the value of the specified macro does not match value, however, the conditional text is removed and replaced with empty lines (so that the line numbers remain the same).

#ifeq...#end blocks may be nested to any level. Inside of ignored conditional blocks, #ifeq and #end will be ignored as well as all other TXPP commands until an #end statement is found which terminates the negative condition.

It is an error to pass the name of an undefined macro to #ifeq.
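
The following sketch shows one possible use of #ifeq. The macro name debug, its values on and off, and the constant TRACE are assumptions made for this example. Since this manual describes no command for an alternative branch, the two cases are written as two separate blocks.

```t3x
#define debug on

#ifeq debug on
const TRACE = 1;
#end
#ifeq debug off
const TRACE = 0;
#end

do
        var     level;

        level := TRACE;
end
```

With the definition shown, only the first block is passed on to the compiler. The lines of the second block are replaced with empty lines, so the line numbering does not change.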

2.1.3 File Inclusion

#include "filename"

Copy the text contained in the file filename into the output file so that the first line of the included file will begin in the line below the #include command. The line containing the command will be replaced with the meta command

#L 1 "filename" ;

which sets the compiler's input file name to the name of the included file and the line counter to 1. After the included text, TXPP inserts another #L command containing the line number where the insertion of the include file took place plus one, so that the included lines are not counted when TXTRN reports errors in the including file. Hence, the reported line numbers will match the original file and not the preprocessed version.

"Filename" is the name of the file to include. It is passed to the operating system without any changes. The case of all letters is preserved. No prefix, suffix, or directory location is added to the name. When files are included from other directories, the path name may contain system-dependent constructions which have to be changed when porting the program.

Hints: Slashes (/) work on Unix and DOS to separate directories in path names.
When a file name contains no white space or '#' characters, the quotes around the name may be omitted.
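
For illustration, suppose that a hypothetical file limits.inc holds a definition shared by several programs. Both the file name and the macro name maxline are assumptions made for this sketch.

```t3x
#define maxline 128
```

A program could then include the file and use the macro as if it had been defined in the program itself:

```t3x
#include limits.inc

var Line[#maxline];

do
        Line[0] := 0;
end
```

A change to the value in limits.inc then reaches every program that includes the file.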

2.1.4 Macro Deletion

#undef name

Remove a previously defined macro. The name will be deleted from the internal macro dictionary and the space required for the name and the associated text will be released to the free memory pool. #undef may be used to delete macros when they are no longer needed or to prepare for the redefinition of a macro.
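
A minimal sketch of a redefinition follows. The macro name bufsize and the vector names are chosen for this example only. Without the #undef command, the second #define would cause a redefinition error.

```t3x
#define bufsize 256
var Small[#bufsize];

#undef bufsize
#define bufsize 1024
var Large[#bufsize];

do
        Small[0] := 0;
        Large[0] := 0;
end
```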

2.2 Macro Expansion

#name

Request the expansion of macro name. Whenever a # sign followed by a name which is not a built-in command is found in the input, it is removed and replaced with the text assigned to name using #define. Because all newline characters are removed from macro texts during their definitions, all expansions occur in a single line and therefore, the line numbering will never change due to a macro expansion.

Unlike commands, macro expansion requests may occur at any position in a line, even if the request is not separated from the surrounding text by space characters. Given the definition

#define readchar "Buffer[Ptr];
                Ptr := Ptr+1;
                if (Ptr>End) more()"

for example, the expansion request

ch:=#readchar;

will expand to

ch:= Buffer[Ptr]; Ptr := Ptr+1; if (Ptr>End) more() ;

Notice that TXPP never expands macros implicitly, but only when an expansion is requested using #. Therefore, the text

ch:=readchar;

will not expand to anything, but simply pass through as it is.

3. Includable Extensions

The includable extensions can be found in a system-dependent directory like /usr/local/T/inc or in a per-user directory like $HOME/inc. By convention, the include files supplied with the T3X distribution have a .inc suffix. Each extension should be included only once to avoid redefinitions. When an extension depends on another one, this is noted in the descriptions below. All files an extension depends on must be included before the file containing the dependencies.

Some files contain only interface definitions for routines of the extended TXX interpreter and the respective runtime support libraries. Others contain function definitions themselves. Since release 4, the T3X compiler contains an optimizer which is capable of removing unused procedures. Therefore, no dead code will be generated when included procedures are not actually required.

3.1 Interface Definitions for Standard Extensions

3.1.1 The Basic Library Extensions

The file basic.inc contains the INTERFACE definitions for the extended runtime support procedures which were added to the T3 library when T3X was designed. For a description of these procedures, see the T3X Reference Manual.

3.1.2 The Video Terminal (VIO) Extensions

The file vio.inc contains the INTERFACE definitions of the routines used to control character-based video terminals as well as some additional definitions like color constants and keycodes. For a description of the VIO extensions, see the T3X Extensions Manual.

3.1.3 The Graphics Extensions

The include file graphics.inc contains the INTERFACE definitions of the device-independent vector graphics extensions, the definition of the event structure used by G_EVENT(), and some other useful constants. For a description of the graphics library and its procedures, see the T3X Extensions Manual.

3.2 The I/O-Stream Extension


Usage: #include basic.inc
       #include ios.inc

An I/O stream is an additional layer between a user program and a file or a device. It performs block-based I/O operations on the device it is assigned to and provides the flexibility of character-based I/O to the programmer at the same time. When reading a file character by character, for example, the I/O stream layer will translate the first character input request into a block input request and deliver subsequent characters from an internal buffer. When all characters have been read from the buffer, the next block will be read from the device. Similar methods are used to buffer other read and write requests.

Besides fully buffered input and output, the IOS extension also provides interfaces to the usual file operations like opening and closing files, moving file pointers, EOF detection, etc.

3.2.1 The IOS Structure

All IOS operations are performed on a structure called an IOS structure which is defined in ios.inc. An IOS object (aka an I/O Stream) is created using the syntax

var name[IOS];

where name is the name of the new IOS. The internal structure of an IOS will not be described here, since no program should ever rely on a specific layout. The IOS structure may change in subsequent releases.

3.2.2 The IOS Procedures

IOS_create(iostream, fd, buffer, len, mode)

Initialize a new IOS with an already opened file and the given buffer and mode. Iostream should be an uninitialized or closed IOS. If the stream is still active while IOS_create() is applied to it, its previous settings will get lost (probably leaving the associated file or device open).

Fd is an open file descriptor which will be assigned to the IOS so that the IOS may be used to manipulate the respective file. A valid descriptor can be obtained from the T3 standard procedure open().

Buffer is the memory region which will be used for buffering blocks. Its length must be explicitly specified in len. Usual sizes are 256, 512, and 1024 bytes. On some systems, larger buffers mean faster disk I/O. The buffer must be supplied by the user. It may be a global or local array or a block delivered by MEM_alloc() (see next chapter).

Mode is a bit map containing the modes which are allowed for accessing the IOS. Possible modes are as follows.

Flag                    Meaning
IOF_READ                IOS is read-only
IOF_WRITE               IOS is write-only
IOF_READ | IOF_WRITE    IOS is readable and writable

The programmer is responsible for supplying an fd which may be accessed using the specified mode. Otherwise, the results of subsequent IOS requests are undefined.

IOS_create() returns a pointer to iostream on success and -1 in case of an error.

IOS_open(iostream, uname, buffer, len, mode)

Initialize an IOS and assign a file to it. The arguments to IOS_open() are the same as to IOS_create() with the exception of fd which has been replaced with uname. Instead of an already open descriptor, a file name is passed to IOS_open(). The file name must be passed to it as an unpacked string.

The procedure attempts to open the file specified in uname in the given mode. If the open() call succeeds, the resulting descriptor is passed along with the other arguments to IOS_create(). If the file could not be opened or the value of mode is not legal, IOS_open() returns -1. Otherwise it returns the value of the IOS_create() call.

When mode is either IOF_READ or IOF_READ|IOF_WRITE, a non-destructive open() call is used to open the given file in the respective mode. When mode=IOF_WRITE however, the open() call will create a new file, thereby destroying any existing file with the given name. (See the description of open() in the T3X Reference Manual for details.)

IOS_wrch(iostream, char)

Write the single character char to iostream. Return the character on success, or -1 if an I/O error occurred during the operation. An I/O error may occur when the buffer overflows and the routine fails to write the pending buffer to the assigned device.

IOS_wrwrd(iostream, val)

Write the machine word val to iostream in little endian ordering (low byte, high byte). Return nothing. No error checking is performed.

IOS_write(iostream, buffer, len)

Write len bytes from buffer to iostream. Return the number of bytes actually written. If the return value of IOS_write() is less than len, this usually indicates that there is not enough space left on the target device. A return code of -1 indicates general failure.

IOS_writes(iostream, ustring)

Pack the unpacked string ustring and then pass it -- along with its length -- to IOS_write() for output. This procedure fails with a return code of -1, if ustring is longer than 1024 characters. Otherwise it returns the result of IOS_write().
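
The following sketch combines IOS_open(), IOS_writes(), IOS_wrch(), and IOS_close() to write one line of text to a file. The procedure name save, the file name hello.txt, and the buffer size are assumptions made for this example. The line separator is written as character code 10, which may have to be adapted on systems using a different separator.

```t3x
#include basic.inc
#include ios.inc

const BUFLEN = 512;

var Out[IOS];
! BUFLEN machine words provide at least BUFLEN bytes of buffer space
var Buf[BUFLEN];

! Write one line of text to the file NAME (an unpacked string).
! Return 0 on success and -1 on failure.
! A short write (fewer bytes written than requested) is not detected here.
save(name) do
        var     r;

        if (IOS_open(Out, name, Buf, BUFLEN, IOF_WRITE) = -1) return -1;
        r := IOS_writes(Out, "hello, world");
        if (r \= -1) r := IOS_wrch(Out, 10);
        if (IOS_close(Out) = -1) r := -1;
        ie (r = -1) return -1; else return 0;
end

do
        save("hello.txt");
end
```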

IOS_rdch(iostream)

Read a single character from iostream and return it. A return value of -1 indicates either that the end of the assigned file has been reached or a general error has occurred. Use IOS_eof() to check the EOF condition.

IOS_rdwrd(iostream)

Read one machine word in little endian order (low byte, high byte) from iostream and return it. The procedure uses IOS_rdch() to read each byte. No error checking is performed.

IOS_read(iostream, buffer, len)

Read len bytes from iostream into buffer. Return the number of characters read. If the return value is less than len, the end of the assigned file has been reached. A return value of -1 indicates general failure.

IOS_reads(iostream, buffer, len)

Read up to len characters from iostream into buffer. Return the number of characters actually read. IOS_reads() terminates if either len characters have been read, the EOF has been encountered, or a newline character has been read. Therefore, it always reads (at most) a single line from the input stream. It is intended for processing terminal input and files with a line-oriented structure. When the return value of IOS_reads() is zero, the end of the input has been reached. A return value of -1 indicates a general error.
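
As a sketch of line-oriented input, the following procedure counts the lines of a file by calling IOS_reads() until it reports the end of the input. The names countlines, LINELEN, and input.txt are assumptions made for this example. Lines longer than LINELEN characters are delivered in several pieces and are therefore counted more than once.

```t3x
#include basic.inc
#include ios.inc

const BUFLEN = 512, LINELEN = 128;

var In[IOS];
var Buf[BUFLEN];

! Count the lines in the file NAME (an unpacked string).
! Return -1 if the file cannot be opened or a read error occurs.
countlines(name) do
        var     line[LINELEN], n, k;

        if (IOS_open(In, name, Buf, BUFLEN, IOF_READ) = -1) return -1;
        n := 0;
        k := IOS_reads(In, line, LINELEN);
        while (k > 0) do
                n := n+1;
                k := IOS_reads(In, line, LINELEN);
        end
        IOS_close(In);
        if (k = -1) return -1;
        return n;
end

do
        countlines("input.txt");
end
```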

IOS_position(iostream, offh, offl, origin)

Move the file pointer of the descriptor assigned to iostream to the position

origin + (offh << 16 | offl)

where origin indicates the position where the move starts. It may have the following values:

SEEK_SET    The beginning of the file
SEEK_REL    The current position
SEEK_END    The end of the file

When origin is SEEK_END, the offset must be negative. Otherwise it may be either positive (moving forward) or negative (moving backward). When forming negative values, notice that offh is the high word of a 32-bit double word. Therefore, a negative value must be computed by building the complement of offl and offh. The value -1, for example, would imply offl=-1 and offh=-1.

IOS_position() implicitly flushes the buffer of iostream.
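
The following sketch shows how some offsets may be split into offh and offl. The procedure name seekdemo, the file name data.bin, and the chosen offsets are assumptions made for this example. Return values of IOS_position() are not examined, because they are not described here.

```t3x
#include basic.inc
#include ios.inc

const BUFLEN = 512;

var S[IOS];
var Buf[BUFLEN];

! Move the file pointer of the open IOS s to various positions.
seekdemo(s) do
        ! offset 0 from the beginning: rewind the file
        IOS_position(s, 0, 0, SEEK_SET);
        ! offset 70000 = 1*65536 + 4464, hence offh=1 and offl=4464
        IOS_position(s, 1, 4464, SEEK_SET);
        ! offset -1 from the current position: offh=-1 and offl=-1
        IOS_position(s, -1, -1, SEEK_REL);
        ! offset -16 from the end of the file: offh=-1 and offl=-16
        IOS_position(s, -1, -16, SEEK_END);
end

do
        if (IOS_open(S, "data.bin", Buf, BUFLEN, IOF_READ) \= -1) do
                seekdemo(S);
                IOS_close(S);
        end
end
```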

IOS_eof(iostream)

Return -1, if the end of the file assigned to iostream has been encountered during an IOS_read(), IOS_reads(), IOS_rdch(), or IOS_rdwrd() operation. Otherwise return 0.
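
Because IOS_rdch() returns -1 both at the end of the file and after an error, IOS_eof() may be used to tell the two cases apart. The following sketch assumes the hypothetical names count and input.txt.

```t3x
#include basic.inc
#include ios.inc

const BUFLEN = 512;

var In[IOS];
var Buf[BUFLEN];

! Count the characters delivered by the open IOS s.
! Return -1 if reading stopped for a reason other than the end of the file.
count(s) do
        var     n;

        n := 0;
        while (IOS_rdch(s) \= -1) n := n+1;
        ie (IOS_eof(s)) return n; else return -1;
end

do
        if (IOS_open(In, "input.txt", Buf, BUFLEN, IOF_READ) \= -1) do
                count(In);
                IOS_close(In);
        end
end
```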

IOS_flush(iostream)

Write the buffer of iostream to the file assigned to the IOS, thereby synchronizing the IOS and the actual file. Return -1 in case of an error and otherwise 0.

IOS_flush() should be used only to flush the output written to character devices, like terminals or communication lines, because flushing half-filled buffers will destroy the synchronization between an IOS and the block structure of a disk file, probably resulting in slower I/O operations.

IOS_close(iostream)

Close the given iostream. Return 0 upon success and -1, if a pending buffer could not be written to disk (in this case, iostream remains open). A closed IOS instantly becomes invalid and may not be referenced any longer.

3.2.3 Notes on Using I/O-Streams

When an IOS has been set up for both reading and writing, its buffer must be flushed when switching from read mode to write mode and vice versa. This can be done by applying IOS_flush() to the IOS explicitly or by performing an IOS_position() operation on it.

Because I/O streams are fully buffered, special care must be taken when handling character devices like terminals. Using write calls like IOS_writes() for writing to terminal screens may not cause any output before the next buffer overflow occurs. IOS_flush() should be used in this case to force the synchronization of the IOS and the assigned device.

Similar problems may occur when reading terminal devices. Since most terminal devices are line-buffered, the return value of IOS_read() may be less than the specified length. This is no reliable indicator for an EOF condition. Hence, IOS_eof() should be used to detect the EOF when reading character devices.

Never test the exit code of an IOS call using a less than zero condition like in

if (ios_open(s, name, buf, len, IOF_READ) < 0) ...

because IOS_open() returns a pointer to S on success, and S may be located at an address greater than 32767, which T3X interprets as a negative value. Therefore, the above test may report a failure even if everything went fine.
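
A safer test compares the result with -1 explicitly, as in the following sketch. The file name input.txt and the buffer size are assumptions made for this example.

```t3x
#include basic.inc
#include ios.inc

var S[IOS];
var Buf[256];

do
        var     ok;

        ! true if IOS_open() returned anything but the error code -1
        ok := IOS_open(S, "input.txt", Buf, 256, IOF_READ) \= -1;
        if (ok) IOS_close(S);
end
```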

3.3 The Dynamic Memory Extension


Usage: #include mem.inc

The include file mem.inc contains routines for managing dynamic memory pools. A memory pool is a user-allocated vector which is controlled by the routines described here. The pool may be either a global or a local vector, but if it is local, it must not be deallocated as long as the pool is in use. Any number of pools may be created as long as sufficient free memory exists.

Memory pools are useful for creating dynamic structures, like linked lists, trees, graphs, dictionaries, and other structures which allow the insertion and deletion of elements at any position. The use of limited memory regions as memory pools makes it possible to delete all objects in one pool with a single procedure call. Therefore, it is not necessary to destroy complex structures like trees step by step.

3.3.1 Memory Management Routines

MEM_init(pool, size)

Initialize the vector pool with the given size. Pool should be an ordinary word vector (not a byte vector) and size should be its size (as specified in its declaration). Each pool must be initialized using this procedure before it can be used. MEM_init() basically adds the entire pool to its free list. Therefore, it may also be used to destroy all objects in a given pool at once.

MEM_alloc(pool, size)

Allocate a vector of the given size in a specific pool and return a pointer to it. The size is always specified in machine words. MEM_alloc() uses a first-match algorithm. It allocates the first free vector in pool which is at least as big as size. When the allocated block is greater than size, MEM_alloc() splits it and returns the superfluous part to the free list.

If MEM_alloc() fails to allocate a vector of the given size, it returns zero.
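
The following sketch builds a linked list in a memory pool and then discards all of its nodes at once by initializing the pool again. The structure NODE, the procedure push, and the pool size are assumptions made for this example.

```t3x
#include mem.inc

const POOLSIZE = 1024;
struct NODE = N_NEXT, N_VALUE;

var Pool[POOLSIZE];
var List;

! Prepend the value v to List.
! Return the new node, or 0 when the pool is exhausted.
push(v) do
        var     n;

        n := MEM_alloc(Pool, NODE);
        if (n = 0) return 0;
        n[N_VALUE] := v;
        n[N_NEXT] := List;
        List := n;
        return n;
end

do
        MEM_init(Pool, POOLSIZE);
        List := 0;
        push(1);
        push(2);
        ! release all nodes with a single call
        MEM_init(Pool, POOLSIZE);
        List := 0;
end
```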

MEM_free(pool, vector)

Release the specified vector to the free list of pool. The vector must have been allocated in the same pool it is released to. After freeing a vector, the space which has been allocated by it may be used by MEM_alloc() to satisfy subsequent allocation requests. Therefore, the content of vector should be considered undefined after passing it to MEM_free().

When an invalid block is passed to MEM_free(), it terminates the calling program and prints a message like
mem_free(): bad block
A block is invalid, if it already has been freed or if it does not belong to the specified pool.
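
A typical pairing of MEM_alloc() and MEM_free() is sketched below: a temporary vector is allocated, used, and released before the procedure returns. The procedure name sum and the pool size are assumptions made for this example.

```t3x
#include mem.inc

const POOLSIZE = 512;

var Pool[POOLSIZE];

! Return the sum 1+2+...+n, computed through a temporary vector.
! Return -1 if the vector cannot be allocated.
sum(n) do
        var     v, i, s;

        v := MEM_alloc(Pool, n);
        if (v = 0) return -1;
        for (i=0, n) v[i] := i+1;
        s := 0;
        for (i=0, n) s := s + v[i];
        ! v must not be referenced after this call
        MEM_free(Pool, v);
        return s;
end

do
        MEM_init(Pool, POOLSIZE);
        sum(10);
end
```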

Besides returning a block to the free list of a memory pool, MEM_free() also defragments the pool. When allocating many small blocks from a pool and then freeing some of them, holes may be created as shown in the following illustration (hatched blocks [///] are allocated).

   A     B     -    D      Free memory
+------------------------------------------------------------+
|/////|/////|     |////|                                     |
+------------------------------------------------------------+
                      |............ requested ...............|

In such a situation, an allocation request for the size of the Free memory plus 1 would fail, because not enough space is available in one piece. If D were freed, too, three contiguous free blocks would be available, but still none with the size of Free memory plus 1.

   A     B     -    -      Free memory
+------------------------------------------------------------+
|/////|/////|     |    |                                     |
+------------------------------------------------------------+
                      |............ requested ...............|

Therefore, MEM_free() merges adjacent free blocks. Each time it is called, it checks the pool for sequences of free blocks and turns each sequence into one single free block. Thereby, it creates larger free blocks which may be used to satisfy requests for larger vectors.

   A     B     Free memory
+------------------------------------------------------------+
|/////|/////|     .    .                                     |
+------------------------------------------------------------+
                      |............ requested ...............|

MEM_walk(pvec, psize, pstat, start)

Visit each block in the pool pvec. First, a pointer to the first block of a pool P must be retrieved by running

v := MEM_walk(P, @size, @stat, 1);

(with start=1). This call returns a pointer to the first block of P. If the address of a variable is passed to MEM_walk() in psize, the procedure fills in the size of the returned block. If an address is passed to it in pstat, it fills in the status of the block (0=allocated, 1=free). When either information is not wanted, zero may be supplied instead of the address of the respective variable. A nonzero value for the start argument indicates that the first block of a pool is requested. While walking through the rest of the pool, it must be zero.

In subsequent calls to MEM_walk() where the remaining blocks of a pool are visited, the value returned by the previous call must be supplied for pvec:

v := MEM_walk(v, @size, @stat, 0);

MEM_walk() returns a pointer to the first word of the currently visited block. When all blocks of a pool have been visited, it returns zero.

The following procedure computes the total amount of memory left in a pool.

freemem(pool) do
        var     p, size, free;
        var     total;

        total := 0;
        p := MEM_walk(pool, @size, @free, 1);
        while (p) do
                if (free) total := total+size;
                p := MEM_walk(p, @size, @free, 0);
        end
        return total;
end
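
Since a fragmented pool may hold plenty of free memory without being able to satisfy a large request, the size of the largest free block can be of interest as well. The following variation is a sketch under the same assumptions as the procedure above. The name maxfree is chosen for this example only.

```t3x
#include mem.inc

const POOLSIZE = 256;

var Pool[POOLSIZE];

! Return the size of the largest free block in pool,
! or 0 if the pool contains no free block.
maxfree(pool) do
        var     p, size, free;
        var     max;

        max := 0;
        p := MEM_walk(pool, @size, @free, 1);
        while (p) do
                if (free) if (size > max) max := size;
                p := MEM_walk(p, @size, @free, 0);
        end
        return max;
end

do
        MEM_init(Pool, POOLSIZE);
        maxfree(Pool);
end
```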

4. License

This document is part of the T3X compiler package which is subject to the following terms.

T3X -- A Compiler for the Procedural Language T, version 3X
Copyright (C) 1996-2000 Nils M Holm. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.