T3X Language Reference

Release 5.3

(3rd revised issue)

Copyright © 1996-2000 Nils M Holm

mail: nmh@t3x.org
home: http://www.t3x.org/

1. Conventions

The following symbolic terms are used throughout this manual:

Symbol   Meaning (used as a placeholder for...)
sym      any symbol name.
expr     any expression (see expressions, operators).
cexpr    any constant expression (see constant expressions).
lvalue   any left-hand side of an assignment.
stmt     any statement (see statements).

2. Programs

Each program is a set of declarations. The last declaration in each program must be a compound statement which forms the initial entry point of the program.
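As an illustration of this layout, here is a minimal sketch of a complete program: one procedure definition followed by the compound statement that serves as the entry point. The procedure name greet is made up for this example; WRITES and NEWLINE are the built-in procedures described later in this manual.

```t3x
! A minimal program: a procedure definition followed by
! the compound statement that forms the entry point.

greet() DO
    WRITES("Hello, World!");
    NEWLINE();
END

DO
    greet();
END
```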

The translator will accept the following characters:

A comment may be placed between any two atomic parts (tokens) of a program or at the beginning of the program. It is introduced by an exclamation point (!) and extends up to the end of the line. The compiler interprets it like a single blank. The following objects are tokens:

3. Declarations

Declaration
    Description

CONST sym = cexpr, ... ;
    Define constants and initialize them with the given values.

DECL sym(cexpr), ... ;
    Declare -- but do not define -- the given procedures. Cexpr is the type (arity) of the respective procedure. Used to create forward references.

EXTERN DECL sym(cexpr), ... ;
    Like DECL, but declare externally defined procedures. The specified procedures will be exported as unresolved names. They have to be resolved by a linker. (Experimental)

INTERFACE sym(cexpr) = slot, ... ;
INTERFACE sym(cexpr), ... ;
    Define a procedure-style interface to a routine of the runtime environment located at the given `slot'. The slot numbers may be obtained from the description of the respective runtime library or VM interpreter. If no slot number is specified, the last assigned slot plus 1 will be used. Each slot number may be assigned to only a single interface name.

PUBLIC sym(a0, ..., aN) stmt
    Define procedure sym as in an ordinary definition, but also export its name for external linkage. (Experimental)

STRUCT s = m0, ..., mN;
    Define the layout of the compound data type s with the members m0 ... mN. This statement is equivalent to
        CONST s=N+1, m0=0, ..., mN=N;
    To create an s-object, use
        VAR sym[s];

sym(a0, ..., aN) stmt
    Define procedure sym with the optional formal arguments a0...aN. Stmt is a single statement forming the body of the procedure (this may be a compound statement, of course). Each procedure returns the value specified in a RETURN statement, if any. When the end of a procedure is reached without encountering a RETURN command, its return value defaults to zero.

VAR sym, ... ;
    Define variables. (*)

VAR sym [cexpr], ... ;
    Define vectors. Valid vector members are sym[0]...sym[cexpr-1]. (*) Each member has the size of a machine word.
    The vector symbol itself is a constant holding the address of the actual vector. Therefore, assignments to vector symbols are not allowed.

VAR sym :: cexpr, ... ;
    Define byte vectors. Valid members are sym::0...sym::(cexpr-1). (*) The byte vector differs from the vector (above) only by the way its size is computed. A byte vector provides enough space to hold cexpr characters instead of machine words.

(*) Definitions of ordinal variables, vectors and byte-vectors may be intermixed in a single VAR statement.
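The following sketch combines several of the declarations above. All names in it (BUFSIZE, POINT, is_even, and so on) are invented for illustration. It shows a mixed VAR statement, a STRUCT used to size a vector, and a DECL that makes a forward reference between two mutually recursive procedures possible.

```t3x
CONST BUFSIZE = 64;

STRUCT POINT = PX, PY;          ! POINT=2, PX=0, PY=1

! variable, vector and byte vector in a single VAR statement
VAR count, words[BUFSIZE], text::BUFSIZE;

DECL is_odd(1);                 ! forward declaration, one argument

is_even(n) RETURN n = 0 -> -1 : is_odd(n-1);
is_odd(n)  RETURN n = 0 -> 0 : is_even(n-1);

DO VAR p[POINT];                ! a POINT object
    p[PX] := 3;
    p[PY] := 4;
    words[0] := p[PX] + p[PY];
    text::0 := 'A';
    count := is_even(10);       ! true
END
```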

4. Compound Statements

A compound statement (aka statement block) is used to group statements. All statements which are part of a compound statement are executed in sequence. Each compound statement is itself an ordinary statement and therefore, statement blocks may be nested. Compound statements are delimited by the keywords DO and END. There is no terminating semicolon. Compound statements may be empty.

Each compound statement may define local symbols at its beginning immediately after the keyword DO. Only the following objects may be declared in local scopes:

5. Scoping Rules

All objects which have not been declared inside compound statements or formal argument lists are called global objects. Objects which are defined in local scopes are called local objects. Global objects become valid at the point of their declaration and they remain existent up to the end of the program. Local objects are created when the flow of the program passes their declaration. The required storage (if any) is also allocated at this time. The scope of a local object is the compound statement (or procedure) in which it has been declared. When this statement block is left, all of its local symbols are destroyed and the associated storage is released.

No name may be redefined ever -- neither at the same level nor in embedded scopes. This means that a local object may not have the same name as a global one, and a local object in scope B which is embedded in A may not have a name which already has been used in A. Symbol names may be used in different subsequent scopes, though. In this context, procedure arguments are considered local symbols whose scope is the procedure they belong to.
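A short sketch of these rules follows; all names are hypothetical. The two inner blocks may both use the name tmp because they are subsequent scopes, while the commented-out lines show declarations that would be rejected.

```t3x
VAR total;

sum(v, n) DO VAR i;             ! v, n and i are local to sum
    total := 0;
    FOR (i=0, n) total := total + v[i];
    RETURN total;
END

DO VAR k;
    DO VAR tmp; tmp := 1; END
    DO VAR tmp; tmp := 2; END   ! allowed: subsequent scopes
    ! DO VAR k; END             ! not allowed: k exists in the enclosing scope
    ! DO VAR total; END         ! not allowed: total is a global name
    k := 0;
END
```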

6. Statements

Statement
    Description

lvalue := expr;
    Evaluate expr and assign its value to lvalue. Lvalue may be any object which has an address. This includes variables, vector members, byte vector members, struct members, but NOT: constants, structs, vectors, procedures. A vector object in lvalue may be multiply subscripted to assign a value to an embedded vector:
        v[i1][i2]...[iN] := expr;
    Lvalue will be evaluated before expr.

sym(expr1, ..., exprN);
    Call the procedure sym passing the values of the expressions expr1...exprN as actual arguments to it. The number of parameters passed to a procedure must exactly match the number of its formal arguments as specified in a previous definition, declaration, or external declaration. The return value of the procedure will be discarded.

CALL sym(expr1, ..., exprN);
    Call the procedure whose address is stored in the (ordinal) variable sym. Arguments will be passed as described above, but no type checking will be performed. A valid procedure address may be obtained using
        sym := @procedure;
    If sym names a procedure rather than a variable, the CALL prefix will be ignored.

FOR (sym=expr, expr2, cexpr) stmt
FOR (sym=expr, expr2) stmt
    1) Initialize sym with expr.
    2) If
           sym >= expr2 /\ cexpr >= 0
           \/
           sym <= expr2 /\ cexpr < 0
       then leave the loop. Otherwise, execute stmt, add cexpr (which may be negative) to sym, and go back to step 2).
    If cexpr is omitted, it defaults to 1.

HALT;
    Branch to the end of the entire program, thereby terminating it.

IF (expr) stmt
    Execute stmt, if expr evaluates to logical truth (a non-zero value).

IE (expr) stmt-T ELSE stmt-F
    Execute stmt-T, if expr evaluates to logical truth (any non-zero value). Otherwise execute stmt-F.

LEAVE;
    Immediately branch to the end of the innermost WHILE or FOR loop, thereby leaving it. (*)

LOOP;
    Immediately branch to the beginning of the innermost WHILE or FOR loop. In FOR loops, branch to the increment part where cexpr is added to sym and in WHILE loops branch to the point where the exit condition is checked. (*)

RETURN expr;
    Evaluate expr and prepare the result for passing it back to the calling procedure. Then, jump to the end of the procedure where local storage will be released and the procedure returns. (*) RETURN may not be used in the main procedure.

WHILE (expr) stmt
    Execute the loop body stmt as long as expr evaluates to a true (non-zero) value. If expr is false before the first pass, skip the loop entirely.

;
DO END
    An empty statement does nothing. It may be used in places where the language expects a statement, but nothing is to be done.

(*) If a branch command is used inside a scope which defines local symbols, the symbols will be destroyed and the associated storage will be released before the branch takes place.
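The sketch below exercises most of the statements described in this section. The names show and handler are invented for the example. It also assumes that NTOA returns the address of its internal buffer, so that its result can be passed directly to WRITES; the description of NTOA in this manual does not state this explicitly.

```t3x
VAR handler;

! Print a number on a line of its own.
! Assumption: NTOA() returns the address of the string it creates.
show(n) DO
    WRITES(NTOA(n, 0));
    NEWLINE();
END

DO VAR i;
    FOR (i=0, 5) show(i);           ! i = 0, 1, 2, 3, 4
    FOR (i=10, 0, %2) show(i);      ! i = 10, 8, 6, 4, 2

    i := 0;
    WHILE (-1) DO
        i := i+1;
        IF (i MOD 2) LOOP;          ! odd: back to the condition
        IF (i > 6) LEAVE;           ! leave the loop when i reaches 8
        IE (i = 4) WRITES("four"); ELSE WRITES("even");
        NEWLINE();
    END

    handler := @show;               ! indirect call through a variable
    CALL handler(42);
END
```

Note that the step of the second FOR loop is written as %2 rather than -2, because the step must be a constant expression and %2 is a single negative literal.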

7. Operators

The notation .symbol is used in this section to denote the unsigned value of a symbol. For example, if A = -1, then .A = 65535 on a 16-bit machine like the Tcode machine.

Operator        Prec  Assoc
    Description

( expr )        0     -
    Override precedence and associativity rules. A parenthesized expression is treated as a factor. Evaluate to the value of expr.

P (...)         0     -
CALL P()
    Call the procedure P with some optional arguments. The value of this operation is the return value of P. When the procedure call is prefixed with the keyword CALL, the procedure whose address is stored in the variable P is called.
    For details on procedure calls, see the sections about statements and declarations.

A [ B ]         0     left
    Evaluate to the B'th member of the vector A or of the vector that the variable A points to. Since vectors may be nested, multiple subscripts may be applied to a single symbol to provide access to embedded vectors:
        A [B1] [B2] ...

A :: B          0     right
    Evaluate to the B'th byte of the byte vector A or the byte vector pointed to by the variable A. :: associates to the right, because its results are limited to 8-bit patterns:
        A :: B :: C
    equals
        A :: (B::C)
    but no parentheses may be used to override this rule, because :: may be applied only to vectors, but not to other subexpressions.

- A             1     right
    Evaluate to the negative value of A.

~ A             1     right
    Evaluate to the bitwise inverted value of A (bitwise NOT).

\ A             1     right
    Evaluate to the logical complement of A (logical NOT).
        A=0  => -1
        A\=0 => 0
    Each non-zero value is considered true, and only zero denotes false.

@ A             1     right
    Evaluate to the address of A. A may be any lvalue as described in the section covering statements (assignments). This includes variables and members of any kind of vector objects:
        @A[5][7]
    computes the address of the 7th member of the 5th embedded vector in A. @A::4 evaluates to the address of the 4th byte in A.

A * B           2     left
    Evaluate to the product of A and B. If A*B does not fit in a machine word, the result is undefined.

A .* B          2     left
    Evaluate to the (unsigned) product of .A and .B. If A.*B does not fit in a machine word, the result is undefined.

A / B           2     left
    Evaluate to the integer part of the quotient A/B. A/B is undefined, if B=0.

A ./ B          2     left
    Evaluate to the integer part of the unsigned quotient .A/.B. A./B is undefined, if B=0.

A MOD B         2     left
    Evaluate to the division remainder of A./B where `./' denotes an unsigned integer division:
        A MOD B = A - A ./ B * B.
    If B=0, A MOD B is undefined.

A + B           3     left
    Evaluate to the sum of A and B.

A - B           3     left
    Evaluate to the difference A-B.

A & B           4     left
    Evaluate to the result of performing a bitwise AND on A and B.

A | B           4     left
    Evaluate to the result of performing a bitwise OR on A and B.

A ^ B           4     left
    Evaluate to the result of performing a bitwise exclusive OR on A and B.

A << B          4     left
    Evaluate to the result of shifting all bits of the value of A to the left by B positions (bitwise left shift). The sign bit is undefined after this operation.

A >> B          4     left
    Evaluate to the result of shifting all bits of the value of A to the right by B positions (bitwise right shift). The sign bit is undefined after this operation.

A < B           5     left
    Evaluate to true, if A is less than B. (*)

A > B           5     left
    Evaluate to true, if A is greater than B. (*)

A <= B          5     left
    Evaluate to true, if A is less than or equal to B. (*)

A >= B          5     left
    Evaluate to true, if A is greater than or equal to B. (*)

A .< B          5     left
    Evaluate to true, if .A is less than .B. (*)

A .> B          5     left
    Evaluate to true, if .A is greater than .B. (*)

A .<= B         5     left
    Evaluate to true, if .A is less than or equal to .B. (*)

A .>= B         5     left
    Evaluate to true, if .A is greater than or equal to .B. (*)

A = B           6     left
    Evaluate to true, if A is equal to B. (*)

A \= B          6     left
    Evaluate to true, if A is not equal to B. (*)

A /\ B          7     left
    First, evaluate A. If A is true, evaluate B. If A is false, do not evaluate B. The result is the last evaluated subexpression. This is a generalization of the logical AND: A /\ B gives true, if A AND B have a true value and false, otherwise. (**)

A \/ B          7     left
    First, evaluate A. If A is false, evaluate B. If A is true, do not evaluate B. The result is the last evaluated subexpression. This is a generalization of the logical OR: A \/ B gives true, if either A OR B -- or both -- have a true value and false, otherwise. (**)

A -> B : C      8     left
    First, evaluate A. If A is true, evaluate B, else evaluate C. If B is evaluated, do not evaluate C and vice versa. The result is the last evaluated subexpression. This is the expression form of the IE statement:
        X := A -> B: C
    is equal to
        IE (A) X:=B; ELSE X:=C;

(*) All relational operations evaluate to false, if the respective condition does not apply.

(**) Technically speaking, these are `short circuit boolean operators'.
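A few of the operators above in context, as a sketch with made-up variable names. The values in the comments follow from the descriptions in this section and assume a 16-bit machine word where noted.

```t3x
DO VAR a, b, r, v[4];
    a := %1;                    ! -1, so .a = 65535 on a 16-bit machine
    b := 1;

    r := a < b;                 ! true:  -1 is less than 1
    r := a .< b;                ! false: 65535 is not less than 1

    ! /\ evaluates 10/b only if b\=0 is true, so this
    ! never divides by zero; here r becomes 10.
    r := b \= 0 /\ 10/b;

    ! Conditional expression: the absolute value of a (here 1).
    r := a < 0 -> -a : a;

    v[0] := 7;
    r := \v[0];                 ! logical NOT of a non-zero value: 0
    r := ~0;                    ! all bits set: -1
    r := @v[2];                 ! address of the member v[2]
END
```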

8. Expressions

To form valid expressions, the above operators may be used to modify or combine the factors described in this section. The minimum form of an expression is a single factor. The following table summarizes all available types of factors.

Factor Type
    Description

Symbols
    The sort of value a symbol name evaluates to depends on the type of the symbol. Basically, every symbol evaluates to its value. For variables, this is the value stored in it, and for constants, structs and struct members, this is the value which has been assigned to them at declaration time.
    The value of a vector symbol is the address of the associated vector. Therefore, vector symbols actually evaluate to vector addresses.

Numeric Literals
    Numeric literals are written in decimal notation and they evaluate to the values they represent. A leading percent sign may be used to indicate that the literal is negative:
        %123 = -123
    Note: %123 is an atomic factor while -123 is the operator `-' applied to the literal `123'. See the section on constant expressions for more details.

String Literals
    A string literal is a sequence of characters delimited by double quotes ("):
        "Hello, World"
    Each character occupies a full machine word unless the literal is prefixed with the keyword PACKED. In packed strings, each character requires only a single byte. Unpacked strings are terminated with a NUL word, packed strings are padded with NUL characters to the next word boundary.
    Special characters may be included in strings using escape sequences (see the section on escape sequences). String literals evaluate to the addresses of their first characters. They may be considered a special form of the table (see below).

Character Literals
    A character literal is a single character enclosed in single quotes (apostrophes):
        'a', '0', '\s', ''', 'X'.
    It evaluates to the ASCII code of the enclosed character. As in string literals, escape sequences may be used to represent special characters.

Tables
    A table is a static initialized vector denoted by a comma-separated sequence of `table members' delimited by square brackets:
        [ "IF", 0, @if_stmt ]
    Each table member may be any of the following:
      • numeric literals,
      • character literals,
      • string literals,
      • constants,
      • addresses of global objects,
      • tables,
      • embedded expressions.
    Addresses of global objects are included using the address operator: @object. When string literals or embedded tables are included, only a reference to the embedded object is stored inside the table while the object itself is stored somewhere else. Hence, all table members can be addressed using an ordinary subscript operator. Given
        v := [ [2,4,9], [7,5,3], [6,1,8] ];
    for example, v[0][2]=9 applies. To embed a dynamic expression (whose value is not known at compile time), place it in parentheses:
        v := [ "x*x gives", (x*x) ];
    Each time the table is passed, the value of the embedded expression (x*x) will be re-computed.

Procedure Calls and Subscripts
    Since procedure calls and subscripts may be considered either operations or factors, see the section about operators for their explanations.
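The following sketch shows the literal and table factors in use. The variable names are invented, and the final line again assumes that NTOA returns the address of its internal buffer.

```t3x
DO VAR days, s, c, x;
    ! A table of references to string literals.
    days := [ "Mon", "Tue", "Wed" ];
    WRITES(days[1]);            ! writes Tue
    NEWLINE();

    s := "A\sB";                ! unpacked: one machine word per character
    c := s[2];                  ! 'B'

    s := PACKED "abc";          ! packed: one byte per character
    c := s::1;                  ! 'b'

    x := 3;
    s := [ "x*x gives ", (x*x) ];   ! (x*x) is computed when the table is passed
    WRITES(s[0]);
    WRITES(NTOA(s[1], 0));      ! assumption: NTOA() returns its buffer
    NEWLINE();
END
```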

9. Constant Expressions

Only a subset of the available operators is allowed in constant expressions. There are no precedence rules and all operations evaluate from the left to the right. Since all operations can be explicitly specified by ordering, there are no parentheses.

Constant expressions are expected wherever a value must be known at compile time, like the types in procedure declarations, vector sizes, and the values of constants.
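The strict left-to-right rule is easy to overlook, so here is a small sketch. It assumes that + and * belong to the subset of operators permitted in constant expressions and that previously defined constants may appear as operands; the names are hypothetical.

```t3x
CONST WIDTH = 8, HEIGHT = 4;

CONST CELLS  = WIDTH*HEIGHT,        ! 32
      PADDED = 2+WIDTH*HEIGHT;      ! (2+8)*4 = 40, not 2+(8*4) = 34

VAR grid[CELLS];

DO
    grid[0] := PADDED;
END
```

To obtain 34, the operands would have to be reordered as WIDTH*HEIGHT+2.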

All operators have the same meanings as in ordinary runtime expressions. The following operators are allowed:

10. Escape Sequences

In string and character literals, each escape sequence consisting of a backslash (\) and the following character will be replaced with the associated non-printable or special character.

Escape     ASCII  ASCII
Sequence   Code   Name   Description
\a \A      7      BEL    Bell -- ring the terminal bell
\b \B      8      BS     Backspace -- move over previously printed character
\e \E      27     ESC    Escape -- introduce a control sequence
\f \F      12     FF     Form Feed -- eject paper on printer
\n \N      10     LF     Line Feed -- move to next line
\q \Q      34     "      Quote -- used for inclusion in strings
\"         34     "      The same as \Q
\r \R      13     CR     Carriage Return -- move to column 1
\s \S      32     blank  A visual form of the blank character
\t \T      9      HT     Horizontal TAB -- move to next horizontal TAB stop (*)
\v \V      11     VT     Vertical TAB -- move to next vertical TAB stop (**)
\\         92     \      Backslash -- used to escape \ itself

(*) Horizontal TAB stops are mostly located at every 8th position.

(**) VT frequently moves to the same column in the next line.
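A brief sketch of escape sequences inside string and character literals. The strings themselves are arbitrary examples.

```t3x
DO VAR c;
    WRITES("Name:\tT3X");               ! a TAB between the two words
    NEWLINE();
    WRITES("She said \qhello\q.");      ! embedded double quotes
    NEWLINE();
    WRITES("C:\\TMP");                  ! a single backslash
    NEWLINE();
    c := '\s';                          ! the blank character, code 32
END
```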

11. Meta Commands

Meta commands are commands which do not actually belong to the T3X language, but will be evaluated by the compiler. Some of them affect the compiler itself and some affect the generated code. Normally, meta commands are generated by preprocessors or front ends and there should never be any need to include one of these commands manually. (Exception: #DEBUG;.)

All meta commands begin with a # sign and like all other statements, they are terminated with a semicolon. They might occur at any place where a statement or a declaration (either local or global) is expected, but not inside of statements or declarations.

Command
    Description

#DEBUG;
    Turn on the emission of debug information, like source code line numbers and variable names and addresses. When the debug switch is turned on, the T3X translator will generate a LINE instruction at the beginning of each statement and an LSYM instruction for each local and a GSYM instruction for each global variable.
    Debug information is intended to be used by a source level debugger.

#L line "name" ;
    Reset input line number and file name. This command should be generated by preprocessors when changing the order of input lines (for example when inserting code by including a file). #L sets the internal line counter of TXTRN to 'line' (which must be specified as a decimal number) and the input file name to 'name'. When reporting errors, TXTRN will use the provided line number and file name. The command
        #L line "" ;
    should be used to indicate that the following text belongs to the main program and has not been included from some other file.
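For orientation, the sketch below shows what the output of a preprocessor might look like after it has inserted one line from an included file. The file name util.t, the procedure twice, and the line numbers are all hypothetical.

```t3x
#DEBUG;

#L 1 "util.t" ;
twice(x) RETURN x+x;

#L 3 "" ;
DO VAR r;
    r := twice(21);
END
```

With these commands in place, an error in the definition of twice would be reported against line 1 of util.t, and errors in the compound statement against the main program.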

12. T3-Compliant (Built-in) Procedures

These procedures are available in all T3X programs. They do not require an explicit INTERFACE declaration.

Procedure
    Description

ATON(S)
String S
    Compute the value of the decimal number whose ASCII representation is stored in the string S:
      1. Skip leading space characters (\f,\n,\r,\s,\t).
      2. Recognize an optional minus prefix (- or %).
      3. Collect digits.
      4. Return the computed value.
    A zero result may indicate either non-numeric input or a string representing zero.

CLOSE(FD)
Descriptor FD
    Close the file descriptor FD. Return zero upon success and -1 in case of an error (invalid file descriptor).

ERASE(F)
String F
    Erase the file whose path name is specified in the string F. Return 0 upon success, -1 in case of failure.

NEWLINE()
    Write a system-dependent newline sequence to the currently selected output stream. Return null.

NTOA(N,W)
Number N,W
    Create a string representing the value N in an internal buffer. If N is negative, prefix it with a minus sign (-). If W is greater than the number of characters required by the string, pad the string with blanks to the given length. W must be less than 256.

OPEN(F,M)
String F
Number M
    Open the file whose path name is stored in the string F in mode M. M may have one of the following values:
      • 0 - Open read-only, fail if non-existent.
      • 1 - Open write-only, create if non-existent.
      • 2 - Open read/write, fail if non-existent.
      • 3 - Open write-only, seek to EOF, fail if non-existent.
    If OPEN() succeeds, it returns a file descriptor which may be used to access the opened file. In case of failure, it returns -1.

PACK(S,P)
String S
Pstring P
    Pack the string S into P by storing the least significant 8 bits of each machine word of S into a byte of P. P must provide enough space to store the packed string. S and P may denote the same storage location. In this case, S will get overwritten. PACK() returns the number of machine words required to store P (including the terminating NUL).

READS(S,N)
String S
Number N
    Read up to N characters from the currently selected input port into an internal buffer, unpack them into S (see UNPACK), and return the number of characters read. A zero return value indicates that the EOF has been reached, a negative value indicates general failure. N must be <= 1024.

SELECT(P,FD)
Number P
Descriptor FD
    If P=0, select a new input port and if P\=0 select a new output port. The new port will be the file descriptor FD. A valid descriptor may be obtained from OPEN(). There are also three predefined standard descriptors:
      • FD=0 - standard input (the user's terminal or a redirected file)
      • FD=1 - standard output (the user's screen or a redirected file)
      • FD=2 - standard error (also the screen or a redirected file).
    SELECT() always returns the previously selected I/O port.

UNPACK(P,S)
Pstring P
String S
    Unpack the packed string P into S by storing each byte of P in a separate machine word in S. Each word in S will be zero-extended. P and S may not point to the same storage location. S must provide enough space to hold the unpacked string. UNPACK() returns the number of machine words required to store S (including the terminating NUL word).

WRITES(S)
String S
    Pack the string S into an internal buffer (see PACK) and write it to the currently selected output port. Return the number of characters actually written. A return value which is not equal to the length of S indicates failure. The length of S may not exceed 1024 characters.
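The sketch below strings several built-in procedures together: it redirects output to a file, writes a formatted number, and restores the previous output port. The file name out.txt and all variable names are made up. As in earlier examples, it assumes that NTOA returns the address of its internal buffer.

```t3x
DO VAR fd, old, u[16], p::16;
    fd := OPEN("out.txt", 1);           ! write-only, create if missing
    IE (fd < 0) DO
        WRITES("cannot open out.txt");
        NEWLINE();
    END
    ELSE DO
        old := SELECT(1, fd);           ! redirect output, keep old port
        WRITES("value: ");
        ! ATON accepts % as a minus prefix; NTOA pads with blanks
        ! to a width of 6 characters.
        WRITES(NTOA(ATON("%42"), 6));
        NEWLINE();
        SELECT(1, old);                 ! restore the previous port
        CLOSE(fd);
    END

    PACK("abc", p);                     ! one byte per character
    UNPACK(p, u);                       ! back to one word per character
    WRITES(u);
    NEWLINE();
END
```

Saving the result of SELECT and passing it back later is what keeps the redirection local to the ELSE branch.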

13. Extended Runtime Support Procedures

These procedures must be declared explicitly using INTERFACE statements. The slot numbers of the procedures are given in the table below. Notice that the number of arguments in each INTERFACE declaration must exactly match the number of arguments expected by the respective procedure.

The preprocessor TXPP provides some includable files which contain the required declarations. See the documentation of TXPP for details.

Procedure               Slot
    Description

MEMCOMP(R1,R2,L)        16
Vector R1,R2
Number L
    Compare each byte in memory region R1 with the byte at the same position in R2. If the first L bytes of both regions are equal, return zero. If two bytes at equal positions differ, return their difference
        R1::P - R2::P
    where P is the position of the mismatch.

MEMCOPY(D,S,L)          15
Vector D,S
Number L
    Copy L bytes from S to D. Return nothing. The two vectors D and S may overlap.

READPACKED(FD,B,L)      11
Descriptor FD
Vector B
Number L
    Read up to L bytes from the file descriptor FD into the buffer B. Return the number of bytes read. A zero return value indicates that the end of the input file has been reached. A negative value indicates general failure.

RENAME(OLD,NEW)         14
String OLD,NEW
    Rename the file whose path name is stored in the string OLD to NEW. Return zero upon success and -1 in case of failure.

REPOSITION(FD,PH,PL,O)  13
Descriptor FD
Number PH,PL
Number O
    Move the file pointer of the descriptor FD to the position specified by the values PH and PL. PH and PL are both machine words while file offsets are usually at least 32 bits wide. Therefore, the position is computed using the formula
        Offset = PH * 65536 + PL .
    O specifies the origin of the move:
      • O=0 - the beginning of the file
      • O=1 - the current position
      • O=2 - the end of the file.
    REPOSITION returns -1 in case of an error and otherwise zero.

WRITEPACKD(FD,B,L)      12
Descriptor FD
Vector B
Number L
    Write L bytes from the buffer B to the file descriptor FD. Return the number of bytes actually written. Any number which is not equal to L indicates failure.
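As a sketch of how these procedures might be declared and used without the TXPP include files: the INTERFACE statement below gives the first slot explicitly and lets the remaining ones follow by the "last slot plus 1" rule. The file name data.bin and the variable names are hypothetical. The seek target is byte 70000, which splits into PH=1 and PL=4464 because 1 * 65536 + 4464 = 70000.

```t3x
INTERFACE READPACKED(3) = 11,
          WRITEPACKD(3),        ! slot 12
          REPOSITION(4),        ! slot 13
          RENAME(2),            ! slot 14
          MEMCOPY(3),           ! slot 15
          MEMCOMP(3);           ! slot 16

DO VAR fd, n, buf::128;
    fd := OPEN("data.bin", 0);
    IF (fd < 0) HALT;

    ! Seek to offset 70000 from the beginning of the file,
    ! then copy up to 128 bytes to standard output.
    IF (REPOSITION(fd, 1, 4464, 0) = 0) DO
        n := READPACKED(fd, buf, 128);
        IF (n > 0) WRITEPACKD(1, buf, n);
    END
    CLOSE(fd);
END
```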

14. License

This document is part of the T3X compiler package which is subject to the following terms.

T3X -- A Compiler for the Procedural Language T, version 3X
Copyright (C) 1996-2000 Nils M Holm. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.