In the mid-90's, I was looking for a language providing the following properties (again):
One might think that there must have been quite a few languages providing these features, but obviously my search did not lead to any satisfactory result, or I would not have invented T. Point (3) turned out to be particularly hard to match. The language which came closest to my requirements was BCPL. The typeless approach, which has been implemented very consistently in this language, leads to clear, simple, and flexible semantics. The language is portable, and its implementation is small and can easily be done in BCPL itself. The compiler provided by Martin Richards, the inventor of BCPL, generates code which is aimed at interpretation (for the purpose of porting the compiler), but which may be translated into native code as well.
Unfortunately, the syntax of BCPL reflects its otherwise overwhelming elegance only to a limited degree. In my opinion, the precedences of some operators have been chosen in a not very intuitive way, and the syntax of the language is hard to parse with a pure recursive descent parser -- there are precedence rules for statements, for example, which make parsing and understanding BCPL programs unnecessarily hard. An RD parser had always been a must for the language of my choice, because such parsers are very easy to implement. [I have to note at this point that Richards' BCPL compiler is small, elegant and easy to understand, although it uses syntax trees and a bottom-up parsing technique.]
Even if BCPL did not match my ideas exactly, it came pretty close and studying the compiler sources has influenced the design of T a lot. Without BCPL, T would not be the language it is today.
The most important thing when designing a programming language is -- in my opinion -- to define its main purpose. The design goal of T was to create a portable, simple, and easy to understand notation for the description of algorithms. T was never aimed at industrial software development. Its purpose is to support the programmer in the process of reasoning about problems. It should be a productivity tool in the sense that it provides a playground for new ideas and allows the creator of these ideas to share them with others using a formal notation. Such a notation, of course, has to be clear, simple, and easy to learn, and it would be a great advantage if a compiler for this notation were available in many different environments.
Naturally, my interpretation of `productivity' is not exactly the same as in the rather profit-oriented `real world' and the design of the language T reflects this intention very well. T is not suitable for writing large scale `application programs', nor for `rapid application development'. It is more a notation than a programming language. Because it is simple and straight-forward, it does not force its user to pay too much attention to the language itself. Instead, it provides some very basic `building-stones' which may be used to build a formal solution for a given problem.
There are many popular programming languages which provide functions to create `popup menus', `radio buttons', database queries, `event handling' and tons of features which are very helpful when creating graphical user interfaces and such things. But all these features do not really help the programmer solve a problem -- unless the problem is to create an application program. When I am talking about a `problem', however, I usually mean the search for an algorithm. Frequently, people say things like
`A' cannot be done in language `B'.
Normally, such statements are the result of too little reasoning. Basically, any algorithm can be implemented in any language. The only difference is in the amount of work one has to perform to solve the same problem in different languages. So the correct form of the above statement would be
Problem `A' is inconvenient to solve in language `B'.
T provides only a very basic set of building-stones, but it turns out that this set is suitable to solve a variety of different problems in a convenient way -- including the creation of a compiler for the language itself.
Since the original design, many different versions of the language T have been created. These versions are currently at different stages in their development, the most actively developed one being T3X. The following figure provides an overview of the currently existing versions.
    T 1 (just an idea...)
     |
     |
    T 2 (C version, 386, 8086)
     |
     |
    T 3 ----+---- T 3r0c
     |      |     (386)
     |      |
     |      +---- T 3r0t
     |            (386, 8086, symbolic)
     |             |
     |            T 3r1t
     |            (386, 8086, symbolic)
     |             |
     |            T 3r2t ----------------+---- T3Mr0
     |            (386, 8086, symbolic)  |     (6502)
     |             |                     |
     |            T 3r3                  |
     |            (386, 8088, symbolic)  +---- T3Xr{0-3}
     |                                         (Tcode, 386, 8086,
     |                                          synthesizing TXCG,
     |                                          VM in C)
    T 4 ---- T4r0                               |
             (386, 8086,                       T3Xr4
              synthesizing CG,                 (Tcode, 386, 8086,
              Modules, separate                 TXCG, TXOPT, include
              compilation)                      files via TXPP)
                                                |
                                               T3Xr5
                                               (Tcode2, 386, 8086,
                                                TXCG, TXOPT, TXPP,
                                                debugger support)
The original T (version 1) has never been fully implemented. It was merely an idea -- something I was playing with.
The first real implementation was T2. It was written in C using very conventional RD parsing techniques. There were assembly code generators for 8086 and 386-based machines. These generators are still available in all T3 and T4 packages, but not in T3M and T3X.
T3 is the root of a large subfamily of T versions. T3r0t (T version 3, release 0, T implementation) was implemented first. The techniques used in the T3 implementation were quite different from the techniques used in the T2 translator. Since the T3 code generators were rather dumb at that time, efficiency became very important, because the compiler was used to compile itself. (At least, I wanted a reasonably fast compiler.) The C version was written at a later time to allow the installation of T3 on machines providing a C compiler. Basically, it was a modified T2 compiler. The C version is included in all T3r? packages.
The language definition of T3 differed from T2 only by the addition of some built-in runtime procedures, namely OPEN(), CLOSE(), and ERASE().
Subsequent releases of T3 were mainly bug-fix releases. To date, T3 is one of the most stable and usable branches of the language. It is still actively developed, although it has been mostly superseded by T3X. T3r3, for example, is an attempt to make T3 more T3X-compatible in some respects. [Note: As of 1998, the development of T3 has been discontinued. T3r3 is the final release. From this point, all further improvements will be integrated into T3X.]
The next step in the development of T was T4. T4 features separate compilation, modules, and synthesizing code generators for 8086- and 386-based platforms. There were many other additions to the language itself which made T4 more convenient to use than T3. However, T4 remains an experiment. There has never been a real release of T4. Most enhancements to the language itself have been integrated into T3X in the course of time.
T3M was an experimental port of T3 to 6502-based systems. It has never been finished.
The flagship of the T3 family -- and in fact, of the entire T family -- is T3X (T3 eXtended). Most important features of all other branches have been integrated into T3X and a totally different approach to portability has been chosen at this time. Instead of defining a procedure call interface to the code generators, Tcode -- a simple intermediate language -- has been created to pass information from the compiler to the code generator. This way, the translator and the code generator have been conceptually separated. Another goal of this approach was the creation of a language which is suitable for the efficient interpretation by a virtual machine. In fact, the first `back end' for the T3X compiler was a Tcode interpreter.
Of course, the use of a simple virtual machine for porting a compiler to new platforms is not new. A very good text covering this topic is still ["BCPL - the language and its compiler", Richards & Whitby-Strevens, Cambridge U Press, 1980].
T3X is a typeless, block-structured, procedural programming language. Programs, procedures, statements, and expressions form a hierarchy: Programs consist of procedures (and statements), procedures usually contain statements, and statements frequently contain expressions. Variables may be atoms (ordinal) or vectors (one-dimensional arrays). Since there are no different types, composed data types -- called structures -- are almost identical to vectors. Constants may be used to represent frequently used or tunable values.
The T3X compiler expects its input in the form of an ASCII file. The following characters will be treated as white space:
White space characters delimit tokens, but will be otherwise ignored by the compiler.
Valid input characters are the upper and lower case alphabetic characters A-Z, a-z, the decimal digits 0-9, and the special characters
! & ( ) * + , - / : ; < = > @ [ \ ] ^ _ | ~
Characters which are not contained in this alphabet may occur only in string literals, character literals, and comments. Otherwise they will cause an error.
A comment may be introduced at almost any point in a T3X program using an exclamation point (!). It extends up to but not including the end of the line. Therefore, a comment is treated the same way as a single white space character, and consequently,
wh! this is a comment
ile(1) ;
is equal to
wh ile(1) ;
and not to
while(1) ;
Consequently, comments may not occur inside of a single token, but only between two tokens. This is particularly valid for string literals and character literals which are single tokens as well. A !-character inside one of these literals is treated as an ordinary character.
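The rule that a comment acts like a single white space character, except inside a string literal, can be sketched as follows. The sketch is written in Python rather than T3X so that it can be run directly. The name strip_comment is hypothetical, character literals are not handled, and the final split on white space is only a stand-in for real tokenization.

```python
def strip_comment(line):
    """Return the part of one source line that precedes a ! comment."""
    out = []
    in_string = False
    for ch in line:
        if ch == '"':
            in_string = not in_string   # assumption: every " opens or closes a string
        elif ch == "!" and not in_string:
            break                       # the comment runs up to the end of the line
        out.append(ch)
    return "".join(out)

source = ["wh! this is a comment", "ile(1) ;"]
# The comment and the line break together separate the two halves,
# so "wh" and "ile(1)" can never be joined into one token.
tokens = " ".join(strip_comment(line) for line in source).split()
print(tokens)
```

A ! inside the double quotes survives untouched, which matches the statement that it is an ordinary character there.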
Symbolic names may contain alphabetic characters, the underscore character (_), and decimal digits, where the first character must be alphabetic or an underscore. Upper case and lower case characters are treated as equal. Therefore, the names
abc abC aBc aBC Abc AbC ABc ABC
would all refer to the same symbol in a T3X program. The T compiler always uses all characters contained in two symbols to distinguish them, so
very_very_very_long_symbol_number_one
and
very_very_very_long_symbol_number_two
are guaranteed to be different. The maximum length of symbol names may be limited by other factors like the maximum token length, however.
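The naming rules above can be modeled in a few lines. This is a Python sketch for illustration, not part of any T3X compiler. The helper names is_valid_name and same_symbol are made up, and no maximum length is enforced.

```python
import re

# First character: a letter or an underscore; after that letters,
# digits, and underscores in any order.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def is_valid_name(name):
    return NAME.match(name) is not None

def same_symbol(a, b):
    # Case is ignored, but every character takes part in the comparison.
    return a.upper() == b.upper()

print(same_symbol("abc", "ABC"))
print(same_symbol("very_very_very_long_symbol_number_one",
                  "very_very_very_long_symbol_number_two"))
```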
In fact, T3X is not a totally typeless language. There are variables and vectors and under some aspects these two types are different. On the other hand, the only actual data type in T3X is the machine word and a vector is nothing but a sequence of machine words. In BCPL, the typeless concept has been taken much further: vectors are always anonymous objects and a separate pointer is required to reference them. This is different in T3X: the compiler memorizes the type of the object and generates the appropriate instructions when accessing the object. To the programmer, however, the difference is minimal. The BCPL statement
LET v = VEC 25
creates an anonymous vector with the length of 26 cells and initializes the variable v with the address of the vector. The T statement
VAR v[25];
on the other hand, creates a vector with the length of 25 cells and associates its address with the symbol v. The difference is as follows: in T3X, v is a compile time variable and therefore, it may not be changed at run time (this is possible in BCPL, however, since v is an ordinary variable, there).
Since the semantics of (atomic) variables and vectors are different, one might say that there are different types in T3X, but since the restrictions which arise from this fact are limited to the lack of assignments to vectors (but not members of vectors!), I think that the term `typeless' may as well be applied to T3X.
Each (atomic) T3X variable allocates exactly one machine word. When talking about variables in the remainder of this document, the attribute atomic is always implied. Vectors will be implicitly referred to as vectors or maybe as arrays.
Variables are defined using a VAR statement. Any number of names may be defined in a single statement:
VAR x_coord, y_coord, depth;
However, it is recommended to define only logically connected variables in a single statement.
All types of values may be stored in a variable: numeric values, pointers to strings, pointers to vectors, pointers to structures, or single characters. The range of numeric values which may be stored in a variable actually depends upon the implementation, but the Tcode engine uses only 16 bits to represent a cell -- independently of the underlying platform. Therefore, programs which use values not in the range -32768...32767 may be considered machine-dependent.
When a variable is placed in an expression (frequently also called a righthand side value), it evaluates to its value. When it is placed on the lefthand side of an assignment, however, it evaluates to its address (which will be dereferenced immediately by the assignment operator, of course).
Constants are variables which exist only at compile time. Instead of an automatically assigned address, they are initialized with an explicit value when they are declared. Since they are compile time variables, the values of constants may not change at run time. Any number of constants may be declared in a single CONST statement:
CONST read=1, write=2, rdwr=read|write;
Each constant name must be followed by an equal sign (=) and a constant expression which evaluates to the value of the constant. Constant expressions will be explained in a later section.
Constants may occur only in righthand side expressions, where they evaluate to their values.
Vectors, like constants, are compile time variables. When they are declared, they will be initialized with the address of an array of subsequent machine words, the so-called vector members or vector elements. The address of a vector is equal to the address of its first member. Any number of vectors may be defined in a single VAR statement. Declarations of vectors and atomic variables may be mixed in one and the same statement:
VAR RingBuffer[1000], Head, Tail;
Vector declarations differ from atomic variable declarations by the trailing square brackets containing a constant expression which specifies the size of the vector in machine words. The first member of a vector has the index value 0 and the last one has the index vectorsize-1 (999 in the above example).
Since vector addresses are stored in compile time variables, they may not change at run time. Naturally, it is legal to change the values of vector members, though. When occurring in righthand side expressions, vector names evaluate to the addresses of their associated arrays.
Single members of a vector may be addressed using the subscript operator []. The expression
v[5]
for example, evaluates to the member of the vector v with the index 5 (the sixth member, since indexes start at 0). Subscripted vectors may occur on the lefthand sides of assignments as well. The assignment
v[i] := 99;
would change the i'th member of v to 99. Like atomic variables, the members of vectors may be used to store any data type, even pointers to vectors. See the description of the []-operator for details about nested vectors.
A special case of the vector is the byte vector. Like `ordinary' vectors, they are declared in VAR statements:
VAR Input::256, Output::256;
The only difference between a vector and a byte vector is the computation of the required size. The size value after the ::-operator specifies the number of characters required. The amount of memory actually allocated depends on the size of a machine word on the target machine, BPW. The Tcode interpreter assumes BPW=2. The size of a byte vector is computed using this formula:
    vectorsize + BPW - 1
    --------------------
            BPW
which allocates enough space for at least vectorsize characters. No further type information is stored in vector entries. Therefore, it is valid to access byte vectors using [] and word vectors using ::. However, this is discouraged, because the actual vector sizes might depend on a specific implementation.
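As a quick check of the formula, here is a small Python sketch (not T3X). It assumes that the division is an integer division which discards the remainder, and the function name byte_vector_words is made up for this example. The default BPW=2 is the value the text gives for the Tcode interpreter.

```python
def byte_vector_words(vectorsize, bpw=2):
    """Machine words allocated for a byte vector of `vectorsize` characters."""
    return (vectorsize + bpw - 1) // bpw   # integer division rounds down

print(byte_vector_words(256))      # VAR Input::256; with BPW=2
print(byte_vector_words(5))        # an odd size is rounded up to whole words
print(byte_vector_words(5, 4))     # the same request on a machine with BPW=4
```

Adding BPW-1 before dividing is what rounds the result up, so the allocated words always hold at least vectorsize characters.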
A structure is a composed data object. Only one structure may be defined in a single STRUCT statement:
STRUCT point = pt_x, pt_y;
Note that this statement does not actually create a new data object, but only the `layout' of a point object. To create a point object, an additional VAR statement is required.
VAR point_a[point], point_b[point];
creates two point objects, point_a and point_b. The members of such an object can be addressed using the subscript operator: point_a[pt_x] and point_a[pt_y].
Structures do not really have a type of their own. As the declaration and member access syntax already suggests, they are ordinary arrays and the member names are constants. In fact, the statement
STRUCT s = a, b, c;
is perfectly equal to
CONST s=3, a=0, b=1, c=2;
The STRUCT statement only defines symbolic names for accessing members with a fixed position and meaning and another constant which names the entire structure and holds the number of its members.
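The equivalence between STRUCT and CONST can be mimicked in Python (used here only because it is easy to run). The helper struct and the dictionary of offsets are inventions of this sketch, since T3X itself simply defines constants.

```python
def struct(*members):
    """Return (size, offsets) for a structure with the given member names."""
    offsets = {name: index for index, name in enumerate(members)}
    return len(members), offsets

# STRUCT point = pt_x, pt_y;   behaves like   CONST point=2, pt_x=0, pt_y=1;
point, off = struct("pt_x", "pt_y")

point_a = [0] * point            # VAR point_a[point];
point_a[off["pt_x"]] = 10        # point_a[pt_x] := 10;
point_a[off["pt_y"]] = 20        # point_a[pt_y] := 20;
print(point, point_a)
```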
This section describes the most basic elements of each T program, the factors which may be used in expressions. This chapter is written in bottom-up order, so that factors already have been explained when expressions will be discussed, and expressions already have been explained in the section about statements.
There are many different kinds of factors: symbols, numeric literals, character constants, string literals, tables, and procedure calls. A factor may occur only in expressions and a single factor is the minimum form of an expression. Factors may be prefixed by unary operators or they may be combined using binary or ternary operators. Basically, all sorts of factors are interchangeable: where one of them may occur, all others are allowed, too. The only exception is the symbol which has some additional properties which make it special. For example, symbols may be subscripted and it is possible to compute their addresses. These operations are limited to symbols. All other operations may be applied to any kind of factor, even if it makes little sense, like the multiplication of two strings (which will lead to highly machine- and platform-dependent results):
"Hello" * "World"
The evaluation of a symbol depends on its type. Variables and constants evaluate to their values, vectors evaluate to their addresses. Structure names and structure members behave like constants.
Numeric literals are written in decimal notation and represent their own values. A percent sign may be used to negate a number:
%123 = -123
The difference between %123 and -123 is that %123 is a factor while -123 is an expression (`minus' applied to a numeric factor). In fact, the percent sign has little meaning in T3X, since the compiler evaluates ordinary minus prefixes in constant expression contexts, too. In T3, constant expressions were limited to single factors and therefore, the percent sign was required to define negative constant values. The %-prefix has been kept for compatibility reasons, though. An optimizing compiler might turn -n into %n, if n is a constant numeric factor.
Character constants are single characters or escape sequences enclosed by single quote characters like
'a' '0' '\s' ''' '\\'
A character constant evaluates to the ASCII code of the enclosed character. An escape sequence may be used to include certain unprintable or special characters. The backslash character is used to introduce the sequence. The \ and the following character will be removed and replaced with the associated character. Note that no escape sequence is required to represent an apostrophe: '''. Besides most C-style sequences, the following translations will be performed: \e->ESC, \q->", and \s->blank. The latter has been included for better readability. Just compare "   " with "\s\s\s". Unlike C, T3X accepts uppercase sequences as well: \e and \E both evaluate to ESC. As in C, the escape character may be used to escape itself. Thereby, it loses its special meaning and \\ evaluates to a single backslash.
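The translations named above can be sketched as a small decoder. This is a Python illustration, not the compiler's actual code. The table ESCAPES contains only the sequences mentioned in the text plus \n as one representative C-style sequence, and passing an unknown escaped character through unchanged is an assumption of the sketch.

```python
# Escape character -> replacement. Only a subset is modeled here.
ESCAPES = {"e": "\x1b", "q": '"', "s": " ", "n": "\n", "\\": "\\"}

def decode(text):
    """Replace each backslash sequence in `text` with its character."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            key = text[i + 1].lower()            # \E and \e mean the same
            out.append(ESCAPES.get(key, text[i + 1]))
            i += 2                               # skip \ and the next character
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(repr(decode(r"\s\s\s")))
print(repr(decode(r"Hello, World!\N")))
```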
String literals are sequences of characters delimited by double quotes ("):
"Hello, World!\N"
Each character either represents itself or is part of an escape sequence as described above. A full machine word will be allocated per character, but currently, only the least significant eight bits will be filled with the ASCII code of the character. In later extensions, the entire space may be used to represent Unicode characters, for example. Each string literal is terminated with a NUL character, so n+1 machine words are required to store a string of the length n.
Since a string is an array of subsequent machine words, the []-operator may be used to access its single characters.
When a string is prefixed with the keyword PACKED, a byte instead of a full word is used to store each of its characters, and enough NUL characters (but at least one) are appended to pad the literal to the next word boundary. This way, the proper alignment of subsequent data objects is guaranteed. The byte operator :: may be used to access single characters of packed strings.
At runtime, either form of the string literal evaluates to the address of its first character.
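The two storage layouts can be compared with a short Python sketch (not T3X). The function names string_words and packed_bytes are hypothetical, and BPW=2 is assumed as the default because that is the word size of the Tcode interpreter.

```python
def string_words(s):
    """Unpacked literal: one machine word per character plus a NUL."""
    return [ord(c) for c in s] + [0]

def packed_bytes(s, bpw=2):
    """Packed literal: one byte per character, NUL-padded to a word boundary."""
    data = [ord(c) for c in s] + [0]     # at least one NUL is always appended
    while len(data) % bpw != 0:
        data.append(0)                   # pad up to the next word boundary
    return data

print(string_words("T3X"))               # n+1 words for a string of length n
print(packed_bytes("T3X"))               # 3 characters + 1 NUL = 2 words
print(packed_bytes("T3XY"))              # 4 characters + 2 NULs = 3 words
```

A packed string whose length is already a multiple of BPW still receives a NUL, so it grows by a full word.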
A more general form of a literal vector is the table. A table is a static initialized vector and a generalization of BCPL-style tables. Syntactically, it is a list of table elements delimited by square brackets:
[ 7, "MOD", @modulo ]
Each table member occupies exactly one machine word. A string, for example, is represented by a pointer, while the string literal itself is placed outside of the table. Therefore, table members can be accessed using the subscript operator []:
[ 77,88,99 ] [2]
evaluates to 99. The square brackets have been chosen for delimiting tables because of the connection between vectors and subscript operators.
The class of each table member may be any out of the following list:
Constant expressions include everything which has a value that may be computed at compile time (like numeric literals). The inclusion of strings has been explained above. Addresses of global variables and procedures are represented by a symbol name prefixed with the address operator @.
What makes tables particularly flexible is the possibility to nest them:
[ [ 2, 9, 4 ], [ 7, 5, 3 ], [ 6, 1, 8 ] ]
Like strings, embedded tables are stored outside of the surrounding table and included as pointers. Therefore, if the above table is assigned to the symbol v, the following conditions hold:
v[0] = [ 2, 9, 4 ]
v[1] = [ 7, 5, 3 ]
v[2] = [ 6, 1, 8 ]
Since applying a subscript operator to a table containing tables results in a vector again, the subscript operator may be applied one more time, and consequently,
v[1][1]
would result in 5:
v       = [ [2,9,4], [7,5,3], [6,1,8] ]
v[1]    = [7,5,3]
v[1][1] = 5
(Remember that the first element of a vector has the index 0.)
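Python lists behave in a closely analogous way, because an outer list also holds references to its inner lists rather than copies of them. The following sketch (Python, not T3X) repeats the example with the middle row given a name of its own, purely for illustration.

```python
middle = [7, 5, 3]                 # stored outside of the surrounding table
v = [[2, 9, 4], middle, [6, 1, 8]]

row = v[1]                         # the first subscript yields a vector again
print(row)
print(v[1][1])                     # the second subscript selects a member of it
print(v[1] is middle)              # the outer table only holds a reference
```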
A table which contains at least one non-constant expression is called a dynamic table. Non-constant expressions must be put in parentheses when they are to be included in a table:
v := [ "a * b = ", (a*b) ];
Embedded (non-constant) expressions are computed freshly each time the flow of the program passes the table they are contained in. Therefore, the values of table members computed by embedded expressions may be different each time the table is evaluated. This is why such a table is called 'dynamic'. The parentheses indicate to the compiler that an expression is non-constant and make it generate additional code to fill in the value of the expression whenever the table is encountered. Therefore, static (constant) expressions should never be parenthesized in tables, because inefficient code would be the result.
v := [ "5 * 7 = ", (5*7) ];
works, but computes 5*7 each time the table is evaluated. (Note: Even if an optimizing compiler would fold 5*7 to 35, the value would have to be stored in the table each time it is passed.)
On the other hand, including dynamic expressions in a table without any parentheses will lead to an error:
v := [ "a * b = ", a*b ];
will not work unless both a and b are constants.
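The behaviour of a dynamic table can be approximated in Python (this is a model, not T3X). It assumes, as the wording above suggests but does not state outright, that the storage of the table exists once and that the code generated for the parenthesized expression refills one slot whenever control passes the table. The names table and pass_table are made up.

```python
table = ["a * b = ", None]      # storage for [ "a * b = ", (a*b) ]

def pass_table(a, b):
    """Model the flow of the program passing the table once."""
    table[1] = a * b            # the embedded expression is computed freshly
    return table                # the table evaluates to its address

first = pass_table(2, 3)
print(first[1])
second = pass_table(4, 5)
print(second[1])
print(first is second)          # under the stated assumption: same storage
```

If that assumption holds, a program which needs to keep an earlier value has to copy it out of the table before the table is passed again.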
Like strings, tables may be prefixed with the keyword PACKED. Packed tables may contain only byte-values. Therefore, their members are limited to constant expressions with bit patterns where only the least significant 8 bits may contain values other than 0. In numbers, this is the range from -128 to 255.
Strings may be considered a special form of a table. Consequently, each string may be written as a table as well. For example,
"T3X"
is equal to
[ 'T', '3', 'X', 0 ]
(Note the trailing zero in the vector literal.) A similar relation exists between packed strings and packed tables. Like packed strings, packed tables will be padded to the next word boundary with zeroes.
The maximum number of members per table may be limited, but at least 128 elements per table are always allowed. The elements contained in nested tables do not count, but the entire embedded table counts as a single member. The same limit may exist for packed tables and string literals.
Procedure calls are represented by the procedure name followed by a parentheses-enclosed list of zero or more comma-separated arguments:
writes("Hello, World!")
Each argument may be any valid expression. When a procedure expects zero arguments, the parentheses still must be supplied: P(). A procedure call evaluates to the return value of the called procedure.
In T3X, only procedures may be called. Calls to absolute addresses and computed calls -- like in BCPL -- are not allowed. There is a mechanism to perform indirect calls, though. The CALL operator allows calling a procedure whose address is stored in a variable:
p := @printpacked;
CALL p(packed "Hello, World!\n");
For several reasons, the CALL operator should be treated with special care: The argument count will not be checked. Due to the T calling conventions, a wrong number of arguments will in most cases lead to an undefined result. The result is also undefined, if the variable in the CALL statement does not point to a valid procedure. The only way to obtain a valid procedure pointer is the application of the address operator @ to a procedure symbol.
When the symbol following the keyword CALL references a procedure rather than a variable, the CALL operator will be ignored completely.
In expressions, operators may be used to modify or combine factors in various ways. Most operators may be applied to any kind of operand, even if the resulting operation does not evaluate to any meaningful value.
There are different kinds of operators and like procedures, they are classified by the number of their arguments (or operands). There are unary (prefix and postfix) operators, binary (infix) operators, and there is one ternary operator and one variadic operator.
Operators also may be classified by their precedences. The higher the precedence of an operator is, the stronger it binds its operands. For example, the term operators (multiplication, division, modulo) bind stronger than the sum operators (addition, subtraction). Therefore,
a * b + c * d
is equal to
(a*b) + (c*d)
Like in math expressions, parentheses may be used to override these default bindings. The precedence rules are simple in T3X:
The precedence rules in 3. are similar to the rules used in the evaluation of algebraic math expressions.
Another property of an operator is its associativity. An operator associates to the left when a sequence of identical operations is evaluated from the left to the right:
    Associativity   Expression   Meaning
    -------------   ----------   -----------
    left            a - b - c    (a - b) - c
    right           a - b - c    a - (b - c)
In T3X, all binary operations (with the sole exception of ::) are left-associative. The byte operator associates to the right. Unary operators are always right-associative.
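The difference between the two groupings matters as soon as the operation is not associative in the mathematical sense. A short Python sketch makes this visible for subtraction. The helper names left_assoc and right_assoc are hypothetical.

```python
from functools import reduce

def left_assoc(values):
    """Evaluate a - b - c ... as ((a - b) - c) ..."""
    return reduce(lambda acc, x: acc - x, values)

def right_assoc(values):
    """Evaluate a - b - c ... as a - (b - (c ...))"""
    return reduce(lambda acc, x: x - acc, reversed(values))

print(left_assoc([10, 4, 3]))    # (10 - 4) - 3
print(right_assoc([10, 4, 3]))   # 10 - (4 - 3)
```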
In the remainder of this section, all available operators will be explained. They appear in order of descending precedence.
The operators (), [], and :: are the only postfix operators. They always bind to primary factors in the form of symbols and this dependency cannot be overridden using parentheses. Subscripts and call operators may be considered part of a factor rather than an operator applied to a factor.
The ()-operator is the only variadic operator. Given the procedure call
P(a1, ..., aN)
its arity is N+1 (P plus N arguments). The meaning of the operator is the application of the procedure P to the (optional) arguments a1...aN. If P does not have any formal arguments, the syntax is
P()
The value of the operation depends on the semantics of P.
() may be applied only to symbols of the type `procedure'. The procedure must have been declared before its first application using either a procedure definition, declaration, or external declaration. The number of arguments to a procedure call will be checked against the arity of the called procedure. If the numbers do not match, an error will be signalled.
Each argument to a procedure call may be any valid expression itself which includes, of course, procedure calls. Given the binary function P2, the following expression is perfectly valid:
P2( P2(1, 2), P2( 3, P2(4, 5) ) )
An indirect procedure call may be performed using the CALL operator which is in fact an extension to the () operator. The expression
CALL PP(a1, ..., aN)
like the previous example, evaluates to the result of the application of PP to a1...aN, but in this case, PP is a procedure pointer instead of an actual procedure. A procedure pointer is an ordinary variable which has been assigned the address of a procedure using the address operator:
PP := @P;
In indirect procedure calls, the argument count check described above will not be performed. If PP is the name of a procedure instead of a variable, the keyword CALL will be ignored.
In direct and indirect calls, the calling scheme is call by value. This means that the arguments to the call will be evaluated before the call actually takes place and the value of each parametric expression will be transported to the procedure.
Note: Since vectors evaluate to their addresses, passing a vector by value will actually pass a reference to the array associated with the vector symbol. Therefore, vectors are always passed by reference: Instead of passing the entire vector, only the address of its first member is transported to the called procedure. There, the address will be stored in an (atomic) parameter variable. Since parameters are always atomic and therefore evaluate to their values and vectors evaluate to their addresses, both the actual vector and the parameter will reference the same memory location and the parameter may be used as a vector.
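The effect can be reproduced with a tiny word-addressed memory model in Python (a sketch, not T3X). The names memory, V, bump and clear are invented for this example, and the vector is represented by nothing more than the address of its first member.

```python
memory = [0] * 8            # a tiny word-addressed memory
V = 2                       # address of a three-word vector v
memory[V:V + 3] = [7, 8, 9]

def bump(x):
    x = x + 1               # changes only the procedure's own copy
    return x

def clear(vec, n):
    """vec is an atomic parameter which holds the address of a vector."""
    for i in range(n):
        memory[vec + i] = 0 # writes through to the caller's vector

count = 5
bump(count)                 # the value 5 is passed, count itself is untouched
clear(V, 3)                 # the address 2 is passed by value
print(count, memory[V:V + 3])
```

Both calls pass a plain value. The vector merely appears to be passed by reference because the value in question is its address.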
The subscript operator [] may be applied to vectors as well as to atomic variables:
symbol [subscript]
where subscript may be any valid expression. If symbol is a vector, the subscript operation
a[b]
evaluates to the b'th member of a. If a is an atomic variable, the operation evaluates to the b'th member of the vector that a points to. This means that both subscripts in the following example would evaluate to the same value:
var v[100], pv;
var a1, a2;
pv := v;
a1 := v[25];
a2 := pv[25];
The reason is simple. Since vectors evaluate to their addresses, the assignment
pv := v;
stores the address of v in pv. Atomic variables, on the other hand, evaluate to their values and therefore, pv evaluates to the address of v which has been previously stored in it. Consequently, v and pv both evaluate to the address of v in the above example. Hence, a variable which holds the address of a vector may be used in place of that vector. For this reason, the subscript operator may be applied to atomic variables as well.
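The argument can be traced in a Python model of a word-addressed memory (an illustration only). The address 40 chosen for v and the helper subscript are assumptions of this sketch and do not reflect how any T3X compiler lays out its data.

```python
memory = list(range(100, 300))   # the word at address i holds 100 + i
V = 40                           # address the compiler associated with v

def subscript(address, index):
    """Model of address[index]: fetch the word at address + index."""
    return memory[address + index]

pv = V                           # pv := v;   stores the address of v
a1 = subscript(V, 25)            # a1 := v[25];
a2 = subscript(pv, 25)           # a2 := pv[25];
print(a1, a2)
```

In both cases the same address reaches the subscript operation. It comes from the compiler in the first case and from the contents of pv in the second.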
Since there is no nesting limit for vectors, any number of subscript operators may follow a single symbol. Assuming that v5 holds a vector containing five levels of nested vectors, the expression
v5[i1][i2][i3][i4][i5]
may be used to access single elements at the deepest nesting level. Such chains of subscripts evaluate from the left to the right.
The byte subscript operator :: differs from the ordinary (word) subscript operator in several ways. Firstly, it addresses bytes in (byte) vectors and secondly, it associates to the right. The expression
a::b
evaluates to the b'th byte contained in the vector a. Therefore, :: is mostly used to access characters in packed strings. Since the results of ::-operations are always limited to byte-width, they cannot be assumed to always return valid addresses. For this reason, byte subscripts are right-associative (the result may be a valid subscript). If the expression
a :: b :: c
would evaluate from the left to the right
a::b :: c
the result of a::b would probably not be a valid address, since it is limited to eight bits. In this case, however, the following subscript would reference the position c of a non-vector -- which is certainly not the desired result. If the expression evaluates from the right to the left, on the other hand, in
a :: b::c
the subexpression b::c is evaluated first and will probably return a valid subscript. This subscript is then applied to the (also valid) vector a.
Finally, the :: operator differs from [] in that there is no righthand delimiter for the subscript expression. Therefore, the righthand side of :: is always a single factor and expressions like
a::b+c
actually evaluate to
(a::b)+c
since :: has the highest precedence. To address the b+c'th byte in the array a, the subscript must be parenthesized:
a::(b+c)
Note that in either form of the subscript operator, it is impossible to change the associativity using parentheses.
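The right-associative grouping of byte subscripts can be sketched in C as follows. This is a model under assumptions, not the T3X implementation: the type byte and the names byte_chain and byte_chain_demo are invented for the illustration.

```c
#include <assert.h>

typedef unsigned char byte;

/* a :: b :: c  groups as  a :: (b :: c) */
byte byte_chain(const byte *a, const byte *b, int c)
{
    byte inner = b[c];      /* b::c is evaluated first and yields 0..255 */
    return a[inner];        /* that byte is then used as a subscript into a */
}

void byte_chain_demo(void)
{
    byte a[256] = { 0 };
    byte b[4] = { 7, 200, 3, 0 };

    a[200] = 'x';
    assert(byte_chain(a, b, 1) == 'x');   /* b::1 = 200, a::200 = 'x' */
}
```

The opposite grouping would have to use the byte a::b as the address of a vector, which is the case the text rules out.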
All unary operators have a high precedence and bind to single factors. Unless explicitly specified using parentheses, they never affect subexpressions containing other operators except for postfix operators which have an even higher precedence. The suffix operators must bind more strongly than the prefix operators, because this order leads to much more sensible semantics. For example
-P(a,b)
means `negate the result of applying P to a and b' and
~v[j]
means `evaluate to the inverse value of the j'th member of v'. If the order of precedence were reversed, the meaning of the first example would be `apply whatever is at the negative address of P to a and b' and the second one would mean `evaluate to the j'th member of the vector located at the address expressed by the inverse value of v'. Unless someone convinces me of the advantages of the second variant, I will stick to the first version: postfix operators are evaluated before prefix operators.
Altogether, there are four prefix operators. The minus sign - (which exists as a binary operator, too) evaluates to the negative value of its operand. As in mathematics, any even number of minus signs has no effect. The unary minus sign is distinguished from the binary '-' by its context. When the sign occurs between two operands, it is binary. If it occurs at the place of a factor, it is unary and the factor itself follows.
The tilde operator ~ results in the value of its operand with all bits inverted. Since inverting a bit twice always yields the original value, even numbers of ~-operators have no effect, either.
The backslash \ represents the logical NOT (while ~ represents the bitwise NOT). This operator evaluates to true (-1), if its operand is false (0) and vice versa. Only the value zero is considered `false' in T3X and all non-zero values will be considered `true'. The normal form of the `true' value is -1. Two (or any even number of) subsequent logical NOT operators may be used to create the normal form of a truth value.
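The difference between the two NOT operators, and the use of a double \ for normalizing truth values, can be modelled in C like this. The function names are hypothetical, and the sketch assumes two's complement integers.

```c
#include <assert.h>

int t_not(int x)  { return x == 0 ? -1 : 0; }   /* \x   logical NOT */
int t_inv(int x)  { return ~x; }                 /* ~x   bitwise NOT */
int t_norm(int x) { return t_not(t_not(x)); }    /* \\x  normal form */

void not_demo(void)
{
    assert(t_norm(5) == -1);            /* any non-zero value becomes -1 */
    assert(t_norm(0) == 0);
    assert(t_not(-1) == t_inv(-1));     /* both agree on normal-form values */
    assert(t_not(5) == 0);
    assert(t_inv(5) != 0);              /* but ~5 is still a `true' value */
}
```

The two operators only agree on values in normal form, which is why the manual distinguishes them.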
The address operator @ evaluates to the address of its operand. Therefore, it may be applied only to symbol names. The addresses of constants and structure members may not be computed using @, because constants have no addresses, but only values. Since the subscript operators bind stronger than the address operator, @ can be used to compute addresses of vector and structure members, and even the addresses of members of nested tables:
@v[i][j]
computes the address of the j'th member of the embedded vector v[i]. Of course, the address operator may be combined with byte subscripts, as well:
@s::i
yields the address of the i'th byte in s.
The operation A*B evaluates to the product of A and B. If A*B does not fit in a machine word, the result is undefined.
A/B results in the integral part of the quotient of A and B. The result is undefined, if B is zero.
A MOD B evaluates to the difference between A and A/B*B where A/B is an integer division as described above. Therefore, A MOD B is the division remainder of A/B. Like /, MOD leads to an undefined result, if B=0.
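The definition of MOD can be written down directly in C. The sketch assumes that the integer division truncates toward zero (which is what `the integral part of the quotient' suggests and what C99 guarantees for its own / operator). The names t_mod and mod_demo are made up for this example.

```c
#include <assert.h>

/* A MOD B as A - A/B*B; b must not be zero */
int t_mod(int a, int b) { return a - a / b * b; }

void mod_demo(void)
{
    assert(t_mod(7, 3) == 1);       /* 7 - 2*3          */
    assert(t_mod(-7, 3) == -1);     /* -7 - (-2)*3      */
    assert(t_mod(7, -3) == 1);      /* 7 - (-2)*(-3)    */
    assert(t_mod(-7, 3) == -7 % 3); /* same as C's remainder operator */
}
```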
All term operators respect the signs of both of their operands. Two equally signed operands yield a positive result and operands with different signs lead to a negative result.
As of version T3XR5, the language also provides some modified operators which work on unsigned values. Modified versions of the multiplication and division operators exist. Like all modified operators, they are prefixed with a dot `.':
The operation A.*B evaluates to the product of the unsigned values .A and .B.
A./B results in the integral part of the quotient of .A and .B.
(The notation .X is used to denote the unsigned value of X.)
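To see how the dotted operators differ from the signed ones, consider the following C model. It assumes a 32-bit machine word (the actual width depends on the T3X implementation), and the names t_umul and t_udiv are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 32-bit machine word. Operands are reinterpreted as unsigned. */
int32_t t_umul(int32_t a, int32_t b) { return (int32_t)((uint32_t)a * (uint32_t)b); }  /* A .* B */
int32_t t_udiv(int32_t a, int32_t b) { return (int32_t)((uint32_t)a / (uint32_t)b); }  /* A ./ B */

void unsigned_demo(void)
{
    assert(-1 / 2 == 0);                    /* signed division          */
    assert(t_udiv(-1, 2) == 2147483647);    /* .(-1) is the largest word */
    assert(t_umul(3, 4) == 12);             /* same as * for small values */
}
```

For small non-negative operands, the signed and unsigned variants give identical results. They differ only when the sign bit is involved.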
A+B evaluates to the sum of A and B and A-B evaluates to their difference.
In T3X, all bit operations have the same precedence. The grouping of such operations must be specified explicitly using parentheses. Otherwise, evaluation is performed from the left to the right, as usual.
The operation A&B results in the bitwise AND of A and B. Each bit is the result of computing the logical product of one bit in A with the bit at the same position in B.
A|B yields the result of performing a bitwise OR on A and B. Each bit in the result is a logical sum of a bit in A and the bit at the same position in B.
A^B performs a bitwise exclusive OR (XOR). In this case, the computation of a single bit is done by combining bits at the same positions in A and B using a logical negative equivalence operation.
See the following table for the results of applying logical operations to pairs of bits.
A  B  AND,*  OR,+  XOR,\=
-  -  -----  ----  ------
0  0    0     0      0
0  1    0     1      1
1  0    0     1      1
1  1    1     1      0
A<<B evaluates to the value of A with all bits shifted to the left by B positions. This is the same as an unsigned multiplication with the B'th power of 2:
            b
a<<b = a .* 2
After such an operation, the sign of the result must be considered undefined. This is not relevant, of course, if A is used as a bit field where each bit represents a binary state.
A>>B yields the result of shifting the bits in A to the right by B positions. Under certain conditions similar to those described above, this result equals the quotient
     b
a ./ 2
Technically speaking, one might say that the shift operators in T3X perform logical instead of arithmetic shift operations. This implementation has been chosen, because it is hard to manipulate bit fields using arithmetic shift operators.
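In C terms, the shift operators behave like shifts on unsigned operands. The following sketch again assumes a 32-bit word and shift counts between 0 and 31. The names t_shl and t_shr are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 32-bit machine word and 0 <= b <= 31. */
int32_t t_shl(int32_t a, int b) { return (int32_t)((uint32_t)a << b); }  /* A << B */
int32_t t_shr(int32_t a, int b) { return (int32_t)((uint32_t)a >> b); }  /* A >> B */

void shift_demo(void)
{
    assert(t_shl(3, 2) == 12);              /* 3 .* 2*2                      */
    assert(t_shr(12, 2) == 3);              /* 12 ./ 2*2                     */
    assert(t_shr(-8, 1) == 2147483644);     /* zero is shifted in at the top */
}
```

An arithmetic right shift of -8 would have kept the sign bit. The logical shift fills the vacated position with a zero bit instead.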
Relational operators are used to compare two operands. The relation between the operands is expressed as a truth value: all these operators return true, if their meaning applies to their operands and false otherwise. The following relational operations exist (.X denotes the unsigned value of X):
A < B     A is less than B
A > B     A is greater than B
A <= B    A is less than or equal to B
A >= B    A is greater than or equal to B
A .< B    .A is less than .B
A .> B    .A is greater than .B
A .<= B   .A is less than or equal to .B
A .>= B   .A is greater than or equal to .B
A = B     A is equal to B
A \= B    A is not equal to B
Note: the operators expressing equivalence (=, \=) have a lower precedence than operators expressing ordering (> , <, >=, <=, .<, .>, .<=, .>=). For example,
A < B = C < D
is equal to
(A<B) = (C<D)
Consequently, the equation sign may be interpreted as `logical equivalence' when used between comparisons: the above expression evaluates to true, if either
(A<B) AND (C<D)
or
\(A<B) AND \(C<D)
applies. Since the inequation operator \= has the same precedence as =, it may be used as a negative logical equivalence operator (aka an Exclusive OR):
A<0 \= B<0
becomes true, if exactly one of A and B is negative. If the truth values of the comparisons A<0 and B<0 are equal, the expression yields the result `false'.
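The following C sketch models a signed and an unsigned comparison as well as the use of \= as an exclusive OR between comparisons. It assumes a 32-bit word and that comparisons deliver the normal form -1 for `true'. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: a 32-bit word, and -1 as the value of a true comparison. */
int t_bool(int c)               { return c ? -1 : 0; }
int t_lt(int32_t a, int32_t b)  { return t_bool(a < b); }                      /* A < B  */
int t_ult(int32_t a, int32_t b) { return t_bool((uint32_t)a < (uint32_t)b); }  /* A .< B */
int t_ne(int a, int b)          { return t_bool(a != b); }                     /* A \= B */

/* A<0 \= B<0 : the comparisons are evaluated first, then compared. */
int signs_differ(int32_t a, int32_t b) { return t_ne(t_lt(a, 0), t_lt(b, 0)); }

void relational_demo(void)
{
    assert(t_lt(-1, 1) == -1);          /* -1 is less than 1 ...           */
    assert(t_ult(-1, 1) == 0);          /* ... but .(-1) is a huge number  */
    assert(signs_differ(-3, 4) == -1);
    assert(signs_differ(-3, -4) == 0);
    assert(signs_differ(3, 4) == 0);
}
```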
Notice again that any value may be considered a truth value in a typeless language like T3X. Basically, everything but the value zero is interpreted as `truth', and only 0 will be taken for a `false' value.
The operators A/\B and A\/B represent the logical conjunction (AND) and disjunction (OR). Generally, the expression
A /\ B
evaluates to some `true' value, only if A AND B evaluate to a `true' value and
A \/ B
yields a `true' result if either A OR B (or both of them) evaluate to a `true' value.
More specifically, /\ and \/ are so-called short circuit operators. Since the expression A/\B can lead to a `true' result only if all its operands are `true', there is no actual need to evaluate B, if A already has evaluated to a `false' value. Therefore, the second operand of a conjunction never will be evaluated by a T program, if the first one already is false. The result will be zero in this case. If, on the other hand, the first value is `true', the result of the entire conjunctional expression will be the value of the second operand. Therefore, the result of
A /\ B
can be specified more precisely as follows:
if A=0, the result is 0.
if A\=0, the result is B.
The meaning is just a more general form of the logical AND.
Similarly, the expression A\/B can never become `false', if A already has been found to be true. Therefore, no T program will ever evaluate B in such a case, and the result of the disjunction
A \/ B
can be explained more precisely as
A, if A\=0.
B, if A=0.
As in ordinary algebra, conjunctions bind more strongly than disjunctions:
A /\ B \/ C /\ D
equals
(A/\B) \/ (C/\D)
In chains of equal logical operations, the order of evaluation is from the left to the right, as for all binary operators. This means that chains of conjunctions will be evaluated up to the first `false' occurrence and chains of disjunctions will be processed up to the first `true' occurrence. In either case, the result of the entire chain is the value of the last processed operand.
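The value rules for /\ and \/ can be modelled in C by passing the second operand as a function, so that it is evaluated only on demand. This is a sketch of the semantics described above, not of the code a T compiler generates. The names operand, five, t_and, and t_or are assumptions.

```c
#include <assert.h>

typedef int (*operand)(void);   /* an operand that is evaluated on demand */

static int evaluations;         /* counts how often the right operand ran */
int five(void) { evaluations++; return 5; }

int t_and(int a, operand b) { return a == 0 ? 0 : b(); }  /* A /\ B */
int t_or(int a, operand b)  { return a != 0 ? a : b(); }  /* A \/ B */

void short_circuit_demo(void)
{
    evaluations = 0;
    assert(t_and(0, five) == 0 && evaluations == 0);  /* B skipped   */
    assert(t_and(3, five) == 5 && evaluations == 1);  /* result is B */
    assert(t_or(3, five) == 3 && evaluations == 1);   /* B skipped   */
    assert(t_or(0, five) == 5 && evaluations == 2);   /* result is B */
}
```

Note that neither operator normalizes its result: 3 /\ 5 yields 5 and 3 \/ 5 yields 3, which are both `true' values but not the normal form -1.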
There exists a connection between the logical operators and conditional statements: Because of the short circuit evaluation, logical operators may be used to implement some flow control within expressions. The expression
A /\ B()
has almost the same meaning as
IF (A) B();
The only difference is that the expression yields a value, while the statement only has a side effect. Likewise, the expression
A \/ B()
has a meaning similar to
IF (\A) B();
The IF-statement will be explained in a later section.
The ternary conditional operator has the least precedence. Therefore, it may be used to combine any kind of expressions without using parentheses. The following expression, for example, implements the minimum function:
a<b -> a : b
Since the operator has three operands, it consists of two parts: -> and :. The meaning of the conditional operator is as follows: Imagine an expression like
A-> B: C
If the operand A (the condition) evaluates to some `true' value, B will be evaluated and otherwise, C will be evaluated. If B is evaluated, C will not be evaluated and vice versa. The result of the expression is equal to the value of the last evaluated operand.
Like the logical operators /\ and \/, the conditional operator has a connection to conditional statements:
A-> B(): C()
is equivalent to
IE (A) B(); ELSE C();
except for the fact, of course, that the expression has a value, while the statement only has a side effect. (IE means If/Else and introduces a conditional statement with an alternative). The IE-statement will be discussed in a later section.
Constant expressions are used wherever a value must be known at compile time. Only a limited set of operators is allowed in constant expressions and the order of evaluation is always from the left to the right. There are no precedence or associativity rules.
L+1*10
evaluates to (L+1)*10 and not to L+(1*10), as it would in ordinary expressions. The reasons for this decision were 1) ease of implementation and 2) the fact that most constant expressions contain only a single operator.
Because of the lack of precedence rules, parentheses are not required in constant expressions. Any order of evaluation can be explicitly specified by ordering the operations appropriately.
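A strictly left-to-right evaluator is indeed very easy to write, as the following C sketch suggests. It is a toy under loud assumptions: it handles only decimal numbers, + and * (the two operators of the example above), does no error checking, and the name const_eval is invented. It is not taken from the T3X compiler.

```c
#include <assert.h>
#include <ctype.h>

/* Evaluates strictly from the left to the right, without precedence.
   Assumption: the input contains only decimal numbers, '+', '*' and blanks. */
long const_eval(const char *s)
{
    long acc = 0, val;
    char op = '+';                      /* as if the input began with "0+" */

    while (*s) {
        while (*s == ' ') s++;
        val = 0;
        while (isdigit((unsigned char)*s))
            val = val * 10 + (*s++ - '0');
        acc = (op == '*') ? acc * val : acc + val;
        while (*s == ' ') s++;
        if (*s == '\0') break;
        op = *s++;                      /* remember the next operator */
    }
    return acc;
}

void const_eval_demo(void)
{
    assert(const_eval("4+1*10") == 50); /* (4+1)*10, with L=4          */
    assert(4 + 1 * 10 == 14);           /* C itself applies precedence */
    assert(const_eval("1*10+4") == 14); /* reordering replaces parentheses */
}
```

The last assertion shows how reordering the operations gives the grouping that parentheses would give in an ordinary expression.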
The following operators are recognized inside of constant expressions:
Statements are the basic building stones of T3X programs. While expressions have just a value, statements are used to `tell the program to do something'. Therefore, T3X is a so-called imperative language. Each program is a list of `commands' which is executed in sequence. Each command is also called a statement in the terminology of imperative programming.
There are different kinds of statements: assignments, procedure calls, conditional statements, loop statements, branch statements, and compound statements. The assignment is an essential part of every imperative language. It is frequently even used to characterize the imperative approach. Compound statements do not have a meaning of their own, but they are used to group statements to form the bodies of loops, conditionals, and procedures. All other statement types serve to control the program's flow.
In T, all statements have to be terminated with a semicolon. This means that a semicolon must follow every statement in a program, except for compound statements which are delimited by the keywords DO and END. In other procedural languages (like BCPL and Pascal), statements are separated rather than terminated. In such languages, a delimiter has to be used only, if two statements are written in sequence -- there may not be any delimiter after the last statement. The separation rules in BCPL are rather complex and the saving in delimiters is usually not worth the extra expense of having to remember these rules. Therefore, the simplest form of combining statements has been chosen in T: Each simple (non-compound) statement has to be terminated.
An assignment is used to transfer the value of an expression to a specific storage location. For example, the statement
A := B;
copies the value of B to A. After the assignment, both variables will have the same value. The previous value of A is thereby lost.
The righthand side of an assignment may be any valid expression as described in the previous section. The lefthand side, however, is restricted to a subset of expressions which is frequently referred to as lvalues (lefthand side values). In T3X, each lvalue may be one of the following:
(Of course, vector members and structure members are basically the same.) Assignments to vector members are in no way limited. Addressing elements of multiply nested vectors is perfectly legal. In the section about factors, the evaluation of variables on left and righthand sides of assignments has been explained: On righthand sides, variables evaluate to their values and on lefthand sides, they evaluate to their addresses. The assignment operator := first evaluates the expression on its left side and remembers the resulting address. Then, it evaluates the expression to its right and stores the result at the memorized address.
A generalization of the evaluation of lefthand sides is the following: All but the last reference on a lefthand side of an assignment evaluates to its value. Only the last reference evaluates to its address. Some examples:
A := B;
The symbol A references a specific storage location. Since it is the only reference in the lvalue, it evaluates to its address. In the statement
A[i] := B;
A is not the last reference and hence, it yields its value (which is its address in case A is a vector). The operation [i] references the i'th member of A. Since it is the last reference on the left side, it evaluates to the address of A[i] instead of its value. Consequently, the following assignment operator stores B at this address -- the i'th member of A. The same is valid for the access of vector elements at any nesting level. The statement
A[i1][i2][i3][i4] := B;
for example, stores B in the i4'th element of A[i1][i2][i3].
Accessing byte vectors works in the same way:
A::i := B;
stores the least significant eight bits of B in the i'th byte of A. Since :: associates to the right, the last evaluated reference is the leftmost one in chains of byte operators like
A::B::i := C;
Because B::i will be evaluated first in this example, it will yield its value. Then, the address of A::(B::i) is computed. Since no more references are following after A::, the value of C & 255 will be stored in the (B::i)'th byte of A.
Note: Although the assignment symbol := looks like an operator (and is also frequently referred to as such), it may not be used inside of expressions. The occurrence of this operator automatically turns an expression into a statement.
The application of a procedure may form a complete statement:
newline();
In this case, the return value of the called procedure will be discarded and only the side effects of the procedure will actually take effect. The side effect of the above statement, for example, is the output of a (system dependent) newline sequence to the currently selected output stream.
Each procedure -- no matter whether it returns a specific value or not -- may be used in a standalone procedure call. As in procedure calls in expression contexts, the arguments may be any valid expressions. For details, see the section on factors in an earlier part of this manual.
There are two forms of the conditional statement, the first one being the IF statement which is available in most procedural languages. Its general syntax is
IF (expression) statement
where expression may be any expression and statement may be any statement. The IF statement itself does not have to be terminated with a semicolon, since its body which is a statement, too, already supplies the terminating semicolon. The statement which forms the body of the IF statement will be executed, only if expression evaluates to some `true' (ie. non-zero) value. The following statement turns a into its absolute value:
IF (a < 0) a := -a;
If a is less than zero, then a will be assigned -a, thereby changing its sign. Since the body a := -a is executed only if a < 0 applies, this conditional statement always leaves a non-negative value in a.
Notice that the semicolon in the above example belongs to the assignment.
The second form of the conditional statement is the conditional with an alternative:
IE (expression) statement-T ELSE statement-F
Like in IF statements, any valid expression and statements may be used in the places of expression and statement-{T,F}.
The meaning of the IE statement is the same as for IF statements, as long as the expression becomes `true'. In this case, the first statement (statement-T) will be executed. If the expression evaluates to `false', however, statement-F will be executed, while an IF statement without an alternative would not have any effect in this case.
IE is an abbreviation for If/Else. In most languages, the IF statement may or may not have an alternative. In T, there is a separate type of statement for each version. The reason for this choice is the `dangling else' problem which cannot arise when these statement types are separated. If no further information is supplied, the following program written in a language which allows optional alternatives would be ambiguous:
IF (condition1)
    IF (condition2)
        statement1
ELSE
    statement2
The problem is to decide which IF the ELSE branch belongs to: is it the alternative of IF (condition1) or IF (condition2)? The indentation in this example suggests that it belongs to the first IF. In fact, however, most languages will bind it to the most recently opened IF -- the second one in this example. In T, such an ambiguity does not exist:
IE (condition1)
    IF (condition2)
        statement1
ELSE
    statement2
Since the IF statement cannot have an alternative, the ELSE branch must belong to IE (condition1).
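The two possible readings of the ambiguous program can be spelled out with explicit braces in C. The sketch below is only an illustration. The names common_reading and t_reading are hypothetical, and the return values 1 and 2 stand for statement1 and statement2 (0 means that neither was executed).

```c
#include <assert.h>

/* The ELSE belongs to the inner IF: what most languages choose. */
int common_reading(int c1, int c2)
{
    if (c1) {
        if (c2) return 1;
        else    return 2;
    }
    return 0;
}

/* The ELSE belongs to the outer conditional: the only reading of
   IE (c1) IF (c2) statement1 ELSE statement2  in T. */
int t_reading(int c1, int c2)
{
    if (c1) {
        if (c2) return 1;
    } else {
        return 2;
    }
    return 0;
}

void dangling_else_demo(void)
{
    assert(common_reading(1, 0) == 2 && t_reading(1, 0) == 0);
    assert(common_reading(0, 0) == 0 && t_reading(0, 0) == 2);
    assert(common_reading(1, 1) == 1 && t_reading(1, 1) == 1);
}
```

The readings differ whenever condition2 is false or condition1 is false, which is exactly why the ambiguity matters.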
In BCPL, there are different statements for the simple IF and the IF with an alternative, too, but IE is called Test, there.
There are two kinds of loops: `while' loops and `for' loops which represent two classes of problems: those which are computable by algorithms with a known upper limit of iterations (FOR-computable or primitive recursive functions) and problems which cannot be computed by algorithms with a fixed number of iterations (WHILE-computable or general recursive functions). Since the FOR-computable functions are a subset of the WHILE-computable ones, FOR statements may be considered a special case of WHILE statements and in fact, it is possible to express a FOR loop using WHILE, but not vice versa. (In T, it is possible, but theoretically, it's not.)
There is a third kind of loop in many other languages, the repeating loop, but it turns out to be a special case of the WHILE loop. Repeating loops are not very frequently needed and if they are, they can easily be simulated using WHILE and IF in T.
The WHILE loop has the following general form:
WHILE (expression) statement
where expression may be any expression and statement may be any statement. The body consisting of the statement will be executed while the test expression in parentheses evaluates to some `true' value. Hence the name of this loop. If the expression becomes `false' before the statement has been passed the first time, it will never be executed. However, a loop which tests its exit condition at the end of the statement may be constructed using WHILE, IF, and a compound statement (which will be explained later in this chapter):
WHILE (1) DO            ! 1 is always true
    statement
    IF (\condition) LEAVE;
END
In this case, statement will be executed at least once, because the loop condition 1 is a `true' constant. In the following IF statement, the loop will be left if condition does not apply. LEAVE is used to branch out of the loop. It will be explained later, too.
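The same construction works in C, where break plays the role of LEAVE. The following sketch counts the decimal digits of a number, a task where the body has to run at least once because even 0 has one digit. The function name count_digits is made up for the example.

```c
#include <assert.h>

int count_digits(unsigned n)
{
    int digits = 0;

    while (1) {                     /* WHILE (1) DO              */
        digits++;                   /*     statement             */
        n /= 10;
        if (!(n != 0)) break;       /*     IF (\condition) LEAVE; */
    }                               /* END                       */
    return digits;
}

void repeat_demo(void)
{
    assert(count_digits(0) == 1);       /* body ran once, then left */
    assert(count_digits(12345) == 5);
}
```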
The FOR loop exists in two forms: an explicit form and a short form. The explicit form looks as follows:
FOR (var=start, limit, step) statement
Var is an atomic variable which must have been declared before. Unlike in BCPL, it will not be declared implicitly by the FOR statement. Start and limit are expressions and step is a constant expression. The FOR loop works this way: Firstly, var is initialized with the value of start. Secondly, var is compared against limit. If either the condition
var<limit /\ step>=0
or
var>limit /\ step<0
holds, the loop is passed. Otherwise the loop is left and statement will not be executed any more. If either of the above conditions holds, the loop statement is executed, step is added to var, and the loop will be repeated from the point where the exit condition is checked. Like in a WHILE loop, the statement will never be executed, if the exit condition already becomes true (which is the case, if both of the above conditions become false) at the first time it is checked.
The following example prints the numbers from 0 to 9 using the standard procedures writes(), ntoa(), and newline(). (The standard procedures will be explained in the next chapter.)
FOR (i=0, 10, 1) DO
    writes(ntoa(i,0));
    newline();
END
And this example counts down from 9 to 0:
FOR (i=9, %1, %1) DO
    writes(ntoa(i,0));
    newline();
END
Particularly notice the limits of the FOR loops in these examples. They always specify the first value which will not be applied to the statement. Another way to write the second example would be the following one, where the FOR loop has been replaced by a WHILE loop:
i := 9;
WHILE (i>%1) DO
    writes(ntoa(i,0));
    newline();
    i := i-1;
END
The meaning of this program fragment is exactly the same as that of the one employing a FOR loop, but the syntax of the FOR statement is more compact and expresses the purpose of the statement more clearly. Hence, the FOR statement has been included in the T language.
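The rules of the FOR loop (initialize, test against the limit according to the sign of the step, run the body, add the step) may be summarized in a small C model. It is a sketch of the description above under assumptions: the names t_for, collect, seen, and nseen are invented, and a real T compiler knows the constant step at compile time and needs only one of the two tests.

```c
#include <assert.h>

static int seen[16];            /* values handed to the loop body */
static int nseen;

void collect(int i) { if (nseen < 16) seen[nseen++] = i; }

/* FOR (var=start, limit, step) body(var);  returns the number of passes */
int t_for(int start, int limit, int step, void (*body)(int))
{
    int var = start, passes = 0;

    while ((step >= 0 && var < limit) || (step < 0 && var > limit)) {
        body(var);
        var += step;
        passes++;
    }
    return passes;
}

void for_demo(void)
{
    nseen = 0;
    assert(t_for(0, 10, 1, collect) == 10);     /* FOR (i=0, 10, 1)   */
    assert(seen[0] == 0 && seen[9] == 9);
    nseen = 0;
    assert(t_for(9, -1, -1, collect) == 10);    /* FOR (i=9, %1, %1)  */
    assert(seen[0] == 9 && seen[9] == 0);
    assert(t_for(5, 5, 1, collect) == 0);       /* limit is exclusive */
}
```

In both directions the limit itself is never passed to the body.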
The step value is optional in FOR statements. In the short form of the statement, it is omitted. If only two operands are specified with FOR, the step width defaults to one. Therefore, the statements
FOR (j=0, 100, 1) p(j);
and
FOR (j=0, 100) p(j);
have exactly the same meaning.
A `branch' passes control to a specific point in a program. Typical destinations for branch commands are the beginnings or the ends of loops or the ends of procedures or programs. There is no branch command with a freely definable destination like Goto in BCPL.
The LEAVE command which has been described in the previous subsection already, causes the immediate termination of the innermost WHILE or FOR loop. There are no operands to LEAVE.
The following code compares the characters in two strings A and B. It stops at the first position where they differ, but in any case after 100 steps:
FOR (i=0, 100) IF (a[i] \= b[i]) LEAVE;
The loop is set up for 100 passes and the conditional LEAVE makes the loop terminate, if a mismatch has been found.
The LOOP command transfers control to the beginning of the innermost loop. Like LEAVE, it has no operands. If LOOP is used inside of a FOR loop, it branches to the increment part where the value of the index variable is modified. In WHILE loops, it branches directly to the point where the exit condition is checked.
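C programmers will recognize LEAVE and LOOP as close relatives of break and continue. In particular, continue inside a C for loop also branches to the increment part. The following C sketch is only an analogy. The function sum_odd_below and its parameters are hypothetical.

```c
#include <assert.h>

/* Sums the odd numbers below limit, but stops before exceeding cap. */
int sum_odd_below(int limit, int cap)
{
    int i, sum = 0;

    for (i = 0; i < limit; i++) {
        if (i % 2 == 0) continue;   /* like LOOP: go on with the increment */
        if (sum + i > cap) break;   /* like LEAVE: terminate the loop      */
        sum += i;
    }
    return sum;
}

void branch_demo(void)
{
    assert(sum_odd_below(10, 100) == 25);   /* 1+3+5+7+9                  */
    assert(sum_odd_below(10, 10) == 9);     /* 1+3+5, then 16 > 10 leaves */
}
```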
To leave a procedure, a RETURN statement may be used. It has the general form
RETURN expression;
The statement evaluates the specified expression and prepares it for passing it back to the calling procedure. Then, it performs a branch to the end of the procedure where local storage is released and the procedure is left. The value received by the calling procedure is the value of expression:
P(x) RETURN x*x;

Q() DO VAR y;
    y := P(5);
END
In this short example program, 5 is passed as an argument to the procedure P. The procedure computes the square of its argument and returns it to Q where the result will be stored in y.
All the above branch statements take care of locally allocated storage. If local symbols are defined in the bodies of loops, for example, LOOP and LEAVE will release this storage before performing the branch. This allows the use of these commands in any loop context, even if local symbols are present.
The last form of the branch is the HALT statement. There are no arguments to HALT. It performs a branch to the end of the entire program, thereby terminating it. If necessary, the command cleans up the runtime environment of the program so that it can terminate gracefully.
A compound statement (sometimes also called a block statement) is a group of statements which is treated like a single statement under some aspects. For example, a compound statement may occur at any place where an ordinary statement is expected, like in
IF (expression) statement
In such situations, a compound statement can be used to `extend' the scope of the conditional so that it is applied to a group of statements instead of a single statement:
IF (a < '0' \/ a > '9') DO
    writes("Not a valid digit. Code=");
    writes(ntoa(a, 0));
    newline();
END
In this example, both writes() calls and the newline() call will be executed only, if the IF-condition applies. The keywords DO and END are used to delimit statement blocks. There is no terminating semicolon after a compound statement. The line
DO p(); q(); END ;
would be recognized as a compound statement containing the procedure calls p() and q(), followed by an empty statement consisting only of a single semicolon.
In T, compound statements are ordinary statements and they may occur at any place where a statement is expected. Even statements like
DO DO DO END END END
are perfectly valid. The use of compound statements in sequences becomes clear in the next sections where the allocation of local storage in compound statements is explained.
Besides the grouping of commands, compound statements provide a mechanism for the definition of local symbols and the allocation of dynamic memory. Declaration statements already have been explained in a previous section. All data objects which can be created in T also may be declared locally inside of compound statements by placing their declarations at the beginning of the statement block. Any number of declarations will be accepted after the keyword DO which introduces the block. The declaration statements themselves do not change in this case. Only the positioning inside of a statement block makes the declared symbols local to that block. The statement
DO VAR i; FOR (i=0, 10) p(i); END
for example, applies the procedure p to the sequence 0...9. The index variable is declared inside of a compound statement which also contains the FOR loop which generates the sequence. The variable i does not exist before the compound statement is entered. It will be created automatically at the point of its declaration and it will cease to exist at the end of the block it has been declared in. Therefore, variables which are local to compound statements are sometimes also called automatic variables, but the more common term is local variables.
Besides variables and vectors, structures and constants may be declared locally, too. Unlike BCPL, T3X does not support the nesting of procedure definitions, however. In case of atomic variables and vectors, the storage required by the variables is allocated when the symbol becomes valid and released when the variable is destroyed again. In most environments, automatic storage will be allocated on the runtime stack. The main purpose of local symbols is the definition of local variables in procedures.
To illustrate another application of local storage allocation, imagine the following situation:
P() DO VAR big_V[LARGE_1], big_W[LARGE_2];
    task1(big_V);
    task2(big_W);
END
where two tasks requiring large amounts of storage shall be run sequentially in the same procedure, but not enough memory for both arrays is available. One solution -- of course -- would be the creation of two procedures where each one creates local storage only for one task. Another one would be to share the vector, but both solutions only work at the cost of readability and maintainability. T provides another solution, since the compiler guarantees that local storage is allocated exactly at the point of its declaration and released immediately at the point of the destruction of the associated symbol:
P() DO
    DO VAR big_V[LARGE_1];
        task1(big_V);
    END    ! big_V gets released here
    DO VAR big_W[LARGE_2];
        task2(big_W);
    END    ! big_W gets released here
END
Since compound statements may be nested, naming conflicts may occur in many languages, as the following example (in BCPL) illustrates:
${ LET I
   I := 123
   ${ LET I
      I := 456
   $}
   WRITEF("%N*N", I);
$}
The variable i which is defined in the outer compound statement is redefined in the inner block. Inside of the inner block, the variable i is assigned the value 456. Clearly, the assignment i := 123 in the outer block references the variable defined in the outer block, but which one is referenced in the inner scope? BCPL -- like most other procedural languages -- resolves this ambiguity by always giving preference to the innermost definition. Therefore, the example program fragment would print 123. When this method is used, the symbol i defined in the outer scope becomes inaccessible in the embedded scope. This effect is called shadowing: The inner definition `shadows' the outer one which thereby becomes temporarily invisible to the compiler.
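C resolves the conflict in the same way as BCPL, so the effect can be reproduced there. The following sketch mirrors the structure of the BCPL fragment. The function name shadow_demo is an assumption, and the value is returned instead of being printed.

```c
#include <assert.h>

int shadow_demo(void)
{
    int i = 123;
    {
        int i = 456;        /* shadows the outer i inside this block only */
        assert(i == 456);
    }
    return i;               /* the outer i is visible again */
}
```

The function returns 123, because the inner definition ceases to exist at the end of the inner block.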
T uses more strict scoping rules than most other languages: Symbols generally may not be redefined in T programs. This also applies to global symbols (symbols which have been declared at the top level -- outside of procedure definitions or statement blocks). This way, shadowing can never happen. The flexibility of local symbols remains, though, since names can be reused as soon as a local object is getting destroyed:
F(x,y) DO VAR i, j;
    ! ...
END

G(x,y) DO VAR i, j;
    ! ...
END
As shown in this example, symbol names may be reused in procedure definitions (for formal argument names) as well as in subsequent compound statements. Since the variables i and j will be destroyed at the end of the compound statement forming the body of F, they can be reused in G. The same is valid for the argument names x and y.
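The rule "no redefinition while a name is still visible, free reuse afterwards" can be modelled with a small symbol table. The following is only a sketch in Python (chosen so that it can be run directly), not the data structure used by the T3X compiler; the names ScopeTable, ScopeError and accepts are made up for this illustration.

```python
class ScopeError(Exception):
    """Raised when a name is declared while it is still visible."""


class ScopeTable:
    """Toy symbol table: one set of names per open block."""

    def __init__(self):
        self.blocks = [set()]          # blocks[0] holds the global names

    def enter(self):
        """Open a new block (a compound statement or a procedure head)."""
        self.blocks.append(set())

    def leave(self):
        """Close the innermost block; its names become reusable."""
        if len(self.blocks) == 1:
            raise ScopeError("cannot leave the global scope")
        self.blocks.pop()

    def declare(self, name):
        """Declare a name; refuse it if any open block already holds it."""
        if any(name in block for block in self.blocks):
            raise ScopeError("redefinition of " + name)
        self.blocks[-1].add(name)


def accepts(steps):
    """Replay ('enter',), ('leave',) and ('declare', name) steps."""
    table = ScopeTable()
    try:
        for step in steps:
            getattr(table, step[0])(*step[1:])
    except ScopeError:
        return False
    return True
```

Replaying the declarations of F and G from the example is accepted, because x, y, i and j have already been removed when G is reached. Declaring i in a block nested inside another block that declares i is rejected, which is exactly the shadowing case.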
The following example shows some local and global symbols and their scopes.
+++  VAR GX, GY;
 |
 |   P(x, y) DO VAR x1, y1;              +++
 |                                        |
 |       STRUCT PT=PX,PY;                +++
 |                                        |
 |       DO VAR i, j;               +++   |
 |           DO VAR x2, y2;    +++   |    |
 |           END               ---   |    |
 |                                   |    |
 |           DO VAR x2, y2;    +++   |    |
 |           END               ---   |    |
 |       END                        ---   |
 |                                        |
 |       DO CONST t=%1, f=~t;       +++   |
 |           DO VAR x2, y2;    +++   |    |
 |           END               ---   |    |
 |       END                        ---   |
 |   END                                 ---
Like all other symbols, the global variables GX and GY are valid from the point of their declaration, but unlike locally declared names, they remain in existence up to the end of the program. Their scope is the entire program (beginning at their declaration). The scopes of all symbols in the example are illustrated using vertical bars. Plus signs indicate the point where a symbol name becomes valid and its storage is allocated, and minus signs mark the point of its destruction.
Especially notice that the names x2 and y2 which are used in three different scopes denote three different pairs of variables. A value stored in x2 within the first scope, for example, cannot be retrieved from x2 in the second or the third scope, because the names reference different locations. The variable which is created at the beginning of the first scope containing x2 is deleted at the end of this scope and the value stored in that variable gets lost. Assignments to local variables remain valid only between the corresponding +++ and --- indicators.
There are two forms of the empty statement (aka null statement) in T. The first form is the single semicolon
;
Unlike in BCPL, compound statements may be empty, too:
DO END
Both null statements have absolutely no effect. Their only purpose is to fill a gap where a statement is required, but there is nothing to do. They are useful for avoiding the negation of complex conditions, for example. Instead of negating the condition at the cost of making it harder to understand, one might turn
IF (\(complex-condition)) statement
into
IE (complex-condition) ; ELSE statement
Each procedure may be considered a separate small program. It communicates with other procedures using parameters and return values or through global variables. Each procedure has access to all global objects which have been declared before it. A typical T program is a set of procedures which exchange information through arguments and global storage. Generally, it is considered `good style' in procedural languages to keep procedures self-contained and use global storage as little as possible, but when an object has to be shared between a large number of different procedures, the use of top-level definitions is very common.
The definition of a procedure has only one single form in T3X, while there are different ways to declare it, depending on the type of the procedure and its location. Since there is no support for nested routines, all procedure declarations and definitions must occur at the top level -- the space between the other global declarations.
The only form of the definition is
P(a1, ... aN) statement
where P is the name of the procedure, a1...aN are the names of the formal arguments, and statement is the body of the procedure -- the part which describes its meaning.
The procedure name may be any valid symbol and it is declared in the global context. Therefore, procedure names may never be reused. (An advantage of T's strict scoping rules is that procedures cannot get shadowed.) The arguments a1...aN are local to the procedure. Their names will cease to exist after the statement has been accepted. Hence, they may be reused after the procedure definition, but not inside of it. The parentheses around the argument list must always be specified, even if the list is empty:
Q() statement
The number of arguments specified in the definition of a procedure determines the type of the procedure. The type of a procedure is notated as a single number which represents the routine's arity -- the number of its arguments. In T, the argument counts of all procedure calls will be checked. The compiler will not allow calls with a wrong number of parameters. This has to be done because of T's calling conventions: Parameters are passed in reverse order and therefore, each procedure relies on a correct number of arguments. BCPL uses a totally different approach in which it is easy to compensate for missing or superfluous procedure parameters. The only real advantage of this approach, however, is the possibility of defining real variadic procedures (procedures with a variable number of arguments). The transport of variable argument lists also can be realized in T, but using a different mechanism which will be explained later.
When a procedure is called, it may receive data through its arguments. This works in the following way. Given a procedure
P(x, a, b, c) RETURN a*x*x + b*x + c;
and a procedure call
y := P(n, i, j, k);
the caller places the values of the actual arguments n, i, j, and k in a temporary storage location (usually on the runtime stack), saves the address of the following operation (the assignment in this case) and then transfers control to the procedure P. In P, the formal arguments x, a, b, and c reference storage locations which exactly match the temporary locations of the values passed to the routine, so that x=n, a=i, b=j, and c=k.
The procedure P computes a*x*x+b*x+c and returns it to the caller. Each procedure returns automatically, if its body has been processed completely or if an explicit RETURN statement is executed. In the above example, both happen at the same time. It is not unusual to specify a RETURN statement at the end of a procedure, since only RETURN may pass an explicit value back to the caller. Procedures which do not return through RETURN have an implicit zero return value. In the example, however, the value of P is explicitly specified. After passing control back to the caller, the assignment takes place, and the result of the procedure call is stored in y. Between the procedure return and the assignment, the temporary storage where the actual arguments were held is released again.
The most frequently used form of the procedure has a body consisting of a compound statement:
fib(n) DO VAR f, i, j, k;
    f := 1;
    j := 1;
    FOR (i=1, n) DO
        k := f;
        f := j;
        j := j+k;
    END
    RETURN f;
END
Note that the variables declared at the beginning of the procedure
VAR f, i, j, k;
belong to the compound statement rather than to the procedure. As in conditional statements and loops, the statement block is used to extend the scope of the procedure: not only a single statement, but a group of statements forms the body of the routine.
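To check what fib() computes, the procedure can be transliterated line by line. The sketch below uses Python so that it can be run directly. It assumes that the upper limit of a T FOR loop is exclusive, which is what the loop FOR (i=1, 11) used later in this chapter for n=1...10 suggests, and it ignores the limited range of a machine word.

```python
def fib(n):
    """Line-by-line transliteration of the T procedure fib()."""
    f = 1
    j = 1
    # FOR (i=1, n): assumed to run with i = 1 .. n-1, like range(1, n)
    for i in range(1, n):
        k = f
        f = j
        j = j + k
    return f
```

Under that assumption the loop body runs n-1 times, so fib(1) and fib(2) are both 1 and every further value is the sum of its two predecessors.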
It is perfectly safe for a procedure to call itself. Since the declaration of a procedure takes place while parsing its head (consisting of its name and its argument list), the declaration is already valid when the compiler processes the body. Therefore, the procedure may recurse into itself:
fac(n) RETURN n=0-> 1: fac(n-1)*n;
This small example computes n!. For the trivial case n=0, it simply returns 1. To compute n! for n>0, it first computes (n-1)! and then multiplies it by n. To compute the factorial of n-1, it applies itself. Since the argument of the recursive call is decremented by one at each level of recursion, it will finally reach 0 and the procedure will start returning.
Recursion is safe in T, because local variables (which include formal arguments) are created freshly each time the corresponding declaration is passed. Therefore, the symbol n in the above example denotes different variables at each level of recursion. To see how recursion works, the following modified example of the factorial function is recommended:
fac(n) DO VAR f;
    f := n=0-> 1: fac(n-1)*n;
    writes(ntoa(f, 10));
    newline();
    RETURN f;
END
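The order in which the traced version prints its values can be reproduced with a short Python sketch. The list trace is an assumption of this sketch; it stands in for the output written by writes() and newline().

```python
def fac(n, trace):
    """Factorial that records each intermediate result, innermost first."""
    f = 1 if n == 0 else fac(n - 1, trace) * n
    trace.append(f)      # stands in for writes(ntoa(f, 10)); newline();
    return f
```

Because each level stores its result in its own f before it reports it, the value of the innermost call (n=0) is recorded first and the value of the outermost call last.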
Of course, the usual restrictions concerning the use of global memory and other shared resources in recursive procedures apply in T, as well.
Recursive procedures which depend on each other are called `mutually recursive'. Such a configuration introduces the following problem: Given the procedures
A() DO
    !...
    B();
END

B() DO
    !...
    A();
END
which depend on each other, it does not matter which one is declared first -- one will always be inaccessible from within the other. In the above configuration, B is undefined in A because it will be declared after A. When swapping the definitions, A will become undefined in B.
In BCPL, simultaneous definitions are used to solve this problem. T, however, uses a more explicit scheme, where a procedure may be declared before its definition. A declaration makes a symbol known to the compiler, but does not associate any meaning with the declared symbol. In case of procedures, the meaning may be `subsequently delivered' in a later definition. To declare a procedure, the DECL statement is used:
DECL name(type);
Analogous to VAR statements, any number of comma-separated declarations may be included in a single DECL statement. Name is the name of the procedure to declare and type is a constant expression specifying the number of formal arguments of that procedure. This value is required to type check forward calls to the procedure. The number of formal arguments in a subsequent definition must exactly match the type specified in the declaration. Otherwise, a redefinition error will be signalled.
DECL reserves the given names for later procedure definitions. Therefore, each of these names may be reused only in one subsequent procedure definition. Declaring a procedure without defining it later is an error, since this may leave forward references to the declared procedure unresolved.
To correct the above program containing the mutually recursive procedures A and B, the declaration
DECL B(0);
has to be inserted before the definition of A. Like procedure definitions, DECL statements are allowed only at the top level, but not inside of local scopes.
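The rules for DECL, definitions and calls can be summarized in a toy model of a single-pass compiler's procedure table. This is a sketch in Python, not the actual T3X implementation; the class ProcTable, its method names and the wording of the error messages are assumptions made for this illustration.

```python
class DeclError(Exception):
    """Any violation of the declaration rules sketched below."""


class ProcTable:
    """Toy single-pass table of procedure names and their arities."""

    def __init__(self):
        self.arity = {}        # name -> number of formal arguments
        self.defined = set()   # names that already have a body

    def declare(self, name, arity):
        """Models: DECL name(arity);"""
        if name in self.arity:
            raise DeclError("redefinition of " + name)
        self.arity[name] = arity

    def define(self, name, arity):
        """Models the head of: name(a1, ... aN) statement"""
        if name in self.defined:
            raise DeclError("redefinition of " + name)
        if name in self.arity and self.arity[name] != arity:
            raise DeclError("redefinition of " + name)
        self.arity[name] = arity
        self.defined.add(name)

    def call(self, name, nargs):
        """A call needs a known name and a matching argument count."""
        if name not in self.arity:
            raise DeclError("undefined procedure " + name)
        if self.arity[name] != nargs:
            raise DeclError("wrong number of arguments for " + name)

    def finish(self):
        """At the end of the program every declared name needs a body."""
        missing = sorted(set(self.arity) - self.defined)
        if missing:
            raise DeclError("undefined procedure " + missing[0])
```

Calling define() before the calls in the body are processed mirrors the fact that a procedure is already declared while its body is compiled, which is what makes direct recursion legal. With declare("B", 0) issued first, the calls in A and B both pass; without it, the call to B inside A is rejected.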
All procedures have fixed numbers of arguments in T. It is possible, however, to pass a variable number of arguments to a procedure using a vector. The following simple example computes the average of n values stored in the vector v:
average(n, v) DO VAR i, t;
    t := 0;
    FOR (i=0, n) t := t+v[i];
    RETURN t/n;
END
Since vectors are first-class objects in T, it is possible to inline vectors in procedure applications, thereby forming an elegant way of passing constant vectors with variable sizes to a procedure:
average(5, [ 2, 3, 5, 7, 11 ]);
average(3, [ 123, 456, 789 ]);
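The results of these two calls can be checked with a Python transliteration of average(). The sketch assumes that the upper limit of FOR is exclusive and that the division of two machine words yields a whole number; overflow of a machine word is not modelled.

```python
def average(n, v):
    """Transliteration of the T procedure average(n, v)."""
    t = 0
    for i in range(0, n):   # FOR (i=0, n): assumed to run i = 0 .. n-1
        t = t + v[i]
    return t // n           # assumed: whole-number division, remainder dropped
```

Under these assumptions the first call yields 5 (the sum 28 divided by 5, remainder dropped) and the second one yields 456.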
Since release 4 of the T3X language, tables may contain embedded expressions. Therefore, dynamically computed values may be included in tables as well, making the use of tables for variadic procedure calls even more flexible. Given a T implementation of a subset of the variadic BCPL library procedure WRITEF() and the fib() procedure which has been defined earlier in this chapter, it would be possible to write the following program which prints the line
fib(n) = m
for each n=1...10 and m=fib(n):
DO VAR i;
    FOR (i=1, 11)
        writef("fib(%N) = %N\n", [ (i), (fib(i)) ]);
END
WRITEF() replaces each %N with the readable representation of the value of one of the arguments following the string. Each time a %N is processed, the procedure advances to the next argument. The BCPL version uses a variable number of arguments while the T version uses a vector to transport the arguments. Note that WRITEF() uses the number of %N's to determine the number of arguments passed to it. The programmer is responsible for supplying a sufficient number of them.
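The way such a procedure walks through its argument vector can be sketched as follows. This is a Python illustration only, not the T implementation referred to above: it returns the formatted text instead of writing it, it supports nothing but %N, and it reports a missing argument as an error, whereas the behaviour of a T version in that case is not specified here.

```python
def writef(fmt, args):
    """Return fmt with each %N replaced by the next element of args."""
    out = []
    next_arg = 0
    i = 0
    while i < len(fmt):
        if fmt.startswith("%N", i):
            # Raises IndexError when args holds fewer values than fmt has %N's.
            out.append(str(args[next_arg]))
            next_arg += 1
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)
```

The only thing that tells the procedure how many values to fetch is the format string itself, which is why the caller has to make sure that the vector is long enough.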
An INTERFACE statement is used to access routines which are not part of the program which contains the statement. The original purpose of INTERFACE statements is to provide an extensible way to access routines provided by the runtime environment -- either the Tcode interpreter, if a bytecode program is run or the runtime library, if the program has been compiled to native code. The general syntax of this statement is
INTERFACE name(type) = slot;
Any number of interfaces may be specified in a comma-separated list and the part containing ' = slot' is optional. The meaning of the first part of the interface description (containing 'name(type)') is exactly equal to a DECL statement: it declares a procedure called name with type formal arguments. The only difference to a procedure declaration is the type of the declared object. While DECL declares a user-level procedure, INTERFACE declares an interface procedure.
Interface procedures are called in a way which slightly differs from ordinary procedure calls. (This difference is totally transparent to the programmer, though.) While a user-level procedure simply has an address to jump to, an interface procedure is part of another program which provides the necessary runtime support for the T program in the form of procedures. Since these procedures are not defined in the same source program, an `interface' for accessing them has to be created. The internal form of this interface is a jump table which simply contains the addresses of the available runtime support procedures.
When a program starts up, there are some predefined interfaces which will be created automatically by the RT interpreter or library. These interfaces allow access to the procedures which were `built into' the original T3 to provide T3 compatibility. Additional procedures require an explicit INTERFACE declaration. For example, T3X provides a standard routine for copying memory regions. To use it, it must be declared first, using the statement
INTERFACE memcopy(3) = 15;
The routine has three arguments and occupies slot number 15 in the above described jump table. The slot number is assigned by the runtime library -- it cannot be freely chosen by the programmer. Memcopy() is always located at slot 15. The INTERFACE statement declares an interface to a predefined slot. It assigns slot numbers to names and not to procedures!
When no slot number is specified, the compiler automatically advances to the next free slot number. Therefore, it is sufficient to specify an explicit slot number with the first declaration, if a set of interfaces with subsequent slot numbers has to be declared. For example,
INTERFACE readpacked(3) = 11, writepacked(3), reposition(4);
will create the interface procedure readpacked() to reference slot 11, writepacked() to use slot 12, and reposition() to access slot 13.
Each slot number may be used only by a single procedure. The attempt to use the same slot number twice will lead to an error.
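The numbering rules can be illustrated with a short Python sketch that assigns slots to a list of interface descriptions. It is a model of the rules stated above, not compiler code. Two points are assumptions of the sketch: a description without a slot number simply receives the slot after the previous one, and a list whose first entry has no slot number is rejected (the text does not say where numbering starts in that case).

```python
def assign_slots(interfaces):
    """Map slot numbers to (name, arity).

    interfaces is a list of (name, arity, slot) tuples in declaration
    order, where slot is None when ' = slot' has been omitted.
    """
    table = {}
    next_slot = None
    for name, arity, slot in interfaces:
        if slot is None:
            if next_slot is None:
                raise ValueError("no slot number to continue from")
            slot = next_slot
        if slot in table:
            raise ValueError("slot %d is already in use" % slot)
        table[slot] = (name, arity)
        next_slot = slot + 1
    return table
```

Feeding it the three descriptions from the example yields the slots 11, 12 and 13, and asking for a slot that has been handed out before raises an error.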
The slot numbers of various T3X extensions can be found in a later chapter where the functions of these procedures will be explained.
T3X provides an experimental extension for defining public and external procedures. This mechanism is a more flexible but also more complex alternative to the above described interface procedures. Instead of accessing runtime support procedures, it is primarily aimed at separate compilation. When there are two T3X modules where module A contains procedure a and module B contains procedure b, then procedure b can call a, if a is defined as a public procedure in A and declared as an external procedure in B.
Given that a has two arguments, the definition in A would look like this:
PUBLIC a(x, y) ...
where the keyword PUBLIC is followed by an ordinary procedure definition. Note that only definitions but not declarations can be flagged public. The PUBLIC modifier generates a special Tcode instruction which is used to export the name of the procedure. When native code is generated, the exported name will be turned into a global symbol of the TEXT segment, thereby allowing it to be accessed by other modules. When interpreting a Tcode module, exported symbols will be ignored.
A module which wishes to call an external procedure must declare the foreign procedure using an `external declaration' of the form
EXTERN DECL a(2);
where the keyword EXTERN is followed by a normal DECL statement. However, external declarations do not have to be resolved in the same program, but they generate special Tcode instructions which export their names. When generating native code, this instruction will create an external reference which has to be resolved by a loader. The attempt to interpret a Tcode module that contains external references will lead to an error.
The number of formal arguments in the public definition must exactly match the number specified in the external declaration. Otherwise, the behavior of the call is undefined, because the compiler cannot detect the mismatch: currently, type checking calls to external procedures relies on the correct type specification in the respective external declaration, so be careful.
Each program has an initial entry point where the execution begins at run time. In T, the entry point is a compound statement at the top level which does not belong to any procedure definition. This compound statement is mandatory and it always must be the last definition in the entire program. Consequently, the minimum valid T program is
DO END
The main procedure, like any other compound statement, may declare its own local symbols. Since it has no name, it cannot recurse, however. Also, RETURN may not be used, because there is no procedure to return to.
The last statement in the main procedure is an implicit
HALT;
statement so that the entire program will terminate automatically when the end of the procedure has been reached.
This is a summary of the T3X standard runtime support procedures which are contained in the minimum version of the Tcode interpreter TXX and the runtime support library LIBTX. There exist extended versions of both the interpreter and the library, but the extensions contained in these versions are not considered part of the T3X definition. Hence, they will not be discussed in this manual.
The procedures explained in this section have been introduced in version 3r0 of the language T whose definition did not contain a statement for interfacing external procedures. Therefore, INTERFACE declarations of these procedures will be implicitly contained in every T3X program so that these procedures are always available.
READS(Buffer, Count) -- String, Number => Number
This procedure reads up to count characters from the currently selected input stream and unpacks the string read into buffer so that each character will occupy one machine word. It will also append a trailing NUL word to properly terminate the string. Because READS() uses an internal buffer, count may not be larger than 1024. The procedure returns the number of characters actually read. When reading from a terminal, entering a newline character will terminate the READS() call on most systems. In this case, the length of the line read will be returned. A return code of zero usually indicates the end of the input file or the input of an EOF character. A result less than zero indicates general failure.
READS() may read binary data containing NUL characters. Therefore, measuring the length of the string in buffer to determine the number of characters received is not guaranteed to work.
At startup time, the input port will be connected to the terminal of the user who has started the program, so that READS() will read the user's keyboard.
WRITES(String) -- String => Number
WRITES() packs string into an internal buffer and then writes the packed string to the currently selected output stream. All but the least significant 8 bits of each word in string will get truncated. WRITES() determines the number of characters to write by counting the words in string up to a delimiting NUL word. Since it checks for a NUL word, WRITES() can be used to write binary data: Setting bit #8 (by an | operation with 256) makes a NUL character a non-delimiter. Since bit #8 will be truncated, however, a NUL character will be written.
WRITES() returns the number of characters actually written. Any value below the length of string indicates general failure. Due to the use of an internal buffer, WRITES() may not write strings with a length of more than 1024 characters.
At startup time, the output port will be connected to the terminal of the user who has started the program, so that WRITES() will write to the user's screen.
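The truncation rule and the trick for writing NUL characters can be traced with a small Python model. It only computes the bytes that would be written for a given list of machine words; the function name is made up, and neither the 1024-character limit nor the actual output is modelled.

```python
def writes_bytes(words):
    """Bytes WRITES() would emit for a NUL-terminated list of machine words."""
    out = bytearray()
    for w in words:
        if w == 0:              # a NUL *word* ends the string
            break
        out.append(w & 0xFF)    # everything above the low 8 bits is dropped
    return bytes(out)
```

A word with the value 256 is not a NUL word, so it does not end the string, but after truncation it contributes a NUL byte to the output.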
NEWLINE() => 0
This procedure writes a system-dependent newline sequence to the currently selected output stream ("\n" on Unix and Plan9 and "\r\n" on DOS, for example).
SELECT(Port, Fd) -- Number, Descriptor => Descriptor
This routine selects a new input or output stream. If port is zero, it selects an input stream and if port is non-zero, it selects an output stream. Fd is the file descriptor of the new I/O stream. A valid file descriptor may be obtained by an OPEN() call. Alternatively, one of the standard descriptors 0, 1, 2 may be used. 0 denotes `standard input' (the user's keyboard), 1 denotes `standard output' (the user's screen), and 2 describes the `standard error' port which is usually associated with the terminal screen, too. SELECT() replaces the currently selected input or output port with fd without checking whether it is a valid descriptor. It returns the descriptor which was in effect before the call so that it may be restored later.
The following example illustrates the use of OPEN(), CLOSE() and SELECT():
DO VAR old, new;
    ! This goes to the terminal
    writes("Creating file HELLO"); newline();
    ! Create file HELLO
    new := open("HELLO", 1);
    if (new = %1) halt;    ! OPEN() failed
    ! Select the file for output
    old := select(1, new);
    ! This goes to the file HELLO
    writes("Hello, World!"); newline();
    ! Restore the old output channel
    select(1, old);
    ! Release the descriptor of HELLO
    close(new);
    ! This goes to the terminal again
    writes("Done"); newline();
END
OPEN(Path, Mode) -- String, Number => Descriptor
OPEN() opens the file whose path name is specified in the unpacked string path in the given mode. Mode is a numeric value specifying how to open the file and what operations will be allowed on it. The following table summarizes the possible values for mode:
     | Allow   Allow   If file is    Append
Mode | write   read    nonexistent   to file
-----+---------------------------------------
  0  | No      Yes     Fail          No
  1  | Yes     No      Create        No
  2  | Yes     Yes     Fail          No
  3  | Yes     No      Fail          Yes
If the OPEN() call succeeds, the procedure will return a file descriptor for the opened or newly created file. This file descriptor can be passed to SELECT() or one of the new T3X I/O procedures. When the descriptor is no longer used, it should be destroyed using CLOSE().
If OPEN() fails for some reason, it will return the value -1 which is not a valid descriptor.
CLOSE(Fd) -- Descriptor => Number
This procedure deletes a file descriptor created by OPEN(). It returns zero upon success and a negative value, if an invalid descriptor has been passed to it. A closed descriptor becomes invalid immediately and may no longer be used. All pending I/O operations will be performed on the file associated with fd before the descriptor is deleted. CLOSE() may not be used to close one of the standard descriptors 0, 1, and 2.
ERASE(Path) -- String => Number
This routine deletes the file whose path name is specified in the unpacked string path. It returns zero on success and a negative value, if the file does not exist or the user has no permission to delete it. On systems providing multiple links per file (like Unix and its descendants), ERASE() only deletes the specified link.
ATON(Numstr) -- String => Number
ATON() parses the unpacked string numstr and converts it into a numeric value. First, it skips all leading white space characters (\f, \n, \r, \s, \t). Then, it checks for a leading minus sign. Either - or % is recognized. Only a single sign may occur. Finally, the procedure collects all decimal digits following the optional spaces and sign and computes the value represented by the string. It returns the computed value. If a non-numeric string is passed to ATON(), it returns zero. Note: the return value zero may stand for both, non-numeric input or any valid string containing a zero value (like "0").
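The three parsing steps can be written down as a short Python sketch. It works on an ordinary Python string instead of an unpacked T string, it assumes that conversion stops at the first character that is not a decimal digit, and it does not model the limited range of a machine word.

```python
WHITE_SPACE = "\f\n\r \t"
DIGITS = "0123456789"


def aton(numstr):
    """Convert a decimal string with an optional leading - or % sign."""
    i = 0
    # Step 1: skip leading white space.
    while i < len(numstr) and numstr[i] in WHITE_SPACE:
        i += 1
    # Step 2: accept at most one sign character.
    negative = False
    if i < len(numstr) and numstr[i] in "-%":
        negative = True
        i += 1
    # Step 3: collect decimal digits (assumed: stop at the first non-digit).
    value = 0
    while i < len(numstr) and numstr[i] in DIGITS:
        value = value * 10 + DIGITS.index(numstr[i])
        i += 1
    return -value if negative else value
```

The sketch also shows the ambiguity mentioned above: both "0" and a string without any digits produce the result zero.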
NTOA(Value, Width) -- Number, Number => String
This procedure creates a readable representation of the numeric object value in an internal buffer and returns a pointer to the generated string. The representation of the number will be unpacked. If width is larger than the number of characters required by the string representing value, the string will be filled to a length of width using blank characters (\s). If value is negative, the numeric string will be prefixed with an ordinary minus sign (-).
Because of the use of an internal buffer, the maximum value for width is limited to 255. Also, the procedure is not reentrant. Therefore, the statement
concat(ntoa(a,0), ntoa(b,0));
will pass the result of ntoa(b,0) twice to the -- fictitious -- procedure concat, and not the results of ntoa(a,0) and ntoa(b,0). If multiple values computed by NTOA() shall be used in one expression, all but the last result must be saved in a user-supplied buffer:
INTERFACE memcopy(3) = 15;

DO VAR buffer[20];
    memcopy(buffer, ntoa(a,0), 20);
    concat(buffer, ntoa(b,0));
END
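Why the second call destroys the first result becomes visible in a model where the internal buffer is a single shared object. The following Python sketch is such a model and not the runtime library code; padding to width is accepted but not modelled, and the helper as_text() is made up for the illustration.

```python
_buffer = []            # the single internal buffer shared by all calls


def ntoa(value, width):
    """Write the digits of value into the shared buffer and return it."""
    text = str(value)                            # padding to width is omitted
    _buffer[:] = [ord(c) for c in text] + [0]    # unpacked, NUL-terminated
    return _buffer      # every call returns the *same* list object


def as_text(words):
    """Decode an unpacked string up to its terminating NUL word."""
    return "".join(chr(w) for w in words[:words.index(0)])
```

Two successive calls hand back the same object, so both results show the digits of the value converted last. Copying the first result, which is what memcopy() does in the T fragment, keeps it intact.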
PACK(S, P) -- String, Pstring => Number
PACK() packs the unpacked string S into P by copying each machine word from S into a byte in P. Of course, all but the least significant 8 bits of each machine word get lost in P. S and P may denote the same location. In this case, S will be packed `in situ' and its original content will be overwritten. PACK() returns the number of machine words required by the packed string P (including the terminating NUL character and padding bytes). P must provide enough space to hold the packed string. Any value which is greater than half the length of S is safe. The exact length of the packed string P is
length of S + BPW - 1
---------------------
         BPW
where BPW is the number of bytes per machine word on the target machine. For Tcode programs, BPW=2 applies on all platforms.
UNPACK(P, S) -- Pstring, String => Number
This procedure unpacks the packed string P into S by copying each byte from P into a separate machine word in S. Each character will be extended by filling all but its least significant eight bits with zeroes. Because unpacking a string expands its size, P and S may not reference the same location. If they do, the result of UNPACK() is undefined. The procedure returns the number of machine words required to store the unpacked string including the terminating NUL word. S must provide enough space to hold the unpacked string. Generally, four times the length of P plus 1 is safe on 16- and 32-bit systems. To compute the exact amount of required storage, reverse the formula in the description of PACK().
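The relation between the two string formats can be tried out with the following Python sketch. It models unpacked strings as lists of integers and packed strings as byte strings. It assumes that the `length of S' in the formula of PACK() counts the terminating NUL, and it uses BPW=2, the value given above for Tcode programs, as the default.

```python
def pack(words, bpw=2):
    """Pack a NUL-terminated list of words; return (bytes, words used)."""
    chars = words[:words.index(0) + 1]            # keep the terminating NUL
    data = bytes(w & 0xFF for w in chars)         # low 8 bits of each word
    nwords = (len(chars) + bpw - 1) // bpw        # round up to whole words
    return data.ljust(nwords * bpw, b"\0"), nwords


def unpack(data):
    """Unpack bytes up to and including the first NUL; return (words, count)."""
    words = list(data[:data.index(0) + 1])
    return words, len(words)
```

Packing the three words of "Hi" plus NUL needs two 16-bit words, one of which holds the NUL and a padding byte. Unpacking the result gives back three machine words.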
The procedures described in this section are extensions which have been added when T3X was created. They do not exist in earlier versions. Since interfaces to these procedures will not be implicitly defined, the respective declarations will be included in the descriptions in this subsection.
INTERFACE WRITEPACKED(3) = 12;
WRITEPACKED(Fd, Buffer, Len) -- Fdesc, Vec, Num => Num
This routine writes len bytes from buffer to the file (or device) associated with the file descriptor fd. A valid descriptor may be obtained from OPEN(). The standard descriptors may be used, too (see SELECT() in the previous section). Unlike in the T3 I/O procedures, the file descriptor is specified explicitly here. SELECT() has no effect on the extended I/O procedures. WRITEPACKED() returns the number of bytes actually written. Any return value which is less than len indicates failure.
INTERFACE READPACKED(3) = 11;
READPACKED(Fd, Buffer, Len) -- Fdesc, Vec, Num => Num
This procedure reads up to len bytes from the file (or device) associated with the file descriptor fd into buffer. A file descriptor may be obtained from OPEN(). The standard descriptors explained in the SELECT() entry may be used, too. Like in all extended I/O procedures, the file descriptor is specified explicitly and therefore, SELECT() has no effect on this procedure. READPACKED() returns the number of bytes actually read. When reading from a terminal device, the return value may be smaller than len, since data is transferred line by line and READPACKED() may return when a newline character is encountered. A zero return value indicates that the end of the input file has been reached or an EOF character has been typed. A negative return value indicates general failure.
INTERFACE REPOSITION(4) = 13;
REPOSITION(Fd, PosH, PosL, How) -- Fdesc, Num, Num, Num => Num
REPOSITION() moves the file pointer of the file descriptor fd to a new position. The file pointer always points to the offset in the file where the next read/write operation will take place. PosH and posL contain the new position. PosH specifies its high word and posL its low word. Consequently, the new position will be
PosH .* 65536 + PosL
The value of how determines the method of moving the pointer:
How  | Method
-----+--------------------------------
  0  | From the beginning of the file
  1  | Relative to current position
  2  | From the end of the file
When moving from the beginning of the file, the position must be positive and when moving from the end, it must be negative. Note that negative values must be 32-bit values. To move back 1024 bytes, for example, use
REPOSITION(fd, %1, %1024, 1);
The procedure returns zero on success and a negative value in case of failure.
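How a 32-bit offset is spread over PosH and PosL can be verified with a few lines of Python. The sketch assumes 16-bit machine words, as in Tcode programs, and two's complement representation; the function names are made up for the illustration.

```python
def _signed16(word):
    """Read a 16-bit pattern as a signed machine word."""
    return word - 0x10000 if word >= 0x8000 else word


def split_offset(offset):
    """Split a signed 32-bit offset into (PosH, PosL) as signed 16-bit words."""
    raw = offset & 0xFFFFFFFF                 # two's complement, 32 bits
    return _signed16(raw >> 16), _signed16(raw & 0xFFFF)


def join_offset(pos_h, pos_l):
    """Compute PosH .* 65536 + PosL and read it as a signed 32-bit value."""
    raw = ((pos_h & 0xFFFF) * 65536 + (pos_l & 0xFFFF)) & 0xFFFFFFFF
    return raw - 0x100000000 if raw >= 0x80000000 else raw
```

Splitting the offset -1024 gives the pair (-1, -1024), which corresponds to the arguments %1 and %1024 in the call above: the high word of a negative 32-bit value consists of one bits only.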
INTERFACE RENAME(2) = 14;
RENAME(Old, New) -- String, String => Number
RENAME() changes the name of the directory entry stored in old to new. Both strings are unpacked. When a path is specified in old, only the last part of the path -- the file name -- will be changed to new. New may not contain any path separators. RENAME() returns zero upon success and a negative value in case of failure.
INTERFACE MEMCOPY(3) = 15;
MEMCOPY(D, S, L) -- Vector, Vector, Number => 0
MEMCOPY() copies L bytes from the source vector S to the destination vector D. Since it cannot fail under normal circumstances, it does not return a meaningful value. Of course, MEMCOPY() could be easily implemented as a T program itself, but the runtime support routine is written in assembly language for higher efficiency. The regions D and S may overlap.
INTERFACE MEMCOMP(3) = 16;
MEMCOMP(R1, R2, L) -- Vector, Vector, Number => Number
This routine compares up to L bytes at corresponding positions in R1 and R2. When L positions have been compared without finding a difference, both memory regions are considered equal and MEMCOMP() returns zero. When two different bytes are found during the comparison, the procedure stops comparing and returns the difference between the two values
R1::p - R2::p
where p is the current position. Therefore, MEMCOMP() can be used for lexically sorting strings, for example.
Like MEMCOPY(), MEMCOMP() has been implemented in assembly language for efficiency reasons.
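The comparison described above is easily expressed in a few lines. The following Python sketch works on byte strings and treats each byte as an unsigned value, which is an assumption of the sketch; it illustrates the result of MEMCOMP(), not its assembly language implementation.

```python
def memcomp(r1, r2, length):
    """Compare up to length bytes; return 0 or the first difference."""
    for p in range(length):
        if r1[p] != r2[p]:
            return r1[p] - r2[p]    # R1::p - R2::p
    return 0
```

A negative result means that R1 sorts before R2, a positive result that it sorts after R2, and zero that the first L bytes are equal.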
This manual is part of the T3X compiler package which is distributed under the following terms.
T3X -- A Compiler for the Procedural Language T, version 3X
Copyright (C) 1996-2000 Nils M Holm. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.