An array is a container object that can contain many values of one data type. Arrays are very useful objects and are indispensable for certain types of programming. The purpose of this chapter is to describe how arrays are defined and used in the S-Lang language.
The S-Lang language supports multi-dimensional arrays of all data types. Since the Array_Type is a data type, one can even have arrays of arrays. To create a multi-dimensional array of SomeType use the syntax
SomeType [dim0, dim1, ..., dimN]
Here dim0, dim1, ... dimN specify the size of the individual dimensions of the array. The current implementation permits arrays to consist of up to 7 dimensions. When a numeric array is created, all its elements are initialized to zero. The initialization of other array types depends upon the data type, e.g., String_Type and Struct_Type arrays are initialized to NULL.
As a concrete example, consider
a = Integer_Type [10];
which creates a one-dimensional array of 10 integers and assigns it to a.
Similarly,
b = Double_Type [10, 3];
creates a 30 element array of double precision numbers arranged in 10 rows and 3 columns, and assigns it to b.
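As a quick check of the sizes and initial values described above, the following sketch creates both kinds of array and inspects them with the length intrinsic. The variable names and messages are illustrative only, and the comparison against NULL assumes that an uninitialized String_Type element compares equal to NULL.

```slang
% Illustrative sketch: sizes and default initialization of new arrays.
variable a = Integer_Type [10];        % 1-d array of 10 integers
variable b = Double_Type [10, 3];      % 10 rows by 3 columns
variable s = String_Type [2];          % a non-numeric array

vmessage ("a has %d elements, b has %d", length (a), length (b));
vmessage ("a[0] = %d", a[0]);          % numeric elements start at zero
if (s[0] == NULL)
  message ("s[0] is NULL");            % string elements start as NULL
```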
There is a more convenient syntax for creating and initializing a 1-d array. For example, to create an array of ten integers whose elements run from 1 through 10, one may simply use:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
Similarly,
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];
specifies an array of ten doubles.
An even more compact way of specifying a numeric array is to use a range-array. For example,
a = [0:9];
specifies an array of 10 integers whose elements range from 0 through 9. The most general form of a range array is
[first-value : last-value : increment]
where the increment is optional and defaults to 1. This creates an array whose first element is first-value and whose successive values differ by increment. last-value sets an upper limit upon the last value of the array as described below.
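If the range array [a:b:c] is integer valued, then the interval specified by a and b is closed. That is, the kth element of the array x_k is given by x_k=a+kc and must satisfy a<=x_k<=b when c is positive (the inequalities are reversed when c is negative). Hence, the number of elements in an integer range array is given by the expression 1 + (b-a)/c, where the division is integer division; if this value is not positive, the array is empty.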
The situation is somewhat more complicated for floating point range arrays. The interval specified by a floating point range array [a:b:c] is semi-open such that b is not contained in the interval. In particular, the kth element of [a:b:c] is given by x_k=a+kc such that a<=x_k<b when c>0, and b<x_k<=a when c<0. The number of elements in the array is one greater than the largest k that satisfies the semi-open interval constraint.
Here are a few examples that illustrate the above comments:
[1:5:1] ==> [1,2,3,4,5]
[1.0:5.0:1.0] ==> [1.0, 2.0, 3.0, 4.0]
[5:1:-1] ==> [5,4,3,2,1]
[5.0:1.0:-1.0] ==> [5.0, 4.0, 3.0, 2.0]
[1:1] ==> [1]
[1.0:1.0] ==> []
[1:-3] ==> []
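The difference between the closed integer interval and the semi-open floating point interval is easiest to see by comparing lengths. The following sketch only illustrates the rules above; the variable names are arbitrary.

```slang
% Illustrative sketch: the same end points give different lengths.
variable ia = [1:5];            % integer range: both end points included
variable fa = [1.0:5.0:1.0];    % floating point range: 5.0 is excluded
variable ea = [1:-3];           % empty: the first value already exceeds the last

vmessage ("integer range has %d elements", length (ia));
vmessage ("floating point range has %d elements", length (fa));
vmessage ("empty range has %d elements", length (ea));
```

By the rules given above, these should report 5, 4, and 0 elements respectively.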
Another way to create an array is to apply the dereference operator @ to the DataType_Type literal Array_Type. The actual syntax for this operation resembles a function call
variable a = @Array_Type (data-type, integer-array);
where data-type is of type DataType_Type and integer-array is a 1-d array of integers that specify the size of each dimension. For example,
variable a = @Array_Type (Double_Type, [10, 20]);
will create a 10 by 20 array of doubles and assign it to a. This method of creating arrays is more flexible than the other methods discussed in this section because the data type, and even the number of dimensions, may be determined at run time. We shall encounter it again in section ??? in the context of the array_info function.
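To illustrate that flexibility, here is a minimal sketch in which the number of dimensions is not known until the function is called. The function name make_cube and its parameters are hypothetical and exist only for this example.

```slang
% Hypothetical helper: create an array of doubles with ndims dimensions,
% each of size n.  Neither value needs to be known when the code is written.
define make_cube (ndims, n)
{
   variable dims = Integer_Type [ndims];
   dims[*] = n;                          % every dimension has size n
   return @Array_Type (Double_Type, dims);
}

variable c = make_cube (3, 4);           % a 4 by 4 by 4 array
vmessage ("c has %d elements", length (c));
```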
It is sometimes possible to change the `shape' of an array using the reshape function. For example, a 1-d 10 element array may be reshaped into a 2-d array consisting of 5 rows and 2 columns. The only restriction on the operation is that the arrays must be commensurate, that is, the new shape must hold the same total number of elements as the old one. The reshape function follows the syntax
reshape (array-name, integer-array);
where array-name specifies the array to be reshaped to have the dimensions given by integer-array, a 1-dimensional array of integers. It is important to note that this does not create a new array; it simply reshapes the existing array. Thus,
variable a = Double_Type [100];
reshape (a, [10, 10]);
turns a into a 10 by 10 array.
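The in-place nature of reshape can be observed with the array_info function, which is described later in this chapter. The sketch below is illustrative; it assumes array_info returns the dimension array, the number of dimensions, and the data type in that order, as in the examples near the end of the chapter.

```slang
% Illustrative sketch: reshape changes the existing array, not a copy.
variable a = Integer_Type [10];
variable dims, num_dims, type;

reshape (a, [5, 2]);                     % 10 elements as 5 rows by 2 columns
(dims, num_dims, type) = array_info (a);
vmessage ("%d dimensions: %d by %d", num_dims, dims[0], dims[1]);
vmessage ("still %d elements", length (a));
```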
An individual element of an array may be referred to by its index. For example, a[0] specifies the zeroth element of the one dimensional array a, and b[3,2] specifies the element in row 3 and column 2 of the two dimensional array b. As in C, array indices are numbered from 0, so these are the fourth row and third column when counting from one. Thus if a is a one-dimensional array of ten integers, the last element of the array is given by a[9]. Using a[10] would result in a range error.
A negative index may be used to index from the end of the array, with a[-1] referring to the last element of a, a[-2] referring to the next to the last element, and so on.
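A short sketch of positive and negative indexing on the same array may help; the array contents are arbitrary.

```slang
% Illustrative sketch: indexing from the front and from the end.
variable a = [10, 20, 30, 40];

vmessage ("first = %d", a[0]);           % zeroth element
vmessage ("last = %d", a[-1]);           % same element as a[3]
vmessage ("next to last = %d", a[-2]);   % same element as a[2]
```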
One may use the indexed value like any other variable. For example, to set the third element of an integer array to 6, use
a[2] = 6;
Similarly, that element may be used in an expression, such as
y = a[2] + 7;
Unlike other S-Lang variables, which inherit a type upon assignment, array elements already have a type. For example, an attempt to assign a string value to an element of an integer array will result in a type-mismatch error.
One may use any integer expression to index an array. A simple example that computes the sum of the elements of a 10-element 1-d array is
variable i, sum;
sum = 0;
for (i = 0; i < 10; i++) sum += a[i];
Unlike many other languages, S-Lang permits arrays to be indexed by other integer arrays. Suppose that a is a 1-d array of 10 doubles. Now consider:
i = [6:8];
b = a[i];
Here, i is a 1-dimensional range array of three integers with i[0] equal to 6, i[1] equal to 7, and i[2] equal to 8. The statement b = a[i]; will create a 1-d array of three doubles and assign it to b. The zeroth element of b, b[0], will be set to the sixth element of a, or a[6], and so on. In fact, these two simple statements are equivalent to
b = Double_Type [3];
b[0] = a[6];
b[1] = a[7];
b[2] = a[8];
except that using an array of indices is not only much more convenient, but executes much faster.
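The index array need not be a range. As a minimal sketch with arbitrary values, the following picks out both a contiguous run and an arbitrary selection of elements:

```slang
% Illustrative sketch: indexing one array with another.
variable a = [0.0, 1.5, 3.0, 4.5, 6.0, 7.5, 9.0, 10.5, 12.0, 13.5];

variable b = a[[6:8]];        % elements a[6], a[7], a[8]
variable r = a[[9, 0]];       % any integer array may serve as the index

vmessage ("b has %d elements, starting with %g", length (b), b[0]);
vmessage ("r = %g, %g", r[0], r[1]);
```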
More generally, one may use an index array to specify which elements are to participate in a calculation. For example, consider
a = Double_Type [1000];
i = [0:499];
j = [500:999];
a[i] = -1.0;
a[j] = 1.0;
This creates an array of 1000 doubles and sets the first 500 elements to -1.0 and the last 500 to 1.0. Actually, one may do away with the i and j variables altogether and use
a = Double_Type [1000];
a [[0:499]] = -1.0;
a [[500:999]] = 1.0;
It is important to understand the syntax used and, in particular, to note that a[[0:499]] is not the same as a[0:499]. In fact, the latter will generate a syntax error.
Often, it is convenient to use a rubber range to specify indices. For example, a[[500:]] specifies all elements of a whose index is greater than or equal to 500. Similarly, a[[:499]] specifies the first 500 elements of a. Finally, a[[:]] specifies all the elements of a; however, using a[*] is more convenient.
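The following sketch applies rubber ranges to a small array so that the affected elements are easy to check by hand; the sizes and values are arbitrary.

```slang
% Illustrative sketch: rubber ranges as indices.
variable a = Double_Type [10];   % all elements start at zero

a[[5:]] = 1.0;                   % indices 5 through 9
a[[:1]] = -1.0;                  % indices 0 and 1

vmessage ("%g %g %g %g", a[1], a[2], a[4], a[5]);
```

Only the elements covered by the two rubber ranges change; a[2] through a[4] keep their initial value of zero.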
Now consider a multi-dimensional array. For simplicity, suppose that a is a 100 by 100 array of doubles. Then the expression a[0, *] specifies all elements in the zeroth row. Similarly, a[*, 7] specifies all elements in the seventh column. Finally, a[[3:5],[6:12]] specifies the 3 by 7 region consisting of rows 3, 4, and 5, and columns 6 through 12 of a.
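The shapes of such slices can be confirmed with array_info, which is described later in this chapter. In the sketch below the two index ranges are separated by a comma, one per dimension; the variable names are illustrative.

```slang
% Illustrative sketch: extracting a row and a rectangular region.
variable a = Double_Type [100, 100];
variable dims, num_dims, type;

variable row = a[0, *];              % all 100 elements of row 0
variable block = a[[3:5], [6:12]];   % rows 3..5, columns 6..12

(dims, num_dims, type) = array_info (block);
vmessage ("row has %d elements", length (row));
vmessage ("block is %d by %d", dims[0], dims[1]);
```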
We conclude this section with a few examples.
Here is a function that computes the trace (sum of the diagonal elements) of a square 2 dimensional n by n array:
define array_trace (a, n)
{
   variable sum = 0, i;
   for (i = 0; i < n; i++) sum += a[i, i];
   return sum;
}
This fragment creates a 10 by 10 integer array, sets its diagonal elements to 5, and then computes the trace of the array:
a = Integer_Type [10, 10];
for (j = 0; j < 10; j++) a[j, j] = 5;
the_trace = array_trace(a, 10);
We can get rid of the for loop as follows:
j = Integer_Type [10, 2];
j[*,0] = [0:9];
j[*,1] = [0:9];
a[j] = 5;
Here, the goal was to construct a 2-d array of indices that correspond to the diagonal elements of a, and then use that array to index a. To understand how this works, consider the middle statements. They are equivalent to the following for loops:
variable i;
for (i = 0; i < 10; i++) j[i, 0] = i;
for (i = 0; i < 10; i++) j[i, 1] = i;
Thus, row n of j will have the value (n,n), which is precisely what was sought.
Another example of this technique is the function:
define unit_matrix (n)
{
   variable a = Integer_Type [n, n];
   variable j = Integer_Type [n, 2];
   j[*,0] = [0:n - 1];
   j[*,1] = [0:n - 1];
   a[j] = 1;
   return a;
}
This function creates an n by n unit matrix, that is, a 2-d n by n array whose elements are all zero except on the diagonal where they have a value of 1.
When an array is created and assigned to a variable, the interpreter allocates the proper amount of space for the array, initializes it, and then assigns to the variable a reference to the array. So, a variable that represents an array has a value that is really a reference to the array. This has several consequences, some good and some bad. It is believed that the advantages of this representation outweigh the disadvantages. First, we shall look at the positive aspects.
When a variable is passed to a function, it is always the value of the variable that gets passed. Since the value of a variable representing an array is a reference, a reference to the array gets passed. One major advantage of this is rather obvious: it is a fast and efficient way to pass the array. This also has another consequence that is illustrated by the function
define init_array (a, n)
{
   variable i;
   for (i = 0; i < n; i++) a[i] = some_function (i);
}
where some_function is a function that generates a scalar value to initialize the ith element. This function can be used in the following way:
variable X = Double_Type [100000];
init_array (X, 100000);
Since the array is passed to the function by reference, there is no need to make a separate copy of the 100000 element array. As pointed out above, this saves both execution time and memory. The other salient feature to note is that any changes made to the elements of the array within the function will be manifested in the array outside the function. Of course, in this case, this is a desirable side-effect.
To see the downside of this representation, consider:
variable a, b;
a = Double_Type [10];
b = a;
a[0] = 7;
What will be the value of b[0]? Since the value of a is really a reference to the array of ten doubles, and that reference was assigned to b, b also refers to the same array. Thus any changes made to the elements of a will also be made implicitly to b, and b[0] will be 7.
This raises the question: If the assignment of one variable which represents an array to another variable results in the assignment of a reference to the array, then how does one make separate copies of the array? There are several answers including using an index array, e.g., b = a[*]; however, the most natural method is to use the dereference operator:
variable a, b;
a = Double_Type [10];
b = @a;
a[0] = 7;
In this example, a separate copy of a will be created and assigned to b. It is very important to note that S-Lang never implicitly dereferences an object. So, one must explicitly use the dereference operator. This means that the elements of a dereferenced array are not themselves dereferenced. For example, consider dereferencing an array of arrays, e.g.,
variable a, b;
a = Array_Type [2];
a[0] = Double_Type [10];
a[1] = Double_Type [10];
b = @a;
In this example, b[0] will be a reference to the array that a[0] references because a[0] was not explicitly dereferenced.
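The three cases discussed above (plain assignment, an explicit copy, and a copy of an array of arrays) can be put side by side. This is a minimal sketch with hypothetical variable names; the inner arrays are fetched into temporary variables before being indexed.

```slang
% Illustrative sketch: references versus copies.
variable a = Double_Type [10];
variable b_alias = a;           % another reference to the same array
variable b_copy = @a;           % a separate copy
a[0] = 7;
vmessage ("alias sees %g, copy sees %g", b_alias[0], b_copy[0]);

variable aa = Array_Type [1];
aa[0] = Double_Type [3];
variable bb = @aa;              % copies the outer array only
variable inner_a = aa[0];
variable inner_b = bb[0];       % refers to the same inner array as aa[0]
inner_a[1] = 5;
vmessage ("inner element seen through the copy: %g", inner_b[1]);
```

The alias reports 7 while the copy still reports 0, and the change made through aa[0] is visible through bb[0] because only the outer array was copied.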
Many functions and operations work transparently with arrays. For example, if a and b are arrays, then the sum a + b is an array whose elements are formed from the sum of the corresponding elements of a and b. A similar statement holds for all other binary and unary operations.
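A small sketch of element-wise operations follows; the values are arbitrary and the variable names are illustrative.

```slang
% Illustrative sketch: operators applied element by element.
variable a = [1, 2, 3];
variable b = [10, 20, 30];

variable c = a + b;             % binary operation on two arrays
variable d = -a;                % unary operation
variable e = 2 * b;             % a scalar combined with an array

vmessage ("c = %d %d %d", c[0], c[1], c[2]);
vmessage ("d = %d %d %d", d[0], d[1], d[2]);
vmessage ("e = %d %d %d", e[0], e[1], e[2]);
```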
Let's consider a simple example. Suppose that we wish to solve a set of n quadratic equations whose coefficients are given by the 1-d arrays a, b, and c. In general, the solution of a quadratic equation will be two complex numbers. For simplicity, suppose that all we really want is to know what subset of the coefficients, a, b, c, correspond to real-valued solutions. In terms of for loops, we can write:
variable i, d, index_array;
index_array = Integer_Type [n];
for (i = 0; i < n; i++)
  {
     d = b[i]^2 - 4 * a[i] * c[i];
     index_array [i] = (d >= 0.0);
  }
In this example, element i of index_array will be non-zero if the corresponding set of coefficients a[i], b[i], c[i] has a real-valued solution. This code may be written much more compactly and with more clarity as follows:
variable index_array = ((b^2 - 4 * a * c) >= 0.0);
S-Lang has a powerful built-in function called where. This function takes an array of integers and returns an integer array of indices that correspond to where the elements of the input array are non-zero. This simple operation is extremely useful. For example, suppose a is a 1-d array of n doubles, and it is desired to set to zero all elements of the array whose value is less than zero. One way is to use a for loop:
for (i = 0; i < n; i++)
  if (a[i] < 0.0) a[i] = 0.0;
If n is a large number, this statement can take some time to execute. The optimal way to achieve the same result is to use the where function:
a[where (a < 0.0)] = 0;
Here, the expression (a < 0.0) returns an array whose dimensions are the same size as a but whose elements are either 1 or 0, according to whether or not the corresponding element of a is less than zero. This array of zeros and ones is then passed to where which returns an integer array of indices that indicate where the elements of a are less than zero. Finally, those elements of a are set to zero.
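The intermediate steps can be made visible by storing the result of where in a variable before using it. This is a sketch with arbitrary values; the name idx is illustrative.

```slang
% Illustrative sketch: clamp negative values to zero using where.
variable a = [-1.5, 2.0, -0.5, 3.0];

variable idx = where (a < 0.0);      % indices of the negative elements
vmessage ("%d negative elements", length (idx));

a[idx] = 0.0;                        % same effect as a[where (a < 0.0)] = 0.0
vmessage ("%g %g %g %g", a[0], a[1], a[2], a[3]);
```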
As a final example, consider once more the example involving the set of n quadratic equations presented above. Suppose that we wish to get rid of the coefficients of the previous example that generated non-real solutions. Using an explicit for loop requires code such as:
variable i, j, nn, tmp_a, tmp_b, tmp_c;
nn = 0;
for (i = 0; i < n; i++)
  if (index_array [i]) nn++;
tmp_a = Double_Type [nn];
tmp_b = Double_Type [nn];
tmp_c = Double_Type [nn];
j = 0;
for (i = 0; i < n; i++)
  {
     if (index_array [i])
       {
          tmp_a [j] = a[i];
          tmp_b [j] = b[i];
          tmp_c [j] = c[i];
          j++;
       }
  }
a = tmp_a;
b = tmp_b;
c = tmp_c;
Not only is this a lot of code, it is also clumsy and error-prone. Using the where function, this task is trivial:
variable i;
i = where (index_array != 0);
a = a[i];
b = b[i];
c = c[i];
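Putting the pieces together, here is a self-contained sketch with three made-up equations, of which only the last two have a non-negative discriminant. The coefficient values are arbitrary and chosen so that the result is easy to verify by hand.

```slang
% Illustrative sketch: keep only the equations with real-valued solutions.
variable a = [1.0, 1.0, 1.0];
variable b = [0.0, 2.0, 3.0];
variable c = [1.0, 1.0, 1.0];

% Discriminants are -4, 0 and 5, so the first equation is discarded.
variable i = where ((b^2 - 4 * a * c) >= 0.0);
a = a[i];
b = b[i];
c = c[i];

vmessage ("%d equations have real-valued solutions", length (a));
```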
All the examples up to now assumed that the dimensions of the array were known. Although the intrinsic function length may be used to get the total number of elements of an array, it cannot be used to get the individual dimensions of a multi-dimensional array. However, the function array_info may be used to get information about an array, such as its data type and size. The function returns three values: an integer array containing the size of each dimension, the number of dimensions, and the data type. It may be used to determine the number of rows of an array as follows:
define num_rows (a)
{
   variable dims, type, num_dims;
   (dims, num_dims, type) = array_info (a);
   return dims[0];
}
The number of columns may be obtained in a similar manner:
define num_cols (a)
{
   variable dims, type, num_dims;
   (dims, num_dims, type) = array_info (a);
   if (num_dims > 1) return dims[1];
   return 1;
}
Another use of array_info is to create an array that has the same dimensions as another array:
define make_int_array (a)
{
   variable dims, num_dims, type;
   (dims, num_dims, type) = array_info (a);
   return @Array_Type (Integer_Type, dims);
}
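As one more illustration of array_info, the following sketch multiplies the sizes of the individual dimensions together and compares the result with length. The function name num_elements is hypothetical and is not part of S-Lang.

```slang
% Hypothetical helper: total element count computed from the dimensions.
define num_elements (a)
{
   variable dims, num_dims, type, i, n = 1;
   (dims, num_dims, type) = array_info (a);
   for (i = 0; i < num_dims; i++) n *= dims[i];
   return n;
}

variable m = Double_Type [4, 5];
vmessage ("%d elements by dimensions, %d by length",
          num_elements (m), length (m));
```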