One of the most powerful features of S-Lang is the transparent manner in which scalars and arrays may be used in the same context. For example,
y = sin(x)
produces a scalar result when x is a scalar, or a vector when x
is a 1D array, and so on. This feature is referred to as vectorization,
since it obviates the need to explicitly loop over array indices in
S-Lang scope; instead, the loop is implied and executes in compiled C
scope, yielding significantly greater performance. Vectorization
encompasses more than the simple promotion of scalar arguments to
arrays, however. More generally, we say a function is vectorized when
any of its arguments may be of multiple ranks; no distinction is made
between the promotion of a scalar (rank 0) argument to 1D or higher,
a 2D array to 3D, and so forth. When a vectorized function is
invoked with an argument whose rank exceeds that of its default usage
(e.g. as specified in a C function prototype), we say that both the
argument and the invocation itself are vectored.
Vectorization naturally complements the strong array handling facilities within S-Lang. It provides powerful multidimensional mathematical capability, nearly analytic in its expressiveness, and with performance on par with -- or exceeding -- that of compiled C code and commercial analysis packages; together these make S-Lang a very strong platform for numerical analysis in science and engineering. The fact that vectorization is native to S-Lang distinguishes it from other popular scripting languages like Perl, Python, or Tcl, which natively lack high-performance multidimensional numerical capability. Array handling within S-Lang is typically much faster than its counterparts in these languages, as much as an order of magnitude in some cases, as well as more concise because it's built into the very fabric of the language.
SLIRP extends this capability, with commensurate performance and economy
of codesize, to a wide range of external C, C++, and Fortran codes; this
enables wrappers to be called with either vectored or non-vectored
semantics, and with few restrictions on the quantity, datatype, or rank
of the arguments to functions that may be automatically vectorized.
The -openmp option may be used to tune vectorized code so that OpenMP-aware compilers
will automatically parallelize the wrappers, boosting performance even
further in the presence of multiple cpus (or cores).
As a concrete example of the advantage of vectorization, here are
performance ratios of the S-Lang strlen intrinsic versus a vectorized
wrapper for strlen generated by SLIRP :
The vectorized strlen is roughly 1 to 2 orders of magnitude faster than
the optimal native S-Lang usage (employing array_map to iterate
over the string array); other S-Lang looping constructs -- such as
for, while, or even loop --
yield an even greater performance advantage for the vectorized strlen.
The quality of SLIRP vectorization is evident in the following plots
which demonstrate that the generated wrappers for cos and sin
have vectorized performance equivalent to the hand-crafted cos and
sin intrinsics native to S-Lang (see the examples/vec
directory for examples and more performance details).
Vectorization is a transformation of the entire function semantics, in the sense that every argument received by the wrapper is tagged to optionally accept a vectored value. A vectorized function may be called with both vectored and non-vectored arguments at runtime, but at code generation time SLIRP will either vectorize every argument accepted by the wrapper or none at all. When SLIRP cannot vectorize one or more arguments of a function (see below) it will instead generate a standard, non-vectorized wrapper. There are several ways to request that a function be vectorized:
-vec option, which causes SLIRP to attempt to
vectorize every function in the interface being wrapped#vectorize block#vectorize block-openmp option, in which case vectorized wrappers may be
parallelized with OpenMPFor example, given two functions with C prototypes of
void vmult (double *x, double *y, double *result, int len);
size_t strlen (const char *s);
an interface file containing
#vectorize
strlen
void vmult(double *x, double *y, double *OUT, int DIM1);
#end
would yield a module with vectorized wrappers for the strlen and
vmult functions (these and other examples are included within the
examples/vec subdirectory of SLIRP).
While the types in the vectorized prototype for
vmult match, of course, that of the underlying function being wrapped,
the parameter names chosen change how the function will be called from
S-Lang scope. First, as discussed in the previous chapter, the
double *OUT parameter is an output #argmap, which transforms the
result argument into an output value of the wrapper. Second, DIM1
denotes that the final argument describes the length of each array
and that it, too, may
be omitted when calling the wrapper. S-Lang calls to vmult therefore
operate as if it accepts two arrays and returns 1, all of Double_Type. If
we were to load the module into slsh, for example, and invoke vmult
with no arguments
slsh> import("vec")
slsh> vmult
Usage: double[] = vmult(double[],double[])
This function has been vectorized.
slsh>
(these examples assume a S-Lang 2 shell, with semicolon appending turned on),
the usage statement reflects this, as well as the fact that the function
has been vectorized. The DIM1 annotation also plays another important
role, in that it signifies to SLIRP that before the wrapper calls vmult
proper it must ensure that the array arguments are all of equal length:
slsh> vmult([1,2,3], [3,4])
Array shape or length mismatch
slsh> vmult([1,2,3], 4)
Scalar cannot be used here
Without such a precaution the underlying vmult function would likely yield
undefined results or memory corruption and an application crash. Two
examples of calling vmult correctly, then, are
slsh> print( vmult([1,2,3], [5,5,5]) )
5
10
15
slsh> Arr = Int_Type[2,3]
slsh> Arr[0,*] = 5
slsh> Arr[1,*] = 100
slsh> print( vmult(Arr, [3, 4, 5]) )
15 20 25
300 400 500
Observe that the first call is not vectored, because vmult is
prototyped to expect an array, assumed to be of dimension 1, and that's
what was passed in; vmult is called only once by the wrapper. The second
call is vectored, because one of its arguments is a 2D array; now the
wrapper calls vmult twice, once as
vmult([5, 5, 5], [3, 4, 5], retval, 3)
and again as
vmult([100, 100, 100], [3, 4, 5], retval+3, 3)
(where retval is a double* dynamically allocated to store
6 values) and yields the 2D array
| 15 20 25 |
| 300 400 500 |
To convince yourself that the latter is actually true, consider:
slsh> vmult(Arr, [3, 4, 5])
Double_Type[2, 3]
Note that the vectorization behavior in the second case resembles that
of the array_map() intrinsic when passed arguments of different
lengths: only the first element of shorter arrays will be (re)used
in each iteration. Here the vector [3,4,5] is treated as the
first element of a vacuously 2D array of dimension 1x3.
This flexibility can yield shorter and cleaner code, e.g.
by avoiding the creation of temporary container arrays simply to
satisfy dimensionality constraints.
Finally, it's worth noting that even though Arr
and [3,4,5] are arrays of Integer_Type, S-Lang transparently casts them
to Double_Type when the arguments are transferred to the wrapper.
At runtime SLIRP uses a few simple metrics to decide
To begin, each argument passed to a wrapper has an expected rank: a
non-negative integer representing its dimensionality, or the number of
indices required to uniquely identify a single element within the argument.
SLIRP infers the expected rank of each function argument
directly from its prototype, at code generation time.
For example, the primary arguments of vmult and strlen from the preceding
section are of ranks 1 and 0: when invoked as prototyped the primary inputs
to vmult are 1D arrays, while strlen expects a scalar (one C-style string),
which for generality is considered an array of dimension zero. An
N-dimensional C array such as
double matrix[3][3][5];
has a rank of N (3 in this case), with the proviso that arguments such as
int something(double x[3][5])
int something_else(double **x)
are not commensurate. The first function will be vectorized with rank
2; the second, however, will be vectorized with rank 1, because
double** describes a 1D array of pointers, represented as opaque
types in S-Lang scope.
A pointer argument may be tagged as multidimensional, though, through
the use of multiple DIM annotations. For example, a C prototype of
void sub1_2d(int *matrix, int numrows, int numcols);
vectorized with
#vectorize
void sub1_2d(int *matrix, int DIM1, int DIM2);
#end
yields a wrapper which accepts 1 argument, an integer array of rank 2.
The value of the second and third arguments to the underlying C
function will be calculated automatically from the input S-Lang array.
The actual rank of an argument is its dimensionality as passed
at runtime. When the actual rank of any argument exceeds its expected
rank, SLIRP needs to determine how many times the wrapped function should
be called, or the number of iterations of the vectorization. In
the preceding vmult example this value is 2; extended to 3D
slsh> Arr3D = Double_Type[2,2,3]
slsh> Arr3D[0, *, *] = Arr
slsh> Arr3D[1, *, *] = 2 * Arr
slsh> result = vmult(Arr3D, [7,8,9] )
vmult proper will be called 4 times, and the wrapper will yield
the following 2x2x3 array
| | 35 40 45 | |
| | 700 800 900 | |
| |
| | 70 80 90 | |
| | 1400 1600 1800 | |
SLIRP determines the number of iterations by selecting a master
array M, simply the input argument of highest rank, and then computing
the product of its excess dimensions. Formally, if A
and E respectively represent the actual and expected ranks
of M, and D is a vector of length A describing the size of each
dimension of M (in row-major form), then
with the expected proviso that
is by definition when K < J.
Finally, SLIRP determines what to pass to each wrapped function invocation
by calculating a stride for each input argument; this indicates by how
much the index into the given argument --- viewed as a linear sequence of
contiguous elements --- should be advanced after each call. SLIRP aims
for flexibility by allowing arrays of different shapes to be used within
a single vectored call. For instance, in the 3D vmult call above the
strides of the first and second arguments are 3 and 0, respectively;
within the wrapper Arr3D and [7, 8, 9] are effectively represented as
double *arg1 = {5, 5, 5, 100, 100, 100, 10, 10, 10, 200, 200, 200};
double *arg2 = {7, 8, 9};
the return value is allocated as
double *retval = malloc( sizeof(double) * 12);
and the 4 calls to vmult proper are executed as
vmult(arg1, arg2, retval, 3);
vmult(arg1+3, arg2+0, retval+3, 3);
vmult(arg1+6, arg2+0, retval+6, 3);
vmult(arg1+9, arg2+0, retval+9, 3);
The stride of the master array M, and all isomorphic arguments, is thus
the number of elements in M not covered by its excess dimensions; that
is, the product of its expected dimensions (rank)
To see why, recall that the product of all dimensions of M, i.e. the
total number of elements, is
and must be equal to the product of the excess and expected dimensions
Because the first term here is the number of iterations, the stride may
be expressed concisely as
An error will be signaled if arguments not isomorphic to M have number
of elements not equal to the stride of M; such arguments will otherwise
be assigned a stride of 0.
In the majority of cases SLIRP attempts to faithfully vectorize every function requested. However, it will refuse to vectorize functions
#novectorize block.--with-nvec=NEW_VALUE, although
you should consider that functions with too many input arguments may indicate
questionable design.
As noted earlier, the -openmp option will tune the emission of vectorized wrappers
for parallelization with a suitable OpenMP-aware compiler, making it very easy
to gain potentially significant performance increases on multiprocessor
(or multicore) machines. Consider the plots below, which indicate the
performance of SLIRP OpenMP-aware wrappers for a selection of commonly used ANSI
C math functions, over a range of real-valued arrays. The performance
numbers for the first plot were
generated with 2 1.8 GHz AMD Athlon MP 2200 processors, on Debian Gnu/Linux
3.1 (Sarge) using the 20070221 pre-release version of GCC 4.2; the second
set of numbers was generated with 4 750Mhz sparcv9 processors, on Solaris
5.9 with Sun Studio 9. The module and scripts used to generate and plot
these performance data, as well as the data themselves, are located in the
examples/openmp subdirectory of the SLIRP distribution. The plots
were generated within the ISIS modeling and analysis tool (
http://space.mit.edu/cxc/isis/).
At present the -openmp option cannot be used to wrap Fortran functions. It also
reduces the flexibility of striding, in the sense that each array passed
to a parallelized wrapper must have a stride of 1.