In part
one of this series comparing JScheme with SISC, I said "My
overall theme is that their design goals are different, and that
they should be judged accordingly." I'd say that this claim
was moderately borne out by the remainder of the first two parts,
which dealt with the syntactic differences between the two Scheme
implementations' ways of calling Java code. If your goal
were to call Java from Scheme while writing as little Scheme code
as possible, JScheme would have an advantage. SISC's way of
doing it, on the other hand, is more integrated with the overall
"Scheme way" of doing things; this is an advantage for SISC
(though not all would agree that it is a significant advantage),
so this qualifies as an example of a tradeoff: a decision in
which each available choice has both advantages and disadvantages.
Now I want to return to one of the points discussed in the post
preceding the one quoted above, the post titled A Better Scheme:
another kind of difference between JScheme and SISC, namely how they
handle numbers.
In a subsequent post, I will follow this up by explaining how this
difference in the treatment of numbers affects certain aspects of how
you call Java from each, that is, from JScheme or from SISC.
Having done that, I hope to make an even stronger argument for the
contention that designing a programming language (including a Scheme
implementation) is a matter of tradeoffs: that there are cases where
making the language "better" in one respect cannot help but make it
"worse" in another.
In doing so, by the way, I aim to make this post a bit more accessible
than the last two: not to assume, as they did, that the reader already
understands the basics of Scheme programming.
I won't cover numbers in general, just integers (that is, whole
numbers). Talking about fractions would introduce complexity
that isn't needed to make my point ... and all the more so if I were to
add those creatures that are actually called "complex numbers".
In general, a programming language will have a default integer "type":
a default way of representing, in the computer, a value that is
declared, or otherwise known, to be an integer. And in many
languages, probably most, a value of that default integer type takes
up a fixed amount of space in the computer memory. In Java, the
default integer type is called int, and it takes up 32 bits.
It turns out, after allowing for negative as well as positive values,
that the largest possible positive value representable by a
Java int is 2^31 - 1 (raise two to the
thirty-first power, then subtract one).
(Some readers, with technical backgrounds, will find this
obvious. Others may be more than happy to take it on
faith. But perhaps you don't fall into either of these
categories; perhaps you'd like to see an explanation of how 32 bits
yields 2^31 - 1 as the largest possible positive value.
I've written something which attempts to explain this, and made it
available at the following location under my
"home page"
at The
Well: http://www.well.com/user/edelsont/software/concepts/integer-range.html.)
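If you would rather just see those limits than derive them, Java itself
will tell you: the Integer class publishes them as the constants
MIN_VALUE and MAX_VALUE. The little program below (the class name is
just mine, for illustration) prints them.

```java
public class IntRange {
    public static void main(String[] args) {
        // The smallest and largest values a 32-bit Java int can hold.
        System.out.println(Integer.MIN_VALUE);  // -2147483648, that is, -(2^31)
        System.out.println(Integer.MAX_VALUE);  //  2147483647, that is, 2^31 - 1
    }
}
```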
The main point is that there is a fixed maximum value that can be
represented as a Java int. So what happens if a
Java program takes two values, each represented that way, and tells
the computer to multiply them together? It will give you back
the answer in the form of another Java int, which has,
naturally enough, the same maximum value.
Perhaps you can see that there is a potential problem here: one which
may arise, or not, depending on the actual values of the original
two ints when the program is run. Even though
neither of the values is greater than 2^31 - 1, their product
could still be greater than that number, and thus too large to be
represented as a Java int. Any such event, where
the true result of a calculation will not fit in the data type being
used to represent it, is called an "overflow". What will Java do
in such an event?
You might expect that the Java system would detect this and
automatically "promote" the result to some other data type, which is
capable of representing its value accurately. Another
possibility is that it might indicate that it is unable to give you a
correct answer for this particular calculation. For example, you
might think of what some spreadsheet programs do when a column isn't
wide enough to hold the number it's supposed to display: they fill the cell
with asterisks, rather than displaying a number at all.
In fact, Java does neither of these things; it simply gives you a
wrong answer. That is, when the correct mathematical result is
not within the range of integers representable as
Java ints, Java gives you an answer which is within the
range, but is not correct. It may even tell you that the product
of two positive numbers is negative.
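Here is a small, self-contained illustration (the program and its class
name are mine; the numbers are just ones chosen to force an overflow,
but the wrap-around behavior it shows is standard 32-bit Java
arithmetic).

```java
public class OverflowDemo {
    public static void main(String[] args) {
        int a = 100000;
        int b = 100000;
        // The true product is 10,000,000,000, far beyond 2^31 - 1; the 32-bit
        // result wraps around to a smaller, but still positive, wrong answer.
        System.out.println(a * b);   // prints 1410065408

        int c = 50000;
        int d = 50000;
        // The true product is 2,500,000,000; here the wrap-around even makes
        // the product of two positive numbers come out negative.
        System.out.println(c * d);   // prints -1794967296
    }
}
```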
If you are writing a Java program, and it includes some arithmetic
calculations, and there is a possibility that the values being fed
into the calculations may produce overflow, there are things you can
do to prevent the program from giving the user the wrong
answer when overflow does occur. But in order to accomplish
this, you have to tell the computer to do the calculation in a
different way, thus making the program you write more complex than it
would be if you just used the nice, simple Java notation for
doing multiplication.
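To give a sense of what that extra work looks like, here is a sketch of
two common workarounds (the class and variable names are mine; the long
type and java.math.BigInteger are standard Java). Neither is as simple
as just writing a * b, which is exactly the point.

```java
import java.math.BigInteger;

public class SafeMultiply {
    public static void main(String[] args) {
        int a = 50000;
        int b = 50000;

        // Option 1: widen to 64-bit long before multiplying.  This works as
        // long as the true product fits in a long (i.e. is less than 2^63).
        long wide = (long) a * (long) b;
        System.out.println(wide);    // prints 2500000000

        // Option 2: use BigInteger, whose values grow as needed, so the
        // result is mathematically correct no matter how large it gets.
        BigInteger exact = BigInteger.valueOf(a).multiply(BigInteger.valueOf(b));
        System.out.println(exact);   // prints 2500000000
    }
}
```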
All this is true of Java, and of a lot of other programming languages
as well: of most of them, I'm reasonably certain. But Scheme is
one of the exceptions. According to the standard which
specifies how a Scheme implementation is supposed to work, when a
Scheme program says to multiply two integers together, it's supposed to
get the right answer.
Actually, the Scheme programmer can specify that the starting values
are not "exact" to begin with, in which case the Scheme system is
allowed to avoid the extra work required in order to get the exact
right answer. But, in the case of integers, at least, this is
not the default behavior: the programmer has to go out of her way to
specify that exact results are not needed, and, if she hasn't done
that, exact results she will get. This is the opposite of the
behavior of Java, and of most other programming languages, in this
type of situation.
How is it possible for a Scheme implementation to accomplish
this? If you have a fixed amount of computer memory in which to
represent a value, that necessarily places a limit on the
range of values that can be represented. So, in essence, there's
only one way to meet this requirement of accuracy. A Scheme
implementation, in order to conform to the standard in this respect,
must not place a fixed limit on the number of bits (the amount of
memory) used to represent an integer. It also must not leave it
up to the authors of Scheme programs to specify, by choice of data
type, how much memory is to be allocated for any particular integer
value, such as the result of a calculation.
Instead, the Scheme implementation must determine, at run time, how
much memory space will be required to hold the mathematically correct
result, and make that space available. ("At run time" means when
the program is being run, as opposed, say, to when it is
written.) The amount of space will depend on the actual values
of the numbers being fed into the calculation; and those values are
usually not known when the program is being written, since they may
come, directly or indirectly, from run-time inputs to the program.
So in general, the memory requirements for integer values in a Scheme
program cannot be determined until run time. This makes it
harder to create a Scheme implementation which conforms to the
standard. On the other hand, it makes it easier to write
reliable Scheme programs, if you know that they will "run under" [be
executed by] a conforming implementation.
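I can't speak to SISC's actual internals, but the general idea can be
sketched in a few lines of Java using the standard java.math.BigInteger
class, whose representation grows at run time to fit whatever value it
has to hold (the class name and the loop are mine, purely for
illustration).

```java
import java.math.BigInteger;

public class GrowingInteger {
    public static void main(String[] args) {
        // Start at 2^30 and keep doubling.  BigInteger allocates however much
        // space each new value needs, so none of these results is wrong; the
        // printed numbers simply keep getting longer.
        BigInteger n = BigInteger.ONE.shiftLeft(30);   // 2^30
        for (int i = 0; i < 5; i++) {
            n = n.multiply(BigInteger.valueOf(2));
            System.out.println(n);   // 2^31, 2^32, ... each exactly right
        }
    }
}
```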
JScheme does not conform to this part of the Scheme
specification. Its default integer type, like Java's, has a
fixed amount of memory space allocated for it; in fact, JScheme's
default integer type is Java's default integer type.
And when you ask JScheme to multiply two numbers, each of which is
represented internally in the default integer type, it allocates the
same fixed amount of memory for the result, just as Java does.
And so, if the result of the calculation is too large to fit in that
fixed amount of memory, you get a wrong answer ... again, just as in
Java.
There was an example of this in the post titled A Better Scheme,
dated February 11, 2008. The example involved the number
2^30 (two to the thirtieth power). I said "If you ask
JScheme to multiply this number by two, it tells you that the answer
is negative." I didn't say so at the time, but there was a
reason for picking this specific sample multiplication: its correct
answer is 2^31. Now you know that the biggest number
that can be represented accurately, using JScheme's (and Java's)
default integer type, is 2^31 - 1: the true result of the specified
calculation is too big to fit ... by a margin of one.
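Translated into Java terms (the code is mine, but the wrap-around it
shows is exactly what a 32-bit int does), the example looks like this.

```java
public class MarginOfOne {
    public static void main(String[] args) {
        int n = 1 << 30;        // 2^30 = 1073741824, fits comfortably in an int
        int doubled = n * 2;    // true answer is 2^31, one more than the maximum
        System.out.println(doubled);   // prints -2147483648
    }
}
```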
So JScheme gets the wrong answer, as Java does. Perhaps this is
what was meant (or at least part of what was meant) by a remark that I
quoted
in part
one, to the effect that JScheme can be thought of as a "Scheme
skin" over Java. (That was said, as you may recall, by one of
those developers principally responsible for maintaining JScheme.)
A Scheme skin over Java! That's a metaphor, of course. But it
seems to mean about the same thing as saying that JScheme is Java
dressed up to look like Scheme; and that, in turn, could be said to
imply that JScheme isn't really Scheme.
A more formal, and less loaded, way of putting the point is that
JScheme has the syntax of Scheme, but, at least in this aspect of its
handling of integers, it has the semantics of Java.
And really, to me, "less loaded" is better. A purist, in the
light of something like this, might call JScheme a faux Scheme ...
and conclude that, obviously, no real Scheme programmer would ever
touch it.
I'm not that sort of purist. I don't think that a decision about
whether to use JScheme for something should be based on whether
JScheme is "really Scheme": it should be based on whether JScheme
meets the requirements of the job you need to do.
That said, however ... I think that for many projects, this aspect of
JScheme's handling of integers does count significantly against
it. I don't want to have to do extra work to be sure of getting
right answers from an arithmetic calculation. Nor do I want to
spend time thinking about whether that is necessary: about whether I
know enough about the inputs to the calculation, the values of the
numbers being fed in, to be confident that the JScheme/Java semantics
will produce correct results.
In earlier days, programmers thought very carefully about such things,
because calculations using a fixed amount of memory consume fewer machine
resources than ones using a variable amount, and machine resources
used to be expensive enough that programmers needed to spend a lot of
their time finding ways to minimize the use of them. But the
cost of programmer time has risen, while the cost of machine resources
has fallen, so that it rarely makes sense, any more, to expend
significant programmer effort on saving resources in these sorts of
ways.
SISC does conform to this part of the Scheme specification: it doesn't
have a default integer type with a fixed size limit; it just adjusts
the amount of memory, and thus the range of values that can accurately
be represented, as needed.
So SISC removes this source of possible wrong answers for me, without
the extra effort on my part. That contributes significantly to
my tending to prefer SISC over JScheme; and I would say, more
generally, that it gives SISC a significant advantage over
JScheme. (By the same reasoning, it gives any conforming Scheme
implementation a significant advantage over Java, and over most other
programming languages as well.)
With this advantage, however, comes a disadvantage. You get the
advantage without the disadvantage, if you are writing pure Scheme
code. But if you also need to write explicit calls from Scheme
to Java code, then this same design decision, unavoidably, makes it
more laborious to do it in SISC than it is in JScheme.
This sounds like the same issue which got the most attention in both
parts one and two of this series: JScheme's "Java Dot Notation", which
also makes calling Java easier from JScheme than from SISC, but which
some criticize for being a departure from the simple Scheme
syntax. But it's not the same issue: it, too, concerns calling
Java from SISC versus from JScheme, but this time it's specifically
about passing parameters, and integer parameters in particular,
between the two.
The next post in the series will explain how this design decision, the
same one which makes it easier to be sure that you will get correct
arithmetic results from SISC, also makes it more difficult to pass
integer parameters from SISC to Java, as compared with doing so from
JScheme. Thus it will fulfill the promise to give a clear-cut
example of the claim that there are unavoidable tradeoffs in designing
a programming language ... or even, as in this case, in designing
[what you call] an implementation of an existing programming
language.
Categorie(s) for this post:
Scheme.
4:50:20 PM