Tom Edelson's Songline

Tom Edelson's Songline
Writing about computers, life, and society from the perspective of a "poly Quaker Taoist" living in the Triangle region of North Carolina.

My home page at The Well

Categories In This Blog:

Home page

About me

Blogging

Greyhounds

Macintosh

Personal Finance Software

Poetry

Politics and Social Change

Polyamory

Programming Languages

• Java

• Perl

• Scheme

Radio Userland

Retirement

Writing

Wednesday, April 9, 2008

Comparing JScheme and SISC: Tradeoffs in Programming Language Design (part 3)

In part one of this series comparing JScheme with SISC, I said "My overall theme is that their design goals are different, and that they should be judged accordingly." I'd say that this claim was moderately borne out by the remainder of the first two parts, which dealt with the syntactic differences between the two Scheme implementations' ways of calling Java code. If your goal were to call Java from Scheme while writing as little Scheme code as possible, JScheme would have an advantage. SISC's way of doing it, on the other hand, is more integrated with the overall "Scheme way" of doing things; this is an advantage for SISC (though not all would agree that it is a significant advantage), so this qualifies as an example of a tradeoff: of a decision in which each available choice has both advantages and disadvantages.

Now I want to return to one of the points discussed in the post preceding the one quoted above, the post titled A Better Scheme. That is, I want to return to another kind of difference between JScheme and SISC, namely how they handle numbers.

In a subsequent post, I will follow this up by explaining how this difference in the treatment of numbers affects certain aspects of how you call Java from each, that is, from JScheme or from SISC. Having done that, I hope to make an even stronger argument for the contention that designing a programming language (including a Scheme implementation) is a matter of tradeoffs: that there are cases where making the language "better" in one respect cannot help but make it "worse" in another.

In doing so, by the way, I aim to make this post a bit more accessible than the last two: not to assume, as they did, that the reader already understands the basics of Scheme programming.

I won't cover numbers in general, just integers (that is, whole numbers). Talking about fractions would introduce complexity that isn't needed to make my point ... and the more so if I were to add those creatures that are actually called "complex numbers".

In general, a programming language will have a default integer "type": a default way of representing, in the computer, a value that is declared, or otherwise known, to be an integer. And in many languages, probably most, a value of that default integer type takes up a fixed amount of space in the computer memory. In Java, the default integer type is called int, and it takes up 32 bits.

It turns out, after allowing for negative as well as positive values, that the largest possible positive value representable by a Java int is 2³¹-1 (raise two the the thirty-first power, then subtract one).

(Some readers, with technical backgrounds, will find this obvious. Others may be more than happy to take it on faith. But perhaps you don't fall into either of these categories; perhaps you'd like to see an explanation of how 32 bits yields 2³¹-1 as the largest possible positive value. I've written something which attempts to explain this, and made it available at the following location under my "home page" at The Well: http://www.well.com/user/edelsont/software/concepts/integer-range.html.)

The main point is that there is a fixed maximum value that can be represented as a Java int. So what happens if a Java program takes two values, each represented that way, and tells the computer to multiply them together? It will give you back the answer in the form of another Java int, which has, naturally enough, the same maximum value.

Perhaps you can see that there is a potential problem here: one which may arise, or not, depending on the actual values of the original two ints when the program is run. Even though neither of the values is greater than 2³¹-1, their product could still be greater than that number, and thus too large to be represented as a Java int. Any such event, where the true result of a calculation will not fit in the data type being used to represent it, is called an "overflow". What will Java do in such an event?

You might expect that the Java system would detect this and automatically "promote" the result to some other data type, which is capable of representing its value accurately. Another possibility is that it might indicate that it is unable to give you a correct answer for this particular calculation. For example, you might think of what some spreadsheet programs do when a column isn't wide enough to hold the number it's supposed to display: they fill the cell with asterisks, rather than displaying a number at all.

In fact, Java does neither of these things; it simply gives you a wrong answer. That is, when the correct mathematical result is not within the range of integers representable as Java ints, Java gives you an answer which is within the range, but is not correct. It may even tell you that the product of two positive numbers is negative.

If you are writing a Java program, and it includes some arithmetic calculations, and there is a possibility that the values being fed into the calculations may produce overflow, there are things you can do to prevent the program from giving the user the wrong answer when overflow does occur. But in order to accomplish this, you have to tell the computer to do the calculation in a different way, thus making the program you write more complex than it would be if you just used the nice, simple Java notation for doing multiplication.

All this is true of Java, and of a lot of other programming languages as well: of most of them, I'm reasonably certain. But Scheme is one of the exceptions. According to the standard which specifies how a Scheme implementation is supposed to work, when a Scheme program says to multiply two integers together, it's supposed to get the right answer.

Actually, the Scheme programmer can specify that the starting values are not "exact" to begin with, in which case the Scheme system is allowed to avoid the extra work required in order to get the exact right answer. But, in the case of integers, at least, this is not the default behavior: the programmer has to go out of her way to specify that exact results are not needed, and, if she hasn't done that, exact results she will get. This is the opposite of the behavior of Java, and of most other programming languages, in this type of situation.

How is it possible for a Scheme implementation to accomplish this? If you have a fixed amount of computer memory in which to represent a value, that necessarily places a limit on the range of values that can be represented. So, in essence, there's only one way to meet this requirement of accuracy. A Scheme implementation, in order to conform to the standard in this respect, must not place a fixed limit on the number of bits (the amount of memory) used to represent an integer. It also must not leave it up to the authors of Scheme programs to specify, by choice of data type, how much memory is to be allocated for any particular integer value, such as the result of a calculation.

Instead, the Scheme implementation must determine, at run time, how much memory space will be required to hold the mathematically correct result, and make that space available. ("At run time" means when the program is being run, as opposed, say, to when it is written.) The amount of space will depend on the actual values of the numbers being fed into the calculation; and those values are usually not known when the program is being written, since they may come, directly or indirectly, from run-time inputs to the program.

So in general, the memory requirements for integer values in a Scheme program cannot be determined until run time. This makes it harder to create a Scheme implementation which conforms to the standard. On the other hand, it makes it easier to write reliable Scheme programs, if you know that they will "run under" [be executed by] a conforming implementation.

JScheme does not conform to this part of the Scheme specification. Its default integer type, like Java's, has a fixed amount of memory space allocated for it; in fact, JScheme's default integer type is Java's default integer type. And when you ask JScheme to multiply two numbers, each of which is represented internally in the default integer type, it allocates the same fixed amount of memory for the result, just as Java does. And so, if the result of the calculation is too large to fit in that fixed amount of memory, you get a wrong answer ... again, just as in Java.

There was an example of this in the post titled A Better Scheme, dated February 11, 2008. The example involved the number 2³⁰ (two to the thirtieth power). I said "If you ask JScheme to multiply this number by two, it tells you that the answer is negative." I didn't say so at the time, but there was a reason for picking this specific sample multiplication: its correct answer is 2³¹. Now you know that the biggest number that can be represented accurately, using JScheme's (and Java's) default integer type, is 2³¹-1: the true result of the specified calculation is too big to fit ... by a margin of one.

So JScheme gets the wrong answer, as Java does. Perhaps this is what was meant (or at least part of what was meant) by a remark that I quoted in part one, to the effect that JScheme can be thought of as a "Scheme skin" over Java. (That was said, as you may recall, by one of those developers principally responsible for maintaining JScheme.)

A Scheme skin over Java! That's a metaphor, of course. But it seems to mean about the same thing as saying that JScheme is Java dressed up to look like Scheme; and that, in turn, could be said to imply that JScheme isn't really Scheme.

A more formal, and less loaded, way of putting the point is that JScheme has the syntax of Scheme, but, at least in this aspect of its handling of integers, it has the semantics of Java.

And really, to me, "less loaded" is better. A purist, in the light of something like this, might call JScheme a faux Scheme ... and think that obviously, therefore a real Scheme programmer would never touch it.

I'm not that sort of purist. I don't think that a decision about whether to use JScheme for something should be based on whether JScheme is "really Scheme": it should be based on whether JScheme meets the requirements of the job you need to do.

That said, however ... I think that for many projects, this aspect of JScheme's handling of integers does count significantly against it. I don't want to have to do extra work to be sure of getting right answers from an arithmetic calculation. Nor do I want to spend time thinking about whether that is necessary: about whether I know enough about the inputs to the calculation, the values of the numbers being fed in, to be confident that the JScheme/Java semantics will produce correct results.

In earlier days, programmers thought very carefully about such things, because calculations using a fixed amount of memory take less machine resources than do ones using a variable amount, and machine resources used to be expensive enough that programmers needed to spend a lot of their time finding ways to minimize the use of them. But the cost of programmer time has risen, while the cost of machine resources has fallen, so that it rarely makes sense, any more, to expend significant programmer effort on saving resources in these sorts of ways.

SISC does conform to this part of the Scheme specification: it doesn't have a default integer type with a fixed size limit, it just adjusts the amount of memory, and thus the range of values that can accurately be represented, as needed.

So SISC removes this source of possible wrong answers for me, without the extra effort on my part. That contributes significantly to my tending to prefer SISC over JScheme; and I would say, more generally, that it gives SISC a significant advantage over JScheme. (By the same reasoning, it gives any conforming Scheme implementation a significant advantage over Java, and over most other programming languages as well.)

With this advantage, however, comes a disadvantage. You get the advantage without the disadvantage, if you are writing pure Scheme code. But if you also need to write explicit calls from Scheme to Java code, then this same design decision, unavoidably, makes it more laborious to do it in SISC than it is in JScheme.

This sounds like the same issue which got the most attention in both parts one and two of this series: JScheme's "Java Dot Notation", which also makes calling Java easier from JScheme than from SISC, but which some criticize for being a departure from the simple Scheme syntax. But it's not the same issue: it's also about calling Java from SISC versus doing so from JScheme, but this time, it's more specifically about passing parameters, and integer parameters in particular, between the two.

The next post in the series will explain how this design decision, the same one which makes it easier to be sure that you will get correct arithmetic results from SISC, also makes it more difficult to pass integer parameters from SISC to Java, as compared with doing so from JScheme. Thus it will fulfill the promise to give a clear-cut example of the claim that there are unavoidable tradeoffs in designing a programming language ... or even, as in this case, in designing [what you call] an implementation of an existing programming language.

Categorie(s) for this post: Scheme.

4:50:20 PM comment []