[kaffe] Re: Computing remainders

Sun Jun 29 03:36:01 PDT 2003

Hi Tony,

thanks for the quick reply!

--- Tony Wyatt <wyattaw at optushome.com.au> wrote:
> Hi Luca and Dalibor,
> 
> On 28/06/03, you wrote:
> 
> >> The Motorola M68k series and (it would seem) the PowerPC series
> >> processors do not generate an exception, but silently generate the wrong
> >> answer.
> > 
> > Actually, I think the answer may be permitted by the ISO C 89 standard.
> > See http://www.lysator.liu.se/c/rat/c3.html#3-3-5 for the rationale why
> > C89 allowed some flexibility with respect to the result of the remainder
> > operation.
> > 
> On my m68k processor, the actual answer depends on the time of day and phase
> of the moon. I get very different results on the Amiga OS and Linux. The
> "answer", whatever may be allowable, is inconsistent.

I know how frustrating discussing undefined behaviour in C can be from my
lurking on the gcc mailing list, and news:comp.lang.std.c .

Since LONG_MIN % -1L results in undefined behavior in C, anything could happen.
I don't think that the undefined behavior even has to be consistent, as far as
the C language standard is concerned. C can be such a mess ;)

> > One still has to special-case the MIN_VALUE % -1 == 0 case, though. Since
> > for example LONG_MIN / -1 => LONG_MAX + 1 => overflow, resulting in
> > undefined behaviour.
> > 
>  
> According to the Java spec 15.17.2 (just above the "Remainder" para you
> quoted, the result of (LONG_MIN / -1L) is well defined and is equal to the
> dividend. It is this single point in the dividend/divisor plane that is in
> error on
> the m68k hardware. However, I agree that a special case is the most expedient
> fix.

I believe that the writers of the Java Lang spec made so much fuss about it,
precisely because division with negative operands in not well-defined in C89
(the old C standard, also known as ANSI C). In C89 -3 / 2 can be -1 or -2,
depending whether the C implementation rounds towards negative infinity or
towards zero. I believe that's the reason why they give all the examples on how
division is supposed to work. Since LONG_MIN / -1 is undefined, they added
another line defining its result in Java.

See for example the reasoning why glibc provides div_t and friends here:
http://www.gnu.org/manual/glibc-2.2.5/html_node/Integer-Division.html#Integer%20Division

> > I've "back-ported" your patch to configure.in, and checked it in. I think
> > that the patched test is causing the problems now, but I didn't notice it
> > since on ix86 the test fails with a floating point exception anyway.
> Well, humph. BTW, I think you mean an "Arithmetic exception" There is no FP
> involved.

That's the beauty of undefined behavior: it doesn't have to make sense. Here's
what I get on my i386-linux box with gcc-2.95.3:

topic at clerks:~> cat remainder.c 
#include <stdio.h>
#include <limits.h>
#define T long
#ifndef LONG_MIN
# define LONG_MIN ((((unsigned T)(~(T)0))>>1)+1)
#endif
T foo(T i, T j);
int main() { 
T tmp = foo(LONG_MIN, -1l); 
printf ("%ld\n", tmp);
return
0;}
T foo(T i, T j) { return i % j; }

topic at clerks:~> gcc remainder.c 
topic at clerks:~> ./a.out 
Floating point exception

As you say, there is no floating point expression anywhere in the code, but the
undefined behaviour doesn't care ;)

> > Here's the current test:
> > 
> > #include <limits.h>
> > #define T long
> > #ifndef LONG_MIN
> > # define LONG_MIN ((((unsigned T)(~(T)0))>>1)+1)
> > #endif
> > T foo(T i, T j);
> > int main() { return foo(LONG_MIN, -1l) == 0; }
> The "==" sign is the only change I made. It requires the result to be equal
> to zero, whereas the original test only required the test to complete without
> error. 

Yeah, I liked your patch, since it made obvious what the test was supposed to
check. I've posted a patch that clarifies the test a bit further to explicitely
return EXIT_FAILURE and EXIT_SUCCESS. Did you have a chance to take a look at
it?

Here's what I believe is the problem with the current test:
return foo(LONG_MIN, -1l) == 0

The boolean expression is true when LONG_MIN % -1l == 0, i.e. when LONG_MIN %
-1L works like in the Java spec. It is false ( i.e. zero in C) otherwise. 

Returning 0 from the main function in C means that the program completed its
task successfully [1]. the configure script interprets that as 'hey, it worked!
long modulo is not broken, then', when in fact it is. 

So I'd prefer to explicitely return EXIT_SUCCESS and EXIT_FAILURE to make the
code even more explicit. I think the line above should be rewritten as:

#include <stdlib.h>

/* snip */

if (foo(LONG_MIN, -1l) == 0) {
  return EXIT_SUCCESS;
}

return EXIT_FAILURE;

/* snip */

Could you give it a try on the Amiga ?

> > T foo(T i, T j) { return i % j; }
> > 
> > According to what I've said above, the test is broken anyway, since
> > LONG_MIN % -1 will always overflow, resulting in undefined behavior. It
> > has to be special cased.
> > 
> > The other issue is % with negative operands, which is not well-defined by
> > C89. So I'd like to propose something along these lines for icode.h :
> > 
> Why can't we rely on the java spec for the required behaviour? It defines
> the result according to standard mathematical principles (those that I
> learned many years ago, anyway).

We can't rely on Java spec, because kaffe's interpreter is implemented in C ;)

C is a language that is not as well defined in some corner cases as Java. So
when we have to implement Java's behaviour in C, we have to brace for the worst
(for example x86 throwing very silly exceptions) and work around undefined,
implementation-defined and unspecified behaviour [2] ;)

> > Analogous implementations should be made for long remainder, and both int
> > and long division. As usual, I'd like to hear your comments.
> > 
> Sorry, DT, but I think you're over-reacting. I don't think the solution
> needs to be that verbose. I think that the single special case we had
> previously is sufficient. Unless there is another platform that gives the
> wrong result, the two erring processors are well served by the fix we have
> (the x86 gives an exception, the m68k gives a randomly wrong result).

If we are just talking about x86 and m68k, then there is no special case
necessary: let's simply make the current special case the default case. 

That would fix the undefined behaviour issue on all platforms. 

It would also implement what the Java spec demands on those platforms where /
with negative operands in C rounds towards zero and remainder is negative only
if one operand is negative. I don't know if kaffe has been ported to platforms
where that's not the case.

> All that code makes me shiver. I guess it's not likely to slow an
> interpreter much further, but in the JIT, it'd be murder.

I also think my code quite ugly ;)

I tried to write an as portable as possible implementation of Java's remainder
in C. We could use it as a special case for the interpreter on platforms where
the C implementation doesn't have the same semantics as java with respect to /
and % on negative operands.

Writing JIT code is different from writing in pure C. In JIT, you have
well-specified behaviour in the processor specification, so you can pick the
right opcode and flag combination for the job. In C, I don't have that level of
control on all platforms, so I have to plan for the worst case. ;)

Which of course doesn't mean that one has to give up on optimizing for the
platform at hand. So I think we should have both implementations in the
interpreter, the current special case, and my slow-but-portable implementation.

> Now I don't know about other platforms, but I did a trick with the JIT(1)
> on the m68k platform. In "jit-m68k.def", I made a new "remainder" function
> that checks the divisor and returns zero if the divisor is (-1L). I haven't
> yet implemented this trick in JIT3, since the remainder function there is
> quite different.

Sounds good to me. That's similar to the route I'd like to go for the
interpreter as well.

1. make the current special case (#ifdef LONG_MODULO_BROKEN) the only case.
That should fix powerpc, x86 and m68k.
2. remove the according configure.in checks. since they become unnecessary.
3. write configure.in checks to see if C / with negative args rounds to 0
4. write configure.in checks to see if C % with negative args is negative
5. implement Java / and % on those platforms where the checks above fail using
my slow div_t based implementation.
6. put the division and remainder code in its own header file

what do you think about it?

cheers,
dalibor topic

[1] Actually one should return EXIT_SUCCESS, which is #defined to 0 on all
platforms I know.

[2] The C Standard provides a few levels of 'we don't know what's going to
happen when you do that'. One of the reasons why I prefer Java over C is that
writing portable code is a piece of cake, compared to C. The latest C standard,
C99, has 27 pages (Annex J) dedicated to just listing portability issues in the
C language standard. That's before one can even start to care about differences
between C89 and C99, broken compilers and standard library implementations etc.

__________________________________
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com