Overflow error--what could cause this?
Clay Blankenship
2008-06-27 20:59:41 UTC
I am running a very large program--an implementation of the Land
Information System--on a Linux cluster (1 processor) using Intel
Fortran 90. I am getting an overflow error with the following
message.

7200.000 3.0411966E+07 6.5823113E+10
7200.000 3.0402494E+07 6.5785352E+10
7200.000 3.0392248E+07 6.5744241E+10
7200.000 3.0386762E+07 6.5722331E+10
7200.000 2.1660269E+10 1.7544675E+12
forrtl: error (72): floating overflow
Image            PC                Routine            Line     Source
LIS-0(mpi:***@b  00000000012F3960  sheels_main_       722      sheels_main.F90
LIS-0(mpi:***@b  000000000041BED8  Unknown            Unknown  Unknown
LIS-0(mpi:***@b  0000000000BB7A12  lsm_module_mp_lis  392      lsm_module.F90
LIS-0(mpi:***@b  0000000001226AAB  retrospective_run  185      retrospective_runMod.F90
LIS-0(mpi:***@b  000000000041D391  Unknown            Unknown  Unknown
LIS-0(mpi:***@b  0000000000B94EA3  MAIN__             71       lisdrv.F90
LIS-0(mpi:***@b  00000000004058F2  Unknown            Unknown  Unknown
libc.so.6        0000002A961B8AAA  Unknown            Unknown  Unknown
LIS-0(mpi:***@b  000000000040582A  Unknown            Unknown  Unknown

The exact line where it dies can be something as simple as setting one
variable equal to another. I have rewritten the code several
functionally identical ways and I have gotten the crash on all of the
following lines:

(1) temp_dt=min(diffterm,drainterm)

(2) temp_dt=diffterm

(3) if (darcydt.gt.diffterm) then

All of these variables are locally declared REALs.
The last output line before the error, above, shows the value of
darcydt, diffterm, and drainterm just before the crash. These are
large but legitimate numbers. This program goes through this section
of code thousands of times before it dies (always at the same place).

I know there is not enough information here to diagnose what is wrong,
but I am at a loss as to where to look. Is it possible that something
is going wrong elsewhere but causing the diagnostic here?

Thanks,
Clay Blankenship

National Space Science and Technology Center
Huntsville, AL
glen herrmannsfeldt
2008-06-27 22:45:18 UTC
Clay Blankenship wrote:
(snip)
Post by Clay Blankenship
The exact line where it dies can be something as simple as setting one
variable equal to another. I have rewritten the code several
functionally identical ways and I have gotten the crash on all of the
following lines:
(1) temp_dt=min(diffterm,drainterm)
(2) temp_dt=diffterm
(3) if (darcydt.gt.diffterm) then
All of these variables are locally declared REALs.

As you say, it is a little hard to tell.

Note, though, that a single-precision REAL overflows at about 3.4E38.
Data may be held in internal registers with a larger range, with
overflow detected only when the value is actually stored, so it is
possible to overflow even with just an assignment.
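
A minimal sketch of the effect described above, not taken from the LIS code: a value that fits comfortably in double precision is narrowed to a default REAL, and the overflow only shows up at that narrowing store. It assumes a compiler that provides the Fortran 2003 intrinsic module ieee_exceptions; the program and variable names are made up for illustration.

```fortran
program narrowing_overflow_demo
  ! Hypothetical illustration; none of these names come from the LIS source.
  use, intrinsic :: ieee_exceptions
  implicit none
  integer, parameter :: sp = kind(1.0)      ! default REAL
  integer, parameter :: dp = kind(1.0d0)    ! DOUBLE PRECISION
  real(dp), volatile :: wide                ! volatile: discourage compile-time folding
  real(sp) :: narrow
  logical  :: overflowed

  ! Largest finite default REAL (about 3.4E+38 for IEEE single precision).
  print *, 'huge(1.0) = ', huge(1.0_sp)

  ! Record the overflow flag instead of trapping, so it can be inspected.
  if (ieee_support_halting(ieee_overflow)) then
     call ieee_set_halting_mode(ieee_overflow, .false.)
  end if
  call ieee_set_flag(ieee_overflow, .false.)

  wide = 1.0e30_dp
  wide = wide * wide          ! 1E60: representable in the wider format
  narrow = real(wide, sp)     ! does not fit in single precision

  call ieee_get_flag(ieee_overflow, overflowed)
  print *, 'overflow signalled by the narrowing assignment: ', overflowed
  print *, 'narrow = ', narrow
end program narrowing_overflow_demo
```

If halting on overflow were enabled instead, which is roughly what a build that traps floating-point exceptions does, the same store would be expected to abort with a floating overflow even though the statement is only an assignment.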

-- glen
r***@sun.com
2008-06-28 00:41:13 UTC
Post by Clay Blankenship
I am running a very large program--an implementation of the Land
Information System--on a Linux cluster (1 processor) using Intel
Fortran 90. I am getting an overflow error with the following
message.
7200.000 3.0411966E+07 6.5823113E+10
7200.000 3.0402494E+07 6.5785352E+10
7200.000 3.0392248E+07 6.5744241E+10
7200.000 3.0386762E+07 6.5722331E+10
7200.000 2.1660269E+10 1.7544675E+12
forrtl: error (72): floating overflow
(snip)
The exact line where it dies can be something as simple as setting one
variable equal to another. I have rewritten the code several
functionally identical ways and I have gotten the crash on all of the
following lines:
(1) temp_dt=min(diffterm,drainterm)
(2) temp_dt=diffterm
(3) if (darcydt.gt.diffterm) then
All of these variables are locally declared REALs.

If the situation is as you describe, none of those statements
could cause an overflow exception. The only statements that
could cause an overflow exception are the assignments, and if
the two sides of the assignment have the same types, even
that possibility is eliminated, barring a compiler error
or compiling for an x87. I would need much stronger evidence
to suspect a compiler error. The x87 case can be ruled out
by using the option -fp_port.

Post by Clay Blankenship
The last output line before the error, above, shows the value of
darcydt, diffterm, and drainterm just before the crash. These are
large but legitimate numbers. This program goes through this section
of code thousands of times before it dies (always at the same place).
I know there is not enough information here to diagnose what is wrong,
but I am at a loss as to where to look. Is it possible that something
is going wrong elsewhere but causing the diagnostic here?

If the compiler is generating x87 code, that is a strong
possibility.

Bob Corbett
Catherine Rees Lay
2008-07-09 09:02:40 UTC
Clay Blankenship wrote:
(snip)
Post by Clay Blankenship
The exact line where it dies can be something as simple as setting one
variable equal to another. I have rewritten the code several
functionally identical ways and I have gotten the crash on all of the
following lines:
(1) temp_dt=min(diffterm,drainterm)
(2) temp_dt=diffterm
(3) if (darcydt.gt.diffterm) then
All of these variables are locally declared REALs.
The last output line before the error, above, shows the value of
darcydt, diffterm, and drainterm just before the crash. These are
large but legitimate numbers. This program goes through this section
of code thousands of times before it dies (always at the same place).
I know there is not enough information here to diagnose what is wrong,
but I am at a loss as to where to look. Is it possible that something
is going wrong elsewhere but causing the diagnostic here?
Any time you get an error message which makes no sense at all, and
changing the code randomly moves it to another location, the most likely
problem is that you've overwritten a bit of memory somewhere else
entirely. And the most likely way of doing that is that you have gone
beyond the bounds of an array. Another possibility is argument
mismatches in a subroutine call - but in my experience it's nearly
always the array bounds, especially since it's happening so far into the
run. Does your compiler have array bounds checking, and is it practical
for you to enable it? I know it may not be for a very large and
long-running program...if not, and it's always dying on the same
iteration, you at least have a starting point to look at.
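
As a small illustration of the kind of bug being described, here is a hypothetical sketch (not from LIS) of a loop that runs one element past the end of an array. Built with run-time bounds checking, for example `-check bounds` (or `-CB`) with Intel Fortran or `-fcheck=bounds` with gfortran, the bad store is expected to be reported at the offending line; without it, the write silently lands in whatever happens to sit next to the array.

```fortran
program bounds_demo
  ! Hypothetical illustration; the names are not from the LIS source.
  implicit none
  integer, parameter :: nlayers = 10
  real    :: soil(nlayers)
  real    :: diffterm              ! stand-in for an unrelated local variable
  integer :: i, last

  diffterm = 3.0e7
  soil     = 0.0
  last     = nlayers + 1           ! deliberate off-by-one error

  do i = 1, last
     soil(i) = real(i)             ! i = 11 stores outside soil(1:10)
  end do

  ! Without bounds checking the behaviour is undefined: diffterm, or any
  ! other neighbouring data, may have been overwritten by the stray store.
  print *, 'diffterm = ', diffterm
end program bounds_demo
```

Because the damage depends on how the compiler lays out memory, the symptom can move or vanish when unrelated code is edited, which matches the behaviour reported in the original post.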

Catherine.
--
Catherine Rees Lay

Polyhedron Software Ltd. Registered Office: Linden House,
93 High St, Standlake, Witney, OX29 7RH, United Kingdom.
Registered in England No.2541693. Vat Reg No. GB 537 3214 57
robin
2008-07-09 21:50:44 UTC
Post by Clay Blankenship
I am running a very large program--an implementation of the Land
Information System--on a Linux cluster (1 processor) using Intel
Fortran 90. I am getting an overflow error with the following
message.
7200.000 3.0411966E+07 6.5823113E+10
7200.000 3.0402494E+07 6.5785352E+10
7200.000 3.0392248E+07 6.5744241E+10
7200.000 3.0386762E+07 6.5722331E+10
7200.000 2.1660269E+10 1.7544675E+12
forrtl: error (72): floating overflow

You'd have to look at the machine code in the executable in the
vicinity of the failing address and compare it with the instruction
images the compiler generated for those statements.

It seems to me that something has been corrupted,
and that can occur because of a subscript error,
a substring error, or mismatched arguments.
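
Of the three causes listed, mismatched arguments can often be caught at compile time. A minimal sketch, assuming the routine can be moved into a module so that its interface is explicit (the module and routine names here are invented for the example):

```fortran
module flux_utils
  ! Hypothetical module; placing the routine here gives it an explicit interface.
  implicit none
contains
  subroutine scale_terms(terms, n, factor)
    integer, intent(in)    :: n
    real,    intent(inout) :: terms(n)
    real,    intent(in)    :: factor
    terms = terms * factor
  end subroutine scale_terms
end module flux_utils

program argument_check_demo
  use flux_utils, only: scale_terms
  implicit none
  real :: terms(3)

  terms = (/ 1.0, 2.0, 3.0 /)
  call scale_terms(terms, size(terms), 2.0)
  print *, terms

  ! With the explicit interface, a call such as
  !    call scale_terms(terms, size(terms), 2.0d0)
  ! is rejected by the compiler (DOUBLE PRECISION actual, REAL dummy).
  ! As an external routine without an interface, the same call could
  ! compile and misread the argument at run time.
end program argument_check_demo
```

Subscript and substring errors, by contrast, generally need run-time checking to be detected.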
David Flower
2008-07-10 08:28:13 UTC
Post by Clay Blankenship
(snip)
Is it possible to run the program using a totally different compiler
and/or operating system?

This is especially helpful if the compiler has a lot of checks that you
can set.

Dave Flower

PS: My experience is that getting a program to produce the same results
using different operating systems and compilers is a very good check of
that program.
