Clay Blankenship
2008-06-27 20:59:41 UTC
I am running a very large program--an implementation of the Land
Information System--on a Linux cluster (1 processor) using Intel
Fortran 90. I am getting an overflow error with the following
message.
7200.000 3.0411966E+07 6.5823113E+10
7200.000 3.0402494E+07 6.5785352E+10
7200.000 3.0392248E+07 6.5744241E+10
7200.000 3.0386762E+07 6.5722331E+10
7200.000 2.1660269E+10 1.7544675E+12
forrtl: error (72): floating overflow
Image              PC                Routine            Line     Source
LIS-0(mpi:***@b    00000000012F3960  sheels_main_       722      sheels_main.F90
LIS-0(mpi:***@b    000000000041BED8  Unknown            Unknown  Unknown
LIS-0(mpi:***@b    0000000000BB7A12  lsm_module_mp_lis  392      lsm_module.F90
LIS-0(mpi:***@b    0000000001226AAB  retrospective_run  185      retrospective_runMod.F90
LIS-0(mpi:***@b    000000000041D391  Unknown            Unknown  Unknown
LIS-0(mpi:***@b    0000000000B94EA3  MAIN__             71       lisdrv.F90
LIS-0(mpi:***@b    00000000004058F2  Unknown            Unknown  Unknown
libc.so.6          0000002A961B8AAA  Unknown            Unknown  Unknown
LIS-0(mpi:***@b    000000000040582A  Unknown            Unknown  Unknown
The exact line where it dies can be something as simple as setting one
variable equal to another. I have rewritten the code in several
functionally equivalent ways and have gotten the crash on each of the
following lines:
(1) temp_dt=min(diffterm,drainterm)
(2) temp_dt=diffterm
(3) if (darcydt.gt.diffterm) then
All of these variables are locally declared REALs.
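For reference, here is a minimal stand-alone sketch of the three variants, not the actual LIS source. The program name, the declarations, and the body of the IF block are placeholders I made up for illustration; the values are copied from the last output line before the crash. All three values are far below the single-precision limit (about 3.4E+38), so in isolation none of these statements should be able to overflow.

```fortran
program overflow_context
  implicit none
  ! Placeholder declarations: in the real code these are local REALs
  ! in the routine in sheels_main.F90.
  real :: darcydt, diffterm, drainterm, temp_dt

  ! Values copied from the last output line before the crash.
  darcydt   = 7200.000
  diffterm  = 2.1660269E+10
  drainterm = 1.7544675E+12

  ! Variant (1)
  temp_dt = min(diffterm, drainterm)

  ! Variant (2)
  temp_dt = diffterm

  ! Variant (3); the body of the IF is a placeholder.
  if (darcydt .gt. diffterm) then
     temp_dt = diffterm
  end if

  print *, darcydt, diffterm, drainterm, temp_dt
end program overflow_context
```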
The last output line before the error message above shows the values of
darcydt, diffterm, and drainterm just before the crash. These are
large but legitimate numbers. The program goes through this section
of code thousands of times before it dies (always at the same place).
I know there is not enough information here to diagnose what is wrong,
but I am at a loss as to where to look. Is it possible that something
is going wrong elsewhere but causing the diagnostic here?
Thanks,
Clay Blankenship
National Space Science and Technology Center
Huntsville, AL