Post by Lynn McGuire
I have decided to convert all the Hollerith in our f77 code to character
strings....
Does anyone know of a tool to automate this ? It looks like a swamp
to me !
It can be a swamp.
One occasionally sees codes which have a lot of CHARACTER*4 (and
sometimes CHARACTER*6 and CHARACTER*10) arrays. These are obvious relics
from a 'quick and dirty' Hollerith->character conversion. It may have been
quick, and is probably more reliable than Holleriths (which are usually
accompanied by a lot of non-Standard masking/shifting that can be
eliminated.) But some basic problems remain.
A systematic approach is needed. One question though: what compiler
are you targeting? My very first step would be to get the code into a form
that allows for more 'fearless' changing by getting it into a more
Fortran-90-like form. That is:
1.) Make sure the arguments of every caller and callee match in type, kind,
and rank. This is usually easy to do by creating a Fortran-90 MODULE which
'includes' all of the subroutines and functions:
   module all
   contains
     include 'sub_aa.f'
     include 'sub_ab.f'
     :
     include 'sub_zz.f'
   end module
The above takes about 30 seconds to create (using the ls -1 command and any modern
text editor.) More tedious is that one has to change all the END statements
to either END SUBROUTINE or END FUNCTION. But that can be largely automated
with a couple of simple shell scripts.
Another consideration in your case: I wouldn't be surprised to find a compiler
that fails to grok 0.5M LOC in a single module. I assume that your code
is broken into many functional subgroups, which suggests using at least
that many modules, one per subgroup.
You will be amazed at how many errors you never knew you had! But once repaired,
one can make fairly significant changes to the code and get immediate feedback
as to what broke.
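
As a rough sketch of the kind of feedback step 1.) buys you (the names
geom_mod, scale_vec and demo are made up for illustration, not taken from
anyone's code): once a routine lives in a module, the compiler sees its
interface at every call site and can reject a mismatched argument.

```fortran
! Hypothetical names throughout: geom_mod, scale_vec, demo.
module geom_mod
  implicit none
contains
  subroutine scale_vec(n, factor, x)
    integer, intent(in)    :: n
    real,    intent(in)    :: factor
    real,    intent(inout) :: x(n)
    x = factor * x
  end subroutine scale_vec
end module geom_mod

program demo
  use geom_mod
  implicit none
  real :: v(3)
  v = (/ 1.0, 2.0, 3.0 /)
  call scale_vec(3, 2.0, v)      ! type, kind, and rank all match
  ! call scale_vec(3, 2, v)      ! integer passed for a real dummy: with the
  !                              ! routine in a module this is a compile-time
  !                              ! error instead of a silent run-time bug
  print *, v
end program demo
```

With the routine compiled separately as an external f77 subroutine, the
commented-out call would typically compile without complaint.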
2.) Make sure all your COMMON blocks match by making sure their definitions,
including declarations for the variables, at least reside in INCLUDE files
and are used consistently throughout the code. (Better yet is to place the
global data in modules, but I don't view it as strictly necessary at this point.)
Again this may flush out a number of bugs you never knew you had. I recommend
the above two steps for ANY code; in my mind they are the 'minimal f90 conversion'.
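
For what the 'better yet' variant of step 2.) might look like, here is a
minimal sketch. The module name title_data, the COMMON block it replaces,
and the routine set_title are all invented for illustration. Every routine
that says USE gets the same declarations, so the definitions cannot drift
apart the way hand-copied COMMON statements can.

```fortran
module title_data                ! hypothetical name
  implicit none
  ! Stands in for something like:  COMMON /TITLES/ NLINES, ITITLE(20)
  integer       :: nlines = 0
  character(80) :: title  = ' '
end module title_data

subroutine set_title(text)
  use title_data                 ! one definition, shared by every user
  implicit none
  character(*), intent(in) :: text
  title  = text
  nlines = 1
end subroutine set_title

program demo_common
  use title_data
  implicit none
  call set_title('Sample run')
  print *, nlines, trim(title)
end program demo_common
```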
As an alternative to the above, some compiler environments provide static
analysers to check all of the above without making a lot of source changes.
(An example is ftnlint - which runs under IRIX.) There are 3rd party tools
to do the same. But I find the f90 approach to be the most trouble free,
portable, and integrated way to do it.
Once 1.) and 2.) have been completed, you will then be poised to start
looking at the Holleriths.
3.) Focus on one functional area at a time. Make a change to a data item, then
recompile to find all the places it breaks. It is easiest to start with data
items which are completely local to a routine, as opposed to ones which are
used in some global sense (e.g., through procedure calls or in COMMON.)
Some EQUIVALENCEing may be temporarily needed to equate the storage of a
new CHARACTER variable with the old INTEGER variables. Again a Fortran-90
compiler is generally better than a Fortran-77 compiler because the rules
for EQUIVALENCEing CHARACTER and non-CHARACTER entities were relaxed a bit
at F90.
Run your regression test base and fix problems. Then repeat step #3 as
needed.
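
To make step 3.) concrete, here is a small sketch of converting a purely
local data item. The old declarations are shown only as comments and
assume 4 characters per integer word; the variable name is made up.

```fortran
program local_hollerith
  implicit none
  ! Old form (illustrative, assuming 4 characters per INTEGER word):
  !       INTEGER NAME(2)
  !       DATA NAME /4HABCD, 4HEF  /
  !       IF (NAME(1) .EQ. 4HABCD) ...
  character(8) :: name = 'ABCDEF'    ! blank-padded on the right to length 8
  if (name(1:4) == 'ABCD') print *, 'first word matches'
  print *, '[', name, ']'
end program local_hollerith
```

The word-by-word comparison becomes a substring comparison, and the
two-element INTEGER array becomes a single CHARACTER variable.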
Some things to watch out for (in no particular order):
1.) By the Fortran-66 Standard, Hollerith data is placed into a word
'left-justified, blank-filled'. This is important to keep in mind when
trying to understand the code.
As an extension, many compilers had/have 'left-justified, zero-filled'
and 'right-justified, zero-filled' variants of Hollerith constants
available. These were often used, especially in masking/shifting code,
to simplify the logic.
2.) Sometimes you will find places where pages of Hollerith code can
literally be thrown out and replaced with just a few lines or even a
single intrinsic function call. This was especially true in places
where a lot of masking/shifting was going on to access packed characters
within words. In modern code one can use substring notation with hardly
a thought.
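
A minimal sketch of what replaces the masking/shifting (the variable names
are arbitrary): substring notation reads or overwrites characters in place,
and an intrinsic such as INDEX stands in for a hand-written scan.

```fortran
program substr_demo
  implicit none
  character(8) :: word
  integer      :: k
  word = 'ABCDEFGH'
  print *, word(3:3)             ! third character, no masking or shifting
  word(5:8) = 'WXYZ'             ! overwrite characters 5 through 8 in place
  k = index(word, 'WX')          ! intrinsic search for a substring
  print *, word, k
end program substr_demo
```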
3.) ENCODE/DECODE roughly translate to internal-file WRITE/READ, respectively
(ENCODE formats values into a character buffer; DECODE parses them back out.)
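
For reference, a sketch of the standard replacement using internal files.
ENCODE/DECODE syntax varied from vendor to vendor, so the old statements in
the comments are only one typical spelling, and the variable names are
made up.

```fortran
program internal_io
  implicit none
  character(8) :: buf
  integer      :: ival, jval
  ival = 1234
  ! Old, vendor-specific form (exact syntax varied between compilers):
  !       ENCODE (8, 100, BUF) IVAL
  !       DECODE (8, 100, BUF) JVAL
  !   100 FORMAT (I8)
  write (buf, '(I8)') ival       ! like ENCODE: value -> characters
  read  (buf, '(I8)') jval       ! like DECODE: characters -> value
  print *, '[', buf, ']', jval
end program internal_io
```

The buffer becomes an ordinary CHARACTER variable used as the 'unit' of the
WRITE or READ.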
4.) FORMAT statements:
4a.) When performing formatted I/O, you will have to change formats like
(20A4) to (A80) - or more desirably simply (A).
4b.) Don't waste time worrying about Hollerith strings in output formats.
Though obsolescent (the H edit descriptor was deleted from the Standard at
Fortran 95), most compilers still accept them and they are generally harmless.
Their main disadvantage over quoted strings is that one has to count the
characters.
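
A small sketch covering both 4a.) and 4b.). The unit number, the scratch
file, and the sample text are just there to make the example
self-contained; the old forms are shown as comments.

```fortran
program format_demo
  implicit none
  character(80) :: card
  ! Old form:      INTEGER ICARD(20)
  !                READ (5, 100) ICARD
  !            100 FORMAT (20A4)
  open (10, status='scratch')
  write (10, '(A)') 'TITLE  Sample input card'
  rewind (10)
  read (10, '(A)') card          ! (A) takes its width from the variable
  close (10)
  ! Old form:  200 FORMAT (7H CARD: , A)    (characters counted by hand)
  write (*, "(' CARD: ', A)") trim(card)
end program format_demo
```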
5.) CHARACTER dummy arguments in subroutines/functions.
One mistake a lot of folks make is to set the string length 'as large
as I'll ever expect it to be', when they SHOULD use an asterisk:
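   subroutine charsub (a, b)
   implicit none
   character(80) :: a ! Usually BAD: claims a length the caller may not have
   character(*) :: b  ! Usually CORRECT: length comes from the actual argument
   :
   end subroutine charsub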
Mess the above up and you will get strange memory bashing problems.
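
Here is a minimal sketch of the asterisk form in use (module and routine
names are invented). The dummy argument picks up whatever length the caller
passes, which LEN reports inside the routine.

```fortran
module char_mod                  ! hypothetical name
  implicit none
contains
  subroutine show(b)
    character(*), intent(in) :: b    ! length taken from the actual argument
    print *, len(b), '[', b, ']'
  end subroutine show
end module char_mod

program char_demo
  use char_mod
  implicit none
  character(5)  :: short
  character(20) :: long
  short = 'abc'
  long  = 'a longer string'
  call show(short)               ! len(b) is 5 inside show
  call show(long)                ! len(b) is 20 inside show
  call show('literal')           ! len(b) is 7 inside show
end program char_demo
```

Had the dummy been declared character(80), the first and third calls would
let the routine touch storage well past the end of the actual argument.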
Hopefully the above gives you enough to get started. I am sure others
can add to the list.
Walt
-...-
Walt Spector
(w6ws att earthlinkk dott nett)