Reading a line with unknown number of data


Luka Djigas

2008-10-13 18:32:12 UTC

Hello everyone,

please, I need your help with something:

1 2 3 4 5 6
----------------
1 | 1 2 3 4 5 6
2 | 7 8 9 10
3 | 11 12 13 14 15 16
4 | 17 18
5 | 19 20 21 22 23 24

I have a table like the one above (only bigger and uglier :-) in which
I don't know the number of elements in each row.

I'm reading it into an array of shape (nrows,ncolumns), in this case (5,6),
but since I don't know the number of elements, I was wondering whether
there is a way to read all the elements in one row, something like

dimension a(5,6)

open(unit=1, ...)

do 10 i=1,nrows
read(1,*,eor=10)(a(i,j),j=1,999)

10 continue
and then determine the number of elements in each row of an array ?

Also, since I will need to interpolate in 2 directions between the
given values (for example for x=5.5 and y=4.5), is there a way in such
an array to tell a value loaded from the file apart from the default
value (zero) that an element has from just declaring the array?
In other words, how can I determine whether both values for interpolation
exist? In the above example, if x are the horizontal values, the value at
x=5.5 and y=4.5 would not exist because row "17 18" doesn't have 6
elements.

I would appreciate all your suggestions and input on this topic.

Best regards
Luka Djigas

m***@skyway.usask.ca

2008-10-13 17:31:20 UTC

Post by Luka Djigas
Hello everyone,
1 2 3 4 5 6
----------------
1 | 1 2 3 4 5 6
2 | 7 8 9 10
3 | 11 12 13 14 15 16
4 | 17 18
5 | 19 20 21 22 23 24
I have a table like the one above (only bigger and uglier :-) in which
I don't know the number of elements in each row.
I'm reading it into an array of (nrows,ncolumns; in this case (5,6)),

<snip>
I would just read it as a string; get its length with len_trim
or equiv. function, then check through changing all non
digits to spaces (easy way is to equivalence the string to
an integer*1 array and test the integer*1: (decimal) 48-57 = digit)
(better leave in decimal points if you have floating ..!)
Then count the number of numbers and read them with
read( ,*)
If you have embedded non space/comma/characters
I don't think you can read as numeric directly
except if you know the format.

Chris

Luka Djigas

2008-10-14 12:47:09 UTC

Post by m***@skyway.usask.ca

Post by Luka Djigas
1 2 3 4 5 6
----------------
1 | 1 2 3 4 5 6
2 | 7 8 9 10
3 | 11 12 13 14 15 16
4 | 17 18
5 | 19 20 21 22 23 24

Hello Chris, thanks for answering.

I made a mistake in my previous post.
There are no | and --- in the original table; since I was just making
up numbers I just used those to separate them, and didn't think
somebody would naturally assume that they were in the table too
(sorry)
I just wanted to separate x values, y values and z values one from
each other.

Post by m***@skyway.usask.ca
I would just read it as a string; get its length with len_trim
or equiv. function, then check through changing all non
digits to spaces (easy way is to equivalence the string to
an integer*1 array and test the integer*1: (decimal) 48-57 = digit)
(better leave in decimal points if you have floating ..!)

Actually, all values are floating values :-(
(another mistake on my part - got to stop writing posts late in the
evening)

Anyway, I understand this part.

Post by m***@skyway.usask.ca
Then count the number of numbers and read them with
read( ,*)

But I'm not sure how to "count numbers". Internal read is familiar to
me, but I still don't know how to determine the number of numbers in
each row without counting them manually.

best regards
Luka Djigas

Post by m***@skyway.usask.ca
If you have embedded non space/comma/characters
I don't think you can read as numeric directly
except if you know the format.

e p chandler

2008-10-14 00:12:25 UTC

Post by Luka Djigas
Hello everyone,
1 2 3 4 5 6
----------------
1 | 1 2 3 4 5 6
2 | 7 8 9 10
3 | 11 12 13 14 15 16
4 | 17 18
5 | 19 20 21 22 23 24
I have a table like the one above (only bigger and uglier :-) in which
I don't know the number of elements in each row.
I'm reading it into an array of (nrows,ncolumns; in this case (5,6)),
but since I don't know the number of elements, I was wondering is
there a way to read all the elements in one row, something like
dimension a(5,6)
open(unit=1, ...)
do 10 i=1,nrows
read(1,*,eor=10)(a(i,j),j=1,999)
10 continue
and then determine the number of elements in each row of an array ?
Also, since I will need to interpolate in 2 directions between the
given values (for example for x=5.5 and y=4.5) is there a way in such
an array to determine the difference between the loaded value from a
file, and default value (zero) from just declaring an array ?
In other words, how to determine whether both values for interpolation
exists ? In the above example, if x are horizontal values, value of
x=5.5 and y=4.5 would not exist because row "17 18" doesn't have 6
elements?
I would appreciate all your suggestions and input on this topic.
Best regards
Luka Djigas

Not a Fortran solution, but:

Pre-process the input file padding the rows out to a uniform length
with un-used "sentinel" values.
It's easy in AWK.

Or pre-process the input file prefixing each row with a count of the
number of items in each. (NF in AWK).
Read into a character variable. Internal read for NF. Then Internal
read the same character variable for the data.

- e

glen herrmannsfeldt

2008-10-14 01:33:44 UTC

e p chandler wrote:
(snip)

Post by e p chandler
Or pre-process the input file prefixing each row with a count of the
number of items in each. (NF in AWK).
Read into a character variable. Internal read for NF. Then Internal
read the same character variable for the data.

If you have the count, you can read it directly with

READ(2,*) CNT,(A(I),I=1,CNT)

possibly inside a loop such that you copy CNT and the values
in A somewhere else. or maybe

READ(2,*) CNT,(A(I),I=1,MIN(CNT,UBOUND(A,1)))

to avoid problems with bad counts.
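
A minimal sketch of that count-prefixed read, filling a two-dimensional array row by row and remembering how many items each row holds. The module name `counted_rows`, the routine `read_counted_rows`, and the `nvalid` array are hypothetical names chosen for illustration; the unit is assumed to be open already on a file whose rows look like "cnt v1 v2 ... vcnt".

```fortran
module counted_rows
  implicit none
contains
  ! Read rows of the form "cnt v1 v2 ... vcnt" from an already-open unit.
  ! On return a(i,1:nvalid(i)) holds the data of row i; other elements
  ! are left as they were. nrows is the number of rows actually read.
  subroutine read_counted_rows(unit, a, nvalid, nrows)
    integer, intent(in)    :: unit
    real,    intent(inout) :: a(:,:)
    integer, intent(out)   :: nvalid(:)
    integer, intent(out)   :: nrows
    integer :: cnt, ios, i, j

    nvalid = 0
    nrows  = 0
    do i = 1, min(size(a, 1), size(nvalid))
      ! cnt is read first, so it may be used as the implied-DO limit;
      ! MIN guards against counts larger than the array.
      read(unit, *, iostat=ios) cnt, (a(i,j), j = 1, min(cnt, size(a,2)))
      if (ios /= 0) exit            ! end of file or malformed row
      nvalid(i) = max(0, min(cnt, size(a,2)))
      nrows = i
    end do
  end subroutine read_counted_rows
end module counted_rows
```

One caveat with this sketch: if a count is larger than the number of values actually present on its line, list-directed input carries on into the next record, so the counts have to be trustworthy.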

-- glen

e p chandler

2008-10-14 01:31:57 UTC

Post by glen herrmannsfeldt
(snip)

Post by e p chandler
Or pre-process the input file prefixing each row with a count of the
number of items in each. (NF in AWK).
Read into a character variable. Internal read for NF. Then Internal
read the same character variable for the data.

If you have the count, you can read it directly with
READ(2,*) CNT,(A(I),I=1,CNT)
possibly inside a loop such that you copy CNT and the values
in A somewhere else. or maybe
READ(2,*) CNT,(A(I),I=1,MIN(CNT,UBOUND(A,1)))
to avoid problems with bad counts.
-- glen

Punch cards ate my brain. :-(.

- e

Luka Djigas

2008-10-14 13:09:39 UTC

On Mon, 13 Oct 2008 17:33:44 -0800, glen herrmannsfeldt

Post by glen herrmannsfeldt
(snip)

Post by e p chandler
Or pre-process the input file prefixing each row with a count of the
number of items in each. (NF in AWK).
Read into a character variable. Internal read for NF. Then Internal
read the same character variable for the data.

If you have the count, you can read it directly with
READ(2,*) CNT,(A(I),I=1,CNT)
possibly inside a loop such that you copy CNT and the values
in A somewhere else. or maybe
READ(2,*) CNT,(A(I),I=1,MIN(CNT,UBOUND(A,1)))
to avoid problems with bad counts.
-- glen

Hi glen, thanks for answering.

Yes, I will probably use this way. It seems the easiest to do, and the
easiest to read later.

However, I'm still having trouble with the second part.

Does Fortran have any way of distinguishing a loaded element in an
array from the default value of an array (that being zero)?

best regards
Luka Djigas

Arjen Markus

2008-10-14 13:20:00 UTC

Post by Luka Djigas
On Mon, 13 Oct 2008 17:33:44 -0800, glen herrmannsfeldt

Post by glen herrmannsfeldt
(snip)

Post by e p chandler
Or pre-process the input file prefixing each row with a count of the
number of items in each. (NF in AWK).
Read into a character variable. Internal read for NF. Then Internal
read the same character variable for the data.

If you have the count, you can read it directly with
READ(2,*) CNT,(A(I),I=1,CNT)
possibly inside a loop such that you copy CNT and the values
in A somewhere else. or maybe
READ(2,*) CNT,(A(I),I=1,MIN(CNT,UBOUND(A,1)))
to avoid problems with bad counts.
-- glen

Hi glen, thanks for answering.
Yes, I will probably use this way. Seems the most easy to do, and the
most easy to read later.
However, I'm still having trouble with the second part.
Does fortran have any way of differing from a loaded element in an
array and the default value of an array (that being zero) ?
best regards
Luka Djigas

Best to set the values in the array to a value that you
know will not be part of the actual data. A choice I often
make is -999.0 (because it is precisely representable and
it is a value not often encountered in (my) practice).

Then if you are careful only to fill the array elements
that actually have values in your input, this guard value
can be easily recognised.

Another choice might be: huge(array) or even NaN, though that
is a rather tricky "value" to get right.

But to answer your question directly: Fortran per se does not have
a reserved value to indicate there is no value.

Regards,

Arjen

Luka Djigas

2008-10-14 14:07:08 UTC

On Tue, 14 Oct 2008 06:20:00 -0700 (PDT), Arjen Markus

Post by Arjen Markus

Post by Luka Djigas

Best to set the values in the array to a value that you
know will not be part of the actual data. A choice I often
make is -999.0 (because it is precisely representable and
it is a value not often encountered in (my) practice).
Then if you are careful only to fill the array elements
that actually have values in your input, this guard value
can be easily recognised.
Another choice might be: huge(array) or even NaN, though that
is a rather tricky "value" to get right.
But to answer your question directly: Fortran per se does not have
a reserved value to indicate there is no value.

This is the answer I was looking for, and hoping I would not get :-)
Brute-force solution it is, then.
Sometimes I wonder why I keep on searching for "more elegant
solutions", when usually these work just fine?

Best regards
Luka Djigas

Post by Arjen Markus
Regards,
Arjen

Arjen Markus

2008-10-14 14:35:11 UTC

Post by Luka Djigas
This is the answer I was looking for, and hoping I would not get :-)
Brute-force solution it is, then.
Sometimes I wonder why I keep on searching for "a more elegant
solutions", when usually these work just fine ?

Most of the time there are no easy answers. Years ago I read
about the NULL value used in many database management systems.
While seemingly a simple matter, a missing value (as indicated
by this NULL) can have any of at least five different meanings.
Things like:
- no value yet
- value not above the detection limit (in case of measurements)
- measurement made but unreliable
- quantity not applicable
- value out of range

How do you cope with such different meanings? No single "value"
can do that. In the case you described, it may be simpler, but
still, a missing value can have more than one meaning even then:
suppose the data represent the water depth in a lake, as measured
on a grid. The gaps might indicate islands or places where
you simply could not measure (or record the measurement).

Regards,

Arjen

Richard Maine

2008-10-14 16:34:25 UTC

Post by Arjen Markus
Best to set the values in the array to a value that you
know will not be part of the actual data. A choice I often
make is -999.0 (because it is precisely representable and
it is a value not often encountered in (my) practice).

Jesus, no! Shades of f66. I have horrible memories of codes that did
things like that, mostly because there wasn't a portable way to check
for end of file. I have also seen plenty of bugs related to the
practice, including some from much more recent times. I'm tempted to
wander off into a "war story" of one from the Shuttle, which had some
other "interesting" facets as well, but that would be a bit of
diversion. Perhaps in a separate post. A few points.

1. Note that the code shown does assume that you already know the number
of valid elements and have a variable (the CNT) that indicates how many
valid elements there are. If you have such a variable, then for heaven's
sake, use it instead of testing for some flag value.

2. If you don't have such a variable (whether by reading it as in the
code shown, or by some other means), then be aware that the "obvious"
trick with flag values is *NOT* standard conforming or portable. I've
been bit by it before. Namely you cannot do something like
array = flag_value
read(...) array
and then look for the flag value to see how many valid elements there
were. There are multiple reasons why this doesn't work... at all in some
cases, or portably in others. For list-directed reads, you have the
problem of automatically going to the next record. In other cases, you
have the problem of array becoming undefined.

So far, nobody has mentioned the way I deal with things like this.

Start by reading the line into a character variable.

Then parse out the non-blank fields and read them one at a time with an
internal read, counting as you go. An alternative is to just count the
number of fields and then go back and do a single read of that many, but
that's a minor detail.

The main point is to parse out the fields yourself. There isn't a magic
I/O trick to do that. Just do it with normal character operations. Some
Fortran users seem to get "scared off" of doing character things. But it
isn't as though this one is above the level of trivial. We are talking a
handful of lines of code. Literally. One might even manage a finger or
so left over. Certainly any useful Fortran program does things orders of
magnitude more complicated.

I tend to have a utility routine to parse out the next field in a
string. Mine is far more complicated than you'd need because I make it a
bit more general to deal with an arbitrary list of delimiters. If you
have nothing but blanks between fields, as in the example, things are
trivial. See the scan, index, and verify intrinsics. All you are doing
is looking for the first non-blank character and then the next blank
character after that. The field is the substring between them (inclusive
of the non-blank starter character). If you didn't find a non-blank
character, you are done.
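
A minimal sketch of the blank-delimited parse described above, using `verify` to find the first non-blank character and `scan` to find the next blank. The module name `field_parser` and the routine `parse_fields` are hypothetical, and the sketch assumes fields are separated by blanks only (no tabs or commas).

```fortran
module field_parser
  implicit none
contains
  ! Split a line into blank-separated fields and read each one as a real.
  ! n is the number of values stored in values(1:n); ios is nonzero if a
  ! field is not a valid number or if there are more fields than elements.
  subroutine parse_fields(line, values, n, ios)
    character(len=*), intent(in)    :: line
    real,             intent(inout) :: values(:)
    integer,          intent(out)   :: n, ios
    integer :: pos, first, last, length

    n = 0
    ios = 0
    length = len_trim(line)
    pos = 1
    do while (pos <= length)
      ! First non-blank character at or after pos.
      first = verify(line(pos:length), ' ')
      if (first == 0) exit                 ! only blanks remain
      first = pos + first - 1
      ! Next blank after it; if none, the field runs to the end.
      last = scan(line(first:length), ' ')
      if (last == 0) then
        last = length
      else
        last = first + last - 2
      end if
      if (n == size(values)) then
        ios = 1                            ! too many fields for values(:)
        return
      end if
      ! Internal read of this one field only.
      read(line(first:last), *, iostat=ios) values(n + 1)
      if (ios /= 0) return                 ! malformed number
      n = n + 1
      pos = last + 1
    end do
  end subroutine parse_fields
end module field_parser
```

Called once per line that was read with an `(a)` format, this yields the per-row count `n` directly, so no flag value is needed to find where the row ends.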

Sure, you could use some separate scripting language to preprocess the
file. But why add that complication to save 5 (or so) lines of
standard-conforming code? I think it is mostly a matter of
character-manipulation phobia.

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain

glen herrmannsfeldt

2008-10-14 18:06:47 UTC

Richard Maine wrote:
(snip)

Post by Richard Maine
2. If you don't have such a variable (whether by reading it as in the
code shown, or by some other means), then be aware that the "obvious"
trick with flag values is *NOT* standard conforming or portable. I've
been bit by it before. Namely you cannot do something like
array = flag_value
read(...) array
and then look for the flag value to see how many valid elements there
were. There are multiple reasons why this doesn't work... at all in some
cases, or portably in others. For list-directed reads, you have the
problem of automatically going to the next record. In other cases, you
have the problem of array becoming undefined.

Note that it works in C. Well, C doesn't have a way to read
in an array other than an explicit loop, but reading with scanf
won't change list items if there is no valid input value.
That includes non-numeric data for a numeric format.
(I don't know about E for floating point formats, though.)

This allows for a default value to be changed with valid data,
or left alone if no valid data is available.

Post by Richard Maine
Start by reading the line into a character variable.

This is often needed in C, too. scanf will continue across
multiple lines, so reading a line to a char array and then
using sscanf is needed to respect line boundaries.

The C library does have a simple string parse routine, though
some people don't like using it. strtok() modifies the string
by adding null characters at the end of a field, and then returning
a pointer to the beginning. You can change the delimiters allowed
on each call. Some people don't like that it modifies the string,
but it does work, and is fairly efficient for many problems.

(There was a post not so long ago about the features of the
C++ library, and similar features in the Fortran language or library.)

-- glen

Arjen Markus

2008-10-14 20:34:23 UTC

Post by Richard Maine

Post by Arjen Markus
Best to set the values in the array to a value that you
know will not be part of the actual data. A choice I often
make is -999.0 (because it is precisely representable and
it is a value not often encountered in (my) practice).

Jesus, no! Shades of f66. I have horrible memories of codes that did
things like that, mostly because there wasn't a portable way to check
for end of file. I have also seen plenty of bugs related to the
practice, including some from much more recent times. I'm tempted to
wander off into a "war story" of one from the Shuttle, which had some
other "interesting" facets as well, but that would be a bit of
diversion. Perhaps in a separate post. A few points.

That would be an interesting story, I think.

But it is not quite what I meant to say:

1. There is no way to indicate that a particular array element (or
   variable) has not received a value, other than by reserving a
   specific flag value or by keeping the information in a mask array
   of sorts.

2. I am aware that in this particular case the row or column is filled
   regularly, so only at the end there may be array elements that do
   not have a value. I suppose one could use that information to
   optimise the interpolation algorithm, but in general there might be
   holes.

3. I was referring only to the second part of the question ;)
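
For rows that are filled from the left, as in the table in this thread, a per-row count can play the role of the mask. A minimal sketch, assuming a hypothetical array `nvalid` in which `nvalid(i)` is the number of values read into row i; the names `cell_check`, `has_value` and `cell_complete` are made up for illustration. It checks whether all four corners needed to interpolate in two directions exist.

```fortran
module cell_check
  implicit none
contains
  ! True if element (i,j) was filled from the file, given that row i
  ! holds values in columns 1..nvalid(i) only.
  logical function has_value(nvalid, i, j)
    integer, intent(in) :: nvalid(:), i, j
    has_value = i >= 1 .and. i <= size(nvalid) .and. j >= 1
    ! Index nvalid only after i is known to be in range.
    if (has_value) has_value = j <= nvalid(i)
  end function has_value

  ! True if the corners (i,j), (i,j+1), (i+1,j), (i+1,j+1) all exist.
  logical function cell_complete(nvalid, i, j)
    integer, intent(in) :: nvalid(:), i, j
    cell_complete = has_value(nvalid, i,     j    ) .and. &
                    has_value(nvalid, i,     j + 1) .and. &
                    has_value(nvalid, i + 1, j    ) .and. &
                    has_value(nvalid, i + 1, j + 1)
  end function cell_complete
end module cell_check
```

With the counts 6, 4, 6, 2, 6 from the original table, the cell spanning rows 4-5 and columns 5-6 is reported as incomplete, because row 4 holds only two values.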

As for the solution you describe: yes, I have found out that the
most reliable way to get a variable amount of data in is via character
strings. (This technique is what I use in my XML parsers,
http://xml-fortran.sf.net to be precise.)

A sketch of the code:

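number = some_obvious_maximum   ! used if every trial read succeeds
do i = 1,some_obvious_maximum
    ! try to read i values; the first failure means only i-1 are present
    read( string, *, iostat = error ) (data(k), k=1,i)
    if ( error /= 0 ) then
        number = i - 1
        exit
    endif
enddo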

I usually prefer to use * in these cases, even though it has its
pitfalls.

Regards,

Arjen

Richard Maine

2008-10-14 22:28:35 UTC

Post by Arjen Markus

Post by Richard Maine
I'm tempted to
wander off into a "war story" of one from the Shuttle,

That would be an interesting story, I think.

Since you ask... :-) And because it makes a nice break from struggling
with the California educational bureaucracy (submitting an application
for our math tutoring center to be approved as a supplemental provider
for next year)... :-(

When we were analyzing postflight data from the first Shuttle entry, we
got a tape (a physical tape as I recall) with trajectory and air data
from one of the contractors (TRW). About the first thing we did was make
a rough plot of the most relevant signals for our work. Somewhere in the
middle of the Pacific, the data, which had looked more or less
reasonable before then, suddenly jumped by several orders of magnitude.
I don't even recall exactly how many, but it was quite a few. Then about
as the vehicle crossed the coast, the data jumped back down to plausible
magnitudes.

We were a bit upset by being given this obviously junk data (and quite
late delivery as well). We got even more upset when the reply to our
complaint was basically that the contractor didn't feel it was their job
to evaluate whether the data they gave us was any good. (Need I say that
we no longer made any use of data from that particular group?)

Our investigation into what happened finally found the following.

The computations relied on vertical atmospheric data (from weather
balloons and other sources) at a collection of points more or less along
the flight path. To get the data for any particular shuttle position,
one needed to interpolate between the points where the data was
available. The "Interpolation" has a slight issue of the shuttle not
being exactly between two points, but it is close enough for a simple
approximation to work.

The mistake was in determining what 2 points to interpolate between. The
algorithm used was to interpolate between the two points that were
closest to the shuttle's position. That might be plausible if the points
were equally spaced, but they weren't - not even close. There was one
sample point at Hawaii, one on the coast at Vandenberg, and another... I
forget exactly where, perhaps Bakersfield, but anyway relatively close
to Vandenberg, at least compared to the distance to Hawaii. A hair over
halfway between Hawaii and Vandenberg, suddenly the 2 closest points
were Vandenberg and Bakersfield, so the algorithm tried to "interpolate"
between them.

That was bad enough, but another poor choice (in my view) made it worse.
Recall that I said this had something to do with flag values. The data
tables for Bakersfield didn't have values for the kinds of altitudes the
vehicle was at over the middle of the Pacific. It was not envisioned
that the Bakersfield data would be used for the middle of the Pacific.
So those spots in the data tables were filled with, I think it was
999.0. Not that the subsequent processing actually checked for the flag
values; that just wasn't supposed to happen. The standard atmosphere
value for, say, air density is, if I recall, 0.002378 slugs per cubic
foot at sea level, and quite a bit less at altitude. I might have lost a
zero in there; it's been a while. But anyway, 999. is, um... not a very
good value, by quite a lot.

Combine that with an attitude that checking your work is someone else's
job instead of your own, and you get the junk data that we saw.

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain

glen herrmannsfeldt

2008-10-15 00:07:36 UTC

Richard Maine wrote:
(snip)

Post by Richard Maine
When we were analyzing postflight data from the first Shuttle entry, we
got a tape (a physical tape as I recall) with trajectory and air data
from one of the contractors (TRW).

By the way, NOVA tonight is about the space shuttle.

-- glen

John Harper

2008-10-15 01:11:22 UTC

Post by Richard Maine
Combine that with an attitude that checking your work is someone else's
job instead of your own, and you get the junk data that we saw.

That reminds me of being asked several years ago "Why do you keep
finding bugs in compilers?". I replied "Does nobody else in this
university ever check their work?"

-- John Harper, School of Mathematics, Statistics and Computer Science,
Victoria University, PO Box 600, Wellington 6140, New Zealand
e-mail ***@vuw.ac.nz phone (+64)(4)463 6780 fax (+64)(4)463 5045

Richard Nixon

2008-10-15 01:34:32 UTC

Post by John Harper

Post by Richard Maine
Combine that with an attitude that checking your work is someone else's
job instead of your own, and you get the junk data that we saw.

That reminds me of being asked several years ago "Why do you keep
finding bugs in compilers?". I replied "Does nobody else in this
university ever check their work?"

Can't let it live, John.

That has to be a double entendre. If a party, in particular one with
academic prerogatives, checks *his* work, it is fundamentally different
than when such a party checks the work of others--their work. That sounds
like peer review by Karl Rove.

I think the entire liturgical world is looking at a bunch of parables about
crappy workers now. A union guy like me only needs to have a card to check
my own work and that of my cohort simultaneously.

--
Richard Milhous Nixon

All the modern inconveniences...
~~ Mark Twain

John Harper

2008-10-15 21:12:12 UTC

Post by Richard Nixon

Post by John Harper

Post by Richard Maine
Combine that with an attitude that checking your work is someone else's
job instead of your own, and you get the junk data that we saw.

That reminds me of being asked several years ago "Why do you keep
finding bugs in compilers?". I replied "Does nobody else in this
university ever check their work?"

That has to be a double entendre. If a party, in particular one with
academic prerogatives, checks *his* work, it is fundamentally different
than when such a party checks the work of others--their work.

As I have never written a compiler, putting my own program through
2 or more compilers was indeed checking the work of others whenever
it wasn't helping me find a bug that I had perpetrated myself.

I'm not sure about the fundamentality of the difference between checking
one's own work and other people's: when checking mine I have sometimes
found errors in textbooks, reference books or research papers that I
had been using. A few of those research papers were my own earlier
publications:-(

-- John Harper, School of Mathematics, Statistics and Computer Science,
Victoria University, PO Box 600, Wellington 6140, New Zealand
e-mail ***@vuw.ac.nz phone (+64)(4)463 6780 fax (+64)(4)463 5045

Arjen Markus

2008-10-15 06:48:45 UTC

Post by Richard Maine
That was bad enough, but another poor choice (in my view) made it worse.
Recall that I said this had something to do with flag values. The data
tables for Bakersfield didn't have values for the kinds of altitudes the
vehicle was at over the middle of the Pacific. It was not envisioned
that the Bakersfield data would be used for the middle of the Pacific.
So those spots in the data tables were filled with, I think it was
999.0. Not that the subsequent processing actually checked for the flag
values; that just wasn't supposed to happen.

Many are the things that should not happen in our programs :).

One of my war stories is this:
Quite a few years ago, I had to install some of our software in
Hong Kong. I spent a week or so trying to make sure I knew all the
details of the installation process and after struggling with
bizarre issues like an aliased "cd" command that messed up the
trial installation on a workstation in our offices, I went
to our client, fully confident I could do the job.

Arriving at the workstation where it all was going to happen,
I opened one of the CD boxes and found out that that CD was still
in Holland...

Regards,

Arjen

Dave Allured

2008-10-15 06:38:43 UTC

Arjen Markus wrote:

<snip>

Post by Arjen Markus
3. I was referring only to the second part of the question ;)
As for the solution you describe: yes, I have found out that the
most reliable way to get a variable amount of data in is via character
strings. (This technique is what I use in my XML parsers,
http://xml-fortran.sf.net to be precise.)
do i = 1,some-obvious-maximum
read( string, *, iostat = error ) (data(k), k=1,i)
if ( error /= 0 ) then
number = i - 1
exit
endif
enddo
I usually prefer to use * in these cases, even though it has its
pitfalls.

Very clever, Mr. Markus. This algorithm has two problems: it fails to
report actual errors in the data, then skipping everything else to the
right on the same line; and it becomes very inefficient for large
amounts of data. (Square law for width, I believe.) If you can be
absolutely sure that the input will always be small and error-free, then
go ahead and do this.

Otherwise I echo the remarks of others. Read each line as a character
array, and parse out the numbers yourself. It is not that complicated,
and you can rest assured that some inconceivable glitch in the data will
always be caught rather than ignored.

A more obscure benefit is that if you wish, you can have your own
delimiter rules and check number syntax *better* than some Fortrans.

--Dave

Arjen Markus

2008-10-15 06:43:41 UTC

Post by m***@skyway.usask.ca
<snip>

Post by Arjen Markus
3. I was referring only to the second part of the question ;)
As for the solution you describe: yes, I have found out that the
most reliable way to get a variable amount of data in is via character
strings. (This technique is what I use in my XML parsers,
http://xml-fortran.sf.net to be precise.)
do i = 1,some-obvious-maximum
read( string, *, iostat = error ) (data(k), k=1,i)
if ( error /= 0 ) then
number = i - 1
exit
endif
enddo
I usually prefer to use * in these cases, even though it has its
pitfalls.

Very clever, Mr. Markus. This algorithm has two problems: it fails to
report actual errors in the data, then skipping everything else to the
right on the same line; and it becomes very inefficient for large
amounts of data. (Square law for width, I believe.) If you can be
absolutely sure that the input will always be small and error-free, then
go ahead and do this.
Otherwise I echo the remarks of others. Read each line as a character
array, and parse out the numbers yourself. It is not that complicated,
and you can rest assured that some inconceivable glitch in the data will
always be caught rather than ignored.
A more obscure benefit is that if you wish, you can have your own
delimiter rules and check number syntax *better* than some Fortrans.
--Dave

I am well aware of the limitations of this approach :).
I may revisit that code and implement a simple tokenising
algorithm along the lines described here, when I have time
to do so. My main excuse is laziness.

Regards,

Arjen

Luka Djigas

2008-10-16 16:58:42 UTC

On Tue, 14 Oct 2008 13:34:23 -0700 (PDT), Arjen Markus

Post by Arjen Markus
1. There is no way to indicate that a particular array element (or
   variable) has not received a value, other than by reserving a
   specific flag value or by keeping the information in a mask array
   of sorts.
2. I am aware that in this particular case the row or column is filled
   regularly, so only at the end there may be array elements that do
   not have a value. I suppose one could use that information to
   optimise the interpolation algorithm, but in general there might be
   holes.
3. I was referring only to the second part of the question ;)

Hello Arjen, thanks for answering.

Btw, let me just thank everyone else as well, for putting in their bit
of experience, and helping with the first part.
In the end, I used the part with preprocessing the file and adding a
number which tells the number of elements in a row. Although the
'reading the line into a character variable ...' approach definitely
has its advantages.

But I'm still having trouble with the second part. I have:
y(1) y(2) y(3) ... y(j)
x(1) z(1,1) z(1,2) z(1,3) ... z(1,j)
x(2) z(2,1) z(2,2)
...
x(i) z(i,1) z(i,2) ... z(i,j)

The problem is some elements are missing (quite a few actually). They
are not missing 'randomly', but systematically ... uhmm, how can I put
this more clearly ?
This is not the case - z(i,j):
z11 z12 z13 missing z14 missing z15
missing z22 z23 missing
z31 missing z33 z34 missing
("missing" - the missing element)

but rather this is:
z11 z12 z13 z14
z21 z22 z23 z24 z25 z26
z31 z32 z33 z34 z35 z36
z41 z42 z43 z44
z51 z52 z53 z54

I use some old 2d interpolation routine:
call inter(x, z(which has to be transferred to 1d array),
num_of_points, x input, z output)

I can of course use the brute force approach (the "-999" one :-), so
if the interpolation results in something like (-500) then it's wrong.
Start again with new values.

But, you mentioned "using that information to optimise the
interpolation algorithm". Could you please elaborate that part a
little, or suggest some text for further reading ?

Best regards
Luka Djigas

Arjen Markus

2008-10-17 06:46:55 UTC

Post by Luka Djigas
I can of course use the brute force approach (the "-999" one :-), so
if the interpolation results in something like (-500) then it's wrong.
Start again with new values.

Don't do that! That is exactly what Richard Maine warned about.
No, the idea of such a special value is that you can then determine
if there is a missing value, so that you know you need a different
approach. For instance:
First scan the array for missing values and use a simple technique
to fill in these blanks, like an average of the nearest neighbours.
Only then can the interpolation proceed.
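
A minimal sketch of that scan-and-fill step, under stated assumptions: the flag value -999.0, the 3x6 sample table and the names fill_missing, known and add_neighbour are all made up for illustration. It does a single pass and averages only the neighbours that were actually loaded, so a cell with no loaded neighbour stays flagged.

```fortran
program fill_missing
  implicit none
  real, parameter :: missing = -999.0        ! assumed flag for "no data"
  integer, parameter :: nrows = 3, ncols = 6 ! made-up table size
  real :: z(nrows, ncols), zfill(nrows, ncols), total
  logical :: known(nrows, ncols)
  integer :: i, j, cnt

  ! Made-up sample: rows 1 and 3 stop after four values.
  z = missing
  z(1, 1:4) = [1.0, 2.0, 3.0, 4.0]
  z(2, 1:6) = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
  z(3, 1:4) = [3.0, 4.0, 5.0, 6.0]

  known = (z /= missing)   ! mask array: .true. where a value was loaded
  zfill = z

  do j = 1, ncols
     do i = 1, nrows
        if (known(i, j)) cycle
        total = 0.0
        cnt = 0
        call add_neighbour(i - 1, j)
        call add_neighbour(i + 1, j)
        call add_neighbour(i, j - 1)
        call add_neighbour(i, j + 1)
        if (cnt > 0) zfill(i, j) = total / cnt   ! otherwise it stays flagged
     end do
  end do

  do i = 1, nrows
     print '(6f8.2)', zfill(i, :)
  end do

contains

  ! Add z(ii,jj) to the running sum if it lies inside the table and was loaded.
  subroutine add_neighbour(ii, jj)
    integer, intent(in) :: ii, jj
    if (ii < 1 .or. ii > nrows .or. jj < 1 .or. jj > ncols) return
    if (.not. known(ii, jj)) return
    total = total + z(ii, jj)
    cnt = cnt + 1
  end subroutine add_neighbour

end program fill_missing
```

Keeping the logical mask next to the data means the interpolation routine never has to compare against the flag value itself; it only runs once every cell it needs has been filled.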

Post by Luka Djigas
But, you mentioned "using that information to optimise the
interpolation algorithm". Could you please elaborate that part a
little, or suggest some text for further reading ?

An alternative to the above, because your data stop somewhere along
the row, is to detect whether you are still in the part with proper
data or not. If not, you will need to extrapolate: keep the value
constant or do a linear extrapolation.

For instance (one-dimensional):

coord:   0    1    2    3    4    5    6
value:   1   -2   -3   -4   -2    1   (missing)

If you need to know the value at 6.5, then you need to extrapolate
somehow, but you know from the data you have that you need to
extrapolate from 5 onwards. So, possible choices are:
- Keep the value equal to the last known value: coord>5: value = 1
- Draw a line through the last two points: coord>5: value = -2 + 3*(coord-4)
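
The two choices can be sketched in a few lines, using the six known points from the example above. The program and variable names are made up for illustration. At coord = 6.5 the constant rule gives 1.0 and the linear rule gives -2 + 3*(6.5-4) = 5.5.

```fortran
program extrapolate_1d
  implicit none
  integer, parameter :: n = 6                  ! number of known points
  real :: coord(n), val(n), x, slope, v_const, v_linear

  coord = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
  val   = [1.0, -2.0, -3.0, -4.0, -2.0, 1.0]

  x = 6.5                                      ! beyond the last known coord

  ! Choice 1: keep the last known value.
  v_const = val(n)

  ! Choice 2: straight line through the last two known points.
  slope    = (val(n) - val(n-1)) / (coord(n) - coord(n-1))
  v_linear = val(n-1) + slope * (x - coord(n-1))

  print *, 'constant:', v_const
  print *, 'linear:  ', v_linear
end program extrapolate_1d
```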

In two dimensions this becomes trickier, but still possible.

(Sorry, I know of no texts that I can refer you to. Such missing value
problems seem to escape the textbooks. Or I have missed them)

Regards,

Arjen

e p chandler

2008-10-15 04:30:42 UTC

On Oct 14, 12:34 pm, ***@see.signature (Richard Maine) wrote:

Re: finding the number of "fields" surrounded by "white-space" in a
character string
so that you can handle records with a variable number of data
items in each one

Post by Richard Maine
So far, nobody has mentioned the way I deal with things like this.
Start by reading the line into a character variable.
Then parse out the non-blank fields and read them one at a time with an
internal read, counting as you go. An alternative is to just count the
number of fields and then go back and do a single read of that many, but
that's a minor detail.
The main point is to parse out the fields yourself. There isn't a magic
I/O trick to do that. Just do it with normal character operations. Some
Fortran users seem to get "scared off" of doing character things. But it
isn't as though this one is above the level of trivial. We are talking a
handful of lines of code. Literally. One might even manage a finger or
so left over. Certainly any useful Fortran program does things orders of
magnitude more complicated.
I tend to have a utility routine to parse out the next field in a
string. Mine is far more complicated than you'd need because I make it a
bit more general to deal with an arbitrary list of delimiters. If you
have nothing but blanks between fields, as in the example, things are
trivial. See the scan, index, and verify intrinsics. All you are doing
is looking for the first non-blank character and then the next blank
character after that. The field is the substring between them (inclusive
of the non-blank starter character). If you didn't find a non-blank
character, you are done.
Sure, you could use some separate scripting language to preprocess the
file. But why add that complication to save 5 (or so) lines of
standard-conforming code? I think it is mostly a matter of
character-manipulation phobia.

The last time I wrote one of these it was in Pascal - two index vars,
and a flag. A nice little state machine!

Here's my attempt in Fortran:

implicit none
character*80 s
character d
integer n, p, q, r

d = ' '

print *, 'string?'
read '(a)', s

n = 0
p = 1

do
   q = verify( s(p:), d)        ! non space?
   if (q == 0) exit             ! no
   r = scan( s(p+q:), d)        ! len field

   n = n + 1
   print *, '|' // &
      s(p + (q-1) : p + (q-1) + (r-1) ) &
      // '|'

   p = p + (q-1) + (r-1) + 1    ! next field?
end do

end

The tricky part for me is realizing that q and r are relative to the
*remaining* part of s. So (q-1) is the offset from p where the field
starts or the number of leading spaces. The search for the next
delimiter starts one character over. The next search starts at the
current position plus the number of spaces to skip plus the number of
characters in the field just found.
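
To make the relative offsets concrete, here is a one-step trace on a made-up string (the program name and sample text are only for illustration). With two leading blanks, verify returns q = 3 and scan on the rest returns r = 2, so the first field is s(3:4), that is '12'.

```fortran
program offsets_demo
  implicit none
  character(len=80) :: s
  integer :: p, q, r, first, last

  s = '  12 345'          ! made-up sample; the rest of s is blank padded
  p = 1                   ! start of the part of s still to be searched

  q = verify(s(p:), ' ')  ! position of first non-blank, counted from p
  r = scan(s(p+q:), ' ')  ! position of next blank, counted from p+q

  first = p + (q - 1)     ! absolute index where the field starts
  last  = first + (r - 1) ! absolute index where the field ends

  print *, 'q =', q, ' r =', r
  print *, 'field = |' // s(first:last) // '|'
end program offsets_demo
```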

Note: I have not tested this routine extensively but it seems to work
well for me. Some of the code was written as it was for rhetorical
reasons.

This critter took about 3.5 hours to write, test and document.

- e

e p chandler

2008-10-15 15:56:28 UTC

Post by e p chandler
Re: finding the number of "fields" surrounded by "white-space" in a
character string
so that you can handle records with a variable number of data
items in each one

Post by Richard Maine
So far, nobody has mentioned the way I deal with things like this.
Start by reading the line into a character variable.
Then parse out the non-blank fields and read them one at a time with an
internal read, counting as you go. An alternative is to just count the
number of fields and then go back and do a single read of that many, but
that's a minor detail.

As I expected this code has a bug. It fails if the last character of s
is not a space.

Post by e p chandler
implicit none
character*80 s
character d
integer n, p, q, r
d = ' '
print *, 'string?'
read '(a)', s
n = 0
p = 1
do
q = verify( s(p:), d) ! non space?
if (q == 0) exit ! no
r = scan( s(p+q:), d) ! len field
n = n + 1
print *, '|' // &
s(p + (q-1) : p + (q-1) + (r-1) ) &
// '|'
p = p + (q-1) + (r-1) + 1 ! next field?
end do
end

One way to prevent the bug is to make sure the length of s is always
greater than any possible input record so that on reading it is space
filled to the end. Another way is to make the length of s at least 1
longer than you need, and directly set its last character to space
after reading input. The user might find it annoying to have his last
character "disappear". The third way is to check whether r is equal to 0.
In that case there are no trailing spaces, so set r to the length of the
remainder of s after skipping the leading spaces in the last fragment.
To do this, insert the following

if (r == 0) r = len(s) - ((p-1) + (q-1)) ! no trailing space

after the line starting with r = .....
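
Putting the pieces together, a sketch of the "count the fields, then do a single internal read" variant might look like the following. It includes the r == 0 fix and assumes blank-separated numeric fields only; the names read_variable_row and count_fields, the sample line and the size of row are made up for illustration.

```fortran
program read_variable_row
  implicit none
  character(len=80) :: line
  real :: row(20)                 ! assumed upper limit on fields per row
  integer :: n, ios

  line = '11 12 13 14 15 16'      ! stands in for a record read with '(a)'
  n = count_fields(line)
  if (n > size(row)) stop 'too many fields'

  read (line, *, iostat=ios) row(1:n)   ! internal list-directed read
  if (ios /= 0) stop 'could not read numbers'

  print *, 'fields:', n
  print *, row(1:n)

contains

  ! Count blank-separated fields in s, using the verify/scan loop above.
  integer function count_fields(s) result(nf)
    character(len=*), intent(in) :: s
    integer :: p, q, r
    nf = 0
    p = 1
    do
       if (p > len(s)) exit                       ! ran off the end of s
       q = verify(s(p:), ' ')                     ! first non-blank from p
       if (q == 0) exit                           ! only blanks remain
       r = scan(s(p+q:), ' ')                     ! next blank after it
       if (r == 0) r = len(s) - ((p-1) + (q-1))   ! field runs to end of s
       nf = nf + 1
       p = p + (q-1) + (r-1) + 1                  ! step past this field
    end do
  end function count_fields

end program read_variable_row
```

Because count_fields takes an assumed-length argument, it can also be called with a string that has no trailing blank at all, for example count_fields('17 18'), which is the case the original loop could not handle.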

I hope _this_ code is bug free.

- e

feenberg

2008-10-18 14:10:47 UTC

Post by Luka Djigas
Hello everyone,
1 2 3 4 5 6
----------------
1 | 1 2 3 4 5 6
2 | 7 8 9 10
3 | 11 12 13 14 15 16
4 | 17 18
5 | 19 20 21 22 23 24
I have a table like the one above (only bigger and uglier :-) in which
I don't know the number of elements in each row.
I'm reading it into an array of (nrows,ncolumns; in this case (5,6)),
but since I don't know the number of elements, I was wondering is
there a way to read all the elements in one row, something like
dimension a(5,6)
open(unit=1, ...)
do 10 i=1,nrows
read(1,*,eor=10)(a(i,j),j=1,999)
10 continue
and then determine the number of elements in each row of an array ?
Also, since I will need to interpolate in 2 directions between the
given values (for example for x=5.5 and y=4.5) is there a way in such
an array to determine the difference between the loaded value from a
file, and default value (zero) from just declaring an array ?
In other words, how to determine whether both values for interpolation
exists ? In the above example, if x are horizontal values, value of
x=5.5 and y=4.5 would not exist because row "17 18" doesn't have 6
elements?
I would appreciate all your suggestions and input on this topic.
Best regards
Luka Djigas

If you can modify the input data, a very simple solution is
available. Just add a "/" (slash) at the end of each line. A
free-format (list-directed) read will always terminate at the slash
mark, leaving the remaining elements of the read list unchanged. If you
initialize those elements to a flag value, then you can determine which
were read. So,
for example if the input line is:

1 2 3 /

and the Fortran is:

d=-1
read(*,*) a, b, c, d

then d will remain -1, but a, b, and c will be set to 1, 2, and 3. I use
this all the time and have always wondered why it was nearly unknown.
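
Applied to one row of the table, the same idea can be sketched as below. The flag value -999.0 and the names are assumptions for illustration, and the line is read from a character variable only to keep the example self-contained. The count is valid only if the flag value never occurs in the real data.

```fortran
program slash_terminated
  implicit none
  real, parameter :: flag = -999.0   ! assumed value that never occurs in the data
  character(len=80) :: line
  real :: row(6)
  integer :: ios, n

  line = '17 18 /'                   ! one row with the slash appended
  row = flag                         ! pre-fill so unread elements are recognisable

  read (line, *, iostat=ios) row     ! stops at the slash; row(3:6) is untouched
  if (ios /= 0) stop 'read failed'

  n = count(row /= flag)             ! number of elements actually read
  print *, 'n =', n
  print *, row(1:n)
end program slash_terminated
```

Without the slash, a list-directed read from a file that still needs more items simply carries on into the next record, which is why the terminator matters here.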

Daniel Feenberg
