Discussion:
automatic definition of character variable: maximum length and initialization problems
Stefano Zaghi
2013-08-20 15:24:50 UTC
Hi all,

I have some problems with the definition of a character variable inside a procedure (e.g. a function). The problem seems very trivial, but...

I have a huge rank-1 array as input, and I would like to define a character variable, local to the function, whose length equals the array size (an automatic-length variable).

The pseudo code of the function could be:

function foo(array)
  implicit none
  integer, intent(in) :: array(1:)
  logical :: foo
  character(len=size(array, dim=1, kind=8)) :: string

  ! ... do something with string ...
  foo = .true.  ! placeholder so that the result is defined
  return
end function foo

This approach fails if the array is very large (in particular, if it has more than 2^31-1 elements): it produces a zero-length string. I do not know whether the standard (Fortran 2003) fixes a limit on the length of character variables; I only know that the Intel Compiler documents the following behavior:

The largest valid value for len ... is 2**31-1 on IA-32 architecture; 2**63-1 on Intel® 64 architecture. Negative values are treated as zero.

This leads me to suppose that the maximum character length is compiler-dependent.

However, a workaround seems to exist, namely avoiding the "size" function in the string declaration:

function foo2(n, array)
  implicit none
  integer(8), intent(in) :: n
  integer, intent(in) :: array(1:n)
  logical :: foo2
  character(len=n) :: string

  ! ... do something with string ...
  foo2 = .true.  ! placeholder so that the result is defined
  return
end function foo2

This approach produces a string of the correct length even if n>2^31-1 (on a 64-bit architecture, of course). This is true only for the Intel Compiler, whereas gfortran (4.7) fails in both cases.
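
A minimal sketch of how this workaround could be wrapped so that callers still pass only the array: the wrapper evaluates "size" in an ordinary expression and forwards the result to a worker whose character length depends only on a dummy argument, as in foo2. The names string_length and string_length_n are hypothetical, the demo array is tiny, and whether this avoids the problem on a given compiler version is an assumption to be checked.

```fortran
program len_workaround_demo
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer :: array(10)

  array = 0
  print *, 'string length: ', string_length(array)

contains

  ! Wrapper (hypothetical name): size() is evaluated in an ordinary
  ! expression, not in the character length specification.
  function string_length(array) result(length)
    integer, intent(in) :: array(:)
    integer(int64) :: length

    length = string_length_n(size(array, kind=int64))
  end function string_length

  ! Worker (hypothetical name): the automatic character length depends
  ! only on the dummy argument n, as in foo2.
  function string_length_n(n) result(length)
    integer(int64), intent(in) :: n
    integer(int64) :: length
    character(len=n) :: string

    length = len(string, kind=int64)
  end function string_length_n
end program len_workaround_demo
```

The cost of this design is one extra procedure call; the benefit is that the caller's interface stays the same as in the one-argument version.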

So that you can reproduce my problem, I have prepared a minimal program:

program test_length
  implicit none
  integer, parameter :: I8P = selected_int_kind(18)
  integer, allocatable :: array(:)
  integer(I8P), parameter :: n32 = 2_I8P**31 - 1_I8P, n64 = 2_I8P**31
  integer(I8P) :: n
  character(len=n32) :: string32
  character(len=n64) :: string64
  integer :: ierr

  print *,'32bit test'
  n = n32
  allocate(array(1:n), stat=ierr)
  print *,'allocate stat: ',ierr
  print *,'size : ',size(array,dim=1,kind=I8P)
  print *,'len_ok : ',len_ok(n=n)
  print *,'len_ko : ',len_ko(array=array)
  deallocate(array)
  print *
  print *,'64bit test'
  n = n64
  allocate(array(1:n), stat=ierr)
  print *,'allocate stat: ',ierr
  print *,'size : ',size(array,dim=1,kind=I8P)
  print *,'len_ok : ',len_ok(n=n)
  print *,'len_ko : ',len_ko(array=array)
  deallocate(array)
  print *
  print *,'Direct len function testing'
  print *,'len(string32) : ',len(string32,kind=I8P)
  print *,'len(string64) : ',len(string64,kind=I8P)
  stop
contains
  ! Length taken from size(array): the case that fails for n > 2^31-1.
  function len_ko(array)
    implicit none
    integer, intent(in) :: array(1:)
    integer(kind=I8P) :: len_ko
    character(len=size(array,dim=1,kind=I8P)) :: string

    len_ko = len(string,kind=I8P)
    return
  end function len_ko

  ! Length taken from the dummy argument n: the case that works.
  function len_ok(n)
    implicit none
    integer(kind=I8P), intent(in) :: n
    integer(kind=I8P) :: len_ok
    character(len=n) :: string

    len_ok = len(string,kind=I8P)
    return
  end function len_ok
end program test_length

Running it on my workstation (GNU/Linux 3.5.0-18 x86_64 with 24 GB of RAM), I obtained:

32bit test
allocate stat: 0
size : 2147483647
len_ok : 2147483647
len_ko : 2147483647

64bit test
allocate stat: 0
size : 2147483648
len_ok : 2147483648
len_ko : 0

Direct len function testing
len(string32) : 2147483647
len(string64) : 2147483648

This is true for the Intel Compiler (13.x), whereas, on the same workstation, gfortran (4.7) always fails if n>2^31-1.

Note that the "size" function works properly outside the character definition.

Note also that substituting "size" with "ubound" (for example) in the len_ko string declaration does not solve the problem.

Can someone explain the behavior of the code listed above?
Thank you very much for any suggestions,
sincerely
Stefano
glen herrmannsfeldt
2013-08-20 17:08:03 UTC
Post by Stefano Zaghi
I have some problems with the definition of a character
variable inside a procedure (e.g. a function). The problem
seems very trivial, but...
I have a huge array (rank 1) in input and I would like to define
a (local to the function) character variable with the length
equals to the array size (automatic length size).
(snip)
Post by Stefano Zaghi
This approach fails if the array is very huge (in particular
if the number of its elements is greater than 2^31-1) producing
a zero length string. I do not know if the standard (std2003)
fix a limit on the character variables length, I only know
Pretty much all compilers have some length restrictions, and
the standard allows for that. It might be that many 64 bit
compilers only allow 32 bits for character length.

Seems to me that one fix is to use an array of CHARACTER*1
instead of a string. Given that it is supposed to match
an array, that might even be a better way.

The limit for CHARACTER*1 arrays should usually be the same
as the limit for other arrays.

That does mean that you can't use some intrinsic functions
that work on CHARACTER strings, though.
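
A minimal sketch of the CHARACTER*1 array idea, assuming a small demo array; the names char1_array_demo, count_chars and chars are hypothetical. The automatic array holds one character per element of the input, so the limit that applies is the compiler's limit on array sizes rather than its limit on scalar character lengths.

```fortran
program char1_array_demo
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer :: array(5)

  array = 0
  print *, 'number of characters: ', count_chars(array)

contains

  function count_chars(array) result(n)
    integer, intent(in) :: array(:)
    integer(int64) :: n
    ! Automatic array of single characters, one per element of array.
    character(len=1) :: chars(size(array, kind=int64))

    chars = 'x'
    n = size(chars, kind=int64)
  end function count_chars
end program char1_array_demo
```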

-- glen
James Van Buskirk
2013-08-20 22:35:01 UTC
Post by Stefano Zaghi
This let me suppose that the maximum character length is
compiler-dependent.
gfortran uses INTEGER(C_INT) rather than INTEGER(C_SIZE_T) for
string lengths internally. I can't recall whether there are any
plans to change this.
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
glen herrmannsfeldt
2013-08-21 00:07:22 UTC
Post by James Van Buskirk
Post by Stefano Zaghi
This let me suppose that the maximum character length is
compiler-dependent.
gfortran uses INTEGER(C_INT) rather than INTEGER(C_SIZE_T) for
string lengths internally. I can't recall whether there are any
plans to change this.
Seems right to me. I presume you can still make arrays out of
those large strings.

Wouldn't surprise me if some compilers still used 16 bit
for the length, even on 32 bit systems.

The OP didn't really say what he wanted, but I suspect that
an array is a better choice.

-- glen
Stefano Zaghi
2013-08-21 06:47:08 UTC
Hi,
thank you all.

I think all your comments are right, but...

I had already supposed that gfortran has a different limit, but Intel Fortran supports 64-bit lengths, so I am surprised that the "len_ko" function does not work correctly while "len_ok" does.

As far as the 1-character array is concerned, I already use an array of single characters. Indeed, the problem showed up just when I tried to convert the array of single characters to a single string.

However, none of you has answered these questions:
1) Why does the "len_ko" function fail while "len_ok" does not?
2) Have you tried the posted test code?

Thank you all again.
glen herrmannsfeldt
2013-08-21 07:10:08 UTC
Stefano Zaghi <***@gmail.com> wrote:

(snip)
Post by Stefano Zaghi
I have already supposed that gfortran has different limit,
but intel fortran support 64bit length, thus I am surprised
that "len_ko" function does not work right while "len_ok" does.
As the 1-character array is concerned, I have already used an
array of 1 character. Indeed, the problem has been pointed out
just when I have tried to convert the array of 1 character to
a single string.
1) why "len_ko" function fails while "len_ok" one not?
2) Do you have tried the test code posted?
Well, "just because" isn't so far off, but I would test with
a length at least 2**32 to be sure. It isn't so obvious what
might happen with 2**31.

Large arrays are reasonably well supported, though still aren't
used all that often. Character strings that long are pretty rare.

-- glen
Richard Maine
2013-08-21 07:47:23 UTC
Post by glen herrmannsfeldt
Post by Stefano Zaghi
1) why "len_ko" function fails while "len_ok" one not?
2) Do you have tried the test code posted?
Well, "just because" isn't so far off,
I'd say that was right on. Such limits are compiler dependent. There is
no particular requirement that a compiler's limit on character length be
the same as its limit on arrays. If you *REALLY* want to know why that
compiler has those particular limits, then the only way you are going to
find out is to ask the vendor. The rest of us can do no better than
speculate. The vendor might or might not tell you; that's not exactly
normal customer support information. And much of the reason that's not
normal customer support information is that it doesn't make any concrete
difference to the customer.

What makes a difference is what the limits are - not why that's what
they are. I could probably speculate on the reasons, but it would just
be speculation and, as noted above, it doesn't matter anyway. If you
complain to the vendor that you'd like higher limits, that's at least a
valid request. I'd say that the chances of your request making much
difference are negligible; that's probably a big enough deal that it
would take a lot of requests or maybe a single request by a really
important major customer who was willing to pay (a lot) for it. But
small as the odds are of such a request making much difference, those
odds are a *LOT* better than a complaint based on your possibly not
liking their reasons; that just has pretty much zero odds.
--
Richard Maine
email: last name at domain . net
domain: summer-triangle
Stefano Zaghi
2013-08-21 08:16:02 UTC
Dear Richard,

I think I was not clear enough.

I know that the length limit is compiler-dependent. I am not surprised that gfortran and Intel Fortran behave differently. I am not asking any vendor to support huge strings; I am looking for an explanation of the (to me) obscure behavior of character string initialization.

My question concerns the different behavior of the "len_ok" and "len_ko" functions in the case of 64-bit lengths. In particular, the problem is very simple:

Why does the declaration "character(n):: string" work fine (with n>2^31-1, so "n" is of kind I8P, which is the case of len_ok) while "character(size(array,kind=I8P)):: string" does not (with an array larger than 2^31-1 elements, which is the case of len_ko)?

The posted code was designed to show the different behavior of len_ok and len_ko in the case of n>2^31-1 (provided the compiler supports huge strings, of course).

Thank you again.
glen herrmannsfeldt
2013-08-21 09:45:34 UTC
Stefano Zaghi <***@gmail.com> wrote:

(snip)
Post by Stefano Zaghi
I know that the length limit is compiler-dependent. I am not
surprised that gfortran and intel fortran behave differently.
I am not asking for supporting huge string by any vendor,
I am searching for an explanation of the obscure (for me)
behaviors of the character strings initialization.
My question is related to the different behavior of "len_ok"
and "len_ko" functions in the case of 64bit length.
why "character(n):: string" declaration works fine
(with n>2^31-1, thus "n" is a I8P kind, that is the case
of len_ok) and "character(size(array,kind=I8P)):: string"
does not (with array larger than 2^31-1, that is the
case of len_ko)?
The first answer is that when you overflow, in general, or exceed
an implementation limit, in particular, surprising things might
happen.
Post by Stefano Zaghi
The code posted was designed for evidencing the different
behaviors between len_ok and len_ko in the case of n>2^31-1
(in the case the compiler used support huge strings, of course).
Not knowing the exact reason, I suspect it is related to the
use of 2**31. Note that 2**31 fits in unsigned 32 bits, but
not in signed 32 bits. Reasonably often that gives surprising
results. Among others, note that in twos complement 32 bit
that 2**31 overflows, in the usual implementation, to -(2**31).
Also, that -(-2**31)=-(2**31), again in the usual implementation
of overflow. That is, abs(-2147483648) is negative.

Try it with 2**32 instead, and see what it does.
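
A minimal sketch of the range issue, assuming the int32 and int64 kinds from iso_fortran_env are available. It only compares values in 64-bit arithmetic and deliberately avoids triggering an overflow, since the wrap-around described above is the usual hardware behavior and not something a program can portably rely on.

```fortran
program int32_range_demo
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
  integer(int64), parameter :: two31 = 2_int64**31

  ! Largest value representable in a signed 32-bit integer: 2**31 - 1.
  print *, 'huge(0_int32)          = ', huge(0_int32)
  ! 2**31 computed in 64-bit arithmetic, so no overflow occurs here.
  print *, '2**31 (64-bit)         = ', two31
  ! False: 2**31 is one more than the largest signed 32-bit value.
  print *, 'fits in signed 32 bit? ', two31 <= int(huge(0_int32), int64)
end program int32_range_demo
```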

-- glen
Stefano Zaghi
2013-08-21 09:55:45 UTC
Hi Glen,

thank you for your interest.
Post by glen herrmannsfeldt
The first answer is that when you overflow, in general, or exceed
an implementation limit, in particular, surprising things might
happen.
This is not a case of overflow: the Intel compiler supports 64-bit lengths; in fact, "character(n):: string" works fine with n>2^31-1. What I do not understand is why "character(size(array,kind=I8P)):: string" does not work.
Post by glen herrmannsfeldt
Not knowing the exact reason, I suspect it is related to the
use of 2**31. Note that 2**31 fits in unsigned 32 bits, but
not in signed 32 bits. Reasonably often that gives surprising
results. Among others, note that in twos complement 32 bit
that 2**31 overflows, in the usual implementation, to -(2**31).
Also, that -(-2**31)=-(2**31), again in the usual implementation
of overflow. That is, abs(-2147483648) is negative.
Try it with 2**32 instead, and see what it does.
I am sorry, Glen, I forgot to say that I have already run this test. For example, with 2^32 the output of the above code is:

32bit test
allocate stat: 0
size : 2147483647
len_ok : 2147483647
len_ko : 2147483647

64bit test
allocate stat: 0
size : 4294967296
len_ok : 4294967296
len_ko : 0

Direct len function testing
len(string32) : 2147483647
len(string64) : 4294967296

As you can see, the "len_ok" function works fine while "len_ko" does not. This is my question: why does this happen?

Thank you again.
James Van Buskirk
2013-08-21 18:33:12 UTC
Post by Stefano Zaghi
The case is not overflow: intel compiler support 64bit length, in
fact "character(n):: string" works fine with n>2^31-1. On the
contrary I do not understand why
"character(size(array,kind=I8P)):: string" does not work.
I speculate that the reason is that the failing expression is being
used as a specification expression. Fortran admits of three kinds
of expressions: ordinary expressions, initialization expressions,
and specification expressions. Actually there are some more like
specification expressions in ELEMENTAL functions, but let's keep
things simple. The rules for what is allowed differ for each
kind of expression, and what the compiler does with them is
different, too. It's almost like you need 3 different compilers:
one for each kind of expression (well, maybe 2.5 because
specification expressions are more similar to ordinary expressions
than initialization expressions).

You will find that what works normally in ordinary expressions can
break in initialization expressions or specification expressions
simply because the effective compiler that handles the latter two
kinds of expressions isn't as well tested as the one that compiles
ordinary expressions. My suggestion is that you send a bug report
to Intel or post the bug on their web page.
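
A minimal sketch of the three kinds of expression, with hypothetical names (expression_kinds_demo, n_const, show, buffer) and a tiny array. The comments mark which expression is which; the point is only that the same language accepts different things in each position and that they are evaluated at different times.

```fortran
program expression_kinds_demo
  implicit none
  ! Initialization (constant) expression: evaluated at compile time.
  integer, parameter :: n_const = 2 * 4
  integer :: values(3)

  values = [1, 2, 3]
  call show(values)

contains

  subroutine show(array)
    integer, intent(in) :: array(:)
    ! Specification expression: evaluated on entry to the procedure,
    ! here to fix the length of an automatic character variable.
    character(len=size(array) + n_const) :: buffer
    integer :: total

    ! Ordinary expression: evaluated when the statement executes.
    total = sum(array) + n_const
    buffer = 'x'
    print *, 'len(buffer) = ', len(buffer), ' total = ', total
  end subroutine show
end program expression_kinds_demo
```

With the three-element array used here, the length of buffer is 3 + 8 = 11 and total is 6 + 8 = 14.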
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
Stefano Zaghi
2013-08-22 07:41:05 UTC
Hi James,
thank you for your help.
Post by James Van Buskirk
I speculate that the reason is that the failing expression is being
used as a specification expression. Fortran admits of three kinds
of expressions: ordinary expressions, initialization expressions,
and specification expressions.
Ok, I am following you.
Post by James Van Buskirk
Actually there are some more like
specification expressions in ELEMENTAL functions, but let's keep
things simple.
I am getting lost...
Post by James Van Buskirk
The rules for what is allowed differs for each
kind of expression, and what the compiler does with them is
different, too.
Ok, so the standard does not specify what is allowed in a specification expression, is that right?
Post by James Van Buskirk
one for each kind of expression (well, maybe 2.5 because
specification expressions are more similar to ordinary expressions
than initialization expressions).
Mumble, mumble... I am getting lost again...
Post by James Van Buskirk
You will find that what works normally in ordinary expressions can
break in initialization expressions or specification expressions
simply because the effective compiler that handles the latter two
kinds of expressions isn't as well tested as the one that compiles
ordinary expressions.
This sounds reasonable. However, it also makes me worried. Is the standard so "reticent" about the "specification expression" of character variables?
Post by James Van Buskirk
My suggestion is that you send a bug report
to Intel or post the bug on their web page.
This is the last option, but I still suspect an error of mine.

Thank you again.
Richard Maine
2013-08-22 15:46:04 UTC
Stefano Zaghi <***@gmail.com> wrote:
[James wrote:]
Post by Stefano Zaghi
Post by James Van Buskirk
The rules for what is allowed differs for each
kind of expression, and what the compiler does with them is
different, too.
Ok, so the standard does not specify what is allowed into the
specification expression, is right?
No, not at all. The standard lays it out in excruciating detail. The
rules James is talking about are the rules in the standard. When he
says that the rules are different, he means different between the
different kinds of expression - not different between different
compilers.
Post by Stefano Zaghi
Post by James Van Buskirk
You will find that what works normally in ordinary expressions can
break in initialization expressions or specification expressions
simply because the effective compiler that handles the latter two
kinds of expressions isn't as well tested as the one that compiles
ordinary expressions.
This sounds reasonable. However, it also makes me worried. Is the standard
so "reticent" about the "specification expression" of character
variables?
No. Again, the standard specifies it in detail. When James talks about
what works in various kinds of expression, he is talking about compiler
bugs - not specifications of the standard. I might note that James
is rather noted on this forum for being able to come up with complicated
and tricky examples of specification and initialization expressions that
expose compiler bugs.
--
Richard Maine
email: last name at domain . net
domain: summer-triangle
Stefano Zaghi
2013-08-23 07:39:05 UTC
Thank you Richard,

do you think this is a compiler bug? Do you think this is a case to submit to the Intel developers?
Steve Lionel
2013-08-23 14:41:37 UTC
Post by Stefano Zaghi
Thank you Richard,
do you think this a compiler bug? Do you think this is a case to submit to Intel developers?
I think it is a compiler bug and I will let the developers know. For
some reason the LEN reference in len_ko is not properly picking up the
64-bit length.
--
Steve Lionel
Developer Products Division
Intel Corporation
Merrimack, NH

For email address, replace "invalid" with "com"

User communities for Intel Software Development Products
http://software.intel.com/en-us/forums/
Intel Software Development Products Support
http://software.intel.com/sites/support/
My Fortran blog
http://www.intel.com/software/drfortran

Refer to http://software.intel.com/en-us/articles/optimization-notice
for more information regarding performance and optimization choices in
Intel software products.
Stefano Zaghi
2013-08-26 07:18:10 UTC
Hi Steve,
thank you very much for your interest.

My best regards,
John Harper
2013-08-25 23:10:00 UTC
Post by Richard Maine
Post by Stefano Zaghi
Ok, so the standard does not specify what is allowed into the
specification expression, is right?
No, not at all. The standard lays it out in excruciating detail. The
rules James is talking about are the rules in the standard. When he
says that the rules are different, he means different between the
different kinds of expression - not different between different
compilers.
And different between different versions of the standard. For example you
may use the merge intrinsic in an f2003 specification expression but not an
f95 one. As Richard himself has often reminded us, few compilers implement
all of f2003, most implement all of f95 but only some of the f2003 features
that weren't in f95, and some also implement some f2008 features.
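
A minimal sketch of the merge case mentioned above, with hypothetical names (merge_spec_demo, show, buffer). It assumes a compiler that accepts merge in a specification expression; per the remark above, a strict f95 compiler may reject it.

```fortran
program merge_spec_demo
  implicit none

  call show(3)
  call show(-2)

contains

  subroutine show(n)
    integer, intent(in) :: n
    ! merge() used inside a specification expression: the length is n
    ! when n is positive and 0 otherwise.
    character(len=merge(n, 0, n > 0)) :: buffer

    print *, 'n = ', n, ' len(buffer) = ', len(buffer)
  end subroutine show
end program merge_spec_demo
```

The first call gives a length of 3 and the second a length of 0.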
--
John Harper
Stefano Zaghi
2013-08-26 07:21:42 UTC
Hi John,

you are right. Thank you.

Richard Maine
2013-08-21 16:41:05 UTC
Post by Stefano Zaghi
I know that the length limit is compiler-dependent. I am not surprised
that gfortran and intel fortran behave differently. I am not asking for
supporting huge string by any vendor, I am searching for an explanation of
the obscure (for me) behaviors of the character strings initialization.
Note that there is no initialization in evidence. I think you probably
mean "declaration", or possibly you could be talking about specification
expressions.
Post by Stefano Zaghi
why "character(n):: string" declaration works fine (with n>2^31-1, thus
"n" is a I8P kind, that is the case of len_ok) and
"character(size(array,kind=I8P)):: string" does not (with array larger
than 2^31-1, that is the case of len_ko)?
The code posted was designed for evidencing the different behaviors
between len_ok and len_ko in the case of n>2^31-1 (in the case the
compiler used support huge strings, of course).
If it were me, first, I'd want to verify that the compiler does indeed
support strings of the length you are using. Note that one case
happening to appear to work does not constitute support. It is not at
all unusual for things to partially work before they are complete.
Compiler limits like that ought to be documented; I'd want to check that
documentation to see if the compiler formally claims to support it.

Assuming that the compiler does support it, you might just be looking at
a bug in the processing of specification expressions. If that's the
case, then "why" still isn't really relevant. Something like "oops, we
overlooked that case" is likely as an answer. It would be worth
reporting the possible bug, however.
--
Richard Maine
email: last name at domain . net
domain: summer-triangle
Stefano Zaghi
2013-08-22 07:33:01 UTC
Hi Richard,

thank you for your interest.
Post by Richard Maine
Note that there is no initialization in evidence. I think you probably
mean "declaration", or possibly you could be talking about specification
expressions.
I am sorry, you are right: I said initialization, but I was thinking of the specification of the character length in the declaration.
Post by Richard Maine
If it were me, first, I'd want to verify that the compiler does indeed
support strings of the length you are using. Note that one case
happening to appear to work does not constitute support. It is not at
all unusual for things to partially work before they are complete.
Compiler limits like that ought to be documented; I'd want to check that
documentation to see if the compiler formally claims to support it.
As I wrote in the first post, the Intel Compiler documentation clearly states that this compiler supports 64-bit lengths.
Post by Richard Maine
Assuming that the compiler does support it, you might just be looking at
a bug in the processing of specification expressions. If that's the
case, then "why" still isn't really relevant. something like "oops, we
overlooked that case" is likely as an answer. It would be worth
reporting the possible bug, however.
Before suspecting a bug, I suspect my own lack of Fortran knowledge... In particular, I am not sure whether there are rules concerning the use of intrinsic functions (size, ubound, lbound, ...) in the length specification of a character variable.

Thank you again.
Richard Maine
2013-08-22 07:37:40 UTC
Post by Stefano Zaghi
Post by Richard Maine
Assuming that the compiler does support it, you might just be looking at
a bug in the processing of specification expressions. If that's the
case, then "why" still isn't really relevant. something like "oops, we
overlooked that case" is likely as an answer. It would be worth
reporting the possible bug, however.
Before thinking to a bug, I am thinking to my lack of Fortran knowledge...
In particular, I am not sure if there are some rules concerning the use
of intrinsic functions (size, ubound, lbound,...) into the dimension
specification of a character variable.
The code looks quite fine in that respect.
--
Richard Maine
email: last name at domain . net
domain: summer-triangle
JB
2013-08-21 07:39:59 UTC
Post by James Van Buskirk
Post by Stefano Zaghi
This let me suppose that the maximum character length is
compiler-dependent.
gfortran uses INTEGER(C_INT) rather than INTEGER(C_SIZE_T) for
string lengths internally. I can't recall whether there are any
plans to change this.
Yes, there is a plan to change to INTEGER(C_SIZE_T). While the change
itself is relatively trivial, it does break the procedure calling ABI
as well as the library ABI. So it's waiting for a bunch of other ABI
breaking changes to go in at the same time, primarily the new array
descriptor.
--
JB