Discussion: CPU time for transcendental functions
Robinn
2023-12-15 01:59:56 UTC
I got some old neural network code (written about 30 years ago).
It has several activation functions, each of which changes only two lines, like so:

      if (activation(1:2).eq.'SI' .or. activation(1:2).eq.'LO') then
         output(i,j) = 1.0/(1.0+EXP(-output(i,j)))       ! sigmoid
         slope(i,j)  = output(i,j) * (1.0 - output(i,j)) ! sigmoid
      elseif (activation(1:2).eq.'TA') then
         output(i,j) = TANH(output(i,j))                 ! TANH
         slope(i,j)  = 1.0 - output(i,j)*output(i,j)     ! TANH
      elseif (activation(1:2).eq.'AR') then
         y = output(i,j)
         output(i,j) = ATAN(y)                           ! arctan
         slope(i,j)  = 1.0/(1.0 + y*y)                   ! arctan
      elseif (activation(1:5).eq.'SOFTP') then
         y = EXP(output(i,j))
         output(i,j) = LOG(1.0+y)                        ! softplus
         slope(i,j)  = 1.0/(1.0 + 1.0/y)                 ! softplus
      elseif (activation(1:5).eq.'SOFTS') then
         y = output(i,j)
         output(i,j) = y/(ABS(y)+1.0)                    ! softsign
         slope(i,j)  = 1.0/(1.0+ABS(y))**2               ! softsign

Now when running it, the tanh option is slowest, as expected.
But the sigmoid (using exp) is faster than softsign, which only needs
abs and simple arithmetic. How can this be? Even if exp is using a
table lookup and spline interpolation, I would think that is slower.
Softsign would have an extra divide, but I can't see that tipping the
scales.
Steven G. Kargl
2023-12-15 04:22:13 UTC
Post by Robinn
I got some old neural network code (written about 30 years ago).
      if (activation(1:2).eq.'SI' .or. activation(1:2).eq.'LO') then
         output(i,j) = 1.0/(1.0+EXP(-output(i,j)))       ! sigmoid
         slope(i,j)  = output(i,j) * (1.0 - output(i,j)) ! sigmoid
      elseif (activation(1:2).eq.'TA') then
         output(i,j) = TANH(output(i,j))                 ! TANH
         slope(i,j)  = 1.0 - output(i,j)*output(i,j)     ! TANH
      elseif (activation(1:2).eq.'AR') then
         y = output(i,j)
         output(i,j) = ATAN(y)                           ! arctan
         slope(i,j)  = 1.0/(1.0 + y*y)                   ! arctan
      elseif (activation(1:5).eq.'SOFTP') then
         y = EXP(output(i,j))
         output(i,j) = LOG(1.0+y)                        ! softplus
         slope(i,j)  = 1.0/(1.0 + 1.0/y)                 ! softplus
      elseif (activation(1:5).eq.'SOFTS') then
         y = output(i,j)
         output(i,j) = y/(ABS(y)+1.0)                    ! softsign
         slope(i,j)  = 1.0/(1.0+ABS(y))**2               ! softsign
Now when running it, the tanh option is slowest, as expected.
But the sigmoid (using exp) is faster than softsign, which only needs
abs and simple arithmetic. How can this be? Even if exp is using a
table lookup and spline interpolation, I would think that is slower.
Softsign would have an extra divide, but I can't see that tipping the
scales.
There is insufficient information to provide much help. First, what
compiler and operating system? Second, how did you do the timing?
Third, is there a minimum working example that others can profile?
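As to the third point, even a tiny self-contained program that times the two
formulas in isolation would do. The array size, repeat count, and the
SYSTEM_CLOCK-based timing below are just one possible setup, not taken from
your code:

      program time_mwe
      implicit none
      integer, parameter :: n = 1000000, nrep = 100
      real :: x(n), out(n), sl(n)
      integer :: k, t0, t1, rate

      call random_number(x)
      x = 8.0*x - 4.0              ! inputs roughly in [-4,4]

! sigmoid: 1/(1+exp(-x)) and its slope
      call system_clock(t0, rate)
      do k = 1, nrep
         out = 1.0/(1.0+EXP(-x))
         sl  = out*(1.0-out)
      end do
      call system_clock(t1)
! the printed checksum keeps the loops from being optimized away
      print *, 'sigmoid : ', real(t1-t0)/real(rate), sum(out)+sum(sl)

! softsign: x/(1+|x|) and its slope
      call system_clock(t0)
      do k = 1, nrep
         out = x/(ABS(x)+1.0)
         sl  = 1.0/(1.0+ABS(x))**2
      end do
      call system_clock(t1)
      print *, 'softsign: ', real(t1-t0)/real(rate), sum(out)+sum(sl)
      end program time_mwe

Compiled with and without optimization, something like this would show whether
EXP itself dominates or whether the difference comes from elsewhere.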
--
steve
Giorgio Pastore
2023-12-22 14:37:52 UTC
Post by Steven G. Kargl
Post by Robinn
I got some old neural network code (written about 30 years ago).
      if (activation(1:2).eq.'SI' .or. activation(1:2).eq.'LO') then
         output(i,j) = 1.0/(1.0+EXP(-output(i,j)))       ! sigmoid
         slope(i,j)  = output(i,j) * (1.0 - output(i,j)) ! sigmoid
      elseif (activation(1:2).eq.'TA') then
         output(i,j) = TANH(output(i,j))                 ! TANH
         slope(i,j)  = 1.0 - output(i,j)*output(i,j)     ! TANH
      elseif (activation(1:2).eq.'AR') then
         y = output(i,j)
         output(i,j) = ATAN(y)                           ! arctan
         slope(i,j)  = 1.0/(1.0 + y*y)                   ! arctan
      elseif (activation(1:5).eq.'SOFTP') then
         y = EXP(output(i,j))
         output(i,j) = LOG(1.0+y)                        ! softplus
         slope(i,j)  = 1.0/(1.0 + 1.0/y)                 ! softplus
      elseif (activation(1:5).eq.'SOFTS') then
         y = output(i,j)
         output(i,j) = y/(ABS(y)+1.0)                    ! softsign
         slope(i,j)  = 1.0/(1.0+ABS(y))**2               ! softsign
Now when running it, the tanh option is slowest, as expected.
But the sigmoid (using exp) is faster than softsign, which only needs
abs and simple arithmetic. How can this be? Even if exp is using a
table lookup and spline interpolation, I would think that is slower.
Softsign would have an extra divide, but I can't see that tipping the
scales.
There is insufficient information to provide much help. First, what
compiler and operating system? Second, how did you do the timing?
Third, is there a minimum working example that others can profile?
Fourth, what were the actual timing numbers?

Giorgio
Thomas Jahns
2024-01-30 08:40:22 UTC
Post by Robinn
I got some old neural network code (written about 30 years ago).
      if (activation(1:2).eq.'SI' .or. activation(1:2).eq.'LO') then
         output(i,j) = 1.0/(1.0+EXP(-output(i,j)))       ! sigmoid
         slope(i,j) = output(i,j) * (1.0 - output(i,j)) ! sigmoid
      elseif (activation(1:2).eq.'TA') then
         output(i,j) = TANH(output(i,j))                 ! TANH
         slope(i,j) = 1.0 - output(i,j)*output(i,j)     ! TANH
      elseif (activation(1:2).eq.'AR') then
         y = output(i,j)
         output(i,j) = ATAN(y)                           ! arctan
         slope(i,j) = 1.0/(1.0 +y*y)                  ! arctan
      elseif (activation(1:5).eq.'SOFTP') then
         y = EXP(output(i,j))
         output(i,j) = LOG(1.0+y)                        ! softplus
         slope(i,j) = 1.0/(1.0+1.0/y)               ! softplus
      elseif (activation(1:5).eq.'SOFTS') then
         y = output(i,j)
         output(i,j) = y/(ABS(y)+1.0)                    ! softsign
         slope(i,j) = 1.0/(1.0+ABS(y))**2             ! softsign
Now when running it, the tanh option is slowest, as expected.
But the sigmoid (using exp) is faster than softsign, which only needs
abs  and simple arithmetic. How can this be? Even if exp is using a table lookup
and spline interpolation, I would think that is slower.
Softsign would have an extra divide, but I can't see that tipping the scales.
You are perhaps not aware that the string comparisons in your conditionals
(for which most compilers emit a call to the strncmp function) are quite
expensive on today's CPUs. I would recommend using an INTEGER constant to
make the switch, as sketched below.
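Roughly like the following; the act_code variable and the numeric codes are
only illustrative, and only two of the branches are spelled out. The string is
decoded once before the loops, so only an integer comparison remains inside:

      integer act_code
      integer, parameter :: ACT_SIGMOID = 1, ACT_TANH = 2
      integer, parameter :: ACT_ATAN = 3, ACT_SOFTPLUS = 4
      integer, parameter :: ACT_SOFTSIGN = 5

! decode the activation string once, before the loops over i and j
      if (activation(1:2).eq.'SI' .or. activation(1:2).eq.'LO') then
         act_code = ACT_SIGMOID
      elseif (activation(1:2).eq.'TA') then
         act_code = ACT_TANH
      elseif (activation(1:2).eq.'AR') then
         act_code = ACT_ATAN
      elseif (activation(1:5).eq.'SOFTP') then
         act_code = ACT_SOFTPLUS
      elseif (activation(1:5).eq.'SOFTS') then
         act_code = ACT_SOFTSIGN
      endif

! inside the loops, only the integer is tested
      select case (act_code)
      case (ACT_SIGMOID)
         output(i,j) = 1.0/(1.0+EXP(-output(i,j)))       ! sigmoid
         slope(i,j)  = output(i,j) * (1.0 - output(i,j)) ! sigmoid
      case (ACT_SOFTSIGN)
         y = output(i,j)
         output(i,j) = y/(ABS(y)+1.0)                    ! softsign
         slope(i,j)  = 1.0/(1.0+ABS(y))**2               ! softsign
! ... the remaining activations go in their own CASE blocks
      end select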

Thomas
