Efficient Template Tensor library.
This library is adapted to
MoFEM needs.. Several new capabilities are added to original
FTensor library, however that philosophy and spirit of original implementation is preserved.
If some operator is not working because is not added, you can let us know, or pleas feel free to add it by yourself and make pull request to merge your contribution with the library.
Example:
dx(i) = F(i,I)*dX(I);
C(I,J) = F(i,I)^F(i,J);
b(i,j) = F(i,I)^F(j,I);
FTensor
February 6, 2013
FTensor is a set of C++ classes that allow a great deal of abstraction when dealing with tensors, yet delivers uncompromising efficiency. It uses template expressions to provide expressiveness and speed.
FTensor is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. A copy of the license should be included in the file LICENSE.
FTensor uses template expressions to optimize code. I owe a huge debt to Todd Veldhuizen who originally used template expressions in developing the Blitz library (see www.oonumerics.org).
FTensor uses many of the ideas from that library.
FTensor's biggest claim to fame is that it handles implicit summation. Thus you can write
instead of having to write
Also, the result is strongly typed by the indices, so you can't write
or even
It has Tensor0,
Tensor1, Tensor2, Tensor2_symmetric, Tensor3_dg (symmetric on the first two indices), Tensor3_antisymmetric (antisymmetric on the last two indices), Tensor3_christof(symmetric on the last two indices), Tensor4_ddg(symmetric on the first two, and last two, indices) and Tensor4_Riemann(antisymmetric on the first two, and last two, indices, and symmetric under cyclic permutation of the last three indices). I wrote this for a General Relativity code, so that is why I implemented this particular choice of tensors.
The dimension of the tensors are determined by a template parameter. For two, three, or four dimensions, everything should just work. For higher dimensions, the only caveat is that you can't use a simple constructor like
You have to go in to FTensor_new/Tensor1/Tensor1_value.hpp and add it in. If you want to use it for pointers, you have to add it to FTensor_new/Tensor1/Tensor1_pointer.hpp. You don't have to do this. You could instead just type
T1(0)=0;
T1(1)=1;
T1(2)=2;
T1(3)=3;
T1(4)=4;
It is more unsafe, but it doesn't require you to mess with the internals of the code. If you want to turn on bounds checking (so that, for example, T1(5) will give you a run time error) then compile everything with FTENSOR_DEBUG defined.
If you would like higher dimensional constructors, please let me know and I'll put them in.
You can store a Tensor2 in either column-major (the default) or row-major format. It comes in the form of an optional fourth template parameter. So to get column-major format, you can rely on on the default
or you can declare it explicitly
Tensor2<double,2,3,column_major> T2;
To get row-major format (for FORTRAN or some graphics libraries), declare it as
Tensor2<double,2,3,row_major> T2;
It can also handle pointers to doubles. So you could write
double a0[10000], a1[10000], a2[10000], b0[10000], b1[10000], b2[10000],
c0[10000], c1[10000], c2[10000];
for(int a=0;a<10000;a++)
{
A(i)=B(i)+C(i);
++A;
++B;
++C;
}
If you are familiar with Blitz, it also uses template expressions to optimize code. However, Blitz optimizes one expression at a time. So, for example, if you want to invert a 3x3 matrix, you can write it like
det=a(0,0)*a(1,1)*a(2,2)
+ a(1,0)*a(2,1)*a(0,2)
+ a(2,0)*a(0,1)*a(1,2)
- a(0,0)*a(2,1)*a(1,2)
- a(1,0)*a(0,1)*a(2,2)
- a(2,0)*a(1,1)*a(0,2);
inverse(0,0)= (a(1,1)*a(2,2) - a(1,2)*a(1,2))/det;
inverse(0,1)= (a(0,2)*a(1,2) - a(0,1)*a(2,2))/det;
inverse(0,2)= (a(0,1)*a(1,2) - a(0,2)*a(1,1))/det;
inverse(1,1)= (a(0,0)*a(2,2) - a(0,2)*a(0,2))/det;
inverse(1,2)= (a(0,2)*a(0,1) - a(0,0)*a(1,2))/det;
inverse(2,2)= (a(1,1)*a(0,0) - a(1,0)*a(1,0))/det;
However, det is just going to be thrown away at the end. We don't need to store it for all (10000 or whatever) points. We just need to compute it for one point, use it in six expressions, and forget it. The Blitz method makes you ship the memory of det in and out of the cache 6 times. A better way to do this is to put the whole inversion into one loop. I've seen a factor of 4 improvement doing it this way. The disadvantages, which are all-too-real, are that you have to manually start the loop, and you have to remember to increment the variables. In the case of the inversion, it ends up looking like
double det;
for(int i=0;i<10000;i++
{
det=a(0,0)*a(1,1)*a(2,2)
+ a(1,0)*a(2,1)*a(0,2)
+ a(2,0)*a(0,1)*a(1,2)
- a(0,0)*a(2,1)*a(1,2)
- a(1,0)*a(0,1)*a(2,2)
- a(2,0)*a(1,1)*a(0,2);
inverse(0,0)= (a(1,1)*a(2,2) - a(1,2)*a(1,2))/det;
inverse(0,1)= (a(0,2)*a(1,2) - a(0,1)*a(2,2))/det;
inverse(0,2)= (a(0,1)*a(1,2) - a(0,2)*a(1,1))/det;
inverse(1,1)= (a(0,0)*a(2,2) - a(0,2)*a(0,2))/det;
inverse(1,2)= (a(0,2)*a(0,1) - a(0,0)*a(1,2))/det;
inverse(2,2)= (a(1,1)*a(0,0) - a(1,0)*a(1,0))/det;
++a;
++inverse;
}
Forgetting to put in the ++ operators could result in subtle bugs. You could also have problems if you put in more than one loop:
for(int i=0;i<10000;i++)
{
a(i,j)=...
++a;
}
for(int i=0;i<10000;i++)
{
a(i,j)+=...
++a;
}
This will end up writing off of the end of a. Furthermore, I use the restrict keyword, so you might get some weird problems if you try to alias things. You might want to #define the restrict away. I found that it actually decreased performance for extremely complicated expressions.
Basically, you're giving up some expressive power.
It can handle quite complex expressions. As a real life example
K_new(i,j)=Lapse*(R(i,j) + Trace_K*K(i,j) - (2*K(i,k)^K_mix(k,j))
- 0.5*matter_ADM*g(i,j) - S_ADM(i,j))
+ Shift_up(k)*dK(i,j,k)
+ (K(i,k)*dShift_up(k,j) || K(j,k)*dShift_up(k,i))
- ddLapse(i,j) + (dLapse(k)*christof(k,i,j));
K_new is symmetric, the ^ operator means contract to make a Tensor2_symmetric, and the || means add to make a Tensor2_symmetric (it is not a symmetrizer, so it doesn't divide by 2). I had to use these operators (instead of * and +) to keep the compilers from making a Tensor2 instead. You can't assign a Tensor2 to a Tensor2_symmetric, so you have to explicitly request the symmetrized result.
KCC was able to optimize the entire expression. I don't know if any other compilers can fully optimize these things well. gcc doesn't (though that isn't surprising). I couldn't get SGI's CC compiler to do it either. xlC can't optimize it, and I've had problems with Internal Compiler Errors and incorrect code. The Portland Group's compiler doesn't optimize it. For a more complete discussion of compilers, please see the paper at
http://www.wlandry.net/Projects/FTensor
If your compiler can't optimize the expressions, then you might end up with something slower than doing everything by hand, but maybe not. For small examples (like the matrix inversion) there is a large slowdown (factors of three or more). However, in my real-life code, I saw slow downs of only 10-20% using gcc or xlC compared to KCC. I think I was still dominated by memory bandwith.
Also, not all possible operations are supported. I can't think of any right now, but I know they exist. It is not hard to add in operations, but I don't need it, so I haven't done it. A somewhat more useful extension might be antisymmetric rank 2 tensors. I have a way of doing that, but it isn't particularly nice. I won't describe it here, except to say that it is unsafe. You can look in Tensor3_antisymmetric for how I handled the antisymmetries in that.
There is a directory for tests with a README that should explain how to use them.
If you have any questions, feedback, bug reports, feature requests, etc., feel free to send me an email.
Enjoy, Walter Landry
wland.nosp@m.ry@c.nosp@m.altec.nosp@m.h.ed.nosp@m.u- Todo:
- Some operator for tensor4 and Tensor4_ddg are not tested for varied number of dimensions. For example first two indices has dimension 2 and last two 4.
- Todo:
- Some operators for symmetric tensor 4 need to be implemented.
- Todo:
- Some operators for symmetric tensor 3 on last two indices are not implemented.
- Todo:
- Tensor4 that has symmetry on minor fist two and last two and major symmetry between fist and last two.
- Todo:
- Tensor4 that has major symmetric only on first two and last two.
- Todo:
- More documentation and examples. Documentation of functions and very limited at this point