Re: [Orekit Developers] Is serialization of propagators really useful

On Wed, Oct 31, 2012 at 12:02 PM, MAISONOBE Luc <luc.maisonobe@c-s.fr> wrote:

Hi all,

Hi Luc,

From the very beginning of Orekit (back in 2002), some basic principles have been applied throughout the library. These principles are based on the rationale that the library is a middle-level one that can be used in very different contexts and hence must be as robust as possible. One of these principles is to have immutable objects as much as possible (orbits, dates, transforms, attitudes, all of these are immutable and will stay immutable). Another of these principles was that all objects should implement Serializable if possible. The rationale for this latter principle was that users of the library may need to store references to Orekit objects in their own classes, and they may need to have these classes serializable (for example to transfer them to a remote server using a message-oriented middleware in a distributed application).

As many Orekit objects are defined using top-level interfaces, the "implements Serializable" declaration was done at the interface level. This is what happens for the Propagator interface or for the AttitudeProvider interface. For now, all propagators are serializable, at least at the language level. In fact, this has not worked well for about three years for the numerical propagator, due to a change in Apache Commons Math (see <http://svn.apache.org/viewvc?view=revision&revision=786881>) released with version 2.0 of the library, which required a corresponding change in Orekit (see <https://www.orekit.org/forge/projects/orekit/repository/revisions/e49df7c6460576f46993558389b10a6b74b11430>). Since this change, when a numerical propagator is serialized and deserialized, the new propagator no longer has an integrator, so the user has to set a new one using the new setIntegrator method. This was not really satisfying.
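
To illustrate the symptom described above, here is a minimal sketch. It assumes the integrator is simply excluded from the serialized form by a transient field; the actual Orekit change may differ. The classes Integrator and Propagator below are hypothetical stand-ins, not the Orekit or Commons Math types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientFieldDemo {

    // Hypothetical stand-in for an integrator that is not Serializable.
    static class Integrator {
        final double step;
        Integrator(double step) { this.step = step; }
    }

    // Hypothetical stand-in for a propagator declared Serializable.
    static class Propagator implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double mass;
        // Assumption: the field is transient, so it is skipped when writing.
        private transient Integrator integrator;

        Propagator(double mass, Integrator integrator) {
            this.mass = mass;
            this.integrator = integrator;
        }
        double getMass() { return mass; }
        Integrator getIntegrator() { return integrator; }
        void setIntegrator(Integrator integrator) { this.integrator = integrator; }
    }

    // Serialize to a byte array and read the object back.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Propagator copy = (Propagator) roundTrip(new Propagator(1000.0, new Integrator(60.0)));
        // The plain field survives, the transient one comes back as null.
        System.out.println("mass = " + copy.getMass());
        System.out.println("integrator missing = " + (copy.getIntegrator() == null));
        // The caller has to restore it by hand before using the object.
        copy.setIntegrator(new Integrator(60.0));
    }
}
```

The object deserializes without error, which is what makes the problem easy to miss: nothing fails until the missing integrator is actually used.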

We are now facing worse problems as there are more places where objects cannot really be serialized. As an example, force models may require a celestial body such as the Sun, which is provided as a PVCoordinatesProvider, which is not serializable. Even if the providers available in Orekit such as CelestialBody could be serialized, it would be a very bad idea to do so as they can have huge datasets.

There were several (heated) discussions about serialization a while ago in Apache Commons Math (see <http://markmail.org/thread/26zmab5xo2rr4eap> for the last one). The outcome was that basically serialization must not be a general rule and should be done properly and on a case-by-case basis. One of the interesting outcomes of these discussions is a trick that allows users of a non-serializable class to still use serialization in their top-level class if they require it, using proxy objects to replace the non-serializable ones (the need can arise because they have a non-serializable field in their class or because they extend a non-serializable class). This is used in Apache Commons Math in a few places (see for example PointValuePair, which is a case where the serializable class extends a non-serializable one). We have also started using this trick in Orekit (see for example the DataTransferObject inner class in the InterpolatingTransformProvider class in the frames package).
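
For readers who have not seen the proxy trick, here is a minimal sketch of it using the standard writeReplace/readResolve hooks of Java serialization. The classes Pair and SerializablePair are hypothetical and only mirror the "serializable class extends a non-serializable one" case; the inner class name DataTransferObject is borrowed from the text above, but this is not the actual Commons Math or Orekit code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationProxyDemo {

    // Hypothetical base class: not Serializable, no no-arg constructor.
    static class Pair {
        private final double key;
        private final double value;
        Pair(double key, double value) { this.key = key; this.value = value; }
        double getKey() { return key; }
        double getValue() { return value; }
    }

    // Serializable subclass of the non-serializable class.
    static class SerializablePair extends Pair implements Serializable {
        private static final long serialVersionUID = 1L;

        SerializablePair(double key, double value) { super(key, value); }

        // Called when writing: the proxy goes into the stream instead of this object.
        private Object writeReplace() {
            return new DataTransferObject(getKey(), getValue());
        }

        // The proxy holds only plain data, so it serializes without trouble.
        private static class DataTransferObject implements Serializable {
            private static final long serialVersionUID = 1L;
            private final double key;
            private final double value;

            DataTransferObject(double key, double value) {
                this.key = key;
                this.value = value;
            }

            // Called when reading: rebuild the real object through its constructor.
            private Object readResolve() {
                return new SerializablePair(key, value);
            }
        }
    }

    // Serialize to a byte array and read the object back.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SerializablePair copy = (SerializablePair) roundTrip(new SerializablePair(1.5, -2.0));
        System.out.println(copy.getKey() + " " + copy.getValue());
    }
}
```

Because the real object is rebuilt through its normal constructor, the serialized form stays decoupled from the class internals, and the same pattern covers the case of a non-serializable field by storing in the proxy only what is needed to recreate it.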

Imho, shifting away from serializable algorithms is a good idea. I do not think there is a use case where distributed computing can or should be achieved by transmitting both function and data. In theory this may work well, but there is usually so much context-dependent information involved that it becomes impractical. Also, from a configuration management point of view, it is desirable to have control over your execution environment on a different layer of your system.

Regarding the serialization of immutable objects: I think Evan made a good point that the context in which an object is created (e.g. which ephemerides are loaded), and on which it also depends, must somehow be known when deserializing it again. Looking at the problem from a purely technical point of view, serializing only the bits and bytes the object is composed of may be sufficient, but imho it is also very important to know exactly what the data is all about, so you can use it correctly for later calculations or analysis.

We can serialize a CelestialBody or an Orbit, but how can we ensure that the environment present at its creation time is also present at the time of deserialization? And more importantly, how can we signal a mismatch, to prevent improper use of the data leading to errors that are very difficult to track down?

Maybe something like this should not be the goal of a library like Orekit, but then I would remove serialization support completely so as not to give users the wrong impression about its use (And I know that I now sound very much like Gilles ;-).

So our basic principle about serialization is difficult to manage, and thanks to the proxy objects trick it is in fact not really needed anymore. I would propose to follow Apache Commons Math policy and limit our use of Serializable to some basic classes where it makes sense (typically data containers like orbit, attitude, date, transforms, body shapes, geodetic point ...) and remove it for complex cases like Propagator or AttitudeProvider.

What do you think about this change?

Thomas