[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [Orekit Developers] Multi-threading problems in OREKIT



Hello,

MAISONOBE Luc <luc.maisonobe@c-s.fr> a écrit :

Hello,

Here are some thoughts about the multi-threading issues and the direction we are aiming at. We have tried to combine requests from several different users (some have been expressed publicly in this list or in the bug tracker, some have been directed directly to the Orekit team). This is mainly a trade-off between various threads and includes contributions and ideas from several persons.
I would like to present it here so everyone interested can give  
their opinion about it.
Some work has been done on this topic and committed to the forge in a  
dedicated branch. See TimeStampedGenerator and TimeStampedCache. For  
now, no classes use these features.
We would be happy to get some feedback about this.

best regards,
Luc

The ideas are based on the reasons why we have some singletons in  
Orekit. Singletons are used for caching and are a really important  
feature. Without caching for example, the performances for Earth  
precession-nutation computation would be really really poor.
Many server-based applications that use multi-threading now do not  
handle Threads by themselves anymore (as was the case in the Java 4  
era). They rather use Javas 5 ExecutorService, and in most cases the  
implementation they use are thread pools provided by the factory  
method Executors.newFixedThreadPool. Such thread pools do *not*  
guarantee the same thread will serve all requests from a specific  
client. In fact, when a request is submitted to the executor  
service, a random idle thread is waken up to execute it and once  
completed the thread is put again in the idle pool where it can be  
picked up later. Threads are not started or stopped, they are  
recycled. There are no correlations at all between threads and  
requests and they are bound together almost at random. This clearly  
rules out ThreadLocal singletons as their caching is very likely to  
be invalidated in this case (see below). In fact, using ThreadLocal  
puts a very stringent assumption on how the application handles its  
threads so it cannot be reliably used in a general purpose library  
of intermediate level like Orekit. It is safe only at application  
level or at dedicated support library level where the global  
architecture is known in advance.
Some people have also proposed using memcached to handle such data.  
Memcached is for sure a complete solution, but seems rather big and  
would not scale down. It would also be a huge dependency for Orekit,  
so this solution was also discarded.
So we have some shared data cache we can't remove, and we must  
assume threads and requests are not bound together, and we don't  
want our cache to be invalidated at each request.
However, there is still hope. We can assume that a server would  
serve only a small number of remote clients (say a dozen or  
something like that) and that each client will issue requests that  
do have some temporal locality, i.e. they all correspond to some  
reduced time range. One very important property shared by our  
singletons is that they are all date based. This is the major trick  
we used in the following design.
As an example, we consider the following use case. Lets consider a  
typical operational ground system where several engineers work in  
parallel. At some time, we may have application A performing some  
computation for orbit determination on data centered around last  
week, application B computing next cycle maneuvers for the next 6  
months, application C doing some history browsing on data covering  
last year, and application D doing real time monitoring on current  
date. In this case, we have four applications each using a different  
time range. If these application use a shared central server to  
perform conversions between Earth and inertial frames for example,  
the server will get random requests in these four time ranges only,  
and will need to cache very large Earth Orientation Parameter data  
in each case.
The current version of Orekit fails miserably in this case as  
precession/nutation computations are cached in a singleton and the  
cache covers only a few days. So when a conversion request from one  
application is served just after a request from another application,  
the new request appears to be out of current cache range, so cache  
is invalidated and the complete cache must be recomputed (which is  
computing intensive). Then the request is answered, and the cache  
may be invalidated again just after that. We get cache misses all  
the time and the cache finally hinders performances a lot instead of  
improving them.
So what do we propose to solve this problem?

We propose to set up a cache dedicated to sequences of TimeStamped objects that would handle a small number of disjoint ranges at the same time (in our use case, four ranges would be needed). This cache must be thread safe and it must store large data sets. The cache would be created empty and be able to add new entries. The various ranges would be allocated as needed up to the user-configured max number. Ranges could be configured with a max number of points or max duration span to avoid huge memory consumption. Typically, a real-time monitoring could set up ranges limited to a few hours used as a rolling range as time flows, but station-keeping simulators would ask for almost month-long ranges as they go back and forth in the station-keeping cycle trying to optimize maneuvers. Entries within a single range would be invalidated in rolling cycles depending on request dates (inserting points at one side would involve dropping points at the other side, taking care time-locality is reversed according to forward or backward propagation). When a new range is needed, it would be allocated up to the max number without invalidating other ranges, but if the max number of ranges is exceeded (which would be a user configuration mistake), then another range would be dropped, using some standard cache policy, typically LRU (Least Recently Used).
We propose to use a single class for all these caches:

 public class TimeStampedCache<T extends TimeStamped> {
 }

The T parameter would be the type of data cached (UTCTAIOffset for the UTC scale, or similar classes for EOP or planetary ephemeris). The cache would only provide thread safety, ranges handling, entries handling within the ranges. It would not provide any computation on the entries.
This class would be used by higher level objects which would not  
store the complete data, but only a few reference to the elements  
they currently need. For example an object that would need simple or  
no interpolation like UTC-TAI would store only two references to the  
previous point and next point, whereas an object that would need  
higher degree polynomial interpolation like precession/nutation  
would store a sub-array of a dozen references. These objects being  
small, they could be reallocated, copied, made immutable if desired,  
depending on the need. They would rely on the TimeStampedCache  
methods to retrieve the references. Three methods are foreseen for  
this in the TimeStampedCache class:
 T[] getNeighbors(final AbsoluteDate central, final int n)
    throws OrekitException;

 T getBefore(final AbsoluteDate date)
    throws OrekitException;

 T getAfter(final AbsoluteDate date)
    throws OrekitException;

This is a follow-on on the idea to have small local data in small objects that can exist in many instances, and to have the big cache in a singleton. So the small objects could avoid handling multi-threading if they are built to be immutable for example, or they could handle simple synchronization or locking if they are mutable but rely on an already thread-safe class to retrieve their data set.
In order for the TimeStampedCache class to populate its ranges as  
requests arrive, it needs some way to compute the entry points. This  
is done by providing the class with a generator at build time. A  
generator is simply an implementation of the following interface:
public interface TimeStampedGenerator<T extends TimeStamped> {

    /** Generate an entry to be cached.
     * @param date date at which the entry should be generated
     * (may be null if no entry can be generated)
     * @exception OrekitException if entry generation is attempted but fails
     */
    T generate(AbsoluteDate date) throws OrekitException;

}

Note that some dedicated logic is already foreseen to cope with time ranges that can never be extended (like leap seconds which did not exist prior to 1972) but should not trigger errors if a request is made far in the past, and to also cope with time ranges that should theoretically be extensible but due to missing data cannot provide points at some requested datrd and should trigger an error. This is an implementation detail we will not describe in this message.
With this setting, a cache for UTC-TAI would provide a generator  
that is based on reading the UTC-TAI history file, a cache for  
planetary ephemeris would provide a generator that is based on  
reading JPL or IMCCE files, a cache for precession/nutation would  
provide a generator that is a self-contained computation using IERS  
conventions.
Another point worth mentioning is that TimeStampedCache could handle  
multi-threading using the standard ReentrantReadWriteLock from the  
concurrency package instead of synchronzed blocks. Such locks allow  
multiple read at the same time, which is OK and scales well. They  
guard against multiple writes (i.e. cache extension or invalidation)  
which should never happen simultaneously.
So here are our current thoughts about this. We would like to know  
what other people think about this.
Best regards,
Luc

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.




----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.