
Re: [Orekit Developers] Dataloader Project


It seems that I should have made myself a lot clearer before ;)!

First off, the digest will never be shown or given to the user (although it could be). It is an extension of the current DataLoaders that makes it possible to determine the origin of the loaded data.

Second, computing this hash without requiring each implementation to compute digests manually can be done in several ways. The approach I chose in the design from my previous e-mail is an AbstractDataLoader that implements loadData. This method wraps the given InputStream so that a digest is computed while it is read, and passes the wrapped stream to a new abstract method called 'processData'. 'processData' contains the implementation of the former 'loadData' and simply does whatever it did before.
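A minimal sketch of that AbstractDataLoader idea, written against a simplified stand-in interface: SimpleDataLoader, AbstractDataLoader.getDigestHex and the LineCountingLoader example are hypothetical names, and the real Orekit DataLoader signatures may differ. It uses the JDK's java.security.DigestInputStream so every byte processData reads is also fed to a SHA-1 MessageDigest. It assumes processData does not close the stream, and it drains any unread bytes afterwards so the digest covers the whole file rather than only the part the parser consumed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical, simplified stand-in for Orekit's DataLoader interface. */
interface SimpleDataLoader {
    boolean stillAcceptsData();
    void loadData(InputStream input, String name) throws IOException;
}

/** Hypothetical base class: hashes (SHA-1) everything read from the stream. */
abstract class AbstractDataLoader implements SimpleDataLoader {

    private byte[] lastDigest;

    @Override
    public final void loadData(InputStream input, String name) throws IOException {
        final MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        // Bytes read by processData pass through the digest on their way in.
        DigestInputStream digestStream = new DigestInputStream(input, sha1);
        processData(digestStream, name);
        // Assumption: processData did not close the stream. Drain what the
        // parser left unread so the digest identifies the complete file.
        byte[] buffer = new byte[8192];
        while (digestStream.read(buffer) != -1) {
            // discard remaining bytes, they are only needed for the digest
        }
        lastDigest = sha1.digest();
    }

    /** Former body of loadData: parses the (digest-wrapped) stream. */
    protected abstract void processData(InputStream input, String name) throws IOException;

    /** Hex-encoded SHA-1 of the last loaded stream, or null if nothing was loaded. */
    public String getDigestHex() {
        if (lastDigest == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : lastDigest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}

/** Hypothetical example subclass: counts lines, standing in for a real parser. */
class LineCountingLoader extends AbstractDataLoader {
    int lines;

    @Override
    public boolean stillAcceptsData() {
        return true;
    }

    @Override
    protected void processData(InputStream input, String name) throws IOException {
        // Wrap without closing, so the base class can still drain the stream.
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(input, StandardCharsets.UTF_8));
        while (reader.readLine() != null) {
            lines++;
        }
    }
}
```

Existing parsers would only need their loadData renamed to processData; the digest comes for free from the base class.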

Now what is all this for? The digest uniquely identifies the content read from the _InputStream_ that the DataLoader implementation uses to load data. This means that the _origin_ of the loaded data can be identified, e.g. a 'finals.all (IAU2000)' file from the IERS. This makes it possible to check two things when persisting the data:
(1) Warn the user of the database-filling application that the same file is being loaded again for the same data (not that important).
(2) Warn the user of the database-filling application that, for example, an identical or similar EOPEntry was already added from a different digest, i.e., from a different origin.
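To illustrate checks (1) and (2), here is a rough in-memory sketch. ProvenanceChecker, its Clash enum and the string entry keys (for instance the date of an EOP entry) are hypothetical names standing in for whatever the database schema ends up being. It only detects clashes on identical keys; recognising "similar" entries would need a domain-specific comparison.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the database's provenance bookkeeping. */
class ProvenanceChecker {

    /** NONE: new entry; SAME_SOURCE: harmless, case (1); DIFFERENT_SOURCE: case (2). */
    enum Clash { NONE, SAME_SOURCE, DIFFERENT_SOURCE }

    private final Set<String> usedDigests = new HashSet<>();
    // entry key -> digest of the source that first added that entry
    private final Map<String, String> entryOrigins = new HashMap<>();

    /** Check (1) at file level: was a file with this digest already loaded? */
    boolean sourceAlreadyUsed(String digest) {
        return usedDigests.contains(digest);
    }

    /** Records the entry's source and reports whether, and how, it clashes. */
    Clash addEntry(String entryKey, String digest) {
        usedDigests.add(digest);
        // Keeps the first recorded origin; returns it if the key was already present.
        String previous = entryOrigins.putIfAbsent(entryKey, digest);
        if (previous == null) {
            return Clash.NONE;
        }
        return previous.equals(digest) ? Clash.SAME_SOURCE : Clash.DIFFERENT_SOURCE;
    }
}
```

In a real application the set and map would be tables in the database, so the checks keep working across incremental loads.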

I think that keeping track of the sources of loaded data is a good idea because you want your database to be filled incrementally; you don't want to reset it every time you add more data. If the sources are not known, you cannot tell whether a clash is harmless (1) or possibly alarming (2). If you think that keeping the sources of persisted data is not important enough to justify the work, I can move on to the JPA approach described in my proposal. The proposal also discusses the digest a bit, which might help to understand it.

I hope this clears things up a bit and sorry for being so vague!

On 2 July 2014 08:25, MAISONOBE Luc <luc.maisonobe@c-s.fr> wrote:
Bob Reynders <tzbobr@gmail.com> wrote:

Hello everyone,

Hi Bob,

Today is the start of June and thus the start of my contribution through
SOCIS on Orekit!

Welcome !

My current proposal for database support is an extension of the current
API that incorporates a message digest of the InputStreams used to
parse data. This digest (a simple SHA-1 hash) will be used to identify the
source of the data and will be stored with the data in the database.

I would prefer the design shown in the attached diagram for this. If this
design is not too intrusive, I propose to draft the
database storage application using the design shown in my original proposal
and see how it goes.

If the sources of data are not important for the correctness of the
data (e.g. we don't have to warn users of the database application that (1)
the file has already been used to load the data or (2) an entry has already been
added from a different source), then we can drop this functionality and move
on with the database storage.

I don't yet understand the purpose of this digest. Could you elaborate on it?
Currently, as most of the data loading is done under the hood, the user is not really aware of it. Users mostly configure the DataProvidersManager and forget about it. In rare cases, they also configure a few regular expressions to restrict loading to certain files when several formats are supported. In other rare cases, they retrieve the list of already loaded data from the DataProvidersManager. They usually access neither the loader nor the loaded data, so I'm not sure when they would get the MessageDigest, or for what purpose.

So far, we have left the consistency of the data set under the user's responsibility and have not provided any way to check it. So if your proposal adds this feature, it may be an interesting addition.

In your class diagram, could you also explain the separation between loadData (which already exists in the interface) and processData (which is new)? Is it related to the database access?

I have mentioned this before in my proposal, but what is the preferred
persistence API? In my opinion, JPA is the most accessible and best known to Java
developers. There is, however, a large group of developers who dislike the
ORM approach.

Your initial proposal has been seen only by the committers; it was not made public before the students were selected. Now that this part is done and you are here with us, you can send it again to this public list. I'll wait until it is available to everyone before commenting on it.

best regards,


With the feature discussion out of the way, I would also like to ask some
practical questions. As you have noticed by now, this e-mail has grown
rather long. I'm a fan of reporting very frequently and would like to
continue doing so. What kind of schedule would the Orekit developers
prefer (weekly, bi-weekly, daily), and where should I report (mailing list,
forge documents/wiki)?



Attachment: bobreynders_proposal.pdf
Description: Adobe PDF document