Proposal for the storage of simulation runs in Oracle

by Ilse Koenig, GSI June 18, 2002

The storage of information about simulation runs has two aspects:

To share the simulation runs:
One person produces a file, but many people can use it.
This requires storing all the details behind the different flags and files in the HGEANT initialization file "geaini.dat". If this information is available via a web page, people can decide whether they can use an existing simulation or have to make a new one.
To store the parameters needed to analyze the data

Before we discuss and finally implement all the details, we should, as a first step, start by storing the following information:

Unique Run Id
Unique Filename
Project
Number of Events
Author
Comment (may actually contain a link to a web page)
Created
Unique Run Start
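
The list above can be read as one record per simulation run. The following is only a minimal sketch of such a record; the class name, the field names, the types and the example values are assumptions made for illustration and are not part of the proposal.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SimRunRecord:
    """One simulation run as it might be stored (all names are hypothetical)."""
    run_id: int          # Unique Run Id
    filename: str        # Unique Filename
    project: str         # Project, e.g. "NOV01SIM"
    n_events: int        # Number of Events
    author: str          # Author
    comment: str         # Comment, may hold a link to a web page
    created: datetime    # Created: when the file was produced
    run_start: datetime  # Unique Run Start, used to look up parameters


# Illustrative values only; none of them come from a real run.
example = SimRunRecord(
    run_id=1,
    filename="nov01sim_000000001",
    project="NOV01SIM",
    n_events=10000,
    author="someone",
    comment="details on a web page",
    created=datetime(2002, 6, 18, 12, 0, 0),
    run_start=datetime(2001, 11, 1, 0, 0, 1),
)
```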

How to get unique run ids and filenames?
We cannot use the time, as the DAQ does, because people work in parallel and run jobs on a batch farm, so two simulations could start at the same time by chance.
One possible solution: ask Oracle for a unique number.
If HGEANT ran with Oracle support, it could get this number automatically, but this is normally not the case. Therefore I could make a web form to get this number, but then one has to add it manually to the HGEANT initialization file. If one forgets this and uses an old file, the old run id is stored in the event header. It will then not be possible to store this run in Oracle later, and the event headers cannot be changed either (at least not without copying the data to a new file).
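
To illustrate the idea of asking the database for a unique number, here is a small sketch. It uses Python's built-in sqlite3 module with an auto-incrementing key purely as a stand-in for Oracle; the table name, the function names and the filename pattern are assumptions, not a proposed design.

```python
import sqlite3


def open_id_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the database that hands out run ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sim_run_ids ("
        " run_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " author TEXT NOT NULL)"
    )
    return conn


def next_run_id(conn: sqlite3.Connection, author: str) -> int:
    """Reserve a new run id; the database guarantees it is not handed out twice."""
    with conn:  # commits the insert on success
        cur = conn.execute(
            "INSERT INTO sim_run_ids (author) VALUES (?)", (author,)
        )
    return cur.lastrowid


def filename_for(project: str, run_id: int) -> str:
    """Derive a unique filename from the unique id (hypothetical pattern)."""
    return f"{project.lower()}_{run_id:09d}"


conn = open_id_store()
first = next_run_id(conn, "user_a")
second = next_run_id(conn, "user_b")
```

Because the id comes from the database and not from the clock, two jobs that start in the same second still receive different ids and therefore different filenames.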

How do we want to handle this?

How to define the Run Start?
Our version management is designed for runs coming in sequential order. Parameters stay the same until the experiment conditions change and a new version is needed.
Simulations, on the other hand, are completely unordered. Many people make simulations at the same time, but for different purposes, with different projectile/target combinations, energies, geometries, ...
They may need different parameter sets for the analysis.

To attach parameters to the simulation runs, we need a Run Start. This date is somewhat artificial, but it is needed to read the parameters. Runs with the same parameter sets have to be grouped together; otherwise one has to validate the parameter versions for each run individually and cannot validate them for a range of runs.

I propose to introduce projects (analogous to beamtimes for real runs).
For simulation projects corresponding to real beamtimes the names should be "experiment + SIM", e.g. NOV01SIM.
Projects have a time range (e.g. the same as a real beamtime) and all simulation runs get start times inside this date range.
We can think about sub-projects (e.g. with/without magnetic field), which again define a time window for a couple of runs.
Most containers will need only one version, and one can take the date range of the project for validation. All runs made later in this project will use these parameters without any further need to validate.
Others may have different versions for the sub-projects.
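
The relation between projects, artificial run start times and parameter validity can be sketched as follows. This is only an illustration: the class names, the one-second spacing between runs, the container name and the dates are all assumptions and not part of the proposal.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Project:
    """A simulation project with a time range (hypothetical structure)."""
    name: str
    start: datetime
    end: datetime

    def run_start(self, sequence_number: int) -> datetime:
        """Place the n-th run n seconds after the project start (assumed spacing)."""
        stamp = self.start + timedelta(seconds=sequence_number)
        if not (self.start <= stamp < self.end):
            raise ValueError("run start falls outside the project's time range")
        return stamp


@dataclass(frozen=True)
class ParamVersion:
    """A parameter container version validated for a date range."""
    container: str
    version: int
    valid_since: datetime
    valid_until: datetime

    def covers(self, run_start: datetime) -> bool:
        return self.valid_since <= run_start < self.valid_until


# Illustrative dates only.
nov01sim = Project("NOV01SIM", datetime(2001, 11, 1), datetime(2001, 12, 1))

# One version validated for the whole project range covers every run in it.
params = ParamVersion("SomeContainer", 1, nov01sim.start, nov01sim.end)
later_run = nov01sim.run_start(5000)
```

A sub-project would simply be a narrower time window inside the project, with its own parameter versions validated for that window.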

How many parameter containers really change and when? How many versions should we expect?