New Data Analysis Systems at GSI
Draft V 0.5 (July 22, 98, H. G. Essel)
- M. Kaspar
- P. Koczon
- W. Koenig
- Ch. Kozhuharov
- W.F.J. Müller
- K. Sümmerer
According to the 1998 DV paper most experiments currently use GOOSY
and/or PAW except the Hades experiment which uses ROOT. Platforms are:
PAW: OpenVMS, Linux, AIX
ROOT: DECunix, Linux
Analysis Classes and Modes
One can classify the requirements for analysis systems in three classes:
Class A: Experiments with complex detector set-ups (geometry, tracks,
reconstruction etc.) and high data volume. Examples are HADES and FOPI. The
CERN experiment ALICE also would fit here.
Class B: Experiments with complex histograms, statistical analysis. Examples are KAOS,
EB, FRS, ESR.
Class C: Experiments with low computing, not too many channels per event,
statistical analysis. Examples are small test experiments.
Besides these classes there are three analysis operation modes:
Offline: Analysis gets data from storage (A, B, C)
Online: Analysis gets data from DAQ (A, B, C)
Control: Analysis is integrated in DAQ, i.e. results of analysis are used to
control the DAQ (B, C).
The objective is a replacement of GOOSY and PAW by one or more new
systems. These systems shall be available by the end of 1999. The current systems must be
maintained for a period of time after the delivery of the new system. New functionality
may be required by new types of experiments, e.g. at the ESR.
In a first step we have to develop a roadmap for the development of the new system(s).
In the following sections we describe in a rather brief way the "user
requirements", i.e. the key characteristics the new systems shall have. In this
draft, the requirements are not assigned to the classes A, B or C yet.
Then we refer briefly to possible solutions and propose a process to realize the new system(s).
New systems shall run on Unix, Windows and VMS to increase acceptance.
The platforms are: Linux, NT, AIX, DECunix, and OpenVMS. The order of
importance has not yet been fixed.
- New systems shall provide a graphical user interface.
A graphical user interface to operate the system improves the acceptance.
- New systems shall provide context sensitive help and/or assistance.
This is normally provided together with a GUI. A tutorial and
real-world examples are required.
New systems shall be programmable through a graphical user interface.
This would be similar to LabVIEW or IRIS Explorer.
The graphical user interface shall be "compatible" with the GUI of MBS.
Often the analysis is operated on-line together with the DAQ. Then the
GUIs should have the same look & feel.
New systems shall provide a script interface with full functionality.
It is necessary for batch jobs, but also to execute predefined scripts
interactively. All data elements and functions, e.g. graphics, shall be available.
Results of scripts/actions shall be accessible by subsequent scripts/actions.
The output of scripts like fit results shall be storable in a way that
they can be accessed by subsequent scripts.
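As an illustration, such a result store could be as simple as a shared key-value file that each script merges its output into. The following Python sketch is only a minimal stand-in; the file layout and function names are hypothetical, not part of any existing GSI system.

```python
import json
import os
import tempfile

def save_results(path, script_name, results):
    """Merge one script's results (e.g. fit parameters) into a shared store."""
    store = {}
    if os.path.exists(path):
        with open(path) as f:
            store = json.load(f)
    store[script_name] = results
    with open(path, "w") as f:
        json.dump(store, f)

def load_results(path, script_name):
    """A later script retrieves what an earlier script produced."""
    with open(path) as f:
        return json.load(f)[script_name]

# first script stores a fit result, a subsequent script reads it back
path = os.path.join(tempfile.gettempdir(), "analysis_results.json")
if os.path.exists(path):
    os.remove(path)
save_results(path, "gauss_fit", {"mean": 3.14, "sigma": 0.05})
print(load_results(path, "gauss_fit")["mean"])   # -> 3.14
```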
Analysis shall be controllable during execution.
When an analysis is coupled to the DAQ it might be necessary to execute
commands in the analysis loop, e.g. to change parameters or to stop the analysis. This
feature is also required offline when interactively analyzing large files.
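Executing commands inside the analysis loop can be sketched as polling a command queue between events. This Python fragment is a toy model, not an existing interface; the command names are invented for illustration.

```python
from queue import Queue, Empty

def analysis_loop(events, commands):
    """Process events, polling a command queue between events so the
    running analysis can be steered (parameter change, stop)."""
    threshold = 0
    accepted = []
    for ev in events:
        try:
            cmd, arg = commands.get_nowait()
            if cmd == "set_threshold":   # change a parameter on the fly
                threshold = arg
            elif cmd == "stop":          # abort the running analysis
                break
        except Empty:
            pass                         # no pending command, keep going
        if ev >= threshold:
            accepted.append(ev)
    return accepted

cmds = Queue()
cmds.put(("set_threshold", 5))           # takes effect before the first event
print(analysis_loop([1, 6, 3, 9], cmds))  # -> [6, 9]
```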
Analysis shall be able to run interactively or in batch.
This is automatically achieved by a script interface.
- Analysis shall be independent of input source.
The analysis software should be independent of the data input, i.e.
online or offline, even if an online analysis has a different functionality from an
offline analysis. Offline analysis from raw data tapes shall be possible.
- Analysis shall be able to get event input from DAQ servers.
The DAQ systems provide various event servers. The new system shall
implement clients for these servers.
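Source independence can be sketched as an abstraction in which file input and an event-server client present the same iterator to the analysis. The classes below are a minimal Python sketch under that assumption; they do not model the real MBS event servers.

```python
class FileSource:
    """Reads events from storage (offline mode)."""
    def __init__(self, records):
        self._records = list(records)   # stand-in for a tape or file
    def events(self):
        yield from self._records

class ServerSource:
    """Stand-in for a client of a DAQ event server (online mode)."""
    def __init__(self, fetch):
        self._fetch = fetch             # callable delivering next event or None
    def events(self):
        while (ev := self._fetch()) is not None:
            yield ev

def analyze(source):
    """The analysis only sees the events() iterator, not the source type."""
    return sum(1 for _ in source.events())

print(analyze(FileSource([10, 20, 30])))   # -> 3
```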
All systems shall be able to exchange relevant data.
Systems must be able to process data, i.e. histograms and event or
other data, as produced by DAQ, slow control, simulations and other commonly used systems.
This is the GEF format for histograms, the MBS format for event data, and N-tuples for
compressed event data. The system shall provide modules to read/write other event data
formats.
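Such read/write modules could be organized as a registry of decoders keyed by format name, so new formats plug in without touching the analysis core. The decoders below are placeholders; real MBS or GEF decoding is far more involved.

```python
readers = {}

def register_reader(fmt, fn):
    """Plug in a decoder for one data format."""
    readers[fmt] = fn

def read_events(fmt, payload):
    """Dispatch to the registered decoder for the given format."""
    if fmt not in readers:
        raise ValueError(f"no reader for format {fmt!r}")
    return readers[fmt](payload)

# hypothetical toy decoders, standing in for MBS/N-tuple readers
register_reader("mbs", lambda payload: payload.split(";"))
register_reader("ntuple", lambda payload: [tuple(r) for r in payload])

print(read_events("mbs", "ev1;ev2"))   # -> ['ev1', 'ev2']
```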
Systems shall provide data management.
Organization of data elements, IO and storage shall be supported by
appropriate tools. All parameters of an experiment or analysis run should be stored in
standard data bases. These include set-up parameters of DAQ, calibration parameters,
filters, run specifications etc. The parameters must be accessible from the analysis.
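A minimal sketch of such a parameter data base, keyed by run number and parameter name, could look as follows; the schema and parameter names are invented for illustration, using an in-memory SQLite table as a stand-in for a standard data base.

```python
import sqlite3

# in-memory stand-in for a standard parameter data base
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE params (run INTEGER, name TEXT, value REAL)")

def store_param(run, name, value):
    """Record one set-up or calibration parameter for a given run."""
    db.execute("INSERT INTO params VALUES (?, ?, ?)", (run, name, value))

def get_param(run, name):
    """The analysis retrieves the parameter for the run it processes."""
    row = db.execute("SELECT value FROM params WHERE run=? AND name=?",
                     (run, name)).fetchone()
    return row[0] if row else None

store_param(42, "tdc_offset", 1.25)   # hypothetical calibration for run 42
print(get_param(42, "tdc_offset"))    # -> 1.25
```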
Data elements shall be arrays of aggregates.
Histograms and other (user) data elements are referenced by names. When
very many data elements of the same kind exist, it shall be possible to process
multidimensional arrays of such elements.
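As an illustration, a named array of histograms indexed by e.g. (detector, layer) could be booked and filled as one object. The class below is a toy sketch, not an existing implementation; names and shapes are hypothetical.

```python
class HistArray:
    """A named, multidimensional array of 1D histograms, so that e.g.
    one histogram per (detector, layer) can be handled in one go."""
    def __init__(self, name, shape, nbins, lo, hi):
        self.name, self.shape = name, shape
        self.nbins, self.lo, self.hi = nbins, lo, hi
        total = 1
        for n in shape:
            total *= n
        self._bins = [[0] * nbins for _ in range(total)]

    def _flat(self, idx):
        # row-major flattening of the multidimensional index
        f = 0
        for n, i in zip(self.shape, idx):
            f = f * n + i
        return f

    def fill(self, idx, value):
        b = int((value - self.lo) / (self.hi - self.lo) * self.nbins)
        if 0 <= b < self.nbins:             # ignore under/overflow
            self._bins[self._flat(idx)][b] += 1

    def counts(self, idx):
        return sum(self._bins[self._flat(idx)])

# one energy histogram per (detector, layer); values are invented
h = HistArray("energy", shape=(4, 2), nbins=10, lo=0.0, hi=100.0)
h.fill((3, 1), 42.0)
print(h.counts((3, 1)))   # -> 1
```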
Systems shall automatically save accumulated data in case of termination.
It saves a lot of time (CPU and human) if accumulated data, especially
histograms, are not lost in case of abnormal program termination.
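One possible mechanism is to register a save routine both for normal exit and for termination signals, as in this Python sketch; file locations and data layout are invented for illustration.

```python
import atexit
import json
import os
import signal
import tempfile

histograms = {"energy": [0, 3, 7, 1]}   # accumulated data worth preserving
SAVE_PATH = os.path.join(tempfile.gettempdir(), "autosave_histos.json")

def save_histograms():
    """Dump the accumulated histograms to disk."""
    with open(SAVE_PATH, "w") as f:
        json.dump(histograms, f)

def on_terminate(signum, frame):
    # called when the process is asked to terminate
    save_histograms()
    raise SystemExit(1)

atexit.register(save_histograms)          # save on normal exit
signal.signal(signal.SIGTERM, on_terminate)  # save on termination signal
```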
- Systems shall provide easy to use programming interfaces to the event data.
This means the "classical" user event routine doing all the event processing.
- Systems shall provide easy to use programming interfaces to the graphics.
There must be an API to access graphical objects like polygons or
scatter plot points.
Systems shall provide tools to map event data to detector geometry.
The data representation in the event data
might not be suited for further processing. A representation fitting detectors or physical
items should be supported.
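Such a mapping tool could be driven by a cabling table that translates raw channel numbers into detector coordinates. The map below is purely hypothetical; real set-ups would load it from the parameter data base.

```python
# hypothetical cabling map: raw channel number -> (detector, module)
CHANNEL_MAP = {
    0: ("tof", 0), 1: ("tof", 1),
    2: ("mdc", 0), 3: ("mdc", 1),
}

def map_event(raw):
    """Turn a flat {channel: value} event into a per-detector view."""
    mapped = {}
    for ch, value in raw.items():
        det, mod = CHANNEL_MAP[ch]
        mapped.setdefault(det, {})[mod] = value
    return mapped

event = {0: 12.5, 3: 7.0}
print(map_event(event))   # -> {'tof': {0: 12.5}, 'mdc': {1: 7.0}}
```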
Systems shall provide complex projections in multidimensional data spaces.
Needed is a mechanism like N-tuples, but also visualization of complex data and
various kinds of filters (conditions). The mechanisms shall
be fast and interactively configurable.
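The core of such a projection mechanism, reduced to its essentials, is selecting events by a condition and projecting them onto chosen axes. The Python sketch below is only a conceptual model of the N-tuple idea, with invented axis names.

```python
events = [
    {"x": 1.0, "y": 2.0, "z": 0.5},
    {"x": 3.0, "y": 1.0, "z": 2.5},
    {"x": 2.0, "y": 4.0, "z": 1.5},
]

def project(events, axes, condition=lambda ev: True):
    """Project an N-tuple-like event list onto chosen axes,
    keeping only events that pass the condition (a filter)."""
    return [tuple(ev[a] for a in axes) for ev in events if condition(ev)]

# 2D projection of all events inside a window on z
print(project(events, ("x", "y"), condition=lambda ev: ev["z"] < 2.0))
# -> [(1.0, 2.0), (2.0, 4.0)]
```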
Analysis shall provide statistical tools.
Other methods besides fit functions needed for the analysis of
statistical data shall be developed.
Analysis shall have online access to DAQ control functions.
Sometimes event data must be accumulated and analyzed online to steer
some DAQ set ups.
Some parts of the analysis shall optionally run online in the front-ends.
When the online analysis is needed to control the DAQ it would be
necessary to have direct access to the control hardware. In this case the analysis must
run in the DAQ front end. Only a subset of the functionality is needed. The visualization
might be done on a remote node.
- Display shall operate independently of analysis execution.
When the analysis is running it is necessary to look at the data.
- Display shall provide visualization of complex and/or multiple objects.
It should be possible to compose complex views of histograms or other
data. This could be views of many histograms in one frame, 3D views of data and polygons, etc.
Display set-ups shall be savable.
Often one wants to save all display parameters, e.g. boundaries and
scaling, and apply them to different data.
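A saved display set-up could be as small as a dictionary of boundaries and scale factors that is written to a file and later applied to other data. The Python sketch below assumes a trivial view model; parameter names are invented.

```python
import json
import os
import tempfile

def save_view(path, view):
    """Persist a display set-up so it can be reused on other data."""
    with open(path, "w") as f:
        json.dump(view, f)

def apply_view(view, data):
    """Clip data to the saved boundaries and apply the saved scaling."""
    lo, hi, scale = view["lo"], view["hi"], view["scale"]
    return [min(max(v, lo), hi) * scale for v in data]

view = {"lo": 0.0, "hi": 10.0, "scale": 2.0}   # boundaries and scaling
save_view(os.path.join(tempfile.gettempdir(), "view_setup.json"), view)
print(apply_view(view, [-1.0, 5.0, 12.0]))     # -> [0.0, 10.0, 20.0]
```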
Display shall generate publication-ready prints.
At least the system should produce WYSIWYG prints. Possibly a data
export to other graphics packages would be necessary.
1. Use existing software as base.
2. Use software common to the community.
3. Keep external software unchanged unless changes are accepted and implemented by the authors.
4. Add required features.
Constraint 1 makes it necessary to evaluate software packages which
could be the base for the new system(s). Possible packages are:
Some features of these packages:
- Easy to use
- Could be standard for slow control
- Class C, Control
- Runs in MBS
- GSI home made
- GSI home made
- Class B, Class A
- Does it survive?
- Class B, Class A
- Not yet proven
Feasibility Study Projects
We have to establish some projects to learn about the features of the packages under consideration.
- Develop simple histogram display GUI
- Develop input channel to MBS
- Develop some simple VIs for analysis
- GUI development
- Input from MBS
- Storage I/O
- Java interface
- Hades project
- Set up a running environment
- Iris Explorer