Configuration

Configuring the LCLStreamer Application

The behavior of the LCLStreamer application is fully determined by the content of its configuration file. This file determines which implementation of the different workflow components should be used, how each component is configured, and which data elements should be retrieved and processed by the application. Additionally, it specifies where the data should come from.

LCLStreamer reads each section of the configuration file to determine which implementation of the corresponding component it should use. Each section names that implementation in its type entry, whose value is the name of the Python class that implements the component. For example:

event_source:
    type: Psana1EventSource

processing_pipeline:
    type: BatchProcessingPipeline
    ...

data_serializer:
    type: HDF5BinarySerializer
    ...

data_handlers:

    - type: BinaryFileWritingDataHandler
      ...

    - type: BinaryDataStreamingDataHandler
      ...

data_sources:
    timestamp:
        type: Psana1Timestamp
    ...

With these configuration options, LCLStreamer reads psana1 data (Psana1EventSource), batches the retrieved data (BatchProcessingPipeline), serializes the data as a binary blob with the internal structure of an HDF5 file (HDF5BinarySerializer), and finally hands the binary blob to two data handlers: one that saves it as a file (BinaryFileWritingDataHandler) and one that streams it through a network socket (BinaryDataStreamingDataHandler).
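
The lookup from a type entry to a Python class can be pictured with a minimal sketch. It assumes the configuration has already been parsed into nested dictionaries. The empty classes, the `REGISTRY` dictionary, and the `resolve_component` function are hypothetical stand-ins for illustration, not LCLStreamer's actual code.

```python
# Empty placeholder classes standing in for real component implementations.
class Psana1EventSource: ...
class BatchProcessingPipeline: ...
class HDF5BinarySerializer: ...

# Hypothetical registry: maps the string found in a `type` entry to a class.
REGISTRY = {
    cls.__name__: cls
    for cls in (Psana1EventSource, BatchProcessingPipeline, HDF5BinarySerializer)
}


def resolve_component(section_name, section):
    """Return the class named by the `type` entry of one configuration section."""
    if "type" not in section:
        raise ValueError(f"section '{section_name}' has no 'type' entry")
    type_name = section["type"]
    if type_name not in REGISTRY:
        raise ValueError(f"section '{section_name}': unknown type '{type_name}'")
    return REGISTRY[type_name]


# Parsed form of three of the sections shown above.
config = {
    "event_source": {"type": "Psana1EventSource"},
    "processing_pipeline": {"type": "BatchProcessingPipeline"},
    "data_serializer": {"type": "HDF5BinarySerializer"},
}

resolved = {name: resolve_component(name, section) for name, section in config.items()}
```

In this sketch a missing or misspelled type entry fails early with a message that names the offending section, which is one reasonable way to surface configuration mistakes.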

In addition to the type entry, each section in the configuration file contains entries for other parameters needed to configure the corresponding component (see below).

Some configuration parameters apply not to a specific component but to LCLStreamer as a whole, for example the label that identifies the source of the data (a specific experiment, file, or data-producing socket) or the strategy that LCLStreamer should apply when it encounters corrupted data events. These parameters can be provided at the top of the configuration file, outside of the component sections. For example:

source_identifier: exp=xpptut15:run=430
skip_incomplete_events: false

event_source:
    type: Psana1EventSource

processing_pipeline:
    type: BatchProcessingPipeline
    ...

data_serializer:
    type: HDF5BinarySerializer
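
The split between application-wide parameters and component sections can be sketched as follows. The set of section names is taken from the examples in this document. The `split_config` helper is a hypothetical illustration, and the configuration is assumed to be already parsed into a dictionary.

```python
# Section names as they appear in the examples of this document.
COMPONENT_SECTIONS = {
    "event_source",
    "processing_pipeline",
    "data_serializer",
    "data_handlers",
    "data_sources",
}


def split_config(config):
    """Separate application-wide parameters from component sections (hypothetical helper)."""
    global_params = {k: v for k, v in config.items() if k not in COMPONENT_SECTIONS}
    sections = {k: v for k, v in config.items() if k in COMPONENT_SECTIONS}
    return global_params, sections


# Parsed form of the example above.
config = {
    "source_identifier": "exp=xpptut15:run=430",
    "skip_incomplete_events": False,
    "event_source": {"type": "Psana1EventSource"},
    "processing_pipeline": {"type": "BatchProcessingPipeline"},
    "data_serializer": {"type": "HDF5BinarySerializer"},
}

global_params, sections = split_config(config)
```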

Configuring LCLStreamer’s components

In addition to the type entry, which defines the nature of the component, other entries in each section can be used to provide parameters for each of the Python classes that implement the LCLStreamer components. For example:

data_serializer:
    type: HDF5BinarySerializer
    compression_level: 3
    compression: zfp
    fields:
        timestamp: /data/timestamp
        detector_data: /data/data
        random: /data/random
        photon_wavelength: /data/wavelength

In this section, the provided parameters specify that the Data Serializer component is implemented by the HDF5BinarySerializer Python class. The serializer compresses the data using the zfp algorithm, with a compression level of 3. The fields entry describes the internal HDF5 path where each data item should be saved.
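
One way to picture how the remaining entries reach the implementing class is to pass everything except type as keyword arguments to its constructor. The sketch below assumes that convention. `SketchSerializer` and `build_component` are hypothetical names, and the real HDF5BinarySerializer may accept different parameters or defaults.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SketchSerializer:
    """Hypothetical stand-in for a serializer class; not the real HDF5BinarySerializer."""

    compression: Optional[str] = None
    compression_level: int = 0
    fields: Dict[str, str] = field(default_factory=dict)


def build_component(cls, section):
    """Instantiate `cls` with every entry of the section except `type`."""
    params = {key: value for key, value in section.items() if key != "type"}
    return cls(**params)


# Parsed form of the data_serializer section shown above.
section = {
    "type": "HDF5BinarySerializer",
    "compression_level": 3,
    "compression": "zfp",
    "fields": {
        "timestamp": "/data/timestamp",
        "detector_data": "/data/data",
        "random": "/data/random",
        "photon_wavelength": "/data/wavelength",
    },
}

serializer = build_component(SketchSerializer, section)
```

With this design, an entry that the class does not recognize raises a `TypeError` at construction time, so a misspelled parameter name is caught before any data is processed.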

Configuring the data sources

The data_sources section of the configuration file defines the data that LCLStreamer extracts from every data event it processes. If a piece of information is part of a data event but is not included in the data_sources section, LCLStreamer ignores it.

The data_sources section of the configuration file consists of a dictionary of data sources. Each entry has a key, which acts as a name that identifies the extracted data throughout the whole LCLStreamer data workflow, and a value, which is itself a dictionary. This inner dictionary defines the nature of the data source (via the usual mandatory type entry) and any other parameters needed to configure it. As above, the type of a data source is the name of the Python class that implements it. For example:

data_sources:
    timestamp:
        type: Psana1Timestamp

    detector_data:
        type: Psana1DetectorInterface
        psana_name: Jungfrau1M
        psana_fields: calib

This snippet of the configuration file defines two data sources, one called timestamp and one called detector_data. The timestamp data source is of type Psana1Timestamp. This means that a Python class of the same name determines how this type of data is extracted from a data event. The detector_data source is instead of type Psana1DetectorInterface. The configuration parameters psana_name and psana_fields are passed to the Python class Psana1DetectorInterface, which defines how this type of data is retrieved.
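
The role of the data source names can be illustrated with a small sketch. The event is modeled as a plain dictionary and the two source classes are simplified stand-ins. The `get_data` method, the event layout, and the `SOURCE_TYPES` mapping are assumptions made for illustration and do not reflect psana's or LCLStreamer's real interfaces.

```python
class TimestampSource:
    """Hypothetical stand-in for a class such as Psana1Timestamp."""

    def get_data(self, event):
        return event["timestamp"]


class DetectorSource:
    """Hypothetical stand-in for a class such as Psana1DetectorInterface."""

    def __init__(self, psana_name, psana_fields):
        self.psana_name = psana_name
        self.psana_fields = psana_fields

    def get_data(self, event):
        return event["detectors"][self.psana_name][self.psana_fields]


# Hypothetical mapping from `type` strings to the stand-in classes.
SOURCE_TYPES = {
    "Psana1Timestamp": TimestampSource,
    "Psana1DetectorInterface": DetectorSource,
}


def build_sources(data_sources_config):
    """Create one source object per named entry of the data_sources section."""
    sources = {}
    for name, entry in data_sources_config.items():
        params = {k: v for k, v in entry.items() if k != "type"}
        sources[name] = SOURCE_TYPES[entry["type"]](**params)
    return sources


def extract(event, sources):
    """Collect only the configured data, keyed by the names from the configuration."""
    return {name: source.get_data(event) for name, source in sources.items()}


# Parsed form of the data_sources section shown above.
data_sources_config = {
    "timestamp": {"type": "Psana1Timestamp"},
    "detector_data": {
        "type": "Psana1DetectorInterface",
        "psana_name": "Jungfrau1M",
        "psana_fields": "calib",
    },
}

# A made-up event; "beam_energy" is present but not configured as a data source.
event = {
    "timestamp": 1.5,
    "beam_energy": 9.5,
    "detectors": {"Jungfrau1M": {"calib": [[1, 2], [3, 4]]}},
}

extracted = extract(event, build_sources(data_sources_config))
```

Here `extracted` contains only the timestamp and detector_data entries. The beam_energy value is never read because no data source refers to it, mirroring the rule that information absent from data_sources is ignored.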

Configuration Options