LUTE Configuration Database Specification
Date: 2025-08-12
VERSION: v0.2
Basic Outline
- The backend database will be SQLite, using the sqlite3 module from the Python standard library.
- A high-level API is provided so that, if needed, the backend database can be changed without affecting Executor-level code.
- One LUTE database is created per working directory for this iteration of the database. Note that this database is independent of any database used by a workflow manager (e.g. Airflow) to manage task execution order.
- Attempts were made to make the database 3NF/BCNF normal; however, denormalization was chosen in some cases for simplicity.
- Each database has the following tables:
    - 1 table for executions: Each time any Managed Task is run, a new entry is made in this table.
    - 1 table for tasks: Contains the listing of Managed Tasks; the executions table links here.
    - 1 table for executors: Contains the listing of Executors; the executions table links here.
    - 1 table for config: Contains sets of parameters used for the AnalysisHeader; the executions table links here.
    - 1 table for results: Contains the result fields for each execution; the executions table links here.
    - 1 table for environment: Contains environment variables and values, each one linked to a specific execution in the executions table.
    - 1 table for communicators: The entries in the executors table link here as needed.
    - 1 table for schema: The results table links to entries in this table to describe which schemas the results conform to.
    - 1 table for base_schema: The schema table entries are constructed as the bitwise OR of the ids in this table, since a result may implement multiple base_schema.
    - 1 table for parameters: Contains the parameters and their values; each entry links to a specific execution in the executions table.
    - 1 table for parameter_types: Contains the type specifications needed to fully reconstruct a parameter. Both the executions table and the parameters table link here: the former to describe the full TaskParameters model for the execution, the latter to describe the type of each parameter in the TaskParameters model.
    - 1 table for param_meta: Contains additional fields used internally to properly construct parameter fields.
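As a rough illustration of how this per-directory database can be opened and inspected with the standard library, consider the sketch below. The filename "lute.db" is an assumption for illustration; LUTE may name the file differently.

```python
import sqlite3
from pathlib import Path

# Minimal sketch: open the database living in the current working directory.
# "lute.db" is a placeholder filename, not necessarily the one LUTE uses.
work_dir = Path(".")
con = sqlite3.connect(work_dir / "lute.db")

# List the tables created so far (executions, tasks, executors, config, ...).
tables = [row[0] for row in con.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)
```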
Schematic

executions table
The executions table contains an entry for each time any Managed Task is run. It links all necessary information together.
| id | task_id | parameter_type_id | executor_id | config_id | result_id | timestamp |
|----|---------|-------------------|-------------|-----------|-----------|-----------|
| 1 | 1 | 1 | 1 | 1 | 1 | "YYYY-MM-DD HH:MM:SS" |
Each of the xyz_id columns follows the ids in the respective tables, allowing a complete reconstruction of the Task, TaskParameters, Executor, and results data.
Column descriptions
| Column | Description |
|--------|-------------|
| id | ID of the execution in this table. |
| task_id | Pointer to the entry in the tasks table holding the information about the Task run. |
| parameter_type_id | Pointer to the entry in the parameter_types table that describes the TaskParameters model. |
| executor_id | Pointer to the entry in the executors table holding the information about the Executor used. |
| config_id | Pointer to the entry in the config table holding the AnalysisHeader for this execution. |
| result_id | Pointer to the entry in the results table holding the results for this execution. |
| timestamp | Timestamp for the execution. |
Constraints
This table DOES NOT contain constraints. This breaks full normalization to some degree. Redundant entries are allowed.
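A minimal DDL sketch of this table is given below. The column types and exact statement are assumptions for illustration, not the definitive LUTE schema.

```python
import sqlite3

con = sqlite3.connect("lute.db")  # placeholder filename
con.executescript(
    """
    -- No UNIQUE constraints are declared, matching the note above that
    -- redundant entries are allowed in this table.
    CREATE TABLE IF NOT EXISTS executions (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        task_id INTEGER,            -- row id in the tasks table
        parameter_type_id INTEGER,  -- row id in the parameter_types table
        executor_id INTEGER,        -- row id in the executors table
        config_id INTEGER,          -- row id in the config table
        result_id INTEGER,          -- row id in the results table
        timestamp TEXT              -- "YYYY-MM-DD HH:MM:SS"
    );
    """
)
```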
tasks table
The tasks table holds Task names.
Constraints
Column descriptions
| Column | Description |
|--------|-------------|
| id | ID of the entry in this table. |
| name | Name of the Task. |
executors table
The executors table holds information about the kinds of Executors being used.
| id | name | comm |
|----|------|------|
| 1 | "MyTask" | 3 |
Column descriptions
| Column | Description |
|--------|-------------|
| id | ID of the entry in this table. |
| name | Name of the Task. |
| comm | Bitwise OR of entries in the communicators table describing the communicators used by this Executor. |
Constraints
- The combination of all columns (name, poll_interval, comm) must be unique. Multiple executions can reference the same row of this table.
- The comm entry must be a BITWISE OR of the ids in the communicators table. This constraint is set up by separate triggers rather than at table creation; a sketch of such a trigger is shown below.
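One way such a trigger could be written is sketched below. It mirrors the schema <= SUM(id) style of check described later for the schema table; it is an assumption about the implementation, not the exact trigger LUTE installs.

```python
import sqlite3

con = sqlite3.connect("lute.db")  # placeholder filename
con.executescript(
    """
    -- Reject an executors row whose comm value cannot be an OR of communicator ids.
    -- Because every communicators.id is a power of two, any valid OR of ids is at
    -- most the sum of all ids.
    CREATE TRIGGER IF NOT EXISTS validate_executor_comm
    BEFORE INSERT ON executors
    WHEN NEW.comm < 0 OR NEW.comm > (SELECT TOTAL(id) FROM communicators)
    BEGIN
        SELECT RAISE(ABORT, 'comm is not a bitwise OR of communicator ids');
    END;
    """
)
```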
config table
| id | title | experiment | run | date | lute_version | task_timeout |
|----|-------|------------|-----|------|--------------|--------------|
| 2 | "My experiment desc" | "EXPx00000" | 1 | YYYY/MM/DD | 0.1 | 6000 |
These parameters are extracted from the TaskParameters object. Each of those contains an AnalysisHeader object stored in the lute_config variable. For a given experimental run, this value will likely be shared across any Tasks that are executed.
Column descriptions
| Column | Description |
|--------|-------------|
| id | ID of the entry in this table. |
| title | Arbitrary description/title of the purpose of the analysis, e.g. what kind of experiment is being conducted. |
| experiment | LCLS Experiment. Can be a placeholder if debugging, etc. |
| run | LCLS Acquisition run. Can be a placeholder if debugging, testing, etc. |
| date | Date the configuration file was first set up. |
| lute_version | Version of the codebase being used to execute Tasks. |
| task_timeout | The maximum amount of time in seconds that a Task can run before being cancelled. |
Constraints
- The combination of all columns (title, experiment, run, date, lute_version, task_timeout) must be unique. Multiple executions can reference the same row of this table.
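The uniqueness requirement can be expressed directly in the table definition. The sketch below is illustrative only; the column types are assumptions.

```python
import sqlite3

con = sqlite3.connect("lute.db")  # placeholder filename
con.executescript(
    """
    CREATE TABLE IF NOT EXISTS config (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT,
        experiment TEXT,
        run INTEGER,
        date TEXT,
        lute_version TEXT,
        task_timeout INTEGER,
        -- Identical AnalysisHeader contents collapse onto a single row that many
        -- executions can then reference.
        UNIQUE (title, experiment, run, date, lute_version, task_timeout)
    );
    """
)
```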
results table
| id | schema_id | payload | summary | status | valid_flag |
|----|-----------|---------|---------|--------|------------|
| 2 | 3 | "Payload" | "Summary" | COMPLETED | 1 |
Column descriptions
| Column | Description |
|--------|-------------|
| id | ID of the entry in this table. |
| schema_id | Pointer to the entry in the schema table that describes the output produced by the Task. |
| payload | Potentially the full result. If the object is incompatible with the database, this will instead be a path where it can be found. |
| summary | Short text summary of the Task result. This is provided by the Task, or sometimes the Executor. |
| status | Reported exit status of the Task. Note that the output may still be labeled invalid by the valid_flag (see below). |
| valid_flag | A boolean flag for whether the result is valid. May be 0 (False) if, e.g., data is missing or corrupt, or the reported status is failed. |
Constraints
- The combination of all columns (schema_id, payload, summary, status, valid_flag) must be unique. Multiple executions can reference the same row of this table.
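Because the full column combination is unique, identical results can be written with an insert-or-reuse pattern. The sketch below is hypothetical and assumes the results table already exists with the columns described above.

```python
import sqlite3

con = sqlite3.connect("lute.db")  # placeholder filename
row = (3, "Payload", "Summary", "COMPLETED", 1)  # illustrative values only

# Insert only if this exact combination has not been recorded before ...
con.execute(
    "INSERT OR IGNORE INTO results (schema_id, payload, summary, status, valid_flag) "
    "VALUES (?, ?, ?, ?, ?)",
    row,
)
# ... then look up the (new or pre-existing) row id to store in executions.result_id.
result_id = con.execute(
    "SELECT id FROM results WHERE schema_id = ? AND payload = ? AND summary = ? "
    "AND status = ? AND valid_flag = ?",
    row,
).fetchone()[0]
```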
environment table
| id | execution_id | name | value |
|----|--------------|------|-------|
| 2 | 3 | "PATH" | "/usr/bin/" |
Column descriptions
| Column | Description |
|--------|-------------|
| id | ID of the entry in this table. |
| execution_id | Pointer to the entry in the executions table that this environment variable was used for. |
| name | Environment variable name. |
| value | Environment variable value. |
Constraints
This table DOES NOT contain constraints, which breaks full normalization to some degree. However, a unique index is constructed on (execution_id, name).
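A sketch of that unique index is shown below; the index name is an assumption.

```python
import sqlite3

con = sqlite3.connect("lute.db")  # placeholder filename
# One value per (execution, variable name); duplicate pairs are rejected.
con.execute(
    "CREATE UNIQUE INDEX IF NOT EXISTS environment_execution_name "
    "ON environment (execution_id, name)"
)
```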
communicators table
The communicators table holds descriptions of implemented Communicator objects. The entries may be associated with multiple Executors.
| id | name | description |
|----|------|-------------|
| 2 | SocketCommunicator | "Communication over TCP sockets." |
Column descriptions
| Column | Description |
|--------|-------------|
| id | ID of the entry in this table. |
| name | Name of the Communicator. |
| description | Description of the communication method, e.g. uses TCP sockets. |
Constraints
- The (name, description) combination must be unique. The name is allowed to be duplicate, because the same Communicator may have slightly different implementations which are reflected in the description column.
- The id entry must be a power of 2, as the comm column of the executors table uses a bitwise OR to indicate that multiple Communicators are in use.
schema table
The schema table holds information about the combinations of schemas implemented by the various results. This is the information provided in impl_schemas when constructing parameter models.
Column descriptions
| Column | Description |
|--------|-------------|
| id | ID of the entry in this table. |
| schema | This is a bitwise OR of all the ids in the base_schema table which contribute. |
A particular value of schema may, for example, be:
- 0: no schemas implemented
- 1: base_schema 1 is implemented
- 2: base_schema 2 is implemented
- 3: base_schema 1 and 2 are both implemented
All the ids in the base_schema table are powers of two, or the value 0.
The schema column of the schema table is constrained to satisfy schema <= (SELECT SUM(id) FROM base_schema).
Constraints
- The schema column must be unique. Multiple executions may reference the same entry in this table.
- The schema entry must be a BITWISE OR of the ids in the base_schema table. This constraint is set up by separate triggers rather than at table creation.
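To make the bitmask encoding concrete, the sketch below decodes a schema value back into base_schema names. The id-to-name mapping here is illustrative (the names for ids 2 and 4 are placeholders); in practice it would be read from the base_schema table.

```python
# Hypothetical base_schema rows: id -> name. All ids are powers of two (0 means "none").
base_schema = {1: "hdf5", 2: "base_schema_2", 4: "base_schema_4"}  # placeholder names

def decode_schema(schema: int) -> list[str]:
    """Return the names of all base_schema entries whose bit is set in schema."""
    if schema == 0:
        return ["none"]
    return [name for bit, name in base_schema.items() if schema & bit]

print(decode_schema(3))  # ['hdf5', 'base_schema_2'] -> base_schema 1 and 2 implemented
```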
base_schema table
The base_schema table holds the currently supported basic schema that may be provided. These are atomic, but a Task may implement multiple entries from this table.
| id | name |
|----|------|
| 0 | "none" |
| 1 | "hdf5" |
| 2 | |
| 4 | |
Column descriptions
| Column | Description |
|--------|-------------|
| id | ID of the entry in this table. |
| name | A string representation for this base schema type. |
The id column of this table is constrained to be the value 0 or a power of two: CHECK (id = 0 OR (id > 0 AND (id & (id - 1)) = 0)).
Constraints
- The name (and id) must be unique.
- The id entry must be a power of 2, as the schema table uses a bitwise OR to indicate implementation of multiple base_schema.
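The power-of-two rule quoted above fits naturally into the table definition; a sketch (the column types are assumptions) could look like:

```python
import sqlite3

con = sqlite3.connect("lute.db")  # placeholder filename
con.executescript(
    """
    CREATE TABLE IF NOT EXISTS base_schema (
        -- id must be 0 or a power of two so that schema values can OR them together.
        id INTEGER PRIMARY KEY CHECK (id = 0 OR (id > 0 AND (id & (id - 1)) = 0)),
        name TEXT UNIQUE
    );
    """
)
```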
parameters table
The parameters table holds combinations of parameters and their values.
| id | execution_id | meta_id | name | value |
|----|--------------|---------|------|-------|
| 2 | 3 | 3 | "param_1" | 2 |
Column descriptions
| Column | Description |
|--------|-------------|
| id | ID of the entry in this table. |
| execution_id | Pointer to the execution where this parameter/value combination was used. |
| meta_id | Pointer to the entry in the param_meta table containing the associated parameter meta data. |
| name | Parameter name. |
| value | Parameter value. |
Constraints
This table DOES NOT contain constraints. This breaks full normalization to some degree. Redundant entries are allowed.
param_meta table
The param_meta table holds meta data combinations that can be associated with a parameter/value combination. A single meta data set may be associated with multiple parameter/value entries. The meta data in this table is mostly used for internal LUTE objects, but it also contains information about reconstructing the parameter type.
| id | rename_param | flag_type | description | is_result |
|----|--------------|-----------|-------------|-----------|
| 2 | "new_param" | "--" | "new-param does X." | 0 |
Column descriptions
| Column | Description |
|--------|-------------|
| id | ID of the entry in this table. |
| flag_type | Specifies the type of flag for passing this argument. One of "-", "--", or "". |
| rename_param | Change the name of the parameter as passed on the command-line. |
| description | Documentation of the parameter's usage or purpose. |
| is_result | bool. If the set_result Config option is True, this can be set to True to indicate a result. |
Constraints
- The combination of all columns (rename_param, flag_type, description, is_result) must be unique. Multiple parameters can reference the same row of this table.
parameter_types table
The parameter_types table holds type reconstructions.
| id | type_name | definition |
|----|-----------|------------|
| 2 | MyTypeClass | "json_model_of_class" |
Column descriptions
| Column | Description |
|--------|-------------|
| id | ID of the entry in this table. |
| type_name | The name of the type - usually a Python class name. |
| definition | A JSON string that can be used to reconstruct the object. |
Note: Despite parameters themselves possibly having complex type definitions, they do not reference entries in this table (through either the parameters or param_meta table). The overall TaskParameters object schema contains the definitions of all parameters, so the individual parameters do not need to also reference this table.
Constraints
- The definition column must be unique. The type_name column is allowed to be repeated to account for the possibility (although small) that a TaskParameters object is updated over the lifetime of the database.
- The definition is required to be NOT NULL. All entries in the table must contain a definition.
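As an illustration of what a type_name/definition pair might look like, assuming the TaskParameters models are Pydantic (v1-style) models, the JSON schema of the model class can serve as the reconstruction string. This is a sketch, not necessarily the exact serialization LUTE stores.

```python
from pydantic import BaseModel


class MyTypeClass(BaseModel):
    """Stand-in for a TaskParameters model; the field below is hypothetical."""

    param_1: int = 2


type_name = MyTypeClass.__name__        # -> "MyTypeClass"
definition = MyTypeClass.schema_json()  # JSON string describing all fields and their types
```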
API
This API is intended to be used at the Executor level, with some calls intended to provide default values for Pydantic models. Utilities for reading and inspecting the database outside of normal Task execution are addressed under the Utilities heading below.
Write
record_analysis_db(cfg: DescribedAnalysis) -> None: Writes the configuration to the backend database.
- ...
- ...
Read
read_latest_db_entry(db_dir: str, task_name: str, param: str) -> Any: Retrieve the most recent entry from a database for a specific Task.
- ...
- ...
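A hypothetical usage sketch of the read call is shown below. The import path is an assumption for illustration; the call signature matches the one given above.

```python
# The module path below is a placeholder; import read_latest_db_entry from wherever
# LUTE actually exposes it.
from lute.io.db import read_latest_db_entry

# Retrieve the most recent recorded value of a parameter (or result field) for a
# previously executed Managed Task, e.g. to seed a default for a downstream Task.
outfile = read_latest_db_entry(
    db_dir="/path/to/work_dir",  # directory containing the LUTE database
    task_name="Task1",           # hypothetical Managed Task name
    param="outfile",             # field to retrieve
)
```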
Utilities
Scripts
invalidate_entry: Marks a database entry as invalid. A common reason to use this is that data has been deleted or found to be corrupted.
- ...
TUI and GUI
dbview: TUI for database inspection. Read only.
- ...