From 16d13c5e466079677c1e98a63554e029710e8144 Mon Sep 17 00:00:00 2001 From: Jorge Lisa <64639359+AcquaDiGiorgio@users.noreply.github.com> Date: Mon, 9 Mar 2026 14:45:40 +0100 Subject: [PATCH 01/11] docs: LHCb Workflows and Commands --- docs/design/lhcb-workflows.md | 388 ++++++++++++++++++++++++++++++++++ 1 file changed, 388 insertions(+) create mode 100644 docs/design/lhcb-workflows.md diff --git a/docs/design/lhcb-workflows.md b/docs/design/lhcb-workflows.md new file mode 100644 index 0000000..064ffa5 --- /dev/null +++ b/docs/design/lhcb-workflows.md @@ -0,0 +1,388 @@ +# LHCb Workflow Commands + +## Types of workflows + +```mermaid +flowchart TD + subgraph "USER Job (setExecutable)" + direction TB + subgraph PreProcessing0[PreProcessing] + CreateDataFile0[CreateDataFile] + end + subgraph Processing0[Processing] + CommandLineTool0[CommandLineTool] + end + subgraph PostProcessing0[PostProcessing] + direction TB + FileUsage0[FileUsage] + UserJobFinalization0[UserJobFinalization] + + FileUsage0 ~~~ UserJobFinalization0 + end + PreProcessing0 --> Processing0 + Processing0 --> PostProcessing0 + end + + subgraph "USER Job (setApplication)" + direction TB + subgraph PreProcessing1[PreProcessing] + CreateDataFile1[CreateDataFile] + end + subgraph Processing1[Processing] + direction TB + LbRunApp1[LbRunApp] + AnalyseXmlSummary1[AnalyseXmlSummary] + LbRunApp1 --> AnalyseXmlSummary1 + end + subgraph PostProcessing1[PostProcessing] + direction TB + FileUsage1[FileUsage] + AnalyseFileAccess1[AnalyseFileAccess] + UserJobFinalization1[UserJobFinalization] + + FileUsage1 ~~~ AnalyseFileAccess1 + AnalyseFileAccess1 ~~~ UserJobFinalization1 + end + PreProcessing1 --> Processing1 + Processing1 --> PostProcessing1 + end +``` + +```mermaid +flowchart TD + subgraph "Simulation Job" + direction TB + subgraph Processing0[Processing] + direction TB + LbRunApp0[LbRunApp] + AnalyseXmlSummary0[AnalyseXmlSummary] + + LbRunApp0 --> AnalyseXmlSummary0 + end + subgraph 
PostProcessing0[PostProcessing] + direction TB + UploadLogFile0[UploadLogFile] + UploadOutputData0[UploadOutputData] + FailoverTransfer0[FailoverTransfer] + BookkeepingReport0[BookkeepingReport] + WorkflowAccounting0[WorkflowAccounting] + + UploadLogFile0 ~~~ UploadOutputData0 + UploadOutputData0 ~~~ FailoverTransfer0 + FailoverTransfer0 ~~~ BookkeepingReport0 + BookkeepingReport0 ~~~ WorkflowAccounting0 + end + + Processing0 --> PostProcessing0 + end + + subgraph "Reconstruction Job" + direction TB + subgraph Processing1[Processing] + direction TB + LbRunApp1[LbRunApp] + AnalyseXmlSummary1[AnalyseXmlSummary] + + LbRunApp1 --> AnalyseXmlSummary1 + end + subgraph PostProcessing1[PostProcessing] + direction TB + UploadLogFile1[UploadLogFile] + UploadOutputData1[UploadOutputData] + FailoverTransfer1[FailoverTransfer] + BookkeepingReport1[BookkeepingReport] + WorkflowAccounting1[WorkflowAccounting] + RemoveInputData1[RemoveInputData] + + UploadLogFile1 ~~~ UploadOutputData1 + UploadOutputData1 ~~~ RemoveInputData1 + RemoveInputData1 ~~~ FailoverTransfer1 + FailoverTransfer1 ~~~ BookkeepingReport1 + BookkeepingReport1 ~~~ WorkflowAccounting1 + end + Processing1 --> PostProcessing1 + end +``` + +The commands are not in sequence as they can be executed in any order because they don't depend on any other's outputs. +If this wasn't the case, that should be taken into account and ensure they are set in the required arrangement. 
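The ordering constraint above can be made concrete: if a command did consume another command's output, a valid arrangement could be derived mechanically from the declared `Consumes`/`Creates` files in the table further down. A minimal sketch, assuming a hypothetical declaration format — the dependency of `UploadOutputData` on `bookkeeping.xml` is illustrative only, not taken from the current modules:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical declaration: each command lists the files it consumes and creates.
commands = {
    "UploadLogFile":     {"consumes": set(),               "creates": set()},
    "BookkeepingReport": {"consumes": set(),               "creates": {"bookkeeping.xml"}},
    "UploadOutputData":  {"consumes": {"bookkeeping.xml"}, "creates": set()},
    "FailoverTransfer":  {"consumes": set(),               "creates": {"request.json"}},
}

def execution_order(commands):
    """Order commands so each one runs after whatever creates its inputs."""
    ts = TopologicalSorter()
    for name, spec in commands.items():
        # A command depends on every command that creates a file it consumes.
        deps = [other for other, o in commands.items()
                if o["creates"] & spec["consumes"]]
        ts.add(name, *deps)
    return list(ts.static_order())

order = execution_order(commands)
```

`TopologicalSorter` raises `CycleError` on circular dependencies, which would surface an impossible arrangement at definition time rather than mid-job.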
+ +## Relations between commands and DIRAC Components + +```mermaid +--- +title: Dirac-CWL commands +config: + flowchart: + defaultRenderer: "elk" + curve: linear +--- +flowchart LR + %% ====================== + %% DataManager + %% ====================== + + getDestinationSEList{{**getDestinationSEList**}} + getDestinationSEList getDestinationSEList_l@===> DataManager + getFileDescenents{{"**getFileDescenents (lhcb)**"}} + getFileDescenents getFileDescenents_l@===> DataManager + DataManager[**DataManager**] + + classDef DataManagerLink stroke:#A31E00 + classDef DataManagerNode fill:#FF542E,stroke:#A31E00,stroke-width:4px ; + class DataManager,getDestinationSEList,getFileDescenents DataManagerNode + class getDestinationSEList_l,getFileDescenents_l DataManagerLink + + %% ====================== + %% OpsHelper + %% ====================== + + getValue{{**getValue**}} + getValue getValue_opsHelper_l@===> OpsHelper + OpsHelper[**OpsHelper**] + + classDef OpsHelperLink stroke:#A35F00 + classDef OpsHelperNode fill:#FFA82E,stroke:#A35F00,stroke-width:4px ; + class getValue,OpsHelper OpsHelperNode + class getValue_opsHelper_l OpsHelperLink + + %% ====================== + %% StorageElement + %% ====================== + + getUrl{{**getUrl**}} + getUrl getUrl_l@===> StorageElement + putFile{{**putFile**}} + putFile putFile_l@===> StorageElement + StorageElement[**StorageElement**] + + classDef StorageElementLink stroke:#E6C300 ; + classDef StorageElementNode fill:#FFFC2E,stroke:#E6C300,stroke-width:4px ; + class getUrl,putFile,StorageElement StorageElementNode + class getUrl_l,putFile_l StorageElementLink + + %% ====================== + %% DataStoreClient + %% ====================== + + addRegister{{**addRegister**}} + addRegister addRegister_l@===> DataStoreClient + DataStoreClient[**DataStoreClient**] + + classDef DataStoreClientLink stroke:#75A300 ; + classDef DataStoreClientNode fill:#C4FF2E,stroke:#75A300,stroke-width:4px ; + class addRegister,DataStoreClient 
DataStoreClientNode + class addRegister_l DataStoreClientLink + + %% ====================== + %% FailverTransfer + %% ====================== + + transferAndRegisterFile{{**transferAndRegisterFile**}} + transferAndRegisterFile transferAndRegisterFile_l@===> FailoverTransfer + setFileReplicationRequest{{**_setFileReplicationRequest**}} + setFileReplicationRequest setFileReplicationRequest_l@===> FailoverTransfer + FailoverTransfer[**FailoverTransfer**] + + classDef FailoverTransferLink stroke:#3CA300 ; + classDef FailoverTransferNode fill:#7BFF2E,stroke:#3CA300,stroke-width:4px ; + class transferAndRegisterFile,setFileReplicationRequest,FailoverTransfer FailoverTransferNode + class transferAndRegisterFile_l,setFileReplicationRequest_l FailoverTransferLink + + %% ====================== + %% JobReport + %% ====================== + + setJobParameter{{**setJobParameter**}} + setJobParameter setJobParameter_l@===> JobReport + setApplicationStatus{{**setApplicationStatus**}} + setApplicationStatus setApplicationStatus_l@===> JobReport + JobReport[**JobReport**] + + classDef JobReportLink stroke:#00A354 ; + classDef JobReportNode fill:#2EFF9A,stroke:#00A354,stroke-width:4px ; + class setJobParameter,setApplicationStatus,JobReport JobReportNode + class setJobParameter_l,setApplicationStatus_l JobReportLink + + %% ====================== + %% BookkeepingClient + %% ====================== + + getFileMetadata{{**getFileMetadata**}} + getFileMetadata getFileMetadata_l@===> BookkeepingClient + sendXMLBookkeepingReport{{**sendXMLBookkeepingReport**}} + sendXMLBookkeepingReport sendXMLBookkeepingReport_l@===> BookkeepingClient + getFileTypes{{"**getFileTypes (lhcb)**"}} + getFileTypes getFileTypes_l@===> BookkeepingClient + BookkeepingClient[**BookkeepingClient**] + + classDef BookkeepingClientLink stroke:#00A383 ; + classDef BookkeepingClientNode fill:#2EFFD5,stroke:#00A383,stroke-width:4px ; + class getFileMetadata,sendXMLBookkeepingReport,getFileTypes,BookkeepingClient 
BookkeepingClientNode + class getFileMetadata_l,sendXMLBookkeepingReport_l,getFileTypes_l BookkeepingClientLink + + %% ====================== + %% FileReport + %% ====================== + + getFiles{{**getFiles**}} + getFiles getFiles_l@===> FileReport + setFileStatus{{**setFileStatus**}} + setFileStatus setFileStatus_l@===> FileReport + commit{{**commit**}} + commit commit_l@===> FileReport + generateForwardDISET{{**generateForwardDISET**}} + generateForwardDISET generateForwardDISET_l@===> FileReport + FileReport[**FileReport**] + + classDef FileReportLink stroke:#006AA3 ; + classDef FileReportNode fill:#2EB6FF,stroke:#006AA3,stroke-width:4px ; + class getFiles,setFileStatus,commit,generateForwardDISET,FileReport FileReportNode + class getFiles_l,setFileStatus_l,commit_l,generateForwardDISET_l FileReportLink + + %% ====================== + %% ConfigurationSystem + %% ====================== + + getValueGconf{{**getValue**}} + getValueGconf getValue_Gconf_l@===> ConfigurationSystem + ConfigurationSystem[**ConfigurationSystem**] + + classDef ConfigurationSystemLink stroke:#0034A3 ; + classDef ConfigurationSystemNode fill:#5C8FFF,stroke:#0034A3,stroke-width:4px ; + class getValueGconf,ConfigurationSystem ConfigurationSystemNode + class getValue_Gconf_l ConfigurationSystemLink + + %% ====================== + %% FileCatalog + %% ====================== + + addFile{{**addFile**}} + addFile addFile_l@===> FileCatalog + FileCatalog[**FileCatalog**] + + classDef FileCatalogLink stroke:#4400A3 ; + classDef FileCatalogNode fill:#A05CFF,stroke:#4400A3,stroke-width:4px ; + class addFile,FileCatalog FileCatalogNode + class addFile_l FileCatalogLink + + %% ====================== + %% Commands + %% ====================== + + UploadLogFile("UploadLogFile") + + UploadLogFile UploadLogFile_l1@===> getDestinationSEList + UploadLogFile UploadLogFile_l2@===> getValue + UploadLogFile UploadLogFile_l3@===> getUrl + UploadLogFile UploadLogFile_l4@===> putFile + UploadLogFile 
UploadLogFile_l5@===> transferAndRegisterFile + UploadLogFile UploadLogFile_l6@===> setJobParameter + UploadLogFile UploadLogFile_l7@===> setApplicationStatus + UploadLogFile UploadLogFile_l8@===> getFileTypes + + class UploadLogFile_l1 DataManagerLink + class UploadLogFile_l2 OpsHelperLink + class UploadLogFile_l3,UploadLogFile_l4 StorageElementLink + class UploadLogFile_l5 FailoverTransferLink + class UploadLogFile_l6,UploadLogFile_l7 JobReportLink + class UploadLogFile_l8 BookkeepingClientLink + + %% ====================== + + UploadOutputData("UploadOutputData") + + UploadOutputData UploadOutputData_l1@===> transferAndRegisterFile + UploadOutputData UploadOutputData_l2@===> setJobParameter + UploadOutputData UploadOutputData_l3@===> setApplicationStatus + UploadOutputData UploadOutputData_l4@===> sendXMLBookkeepingReport + UploadOutputData UploadOutputData_l5@===> addFile + + class UploadOutputData_l1 FailoverTransferLink + class UploadOutputData_l2,UploadOutputData_l3 JobReportLink + class UploadOutputData_l4 BookkeepingClientLink + class UploadOutputData_l5 FileCatalogLink + + %% ====================== + + RemoveInputData("RemoveInputData") + + RemoveInputData RemoveInputData_l1@===> getFileDescenents + RemoveInputData RemoveInputData_l2@===> setApplicationStatus + + class RemoveInputData_l1 DataManagerLink + class RemoveInputData_l2 JobReportLink + + %% ====================== + + FailoverTransferC("FailoverTransfer") + + FailoverTransferC FailoverTransferC_l1@===> getFiles + FailoverTransferC FailoverTransferC_l2@===> setFileStatus + FailoverTransferC FailoverTransferC_l3@===> generateForwardDISET + FailoverTransferC FailoverTransferC_l4@===> commit + + class FailoverTransferC_l1,FailoverTransferC_l2,FailoverTransferC_l3,FailoverTransferC_l4 FileReportLink + + %% ====================== + + BookkeepingReport("BookkeepingReport") + + BookkeepingReport BookkeepingReport_l1@===> setApplicationStatus + BookkeepingReport BookkeepingReport_l2@===> getFileMetadata + 
BookkeepingReport BookkeepingReport_l3@===> getValueGconf + + class BookkeepingReport_l1,BookkeepingReport_l2 JobReportLink + class BookkeepingReport_l3 ConfigurationSystemLink + + %% ====================== + + WorklflowAccounting("WorklflowAccounting + (StepAccounting)") + + WorklflowAccounting WorklflowAccounting_l1@===> addRegister + WorklflowAccounting WorklflowAccounting_l2@===> getValueGconf + class WorklflowAccounting_l1 DataStoreClientLink + class WorklflowAccounting_l2 ConfigurationSystemLink + + %% ====================== + + AnaliseFileAccess("AnaliseFileAccess") + + UserJobFinalization("UserJobFinalization") + + UserJobFinalization UserJobFinalization_l1@===> transferAndRegisterFile + UserJobFinalization UserJobFinalization_l2@===> setFileReplicationRequest + UserJobFinalization UserJobFinalization_l3@===> setJobParameter + UserJobFinalization UserJobFinalization_l4@===> setApplicationStatus + + class UserJobFinalization_l1,UserJobFinalization_l2 FailoverTransferLink + class UserJobFinalization_l3,UserJobFinalization_l4 JobReportLink + + %% ====================== + + FileUsage("FileUsage") + + FileUsage FileUsage_l1@===> getValueGconf + class FileUsage_l1 ConfigurationSystemLink +``` + +## Command's inputs & outputs + +| Command | Consumes | Creates | Requires | +| --- | --- | --- | --- | +| CreateDataFile | Inputs | data.py | poolXMLCatName | +| UploadLogFile | Outputs | N/A | JobID ProductionID Namespace ConfigVersion | +| UploadOutputData | Outputs Inputs XMLSummary.xml | N/A | OutputDataStep OutputList OutputMode ProductionOutputData SiteName | +| RemoveInputData | Inputs | N/A | N/A | +| FailoverTransfer | Inputs | request.json | N/A | +| BookkeepingReport | Outputs | bookkeeping.xml | StepID ApplicationName ApplicationVersion StartTime ProductionId StepNumber SiteName JobType | +| WorkflowAccounting | N/A | N/A | RunNumber ProdID EventType SiteName ProcessingStep CpuTime NormCpuTime InputsStats OutputStats InputEvents OutputEvents EventTime NProcs 
JobGroup FinalState | +| AnalyseFileAccess | XMLSummary.xml pool_xml_catalog.xml | N/A | N/A | +| UserJobFinalization | UserOutputData | bookkeeping.xml | JobId UserOutputSE SiteName UserOutputPath ReplicateUserOutData UserOutputLFNPrep | + +**Legend:** + +- **Consumes**: Files that will be processed +- **Creates**: Files that generates +- **Requires**: Extra information required from the parameters or DIRAC From 0f5729b92a60acbce836cfeeefb6fb3c9df5e050 Mon Sep 17 00:00:00 2001 From: Jorge Lisa <64639359+AcquaDiGiorgio@users.noreply.github.com> Date: Thu, 12 Mar 2026 09:25:33 +0100 Subject: [PATCH 02/11] docs(lhcb-workflows): Add comparisons between current and new system --- docs/design/lhcb-workflows.md | 270 +++++++++++++++++++++++++--------- 1 file changed, 202 insertions(+), 68 deletions(-) diff --git a/docs/design/lhcb-workflows.md b/docs/design/lhcb-workflows.md index 064ffa5..2b62f04 100644 --- a/docs/design/lhcb-workflows.md +++ b/docs/design/lhcb-workflows.md @@ -2,28 +2,69 @@ ## Types of workflows +Currently, the Workflow Modules execute in a predefined order. + +For the new approach with CWL, the modules are called "commands" and can be executed in any order, because they don't depend on any other's outputs. However, an order has to be defined while defining the `JobType`, which can be the same as the current order. 
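As a sketch of what defining that order at `JobType` creation could look like — the `JobType` class, its phase attributes, and `command_sequence` are hypothetical names for illustration, not an existing dirac-cwl API:

```python
from dataclasses import dataclass, field

@dataclass
class JobType:
    """Hypothetical container: phases run in order, and so do commands within a phase."""
    pre_processing: list[str] = field(default_factory=list)
    processing: list[str] = field(default_factory=list)
    post_processing: list[str] = field(default_factory=list)

    def command_sequence(self) -> list[str]:
        # Flatten the three phases into the concrete execution order.
        return [*self.pre_processing, *self.processing, *self.post_processing]

# Reproduces the current USER (setExecutable) order from the diagram below.
user_job = JobType(
    pre_processing=["CreateDataFile"],
    processing=["CommandLineTool"],
    post_processing=["FileUsage", "UserJobFinalization"],
)
```

Keeping the phases as plain ordered lists reproduces the current behaviour by default, while still letting a job type permute commands that are independent of one another.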
+ +### USER Job (setExecutable) + ```mermaid -flowchart TD - subgraph "USER Job (setExecutable)" +flowchart LR + subgraph Current + direction TB + + CreateDataFile0[CreateDataFile] + LHCbScript0[LHCbScript] + FileUsage0[FileUsage] + UserJobFinalization0[UserJobFinalization] + + CreateDataFile0 --> LHCbScript0 + LHCbScript0 --> FileUsage0 + FileUsage0 --> UserJobFinalization0 + end + + subgraph New direction TB - subgraph PreProcessing0[PreProcessing] - CreateDataFile0[CreateDataFile] + subgraph PreProcessing1[PreProcessing] + CreateDataFile1[CreateDataFile] end - subgraph Processing0[Processing] - CommandLineTool0[CommandLineTool] + subgraph Processing1[Processing] + CommandLineTool1[CommandLineTool] end - subgraph PostProcessing0[PostProcessing] + subgraph PostProcessing1[PostProcessing] direction TB - FileUsage0[FileUsage] - UserJobFinalization0[UserJobFinalization] - - FileUsage0 ~~~ UserJobFinalization0 + FileUsage1[FileUsage] + UserJobFinalization1[UserJobFinalization] + + FileUsage1 ~~~ UserJobFinalization1 end - PreProcessing0 --> Processing0 - Processing0 --> PostProcessing0 + PreProcessing1 --> Processing1 + Processing1 --> PostProcessing1 end - - subgraph "USER Job (setApplication)" + + Current ~~~ New +``` + +### USER Job (setApplication) + +```mermaid +flowchart LR + subgraph Current + direction TB + + CreateDataFile0[CreateDataFile] + GaudiApplication0[GaudiApplication] + FileUsage0[FileUsage] + AnalyseFileAccess0[AnalyseFileAccess] + UserJobFinalization0[UserJobFinalization] + + CreateDataFile0 --> GaudiApplication0 + GaudiApplication0 --> FileUsage0 + FileUsage0 --> AnalyseFileAccess0 + AnalyseFileAccess0 --> UserJobFinalization0 + end + + subgraph New direction TB subgraph PreProcessing1[PreProcessing] CreateDataFile1[CreateDataFile] @@ -31,62 +72,153 @@ flowchart TD subgraph Processing1[Processing] direction TB LbRunApp1[LbRunApp] - AnalyseXmlSummary1[AnalyseXmlSummary] - LbRunApp1 --> AnalyseXmlSummary1 end subgraph PostProcessing1[PostProcessing] 
direction TB + AnalyseXmlSummary1[AnalyseXmlSummary] FileUsage1[FileUsage] AnalyseFileAccess1[AnalyseFileAccess] UserJobFinalization1[UserJobFinalization] - + + AnalyseXmlSummary1 ~~~ FileUsage1 FileUsage1 ~~~ AnalyseFileAccess1 AnalyseFileAccess1 ~~~ UserJobFinalization1 end PreProcessing1 --> Processing1 Processing1 --> PostProcessing1 end + + Current ~~~ New ``` +### Simulation Job + +For this type of job and for the following one (Reconstruction), currently we have some kind of processing and a post-processing step. The main difference with the new approach is that the processing step also contained modules and as this step could be executed multiple times, so did those modules. + +Now, as we moved those commands out of the processing step, the commands that used to execute multiple times, now they need to deal with multiple outputs at a time, as they only execute once. + ```mermaid -flowchart TD - subgraph "Simulation Job" +flowchart LR + direction TB + + subgraph Current direction TB - subgraph Processing0[Processing] + + + subgraph Processing0[''Processing''] direction TB - LbRunApp0[LbRunApp] + + GaudiApplication0[GaudiApplication] AnalyseXmlSummary0[AnalyseXmlSummary] - - LbRunApp0 --> AnalyseXmlSummary0 + ErrorLogging0[ErrorLogging] + BookkeepingReport0[BookkeepingReport] + StepAccounting0[StepAccounting] + + + GaudiApplication0 --> AnalyseXmlSummary0 + AnalyseXmlSummary0 --> ErrorLogging0 + ErrorLogging0 --> BookkeepingReport0 + BookkeepingReport0 --> StepAccounting0 end - subgraph PostProcessing0[PostProcessing] - direction TB - UploadLogFile0[UploadLogFile] + + subgraph PostProcessing0[''PostProcessing''] + direction TB + UploadOutputData0[UploadOutputData] + UploadLogFile0[UploadLogFile] + UploadMC0[UploadMC] FailoverTransfer0[FailoverTransfer] - BookkeepingReport0[BookkeepingReport] - WorkflowAccounting0[WorkflowAccounting] - - UploadLogFile0 ~~~ UploadOutputData0 - UploadOutputData0 ~~~ FailoverTransfer0 - FailoverTransfer0 ~~~ BookkeepingReport0 - 
BookkeepingReport0 ~~~ WorkflowAccounting0 + + UploadOutputData0 --> UploadLogFile0 + UploadLogFile0 --> UploadMC0 + UploadMC0 --> FailoverTransfer0 end - + Processing0 --> PostProcessing0 + end - - subgraph "Reconstruction Job" + + subgraph New direction TB subgraph Processing1[Processing] direction TB LbRunApp1[LbRunApp] + end + subgraph PostProcessing1[PostProcessing] + direction TB AnalyseXmlSummary1[AnalyseXmlSummary] + UploadLogFile1[UploadLogFile] + UploadOutputData1[UploadOutputData] + FailoverTransfer1[FailoverTransfer] + BookkeepingReport1[BookkeepingReport] + WorkflowAccounting1[WorkflowAccounting] + + AnalyseXmlSummary1 ~~~ UploadLogFile1 + UploadLogFile1 ~~~ UploadOutputData1 + UploadOutputData1 ~~~ FailoverTransfer1 + FailoverTransfer1 ~~~ BookkeepingReport1 + BookkeepingReport1 ~~~ WorkflowAccounting1 + end + + Processing1 --> PostProcessing1 + end + + Current ~~~ New +``` + +### Reconstruction Job + +```mermaid +flowchart LR + subgraph Current + direction TB + + + subgraph Processing0[''Processing''] + direction TB + + GaudiApplication0[GaudiApplication] + AnalyseXmlSummary0[AnalyseXmlSummary] + ErrorLogging0[ErrorLogging] + BookkeepingReport0[BookkeepingReport] + StepAccounting0[StepAccounting] + + + GaudiApplication0 --> AnalyseXmlSummary0 + AnalyseXmlSummary0 --> ErrorLogging0 + ErrorLogging0 --> BookkeepingReport0 + BookkeepingReport0 --> StepAccounting0 + end + + subgraph PostProcessing0[''PostProcessing''] + direction TB + + UploadOutputData0[UploadOutputData] + RemoveInputData0[RemoveInputData] + UploadLogFile0[UploadLogFile] + UploadMC0[UploadMC] + FailoverTransfer0[FailoverTransfer] + + UploadOutputData0 --> RemoveInputData0 + RemoveInputData0 --> UploadLogFile0 + UploadLogFile0 --> UploadMC0 + UploadMC0 --> FailoverTransfer0 + end + + Processing0 --> PostProcessing0 + + end + + subgraph New + direction TB + subgraph Processing1[Processing] + direction TB + LbRunApp1[LbRunApp] - LbRunApp1 --> AnalyseXmlSummary1 end subgraph 
PostProcessing1[PostProcessing] direction TB + AnalyseXmlSummary1[AnalyseXmlSummary] UploadLogFile1[UploadLogFile] UploadOutputData1[UploadOutputData] FailoverTransfer1[FailoverTransfer] @@ -94,6 +226,7 @@ flowchart TD WorkflowAccounting1[WorkflowAccounting] RemoveInputData1[RemoveInputData] + AnalyseXmlSummary1 ~~~ UploadLogFile1 UploadLogFile1 ~~~ UploadOutputData1 UploadOutputData1 ~~~ RemoveInputData1 RemoveInputData1 ~~~ FailoverTransfer1 @@ -102,10 +235,9 @@ flowchart TD end Processing1 --> PostProcessing1 end -``` -The commands are not in sequence as they can be executed in any order because they don't depend on any other's outputs. -If this wasn't the case, that should be taken into account and ensure they are set in the required arrangement. + Current ~~~ New +``` ## Relations between commands and DIRAC Components @@ -118,9 +250,9 @@ config: curve: linear --- flowchart LR - %% ====================== + %% ====================== %% DataManager - %% ====================== + %% ====================== getDestinationSEList{{**getDestinationSEList**}} getDestinationSEList getDestinationSEList_l@===> DataManager @@ -133,9 +265,9 @@ flowchart LR class DataManager,getDestinationSEList,getFileDescenents DataManagerNode class getDestinationSEList_l,getFileDescenents_l DataManagerLink - %% ====================== + %% ====================== %% OpsHelper - %% ====================== + %% ====================== getValue{{**getValue**}} getValue getValue_opsHelper_l@===> OpsHelper @@ -145,10 +277,10 @@ flowchart LR classDef OpsHelperNode fill:#FFA82E,stroke:#A35F00,stroke-width:4px ; class getValue,OpsHelper OpsHelperNode class getValue_opsHelper_l OpsHelperLink - - %% ====================== + + %% ====================== %% StorageElement - %% ====================== + %% ====================== getUrl{{**getUrl**}} getUrl getUrl_l@===> StorageElement @@ -161,9 +293,9 @@ flowchart LR class getUrl,putFile,StorageElement StorageElementNode class getUrl_l,putFile_l 
StorageElementLink - %% ====================== + %% ====================== %% DataStoreClient - %% ====================== + %% ====================== addRegister{{**addRegister**}} addRegister addRegister_l@===> DataStoreClient @@ -174,24 +306,24 @@ flowchart LR class addRegister,DataStoreClient DataStoreClientNode class addRegister_l DataStoreClientLink - %% ====================== + %% ====================== %% FailverTransfer - %% ====================== + %% ====================== transferAndRegisterFile{{**transferAndRegisterFile**}} transferAndRegisterFile transferAndRegisterFile_l@===> FailoverTransfer setFileReplicationRequest{{**_setFileReplicationRequest**}} setFileReplicationRequest setFileReplicationRequest_l@===> FailoverTransfer FailoverTransfer[**FailoverTransfer**] - + classDef FailoverTransferLink stroke:#3CA300 ; classDef FailoverTransferNode fill:#7BFF2E,stroke:#3CA300,stroke-width:4px ; class transferAndRegisterFile,setFileReplicationRequest,FailoverTransfer FailoverTransferNode class transferAndRegisterFile_l,setFileReplicationRequest_l FailoverTransferLink - %% ====================== + %% ====================== %% JobReport - %% ====================== + %% ====================== setJobParameter{{**setJobParameter**}} setJobParameter setJobParameter_l@===> JobReport @@ -204,9 +336,9 @@ flowchart LR class setJobParameter,setApplicationStatus,JobReport JobReportNode class setJobParameter_l,setApplicationStatus_l JobReportLink - %% ====================== + %% ====================== %% BookkeepingClient - %% ====================== + %% ====================== getFileMetadata{{**getFileMetadata**}} getFileMetadata getFileMetadata_l@===> BookkeepingClient @@ -221,9 +353,9 @@ flowchart LR class getFileMetadata,sendXMLBookkeepingReport,getFileTypes,BookkeepingClient BookkeepingClientNode class getFileMetadata_l,sendXMLBookkeepingReport_l,getFileTypes_l BookkeepingClientLink - %% ====================== + %% ====================== %% FileReport - %% 
====================== + %% ====================== getFiles{{**getFiles**}} getFiles getFiles_l@===> FileReport @@ -240,9 +372,9 @@ flowchart LR class getFiles,setFileStatus,commit,generateForwardDISET,FileReport FileReportNode class getFiles_l,setFileStatus_l,commit_l,generateForwardDISET_l FileReportLink - %% ====================== + %% ====================== %% ConfigurationSystem - %% ====================== + %% ====================== getValueGconf{{**getValue**}} getValueGconf getValue_Gconf_l@===> ConfigurationSystem @@ -253,9 +385,9 @@ flowchart LR class getValueGconf,ConfigurationSystem ConfigurationSystemNode class getValue_Gconf_l ConfigurationSystemLink - %% ====================== + %% ====================== %% FileCatalog - %% ====================== + %% ====================== addFile{{**addFile**}} addFile addFile_l@===> FileCatalog @@ -269,9 +401,9 @@ flowchart LR %% ====================== %% Commands %% ====================== - + UploadLogFile("UploadLogFile") - + UploadLogFile UploadLogFile_l1@===> getDestinationSEList UploadLogFile UploadLogFile_l2@===> getValue UploadLogFile UploadLogFile_l3@===> getUrl @@ -316,7 +448,7 @@ flowchart LR %% ====================== FailoverTransferC("FailoverTransfer") - + FailoverTransferC FailoverTransferC_l1@===> getFiles FailoverTransferC FailoverTransferC_l2@===> setFileStatus FailoverTransferC FailoverTransferC_l3@===> generateForwardDISET @@ -355,20 +487,22 @@ flowchart LR UserJobFinalization UserJobFinalization_l2@===> setFileReplicationRequest UserJobFinalization UserJobFinalization_l3@===> setJobParameter UserJobFinalization UserJobFinalization_l4@===> setApplicationStatus - + class UserJobFinalization_l1,UserJobFinalization_l2 FailoverTransferLink class UserJobFinalization_l3,UserJobFinalization_l4 JobReportLink %% ====================== FileUsage("FileUsage") - + FileUsage FileUsage_l1@===> getValueGconf class FileUsage_l1 ConfigurationSystemLink ``` ## Command's inputs & outputs +Some commands have been 
removed, such as `UploadMC` or `ErrorLogging`, so they won't appear in this table. + | Command | Consumes | Creates | Requires | | --- | --- | --- | --- | | CreateDataFile | Inputs | data.py | poolXMLCatName | From 56f1a0b2b213d2353a71ac223a6a64c803b3d343 Mon Sep 17 00:00:00 2001 From: Jorge Lisa <64639359+AcquaDiGiorgio@users.noreply.github.com> Date: Thu, 12 Mar 2026 10:04:24 +0100 Subject: [PATCH 03/11] docs(lhcb-workflows): Add brief explanation for each command --- docs/design/lhcb-workflows.md | 40 +++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/docs/design/lhcb-workflows.md b/docs/design/lhcb-workflows.md index 2b62f04..34753b4 100644 --- a/docs/design/lhcb-workflows.md +++ b/docs/design/lhcb-workflows.md @@ -520,3 +520,43 @@ Some commands have been removed, such as `UploadMC` or `ErrorLogging`, so they w - **Consumes**: Files that will be processed - **Creates**: Files that generates - **Requires**: Extra information required from the parameters or DIRAC + +### CreateDataFile + +Creates a `data.py` data file from the inputs to be used by Ganga. + +### AnalyseXMLSummary + +Performs a series of checks on the XMLSummary output to make sure the execution was done correctly. + +### BookkeepingReport + +Generates a bookkeeping report file based on the XMLSummary and the pool XML catalog. + +### WorkflowAccounting + +Prepare and send accounting information to the DIRAC Accounting system. + +### FileUsage + +Report file usage to a DataFileUsage service. + +### UploadOutputData + +Registers every output generated to the corresponding SE and to the Master Catalog or to the FailoverSE in case of failure. + +### FailoverTransfer + +Commits the status of the files in the file report. The status will be "Processed" if everything ended properly or "Unused" if it did not. + +### UploadLogFile + +Uploads a compressed list of outputs to a DIRAC LogSE. 
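To make the upload step concrete, here is a minimal sketch of the packaging half of this command — the function name and archive layout are invented for illustration, and the LogSE selection and failover path are omitted:

```python
import tarfile
import tempfile
from pathlib import Path

def package_logs(log_dir: Path, job_id: int) -> Path:
    """Compress a job's log directory into the single archive that is
    shipped to the LogSE (illustrative: SE selection and failover
    handling of the real module are not shown)."""
    archive = log_dir.parent / f"job_{job_id}_logs.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(log_dir, arcname=f"job_{job_id}")
    return archive

# Minimal demonstration on a throwaway directory.
tmp = Path(tempfile.mkdtemp())
logs = tmp / "logs"
logs.mkdir()
(logs / "application.log").write_text("ok\n")
archive = package_logs(logs, job_id=123)
```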
+ +### RemoveInputData + +Removes the inputs and their replicas (if any) from every SE and File Catalog. + +### AnalyseFileAccess + +Uses the XMLCatalog and XMLSummary to check if the access of each input file was successful or not. From 83bf433526904831363424e2c5c50e2e540734e0 Mon Sep 17 00:00:00 2001 From: Jorge Lisa <64639359+AcquaDiGiorgio@users.noreply.github.com> Date: Thu, 12 Mar 2026 10:07:27 +0100 Subject: [PATCH 04/11] docs(lhcb-workflows): Fix spelling --- docs/design/lhcb-workflows.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/design/lhcb-workflows.md b/docs/design/lhcb-workflows.md index 34753b4..bcfa9f3 100644 --- a/docs/design/lhcb-workflows.md +++ b/docs/design/lhcb-workflows.md @@ -256,14 +256,14 @@ flowchart LR getDestinationSEList{{**getDestinationSEList**}} getDestinationSEList getDestinationSEList_l@===> DataManager - getFileDescenents{{"**getFileDescenents (lhcb)**"}} - getFileDescenents getFileDescenents_l@===> DataManager + getFileDescendants{{"**getFileDescendants (lhcb)**"}} + getFileDescendants getFileDescendants_l@===> DataManager DataManager[**DataManager**] classDef DataManagerLink stroke:#A31E00 classDef DataManagerNode fill:#FF542E,stroke:#A31E00,stroke-width:4px ; - class DataManager,getDestinationSEList,getFileDescenents DataManagerNode - class getDestinationSEList_l,getFileDescenents_l DataManagerLink + class DataManager,getDestinationSEList,getFileDescendants DataManagerNode + class getDestinationSEList_l,getFileDescendants_l DataManagerLink %% ====================== %% OpsHelper @@ -307,7 +307,7 @@ flowchart LR class addRegister_l DataStoreClientLink %% ====================== - %% FailverTransfer + %% FailoverTransfer %% ====================== transferAndRegisterFile{{**transferAndRegisterFile**}} @@ -439,7 +439,7 @@ flowchart LR RemoveInputData("RemoveInputData") - RemoveInputData RemoveInputData_l1@===> getFileDescenents + RemoveInputData RemoveInputData_l1@===> 
getFileDescendants RemoveInputData RemoveInputData_l2@===> setApplicationStatus class RemoveInputData_l1 DataManagerLink @@ -469,13 +469,13 @@ flowchart LR %% ====================== - WorklflowAccounting("WorklflowAccounting + WorkflowAccounting("WorkflowAccounting (StepAccounting)") - WorklflowAccounting WorklflowAccounting_l1@===> addRegister - WorklflowAccounting WorklflowAccounting_l2@===> getValueGconf - class WorklflowAccounting_l1 DataStoreClientLink - class WorklflowAccounting_l2 ConfigurationSystemLink + WorkflowAccounting WorkflowAccounting_l1@===> addRegister + WorkflowAccounting WorkflowAccounting_l2@===> getValueGconf + class WorkflowAccounting_l1 DataStoreClientLink + class WorkflowAccounting_l2 ConfigurationSystemLink %% ====================== From 8349284c4177e5c10b477cd2c2504b8d2f749a54 Mon Sep 17 00:00:00 2001 From: Jorge Lisa <64639359+AcquaDiGiorgio@users.noreply.github.com> Date: Thu, 26 Mar 2026 16:36:10 +0100 Subject: [PATCH 05/11] fix(docs): Correct lhcb-workflows --- docs/design/lhcb-workflows.md | 111 ++++++++++++++++++++++++---------- 1 file changed, 79 insertions(+), 32 deletions(-) diff --git a/docs/design/lhcb-workflows.md b/docs/design/lhcb-workflows.md index bcfa9f3..87836d8 100644 --- a/docs/design/lhcb-workflows.md +++ b/docs/design/lhcb-workflows.md @@ -4,7 +4,9 @@ Currently, the Workflow Modules execute in a predefined order. -For the new approach with CWL, the modules are called "commands" and can be executed in any order, because they don't depend on any other's outputs. However, an order has to be defined while defining the `JobType`, which can be the same as the current order. +For the new approach with CWL, the modules are called "commands". The order of execution of the commands has to be defined while creating the `JobType`, which can be the same as the current order. + +To select the proper order of the commands, the developer needs to take into account what each one generates and consumes. 
For LHCb commands, most of them do not need to be on any specific order, except a few such as `UploadOutputData` and `BookkeepingReport`, where the former depends on the latter and `FailoverRequest` and `UserJobFinalization`, which need to execute last, as they commits the operation requests. ### USER Job (setExecutable) @@ -36,7 +38,7 @@ flowchart LR FileUsage1[FileUsage] UserJobFinalization1[UserJobFinalization] - FileUsage1 ~~~ UserJobFinalization1 + FileUsage1 --> UserJobFinalization1 end PreProcessing1 --> Processing1 Processing1 --> PostProcessing1 @@ -127,11 +129,11 @@ flowchart LR UploadOutputData0[UploadOutputData] UploadLogFile0[UploadLogFile] UploadMC0[UploadMC] - FailoverTransfer0[FailoverTransfer] + FailoverRequest0[FailoverRequest] UploadOutputData0 --> UploadLogFile0 UploadLogFile0 --> UploadMC0 - UploadMC0 --> FailoverTransfer0 + UploadMC0 --> FailoverRequest0 end Processing0 --> PostProcessing0 @@ -149,15 +151,15 @@ flowchart LR AnalyseXmlSummary1[AnalyseXmlSummary] UploadLogFile1[UploadLogFile] UploadOutputData1[UploadOutputData] - FailoverTransfer1[FailoverTransfer] + FailoverRequest1[FailoverRequest] BookkeepingReport1[BookkeepingReport] WorkflowAccounting1[WorkflowAccounting] AnalyseXmlSummary1 ~~~ UploadLogFile1 - UploadLogFile1 ~~~ UploadOutputData1 - UploadOutputData1 ~~~ FailoverTransfer1 - FailoverTransfer1 ~~~ BookkeepingReport1 - BookkeepingReport1 ~~~ WorkflowAccounting1 + UploadLogFile1 ~~~ WorkflowAccounting1 + WorkflowAccounting1 ~~~ BookkeepingReport1 + BookkeepingReport1 --> UploadOutputData1 + UploadOutputData1 --> FailoverRequest1 end Processing1 --> PostProcessing1 @@ -197,12 +199,12 @@ flowchart LR RemoveInputData0[RemoveInputData] UploadLogFile0[UploadLogFile] UploadMC0[UploadMC] - FailoverTransfer0[FailoverTransfer] + FailoverRequest0[FailoverRequest] UploadOutputData0 --> RemoveInputData0 RemoveInputData0 --> UploadLogFile0 UploadLogFile0 --> UploadMC0 - UploadMC0 --> FailoverTransfer0 + UploadMC0 --> FailoverRequest0 end 
Processing0 --> PostProcessing0 @@ -221,17 +223,19 @@ flowchart LR AnalyseXmlSummary1[AnalyseXmlSummary] UploadLogFile1[UploadLogFile] UploadOutputData1[UploadOutputData] - FailoverTransfer1[FailoverTransfer] + FailoverRequest1[FailoverRequest] BookkeepingReport1[BookkeepingReport] WorkflowAccounting1[WorkflowAccounting] RemoveInputData1[RemoveInputData] + AnalyseXmlSummary1 ~~~ UploadLogFile1 - UploadLogFile1 ~~~ UploadOutputData1 - UploadOutputData1 ~~~ RemoveInputData1 - RemoveInputData1 ~~~ FailoverTransfer1 - FailoverTransfer1 ~~~ BookkeepingReport1 - BookkeepingReport1 ~~~ WorkflowAccounting1 + UploadLogFile1 ~~~ RemoveInputData1 + RemoveInputData1 ~~~ WorkflowAccounting1 + WorkflowAccounting1 ~~~ BookkeepingReport1 + BookkeepingReport1 --> UploadOutputData1 + UploadOutputData1 --> FailoverRequest1 + end Processing1 --> PostProcessing1 end @@ -258,12 +262,17 @@ flowchart LR getDestinationSEList getDestinationSEList_l@===> DataManager getFileDescendants{{"**getFileDescendants (lhcb)**"}} getFileDescendants getFileDescendants_l@===> DataManager + removeFile{{**removeFile**}} + removeFile removeFile_l@===> DataManager + getSiteSEMapping{{**getSiteSEMapping**}} + getSiteSEMapping getSiteSEMapping_l@===> DataManager + DataManager[**DataManager**] classDef DataManagerLink stroke:#A31E00 classDef DataManagerNode fill:#FF542E,stroke:#A31E00,stroke-width:4px ; - class DataManager,getDestinationSEList,getFileDescendants DataManagerNode - class getDestinationSEList_l,getFileDescendants_l DataManagerLink + class DataManager,getDestinationSEList,getFileDescendants,removeFile,getSiteSEMapping DataManagerNode + class getDestinationSEList_l,getFileDescendants_l,removeFile_l,getSiteSEMapping_l DataManagerLink %% ====================== %% OpsHelper @@ -398,6 +407,19 @@ flowchart LR class addFile,FileCatalog FileCatalogNode class addFile_l FileCatalogLink + %% ====================== + %% DataUsage + %% ====================== + + sendDataUsageReport{{**sendDataUsageReport**}} + 
sendDataUsageReport sendDataUsageReport_l@===> DataUsageClient + DataUsageClient[**DataUsageClient**] + + classDef DataUsageClientLink stroke:#A98B76 ; + classDef DataUsageClientNode fill:#BFA28C,stroke:#A98B76,stroke-width:4px ; + class sendDataUsageReport,DataUsageClient DataUsageClientNode + class sendDataUsageReport_l DataUsageClientLink + %% ====================== %% Commands %% ====================== @@ -429,17 +451,20 @@ flowchart LR UploadOutputData UploadOutputData_l3@===> setApplicationStatus UploadOutputData UploadOutputData_l4@===> sendXMLBookkeepingReport UploadOutputData UploadOutputData_l5@===> addFile + UploadOutputData UploadOutputData_l6@===> getFileDescendants + UploadOutputData UploadOutputData_l7@===> getSiteSEMapping class UploadOutputData_l1 FailoverTransferLink class UploadOutputData_l2,UploadOutputData_l3 JobReportLink class UploadOutputData_l4 BookkeepingClientLink class UploadOutputData_l5 FileCatalogLink + class UploadOutputData_l6,UploadOutputData_l7 DataManagerLink %% ====================== RemoveInputData("RemoveInputData") - RemoveInputData RemoveInputData_l1@===> getFileDescendants + RemoveInputData RemoveInputData_l1@===> removeFile RemoveInputData RemoveInputData_l2@===> setApplicationStatus class RemoveInputData_l1 DataManagerLink @@ -447,14 +472,14 @@ flowchart LR %% ====================== - FailoverTransferC("FailoverTransfer") + FailoverRequest("FailoverRequest") - FailoverTransferC FailoverTransferC_l1@===> getFiles - FailoverTransferC FailoverTransferC_l2@===> setFileStatus - FailoverTransferC FailoverTransferC_l3@===> generateForwardDISET - FailoverTransferC FailoverTransferC_l4@===> commit + FailoverRequest FailoverRequest_l1@===> getFiles + FailoverRequest FailoverRequest_l2@===> setFileStatus + FailoverRequest FailoverRequest_l3@===> generateForwardDISET + FailoverRequest FailoverRequest_l4@===> commit - class FailoverTransferC_l1,FailoverTransferC_l2,FailoverTransferC_l3,FailoverTransferC_l4 FileReportLink + class 
FailoverRequest_l1,FailoverRequest_l2,FailoverRequest_l3,FailoverRequest_l4 FileReportLink %% ====================== @@ -464,7 +489,8 @@ flowchart LR BookkeepingReport BookkeepingReport_l2@===> getFileMetadata BookkeepingReport BookkeepingReport_l3@===> getValueGconf - class BookkeepingReport_l1,BookkeepingReport_l2 JobReportLink + class BookkeepingReport_l1 JobReportLink + class BookkeepingReport_l2 BookkeepingClientLink class BookkeepingReport_l3 ConfigurationSystemLink %% ====================== @@ -479,7 +505,12 @@ flowchart LR %% ====================== - AnaliseFileAccess("AnaliseFileAccess") + AnalyseFileAccess("AnalyseFileAccess") + + AnalyseFileAccess AnalyseFileAccess_l1@===> addRegister + class AnalyseFileAccess_l1 DataStoreClientLink + + %% ====================== UserJobFinalization("UserJobFinalization") @@ -496,7 +527,22 @@ flowchart LR FileUsage("FileUsage") FileUsage FileUsage_l1@===> getValueGconf + FileUsage FileUsage_l2@===> sendDataUsageReport class FileUsage_l1 ConfigurationSystemLink + class FileUsage_l2 DataUsageClientLink + + %% ====================== + + AnalyseXMLSummary("AnalyseXMLSummary") + + AnalyseXMLSummary AnalyseXMLSummary_l1@===> getFileTypes + AnalyseXMLSummary AnalyseXMLSummary_l2@===> setApplicationStatus + AnalyseXMLSummary AnalyseXMLSummary_l3@===> setFileStatus + + class AnalyseXMLSummary_l1 BookkeepingClientLink + class AnalyseXMLSummary_l2 JobReportLink + class AnalyseXMLSummary_l3 FileReportLink + ``` ## Command's inputs & outputs @@ -507,13 +553,14 @@ Some commands have been removed, such as `UploadMC` or `ErrorLogging`, so they w | --- | --- | --- | --- | | CreateDataFile | Inputs | data.py | poolXMLCatName | | UploadLogFile | Outputs | N/A | JobID ProductionID Namespace ConfigVersion | -| UploadOutputData | Outputs Inputs XMLSummary.xml | N/A | OutputDataStep OutputList OutputMode ProductionOutputData SiteName | +| UploadOutputData | Outputs Inputs XMLSummary.xml bookkeeping.xml | N/A | OutputDataStep OutputList 
OutputMode ProductionOutputData SiteName | | RemoveInputData | Inputs | N/A | N/A | -| FailoverTransfer | Inputs | request.json | N/A | +| FailoverRequest | Inputs | request.json | N/A | | BookkeepingReport | Outputs | bookkeeping.xml | StepID ApplicationName ApplicationVersion StartTime ProductionId StepNumber SiteName JobType | | WorkflowAccounting | N/A | N/A | RunNumber ProdID EventType SiteName ProcessingStep CpuTime NormCpuTime InputsStats OutputStats InputEvents OutputEvents EventTime NProcs JobGroup FinalState | | AnalyseFileAccess | XMLSummary.xml pool_xml_catalog.xml | N/A | N/A | -| UserJobFinalization | UserOutputData | bookkeeping.xml | JobId UserOutputSE SiteName UserOutputPath ReplicateUserOutData UserOutputLFNPrep | +| UserJobFinalization | UserOutputData | request.json | JobId UserOutputSE SiteName UserOutputPath ReplicateUserOutData UserOutputLFNPrep | +| AnalyzeXmlSummary | XMLSummary.xml | N/A | ProdId ApplicationName | **Legend:** @@ -545,7 +592,7 @@ Report file usage to a DataFileUsage service. Registers every output generated to the corresponding SE and to the Master Catalog or to the FailoverSE in case of failure. -### FailoverTransfer +### FailoverRequest Commits the status of the files in the file report. The status will be "Processed" if everything ended properly or "Unused" if it did not. 
From d52dd62bbf664273375749d6ad66a24e0757e4d3 Mon Sep 17 00:00:00 2001 From: Jorge Lisa <64639359+AcquaDiGiorgio@users.noreply.github.com> Date: Thu, 26 Mar 2026 16:49:18 +0100 Subject: [PATCH 06/11] chore(docs): Improve styling in lhcb command relationship flowchart --- docs/design/lhcb-workflows.md | 191 ++++++++++++++++++++++++---------- 1 file changed, 135 insertions(+), 56 deletions(-) diff --git a/docs/design/lhcb-workflows.md b/docs/design/lhcb-workflows.md index 87836d8..e9cf8a8 100644 --- a/docs/design/lhcb-workflows.md +++ b/docs/design/lhcb-workflows.md @@ -258,19 +258,27 @@ flowchart LR %% DataManager %% ====================== - getDestinationSEList{{**getDestinationSEList**}} + DataManager[DataManager] + + %% Functions + + getDestinationSEList{{getDestinationSEList}} getDestinationSEList getDestinationSEList_l@===> DataManager - getFileDescendants{{"**getFileDescendants (lhcb)**"}} + + getFileDescendants{{"getFileDescendants (lhcb)"}} getFileDescendants getFileDescendants_l@===> DataManager - removeFile{{**removeFile**}} + + removeFile{{removeFile}} removeFile removeFile_l@===> DataManager - getSiteSEMapping{{**getSiteSEMapping**}} + + getSiteSEMapping{{getSiteSEMapping}} getSiteSEMapping getSiteSEMapping_l@===> DataManager - DataManager[**DataManager**] + %% Styling + + classDef DataManagerLink stroke:#C00707 + classDef DataManagerNode fill:#FF4400,stroke:#C00707,stroke-width:4px,color:black,font-weight:bold ; - classDef DataManagerLink stroke:#A31E00 - classDef DataManagerNode fill:#FF542E,stroke:#A31E00,stroke-width:4px ; class DataManager,getDestinationSEList,getFileDescendants,removeFile,getSiteSEMapping DataManagerNode class getDestinationSEList_l,getFileDescendants_l,removeFile_l,getSiteSEMapping_l DataManagerLink @@ -278,12 +286,18 @@ flowchart LR %% OpsHelper %% ====================== - getValue{{**getValue**}} + OpsHelper[OpsHelper] + + %% Functions + + getValue{{getValue}} getValue getValue_opsHelper_l@===> OpsHelper - 
OpsHelper[**OpsHelper**] + + %% Styling classDef OpsHelperLink stroke:#A35F00 - classDef OpsHelperNode fill:#FFA82E,stroke:#A35F00,stroke-width:4px ; + classDef OpsHelperNode fill:#FFA82E,stroke:#A35F00,stroke-width:4px,color:black,font-weight:bold ; + class getValue,OpsHelper OpsHelperNode class getValue_opsHelper_l OpsHelperLink @@ -291,14 +305,21 @@ flowchart LR %% StorageElement %% ====================== - getUrl{{**getUrl**}} + StorageElement[StorageElement] + + %% Functions + + getUrl{{getUrl}} getUrl getUrl_l@===> StorageElement - putFile{{**putFile**}} + + putFile{{putFile}} putFile putFile_l@===> StorageElement - StorageElement[**StorageElement**] - classDef StorageElementLink stroke:#E6C300 ; - classDef StorageElementNode fill:#FFFC2E,stroke:#E6C300,stroke-width:4px ; + %% Styling + + classDef StorageElementLink stroke:#FFA240 ; + classDef StorageElementNode fill:#FFD41D,stroke:#FFA240,stroke-width:4px,color:black,font-weight:bold ; + class getUrl,putFile,StorageElement StorageElementNode class getUrl_l,putFile_l StorageElementLink @@ -306,12 +327,18 @@ flowchart LR %% DataStoreClient %% ====================== - addRegister{{**addRegister**}} + DataStoreClient[DataStoreClient] + + %% Functions + + addRegister{{addRegister}} addRegister addRegister_l@===> DataStoreClient - DataStoreClient[**DataStoreClient**] - classDef DataStoreClientLink stroke:#75A300 ; - classDef DataStoreClientNode fill:#C4FF2E,stroke:#75A300,stroke-width:4px ; + %% Styling + + classDef DataStoreClientLink stroke:#BBC863 ; + classDef DataStoreClientNode fill:#F0E491,stroke:#BBC863,stroke-width:4px,color:black,font-weight:bold ; + class addRegister,DataStoreClient DataStoreClientNode class addRegister_l DataStoreClientLink @@ -319,14 +346,21 @@ flowchart LR %% FailoverTransfer %% ====================== - transferAndRegisterFile{{**transferAndRegisterFile**}} + FailoverTransfer[FailoverTransfer] + + %% Functions + + transferAndRegisterFile{{transferAndRegisterFile}} 
transferAndRegisterFile transferAndRegisterFile_l@===> FailoverTransfer - setFileReplicationRequest{{**_setFileReplicationRequest**}} + + setFileReplicationRequest{{_setFileReplicationRequest}} setFileReplicationRequest setFileReplicationRequest_l@===> FailoverTransfer - FailoverTransfer[**FailoverTransfer**] - classDef FailoverTransferLink stroke:#3CA300 ; - classDef FailoverTransferNode fill:#7BFF2E,stroke:#3CA300,stroke-width:4px ; + %% Styling + + classDef FailoverTransferLink stroke:#237227 ; + classDef FailoverTransferNode fill:#519A66,stroke:#237227,stroke-width:4px,color:black,font-weight:bold ; + class transferAndRegisterFile,setFileReplicationRequest,FailoverTransfer FailoverTransferNode class transferAndRegisterFile_l,setFileReplicationRequest_l FailoverTransferLink @@ -334,14 +368,21 @@ flowchart LR %% JobReport %% ====================== - setJobParameter{{**setJobParameter**}} + JobReport[JobReport] + + %% Functions + + setJobParameter{{setJobParameter}} setJobParameter setJobParameter_l@===> JobReport - setApplicationStatus{{**setApplicationStatus**}} + + setApplicationStatus{{setApplicationStatus}} setApplicationStatus setApplicationStatus_l@===> JobReport - JobReport[**JobReport**] - classDef JobReportLink stroke:#00A354 ; - classDef JobReportNode fill:#2EFF9A,stroke:#00A354,stroke-width:4px ; + %% Styling + + classDef JobReportLink stroke:#9AB17A ; + classDef JobReportNode fill:#C3CC9B,stroke:#9AB17A,stroke-width:4px,color:black,font-weight:bold ; + class setJobParameter,setApplicationStatus,JobReport JobReportNode class setJobParameter_l,setApplicationStatus_l JobReportLink @@ -349,16 +390,24 @@ flowchart LR %% BookkeepingClient %% ====================== - getFileMetadata{{**getFileMetadata**}} + BookkeepingClient[BookkeepingClient] + + %% Functions + + getFileMetadata{{getFileMetadata}} getFileMetadata getFileMetadata_l@===> BookkeepingClient - sendXMLBookkeepingReport{{**sendXMLBookkeepingReport**}} + + 
sendXMLBookkeepingReport{{sendXMLBookkeepingReport}} sendXMLBookkeepingReport sendXMLBookkeepingReport_l@===> BookkeepingClient - getFileTypes{{"**getFileTypes (lhcb)**"}} + + getFileTypes{{"getFileTypes (lhcb)"}} getFileTypes getFileTypes_l@===> BookkeepingClient - BookkeepingClient[**BookkeepingClient**] - classDef BookkeepingClientLink stroke:#00A383 ; - classDef BookkeepingClientNode fill:#2EFFD5,stroke:#00A383,stroke-width:4px ; + %% Styling + + classDef BookkeepingClientLink stroke:#81A6C6 ; + classDef BookkeepingClientNode fill:#AACDDC,stroke:#81A6C6,stroke-width:4px,color:black,font-weight:bold ; + class getFileMetadata,sendXMLBookkeepingReport,getFileTypes,BookkeepingClient BookkeepingClientNode class getFileMetadata_l,sendXMLBookkeepingReport_l,getFileTypes_l BookkeepingClientLink @@ -366,18 +415,27 @@ flowchart LR %% FileReport %% ====================== - getFiles{{**getFiles**}} + FileReport[FileReport] + + %% Functions + + getFiles{{getFiles}} getFiles getFiles_l@===> FileReport - setFileStatus{{**setFileStatus**}} + + setFileStatus{{setFileStatus}} setFileStatus setFileStatus_l@===> FileReport - commit{{**commit**}} + + commit{{commit}} commit commit_l@===> FileReport - generateForwardDISET{{**generateForwardDISET**}} + + generateForwardDISET{{generateForwardDISET}} generateForwardDISET generateForwardDISET_l@===> FileReport - FileReport[**FileReport**] - classDef FileReportLink stroke:#006AA3 ; - classDef FileReportNode fill:#2EB6FF,stroke:#006AA3,stroke-width:4px ; + %% Styling + + classDef FileReportLink stroke:#FE81D4 ; + classDef FileReportNode fill:#FAACBF,stroke:#FE81D4,stroke-width:4px,color:black,font-weight:bold ; + class getFiles,setFileStatus,commit,generateForwardDISET,FileReport FileReportNode class getFiles_l,setFileStatus_l,commit_l,generateForwardDISET_l FileReportLink @@ -385,12 +443,18 @@ flowchart LR %% ConfigurationSystem %% ====================== - getValueGconf{{**getValue**}} + ConfigurationSystem[ConfigurationSystem] + + %% 
Functions + + getValueGconf{{getValue}} getValueGconf getValue_Gconf_l@===> ConfigurationSystem - ConfigurationSystem[**ConfigurationSystem**] - classDef ConfigurationSystemLink stroke:#0034A3 ; - classDef ConfigurationSystemNode fill:#5C8FFF,stroke:#0034A3,stroke-width:4px ; + %% Styling + + classDef ConfigurationSystemLink stroke:#0B2D72 ; + classDef ConfigurationSystemNode fill:#0992C2,stroke:#0B2D72,stroke-width:4px,color:black,font-weight:bold ; + class getValueGconf,ConfigurationSystem ConfigurationSystemNode class getValue_Gconf_l ConfigurationSystemLink @@ -398,12 +462,18 @@ flowchart LR %% FileCatalog %% ====================== - addFile{{**addFile**}} + FileCatalog[FileCatalog] + + %% Functions + + addFile{{addFile}} addFile addFile_l@===> FileCatalog - FileCatalog[**FileCatalog**] + + %% Styling classDef FileCatalogLink stroke:#4400A3 ; - classDef FileCatalogNode fill:#A05CFF,stroke:#4400A3,stroke-width:4px ; + classDef FileCatalogNode fill:#A05CFF,stroke:#4400A3,stroke-width:4px,color:black,font-weight:bold ; + class addFile,FileCatalog FileCatalogNode class addFile_l FileCatalogLink @@ -411,12 +481,18 @@ flowchart LR %% DataUsage %% ====================== - sendDataUsageReport{{**sendDataUsageReport**}} + DataUsageClient[DataUsageClient] + + %% Functions + + sendDataUsageReport{{sendDataUsageReport}} sendDataUsageReport sendDataUsageReport_l@===> DataUsageClient - DataUsageClient[**DataUsageClient**] + + %% Styling classDef DataUsageClientLink stroke:#A98B76 ; - classDef DataUsageClientNode fill:#BFA28C,stroke:#A98B76,stroke-width:4px ; + classDef DataUsageClientNode fill:#BFA28C,stroke:#A98B76,stroke-width:4px,color:black,font-weight:bold ; + class sendDataUsageReport,DataUsageClient DataUsageClientNode class sendDataUsageReport_l DataUsageClientLink @@ -500,6 +576,7 @@ flowchart LR WorkflowAccounting WorkflowAccounting_l1@===> addRegister WorkflowAccounting WorkflowAccounting_l2@===> getValueGconf + class WorkflowAccounting_l1 DataStoreClientLink 
class WorkflowAccounting_l2 ConfigurationSystemLink

@@ -508,6 +585,7 @@
     AnalyseFileAccess("AnalyseFileAccess")

     AnalyseFileAccess AnalyseFileAccess_l1@===> addRegister
+
     class AnalyseFileAccess_l1 DataStoreClientLink

     %% ======================

@@ -528,6 +606,7 @@
     FileUsage FileUsage_l1@===> getValueGconf
     FileUsage FileUsage_l2@===> sendDataUsageReport
+
     class FileUsage_l1 ConfigurationSystemLink
     class FileUsage_l2 DataUsageClientLink

@@ -562,11 +641,11 @@ Some commands have been removed, such as `UploadMC` or `ErrorLogging`, so they w
 | UserJobFinalization | UserOutputData | request.json | JobId UserOutputSE SiteName UserOutputPath ReplicateUserOutData UserOutputLFNPrep |
 | AnalyzeXmlSummary | XMLSummary.xml | N/A | ProdId ApplicationName |

-**Legend:**
+Legend:

-- **Consumes**: Files that will be processed
-- **Creates**: Files that generates
-- **Requires**: Extra information required from the parameters or DIRAC
+- Consumes: Files that will be processed
+- Creates: Files that it generates
+- Requires: Extra information required from the parameters or DIRAC

### CreateDataFile

From 47b7bac1fda23ea1945d3eb8af128df333eff070 Mon Sep 17 00:00:00 2001
From: Jorge Lisa <64639359+AcquaDiGiorgio@users.noreply.github.com>
Date: Fri, 27 Mar 2026 15:34:13 +0100
Subject: [PATCH 07/11] chore(docs): Change lhcb-workflows JobType flowcharts
 to state diagrams

chore(docs): Rewrite lhcb-workflows explanations
---
 docs/design/lhcb-workflows.md | 479 +++++++++++++++++++---------------
 1 file changed, 265 insertions(+), 214 deletions(-)

diff --git a/docs/design/lhcb-workflows.md b/docs/design/lhcb-workflows.md
index e9cf8a8..3062583 100644
--- a/docs/design/lhcb-workflows.md
+++ b/docs/design/lhcb-workflows.md
@@ -2,245 +2,296 @@
 ## Types of workflows

-Currently, the Workflow Modules execute in a predefined order.
+For the new LHCb Workflows approach with CWL, the modules are called "commands", and the order of execution of the commands has to be defined while creating the `JobType`, which can be the same as the current order.
-For the new approach with CWL, the modules are called "commands". The order of execution of the commands has to be defined while creating the `JobType`, which can be the same as the current order.
+Every `JobType` has to define certain pre-processing and post-processing steps, each containing a list of commands. That list can be empty, and its commands always execute in the same order.
-To select the proper order of the commands, the developer needs to take into account what each one generates and consumes. For LHCb commands, most of them do not need to be on any specific order, except a few such as `UploadOutputData` and `BookkeepingReport`, where the former depends on the latter and `FailoverRequest` and `UserJobFinalization`, which need to execute last, as they commits the operation requests.
+Also, a few modules have been removed, as they are no longer needed.
### USER Job (setExecutable) ```mermaid -flowchart LR - subgraph Current - direction TB - - CreateDataFile0[CreateDataFile] - LHCbScript0[LHCbScript] - FileUsage0[FileUsage] - UserJobFinalization0[UserJobFinalization] - - CreateDataFile0 --> LHCbScript0 - LHCbScript0 --> FileUsage0 - FileUsage0 --> UserJobFinalization0 - end - - subgraph New - direction TB - subgraph PreProcessing1[PreProcessing] - CreateDataFile1[CreateDataFile] - end - subgraph Processing1[Processing] - CommandLineTool1[CommandLineTool] - end - subgraph PostProcessing1[PostProcessing] - direction TB - FileUsage1[FileUsage] - UserJobFinalization1[UserJobFinalization] - - FileUsage1 --> UserJobFinalization1 - end - PreProcessing1 --> Processing1 - Processing1 --> PostProcessing1 - end - - Current ~~~ New +--- +config: + layout: elk +--- +stateDiagram + direction TB + + state Current { + + CreateDataFile_Old: CreateDataFile + LHCbScript_Old: LHCbScript + FileUsage_Old: FileUsage + UserJobFinalization_Old: UserJobFinalization + + [*] --> CreateDataFile_Old + CreateDataFile_Old --> LHCbScript_Old + LHCbScript_Old --> FileUsage_Old + FileUsage_Old --> UserJobFinalization_Old + UserJobFinalization_Old --> [*] + } + + state New { + PreProcessing_New: PreProcessing + Processing_New: Processing + PostProcessing_New: PostProcessing + + state PreProcessing_New { + CreateDataFile_New: CreateDataFile + + [*] --> CreateDataFile_New + } + + state Processing_New { + CommandLineTool_New: CommandLineTool + + [*] --> CommandLineTool_New + } + + state PostProcessing_New { + FileUsage_New: FileUsage + UserJobFinalization_New: UserJobFinalization + + [*] --> FileUsage_New + FileUsage_New --> UserJobFinalization_New + } + + [*] --> PreProcessing_New + PreProcessing_New --> Processing_New + Processing_New --> PostProcessing_New + PostProcessing_New --> [*] + } ``` ### USER Job (setApplication) ```mermaid -flowchart LR - subgraph Current - direction TB - - CreateDataFile0[CreateDataFile] - 
GaudiApplication0[GaudiApplication] - FileUsage0[FileUsage] - AnalyseFileAccess0[AnalyseFileAccess] - UserJobFinalization0[UserJobFinalization] - - CreateDataFile0 --> GaudiApplication0 - GaudiApplication0 --> FileUsage0 - FileUsage0 --> AnalyseFileAccess0 - AnalyseFileAccess0 --> UserJobFinalization0 - end - - subgraph New - direction TB - subgraph PreProcessing1[PreProcessing] - CreateDataFile1[CreateDataFile] - end - subgraph Processing1[Processing] - direction TB - LbRunApp1[LbRunApp] - end - subgraph PostProcessing1[PostProcessing] - direction TB - AnalyseXmlSummary1[AnalyseXmlSummary] - FileUsage1[FileUsage] - AnalyseFileAccess1[AnalyseFileAccess] - UserJobFinalization1[UserJobFinalization] - - AnalyseXmlSummary1 ~~~ FileUsage1 - FileUsage1 ~~~ AnalyseFileAccess1 - AnalyseFileAccess1 ~~~ UserJobFinalization1 - end - PreProcessing1 --> Processing1 - Processing1 --> PostProcessing1 - end - - Current ~~~ New +--- +config: + layout: elk +--- +stateDiagram + direction TB + + state Current { + CreateDataFile_Old: CreateDataFile + GaudiApplication_Old: GaudiApplication + FileUsage_Old: FileUsage + AnalyseFileAccess_Old: AnalyseFileAccess + UserJobFinalization_Old: UserJobFinalization + + [*] --> CreateDataFile_Old + CreateDataFile_Old --> GaudiApplication_Old + GaudiApplication_Old --> FileUsage_Old + FileUsage_Old --> AnalyseFileAccess_Old + AnalyseFileAccess_Old --> UserJobFinalization_Old + UserJobFinalization_Old --> [*] + } + + state New { + state PreProcessing { + [*] --> CreateDataFile_New + CreateDataFile_New: CreateDataFile + } + + state Processing { + [*] --> LbRunApp_New + LbRunApp_New: LbRunApp + } + + state PostProcessing { + FileUsage_New: FileUsage + AnalyseFileAccess_New: AnalyseFileAccess + UserJobFinalization_New: UserJobFinalization + + [*] --> FileUsage_New + FileUsage_New --> AnalyseFileAccess_New + AnalyseFileAccess_New --> UserJobFinalization_New + } + + [*] --> PreProcessing + PreProcessing --> Processing + Processing --> PostProcessing + 
PostProcessing --> [*]
+  }
```

### Simulation Job

-For this type of job and for the following one (Reconstruction), currently we have some kind of processing and a post-processing step. The main difference with the new approach is that the processing step also contained modules and as this step could be executed multiple times, so did those modules.
+For this type of job and for the following one (Reconstruction), the current workflow has a processing step and a post-processing (Finalization) step. The main difference with the new approach is that the old processing step also contained modules; as this step could be executed multiple times, so did those modules.
-Now, as we moved those commands out of the processing step, the commands that used to execute multiple times, now they need to deal with multiple outputs at a time, as they only execute once.
+Now, the corresponding commands have been moved out of the processing step; because they now execute only once, they must deal with multiple outputs at a time.
```mermaid -flowchart LR +--- +config: + layout: elk + nodeSpacing: 10 +--- +stateDiagram direction TB - subgraph Current - direction TB - - - subgraph Processing0[''Processing''] - direction TB - - GaudiApplication0[GaudiApplication] - AnalyseXmlSummary0[AnalyseXmlSummary] - ErrorLogging0[ErrorLogging] - BookkeepingReport0[BookkeepingReport] - StepAccounting0[StepAccounting] - - - GaudiApplication0 --> AnalyseXmlSummary0 - AnalyseXmlSummary0 --> ErrorLogging0 - ErrorLogging0 --> BookkeepingReport0 - BookkeepingReport0 --> StepAccounting0 - end - - subgraph PostProcessing0[''PostProcessing''] - direction TB - - UploadOutputData0[UploadOutputData] - UploadLogFile0[UploadLogFile] - UploadMC0[UploadMC] - FailoverRequest0[FailoverRequest] - - UploadOutputData0 --> UploadLogFile0 - UploadLogFile0 --> UploadMC0 - UploadMC0 --> FailoverRequest0 - end - - Processing0 --> PostProcessing0 - - end - - subgraph New - direction TB - subgraph Processing1[Processing] - direction TB - LbRunApp1[LbRunApp] - end - subgraph PostProcessing1[PostProcessing] - direction TB - AnalyseXmlSummary1[AnalyseXmlSummary] - UploadLogFile1[UploadLogFile] - UploadOutputData1[UploadOutputData] - FailoverRequest1[FailoverRequest] - BookkeepingReport1[BookkeepingReport] - WorkflowAccounting1[WorkflowAccounting] - - AnalyseXmlSummary1 ~~~ UploadLogFile1 - UploadLogFile1 ~~~ WorkflowAccounting1 - WorkflowAccounting1 ~~~ BookkeepingReport1 - BookkeepingReport1 --> UploadOutputData1 - UploadOutputData1 --> FailoverRequest1 - end - - Processing1 --> PostProcessing1 - end - - Current ~~~ New + state Current { + Processing_Old: Processing + PostProcessing_Old: Finalization + + state Processing_Old { + GaudiApplication_Old: GaudiApplication + AnalyseXmlSummary_Old: AnalyseXmlSummary + ErrorLogging_Old: ErrorLogging + BookkeepingReport_Old: BookkeepingReport + StepAccounting_Old: StepAccounting + + [*] --> GaudiApplication_Old + GaudiApplication_Old --> AnalyseXmlSummary_Old + AnalyseXmlSummary_Old --> 
ErrorLogging_Old + ErrorLogging_Old --> BookkeepingReport_Old + BookkeepingReport_Old --> StepAccounting_Old + } + + state PostProcessing_Old { + UploadOutputData_Old: UploadOutputData + UploadLogFile_Old: UploadLogFile + UploadMC_Old: UploadMC + FailoverRequest_Old: FailoverRequest + + [*] --> UploadOutputData_Old + UploadOutputData_Old --> UploadLogFile_Old + UploadLogFile_Old --> UploadMC_Old + UploadMC_Old --> FailoverRequest_Old + } + + [*] --> Processing_Old + Processing_Old --> Processing_Old + Processing_Old --> PostProcessing_Old + PostProcessing_Old --> [*] + } + + state New { + PreProcessing_New: PreProcessing + Processing_New: Processing + PostProcessing_New: PostProcessing + + state PreProcessing_New { + [*] + } + + state Processing_New { + LbRunApp_New: LbRunApp + + [*] --> LbRunApp_New + } + + state PostProcessing_New { + AnalyseXmlSummary_New: AnalyseXmlSummary + UploadLogFile_New: UploadLogFile + UploadOutputData_New: UploadOutputData + FailoverRequest_New: FailoverRequest + BookkeepingReport_New: BookkeepingReport + WorkflowAccounting_New: WorkflowAccounting + + [*] --> AnalyseXmlSummary_New + AnalyseXmlSummary_New --> BookkeepingReport_New + BookkeepingReport_New --> WorkflowAccounting_New + WorkflowAccounting_New --> UploadOutputData_New + UploadOutputData_New --> UploadLogFile_New + UploadLogFile_New --> FailoverRequest_New + } + + [*] --> PreProcessing_New + PreProcessing_New --> Processing_New + Processing_New --> PostProcessing_New + PostProcessing_New --> [*] + } ``` ### Reconstruction Job ```mermaid -flowchart LR - subgraph Current - direction TB - - - subgraph Processing0[''Processing''] - direction TB - - GaudiApplication0[GaudiApplication] - AnalyseXmlSummary0[AnalyseXmlSummary] - ErrorLogging0[ErrorLogging] - BookkeepingReport0[BookkeepingReport] - StepAccounting0[StepAccounting] - - - GaudiApplication0 --> AnalyseXmlSummary0 - AnalyseXmlSummary0 --> ErrorLogging0 - ErrorLogging0 --> BookkeepingReport0 - BookkeepingReport0 --> 
StepAccounting0 - end - - subgraph PostProcessing0[''PostProcessing''] - direction TB - - UploadOutputData0[UploadOutputData] - RemoveInputData0[RemoveInputData] - UploadLogFile0[UploadLogFile] - UploadMC0[UploadMC] - FailoverRequest0[FailoverRequest] - - UploadOutputData0 --> RemoveInputData0 - RemoveInputData0 --> UploadLogFile0 - UploadLogFile0 --> UploadMC0 - UploadMC0 --> FailoverRequest0 - end - - Processing0 --> PostProcessing0 - - end - - subgraph New - direction TB - subgraph Processing1[Processing] - direction TB - LbRunApp1[LbRunApp] - - end - subgraph PostProcessing1[PostProcessing] - direction TB - AnalyseXmlSummary1[AnalyseXmlSummary] - UploadLogFile1[UploadLogFile] - UploadOutputData1[UploadOutputData] - FailoverRequest1[FailoverRequest] - BookkeepingReport1[BookkeepingReport] - WorkflowAccounting1[WorkflowAccounting] - RemoveInputData1[RemoveInputData] - - - AnalyseXmlSummary1 ~~~ UploadLogFile1 - UploadLogFile1 ~~~ RemoveInputData1 - RemoveInputData1 ~~~ WorkflowAccounting1 - WorkflowAccounting1 ~~~ BookkeepingReport1 - BookkeepingReport1 --> UploadOutputData1 - UploadOutputData1 --> FailoverRequest1 - - end - Processing1 --> PostProcessing1 - end - - Current ~~~ New +--- +config: + layout: elk +--- +stateDiagram + direction TB + + state Current { + Processing_Old: Processing + PostProcessing_Old: Finalization + + state Processing_Old { + GaudiApplication_Old: GaudiApplication + AnalyseXmlSummary_Old: AnalyseXmlSummary + ErrorLogging_Old: ErrorLogging + BookkeepingReport_Old: BookkeepingReport + StepAccounting_Old: StepAccounting + + [*] --> GaudiApplication_Old + GaudiApplication_Old --> AnalyseXmlSummary_Old + AnalyseXmlSummary_Old --> ErrorLogging_Old + ErrorLogging_Old --> BookkeepingReport_Old + BookkeepingReport_Old --> StepAccounting_Old + } + + state PostProcessing_Old { + UploadOutputData_Old: UploadOutputData + RemoveInputData_Old: RemoveInputData + UploadLogFile_Old: UploadLogFile + UploadMC_Old: UploadMC + FailoverRequest_Old: 
FailoverRequest + + [*] --> UploadOutputData_Old + UploadOutputData_Old --> RemoveInputData_Old + RemoveInputData_Old --> UploadLogFile_Old + UploadLogFile_Old --> UploadMC_Old + UploadMC_Old --> FailoverRequest_Old + } + + [*] --> Processing_Old + Processing_Old --> PostProcessing_Old + Processing_Old --> Processing_Old + PostProcessing_Old --> [*] + } + + state New { + PreProcessing_New: PreProcessing + Processing_New: Processing + PostProcessing_New: PostProcessing + + state PreProcessing_New { + [*] + } + + state Processing_New { + LbRunApp_New: LbRunApp + + [*] --> LbRunApp_New + } + + state PostProcessing_New { + AnalyseXmlSummary_New: AnalyseXmlSummary + UploadLogFile_New: UploadLogFile + UploadOutputData_New: UploadOutputData + FailoverRequest_New: FailoverRequest + BookkeepingReport_New: BookkeepingReport + WorkflowAccounting_New: WorkflowAccounting + RemoveInputData_New: RemoveInputData + + [*] --> AnalyseXmlSummary_New + AnalyseXmlSummary_New --> BookkeepingReport_New + BookkeepingReport_New --> WorkflowAccounting_New + WorkflowAccounting_New --> UploadOutputData_New + UploadOutputData_New --> RemoveInputData_New + RemoveInputData_New --> UploadLogFile_New + UploadLogFile_New --> FailoverRequest_New + } + + [*] --> PreProcessing_New + PreProcessing_New --> Processing_New + Processing_New --> PostProcessing_New + PostProcessing_New --> [*] + } ``` ## Relations between commands and DIRAC Components From 81024ec5bae04e2f97d8847e7631a34afe203a16 Mon Sep 17 00:00:00 2001 From: Jorge Lisa <64639359+AcquaDiGiorgio@users.noreply.github.com> Date: Thu, 9 Apr 2026 11:56:16 +0200 Subject: [PATCH 08/11] fix(docs): lhcb-workflows typo --- docs/design/lhcb-workflows.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/lhcb-workflows.md b/docs/design/lhcb-workflows.md index 3062583..7c69f32 100644 --- a/docs/design/lhcb-workflows.md +++ b/docs/design/lhcb-workflows.md @@ -690,7 +690,7 @@ Some commands have been removed, such as `UploadMC` or 
`ErrorLogging`, so they w | WorkflowAccounting | N/A | N/A | RunNumber ProdID EventType SiteName ProcessingStep CpuTime NormCpuTime InputsStats OutputStats InputEvents OutputEvents EventTime NProcs JobGroup FinalState | | AnalyseFileAccess | XMLSummary.xml pool_xml_catalog.xml | N/A | N/A | | UserJobFinalization | UserOutputData | request.json | JobId UserOutputSE SiteName UserOutputPath ReplicateUserOutData UserOutputLFNPrep | -| AnalyzeXmlSummary | XMLSummary.xml | N/A | ProdId ApplicationName | +| AnalyseXmlSummary | XMLSummary.xml | N/A | ProdId ApplicationName | Legend: From 80b1c5243c92a2c391c499ca69b6962c0f166f6c Mon Sep 17 00:00:00 2001 From: Jorge Lisa <64639359+AcquaDiGiorgio@users.noreply.github.com> Date: Tue, 14 Apr 2026 16:12:25 +0200 Subject: [PATCH 09/11] chore(docs): Add forks for parallelly executed commands chore: Update processing stage to show dirac-cwl command --- docs/design/lhcb-workflows.md | 139 ++++++++++++++++++++++------------ 1 file changed, 91 insertions(+), 48 deletions(-) diff --git a/docs/design/lhcb-workflows.md b/docs/design/lhcb-workflows.md index 7c69f32..bd4bdd1 100644 --- a/docs/design/lhcb-workflows.md +++ b/docs/design/lhcb-workflows.md @@ -4,17 +4,13 @@ For the new LHCb Workflows approach with CWL, the modules are called "commands" and the order of execution of the commands has to be defined while creating the `JobType`, which can be the same as the current order. -Every `JobType` has to define certain pre-processing and post-processing steps containing a list of command. That list can be empty and will always execute in the same order. +Every `JobType` has to define certain pre-processing and post-processing steps containing a list of command. That list can be empty and will always execute in the same order. However, certain commands could be executed simultaneously. This is shown with a fork in the state diagrams. Also a few modules have been removed, as they are no longer needed. 
### USER Job (setExecutable) ```mermaid ---- -config: - layout: elk ---- stateDiagram direction TB @@ -44,9 +40,9 @@ stateDiagram } state Processing_New { - CommandLineTool_New: CommandLineTool + Workflow_New: dirac-cwl workflow.cwl - [*] --> CommandLineTool_New + [*] --> Workflow_New } state PostProcessing_New { @@ -67,10 +63,6 @@ stateDiagram ### USER Job (setApplication) ```mermaid ---- -config: - layout: elk ---- stateDiagram direction TB @@ -90,30 +82,51 @@ stateDiagram } state New { - state PreProcessing { - [*] --> CreateDataFile_New + PreProcessing_New: PreProcessing + Processing_New: Processing + PostProcessing_New: PostProcessing + + state PreProcessing_New { CreateDataFile_New: CreateDataFile + + [*] --> CreateDataFile_New } - state Processing { - [*] --> LbRunApp_New - LbRunApp_New: LbRunApp + state Processing_New { + Execution_New: dirac-cwl workflow.cwl + + state Execution_New { + CLT_New: CommandLineTool + + state CLT_New { + LbRunApp_New: LbRunApp + [*] --> LbRunApp_New + } + } + + [*] --> Execution_New } - state PostProcessing { + state PostProcessing_New { + state fork_state <> + state join_state <> + FileUsage_New: FileUsage AnalyseFileAccess_New: AnalyseFileAccess UserJobFinalization_New: UserJobFinalization - [*] --> FileUsage_New - FileUsage_New --> AnalyseFileAccess_New - AnalyseFileAccess_New --> UserJobFinalization_New + [*] --> fork_state + fork_state --> FileUsage_New + fork_state --> AnalyseFileAccess_New + FileUsage_New --> join_state + AnalyseFileAccess_New --> join_state + join_state --> UserJobFinalization_New } - [*] --> PreProcessing - PreProcessing --> Processing - Processing --> PostProcessing - PostProcessing --> [*] + [*] --> PreProcessing_New + PreProcessing_New --> Processing_New + Processing_New --> PostProcessing_New + PostProcessing_New --> [*] } ``` @@ -124,11 +137,6 @@ For this type of job and for the following one (Reconstruction), currently we ha Now, the corresponding commands got moved out of the processing step, 
which forces them to deal with multiple outputs at a time, as they only execute once. ```mermaid ---- -config: - layout: elk - nodeSpacing: 10 ---- stateDiagram direction TB @@ -178,12 +186,24 @@ stateDiagram } state Processing_New { - LbRunApp_New: LbRunApp + Execution_New: dirac-cwl workflow.cwl + + state Execution_New { + CLT_New: CommandLineTool + + state CLT_New { + LbRunApp_New: LbRunApp + [*] --> LbRunApp_New + } + } - [*] --> LbRunApp_New + [*] --> Execution_New } state PostProcessing_New { + state fork_state <> + state join_state <> + AnalyseXmlSummary_New: AnalyseXmlSummary UploadLogFile_New: UploadLogFile UploadOutputData_New: UploadOutputData @@ -192,11 +212,15 @@ stateDiagram WorkflowAccounting_New: WorkflowAccounting [*] --> AnalyseXmlSummary_New - AnalyseXmlSummary_New --> BookkeepingReport_New - BookkeepingReport_New --> WorkflowAccounting_New - WorkflowAccounting_New --> UploadOutputData_New - UploadOutputData_New --> UploadLogFile_New - UploadLogFile_New --> FailoverRequest_New + AnalyseXmlSummary_New --> fork_state + fork_state --> BookkeepingReport_New + fork_state --> WorkflowAccounting_New + fork_state --> UploadLogFile_New + join_state --> UploadOutputData_New + BookkeepingReport_New --> join_state + WorkflowAccounting_New --> join_state + UploadLogFile_New --> join_state + UploadOutputData_New --> FailoverRequest_New } [*] --> PreProcessing_New @@ -209,10 +233,6 @@ stateDiagram ### Reconstruction Job ```mermaid ---- -config: - layout: elk ---- stateDiagram direction TB @@ -264,12 +284,30 @@ stateDiagram } state Processing_New { - LbRunApp_New: LbRunApp + Execution_New: dirac-cwl workflow.cwl + + state Execution_New { + Workflow_New: CWL Workflow - [*] --> LbRunApp_New + state Workflow_New { + CLT_New: CommandLineTool + + state CLT_New { + LbRunApp_New: LbRunApp + [*] --> LbRunApp_New + } + + CLT_New --> CLT_New: This can be executed multiple times + } + } + + [*] --> Execution_New } state PostProcessing_New { + state fork_state <> + state 
join_state <> + AnalyseXmlSummary_New: AnalyseXmlSummary UploadLogFile_New: UploadLogFile UploadOutputData_New: UploadOutputData @@ -279,12 +317,17 @@ stateDiagram RemoveInputData_New: RemoveInputData [*] --> AnalyseXmlSummary_New - AnalyseXmlSummary_New --> BookkeepingReport_New - BookkeepingReport_New --> WorkflowAccounting_New - WorkflowAccounting_New --> UploadOutputData_New - UploadOutputData_New --> RemoveInputData_New - RemoveInputData_New --> UploadLogFile_New - UploadLogFile_New --> FailoverRequest_New + AnalyseXmlSummary_New --> fork_state + fork_state --> BookkeepingReport_New + fork_state --> WorkflowAccounting_New + fork_state --> UploadLogFile_New + fork_state --> RemoveInputData_New + BookkeepingReport_New --> join_state + RemoveInputData_New --> join_state + WorkflowAccounting_New --> join_state + UploadLogFile_New --> join_state + join_state --> UploadOutputData_New + UploadOutputData_New --> FailoverRequest_New } [*] --> PreProcessing_New From 6564135599ba810486a66dbfbeac7e84c19b11d5 Mon Sep 17 00:00:00 2001 From: Jorge Lisa <64639359+AcquaDiGiorgio@users.noreply.github.com> Date: Wed, 15 Apr 2026 16:14:35 +0200 Subject: [PATCH 10/11] chore(docs): add explanation about implementing command parallelization fix: typo in commands section of readme --- README.md | 4 ++-- docs/design/lhcb-workflows.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index d0e4608..073feb8 100644 --- a/README.md +++ b/README.md @@ -135,7 +135,7 @@ workflows/ └── inputs2.yaml ``` -### Add a Pre/Post-processing commamd and a Job type +### Add a Pre/Post-processing command and a Job type #### Add a Pre/Post-Command @@ -147,7 +147,7 @@ To add a new pre/post-processing command to the project, follow these steps: - Create a class that inherits `PreProcessCommand` if it's going to be executed before the workflow or `PostProcessCommand` if it's going to be executed after the workflow. 
In the rare case that the command can be executed in both stages, it should inherit both classes. These classes are located at `src/dirac_cwl/commands/core.py`.
-- Implement the `execute` function with the actions it's expected to do. This function recieves the `job path` as a `string` and the dictionary of keyworded arguments `**kwargs`. This function can raise exceptions if it needs to.
+- Implement the `execute` function with the actions it's expected to do. This function receives the `job path` as a `string` and the dictionary of keyword arguments `**kwargs`. This function can raise exceptions if it needs to.
 
 #### Add a Job Type
 
diff --git a/docs/design/lhcb-workflows.md b/docs/design/lhcb-workflows.md
index bd4bdd1..17f33f7 100644
--- a/docs/design/lhcb-workflows.md
+++ b/docs/design/lhcb-workflows.md
@@ -4,7 +4,7 @@
 
 For the new LHCb Workflows approach with CWL, the modules are called "commands" and the order of execution of the commands has to be defined while creating the `JobType`, which can be the same as the current order.
 
-Every `JobType` has to define certain pre-processing and post-processing steps containing a list of command. That list can be empty and will always execute in the same order. However, certain commands could be executed simultaneously. This is shown with a fork in the state diagrams.
+Every `JobType` has to define certain pre-processing and post-processing steps containing a list of command. That list can be empty and will always execute in the same order. However, certain commands could be executed simultaneously. This is shown with a fork in the state diagrams, even though we don't have any plans to implement this feature at this time.
 
 Also a few modules have been removed, as they are no longer needed.
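The README steps above can be sketched as follows. This is a hedged illustration only: the real base classes live in `src/dirac_cwl/commands/core.py`, so a stand-in `PostProcessCommand` is defined here to keep the snippet self-contained, and `CountOutputsCommand` with its `outputs/` and `reports/` layout is hypothetical.

```python
# Hypothetical sketch of the README steps above. The real base classes
# live in src/dirac_cwl/commands/core.py; a stand-in is defined here so
# the snippet is self-contained. CountOutputsCommand and its file layout
# are illustrative, not part of dirac-cwl.
import json
from pathlib import Path


class PostProcessCommand:
    """Stand-in for dirac_cwl.commands.core.PostProcessCommand."""


class CountOutputsCommand(PostProcessCommand):
    """Counts the job's output files and records the result on disk."""

    def execute(self, job_path: str, **kwargs) -> None:
        # `execute` receives the job path as a string plus keyword
        # arguments, and may raise an exception to signal failure.
        job_dir = Path(job_path)
        outputs = sorted(p.name for p in (job_dir / "outputs").glob("*"))
        reports = job_dir / "reports"
        reports.mkdir(exist_ok=True)
        (reports / "output_count.json").write_text(
            json.dumps({"n_outputs": len(outputs), "files": outputs})
        )
```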
From 90677655e3e1c7cd974bc7db8ad0222584cc3c79 Mon Sep 17 00:00:00 2001 From: aldbr Date: Thu, 16 Apr 2026 08:41:31 +0200 Subject: [PATCH 11/11] chore: few additional explanation for helping communities to migrate --- docs/design/lhcb-workflows.md | 286 ++++++++++++++++++++++++++-------- 1 file changed, 224 insertions(+), 62 deletions(-) diff --git a/docs/design/lhcb-workflows.md b/docs/design/lhcb-workflows.md index 17f33f7..204913d 100644 --- a/docs/design/lhcb-workflows.md +++ b/docs/design/lhcb-workflows.md @@ -1,12 +1,91 @@ # LHCb Workflow Commands +This document describes how LHCb workflows are migrated from the old DIRAC XML workflow system to dirac-cwl with CWL. It serves two purposes: + +1. **For the LHCb team**: a reference for performing the migration, explaining what each module becomes, how commands communicate, and the phased migration plan. +2. **For other communities**: an example of how to design pre/post-processing commands for their own experiment's workflows using dirac-cwl. + +## Architecture Overview + +### Why we are migrating + +Currently, LHCb workflow modules (`LHCbDIRAC/Workflow/Modules`) are the workflow: they call LHCb applications **and** LHCbDIRAC-specific logic (bookkeeping, accounting, file uploads) in the same execution context. This means the workflow cannot run on resources with no external connectivity, which is a growing requirement. + +The new approach separates concerns: + +- **Pre-process commands** — LHCbDIRAC-specific setup that requires external connectivity (runs before the workflow) +- **CWL workflow** — Pure LHCb application logic via `lb-prod-run` (no external connectivity required) +- **Post-process commands** — LHCbDIRAC-specific reporting, upload, and cleanup (runs after the workflow) + +### Job wrapper lifecycle + +The dirac-cwl job wrapper manages the full job lifecycle. 
Some steps are **generic** (provided by dirac-cwl, always run, not configurable by experiments) and some are **experiment-specific** (configured per job type): + +``` +Job Wrapper (generic dirac-cwl) +│ +├── [setup] — always runs before everything (generic) +│ ├── Download input sandbox +│ ├── Download input data +│ └── Resolve file catalog +│ +├── Pre-process commands (experiment-specific, sequential) +│ +├── CWL workflow execution (experiment-specific) +│ +├── Post-process commands (experiment-specific, sequential) +│ +└── [finally] — always runs, even on crash (generic) + ├── Upload logs (target SE configured by experiment) + ├── Commit input file statuses (Processed/Unused) if managed by a transformation + ├── Commit accounting records (from accounting/*.json) + ├── Merge failover operations + write request.json + └── Report final job status +``` + +The `setup` and `finally` blocks are **not configurable by experiments**. They always run. This guarantees that: +- Logs are always uploaded (even on crash — critical for debugging). The log upload mechanism is generic, but the target SE and log path are configured by the experiment. +- For production jobs managed by the Transformation System, input files are marked as "Processed" or "Unused" so they don't get stuck as "Assigned" (no-op for user jobs) +- Failover operations are always serialized for the RequestDB +- Accounting records are committed. The commit mechanism is generic DIRAC, but the *content* of the records (which fields, which accounting types) is experiment-specific — the `finally` block commits whatever accounting JSON files the experiment's commands have written. + +### How commands communicate + +In the old system, modules share data through **mutable in-memory objects** (`workflow_commons`, `step_commons`). This makes modules tightly coupled and hard to test in isolation. + +In the new system, commands communicate through **files on disk** under `job_path/`. 
Each command reads from well-known paths and writes to well-known paths: + +| Purpose | File path | Writer | Reader | +|---|---|---|---| +| File statuses | `reports/file_statuses.json` | AnalyseSummary | job wrapper (`finally`) | +| Bookkeeping records | `bookkeeping/bookkeeping_*.xml` | BookkeepingReport | UploadOutputData | +| Accounting records | `accounting/*.json` | WorkflowAccounting | job wrapper (`finally`) | +| Failover operations | `failover/*.json` | Upload commands | job wrapper (`finally`) | +| Output manifest | `outputs/manifest.json` | ConstructLFNs | Upload commands | + +This makes dependencies explicit, commands independently testable, and eliminates shared mutable state. + +### What's generic vs experiment-specific + +| Provided by dirac-cwl (generic) | Implemented by each experiment | +|---|---| +| Job wrapper lifecycle (setup/finally) | Pre-process commands (e.g. ComputeEvents) | +| Log upload mechanism (target SE configured by experiment) | Post-process commands (e.g. BookkeepingReport) | +| File status management | CWL workflow definitions | +| Failover request handling | LFN construction scheme | +| Accounting commit mechanism (record content is experiment-specific) | Metadata/catalog integration | +| Upload to SE with failover | Application-specific validation | +| Phased command execution | | + +Other communities (e.g. CTAO) would implement their own pre/post-process commands but reuse the entire job wrapper infrastructure, upload utilities, and execution model. + ## Types of workflows For the new LHCb Workflows approach with CWL, the modules are called "commands" and the order of execution of the commands has to be defined while creating the `JobType`, which can be the same as the current order. -Every `JobType` has to define certain pre-processing and post-processing steps containing a list of command. That list can be empty and will always execute in the same order. However, certain commands could be executed simultaneously. 
This is shown with a fork in the state diagrams, even though we don't have any plans to implement this feature at this time. +Every `JobType` has to define certain pre-processing and post-processing steps containing a list of commands. That list can be empty and will always execute in the same order. -Also a few modules have been removed, as they are no longer needed. +A few modules have been removed, as they are no longer needed (see [Removed commands](#removed-commands)). ### USER Job (setExecutable) @@ -108,19 +187,13 @@ stateDiagram } state PostProcessing_New { - state fork_state <> - state join_state <> - FileUsage_New: FileUsage AnalyseFileAccess_New: AnalyseFileAccess UserJobFinalization_New: UserJobFinalization - [*] --> fork_state - fork_state --> FileUsage_New - fork_state --> AnalyseFileAccess_New - FileUsage_New --> join_state - AnalyseFileAccess_New --> join_state - join_state --> UserJobFinalization_New + [*] --> FileUsage_New + FileUsage_New --> AnalyseFileAccess_New + AnalyseFileAccess_New --> UserJobFinalization_New } [*] --> PreProcessing_New @@ -182,6 +255,12 @@ stateDiagram PostProcessing_New: PostProcessing state PreProcessing_New { + note left of PreProcessing_New + May need a pre-process command + to compute LHCbDIRAC-specific details + before passing them to the workflow + (e.g. 
number of events to process) + end note [*] } @@ -201,25 +280,18 @@ stateDiagram } state PostProcessing_New { - state fork_state <> - state join_state <> - - AnalyseXmlSummary_New: AnalyseXmlSummary + AnalyseSummary_New: AnalyseSummary + BookkeepingReport_New: BookkeepingReport + WorkflowAccounting_New: WorkflowAccounting UploadLogFile_New: UploadLogFile UploadOutputData_New: UploadOutputData FailoverRequest_New: FailoverRequest - BookkeepingReport_New: BookkeepingReport - WorkflowAccounting_New: WorkflowAccounting - [*] --> AnalyseXmlSummary_New - AnalyseXmlSummary_New --> fork_state - fork_state --> BookkeepingReport_New - fork_state --> WorkflowAccounting_New - fork_state --> UploadLogFile_New - join_state --> UploadOutputData_New - BookkeepingReport_New --> join_state - WorkflowAccounting_New --> join_state - UploadLogFile_New --> join_state + [*] --> AnalyseSummary_New + AnalyseSummary_New --> BookkeepingReport_New + BookkeepingReport_New --> WorkflowAccounting_New + WorkflowAccounting_New --> UploadLogFile_New + UploadLogFile_New --> UploadOutputData_New UploadOutputData_New --> FailoverRequest_New } @@ -230,6 +302,8 @@ stateDiagram } ``` +> **Note:** Commands such as `AnalyseSummary`, `BookkeepingReport`, and `WorkflowAccounting` (formerly `StepAccounting`) currently run once per step inside the processing loop. In the new approach, they run **once** after the entire CWL workflow completes, processing all step outputs at once. This is a behavioral change that requires adapting these commands to handle multiple outputs. 
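The note above implies a concrete shape change for these commands: one pass over N step outputs instead of one call per step. A minimal sketch under illustrative assumptions; the file names and the "success" marker check are placeholders, not the real XMLSummary parsing.

```python
# Sketch of the behavioral change described in the note above: the
# command now scans all step summaries in one pass instead of once per
# processing-loop iteration. File names and the "success" check are
# placeholders, not the real XMLSummary parsing.
from pathlib import Path


def analyse_all_summaries(job_path: str) -> dict:
    """Map each step summary file to a coarse ok/failed status."""
    statuses = {}
    for summary in sorted(Path(job_path).glob("summary_step*.xml")):
        statuses[summary.name] = (
            "ok" if "success" in summary.read_text() else "failed"
        )
    return statuses
```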
+ ### Reconstruction Job ```mermaid @@ -305,29 +379,21 @@ stateDiagram } state PostProcessing_New { - state fork_state <> - state join_state <> - - AnalyseXmlSummary_New: AnalyseXmlSummary - UploadLogFile_New: UploadLogFile - UploadOutputData_New: UploadOutputData - FailoverRequest_New: FailoverRequest + AnalyseSummary_New: AnalyseSummary BookkeepingReport_New: BookkeepingReport WorkflowAccounting_New: WorkflowAccounting + UploadLogFile_New: UploadLogFile + UploadOutputData_New: UploadOutputData RemoveInputData_New: RemoveInputData + FailoverRequest_New: FailoverRequest - [*] --> AnalyseXmlSummary_New - AnalyseXmlSummary_New --> fork_state - fork_state --> BookkeepingReport_New - fork_state --> WorkflowAccounting_New - fork_state --> UploadLogFile_New - fork_state --> RemoveInputData_New - BookkeepingReport_New --> join_state - RemoveInputData_New --> join_state - WorkflowAccounting_New --> join_state - UploadLogFile_New --> join_state - join_state --> UploadOutputData_New - UploadOutputData_New --> FailoverRequest_New + [*] --> AnalyseSummary_New + AnalyseSummary_New --> BookkeepingReport_New + BookkeepingReport_New --> WorkflowAccounting_New + WorkflowAccounting_New --> UploadLogFile_New + UploadLogFile_New --> UploadOutputData_New + UploadOutputData_New --> RemoveInputData_New + RemoveInputData_New --> FailoverRequest_New } [*] --> PreProcessing_New @@ -623,12 +689,13 @@ flowchart LR UploadOutputData UploadOutputData_l5@===> addFile UploadOutputData UploadOutputData_l6@===> getFileDescendants UploadOutputData UploadOutputData_l7@===> getSiteSEMapping + UploadOutputData UploadOutputData_l8@===> getDestinationSEList class UploadOutputData_l1 FailoverTransferLink class UploadOutputData_l2,UploadOutputData_l3 JobReportLink class UploadOutputData_l4 BookkeepingClientLink class UploadOutputData_l5 FileCatalogLink - class UploadOutputData_l6,UploadOutputData_l7 DataManagerLink + class UploadOutputData_l6,UploadOutputData_l7,UploadOutputData_l8 DataManagerLink %% 
====================== @@ -706,15 +773,15 @@ flowchart LR %% ====================== - AnalyseXMLSummary("AnalyseXMLSummary") + AnalyseSummary("AnalyseSummary") - AnalyseXMLSummary AnalyseXMLSummary_l1@===> getFileTypes - AnalyseXMLSummary AnalyseXMLSummary_l2@===> setApplicationStatus - AnalyseXMLSummary AnalyseXMLSummary_l3@===> setFileStatus + AnalyseSummary AnalyseSummary_l1@===> getFileTypes + AnalyseSummary AnalyseSummary_l2@===> setApplicationStatus + AnalyseSummary AnalyseSummary_l3@===> setFileStatus - class AnalyseXMLSummary_l1 BookkeepingClientLink - class AnalyseXMLSummary_l2 JobReportLink - class AnalyseXMLSummary_l3 FileReportLink + class AnalyseSummary_l1 BookkeepingClientLink + class AnalyseSummary_l2 JobReportLink + class AnalyseSummary_l3 FileReportLink ``` @@ -725,15 +792,16 @@ Some commands have been removed, such as `UploadMC` or `ErrorLogging`, so they w | Command | Consumes | Creates | Requires | | --- | --- | --- | --- | | CreateDataFile | Inputs | data.py | poolXMLCatName | -| UploadLogFile | Outputs | N/A | JobID ProductionID Namespace ConfigVersion | -| UploadOutputData | Outputs Inputs XMLSummary.xml bookkeeping.xml | N/A | OutputDataStep OutputList OutputMode ProductionOutputData SiteName | -| RemoveInputData | Inputs | N/A | N/A | -| FailoverRequest | Inputs | request.json | N/A | -| BookkeepingReport | Outputs | bookkeeping.xml | StepID ApplicationName ApplicationVersion StartTime ProductionId StepNumber SiteName JobType | +| AnalyseSummary | XMLSummary.xml | N/A | ProdId ApplicationName | +| BookkeepingReport | Outputs | bookkeeping_*.xml | StepID ApplicationName ApplicationVersion StartTime ProductionId StepNumber SiteName JobType | | WorkflowAccounting | N/A | N/A | RunNumber ProdID EventType SiteName ProcessingStep CpuTime NormCpuTime InputsStats OutputStats InputEvents OutputEvents EventTime NProcs JobGroup FinalState | +| FileUsage | Inputs (directory list) | N/A | SiteName | | AnalyseFileAccess | XMLSummary.xml 
pool_xml_catalog.xml | N/A | N/A | -| UserJobFinalization | UserOutputData | request.json | JobId UserOutputSE SiteName UserOutputPath ReplicateUserOutData UserOutputLFNPrep | -| AnalyseXmlSummary | XMLSummary.xml | N/A | ProdId ApplicationName | +| UploadOutputData | Outputs Inputs XMLSummary.xml bookkeeping_*.xml | N/A | OutputDataStep OutputList OutputMode ProductionOutputData SiteName | +| UploadLogFile | Log files | N/A | JobID ProductionID Namespace ConfigVersion | +| UserJobFinalization | UserOutputData | N/A | JobId UserOutputSE SiteName UserOutputPath ReplicateUserOutData UserOutputLFNPrep | +| RemoveInputData | Inputs | N/A | N/A | +| FailoverRequest | Inputs | request.json | N/A | Legend: @@ -745,7 +813,7 @@ Legend: Creates a `data.py` data file from the inputs to be used by Ganga. -### AnalyseXMLSummary +### AnalyseSummary Performs a series of checks on the XMLSummary output to make sure the execution was done correctly. @@ -771,7 +839,11 @@ Commits the status of the files in the file report. The status will be "Processe ### UploadLogFile -Uploads a compressed list of outputs to a DIRAC LogSE. +Compresses and uploads log files to a Storage Element configured for log storage (LHCb uses an SE called `LogSE`, configured via `Operations/LogStorage/LogSE`). + +### UserJobFinalization + +Uploads user-specified output files to configured Storage Elements with failover support and optional replication to a secondary site. ### RemoveInputData @@ -780,3 +852,93 @@ Removes the inputs and their replicas (if any) from every SE and File Catalog. ### AnalyseFileAccess Uses the XMLCatalog and XMLSummary to check if the access of each input file was successful or not. + +### Removed commands + +- **ErrorLogging** — Deprecated no-op module (just logs and returns success). Not carried forward. +- **UploadMC** — Uploaded MC statistics (errors, XML summaries, generator logs, prmon metrics) to ElasticSearch. 
To be handled outside the workflow by a dedicated monitoring service. +- **LHCbScript** — Set `CMTCONFIG` environment variable and ran user scripts. In the new model, the CWL CommandLineTool definition handles environment setup, so this module is absorbed into CWL configuration. +- **StepAccounting** — Renamed to **WorkflowAccounting** because it now processes all step outputs at once rather than running per-step. + +### Key design decisions + +**Why did `AnalyseSummary`, `BookkeepingReport`, and `WorkflowAccounting` move out of the processing loop?** + +In the old system, these ran once per Gaudi application step (inside the processing loop). In the new system, `lb-prod-run` already checks the XML Summary and fails fast via exit code, so the CWL workflow stops on step failure without needing experiment-specific logic. The detailed post-execution analysis (marking files as "Problematic", generating BK records, sending accounting) only needs to happen once after the entire workflow completes. This means these commands must be adapted to handle N step outputs at once instead of 1. + +**Why do we need pre-process commands?** + +Some logic currently embedded in `GaudiApplication` / `RunApplication` requires external connectivity (DIRAC config, CPU time queries) and must run before the CWL workflow, which may execute without external connectivity. For example, computing the number of events to produce in a Simulation job (`getEventsToProduce`) needs the worker node's CPU normalization factor. + +**Why move `FailoverRequest` and `UploadLogFile` to the job wrapper?** + +These must always execute regardless of whether commands succeed or fail. Today they run "by convention" as the last modules in the chain, but if an earlier module crashes, they may never execute — leaving input files stuck as "Assigned" and logs lost. Moving them to the job wrapper's `finally` block guarantees execution. 
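The guarantee described above is essentially `try`/`finally` semantics applied at the wrapper level. A minimal sketch, with illustrative step and command names rather than the real dirac-cwl wrapper:

```python
# Minimal sketch of the guarantee above: the wrapper's finalization
# steps run even when a command raises. Step and command names are
# illustrative placeholders, not the real dirac-cwl wrapper.
def run_job(commands, finalize_steps, log):
    try:
        for command in commands:
            command()
    finally:
        # Executes on success and on failure of any command above, so
        # logs, file statuses, and failover requests are never skipped.
        for step in finalize_steps:
            log.append(step)
```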
+ +## Migration Strategy + +The migration from LHCbDIRAC XML workflows to dirac-cwl with CWL happens in three phases. This allows incremental validation without disrupting production. + +### Phase 1: Refactor modules into commands + +The existing LHCbDIRAC workflow modules (`LHCbDIRAC/Workflow/Modules`) are refactored so that their core logic is extracted into reusable functions that can be called both from the old modules and from new dirac-cwl pre/post-processing commands. + +For each module: +1. Extract the business logic into standalone functions (no dependency on `ModuleBase`, `workflow_commons`, or `step_commons`). +2. The old module becomes a thin wrapper that reads from `workflow_commons`/`step_commons`, calls the extracted function, and writes back to the shared state. +3. The new dirac-cwl command becomes a thin wrapper that reads from files on disk, calls the same extracted function, and writes results to files on disk. + +This means both the old XML workflow path and the new CWL path share the same underlying logic during the transition. Bugs fixed in one are fixed in both. + +``` +Before: After: + +Module (old path) Module (old path) +├── reads workflow_commons ├── reads workflow_commons +├── business logic → ├── calls shared_logic() +└── writes workflow_commons └── writes workflow_commons + + Command (new path) + ├── reads files from disk + ├── calls shared_logic() + └── writes files to disk +``` + +### Phase 2: Submit CWL workflows alongside old workflows + +Once the commands are functional and tested, start submitting CWL workflows through dirac-cwl for selected job types (e.g., start with USER jobs, then Simulation). + +During this phase: +- Old XML workflows and new CWL workflows coexist in production. +- The same transformations can be configured to use either path. +- Validation compares outputs from both paths to ensure correctness. +- The old modules still call into the shared logic, so any production fix applies to both. 
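The Before/After pattern above can be sketched concretely. All names here (`shared_logic`, `workflow_commons` as a plain dict, the `inputs.json`/`result.json` layout) are illustrative assumptions, not the real LHCbDIRAC code:

```python
# Concrete sketch of the Phase 1 refactor: one extracted function, two
# thin wrappers sharing it. Names and file layout are illustrative.
import json
from pathlib import Path


def shared_logic(inputs):
    """Extracted business logic: no ModuleBase, no shared state."""
    return {"n_inputs": len(inputs)}


def old_module(workflow_commons):
    """Old XML-workflow path: reads/writes the shared mutable dict."""
    workflow_commons["result"] = shared_logic(workflow_commons["inputs"])


def new_command(job_path):
    """New dirac-cwl path: reads/writes files on disk under job_path."""
    inputs = json.loads((Path(job_path) / "inputs.json").read_text())
    (Path(job_path) / "result.json").write_text(
        json.dumps(shared_logic(inputs))
    )
```

A fix inside `shared_logic` immediately benefits both execution paths, which is the point of the transition phase.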
+ +### Phase 3: Drop old modules, adapt commands to DiracX + +Once old XML workflows are no longer in use: +1. Remove the old LHCbDIRAC workflow modules entirely. +2. Remove the `ModuleBase` infrastructure (`workflow_commons`, `step_commons`, shared mutable objects). +3. Adapt the commands to call DiracX services directly instead of going through old LHCbDIRAC clients (e.g., BookkeepingClient, DataManager). +4. The shared logic functions are updated to use DiracX APIs. + +At this point, LHCbDIRAC module code is fully retired, and the commands are self-contained within the dirac-cwl ecosystem using DiracX for all external service calls. + +### For other communities + +If you are migrating a different experiment's workflows to dirac-cwl, the LHCb migration above serves as a template. The key steps are the same: + +1. **Identify your modules.** What logic runs today as part of your workflow that requires external connectivity? That logic becomes pre/post-process commands. +2. **Separate application logic from infrastructure logic.** Your CWL workflow should only contain the experiment's application (no DIRAC calls). Everything else goes into commands. +3. **Design your file-based data flow.** Decide what files each command reads and writes. Use the table in [How commands communicate](#how-commands-communicate) as a starting point. +4. **Implement commands.** Use dirac-cwl's `PreProcessCommand` and `PostProcessCommand` base classes. Each command receives the `job_path` and reads/writes files under it. 
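Step 3's file-based data flow can be sketched using `reports/file_statuses.json` from the communication table as the example channel; the JSON schema shown is an assumption, not a dirac-cwl contract:

```python
# Sketch of step 3 above, using reports/file_statuses.json from the
# communication table as the example channel. The JSON schema is
# illustrative, not a dirac-cwl contract.
import json
from pathlib import Path


def write_file_statuses(job_path: str, statuses: dict) -> None:
    """Writer side, e.g. a summary-analysis command."""
    reports = Path(job_path) / "reports"
    reports.mkdir(parents=True, exist_ok=True)
    (reports / "file_statuses.json").write_text(json.dumps(statuses))


def read_file_statuses(job_path: str) -> dict:
    """Reader side, e.g. the job wrapper's finalization block."""
    path = Path(job_path) / "reports" / "file_statuses.json"
    return json.loads(path.read_text()) if path.exists() else {}
```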
+ +What you get for free from dirac-cwl: +- The job wrapper lifecycle (`setup` / `finally`) +- Log upload, file status management, accounting commit, failover request handling +- Upload to Storage Elements with failover (`FailoverTransfer`) +- CWL workflow execution via `cwltool` + +What you must implement: +- Pre/post-process commands specific to your experiment's metadata, catalogs, and reporting +- Your LFN naming scheme (equivalent to LHCb's `constructProductionLFNs`) +- Your CWL workflow definitions