Resulting crate

Once the application has finished, a new sub-folder named COMPSs_RO-Crate_[uuid]/ is created under the application’s Working Directory. This folder is known as the crate. It contains all the elements needed to record a COMPSs application execution (that is, the application together with the datasets used for the run):

  • Application Source Files: as detailed by the user in the YAML configuration file, with the term sources. The user includes the main source file and all auxiliary files that the application needs (e.g. .py, .java, .class or .jar files, as well as any installation, configuration, compilation or submission scripts, readme files, etc.). All application files are added to a sub-folder in the crate named application_sources/. Directories listed in sources keep their folder tree structure, while files listed individually are added to the root of the application_sources/ sub-folder.

  • Application Datasets: when data_persistence is set to True in the YAML configuration file, both the input and output datasets of the workflow are included in the crate. The input dataset consists of the files that the workflow needs in order to run. The output dataset consists of all the files generated by the execution of the COMPSs application. A sub-folder dataset/ containing copies of all related files is created, preserving the sub-directory structure. If more than one root path is detected, a set of folders is created inside the dataset/ folder.

  • complete_graph.svg: the workflow diagram produced by the COMPSs runtime, equivalent to the one obtained with the runcompss -g or --graph option.

  • App_Profile.json (or custom name): a set of task statistics of the application run recorded by the COMPSs runtime, as if the runcompss --output_profile=<path> option had been enabled. For each resource and method executed, it includes the number of executions of the method, as well as the maximum, average and minimum run time of its tasks. The name of the file can be customized using the --output_profile=<path> option. See also Section Schedulers.

  • compss_submission_command_line.txt: stores the exact command line that was used to submit the application (i.e. runcompss or enqueue_compss), including all the flags and parameters passed. This is especially important for reproducing a COMPSs application: because the COMPSs runtime builds the workflow dynamically at run time, different input parameters can change the resulting workflow.

  • ro-crate-info.yaml (or custom name): the YAML workflow provenance configuration file.

  • compss-[job_id].out: the standard output log of the job execution. Only present when the execution is on a cluster.

  • compss-[job_id].err: the standard error log of the job execution. Only present when the execution is on a cluster.

  • ro-crate-metadata.json: the main RO-Crate JSON file, which describes the contents of this directory (the crate) in the RO-Crate specification format. You can find examples at Section Metadata examples.
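As an illustration of how the crate layout described above can be explored programmatically, the following minimal Python sketch locates crate folders by their COMPSs_RO-Crate_ name prefix and lists the entities typed as File in ro-crate-metadata.json. It relies only on the general RO-Crate convention of a top-level "@graph" array whose entries carry "@id" and "@type". The function names find_crates and list_crate_files are hypothetical helpers written for this example and are not part of COMPSs.

```python
import json
from pathlib import Path


def find_crates(working_dir):
    """Return the crate folders found directly under working_dir, sorted by name."""
    candidates = Path(working_dir).glob("COMPSs_RO-Crate_*")
    return sorted(path for path in candidates if path.is_dir())


def list_crate_files(crate_dir):
    """Return the "@id" of every entity typed as File in ro-crate-metadata.json."""
    metadata_path = Path(crate_dir) / "ro-crate-metadata.json"
    metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    file_ids = []
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        # "@type" may be a single string or a list of strings.
        if isinstance(types, str):
            types = [types]
        if "File" in types and "@id" in entity:
            file_ids.append(entity["@id"])
    return file_ids


if __name__ == "__main__":
    # Assumption: the script is launched from the application's Working Directory.
    for crate in find_crates("."):
        print(crate)
        for file_id in list_crate_files(crate):
            print("   ", file_id)
```

For detailed inspection, the pycompss inspect command mentioned in the Tip below remains the supported route; this sketch only shows that the metadata file is plain JSON and can be read with standard tools.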

Tip

Since version 3.3.4, the PyCOMPSs CLI can inspect RO-Crates with the pycompss inspect [crate_folder/ | crate.zip] command. Check the Inspect Workflow Provenance Section for more details.

Tip

For the basic set of files included for every application (i.e. complete_graph.svg, App_Profile.json, compss_submission_command_line.txt, ro-crate-info.yaml, compss-[job_id].out, compss-[job_id].err), the runtime generates a checksum for each file using the sha256 algorithm and records it in the metadata file ro-crate-metadata.json. This checksum can be used to verify file integrity with the sha256sum command.
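The same integrity check can be scripted. The sketch below is a minimal Python example, not the tool's own implementation: it recomputes the sha256 digest of each file and compares it with the value stored in ro-crate-metadata.json. It assumes that the checksum is stored on each file entity under a property named "sha256" and that each entity's "@id" is a path relative to the crate folder; check your own ro-crate-metadata.json and adjust the checksum_key argument if the property is named differently. The function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the sha256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_crate_checksums(crate_dir, checksum_key="sha256"):
    """Map each "@id" that carries a checksum to True (match) or False.

    Assumption: checksums are stored under checksum_key on each file entity,
    and "@id" is a path relative to the crate folder.
    """
    crate_dir = Path(crate_dir)
    metadata_path = crate_dir / "ro-crate-metadata.json"
    metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    results = {}
    for entity in metadata.get("@graph", []):
        expected = entity.get(checksum_key)
        file_id = entity.get("@id")
        if expected is None or file_id is None:
            continue  # entity has no recorded checksum
        target = crate_dir / file_id
        matches = target.is_file() and sha256_of(target) == str(expected).lower()
        results[file_id] = matches
    return results
```

Calling verify_crate_checksums on a crate folder returns a dictionary such as one entry per checked file, where a False value signals a missing or modified file.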

Warning

The file names listed above (complete_graph.svg, App_Profile.json and compss_submission_command_line.txt) are automatically used to generate new files when the -p or --provenance option is enabled. Do not use these names for your own files, since they could be overwritten. You can change the resulting App_Profile.json name by using the --output_profile=/path_to/file flag.
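A quick pre-flight check can catch such name clashes before submitting the application. The following Python sketch simply reports which of the three reserved names already exist in a given directory. It assumes that the directory passed in is the one where the provenance files will be written; the helper name find_reserved_name_conflicts is hypothetical and not part of COMPSs.

```python
from pathlib import Path

# File names that provenance generation creates automatically.
RESERVED_NAMES = (
    "complete_graph.svg",
    "App_Profile.json",
    "compss_submission_command_line.txt",
)


def find_reserved_name_conflicts(directory, reserved=RESERVED_NAMES):
    """Return the reserved file names that already exist in directory."""
    directory = Path(directory)
    return [name for name in reserved if (directory / name).exists()]


if __name__ == "__main__":
    # Assumption: the current directory is where the run will write its files.
    conflicts = find_reserved_name_conflicts(".")
    if conflicts:
        print("These files may be overwritten:", ", ".join(conflicts))
```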

Warning

The complete_graph.svg workflow diagram is not generated automatically if the number of edges in your application’s workflow exceeds 6500, to avoid long generation times. If you want to generate the diagram anyway, you can trigger the generation manually with compss_gengraph or pycompss gengraph.