Runner

General Information

The Runner is a standalone application that can be downloaded and installed on servers or gateways located on a stakeholder's premises. It receives the configuration of the services that are to be executed locally and downloads their latest docker images from the Cloud Platform. This ensures end-to-end security before the results of the local execution are uploaded to the Cloud Platform. Results that are not allowed to leave the stakeholder's premises are stored locally instead.

Runner Execution Flow

Runner Sequence Chart

Runner Registration

A Runner needs to be registered in the Cloud Platform to ensure it is authorized to take over the execution of data check-in jobs and analytics workflows. Registering a Runner sets up a communication channel, over RabbitMQ, between the Cloud Platform and the machine where the Runner is installed.

To register a Runner, the user enters a one-time secure token generated by the Cloud Platform. During registration, a public/private key pair is generated on the machine where the Runner is installed. It secures the communication between the Runner and the Cloud Platform for jobs that require sensitive data processing (i.e., encryption).
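
As an illustration of the key-generation step, the following Go sketch creates a key pair and PEM-encodes both halves. The source does not say which algorithm or encoding the Runner uses, so RSA with 2048 bits, the PEM layout, and the function name generateKeyPair are all assumptions made for this example.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"encoding/pem"
	"fmt"
)

// generateKeyPair creates an RSA key pair and returns both halves PEM-encoded.
// Assumption: RSA with PKCS#1 (private) and PKIX (public) encoding; the actual
// Runner may use a different algorithm or format.
func generateKeyPair(bits int) (privPEM, pubPEM []byte, err error) {
	key, err := rsa.GenerateKey(rand.Reader, bits)
	if err != nil {
		return nil, nil, fmt.Errorf("generate key: %w", err)
	}
	privPEM = pem.EncodeToMemory(&pem.Block{
		Type:  "RSA PRIVATE KEY",
		Bytes: x509.MarshalPKCS1PrivateKey(key),
	})
	pubDER, err := x509.MarshalPKIXPublicKey(&key.PublicKey)
	if err != nil {
		return nil, nil, fmt.Errorf("marshal public key: %w", err)
	}
	pubPEM = pem.EncodeToMemory(&pem.Block{Type: "PUBLIC KEY", Bytes: pubDER})
	return privPEM, pubPEM, nil
}

func main() {
	priv, pub, err := generateKeyPair(2048)
	if err != nil {
		panic(err)
	}
	// Presumably only the public half would be shared with the Cloud Platform;
	// the source does not spell this out.
	fmt.Printf("private key: %d bytes (kept on this machine)\n", len(priv))
	fmt.Printf("%s", pub)
}
```

Generating the pair on the Runner machine means the private key never has to travel over the network, which fits the on-premise design described above.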

Job Handling

Job Configuration

type JobConfig struct {
	Image         string
	InputFilename string
	OutputPath    string
	Paths         []string
	Vault         VaultToken
	Config        map[string]interface{}
	Output        map[string]interface{}
}

type VaultToken struct {
	Token      string
	ExpireTime string
	Url        string
}

  • Image: The name of the docker image to be pulled to run the incoming job.
  • InputFilename: The name of the locally stored input file, including its path, as specified in the harvester of a data-checkin job. [Optional]
  • OutputPath: The local path where the result of the data-checkin job will be stored, in case the loader is configured for local storage. [Optional]
  • Paths: An array of all the local paths which should be mounted to the docker container. [Optional]
  • Vault: The vault token used to retrieve the dockerhub credentials needed to pull the specified docker image.
  • Config: The configuration of the incoming job, as expected by the data-checkin services or the analytics execution services.
  • Output: The output configuration, which contains the presigned URL for MinIO so that the results can be stored if required. [Optional]
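
To show how a message could map onto these structures, here is a minimal Go sketch that decodes a job message and rejects it when a field not marked [Optional] is missing. It assumes the message body is JSON with keys named after the struct fields; the source does not state the wire format. The function parseJob and every value in the sample payload are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type VaultToken struct {
	Token      string
	ExpireTime string
	Url        string
}

type JobConfig struct {
	Image         string
	InputFilename string
	OutputPath    string
	Paths         []string
	Vault         VaultToken
	Config        map[string]interface{}
	Output        map[string]interface{}
}

// parseJob decodes a message body into a JobConfig and checks the fields that
// the job needs in order to pull an image. Assumption: the body is JSON.
func parseJob(body []byte) (JobConfig, error) {
	var job JobConfig
	if err := json.Unmarshal(body, &job); err != nil {
		return JobConfig{}, fmt.Errorf("decode job: %w", err)
	}
	if job.Image == "" {
		return JobConfig{}, fmt.Errorf("job has no Image")
	}
	if job.Vault.Token == "" {
		return JobConfig{}, fmt.Errorf("job has no Vault token")
	}
	return job, nil
}

func main() {
	// Hypothetical payload; all names, paths, and URLs are placeholders.
	body := []byte(`{
		"Image": "example/data-checkin:latest",
		"InputFilename": "/data/in/readings.csv",
		"Paths": ["/data/in"],
		"Vault": {"Token": "example-token", "ExpireTime": "2030-01-01T00:00:00Z", "Url": "https://vault.example.com"},
		"Config": {"steps": ["harvester", "loader"]}
	}`)
	job, err := parseJob(body)
	if err != nil {
		panic(err)
	}
	// Optional fields that are absent keep their zero value (nil map, empty string).
	fmt.Println(job.Image, job.Paths, job.Output == nil)
}
```

Config and Output are decoded into generic maps because their shape depends on the service that consumes them.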

Job Execution

The flow of an on-premise execution is as follows:

  • The user configures a data-checkin job or analytics workflow for on-premise execution in the Cloud Platform.
  • The Cloud Platform sends a RabbitMQ message to the specified runner, in the format specified in the previous section.
  • The runner uses the vault configuration from the message to retrieve the credentials for the private dockerhub repository, where the data-checkin and analytics services reside.
  • The runner pulls the docker image specified in the job configuration from dockerhub, using the retrieved credentials.
  • The runner creates a docker container from the pulled image, and mounts all required local paths specified in the job to it.
  • The runner replaces all local paths specified in the job configuration with the mounted docker paths, so the services know where to find them, and mounts the new configuration to the docker container.
  • The runner starts the container and waits for it to finish. Currently it allows only one job to run at a time, but the limit is configurable to allow any number of concurrent jobs.
  • The docker container informs the Cloud Platform about the execution status and results.
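
Two of the steps above, rewriting local paths to their mounted equivalents and capping the number of concurrent jobs, can be sketched in Go as follows. This is not the Runner's actual code: the "/mnt/job/<index>" mount layout, the function names mountPoints, rewritePaths, and runJobs, and the use of a buffered channel as a semaphore are assumptions chosen for illustration. The sketch also assumes Unix-style paths without trailing slashes.

```go
package main

import (
	"fmt"
	"path"
	"strings"
	"sync"
)

// mountPoints assigns each local path a location inside the container.
// The "/mnt/job/<index>" layout is hypothetical.
func mountPoints(localPaths []string) map[string]string {
	mounts := make(map[string]string, len(localPaths))
	for i, p := range localPaths {
		mounts[p] = fmt.Sprintf("/mnt/job/%d", i)
	}
	return mounts
}

// rewritePaths walks a decoded configuration and replaces every string that is
// at or under a mounted local path with its in-container equivalent. When
// several mounts match, the longest local path wins.
func rewritePaths(v interface{}, mounts map[string]string) interface{} {
	switch val := v.(type) {
	case string:
		best, result := -1, val
		for local, mounted := range mounts {
			if len(local) <= best {
				continue
			}
			if val == local {
				best, result = len(local), mounted
			} else if strings.HasPrefix(val, local+"/") {
				best, result = len(local), path.Join(mounted, val[len(local):])
			}
		}
		return result
	case map[string]interface{}:
		out := make(map[string]interface{}, len(val))
		for k, item := range val {
			out[k] = rewritePaths(item, mounts)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(val))
		for i, item := range val {
			out[i] = rewritePaths(item, mounts)
		}
		return out
	default:
		return v
	}
}

// runJobs runs the given jobs with at most limit of them in flight at once,
// using a buffered channel as a counting semaphore.
func runJobs(jobs []func(), limit int) {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for _, job := range jobs {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit jobs are already running
		go func(run func()) {
			defer wg.Done()
			defer func() { <-sem }()
			run()
		}(job)
	}
	wg.Wait()
}

func main() {
	// Hypothetical job: one mounted directory and a config that refers to it.
	mounts := mountPoints([]string{"/srv/exports"})
	config := map[string]interface{}{
		"input":     "/srv/exports/2024/readings.csv",
		"delimiter": ",",
	}
	fmt.Println(rewritePaths(config, mounts))

	// Three placeholder jobs, executed one at a time.
	var jobs []func()
	for i := 1; i <= 3; i++ {
		n := i
		jobs = append(jobs, func() { fmt.Println("job", n, "finished") })
	}
	runJobs(jobs, 1)
}
```

With a limit of 1 the jobs run strictly one after another, matching the current behaviour described above; raising the limit lets that many containers run side by side.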