Runner
General Information
The Runner is a standalone application that can be downloaded and installed on servers or gateways that reside on a stakeholder's premises. It receives the configuration of the services that are to be executed locally and downloads their latest docker images from the Cloud Platform. This ensures end-to-end security: the results of the local execution are either uploaded to the Cloud Platform or, if they are not allowed to leave the stakeholder's premises, stored locally.
Runner Execution Flow
Runner Registration
A Runner needs to be registered in the Cloud Platform to ensure it is authorized to take over the execution of data check-in jobs and analytics workflows. When you register a Runner, you set up a communication channel, based on RabbitMQ, between the Cloud Platform and the machine where the Runner is installed.
To register a Runner, the user needs to enter a one-time secure token which is generated by the Cloud Platform. During the registration, a public/private key pair is generated on the machine where the Runner is installed, to secure the communication between the Runner and the Cloud Platform for jobs requiring sensitive data processing (i.e., encryption).
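The key pair generation during registration could be sketched roughly as follows. The source does not name the algorithm or the storage format, so RSA with a 2048-bit key and PEM encoding are assumptions here, and `generateKeyPair` is a hypothetical helper rather than the Runner's actual code.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"encoding/pem"
	"fmt"
)

// generateKeyPair creates an RSA key pair and returns both halves PEM-encoded.
// Assumption: RSA and PEM are illustrative choices, not confirmed by the source.
func generateKeyPair(bits int) (privPEM, pubPEM []byte, err error) {
	key, err := rsa.GenerateKey(rand.Reader, bits)
	if err != nil {
		return nil, nil, err
	}
	// The private key stays on the machine where the Runner is installed.
	privPEM = pem.EncodeToMemory(&pem.Block{
		Type:  "RSA PRIVATE KEY",
		Bytes: x509.MarshalPKCS1PrivateKey(key),
	})
	// The public key is the half that can be shared with the other side.
	pubDER, err := x509.MarshalPKIXPublicKey(&key.PublicKey)
	if err != nil {
		return nil, nil, err
	}
	pubPEM = pem.EncodeToMemory(&pem.Block{Type: "PUBLIC KEY", Bytes: pubDER})
	return privPEM, pubPEM, nil
}

func main() {
	priv, pub, err := generateKeyPair(2048)
	if err != nil {
		panic(err)
	}
	fmt.Printf("private key: %d bytes, public key: %d bytes\n", len(priv), len(pub))
}
```

How the public key reaches the Cloud Platform, and how the one-time token is validated, is not covered by this sketch.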
Job Handling
Job Configuration
type JobConfig struct {
    Image         string
    InputFilename string
    OutputPath    string
    Paths         []string
    Vault         VaultToken
    Config        map[string]interface{}
    Output        map[string]interface{}
}
type VaultToken struct {
    Token      string
    ExpireTime string
    Url        string
}
Image
The name of the docker image to be pulled to run the incoming job.
InputFilename
The name of the input file, including its path, which is stored locally and specified in the harvester of a data-checkin job. [Optional]
OutputPath
The local path where the result of the data-checkin job will be stored, in case the loader is configured for local storage. [Optional]
Paths
An array of all the local paths which should be mounted to the docker container. [Optional]
Vault
The vault token which contains the dockerhub credentials, in order to be able to pull the specified docker image.
Config
The configuration of the incoming job, as expected by the data-checkin services or the analytics execution services.
Output
The output configuration, which contains the presigned URL for MinIO in order to be able to store the results if required. [Optional]
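As an illustration of how a job message might be turned into the structure above, here is a minimal sketch. It assumes the message body is JSON whose keys match the struct field names; the source does not state the wire format. `parseJob`, the sample values, and the checks on `Image` and `Vault` (the fields not marked optional) are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type VaultToken struct {
	Token      string
	ExpireTime string
	Url        string
}

type JobConfig struct {
	Image         string
	InputFilename string
	OutputPath    string
	Paths         []string
	Vault         VaultToken
	Config        map[string]interface{}
	Output        map[string]interface{}
}

// parseJob decodes a message body and rejects jobs that lack the fields
// needed to pull an image. Assumption: the body is JSON.
func parseJob(body []byte) (JobConfig, error) {
	var job JobConfig
	if err := json.Unmarshal(body, &job); err != nil {
		return JobConfig{}, fmt.Errorf("decode job: %w", err)
	}
	if job.Image == "" {
		return JobConfig{}, errors.New("job has no Image")
	}
	if job.Vault.Token == "" || job.Vault.Url == "" {
		return JobConfig{}, errors.New("job has no usable Vault token")
	}
	return job, nil
}

func main() {
	// Hypothetical message; every value below is made up for illustration.
	msg := []byte(`{
		"Image": "example/data-checkin:latest",
		"Paths": ["/srv/data"],
		"Vault": {"Token": "s.example", "ExpireTime": "2030-01-01T00:00:00Z", "Url": "https://vault.example.com"},
		"Config": {"input": "/srv/data/readings.csv"}
	}`)
	job, err := parseJob(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(job.Image, job.Paths)
}
```

Optional fields that are absent from the message simply keep their zero values (empty string, nil slice, or nil map).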
Job Execution
The flow of an on-premise execution is as follows:
- User configures a data-checkin job or analytics workflow for on-premise execution in the Cloud Platform.
- The Cloud Platform sends a RabbitMQ message to the specified runner, in the format specified in the previous section.
- The runner uses the vault configuration from the message to retrieve the credentials for the private dockerhub repository, where the data-checkin and analytics services reside.
- The runner pulls the docker image specified in the job configuration from dockerhub, using the retrieved credentials.
- The runner creates a docker container from the pulled image, and mounts all required local paths specified in the job to it.
- The runner replaces all local paths specified in the job configuration with the mounted docker paths, so the services know where to find them, and mounts the new configuration to the docker container.
- The runner starts the container and waits for it to finish. Currently it allows only one job to run at a time, but it can be configured to allow any number of concurrent jobs.
- The docker container informs the Cloud Platform about the execution status and results.
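The path replacement step in the flow above can be sketched as follows. This is only an illustration under stated assumptions: the `/mnt/job/<index>` layout inside the container, and the helper names `mountTargets`, `remapString` and `remapValue`, are hypothetical and not taken from the Runner itself.

```go
package main

import (
	"fmt"
	"strings"
)

// mountTargets assigns each local path a container-side path.
// Assumption: the "/mnt/job/<index>" layout is made up for this sketch.
func mountTargets(paths []string) map[string]string {
	targets := make(map[string]string, len(paths))
	for i, p := range paths {
		targets[p] = fmt.Sprintf("/mnt/job/%d", i)
	}
	return targets
}

// remapString replaces the longest matching local path prefix of s
// with its container-side path, and returns s unchanged if none matches.
func remapString(s string, targets map[string]string) string {
	best := ""
	for local := range targets {
		if !strings.HasPrefix(s, local) || len(local) <= len(best) {
			continue
		}
		// Match whole path segments only, so "/srv/data" does not
		// rewrite "/srv/database".
		if len(s) == len(local) || s[len(local)] == '/' || strings.HasSuffix(local, "/") {
			best = local
		}
	}
	if best == "" {
		return s
	}
	return targets[best] + s[len(best):]
}

// remapValue walks a decoded configuration and rewrites every string in it.
func remapValue(v interface{}, targets map[string]string) interface{} {
	switch t := v.(type) {
	case string:
		return remapString(t, targets)
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, item := range t {
			out[i] = remapValue(item, targets)
		}
		return out
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, item := range t {
			out[k] = remapValue(item, targets)
		}
		return out
	default:
		return v // numbers, booleans and nil are left untouched
	}
}

func main() {
	targets := mountTargets([]string{"/srv/data", "/srv/out"})
	config := map[string]interface{}{
		"input": "/srv/data/readings.csv",
		"steps": []interface{}{map[string]interface{}{"dir": "/srv/out"}},
		"limit": 10,
	}
	fmt.Println(remapValue(config, targets))
}
```

Returning a rewritten copy instead of modifying the configuration in place keeps the original job message available, for example for logging or status reporting.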