Block Repository

Block Repository is a stand-alone application that lets users register and manage blocks across multiple deployments. Users can register (create) a new block, or create an updated version of an existing block (upgrade) by editing its configuration in the UI. Users can also control which version of a block is visible in each deployment.

Authentication / Authorisation

  • Users authenticate locally with email and password (Keycloak is not used)
  • Deployments are authenticated using their hostname (e.g. product-develop.s5labs.eu) and a token generated for them during their registration
  • Two roles are available in Block Repository: user and admin
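A minimal sketch of the hostname + token check for deployments, in TypeScript in keeping with the onModuleInit hook the deployments use. The source does not say how tokens are generated or stored: the 32-byte random token, keeping only a SHA-256 hash of it, the hypothetical `registerDeployment` / `authenticateDeployment` helpers, and the in-memory `Map` standing in for the database are all assumptions.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical helper: tokens are hashed so the registry never stores them in plain text (an assumption).
const sha256 = (value: string) => createHash("sha256").update(value).digest();

// Hypothetical registry record, keyed by deployment hostname.
interface DeploymentRecord {
  hostname: string;
  tokenHash: ReturnType<typeof sha256>;
}

// Registers a deployment and returns its plain token, which is shown only once.
function registerDeployment(registry: Map<string, DeploymentRecord>, hostname: string): string {
  const token = randomBytes(32).toString("hex");
  registry.set(hostname, { hostname, tokenHash: sha256(token) });
  return token;
}

// Verifies the hostname/token pair sent by a deployment.
function authenticateDeployment(
  registry: Map<string, DeploymentRecord>,
  hostname: string,
  token: string,
): boolean {
  const record = registry.get(hostname);
  if (!record) return false;
  // Both SHA-256 digests are 32 bytes long, so timingSafeEqual never sees a length mismatch.
  return timingSafeEqual(record.tokenHash, sha256(token));
}
```

Comparing digests with `timingSafeEqual` avoids leaking how many leading characters of a guessed token were correct. How the deployment transports the pair (headers or body) is not specified here.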

Product Integration

  • Each deployment retrieves blocks from the Block Repository on startup (using the onModuleInit hook), daily, and on demand (through a button in the Admin Portal)
  • On sync, blocks are stored locally in each deployment's backend, to avoid unnecessary round trips to the Block Repository
  • Pipelines (workflow entities) now store the execution image version assigned to the deployment at the time they are created
  • If a workflow uses an old execution image version, an "Upgrade" button informs the user that the workflow is out of date and offers to upgrade it to the latest version available to their deployment.

    • If blocks used in the workflow conflict with the new execution version (e.g. they have been upgraded in the new version or are no longer supported), a modal informs the user and lets them choose whether to upgrade. Upgrading is optional: if the user chooses not to upgrade, the workflow can still be executed on the old version.
    • On upgrade, blocks that are no longer supported are removed from the workflow.
    • Tasks that use a block with a new version available are upgraded to use the new version. For each parameter of the new version, we check whether it existed in the old one. If it did, the task keeps its current value; otherwise, the parameter is initialised from the block's default values (similarly to how it would be initialised if the block were added to the workflow from scratch). Any remaining misconfigurations are caught right away by the backend (BE) validation, so the user can fix them.
    • The upgrade option is not available for finalised workflows; these can be upgraded only after the user manually unlocks them.
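The task migration step of the upgrade can be sketched as follows. The `Task`, `BlockVersion` and `Catalogue` shapes and the `upgradeTasks` name are hypothetical, and the sketch assumes a task stores an entry (possibly null) for every parameter of its block version, so key presence stands in for "the parameter existed in the old version".

```typescript
// Hypothetical, simplified shapes; the real entities carry more fields.
interface BlockParameter {
  name: string;
  default?: unknown;
}

interface BlockVersion {
  id: string;
  version: number;
  parameters: BlockParameter[];
}

interface Task {
  blockId: string;
  blockVersion: number;
  // Assumed to hold an entry (possibly null) for every parameter of the task's block version.
  values: Record<string, unknown>;
}

// Latest block versions available in the new execution image version, keyed by block id.
type Catalogue = Map<string, BlockVersion>;

function upgradeTasks(tasks: Task[], catalogue: Catalogue): { tasks: Task[]; removed: Task[] } {
  const kept: Task[] = [];
  const removed: Task[] = [];
  for (const task of tasks) {
    const latest = catalogue.get(task.blockId);
    if (!latest) {
      // Block no longer supported in the new version: remove the task from the workflow.
      removed.push(task);
      continue;
    }
    if (latest.version === task.blockVersion) {
      kept.push(task); // already up to date
      continue;
    }
    const values: Record<string, unknown> = {};
    for (const param of latest.parameters) {
      const existed = Object.prototype.hasOwnProperty.call(task.values, param.name);
      // Keep the current value if the parameter existed before; otherwise fall back to the block default.
      values[param.name] = existed ? task.values[param.name] : (param.default ?? null);
    }
    kept.push({ blockId: task.blockId, blockVersion: latest.version, values });
  }
  return { tasks: kept, removed };
}
```

Parameters that exist only in the old version are dropped, and anything still misconfigured (for example a new required parameter without a default) is left for the backend validation to report, as described above.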

Upgrade Workflow

Block Registration Form

General

General information about the block, such as where and how it can be used.

  • id: Text, Should match regex /^[a-zA-Z]\w+\.[a-zA-Z]\w+$/, Maximum characters 64
  • name: Text, Maximum characters 64
  • description: Text, Minimum characters 8, Maximum characters 4096
  • executionImageType: One of options dataCheckin, analytics
  • category: One of options input, prep, ml, output, control
  • type: Depends on category, one of options

    • input category - null
    • prep category - aggregations, basic, complex, conditional-operations, datatype-transformations, datetime-transformations, math, nested-data-transformation, string-transformations, time-series (or a custom option)
    • ml category - apply, evaluate, train (or a custom option)
    • output category - null
    • control category - loop (or a custom option)
  • frameworks: One or more of options python3, spark3 (spark3 can be used only with the kubernetes platform)
  • platforms: One or more of options kubernetes, docker, edge
  • public: Boolean
  • batchable: Boolean
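The General rules above can be expressed as a single validator. A minimal sketch, assuming the form values arrive as a plain object shaped like the hypothetical `BlockGeneral` interface; the spark3 rule is read as "kubernetes must be one of the selected platforms", which is an interpretation.

```typescript
type Category = "input" | "prep" | "ml" | "output" | "control";
type Framework = "python3" | "spark3";
type Platform = "kubernetes" | "docker" | "edge";

interface BlockGeneral {
  id: string;
  name: string;
  description: string;
  executionImageType: "dataCheckin" | "analytics";
  category: Category;
  type: string | null;
  frameworks: Framework[];
  platforms: Platform[];
  public: boolean;
  batchable: boolean;
}

const ID_REGEX = /^[a-zA-Z]\w+\.[a-zA-Z]\w+$/;

// Predefined types per category (offered in the UI); null means the category has no type.
// Categories with a list also accept a custom option, so any non-empty string passes.
const TYPE_OPTIONS: Record<Category, string[] | null> = {
  input: null,
  prep: ["aggregations", "basic", "complex", "conditional-operations", "datatype-transformations",
    "datetime-transformations", "math", "nested-data-transformation", "string-transformations", "time-series"],
  ml: ["apply", "evaluate", "train"],
  output: null,
  control: ["loop"],
};

// Returns the names of the invalid fields (empty when the section is valid).
function validateGeneral(block: BlockGeneral): string[] {
  const errors: string[] = [];
  if (!ID_REGEX.test(block.id) || block.id.length > 64) errors.push("id");
  if (block.name.length > 64) errors.push("name");
  if (block.description.length < 8 || block.description.length > 4096) errors.push("description");
  const typeless = TYPE_OPTIONS[block.category] === null;
  if (typeless ? block.type !== null : !block.type) errors.push("type");
  if (block.frameworks.length === 0) errors.push("frameworks");
  if (block.platforms.length === 0) errors.push("platforms");
  // Assumed reading of the rule: spark3 requires kubernetes among the selected platforms.
  if (block.frameworks.includes("spark3") && !block.platforms.includes("kubernetes")) errors.push("frameworks");
  return errors;
}
```

Returning field names rather than throwing lets the UI highlight every invalid field at once.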

Parameters

Input parameters that configure the way the block operates.

  • name: Text, Should match regex /^[a-zA-Z]\w*$/, Maximum characters 64
  • placeholder: Text, Maximum characters 64
  • description: Text, Minimum characters 8, Maximum characters 4096
  • category: One of options value, dataframe, column, logical, complex, model, hidden, asset
  • type: Depends on category, one of options

    • value category - int, double, int || double, string, boolean, datetime, dynamic
    • dataframe category - null
    • column category - int, double, int || double, string, boolean, datetime, dynamic, same (available only if multiple is true), null
    • logical category - null
    • complex category - null
    • model category - pmdarima, statsmodel, keras-model, mllib-model, mllib-transformer, mllib-pipeline, sklearn-model, sklearn-transformer, sklearn-pipeline, null
    • hidden category - int, double, int || double, string, boolean, datetime, dynamic, same, dataset, result, schema, parquet-status, pmdarima, statsmodel, keras-model, mllib-model, mllib-transformer, mllib-pipeline, sklearn-model, sklearn-transformer, sklearn-pipeline, null
    • asset category - dataset, result, schema, parquet-status, null
  • required: Boolean
  • multiple: Boolean
  • parameters: Array of parameters (available only if category is complex or logical)
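The category/type table above maps naturally to a lookup, with nested parameters validated recursively. A sketch under the assumption that parameters are submitted in the shape of the hypothetical `ParameterDef` interface; `validateParameter` and the dotted error paths are illustrative choices.

```typescript
type ParamCategory = "value" | "dataframe" | "column" | "logical" | "complex" | "model" | "hidden" | "asset";

const PRIMITIVE_TYPES = ["int", "double", "int || double", "string", "boolean", "datetime", "dynamic"];
const MODEL_TYPES = ["pmdarima", "statsmodel", "keras-model", "mllib-model", "mllib-transformer",
  "mllib-pipeline", "sklearn-model", "sklearn-transformer", "sklearn-pipeline"];
const ASSET_TYPES = ["dataset", "result", "schema", "parquet-status"];

// Allowed types per parameter category; a null entry means "no type".
const ALLOWED_TYPES: Record<ParamCategory, (string | null)[]> = {
  value: PRIMITIVE_TYPES,
  dataframe: [null],
  column: [...PRIMITIVE_TYPES, "same", null],
  logical: [null],
  complex: [null],
  model: [...MODEL_TYPES, null],
  hidden: [...PRIMITIVE_TYPES, "same", ...ASSET_TYPES, ...MODEL_TYPES, null],
  asset: [...ASSET_TYPES, null],
};

// Hypothetical shape of a parameter as submitted by the form.
interface ParameterDef {
  name: string;
  placeholder?: string;
  description: string;
  category: ParamCategory;
  type: string | null;
  required: boolean;
  multiple: boolean;
  parameters?: ParameterDef[];
}

const NAME_REGEX = /^[a-zA-Z]\w*$/;

// Returns dotted paths of invalid fields, recursing into nested parameters.
function validateParameter(p: ParameterDef, path: string = p.name): string[] {
  const errors: string[] = [];
  if (!NAME_REGEX.test(p.name) || p.name.length > 64) errors.push(`${path}.name`);
  if ((p.placeholder ?? "").length > 64) errors.push(`${path}.placeholder`);
  if (p.description.length < 8 || p.description.length > 4096) errors.push(`${path}.description`);
  // "same" is allowed for column parameters only when multiple is true.
  const sameWithoutMultiple = p.category === "column" && p.type === "same" && !p.multiple;
  if (!ALLOWED_TYPES[p.category].includes(p.type) || sameWithoutMultiple) errors.push(`${path}.type`);
  const nested = p.parameters ?? [];
  if (nested.length > 0 && p.category !== "complex" && p.category !== "logical") errors.push(`${path}.parameters`);
  for (const child of nested) errors.push(...validateParameter(child, `${path}.${child.name}`));
  return errors;
}
```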

Validation

  • ref: Selection of one parameter with category dataframe (available only if category is column)
  • default: (available only if category is value (and type is not datetime), logical or hidden) can be:

    • Text/Number/Boolean (if multiple is false)
    • Array of Text/Number/Boolean (if multiple is true)
    • One of values (if values are defined and multiple is false)
    • Array of values (if values are defined and multiple is true)
  • requiredIf: Selection of one parameter and its values (available only if required is false and the parameter is not a boolean value, i.e. not category value with type boolean)
  • displayIf: Selection of one parameter and its values (available only if required is false)
  • values: Array of objects (available only if category is value (and type is not boolean or datetime), logical, or asset) with:

    • value: Text/Number
    • text: Text
    • on: Array of Text/Number (available only if type is dynamic)
    • description: Text
  • regex: Text (available only if category is value (and type is not boolean or datetime) or model)
  • dynamic: Selection of one parameter (available only if type is dynamic)
  • dynamicOn: One of options value, type (available only if type is dynamic)
  • range: Object (available only if category is value and type is int, double or int || double) with:

    • min: Number
    • max: Number
    • step: Number
  • length: Object (available only if multiple is true or category is value (and type is string or int) or model) with:

    • min: Number
    • max: Number
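Since each validation option depends only on the parameter's category, type, required and multiple flags, the availability conditions above can be gathered in one function that the form uses to show or hide fields. A sketch with a hypothetical `availableValidationFields` helper; the requiredIf condition is read as "not a boolean value parameter", and the per-item `on` field of values is left out.

```typescript
type ValidationField =
  | "ref" | "default" | "requiredIf" | "displayIf" | "values"
  | "regex" | "dynamic" | "dynamicOn" | "range" | "length";

// Hypothetical minimal view of a parameter: only what the availability rules need.
interface ParameterShape {
  category: string;
  type: string | null;
  required: boolean;
  multiple: boolean;
}

function availableValidationFields(p: ParameterShape): ValidationField[] {
  const isValue = p.category === "value";
  // Value parameters whose type is neither boolean nor datetime.
  const plainValue = isValue && p.type !== "boolean" && p.type !== "datetime";
  const rules: Record<ValidationField, boolean> = {
    ref: p.category === "column",
    default: (isValue && p.type !== "datetime") || p.category === "logical" || p.category === "hidden",
    requiredIf: !p.required && !(isValue && p.type === "boolean"),
    displayIf: !p.required,
    values: plainValue || p.category === "logical" || p.category === "asset",
    regex: plainValue || p.category === "model",
    dynamic: p.type === "dynamic",
    dynamicOn: p.type === "dynamic",
    range: isValue && ["int", "double", "int || double"].includes(p.type ?? ""),
    length: p.multiple || (isValue && (p.type === "string" || p.type === "int")) || p.category === "model",
  };
  // Object.keys keeps insertion order, so fields come back in form order.
  return (Object.keys(rules) as ValidationField[]).filter((field) => rules[field]);
}
```

For example, a required boolean value parameter gets only `default`, while an optional int value parameter also gets requiredIf, displayIf, values, regex, range and length.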

Validation messages

  • regex: Text
  • range: Text
  • length: Text

Output

Description of the dataframe made available as output from this block (to be used by the next one).

  • type: One of options none, input, dynamic

    • When the block category is input, the output type is set to dynamic.
    • When the block category is output, the output type is set to none.
  • input: Selection of one parameter with dataframe category (available only if type is input)
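The two fixed cases and the `input` reference can be normalised when the form is saved. A sketch with hypothetical `resolveOutput`, `BlockOutput` and `ParameterRef` names; throwing on an invalid reference is a choice made for the sketch, not documented behaviour.

```typescript
type BlockCategory = "input" | "prep" | "ml" | "output" | "control";
type OutputType = "none" | "input" | "dynamic";

interface BlockOutput {
  type: OutputType;
  input?: string; // name of a dataframe parameter; only meaningful when type is "input"
}

// Hypothetical minimal view of a block parameter.
interface ParameterRef {
  name: string;
  category: string;
}

// Normalises the Output section according to the rules above.
function resolveOutput(category: BlockCategory, requested: BlockOutput, parameters: ParameterRef[]): BlockOutput {
  if (category === "input") return { type: "dynamic" };
  if (category === "output") return { type: "none" };
  if (requested.type !== "input") return { type: requested.type };
  const source = parameters.find((p) => p.name === requested.input && p.category === "dataframe");
  if (!source) {
    throw new Error(`output.input must reference a dataframe parameter, got "${requested.input}"`);
  }
  return { type: "input", input: source.name };
}
```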

Documentation

Documentation about the block, its options, configuration, etc.

  • documentation: Markdown text

Application Permissions

Everyone can:

  • Create a new block
  • Edit a block
  • Create a new version of a block
  • Delete a draft block
  • Toggle blocks in local and development deployments

Administrators can also:

  • Publish a draft block
  • Archive a published block
  • Create a deployment
  • Change the image versions of a deployment
  • Toggle blocks in staging and production deployments
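The permission matrix above reduces to a small role check. A minimal sketch, assuming each deployment is tagged with one of four hypothetical environment values and that actions are identified by the hypothetical names below; where this check lives in the backend is not covered by the source.

```typescript
type Role = "user" | "admin";
type Environment = "local" | "development" | "staging" | "production";
type Action =
  | "createBlock" | "editBlock" | "createVersion" | "deleteDraft"
  | "publishDraft" | "archivePublished" | "createDeployment" | "changeImageVersions"
  | "toggleBlock";

// Actions open to every user, regardless of deployment.
const USER_ACTIONS: ReadonlySet<Action> = new Set<Action>(["createBlock", "editBlock", "createVersion", "deleteDraft"]);
// Environments where toggling blocks is reserved for administrators.
const ADMIN_ONLY_ENVIRONMENTS: ReadonlySet<Environment> = new Set<Environment>(["staging", "production"]);

function can(role: Role, action: Action, environment?: Environment): boolean {
  // Administrators can do everything users can, plus the admin-only actions.
  if (role === "admin") return true;
  if (action === "toggleBlock") {
    // Users may toggle blocks only in local and development deployments.
    return environment !== undefined && !ADMIN_ONLY_ENVIRONMENTS.has(environment);
  }
  return USER_ACTIONS.has(action);
}
```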

Block Editing

To enforce these permissions when editing a block, each field is assigned one of the following modifiers:

  • ALWAYS
  • DRAFT_ONLY
  • NEW_VERSION_ONLY
  • NEVER

Note 1: NEW_VERSION_ONLY and DRAFT_ONLY currently behave the same. We keep two separate modifiers in case we want to distinguish the two cases in the future.

Note 2: When the user creates a block, every field is editable (even fields with the NEVER modifier).

Note 3: Because the same form is used for both viewing and editing, the modifiers apply only when the block is in edit mode; otherwise, every field is read-only.
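The modifiers and the three notes can be combined into one editability check. A sketch assuming a hypothetical form mode (`create`, `view`, `edit`) and that a new version is edited while the block is in draft status, which is why DRAFT_ONLY and NEW_VERSION_ONLY resolve identically here.

```typescript
type Modifier = "ALWAYS" | "DRAFT_ONLY" | "NEW_VERSION_ONLY" | "NEVER";
type FormMode = "create" | "view" | "edit";
type BlockStatus = "draft" | "published" | "archived";

// How each modifier behaves in edit mode, given the block status.
const EDITABLE_IN_EDIT_MODE: Record<Modifier, (status: BlockStatus) => boolean> = {
  ALWAYS: () => true,
  DRAFT_ONLY: (status) => status === "draft",
  // Note 1: currently the same as DRAFT_ONLY (assumes a new version starts as a draft).
  NEW_VERSION_ONLY: (status) => status === "draft",
  NEVER: () => false,
};

function isEditable(modifier: Modifier, mode: FormMode, status: BlockStatus): boolean {
  if (mode === "view") return false; // Note 3: the viewing form is read-only
  if (mode === "create") return true; // Note 2: every field is editable on creation
  return EDITABLE_IN_EDIT_MODE[modifier](status);
}
```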

General

  • id: NEVER
  • name: ALWAYS (this is what the user sees in the sidebar)
  • description: ALWAYS
  • executionImageType: NEVER
  • category: NEVER
  • type: ALWAYS (unless category is ml, which requires NEW_VERSION_ONLY)
  • frameworks: ALWAYS
  • platforms: ALWAYS
  • public: NEVER
  • batchable: ALWAYS
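Most General fields have a fixed modifier, but `type` depends on the block category. One way to encode that is a map whose entries are either a modifier or a function of the category; the `GENERAL_MODIFIERS` and `generalModifier` names are hypothetical.

```typescript
type Modifier = "ALWAYS" | "DRAFT_ONLY" | "NEW_VERSION_ONLY" | "NEVER";
type BlockCategory = "input" | "prep" | "ml" | "output" | "control";
type GeneralField =
  | "id" | "name" | "description" | "executionImageType" | "category"
  | "type" | "frameworks" | "platforms" | "public" | "batchable";

// A modifier is either fixed or computed from the block category.
type ModifierRule = Modifier | ((category: BlockCategory) => Modifier);

const GENERAL_MODIFIERS: Record<GeneralField, ModifierRule> = {
  id: "NEVER",
  name: "ALWAYS",
  description: "ALWAYS",
  executionImageType: "NEVER",
  category: "NEVER",
  type: (category) => (category === "ml" ? "NEW_VERSION_ONLY" : "ALWAYS"),
  frameworks: "ALWAYS",
  platforms: "ALWAYS",
  public: "NEVER",
  batchable: "ALWAYS",
};

function generalModifier(field: GeneralField, category: BlockCategory): Modifier {
  const rule = GENERAL_MODIFIERS[field];
  return typeof rule === "function" ? rule(category) : rule;
}
```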

Output

  • type: NEVER
  • input: NEVER

Parameters

  • name: NEW_VERSION_ONLY
  • placeholder: ALWAYS
  • description: ALWAYS
  • category: NEW_VERSION_ONLY
  • type: NEW_VERSION_ONLY
  • required: NEW_VERSION_ONLY
  • multiple: NEW_VERSION_ONLY
  • validation:

    • ref: NEW_VERSION_ONLY
    • default: NEW_VERSION_ONLY
    • requiredIf: NEW_VERSION_ONLY
    • displayIf: ALWAYS
    • values:

      • add (action): NEW_VERSION_ONLY
      • remove (action): NEW_VERSION_ONLY
      • text: NEW_VERSION_ONLY
      • value: NEW_VERSION_ONLY
      • on: NEW_VERSION_ONLY
      • description: ALWAYS
    • regex: ALWAYS
    • dynamic: NEW_VERSION_ONLY
    • dynamicOn: NEW_VERSION_ONLY
    • range: ALWAYS
    • length: ALWAYS
  • validationMessages: ALWAYS

Documentation

  • documentation: ALWAYS