    Tasks

    A task is the basic data processing entity used to define "atomic" functions. A task is part of a pipeline/workflow and defines:

    • specific processing guidelines for data import, preparation, analysis and export, depending on its type
    • the way it is connected with preceding and subsequent tasks in the pipeline/workflow.

    The term "task" serves as an abstraction for the specific data processing tasks of the aforementioned four types. Each task has the following (minimum) properties:

    • id: task id
    • blockId: block id that serves as identification for the function to be executed
    • configuration: task configuration as provided in the execution configuration
    • upstreamTaskIds: list of task ids whose output is used as input in this task
    • downstreamTaskIds: list of task ids that take as input the output of this task
    • output: output dataframe of the task (as "output":{"value":[output df]})
    • execution_id: the id of the execution to which the task belongs
    • messaging_wrapper: used to publish log, status update and result messages to the corresponding queues
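    The minimum properties above can be sketched as a small data structure. This is an illustrative skeleton only: the field names come from the list above, but the types are assumptions, and the messaging wrapper is stubbed out.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Task:
    """Minimal task skeleton mirroring the properties listed above (types assumed)."""
    id: str                              # task id
    blockId: str                         # identifies the function to execute
    configuration: dict[str, Any]        # task configuration from the execution configuration
    execution_id: str                    # id of the execution this task belongs to
    upstreamTaskIds: list[str] = field(default_factory=list)   # tasks whose output feeds this task
    downstreamTaskIds: list[str] = field(default_factory=list) # tasks consuming this task's output
    output: dict[str, Any] = field(default_factory=dict)       # e.g. {"value": [output_df]}
    messaging_wrapper: Any = None        # would wrap the queue client for log/status/result messages
```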

    The task configuration should have the following structure:

    "configuration": {
            "[df]": {
                "task": ""
            },
            "[df_column]": {
                "value": "",
                "ref": "df"
            },
            "[some_parameter]": {
                "value": "string | number | boolean | list | dict"
            }
        },
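    As a concrete illustration of this structure, the snippet below fills in the template for a hypothetical task that averages one column of a dataframe. The parameter names (df, column, skip_na) and the task id are invented for the example, not taken from the task documentation.

```python
# Hypothetical configuration for a column-averaging task.
# "df" is a dataframe parameter, "column" a column parameter, "skip_na" a simple parameter.
configuration = {
    "df": {"task": "import_task_1"},            # dataframe produced by task "import_task_1"
    "column": {"value": "price", "ref": "df"},  # column "price" of the dataframe referenced by "df"
    "skip_na": {"value": True},                 # simple parameter: value wrapped under a "value" key
}
```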

    Bracketed strings inside quotes (e.g. "[x]") denote parameter names. Parameters can be of three different types:

    1. Parameters of dataframe type, usually denoting the dataframe on which the function will be applied. These parameters need to specify the task that produces this dataframe (i.e. the task whose output is the dataframe). Depending on the function, zero, one or multiple such parameters can be defined.
    2. Parameters of type column, which are column names of a specific dataframe. These parameters have a value (the column name) and a ref to the dataframe parameter from which the column comes.
    3. Simple parameters, i.e. not of type dataframe or column. Even in the simplest case, the value of these parameters must be provided under a "value" key and not directly, as shown in the structure above. For example, a parameter named p1 with the numeric value 5 would be defined in the configuration as "p1":{"value":5}, not "p1":5.
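    The three structures are mutually distinguishable by their keys, so a parameter entry can be classified mechanically. The helper below is a sketch based on the conventions described above, not the project's actual validation code.

```python
def parameter_kind(param: dict) -> str:
    """Classify a configuration parameter into one of the three types described above."""
    if "task" in param:
        return "dataframe"   # references the task that produces the dataframe
    if "ref" in param:
        return "column"      # a column name plus a ref to a dataframe parameter
    if "value" in param:
        return "simple"      # a plain value wrapped under the "value" key
    raise ValueError(f"unrecognised parameter structure: {param}")
```

    Note that the "ref" check must come before the "value" check, since column parameters carry both keys.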

    Special cases of the above configuration structure are documented in the individual task documentation, which is grouped by task type as follows:

    1. Data import documentation can be found in import docs.
    2. Data manipulation documentation can be found in manipulation docs.
    3. Data analysis (model-related tasks) documentation can be found in model docs.
    4. Data export documentation can be found in export docs.