Search Engine - Product

General

The search engine searches asset information to retrieve the assets (datasets, results, models), and pipeline information to retrieve the pipelines (analytics, data-checkin), that best match the user's query.

The search engine uses Elasticsearch (ES) to index the provided information. There are 5 indices in total: one for each asset type (datasets, results, models) and one for each pipeline type (analytics, data-checkin).

Six different sets of endpoints are offered:

Assets:

  1. /asset/<int:asset_id>: Asset CRUD endpoints, which are used to create, retrieve, update and delete assets.
  2. /assets/user/<int:user_id>: Assets delete by user id endpoint, which is used to delete all assets that belong to a specific user.
  3. /faceted-search: Asset faceted search endpoint, which is used to perform faceted search on all assets or per asset type.

Pipelines:

  1. /pipeline/<pipeline_id>: Pipeline CRUD endpoints, which are used to create, retrieve, update and delete pipelines.
  2. /pipelines/user/<int:user_id>: Pipelines delete by user id endpoint, which is used to delete all pipelines that belong to a specific user.
  3. /pipeline-search: Pipeline search endpoint, which is used to perform search per pipeline type.

Important notes:

  • The file mappings.py defines, for each asset type (dataset, result, model) and each pipeline type (analytics, data-checkin), its expected name, its ES index and mapping, the classes and schemas used for CRUD and search, etc. The asset and pipeline type names must match the corresponding type names in the backend (BE), because the search engine relies on this one-to-one mapping.
  • In this version of the search component, four things differ from one another: the information in request payloads, the information in responses, the information stored in ES, and the information actually indexed by ES. Each is governed by the appropriate schemas (jsonschema, ES mappings) and field subsets (GET_FIELDS, POST_PUT_FIELDS in the asset and pipeline utils files).
  • The 5 indices used by the search engine are automatically created with the correct settings and mapping when the service is deployed, so there is no need to create them manually.
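As an illustration of the kind of per-type registry the first note describes, the sketch below keeps one entry per asset type and per pipeline type. The TypeEntry class, the helper functions and the index names are assumptions made for this example and are not the actual contents of mappings.py; the asset type ids follow the assetTypeId values used by the faceted search.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TypeEntry:
    """One registry entry: the type name shared with the BE and its ES index."""
    name: str
    index: str  # hypothetical index name, for illustration only


# Asset types keyed by assetTypeId (1: datasets, 2: results, 3: models).
ASSET_TYPES: Dict[int, TypeEntry] = {
    1: TypeEntry("dataset", "datasets"),
    2: TypeEntry("result", "results"),
    3: TypeEntry("model", "models"),
}

# Pipeline types keyed by pipelineType.
PIPELINE_TYPES: Dict[str, TypeEntry] = {
    "analytics": TypeEntry("analytics", "analytics"),
    "data-checkin": TypeEntry("data-checkin", "data-checkin"),
}


def asset_indices(asset_type_id: Optional[int] = None) -> List[str]:
    """Return the index of one asset type, or all asset indices if no type is given."""
    if asset_type_id is None:
        return [entry.index for entry in ASSET_TYPES.values()]
    if asset_type_id not in ASSET_TYPES:
        raise ValueError(f"Unknown assetTypeId: {asset_type_id}")
    return [ASSET_TYPES[asset_type_id].index]


def pipeline_index(pipeline_type: str) -> str:
    """Return the index of one pipeline type."""
    if pipeline_type not in PIPELINE_TYPES:
        raise ValueError(f"Unknown pipelineType: {pipeline_type}")
    return PIPELINE_TYPES[pipeline_type].index
```

Keeping both lookups in one place mirrors the one-to-one mapping with the BE type names: an unknown type fails early instead of silently hitting the wrong index.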

Asset endpoints

General

Notes

  • All asset endpoints except the delete-by-user-id endpoint require the asset id, including the one used to create the asset. By convention, the document id in the Elasticsearch indices is the asset id in the BE database, so the id is also provided when the asset is created in ES.
  • All asset endpoints handle all three types of assets (datasets, results, models). The assetTypeId, which is passed in every request, tells the search component how to handle the request for each asset type.
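The two notes above can be sketched as a small client-side helper that builds the method, path and body of a single-asset request. The function name asset_request and its return shape are assumptions for illustration; only the paths and the assetTypeId parameter come from this document.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlencode


def asset_request(
    method: str,
    asset_id: int,
    asset_type_id: int,
    document: Optional[Dict[str, Any]] = None,
) -> Tuple[str, str, Optional[Dict[str, Any]]]:
    """Return (method, path, body) for a single-asset request."""
    method = method.upper()
    path = f"/asset/{asset_id}"
    if method in ("GET", "DELETE"):
        # Retrieval and deletion have no body; the asset type goes in the query string.
        return method, f"{path}?{urlencode({'assetTypeId': asset_type_id})}", None
    if method in ("POST", "PUT"):
        # Creation and update send the whole document; id and type travel in the body.
        body = dict(document or {})
        body["id"] = asset_id
        body["assetTypeId"] = asset_type_id
        return method, path, body
    raise ValueError(f"Unsupported method: {method}")
```

The asset id always appears in the path, even for POST, which reflects the convention that the ES document id equals the BE asset id.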

Retrieve asset

GET /asset/[asset id]?assetTypeId=[asset type id]

Request Sample

GET /asset/12?assetTypeId=1

No body

Response Sample

{
  "id": 12,
  "name": "Small Office Giorgos Monitor Plug",
  "description": "Consumption and state of the plug with a timestamp from SmartThings for the status",
  "assetTypeId": 1,
  "standard": null,
  "version": "v1",
  "status": "available",
  "accessLevel": "Public",
  "createdAt": "2021-08-01T06:48:57.450Z",
  "updatedAt": "2021-09-01T11:00:32.896Z",
  "availableAt": "2021-08-31T06:51:15.983Z",
  "modifiedAt": "2021-09-01T11:00:32.892Z",
  "volume": {
    "value": 2433,
    "unit": "records"
  },
  "metadata": {
    "general": {
      "tags": [
        "plug"
      ]
    },
    "distribution": {
      "format": [
        "JSON"
      ],
      "accessibility": [
        "Through an API"
      ],
      "accrualMethod": "Through an API",
      "accrualPeriodicity": "Hourly"
    },
    "extent": {
      "temporalCoverage": {
        "unit": "Not applicable",
        "min": null,
        "field": null,
        "max": null,
        "type": null,
        "value": null
      },
      "temporalResolution": {
        "unit": "Per Hour",
        "value": 4
      },
      "spatialCoverage": {
        "unit": "Not applicable",
        "field": null,
        "values": [],
        "coordinates": null,
        "type": null,
        "value": null
      },
      "spatialResolution": {
        "unit": "Per Building"
      }
    },
    "license": {
      "license": "CDLA-Sharing 1.0",
      "copyrightOwner": "Giorgos",
      "link": "https://cdla.io/sharing-1-0/"
    },
    "provenance": {
      "id": 1,
      "name": "Test",
      "type": "data-checkin"
    }
  },
  "structure": {
    "domain": {
      "uid": "42a33329-34d6-4cd9-9910-c5698ddd8fb2",
      "majorVersion": 1,
      "name": "Energy"
    },
    "primaryConcept": {
      "uid": "32aa2a47-5517-4411-8cb6-dcbb1bede4a2",
      "name": "EnergyDemandMeasurements"
    },
    "otherConcepts": [
      {
        "uid": "be0d3e5f-c3c3-4ee2-bbbf-65f98f866ffc",
        "name": "SmartAppliance"
      },
      {
        "uid": "6db9dd27-07ab-46b9-838d-44cd24fae35d",
        "name": "SmartApplianceControlAction"
      }
    ]
  },
  "schema": {
    "EnergyDemandMeasurements": {
      "totalEnergyConsumption": {
        "min": 41.76,
        "max": 45.2,
        "_uid": "65daf105-12e1-414e-8b5f-dbd97a546323"
      },
      "relatedSmartAppliance": {
        "relatedSmartApplianceControlAction": {
          "powerSwitch": {
            "values": [
              "on"
            ],
            "_uid": "55b46029-a599-41ff-8616-37dee6729e21"
          },
          "updatedDateTime": {
            "min": "2020-11-05T06:12:50.000Z",
            "max": "2022-11-05T06:12:50.000Z",
            "_uid": "5e85937e-c287-4df5-90e0-1f6a9cae6494"
          }
        }
      },
      "smartApplianceLoad": {
        "min": 0.0026,
        "max": 0.0503,
        "_uid": "880212eb-fb65-412c-9c24-1369377a57fe"
      }
    }
  },
  "createdBy": {
    "id": 3,
    "firstName": "A",
    "lastName": "Tester",
    "organisationId": 3,
    "email": "tester@suite5.eu"
  },
  "organisation": {
    "id": 3,
    "legalName": "TSG ",
    "businessName": "Testing Solutions Group",
    "description": "TSG provides...",
    "type": "DUMMY"
  }
}

Create asset

POST /asset/[asset id]

Request Sample

POST /asset/12

Body Sample

{
  "id": 12,
  "name": "Small Office Giorgos Monitor Plug",
  "description": "Consumption and state of the plug with a timestamp from SmartThings for the status",
  "assetTypeId": 1,
  "standard": null,
  "version": "v1",
  "status": "available",
  "accessLevel": "Public",
  "createdAt": "2021-08-01T06:48:57.450Z",
  "updatedAt": "2021-09-01T11:00:32.896Z",
  "availableAt": "2021-08-31T06:51:15.983Z",
  "modifiedAt": "2021-09-01T11:00:32.892Z",
  "volume": {
    "value": 2433,
    "unit": "records"
  },
  "metadata": {
    "distribution": {
      "format": [
        "JSON"
      ],
      "accessibility": [
        "Through an API"
      ],
      "accrualMethod": "Through an API",
      "accrualPeriodicity": "Hourly"
    },
    "extent": {
      "temporalCoverage": {
        "unit": "Not applicable",
        "min": null,
        "field": null,
        "max": null,
        "type": null,
        "value": null
      },
      "temporalResolution": {
        "unit": "Per Hour",
        "value": 4
      },
      "spatialCoverage": {
        "unit": "Not applicable",
        "field": null,
        "values": [],
        "coordinates": null,
        "type": null,
        "value": null
      },
      "spatialResolution": {
        "unit": "Per Building"
      }
    },
    "general": {
      "tags": [
        "plug"
      ]
    },
    "license": {
      "license": "CDLA-Sharing 1.0",
      "copyrightOwner": "Giorgos",
      "link": "https://cdla.io/sharing-1-0/"
    },
    "provenance": {
      "id": 1,
      "name": "Test",
      "type": "data-checkin"
    }
  },
  "structure": {
    "domain": {
      "uid": "42a33329-34d6-4cd9-9910-c5698ddd8fb2",
      "majorVersion": 1,
      "name": "Energy"
    },
    "primaryConcept": {
      "uid": "32aa2a47-5517-4411-8cb6-dcbb1bede4a2",
      "name": "EnergyDemandMeasurements"
    },
    "otherConcepts": [
      {
        "uid": "be0d3e5f-c3c3-4ee2-bbbf-65f98f866ffc",
        "name": "SmartAppliance"
      },
      {
        "uid": "6db9dd27-07ab-46b9-838d-44cd24fae35d",
        "name": "SmartApplianceControlAction"
      }
    ]
  },
  "schema": {
    "EnergyDemandMeasurements": {
      "totalEnergyConsumption": {
        "min": 41.76,
        "max": 45.2,
        "_uid": "65daf105-12e1-414e-8b5f-dbd97a546323"
      },
      "relatedSmartAppliance": {
        "relatedSmartApplianceControlAction": {
          "powerSwitch": {
            "values": [
              "on"
            ],
            "_uid": "55b46029-a599-41ff-8616-37dee6729e21"
          },
          "updatedDateTime": {
            "min": "2020-11-05T06:12:50.000Z",
            "max": "2020-11-05T06:12:50.000Z",
            "_uid": "5e85937e-c287-4df5-90e0-1f6a9cae6494"
          }
        }
      },
      "smartApplianceLoad": {
        "min": 0.0026,
        "max": 0.0503,
        "_uid": "880212eb-fb65-412c-9c24-1369377a57fe"
      }
    }
  },
  "createdBy": {
    "id": 3,
    "firstName": "A",
    "lastName": "Tester",
    "organisationId": 3,
    "email": "test@suite5.eu"
  },
  "organisation": {
    "id": 3,
    "legalName": "TSG ",
    "businessName": "Testing Solutions Group",
    "description": "We make success happen, safely and predictably.",
    "type": "Aggregator"
  }
}

Response Sample

{
  "result": "created"
}

If the asset already exists, no error will be returned; it will be updated instead. The endpoint will return 200 and:

{
  "result": "updated"
}

and a warning will be logged that the creation endpoint was used to update an asset.

Update asset

PUT /asset/[asset id]

This endpoint is used to update an asset.

Request Sample

PUT /asset/12

Body Sample: same as in POST. Because the BE sends the whole document rather than only the fields that need updating, the restrictions that apply to POST also apply here. This decision can be revisited if needed.

Response Sample

{
  "result": "updated"
}

or when no update was performed because the exact same information was sent:

{
  "result": "noop"
}
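The create and update results described so far ("created", "updated", "noop") can be illustrated with an in-memory stand-in for one ES index. This is a minimal sketch of the documented behaviour, not the service's code: the InMemoryIndex class and the logger name are hypothetical, and raising KeyError when updating a missing asset is an assumption, since this document does not specify that case.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("search-engine-sketch")  # hypothetical logger name


class InMemoryIndex:
    """Stand-in for one ES index; documents are keyed by the BE asset id."""

    def __init__(self) -> None:
        self._docs: Dict[int, Dict[str, Any]] = {}

    def create(self, asset_id: int, document: Dict[str, Any]) -> Dict[str, str]:
        if asset_id in self._docs:
            # Creating an existing asset is not an error: it is updated instead,
            # and a warning is logged.
            logger.warning("Creation endpoint used to update asset %s", asset_id)
            self._docs[asset_id] = dict(document)
            return {"result": "updated"}
        self._docs[asset_id] = dict(document)
        return {"result": "created"}

    def update(self, asset_id: int, document: Dict[str, Any]) -> Dict[str, str]:
        if asset_id not in self._docs:
            # Assumption: the behaviour for a missing asset is not documented.
            raise KeyError(asset_id)
        if self._docs[asset_id] == document:
            # The exact same information was sent, so nothing changes.
            return {"result": "noop"}
        self._docs[asset_id] = dict(document)
        return {"result": "updated"}
```

Because the BE always sends the whole document, comparing the stored and incoming documents is enough to detect the "noop" case in this sketch.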

Delete asset

DELETE /asset/[asset id]?assetTypeId=[asset type id]

Request Sample

DELETE /asset/12?assetTypeId=1

No body

Response Sample

{
  "result": "deleted"
}

Delete all assets created by a user

DELETE /assets/user/[user id]

Request Sample

DELETE /assets/user/1

No body

Response Sample

{
  "result": "deleted"
}

or if for some reason an asset could not be deleted:

{
  "result": "failed"
}
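One plausible way to implement this endpoint on top of Elasticsearch is a delete-by-query per asset index, collapsing the per-index responses into a single result. The sketch below is an assumption about the approach, not a description of the service: the createdBy.id field path is taken from the sample documents, and both function names are hypothetical.

```python
from typing import Any, Dict, Iterable


def delete_by_user_body(user_id: int) -> Dict[str, Any]:
    """Delete-by-query body matching every document created by the given user."""
    return {"query": {"term": {"createdBy.id": user_id}}}


def summarise(responses: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Collapse per-index delete-by-query responses into the endpoint's result."""
    for response in responses:
        # A delete-by-query response lists the documents it could not delete
        # under "failures"; any entry there makes the whole operation "failed".
        if response.get("failures"):
            return {"result": "failed"}
    return {"result": "deleted"}
```

With this shape, a user who owns no assets still yields "deleted", because no index reports a failure.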

Pipeline endpoints

General

Notes

  • All pipeline endpoints except the delete-by-user-id endpoint require the pipeline id, including the one used to create the pipeline. By convention, the document id in the Elasticsearch indices is the pipeline id in the BE database, so the id is also provided when the pipeline is created in ES.
  • All pipeline endpoints handle both types of pipelines (analytics, data-checkin). The pipelineType, which is passed in every request, tells the search component how to handle the request for each pipeline type.
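A caller might validate the pipeline id and type before building the path for a retrieve or delete request, as in the sketch below. It assumes that pipeline ids are UUID strings, which is what the samples in this document show but is not stated as a rule; the function name pipeline_path is hypothetical.

```python
import uuid
from urllib.parse import urlencode

PIPELINE_TYPES = ("analytics", "data-checkin")


def pipeline_path(pipeline_id: str, pipeline_type: str) -> str:
    """Path for retrieving or deleting one pipeline."""
    if pipeline_type not in PIPELINE_TYPES:
        raise ValueError(f"Unknown pipelineType: {pipeline_type}")
    # uuid.UUID raises ValueError if the id is not a well-formed UUID
    # (assumption: pipeline ids are UUIDs, as in the samples).
    canonical_id = str(uuid.UUID(pipeline_id))
    return f"/pipeline/{canonical_id}?{urlencode({'pipelineType': pipeline_type})}"
```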

Retrieve pipeline

GET /pipeline/[pipeline id]?pipelineType=[pipeline type]

Request Sample

GET /pipeline/148c9b52-ec82-468b-b5cf-b16654f1d963?pipelineType=analytics

No body

Response Sample

{
  "id": "148c9b52-ec82-468b-b5cf-b16654f1d963",
  "name": "Test Workflow",
  "description": "Test Workflow description",
  "framework": "python3",
  "executionStatus": "pending",
  "accessLevel": "Public",
  "provenanceAssetIds": [],
  "provenanceAssets": [],
  "inputAssetIds": [],
  "inputAssets": [],
  "blocks": [
    {
      "category": "ml",
      "type": "train"
    }
  ],
  "visualisations": [],
  "schedules": [],
  "executionLocation": "cloud",
  "createdAt": "2021-08-01T06:48:57.450Z",
  "updatedAt": "2021-09-01T11:00:32.896Z",
  "createdBy": {
    "id": 3,
    "firstName": "A",
    "lastName": "Tester"
  },
  "organisation": {
    "id": 3
  }
}

Create pipeline

POST /pipeline/[pipeline id]

Request Sample

POST /pipeline/148c9b52-ec82-468b-b5cf-b16654f1d963

Body Sample

{
  "pipelineType": "analytics",
  "id": "148c9b52-ec82-468b-b5cf-b16654f1d963",
  "name": "Test Workflow",
  "description": "Test Workflow description",
  "framework": "python3",
  "executionStatus": "pending",
  "accessLevel": "Public",
  "provenanceAssetIds": [],
  "provenanceAssets": [],
  "inputAssetIds": [],
  "inputAssets": [],
  "blocks": [
    {
      "category": "ml",
      "type": "train"
    }
  ],
  "visualisations": [],
  "schedules": [],
  "executionLocation": "cloud",
  "createdAt": "2021-08-01T06:48:57.450Z",
  "updatedAt": "2021-09-01T11:00:32.896Z",
  "createdBy": {
    "id": 3,
    "firstName": "A",
    "lastName": "Tester"
  },
  "organisation": {
    "id": 3
  }
}

Response Sample

{
  "result": "created"
}

If the pipeline already exists, no error will be returned; it will be updated instead. The endpoint will return 200 and:

{
  "result": "updated"
}

and a warning will be logged that the creation endpoint was used to update a pipeline.

Update pipeline

PUT /pipeline/[pipeline id]

This endpoint is used to update a pipeline.

Request Sample

PUT /pipeline/148c9b52-ec82-468b-b5cf-b16654f1d963

Body Sample: same as in POST. Because the BE sends the whole document rather than only the fields that need updating, the restrictions that apply to POST also apply here. This decision can be revisited if needed.

Response Sample

{
  "result": "updated"
}

or when no update was performed because the exact same information was sent:

{
  "result": "noop"
}

Delete pipeline

DELETE /pipeline/[pipeline id]?pipelineType=[pipeline type]

Request Sample

DELETE /pipeline/148c9b52-ec82-468b-b5cf-b16654f1d963?pipelineType=analytics

No body

Response Sample

{
  "result": "deleted"
}

Delete all pipelines created by a user

DELETE /pipelines/user/[user id]?pipelineType=[pipeline type]

Request Sample

DELETE /pipelines/user/1?pipelineType=analytics

No body

Response Sample

{
  "result": "deleted"
}

or if for some reason a pipeline could not be deleted:

{
  "result": "failed"
}

Asset Faceted Search endpoints

General

There is a single search endpoint that provides faceted search across (a) all assets, (b) datasets only, (c) results only, (d) models only. The field assetTypeId in the request payload (its existence and value) is what defines which of the above searches is of interest, as follows:

  • assetTypeId:1 --> datasets
  • assetTypeId:2 --> results
  • assetTypeId:3 --> models
  • assetTypeId missing from payload --> all assets

For each of the above searches, we can adjust the following:

  1. Supported facets. All four cases above already have different defined facets which can be found in the respective files in the faceted_search folder. In brief:

    • All: status, assetType
    • Datasets: status, category, accessibility, accrualPeriodicity, accrualMethod, temporalResolution, spatialCoverage
    • Results: status
    • Models: status, source, library, type, purpose, algorithm
  2. Supported sort_by options. Currently common across all:

    • relevance (based on free text matching)
    • date created (asc, desc)
    • date updated (asc, desc)
    • title, which corresponds to the name field in ES (asc, desc)
    • status (asc, desc)
  3. Supported filtering options. Currently there are some common options across all:

    • access level
    • user
    • related pipeline (the data-checkin/analytics pipeline id for which you want to find the relevant assets)
    • asset origin (the asset id whose origin you want to find, i.e. from which assets it derives)

    and some defined per asset type:

    • Datasets:

      • temporalCoverage (min, max)
    • All, Results, Models:

      • derivative assets (the asset id for which you want to find its derivative assets)
  4. Fields on which the free-text search is applied. Currently common across all:

    • title (name)
    • description
    • tags (metadata.general.tags)
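The facets supported per search scope, as listed above, can be captured in a lookup table and used to check a request before it is sent. This is a sketch for illustration: the SUPPORTED_FACETS table mirrors the list in this section, while the unsupported_facets helper is hypothetical and not part of the faceted_search folder.

```python
from typing import Any, Dict, List, Optional

# Facets per search scope; None stands for "assetTypeId missing" (all assets).
SUPPORTED_FACETS: Dict[Optional[int], List[str]] = {
    None: ["status", "assetType"],
    1: ["status", "category", "accessibility", "accrualPeriodicity",
        "accrualMethod", "temporalResolution", "spatialCoverage"],
    2: ["status"],
    3: ["status", "source", "library", "type", "purpose", "algorithm"],
}


def unsupported_facets(payload: Dict[str, Any]) -> List[str]:
    """Return the facet names in the payload that the chosen scope does not support."""
    scope = payload.get("assetTypeId")
    if scope not in SUPPORTED_FACETS:
        raise ValueError(f"Unknown assetTypeId: {scope}")
    allowed = set(SUPPORTED_FACETS[scope])
    return sorted(name for name in payload.get("facets", {}) if name not in allowed)
```

For example, a payload without assetTypeId that asks for the library facet would be flagged, because library is only defined for models.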

Additional settings:

  1. Free-text search has two options: case-sensitive matching and partial matching. Partial matching uses n-grams; its performance (and the min and max values of n) should be checked once many assets are indexed.
  2. Pagination: Each search request should define the page number and page size, which are used to apply pagination through ES. Performance should be checked here as well once we have enough assets.
  3. The * character in the free text search field is used to denote that any document should match, i.e. the user is not searching for any particular terms.
  4. The settings.organisationId field is used to ensure that only assets that belong to the user's organisation are returned.
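To make the settings above concrete, the sketch below translates a faceted-search payload into an Elasticsearch request body. It is an assumption about how such a translation could look, not the service's implementation: it treats the page number as 1-based (as in the request sample), uses the organisation.id field path from the sample documents, and deliberately ignores the caseSensitive and partialMatch options, facets, filters and sorting.

```python
from typing import Any, Dict

# Fields searched by the free-text query: title (name), description and tags.
SEARCH_FIELDS = ["name", "description", "metadata.general.tags"]


def build_es_body(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build a simplified ES search body from a faceted-search payload."""
    text = payload["query"]["text"]
    if text == "*":
        # "*" means the user is not searching for particular terms.
        text_clause: Dict[str, Any] = {"match_all": {}}
    else:
        text_clause = {"multi_match": {"query": text, "fields": SEARCH_FIELDS}}

    page = payload["pagination"]["page"]
    size = payload["pagination"]["size"]
    organisation_id = payload["settings"]["organisationId"]
    return {
        "query": {
            "bool": {
                "must": [text_clause],
                # Only assets of the user's organisation are returned.
                "filter": [{"term": {"organisation.id": organisation_id}}],
            }
        },
        "from": (page - 1) * size,  # assumes page numbering starts at 1
        "size": size,
    }
```

Note that from/size pagination gets more expensive as the offset grows, which is one reason the performance check mentioned above is worth doing once many assets are indexed.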

Perform search

POST /faceted-search

Request sample

{
  "assetTypeId": 1,
  "query": {
    "text": "*",
    "settings": {
      "caseSensitive": false,
      "partialMatch": false
    }
  },
  "facets": {
    "category": [
      "32aa2a47-5517-4411-8cb6-dcbb1bede4a2"
    ],
    "accrualPeriodicity": [],
    "temporalResolution": [],
    "spatialCoverage": []
  },
  "filters": {
    "accessLevel": [
      "Public"
    ],
    "temporalCoverage": {
      "min": null,
      "max": null
    }
  },
  "sortBy": {
    "field": "title",
    "asc": true
  },
  "pagination": {
    "size": 10,
    "page": 1
  },
  "settings": {
    "organisationId": 3
  }
}

Response sample

Every response has 3 high-level fields: results, facets, total.

{
  "results": [
    {
      "id": 1,
      "name": "Small Office Giorgos Monitor Plug",
      "description": "Consumption and state of the plug with a timestamp from SmartThings for the status",
      "assetTypeId": 1,
      "volume": {
        "value": 2433,
        "unit": "records"
      },
      "status": "available",
      "accessLevel": "Public",
      "createdAt": "2021-08-01T06:48:57.450Z",
      "updatedAt": "2021-09-01T11:00:32.896Z",
      "modifiedAt": "2021-09-01T11:00:32.892Z",
      "concepts": [
        {
          "uid": "32aa2a47-5517-4411-8cb6-dcbb1bede4a2",
          "name": "EnergyDemandMeasurements"
        },
        {
          "uid": "be0d3e5f-c3c3-4ee2-bbbf-65f98f866ffc",
          "name": "SmartAppliance"
        },
        {
          "uid": "6db9dd27-07ab-46b9-838d-44cd24fae35d",
          "name": "SmartApplianceControlAction"
        }
      ],
      "metadata": {
        "general": {
          "tags": [
            "plug"
          ]
        },
        "distribution": {
          "accessibility": [
            "Through an API"
          ]
        },
        "license": {
          "copyrightOwner": "Giorgos",
          "license": "CDLA-Sharing 1.0"
        },
        "extent": {
          "temporalCoverage": {
            "unit": "Not applicable",
            "min": null,
            "field": null,
            "max": null,
            "type": null,
            "value": null
          },
          "spatialCoverage": {
            "unit": "Not applicable",
            "field": null,
            "values": [],
            "coordinates": null,
            "type": null,
            "value": null
          }
        },
        "provenance": {
          "id": 1,
          "name": "Test",
          "type": "data-checkin"
        }
      },
      "schema_fields": [
        "EnergyDemandMeasurements.relatedSmartAppliance.relatedSmartApplianceControlAction.updatedDateTime",
        "EnergyDemandMeasurements.relatedSmartAppliance.relatedSmartApplianceControlAction.powerSwitch",
        "EnergyDemandMeasurements.totalEnergyConsumption",
        "EnergyDemandMeasurements.smartApplianceLoad"
      ],
      "createdBy": {
        "id": 3,
        "firstName": "Georgios",
        "lastName": "Papadopoulos",
        "organisationId": 3,
        "email": "giorgos@suite5.eu"
      },
      "organisation": {
        "id": 3,
        "legalName": "TSG ",
        "businessName": "Testing Solutions Group",
        "description": "TSG provides confidence to organisations that develop innovative systems. We do this by providing assurance and testing services. We make success happen, safely and predictably.",
        "type": "DUMMY"
      }
    },
    {
      "id": 11,
      "name": "Small Office Giorgos Monitor Plug",
      "description": "Consumption and state of the plug with a timestamp from SmartThings for the status",
      "assetTypeId": 1,
      "volume": {
        "value": 2433,
        "unit": "records"
      },
      "status": "available",
      "accessLevel": "Public",
      "createdAt": "2021-08-01T06:48:57.450Z",
      "updatedAt": "2021-09-01T11:00:32.896Z",
      "modifiedAt": "2021-09-01T11:00:32.892Z",
      "concepts": [
        {
          "uid": "32aa2a47-5517-4411-8cb6-dcbb1bede4a2",
          "name": "EnergyDemandMeasurements"
        },
        {
          "uid": "be0d3e5f-c3c3-4ee2-bbbf-65f98f866ffc",
          "name": "SmartAppliance"
        },
        {
          "uid": "6db9dd27-07ab-46b9-838d-44cd24fae35d",
          "name": "SmartApplianceControlAction"
        }
      ],
      "metadata": {
        "general": {
          "tags": [
            "plug"
          ]
        },
        "distribution": {
          "accessibility": [
            "Through an API"
          ]
        },
        "license": {
          "copyrightOwner": "Giorgos",
          "license": "CDLA-Sharing 1.0"
        },
        "extent": {
          "temporalCoverage": {
            "unit": "Not applicable",
            "min": null,
            "field": null,
            "max": null,
            "type": null,
            "value": null
          },
          "spatialCoverage": {
            "unit": "Not applicable",
            "field": null,
            "values": [],
            "coordinates": null,
            "type": null,
            "value": null
          }
        },
        "provenance": {
          "id": 2,
          "name": "Test",
          "type": "data-checkin"
        }
      },
      "schema_fields": [
        "EnergyDemandMeasurements.relatedSmartAppliance.relatedSmartApplianceControlAction.updatedDateTime",
        "EnergyDemandMeasurements.smartApplianceLoad",
        "EnergyDemandMeasurements.totalEnergyConsumption",
        "EnergyDemandMeasurements.relatedSmartAppliance.relatedSmartApplianceControlAction.powerSwitch"
      ],
      "createdBy": {
        "id": 3,
        "firstName": "Georgios",
        "lastName": "Papadopoulos",
        "organisationId": 3,
        "email": "giorgos@suite5.eu"
      },
      "organisation": {
        "id": 3,
        "legalName": "TSG ",
        "businessName": "Testing Solutions Group",
        "description": "TSG provides confidence to organisations that develop innovative systems. We do this by providing assurance and testing services. We make success happen, safely and predictably.",
        "type": "Aggregator"
      }
    },
    {
      "id": 13,
      "name": "Small Office Giorgos Monitor Plug",
      "description": "Consumption and state of the plug with a timestamp from SmartThings for the status",
      "assetTypeId": 1,
      "volume": {
        "value": 2433,
        "unit": "records"
      },
      "status": "available",
      "accessLevel": "Public",
      "createdAt": "2021-08-01T06:48:57.450Z",
      "updatedAt": "2021-09-01T11:00:32.896Z",
      "modifiedAt": "2021-09-01T11:00:32.892Z",
      "concepts": [
        {
          "uid": "32aa2a47-5517-4411-8cb6-dcbb1bede4a2",
          "name": "EnergyDemandMeasurements"
        },
        {
          "uid": "be0d3e5f-c3c3-4ee2-bbbf-65f98f866ffc",
          "name": "SmartAppliance"
        },
        {
          "uid": "6db9dd27-07ab-46b9-838d-44cd24fae35d",
          "name": "SmartApplianceControlAction"
        }
      ],
      "metadata": {
        "general": {
          "tags": [
            "plug"
          ]
        },
        "distribution": {
          "accessibility": [
            "Through an API"
          ]
        },
        "license": {
          "copyrightOwner": "Giorgos",
          "license": "CDLA-Sharing 1.0"
        },
        "extent": {
          "temporalCoverage": {
            "unit": "Not applicable",
            "min": null,
            "field": null,
            "max": null,
            "type": null,
            "value": null
          },
          "spatialCoverage": {
            "unit": "Not applicable",
            "field": null,
            "values": [],
            "coordinates": null,
            "type": null,
            "value": null
          }
        },
        "provenance": {
          "id": 3,
          "name": "Test",
          "type": "data-checkin"
        }
      },
      "schema_fields": [
        "EnergyDemandMeasurements.relatedSmartAppliance.relatedSmartApplianceControlAction.updatedDateTime",
        "EnergyDemandMeasurements.smartApplianceLoad",
        "EnergyDemandMeasurements.totalEnergyConsumption",
        "EnergyDemandMeasurements.relatedSmartAppliance.relatedSmartApplianceControlAction.powerSwitch"
      ],
      "createdBy": {
        "id": 3,
        "firstName": "Georgios",
        "lastName": "Papadopoulos",
        "organisationId": 3,
        "email": "giorgos@suite5.eu"
      },
      "organisation": {
        "id": 3,
        "legalName": "TSG ",
        "businessName": "Testing Solutions Group",
        "description": "TSG provides confidence to organisations that develop innovative systems. We do this by providing assurance and testing services. We make success happen, safely and predictably.",
        "type": "Aggregator"
      }
    }
  ],
  "facets": {
    "status": [
      {
        "value": "available",
        "count": 3,
        "selected": false
      }
    ],
    "category": [
      {
        "value": "32aa2a47-5517-4411-8cb6-dcbb1bede4a2",
        "count": 3,
        "selected": true
      },
      {
        "value": "6db9dd27-07ab-46b9-838d-44cd24fae35d",
        "count": 3,
        "selected": false
      },
      {
        "value": "be0d3e5f-c3c3-4ee2-bbbf-65f98f866ffc",
        "count": 3,
        "selected": false
      }
    ],
    "accessibility": [
      {
        "value": "Through an API",
        "count": 3,
        "selected": false
      }
    ],
    "accrualPeriodicity": [
      {
        "value": "Hourly",
        "count": 3,
        "selected": false
      }
    ],
    "accrualMethod": [
      {
        "value": "Through an API",
        "count": 3,
        "selected": false
      }
    ],
    "temporalResolution": [
      {
        "value": "Per Hour",
        "count": 3,
        "selected": false
      }
    ],
    "spatialCoverage": []
  },
  "total": 3
}
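A client consuming this response might want to know which facet values are currently selected and how many pages the result set spans. The helpers below are a hypothetical sketch built only on the three high-level fields (results, facets, total) and the value/count/selected shape of each facet entry shown in the sample.

```python
import math
from typing import Any, Dict, List


def selected_facets(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each facet that has selected values to the list of those values."""
    selected: Dict[str, List[str]] = {}
    for name, entries in response["facets"].items():
        values = [entry["value"] for entry in entries if entry["selected"]]
        if values:
            selected[name] = values
    return selected


def page_count(response: Dict[str, Any], page_size: int) -> int:
    """Number of pages needed to show all matches at the given page size."""
    return math.ceil(response["total"] / page_size)
```

Applied to the sample response above, selected_facets would report only the category facet, since that is the only entry with "selected": true.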

Pipeline Search endpoints

General

There is a single search endpoint that provides pipeline search across (a) analytics pipelines only, (b) data-checkin pipelines only. The field pipelineType in the request payload (its value) is what defines which of the above searches is of interest, as follows:

  • pipelineType:analytics --> analytics pipelines
  • pipelineType:data-checkin --> data-checkin pipelines

For each of the above searches, we can adjust the following:

  1. Supported sort_by options. Currently common across all:

    • relevance (based on free text matching)
    • date created (asc, desc)
    • date updated (asc, desc)
    • title, which corresponds to the name field in ES (asc, desc)
    • execution status (asc, desc)
  2. Supported filtering options. Currently there are some common options across all:

    • execution status
    • execution location (cloud, on premise)
    • date updated
    • user

    and some defined per pipeline type:

    • Analytics:

      • configuration including (visualisation, schedule)
      • block category
      • framework (spark, python)
    • Data-checkin:

      • date created
      • step (mapping, cleaning, encryption)
      • harvesting option (file, data provider API, platform API)
      • scheduling option (active, expired, future, no schedule)
      • schedule horizon (date range)
  3. Fields on which the free-text search is applied. Currently common across all:

    • title (name)
    • description

Additional settings:

  1. In free text search there are two options: case sensitive matching and partial matching.
  2. Pagination: Each search request should define the page number and page size, which are used to apply pagination through ES.
  3. The * character in the free text search field is used to denote that any document should match, i.e. the user is not searching for any particular terms.
  4. The settings.organisationId and settings.userId fields restrict which pipelines are returned. If only settings.userId is defined, only the user's own pipelines are returned. If both settings.organisationId and settings.userId are defined, the pipelines that belong to the user's organisation but not to the user are returned.
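The ownership rule in the last setting can be expressed as an Elasticsearch bool clause, as in the sketch below. The field paths createdBy.id and organisation.id are taken from the sample documents; the function name ownership_filter is hypothetical, and rejecting a payload without settings.userId is an assumption, because this document does not describe that case.

```python
from typing import Any, Dict


def ownership_filter(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Build the bool clause that restricts pipelines by user and organisation."""
    user_id = settings.get("userId")
    organisation_id = settings.get("organisationId")
    if user_id is None:
        # Assumption: a request without userId is not covered by the documentation.
        raise ValueError("settings.userId is required")
    if organisation_id is None:
        # Only userId defined: return the user's own pipelines.
        return {"bool": {"filter": [{"term": {"createdBy.id": user_id}}]}}
    # Both defined: pipelines of the user's organisation that are not the user's.
    return {
        "bool": {
            "filter": [{"term": {"organisation.id": organisation_id}}],
            "must_not": [{"term": {"createdBy.id": user_id}}],
        }
    }
```

The two cases are disjoint, so a client that wants both the user's pipelines and the rest of the organisation's would issue two searches under this reading.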

Perform search

POST /pipeline-search

Request sample

{
  "pipelineType": "analytics",
  "query": {
    "text": "*",
    "settings": {
      "caseSensitive": false,
      "partialMatch": false
    }
  },
  "filters": {
    "status": "completed",
    "dateUpdated": {
      "min": null,
      "max": null
    }
  },
  "sortBy": {
    "field": "title",
    "asc": true
  },
  "pagination": {
    "size": 10,
    "page": 1
  },
  "settings": {
    "organisationId": 3,
    "userId": 1
  }
}

Response sample

Every response has 2 high-level fields: results, total.

{
  "results": [
    {
      "id": "148c9b52-ec82-468b-b5cf-b16654f1d963",
      "name": "Test Workflow",
      "description": "Test Workflow description",
      "framework": "python3",
      "executionStatus": "completed",
      "accessLevel": "Public",
      "provenanceAssetIds": [],
      "provenanceAssets": [],
      "inputAssetIds": [],
      "inputAssets": [],
      "visualisations": [],
      "createdAt": "2021-08-01T06:48:57.450Z",
      "updatedAt": "2021-09-01T11:00:32.896Z",
      "createdBy": {
        "id": 3,
        "firstName": "A",
        "lastName": "Tester"
      }
    }
  ],
  "total": 1
}