Search

The search page lets the user query the available datasets based on the data model and metadata, and build and store queries that can later be used through an API to retrieve data from the selected datasets.

Defining a query

The user starts with a text-based search, filters on concepts and metadata, data queries, or any combination of these. The results matching the text, filters and data queries are shown on the right.

The text-based search can be used in two ways:

  1. As a free-text search, allowing the user to provide terms which are used to search across the datasets' information (in Elasticsearch). Specifically, the terms provided by the user are searched across the following dataset fields: name, description, tags, domain, concepts and tokenized concepts.
  2. As a spatial search, in which case the user selects one of the domain's spatial identification fields and provides one or more values. The search returns datasets that (a) have the selected field defined as a spatial identifier and (b) contain the provided value(s) in that field.
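
The two search modes above can be sketched as Elasticsearch query bodies. This is only an illustration: the index field names (including `tokenized_concepts` and `spatial_identifier_fields`) are hypothetical, and the actual queries issued by the platform may differ.

```python
from typing import Dict, List

# Hypothetical index field names, mirroring the dataset fields listed above.
SEARCH_FIELDS = ["name", "description", "tags", "domain",
                 "concepts", "tokenized_concepts"]


def free_text_query(terms: str) -> Dict:
    """Build a multi_match query that searches the terms across all fields."""
    return {"query": {"multi_match": {"query": terms, "fields": SEARCH_FIELDS}}}


def spatial_query(field: str, values: List[str]) -> Dict:
    """Build a bool query requiring both spatial-search conditions.

    (a) the field is declared as a spatial identifier of the dataset
        (stored here in a hypothetical 'spatial_identifier_fields' field), and
    (b) at least one of the provided values is found in that field.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"spatial_identifier_fields": field}},
                    {"terms": {field: values}},
                ]
            }
        }
    }
```

For example, `spatial_query("postal_code", ["10431", "10432"])` produces a query with one `term` clause for condition (a) and one `terms` clause for condition (b).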

Filters

Filters are based on the facets of the datasets returned by the text-based search. Currently supported facets are: fields, concepts (categories), domains, distribution types, distribution formats, accessibility and language. Selecting a specific filter narrows the results shown but does not affect the available filters. Each filter shows the number of results matching it. Note that this count does not take into account any results filtered out by access policies. Filters are stored along with the query when it is saved.

Data Queries

Data queries can be defined on selected concepts. The available conditions are: equals and not equals for all types of concepts; contains, starts with and ends with for string concepts; and greater than, greater than or equal to, less than and less than or equal to for numeric and datetime concepts. Conditions can be combined using AND/OR operators.
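
To make the semantics of the conditions and the AND/OR combination concrete, here is a minimal sketch that evaluates a data query against a single record. The nested dictionary layout (`operator`, `conditions`, `concept`, `condition`, `value`) is an assumption made for illustration and is not the platform's stored query format.

```python
import operator

# Condition names follow the list above; each maps to a two-argument check
# taking (record value, query value).
OPERATORS = {
    "equals": operator.eq,
    "not equals": operator.ne,
    "contains": lambda value, target: target in value,
    "starts with": lambda value, target: value.startswith(target),
    "ends with": lambda value, target: value.endswith(target),
    "greater than": operator.gt,
    "greater than or equal to": operator.ge,
    "less than": operator.lt,
    "less than or equal to": operator.le,
}


def matches(record, node):
    """Return True if the record satisfies the (possibly nested) query node."""
    if "operator" in node:  # group node combining children with AND / OR
        results = (matches(record, child) for child in node["conditions"])
        return all(results) if node["operator"] == "AND" else any(results)
    compare = OPERATORS[node["condition"]]  # leaf node: a single condition
    return compare(record[node["concept"]], node["value"])


record = {"city": "Athens", "temperature": 31.5}
query = {
    "operator": "AND",
    "conditions": [
        {"concept": "city", "condition": "starts with", "value": "Ath"},
        {
            "operator": "OR",
            "conditions": [
                {"concept": "temperature", "condition": "greater than", "value": 30},
                {"concept": "city", "condition": "equals", "value": "Rome"},
            ],
        },
    ],
}
is_match = matches(record, query)
```

Here `is_match` is true because the city starts with "Ath" and the temperature exceeds 30, so the OR group is satisfied by its first condition.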

Results

Results show the matched datasets based on the free-text search, filters and data queries defined by the user. Results are also refined further in case the user doesn't have access to certain datasets based on the dataset's access policies. If any results are missing due to access policies, an informative message is shown at the bottom of the search results.

From the search results, the user can select multiple datasets in order to set up their query. The limitation is that the datasets must be of the same kind; for example, they must all come from file upload, all from Kafka, or all from an API. As an exception, some Kafka datasets are compatible with API datasets. This is determined from the accessibility metadata of the search result, which defines the methods through which a dataset is available.
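
One way to read this rule is that a selection is valid when all selected datasets share at least one access method. The sketch below implements that interpretation; the `accessibility` list on each result and the method names (`file`, `kafka`, `api`) are assumptions, and the platform's actual check may be stricter.

```python
def common_methods(selected):
    """Return the access methods shared by every selected dataset."""
    shared = None
    for dataset in selected:
        available = set(dataset["accessibility"])
        shared = available if shared is None else shared & available
    return shared or set()


def is_compatible(selected):
    """A selection is usable only if some access method is common to all."""
    return bool(common_methods(selected))


api_only = {"name": "weather-api", "accessibility": ["api"]}
kafka_and_api = {"name": "sensor-stream", "accessibility": ["kafka", "api"]}
file_only = {"name": "census-upload", "accessibility": ["file"]}
```

Under this reading, `is_compatible([api_only, kafka_and_api])` holds because both datasets are available through the API, while `is_compatible([api_only, file_only])` does not.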

Saved queries

Before moving to the next step, the user must save the selection as a new query. Once at least one query has been saved, the user can load it by clicking the saved queries button next to the search box. Loading a saved query and moving to the next step updates the loaded query.

Configuring a query

Once the user has selected at least one result of interest and moves to the second step, they are shown the configure query page, whose content depends on the type of results selected.

Dataset query

If the selected results are from an API (or from Kafka in some permitted cases), the user is shown the dataset configure screen. Here they need to select at least one concept from each dataset before proceeding. They can also select certain concepts to be used as configurable query parameters in the API call. For a concept to be available as a query parameter, its indexed metadata must be set to true.

The page also shows a preview of what the API URL would look like, including an example for any selected metadata.

File query

If the selected results are from one or more files that were not processed (e.g. other files from data check-in jobs in which the mapping step was not enabled), the user has to choose between direct download and indirect download (not available yet). With direct download, the user can click a file's URL to download it. With indirect download, the user has to make an API call to get a list of URLs from which to download the files.

Streaming query

If the selected results are from Kafka, the user can see the name of the Kafka topic and sample streaming data. The user can subscribe to the specific Kafka topic. On subscribing, the user receives a set of credentials (username, password, group id, connection url), which can be used to start a consumer that reads streaming data from the topic. When the user comes back to the same query, there is an option to reset the Kafka password.
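
As an illustration of how the four credentials might be used, the sketch below maps them onto a consumer configuration using standard librdkafka property names. The security protocol, SASL mechanism and offset reset policy are assumptions, since they are not specified here; check the values shown on the subscription screen.

```python
def consumer_config(username, password, group_id, connection_url):
    """Map the subscription credentials onto Kafka consumer properties."""
    return {
        "bootstrap.servers": connection_url,
        "group.id": group_id,
        "sasl.username": username,
        "sasl.password": password,
        # Assumptions: the cluster may use a different protocol or mechanism.
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # Assumption: start from the oldest available message on first read.
        "auto.offset.reset": "earliest",
    }


# Placeholder credentials, for illustration only.
config = consumer_config(
    username="demo-user",
    password="demo-password",
    group_id="demo-group",
    connection_url="broker.example.org:9093",
)
```

A dictionary like this could be passed to a client such as `confluent_kafka.Consumer(config)`, followed by `subscribe([topic])` with the topic name shown on the page and a `poll()` loop to read messages.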

Test Query

As soon as the query parameters (that will be enabled for the query) are defined, the user is redirected to the Test Query page. An editor is available with an example payload containing the query parameters with placeholder values. The user can edit the payload and enter the desired values for the parameters. By pressing the Run Query button, the query is executed against the production database and a sample of the actual response is returned (max 5 results per asset). The user can experiment with the parameter values and check whether the query they defined is acceptable. If it is, they can proceed to the next step; otherwise they can go back, redefine the parameters and repeat the process.

Retrieval

In the final step, the user is presented with instructions for the actual retrieval. There are two available endpoints (GET and POST). For the GET endpoint, the parameters need to be sent as query parameters, while for the POST endpoint, they need to be sent inside the body of the request. Regarding pagination, the user can set the pageSize and lastRecordId parameters. If multiple datasets were selected in Step 1, a maximum of 100 results per dataset will be returned. Lastly, both retrieval endpoints require the user to provide the X-API-TOKEN header.
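
The difference between the two endpoints can be sketched with Python's standard library. The base URL and token below are placeholders, and sending the POST body as JSON is an assumption; the retrieval page shows the actual URL and payload format for each query.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical retrieval URL; the real one is shown on the retrieval page.
BASE_URL = "https://example.org/api/retrieval/42"


def build_get(params, token):
    """GET: parameters travel in the URL query string."""
    url = f"{BASE_URL}?{urlencode(params)}"
    return Request(url, headers={"X-API-TOKEN": token}, method="GET")


def build_post(params, token):
    """POST: the same parameters travel in the request body (JSON assumed)."""
    body = json.dumps(params).encode("utf-8")
    headers = {"X-API-TOKEN": token, "Content-Type": "application/json"}
    return Request(BASE_URL, data=body, headers=headers, method="POST")


# pageSize and lastRecordId are the pagination parameters described above.
params = {"pageSize": 50, "lastRecordId": 1000}
get_request = build_get(params, token="my-api-token")
post_request = build_post(params, token="my-api-token")
```

Either request object can then be sent with `urllib.request.urlopen`. To page through results, a client would typically repeat the call with `lastRecordId` set to the id of the last record received, although the exact response shape is not described here.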