Data Stream is a REST-based webhook that can be configured to POST all changes that occur in Cortex to a web service of your choosing.
The purpose of this is to allow an organisation to maintain a copy of all Cortex data in a database or system of its choosing, enabling organisation-specific analytics and system integration. Only one Data Stream can be configured per environment.
Data Stream can be configured in the "Settings" area in Cortex admin.
Here you can enable and disable Data Stream.
URL: The URL that Cortex will POST data to. This must be a secure (HTTPS) server.
Shared Secret: This is a long random string used to authorise requests. Its value is sent in the Authorization header of each request, and the server should verify it to authenticate the source of the incoming data.
Pin Server Certificate: This allows you to specify the certificate you expect to see from the server. Pinning the server certificate also allows the use of self-signed certificates.
Records Per Request: This limits how many records appear in each request; it can range from 1 to 1,000.
Rate Limit: This limits how many requests are made per minute to your web service; it can range from 1 to 300.
Enabled Channels: This determines which 'channels' are included in the Data Stream.
Channels
A channel is a logical grouping of types of Cortex documents. There are currently two channels:
'entities' - this contains clinical data along with other system configuration objects.
'logs' - this contains audit and application log documents.
POST Request Format
The structure of the JSON POST data is as follows:
{ "channel": "<entities | logs>", "organization_id": "<organisation id for environment>", "documents": [ { "document": {…}, "sequence": <sequence integer> }, . . . ] }
'channel' is a logical grouping of document types: 'entities' contains most clinical data, and 'logs' contains auditing and application log entries. Additional channels may be added in the future.
'organization_id' is a unique identifier for each Cortex environment.
'documents' is an array of the actual documents in this batch, ordered by ascending sequence number. All Cortex information is stored as JSON 'documents' with a 'type' property identifying the object type (be it user, patient, note, order, job, admission, workflow, etc.).
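As an illustration only (the organisation id and the fields inside each document below are invented placeholders; real documents vary by type), a small 'entities' batch might look like this:

{
  "channel": "entities",
  "organization_id": "example-organisation-id",
  "documents": [
    {
      "document": { "id": "abc123", "type": "patient", "name": "Example Patient" },
      "sequence": 1041
    },
    {
      "document": { "id": "def456", "type": "note", "patient_id": "abc123" },
      "sequence": 1042
    }
  ]
}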
This POST body is additionally compressed using gzip, and the following headers are set:
| Header | Value |
| --- | --- |
| Content-type | application/json; charset=utf8 |
| Encoding | gzip |
| Authorization | bearer <shared secret> |
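For local development it can be useful to simulate a request from Data Stream against your own endpoint. The following is a minimal sketch, assuming Python with the 'requests' library; the endpoint URL, shared secret and payload values are invented placeholders:

import gzip
import json

import requests

# Invented values for illustration; substitute your own endpoint and shared secret.
ENDPOINT = "https://example.org/datastream"
SHARED_SECRET = "replace-with-your-shared-secret"

payload = {
    "channel": "entities",
    "organization_id": "example-organisation-id",
    "documents": [
        {"document": {"id": "abc123", "type": "patient"}, "sequence": 1},
    ],
}

# Compress the JSON body with gzip and set the documented headers.
body = gzip.compress(json.dumps(payload).encode("utf-8"))
response = requests.post(
    ENDPOINT,
    data=body,
    headers={
        "Content-type": "application/json; charset=utf8",
        "Encoding": "gzip",
        "Authorization": f"bearer {SHARED_SECRET}",
    },
)
print(response.status_code, response.json())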
Expected Response Format
A successful response from the server to these POST requests should return an HTTP status of 200, and contain a JSON payload of the form:
{ "status": "ok", "organization_id": "<organization id as in the submitted json>", "channel": "<channel as in the submitted JSON>", "max_sequence_processed": <the highest sequence from the documents processed in the request> }
'status': a string value of "ok" indicates that the request was understood and persisted.
'organization_id': this string value should match the organization id passed in the request.
'channel': this string value should match the channel passed in the request.
'max_sequence_processed': this integer value contains the maximum sequence number from the request that was processed successfully. Usually this value will equal the highest sequence number in the request; however, it may be smaller if only a portion of the documents were successfully processed. Data Stream uses this value to construct the next batch of data to send in the next request.
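A minimal sketch of building this response, assuming Python and that the highest successfully processed sequence has already been computed while handling the batch:

def build_response(payload: dict, max_sequence_processed: int) -> dict:
    # Echo back the organisation id and channel from the request, and report
    # the highest sequence number that was successfully persisted.
    return {
        "status": "ok",
        "organization_id": payload["organization_id"],
        "channel": payload["channel"],
        "max_sequence_processed": max_sequence_processed,
    }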
Processing Requests
First you should authorize the request, verifying that the shared secret in the 'Authorization' header is valid.
Inspect the 'Encoding' header to verify that the encoding is 'gzip' and decompress the payload.
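A sketch of these two steps, assuming a Flask endpoint; the route, environment variable name and error handling are illustrative choices, not part of Cortex:

import gzip
import hmac
import json
import os

from flask import Flask, jsonify, request

app = Flask(__name__)
SHARED_SECRET = os.environ["DATASTREAM_SHARED_SECRET"]

@app.route("/datastream", methods=["POST"])
def datastream():
    # Authorise: the Authorization header carries "bearer <shared secret>".
    supplied = request.headers.get("Authorization", "")
    if not hmac.compare_digest(supplied, f"bearer {SHARED_SECRET}"):
        return jsonify({"status": "unauthorized"}), 401

    # Verify the Encoding header and decompress the gzipped JSON body.
    raw = request.get_data()
    if request.headers.get("Encoding", "").lower() == "gzip":
        raw = gzip.decompress(raw)
    payload = json.loads(raw)

    # Process payload["documents"] in ascending sequence order here,
    # then report the highest sequence that was persisted successfully.
    max_seq = max((d["sequence"] for d in payload["documents"]), default=0)
    return jsonify({
        "status": "ok",
        "organization_id": payload["organization_id"],
        "channel": payload["channel"],
        "max_sequence_processed": max_seq,
    })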
Each document has an associated 'id'. This uniquely identifies the document, i.e. it is the document's 'primary key'.
Each document has an associated 'sequence' number. This is a global, monotonically increasing integer associated with each document creation or change (a change to a document will increase its sequence number). It can be thought of as a 'version number' for a document: if a document is seen with a higher sequence number, it is a more recent 'version' of that document.
Documents should be processed in ascending order of their sequence number. The sequence number is incremented for any change to any document in Cortex, so it also defines the order of changes to data. It is recommended that you store the sequence number of each processed document; this provides a 'resume point' for streaming data, based on the maximum sequence number stored. Sequence numbers should be updated against documents when processing them. The sequence number is scoped to the channel.
Individual document structures may change over time, so it is recommended that the entire JSON document is persisted in its provided form, using an appropriate JSON data type in your target database. From there, specific denormalisation or secondary indexing of key paths can be performed.
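As one possible approach (a sketch only, assuming PostgreSQL with the psycopg2 driver; the table and column names are invented), the whole document can be stored as JSONB keyed by channel and id, updated only when a higher sequence arrives, with a per-channel resume point recorded alongside:

import psycopg2
from psycopg2.extras import Json

SCHEMA = """
CREATE TABLE IF NOT EXISTS cortex_documents (
    channel  TEXT   NOT NULL,
    id       TEXT   NOT NULL,
    sequence BIGINT NOT NULL,
    body     JSONB  NOT NULL,
    PRIMARY KEY (channel, id)
);
-- Example of secondary indexing on a key path inside the stored document.
CREATE INDEX IF NOT EXISTS cortex_documents_type_idx ON cortex_documents ((body->>'type'));

CREATE TABLE IF NOT EXISTS datastream_state (
    channel       TEXT PRIMARY KEY,
    last_sequence BIGINT NOT NULL
);
"""

UPSERT_DOCUMENT = """
INSERT INTO cortex_documents (channel, id, sequence, body)
VALUES (%s, %s, %s, %s)
ON CONFLICT (channel, id) DO UPDATE
SET sequence = EXCLUDED.sequence, body = EXCLUDED.body
WHERE cortex_documents.sequence < EXCLUDED.sequence;
"""

UPSERT_STATE = """
INSERT INTO datastream_state (channel, last_sequence)
VALUES (%s, %s)
ON CONFLICT (channel) DO UPDATE
SET last_sequence = GREATEST(datastream_state.last_sequence, EXCLUDED.last_sequence);
"""

def persist_batch(conn, payload: dict) -> int:
    """Persist one Data Stream batch; return the highest sequence processed."""
    channel = payload["channel"]
    max_seq = 0
    with conn.cursor() as cur:
        # Apply documents in ascending sequence order, keeping the full JSON body.
        for entry in sorted(payload["documents"], key=lambda e: e["sequence"]):
            doc = entry["document"]
            cur.execute(UPSERT_DOCUMENT, (channel, doc["id"], entry["sequence"], Json(doc)))
            max_seq = entry["sequence"]
        if max_seq:
            cur.execute(UPSERT_STATE, (channel, max_seq))
    conn.commit()
    return max_seq

# Usage sketch:
# conn = psycopg2.connect("dbname=cortex_stream")
# with conn.cursor() as cur:
#     cur.execute(SCHEMA)
# conn.commit()
# max_sequence_processed = persist_batch(conn, payload)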
Resetting Data Stream
It is possible to reset Data Stream so that all data is resent, or so that streaming restarts from a specific sequence number.
To reset the sequence number from which Data Stream sends, first disable the stream.
Then edit the Data Stream properties; for each channel you will have the option to reset the sequence manually. Leave this field empty if you do not wish to reset the sequence for that channel. To have Data Stream resend all data, enter 0; otherwise, determine the sequence number you wish to reset to from your local copy of the data.
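If you maintain a resume point as recommended above, the sequence number to enter can be read back from your own store. For example, using the illustrative datastream_state table sketched earlier (names are hypothetical):

import psycopg2

# Read the last sequence processed for a channel; enter this value in that
# channel's reset field in Cortex admin (or 0 to resend everything).
conn = psycopg2.connect("dbname=cortex_stream")
with conn.cursor() as cur:
    cur.execute("SELECT last_sequence FROM datastream_state WHERE channel = %s", ("entities",))
    row = cur.fetchone()
    print(row[0] if row else 0)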