Incapture Technologies

Inside the Cloud

Incapture Technologies Blog

 

Data Provenance

Published:
December 4, 2014
Author:

When developing portfolio management and attribution applications at other firms a requirement that was never explicitly called out but absolutely required was some measure of information provenance. The conversation started usually with the end result of a long process – “why was this aspect of the portfolio down (or up)?” or more usually “All things being equal this is different, why?”. The inputs driving that performance spanned multiple systems, people and processes. It could have been driven by simple market moves, a manual intervention somewhere in a process (particularly in a quantitive based investment process) or a poorly implemented hedge. It could also be caused by friction in the process – a delay between decision and execution. In those systems it was difficult to get a real handle on answering these questions easily – each question was more a project for an analyst and something that was difficult to automate.

With Rapture we wanted to move closer to this idea of capturing data about the process and the relationships between activities and information created and managed by these processes. Once the data is captured the idea is to present that data through a Rapture API so that applications or tools can be built to automate the generation of answers to these “why?” questions.

At its core Rapture defines the idea of a “relationship repository”. The concept follows the same model as other repository types (such as documents) in that applications use a standard API to interact with a repository and Rapture provides the underlying bindings to a lower level implementation, usually in some external data store. A relationship repository manages the storage of directed graphs of information. For those not familiar with the concept of a directed graph the following diagram will help to explain:

SimpleDG

Here we have two “concepts”, A and B. They are linked together by a relationship R and the relationship goes from A to B (hence the direction). Real directed graphs are much more complicated. The following diagram could perhaps document the relationship between trading orders generating trade executions which form a set of positions which with a set of prices and curves produce some measurement of risk. (Some labels have been abbreviated for clarity).

ComplexDG

In Rapture’s case we want to have a method to capture these relationships – the api gives an application developer the ability to register and manage the links between entities which can be arbitrary and application specific. An important distinction is a special system relationship repository that Rapture can maintain automatically. The idea behind this repository is the realization that the URL within Rapture is a unique reference to an entity in the system – the idea of the URL:

doc://idp.order/ABC/ONE/ORD1123345@1

explicitly references the first version of a trading order in a system. The other factor is that Rapture maintains a context as the Rapture API is used – whether the API is used in a complex workflow/decision process or by an application manually invoking the API. So that if Rapture detects that something has just made the following pseudo api calls:

get content doc://idp.order/ABC/ONE/ORD1234455
put content doc://idp.execution/ABC/TRD1223

Then there could be a relationship between these two entities and Rapture can (if so configured) record that relationship automatically. Furthermore there are other Rapture entities that could be added to this relationship map – the user entity that made these calls could be bound to each of these entities, the application (if known) or script or workflow that is in play could also be recorded. In fact the simple act of loading and saving some content could generate a reasonably large amount of relationship data – with this feature enabled (it is optional) Rapture becomes a “big data” generator on the relationships between people, processes and data in the environment. The scalability of such a repository is dictated by the underlying implementation attached to the feature and the usefulness of the data is dictated by the sophistication of an application written to interrogate and navigate the relationship repository.

The API for manual definition and retrieval of these concepts is known as the “relationship” api in Rapture. A selection of calls is reproduced below:

   // Create a repository in which to store relationship information
   Boolean createRelationshipRepo(String relationshipRepoURI, String config);

   // Store a relationship link and return its URI
   String createRelationship(String relationshipAuthorityURI, String fromURI, String toURI, String label, Map(String,String) properties);

   // Retrieve a relationship link
   RaptureRelationship getRelationship(String relationshipURI);

   // Delete a relationship link
   Boolean deleteRelationship(String relationshipURI);
   
   // Get all the relationship links with the specified rapture resource as the "from" half of the link.
   List(RaptureRelationship) getOutboundRelationships(String relationshipRepoURI, String fromURI);
   
   // Get all the relationship links with the specified rapture resource as the "to" half of the link.
   List(RaptureRelationship) getInboundRelationships(String relationshipRepoURI, String toURI);
   
   // Get all the relationship links with the specified label.
   List(RaptureRelationship) getLabledRelationships(String relationshipRepoURI, String relationshipLabel);   
   
   // Get relationships from a given node
   RaptureRelationshipRegion getRelationshipCenteredOn(String relationshipNodeURI, Map(String, String) options);   

In many cases in the calls above the options parameter is used to fine tune queries (depth of search and filters).

To circle back to the first paragraph of this post – the answer to some of these questions could be answered in a technical way by an application that could visualize the idea that we have captured the facts that:

This position was generated from these trades and these trades came from these orders that were created by this application process. Order 15 of this set was manually changed by this user with the comment “manual override due to liquidity constraints”. The landscape of relationships for “today” versus “yesterday” differed only by this manual change and on a 30 day history when a manual change has been made this user has improved performance on 80% of the changes.

An admirable goal – not Big Brother, more effective use of data to improve processes and inform decision making – one of our reasons for building Rapture in the first place.

As before – if you’d like more information about Incapture or Rapture please drop me a line personally or to our general email address info@incapturetechnologies.com and we will get back to you for a more in depth discussion.


Subscribe for updates