Rapture Architecture

Entitlements in Practice

Building on an earlier blog post that provided conceptual grounding on entitlements, this post gives some practical examples of how to implement entitlements in Rapture.

Overview

Entitlements in Rapture allow administrators to clearly define who can access what in Rapture.  It is a permissioning system based on users, groups, and entitlements.  API calls made to Rapture are protected by entitlements that are defined at compile-time.  A defined entitlement is associated with a number of groups of users, and this association can be made at run-time.

Concepts/Terminology

User – A user represents a person or an application making calls to Rapture. A user is a single entity with a username/password who needs access to Rapture.
Group – A group represents a collection of users.
Entitlement – An entitlement is a named permission that has zero or more groups associated with it. If an entitlement has no groups associated with it, it is essentially open and any defined user in Rapture can access it. If an entitlement has at least one group associated with it, any user wishing to access the resource protected by this entitlement must be a member of one of the associated groups.

Example

The use of entitlements is best explained by using a simple example.

User “bob” is a defined user in Rapture.  He writes a Reflex script to update the description associated with his username in Rapture.  Thus, he wants to use the updateMyDescription API call.

#user.updateMyDescription("My name is Bob");

He is successful.  What happened under the hood?

Let’s first examine how updateMyDescription is defined in the user.api file in the RaptureNew/ApiGen project.  Every API call in Rapture has a defined entitlement associated with it.

[Update the current description for a user.]
@entitle=/user/write/$u
@public RaptureUser updateMyDescription(String description);

The entitlement string for the updateMyDescription call is defined as “/user/write/$u” ($u is a dynamic wildcard, described below; the static prefix is “/user/write”).  Entitlements are always defined as hierarchical slashed strings with an optional wildcard.  This means that if a user has permissions for /user, he is permissioned for the entitlements /user/read and /user/write as well.  If a user has permission for /user/xxx, that does not mean he has permissions for /user/yyy, but he does have permissions for /user/xxx/yyy.

On startup, a brand new Rapture instance always creates every possible entitlement (by scanning every *.api file) and initializes each as empty.  This means any defined user has permissions to all entitlements on startup of a clean, brand-new Rapture instance.  Using the Entitlements API, users can then be added to groups, and groups can then be associated with particular entitlements to control access.  These definitions and associations are persisted to the configuration repository.
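
As a quick sketch, you can list the resulting entitlements (each with an initially empty set of associated groups) using the entitlements api covered later in this series:

ents = #entitlement.getEntitlements();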

Back to our example.  Assuming that “bob” executed that api call against a brand new instance of Rapture, the entitlement “/user/write” would have had 0 groups associated with it.  The entitlement check would have passed, since an entitlement with 0 associated groups is wide open to all defined users in Rapture.  How do we make it not pass?  We have to use the Entitlements API to associate a group of users with the entitlement “/user/write”.

#entitlement.addEntitlementGroup("groupThatHasAccessToUserWrite");
#entitlement.addUserToEntitlementGroup("groupThatHasAccessToUserWrite", "alice");
#entitlement.addGroupToEntitlement("/user/write", "groupThatHasAccessToUserWrite");

Notice above that only the user “alice” has been assigned to the group “groupThatHasAccessToUserWrite”.  That group was associated with the “/user/write” entitlement.  User “bob” is not a member of that group.  Therefore, if Bob were to execute his call again after the above changes were made, it would fail.  In order for Bob to be able to make that call, he would have to be added to that group using the Entitlements API.
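
For completeness, that last step is a one-line call using the same Entitlements API shown above:

#entitlement.addUserToEntitlementGroup("groupThatHasAccessToUserWrite", "bob");

After this call Bob is a member of a group associated with “/user/write”, and his updateMyDescription call would succeed again.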

Dynamic Entitlements

Dynamic entitlements are entitlements with a wildcard in the string, such as /user/put/$d.  The wildcard is substituted at runtime based on the argument(s) of the API call that is made.  This allows Rapture to define entitlements that are based on the actual arguments of the API call being made.  Here is a table showing the currently defined substitutions:

Wildcard   Substituted With
$d         documentPath
$a         authority
$f         full path (i.e. authority/documentPath)
$u         current_user

Another Example

User “bob” wants to read a document out of Rapture.  Thus, he writes a Reflex script to use the getContent API call, which has the following definition:

[Retrieve the content for a document.]
@entitle=/data/read/$f(docURI)
@public String getContent(String docURI);

Bob’s Reflex script:

#doc.getContent("//myAuthority/alicesDocs/doc");

The $f in this entitlement gets substituted such that Bob’s entitlement request looks like the following:

“/data/read/myAuthority/alicesDocs/doc”

The entitlement system will check whether Bob is a member of a group associated with that entitlement.  For the sake of this example, let’s assume that Alice had previously created an entitlement “/data/read/myAuthority/alicesDocs/doc” with a group containing just her username.  She basically wanted an area that she could keep private.  This means Bob’s call will fail.  He is not a member of the group associated with the entitlement “/data/read/myAuthority/alicesDocs/doc”.
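
As a sketch, Alice could have set that up with the Entitlements API (the group name here is illustrative; addEntitlement, listed later in this series, creates an entitlement with an initial group):

#entitlement.addEntitlementGroup("alicesPrivateGroup");
#entitlement.addUserToEntitlementGroup("alicesPrivateGroup", "alice");
#entitlement.addEntitlement("/data/read/myAuthority/alicesDocs/doc", "alicesPrivateGroup");

Because entitlements are hierarchical, Alice could equally have protected the whole area by associating the group with “/data/read/myAuthority/alicesDocs” instead, covering every document below that path.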

The wildcard substitutions defined above are based on the Rapture URI argument that is passed into the call.  The values of documentPath, partition, and authority are all components of a RaptureURI object.  At this point, it only makes sense to use dynamic entitlements with API calls that have a RaptureURI string as an argument.


Data Provenance

When developing portfolio management and attribution applications at other firms, a requirement that was never explicitly called out but was absolutely required was some measure of information provenance. The conversation usually started with the end result of a long process – “Why was this aspect of the portfolio down (or up)?” or, more often, “All things being equal, this is different. Why?”. The inputs driving that performance spanned multiple systems, people and processes. It could have been driven by simple market moves, a manual intervention somewhere in a process (particularly in a quantitative investment process) or a poorly implemented hedge. It could also be caused by friction in the process – a delay between decision and execution. In those systems it was difficult to get a real handle on answering these questions easily; each question was more a project for an analyst and something that was difficult to automate.

With Rapture we wanted to move closer to this idea of capturing data about the process and the relationships between activities and information created and managed by these processes. Once the data is captured the idea is to present that data through a Rapture API so that applications or tools can be built to automate the generation of answers to these “why?” questions.

At its core Rapture defines the idea of a “relationship repository”. The concept follows the same model as other repository types (such as documents) in that applications use a standard API to interact with a repository and Rapture provides the underlying bindings to a lower level implementation, usually in some external data store. A relationship repository manages the storage of directed graphs of information. For those not familiar with the concept of a directed graph the following diagram will help to explain:

[Diagram: a simple directed graph]

Here we have two “concepts”, A and B. They are linked together by a relationship R, and the relationship goes from A to B (hence the direction). Real directed graphs are much more complicated. The following diagram could perhaps document the relationship between trading orders generating trade executions, which form a set of positions, which with a set of prices and curves produce some measurement of risk. (Some labels have been abbreviated for clarity.)

[Diagram: a more complex directed graph linking orders, executions, positions, prices, curves and risk]

In Rapture’s case we want to have a method to capture these relationships – the api gives an application developer the ability to register and manage the links between entities, which can be arbitrary and application specific. An important addition is a special system relationship repository that Rapture can maintain automatically. The idea behind this repository is the realization that a URI within Rapture is a unique reference to an entity in the system. For example, the URI:

doc://idp.order/ABC/ONE/ORD1123345@1

explicitly references the first version of a trading order in a system. The other factor is that Rapture maintains a context as the Rapture API is used – whether the API is used in a complex workflow/decision process or by an application manually invoking the API. So if Rapture detects that something has just made the following pseudo api calls:

get content doc://idp.order/ABC/ONE/ORD1234455
put content doc://idp.execution/ABC/TRD1223

then there could be a relationship between these two entities, and Rapture can (if so configured) record that relationship automatically. Furthermore, there are other Rapture entities that could be added to this relationship map – the user that made these calls could be bound to each of these entities, and the application (if known) or script or workflow that is in play could also be recorded. In fact the simple act of loading and saving some content could generate a reasonably large amount of relationship data – with this feature enabled (it is optional) Rapture becomes a “big data” generator on the relationships between people, processes and data in the environment. The scalability of such a repository is dictated by the underlying implementation attached to the feature, and the usefulness of the data is dictated by the sophistication of an application written to interrogate and navigate the relationship repository.

The API for manual definition and retrieval of these concepts is known as the “relationship” api in Rapture. A selection of calls is reproduced below:

   // Create a repository in which to store relationship information
   Boolean createRelationshipRepo(String relationshipRepoURI, String config);

   // Store a relationship link and return its URI
   String createRelationship(String relationshipAuthorityURI, String fromURI, String toURI, String label, Map(String,String) properties);

   // Retrieve a relationship link
   RaptureRelationship getRelationship(String relationshipURI);

   // Delete a relationship link
   Boolean deleteRelationship(String relationshipURI);
   
   // Get all the relationship links with the specified rapture resource as the "from" half of the link.
   List(RaptureRelationship) getOutboundRelationships(String relationshipRepoURI, String fromURI);
   
   // Get all the relationship links with the specified rapture resource as the "to" half of the link.
   List(RaptureRelationship) getInboundRelationships(String relationshipRepoURI, String toURI);
   
   // Get all the relationship links with the specified label.
   List(RaptureRelationship) getLabledRelationships(String relationshipRepoURI, String relationshipLabel);   
   
   // Get relationships from a given node
   RaptureRelationshipRegion getRelationshipCenteredOn(String relationshipNodeURI, Map(String, String) options);   

In the calls above an options parameter, where present, is used to fine tune queries (depth of search and filters).
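
As a hedged sketch of these calls in Reflex (the repository URI, label and properties here are illustrative assumptions, and the repository config format is not covered in this post), recording and then querying the order-to-execution link from the earlier example might look like:

config = "...";   // implementation-specific repository configuration, not shown here
#relationship.createRelationshipRepo("relationship://trading", config);

// Record that an execution was generated from an order
relUri = #relationship.createRelationship("relationship://trading", "doc://idp.order/ABC/ONE/ORD1234455", "doc://idp.execution/ABC/TRD1223", "generated", {});

// Later: find everything generated from that order
links = #relationship.getOutboundRelationships("relationship://trading", "doc://idp.order/ABC/ONE/ORD1234455");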

To circle back to the first paragraph of this post – some of these questions could be answered in a technical way by an application that can visualize the facts we have captured:

This position was generated from these trades and these trades came from these orders that were created by this application process. Order 15 of this set was manually changed by this user with the comment “manual override due to liquidity constraints”. The landscape of relationships for “today” versus “yesterday” differed only by this manual change and on a 30 day history when a manual change has been made this user has improved performance on 80% of the changes.

An admirable goal – not Big Brother, more effective use of data to improve processes and inform decision making – one of our reasons for building Rapture in the first place.

As before – if you’d like more information about Incapture or Rapture please drop me a line personally or to our general email address info@incapturetechnologies.com and we will get back to you for a more in depth discussion.


Document Metadata and Versions

In a previous post we talked about Rapture Document repositories and mentioned briefly the idea that documents had attached “metadata”. This post talks about that metadata and also a related feature – document versioning.

All documents in repositories can have additional information stored with them. Some of this information is generated automatically by Rapture, but a user can use the Rapture API to add their own custom metadata associated with the document. When retrieving data from Rapture you can ask for just the document contents, just the metadata, or both.

The automatically generated system metadata contains the following information:

  • The user name associated with the api context that saved the document
  • The date and time the document was saved (in UTC)
  • The current version of the document – even if the repository does not store previous versions, an understanding of how many times the document was saved can give an indication of the “version”.

API calls

There are a number of API calls that directly manipulate or retrieve metadata. They are all in the “doc” api section:

DocumentWithMeta getMetaContent(String docURI);
DocumentMetadata getMetaData(String docURI);

Custom metadata is also known as “attributes”. The following api calls manage attributes:

Boolean addDocumentAttribute(String attributeURI, String value);
List<Object> addDocumentAttributes(String attributeURI, List<String> keys, List<String> values);
XferDocumentAttribute getDocumentAttribute(String attributeURI);
List(XferDocumentAttribute) getDocumentAttributes(String attributeURI);
Boolean removeDocumentAttribute(String attributeURI);

It is worth talking about the “attribute URI” parameter in the above calls. Within Rapture, attributes on documents use the “bookmark” uri convention to define the attribute. So, given a document called:

//idp.data/test/one

Setting an attribute “a1” to the value “42” would be a call similar to the one below (given in Reflex)

#doc.addDocumentAttribute("//idp.data/test/one#a1", "42");

Retrieval of a single attribute follows the same convention.
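
For example, reading the attribute back uses the matching get call from the list above:

attr = #doc.getDocumentAttribute("//idp.data/test/one#a1");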

Versioning

A versioned document repository is one where previous versions of documents can be retained and retrieved at a later date. As you add new documents to a repository any existing documents are preserved in this version history. An api call can be used to trim out old versions from a repository.

Internally a document repository maintains a concept of “the current version”. This is the version that is retrieved if you pass in an unadorned URI to an api call (e.g. //idp.data/test/one above). Using a getMetaData call you can retrieve the version of that latest document and then use a slightly adjusted URI to retrieve previous versions. The script below illustrates this concept:

docMeta = #doc.getMetaContent("//idp.data/test/one");
version = docMeta.metaData.version;
previous = version - 1;
previousDocument <-- ("//idp.data/test/one@" + previous);

[Diagram: version history]

You could loop back until you get to version 1 (the first document created) although depending on archiving status earlier versions may already have been expunged from the system.

There are a couple of API calls that rely on version information:

DocumentWithMeta revertDocument(String docURI);
Boolean archiveVersions(String repoURI, int versionLimit, long timeLimit, Boolean ensureVersionLimit);
Boolean putContentWithVersion(String docURI, String content, int currentVersion);

Revert document takes the previous version of a document and makes it the latest version (making the existing “latest version” the previous version).

Archive versions removes old versions of documents given limits on how many versions or time should be preserved (e.g. keep all versions written in the last month, make sure that you keep at least ten versions).
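
A call following that signature might look like the sketch below (how timeLimit is interpreted – for example as a millisecond cutoff – is an assumption, not something this post specifies):

// keep at least ten versions per document, plus anything newer than the cutoff
cutoff = 1388534400000;   // illustrative time limit value
#doc.archiveVersions("//idp.data", 10, cutoff, true);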

Put content with version attempts to save a new document over an existing one. It assumes that the code saving the document has previously loaded the document and therefore knows the version of the document it loaded. If that version does not match, the call will fail. This is a form of “optimistic locking” – the assumption is that no one else should be modifying this document, but if someone does the call will fail and will have to be retried (reloaded and re-edited perhaps). Rapture also has a locking system for more formal procedures around modification of data.
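
A minimal sketch of that pattern in Reflex, assuming the DocumentWithMeta returned by getMetaContent exposes the document content alongside its metaData (and glossing over retry limits):

docMeta = #doc.getMetaContent("//idp.data/test/one");
content = fromjson(docMeta.content);
content['checked'] = true;
ok = #doc.putContentWithVersion("//idp.data/test/one", json(content), docMeta.metaData.version);
if ok == false do
   // another writer got in first – reload the document, re-apply the change and retry
end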

This short post gave a quick overview of the additional information that can be stored with a document (metadata and attributes) and, for some repositories, how versioning can be used to maintain a complete history of the changes to a document over time.

As before – if you’d like more information about Incapture or Rapture please drop me a line personally or to our general email address info@incapturetechnologies.com and we will get back to you for a more in depth discussion.


Entitlements

A very important part of any platform environment is the means by which users of the platform are given privileges to perform actions on the platform. Rapture has an entitlements system that provides this function.

Within Rapture everything that takes place is coordinated through an API call – even internally larger “meta” api calls in turn call lower API calls to make their activity happen – nothing in the system bypasses this mechanism. As an example, creating a Reflex script in Rapture actually involves (lower down) saving content to a special internal document repository – and the means by which that takes place is by calling the “putContent” API call for document repositories.

Also, every user in Rapture, once logged in, has a concept of a “context” – this is server side state (shared with all Rapture instances) that identifies the user associated with a given request in a secure way. This context is always passed (internally) through every API call made in Rapture.

Rapture’s entitlements system works at the API level, built on the concept that users can be members of functional entitlement “groups”. “An entitlement” is a function of the API call being made and the parameters to that API call. It’s best illustrated with an example:

In an earlier post we talked about the API to Rapture and the fact that the API was template based and a process autogenerates both server side and client code. Here is an example of a single call in this template language in all of its glory:

[Store a document in the Rapture system. The new URI name is returned; 
  this allows parts of the new name to be automatically generated]
    @entitle=/data/write/$f(docURI)
    @public String putContent(String docURI, String content);

In this example we are defining the function “putContent” in the document (doc) api. The important line for this discussion is

    @entitle=/data/write/$f(docURI)

This instructs Rapture to check for the “entitlement” that is made up of the prefix /data/write combined with the document URI parameter. So, if we were calling the function like this:

    #doc.putContent("//test/london/one", ...);

The entitlement to check would be:

    /data/write/test/london/one

Rapture now looks for the most specific entitlement that matches that string. By that we mean it checks entitlements in the following order:

/data/write/test/london/one
/data/write/test/london
/data/write/test
/data/write
/data

If it finds a match for that entitlement, it looks at which entitlement groups are associated with that entitlement and then sees whether the calling user is a member of any of those groups. If they are, the API call can proceed as normal. In this way the naming of entities and their structure can be important in a Rapture environment if you wish to use this structure to assist in proper partitioning of roles. In the example above the document was named “//test/london/one”, with the implication that an entitlement group called “London” could be part of an entitlement “/data/write/test/london”, and only members of that group could write documents below that part of the naming hierarchy.
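
A sketch of wiring up that example with the entitlements api reproduced below (names are illustrative; addEntitlement creates the entitlement with an initial group):

#entitlement.addEntitlementGroup("London");
#entitlement.addUserToEntitlementGroup("London", "alice");
#entitlement.addEntitlement("/data/write/test/london", "London");

With this in place, the putContent call above would succeed for “alice” and fail for any user who is not a member of the “London” group.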

The final piece of entitlements is how they are created and managed. Rapture provides some User Interfaces to manage these entities but these are simply using the “entitlements” api. Of course the entitlements api is controlled through entitlements as well and there are specific checks in some of these api calls to ensure that a call to an entitlements api doesn’t revoke all access and produce an irreversible outcome! The api is really a “CRUD” api across entitlements and entitlement groups:

[Entitlements are a very important part of the security of Rapture, and the Entitlement api is the way in which information about these entitlements is updated. The api is of course protected by the same entitlements system, so care must be taken to not remove your own entitlement to this api through the \emph{use} of this api.

Entitlements work like this. Users can be members of entitlement groups, and entitlement groups are members of entitlements. Each api call within Rapture is associated with an entitlement path, and when a user wishes to execute that api call they are checked to see if they are a member of that entitlement (by seeing which groups they are members of). Some api calls have dynamic entitlements, where the full name of the entitlement is derived from fundamental concepts such as typename, displayname, queuename etc. If an entitlement with the specific name exists that is used, otherwise the full entitlement path is truncated one part at a time until an entitlement is found.]

api(Entitlement) {
    @entitle=/admin/ent
    @public List(RaptureEntitlement) getEntitlements();

    @entitle=/admin/ent
    @public RaptureEntitlement getEntitlement(String entitlementName);

    @entitle=/admin/ent
    @public RaptureEntitlement getEntitlementByAddress(String entitlementURI);

    @entitle=/admin/ent
    @public RaptureEntitlementGroup getEntitlementGroup(String groupName);

    @entitle=/admin/ent
    @public RaptureEntitlementGroup getEntitlementGroupByAddress(String groupURI);

    @entitle=/admin/ent
    @public List(RaptureEntitlementGroup) getEntitlementGroups();

    @entitle=/admin/ent
    @public RaptureEntitlement addEntitlement(String entitlementName, String initialGroup);

    @entitle=/admin/ent
    @public RaptureEntitlement addGroupToEntitlement(String entitlementName, String groupName);

    @entitle=/admin/ent
    @public RaptureEntitlement removeGroupFromEntitlement(String entitlementName, String groupName);

    @entitle=/admin/ent
    @public Boolean deleteEntitlement(String entitlementName);

    @entitle=/admin/ent
    @public Boolean deleteEntitlementGroup(String groupName);

    @entitle=/admin/ent
    @public RaptureEntitlementGroup addEntitlementGroup(String groupName);

    @entitle=/admin/ent
    @public RaptureEntitlementGroup addUserToEntitlementGroup(String groupName, String userName);

    @entitle=/admin/ent
    @public RaptureEntitlementGroup removeUserFromEntitlementGroup(String groupName, String userName);
}

Rapture also has the concept of dynamic user membership of groups. In this case the membership of a group can be delegated to a user (well, developer!) defined class that can potentially reach out to an external system to determine membership. An example of where this is used at Incapture Investments is where we want to be explicitly sure that a request for data that originated from Bloomberg is made by a user that is actually logged into Bloomberg at that point. Other uses could include using an external LDAP directory for group membership.

A simple real world example will help close out this discussion of entitlements. The screenshot below shows a number of entitlement groups used in a particular Rapture deployment, with functional groups of users being assigned read rights to parts of the data environment:

[Screenshot: entitlement groups in a Rapture deployment]

As before – if you’d like more information about Incapture or Rapture please drop me a line personally or to our general email address info@incapturetechnologies.com and we will get back to you for a more in depth discussion.


Rapture Runner

In a cloud/distributed environment with a large number of standard processes running, we wanted a standard mechanism for ensuring that all processes have a consistent configuration and that there is a central place where operational managers can view the “state of the system”. We created the Rapture Runner process to handle this – it is not an absolute requirement that Rapture processes are coordinated through Rapture Runner, but for some deployments it can be a very useful tool to manage what could be a complex environment.

Rapture Runner is an embedded Rapture application – it contains a complete copy of the Rapture kernel and can therefore access all of the APIs of Rapture internally. In an environment using Rapture Runner it is intended that this is the master process for all Rapture processes in an environment – each server contains one instance of Rapture Runner. As the process starts up it connects to the Rapture data and messaging environment and spawns any other Rapture processes that are configured to run on a given server. These processes could be binaries already installed on the server (through something such as Puppet or Chef perhaps) or stored within Rapture as blobs. The configuration for which processes are run on which servers is stored within Rapture and managed through a specific “runner” api.

The general architecture is shown in the diagram below:

[Diagram: Rapture Runner architecture]

Runner Configuration

Rapture Runner manages three main concepts:

Applications and Libraries

An application in Rapture Runner is a process that can be run. The definition in Rapture Runner (showing the web front end view below) gives an application a name, description and version. This information is used by Rapture Runner to locate the binaries for the application along with the “APPSOURCE” configuration parameter. A similar technique is used to define optional (java) libraries that should be embedded within the downloaded application.

[Screenshot: Rapture Runner applications]

Schedule

A schedule defines how and when an application should be run. Using a cron style definition, Rapture Runner can choose to ensure a process is running on certain days and between certain times. The schedule also controls which “server group” the process should be running on:

[Screenshot: Rapture Runner schedule]

Server group

Finally Rapture Runner manages the concept of a “server group” – containing an explicit or implicit definition of what physical/virtual servers are members of a given group:

[Screenshot: Rapture server groups]

Server startup

With this configuration in mind we can now describe how Rapture Runner performs its work. On startup, an instance determines which server groups it belongs to. Membership of a server group implies that, using the schedule entries, certain processes should be running. For those processes that should be running, Rapture Runner downloads the application and library binaries and starts the application.

From that point on, Rapture Runner actively checks three things. If the schedule indicates that a new process needs to be started or an existing process should be stopped (perhaps for a maintenance window), that activity is performed. If an application managed by Rapture Runner terminates unexpectedly it is restarted – up to a maximum restart count. Finally, through the API (and through the web interface) an application can be manually restarted or reconfigured.

Status

The Rapture Runner API also manages the state of active servers running in this context – particularly their status and capabilities. This can be queried through the API and the web front end to Rapture Runner uses this to display the current status:

[Screenshot: Rapture Runner status]

Summary

Rapture Runner is an ideal tool for those environments where operations wishes to manage the state of Rapture applications through Rapture. Because the runner API is available through the Rapture API the configuration management of the environment can be delegated to other tools or managed through the web interface of Rapture Runner. As always with Rapture there is a great deal of optionality to allow for many different deployment and management approaches.


Execution Server

When designing a system on the Rapture platform there’s a lot that can be done using Reflex scripts, workflows containing built-in logic, or even client side applications that connect to Rapture “on demand” as they see fit. There is also another class of application which (a) usually needs to run as part of some higher level “process” or workflow and (b) cannot be directly embedded into the Rapture platform – i.e. it is not a Reflex script or a piece of Java code. The Rapture Execution Server and its associated workflow plugins provide a technique that can be used to bring these external “batch style” application fragments into a more controlled execution environment.

The Execution Server is a standalone Rapture application that reacts to specific workflow step initiation messages. These messages tell the Execution Server to kick off a process with a particular (and custom) command line, and if necessary the Execution Server can download Rapture data to a local set of folders so that the spawned application need not use the Rapture API at all to connect back to the system. Instead the application can load the data from files and folders and write output data to other files and folders. Once the process finishes, the Execution Server can take that generated output and write it back to Rapture. In this way Execution Server can handle both applications that can simply use the Rapture API to interact with the system and applications that have no knowledge of the Rapture API and can simply interact with files.

Furthermore, the Execution Server can have restrictions on the paths that can be executed from it (to control rogue process execution) and can, if needed, download the “executable” from Rapture before the process is kicked off. This is especially useful for script based applications such as Python.

An example workflow

To show how a workflow is constructed that invokes Execution Server, we will show a fragment of a workflow that calls a Python script. The workflow is normally constructed using Rapture API calls or embodied in a feature (the subject of a future blog post); what is reproduced below is the lower level configuration that is the fundamental definition of this workflow:

{
  "workflowURI" : "workflow://idpResearch/workflows/TestModel",
  "semaphoreType" : "WORKFLOW_BASED",
  "semaphoreConfig" : "{\"maxAllowed\":1}",
  "steps" : [ {
    "name" : "setRunId",
    "executable" : "dp_java_invocable://execmgr.RunIdStep",
    "view" : {
      "MODE" : "#DATE"
    },
    "transitions" : [ {
      "name" : "",
      "targetStep" : "Step1_Configuration"
    } ]
  }, {
    "name" : "Step1_Configuration",
    "executable" : "dp_java_invocable://execmgr.ExecProcessStep",
    "view" : {
      "wrapperInfo" : "%{\n  \"appArg\" : \"blob://model.test/python/Test_Configuration_AM.py\",\n  \"appPath\" : \"/usr/bin/python\",\n  \"inputConfig\" : {\n  },\n  \"outputConfig\" : {\n    \"dir_output_pkl\" : \"*blob://model.test/output/${runId}/pkl/$$n$$e\"\n  }\n}"
    },
    "transitions" : [ {
      "name" : "",
      "targetStep" : "$RETURN:ok"
    }, {
      "name" : "error",
      "targetStep" : "$FAIL"
    } ]
  } ],
  "startStep" : "setRunId",
  "category" : "execServer",
  "view" : {
  },
  "defaultAppStatusNamePattern" : "%idpModel/${$__date_string}/test"
}

In this example we have two real workflow steps and both use the “java invocable” technique of defining the implementation of the step. This technique means that java code embedded in the application processing the workflow will be called. Right at the bottom of the definition, the “category” of the workflow is “execServer” and this instructs the workflow system that step messages by default will only be received by the Execution Server – and it is the Execution Server that contains the implementation of these steps.

[Diagram: Rapture Execution Server]

The first step calls the invocable method “setRunId”. This is simply a convenience method used in this workflow to define a common “context” for an instance of a workflow. It means that output information (such as status of this run) is segregated from other runs of the same workflow.

If the first step completes successfully it moves to “Step1_Configuration”. In the real workflow from which this example is taken there are 9 other steps, each invoking applications in slightly different ways. The key to interaction with the Execution Server is the view parameters. In a workflow the view parameters are passed to the implementation of the step. In most cases what is passed, and the format of that information, is quite specific to the step, and in this case it is no different. The format of the “wrapperInfo” parameter defines:

(a) the application to execute – in this case a Python script stored in a Rapture blob repository
(b) the way to invoke this application – in this case use the python interpreter
(c) any input and output that Execution Server needs to either download before the process executes or after it has finished. In this case the contents of the folder “dir_output_pkl” will be uploaded back to Rapture as blobs in the location given.
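
For readability, the escaped “wrapperInfo” value above (dropping its leading % marker) unescapes to the following JSON:

{
  "appArg" : "blob://model.test/python/Test_Configuration_AM.py",
  "appPath" : "/usr/bin/python",
  "inputConfig" : {
  },
  "outputConfig" : {
    "dir_output_pkl" : "*blob://model.test/output/${runId}/pkl/$$n$$e"
  }
}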

And so when this step is executed by Execution Server it will perform the following tasks:

1. Set up an area in the local filesystem to store this temporary “run”.
2. Download the python script from Rapture to this area.
3. Download any input documents (there are none in this configuration)
4. Create a configuration file that defines where the input and output files are, including defining where “dir_output_pkl” actually is. The configuration file also contains information about how the process should connect back to Rapture.
5. Spawn the python process to kick off this script, passing the configuration file location as a parameter.
6. The python script runs. It can use the configuration file to determine where to place any output. In this case it will write a bunch of Python “pickle” files to the folder referenced by “dir_output_pkl”.
7. After the process finishes, Execution Server will look at every file written to dir_output_pkl and upload those files as blobs back to Rapture – into the location

blob://model.test/output/${runId}/pkl/$$n$$e

where $$n$$e is replaced by the name of the file (and its extension) and ${runId} is replaced with the unique runId of this workflow.
8. If all has worked correctly the step will return a successful result, which in this case means that the workflow engine will finish the workflow. (In a larger workflow it will usually move to the next step).

Summary

The Rapture Execution Server is a tool to help a workflow reach out to spawn a remote process to do work as part of a workflow. It provides a control framework around this process and runs it in a consistent way. It is one technique that can be used to integrate applications into the Rapture platform environment.


Pipeline and Workflow

This blog post covers two related areas in Rapture – Pipelines and Workflows. A pipeline in Rapture is an abstraction around a message queue to coordinate activity both within a Rapture platform and to external systems connected to Rapture through a queue. A pipeline uses common concepts such as an “exchange” and a “queue” to manage the various topologies needed. A workflow is a set of steps, which can be in the form of a decision tree, that can be executed in a distributed Rapture environment. Workflows use pipelines for coordination. Each step can be either a custom piece of Java code or a Reflex script, and context is collected and passed to each step as a workflow executes.

Pipeline

A pipeline in Rapture isolates the underlying implementation of a messaging system from the API used to interact with it.

Domain

At the top level of this isolation is the concept of an exchange “domain” – a named definition of an implementation of a messaging system. Rapture on startup has (in its configuration) the concept of a “standard” domain – one that Rapture uses internally for its messaging – but developers can use the pipeline API to create others.

The format of a domain is shown in the sample call to set up an exchange domain:

#pipeline.registerExchangeDomain("test", "EXCHANGE {} USING RABBITMQ {}");

The configuration has the keyword “EXCHANGE” followed by an implementation – in this case we are using RabbitMQ as the underlying messaging system.

Exchange

Within a domain there is a concept of an exchange – a common feature of message implementations, being a switching point for routing inbound messages on a queue to one or a set of outbound queues. The call to define an exchange takes a more complex data structure that defines the name of the exchange and its behavior. The behavior could be one of “DIRECT” (a direct connection between an inbound queue and a given outbound queue) or “FANOUT” (a message on an inbound queue is sent to all outbound queues). An example piece of code setting up an exchange is shown below:

       String domainName = "test";
       String exchangeName = "kernel";
       RaptureExchange exchange = new RaptureExchange();
       exchange.setName(exchangeName);
       exchange.setDomain(domainName);
       exchange.setExchangeType(RaptureExchangeType.FANOUT);

       pipelineApi.registerPipelineExchange(exchangeName, exchange);

Tasks and categories

For internal message coordination on an exchange, Rapture has the concept of a “category” – a server registers itself as a listener on a category by binding to an exchange with that named category. In this way servers providing a common set of tasks can all share the same category and when tasks are published on that category each server (in the case of FANOUT) or one server (in the case of DIRECT) will receive the task to execute. This category concept is reused in the Workflow implementation described later.

A task that is published to a category contains both a mime type and content, and a specific set of mime types have very specific meanings within Rapture. The code below shows an example of such a task publication:

        MimeReflexScript reflexScript = new MimeReflexScript();
        reflexScript.setReflexScript("println('Hello from the Reflex Script');");

        RapturePipelineTask task2 = new RapturePipelineTask();
        task2.addMimeObject(reflexScript);
        task2.setPriority(1);

        List<String> categories = new ArrayList<>();
        categories.add("blue");
        task2.setCategoryList(categories);
        task2.setContentType("application/vnd.rapture.reflex.script");

        pipelineApi.publishPipelineMessage(exchangeName, task2);

In this example the mime task defines an actual embedded Reflex script that should be executed on a server that is a member of the category provided.

A developer can use their own tasks as long as the receiving server can understand the content. Rapture has a number of pre-built tasks, such as running a Reflex script, sending a message on a queue or starting a workflow. These are simply formats for the message content, plus a part of Rapture that listens for these messages and acts upon them.

Workflows

Workflows in Rapture are constructs that define a set of tasks that need to be performed in some order. The workflow can branch and rejoin – the choice of branch to take at any given point can be determined by the return “state” of a task of the workflow.

Each task in a workflow can be one of two things – a Reflex script that can simply be run and whose return value determines what to do next in the workflow or an explicit piece of java code that is invoked to execute the task.

Switching from one task to another in a workflow is handled by Rapture sending a specific mime message on an internal pipeline. Any Rapture application that can take part in a workflow could potentially receive the message and execute the task, with Rapture handling the activity involved in choosing the next step to execute. Furthermore each task can be associated with a category and only Rapture servers that are members of that category are eligible to receive the message to execute the task. In this way custom Rapture servers containing custom Java code can be placed into a category with the guarantee that they are able to execute those tasks.

Workflows can contain state that is passed to (and can be updated by) each task in the workflow. It’s a convenient way for context to be passed along the workflow as it runs. Finally, workflows can be initiated using a specific API for that purpose or attached to an event (such as writing a document to a repository).
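
As an illustrative sketch (the call name createWorkOrder and its signature are assumptions here, not confirmed by this post), starting a workflow from Reflex with some initial context might look like:

params = {};
params['orderUri'] = "doc://idp.order/ABC/ONE/ORD1234455";
workOrderUri = #decision.createWorkOrder("workflow://idpResearch/workflows/TestModel", params);

The returned work order URI could then be used with the workflow management APIs mentioned at the end of this post to track status.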

As a final example, consider the following workflow that performs some checking on a trading order:

[Diagram: an order-checking workflow]

In this example via some means we have a trading order document saved in Rapture. Rapture automatically fires a data update event and in this case we have attached a workflow to that data update. The original document id and content of the document saved forms part of the context of this running workflow.

In our example the tasks in green (the early tasks) are simple Reflex scripts that are analyzing the content of the order, perhaps looking up other documents (limits by trader by asset class) and making a decision based on that information. Other scripts decide whether the order can be automatically sent to Bloomberg. The last step in the workflow (colored differently) is a specific piece of custom Java code (a service if you like) that formats the order and sends it to Bloomberg.

Workflows in Rapture can be simple and quick (running very often and in milliseconds) or more complex and slow (an overnight batch style job running for hours). Rapture installations have used them to perform complex data capture from remote sources, to run quantitative research processes, and to make discrete position and attribution updates in real time.

Finally for a complex system with many workflows running simultaneously there are APIs in Rapture to manage the activity and status of workflows and these are often pulled together into UIs for operational purposes. The screenshot below gives an example of one of the common displays in use within Incapture.

[Screenshot: workflow monitoring display in use within Incapture]


Rapture API

In earlier posts in this Rapture Series we have shown the APIs for Document, Series, Blob and Sheet Repositories and how those APIs can be called from the Reflex scripting language. This post is intended to give some more “insider” information into how the Rapture APIs are structured and produced, and how Rapture developers can use the same technique to create their own APIs that have the same non-functional features.

API Positioning

The core Rapture kernel is in some ways simply the implementation of a set of APIs. Server side applications that embed the Rapture kernel (such as the Schedule Server or Exec Server) use these APIs in an active way to provide higher level functionality as part of the Rapture “platform”, but the main coordination of work is performed through the implementation of the APIs. The fundamental idea is that the API forms one of the lowest levels of the Rapture stack (the implementation of repositories is the only lower part) and that applications layer themselves above this – there are no “undocumented” APIs used by applications, and any Rapture developer can reproduce some of the higher level Rapture applications (such as the Dev Ops web front end) entirely by making calls of the Rapture API. This layering is shown in the diagram below:

[Diagram: API layers]

In this way a user of the API uses an API transport to communicate with a Rapture Kernel environment. In some cases – when Rapture is embedded in an application – the transport is completely internal and it is in effect a “null” transport. In other cases the transport is either an internal HTTP based RPC-style transport or something like Thrift.

API code generation

Internally in Rapture each API is defined in a file that conforms to a Rapture interface definition format. A part of the document repository api is reproduced below:


[The Doc api is used to manipulate documents and their repositories.]

api(Doc) {

    [This api call can be used to determine whether a given type exists in a given authority.]
    @entitle=/repo/list
    @public Boolean doesDocumentExist(String documentURI);

    [Retrieve the meta data associated with a document - the meta data includes version and user information.]
    @entitle=/data/get/$f(docURI)
    @public DocumentWithMeta getMetaContent(String docURI);

    [Retrieve just the meta data for a given document.]
    @entitle=/data/list/$f(docURI)
    @public DocumentMetadata getMetaData(String docURI);

    [Revert this document back to the previous version, by taking the previous version and making a new version]
    @entitle=/data/write/$f(docURI)
    @public DocumentWithMeta revertDocument(String docURI);

    [Retrieve the content for a document.]
    @entitle=/data/read/$f(docURI)
    @public String getContent(String docURI);

    [Store a document in the Rapture system. The new URI name is returned; this allows parts of the new name to be automatically generated]
    @entitle=/data/write/$f(docURI)
    @public String putContent(String docURI, String content);

    [Attempt to put the content into the repository, but fail if the repository supports versioning and the current version of the
    document stored does not match the version passed. A version of zero implies that the document should not exist. The idea of
    this call is that a client can call getMetaContentP to retrieve an existing document, modify it, and save the content back,
    using the version number in the meta data of the document. If another client has modified the data since it was loaded this
    call will return false, indicating that the save was not possible.]
    @entitle=/data/write/$f(docURI)
    @public Boolean putContentWithVersion(String docURI, String content, int currentVersion);

    [Remove a document from the system. Note that the actual implementation is dependent on the repository;
    the document may simply be tagged as deleted rather than permanently removed.]
    @entitle=/data/write/$f(docURI)
    @public Boolean deleteContent(String docURI);

    [Return a list of full display names of the paths below this one. Ideally optimized depending on the repo.]
    @entitle=/data/read/$f(docURI)
    @public List(RaptureFolderInfo) getChildren(String docURI);
}

There are some key parts to the definition of an API that are important to call attention to. The API definition includes some documentation – this will be reflected in comments in any generated code or in any generated documentation. The entitlement section is important in that it defines the “entitlement path” that a user must possess to invoke this API call. In some cases the entitlement definition includes a reference to a parameter – the docURI in these cases. So the actual entitlement path needed by a user can be varied based on what parameter (document) they are interacting with. Entitlements are the subject of a future blog post but at this point it’s important to realize that entitlements are checked at the API call level and can be made very broad or very specific depending on the use required of the platform.

These api input files are then used as part of a general build process by a Rapture tool called “APIGen”. This tool parses the input files and generates a set of implementations of the API (transport and client side bindings) for a number of languages – Java, C#, Python, Ruby, Go and Javascript. The production is template driven and is straightforward to extend.

[Diagram: the APIGen process]

This same process generates the implementations of stubs used by the Rapture Kernel internally for the main forms of API transport.

The same process can be used by a developer to generate and embed “SDK APIs” – custom APIs defined by a developer but embedded within Rapture, with the same client side language support.

Rapture API areas

We’ve already seen some of the areas of the Rapture API in previous blog posts. Here we’ll simply list the areas with a brief description, with the details behind the APIs covered within this blog series.
Admin
The admin api manages Rapture from a fundamental level. Deleting and creating users, retrieving properties of the platform as a whole, managing the message of the day and so on.
Async
The async API is used as a means to “fire and forget” certain long running activities. If you choose to remember again the API gives you a handle to query the status of an asynchronous job.
Audit
The audit API is used to both write and read audit records.
Blob
The blob API is used to manipulate blobs and their repositories.
Bootstrap
The bootstrap API is used to define the initial document based repositories that store the configuration information for a Rapture environment (users, repositories, entitlements, sessions, etc.)
Decision
The decision API is used to define and manage workflows in Rapture.
Doc
The doc API is used to manipulate documents and their repositories.
Entitlements
The entitlements API is used to manage how users are members of entitlement groups and how entitlement groups map to entitlements in the system.
Environment
The environment API gives the caller the ability to find out about the environment in which Rapture is hosted – license information, server lists and status.
Event
The event API is used to drive an Event Driven Architectural approach. Callers can associate scripts, workflows or messages to named events which are then activated (fired) using a different call. Internally Rapture defines some system events (such as updating data) that can be hooked into.
Feature
The feature API is used to install features in Rapture. Features are collections of scripts, repository definitions and content, events and so on that can be considered to be a single well defined installable concept.
Fountain
A fountain is simply a means to autogenerate ids (such as “order ids”). The fountain api gives access to this ability.
Index
The index API is used to manage indices on document repositories.
Lock
The lock API is used to coordinate complex distributed tasks.
Notification
The notification API is used to separate the publishing of an internal notification on a channel with the receiver or subscriber of that information.
Pipeline
The pipeline API is used to manage message queues used both internally by Rapture and as a means for connecting external environments into and out of Rapture through messaging.
Relationship
The relationship API manages the directed graph repositories used to manage relationships and the ability to query and publish the relationship between entities in Rapture.
Runner
The runner API is used to manage the services and systems that form a Rapture environment.
Schedule
The schedule API manages schedules (similar to cron) that invoke services or scripts in a Rapture environment.
Script
The script API manages scripts (Reflex scripts) and their execution.
Series
The series API manages series repositories and their content.
Sheet
The sheet API manages sheet repositories and their content.
Sys
The sys API is used to manipulate raw data within Rapture’s configuration – bypassing the more specific APIs for that purpose.
User
The user API is used by users to manage their account and presence on Rapture.

Summary

The Rapture API is quite broad, with calls protected using entitlements. An automatic and repeatable process is used to convert API definitions into code that connects clients and the Rapture server, and this process can be used by developers for their own APIs.


Reflex Scripting

Many tasks run in or against a Rapture environment can involve some relatively simple data manipulation – but manipulation that could require some reasonably sophisticated logic. You can execute these tasks by writing code in a more traditional language and compiling and deploying that code to the Rapture platform as a fully fledged server application or as a step in a workflow. (Both of these techniques will be discussed in a future blog post.) Rapture also has a scripting language – called Reflex – which can be used as another tool to achieve the same ends. This post gives a brief overview of Reflex and how it is used.

Reflex was intended to be a simple language that was very specific to the task in hand – manipulating data in Rapture. As such it tends to be more procedural than “object oriented” and has a very loose type system, using coercion wherever possible to “do the right thing”. Some more modern constructs have been created when they help with the specific task – for example there are some functional implementations such as map/filter/fold as they are good techniques for manipulating data.

A simple example

In this example we imagine we have a document repository “test” with a document in it called “//test/input/one”. The contents of this document are reproduced below:

{
   "value" : 21,
   "name" : "Test",
   "inScope" : true,
   "description" : "A piece of test data"
}

In our example we want to write a script that will take this document, multiply the value field by 2 and then write out the changed document to another location. We will only do this for documents that have a true value in the field “inScope”.

In our first use of Reflex we will use native Rapture API calls:


def workWith(inputUri)
   content = #doc.getContent(inputUri);
   contentAsMap = fromjson(content);
   if contentAsMap['inScope'] do
      newUri = replace(inputUri, 'input','output');
      contentAsMap['value'] = contentAsMap['value'] * 2;
      println("Would write ${contentAsMap} to ${newUri}");
      #doc.putContent(newUri, json(contentAsMap));
   end
end

workWith('//test/input/one');

Going through this example line by line, we see that we first define a function “workWith” that we call in line 12 with our input URI. This gives us the ability to call workWith with many different URIs if we see fit. Within the function we first (line 2) retrieve the content from Rapture given the document’s URI. The content returned by the getContent call is a simple text string, which we convert to a Reflex “map” with the built-in “fromjson” call. Once we have a map we can use simple Reflex indexing to retrieve values. In line 4 we test whether “inScope” is set. If it is, we compute a new place to save our new document (by simply replacing the text “input” with “output”) and multiply the value field by 2.

In line 7 we print out what we are about to do – using Reflex’s substitution ability for strings (so that the values of contentAsMap and newUri are printed out) – and then finally in line 8 we put the content back.

The output from running this Reflex script would be:

Would write {value=42, name=Test, inScope=true, description=A piece of test data} to //test/output/one

In fact Reflex has a simpler technique for retrieving and saving content when it is formatted as json. The operators <-- and --> can be used as replacements for the getContent/fromjson and json/putContent combinations. So the more idiomatic Reflex script for performing this task would be:

def workWith(inputUri)
   content <-- inputUri;
   if content['inScope'] do
      newUri = replace(inputUri, 'input','output');
      content['value'] = content['value'] * 2;
      println("Would write ${content} to ${newUri}");
      content --> newUri;
   end
end

workWith('//test/input/one');

Reflex Containers

The above Reflex script was actually run using a command line tool “ReflexRunner” which can be invoked from a desktop. We could have also run the script on a Rapture web server or via an API call (either with the script stored in Rapture or passed as a parameter). Reflex runs inside the concept of a container and this section will describe what we mean by that.

When a Reflex environment is used to execute a script, the script runs within a container that has a series of hooks to the underlying implementation of the Reflex functionality. For example, in the script above we were loading and saving data from Rapture – one of the hooks we have to wire up in a container is the mapping between making an API call in Reflex and how that API call is implemented. In a desktop-based container the “wiring” uses the Rapture client-side API to connect to a remote Rapture environment. When running on a Rapture instance, the API is wired directly to the kernel.

The diagram below shows the hooks that form part of this container architecture for Reflex:

[Diagram: the Reflex container and its handler hooks]

In this diagram we have a script running in a container with links to the wired interfaces. The interfaces in sky blue are implemented in virtually every container.

Each of the hooks is described briefly below with some examples of how they are wired up depending on the container location:

API Handler

The API handler is the link to the connected Rapture API (the use of #doc.putContent in the example above). On a desktop this will use the client Rapture API; within Rapture itself it is bound to the Rapture kernel.

Data Handler

This handler is used to implement the push and pull operators (--> and <--). The implementation usually just delegates to the API handler, though this hook allows a more efficient implementation to be used if one is available.

Debug Handler

This handler (when implemented) provides the ability to pause and inspect the state of a Reflex script. It is usually only implemented when user interaction is available, but because the debug hook is called for each statement (and partial statement) executed, a variant of this handler is also available that records performance statistics for the execution of a Reflex script.

IO Handler

Certain Reflex built-in functions involve manipulating “files” and “folders”. Binding these concepts to an environment is the responsibility of this handler. On the server this is usually a null implementation (you cannot access files on a Rapture server). For desktop execution it is bound to the underlying operating system’s file structure.

Script Handler

Reflex scripts can “include” or “import” other Reflex scripts. This handler is used to work out how to retrieve those referenced scripts and often delegates to the API handler.

Output Handler

Built-in Reflex functions such as “println” have the default behavior of printing their arguments to the stdout stream. On a server this may not be the best approach – particularly if you want a calling application to see the output. The output handler implementation determines the best way of handling printed information.

Input Handler

The input handler is the opposite of the output handler – it determines how the built-in Reflex functions that are responsible for getting data from a “user” are implemented.

Cache Handler

The cache functions of Reflex can help performance: data can be stored in a cache for later retrieval, with the caveat that a cache entry may not exist if a follow-on script is run on a completely different (and disconnected) Rapture server or desktop. Usually the cache handler is implemented as a simple time- and space-limited mapping store.

Specific Reflex Containers

Within the Rapture platform there are a number of pre-built Reflex containers, which a developer can extend by implementing or extending the handlers described above.

Desktop execution

There are two desktop Reflex containers – the “ReflexRunner” application, which takes as parameters a file containing a Reflex script and the connection parameters for a Rapture environment, and a widget in the Rapture “Vienna” environment (a thick-client graphical UI that will be the subject of a future post).

Server execution

On a server there are a number of different containers, depending on need.

The first is the container used when the API call #script.runScript is executed. The API is bound to the kernel; there is no debugging, input, or file IO, and any output is captured and stored in the system log of the server.

Next is a variant of that container, used when #script.runScriptExtended is executed. runScriptExtended returns the output from the script after execution, so the output handler is bound to a different implementation.
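
As a rough sketch, invoking these from another Reflex script might look like the following – only the call names come from the text above; the parameter shape (a script URI plus a map of arguments) and the return value are assumptions:

#script.runScript("//myscripts/housekeeping", {});   // output is captured in the server's system log
out = #script.runScriptExtended("//myscripts/housekeeping", {});   // output comes back to the caller (assumed)
println(out);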

Finally there is the container used when executing a script as part of a workflow (workflows are the subject of a future post). A workflow has the concept of a workflow “context”, and the script has native access to this context as variables pre-injected into the Reflex container. The return value of the script is also captured and used to drive the subsequent execution or direction of the workflow.

Language Overview

We’ve seen how a simple script is created, and we’ve also seen the types of container a Reflex script can run in. We’ll now take a quick look at some of the features of the Reflex language. This post cannot describe every feature, but a Reflex manual containing that information can be requested from Incapture.

General structure

Reflex has variables capable of storing a number of different types. The type of a variable is inferred from its content. Reflex understands booleans, strings, numbers, lists, maps, sparse matrices, “files” (or streams) and native Java objects. So all of the following are valid syntax in Reflex:

x = true; // boolean
y = 1;  // number
z = 'a string'; // string
a = "another string"; //string
m = {};  // map
m['value'] = 4;
l = [ 1, 2, 3, 4 ];  // a list
l2 = [ x, y, z, a, m, l]; // a list containing different types

Reflex has simple operators on these values. The “index” operator is shown above for maps; the same structure with a number can be used to access members of a list. The other operators are the usual standard set with the usual precedence rules:

notx = !x;
twoY = y * 2;
letterA = z[0];
theFirstElement = l[0];
y /= 2;
y *= 2;
y = y + 10;
newY = (y * y) + (twoY * twoY);

Reflex has simple logic and loop constructs for controlling flow – the usual “if”, “while”, and “for”. Reflex can enclose blocks with either “do”/“end” pairs or “{”/“}”:

for x = 0 to 10 do
    if x % 2 == 0 do
        println("{x} is even");
    end
end
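
As a small sketch of the alternative block style, here is a “while” loop written with braces (we are assuming “while” takes the same condition-then-block shape as “if” and “for”):

x = 0;
while x < 5 {
    x = x + 1;
}
println("Finished with x = ${x}");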

Procedures

Procedures can be defined using the syntax shown in the example at the start of this post. The name of a procedure can be passed as a first-class object to some of the more functional built-ins.

def twoTimes(x)
   return x * 2;
end

println(twoTimes(2));
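
Building on that, a procedure name could be handed to one of the functional built-ins mentioned earlier (map/filter/fold); the exact signature of “map” is an assumption here:

nums = [1, 2, 3, 4];
doubled = map(nums, twoTimes);   // assumed signature: map(list, procedure)
println(doubled);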

Pull and Push

Reflex can retrieve and place data into a Rapture environment using a simple set of operators:

data = {};
data['one'] = 1;
data --> '//myrepo/data/1';
dataBack <-- '//myrepo/data/1';
assert(dataBack['one'] == 1);

Files and IO

When supported by the container, Reflex can access the underlying IO/file system of the environment:

x = file('myfile.csv');
for y in x do
   println("Line ${y}");
end

The Rapture API

The Rapture API is exposed to Reflex by using the # symbol followed by the API area and then the function within that API. Parameters to the API are coerced to the types needed by the implementation of the API. If the return value is a complex object, it can either be inspected directly (using a dotted method notation) or converted to a map using the “json” built-in.

x = #doc.getContent("//myrepo/data/1");
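
For a call that returns a complex object – such as getAllCells from the sheet API, covered later in this document – one hedged way to obtain a map is to compose the two built-ins we have already seen (serialize with “json”, then parse with “fromjson”):

status = #sheet.getAllCells("//datacapture/input/strategies/COMLS.sheet", 0, 0);
m = fromjson(json(status));   // object -> JSON string -> Reflex map
println(keys(m));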

Built in functions

Reflex has many built-in functions, all fully defined in the reference manual. In this post I’ll highlight a few to give a sense of their type and range:

date
The date built-in returns a Reflex “date” object. If passed zero parameters the date will be “now”; otherwise the date will be the result of parsing the argument passed. Dates can be manipulated using simple arithmetic and can also respect calendars such as a business week.
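
For example (the date-plus-days arithmetic below is an assumption based on the description above):

today = date();     // zero parameters: "now"
later = today + 7;  // assumed: simple arithmetic moves the date forward
println("Today is ${today}; a week on is ${later}");
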
keys
When working with maps it’s often useful to enumerate over the keys of the map, or at least check to see if a given value is in the keys. This built-in returns the keys of a map.
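
For example, using only constructs shown earlier in this post:

m = {};
m['a'] = 1;
m['b'] = 2;
for k in keys(m) do
   v = m[k];
   println("Key ${k} maps to ${v}");
end
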
size
The size of a variable depends on its type. For a list it’s the number of elements in the list. For a string it’s the length. For a map it’s the number of keys.
unique
Unique returns the elements of a list that are unique – removing duplicates.
cast
The cast built-in can be used to coerce the type of a variable manually. E.g.

x = 1;
y = cast(x, 'string');
z = cast(y, 'number');
// y will be '1', z will be 1

split
The split command takes a string and splits on a certain character (e.g. a comma). An additional parameter helps with splitting content that could contain quoted strings.
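
For example (the expected count reflects our reading of the quoted-string option):

line = 'one,two,"three,four"';
parts = split(line, ',', true);   // true: keep quoted sections together as one field
println(size(parts));             // we would expect 3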

A working example

We’ve seen a simple Reflex script, some ideas about the logic, and some examples of built-in functions. Finally we’ll look at a real-world Reflex script that is used in a working system to parse a file passed by a remote system into more structured documents in Rapture. Some of the code has been modified to protect confidentiality.

// Create positions and asset information and pricing from a blob
//

require '//common/util' as common;

const DATE = common.getToday();
//const BLOB = "blob://idp.external.report/client/${DATE}/eod";
const BLOB = '/Users/amkimian/Mock_SPOS_INCAPTURE_D_20141028.csv';

posDate = DATE;
fund = 'ALPHACAPTURE';

def normalizeSymbol(val) 
  if val == null do
     return val;
  end
  return urlencode(val);
end

def maybeNumber(val)
  try do 
     return cast(val, 'number');
  end
  catch e do
  end
  return val;
end

f = file(BLOB);
first = true;
for line in f do
  vals = split(line, ',', true);
  if first do
    // The first line of the CSV holds the column names; keep them as the document keys
    ks = vals;
    first = false;
  else do
    // Build a document from this row, keyed by the header values
    pos = 0;
    doc = {};
    for x in vals do
      if size(x) > 0 do
        doc[ks[pos]] = maybeNumber(x);
      end
      pos = pos+1;
    end
    strategy = doc['Standard Strategy'];
    substrat = doc['Strategy'];
    asset = normalizeSymbol(doc['Symbol']);
    // Compose the target URI and enrich the document with contextual fields
    path = "//idp.eod.client.pos/${posDate}/${fund}/${strategy}/${substrat}/${asset}";
    added = {};
    added['fund'] = fund;
    added['date'] = posDate;
    added['strategy'] = strategy;
    added['substrat'] = substrat;
    added['asset'] = asset;
    doc['enrich'] = added;
    println(path);
    doc --> path;
  end
end

In this real world example we’re taking a file (actually stored as a blob in Rapture) which is structured as a CSV file. Each row of the file is converted into a document that is stored in Rapture – with the first line used to define the keys of each document.

Most of this example should be pretty readable with a number of notable additions we haven’t covered in this post:

The “require” directive loads another script from Rapture and puts it in the namespace “common”. This is then used to call the “getToday” function in that common code (it returns the current business date).
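
As a purely hypothetical sketch, the //common/util script might expose getToday along these lines (the real implementation would apply business-calendar logic rather than simply returning the current date):

// //common/util (hypothetical)
def getToday()
   return date();
end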

The “normalizeSymbol” procedure calls the built-in “urlencode”. This helper function ensures that the string used for the symbol contains only characters that are valid for a URI in Rapture.

The “maybeNumber” procedure uses exceptions as a lazy way of determining whether a field in the CSV file is a number or not. It tries to “cast” the value to a number; if that works, the number is returned, otherwise the original string is. Reflex has an exception construct (try/catch/finally).
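
Given that definition, the behavior of maybeNumber on sample values would be:

println(maybeNumber('42'));    // the cast succeeds, so the number 42 is returned
println(maybeNumber('ABC'));   // the cast throws, so the original string comes back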

Summary

The Reflex scripting language in Rapture is simple but has reasonable depth for advanced users. This blog post has scratched the surface with some examples of use and some context around how Reflex is used and deployed. Within Rapture, Reflex scripts can be stored and executed using the Rapture API (running in a container on the server) or executed locally with a remote connection to a Rapture environment. Reflex scripts tend to be used for small tasks in workflows, for housekeeping tasks, or as the back end of a web-server-based application. For more detailed information about Reflex, feel free to contact us.


Rapture Sheet Repositories

This is part of the Rapture series of posts, providing a general overview of the features of Rapture. In this article we describe the sheet repository type in Rapture, which is used to store spreadsheet-style data indexed by a unique text key (a URI).

What is a repository?
In Rapture a repository is a place to store information. A repository has a name (unique within a given Rapture instance) and an underlying implementation. The idea is that application developers interact with a repository using a consistent API, and Rapture takes care of the details of managing the information in the underlying implementation. The implementation in this case refers to a database system, and the systems currently supported cover a wide range of technologies, from traditional relational databases to the newer “distributed key-value stores”. Examples of the technologies currently supported appear later in this post.

A quick example
Before diving into the details it is worth giving a preview to help set the stage. Although Rapture is a platform it does have an operations web interface that can be used (amongst other things) to browse the data stored in the environment.

In the screen capture below you see a typical view of some of the sheet repositories in a Rapture environment.

[Screenshot: sheet repositories shown in the Rapture operations web interface]

A Rapture environment can have many sheet repositories and the data is usually divided by purpose.

A sheet repository holds “sheets”, and in the web UI, if we select one of the sheets, we can view its contents:

[Screenshot: the contents of a single sheet]

In this case the sheet is used as a structured configuration document to help with the loading of data from Bloomberg.

Using the API
The Rapture API is the consistent way of interacting with the platform. The API is split into different sections, and the section for sheet repositories is called “sheet”. Using the API in a general sense is the subject of a later post, but it is worth making some general observations about how we designed the API for use.

The Rapture API is available in a number of different places. For client applications (programs that are not hosted within Rapture) the client API is used. This API is connected to a Rapture environment through the login section of the API, and once connected the application can use the other API sections to interact with the system. The client API is available for Java, Javascript (for browsers), .NET, Python, Ruby, Go and VBA. Although the syntax varies slightly, the meaning and use of each API call is consistent.

As an example, the code in Java to retrieve the data in the sheet in the screen above would be something like:

// Log in to the Rapture environment using the client API
HttpLoginApi loginApi = new HttpLoginApi("http://rapture", new SimpleCredentialsProvider(user, password));
loginApi.login();
ScriptClient sc = new ScriptClient(loginApi);
// Retrieve every cell from the sheet (dimension 0, epoch 0 = all data)
RaptureSheetStatus content = sc.getSheet().getAllCells("//datacapture/input/strategies/COMLS.sheet", 0, 0);

Repository Implementation
Before putting data into a repository, the repository needs to be created (registered) within the environment. There is of course an API call to do this – it takes the name of the repository and its configuration:

#sheet.createSheetRepo("//test.sheet", [a config string]);

The configuration string defines two things – the underlying implementation technology and the general feature set supported (whether the repository is versioned for instance).

The format of the configuration string is as follows:

SHEET { [configuration] } USING [implementation] { [configuration] }

Describing all of the options available is beyond the scope of this post but some examples will help clarify the syntax:

SHEET {} USING MONGODB {}     // A sheet repository using MONGODB
SHEET {} USING MONGODB { prefix="testdoc" }  // A sheet repository using MONGODB on a specific collection
SHEET {} USING CASSANDRA { prefix="test" } // A sheet repository using Cassandra as a backing store

Sheet structure
A sheet in Rapture is mainly used to store row/column-style data. The API is specifically tuned for “poking” data into a cell given a row and column coordinate. A sheet also has a third “dimension”, which is used throughout the API: dimension zero holds the actual content, while dimension one holds formatting information that can be used when displaying the data. Formatting is typically used when a sheet is to be displayed in a much richer display environment or when exporting a sheet to a report such as a PDF.
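
As a small Reflex sketch of the dimension parameter (the sheet URI and values are hypothetical; the calls and their signatures appear in the API listing later in this post):

#sheet.setSheetCell("//test.sheet/demo", 0, 0, "Price", 0);   // dimension 0: the content itself
#sheet.setSheetCell("//test.sheet/demo", 0, 0, "bold", 1);    // dimension 1: formatting (value illustrative)
value = #sheet.getSheetCell("//test.sheet/demo", 0, 0, 0);    // read the content back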

For efficient UI representation a sheet also maintains the concept of a “current sheet version” or “epoch”. Each change to a sheet (usually adding, modifying or removing cells) increments the epoch number, and each modification in a sheet is tied to its epoch number. Whenever a caller retrieves data from a sheet they also receive the latest epoch number, and they can pass that number in to subsequent calls. In this way a “long polling” technique can be used to retrieve data from a sheet: in the first call the UI retrieves all of the data and the latest epoch number; in subsequent calls the request is for “all changes since the epoch I’ve already received”. In most cases this will be a small set of changed cells, and an efficient UI can simply update those elements, with optional transient highlighting or flashing of the change.
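
In Reflex-flavored pseudocode the long-polling pattern might look like this (the “epoch” field name on the returned status object is an assumption):

status = #sheet.getAllCells(sheetUri, 0, 0);             // epoch 0: fetch everything plus the latest epoch
// ... later, ask only for what changed since then ...
delta = #sheet.getAllCells(sheetUri, 0, status.epoch);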

Finally, the sheet API also allows the developer to attach Reflex scripts to a sheet. When executing a script attached to a sheet, the “current sheet” is automatically injected into the script environment as the variable “s”, so that manipulating data on the sheet is syntactically easier. This is currently a little-used feature – it appears more often in demonstrations than in production code.

A tour of the API
The Sheet API section of Rapture is used to interact with sheet repositories. We’ve already seen a getAllCells call and a create-repository call. The API is rounded out with calls to (a) manage sheet repositories (create, destroy, modify), (b) manage sheets in a repository (create, destroy, update formatting and scripts) and (c) manage the sheet data (put cells and retrieve cells). Each API call is controlled by entitlements, so an administrator can configure who can perform each of these tasks.

Although a blog post isn’t the place to describe these calls in detail, it is worth listing them so that the breadth of coverage in the sheet API can be appreciated. The goal in Rapture is to have a very open API that can be used to manipulate all aspects of the system, and this is reflected in the number of calls available. Entitlements are used to guard against unintended consequences or unapproved calls. A typical user-level application would use only a small subset of these calls.

   // Create a sheet repository with the given configuration
   Boolean createSheetRepo(String sheetURI, String config);
   // Get sheet repository config metadata
   SheetRepoConfig getSheetRepoConfig(String sheetURI);
   // Get repository config metadata for all sheet repositories
   List(SheetRepoConfig) getAllSheetRepoConfigs();
   // Get the hierarchy of name of sheets and sheet repositories
   List(RaptureFolderInfo) getChildren(String sheetURI);
   // This method removes a Sheet Repository and its data from the Rapture system. There is no undo.
   Boolean deleteSheetRepo(String repoURI);
   // This api call can be used to determine whether a given type exists in a given authority.
   Boolean doesSheetRepoExist(String repoURI);

For managing sheets within a repository:

   // Create a sheet, initially empty. If the sheet exists it is unaffected.
   RaptureSheet createSheet(String sheetURI);
   // Remove a sheet
   RaptureSheet deleteSheet(String sheetURI);
   // Does a sheet already exist
   Boolean doesSheetExist(String sheetURI);
   // Render this sheet to this blob (as a PDF)
   Boolean renderSheet(String sheetURI, String blobURI);
   // Get all of the formatting styles in this sheet
   List(RaptureSheetStyle) getAllStyles(String sheetURI);
   // Remove a named style
   Boolean removeStyle(String sheetURI, String styleName);
   // Create a named style
   RaptureSheetStyle createStyle(String sheetURI, String styleName, RaptureSheetStyle style);
   // Get a list of all of the scripts on the sheet
   List(RaptureSheetScript) getAllScripts(String sheetURI);
   // Remove a script
   Boolean removeScript(String sheetURI, String scriptName);
   // Create a script
   RaptureSheetScript createScript(String sheetURI, String scriptName, RaptureSheetScript script);
   // Run a script on a sheet
   Boolean runScriptOnSheet(String sheetURI, String scriptName);
   // Get the script associated with a sheet
   RaptureSheetScript getSheetScript(String sheetURI, String scriptName);
   // Get all of the ranges in this sheet
   List(RaptureSheetRange) getAllRanges(String sheetURI);
   // Remove a range from a sheet
   Boolean removeRange(String sheetURI, String rangeName);
   // Create a range in this sheet
   RaptureSheetRange createRange(String sheetURI, String rangeName, RaptureSheetRange range);
   // Get all of the notes (comments) in this sheet
   List(RaptureSheetNote) getAllNotes(String sheetURI);
   // Remove a note from a sheet
   Boolean removeNote(String sheetURI, String noteId);
   // Create a note on this sheet
   RaptureSheetNote createNote(String sheetURI, RaptureSheetNote note);
   // Make a copy of a sheet (usually the source is a "template")
   Boolean cloneSheet(String sheetURI, String newSheetURI);

For managing data in a sheet:

   // Set the data for a single cell
   String setSheetCell(String sheetURI, int row, int column, String value, int dimension);
   // Set a group of cells all in one go
   Boolean setBulkSheetCell(String sheetURI, int startRow, int startColumn, List(List(String)) values, int dimension);
   // Set a rectangular block of data
   Boolean setBlock(String sheetURI, int startRow, int startColumn, List(String) values, int height, int width, int dimension);
   // Get the value of data at a given cell
   String getSheetCell(String sheetURI, int row, int column, int dimension);
   // Set data given the name of a cell instead of its coordinates
   String setNamedSheetCell(String sheetURI, String rangeName, String value, int dimension);
   // Retrieve a named data cell
   String getNamedSheetCell(String sheetURI, String rangeName, int dimension);
   // Get all cells that have changed since a modification version (set epoch to 0 to get all data)
   RaptureSheetStatus getAllCells(String sheetURI, int dimension, Long epoch);
   // Get a sheet in a non-sparse format
   RaptureSheetDisplayForm getSheetAsDisplay(String sheetURI);
   // Get a subset of the sheet
   List(RaptureSheetRow) getSheetRangeByName(String sheetURI, String rangeName);
   // Get a subset of the sheet by coordinates
   List(RaptureSheetRow) getSheetRangeByCoords(String sheetURI, int startRow, int startColumn, int endRow, int endColumn);
   // Delete a whole column
   Boolean deleteColumn(String sheetURI, int column);
   // Delete a whole row
   Boolean deleteRow(String sheetURI, int row);
   // Delete a cell
   Boolean deleteCell(String sheetURI, int row, int column, int dimension);
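
Putting a few of these calls together, a minimal end-to-end Reflex sketch (repository name, implementation choice and cell values are all hypothetical):

#sheet.createSheetRepo("//test.sheet", "SHEET {} USING MONGODB {}");
#sheet.createSheet("//test.sheet/demo");
#sheet.setSheetCell("//test.sheet/demo", 0, 0, "Strategy", 0);   // header cell
#sheet.setSheetCell("//test.sheet/demo", 1, 0, "COMLS", 0);      // data cell
all = #sheet.getAllCells("//test.sheet/demo", 0, 0);             // read everything back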

Uses of Sheet Repositories
Sheet Repositories in Rapture are mainly used for configuration documents or as a staging point before rendering a report.

Behind the scenes Rapture for Financial Services uses sheet repositories to configure features such as data capture.

Summary
In summary, Rapture sheet repositories provide a separation of responsibility between the application developer (putting and getting content) and the underlying operational concerns (which database vendor to use, and how to configure and manage it). This separation allows changes to be made beneath an application without changing that application in any way. A sheet is also a good metaphor for some data sets, which makes it easier to build configuration and management screens for some applications.

In the next post we will talk about Reflex Scripting in much more detail.


