Incapture Technologies

Rapture Architecture

Incapture Technologies Blog

 


Rapture Platform Overview

Introduction

The server-based Rapture platform is designed for creating and handling distributed applications in a cloud environment.

Unlike other cloud systems, Rapture provides the power for handling very large sets of data and using them to provide actionable intelligence quickly. Rapture handles “back of the house” activity, so that clients can develop their own applications for enabling their own business decisions.

Rapture Environment

The diagram below illustrates the subsystems associated with an instance of Rapture.

(Diagram: subsystems of a Rapture instance)

The Data block in this figure refers to repositories that are shielded from applications that use Rapture. These applications access the data only through the Rapture API.

Subsystems

The most fundamental part of Rapture consists of events and the scripts that work with them. An event typically invokes a standard or user-defined script when it is triggered. As shown in the diagram below, events can be pipelined and stored for asynchronous handling, or they can be triggered immediately. At the other end of the event chain, a trigger can be tied to a schedule, as shown in Figure 1, or it can be an “event hook” associated with a specific activity within Rapture, such as when a particular block of data is saved.

On a higher level, Rapture has functionality to maintain an audit trail for all of its activities, along with a data set for user-based privileges, called entitlements. A specific entitlement is associated with a group of one or more users (the “entitlement group”), and each group is entitled to make calls to a predefined list of functions within the Rapture API.

Rapture also maintains notification channels to enable instantaneous messaging among users, among programs, or between users and programs; for example, to alert the user to a required update.

Finally, Rapture supports a Decision Process feature, which allows developers to create complex workflows for multi-step processes that might require authorization from other users to proceed.

Each of these subsystems is described in more detail later in this article.

Pre-built server processes

The Rapture platform uses a variety of pre-built server processes to put the Rapture kernel into an execution mode.

Of these server processes, the most commonly used is the Rapture API Server, which provides an HTTP transport layer to client applications outside Rapture so that they are able to access the Rapture API. In addition, the Rapture API Server provides a configuration context for binding Rapture to underlying data repositories and messaging systems.

Other important pre-built server processes include the Schedule Server, which coordinates the event schedule entries and the invoking of referenced tasks; the Compute Server, which is simply an execution container for Reflex scripts; and Rapture Runner, which can start or stop Rapture processes with identical configurations on a given server. In particular, Rapture Runner is used for invoking the Rapture API Server, the Schedule Server, and duplicate instances of Rapture Runner, in addition to other processes.

Data

One of the key features of the Rapture platform is its ability to manage data securely. All data in Rapture is classified as either managed or presented.

Managed data is accessible only through Rapture and can never be modified by client applications. This feature gives Rapture full control over how the data is stored, whether it is versioned, and what details to include in a change log.

Presented data can be accessed and modified outside Rapture, but it is usually read-only to Rapture applications. Using presented data is extremely helpful for data migration and allows for very easy setup within Rapture. Presented data can also be useful for secondary purposes, such as supplemental reports. However, the platform features associated with managed data, such as versioning and change logs, cannot be used with presented data.

Data classes

The Rapture platform defines four classes of data, and each of its data repositories can be associated with only one of these classes. This model helps to ensure a systematic way to reference all data through URIs, as discussed in the next section. The four data classes are: document, series, blob, and sheet.

A document is a piece of text that is usually formatted as a JSON structure. Because of JSON’s portability and ease of use, the document class is widely used in Rapture applications. The Rapture platform also uses documents to describe the internal configuration of its own core system.

A series is a keyed list of data points. In most applications, the key is a timestamp, and therefore the series represents how the data changes with time. The data points within a series can be simple types, such as numbers, or they can be data structures.

When a series is stored in a Rapture repository, its keys are aligned so that data can be extracted and analyzed more efficiently, as shown in Figure 2. In this example, each horizontal row represents a different series of prices, but they are all aligned vertically with a keyed timestamp. Other sets of series could be aligned with, for instance, a column name as the key.

(Figure 2: price series aligned by timestamp key)
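To make the alignment concrete, here is a small Python sketch (illustrative only, not how Rapture stores series internally) that lines up several series on the union of their timestamp keys, so each row can be extracted with one pass:

```python
# Two price series, each a mapping of timestamp key -> value.
gold = {"2014-04-08": 1308.0, "2014-04-09": 1312.5}
oil = {"2014-04-08": 102.1, "2014-04-09": 101.6}

def aligned_rows(*series):
    """Yield (key, value, value, ...) rows over the sorted union of keys.

    Missing points come back as None, so every row has the same shape
    and can be consumed like a row of Figure 2.
    """
    keys = sorted(set().union(*(s.keys() for s in series)))
    for k in keys:
        yield (k,) + tuple(s.get(k) for s in series)

for row in aligned_rows(gold, oil):
    print(row)
```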

The sheet data class is modeled after spreadsheets. All data within a sheet is stored in a cell structure, with each cell having a row and column coordinate. Cells can also contain data for formatting and running calculations on the contents, so that the sheet can be presented in a user interface without requiring any format translation.

A blob in Rapture is simply binary data without any other classifications. It is useful for storing items such as images or PDF data.

Data repositories

Each data repository in Rapture has a unique name, a single class, and a single implementation. The implementation defines the underlying technology that manages the data, such as Oracle or Cassandra. Client applications do not need the implementation metadata; instead, they rely on URI references, as explained below.

Rapture uses a strict system for naming URIs, in which the URI string always ends with authority/id. The authority identifies the repository, and the id is a pointer to a specific entity within the repository. Depending on the class of data, the ID can be a simple string, or it can contain sub-parts. For example, the ID of an item in a series repository might be prices/pork/04092014, where prices is the name of the authority and pork/04092014 is the ID for a specific item within one series.
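As an illustration of this naming convention, the following Python sketch (the helper is hypothetical, not part of any Rapture SDK) splits a URI of the form authority/id into its two parts:

```python
def split_rapture_uri(uri):
    """Split a Rapture-style URI into (authority, id).

    Hypothetical helper for illustration: the authority is the first
    path segment, and everything after it is the id, which may itself
    contain sub-parts (e.g. pork/04092014).
    """
    path = uri.lstrip("/")  # tolerate a leading "//" prefix
    authority, _, item_id = path.partition("/")
    if not authority or not item_id:
        raise ValueError("expected authority/id, got: %r" % uri)
    return authority, item_id

# The example from the text: authority "prices", id "pork/04092014".
print(split_rapture_uri("prices/pork/04092014"))
```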

Pipelining

When an activity takes place within Rapture, the platform uses a pipelining technique to determine which server is most appropriate for running the activity. Pipelining allows Rapture to dynamically scale the numbers and categories of servers as the load profile varies over time.

Pipeline task structure

Priority – The priority of the task (lower is more important)
Category – A list of categories for this task (used for routing)
Content Type – Determines how the message will be executed
Content – The message payload; depends on the Content Type above
Context – The user context under which the task will be executed
Submission Time – When the task was added to the pipeline
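The fields above can be sketched as a simple data structure. The Python model below is illustrative only, not the actual kernel class; it shows how lower priority numbers sort first when tasks are queued:

```python
import heapq
import time
from dataclasses import dataclass, field

@dataclass(order=True)
class PipelineTask:
    # Lower values are more important, so a min-heap pops them first.
    priority: int
    submission_time: float
    # Routing and payload fields do not participate in ordering.
    categories: list = field(default_factory=list, compare=False)
    content_type: str = field(default="reflex", compare=False)
    content: str = field(default="", compare=False)
    context: str = field(default="system", compare=False)

queue = []
heapq.heappush(queue, PipelineTask(5, time.time(), ["alpha"], content="script B"))
heapq.heappush(queue, PipelineTask(1, time.time(), ["alpha"], content="script A"))

# The task with the lower priority number is handled first.
print(heapq.heappop(queue).content)
```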

All Rapture platforms have at least one kernel pipeline for handling transport activity. However, there can easily be many different pipelines within the same platform, each of which can be configured to a different, specific messaging transport. The configuration of a pipeline is stored in a system document repository for access by the Rapture kernel. Physically, pipelines are hosted on a system designed to support message queues, such as RabbitMQ or MQSeries.

Events

The primary function of an event, once it’s triggered, is to begin the execution of one or more scripts, which are generally Reflex scripts.

Event hooks

Events can be explicitly scheduled, or they can be connected to actions that happen at various “hook points” in the system. In most cases, the script references are connected to the hook asynchronously. These scripts are simply queued into the kernel pipeline to be run as early as possible. The context of the event is also passed as a parameter to the script.

In addition to – or instead of – asynchronous scripts, an event can be associated with a single inline script reference. Inline scripts are run immediately, as soon as the event occurs. They function as the equivalent of an interrupt processing routine, so the system user must take care to keep the execution as brief and efficient as possible. It is also important to avoid further calls that would trigger other events while the script is running.

As a shortcut, an event can also directly invoke a workflow, instead of requiring a developer to write a one-line script that starts the workflow.
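The inline-versus-asynchronous distinction can be sketched as follows. This is an illustrative Python model of a hypothetical event registry, not Rapture's actual implementation: the inline script runs immediately when the event fires, while asynchronous scripts are merely queued to a pipeline for later execution.

```python
from collections import deque

# Hypothetical registry: each named event carries at most one inline
# script (run immediately, like an interrupt routine) and any number
# of asynchronous scripts (queued to the pipeline).
registry = {}
pipeline = deque()

def define_event(name, inline=None, async_scripts=()):
    registry[name] = {"inline": inline, "async": list(async_scripts)}

def fire_event(name, context):
    entry = registry[name]
    if entry["inline"]:
        entry["inline"](context)  # runs now; keep it brief
    for script in entry["async"]:
        pipeline.append((script, context))  # handled when a server is free

log = []
define_event("data/saved",
             inline=lambda ctx: log.append("inline:" + ctx),
             async_scripts=["recalc_report"])
fire_event("data/saved", "test.config/main")
print(log, list(pipeline))
```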


Event flows

The diagram below shows the steps that take place when an event is invoked.

(Diagram: event flow, from trigger through the event registry to script execution)

Note that event information and event hooks are assigned to an arbitrary set of named events, located in the event registry. The purpose of the registry is to retain two distinct sets of information about an event: what happens when it’s triggered, and what triggers it in the first place. This feature allows users to define custom events in addition to Rapture’s built-in system events. The Rapture API contains calls to trigger user-defined events only.

Scripting

The Rapture platform includes a unique procedural scripting language called Reflex, which hooks into the Java Virtual Machine that Rapture uses.

NOTE: Reflex was designed to streamline many of the common tasks in Rapture, but it is not required for event-based scripts.

Reflex overview

For comprehensive information about the Reflex language, refer to Incapture’s Reflex Language Reference.

Reflex is similar in structure to Python. It is a procedural language that supports author-defined functions, but it is not object-oriented. It is packaged with an emulator called ReflexRunner, so that scripts can be tested before being run on Rapture.

Data types in Reflex are mostly intuitive, including string, number, list (equivalent to arrays), boolean, and map (which is used for key-value pairs).

Operators unique to Reflex

Reflex supports all the standard boolean, arithmetic, index ([]) and ternary (?) operators. It also uses the standard keywords for flow control, including if, while, and for.

In addition, there are four special operators: push, pull, metapush, and metapull, whose symbols are -->, <--, -->>, and <<--, respectively.

Simple Reflex Examples

In this scenario, suppose you have a repository named test.config, and you are using it to store map data. You would define the structure and push it as follows:

config = {};
// Write the map data
config['Option1'] = true;
config['level'] = 42;

// Create a document
displayName = 'test.config/main';

// Write the map to the document
config --> displayName;

Next, in a different script, you would pull in the map data and use it to control the script’s behavior:

appConfig <-- displayName;
if (appConfig['Option1']) do
  println("level is " + appConfig['level']);
else do
  println("Option1 is not set.");
end

The second example uses println, which is a built-in function in Reflex, to handle simple formatted printing. If all is well, the output from the second snippet of code would be:

level is 42

Decision Processes

A decision process works with a Decision Packet, which contains any data that needs to be associated with a given process. Such data includes status and progress information within the decision process and also can include references to documents, blobs, and audit logs.

The decision process itself is a directed graph of states, with one entry point and at least one terminal point. Each state has an associated script that determines whether the decision packet is ready to transition to the next state and, if there are multiple possibilities, which state it should transition to. There is also a script attached to the entry point, which executes before the process begins.

As a simple example of a decision process, a trade might require an authorization above a certain amount from the head of the trading desk. In this case, the decision packet would contain the trade information, and there would need to be three scripts associated with the decision process. One of these would check the amount, one would be run after the trade is approved, and one would be run after the trade is denied. If the trade amount exceeds the threshold, the decision process is halted, and the application using the decision process would need to run some additional code to notify both parties that an approval decision is needed.
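The trade-approval example above can be sketched as a small state machine. This is plain Python for illustration; the state names and the threshold are invented, and in a real decision process each state would have an attached Reflex script rather than a Python function:

```python
# A decision process as a directed graph of states. Each state's
# "script" inspects the packet and returns the name of the next state,
# or None when a terminal point is reached.

def check_amount(packet):
    # Entry-point script: route on the (invented) approval threshold.
    return "approved" if packet["amount"] <= 1000 else "needs_approval"

def approved(packet):
    packet["status"] = "approved"
    return None  # terminal state

def needs_approval(packet):
    packet["status"] = "pending approval"
    return None  # terminal here; an approval would restart the process

STATES = {"entry": check_amount,
          "approved": approved,
          "needs_approval": needs_approval}

def run_process(packet):
    state = "entry"
    while state is not None:
        state = STATES[state](packet)
    return packet

print(run_process({"amount": 5000})["status"])
```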

For a more real-world example, consider a data capture scenario from an external source. The scripting steps in this process would be: (1) preparing the request, (2) connecting to the underlying system (for example, Bloomberg) and depositing the request, (3) waiting for request completion with a success or failure result, (4) downloading and processing the result if the request was successful, or (5) notifying the caller if the request failed.

The Rapture API contains calls for creating processes and packets, sending packets to processes, fetching and updating packet data, and registering approvals. This functionality makes scripting decision processes very straightforward.

Other subsystems

Schedule

The Schedule subsystem, as its name implies, instructs Rapture to run one or more scripts in a schedule entry at a specific time, sending them to the kernel pipeline.

Schedule entries are stored in a tree format to manage the execution order of the scripts over time. Figure 5 shows an example with references to five scripts. Script 1 and Script 2 are executed serially, and once Script 2 has completed, Script 3 and Script 4 are sent to the pipeline to be run in parallel. Script 5 is not executed until both Script 3 and Script 4 have completed successfully.

(Figure 5: a schedule entry tree)
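The ordering in Figure 5 can be sketched as a topological dispatch over script dependencies. This Python model is illustrative only; the script names mirror the figure, and each batch represents scripts that could be sent to the pipeline in parallel:

```python
# Dependencies from Figure 5: script2 follows script1; scripts 3 and 4
# both follow script2; script5 waits for both 3 and 4.
deps = {
    "script1": set(),
    "script2": {"script1"},
    "script3": {"script2"},
    "script4": {"script2"},
    "script5": {"script3", "script4"},
}

def dispatch_order(deps):
    done, order = set(), []
    while len(done) < len(deps):
        # Everything whose dependencies are all satisfied is ready now;
        # members of one batch could run in parallel on the pipeline.
        ready = sorted(s for s, d in deps.items()
                       if s not in done and d <= done)
        order.append(ready)
        done.update(ready)
    return order

print(dispatch_order(deps))
```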

Entitlement

Every API call has at least one entitlement path associated with it. The call will not be run unless the user belongs to a group that has the API’s entitlement path in its allow list. API calls can also have entitlement rules, which are wildcards that can generate multiple paths. Users are not assigned entitlement paths or rules directly; they can only be assigned to groups that have entitlements. When a user does not belong to a group that has access to an entitlement path, any attempt to make the corresponding API call will fail.

In practice, groups are often used for granting read-only access to some users and read-write access to others.
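A minimal sketch of this model, in Python and with invented group names and entitlement paths, looks like the following; the key point is that an entitlement check always goes through group membership, never directly through the user:

```python
# Invented example data: users belong to groups, and each group holds
# an allow list of entitlement paths.
group_members = {"readers": {"alice", "bob"}, "writers": {"bob"}}
group_allows = {"readers": {"doc/read"},
                "writers": {"doc/read", "doc/write"}}

def is_entitled(user, entitlement_path):
    """An API call proceeds only if some group containing the user
    has the call's entitlement path in its allow list."""
    return any(user in group_members.get(g, ())
               and entitlement_path in allows
               for g, allows in group_allows.items())

print(is_entitled("alice", "doc/write"), is_entitled("bob", "doc/write"))
```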

Audit

Audit functionality is available to Rapture system administrators. By default, there is a kernel audit trail whose storage and behavior the Rapture administrator can configure. This audit trail is normally wired into the API invocation flow.

In addition, administrators can create their own audit trails, using the same API calls used by the kernel audit trail for generating its own records. In this case, either Reflex scripts or client applications would be used to generate the custom audit records.

Notification

In addition to the user-based notification channels discussed in the overview, the Rapture system maintains its own notification channels to report system-wide changes to the configuration or the structure of the Rapture platform. This feature allows for obsolete cached data structures to be purged before they can be erroneously used.

Extensions

Clients have access to several extensions to the Rapture platform.

Reflex plug-ins

Various plug-ins allow Reflex scripts to be extended with local Java code or remote services. These plug-ins need to be encapsulated in .jar files in the class path of the application that will run them.

The entry point to a plugin is a class that implements the reflex.importer.Module interface and is placed into the reflex.Module package. There are two possible ways to embed a function in Reflex: it can either be done with a keyhole call or by using reflection. The keyhole approach has better execution times, but reflection is more frequently used because the coding is less complex. Both overrides must be included, with one set to false and the other to true, as shown below:

   @Override
    public boolean handlesKeyhole() {
        return false;
    }

    @Override
    public boolean canUseReflection() {
        return true;
    }

In addition, there are three functions that pass configuration information to Rapture and need their own overrides. These functions are configure, setReflexHandler, and setReflexDebugger.

Finally, if keyhole calls are used, the class must override keyholeCall. Otherwise, reflection lets Reflex invoke any function that returns a ReflexValue type and that takes a parameter list of ReflexValues.

As an example, the following plugin code implements a class called ReflexMath, which for brevity implements only a cosine function.

package reflex.module;

import java.util.List;

import reflex.IReflexHandler;
import reflex.ReflexException;
import reflex.debug.IReflexDebugger;
import reflex.importer.Module;
import reflex.value.ReflexValue;
import reflex.value.internal.ReflexVoidValue;

public class ReflexMath implements Module {

    private double getDP(List<ReflexValue> params) {
        if (params.size() == 1) {
            if (params.get(0).isNumber()) {
                return params.get(0).asDouble();
            }
        }
        throw new ReflexException(-1, "Cannot retrieve numeric first argument in math call");
    }

    private double getDP(List<ReflexValue> params, int position) {
        if (params.size() > position && params.get(position).isNumber()) {
            return params.get(position).asDouble();
        }
        throw new ReflexException(-1, "Cannot retrieve numeric argument in math call");
    }

    @Override
    public ReflexValue keyholeCall(String name, List<ReflexValue> parameters) {
        return new ReflexVoidValue();
    }

    @Override
    public boolean handlesKeyhole() {
        return false;
    }

    @Override
    public boolean canUseReflection() {
        return true;
    }

    @Override
    public void configure(List<ReflexValue> parameters) {
    }

    @Override
    public void setReflexHandler(IReflexHandler handler) {
    }

    @Override
    public void setReflexDebugger(IReflexDebugger debugger) {

    }

    public ReflexValue cos(List<ReflexValue> params) {
        return new ReflexValue(Math.cos(getDP(params)));
    }
}

To use ReflexMath in another Reflex script, you could do something like this:

import ReflexMath as math;

println("cos(0.1) = " + $math.cos(0.1));

Storage and messaging extensions

Copies of the pre-built storage and messaging implementations can be customized. Contact Incapture for additional details.

Custom APIs

Clients have access to the same tools that were used for creating the Rapture API. As a result, it is possible for clients to create their own high-level APIs with automatic client-side code generation for invoking calls.

For clients in financial services, common applications that could use the API include a Value at Risk (VaR) system for a broker-dealer; a system for order construction, programmed trading, and position management in a hedge fund; or an equity research management system.

Beyond financial services, the Rapture platform offers many other possibilities for useful applications and big data management. For instance, a system that collects multiple sensor readings from city locations, vehicles, or even microsurgical devices could use the Rapture platform to handle complex decisions in real time.


Rapture Blob Repositories

This is part of the Rapture Series of posts – providing a general overview of the features of Rapture. In this article we describe the blob repository type of Rapture, which is used to store large opaque data objects such as reports or images.

If you haven’t already read the post on Document Repositories I would encourage you to read that first: Document Repositories.

What is a repository?
In Rapture a repository is a place to store information. A repository has a name (unique in a given Rapture instance) and an underlying implementation. The idea is that application developers interact with a repository using a consistent API and Rapture takes care of the details of how to manage the information in the underlying implementation. The implementation in this case refers to a database system and the systems supported currently cover a wide range of technologies from traditional relational databases to the newer “distributed key-value stores”. A list of the technologies currently supported is provided later in this post.

Sample Blob views
Using the same web interface that we used when browsing Document and Series Repositories we can do the same for Blob Repositories.

In the screen capture below you see a typical view of some of the blob repositories in a Rapture environment.

Blob View

As you can see, a Rapture environment typically has many blob repositories, and the data is usually divided by purpose. If we expand one of these repositories, you can see an implied hierarchy of information – the analogy is to a file system with folders and files. In this case we have expanded some data around countries and currencies.

When writing a blob to Rapture through the API you can specify a mime type for that data. A mime type simply describes the format of the content in a standard way. If the mime type is not given, it is guessed from the “extension” of the name of the blob in Rapture.
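The “guess from the extension” fallback can be illustrated with Python’s standard mimetypes module (whether Rapture uses the same type table internally is an assumption, not something this post states):

```python
import mimetypes

def content_type_for(blob_name, declared=None):
    """Prefer a declared mime type; otherwise guess from the
    extension, falling back to a generic binary type."""
    if declared:
        return declared
    guessed, _ = mimetypes.guess_type(blob_name)
    return guessed or "application/octet-stream"

print(content_type_for("reports/summary.pdf"))   # guessed from .pdf
print(content_type_for("opaque.unknownext"))     # generic fallback
```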

The web interface can display some mime types – CSV files, PDF reports and the like. If we pick one of these blobs we can see its contents – in the picture below we are showing the contents of the countries blob.


Using the API

The Rapture API is the consistent way of interacting with the platform. The API is split into different sections, and the section for blob repositories is called “blob”. Using the API in a general sense is the subject of a later post, but it is worth giving some general observations about how we designed the API for use.

The Rapture API is available in a number of different places. For client applications (programs that are not hosted within Rapture) the client API is used. This API is connected to a Rapture environment through the login section of the API, and once connected the application can use the other API sections to interact with the system. The client API is available for Java, Javascript (for browsers), .NET, Python, Ruby, Go and VBA. Although the syntax varies slightly, the meaning and use of each API call is consistent. For blobs, however, there is often a more efficient way to retrieve and post content: the Rapture API also has two endpoints, one for posting (uploading) blob data to Rapture and one for downloading blob content through an HTTP connection.

For uploading a blob via a web page the following html snippet shows the main technique – basically you use a file type upload to the /rapture/blobupload endpoint:

<div class="modal fade" id="uploadBlob">
  <div class="modal-dialog">
    <div class="modal-content">
      <div class="modal-header">
        <button type="button" class="close" data-dismiss="modal">x</button>
        <h3>Upload Blob File</h3>
      </div>
      <div class="modal-body">
        <form id="uploadForm" action="/rapture/blobupload" method="post" enctype="multipart/form-data" role="form">
        <div class="form-group">
          <label for="BlobURI">Blob URI</label> <input type="text" name="description" id="BlobURI" placeholder="Enter Blob URI" />
        </div>
        <div class="form-group">
          <label for="BlobFile">File Input</label>
          <input type="file" name="file" id="BlobFile" />
          <p class="help-block">Choose the file to upload as a Rapture blob</p>
        </div>
     </form>
  </div>
  <div class="modal-footer">
    <a href="#" class="btn" data-dismiss="modal">Close</a> <a id="submitUploadBlob" href="#" class="btn btn-primary">Upload</a>
  </div>
</div>
</div>
</div>

The link to upload is simply bound (in this example) using jQuery to submit the form:

$('#submitUploadBlob').click(function() {
	$('#uploadForm').submit();
});

Repository Implementation
Before putting data into a repository, the repository needs to be created or registered within the environment. There is of course an API call to do this – it takes the name of the repository and its configuration, along with the configuration of a document repository that handles the metadata and folder structure of the blob repository.

#blob.createBlobRepo("//test.blob", [a config string for the blob], [ a config string for the folders]);

Rather like the document repository configuration, the configuration string defines two things – the underlying implementation technology and the general feature set supported (whether the repository is versioned for instance). The configuration string for the folder repository follows the same convention as that described for document repositories.

The format of the blob configuration string is as follows:

BLOB {}  USING [implementation] { [configuration] }

Describing all of the options available is beyond the scope of this post but some examples will help clarify the syntax:

BLOB {} USING MONGODB {}     // A blob repository using MONGODB
BLOB {} USING MONGODB { prefix="testdoc" }  // A blob repository using MONGODB on a specific collection

The currently supported implementations include: Cassandra, MongoDB and the FileSystem (which must be shared between all instances in a multi-server environment). A pure memory implementation also exists but is solely used for testing.

To summarize, an example call to create a Blob repository is reproduced below:

#blob.createBlobRepo('//test.blob', 'BLOB {} USING MONGODB { prefix="idp.research.data" }', 'REP {} USING MONGODB { prefix="idp.research.data.folders" }');

A tour of the API
The Blob API section of Rapture is used to interact with blob repositories. The API is rounded out with calls to (a) manage blob repositories (create, destroy, modify), (b) manage blobs in a repository (get, put, delete, and append), and (c) manage the implied hierarchy of the blobs in a repository (get children at a particular point in the tree). Each API call is controlled by entitlements, so an administrator can configure who can perform each of these tasks.

Although a blog post isn’t the place to describe these calls in detail, it is worth listing them out so the breadth of coverage in the blob API set can be appreciated. The goal in Rapture is to have a very open API that can be used to manipulate all aspects of the system, and this is reflected in the number of calls available. Entitlements are used to guard against unintended consequences or unapproved calls. A typical user-level application would use only a small subset of these calls.

For managing repositories:

   // Creates a repository used to store blobs
   Boolean createBlobRepo(String blobRepoURI, String config, String metaConfig);
   //Retrieves blob repository information
   BlobRepoConfig getBlobRepoConfig(String blobRepoURI);
   // Retrieves blob repositories
   List(BlobRepoConfig) getAllBlobRepoConfigs();
   // Remove a blob repository
   Boolean deleteBlobRepo(String repoURI);
   // This API call can be used to determine whether a given blob repository exists.
   Boolean doesBlobRepoExist(String repoURI);

For managing blobs within a repository (noting that the endpoints /rapture/blobupload and /rapture/blob are also useful for putting and retrieving content via http).

   // Does a blob actually exist
   Boolean doesBlobExist(String blobURI);
   // Create a blank blob, ready for appending to. If the blob exists it is unaffected.
   Boolean createBlob(String blobURI, String contentType);
   // Append to a blob created with createBlob. 
   Boolean appendToBlob(String blobURI, ByteArray content);
   // Stores a blob in one hit, assuming a String representation. If append, adds to any existing content
   Boolean storeBlob(String blobURI, ByteArray content, String contentType);
   // Stores a blob in one hit, assuming a String representation. If append, adds to any existing content
   Boolean putBlob(String blobURI, ByteArray content, String contentType);
   // Retrieves a blob in one hit, assuming a String representation
   BlobContainer getBlob(String blobURI);
   // Removes a blob from the store
   Boolean deleteBlob(String blobURI);
   // Retrieves the size of a blob
   Long getBlobSize(String blobURI);
   // Retrieves the metadata for a blob
   Map(String,String) getMetaData(String blobURI);

For listing things:

   List(RaptureFolderInfo) getChildren(String blobURI);
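
A sketch of how an implied hierarchy can be derived from flat blob URIs, in the spirit of getChildren. This is plain Python for illustration, with invented repository contents:

```python
def children(blob_uris, prefix):
    """Return the immediate child names under `prefix`.

    Blob URIs are flat strings, but folder structure falls out of
    their path segments, much like a file system listing.
    """
    prefix = prefix.rstrip("/") + "/" if prefix else ""
    kids = set()
    for uri in blob_uris:
        if uri.startswith(prefix):
            rest = uri[len(prefix):]
            kids.add(rest.split("/", 1)[0])
    return sorted(kids)

uris = ["reference/countries/FR", "reference/countries/US",
        "reference/currencies/USD"]
print(children(uris, "reference"))            # the "folders"
print(children(uris, "reference/countries"))  # the "files" inside one
```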

Uses of Blob Repositories
Within typical Rapture systems blob repositories are used primarily to store reports, images and occasionally code and executables.

Summary
In summary, blob repositories are a key feature of Rapture. They give a separation of responsibility between the application developer (putting and getting binary content) and the underlying operational concerns (which database vendor to use, and how to configure and manage it). This separation allows changes to be made beneath an application without changing that application in any way. The underlying implementation is optimized for storing and retrieving large opaque content blobs.

In the next post we will talk about Sheet Repositories.


Rapture Series Repositories

This is part of the Rapture Series of posts – providing a general overview of the features of Rapture. In this article we describe the series repository type of Rapture, which is typically used to store a time series of either structured data or simple numeric points, indexed by a unique text key (a URI).

If you haven’t already read the post on Document Repositories I would encourage you to read that first: Document Repositories.

What is a repository?
In Rapture a repository is a place to store information. A repository has a name (unique in a given Rapture instance) and an underlying implementation. The idea is that application developers interact with a repository using a consistent API and Rapture takes care of the details of how to manage the information in the underlying implementation. The implementation in this case refers to a database system and the systems supported currently cover a wide range of technologies from traditional relational databases to the newer “distributed key-value stores”. A list of the technologies currently supported is provided later in this post.

A preview of data
Using the same web interface that we used when browsing Document Repositories we can do the same for Series Repositories.

In the screen capture below you see a typical view of some of the series repositories in a Rapture environment.

Series Repository view

As you can see, a Rapture environment typically has many series repositories, and the data is usually divided by purpose. If we expand one of these repositories, you can see an implied hierarchy of information – the analogy is to a file system with folders and files. In this case we have expanded a set of rates by currency.

The contents of a series can be queried through the API by the column dimension – usually a point in time. The web operations front end grabs all of the points and displays them as a chart:

Series view

Comparison between documents and series
It is true that you could create your own “series” by simply storing many documents – one for each point in time. Furthermore, you could put the date of the point into the name of each document and use that to emulate exactly what a series repository provides. The main difference between that technique and using a series repository is that the latter is optimized for access across the time dimension – it is tuned so that requests for multiple points are as efficient as possible.
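A hedged sketch of the difference: emulating a series with one document per date means scanning or constructing names point by point, whereas a series store keeps points sorted by key so a range is a single slice. The repository names and values below are invented:

```python
import bisect

# Document emulation: one named "document" per point in time.
docs = {"rates/AUD/2014-04-08": 2.61,
        "rates/AUD/2014-04-09": 2.63,
        "rates/AUD/2014-04-10": 2.62}

def doc_range(start, end):
    # No key ordering to exploit: inspect every document name.
    return [v for k, v in sorted(docs.items())
            if start <= k.rsplit("/", 1)[1] <= end]

# Series storage: points kept sorted by key, so a range is one slice.
keys = ["2014-04-08", "2014-04-09", "2014-04-10"]
vals = [2.61, 2.63, 2.62]

def series_range(start, end):
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_right(keys, end)
    return vals[lo:hi]

print(doc_range("2014-04-08", "2014-04-09"))
print(series_range("2014-04-08", "2014-04-09"))
```

Both return the same points; the series version touches only the slice it needs, which is the optimization the series repository makes at scale.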

Using the API

The Rapture API is the consistent way of interacting with the platform. The API is split into different sections, and the section for series repositories is called “series”. Using the API in a general sense is the subject of a later post, but it is worth making some general observations about how we designed the API for use.

The Rapture API is available in a number of different places. For client applications (programs that are not hosted within Rapture) the client API is used. This API is connected to a Rapture environment through the login section of the API and once connected the application can use the other API sections to interact with the system. The client API is available for Java, Javascript (for browsers), .NET, Python, Ruby, Go and VBA. Although the syntax varies slightly the meaning and use of each of the API calls is consistent.

As an example, the code in Java to retrieve the time series in the screen above would be something like:

            HttpLoginApi loginApi = new HttpLoginApi("http://rapture", new SimpleCredentialsProvider(user, password));
            loginApi.login();
            ScriptClient sc = new ScriptClient(loginApi);
            List<XferSeriesValue> values = sc.getSeries().getAllPoints("//analytics.rates/AUD/forward/10Y1YF");

In the Reflex scripting language the Reflex environment is already logged in and the code is much simpler:

series = #series.getAllPoints("//analytics.rates/AUD/forward/10Y1YF");
// Or simply, as this is a common task in Reflex
series = rpull("series://analytics.rates/AUD/forward/10Y1YF");

Repository Implementation
Before putting data into a repository it needs to be created or registered within the environment. There is of course an API call to do this – it takes the name of the repository and its configuration:

#series.createSeriesRepo("//test.series", [a config string]);

Rather like the document repository configuration, the configuration string defines two things – the underlying implementation technology and the general feature set supported (whether the repository is versioned for instance).

The format of the configuration string is as follows:

SREP {  } USING [implementation] { [configuration] }

Describing all of the options available is beyond the scope of this post but some examples will help clarify the syntax:

SREP {} USING MONGODB {}     // A series repository using MONGODB
SREP {} USING MONGODB { prefix="testdoc" }  // A series repository using MONGODB on a specific collection
SREP {} USING CASSANDRA { prefix="test" } // A series repository using CASSANDRA as a backing store

The currently supported implementations include: Cassandra, MongoDB and the FileSystem (typically a simple CSV file and used more for demonstration than for production use). A pure memory implementation also exists but is solely used for testing.

A tour of the API
The Series API section of Rapture is used to interact with series repositories. We’ve seen a getAllPoints call and a create repository call. The API is rounded out with calls to (a) manage series repositories (create, destroy, modify), (b) manage series in a repository (get, put, bulk get and put, delete) and (c) manage the implied hierarchy of the series in a repository (get the children at a particular point in the tree). Each API call is controlled by entitlements, so an administrator can configure who can perform each of these tasks.

Although a blog post isn’t the place to describe these calls in detail, it is worth listing them out so the breadth of coverage in the series API can be appreciated. The goal in Rapture is to have a very open API that can be used to manipulate all aspects of the system, and this is reflected in the number of calls available. Entitlements are used to prevent unintended consequences or unapproved calls being made. A typical user-level application would use only a small subset of these calls.

For managing repositories:

   // Create a repository for series data
   Boolean createSeriesRepo(String seriesURI, String config);
   // Force the repo instantiation (if not already done)
   Boolean validateSeriesRepo(String seriesURI);
   // Check for the existence of a series repository
   Boolean doesSeriesRepoExist(String seriesURI);
   // Fetch the series repository config
   SeriesRepoConfig getSeriesRepoConfig(String seriesURI);
   // Fetch the series repository configs
   List<SeriesRepoConfig> getAllSeriesRepoConfigs();
   // Removes a Series Repository and its data from the Rapture system
   Boolean deleteSeriesRepo(String repoURI);

For managing series within a repository:

   // Create a series, initially empty. If the Series exists it is unaffected
   Boolean createSeries(String seriesURI);
   // Check for the existence of a given series
   Boolean doesSeriesExist(String seriesURI);
   // Removes a Series and its data
   Boolean deleteSeries(String repoURI);

Adding data to a series:

   // Add a decimal-value point to a given series
   Boolean addDoubleToSeries(String seriesURI, String columnKey, Double columnValue);
   // Add an integer-value point to a given series
   Boolean addLongToSeries(String seriesURI, String columnKey, Long columnValue);
   // Add a string-value point to a given series
   Boolean addStringToSeries(String seriesURI, String columnKey, String columnValue);
   // Add a point with a JSON-encoded structure value to a given series
   Boolean addStructureToSeries(String seriesURI, String columnKey, String jsonColumnValue);
   // Add a list of points with decimal values to a series
   Boolean addDoublesToSeries(String seriesURI, List<String> columns, List<Double> values);
   // Add a list of points with integer values to a series
   Boolean addLongsToSeries(String seriesURI, List<String> columns, List<Long> values);
   // Add a list of points with string values to a series
   Boolean addStringsToSeries(String seriesURI, List<String> columns, List<String> values);
   // Add a list of points with JSON-encoded structure values to a series
   Boolean addStructuresToSeries(String seriesURI, List<String> columns, List<String> jsonValues);

For removing things:

   // Remove a folder (many series)
   Boolean removeFolder(String seriesURI);
   // Delete a list of points from a series
   Boolean dropPointsFromSeries(String seriesURI, List<String> columns);
   // Delete all points from a series
   Boolean dropAllPointsFromSeries(String seriesURI);

And finally retrieving things:

   // Get the last point from a series
   XferSeriesValue getLastPoint(String seriesURI);
   // Get the entire contents of a series
   List<XferSeriesValue> getAllPoints(String seriesURI);
   // Get one page of data from a series
   List<XferSeriesValue> getPoints(String seriesURI, String startColumn, int maxNumber);
   // Get one page of data from a series in reverse
   List<XferSeriesValue> getPointsReverse(String seriesURI, String startColumn, int maxNumber);
   // Get one page of data from a series range
   List<XferSeriesValue> getRange(String seriesURI, String startColumn, String endColumn, int maxNumber);
   // Get all points from a range in a series
   List<XferSeriesValue> getAllFromRange(String seriesURI, String startColumn, String endColumn);
   // Get the entire contents of a series, cast to Doubles
   SeriesDoubles getAllPointsAsDoubles(String seriesURI);
   // Get one page of data from a series, cast to Doubles
   SeriesDoubles getPointsAsDoubles(String seriesURI, String startColumn, int maxNumber);
   // Get one page of data from a series range, cast to Doubles
   SeriesDoubles getRangeAsDoubles(String seriesURI, String startColumn, String endColumn, int maxNumber);
   // Get all points from a range in a series, cast to Doubles
   SeriesDoubles getAllFromRangeAsDoubles(String seriesURI, String startColumn, String endColumn);
   // Get the entire contents of a series, cast to Strings
   SeriesStrings getAllPointsAsStrings(String seriesURI);
   // Get one page of data from a series, cast to Strings
   SeriesStrings getPointsAsStrings(String seriesURI, String startColumn, int maxNumber);
   // Get one page of data from a series range, cast to Strings
   SeriesStrings getRangeAsStrings(String seriesURI, String startColumn, String endColumn, int maxNumber);
   // Get all points from a range in a series, cast to Strings
   SeriesStrings getAllFromRangeAsStrings(String seriesURI, String startColumn, String endColumn);
   // Get all the immediate children of a particular series path, including both series and folders
   List<RaptureFolderInfo> getChildren(String seriesURI);

In the API return values above, “XferSeriesValue” and “SeriesDoubles” are simply constructs that combine both the value and the column (time point) of the data.
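The getPoints/getRange family above maps naturally onto column-ordered storage. As a rough illustration (plain Python, not Rapture code; the method names are loose analogues of the API calls above), a single series can be modelled as a pair of aligned, sorted arrays:

```python
import bisect

class InMemorySeries:
    """A toy, in-memory sketch of one series: columns are kept sorted
    so paged and ranged reads are cheap. Illustrative only – this is
    not how Rapture implements series repositories."""

    def __init__(self):
        self._columns = []   # sorted column keys (e.g. dates as strings)
        self._values = []    # values aligned with self._columns

    def add_point(self, column, value):
        i = bisect.bisect_left(self._columns, column)
        if i < len(self._columns) and self._columns[i] == column:
            self._values[i] = value          # overwrite an existing column
        else:
            self._columns.insert(i, column)  # insert, keeping sort order
            self._values.insert(i, value)

    def get_all_points(self):
        return list(zip(self._columns, self._values))

    def get_points(self, start_column, max_number):
        i = bisect.bisect_left(self._columns, start_column)
        return list(zip(self._columns[i:i + max_number],
                        self._values[i:i + max_number]))

    def get_range(self, start_column, end_column):
        lo = bisect.bisect_left(self._columns, start_column)
        hi = bisect.bisect_right(self._columns, end_column)
        return list(zip(self._columns[lo:hi], self._values[lo:hi]))

s = InMemorySeries()
s.add_point("2014-10-02", 2.5)
s.add_point("2014-10-01", 2.4)
s.add_point("2014-10-03", 2.6)
```

Note that points can arrive out of order; because the columns are kept sorted, paged and ranged reads still come back in column order.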

Uses of Series Repositories
Within typical Rapture systems, series repositories are used primarily to store time series data corresponding to FX rates, prices and the like.

Summary
In summary, Rapture series repositories are a key feature of Rapture. They give a separation of responsibility between the application developer (putting and getting content that is structured as a series of points) and the underlying operational concerns (which database vendor to use, and how to configure and manage it). This separation allows changes to be made beneath an application without changing that application in any way. The underlying implementation is optimized for storing and retrieving column-based data, which makes series repositories preferable to document repositories for this type of data.

In the next post we will talk about Blob Repositories.


Rapture Document Repositories

This is part of the Rapture Series of posts – providing a general overview of the features of Rapture. In this article we describe the document repository type of Rapture, which is used to store structured data indexed by a unique text key (a URI).

What is a repository?
In Rapture a repository is a place to store information. A repository has a name (unique in a given Rapture instance) and an underlying implementation. The idea is that application developers interact with a repository using a consistent API and Rapture takes care of the details of how to manage the information in the underlying implementation. The implementation in this case refers to a database system and the systems supported currently cover a wide range of technologies from traditional relational databases to the newer “distributed key-value stores”. A list of the technologies currently supported is provided later in this post.

A quick example
Before diving into the details it is worth giving a preview to help set the stage. Although Rapture is a platform it does have an operations web interface that can be used (amongst other things) to browse the data stored in the environment.

In the screen capture below you see a typical view of some of the document repositories in a Rapture environment.

Document Repository view

As you can see, a Rapture environment typically has many document repositories, with the data usually divided by purpose. If we expand one of these repositories you can see an implied hierarchy of information – the analogy is to a file system with folders and files.

Expand document repository

And the contents of a given document in this repository:

Document contents view

In this specific example we used a document repository to store configuration information used in connecting to Bloomberg and interpreting the results returned by that system.

In a very simple sense a document repository in Rapture is an abstract way of organizing “files” (content) into a series of “folders” (based on their names). Within Rapture this universal naming convention (a URI) is used everywhere when interacting with documents and their repositories.
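The naming convention is simple enough to sketch: the leading `//` is followed by the repository name, and the rest of the URI is a path whose segments form the implied folders. A hypothetical helper (for illustration only – not part of the Rapture API) to take such a URI apart might look like:

```python
def split_rapture_uri(uri):
    """Split a repository-style URI such as
    //bloomberg.configuration/equity/index/AEX_Index into the
    repository (authority) part and the path within that repository.
    Hypothetical helper for illustration, not a Rapture API call."""
    if not uri.startswith("//"):
        raise ValueError("expected a URI of the form //authority/path")
    authority, _, path = uri[2:].partition("/")
    return authority, path

repo, path = split_rapture_uri("//bloomberg.configuration/equity/index/AEX_Index")
folders = path.split("/")[:-1]   # the implied "folders" above the document
```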

Using the API
The Rapture API is the consistent way of interacting with the platform. The API is split into different sections, and the section for document repositories is called “doc”. Using the API in a general sense is the subject of a later post, but it is worth making some general observations about how we designed the API for use.

The Rapture API is available in a number of different places. For client applications (programs that are not hosted within Rapture) the client API is used. This API is connected to a Rapture environment through the login section of the API and once connected the application can use the other API sections to interact with the system. The client API is available for Java, Javascript (for browsers), .NET, Python, Ruby, Go and VBA. Although the syntax varies slightly the meaning and use of each of the API calls is consistent.

As an example, the code in Java to retrieve the configuration document in the screen above would be something like:

            HttpLoginApi loginApi = new HttpLoginApi("http://rapture", new SimpleCredentialsProvider(user, password));
            loginApi.login();
            ScriptClient sc = new ScriptClient(loginApi);
            String content = sc.getDoc().getContent("//bloomberg.configuration/equity/index/AEX_Index");

In the Reflex scripting language the Reflex environment is already logged in and the code is much simpler:

content = #doc.getContent("//bloomberg.configuration/equity/index/AEX_Index");
// Or simply, as this is a common task in Reflex
contentMap <-- "//bloomberg.configuration/equity/index/AEX_Index";

Often the content of a document (as is the case here) is a JSON-formatted document, and there are many utilities in the various language implementations for mapping such a document into a data object or a map/dictionary.
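For example, in Python the standard library is enough to turn retrieved document content into a dictionary and back again. The content and keys below are invented for illustration; in practice the string would come from a getContent call:

```python
import json

# Document content in a doc repository is often JSON text, so the
# standard mapping to a dictionary is all that is needed.
content = '{"ticker": "AEX Index", "fields": ["PX_LAST", "PX_OPEN"]}'

config = json.loads(content)          # JSON text -> dict
ticker = config["ticker"]
round_trip = json.dumps(config)       # dict -> JSON text, e.g. for putContent
```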

Repository Implementation
Before putting data into a repository it needs to be created or registered within the environment. There is of course an API call to do this – it takes the name of the repository and its configuration:

#doc.createDocumentRepo("//test.doc", [a config string]);

The configuration string defines two things – the underlying implementation technology and the general feature set supported (whether the repository is versioned for instance).

The format of the configuration string is as follows:

[?]REP { [configuration] } USING [implementation] { [configuration] }

Describing all of the options available is beyond the scope of this post but some examples will help clarify the syntax:

NREP {} USING MONGODB {}     // A versioned repository using MONGODB
NREP {} USING MONGODB { prefix="testdoc" }  // A versioned repository using MONGODB on a specific collection
REP {} USING REDIS { prefix="test" } // An unversioned repository using REDIS as a backing store

The currently supported implementations include: Cassandra, MongoDB, Postgres, Amazon SimpleDB, Generic JDBC, Redis, MemCached, EhCache and the FileSystem. A pure memory implementation also exists but is solely used for testing.
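The configuration syntax as shown is regular enough to parse mechanically. As a sketch (assuming only the forms shown in these posts – the real grammar may be richer), the repository kind, implementation and implementation options can be pulled out like this:

```python
import re

# A rough parser for the configuration-string shape shown above:
#   [?]REP { [configuration] } USING [implementation] { [configuration] }
# This matches only the visible syntax; the actual Rapture grammar may
# accept more than this.
CONFIG_RE = re.compile(
    r'^\s*(?P<kind>\w*REP)\s*\{(?P<repo_cfg>[^}]*)\}\s*'
    r'USING\s+(?P<impl>\w+)\s*\{(?P<impl_cfg>[^}]*)\}\s*$')

def parse_repo_config(config):
    m = CONFIG_RE.match(config)
    if m is None:
        raise ValueError("unrecognized repository config: " + config)
    # Pull key="value" pairs out of the implementation block
    options = dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', m.group("impl_cfg")))
    return m.group("kind"), m.group("impl"), options

kind, impl, opts = parse_repo_config('NREP {} USING MONGODB { prefix="testdoc" }')
```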

Repository Features
A document repository can have some or all of the following features: versioning, metadata, type checking and indexing (creating alternate indices to find documents given other criteria). In some cases features are turned off because they are not needed for the given use case, which can help with the performance and management of a given repository. In nearly all cases the API is consistent across all repository types, though of course attempting to retrieve a previous version of a document from an unversioned repository would result in an error!
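To see what versioning buys, here is a toy model (purely illustrative – not how Rapture implements versioning) in which every put appends a new version with who/when metadata, so earlier content remains retrievable:

```python
import time

class VersionedDocStore:
    """A toy model of a versioned document repository: each put
    appends a new version together with metadata. Illustrative only."""

    def __init__(self):
        self._versions = {}  # uri -> list of (content, metadata) tuples

    def put_content(self, uri, content, user):
        history = self._versions.setdefault(uri, [])
        meta = {"version": len(history) + 1, "user": user,
                "written": time.time()}
        history.append((content, meta))
        return meta["version"]

    def get_content(self, uri, version=None):
        history = self._versions[uri]
        if version is None:
            return history[-1][0]            # latest version
        return history[version - 1][0]       # versions are 1-based

store = VersionedDocStore()
store.put_content("//test.doc/a", '{"x": 1}', "alan")
store.put_content("//test.doc/a", '{"x": 2}', "alan")
```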

A tour of the API
The Doc API section of Rapture is used to interact with document repositories. We’ve seen a getContent call and a create repository call. The API is rounded out with calls to (a) manage document repositories (create, destroy, modify), (b) manage documents in a repository (get, put, bulk get and put, delete, get versions, get metadata) and (c) manage the implied hierarchy of the documents in a repository (get children at a particular point in the tree). Each API call is controlled by entitlements so an administrator can configure who can perform each of these tasks.

Although a blog post isn’t the place to describe these calls in detail, it is worth listing them out so the breadth of coverage in the document API can be appreciated. The goal in Rapture is to have a very open API that can be used to manipulate all aspects of the system, and this is reflected in the number of calls available. Entitlements are used to prevent unintended consequences or unapproved calls being made. A typical user-level application would use only a small subset of these calls.

     // Is the underlying implementation reachable?
     Boolean validate(String raptureURI);
     // Create a document repository
     Boolean createDocumentRepo(String raptureURI, String config);
     // Does a repo configuration with the given name exist in this system?
     Boolean doesDocumentRepoExist(String raptureURI);
     // Retrieve the configuration of a document repository
     DocumentRepoConfig getDocumentRepoConfig(String docRepoURI);
     // Retrieve the status of a repository (implementation specific)
     Map<String, String> getDocumentRepoStatus(String docRepoURI);
     // Retrieve the configurations of all document repositories
     List<DocumentRepoConfig> getAllDocumentRepoConfigs();
     // Remove a repository and its data
     Boolean deleteDocumentRepo(String repoURI);

For managing documents:

     // Retrieve the content given a key (which includes the name of the repository)
     String getContent(String docURI);
     // Store content, potentially overwriting existing content
     String putContent(String docURI, String content);
     // Retrieve a set of data
     List<String> batchGet(List<String> docURIs);
     // Does a set of data exist in the system?
     List<Boolean> batchExists(List<String> docURIs);
     // Does an individual document exist?
     Boolean doesDocumentExist(String documentURI);
     // Put a set of data
     List<Object> batchPutContent(List<String> docURIs, List<String> contents);
     // Remove some content
     Boolean deleteContent(String docURI);
     // Move some content
     String renameContent(String fromDocURI, String toDocURI);
     // Batch move some content
     List<Object> batchRenameContent(String authority, String comment, List<String> fromDocURIs, List<String> toDocURIs);

For repositories that are versioned and have metadata we also have:

     // Remove old versions of documents
     Boolean archiveVersions(String repoURI, int versionLimit, long timeLimit, Boolean ensureVersionLimit);
     // Retrieve a document and its metadata (version, who and when it was written)
     DocumentWithMeta getMetaContent(String docURI);
     // Retrieve just the metadata
     DocumentMetadata getMetaData(String docURI);
     // Replace a document's contents with its previous version
     DocumentWithMeta revertDocument(String docURI);
     // Put content with an assumption about the version being overwritten
     // (optimistic locking)
     Boolean putContentWithVersion(String docURI, String content, int currentVersion);
     // Retrieve a set of documents with metadata
     List<DocumentWithMeta> batchGetMetaContent(List<String> docURIs);
     // Add an attribute to a document (a key/value pair not part of the content)
     Boolean addDocumentAttribute(String attributeURI, String value);
     // Retrieve an attribute from a document
     XferDocumentAttribute getDocumentAttribute(String attributeURI);
     // Retrieve all of the attributes of a document
     List<XferDocumentAttribute> getDocumentAttributes(String attributeURI);
     // Remove an attribute from a document
     Boolean removeDocumentAttribute(String attributeURI);
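putContentWithVersion deserves a note: it implements optimistic locking, where a write is accepted only if the version the writer last read is still current. A minimal sketch of the idea (illustrative Python, not Rapture code):

```python
class OptimisticDoc:
    """A sketch of the optimistic-locking idea behind
    putContentWithVersion: a write succeeds only if the caller's view
    of the current version is still accurate. Illustrative only."""

    def __init__(self, content):
        self.content = content
        self.version = 1

    def put_content_with_version(self, content, current_version):
        if current_version != self.version:
            return False   # someone else wrote first; caller must re-read
        self.content = content
        self.version += 1
        return True

doc = OptimisticDoc("v1 content")
ok_first = doc.put_content_with_version("v2 content", 1)   # succeeds
ok_stale = doc.put_content_with_version("other", 1)        # stale version, rejected
```

A rejected write tells the caller to re-read the document (picking up the new version number) and retry, rather than silently clobbering a concurrent update.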

And finally for managing and understanding the implicit hierarchies in an environment:

     // Get a list of the files and folders immediately below the point given
     List<RaptureFolderInfo> getChildren(String docURI);
     // Remove the files and folders below the point given
     // If force is not set it will not remove the folder if it is not empty
     List<String> removeFolder(String docURI, Boolean force);
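Since the folder structure is implied by the URIs rather than stored separately, getChildren can be thought of as grouping flat keys by their next path segment. A sketch of that idea (hypothetical helper with invented URIs, not the real implementation):

```python
def get_children(uris, folder):
    """Derive the immediate children of a folder from a flat set of
    document URIs – how an implied hierarchy can be read off key
    names alone. Illustrative, not Rapture's implementation."""
    prefix = folder.rstrip("/") + "/"
    children = {}  # child name -> is it a folder?
    for uri in uris:
        if not uri.startswith(prefix):
            continue
        rest = uri[len(prefix):]
        name, _, remainder = rest.partition("/")
        # It is a folder if anything lies beneath this segment
        children[name] = children.get(name, False) or bool(remainder)
    return sorted(children.items())

uris = ["//cfg/equity/index/AEX_Index",
        "//cfg/equity/index/DAX_Index",
        "//cfg/equity/notes"]
```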

Uses of Document Repositories
Document Repositories in Rapture are used for many different things.

Behind the scenes Rapture uses document repositories to store information about the entities in a system, the users, their entitlements and the like. The fact that repositories can be versioned and audited ensures that changes can be managed correctly.

Within typical Rapture systems, document repositories have been used for (a) configuration information, (b) entity management (trading orders, trades, positions, financial asset information, curves) and (c) general status information about things progressing in the system. They are very useful when there is a clear and unique “key” that can be used to describe the entity.

Summary
In summary, Rapture document repositories are a key feature of Rapture. They give a separation of responsibility between the application developer (putting and getting content) and the underlying operational concerns (which database vendor to use, and how to configure and manage it). This separation allows changes to be made beneath an application without changing that application in any way. The history of changes can be preserved if needed to provide audit trails and provenance. Finally, depending on the implementation, document repositories can be tuned from a very fast in-memory shared data environment (e.g. Redis) to a fully distributed massive store (e.g. Cassandra) to a more traditional relational form that fits in with existing infrastructure (e.g. Oracle). Over time, as needs change, these implementations can be migrated with little or no change to the applications sitting on top of Rapture.

In the next post we will talk about Series Repositories.


General overview of Rapture

In this series of posts I’ll be describing our platform product “Rapture” in great detail. This post is intended to set the scene and define the roadmap of the future posts you can expect to see.

At its core Rapture is a common platform for building applications in a distributed environment. The goal of Rapture is to allow the users who build applications to remain independent of the underlying infrastructure used to host the application environment. If (for non-functional reasons such as performance) an application needs to use something like Cassandra to store data instead of a relational database, the goal of Rapture is to allow that to happen with little to no change in application code.

A further goal of Rapture is to take care of a lot of the common application aspects that are commodity or boilerplate – entitlements, audit trails, devops, release management and deployment should be consistent and common between all applications that run on a Rapture platform.

Rapture in Financial Services

Rapture at its core has no “bindings” to financial services – although it was explicitly designed for building Financial Services applications. Nevertheless Rapture has a very open, extensible architecture, and through these entry points Incapture has built a suite of extensions that connect a Rapture platform to common financial endpoints and data sources such as Bloomberg, Reuters and State Street, along with a data model and an analytics framework for pricing and analyzing complex financial assets.

Incapture Technologies’ sister company, Incapture Investments, uses this Financial Services based Rapture platform to help manage their AlphaCapture fund.

Rapture in Detail

Over the next few months a series of blog posts (in this “Rapture Series”) will take you on a journey that will cover nearly all aspects of Rapture in some detail. The diagram below shows the general architecture of Rapture:

Rapture Diagram

Our journey will begin with a series of posts on repositories – places in Rapture where you can store or reference information that your application needs. We split repositories in Rapture into four different flavors (or types) and we’ll cover each type in a different post. The next post will cover Document Repositories, and then we’ll cover Series, Blob and Sheet repositories.

An important part of the “glue” in a Rapture system can involve scripting of small programs that perform relatively simple activities on the platform. A description of the Rapture scripting language (called Reflex) will follow the repository posts.

Although you can create quite sophisticated applications using the simple scripting language most users will want to build (or adapt) applications that need to connect to Rapture through an API. Rapture’s API can be called from nearly any client side language and we’ll discuss how this works and how you can create your own custom APIs.

As we move beyond applications calling into Rapture we’ll start describing how Rapture applications can talk to each other in a controlled way. Rapture has facilities such as pipelines (messaging) and workflows that help coordinate this activity. There is also a part of the platform that gives application builders the ability to invoke their client side application from a workflow (ExecServer) and we’ll cover that in a later post. Our sister company Incapture Investments uses this technique to invoke their Python based research models in a controlled way in production.

The cloud based deployment paradigm of Rapture also encourages more use of web based applications. A post will be dedicated to describing how a combination of the Rapture API and the scripting language can be used to create a clear division of responsibility between that of the web browser application and the server side application services. These techniques are used within Rapture itself to provide the developer operations consoles for managing a Rapture environment.

Finally, towards the end of this series, a set of blog posts will cover the more operational side of running a Rapture environment. These posts will cover how to get data into Rapture, how to release and deploy applications in the environment, how to manage audit trails and the “provenance” of data and how to manage entitlements. We’ll also describe how to extend Rapture across many different dimensions.

I hope you’ll enjoy finding out more about Rapture over the next couple of months of blog posts. If you want to find out more early you can either email me directly (Alan Moore) or make a general enquiry through the Contact page of this web site.



Rapture: Taking advantage of the present while looking forward to the future

Incapture’s Rapture platform was born out of a desire to not repeat the past, to take advantage of the present and to look forward to the future.

The Past

While working for a number of capital markets banks and asset managers I had the opportunity to design, build and deploy a large number of different applications. Although the end functionality could often be vastly different, some of the core components of the application stack were the same – in fact it was often déjà vu, in that I ended up rewriting (or recreating) the same functionality again and again. The enlightened among you would argue that this is a classic case of a need for a common platform or framework – something on which these applications could be built. But in these organizations at the time, the pressure was to deliver at any cost (time was of the essence), not to deliver a perfect and forward-looking solution. This behavior, repeated over and over again, is often the source of the large legacy software infrastructure at these types of organizations, and the challenge they now face in looking forward.

The Present

The last two to three years have seen a fundamental shift in architectures for software applications – the advent of cloud computing, distributed systems, big data environments and the like have been instrumental in this change. These technologies were available before this time; it’s just that the market has matured to the point where such things are now commodity instead of esoteric.

The Future

In designing a platform to solve (or help transition away from) the problems of the past and to take advantage of the present, we needed to ensure that the platform we design today does not become obsolete as new technologies and approaches emerge. We ensured that the platform has a pluggable model by default – it should be easy to replace one implementation with another. We also ensured that the API to our platform can be called from any language (the transport is open) with little effort. We cannot predict the new languages that will be created, but we can certainly ensure that our platform will be able to leverage those new languages with ease.

Where we are

In releasing Version 1 of Rapture we successfully captured these desires in a platform that can be used with new architectures around distributed processing and common access to data, while still providing a migration path for existing legacy applications. The pluggable architecture, and the options for customizing and enhancing the core platform, make it easy for 3rd-party developers to integrate their services and technology with our common platform. Finally, and most importantly, we have taken the core platform and extended it with key functionality in the Financial Services space – recreating nearly all of that common functionality that so frustrated me in my early career. If I had had Rapture then, I could have spent all of my time building key application functionality (Risk Analysis, Pricing, Trade Capture) instead of most of my time managing data, processing, scheduling, entitlements and audit trails. I could have also delivered my applications via many different means – spreadsheet front ends, web front ends, thick clients – all calling the same common layer that delivers the business logic.


