Incapture Technologies

Inside the Cloud

Rapture Document Repositories

October 16, 2014

This is part of the Rapture Series of posts, which gives a general overview of the features of Rapture. In this article we describe Rapture's document repository type, which is used to store structured data indexed by a unique text key (a URI).

What is a repository?
In Rapture a repository is a place to store information. A repository has a name (unique in a given Rapture instance) and an underlying implementation. The idea is that application developers interact with a repository using a consistent API and Rapture takes care of the details of how to manage the information in the underlying implementation. The implementation in this case is a database system, and the systems currently supported range from traditional relational databases to the newer “distributed key-value stores”. A list of the technologies currently supported is provided later in this post.

A quick example
Before diving into the details it is worth giving a preview to help set the stage. Although Rapture is a platform it does have an operations web interface that can be used (amongst other things) to browse the data stored in the environment.

In the screen capture below you see a typical view of some of the document repositories in a Rapture environment.

[Screenshot: document repositories listed in the Rapture operations web interface]

As you can see, a Rapture environment typically has many document repositories, and the data is usually divided by purpose. If we expand one of these repositories, we see an implied hierarchy of information, analogous to a file system with folders and files.

[Screenshot: a document repository expanded to show its folder hierarchy]

And the contents of a given document in this repository:

[Screenshot: contents of a document in the repository]

In this specific example we used a document repository to store configuration information used in connecting to Bloomberg and interpreting the results returned by that system.

In a very simple sense a document repository in Rapture is an abstract way of organizing “files” (content) into a series of “folders” (based on their names). Within Rapture this universal naming convention (a URI) is used everywhere when interacting with documents and their repositories.
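To make the naming convention concrete, here is a minimal Java sketch that splits a document URI into its repository name and its “folder” and “file” parts. The `DocUri` class is purely illustrative and is not part of the Rapture API; it simply assumes the `//repository/folder/.../document` shape seen in the examples in this post.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative helper only; this class is NOT part of the Rapture API.
// It assumes URIs of the form //repository/folder/.../document.
final class DocUri {
    final String repository;   // the part after "//", e.g. "bloomberg.configuration"
    final List<String> path;   // the "folders" followed by the final "file" name

    private DocUri(String repository, List<String> path) {
        this.repository = repository;
        this.path = path;
    }

    static DocUri parse(String uri) {
        if (uri == null || !uri.startsWith("//")) {
            throw new IllegalArgumentException("Expected a URI starting with //");
        }
        String rest = uri.substring(2);
        int slash = rest.indexOf('/');
        String repo = slash < 0 ? rest : rest.substring(0, slash);
        if (repo.isEmpty()) {
            throw new IllegalArgumentException("Missing repository name");
        }
        String tail = slash < 0 ? "" : rest.substring(slash + 1);
        List<String> parts = tail.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(tail.split("/"));
        return new DocUri(repo, parts);
    }

    // The last path element, or "" when the URI names only a repository.
    String documentName() {
        return path.isEmpty() ? "" : path.get(path.size() - 1);
    }

    // Everything between the repository and the document name, joined with "/".
    String folder() {
        return path.isEmpty() ? "" : String.join("/", path.subList(0, path.size() - 1));
    }

    public static void main(String[] args) {
        DocUri uri = DocUri.parse("//bloomberg.configuration/equity/index/AEX_Index");
        System.out.println("repository: " + uri.repository);
        System.out.println("folder:     " + uri.folder());
        System.out.println("document:   " + uri.documentName());
    }
}
```

The point of the sketch is only that a single string carries both the repository name and the position of the document in the implied hierarchy.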

Using the API
The Rapture API is the consistent way of interacting with the platform. The API is split into different sections and the section for document repositories is called “doc”. Using the API in a general sense is the subject of a later post but it is worth giving some general observations about how we designed the API for use.

The Rapture API is available in a number of different places. For client applications (programs that are not hosted within Rapture) the client API is used. This API is connected to a Rapture environment through the login section of the API, and once connected the application can use the other API sections to interact with the system. The client API is available for Java, JavaScript (for browsers), .NET, Python, Ruby, Go and VBA. Although the syntax varies slightly, the meaning and use of each of the API calls is consistent.

As an example, the code in Java to retrieve the configuration document in the screen above would be something like:

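     // user and password are assumed to be String variables defined elsewhere
     HttpLoginApi loginApi = new HttpLoginApi("http://rapture", new SimpleCredentialsProvider(user, password));
     ScriptClient sc = new ScriptClient(loginApi);
     // The URI names both the repository (bloomberg.configuration) and the document within it
     String content = sc.getDoc().getContent("//bloomberg.configuration/equity/index/AEX_Index");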

In the Reflex scripting language the environment is already logged in, so the code is much simpler:

content = #doc.getContent("//bloomberg.configuration/equity/index/AEX_Index");
// Or simply, as this is a common task in Reflex
contentMap <-- "//bloomberg.configuration/equity/index/AEX_Index";

Often the content of a document (as is the case here) is a JSON formatted document and there are many utilities in the various language implementations for mapping such a document into a data object or a map/dictionary.

Repository Implementation
Before putting data into a repository it needs to be created or registered within the environment. There is of course an API call to do this – it takes the name of the repository and its configuration:

#doc.createDocumentRepo("//test.doc", [a config string]);

The configuration string defines two things – the underlying implementation technology and the general feature set supported (whether the repository is versioned for instance).

The format of the configuration string is as follows:

[?]REP { [configuration] } USING [implementation] { [configuration] }

Describing all of the options available is beyond the scope of this post but some examples will help clarify the syntax:

NREP {} USING MONGODB {}     // A versioned repository using MONGODB
NREP {} USING MONGODB { prefix="testdoc" }  // A versioned repository using MONGODB on a specific collection
REP {} USING REDIS { prefix="test" } // An unversioned repository using REDIS as a backing store
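As a way of reading the syntax, the following Java sketch pulls a configuration string apart into its repository type, implementation name and options. It is not Rapture's own parser: the `RepoConfigSketch` class, the regular expressions and the `key="value"` option form are assumptions based only on the three examples above, and the versioned check knows only about the `NREP` and `REP` prefixes shown in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader for the configuration string format described in the post.
// This is NOT Rapture's parser; it only handles the shapes shown in the examples.
final class RepoConfigSketch {
    // <TYPE>REP { options } USING <IMPLEMENTATION> { options }
    private static final Pattern CONFIG = Pattern.compile(
            "\\s*([A-Z]*REP)\\s*\\{([^}]*)\\}\\s+USING\\s+(\\w+)\\s*\\{([^}]*)\\}\\s*");
    // Assumed option form: key="value"
    private static final Pattern OPTION = Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    final String repoType;                  // e.g. "NREP" or "REP"
    final String implementation;            // e.g. "MONGODB"
    final Map<String, String> repoOptions;  // options in the first pair of braces
    final Map<String, String> implOptions;  // options in the second pair of braces

    private RepoConfigSketch(String repoType, String implementation,
                             Map<String, String> repoOptions, Map<String, String> implOptions) {
        this.repoType = repoType;
        this.implementation = implementation;
        this.repoOptions = repoOptions;
        this.implOptions = implOptions;
    }

    static RepoConfigSketch parse(String config) {
        Matcher m = CONFIG.matcher(config);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised config: " + config);
        }
        return new RepoConfigSketch(m.group(1), m.group(3), options(m.group(2)), options(m.group(4)));
    }

    private static Map<String, String> options(String body) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = OPTION.matcher(body);
        while (m.find()) {
            out.put(m.group(1), m.group(2));
        }
        return out;
    }

    // Based solely on the examples in this post: NREP is versioned, REP is not.
    boolean isVersioned() {
        return "NREP".equals(repoType);
    }

    public static void main(String[] args) {
        RepoConfigSketch c = RepoConfigSketch.parse("NREP {} USING MONGODB { prefix=\"testdoc\" }");
        System.out.println(c.implementation + " versioned=" + c.isVersioned() + " options=" + c.implOptions);
    }
}
```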

The currently supported implementations include: Cassandra, MongoDB, Postgres, Amazon SimpleDB, Generic JDBC, Redis, MemCached, EhCache and the FileSystem. A pure memory implementation also exists but is solely used for testing.

Repository Features
A document repository can have some or all of the following features: versioning, metadata, type checking and indexing (creating alternate indices to find documents by other criteria). Features that a given use case does not need can be turned off, which can help with the performance and management of that repository. In nearly all cases the API is consistent across all repository types, though of course attempting to retrieve a previous version of a document from an unversioned repository would result in an error.

A tour of the API
The Doc API section of Rapture is used to interact with document repositories. We’ve seen a getContent call and a create repository call. The API is rounded out with calls to (a) manage document repositories (create, destroy, modify), (b) manage documents in a repository (get, put, bulk get and put, delete, get versions, get metadata) and (c) manage the implied hierarchy of the documents in a repository (get children at a particular point in the tree). Each API call is controlled by entitlements so an administrator can configure who can perform each of these tasks.

Although a blog post isn’t the place to describe these calls in detail, it is worth listing them so that the breadth of coverage in the document API set can be appreciated. The goal in Rapture is to have a very open API that can be used to manipulate all aspects of the system, and this is reflected in the number of calls available. Entitlements are used to prevent unintended consequences and unapproved calls. A typical user-level application would use only a small subset of these calls.

For managing repositories:

     // Is the underlying implementation reachable?
     Boolean validate(String raptureURI);
     // Create a document repository
     Boolean createDocumentRepo(String raptureURI, String config);
     // Does a repo configuration with the given name exist in this system?
     Boolean doesDocumentRepoExist(String raptureURI);
     // Retrieve the configuration of a document repository
     DocumentRepoConfig getDocumentRepoConfig(String docRepoURI);
     // Retrieve the status of a repository (implementation specific)
     Map<String, String> getDocumentRepoStatus(String docRepoURI);
     // Retrieve the configurations of all document repositories
     List<DocumentRepoConfig> getAllDocumentRepoConfigs();
     // Remove a repository and its data
     Boolean deleteDocumentRepo(String repoURI);

For managing documents:

     // Retrieve the content given a key (which includes the name of the repository)
     String getContent(String docURI);
     // Store content, potentially overwriting existing content
     String putContent(String docURI, String content);
     // Retrieve a set of data
     List<String> batchGet(List<String> docURIs);
     // Does a set of data exist in the system?
     List<Boolean> batchExists(List<String> docURIs);
     // Does an individual document exist?
     Boolean doesDocumentExist(String documentURI);
     // Put a set of data
     List<Object> batchPutContent(List<String> docURIs, List<String> contents);
     // Remove some content
     Boolean deleteContent(String docURI);
     // Move some content
     String renameContent(String fromDocURI, String toDocURI);
     // Batch move some content
     List<Object> batchRenameContent(String authority, String comment, List<String> fromDocURIs, List<String> toDocURIs);

For repositories that are versioned and have metadata we also have:

     // Remove old versions of documents
     Boolean archiveVersions(String repoURI, int versionLimit, long timeLimit, Boolean ensureVersionLimit);
     // Retrieve a document and its metadata (version, who and when it was written)
     DocumentWithMeta getMetaContent(String docURI);
     // Retrieve just the metadata
     DocumentMetadata getMetaData(String docURI);
     // Replace a document's contents with its previous version
     DocumentWithMeta revertDocument(String docURI);
     // Put content, stating the version the caller expects to overwrite
     // (optimistic locking)
     Boolean putContentWithVersion(String docURI, String content, int currentVersion);
     // Retrieve a set of documents with metadata
     List<DocumentWithMeta> batchGetMetaContent(List<String> docURIs);
     // Add an attribute to a document (a key/value pair not part of the content)
     Boolean addDocumentAttribute(String attributeURI, String value);
     // Retrieve an attribute from a document
     XferDocumentAttribute getDocumentAttribute(String attributeURI);
     // Retrieve all of the attributes of a document
     List<XferDocumentAttribute> getDocumentAttributes(String attributeURI);
     // Remove an attribute from a document
     Boolean removeDocumentAttribute(String attributeURI);
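The optimistic locking call is easiest to understand as a read-modify-write loop. The Java sketch below uses a small in-memory stand-in rather than a real Rapture connection. The stand-in's behaviour is an assumption made for illustration: a missing document counts as version 0, each successful write increments the version by one, and a mismatched version returns false. The post does not specify these details for the real API.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory illustration of optimistic locking. This is NOT the Rapture client;
// version numbering and the false-on-mismatch behaviour are assumptions.
final class OptimisticLockSketch {

    static final class Versioned {
        final int version;
        final String content;

        Versioned(int version, String content) {
            this.version = version;
            this.content = content;
        }
    }

    static final class InMemoryVersionedRepo {
        private final Map<String, Versioned> docs = new HashMap<>();

        synchronized Versioned get(String docURI) {
            return docs.get(docURI);
        }

        // Same shape as putContentWithVersion above: the write is applied only
        // if the caller's idea of the current version matches what is stored.
        synchronized boolean putContentWithVersion(String docURI, String content, int currentVersion) {
            Versioned existing = docs.get(docURI);
            int stored = existing == null ? 0 : existing.version;
            if (stored != currentVersion) {
                return false; // someone else wrote in between; caller must re-read
            }
            docs.put(docURI, new Versioned(stored + 1, content));
            return true;
        }
    }

    // Read the document, compute the new content, and retry if another writer got in first.
    static boolean appendWithRetry(InMemoryVersionedRepo repo, String docURI, String suffix, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Versioned current = repo.get(docURI);
            int version = current == null ? 0 : current.version;
            String base = current == null ? "" : current.content;
            if (repo.putContentWithVersion(docURI, base + suffix, version)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        InMemoryVersionedRepo repo = new InMemoryVersionedRepo();
        repo.putContentWithVersion("//test.doc/a", "v1", 0);
        // A writer holding a stale version is rejected instead of silently overwriting.
        System.out.println("stale write accepted: " + repo.putContentWithVersion("//test.doc/a", "stale", 0));
        System.out.println("retry succeeded: " + appendWithRetry(repo, "//test.doc/a", "+x", 3));
    }
}
```

The design choice this illustrates is that no lock is held between the read and the write; conflicts are detected at write time and handled by the caller.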

And finally for managing and understanding the implicit hierarchies in an environment:

     // Get a list of the files and folders immediately below the point given
     List<RaptureFolderInfo> getChildren(String docURI);
     // Remove the files and folders below the point given
     // If force is not set it will not remove the folder if it is not empty
     List<String> removeFolder(String docURI, Boolean force);
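The hierarchy is “implied” because it comes from the document names rather than from separately stored folders. As a rough illustration, the Java sketch below derives the immediate children of a point in the tree from a flat list of document URIs. It is not how Rapture implements `getChildren`; the `ChildrenSketch` class and the URIs other than `AEX_Index` are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: derives "folders" and "files" from flat document URIs.
// This is NOT Rapture's implementation of getChildren.
final class ChildrenSketch {

    // Returns child name -> true if it is a folder, false if it is a document.
    static Map<String, Boolean> children(List<String> docURIs, String parentURI) {
        String prefix = parentURI.endsWith("/") ? parentURI : parentURI + "/";
        Map<String, Boolean> result = new TreeMap<>();
        for (String uri : docURIs) {
            if (!uri.startsWith(prefix)) {
                continue; // not below the requested point
            }
            String rest = uri.substring(prefix.length());
            if (rest.isEmpty()) {
                continue;
            }
            int slash = rest.indexOf('/');
            if (slash < 0) {
                // No further separator: a document directly below the parent.
                // putIfAbsent keeps the folder flag if the name is also a folder.
                result.putIfAbsent(rest, Boolean.FALSE);
            } else {
                // Something deeper exists, so the first segment acts as a folder.
                result.put(rest.substring(0, slash), Boolean.TRUE);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> uris = Arrays.asList(
                "//bloomberg.configuration/equity/index/AEX_Index",
                "//bloomberg.configuration/equity/index/SPX_Index", // hypothetical
                "//bloomberg.configuration/equity/fields");         // hypothetical
        System.out.println(children(uris, "//bloomberg.configuration/equity"));
        System.out.println(children(uris, "//bloomberg.configuration/equity/index"));
    }
}
```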

Uses of Document Repositories
Document Repositories in Rapture are used for many different things.

Behind the scenes Rapture uses document repositories to store information about the entities in a system, the users, their entitlements and the like. The fact that repositories can be versioned and audited ensures that changes can be managed correctly.

Within typical Rapture systems, document repositories have been used for (a) configuration information, (b) entity management (trading orders, trades, positions, financial asset information, curves) and (c) general status information about things progressing in the system. They are very useful when there is a clear and unique “key” that can be used to describe the entity.

In summary, Rapture document repositories are a key feature of Rapture. They give a separation of responsibility between the application developer (putting and getting content) and the underlying operational concerns (which database vendor to use, how to configure and manage it). This separation allows changes to be made beneath an application without changing that application in any way. The history of changes can be preserved if needed to provide audit trails and provenance. Finally, depending on the implementation, document repositories can be tuned from being a very fast in-memory shared data environment (e.g. Redis) to a fully distributed massive store (e.g. Cassandra) to a more traditional relational form that fits in with existing infrastructure (e.g. Oracle). Over time, as needs change, these implementations can be migrated with little or no change to the applications sitting on top of Rapture.

In the next post we will talk about Series Repositories.
