Incapture Technologies

Engineering

Incapture Technologies Blog

 

Utilizing Java Watch Service

At Incapture we often implement data ingestion workflows for clients, typically as part of a larger re-engineering effort. Frequently, this involves waiting for file-based data to arrive from another system or vendor and then loading it. This is where Java’s Watch Service comes into play. Recently I was reading about the Watch Service, which is included in the java.nio.file package, and thought it could help us with client engagements.

Watch Service allows you to monitor directories and choose which types of events you want to be notified about. The event types are create, modify and delete; more details here.
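As a rough illustration of the API, the sketch below registers a directory for all three event types and collects the names of newly created files. It is a minimal example under stated assumptions, not the WatchServer implementation: the class name `WatchSketch` and the helpers `register` and `pollCreated` are made up for this post, and the demo watches a throwaway temp directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class for illustration; not part of WatchServer.
public class WatchSketch {

    /** Registers dir for create, modify and delete notifications. */
    public static WatchService register(Path dir) throws IOException {
        WatchService watcher = dir.getFileSystem().newWatchService();
        dir.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY,
                StandardWatchEventKinds.ENTRY_DELETE);
        return watcher;
    }

    /** Waits up to the timeout for one batch of events; returns names of created entries. */
    public static List<Path> pollCreated(WatchService watcher, long timeout, TimeUnit unit)
            throws InterruptedException {
        List<Path> created = new ArrayList<>();
        WatchKey key = watcher.poll(timeout, unit);
        if (key == null) {
            return created; // nothing happened within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                // For ENTRY_CREATE the context is the entry name relative to the watched directory
                created.add((Path) event.context());
            }
        }
        key.reset(); // re-arm the key so later events are delivered
        return created;
    }

    public static void main(String[] args) throws Exception {
        // Demo on a temp directory; a real server would watch a folder such as /opt/test.
        Path dir = Files.createTempDirectory("watch-demo");
        try (WatchService watcher = register(dir)) {
            Files.createFile(dir.resolve("SamplePriceData.xlsx"));
            for (Path name : pollCreated(watcher, 30, TimeUnit.SECONDS)) {
                System.out.println("created: " + dir.resolve(name));
            }
        }
    }
}
```

A long-running server would call `pollCreated` in a loop and hand each new file to whatever action is configured for that directory. Note that on some platforms (macOS, for example) the JDK implements the Watch Service by polling, so notifications can lag by several seconds.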

We have released ‘WatchServer’ as part of our open source platform. The server provides a file system monitoring capability that maps file system events to Rapture actions in a repeatable and configurable fashion.

Typically the action would be a Workflow. As a reminder, ‘Workflows’ in Rapture:

  • Are constructs that define a set of tasks (or steps) that need to be performed in some order
  • Contain steps that can be implemented in various languages (Reflex, Java, Python etc.)
  • Contain state that can be updated by each step
  • Manage step switching and execution via an internal pipeline
  • Can be initiated using Workflow API or attached to an event

There are many use cases we could support with this architecture plus Rapture platform capabilities, some of which are:

  • Loading csv file(s) to create time series accessible via Rapture’s Series API
  • Loading pdf file(s), indexing them and making them searchable via Rapture’s Search API
  • Loading xml file(s) and transforming to (json) documents accessible via Rapture’s Document API

To illustrate, I’ve developed a workflow that loads a SamplePriceData.xlsx file, extracts the data from each row and creates a (json) document for that row in a Rapture document repository.

The WatchServer detects ENTRY_CREATE events and runs the workflow, which performs two steps:

  1. Loads the file from /opt/test and stores it in a Rapture blob repository at blob://archive/yyyyMMdd_HHmmss/SamplePriceData.xlsx
  2. Creates a Rapture document repository containing one document for each row in the spreadsheet, at document://data/yyyyMMdd_HHmmss/ROW000001..N. This step uses Apache POI, a Java API for Microsoft documents, to extract data from the spreadsheet.
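The URI layout used by the two steps can be sketched as follows. This only shows how the timestamp and row suffix might be formatted; the class and method names are hypothetical, and whether the real workflow uses local time or UTC for the timestamp is an assumption left open here.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; the actual workflow code may differ.
public class LoadUriSketch {

    // Same pattern as in the URIs above: yyyyMMdd_HHmmss
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    /** URI under which the incoming file is archived as a blob. */
    public static String blobUri(LocalDateTime loadedAt, String fileName) {
        return "blob://archive/" + STAMP.format(loadedAt) + "/" + fileName;
    }

    /** URI of the document holding one spreadsheet row (row numbers start at 1). */
    public static String rowUri(LocalDateTime loadedAt, int rowNumber) {
        // %06d zero-pads the row number to six digits, e.g. ROW000001
        return String.format("document://data/%s/ROW%06d", STAMP.format(loadedAt), rowNumber);
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        System.out.println(blobUri(now, "SamplePriceData.xlsx"));
        System.out.println(rowUri(now, 1));
    }
}
```

Using one timestamp for both the blob and the documents keeps each load's archive copy and its extracted rows under matching paths.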

It is straightforward to set up and run locally by following the process set out in the README.md, which uses images from Incapture’s public Docker Hub account. Make sure to install Docker on your local system first! I use Docker for Mac.

After the workflow has run once, you can view the results in the default Rapture system UI at http://localhost:8000.

The archived xlsx file saved as a blob:

[Screenshot: archive repository]

And the resulting documents created in the document://data repository:

[Screenshot: documents in the document://data repository]

Using WatchServer in conjunction with Workflows gives you a flexible but defined approach to implementing your domain-specific data loading processes. You also benefit from the built-in operational support Rapture provides.

If you’d like more information about Incapture or Rapture, please email me at jonathan.major@incapturetechnologies.com or write to our general email address, info@incapturetechnologies.com, and we will get back to you for a more in-depth discussion.


Rapture and REST

At Incapture we implemented a REST server to demonstrate exposing Rapture (Kernel) calls through a REST-style interface, specifically to perform CRUD operations on the various Rapture data types: document, blob and series. This approach can be used when modeling and implementing your own Rapture client’s domain resources and interactions.

We wanted to use a simple and straightforward REST framework, so we chose http://sparkjava.com/. This allows you to get started quickly and provides everything needed to build an API.

Let’s focus on Rapture ‘Documents’. One of the prime uses of Rapture is to manage access to data, and it does so through the concept of a repository. Various repositories, configurations and implementations are provided ‘out of the box’. For the purposes of this post we will consider a versioned document repository hosted on MongoDB.

Document data repositories manage data as key/value pairs and are addressable through URIs. In fact, all data in a Rapture system is uniquely addressable via a URI, which is a key concept in using the platform.

For example, consider the following document with URI document://orders/ORD000023312 and data:

{
    "id" : "ORD000023312",
    "orderDate" : "20150616",
    "ordType" : "market",
    "side" : "buy",
    "quantity" : 4000000.0,
    "strategy" : "XYZ",
    "fund" : "FUNDNAME",
    "status" : "FILLED"
}
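To make the addressing scheme concrete, here is a small sketch that splits such a URI into its parts using the JDK's `java.net.URI`. This is only an illustration: Rapture has its own `RaptureURI` class for this purpose, and the class and method names below are hypothetical.

```java
import java.net.URI;

// Hypothetical helper for illustration; Rapture's own RaptureURI class is not used here.
public class RaptureUriSketch {

    /** The data type, e.g. "document", "blob" or "series". */
    public static String scheme(String uri) {
        return URI.create(uri).getScheme();
    }

    /** The repository name, e.g. "orders". */
    public static String authority(String uri) {
        return URI.create(uri).getAuthority();
    }

    /** The key within the repository, without the leading slash. */
    public static String docPath(String uri) {
        String path = URI.create(uri).getPath();
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        String uri = "document://orders/ORD000023312";
        System.out.println(scheme(uri) + " | " + authority(uri) + " | " + docPath(uri));
    }
}
```

The scheme names the data type, the authority names the repository, and the remaining path is the key of the item within that repository.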


Let’s look at the process to create a document repository and load a document.

The first step is to spin up a local Rapture system; this can be done easily using Docker. The steps are set out in this README.md. All the Docker images are available on Incapture’s public Docker Hub registry.

So let’s begin the process of:

  1. Creating a Document repository using a POST action
  2. Adding a document using a PUT action
  3. Using GET action to retrieve the data
  4. Deleting the document

A Postman collection is available with working API calls. Please note that it uses https://localhost, as we’re using Docker’s native (Mac) tools. The Postman collection includes a /login call and provides all the necessary body inputs (raw, application/json).


The first task is to create a versioned Document repository configured to use MongoDB.  The REST call is as follows:

    POST /doc/:authority
    Example: /doc/orders
    Body: {"config":"NREP USING MONGODB {prefix=\"orders\"}"}

The server will route this call and create this repository: document://orders

Here is the (Spark) method implementing the route; note the Rapture Kernel calls:

    // Assumes: import static spark.Spark.post; import static spark.Spark.halt; import java.util.Map;
    post("/doc/:authority", (req, res) -> {
        log.info(req.body());
        // Parse the JSON body into a map so the "config" entry can be read
        Map<String, Object> data = JacksonUtil.getMapFromJson(req.body());
        String authority = req.params(":authority");
        String config = (String) data.get("config");
        CallingContext ctx = getContext(req);
        // Refuse to overwrite an existing repository
        if (Kernel.getDoc().docRepoExists(ctx, authority)) {
            halt(409, String.format("Repo [%s] already exists", authority));
        }
        Kernel.getDoc().createDocRepo(ctx, authority, config);
        // Return the URI of the new repository
        return new RaptureURI(authority, Scheme.DOCUMENT).toString();
    });

Next we will create a new ‘order’ document at URI document://orders/ORD000023312. The body for the call is provided in the Postman collection.

    PUT /doc/:uri
    Example: /doc/orders/ORD000023312
    Body: {..order json here..}

Note the Rapture Kernel call that writes the document, putDoc, which the route invokes with the calling context, the document URI and the request body:

    put("/doc/*", (req, res) -> {
        return Kernel.getDoc().putDoc(getContext(req), getDocUriParam(req), req.body());
    });

We won’t go through the subsequent GET and DELETE calls, as the Postman collection and GitHub code are available to review.
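For readers who prefer code to Postman, the four calls can be sketched with the JDK's `java.net.http` request builder (Java 11+). This is a rough sketch under stated assumptions, not the project's client: the class and method names are hypothetical, the GET and DELETE paths are assumed to mirror the PUT path, and the /login step and the actual sending of the requests are left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

// Hypothetical client-side sketch; it only builds the requests and does not send them.
public class RestCallsSketch {

    // Body of the repository-creation call; the inner quotes stay escaped inside the JSON string.
    public static final String REPO_CONFIG =
            "{\"config\":\"NREP USING MONGODB {prefix=\\\"orders\\\"}\"}";

    /** POST /doc/:authority creates the repository. */
    public static HttpRequest createRepo(String baseUrl, String authority, String configJson) {
        return jsonRequest(baseUrl + "/doc/" + authority)
                .POST(BodyPublishers.ofString(configJson)).build();
    }

    /** PUT /doc/<authority>/<id> writes a document. */
    public static HttpRequest putDoc(String baseUrl, String docPath, String docJson) {
        return jsonRequest(baseUrl + "/doc/" + docPath)
                .PUT(BodyPublishers.ofString(docJson)).build();
    }

    /** Assumption: GET uses the same path as PUT. */
    public static HttpRequest getDoc(String baseUrl, String docPath) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/doc/" + docPath)).GET().build();
    }

    /** Assumption: DELETE uses the same path as PUT. */
    public static HttpRequest deleteDoc(String baseUrl, String docPath) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/doc/" + docPath)).DELETE().build();
    }

    private static HttpRequest.Builder jsonRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url)).header("Content-Type", "application/json");
    }

    public static void main(String[] args) {
        String base = "https://localhost";
        HttpRequest[] calls = {
            createRepo(base, "orders", REPO_CONFIG),
            putDoc(base, "orders/ORD000023312", "{\"id\":\"ORD000023312\"}"),
            getDoc(base, "orders/ORD000023312"),
            deleteDoc(base, "orders/ORD000023312")
        };
        for (HttpRequest call : calls) {
            System.out.println(call.method() + " " + call.uri());
        }
    }
}
```

To actually send these, you would pass each request to an `HttpClient`, after logging in and with whatever session and certificate handling the local Docker setup requires.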

Links:

  1. RESTServer GitHub repository
  2. Setting up local Docker environment
  3. Postman collection

If you’d like more information about Incapture or Rapture, please email me or write to our general email address, info@incapturetechnologies.com, and we will get back to you for a more in-depth discussion.


