Incapture Technologies

Inside the Cloud


Execution Server

Published:
November 7, 2014
When designing a system on the Rapture platform, a lot can be done with Reflex scripts, workflows containing built-in feed logic, or client-side applications that connect to Rapture "on demand" as they see fit. There is, however, another class of application that (a) usually needs to run as part of some higher-level "process" or workflow and (b) cannot be directly embedded into the Rapture platform – i.e. it is neither a Reflex script nor a piece of Java code. The Rapture Execution Server, together with its associated workflow plugins, is a technique for bringing these external "batch-style" application fragments into a more controlled execution environment.

The Execution Server is a standalone Rapture application that reacts to specific workflow step initiation messages. These messages tell the Execution Server to kick off a process with a particular (and custom) command line; if necessary, the Execution Server can first download Rapture data to a local set of folders so that the spawned application need not use the Rapture API at all to connect back to the system. Instead, the application can load its input from files and folders and write its output to other files and folders. Once the process finishes, the Execution Server can take that generated output and write it back to Rapture. In this way the Execution Server can handle both applications that use the Rapture API directly and applications that have no knowledge of the Rapture API and interact only with files.
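As a sketch of that file-based contract – with hypothetical folder names and a made-up transformation, not Rapture's actual conventions – the spawned application only ever touches local files, while the Execution Server stages inputs before the run and collects outputs afterwards:

```python
import os

def run_file_based_app(input_dir: str, output_dir: str) -> None:
    """Hypothetical batch application: it knows nothing about the
    Rapture API and communicates purely through files that the
    Execution Server stages (inputs) and collects (outputs)."""
    os.makedirs(output_dir, exist_ok=True)
    for name in sorted(os.listdir(input_dir)):
        # Read an input file staged by the Execution Server...
        with open(os.path.join(input_dir, name)) as f:
            data = f.read()
        # ...apply some application-specific transformation (here a
        # placeholder)...
        result = data.upper()
        # ...and write the result where the Execution Server will pick
        # it up and upload it back to Rapture.
        with open(os.path.join(output_dir, name + ".out"), "w") as f:
            f.write(result)
```

The point of the contract is that the same application could be run by hand from a shell, with no Rapture installation at all, simply by pointing it at the right folders.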

Furthermore, the Execution Server can restrict the paths from which executables may be launched (to control rogue process execution) and can, if needed, download the "executable" from Rapture before the process is kicked off. This is especially useful for script-based applications such as those written in Python.

An example workflow

To show how a workflow that invokes the Execution Server is constructed, we will look at a fragment of a workflow that calls a Python script. The workflow is normally constructed using Rapture API calls or embodied in a feature (the subject of a future blog post); reproduced below is the lower-level configuration that forms the fundamental definition of this workflow:

{
  "workflowURI" : "workflow://idpResearch/workflows/TestModel",
  "semaphoreType" : "WORKFLOW_BASED",
  "semaphoreConfig" : "{\"maxAllowed\":1}",
  "steps" : [ {
    "name" : "setRunId",
    "executable" : "dp_java_invocable://execmgr.RunIdStep",
    "view" : {
      "MODE" : "#DATE"
    },
    "transitions" : [ {
      "name" : "",
      "targetStep" : "Step1_Configuration"
    } ]
  }, {
    "name" : "Step1_Configuration",
    "executable" : "dp_java_invocable://execmgr.ExecProcessStep",
    "view" : {
      "wrapperInfo" : "%{\n  \"appArg\" : \"blob://model.test/python/Test_Configuration_AM.py\",\n  \"appPath\" : \"/usr/bin/python\",\n  \"inputConfig\" : {\n  },\n  \"outputConfig\" : {\n    \"dir_output_pkl\" : \"*blob://model.test/output/${runId}/pkl/$$n$$e\"\n  }\n}"
    },
    "transitions" : [ {
      "name" : "",
      "targetStep" : "$RETURN:ok"
    }, {
      "name" : "error",
      "targetStep" : "$FAIL"
    } ]
  } ],
  "startStep" : "setRunId",
  "category" : "execServer",
  "view" : {
  },
  "defaultAppStatusNamePattern" : "%idpModel/${$__date_string}/test"
}

In this example we have two real workflow steps, and both use the "Java invocable" technique of defining the implementation of the step. With this technique, Java code embedded in the application that processes the workflow is called. Right at the bottom of the definition, the "category" of the workflow is "execServer"; this instructs the workflow system that, by default, step messages will only be received by the Execution Server – and it is the Execution Server that contains the implementation of these steps.

[Diagram: Rapture Execution Server]

The first step, "setRunId", calls the invocable "execmgr.RunIdStep". This is simply a convenience step used in this workflow to define a common "context" for an instance of a workflow, so that output information (such as the status of this run) is segregated from other runs of the same workflow.

If the first step completes successfully, the workflow moves to "Step1_Configuration". In the real workflow this example is taken from there are nine other steps, each invoking applications in slightly different ways. The key to interacting with the Execution Server lies in the view parameters. In a workflow, the view parameters are passed to the implementation of the step; in most cases what is passed, and its format, is quite specific to the step, and this case is no different. The "wrapperInfo" parameter defines:

(a) the application to execute – in this case a Python script stored in a Rapture blob repository
(b) the way to invoke this application – in this case use the python interpreter
(c) any input and output that Execution Server needs to either download before the process executes or after it has finished. In this case the contents of the folder “dir_output_pkl” will be uploaded back to Rapture as blobs in the location given.
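Because the "wrapperInfo" value is itself a JSON document escaped inside a JSON string, it can be hard to read in the workflow definition above. Decoding it (here with Python's standard json module) makes the three parts easier to see; note that treating the leading "%" as a prefix to be stripped before parsing is an assumption based on its appearance in this configuration:

```python
import json

# The escaped "wrapperInfo" string exactly as it appears in the step's
# view parameters above.
wrapper_info = "%{\n  \"appArg\" : \"blob://model.test/python/Test_Configuration_AM.py\",\n  \"appPath\" : \"/usr/bin/python\",\n  \"inputConfig\" : {\n  },\n  \"outputConfig\" : {\n    \"dir_output_pkl\" : \"*blob://model.test/output/${runId}/pkl/$$n$$e\"\n  }\n}"

# Strip the leading "%" (assumed to be a Rapture prefix, as also seen in
# defaultAppStatusNamePattern) and parse the remainder as JSON.
info = json.loads(wrapper_info.lstrip("%"))
print(json.dumps(info, indent=2))
```

The decoded structure shows (a) the application in "appArg", (b) the interpreter in "appPath", and (c) the input/output mappings in "inputConfig" and "outputConfig".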

And so when this step is executed by Execution Server it will perform the following tasks:

1. Setup an area in the local filesystem to store this temporary “run”.
2. Download the python script from Rapture to this area.
3. Download any input documents (there are none in this configuration).
4. Create a configuration file that defines where the input and output files are, including defining where “dir_output_pkl” actually is. The configuration file also contains information about how the process should connect back to Rapture.
5. Spawn the Python process to kick off this script, passing the configuration file location as a parameter.
6. The python script runs. It can use the configuration file to determine where to place any output. In this case it will write a bunch of Python “pickle” files to the folder referenced by “dir_output_pkl”.
7. After the process finishes, Execution Server will look at every file written to dir_output_pkl and upload those files as blobs back to Rapture – into the location

blob://model.test/output/${runId}/pkl/$$n$$e

where $$n$$e is replaced by the name of the file (and its extension) and ${runId} is replaced with the unique runId of this workflow.
8. If all has worked correctly the step will return a successful result, which in this case means that the workflow engine will finish the workflow. (In a larger workflow it will usually move to the next step).
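The naming rule in step 7 can be sketched as a small substitution function. This helper is illustrative, not the Execution Server's actual code, and the run id value used below is made up:

```python
import os

def target_blob_uri(pattern: str, filename: str, run_id: str) -> str:
    """Expand an output pattern such as
    blob://model.test/output/${runId}/pkl/$$n$$e
    for one generated file: $$n stands for the file's name, $$e for its
    extension, and ${runId} for the unique run id of this workflow
    instance."""
    name, ext = os.path.splitext(filename)
    return (pattern
            .replace("${runId}", run_id)
            .replace("$$n", name)
            .replace("$$e", ext))

# One target URI per file that the Execution Server would upload back
# to Rapture (the run id here is hypothetical):
pattern = "blob://model.test/output/${runId}/pkl/$$n$$e"
print(target_blob_uri(pattern, "weights.pkl", "20141107-0001"))
# → blob://model.test/output/20141107-0001/pkl/weights.pkl
```

Applied to every file found in "dir_output_pkl", this gives the complete set of blob URIs written back to Rapture at the end of the run.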

Summary

The Rapture Execution Server is a tool to help a workflow reach out to spawn a remote process to do work as part of a workflow. It provides a control framework around this process and runs it in a consistent way. It is one technique that can be used to integrate applications into the Rapture platform environment.

