NCIBI Web Services - Long Running Services

Long Running Services Web Service Guide

The NCIBI Long Running Services are available through a client-side Java API. They can also be accessed as a set of REST-based services. This guide describes both the Java API and the REST URLs; use the REST URLs if you wish to access the services from a scripting language such as Perl.

All of the long running services use a polling model. A user submits a request and receives a unique token, which is then used to query for the results. A request that has not yet started has the status QUEUED. A request that is currently being processed has the status RUNNING. A request that aborted because of an error has the status ERRORED. A completed request has the status DONE and is returned along with its results. In some cases the service may abort a request (for example, due to excessive resource usage); such requests are marked CANCELED.
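The five statuses form a small state machine: QUEUED and RUNNING are transient, while DONE, ERRORED and CANCELED are final. A minimal sketch of that lifecycle, assuming nothing about the real client library: the `JobStatus` enum and its `isTerminal`/`hasResults` methods are hypothetical and separate from the NCIBI `TaskStatus` type used in the LRpath example.

```java
import java.util.EnumSet;

/** Hypothetical model of the request lifecycle described above (not the NCIBI TaskStatus type). */
enum JobStatus {
    QUEUED, RUNNING, DONE, ERRORED, CANCELED;

    /** A terminal status never changes again, so a client can stop polling. */
    boolean isTerminal() {
        return EnumSet.of(DONE, ERRORED, CANCELED).contains(this);
    }

    /** Only completed (DONE) requests come back with results. */
    boolean hasResults() {
        return this == DONE;
    }
}

class JobStatusDemo {
    public static void main(String[] args) {
        for (JobStatus s : JobStatus.values()) {
            System.out.println(s + " terminal=" + s.isTerminal() + " results=" + s.hasResults());
        }
    }
}
```

Treating ERRORED and CANCELED as terminal matters: a loop that waits only for DONE would keep polling forever on a request that failed or was canceled.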

The following Java example submits a request to LRpath and then polls until the request finishes. It omits the details of setting up the input data; the complete code is included in the sample code section below.

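          /*
          ** Create client and submit request. The NCIBI client classes
          ** (NcibiLRPathService, HttpRequestType, Response, RequestStatus,
          ** TaskStatus, LRPathResult) and java.util.List must be imported;
          ** `data` is built as shown in the full example.
          */
          NcibiLRPathService client = new NcibiLRPathService(HttpRequestType.POST);
          Response<String> response = client.submitLRPathRequest(data);
          String uuid = response.getResponseValue();

          /*
          ** Poll every 10 minutes while the task is QUEUED or RUNNING.
          ** Thread.sleep takes milliseconds and throws InterruptedException,
          ** which the enclosing method must handle or declare.
          */
          Response<RequestStatus<List<LRPathResult>>> results = client.lrpathStatus(uuid);
          TaskStatus status = results.getResponseValue().getTask().getStatus();
          while (status == TaskStatus.QUEUED || status == TaskStatus.RUNNING)
          {
              Thread.sleep(10 * 60 * 1000L);
              results = client.lrpathStatus(uuid);
              status = results.getResponseValue().getTask().getStatus();
          }

          /*
          ** Retrieve results. Only DONE requests carry data; ERRORED and
          ** CANCELED requests do not.
          */
          if (status != TaskStatus.DONE)
          {
              throw new IllegalStateException("LRpath request " + uuid + " finished with status " + status);
          }
          List<LRPathResult> lrpathResults = results.getResponseValue().getData();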

All requests follow this general structure, whether you use the provided Java API or write scripts that access the REST URLs directly. The general flow is:

Set up your data.
Submit your data to one of the long running services.
On successful submission, you receive a unique token.
Use the token to poll periodically until your request has finished, then retrieve the results.
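The polling steps can be sketched as a small reusable helper. This is a minimal illustration, not part of the NCIBI client: `Poller`, `pollUntilTerminal`, the plain status strings and the `Supplier` that stands in for a real status call (such as `client.lrpathStatus(uuid)`) are all hypothetical. The helper stops on any final status and gives up after a fixed number of attempts.

```java
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical polling helper; not part of the NCIBI client library. */
class Poller {
    /** Statuses after which a request no longer changes (from the status list above). */
    static final Set<String> TERMINAL = Set.of("DONE", "ERRORED", "CANCELED");

    /**
     * Calls statusCall until it returns a terminal status, sleeping intervalMillis
     * between calls. Throws IllegalStateException after maxAttempts unfinished polls.
     */
    static String pollUntilTerminal(Supplier<String> statusCall, long intervalMillis, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String status = statusCall.get();
            if (TERMINAL.contains(status)) {
                return status;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                    throw new IllegalStateException("polling interrupted", e);
                }
            }
        }
        throw new IllegalStateException("request not finished after " + maxAttempts + " polls");
    }

    public static void main(String[] args) {
        // Fake status source standing in for a real status call: QUEUED, RUNNING, then DONE.
        String[] script = {"QUEUED", "RUNNING", "DONE"};
        int[] calls = {0};
        Supplier<String> fake = () -> script[Math.min(calls[0]++, script.length - 1)];
        System.out.println(pollUntilTerminal(fake, 10, 5)); // prints DONE
    }
}
```

Passing the status call in as a `Supplier` keeps the loop independent of any one service, so the same helper would fit LRpath, Thinkback or a REST script port.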

Some jobs take a long time to run, either because the processing is complex or because many other requests are ahead of yours in the queue. We recommend that you limit how often you poll: the LRpath example above polls once every 10 minutes.
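One way to keep polling light is to start with a short interval and lengthen it after each unfinished poll, up to a cap. The sketch below computes such a capped exponential backoff schedule; the `Backoff` class, the 30-second start, the doubling factor and the 10-minute cap are illustrative assumptions, not values prescribed by NCIBI.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper computing a capped exponential backoff schedule. */
class Backoff {
    /**
     * Returns the first n wait times in seconds: starts at initialSeconds
     * (clamped to the cap), doubles after every poll, never exceeds capSeconds.
     */
    static List<Long> schedule(long initialSeconds, long capSeconds, int n) {
        List<Long> waits = new ArrayList<>();
        long wait = Math.min(initialSeconds, capSeconds);
        for (int i = 0; i < n; i++) {
            waits.add(wait);
            wait = Math.min(wait * 2, capSeconds);
        }
        return waits;
    }

    public static void main(String[] args) {
        // Start at 30 s and cap at a 10-minute (600 s) interval.
        System.out.println(schedule(30, 600, 7)); // [30, 60, 120, 240, 480, 600, 600]
    }
}
```

Short early intervals pick up quick jobs promptly, while the cap ensures that a job sitting in a long queue generates at most one status request every 10 minutes.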

Services

LRpath
Thinkback

Appearance Frequency Modulated Gene Set Enrichment Testing
Density Analysis for Gene Set Enrichment Testing

splitter/mxterminator
parser/stanford
status

LRpath Service

LRpath performs gene set enrichment testing, an approach that tests whether predefined, biologically relevant gene sets contain more significant genes from an experimental dataset than expected by chance. Given a high-throughput dataset with continuous significance values (i.e., p-values), LRpath tests for gene sets (termed concepts) whose genes have significantly higher significance values (e.g., for differential expression) than expected at random. LRpath can identify both concepts that have a few genes with very significant differential expression and concepts containing many genes with only moderate differential expression. This user interface provides a user-friendly implementation of LRpath and greatly expands the set of concepts available to test beyond those in the original publication¹. Genes are mapped to concepts using their Entrez Gene IDs. The predefined gene sets (concept databases) available to test depend on the species; for human, mouse, and rat they include all those used in ConceptGen. Logistic regression allows continuous values for differential expression to be used (a non-threshold-based method) while preserving the interpretation of results in terms of an odds ratio, as with the standard Fisher's exact test.
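To make the odds-ratio interpretation concrete, here is a simplified sketch of the general idea for a single concept: a logistic regression of concept membership (1 if the gene is in the concept, 0 otherwise) on each gene's -log10(p-value), fitted by Newton-Raphson. exp(b1) is then the multiplicative change in the odds of membership per unit increase in -log10(p). The class name, the base-10 transform and the toy data are assumptions for illustration; the LRpath service itself handles details this sketch omits.

```java
/** Hypothetical, simplified LRpath-style test for one concept (illustration only). */
class LogisticConceptTest {
    /**
     * Fits logit(P(member)) = b0 + b1 * x by Newton-Raphson, where
     * x[i] = -log10(p-value of gene i) and member[i] = 1 if gene i is in the concept.
     * Returns {b0, b1}. Assumes the two classes overlap (no perfect separation).
     */
    static double[] fit(double[] x, int[] member) {
        double b0 = 0.0, b1 = 0.0;
        for (int iter = 0; iter < 50; iter++) {
            // Score vector g = X'(y - p) and information matrix H = X'WX, W = p(1 - p).
            double g0 = 0, g1 = 0, h00 = 0, h01 = 0, h11 = 0;
            for (int i = 0; i < x.length; i++) {
                double p = 1.0 / (1.0 + Math.exp(-(b0 + b1 * x[i])));
                double r = member[i] - p;
                double w = p * (1 - p);
                g0 += r;
                g1 += r * x[i];
                h00 += w;
                h01 += w * x[i];
                h11 += w * x[i] * x[i];
            }
            // Newton step: solve the 2x2 system H * delta = g.
            double det = h00 * h11 - h01 * h01;
            double d0 = (h11 * g0 - h01 * g1) / det;
            double d1 = (h00 * g1 - h01 * g0) / det;
            b0 += d0;
            b1 += d1;
            if (Math.abs(d0) + Math.abs(d1) < 1e-10) break;
        }
        return new double[] {b0, b1};
    }

    public static void main(String[] args) {
        // Toy data: concept members tend to have smaller p-values (larger -log10 p).
        double[] negLogP = {0.1, 0.3, 0.5, 0.8, 1.0, 1.5, 2.0, 0.2, 2.5, 3.0, 1.2, 4.0};
        int[] member     = {0,   0,   0,   1,   0,   0,   1,   0,   1,   1,   0,   1};
        double[] b = fit(negLogP, member);
        System.out.printf("slope = %.3f, odds ratio = %.3f%n", b[1], Math.exp(b[1]));
    }
}
```

An odds ratio above 1 indicates that the concept's genes tend to be more significant than the rest; no p-value threshold was needed to reach that conclusion.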

For more details please see the LRpath website.

Thinkback Service (Appearance Frequency Modulated Gene Set Enrichment Testing)

Gene set enrichment analysis has helped bridge the gap from an individual gene to systems biology interpretation of microarray data. Although gene sets are defined a priori based on biological knowledge, all genes are treated as equal in current methods. However, it is well-known that some genes, such as those responsible for housekeeping functions, appear in many pathways, whereas other genes are more specialized and play a unique role in a single pathway.

Drawing inspiration from the field of information retrieval, we developed an approach that incorporates gene appearance frequency (in KEGG) into the Gene Set Enrichment Analysis (GSEA) and logistic-regression-based LRpath frameworks to generate more reproducible and biologically meaningful results.
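The information-retrieval idea here resembles inverse document frequency: a gene that appears in many pathways says less about any single one. A minimal sketch, assuming an IDF-style weight w(g) = log(N / n(g)), where N is the number of pathways and n(g) the number containing gene g. The `AppearanceWeights` class and the toy pathways are hypothetical, and the exact weighting used by Thinkback may differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical IDF-style gene weights from pathway appearance counts. */
class AppearanceWeights {
    /** Weight per gene: log(number of pathways / number of pathways containing the gene). */
    static Map<String, Double> weights(List<Set<String>> pathways) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> pathway : pathways) {
            for (String gene : pathway) {
                counts.merge(gene, 1, Integer::sum); // count pathways containing each gene
            }
        }
        Map<String, Double> w = new HashMap<>();
        counts.forEach((gene, n) -> w.put(gene, Math.log((double) pathways.size() / n)));
        return w;
    }

    public static void main(String[] args) {
        // Toy pathways: "housekeeping" gene A appears in all three; C and D in one each.
        List<Set<String>> pathways = List.of(
                Set.of("A", "B", "C"),
                Set.of("A", "B"),
                Set.of("A", "D"));
        // A gets weight 0.0; C and D get the largest weight, log(3).
        System.out.println(weights(pathways));
    }
}
```

Weights like these could down-weight ubiquitous genes when a gene set is scored, which is the effect the appearance-frequency approach aims for.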

Thinkback Service (Density Analysis for Gene Set Enrichment Testing)

Use of biological knowledge has been shown to be of value in analyzing high-throughput gene expression and other types of genomics data. For instance, Gene Set Enrichment Analysis (GSEA) is widely used to analyze data from gene expression microarrays. However, biological knowledge is typically introduced in a very simple way: for example, all genes in a biological pathway are defined to be in a gene set and considered equivalent for statistical purposes.

We propose a more sophisticated analysis, called Density Scoring (DS), that takes into account the topology of the pathway graph by considering the relative positions of differentially expressed genes over the pathway network. This score is then used to adjust any prior gene set enrichment testing scores. Our experiments over lung and breast cancer microarray data show that the DS-adjusted methods assigned a higher mean rank to cancer-related signaling pathways, compared to the original gene set enrichment testing methods. We also show that DS-adjusted methods can more robustly replicate analysis results across studies.
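For intuition about why topology matters: differentially expressed (DE) genes that sit next to each other in a pathway graph form a denser cluster than the same number of DE genes scattered across it. The sketch below measures this with a plain induced-subgraph density (edges among DE genes divided by the number of possible DE pairs). It is an illustrative stand-in, not the published Density Scoring formula; the `DeDensity` class and the toy graph are hypothetical, and edges are assumed to be listed once with no self-loops.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of DE-gene clustering on a pathway graph (not the DS formula). */
class DeDensity {
    /**
     * Density of the subgraph induced by the DE genes in an undirected pathway graph:
     * (edges with both ends DE) / (k choose 2), where k is the number of DE genes.
     */
    static double inducedDensity(List<String[]> edges, Set<String> de) {
        int k = de.size();
        if (k < 2) return 0.0;
        int inside = 0;
        for (String[] e : edges) {
            if (de.contains(e[0]) && de.contains(e[1])) inside++;
        }
        return inside / (k * (k - 1) / 2.0);
    }

    public static void main(String[] args) {
        // Toy linear pathway A - B - C - D - E.
        List<String[]> edges = List.of(
                new String[] {"A", "B"}, new String[] {"B", "C"},
                new String[] {"C", "D"}, new String[] {"D", "E"});
        System.out.println(inducedDensity(edges, Set.of("A", "B", "C"))); // adjacent DE genes: 2/3
        System.out.println(inducedDensity(edges, Set.of("A", "C", "E"))); // scattered DE genes: 0.0
    }
}
```

Both DE sets have three genes, so a membership-only gene set test would treat them identically; only a topology-aware measure tells them apart.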

For more details please see the Thinkback website.

Splitter/mxterminator Service

The Splitter service performs complete sentence boundary disambiguation, splitting text into individual sentences. It currently uses mxterminator.
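To show the kind of output a sentence splitter produces, here is a local sketch that uses the JDK's rule-based `java.text.BreakIterator` as a stand-in. It is not mxterminator (a trained maximum-entropy model) and may mishandle harder cases such as abbreviations, but it illustrates the contract: one block of text in, a list of sentences out. The `SentenceSplit` class is hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical local sentence splitter built on the JDK's BreakIterator (not mxterminator). */
class SentenceSplit {
    /** Splits text into trimmed, non-empty sentences. */
    static List<String> split(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim(); // drop the trailing space after each sentence
            if (!s.isEmpty()) sentences.add(s);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "TP53 is a tumor suppressor. It is mutated in many cancers.";
        split(text).forEach(System.out::println);
    }
}
```

Because the Parser service accepts only single sentences, splitting longer text first, whether locally or with the Splitter service, is the natural preprocessing step.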

Parser/stanford Service

The Parser service currently exposes the Stanford Parser, a statistical natural language parser. Its input is currently limited to single sentences. More details on the Stanford Parser can be found on its website.

Status Service

All long running service requests return a unique token that is used to query the request's status. The status service returns the status of a request and, if the request has finished processing, its results.

Sample Code for Long Running Services

The following example code shows how to access long running services in Java.

Sample 1: Maven POM.xml for the example Java code
Sample 2: Calling LRpath (Java)
Sample 3: Calling Thinkback (Java), with its dataset file, class file, template file, and chipset file