Supporting long operations in CodeProject.AI Server modules in Python

Chris Maunder


Apr 4, 2024

CPOL


This article will show you how to create a module for CodeProject.AI Server that wraps some code that takes a long time to complete

Introduction

Wrapping some great AI code in a CodeProject.AI module is straightforward for the cases where your code performs a quick inference then returns the results to the server. For cases where the AI operation is longer - for example generative AI - this flow won't work due to timeouts and a generally poor user experience.

This article will show you how to create a module for CodeProject.AI Server that wraps some code that takes a long time to complete. We will focus solely on the code required to write the adapter for our AI code, and not on the AI code itself. For that, and a fun example of an LLM on your desktop, please read the follow-up article by Matthew Dennis, Creating a LLM Chat Module for CodeProject.AI Server.

Getting Started

We're going to assume you have read CodeProject.AI Module creation: A full walkthrough in Python. We'll be creating a module in exactly the same manner, with the small addition that we'll show how to handle long-running processes.

First, as always, clone the CodeProject.AI Server repo and in the /src/modules folder create a new folder for your module. We'll call it PythonLongProcess. A simple name for us simple folk.

We will also assume we have some code that we want to expose via CodeProject.AI Server. The amazing code we'll be wrapping is below:

import time

cancelled = False

def a_long_process(callback):
    global cancelled     # reset the module-level flag, not a local copy of it

    result    = ""
    step      = 0
    cancelled = False

    for i in range(1, 11):
        if cancelled:    # set by cancel_process()
            break

        time.sleep(1)
        step   = 1 if not step else step + 1
        result = str(step) if not result else f"{result} {step}"
        callback(result, step)


def cancel_process():
    global cancelled
    cancelled = True

All the code does is progressively build a string containing the numbers 1 - 10. At each step it checks if the process has been cancelled, and also calls a callback to allow the caller to check on progress. Nothing exciting, but it'll serve as a good demo.
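To see the callback and the cancel check working together outside the server, here is a minimal sketch that runs a similar loop on a background thread and stops it partway through. It is a variant, not the code above: it swaps the module-level flag for a threading.Event, and the names run_steps, cancel_event and on_progress, as well as the delay parameter, are ours and purely illustrative. To keep the demo deterministic the callback itself requests the cancel; in a real module the request would arrive from the server.

```python
import threading
import time

cancel_event = threading.Event()   # hypothetical stand-in for the 'cancelled' flag

def run_steps(callback, steps=10, delay=0.01):
    """Build the string "1 2 3 ..." one step at a time, checking for cancellation."""
    cancel_event.clear()
    result = ""
    for step in range(1, steps + 1):
        if cancel_event.is_set():
            break
        time.sleep(delay)
        result = str(step) if not result else f"{result} {step}"
        callback(result, step)
    return result

progress = []

def on_progress(result, step):
    """Collect interim results, and ask the loop to stop after the third step."""
    progress.append(result)
    if step == 3:
        cancel_event.set()

worker = threading.Thread(target=run_steps, args=(on_progress,))
worker.start()
worker.join()
print(progress[-1])   # prints: 1 2 3
```

The loop only notices the cancel request at the top of the next iteration, so a step that is already underway always finishes first.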

Creating the adapter

We want to wrap this long process code in a CodeProject.AI Server module, so we'll create an adapter, a modulesettings.json file, install scripts and a test page. We'll start with the adapter.

Our adapter will be very bare-bones. We don't need to get values from the caller, there's not a lot of error checking, and we're not going to log any info.

We need to create a ModuleRunner derived class and override the initialize and process methods. To provide support for long processes we also need to override command_status, cancel_command_task, and provide a method that will actually call the long process we're wrapping. It's this last piece that provides long process support in modules.

Long Process support

To allow a CodeProject.AI Server module to handle long processes we do three things:

Signal to the caller, and to the server itself, that a call to a method is going to result in a long process.
Run the long process in the background.
Provide the means to check on its status and cancel if necessary.

To do this we return a Callable from the usual process method, rather than the JSON object that would normally contain the results of the call. Returning a Callable signals to the server that we need to run a method in the background. The caller then needs to poll the module status API to check on progress and, if needed, call the cancel task API to cancel the long-running process.
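To make the idea concrete, here is a minimal sketch of how a host might treat a returned Callable. TinyDispatcher is hypothetical and is not how ModuleRunner or the server is actually implemented. The commandId and commandStatus field names are borrowed from the test page later in this article, while the "running" status value is an assumption.

```python
import threading
import uuid

class TinyDispatcher:
    """Hypothetical stand-in for the server-side plumbing; not the real ModuleRunner."""

    def __init__(self):
        self.tasks = {}   # command id -> {"thread": Thread, "output": dict or None}

    def dispatch(self, process, data):
        outcome = process(data)
        if not callable(outcome):
            return outcome            # quick inference: pass the result straight back

        # A Callable came back, so run it in the background and reply with an id
        command_id = str(uuid.uuid4())
        task = {"output": None}

        def run():
            task["output"] = outcome(data)

        task["thread"] = threading.Thread(target=run, daemon=True)
        self.tasks[command_id] = task
        task["thread"].start()
        return {"success": True, "commandId": command_id, "commandStatus": "running"}

    def status(self, command_id):
        task = self.tasks[command_id]
        if task["thread"].is_alive():
            return {"success": True, "commandStatus": "running"}
        return {"success": True, "commandStatus": "completed", **task["output"]}

def quick(data):
    return {"success": True, "result": "done"}

def slow(data):
    def work(data):
        return {"success": True, "result": "1 2 3"}
    return work                       # returning a Callable marks a long process

dispatcher  = TinyDispatcher()
quick_reply = dispatcher.dispatch(quick, None)   # the result itself
slow_reply  = dispatcher.dispatch(slow, None)    # only an id to poll with
dispatcher.tasks[slow_reply["commandId"]]["thread"].join()
final = dispatcher.status(slow_reply["commandId"])
print(quick_reply, final, sep="\n")
```

The point of the sketch is the callable() check: the same process entry point serves both quick and long operations, and only the return type decides which path is taken.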

To check a module's status you make an API call to /v1/<moduleid>/get_command_status.
To cancel a long process you make an API call to /v1/<moduleid>/cancel_command.

These routes are automatically added to each module and do not need to be defined in the module settings files. The calls will map to the module's command_status and cancel_command_task methods respectively.
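A caller in Python might drive these routes along the following lines. This is a sketch under stated assumptions, not a definitive client: the port comes from the test page later in this article, the commandId, moduleId, commandStatus and result fields are the ones that page reads, and we assume the server accepts form-encoded POST bodies and that the start route is reachable at /v1/ followed by the route from modulesettings.json. The function names and the injectable post parameter are ours.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:32168/v1"   # port taken from the test page; adjust to suit

def post_form(url, fields):
    """POST form-encoded fields and decode the JSON reply (assumed wire format)."""
    body = urllib.parse.urlencode(fields).encode()
    with urllib.request.urlopen(urllib.request.Request(url, data=body)) as reply:
        return json.load(reply)

def run_long_process(start_route, post=post_form, poll_seconds=1.0, on_update=print):
    """Start the long process, then poll its status until it completes or fails."""
    started = post(f"{BASE_URL}/{start_route}", {})
    ids = {"commandId": started["commandId"], "moduleId": started["moduleId"]}
    status_url = f"{BASE_URL}/{ids['moduleId']}/get_command_status"

    while True:
        time.sleep(poll_seconds)
        status = post(status_url, ids)
        if not status.get("success"):
            raise RuntimeError(status.get("error", "No response from server"))
        if status.get("commandStatus") == "failed":
            raise RuntimeError(status.get("error", "Unknown error"))
        on_update(status.get("result", ""))      # interim or final result
        if status.get("commandStatus") == "completed":
            return status

def cancel_long_process(ids, post=post_form):
    """Ask the module to stop the command identified by ids."""
    return post(f"{BASE_URL}/{ids['moduleId']}/cancel_command", ids)
```

With a server running you would call run_long_process("pythonlongprocess/long-process"). Passing a different post function makes it easy to exercise the polling loop without a server at all.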

The Code

Here is the (mostly) complete listing for our adapter. Note the usual initialize and process methods, as well as the long_process method which is returned from process to signal a long process is starting.

Within long_process we don't do much other than call the code we're wrapping (a_long_process) and report back the results.

The command_status and cancel_command_task methods are equally simple: return what we have so far, and cancel the long operation if requested.

The final piece is our long_process_callback, which we pass to a_long_process. It receives updates from a_long_process and gives us the chance to collect interim results.

import time

# ... other imports go here

# Import the method of the module we're wrapping
from long_process import a_long_process, cancel_process

class PythonLongProcess_adapter(ModuleRunner):

    def initialize(self) -> None:
        # Results from the long process
        self.result      = None
        self.step        = 0
        # Process state
        self.cancelled   = False
        self.stop_reason = None

    def process(self, data: RequestData) -> JSON:
        # This is a long process module, so all we need to do here is return the
        # long process method that will be run
        return self.long_process

    def long_process(self, data: RequestData) -> JSON:
        """ This calls the actual long process code and returns the results """
        self.cancelled   = False
        self.stop_reason = None
        self.result      = None
        self.step        = 0

        start_time = time.perf_counter()
        a_long_process(self.long_process_callback)
        inferenceMs : int = int((time.perf_counter() - start_time) * 1000)

        if self.stop_reason is None:
            self.stop_reason = "completed"

        response = {
            "success":     True, 
            "result":      self.result,
            "stop_reason": self.stop_reason,
            "processMs":   inferenceMs,
            "inferenceMs": inferenceMs
        }

        return response

    def command_status(self) -> JSON:
        """ This method will be called regularly during the long process to provide updates """
        return {
            "success": True, 
            "result":  self.result or ""
        }

    def cancel_command_task(self):
        """ This process is called when the client requests the process to stop """
        cancel_process()
        self.stop_reason = "cancelled"
        self.force_shutdown = False  # Tell ModuleRunner we'll shut ourselves down



    def long_process_callback(self, result, step):
        """ We'll provide this method as the callback for the a_long_process() 
            method in long_process.py """
        self.result = result
        self.step   = step


if __name__ == "__main__":
    PythonLongProcess_adapter().start_loop()

Create the modulesettings.json files

Again, make sure you've reviewed A full walkthrough in Python and The ModuleSettings files. Our modulesettings file is very basic, with the interesting bits being:

The path to our adapter, which will be used to launch the module, is long_process_demo_adapter.py
We'll run under python3.9
We'll define a route "pythonlongprocess/long-process" that takes a command "command" that doesn't accept any input values and returns a string "reply"
It can run on all platforms

{
  "Modules": {
 
    "PythonLongProcess": {
      "Name": "Python Long Process Demo",
      "Version": "1.0.0",
 
      "PublishingInfo" : {
         ... 
      },
 
      "LaunchSettings": {
        "FilePath":    "long_process_demo_adapter.py",
        "Runtime":     "python3.9",
      },
 
      "EnvironmentVariables": {
         ...
      },
 
      "GpuOptions" : {
         ...
      },
      
      "InstallOptions" : {
        "Platforms": [ "all" ],
        ...
      },
  
      "RouteMaps": [
        {
          "Name": "Long Process",
          "Route": "pythonlongprocess/long-process",
          "Method": "POST",
          "Command": "command",
          "MeshEnabled": false,
          "Description": "Demos a long process.",
          
          "Inputs": [
          ],
          "Outputs": [
            {
              "Name": "success",
              "Type": "Boolean",
              "Description": "True if successful."
            },
            {
              "Name": "reply",
              "Type": "Text",
              "Description": "The reply from the model."
            },
            ...
          ]
        }
      ]
    }
  }
}

There's a fair bit of boilerplate that has been removed from this snippet, so please refer to the source code to see the full Monty.

The installation scripts

We don't actually have any installing to do for our example. When this module is downloaded, the server will unpack it, move the files to the correct folder, and then run the install script so we can perform any actions needed to set up the module. We need do nothing, so we'll include empty scripts: a Windows batch file, followed by its bash equivalent. Not including a script will signal to the server that this module should not be installed.

@if "%1" NEQ "install" (
    echo This script is only called from ..\..\setup.bat
    @goto:eof
)
call "!sdkScriptsDirPath!\utils.bat" WriteLine "No custom setup steps for this module." "!color_info!"

if [ "$1" != "install" ]; then
    read -t 3 -p "This script is only called from: bash ../../setup.sh"
    echo
    exit 1 
fi
writeLine "No custom setup steps for this module" "$color_info"

Create the CodeProject.AI Test page (and the Explorer UI)

We have the code we wish to wrap and expose to the world, an adapter to do this, a modulesettings.json file to define how to setup and start our adapter, and our install scripts. The final piece is the demo page that allows us to test our new module.

Our demo page (explore.html) is as basic as it gets: a button to start the long process, a button to cancel, and an output pane to view the results.

<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta charset="utf-8" />
    <title>Python Long Process demo module</title>

    <link id="bootstrapCss" rel="stylesheet" type="text/css" href="http://localhost:32168/assets/bootstrap-dark.min.css">
    <link rel="stylesheet" type="text/css" href="http://localhost:32168/assets/server.css?v=2.6.1.0">
    <script type="text/javascript" src="http://localhost:32168/assets/server.js"></script>
    <script type="text/javascript" src="http://localhost:32168/assets/explorer.js"></script>

    <style>
/* START EXPLORER STYLE */
/* END EXPLORER STYLE */
    </style>

</head>
<body class="dark-mode">
<div class="mx-auto" style="max-width: 800px;">
    <h2 class="mb-3">Python Long Process demo module</h2>
    <form method="post" action="" enctype="multipart/form-data" id="myform">

<!-- START EXPLORER MARKUP -->
        <div class="form-group row g-0">
            <input id="_MID_things" class="form-control btn-success" type="button" value="Start long process"
                   style="width:9rem" onclick="_MID_onLongProcess()"/>
            <input id="_MID_cancel" class="form-control btn-warn" type="button" value="Cancel"
                   style="width:5rem" onclick="_MID_onCancel()"/>
        </div>
<!-- END EXPLORER MARKUP -->
        <div>
            <h2>Results</h2>
            <div id="results" name="results" class="bg-light p-3" style="min-height: 100px;"></div>
        </div>

    </form>

    <script type="text/javascript">
// START EXPLORER SCRIPT

        let _MID_params = null;

        async function _MID_onLongProcess() {

            if (_MID_params) {
                setResultsHtml("Process already running. Cancel first to start a new process");
                return;
            }

            setResultsHtml("Starting long process...");
            let data = await submitRequest('pythonlongprocess/long-process', 'command', null, null);
            if (data) {

                _MID_params = [['commandId', data.commandId], ['moduleId', data.moduleId]];

                let done = false;

                while (!done) {
                    
                    await delay(1000);

                    if (!_MID_params)    // may have been cancelled
                        break;

                    let results = await submitRequest('pythonlongprocess', 'get_command_status',
                                                        null, _MID_params);
                    if (results && results.success) {

                        if (results.commandStatus == "failed") {
                            done = true;
                            setResultsHtml(results?.error || "Unknown error");
                        } 
                        else {
                            let message = results.result;
                            if (results.commandStatus == "completed")
                                done = true;

                            setResultsHtml(message);
                        }
                    }
                    else {
                        done = true;
                        setResultsHtml(results?.error || "No response from server");
                    }
                }

                _MID_params = null;
            };
        }

        async function _MID_onCancel() {
            if (!_MID_params)
                return;
				
            let moduleId = _MID_params[1][1];
            let result = await submitRequest(moduleId, 'cancel_command', null, _MID_params);
            if (result?.success) {
                _MID_params = null;
                setResultsHtml("Command stopped");
            }
        }
// END EXPLORER SCRIPT
    </script>
</div>
</body>
</html>

Conclusion

Wrapping code that takes a long time to execute in a CodeProject.AI module is straightforward thanks to help from the server. It helps enormously if the code you are wrapping provides a means of regularly querying its progress, but even that isn't necessary (though the user experience will suffer a little).
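When the wrapped code offers no progress hook at all, a status method can still report something. The sketch below is one possible approach rather than anything from the CodeProject.AI SDK: OpaqueTask and its methods are hypothetical names, and the status reply simply reuses the success and result fields from our adapter. It runs a blocking function in the background and reports elapsed time until the function returns.

```python
import threading
import time

class OpaqueTask:
    """Runs a blocking function that has no progress hook and reports elapsed time."""

    def __init__(self, work):
        self.work    = work
        self.output  = None
        self.started = None
        self.thread  = None

    def start(self):
        self.started = time.perf_counter()
        self.thread  = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        self.output = self.work()

    def status(self):
        if self.thread.is_alive():
            elapsed = int(time.perf_counter() - self.started)
            return {"success": True, "result": f"Working... {elapsed}s elapsed"}
        return {"success": True, "result": self.output}

# Stand-in for slow code: blocks until we release it, then returns its answer
release = threading.Event()
task = OpaqueTask(lambda: release.wait() and "all done")

task.start()
during = task.status()    # still running: only elapsed time is available
release.set()
task.thread.join()
after = task.status()     # finished: the real result
print(during, after, sep="\n")
```

The caller sees only a ticking clock until the work finishes, which is the reduced user experience mentioned above.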

We've used the long process support to wrap a text-to-image module using Stable Diffusion, and the Llama large language model to provide ChatGPT functionality on your desktop. The only additional work beyond writing a standard CodeProject.AI Server module was adding the methods to the adapter to check status and cancel if necessary, plus the code to actually call these methods in our test HTML page.

Long process support is perfect for generative AI solutions, but is also useful where you wish to support AI operations on low-spec hardware. While OCR, for instance, might take a fraction of a second on a decent machine, running the same text detection and recognition models on large amounts of data on a Raspberry Pi could take a while. Offering the functionality via a long process module can provide a better user experience and avoid issues of HTTP timeouts.