Guide to Workflow

Workflow and Table

The workflow builder provides two actions for interacting with tables:

Load tables as virtual database, and output schema: This action loads the schema of the virtual database that contains the table and outputs it in Database Markup Language (DBML).

Query virtual database: This action runs the generated SQL query against the virtual database and outputs the result.

Between these two actions, you can use any LLM to generate the SQL query. The second action then runs that query against the virtual database and returns the result, which can serve as input to the next action. Because the first action supplies the schema, the LLM knows which tables and columns exist, which makes the generated SQL much more likely to be correct.
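An LLM's raw reply often wraps the SQL in Markdown code fences or adds an explanation around it. A small Vanilla JavaScript step between the LLM action and "Query virtual database" can extract and sanity-check the query first. Below is a minimal TypeScript sketch (drop the type annotations to use it as plain JavaScript); the helper name extractSql and the read-only SELECT/WITH policy are assumptions for illustration, not ReByte features.

```typescript
// Hypothetical helper: pull a single SQL statement out of raw LLM text.
// Assumes the model may wrap the SQL in Markdown code fences.
export function extractSql(llmOutput: string): string {
  // Prefer the contents of the first fenced block, if there is one.
  const fenced = llmOutput.match(/`{3}(?:sql)?\s*([\s\S]*?)`{3}/i);
  // Trim whitespace and drop a single trailing semicolon.
  const sql = (fenced ? fenced[1] : llmOutput).trim().replace(/;\s*$/, "");

  // Assumed policy: only read-only queries may reach the database.
  if (!/^(select|with)\b/i.test(sql)) {
    throw new Error(`Refusing to run non-SELECT statement: ${sql.slice(0, 40)}`);
  }
  // Naive single-statement check (also rejects ';' inside string literals).
  if (sql.includes(";")) {
    throw new Error("Expected exactly one SQL statement");
  }
  return sql;
}
```

The string checks are deliberately simple; restricting the database user's permissions is a sturdier safeguard than pattern matching.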

Collaborative Workflow Builder

Edit a workflow as if you were editing a Google Doc.

The workflow builder is essentially a rich-text document editor enhanced with runnable actions. Like the Google Docs editor, it supports real-time collaboration: you can invite team members to edit a workflow together and see each other's changes as they happen. This is especially useful for writing prompts and debugging workflows as a team, because members with different expertise can each contribute to the prompt in their own way.

Runnable Actions

Actions supported by ReByte's workflow builder:
- LLM Actions
  - Language Model Interface: supports OpenAI, Google, Claude, and DeepSeek models
- Data Actions
  - Dataset Loader: loads predefined datasets for later processing
  - File Loader: extracts, transforms, and loads user-provided files
  - Virtual Database: translates the user's natural-language query into SQL and executes it on the user's database
  - Semantic Search: searches for similar content in the user's knowledge base
- Tool Actions
  - Search Engine: searches for information on Google or Bing
  - Web Crawler: crawls web pages and extracts information
  - HTTP Request Maker: makes HTTP requests to any public or private API
- Control Flow Actions
  - Loop Until: runs actions repeatedly until a condition is met
  - Parallel: executes multiple actions in parallel
  - Vanilla JavaScript: executes plain JavaScript code, useful for pure data transformation
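As an illustration of the "pure data transformation" a Vanilla JavaScript action is suited for, here is a TypeScript sketch that turns results from a search or crawler action into a numbered context block for a later LLM prompt. The SearchResult shape and the function name toPromptContext are hypothetical; ReByte's actual action output fields may differ.

```typescript
// Hypothetical shape of results from a search or crawler action.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Pure transformation: drop duplicate URLs, keep the first `limit` results,
// and format a numbered context block that a later LLM action can cite.
export function toPromptContext(results: SearchResult[], limit = 3): string {
  const seen = new Set<string>();
  const unique = results.filter((r) => {
    if (seen.has(r.url)) return false;
    seen.add(r.url);
    return true;
  });
  return unique
    .slice(0, limit)
    .map((r, i) => `[${i + 1}] ${r.title} (${r.url})\n${r.snippet.trim()}`)
    .join("\n\n");
}

const context = toPromptContext([
  { title: "A", url: "https://a.example", snippet: " first " },
  { title: "A again", url: "https://a.example", snippet: "duplicate" },
  { title: "B", url: "https://b.example", snippet: "second" },
]);
console.log(context);
```

Because the function has no side effects, it can be unit-tested outside the workflow and pasted into the action without the type annotations.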

Test Driven Development For Workflow

A dataset is a collection of JSON data that actions can use. The most important dataset is the workflow input test dataset: the data used to run the workflow every time you run it from the workflow builder UI. Think of the input dataset as the test cases for your workflow; it should cover all the scenarios your workflow will face in production.
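To make "cover all the scenarios" concrete, the TypeScript sketch below models an input test dataset whose entries use the required role and content fields of the autopilot input format described later in this guide, plus a check for scenarios that no entry covers yet. The scenario label and the missingScenarios helper are hypothetical bookkeeping, not ReByte dataset fields.

```typescript
// Minimal message shape: the required fields of the autopilot format.
interface ChatMessage {
  role: string;
  content: string;
}

// Hypothetical test dataset: each entry is one workflow input.
// The "scenario" label is our own bookkeeping, not a ReByte field.
const testDataset: { scenario: string; input: ChatMessage }[] = [
  { scenario: "happy path", input: { role: "user", content: "Top 5 customers by revenue in 2023?" } },
  { scenario: "ambiguous", input: { role: "user", content: "How are we doing?" } },
  { scenario: "empty input", input: { role: "user", content: "" } },
  { scenario: "off-topic", input: { role: "user", content: "Write me a poem." } },
];

// Return the required scenarios that no dataset entry covers yet.
export function missingScenarios(required: string[]): string[] {
  const covered = new Set(testDataset.map((t) => t.scenario));
  return required.filter((s) => !covered.has(s));
}

console.log(missingScenarios(["happy path", "empty input", "non-English"])); // [ 'non-English' ]
```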

Lifecycle of an LLM Workflow

LLMs are inherently unpredictable; it is hard to know in advance what an LLM will do in a given scenario. The typical lifecycle of a workflow is therefore iterative:

1. Define a test dataset: the data you will use to test your workflow. It should cover all the scenarios your workflow will face in production.
2. Design your workflow: create the sequence of actions the workflow will execute.
3. Run your workflow against the test dataset and check whether the results are as expected.
4. Repeat the previous steps until you are satisfied with the results.
5. Deploy your workflow to production, making it available to end users.
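The run-and-check step (step 3) can be sketched as a small evaluation harness: run every test case, apply a check to each output, and report what failed so you know whether to loop again. This is a TypeScript sketch under assumptions; the `workflow` parameter is a hypothetical stand-in for however you invoke your workflow, and here it is a local stub.

```typescript
// One test case: an input plus a predicate on the workflow's output.
interface TestCase<I, O> {
  name: string;
  input: I;
  check: (output: O) => boolean;
}

// Run every case and report which ones failed. `workflow` is a hypothetical
// stand-in for calling your real workflow.
export function evaluate<I, O>(
  workflow: (input: I) => O,
  cases: TestCase<I, O>[],
): { passed: number; failed: string[] } {
  const failed: string[] = [];
  for (const c of cases) {
    let ok = false;
    try {
      ok = c.check(workflow(c.input));
    } catch {
      ok = false; // a thrown error counts as a failure
    }
    if (!ok) failed.push(c.name);
  }
  return { passed: cases.length - failed.length, failed };
}

// Stub workflow for illustration: upper-cases its input.
const stub = (s: string) => s.toUpperCase();
const report = evaluate(stub, [
  { name: "uppercases", input: "hi", check: (o) => o === "HI" },
  { name: "non-empty", input: "", check: (o) => o.length > 0 },
]);
console.log(report); // report.failed contains only "non-empty"
```

Real LLM outputs rarely match exactly, so checks are usually looser predicates (contains a keyword, parses as JSON, row count in range) rather than equality.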

Workflow Version

Every deployment of a workflow creates a new version, numbered starting from 1. There are also two special version strings:
- 'latest' always points to the newest version of the workflow.
- 'live' points to the version you have manually promoted to live; this is the version end users will use.

The best practice is to always use 'latest' in your development environment, and use 'live' in your production environment.
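The version rules can be summarized as a small resolution function. This TypeScript sketch is a conceptual model of the behavior described above, not ReByte's implementation; the WorkflowVersions shape and both function names are hypothetical.

```typescript
// Conceptual model: numbered versions plus the 'latest' and 'live' aliases.
type VersionRef = number | "latest" | "live";

interface WorkflowVersions {
  versions: number[]; // created by deployments: 1, 2, 3, ...
  live?: number;      // set only when you manually promote a version
}

export function resolveVersion(w: WorkflowVersions, ref: VersionRef): number {
  if (ref === "latest") {
    if (w.versions.length === 0) throw new Error("No deployed versions");
    return Math.max(...w.versions);
  }
  if (ref === "live") {
    if (w.live === undefined) throw new Error("No version promoted to live");
    return w.live;
  }
  if (!w.versions.includes(ref)) throw new Error(`Unknown version ${ref}`);
  return ref;
}

// Hypothetical environment switch following the best practice above.
export function versionForEnv(env: "development" | "production"): VersionRef {
  return env === "production" ? "live" : "latest";
}

const wf: WorkflowVersions = { versions: [1, 2, 3], live: 2 };
console.log(resolveVersion(wf, versionForEnv("development"))); // 3
console.log(resolveVersion(wf, versionForEnv("production")));  // 2
```

The model shows why the practice works: a new deployment immediately changes what 'latest' resolves to, while 'live' keeps serving the promoted version until you promote another one.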

Workflow Observability

ReByte records everything that happens during the execution of a workflow, including the input data, the output data, the reasoning steps, and the execution log. This record is called a Workflow Run. Workflow Runs are crucial for debugging and improving a workflow, and you can inspect them in the workflow builder UI.

More details

Input and Output

There are two cases:

- If you build a workflow to integrate with ReByte's autopilot, the workflow must conform to a specific input/output format. Autopilot shows specific UI elements based on that format; for example, if your workflow outputs a table, autopilot displays it in tabular form.
- If you build a workflow to access via the API, you can define your own input/output format.

Here is the input/output format for ReByte's autopilot:


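```typescript
import { JSONSchemaType } from "ajv";

// TypeScript shapes that the schemas below describe. They are reconstructed
// from the schemas themselves; ReByte's own type definitions may differ.
interface AttachmentItem {
  type: "file" | "text" | "image_url" | "link" | "table";
  text?: string | null;
  file?: { id: string; name?: string | null } | null;
  image_url?: { url: string; detail?: "low" | "high" | null } | null;
  link?: { title?: string | null; url: string; id?: string | null } | null;
  table?: {
    name: string;
    columns: string[];
    data: { value: string[] }[]; // presumably one entry per row
  } | null;
}

interface ChatProtocolType {
  role: string;
  content: string;
  parts?: AttachmentItem[] | null; // optional attachments
}

// Schema for a single attachment. Declared before AssistantIOSchema because
// a const cannot be read before its declaration runs (ReferenceError).
const AttachmentItemSchema: JSONSchemaType<AttachmentItem> = {
  type: "object",
  properties: {
    type: {
      type: "string",
      enum: ["file", "text", "image_url", "link", "table"],
    },
    text: { type: "string", nullable: true },
    file: {
      type: "object",
      nullable: true,
      properties: {
        id: { type: "string" },
        name: { type: "string", nullable: true },
      },
      required: ["id"],
    },
    image_url: {
      type: "object",
      nullable: true,
      properties: {
        url: { type: "string" },
        detail: { type: "string", enum: ["low", "high"], nullable: true },
      },
      required: ["url"],
    },
    link: {
      type: "object",
      nullable: true,
      properties: {
        title: { type: "string", nullable: true },
        url: { type: "string" },
        id: { type: "string", nullable: true },
      },
      required: ["url"],
    },
    table: {
      type: "object",
      nullable: true,
      properties: {
        name: { type: "string" },
        columns: { type: "array", items: { type: "string" } },
        data: {
          type: "array",
          items: {
            type: "object",
            properties: {
              value: { type: "array", items: { type: "string" } },
            },
            required: ["value"],
          },
        },
      },
      required: ["name", "columns", "data"],
    },
  },
  required: ["type"],
};

// Top-level message exchanged with autopilot.
export const AssistantIOSchema: JSONSchemaType<ChatProtocolType> = {
  type: "object",
  properties: {
    role: { type: "string" },
    content: { type: "string" },
    parts: {
      type: "array",
      items: AttachmentItemSchema,
      nullable: true,
    },
  },
  required: ["role", "content"],
};
```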
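To connect this format back to the table actions, the TypeScript sketch below converts SQL result rows into a table part and wraps it in an autopilot message. Several things here are assumptions, not documented behavior: that query results arrive as an array of row objects, that each data entry holds one row's cells in column order, that role is "assistant" for workflow output, and the helper name rowsToTablePart.

```typescript
// Minimal types mirroring the autopilot schema (table parts only).
interface TablePart {
  type: "table";
  table: {
    name: string;
    columns: string[];
    data: { value: string[] }[];
  };
}

interface AssistantMessage {
  role: string;
  content: string;
  parts?: TablePart[] | null;
}

// Hypothetical helper: turn result rows into an autopilot table part.
// Cells are stringified because the schema types every value as a string;
// null and undefined become empty strings.
export function rowsToTablePart(name: string, rows: Record<string, unknown>[]): TablePart {
  const columns = rows.length > 0 ? Object.keys(rows[0]) : [];
  const data = rows.map((row) => ({
    value: columns.map((c) => (row[c] === null || row[c] === undefined ? "" : String(row[c]))),
  }));
  return { type: "table", table: { name, columns, data } };
}

const message: AssistantMessage = {
  role: "assistant",
  content: "Here are the top customers:",
  parts: [
    rowsToTablePart("top_customers", [
      { name: "Acme", revenue: 1200 },
      { name: "Globex", revenue: null },
    ]),
  ],
};
console.log(JSON.stringify(message, null, 2));
```

Column order is taken from the first row's keys, so rows should share the same keys; validating the final message against AssistantIOSchema before returning it would catch shape mistakes early.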

Refer Previous Action Output

Actions run in sequence. The output of an action is a plain JSON value that can be used as input for the next action. There are two ways to reference a previous action's output:

In JavaScript code, use

env.state['action_name']

to reference the output of the previous action, named 'action_name'.
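The TypeScript sketch below shows a JavaScript action body that reads two earlier outputs through env.state and returns a new JSON value. Only the env.state['action_name'] access pattern comes from this guide; the Env type, the action names user_input and query_db, and wrapping the body in a function (so it can be tested with a mocked env) are assumptions. How ReByte wraps action code and collects its return value is not covered here.

```typescript
// Assumed shape: only env.state[actionName] is documented in this guide.
interface Env {
  state: Record<string, unknown>;
}

// Example action body: combine two earlier outputs into a new JSON value.
// "user_input" and "query_db" are hypothetical action names.
export function summarize(env: Env): { question: string; rowCount: number } {
  const question = String(env.state["user_input"] ?? "");
  const rows = env.state["query_db"];
  return {
    question,
    rowCount: Array.isArray(rows) ? rows.length : 0,
  };
}

// Local check with a mocked env.
const mockEnv: Env = {
  state: {
    user_input: "Top customers?",
    query_db: [{ name: "Acme" }, { name: "Globex" }],
  },
};
console.log(summarize(mockEnv)); // { question: 'Top customers?', rowCount: 2 }
```

Guarding each lookup (the ?? fallback and the Array.isArray check) keeps the action from crashing when an upstream action was skipped or returned an unexpected type.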

In a prompt, use

to reference the output of the previous action named 'action_name'. Prompts use Tera template syntax; refer to https://keats.github.io/tera/docs/ for details.

An action can output one of the following:

- String
- Array
- Object
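Because a downstream step may receive any of these three kinds, JavaScript code that consumes a previous output typically branches on its type. The TypeScript sketch below models the list as a union; the ActionOutput type and the describeOutput helper are hypothetical names used for illustration.

```typescript
// The three output kinds listed above, modeled as a TypeScript union.
type ActionOutput = string | unknown[] | Record<string, unknown>;

// Hypothetical helper for a downstream step that must handle any kind.
// Order matters: arrays are objects too, so test Array.isArray first.
export function describeOutput(out: ActionOutput): string {
  if (typeof out === "string") return `string (${out.length} chars)`;
  if (Array.isArray(out)) return `array (${out.length} items)`;
  return `object (keys: ${Object.keys(out).join(", ")})`;
}

console.log(describeOutput("hello"));             // string (5 chars)
console.log(describeOutput([1, 2, 3]));           // array (3 items)
console.log(describeOutput({ id: 1, ok: true })); // object (keys: id, ok)
```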