Agent

What is an Agent?

An agent is a way to capture prior knowledge and experience and make it reusable by an assistant.

Consider an example: an agent is asked to read a manual and follow its instructions to process some data. We know exactly how to do that:

Read the manual
Parse the manual
Follow the instructions
Process the data

We don't need any LLM planning capability for this; we just need a sequence of actions to execute. This is where the agent comes in. An agent is a sequence of actions that can be executed sequentially or in parallel. An action can be anything: reading a file, making an HTTP request, executing an SQL query, making an LLM call, and so on.
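As a rough illustration of "a sequence of actions that can be executed sequentially or in parallel", the sketch below models an agent as a list of steps, where each step is either one action or a group of actions that run together. The `Action`, `Step`, and `runAgent` names are hypothetical and are not part of ReByte's SDK; the actions are stubs standing in for the manual example.

```typescript
// Hypothetical types for illustration only; this is not ReByte's SDK.
type State = Record<string, unknown>;

interface Action {
  name: string;
  run: (state: State) => Promise<unknown>;
}

// A step is either a single action or a group of actions that run in parallel.
type Step = Action | Action[];

async function runAgent(steps: Step[], input: unknown): Promise<State> {
  const state: State = { input };
  for (const step of steps) {
    const group = Array.isArray(step) ? step : [step];
    // Actions in one group start together; state is updated only after all finish.
    const outputs = await Promise.all(group.map((action) => action.run(state)));
    group.forEach((action, i) => {
      state[action.name] = outputs[i];
    });
  }
  return state;
}

// The manual example, with stubbed actions instead of real file or LLM access.
const manualSteps: Step[] = [
  { name: "read_manual", run: async () => "1. trim\n2. uppercase" },
  {
    name: "parse_manual",
    run: async (s: State) =>
      String(s["read_manual"])
        .split("\n")
        .map((line) => line.replace(/^\d+\.\s*/, "")),
  },
  {
    name: "process_data",
    run: async (s: State) => {
      let value = String(s["input"]);
      for (const instruction of s["parse_manual"] as string[]) {
        if (instruction === "trim") value = value.trim();
        if (instruction === "uppercase") value = value.toUpperCase();
      }
      return value;
    },
  },
];
```

Calling `runAgent(manualSteps, "  hello ")` resolves to a state object in which each action's output is stored under the action's name, so `process_data` holds the processed string.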

In ReByte, we define an agent as a serverless API that runs in the cloud. Agents usually rely on LLMs for their intelligence, but this is not required: an agent without any AI model is just a normal serverless API. This document focuses on agents that use AI. Here are some typical examples:

Based on the user's query, find the most relevant information in the user's knowledge base, summarize it, and return the summary to the user.
The user describes a database query in natural language. The agent translates it into SQL, executes it on the user's database, then uses an LLM to summarize the results and returns the summary to the user.
Help the user with professional translation between two languages. The user describes the translation task in natural language. The agent not only translates but also evaluates the translation quality; if the quality is not good enough, it repeats the translation process until it is.
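The translate-then-evaluate loop in the last example can be sketched as follows. This is a minimal sketch under stated assumptions: `translate` and `evaluate` are hypothetical callbacks that would be LLM calls in a real agent, and the score threshold and round cap are invented defaults, not ReByte settings.

```typescript
// Hypothetical signatures; in a real agent both would be backed by LLM calls.
type Translate = (source: string, feedback: string | null) => Promise<string>;
type Evaluate = (source: string, draft: string) => Promise<{ score: number; feedback: string }>;

async function translateUntilGood(
  source: string,
  translate: Translate,
  evaluate: Evaluate,
  minScore = 0.8, // assumed quality threshold
  maxRounds = 3, // assumed cap so the loop always terminates
): Promise<{ draft: string; score: number; rounds: number }> {
  let feedback: string | null = null;
  let draft = "";
  let score = 0;
  let rounds = 0;
  while (rounds < maxRounds) {
    rounds += 1;
    draft = await translate(source, feedback);
    const result = await evaluate(source, draft);
    score = result.score;
    if (score >= minScore) break; // good enough, stop iterating
    feedback = result.feedback; // feed the critique into the next attempt
  }
  return { draft, score, rounds };
}
```

The round cap matters because an evaluator may never be satisfied; without it the agent could loop indefinitely.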

How Agent and Table work together

There are two actions in the agent builder that interact with tables:

Load Virtual database schema: This action loads the schema of the virtual database that contains the table and outputs it as a Database Markup Language (DBML) schema.

Query Virtual database: This action runs the generated SQL query against the virtual database and outputs the result.

Between those two actions, you can use any LLM to generate the SQL query. Because the first action supplies the schema, the LLM can generate SQL with high accuracy, and the query result can then serve as input to the next action.
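The flow described here (load the schema, let an LLM write SQL from it, then run the query) might look like the sketch below. The `TableActions` interface and its three functions are hypothetical stand-ins for the builder's actions, and the prompt wording is only an example.

```typescript
// All three functions are hypothetical stand-ins for actions in the agent builder.
interface TableActions {
  loadSchema: () => Promise<string>; // "Load Virtual database schema"
  generateSql: (prompt: string) => Promise<string>; // any LLM action
  queryDatabase: (sql: string) => Promise<Record<string, unknown>[]>; // "Query Virtual database"
}

// Put the schema in the prompt so the model only refers to real tables and columns.
function buildSqlPrompt(schema: string, question: string): string {
  return [
    "You write SQL for the database described by this schema:",
    schema,
    "Return a single SQL query that answers the question below.",
    `Question: ${question}`,
  ].join("\n");
}

async function answerWithTable(actions: TableActions, question: string) {
  const schema = await actions.loadSchema();
  const sql = await actions.generateSql(buildSqlPrompt(schema, question));
  const rows = await actions.queryDatabase(sql);
  return { sql, rows };
}
```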

ReByte Agent Builder

The rise of LLMs has made it possible for end users to build their own AI agents as easily as editing a document. Our agent builder is essentially a rich text document editor enhanced with runnable actions. Like the Google Docs editor, the ReByte Agent Builder supports real-time collaboration: you can invite team members to edit an agent with you and see their changes in real time.

Runnable Action

An agent is a sequence of actions that can be executed on the LLM serverless runtime. It is the core building block of ReByte and the main way for end users to create their own tools. ReByte provides a GUI builder for end users to create and edit their own LLM tools, a list of pre-built actions for common use cases, and a private SDK that software engineers can use to build their own actions and integrate them seamlessly with the agent builder. Pre-built actions include:
- LLM Actions
  - Language Model Interface
- Data Actions
  - Dataset Loader, load predefined datasets for later processing
  - File Loader, extract/transform/load user's provided files
  - Table Query, translate user's natural language query into SQL and execute the query on user's database
  - Semantic Search, search for similar content over user's knowledge base
- Tools Actions
  - Search Engine, search for information on Google/Bing
  - Web Crawler, crawl web pages and extract information
  - Http Request Maker, make any HTTP request to any public/private API
- Control flow Actions
  - Loop Until, run actions until a condition is met
  - Parallel, execute multiple actions in parallel
  - Vanilla Javascript, execute any vanilla JavaScript code, useful for pure data transformation

Dataset

A dataset is a collection of JSON data that actions can use. The most important dataset is the agent's input test dataset: the data used to run the agent every time you run it from the agent builder UI. Think of the input dataset as the test cases for your agent; it should cover the scenarios your agent will face in production.

Lifecycle of an Agent

LLMs are inherently unpredictable: it is hard to know in advance what an LLM will do in a specific scenario, so building an agent is an iterative, test-driven process. The typical lifecycle of an agent is:

Define the test dataset. This is the data you will use to test your agent; it should cover the scenarios your agent will face in production.
Design your agent. This is the process of creating the sequence of actions that the agent will execute.
Run your agent to test it. This means running the agent against the test dataset and checking whether the results are as expected.
Repeat the previous steps until you are satisfied with the results.
Deploy your agent to production. This makes your agent available to end users.
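The "run and check" step of this lifecycle can be pictured as a small test harness. This is a sketch only: the `TestCase` shape, including the per-case `check` function, is an assumption made for illustration, since a ReByte dataset is described only as a collection of JSON data.

```typescript
// Hypothetical shape: pairs each test input with a check on the agent's output.
interface TestCase<I> {
  input: I;
  check: (output: unknown) => boolean;
}

async function runTestDataset<I>(
  agent: (input: I) => Promise<unknown>,
  dataset: TestCase<I>[],
): Promise<{ passed: number; failed: I[] }> {
  const failed: I[] = [];
  for (const testCase of dataset) {
    const output = await agent(testCase.input);
    // Collect failing inputs so they can guide the next design iteration.
    if (!testCase.check(output)) failed.push(testCase.input);
  }
  return { passed: dataset.length - failed.length, failed };
}
```

Because LLM output can vary between runs, checks that test a property of the output (for example, "contains a number") tend to be more stable than exact string comparisons.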

Action Chain

Input and Output

There are two cases here:

Build an agent that integrates seamlessly with ReByte's assistant. In this case your agent must conform to a specific input/output format. The assistant shows specific UI elements based on that format; for example, if your agent outputs a table, the assistant renders it in tabular form.
Build an agent and access it via the API. In this case you can define your own input/output format.

Here is the input/output format for ReByte's assistant:


// Assumption: JSONSchemaType is Ajv's helper type. ChatProtocolType and
// AttachmentItem are the message types, defined elsewhere.
import type { JSONSchemaType } from "ajv"

// Schema for one attachment ("part") of a message. It is declared first
// because AssistantIOSchema below refers to it.
const AttachmentItemSchema: JSONSchemaType<AttachmentItem> = {
  type: "object",
  properties: {
    type: {
      type: "string",
      enum: ["file", "text", "image_url", "link", "table"],
    },
    text: { type: "string", nullable: true },
    file: {
      type: "object",
      nullable: true,
      properties: {
        id: { type: "string" },
        name: { type: "string", nullable: true },
      },
      required: ["id"],
    },
    image_url: {
      type: "object",
      nullable: true,
      properties: {
        url: { type: "string" },
        detail: { type: "string", enum: ["low", "high"], nullable: true },
      },
      required: ["url"],
    },
    link: {
      type: "object",
      nullable: true,
      properties: {
        title: { type: "string", nullable: true },
        url: { type: "string" },
        id: { type: "string", nullable: true },
      },
      required: ["url"],
    },
    table: {
      type: "object",
      nullable: true,
      properties: {
        name: { type: "string" },
        columns: { type: "array", items: { type: "string" } },
        data: {
          type: "array",
          items: {
            type: "object",
            properties: {
              value: { type: "array", items: { type: "string" } },
            },
            required: ["value"],
          },
        },
      },
      required: ["name", "columns", "data"],
    },
  },
  required: ["type"],
}

// Schema for a whole assistant message: a role, text content, and optional parts.
export const AssistantIOSchema: JSONSchemaType<ChatProtocolType> = {
  type: "object",
  properties: {
    role: { type: "string" },
    content: { type: "string" },
    parts: {
      type: "array",
      items: AttachmentItemSchema,
      nullable: true,
    },
  },
  required: ["role", "content"],
}
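To make the format concrete, the sketch below builds a message with a single table part that follows the schema above. The `TablePart` and `AssistantMessage` types and the `tableMessage` helper are local, hypothetical names covering only the "table" branch; the `"assistant"` role value and the row-length check are assumptions that the schema itself does not require.

```typescript
// Minimal local types mirroring only the "table" branch of the schema.
interface TablePart {
  type: "table";
  table: { name: string; columns: string[]; data: { value: string[] }[] };
}

interface AssistantMessage {
  role: string;
  content: string;
  parts?: TablePart[];
}

function tableMessage(content: string, name: string, columns: string[], rows: string[][]): AssistantMessage {
  // Extra sanity check, not part of the schema: every row should match the header.
  for (const row of rows) {
    if (row.length !== columns.length) {
      throw new Error(`Row has ${row.length} cells, expected ${columns.length}`);
    }
  }
  return {
    role: "assistant", // assumed role value; the schema only requires a string
    content,
    // Each row is wrapped as { value: [...] }, as the schema's `data` items require.
    parts: [{ type: "table", table: { name, columns, data: rows.map((value) => ({ value })) } }],
  };
}

const reply = tableMessage("Top customers by revenue", "top_customers", ["customer", "revenue"], [
  ["Acme", "1200"],
  ["Globex", "950"],
]);
```

Note that every cell is a string, because the schema declares `value` as an array of strings; numeric results need to be converted before they are returned.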

Reference Previous Action Output

Actions run in sequence. The output of each action is a plain JSON value that later actions can use as input. There are two ways to reference a previous action's output:

In JavaScript code, use

env.state['action_name']

to reference the output of the previous action named 'action_name'.

In a Jinja template, use

to reference the output of the previous action named 'action_name'.

An action can output:

String
Array
Object
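As an illustration of reading each kind of output, the sketch below mocks an `env` object and reads a string, an array, and an object from it. Only the `env.state['action_name']` access pattern comes from this document; the `Env` type, the action names, and the values are hypothetical.

```typescript
// Hypothetical shape of `env`; only `env.state['action_name']` comes from the text above.
interface Env {
  state: Record<string, unknown>;
}

// Mock state after three earlier actions named "title", "rows", and "stats" have run.
const env: Env = {
  state: {
    title: "Weekly report", // a string output
    rows: [3, 5, 8], // an array output
    stats: { unit: "tickets" }, // an object output
  },
};

function summarize(env: Env): string {
  const title = env.state["title"] as string;
  const rows = env.state["rows"] as number[];
  const stats = env.state["stats"] as { unit: string };
  const total = rows.reduce((sum, n) => sum + n, 0);
  return `${title}: ${total} ${stats.unit}`;
}
```

With the mock state above, `summarize(env)` returns `"Weekly report: 16 tickets"`.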

Agent Version

Every deployment of an agent creates a new version, numbered from 1. There are two special version strings: 'latest' always points to the newest version of the agent, and 'live' points to the version you have manually promoted to live, which is the version that end users will use.

The best practice is to use 'latest' in your development environment and 'live' in your production environment.
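The versioning rules can be modeled with a small resolver. This is a hypothetical model for illustration, not ReByte's API: the `AgentVersions` shape and the error messages are invented.

```typescript
// Hypothetical model of version resolution; not ReByte's API.
interface AgentVersions {
  deployed: number[]; // version numbers, starting from 1
  live: number | null; // the version manually promoted to live, if any
}

function resolveVersion(versions: AgentVersions, requested: number | "latest" | "live"): number {
  if (requested === "latest") {
    if (versions.deployed.length === 0) throw new Error("Agent has no deployed versions");
    return Math.max(...versions.deployed); // newest deployment
  }
  if (requested === "live") {
    if (versions.live === null) throw new Error("No version has been promoted to live");
    return versions.live;
  }
  if (!versions.deployed.includes(requested)) throw new Error(`Unknown version ${requested}`);
  return requested;
}
```

In this model a new deployment changes what 'latest' resolves to but leaves 'live' untouched, which is why production traffic stays stable until a version is promoted.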

Agent Observability

ReByte records everything that happens during the execution of an agent, including the input data, the output data, the reasoning steps, and the execution log. We call this record an Agent Run. Agent Runs are crucial for debugging and improving an agent, and you can inspect them in the agent builder UI.