What is ReByte?

ReByte is an autonomous agent that works on your own data. It can:

Understand the semantics of your data and translate natural language to SQL with high accuracy.
Split complex tasks into smaller tasks and dispatch them to the most suitable expert agents to get the task done.
Write and execute code to automate data-related tasks.

ReByte's goal is to democratize data-related work within your team, especially for tasks that previously required hundreds to a few thousand lines of code. We believe that everyone on your team should be able to build user-centric data apps to automate their daily tasks.

Some examples of tasks that ReByte can help with include:

OLAP tasks that require complex SQL queries and data visualization
Tasks that typically require hundreds to a few thousand lines of code in Python, JavaScript, SQL, Bash, or similar languages
Any combination of the above two categories

Data sources can be public data or internal data owned by your company, in formats and systems such as CSV, Excel, Postgres, MySQL, Parquet, Iceberg, Snowflake, Databricks, S3, and BigQuery.

The need for Enterprise Data Agents

At ReByte, we believe AI agents will soon become an essential component of the enterprise workforce, enhancing productivity across diverse areas such as customer support, field services, analytics, and engineering. These agents will allow employees to redirect their focus from routine tasks to higher-value business challenges. Data agents, a key subset of AI agents, integrate data with advanced tools to deliver precise, data-driven insights by selecting the most appropriate data sources and tools for retrieval.

For AI agents to be effective at scale, they need a secure connection to enterprise data and a unified governance framework to regulate their access—much like the controls already in place for your teams. These agents must comply with data policies, access a variety of data sources efficiently, and extract accurate information to produce dependable, high-value results.

However, this agent-driven future comes with significant challenges. Even as model quality improves and inference costs decrease, companies looking to deploy reliable agent systems at scale still face a common set of obstacles:

Accuracy: There's a high standard for the output quality of AI agents in enterprise applications, and even small errors can have significant consequences, particularly in business-critical functions like finance or engineering.

Trust and security: As businesses develop more data-centric AI applications, maintaining security and complying with governance policies becomes increasingly complex.

Governed data access: AI agents need access to a wide array of data sources to work effectively within business contexts. This includes both unstructured data (e.g., text, audio) and structured data (e.g., tables, views), often stored across multiple systems.

The foundation of scaling agent-driven workflows that utilize data lies in ensuring smooth interaction between models and data, while preserving accuracy, trust, and compliance. For instance, a financial analyst might need to combine structured revenue data with unstructured financial reports and market insights. These use cases require secure, governed access to data and the ability to surface relevant information to AI, with robust end-to-end governance.

To address these challenges, we are excited to unveil the ReByte Platform, a fully managed service designed to simplify the integration, retrieval, and processing of both structured and unstructured data, so that ReByte customers can build high-quality agents at scale.

ReByte accomplishes this goal by providing three main components:

Autonomous Data Agent: Agentic User Experience for all team members

ReByte Agent is an autonomous entity that end users can interact with.

Provides a rich agent user experience that users interact with directly.
Orchestrates across structured and unstructured data sources, breaking down complex queries, retrieving relevant data, and generating precise answers.
Plans, reasons, and recovers from failures.
Generates visualizations automatically, without being limited to the types provided by traditional BI tools.
Interacts with a secure sandbox environment, writing and executing code on the fly.
Runs fully asynchronously, so tasks can continue in the background.

Semantic Table: Let LLMs understand your structured data

The Semantic Table is a translation layer from a data source to an LLM-friendly data unit. A ReByte table is an abstraction of a two-dimensional table that can be pulled from various sources, such as CSV, Excel, Postgres, MySQL, Parquet, Snowflake, S3, and BigQuery. These data units are enriched with metadata, such as column name, column type, column description, possible values, min/max values, whether the column is a primary key, and whether it can contain NULL values. With this metadata, an LLM can better understand the data and translate natural language to SQL with high accuracy. A table can be:

Federated: all queries are routed directly to the source, and ReByte does not store any data.
Fusion: users define how data comes from different sources and how to merge it. ReByte stores the fusion table, so future queries don't need to go to the source again.
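
As a rough illustration of how column metadata can help an LLM, the sketch below renders a table's metadata as plain text that could be placed in a prompt. It is not ReByte's implementation: `ColumnMeta`, `describe_table`, the field names, and the `orders` table are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ColumnMeta:
    """Hypothetical per-column metadata; field names are illustrative only."""
    name: str
    dtype: str
    description: str = ""
    possible_values: list = field(default_factory=list)
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    primary_key: bool = False
    nullable: bool = True


def describe_table(table_name: str, columns: list) -> str:
    """Render column metadata as text that could be included in an LLM prompt."""
    lines = [f"Table {table_name}:"]
    for col in columns:
        parts = [f"- {col.name} ({col.dtype})"]
        if col.primary_key:
            parts.append("primary key")
        parts.append("nullable" if col.nullable else "not null")
        if col.possible_values:
            parts.append("values: " + ", ".join(map(str, col.possible_values)))
        if col.min_value is not None and col.max_value is not None:
            parts.append(f"range: {col.min_value} to {col.max_value}")
        if col.description:
            parts.append(col.description)
        lines.append("; ".join(parts))
    return "\n".join(lines)


# A made-up table used only to exercise the sketch.
orders = [
    ColumnMeta("order_id", "INTEGER", "Unique order identifier",
               primary_key=True, nullable=False),
    ColumnMeta("channel", "TEXT", "Sales channel",
               possible_values=["web", "retail"], nullable=False),
    ColumnMeta("amount", "REAL", "Order total in USD",
               min_value=0.0, max_value=5000.0),
]
schema_text = describe_table("orders", orders)
print(schema_text)
```

The point of the sketch is that descriptions, allowed values, and ranges give a model context that a bare column name and type do not, for example that `channel` only takes the values `web` and `retail`.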

Expert Workflow: The "Mixture of Experts" moment for agents

Each enterprise has its own unique business processes, and ReByte workflows let you capture that knowledge in a form the agent can use. The agent integrates with a variety of expert workflows, each created by professionals who specialize in a specific area of the organization. These workflows are tailored to specialized processes that are not part of general knowledge. To support this, we provide a builder environment where users can create custom workflows. Once a workflow is built, the agent can invoke it automatically whenever needed, without manual intervention. This lets organizations apply expert-level automation to their specialized processes.

A workflow is an orchestration of tools and LLMs, run in sequence to achieve a specific goal. A workflow has several building blocks:

LLM: builders can use any LLM supported by ReByte.
Retriever: retrieves data from a database or vector engine.
Memory: each workflow run comes with a persistent memory that can store context information. It is primarily used to store conversation history.
Code: runs a few lines of code, usually for data transformation.
Common Tools: HTTP requester, search engine, etc.
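
To make the idea of a workflow as a sequence of building blocks concrete, here is a minimal sketch in plain Python. It does not use any ReByte API: `run_workflow`, the step functions, and the in-memory data are hypothetical stand-ins, and the LLM step is a stub rather than a real model call.

```python
from typing import Callable, Dict, List

# A step takes the current value and the run's memory, and returns a new value.
Step = Callable[[object, Dict[str, list]], object]


def retrieve(query, memory):
    """Retriever stand-in: looks rows up in a tiny in-memory 'database'."""
    rows = {"sales": [("web", 120.0), ("retail", 80.0), ("web", 40.0)]}
    return rows.get(query, [])


def transform(rows, memory):
    """Code stand-in: a few lines of data transformation (sum per channel)."""
    totals = {}
    for channel, amount in rows:
        totals[channel] = totals.get(channel, 0.0) + amount
    return totals


def fake_llm(totals, memory):
    """LLM stand-in: a real workflow would call a model here."""
    best = max(totals, key=totals.get)
    return f"Top channel: {best} ({totals[best]:.0f})"


def run_workflow(steps: List[Step], user_input, memory):
    """Run the steps in order and record the exchange in memory."""
    memory.setdefault("history", []).append(("user", user_input))
    value = user_input
    for step in steps:
        value = step(value, memory)
    memory["history"].append(("assistant", value))
    return value


memory = {}
answer = run_workflow([retrieve, transform, fake_llm], "sales", memory)
print(answer)
```

Each step receives the output of the previous one, and the `memory` dictionary plays the role of the conversation history described above.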

In addition, ReByte offers the following features:

State-of-the-Art Analytics Performance: ReByte can ingest millions of rows of data in just a few seconds. It handles complex queries, including joins, group by operations, window functions, and more, with ease.

Model Agnostic: In the Agent Builder, you can use any large language model, including models from OpenAI, Gemini, Anthropic, and DeepSeek, or any of the open-source models provided by ReByte. You’re not limited to a specific model.

Team Collaboration: Building high-quality AI agents is a team effort. ReByte’s Agent Builder is a web-based tool that enables team members to collaborate in real time, just like editing a document in Google Docs.

Observability: ReByte's goal is to make AI agents transparent, not black-box systems. We’ve built a monitoring system that allows you to track how your agents are performing and how your data is being used.

Secure Sandbox: Each agent operates in its own secure sandbox, so you never have to worry about data leaks between agents.

Access Control: ReByte includes a role-based access control system, allowing you to manage who can access specific data, who can build agents, and who can use them.

API: Agents built in ReByte can be accessed via an API, making it easy to integrate ReByte with your existing systems.

Typical Use Cases

We expect ReByte to be useful in the following use cases:

New Kinds of BI Tools: LLMs have revolutionized data visualization. With ReByte, you can generate visualizations automatically, without being limited to the types provided by traditional BI tools. You can create any kind of visualization you need, tailored to your data.
Internal Data Exploration and Analysis: Data owners can create a virtual database with proper access control and share it with other team members. Everyone can then run queries independently on the shared database. For example, if you have sales data across Google, Excel, local CSV files, and databases, you can use ReByte Table to combine and explore your data in flexible ways, helping you identify the best sales channel for your company.
General Repetitive Tasks: Repetitive tasks such as data cleaning, data transformation, file processing, and statistical analysis, which were previously handled by programmers, can now be done by anyone on your team with ease.
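
The sales-channel example above can be sketched with nothing but the Python standard library. This is only an illustration of merging two sources into one queryable table, not how ReByte Table works internally; the CSV contents, the database rows, and the `sales` table are made up for the example.

```python
import csv
import io
import sqlite3

# Two made-up sources: a CSV export and rows that already live in a database.
csv_export = io.StringIO("channel,amount\nweb,120.0\nretail,80.0\n")
db_rows = [("web", 40.0), ("partner", 150.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (channel TEXT, amount REAL, source TEXT)")

# Load the CSV source.
for row in csv.DictReader(csv_export):
    conn.execute(
        "INSERT INTO sales VALUES (?, ?, ?)",
        (row["channel"], float(row["amount"]), "csv"),
    )

# Load the database source.
conn.executemany("INSERT INTO sales VALUES (?, ?, 'database')", db_rows)

# Rank channels by total revenue across both sources.
ranking = conn.execute(
    "SELECT channel, SUM(amount) AS total FROM sales "
    "GROUP BY channel ORDER BY total DESC"
).fetchall()
print(ranking)
conn.close()
```

Once the sources sit in one table, finding the best channel is a single `GROUP BY` query, which is the kind of SQL an agent would be expected to generate from a natural-language question.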

Typically, tasks that would require hundreds or thousands of lines of code can be fully automated by an agent. We're actively working to extend ReByte's capabilities to cover more use cases.