Query Engine

We use Apache DataFusion as the query engine to execute the generated SQL queries. DataFusion is a high-performance, extensible query engine written in Rust that provides ANSI SQL support with a syntax mostly aligned with PostgresSQL. However, there are critical deviations from PostgreSQL that developers and LLMs must account for to avoid errors.

SQL Compatibility

While DataFusion aims for PostgresSQL compatibility, certain syntax elements and functions are unsupported. Below is a categorized list of notable gaps.

Unsupported Functions

String Functions

  • regexp_matches (use ~ operator instead)
  • split_part (use split and array indexing)
  • encode/decode (Base64, Hex)
  • pg_catalog functions (e.g., pg_typeof)

Date/Time Functions

  • date_trunc with units millennium, century, or microseconds.
  • timezone (use AT TIME ZONE syntax instead).
  • age() for calculating intervals between timestamps.

Aggregate Functions

  • Statistical functions: covar_pop, corr, regr_slope.
  • string_agg (use array_agg with array_to_string).

Window Functions

  • RANGE frames with offsets (only ROWS is supported).
  • Custom window frame exclusions (e.g., EXCLUDE CURRENT ROW).

JSON Functions

  • JSONB operators (?, ?|, @>).
  • json_array_elements_text (use json_extract + array functions).

Mathematical Functions

  • Trigonometric functions using degrees (e.g., sind, cosd).
  • Factorial operator (!).

Data Types

  • Network address types (inet, cidr).
  • Geometric types (point, polygon).

Miscellaneous

  • DISTINCT ON (PostgresSQL extension).
  • ILIKE (use LIKE with case-insensitive collation).
  • WITH ORDINALITY in LATERAL joins.

Rebyte team maintains a knowledge base of unsupported functions so that workflows can inject unsupported functions into prompts. This ensures that the LLM generates SQL queries that DataFusion can execute.

Unsupported Functions Knowledge

If you create workflow from 'analyze_data' template, it already has everything set up for you.