Query Engine
We use Apache DataFusion as the query engine to execute the generated SQL queries. DataFusion is a high-performance, extensible query engine written in Rust that provides ANSI SQL support with a syntax mostly aligned with PostgresSQL. However, there are critical deviations from PostgreSQL that developers and LLMs must account for to avoid errors.
SQL Compatibility
While DataFusion aims for PostgresSQL compatibility, certain syntax elements and functions are unsupported. Below is a categorized list of notable gaps.
Unsupported Functions
String Functions
regexp_matches
(use~
operator instead)split_part
(usesplit
and array indexing)encode
/decode
(Base64, Hex)pg_catalog
functions (e.g.,pg_typeof
)
Date/Time Functions
date_trunc
with unitsmillennium
,century
, ormicroseconds
.timezone
(useAT TIME ZONE
syntax instead).age()
for calculating intervals between timestamps.
Aggregate Functions
- Statistical functions:
covar_pop
,corr
,regr_slope
. string_agg
(usearray_agg
witharray_to_string
).
Window Functions
RANGE
frames with offsets (onlyROWS
is supported).- Custom window frame exclusions (e.g.,
EXCLUDE CURRENT ROW
).
JSON Functions
- JSONB operators (
?
,?|
,@>
). json_array_elements_text
(usejson_extract
+ array functions).
Mathematical Functions
- Trigonometric functions using degrees (e.g.,
sind
,cosd
). - Factorial operator (
!
).
Data Types
- Network address types (
inet
,cidr
). - Geometric types (
point
,polygon
).
Miscellaneous
DISTINCT ON
(PostgresSQL extension).ILIKE
(useLIKE
with case-insensitive collation).WITH ORDINALITY
inLATERAL
joins.
Rebyte team maintains a knowledge base of unsupported functions so that workflows can inject unsupported functions into prompts. This ensures that the LLM generates SQL queries that DataFusion can execute.
Unsupported Functions Knowledge
If you create workflow from 'analyze_data' template, it already has everything set up for you.