Moltres vs SQLFrame Comparison

Executive Summary

This document compares Moltres and SQLFrame: two Python libraries that provide a DataFrame API with SQL pushdown execution. Both enable running transformation pipelines on database engines without Spark clusters. They differ in API style (drop-in PySpark vs native PySpark-style), supported backends, and feature focus (read/transform vs full CRUD and async).

Key Findings

API style: SQLFrame implements the PySpark DataFrame API as a drop-in: use activate(engine="...") then standard pyspark.sql imports; Moltres provides a native PySpark-style API (e.g. connect(), db.table().select()) and optional Pandas/Polars interfaces, with both camelCase and snake_case method names.
Backends: SQLFrame supports BigQuery, Databricks, DuckDB, Postgres, Snowflake, Spark, GizmoSQL, Redshift (community), and a Standalone session for SQL-only generation; Moltres is SQLAlchemy-based (SQLite, PostgreSQL, MySQL, DuckDB, and any SQLAlchemy-supported database).
CRUD: Moltres provides INSERT (via Records), UPDATE, and DELETE with DataFrame-style syntax; SQLFrame focuses on read and transform (no UPDATE/DELETE in its public API).
Async: Moltres has full async/await support (async_connect(), AsyncDatabase); SQLFrame does not advertise async support in its README or overview docs.
SQL visibility: SQLFrame offers df.sql(optimize=True) for human-readable SQL and optional OpenAI-based SQL enhancement; Moltres offers show_sql(), .sql, plan_summary(), visualize_plan(), validate(), and performance_hints().

Use Case Recommendations

Choose Moltres when:
- You use traditional SQL databases (PostgreSQL, MySQL, SQLite) or any SQLAlchemy-backed engine.
- You need UPDATE/DELETE or bulk INSERT with a DataFrame-like API.
- You want full async/await for all operations.
- You need framework integrations (FastAPI, Django, Streamlit, SQLModel) or transaction/batch support.
- You prefer a single pip install moltres with optional extras.
Choose SQLFrame when:
- You need first-class support for BigQuery, Snowflake, or Databricks.
- You want to run existing PySpark code with minimal changes (drop-in activate() + pyspark.sql imports).
- You want a Standalone session to generate SQL without a database connection.
- You prefer engine-specific installs (pip install sqlframe[bigquery], etc.) and optional Spark dialect or engine-native dialect configuration.

1. Initialization and Connection

SQLFrame

from sqlframe import activate

# Option A: Replace PySpark – activate then use pyspark imports
activate(engine="duckdb")
from pyspark.sql import SparkSession
session = SparkSession.builder.getOrCreate()

# Option B: Engine-native session (no pyspark imports)
from sqlframe.duckdb import DuckDBSession
session = DuckDBSession.builder.getOrCreate()

Moltres

from moltres import connect, async_connect

# Synchronous connection
db = connect("sqlite:///example.db")
db = connect("postgresql://user:pass@localhost/db")

# Async connection
db = async_connect("postgresql+asyncpg://user:pass@localhost/db")
db = async_connect("sqlite+aiosqlite:///example.db")

Comparison

Aspect	SQLFrame	Moltres
Entry point	`SparkSession` or engine-native session (e.g. `DuckDBSession`)	`Database` from `connect()` or `async_connect()`
Activation	`activate(engine="...")` for drop-in PySpark	No activation; direct `connect(dsn)`
Configuration	Per-engine install and optional dialect config	DSN / env vars; SQLAlchemy connection options
Async support	Not documented in overview	Full async via `async_connect()` and `AsyncDatabase`

2. DataFrame API and PySpark Compatibility

SQLFrame

SQLFrame implements the PySpark DataFrame API so that existing PySpark code can run with minimal changes. After activate(engine="..."), you use pyspark.sql (SparkSession, functions as F, Window). Method names follow PySpark’s camelCase (e.g. groupBy, orderBy, withColumn).

from sqlframe import activate
activate(engine="bigquery")
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql import Window

session = SparkSession.builder.getOrCreate()
df = (
    session.table('"bigquery-public-data".samples.natality')
    .where(F.col("ever_born") == 1)
    .groupBy("year")
    .agg(F.count("*").alias("num_single_child_families"))
    .withColumn("last_year_num", F.lag(F.col("num_single_child_families"), 1).over(Window.orderBy("year")))
    .limit(5)
)

Moltres

Moltres provides a native PySpark-style API (no pyspark dependency). It supports both camelCase and snake_case (e.g. group_by / groupBy, order_by / orderBy). Optional Pandas and Polars interfaces are available for result handling or style preference.

from moltres import col, connect
from moltres.expressions import functions as F

db = connect("sqlite:///example.db")
df = (
    db.table("orders")
    .select()
    .join(db.table("customers").select(), on=[col("orders.customer_id") == col("customers.id")])
    .where(col("active") == True)
    .group_by("country")
    .agg(F.sum(col("amount")).alias("total_amount"))
)
results = df.collect()

Comparison

Aspect	SQLFrame	Moltres
PySpark compatibility	Drop-in: use `pyspark.sql` after `activate()`	Native API with high PySpark method compatibility; no pyspark dependency
Naming	PySpark camelCase	Both camelCase and snake_case
Other APIs	—	Optional Pandas and Polars interfaces

3. Supported Backends and Execution

SQLFrame

Engines: BigQuery, Databricks, DuckDB, Postgres, Snowflake, Spark, GizmoSQL (community), Redshift (community).
Standalone: A session that generates SQL only, with no database connection.
Install: Per-engine extras, e.g. pip install sqlframe[bigquery], sqlframe[postgres], sqlframe[duckdb].

Moltres

Backends: SQLAlchemy-based; officially documented for SQLite, PostgreSQL, MySQL, DuckDB. Any SQLAlchemy-supported database works (e.g. BigQuery/Snowflake via community drivers, but not first-class in docs).
No standalone mode: Execution is always tied to a Database connection (sync or async).
Install: Single pip install moltres; optional extras for async drivers, Pandas, Polars, integrations.

Comparison

Aspect	SQLFrame	Moltres
First-class engines	BigQuery, Databricks, DuckDB, Postgres, Snowflake, Spark, etc.	SQLite, PostgreSQL, MySQL, DuckDB (SQLAlchemy)
Other backends	GizmoSQL, Redshift (community)	Any SQLAlchemy-supported DB (e.g. BigQuery/Snowflake via drivers)
SQL-only (no DB)	Yes (Standalone session)	No
Execution model	Per-engine; SQL generated for target engine	Single stack: logical plan → SQL → SQLAlchemy execution

4. CRUD and Mutations

SQLFrame

SQLFrame is oriented around reading and transforming data (e.g. session.table(), read.csv, transformations, df.write.saveAsTable()). Its README and overview docs do not describe UPDATE or DELETE operations on existing rows.

Moltres

Moltres provides full SQL CRUD with a DataFrame-like and database API:

from moltres import col, connect
from moltres.io.records import Records

db = connect("sqlite:///example.db")

# Insert rows
Records.from_list([
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
], database=db).insert_into("users")

# Update rows
db.update("users", where=col("active") == 0, set={"active": 1})

# Delete rows
db.delete("users", where=col("email").is_null())

Comparison

Aspect	SQLFrame	Moltres
INSERT	Via write path (e.g. `saveAsTable`)	`Records.from_list(...).insert_into(table)`
UPDATE	Not in overview/docs	`db.update(table, where=..., set=...)`
DELETE	Not in overview/docs	`db.delete(table, where=...)`

5. Async Support

SQLFrame

SQLFrame’s README and stable docs do not describe async/await or an async session API.

Moltres

Moltres supports async end-to-end: async_connect(), AsyncDatabase, async table and DataFrame operations (e.g. await df.collect(), await db.table(...)), and async file readers/writers where applicable.

import asyncio
from moltres import async_connect

async def main():
    async with async_connect("sqlite+aiosqlite:///example.db") as db:
        table_handle = await db.table("users")
        df = table_handle.select()
        results = await df.collect()

asyncio.run(main())

Comparison

Aspect	SQLFrame	Moltres
Async API	Not documented in overview	Full async: `async_connect()`, `AsyncDatabase`, async collect/table ops
Install	—	e.g. `pip install moltres[async-postgresql]` or `[async-sqlite]`

6. SQL Visibility and Tooling

SQLFrame

Generated SQL: df.sql() returns the compiled SQL; df.sql(optimize=True) produces more human-readable SQL.
Configuration: Docs describe generated SQL configuration and optional use of OpenAI to enhance or explain SQL.
Dialect: Default is Spark dialect; input/output dialect can be set to match the engine (see Input and Output Dialect Configuration).

Moltres

SQL display: df.show_sql(), df.sql property.
Plan and validation: plan_summary(), visualize_plan(), validate(), performance_hints().
Dialect: ANSI plus dialect-specific handling via SQLAlchemy (see src/moltres/engine/dialects.py).

Comparison

Aspect	SQLFrame	Moltres
View SQL	`df.sql()`, `df.sql(optimize=True)`	`df.show_sql()`, `df.sql`
Readable SQL	`optimize=True`; optional OpenAI	Plan summary and visualization
Validation / hints	—	`validate()`, `performance_hints()`
Dialect	Configurable Spark vs engine-native	SQLAlchemy-based dialect layer

7. Installation and Optional Dependencies

SQLFrame

# Per-engine (required for that backend)
pip install "sqlframe[bigquery]"
pip install "sqlframe[postgres]"
pip install "sqlframe[duckdb]"
pip install "sqlframe[snowflake]"
pip install "sqlframe[spark]"
# Standalone (no DB)
pip install sqlframe
# Conda
conda install -c conda-forge sqlframe

Moltres

# Core (SQLite, Postgres, MySQL, DuckDB via SQLAlchemy)
pip install moltres

# Optional
pip install moltres[async-postgresql]   # Async PostgreSQL
pip install moltres[pandas,polars]      # Pandas/Polars result formats
pip install moltres[sqlmodel]            # SQLModel/Pydantic
pip install moltres[streamlit]          # Streamlit

Comparison

Aspect	SQLFrame	Moltres
Default install	Engine-specific or standalone	Single package; core supports multiple DBs via SQLAlchemy
Extras	One extra per engine (bigquery, postgres, duckdb, etc.)	Optional async, pandas, polars, sqlmodel, streamlit, etc.
Conda	conda-forge	Not mentioned in README

8. Summary Table

Feature	SQLFrame	Moltres
API style	Drop-in PySpark (`activate()` + pyspark imports)	Native PySpark-style + optional Pandas/Polars
Entry point	SparkSession or engine-native session	`connect()` / `async_connect()` → Database
First-class backends	BigQuery, Databricks, DuckDB, Postgres, Snowflake, Spark, Redshift, GizmoSQL	SQLite, PostgreSQL, MySQL, DuckDB (SQLAlchemy)
Standalone SQL	Yes	No
INSERT	Via write path	`Records.insert_into()`
UPDATE / DELETE	Not in public API	`db.update()`, `db.delete()`
Async	Not documented	Full async support
SQL display	`df.sql(optimize=True)`	`show_sql()`, `.sql`, `plan_summary()`, `visualize_plan()`
Validation / hints	—	`validate()`, `performance_hints()`
Install	Per-engine or standalone	Single package + optional extras
Dialect	Spark or engine-native (configurable)	ANSI + SQLAlchemy dialects

References

SQLFrame: GitHub, Documentation
Moltres: GitHub, Documentation