Moltres vs SQLFrame Comparison
Executive Summary
This document compares Moltres and SQLFrame: two Python libraries that provide a DataFrame API with SQL pushdown execution. Both enable running transformation pipelines on database engines without Spark clusters. They differ in API style (drop-in PySpark vs native PySpark-style), supported backends, and feature focus (read/transform vs full CRUD and async).
Key Findings
API style: SQLFrame implements the PySpark DataFrame API as a drop-in: use
activate(engine="...")then standardpyspark.sqlimports; Moltres provides a native PySpark-style API (e.g.connect(),db.table().select()) and optional Pandas/Polars interfaces, with both camelCase and snake_case method names.Backends: SQLFrame supports BigQuery, Databricks, DuckDB, Postgres, Snowflake, Spark, GizmoSQL, Redshift (community), and a Standalone session for SQL-only generation; Moltres is SQLAlchemy-based (SQLite, PostgreSQL, MySQL, DuckDB, and any SQLAlchemy-supported database).
CRUD: Moltres provides INSERT (via
Records), UPDATE, and DELETE with DataFrame-style syntax; SQLFrame focuses on read and transform (no UPDATE/DELETE in its public API).Async: Moltres has full async/await support (
async_connect(),AsyncDatabase); SQLFrame does not advertise async support in its README or overview docs.SQL visibility: SQLFrame offers
df.sql(optimize=True)for human-readable SQL and optional OpenAI-based SQL enhancement; Moltres offersshow_sql(),.sql,plan_summary(),visualize_plan(),validate(), andperformance_hints().
Use Case Recommendations
Choose Moltres when:
You use traditional SQL databases (PostgreSQL, MySQL, SQLite) or any SQLAlchemy-backed engine.
You need UPDATE/DELETE or bulk INSERT with a DataFrame-like API.
You want full async/await for all operations.
You need framework integrations (FastAPI, Django, Streamlit, SQLModel) or transaction/batch support.
You prefer a single
pip install moltreswith optional extras.
Choose SQLFrame when:
You need first-class support for BigQuery, Snowflake, or Databricks.
You want to run existing PySpark code with minimal changes (drop-in
activate()+pyspark.sqlimports).You want a Standalone session to generate SQL without a database connection.
You prefer engine-specific installs (
pip install sqlframe[bigquery], etc.) and optional Spark dialect or engine-native dialect configuration.
1. Initialization and Connection
SQLFrame
from sqlframe import activate
# Option A: Replace PySpark – activate then use pyspark imports
activate(engine="duckdb")
from pyspark.sql import SparkSession
session = SparkSession.builder.getOrCreate()
# Option B: Engine-native session (no pyspark imports)
from sqlframe.duckdb import DuckDBSession
session = DuckDBSession.builder.getOrCreate()
Moltres
from moltres import connect, async_connect
# Synchronous connection
db = connect("sqlite:///example.db")
db = connect("postgresql://user:pass@localhost/db")
# Async connection
db = async_connect("postgresql+asyncpg://user:pass@localhost/db")
db = async_connect("sqlite+aiosqlite:///example.db")
Comparison
Aspect |
SQLFrame |
Moltres |
|---|---|---|
Entry point |
|
|
Activation |
|
No activation; direct |
Configuration |
Per-engine install and optional dialect config |
DSN / env vars; SQLAlchemy connection options |
Async support |
Not documented in overview |
Full async via |
2. DataFrame API and PySpark Compatibility
SQLFrame
SQLFrame implements the PySpark DataFrame API so that existing PySpark code can run with minimal changes. After activate(engine="..."), you use pyspark.sql (SparkSession, functions as F, Window). Method names follow PySpark’s camelCase (e.g. groupBy, orderBy, withColumn).
from sqlframe import activate
activate(engine="bigquery")
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql import Window
session = SparkSession.builder.getOrCreate()
df = (
session.table('"bigquery-public-data".samples.natality')
.where(F.col("ever_born") == 1)
.groupBy("year")
.agg(F.count("*").alias("num_single_child_families"))
.withColumn("last_year_num", F.lag(F.col("num_single_child_families"), 1).over(Window.orderBy("year")))
.limit(5)
)
Moltres
Moltres provides a native PySpark-style API (no pyspark dependency). It supports both camelCase and snake_case (e.g. group_by / groupBy, order_by / orderBy). Optional Pandas and Polars interfaces are available for result handling or style preference.
from moltres import col, connect
from moltres.expressions import functions as F
db = connect("sqlite:///example.db")
df = (
db.table("orders")
.select()
.join(db.table("customers").select(), on=[col("orders.customer_id") == col("customers.id")])
.where(col("active") == True)
.group_by("country")
.agg(F.sum(col("amount")).alias("total_amount"))
)
results = df.collect()
Comparison
Aspect |
SQLFrame |
Moltres |
|---|---|---|
PySpark compatibility |
Drop-in: use |
Native API with high PySpark method compatibility; no pyspark dependency |
Naming |
PySpark camelCase |
Both camelCase and snake_case |
Other APIs |
— |
Optional Pandas and Polars interfaces |
3. Supported Backends and Execution
SQLFrame
Engines: BigQuery, Databricks, DuckDB, Postgres, Snowflake, Spark, GizmoSQL (community), Redshift (community).
Standalone: A session that generates SQL only, with no database connection.
Install: Per-engine extras, e.g.
pip install sqlframe[bigquery],sqlframe[postgres],sqlframe[duckdb].
Moltres
Backends: SQLAlchemy-based; officially documented for SQLite, PostgreSQL, MySQL, DuckDB. Any SQLAlchemy-supported database works (e.g. BigQuery/Snowflake via community drivers, but not first-class in docs).
No standalone mode: Execution is always tied to a
Databaseconnection (sync or async).Install: Single
pip install moltres; optional extras for async drivers, Pandas, Polars, integrations.
Comparison
Aspect |
SQLFrame |
Moltres |
|---|---|---|
First-class engines |
BigQuery, Databricks, DuckDB, Postgres, Snowflake, Spark, etc. |
SQLite, PostgreSQL, MySQL, DuckDB (SQLAlchemy) |
Other backends |
GizmoSQL, Redshift (community) |
Any SQLAlchemy-supported DB (e.g. BigQuery/Snowflake via drivers) |
SQL-only (no DB) |
Yes (Standalone session) |
No |
Execution model |
Per-engine; SQL generated for target engine |
Single stack: logical plan → SQL → SQLAlchemy execution |
4. CRUD and Mutations
SQLFrame
SQLFrame is oriented around reading and transforming data (e.g. session.table(), read.csv, transformations, df.write.saveAsTable()). Its README and overview docs do not describe UPDATE or DELETE operations on existing rows.
Moltres
Moltres provides full SQL CRUD with a DataFrame-like and database API:
from moltres import col, connect
from moltres.io.records import Records
db = connect("sqlite:///example.db")
# Insert rows
Records.from_list([
{"id": 1, "name": "Alice", "email": "alice@example.com"},
{"id": 2, "name": "Bob", "email": "bob@example.com"},
], database=db).insert_into("users")
# Update rows
db.update("users", where=col("active") == 0, set={"active": 1})
# Delete rows
db.delete("users", where=col("email").is_null())
Comparison
Aspect |
SQLFrame |
Moltres |
|---|---|---|
INSERT |
Via write path (e.g. |
|
UPDATE |
Not in overview/docs |
|
DELETE |
Not in overview/docs |
|
5. Async Support
SQLFrame
SQLFrame’s README and stable docs do not describe async/await or an async session API.
Moltres
Moltres supports async end-to-end: async_connect(), AsyncDatabase, async table and DataFrame operations (e.g. await df.collect(), await db.table(...)), and async file readers/writers where applicable.
import asyncio
from moltres import async_connect
async def main():
async with async_connect("sqlite+aiosqlite:///example.db") as db:
table_handle = await db.table("users")
df = table_handle.select()
results = await df.collect()
asyncio.run(main())
Comparison
Aspect |
SQLFrame |
Moltres |
|---|---|---|
Async API |
Not documented in overview |
Full async: |
Install |
— |
e.g. |
6. SQL Visibility and Tooling
SQLFrame
Generated SQL:
df.sql()returns the compiled SQL;df.sql(optimize=True)produces more human-readable SQL.Configuration: Docs describe generated SQL configuration and optional use of OpenAI to enhance or explain SQL.
Dialect: Default is Spark dialect; input/output dialect can be set to match the engine (see Input and Output Dialect Configuration).
Moltres
SQL display:
df.show_sql(),df.sqlproperty.Plan and validation:
plan_summary(),visualize_plan(),validate(),performance_hints().Dialect: ANSI plus dialect-specific handling via SQLAlchemy (see
src/moltres/engine/dialects.py).
Comparison
Aspect |
SQLFrame |
Moltres |
|---|---|---|
View SQL |
|
|
Readable SQL |
|
Plan summary and visualization |
Validation / hints |
— |
|
Dialect |
Configurable Spark vs engine-native |
SQLAlchemy-based dialect layer |
7. Installation and Optional Dependencies
SQLFrame
# Per-engine (required for that backend)
pip install "sqlframe[bigquery]"
pip install "sqlframe[postgres]"
pip install "sqlframe[duckdb]"
pip install "sqlframe[snowflake]"
pip install "sqlframe[spark]"
# Standalone (no DB)
pip install sqlframe
# Conda
conda install -c conda-forge sqlframe
Moltres
# Core (SQLite, Postgres, MySQL, DuckDB via SQLAlchemy)
pip install moltres
# Optional
pip install moltres[async-postgresql] # Async PostgreSQL
pip install moltres[pandas,polars] # Pandas/Polars result formats
pip install moltres[sqlmodel] # SQLModel/Pydantic
pip install moltres[streamlit] # Streamlit
Comparison
Aspect |
SQLFrame |
Moltres |
|---|---|---|
Default install |
Engine-specific or standalone |
Single package; core supports multiple DBs via SQLAlchemy |
Extras |
One extra per engine (bigquery, postgres, duckdb, etc.) |
Optional async, pandas, polars, sqlmodel, streamlit, etc. |
Conda |
conda-forge |
Not mentioned in README |
8. Summary Table
Feature |
SQLFrame |
Moltres |
|---|---|---|
API style |
Drop-in PySpark ( |
Native PySpark-style + optional Pandas/Polars |
Entry point |
SparkSession or engine-native session |
|
First-class backends |
BigQuery, Databricks, DuckDB, Postgres, Snowflake, Spark, Redshift, GizmoSQL |
SQLite, PostgreSQL, MySQL, DuckDB (SQLAlchemy) |
Standalone SQL |
Yes |
No |
INSERT |
Via write path |
|
UPDATE / DELETE |
Not in public API |
|
Async |
Not documented |
Full async support |
SQL display |
|
|
Validation / hints |
— |
|
Install |
Per-engine or standalone |
Single package + optional extras |
Dialect |
Spark or engine-native (configurable) |
ANSI + SQLAlchemy dialects |
References
SQLFrame: GitHub, Documentation
Moltres: GitHub, Documentation