# Moltres vs SQLFrame Comparison ## Executive Summary This document compares [Moltres](https://github.com/eddiethedean/moltres) and [SQLFrame](https://github.com/eakmanrq/sqlframe): two Python libraries that provide a DataFrame API with SQL pushdown execution. Both enable running transformation pipelines on database engines without Spark clusters. They differ in API style (drop-in PySpark vs native PySpark-style), supported backends, and feature focus (read/transform vs full CRUD and async). ### Key Findings - **API style**: SQLFrame implements the PySpark DataFrame API as a drop-in: use `activate(engine="...")` then standard `pyspark.sql` imports; Moltres provides a native PySpark-style API (e.g. `connect()`, `db.table().select()`) and optional Pandas/Polars interfaces, with both camelCase and snake_case method names. - **Backends**: SQLFrame supports BigQuery, Databricks, DuckDB, Postgres, Snowflake, Spark, GizmoSQL, Redshift (community), and a Standalone session for SQL-only generation; Moltres is SQLAlchemy-based (SQLite, PostgreSQL, MySQL, DuckDB, and any SQLAlchemy-supported database). - **CRUD**: Moltres provides INSERT (via `Records`), UPDATE, and DELETE with DataFrame-style syntax; SQLFrame focuses on read and transform (no UPDATE/DELETE in its public API). - **Async**: Moltres has full async/await support (`async_connect()`, `AsyncDatabase`); SQLFrame does not advertise async support in its README or overview docs. - **SQL visibility**: SQLFrame offers `df.sql(optimize=True)` for human-readable SQL and optional OpenAI-based SQL enhancement; Moltres offers `show_sql()`, `.sql`, `plan_summary()`, `visualize_plan()`, `validate()`, and `performance_hints()`. ### Use Case Recommendations - **Choose Moltres** when: - You use traditional SQL databases (PostgreSQL, MySQL, SQLite) or any SQLAlchemy-backed engine. - You need UPDATE/DELETE or bulk INSERT with a DataFrame-like API. - You want full async/await for all operations. - You need framework integrations (FastAPI, Django, Streamlit, SQLModel) or transaction/batch support. - You prefer a single `pip install moltres` with optional extras. - **Choose SQLFrame** when: - You need first-class support for BigQuery, Snowflake, or Databricks. - You want to run existing PySpark code with minimal changes (drop-in `activate()` + `pyspark.sql` imports). - You want a Standalone session to generate SQL without a database connection. - You prefer engine-specific installs (`pip install sqlframe[bigquery]`, etc.) and optional Spark dialect or engine-native dialect configuration. --- ## 1. Initialization and Connection ### SQLFrame ```python from sqlframe import activate # Option A: Replace PySpark – activate then use pyspark imports activate(engine="duckdb") from pyspark.sql import SparkSession session = SparkSession.builder.getOrCreate() # Option B: Engine-native session (no pyspark imports) from sqlframe.duckdb import DuckDBSession session = DuckDBSession.builder.getOrCreate() ``` ### Moltres ```python from moltres import connect, async_connect # Synchronous connection db = connect("sqlite:///example.db") db = connect("postgresql://user:pass@localhost/db") # Async connection db = async_connect("postgresql+asyncpg://user:pass@localhost/db") db = async_connect("sqlite+aiosqlite:///example.db") ``` ### Comparison | Aspect | SQLFrame | Moltres | |--------|----------|---------| | **Entry point** | `SparkSession` or engine-native session (e.g. `DuckDBSession`) | `Database` from `connect()` or `async_connect()` | | **Activation** | `activate(engine="...")` for drop-in PySpark | No activation; direct `connect(dsn)` | | **Configuration** | Per-engine install and optional dialect config | DSN / env vars; SQLAlchemy connection options | | **Async support** | Not documented in overview | Full async via `async_connect()` and `AsyncDatabase` | --- ## 2. DataFrame API and PySpark Compatibility ### SQLFrame SQLFrame implements the PySpark DataFrame API so that existing PySpark code can run with minimal changes. After `activate(engine="...")`, you use `pyspark.sql` (SparkSession, functions as F, Window). Method names follow PySpark’s camelCase (e.g. `groupBy`, `orderBy`, `withColumn`). ```python from sqlframe import activate activate(engine="bigquery") from pyspark.sql import SparkSession from pyspark.sql import functions as F from pyspark.sql import Window session = SparkSession.builder.getOrCreate() df = ( session.table('"bigquery-public-data".samples.natality') .where(F.col("ever_born") == 1) .groupBy("year") .agg(F.count("*").alias("num_single_child_families")) .withColumn("last_year_num", F.lag(F.col("num_single_child_families"), 1).over(Window.orderBy("year"))) .limit(5) ) ``` ### Moltres Moltres provides a native PySpark-style API (no pyspark dependency). It supports both camelCase and snake_case (e.g. `group_by` / `groupBy`, `order_by` / `orderBy`). Optional Pandas and Polars interfaces are available for result handling or style preference. ```python from moltres import col, connect from moltres.expressions import functions as F db = connect("sqlite:///example.db") df = ( db.table("orders") .select() .join(db.table("customers").select(), on=[col("orders.customer_id") == col("customers.id")]) .where(col("active") == True) .group_by("country") .agg(F.sum(col("amount")).alias("total_amount")) ) results = df.collect() ``` ### Comparison | Aspect | SQLFrame | Moltres | |--------|----------|---------| | **PySpark compatibility** | Drop-in: use `pyspark.sql` after `activate()` | Native API with high PySpark method compatibility; no pyspark dependency | | **Naming** | PySpark camelCase | Both camelCase and snake_case | | **Other APIs** | — | Optional Pandas and Polars interfaces | --- ## 3. Supported Backends and Execution ### SQLFrame - **Engines**: BigQuery, Databricks, DuckDB, Postgres, Snowflake, Spark, GizmoSQL (community), Redshift (community). - **Standalone**: A session that generates SQL only, with no database connection. - **Install**: Per-engine extras, e.g. `pip install sqlframe[bigquery]`, `sqlframe[postgres]`, `sqlframe[duckdb]`. ### Moltres - **Backends**: SQLAlchemy-based; officially documented for SQLite, PostgreSQL, MySQL, DuckDB. Any SQLAlchemy-supported database works (e.g. BigQuery/Snowflake via community drivers, but not first-class in docs). - **No standalone mode**: Execution is always tied to a `Database` connection (sync or async). - **Install**: Single `pip install moltres`; optional extras for async drivers, Pandas, Polars, integrations. ### Comparison | Aspect | SQLFrame | Moltres | |--------|----------|---------| | **First-class engines** | BigQuery, Databricks, DuckDB, Postgres, Snowflake, Spark, etc. | SQLite, PostgreSQL, MySQL, DuckDB (SQLAlchemy) | | **Other backends** | GizmoSQL, Redshift (community) | Any SQLAlchemy-supported DB (e.g. BigQuery/Snowflake via drivers) | | **SQL-only (no DB)** | Yes (Standalone session) | No | | **Execution model** | Per-engine; SQL generated for target engine | Single stack: logical plan → SQL → SQLAlchemy execution | --- ## 4. CRUD and Mutations ### SQLFrame SQLFrame is oriented around reading and transforming data (e.g. `session.table()`, `read.csv`, transformations, `df.write.saveAsTable()`). Its README and overview docs do not describe UPDATE or DELETE operations on existing rows. ### Moltres Moltres provides full SQL CRUD with a DataFrame-like and database API: ```python from moltres import col, connect from moltres.io.records import Records db = connect("sqlite:///example.db") # Insert rows Records.from_list([ {"id": 1, "name": "Alice", "email": "alice@example.com"}, {"id": 2, "name": "Bob", "email": "bob@example.com"}, ], database=db).insert_into("users") # Update rows db.update("users", where=col("active") == 0, set={"active": 1}) # Delete rows db.delete("users", where=col("email").is_null()) ``` ### Comparison | Aspect | SQLFrame | Moltres | |--------|----------|---------| | **INSERT** | Via write path (e.g. `saveAsTable`) | `Records.from_list(...).insert_into(table)` | | **UPDATE** | Not in overview/docs | `db.update(table, where=..., set=...)` | | **DELETE** | Not in overview/docs | `db.delete(table, where=...)` | --- ## 5. Async Support ### SQLFrame SQLFrame’s README and stable docs do not describe async/await or an async session API. ### Moltres Moltres supports async end-to-end: `async_connect()`, `AsyncDatabase`, async table and DataFrame operations (e.g. `await df.collect()`, `await db.table(...)`), and async file readers/writers where applicable. ```python import asyncio from moltres import async_connect async def main(): async with async_connect("sqlite+aiosqlite:///example.db") as db: table_handle = await db.table("users") df = table_handle.select() results = await df.collect() asyncio.run(main()) ``` ### Comparison | Aspect | SQLFrame | Moltres | |--------|----------|---------| | **Async API** | Not documented in overview | Full async: `async_connect()`, `AsyncDatabase`, async collect/table ops | | **Install** | — | e.g. `pip install moltres[async-postgresql]` or `[async-sqlite]` | --- ## 6. SQL Visibility and Tooling ### SQLFrame - **Generated SQL**: `df.sql()` returns the compiled SQL; `df.sql(optimize=True)` produces more human-readable SQL. - **Configuration**: Docs describe generated SQL configuration and optional use of OpenAI to enhance or explain SQL. - **Dialect**: Default is Spark dialect; input/output dialect can be set to match the engine (see [Input and Output Dialect Configuration](https://sqlframe.readthedocs.io/en/stable/configuration/#input-and-output-dialect)). ### Moltres - **SQL display**: `df.show_sql()`, `df.sql` property. - **Plan and validation**: `plan_summary()`, `visualize_plan()`, `validate()`, `performance_hints()`. - **Dialect**: ANSI plus dialect-specific handling via SQLAlchemy (see `src/moltres/engine/dialects.py`). ### Comparison | Aspect | SQLFrame | Moltres | |--------|----------|---------| | **View SQL** | `df.sql()`, `df.sql(optimize=True)` | `df.show_sql()`, `df.sql` | | **Readable SQL** | `optimize=True`; optional OpenAI | Plan summary and visualization | | **Validation / hints** | — | `validate()`, `performance_hints()` | | **Dialect** | Configurable Spark vs engine-native | SQLAlchemy-based dialect layer | --- ## 7. Installation and Optional Dependencies ### SQLFrame ```bash # Per-engine (required for that backend) pip install "sqlframe[bigquery]" pip install "sqlframe[postgres]" pip install "sqlframe[duckdb]" pip install "sqlframe[snowflake]" pip install "sqlframe[spark]" # Standalone (no DB) pip install sqlframe # Conda conda install -c conda-forge sqlframe ``` ### Moltres ```bash # Core (SQLite, Postgres, MySQL, DuckDB via SQLAlchemy) pip install moltres # Optional pip install moltres[async-postgresql] # Async PostgreSQL pip install moltres[pandas,polars] # Pandas/Polars result formats pip install moltres[sqlmodel] # SQLModel/Pydantic pip install moltres[streamlit] # Streamlit ``` ### Comparison | Aspect | SQLFrame | Moltres | |--------|----------|---------| | **Default install** | Engine-specific or standalone | Single package; core supports multiple DBs via SQLAlchemy | | **Extras** | One extra per engine (bigquery, postgres, duckdb, etc.) | Optional async, pandas, polars, sqlmodel, streamlit, etc. | | **Conda** | conda-forge | Not mentioned in README | --- ## 8. Summary Table | Feature | SQLFrame | Moltres | |---------|----------|---------| | **API style** | Drop-in PySpark (`activate()` + pyspark imports) | Native PySpark-style + optional Pandas/Polars | | **Entry point** | SparkSession or engine-native session | `connect()` / `async_connect()` → Database | | **First-class backends** | BigQuery, Databricks, DuckDB, Postgres, Snowflake, Spark, Redshift, GizmoSQL | SQLite, PostgreSQL, MySQL, DuckDB (SQLAlchemy) | | **Standalone SQL** | Yes | No | | **INSERT** | Via write path | `Records.insert_into()` | | **UPDATE / DELETE** | Not in public API | `db.update()`, `db.delete()` | | **Async** | Not documented | Full async support | | **SQL display** | `df.sql(optimize=True)` | `show_sql()`, `.sql`, `plan_summary()`, `visualize_plan()` | | **Validation / hints** | — | `validate()`, `performance_hints()` | | **Install** | Per-engine or standalone | Single package + optional extras | | **Dialect** | Spark or engine-native (configurable) | ANSI + SQLAlchemy dialects | --- ## References - **SQLFrame**: [GitHub](https://github.com/eakmanrq/sqlframe), [Documentation](https://sqlframe.readthedocs.io/en/stable/) - **Moltres**: [GitHub](https://github.com/eddiethedean/moltres), [Documentation](https://moltres.readthedocs.io/en/latest/)