# Why Moltres? ## The Missing Piece in Python's Data Ecosystem Moltres fills a major, long-standing gap in the Python data ecosystem. It provides a **DataFrame API** whose operations are **pushed down into SQL**, **without loading data into memory**, **while also supporting real SQL CRUD** (INSERT, UPDATE, DELETE). **This combination does not exist anywhere else in Python today.** ## The Gap in Python's Ecosystem Python has powerful DataFrame tools and powerful SQL tools—but **no library connects them in a unified, ergonomic way**. ### What Currently Exists | Category | Examples | Limitation | |----------|-----------|------------| | **DataFrame libraries** | Pandas, Polars, Modin | In-memory only. No SQL CRUD. | | **SQL libraries** | SQLAlchemy, SQLModel, Databases | Row-level CRUD but *not* DataFrame-style. | | **SQL query builders** | Ibis, SQLGlot, PyPika | Excellent SELECT support, but **no updates/deletes/inserts**. | | **Distributed DataFrames** | PySpark | Heavy, clustered environment required. | Across all of these, developers repeatedly ask for: > "A Pandas/DataFrame-like interface backed by SQL instead of memory." But until Moltres, **nobody built it**. ## What Makes Moltres Unique Moltres is the **only** Python library that provides: | Feature | Pandas/Polars | Ibis | SQLAlchemy | SQLModel | **Moltres** | |--------|----------------|------|-------------|-----------|-------------| | DataFrame API | ✔ | ✔ | ❌ | ❌ | **✔** | | SQL Pushdown Execution | ❌ | ✔ | ✔ | ✔ | **✔** | | **Row-Level INSERT/UPDATE/DELETE** | ❌ | ❌ | ✔ | ✔ | **✔** | | Lazy query building | ✔ (Polars) | ✔ | ⚠️ | ⚠️ | **✔** | | Operates directly on SQL tables | ⚠️ limited | ✔ | ✔ | ✔ | **✔** | | Column-oriented transformations | ✔ | ✔ | ❌ | ❌ | **✔** | ## Who Needs Moltres? Moltres solves real problems for: ### Data Engineers **Problem:** Need to update millions of rows, but loading data into memory is impractical. **Solution:** Use Moltres DataFrame operations that compile to SQL UPDATE statements. No data loading required. ```python # Update millions of rows without loading into memory orders = db.table("orders") orders.update( where=col("status") == "pending", set={"status": "processing", "updated_at": "2024-01-15"} ) ``` ### Backend Developers **Problem:** ORM operations are verbose and don't support column-aware bulk operations well. **Solution:** Replace many ORM operations with cleaner, column-aware DataFrame syntax. ```python # Instead of row-by-row ORM updates users.update( where=col("status") == "pending", set={"status": "active", "updated_at": "2024-01-15"} ) ``` ### Analytics Engineers / dbt Users **Problem:** Want to express SQL models in Python code with DataFrame chaining. **Solution:** Build analytics pipelines using composable DataFrame operations that compile to SQL. ```python # Build models like dbt, but in Python customer_metrics = ( db.table("orders") .group_by("customer_id") .agg(sum(col("amount")).alias("lifetime_value")) ) ``` ### Product Engineers **Problem:** Need validated, type-safe CRUD without hand-writing SQL. **Solution:** Moltres provides type-safe CRUD operations with DataFrame-style syntax. ```python # Type-safe, validated CRUD users.insert([{"name": "Alice", "email": "alice@example.com"}]) users.update(where=col("id") == 1, set={"status": "active"}) ``` ### Teams Migrating Off Spark **Problem:** Want Spark-like DataFrame API but for traditional SQL databases—no cluster required. **Solution:** Moltres provides a Spark-like DataFrame API that works with existing SQL infrastructure. ```python # Familiar Spark-style operations df = ( db.table("orders") .select() .where(col("status") == "completed") .group_by("country") .agg(sum(col("amount")).alias("total")) ) ``` ## Why This Matters ### Pain Today Developers must juggle: - Pandas or Polars for DataFrame transformations - SQLAlchemy/ORMs for persistence - Raw SQL for updates/deletes - Custom glue to keep everything in sync ### Moltres Fixes This With Moltres: - Transformations are DataFrame-style - Execution happens in SQL - No massive DataFrame materialization - CRUD is first-class - Types and schemas stay consistent - Code becomes composable and readable ## Why Moltres Is Important for the Future of Python Data The industry is moving toward: - **pushdown execution** - **lazy query planning** - **typed models** - **server-side compute** - **Python as a declarative DSL for data** Moltres is aligned perfectly with this direction. It acts as the **SQL-powered backbone** of typed, validated, Pythonic data pipelines. ## Summary **Moltres is not "another DataFrame library."** It provides a core capability missing from Python: > **A DataFrame layer directly backed by SQL with full CRUD support.** This makes it uniquely powerful for modern data engineering, backend services, analytics, and hybrid workflows where SQL is the source of truth. If you work with SQL and Python—**Moltres solves problems you've had for years.** ## See Also - [Examples](EXAMPLES.md) - Practical examples for each use case - [Advocacy Document](moltres_advocacy.md) - Detailed positioning and comparison - [Design Notes](moltres_plan.md) - Architecture and design decisions