Moltres Package Plan
Archived
This document outlines an early package and architecture plan for Moltres. It is kept for maintainers and is not part of the primary user docs.
1. Name
Moltres – inspired by the legendary fire Pokémon, evoking speed, power, and a spark-like DataFrame API.
2. Project Goal
Provide a PySpark-style DataFrame API that executes lazily on real SQL databases and supports select, join, aggregation, insert, update, and delete operations without requiring Spark.
3. High-Level Architecture
Core Layers (Bottom → Top)
Expression System: Columns, literals, functions → produce a symbolic expression tree (like PySpark Column or Polars Expr)
Logical Plan Builder: select, join, filter, groupby → produce a logical plan tree (Project, Filter, Join, Aggregate, etc.)
SQL Compiler: Converts the logical plan into SQL (ANSI + dialect adaptations)
Execution Engine: Uses SQLAlchemy for DB connections, executes SQL, returns DataFrame (pandas/polars)
Mutation Engine: Supports insert, update, and delete on underlying SQL tables
4. Package Directory Structure
moltres/
    __init__.py
    config.py
    engine/
        connection.py
        execution.py
        dialects.py
    expressions/
        column.py
        expr.py
        functions.py
    logical/
        plan.py
        operators.py
    dataframe/
        dataframe.py
        groupby.py
    sql/
        compiler.py
        builders.py
    io/
        read.py
        write.py
    table/
        table.py
        mutations.py
    utils/
        exceptions.py
        typing.py
        inspector.py
    tests/
        ...
5. API Design
Connect to DB
from moltres import connect
db = connect("postgresql://user:pass@host/db")
Select & Filter
from moltres import col  # assumes col is exported at the package root, like connect

t = db.table("customers")
df = (
    t.select("id", "name", (col("spend") * 1.1).alias("adj_spend"))
    .where(col("active") == True)
    .orderBy(col("created_at").desc())
)
df.show()
Joins
orders = db.table("orders")
customers = db.table("customers")
df = orders.join(customers, on="customer_id").select(
    customers["name"],
    orders["total"],
)
GroupBy & Aggregations
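# Assumed import path, based on expressions/functions.py; these shadow the built-in sum.
from moltres.expressions.functions import count, sum

df = t.groupBy("country").agg(
    sum(col("spend")).alias("total_spend"),
    count("*").alias("n"),
)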
Insert
t.insert(new_df)
Update
t.update(
    where=col("status") == "new",
    set={"status": lit("processed")},
)
Delete
t.delete(col("created_at") < "2024-01-01")
6. Internal Components
Expression System
Column expressions (col("spend") + 1, col("country").like("%US%"))
Functions (sum, avg, upper, concat, etc.)
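One way the symbolic expression tree could work is through operator overloading, where each operator returns a new node instead of computing a value. The following is a minimal sketch, not the planned implementation: the Expr class, its op/args fields, and the operator names ("add", "mul", and so on) are hypothetical.

```python
class Expr:
    """Hypothetical symbolic expression node: an operator name plus its arguments."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args

    def _binary(self, op, other):
        # Wrap plain Python values as literals so both operands are Expr nodes.
        other = other if isinstance(other, Expr) else lit(other)
        return Expr(op, self, other)

    # Operators build new tree nodes instead of computing a value.
    def __add__(self, other):
        return self._binary("add", other)

    def __mul__(self, other):
        return self._binary("mul", other)

    def __eq__(self, other):
        return self._binary("eq", other)

    def __lt__(self, other):
        return self._binary("lt", other)

    def like(self, pattern):
        return self._binary("like", pattern)

    def alias(self, name):
        return Expr("alias", self, name)

    def __repr__(self):
        return f"{self.op}({', '.join(repr(a) for a in self.args)})"


def col(name):
    return Expr("col", name)


def lit(value):
    return Expr("lit", value)


tree = (col("spend") * 1.1).alias("adj_spend")
print(tree)                          # alias(mul(col('spend'), lit(1.1)), 'adj_spend')
print(col("country").like("%US%"))   # like(col('country'), lit('%US%'))
```

Because == returns a node rather than a bool, expressions like col("active") == True stay symbolic until a compiler turns them into SQL.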
Logical Plan Nodes
Project, Filter, Aggregate, Join, Limit, Sort, TableScan
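The plan nodes listed above could be modeled as small immutable records that each point at a child node. This sketch covers only a subset of the nodes; the field names (child, predicate, columns, keys, n) and the describe helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TableScan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: object
    predicate: str  # plain text here; a real plan would hold an expression tree


@dataclass(frozen=True)
class Project:
    child: object
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Sort:
    child: object
    keys: Tuple[str, ...]


@dataclass(frozen=True)
class Limit:
    child: object
    n: int


def describe(node, depth=0):
    """Return one indented line per node, root first."""
    lines = ["  " * depth + type(node).__name__]
    child = getattr(node, "child", None)
    if child is not None:
        lines.extend(describe(child, depth + 1))
    return lines


# Roughly what select(...).where(...).orderBy(...).limit(10) might build.
plan = Limit(
    Sort(
        Project(Filter(TableScan("customers"), "active = 1"), ("id", "name")),
        ("id",),
    ),
    10,
)
print("\n".join(describe(plan)))
```

Frozen dataclasses keep each plan immutable, so every DataFrame method can return a new plan without affecting the one it was derived from.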
SQL Compiler
Translates logical plan → SQL
Handles aliases, expressions, joins, groupings
Supports multiple SQL dialects
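A simple way to translate a plan tree into SQL is to compile each node as a subquery around its child. The sketch below assumes hypothetical node classes and a two-entry dialect table that differs only in identifier quoting; a real compiler would also flatten subqueries and render expression trees rather than take predicate text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TableScan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: object
    predicate: str  # pre-rendered SQL text standing in for an expression tree


@dataclass(frozen=True)
class Project:
    child: object
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Limit:
    child: object
    n: int


# Identifier quote character per dialect (illustrative subset).
QUOTE_CHAR = {"ansi": '"', "mysql": "`"}


def quote(name, dialect):
    q = QUOTE_CHAR[dialect]
    # Double any embedded quote character so the identifier stays well-formed.
    return q + name.replace(q, q + q) + q


def compile_plan(node, dialect="ansi"):
    """Compile a plan node to SQL by wrapping its child as a subquery."""
    if isinstance(node, TableScan):
        return f"SELECT * FROM {quote(node.table, dialect)}"
    if not isinstance(node, (Filter, Project, Limit)):
        raise TypeError(f"unsupported plan node: {type(node).__name__}")
    child_sql = compile_plan(node.child, dialect)
    if isinstance(node, Filter):
        return f"SELECT * FROM ({child_sql}) AS t WHERE {node.predicate}"
    if isinstance(node, Project):
        cols = ", ".join(quote(c, dialect) for c in node.columns)
        return f"SELECT {cols} FROM ({child_sql}) AS t"
    return f"SELECT * FROM ({child_sql}) AS t LIMIT {int(node.n)}"


plan = Limit(Project(Filter(TableScan("customers"), "active = 1"), ("id", "name")), 10)
sql = compile_plan(plan)
print(sql)
```

Nested subqueries are verbose but keep the compiler compositional: each node only needs to know how to wrap its child.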
Execution Engine
Uses SQLAlchemy for connections
Executes SQL
Returns pandas/polars DataFrame
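The lazy execution step can be illustrated with a wrapper that stores SQL and runs it only when results are requested. This sketch deliberately swaps in the standard-library sqlite3 module for SQLAlchemy and returns a list of dicts instead of a pandas or polars DataFrame, so that it runs without dependencies. The LazyQuery class and its collect method are hypothetical names.

```python
import sqlite3


class LazyQuery:
    """Hypothetical wrapper: holds SQL text and runs it only on collect()."""

    def __init__(self, conn, sql, params=()):
        self.conn = conn
        self.sql = sql
        self.params = tuple(params)
        self.executions = 0  # counts how often the database was actually hit

    def collect(self):
        cur = self.conn.execute(self.sql, self.params)
        self.executions += 1
        names = [d[0] for d in cur.description]
        return [dict(zip(names, row)) for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 1), (2, "Bob", 0), (3, "Cy", 1)],
)

# Building the query touches nothing; values are bound as parameters.
query = LazyQuery(conn, "SELECT id, name FROM customers WHERE active = ? ORDER BY id", (1,))
executed_before_collect = query.executions

rows = query.collect()
print(rows)  # [{'id': 1, 'name': 'Ada'}, {'id': 3, 'name': 'Cy'}]
```

The list of dicts is a convenient intermediate form, since both pandas and polars can construct a DataFrame from one.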
Mutation Engine
INSERT, UPDATE, DELETE
Returns row count or status
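Mutations could be built as parameterized statements that report the affected row count. The sketch below uses the standard-library sqlite3 module rather than SQLAlchemy, and takes the WHERE clause as SQL text instead of an expression tree. The insert, update, and delete helpers are hypothetical and do not reflect a settled Moltres signature.

```python
import sqlite3


def quote(name):
    # Quote an identifier, doubling embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def insert(conn, table, rows):
    """Insert a list of dicts that share the same keys; return rows written."""
    columns = list(rows[0])
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        quote(table),
        ", ".join(quote(c) for c in columns),
        ", ".join("?" for _ in columns),
    )
    cur = conn.executemany(sql, [tuple(r[c] for c in columns) for r in rows])
    return cur.rowcount


def update(conn, table, set_values, where_sql, where_params=()):
    """Set columns to new values where the condition holds; return rows changed."""
    assignments = ", ".join(f"{quote(c)} = ?" for c in set_values)
    sql = f"UPDATE {quote(table)} SET {assignments} WHERE {where_sql}"
    cur = conn.execute(sql, (*set_values.values(), *where_params))
    return cur.rowcount


def delete(conn, table, where_sql, where_params=()):
    """Delete rows where the condition holds; return rows removed."""
    cur = conn.execute(f"DELETE FROM {quote(table)} WHERE {where_sql}", tuple(where_params))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, created_at TEXT)")

inserted = insert(conn, "orders", [
    {"status": "new", "created_at": "2023-12-31"},
    {"status": "new", "created_at": "2024-02-01"},
    {"status": "done", "created_at": "2023-06-01"},
])
updated = update(conn, "orders", {"status": "processed"}, '"status" = ?', ("new",))
deleted = delete(conn, "orders", '"created_at" < ?', ("2024-01-01",))
print(inserted, updated, deleted)  # 3 2 2
```

Values always travel as bound parameters, which is the basis for the plan's claim of being safer than hand-written dynamic SQL.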
7. Development Roadmap
Week 1: Foundation
Project structure
Column, Literal, basic expressions
TableScan
Minimal DataFrame wrapper
Week 2: Logical Plan + Compiler
select / where / limit
Expression → SQL conversion
Execute queries → pandas
Week 3: Full Query Support
joins
groupBy / aggregates
orderBy
Dialect support (Postgres + SQLite)
Week 4: Mutation
insert DataFrame
update
delete
Week 5: Polars Integration
Convert query results to polars
Accept polars DataFrames for insert
Week 6: Stabilize API + Docs
Documentation (sphinx/mkdocs)
Examples
Benchmark suite
PyPI packaging
8. Testing Plan
Unit tests: expressions, logical plan, SQL compiler, mutation queries
Integration tests: SQLite + PostgreSQL
Performance tests: compare Moltres vs raw SQL
9. Value Proposition
Familiar PySpark-style API, executed as SQL on the database
No cluster required
Safer than writing dynamic SQL
Composable & testable
Wraps existing SQL databases
Lets analysts/engineers write PySpark-style code anywhere