Moltres Design Notes
Layers
Supports Python 3.10+ and assumes SQLAlchemy-compatible drivers.
Expression System – moltres.expressions models columns, literals, math/string ops, and aggregations as symbolic trees.
Logical Planner – moltres.logical turns DataFrame actions into plan nodes (Project, Filter, Join, Aggregate, Sort, Limit, TableScan).
SQL Compiler – moltres.sql.compiler converts the plan into ANSI SQL with basic dialect awareness (SQLite and PostgreSQL quoting, case sensitivity, etc.).
Execution Engine – moltres.engine manages SQLAlchemy connections and materializes results as lists of dicts, pandas DataFrames, or polars DataFrames depending on configuration. Supports streaming execution via fetch_stream() for cursor-based pagination of large result sets.
DDL Layer – moltres.sql.ddl and moltres.table.schema provide table creation and schema definition utilities that compile to CREATE TABLE and DROP TABLE statements.
Data Loading Layer – moltres.dataframe.reader (now DataLoader) provides data source loaders with:
  File formats: load.csv(), load.json(), load.jsonl(), load.parquet(), load.text() - all return Records
  Generic format: load.format(format_name).load(path) - returns Records
  Schema inference: automatic type detection from data
  Explicit schemas: .schema([ColumnDef(...), ...]) for type control
  Format options: .option(key, value) for format-specific settings
  Streaming: .stream() for chunked reading of large files (configurable chunk_size)
  Note: File readers return Records (not DataFrame). For SQL operations, use db.table(name).select().
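
To make the schema inference and chunked reading described for the Data Loading Layer concrete, here is a minimal standard-library sketch. It is not the Moltres reader: `infer_type` and `read_csv_chunks` are hypothetical names, and inferring types from the first chunk only is a simplifying assumption made for this example.

```python
import csv
import io
from itertools import islice
from typing import Iterator


def infer_type(values: list[str]) -> type:
    """Pick the narrowest of int, float, str that parses every value."""
    for candidate in (int, float):
        try:
            for value in values:
                candidate(value)
            return candidate
        except ValueError:
            continue
    return str


def read_csv_chunks(handle, chunk_size: int = 2) -> Iterator[list[dict]]:
    """Yield lists of typed row dicts, at most chunk_size rows per list."""
    reader = csv.DictReader(handle)
    chunk = list(islice(reader, chunk_size))
    if not chunk:
        return
    # Assumption of this sketch: types are inferred from the first chunk only,
    # so a later row that does not fit the inferred type raises ValueError.
    schema = {name: infer_type([row[name] for row in chunk]) for name in chunk[0]}
    while chunk:
        yield [{key: schema[key](value) for key, value in row.items()} for row in chunk]
        chunk = list(islice(reader, chunk_size))


data = io.StringIO("id,amount,country\n1,9.5,US\n2,3.0,DE\n3,7.25,US\n")
chunks = list(read_csv_chunks(data, chunk_size=2))
```

Only one chunk of rows is held in memory at a time, which is the property that chunked reading of large files relies on.
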
Write Layer – moltres.dataframe.writer provides DataFrame persistence with:
  Table writes: save_as_table() with schema inference and automatic table creation
  Existing table inserts: insertInto() for appending to pre-existing tables
  File formats: csv(), json(), jsonl(), parquet() with format-specific options
  Partitioning: partitionBy() for directory-based data partitioning
  Multiple write modes: append, overwrite, error_if_exists
  Streaming: .stream() for chunked writing without materializing the entire DataFrame
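
As an illustration of how directory-based partitioning can interact with the three write modes, the sketch below writes CSV files with the standard library. The `column=value` directory layout, the `part-0.csv` file name, and the `write_partitioned` helper are assumptions made for this example; the notes do not specify the layout Moltres actually produces.

```python
import csv
import shutil
import tempfile
from collections import defaultdict
from pathlib import Path

MODES = {"append", "overwrite", "error_if_exists"}


def write_partitioned(rows, root: Path, partition_col: str, mode: str = "error_if_exists"):
    """Write rows as CSV under root/<col>=<value>/part-0.csv (assumed layout)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if root.exists():
        if mode == "error_if_exists":
            raise FileExistsError(root)
        if mode == "overwrite":
            shutil.rmtree(root)
    # Group rows by the value of the partition column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_col]].append(row)
    for value, group in groups.items():
        part_dir = root / f"{partition_col}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part-0.csv"
        is_new = not path.exists()
        # The partition value is encoded in the directory name, not in the file.
        fields = [key for key in group[0] if key != partition_col]
        with path.open("a", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
            if is_new:
                writer.writeheader()
            writer.writerows(group)
    return sorted(p.name for p in root.iterdir())


rows = [
    {"country": "US", "amount": 10},
    {"country": "DE", "amount": 5},
    {"country": "US", "amount": 7},
]
out = Path(tempfile.mkdtemp()) / "sales"
partitions = write_partitioned(rows, out, "country")
```

In this sketch, append mode adds rows to existing partition files, overwrite mode removes the whole output directory first, and error_if_exists refuses to touch an existing directory.
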
Mutation Layer – moltres.table.mutations provides eager insert, update, and delete helpers that share the same connection stack as queries.
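
The first three layers (expression trees, plan nodes, SQL compilation) can be pictured with the toy sketch below. The class names mirror the plan nodes listed above, but every definition here is hypothetical and far simpler than the real modules; it uses double-quote identifier quoting, which both SQLite and PostgreSQL accept, and runs the result through the standard-library sqlite3 module rather than SQLAlchemy.

```python
import sqlite3
from dataclasses import dataclass
from typing import Union


# --- Expression nodes (symbolic tree) ---
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:
    value: Union[int, float, str]


@dataclass(frozen=True)
class BinaryOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Literal, BinaryOp]


# --- Logical plan nodes ---
@dataclass(frozen=True)
class TableScan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: "Plan"
    predicate: Expr


@dataclass(frozen=True)
class Project:
    child: "Plan"
    columns: tuple


Plan = Union[TableScan, Filter, Project]


def quote(name: str) -> str:
    """Double-quote an identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def compile_expr(expr: Expr) -> str:
    if isinstance(expr, Column):
        return quote(expr.name)
    if isinstance(expr, Literal):
        if isinstance(expr.value, str):
            return "'" + expr.value.replace("'", "''") + "'"
        return str(expr.value)
    return f"({compile_expr(expr.left)} {expr.op} {compile_expr(expr.right)})"


def compile_plan(plan: Plan) -> str:
    """Compile a plan bottom-up, wrapping each child as a subquery."""
    if isinstance(plan, TableScan):
        return f"SELECT * FROM {quote(plan.table)}"
    if isinstance(plan, Filter):
        return f"SELECT * FROM ({compile_plan(plan.child)}) AS t WHERE {compile_expr(plan.predicate)}"
    if isinstance(plan, Project):
        cols = ", ".join(compile_expr(c) for c in plan.columns)
        return f"SELECT {cols} FROM ({compile_plan(plan.child)}) AS t"
    raise TypeError(f"unsupported plan node: {plan!r}")


plan = Project(
    Filter(TableScan("orders"), BinaryOp(">", Column("amount"), Literal(100))),
    (Column("id"), Column("amount")),
)
sql = compile_plan(plan)

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "orders" (id INTEGER, amount REAL)')
conn.executemany('INSERT INTO "orders" VALUES (?, ?)', [(1, 50.0), (2, 150.0), (3, 300.0)])
rows = conn.execute(sql).fetchall()
```

Nesting every node as a subquery keeps the compiler trivially correct at the cost of verbose SQL; a production compiler would normally flatten adjacent nodes into a single SELECT.
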
Workflows
Create tables programmatically using db.create_table(name, columns, ...) with schema definitions built from column() helpers. Drop tables with db.drop_table(name).
Load data from files using db.load.csv(path), db.load.json(path), db.load.parquet(path), etc. These return Records, which can be inserted into tables or iterated. Use .schema([...]) for explicit schemas and .option(key, value) for format-specific settings.
For SQL operations on database tables, use db.table("name").select(...) to construct a lazy DataFrame.
Compose joins via df.join(other_df, on=[col("left_col") == col("right_col")]) and aggregations via df.group_by("country").agg(sum(col("amount")).alias("total")).
Call collect() to execute a plan; Moltres compiles SQL at that point. Use collect(stream=True) to get an iterator of row chunks for large datasets, or enable streaming mode with .stream() on readers/writers.
Write DataFrames to tables using df.write.save_as_table(name) with automatic schema inference and table creation, or df.write.insertInto(name) for existing tables. Control behavior with .mode("append"), .mode("overwrite"), or .mode("error_if_exists"), and use .schema([ColumnDef(...), ...]) for explicit schemas.
Write DataFrames to files using df.write.csv(path), df.write.json(path), df.write.parquet(path), or the generic df.write.save(path, format="..."). Use .partitionBy("col1", "col2") for directory-based partitioning and .option(key, value) for format-specific settings.
Enable streaming for large file datasets: db.load.stream().option("chunk_size", 10000).csv("large.csv") returns streaming Records that iterate row-by-row. For SQL queries, use collect(stream=True) to process chunks incrementally. Use df.write.stream().save_as_table("large_table") to write without materializing.
Perform table mutations through the TableHandle (db.table("orders").insert([...])).
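
The lazy-then-collect behaviour in the workflow above can be sketched without Moltres installed. `LazyFrame` below is a hypothetical stand-in built on the standard-library sqlite3 module: building the query runs nothing, `collect()` executes it, and `collect(stream=True)` yields row chunks via `cursor.fetchmany`. The method names follow the notes, but the signatures and the string-based `where` are assumptions made for this example.

```python
import sqlite3
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LazyFrame:
    """Hypothetical lazy query builder; not the Moltres DataFrame class."""
    conn: sqlite3.Connection
    table: str
    columns: tuple = ("*",)
    predicates: tuple = ()  # (sql_fragment, params) pairs
    row_limit: Optional[int] = None

    def select(self, *columns):
        return replace(self, columns=columns or ("*",))

    def where(self, fragment, *params):
        return replace(self, predicates=self.predicates + ((fragment, params),))

    def limit(self, n):
        return replace(self, row_limit=n)

    def to_sql(self):
        # Identifiers are interpolated as-is: this sketch trusts its inputs.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        params = []
        if self.predicates:
            sql += " WHERE " + " AND ".join(f"({frag})" for frag, _ in self.predicates)
            for _, values in self.predicates:
                params.extend(values)
        if self.row_limit is not None:
            sql += f" LIMIT {int(self.row_limit)}"
        return sql, params

    def collect(self, stream=False, chunk_size=2):
        """Compile and execute; return all rows, or an iterator of chunks."""
        sql, params = self.to_sql()
        cursor = self.conn.execute(sql, params)
        names = [d[0] for d in cursor.description]
        if not stream:
            return [dict(zip(names, row)) for row in cursor.fetchall()]
        return self._chunks(cursor, names, chunk_size)

    @staticmethod
    def _chunks(cursor, names, chunk_size):
        while True:
            batch = cursor.fetchmany(chunk_size)
            if not batch:
                return
            yield [dict(zip(names, row)) for row in batch]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "US", 50.0), (2, "DE", 150.0), (3, "US", 300.0), (4, "US", 20.0)],
)

df = LazyFrame(conn, "orders").select("id", "amount").where("country = ?", "US")
sql, params = df.to_sql()          # building the query executes nothing
rows = df.collect()                # full materialization as a list of dicts
chunks = list(df.collect(stream=True, chunk_size=2))  # chunked iteration
```

Because each builder method returns a new frozen instance, intermediate frames can be reused safely as the starting point for several different queries.
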
Testing & Tooling
Tests: PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest
Linting: Ruff + mypy configuration is defined in pyproject.toml.
Optional deps: install with pip install '.[polars]' to enable polars fetches.