Migration Guide

This guide helps you migrate to Moltres from other data processing libraries.

Migrating from Pandas

Basic Operations

Pandas:

import pandas as pd

df = pd.read_csv("data.csv")
filtered = df[df["age"] > 18]
result = filtered.groupby("category")["amount"].sum()

Moltres:

from moltres import connect, col

db = connect("sqlite:///data.db")
df = db.read.csv("data.csv")
filtered = df.where(col("age") > 18)
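# `sum` must be Moltres's SQL aggregate function, not Python's built-in sum();
# import it from Moltres (see its API docs for the module path in your version)
result = filtered.group_by("category").agg(sum(col("amount")))
rows = result.collect()  # the query runs in the database here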
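
Because Moltres pushes work down to the database, the filter, group, and sum above are meant to run as one SQL query rather than as Python-side work. As a rough illustration, the sketch below runs an equivalent query with Python's built-in sqlite3 module on an in-memory table. The sample rows are made up, and the SQL that Moltres actually generates may differ.

```python
import sqlite3

# Made-up sample rows standing in for data.csv: (category, age, amount)
rows = [
    ("books", 25, 10.0),
    ("books", 17, 99.0),  # excluded by the age filter
    ("games", 40, 5.5),
    ("games", 33, 4.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (category TEXT, age INTEGER, amount REAL)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)

# Filter, group, and aggregate in a single statement executed by the database.
query = """
    SELECT category, SUM(amount) AS total
    FROM data
    WHERE age > ?
    GROUP BY category
    ORDER BY category
"""
result = conn.execute(query, (18,)).fetchall()
print(result)  # [('books', 10.0), ('games', 10.0)]
```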

Key Differences

  1. Lazy Evaluation: Moltres operations only build a query; nothing executes until you call .collect()

  2. SQL Pushdown: Operations execute in the database

  3. No In-Memory DataFrame: Data stays in the database; only the rows returned by .collect() are loaded into Python
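
To illustrate lazy evaluation, the toy class below records operations and sends SQL to the database only when collect() is called. LazyTable and its where_gt method are hypothetical, written for this sketch on top of sqlite3; they are not Moltres's implementation or API.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyTable:
    """Toy lazy query plan: methods record operations; collect() runs them."""

    conn: sqlite3.Connection
    table: str
    columns: tuple = ("*",)
    filters: tuple = ()  # (SQL fragment, bound value) pairs

    def select(self, *columns):
        # Returns a new plan; the database is not touched.
        return LazyTable(self.conn, self.table, columns or ("*",), self.filters)

    def where_gt(self, column, value):
        # Records "column > value"; the value is bound as a parameter later.
        new_filter = (f"{column} > ?", value)
        return LazyTable(self.conn, self.table, self.columns, self.filters + (new_filter,))

    def to_sql(self):
        # Table and column names are interpolated, so they must be trusted.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(fragment for fragment, _ in self.filters)
        return sql, [value for _, value in self.filters]

    def collect(self):
        # Only here is SQL sent to the database.
        sql, params = self.to_sql()
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.execute("INSERT INTO users VALUES ('Alice', 30), ('Bob', 15)")

plan = LazyTable(conn, "users").select("name").where_gt("age", 18)
# A row inserted after the plan was built is still seen: nothing has run yet.
conn.execute("INSERT INTO users VALUES ('Carol', 42)")

print(plan.to_sql())   # ('SELECT name FROM users WHERE age > ?', [18])
print(plan.collect())  # [('Alice',), ('Carol',)]
```

Returning a new plan from every method, instead of mutating the existing one, keeps intermediate plans reusable. It also explains why building a pipeline costs nothing until the final call.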

Migrating from SQLAlchemy ORM

Query Building

SQLAlchemy ORM:

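from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from models import User  # your mapped User class

engine = create_engine("postgresql://...")
session = Session(engine)  # a Session needs an engine (bind) to run queries
users = session.query(User).filter(User.age > 18).all()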

Moltres:

from moltres import connect, col

db = connect("postgresql://...")
users = db.table("users").select().where(col("age") > 18).collect()

CRUD Operations

SQLAlchemy ORM:

# Create
user = User(name="Alice", age=30)
session.add(user)
session.commit()

# Update
user.age = 31
session.commit()

# Delete
session.delete(user)
session.commit()

Moltres:

# Create
db.createDataFrame([{"name": "Alice", "age": 30}]).write.insertInto("users")

# Update
df = db.table("users").select()
df.write.update("users", where=col("name") == "Alice", set={"age": 31})

# Delete
df.write.delete("users", where=col("name") == "Alice")
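
Under SQL pushdown, each Moltres write call above corresponds to one SQL statement. For comparison, the sketch below issues the equivalent INSERT, UPDATE, and DELETE using Python's sqlite3 module with bound parameters. It shows the SQL involved, not the exact statements Moltres emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")

with conn:  # commits on success, rolls back if an exception is raised
    # Create
    conn.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("Alice", 30))
    # Update
    conn.execute("UPDATE users SET age = ? WHERE name = ?", (31, "Alice"))

after_update = conn.execute("SELECT name, age FROM users").fetchall()
print(after_update)  # [('Alice', 31)]

with conn:
    # Delete
    conn.execute("DELETE FROM users WHERE name = ?", ("Alice",))

remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(remaining)  # 0
```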

Migrating from PySpark

DataFrame Operations

PySpark:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
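# header=True takes column names from the first row; inferSchema=True makes age numeric
df = spark.read.csv("data.csv", header=True, inferSchema=True)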
result = df.filter(df.age > 18).groupBy("category").sum("amount")

Moltres:

from moltres import connect, col

db = connect("postgresql://...")
df = db.read.csv("data.csv")
# `sum` must be Moltres's SQL aggregate function, not Python's built-in sum()
result = df.where(col("age") > 18).group_by("category").agg(sum(col("amount")))

Key Differences

  1. No Cluster: Moltres runs against your existing database, so no Spark cluster is needed

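  2. Familiar API: Moltres's DataFrame API is modeled on PySpark's (98% API compatibility), though some names differ, e.g. group_by instead of groupBy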

  3. SQL Pushdown: All operations compile to SQL
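
For operations to compile to SQL, an expression like col("age") > 18 has to evaluate to an expression object rather than a Python bool, and that object is later rendered as a SQL fragment with bound parameters. The hypothetical Col and Expr classes below sketch this operator-overloading technique. They are illustrative stand-ins, not Moltres internals. Predicates are combined with & because Python's `and` cannot be overloaded, which is also why PySpark uses &.

```python
import sqlite3


class Expr:
    """Hypothetical compiled predicate: a SQL fragment plus its bound values."""

    def __init__(self, sql, params=()):
        self.sql = sql
        self.params = tuple(params)

    def __and__(self, other):
        # Combine predicates; parameter order follows placeholder order.
        return Expr(f"({self.sql}) AND ({other.sql})", self.params + other.params)


class Col:
    """Hypothetical column reference whose comparisons build Expr objects."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return Expr(f"{self.name} > ?", (value,))

    def __eq__(self, value):
        return Expr(f"{self.name} = ?", (value,))


predicate = (Col("age") > 18) & (Col("category") == "books")
print(predicate.sql)     # (age > ?) AND (category = ?)
print(predicate.params)  # (18, 'books')

# The compiled fragment is ordinary SQL that a database can execute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, age INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("a", 30, "books"), ("b", 30, "games"), ("c", 10, "books")],
)
rows = conn.execute(f"SELECT name FROM items WHERE {predicate.sql}", predicate.params).fetchall()
print(rows)  # [('a',)]
```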

Migrating from Ibis

Query Building

Ibis:

import ibis

con = ibis.postgres.connect(...)
table = con.table("users")
result = table.filter(table.age > 18).group_by("category").aggregate(...)

Moltres:

from moltres import connect, col

db = connect("postgresql://...")
df = db.table("users").select().where(col("age") > 18)
result = df.group_by("category").agg(...)

Key Differences

  1. DataFrame API: Moltres uses a DataFrame API similar to those of Pandas and PySpark

  2. CRUD Operations: Moltres supports INSERT/UPDATE/DELETE

  3. Type Safety: Full type hints throughout

Migration Checklist

Pre-Migration

  • [ ] Identify all data sources

  • [ ] Map current operations to Moltres equivalents

  • [ ] Identify behavior differences that could break existing code

  • [ ] Plan migration strategy (big bang vs. gradual)

Migration Steps

  1. Setup

    • Install Moltres

    • Configure database connections

    • Test connectivity

  2. Data Migration

    • Migrate data to target database

    • Verify data integrity

    • Set up indexes

  3. Code Migration

    • Replace library imports

    • Update API calls

    • Update data access patterns

  4. Testing

    • Test all operations

    • Verify results match

    • Run performance tests

  5. Deployment

    • Deploy to staging

    • Monitor for issues

    • Deploy to production
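
For the "Verify results match" step, one simple approach is to run the old and new pipelines on the same input and compare their outputs as multisets of rows, because SQL guarantees no row order without ORDER BY. The results_match helper below is a hypothetical sketch. It assumes both sides can be converted to lists of dicts, and it rounds floats so that small differences between engines do not cause false mismatches.

```python
from collections import Counter


def _normalize(row, ndigits):
    # Sort keys so column order doesn't matter; round floats to absorb
    # tiny floating-point differences between engines.
    return tuple(
        (key, round(value, ndigits) if isinstance(value, float) else value)
        for key, value in sorted(row.items())
    )


def results_match(old_rows, new_rows, ndigits=9):
    """Return True if both lists contain the same rows, ignoring order."""
    old = Counter(_normalize(row, ndigits) for row in old_rows)
    new = Counter(_normalize(row, ndigits) for row in new_rows)
    return old == new


old = [{"category": "books", "total": 0.1 + 0.2}, {"category": "games", "total": 10}]
new = [{"total": 10.0, "category": "games"}, {"category": "books", "total": 0.3}]
print(results_match(old, new))      # True
print(results_match(old, new[:1]))  # False
```

Counter treats duplicate rows as significant, which a plain set comparison would miss.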

Post-Migration

  • [ ] Monitor performance

  • [ ] Verify data correctness

  • [ ] Update documentation

  • [ ] Train team members

Common Migration Patterns

Pattern 1: Gradual Migration

  1. Keep existing system running

  2. Migrate one module at a time

  3. Use Moltres for new features

  4. Gradually replace old code
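
A common way to migrate one module at a time is to put the old and new implementations behind a switch, so each module can be moved to Moltres, or moved back, independently. In the sketch below, the environment variable USE_MOLTRES_MODULES and both adult_totals functions are hypothetical placeholders for your own configuration and code.

```python
import os


def adult_totals_legacy():
    # Placeholder for the existing implementation (e.g., Pandas or ORM code).
    return {"books": 10.0}


def adult_totals_moltres():
    # Placeholder for the rewritten Moltres implementation.
    return {"books": 10.0}


# Module name -> (legacy implementation, Moltres implementation)
IMPLEMENTATIONS = {"adult_totals": (adult_totals_legacy, adult_totals_moltres)}


def migrated_modules():
    # Comma-separated list, e.g. USE_MOLTRES_MODULES="adult_totals,reports"
    raw = os.environ.get("USE_MOLTRES_MODULES", "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def choose(module_name):
    legacy, moltres_version = IMPLEMENTATIONS[module_name]
    return moltres_version if module_name in migrated_modules() else legacy


def run(module_name):
    return choose(module_name)()
```

When the variable is unset, every module keeps using the legacy path. Adding a module's name switches only that module, and removing it rolls the module back without a code change.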

Pattern 2: Big Bang Migration

  1. Migrate entire system at once

  2. Requires thorough testing

  3. Higher risk but faster completion

Pattern 3: Hybrid Approach

  1. Use Moltres for new features

  2. Leave existing code as-is

  3. Migrate old code when you next need to modify it

Troubleshooting

Common Issues

  1. Performance Differences

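    • Add indexes on columns used in filters, joins, and groupings

    • Optimize queries, e.g. by inspecting the database's query plan with EXPLAIN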

    • Use connection pooling

  2. API Differences

    • Check documentation

    • Use type hints for IDE help

    • Review examples

  3. Data Type Mismatches

    • Verify schema

    • Check type mappings

    • Use explicit casting
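
Type mismatches often come from numbers stored as text, for example after importing a CSV without a schema. The sketch below uses SQLite through Python's sqlite3 module to show how a text column silently changes the result of age > 18 and how an explicit CAST restores the numeric comparison. Other databases apply different conversion rules, and the way you express a cast in Moltres depends on its column API, which is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ages stored as TEXT, as can happen after an untyped CSV import.
conn.execute("CREATE TABLE users (name TEXT, age TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Ann", "9"), ("Ben", "20"), ("Cy", "100")],
)

# TEXT column vs. numeric literal: SQLite converts 18 to '18' and compares
# as text, so '9' > '18' is true and '100' > '18' is false.
wrong = conn.execute("SELECT name FROM users WHERE age > 18 ORDER BY name").fetchall()
print(wrong)  # [('Ann',), ('Ben',)]

# An explicit CAST makes the comparison numeric again.
right = conn.execute(
    "SELECT name FROM users WHERE CAST(age AS INTEGER) > 18 ORDER BY name"
).fetchall()
print(right)  # [('Ben',), ('Cy',)]
```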

Getting Help

  • Check documentation

  • Search GitHub issues

  • Ask questions in discussions

  • Review examples in docs/