# Migration Guide

This guide helps you migrate to Moltres from other data processing libraries.

## Migrating from Pandas

### Basic Operations

**Pandas:**

```python
import pandas as pd

df = pd.read_csv("data.csv")
filtered = df[df["age"] > 18]
# Sum only the "amount" column so the result matches the Moltres example
result = filtered.groupby("category")["amount"].sum()
```

**Moltres:**

```python
from moltres import connect, col

db = connect("sqlite:///data.db")
df = db.read.csv("data.csv")
filtered = df.where(col("age") > 18)
# NOTE: sum must be Moltres's aggregate function (import it as described in
# the Moltres docs), not Python's built-in sum()
result = filtered.group_by("category").agg(sum(col("amount")))
```

### Key Differences

1. **Lazy Evaluation**: Moltres operations are lazy; nothing executes until you call `.collect()`
2. **SQL Pushdown**: Operations execute in the database
3. **No In-Memory Data**: Data stays in the database until you collect results

## Migrating from SQLAlchemy ORM

### Query Building

**SQLAlchemy ORM:**

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from models import User

# Bind the session to an engine so queries have a connection to run on
engine = create_engine("postgresql://...")
session = Session(engine)
users = session.query(User).filter(User.age > 18).all()
```

**Moltres:**

```python
from moltres import connect, col

db = connect("postgresql://...")
users = db.table("users").select().where(col("age") > 18).collect()
```

### CRUD Operations

**SQLAlchemy ORM:**

```python
# Uses `session` and `User` from the query example above

# Create
user = User(name="Alice", age=30)
session.add(user)
session.commit()

# Update
user.age = 31
session.commit()

# Delete
session.delete(user)
session.commit()
```

**Moltres:**

```python
from moltres import connect, col

db = connect("postgresql://...")

# Create
db.createDataFrame([{"name": "Alice", "age": 30}]).write.insertInto("users")

# Update
df = db.table("users").select()
df.write.update("users", where=col("name") == "Alice", set={"age": 31})

# Delete
df.write.delete("users", where=col("name") == "Alice")
```

## Migrating from PySpark

### DataFrame Operations

**PySpark:**

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# header=True gives the columns their names; inferSchema=True makes age numeric
df = spark.read.csv("data.csv", header=True, inferSchema=True)
result = df.filter(df.age > 18).groupBy("category").sum("amount")
```

**Moltres:**

```python
from moltres import connect, col

db = connect("postgresql://...")
df = db.read.csv("data.csv")
# NOTE: sum must be Moltres's aggregate function (import it as described in
# the Moltres docs), not Python's built-in sum()
result = df.where(col("age") > 18).group_by("category").agg(sum(col("amount")))
```

### Key Differences

1. **No Cluster**: Moltres works with existing databases, so no cluster is needed
2. **Similar API**: 98% API compatibility with PySpark
3. **SQL Pushdown**: All operations compile to SQL

## Migrating from Ibis

### Query Building

**Ibis:**

```python
import ibis

con = ibis.postgres.connect(...)
table = con.table("users")
result = table.filter(table.age > 18).group_by("category").aggregate(...)
```

**Moltres:**

```python
from moltres import connect, col

db = connect("postgresql://...")
df = db.table("users").select().where(col("age") > 18)
result = df.group_by("category").agg(...)
```

### Key Differences

1. **DataFrame API**: Moltres uses a DataFrame API (like Pandas/PySpark)
2. **CRUD Operations**: Moltres supports INSERT/UPDATE/DELETE
3. **Type Safety**: Full type hints throughout

## Migration Checklist

### Pre-Migration

- [ ] Identify all data sources
- [ ] Map current operations to Moltres equivalents
- [ ] Identify breaking changes
- [ ] Plan migration strategy (big bang vs. gradual)

### Migration Steps

1. **Setup**
   - Install Moltres
   - Configure database connections
   - Test connectivity

2. **Data Migration**
   - Migrate data to target database
   - Verify data integrity
   - Set up indexes

3. **Code Migration**
   - Replace library imports
   - Update API calls
   - Update data access patterns

4. **Testing**
   - Test all operations
   - Verify results match
   - Run performance tests

5. **Deployment**
   - Deploy to staging
   - Monitor for issues
   - Deploy to production

### Post-Migration

- [ ] Monitor performance
- [ ] Verify data correctness
- [ ] Update documentation
- [ ] Train team members

## Common Migration Patterns

### Pattern 1: Gradual Migration

1. Keep existing system running
2. Migrate one module at a time
3. Use Moltres for new features
4. Gradually replace old code

### Pattern 2: Big Bang Migration

1. Migrate entire system at once
2. Requires thorough testing
3. Higher risk but faster completion

### Pattern 3: Hybrid Approach

1. Use Moltres for new features
2. Keep existing code as-is
3. Migrate when touching old code

## Troubleshooting

### Common Issues

1. **Performance Differences**
   - Add indexes
   - Optimize queries
   - Use connection pooling

2. **API Differences**
   - Check documentation
   - Use type hints for IDE help
   - Review examples

3. **Data Type Mismatches**
   - Verify schema
   - Check type mappings
   - Use explicit casting

## Getting Help

- Check documentation
- Search GitHub issues
- Ask questions in discussions
- Review examples in docs/
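
## Example: Verifying That Results Match

The Testing step of the migration checklist asks you to verify that results match between the old and new code paths. The sketch below shows one way to do that with an order-insensitive comparison, since SQL result order is not guaranteed without an `ORDER BY`. It uses only the Python standard library: the in-memory `sqlite3` database stands in for your target database, and the SQL query stands in for whatever your migrated pipeline returns. The helper names `normalize` and `diff_results` are hypothetical and are not part of Moltres.

```python
import sqlite3
from collections import Counter


def normalize(row, ndigits=6):
    """Round floats so tiny numeric differences do not cause false mismatches."""
    return tuple(round(v, ndigits) if isinstance(v, float) else v for v in row)


def diff_results(legacy_rows, migrated_rows, ndigits=6):
    """Compare two result sets as multisets, ignoring row order.

    Returns (missing, extra): rows found only in the legacy result and
    rows found only in the migrated result, each as a Counter.
    """
    legacy = Counter(normalize(r, ndigits) for r in legacy_rows)
    migrated = Counter(normalize(r, ndigits) for r in migrated_rows)
    return legacy - migrated, migrated - legacy


# Sample data: (category, age, amount)
records = [("a", 20, 10.0), ("a", 17, 5.0), ("b", 30, 7.5), ("b", 45, 2.5)]

# "Legacy" result: filter and aggregate in application code
legacy_totals = {}
for category, age, amount in records:
    if age > 18:
        legacy_totals[category] = legacy_totals.get(category, 0.0) + amount
legacy_rows = list(legacy_totals.items())

# "Migrated" result: the same logic pushed down to the database as SQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (category TEXT, age INTEGER, amount REAL)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", records)
migrated_rows = conn.execute(
    "SELECT category, SUM(amount) FROM data WHERE age > 18 GROUP BY category"
).fetchall()
conn.close()

missing, extra = diff_results(legacy_rows, migrated_rows)
if not missing and not extra:
    print("match")
else:
    print(f"missing={dict(missing)} extra={dict(extra)}")
```

Comparing multisets rather than sorted lists keeps duplicate rows significant while ignoring ordering. In a real migration you would replace the two result-producing sections with the output of your existing code and the collected output of the migrated query.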