Deployment Guide

This guide covers best practices for deploying Moltres in production environments.

Pre-Deployment Checklist

  • [ ] Database connection strings configured securely

  • [ ] Connection pooling configured appropriately

  • [ ] Query timeouts set

  • [ ] Error handling implemented

  • [ ] Logging configured

  • [ ] Performance monitoring enabled

  • [ ] Indexes created on frequently queried columns

  • [ ] Database backups configured

  • [ ] Environment variables set

  • [ ] Dependencies pinned

Configuration

Environment Variables

Use environment variables for configuration (12-factor app principle):

# Database connection
export MOLTRES_DSN="postgresql://user:pass@host/dbname"

# Connection pooling
export MOLTRES_POOL_SIZE=10
export MOLTRES_MAX_OVERFLOW=5
export MOLTRES_POOL_TIMEOUT=30
export MOLTRES_POOL_RECYCLE=3600
export MOLTRES_POOL_PRE_PING=true

# Query settings
export MOLTRES_QUERY_TIMEOUT=30.0

# Logging
export MOLTRES_ECHO=false  # Set to true for debugging only

Application Configuration

from moltres import connect
import os

# Load from environment
db = connect(
    dsn=os.environ.get("MOLTRES_DSN"),
    pool_size=int(os.environ.get("MOLTRES_POOL_SIZE", "10")),
    max_overflow=int(os.environ.get("MOLTRES_MAX_OVERFLOW", "5")),
    query_timeout=float(os.environ.get("MOLTRES_QUERY_TIMEOUT", "30.0")),
    pool_pre_ping=True,
    echo=False,  # Disable in production
)

Connection Management

Connection Pooling

Configure connection pooling based on your workload:

# For web applications (many concurrent requests)
db = connect(
    "postgresql://user:pass@host/dbname",
    pool_size=20,        # Larger pool for concurrency
    max_overflow=10,     # Allow overflow
    pool_timeout=30,     # Wait for connection
    pool_recycle=3600,   # Recycle after 1 hour
    pool_pre_ping=True,  # Verify connections
)

# For batch processing (fewer concurrent requests)
db = connect(
    "postgresql://user:pass@host/dbname",
    pool_size=5,         # Smaller pool
    max_overflow=2,      # Minimal overflow
    pool_timeout=60,     # Longer timeout
    pool_recycle=7200,   # Recycle after 2 hours
    pool_pre_ping=True,
)

Connection Lifecycle

# Application startup
db = connect("postgresql://user:pass@host/dbname")

# Application shutdown
db.close()  # Close all connections

Error Handling

Production Error Handling

from moltres.utils.exceptions import (
    ExecutionError,
    CompilationError,
    DatabaseConnectionError,
    QueryTimeoutError,
)

def safe_query(df):
    try:
        return df.collect()
    except QueryTimeoutError as e:
        logger.error(f"Query timeout: {e}")
        # Return empty result or cached result
        return []
    except ExecutionError as e:
        logger.error(f"Query failed: {e}")
        # Log and handle gracefully
        raise
    except DatabaseConnectionError as e:
        logger.error(f"Connection failed: {e}")
        # Retry or use fallback
        raise
    except Exception as e:
        logger.error(f"Unexpected error: {e}", exc_info=True)
        raise

Retry Logic

import time
from functools import wraps

def retry_on_connection_error(max_retries=3, delay=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except DatabaseConnectionError:
                    if attempt < max_retries - 1:
                        time.sleep(delay * (attempt + 1))
                        continue
                    raise
        return wrapper
    return decorator

@retry_on_connection_error(max_retries=3)
def get_users():
    return db.table("users").select().collect()

Logging

Configure Logging

import logging

# Configure Moltres logging
logging.getLogger("moltres").setLevel(logging.WARNING)

# Log SQL queries (for debugging)
logging.getLogger("moltres.engine").setLevel(logging.DEBUG)

Custom Logging

from moltres.engine import register_performance_hook
import logging

logger = logging.getLogger(__name__)

def log_slow_queries(sql: str, elapsed: float, metadata: dict):
    if elapsed > 1.0:
        logger.warning(
            f"Slow query ({elapsed:.2f}s): {sql[:200]}",
            extra={
                "query_time": elapsed,
                "rowcount": metadata.get("rowcount"),
            }
        )

register_performance_hook("query_end", log_slow_queries)

Monitoring

Health Checks

def health_check():
    """Check database connectivity."""
    try:
        db.execute("SELECT 1")
        return {"status": "healthy", "database": "connected"}
    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}

Metrics Collection

from moltres.engine import register_performance_hook

query_metrics = {
    "total_queries": 0,
    "total_time": 0.0,
    "slow_queries": 0,
}

def collect_metrics(sql: str, elapsed: float, metadata: dict):
    query_metrics["total_queries"] += 1
    query_metrics["total_time"] += elapsed
    if elapsed > 1.0:
        query_metrics["slow_queries"] += 1

register_performance_hook("query_end", collect_metrics)

# Expose metrics (e.g., for Prometheus)
def get_metrics():
    return {
        "queries_total": query_metrics["total_queries"],
        "queries_duration_seconds": query_metrics["total_time"],
        "slow_queries_total": query_metrics["slow_queries"],
    }

Security

Connection String Security

Never commit credentials:

# Bad: Hardcoded credentials
db = connect("postgresql://user:password@host/dbname")  # DON'T DO THIS

# Good: Environment variables
db = connect(os.environ["MOLTRES_DSN"])

# Good: Configuration file (not in version control)
import json
with open("config.json") as f:
    config = json.load(f)
db = connect(config["database"]["dsn"])

SQL Injection Prevention

Moltres prevents SQL injection by using parameterized queries:

# Good: Safe - uses parameterized queries
user_id = 123
df = db.table("users").select().where(col("id") == user_id)

# Bad: Vulnerable (but Moltres doesn't support this anyway)
# Don't use raw SQL with string formatting

Database Setup

Indexes

Create indexes before deployment:

# Create indexes for production
indexes = [
    "CREATE INDEX IF NOT EXISTS idx_users_email ON users(email)",
    "CREATE INDEX IF NOT EXISTS idx_orders_customer_id ON orders(customer_id)",
    "CREATE INDEX IF NOT EXISTS idx_orders_status ON orders(status)",
    "CREATE INDEX IF NOT EXISTS idx_orders_date ON orders(date)",
]

for index_sql in indexes:
    try:
        db.execute(index_sql)
    except Exception as e:
        logger.warning(f"Failed to create index: {e}")

Database Migrations

Use a migration tool (Alembic, Django migrations, etc.) for schema changes:

# Example: Run migrations before application starts
def run_migrations():
    import subprocess
    subprocess.run(["alembic", "upgrade", "head"])

Deployment Strategies

Blue-Green Deployment

  1. Deploy new version to green environment

  2. Test with production database (read-only)

  3. Switch traffic to green

  4. Keep blue as rollback option

Canary Deployment

  1. Deploy to small percentage of instances

  2. Monitor for errors

  3. Gradually increase percentage

  4. Full rollout if successful

Rolling Deployment

  1. Deploy to one instance at a time

  2. Verify health after each deployment

  3. Continue until all instances updated

Scaling

Horizontal Scaling

  • Run multiple application instances

  • Each instance has its own connection pool

  • Database handles connection limits

  • Use load balancer for distribution

Vertical Scaling

  • Increase database resources (CPU, memory)

  • Optimize queries and indexes

  • Increase connection pool size if needed

Database Read Replicas

# Primary for writes
write_db = connect("postgresql://user:pass@primary/dbname")

# Replica for reads
read_db = connect("postgresql://user:pass@replica/dbname")

# Use appropriate database
def get_users():
    return read_db.table("users").select().collect()

def create_user(data):
    write_db.table("users").insert(data)

Backup and Recovery

Database Backups

Ensure database backups are configured:

  • Automated daily backups

  • Point-in-time recovery

  • Test restore procedures

Application State

  • Log important operations

  • Use idempotent operations where possible

  • Implement retry logic for transient failures

Performance Tuning

Query Optimization

  1. Create indexes on frequently queried columns

  2. Filter early in query chains

  3. Use LIMIT for exploratory queries

  4. Monitor slow queries with performance hooks

See Performance Guide for details.

Connection Pool Tuning

Monitor connection pool metrics:

  • Active connections

  • Idle connections

  • Connection wait time

  • Connection errors

Adjust pool size based on metrics.

Troubleshooting

Common Issues

  1. Connection pool exhausted

    • Increase pool_size and max_overflow

    • Check for connection leaks

    • Reduce connection timeout

  2. Query timeouts

    • Optimize queries (add indexes)

    • Increase query_timeout if appropriate

    • Break queries into smaller chunks

  3. Database connection failures

    • Check network connectivity

    • Verify credentials

    • Check database server status

See Debugging Guide for more help.

Summary

  1. Use environment variables for configuration

  2. Configure connection pooling appropriately

  3. Implement error handling and retry logic

  4. Set up logging and monitoring

  5. Secure credentials (never commit)

  6. Create indexes before deployment

  7. Monitor performance and adjust

  8. Plan for scaling (horizontal/vertical)

  9. Test backups and recovery

  10. Document deployment procedures

For more information: