Deployment Guide

This guide covers best practices for deploying Moltres in production environments.

Pre-Deployment Checklist

[ ] Database connection strings configured securely
[ ] Connection pooling configured appropriately
[ ] Query timeouts set
[ ] Error handling implemented
[ ] Logging configured
[ ] Performance monitoring enabled
[ ] Indexes created on frequently queried columns
[ ] Database backups configured
[ ] Environment variables set
[ ] Dependencies pinned

Configuration

Environment Variables

Use environment variables for configuration (12-factor app principle):

# Database connection
export MOLTRES_DSN="postgresql://user:pass@host/dbname"

# Connection pooling
export MOLTRES_POOL_SIZE=10
export MOLTRES_MAX_OVERFLOW=5
export MOLTRES_POOL_TIMEOUT=30
export MOLTRES_POOL_RECYCLE=3600
export MOLTRES_POOL_PRE_PING=true

# Query settings
export MOLTRES_QUERY_TIMEOUT=30.0

# Logging
export MOLTRES_ECHO=false  # Set to true for debugging only

Application Configuration

from moltres import connect
import os

# Load from environment
db = connect(
    dsn=os.environ.get("MOLTRES_DSN"),
    pool_size=int(os.environ.get("MOLTRES_POOL_SIZE", "10")),
    max_overflow=int(os.environ.get("MOLTRES_MAX_OVERFLOW", "5")),
    query_timeout=float(os.environ.get("MOLTRES_QUERY_TIMEOUT", "30.0")),
    pool_pre_ping=True,
    echo=False,  # Disable in production
)

Connection Management

Connection Pooling

Configure connection pooling based on your workload:

# For web applications (many concurrent requests)
db = connect(
    "postgresql://user:pass@host/dbname",
    pool_size=20,        # Larger pool for concurrency
    max_overflow=10,     # Allow overflow
    pool_timeout=30,     # Wait for connection
    pool_recycle=3600,   # Recycle after 1 hour
    pool_pre_ping=True,  # Verify connections
)

# For batch processing (fewer concurrent requests)
db = connect(
    "postgresql://user:pass@host/dbname",
    pool_size=5,         # Smaller pool
    max_overflow=2,      # Minimal overflow
    pool_timeout=60,     # Longer timeout
    pool_recycle=7200,   # Recycle after 2 hours
    pool_pre_ping=True,
)

Connection Lifecycle

# Application startup
db = connect("postgresql://user:pass@host/dbname")

# Application shutdown
db.close()  # Close all connections

Error Handling

Production Error Handling

from moltres.utils.exceptions import (
    ExecutionError,
    CompilationError,
    DatabaseConnectionError,
    QueryTimeoutError,
)

def safe_query(df):
    try:
        return df.collect()
    except QueryTimeoutError as e:
        logger.error(f"Query timeout: {e}")
        # Return empty result or cached result
        return []
    except ExecutionError as e:
        logger.error(f"Query failed: {e}")
        # Log and handle gracefully
        raise
    except DatabaseConnectionError as e:
        logger.error(f"Connection failed: {e}")
        # Retry or use fallback
        raise
    except Exception as e:
        logger.error(f"Unexpected error: {e}", exc_info=True)
        raise

Retry Logic

import time
from functools import wraps

def retry_on_connection_error(max_retries=3, delay=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except DatabaseConnectionError:
                    if attempt < max_retries - 1:
                        time.sleep(delay * (attempt + 1))
                        continue
                    raise
        return wrapper
    return decorator

@retry_on_connection_error(max_retries=3)
def get_users():
    return db.table("users").select().collect()

Logging

Configure Logging

import logging

# Configure Moltres logging
logging.getLogger("moltres").setLevel(logging.WARNING)

# Log SQL queries (for debugging)
logging.getLogger("moltres.engine").setLevel(logging.DEBUG)

Custom Logging

from moltres.engine import register_performance_hook
import logging

logger = logging.getLogger(__name__)

def log_slow_queries(sql: str, elapsed: float, metadata: dict):
    if elapsed > 1.0:
        logger.warning(
            f"Slow query ({elapsed:.2f}s): {sql[:200]}",
            extra={
                "query_time": elapsed,
                "rowcount": metadata.get("rowcount"),
            }
        )

register_performance_hook("query_end", log_slow_queries)

Monitoring

Health Checks

def health_check():
    """Check database connectivity."""
    try:
        db.execute("SELECT 1")
        return {"status": "healthy", "database": "connected"}
    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}

Metrics Collection

from moltres.engine import register_performance_hook

query_metrics = {
    "total_queries": 0,
    "total_time": 0.0,
    "slow_queries": 0,
}

def collect_metrics(sql: str, elapsed: float, metadata: dict):
    query_metrics["total_queries"] += 1
    query_metrics["total_time"] += elapsed
    if elapsed > 1.0:
        query_metrics["slow_queries"] += 1

register_performance_hook("query_end", collect_metrics)

# Expose metrics (e.g., for Prometheus)
def get_metrics():
    return {
        "queries_total": query_metrics["total_queries"],
        "queries_duration_seconds": query_metrics["total_time"],
        "slow_queries_total": query_metrics["slow_queries"],
    }

Security

Connection String Security

Never commit credentials:

# Bad: Hardcoded credentials
db = connect("postgresql://user:password@host/dbname")  # DON'T DO THIS

# Good: Environment variables
db = connect(os.environ["MOLTRES_DSN"])

# Good: Configuration file (not in version control)
import json
with open("config.json") as f:
    config = json.load(f)
db = connect(config["database"]["dsn"])

SQL Injection Prevention

Moltres prevents SQL injection by using parameterized queries:

# Good: Safe - uses parameterized queries
user_id = 123
df = db.table("users").select().where(col("id") == user_id)

# Bad: Vulnerable (but Moltres doesn't support this anyway)
# Don't use raw SQL with string formatting

Database Setup

Indexes

Create indexes before deployment:

# Create indexes for production
indexes = [
    "CREATE INDEX IF NOT EXISTS idx_users_email ON users(email)",
    "CREATE INDEX IF NOT EXISTS idx_orders_customer_id ON orders(customer_id)",
    "CREATE INDEX IF NOT EXISTS idx_orders_status ON orders(status)",
    "CREATE INDEX IF NOT EXISTS idx_orders_date ON orders(date)",
]

for index_sql in indexes:
    try:
        db.execute(index_sql)
    except Exception as e:
        logger.warning(f"Failed to create index: {e}")

Database Migrations

Use a migration tool (Alembic, Django migrations, etc.) for schema changes:

# Example: Run migrations before application starts
def run_migrations():
    import subprocess
    subprocess.run(["alembic", "upgrade", "head"])

Deployment Strategies

Blue-Green Deployment

Deploy new version to green environment
Test with production database (read-only)
Switch traffic to green
Keep blue as rollback option

Canary Deployment

Deploy to small percentage of instances
Monitor for errors
Gradually increase percentage
Full rollout if successful

Rolling Deployment

Deploy to one instance at a time
Verify health after each deployment
Continue until all instances updated

Scaling

Horizontal Scaling

Run multiple application instances
Each instance has its own connection pool
Database handles connection limits
Use load balancer for distribution

Vertical Scaling

Increase database resources (CPU, memory)
Optimize queries and indexes
Increase connection pool size if needed

Database Read Replicas

# Primary for writes
write_db = connect("postgresql://user:pass@primary/dbname")

# Replica for reads
read_db = connect("postgresql://user:pass@replica/dbname")

# Use appropriate database
def get_users():
    return read_db.table("users").select().collect()

def create_user(data):
    write_db.table("users").insert(data)

Backup and Recovery

Database Backups

Ensure database backups are configured:

Automated daily backups
Point-in-time recovery
Test restore procedures

Application State

Log important operations
Use idempotent operations where possible
Implement retry logic for transient failures

Performance Tuning

Query Optimization

Create indexes on frequently queried columns
Filter early in query chains
Use LIMIT for exploratory queries
Monitor slow queries with performance hooks

See Performance Guide for details.

Connection Pool Tuning

Monitor connection pool metrics:

Active connections
Idle connections
Connection wait time
Connection errors

Adjust pool size based on metrics.

Troubleshooting

Common Issues

Connection pool exhausted
- Increase pool_size and max_overflow
- Check for connection leaks
- Reduce connection timeout
Query timeouts
- Optimize queries (add indexes)
- Increase query_timeout if appropriate
- Break queries into smaller chunks
Database connection failures
- Check network connectivity
- Verify credentials
- Check database server status

See Debugging Guide for more help.

Summary

Use environment variables for configuration
Configure connection pooling appropriately
Implement error handling and retry logic
Set up logging and monitoring
Secure credentials (never commit)
Create indexes before deployment
Monitor performance and adjust
Plan for scaling (horizontal/vertical)
Test backups and recovery
Document deployment procedures

For more information: