Operational Runbooks

This document provides operational procedures for running Moltres in production environments.

Deployment

Pre-Deployment Checklist

[ ] All tests pass (pytest)
[ ] Type checking passes (mypy src)
[ ] Linting passes (ruff check .)
[ ] Documentation is up to date
[ ] Changelog is updated
[ ] Version number is bumped
[ ] Dependencies are reviewed for security issues
[ ] Performance benchmarks are within acceptable ranges

Deployment Steps

Build Package
```
python -m build
```

Test Installation

pip install dist/moltres-*.whl
python -c "import moltres; print(moltres.__version__)"

Upload to PyPI
```
twine upload dist/*
```

Verify Release

pip install --upgrade moltres
python -c "import moltres; print(moltres.__version__)"

Create GitHub Release
- Tag the release: git tag vX.Y.Z
- Push tag: git push origin vX.Y.Z
- Create release on GitHub with changelog

Schema Migrations

Adding New Tables

Create Migration Script

from moltres import connect from moltres.table.schema import column

db = connect(os.getenv(“DATABASE_URL”))

Create new table

db.create_table( “new_table”, [ column(“id”, “INTEGER”, primary_key=True), column(“name”, “TEXT”), ], if_not_exists=True, ).collect()

2. **Test Migration**
- Test on staging environment first
- Verify table structure
- Test data insertion/retrieval

3. **Apply to Production**
- Run during maintenance window
- Monitor for errors
- Verify table creation

### Modifying Existing Tables

1. **Add Columns**
```python
# Use ALTER TABLE (database-specific)
db.sql("ALTER TABLE existing_table ADD COLUMN new_column TEXT").collect()

Drop Columns

Use ALTER TABLE (database-specific)

db.sql(“ALTER TABLE existing_table DROP COLUMN old_column”).collect()

3. **Rename Columns**
```python
# Use ALTER TABLE (database-specific)
db.sql("ALTER TABLE existing_table RENAME COLUMN old_name TO new_name").collect()

Migration Best Practices

Backup First: Always backup database before migrations
Test on Staging: Test all migrations on staging environment
Rollback Plan: Have rollback scripts ready
Monitor: Watch for errors during migration
Document: Document all schema changes

Incident Response

Database Connection Issues

Symptoms:

Connection timeouts
“Connection refused” errors
Pool exhaustion errors

Diagnosis:

from moltres.utils.health import check_connection_health, check_pool_health

# Check connection health
result = check_connection_health(db)
print(result)

# Check pool health
pool_result = check_pool_health(db)
print(pool_result)

Resolution:

Check database server status
Verify network connectivity
Check firewall rules
Review connection pool settings

Increase pool size if needed:

db = connect(
    dsn,
    pool_size=20,  # Increase from default
    max_overflow=40,
)

Query Performance Issues

Symptoms:

Slow query execution
Timeout errors
High CPU usage

Diagnosis:

# Enable query logging
from moltres import register_performance_hook

def log_slow_queries(sql: str, elapsed: float, metadata: dict):
    if elapsed > 1.0:
        print(f"Slow query ({elapsed:.2f}s): {sql[:200]}")

register_performance_hook("query_end", log_slow_queries)

Resolution:

Analyze slow queries with EXPLAIN
Add indexes on frequently queried columns
Optimize query patterns
Consider query caching
Scale database resources if needed

Memory Issues

Symptoms:

Out of memory errors
High memory usage
Process crashes

Diagnosis:

import psutil
import os

process = psutil.Process(os.getpid())
print(f"Memory usage: {process.memory_info().rss / 1024 / 1024} MB")

Resolution:

Use streaming for large datasets:

Use streaming mode

df = db.read.csv(“large_file.csv”, stream=True)

2. Process data in chunks
3. Close database connections when done
4. Monitor memory usage
5. Scale application resources

### Data Corruption

**Symptoms:**
- Unexpected query results
- Constraint violations
- Data inconsistencies

**Diagnosis:**
```python
# Verify data integrity
result = db.sql("SELECT COUNT(*) FROM table_name").collect()
print(f"Row count: {result[0]['count']}")

# Check for duplicates
result = db.sql("""
 SELECT column_name, COUNT(*) as count
 FROM table_name
 GROUP BY column_name
 HAVING COUNT(*) > 1
""").collect()

Resolution:

Identify corrupted data
Restore from backup if needed
Fix data inconsistencies
Add data validation
Implement data integrity checks

Monitoring

Key Metrics to Monitor

Query Performance
- Average query duration
- P95/P99 query latency
- Slow query count
Connection Pool
- Active connections
- Pool utilization
- Connection wait time
Error Rates
- Query error rate
- Connection error rate
- Timeout rate
Resource Usage
- Memory usage
- CPU usage
- Network I/O

Setting Up Monitoring

from moltres.utils.telemetry import get_metrics_collector, get_structured_logger

# Get metrics
metrics = get_metrics_collector()
print(metrics.get_metrics())

# Structured logging
logger = get_structured_logger()
# Logs are automatically structured JSON

Alerting Thresholds

Query Duration: Alert if P95 > 5 seconds
Error Rate: Alert if error rate > 1%
Connection Pool: Alert if utilization > 80%
Memory Usage: Alert if usage > 80% of available

Backup and Recovery

Backup Procedures

Database Backups

# PostgreSQL
pg_dump -h host -U user -d dbname > backup.sql

# MySQL
mysqldump -h host -u user -p dbname > backup.sql

# SQLite
sqlite3 database.db ".backup backup.db"

Application Data Backups
- Export critical data to files
- Store backups in secure location
- Encrypt sensitive backups

Recovery Procedures

Restore from Backup

# PostgreSQL
psql -h host -U user -d dbname < backup.sql

# MySQL
mysql -h host -u user -p dbname < backup.sql

# SQLite
cp backup.db database.db

Point-in-Time Recovery
- Use database transaction logs
- Restore to specific timestamp
- Verify data integrity

Scaling

Horizontal Scaling

Read Replicas
- Set up read replicas for read-heavy workloads
- Route read queries to replicas
- Keep writes on primary
Connection Pooling
- Increase pool size for more connections
- Use connection pooler (e.g., PgBouncer)

Vertical Scaling

Database Resources
- Increase CPU/memory for database server
- Add faster storage (SSD)
- Optimize database configuration
Application Resources
- Increase application server resources
- Use multiple application instances
- Load balance requests

Maintenance Windows

Scheduled Maintenance

Database Maintenance
- Run VACUUM/OPTIMIZE (database-specific)
- Update statistics
- Check for corruption
Application Maintenance
- Deploy updates
- Restart services
- Clear caches

Maintenance Checklist

[ ] Notify users of maintenance window
[ ] Backup database
[ ] Put application in maintenance mode
[ ] Perform maintenance tasks
[ ] Verify system health
[ ] Remove maintenance mode
[ ] Monitor for issues

Troubleshooting

Common Issues

“Connection pool exhausted”
- Increase pool size
- Check for connection leaks
- Use connection pooler
“Query timeout”
- Optimize slow queries
- Increase query timeout
- Add indexes
“Table does not exist”
- Verify table name
- Check schema/database
- Verify permissions
“Permission denied”
- Check database user permissions
- Verify table access
- Check firewall rules

Debug Mode

Enable debug logging:

import logging

logging.basicConfig(level=logging.DEBUG)
logging.getLogger("moltres").setLevel(logging.DEBUG)

Getting Help

Check documentation
Search GitHub issues
Review logs
Contact support (if available)