# Performance Service Level Agreements (SLAs)

This document defines performance expectations and SLAs for Moltres operations. These benchmarks are based on representative datasets and typical production workloads.

## Benchmark Methodology

- **Test Environment**: Standard development machine (8GB RAM, 4 CPU cores)
- **Database**: SQLite (in-memory) for baseline, PostgreSQL for production benchmarks
- **Dataset Sizes**: 
  - Small: 1K-10K rows
  - Medium: 10K-100K rows
  - Large: 100K-1M rows
  - Very Large: 1M+ rows (streaming mode)

## Query Performance SLAs

### Simple SELECT Operations

**Target**: < 100ms for 10K rows, < 1s for 100K rows

- **Small datasets (1K-10K rows)**: < 100ms
- **Medium datasets (10K-100K rows)**: < 500ms
- **Large datasets (100K-1M rows)**: < 2s
- **Very large datasets (1M+ rows)**: Streaming mode recommended, < 10s per chunk

**Notes**:
- Performance depends heavily on database engine and indexing
- SQLite (in-memory) provides baseline performance
- PostgreSQL/MySQL with proper indexing can achieve better results
- Network latency not included in these benchmarks

### Filter Operations

**Target**: < 150ms for 10K rows with indexed filter column

- **Indexed column filter**: < 150ms for 10K rows
- **Non-indexed column filter**: < 500ms for 10K rows
- **Complex predicates (AND/OR)**: < 300ms for 10K rows

**Recommendations**:
- Create indexes on frequently filtered columns
- Use parameterized queries for repeated filters
- Consider partitioning for very large tables

### Aggregation Operations

**Target**: < 200ms for 100K rows grouped by indexed column

- **Simple aggregations (SUM, AVG, COUNT)**: < 200ms for 100K rows
- **Group by indexed column**: < 200ms for 100K rows
- **Group by non-indexed column**: < 1s for 100K rows
- **Multiple aggregations**: < 500ms for 100K rows

**Recommendations**:
- Index GROUP BY columns for better performance
- Use approximate aggregations (e.g., `percentile_approx`) for very large datasets
- Consider materialized views for frequently aggregated data

### Join Operations

**Target**: < 500ms for 10K × 10K row joins with indexed join keys

- **Inner join (indexed keys)**: < 500ms for 10K × 10K rows
- **Left/Right join (indexed keys)**: < 600ms for 10K × 10K rows
- **Join without indexes**: < 2s for 10K × 10K rows
- **Multiple joins**: < 1s per additional join

**Recommendations**:
- Always index foreign key columns
- Use appropriate join types (INNER vs LEFT) based on data requirements
- Consider denormalization for frequently joined tables

### Window Functions

**Target**: < 1s for 100K rows with simple window

- **ROW_NUMBER, RANK**: < 1s for 100K rows
- **LAG, LEAD**: < 1s for 100K rows
- **Complex windows (PARTITION BY + ORDER BY)**: < 2s for 100K rows

**Notes**:
- Window function performance varies significantly by database engine
- PostgreSQL generally provides better window function performance than SQLite
- Consider materialized views for expensive window calculations

## Write Performance SLAs

### INSERT Operations

**Target**: > 1K rows/second for batch inserts

- **Single row inserts**: Not recommended for production (use batch)
- **Batch inserts (100 rows)**: < 100ms
- **Batch inserts (1K rows)**: < 500ms
- **Batch inserts (10K rows)**: < 2s

**Recommendations**:
- Always use batch inserts for multiple rows
- Use `createDataFrame()` for bulk data loading
- Consider `COPY` or bulk insert for very large datasets (database-specific)

### UPDATE Operations

**Target**: < 500ms for 10K rows with indexed WHERE clause

- **Single row update (indexed WHERE)**: < 10ms
- **Bulk update (10K rows, indexed WHERE)**: < 500ms
- **Bulk update (100K rows, indexed WHERE)**: < 5s

**Recommendations**:
- Index columns used in WHERE clauses
- Use transactions for multiple updates
- Consider batch updates for very large datasets

### DELETE Operations

**Target**: < 500ms for 10K rows with indexed WHERE clause

- **Single row delete (indexed WHERE)**: < 10ms
- **Bulk delete (10K rows, indexed WHERE)**: < 500ms
- **Bulk delete (100K rows, indexed WHERE)**: < 5s

**Recommendations**:
- Index columns used in WHERE clauses
- Use transactions for multiple deletes
- Consider soft deletes (UPDATE flag) instead of hard deletes for audit trails

## File I/O Performance SLAs

### CSV Reading

**Target**: > 10K rows/second for streaming reads

- **Small files (< 10MB)**: < 1s
- **Medium files (10-100MB)**: < 10s (streaming)
- **Large files (100MB-1GB)**: < 2 minutes (streaming)
- **Very large files (1GB+)**: Streaming mode required, < 5 minutes per GB

**Recommendations**:
- Use streaming mode (`stream=True`) for files > 10MB
- Compress files (gzip) for better I/O performance
- Consider Parquet format for better compression and performance

### JSON Reading

**Target**: > 5K rows/second for streaming reads

- **Small files (< 10MB)**: < 2s
- **Medium files (10-100MB)**: < 20s (streaming)
- **Large files (100MB-1GB)**: < 4 minutes (streaming)

**Recommendations**:
- Use JSONL format (one object per line) for better streaming performance
- Use streaming mode for files > 10MB
- Consider schema inference optimization for large files

### Parquet Reading

**Target**: > 50K rows/second

- **Small files (< 10MB)**: < 500ms
- **Medium files (10-100MB)**: < 5s
- **Large files (100MB-1GB)**: < 30s

**Recommendations**:
- Parquet provides best performance for analytical workloads
- Use column pruning to read only needed columns
- Consider partitioning for very large datasets

### CSV Writing

**Target**: > 10K rows/second

- **Small datasets (< 10K rows)**: < 1s
- **Medium datasets (10K-100K rows)**: < 10s
- **Large datasets (100K-1M rows)**: < 2 minutes

### JSON Writing

**Target**: > 5K rows/second

- **Small datasets (< 10K rows)**: < 2s
- **Medium datasets (10K-100K rows)**: < 20s
- **Large datasets (100K-1M rows)**: < 4 minutes

### Parquet Writing

**Target**: > 20K rows/second

- **Small datasets (< 10K rows)**: < 500ms
- **Medium datasets (10K-100K rows)**: < 5s
- **Large datasets (100K-1M rows)**: < 30s

## Connection Pool Performance

### Connection Acquisition

**Target**: < 10ms from pool

- **Connection from pool**: < 10ms
- **New connection (if pool exhausted)**: < 100ms
- **Connection timeout**: 30s default

### Pool Sizing Recommendations

- **Small applications**: `pool_size=5, max_overflow=10`
- **Medium applications**: `pool_size=10, max_overflow=20`
- **Large applications**: `pool_size=20, max_overflow=40`

**Formula**: `pool_size = (expected_concurrent_requests / avg_query_time) * 1.2`

## Memory Usage SLAs

### Query Execution

**Target**: Constant memory usage regardless of dataset size (with streaming)

- **Streaming queries**: O(1) memory (constant)
- **Non-streaming queries**: O(n) memory (proportional to result size)
- **Recommended**: Always use streaming for datasets > 100K rows

### File Reading

**Target**: Constant memory usage with streaming

- **Streaming mode**: O(chunk_size) memory (default: 10K rows)
- **Non-streaming mode**: O(file_size) memory (not recommended for large files)

## Performance Monitoring

### Metrics to Track

1. **Query Latency**: P50, P95, P99 percentiles
2. **Throughput**: Queries per second
3. **Error Rate**: Failed queries per total queries
4. **Connection Pool Usage**: Active connections, pool size, wait time
5. **Memory Usage**: Peak memory per operation
6. **File I/O**: Read/write throughput

### Performance Hooks

Moltres provides performance monitoring hooks:

```python
from moltres import register_performance_hook

def my_hook(event: str, sql: str, duration: float, metadata: dict):
    if duration > 1.0:  # Log slow queries
        print(f"Slow query ({duration:.2f}s): {sql}")

register_performance_hook(my_hook)
```

## Performance Tuning Tips

1. **Indexing**: Create indexes on frequently filtered, joined, or grouped columns
2. **Connection Pooling**: Configure appropriate pool sizes for your workload
3. **Streaming**: Always use streaming for large datasets
4. **Batch Operations**: Use batch inserts/updates instead of single-row operations
5. **Query Optimization**: Use EXPLAIN to analyze query plans
6. **Database-Specific Optimizations**: Leverage database-specific features (e.g., PostgreSQL JSONB, MySQL JSON functions)

## Regression Testing

Performance benchmarks are run as part of CI to detect regressions:

- **Baseline**: SQLite in-memory database
- **Frequency**: On every commit to main branch
- **Threshold**: 20% performance degradation triggers alert

## Notes

- All benchmarks assume optimal conditions (indexed columns, sufficient memory, no network latency)
- Real-world performance may vary based on:
  - Database engine and version
  - Hardware specifications
  - Network latency (for remote databases)
  - Concurrent load
  - Data distribution and cardinality
- These SLAs are targets, not guarantees. Actual performance depends on many factors outside Moltres's control.