# Debugging Guide

This guide helps you debug and troubleshoot issues with Moltres queries.

## Viewing Generated SQL

The most important debugging tool is viewing the SQL that Moltres generates. Moltres provides several convenient methods for this:

### Method 1: show_sql() - Pretty-print SQL

```python
df = db.table("users").select().where(col("age") > 18)
df.show_sql()  # Pretty-prints formatted SQL with indentation
```

### Method 2: sql property - Get formatted SQL

```python
df = db.table("users").select().where(col("age") > 18)
print(df.sql)  # Returns formatted SQL string
```

### Method 3: to_sql(pretty=True) - Get formatted SQL

```python
df = db.table("users").select().where(col("age") > 18)
sql = df.to_sql(pretty=True)  # Returns formatted SQL
print(sql)
```

### Method 4: sql_preview() - Get SQL preview

```python
df = db.table("users").select().where(col("age") > 18)
preview = df.sql_preview(max_length=200)  # First 200 characters
print(preview)
```

### Why This Helps

- **Verify correctness:** Check if the SQL matches your intent
- **Performance:** See what the database will execute
- **Debug errors:** Understand why a query might fail
- **Optimization:** Identify opportunities for improvement

## Common Error Patterns

### 1. CompilationError

**What it means:** Moltres couldn't compile your operation to SQL.

**Common causes:**
- Unsupported operation for your SQL dialect
- Invalid column references
- Missing required parameters

**How to debug:**
```python
try:
    df = db.table("users").select().where(col("invalid_col") > 18)
    df.collect()
except CompilationError as e:
    print(f"Error: {e.message}")
    print(f"Suggestion: {e.suggestion}")
    print(f"Context: {e.context}")
    
    # Check the SQL that was attempted
    try:
        sql = df.to_sql()
        print(f"Generated SQL: {sql}")
    except:
        print("Could not generate SQL")
```

**Solutions:**
- Check column names exist in the table
- Verify the operation is supported for your dialect
- Review the error suggestion

### 2. ExecutionError

**What it means:** The SQL query failed to execute in the database.

**Common causes:**
- Table/column doesn't exist
- SQL syntax error
- Data type mismatch
- Constraint violation

**How to debug:**
```python
try:
    results = df.collect()
except ExecutionError as e:
    print(f"Error: {e.message}")
    print(f"Suggestion: {e.suggestion}")
    
    # Get the SQL that failed
    sql = df.to_sql()
    print(f"Failed SQL: {sql}")
    
    # Try executing directly in database to see detailed error
    try:
        db.execute(sql)
    except Exception as db_error:
        print(f"Database error: {db_error}")
```

**Solutions:**
- Verify table and column names
- Check data types match
- Test SQL directly in database client
- Review database error logs

### 3. QueryTimeoutError

**What it means:** The query exceeded the configured timeout.

**How to debug:**
```python
try:
    results = df.collect()
except QueryTimeoutError as e:
    print(f"Query timed out after {e.context.get('timeout_seconds')} seconds")
    
    # Check the slow query
    sql = df.to_sql()
    print(f"Slow SQL: {sql}")
    
    # Use EXPLAIN to understand the query plan
    explain_sql = f"EXPLAIN ANALYZE {sql}"
    plan = db.execute(explain_sql)
    print(f"Query plan: {plan}")
```

**Solutions:**
- Add indexes on filtered/joined columns
- Filter early to reduce data scanned
- Increase timeout if appropriate
- Break query into smaller chunks

### 4. DatabaseConnectionError

**What it means:** Failed to connect to the database.

**How to debug:**
```python
try:
    db = connect("postgresql://user:pass@host/dbname")
    db.table("users").select().collect()
except DatabaseConnectionError as e:
    print(f"Connection failed: {e.message}")
    print(f"Suggestion: {e.suggestion}")
    
    # Test connection string
    import sqlalchemy
    try:
        engine = sqlalchemy.create_engine("postgresql://user:pass@host/dbname")
        with engine.connect() as conn:
            print("Direct connection works")
    except Exception as conn_error:
        print(f"Direct connection failed: {conn_error}")
```

**Solutions:**
- Verify connection string is correct
- Check database server is running
- Verify network connectivity
- Check credentials and permissions

## Debugging Query Logic

### 1. Step-by-Step Execution

Build queries incrementally and test at each step:

```python
# Step 1: Basic select
df = db.table("users").select()
print("Step 1 SQL:", df.to_sql())
results = df.limit(5).collect()
print("Step 1 results:", results)

# Step 2: Add filter
df = df.where(col("active") == True)
print("Step 2 SQL:", df.to_sql())
results = df.limit(5).collect()
print("Step 2 results:", results)

# Step 3: Add join
df = df.join(orders, on=[col("df.id") == col("orders.user_id")])
print("Step 3 SQL:", df.to_sql())
results = df.limit(5).collect()
print("Step 3 results:", results)
```

### 2. Verify Intermediate Results

Use `limit()` and `show()` to inspect intermediate results:

```python
# Check what data exists
df = db.table("users").select()
print("Total rows:", len(df.collect()))
df.show(10)  # Show first 10 rows

# Check filtered results
df_filtered = df.where(col("age") > 18)
print("Filtered rows:", len(df_filtered.collect()))
df_filtered.show(10)
```

### 3. Compare Expected vs Actual

```python
# Expected: Users with age > 18
expected_count = 100

# Actual
df = db.table("users").select().where(col("age") > 18)
actual_count = len(df.collect())

print(f"Expected: {expected_count}, Actual: {actual_count}")

if actual_count != expected_count:
    # Debug the filter
    sql = df.to_sql()
    print(f"SQL: {sql}")
    
    # Check data
    all_users = db.table("users").select().collect()
    print(f"Total users: {len(all_users)}")
    print(f"Users with age > 18: {sum(1 for u in all_users if u.get('age', 0) > 18)}")
```

## Debugging Performance Issues

### 1. Use Query Plan Visualization

Moltres provides several methods to understand your query plan:

#### plan_summary() - Get structured plan statistics

```python
df = db.table("orders").select().where(col("status") == "active")
summary = df.plan_summary()
print(f"Operations: {summary['operations']}")
print(f"Table scans: {summary['table_scans']}")
print(f"Joins: {summary['joins']}")
print(f"Filters: {summary['filters']}")
print(f"Depth: {summary['depth']}")
```

#### visualize_plan() - ASCII tree visualization

```python
df = db.table("orders").select().where(col("status") == "active")
print(df.visualize_plan())
# Output: ASCII tree showing plan structure
```

#### explain() - Database execution plan

```python
df = db.table("orders").select().where(col("status") == "active")
plan = df.explain(analyze=True)  # Use analyze=True for actual execution stats
print(plan)
```

#### validate() - Check for common issues

```python
df = db.table("orders").select().where(col("status") == "active")
issues = df.validate()
for issue in issues:
    print(f"[{issue['type']}] {issue['message']}")
    if issue.get('suggestion'):
        print(f"  Suggestion: {issue['suggestion']}")
```

#### performance_hints() - Get optimization suggestions

```python
df = db.table("orders").select().where(col("status") == "active")
hints = df.performance_hints()
for hint in hints:
    print(f"- {hint}")
```

### 2. Monitor Query Execution

Use performance hooks to monitor queries:

```python
from moltres.engine import register_performance_hook

def log_query(sql: str, elapsed: float, metadata: dict):
    if elapsed > 0.1:  # Log queries taking more than 100ms
        print(f"Query ({elapsed:.3f}s): {sql[:200]}")
        print(f"  Rows: {metadata.get('rowcount', 'N/A')}")

register_performance_hook("query_end", log_query)
```

### 3. Profile Query Steps

Time individual operations:

```python
import time

start = time.perf_counter()
df = db.table("users").select()
print(f"Build query: {(time.perf_counter() - start) * 1000:.2f}ms")

start = time.perf_counter()
sql = df.to_sql()
print(f"Compile SQL: {(time.perf_counter() - start) * 1000:.2f}ms")

start = time.perf_counter()
results = df.collect()
print(f"Execute query: {(time.perf_counter() - start) * 1000:.2f}ms")
print(f"Results: {len(results)} rows")
```

## Debugging Data Issues

### 1. Check for NULL Values

```python
from moltres.expressions.functions import isnull, isnotnull

# Count NULL values
null_count = (
    db.table("users")
    .select()
    .where(isnull(col("email")))
    .count()
)
print(f"Users with NULL email: {null_count}")

# Check specific values
df = db.table("users").select().where(col("email").is_null())
df.show(10)
```

### 2. Verify Data Types

```python
# Check column types in database
if db.dialect.name == "postgresql":
    type_query = """
    SELECT column_name, data_type 
    FROM information_schema.columns 
    WHERE table_name = 'users'
    """
    types = db.execute(type_query)
    print("Column types:", types)
```

### 3. Inspect Data Ranges

```python
from moltres.expressions.functions import min, max, avg

# Check data ranges
stats = (
    db.table("orders")
    .select()
    .agg(
        min(col("amount")).alias("min_amount"),
        max(col("amount")).alias("max_amount"),
        avg(col("amount")).alias("avg_amount")
    )
    .collect()
)
print("Order amount stats:", stats)
```

## Debugging Join Issues

### 1. Verify Join Keys

```python
# Check join key values exist
left_ids = set(
    row["id"] 
    for row in db.table("customers").select("id").collect()
)
right_ids = set(
    row["customer_id"] 
    for row in db.table("orders").select("customer_id").collect()
)

print(f"Customers: {len(left_ids)}")
print(f"Orders: {len(right_ids)}")
print(f"Matching: {len(left_ids & right_ids)}")
print(f"Orphaned orders: {len(right_ids - left_ids)}")
```

### 2. Test Join Conditions

```python
# Test join with explicit condition
df = (
    db.table("customers")
    .select()
    .join(
        db.table("orders").select(),
        on=[col("customers.id") == col("orders.customer_id")],
        how="inner"
    )
)

# Check SQL
sql = df.to_sql()
print("Join SQL:", sql)

# Check results
results = df.limit(10).collect()
print("Join results:", results)
```

## Debugging Aggregation Issues

### 1. Verify Grouping Columns

```python
# Check distinct values in grouping column
categories = (
    db.table("orders")
    .select("category")
    .distinct()
    .collect()
)
print("Categories:", [r["category"] for r in categories])

# Compare with aggregation
agg_results = (
    db.table("orders")
    .select()
    .group_by("category")
    .agg(count("*").alias("count"))
    .collect()
)
print("Aggregation results:", agg_results)
```

### 2. Check for NULL in Grouping

```python
# Count NULL values in grouping column
null_count = (
    db.table("orders")
    .select()
    .where(isnull(col("category")))
    .count()
)
print(f"Orders with NULL category: {null_count}")
```

## Tools and Techniques

### 1. Enable SQL Logging

```python
# Enable SQL logging
db = connect(
    "postgresql://user:pass@host/dbname",
    echo=True  # Logs all SQL to console
)
```

### 2. Use Database Tools

- **PostgreSQL:** `psql`, pgAdmin, DBeaver
- **SQLite:** `sqlite3` command-line tool
- **MySQL:** `mysql` command-line tool, MySQL Workbench

Test queries directly in these tools to verify behavior.

### 3. Create Test Queries

```python
# Create a simple test query
def test_query():
    df = db.table("users").select("id", "name").limit(1)
    sql = df.to_sql()
    print(f"Test SQL: {sql}")
    result = df.collect()
    print(f"Test result: {result}")
    return result

# Run test
test_query()
```

## Schema Discovery

Moltres provides convenient methods to discover database schemas:

### Get table schema

```python
# Get schema for a specific table
schema = db.schema("users")
for col_def in schema:
    print(f"{col_def.name}: {col_def.type_name}")

# Get all tables with their schemas
tables = db.tables()
for table_name, columns in tables.items():
    print(f"{table_name}: {len(columns)} columns")

# Get columns from a table handle
handle = db.table("users")
columns = handle.columns()
print(columns)  # ['id', 'name', 'email', ...]
```

## Interactive Help

Moltres provides interactive help to discover available operations:

### help() - Display available operations

```python
df = db.table("users").select()
df.help()  # Displays all available operations and examples
```

### suggest_next() - Get suggestions for next operations

```python
df = db.table("users").select()
suggestions = df.suggest_next()
for suggestion in suggestions:
    print(suggestion)
```

## Summary

1. **Use `df.show_sql()` or `df.sql`** to see formatted SQL
2. **Use `df.plan_summary()` and `df.visualize_plan()`** to understand query structure
3. **Use `df.validate()`** to check for common issues
4. **Use `df.performance_hints()`** to get optimization suggestions
5. **Use `db.schema()` and `db.tables()`** to discover database structure
6. **Read error messages** and suggestions carefully (they now include query context)
7. **Test incrementally** - build queries step by step
8. **Use `limit()` and `show()`** to inspect results
9. **Use `df.explain()`** to understand database execution plans
10. **Monitor performance** with hooks
11. **Verify data** - check for NULLs, types, ranges
12. **Test in database tools** directly
13. **Enable SQL logging** for detailed debugging
14. **Create test queries** to isolate issues
15. **Use `df.help()` and `df.suggest_next()`** to discover available operations

For more help:
- [Performance Guide](./PERFORMANCE.md)
- [Best Practices](./BEST_PRACTICES.md)
- [FAQ](./FAQ.md)
- [Troubleshooting Guide](./TROUBLESHOOTING.md)