# Troubleshooting Guide Common issues and solutions when using Moltres. ## Connection Issues ### "Failed to execute query" or Connection Errors **Problem**: Cannot connect to database or queries fail immediately. **Solutions**: 1. **Check connection string format**: ```python # ✅ Correct formats db = connect("sqlite:///path/to/db.db") db = connect("postgresql://user:pass@host:5432/dbname") db = connect("mysql://user:pass@host:3306/dbname") ``` 2. **Verify database is accessible**: ```python import sqlalchemy engine = sqlalchemy.create_engine("your_connection_string") with engine.connect() as conn: print("Connection successful!") ``` 3. **Check network/firewall settings** for remote databases 4. **Verify credentials** are correct 5. **Enable connection pooling** for better reliability: ```python db = connect( "postgresql://...", pool_size=5, max_overflow=10, pool_pre_ping=True # Verify connections before use ) ``` ### "Cannot collect a plan without an attached Database" **Problem**: Trying to execute a DataFrame that isn't bound to a database. **Solution**: Ensure the DataFrame is created from a database table or has a database attached: ```python # ✅ Correct db = connect("sqlite:///example.db") df = db.table("users").select() results = df.collect() # ❌ Incorrect from moltres.dataframe.dataframe import DataFrame df = DataFrame(...) # No database attached results = df.collect() # Will fail ``` ## Query Issues ### "Unsupported logical plan node" or Compilation Errors **Problem**: Query cannot be compiled to SQL. **Solutions**: 1. **Check that all operations are supported**: - Basic operations: select, where, join, group_by, order_by, limit - Aggregations: sum, avg, count, min, max - Window functions: Limited support 2. **Verify column expressions**: ```python # ✅ Correct df.where(col("age") > 18) # ❌ Incorrect df.where("age > 18") # Must use Column expressions ``` 3. **Check join syntax**: ```python # ✅ PySpark-style (recommended) df1.join(df2, on=[col("left_col") == col("right_col")]) # ✅ Tuple syntax (backward compatible) df1.join(df2, on=[("left_col", "right_col")]) # ✅ Same column name (simplest) df1.join(df2, on="column") ``` ### "Join requires either equality keys or an explicit condition" **Problem**: Join operation is missing required parameters. **Solution**: Provide either `on` parameter or `condition`: ```python # ✅ Option 1: PySpark-style equality join df1.join(df2, on=[col("id") == col("id")]) # ✅ Option 2: Tuple syntax (backward compatible) df1.join(df2, on=[("id", "id")]) # ✅ Option 3: Custom condition (for complex joins) from moltres import col df1.join(df2, condition=col("df1.id") == col("df2.user_id")) ``` ### Empty Results or Unexpected Data **Problem**: Query returns no results or wrong data. **Solutions**: 1. **Check filter conditions**: ```python # Verify the condition print(df.to_sql()) # See the generated SQL ``` 2. **Verify table has data**: ```python count = len(db.table("users").select().collect()) print(f"Table has {count} rows") ``` 3. **Check data types**: ```python # String comparison df.where(col("status") == "active") # Not col("status") == active (missing quotes) ``` ## File Reading Issues ### "File not found" Errors **Problem**: Cannot read CSV/JSON/Parquet files. **Solutions**: 1. **Use absolute paths**: ```python from pathlib import Path file_path = Path("data.csv").resolve() records = db.load.csv(str(file_path)) ``` 2. **Check file permissions** 3. **Verify file exists**: ```python import os if not os.path.exists("data.csv"): print("File not found!") ``` ### Schema Inference Issues **Problem**: Data types are inferred incorrectly. **Solutions**: 1. **Provide explicit schema**: ```python from moltres.table.schema import ColumnDef schema = [ ColumnDef(name="id", type_name="INTEGER"), ColumnDef(name="name", type_name="TEXT"), ColumnDef(name="price", type_name="REAL"), ] records = db.load.schema(schema).csv("data.csv") ``` 2. **Disable schema inference**: ```python records = db.load.option("inferSchema", False).csv("data.csv") ``` ## Performance Issues ### Slow Queries **Problem**: Queries take too long to execute. **Solutions**: 1. **Use streaming for large datasets**: ```python records = db.load.stream().csv("large_file.csv") for row in records: process(row) ``` 2. **Add indexes** to database tables (at database level) 3. **Use batch inserts** (already implemented automatically): ```python # Automatically uses batch inserts table.insert([row1, row2, ..., row1000]) ``` 4. **Limit results** when possible: ```python df.limit(100).collect() ``` 5. **Check connection pooling**: ```python db = connect( "postgresql://...", pool_size=10, max_overflow=20 ) ``` ### Memory Issues **Problem**: Out of memory errors with large datasets. **Solutions**: 1. **Use streaming mode**: ```python records = db.load.stream().option("chunk_size", 10000).csv("large.csv") for row in records: process(row) ``` 2. **Process in batches**: ```python # Process 1000 rows at a time for i in range(0, total_rows, 1000): batch = df.limit(1000).offset(i).collect() process_batch(batch) ``` ## Type and Format Issues ### "Unknown fetch format" Error **Problem**: Requested format (pandas/polars) not available. **Solutions**: 1. **Install required dependencies**: ```bash pip install moltres[pandas] # For pandas pip install moltres[polars] # For polars ``` 2. **Use records format** (default, no dependencies needed): ```python db = connect("sqlite:///example.db") # Default: fetch_format="records" ``` ### Type Errors with Mypy **Problem**: Type checker complains about types. **Solutions**: 1. **Use type hints properly**: ```python from typing import List, Dict, Any results: List[Dict[str, Any]] = df.collect() ``` 2. **Cast when necessary**: ```python from typing import cast pandas_df = cast(pd.DataFrame, df.collect()) ``` ## Validation Errors ### "SQL identifier cannot be empty" or Invalid Identifier Errors **Problem**: Table or column name validation fails. **Solutions**: 1. **Check identifier names**: ```python # ✅ Valid db.table("users") db.table("user_profiles") # ❌ Invalid db.table("") # Empty db.table("users; DROP") # Contains invalid characters ``` 2. **Validate user input** before using as identifiers: ```python import re table_name = get_user_input() if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', table_name): raise ValueError("Invalid table name") db.table(table_name) ``` ### "Row does not match expected columns" **Problem**: Inserted rows have inconsistent schemas. **Solutions**: 1. **Ensure all rows have same columns**: ```python # ✅ Correct table.insert([ {"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, ]) # ❌ Incorrect table.insert([ {"id": 1, "name": "Alice"}, {"id": 2}, # Missing "name" ]) ``` 2. **Check column names match table schema** ## Getting Help If you're still experiencing issues: 1. **Check the generated SQL**: ```python print(df.to_sql()) ``` 2. **Enable logging**: ```python import logging logging.basicConfig(level=logging.DEBUG) db = connect("sqlite:///example.db", echo=True) ``` 3. **Check GitHub Issues**: https://github.com/eddiethedean/moltres/issues 4. **Create a minimal reproduction**: - Small code sample - Sample data - Expected vs actual behavior - Error messages 5. **Check documentation**: See README.md and docs/ directory