Troubleshooting Guide

Common issues and solutions when using Moltres.

Connection Issues

“Failed to execute query” or Connection Errors

Problem: Cannot connect to database or queries fail immediately.

Solutions:

Check connection string format:

# ✅ Correct formats
db = connect("sqlite:///path/to/db.db")
db = connect("postgresql://user:pass@host:5432/dbname")
db = connect("mysql://user:pass@host:3306/dbname")

Verify database is accessible:

import sqlalchemy
engine = sqlalchemy.create_engine("your_connection_string")
with engine.connect() as conn:
    print("Connection successful!")

Check network/firewall settings for remote databases
Verify credentials are correct
Enable connection pooling for better reliability:

db = connect(
    "postgresql://...",
    pool_size=5,
    max_overflow=10,
    pool_pre_ping=True  # Verify connections before use
)

“Cannot collect a plan without an attached Database”

Problem: Trying to execute a DataFrame that isn’t bound to a database.

Solution: Ensure the DataFrame is created from a database table or has a database attached:

# ✅ Correct
db = connect("sqlite:///example.db")
df = db.table("users").select()
results = df.collect()

# ❌ Incorrect
from moltres.dataframe.dataframe import DataFrame
df = DataFrame(...)  # No database attached
results = df.collect()  # Will fail

Query Issues

“Unsupported logical plan node” or Compilation Errors

Problem: Query cannot be compiled to SQL.

Solutions:

Check that all operations are supported:
- Basic operations: select, where, join, group_by, order_by, limit
- Aggregations: sum, avg, count, min, max
- Window functions: Limited support
Verify column expressions:

# ✅ Correct
df.where(col("age") > 18)

# ❌ Incorrect
df.where("age > 18")  # Must use Column expressions

Check join syntax:

# ✅ PySpark-style (recommended)
df1.join(df2, on=[col("left_col") == col("right_col")])

# ✅ Tuple syntax (backward compatible)
df1.join(df2, on=[("left_col", "right_col")])

# ✅ Same column name (simplest)
df1.join(df2, on="column")

“Join requires either equality keys or an explicit condition”

Problem: Join operation is missing required parameters.

Solution: Provide either on parameter or condition:

# ✅ Option 1: PySpark-style equality join
df1.join(df2, on=[col("id") == col("id")])

# ✅ Option 2: Tuple syntax (backward compatible)
df1.join(df2, on=[("id", "id")])

# ✅ Option 3: Custom condition (for complex joins)
from moltres import col
df1.join(df2, condition=col("df1.id") == col("df2.user_id"))

Empty Results or Unexpected Data

Problem: Query returns no results or wrong data.

Solutions:

Check filter conditions:

# Verify the condition
print(df.to_sql())  # See the generated SQL

Verify table has data:

count = len(db.table("users").select().collect())
print(f"Table has {count} rows")

Check data types:

# String comparison
df.where(col("status") == "active")  # Not col("status") == active (missing quotes)

File Reading Issues

“File not found” Errors

Problem: Cannot read CSV/JSON/Parquet files.

Solutions:

Use absolute paths:

from pathlib import Path
file_path = Path("data.csv").resolve()
records = db.load.csv(str(file_path))

Check file permissions
Verify file exists:

import os
if not os.path.exists("data.csv"):
    print("File not found!")

Schema Inference Issues

Problem: Data types are inferred incorrectly.

Solutions:

Provide explicit schema:

from moltres.table.schema import ColumnDef

schema = [
    ColumnDef(name="id", type_name="INTEGER"),
    ColumnDef(name="name", type_name="TEXT"),
    ColumnDef(name="price", type_name="REAL"),
]
records = db.load.schema(schema).csv("data.csv")

Disable schema inference:

records = db.load.option("inferSchema", False).csv("data.csv")

Performance Issues

Slow Queries

Problem: Queries take too long to execute.

Solutions:

Use streaming for large datasets:

records = db.load.stream().csv("large_file.csv")
for row in records:
    process(row)

Add indexes to database tables (at database level)
Use batch inserts (already implemented automatically):

# Automatically uses batch inserts
table.insert([row1, row2, ..., row1000])

Limit results when possible:

df.limit(100).collect()

Check connection pooling:

db = connect(
    "postgresql://...",
    pool_size=10,
    max_overflow=20
)

Memory Issues

Problem: Out of memory errors with large datasets.

Solutions:

Use streaming mode:

records = db.load.stream().option("chunk_size", 10000).csv("large.csv")
for row in records:
    process(row)

Process in batches:

# Process 1000 rows at a time
for i in range(0, total_rows, 1000):
    batch = df.limit(1000).offset(i).collect()
    process_batch(batch)

Type and Format Issues

“Unknown fetch format” Error

Problem: Requested format (pandas/polars) not available.

Solutions:

Install required dependencies:

pip install moltres[pandas]  # For pandas
pip install moltres[polars]  # For polars

Use records format (default, no dependencies needed):

db = connect("sqlite:///example.db")  # Default: fetch_format="records"

Type Errors with Mypy

Problem: Type checker complains about types.

Solutions:

Use type hints properly:

from typing import List, Dict, Any

results: List[Dict[str, Any]] = df.collect()

Cast when necessary:

from typing import cast
pandas_df = cast(pd.DataFrame, df.collect())

Validation Errors

“SQL identifier cannot be empty” or Invalid Identifier Errors

Problem: Table or column name validation fails.

Solutions:

Check identifier names:

# ✅ Valid
db.table("users")
db.table("user_profiles")

# ❌ Invalid
db.table("")  # Empty
db.table("users; DROP")  # Contains invalid characters

Validate user input before using as identifiers:

import re

table_name = get_user_input()
if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', table_name):
    raise ValueError("Invalid table name")
db.table(table_name)

“Row does not match expected columns”

Problem: Inserted rows have inconsistent schemas.

Solutions:

Ensure all rows have same columns:

# ✅ Correct
table.insert([
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
])

# ❌ Incorrect
table.insert([
    {"id": 1, "name": "Alice"},
    {"id": 2},  # Missing "name"
])

Check column names match table schema

Getting Help

If you’re still experiencing issues:

Check the generated SQL:

print(df.to_sql())

Enable logging:

import logging
logging.basicConfig(level=logging.DEBUG)
db = connect("sqlite:///example.db", echo=True)

Check GitHub Issues: https://github.com/eddiethedean/moltres/issues
Create a minimal reproduction:
- Small code sample
- Sample data
- Expected vs actual behavior
- Error messages
Check documentation: See README.md and docs/ directory