Troubleshooting Guide
Common issues and solutions when using Moltres.
Connection Issues
“Failed to execute query” or Connection Errors
Problem: Cannot connect to database or queries fail immediately.
Solutions:
Check connection string format:
# ✅ Correct formats
db = connect("sqlite:///path/to/db.db")
db = connect("postgresql://user:pass@host:5432/dbname")
db = connect("mysql://user:pass@host:3306/dbname")
Verify database is accessible:
import sqlalchemy
engine = sqlalchemy.create_engine("your_connection_string")
with engine.connect() as conn:
    print("Connection successful!")
Check network/firewall settings for remote databases
Verify credentials are correct
Enable connection pooling for better reliability:
db = connect(
    "postgresql://...",
    pool_size=5,
    max_overflow=10,
    pool_pre_ping=True,  # Verify connections before use
)
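When the connection string itself is the suspect, it can help to split it into its parts and inspect each one before passing it to connect. The following is a minimal sketch using only the Python standard library. The helper name describe_connection_url is hypothetical and not part of Moltres, and it only checks the general URL shape, not whether the database is reachable.

```python
from urllib.parse import urlsplit


def describe_connection_url(url: str) -> dict:
    """Split a database URL into its parts so typos are easy to spot.

    Hypothetical helper for illustration; not part of Moltres.
    """
    parts = urlsplit(url)
    if not parts.scheme:
        raise ValueError(f"Missing scheme (e.g. 'postgresql://') in {url!r}")
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        # Report only whether a password is present, never its value.
        "has_password": parts.password is not None,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }


info = describe_connection_url("postgresql://user:pass@host:5432/dbname")
print(info)
```

A missing host, an unexpected port, or an empty database name in the printed dictionary usually points at the part of the string that needs fixing.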
“Cannot collect a plan without an attached Database”
Problem: Trying to execute a DataFrame that isn’t bound to a database.
Solution: Ensure the DataFrame is created from a database table or has a database attached:
# ✅ Correct
db = connect("sqlite:///example.db")
df = db.table("users").select()
results = df.collect()
# ❌ Incorrect
from moltres.dataframe.dataframe import DataFrame
df = DataFrame(...) # No database attached
results = df.collect() # Will fail
Query Issues
“Unsupported logical plan node” or Compilation Errors
Problem: Query cannot be compiled to SQL.
Solutions:
Check that all operations are supported:
Basic operations: select, where, join, group_by, order_by, limit
Aggregations: sum, avg, count, min, max
Window functions: Limited support
Verify column expressions:
# ✅ Correct
df.where(col("age") > 18)
# ❌ Incorrect
df.where("age > 18") # Must use Column expressions
Check join syntax:
# ✅ PySpark-style (recommended)
df1.join(df2, on=[col("left_col") == col("right_col")])
# ✅ Tuple syntax (backward compatible)
df1.join(df2, on=[("left_col", "right_col")])
# ✅ Same column name (simplest)
df1.join(df2, on="column")
“Join requires either equality keys or an explicit condition”
Problem: Join operation is missing required parameters.
Solution: Provide either the on parameter or an explicit condition:
from moltres import col
# ✅ Option 1: PySpark-style equality join
df1.join(df2, on=[col("id") == col("id")])
# ✅ Option 2: Tuple syntax (backward compatible)
df1.join(df2, on=[("id", "id")])
# ✅ Option 3: Custom condition (for complex joins)
df1.join(df2, condition=col("df1.id") == col("df2.user_id"))
Empty Results or Unexpected Data
Problem: Query returns no results or wrong data.
Solutions:
Check filter conditions:
# Verify the condition
print(df.to_sql()) # See the generated SQL
Verify table has data:
count = len(db.table("users").select().collect())
print(f"Table has {count} rows")
Check data types:
# String comparison
df.where(col("status") == "active") # Not col("status") == active (missing quotes)
File Reading Issues
“File not found” Errors
Problem: Cannot read CSV/JSON/Parquet files.
Solutions:
Use absolute paths:
from pathlib import Path
file_path = Path("data.csv").resolve()
records = db.load.csv(str(file_path))
Check file permissions
Verify file exists:
import os
if not os.path.exists("data.csv"):
    print("File not found!")
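The three checks above (absolute path, permissions, existence) can be combined into one diagnostic. This is a minimal sketch using only the standard library. The function name diagnose_path is hypothetical, and the result only describes the local filesystem state at the moment of the call.

```python
import os
from pathlib import Path


def diagnose_path(path_str: str) -> str:
    """Return a short description of why a file may be unreadable.

    Hypothetical helper for illustration; not part of Moltres.
    """
    path = Path(path_str).expanduser()
    if not path.exists():
        # Relative paths resolve against the current working directory,
        # which is a common source of "file not found" surprises.
        return f"missing: {path.resolve()} (cwd is {Path.cwd()})"
    if not path.is_file():
        return f"not a regular file: {path.resolve()}"
    if not os.access(path, os.R_OK):
        return f"no read permission: {path.resolve()}"
    return f"ok: {path.resolve()}"


print(diagnose_path("data.csv"))
```

Printing the resolved path alongside the working directory makes it clear which location was actually searched.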
Schema Inference Issues
Problem: Data types are inferred incorrectly.
Solutions:
Provide explicit schema:
from moltres.table.schema import ColumnDef
schema = [
    ColumnDef(name="id", type_name="INTEGER"),
    ColumnDef(name="name", type_name="TEXT"),
    ColumnDef(name="price", type_name="REAL"),
]
records = db.load.schema(schema).csv("data.csv")
Disable schema inference:
records = db.load.option("inferSchema", False).csv("data.csv")
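To see why explicit types matter, consider a column such as a zip code, where a numeric guess would drop the leading zero. The sketch below uses the standard csv module as a stand-in and does not show how Moltres infers types. The sample data and the CONVERTERS mapping are hypothetical.

```python
import csv
import io

# Hypothetical sample data: the first zip code has a leading zero that a
# numeric guess would silently drop.
SAMPLE = "id,zip_code,price\n1,02139,9.50\n2,10001,12.00\n"

# Explicit per-column converters play the role of an explicit schema.
CONVERTERS = {"id": int, "zip_code": str, "price": float}


def read_typed(text: str) -> list:
    """Parse CSV text and convert each column with its declared converter."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: CONVERTERS[name](value) for name, value in row.items()}
        for row in reader
    ]


rows = read_typed(SAMPLE)
print(rows[0])
```

Declaring zip_code as text keeps the value "02139" intact, whereas converting it to an integer would yield 2139.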
Performance Issues
Slow Queries
Problem: Queries take too long to execute.
Solutions:
Use streaming for large datasets:
records = db.load.stream().csv("large_file.csv")
for row in records:
    process(row)
Add indexes to database tables (at database level)
Rely on batch inserts (applied automatically when inserting multiple rows):
# Automatically uses batch inserts
table.insert([row1, row2, ..., row1000])
Limit results when possible:
df.limit(100).collect()
Check connection pooling:
db = connect(
    "postgresql://...",
    pool_size=10,
    max_overflow=20,
)
Memory Issues
Problem: Out of memory errors with large datasets.
Solutions:
Use streaming mode:
records = db.load.stream().option("chunk_size", 10000).csv("large.csv")
for row in records:
    process(row)
Process in batches:
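# Process 1000 rows at a time; total_rows is the row count obtained beforehand.
# Without a stable sort order, LIMIT/OFFSET pages may overlap or skip rows.
for i in range(0, total_rows, 1000):
    batch = df.limit(1000).offset(i).collect()
    process_batch(batch)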
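Streaming and batching can also be combined: rows arrive one at a time, but are handed to the processing step in fixed-size groups so that only one group is held in memory. The following is a minimal sketch with the standard library. The helper name batched_rows is hypothetical, and range(10) stands in for any iterable of streamed records.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched_rows(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, consuming `rows` lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# range(10) is a stand-in for a stream of records.
for batch in batched_rows(range(10), 4):
    print(len(batch))
```

Because the helper pulls from the iterator only when the next batch is requested, peak memory use is bounded by the batch size rather than the dataset size.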
Type and Format Issues
“Unknown fetch format” Error
Problem: Requested format (pandas/polars) not available.
Solutions:
Install required dependencies:
pip install moltres[pandas]  # For pandas
pip install moltres[polars]  # For polars
Use records format (default, no dependencies needed):
db = connect("sqlite:///example.db") # Default: fetch_format="records"
Type Errors with Mypy
Problem: Type checker complains about types.
Solutions:
Use type hints properly:
from typing import List, Dict, Any
results: List[Dict[str, Any]] = df.collect()
Cast when necessary:
from typing import cast
import pandas as pd
pandas_df = cast(pd.DataFrame, df.collect())
Validation Errors
“SQL identifier cannot be empty” or Invalid Identifier Errors
Problem: Table or column name validation fails.
Solutions:
Check identifier names:
# ✅ Valid
db.table("users")
db.table("user_profiles")
# ❌ Invalid
db.table("") # Empty
db.table("users; DROP") # Contains invalid characters
Validate user input before using as identifiers:
import re
table_name = get_user_input()
if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', table_name):
    raise ValueError("Invalid table name")
db.table(table_name)
“Row does not match expected columns”
Problem: Inserted rows have inconsistent schemas.
Solutions:
Ensure all rows have same columns:
# ✅ Correct
table.insert([
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
])
# ❌ Incorrect
table.insert([
    {"id": 1, "name": "Alice"},
    {"id": 2},  # Missing "name"
])
Check column names match table schema
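With many rows, the offending one can be hard to find by eye. The sketch below compares every row's keys against the first row and reports the differences. The helper name find_mismatched_rows is hypothetical, and treating the first row as the reference is an assumption made for this example, not necessarily how Moltres validates rows.

```python
from typing import Any, Dict, List, Set, Tuple


def find_mismatched_rows(
    rows: List[Dict[str, Any]],
) -> List[Tuple[int, Set[str], Set[str]]]:
    """Return (index, missing_keys, extra_keys) for rows unlike the first row."""
    if not rows:
        return []
    expected = set(rows[0])
    problems = []
    for index, row in enumerate(rows[1:], start=1):
        keys = set(row)
        if keys != expected:
            problems.append((index, expected - keys, keys - expected))
    return problems


rows = [
    {"id": 1, "name": "Alice"},
    {"id": 2},  # Missing "name"
]
print(find_mismatched_rows(rows))
```

Running a check like this before table.insert shows which row index is missing or adding a column.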
Getting Help
If you’re still experiencing issues:
Check the generated SQL:
print(df.to_sql())
Enable logging:
import logging
logging.basicConfig(level=logging.DEBUG)
db = connect("sqlite:///example.db", echo=True)
Check GitHub Issues: https://github.com/eddiethedean/moltres/issues
Create a minimal reproduction:
Small code sample
Sample data
Expected vs actual behavior
Error messages
Check documentation: See README.md and docs/ directory
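For the logging step above, root-level DEBUG output can be noisy. A narrower setup is sketched here under the assumption that Moltres executes queries through SQLAlchemy (the connectivity check earlier in this guide uses it, but that is not confirmed here). The logger name "sqlalchemy.engine" is SQLAlchemy's own.

```python
import logging

# Keep the root logger quiet so unrelated libraries do not flood the output.
logging.basicConfig(level=logging.WARNING)

# "sqlalchemy.engine" is SQLAlchemy's logger for emitted SQL: INFO logs
# statements, DEBUG additionally logs result rows.
# Assumption: Moltres runs its queries through SQLAlchemy.
sql_logger = logging.getLogger("sqlalchemy.engine")
sql_logger.setLevel(logging.INFO)
```

If echo=True is also passed to connect, statements may appear twice, so one of the two mechanisms is usually enough.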