Frequently Asked Questions (FAQ)
Common questions about Moltres and their answers.
General Questions
What is Moltres?
Moltres is a Python library that provides a PySpark-like DataFrame API for SQL databases. It compiles DataFrame operations to SQL and executes them in your database, providing SQL pushdown without requiring a Spark cluster.
How is Moltres different from PySpark?
No cluster required: Moltres works with any SQL database, no Spark cluster needed
SQL pushdown: Operations are compiled to SQL and executed in the database
Simpler deployment: Just connect to your existing database
Lower overhead: No cluster management or resource allocation
How is Moltres different from Pandas?
SQL execution: Moltres executes queries in the database, not in Python memory
Lazy evaluation: Queries are optimized before execution
Scalability: Can handle larger datasets (limited by database, not Python memory)
SQL pushdown: Operations compile to SQL for better performance
Which databases are supported?
Moltres supports any database with a SQLAlchemy driver, including:
SQLite
PostgreSQL
MySQL
SQL Server
Oracle
And many others
Usage Questions
How do I connect to a database?
from moltres import connect
# SQLite
db = connect("sqlite:///example.db")
# PostgreSQL
db = connect("postgresql://user:pass@host/dbname")
# MySQL
db = connect("mysql://user:pass@host/dbname")
How do I read data from a table?
# Select all columns
df = db.table("users").select()
# Select specific columns
df = db.table("users").select("id", "name", "email")
# With filtering
from moltres import col
df = db.table("users").select().where(col("active") == True)
How do I read data from files?
# CSV
df = db.load.csv("data.csv")
# JSON
df = db.load.json("data.json")
# Parquet
df = db.load.parquet("data.parquet")
# With options
df = db.load.csv("data.csv", header=True, delimiter=",")
How do I filter rows?
from moltres import col
# Single condition
df = db.table("users").select().where(col("age") > 18)
# Multiple conditions
df = db.table("users").select().where(
(col("age") > 18) & (col("active") == True)
)
How do I join tables?
customers = db.table("customers").select()
orders = db.table("orders").select()
# Inner join
from moltres import col
df = customers.join(orders, on=[col("customers.id") == col("orders.customer_id")], how="inner")
# Left join
df = customers.join(orders, on=[col("customers.id") == col("orders.customer_id")], how="left")
How do I aggregate data?
from moltres import col
from moltres.expressions.functions import sum, avg, count
df = (
db.table("orders")
.select()
.group_by("category")
.agg(
sum(col("amount")).alias("total"),
avg(col("amount")).alias("average"),
count("*").alias("count")
)
)
How do I execute a query and get results?
# Collect all results
results = df.collect() # Returns list of dicts
# Stream results (for large datasets)
for chunk in df.collect(stream=True):
process_chunk(chunk)
# Show first few rows
df.show(10)
How do I see the generated SQL?
sql = df.to_sql()
print(sql)
Performance Questions
Why is my query slow?
Common causes:
Missing indexes - Create indexes on filtered/joined columns
Full table scans - Add WHERE clauses to filter early
Large result sets - Use LIMIT or streaming
Inefficient joins - Ensure join columns are indexed
See Performance Guide for details.
How do I optimize queries?
Create indexes on frequently queried columns
Filter early in the query chain
Select only needed columns
Use LIMIT for exploration
Configure connection pooling
See Performance Guide for more tips.
Does Moltres cache results?
No, Moltres doesn’t cache results. Each query executes fresh in the database. For caching, use:
Database query cache (if supported)
Application-level caching (Redis, Memcached)
Materialized views in the database
Error Questions
What does “CompilationError” mean?
A CompilationError means Moltres couldn’t compile your DataFrame operation to SQL. Common causes:
Unsupported operation for your SQL dialect
Invalid column references
Missing join conditions
Check the error message and suggestion for details.
What does “ExecutionError” mean?
An ExecutionError means the SQL query failed to execute. Common causes:
Table/column doesn’t exist
SQL syntax error
Data type mismatch
Constraint violation
Check the error message and suggestion for details.
How do I debug query errors?
Use
df.to_sql()to see the generated SQLCheck the error message and suggestion
Verify table/column names exist
Test the SQL directly in your database
See Debugging Guide for more help
Configuration Questions
How do I configure connection pooling?
db = connect(
"postgresql://user:pass@host/dbname",
pool_size=10, # Number of connections
max_overflow=5, # Additional connections
pool_timeout=30, # Timeout in seconds
pool_pre_ping=True, # Verify connections
)
How do I set query timeout?
# Via configuration
db = connect(
"postgresql://user:pass@host/dbname",
query_timeout=30.0 # 30 seconds
)
# Via environment variable
# MOLTRES_QUERY_TIMEOUT=30.0
How do I enable SQL logging?
db = connect(
"postgresql://user:pass@host/dbname",
echo=True # Enable SQL logging
)
Feature Questions
Does Moltres support UDFs (User-Defined Functions)?
No, Moltres doesn’t support Python UDFs since operations are pushed down to SQL. Instead:
Use SQL functions available in your database
Use Moltres expression functions (see
moltres.expressions.functions)For complex logic, use raw SQL expressions
Does Moltres support window functions?
Yes! Moltres supports window functions:
from moltres.expressions.window import Window
from moltres.expressions.functions import row_number, rank
window = Window.partition_by(col("category")).order_by(col("amount").desc())
df = df.with_column("rank", rank().over(window))
Does Moltres support recursive queries?
Yes! Use recursive CTEs:
initial = db.table("seed").select(...)
recursive = initial.select(...) # References CTE
df = initial.recursive_cte("name", recursive)
Does Moltres support LATERAL joins?
Yes, for PostgreSQL and MySQL 8.0+:
df = customers.join(
orders.select().where(col("customer_id") == col("customers.id")),
how="left",
lateral=True
)
Can I use raw SQL?
Yes, you can execute raw SQL:
# Execute raw SQL
results = db.execute("SELECT * FROM users WHERE age > 18")
# Use in subqueries
subquery = db.execute("SELECT id FROM active_users")
df = db.table("orders").select().where(col("user_id").isin(subquery))
Migration Questions
How do I migrate from PySpark?
See the Migration Guide for detailed instructions.
Key differences:
Replace
SparkSessionwithconnect()Update imports (
pyspark.sql.functions→moltres.expressions.functions)Use
col()consistentlyReplace UDFs with SQL functions
How do I migrate from Pandas?
Replace
pd.read_csv()withdb.load.csv()Replace pandas operations with Moltres DataFrame operations
Use
df.collect()to get results as list of dictsConvert to pandas if needed:
pd.DataFrame(df.collect())
Troubleshooting
My query returns no results
Check if the table has data:
db.table("users").select().count()Verify filter conditions:
df.to_sql()to see SQLCheck for NULL values: Use
isnull()/isnotnull()
My query is too slow
Check indexes:
EXPLAINthe queryFilter early: Apply WHERE clauses before joins
Use LIMIT: Test with small result sets first
I get “table does not exist” error
Verify table name spelling
Check database connection
Ensure table is in the correct schema
Use
db.execute("SHOW TABLES")to list tables
Column names are case-sensitive
SQL column names may be case-sensitive depending on your database:
SQLite: Case-insensitive
PostgreSQL: Case-sensitive (use quotes)
MySQL: Depends on configuration
Use exact column names as they appear in your database.
Getting Help
Documentation: See README and other guides
Examples: See Examples Guide
Issues: Report on GitHub
Questions: Check this FAQ first
For more help: