DataFrame API

The DataFrame API provides a PySpark-like interface for querying SQL databases. All operations are lazy and compile to SQL that runs directly on your database. Use this reference alongside the getting-started and patterns guides.

Lazy DataFrame representation.

This module provides the core DataFrame class, which represents a lazy query plan that is executed only when results are requested (via collect(), show(), etc.).

The DataFrame class supports:

- PySpark-style operations (select, where, join, groupBy, etc.)
- SQL pushdown execution (all operations compile to SQL)
- Lazy evaluation (queries are not executed until collect/show is called)
- Model integration (SQLModel, Pydantic, SQLAlchemy)

class moltres.dataframe.core.dataframe.DataFrame(plan: LogicalPlan, database: Any | None = None, model: Any | None = None)

Bases: DataFrameHelpersMixin

Lazy DataFrame representing a query plan.

A DataFrame is an immutable, lazy representation of a SQL query. Operations on a DataFrame build up a logical plan that is only executed when you call collect(), show(), or similar execution methods.

All operations compile to SQL and execute directly on the database - no data is loaded into memory until you explicitly request results.

Example

>>> from moltres import connect, col
>>> db = connect("sqlite:///example.db")
>>> df = db.table("users").select().where(col("age") > 25)
>>> results = df.collect()  # Query executes here
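The lazy, immutable behaviour described above can be illustrated with a toy query builder. This is a minimal sketch and not how Moltres is implemented: the `LazyQuery` class and its methods are hypothetical names, and it uses the standard-library `sqlite3` module directly. Each call returns a new plan object, and SQL runs only inside `collect()`.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class LazyQuery:
    """Hypothetical toy query plan: a table name plus WHERE fragments."""

    conn: sqlite3.Connection
    table: str
    predicates: Tuple[Tuple[str, Any], ...] = ()

    def where(self, condition: str, param: Any) -> "LazyQuery":
        # Return a new plan; the original object is left unchanged.
        return LazyQuery(self.conn, self.table, self.predicates + ((condition, param),))

    def to_sql(self) -> Tuple[str, List[Any]]:
        # Compile the accumulated plan into one SQL statement.
        sql = f"SELECT * FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(cond for cond, _ in self.predicates)
        return sql, [param for _, param in self.predicates]

    def collect(self) -> List[Dict[str, Any]]:
        # The database is contacted only here.
        sql, params = self.to_sql()
        cursor = self.conn.execute(sql, params)
        names = [d[0] for d in cursor.description]
        return [dict(zip(names, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [(1, "Alice", 30), (2, "Bob", 20)])

base = LazyQuery(conn, "users")
adults = base.where("age > ?", 25)  # no SQL has run yet
rows = adults.collect()             # query executes here
```

Because `where()` builds a new object, `base` can be reused for other queries without being affected by the filter.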

anti_join(other: DataFrame, *, on: str | Sequence[str] | Sequence[Tuple[str, str]] | None = None) → DataFrame

Perform an anti-join: return rows from this DataFrame where no matching row exists in other.

This is equivalent to filtering with a NOT EXISTS subquery.

Parameters:

other – Another DataFrame to anti-join with (used as the NOT EXISTS subquery)
on – Join condition. Can be:
- A single column name (assumes same name in both DataFrames)
- A sequence of column names (assumes same names in both)
- A sequence of (left_column, right_column) tuples

Returns:

New DataFrame containing rows from this DataFrame that have no matches in other

Raises:

RuntimeError – If DataFrames are not bound to the same Database

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("customers", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> db.create_table("orders", [column("id", "INTEGER"), column("customer_id", "INTEGER")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}], _database=db).insert_into("customers")
>>> _ = Records(_data=[{"id": 1, "customer_id": 1}], _database=db).insert_into("orders")
>>> # Find customers who have not placed any orders
>>> customers = db.table("customers").select()
>>> orders = db.table("orders").select()
>>> customers_without_orders = customers.anti_join(orders, on=[("id", "customer_id")])
>>> results = customers_without_orders.collect()
>>> len(results)
1
>>> results[0]["name"]
'Bob'
>>> db.close()
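Since the anti-join is described as equivalent to a NOT EXISTS filter, the same result can be reproduced with hand-written SQL. The sketch below uses the standard-library `sqlite3` module; the exact SQL text that Moltres generates is not shown in this reference, so the query here is an assumption about its general shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.execute("INSERT INTO orders VALUES (1, 1)")

# Hand-written counterpart of customers.anti_join(orders, on=[("id", "customer_id")]).
anti_join_sql = """
    SELECT c.id, c.name
    FROM customers AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM orders AS o WHERE o.customer_id = c.id
    )
"""
customers_without_orders = conn.execute(anti_join_sql).fetchall()
conn.close()
```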

collect(stream: Literal[False] = False) → List[Dict[str, object]]

collect(stream: Literal[True]) → Iterator[List[Dict[str, object]]]

Collect DataFrame results.

Parameters:

stream – If True, return an iterator of row chunks. If False (default), materialize all rows into a list.

Returns:

If stream=False and no model attached: List of dictionaries representing rows.
If stream=False and model attached: List of SQLModel or Pydantic instances.
If stream=True and no model attached: Iterator of row chunks (each chunk is a list of dicts).
If stream=True and model attached: Iterator of row chunks (each chunk is a list of model instances).

Raises:

RuntimeError – If DataFrame is not bound to a Database
ImportError – If model is attached but Pydantic or SQLModel is not installed

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}], _database=db).insert_into("users")
>>> # Collect all results
>>> df = db.table("users").select()
>>> results = df.collect()
>>> len(results)
2
>>> results[0]["name"]
'Alice'
>>> # Collect with streaming (returns iterator)
>>> stream_results = df.collect(stream=True)
>>> chunk = next(stream_results)
>>> len(chunk)
2
>>> db.close()
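The difference between materializing all rows and streaming chunks can be sketched with a plain database cursor. This is an illustration using the standard-library `sqlite3` module; the helper `iter_chunks` and its `chunk_size` parameter are hypothetical, and this reference does not state how Moltres chooses its chunk size.

```python
import sqlite3
from typing import Any, Dict, Iterator, List


def iter_chunks(cursor: sqlite3.Cursor, chunk_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of row dicts, at most chunk_size rows per list."""
    names = [d[0] for d in cursor.description]
    while True:
        batch = cursor.fetchmany(chunk_size)
        if not batch:
            break
        yield [dict(zip(names, row)) for row in batch]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"User{i}") for i in range(1, 6)])

cursor = conn.execute("SELECT id, name FROM users ORDER BY id")
# Only one chunk is held in memory at a time while iterating.
chunk_sizes = [len(chunk) for chunk in iter_chunks(cursor, chunk_size=2)]
conn.close()
```

Streaming in this style keeps memory use bounded by the chunk size, which is the usual reason to prefer `stream=True` for large result sets.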

property columns: List[str]

Return a list of column names in this DataFrame.

Delegates to SchemaInspector.

Similar to PySpark’s DataFrame.columns property, this extracts column names from the logical plan without requiring query execution.

Returns: List of column name strings
Raises: RuntimeError – If column names cannot be determined (e.g., RawSQL without execution)

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT"), column("email", "TEXT")]).collect()
>>> df = db.table("users").select()
>>> cols = df.columns
>>> "id" in cols and "name" in cols and "email" in cols
True
>>> df2 = df.select("id", "name")
>>> cols2 = df2.columns
>>> len(cols2)
2
>>> "id" in cols2 and "name" in cols2
True
>>> db.close()

count() → int

Return the number of rows in the DataFrame.

Delegates to StatisticsCalculator.

Returns: Number of rows

Note

This executes a COUNT(*) query against the database.

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": i, "name": f"User{i}"} for i in range(1, 6)], _database=db).insert_into("users")
>>> df = db.table("users").select()
>>> df.count()
5
>>> # Count with filter
>>> df2 = db.table("users").select().where(col("id") > 2)
>>> df2.count()
3
>>> db.close()

crossJoin(other: DataFrame) → DataFrame

Perform a cross join (Cartesian product) with another DataFrame.

Parameters: other – Another DataFrame to cross join with
Returns: New DataFrame containing the Cartesian product of rows
Raises: RuntimeError – If DataFrames are not bound to the same Database

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("table1", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> db.create_table("table2", [column("id", "INTEGER"), column("value", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "A"}, {"id": 2, "name": "B"}], _database=db).insert_into("table1")
>>> _ = Records(_data=[{"id": 1, "value": "X"}, {"id": 2, "value": "Y"}], _database=db).insert_into("table2")
>>> df1 = db.table("table1").select()
>>> df2 = db.table("table2").select()
>>> # Cross join (Cartesian product)
>>> df_cross = df1.crossJoin(df2)
>>> results = df_cross.collect()
>>> len(results)
4
>>> db.close()

cross_join(other: DataFrame) → DataFrame

Perform a cross join (Cartesian product) with another DataFrame (snake_case alias for crossJoin).

This is an alias for crossJoin(). See crossJoin() for full documentation.

Parameters: other – Another DataFrame to cross join with
Returns: New DataFrame containing the Cartesian product of rows
Raises: RuntimeError – If DataFrames are not bound to the same Database

cte(name: str) → DataFrame

Create a Common Table Expression (CTE) from this DataFrame.

Parameters: name – Name for the CTE
Returns: New DataFrame representing the CTE

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("orders", [column("id", "INTEGER"), column("amount", "REAL")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "amount": 150.0}, {"id": 2, "amount": 50.0}], _database=db).insert_into("orders")
>>> # Create CTE
>>> cte_df = db.table("orders").select().where(col("amount") > 100).cte("high_value_orders")
>>> # Query the CTE
>>> result = cte_df.select().collect()
>>> len(result)
1
>>> result[0]["amount"]
150.0
>>> db.close()
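For readers less familiar with CTEs, the example above corresponds to a SQL `WITH` clause. The following sketch runs such a query through the standard-library `sqlite3` module; the SQL text is written by hand as an assumption about the general shape, not copied from Moltres output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 150.0), (2, 50.0)])

# The filtered query becomes a named subquery that the outer SELECT reads from.
cte_sql = """
    WITH high_value_orders AS (
        SELECT id, amount FROM orders WHERE amount > 100
    )
    SELECT id, amount FROM high_value_orders
"""
high_value = conn.execute(cte_sql).fetchall()
conn.close()
```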

database: Any | None = None

describe(*cols: str) → DataFrame

Compute basic statistics for numeric columns.

Delegates to StatisticsCalculator.

Parameters: *cols – Optional column names to describe. If not provided, describes all numeric columns.
Returns: DataFrame with statistics: count, mean, stddev, min, max

Note

This is a simplified implementation. A full implementation would automatically detect numeric columns if cols is not provided.

distinct() → DataFrame

Return a new DataFrame with distinct rows.

Returns: New DataFrame with distinct rows

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Alice"}, {"id": 3, "name": "Bob"}], _database=db).insert_into("users")
>>> df = db.table("users").select("name").distinct()
>>> results = df.collect()
>>> len(results)
2
>>> names = {r["name"] for r in results}
>>> "Alice" in names
True
>>> "Bob" in names
True
>>> db.close()

drop(*cols: str | Column) → DataFrame

Drop one or more columns from the DataFrame.

Parameters: *cols – Column names or Column objects to drop
Returns: New DataFrame with the specified columns removed

Note

This operation acts on the DataFrame's Project operation. If the DataFrame has none, it creates a Project that excludes the specified columns.

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT"), column("email", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice", "email": "alice@example.com"}], _database=db).insert_into("users")
>>> # Drop by string column name
>>> df = db.table("users").select().drop("email")
>>> results = df.collect()
>>> "email" not in results[0]
True
>>> "name" in results[0]
True
>>> # Drop by Column object
>>> df2 = db.table("users").select().drop(col("email"))
>>> results2 = df2.collect()
>>> "email" not in results2[0]
True
>>> # Drop multiple columns
>>> df3 = db.table("users").select().drop("email", "id")
>>> results3 = df3.collect()
>>> len(results3[0].keys())
1
>>> "name" in results3[0]
True
>>> db.close()

dropDuplicates(subset: Sequence[str] | None = None) → DataFrame

Return a new DataFrame with duplicate rows removed.

Parameters: subset – Optional list of column names to consider when identifying duplicates. If None, all columns are considered.
Returns: New DataFrame with duplicates removed

Note

This is equivalent to distinct() when subset is None. When subset is provided, it’s implemented as a group_by on those columns with a select of all columns.
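The subset behaviour described in the note can be sketched as a GROUP BY query. This is a hand-written illustration using the standard-library `sqlite3` module. The note does not say which row is kept for each group, so the use of `MIN(id)` to pick a deterministic representative is an assumption made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "Alice"), (2, "Alice"), (3, "Bob")]
)

# One row per distinct name; MIN(id) is an assumed tie-breaker, not documented behaviour.
dedup_sql = "SELECT MIN(id) AS id, name FROM users GROUP BY name ORDER BY name"
deduplicated = conn.execute(dedup_sql).fetchall()
conn.close()
```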

drop_duplicates(subset: Sequence[str] | None = None) → DataFrame

Return a new DataFrame with duplicate rows removed (snake_case alias for dropDuplicates).

This is an alias for dropDuplicates(). See dropDuplicates() for full documentation.

Parameters: subset – Optional list of column names to consider when identifying duplicates. If None, all columns are considered.
Returns: New DataFrame with duplicates removed

dropna(how: str = 'any', subset: Sequence[str] | None = None) → DataFrame

Remove rows with null values.

Parameters:

how – “any” (drop if any null) or “all” (drop if all null) (default: “any”)
subset – Optional list of column names to check. If None, checks all columns.

Returns:

New DataFrame with null rows removed

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT"), column("age", "INTEGER")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice", "age": 25}, {"id": 2, "name": None, "age": 30}, {"id": 3, "name": "Bob", "age": None}], _database=db).insert_into("users")
>>> # Drop rows where any column in subset is null
>>> df = db.table("users").select().dropna(how="any", subset=["name", "age"])
>>> results = df.collect()
>>> len(results)
1
>>> results[0]["name"]
'Alice'
>>> # Drop rows where all columns in subset are null
>>> df2 = db.table("users").select().dropna(how="all", subset=["name", "age"])
>>> results2 = df2.collect()
>>> len(results2)
3
>>> db.close()
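The two `how` modes map naturally onto null checks in a WHERE clause. The sketch below shows one plausible translation using the standard-library `sqlite3` module; the SQL is hand-written, and the exact predicates Moltres emits are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Alice", 25), (2, None, 30), (3, "Bob", None)],
)

# how="any": keep a row only when every checked column is non-null.
any_sql = "SELECT id FROM users WHERE name IS NOT NULL AND age IS NOT NULL ORDER BY id"
# how="all": keep a row when at least one checked column is non-null.
all_sql = "SELECT id FROM users WHERE name IS NOT NULL OR age IS NOT NULL ORDER BY id"

kept_any = [row[0] for row in conn.execute(any_sql)]
kept_all = [row[0] for row in conn.execute(all_sql)]
conn.close()
```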

property dtypes: List[Tuple[str, str]]

Return a list of tuples containing column names and their data types.

Delegates to SchemaInspector.

Similar to PySpark’s DataFrame.dtypes property, this returns a list of (column_name, type_name) tuples.

Returns: List of tuples (column_name, type_name)
Raises: RuntimeError – If schema cannot be determined (e.g., RawSQL without execution)

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select()
>>> dtypes = df.dtypes
>>> len(dtypes)
2
>>> dtypes[0]
('id', 'INTEGER')
>>> dtypes[1][0]
'name'
>>> db.close()

except_(other: DataFrame) → DataFrame

Return rows in this DataFrame that are not in another DataFrame (distinct rows only).

Parameters: other – Another DataFrame to exclude from
Returns: New DataFrame containing rows in this DataFrame but not in other
Raises: RuntimeError – If DataFrames are not bound to the same Database

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("table1", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> db.create_table("table2", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}], _database=db).insert_into("table1")
>>> _ = Records(_data=[{"id": 2, "name": "Bob"}], _database=db).insert_into("table2")
>>> df1 = db.table("table1").select()
>>> df2 = db.table("table2").select()
>>> # Except (rows in df1 but not in df2)
>>> df_except = df1.except_(df2)
>>> results = df_except.collect()
>>> len(results)
1
>>> results[0]["name"]
'Alice'
>>> db.close()
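The "distinct rows only" wording matches the behaviour of the SQL `EXCEPT` set operator, which removes duplicates from its result. The sketch below demonstrates this with the standard-library `sqlite3` module; that Moltres compiles `except_()` to `EXCEPT` is an assumption based on the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER, name TEXT)")
# Alice appears twice in table1 to show that EXCEPT returns distinct rows.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)", [(1, "Alice"), (1, "Alice"), (2, "Bob")]
)
conn.execute("INSERT INTO table2 VALUES (2, 'Bob')")

except_rows = conn.execute(
    "SELECT id, name FROM table1 EXCEPT SELECT id, name FROM table2"
).fetchall()
conn.close()
```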

explain(analyze: bool = False) → str

Get the query execution plan using SQL EXPLAIN.

Convenience method for query debugging and optimization.

Parameters: analyze – If True, use EXPLAIN ANALYZE (executes query and shows actual execution stats). If False, use EXPLAIN (shows estimated plan without executing).
Returns: Query plan as a string

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select().where(col("id") > 1)
>>> # Get query plan
>>> plan = df.explain()
>>> "EXPLAIN" in plan or "SCAN" in plan or "SELECT" in plan
True
>>> # Get execution plan with actual stats
>>> plan2 = df.explain(analyze=True)
>>> len(plan2) > 0
True
>>> db.close()

Raises: RuntimeError – If DataFrame is not bound to a Database

Example

>>> df = db.table("users").select().where(col("age") > 18)
>>> plan = df.explain()
>>> print(plan)
>>> # For actual execution stats:
>>> plan = df.explain(analyze=True)
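To give a sense of what a plan string contains, the sketch below asks SQLite for a plan directly through the standard-library `sqlite3` module. It uses SQLite's `EXPLAIN QUERY PLAN` statement; how `explain()` maps onto each database dialect is not specified in this reference, so treat this as an illustration of the underlying SQL feature only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE id > 1"
).fetchall()
# The last field of each row is a human-readable description of one plan step.
plan_text = "\n".join(str(row[-1]) for row in plan_rows)
conn.close()
```

The wording of each step varies between SQLite versions, so code should not depend on the exact text.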

explode(column: Column | str, alias: str = 'value') → DataFrame

Explode an array/JSON column into multiple rows (one row per element).

Parameters:

column – Column expression or column name to explode (must be array or JSON)
alias – Alias for the exploded value column (default: “value”)

Returns:

New DataFrame with exploded rows

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> # Note: explode() requires array/JSON support which varies by database
>>> # This example shows the API usage pattern
>>> db.create_table("users", [column("id", "INTEGER"), column("tags", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "tags": '["python", "sql"]'}], _database=db).insert_into("users")
>>> # Explode a JSON array column (database-specific support required)
>>> df = db.table("users").select()
>>> exploded = df.explode(col("tags"), alias="tag")
>>> # Each row in exploded will have one tag per row
>>> # Note: Actual execution depends on database JSON/array support
>>> db.close()

fillna(value, subset=None) → DataFrame

Replace null values with a specified value.

Parameters:

value – Value to use for filling nulls. Can be a single value or a dict mapping column names to values.
subset – Optional list of column names to fill. If None, fills all columns.

Returns:

New DataFrame with null values filled

Note

This uses COALESCE or CASE WHEN to replace nulls in SQL.

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT"), column("age", "INTEGER")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice", "age": None}, {"id": 2, "name": None, "age": 25}], _database=db).insert_into("users")
>>> # Fill nulls with single value
>>> df = db.table("users").select().fillna(0, subset=["age"])
>>> results = df.collect()
>>> results[0]["age"]
0
>>> # Fill nulls with different values per column
>>> df2 = db.table("users").select().fillna({"name": "Unknown", "age": 0}, subset=["name", "age"])
>>> results2 = df2.collect()
>>> results2[1]["name"]
'Unknown'
>>> db.close()
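The note above mentions COALESCE as one way nulls are replaced in SQL. A minimal hand-written sketch of the per-column fill, using the standard-library `sqlite3` module, might look like the following; the exact SQL Moltres generates is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)", [(1, "Alice", None), (2, None, 25)]
)

# COALESCE returns its first non-null argument, so nulls fall back to the default.
fill_sql = """
    SELECT id,
           COALESCE(name, 'Unknown') AS name,
           COALESCE(age, 0) AS age
    FROM users
    ORDER BY id
"""
filled = conn.execute(fill_sql).fetchall()
conn.close()
```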

filter(predicate: Column | str) → DataFrame

Filter rows based on a condition.

Parameters: predicate – Column expression or SQL string representing the filter condition. Can be a Column object or a SQL string like “age > 18”.
Returns: New DataFrame with filtered rows

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT"), column("age", "INTEGER")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice", "age": 25}, {"id": 2, "name": "Bob", "age": 17}], _database=db).insert_into("users")
>>> # Filter by condition using Column
>>> df = db.table("users").select().where(col("age") >= 18)
>>> results = df.collect()
>>> len(results)
1
>>> results[0]["name"]
'Alice'
>>> # Filter using SQL string
>>> df2 = db.table("users").select().where("age > 18")
>>> results2 = df2.collect()
>>> len(results2)
1
>>> # Multiple conditions with Column
>>> db.create_table("orders", [column("id", "INTEGER"), column("amount", "REAL"), column("status", "TEXT")]).collect()
>>> _ = Records(_data=[{"id": 1, "amount": 150.0, "status": "active"}, {"id": 2, "amount": 50.0, "status": "active"}], _database=db).insert_into("orders")
>>> df3 = db.table("orders").select().where((col("amount") > 100) & (col("status") == "active"))
>>> results3 = df3.collect()
>>> len(results3)
1
>>> results3[0]["amount"]
150.0
>>> db.close()

first() → Dict[str, object] | None

Return the first row as a dictionary, or None if empty.

Delegates to DataFrameExecutor.

Returns: First row as a dictionary, or None if DataFrame is empty

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}], _database=db).insert_into("users")
>>> df = db.table("users").select()
>>> first_row = df.first()
>>> first_row["name"]
'Alice'
>>> # Empty DataFrame returns None
>>> df2 = db.table("users").select().where(col("id") > 100)
>>> df2.first() is None
True
>>> db.close()

classmethod from_sqlalchemy(select_stmt: Any, database: Any | None = None) → DataFrame

Create a DataFrame from a SQLAlchemy Select statement.

This allows you to integrate existing SQLAlchemy queries with Moltres DataFrame operations. The SQLAlchemy statement is wrapped as a RawSQL logical plan, which can then be further chained with Moltres operations.

Parameters:

select_stmt – SQLAlchemy Select statement to convert
database – Optional Database instance to attach to the DataFrame. If provided, allows the DataFrame to be executed with collect().

Returns:

DataFrame that can be further chained with Moltres operations

Return type:

DataFrame

Example

>>> from sqlalchemy import create_engine, select, table, column
>>> from moltres.dataframe.core.dataframe import DataFrame
>>> engine = create_engine("sqlite:///:memory:")
>>> # Create a SQLAlchemy select statement
>>> users = table("users", column("id"), column("name"))
>>> sa_stmt = select(users.c.id, users.c.name).where(users.c.id > 1)
>>> # Convert to Moltres DataFrame
>>> df = DataFrame.from_sqlalchemy(sa_stmt)
>>> # Can now chain Moltres operations
>>> df2 = df.select("id")

classmethod from_table(table_handle: Any, columns: Sequence[str] | None = None) → DataFrame

groupBy(*columns: Column | str) → Any

Group rows by one or more columns for aggregation.

Parameters: *columns – Column names or Column expressions to group by
Returns: GroupedDataFrame that can be used with aggregation functions

Example

>>> from moltres import connect, col
>>> from moltres.expressions import functions as F
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> # Group by single column
>>> db.create_table("orders", [column("customer_id", "INTEGER"), column("amount", "REAL")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"customer_id": 1, "amount": 100.0}, {"customer_id": 1, "amount": 50.0}, {"customer_id": 2, "amount": 200.0}], _database=db).insert_into("orders")
>>> df = db.table("orders").select().group_by("customer_id").agg(F.sum(col("amount")).alias("total"))
>>> results = df.collect()
>>> len(results)
2
>>> results[0]["total"]
150.0
>>> # Group by multiple columns
>>> db.create_table("sales", [column("region", "TEXT"), column("product", "TEXT"), column("revenue", "REAL")]).collect()
>>> _ = Records(_data=[{"region": "North", "product": "A", "revenue": 100.0}, {"region": "North", "product": "A", "revenue": 50.0}], _database=db).insert_into("sales")
>>> df2 = db.table("sales").select().group_by("region", "product").agg(F.sum(col("revenue")).alias("total_revenue"), F.count("*").alias("count"))
>>> results2 = df2.collect()
>>> results2[0]["total_revenue"]
150.0
>>> results2[0]["count"]
2
>>> # SQL: SELECT region, product, SUM(revenue) AS total_revenue, COUNT(*) AS count
>>> #      FROM sales GROUP BY region, product
>>> db.close()

group_by(*columns: Column | str) → Any

Group rows by one or more columns for aggregation.

Parameters: *columns – Column names or Column expressions to group by
Returns: GroupedDataFrame that can be used with aggregation functions

Example

>>> from moltres import connect, col
>>> from moltres.expressions import functions as F
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> # Group by single column
>>> db.create_table("orders", [column("customer_id", "INTEGER"), column("amount", "REAL")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"customer_id": 1, "amount": 100.0}, {"customer_id": 1, "amount": 50.0}, {"customer_id": 2, "amount": 200.0}], _database=db).insert_into("orders")
>>> df = db.table("orders").select().group_by("customer_id").agg(F.sum(col("amount")).alias("total"))
>>> results = df.collect()
>>> len(results)
2
>>> results[0]["total"]
150.0
>>> # Group by multiple columns
>>> db.create_table("sales", [column("region", "TEXT"), column("product", "TEXT"), column("revenue", "REAL")]).collect()
>>> _ = Records(_data=[{"region": "North", "product": "A", "revenue": 100.0}, {"region": "North", "product": "A", "revenue": 50.0}], _database=db).insert_into("sales")
>>> df2 = db.table("sales").select().group_by("region", "product").agg(F.sum(col("revenue")).alias("total_revenue"), F.count("*").alias("count"))
>>> results2 = df2.collect()
>>> results2[0]["total_revenue"]
150.0
>>> results2[0]["count"]
2
>>> # SQL: SELECT region, product, SUM(revenue) AS total_revenue, COUNT(*) AS count
>>> #      FROM sales GROUP BY region, product
>>> db.close()

head(n: int = 5) → List[Dict[str, object]]

Return the first n rows of the DataFrame.

Delegates to DataFrameExecutor.

Parameters: n – Number of rows to return (default: 5)
Returns: List of row dictionaries

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": i, "name": f"User{i}"} for i in range(1, 6)], _database=db).insert_into("users")
>>> df = db.table("users").select()
>>> rows = df.head(3)
>>> len(rows)
3
>>> rows[0]["id"]
1
>>> db.close()

help() → None

Display interactive help showing available operations and examples.

Example

>>> from moltres import connect
>>> db = connect("sqlite:///:memory:")
>>> df = db.table("users").select()
>>> df.help()  # Prints help information

intersect(other: DataFrame) → DataFrame

Intersect this DataFrame with another DataFrame (distinct rows only).

Parameters: other – Another DataFrame to intersect with
Returns: New DataFrame containing the intersection of rows
Raises: RuntimeError – If DataFrames are not bound to the same Database

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("table1", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> db.create_table("table2", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}], _database=db).insert_into("table1")
>>> _ = Records(_data=[{"id": 2, "name": "Bob"}, {"id": 3, "name": "Charlie"}], _database=db).insert_into("table2")
>>> df1 = db.table("table1").select()
>>> df2 = db.table("table2").select()
>>> # Intersect (common rows only)
>>> df_intersect = df1.intersect(df2)
>>> results = df_intersect.collect()
>>> len(results)
1
>>> results[0]["name"]
'Bob'
>>> db.close()

join(other, on, how, lateral, hints) → DataFrame

Join with another DataFrame.

Parameters:

other – Another DataFrame to join with
on – Join condition. Can be:
- A single column name (assumes same name in both DataFrames): on="order_id"
- A sequence of column names (assumes same names in both): on=["col1", "col2"]
- A sequence of (left_column, right_column) tuples: on=[("id", "customer_id")]
- A Column expression (PySpark-style): on=[col("left_col") == col("right_col")]
- A single Column expression: on=col("left_col") == col("right_col")
how – Join type (“inner”, “left”, “right”, “full”, “cross”)
lateral – If True, create a LATERAL join (PostgreSQL, MySQL 8.0+). Allows right side to reference columns from left side.
hints – Optional sequence of join hints (e.g., USE_INDEX(idx_name)). Dialect-specific: MySQL uses USE INDEX; PostgreSQL uses optimizer hint comments (/*+ ... */).

Returns:

New DataFrame containing the join result

Raises:

RuntimeError – If DataFrames are not bound to the same Database

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> # Setup tables
>>> db.create_table("customers", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> db.create_table("orders", [column("id", "INTEGER"), column("customer_id", "INTEGER"), column("amount", "REAL")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice"}], _database=db).insert_into("customers")
>>> _ = Records(_data=[{"id": 1, "customer_id": 1, "amount": 100.0}], _database=db).insert_into("orders")
>>> # PySpark-style with Column expressions (recommended)
>>> customers = db.table("customers").select()
>>> orders = db.table("orders").select()
>>> df = customers.join(orders, on=[col("customers.id") == col("orders.customer_id")], how="inner")
>>> results = df.collect()
>>> len(results)
1
>>> results[0]["name"]
'Alice'
>>> results[0]["amount"]
100.0
>>> # Same column name (simplest)
>>> db.create_table("items", [column("order_id", "INTEGER"), column("product", "TEXT")]).collect()
>>> _ = Records(_data=[{"order_id": 1, "product": "Widget"}], _database=db).insert_into("items")
>>> df2 = orders.join(db.table("items").select(), on="order_id", how="inner")
>>> results2 = df2.collect()
>>> results2[0]["product"]
'Widget'
>>> # Left join
>>> _ = Records(_data=[{"id": 2, "name": "Bob"}], _database=db).insert_into("customers")
>>> df3 = customers.join(orders, on=[col("customers.id") == col("orders.customer_id")], how="left")
>>> results3 = df3.collect()
>>> len(results3)
2
>>> # With lateral=True the join compiles to a LATERAL join, for example:
>>> # SQL: SELECT * FROM customers LEFT JOIN LATERAL (SELECT * FROM orders WHERE customer_id = customers.id) ...
>>> db.close()

limit(count: int) → DataFrame

Limit the number of rows returned by the query.

Parameters: count – Maximum number of rows to return. Must be non-negative. If 0, returns an empty result set.
Returns: New DataFrame with the limit applied
Raises: ValueError – If count is negative

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": i, "name": f"User{i}"} for i in range(1, 6)], _database=db).insert_into("users")
>>> # Limit to 3 rows
>>> df = db.table("users").select().limit(3)
>>> results = df.collect()
>>> len(results)
3
>>> # Limit with ordering
>>> db.create_table("orders", [column("id", "INTEGER"), column("amount", "REAL")]).collect()
>>> _ = Records(_data=[{"id": i, "amount": float(i * 10)} for i in range(1, 6)], _database=db).insert_into("orders")
>>> df2 = db.table("orders").select().order_by(col("amount").desc()).limit(2)
>>> results2 = df2.collect()
>>> len(results2)
2
>>> results2[0]["amount"]
50.0
>>> db.close()

model: Any | None = None

property na: NullHandling

Access null handling methods via the na property.

Returns: NullHandling helper object with drop() and fill() methods

Example

>>> df.na.drop()  # Drop rows with nulls
>>> df.na.fill(0)  # Fill nulls with 0

nunique(column: str | None = None) → int | Dict[str, int][source]

Count distinct values in column(s).

Delegates to StatisticsCalculator.

Parameters: column – Column name to count. If None, counts distinct values for all columns.
Returns: If column is specified: integer count of distinct values. If column is None: dictionary mapping column names to distinct counts.

Example

>>> from moltres import connect, col
>>> from moltres.expressions import functions as F
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("country", "TEXT"), column("age", "INTEGER")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "country": "USA", "age": 25}, {"id": 2, "country": "USA", "age": 30}, {"id": 3, "country": "UK", "age": 25}], _database=db).insert_into("users")
>>> df = db.table("users").select()
>>> # Count distinct values in a column
>>> df.nunique("country")
2
>>> # Count distinct for all columns
>>> counts = df.nunique()
>>> counts["country"]
2
>>> db.close()
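
In SQL terms, counting distinct values is a COUNT(DISTINCT ...) aggregate. The following sketch reproduces the example's numbers with the standard sqlite3 module. It is an illustration of the underlying query shape under that assumption, not the code path StatisticsCalculator necessarily takes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "USA", 25), (2, "USA", 30), (3, "UK", 25)],
)

# Single column: one COUNT(DISTINCT ...) aggregate returns one integer.
(country_count,) = conn.execute(
    "SELECT COUNT(DISTINCT country) FROM users"
).fetchone()

# All columns: one aggregate per column, collected into a name -> count mapping.
# The column list is hard-coded here; a real implementation would read the schema.
columns = ["id", "country", "age"]
select_list = ", ".join(f"COUNT(DISTINCT {name})" for name in columns)
row = conn.execute(f"SELECT {select_list} FROM users").fetchone()
all_counts = dict(zip(columns, row))
conn.close()

print(country_count)
print(all_counts)
```

Note that COUNT(DISTINCT ...) ignores NULL values, so a column containing only NULLs would report 0.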

orderBy(*columns: Column | str) → DataFrame

Sort rows by one or more columns.

Parameters: *columns – Column expressions or column names to sort by. Use .asc() or .desc() for sort order. Can be strings (column names) or Column objects.
Returns: New DataFrame with sorted rows

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Charlie"}, {"id": 2, "name": "Alice"}, {"id": 3, "name": "Bob"}], _database=db).insert_into("users")
>>> # Sort ascending with string column name
>>> df = db.table("users").select().order_by("name")
>>> results = df.collect()
>>> results[0]["name"]
'Alice'
>>> results[1]["name"]
'Bob'
>>> # Sort descending with Column object
>>> db.create_table("orders", [column("id", "INTEGER"), column("amount", "REAL")]).collect()
>>> _ = Records(_data=[{"id": 1, "amount": 50.0}, {"id": 2, "amount": 100.0}, {"id": 3, "amount": 25.0}], _database=db).insert_into("orders")
>>> df2 = db.table("orders").select().order_by(col("amount").desc())
>>> results2 = df2.collect()
>>> results2[0]["amount"]
100.0
>>> # Multiple sort columns
>>> db.create_table("sales", [column("region", "TEXT"), column("amount", "REAL")]).collect()
>>> _ = Records(_data=[{"region": "North", "amount": 100.0}, {"region": "North", "amount": 50.0}, {"region": "South", "amount": 75.0}], _database=db).insert_into("sales")
>>> df3 = db.table("sales").select().order_by("region", col("amount").desc())
>>> results3 = df3.collect()
>>> results3[0]["region"]
'North'
>>> results3[0]["amount"]
100.0
>>> db.close()

order_by(*columns: Column | str) → DataFrame[source]

Sort rows by one or more columns.

Parameters: *columns – Column expressions or column names to sort by. Use .asc() or .desc() for sort order. Can be strings (column names) or Column objects.
Returns: New DataFrame with sorted rows

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Charlie"}, {"id": 2, "name": "Alice"}, {"id": 3, "name": "Bob"}], _database=db).insert_into("users")
>>> # Sort ascending with string column name
>>> df = db.table("users").select().order_by("name")
>>> results = df.collect()
>>> results[0]["name"]
'Alice'
>>> results[1]["name"]
'Bob'
>>> # Sort descending with Column object
>>> db.create_table("orders", [column("id", "INTEGER"), column("amount", "REAL")]).collect()
>>> _ = Records(_data=[{"id": 1, "amount": 50.0}, {"id": 2, "amount": 100.0}, {"id": 3, "amount": 25.0}], _database=db).insert_into("orders")
>>> df2 = db.table("orders").select().order_by(col("amount").desc())
>>> results2 = df2.collect()
>>> results2[0]["amount"]
100.0
>>> # Multiple sort columns
>>> db.create_table("sales", [column("region", "TEXT"), column("amount", "REAL")]).collect()
>>> _ = Records(_data=[{"region": "North", "amount": 100.0}, {"region": "North", "amount": 50.0}, {"region": "South", "amount": 75.0}], _database=db).insert_into("sales")
>>> df3 = db.table("sales").select().order_by("region", col("amount").desc())
>>> results3 = df3.collect()
>>> results3[0]["region"]
'North'
>>> results3[0]["amount"]
100.0
>>> db.close()

performance_hints() → List[str][source]

Get performance optimization hints for this query.

Returns: List of performance optimization suggestions

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select().where(col("id") > 1)
>>> hints = df.performance_hints()
>>> len(hints) >= 0
True

pivot(pivot_column: str, value_column: str, agg_func: str = 'sum', pivot_values: Sequence[str] | None = None) → DataFrame[source]

Pivot the DataFrame to reshape data from long to wide format.

Parameters:

pivot_column – Column to pivot on (values become column headers)
value_column – Column containing values to aggregate
agg_func – Aggregation function to apply (default: “sum”)
pivot_values – Optional list of specific values to pivot (if None, uses all distinct values)

Returns:

New DataFrame with pivoted data

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("sales", [column("date", "TEXT"), column("product", "TEXT"), column("amount", "REAL")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"date": "2024-01-01", "product": "A", "amount": 100.0}, {"date": "2024-01-01", "product": "B", "amount": 200.0}, {"date": "2024-01-02", "product": "A", "amount": 150.0}], _database=db).insert_into("sales")
>>> # Pivot sales data by product
>>> df = db.table("sales").select("date", "product", "amount")
>>> pivoted = df.pivot(pivot_column="product", value_column="amount", agg_func="sum")
>>> results = pivoted.collect()
>>> len(results) > 0
True
>>> db.close()
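
One common way to express a long-to-wide pivot in portable SQL is conditional aggregation: one SUM(CASE WHEN ...) per pivot value. The sketch below applies that technique to the sales data above using the standard sqlite3 module. Whether Moltres emits this form or a dialect-specific PIVOT is not stated here, and grouping by the remaining date column is an assumption made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-01-01", "A", 100.0),
        ("2024-01-01", "B", 200.0),
        ("2024-01-02", "A", 150.0),
    ],
)

# Plays the role of pivot_values: each value becomes one output column.
pivot_values = ["A", "B"]

# One SUM(CASE ...) per pivot value; the value itself is bound as a parameter.
cases = ", ".join(
    f'SUM(CASE WHEN product = ? THEN amount END) AS "{value}"'
    for value in pivot_values
)
sql = f"SELECT date, {cases} FROM sales GROUP BY date ORDER BY date"
rows = conn.execute(sql, pivot_values).fetchall()
conn.close()

for row in rows:
    print(row)
```

A date with no sales for a product yields NULL (None in Python) in that product's column, because SUM over no non-NULL inputs is NULL.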

plan: LogicalPlan

plan_summary() → Dict[str, Any][source]

Get a structured summary of the query plan.

Returns:

Dictionary containing plan statistics:

operations: List of operation types in the plan
table_scans: Number of table scans
joins: Number of joins
filters: Number of filter operations
aggregations: Number of aggregation operations
depth: Maximum depth of the plan tree

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select().where(col("id") > 1)
>>> summary = df.plan_summary()
>>> summary["operations"]
['TableScan', 'Filter']
>>> db.close()
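
To make the summary fields concrete, here is a minimal sketch of how such statistics could be gathered by walking a plan tree. PlanNode and summarize are hypothetical names invented for this illustration; they are not Moltres's LogicalPlan or its real traversal, and the depth convention (a single node has depth 1) is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PlanNode:
    """Hypothetical plan node used only for this sketch."""

    op: str
    children: List["PlanNode"] = field(default_factory=list)


def summarize(root: PlanNode) -> Dict[str, Any]:
    """Collect operation names and the maximum depth of a plan tree."""
    operations: List[str] = []

    def walk(node: PlanNode, level: int) -> int:
        # Post-order walk: children first, so scans precede the operators above them.
        deepest = level
        for child in node.children:
            deepest = max(deepest, walk(child, level + 1))
        operations.append(node.op)
        return deepest

    depth = walk(root, 1)
    return {
        "operations": operations,
        "table_scans": operations.count("TableScan"),
        "joins": operations.count("Join"),
        "filters": operations.count("Filter"),
        "aggregations": operations.count("Aggregate"),
        "depth": depth,
    }


# A filter on top of a table scan, mirroring select().where(...) above.
plan = PlanNode("Filter", [PlanNode("TableScan")])
summary = summarize(plan)
print(summary)
```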

polars() → Any[source]

Convert this DataFrame to a PolarsDataFrame for Polars-style operations.

Returns: PolarsDataFrame wrapping this DataFrame

Example

>>> from moltres import connect
>>> db = connect("sqlite:///:memory:")
>>> df = db.read.csv("data.csv")
>>> polars_df = df.polars()
>>> results = polars_df.collect()

printSchema() → None[source]

Print the schema of this DataFrame in a tree format.

Delegates to SchemaInspector.

Similar to PySpark’s DataFrame.printSchema() method, this prints a formatted representation of the DataFrame’s schema.

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select()
>>> df.printSchema()
>>> # Output: root
>>> #          |-- id: INTEGER (nullable = true)
>>> #          |-- name: TEXT (nullable = true)
>>> db.close()

print_schema() → None[source]

Print the schema of this DataFrame in a tree format (snake_case alias for printSchema).

This is an alias for printSchema(). See printSchema() for full documentation.

recursive_cte(name: str, recursive: DataFrame, union_all: bool = False) → DataFrame[source]

Create a Recursive Common Table Expression (WITH RECURSIVE) from this DataFrame.

Parameters:

name – Name for the recursive CTE
recursive – DataFrame representing the recursive part (references the CTE)
union_all – If True, use UNION ALL; if False, use UNION (distinct)

Returns:

New DataFrame representing the recursive CTE

Example

>>> # Fibonacci sequence example
>>> from moltres.expressions import functions as F
>>> initial = db.table("seed").select(F.lit(1).alias("n"), F.lit(1).alias("fib"))
>>> recursive = initial.select(...)  # Recursive part
>>> fib_cte = initial.recursive_cte("fib", recursive)
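
The example above elides the recursive part, so it may help to see the SQL construct that a recursive CTE maps to. The sketch below computes Fibonacci numbers with WITH RECURSIVE through the standard sqlite3 module. The column names (n, a, b) and the stopping condition are choices made for this illustration, not something the Moltres example specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

sql = """
WITH RECURSIVE fib(n, a, b) AS (
    SELECT 1, 1, 1              -- anchor (initial) part
    UNION ALL
    SELECT n + 1, b, a + b      -- recursive part, reads from fib itself
    FROM fib
    WHERE n < 10                -- termination condition
)
SELECT n, a FROM fib ORDER BY n
"""
rows = conn.execute(sql).fetchall()
conn.close()

fib_values = [value for _, value in rows]
print(fib_values)
```

The anchor plays the role of the DataFrame that recursive_cte() is called on, and the second SELECT plays the role of the recursive argument. This sketch uses UNION ALL, which corresponds to union_all=True; the documented default (False) uses UNION, which also discards duplicate rows.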

sample(fraction: float, seed: int | None = None) → DataFrame[source]

Sample a fraction of rows from the DataFrame.

Parameters:

fraction – Fraction of rows to sample (0.0 to 1.0)
seed – Optional random seed for reproducible sampling

Returns:

New DataFrame with sampled rows

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": i, "name": f"User{i}"} for i in range(1, 11)], _database=db).insert_into("users")
>>> # Sample 30% of rows with seed for reproducibility
>>> df = db.table("users").select().sample(0.3, seed=42)
>>> results = df.collect()
>>> len(results) <= 10  # Should be approximately 30% of 10 rows
True
>>> db.close()

property schema: List[Any]

Return the schema of this DataFrame as a list of ColumnInfo objects.

Delegates to SchemaInspector.

Similar to PySpark’s DataFrame.schema property, this extracts column names and types from the logical plan without requiring query execution.

Returns: List of ColumnInfo objects with column names and types
Raises: RuntimeError – If schema cannot be determined (e.g., RawSQL without execution)

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select()
>>> schema = df.schema
>>> len(schema)
2
>>> schema[0].name
'id'
>>> schema[0].type_name
'INTEGER'
>>> schema[1].name
'name'
>>> db.close()

select(*columns: Column | str) → DataFrame[source]

Select specific columns from the DataFrame.

Parameters: *columns – Column names or Column expressions to select. Use “*” to select all columns (same as empty select). Can combine “*” with other columns: select(“*”, col(“new_col”))
Returns: New DataFrame with selected columns

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT"), column("email", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice", "email": "alice@example.com"}], _database=db).insert_into("users")
>>> # Select specific columns
>>> df = db.table("users").select("id", "name", "email")
>>> results = df.collect()
>>> results[0]["name"]
'Alice'
>>> # Select all columns (empty select)
>>> df2 = db.table("users").select()
>>> results2 = df2.collect()
>>> len(results2[0].keys())
3
>>> # Select with expressions
>>> db.create_table("orders", [column("id", "INTEGER"), column("amount", "REAL")]).collect()
>>> _ = Records(_data=[{"id": 1, "amount": 100.0}], _database=db).insert_into("orders")
>>> df3 = db.table("orders").select(col("id"), (col("amount") * 1.1).alias("amount_with_tax"))
>>> results3 = df3.collect()
>>> round(results3[0]["amount_with_tax"], 2)
110.0
>>> # Select all columns plus new ones
>>> df4 = db.table("orders").select("*", (col("amount") * 1.1).alias("with_tax"))
>>> results4 = df4.collect()
>>> results4[0]["id"]
1
>>> round(results4[0]["with_tax"], 2)
110.0
>>> db.close()

selectExpr(*exprs: str) → DataFrame[source]

Select columns using SQL expressions.

This method allows you to write SQL expressions directly instead of building Column objects manually, similar to PySpark’s selectExpr().

Note

A snake_case alias select_expr() is also available.

Parameters: *exprs – SQL expression strings (e.g., “amount * 1.1 as with_tax”)
Returns: New DataFrame with selected expressions

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("orders", [column("id", "INTEGER"), column("amount", "REAL"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "amount": 100.0, "name": "Alice"}], _database=db).insert_into("orders")
>>> # Basic column selection
>>> df = db.table("orders").selectExpr("id", "name")
>>> results = df.collect()
>>> results[0]["id"]
1
>>> # With expressions and aliases
>>> df2 = db.table("orders").selectExpr("id", "amount * 1.1 as with_tax", "UPPER(name) as name_upper")
>>> results2 = df2.collect()
>>> round(results2[0]["with_tax"], 2)
110.0
>>> results2[0]["name_upper"]
'ALICE'
>>> # Chaining with other operations
>>> df3 = db.table("orders").selectExpr("id", "amount").where(col("amount") > 50)
>>> results3 = df3.collect()
>>> len(results3)
1
>>> db.close()

select_expr(*exprs: str) → DataFrame[source]

Select columns using SQL expressions (snake_case alias for selectExpr).

This is an alias for selectExpr(). See selectExpr() for full documentation.

Parameters: *exprs – SQL expression strings (e.g., “amount * 1.1 as with_tax”)
Returns: New DataFrame with selected expressions

select_for_share(nowait: bool = False, skip_locked: bool = False) → DataFrame[source]

Select rows with FOR SHARE lock.

This method adds a FOR SHARE clause to the SELECT statement, which locks the selected rows for shared (read) access. Other transactions can still read the rows but cannot modify them until the transaction commits.

This method works with any plan structure (joins, aggregations, sorts, etc.) by finding or creating the appropriate Project node in the plan tree.

Parameters:

nowait – If True, don’t wait for lock - raise error if rows are locked. Requires database support (PostgreSQL, MySQL 8.0+).
skip_locked – If True, skip locked rows instead of waiting or erroring. Requires database support (PostgreSQL, MySQL 8.0+).

Returns:

New DataFrame with FOR SHARE locking enabled

Raises:

ValueError – If nowait or skip_locked is requested but not supported by dialect, or if the plan structure cannot support row-level locking.

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("products", [column("id", "INTEGER"), column("stock", "INTEGER")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "stock": 10}], _database=db).insert_into("products")
>>> with db.transaction() as txn:
...     df = db.table("products").select().where(col("id") == 1)
...     locked_df = df.select_for_share()
...     results = locked_df.collect()
...     # Rows are now locked for shared access

Example with joins:

>>> orders = db.table("orders").select()
>>> customers = db.table("customers").select()
>>> joined = orders.join(customers, on=[col("orders.customer_id") == col("customers.id")])
>>> locked_joined = joined.select_for_share()
>>> results = locked_joined.collect()

select_for_update(nowait: bool = False, skip_locked: bool = False) → DataFrame[source]

Select rows with FOR UPDATE lock.

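This method adds a FOR UPDATE clause to the SELECT statement, which locks the selected rows for exclusive access. Other transactions cannot modify the rows or take their own row locks on them (for example with FOR UPDATE or FOR SHARE) until the transaction commits. Whether plain, non-locking reads are blocked depends on the database and its isolation level.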

This method works with any plan structure (joins, aggregations, sorts, etc.) by finding or creating the appropriate Project node in the plan tree.

Parameters:

nowait – If True, don’t wait for lock - raise error if rows are locked. Requires database support (PostgreSQL, MySQL 8.0+).
skip_locked – If True, skip locked rows instead of waiting or erroring. Requires database support (PostgreSQL, MySQL 8.0+).

Returns:

New DataFrame with FOR UPDATE locking enabled

Raises:

ValueError – If nowait or skip_locked is requested but not supported by dialect, or if the plan structure cannot support row-level locking.

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("orders", [column("id", "INTEGER"), column("status", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "status": "pending"}], _database=db).insert_into("orders")
>>> with db.transaction() as txn:
...     df = db.table("orders").select().where(col("status") == "pending")
...     locked_df = df.select_for_update(nowait=True)
...     results = locked_df.collect()
...     # Rows are now locked for update

Example with joins:

>>> orders = db.table("orders").select()
>>> customers = db.table("customers").select()
>>> joined = orders.join(customers, on=[col("orders.customer_id") == col("customers.id")])
>>> locked_joined = joined.select_for_update()
>>> results = locked_joined.collect()

semi_join(other: DataFrame, *, on: str | Sequence[str] | Sequence[Tuple[str, str]] | None = None) → DataFrame[source]

Perform a semi-join: return rows from this DataFrame where a matching row exists in other.

This is equivalent to filtering with EXISTS subquery.

Parameters:

other – Another DataFrame to semi-join with (used as EXISTS subquery)
on – Join condition - can be: - A single column name (assumes same name in both DataFrames) - A sequence of column names (assumes same names in both) - A sequence of (left_column, right_column) tuples

Returns:

New DataFrame containing rows from this DataFrame that have matches in other

Raises:

RuntimeError – If DataFrames are not bound to the same Database

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("customers", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> db.create_table("orders", [column("id", "INTEGER"), column("customer_id", "INTEGER")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}], _database=db).insert_into("customers")
>>> _ = Records(_data=[{"id": 1, "customer_id": 1}], _database=db).insert_into("orders")
>>> # Find customers who have placed orders
>>> customers = db.table("customers").select()
>>> orders = db.table("orders").select()
>>> customers_with_orders = customers.semi_join(orders, on=[("id", "customer_id")])
>>> results = customers_with_orders.collect()
>>> len(results)
1
>>> results[0]["name"]
'Alice'
>>> db.close()
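
Since a semi-join is described as equivalent to an EXISTS filter, the sketch below shows that SQL form next to its NOT EXISTS counterpart (the anti-join) and a regular inner join, using the standard sqlite3 module. The table aliases and the extra second order are additions for this illustration. The point is that a semi-join never duplicates left-hand rows, even when several matches exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")]
)
# Alice has two orders; Bob has none.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 1)])

# Semi-join: customers with at least one matching order.
semi = conn.execute(
    "SELECT c.name FROM customers AS c "
    "WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)"
).fetchall()

# Anti-join: customers with no matching order.
anti = conn.execute(
    "SELECT c.name FROM customers AS c "
    "WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)"
).fetchall()

# Inner join for contrast: one output row per matching pair.
inner = conn.execute(
    "SELECT c.name FROM customers AS c "
    "JOIN orders AS o ON o.customer_id = c.id"
).fetchall()
conn.close()

print(semi, anti, inner)
```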

show(n: int = 20, truncate: bool = True, *, count_total: bool = False) → None[source]

Print the first n rows of the DataFrame.

Delegates to DataFrameExecutor.

Parameters:

n – Number of rows to show (default: 20)
truncate – If True, truncate long strings (default: True)
count_total – If True, run an extra count() query to print total row count

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}], _database=db).insert_into("users")
>>> df = db.table("users").select()
>>> df.show(2)
>>> # Output: id | name
>>> #         ---|-----
>>> #         1  | Alice
>>> #         2  | Bob
>>> db.close()

show_sql(max_length: int | None = None) → None[source]

Pretty-print the SQL query that will be executed.

Parameters: max_length – Optional maximum length to display. If SQL is longer, shows first part with “…” indicator. If None, shows full SQL.

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select().where(col("id") > 1)
>>> df.show_sql()  # Prints formatted SQL
>>> db.close()

sort(*columns: Column | str) → DataFrame

Sort rows by one or more columns.

Parameters: *columns – Column expressions or column names to sort by. Use .asc() or .desc() for sort order. Can be strings (column names) or Column objects.
Returns: New DataFrame with sorted rows

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Charlie"}, {"id": 2, "name": "Alice"}, {"id": 3, "name": "Bob"}], _database=db).insert_into("users")
>>> # Sort ascending with string column name
>>> df = db.table("users").select().order_by("name")
>>> results = df.collect()
>>> results[0]["name"]
'Alice'
>>> results[1]["name"]
'Bob'
>>> # Sort descending with Column object
>>> db.create_table("orders", [column("id", "INTEGER"), column("amount", "REAL")]).collect()
>>> _ = Records(_data=[{"id": 1, "amount": 50.0}, {"id": 2, "amount": 100.0}, {"id": 3, "amount": 25.0}], _database=db).insert_into("orders")
>>> df2 = db.table("orders").select().order_by(col("amount").desc())
>>> results2 = df2.collect()
>>> results2[0]["amount"]
100.0
>>> # Multiple sort columns
>>> db.create_table("sales", [column("region", "TEXT"), column("amount", "REAL")]).collect()
>>> _ = Records(_data=[{"region": "North", "amount": 100.0}, {"region": "North", "amount": 50.0}, {"region": "South", "amount": 75.0}], _database=db).insert_into("sales")
>>> df3 = db.table("sales").select().order_by("region", col("amount").desc())
>>> results3 = df3.collect()
>>> results3[0]["region"]
'North'
>>> results3[0]["amount"]
100.0
>>> db.close()

property sql: str

Property accessor for SQL string representation.

Returns: SQL string representation of the query (formatted for readability)

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select().where(col("id") > 1)
>>> print(df.sql)  # Pretty-printed SQL
>>> db.close()

sql_preview(max_length: int = 200) → str[source]

Get a preview of the SQL query (first N characters).

Parameters: max_length – Maximum length of preview (default: 200)
Returns: SQL preview string with “…” if truncated

Example

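>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select().where(col("id") > 1)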
>>> preview = df.sql_preview()
>>> len(preview) <= 203  # 200 + "..."
True
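
The length check in the example follows from simple truncation: at most max_length characters of SQL plus a three-character ellipsis. A minimal sketch of that behavior is shown below. The preview helper is a hypothetical stand-in written for this illustration, not the Moltres implementation, and it assumes the marker is three ASCII dots.

```python
def preview(sql: str, max_length: int = 200) -> str:
    """Hypothetical helper: return sql unchanged if short, else a truncated prefix."""
    if len(sql) <= max_length:
        return sql
    # Keep the first max_length characters and mark the cut.
    return sql[:max_length] + "..."


short_sql = "SELECT * FROM users WHERE id > 1"
long_sql = "SELECT " + ", ".join(f"col_{i}" for i in range(100)) + " FROM wide_table"

short_preview = preview(short_sql)
long_preview = preview(long_sql)

print(short_preview)
print(len(long_preview))
```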

suggest_next() → List[str][source]

Suggest logical next operations based on current DataFrame state.

Returns: List of suggested next operations

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select()
>>> suggestions = df.suggest_next()
>>> len(suggestions) > 0
True

summary(*statistics: str) → DataFrame[source]

Compute summary statistics for numeric columns.

Delegates to StatisticsCalculator.

Parameters: *statistics – Statistics to compute (e.g., “count”, “mean”, “stddev”, “min”, “max”). If not provided, computes common statistics.
Returns: DataFrame with summary statistics

Note

This is a simplified implementation. A full implementation would automatically detect numeric columns and compute all statistics.

tail(n: int = 5) → List[Dict[str, object]][source]

Return the last n rows of the DataFrame.

Delegates to DataFrameExecutor.

Parameters: n – Number of rows to return (default: 5)
Returns: List of row dictionaries

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": i, "name": f"User{i}"} for i in range(1, 6)], _database=db).insert_into("users")
>>> df = db.table("users").select().order_by("id")
>>> rows = df.tail(2)
>>> len(rows)
2
>>> rows[0]["id"]
4
>>> rows[1]["id"]
5
>>> db.close()
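
SQL has no direct "last n rows" clause, so a tail operation is usually built from an ordering. One possible strategy, sketched below with the standard sqlite3 module, is to reverse the sort, take n rows with LIMIT, and then restore the original order on the client. How DataFrameExecutor actually implements tail() is not documented here; the tail function below is a hypothetical helper.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"User{i}") for i in range(1, 6)],
)


def tail(connection: sqlite3.Connection, n: int) -> List[Tuple[int, str]]:
    """Hypothetical helper: last n users by id, returned in ascending order."""
    rows = connection.execute(
        "SELECT id, name FROM users ORDER BY id DESC LIMIT ?", (n,)
    ).fetchall()
    # The query returns the rows newest-first; flip them back.
    return list(reversed(rows))


last_two = tail(conn, 2)
conn.close()

print(last_two)
```

As in the example above, a tail is only well defined when the DataFrame has an explicit ordering.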

take(num: int) → List[Dict[str, object]][source]

Take the first num rows as a list.

Delegates to DataFrameExecutor.

Parameters: num – Number of rows to take
Returns: List of dictionaries representing the rows

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": i, "name": f"User{i}"} for i in range(1, 6)], _database=db).insert_into("users")
>>> df = db.table("users").select()
>>> rows = df.take(3)
>>> len(rows)
3
>>> rows[0]["id"]
1
>>> db.close()

to_sql(pretty: bool = False) → str[source]

Convert the DataFrame’s logical plan to a SQL string.

Parameters: pretty – If True, format SQL with indentation and line breaks for readability. If False, return compact SQL string.
Returns: SQL string representation of the query

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select().where(col("id") > 1)
>>> sql = df.to_sql()
>>> "SELECT" in sql
True
>>> "users" in sql
True
>>> db.close()

to_sqlalchemy(dialect: str | None = None) → Any[source]

Convert DataFrame’s logical plan to a SQLAlchemy Select statement.

This method allows you to use Moltres DataFrames with existing SQLAlchemy connections, sessions, or other SQLAlchemy infrastructure.

Parameters: dialect – Optional SQL dialect name (e.g., “postgresql”, “mysql”, “sqlite”). If not provided, uses the dialect from the attached Database, or defaults to “ansi” if no Database is attached.
Returns: SQLAlchemy Select statement that can be executed with any SQLAlchemy connection

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> from sqlalchemy import create_engine
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select().where(col("id") > 1)
>>> # Convert to SQLAlchemy statement
>>> stmt = df.to_sqlalchemy()
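>>> # Execute with an existing SQLAlchemy connection. This engine is a separate
>>> # in-memory database, so the table must exist on that connection too.
>>> engine = create_engine("sqlite:///:memory:")
>>> with engine.connect() as conn:
...     _ = conn.exec_driver_sql("CREATE TABLE users (id INTEGER, name TEXT)")
...     result = conn.execute(stmt)
...     rows = result.fetchall()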
>>> db.close()

union(other: DataFrame) → DataFrame[source]

Union this DataFrame with another DataFrame (distinct rows only).

Parameters: other – Another DataFrame to union with
Returns: New DataFrame containing the union of rows
Raises: RuntimeError – If DataFrames are not bound to the same Database

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("table1", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> db.create_table("table2", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}], _database=db).insert_into("table1")
>>> _ = Records(_data=[{"id": 2, "name": "Bob"}, {"id": 3, "name": "Charlie"}], _database=db).insert_into("table2")
>>> df1 = db.table("table1").select()
>>> df2 = db.table("table2").select()
>>> # Union (distinct rows only)
>>> df_union = df1.union(df2)
>>> results = df_union.collect()
>>> len(results)
3
>>> names = {r["name"] for r in results}
>>> "Alice" in names and "Bob" in names and "Charlie" in names
True
>>> db.close()

unionAll(other: DataFrame) → DataFrame[source]

Union this DataFrame with another DataFrame (all rows, including duplicates).

Parameters: other – Another DataFrame to union with
Returns: New DataFrame containing the union of all rows
Raises: RuntimeError – If DataFrames are not bound to the same Database

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("table1", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> db.create_table("table2", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice"}], _database=db).insert_into("table1")
>>> _ = Records(_data=[{"id": 1, "name": "Alice"}], _database=db).insert_into("table2")
>>> df1 = db.table("table1").select()
>>> df2 = db.table("table2").select()
>>> # UnionAll (all rows, including duplicates)
>>> df_union = df1.unionAll(df2)
>>> results = df_union.collect()
>>> len(results)
2
>>> db.close()
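
The difference between union() and unionAll() mirrors SQL's UNION and UNION ALL. The sketch below runs both against the same pair of tables with the standard sqlite3 module so the row counts can be compared side by side. The SQL text is an assumed shape rather than Moltres's exact output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(2, "Bob"), (3, "Charlie")])

# UNION removes duplicate rows across both inputs.
distinct_rows = conn.execute(
    "SELECT id, name FROM table1 UNION SELECT id, name FROM table2 ORDER BY id"
).fetchall()

# UNION ALL keeps every row, so (2, 'Bob') appears twice.
all_rows = conn.execute(
    "SELECT id, name FROM table1 UNION ALL SELECT id, name FROM table2 ORDER BY id"
).fetchall()
conn.close()

print(distinct_rows)
print(all_rows)
```

UNION ALL skips the de-duplication step, so it is generally the cheaper choice when duplicates are impossible or acceptable.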

union_all(other: DataFrame) → DataFrame[source]

Union this DataFrame with another DataFrame (all rows, including duplicates) (snake_case alias for unionAll).

This is an alias for unionAll(). See unionAll() for full documentation.

Parameters: other – Another DataFrame to union with
Returns: New DataFrame containing the union of all rows
Raises: RuntimeError – If DataFrames are not bound to the same Database

validate() → List[Dict[str, Any]][source]

Validate the query plan and check for common issues.

Returns:

List of dictionaries containing validation results. Each dictionary has:

type: “warning” or “error”
message: Description of the issue
suggestion: Optional suggestion for fixing the issue

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select().where(col("id") > 1)
>>> issues = df.validate()
>>> len(issues) >= 0
True

visualize_plan() → str[source]

Create an ASCII tree visualization of the query plan.

Returns: String containing ASCII tree representation of the plan

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> df = db.table("users").select().where(col("id") > 1)
>>> print(df.visualize_plan())
>>> db.close()

where(predicate: Column | str) → DataFrame[source]

Filter rows based on a condition.

Parameters: predicate – Column expression or SQL string representing the filter condition. Can be a Column object or a SQL string like “age > 18”.
Returns: New DataFrame with filtered rows

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT"), column("age", "INTEGER")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice", "age": 25}, {"id": 2, "name": "Bob", "age": 17}], _database=db).insert_into("users")
>>> # Filter by condition using a Column expression
>>> df = db.table("users").select().where(col("age") >= 18)
>>> results = df.collect()
>>> len(results)
1
>>> results[0]["name"]
'Alice'
>>> # Filter using SQL string
>>> df2 = db.table("users").select().where("age > 18")
>>> results2 = df2.collect()
>>> len(results2)
1
>>> # Multiple conditions with Column expressions
>>> db.create_table("orders", [column("id", "INTEGER"), column("amount", "REAL"), column("status", "TEXT")]).collect()
>>> _ = Records(_data=[{"id": 1, "amount": 150.0, "status": "active"}, {"id": 2, "amount": 50.0, "status": "active"}], _database=db).insert_into("orders")
>>> df3 = db.table("orders").select().where((col("amount") > 100) & (col("status") == "active"))
>>> results3 = df3.collect()
>>> len(results3)
1
>>> results3[0]["amount"]
150.0
>>> db.close()

withColumn(colName: str, col_expr: Column | str) → DataFrame[source]

Add or replace a column in the DataFrame.

Parameters:

colName – Name of the column to add or replace
col_expr – Column expression or column name

Returns:

New DataFrame with the added/replaced column

Note

This operation adds a Project on top of the current plan. If a column with the same name exists, it will be replaced. Window functions are supported and will ensure all columns are available.

Example

>>> from moltres import connect, col
>>> from moltres.expressions import functions as F
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("orders", [column("id", "INTEGER"), column("amount", "REAL"), column("category", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "amount": 100.0, "category": "A"}, {"id": 2, "amount": 200.0, "category": "A"}], _database=db).insert_into("orders")
>>> # Add a computed column
>>> df = db.table("orders").select()
>>> df2 = df.withColumn("amount_with_tax", col("amount") * 1.1)
>>> results = df2.collect()
>>> round(results[0]["amount_with_tax"], 2)  # round to avoid floating-point noise from 100.0 * 1.1
110.0
>>> # Add window function column
>>> df3 = df.withColumn("row_num", F.row_number().over(partition_by=col("category"), order_by=col("amount")))
>>> results3 = df3.collect()
>>> results3[0]["row_num"]
1
>>> results3[1]["row_num"]
2
>>> db.close()
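
For orientation, the sketch below shows the kind of projection the Note describes, written directly against Python's standard sqlite3 module: a computed column plus a ROW_NUMBER() window column added on top of the base columns. This is an assumed SQL shape, not output captured from Moltres, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, amount REAL, category TEXT);
    INSERT INTO orders VALUES (1, 100.0, 'A'), (2, 200.0, 'A'), (3, 50.0, 'B');
""")

# A projection that keeps an existing column and adds two derived ones:
# a computed column and a window-function column numbered per category.
rows = conn.execute("""
    SELECT id,
           amount * 2 AS amount_doubled,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY amount) AS row_num
    FROM orders
    ORDER BY id
""").fetchall()
conn.close()
```

Row numbering restarts in each category, so the single row in category B gets row_num 1.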

withColumnRenamed(existing: str, new: str) → DataFrame[source]

Rename a column in the DataFrame.

Parameters:

existing – Current name of the column
new – New name for the column

Returns:

New DataFrame with the renamed column

Example

>>> from moltres import connect
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "name": "Alice"}], _database=db).insert_into("users")
>>> df = db.table("users").select().withColumnRenamed("name", "user_name")
>>> results = df.collect()
>>> "user_name" in results[0]
True
>>> results[0]["user_name"]
'Alice'
>>> db.close()

withColumns(cols_map: Dict[str, Column | str]) → DataFrame[source]

Add or replace multiple columns in the DataFrame.

Parameters:: cols_map – Dictionary mapping column names to Column expressions or column names
Returns:: New DataFrame with the added/replaced columns

Example

>>> from moltres import connect, col
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("orders", [column("id", "INTEGER"), column("amount", "REAL")]).collect()
>>> from moltres.io.records import Records
>>> _ = Records(_data=[{"id": 1, "amount": 100.0}], _database=db).insert_into("orders")
>>> df = db.table("orders").select()
>>> # Add multiple columns at once
>>> df2 = df.withColumns({
...     "amount_with_tax": col("amount") * 1.1,
...     "amount_doubled": col("amount") * 2
... })
>>> results = df2.collect()
>>> round(results[0]["amount_with_tax"], 2)  # round to avoid floating-point noise from 100.0 * 1.1
110.0
>>> results[0]["amount_doubled"]
200.0
>>> db.close()

with_column(colName: str, col_expr: Column | str) → DataFrame[source]

Add or replace a column in the DataFrame (snake_case alias for withColumn).

This is an alias for withColumn(). See withColumn() for full documentation.

Parameters:

colName – Name of the column to add or replace
col_expr – Column expression or column name

Returns:

New DataFrame with the added/replaced column

with_column_renamed(existing: str, new: str) → DataFrame[source]

Rename a column in the DataFrame (snake_case alias for withColumnRenamed).

This is an alias for withColumnRenamed(). See withColumnRenamed() for full documentation.

Parameters:

existing – Current name of the column
new – New name for the column

Returns:

New DataFrame with the renamed column

with_columns(cols_map: Dict[str, Column | str]) → DataFrame[source]

Add or replace multiple columns in the DataFrame (snake_case alias for withColumns).

This is an alias for withColumns(). See withColumns() for full documentation.

Parameters:: cols_map – Dictionary mapping column names to Column expressions or column names
Returns:: New DataFrame with the added/replaced columns

with_model(model: Type[Any]) → DataFrame[source]

Attach a SQLModel or Pydantic model to this DataFrame.

Delegates to ModelIntegrator.

When a model is attached, collect() will return model instances instead of dictionaries. This provides type safety and validation.

Parameters:

model – SQLModel or Pydantic model class to attach

Returns:

New DataFrame with the model attached

Raises:

TypeError – If model is not a SQLModel or Pydantic class
ImportError – If required dependencies are not installed

Example

>>> from sqlmodel import SQLModel, Field
>>> class User(SQLModel, table=True):
...     id: int = Field(primary_key=True)
...     name: str
>>> df = db.table("users").select()
>>> df_with_model = df.with_model(User)
>>> results = df_with_model.collect()  # Returns list of User instances

>>> from pydantic import BaseModel
>>> class UserData(BaseModel):
...     id: int
...     name: str
>>> df_with_pydantic = df.with_model(UserData)
>>> results = df_with_pydantic.collect()  # Returns list of UserData instances
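
Conceptually, attaching a model means each result dictionary is turned into a model instance. The sketch below illustrates that idea with a standard-library dataclass standing in for a SQLModel or Pydantic class. The helper rows_to_models is hypothetical, and the real conversion (handled by ModelIntegrator) may differ, for example by running validation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class User:
    """Stand-in for a SQLModel/Pydantic model; plain dataclasses do not validate."""
    id: int
    name: str


def rows_to_models(rows: List[Dict[str, Any]]) -> List[User]:
    """Build one model instance per row dictionary."""
    return [User(**row) for row in rows]


# Dictionaries shaped like the default collect() output.
rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
users = rows_to_models(rows)
```

With typed instances, attribute access such as users[0].name replaces dictionary lookups like rows[0]["name"].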

property write: Any: Return a DataFrameWriter for writing this DataFrame to a table.

class moltres.dataframe.core.dataframe.NullHandling(df: DataFrame)[source]

Bases: object

Helper class for null handling operations on DataFrames.

Accessed via the na property on DataFrame instances.

drop(how: str = 'any', subset: Sequence[str] | None = None) → DataFrame[source]

Drop rows with null values.

This is a convenience wrapper around DataFrame.dropna().

Parameters:

how – “any” (drop if any null) or “all” (drop if all null) (default: “any”)
subset – Optional list of column names to check. If None, checks all columns.

Returns:

New DataFrame with null rows removed

Example

>>> df.na.drop()  # Drop rows with any null values
>>> df.na.drop(how="all")  # Drop rows where all values are null
>>> df.na.drop(subset=["col1", "col2"])  # Only check specific columns

fill(value, subset=None) → DataFrame

Fill null values with a specified value.

This is a convenience wrapper around DataFrame.fillna().

Parameters:

value – Value to use for filling nulls. Can be a single value or a dict mapping column names to values.
subset – Optional list of column names to fill. If None, fills all columns.

Returns:

New DataFrame with null values filled

Example

>>> df.na.fill(0)  # Fill all nulls with 0
>>> df.na.fill({"col1": 0, "col2": "unknown"})  # Fill different columns with different values
>>> df.na.fill(0, subset=["col1", "col2"])  # Fill specific columns with 0
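
In SQL terms, dropping null rows is typically a filter on IS NOT NULL and filling nulls is typically COALESCE. The sketch below shows both with Python's standard sqlite3 module. The table t is hypothetical, and whether Moltres emits exactly these expressions is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1 INTEGER, col2 TEXT);
    INSERT INTO t VALUES (1, 'x'), (NULL, 'y'), (NULL, NULL);
""")

# how="any": drop a row if any checked column is NULL.
dropped_any = conn.execute(
    "SELECT col1, col2 FROM t "
    "WHERE col1 IS NOT NULL AND col2 IS NOT NULL ORDER BY rowid"
).fetchall()

# how="all": drop a row only if every checked column is NULL.
dropped_all = conn.execute(
    "SELECT col1, col2 FROM t "
    "WHERE col1 IS NOT NULL OR col2 IS NOT NULL ORDER BY rowid"
).fetchall()

# Per-column fill values, like fill({"col1": 0, "col2": "unknown"}).
filled = conn.execute(
    "SELECT COALESCE(col1, 0), COALESCE(col2, 'unknown') FROM t ORDER BY rowid"
).fetchall()
conn.close()
```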

AsyncDataFrame

Async lazy DataFrame representation.

class moltres.dataframe.core.async_dataframe.AsyncDataFrame(plan: LogicalPlan, database: Any | None = None, model: Type[Any] | None = None)[source]

Bases: DataFrameHelpersMixin

Async lazy DataFrame representation.

anti_join(other: AsyncDataFrame, *, on: str | Sequence[str] | Sequence[Tuple[str, str]] | None = None) → AsyncDataFrame[source]

Perform an anti-join: return rows from this DataFrame where no matching row exists in other.

This is equivalent to filtering with NOT EXISTS subquery.

Parameters:

other – Another DataFrame to anti-join with (used as NOT EXISTS subquery)
on – Join condition - can be: - A single column name (assumes same name in both DataFrames) - A sequence of column names (assumes same names in both) - A sequence of (left_column, right_column) tuples

Returns:

New DataFrame containing rows from this DataFrame that have no matches in other

Raises:

RuntimeError – If DataFrames are not bound to the same AsyncDatabase
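
The NOT EXISTS form mentioned above can be sketched with Python's standard sqlite3 module. The tables are hypothetical, and the query corresponds to passing on=[("id", "customer_id")]. The SQL Moltres actually generates may be shaped differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1);
""")

# Anti-join: keep customers for which no matching order exists.
no_orders = conn.execute("""
    SELECT c.id, c.name
    FROM customers AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM orders AS o WHERE o.customer_id = c.id
    )
""").fetchall()
conn.close()
```

Only Bob is returned, because Alice has a matching row in orders.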

async collect(stream: Literal[False] = False) → List[Dict[str, object]][source]

async collect(stream: Literal[True]) → AsyncIterator[List[Dict[str, object]]]

Collect DataFrame results asynchronously.

Parameters:

stream – If True, return an async iterator of row chunks. If False (default), materialize all rows into a list.

Returns:

If stream=False and no model attached: List of dictionaries representing rows.
If stream=False and model attached: List of SQLModel or Pydantic instances.
If stream=True and no model attached: AsyncIterator of row chunks (each chunk is a list of dicts).
If stream=True and model attached: AsyncIterator of row chunks (each chunk is a list of model instances).

Raises:

RuntimeError – If DataFrame is not bound to an AsyncDatabase
ImportError – If model is attached but Pydantic or SQLModel is not installed

Example

>>> import asyncio
>>> from moltres import async_connect
>>> from moltres.table.schema import column
>>> async def example():
...     db = await async_connect("sqlite+aiosqlite:///:memory:")
...     await db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
...     from moltres.io.records import AsyncRecords
...     records = AsyncRecords(_data=[{"id": 1, "name": "Alice"}], _database=db)
...     await records.insert_into("users")
...     table_handle = await db.table("users")
...     df = table_handle.select()
...     # Collect results (non-streaming)
...     results = await df.collect()
...     assert len(results) == 1
...     assert results[0]["name"] == "Alice"
...     # Collect results (streaming)
...     async for chunk in await df.collect(stream=True):
...         pass  # Process chunks
...     await db.close()
...     # asyncio.run(example())
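
To show why streaming matters, the sketch below mimics the stream=True contract with a plain async generator: the consumer sees one chunk at a time instead of the full result list. stream_chunks is a hypothetical stand-in written for this illustration. It does not use Moltres, and the real chunk size and fetching behaviour are not specified in this reference.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List

Row = Dict[str, Any]


async def stream_chunks(rows: List[Row], chunk_size: int) -> AsyncIterator[List[Row]]:
    """Yield rows in fixed-size chunks, standing in for collect(stream=True)."""
    for start in range(0, len(rows), chunk_size):
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield rows[start:start + chunk_size]


async def count_rows() -> int:
    rows = [{"id": i} for i in range(5)]
    total = 0
    async for chunk in stream_chunks(rows, chunk_size=2):
        total += len(chunk)  # process each chunk, then let it go
    return total


total_rows = asyncio.run(count_rows())
```

The same async for pattern applies to the iterator returned by collect(stream=True), where each chunk is a list of dicts or model instances.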

property columns: List[str]

Return a list of column names in this DataFrame.

Similar to PySpark’s DataFrame.columns property, this extracts column names from the logical plan without requiring query execution.

Returns:: List of column name strings
Raises:: RuntimeError – If column names cannot be determined (e.g., RawSQL without execution)

Example

>>> df = (await db.table("users")).select()
>>> print(df.columns)  # ['id', 'name', 'email', ...]
>>> df2 = df.select("id", "name")
>>> print(df2.columns)  # ['id', 'name']

async count() → int[source]: Return the number of rows in the DataFrame.

crossJoin(other: AsyncDataFrame) → AsyncDataFrame[source]

Perform a cross join (Cartesian product) with another DataFrame.

Parameters:: other – Another DataFrame to cross join with
Returns:: New DataFrame containing the Cartesian product of rows
Raises:: RuntimeError – If DataFrames are not bound to the same AsyncDatabase

cross_join(other: AsyncDataFrame) → AsyncDataFrame[source]

Perform a cross join (Cartesian product) with another DataFrame (snake_case alias for crossJoin).

This is an alias for crossJoin(). See crossJoin() for full documentation.

Parameters:: other – Another DataFrame to cross join with
Returns:: New DataFrame containing the Cartesian product of rows
Raises:: RuntimeError – If DataFrames are not bound to the same AsyncDatabase
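
A Cartesian product pairs every row on the left with every row on the right, so the result has len(left) × len(right) rows. The sketch below shows this with Python's standard sqlite3 module. The tables are hypothetical and the SQL is an assumed equivalent, not captured Moltres output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sizes (size TEXT);
    CREATE TABLE colors (color TEXT);
    INSERT INTO sizes VALUES ('L'), ('S');
    INSERT INTO colors VALUES ('blue'), ('red');
""")

# Every size is combined with every color: 2 x 2 = 4 rows.
pairs = conn.execute(
    "SELECT size, color FROM sizes CROSS JOIN colors ORDER BY size, color"
).fetchall()
conn.close()
```

Because the row count multiplies, cross joins on large tables can produce very large results.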

cte(name: str) → AsyncDataFrame[source]

Create a Common Table Expression (CTE) from this DataFrame.

Parameters:: name – Name for the CTE
Returns:: New AsyncDataFrame representing the CTE
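
In SQL, a CTE is a named subquery introduced with WITH that later parts of the statement can reference. The sketch below shows the general shape using Python's standard sqlite3 module. The table and the CTE name big_orders are hypothetical, and the statement is an assumed equivalent of what cte() produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 150.0), (2, 50.0), (3, 300.0);
""")

# The filtered query is named once, then queried like a table.
count, total = conn.execute("""
    WITH big_orders AS (
        SELECT id, amount FROM orders WHERE amount > 100
    )
    SELECT COUNT(*), SUM(amount) FROM big_orders
""").fetchone()
conn.close()
```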

database: Any | None = None

async describe(*cols: str) → AsyncDataFrame[source]: Compute basic statistics for numeric columns.

distinct() → AsyncDataFrame[source]: Return a new DataFrame with distinct rows.

drop(*cols: str | Column) → AsyncDataFrame[source]

Drop one or more columns from the DataFrame.

Parameters:: *cols – Column names or Column objects to drop
Returns:: New AsyncDataFrame with the specified columns removed

Example

>>> # Drop by string column name
>>> await df.drop("col1", "col2").collect()
>>> # Drop by Column object
>>> await df.drop(col("col1"), col("col2")).collect()
>>> # Mixed usage
>>> await df.drop("col1", col("col2")).collect()

dropDuplicates(subset: Sequence[str] | None = None) → AsyncDataFrame[source]: Return a new DataFrame with duplicate rows removed.

drop_duplicates(subset: Sequence[str] | None = None) → AsyncDataFrame[source]

Return a new DataFrame with duplicate rows removed (snake_case alias for dropDuplicates).

This is an alias for dropDuplicates(). See dropDuplicates() for full documentation.

Parameters:: subset – Optional list of column names to consider when identifying duplicates. If None, all columns are considered.
Returns:: New DataFrame with duplicates removed

dropna(how: str = 'any', subset: Sequence[str] | None = None) → AsyncDataFrame[source]: Remove rows with null values.

property dtypes: List[Tuple[str, str]]

Return a list of tuples containing column names and their data types.

Similar to PySpark’s DataFrame.dtypes property, this returns a list of (column_name, type_name) tuples.

Returns:: List of tuples (column_name, type_name)
Raises:: RuntimeError – If schema cannot be determined (e.g., RawSQL without execution)

Example

>>> df = (await db.table("users")).select()
>>> print(df.dtypes)
# [('id', 'INTEGER'), ('name', 'VARCHAR(255)'), ('email', 'VARCHAR(255)')]

except_(other: AsyncDataFrame) → AsyncDataFrame[source]: Return rows in this DataFrame that are not in another DataFrame (distinct rows only).

explode(column: Column | str, alias: str = 'value') → AsyncDataFrame[source]

Explode an array/JSON column into multiple rows (one row per element).

Parameters:

column – Column expression or column name to explode (must be array or JSON)
alias – Alias for the exploded value column (default: “value”)

Returns:

New AsyncDataFrame with exploded rows

Example

>>> from moltres import async_connect, col
>>> from moltres.table.schema import column
>>> db = await async_connect("sqlite+aiosqlite:///:memory:")
>>> # Note: explode() requires array/JSON support which varies by database
>>> # This example shows the API usage pattern
>>> await db.create_table("users", [column("id", "INTEGER"), column("tags", "TEXT")]).collect()
>>> from moltres.io.records import AsyncRecords
>>> _ = await AsyncRecords(_data=[{"id": 1, "tags": '["python", "sql"]'}], _database=db).insert_into("users")
>>> # Explode a JSON array column (database-specific support required)
>>> table_handle = await db.table("users")
>>> df = table_handle.select()
>>> exploded = df.explode(col("tags"), alias="tag")
>>> # Each row in exploded will have one tag per row
>>> # Note: Actual execution depends on database JSON/array support
>>> await db.close()
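
On SQLite, one way to get one row per array element is the json_each table-valued function. The sketch below uses Python's standard sqlite3 module and assumes the SQLite build includes the JSON functions (built in by default since SQLite 3.38 and commonly enabled earlier). Whether explode() compiles to json_each on SQLite is an assumption, and other databases use different constructs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, tags TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", (1, '["python", "sql"]'))

# json_each yields one row per array element; key is the element index.
rows = conn.execute("""
    SELECT u.id, j.value AS tag
    FROM users AS u, json_each(u.tags) AS j
    ORDER BY u.id, j.key
""").fetchall()
conn.close()
```

The single input row with two tags becomes two output rows, each carrying the original id.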

fillna(value: bool | int | float | str | None | Dict[str, bool | int | float | str | None], subset: Sequence[str] | None = None) → AsyncDataFrame[source]: Replace null values with a specified value.

filter(predicate: Column | str) → AsyncDataFrame

Filter rows based on a predicate.

Parameters:: predicate – Column expression or SQL string representing the filter condition. Can be a Column object or a SQL string like “age > 18”.
Returns:: New AsyncDataFrame with filtered rows

Example

>>> import asyncio
>>> from moltres import async_connect, col
>>> from moltres.table.schema import column
>>> async def example():
...     db = await async_connect("sqlite+aiosqlite:///:memory:")
...     await db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT"), column("age", "INTEGER")]).collect()
...     from moltres.io.records import AsyncRecords
...     records = AsyncRecords(_data=[{"id": 1, "name": "Alice", "age": 25}, {"id": 2, "name": "Bob", "age": 17}], _database=db)
...     await records.insert_into("users")
...     # Filter by condition using a Column expression
...     table_handle = await db.table("users")
...     df = table_handle.select().where(col("age") >= 18)
...     results = await df.collect()
...     assert len(results) == 1
...     assert results[0]["name"] == "Alice"
...     await db.close()
...     # asyncio.run(example())

async first() → Dict[str, object] | None[source]: Return the first row as a dictionary, or None if empty.

classmethod from_sqlalchemy(select_stmt: Any, database: Any | None = None) → AsyncDataFrame[source]

Create an AsyncDataFrame from a SQLAlchemy Select statement.

This allows you to integrate existing SQLAlchemy queries with Moltres AsyncDataFrame operations. The SQLAlchemy statement is wrapped as a RawSQL logical plan, which can then be further chained with Moltres operations.

Parameters:

select_stmt – SQLAlchemy Select statement to convert
database – Optional AsyncDatabase instance to attach to the DataFrame. If provided, allows the DataFrame to be executed with collect().

Returns:

AsyncDataFrame that can be further chained with Moltres operations

Example

>>> from sqlalchemy import create_engine, select, table, column
>>> from moltres import AsyncDataFrame
>>> engine = create_engine("sqlite:///:memory:")
>>> # Create a SQLAlchemy select statement
>>> users = table("users", column("id"), column("name"))
>>> sa_stmt = select(users.c.id, users.c.name).where(users.c.id > 1)
>>> # Convert to Moltres AsyncDataFrame
>>> df = AsyncDataFrame.from_sqlalchemy(sa_stmt)
>>> # Can now chain Moltres operations
>>> df2 = df.select("id")

classmethod from_table(table_handle: Any, columns: Sequence[str] | None = None) → AsyncDataFrame[source]: Create an AsyncDataFrame from a table handle.

groupBy(*columns: Column | str) → Any

Group by the specified columns.

Example

>>> import asyncio
>>> from moltres import async_connect, col
>>> from moltres.expressions import functions as F
>>> from moltres.table.schema import column
>>> async def example():
...     db = await async_connect("sqlite+aiosqlite:///:memory:")
...     await db.create_table("sales", [column("category", "TEXT"), column("amount", "REAL")]).collect()
...     from moltres.io.records import AsyncRecords
...     records = AsyncRecords(_data=[{"category": "A", "amount": 100.0}, {"category": "A", "amount": 200.0}, {"category": "B", "amount": 150.0}], _database=db)
...     await records.insert_into("sales")
...     table_handle = await db.table("sales")
...     df = table_handle.select()
...     grouped = df.group_by("category")
...     result = grouped.agg(F.sum(col("amount")).alias("total"))
...     results = await result.collect()
...     assert len(results) == 2
...     await db.close()
...     # asyncio.run(example())

group_by(*columns: Column | str) → Any[source]

Group by the specified columns.

Example

>>> import asyncio
>>> from moltres import async_connect, col
>>> from moltres.expressions import functions as F
>>> from moltres.table.schema import column
>>> async def example():
...     db = await async_connect("sqlite+aiosqlite:///:memory:")
...     await db.create_table("sales", [column("category", "TEXT"), column("amount", "REAL")]).collect()
...     from moltres.io.records import AsyncRecords
...     records = AsyncRecords(_data=[{"category": "A", "amount": 100.0}, {"category": "A", "amount": 200.0}, {"category": "B", "amount": 150.0}], _database=db)
...     await records.insert_into("sales")
...     table_handle = await db.table("sales")
...     df = table_handle.select()
...     grouped = df.group_by("category")
...     result = grouped.agg(F.sum(col("amount")).alias("total"))
...     results = await result.collect()
...     assert len(results) == 2
...     await db.close()
...     # asyncio.run(example())

async head(n: int = 5) → List[Dict[str, object]][source]: Return the first n rows as a list.

intersect(other: AsyncDataFrame) → AsyncDataFrame[source]: Intersect this DataFrame with another DataFrame (distinct rows only).
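
The "distinct rows only" behaviour of intersect() and except_() matches SQL's INTERSECT and EXCEPT set operators. The sketch below demonstrates both with Python's standard sqlite3 module. The tables are hypothetical, and the mapping from these methods to these operators is assumed rather than stated in this reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

# Rows present in both inputs; the duplicate 2 in table a appears once.
common = conn.execute(
    "SELECT id FROM a INTERSECT SELECT id FROM b ORDER BY id"
).fetchall()

# Rows present in a but not in b.
only_a = conn.execute(
    "SELECT id FROM a EXCEPT SELECT id FROM b ORDER BY id"
).fetchall()
conn.close()
```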

join(other, on, how) → AsyncDataFrame

Join with another DataFrame.

Parameters:

other – Another DataFrame to join with
on – Join condition - can be: - A single column name (assumes same name in both DataFrames): on="order_id" - A sequence of column names (assumes same names in both): on=["col1", "col2"] - A sequence of (left_column, right_column) tuples: on=[("id", "customer_id")] - A Column expression (PySpark-style): on=[col("left_col") == col("right_col")] - A single Column expression: on=col("left_col") == col("right_col")
how – Join type (“inner”, “left”, “right”, “full”, “cross”)

Returns:

New AsyncDataFrame containing the join result

Example

>>> import asyncio
>>> from moltres import async_connect, col
>>> from moltres.table.schema import column
>>> async def example():
...     db = await async_connect("sqlite+aiosqlite:///:memory:")
...     await db.create_table("customers", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
...     await db.create_table("orders", [column("id", "INTEGER"), column("customer_id", "INTEGER"), column("amount", "REAL")]).collect()
...     from moltres.io.records import AsyncRecords
...     records1 = AsyncRecords(_data=[{"id": 1, "name": "Alice"}], _database=db)
...     await records1.insert_into("customers")
...     records2 = AsyncRecords(_data=[{"id": 1, "customer_id": 1, "amount": 100.0}], _database=db)
...     await records2.insert_into("orders")
...     # PySpark-style join
...     customers_table = await db.table("customers")
...     orders_table = await db.table("orders")
...     customers_df = customers_table.select()
...     orders_df = orders_table.select()
...     df = customers_df.join(orders_df, on=[col("customers.id") == col("orders.customer_id")], how="inner")
...     results = await df.collect()
...     assert len(results) == 1
...     assert results[0]["name"] == "Alice"
...     await db.close()
...     # asyncio.run(example())

limit(count: int) → AsyncDataFrame[source]: Limit the number of rows returned.

model: Type[Any] | None = None

property na: AsyncNullHandling

Access null handling methods via the na property.

Returns:: AsyncNullHandling helper object with drop() and fill() methods

Example

>>> await df.na.drop().collect()  # Drop rows with nulls
>>> await df.na.fill(0).collect()  # Fill nulls with 0

async nunique(column: str | None = None) → int | Dict[str, int][source]

Count distinct values in column(s).

Parameters:: column – Column name to count. If None, counts distinct values for all columns.
Returns:: If column is specified: integer count of distinct values. If column is None: dictionary mapping column names to distinct counts.

Example

>>> from moltres import async_connect, col
>>> from moltres.expressions import functions as F
>>> from moltres.table.schema import column
>>> db = await async_connect("sqlite+aiosqlite:///:memory:")
>>> await db.create_table("users", [column("id", "INTEGER"), column("country", "TEXT"), column("age", "INTEGER")]).collect()
>>> from moltres.io.records import AsyncRecords
>>> _ = await AsyncRecords(_data=[{"id": 1, "country": "USA", "age": 25}, {"id": 2, "country": "USA", "age": 30}, {"id": 3, "country": "UK", "age": 25}], _database=db).insert_into("users")
>>> table_handle = await db.table("users")
>>> df = table_handle.select()
>>> # Count distinct values in a column
>>> await df.nunique("country")
2
>>> # Count distinct for all columns
>>> counts = await df.nunique()
>>> counts["country"]
2
>>> await db.close()
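
The two return shapes correspond naturally to COUNT(DISTINCT ...) queries. The sketch below reproduces the numbers from the example above with Python's standard sqlite3 module. It illustrates the semantics under the assumption that nunique() aggregates this way, and is not the library's actual SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, country TEXT, age INTEGER);
    INSERT INTO users VALUES (1, 'USA', 25), (2, 'USA', 30), (3, 'UK', 25);
""")

# Single column: one integer.
country_count = conn.execute(
    "SELECT COUNT(DISTINCT country) FROM users"
).fetchone()[0]

# All columns: one COUNT(DISTINCT ...) per column, zipped into a dict.
# Column names are fixed literals here, not user input.
columns = ["id", "country", "age"]
select_list = ", ".join(f"COUNT(DISTINCT {name})" for name in columns)
row = conn.execute(f"SELECT {select_list} FROM users").fetchone()
counts = dict(zip(columns, row))
conn.close()
```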

orderBy(*columns: Column | str) → AsyncDataFrame

Sort rows by one or more columns.

Parameters:: *columns – Column expressions or column names to sort by. Use .asc() or .desc() for sort order. Can be strings (column names) or Column objects.
Returns:: New AsyncDataFrame with sorted rows

Example

>>> from moltres import col
>>> # Sort ascending with string column name
>>> df = (await db.table("users")).select().order_by("name")
>>> # SQL: SELECT * FROM users ORDER BY name

>>> # Sort ascending with Column object
>>> df = (await db.table("users")).select().order_by(col("name"))
>>> # SQL: SELECT * FROM users ORDER BY name

>>> # Sort descending
>>> df = (await db.table("orders")).select().order_by(col("amount").desc())
>>> # SQL: SELECT * FROM orders ORDER BY amount DESC

>>> # Multiple sort columns (mixed string and Column)
>>> df = (
...     (await db.table("sales"))
...     .select()
...     .order_by("region", col("amount").desc())
... )
>>> # SQL: SELECT * FROM sales ORDER BY region, amount DESC

order_by(*columns: Column | str) → AsyncDataFrame[source]

Sort rows by one or more columns.

Parameters:: *columns – Column expressions or column names to sort by. Use .asc() or .desc() for sort order. Can be strings (column names) or Column objects.
Returns:: New AsyncDataFrame with sorted rows

Example

>>> from moltres import col
>>> # Sort ascending with string column name
>>> df = (await db.table("users")).select().order_by("name")
>>> # SQL: SELECT * FROM users ORDER BY name

>>> # Sort ascending with Column object
>>> df = (await db.table("users")).select().order_by(col("name"))
>>> # SQL: SELECT * FROM users ORDER BY name

>>> # Sort descending
>>> df = (await db.table("orders")).select().order_by(col("amount").desc())
>>> # SQL: SELECT * FROM orders ORDER BY amount DESC

>>> # Multiple sort columns (mixed string and Column)
>>> df = (
...     (await db.table("sales"))
...     .select()
...     .order_by("region", col("amount").desc())
... )
>>> # SQL: SELECT * FROM sales ORDER BY region, amount DESC

pivot(pivot_column: str, value_column: str, agg_func: str = 'sum', pivot_values: Sequence[str] | None = None) → AsyncDataFrame[source]

Pivot the DataFrame to reshape data from long to wide format.

Parameters:

pivot_column – Column to pivot on (values become column headers)
value_column – Column containing values to aggregate
agg_func – Aggregation function to apply (default: “sum”)
pivot_values – Optional list of specific values to pivot (if None, uses all distinct values)

Returns:

New AsyncDataFrame with pivoted data

Example

>>> from moltres import async_connect
>>> from moltres.table.schema import column
>>> db = await async_connect("sqlite+aiosqlite:///:memory:")
>>> await db.create_table("sales", [column("date", "TEXT"), column("product", "TEXT"), column("amount", "REAL")]).collect()
>>> from moltres.io.records import AsyncRecords
>>> _ = await AsyncRecords(_data=[{"date": "2024-01-01", "product": "A", "amount": 100.0}, {"date": "2024-01-01", "product": "B", "amount": 200.0}, {"date": "2024-01-02", "product": "A", "amount": 150.0}], _database=db).insert_into("sales")
>>> # Pivot sales data by product
>>> table_handle = await db.table("sales")
>>> df = table_handle.select("date", "product", "amount")
>>> pivoted = df.pivot(pivot_column="product", value_column="amount", agg_func="sum")
>>> results = await pivoted.collect()
>>> len(results) > 0
True
>>> await db.close()
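
Databases without a native PIVOT clause usually express long-to-wide reshaping as conditional aggregation. The sketch below runs that pattern on the same sample data with Python's standard sqlite3 module. Grouping by the remaining date column and using SUM(CASE ...) are assumptions about the generated SQL, shown only to make the output shape concrete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (date TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('2024-01-01', 'A', 100.0),
        ('2024-01-01', 'B', 200.0),
        ('2024-01-02', 'A', 150.0);
""")

# One output column per pivot value; products with no sales on a date give NULL.
rows = conn.execute("""
    SELECT date,
           SUM(CASE WHEN product = 'A' THEN amount END) AS A,
           SUM(CASE WHEN product = 'B' THEN amount END) AS B
    FROM sales
    GROUP BY date
    ORDER BY date
""").fetchall()
conn.close()
```

Each distinct product value becomes a column, which is why supplying pivot_values up front can avoid a separate lookup of distinct values.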

plan: LogicalPlan

printSchema() → None[source]

Print the schema of this DataFrame in a tree format.

Similar to PySpark’s DataFrame.printSchema() method, this prints a formatted representation of the DataFrame’s schema.

Example

>>> df = (await db.table("users")).select()
>>> df.printSchema()
# root
#  |-- id: INTEGER (nullable = true)
#  |-- name: VARCHAR(255) (nullable = true)
#  |-- email: VARCHAR(255) (nullable = true)

Note

Currently, nullable information is not available from the schema, so it’s always shown as nullable = true.

print_schema() → None[source]

Print the schema of this DataFrame in a tree format (snake_case alias for printSchema).

This is an alias for printSchema(). See printSchema() for full documentation.

recursive_cte(name: str, recursive: AsyncDataFrame, union_all: bool = False) → AsyncDataFrame[source]

Create a Recursive Common Table Expression (WITH RECURSIVE) from this DataFrame.

Parameters:

name – Name for the recursive CTE
recursive – AsyncDataFrame representing the recursive part (references the CTE)
union_all – If True, use UNION ALL; if False, use UNION (distinct)

Returns:

New AsyncDataFrame representing the recursive CTE

Example

>>> # Fibonacci sequence example
>>> from moltres.expressions import functions as F
>>> initial = (await db.table("seed")).select(F.lit(1).alias("n"), F.lit(1).alias("fib"))
>>> recursive = initial.select(...)  # Recursive part
>>> fib_cte = initial.recursive_cte("fib", recursive)
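
Since the example above elides the recursive part, the following sketch shows a complete WITH RECURSIVE Fibonacci query in plain SQL, executed with Python's standard sqlite3 module. The column layout (n, a, b) is chosen for this illustration and is not taken from Moltres. The anchor SELECT plays the role of the initial DataFrame and the second SELECT plays the role of the recursive argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Anchor row: (n=1, a=1, b=1). Each recursive step shifts the pair forward.
rows = conn.execute("""
    WITH RECURSIVE fib(n, a, b) AS (
        SELECT 1, 1, 1
        UNION ALL
        SELECT n + 1, b, a + b FROM fib WHERE n < 10
    )
    SELECT n, a FROM fib ORDER BY n
""").fetchall()
conn.close()

fib_values = [a for _, a in rows]
```

The WHERE n < 10 condition is what terminates the recursion. Without a terminating condition, a recursive CTE runs until the database stops it.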

sample(fraction: float, seed: int | None = None) → AsyncDataFrame[source]

Sample a fraction of rows from the DataFrame.

Parameters:

fraction – Fraction of rows to sample (0.0 to 1.0)
seed – Optional random seed for reproducible sampling

Returns:

New AsyncDataFrame with sampled rows

Example

>>> df = (await db.table("users")).select().sample(0.1)  # Sample 10% of rows
>>> # SQL (PostgreSQL): SELECT * FROM users TABLESAMPLE BERNOULLI(10)
>>> # SQL (SQLite): SELECT * FROM users ORDER BY RANDOM() LIMIT (COUNT(*) * 0.1)

property schema: List['ColumnInfo']

Return the schema of this DataFrame as a list of ColumnInfo objects.

Similar to PySpark’s DataFrame.schema property, this extracts column names and types from the logical plan without requiring query execution.

Returns:: List of ColumnInfo objects with column names and types
Raises:: RuntimeError – If schema cannot be determined (e.g., RawSQL without execution)

Example

>>> df = (await db.table("users")).select()
>>> schema = df.schema
>>> for col_info in schema:
...     print(f"{col_info.name}: {col_info.type_name}")
# id: INTEGER
# name: VARCHAR(255)
# email: VARCHAR(255)

select(*columns: Column | str) → AsyncDataFrame[source]

Select columns from the DataFrame.

Parameters:: *columns – Column names or Column expressions to select. Use “*” to select all columns (same as empty select). Can combine “*” with other columns: select(“*”, col(“new_col”))
Returns:: New AsyncDataFrame with selected columns

Example

>>> import asyncio
>>> from moltres import async_connect, col
>>> from moltres.table.schema import column
>>> async def example():
...     db = await async_connect("sqlite+aiosqlite:///:memory:")
...     await db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT"), column("email", "TEXT")]).collect()
...     from moltres.io.records import AsyncRecords
...     records = AsyncRecords(_data=[{"id": 1, "name": "Alice", "email": "alice@example.com"}], _database=db)
...     await records.insert_into("users")
...     # Select specific columns
...     table_handle = await db.table("users")
...     df = table_handle.select("id", "name")
...     results = await df.collect()
...     assert results[0]["name"] == "Alice"
...     await db.close()
...     # asyncio.run(example())

selectExpr(*exprs: str) → AsyncDataFrame[source]

Select columns using SQL expressions (async version).

This method allows you to write SQL expressions directly instead of building Column objects manually, similar to PySpark’s selectExpr().

Parameters:: *exprs – SQL expression strings (e.g., “amount * 1.1 as with_tax”)
Returns:: New AsyncDataFrame with selected expressions

Example

>>> import asyncio
>>> from moltres import async_connect
>>> from moltres.table.schema import column
>>> async def example():
...     db = await async_connect("sqlite+aiosqlite:///:memory:")
...     await db.create_table("orders", [column("id", "INTEGER"), column("amount", "REAL"), column("name", "TEXT")]).collect()
...     from moltres.io.records import AsyncRecords
...     records = AsyncRecords(_data=[{"id": 1, "amount": 100.0, "name": "Alice"}], _database=db)
...     await records.insert_into("orders")
...     # With expressions and aliases
...     table_handle = await db.table("orders")
...     df = table_handle.select()
...     df2 = df.selectExpr("id", "amount * 1.1 as with_tax", "UPPER(name) as name_upper")
...     results = await df2.collect()
...     assert abs(results[0]["with_tax"] - 110.0) < 1e-9  # 100.0 * 1.1, up to float rounding
...     await db.close()
>>> # asyncio.run(example())

select_expr(*exprs: str) → AsyncDataFrame[source]

Select columns using SQL expressions (snake_case alias for selectExpr).

This is an alias for selectExpr(). See selectExpr() for full documentation.

Parameters:: *exprs – SQL expression strings (e.g., “amount * 1.1 as with_tax”)
Returns:: New AsyncDataFrame with selected expressions

select_for_share(nowait: bool = False, skip_locked: bool = False) → AsyncDataFrame[source]

Select rows with FOR SHARE lock.

This method adds a FOR SHARE clause to the SELECT statement, which locks the selected rows for shared (read) access. Other transactions can still read the rows but cannot modify them until the transaction commits.

This method works with any plan structure (joins, aggregations, sorts, etc.) by finding or creating the appropriate Project node in the plan tree.

Parameters:

nowait – If True, don’t wait for lock - raise error if rows are locked. Requires database support (PostgreSQL, MySQL 8.0+).
skip_locked – If True, skip locked rows instead of waiting or erroring. Requires database support (PostgreSQL, MySQL 8.0+).

Returns:

New AsyncDataFrame with FOR SHARE locking enabled

Raises:

ValueError – If nowait or skip_locked is requested but not supported by dialect, or if the plan structure cannot support row-level locking.

Example

>>> from moltres import async_connect, col
>>> from moltres.table.schema import column
>>> db = await async_connect("sqlite+aiosqlite:///:memory:")
>>> await db.create_table("products", [column("id", "INTEGER"), column("stock", "INTEGER")]).collect()
>>> from moltres.io.records import AsyncRecords
>>> _ = await AsyncRecords(_data=[{"id": 1, "stock": 10}], _database=db).insert_into("products")
>>> async with db.transaction() as txn:
...     table_handle = await db.table("products")
...     df = table_handle.select().where(col("id") == 1)
...     locked_df = df.select_for_share()
...     results = await locked_df.collect()
...     # Rows are now locked for shared access

Example with joins:

>>> orders = (await db.table("orders")).select()
>>> customers = (await db.table("customers")).select()
>>> joined = orders.join(customers, on=[col("orders.customer_id") == col("customers.id")])
>>> locked_joined = joined.select_for_share()
>>> results = await locked_joined.collect()

select_for_update(nowait: bool = False, skip_locked: bool = False) → AsyncDataFrame[source]

Select rows with FOR UPDATE lock.

This method adds a FOR UPDATE clause to the SELECT statement, which locks the selected rows for update until the transaction commits or rolls back. This is useful for preventing concurrent modifications.

Parameters:

nowait – If True, don’t wait for lock - raise error if rows are locked. Requires database support (PostgreSQL, MySQL 8.0+).
skip_locked – If True, skip locked rows instead of waiting or erroring. Requires database support (PostgreSQL, MySQL 8.0+).

Returns:

New AsyncDataFrame with FOR UPDATE locking enabled

Raises:

ValueError – If nowait or skip_locked is requested but not supported by dialect.

Example

>>> from moltres import async_connect, col
>>> from moltres.table.schema import column
>>> db = await async_connect("sqlite+aiosqlite:///:memory:")
>>> await db.create_table("orders", [column("id", "INTEGER"), column("status", "TEXT")]).collect()
>>> from moltres.io.records import AsyncRecords
>>> _ = await AsyncRecords(_data=[{"id": 1, "status": "pending"}], _database=db).insert_into("orders")
>>> async with db.transaction() as txn:
...     table_handle = await db.table("orders")
...     df = table_handle.select().where(col("status") == "pending")
...     locked_df = df.select_for_update(nowait=True)
...     results = await locked_df.collect()
...     # Rows are now locked for update
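
The nowait and skip_locked options described for select_for_share() and select_for_update() correspond to a suffix on the generated SELECT statement. The following is a minimal sketch of that mapping, not the Moltres implementation: lock_clause and DIALECTS_WITH_LOCK_OPTIONS are hypothetical names, the dialect set only mirrors the databases named in the parameter descriptions, and rejecting the combination of both options is an assumption based on PostgreSQL and MySQL accepting at most one of them.

```python
# Hypothetical helper: builds the row-locking suffix of a SELECT statement.
# Assumption: only the dialects named in the docs accept NOWAIT / SKIP LOCKED.
DIALECTS_WITH_LOCK_OPTIONS = {"postgresql", "mysql"}


def lock_clause(mode: str, dialect: str, nowait: bool = False, skip_locked: bool = False) -> str:
    """Return a clause such as 'FOR UPDATE NOWAIT' for the given options."""
    if mode not in ("UPDATE", "SHARE"):
        raise ValueError(f"unknown lock mode: {mode!r}")
    if nowait and skip_locked:
        # Assumption: treat the two options as mutually exclusive.
        raise ValueError("nowait and skip_locked cannot be combined")
    if (nowait or skip_locked) and dialect not in DIALECTS_WITH_LOCK_OPTIONS:
        raise ValueError(f"{dialect} does not support NOWAIT / SKIP LOCKED")
    clause = f"FOR {mode}"
    if nowait:
        clause += " NOWAIT"
    elif skip_locked:
        clause += " SKIP LOCKED"
    return clause


base_sql = "SELECT * FROM orders WHERE status = 'pending'"
update_sql = f"{base_sql} {lock_clause('UPDATE', 'postgresql', nowait=True)}"
share_sql = f"{base_sql} {lock_clause('SHARE', 'postgresql', skip_locked=True)}"
print(update_sql)
print(share_sql)
```

Raising ValueError for an unsupported dialect matches the error type documented above; the exact conditions Moltres checks may differ.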

semi_join(other: AsyncDataFrame, *, on: str | Sequence[str] | Sequence[Tuple[str, str]] | None = None) → AsyncDataFrame[source]

Perform a semi-join: return rows from this DataFrame where a matching row exists in other.

This is equivalent to filtering with EXISTS subquery.

Parameters:

other – Another DataFrame to semi-join with (used as EXISTS subquery)
on – Join condition - can be: - A single column name (assumes same name in both DataFrames) - A sequence of column names (assumes same names in both) - A sequence of (left_column, right_column) tuples

Returns:

New AsyncDataFrame containing rows from this DataFrame that have matches in other

Raises:

RuntimeError – If DataFrames are not bound to the same AsyncDatabase
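
To make the EXISTS equivalence concrete, here is a small sketch using Python's built-in sqlite3 module rather than Moltres itself. The table names and rows are made up for illustration, and the SQL shown is the general shape of a semi-join, not necessarily the exact statement Moltres generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
    """
)

# Semi-join: keep customers that have at least one order.
# Only columns from the left table are returned, and Alice appears once
# even though she has two matching orders.
semi_rows = conn.execute(
    """
    SELECT c.id, c.name
    FROM customers AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.id
    """
).fetchall()
print(semi_rows)  # [(1, 'Alice'), (3, 'Carol')]
conn.close()
```

Unlike an inner join, the result never duplicates a left-hand row when several right-hand rows match.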

async show(n: int = 20, truncate: bool = True, *, count_total: bool = False) → None[source]: Print the first n rows of the DataFrame.

sort(*columns: Column | str) → AsyncDataFrame

Sort rows by one or more columns.

Parameters:: *columns – Column expressions or column names to sort by. Use .asc() or .desc() for sort order. Can be strings (column names) or Column objects.
Returns:: New AsyncDataFrame with sorted rows

Example

>>> from moltres import col
>>> # Sort ascending with string column name
>>> df = (await db.table("users")).select().order_by("name")
>>> # SQL: SELECT * FROM users ORDER BY name

>>> # Sort ascending with Column object
>>> df = (await db.table("users")).select().order_by(col("name"))
>>> # SQL: SELECT * FROM users ORDER BY name

>>> # Sort descending
>>> df = (await db.table("orders")).select().order_by(col("amount").desc())
>>> # SQL: SELECT * FROM orders ORDER BY amount DESC

>>> # Multiple sort columns (mixed string and Column)
>>> df = (
...     (await db.table("sales"))
...     .select()
...     .order_by("region", col("amount").desc())
... )
>>> # SQL: SELECT * FROM sales ORDER BY region, amount DESC

async summary(*statistics: str) → AsyncDataFrame[source]: Compute summary statistics for numeric columns.

async take(num: int) → List[Dict[str, object]][source]: Take the first num rows as a list.

to_sql() → str[source]: Compile the DataFrame plan to SQL.

to_sqlalchemy(dialect: str | None = None) → Any[source]

Convert AsyncDataFrame’s logical plan to a SQLAlchemy Select statement.

This method allows you to use Moltres AsyncDataFrames with existing SQLAlchemy async connections, sessions, or other SQLAlchemy infrastructure.

Parameters:: dialect – Optional SQL dialect name (e.g., “postgresql”, “mysql”, “sqlite”). If not provided, uses the dialect from the attached AsyncDatabase, or defaults to “ansi” if no AsyncDatabase is attached.
Returns:: SQLAlchemy Select statement that can be executed with any SQLAlchemy connection

Example

>>> from moltres import async_connect, col
>>> from moltres.table.schema import column
>>> from sqlalchemy.ext.asyncio import create_async_engine
>>> async def example():
...     db = await async_connect("sqlite+aiosqlite:///:memory:")
...     await db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT")]).collect()
...     df = await db.table("users")
...     df = df.select().where(col("id") > 1)
...     # Convert to SQLAlchemy statement
...     stmt = df.to_sqlalchemy()
...     # Execute with an existing SQLAlchemy async connection (its engine must point at the database that holds "users")
...     engine = create_async_engine("sqlite+aiosqlite:///:memory:")
...     async with engine.connect() as conn:
...         result = await conn.execute(stmt)
...         rows = result.fetchall()
...     await db.close()

union(other: AsyncDataFrame) → AsyncDataFrame[source]: Union this DataFrame with another DataFrame (distinct rows only).

unionAll(other: AsyncDataFrame) → AsyncDataFrame[source]: Union this DataFrame with another DataFrame (all rows, including duplicates).

union_all(other: AsyncDataFrame) → AsyncDataFrame[source]

Union this DataFrame with another DataFrame (all rows, including duplicates) (snake_case alias for unionAll).

This is an alias for unionAll(). See unionAll() for full documentation.

Parameters:: other – Another DataFrame to union with
Returns:: New DataFrame containing the union of all rows
Raises:: RuntimeError – If DataFrames are not bound to the same AsyncDatabase
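
The difference between union() and unionAll()/union_all() is the difference between SQL UNION and UNION ALL. The sketch below demonstrates it with Python's built-in sqlite3 module; the tables a and b are illustrative only and Moltres is not involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
    """
)

# UNION removes duplicate rows across both inputs.
distinct_rows = conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x"
).fetchall()

# UNION ALL keeps every row, so the shared value 2 appears twice.
all_rows = conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x"
).fetchall()

print(distinct_rows)  # [(1,), (2,), (3,)]
print(all_rows)       # [(1,), (2,), (2,), (3,)]
conn.close()
```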

where(predicate: Column | str) → AsyncDataFrame[source]

Filter rows based on a predicate.

Parameters:: predicate – Column expression or SQL string representing the filter condition. Can be a Column object or a SQL string like “age > 18”.
Returns:: New AsyncDataFrame with filtered rows

Example

>>> import asyncio
>>> from moltres import async_connect, col
>>> from moltres.table.schema import column
>>> async def example():
...     db = await async_connect("sqlite+aiosqlite:///:memory:")
...     await db.create_table("users", [column("id", "INTEGER"), column("name", "TEXT"), column("age", "INTEGER")]).collect()
...     from moltres.io.records import AsyncRecords
...     records = AsyncRecords(_data=[{"id": 1, "name": "Alice", "age": 25}, {"id": 2, "name": "Bob", "age": 17}], _database=db)
...     await records.insert_into("users")
...     # Filter by condition using Column
...     table_handle = await db.table("users")
...     df = table_handle.select().where(col("age") >= 18)
...     results = await df.collect()
...     assert len(results) == 1
...     assert results[0]["name"] == "Alice"
...     await db.close()
>>> # asyncio.run(example())

withColumn(colName: str, col_expr: Column | str) → AsyncDataFrame[source]

Add or replace a column in the DataFrame.

Parameters:

colName – Name of the column to add or replace
col_expr – Column expression or column name

Returns:

New AsyncDataFrame with the added/replaced column

Note

This operation adds a Project on top of the current plan. If a column with the same name exists, it will be replaced. Window functions are supported and will ensure all columns are available.

Example

>>> from moltres.expressions import functions as F
>>> from moltres.expressions.window import Window
>>> window = Window.partition_by("category").order_by("amount")
>>> await df.withColumn("row_num", F.row_number().over(window)).collect()
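
As a rough picture of what the example above asks the database to do, the following sketch runs a comparable query with Python's built-in sqlite3 module. It assumes an SQLite build with window-function support (3.25 or newer); the sales table and its rows are invented for illustration, and the SQL is hand-written rather than produced by Moltres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (category TEXT, amount REAL);
    INSERT INTO sales VALUES ('A', 30.0), ('A', 10.0), ('B', 20.0);
    """
)

# Keep every existing column and project one extra computed column,
# numbering rows within each category by ascending amount.
rows = conn.execute(
    """
    SELECT category,
           amount,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY amount) AS row_num
    FROM sales
    ORDER BY category, amount
    """
).fetchall()
print(rows)  # [('A', 10.0, 1), ('A', 30.0, 2), ('B', 20.0, 1)]
conn.close()
```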

withColumnRenamed(existing: str, new: str) → AsyncDataFrame[source]: Rename a column in the DataFrame.

withColumns(cols_map: Dict[str, Column | str]) → AsyncDataFrame[source]

Add or replace multiple columns in the DataFrame.

Parameters:: cols_map – Dictionary mapping column names to Column expressions or column names
Returns:: New AsyncDataFrame with the added/replaced columns

Example

>>> from moltres import async_connect, col
>>> from moltres.table.schema import column
>>> db = await async_connect("sqlite+aiosqlite:///:memory:")
>>> await db.create_table("orders", [column("id", "INTEGER"), column("amount", "REAL")]).collect()
>>> from moltres.io.records import AsyncRecords
>>> _ = await AsyncRecords(_data=[{"id": 1, "amount": 100.0}], _database=db).insert_into("orders")
>>> table_handle = await db.table("orders")
>>> df = table_handle.select()
>>> # Add multiple columns at once
>>> df2 = df.withColumns({
...     "amount_with_tax": col("amount") * 1.1,
...     "amount_doubled": col("amount") * 2
... })
>>> results = await df2.collect()
>>> round(results[0]["amount_with_tax"], 2)
110.0
>>> results[0]["amount_doubled"]
200.0
>>> await db.close()

with_column(colName: str, col_expr: Column | str) → AsyncDataFrame[source]

Add or replace a column in the DataFrame (snake_case alias for withColumn).

This is an alias for withColumn(). See withColumn() for full documentation.

Parameters:

colName – Name of the column to add or replace
col_expr – Column expression or column name

Returns:

New AsyncDataFrame with the added/replaced column

with_column_renamed(existing: str, new: str) → AsyncDataFrame[source]

Rename a column in the DataFrame (snake_case alias for withColumnRenamed).

This is an alias for withColumnRenamed(). See withColumnRenamed() for full documentation.

Parameters:

existing – Current name of the column
new – New name for the column

Returns:

New AsyncDataFrame with the renamed column

with_columns(cols_map: Dict[str, Column | str]) → AsyncDataFrame[source]

Add or replace multiple columns in the DataFrame (snake_case alias for withColumns).

This is an alias for withColumns(). See withColumns() for full documentation.

Parameters:: cols_map – Dictionary mapping column names to Column expressions or column names
Returns:: New AsyncDataFrame with the added/replaced columns

with_model(model: Type[Any]) → AsyncDataFrame[source]

Attach a SQLModel or Pydantic model to this AsyncDataFrame.

When a model is attached, collect() will return model instances instead of dictionaries. This provides type safety and validation.

Parameters:

model – SQLModel or Pydantic model class to attach

Returns:

New AsyncDataFrame with the model attached

Raises:

TypeError – If model is not a SQLModel or Pydantic class
ImportError – If required dependencies are not installed

Example

>>> from sqlmodel import SQLModel, Field
>>> class User(SQLModel, table=True):
...     id: int = Field(primary_key=True)
...     name: str
>>> df = await db.table("users")
>>> df = df.select()
>>> df_with_model = df.with_model(User)
>>> results = await df_with_model.collect()  # Returns list of User instances

>>> from pydantic import BaseModel
>>> class UserData(BaseModel):
...     id: int
...     name: str
>>> df_with_pydantic = df.with_model(UserData)
>>> results = await df_with_pydantic.collect()  # Returns list of UserData instances

property write: Any: Return an AsyncDataFrameWriter for writing this DataFrame to a table.

class moltres.dataframe.core.async_dataframe.AsyncNullHandling(df: AsyncDataFrame)[source]

Bases: object

Helper class for null handling operations on AsyncDataFrames.

Accessed via the na property on AsyncDataFrame instances.

drop(how: str = 'any', subset: Sequence[str] | None = None) → AsyncDataFrame[source]

Drop rows with null values.

This is a convenience wrapper around AsyncDataFrame.dropna().

Parameters:

how – “any” (drop if any null) or “all” (drop if all null) (default: “any”)
subset – Optional list of column names to check. If None, checks all columns.

Returns:

New AsyncDataFrame with null rows removed

Example

>>> await df.na.drop().collect()  # Drop rows with any null values
>>> await df.na.drop(how="all").collect()  # Drop rows where all values are null
>>> await df.na.drop(subset=["col1", "col2"]).collect()  # Only check specific columns
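
The how options can be read as two different WHERE clauses over the checked columns. The sketch below shows that reading with Python's built-in sqlite3 module; the table t and its rows are illustrative, and the SQL is an assumption about the general shape of such a filter, not output captured from Moltres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (col1 INTEGER, col2 TEXT);
    INSERT INTO t VALUES (1, 'a'), (2, NULL), (NULL, NULL);
    """
)

# how="any": drop a row if any checked column is NULL,
# i.e. keep rows where all checked columns are non-NULL.
kept_any = conn.execute(
    "SELECT col1, col2 FROM t WHERE col1 IS NOT NULL AND col2 IS NOT NULL"
).fetchall()

# how="all": drop a row only if every checked column is NULL,
# i.e. keep rows that have at least one non-NULL value.
kept_all = conn.execute(
    "SELECT col1, col2 FROM t WHERE col1 IS NOT NULL OR col2 IS NOT NULL ORDER BY col1"
).fetchall()

print(kept_any)  # [(1, 'a')]
print(kept_all)  # [(1, 'a'), (2, None)]
conn.close()
```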

fill(value, subset=None) → AsyncDataFrame

Fill null values with a specified value.

This is a convenience wrapper around AsyncDataFrame.fillna().

Parameters:

value – Value to use for filling nulls. Can be a single value or a dict mapping column names to values.
subset – Optional list of column names to fill. If None, fills all columns.

Returns:

New AsyncDataFrame with null values filled

Example

>>> await df.na.fill(0).collect()  # Fill all nulls with 0
>>> await df.na.fill({"col1": 0, "col2": "unknown"}).collect()  # Fill different columns with different values
>>> await df.na.fill(0, subset=["col1", "col2"]).collect()  # Fill specific columns with 0
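
Filling nulls per column is commonly expressed in SQL with COALESCE. The following sketch illustrates the dict form of fill() that way, using Python's built-in sqlite3 module. The table, the fill_values dict, and the generated SQL are assumptions for illustration, not the statement Moltres emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (col1 INTEGER, col2 TEXT);
    INSERT INTO t VALUES (1, NULL), (NULL, 'b');
    """
)

# Per-column fill values, as in the dict form {"col1": 0, "col2": "unknown"}.
fill_values = {"col1": 0, "col2": "unknown"}

# Column names come from the fixed dict above; fill values are bound parameters.
select_list = ", ".join(f"COALESCE({name}, ?) AS {name}" for name in fill_values)
filled = conn.execute(
    f"SELECT {select_list} FROM t ORDER BY rowid", tuple(fill_values.values())
).fetchall()

print(filled)  # [(1, 'unknown'), (0, 'b')]
conn.close()
```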

pandas() → Any[source]

Convert this AsyncDataFrame to an AsyncPandasDataFrame for Pandas-style operations.

Returns:: AsyncPandasDataFrame wrapping this AsyncDataFrame

Example

>>> from moltres import async_connect
>>> db = await async_connect("sqlite+aiosqlite:///:memory:")
>>> df = await db.load.csv("data.csv")
>>> pandas_df = df.pandas()
>>> results = await pandas_df.collect()

polars() → Any[source]

Convert this AsyncDataFrame to an AsyncPolarsDataFrame for Polars-style operations.

Returns:: AsyncPolarsDataFrame wrapping this AsyncDataFrame

Example

>>> from moltres import async_connect
>>> db = await async_connect("sqlite+aiosqlite:///:memory:")
>>> df = await db.load.csv("data.csv")
>>> polars_df = df.polars()
>>> results = await polars_df.collect()

GroupedDataFrame

Grouped DataFrame helper.

class moltres.dataframe.groupby.groupby.GroupedDataFrame(plan: LogicalPlan, keys: tuple[Column, ...], parent: DataFrame)[source]

Bases: object

Represents a DataFrame grouped by one or more columns.

This is returned by DataFrame.group_by() and provides aggregation methods.

agg(*aggregations: Column | str | Dict[str, str], allow_empty: bool = False) → DataFrame[source]

Apply aggregation functions to the grouped data.

Parameters:

*aggregations –

One or more aggregation expressions. Can be:

Column expressions (e.g., sum(col("amount")))
String column names (e.g., "amount" — defaults to sum())
Dictionary mapping column names to aggregation functions (e.g., {"amount": "sum", "price": "avg"})

Returns:

DataFrame with aggregated results

Raises:

ValueError – If no aggregations are provided or if invalid aggregation expressions are used

Example

>>> from moltres import connect, col
>>> from moltres.expressions import functions as F
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("sales", [column("category", "TEXT"), column("amount", "REAL"), column("price", "REAL")]).collect()
>>> from moltres.io.records import Records
>>> Records(_data=[{"category": "A", "amount": 100.0, "price": 10.0}, {"category": "A", "amount": 200.0, "price": 20.0}, {"category": "B", "amount": 150.0, "price": 15.0}], _database=db).insert_into("sales")
>>> # Using Column expressions
>>> df = db.table("sales").select()
>>> result = df.group_by("category").agg(F.sum(col("amount")).alias("total"), F.avg(col("price")).alias("avg_price"))
>>> results = result.collect()
>>> len(results)
2
>>> results[0]["total"]
300.0
>>> # Using string column names (defaults to sum)
>>> result2 = df.group_by("category").agg("amount")
>>> results2 = result2.collect()
>>> results2[0]["amount"]
300.0
>>> # Using dictionary syntax
>>> result3 = df.group_by("category").agg({"amount": "sum", "price": "avg"})
>>> results3 = result3.collect()
>>> results3[0]["amount"]
300.0
>>> db.close()
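
The dictionary form of agg() can be thought of as a mapping from column names to SQL aggregate functions. Below is a minimal sketch of that translation, run against SQLite with Python's built-in sqlite3 module. grouped_sql and ALLOWED are hypothetical names introduced here, and the whitelist of functions is an assumption, not the set Moltres accepts.

```python
import sqlite3
from typing import Dict

# Hypothetical whitelist mapping aggregation names to SQL functions.
ALLOWED = {"sum": "SUM", "avg": "AVG", "min": "MIN", "max": "MAX", "count": "COUNT"}


def grouped_sql(table: str, key: str, aggregations: Dict[str, str]) -> str:
    """Build a GROUP BY query from a {"column": "function"} mapping."""
    if not aggregations:
        raise ValueError("at least one aggregation is required")
    parts = []
    for column_name, func in aggregations.items():
        if func not in ALLOWED:
            raise ValueError(f"unsupported aggregation: {func!r}")
        # Each output column keeps the name of the column it aggregates.
        parts.append(f"{ALLOWED[func]}({column_name}) AS {column_name}")
    return f"SELECT {key}, {', '.join(parts)} FROM {table} GROUP BY {key} ORDER BY {key}"


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (category TEXT, amount REAL, price REAL);
    INSERT INTO sales VALUES ('A', 100.0, 10.0), ('A', 200.0, 20.0), ('B', 150.0, 15.0);
    """
)
sql = grouped_sql("sales", "category", {"amount": "sum", "price": "avg"})
grouped = conn.execute(sql).fetchall()
print(sql)
print(grouped)  # [('A', 300.0, 15.0), ('B', 150.0, 15.0)]
conn.close()
```

The explicit ORDER BY is only there to make the printed result deterministic; without it, the order of groups is up to the database.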

keys: tuple[Column, ...]

parent: DataFrame

pivot(pivot_col: str, values: Sequence[str] | None = None) → PivotedGroupedDataFrame[source]

Pivot the grouped data on a column.

Parameters:

pivot_col – Column to pivot on (values become column headers)
values – Optional list of specific values to pivot (if None, must be provided later or discovered)

Returns:

PivotedGroupedDataFrame that can be aggregated

Example

>>> df.group_by("category").pivot("status").agg("amount")
>>> df.group_by("category").pivot("status", values=["active", "inactive"]).agg("amount")
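
A pivot with a known list of values is often compiled to conditional aggregation, one SUM(CASE ...) column per pivot value. The sketch below shows that technique with Python's built-in sqlite3 module. The items table and its rows are made up, and whether Moltres generates this exact SQL is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (category TEXT, status TEXT, amount REAL);
    INSERT INTO items VALUES
        ('A', 'active', 10.0), ('A', 'inactive', 5.0),
        ('A', 'active', 2.0),  ('B', 'inactive', 7.0);
    """
)

pivot_values = ["active", "inactive"]

# One SUM(CASE ...) column per pivot value; the compared values are bound
# as parameters, while the aliases come from the fixed list above.
columns = ", ".join(
    f'SUM(CASE WHEN status = ? THEN amount END) AS "{value}"' for value in pivot_values
)
pivoted = conn.execute(
    f"SELECT category, {columns} FROM items GROUP BY category ORDER BY category",
    pivot_values,
).fetchall()

print(pivoted)  # [('A', 12.0, 5.0), ('B', None, 7.0)]
conn.close()
```

Category B has no active rows, so its active column is NULL (None in Python) rather than 0.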

plan: LogicalPlan

class moltres.dataframe.groupby.groupby.PivotedGroupedDataFrame(plan: LogicalPlan, keys: tuple[Column, ...], pivot_column: str, pivot_values: tuple[str, ...] | None, parent: DataFrame)[source]

Bases: object

Represents a DataFrame grouped by columns with a pivot operation applied.

This is returned by GroupedDataFrame.pivot() and provides aggregation methods that will create pivoted columns.

agg(*aggregations: Column | str | Dict[str, str]) → DataFrame[source]

Apply aggregation functions to the pivoted grouped data.

Parameters:

*aggregations –

One or more aggregation expressions. Can be:

Column expressions (e.g., sum(col("amount")))
String column names (e.g., "amount" — defaults to sum())
Dictionary mapping column names to aggregation functions (e.g., {"amount": "sum", "price": "avg"})

Returns:

DataFrame with pivoted aggregated results

Raises:

ValueError – If no aggregations are provided or if invalid aggregation expressions are used

Example

>>> from moltres import col
>>> from moltres.expressions import functions as F
>>> # Using string column name
>>> df.group_by("category").pivot("status").agg("amount")

>>> # Using Column expression
>>> df.group_by("category").pivot("status").agg(F.sum(col("amount")))

>>> # With specific pivot values
>>> df.group_by("category").pivot("status", values=["active", "inactive"]).agg("amount")

keys: tuple[Column, ...]

parent: DataFrame

pivot_column: str

pivot_values: tuple[str, ...] | None

plan: LogicalPlan