DataFrame API

The DataFrame API provides a PySpark-like interface for querying SQL databases. All operations are lazy and compile to SQL that runs directly on your database. Use this reference alongside the getting-started and patterns guides.
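
To make "lazy and compiles to SQL" concrete, here is a minimal sketch of the idea using only the standard library. The `LazyQuery` class and its methods are hypothetical and are not Moltres internals; the sketch only shows that chained calls can accumulate a plan and defer all database work until `collect()`.

```python
import sqlite3


class LazyQuery:
    """Toy lazy query (hypothetical): records operations, runs SQL only in collect()."""

    def __init__(self, conn, table, where=None, group_keys=None, aggregates=None):
        self.conn = conn
        self.table = table
        self.where = where
        self.group_keys = group_keys or []
        self.aggregates = aggregates or []

    def filter(self, condition):
        # Returns a new plan; nothing is sent to the database here.
        return LazyQuery(self.conn, self.table, condition, self.group_keys, self.aggregates)

    def group_by(self, *keys):
        return LazyQuery(self.conn, self.table, self.where, list(keys), self.aggregates)

    def agg(self, **aggregates):
        # Each keyword maps an output alias to a SQL aggregate expression.
        return LazyQuery(self.conn, self.table, self.where, self.group_keys, list(aggregates.items()))

    def to_sql(self):
        columns = self.group_keys + [f"{expr} AS {alias}" for alias, expr in self.aggregates]
        select_list = ", ".join(columns) if columns else "*"
        sql = f"SELECT {select_list} FROM {self.table}"
        if self.where:
            sql += f" WHERE {self.where}"
        if self.group_keys:
            sql += " GROUP BY " + ", ".join(self.group_keys)
        return sql

    def collect(self):
        # The only place where the database is actually queried.
        return self.conn.execute(self.to_sql()).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("A", 100.0), ("A", 200.0), ("B", 150.0)])

query = LazyQuery(conn, "sales").filter("amount > 100").group_by("category").agg(total="SUM(amount)")
sql = query.to_sql()
rows = sorted(query.collect())
print(sql)
print(rows)
```

Because each method returns a new plan object, intermediate queries can be reused and extended without re-running anything.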

AsyncDataFrame

GroupedDataFrame

GroupBy implementations for DataFrame operations.

class moltres.dataframe.groupby.AsyncGroupedDataFrame(plan: LogicalPlan, database: 'AsyncDatabase' | None = None)[source]

Bases: object

Represents a grouped DataFrame for async aggregation operations.

agg(*aggregates: Column | str | Dict[str, str]) AsyncDataFrame[source]

Apply aggregate functions to the grouped data.

Parameters:

*aggregates

Aggregation expressions. Can be:

  • Column expressions (e.g., sum(col("amount")))

  • String column names (e.g., "amount", which defaults to sum())

  • Dictionary mapping column names to aggregation functions (e.g., {"amount": "sum", "price": "avg"})

Returns:

AsyncDataFrame with aggregated results

Example

>>> from moltres import col
>>> from moltres.expressions import functions as F
>>> # Using Column expressions
>>> await df.group_by("category").agg(F.sum(col("amount")).alias("total"))
>>> # Using string column names (defaults to sum)
>>> await df.group_by("category").agg("amount", "price")
>>> # Using dictionary syntax
>>> await df.group_by("category").agg({"amount": "sum", "price": "avg"})
pivot(pivot_col: str, values: Sequence[str] | None = None) AsyncPivotedGroupedDataFrame[source]

Pivot the grouped data on a column.

Parameters:
  • pivot_col – Column to pivot on (values become column headers)

  • values – Optional list of specific values to pivot (if None, must be provided later or discovered)

Returns:

AsyncPivotedGroupedDataFrame that can be aggregated

Example

>>> await df.group_by("category").pivot("status").agg("amount")
>>> await df.group_by("category").pivot("status", values=["active", "inactive"]).agg("amount")
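
A pivot of this kind is commonly expressed in SQL as one conditional aggregate per pivot value. The sketch below uses `sqlite3` directly to show both cases: explicit `values`, and values discovered with a `SELECT DISTINCT`. The table, the `pivot_sum` helper, and the generated SQL are illustrative assumptions, not the SQL Moltres is known to emit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("A", "active", 100.0),
        ("A", "inactive", 50.0),
        ("A", "active", 25.0),
        ("B", "active", 10.0),
    ],
)


def pivot_sum(conn, values=None):
    """Hypothetical helper: sum `amount` per category, one output column per status."""
    if values is None:
        # Discover the pivot values when the caller does not list them.
        values = [row[0] for row in conn.execute("SELECT DISTINCT status FROM orders ORDER BY status")]
    # One conditional SUM per pivot value; the values themselves are bound as parameters.
    # The alias is interpolated for brevity, so this sketch assumes trusted value names.
    cases = ", ".join(f'SUM(CASE WHEN status = ? THEN amount END) AS "{v}"' for v in values)
    sql = f"SELECT category, {cases} FROM orders GROUP BY category ORDER BY category"
    return conn.execute(sql, values).fetchall()


discovered = pivot_sum(conn)
explicit = pivot_sum(conn, values=["active"])
print(discovered)
print(explicit)
```

A group with no rows for a given pivot value yields NULL (`None` in Python) in that column, as category B does for "inactive" here.
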
class moltres.dataframe.groupby.AsyncPandasGroupBy(_grouped: AsyncGroupedDataFrame)[source]

Bases: object

Async Pandas-style GroupBy wrapper around Moltres AsyncGroupedDataFrame.

Provides pandas-style groupby API with dictionary aggregation support.

agg(**aggregations: str | Dict[str, str]) AsyncPandasDataFrame[source]

Apply aggregations using pandas-style dictionary syntax.

Parameters:

**aggregations – Column names mapped to aggregation functions or dicts

Returns:

AsyncPandasDataFrame with aggregated results

Example

>>> await df.groupby('country').agg(amount='sum', price='mean')
>>> await df.groupby('country').agg({'amount': 'sum', 'price': ['mean', 'max']})
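
The two call styles above (keyword arguments, and a dictionary whose values may be a single function name or a list of names) can be reduced to one flat list of (column, function) pairs before any SQL is built. The helpers below are a hypothetical sketch of that normalization; the function names, the `mean` to `AVG` mapping, and the `column_function` alias scheme are assumptions rather than documented Moltres behaviour.

```python
# Assumed mapping from pandas-style names to SQL aggregate functions.
SQL_NAMES = {"sum": "SUM", "mean": "AVG", "min": "MIN", "max": "MAX", "count": "COUNT"}


def normalize_aggregations(spec=None, **kwargs):
    """Flatten pandas-style aggregation specs into (column, function) pairs."""
    merged = dict(spec or {})
    merged.update(kwargs)
    pairs = []
    for column, funcs in merged.items():
        if isinstance(funcs, str):
            funcs = [funcs]  # a single name is treated as a one-element list
        for func in funcs:
            pairs.append((column, func))
    return pairs


def to_select_list(pairs):
    """Render each pair as a SQL aggregate with a column_function alias."""
    return [f"{SQL_NAMES[func]}({column}) AS {column}_{func}" for column, func in pairs]


keyword_pairs = normalize_aggregations(amount="sum", price="mean")
dict_pairs = normalize_aggregations({"amount": "sum", "price": ["mean", "max"]})
print(keyword_pairs)
print(to_select_list(dict_pairs))
```
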
count() AsyncPandasDataFrame[source]

Count rows in each group.

first() AsyncPandasDataFrame[source]

Get first value of each column in each group.

last() AsyncPandasDataFrame[source]

Get last value of each column in each group.

max() AsyncPandasDataFrame[source]

Maximum value of all columns in each group.

mean() AsyncPandasDataFrame[source]

Mean of all numeric columns in each group.

Returns:

AsyncPandasDataFrame with mean of all numeric columns for each group

Note

This attempts to average all columns. For better control, use agg() with specific columns.

min() AsyncPandasDataFrame[source]

Minimum value of all columns in each group.

nunique() AsyncPandasDataFrame[source]

Count distinct values for all columns in each group.

size() AsyncPandasDataFrame[source]

Count rows in each group (alias for count).

sum() AsyncPandasDataFrame[source]

Sum all numeric columns in each group.

Returns:

AsyncPandasDataFrame with sum of all numeric columns for each group

Note

This attempts to sum all columns. For better control, use agg() with specific columns.
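
As an illustration of why the note recommends agg() with specific columns, the sketch below runs a SUM over a text column directly in SQLite. This is plain `sqlite3`, not Moltres; how Moltres treats non-numeric columns in sum() is not stated in this reference. SQLite interprets text that does not look like a number as 0, so the query succeeds but the result is meaningless; stricter databases typically reject such a query instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("US", "NYC", 100.0), ("US", "LA", 50.0), ("DE", "Berlin", 70.0)],
)

# Summing every non-key column, including the text column `city`.
rows = conn.execute(
    "SELECT country, SUM(city), SUM(amount) FROM sales GROUP BY country ORDER BY country"
).fetchall()

for country, city_sum, amount_sum in rows:
    # city_sum is a numeric zero in SQLite: the text values contribute nothing.
    print(country, city_sum, amount_sum)
```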

class moltres.dataframe.groupby.AsyncPolarsGroupBy(_grouped: AsyncGroupedDataFrame)[source]

Bases: object

Async Polars-style GroupBy wrapper around Moltres AsyncGroupedDataFrame.

Provides Polars-style groupby API with expression-based aggregations.

agg(*exprs: Column | Dict[str, str]) AsyncPolarsDataFrame[source]

Apply aggregations using Polars-style expressions.

Parameters:

*exprs – Column expressions for aggregations, or dictionary mapping column names to function names

Returns:

AsyncPolarsDataFrame with aggregated results

Example

>>> await df.group_by('country').agg(col('amount').sum(), col('price').mean())
>>> await df.group_by('country').agg({"amount": "sum", "price": "avg"})  # Dictionary syntax
count() AsyncPolarsDataFrame[source]

Count rows in each group.

first() AsyncPolarsDataFrame[source]

Get first value of each column in each group.

last() AsyncPolarsDataFrame[source]

Get last value of each column in each group.

max() AsyncPolarsDataFrame[source]

Maximum value of all columns in each group.

mean() AsyncPolarsDataFrame[source]

Mean of all numeric columns in each group.

Returns:

AsyncPolarsDataFrame with mean of all numeric columns for each group

Note

This attempts to average all columns. For better control, use agg() with specific columns.

min() AsyncPolarsDataFrame[source]

Minimum value of all columns in each group.

n_unique() AsyncPolarsDataFrame[source]

Count distinct values for all columns in each group.

std() AsyncPolarsDataFrame[source]

Standard deviation of all numeric columns in each group.

sum() AsyncPolarsDataFrame[source]

Sum all numeric columns in each group.

Returns:

AsyncPolarsDataFrame with sum of all numeric columns for each group

var() AsyncPolarsDataFrame[source]

Variance of all numeric columns in each group.
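
Per-group variance can be computed in SQL from simple aggregates even on a backend without a variance function, as is the case for SQLite's core aggregates. The sketch below uses the sample-variance identity (sum of squares minus n times the squared mean, divided by n - 1) and checks it against Python's `statistics` module. Whether var() and std() here use the sample or the population definition is not stated in this reference, so the sample form is an assumption.

```python
import math
import sqlite3
import statistics

data = {"US": [10.0, 20.0, 30.0], "DE": [5.0, 15.0]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(country, amount) for country, amounts in data.items() for amount in amounts],
)

# Sample variance per group, built only from SUM, COUNT and AVG.
sql = """
SELECT country,
       (SUM(amount * amount) - COUNT(amount) * AVG(amount) * AVG(amount))
           / (COUNT(amount) - 1) AS var_amount
FROM sales
GROUP BY country
ORDER BY country
"""
variances = dict(conn.execute(sql).fetchall())

# Standard deviation is the square root of the variance; taken in Python here
# because SQLite's sqrt() is only available in builds with math functions enabled.
std_devs = {country: math.sqrt(value) for country, value in variances.items()}

for country, amounts in data.items():
    print(country, variances[country], statistics.variance(amounts))
```

This one-pass formula can lose precision when values are large relative to their spread, so it is best read as an illustration of the idea rather than a numerically robust implementation.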

class moltres.dataframe.groupby.GroupedDataFrame(plan: LogicalPlan, keys: tuple[Column, ...], parent: DataFrame)[source]

Bases: object

Represents a DataFrame grouped by one or more columns.

This is returned by DataFrame.group_by() and provides aggregation methods.

agg(*aggregations: Column | str | Dict[str, str]) DataFrame[source]

Apply aggregation functions to the grouped data.

Parameters:

*aggregations

One or more aggregation expressions. Can be:

  • Column expressions (e.g., sum(col("amount")))

  • String column names (e.g., "amount", which defaults to sum())

  • Dictionary mapping column names to aggregation functions (e.g., {"amount": "sum", "price": "avg"})

Returns:

DataFrame with aggregated results

Raises:

ValueError – If no aggregations are provided or if invalid aggregation expressions are used

Example

>>> from moltres import connect, col
>>> from moltres.expressions import functions as F
>>> from moltres.table.schema import column
>>> db = connect("sqlite:///:memory:")
>>> db.create_table("sales", [column("category", "TEXT"), column("amount", "REAL"), column("price", "REAL")]).collect()
>>> from moltres.io.records import Records
>>> Records(_data=[{"category": "A", "amount": 100.0, "price": 10.0}, {"category": "A", "amount": 200.0, "price": 20.0}, {"category": "B", "amount": 150.0, "price": 15.0}], _database=db).insert_into("sales")
>>> # Using Column expressions
>>> df = db.table("sales").select()
>>> result = df.group_by("category").agg(F.sum(col("amount")).alias("total"), F.avg(col("price")).alias("avg_price"))
>>> results = result.collect()
>>> len(results)
2
>>> results[0]["total"]
300.0
>>> # Using string column names (defaults to sum)
>>> result2 = df.group_by("category").agg("amount")
>>> results2 = result2.collect()
>>> results2[0]["amount"]
300.0
>>> # Using dictionary syntax
>>> result3 = df.group_by("category").agg({"amount": "sum", "price": "avg"})
>>> results3 = result3.collect()
>>> results3[0]["amount"]
300.0
>>> db.close()
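
For comparison, the same aggregation can be written by hand against SQLite with the same three rows. This is a sketch of an equivalent query, not the SQL Moltres necessarily generates. It adds an ORDER BY because SQL does not guarantee the order of grouped rows without one, which matters for any code that indexes into the result by position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, amount REAL, price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", 100.0, 10.0), ("A", 200.0, 20.0), ("B", 150.0, 15.0)],
)

# One output row per category, with a total and an average.
rows = conn.execute(
    """
    SELECT category, SUM(amount) AS total, AVG(price) AS avg_price
    FROM sales
    GROUP BY category
    ORDER BY category
    """
).fetchall()

for category, total, avg_price in rows:
    print(category, total, avg_price)

conn.close()
```
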
keys: tuple[Column, ...]
parent: DataFrame
pivot(pivot_col: str, values: Sequence[str] | None = None) PivotedGroupedDataFrame[source]

Pivot the grouped data on a column.

Parameters:
  • pivot_col – Column to pivot on (values become column headers)

  • values – Optional list of specific values to pivot (if None, must be provided later or discovered)

Returns:

PivotedGroupedDataFrame that can be aggregated

Example

>>> df.group_by("category").pivot("status").agg("amount")
>>> df.group_by("category").pivot("status", values=["active", "inactive"]).agg("amount")
plan: LogicalPlan
class moltres.dataframe.groupby.PandasGroupBy(_grouped: GroupedDataFrame)[source]

Bases: object

Pandas-style GroupBy wrapper around Moltres GroupedDataFrame.

Provides pandas-style groupby API with dictionary aggregation support.

agg(**aggregations: str | Dict[str, str]) PandasDataFrame[source]

Apply aggregations using pandas-style dictionary syntax.

Parameters:

**aggregations – Column names mapped to aggregation functions or dicts

Returns:

PandasDataFrame with aggregated results

Example

>>> df.groupby('country').agg(amount='sum', price='mean')
>>> df.groupby('country').agg({'amount': 'sum', 'price': ['mean', 'max']})
count() PandasDataFrame[source]

Count rows in each group.

first() PandasDataFrame[source]

Get first value of each column in each group.

last() PandasDataFrame[source]

Get last value of each column in each group.

max() PandasDataFrame[source]

Maximum value of all columns in each group.

mean() PandasDataFrame[source]

Mean of all numeric columns in each group.

Returns:

PandasDataFrame with mean of all numeric columns for each group

Note

This attempts to average all columns. For better control, use agg() with specific columns.

min() PandasDataFrame[source]

Minimum value of all columns in each group.

nunique() PandasDataFrame[source]

Count distinct values for all columns in each group.
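
In SQL terms, a distinct count per group is usually written as COUNT(DISTINCT column). The sketch below shows that form in plain `sqlite3`; the table is made up, and the assumption that nunique() compiles to this expression is not confirmed by this reference. COUNT(DISTINCT ...) skips NULLs, which matches the default behaviour of pandas' own nunique().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (country TEXT, user_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("US", "u1", "home"),
        ("US", "u1", "docs"),
        ("US", "u2", "home"),
        ("DE", "u3", "home"),
        ("DE", "u3", None),  # NULL page: not counted as a distinct value
    ],
)

# One distinct count per non-key column, per group.
rows = conn.execute(
    """
    SELECT country,
           COUNT(DISTINCT user_id) AS user_id,
           COUNT(DISTINCT page) AS page
    FROM visits
    GROUP BY country
    ORDER BY country
    """
).fetchall()
print(rows)
```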

size() PandasDataFrame[source]

Count rows in each group (alias for count).

sum() PandasDataFrame[source]

Sum all numeric columns in each group.

Returns:

PandasDataFrame with sum of all numeric columns for each group

Note

This attempts to sum all columns. For better control, use agg() with specific columns.

class moltres.dataframe.groupby.PolarsGroupBy(_grouped: GroupedDataFrame)[source]

Bases: object

Polars-style GroupBy wrapper around Moltres GroupedDataFrame.

Provides Polars-style groupby API with expression-based aggregations.

agg(*exprs: Column | Dict[str, str]) PolarsDataFrame[source]

Apply aggregations using Polars-style expressions.

Parameters:

*exprs – Column expressions for aggregations, or dictionary mapping column names to function names

Returns:

PolarsDataFrame with aggregated results

Example

>>> df.group_by('country').agg(col('amount').sum(), col('price').mean())
>>> df.group_by('country').agg({"amount": "sum", "price": "avg"})  # Dictionary syntax
count() PolarsDataFrame[source]

Count rows in each group.

first() PolarsDataFrame[source]

Get first value of each column in each group.

last() PolarsDataFrame[source]

Get last value of each column in each group.

max() PolarsDataFrame[source]

Maximum value of all columns in each group.

mean() PolarsDataFrame[source]

Mean of all numeric columns in each group.

Returns:

PolarsDataFrame with mean of all numeric columns for each group

Note

This attempts to average all columns. For better control, use agg() with specific columns.

min() PolarsDataFrame[source]

Minimum value of all columns in each group.

n_unique() PolarsDataFrame[source]

Count distinct values for all columns in each group.

std() PolarsDataFrame[source]

Standard deviation of all numeric columns in each group.

sum() PolarsDataFrame[source]

Sum all numeric columns in each group.

Returns:

PolarsDataFrame with sum of all numeric columns for each group

var() PolarsDataFrame[source]

Variance of all numeric columns in each group.