Tim Dumol Tim Dumol - 4 months ago 5x
Python Question

How to TABLESAMPLE with SQLAlchemy?

Is it currently possible to

on some of the tables in an ORM query in SQLAlchemy?

The closest I could find is
, but that places the suffix on the end of the query, rather than on the end of a table.

If it helps, this is my use case:

My query is currently:

query = session.query(A).options(
).filter(A.id >= min_id, A.id < max_id, [...])

and I would like to
on the table
so that I could efficiently load only a subset of
and their corresponding relations, with some filters. I am using PostgreSQL 9.5 as my database.



SQLAlchemy 1.1 and newer support TABLESAMPLE for selectables. FromClause elements have a method FromClause.tablesample() (think Tables) and sqlalchemy.tablesample() function can be used with all selectables, for example declarative model classes:

from sqlalchemy import tablesample, func
from sqlalchemy.orm import aliased

# Create an alias for A that uses SYSTEM sampling method (default)
a_sampled = aliased(A, tablesample(A, 2.5))

# Create an alias for A that uses BERNOULLI sampling method
a_bernoulli = aliased(A, tablesample(A, func.bernoulli(2.5)))

There's a slight asymmetry in that tablesample(A, ...), where A is a declarative class, returns a TableSample from-clause, which has to be then aliased if used as a model. Otherwise it acts like a FromClause, e.g. the Table underlying the model class.


There seems to be no discussion about TABLESAMPLE support at the time of writing this. As SQLAlchemy is very extensible, here's a very simple (read: may not work for all use cases) implementation of tablesample support:

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql import Alias, FromClause
from sqlalchemy.orm import aliased

class TableSample(Alias):

    __visit_name__ = 'tablesample'

    def __init__(self, selectable, argument=None, method=None,
        super(TableSample, self).__init__(selectable)
        self.method = method
        self.argument = argument
        self.seed = seed 

def tablesample(element, argument=None, method=None, seed=None):
    if isinstance(element, FromClause):
        return TableSample(element, argument=argument, method=method, seed=seed)
        return aliased(element, TableSample(element.__table__,
            argument=argument, method=method, seed=seed))

def compile_tablesample(element, compiler, **kwargs):
    s = "%s TABLESAMPLE %s(%s)" % (
        compiler.visit_alias(element, **kwargs),
        element.method or 'SYSTEM',

    if element.seed:
        s += " REPEATABLE (%s)" % compiler.process(element.seed, **kwargs)

    return s

The argument should be for the time being just a float representing a percentage between 0 and 100 for convenience, though PostgreSQL would accept any real-valued expression. The seed on the other hand is compiled and so literal python values have to be wrapped with literal() or the like.

It should be used in similar fashion to aliased:

a_sampled = tablesample(A, 2.5)

query = session.query(a_sampled).options(
).filter(a_sampled.id >= min_id, a_sampled.id < max_id, [...])