inferno inferno - 5 months ago 47
Java Question

How to push down project, filter, aggregation to TableScan in Calcite

I am using Apache Calcite to implement a distributed OLAP system, which datasource is RDBMS. So I want to push down the project/filter/aggregation in

tree to
MyTableScan extends TableScan
. In
, a
to get the pushed
. At last,
to generate the Query to the source database. At the same time, the project/filter/aggregation in original
tree should be moved or modified.

As I known, Calcite does not support this feature.

Current limitations: The JDBC adapter currently only pushes down table scan operations; all other processing (filtering, joins, aggregations and so forth) occurs within Calcite. Our goal is to push down as much processing as possible to the source system, translating syntax, data types and built-in functions as we go. If a Calcite query is based on tables from a single JDBC database, in principle the whole query should go to that database. If tables are from multiple JDBC sources, or a mixture of JDBC and non-JDBC, Calcite will use the most efficient distributed query approach that it can.

In my opinion,
may be a good choice. Unfortunately, when I create new
, I can not easily find the parent node to remove a node.

is a good choice? Anyone has a good idea to implement this feature?



Creating a new RelOptRule is the way to go. Note that you shouldn't be trying directly remove any nodes inside a rule. Instead, you match a subtree that contains the nodes you want to replace (for example, a Filter on top of a TableScan). And then replace that entire subtree with an equivalent node which pushes down the filter.

This is normally handled by creating a subclass of the relevant operation which conforms to the calling convention of the particular adapter. For example, in the Cassandra adapter, there is a CassandraFilterRule which matches a LogicalFilter on top of a CassandraTableScan. The convert function then constructs a CassandraFilter instance. The CassandraFilter instance sets up the necessary information so that when the query is actually issued, the filter is available.

Browsing some of the code for the Cassandra, MongoDB, or Elasticsearch adapters may be helpful as they are on the simpler side. I would also suggest bringing this to the mailing list as you'll probably get more detailed advice there.