Apache Calcite Interview Questions and Answers

Apache Calcite is a dynamic data management framework.

It contains many of the pieces that comprise a typical database management system, but omits some key functions: storage of data, algorithms to process data, and a repository for storing metadata.

Calcite intentionally stays out of the business of storing and processing data. As we shall see, this makes it an excellent choice for mediating between applications and one or more data storage locations and data processing engines. It is also a perfect foundation for building a database: just add data.


1) How can I push down project, filter, and aggregation operations to a TableScan in Calcite?


Answer) Creating a new RelOptRule is the way to go. Note that you shouldn't try to remove nodes directly inside a rule. Instead, you match a subtree that contains the nodes you want to replace (for example, a Filter on top of a TableScan), and then replace that entire subtree with an equivalent node that pushes down the filter.

This is normally handled by creating a subclass of the relevant operation which conforms to the calling convention of the particular adapter. For example, in the Cassandra adapter, there is a CassandraFilterRule which matches a LogicalFilter on top of a CassandraTableScan. The convert function then constructs a CassandraFilter instance. The CassandraFilter instance sets up the necessary information so that when the query is actually issued, the filter is available.

Browsing some of the code for the Cassandra, MongoDB, or Elasticsearch adapters may be helpful as they are on the simpler side. I would also suggest bringing this to the mailing list as you'll probably get more detailed advice there.
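The match-and-replace pattern described in this answer can be sketched as a toy model. This is only an illustration under stated assumptions: Rel, Scan, Filter and FilterIntoScanRule are hypothetical stand-ins written for this example, not Calcite classes, and the predicate is a plain string rather than a real expression tree. In a real Calcite rule the replacement node would be handed back to the planner (via call.transformTo) instead of being returned.

```java
import java.util.Optional;

/** Toy relational node; a hypothetical stand-in for a planner node. */
interface Rel {}

/** A table scan that may carry a predicate to be evaluated by the storage layer. */
final class Scan implements Rel {
    final String table;
    final String pushedPredicate; // null when nothing has been pushed down

    Scan(String table, String pushedPredicate) {
        this.table = table;
        this.pushedPredicate = pushedPredicate;
    }
}

/** A filter sitting on top of some input node. */
final class Filter implements Rel {
    final String predicate;
    final Rel input;

    Filter(String predicate, Rel input) {
        this.predicate = predicate;
        this.input = input;
    }
}

/** Matches Filter(Scan) and replaces the whole subtree with one equivalent Scan. */
final class FilterIntoScanRule {
    /** Returns the replacement subtree, or empty if the rule does not match. */
    static Optional<Rel> apply(Rel node) {
        if (!(node instanceof Filter)) {
            return Optional.empty();
        }
        Filter filter = (Filter) node;
        if (!(filter.input instanceof Scan)) {
            return Optional.empty();
        }
        Scan scan = (Scan) filter.input;
        if (scan.pushedPredicate != null) {
            // Keep the toy simple: do not merge with an already pushed predicate.
            return Optional.empty();
        }
        // Nothing is removed in place; the matched subtree is replaced as a unit.
        return Optional.of(new Scan(scan.table, filter.predicate));
    }
}
```

The point of the sketch is that the rule never edits the Filter or the Scan it matched. It builds a new node that is equivalent to the whole matched subtree, which is the shape an adapter rule such as CassandraFilterRule follows.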


2) I would like to use the raw Apache Calcite API without going through JDBC connections. The JDBC API works fine for me, but I get null pointer exceptions when I try to use the API directly.


Answer) The cause is not obvious. You need to pass the internalParameters that you get back from the prepare call into your DataContext, and look them up in get. Calcite apparently uses these parameters to pass the query object around. You probably want to implement the other DataContext keys (current time, etc.) as well.


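import java.util
import org.apache.calcite.DataContext
import org.apache.calcite.prepare.CalcitePrepareImpl
import org.apache.calcite.schema.SchemaPlus

final class MyDataContext(rootSchema: SchemaPlus, map: util.Map[String, Object])
    extends DataContext {
  // Look up Calcite's internal parameters (and any other keys) by name.
  override def get(name: String): AnyRef = map.get(name)

  ...
}

// ctx is your AdapterContext from the question
val prepared = new CalcitePrepareImpl().prepareSql(ctx, query, classOf[Array[Object]], -1)
// Hand the internal parameters from the prepare call to the DataContext.
val dataContext = new MyDataContext(
  ctx.getRootSchema.plus(),
  prepared.internalParameters
)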


3) How can I change Calcite's default SQL grammar to support a statement such as "select func(id) as (a, b, c) from xx;"?


Answer) To change the grammar accepted by the SQL parser, you will need to change the parser. There are two ways of doing this.

The first is to fork the project and change the core grammar, Parser.jj. But as always when you fork a project, you are responsible for re-applying your changes each time you upgrade to a new version of the project.

The second is to use one of the grammar expansion points provided by the Calcite project. Calcite's grammar is written in JavaCC, but it first runs the grammar through the FreeMarker template engine. The expansion points are variables in the template that your project can re-assign. For example, if you want to add a new DDL command, you can modify the createStatementParserMethods variable, as is done in Calcite's parser extension test:

# List of methods for parsing extensions to "CREATE [OR REPLACE]" calls.
# Each must accept arguments "(Span span, boolean replace)".
createStatementParserMethods: [
  "SqlCreateTable"
]

Which of these approaches should you use? Definitely use the second if you can, that is, if your grammar change occurs in one of the pre-defined expansion points. Use the first only if you must, because you will run into the problem of maintaining a fork of the grammar.

If possible, see whether Calcite will accept the changes as a contribution. This is the ideal scenario for you, because Calcite will take on responsibility for maintaining your grammar extension. But they probably will only accept your change if it is standard SQL or a useful feature implemented by one or more major databases. And they will require your code to be high quality and accompanied by tests.


4) I have a simple application that does text substitution on literals in the WHERE clause of a SELECT statement. I run SqlParser.parseQuery() and apply .getWhere() to the result.

However, for the following query the root node is not an SqlSelect, but an SqlOrderBy:


select EventID, Subject
from WorkOrder
where OwnerID = 100 and Active = 1 and Type = 2
order by Subject

If we use "group by" instead of "order by" then the root is an SqlSelect as expected.


Is this the intended behaviour?


Answer) Yes, this is intended. ORDER BY is not really a clause of SELECT. Consider:


SELECT deptno FROM Emp
UNION
SELECT deptno FROM Dept
ORDER BY 1

The ORDER BY clause applies to the whole UNION, not to the second SELECT. Therefore we made it a standalone node.

When you ask Calcite to parse a query, the top-level node returned can be a SqlSelect (SELECT), a SqlOrderBy (ORDER BY), a SqlBasicCall (UNION, INTERSECT, EXCEPT or VALUES) or a SqlWith (WITH).
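For the literal-substitution use case in the question, one way to cope with the different root nodes is to walk down from the root until the SELECTs are reached. The following is a minimal sketch of that idea as a toy model. QueryNode, SelectNode, OrderByNode, SetOpNode and SelectFinder are hypothetical classes invented for this example, not Calcite's parse tree, and WITH and VALUES are left out to keep it short. With Calcite itself the analogous step would be to check the type of the parsed node and descend into the query that a SqlOrderBy wraps.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy parse-tree node; a hypothetical stand-in for a parsed SQL node. */
interface QueryNode {}

/** A plain SELECT with an optional WHERE clause (kept as text here). */
final class SelectNode implements QueryNode {
    final String where; // null when the SELECT has no WHERE clause

    SelectNode(String where) {
        this.where = where;
    }
}

/** ORDER BY as a standalone node that wraps the query it sorts. */
final class OrderByNode implements QueryNode {
    final QueryNode query;
    final String orderList;

    OrderByNode(QueryNode query, String orderList) {
        this.query = query;
        this.orderList = orderList;
    }
}

/** A set operation such as UNION, INTERSECT or EXCEPT over several queries. */
final class SetOpNode implements QueryNode {
    final String kind;
    final List<QueryNode> operands;

    SetOpNode(String kind, List<QueryNode> operands) {
        this.kind = kind;
        this.operands = operands;
    }
}

final class SelectFinder {
    /** Collects every SELECT under the root, in left-to-right order. */
    static List<SelectNode> selects(QueryNode root) {
        List<SelectNode> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(QueryNode node, List<SelectNode> found) {
        if (node instanceof SelectNode) {
            found.add((SelectNode) node);
        } else if (node instanceof OrderByNode) {
            // ORDER BY is not part of the SELECT: step into the wrapped query.
            collect(((OrderByNode) node).query, found);
        } else if (node instanceof SetOpNode) {
            for (QueryNode operand : ((SetOpNode) node).operands) {
                collect(operand, found);
            }
        }
    }
}
```

Collecting a list rather than a single SELECT is a deliberate choice: for the UNION example above, both branches may have a WHERE clause that needs substitution.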


