A major component of Exago BI’s report execution process is our SQL generation engine. Exago—I’ll drop the “BI” for brevity from here on out—can effectively serve as a relational database, performing operations in memory when necessary. However, we still want to leverage computation at the data source when we can be confident that it will be more performant than doing the same operations in memory. In recent years, we have seen increased demand for reporting on large datasets, including those stored in data warehouses like Snowflake and Amazon Redshift. Given this trend, we have decided to give ongoing investment to the improvement of our SQL generation engine. What follows are three new enhancements released in Exago v2020.1 that let certain types of reports take much better advantage of summarization in the database.
While report designers have been able to group data using Exago’s Excel-like formula language for years, adding such a group would disqualify reports from in-database summarization, forcing Exago to query for the unsummarized dataset and reduce it in memory. As of v2020.1, Exago now supports converting group formulas to SQL in order to do formula-based summaries right in the query.
Take the common use case of summarizing the number of records in each year. Imagine we have a table of episodes from a popular science fiction program featuring a distinguished former member of the Royal Shakespeare Company. If we wanted to build a report to count the number of episodes that aired per year, Exago can now generate SQL that looks like the following:
GROUP BY YEAR(Episode.AirDate)
This leverages our ability to convert Exago formulas used on a report to SQL expressions, and can even work with administrator-defined functions if the appropriate SQL mapping is provided. Reports can also use groups based on more complex formulas and still take advantage of this feature.
Summarizing formula calculations
Often, a report designer wants to summarize not just on a database column, but on a calculation. Let’s say we want to build a report that summarizes in how many episodes each character appeared, giving “extra points” to our favorite episode in the series. Simply design a report using the appropriate summary function, and Exago will now push the calculation to the database, generating SQL like the following:
SUM(CASE WHEN Episode.Name = ‘The Inner Light’ THEN 2 ELSE 1 END)
INNER JOIN Episode
GROUP BY Character.ID
There are some restrictions to the formula that can be summarized; for example, if the summarized formula contains a cell reference that refers to a “more outer” group section than the calculation cell, summarization will happen in memory. But many real-world use cases will be able to realize the benefit.
Multiple data sources and grouping
Before v2020.1, Exago would unconditionally disqualify a report from in-database summarization if the report used data from more than one data source. This is no longer true, and if the report otherwises qualifies for in-database summarization, Exago will leverage the database when it can guarantee correctness of the resulting data set.
A big win for report performance
Any one of the aforementioned enhancements can improve report execution runtime by orders of magnitude for certain cases. These types of improvements are exciting for the team, and represent a lot of work carefully considering the many different ways our clients use the application, ensuring correctness in all cases, and striving to have automated test coverage for all permutations of our SQL generation features.
If you’re interested in observing the new features, turn on the “Show SQL Window” configuration setting, and open the “Show SQL” dialog in the Advanced Designer to preview the query that will be generated for a given report design.