“People often fail to optimize their stored procedures because they don’t understand the performance implications from a joining perspective.” – Stewart Meyers, CTO
Stored procedures (SPs) are a type of programmable data object found in relational databases, and their programmability makes them a powerful means of manipulating tables for reporting purposes. They’re more powerful than views, which can only join, filter, and perform basic calculations on tables. But there’s a catch: stored procedures cannot be joined to each other or to other data objects in the database. This makes them different from other types of data objects — different enough to cause performance issues if used unwisely.
This guide explains how Exago BI handles SPs and offers best practices for leveraging them, helping admins and DBAs circumvent performance problems before they happen.
Best Practice: Use SPs to create a data warehouse rather than standalone data objects.
SPs are well utilized as part of data normalization and ETL, a grooming process that prepares raw, transactional data for reporting. They can be programmed to eliminate table redundancy, scrub the data, manipulate it, and generally make it more intelligible to analysts. More importantly, however, an SP can either return a single data set or multiple data sets. When a SP is used for ETL and returns multiple tables, the resulting dataset can be stored and used as a data warehouse.
A major downside to using data warehouses, however, is data latency. While the raw data is always current and up-to-date, the warehouse is only current right after the SP executes. Even if the SP is set to update the warehouse multiple times per day, there will still be data latency between executions.
One way to combat this problem is by triggering the SP to execute as soon as a change (or a set of specific changes) has been made to the raw data. For example, if a database collects information for a set of hospitals, it might not be as important to have real-time updates on employee addresses as it would be to have current patient health statuses and room numbers. In this case, a latency trigger would be placed on patient data objects so that changes in patient data would cause the SP to execute and update the data warehouse, preventing reporting information from going stale.
This is the most performant means of utilizing SPs, as it enables Exago BI to report directly off of the warehouse tables. In the event, however, that it becomes necessary to report off SPs directly (as opposed to off SP-generated data warehouse tables), SP joining restrictions come back into play. Once the data set is passed from the database over the network to the Exago BI server, Exago BI treats it like any other object in that admins can join it to tables, views, or even other SPs if they like. Ideally, these SPs would suffice on their own and not need to be joined to other objects.
If this is the case, there are a couple of ways to optimize the SP’s performance. Because databases don’t support the filtering of SPs natively, you’ll want to configure Exago’s Programmable Object Settings (see Filtering SPs below) to boost filtering efficiency. It’s also a good idea to program the SP using the columnStr parameter, which can be used to limit the number of columns returned to a select subset of the total. With both of these measures taken, a standalone SP will pass only the requisite data over the network in response to a query.
If, however, admins do need to join the SPs to other objects, it’s important to understand that in most cases, this means that the SP-generated data set must be passed over the network in its entirety each time it is called, even if only a few rows will appear on the report. This not only clogs up the network but also taxes the server, which must then allocate processing power to joining the SP locally.
The following sections offer tips for mitigating such performance pitfalls.
Best Practice: Use “high-overlap” joins.
If the top figure in the image below represents a “high-overlap” inner join between two SPs, then the bottom figure represents a “low-overlap” inner join. High-overlap and low-overlap aren’t industry terms, but for the purposes of this discussion, a high-overlap join is one where a high percentage of rows from both objects remain after they are joined to each other, and a low-overlap join is its converse.
Let’s say that SP A produces a data set with 50K rows, and SP B returns a data set with 55K rows. If there’s a lot of overlap between these SPs, as depicted in the top figure, then there’s not that much more data coming over the network than is needed for the report output. For example, if the SPs have 40K rows in common, only 24% of the transferred data goes to waste((40K common rows x 2 SPs) / 105K total rows x 100 = 76% of rows used). This constitutes a fairly efficient call to the database.
If, however, there’s little overlap between SPs A and B, as depicted in the bottom figure, then the database call becomes profoundly inefficient, and the network has to manage a lot of traffic for a minimal return. If, for example, these SPs have only 17 rows in common, then over 99% of the data goes to waste ((17 common rows x 2 SPs) / 105K total rows x 100 = 0.03% of rows used).
Avoiding low-overlap joins between stored procedures increases the efficiency of calls to the database and helps prevent lags in performance.
Best Practice: Use Exago BI’s Programmable Object Settings to filter and, where necessary, combine datasets in the database.
Without the help of Exago BI’s Programmable Object Settings, configured in the Admin Console, filtering an SP joined to another SP would require returning both SPs from the database in their entirety, joining them, and then filtering the result. If the join and filter were highly exclusive, displaying only a few rows of data in the report output, the process would be deeply inefficient and tax both the network and the server.
Programmable Object Settings can help shift the workload from the server to the database. The two settings of interest in this case are Filter Parameter Name and Full Filter Parameter Name. These are the names of the SP parameters Exago BI should pass its filter strings through. With Filter Parameter Name, the data passed is limited to the object being called, but with Full Filter Parameter Name, it must apply to all objects — not just the one being called, but all programmable objects joined to it as well. Using a universal filter parameter requires some foresight on the part of the DBA or programmer writing the SPs.
To see how these settings improve processing time, let’s say that we have only the Filter Parameter Name set in the admin console. SPs A and B are joined together, and we’re passing a filter to each SP using an AND operator (e.g., SELECT * WHERE SP_A.City = “Los Angeles” AND SP_B.Date = 4/5/16). In this instance, because the filter string is inclusive and Filter Parameter Name is enabled, Exago BI can inject the filter query into the SP’s code and have the database handle filtering.
If, however, the filter is complex (contains grouping logic) or is exclusive (includes an OR operator), just having Filter Parameter Name isn’t enough to alleviate strain on the network and server. This is because in complex and exclusive filtering situations, the objects must be joined before they are filtered, and SPs cannot be joined in the database. This forces both joining and filtering to happen on the server.
Providing a Full Filter Parameter Name is the best way around this. Identifying a universal filter parameter in Exago BI allows the application to inject both an all-inclusive filter string into an SP as well as modify its joining code, sidestepping the SP join limitation by causing SPs to merge during execution rather than after. As a result, both filtering and joining can happen in the database, regardless of filter complexity.
Best Practice: If using MS SQL Server, use Table-Valued Functions instead of SPs.
Table-Valued Functions (TVFs), available only in MS SQL Server, are programmable data objects that can join to other objects in the database. Exago BI treats TVFs like standard tables and cannot pass parameter values to them. TVFs are the more performant option when joining multiple programmable objects together, especially if they are large. To learn more about how Exago BI interacts with TVFs, visit our Knowledge Base.