One important aspect of creating a great QlikView or Qlik Sense application is bringing only the appropriate data into it. For example, if we are analyzing call center data for the last 12 months, we don’t want to bring in data from the 1990s, when we first started collecting it; we should limit the load to the last 12 months. Secondly, using the same example, some call center employees have long since retired or moved on from the company. If we are only loading data from the last 12 months, we really only want to analyze employees that had activity during that time and ignore those who have left. The following example is done in QlikView but could easily be replicated in Qlik Sense. You can download the qvw here: Filtering-Data.qvw. Please note that the data source is MSSQL 2016 and I have used the free sample database Wide World Importers.
Why Should You Remove Data From Your Qlik Applications?
So why do we care if there is some extra data? There are a few important reasons.
Including non-relevant data complicates the user experience. Say, for example, we have to load our buyers into an app that analyzes a company’s procurements. If the buyer names are found in the Employees table, we don’t want the user to sift through 2,500 employees to identify the 6 buyer names. It creates a better user experience if we include only the 6 buyers rather than the entire employee list.
QlikView performs better when it is “lean and mean”: with fewer data points in the data model, the calculation engine runs faster and the server consumes less RAM for that application. This becomes more critical as data volumes grow.
What Tools Do You Need?
You must download the QV Document Analyzer, brought to us by the amazing Rob Wunderlich. It is regularly updated as QlikView evolves, so check back once in a while. The QV Document Analyzer is itself a QlikView document (qvw) that reports all sorts of things about another qvw’s memory use and performance. We will use it in this post to measure how fast calculations complete and how much memory our application consumes. It will also tell us if there are unused fields in our data that can be removed.
Look at your data in terms of rows and columns. Rows refer to the number of records in each table; columns refer to the distinct field names in your data model tables. We will start with rows. There are several techniques that can be used to limit the rows in your application, and we will cover the most common ones.
Explicit Where Clauses
When it comes to a fact or transaction table, we should logically decide how to limit the record count. Often, this will be controlled with a time element. The primary fact table in our example is the Sale table, and the business requirement is to analyze only the current year and the full prior year, so we will limit the data with a where clause. I will do this in two steps so that the script is easy to maintain should the business requirement change: first, define the date limit in one place; second, load the actual Sale table using a where clause that filters against it. Note that because the where clause is happening in the select statement, we are using MSSQL syntax, and the database itself reduces the query. You have a choice of where to put the where clause (corny, I know): it can go at the end of the load statement or at the end of the select statement. Generally, it is best to put it in the first possible step so that subsequent steps are already working with a reduced set. To summarize, I would generally recommend using an explicit where clause to limit the rows in your fact table. For dimension tables, I recommend keeping only the rows that are relevant to the fact table. This means you should load the fact table first, so that when you subsequently load your dimension tables, you can filter them against the rows left in the fact table. We will cover a few techniques to accomplish that.
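As a sketch, the two steps might look like the script below. The table and field names (Fact.Sale, [Invoice Date Key]) are assumptions based on the Wide World Importers sample, not necessarily the exact script in the qvw.

```qlik
// Step 1: put the date rule in a variable so it can be changed in one place.
// Current year plus the full prior year = everything on or after Jan 1 of last year.
LET vMinDate = Date(MakeDate(Year(Today()) - 1, 1, 1), 'YYYY-MM-DD');

// Step 2: load the fact table. Because the where clause sits in the SELECT,
// it uses MSSQL syntax and the database returns only the reduced set.
Sale:
LOAD *;
SQL SELECT *
FROM Fact.Sale
WHERE [Invoice Date Key] >= '$(vMinDate)';
```

Putting the filter in the SELECT means less data crosses the network; the same filter placed at the end of the preceding LOAD would use QlikView syntax instead, but the database would still send every row.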
First, a dimension table is a table that adds context to your facts; tables with customer information, geographical data or item master data are good examples. In the Item table below, we will use Left Keep to limit the data to the records that associate to the Sale table. Notice that the left keep refers to the Sale table. Another way to explain this: QlikView will pretend to join the Item table to the Sale table, using like-named fields to figure out what the joined set would be, but after figuring that out, QlikView will NOT actually join the two tables. The Left part is what does the data reduction. Because it is a left keep, we retain all the records in the left table (Sale), lose the unmatched records in the right table (Item), and Keep the tables separate rather than Joining them together.
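A minimal Left Keep sketch, again assuming Wide World Importers-style names; the shared key field must be spelled identically in both tables for QlikView to associate them:

```qlik
// Sale is already loaded (and already reduced). Left Keep (Sale) trims Item
// down to the rows whose key values appear in Sale, but keeps the two
// tables as separate tables in the data model.
Item:
LEFT KEEP (Sale)
LOAD *;
SQL SELECT *
FROM Dimension.[Stock Item];
```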
The Where Exists clause asks QlikView to compare the field currently being loaded with a previously loaded field. Where a match is found, the entire row is loaded; where there is no match, the entire row is discarded. In the first example below, notice the parentheses that denote the field to be compared. When used with one parameter, we expect the field we are loading to exactly match the name of a previously loaded field.
In the second example below, note that we refer to two fields: the first is the previously existing field, and the second is the field we are currently loading.
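Sketches of both forms are below; the field and table names are illustrative, not taken from the actual qvw.

```qlik
// One parameter: the field being loaded, [Customer Key], has exactly the
// same name as a field already in memory (loaded earlier in the Sale table).
Customer:
LOAD *
WHERE Exists([Customer Key]);
SQL SELECT *
FROM Dimension.Customer;

// Two parameters: the first is the existing in-memory field ([Salesperson Key]
// from the Sale table); the second is the field arriving in the current load.
// Note the second parameter uses the original source name, not the alias.
Employee:
LOAD [Employee Key] AS [Salesperson Key],
     Employee
WHERE Exists([Salesperson Key], [Employee Key]);
SQL SELECT [Employee Key], Employee
FROM Dimension.Employee;
```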
A few notes regarding Where Exists:
- If you make it Where Not Exists you will load the values that are not matched, which might not be helpful in this context but is good to know.
- When using the two-parameter version, the second parameter asks for the original name of the field. Therefore, if you are aliasing the field in the upper part of the load statement, list the original field name in the Exists clause rather than the alias.
- You can also use an expression as the second parameter. Sometimes the comparison you are making requires two or more fields combined; feel free to build a concatenated key inside the second parameter.
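The last note can be sketched like this. The composite key is hypothetical, and it assumes a field OrderLineKey was built with the same expression when the fact table was loaded:

```qlik
// Keep only detail rows whose composite key exists in the fact table.
// OrderLineKey was created earlier as [Order ID] & '|' & [Line Number].
OrderDetail:
LOAD *
WHERE Exists(OrderLineKey, [Order ID] & '|' & [Line Number]);
SQL SELECT *
FROM Sales.OrderDetail;
```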
When should you use Where Exists and when should you utilize Left Keep? I typically use the Where Exists method because it usually runs faster than the Left Keep method. Also, for me, it is simpler to code because I don’t have to think too hard about how QlikView might associate fields in the Left Keep scenario. But if you have a situation where the field name you are filtering on already has been loaded into memory more than once, Where Exists uses the entire list of values to determine the reduction, which might not give you the required result. In those cases, Left Keep might be better since you will specify an exact table to filter on.
What are the Results of Our Row Reduction exercise?
I first reduced the fact table and then reduced all the dimension tables to sets that were 100% relevant to the fact table, using either left keep or where exists. So let’s measure the improvements thus far. I used the QV Document Analyzer to look at my qvw before and after the changes. As you can see below, the effects were dramatic in both calculation times and RAM footprint.
So far, we have spent our time getting rid of rows. Now we will focus attention on eliminating unneeded columns, or fields. Looking back at the results above, you can see another startling fact: our application has 53 unused fields. If you move to the Fields tab of the QV Document Analyzer, you can see where fields are being used in the dashboard and where they are not. The bottom line is that if a field is not being used as a key, a chart dimension, in an expression or in a list box, we probably don’t need it in our application. At the bottom of the page, you will find a ready-made drop field statement. Pasting this at the end of your script will drop all the unnecessary fields from your application.
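The generated statement looks something like this (the field names here are placeholders; always review the real list before pasting):

```qlik
// Pasted from the Document Analyzer, reviewed, and placed at the end of the script.
// Comment out any field you expect to need later.
DROP FIELDS
    [Delivery Instructions],
    [Internal Comments],
    // [Customer Phone],   // keeping this one - users may want to search it
    [Lineage Key];
```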
A few notes regarding dropping fields from your application:
- Always review the drop field list and comment out any fields that you might still need. As QlikView changes, the QV Document Analyzer might not catch every kind of field use.
- Sometimes, I will leave a field in even if it is not being used because I anticipate that a user might want to search and select from that data.
- Although it is super easy to paste in a drop field statement, it is better still to not bring in those fields in the first place. Commenting out fields in your load statement prevents QlikView from loading them at all; this saves reload time and publisher RAM, since the network drags back less data.
- In fact, in our SQL SELECT statements we should probably avoid “Select * …” and instead list out only the fields we need to begin with from our database. This will create even more efficiency since we are not loading all the columns from the database to begin with, saving resources on our DB server and our QV server.
In the example below, most of the fields are commented out. We only really want 6 of the 14 columns.
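A sketch of keeping 6 of 14 columns is below, combining both ideas: unneeded fields commented out of the LOAD and, better still, omitted from the SELECT entirely. The column names are illustrative.

```qlik
Item:
LOAD
    [Stock Item Key],
    [Stock Item],
    Color,
    // Barcode,          // not used in any chart, expression or list box
    // [Buying Package],
    [Selling Package],
    [Unit Price],
    [Tax Rate];
SQL SELECT
    [Stock Item Key],     -- listing only the 6 needed columns instead of SELECT *
    [Stock Item],
    Color,
    [Selling Package],
    [Unit Price],
    [Tax Rate]
FROM Dimension.[Stock Item];
```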
What are the Results of Our Column Reduction exercise?
After commenting and dropping fields from our script, we ended up with only 23 fields. Take a look at this data model. Magnificent!
Finally, we will run our app through the QV Document Analyzer one more time. Note that we have further reduced our memory footprint. Calculation time was not materially affected, because calculation time only builds for fields that are being filtered or aggregated in the display. I created a dashboard with a few sheets to help with our exercises. Again, if you want this app, you can get it here: Filtering-Data.qvw. Another great option for comparing your baseline and optimized applications is to combine the Document Analyzer with Rob Wunderlich’s DA Compare Tool. This QV app allows you to compare two or more versions of your apps to directly measure the results of your changes. Good stuff!
I always spend a little time optimizing my QlikView applications to make them as efficient and usable as possible. The most important step in this process is stripping the data down to only what is needed. There are lots of other things we can do to optimize our documents, but we will save those for another post. If you have other techniques that efficiently reduce the data in your QlikView documents, I would love to hear them in the comments below. Happy Qliking!