Qlik Roots: Link Table vs. Concatenation – A Case of Mixed Granularity

Featured Image - Link Table vs Concatenate

What Qlik developer has NOT run into challenges involving separate fact tables and synthetic keys. Today, we will explore a version of this problem and the 2 best solutions for solving it, using a link table or concatenating the 2 tables together. In addition, we throw in the issue that the two fact tables differ in granularity.

The Teacher Becomes the Student

Let me first admit that this was a total learning exercise for me. I have taught a regular development class for a specific client using an unchanging set of materials for some years now. Towards the end of day 3, I always cover a case where we create a link table to associate some budget data to sales data where the budget is at a different grain. But a student challenged the notion of using a link table to do this.

Why can’t we just concatenate the Budget data to the Sales table?

I started to immediately explain the reasons why that would be a bad idea and then realized with embarrassment that the concatenation method probably would be fine. So we decided to try the experiment in class. And guess what. It actually worked well. Needless to say, I will be adjusting my materials to include this method. And as humbling as it is to realize “I don’t know it all”, this was just too fundamental to not create a post about it.

Setting up the Scenario

The following example uses the free sample database Wide World Imports which consists of sales data plus some dummied-up forecast data. The tables I used have been converted into QVD and all the files have been zipped up in-case you want to download the example and experiment for yourself. LivingQlikView -Mixed Granularity Fact Tables To switch between the link table and the concatenation method just comment and uncomment the corresponding script tabs. This exercise utilizes QlikView but could easily be adapted to Qlik Sense and I have created a text file that houses the script to give you a start in Qlik Sense. But please note that you will have to create a connection in your library and change the from statements to “lib” statements in the script. I’m won’t go through the entire script here since you can download it above, but instead will focus on the pivotal parts of the script, the data model and the pros and cons of the two methods. The primary focus of the data is sales. We would consider it the primary fact table. Before adding the forecast data to the mix, you can see the dimensional tables associated directly to the sale table. Table_01

Forecast Data: Crosstable

Looking at the forecast spreadsheet, it is apparent that the Year column has been pivoted horizontally. This is great for humans to read, but not so great for a machine to read and organize. Ideally, we want the years to exist as values in a single Year column, which also pivots the forecast amounts into their own column as well. Table_02 The Crosstable function is found in the Table Files wizard after you click on Next a few times. It is simple to use. The real trick is just recognizing the table is in a pivoted format to begin with. You can get more detailed info on this function: Qlik Community: The Crosstable Load Here is the crosstable function in action: Script_01 Before we concentrate on a single associating method, we first need to decide how the forecast data relates to the rest of the data model. It is obvious, looking at the forecast table that Year and SalesRep will be needed to associate the data. Regardless of which method we use to associate the forecast data to the data model, it will be very helpful if we can get both fact tables to contain all the key fields. The sale table does NOT have the Year field, but it does have the Invoice Date Key that drives the calendar table. So we can use the year function to derive the year with confidence. Script_02 For the SalesRep field, we can approach it by adding SalesRep to the Sale table or by adding the Salesperson Key to the forecast table. I decided on the latter and used a mapping load to add the Salesperson Key to the Forecast table. Script_03

Method 1: Link Table

The first method to associate this data is to use a link table. This is basically the solution QlikView creates when left to its own devices in the form of a synthetic key table. I much prefer creating my own link table for several reasons:

  • It forces you to think about the data relationships. Sometimes the fields in the link table should NOT be used to link tables together.
  • You have visibility of the concatenated key. A synthetic key cannot be displayed.
  • You have no option to simplify a synthetic key like you can with a key you construct. I would normally use autonumber to simplify and reduce memory footprint.

The first step to creating this solution is to create your composite key. This is a concatenation of the key fields. This new key must be created in both fact tables. Script_04 Script_05 Now we can create our link table. We are creating a table with the concatenated key and the ingredient fields. We need a row for each unique combination of these values from BOTH the sale table and the forecast table. I use distinct to get only unique values from each table and join to skip over repeats between the two tables. This results in a link table with the least amount of rows to get the job done. I then drop the ingredient fields (Year and Salesperson Key) from each of the fact tables as they are now fully represented in the link table. Script_06 After running the script, I have replaced the synthetic key and I have a working data model. Table_03

Method 2: Single Concatenated Fact Table

I always assumed that because the two fact tables exhibit a different grain, that we could not concatenate them together. The fact is, you MAY create an issue if you don’t do this right. Let me explain. In the data model below, you might notice that SalesRep as well as the key field Salesperson Key are both in the Sale table. If you concatenate the forecast data to this table based on Year and Salesperson Key, the field SalesRep will not be associated to the forecast data. This is because in a concatenation, we are simply appending rows to the bottom of the table so we won’t get the SalesReps populated through the entire table. In the second version, you can see the SalesRep exists in a proper dimensional table hanging off the fact table. When selections are made in this table, the expected filtering will occur in the forecast data.

Table_04

Version 1 – SalesRep will be populated for the Sales facts, but not for the concatenated Forecast rows.

Table_05

Because the SalesRep now hangs off a dimensional table, it will associate to all the fact records.

The other part of the key is date related and could cause an issue due to the mixed granularity. In this case, Forecast is held at the year level while Sales are brought in on a daily basis. But that does NOT create an issue for our calendar in this example, because we can simply put the year in the fact table itself. The calendar table has no larger increment than the year, so it is safe to leave that connected only to fact rows that have an Invoice Date. If Forecast was held at the month, however, this could have created an issue. One way to solve for this would be to create any field that is month are larger in increment within the fact table or in a separate table joined by MonthYear. The other solution would be to create an artificial Invoice Date Key for the Forecast rows for each month where May, 2017 equals 5/1/2017 for example. So the lesson here is that when using the concatenation method, it would be wise to create dimensional tables for any dimensional fields that could be directly derived from the fields shared between the concatenated fact tables. Another possible solution would be to join the dimensional data after the concatenation is completed. In fairness, this issue could also occur with the link table method, but is less apparent because it is likely that keys are shared between the two fact tables. Here is the script for concatenating the forecast to the sale table. It is simple as long as you follow the advice above. Script_07 Here is the finished concatenated data model. Because the common fields were already part of the Sale table, the only visible difference is the added m.forecast field at the bottom. One problem I have actually created here is th

Dealing with Mixed Granularity in the User Interface

Regardless of the method you choose, one thing is certain in the UI. You will not be able to filter a fact by a field more detailed or unrelated to its key. So even though we have very detailed selecting ability for the sales data, when we start selecting individual Months or individual Customers, for example, the forecast data will not filter in a way that appears accurate. This is just a given with mixed grain. In the figure below, a simple selection on Customer completely removes the forecast data.

UI_01

The dark green bars for Forecast are gone!

In fact, that brings up one of the key differences in these two methods. Given simple expressions with no set analysis, the Link Table method will allow the less granular data to fully display even when more detailed selections are made. In the concatenated version, the less granular data will simply disappear when detailed selections are made. A sensible thing to do when you have differing levels of facts would be to “turn off” any dimension selections that won’t work in any chart that includes the less granular fact. In our example we would add some set analysis for all expressions in any chart that includes forecast data.

UI_02

Now we are ignoring the user’s selection for Customer so all bars are at full strength and accurate relative to each other.

This could become somewhat cumbersome as you will need to add every “more granular” field to the modifier. This will unfortunately make for a long expression if you want to be complete. I think it is also necessary to notate somewhere on the screen that these particular charts will not react to some selections because “the forecast is completed on an annual level” for example.

Pros and Cons of Link Tables and Concatenated Tables

Pros and Cons of the Data Model Approaches I think the biggest two factors that will drive the decision to use one method over the other are data complexity and data volume. As data volume grows, you will need to lean on the concatenation method. As data becomes more complex you might need a link table or a hybrid model that utilizes both approaches. I, for one, will definitely start using the concatenation method more freely. I have used it when the granularity is the same but now I know, that issue in itself should not prevent me from concatenating fact tables.

Final Thoughts

The amazing HIC wrote a great blog article that will give you more insight into this topic. https://community.qlik.com/blogs/qlikviewdesignblog/2013/05/17/concatenate-vs-link-table Obviously, I am not the all-knowing guru in the Qlik universe. I welcome your input and feedback on this topic. Happy Qliking!

4 comments… add one
  • Reply Alex Nerush February 15, 2017, 5:08 pm

    The bad performance is another drawback of using linked tables. Expression calculation based on fields from different “fact” tables needs a lot of resources (memory and cpu), thus it needs much more time to get calculation results. Take a look at Qonnections 2015_T37_The QIX Engine-under the hood.pdf which explains all in details. The document is available on the Parnter Portal.

  • Reply Aaron Couron February 15, 2017, 7:03 pm

    Probably didn’t word it well, but that relates to the first item in my “Pros and Cons” list.

  • Reply David Dumas February 20, 2017, 4:11 pm

    This is really not an example of a link table vs concatenate. I always concatenate facts, and have dozens of qlikview apps all of which have several facts across different sources. IMO you can never go wrong by concatenating facts. Link tables are useful for resolving “role playing” dimensions – that’s another topic. In my travels, I had found no other valid use for link tables. What I do in these situations (related to this post) is first disable dimensionality through set analysis for those dimensions not common to certain facts. In this case, customer, item, and city need to be disabled from measures that compute [salesperson forecast amount]. I always have a daily grain calendar. [Forecast Year] can always be resolved to a [Forecast Date] of Ex. 1/1/YYYY. It is also important to educate users that certain measures like [salesperson forecast amount] can only be used in charts that share common dimensions of time and Salesperson.

    • Reply Aaron Couron February 20, 2017, 5:54 pm

      David,
      I think you have valid points. I struggle with two issues. First when beginning a project, sometimes the client provides a flat table that combines facts and dimensional data in one table. This can create issues in a concatenated scenario when the client then decides to add another fact table from a different source (although this can be a problem in the link table scenario too, but to lesser degree). The other issue is really just that it is easier to visualize the associations and granularity when a link table is used.
      I thank you for your comment!

Leave a Comment

Join us at Living QlikView

Never miss a post.

We Never Spam. NEVER.

Show Buttons
Hide Buttons