By Mayank J

How To Reduce Workbook Size And Improve Calculation Speed

This may not be an exhaustive guide to optimising Excel workbooks; I'm sharing some major findings I recently came across while working on a 200 MB Excel-based financial model.

To share the background: we were challenged to prepare a data-intensive financial model in Excel. We had to build a monthly model covering 6 years, containing 100,000 rows of data, and use a sales data book to prepare the financials. As a result, we had no choice but to use formulas over long ranges, which made the workbook take a long time to calculate.
Hence, we challenged some of the basic principles of financial modelling (best practices) to overcome this situation and see whether it would work.

The top 4 points that tilted the balance in our favour were:
  1. Use of OFFSET
  2. Use of Daisy chain
  3. Hardcode the calculated values using macros (with an option to paste the formulas back)
  4. Separate input sheets based on input classification

In today's world, Excel calculation speed matters more than file size. You can still work with a large file, but if calculation speed is hurt by its size, you need to optimise the spreadsheet.

Before touching the above points in detail, let's run through a basic checklist first (a small VBA sketch for a few of these checks follows the list).
a) First and foremost, check whether the last used cell (end cell) is set correctly on each sheet, and reset it across the whole model
b) Remove any external links in your workbook. You may try an external-links locator Excel add-in to find them
c) Avoid conditional formatting as much as possible in such large models
d) Try to avoid objects, and ensure there are no hidden objects either
e) Calculation mode should be ‘Manual’
f) Workbook format should be ‘.xlsb’ instead of the default ‘.xlsx’ or ‘.xlsm’, as it saves a lot of file size
g) Double-check for any circularity in the model. If found, add a circularity breaker or remove it completely
h) There should be no hidden worksheets, rows, or columns in your workbook
i) Check named ranges and data validation to ensure no errors or overly long formulas are present
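
For a few of these checks, here is a minimal VBA sketch (the macro name is mine; adapt it to your model). It only switches calculation to manual, reports hidden sheets, and flags named ranges with broken references; it is an illustration, not a complete audit tool.

    Sub BasicOptimisationChecks()
        Dim ws As Worksheet
        Dim nm As Name

        ' Checklist item (e): switch calculation mode to manual
        Application.Calculation = xlCalculationManual

        ' Checklist item (h): report hidden sheets instead of leaving them unnoticed
        For Each ws In ThisWorkbook.Worksheets
            If ws.Visible <> xlSheetVisible Then
                Debug.Print "Hidden sheet found: " & ws.Name
            End If
        Next ws

        ' Checklist item (i): flag named ranges whose references are broken
        For Each nm In ThisWorkbook.Names
            If InStr(nm.RefersTo, "#REF!") > 0 Then
                Debug.Print "Broken named range: " & nm.Name
            End If
        Next nm
    End Sub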

If following the above spreadsheet optimisation techniques does not result in a significant calculation speed improvement, we can proceed with the advanced list.

  1. Use of OFFSET
    Because precedents referenced through the OFFSET function are difficult to trace, among other drawbacks, the usual advice is not to use it in financial models.
    However, in data-intensive modelling, when a range covers 100,000 rows it is no easy task to track which cell(s) affect your output anyway. For example, if SUMIF has been used over 100,000 rows, you cannot directly see where your condition matches and which cells are picked up; you have to cross-check manually to trace your precedents.

    Hence, in such cases, if we use OFFSET to sum only the block of cells where the condition holds, it can actually reduce calculation time.

    For example, suppose you have 10 stores, each selling 1,000 products, and you want to sum the monthly revenue of each store. Either we use SUMIF, which checks the condition against all 10,000 cells in each of the 72 columns (a 6-year monthly model), or we sum exactly 1,000 cells with no condition check at all.
    We can sort the data by store name and use OFFSET to pick out each store's contiguous block, as in the sketch below. Interesting, huh?
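
    A minimal sketch of the idea, assuming a hypothetical 'Sales' sheet with store names sorted in column A (rows 2 to 10,001) and monthly revenue from column B onwards; the sheet name, layout, and helper columns are illustrative, not from the original model:

        ' Conventional approach: SUMIF checks the condition on all 10,000 rows,
        ' once for every one of the 72 month columns:
        =SUMIF(Sales!$A$2:$A$10001, $A2, Sales!B$2:B$10001)

        ' OFFSET approach on sorted data: find each store's start row and row
        ' count once, in helper columns B and C of the summary sheet ...
        =MATCH($A2, Sales!$A$2:$A$10001, 0)      ' helper column B: first row of the store
        =COUNTIF(Sales!$A$2:$A$10001, $A2)       ' helper column C: its number of rows
        ' ... then each month column (entered once and filled right across the
        ' 72 months) sums only that store's 1,000-row block:
        =SUM(OFFSET(Sales!B$1, $B2, 0, $C2, 1))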

    Therefore, on a large dataset we can use OFFSET to keep calculation speed under control. In addition, it improves overall workbook performance, since each output is linked to a single anchor cell instead of 10,000 cells.
    As a result, where inserting a row or moving a sheet from one position to another used to take 2 minutes, it now takes about a second. Try it and feel the difference.

  2. Use of Daisy Chain
    As per best practice, a calculation or reference should be linked to the source cell rather than to an already-created link.

    The benefit of this rule is that when the user presses ‘Ctrl + [’, Excel jumps straight to the source cell, where the input can be changed directly. Otherwise a linked cell has to be traced several times over to arrive at the source.

    In our case, when we had to link 10,000 cells to a single cell, that sheet or cell became heavy when inserting or deleting rows: on each insertion, Excel has to update the references of all the linked cells. Hence, in such models we have to trade away source-cell traceability in favour of daisy chaining.
    With a daisy chain, inserting a row forces Excel to update only one linked cell instead of a thousand others, which improves workbook performance, as the sketch below shows.
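
    A minimal sketch of the difference, assuming a hypothetical growth-rate input in Inputs!$B$2 that 72 monthly columns (C5, D5, E5, ...) need to carry across:

        ' Hub-and-spoke (best practice): every month links back to the source,
        ' so a structural change near Inputs!$B$2 forces Excel to rewrite 72 references.
        '   C5 (Month 1):  =Inputs!$B$2
        '   D5 (Month 2):  =Inputs!$B$2
        '   E5 (Month 3):  =Inputs!$B$2   ... and so on across 72 columns

        ' Daisy chain: only the first month links to the source; every later month
        ' links to its left-hand neighbour, so only one cell ever references Inputs.
        '   C5 (Month 1):  =Inputs!$B$2
        '   D5 (Month 2):  =C5
        '   E5 (Month 3):  =D5            ' a single relative formula filled right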

  3. Hardcode the calculated values
    In this case you will need a bit of VBA knowledge to write copy-paste macros.
    In our case we had 100,000 rows by 72 columns, i.e. 7,200,000 cells to calculate. The logic starts with revenue and ends with the financial statements, so you may end up calculating many multiples of 7,200,000 cells.

    Further, the cell formulas are still needed (we can't simply hardcode the calculations for good). But if you run a sensitivity analysis on such a heavy model, you might have to wait several minutes (10-60 minutes) for it to calculate.

    This is where a copy-paste macro can help. We hardcode the calculations that do not need updating all the time; when an update is required, we run the macro, which restores the formulas stored elsewhere into the hardcoded cells, recalculates them, and pastes the results back as values.
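
    A minimal VBA sketch of such a macro, assuming (hypothetically) that the live formulas are kept on a hidden 'FormulaStore' sheet mirroring the hardcoded block on a 'Calc' sheet; all sheet names and ranges are illustrative, not from the original model:

        Sub RefreshHardcodedBlock()
            Dim storeRng As Range, calcRng As Range

            ' Hypothetical layout: formulas on "FormulaStore", values on "Calc"
            Set storeRng = ThisWorkbook.Worksheets("FormulaStore").Range("A1:BT100000")
            Set calcRng = ThisWorkbook.Worksheets("Calc").Range("A1:BT100000")

            Application.ScreenUpdating = False
            Application.Calculation = xlCalculationManual

            ' Restore the stored formulas into the hardcoded block
            calcRng.Formula = storeRng.Formula

            ' Recalculate this sheet, then freeze the results as values again
            calcRng.Worksheet.Calculate
            calcRng.Value = calcRng.Value

            Application.ScreenUpdating = True
        End Sub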

    This not only increases calculation speed but also drastically reduces file size. In our case, the 200 MB model came down to about 50 MB (easy to send over email as well).

  4. Separate inputs into different sheets
    In my view, this is the simplest technique to explain.

    According to modelling standards and industry practice, all inputs should be kept in one place, so that no assumptions are missed while auditing the workbook.

    In this case, though, we might have revenue, cost, quantity, price, tax, etc. for each of the 100,000 items. If we keep all the inputs on the same tab, we end up with either 600,000 rows or 100,000 rows spread across several columns. In both cases the worksheet grows heavy and you will struggle to insert rows, extend the timeline if required, or perform any calculation.

    Hence, we can create separate sheets for each input category and one Global Input tab listing all the input areas, so that users can easily navigate and update the inputs without bloating any single worksheet. A small sketch of building such an index follows.
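
    As a sketch of the Global Input tab, here is a short macro that builds a hyperlink index; the sheet-name prefix 'Input_' and the tab name 'Global Inputs' are hypothetical conventions, not from the original model:

        Sub BuildGlobalInputIndex()
            Dim ws As Worksheet, idx As Worksheet, r As Long

            Set idx = ThisWorkbook.Worksheets("Global Inputs")
            idx.Cells.Clear
            idx.Range("A1").Value = "Input area"
            r = 2

            ' Hypothetical convention: every input sheet is prefixed "Input_"
            For Each ws In ThisWorkbook.Worksheets
                If Left(ws.Name, 6) = "Input_" Then
                    ' One hyperlink per input sheet so users can jump straight to it
                    idx.Hyperlinks.Add Anchor:=idx.Cells(r, 1), Address:="", _
                        SubAddress:="'" & ws.Name & "'!A1", TextToDisplay:=ws.Name
                    r = r + 1
                End If
            Next ws
        End Sub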

    Easy to do, right?

In financial modelling we sometimes need to compromise on best practices in certain situations to improve a model's usability. But best practices should always be kept in mind before adopting the above changes. Do write in the comments if you have faced such a situation.