Parallel execution
In my previous post I briefly touched the subject of parallel execution in a spreadsheet-like desktop application. I’d like to go a bit deeper on that subject.
Let’s say you have a spreadsheet. It consists of N rows and M columns. The columns 0…M – X are raw data – they most likely come from a different source but thats really not the point. The rest is the interesting part. They are calculated using formulas – most people have done this a million times in spreadsheets – it’s really not that big a deal.
Lets say I have a column C1, it is calculated using a formula. So C1 = C2 + C3. It’s a not-so-complicated formula – but that is not the point. I have many calculated columns. And they all use other columns in their formulas – all these formulas need to be calculated to show me the full dataset of N rows and M columns. This means that the program making the calculations are making (a minimum of) N * X calculated cells.
See – humans are very organized when they work with spreadsheets. Usually we don’t clutter things up all that much. Which means that column CN usually aren’t calculated by a formula using any column CM where M > N. This comes very natural – we mentally execute from left to right (western-style) and don’t really like forcing our minds to do otherwise.
This leads to the normal execution style. We execute the calculation of new columns from left to right doing on at a time. Which is very time consuming. So now we have a new approach – which isn’t exactly rocket science – but it’s worth considering when you develop applications like ours.
Given a set of columns that needs to be calculated using formulas that contain reference statical columns we build a dependency graph. We apply a neat trick at the beginning. The static data is assembled in one node. If a formula reference any column on the static data we make a dependency to that node. There are columns that A: Don’t reference any column (these are rare – at least in the software that we develop), or B: Don’t reference any static column – which is very common for calculated columns referencing only other calculated columns. Consider the following dependency graph:
Here node #1 is our base data – we can’t and won’t do anything about that one! Node #2 is a calculated column using a simple formula utilizing the data provided by node #1. Node #3 and node #4 are based solely on the data provided in the column created by node #2. Node #5 represents a column created with the input of column derived from node #3 and #4.
To deliver a full dataset to the user every node must be processed. This is the time consuming part and therefore we are looking for ways to speed up this process. In this case we have nodes #2-5 that needs processing (remember the base data in node #1 is static). The simple way to deliver a better-than-sequential result is to move processing of individually independent nodes to parallel execution units. In fact – my solution to this problem is to actually start a calculation thread per node. Each of these threads will be waiting on the threads calculating the dependencies of the node. By doing this the execution will run in parallel – and hopefully this will result in a performance gain considering the overall execution speed of the complete graph calculation. On the graph presented node #2 will be the first to get processed. It’s only dependency is the root – which needs no processing. Right when the calculation of node #2 is done the threads controlling processing of node #3 and #4 will start in parallel – these two only have one dependency which has already been processed. Finally – when both node #3 and #4 are done – the calculation of node #5 will begin, as both #3 and #4 are dependencies of #5.
Keeping a graph structure is a also a great benefit in other ways. An example is how it allows one node to be changed – forcing a re-processing of all dependent nodes by having a recursive “dirty” marking scheme.




[...] a previous post I wrote about how we could do some performance tuning by developing applications suited for [...]