<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Michael Banzon &#187; algorithm</title>
	<atom:link href="http://michaelbanzon.com/tag/algorithm/feed/" rel="self" type="application/rss+xml" />
	<link>http://michaelbanzon.com</link>
	<description></description>
	<lastBuildDate>Sun, 05 Feb 2012 13:54:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Dataflow programming</title>
		<link>http://michaelbanzon.com/2011/12/19/dataflow-programming/</link>
		<comments>http://michaelbanzon.com/2011/12/19/dataflow-programming/#comments</comments>
		<pubDate>Mon, 19 Dec 2011 14:24:36 +0000</pubDate>
		<dc:creator>mbanzon</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[hardware]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://michaelbanzon.com/?p=219</guid>
		<description><![CDATA[In a previous post I wrote about how we could do some performance tuning by developing applications suited for parallel execution. Please pardon my ignorance &#8211; but I stumbled upon an article in Dr. Dobb&#8217;s about &#8220;Dataflow programming&#8221;, which is actually what this is all about! If you would like to learn more on [...]]]></description>
			<content:encoded><![CDATA[<p>In <a title="Parallel execution" href="http://michaelbanzon.com/2011/09/21/parallel-execution/">a previous post</a> I wrote about how we could do some performance tuning by developing applications suited for parallel execution.</p>
<p>Please pardon my ignorance &#8211; but I stumbled upon <a href="http://drdobbs.com/database/231400148">an article in Dr. Dobb&#8217;s about &#8220;Dataflow programming&#8221;</a>, which is actually what this is all about! If you would like to learn more about the subject or refresh your knowledge, please go read the article &#8211; it describes the problem and the solution very well. You might also want to read the Wikipedia articles on <a href="http://en.wikipedia.org/wiki/Dataflow_programming">Dataflow programming</a> and <a href="http://en.wikipedia.org/wiki/Flow-based_programming">Flow-based programming</a>. If you have more time and a thirst for this subject, please read <a href="http://zone.ni.com/devzone/cda/tut/p/id/6098">this article</a>, which explains why this type of problem solving is suitable for parallel hardware.</p>
]]></content:encoded>
			<wfw:commentRss>http://michaelbanzon.com/2011/12/19/dataflow-programming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parallel execution</title>
		<link>http://michaelbanzon.com/2011/09/21/parallel-execution/</link>
		<comments>http://michaelbanzon.com/2011/09/21/parallel-execution/#comments</comments>
		<pubDate>Wed, 21 Sep 2011 05:48:37 +0000</pubDate>
		<dc:creator>mbanzon</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://michaelbanzon.com/?p=202</guid>
		<description><![CDATA[In my previous post I briefly touched on the subject of parallel execution in a spreadsheet-like desktop application. I&#8217;d like to go a bit deeper into that subject. Let&#8217;s say you have a spreadsheet. It consists of N rows and M columns. The columns 0…M &#8211; X are raw data &#8211; they most likely come from [...]]]></description>
			<content:encoded><![CDATA[<p>In my previous post I briefly touched on the subject of parallel execution in a spreadsheet-like desktop application. I&#8217;d like to go a bit deeper into that subject.</p>
<p>Let&#8217;s say you have a spreadsheet. It consists of N rows and M columns. The columns 0…M &#8211; X are raw data &#8211; they most likely come from a different source, but that&#8217;s really not the point. The remaining X columns are the interesting part. They are calculated using formulas &#8211; most people have done this a million times in spreadsheets, so it&#8217;s really not that big a deal.</p>
<p>Let&#8217;s say I have a column C1 that is calculated using a formula, say C1 = C2 + C3. It&#8217;s a not-so-complicated formula &#8211; but that is not the point. I have many calculated columns, and they all use other columns in their formulas. All these formulas need to be evaluated to show me the full dataset of N rows and M columns. This means that the program has to compute (a minimum of) N * X calculated cells.</p>
<p>See &#8211; humans are very organized when they work with spreadsheets. Usually we don&#8217;t clutter things up all that much. This means that a column Ci usually isn&#8217;t calculated by a formula using any column Cj where j &gt; i. This comes very naturally &#8211; we mentally execute from left to right (western-style) and don&#8217;t really like forcing our minds to do otherwise.</p>
<p>This leads to the normal execution style: we calculate the new columns from left to right, one at a time, which is very time consuming. So now we have a new approach &#8211; which isn&#8217;t exactly rocket science &#8211; but it&#8217;s worth considering when you develop applications like ours.</p>
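<p>As a point of reference, the sequential left-to-right style can be sketched roughly like this. The column names, the sample data and the <code>evaluate_left_to_right</code> helper are all made up for illustration; they are not taken from our application.</p>

```python
# Hypothetical data: two raw columns and two calculated ones, three rows each.
raw = {"C0": [1, 2, 3], "C1": [10, 20, 30]}

# Each formula names its output column, its input columns and a per-row function.
# The list is ordered left to right, so inputs always refer to earlier columns.
formulas = [
    ("C2", ("C0", "C1"), lambda a, b: a + b),
    ("C3", ("C2", "C0"), lambda s, a: s * a),
]


def evaluate_left_to_right(raw, formulas):
    """Compute the calculated columns one at a time, in the order given."""
    columns = dict(raw)
    for name, inputs, fn in formulas:
        # Walk the input columns row by row and apply the formula to each row.
        rows = zip(*(columns[i] for i in inputs))
        columns[name] = [fn(*row) for row in rows]
    return columns


result = evaluate_left_to_right(raw, formulas)
print(result["C2"], result["C3"])  # [11, 22, 33] [11, 44, 99]
```

<p>Every column has to wait for the one before it, even when the two have nothing to do with each other. That is the cost the rest of this post tries to avoid.</p>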
<p>Given a set of columns that need to be calculated using formulas that contain references to static columns, we build a dependency graph. We apply a neat trick at the beginning: all the static data is assembled in one node. If a formula references any column in the static data, we add a dependency on that node. There are also columns that A: don&#8217;t reference any column at all (these are rare &#8211; at least in the software that we develop), or B: don&#8217;t reference any static column &#8211; which is very common for calculated columns that reference only other calculated columns. Consider the following dependency graph:</p>
<p><a href="http://michaelbanzon.com/2011/09/21/parallel-execution/graph/" rel="attachment wp-att-203"><img class="size-full wp-image-203 aligncenter" title="Execution graph" src="http://michaelbanzon.com/wp-content/uploads/graph.png" alt="" width="227" height="333" /></a></p>
<p>Here node #1 is our base data &#8211; we can&#8217;t and won&#8217;t do anything about that one! Node #2 is a calculated column using a simple formula on the data provided by node #1. Nodes #3 and #4 are based solely on the data in the column created by node #2. Node #5 represents a column calculated from the columns produced by nodes #3 and #4.</p>
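<p>A minimal sketch of how such a graph could be built, with all static columns collapsed into one node. The names <code>STATIC</code>, <code>build_dependency_graph</code>, the raw columns <code>A</code> and <code>B</code>, and the labels <code>n2</code> to <code>n5</code> are assumptions for the example, chosen to mirror the figure.</p>

```python
STATIC = "static"  # one node standing in for every raw (static) column


def build_dependency_graph(formulas, static_columns):
    """Map each calculated column to the set of nodes it depends on.

    Any reference to a static column becomes a dependency on the single
    STATIC node; references to calculated columns are kept as they are.
    """
    graph = {STATIC: set()}
    for column, referenced in formulas.items():
        graph[column] = {
            STATIC if ref in static_columns else ref for ref in referenced
        }
    return graph


# Hypothetical formulas mirroring the figure: node #1 is the static data
# (raw columns A and B), n2..n5 are the calculated columns.
formulas = {
    "n2": ["A", "B"],    # uses raw data only
    "n3": ["n2"],
    "n4": ["n2"],
    "n5": ["n3", "n4"],
}
graph = build_dependency_graph(formulas, static_columns={"A", "B"})
print(graph["n2"], sorted(graph["n5"]))
```

<p>Collapsing the raw columns means that a formula touching ten static columns still gets just one edge, which keeps the graph small.</p>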
<p>To deliver a full dataset to the user every node must be processed. This is the time-consuming part, and therefore we are looking for ways to speed it up. In this case nodes #2&#8211;5 need processing (remember, the base data in node #1 is static). The simple way to get a better-than-sequential result is to move the processing of mutually independent nodes to parallel execution units. In fact, my solution to this problem is to start a calculation thread per node. Each of these threads waits on the threads calculating the dependencies of its node. By doing this the execution runs in parallel, and hopefully this results in a performance gain for the complete graph calculation.</p>
<p>In the graph presented, node #2 is the first to be processed. Its only dependency is the root, which needs no processing. As soon as the calculation of node #2 is done, the threads processing nodes #3 and #4 start in parallel &#8211; these two have only one dependency, which has already been processed. Finally, when both node #3 and node #4 are done, the calculation of node #5 begins, as both are dependencies of #5.</p>
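<p>The thread-per-node idea could look something like the sketch below. It is not the code from our application: <code>run_graph</code>, the <code>compute</code> callback and the use of Python&#8217;s <code>threading.Event</code> for the waiting are assumptions made for the example.</p>

```python
import threading


def run_graph(dependencies, compute):
    """Start one thread per node; each waits for its dependencies to finish."""
    done = {node: threading.Event() for node in dependencies}
    order = []                  # completion order, recorded for illustration
    lock = threading.Lock()

    def worker(node):
        for dep in dependencies[node]:
            done[dep].wait()    # block until this dependency has been calculated
        compute(node)
        with lock:
            order.append(node)
        done[node].set()        # release every thread waiting on this node

    threads = [
        threading.Thread(target=worker, args=(node,)) for node in dependencies
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return order


# The graph from the figure. Node 1 is the static data, so its "calculation"
# is a no-op and it completes immediately.
dependencies = {1: [], 2: [1], 3: [2], 4: [2], 5: [3, 4]}
order = run_graph(dependencies, compute=lambda node: None)
print(order)
```

<p>Nodes 3 and 4 may finish in either order, but node 2 always completes before them and node 5 always comes last. Note that in standard CPython, threads running pure-Python arithmetic do not execute simultaneously, so this shows the coordination rather than an actual speed-up.</p>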
<p>Keeping a graph structure is also a great benefit in other ways. For example, when one node is changed, a recursive &#8220;dirty&#8221; marking scheme can flag all dependent nodes for re-processing.</p>
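<p>A recursive dirty marking could be sketched as follows, assuming the graph also stores the reverse edges (which nodes depend on a given node). The <code>mark_dirty</code> function and the <code>dependents</code> mapping are hypothetical names for this example.</p>

```python
def mark_dirty(node, dependents, dirty):
    """Recursively flag a node and everything that depends on it."""
    if node in dirty:
        return                  # already reached through another path
    dirty.add(node)
    for child in dependents.get(node, ()):
        mark_dirty(child, dependents, dirty)


# Reverse edges of the graph from the figure: node -> nodes that depend on it.
dependents = {1: [2], 2: [3, 4], 3: [5], 4: [5]}

dirty = set()
mark_dirty(3, dependents, dirty)
print(sorted(dirty))  # [3, 5]
```

<p>Changing node 3 only invalidates nodes 3 and 5, so nodes 2 and 4 keep their results and only the dirty part of the graph has to be calculated again.</p>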
]]></content:encoded>
			<wfw:commentRss>http://michaelbanzon.com/2011/09/21/parallel-execution/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Bayesian filtering</title>
		<link>http://michaelbanzon.com/2008/08/21/bayesian-filtering/</link>
		<comments>http://michaelbanzon.com/2008/08/21/bayesian-filtering/#comments</comments>
		<pubDate>Thu, 21 Aug 2008 06:08:21 +0000</pubDate>
		<dc:creator>mbanzon</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[e-mail]]></category>
		<category><![CDATA[rss]]></category>
		<category><![CDATA[spam]]></category>

		<guid isPermaLink="false">http://www.southbound.dk/?p=32</guid>
		<description><![CDATA[Currently I&#8217;m thinking a lot about Bayesian filtering and new ways to apply it to my everyday software. For those who aren&#8217;t that familiar with the subject I would recommend reading A Plan for Spam by Paul Graham. Wikipedia has a few articles of interest as well. I&#8217;d recommend the ones on Bayes&#8217; theorem [...]]]></description>
			<content:encoded><![CDATA[<p>Currently I&#8217;m thinking a lot about Bayesian filtering and <em>new</em> ways to apply it to my everyday software.</p>
<p>For those who aren&#8217;t that familiar with the subject I would recommend reading <a title="Paul Graham - A Plan for Spam" href="http://www.paulgraham.com/spam.html">A Plan for Spam</a> by Paul Graham. Wikipedia has a few articles of interest as well. I&#8217;d recommend the ones on <a title="Bayes' theorem" href="http://en.wikipedia.org/wiki/Bayes%27_theorem">Bayes&#8217; theorem</a> and <a title="Bayesian spam filtering" href="http://en.wikipedia.org/wiki/Bayesian_spam_filtering">Bayesian spam filtering</a>.</p>
<p>Currently I see a lot of areas where this technique can be applied &#8211; generally they all revolve around e-mail, RSS feeds and automatic web-site crawling.</p>
<p>I hope that I&#8217;ll have the time in the near future to actually implement something.</p>
]]></content:encoded>
			<wfw:commentRss>http://michaelbanzon.com/2008/08/21/bayesian-filtering/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

