<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Everything is Data</title>
	<atom:link href="http://everythingisdata.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://everythingisdata.wordpress.com</link>
	<description>Neil's Research Blog</description>
	<lastBuildDate>Fri, 06 Jan 2012 18:42:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='everythingisdata.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Everything is Data</title>
		<link>http://everythingisdata.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://everythingisdata.wordpress.com/osd.xml" title="Everything is Data" />
	<atom:link rel='hub' href='http://everythingisdata.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Too Good To Be Believed</title>
		<link>http://everythingisdata.wordpress.com/2011/10/07/too-good-to-be-believed/</link>
		<comments>http://everythingisdata.wordpress.com/2011/10/07/too-good-to-be-believed/#comments</comments>
		<pubDate>Fri, 07 Oct 2011 23:27:00 +0000</pubDate>
		<dc:creator>Neil Conway</dc:creator>
				<category><![CDATA[Research Notes]]></category>

		<guid isPermaLink="false">http://everythingisdata.wordpress.com/?p=714</guid>
		<description><![CDATA[In the (excellent) Sinfonia SOSP &#8217;07 paper, the authors compare a group communication system (GCS) built using Sinfonia with the open source Spread GCS. Although I like the Sinfonia paper a lot, I thought this evaluation was actually detrimental to &#8230; <a href="http://everythingisdata.wordpress.com/2011/10/07/too-good-to-be-believed/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=everythingisdata.wordpress.com&amp;blog=7583738&amp;post=714&amp;subd=everythingisdata&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In the (excellent) <a href="http://www.sosp2007.org/papers/sosp064-aguilera.pdf">Sinfonia SOSP &#8217;07 paper</a>, the authors compare a <a href="http://en.wikipedia.org/wiki/Group_communication_system">group communication system</a> (GCS) built using Sinfonia with the open source <a href="http://www.spread.org/">Spread GCS</a>. Although I like the Sinfonia paper a lot, I thought this evaluation was actually detrimental to the paper. The authors present several graphs comparing the performance of SinfoniaGCS with Spread, such as this one:</p>
<p><a href="http://everythingisdata.files.wordpress.com/2011/10/spread_sinfonia_graph1.png"><img src="http://everythingisdata.files.wordpress.com/2011/10/spread_sinfonia_graph1.png?w=500" alt="Performance comparison of Spread and SinfoniaGCS" title="spread_sinfonia_graph1"   class="aligncenter size-full wp-image-716" /></a></p>
<p>Clearly, SinfoniaGCS vastly outperforms Spread in this configuration. At first glance, this might seem like a great experiment: the authors have demonstrated that you can use Sinfonia to build a high-performance GCS, right?</p>
<p>To me, including this evaluation in the paper is <i>not</i> helpful, because it raises more questions than it answers. There are two possibilities:</p>
<ol>
<li><b>Spread&#8217;s performance is truly terrible.</b> In that case, what are we to learn from comparing the performance of SinfoniaGCS with a worst-in-class alternative?</li>
<li><b>Spread is misconfigured.</b> The paper notes that Spread wasn&#8217;t configured to use IP broadcast or multicast, and that SinfoniaGCS was allowed to batch together 128 messages at a time; either change could have a huge performance impact. Again, there is little to learn from an apples-to-oranges comparison between a carefully tuned Sinfonia system and a misconfigured Spread system.</li>
</ol>
<p>A convincing performance study would show that by using the Sinfonia infrastructure, one can build a GCS that <i>approaches</i> the performance of an optimized GCS written from scratch (and perhaps that using Sinfonia leads to a smaller/simpler GCS implementation). Alternatively, if using Sinfonia really does allow dramatically better performance, the reasons for the performance difference should be explored: Sinfonia is not magic, and if the authors could have identified some reasons for why traditional GCS designs perform poorly on modern datacenter networks, that would be an interesting result.</p>
<p>Instead, the authors merely speculate that using IP broadcast/multicast would improve Spread performance and leave it at that. Unfortunately, the result is a performance study from which we can learn very little.</p>
<br />Filed under: <a href='http://everythingisdata.wordpress.com/category/research-notes/'>Research Notes</a>]]></content:encoded>
			<wfw:commentRss>http://everythingisdata.wordpress.com/2011/10/07/too-good-to-be-believed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">neilconway</media:title>
		</media:content>

		<media:content url="http://everythingisdata.files.wordpress.com/2011/10/spread_sinfonia_graph1.png" medium="image">
			<media:title type="html">spread_sinfonia_graph1</media:title>
		</media:content>
	</item>
		<item>
		<title>Benchmarking Dataflow Graphs in Ruby</title>
		<link>http://everythingisdata.wordpress.com/2011/04/12/dataflow-ruby-benchmark/</link>
		<comments>http://everythingisdata.wordpress.com/2011/04/12/dataflow-ruby-benchmark/#comments</comments>
		<pubDate>Tue, 12 Apr 2011 17:12:35 +0000</pubDate>
		<dc:creator>Neil Conway</dc:creator>
				<category><![CDATA[Research Notes]]></category>

		<guid isPermaLink="false">http://everythingisdata.wordpress.com/?p=662</guid>
		<description><![CDATA[Our group at UC Berkeley recently released Bloom, a new programming language for distributed computing. Specifically, we released an initial alpha of &#8220;Bud,&#8221; which is a Ruby DSL that lets you embed (distributed) declarative rules inside Ruby classes. The current &#8230; <a href="http://everythingisdata.wordpress.com/2011/04/12/dataflow-ruby-benchmark/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=everythingisdata.wordpress.com&amp;blog=7583738&amp;post=662&amp;subd=everythingisdata&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Our group at UC Berkeley recently released <a href="http://www.bloom-lang.org">Bloom</a>, a new programming language for distributed computing. Specifically, we released an initial alpha of &#8220;Bud,&#8221; which is a Ruby DSL that lets you embed (distributed) declarative rules inside Ruby classes.</p>
<p>The current Bud prototype has a pretty naive evaluation scheme (deliberately); so I&#8217;ve been idly thinking about how to improve its performance. The standard way to evaluate <a href="http://en.wikipedia.org/wiki/Datalog">Datalog</a> rules is fairly similar to how a typical relational database evaluates queries: the system produces a dataflow graph (&#8220;query plan&#8221;), which consists of nodes (e.g., project, join or scan operators) connected by edges. The dataflow graph typically starts with the input relations (e.g., base tables, EDB for Datalog) and computes the derived relations (e.g., the user&#8217;s query in the case of a database, the IDB relations for Datalog). There is a lot more to building an efficient Datalog evaluator, but I&#8217;ll leave that for future posts.</p>
<p>To get good performance for Bloom programs, we&#8217;ll likely want to generate a dataflow graph. So how should we evaluate this graph? Two options come to mind:</p>
<ol>
<li>Represent nodes in the graph as Ruby objects and push tuples through the graph by evaluating Ruby.</li>
<li>Represent nodes in the graph using C code and connect the Ruby query planner to the C runtime in some fashion (e.g., have the planner generate a description of the desired dataflow and parse that description in a C module, have the Ruby code generate C source code directly, or use something like LLVM to generate machine code at runtime).</li>
</ol>
<p>Naturally the first option would be easier, but how does it perform? I wrote a quick and dirty microbenchmark to get a rough idea. The benchmark constructs a dataflow graph of 7 nodes: 4 &#8220;predicate&#8221; nodes that check whether the tuple matches a constant, 2 &#8220;join&#8221; nodes that probe a 6,000 element hash table (modeling half of a symmetric hash join), and 1 &#8220;sink&#8221; node that appends the tuple to an array (modeling storage into an in-memory relation). I ran 1 million tuples through the graph, and repeated this 10 times (dropping the first run to allow caches to warm up). I ran the benchmark on several Ruby implementations:</p>
<ul>
<li>MRI 1.8.7-p334</li>
<li>MRI 1.9.2-p180</li>
<li>JRuby 1.6.0</li>
<li>Rubinius git checkout from April 11th (version string &#8220;1.2.4dev (1.8.7 73390817 yyyy-mm-dd JI)&#8221;)</li>
</ul>
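<p>To make the shape of the benchmark graph concrete, here is a minimal Python sketch of a push-based pipeline with the same mix of nodes (4 predicates, 2 hash-table probes, 1 sink). It is only an illustration, not the benchmark code itself: the class names, the three-column tuple layout, the constants being matched and the contents of the hash table are all assumptions made for the example.</p>

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, ...]


class Node:
    """A dataflow node that forwards accepted tuples to its successor."""

    def __init__(self) -> None:
        self.next: Optional["Node"] = None

    def push(self, row: Row) -> None:
        raise NotImplementedError

    def emit(self, row: Row) -> None:
        if self.next is not None:
            self.next.push(row)


class Predicate(Node):
    """Pass a tuple along only if one column equals a constant."""

    def __init__(self, column: int, constant: int) -> None:
        super().__init__()
        self.column = column
        self.constant = constant

    def push(self, row: Row) -> None:
        if row[self.column] == self.constant:
            self.emit(row)


class JoinProbe(Node):
    """Probe a prebuilt hash table (one half of a symmetric hash join)."""

    def __init__(self, column: int, table: Dict[int, int]) -> None:
        super().__init__()
        self.column = column
        self.table = table

    def push(self, row: Row) -> None:
        match = self.table.get(row[self.column])
        if match is not None:
            self.emit(row + (match,))


class Sink(Node):
    """Append every tuple to a list, modeling an in-memory relation."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[Row] = []

    def push(self, row: Row) -> None:
        self.rows.append(row)


def build_graph() -> Tuple[Node, Sink]:
    """Chain 4 predicates, 2 join probes and 1 sink (hypothetical layout)."""
    table = {key: key * 2 for key in range(6000)}  # assumed table contents
    sink = Sink()
    nodes: List[Node] = [
        Predicate(1, 1), Predicate(2, 2), JoinProbe(0, table),
        Predicate(1, 1), Predicate(2, 2), JoinProbe(0, table),
        sink,
    ]
    for upstream, downstream in zip(nodes, nodes[1:]):
        upstream.next = downstream
    return nodes[0], sink
```

<p>Pushing tuples of the form <code>(key, 1, 2)</code> into the first node drives them through every operator in turn; only tuples whose key is present in the hash table reach the sink.</p>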
<h3>Results</h3>
<p>The average time taken to route 1 million tuples through the dataflow graph, for each Ruby implementation:<br />
<a href="http://everythingisdata.files.wordpress.com/2011/04/perf1_small1.png"><img src="http://everythingisdata.files.wordpress.com/2011/04/perf1_small1.png?w=500&#038;h=311" alt="" title="Ruby Dataflow Benchmark" width="500" height="311" class="aligncenter size-full wp-image-674" /></a></p>
<p>I was surprised to see how much faster MRI 1.9 is than MRI 1.8. It is also interesting that JRuby does pretty decently.</p>
<p>So how should we interpret these results? Well, let&#8217;s compare the performance of the Ruby implementations with a more-or-less equivalent dataflow microbenchmark written in C:<br />
<a href="http://everythingisdata.files.wordpress.com/2011/04/perf2_small1.png"><img src="http://everythingisdata.files.wordpress.com/2011/04/perf2_small1.png?w=500&#038;h=311" alt="" title="Ruby + C Dataflow Benchmark" width="500" height="311" class="aligncenter size-full wp-image-675" /></a></p>
<h3>Conclusions</h3>
<p>After adding the C results, we can put the Ruby benchmark results into perspective: although recent Ruby implementations are considerably faster than MRI 1.8, they are still much slower (&gt; 10x) than C for this particular microbenchmark. Is that surprising? Not really &#8212; Ruby is certainly not an ideal choice for high-performance, data-intensive computing. Despite recent progress in building faster Ruby implementations, the performance gap (for this microbenchmark) remains considerable.</p>
<p>Obviously, take these results with a grain of salt: many more factors would contribute to the performance of a complete system than a trivial microbenchmark like this. There are also factors that are difficult to measure in a small microbenchmark: for example, using a pure Ruby implementation would force the GC to manage the state stored in Bloom collections, which would result in additional performance overhead.</p>
<h3>Additional Details</h3>
<ul>
<li>Benchmarks performed on a late 2010 MacBook Air (2.13GHz Core 2), running OS X 10.6.7. The C benchmark was compiled with Apple&#8217;s GCC (4.2.1) at optimization level &#8220;-O2&#8221;, and was (dynamically) linked against APR 1.4.2.</li>
<li><a href="https://github.com/neilconway/dataflow_bench/raw/master/source/ruby/bench.rb">Ruby source code</a></li>
<li><a href="https://github.com/neilconway/dataflow_bench/raw/master/source/c/bench.c">C source code</a></li>
<li><a href="https://github.com/neilconway/dataflow_bench/raw/master/data_apr_2011/results.csv">Raw benchmark data</a></li>
</ul>
<br />Filed under: <a href='http://everythingisdata.wordpress.com/category/research-notes/'>Research Notes</a>]]></content:encoded>
			<wfw:commentRss>http://everythingisdata.wordpress.com/2011/04/12/dataflow-ruby-benchmark/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">neilconway</media:title>
		</media:content>

		<media:content url="http://everythingisdata.files.wordpress.com/2011/04/perf1_small1.png" medium="image">
			<media:title type="html">Ruby Dataflow Benchmark</media:title>
		</media:content>

		<media:content url="http://everythingisdata.files.wordpress.com/2011/04/perf2_small1.png" medium="image">
			<media:title type="html">Ruby + C Dataflow Benchmark</media:title>
		</media:content>
	</item>
		<item>
		<title>&#8220;Not-a-Bot: Improving Service Availability in the Face of Botnet Attacks&#8221;</title>
		<link>http://everythingisdata.wordpress.com/2009/11/22/not-a-bot-improving-service-availability-in-the-face-of-botnet-attacks/</link>
		<comments>http://everythingisdata.wordpress.com/2009/11/22/not-a-bot-improving-service-availability-in-the-face-of-botnet-attacks/#comments</comments>
		<pubDate>Sun, 22 Nov 2009 07:17:29 +0000</pubDate>
		<dc:creator>Neil Conway</dc:creator>
				<category><![CDATA[Paper Summaries]]></category>
		<category><![CDATA[botnet]]></category>
		<category><![CDATA[cs268]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[paper summary]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://everythingisdata.wordpress.com/?p=611</guid>
		<description><![CDATA[This paper focuses on distinguishing human-generated activity from bot-generated activity. Then, human-generated activity can be given preferential treatment (e.g. favorable routing of traffic, not being treated as spam). Their measure for distinguishing human-generated actions from machine-generated actions is pretty coarse &#8230; <a href="http://everythingisdata.wordpress.com/2009/11/22/not-a-bot-improving-service-availability-in-the-face-of-botnet-attacks/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=everythingisdata.wordpress.com&amp;blog=7583738&amp;post=611&amp;subd=everythingisdata&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><a href="http://nms.csail.mit.edu/~ramki/nab.pdf">This paper</a> focuses on distinguishing human-generated activity from bot-generated activity. Then, human-generated activity can be given preferential treatment (e.g. favorable routing of traffic, not being treated as spam). Their measure for distinguishing human-generated actions from machine-generated actions is pretty coarse and imprecise: an action is human-generated if it is preceded by keyboard or mouse input within a certain amount of time.</p>
<p>To implement this scheme, they go into considerable (exhaustive) detail about how to use the Trusted Platform Module (TPM) to build a trusted path between the physical input devices (keyboard, mouse) and a small piece of software called the <i>attester</i>. To certify an action as human-generated, applications ask the attester for an attestation, passing a hash of the content to be attested for. If there has been user input within a predefined period, the attester returns a cryptographically signed token that can be attached to the user action. When an upstream service receives the user action (e.g. HTTP request, email), it can verify the attestation by hashing the content of the action and checking the cryptographic signature. Incorporating the content hash prevents an attestation for action <i>x</i> from being reused with action <i>y</i>. The verifier also needs to check that attestations are not reused, so the attester includes a nonce in the attestation token.</p>
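<p>The request/verify flow described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: a shared-secret HMAC stands in for the TPM-backed signature (the real scheme would not share a key with the verifier), the one-second input window is a made-up value, and the names <code>Attester</code>, <code>Verifier</code> and <code>Token</code> are hypothetical.</p>

```python
import hashlib
import hmac
import os
from typing import NamedTuple, Optional, Set

INPUT_WINDOW_SECONDS = 1.0  # assumed value for the "predefined period"


class Token(NamedTuple):
    content_hash: bytes
    nonce: bytes
    signature: bytes


def _sign(key: bytes, content_hash: bytes, nonce: bytes) -> bytes:
    # HMAC is a stand-in for the TPM-backed signature, used here only
    # to keep the sketch within the standard library.
    return hmac.new(key, content_hash + nonce, hashlib.sha256).digest()


class Attester:
    def __init__(self, key: bytes) -> None:
        self._key = key
        self._last_input: Optional[float] = None

    def record_input(self, now: float) -> None:
        """Called on each keyboard or mouse event."""
        self._last_input = now

    def attest(self, content_hash: bytes, now: float) -> Optional[Token]:
        """Return a signed token only if there was recent user input."""
        if self._last_input is None:
            return None
        if now - self._last_input > INPUT_WINDOW_SECONDS:
            return None
        nonce = os.urandom(16)
        return Token(content_hash, nonce, _sign(self._key, content_hash, nonce))


class Verifier:
    def __init__(self, key: bytes) -> None:
        self._key = key
        self._seen_nonces: Set[bytes] = set()

    def verify(self, content: bytes, token: Token) -> bool:
        """Check the content hash, the signature, and nonce freshness."""
        if hashlib.sha256(content).digest() != token.content_hash:
            return False  # token was issued for different content
        expected = _sign(self._key, token.content_hash, token.nonce)
        if not hmac.compare_digest(expected, token.signature):
            return False  # forged or corrupted token
        if token.nonce in self._seen_nonces:
            return False  # replayed token
        self._seen_nonces.add(token.nonce)
        return True
```

<p>Binding the signature to both the content hash and the nonce is what lets the verifier reject a token attached to different content as well as a second use of the same token.</p>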
<p>It is possible that a bot can monitor user actions, and submit malicious content to the attester whenever the user uses an input device. This would allow attestations to be created for malicious content, which means upstream software cannot blindly trust attested-for content. To reduce the impact of this attack, the paper suggests rate-limiting attestations to one per second.</p>
<h3>Discussion</h3>
<p>I liked how the paper discussed an alternative approach to the same problem (having the attester track keyboard and mouse inputs, and then <i>match</i> that recorded history against the content that is to be attested, looking for a correspondence). Many papers present the solution they chose as the only alternative, when in fact it usually represents only one point in a much richer design space.</p>
<p>In some sense, the inverse of the proposed functionality would be more useful: i.e. being able to assert &#8220;this content was <i>definitely</i> bot-generated.&#8221; As proposed, it might be very hard for upstream services to make use of the certifications unless this idea saw widespread adoption. For example, suppose that 0.1% of your traffic is guaranteed by NAB to be human-generated. The remaining traffic may or may not be bot-generated, so you effectively can&#8217;t discriminate against it.</p>
<p>The paper suggests that, using this approach, human-generated content on a bot-infested machine can be effectively distinguished from bot-generated traffic. This seems pretty unlikely: the bot software can simply suppress <i>all</i> outgoing attestations (e.g. by installing a rootkit and interfering with the API used to request attestations), leaving upstream software in the same state as they would be without NAB.</p>
<br />Posted in Paper Summaries]]></content:encoded>
			<wfw:commentRss>http://everythingisdata.wordpress.com/2009/11/22/not-a-bot-improving-service-availability-in-the-face-of-botnet-attacks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">neilconway</media:title>
		</media:content>
	</item>
		<item>
		<title>&#8220;BotGraph: Large Scale Spamming Botnet Detection&#8221;</title>
		<link>http://everythingisdata.wordpress.com/2009/11/22/botgraph-large-scale-spamming-botnet-detection/</link>
		<comments>http://everythingisdata.wordpress.com/2009/11/22/botgraph-large-scale-spamming-botnet-detection/#comments</comments>
		<pubDate>Sun, 22 Nov 2009 03:16:36 +0000</pubDate>
		<dc:creator>Neil Conway</dc:creator>
				<category><![CDATA[Paper Summaries]]></category>
		<category><![CDATA[botnet]]></category>
		<category><![CDATA[cs268]]></category>
		<category><![CDATA[dryad]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[paper summary]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://everythingisdata.wordpress.com/?p=603</guid>
		<description><![CDATA[Botnets are used for various nefarious ends; one popular use is sending spam email by creating and then using accounts on free webmail providers like Hotmail and Google Mail. In the past, CAPTCHAs have been used to try to prevent &#8230; <a href="http://everythingisdata.wordpress.com/2009/11/22/botgraph-large-scale-spamming-botnet-detection/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=everythingisdata.wordpress.com&amp;blog=7583738&amp;post=603&amp;subd=everythingisdata&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Botnets are used for various nefarious ends; one popular use is sending spam email by creating and then using accounts on free webmail providers like Hotmail and Google Mail. In the past, <a href="http://en.wikipedia.org/wiki/CAPTCHA">CAPTCHAs</a> have been used to try to prevent this, but they are increasingly ineffective. Hence, the <a href="http://research.microsoft.com/pubs/79413/botgraph.pdf">BotGraph</a> paper proposes an algorithm for detecting bot-created accounts by analyzing user access behavior. They describe the algorithm, its implementation with <a href="http://research.microsoft.com/en-us/projects/dryad/">Dryad</a>, and present experimental results from real-world Hotmail access logs.</p>
<h3>Algorithm</h3>
<p>BotGraph employs three different ideas for detecting automated users:</p>
<ol>
<li>They regard sudden spikes in the number of accounts created by a single IP as suspicious. Hence, they use a simple exponentially-weighted moving average (EWMA) to detect such spikes, and throttle/rate-limit account signups from suspicious IPs. This has the effect of making it more difficult for spammers to obtain webmail accounts.</li>
<li>They argue that the number of bot machines will be much smaller than the number of bot-created webmail accounts; hence, one bot machine will access a large number of accounts. They also argue that a single bot-created webmail account will be accessed from multiple bots on different <a href="http://en.wikipedia.org/wiki/Autonomous_system_(Internet)">autonomous systems</a> (ASs), due to churn in the botnet (although this seems pretty unconvincing to me), and the fact that rate-limiting makes it more difficult to create large numbers of bot accounts. Hence, they look for pairs of user accounts that had logins from an overlapping set of ASs.</li>
<li>Finally, they consider a user&#8217;s email-sending behavior:<br />
<blockquote><p>
Normal users usually send a small number of emails per day on average, with different email sizes. On the other hand, bot-users usually send many emails per day, with identical or similar email sizes
</p></blockquote>
<p>Hence, they regard users who send 3+ emails per day as &#8220;suspicious&#8221;; they also regard as suspicious users whose email-size distributions are dissimilar from most other users.</li>
</ol>
<p>They use feature #1 primarily to rate-throttle new account creations. Feature #3 is used to avoid false positives.</p>
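<p>As a rough illustration of feature #1, the following Python sketch flags periods in which the signup count from one IP jumps well above an exponentially-weighted moving average of earlier periods. The smoothing factor, the spike ratio and the minimum count are hypothetical values chosen for the example; I am not reproducing the paper&#8217;s actual parameters or detection rule.</p>

```python
from typing import List, Optional, Sequence


def ewma_spikes(counts: Sequence[int], alpha: float = 0.3,
                ratio: float = 3.0, min_count: int = 10) -> List[int]:
    """Return indices of periods whose count exceeds ratio * EWMA so far.

    alpha, ratio and min_count are illustrative assumptions.
    """
    flagged: List[int] = []
    average: Optional[float] = None
    for index, count in enumerate(counts):
        # Compare against the average of the periods seen before this one.
        if average is not None and count >= min_count and count > ratio * average:
            flagged.append(index)
        # Fold the current period into the moving average.
        if average is None:
            average = float(count)
        else:
            average = alpha * count + (1 - alpha) * average
    return flagged
```

<p>With these settings, a sudden jump such as <code>[2, 3, 2, 3, 40, 3]</code> flags the fifth period, while a slowly rising series such as <code>[2, 3, 4, 6, 8, 11, 15, 20]</code> flags nothing.</p>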
<p>Feature #2 is the primary focus of the paper. They construct a <i>user-user</i> graph with a vertex for each user account. Each edge has a weight that gives the number of shared login ASs &#8212; that is, the number of ASs that were used to log in to both accounts. Within the user-user graph, they look for connected components with an edge weight over a threshold <i>T</i>: they begin by finding components with <i>T=2</i>, and then iteratively increase the threshold until each component has no more than 100 members.</p>
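<p>The component search can be sketched as follows, assuming the edge weights have already been computed. This is my own illustration rather than the paper&#8217;s implementation: it uses a simple union-find, treats &#8220;over a threshold&#8221; as weight &gt;= <i>T</i>, and re-examines only the oversized components at the next threshold. The function names are hypothetical.</p>

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def components(edges: Dict[Edge, int], threshold: int) -> List[Set[str]]:
    """Connected components using only edges with weight >= threshold."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for (u, v), weight in edges.items():
        if weight >= threshold:
            parent[find(u)] = find(v)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def suspicious_groups(edges: Dict[Edge, int], start: int = 2,
                      max_size: int = 100) -> List[Set[str]]:
    """Raise the threshold on oversized components until all fit max_size."""
    result: List[Set[str]] = []
    pending = [(edges, start)]
    while pending:
        subgraph, threshold = pending.pop()
        for group in components(subgraph, threshold):
            if len(group) <= max_size:
                result.append(group)
            else:
                # Keep only edges inside the oversized component and retry
                # with a stricter threshold.
                inner = {edge: weight for edge, weight in subgraph.items()
                         if edge[0] in group and edge[1] in group}
                pending.append((inner, threshold + 1))
    return result
```

<p>Accounts that are only connected by edges below the current threshold drop out of the result, which is how raising <i>T</i> splits a large component into smaller, more tightly connected groups.</p>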
<h3>Implementation</h3>
<p>They describe two ways to implement the construction of the user-user graph using a data-parallel system like MapReduce or Dryad, using the login log from Hotmail (~220GB for one month of data):</p>
<ol>
<li>Partition the login records by client IP. Emit an intermediate record <i>(i, j, k)</i> for each shared login on the same day from AS <i>k</i> to accounts <i>i</i> and <i>j</i>. In the reduce phase, group on <i>(i, j)</i> and sum. The problem with this approach is that it requires a lot of communication: most edges in the user-user graph have weight 1, and hence can be dropped, but this approach still requires sending them over the network.</li>
<li>Partition the login records by user name. For each partition, compute a &#8220;summary&#8221; of the IP-day keys present for users in that partition (the paper doesn&#8217;t specify the nature of the summary, but presumably it is analogous to a <a href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filter</a>). Each partition sends its summary to every other partition. Using the summaries, each partition can exchange login records with other partitions in a way that allows edge weights to be computed, but doesn&#8217;t require sending weight 1 edges over the network.</li>
</ol>
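<p>The first method can be sketched in plain Python as a map step followed by a reduce step. This is a single-process illustration, not Dryad or MapReduce code: the record layout <code>(account, client_ip, day, asn)</code> and the function names are assumptions, and I count distinct ASs per account pair in the reduce step, which is my reading of the edge-weight definition.</p>

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, Set, Tuple

Login = Tuple[str, str, str, int]  # (account, client_ip, day, asn); assumed layout


def map_phase(logins: Iterable[Login]) -> Iterator[Tuple[str, str, int]]:
    """Emit (i, j, k) for each pair of accounts sharing an IP on one day."""
    accounts_by_ip_day: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    asn_of_ip: Dict[str, int] = {}
    for account, ip, day, asn in logins:
        accounts_by_ip_day[(ip, day)].add(account)
        asn_of_ip[ip] = asn
    for (ip, _day), accounts in accounts_by_ip_day.items():
        for i, j in combinations(sorted(accounts), 2):
            yield i, j, asn_of_ip[ip]


def reduce_phase(records: Iterable[Tuple[str, str, int]]) -> Dict[Tuple[str, str], int]:
    """Group on (i, j) and count the distinct ASs shared by the pair."""
    shared: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for i, j, asn in records:
        shared[(i, j)].add(asn)
    return {pair: len(asns) for pair, asns in shared.items()}
```

<p>Every intermediate record crosses the map/reduce boundary, including those for pairs that end up with weight 1, which is the communication cost that the second method tries to avoid.</p>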
<p>They argue that the second method can&#8217;t be implemented with Map and Reduce, although I&#8217;m not sure if I believe them: multicasting can be done by writing to HDFS, as can shipping data between logical partitions.</p>
<h3>Discussion</h3>
<p>I think the major problem with their experimental results is that there&#8217;s effectively no adversary: botnet operators presumably weren&#8217;t aware of this technique when the experiments were performed. Hence, they haven&#8217;t adapted their tactics &#8212; which might actually be quite easy to do.</p>
<p>For example, it seems like it would be quite easy to defeat their EWMA-based throttling by simply increasing the number of signups/time gradually. Essentially, the bot machine acts like an HTTP proxy with a gradually-increasing user population. One can imagine such a bot even mimicking the traffic patterns exhibited by a real-world proxy (e.g. increase at 9AM, decrease at 5PM). Certainly using a simple EWMA seems too primitive to defeat a dedicated adversary.</p>
<p>Similarly, it also seems quite easy to avoid sharing a single webmail account among multiple bots: simply assign each webmail account to a single bot machine, and don&#8217;t reuse webmail accounts if the bot machine becomes inaccessible. The idea, again, is to simulate an HTTP proxy that accesses a large number of webmail accounts. The paper&#8217;s argument that &#8220;churn&#8221; <i>requires</i> reuse of webmail accounts &#8220;to maximize bot-account utilization&#8221; is unconvincing and unsubstantiated. Since this is the entire principle upon which their technique is based, I&#8217;d be quite concerned that a relatively simple adaptation on the part of botnet operators would make this analysis ineffective.</p>
<p>I thought the paper&#8217;s wide-eyed tone toward using MapReduce-style systems for graph algorithms was annoying. <i>Lots</i> of people do large-scale graph algorithms using MapReduce-style systems; in fact, that&#8217;s one of the main things MapReduce was originally designed for (e.g. computing PageRank). The paper is not novel in this respect, and I was surprised that they didn&#8217;t cite one of the <a href="http://scholar.google.com/scholar?q=mapreduce+graph">many prior papers</a> on this subject.</p>
<br />Posted in Paper Summaries]]></content:encoded>
			<wfw:commentRss>http://everythingisdata.wordpress.com/2009/11/22/botgraph-large-scale-spamming-botnet-detection/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">neilconway</media:title>
		</media:content>
	</item>
		<item>
		<title>&#8220;Cutting the Electric Bill for Internet-Scale Systems&#8221;</title>
		<link>http://everythingisdata.wordpress.com/2009/11/20/cutting-the-electric-bill-for-internet-scale-systems/</link>
		<comments>http://everythingisdata.wordpress.com/2009/11/20/cutting-the-electric-bill-for-internet-scale-systems/#comments</comments>
		<pubDate>Fri, 20 Nov 2009 00:26:48 +0000</pubDate>
		<dc:creator>Neil Conway</dc:creator>
				<category><![CDATA[Paper Summaries]]></category>
		<category><![CDATA[cs268]]></category>
		<category><![CDATA[energy]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[paper summary]]></category>

		<guid isPermaLink="false">http://everythingisdata.wordpress.com/?p=589</guid>
		<description><![CDATA[This paper begins with three observations: Energy-related costs are an increasingly large portion of total data center operating expenses. The cost of electricity can vary significantly between different times and between different regions at the same time. Many distributed systems &#8230; <a href="http://everythingisdata.wordpress.com/2009/11/20/cutting-the-electric-bill-for-internet-scale-systems/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=everythingisdata.wordpress.com&amp;blog=7583738&amp;post=589&amp;subd=everythingisdata&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><a href="http://nms.lcs.mit.edu/papers/sigcomm372-aqureshi.pdf">This paper</a> begins with three observations:</p>
<ol>
<li>Energy-related costs are an increasingly large portion of total data center operating expenses.</li>
<li>The cost of electricity can vary significantly between different times and between different regions at the same time.</li>
<li>Many distributed systems already have the ability to dynamically route requests to different hosts and physical regions (for example, most CDNs try to minimize client latency and bandwidth costs by routing requests to a data center &#8220;close&#8221; to the client).</li>
</ol>
<p>Therefore, the paper investigates the feasibility of shifting load among replicas of a service that are located in different geographic regions, according to the current price of electricity in each region. For this to be effective, several things must be true:</p>
<ol>
<li>There must be significant variation in the price of electricity available in different regions at the same time.</li>
<li>Data centers must be <i><a href="http://www.cra.org/ccc/docs/ieee_computer07.pdf">energy proportional</a></i>: as the load on a data center is decreased by a factor of <i>k</i>, its energy usage should decrease by the same factor.</li>
<li>The savings must outweigh the side effects of price-based routing: sending traffic to wherever electricity is cheapest may increase client latency and bandwidth use (since cheap power might be far away from the client), and the additional routers traversed might themselves use more energy.</li>
</ol>
<p>To test the first condition, the authors conduct a detailed empirical study of the cost of energy in different regions across the US, and compare that information with traffic logs from Akamai&#8217;s CDN. The authors use the Akamai traffic data to estimate how much the cost of electricity could be reduced by routing requests to the cheapest available electricity source, subject to various additional constraints.</p>
<p>The authors don&#8217;t do much to address the second condition. They admit that the effectiveness of this technique depends heavily on energy-proportionality, but most current computing equipment is not very energy-proportional (for example, the idle power consumption of a single system is typically ~60% of its peak power usage). Since energy-proportionality is the subject of much recent research, they express the hope that future hardware will be more energy-proportional.</p>
<p>For the third condition, they carefully consider the impact of electricity-price-based routing on other optimization goals: for example, they consider changing routes only in ways that don&#8217;t result in <i>any</i> increased bandwidth charges (due to the &#8220;<a href="http://en.wikipedia.org/wiki/Burstable_billing">95-5</a>&#8221; pricing scheme that most bandwidth providers use). A realistic implementation of this technique would treat electricity cost as one factor in a multi-variable optimization problem that simultaneously minimizes electricity cost, client-perceived latency, and bandwidth charges.</p>
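<p>As a rough illustration of treating electricity price as one factor among several, the sketch below picks the cheapest region whose latency to the client stays within a bound. The region names, prices, latencies, and the 50 ms bound are all made up for illustration and do not come from the paper; bandwidth charges are ignored entirely.</p>

```python
from typing import NamedTuple, Optional, Sequence


class Region(NamedTuple):
    name: str             # hypothetical region label
    price_per_mwh: float  # current electricity price, dollars per MWh (made up)
    latency_ms: float     # estimated latency from this region to the client


def pick_region(regions: Sequence[Region], max_latency_ms: float) -> Optional[Region]:
    """Return the cheapest region that satisfies the latency bound, or None."""
    eligible = [r for r in regions if max_latency_ms >= r.latency_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.price_per_mwh)


regions = [
    Region("east", 62.0, 20.0),
    Region("midwest", 35.0, 45.0),
    Region("west", 28.0, 90.0),
]
choice = pick_region(regions, max_latency_ms=50.0)
print(choice.name if choice else "no region within the latency bound")
```

<p>Here the cheapest region overall (&#8220;west&#8221;) is rejected because it is too far from the client, so the request goes to the cheapest region that still meets the latency bound.</p>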
<h3>Summary of Results</h3>
<p>The authors found significant asymmetry in electricity prices between geographic areas; furthermore, this asymmetry was dynamic (different regions were cheaper at different times). These are promising results for dynamic routing of requests based on electricity prices.</p>
<p>When cluster energy usage is completely proportional to load and bandwidth cost is not considered, price-sensitive routing can reduce energy costs by ~40%. The savings drop to only 5% if the energy-proportionality of current hardware is used, and the savings drop to a third of that if we are constrained to not increase bandwidth costs at all (assuming 95-5 pricing). Hence, this technique is only really effective if energy-proportional data centers are widely deployed.</p>
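<p>To see why energy-proportionality matters so much, consider a simple linear power model. This model is my own assumption for illustration, not something taken from the paper: a machine draws a fixed idle fraction of its peak power, plus a part that scales linearly with utilization.</p>

```python
def relative_power(utilization: float, idle_fraction: float) -> float:
    """Power draw as a fraction of peak, under an assumed linear model."""
    return idle_fraction + (1.0 - idle_fraction) * utilization


def fraction_saved(utilization: float, idle_fraction: float) -> float:
    """Fraction of peak power saved by running at the given utilization."""
    return 1.0 - relative_power(utilization, idle_fraction)


# Shift half of the load away from a data center.
proportional = fraction_saved(0.5, idle_fraction=0.0)  # fully energy-proportional
typical = fraction_saved(0.5, idle_fraction=0.6)       # idle power is 60% of peak

print(f"fully proportional hardware: {proportional:.0%} less power")
print(f"idle power at 60% of peak:   {typical:.0%} less power")
```

<p>Under this model, halving the load halves the power draw of fully proportional hardware, but saves only about a fifth of peak power when idle consumption is 60% of peak. Moving load away from an expensive region therefore buys much less than the price difference alone would suggest.</p>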
<h3>Discussion</h3>
<p>I thought this was a great paper. The basic idea is simple, but their empirical study of the US electricity market was carefully done, and the results are instructive.</p>
<p>One interesting question is what would happen to the electricity market if techniques like these were widely deployed. Essentially, electricity consumption would become more price-elastic. When a given region offers a lower price, demand could move to that region quite quickly, which might act to drive up the price. Conversely, it would lower demand in higher-priced regions, lowering the price &#8212; and hence benefiting more inelastic energy consumers in that region.</p>
<br />Posted in Paper Summaries ]]></content:encoded>
			<wfw:commentRss>http://everythingisdata.wordpress.com/2009/11/20/cutting-the-electric-bill-for-internet-scale-systems/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">neilconway</media:title>
		</media:content>
	</item>
		<item>
		<title>&#8220;Scalable Reliable Multicast&#8221;</title>
		<link>http://everythingisdata.wordpress.com/2009/11/19/scalable-reliable-multicast/</link>
		<comments>http://everythingisdata.wordpress.com/2009/11/19/scalable-reliable-multicast/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 18:37:08 +0000</pubDate>
		<dc:creator>Neil Conway</dc:creator>
				<category><![CDATA[Paper Summaries]]></category>
		<category><![CDATA[cs268]]></category>
		<category><![CDATA[multicast]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[paper summary]]></category>

		<guid isPermaLink="false">http://everythingisdata.wordpress.com/?p=584</guid>
		<description><![CDATA[The SRM (Scalable Reliable Multicast) paper argues that, unlike reliable unicast protocols, different applications vary widely in their requirements for reliable multicast delivery. Hence: One cannot make a single reliable multicast delivery scheme that simultaneously meets the functionality, scalability, and efficiency &#8230; <a href="http://everythingisdata.wordpress.com/2009/11/19/scalable-reliable-multicast/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=everythingisdata.wordpress.com&amp;blog=7583738&amp;post=584&amp;subd=everythingisdata&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.icir.org/floyd/srm-paper.html">SRM</a> (Scalable Reliable Multicast) paper argues that, in contrast to reliable unicast, applications vary widely in what they require of reliable multicast delivery. Hence:</p>
<blockquote><p>
One cannot make a single reliable multicast delivery scheme that simultaneously meets the functionality, scalability, and efficiency requirements of all applications.
</p></blockquote>
<p>The authors therefore propose a model in which a generic skeleton is &#8220;fleshed out with application specific details.&#8221; The application provides a means to talk about how much data has been sent and received (&#8220;application data units&#8221;, unlike the connection-state oriented acknowledgments in TCP), a means for allocating bandwidth among the members of the group, and a means for individual nodes to decide how to allocate their local outgoing bandwidth. Following this model, their multicast framework only provides best-effort packet delivery, with possible duplication and reordering of packets; they believe that applications can build stronger reliable delivery and ordering properties on top of their framework, as needed. The best-effort delivery itself is implemented using IP multicast.</p>
<h3>Unicast vs. Multicast</h3>
<p>The authors point out two differences between reliable delivery for unicast vs. multicast that I thought were interesting:</p>
<ul>
<li>In any reliable delivery protocol, one party must take responsibility for detecting lost data and retransmitting it. In unicast, either the sender or receiver can play this role equally well (TCP uses the sender; other protocols like NETBLT use the receiver). In multicast, sender-side delivery state is problematic: the sender must track the set of active recipients and the current state of each recipient, which is expensive, and difficult to do as the multicast group changes. In some sense, the whole point of multicast is to relieve the sender of that responsibility. Hence, they argue that receiver-side delivery state is better for multicast: group membership isn&#8217;t relevant, and the burden of keeping per-receiver state is avoided.</li>
<li>A reliable delivery protocol also needs a vocabulary to talk about how much data has been sent or received. Typical unicast protocols use a vocabulary based on communication state, usually bytes or packets (segments). This is not ideal for multicast, because a newly-joining recipient doesn&#8217;t share that communication state, and hence can&#8217;t interpret byte-oriented or packet-oriented messages. SRM instead argues for talking in terms of &#8220;application data units.&#8221;</li>
</ul>
<h3><tt>wb</tt> Framework</h3>
<p>The example multicast application in the paper is <tt>wb</tt>, a shared distributed whiteboard. This has the advantage that protocol commands (e.g. draw shape X at location Y) are mostly idempotent (if associated with a timestamp), so the underlying multicast protocol doesn&#8217;t need to enforce a total order on deliveries.</p>
<p>In <tt>wb</tt>, new operations are multicast to the entire group. When a receiver detects a loss (by detecting gaps in sequence numbers), it starts a &#8220;repair request&#8221; timer. The timer value is determined by the distance of the receiver from the original data source. When the timer expires, the recipient multicasts a &#8220;repair request&#8221; to the entire group, asking for the missing data. If a recipient sees another repair request for the same data before its timer expires, it suppresses its own repair request. Retransmission of missing data is handled similarly: when nodes receive repair requests for data they have seen, they start a response timer based on their distance from the repair request source. When the timer expires, they multicast the requested data to the entire group. Any other nodes that have a response timer pending for the requested data cancel it. The authors also propose a bandwidth allocation scheme to divide the available bandwidth among new data operations and repair data.</p>
<h3>Tuning the Request/Response Repair Timers</h3>
<p>It is important that the repair request and repair response timers at different nodes be de-synchronized, to avoid redundant messages. The paper observes that different topologies call for different de-synchronization methods: in a simple &#8220;chain&#8221; topology, seeding the timer with network distance is sufficient. For a &#8220;star&#8221; topology, all nodes are the same distance from the data source, so randomization must be used. A combination of these techniques must be used for a tree topology. Rather than requiring the timers be tuned for each individual network, they instead propose an adaptive algorithm that uses prior request/response behavior to tune the timers automatically.</p>
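<p>One way to combine the two de-synchronization techniques is to draw each timer from an interval whose lower bound and width both scale with the node&#8217;s distance from the data source. The sketch below assumes that form; the parameter names <tt>c1</tt> and <tt>c2</tt> and the example distances are illustrative, and the adaptive tuning of the parameters is not shown.</p>

```python
import random


def request_timer(distance: float, c1: float, c2: float, rng: random.Random) -> float:
    """Draw a repair-request delay from [c1 * distance, (c1 + c2) * distance]."""
    return rng.uniform(c1 * distance, (c1 + c2) * distance)


rng = random.Random(0)  # fixed seed so that runs are repeatable

# Chain: nodes sit at distances 1, 2, 3 from the source. With c2 = 0 the
# timers are deterministic and already distinct, ordered by distance.
chain = [request_timer(d, c1=1.0, c2=0.0, rng=rng) for d in (1.0, 2.0, 3.0)]
print(chain)

# Star: every node is at the same distance, so only the random term
# (c2 > 0) can spread the timers apart.
star = [request_timer(1.0, c1=1.0, c2=2.0, rng=rng) for _ in range(3)]
print(star)
```

<p>The deterministic term handles the chain case and the random term handles the star case; a tree needs both to be non-zero.</p>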
<br />Posted in Paper Summaries ]]></content:encoded>
			<wfw:commentRss>http://everythingisdata.wordpress.com/2009/11/19/scalable-reliable-multicast/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">neilconway</media:title>
		</media:content>
	</item>
		<item>
		<title>&#8220;Skilled in the Art of Being Idle&#8221;</title>
		<link>http://everythingisdata.wordpress.com/2009/11/18/skilled-in-the-art-of-being-idle/</link>
		<comments>http://everythingisdata.wordpress.com/2009/11/18/skilled-in-the-art-of-being-idle/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 08:22:58 +0000</pubDate>
		<dc:creator>Neil Conway</dc:creator>
				<category><![CDATA[Paper Summaries]]></category>
		<category><![CDATA[cs268]]></category>
		<category><![CDATA[energy]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[paper summary]]></category>

		<guid isPermaLink="false">http://everythingisdata.wordpress.com/?p=570</guid>
		<description><![CDATA[&#8220;Skilled in the Art of Being Idle&#8221; looks at how to reduce energy consumption by network end hosts (primarily desktops and laptops). Modern computers have various &#8220;sleep&#8221; states that allow reduced power consumption during idle periods. However, putting a computer &#8230; <a href="http://everythingisdata.wordpress.com/2009/11/18/skilled-in-the-art-of-being-idle/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=everythingisdata.wordpress.com&amp;blog=7583738&amp;post=570&amp;subd=everythingisdata&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>&#8220;<a href="http://tier.cs.berkeley.edu/docs/nedevschi_nsdi09.pdf">Skilled in the Art of Being Idle</a>&#8221; looks at how to reduce energy consumption by network end hosts (primarily desktops and laptops). Modern computers have various &#8220;sleep&#8221; states that allow reduced power consumption during idle periods. However, putting a computer to sleep has several costs:</p>
<ol>
<li>Transitioning into and out of a sleep state requires time (the paper cites a <a href="http://research.microsoft.com/pubs/79419/agarwal-NSDI09-Somniloquy.pdf">recent paper</a> that found that typical machines take 3-8 seconds to enter the &#8220;S3&#8221; sleep state, and 3-5 seconds to resume).</li>
<li>Sleeping hosts cannot respond to network packets; hence, they can lose their network presence (e.g. a DHCP lease can expire and be reassigned to another host). They also cannot run periodic tasks (e.g. backup or virus scanning).</li>
</ol>
<p>A naive approach would put idle nodes to sleep, and then awaken them (via established &#8220;<a href="http://en.wikipedia.org/wiki/Wake-on-LAN">wake-on-LAN</a>&#8221; support) when a packet is delivered to the node. This is insufficient: the authors demonstrate that in both home and office environments, the inter-arrival time of packets destined for idle computers is too short, so the cost of transitioning into and out of the sleep state would negate any significant power savings. To avoid this problem, prior work has proposed using a proxy to handle network traffic intended for a sleeping node. The proxy can ignore the traffic (if appropriate), handle the packet itself (e.g. by responding to an ARP query for the sleeping node&#8217;s IP), or, if necessary, awaken the sleeping node and forward the packet to it. The effectiveness of proxying therefore depends on most traffic for an idle node fitting into the first two categories.</p>
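<p>The three-way decision a proxy makes can be sketched as a small classifier. The rule table below is hypothetical and far simpler than anything a real proxy would need: which protocols can be answered or dropped on behalf of a sleeping host is an assumption made here for illustration, as is the sample trace.</p>

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"    # drop the packet; the sleeping host never sees it
    RESPOND = "respond"  # the proxy answers on the host's behalf
    WAKE = "wake"        # wake the host and forward the packet to it


# Hypothetical rule table, for illustration only.
RESPOND_PROTOCOLS = {"ARP"}
IGNORE_PROTOCOLS = {"HSRP", "PIM"}


def proxy_action(protocol: str) -> Action:
    """Decide what the proxy does with a packet for a sleeping host."""
    if protocol in RESPOND_PROTOCOLS:
        return Action.RESPOND
    if protocol in IGNORE_PROTOCOLS:
        return Action.IGNORE
    return Action.WAKE  # anything unrecognized is handled conservatively


trace = ["ARP", "HSRP", "SSH", "ARP"]  # made-up packet trace
wakeups = sum(1 for p in trace if proxy_action(p) is Action.WAKE)
print(f"{wakeups} of {len(trace)} packets would wake the host")
```

<p>Defaulting to a wake-up keeps the proxy safe but costs sleep time, which is why the share of traffic in the first two categories determines how much energy proxying can save.</p>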
<p>The paper is an empirical study of 250 machines owned by Intel employees, to assess the need for proxying in practice; the potential benefit of proxying; the network traffic that must be handled by a proxy; and so on.</p>
<h3>Summary of Findings</h3>
<ul>
<li>Most machines are idle, most of the time &#8212; on average, machines were idle 90% of the time.</li>
<li>Home and office idle network traffic is markedly different, in both inter-arrival time of packets for idle machines (offices have more such traffic), and the nature of this traffic.</li>
<li>A power-saving proxy would need to handle broadcast, multicast, and unicast traffic to achieve significant gains. However, broadcast and multicast traffic represents &#8220;low-hanging fruit&#8221;: a proxy that handles only broadcast and multicast traffic would recover 80% of the idle time in home environments, and over 50% of the idle time in office environments.</li>
<li>For broadcast, address resolution (ARP, NBNS) and service discovery (SSDP for UPnP devices) protocols are the dominant sources of traffic for idle nodes. Both kinds of traffic are easy to proxy.</li>
<li>For multicast, routing traffic (HSRP, PIM) is the dominant source of traffic for idle nodes in an office environment. In a home environment, service discovery (SSDP) is dominant.</li>
<li>Their results for unicast traffic are less clear. They argue that outgoing TCP connections dominate unicast traffic for idle nodes, and that less than 25% of this traffic is the result of some action a node initiated before becoming idle (and hence might need to be maintained in the idle state). They argue that outgoing traffic <i>initiated</i> while the machine is idle can often be batched together, or avoided entirely.</li>
</ul>
<br />Posted in Paper Summaries ]]></content:encoded>
			<wfw:commentRss>http://everythingisdata.wordpress.com/2009/11/18/skilled-in-the-art-of-being-idle/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">neilconway</media:title>
		</media:content>
	</item>
		<item>
		<title>&#8220;Scalable Application Layer Multicast&#8221;</title>
		<link>http://everythingisdata.wordpress.com/2009/11/18/scalable-application-layer-multicast/</link>
		<comments>http://everythingisdata.wordpress.com/2009/11/18/scalable-application-layer-multicast/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 07:52:25 +0000</pubDate>
		<dc:creator>Neil Conway</dc:creator>
				<category><![CDATA[Paper Summaries]]></category>
		<category><![CDATA[cs268]]></category>
		<category><![CDATA[multicast]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[overlay network]]></category>
		<category><![CDATA[paper summary]]></category>

		<guid isPermaLink="false">http://everythingisdata.wordpress.com/?p=572</guid>
		<description><![CDATA[Multicast is clearly an efficient technique for applications with one-to-many communication patterns, but the deployment of in-network multicast has been slow. Therefore, there have been a number of proposals for implementing multicast at the application level, as an overlay over &#8230; <a href="http://everythingisdata.wordpress.com/2009/11/18/scalable-application-layer-multicast/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=everythingisdata.wordpress.com&amp;blog=7583738&amp;post=572&amp;subd=everythingisdata&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Multicast is clearly an efficient technique for applications with one-to-many communication patterns, but the deployment of in-network multicast has been slow. Therefore, there have been a number of proposals for implementing multicast at the application level, as an <a href="http://en.wikipedia.org/wiki/Overlay_network">overlay</a> over the physical network. This is not as efficient as true in-network multicast (because the same message may be sent over the same link multiple times), but is much more flexible and easier to deploy.</p>
<p>&#8220;<a href="http://pages.cs.wisc.edu/~suman/pubs/sigcomm02.pdf">Scalable Application Layer Multicast</a>&#8221; is one such proposal for application-layer multicast via an overlay network; their proposed protocol is called <i>NICE</i>. Their focus is on applications that require low-latency delivery of relatively low-bandwidth data streams to a large set of recipients, although they argue that their techniques are also applicable to high-volume streams with some minor changes.</p>
<h3>Network Topology</h3>
<p>In NICE, nodes are arranged into layers; a single node can appear in more than one layer. Every node belongs to the lowest layer, <i>L0</i>. The nodes in a layer are arranged (automatically) into clusters of nodes that are &#8220;near&#8221; to one another (in terms of network distance/latency). Each cluster has a <i>leader</i>, which is the node in the &#8220;center&#8221; of the cluster (NICE tries to make the leader the node that has the smallest maximal latency to the other nodes in the cluster). Layer <i>L1</i> consists of all the cluster leaders from <i>L0</i>; the clustering algorithm is applied to <i>L1</i> in turn, yielding another set of cluster leaders which form <i>L2</i>, and so forth. The height of the tree is determined by the number of nodes and a constant <i>k</i>: each cluster contains between <i>k</i> and <i>3k-1</i> nodes.</p>
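<p>The leader heuristic described above (pick the node with the smallest maximal latency to the other cluster members) can be sketched in a few lines. The node names and latencies are made up, and this ignores how NICE actually measures and refreshes the latencies.</p>

```python
from typing import Dict


def choose_leader(latency: Dict[str, Dict[str, float]]) -> str:
    """Return the member whose worst-case latency to the others is smallest.

    `latency` maps each member to its latencies to every other member.
    """
    return min(latency, key=lambda node: max(latency[node].values()))


# Made-up pairwise latencies (in ms) for a three-node cluster.
latency = {
    "a": {"b": 10.0, "c": 40.0},
    "b": {"a": 10.0, "c": 15.0},
    "c": {"a": 40.0, "b": 15.0},
}
print(choose_leader(latency))
```

<p>Node <tt>b</tt> is chosen because its worst-case latency (15 ms) is lower than that of <tt>a</tt> or <tt>c</tt> (40 ms each).</p>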
<p>To multicast a data message, a node forwards the message to every cluster peer in every layer to which the node belongs, except that a node never forwards a message back to the message&#8217;s previous hop.</p>
<h3>Protocol</h3>
<p>To join the multicast group, a node begins by contacting a designated node called the <i>Rendezvous Point</i> (RP). The RP is typically the root of the NICE tree. The joining node walks down the tree from the root, choosing the child node that is closest to it (lowest latency).</p>
<p>Cluster leaders periodically check whether the cluster size constraint (<i>k</i> &lt;= size &lt;= <i>3k-1</i>) has been violated; if so, they initiate a cluster merge or split, as appropriate. Splitting a cluster into two clusters is done by trying to minimize the maximum of the radii of the resulting clusters.</p>
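<p>The periodic size check amounts to comparing the cluster size against the two bounds. A minimal sketch follows; the function name and the string return values are my own, and how the merge or split is actually carried out is not shown.</p>

```python
def size_action(size: int, k: int) -> str:
    """Say what a leader should do for a cluster of `size` nodes.

    The cluster size is supposed to stay between k and 3k-1, inclusive.
    """
    if k > size:
        return "merge"  # too small: merge with a neighboring cluster
    if size > 3 * k - 1:
        return "split"  # too large: split into two clusters
    return "ok"


for size in (2, 3, 8, 9):
    print(size, size_action(size, k=3))
```

<p>With <i>k</i> = 3, the valid sizes are 3 through 8, so a cluster of 2 is merged and a cluster of 9 is split.</p>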
<p>Each node in a cluster periodically sends heartbeats to each of its cluster peers. These are used to detect node failures, and to update pair-wise latency information for nodes. If a cluster leader fails or deliberately leaves the NICE group, a new leader is chosen by the same heuristic (minimize maximum latency from new center to any cluster peer). A new leader may also be chosen if the pair-wise latencies in the cluster drift sufficiently far to make selecting a new leader justified. Also, each member of every layer <i>i</i> periodically probes its latency to the nodes in layer <i>i+1</i>; if the node is closer to another <i>i+1</i> layer node, it moves to the corresponding cluster in layer <i>i</i>.</p>
<h3>Discussion</h3>
<p>I didn&#8217;t see any guidance from the authors on choosing an appropriate <i>k</i> value (essentially the tree fan-in), which seems like it would be an important parameter.</p>
<br />Posted in Paper Summaries ]]></content:encoded>
			<wfw:commentRss>http://everythingisdata.wordpress.com/2009/11/18/scalable-application-layer-multicast/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">neilconway</media:title>
		</media:content>
	</item>
		<item>
		<title>&#8220;ExOr: Opportunistic Multi-Hop Routing for Wireless Networks&#8221;</title>
		<link>http://everythingisdata.wordpress.com/2009/11/15/exor-opportunistic-multi-hop-routing-for-wireless-networks/</link>
		<comments>http://everythingisdata.wordpress.com/2009/11/15/exor-opportunistic-multi-hop-routing-for-wireless-networks/#comments</comments>
		<pubDate>Sun, 15 Nov 2009 22:08:29 +0000</pubDate>
		<dc:creator>Neil Conway</dc:creator>
				<category><![CDATA[Paper Summaries]]></category>
		<category><![CDATA[cs268]]></category>
		<category><![CDATA[mesh network]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[paper summary]]></category>
		<category><![CDATA[routing]]></category>
		<category><![CDATA[wireless]]></category>

		<guid isPermaLink="false">http://everythingisdata.wordpress.com/?p=563</guid>
		<description><![CDATA[Wired networks are traditionally viewed as a unicast medium: packets are sent from A to B. Because the underlying medium can have a degree of sharing (e.g. multiple hosts connected to the same Ethernet hub), steps must be taken to &#8230; <a href="http://everythingisdata.wordpress.com/2009/11/15/exor-opportunistic-multi-hop-routing-for-wireless-networks/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=everythingisdata.wordpress.com&amp;blog=7583738&amp;post=563&amp;subd=everythingisdata&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Wired networks are traditionally viewed as a unicast medium: packets are sent from <i>A</i> to <i>B</i>. Because the underlying medium can have a degree of sharing (e.g. multiple hosts connected to the same Ethernet hub), steps must be taken to avoid interference between concurrent senders (e.g. <a href="http://en.wikipedia.org/wiki/Carrier_sense_multiple_access_with_collision_detection">CSMA</a>).</p>
<p>In a wireless network, the medium is shared: any host in range of a radio transmission can receive it. If the goal is to use a traditional unicast routing protocol over a wireless link, this makes interference between senders a more challenging problem (e.g. as discussed in the <a href="http://everythingisdata.wordpress.com/2009/09/27/macaw-a-media-access-protocol-for-wireless-lans/">MACAW</a> paper). However, rather than viewing broadcast as an inconvenience we need to work around, the broadcast nature of wireless can also be leveraged to design new protocols that would be infeasible for a wired network.</p>
<p>A nice example of a broadcast-oriented protocol for wireless is &#8220;<a href="http://pdos.csail.mit.edu/papers/roofnet:exor-sigcomm05/roofnet_exor-sigcomm05.pdf">ExOr: Opportunistic Multi-Hop Routing for Wireless Networks</a>&#8221;. In a traditional routing design, a sender chooses the next hop that should receive a packet, and then unicasts the packet to that destination. In ExOr, a sender broadcasts a <i>batch</i> of packets to a group of nodes simultaneously. The set of batch recipients coordinates to try to ensure that each packet in the batch is forwarded onward. The recipient of a packet is only chosen <i>after</i> the packet is sent, which allows ExOr to &#8220;opportunistically&#8221; take advantage of links that have high loss rates. ExOr assumes that node reception probabilities are mostly independent and mostly decrease with distance, both of which are probably pretty reasonable assumptions.</p>
<h3>Design</h3>
<p>The source collects a batch of packets destined for the same host, and then chooses a <i>forwarding list</i> for the batch. The forwarding list is a list of nodes, sorted by the expected cost of delivering packets from that node to the eventual destination node. The cost metric is similar to <a href="http://everythingisdata.wordpress.com/2009/10/07/a-path-metric-for-multi-hop-wireless-routing/">ETX</a> (expected number of transmissions required to send the packet to the destination via unicast, including retransmissions); unlike ETX, it only considers the forward delivery probability. While it would be possible to include all possible recipient nodes in the forwarding list, this would increase the coordination cost among the forwarders, so ExOr only includes &#8220;likely&#8221; recipients in the list (those with an estimated chance of at least 10% of receiving a broadcast packet).</p>
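<p>As a rough sketch of the selection step (not code from the paper), the forwarding list can be thought of as a filter followed by a sort. The <code>Candidate</code> type, its field names, and the sample numbers are all made up for illustration; the only things taken from the description above are the 10% threshold and the ordering by cost, and the sketch assumes that a lower cost means a higher priority.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_to_dest: float  # ETX-like, forward-only cost; lower is closer to the destination
    recv_prob: float     # estimated chance of hearing the source's broadcast


def build_forwarding_list(candidates, min_recv_prob=0.10):
    """Return node names ordered from highest priority (lowest cost) to lowest."""
    likely = [c for c in candidates if c.recv_prob >= min_recv_prob]
    return [c.name for c in sorted(likely, key=lambda c: c.cost_to_dest)]


# Hypothetical topology: the values are invented for the example.
nodes = [
    Candidate("dst", 0.0, 0.15),
    Candidate("B", 1.4, 0.60),
    Candidate("A", 2.9, 0.95),
    Candidate("far", 1.1, 0.02),  # dropped: unlikely to hear the source
]
print(build_forwarding_list(nodes))  # ['dst', 'B', 'A']
```

<p>Note that <code>far</code> has a low cost to the destination but is still excluded, because the source is unlikely to reach it directly.</p>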
<p>Each packet contains a <i>batch map</i>, which holds the sender&#8217;s estimate of the highest priority (according to the cost metric) node to have received each packet in the batch. When a node receives a packet, it uses the packet&#8217;s batch map to update its local batch map. This means that batch map information propagates through the nodes, carrying information about packet reception from high priority nodes to lower priority nodes.</p>
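<p>One way to picture the batch map update is as an entry-by-entry merge that keeps the better of the two estimates. The representation below is an assumption made for this sketch, not the paper&#8217;s wire format: each map is a list with one slot per packet, holding the forwarding-list index of the best known holder (0 being the highest priority) or <code>None</code> if no receiver is known.</p>

```python
def merge_batch_maps(local, received):
    """Merge two batch maps entry by entry.

    Each map is a list with one entry per packet in the batch. An entry is the
    forwarding-list index of the best node known to hold that packet
    (0 = highest priority), or None if no receiver is known yet.
    """
    if len(local) != len(received):
        raise ValueError("batch maps must cover the same batch")
    merged = []
    for mine, theirs in zip(local, received):
        known = [rank for rank in (mine, theirs) if rank is not None]
        # Keep whichever estimate names the higher-priority node.
        merged.append(min(known) if known else None)
    return merged


local = [2, None, 1, None]
heard = [0, 3, 2, None]
print(merge_batch_maps(local, heard))  # [0, 3, 1, None]
```

<p>Because the merge only ever moves an entry toward a higher-priority node, applying it repeatedly as packets are overheard is what lets reception information spread through the forwarders.</p>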
<p>After the source broadcasts a batch, each member of the forwarding list broadcasts in turn, in order of descending priority (that is, the nodes with the lowest ETX-style cost to the destination go first). Each node broadcasts the packets it received, along with its updated batch map. Nodes coordinate to schedule their transmissions to try to avoid interference (e.g. by estimating each node&#8217;s expected transmission time from its ETX value and the batch map contents). The protocol continues cycling through the nodes in priority order. At the end of each cycle, the ultimate destination broadcasts its batch map 10 times; at the beginning of each cycle, the source resends packets that weren&#8217;t received by any node (as determined from the batch map contents). The ExOr scheme stops when 90% of the packets in a batch have been transferred, and uses a traditional routing policy to deliver the remainder of the batch.</p>
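<p>The two decisions the source makes between cycles (what to resend, and whether to hand the rest of the batch to traditional routing) can be sketched as follows. This is only an illustration under assumed conventions: the batch map is a list of forwarding-list indices as in a simple per-packet table, the destination is taken to be index 0, and the function name is hypothetical. The 90% cutoff is the figure quoted above.</p>

```python
def end_of_cycle(batch_map, dest_rank=0, cutoff=0.90):
    """Summarise one cycle from the source's view of the batch map.

    batch_map[i] is the forwarding-list index of the best known holder of
    packet i (dest_rank marks the destination), or None if nobody has it.
    Returns (packets the source must resend, whether ExOr should stop).
    """
    if not batch_map:
        raise ValueError("empty batch")
    resend = [i for i, rank in enumerate(batch_map) if rank is None]
    delivered = sum(1 for rank in batch_map if rank == dest_rank)
    done = delivered / len(batch_map) >= cutoff
    return resend, done


# Ten-packet batch: eight delivered, one stuck at a forwarder, one lost.
batch_map = [0, 0, 2, None, 0, 0, 0, 0, 0, 0]
print(end_of_cycle(batch_map))  # ([3], False)
```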
<h3>Discussion</h3>
<p>Overall, ExOr is a really neat idea. That said, it is at best a special-purpose solution, because of the high latency it incurs for batching and multiple transmission rounds. Similarly, the need for a split TCP proxy is pretty ugly. A complete solution to wireless routing would perhaps adaptively switch between latency-oriented and bandwidth-oriented routing techniques, depending on the nature of the traffic.</p>
<p>It seems arbitrary to me that ExOr works on a batch-by-batch basis; that almost seems like establishing a new TCP connection for each window&#8217;s worth of data. The amount of useful work done in each transmission round decreases, until the protocol imposes an arbitrary cutoff at 90% and switches to a traditional routing protocol. Instead, wouldn&#8217;t it be more sensible for ExOr to allow new packets to be inserted into the batch as the destination confirms the delivery of packets? This would essentially emulate the &#8220;conservation of packets&#8221; principle from the <a href="http://everythingisdata.wordpress.com/2009/09/06/congestion-avoidance-and-control/">Van Jacobson</a> paper.</p>
]]></content:encoded>
			<wfw:commentRss>http://everythingisdata.wordpress.com/2009/11/15/exor-opportunistic-multi-hop-routing-for-wireless-networks/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">neilconway</media:title>
		</media:content>
	</item>
		<item>
		<title>&#8220;White Space Networking with Wi-Fi like Connectivity&#8221;</title>
		<link>http://everythingisdata.wordpress.com/2009/11/12/white-space-networking-with-wi-fi-like-connectivity/</link>
		<comments>http://everythingisdata.wordpress.com/2009/11/12/white-space-networking-with-wi-fi-like-connectivity/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 03:44:42 +0000</pubDate>
		<dc:creator>Neil Conway</dc:creator>
				<category><![CDATA[Paper Summaries]]></category>
		<category><![CDATA[cs268]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[paper summary]]></category>
		<category><![CDATA[wireless]]></category>

		<guid isPermaLink="false">http://everythingisdata.wordpress.com/?p=555</guid>
		<description><![CDATA[&#8220;White Space Networking with Wi-Fi like Connectivity&#8221; discusses how to achieve wireless networking using so-called &#8220;white space&#8221;: unused portions of the UHF spectrum (approximately 512-698 MHz). This is attractive, because using the UHF spectrum can allow wireless networking over a &#8230; <a href="http://everythingisdata.wordpress.com/2009/11/12/white-space-networking-with-wi-fi-like-connectivity/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>&#8220;<a href="http://www.eecs.harvard.edu/~mdw/papers/whitefi-sigcomm09.pdf">White Space Networking with Wi-Fi like Connectivity</a>&#8221; discusses how to achieve wireless networking using so-called &#8220;white space&#8221;: unused portions of the UHF spectrum (approximately 512-698 MHz). This is attractive because using the UHF spectrum can allow wireless networking over a relatively long range (1 mile or greater, compared to 300-600 feet for 802.11n). UHF signals are also better able to penetrate walls and buildings in urban environments.</p>
<p>The difficulty with using UHF spectrum is that the spectrum has already been assigned for use by two &#8220;incumbents&#8221;: analog TV and wireless microphones (although the use of the spectrum by analog TV should have declined since this paper was written, due to the recent conversion of analog TV to digital). Per FCC rules, non-incumbent use of the UHF spectrum must avoid interfering with incumbent use. While TV signals are relatively stable, wireless microphone transmissions can begin without warning, which poses some challenges to providing wireless networking using the UHF spectrum. To gain improved throughput, the authors also suggest that multiple contiguous free channels should be aggregated together and used for networking.</p>
<p>To enable networking over UHF, several problems must be solved:</p>
<ul>
<li><b>Spectrum Assignment:</b> What portion of the 180 MHz UHF band should be used for networking in a given locale? The UHF spectrum is typically &#8220;fragmented&#8221;: some portions of the spectrum are in use, while others are free. Furthermore, to increase throughput, multiple contiguous free channels should be used by a single network. The right spectrum to use will also change over time, e.g. as hosts move and wireless microphones are activated. The channel used must be free for both the AP and all clients, which is made challenging by the larger spatial extent of UHF networks.</li>
<li><b>AP Discovery:</b> WiFi APs emit beacons every 100ms on the WiFi channel they are using. To find APs, clients can easily scan each channel, looking for AP beacons. This is more challenging for UHF, because there are more possibilities to scan (30 UHF channels, and 3 possible channel widths, yielding 84 combinations).</li>
<li><b>Handling Disconnections:</b> If incumbent use of a channel is detected, networking transmissions must immediately stop using the channel (the authors found that even transmitting control packets can result in audible interference for wireless microphones using the channel). If clients and APs are prone to abruptly disconnect from a channel, a method is needed to choose another channel and continue communication.</li>
</ul>
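<p>The figure of 84 combinations in the AP discovery item can be reproduced with a little counting. The sketch below assumes that the three channel widths occupy 1, 3, and 5 adjacent UHF channels respectively; that mapping is not stated above and is only an assumption, chosen because it yields 30 + 28 + 26 = 84.</p>

```python
def channel_options(num_channels=30, spans=(1, 3, 5)):
    """List (first_channel, span) pairs that fit inside the band.

    spans gives how many adjacent UHF channels each width occupies; the
    default (1, 3, 5) is an assumption made to match the quoted total.
    """
    return [
        (first, span)
        for span in spans
        for first in range(num_channels - span + 1)
    ]


options = channel_options()
print(len(options))  # 84
```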
<h3>Design</h3>
<p>To solve the spectrum assignment problem, clients and APs exchange information about which channels are free near them. APs probe for new channels when they detect incumbent use on their current channel, or if they observe a performance drop. Channel probing is done by considering which channels are free for all clients and the AP, and by then estimating the available bandwidth on each channel.</p>
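<p>The &#8220;free for all clients and the AP&#8221; requirement is essentially a set intersection followed by a search for contiguous runs. Here is a minimal sketch of that idea; the function name, the channel numbers, and the choice of spans (1, 3, or 5 adjacent channels) are assumptions for illustration, and the final ranking uses a placeholder (prefer the widest span) rather than the bandwidth estimate the authors describe.</p>

```python
def usable_spans(ap_free, clients_free, spans=(1, 3, 5)):
    """Return (first_channel, span) pairs free at the AP and at every client."""
    common = set(ap_free).intersection(*clients_free)
    result = []
    for span in spans:
        for first in sorted(common):
            if all(ch in common for ch in range(first, first + span)):
                result.append((first, span))
    return result


# Hypothetical free-channel reports from the AP and two clients.
ap_free = {21, 22, 23, 24, 25, 30}
clients_free = [{22, 23, 24, 25, 26}, {21, 22, 23, 24, 30}]

candidates = usable_spans(ap_free, clients_free)
print(candidates)  # [(22, 1), (23, 1), (24, 1), (22, 3)]

# Placeholder ranking: widest span wins. A real system would estimate the
# available bandwidth of each candidate instead.
print(max(candidates, key=lambda option: option[1]))  # (22, 3)
```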
<p>To enable AP discovery, the authors propose some signal processing magic. They propose a technique called &#8220;SIFT&#8221; which essentially allows them to scan the spectrum range and cheaply determine the presence of an AP on a channel; they can then tune the radio transceiver to the identified channel, and decode the beacon packet as usual.</p>
<p>To handle disconnections due to incumbent use of the spectrum, they propose using a separate 5 MHz <i>backup channel</i>, which is advertised as part of AP beacon packets. If a client senses that a disconnection has occurred (e.g. because no data packets have been received recently), it switches to the backup channel, and both listens for and emits <i>chirps</i>. Chirps contain information about the available white spaces near the chirping node, which is used to pick a new channel for communication. APs periodically scan for chirps.</p>
<p>It&#8217;s possible that the backup channel is already in use by another incumbent. In this case, the client picks an arbitrary available channel as the <i>secondary backup channel</i>, and emits chirps on it. The AP periodically scans all channels to attempt to find such chirps. The same signal processing magic (SIFT) that is used to enable cheap AP discovery is used to make scanning all channels periodically feasible.</p>
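<p>From the client&#8217;s side, the fallback logic boils down to a two-step choice. The sketch below is only an illustration with a hypothetical function name and made-up channel numbers; in particular, the description says the secondary backup channel is &#8220;arbitrary&#8221;, so picking the lowest free channel here is just a deterministic stand-in.</p>

```python
def chirp_channel(backup_channel, free_channels):
    """Pick the channel a disconnected client should chirp on.

    free_channels is the set of channels the client currently senses as free
    of incumbents. Returns None when nothing is available.
    """
    if backup_channel in free_channels:
        return backup_channel
    # Secondary backup: any free channel would do; min() is an arbitrary but
    # repeatable choice for this sketch.
    return min(free_channels) if free_channels else None


print(chirp_channel(40, {35, 40}))  # 40: the advertised backup channel is free
print(chirp_channel(40, {35, 37}))  # 35: fall back to a secondary backup channel
```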
<h3>Related Reading</h3>
<p>Matt Welsh <a href="http://matt-welsh.blogspot.com/2009/08/whitefi-wi-fi-like-networking-in-uhf.html">discusses this paper</a> on his blog. The <a href="http://www.eecs.harvard.edu/~rohan/talks/rohan-sigcomm-whitefi.pptx">slides</a> for the SIGCOMM talk on this paper are also available online.</p>
]]></content:encoded>
			<wfw:commentRss>http://everythingisdata.wordpress.com/2009/11/12/white-space-networking-with-wi-fi-like-connectivity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">neilconway</media:title>
		</media:content>
	</item>
	</channel>
</rss>
