Kickfire and “Stream Processing”

I noticed Robert’s post about the Kickfire launch. He mentioned Truviso — for whom I work — so I thought I’d add my two cents.

Kickfire is the company previous known as “C2App“. I’m not familiar with the details of their technology, but the basic idea is to use custom hardware to accelerate data warehousing queries (this blog post has some more details). Using custom hardware is not a new idea — Netezza have been doing something superficially similar for years, with considerable success. In addition to custom hardware, Kickfire apparently use a few other data warehousing techniques that have recently come back in vogue (e.g. column-wise storage with compression, coupled with the ability to do query execution over compressed data). As an aside, I think that building a data warehousing product using MySQL is a fairly surprising technical decision.

One thing I did notice is that Kickfire’s PR mentions “stream processing” repeatedly, and Robert’s post suggests that the sort of stream processing done by Kickfire is similar to what Truviso does. This is not the case: the two companies and their products are very different. I’d guess that Kickfire are using the term because it’s become something of a buzzword.

I’d like to talk more about Truviso on this blog in the future, but the basic idea behind data stream processing is to allow analysis queries to be performed over live streams of data, as the data arrives at the system. In traditional databases, in order to apply a query to a piece of data, you first need to insert the data item into the database, wait for it to be committed to disk (force-write the write-ahead log), and then finally run a query on it from scratch. When data arrives at a rapid pace and you need low-latency query results, this “store-and-query” model has terrible performance; it’s also an unnatural way to structure a client application (you’re essentially polling for results). Instead, a data stream query processor allows the user to define a set of long-running continuous queries that represent the conditions of interest over the incoming data streams. As new live data arrives, the data is applied to the queries to incrementally update their results; client applications can simply consume new query results as soon as they become available. This allows you to get query results that are always up-to-date, without the need to first write data to disk (the data can either be discarded, or else written to disk asynchronously). For certain domains, such as algorithmic trading, network and environment monitoring, fraud detection, and real-time reporting, the data stream approach often yields much better performance and a more natural programming model. For more info, see the talk on data stream query processing I gave at last year’s PgCon.

So what does this have to do with using custom hardware to accelerate data warehousing queries? Not a whole lot. I’m guessing that Kickfire have co-opted the “stream processing” label because they push analysis queries down to the custom chip, and then “stream” the stored data over the chip, to compute multiple queries in a single pass. If you squint at it right, there are some similarities to stream query processing (in both cases, you only want to take one pass over the data), but fundamentally, Kickfire is trying to solve a very different problem, and using a very different set of technologies. Data warehouse engines like Kickfire (and Greenplum) are complements to data stream systems like Truviso (and Streambase, Coral8, and others), not supplements or competitors.


Leave a comment

Filed under Advogato, PostgreSQL

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s