The Berkeley DB group is hosting a talk about Facebook Hive this Thursday, at Soda Hall in UC Berkeley. Details and the abstract are below; it should be an interesting talk! I’d encourage anyone in the area to attend. If you need directions, parking suggestions, etc., just drop me a line.
Thursday, October 16th, 2008
606 Soda Hall, UC Berkeley
Title: Hive: Data Warehousing using Hadoop
Hive is an open-source data warehousing infrastructure built
on top of Hadoop that allows SQL-like queries along with
the ability to add custom transformation scripts at different
stages of data processing. It includes language constructs
to import data from various sources, support for
object-oriented data types, and a metadata repository that
structures Hadoop directories into relational tables and
partitions with typed columns. Facebook uses this system for
a variety of tasks: classic log aggregation, graph mining,
text analysis and indexing.
In this talk we will give an overview of the Hive system,
the data model, query language compilation and execution, and
the metadata store. We will also discuss our near-term
roadmap and avenues for significant contributions in terms
of query optimization, execution speed and data compression,
among others. We will also present some statistics on
usage within Facebook and outline some of the challenges in
operating Hive/Hadoop in a utility computing model in fast
Joydeep Sensarma has been working in the Facebook Data Team
for over a year, where he’s taken turns coding up Hive,
keeping Hadoop running, eating, and sleeping, in that order.
He’s really glad he no longer works on closed-source file
and database systems like he did for the last ten years.
Zheng Shao has worked in the Facebook Data Team on Hadoop and
Hive for about 6 months. Before that he worked in the Yahoo
web search team, which uses Hadoop heavily.
Namit Jain has been working in the Facebook Data Team with
Hive for about 6 months. Before that he was in the database
and application server groups at Oracle for about 10 years.