IBM is joining the CERN openlab for DataGrid applications to help create a massive data-management system built on Grid computing.
IBM’s storage virtualization and file management technology will play a key role in the collaboration, which aims to create a data file system “far larger than exists today,” according to IBM and CERN, the European Organization for Nuclear Research. The system is intended to help scientists at the renowned particle physics research center answer some of the most fundamental questions about the nature of matter and the universe.
Conceived in IBM Research, the storage virtualization technology, known as Storage Tank, is designed to provide scalable, high-performance and highly available management of huge amounts of data using a single file namespace regardless of where or on what operating system the data reside.
IBM and CERN will work together to extend Storage Tank’s capabilities so it can manage and provide access from any location worldwide to the unprecedented amounts of data – billions of gigabytes a year – that CERN’s Large Hadron Collider (LHC) is expected to produce when it goes online in 2007. The LHC is the next-generation particle accelerator designed to recreate the conditions shortly after the Big Bang to help researchers understand the initial seconds when the universe was formed.
The CERN community – which is credited with inventing the World Wide Web in 1990 – hopes to push the Internet even further by using Grid computing to meet the massive data processing requirements of the LHC. CERN openlab is a collaboration between CERN and leading industrial partners that will create and implement data-intensive Grid-computing technologies to aid LHC scientists. Because the same issues facing CERN are becoming increasingly important to the IT industry, the CERN openlab and its partners – which include Enterasys, HP and Intel – are working together to explore advanced computing and data management solutions.
By 2005, the system developed through the CERN openlab collaboration with IBM is expected to handle up to a petabyte (a million gigabytes) of data.
“We are delighted that IBM is joining the CERN openlab for DataGrid applications,” said Wolfgang von Ruden, Information Technology Division leader and head of the CERN openlab. “Together with IBM, we aim to achieve a one petabyte storage solution and integrate it with the Grid that CERN is building to handle the extreme data challenges of the LHC project.”
“CERN’s scientists and colleagues want to be able to get to their data wherever it may be – local or remote and regardless of which operating system on which it may reside,” said Jai Menon, a fellow at IBM’s Almaden Research Center in San Jose and co-director of IBM’s Storage Systems Institute, a joint program between IBM Research and the company’s product division. “This is the perfect environment for us to enhance Storage Tank to meet the demanding requirements of large-scale Grid computing systems.”
As part of the agreement, several leading storage management experts from IBM’s Almaden and Haifa (Israel) Research Labs will work with the CERN openlab team. IBM will also give CERN the system’s initial 20 terabytes of high-performance disk storage and six Linux-based servers, and IBM Switzerland will provide additional support.
Storage Tank employs policy-based storage management and includes clustering and specialized protocols that detect network failures to enable very high reliability and availability.
In this initiative, IBM is following a collaboration strategy initiated in 2001 with the European Union-sponsored European Data Grid project, which is also led by CERN.