Parsing data at the Large Hadron Collider

Ian Bird at the Large Hadron Collider is tossing out tons of data that cost billions to get.
Ian Bird is the computing grid project leader for the Large Hadron Collider. (Forbes.com)

TMI is the plaintive text message from a teen whose friend tells more than anybody really wants to know. But "too much information" also describes one of the challenges facing Ian Bird and his colleagues at the Large Hadron Collider, the physics experiment now powering up to full speed in an underground lab at the French-Swiss border. It will use enough electricity to power the nearby city of Geneva.

The job of the collider is to take tiny particles, accelerate them almost to the speed of light, smash them together and then take pictures of the collisions. To a layperson, the snapshots may look like dashes and squiggles. For particle physicists like Bird, they are expected to provide insights into the basic composition of matter and the workings of the universe.



There are five collision detectors at the collider, each one some five stories tall and operating much like a giant digital camera. Together they have 150 million sensors, each snapping 40 million pictures a second. Bird, 55, who is project leader for the computing grid, says the detectors can put out more data than his computers can handle, enough to fill the biggest computer disk drive in a fraction of a second. Since that would be more data than anyone would know what to do with, says Bird, working with the collider "is a matter of learning what data you can throw away. Most of it just isn't interesting."
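
For a rough sense of scale, the sensor and snapshot counts above can be multiplied out. The sketch below is a back-of-the-envelope estimate, not a figure from the collider's team: the one byte per reading and the 2-terabyte drive are assumptions chosen purely for illustration.

```python
# Figures quoted in the article.
SENSORS = 150_000_000               # sensors across all the detectors
SNAPSHOTS_PER_SECOND = 40_000_000   # pictures each sensor snaps per second

# Assumptions for illustration only (not from the article).
BYTES_PER_READING = 1               # hypothetical size of one sensor reading
DRIVE_BYTES = 2 * 10**12            # hypothetical 2-terabyte disk drive


def readings_per_second() -> int:
    """Total sensor readings produced each second by all detectors."""
    return SENSORS * SNAPSHOTS_PER_SECOND


def seconds_to_fill_drive() -> float:
    """Time for the raw, unfiltered stream to fill the assumed drive."""
    raw_bytes_per_second = readings_per_second() * BYTES_PER_READING
    return DRIVE_BYTES / raw_bytes_per_second


if __name__ == "__main__":
    print(f"{readings_per_second():.1e} readings per second")
    print(f"{seconds_to_fill_drive() * 1000:.2f} milliseconds to fill the drive")
```

Even under these modest assumptions the raw stream would fill the drive in well under a thousandth of a second, which is why most of it has to be discarded on the spot.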

Think of a closed-circuit security camera that watches a store entrance all night. No one arriving in the morning needs to sit through all 12 hours; they just need to see the parts of the tape of special interest. Easier said than done, since it might not be immediately obvious if a blur on the screen is a burglar — or simply a shadow from a moving car.

Learning to separate the collider's signals from its noise will take time, Bird says. Since the device is still relatively new — it began operating last November — scientists don't understand all its quirks, and so at first will be conservative in tossing out information. With more experience, they will get a better sense of what's routine and what isn't. Eventually they will be saving only, in effect, the collider's greatest hits. Even after tossing out all extraneous data, Bird's computers will be storing 15 petabytes a year. To put all that on DVDs, you'd need a stack of 1.7 million.
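
The DVD comparison can be checked with simple division. The sketch below assumes decimal petabytes (10^15 bytes) and nominal disc capacities of 4.7 and 8.5 gigabytes, none of which the article specifies. Under those assumptions, the 1.7 million figure lines up with dual-layer discs rather than single-layer ones.

```python
PETABYTE = 10**15                    # decimal petabyte (assumption)
YEARLY_BYTES = 15 * PETABYTE         # 15 petabytes a year, per the article

# Nominal disc capacities in bytes (assumptions, not from the article).
SINGLE_LAYER_DVD = 4_700_000_000     # 4.7 GB
DUAL_LAYER_DVD = 8_500_000_000       # 8.5 GB


def dvds_needed(total_bytes: int, disc_bytes: int) -> int:
    """Number of discs needed to hold total_bytes, rounding up."""
    return -(-total_bytes // disc_bytes)  # integer ceiling division


if __name__ == "__main__":
    print(f"dual-layer:   {dvds_needed(YEARLY_BYTES, DUAL_LAYER_DVD):,}")
    print(f"single-layer: {dvds_needed(YEARLY_BYTES, SINGLE_LAYER_DVD):,}")
```

With dual-layer discs the count comes to roughly 1.76 million, close to the article's figure; single-layer discs would take about 3.2 million.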

There is a certain audacity in tossing out collider data before anyone looks at it, since billions of dollars have been spent collecting it. Says Bird: "You need to make sure that your detectors are very finely calibrated. You need to have confidence that there are no bugs in your data." And since much of this is new territory, "you need to have lots of young researchers."