A tool for parallel scientific applications, the Distributed Analysis Environment (DIANE), has been successfully used in several processing challenges on the Grid. In April, DIANE was used on the Enabling Grids for E-sciencE (EGEE) infrastructure during the in silico drug-discovery application to analyse possible drug components to fight the avian flu virus (see CERN Courier September 2006 p18). In the following months, the DIANE scheduling layer was also used for a series of large-scale data-processing tasks for the International Telecommunication Union using several Grid sites around Europe (see CERN Courier September 2006 p17). Originally targeted at investigating distributed ntuple analysis for particle physics, DIANE has become an application-independent user-scheduling tool on the Grid and has been interfaced with a number of applications in various fields.
DIANE is a Python framework based on a master-worker processing model, used transparently on top of regular Grid middleware. Worker agents are sent to the Grid as regular Grid jobs. By opening a TCP/IP connection, they register with the master agent, which runs on the user's desktop computer and is the coordination point for the virtual worker pool. Workers may dynamically join and leave the pool without disrupting the processing as a whole. The units of computation are a large number of short tasks, which the master allocates to workers directly, bypassing the middleware-scheduling layer. This reduces the total job turnaround time and allows a much faster reaction to errors in task execution, because failed tasks are reallocated to other workers. During the drug-discovery data challenge, it was demonstrated that a scheduler such as DIANE can improve the distribution efficiency on the Grid from less than 40% to more than 80% by optimizing the allocation of the fine-grained computing tasks.
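The master-worker model described above can be illustrated with a minimal sketch. This is not DIANE's code or API: it assumes local threads in place of Grid worker agents and an in-memory queue in place of TCP/IP connections, and the names `run_master`, `execute` and `flaky_square` are hypothetical. It shows only the scheduling idea: a master holds many short tasks, hands them directly to whichever worker is free, and puts a failed task back in the pool so that it is allocated again.

```python
import queue
import threading


def run_master(tasks, execute, n_workers=4, max_attempts=3):
    """Allocate short, hashable tasks to a worker pool; requeue failures.

    Illustrative only: threads stand in for Grid worker agents.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put((task, 1))  # (task, attempt number)

    results, failures = {}, {}
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                task, attempt = pending.get_nowait()
            except queue.Empty:
                return  # no work left: this worker leaves the pool
            try:
                value = execute(task, worker_id)
            except Exception as exc:
                if attempt < max_attempts:
                    # Hand the task back so the next free worker takes it.
                    pending.put((task, attempt + 1))
                else:
                    with lock:
                        failures[task] = repr(exc)
            else:
                with lock:
                    results[task] = value

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, failures


# Hypothetical task: squaring a number, failing once for every fifth task
# to simulate an error in task execution.
_seen = set()
_seen_lock = threading.Lock()


def flaky_square(task, worker_id):
    with _seen_lock:
        first_time = task not in _seen
        _seen.add(task)
    if task % 5 == 0 and first_time:
        raise RuntimeError("simulated worker error")
    return task * task


results, failures = run_master(range(20), flaky_square)
```

In this sketch every task eventually completes, because each simulated failure occurs only once and the task is retried. A real scheduler of this kind would also have to handle workers that disappear without reporting an error, which the sketch leaves out.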
DIANE's Python framework allows the easy integration of existing applications, even those as complex as Athena, the analysis framework of the ATLAS experiment at CERN, and will be used to operate part of the Geant4 toolkit. DIANE has also been interfaced to Ganga, a user-friendly Grid interface created in the context of the ATLAS and LHCb experiments.
• For more details on the DIANE scheduler, please see the extended version of this article in the CERN Computing Newsletter at http://www.cerncourier.com/articles/cnl, or go to the DIANE webpage at http://cern.ch/diane.