Status of the CDF Data Access Project

January 6, 1999

 

The design of the CDF Data Access system was reviewed by an internal CDF Godparent committee in October 1998. The review committee recommended that the collaboration adopt the overall design for the data handling system documented in ref. 1.

Since then we have been working on a prototype to demonstrate the feasibility of the resource management scheme outlined in ref. 1. The prototype design and results are documented in ref. 2. A brief status report follows:

The prototype environment consisted of a single-CPU workstation, a 9 GB disk, two CPU queues (short and long), each with four execution slots, and two I/O queues. For most of the tests, tape access was simulated via disk-to-disk copies. These copies used eight simulated datasets: four small (1 GB) and four large (10 GB). Information on these datasets was stored in the CDF catalog. In addition, the system was tested using tape-resident datasets, with the data stored in the Emass tape robot.

The test procedure consisted of simulating many analysis scenarios. Mixes of up to a dozen jobs analyzing the large and small datasets were submitted. Where appropriate, each analysis job used a randomly chosen amount of CPU time. The tests included several "freight train" scenarios, in which several users requested the same dataset. In these cases, the overall number of tape mounts was significantly lower than it would have been had the jobs been run separately in sequence. As might be expected, the testing uncovered a number of bugs in the software, most of which have been fixed. No fundamental problems with the design were found.
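
The saving in a freight-train scenario can be illustrated with a minimal Python sketch. It is not the prototype's code: the function names, user names, and dataset names are hypothetical, and it assumes that each dataset needs exactly one tape mount, that all requests are pending at the same time, and that jobs run separately would each remount their tape.

```python
from collections import defaultdict


def mounts_sequential(requests):
    """Mount count if every job runs separately: one mount per job (assumption)."""
    return len(requests)


def mounts_freight_train(requests):
    """Mount count if pending requests for the same dataset share one mount.

    Returns the number of mounts and the users grouped by dataset.
    """
    users_by_dataset = defaultdict(list)
    for user, dataset in requests:
        users_by_dataset[dataset].append(user)
    return len(users_by_dataset), dict(users_by_dataset)


# Hypothetical request mix: three users want the same small dataset.
requests = [
    ("alice", "small_1"),
    ("bob", "small_1"),
    ("carol", "large_2"),
    ("dave", "small_1"),
]

freight_mounts, groups = mounts_freight_train(requests)
print("sequential mounts:   ", mounts_sequential(requests))  # 4
print("freight-train mounts:", freight_mounts)               # 2
```

In this toy mix, grouping the three requests for the same dataset halves the number of mounts. The actual saving depends on how many requests overlap while a tape is mounted.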

The prototype work has been successful, and we now need to proceed with the design for the full system.

[1] CDF Run II Data handling Resource Management and Software Components

[2] The DH Resource Management Prototype (CDF Run II Data Handling)