Fourth Report
of the Review Committee
on the Upgrade of
CDF/D0 Computing for Run II

 

 

 

M. Ernst, B. Jacobsen, V. O'Dell,
T. Schalk, W. von Rüden (editor), B. Yanny

 

 

Fermilab, February 2, 1999

 

Table of Contents

  1. Executive Summary
  2. Introduction
  3. General Remarks
  4. Infrastructure
    4.1 Analysis Tools
    4.2 ZOOM
    4.3 Visualization
    4.4 C++ Working Group
    4.5 Support Databases
    4.6 NT Support
    4.7 Serial Media
  5. Data Access
    5.1 Introduction
    5.2 Data Access Prototypes
      5.2.1 SAM/ENSTORE
      5.2.2 DIM/FTT/OCS
    5.3 RIP
    5.4 Fiber Channel
    5.5 Networking
  6. Procurement
  7. Management
  8. Summary of Recommendations
    8.1 Comments on previous recommendations
    8.2 New recommendations
  9. Conclusions
  10. Appendix A: Charge to the Committee
    10.1 Original Charge as of June 1997
    10.2 Plan for Run II Off-line Computing Review January 25-28, 1999
  Appendix B: Schedule of the Review Meeting

  1. Executive Summary

    This report describes the outcome of the fourth meeting of the Review Committee for the CDF/D0 Computing Upgrade for Run II.

    Following a proposal from the experiments, the scope of the review has been broadened beyond the common projects to include separate reviews of the offline projects for CDF and D0. Since the overall time allocated remained unchanged, we devoted one day to each experiment, followed by a one-day review of the common projects.

    Most of the recommendations of the June ’98 review have been followed and many milestones have been successfully completed. We applaud the progress made in project management through the creation of a complete WBS and the corresponding resource-loaded schedules. Both collaborations have established their data access models and have successfully exercised their first prototypes.

    The committee found that the overall progress made by both experiments is impressive and that they are at an appropriate level of readiness in their ramp up to Run II. The number of physicists working on simulation and reconstruction has gone up significantly and CD provides support in almost all areas. Both experiments require some additional help. Possible solutions have been discussed.

    No evidence of any fundamental problem was found which would prevent the collaborations from analyzing their data at the start of the run, nor did we discover any "single point of failure" or area of major risk. There is still a huge amount of work to be done and the developers will have to concentrate on the most fundamental parts first, to ensure that a baseline solution is available early enough such that it can be exercised by the users well before the end of this year.

    Two memoranda with the committee’s conclusions on the individual reviews of D0 and CDF have been sent to the Director and the Associate Director for Computing.

    The fifth review meeting is scheduled for June 1999. We suggest somewhat reducing the time for the experiment-specific reviews to leave more time for the review of the Joint Projects.

    The committee would like to thank Fermilab’s management and support staff for the excellent organization of the meeting and the help given to the committee members. Once again, it greatly appreciated the open discussions with the participants and the good atmosphere.

  2. Introduction

    Fermilab’s management initiated the review of the CDF/D0 Computing Upgrade for Run II in April 1997 by setting up a review committee. For completeness, the charge to the committee is again enclosed as Appendix A. The committee members are:

    Wolfgang von Rüden, CERN (Chair)
    Michael Ernst, DESY
    Bob Jacobsen, LBNL
    Vivian O'Dell, KTeV, Fermilab
    Terry Schalk, UCSC
    Brian Yanny, EAG, Fermilab

    The review schedule is updated as we progress and is now:

    June 12-13, 1997 First review meeting at FNAL
    June 16, 1997 Oral report by W. von Rüden to PAC in Aspen
    June 30, 1997 Report on first review

    October 13-15, 1997 Second review meeting at FNAL
    October 18, 1997 Oral report by B. Jacobsen to PAC at FNAL
    October 31, 1997 Report on second review (delayed to November 17)

    March, 1998 Report by CD on procurement (postponed to June meeting)

    June 15-18, 1998 Third review meeting at FNAL
    June 22, 1998 Oral report by W. von Rüden to PAC in Aspen
    July 10, 1998 Report on third review

    January 15, 1999 Deadline for submission of papers
    January 25-28, 1999 Fourth review meeting at FNAL
    February 2, 1999 Report on fourth review

    May 28, 1999 Deadline for submission of papers
    June 7-11, 1999 Fifth review meeting at Fermilab
    June 15, 1999 Report on fifth review
    June 21, 1999 Oral report by W. von Rüden to PAC in Aspen

    January 10, 2000 Deadline for submission of papers
    January 17-20, 2000 Sixth review (to be confirmed in June ’99)

    During its first meeting in June 1997 the committee reviewed the "Data Management Needs Assessment", the transition to OO programming, and cost and schedule considerations.

    The second meeting in October 1997 concentrated on data access models, mass storage, software development environment and project management issues. The second report was sent to Fermilab on November 17, 1997.

    The third meeting in June 1998 had its emphasis on data access and project management.

    The fourth meeting had a different format, since the committee was asked to review the overall offline situation for CDF and D0 in addition to the common projects. One full day was devoted to each experiment, and the third day was allocated to the Joint Projects.

  3. General Remarks

    The fourth review took place at Fermilab on January 25-28, 1999. The detailed agenda is attached as Appendix B.

    The committee appreciated the good and timely preparation of the documentation as well as the high quality of the presentations.

    Because of the additional reviews of CDF and D0, the Joint Projects could not receive the same attention as in the past. We therefore apologize for any omissions in acknowledging the good work done in several areas, and hope to make up for this during the next review.

    The committee found that the overall progress made by both CDF and D0 is impressive and that they are at an appropriate level of readiness in their ramp up to Run II. A separate detailed assessment was sent to the Director and the Associate Director for Computing of Fermilab.

    The committee was very pleased to see that the Joint Project has been well accepted by all parties and that it has led to much better communication between them. This is also thanks to the good work done by the project office.

    We heard about a possible roll-in delay in the year 2000. Nevertheless, we believe that the current schedule should not be relaxed. Possible adjustments in the procurement plan may be beneficial, but will depend upon the laboratory’s budget profile. Sufficient equipment needs to be ordered and installed this year to test scalability issues and allow the experiments to use their systems as they are built.

  4. Infrastructure
    4.1 Analysis Tools

      Since the last review, a lot of progress has been made towards the adoption of a common set of analysis tools. The work of the PASREQ committees has resulted in a remarkable convergence of the needs of the experiments and has pointed the way to the eventual adoption of a single product, ROOT. A lot remains to be done to ensure that ROOT reaches the required level of quality and that CD and the ROOT developers can cooperate effectively. The Computing Division is taking steps to address these issues, including sponsoring short- and long-term visits and a workshop in March. The goal is for Fermilab to become an active contributor to the overall ROOT effort, resulting in good support for the uses of ROOT here. These efforts should be applauded and supported.

      It is very good that both D0 and CDF are using ROOT. CD should aggressively move to support ROOT. This should be done as a specific effort in concert with the ROOT developers, not as part of any of the existing common projects.

    4.2 ZOOM

      Although there was no dedicated report on the ZOOM project, we were told that it has delivered several products in use at CDF and/or D0. ZOOM intends to support these, including modifications to handle compiler and platform changes and minor upgrades specifically requested by users. This is the right course. Judging from the report we were given, the number of people in ZOOM appears matched to this support load, and support should be their first priority. Requests for new projects exist, but they do not appear justifiable given the large backlog of other work to be done in preparation for Run II.

      ZOOM should not add any new projects. It should be in a maintenance phase.

    4.3 Visualization

      Both experiments expressed a desire to use common standards for visualization, such as HEPVIS, OpenInventor and ROOT. CD has been supporting these and has provided a valuable service. This support is planned to continue at about the current level, likely with the addition of ROOT.

      CD should support visualization products such as HEPVIS, OpenInventor and ROOT, and the experiments should use them.

    4.4 C++ Working Group

      The C++ working group still has important work to do to maintain the development environment (for example, getting working debuggers and resolving conflicts between the compiler versions used for Run I and Run II). It should continue to focus on these immediate needs. The risks due to reliance on a single compiler (KAI), though present, do not warrant the effort of porting the applications to other compilers now. This can be done later, if needed.

      The C++ working group should continue to resolve the practical aspects of using these new technologies, with an emphasis on immediate needs.

    4.5 Support Databases

      There has been significant progress over the past six months in database use within the experiments. Both experiments intend to use SQL-compatible databases, with ORACLE and mSQL as engines. This permits a significant amount of common infrastructure for database backup, database server support, licensing, etc. The existing joint project intends to supply this, but additional resources are necessary. CDF and D0 must rapidly agree on deliverables for this project, and we recommend that they start with simple implementations.

      Adding support to the Common Database Support Group is important and necessary. CDF and D0 should take the lead in defining the common infrastructure for the group to provide. The CD proposal to get a database consultant to teach techniques is a fine idea and the experiments should take advantage of it.

    4.6 NT Support

      The issue of support for the use of NT in analysis was raised again. D0 has chosen NT for its Level 3 trigger, and this clearly requires a certain level of support for code development and release management tools. This is being provided by the joint projects, CD and the D0 collaboration, and should continue. Use of NT for offline simulation, reconstruction and/or analysis, however, has not been justified and is not part of the Run II baseline. Some D0 collaborators appear to want to do this; how to handle that is an internal collaboration issue. The committee believes that there is no justification for using joint project or CD resources for this.

      No CD effort should go into an NT based analysis station.

    4.7 Serial Media

    The Serial Media Working Group presented the results of its tests of tape technologies to date. The group has done good work and has a clear path to making a decision soon. This process should continue as planned, so that a technology choice can be made in time to have a fraction of the drives and media in place for the tests in autumn. These tests will provide valuable operational experience while there is still time to resolve unexpected problems with the new technology.

    Note that delays in the overall Run II schedule are possible. Unless these are larger than the possibilities that have been discussed, however, it is unlikely that they will allow the serial media selection to be delayed long enough to result in the adoption of a technology not available now. We therefore feel that the technology choice should proceed; delays should only make a difference to the schedule for the final procurement.

    The Serial Media Working Group is an excellent collaborative effort and should follow the situation with the next generation tape drives appropriately.

  5. Data Access
    5.1 Introduction

      The data access needs of the two experiments, D0 and CDF, though they originate with similar detectors and aim at similar physics, have evolved along different paths. This reflects differences in the time-scales for making key decisions within the collaborations, and perhaps also different philosophies about how a large, heterogeneous group of scientists should analyze data.

      In hindsight, some of the problems with the two disjoint data access methods as they have evolved could have been lessened had there been a joint project established very early on involving CD, D0 and CDF which focussed on overall system engineering, including system architecture and following continuing technology developments.

      While a return to complete commonality does not appear to be feasible at this late stage, the underlying similarity in the two collaborations' ultimate goals may lead to some future convergence of a beneficial nature.

      Both experiments are actively developing prototypes (see below). We recommend using these prototypes as they become available, to ensure immediate feedback to the developers.

    5.2 Data Access Prototypes
      5.2.1 SAM/ENSTORE

We are pleased with the progress made on the data access prototyping. We applaud the SAM prototype which was developed on schedule (as we requested in our last review). The SAM prototype demonstrated the basic functionality including the following components:

The system was demonstrated to work with the ENSTORE prototype provided by the HPPC ENSTORE development group. The committee notes that the prototype did not demonstrate full scalability, since extensive stress tests, including failures of parts of the system, had not yet been completed.

The committee explicitly recommends integrating the SAM/ENSTORE components into the forthcoming D0 Monte Carlo Challenge milestone in Feb-Mar '99, to test the interface and integration of data access and data processing.

The committee recommends that the D0/CD joint data access working group spend time thinking about realistic user access patterns for analysis (while data taking and reconstruction are simultaneously ongoing) and refine their existing prototype (within the bounds of the robot, tape drive, disk and CPU hardware available this summer) to demonstrate that their system will scale for Run II. An iteration before the next review should be feasible.

D0/CD should develop realistic user access patterns for analysis to demonstrate the scalability of the SAM/ENSTORE system.

We agree with the planned effort to develop a live backup system for SAM/ENSTORE. Such a system will be required.

The committee noted that D0 currently has no concrete plans for D0-specific home directory and program file servers; at least, none were presented.

We recommend careful planning well ahead of the start of the run for D0-specific home directory and program file servers.

      5.2.2 DIM/FTT/OCS

We applaud the prototype work done to demonstrate the proof of principle of the Disk Inventory Manager (DIM) of CDF. While the tape interface (FTT/OCS) was not incorporated in this prototype, this system is essentially that used for Run I, and has previously been shown to be robust at Run I data taking rates.

The specific tests performed by the prototype, attempting to simulate true data access patterns (several long large jobs competing for resources against several small short jobs) were precisely the sort of experiments which needed to be carried out.
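The contention these tests probe can be illustrated with a toy model. The following is a minimal Python sketch, assuming an idealized disk whose bandwidth is shared equally among all active jobs (processor sharing). The Job class, the simulate_equal_share function, the job sizes and the 40 MB/s bandwidth are all hypothetical and are not taken from the DIM design.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A hypothetical analysis job reading a fixed amount of data."""
    name: str
    arrival: float   # arrival time in seconds
    size_mb: float   # data the job must read, in MB


def simulate_equal_share(jobs, bandwidth_mb_s):
    """Return {job name: completion time in s} for an idealized disk whose
    bandwidth is split equally among all active jobs (processor sharing)."""
    pending = sorted(jobs, key=lambda j: j.arrival)
    remaining = {}  # active job name -> MB still to read
    done = {}
    t = 0.0
    while pending or remaining:
        if not remaining:
            # Disk is idle: jump ahead to the next arrival.
            t = max(t, pending[0].arrival)
        while pending and pending[0].arrival <= t:
            job = pending.pop(0)
            remaining[job.name] = job.size_mb
        rate = bandwidth_mb_s / len(remaining)  # equal share per active job
        # Advance to whichever comes first: a job finishing or a new arrival.
        dt = min(remaining.values()) / rate
        if pending:
            dt = min(dt, pending[0].arrival - t)
        for name in remaining:
            remaining[name] -= rate * dt
        t += dt
        for name in [n for n, mb in remaining.items() if mb <= 1e-9]:
            done[name] = t
            del remaining[name]
    return done


if __name__ == "__main__":
    # Two long, large jobs start at t = 0; three short, small jobs arrive at t = 10 s.
    jobs = [Job("long1", 0.0, 2000.0), Job("long2", 0.0, 2000.0)]
    jobs += [Job(f"short{i}", 10.0, 40.0) for i in range(1, 4)]
    for name, t_done in sorted(simulate_equal_share(jobs, 40.0).items()):
        print(f"{name}: finished at {t_done:.1f} s")
```

In this example the three short jobs finish at 15 s and the two long jobs at 103 s. Under equal sharing the short jobs are only slightly delayed by the long ones, and the long jobs absorb the cost. Real disks, with seeks and tape staging, will behave less ideally than this model.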

The committee recommends that tests be carried out on a larger scale once the DIM design has been refined and the prototype advanced. A future prototype should include multiple tape access and tape writing (as well as reading) to the extent that drives are available by summer of 1999.

The committee believes that the small prototype did not demonstrate full scalability, and considerable thought should go into this aspect of the system. For example, the FTT/OCS tape handling system will need to be scaled up in data rate for Run II while keeping the reliability achieved in Run I. The interface to batch tools (LSF or built in-house) and the monitoring of tape-full indicators, tape changes, and tape and tape-drive host system errors are important considerations, which we believe may be more of a problem than CDF now anticipates. The collaboration should be realistic about requests for FTT/OCS enhancements to meet Run II needs.

We are pleased that CDF does not intend to preclude the future use of the ENSTORE system as a tape interface.

Concerning DIM itself, the prototype was built for only one machine, which implies a number of limitations. A full DIM design and implementation will require effort at the level of two FTEs, as indicated in the CDF report. We agree with the recommendation of the CDF internal panel that the baseline analysis system, consisting of locally mounted disks and presumably one large SMP box (by the end of 1999), is the appropriate target for the DIM design and will make this job more manageable. Hooks can be left in for future technologies (e.g. Storage Area Networks) and for proper support of multiple SMPs, but these considerations should not be allowed to slow down the baseline DIM design or expand the manpower required to implement it.

We agree with CDF’s plans for the DIM development to match the baseline analysis system by the end of 1999. Extensions to include multiple SMPs should not slow down the baseline development.

    5.3 RIP

      The RIP project has demonstrated data transfer rates in excess of the requirements. We applaud this result. The fact that both D0 and CDF needs have been met with two different transfer technologies (Gigabit Ethernet and Fiber Channel) is a tribute to the skill of the RIP team. The collaborations should supply the RIP team with any information it needs on the glitch buffer file format and work out handshaking protocols on a reasonable time scale so that work may continue.

    5.4 Fiber Channel

      Both collaborations will keep their frequently accessed data on magnetic disk, with the drives directly attached to their central analysis server(s). Assuming an overall capacity of 20-30 TB per collaboration and a drive capacity of ~70 GB by the year 2000, a total of over 350 drives will have to be connected to each of the servers. Given this large number of drives, the traditional approach of using SCSI for both the physical and the protocol layer would require not only a large installation but also a significant maintenance effort for the parallel cabling and connectors. Other factors that must be considered seriously include physical scalability, given cable length limitations, and flexibility with respect to technological innovations expected to mature in the foreseeable future.
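The drive count quoted above follows from dividing the planned disk capacity by the capacity of a single drive. Assuming 1 TB = 1000 GB and ignoring any RAID or spare-drive overhead (neither assumption is stated in the report), the 20-30 TB range gives:

```latex
N_{\text{drives}} = \frac{C_{\text{total}}}{C_{\text{drive}}}, \qquad
\frac{20\,000\ \text{GB}}{70\ \text{GB}} \approx 286, \qquad
\frac{25\,000\ \text{GB}}{70\ \text{GB}} \approx 357, \qquad
\frac{30\,000\ \text{GB}}{70\ \text{GB}} \approx 429
```

The middle of the range therefore corresponds to roughly 350 drives per server, and the upper end to more than 400.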

      To overcome such problems, industry is aggressively migrating from physical SCSI to Fiber Channel. Importantly, the SCSI protocol is preserved on top of Fiber Channel, which makes Fiber Channel an ideal replacement for physical SCSI in a first phase. Since each collaboration will have several hundred users trying to access the data disks concurrently, a flexible architecture is needed to cope with future optimization needs. Physical SCSI does not offer this optimization potential.

      Therefore, the committee urges the collaborations and CD to jointly gain knowledge and practical experience in this area in order to build the Central Analysis Server based on Fiber Channel connected disk drives. This should be limited to the use of Fiber Channel as a physical replacement only. Evolving technologies such as Global File Systems, which we expect to become relevant to our field in the future, are far from mature today. No effort beyond observation should be invested in them before the run starts.

      The committee recommends building the Central Analysis Server based on Fiber Channel connected disk drives instead of SCSI.

    5.5 Networking

Networking is a crucial element in the computing environment of the experiments and as such is to be treated as a piece of fundamentally important infrastructure. It plays an essential role in the Reconstruction Input Pipeline as well as in the central analysis and reconstruction framework and it connects the physicists to their data. We learned from the presentations that a knowledgeable and experienced networking group exists.

The committee strongly recommends close collaboration between CDF, D0 and the networking group to meet the networking requirements and to implement them in a timely fashion.

  6. Procurement

    Preparations for the Physics Analysis Hardware procurement are going well. Both CDF and D0 requested that system administration support start during the prototyping and procurement stages so that management of the new machines could be learned early on. We feel this is quite reasonable.

    Several PC farm purchases have been made since the last review. These have shown that large purchases of commodity machines need special care, both in the system specification and in deciding whom to include in the pool of acceptable bidders. We agree with the strategy of purchasing from at least two vendors, and we believe the presented procurement schedule will match the experiments' needs. However, this will happen only if the requirements for these systems are finalized as quickly as possible.

    It is unclear which benchmarks should be used to evaluate computing power. Fermilab's existing benchmark, TINY, and SpecInt95 do not track each other well from large servers to smaller desktops, and optimization flags can change TINY numbers by factors of two. Clearly, the true benchmark is how efficiently CPUs run real CDF or D0 reconstruction and physics analysis code.

    We recommend that a benchmark program constructed from the reconstruction packages be developed and used to compare the performance of the systems under consideration.
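A benchmark of this kind needs two parts: a timing harness that runs real code over a fixed event sample, and a way to combine the results of several packages into one figure of merit. The following Python sketch is illustrative only. It assumes each package can be wrapped as a per-event callable, and the names events_per_second and relative_score are hypothetical. Packages are combined with a geometric mean of throughput ratios against a reference machine, the same style of composite that SPEC uses for its ratios.

```python
import math
import time
from typing import Callable, Dict, Sequence


def events_per_second(process_event: Callable[[object], object],
                      events: Sequence[object], repeats: int = 3) -> float:
    """Best-of-`repeats` throughput of process_event over a fixed event sample."""
    if not events:
        raise ValueError("need a non-empty event sample")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for event in events:
            process_event(event)
        best = min(best, time.perf_counter() - start)
    return len(events) / max(best, 1e-12)  # guard against a zero timer reading


def relative_score(candidate: Dict[str, float], reference: Dict[str, float]) -> float:
    """Geometric mean of candidate/reference throughput over all packages."""
    ratios = [candidate[pkg] / reference[pkg] for pkg in reference]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would call reconstruction code instead.
    def toy_reconstruction(n_hits: int) -> int:
        return sum(i * i for i in range(n_hits))

    rate = events_per_second(toy_reconstruction, [5000] * 200)
    print(f"toy package: {rate:.0f} events/s")
    # Hypothetical throughputs (events/s) for two packages on two machines.
    reference = {"tracking": 100.0, "jets": 100.0}
    candidate = {"tracking": 200.0, "jets": 50.0}
    print(relative_score(candidate, reference))  # 2x faster on one, 2x slower on the other
```

The geometric mean keeps any single package from dominating the score. With the hypothetical numbers above, a machine twice as fast on tracking and half as fast on jets scores the same as the reference.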

    The bid strategy is crucial for obtaining the cheapest computing possible; however, this is something Fermilab is expert at, and we trust its judgement.

    The computing procurement will be spread over FY99 and FY00 with about one third of the total computing needs ordered and installed during FY99. Although the schedule for the start of Run II may be slipping, the hardware procurement schedule should proceed aggressively so that the experiments can get their final software packages installed and debugged on the chosen platform(s) and test the performance on a sizeable fraction of the final setup.

    We recommend that the hardware procurement schedule proceed aggressively.

  7. Management

    We are pleased with the effort made by CDF, D0 and the CD team in using the WBS to understand the software projects. We encourage all three groups to expand on this effort and use it as a real tool to manage the scope and work flow of these projects. We are well aware that this effort is not easy, and we reiterate our recommendation that the experiments and CD need an additional person to help with this task. Modern detector software efforts are very large and complex, and are becoming more so with time. These tools have worked well on detector hardware projects, and the software development teams need to learn to use them for software projects as well.

    We were confused by the lack of a common view between CD, CDF and D0 of the number of people working on the common projects, and believe that using real names within the resource loading of the CD WBS would help resolve this. Care should also be given to tracking projects, not just entering them into the WBS. This would yield real information on percent complete and make it possible to identify real overloading of the common projects. Making each project leader responsible for her/his own WBS is a good idea. This tool should also be used early in planning for the operations phase.
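To illustrate how resource loading with real names yields both a percent-complete figure and an overload check, here is a minimal Python sketch. It assumes effort is measured in person-weeks, each task carries an FTE loading, and all listed tasks run concurrently. The Task fields, the function names, the placeholder assignee names and the 1.0 FTE limit are hypothetical and are not features of the WBS tools actually in use.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    name: str
    assignee: str        # a real person's name in practice; placeholders here
    fte: float           # fraction of that person's time loaded on the task
    effort_weeks: float  # estimated total effort in person-weeks
    done_weeks: float    # effort actually completed, from regular tracking


def percent_complete(tasks: List[Task]) -> float:
    """Effort-weighted percent complete over all tasks."""
    total = sum(t.effort_weeks for t in tasks)
    done = sum(min(t.done_weeks, t.effort_weeks) for t in tasks)
    return 100.0 * done / total if total else 0.0


def overloaded(tasks: List[Task], limit: float = 1.0) -> Dict[str, float]:
    """People whose summed FTE loading exceeds `limit` (tasks assumed concurrent)."""
    load: Dict[str, float] = defaultdict(float)
    for t in tasks:
        load[t.assignee] += t.fte
    return {person: fte for person, fte in load.items() if fte > limit}


if __name__ == "__main__":
    tasks = [
        Task("task-1", "person_A", 0.6, 10.0, 5.0),
        Task("task-2", "person_A", 0.7, 20.0, 0.0),
        Task("task-3", "person_B", 0.5, 10.0, 10.0),
    ]
    print(f"{percent_complete(tasks):.1f}% complete")  # 15 of 40 person-weeks
    print(overloaded(tasks))  # person_A is loaded at about 1.3 FTE
```

The check for overloading only works if the same person appears under the same name in every project's loading. This is one practical argument for using real names rather than anonymous FTE counts.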

    We encourage CDF, D0 and CD to expand the effort on project management by including real names and to update the WBS on a regular basis. We reiterate our recommendation to get professional help in this area.

    All three groups presented their management structures at the review and all three seem well thought out and well matched to the task.

    We heard for the first time at this review the concept of postdocs within CD. We thought this was a very good idea and would like to encourage it. We note, however, that like other postdocs they must have the opportunity to do science and have a well defined place within the CD management structure.

    Requests were made for additional CD people from both experiments for high priority projects. If CD cannot find the funding for these positions the experiments should be allowed to divert some of the funds previously allocated to computer equipment.

    We suggest considering a reallocation of material funds to finance manpower for high-priority projects, if other sources cannot be found.

  8. Summary of Recommendations
    8.1 Comments on previous recommendations

Before coming to the new recommendations, the committee would like to comment on the recommendations made during the June ’98 review.

  1. We recommend that priorities, scope and milestones, accurate FTE estimates and resource loaded schedules be made for each of the 15 common projects. Consideration should be given to consolidation where possible.
     Status: Done.

  2. Assign or hire a full-time assistant to support project management of the common projects. The experiments need an additional person for their project management planning.
     Status: Done partially for CD, but not yet for the experiments.

  3. The committee congratulates CD on the successful hire of the OO-expert. It recommends hiring another such person as well as a few professional persons to implement designs.
     Status: Just completed; an offer has been sent to the candidate.

  4. We fully agree with the decision that HPSS will not be the baseline for Run II. However, the normal efforts to make it a possible future choice should continue. HPPC should proceed with the ENSTORE prototype development and make it available to the collaborations as early as possible to get feedback for further development.
     Status: Done.

  5. The committee is very pleased with the thorough work of the Serial Media Working Group and endorses the conclusions.
     Status: Work goes on as agreed.

  6. The committee praises the work and good progress made by D0 and CD in collaborating on SAM and ENSTORE. It recommends to the SAM working group to implement a prototype of the data access model on top of ENSTORE, such that it is in active use by the end of October 1998.
     Status: Milestone was met.

  7. Proceed with the prototyping of the bookkeeping database.
     Status: Work in progress.

  8. The CDF collaboration (internally) should converge rapidly on a data access system design. It should develop, by the end of October 1998, a working prototype layered on the ENSTORE mass storage prototype which demonstrates a major vertical slice of their data access system.
     Status: Prototype work was done, but not based on ENSTORE.

  9. We support the D0 request for help with NT. Nevertheless, CD support should be allocated with caution, for instance to level 3 trigger needs, since NT is not part of the general Run II baseline.
     Status: Done.

  10. CD should finalize the ZOOM development shortly and enter the maintenance phase. Pay attention to stay compatible with other HEP efforts.
     Status: Followed only partially, since additional development has been added.

  11. We are pleased with the strategy and the effort going into the analysis tools and encourage the three parties involved to stick to the schedule and to release selected products as early as possible to the community. We strongly encourage the collaboration with other efforts in the High Energy Physics community.
     Status: Done. The common decision to go with ROOT is correct.

  12. The committee agrees with the procurement plans and supports the proposal to delay part of the procurement until May 2000, whereby significant savings are expected.
     Status: Done.

    8.2 New recommendations

The committee proposes to the experiments and the computing division to complete the following actions and to have, where appropriate, the corresponding documents ready for the fifth review in June 1999. The recommendations are in the same order as they appear in the report. The committee leaves it to the parties involved to decide the relative priorities.

  1. It is very good that both D0 and CDF are using ROOT. CD should aggressively move to support ROOT. This should be done as a specific effort in concert with the ROOT developers, not as part of any of the existing common projects.
  2. ZOOM should not add any new projects. It should be in a maintenance phase.
  3. CD should support visualization products such as HEPVIS, OpenInventor and ROOT, and the experiments should use them.
  4. The C++ working group should continue to resolve the practical aspects of using these new technologies, with an emphasis on immediate needs.
  5. Adding support to the Common Database Support Group is important and necessary. CDF and D0 should take the lead in defining the common infrastructure for the group to provide. The CD proposal to get a database consultant to teach techniques is a fine idea and the experiments should take advantage of it.
  6. No CD effort should go into an NT based analysis station.
  7. The Serial Media Working Group is an excellent collaborative effort and should follow the situation with the next generation tape drives appropriately.
  8. The committee explicitly recommends integrating the SAM/ENSTORE components into the forthcoming D0 Monte Carlo Challenge milestone in Feb-Mar '99, to test the interface and integration of data access and data processing.
  9. D0/CD should develop realistic user access patterns for analysis to demonstrate the scalability of the SAM/ENSTORE system.
  10. We recommend careful planning well ahead of the start of the run for D0-specific home directory and program file servers.
  11. The committee recommends that tests be carried out on a larger scale once the DIM design has been refined and the prototype advanced. A future prototype should include multiple tape access and tape writing (as well as reading) to the extent that drives are available by summer of 1999.
  12. We agree with CDF’s plans for the DIM development to match the baseline analysis system by the end of 1999. Extensions to include multiple SMPs should not slow down the baseline development.
  13. The committee recommends building the Central Analysis Server based on Fiber Channel connected disk drives instead of SCSI.
  14. The committee strongly recommends close collaboration between CDF, D0 and the networking group to meet the networking requirements and to implement them in a timely fashion.
  15. We recommend that a benchmark program constructed from the reconstruction packages be developed and used to compare the performance of the systems under consideration.
  16. We recommend that the hardware procurement schedule proceed aggressively.
  17. We encourage CDF, D0 and CD to expand the effort on project management by including real names and to update the WBS on a regular basis. We reiterate our recommendation to get professional help in this area.
  18. We suggest considering a reallocation of material funds to finance manpower for high-priority projects, if other sources cannot be found.
  1. Conclusions
  2. The committee was very pleased to see how much progress has been made since June ’98. The project management got the necessary attention which should be pursued to keep the projects under control. This would be eased by adding technical help.

    The hiring of a second OO expert is nearly complete and will, we hope, prove as successful as the first hire.

    The advances in developing the computing infrastructure are remarkable, and both experiments profit substantially from them. Some corrective action is needed, as outlined in our recommendations.

    Both experiments have established their data access model. D0 and CD have jointly developed the SAM/ENSTORE system, which still needs to be subjected to tests proving its scalability. CDF has based its system on its Run I experience. CDF urgently needs help to accelerate the development of DIM and to carry out similar tests.

    The RIP project has exceeded the requirements of the experiments.

    To handle the large number of disk units needed to build up 20-30 Tbytes of storage, we recommend using Fiber Channel technology.

    The committee found that the overall progress made by both experiments is impressive and that they are at an appropriate level of readiness in their ramp up to Run II. The number of physicists working on simulation and reconstruction has gone up significantly and CD provides support in almost all areas. Both experiments require some additional help. Possible solutions have been discussed, including the reallocation of material funds.

    We found no evidence of any fundamental problem that would prevent the collaborations from analyzing their data at the start of the run, nor did we discover any "single point of failure" or major risk. A huge amount of work remains, and the developers will have to concentrate on the most fundamental parts first, so that a baseline solution is available early enough to be exercised by the users well before the end of this year.

    For the next review, we agreed with the experiments and CD to reduce the experiment-specific parts to half a day each and to allocate more time again to the common projects.

    Finally, the committee would like to thank all participants warmly for their contributions and express its pleasure in working with them.

     

  10. Appendix A: Charge to the Committee
    10.1 Original Charge as of June 1997
      The Committee is asked to review the status of the joint effort by CDF, D0, and the Computing Division to upgrade CDF/D0 computing for Run II. We are interested in an evaluation of both cost and schedule considerations.

      We do not expect there to be sufficient information at the first review (June) to evaluate all aspects. The focus in June should be on the "Data Management Needs Assessment" and the projected budget for equipment and media expenditures.

      However, some part (about a third) of the review in June should be devoted to a preliminary review of the schedule, plans for the availability of personnel resources, and plans for project management. This should include attention to software to be developed by CDF/D0 and Computing Division personnel, covering both the software and tools intended for joint use and those aimed at the individual needs of the experiments. We expect these matters, and more specifics on the technical choices being made for data handling, to be the principal focus of a subsequent review in late summer or early fall. We also anticipate that after a year we would want to have a third review to evaluate progress.

      The following questions are suggestive of what we are interested in learning from this series of reviews:

      Costs and data management technology:

      Given restricted budgets, are the proposed expenditures estimated on a sound basis of actual requirements for the approved running program for these detectors? Are the requirements well enough understood? Is the present level of definition and specification appropriate, allowing some agility to respond to technological developments in the marketplace before the equipment/software must actually be purchased? Are the technical choices sensible? Have all options been investigated? Are the projections of future options and prices reasonable? Where there are differences in the choices made for the two detectors, is this appropriate? (i.e., do any independent choices incur significant unjustified costs, including personnel effort?)

      Non-commercial software, schedule, and personnel resources:

      Is the schedule, both for purchases and for CDF/D0/Computing Division integration and software development effort, appropriately matched to a reasonable understanding of the Run II start-up and running schedule? Are plans and budgets for consultant or contract help reasonable? Is the transition to new operating systems and paradigms (C++) adequately understood and planned? In particular, how is the traditional approach, where detector builders move on to software as actual data taking nears, going to be handled in the new environment (i.e., learning curve, training, class libraries, hand holding, etc.)? Is there appropriate sharing of software and tools between the two detectors? Is the Computing Division role well understood? Is the Computing Division prepared for it?

      Management:

      Is the plan for management of the joint and separate efforts (project) reasonable? Is it formal enough? Too formal? Again, are the places where the two detectors have plans, projected effort, and management in common, the right ones (too much in common? too little?)?

      The first review is scheduled for June 12 and 13. June 12 will be devoted to presentations by members of the effort. The morning of June 13 will be devoted to:

      a) responses to any overnight questions asked by the committee;

      b) executive session of the committee;

      c) a ‘close-out’ meeting with the Directorate, the Computing Division management, and the collaboration management (including at least one spokesperson and project manager from each collaboration and the lead presenters).
      (The exact schedule for the close-out meeting on Friday, June 13 will need to be coordinated with the departure schedules of the Directorate and other people leaving for the Aspen PAC.)

      The chairman of the committee and probably one other member will be invited to discuss the committee's preliminary findings at the Aspen PAC meeting on Monday morning, June 15.

      We ask the committee to provide a brief written interim report by the end of June.

    10.2 Plan for Run II Off-line Computing Review January 25-28, 1999

This plan has been prepared at the request of John Peoples in his letter to spokespersons of August 12 for discussion at a meeting on September 22. It is the result of several meetings and an e-mail exchange and has been accepted by both collaborations' Run II computing managers, the Computing Division Head, and the members of the von Rüden Committee.

As has been the practice, this Run II Off-line Computing Review will be four days long starting on a Monday.

Monday and Tuesday

One day each for CDF and D0 will focus on a single collaboration. This day will include three components:

The detailed agenda for these days will be prepared by the head of the collaboration's computing effort together with the Associate Director responsible for computing (T. Nash). These days will be closed sessions, with the following invited: any member of the collaboration being reviewed, at the invitation of the head of that experiment; the Run II Computing Steering Committee (this includes the top management of the Computing Division and the leadership of both collaborations' off-line computing); and the Directorate.

The traditional private meetings that the Committee held with the collaborations and Computing Division on Monday will be replaced by the closed meetings on Monday and Tuesday and a half hour to one hour private session at the beginning of each of Monday and Tuesday, at the discretion of the Committee Chairman.

Wednesday

The third day will focus on common computing projects selected by the Head of the Computing Division (M. Kasemann) with the Associate Director for computing, after consultation with the experiments in the Run II Computing Steering Committee. Projects will be selected when there are significant issues or progress to review. This day will be an open session.

Thursday

The fourth day will include executive session, closeouts, and some time for the committee to draft its report. There will be three closeouts, one for each of the collaborations, and one for the common project. The invitees will be the same as for the corresponding review sessions.

The Report

The committee will be asked to break its report into three pieces, much as the PAC does. One piece will be publicly available. This part will cover the efforts of the Common Project, and of the Computing Division outside the Common Project, including the degree to which these efforts are addressing the individual collaboration needs. It will also provide a general overview of the combined state of readiness of the individual collaboration efforts. It will not get into specifics of either collaboration's effort, nor will it compare readiness of the collaborations.

The other two pieces will focus on the review of the individual collaborations' internal off-line computing activities. These parts will be transmitted by the Director to the individual collaboration spokespersons and will not be made publicly available by the Laboratory. It is understood that the Committee will have only approximately half a day of presentation time to hear these aspects and that this will significantly limit the depth of their review.

Advance Material

The collaborations, the Common Project, and the Computing Division are expected to provide all review material to the Committee by January 1, 1999. This will partially mitigate the limited presentation time for this extensive subject.

Appendix B: Schedule of the Review Meeting

Monday, January 25

Review of D0 Offline Computing

Participants: E. Buckley-Geer, M. Diesburg, H. Greenlee, N. Hadley, M. Kasemann, B. Klima, Q. Li, L. Lueking, W. Merritt, H. Montgomery, T. Nash, R. Pordes, H. Schellman, D. Skow, F. Stichelbaut, I. Terekhov, V. White, S. Wolbers, J. Womersley

9:00 am Private Meeting with D0

9:30 am Introduction, W. Merritt, N. Hadley
– Brief Review of Project Organization
– Brief Review of Project Outline – Decisions and Strategies
– Big Picture Timeline

10:00 am Infrastructure
– General Program Infrastructure (EDM, d0om/DSPACK, framework, ZOOM)
– Configuration Management and Releases
– Other Infrastructure (graphics, tools, etc)

10:45 am Coffee

11:00 am Data Access
– SAM and Enstore
– Databases

12:00 pm Monte Carlo
– General
– MCC99: Goals, Status and Plans

12:45 pm Lunch

2:00 pm Algorithms
– Basic Reconstruction Packages
– Particle Identification
– Level 3 Algorithms (??)

3:00 pm Computing Hardware Plans
– Purchases
– Global Computing Model Report

3:30 pm Summary, W. Merritt, N. Hadley
– Milestones
– Resources: current and requested

 

Tuesday, January 26

Review of CDF offline computing

Participants: D. Amidei, F. Bedeschi, E. Buckley-Geer, A. Goshaw, L. Groer, N. Hadley, R. Harris, S. Lammel, M. Lancaster, M. Kasemann, J. Kowalkowski, W. Merritt, T. Nash, R. Pordes, E. Sexton-Kennedy, M. Shapiro, D. Skow, R. Snider, T. Watts, V. White, E. Wicklund, S. Wolbers, A. Yagil

9:00 am Private Meeting with CDF

9:45 am Overview of CDF Software and Computing, M. Shapiro

10:45 am Coffee

11:00 am Overview of CDF Data Access, E. Buckley-Geer

12:00 pm Offline WBS and Schedule, D. Amidei

1:00 pm Lunch

2:00 pm Status of Reconstruction Software, R. Snider (TBC)

2:45 pm Status of Infrastructure Software, E. Sexton-Kennedy (TBC)

4:00 pm Coffee

4:15 pm Progress in Support Databases, with emphasis on Calibration, M. Lancaster

5:00 pm Manpower and Resources, D. Amidei

 

7:00 pm Closed Committee Session

 

Wednesday, January 27

Review of selected common computing projects

Participants: B. Angelos, D. Amidei, J. Bakken, F. Bedeschi, D. Box, E. Buckley-Geer, J. Butler, J. Cranshaw, M. Diesburg, D. Dyxin, S. Fuess, I. Gaines, A. Goshaw, L. Groer, N. Hadley, R. Harris, D. Holmgren, S. Lammel, L. Lueking, M. Lancaster, M. Kasemann, J. Kowalkowski, W. Merritt, T. Nash, D. Petravick, R. Pordes, M. Schweitzer, E. Sexton-Kennedy, M. Shapiro, D. Skow, R. Snider, I. Terekhov, M. Vranicar, T. Watts, S. White, V. White, E. Wicklund, S. Wolbers, A. Yagil

9:00 am Introduction, Charge to the committee

9:15 am Overview of Joint Projects

10:15 am Coffee

10:45 am Data Access: CDF

11:45 am Data Access: D0

12:30 pm Lunch

1:30 pm Storage management: ENSTORE

1:45 pm Support Databases

2:05 pm Serial Media Working Group

2:35 pm Production Management

3:05 pm Coffee

3:35 pm Reconstruction Hardware

4:05 pm Physics Analysis Hardware

15’ Preparation of procurement
15’ CDF Plans
15’ D0 Plans

4:50 pm Reconstruction Input Pipeline

5:20 pm Concluding Remarks

6:30 pm Closed Committee Session

8:30 pm Working Dinner Committee

 

Thursday, January 28


9:00 am Closed Committee Session

11:00 am Closeout to D0

11:30 am Closeout to CDF

12:00 pm Closed Committee Session

1:00 pm Lunch

2:00 pm Closeout to all

2:30 pm Committee Report Drafting Session.

6:00 pm Complete Draft of Report.

 

7:00 pm Social Event