D0 Milestones
D. Petravick
July 26, 1999

Milestones
==========

Collaboration Milestones:
========================

While a plan to install these systems and make them usable might have its own interior milestones, the systems described here have to be ready for certain external events, which are part of a collaboration-wide effort to be ready for Run II. These milestones are summarized below:

Monte Carlo Challenge - Phase 2

  1) Sept. 1, 1999 -- specify tape formats and SAM metadata formats in sufficient detail to allow remote sites to set up for Monte Carlo production.
  2) Oct. 1, 1999 -- be ready to import tapes into SAM/ENSTORE from remote sites, and be able to store the data and metadata reliably, with at least 8 x 5 access for users.
  3) Nov. 30, 1999 -- be ready to begin reconstruction of the MCC Phase 2 data, using the D0 production farm. (TBD: do we want to use an I/O node in this test?)

Online related milestones

  1) Aug. 1, 1999 -- demonstrate the capability to log data from DAB to FCC, into the robot. No need to keep the data.
  2) Aug. 31, 1999 -- demonstrate logging from DAB to FCC at the design rate. No need to keep the data.
  3) Oct. 1, 1999 (continuing up to the date of the Run) -- maintain the ability to log data from DAB to FCC on demand. No need to keep the data.

CD Milestones
=============

Join the strengthened KERBEROS realm

Preconditions:

  1) The successful completion of the Computing Division's prototyping activities.
  2) CD supplying authentication servers.

The project work for D0 can be summarized as:

  1) Software which uses telnet, ftp or the "r" utilities must be converted to use Kerberos-equivalent methods, and any side effects must be dealt with. Kerberos supplies "r" utilities, but one still has to deal with the side effects of finite ticket lifetimes.
  2) Kerberos utilities must be installed on D0 systems and made usable by system administrators.
  3) The Kerberos system must be administered by adding D0 users, etc.

Equipment and Interconnects
D. Petravick
June 27, 1999

D0ORA
=====

Purpose :-

D0ORA is the production database machine. Its purpose is to hold and serve the D0 data file catalog, calibration and alignment constants, and other ORACLE tables. The machine must be sized so that it remains performant at all times. Since the system holds precious data, certain disk systems need to be robust, and effective backups must be taken.

Live backup software moves data through the database engine: it works with the logging system to give users an up-to-date view of the data, and the backup system a view that is frozen in time. Live backups may therefore have lower throughput, for a given amount of system resources, than the usual file-level backups. The requirements of database backup should therefore be considered when specifying the system, so that performing backups will not significantly degrade the database throughput seen by applications.

Hardware :-

Two Suns were purchased: D0ORA1 and D0ORA2. D0ORA1 is the more expandable machine; however, D0ORA2 is currently the production machine. It has 72 GB of disk for database tables. D0ORA1 is the development machine, the integration machine, and the machine to be used for large-scale tests not appropriate for D0ORA2.

By February, D0ORA1 will be upgraded and moved into production, and D0ORA2 will become the development machine. The CPU will be upgraded; the maximum capacity is 12 CPUs with 6 GB of memory. The maximum envisioned I/O upgrade is two I/O cards with 4 F/C ports. This would fill all slots on D0ORA1. Whether all of this upgrade needs to be purchased and installed within the duration of this plan is TBD, subject to studies or the exercise of engineering judgement.

Disk :-

A summary of the kinds of disk areas is presented here, and developed in the discussion below.

  Type of activity        Amount  B/UP?  Structuring
  ======================  ======  =====  ===============
  Database tables         TBD     Y      TBD
  Database logs/journals  TBD     Y      small, critical
  Swap                    TBD     N      N/A
  Products                TBD     Y      Not critical
  System                  TBD     Y      Not critical
  Oracle scratch areas    TBD     Y      TBD
  Other                   TBD     N      TBD

  Table -- Allocation of the disk system

D0ORA1 will receive a new disk purchase worth about $60K. This will be used mostly for database tables. Raw JBOD disks have been determined to be insufficiently reliable. There is a design task to select a Fibre Channel hardware RAID approach.

[Separate disks must be provided for database log files.] The reliability requirements for log files are such that they [may / may not] reside on a simple JBOD-type disk.

Once D0ORA1 is in place as the production database machine, it must be performant, highly available, and highly repairable 24x7. Work will be done to bring D0ORA1 to production quality and capacity.

Backups :-

System Backups :-

D0ORA1 must be provided with a device which can re-install Solaris using media from Sun, and a device which can back up the system disk and restore the Solaris (system) partition from the boot prompt.

D0 will begin to determine the DBMS backup requirements of the system. Rows in ORACLE database tables can be frozen incrementally and backed up incrementally. D0 will use this mechanism to control the amount of table space which needs to be backed up.

Other Oracle backups :-

The ORACLE backup requirements outside of RMAN are TBD.

Backup Hardware :-

D0ORA1 will be provided with a tape library and backup software which can back up user software, miscellaneous data files and ORACLE database tables in an automated way. [The library selected is an EXABYTE library, which holds 200 volumes and up to 10 drives. Two drives (upgradeable to MAMMOTH-2) will be procured initially. Each drive will write 1 tape/day. Therefore, initially, D0ORA2 will have, at most, 40 GB of backup @ 6 MB/sec if the data do not compress.]
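To compare the library described above with the backup-window criteria that follow, a first estimate of elapsed time is simply the volume divided by the aggregate drive rate. A minimal Python sketch, assuming decimal units, no compression, perfectly parallel drives, and no time lost to mounts or positioning; the function names are hypothetical and not part of any D0 or backup-vendor tool. The plan does not say whether 6 MB/sec is per drive or in total, so both readings are shown.

```python
def backup_hours(volume_gb: float, drives: int, mb_per_sec_per_drive: float) -> float:
    """Rough elapsed hours to write volume_gb to tape across `drives` drives.

    Assumptions (not from the plan): decimal units (1 GB = 1000 MB), no
    compression, perfectly parallel drives, and no time lost to mounts,
    positioning, or retries.
    """
    if drives <= 0 or mb_per_sec_per_drive <= 0:
        raise ValueError("need at least one drive with a positive rate")
    aggregate_mb_per_sec = drives * mb_per_sec_per_drive
    return volume_gb * 1000.0 / aggregate_mb_per_sec / 3600.0


def fits_window(volume_gb: float, drives: int, mb_per_sec_per_drive: float,
                window_hours: float) -> bool:
    """True if the estimated backup time fits inside the allowed window."""
    return backup_hours(volume_gb, drives, mb_per_sec_per_drive) <= window_hours


if __name__ == "__main__":
    # 6 MB/sec in total (equivalent to one drive at 6 MB/sec).
    print(f"6 MB/sec total:         {backup_hours(40, 1, 6.0):.2f} h")  # 1.85 h
    # 6 MB/sec per drive, two drives.
    print(f"6 MB/sec x 2 drives:    {backup_hours(40, 2, 6.0):.2f} h")  # 0.93 h
    # Check against the tentative 8-hour incremental window.
    print(fits_window(40, 2, 6.0, window_hours=8.0))  # True
```

Real backups through the database engine will be slower than this streaming bound, so the estimate is best read as a lower limit on elapsed time.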
The backup hardware needs to be sized to complete normal incremental backups [in 8 hours?], to complete non-incremental backups [in 8 hours?], and to perform the worst-case restore [in 16 hours?].

The physical design for this system is not complete. Racking is yet to be requisitioned. Permanent space in FCC is yet to be identified.

D0ORA1 will be pruned of the ad-hoc disks and peripherals which were attached to it when it was a development machine. These "prunings" will be used to augment D0ORA2 when it becomes the development/test machine. There are no purchased upgrades to D0ORA2 on the February 2000 time scale. The Ethernet interface will remain 100 mbps; this bandwidth is adequate for serving the D0 catalog and the calibration and alignment constants.

Monitoring :-

The machine must be highly available, and hence well managed at three levels: system administration, ORACLE database availability, and the D0-supplied servers running on the machine.

Problems with this machine, the ORACLE DBMS, and D0-specific applications may manifest themselves as problems with D0 offline and online processing. To resolve problems, D0 will provide a production liaison during times when production is running. This person will have a general knowledge of the overall D0 system and will be able to investigate problems with the data system, isolate problems associated with D0ORA1, and summon specialized help for hardware, operating system, ORACLE DBMS, or D0 application software problems when production is running.

The production database system will support a web server, whose function is to serve SAM-oriented status displays.

In addition to investigation by the D0 production liaison, the operating system will be monitored by means of the PATROL package. If PATROL notices certain errors, it will notify the system administrator, 24x7, in a way that is TBD. [It is TBD whether PATROL displays are on the web.]

The operators will monitor the system by periodic PING, via XFALIVE.
On failure, they will notify the system administrator.

D0 Enstore Cluster
==================

The goal of this cluster is to provide a stable flow of data for the early users of SAM, and to establish the operational facilities and habits needed to support Run II data movement, for example a "burn in" library with example tapes for testing new and repaired drives.

The SMWG has not selected a tape drive. There is to be an interim procurement of 10 MAMMOTH-1 tape drives to meet test goals until the SMWG selects a drive and it is shown that the procured equipment is reliable enough to replace the MAMMOTH-1 tape plant. The procurement will include 400 labeled tapes for testing.

D0 may need to use DLT drives for data interchange. The need will be established in the D0 fall data handling meetings. The AIT-1 drives are not likely to be used for Run II and can be removed from the D0 production system at any time.

The expected final choice of the SMWG is the MAMMOTH-II tape drive. While it is best to "stream" tape, it is acceptable practice to use 100 mbps Ethernet. The Enstore nodes will provide a large (>100 MB per MAMMOTH-2) RAM buffer to mitigate tape drive stopping and starting.

Computers:

There are two phases to the computing procurement: an interim purchase and a final purchase.

Interim Purchase:

These computers have been requisitioned. They are Linux PCs, and installation is beginning. Each Linux box will drive two 100 mbps Ethernet adapters. The configuration of the Linux boxes is as follows:

  Node name  Function            Servers                     HW notes
  =========  ==================  ==========================  ========================
  d0ensrv1   Managed persistent  PNFS, volume clerk,         2 SCSI disks
             data                file clerk
  d0ensrv2   Servers not in      config, alarm, log,
             the datapath        web servers
  d0ensrv3   Servers in the      Library managers, media
             datapath            changer for ADIC arm 2
  d0enmvr1   Data movement       Mover                       3 Mammoths on 3 busses,
                                                             100 mbps Ethernet
  d0enmvr2   Data movement       Mover                       2 DLTs on 2 busses,
                                                             100 mbps Ethernet
  d0enmvr3   Data movement       Mover                       2 AITs on 2 busses,
                                                             100 mbps Ethernet
  d0encons   Console server      KITS console server         cyclades box

The physical design for this computer installation is complete, and will use the black "stealth" rack placed on the "farm side" of the D0 robot. Short cable lengths (conservative use of differential SCSI) constrain the mover computers to be near their respective tape drives.

Final Purchase:

The final purchase will have 4 server PCs and 14 mover PCs, plus one console PC. This system will achieve the designed-to rates, and can be expanded by adding more movers. The system is sized on the following assumptions:

  1. Required DC rate is 170 MB/s:
       150 MB/s nominal rate
        20 MB/s extra rate so that online can catch up
  2. Nominal tape writing speed = 8.2 MB/s:
       Mammoth-2 tape drives at 11*1000^2 B/s = 10.5 MB/s
       Mount/dismount lowers the duty factor by 20%
  3. Spare drives needed ~ 25%
  4. 2 operator mount drives
  5. 2 drives/PC, each on a separate SCSI bus

  Number of drives needed = (170/8.2)*1.25 + 2 ~ 28 drives,

and therefore 14 mover PCs, with 28 100 mbps Ethernets for the movers.

The physical design for this computer installation is not complete, and requires space on both sides of the D0 robot. Short cable lengths (conservative use of differential SCSI) constrain the mover computers to be near their respective tape drives.

It is a requirement that all D0 Run II software be able to tolerate the failure of an individual tape drive, possibly tying up an individual volume. A typical (small) number of tape drive failures can be dealt with on the next business day. However, the system must be monitored for the failure of a "large" fraction of tape drives when D0 is running production.

Similarly, the system can tolerate the failure of an individual mover node.
However, the system must be monitored for the failure of a "large" fraction of mover nodes when D0 is running production.

Problems with this system may be detected at several levels. To resolve problems in the overall data system, D0 will provide a production liaison when production is running. This person will have a general knowledge of the overall D0 system, and will be able to investigate the problems seen, identify problems associated with Enstore, and summon specialized help for hardware, operating system, or Enstore software problems when production is running.

The operating system will be monitored by means of the PATROL package. If PATROL notices certain errors, it will notify the system administrator, 24x7, in a way that is TBD. Failure of a single mover node does not require such notification.

The Enstore daemons will be monitored by means of an application-specific ALARM SERVER, which produces web pages giving the system status and also produces output for PATROL. The monitored data produced by PATROL will be examined for alarm conditions by METHODS UNKNOWN TO ME, by which a proper type of intervention will be instantiated when D0 is running production.

The operators will monitor the non-mover computers by periodic PING, via XFALIVE. On failure, they will notify the system administrator and the D0 production liaison.

Tape Drives:

The SMWG selection process has not yet concluded, but the most likely Run II tape drive is the MAMMOTH-II, which is not scheduled to be available as a production unit until about [November 1, 1999]. Since the SMWG tape drive choice is not yet in hand, the directive is to use MAMMOTH-1 drives in the interim. The D0 side of the ADIC has only three MAMMOTH-1 tape drives. Ten more drives will be ordered to support testing.

It has been the system design goal that client systems "stream" tape via IP.
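The drive count in the Final Purchase sizing can be reproduced directly, and the same kind of arithmetic shows why the mover RAM buffer matters when a tape drive is slightly faster than its 100 mbps link. A minimal Python sketch: the function names are hypothetical; the inputs are figures quoted in this document (170 MB/s required, 8.2 MB/s nominal per drive, 25% spares, 2 operator-mount drives, 2 drives per PC; 12 MB/sec for a MAMMOTH-2 against 11 MB/sec for 100 mbps Ethernet, with a 100 MB buffer); and the buffer model is a simplification that ignores tape repositioning.

```python
import math


def drives_needed(required_mb_s: float, nominal_drive_mb_s: float,
                  spare_fraction: float, operator_drives: int) -> int:
    """Tape drives needed: ceil(required/nominal * (1 + spares) + operator drives)."""
    writing_drives = required_mb_s / nominal_drive_mb_s
    return math.ceil(writing_drives * (1.0 + spare_fraction) + operator_drives)


def mover_pcs(drives: int, drives_per_pc: int = 2) -> int:
    """Mover PCs needed when each PC drives `drives_per_pc` tape drives."""
    return math.ceil(drives / drives_per_pc)


def buffer_stream_seconds(buffer_mb: float, tape_mb_s: float, net_mb_s: float) -> float:
    """Seconds a full RAM buffer keeps a tape streaming when the network is slower.

    Simplified model: the tape drains the buffer at tape_mb_s while the network
    refills it at net_mb_s, so a full buffer empties at (tape_mb_s - net_mb_s).
    """
    deficit = tape_mb_s - net_mb_s
    if deficit <= 0:
        return math.inf  # the network keeps up, so the buffer never empties
    return buffer_mb / deficit


if __name__ == "__main__":
    n = drives_needed(170, 8.2, 0.25, 2)
    print(n, mover_pcs(n))                          # 28 14
    print(buffer_stream_seconds(100, 12.0, 11.0))   # 100.0
```

As a cross-check, 10.5 MB/s derated by 20% is about 8.4 MB/s rather than 8.2 MB/s; either value rounds up to 28 drives. With these figures, a 100 MB buffer rides out about 100 seconds of the 1 MB/sec shortfall before the drive has to stop.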
A 12 MB/sec MAMMOTH-2 tape drive is slightly too fast for a 100 mbps Ethernet (11 MB/sec max). It is an allowable change to these criteria to "stream" an Exabyte drive at less than full rate, provided memory buffering is used on the mover nodes to ensure that the drives do not start and stop too much.

The following are not being deployed until proven on the "RIP" cluster used for Enstore development: Linux disk mirroring, or a RAID controller (for the database).

ADIC2
=====

ADIC2 is an OS/2 computer with relatively weak security. It needs to be seen by computers on the D0 Enstore Cluster and the RIP cluster. Access to ADIC2 will be restricted to those computers by providing a private network, accessible only by logging into an Enstore or RIP computer.

ADIC library:

In case of a library failure, we plan to allow D0 to at least record data into the CD STK robot.

D0 Production Farm
==================

The main function of the farm is to reconstruct D0 raw data, streaming raw data in from tapes in the ENSTORE system and streaming reconstructed output files into the ENSTORE system. For February, the farm is to sustain bandwidths of 12 MB/sec in and 6 MB/sec out. The procured hardware will be able to sustain the ultimate system requirement of 24 MB/sec to and from tape.

The system consists of two or three kinds of computers with distinct functions: an I/O node, worker nodes, and possibly a separate machine to run the FBS daemons. This last node would be a cost optimization, minimizing the number of LSF licenses required for the system, and separating data flow from control.

FBS node :-

It is TBD whether this computer will be constructed. One reason for its existence is to minimize the number of LSF licenses required. Only one license is required for it; however, LSF requires one license per CPU, so if the Origin 2000 were used to run the FBS system, more licenses would be required. The value of a license is about $1000 + $200/year in maintenance. The actual machine may be FNPCD.
This machine will master an area with Linux executables, which will be pulled by the worker nodes.

Worker Nodes :-

50 farm worker nodes are out to bid. Their main features are:

  2 INTEL processors
  1 100 mbps Ethernet connection
  512 MB of memory
  2 18 GB disk drives, 6 MB/sec or higher, on one EIDE bus
  1 6.4 GB IDE system disk, on a separate EIDE bus
  1 floppy
  1 CDROM

The Linux operating system, the farms batch system, SAM and ENSTORE are the principal underpinnings.

Most EIDE disks run faster than the 6 MB/sec specified in the bid. Most likely, the EIDE controller will be either 33 MB/sec or 66 MB/sec. It is unknown whether striping the two data disks would increase their data rate. However, since the bus rate is much larger than the desired streaming rate, the presumption is that an increase in rate would be obtained.

  Area           Disk         Size    Backup?  Structuring         Management
  =============  ===========  ======  =======  ==================  ====================
  Linux O/S      "6GB"        ~2 GB   no       partition           System Administrator
  Swap           "6GB"        1 GB    n/a      partition           n/a
  Mini products  "6GB"        rest    no       partition           Fermilab UPS
  File staging   36GB stripe  18GB    no       1/2 2-disk stripe   D0 prod'n software
  File staging   36GB stripe  18GB    no       1/2 2-disk stripe   D0 prod'n software

  Table :- Disk layout for D0 Worker Nodes

I/O Node:-

An SGI Origin 2000, the former FNFSR, is specified as the I/O node. The node name is D0BBIN. The specifications for the machine are:

  SGI O2000
  4 300 MHz processors
  1 GB of memory
  360 GB of JBOD F/C staging disks
  1 F/C controller
  1 GB Ethernet controller for Enstore
  1 GB Ethernet controller for talking to the farm nodes
  1 100 mbps Ethernet controller for general networking
  1 tape drive for backups

  Area                  Amount     Backup?  Structuring           Management
  ====================  =========  =======  ====================  ======================
  system + swap         7 + 2GB    [PRN]    2-way mirror,         System administrator
                                            2 disks
  Spare system + swap   7 + 2GB    [PRN]    2-way mirror          System administrator
  D0 Products + login   18GB tot   no       one partition         D0 farm people +
  + Linux Master                                                  FNAL UPS + local login
  Staging               360GB      no       72GB two-way stripes  D0 Farm software/SAM

Right now, the plan and software direction is to input directly from Enstore to a farm node, so the I/O node is basically an "O" node. It is likely, but unproven, that the two EIDE disks can sustain transfers from the 100 mbps Ethernet connection. While 100 mbps falls short of the nominal uncompressed tape speed of a MAMMOTH-2, the Enstore-side system will have a large RAM buffer whose function is to keep the tapes mostly streaming.

The IRIX "mpadmin" command will be used to reserve 3 CPUs for Enstore data transfers. The Ethernet cards will be administered to bind their interrupts to these CPUs. ENSTORE does not YET have a feature to cause data transfers to run on the CPU responsible for the Ethernet controller's interrupts. It is UNKNOWN TO ME whether the volume of SAM file transfers is significant enough to cause SAM to implement code to exploit this optimization.

Monitoring:-

The overall farm production must be monitored 24x7. The I/O node must be monitored 24x7. While it is a requirement on the farms software that individual farm nodes can fail without disrupting production, a "large" fraction of farm nodes must be available 24x7.

Problems with this system may be detected at several levels. To resolve problems observed via symptoms in the data systems, D0 will provide a production shift person while production is running. This person will have a general knowledge of the overall D0 system, and will be able to investigate the problems seen, identify problems associated with the farms, and summon specialized help for hardware, operating system, or FBS software problems 24x7.

The operating system will be monitored by means of the PATROL package. PATROL can notice certain errors and correct them.

The operators will monitor the farm computers by periodic PING, via XFALIVE.
If many fail over a short time, they will notify the system administrator and the D0 production liaison.

The physical design for this system [is / is not] complete. Racking [has / is yet to be] requisitioned. Space in FCC has been identified.

DOMINO
======

The main functions of DOMINO are:

  Production data analysis
  Non-production data analysis
  Software development and test
  Use of "productivity" tools (such as mail and TeX)
  (potential) Hosting login areas for local users

D0 intends not to partition the machine. CPU will be managed by Fair Share. Memory will be managed by Fair Share.

A summary of its hardware is as follows:

  QTY  Item                                             Where?
  ===  ===============================================  =========================
  64   Processors
       64*256 MB of memory
  3    F/C controllers attached to 5TB of JBOD disk     one in each of rack 2,3,4
  1    F/C controller attached to 100GB of
       "vendor supplied" disk
  1    F/C controller attached to a RAID disk
       controller for login areas
       a few miscellaneous SCSI busses
  3    GB Ethernet cards                                one in each of rack 2,3,4
  16   100 mbps Ethernet cards

It is a design goal to keep rack 4 such that it may be partitioned from the rest of the machine.

A summary of the kinds of disk areas is presented here, and developed in the discussion below.

  Area            Amount   Enstr?  Backup?  Structuring             Allocation via...
  ==============  =======  ======  =======  ======================  ========================
  Local Work      1.0 TB   N       no       18 GB, single disk      Fair Share to implement
                                            partitions              spokespersons' policy
  Scratch         0.25 TB  N       no       large logical volumes   Blockdays, 7 day
                                                                    delete time
  Legacy plus...  2.75 TB  Y       no       TBD                     IRIX quota
  Sam Project +            Y       no       some volumes reserved   SAM structuring
  Sam pinning                               for expt's
  Swap            TBD      n/a     n/a      n/a                     n/a
  Products        TBD      N       weekly   Not critical            FNAL UPS
  System          TBD      N       weekly   Not critical?           System Administrator

  Table -- Allocation of the disk system.
(based on Hadley's mail of July 21)

Code development and Productivity Tools Activities:-

These activities include code development, code testing on moderate data sets [apart from SAM], mail reading, web browsing, TeX processing, and all the pleasant and agreeable day-to-day tasks performed by a scientist when not analyzing Run II data sets. Scientists connect to the system via some kind of remote login, and will use FTP and other utilities to move non-managed files around. Use of DOMINO does _not_ include web serving; web serving is conducted on d0world.fnal.gov.

These activities include use of batch services and job scheduling apart from the SAM system. The batch machinery will include LSF.

The network interface cards used for these unstructured activities are distinct and partitioned from those used for the "structured" activities using SAM/ENSTORE, discussed below. They will be implemented on 2 of the 100 mbps interface cards supplied with the machine. Load balancing over these links will be provided by the NLBS software purchased with the machine. This software will work with kerberized and non-kerberized versions of interactive applications.

Login Areas :-

A RAID subsystem has been procured for the user home areas. This hardware will initially be installed on d02ka, and it will be NFS served to other machines. The home partitions will be sized at 18GB, to match the current capacity of the MAMMOTH-1 tape drive and to cap the amount of time for backup and recovery.

If the performance of the NFS areas is acceptable, the central home areas will be made available to DOMINO and to the workstations, and the current workstation home machines, both SGI and Linux, would be decommissioned. If the NFS performance is not acceptable, the home areas will be moved to be local to DOMINO and [not served (Lee) | served only to other central analysis machines (Fagan)].
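The Scratch row of the disk allocation table above lists allocation via "Blockdays, 7 day delete time" without defining the terms. One plausible reading, assumed here rather than taken from the plan, is that usage is charged as blocks multiplied by days resident and that files older than 7 days are purged. A minimal Python sketch of that bookkeeping; the ScratchFile type, the function names, and the /scratch paths are all hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86400.0


@dataclass
class ScratchFile:
    path: str
    blocks: int       # size in disk blocks
    created: float    # creation time, seconds since the epoch


def block_days(f: ScratchFile, now: float) -> float:
    """Blocks multiplied by days resident (the assumed block-days charge)."""
    return f.blocks * max(0.0, now - f.created) / SECONDS_PER_DAY


def expired(files, now: float, max_age_days: float = 7.0):
    """Files older than the delete time, oldest first (candidates for purging)."""
    old = [f for f in files if (now - f.created) / SECONDS_PER_DAY > max_age_days]
    return sorted(old, key=lambda f: f.created)


def usage_by_owner(files, now: float, owner_of):
    """Total block-days per owner; `owner_of` maps a ScratchFile to a user name."""
    totals = {}
    for f in files:
        owner = owner_of(f)
        totals[owner] = totals.get(owner, 0.0) + block_days(f, now)
    return totals


if __name__ == "__main__":
    now = 30 * SECONDS_PER_DAY
    files = [
        ScratchFile("/scratch/alice/a.dat", 1000, now - 8 * SECONDS_PER_DAY),
        ScratchFile("/scratch/bob/b.dat", 500, now - 2 * SECONDS_PER_DAY),
    ]
    print([f.path for f in expired(files, now)])  # ['/scratch/alice/a.dat']
    print(usage_by_owner(files, now, lambda f: f.path.split("/")[2]))
    # {'alice': 8000.0, 'bob': 1000.0}
```

Charging by block-days rather than by size alone would penalize large files that linger, which fits the aim of keeping a shared 1/4 TB area turning over; whether DOMINO's actual policy works this way is not stated in the plan.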
Local Work Area :-

In addition to the backed-up and possibly NFS-served login area, users will have local work areas, which will be managed by IRIX disk quotas. This space is not backed up. The total size will be 1 TB, taken from the 5.04 TB disk procurement. The local work area will be structured in 18GB single-disk partitions, linked to the home areas by the standard link /home/work/username.

Scratch Area :-

There will also be a 1/4 TB scratch area, controlled by block days and derived from the 5.04 TB disk procurement. The scratch area will not be backed up.

Non-production Analysis :-

[DLP -- I am having a hard time resolving LEE's and FAGAN's comments. The following is my best attempt. DO NOT GET MAD AT ME, HELP ME FIX THE PROSE.]

Non-production analysis is smaller data analysis and/or code debugging run in the SAM framework. The activities are:

  Small data set staging (single user, not optimized, using SAM).
  Small "ntuple" analysis.
  Data sets for software developers.

LSF will be used to queue excess demand. The Fair Share software purchased with the machine will be used to limit memory size.

The DOMINO system supports this in the following way:

Project area :-

There will be a 1 TB project area for non-production analysis projects.

Production Analysis :-

Production analysis is for large-scale projects, and projects which affect a large number of people. The kinds of production analysis are:

  Using the SAM framework to iterate over the thumbnail datasets.
  Enlarging and occasionally rebuilding the thumbnail data sets, putting a copy of the thumbnail into Enstore.
  Freight train processing of large data sets using the SAM framework.
  [Caching and purging other datasets, which are of current interest and may be re-used, using the SAM framework.]
  Physics group datasets.
  Event picking.

Network Interfaces :-

The following table describes how the Ethernet controllers will be partitioned, and is discussed in detail in the rest of this document.
  Type of use              Number of    Number of
                           100 mbps     1000 mbps
  =======================  ===========  ===========
  Productivity/Code devel  2            0
  ENSTORE                  1            3
  SAM file serving         2            0
  Reserved                 3
  Uninstalled              8
  -----------------------  -----------  -----------
  Total                    16           3

  Table :- Allocation of NIC cards to services on DOMINO

In addition to the normal use of the network for interactive logins, casual use of FTP, web browsing, and so on, DOMINO uses two network-oriented services to get and serve data: Enstore and [SAM cache servicing]. Enstore must be configured to provide [100] MB/sec in aggregate to and from the machine, and the SAM services TBD MB/sec.

The overall system design specifies that data must flow to and from the tape robot at a rate which will stream a MAMMOTH-2 tape [or nearly stream it, in the sense that a 100 mbps link is almost 12 MB/sec]. Given software which double-buffers, and a system which has enough resources in general (for example, no swapping), the main special considerations to achieve streaming are:

  1) Sufficient network bandwidth for a transfer.
  2) Sufficient bandwidth to the disk.
  3) Ensuring that the program(s) moving the data can be scheduled promptly as their I/Os complete, even when the machine is compute bound.

In addition, a specific optimization discovered during the RFI process was that the amount of CPU needed to move data over IP is lessened on O2000 computers by binding interrupts from a network card to a specific CPU, and running the program doing the network I/O on that CPU. The RFI tested 195 MHz CPUs and found that over 30 MB/sec could be transferred using one CPU when jumbo frames were not used.

Enstore balances its use of multiple Ethernet connections to the CISCO 6509 switch by round-robining through a list of Ethernet controllers. This sharing is most effective when the controllers are allocated to Enstore. Three GB Ethernet controllers and one 100 mbps controller will be dedicated to ENSTORE. Two 100 mbps controllers will be dedicated to SAM file transfers.
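The round-robin sharing described above can be illustrated in a few lines. A minimal Python sketch, assuming one interface is chosen per new transfer; the class name and the interface names (gige0 ... fe0) are hypothetical placeholders, and this is not Enstore's actual code.

```python
import itertools
from typing import Iterable


class RoundRobinInterfaces:
    """Hand out network interfaces in strict rotation, one per new transfer.

    Hypothetical illustration of the round-robin idea only: the interface
    names are placeholders, and this is not Enstore's actual code.
    """

    def __init__(self, interfaces: Iterable[str]):
        names = list(interfaces)
        if not names:
            raise ValueError("at least one interface is required")
        self._cycle = itertools.cycle(names)

    def next_interface(self) -> str:
        """Return the interface to use for the next transfer."""
        return next(self._cycle)


if __name__ == "__main__":
    # Three gigabit controllers and one 100 mbps controller dedicated to Enstore.
    rr = RoundRobinInterfaces(["gige0", "gige1", "gige2", "fe0"])
    print([rr.next_interface() for _ in range(6)])
    # ['gige0', 'gige1', 'gige2', 'fe0', 'gige0', 'gige1']
```

Strict rotation treats every controller alike, so with this mix one transfer in four lands on the 100 mbps link; a speed-weighted rotation would be one way to avoid that, though the plan does not say Enstore does this.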
It is policy that these controllers are allocated to these tasks. The allocation may be useful when we operate the software in the strengthened Kerberos realm. Since SAM and ENSTORE are used in tandem, the application administrator should be trusted to use the allocated resources appropriately. Casual uses [will be prevented from using these controllers by means of METHODS UNKNOWN TO ME | will be kept to a minimum via administrative means]. [SGI did not know how to turn inetd services off on the dedicated interfaces.]

The IRIX "mpadmin" command will be used to reserve 3 CPUs for Enstore data transfers. The Ethernet cards will be administered to bind their interrupts to these CPUs. ENSTORE does not YET have a feature to cause data transfers to run on the CPU responsible for the Ethernet controller's interrupts. It is UNKNOWN TO ME whether the volume of SAM file transfers is significant enough to cause SAM to implement code to exploit this optimization.

Backups

Backup software for the home and other backed-up areas will be [Legato]. (Legato claims that backups of the system partition are usable for restoration by the O2000 boot firmware.)

[An EXABYTE robot, with a capacity of 10 drives and 200 tapes, has been ordered for this purpose. Two MAMMOTH-1 drives, with free upgrade to MAMMOTH-2, have been ordered for this robot.] The robot will be attached to the machine hosting the home areas. Legato client licenses are inexpensive and could be purchased later if the backup capability needs to be expanded.

Monitoring

Problems with this system may be detected at several levels. To resolve problems observed via symptoms in the data systems, D0 will provide a production liaison when production is running. This person will have a general knowledge of the overall D0 system, and will be able to investigate the problems seen, identify problems associated with the operating system or batch system, and summon specialized help for hardware, OSS or ISA software problems 24x7.
The operating system will be monitored by means of the PATROL package. If PATROL notices certain errors, it will notify the system administrator, 24x7, in a way that is TBD. User problems clearly not related to the data system (such as poor response time) will be forwarded directly to the existing helpdesk/operator dispatching machinery.

Tools supplied with the procurement (Performance Co-Pilot) will be used for performance monitoring [at the system administrator level].

The operators will monitor the d002KB computer by periodic PING, via XFALIVE. On failure, they will notify the system administrator.

Physical design

The physical design for this system is complete [except for the tape]. Racking is provided by SGI, except for the backup robot. Space in FCC has been identified.

D02KA
=====

If the NFS serving experiment is successful, d02ka will be an experiment-wide NFS server. It would share the backup robot with DOMINO. [It will also be an NIS server.] Actual implementation of the NIS server may be postponed until it is reasonably certain that d02ka is not needed for debugging hardware or operating system problems found on DOMINO.

The home areas are on their own Fibre Channel loop, separate from the scratch disk. Home areas are moved to and from d02ka by plugging the cable into the respective computer, not by copying the files.

Backup

Equipment for the backup and installation of the bootable partitions containing IRIX was procured with the machine.

The networking for this test will be [GB Ethernet] and [is / is not already installed].

SAM Cluster
===========

The SAM system has several daemons which serve the whole experiment. They are named ______________, ____________, and _____________. During the period of this plan, they will remain on the SAM LINUX cluster. These daemons will be moved to either DOMINO or D0ORA by February.

CISCO 6509 SWITCH
=================

The production switch (CISCO Catalyst 6509) is expected to arrive on or around August 1.
[Before it is deployed, it will be exercised in the lab.] The 6509's backplane bandwidth is 32 Gbps; the next-generation supervisor will sustain 128 Gbps. The flexibility of the network design relies on this oversupply of bandwidth. There are separate 6500 switches for the D0 DAQ and the D0 assembly building.

Until the switch can be deployed, integration of the whole system can proceed, because the FNAL networking group is providing bandwidth and other services. These are centered around a CISCO Catalyst 8500 switch, which is a "backbone" type switch, a CISCO switch associated with the RIP cluster, and a Foundry switch associated with the RIP cluster. When using this equipment, we should expect functionality, but since there is no central backbone, not all services can run at rate simultaneously. Minor or chronic robustness problems would have lower priority than work related to making the final hardware available.

The CISCO 6509 model number designates that 9 slots are available for networking modules. The available networking modules are:

  S-X6224-100FX-MT   24-port 100 FX MultiMode MT-RJ
  WS-X6248-RJ-45     48-port 10/100 RJ-45 Module (~$150/port)
  WS-X6248-TEL       48-port 10/100 Telco Module
  WS-X6408-GBIC      8-port Gigabit Ethernet Module (Req. GBICs) (~$1000/port)
  WS-X6416-GBIC      16-port Gig-Ethernet Mod. (Avail 10/99)

Already procured for D0 are:

  2 48-port 10/100 RJ-45 Modules
  2 8-port Gigabit Ethernet Modules (Req. GBICs)

There are two additional GB Ethernet ports on the supervisor.

A 16-port Gigabit module will not be available in August; it is due to become available around 10/1. Because of that, it is probably best to conduct the August D0 Run II tests with a minimal configuration, and to wait for the 16-port Gigabit Ethernet module to become available rather than waste slots by using two 8-port Gigabit Ethernet modules instead. We understand that the 16-port Gigabit Ethernet module does not use GBICs.
Although this switch is not in service at FNAL, the Catalyst 6509 is in service at BaBar, which uses the 100 Mbit RJ-45 module and very likely uses the 8-port gigabit Ethernet module. The 48-port RJ-45 module costs about $7,000, or ~$150/port. The 8-port gigabit Ethernet module costs about $7,000, or ~$875/port exclusive of the GBIC (which costs $350 for 1000BASE-SX).

Around August 1, CISCO is expected to implement routing features in the switch. It is hard to say how well the routing features will work in an initial release; some problems should be anticipated. If they do not work well, the switch can move data through another router which we can connect externally.

Because this switch will be the central hub of the D0 data system, its reliability needs to be considered. CISCO describes the reliability features of the switch as:

    "All system elements including power supplies, fans, supervisors, line-card modules, and switch fabrics are hot-swappable such that elements can be added, removed or replaced without service interruption of unrelated traffic flows. In dual-Supervisor configurations, Cisco's Fast-Switchover will transfer switch control to the redundant Supervisor within seconds for mission-critical applications requiring maximum network availability. All system elements are also field replaceable units (FRUs) for maximizing serviceability and minimizing network downtime. For network-level resilience, Catalyst 6000 Family switches also support automatic recovery from failure using Spanning Tree per VLAN, and support load-sharing for faster link convergence using Fast EtherChannel or Gigabit EtherChannel technologies. Load balancing with even higher availability can also be accommodated using Cisco's MultiModule Channeling, where ports from different line cards can be aggregated into higher-bandwidth links. Catalyst 6000 Family switches are also capable of load balancing across Layer-3 paths."
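The per-port prices quoted for the switch modules follow from dividing the module price by its port count, with each gigabit port also needing its own GBIC. A small sketch of that arithmetic, using only the prices stated in the text (about $7,000 per module, $350 per 1000BASE-SX GBIC); the function names are hypothetical, and buying whole modules (rounding the module count up) is an assumption about how ports would be purchased.

```python
import math


def per_port(module_cost: float, ports: int, optic_cost: float = 0.0) -> float:
    """Cost of one port: its share of the module plus any per-port optic (GBIC)."""
    return module_cost / ports + optic_cost


def cost_to_equip(ports_needed: int, ports_per_module: int,
                  module_cost: float, optic_cost: float = 0.0) -> float:
    """Total cost of buying whole modules for `ports_needed` ports, plus optics."""
    modules = math.ceil(ports_needed / ports_per_module)
    return modules * module_cost + ports_needed * optic_cost


print(round(per_port(7000, 48)))           # 48-port 10/100 module
print(per_port(7000, 8))                   # 8-port GbE, module only
print(per_port(7000, 8, optic_cost=350))   # 8-port GbE with a 1000BASE-SX GBIC
print(cost_to_equip(16, 8, 7000, optic_cost=350))
```

With these inputs the sketch gives about $146 per 10/100 port, $875 per gigabit port for the module alone, $1,225 per gigabit port once a GBIC is included, and $19,600 to equip 16 gigabit ports with two 8-port modules.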
Fermilab's experience is that supervisor reliability is normally very high. It has also been our experience that redundant, dual supervisors can cause more problems than they prevent. So we do not initially plan to use dual supervisors, but the capability is there if we need it. However, a second supervisor would take one of the 9 slots available for network modules.

If the switch breaks or fails to perform, the symptoms will appear as problems with the D0 data system, the operating systems of D0 platforms, or any of the layered application software systems. Any of the on-call experts associated with these systems may be called. They will have the expertise to identify problems in the network and summon the network group, 24x7.

The switch will be monitored by means of METHODS UNKNOWN TO ME. The monitored data will be examined for salient failures by METHODS UNKNOWN TO ME, which will trigger the appropriate type of intervention, 24x7.

The physical design for this system [is / is not] complete. Racking [has been / is yet to be] requisitioned. Space in FCC [has / has not] been identified.

D0 DAQ BACK END
===============

Computers :-
This system is a set of three Tru64 UNIX boxes. Data are stored on SCSI arrays, each of which has three internal SCSI buses. Each Tru64 box will have three such arrays and an attached Gigabit Ethernet card. [The Gigabit Ethernet cards shown are dedicated solely to ENSTORE data movement; other uses will go through different NICs.]

Software :-
The data acquisition software will use spindle management to ensure that enough disk performance is available to keep a MAMMOTH-2 tape drive streaming. [The RAID array can consistently perform at over 12 MB/second.] [A METHOD UNKNOWN TO ME will be used to allow the ENCP data movement processes to be scheduled in the face of a heavy CPU load. | These machines will only be servers; the load will be controlled so that some spare CPU is available.]
The machines will be managed so that they are generally not resource-limited (for example, they should not swap).

Networks :-
Currently, a 48-port 10/100 switch is at the D0AB, and an 8-port gigabit Ethernet switch is available. The eventual plan is to build the D0AB network around a CISCO 6509 switch. A gigabit fiber from D0 to FCC is dedicated to data movement from the D0AB to the D0 Offline switch, and from there to the Enstore computers. [The dedication will be achieved by ??routing?? ??eventually??.] Other uses of the network will use a different connection.
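The spindle management described under Software for the DAQ back end can be illustrated with a small sketch. The scheme assumed here (the text does not spell it out) is to keep DAQ writes and tape reads on different arrays, so the ENCP transfer reading a finished file never competes with incoming writes for the same spindles. The class, method and array names are hypothetical.

```python
from typing import List, Optional, Tuple


class SpindleManager:
    """Hypothetical sketch: keep DAQ writes and tape reads on separate arrays.

    Files are written to the current write array; once closed they queue
    for tape. A file is only handed to the tape mover if it sits on an
    array that is not currently being written.
    """

    def __init__(self, arrays: List[str]):
        if len(arrays) < 2:
            raise ValueError("need at least two arrays to separate reads from writes")
        self.arrays = arrays
        self.write_index = 0
        self.pending: List[Tuple[str, str]] = []   # (array, filename) awaiting tape

    @property
    def write_array(self) -> str:
        return self.arrays[self.write_index]

    def close_file(self, filename: str) -> None:
        """A DAQ file on the current write array is complete; queue it for tape."""
        self.pending.append((self.write_array, filename))

    def rotate(self) -> None:
        """Send new writes to the next array (e.g. when the current one fills)."""
        self.write_index = (self.write_index + 1) % len(self.arrays)

    def next_for_tape(self) -> Optional[Tuple[str, str]]:
        """Return the oldest queued file that is NOT on the array being written."""
        for i, (array, _name) in enumerate(self.pending):
            if array != self.write_array:
                return self.pending.pop(i)
        return None


mgr = SpindleManager(["raid0", "raid1", "raid2"])
mgr.close_file("run1.raw")     # written on raid0
print(mgr.next_for_tape())     # raid0 is still being written, so nothing is offered
mgr.rotate()                   # new writes go to raid1
print(mgr.next_for_tape())     # run1.raw can now be streamed from raid0
```

One limitation of this sketch: rotating writes back onto an array that still has files queued for tape would reintroduce contention, so a fuller scheme would also hold rotation until that array is drained.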