Status of SAM - November 12, 1998

A first prototype system has been put together which contains, and exercises, at least skeletal components of most parts of the proposed SAM data access system.

This prototype has the following functionality:

a) startup of an experiment data access SAM framework, including all database servers, log file servers, information presentation servers and the global optimizer required for the operation of the data access system.
b) a rather fully developed database for tracking files, events, file and event characteristics, runs, plus the production and consumption of files and events. This database has been designed using Oracle designer tools and implemented using Oracle.
c) ability to catalog files in the system using an import description language to describe the nature and lineage of the files, including the process(es) which generated them. This has been developed in consultation with the Monte Carlo challenge working group in order to provide the necessary tools for import and cataloging of MC data files. People are ready now to start using this to store and catalog actual MC data.
d) ability to create and save project descriptions of a simple nature - basically queries and constraints which result in a specific list of files.
e) ability to create and save project snapshots - i.e. the resolution of a project description to a list of files at a particular moment.
f) ability to start a project (based on a project snapshot) and to initiate a data delivery pipeline (or multiple pipelines) associated with that project.
g) ability for staging processes ("encp clients") to announce themselves to a project as able to carry out part of the work of the project data delivery pipeline.
h) ability to register a particular "consumer" uniquely, as a combination of a particular project, a particular SAM user, a particular process family, and a user specified identifier.
i) ability for multiple analysis programs, running as either the same or different "consumers" (or some combination thereof), on any machine (designated as part of a particular station) to register with a running project and to request the next file to be analyzed.
j) ability for an analysis program to record that it has successfully completed analysis of a file and for that fact to be noted permanently in the database.
k) ability for the system to handle analysis program errors and failures to either request files or record consumption of files.
l) ability for an analysis program to re-join a project and carry on consuming files from the last successfully consumed file.
m) ability to run multiple projects, each with multiple consumers, on one or many machines.
n) ability for the staging processes to make calls to the Enstore storage management system in order to get delivery of one or more files into their managed disk cache area. Currently only one file at a time is being requested in an encp command, but a request for multiple files on a single encp command has been made to Enstore, purely for the purposes of optimizing performance of the system.
o) ability to receive and record data provided by Enstore on completion of a file copy encp command.
p) ability for the system to run and assure delivery of all files in a project to all registered analysis programs.
This has also been demonstrated to work properly when the order in which the staging processes are permitted (by the global optimizer) to fetch files from Enstore is deliberately and randomly mismatched to the order in which they requested permission to launch such requests, based on the list of files being traversed. This is an unrealistic situation which merely serves to exercise the communication protocol between the global optimizer and other components of the system.
q) ability to look at rudimentary statistics on the file throughput, consumer load, bandwidth and latency of the system.

The system has been implemented using servers, with interfaces between all components (including calls made by the analysis programs) defined in the CORBA interface definition language IDL. Stub code and skeleton code to implement the various parts of the system can therefore be generated in any of the languages C++, Java or Python. All three of these languages have been used in at least parts of the system (although not all three for all parts).

Interface to the database is also through an IDL-defined interface to a database server (conceptually one server, but in reality possibly several servers, each one implementing some part of the total database interface). The current database server code is implemented in Python and its underlying implementation is hidden from the rest of the system. It could therefore just as easily use an mSQL database, a MySQL database, or a set of Python built-in databases for all or parts of the database, or be reimplemented in C++ at some later time for performance reasons.

The analysis program interface code currently consists only of interfaces to register, get a file name, declare a file successfully read, declare a file in error, and to rejoin a project. C++ stubs have been used for this analysis program interface and have been successfully compiled (along with a particular CORBA implementation) using the KAI compiler.
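As an illustration only, the five analysis program operations listed above (register, get a file name, declare a file read, declare a file in error, rejoin) can be sketched as a small in-memory Python model. This is not the SAM IDL or server code: the class names (ConsumerKey, ConsumerState, ProjectSketch), the method names and the field names are all hypothetical, and the sketch assumes one analysis process per consumer and an in-memory file list in place of the database and the staging pipeline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ConsumerKey:
    """Four-part consumer identity as in item h); field names are assumed."""
    project: str
    user: str
    process_family: str
    user_tag: str


@dataclass
class ConsumerState:
    consumed: List[str] = field(default_factory=list)  # files declared read
    failed: List[str] = field(default_factory=list)    # files declared in error
    in_flight: Optional[str] = None                     # handed out, not yet resolved


class ProjectSketch:
    """Toy stand-in for a running project built from a project snapshot."""

    def __init__(self, name: str, snapshot_files: List[str]) -> None:
        self.name = name
        self.files = list(snapshot_files)  # the snapshot is a fixed file list
        self.consumers: Dict[ConsumerKey, ConsumerState] = {}

    def register(self, user: str, process_family: str, user_tag: str) -> ConsumerKey:
        key = ConsumerKey(self.name, user, process_family, user_tag)
        if key in self.consumers:
            raise ValueError("consumer already registered; use rejoin()")
        self.consumers[key] = ConsumerState()
        return key

    def rejoin(self, key: ConsumerKey) -> ConsumerKey:
        state = self.consumers[key]  # KeyError if the consumer never registered
        # A file handed out but never resolved is forgotten and offered again,
        # so work resumes after the last successfully consumed file.
        state.in_flight = None
        return key

    def get_next_file(self, key: ConsumerKey) -> Optional[str]:
        state = self.consumers[key]
        if state.in_flight is not None:
            return state.in_flight  # still waiting for a read/error declaration
        done = set(state.consumed) | set(state.failed)
        for name in self.files:
            if name not in done:
                state.in_flight = name
                return name
        return None  # every file in the snapshot has been resolved

    def declare_read(self, key: ConsumerKey, file_name: str) -> None:
        self._resolve(key, file_name).consumed.append(file_name)

    def declare_error(self, key: ConsumerKey, file_name: str) -> None:
        self._resolve(key, file_name).failed.append(file_name)

    def _resolve(self, key: ConsumerKey, file_name: str) -> ConsumerState:
        state = self.consumers[key]
        if state.in_flight != file_name:
            raise ValueError("file was not handed to this consumer")
        state.in_flight = None
        return state
```

In this sketch each consumer sees every file of the snapshot once, and a consumer that rejoins is handed the first file it has not yet declared read or in error. In the prototype itself these records are kept by the database server rather than in memory.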

Many necessary parts of our data handling system have not been addressed or implemented in this prototype code, including:

1) The full range of features and specifications which an analysis program needs when specifying its characteristics (e.g. Production Version and params) and its mode of operation (e.g. Freight Train, Files on demand)
2) Writing files from an analysis program and the merging of output files
3) The full complexity of trigger and stream specification in the database, and luminosity tracking information
4) Robust error handling
5) Global Optimizer code for "batching" file requests (in a way that is well matched to both the disk cache space available and the physical position of files on tape volumes) and for regulating file requests to Enstore (based on policies and bandwidth restrictions for certain access modes, and using data provided by Enstore on transfer rates and latencies)
6) Useful and correct statistics on the system performance
7) Code running on platforms other than Linux
8) Browsing tools for the database
9) Interplay between this I/O data handling system and a Batch system which is regulating the number of analysis programs and their cpu usage
10) The pickevents subsystem
11) Intra-Station and Inter-Station disk to disk staging of files
12) System configuration including Station components, policies and resources
13) Recording and handling of semi-permanent disk cache of files when a project terminates
14) Definition and handling of various SAM user and administrator roles

Also, many of the features and functionality demonstrated in the prototype need further refinement and extension.

The prototype system has been demonstrated to run on a set of 6 PCs each running Linux. Performance of the system has not yet been measured or addressed. It has used an Enstore system hooked up to an STK robot and a Redwood tape drive to write files into the robot and to read back a set of files from a few Redwood tapes.
It has also been demonstrated to run using an Enstore system which emulates a tape robot by using a disk file as the physical 'tape library'. In both cases the system runs for several hours, although there are still bugs and problems which at this time would prevent the system from running smoothly for, say, 24 hours.

Goals for the very near future (i.e. between now and Christmas), apart from understanding the ramifications of the upcoming workshop:

1) clean up and put in some more error and exception handling
2) integrate SAM Prototype user analysis code with Prototype Farm job and test
3) conduct tests in collaboration with Enstore using the Grau robot. For this we will need the switch with 2 Gigabit uplinks to hook to Enstore's similar switch.
4) enhance performance tools, which analyze the system based on the log file and study the statistics
5) do performance testing, understand bottlenecks and scalability issues
6) write and test parts of Optimizer code and algorithms - this may drive further enhancements in the interface with Enstore. It may also start to flesh out details of the interface between a Batch system and SAM.
7) continue to flesh out details of user analysis code interface
8) refine, iterate, document.

Lee and Vicky