In the July 1984 issue of LSN we highlighted the importance of ownership of records as a consideration in retrospective conversion planning. In this issue we examine some of the factors to be taken into account when a library or group of libraries wishes to combine separately developed files of machine-readable records into a single data base on an automated library system. This need arises when a library has bibliographic records from different sources, such as a book jobber, a bibliographic utility, and a retrospective conversion vendor, or when a group of libraries wishes to mount its separate machine-readable files on a shared automated library system with a single bibliographic master file. The following discussion focuses on bibliographic records formatted according to the accepted national standard, the MARC format.
The vendor of an automated library system will normally undertake the merging of separate files and the deletion of duplicate bibliographic records as a part of system implementation. It is also common practice for system vendors to provide the software required to translate standard records from sources such as OCLC and MARC/REMARC into the internal format used by their systems. Another process, often performed at this time, is to separate the item-specific or individual copy information from the bibliographic records and use it in the creation of separate copy records.
To verify that this generalization was correct, the editors contacted three representative turnkey system vendors. Their responses are described in the following paragraphs.
OCLC, the vendor of LS/2000, confirmed that the system handles all of the following processes: the combination of separate files; the translation of files from "regular" sources such as MARC/REMARC into the format used by LS/2000; the elimination of duplicate bibliographic records; the retention of local information recorded in the copies of a record not selected for retention on the master file; and the generation of item records. Except where special programming is required to handle nonstandard record formats, such processing incurs no charges. However, these are time-consuming processes. For example, the generation of item records and the associated creation of indices and authority files can require up to a minute of computer time per record. For very large files, this could translate into a "time" cost, since the system is not available for regular use until the processing has been completed.
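To give a sense of the scale of this "time" cost, the arithmetic can be sketched as follows. The one-minute figure is the upper bound quoted above; the file size of 100,000 records and the function name are hypothetical, chosen only for illustration.

```python
def processing_hours(record_count, seconds_per_record=60):
    """Estimate computer time in hours for generating item records.

    seconds_per_record defaults to 60, the "up to a minute" upper bound
    quoted by the vendor; actual times may be lower.
    """
    return record_count * seconds_per_record / 3600


# Hypothetical file of 100,000 records at the upper-bound rate.
hours = processing_hours(100_000)
print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
```

At the upper-bound rate, this hypothetical file would occupy the system for roughly 1,667 hours, or about 69 days of continuous processing, which is why file size deserves attention in implementation planning.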
CLSI confirmed that system software has been developed to match and merge records from separate sources into a single combined data base. Libraries would need to purchase the required software as part of the general system software purchase, and would also have to buy or rent a tape drive to load the records. The software for this operation is parameterized, allowing a library or group to establish its own criteria for the circumstances under which bibliographic records are considered to be identical. This method of loading and matching records from different sources incurs no per-record charges.
However, CLSI customer libraries can expect to incur per-record costs in the creation of copy-specific records. Unless the item data has been entered into the 049 fields of the bibliographic records as required by the CLSI format, the company has found it necessary to undertake custom programming for each customer. The tapes are then processed through the specially written programs before they are mounted onto the local system and merged. CLSI personnel estimate the charge for this programming at between $.05 and $.10 per item record created. The system creates the appropriate item records at the same time that the processed tapes are loaded and merged into a single bibliographic file. None of the local data input during the conversion is lost when duplicate bibliographic records are deleted; this information is transferred into the appropriate item record.
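For budgeting purposes, the quoted range can be turned into a rough estimate. The sketch below uses the $.05 and $.10 figures given by CLSI personnel; the collection size of 200,000 items and the function name are hypothetical.

```python
def item_record_cost_range(item_count, low_cents=5, high_cents=10):
    """Return (low, high) dollar estimates for custom item-record creation.

    The defaults reflect the quoted range of $.05 to $.10 per item record.
    Working in whole cents avoids floating-point rounding surprises.
    """
    return (item_count * low_cents / 100, item_count * high_cents / 100)


# Hypothetical collection of 200,000 item records.
low, high = item_record_cost_range(200_000)
print(f"Estimated charge: ${low:,.2f} to ${high:,.2f}")
```

For this hypothetical collection the estimate runs from $10,000 to $20,000, a figure worth weighing against the cost of entering item data in the required 049 format at the time of conversion.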
The current processing procedures of DataPhase, the third vendor queried, may present problems. The potential problems occur not in merging the files or in eliminating duplicate bibliographic records, but in retaining the individual copy information (the local data) included in the bibliographic records. At present, this data is not retained for any record other than the one selected as the master record for the bibliographic file. The copy-specific information is purged along with the duplicate bibliographic records.
DataPhase systems include the software for tape loading and matching. A tape drive is a mandatory part of the system configuration, so it would not be necessary to lease one for the loading and merging operation. Records in the MARC format can be loaded without difficulty. However, unless local/copy information has been recorded in the 949 field in accordance with the DataPhase format, this information cannot be processed and is dropped from the record. One way to overcome this is to have the tapes preprocessed by a tape processing service, which transfers the local or copy-specific information into the required field and into the DataPhase format. However, even this does not overcome the problem of losing item-specific data when there is more than one occurrence of a bibliographic record.
The record selected for retention is kept in its entirety (including the item data, provided that it is recorded in the 949 field in the DataPhase format). However, it is not supplemented by any information from the duplicate records. Therefore, the item-specific information included in duplicate records is lost in the match/merge process and must be rekeyed.
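The difference between purging and carrying forward copy information can be illustrated with a minimal sketch. It does not represent any vendor's actual software; the record layout (a dictionary with an "items" list), the function name, and the sample data are all hypothetical.

```python
def merge_group(master, duplicates, keep_duplicate_items=False):
    """Return the item (copy) entries that survive a match/merge.

    With keep_duplicate_items=False, only the master record's own item
    data is kept, as in the purge behavior described above. With True,
    item data from the deleted duplicates is carried forward as well.
    """
    items = list(master.get("items", []))
    if keep_duplicate_items:
        for dup in duplicates:
            items.extend(dup.get("items", []))
    return items


# Hypothetical records for one title held by two libraries.
master = {"id": "A", "items": ["Main copy 1"]}
dups = [{"id": "B", "items": ["Branch copy 1", "Branch copy 2"]}]

print(merge_group(master, dups))
print(merge_group(master, dups, keep_duplicate_items=True))
```

In the first call the two branch copies disappear with the duplicate record and would have to be rekeyed; in the second they are preserved alongside the master record's copy.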
DataPhase offers libraries several options for comparing multiple records to identify duplicates. The choices are: the 001 field; the 035 and 010 fields; or all three fields. In the implementation of some of these options, the system software can be instructed to select the record to be retained as the master bibliographic record on the basis of the encoding level of the record, with the "highest quality" record being the one selected for retention. Libraries can also choose whether such selection should be implemented automatically or whether the records should be referred to a file for later human review.
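The matching and selection options described above can be sketched in simplified form. This is an illustration under stated assumptions, not DataPhase's software: records are represented as plain dictionaries rather than parsed MARC, the quality ranking of encoding levels is a hypothetical ordering, and the function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical ranking of encoding levels: lower number = fuller record.
ENCODING_RANK = {" ": 0, "1": 1, "7": 2, "5": 3, "8": 4}


def match_key(record, fields):
    """Build a match key from the chosen fields; None if any is missing."""
    values = tuple(record.get(f) for f in fields)
    return None if any(v is None for v in values) else values


def select_masters(records, fields=("010", "035"), auto=True):
    """Group records by match key and choose a master for each group.

    Returns (masters, review). When auto is False, groups containing
    duplicates are placed in the review list for later human decision.
    """
    groups = defaultdict(list)
    masters = []
    for rec in records:
        key = match_key(rec, fields)
        if key is None:
            masters.append(rec)  # cannot be matched, so keep as is
        else:
            groups[key].append(rec)
    review = []
    for group in groups.values():
        if len(group) == 1 or auto:
            best = min(group, key=lambda r: ENCODING_RANK.get(r.get("encoding_level"), 99))
            masters.append(best)
        else:
            review.append(group)
    return masters, review


# Hypothetical sample: two records for the same title, one minimal-level.
sample = [
    {"001": "a1", "010": "84-1234", "035": "x9", "encoding_level": "7"},
    {"001": "b2", "010": "84-1234", "035": "x9", "encoding_level": " "},
]
masters, review = select_masters(sample)
print([m["001"] for m in masters])
```

Passing fields=("001",) or fields=("001", "035", "010") models the other two matching choices, and auto=False models referral of duplicate groups to a review file.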
The procedures and costs for file merging on the three vendor systems described above also apply to a library seeking to add its records to the data base of an existing automated system. However, additional charges may be incurred in this situation. The library that owns the system may seek to impose a charge for performing the required processing. Such a charge might be justified by the processing time required to undertake the merger and by the fact that the processing might restrict normal system use or performance.
Specific negotiations on this point should be conducted prior to finalizing arrangements for a library to join an existing system. For comparative purposes, the costs quoted by the system operator can be compared to the charges that would be levied by one of the many processing services which specialize in library-related tape processing. SOLINET, for instance, has the ability to "break-out" copy information from bibliographic records and create item records. To date, this capability has been demonstrated for records to be mounted on LS/2000, CLSI, and DataPhase systems. AMIGOS also provides similar services. The same capability is now being developed for Geac systems. The vendor of the system on which the file is to be mounted may also be able to suggest suitable sources for such support.