Mirrored from http://www.oclc.org/ with permission

The Bosnian
National Library
Building a Virtual Collection


Edward T. O’Neill, Consulting Research Scientist,
Jeffrey A. Young, Consulting Systems Analyst, and
Robert Bremer, Database Specialist

Abstract

Artillery attacks during the siege of Sarajevo in August 1992 destroyed the collection and catalog of the Bosnian National and University Library. The effort to rebuild the library depends, at least in part, on the reconstruction of its catalog. We examined WorldCat, the OCLC Online Union Catalog, to determine the criteria that would be most effective in selecting records appropriate for rebuilding a Bosnian catalog. For the first-cut extraction, we applied the following seven criteria to WorldCat:

  1. works of Bosnian authorship,
  2. works in Serbo-Croatian languages,
  3. works about Bosnia or Yugoslavia,
  4. works classified by LC or Dewey as Bosnian,
  5. titles with Bosnian keywords,
  6. works published in Bosnia, and
  7. works with Bosnian-related subjects.

Examination of the results suggests that adding selected geographic subject headings and personal name subject headings will result in better recall.


The artillery attack on the National and University Library of Bosnia and Herzegovina in August 1992 resulted in the massive destruction of the library and the culture it represented. When the four days of bombardment ended on August 27, all that remained were burned ruins. The flames "engulfed almost 50,000 feet of wooden bookshelves" [Perlez 1996, A4], destroying both the collection and the catalog. "A burnt-out skeleton was all that was left of this once-beautiful and renowned cultural institution; fire had raced through most of its three million volumes" [Lorkovic 1992, 736, 816]. Zećo provides a detailed account of the library from an insider’s perspective [Zeco 1996, 294–301].

Recovery Effort

The library community was horrified by the level of destruction, and librarians throughout the world have generously offered to assist in the rebuilding effort. The recovery has several distinct aspects: reconstruction of the building, recreation of the catalogs, and rebuilding of the collections. There are different views on whether the library should be restored to its former splendor or "left in its distressed state as a memorial to the three-and-a-half-year siege of Sarajevo that led to the near-total destruction" of the library [Kniffel 1996, 17]. The Sabre Foundation is among those coordinating efforts to restock libraries in Bosnia, and several American universities are donating materials [Mazmanian 1996, 14]. Harvard University Library and Harvard University Press announced that they will play a leading role in this effort to rebuild the collection [Kniffel 1996, 21].

Rebuilding the Catalog

Efforts to rebuild the collections are hampered by the destruction of the catalog—no complete record of the library’s holdings remains. The effort to rebuild the library generated strong support and offers of assistance from the international library community. The first step is to generate a database of bibliographic information pertaining to Bosnia along with an indication of what institutions hold the material. Representatives from Harvard University Library, Yale University Library, and OCLC met in summer 1995 and agreed that OCLC should assume technical leadership in support of the project within the United States.

Bosniaca is defined as all documents in any format written in any language on or about the territory of Bosnia and Herzegovina. It also includes all items published within the territory of Bosnia. Initial efforts will concentrate on producing a subset of the OCLC database. In the future, it may be desirable to add records from international sources. The resulting database will eventually be made available to the National and University Library of Bosnia and Herzegovina and other interested parties.

Bosniaca records in WorldCat (the OCLC Online Union Catalog) will form the basis of the Bosniaca Catalog. Bosniaca records were identified by searching WorldCat for records meeting at least one of the following criteria:

  1. Published by a Bosnian author. The record must contain either a 100 or 700 field for a Bosnian author identified through a comprehensive list of possible Bosnian authors. This list of 3,404 authors was obtained by scanning the personal names file for those authors who have published more in Serbo-Croatian than in any other language except English. (As part of the OCLC Control service, a comprehensive file of personal names and their attributes, including language of publication, is maintained. The file was used to identify authors who publish primarily in Serbo-Croatian.)
  2. Published in Serbo-Croatian. A fixed field language code (field 008 / 35–37) of either scc or scr.
  3. Published about Bosnia or Yugoslavia. The geographic area code (field 043) contains the code e-bn--- or e-yu---.
  4. Classified as Bosnian (LC Class). Fields 050 and 090 contain an LC call number in the ranges DB231-250 (obsolete class for Bosnian history), DR1652-1785 (Bosnian history), or PG1400-1798 (Serbo-Croatian literature).
  5. Classified as Bosnian (Dewey Class). Fields 082 and 092 contain a Dewey Decimal call number of 914.9742 (Bosnia description/travel), 949.742 (Bosnian history), or 891.82 (Serbo-Croatian literature).
  6. Keywords. The title (fields 245, 246, and 740) containing the words Bosanski, Bosansko-Hercegovacki, Bosne, Bosnia, or Bosnian.
  7. Published in Bosnia. The place of publication (field 260 subfield a) is Banja Luka, Bihac, Bosnia, Mostar, Sarajevo, Travnik, Trebinje, Tuzla, or Zenica.
  8. Bosnian subject. A subject heading (fields 600-651) contains the words Bosnia or Bosnian.
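The eight criteria above amount to a logical OR over simple field tests. The following is a minimal Python sketch of that idea, assuming each record has already been reduced to a plain dictionary. The key names (`authors`, `language`, `area_codes`, and so on), the helper names, the example record, and the prefix matching on Dewey numbers are illustrative assumptions; this is not OCLC's extraction software.

```python
import re

# Values copied from the criteria list; the record layout is hypothetical.
LANGUAGE_CODES = {"scc", "scr"}                        # criterion 2 (008/35-37)
AREA_CODES = {"e-bn---", "e-yu---"}                    # criterion 3 (043), 7-character form
LC_RANGES = [("DB", 231, 250), ("DR", 1652, 1785), ("PG", 1400, 1798)]  # criterion 4
DEWEY_NUMBERS = ("914.9742", "949.742", "891.82")      # criterion 5
TITLE_KEYWORDS = {"bosanski", "bosansko-hercegovacki", "bosne", "bosnia", "bosnian"}
PLACES = {"banja luka", "bihac", "bosnia", "mostar", "sarajevo",
          "travnik", "trebinje", "tuzla", "zenica"}
SUBJECT_KEYWORDS = {"bosnia", "bosnian"}


def words(text):
    """Lowercased words, keeping hyphenated forms such as 'bosansko-hercegovacki'."""
    return set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))


def lc_in_range(call_number):
    """True if the class letters and leading number fall in one of LC_RANGES."""
    m = re.match(r"([A-Z]+)\s*(\d+)", call_number.strip().upper())
    if not m:
        return False
    letters, number = m.group(1), int(m.group(2))
    return any(letters == cls and lo <= number <= hi for cls, lo, hi in LC_RANGES)


def matched_criteria(record, bosnian_authors):
    """Return the names of all criteria the record satisfies (empty set = not selected)."""
    hits = set()
    if any(a in bosnian_authors for a in record.get("authors", [])):
        hits.add("author")
    if record.get("language") in LANGUAGE_CODES:
        hits.add("language")
    if any(code in AREA_CODES for code in record.get("area_codes", [])):
        hits.add("area")
    if any(lc_in_range(c) for c in record.get("lc_numbers", [])):
        hits.add("lc_class")
    # Assumption: a Dewey number "matches" if it starts with a listed number.
    if any(d.startswith(DEWEY_NUMBERS) for d in record.get("dewey_numbers", [])):
        hits.add("dewey_class")
    if any(words(t) & TITLE_KEYWORDS for t in record.get("titles", [])):
        hits.add("title_keyword")
    # Strip trailing MARC punctuation such as " :" before comparing places.
    if any(p.strip(" :;,[]").lower() in PLACES for p in record.get("places", [])):
        hits.add("place")
    if any(words(s) & SUBJECT_KEYWORDS for s in record.get("subjects", [])):
        hits.add("subject")
    return hits


# Hypothetical record, for illustration only.
example = {"language": "scr", "lc_numbers": ["PG1418.A6"], "places": ["Sarajevo :"]}
print(sorted(matched_criteria(example, bosnian_authors=set())))
# prints ['language', 'lc_class', 'place']
```

Returning the set of matched criteria, rather than a single yes/no flag, keeps the information needed later to count matches per criterion and to see how much the criteria overlap.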

Many of the criteria overlap; many of the selected records matched on several criteria. The criteria were selected to provide a high recall—the precision is a secondary consideration. It is expected that many of the retrieved records will fall outside the definition of Bosnian materials. If precision becomes a problem, the nonrelevant materials can later be removed from the catalog. The extraction of Bosnian material will be done twice: the initial set of Bosnian records will be used to review and refine the criteria. The revised criteria will then be used to extract the records for the Bosnian catalog. The revised criteria will also be used by other libraries willing to contribute records to the catalog.

WorldCat Record Retrieval

WorldCat was initially searched in November 1996 for records matching the criteria. At the time of the search, WorldCat contained almost 36 million records; 103,983 matched one or more of the criteria. The percentage of records retrieved by each criterion is shown in fig. 1. The percentages sum to more than 100 since many of the records matched on multiple criteria.

Fig. 1 WorldCat Matches

The language code produced the greatest number of matches; 78,766 (75.7%) records have the language code for Serbo-Croatian. The next most productive criterion was authorship; 35,003 records were retrieved with a Bosnian author as either the main entry or as an added entry. The geographic area code generated 28,931 matches. The Library of Congress classification was the only other selection criterion which generated a large number of matches.
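The percentages can be checked directly from the counts quoted in the text. The short Python sketch below uses only the three counts and the total reported above; the variable and function names are illustrative.

```python
TOTAL_RETRIEVED = 103_983  # records matching at least one criterion

# Match counts as reported in the text.
matches = {
    "language code": 78_766,
    "Bosnian author": 35_003,
    "geographic area code": 28_931,
}


def share(count, total=TOTAL_RETRIEVED):
    """Percentage of the retrieved set, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {name: share(count) for name, count in matches.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Because a record can satisfy several criteria at once, these three shares alone already add up to more than 100%.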

Characteristics of the Records Retrieved

For records with either an LC or Dewey classification, the class number was converted to one of 29 broad subject categories. The resulting subject distribution is shown in fig. 2. The set was dominated by Language and literature and History. Together, these two groups accounted for almost 55% of all retrieved records. Science, mathematics, and technology are underrepresented. These fields do not usually have a geographic dimension and therefore are less likely to be retrieved by the geographically oriented selection criteria.

Fig. 2 Subjects

Serbo-Croatian is the dominant language, accounting for approximately 75% of the records. The distribution of languages other than Serbo-Croatian is shown in fig. 3. In this group, English was by far the most common. Almost 80 languages are included in the "other" group.

Fig. 3 Common Languages

Ninety-two percent of the records were for books, 5% for serials, and the remaining 3% for all of the other formats. The publication dates for the books are shown in fig. 4. The figure shows steady growth except during the decade of World War II. The drop in the 1990s is due to both the war and the lag in libraries acquiring current publications.

Fig. 4 Publication Dates for Books

Nearly 75% of the material retrieved was published in Yugoslavia or in what was formerly Yugoslavia. The country code bn has been used since 1992 and is present in only 628 WorldCat records. Materials published in Bosnia and Hercegovina prior to 1992 generally are identified with country code yu (Yugoslavia). For material published outside the former Yugoslavia, the regions of publication are shown in fig. 5. Almost equal numbers of items were published in North America and Western Europe, with most of the remainder coming from Eastern Europe.

Fig. 5 Country of Publication

Enhancing the Search

The primary purpose of this initial retrieval is to test and evaluate the effectiveness of the retrieval criteria. It is expected that other libraries will also contribute to this rebuilding effort by using similar retrieval criteria to search their catalogs to identify materials which are not included in WorldCat. Searching for Bosnian authors is complex, requiring a Boolean OR of several thousand author names. To greatly simplify the search, it would be desirable to drop this criterion unless there is strong evidence that it retrieved a large number of records that would not have been retrieved otherwise.
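Whether the author criterion can be dropped depends on how many records it alone retrieves. A minimal Python sketch of that check follows, assuming the set of matched criteria has been kept for each retrieved record; the record identifiers, criterion names, and data are made up for illustration.

```python
def unique_contribution(matches_by_record, criterion):
    """Count records retrieved by `criterion` and by no other criterion."""
    return sum(1 for hits in matches_by_record.values() if hits == {criterion})


# Hypothetical data: record id -> set of criteria the record matched.
matches_by_record = {
    "rec1": {"author", "language"},
    "rec2": {"author"},
    "rec3": {"language", "place"},
    "rec4": {"author"},
}

lost_if_dropped = unique_contribution(matches_by_record, "author")
print(lost_if_dropped)  # prints 2
```

Records counted here are exactly those that would disappear from the retrieval set if the criterion were removed, which is the evidence the decision calls for.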

The analysis of the retrieval consists of two phases:

  1. the statistical analysis of the records retrieved and
  2. the manual review of individual records to assess the precision and recall.

The manual relevance assessment has not been started and its scope will depend, in part, on the availability of subject specialists. The statistical analysis requires far less time and effort but is expected to assist in refining the criteria.
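For reference, precision and recall in the usual information-retrieval sense can be computed as in the Python sketch below. It assumes the reviewers' judgments are available as a set of relevant record identifiers; the function names and sample identifiers are illustrative.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved records that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Fraction of relevant records that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


# Hypothetical identifiers, for illustration only.
retrieved = {"a", "b", "c", "d"}
relevant = {"b", "c", "e"}

print(precision(retrieved, relevant))  # prints 0.5
print(recall(retrieved, relevant))
```

Measuring recall requires knowing about relevant records that were not retrieved, which is why a manual review limited to the retrieval set can establish precision more easily than recall.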

The topical and geographic subject headings were examined to see if additional terms could be identified that would enhance the recall. The most frequently used topical subject headings found in the retrieval set are:

World War, 1939–1945
Serbo-Croatian language
Serbian literature
Yugoslav War, 1991
Croatian literature
World War, 1914–1918
Communism
English language
Serbian poetry
Yugoslav literature
Serbs
Eastern question (Balkan)
Chess

As might be expected, many of the headings are generic without any connection to Bosnia. Some headings such as Chess have little regional meaning while other headings such as Communism may have strong ties to the region but are still too general to use for retrieval. After reviewing the common subject headings used, several additional keywords were identified. Adding Yugoslavia, Serbo-Croatian, Balkan, Adriatic, Yugoslavs, Slavs, and Slavic to Bosnia and Bosnian, the two keywords used originally, would significantly improve the recall.
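A frequency count of this kind can be sketched in a few lines of Python. The sketch assumes each retrieved record is represented simply as a list of its topical subject headings; the function names and sample data are hypothetical, and the final choice of new keywords remains a human judgment.

```python
import re
from collections import Counter

ORIGINAL_KEYWORDS = {"bosnia", "bosnian"}  # the two subject keywords used originally


def heading_frequencies(records):
    """Count how many records carry each topical subject heading."""
    counts = Counter()
    for headings in records:
        counts.update(set(headings))  # count a heading once per record
    return counts


def candidate_headings(counts, top_n):
    """Most frequent headings that the original keywords would NOT match."""
    candidates = [
        (heading, n)
        for heading, n in counts.most_common()
        if not ORIGINAL_KEYWORDS & set(re.findall(r"[a-z]+", heading.lower()))
    ]
    return candidates[:top_n]


# Hypothetical sample of retrieved records.
sample = [
    ["Serbo-Croatian language", "Chess"],
    ["Serbo-Croatian language"],
    ["Bosnia -- History", "Serbo-Croatian language"],
    ["Bosnia -- History", "Chess"],
]
print(candidate_headings(heading_frequencies(sample), top_n=2))
# prints [('Serbo-Croatian language', 3), ('Chess', 2)]
```

Filtering out headings already covered by the original keywords leaves a ranked list to review by hand, where a heading like Chess would be rejected and one like Serbo-Croatian language would suggest a new keyword.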

Prior to the breakup, Yugoslavia included six republics: Bosnia and Hercegovina, Slovenia, Croatia, Macedonia, Serbia, and Montenegro. Since 1992, Yugoslavia refers only to a federation of two republics: Serbia and Montenegro. As a result, it is difficult to distinguish between Bosnian materials and materials associated with one of the other five republics. The recall, which is the primary focus of this project, would be improved by adding the terms Serbs, Serbian, Croats, Croatian, Slovenes, and Slovenian. However, much of the additional material would relate to Yugoslavia in general rather than Bosnia in particular.

Personal name subject headings could also enhance the recall. The 12 most frequent personal name subject headings in the retrieved set were:

Tito, Josip Broz, 1892–1980
Mary, Blessed Virgin, Saint
Karadzic, Vuk Stefanovic, 1787–1864
Petar II, Prince Bishop of Montenegro, 1813–1851
Andric, Ivo, 1892–1975
Krleza, Miroslav, 1893–
Sava, Saint, 1169–1237
Mihailovic, Draza, 1893–1946
Marx, Karl, 1818–1883
Markovic, Svetozar, 1846–1875
Jesus Christ
Tesla, Nikola, 1856–1943

As can be observed, many of the frequent personal name subject headings would be useful in retrieving relevant material. Some frequently used headings, such as Jesus Christ, the Virgin Mary, and Karl Marx, rank high due to their high general use rather than any unique regional interest. Most of the others had strong regional ties.

A large number of relevant LC class numbers were not originally included in the criteria. Some lacked any regional specificity or included regional specificity only as part of the subject cuttering. However, many frequently occurring classes were identified that appeared relevant, including:

DR1214 History — Balkan Peninsula — General Works
HC407 Social Sciences — Economic History and Conditions — Europe — Balkan States — Yugoslavia
AS346 General Works — Academies and Learned Societies — Europe — Turkey and the Balkan States — Yugoslavia
DR301-396 History — Balkan Peninsula — Yugoslavia (Obsolete)

Adding further Library of Congress class numbers would significantly enhance the recall. As with the subject headings, it will be difficult to differentiate between Yugoslav and Bosnian materials.

As expected, analysis of the retrieved records indicated that revising the selection criteria will improve the recall. Manual review of the retrieved records should result in further refinement. Since the number of records is relatively small (five times as many would easily fit on a CD-ROM), the focus will remain on improving the recall. WorldCat will be searched again using the revised criteria to extract the records for the Bosnian catalog.

Adding Material from Other Sources

WorldCat contains one of the largest, if not the largest, single collections of Bosnian records in the world. It represents a large proportion of the Bosnian materials held by American libraries. However, American libraries, even collectively, would not have acquired everything held by the National and University Library of Bosnia and Herzegovina. It is expected that European libraries hold a large number of Bosnian materials that are not in WorldCat. The materials from European libraries and other libraries that are not OCLC members will need to be included in the effort to rebuild the catalog.

OCLC will accept machine-readable records from other libraries willing to participate in the rebuilding effort and add them to the collection of Bosnian materials found in WorldCat. OCLC has developed comprehensive record-matching algorithms [O’Neill 1990, 13–14] to merge bibliographic records from external sources with those from WorldCat. The software can determine reliably which of the new records duplicate existing records and which are for new bibliographic items. New bibliographic records and holdings symbols will be added to the database. Only the holding information from incoming matching records will be retained.
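The merge rule described above (add new records whole, but keep only the holdings from incoming records that match an existing one) can be sketched in Python as follows. The matching function here is a deliberately naive placeholder based on title and year; it is an assumption for illustration and does not represent the record-matching algorithms cited above. The dictionary layout and field names are likewise hypothetical.

```python
def naive_key(record):
    """Placeholder match key (title + year); NOT OCLC's matching algorithm."""
    return (record["title"].casefold().strip(), record.get("year"))


def merge_incoming(catalog, incoming, match_key=naive_key):
    """Merge incoming records into `catalog` (dict: key -> record with a holdings set).

    Matching records contribute only their holdings symbols;
    non-matching records are added as new bibliographic records.
    Returns the number of new records added.
    """
    added = 0
    for record in incoming:
        key = match_key(record)
        holdings = set(record.get("holdings", []))
        if key in catalog:
            # Duplicate: keep the existing description, retain only the holdings.
            catalog[key]["holdings"] |= holdings
        else:
            catalog[key] = {**record, "holdings": holdings}
            added += 1
    return added
```

Keeping the existing bibliographic description and merging only holdings avoids overwriting a record that has already been reviewed, while still recording which additional institutions hold the item.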

Future Plans

This project is in its early phase. OCLC’s contributions are but a small part of a global effort to assist in the rebuilding of the National and University Library of Bosnia and Herzegovina. Many of the resources lost in the fire resulting from the artillery attack were unique and will never be recovered. However, collectively, the collections of OCLC members combined with those of other libraries should contain most of the nonunique material lost in the fire. Building the catalog, a virtual collection of Bosnian materials, is a small but important step in the rebuilding of the National and University Library of Bosnia and Herzegovina.

References

OCLC Online Computer Library Center, Inc.
