**********************
6. Registration method

Experience with existing Internet search engines, which make it possible to search directly in large amounts of data, has shown that searches are often too inaccurate and generally produce far too many apparently random hits.

Consequently, our starting-point was that it should be possible for Internet publications to be included in the DanBib base, catalogued according to AACR2 in MARC format, thus maintaining our tradition for processing information from the other materials in DanBib.

The experience gained during the project has thus been incorporated in chapter 9: "Elektroniske materialer", in: Katalogiseringsregler og bibliografisk standard for danske biblioteker : omarbejdet til brug for onlinekataloger / udarbejdet af Katalogdatarådet for Statens Bibliotekstjeneste (prepared by the Catalogue Data Council for the National Library Authority). Chapter 9 was prepared in parallel with our project, and its editor, Annette Laursen, is one of the project participants. Chapter 9 was approved by the SBT on 1.7.1997. The resulting change to the danMARC 2 format was approved by the SBT on 15.7.1997.

Owing to the special "non-physical" and often dynamic nature of Internet publications, a number of factors and problems arise in connection with registration. The examples below reflect our choices so far.

One basic difficulty was establishing the sources of registration, and in particular the title. The most eye-catching title on the screen is often concealed in (or is itself) a GIF image, and how should we regard the formalised HTML title, which is obligatory in the HTML coding of the document? As shown in the examples below, we chose to base our decisions on the screen display: we regard the most obvious title as the obligatory title, and make an entry for the HTML title if it differs from this.
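The title rule described above can be sketched in Python. This is only an illustration under stated assumptions: the function names (html_title, title_notes) are hypothetical, the title seen on screen is supplied by the cataloguer, and the comparison ignores case and extra whitespace, which is a choice of this sketch rather than a rule from the cataloguing regulations.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def html_title(source):
    """Return the HTML title with runs of whitespace collapsed."""
    parser = TitleExtractor()
    parser.feed(source)
    parser.close()
    return " ".join("".join(parser.parts).split())


def title_notes(screen_title, source):
    """Return field 512 style notes for a record whose title is screen_title.

    The HTML title is recorded only if it differs from the title chosen
    from the screen display (hypothetical helper, not part of danMARC 2).
    """
    notes = ["Titel hentet fra titelskærmbillede"]
    found = html_title(source)
    chosen = " ".join(screen_title.split())
    if found and found.casefold() != chosen.casefold():
        notes.append("HTML-titel: " + found)
    return notes
```

When the two titles agree, only the note on the source of the title is produced; when they differ, a second note carries the HTML title.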

Another special difficulty during registration involves the problems connected with fixing the dates of complexes of documents. It is often necessary to study an underlying document in detail to find the latest date. If a publication has this complex nature we add a note (e.g. "Description based on version seen on 07.07.1997").
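A minimal Python sketch of the dating step follows. It assumes that the cataloguer has already collected the dates found in the individual documents of a complex; the function names are hypothetical, and the note wording is the English example given above.

```python
from datetime import date


def latest_date(document_dates):
    """Return the most recent date found among the documents of a complex."""
    return max(document_dates)


def description_basis_note(seen_on):
    """Build a note stating on which day the described version was seen."""
    return "Description based on version seen on " + seen_on.strftime("%d.%m.%Y")
```

For example, description_basis_note(date(1997, 7, 7)) yields the note quoted in the paragraph above.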

Here are some of the new fields used to ensure that indexing is complete:

259 File description

e.g.: Tekstdata (Text data)

856 Location information for electronic material

This field is designed for URLs and PURLs, and contains a great number of sub-fields (e.g. stating file size and access conditions).

e.g.: *uhttp://home.socialdemokratiet.dk/socdem/*z Består af en række filer under denne sti (Consists of a number of files in this path)
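To illustrate how the sub-fields in such a field 856 entry can be separated, here is a small Python sketch. It assumes that every sub-field starts with an asterisk followed by a one-character code, as in the example above, and that the data itself contains no literal asterisk; the function names are hypothetical.

```python
import re

# An asterisk, a one-character sub-field code, then everything up to the
# next asterisk. Assumption: the data itself contains no literal asterisk.
SUBFIELD = re.compile(r"\*([a-z0-9])([^*]*)")


def parse_subfields(content):
    """Split field content such as '*u...*z ...' into (code, value) pairs."""
    return [(code, value.strip()) for code, value in SUBFIELD.findall(content)]


def locations(content):
    """Return the values of all *u sub-fields (the URLs or PURLs)."""
    return [value for code, value in parse_subfields(content) if code == "u"]
```

Applied to the example above, parse_subfields returns the URL under code u and the Danish note under code z.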

A special note system had to be amended and expanded to describe publications. Here are the fields used, with examples of the formulations used:

501 Note about system requirements

- including the help programs needed to read and use publications (e.g. Acrobat Reader, QuickTime (a video player)).

e.g.: Tilgangsmåde: Internet via World Wide Web (Access method: Internet via World Wide Web)

512 Note on description

e.g.: Titel hentet fra titelskærmbillede (Title retrieved from title display)

Titel hentet fra HTML-kildeteksten (Title retrieved from HTML source text)

Titel hentet fra kolofon (Title retrieved from colophon)

HTML-titel: (anføres kun hvis den afviger fra den bibliografiske titel) (HTML title: (only stated if different from bibliographic title))

Ajourføres (Updated)

Ajourføres månedligt (Updated once a month)

Kan nedtages i følgende formater: Ren tekstversion pakket med WinZip (41 kB) og MS Word pakket med WinZip (308 kB) (Can be downloaded in following formats: Pure text version compressed using WinZip (41 kB) and MS Word compressed using WinZip (308 kB))

520 Note about bibliographic history of the work

e.g.: Findes også i trykt version med titel: (angives kun hvis den trykte versions titel afviger fra Internetpublikationens) (Also found in printed version with title: (only stated if the printed version's title differs from that of the Internet publication))

539 Note about basis of cataloguing

e.g.: Beskrivelse baseret på: Version 2.07. (Updated February 16, 1997) (Description based on: Version 2.07. (Updated February 16, 1997))

Beskrivelsen baseret på udgave som set 07.07.1997 (Description based on version as seen on 07.07.1997)

New terminology (particularly form codes) has been used:

e.g. databases, company homepages, organisation homepages, sounds, search engines, link connections.

Special factors applying to static publications

Static publications present the fewest problems because they can often be registered in parallel with publications in physically fixed form (books, articles, CD-ROMs etc.). This was practical, because many Internet publications are available as parallel versions, and it meant that we could re-use a good number of previous registrations (e.g. many of the Ministry of Research's publications).

Even though these and many other publications are often almost identical with their printed versions, we have decided to observe the normal procedure in the national bibliography. So each work in a new medium is catalogued as an independent version in its own item. Another option when both a printed version and an Internet version are present would be to refrain from registering the Internet version and use the information about its existence as a reference in a note when registering the printed version. We chose independent registration because there is often some uncertainty as to whether the printed version and the Internet version are identical. In future many Internet publications will no doubt be available in new versions, making reference in a note inaccurate. In addition, it will not be possible to search for and find the Internet publication/version concerned in the same way as for other materials.

However, we have not included independent registration of the range of different file formats for downloading. Independent registration would mean that we would have to download files in each individual format and open/decompress the files in order to register them independently. We felt that a text in several file formats could be compared with a (printed) work in various bindings; and such bindings are not described in independent bibliographic items.

Special factors applying to dynamic publications

These publications, which resemble loose-leaf documents that are constantly updated (or periodicals, depending on how each publication is defined), will generally be registered by open registration.

They present a range of special problems due to their changeable nature:

Changes in content are always being made - at worst the content may be replaced completely, and there is a considerable risk that important bibliographic details such as the title and copyright will be changed.

Such changes of form and/or content mean that there is a risk that the registration performed may be misleading or incorrect. Naturally, this risk increases as the number of descriptive elements in registration increases. But at the same time the number of descriptive elements also determines the extent of the coverage provided by the registration in question. So the problem is to find a balance so that registration can be (relatively) proof against future changes without becoming so brief and schematic that it is meaningless.

To solve these problems in terms of changes on the contents page, we have equipped our publications with an extensive note describing the contents (field 504), using a free formulation to identify important (and probably permanent) elements of the contents. It may seem tempting to use field 530, the note field that quotes the contents, since this field reflects the structure of publications accurately (if the headings are informative). However, this will often create difficulties when publications are supplemented (e.g. with an extra section inserted between the points quoted).

In addition, we have supplied each publication with a range of terminology applied according to the principles used for books and articles and supplemented with a number of new form codes for the new publication types (see above).

We have also allocated classification marks so Internet publications can be searched for just like other publications in DanBib.

Topics can often be described to ensure that they cover the content of the publication concerned reliably for a long period: content notes, terminology and classification marks can be designed with such flexibility that they still cover (to a certain extent) any changes made in publications.

Metadata in Internet publications

As mentioned above, many of the problems that Internet publications pose for traditional cataloguing and formatting have largely been allowed for in the recently revised cataloguing regulations and in danMARC 2.

The number of publications (and the volatile nature of many publications) also means that in addition to traditional cataloguing of selected publications for the national bibliography, new ideas are needed with a view to utilising all these net publications more efficiently.

One option is to ask the owner/supplier of each publication to help the process of registration by enriching his documents with metadata - i.e. by storing a limited amount of core information about each publication in the publication itself.

Internationally, the Dublin Core Element Set is an attempt to define the metadata needed to optimise access to net-borne publications.

Dublin Core originated in a concrete project at OCLC, but is now regarded as something of a "de facto" standard for the use of metadata.

Under the auspices of NORDINFO, the Nordic Metadata Project is carrying out a number of trials to develop facilities for creating and using metadata. For this purpose a form based on Dublin Core has been designed, with which producers of Internet publications can quickly and easily assign metadata in the correct HTML syntax.
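As an illustration of what such a form has to produce, the following Python sketch writes Dublin Core elements as HTML meta tags, using the common convention of a "DC." prefix in the name attribute. It is not the Nordic Metadata Project's form; the function name and the sample values are assumptions made for this example.

```python
from html import escape


def dc_meta_tags(metadata):
    """Render Dublin Core elements as HTML meta tags, one tag per value.

    metadata maps an element name (e.g. "Title") to a string or to a
    list of strings for repeated elements.
    """
    lines = []
    for element, values in metadata.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            lines.append(
                '<meta name="DC.{}" content="{}">'.format(
                    escape(element, quote=True), escape(value, quote=True)
                )
            )
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample values invented for illustration only.
    print(dc_meta_tags({
        "Title": "Eksempel på en Internetpublikation",
        "Creator": ["Forfatter A", "Forfatter B"],
        "Date": "1997-07-07",
    }))
```

Escaping the attribute values keeps the tags well-formed when a title contains quotation marks or ampersands.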

In connection with The Nordic Metadata Project, facilities will also be developed for the conversion of Dublin Core to MARC format.
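A conversion of this kind might look roughly like the Python sketch below. The mapping table is an illustrative assumption and not the project's actual conversion: it uses only field 856 sub-field u and the contents note in field 504, both mentioned above, plus field 245 sub-field a for the title. Elements without a mapping are returned separately so that nothing is silently lost.

```python
# Hypothetical mapping from Dublin Core element to (field, sub-field code).
DC_TO_MARC = {
    "Title": ("245", "a"),
    "Description": ("504", "a"),
    "Identifier": ("856", "u"),
}


def dc_to_marc(metadata):
    """Convert (element, value) pairs to (field, code, value) triples.

    Returns the converted fields sorted by field number, and the list of
    pairs for which the mapping table has no entry.
    """
    fields = []
    unmapped = []
    for element, value in metadata:
        target = DC_TO_MARC.get(element)
        if target is None:
            unmapped.append((element, value))
        else:
            field, code = target
            fields.append((field, code, value))
    fields.sort(key=lambda triple: triple[0])
    return fields, unmapped
```

A real conversion would need a mapping for every Dublin Core element and rules for repeated elements; this sketch shows only the general shape.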

In the Indoreg project this electronic form has been translated to allow for Danish conditions (http://purl.dk/metadata), and the plan is that it should be marketed to producers of net publications. Self-registration will be used in the future work involved in publications chosen for national bibliographic registration, and metadata will also help make net searches more accurate for publications which are not enriched.

**********************
Jørgen Nielsen (jgn@dbc.dk)  16/9 1997