DCV - Introduction to the Digital Resources

Introduction to the Digital Resources

Reliability, Readability, Operability

The text of a work may be attested in a printed edition, a manuscript or a digital edition. Different attestations of the same work may present variations of the text and different information about the text. A prominent quality of a "good attestation" is the reliability of the presented text.

Digital resources of a particular attestation consist either of digital images ("human readable digital resources") or of digital text files ("machine readable digital resources").

Digital images are scans or photographs of printed editions or manuscripts. Different digital images of the same attestation may vary in visual quality, but they are all intended to be read by humans. They can hardly be read by machines, i.e., the text of an attestation that is depicted in a digital image can hardly be extracted by computer programs. The criterion for the quality of digital images is readability for humans.

Digital text files are intended to be processed with the help of computer programs. Different digital text files of the same attestation may come in various types, and the criteria for their quality depend on the purposes for which they were compiled. A prominent quality, however, is certainly their operability by machines.

Digital text files contain (A) the text of an attestation of a work ("data") and (B) information about the text ("metadata"). The resources may vary according to (1) how closely the text of the file corresponds to the text of the digitized source, (2) the demarcation and structure of the contained metadata and (3) the kind of metadata contained.

Types of digital text files

The basic type consists of (1) the text of a particular attestation, (2) an indication of the digitized source and (3) page and line references to this source. The best known repository of such digital text files of Sanskrit works is the Göttingen Register of Electronic Texts in Indian Languages (GRETIL).
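The three components of the basic type can be illustrated with a minimal sketch. The file layout used here (a "source:" header line and a "// PAGE.LINE" reference at the end of each text line) is a hypothetical convention invented for this example and is not the actual format of any GRETIL file; the sample text, the source name and the function name are placeholders as well.

```python
import re

# Hypothetical layout (an assumption for illustration only): one "source:"
# header line, then text lines that end with a "// PAGE.LINE" reference.
SAMPLE = """\
source: Example Edition
first line of the text // 1.1
second line of the text // 1.2
"""

# Text first, then the page and line reference at the end of the line.
REF = re.compile(r"^(?P<text>.*?)\s*//\s*(?P<page>\d+)\.(?P<line>\d+)\s*$")


def split_basic_file(content):
    """Separate the text ("data") from source and references ("metadata")."""
    source = None
    records = []
    for raw in content.splitlines():
        if raw.startswith("source:"):
            source = raw[len("source:"):].strip()
            continue
        match = REF.match(raw)
        if match:
            records.append({
                "text": match.group("text"),
                "page": int(match.group("page")),
                "line": int(match.group("line")),
            })
    return source, records


source, records = split_basic_file(SAMPLE)
for record in records:
    print(record["page"], record["line"], record["text"])
```

A program can only perform this separation because it has been told the convention in advance; a human reader recognizes it without such help.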

An informed human reader of such basic digital text files can easily distinguish between the text of the attestation ("data") and information about the attestation ("metadata"), such as its source or the page and line references to it. In advanced digital text files this distinction is made explicit for machines by a demarcation of the metadata, mostly by the use of the "Extensible Markup Language" (XML). The metadata provided in such digital text files may range from the markup of simple editorial features (like text highlighted in bold) to the inclusion of additional information on various aspects of the text (such as notes of the editor, variants from other attestations, etc.). The Digital Corpus of Sanskrit (DCS) is an example of a project that operates with digital text files enriched by information on various linguistic aspects of the text.
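How such a demarcation lets a program tell data from metadata can be sketched as follows. The element and attribute names (text, line, hi, note, source, page, n) are invented for this illustration and do not reproduce the schema of any particular project; the sketch uses only Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: all element and attribute names are assumptions
# made for this example, not an existing schema.
SAMPLE = """\
<text source="Example Edition">
  <line page="1" n="1">first <hi rend="bold">line</hi> of the text<note>editor's note</note></line>
  <line page="1" n="2">second line of the text</line>
</text>
"""


def plain_text(elem, skip=("note",)):
    """Collect the text of an element, leaving out elements listed in skip."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in skip:
            parts.append(plain_text(child, skip))
        # The tail is the text that follows the child inside the parent.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
lines = [
    {
        "page": int(line.get("page")),
        "line": int(line.get("n")),
        "text": plain_text(line),
    }
    for line in root.iter("line")
]
notes = [note.text for note in root.iter("note")]
```

Because the editor's note and the bold highlighting are marked up explicitly, the same file can yield either the bare text of the attestation or the text together with its editorial information.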

digitized texts of printed editions for search,

human/machine readable digital resources,

resource file, derivations

resource - tag view,

derivations - full: browser, simple: GRETIL-like, plain: search;

the full view in turn has derivations of its own, namely the extracts etc.