A great deal of thought has been put into the ACIP file nomenclature and catalog number conventions. The Project has had to plan ahead to a future where there may be many tens of thousands of files in the database, each at its own verification level and including different editions of the same texts, yet all designated with an easily identifiable, unique file name and catalog number.

The basis of ACIP text file names is the Tohoku University catalogs of the Kangyur, Tengyur, and native Tibetan works. Whenever a work has already been assigned a number in one of these catalogs, this becomes the basis of the file name. When a work has not been assigned such a number (normally for a text in the Sungbum Collection), it receives the next available ACIP number. No attempt is made to ensure that even successive titles in the collected works of a single author receive successive catalog numbers, since this becomes a moot point once we are released from the limitations of a printed paper catalog. All the catalog entries are loaded into the AsiaView catalog database, and can be sorted on author with a single click of the mouse button. The "Volume/section" field in the database then tells you where the work was originally located within a particular volume, even if the volume was not sequentially paginated.

Technically speaking, ACIP file nomenclature can be divided into "file name" (the part of the entire file identifier that appears before the period) and "extension" (the part appearing after).

ACIP File Names

The core of an ACIP file name is a 4-digit number, either from the Tohoku catalog or, where the work is not in this catalog, the ACIP catalog number. Since Tohoku numbers for the Kangyur and Tengyur Collections go from 1 to 4569, these numbers can never be confused with a Tohoku Sungbum work, and so ACIP will also be using these numbers for other collections. For example, there will be separate works under K0001, S0001, and R0001, even though only the first refers to a Tohoku number.
The very first position, and often the second position, of the file name is a letter, which at the present time represents one of the following:

G = Image from the ACIP Graphics Collection
GS = Separate series of "G" numbers for the seals that appear
GSP = St. Petersburg seals in PCX graphics format
GM = Graphics file in Macintosh format
GP = Graphics file in PCX graphics format
GT = Graphics file in TIF graphics format
K = Work from the Kangyur Collection
KD = Work from the Derge edition of the Kangyur; please note that, at the beginning of the project, we sometimes utilized the "Delhi" version of the Derge edition of the Kangyur published by His Holiness the Karmapa and included in the Library of Congress PL480 Program; this was later found to contain many errors introduced during the printing process, and we later moved to the Lhasa edition.
KL = Work from the Lhasa edition of the Kangyur
KX = Work from the Kangyur that has been published by itself as a separate text (such as a local monastery's edition of the Diamond-Cutter Sutra); edition information is then included in the "Notes" field of the ACIP Master Catalog entry
R = Work from the Reference Materials Collection (including Sanskrit Study Tools)
S = Work from the Sungbum Collection
S followed by any other letter = Specific edition or typing of a work from the Sungbum, as explained further in the "Notes" field in the ACIP Master Catalog entry for the particular work
T = Work from the Tengyur Collection
TD = Work from the Derge edition of the Tengyur; please note that, at the beginning of the project, we utilized the "Delhi" version of the Derge edition of the Tengyur published by His Holiness the Karmapa and included in the Library of Congress PL480 Program; this was later found to contain many errors introduced in the printing process, and we later obtained an original Derge blockprint for input.
TS = Work from the Serdri edition of the Tengyur

Thus the initial letter indicates the primary collection in which a work is found. Please note, though, that some works may occur in more than one collection; for example, the gsan-yig or "record of teachings received" for any particular Lama will be given an initial "S" letter since it appears as a part of his or her collected works, but may still also appear in the "Reference" collection, still with the initial "S", due to its unique value as a reference work. Please note also that native catalogs which are printed as an attached volume in any particular publication of the Kangyur or Tengyur are given "K" and "T" initial letters, but only to indicate their source, and not to identify them as an actual work within these collections.

After these one or two letters come the four digits of the core catalog number. For texts, this is then normally followed by a "status" letter, which gives a quick indication of the editing or verification level of the text.
These status levels, at present, are:

A = First typing of a text
B = Second typing of the same text
C = Result of first automated comparison of the "A" and "B" typings
D = Result of second automated comparison of the "A" and "B" typings
E = Result of third automated comparison, of the "C" and "D" comparisons
F = Expert manual proofreading, normally for hardcopy publication
G, H = Text converted to various older, proprietary Tibetan-script formats for publishing
I with an "INC" extension = Text is incomplete, in the sense of lacking 4 or more lines, but the complete part is up to "E" level
I with an "ACT" or similar extension = "E" level text with 3 or fewer lines missing in the entire text
L = "E" level text which has gone through automated page and line statistical checks successfully, and has no incomplete sections at all
M = "L" level text that has passed through automated updates and checks for common typing errors
N = "F" level text that has gone through the same checks as an "M" level text

These status letters can sometimes be followed by another number or letter. Numbers indicate multiple volumes, or sections of a very large work. When these numbers reach the number of digits that would make the file name longer than eight positions (many of our users are still working under operating systems that cannot handle longer file names), we begin to use letters to represent tens: for example, S5977MA1 is the 101st title in a compendium that has been verified to an "M" level, and which was only given a single catalog number by the authors of the Tohoku catalog.

When the letter "P" appears at the very end of a file name, this indicates that we have purposely only typed in a portion of the text, such as the catalog only in a long work which contains both a catalog and an extended historical discussion.
This is to be distinguished from an "incomplete" text, which we are normally hurrying to complete, but have released as incomplete so that scholars can utilize the amount of data already finished.

ACIP File Extensions

The following extensions may be found after the dot in the ACIP file nomenclature:

ACE = English-language text approved for release by Asian Classics Input Project
ACM = Text with mixed languages, and approved for release by ACIP
ACS = Sanskrit-language text approved for release by ACIP
ACT = Tibetan-language text approved for release by ACIP
INC = Text lacking 4 or more lines
GIF = Graphics file in GIF format
MAC = Graphics file in Macintosh format
PCX = Graphics file in PCX format
PDF = File in a format readable by Acrobat viewer
RAW = A now obsolete extension once used to represent texts released by popular demand but not yet manually proofread
RTF = Text (most often Tibetan-letter text) in RTF format
TIF = Graphics file in TIF format

ACIP Catalog Numbers

This is how a text is called up in the AsiaView program. The title of the text appears in a menu list, followed by its catalog number in parentheses. This catalog number is distinguished from the full file name in that it only identifies the text, and does not give additional information as to verification level, edition version, language, and so on. The AsiaView program then goes to the ACIP Master Catalog, obtains the file name that is currently listed for that catalog entry (which is the one with the highest verification level), and then calls it up for viewing or searching.
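The file-name layout described in the sections above (collection letters, four-digit core number, status letter, optional section suffix, optional trailing "P", and extension) can be illustrated with a small parser. This is only a sketch built from the description on this page, not an official ACIP grammar: the function and field names are hypothetical, the regular expression assumes that a lettered section suffix is exactly one letter followed by one digit, and the rule that letters after "A" continue as 11 tens, 12 tens, and so on is an extrapolation from the single example S5977MA1.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Status letters and extensions as listed on this page.
STATUS_LETTERS = "ABCDEFGHILMN"
EXTENSIONS = {"ACE", "ACM", "ACS", "ACT", "INC", "GIF",
              "MAC", "PCX", "PDF", "RAW", "RTF", "TIF"}

# Assumed layout: 1-3 collection letters, 4 digits, optional status letter,
# optional section suffix (digits, or one letter plus one digit),
# optional trailing "P" for a purposely partial text, then ".EXT".
FILE_RE = re.compile(
    r"^(?P<prefix>[A-Z]{1,3})(?P<core>\d{4})"
    r"(?P<status>[" + STATUS_LETTERS + r"])?"
    r"(?P<suffix>[A-Z]\d|\d+)?"
    r"(?P<partial>P)?"
    r"\.(?P<ext>[A-Z]{3})$"
)

@dataclass
class AcipFileName:
    prefix: str             # collection/edition letters, e.g. "S" or "KL"
    core: str               # four-digit Tohoku or ACIP number
    status: Optional[str]   # verification level letter, if present
    section: Optional[int]  # volume/section number, if present
    partial: bool           # True when the name ends in "P"
    extension: str

def decode_section(suffix: Optional[str]) -> Optional[int]:
    """Turn a section suffix into a number; 'A1' -> 101 as in the text."""
    if not suffix:
        return None
    if suffix[0].isdigit():
        return int(suffix)
    # Assumption: "A" stands for 10 tens, "B" for 11 tens, and so on.
    tens = ord(suffix[0]) - ord("A") + 10
    return tens * 10 + int(suffix[1])

def parse_file_name(full_name: str) -> AcipFileName:
    match = FILE_RE.match(full_name.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised ACIP file name: {full_name!r}")
    name_part = full_name.strip().split(".")[0]
    if len(name_part) > 8:
        raise ValueError("file name part is longer than eight positions")
    if match["ext"] not in EXTENSIONS:
        raise ValueError(f"unknown extension: {match['ext']}")
    return AcipFileName(
        prefix=match["prefix"],
        core=match["core"],
        status=match["status"],
        section=decode_section(match["suffix"]),
        partial=match["partial"] is not None,
        extension=match["ext"],
    )
```

For instance, `parse_file_name("S5977MA1.ACT")` yields collection "S", core number "5977", status "M", and section 101. The ".ACT" extension in that call is chosen only for illustration, since the page gives the file name without one.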
As a final note, there are currently three special marks that may follow the catalog number; these have the following meanings:

# (number sign) = Text is copyrighted, and not available for public release; it has been input for in-house use only
^ (caret) = All of the text, or incomplete parts of it, are currently on order and can be expected soon
* (asterisk) = Text is considered restricted by tradition and will be released only to qualified individuals upon written request

The catalog number of a text will not change over the years, but the file name may change, as the text moves up in verification levels. Catalog numbers at present only give the initial "Collection" letter(s), the core Tohoku or ACIP 4-digit ID number, and then
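The three special marks described above lend themselves to a simple lookup. The sketch below separates any trailing marks from a catalog number and reports what they mean; the function name and the sample catalog number in the usage note are hypothetical, and whether more than one mark can follow the same number is an assumption, since the page does not say.

```python
# Meanings of the special marks, paraphrased from the list above.
MARK_MEANINGS = {
    "#": "copyrighted; input for in-house use only",
    "^": "on order; expected soon",
    "*": "restricted by tradition; released on written request",
}

def split_catalog_marks(entry: str) -> tuple[str, list[str]]:
    """Split a catalog entry into its number and the meanings of any marks."""
    entry = entry.strip()
    # Remove every trailing mark character to isolate the catalog number.
    number = entry.rstrip("".join(MARK_MEANINGS))
    marks = entry[len(number):]
    return number, [MARK_MEANINGS[mark] for mark in marks]
```

A call such as `split_catalog_marks("S0001*")` would return the bare number together with the "restricted by tradition" meaning, while an entry with no marks returns an empty list.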