The text then goes to the first automated verification operator. ACIP has developed a special comparison program that reads through the two initial typings simultaneously and alerts the operator whenever a letter position in the two versions does not match. The verification operator chooses one of the two versions, or else types in a manual correction, and the result is written to a third file, which becomes the first "compared" file. A second automated verification operator then repeats this process on the same two initial typings, producing a second "compared" file. Finally, a fifth operator compares the two "compared" files, producing a final overseas version which is, in theory, a completely true representation of the original Tibetan text.

What if the original text itself has mistakes? ACIP has a strict policy that input operators and automated verification staff are not allowed to correct even what appears to be an obvious error in the original text. This is because many such "obvious" errors may not be errors at all, but simply relatively rare words. Operators are encouraged, though, to mark the text to alert proofreaders at a later stage that there may be an error and that the reading should be checked more carefully.

Just inputting what is written in the original texts, without attempting any editing or corrections, will require some 150 to 200 years for the single tradition we have focused upon: the lineage of great ideas from ancient India through the Dalai Lamas, and down to the present day. The Project does not have the staff or financial resources to undertake substantial editing work at this time, and we seek only to "capture" the texts as they now appear in the wood-block editions.
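The position-by-position comparison of two typings described above can be sketched roughly as follows. This is only an illustration, not ACIP's actual comparison program: the function names, the `resolve` callback (standing in for the verification operator's choice), and the sample strings are all assumptions, and the sketch does not try to resynchronize after a dropped or extra letter, which a real tool would need to handle.

```python
from itertools import zip_longest


def find_mismatches(typing_a, typing_b):
    """Yield (position, char_a, char_b) wherever the two typings differ."""
    pairs = zip_longest(typing_a, typing_b, fillvalue="")
    for pos, (a, b) in enumerate(pairs):
        if a != b:
            yield pos, a, b


def compare(typing_a, typing_b, resolve):
    """Build a "compared" text from two typings.

    resolve(a, b) is a hypothetical stand-in for the operator, who picks
    one of the two readings or supplies a manual correction.
    """
    out = []
    for a, b in zip_longest(typing_a, typing_b, fillvalue=""):
        out.append(a if a == b else resolve(a, b))
    return "".join(out)


# Illustrative sample data only.
first_typing = "BCOM LDAN 'DAS"
second_typing = "BCOM LDEN 'DAS"

print(list(find_mismatches(first_typing, second_typing)))  # [(7, 'A', 'E')]

# Here the "operator" always keeps the first typing's reading.
print(compare(first_typing, second_typing, lambda a, b: a))
```

Running `compare` twice with independent operators and then comparing the two outputs with `find_mismatches` would mirror the three comparison passes described in the text.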
Unless the Project is endowed at a substantial level in the near future, this final editing work (which, to be done with total accuracy, requires a scholar possessing two to three decades of intense training) will be left to future generations.

As ACIP staff use a text, however, we come across readings that we judge to be surely or possibly mistaken. In these cases, the original reading is left untouched, and the suggested correction is placed next to it within square brackets; if the reading is ambiguous and the suggested correction is not certain, we follow it with a question mark within the closing bracket. Whenever the text itself is torn, faded, or badly smudged, the input operator makes a best attempt at interpreting it and then marks the area with a question mark placed in brackets. Please note that the AsiaView program has a setting in its search screens (called "Text only", as opposed to "Formatted text") which tells the program to ignore the material in brackets, so that a string of syllables is not interrupted and missed by the search program.

In the near future, ACIP intends to reverse this convention, and to put "certain" corrections in the text, followed by the original, mistaken reading in square brackets, preceded by a < sign. This will signify that the corrected reading has replaced the original. It is extremely risky even for very qualified, native scholars to attempt to correct the wide range of variant readings that appear in Tibetan texts, covering as they do many centuries of literature, styles, and spellings. Again, we see our primary task simply as capturing the current state of each text.

Once the text has gone through the dual-entry and triple-comparison process described above, it is normally shipped to the US headquarters of the Project for further statistical and spelling verification, and concatenation into discrete texts.
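The bracket conventions described above, and the reason a "Text only" search ignores bracketed material, can be illustrated with a short sketch. It is not AsiaView's implementation: the function names and sample line are hypothetical, and it assumes that a bracketed note follows the original reading directly and that brackets are never nested.

```python
import re

# Matches one bracketed annotation such as [RGYAS], [RGYAS?] or [?].
BRACKETED = re.compile(r"\[[^\[\]]*\]")


def text_only(line):
    """Drop bracketed annotations so syllable strings are not interrupted."""
    return BRACKETED.sub("", line)


def annotations(line):
    """Return (suggested_reading, is_uncertain) for each bracketed note."""
    notes = []
    for match in BRACKETED.finditer(line):
        body = match.group(0)[1:-1]  # strip the surrounding brackets
        notes.append((body.rstrip("?"), body.endswith("?")))
    return notes


# Illustrative sample: original reading kept, suggested correction bracketed.
line = "SANGS RGYES[RGYAS] DANG"

print("RGYES DANG" in line)             # False: the bracket interrupts the string
print("RGYES DANG" in text_only(line))  # True
print(annotations(line))                # [('RGYAS', False)]
```

A bare `[?]` marking a damaged passage comes back from `annotations` as an empty suggestion flagged as uncertain.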
Special programs check the page and line information, run statistical average analyses of line lengths, perform basic spell-checking, and check for common entry errors. Any problems found are then corrected manually, with replacement pages ordered from the overseas entry centers if needed.

The extremely erratic electrical supply at most ACIP refugee input centers (many of the centers run off diesel engines for all or part of the day) means that "brownouts" are common. During a brownout, computer data is often corrupted without the operators being aware of it. The final statistical verification programs therefore also check for "illegal" or non-standard characters for Tibetan texts; any that are found are noted and the affected material is re-ordered. Since the remaining, uncorrupted text (which could be, for example, 999 lines out of a thousand-line text) can still be of immense research value to scholars, ACIP does release such files, clearly marked "incomplete," on its various releases.

When a particular text is needed immediately for use by native Tibetan monasteries, translation projects, and similar institutions, it is put through a manual proofreading process, usually by one of a very small handful of truly qualified senior Lamas available. The proofread text is then upgraded to a higher verification level.
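As a rough illustration of the kind of statistical checks mentioned above, the sketch below flags lines whose length is far from the average and lists characters outside a permitted set. It does not reproduce ACIP's programs: the function names, the tolerance of two standard deviations, and the `legal` character set in the example are assumptions chosen only for demonstration.

```python
from statistics import mean, pstdev


def length_outliers(lines, tolerance=2.0):
    """Return 1-based numbers of lines whose length deviates from the
    average by more than `tolerance` standard deviations."""
    if not lines:
        return []
    lengths = [len(line) for line in lines]
    avg, spread = mean(lengths), pstdev(lengths)
    if spread == 0:
        return []
    return [n for n, length in enumerate(lengths, 1)
            if abs(length - avg) > tolerance * spread]


def illegal_chars(lines, legal):
    """Return (line_number, character) for every character not in `legal`."""
    return [(n, ch)
            for n, line in enumerate(lines, 1)
            for ch in line
            if ch not in legal]


# Hypothetical permitted set: capital letters, space, and apostrophe.
legal = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ '")

page = ["A" * 40] * 9 + ["A" * 5]
print(length_outliers(page))                          # [10]
print(illegal_chars(["KA KHA", "GA \x00NGA"], legal))  # [(2, '\x00')]
```

A stray control character like the one in the second example is the sort of corruption a brownout might leave behind; the flagged line numbers would tell staff which pages to request again.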