First 1000 Years of Greek

XML files for the works in the First Thousand Years of Greek Project

This project is maintained by OpenGreekAndLatin

Welcome to the First One-Thousand Years of Greek Project

The goal of this project is to collect at least one edition of every Greek work composed between Homer and 250 CE, with a focus on texts that do not already exist in the Perseus Digital Library. For example, neither Thucydides nor the text of the New Testament is here, because both are already in Perseus (http://www.perseus.tufts.edu/hopper/). The TEI XML versions of the Perseus Greek texts (c. 10 million words) are available at https://github.com/PerseusDL/canonical-greekLit, where they are being revised (upgraded to EpiDoc-compliant P5 TEI XML) and reorganized to be more readily CTS-compliant. This project has been generously funded by the Harvard Library Arcadia Fund and produced in international cooperation with the Center for Hellenic Studies, the Harvard Library, Mount Allison University, Tufts University, the University of Leipzig, and the University of Virginia.

All the works in the repository for which we have added metadata are listed below, organized by author, with links to the individual files. Note that all of these files are 100% CTS-compliant. If you see any problems with this list, please open an issue on the main repository page. At this time, the repository contains 18,425,590 words of CTS-compliant texts, primarily in Greek, with c. 4 million words currently being corrected and converted to EpiDoc-compliant TEI XML. When these remaining texts and the Perseus collection are added, the amount of CC-licensed TEI XML Greek available on GitHub will exceed 30 million words.

The list below also includes the unique identifiers that we use for every author, work, and edition. We use standard identifiers to name our texts, including references to the numbers adopted by the canons of the TLG and (for Latin) PHI. The final element in the URN identifies the edition. See the TEI headers of the individual files to find all information about the origin of the file.
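To make the identifier structure concrete, here is a minimal Python sketch that splits a CTS URN into its parts. It assumes the usual `urn:cts:<namespace>:<textgroup>.<work>.<version>` layout; the function name `parse_cts_urn`, the `CtsUrn` type, and the sample URN are illustrative assumptions and do not refer to a specific file in this repository.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    """Components of a CTS URN (hypothetical helper type for illustration)."""
    namespace: str
    textgroup: str
    work: Optional[str]
    version: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into namespace, text group (author), work, and version.

    Assumes the form urn:cts:<namespace>:<textgroup>.<work>.<version>,
    where the work and version parts may be absent. Any passage reference
    after a further colon is ignored.
    """
    parts = urn.split(":")
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace = parts[2]
    work_parts = parts[3].split(".")
    textgroup = work_parts[0]
    work = work_parts[1] if len(work_parts) > 1 else None
    # The final element identifies the edition, translation, or commentary.
    version = work_parts[2] if len(work_parts) > 2 else None
    return CtsUrn(namespace, textgroup, work, version)


# Made-up URN, used only to show the shape of the identifier.
example = parse_cts_urn("urn:cts:greekLit:tlg0000.tlg001.example-grc1")
print(example.textgroup, example.work, example.version)
```

Here the text group and work elements carry the TLG-style numbers, and the last element names the edition.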

How to download individual files from this page
The small table under the description of each edition, translation, or commentary contains three different links. The first, Read Online, is a link to this text in the CTS reading environment at the University of Leipzig, Germany. The second, XML Source, is a link back to the text in the GitHub repository. The third, Annotated, leads to an XML file that contains a tokenized, lemmatized, and POS-tagged text of this work. Clicking on either the second or the third link will take you to the file in the appropriate repository. If you wish to download the TEI XML of the file, you should instead right-click on the link and choose "Save link as..."
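For scripted downloads, the following Python sketch shows one possible approach. It assumes the XML Source link is an ordinary GitHub `blob` page URL and rewrites it to the matching `raw.githubusercontent.com` address. The function names and the sample owner, repository, and path are hypothetical, and the rewrite assumes a branch name without slashes.

```python
from urllib.parse import urlparse
from urllib.request import urlretrieve


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com 'blob' page URL to its raw-file URL.

    Assumes the path looks like /<owner>/<repo>/blob/<branch>/<file path>
    and that the branch name contains no slashes.
    """
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url!r}")
    owner, repo, _, branch = parts[:4]
    file_path = "/".join(parts[4:])
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{file_path}"


def download_xml(blob_url: str, destination: str) -> None:
    """Save the raw XML behind a blob URL to a local file (needs network access)."""
    urlretrieve(github_blob_to_raw(blob_url), destination)


# Hypothetical link, used only to show the rewrite.
sample = "https://github.com/example-owner/example-repo/blob/master/data/example.xml"
print(github_blob_to_raw(sample))
```

Calling `download_xml(sample, "example.xml")` would then save the file locally, which is equivalent to using "Save link as..." on the raw file.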

Word Counts by Source:
University of Leipzig: 12,209,736
A Digital Corpus for Graeco-Arabic Studies: 3,204,490
Perseus Project: 1,561,008
Harvard College Library: 1,499,119
Coptic SCRIPTORIUM: 1,241
Unknown: 828

Word Counts by Funder:
European Social Fund Saxony: 12,129,750
Andrew W. Mellon Foundation: 3,208,109
Harvard Library Arcadia Fund: 1,230,624
A Google Digital Humanities Award: 1,023,878
Unknown: 787,529
Google Digital Humanities Award: 54,004
Tufts University: 42,528