Corpws Hanesyddol yr Iaith Gymraeg 1500-1850 / Historical Corpus of the Welsh Language 1500-1850. Around 500,000 words of Welsh from the sixteenth to nineteenth centuries.
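Text of the Middle Breton Life of St Barbara (Buhez Sante Barba), from Le mystère de sainte Barbe, tragédie bretonne, texte de 1557, publié avec traduction française, introduction et dictionnaire étymologique du breton moyen [The Mystery of St Barbara, a Breton tragedy: the 1557 text, published with a French translation, an introduction and an etymological dictionary of Middle Breton], ed. Émile Ernault (Paris: Ernest Thorin, 1888). Plain-text file.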
Icelandic Parsed Historical Corpus (IcePaHC). Currently a preview of around 31,000 words from the twelfth and the nineteenth centuries, parsed using a system compatible with CorpusSearch. It ultimately aims to contain one million words from the twelfth to the nineteenth centuries.
*Corpus of Middle English Prose and Verse. Allows lexical and collocational searches of 146 Middle English texts via an online search interface.
*Penn Parsed Corpora of Historical English. Three parsed corpora searchable with CorpusSearch:
PPCME2 = Penn-Helsinki Parsed Corpus of Middle English, second edition. 55 text samples (1.2 million words, 1150-1500), based largely on the Middle English section of the Helsinki Corpus.
PPCEME = Penn-Helsinki Parsed Corpus of Early Modern English. 1.7 million words (1500-1710), including 570,000 words from the Early Modern English section of the Helsinki Corpus.
PPCMBE = Penn Parsed Corpus of Modern British English. Just under 1 million words of British English (1700-1914).
*Parsed Corpus of Early English Correspondence. 2.2 million words from 4,970 letters written c. 1410-1695, parsed and searchable with CorpusSearch. Available free of charge from the Oxford Text Archive, subject to approval.
*York-Toronto-Helsinki Parsed Corpus of Old English Prose. A 1.5-million-word syntactically annotated corpus of Old English prose texts.
Corpus of Late Modern English Texts (extended version). 15 million words of unparsed British English from 176 texts dating from 1710 to 1920.
*All the corpora marked with an asterisk have two separate annotation layers, part-of-speech tagging (.pos files) and syntactic annotation (.psd files), and can be queried using CorpusSearch.
The MCVF (Modéliser le changement: les voies du français, "Modelling change: the paths of French") corpus. A parsed corpus of Old, Middle and Modern French up to the end of the eighteenth century, in principle downloadable from the project website and searchable using CorpusSearch.
Tycho Brahe Parsed Corpus of Historical Portuguese. 52 Portuguese texts (2.4 million words) by authors born between 1380 and 1845. Only some texts are annotated: 28 have part-of-speech tagging and 8 have syntactic annotation.
Croatian Language Online Repository (Institute of Croatian Language and Linguistics)
Izbornyk (electronic library of Ukrainian literature, including a number of Old East Slavic texts)
If any of these links don't work, please let me know.