This page is very much under construction. All of the corpora linked here need further investigation to determine which purposes each one is useful for. Some may not be publicly accessible without a licensing fee.
Written Japanese corpora
- Querying Internet corpora, Leeds University (includes Japanese)
- BCCWJ: Balanced Corpus of Contemporary Written Japanese (KOTONOHA)
- Tanaka Corpus of example sentences, as seen on WWWJDIC
  - Warning: These examples were constructed by students; they may contain errors and may not reflect the statistical patterns of spontaneous language use.
Spoken Japanese corpora