This page is very much under construction. All of the corpora linked here need further investigation to determine how useful they are and for which purposes. Some may not be publicly accessible without a licensing fee.
Written Japanese corpora
- Querying Internet corpora, Leeds University (includes Japanese)
- BCCWJ: Balanced Corpus of Contemporary Written Japanese (KOTONOHA)
- Tanaka Corpus of example sentences, as seen on JDIC
  - Warning: These examples were constructed by students; they may contain errors and may not reflect the statistical patterns of spontaneous language use.
Spoken Japanese corpora
- Hypermedia Corpus of Spoken Japanese (audio and video) [March 2012: Broken link?]
- Japanese Speech Corpora of Major City Dialects
- CSJ: Corpus of Spontaneous Japanese (can this corpus actually be accessed?)
- List of speech/acoustic databases in Japan