Datasets

List of Datasets

Dataset of Pre-Modern Japanese Text

Image and text data of pre-modern Japanese texts held by the National Institute of Japanese Literature are released as open data. In addition, some texts come with description, transcription, and tagging data.

Kuzushiji Dataset

As a by-product of transcribing the Dataset of Pre-Modern Japanese Text (PMJT), the shapes and coordinates of old Japanese characters (Kuzushiji) were compiled into a separate dataset that can be used to train both machines and humans.

Dataset of Edo Cooking Recipes

Cookbooks from the Edo period included in the Dataset of Pre-Modern Japanese Text were curated to create recipe datasets through transcription, translation into modern Japanese, and structuring into a recipe format.

Bukan Complete Collection

The project aims to comprehensively analyze the collection of "Bukan" books, which were best sellers throughout the 200 years of the Edo period, and to construct a core information platform on the Edo period covering human and geospatial information about Daimyo (lords) and the Shogunate government.

Collection of Facial Expressions

The project aims to build research infrastructure for art history by collecting facial expressions for the comparative study of style from Japanese Emaki (illustrated scrolls), and potentially from works of art across the globe.

Dataset of Modern Magazines

Modern magazines are digitized and released as image datasets. The n2i project is constructing a dataset of modern documents in order to develop OCR for such documents.

Geoshape Repository

The Geoshape Repository is a data repository that releases the geometry of geographic features. It includes the "Historical Administrative Boundaries Dataset Beta Version," which shows how administrative boundaries have changed since 1920.