"Ehon Musierami" Pre-Modern Japanese Text Dataset (NIJL)

The ROIS-DS Center for Open Data in the Humanities (CODH) is developing humanities research in the era of open science. Our work includes "data-driven humanities," which analyzes humanities resources using state-of-the-art technology from computer science and statistics, and "big data in the humanities," which uses datasets created from humanities research in a trans-disciplinary context.

Important News

November 11, 2019
[Call for Participation] Japanese Culture and AI Symposium 2019 - AI for Reading Kuzushiji is Now Ready!

May 1, 2019
Each character in Kuzushiji Dataset - List of Characters can now be viewed in its original book. For example, by clicking one of the characters in "," you will be redirected to the original book in the Kuzushiji Recognition Viewer, with the character's bounding box drawn in blue. You can check the original context of a character before it was cropped. In addition, the viewer supports single-character kuzushiji recognition, so you can try kuzushiji recognition using AI (deep learning / machine learning).

List of Datasets

Dataset of Pre-Modern Japanese Text

Image and text data of pre-modern Japanese texts owned by the National Institute of Japanese Literature are released as open data. In addition, some texts have description, transcription, and tagging data.

Dataset of Edo Cooking Recipes

Cookbooks from the Edo period that are included in the Dataset of Pre-Modern Japanese Text were curated into recipe datasets through transcription, translation into modern Japanese, and structuring into a recipe format.

Kuzushiji Dataset

As a by-product of transcribing the Dataset of Pre-Modern Japanese Text (PMJT), the shapes and coordinates of old Japanese characters (kuzushiji) were compiled into another dataset for training, to make both machines and humans smarter.

KMNIST Dataset

Adapted from the Kuzushiji Dataset, the KMNIST dataset is a drop-in replacement for the MNIST dataset. We provide three types of datasets for different purposes: Kuzushiji-MNIST, Kuzushiji-49, and Kuzushiji-Kanji.

Collection of Facial Expressions

The project aims to build research infrastructure for art history by collecting facial expressions for the comparative study of styles, from Japanese Emaki (illustrated scrolls) and potentially from works of art across the globe.

Dataset of Modern Magazines

Modern magazines are digitized and released as image datasets. The n2i project is constructing a dataset of modern documents in order to develop OCR for those documents.

Geoshape Repository

Geoshape Repository is a data repository that releases the geometry of geographic features. It includes the "Historical Municipal Boundaries Dataset Beta Version," covering the historical change of municipal boundaries since 1920, and the "Village Boundaries Dataset" of 2015.

List of Projects

Historical Big Data

Historical Big Data is a project on the seamless analysis of the environment and society, past and present, based on various records written by humans.

Kuzushiji Challenge!

Old books of the Edo period were written in kuzushiji (old Japanese characters), but most modern Japanese people can no longer read those characters. Can AI (artificial intelligence) read kuzushiji, then? We release a large-scale machine learning dataset, the "Kuzushiji Dataset," to the world and promote the research and development of AI kuzushiji recognition (OCR), so that we can tackle the grand challenge of analyzing a thousand years of Japanese literate culture.

Edo+150 Projects

On November 9, 1867, the restoration of imperial rule symbolized the end of the Edo period. 150 years have passed since then, and now is the time to revive the information space of Edo, using open data about the 260 years of the Edo period and taking advantage of state-of-the-art technology such as artificial intelligence (AI).

Bukan Complete Collection

The project aims to comprehensively analyze the collection of "Bukan" books, best sellers throughout 200 years of the Edo period, and to construct a core information platform about the Edo period in terms of human and geospatial information on Daimyo (lords) and the Shogunate government.

North China Railway Archive

A research database on the North China Railway Company that links the company's promotional stock photographs with its transportation network, for studying the company's activities from the theme and location of the photographs.

Digital Silk Road

A digital humanities research project that creates digital archives of cultural heritage through collaboration between informatics and the humanities.

Memory Platform / Memorygraph

Memorygraph is a new photographic technique for creating layers of memories. The project aims to develop the Memorygraph app for use in cultural heritage fieldwork, tourism, and disaster recovery.


A project that aims at integrating geographic information science (GIS) and natural language processing (NLP) to develop a geo-tagging system that transforms text to maps automatically.

List of Software

IIIF-based Image Delivery and Case Studies

The use of IIIF (International Image Interoperability Framework) for image delivery in large-scale image databases ranging from the humanities to the natural sciences, with the long-term goal of contributing to international communities.

IIIF Curation Platform

Focusing on the concept of "curation," we build a next generation IIIF platform that is open and user-driven.

IIIF Curation Viewer

An open-source IIIF image viewer that takes advantage of IIIF Image API and IIIF Presentation API, and proposes and implements new specifications such as Curation API, Timeline API and Cursor API.
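For readers unfamiliar with the IIIF Image API mentioned above, the following is a minimal sketch of how an image request URL is assembled under that specification, which orders the path segments as identifier, region, size, rotation, and quality plus format. It is not code from IIIF Curation Viewer; the function name `iiif_image_url`, the server `https://example.org/iiif`, and the identifier are hypothetical and chosen only for illustration.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region=None, size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL (illustrative helper, not a CODH API).

    region is None for the whole image, or an (x, y, w, h) tuple in pixels,
    for example the bounding box of a single character.
    """
    if region is None:
        region_part = "full"
    else:
        x, y, w, h = region
        if w <= 0 or h <= 0:
            raise ValueError("region width and height must be positive")
        region_part = f"{x},{y},{w},{h}"
    # The identifier is percent-encoded so that any "/" inside it is not
    # mistaken for a path separator.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region_part}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, for illustration only.
url = iiif_image_url("https://example.org/iiif", "book1/page 3",
                     region=(10, 20, 300, 400))
print(url)
```

Because the region is part of the URL, a viewer can request only the cropped area it needs instead of downloading the full page image.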

IIIF Curation Finder

An open-source IIIF Search tool for searching curations created by IIIF Curation Viewer and creating new curations by re-editing.

IIIF Curation Editor

An open-source tool for editing curations created by IIIF Curation Viewer, etc.

IIIF Curation Player

An open-source tool for playing curations created by IIIF Curation Viewer, etc.


A Flask web application for storing JSON documents, with some special functions for JSON-LD.

Canvas Indexer

A Flask web application that crawls Activity Streams for IIIF Canvases and offers a search API.

ICP Docker

Scripts for installing IIIF Curation Platform on a Docker environment.

