KaoKore Dataset

KaoKore Dataset is a dataset derived from the Collection of Facial Expressions (KaoKore). It contains facial expression images cropped from Japanese artworks, such as picture scrolls (絵巻物, Emakimono) and picture books (絵本, Ehon), in a format convenient for machine learning. Please read the Citation, License and Usage Guidelines sections before using the dataset.


KaoKore Dataset is a collection of location information and associated metadata for facial expression images cropped from Japanese picture scrolls and picture books that are publicly available from multiple organizations via IIIF (International Image Interoperability Framework). As of February 2020, the dataset contains 5,552 images and consists of the following:

  1. Location information (URLs) of the facial expression images (256x256 pixels each), as a text file
  2. Attribute information (metadata) annotated by experts, as a text file
  3. Labels and splits for machine learning, as a CSV file
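As a sketch of working with item 3, the snippet below groups image file names by their machine-learning split using only the Python standard library. The column names (`image`, `set`, `gender`, `status`), the split values and the sample rows are assumptions for illustration; check the header of the CSV file you actually obtain.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a labels CSV. The column names and split values
# are assumptions; verify them against the header of the real file.
SAMPLE_LABELS = """image,set,gender,status
00000000.jpg,train,0,1
00000001.jpg,train,1,0
00000002.jpg,dev,0,2
00000003.jpg,test,1,3
"""


def images_by_split(csv_text: str) -> dict[str, list[str]]:
    """Group image file names by the value of the (assumed) 'set' column."""
    splits: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        splits[row["set"]].append(row["image"])
    return dict(splits)


if __name__ == "__main__":
    for name, images in sorted(images_by_split(SAMPLE_LABELS).items()):
        print(name, len(images))
```

To read a real file, pass the contents of the downloaded CSV (for example `open(path, newline="", encoding="utf-8").read()`) instead of the sample string.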

The difference from the original KaoKore is that the images are normalized to 256x256 pixels and the data are converted into a format suited to machine learning.
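Because the images are published through IIIF, a face crop normalized to 256x256 can be addressed directly by URL. The sketch below builds an IIIF Image API request following the Image API 2.x pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The server base, identifier and region coordinates are hypothetical, and whether the URLs in this dataset are built exactly this way is an assumption, not something stated here.

```python
from urllib.parse import quote


def iiif_face_url(base: str, identifier: str, x: int, y: int, w: int, h: int,
                  size: int = 256) -> str:
    """Build an IIIF Image API (2.x-style) URL that crops the pixel region
    (x, y, w, h) and scales it to exactly size x size pixels as a JPEG.

    All argument values are hypothetical; the dataset's actual URLs may differ.
    """
    if min(w, h, size) <= 0 or min(x, y) < 0:
        raise ValueError("region must be non-negative with positive width/height")
    region = f"{x},{y},{w},{h}"    # region on the full-size image, in pixels
    size_param = f"{size},{size}"  # "w,h" requests exact output dimensions
    ident = quote(identifier, safe="")  # identifiers must be URI-encoded ("/" -> %2F)
    return f"{base.rstrip('/')}/{ident}/{region}/{size_param}/0/default.jpg"


if __name__ == "__main__":
    print(iiif_face_url("https://example.org/iiif", "scroll-001/page-12",
                        1200, 340, 180, 180))
```

Requesting the size as `w,h` forces exact dimensions, so a non-square region would be stretched; cropping a square region (or using the best-fit form `!w,h` and padding afterwards) avoids that distortion.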

The following GitHub repository provides KaoKore Dataset, together with scripts for downloading and pre-processing the face images.

GitHub: rois-codh/kaokore: Dataset for the Collection of Facial Expressions from Japanese Artwork
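The repository's scripts are the supported way to obtain the images. Purely to illustrate the idea, the sketch below plans and performs downloads from a URL list using only the Python standard library. It assumes one URL per line and a hypothetical file-naming scheme (the zero-padded position in the list plus `.jpg`); neither is taken from the repository, so adapt it to the actual file format.

```python
import os
import time
import urllib.request


def plan_downloads(url_lines, out_dir):
    """Return (url, path) pairs for images not yet present in out_dir.

    Assumes one URL per line; blank lines are ignored. File names use a
    hypothetical scheme: zero-padded position in the list plus '.jpg'.
    """
    urls = [line.strip() for line in url_lines if line.strip()]
    plan = []
    for index, url in enumerate(urls):
        path = os.path.join(out_dir, f"{index:08d}.jpg")
        if not os.path.exists(path):  # skip files fetched in an earlier run
            plan.append((url, path))
    return plan


def download_all(plan, delay_s=1.0):
    """Fetch each planned image sequentially, pausing between requests."""
    for url, path in plan:
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, path)
        time.sleep(delay_s)  # keep the load on the providers' servers low
```

Skipping files that already exist makes an interrupted run resumable, and the pause between requests avoids overloading the content providers' IIIF servers.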


Please consider citing the following paper when you publish research results using KaoKore Dataset.


KaoKore Dataset is created from images publicly available from multiple organizations and is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).

You must show the attribution, consisting of the name of the dataset and its DOI, in any publication that uses this dataset. The following is an example of the attribution.

“KaoKore Dataset” (collected by CODH from multiple organizations), doi:10.20676/00000353.

Because multiple organizations are involved, it is not compulsory to list all content providers. However, when you distribute a new dataset or software derived from KaoKore Dataset, please include the "List of content providers" file in your distribution package.

Download: List of content providers

The following is the list of content providers.

  1. Dataset of Pre-Modern Japanese Text (National Institute of Japanese Literature and ROIS-DS Center for Open Data in the Humanities)
  2. Digital Collection of Keio University Libraries (Keio University Media Center)
  3. Kyoto University Rare Materials Digital Archive (Kyoto University Library Network)

When you modify and redistribute the content, any changes made to the original should be clearly indicated. You should label the work to show you have changed it, so that other users know who made the changes.

Usage Guidelines

KaoKore Dataset (this dataset) provides a collection of facial expression images cropped from Japanese picture scrolls and picture books publicly available from multiple organizations. Please follow these guidelines when you use this dataset.

  1. This dataset may contain entities that are revered for religious, ideological, or other reasons. Please respect diverse values and avoid degrading such subjects.
  2. Please respect the original works, creators and providers of this dataset. We believe that proper credit for the contributions of creators and providers is essential to promote the open data movement.
  3. The Public Domain Usage Guidelines published by Europeana Collections are also a useful reference for using public domain works.

These guidelines are based on goodwill. They are not a legal contract.

The Purpose and Goal of the Dataset

The "Collection of Facial Expressions", from which this dataset originates, was designed for art history researchers, supporting tasks such as faceted browsing, viewing individual images, and comparing lists of images by eye. At the same time, facial expressions in picture scrolls and picture books are also a potentially interesting dataset for the machine learning community. Machine learning is expected to broaden the potential of research and other activities, for example by automatically detecting faces in unexamined works for rapid assessment of subject matter and artistic style, or by automatically generating faces for creative work.

Hence we decided to release a new dataset, "KaoKore Dataset", designed for use by the machine learning community. The relationship between KaoKore and KaoKore Dataset is similar to that between the Kuzushiji Dataset and Kuzushiji-MNIST. By converting the format of the dataset, we hope to encourage its wider adoption and provide a resource for starting new research.



KaoKore Dataset was released.