20th CODH Seminar
The end of lexicography, welcome to the machine: On how ChatGPT can already take over all of the dictionary maker's tasks

Theme

The 20th CODH Seminar welcomes Gilles-Maurice de Schryver, research professor of African linguistics at Ghent University and former president of the European Association for Lexicography, and David Joffe, creator of the TLex dictionary writing system. In this talk, they will discuss ChatGPT, the chatbot recently released by OpenAI, and its impact on lexicography, especially on the process of dictionary compilation, its future, and beyond.

Archives

The YouTube video of the event is available on the ROIS-DS CODH YouTube channel.

Suggested citation for the content of this event:

de Schryver, Gilles-Maurice & David Joffe. 2023. The end of lexicography, welcome to the machine: On how ChatGPT can already take over all of the dictionary maker's tasks. Paper presented at the 20th CODH Seminar, ROIS-DS Center for Open Data in the Humanities, Tokyo, Japan, 27 February 2023. http://codh.rois.ac.jp/seminar/lexicography-chatgpt-20230227/

Program

Title: The end of lexicography, welcome to the machine: On how ChatGPT can already take over all of the dictionary maker's tasks
Speakers:
  • Gilles-Maurice de Schryver (Ghent University / University of Pretoria)
  • David Joffe (TshwaneDJe HLT)
Date: Monday, February 27, 2023, 17:30-19:00 JST (UTC+9)
Schedule:
  • 17:30-18:00 Introduction by Gilles-Maurice de Schryver
  • 18:00-18:30 Live demonstration by David Joffe
  • 18:30-19:00 Q&A (and probably further demos based on prompts from the audience)
Venue:
  • Online: Zoom Webinar
  • Onsite: Room 2005 at the National Institute of Informatics (Access)
Registration is required for both online and onsite attendance.
Language: English
The talk will be given in English only; no Japanese interpretation will be provided.
Co-Host
  • JSPS Bilateral programs (ILCAA-BantUGent): 'The Past and Present of Bantu Languages: Integrating Micro-Typology, Historical-Comparative Linguistics and Lexicography'
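  • Bilateral Exchange Program (joint research with Ghent University, Belgium): 'The Past and Present of Bantu Languages: New Developments through the Integration of Micro-Typology, Historical-Comparative Linguistics and Lexicography'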

Abstract

Since the release of ChatGPT just two months ago, the world has been in a frenzy. Few of us ever really thought that we’d see an AI system at this level in our lifetime, but here it is. Our standpoint is that we need to adapt and be part of the revolution. The arrival of ChatGPT (and soon its rivals) is a big change for society in general. As a case study, we will present its impact on lexicography. Broadly speaking, we will illustrate how ChatGPT can already be brought in for all major steps in modern lexicography: (1) corpus creation aspects; (2) dictionary compilation proper; (3) publishing to various formats; (4) dictionary sales, marketing and customer service; and (5) metalexicography.

The main focus of this talk will be on step (2). More specifically, we will illustrate how one can ask for entries in structured XML, how dictionaries may be authored by ChatGPT via our prompts, and how we can achieve more direct OpenAI GPT integration into a dictionary writing system like TLex (using OpenAI's APIs). In other words, we will illustrate how we can feed ChatGPT a list of headwords and have it automatically create a list of articles in a sort of crude batch mode. In the process, we will show how it can also start making helper lists of word meanings. Given that ChatGPT also seems to ‘know’ TEI (that slightly horrible XML standard) as well as TMX, TBX and XLIFF (more important formats in lexicography), it is possible to ask it either to create data in these formats or, given some simple instructions, to rearrange existing data into them.
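The crude batch mode described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the speakers' implementation and not TLex code: the XML entry structure, the prompt wording, and the names `build_prompt`, `parse_entry`, `compile_batch`, `ask_model` and `fake_model` are all hypothetical. The call to the language model is deliberately left as a plain function argument; in real use it would wrap a request to the OpenAI API, which is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

# Hypothetical entry structure, chosen for illustration only.
PROMPT_TEMPLATE = (
    "Write a dictionary entry for the English headword '{headword}'. "
    "Reply with XML only, in exactly this structure:\n"
    "<entry><headword>...</headword><pos>...</pos>"
    "<sense><definition>...</definition><example>...</example></sense>"
    "</entry>"
)


def build_prompt(headword: str) -> str:
    """Fill the prompt template for a single headword."""
    return PROMPT_TEMPLATE.format(headword=headword)


def parse_entry(xml_text: str) -> Dict[str, object]:
    """Parse one XML reply into a plain dict; raise if it is not an <entry>."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "entry":
        raise ValueError(f"unexpected root element: {root.tag}")
    senses = [
        {
            "definition": sense.findtext("definition", default="").strip(),
            "example": sense.findtext("example", default="").strip(),
        }
        for sense in root.findall("sense")
    ]
    return {
        "headword": root.findtext("headword", default="").strip(),
        "pos": root.findtext("pos", default="").strip(),
        "senses": senses,
    }


def compile_batch(
    headwords: List[str], ask_model: Callable[[str], str]
) -> Tuple[List[Dict[str, object]], List[str]]:
    """Ask the model for one entry per headword and collect the results.

    Replies that are not well-formed XML entries are not repaired; the
    headword is recorded in `failures` so that a human can review it.
    """
    entries: List[Dict[str, object]] = []
    failures: List[str] = []
    for headword in headwords:
        reply = ask_model(build_prompt(headword))
        try:
            entries.append(parse_entry(reply))
        except (ET.ParseError, ValueError):
            failures.append(headword)
    return entries, failures


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs offline."""
    if "'cat'" in prompt:
        return (
            "<entry><headword>cat</headword><pos>noun</pos>"
            "<sense><definition>a small domesticated feline</definition>"
            "<example>The cat slept on the sofa.</example></sense></entry>"
        )
    return "Sorry, I cannot help with that."  # not XML, so parsing fails


entries, failures = compile_batch(["cat", "dog"], fake_model)
```

With the stand-in model, `entries` holds the parsed article for "cat" and `failures` lists "dog". Keeping the model call behind a single function argument is one possible design: the same loop could then be pointed at a different model or a different entry structure without changing the parsing and review logic.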

Apart from showcasing all this technological wizardry for the field of lexicography, we will also devote some time to the ethical implications of ‘using’ it.

At the end of it all, the Preface to the dictionary may look like this:

Bio

Gilles-Maurice de Schryver

Gilles-Maurice de Schryver (°1971) has been research professor of African Linguistics at Ghent University since 2015, and extraordinary professor at the University of Pretoria since 2014. He holds an MSc in Microelectronic Engineering (1995), as well as an MA (1999) and PhD (2005) in African Languages and Cultures. In 2002 he co-founded TshwaneDJe HLT to develop lexicographic software, and in 2006 he was a founding member of the African Language Technology group. He is the author or co-author of close to 400 books, book chapters, journal articles and conference papers, mainly on Bantu corpus linguistics and lexicography in general. His publications also include award-winning dictionaries for Northern Sotho, Zulu and Xhosa, published with Oxford University Press, as well as various online dictionaries, amongst others the most popular one for Swahili. He is a two-term past President of Afrilex (2009-2013) and past President of Euralex (2018-2021). Earlier, he also served in other capacities on the executive boards of Afrilex (2001-2009), Euralex (2006-2014), Asialex (2007-2013) and Australex (2008-2013). Most recently, he has been co-facilitating the creation of Americalex (2019-2023), and currently sits on the board of Globalex (2022-2023).

David Joffe

David Joffe is the co-founder and owner of TshwaneDJe HLT, and the original creator of the industry-leading TLex dictionary writing system. He has a BSc Computer Science degree (Univ. of Pretoria, South Africa) and over 20 years' software development and project management experience. He previously built, amongst others, a flight simulator visualisation system for the South African Air Force, and mining training simulators for Anglo Platinum. He currently manages the development of TLex, in use by major publishers including Oxford University Press and Pearson, as well as tlTerm (terminology management software), tlTranslate (Translation Memory software), tlCorpus (concordance software), and tlDatabase (database software).

Past CODH Seminars

2024-06-06

22nd CODH Seminar - Hentaigana in the Digital Age: The Inheritance and New Developments of the Japanese Written Character Culture

2024-03-04

21st CODH Seminar - Digital History: Concepts and Practices

2023-02-27

20th CODH Seminar - The end of lexicography, welcome to the machine: On how ChatGPT can already take over all of the dictionary maker's tasks

2023-03-01

19th CODH Seminar - Collective Intelligence and Creative AI: A framework for augmenting creative human expression

2023-01-22

18th CODH Seminar - Micro Typology and Digital Archive: Case Studies on Bantu languages and Japanese-Ryukyuan languages

2022-07-01

DH 2022 Tokyo Commemorative Lecture Series / 17th CODH Seminar - Historical Big Data - THE DARK MATTER OF HISTORY

2022-03-28

16th CODH Seminar - Digital Archives for Cities and Towns - Historical Big Data and Usage in the Real World

2021-07-29

15th CODH Seminar - Art History Research to be Transformed by IIIF and AI - Interpreting Japanese Painting Scrolls in Middle Ages by Style Comparative Study on Large-Scale Facial Expression Data

2021-02-18

14th CODH Seminar - 100 Recipes for IIIF Curation Platform

2021-01-22

13th CODH Seminar - Present and Future of Historical Big Data Research

2020-08-05

12th CODH Seminar (Online) - AI for Culture: From Japanese Art to Anime

2020-02-21

12th CODH Seminar - AI for Culture: From Japanese Art to Anime

2019-09-25

11th CODH Seminar - Text Mining for Analyzing Research Communities: Sociological Topics and Socio-Technical Imaginaries

2019-03-11

10th CODH Seminar - Document Analysis and Character Recognition

2019-01-08

9th CODH Seminar - Computer Vision with Limited Labeled Data

2018-11-22

8th CODH Seminar - Exploring Deep Learning for Classical Japanese Literature, Machine Creativity, and Recurrent World Models!

2018-07-31

7th CODH Seminar - Manifold Mixup: Encouraging Meaningful On-Manifold Interpolation as a Regularizer

2018-03-12

6th CODH Seminar - Historical Big Data - Challenges in Transforming Historical Documents to Structured Data for the Integrated Analysis of Records in the Past -

2017-12-04

5th CODH Seminar - Trustworthy Data Repositories - Forum for Sharing Practical Information about CoreTrustSeal Certification -

2017-07-27

4th CODH Seminar - A New Trend on Image Delivery in Digital Archives - IIIF's Potential for Standardization and Sophistication of Image Access -

2017-05-30

3rd CODH Seminar - Usage of DOI for Humanities - Assignment of DOI for Scholarly Resources such as Research Data and Museum Collections -

2017-02-10

2nd CODH Seminar - Old Japanese Character Challenge - Future of Machine Recognition and Human Transcription -

2017-01-23

1st CODH Seminar - Big Data and Digital Humanities