The 20th CODH seminar welcomes Gilles-Maurice de Schryver, research professor of African linguistics at Ghent University and former president of the European Association for Lexicography, and David Joffe, the creator of the computer-based dictionary-making system TLex. In this talk, they will discuss ChatGPT, the latest chatbot developed by OpenAI, and its impact on lexicography, especially on the process of dictionary compilation, its future, and beyond.
The YouTube video of the event is available.
YouTube: Presentation 20th CODH seminar: 'On how ChatGPT can take over all of the dictionary maker's tasks'
The following is the suggested citation of the content of the event.
de Schryver, Gilles-Maurice & David Joffe. 2023. The end of lexicography, welcome to the machine: On how ChatGPT can already take over all of the dictionary maker's tasks. Paper presented at the 20th CODH Seminar, ROIS-DS Center for Open Data in the Humanities, Tokyo, Japan, 27 February 2023. http://codh.rois.ac.jp/seminar/lexicography-chatgpt-20230227/
|Title||The end of lexicography, welcome to the machine: On how ChatGPT can already take over all of the dictionary maker's tasks|
|Speakers||Gilles-Maurice de Schryver (Ghent University / University of Pretoria); David Joffe (TshwaneDJe HLT)|
|Date||17:30-19:00 JST (UTC+9), February 27 (Monday), 2023|
|Program||17:30-18:00 Introduction by Gilles-Maurice de Schryver; 18:00-18:30 Live demonstration by David Joffe; 18:30-19:00 Q & A (and probably further demos based on prompts from the audience)|
|Venue||Online: Zoom Webinar; Onsite: Room 2005 at National Institute of Informatics. Registration is required for both online and onsite attendance.|
With the release of ChatGPT just two months ago, the world has been in a frenzy. Few of us ever really thought that we’d see an AI system of this level in our lifetime, but here it is. Our standpoint is that we need to adapt and be part of the revolution. The arrival of ChatGPT (and soon its rivals) is a major change for society in general. As a case study, we will present its impact on lexicography. Broadly speaking, we will illustrate how ChatGPT can already be brought in for all major steps in modern lexicography: (1) corpus creation aspects; (2) dictionary compilation proper; (3) publishing to various formats; (4) dictionary sales, marketing and customer service; and (5) metalexicography.
The main focus of this talk will be on step (2). More specifically, we will illustrate how one can ask for entries in structured XML, how dictionaries may be authored by ChatGPT via our prompts, and how we can achieve more direct OpenAI GPT integration into a dictionary writing system like TLex (using OpenAI's APIs). In other words, we will illustrate how we can feed ChatGPT a list of headwords and have it automatically create a list of articles in a sort of crude batch mode. In the process, we will show how it can also start producing helper lists of word meanings. Given that ChatGPT also seems to ‘know’ TEI (that slightly horrible XML standard) as well as TMX, TBX and XLIFF (more important formats in lexicography), it is possible to ask it either to create data in these formats or, given some simple instructions, to rearrange existing data into them.
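The crude batch mode described above might be sketched roughly as follows. This is an illustration under stated assumptions, not the speakers' implementation and not the TLex or OpenAI API: `ask_model` is a hypothetical callable that sends a prompt to a chatbot and returns its reply, `fake_model` is a stand-in so the sketch runs offline, and the XML element names (`entry`, `headword`, `pos`, `sense`, `definition`, `example`) are invented for the example rather than taken from TEI or any TLex schema.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

# Illustrative prompt; the element names are assumptions, not a real schema.
PROMPT_TEMPLATE = (
    "Write a dictionary entry for the headword '{headword}'. "
    "Return only XML of the form "
    "<entry><headword>...</headword><pos>...</pos>"
    "<sense><definition>...</definition><example>...</example></sense></entry>."
)


def build_prompt(headword: str) -> str:
    """Fill the prompt template for one headword."""
    return PROMPT_TEMPLATE.format(headword=headword)


def parse_entry(xml_text: str) -> Dict[str, object]:
    """Parse one XML reply into a plain dict; raise if it is not an <entry>."""
    root = ET.fromstring(xml_text)
    if root.tag != "entry":
        raise ValueError(f"unexpected root element: {root.tag}")
    return {
        "headword": root.findtext("headword", default=""),
        "pos": root.findtext("pos", default=""),
        "senses": [
            {
                "definition": sense.findtext("definition", default=""),
                "example": sense.findtext("example", default=""),
            }
            for sense in root.findall("sense")
        ],
    }


def compile_batch(
    headwords: List[str], ask_model: Callable[[str], str]
) -> Tuple[List[Dict[str, object]], List[str]]:
    """Ask for one entry per headword; collect headwords whose reply is unusable."""
    entries: List[Dict[str, object]] = []
    failures: List[str] = []
    for headword in headwords:
        reply = ask_model(build_prompt(headword))
        try:
            entries.append(parse_entry(reply))
        except (ET.ParseError, ValueError):
            # Replies are not guaranteed to be well-formed XML.
            failures.append(headword)
    return entries, failures


def fake_model(prompt: str) -> str:
    """Offline stand-in for a chatbot call (hypothetical, for demonstration only)."""
    headword = prompt.split("'")[1]
    if headword == "broken":
        return "<entry><headword>broken"  # simulate a malformed reply
    return (
        f"<entry><headword>{headword}</headword><pos>noun</pos>"
        f"<sense><definition>Placeholder definition of {headword}.</definition>"
        "<example>Placeholder example.</example></sense></entry>"
    )


if __name__ == "__main__":
    entries, failures = compile_batch(["dictionary", "corpus", "broken"], fake_model)
    for entry in entries:
        print(entry["headword"], "-", entry["senses"][0]["definition"])
    print("needs review:", failures)
```

Keeping the model call behind a plain callable means the same loop could be pointed at a real API client later, and collecting failed headwords separately leaves room for a human lexicographer to review whatever the model could not deliver in valid form.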
Apart from showcasing all this technological wizardry for the field of lexicography, we will also devote some time to the ethical implications of ‘using’ it.
At the end of it all, the Preface to the dictionary may look like this:
Gilles-Maurice de Schryver
Gilles-Maurice de Schryver (°1971) has been research professor of African Linguistics at Ghent University since 2015, and extraordinary professor at the University of Pretoria since 2014. He holds an MSc in Microelectronic Engineering (1995), as well as an MA (1999) and PhD (2005) in African Languages and Cultures. In 2002 he co-founded TshwaneDJe HLT to develop lexicographic software, and in 2006 he was a founding member of the African Language Technology group. He is the author or co-author of close to 400 books, book chapters, journal articles and conference papers, mainly on Bantu corpus linguistics and lexicography in general. His publications also include award-winning dictionaries for Northern Sotho, Zulu and Xhosa, published with Oxford University Press, as well as various online dictionaries, amongst others the most popular one for Swahili. He is a two-term past President of Afrilex (2009-2013) and past President of Euralex (2018-2021). Earlier, he also served in other capacities on the executive boards of Afrilex (2001-2009), Euralex (2006-2014), Asialex (2007-2013) and Australex (2008-2013). Most recently, he has been co-facilitating the creation of Americalex (2019-2023), and currently sits on the board of Globalex (2022-2023).
David Joffe is the co-founder and owner of TshwaneDJe HLT, and the original creator of the industry-leading TLex dictionary writing system. He has a BSc Computer Science degree (Univ. of Pretoria, South Africa) and over 20 years' software development and project management experience. He previously built, amongst others, a flight simulator visualisation system for the South African Air Force, and mining training simulators for Anglo Platinum. He currently manages the development of TLex, in use by major publishers including Oxford University Press and Pearson, as well as tlTerm (terminology management software), tlTranslate (Translation Memory software), tlCorpus (concordance software), and tlDatabase (database software).
Past CODH Seminars
20th CODH Seminar - The end of lexicography, welcome to the machine: On how ChatGPT can already take over all of the dictionary maker's tasks
19th CODH Seminar - Collective Intelligence and Creative AI: A framework for augmenting creative human expression
18th CODH Seminar - Micro Typology and Digital Archive: Case Studies on Bantu languages and Japanese-Ryukyuan languages
DH 2022 Tokyo Commemorative Lecture Series / 17th CODH Seminar - Historical Big Data - THE DARK MATTER OF HISTORY
16th CODH Seminar - Digital Archives for Cities and Towns - Historical Big Data and Usage in the Real World
15th CODH Seminar - Art History Research to be Transformed by IIIF and AI - Interpreting Japanese Painting Scrolls in Middle Ages by Style Comparative Study on Large-Scale Facial Expression Data
14th CODH Seminar - 100 Recipes for IIIF Curation Platform
13th CODH Seminar - Present and Future of Historical Big Data Research
12th CODH Seminar (Online) - AI for Culture: From Japanese Art to Anime
12th CODH Seminar - AI for Culture: From Japanese Art to Anime
11th CODH Seminar - Text Mining for Analyzing Research Communities: Sociological Topics and Socio-Technical Imaginaries
10th CODH Seminar - Document Analysis and Character Recognition
9th CODH Seminar - Computer Vision with Limited Labeled Data
8th CODH Seminar - Exploring Deep Learning for Classical Japanese Literature, Machine Creativity, and Recurrent World Models!
7th CODH Seminar - Manifold Mixup: Encouraging Meaningful On-Manifold Interpolation as a Regularizer
6th CODH Seminar - Historical Big Data - Challenges in Transforming Historical Documents to Structured Data for the Integrated Analysis of Records in the Past -
5th CODH Seminar - Trustworthy Data Repositories - Forum for Sharing Practical Information about CoreTrustSeal Certification -
4th CODH Seminar - A New Trend on Image Delivery in Digital Archives - IIIF's Potential for Standardization and Sophistication of Image Access -
3rd CODH Seminar - Usage of DOI for Humanities - Assignment of DOI for Scholarly Resources such as Research Data and Museum Collections -
2nd CODH Seminar - Old Japanese Character Challenge - Future of Machine Recognition and Human Transcription -