Using AI to turn the Global Language Archive into the “Louvre of Languages”
In 2012, I proposed the idea of creating a Global Language Archive. This was roughly the same time that the Endangered Languages Project was getting kicked off, which I’ll discuss in more detail below. But I always viewed a Global Language Archive as being much more than an online effort.
The world is losing languages at a rapid clip. Over 500 languages have fewer than 10 speakers left, and many of these native speakers are losing the will to keep their languages viable.
I understood the conflicted feelings of people losing their heritage, but I also had people telling me that the world would become a far easier place with fewer languages. For me, this was a difficult problem to resolve.
Logically, the world would be a simpler place if we had fewer languages to deal with, yet losing a language meant having zero record of all the fascinating communities of people who helped build the world around us.
I had many questions.
Is saving languages really necessary? Much like animals going extinct, isn’t it just nature’s way? How will the world be a better place 100 years from now if most of our 7,000 languages survive? And what exactly does archiving a language mean?
It took me several months, but these are some of the conclusions I came to.
To begin with, our society today, including all of our social, legal, and governmental systems, has been built on the backs of literally billions of our ancestors, all struggling to create a better place for future generations. Language has been a central ingredient in forming our heritage, modern culture, and even our way of thinking.
We are the prime beneficiaries of their struggles, so in many ways we owe it to them to somehow preserve their legacy.
While we seldom consider it, most of history’s greatest stories have never been recorded, because they happened among people who left no record of them.
Even though language is our greatest tool, used to accomplish our greatest achievements, it’s also been a huge obstacle, blocking our understanding of what truly happened.
Our words are a way of expressing our emotions, kindness, and love. They’re a tool for business and give us a reason to persevere. They describe our fears, our intentions, and offer hope for our daily struggles.
Without having a language-level understanding of our past, we struggle to understand who we are today.
How can humanity possibly know where it’s going if we don’t know where we’ve come from?
The purpose of the Global Language Archive is to preserve the legacy of those who have gone before us, through the languages they used to communicate with.
But it needs to be far more than a dusty old museum filled with past recordings of native speakers. It needs to be a “living museum.”
Perhaps the most important reason for developing the kind of “living museum” I’ll describe below is what we’ll discover once we create it. We should think of it as a never-ending work site for future discoveries.
What does it mean to archive a language?
Language is far more than the verbal sounds that come from our mouths. It’s a combination of facial expressions, intonations, gestures, symbols, postures, and body language used to convey the intellectual concepts, verbal syntax, and emotional values involved in basic human-to-human communication.
In general, the minimum requirement for archiving a language is enough evidence of past forms of communication for an AI (artificially intelligent) Language Recreation Engine to reassemble a functional language that can be taught to others.
Inputs will involve the collection of sufficient video, audio, and written documents for an AI Language Recreation Engine to generate a functional three-dimensional avatar capable of teaching the language to someone wanting to learn it.
While there is currently no such form of AI in existence, there is growing evidence that a language recreation engine is not only possible, but also likely to be developed soon.
Taking it a couple of steps further, this will not only give us the ability to recreate the language but will likely enable us to “fill in the gaps”: find missing words, create a written language if none exists, and translate seamlessly from one language to the next.
For this reason, the process of archiving a language will involve the accumulation of sufficient remnants of a failing language so the AI Engine can take over. Each language collection will include sufficient fragments of written and spoken words, definitions, common phrases, expressions, explanations, and value systems to begin the process.
Since most people can reach a functional level of proficiency with roughly the 2,500 most common words, I estimate that a vocabulary of about that size will be needed to begin the process.
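To make the 2,500-word threshold a little more concrete, here is a minimal Python sketch of how an archivist might check whether a set of collected transcripts reaches it. It is purely illustrative. The function names, the naive letter-based tokenizer, and the use of raw word counts are my own assumptions, not part of any existing archive or AI engine. Unwritten, tonal, or heavily inflected languages would need far more careful word segmentation than this.

```python
import re
from collections import Counter

# Assumed threshold, taken from the rough estimate of ~2,500 common words.
TARGET_VOCABULARY = 2500


def tokenize(text: str) -> list[str]:
    # Naive, hypothetical tokenizer: lowercase the text and keep runs of
    # Unicode letters, allowing internal apostrophes (e.g. "don't").
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())


def vocabulary_report(transcripts: list[str], target: int = TARGET_VOCABULARY) -> dict:
    # Count how often each distinct word appears across all transcripts.
    counts = Counter()
    for transcript in transcripts:
        counts.update(tokenize(transcript))

    total_tokens = sum(counts.values())
    # The `target` most frequent words and how many tokens they account for.
    top_words = counts.most_common(target)
    covered_tokens = sum(n for _, n in top_words)

    return {
        "distinct_words": len(counts),
        "meets_target": len(counts) >= target,
        "coverage_of_top_words": covered_tokens / total_tokens if total_tokens else 0.0,
    }
```

Calling `vocabulary_report(transcripts)` on a language’s collected material reports how many distinct words were captured, whether that reaches the 2,500 mark, and what share of everything recorded is covered by the most common words. A real archive would obviously care about far more than word counts, but a simple gauge like this could help decide when a collection is ready to hand off.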
If possible, the archive for each language will involve far more comprehensive collections that attempt to capture the lifestyles, cultures, and routines involved in normal day-to-day living and communication.
Collections will include whatever is available, such as artwork, books, music, pieces of clothing, photographs, weapons, cookware, maps, videos, and more. These will, of course, vary from one language to the next.
The loneliest books in the world are those written in languages that no longer exist. Yet these books hold clues to an unknown history filled with unknown value and importance that cannot yet be expressed.
The greatest moments in human history were never recorded in any traditional fashion, and are currently inaccessible to modern people.
The Endangered Languages Project
When I first started talking about a Global Language Archive, another effort was taking shape.
The Endangered Languages Project has so far collected information on 3,410 languages. Its purpose is to be a worldwide collaboration between Indigenous language organizations, linguists, institutions of higher education, and key industry partners to strengthen endangered languages.
At the heart of their project is a website that was launched in June 2012 with funding from Google.
While Google oversaw the development and launch of the website, its long-term goal was for the project to be led by true experts in the field of language preservation.
For this reason, the project is now managed by the First Peoples’ Cultural Council and the Endangered Languages Catalogue/Endangered Languages Project (ELCat/ELP) team at the University of Hawaiʻi at Mānoa in coordination with the Governance Council.
In the words of the Endangered Languages Project:
“Humanity today is facing a massive extinction: languages are disappearing at an unprecedented pace. And when that happens, a unique vision of the world is lost. With every language that dies we lose an enormous cultural heritage; the understanding of how humans relate to the world around us; scientific, medical and botanical knowledge; and most importantly, we lose the expression of communities’ humor, love and life. In short, we lose the testimony of centuries of life.
Languages are entities that are alive and in constant flux, and their extinction is not new; however, the pace at which languages are disappearing today has no precedent and is alarming. Over 40 percent of the world’s approximate 7,000 languages are at risk of disappearing. But today we have tools and technology at our fingertips that could become a game changer.”
Users of the Endangered Languages Project website play an active role in putting their languages online by submitting information or samples in the form of text, audio, links or video files. Once uploaded to the website, users can tag their submissions by resource category to ensure they are easily searchable.
The Endangered Languages Project serves as a great first step, setting the stage for far greater opportunities ahead. Meanwhile, several other resources, such as Wikipedia, National Geographic, the Global Oneness Project, UNESCO, and many more, are drawing attention to this problem in their own ways.
A few examples of endangered languages
The Endangered Languages Project puts technology in the hands of organizations and individuals working to revive struggling languages and save them from extinction.
Some of these languages developed hundreds of words for beads, fish, leathers, and snow because those had become focal points of daily life. Here are a few examples of endangered languages:
- Voro has fewer than 50,000 native speakers and is spoken in the southeastern corner of Estonia and in Pskov Province in Russia.
- Bisu has roughly 2,740 native speakers. In China, Bisu is spoken in one village of 240 people. In Burma, it’s spoken by 2,000 people in two or three villages. In Thailand, Bisu is spoken by some members of two villages with a population of 500.
- Bakairi is spoken by approximately 900 people in Brazil. This language has two rather divergent dialects: Eastern Bakairi, spoken by 700 people in seven villages, and Western Bakairi, spoken by 200 people in two villages.
- Cimbrian is spoken by fewer than 2,000 people in Italy, in the towns of Giazza, Roana, Mezzaselva, Rotzo, and Luserna. People who speak Cimbrian also speak Italian, German, and Venetan.
- Tjupany is an Australian language with only 10 native speakers remaining in the world.
- Karelian is a language closely related to the Finnish language with 63,000 native speakers in Russia and Finland.
- El Molo is spoken by roughly 700 people in a small community of fishermen living in two settlements along the eastern shore of Lake Turkana, in northern Kenya.
- Tuscarora is a dying language spoken in Ontario, Canada. Only two or three speakers of Tuscarora remain, all over the age of 80.
The goal of the Global Language Archive
Creating a physical place that represents a focal point for language preservation brings with it tremendous opportunity. Unlike today’s cultural museums that capture physical fragments of history, the Global Language Archive will have a mission to preserve the communications, stories, and dreams of our ancestors.
Online efforts only go so far. By adding physical dimensions, human contact, audio stories, and peripheral experiences, we breathe life into these otherwise single-dimensional languages.
As the number of “last speakers” dwindles, the responsibility of being the final speaker brings with it tremendous stress and anxiety. The loss of a language means the loss of birthright, heritage, and customs. It breaks the connection with their ancestors and seems to invalidate the accomplishments of the past, dishonoring the culture of their families.
But much of this stress can be defused by taking these speakers through a formal preservation process that transforms them from “crazy person clinging to the past” to “cultural expert with a deep understanding of their ancestors.”
Curators of languages are different from curators of artifacts. Languages are constantly morphing tools of expression with deep emotional ties. Done correctly, the Global Language Archive will attract massive crowds from around the world and draw attention to this critically important problem. It will be a one-of-a-kind facility serving as a magnet for linguistic scientists and cultural researchers around the globe.
Once an AI Language Recreation Engine is developed, it will open the door to entirely new kinds of research that we can only speculate about today.
In this context, language itself becomes a cultural taxonomy, and with upwards of 7,000 languages left to preserve, it has the potential for becoming the largest museum in the world with associated universities, hotels, culture-inspired retail centers, and much more.
At the same time, many questions still need to be answered:
- Will we need to develop a triage system for saving dying languages?
- If you decided to learn one of the endangered languages, how would you choose which one?
- If it becomes easy to learn a new language, how many will you want your children to know?
- What are the revenue streams needed to sustain a Global Language Archive?
- What’s the ideal location for this type of facility?
- How can the entire world be recruited to support this venture?
What’s the best way to experience a language?
Yes, it is possible to experience pieces of these native tongues through a website, but having access to local experts, cultural guides, and linguistic coaches takes it to a whole new level.
In our ever-expanding virtual world, it’s easy to start thinking that proximity isn’t important, but it is. Being surrounded at the Global Language Archive by people who share a common interest matters a great deal.
Much like the difference between seeing an online copy of the Mona Lisa and traveling to the Louvre in Paris to experience it firsthand, it offers an entirely different level of engagement.
The Global Language Archive is envisioned to become the “Louvre of Languages.”