Conlangs with over 10,000 words

From FrathWiki
Revision as of 23:02, 10 December 2024 by Khemehekis (→‎The list: Itlani has 24,000. Woo!)

Because scrappers and sandboxers create and scrap so many conlangs, the vast majority of conlangs never develop a large lexicon before they're scrapped. In fact, many conlangs have only about 7 to 10 words by the time they're scrapped, and some never have a single word of lexicon created.

In this forest of creosote bushes, however, there are some sequoias that have astoundingly large lexica, such as Talossan, Arka, or Spocanian. Occasionally, a conlang will become known mainly for its large lexicon size (as happened with Arka).

Here is a list of the largest of the large -- conlangs that have at least ten thousand words in their lexicon.

Excluded are:

  1. Superset and subset languages.
  2. Conlangs that consist of making only a few predictable changes to an existing language, such as Muphridian. Note that this includes all language games such as Pig Latin and Adibikicyan, as well as conlangs that are ciphers in the truest sense of the word (e.g. the No Man's Sky conlangs, or a conlang in which every A in the English word becomes an I, every B in the English word becomes a T, every C in the English word becomes an L, every D in the English word becomes an X, every E in the English word becomes a U, and so on). Conlangs that are relexes, but not cryptographic ciphers of natlang words, such as Froogleyboy's Aveata, although they may be frowned upon in the conlanging community, are eligible for this list, as are conlangs in which someone relexed a list like WordNet, or the Landau Core Vocabulary, or the Ethnologue lexicon questionnaire.
  3. Conlangs whose lexicon consists of an unedited computer-generated data dump, such as Nunihongo. (Computer-generated conlangs whose lexicon has been cleaned up, such as Classical Yiklamu, are eligible for this list.)
  4. Computer-generated material for making new words for a conlang, such as that made by Larry Rogers of Michigan (BA Linguistics, Michigan State University, 2009) for his expansion of Marc Okrand's Atlantean language. This was not intended as a conlang dictionary but merely as material which could be used to make new words, computer-generated words approximating the phonology of the original words.
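
To make the cipher case in item 2 concrete, here is a minimal sketch of a letter-for-letter substitution. It is only an illustration, not any particular conlang: it uses just the five mappings named above (A to I, B to T, C to L, D to X, E to U), and leaving every other letter unchanged is an assumption made here, since the rest of the table is not given. The function name encipher is hypothetical.

```python
# Illustrative only: the five mappings come from item 2 of the exclusion list.
# Letters outside a-e are left unchanged, which is an assumption of this sketch.
CIPHER = str.maketrans("abcde", "itlxu")


def encipher(word: str) -> str:
    """Apply the letter-for-letter substitution to an English word."""
    return word.lower().translate(CIPHER)


print(encipher("bead"))
print(encipher("decade"))
```

Because every output word is fully predictable from its English source, a lexicon built this way involves no new word creation, which is why such conlangs are excluded.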

Although James Landau, who has researched this list, has done quite an extensive search, there are no doubt at least a few conlangs that belong on this list that he has missed. Anyone is free to add them here.

Finally, it should be clarified that this list counts only how many words the creator or creators have actually created in the language, not the number of words that the creator says exist in-universe.

The list

  • about 10,000 Lingwa de planeta (Lidepla) - Dmitri Ivanov, A. Lysenko and others; worldlang based on 10 languages
  • 10,000 Farlingo - Vladimir Farber & Matvei Farber; auxlang based on SAE languages, Russian, Hebrew and Esperanto
  • 10,000 Faudanian - Josh Hien; a posteriori personal language
  • 10,000 Fith - Jeffrey Henning; alien language with "last in, first out" structure
  • 10,000 Lugasuese - Jurre Lagerwaard; spoken on the planet Aranii, based on Germanic and Tolkienian languages
  • 10,000 Unish - Language Research Institute, Sejong University; auxlang based on 16 languages from many phyla
  • more than 10,000 Loglan - James Cooke Brown; logical language with a posteriori lexicon
  • 11,000 Aepsognian - KozmoRobot; developed for a game called Robotic Run
  • 11,000 LANGUST - Grigoriy Korolev; auxlang based on 45 "atoms", aUI-style
  • over 11,000 Uropi - Joël Landais; zonal language for Europe based on Indo-European roots
  • 11,200 Paolanté - B. Christopher Suchsland-Gutiérrez; fictional Romance language
  • 11,287 aUI - John Weilgart; oligosynthetic language popularly known as "the Language of Space"
  • 11,500 Interslavic - Juraj Križanić; Slavic zonal language
  • 11,759 Rireinutire - Prettydragoon; spoken on the planet Rireinu
  • about 12,000 Rodinian - Rodiniye; worldlang repurposed as an artlang
  • 12,000 HOOM - Rood Hume; intuitive a priori auxlang for the universe
  • 12,000 Tundrian - Gábor Sándi; Romance language spoken in the fictional country of Tundria
  • 12,130 Géarthnuns - Douglas Koller; a priori language spoken on a fictional island in the Sea of Japan
  • 12,363 Mango - Natalia Gruscha; spoken by the Tiger People of the planet Pii, based on Indo-Aryan languages
  • around 12,500 Slovio - Mark Hucko; Slavic zonal language
  • 12,568 Alurhsa - Tony Harris; spoken on the planet Aluria at the edge of the Andromeda Galaxy
  • 13,000 Mila - Gary Taylor-Raebel; set in an alien colony, invented as a fauxlang but has since evolved naturally
  • 13,552 Tomato - Catty; innovative a priori auxlang
  • 14,000 Europeano - Jay Bowks; Euroclone
  • 14,000 Ido - Louis de Beaufront, Louis Couturat; Esperanto spinoff
  • 14,750 Kavrinian - Ultimate Ridley et al.; collaborative language spoken in Lhavres on the planet Sahar; most words created by others
  • 14,787 Minhyan - Jeffrey Henning; VSO fictional diachronic language with a priori vocabulary
  • 15,000 Chaldon-Siberian - Yaroslav Zolotaryov; purification of Slavic languages without Old Church Slavonic borrowings
  • around 15,000 Deyryck - Threr; fictional language spoken in the multiverse of Alaaban
  • over 15,000 Town Speech/Urban Basanawa - k1234567890y; West Germanic artlang (its Germanic nature obscured by writing system and large amount of Sino-Xenic vocabulary)
  • 16,000 Ortatürk - Baxtiyar Kärimov and Shoahmad Mutalov; zonal language that represents a statistical average of the Turkic languages
  • 16,000 Otg - Spencer Spurgeon; quirky fictional diachronic language inspired by Celtic and Turkish
  • about 16,000 Xhaimeran - Leo Flavum; used for songs and poetry
  • 16,627 Arka - Seren Arbazard; spoken on the planet Kaldia
  • over 17,000 Sambahsa-Mundialect - Dr. Olivier Simon; worldlang based on PIE, Arabic, Chinese, Indonesian, Swahili and Turkish
  • more than 17,000 Kotava - Staren Fetcey; a priori auxlang
  • about 18,000 Interslavic - Jan van Steenbergen et al.; Slavic zonal language based on Old Church Slavonic, previously known as Slovianski
  • 20,000 Aixosixomi - Alexander M. Koch; language of a fictional hunter-gatherer people of Earth
  • 20,000 Sydvetlish - Matheus Filipe da Silva Leal; Germanic auxlang
  • over 20,000 Lingua Franca Nova - C. George Boeree; Romance auxlang with creole grammar
  • 20,710 Noxilo - Sentaro Mizuta; worldlang that allows for many different word orders
  • around 21,000 Celinese - Andy Ayres; macrolanguage spoken on the planet Lorech
  • 21,100 Basha Humrayan - MissTerry; a conlang and autocryptolect 90% based on the Sanskrit spectrum (ancient Vedic to Classical and "Modern") with very regular Esperanto style grammar
  • 21,715 Lojban - The Logical Language Group; logical language, a reform of Loglan
  • 22,956 Pantakakiano - Javier Valladolid Antoranz; used in the novel El sueño en verso
  • 24,000 Itlani - James E. Hopkins; spoken on the planet Itlán
  • over 25,000 Spocanian - Rolandt Tweehuysen; a priori fictional language from the Atlantic phylum
  • 25,234 FeNeKeRe - Jonathan Sodt; a priori fauxlang of Earth's Dragon People of the arts, with millions of possible names
  • 26,339 Blissymbolics - Charles K. Bliss; pasigraphy
  • 26,352 Sermo - Jose Soares Da Silva; Euroclone
  • over 35,000 Talossan - R. Ben Madison; micronational language, Romance but lacking a consistent derivation from Latin
  • 39,765 Nuu - Thomas Keyes; a priori engelang/artlang spoken on Ung
  • 46,950 EDA/Edanic - Arne Arotnow; Euroclone based on Italian
  • 50,000 Kattish - Favour Benyin; spoken in Kattland, a fictional place on Earth; a priori with borrowings from immigrant languages
  • 51,831 Vedanic - Arne Arotnow; reform of Edanic
  • over 60,000 Interlingua - International Auxiliary Language Association; Euroclone
  • some 75,000 Neo - Arturo Alfandari; Euroclone
  • 77,000 Esperanto - Ludwig Lazarus Zamenhof; auxlang and the world's most successful conlang
  • over 83,000 xuxuxi - John Cowan; based on the same principles as Classical Yiklamu
  • 91,591 Classical Yiklamu - Mark P. Line; engelang with computer-generated vocabulary based on WordNet and no derivation
  • 100,000 M̄av̄ī - Imjustadudeontheinternet; language created with the sole purpose of having 100,000 words. Relex of Toki Pona with a huge number of synonyms for each word.
  • 105,000 Kankonian - James Landau; spoken on the planet Kankonia in the Lehola Galaxy

The M̄av̄ī Controversy

In March of 2024, Imjustadudeontheinternet completed M̄av̄ī, which had 100,000 words. Khemehekis, creator of Kankonian (the conlang with the largest non-computer-generated lexicon), made a post about it on the CBB thread devoted to lexicon milestones. The post stated: "After spending decades on top, Classical Yiklamu has been dethroned [...] M̄av̄ī has 100,000 words, all defined with a Toki Pona definition, and was created for the sole purpose of having sextuple digits in its lexicon." This text was accompanied by a link to the Conlang Fandom "Largest Conlangs" page and a wide-eyed emoji.

After this post, CBB users Visions1 and Arayaz congratulated Imjustadudeontheinternet and expressed their confidence that the title would eventually be Khemehekis's. User criminalmammal then posted: "There is no way this should count for anything. This person seems to have randomly generated their hundred thousand words and then assigned each of them a random, singular Toki Pona word: so it's a relex of Toki Pona where every word has seven hundred-odd synonyms, with no reason or nuance. It's a stunt and it seems sort of insulting considering the work that is put into languages that actually have thousands of dictionary entries."

A discussion ensued. Arayaz responded: "Wait, each one has *one* Toki Pona definition? I thought the person was just a hardcore Toki Pona fan and defined all their words in Toki Pona for fun. If it's a randomly generated relex, congratulations revoked. I suggest we update the requirements of the FrathWiki page." Pabappa then suggested that similar objections could be raised against Classical Yiklamu, the previous record-holder -- Classical Yiklamu was entirely computer-generated and filled in a wordlist heavily tied to English ("even to the point of having the word for 'bat' mean both an animal and a sports implement," as Pabappa notes).

Khemehekis then said that he believed Classical Yiklamu should still count, but that M̄av̄ī was "basically cheating" and that the requirements of the "Conlangs with over 10,000 words" FrathWiki page (this one) should be updated to exclude it. "I mean," noted Khemehekis, "it's taken me 27 years -- the length of Kurt Cobain or Jimi Hendrix's life -- to get Kankonian up to even 86,400 words."

Arayaz then proposed that relexes should be disallowed, noting that she believed Classical Yiklamu shouldn't count either. She also suggested that "Perhaps one could require distinctive grammar of the language; i.e. something doesn't count if it's only a lexicon/wordlist. (Could be too extreme?) Or the rule related to unmodified computer-generated outputs could be extended."

Khemehekis then proposed to "[...] disqualify computer-generated relexes of a wordlist/vocabulary if they are created for the express purpose of having huge lexica? That'd disqualify Mmavvii, and it would keep someone from piggybacking onto my Kankonian lexicon to create a conlang with even more words than Kankonian."

As of March 29, 2024, no further responses had been made in the thread, and the rules of the FrathWiki page had not been updated, so M̄av̄ī remained the #1 largest conlang lexicon. On May 8, 2024, Kankonian reached 91,592 words, surpassing Classical Yiklamu and knocking it down to third place among conlang lexicon sizes. The whole question became moot when, at 8:49 p.m. on September 1, 2024, Kankonian reached 100,001 words, taking the title of Number One from M̄av̄ī.

Slovio

I stumbled upon this page by accident and noticed that Slovio is listed with 44,000 words, a number that I believe was mentioned on Wikipedia a long time ago. However, I've done thorough research on the Slovio dictionary and found that this number is far from correct, because it is not the number of words, but the number of entries in an Excel file. To quote from my article "The Slovio Myth":

Numbers play an important role in Slovio propaganda, and the size of its dictionary is no exception. Most Wikipedia articles mention something about the number of words contained in the Slovio dictionary, an Excel file with currently ca. 65,000 entries. “More than some of the ‘natural’ languages”, says the main page. However, thorough analysis of the file yields the following results:

  • thousands of (English) words are not translated into Slovio at all
  • thousands of entries refer to internet domains, languages, geographic entities, inhabitants of countries (male, female and gender-neutral), corresponding adjectives, etc.
  • the plural of practically every noun is given as a separate entry, even though plurals are always formed regularly
  • likewise, many verb forms (for example, past tenses) are given as separate entries
  • it contains lots of doublets and endless numbers of synonyms: when six English words can be used to translate one Slovio word, it is listed six times
  • a lot of nouns are also given as adjectives (just by adding -ju), which often results in very strange words
  • it contains numerous sentences, expressions and nonsense words like Europju banan-soiuz.

In other words, the dictionary is stuffed with redundant entries that no “real” dictionary would ever include. If we remove these, we can conclude that Slovio’s actual word stock is somewhere in the range between 10,000 and 15,000. Still a respectable number, and definitely enough for a functional language – but not quite as impressive.

Cheers, IJzeren Jan (talk) 11:48, 31 March 2024 (PDT)
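
As a rough illustration of the kind of filtering described in the quoted passage above, the sketch below counts distinct headwords in a toy entry list. It is not the analysis that was actually run on the Slovio file: the sample entries, the entry format, the plural ending "s", and the function name count_headwords are all invented for this example.

```python
# All data here is invented for illustration; none of it comes from the Slovio file.
# Each entry is (english_gloss, conlang_form); an empty form means "not translated".
entries = [
    ("dog", "mira"),
    ("dogs", "miras"),   # regular plural listed as a separate entry
    ("hound", "mira"),   # second English gloss for the same word
    ("cat", "tolu"),
    ("cats", "tolus"),   # regular plural listed as a separate entry
    ("quark", ""),       # entry with no translation
]

PLURAL_SUFFIX = "s"  # assumed regular plural ending of the toy language


def count_headwords(entries, plural_suffix=PLURAL_SUFFIX):
    """Count distinct forms, ignoring untranslated entries and regular plurals."""
    # Drop untranslated entries and collapse repeated forms into a set.
    forms = {form for _, form in entries if form}
    # Drop any form that is another listed form plus the regular plural ending.
    headwords = {
        f for f in forms
        if not (f.endswith(plural_suffix) and f[: -len(plural_suffix)] in forms)
    }
    return len(headwords)


print(len(entries), count_headwords(entries))
```

On this toy data, six raw entries reduce to two headwords, which shows in miniature how an entry count can overstate a word count. A real analysis would also need to handle derived adjectives, verb forms, proper names, and phrases, which this sketch does not attempt.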