
What do you think about Abstract Wikipedia?

Wikifunctions is a new site that has been added to the list of sites operated by the Wikimedia Foundation (WMF). I definitely see uses for it in automating updates on Wikipedia and in bots (and also as a reference for programmers), but their goal is to translate Wikipedia articles to more languages by writing them in code that has a lot of linguistic information. I have mixed feelings about this, as I don't like existing programs that automatically generate articles (see the Cebuano and Dutch Wikipedias), and I worry that the system will be too complicated for average people.

25 comments
  • but their goal is to translate Wikipedia articles to more languages by writing them in code that has a lot of linguistic information

    That'll get unruly really fast.

    Languages simply don't agree on how to split the usage of words. Or grammatical case. Or if, when and how to do agreement.

    Just for the sake of example: how are they going to keep track of case in a way that doesn't break Hindi, or Basque, or English, or Guarani? Or grammatical gender for a word like "milk"? (Not even the Romance languages agree on it.) At a certain point, it simply gets easier to write the article in all those languages than to code something to make it for you.


    I think that the best use case is automating tidbits of frequently changing data. It's fairly limited, but it could be useful.

    • Languages simply don’t agree on how to split the usage of words. Or grammatical case. Or if, when and how to do agreement.

      Just for the sake of example: how are they going to keep track of case in a way that doesn’t break Hindi, or Basque, or English, or Guarani? Or grammatical gender for a word like “milk”? (Not even the Romance languages agree on it.) At a certain point, it simply gets easier to write the article in all those languages than to code something to make it for you.

      I don't know what the WMF is planning here but what you're pointing out is precisely what abstraction would solve.

      If you had an abstract way to represent a sentence, you would be independent of any one word order or case or whatever other grammatical feature. In the end you obviously do need actual sentences with these features. To get these, you'd build a mechanism that converts the abstract sentence representation into concrete sentences for specific languages, each correctly constructed according to that language's rules.

      Same with gender. What you'd store would not be that e.g. some German sentence is talking about the feminine milk, but rather that it's talking about the abstract concept of milk. How exactly that abstract concept is represented in words would then be up to individual languages to decide.

      I have absolutely no idea whether what I'm talking about here would be practical to implement, but in theory it could work.
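
      To make the idea concrete, here's a toy sketch in Python. It is not how Wikifunctions or the WMF does it; every name in it (`Predication`, `NOUNS`, `render`, and so on) is made up for illustration, and it only covers one sentence shape ("the X is Y"). The abstract sentence stores concepts like `MILK` and `COLD`, and each language looks up its own word, gender and agreement.

```python
from dataclasses import dataclass

# Hypothetical per-language tables. Nouns map a concept to (word, gender);
# gender is None for languages that don't mark it here.
NOUNS = {
    "en": {"MILK": ("milk", None)},
    "es": {"MILK": ("leche", "f")},
    "pt": {"MILK": ("leite", "m")},
    "de": {"MILK": ("Milch", "f")},
}
# Definite articles keyed by the noun's gender.
ARTICLES = {
    "en": {None: "the"},
    "es": {"f": "la", "m": "el"},
    "pt": {"f": "a", "m": "o"},
    "de": {"f": "die", "m": "der", "n": "das"},
}
# Predicative adjective forms keyed by gender (German doesn't inflect here).
ADJECTIVES = {
    "en": {"COLD": {None: "cold"}},
    "es": {"COLD": {"f": "fría", "m": "frío"}},
    "pt": {"COLD": {"f": "fria", "m": "frio"}},
    "de": {"COLD": {"f": "kalt", "m": "kalt", "n": "kalt"}},
}
COPULA = {"en": "is", "es": "está", "pt": "está", "de": "ist"}


@dataclass(frozen=True)
class Predication:
    """Abstract sentence: 'the <subject> is <property>', with no words in it."""
    subject: str
    property: str


def render(p: Predication, lang: str) -> str:
    """Turn the abstract sentence into a concrete one for a single language."""
    noun, gender = NOUNS[lang][p.subject]
    article = ARTICLES[lang][gender]
    adjective = ADJECTIVES[lang][p.property][gender]
    sentence = f"{article} {noun} {COPULA[lang]} {adjective}."
    return sentence[0].upper() + sentence[1:]


cold_milk = Predication(subject="MILK", property="COLD")
for lang in ("en", "es", "pt", "de"):
    print(lang, render(cold_milk, lang))
# en The milk is cold.
# es La leche está fría.
# pt O leite está frio.
# de Die Milch ist kalt.
```

      Note that the gender of "milk" never appears in `cold_milk` itself; it lives in each language's table, which someone still has to fill in.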

      • Abstractions are not magic, and they cannot make info appear out of nowhere. Somewhere inside that abstraction you'll need to have the pieces of info that Spanish "leche" [milk] is feminine, that Zulu "ubisi" [milk] is class 11, that English uses the accusative form in predicative position ("it's me"), and so on.

        And you'll need people to mark a multitude of distinctions in their sentences, when writing them down, that the abstraction layer would demand for other languages. Such as tagging the "I" in "I see a boy" as "+masculine, +older-person, +informal" so Japanese correctly conveys it as "ore" instead of "boku", "atashi", "watashi", etc.

        Even the idea of an "abstract concept of milk" doesn't work as well as it sounds, because languages will split even the abstract concepts in different ways. For example, does the abstract concept associated with a living pig include its flesh?

        And the language itself cannot decide those things. A language is not an agent; it doesn't "do" something. You'd need people to actively insert those pieces of info for each language. That's perhaps doable for the most spoken ones, but those are the ones that would benefit the least from this.
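
        A toy Python sketch of the tagging problem, using the "I" example from above. Everything here is hypothetical (`REQUIRED_FEATURES`, `JA_PRONOUNS`, `first_person`), and the pronoun table is a crude simplification that just follows the example, not a description of real Japanese usage. The point is only that the Japanese renderer cannot run on input that was perfectly sufficient for English.

```python
# Hypothetical: which speaker features each language's renderer demands
# before it can pick a first-person pronoun.
REQUIRED_FEATURES = {
    "en": set(),
    "ja": {"gender", "age", "formality"},
}

# Toy table keyed by (gender, age, formality); anything else falls back
# to "watashi". A deliberate oversimplification for illustration.
JA_PRONOUNS = {
    ("masculine", "older", "informal"): "ore",
    ("masculine", "younger", "informal"): "boku",
    ("feminine", "younger", "informal"): "atashi",
}


def first_person(lang: str, speaker: dict) -> str:
    """Pick the word for 'I', refusing to guess when tags are missing."""
    missing = REQUIRED_FEATURES[lang] - set(speaker)
    if missing:
        raise ValueError(f"{lang}: speaker must be tagged with {sorted(missing)}")
    if lang == "en":
        return "I"
    key = (speaker["gender"], speaker["age"], speaker["formality"])
    return JA_PRONOUNS.get(key, "watashi")


# An untagged speaker is fine for English...
print(first_person("en", {}))  # I

# ...but the same abstract input is not enough for Japanese.
try:
    first_person("ja", {})
except ValueError as err:
    print(err)

# Only after the writer adds tags that English never needed does it work.
tagged = {"gender": "masculine", "age": "older", "formality": "informal"}
print(first_person("ja", tagged))  # ore
```

        Multiply that by every distinction that any supported language makes, and that's the amount of tagging a writer would face.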

    • They're just going to write all the articles in lojban.

      • Not even that would do the trick - practical usage of Lojban relies heavily on fu'ivla, which carry with them the semantic scope assigned to the original words. .u'i I want to see them try, though.

    • I'll reply to myself to highlight a point, and issue a challenge for those who assume that WMF's apparent goal - to translate Wikipedia articles to more languages by writing them in code that has a lot of linguistic information - is actually viable:

      Here's an excerpt from an actual Wikipedia article: "the solubility of these gases depending on the temperature and salinity of the water." Show me all the linguistic information that a writer would need to input, to convey the same information, in that system idealised by the goal, in a way that would not output "then who was phone?" tier nonsense for some languages. Then I'll show you why it would still output nonsense for some languages.

      Too much work? Then feel free to do it just for "of the water". It's a single PP, how hard would it be? /s

      Hic Rhodus, hic salta.

      [Edit reason: clarity.]
