Graph of the day: Tweets in translation

[Chart: 20120331_WOC931]

From The Economist. h/t @SultanAlQassemi

2 Responses

  1. Indonesian takes up 20% more space in translation, but I'm not sure that holds for tweets. Reduplicated words are simply abbreviated (e.g. hidup-hidup becomes hidup2), and because of the consistency and syllable-stressed nature of Indonesian, vowels can easily be dropped (tentang becomes ttg). A lot of meaning can be packed into a single tweet, so it's not surprising that Indonesians have taken to the platform so heavily. Chinese wins this hands down, but other languages aren't necessarily as disadvantaged as the graph might imply.

  2. A better measure would be bits. English fits comfortably in 7-bit ASCII characters, while Latin-script languages with accented letters, such as French or Spanish, need 8-bit characters (e.g. ISO-8859-1). In UTF-8, Arabic characters take 16 bits each and common Chinese characters take 24 bits. So a 100-character English message is 700 bits, or 800 bits if it is stored as 8-bit characters anyway. The same message in Arabic needs 86 characters × 16 bits = 1,376 bits, and in Chinese 31 characters × 24 bits = 744 bits. Measured in bits, the tweet sizes look quite different.
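The bits comparison can be checked by encoding sample characters and measuring them, rather than assuming a fixed width per script. Below is a minimal sketch, assuming UTF-8 as the encoding and assuming every character in a message costs the same as one representative letter (ARABIC LETTER AIN for Arabic, 中 for Chinese). The character counts (100, 86, 31) are the ones quoted in the comment; the function names are hypothetical.

```python
# Character counts for the same message, as quoted in the comment
# (English 100, Arabic 86, Chinese 31). The sample letters are
# illustrative stand-ins, assumed to be typical of each script.
SAMPLES = {
    "English": ("a", 100),
    "Arabic": ("\u0639", 86),   # ARABIC LETTER AIN
    "Chinese": ("\u4e2d", 31),  # CJK ideograph 中
}


def utf8_bits_per_char(ch: str) -> int:
    """Bits needed to store one character when encoded as UTF-8."""
    return len(ch.encode("utf-8")) * 8


def message_bits(sample_char: str, char_count: int) -> int:
    """Approximate message size in bits, assuming every character
    costs the same as the sample character."""
    return utf8_bits_per_char(sample_char) * char_count


if __name__ == "__main__":
    for language, (ch, count) in SAMPLES.items():
        per_char = utf8_bits_per_char(ch)
        total = message_bits(ch, count)
        print(f"{language}: {count} chars x {per_char} bits = {total} bits")
```

Running it prints 800 bits for English, 1376 for Arabic and 744 for Chinese. Because UTF-8 stores ASCII in whole bytes, the English figure is the 8-bit one rather than the 7-bit one; a real message with mixed scripts, spaces or punctuation would come out somewhat different.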