Over the last couple of weeks, I’ve been tinkering with PHP’s gettext to set up internationalisation for one of my web apps (i.e. getting it ready for translation into different languages). Even though there were many step-by-step guides and Stack Overflow topics on the web, all detailing a similar set of instructions, following them did not get things working for me.
After some frustration and a lot of time tinkering, it turns out that these guides were missing some pieces of information. If you are tearing your hair out troubleshooting PHP gettext, this article might be just what you’re looking for.
Introduction
PHP’s gettext behaves quite differently depending on which operating system you use, because it is closely tied to how the operating system handles localisation. Hence, this article has been separated into a few different sections, so you can scroll directly to the one for your platform.
General Tips
Here are some tricks that you can try, regardless of which operating system you are using.
Make sure gettext is installed and enabled on PHP
If you don’t have gettext installed and enabled on PHP, you won’t be able to use it. To check for this, you can run phpinfo() on any one of your pages.
In the information output by phpinfo(), use Ctrl + F to search for “gettext”. If you find a gettext section reporting that GetText support is enabled, you can skip on to the next section.
Otherwise, you’ll have to install gettext and enable it. If you’re on XAMPP, your PHP installation should already have gettext — you’ll only need to enable it. Find php.ini
on XAMPP, and search for this line:
;extension=gettext
Note: For XAMPP users, php.ini
should be in C:\xampp\php\php.ini
.
You’ll need to uncomment it by removing the ; in front to enable gettext. If you don’t find the line, you’ll need to add it in (preferably to the extensions section). Restart Apache afterwards so that the change takes effect.
On Ubuntu (and other Linux-based systems), you’ll also need to check if you have the php-gettext
package installed. As different distributions use different package managers, the commands are going to be slightly different amongst them. For Ubuntu 18.04 and 20.04, you’ll use this command:
apt install php-gettext
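To confirm that the extension really is loaded after these steps, without scanning the whole phpinfo() output, a quick check along these lines should work. It is only a minimal sketch: gettext_is_available() is a name I made up for illustration, not something PHP provides.

```php
<?php
// Hypothetical helper (not part of PHP): reports whether gettext is usable.
function gettext_is_available(): bool
{
    // extension_loaded() checks the list of loaded extensions;
    // function_exists() confirms that gettext() itself can be called.
    return extension_loaded('gettext') && function_exists('gettext');
}

echo gettext_is_available()
    ? "gettext is enabled\n"
    : "gettext is NOT enabled\n";
```

Run it through the web server rather than the command line, since the two can read different php.ini files.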
Set the language for your HTML document
The opening <html>
tag is capable of taking a lang
attribute, which is meant to specify what language the HTML page is in, e.g. <html lang="en-US">
.
For your translated pages, you need to make sure that this lang
attribute is specified and updated accordingly.
Language codes are what goes into the attribute. You can read this topic on Stack Overflow to find a pretty comprehensive list. Below is a table with some language codes, as well as the locales they represent.
Language family | Language tag | Language variant |
---|---|---|
English | en-GB | English (Great Britain) |
English | en-US | English (United States) |
English | en-CA | English (Canada) |
English | en-IN | English (India) |
English | en-SG | English (Singapore) |
Chinese | zh-CN | Simplified Chinese (China) |
Chinese | zh-SG | Simplified Chinese (Singapore) |
Chinese | zh-TW | Traditional Chinese (Taiwan) |
French | fr-BE | French (Belgium) |
French | fr-CH | French (Switzerland) |
French | fr-FR | French (France) |
Portuguese | pt-PT | Portuguese (Portugal) |
Portuguese | pt-BR | Portuguese (Brazil) |
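Note that HTML language tags use a hyphen (zh-CN), while the locale names passed to setlocale() use an underscore and may carry a character set extension (zh_CN.utf-8). If you keep only the locale name around, a small conversion like the one below can fill in the lang attribute. This is a sketch under my own assumptions; locale_to_html_lang() is a hypothetical helper name, not a PHP function.

```php
<?php
// Hypothetical helper: turns a locale name such as "zh_CN.utf-8"
// into an HTML language tag such as "zh-CN".
function locale_to_html_lang(string $locale): string
{
    // Drop the character set extension (".utf-8") and any modifier ("@euro").
    $base = preg_split('/[.@]/', $locale)[0];
    // HTML language tags use hyphens where locale names use underscores.
    return str_replace('_', '-', $base);
}

$locale = 'zh_CN.utf-8';
echo '<html lang="' . htmlspecialchars(locale_to_html_lang($locale)) . '">' . "\n";
```

With the value above, this prints an opening tag with lang="zh-CN".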
Specify your character set using bind_textdomain_codeset()
This is especially important for people using Linux-based distributions like Ubuntu, who have to install the right locales for gettext to work. It’s also mostly unmentioned in a lot of the topics showing you how to set up translated pages.
In the part of your PHP setup code where you call setlocale()
and bindtextdomain()
, you also need to call bind_textdomain_codeset()
to specify the character set that your MO
translation file is using.
You should find the snippet below in many guides online, but most of them leave out the bind_textdomain_codeset() call.
$lang = $_GET['lang'];

// Set the language.
if (defined('LC_MESSAGES')) {
    // For Linux-based distributions.
    setlocale(LC_MESSAGES, $lang);
} else {
    // For Windows.
    putenv('LC_ALL=' . $lang);
}

// Sets the folder where the messages textdomain will check for translation files.
bindtextdomain('messages', realpath(APPPATH . 'language'));

// Specifies the character set that the translation file uses.
bind_textdomain_codeset('messages', 'UTF-8');

// Set the text domain to use.
textdomain('messages');
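One thing worth ruling out early is a misplaced MO file. gettext looks for the file at LOCALE/LC_MESSAGES/DOMAIN.mo inside the folder given to bindtextdomain(). The sketch below simply builds that path and reports whether the file exists. The helper name expected_mo_path(), the language folder next to the script and the zh_CN folder name are my own assumptions for illustration.

```php
<?php
// Hypothetical helper: builds the path where gettext is expected to look
// for a compiled translation file.
function expected_mo_path(string $baseDir, string $locale, string $domain = 'messages'): string
{
    // Trim a trailing slash or backslash so the path is not doubled up.
    return rtrim($baseDir, '/\\') . "/$locale/LC_MESSAGES/$domain.mo";
}

// Assumption: translations live in a "language" folder next to this script.
$path = expected_mo_path(__DIR__ . '/language', 'zh_CN');
echo $path, file_exists($path) ? " (found)\n" : " (missing)\n";
```

If the file is reported missing, fix the folder layout before looking at locales or character sets.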
XAMPP on Windows
Many people online have run into issues getting gettext to work on XAMPP, apparently because Windows does not natively support gettext. Whatever the reasons are, doing the two things below worked for me:
Make sure gettext and iconv are installed
For gettext to work on Windows, you’ll need to install gettext and iconv for Windows. This comes with the added benefit of installing gettext’s command line binaries onto your device, so you can use them in both the Command Prompt and Windows PowerShell.
Note: You have to install the application with the installer. Just downloading the ZIP file and adding it to your environment paths won’t work.
Set the LANG environment variable in the XAMPP shell
Finally, to get gettext working in XAMPP, you need to set the LANG environment variable in the XAMPP shell every time you want to use XAMPP, before you fire up the XAMPP Control Panel.
The XAMPP shell is right inside the install folder, which means it should be at C:\xampp\xampp_shell.bat
for most people. Run it (preferably as an Administrator), and type the following:
set LANG=YOUR_LANGUAGE_CODE
YOUR_LANGUAGE_CODE
needs to be replaced with the appropriate language code of your localisation. For example, if my localisation was for Simplified Chinese, it would look like this:
set LANG=zh_CN
Once done, use the following command to open the XAMPP Control Panel from the shell.
xampp-control
Note: You have to close (i.e. Right-click > Quit) out of all instances of the XAMPP Control Panel before you set LANG
in the XAMPP shell. The command does not retroactively apply to instances of XAMPP that are already open.
In my case, setting up XAMPP in this way gets the translations for only one particular language working. I have to quit XAMPP, and then set the LANG
variable again on XAMPP shell every time I want to change languages. It’s fine for me because I use XAMPP purely for testing, but if you’re using it to host a live site, you’re obviously going to need to find a way around this.
If you’ve found a solution to this issue, please share it in the comments below. I will appreciate it very much!
Ubuntu (18.04 and 20.04)
On Ubuntu, and probably many other Linux-based distributions too, gettext works reliably. Where people run into problems is getting the locales of the machine properly set up.
Make sure the correct locales are installed
You’ve probably seen this one in a couple of Stack Overflow topics addressing this problem — for your server to support a particular language, you need to generate the localisation file for that particular language first. So, general advice goes like this:
- Use locale -a to see if the language you are supporting is already installed.
- If it is not, use locale-gen YOUR_LANGUAGE to generate it.
Hence, if I wanted to add the locale files for Simplified Chinese, I would type in the following commands (the lines starting with $):
$ locale -a
C
C.UTF-8
POSIX
en_US.utf8
$ locale-gen zh_CN
Generating locales (this might take a while)…
zh_CN.GB2312… done
Generation complete.
This generates a locale file for zh_CN, which you can now see in the output of locale -a (the last two entries below):
$ locale -a
C
C.UTF-8
POSIX
en_US.utf8
zh_CN
zh_CN.gb2312
While this generates the correct locale files, it may not be enough for gettext to work. This is because the locale file generated may not match the character set used in your MO
file. The zh_CN
localisation, for instance, has 4 different variants:
- zh_CN (GB2312)
- zh_CN.GB18030
- zh_CN.GBK
- zh_CN.UTF-8
Using locale-gen zh_CN generates the default variant, which uses the GB2312 character set. That means if your…
- HTML page;
- MO file, and;
- bind_textdomain_codeset() call
…are not using the GB2312 character set, then your translations will either not display, be intermittently displayed, or be displayed wrongly. Hence, instead of using locale-gen, I suggest using dpkg-reconfigure locales to generate locale files. This gives you a list of all possible character sets a locale can use during installation.
The interface is, admittedly, much more cumbersome than locale-gen. You use the arrow keys to scroll through the options, Space to select or deselect a locale, Enter to finish, and Esc to cancel.
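Since the HTML page, the MO file and bind_textdomain_codeset() all have to agree on one character set, it can help to define that character set once in PHP and reuse it. The following is a rough sketch of the idea rather than a required setup: the APP_CHARSET constant and the content_type_header() helper are names I invented for this example.

```php
<?php
// Assumption: one character set for the whole app, defined in one place.
define('APP_CHARSET', 'UTF-8');

// Hypothetical helper: builds the Content-Type header value for HTML pages.
function content_type_header(string $charset): string
{
    return "Content-Type: text/html; charset=$charset";
}

// 1. HTML page: declare the character set in the HTTP header
//    (a matching <meta charset="..."> tag goes into the page itself).
if (!headers_sent()) {
    header(content_type_header(APP_CHARSET));
}

// 2. MO file: its character set comes from the PO file it was compiled
//    from, so I keep that file saved in the same character set.

// 3. gettext: request translated strings in that character set.
if (function_exists('bind_textdomain_codeset')) {
    bind_textdomain_codeset('messages', APP_CHARSET);
}
```

Changing the character set then means editing one line instead of hunting through templates and setup code.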
Personally, I always use the UTF-8 character set for all locales I install, because this means I don’t need to worry about changing the character set in my HTML pages and PHP code for different languages. Not all locales use UTF-8 as the default though, which means that I also need to…
Specify the character set extension in setlocale()
This section explores concepts which are a continuation of the section above, so be sure to give it a read first.
When specifying your language using setlocale()
in your PHP code, it is best to also specify the character set extension of the locale, like so:
setlocale(LC_MESSAGES, 'en_US.utf-8');
Note that the extensions must be spelled exactly as locale -a
displays them, down to the capitalisation of letters.
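PHP can also tell you whether a particular spelling is accepted, because setlocale() returns false when the requested locale cannot be loaded. The sketch below wraps that in a small probe. locale_is_available() is a hypothetical helper of mine, and the two candidate names are just examples.

```php
<?php
// Hypothetical helper: reports whether the system can load a given locale.
function locale_is_available(string $locale): bool
{
    // Passing "0" returns the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');
    // setlocale() returns false when the locale is not installed.
    $ok = setlocale(LC_CTYPE, $locale) !== false;
    // Restore whatever was active before the probe.
    setlocale(LC_CTYPE, $previous);
    return $ok;
}

foreach (['zh_CN', 'zh_CN.utf8'] as $candidate) {
    echo $candidate, ': ', locale_is_available($candidate) ? 'installed' : 'missing', "\n";
}
```

A locale reported as missing here will not work with gettext either, whatever the MO file contains.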
If you don’t specify the character set extension, then the default character set for the language will be used. For example, zh_CN is the same as zh_CN.gb2312, because GB2312 is the default character set for zh_CN. The default is not always UTF-8: on Ubuntu, a bare en_US normally refers to the ISO-8859-1 variant rather than en_US.utf8, so spelling out the extension avoids surprises.
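If you are unsure which spelling your server expects, setlocale() also accepts an array of candidates and uses the first one that works. Here is a minimal sketch of that approach. set_message_locale() is a hypothetical wrapper, and the list of spellings is only an example.

```php
<?php
// Hypothetical helper: tries several spellings of the same locale and
// returns the one the system accepted, or false if none could be set.
function set_message_locale(array $candidates)
{
    // LC_MESSAGES may be undefined (the Windows case in this article),
    // so fall back to LC_ALL when it is missing.
    $category = defined('LC_MESSAGES') ? LC_MESSAGES : LC_ALL;
    // Given an array, setlocale() tries each entry in order.
    return setlocale($category, $candidates);
}

$active = set_message_locale(['zh_CN.utf8', 'zh_CN.UTF-8', 'zh_CN.utf-8']);
echo $active === false ? "No matching locale installed\n" : "Using $active\n";
```

Logging the returned value is a cheap way to see which locale, if any, actually got applied.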
Conclusion
This article was written because there was implementation information about gettext that was not documented on the net. Hence, if you find any errors, or if there is any missing information, please feel free to contribute using the comments section at the bottom. Your help will be greatly appreciated!
If you’re still stuck after following my guide, feel free to drop a question in the comments too. I’ll help you out as much as I’m able.
As a bonus (and reference material to other developers out there), here is my PHP script for setting up gettext:
$lang = $_GET['lang'];

// Listing all the languages I'm supporting.
$lang_arr = array(
    'zh_CN' => 'zh_CN.utf-8',
    'zh_TW' => 'zh_TW.utf-8'
);

// If the language requested is supported..
if (isset($lang_arr[$lang])) {

    // Set the language.
    if (defined('LC_MESSAGES')) {
        // For Linux-based distributions.
        setlocale(LC_MESSAGES, $lang_arr[$lang]);
    } else {
        // For Windows
        putenv('LC_ALL=' . $lang);
    }

    // Sets the folder where the messages textdomain will check for translation files.
    bindtextdomain('messages', realpath(APPPATH . 'language'));

    // Specifies the character set that the translation file uses.
    bind_textdomain_codeset('messages', 'UTF-8');

    // Set the text domain.
    textdomain('messages');
}
Happy coding!