Troubleshooting PHP gettext

Over the last couple of weeks, I’ve been tinkering with PHP’s gettext to set up internationalisation for one of my web apps (i.e. getting it ready for translation into different languages). Even though there were many step-by-step guides and Stack Overflow topics on the web, all detailing a similar set of instructions, following them did not get things working for me.

After some frustration and a lot of time tinkering, it turned out that these guides were missing some pieces of information.

Introduction

PHP’s gettext behaves quite differently depending on which operating system you use, because it is closely tied to how the operating system handles localisation. Hence, this article has been separated into a few different sections, so you can scroll directly to them if you are only interested in a specific platform:

  1. General Tips
    1. Make sure gettext is installed and enabled on PHP
    2. Set the language for your HTML document
    3. Specify your character set using bind_textdomain_codeset()
  2. XAMPP on Windows
    1. Make sure gettext and iconv are installed
    2. Set the LANG attribute on XAMPP shell
  3. Ubuntu (18.04 and 20.04)
    1. Make sure the correct locales are installed
    2. Specify the character set extension in setlocale()
  4. Conclusion
    1. Code Snippet

General Tips

Here are some tricks that you can try, regardless of which operating system you are using.

Make sure gettext is installed and enabled on PHP

If you don’t have gettext installed and enabled on PHP, you won’t be able to use it. To check for this, you can run phpinfo() on any one of your pages.

In the information output by phpinfo(), use Ctrl + F to search for “gettext”. If you find a section like the one pictured below, you can skip onto the next section.

gettext in phpinfo()
Make sure that the text beside gettext says enabled.
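
If you would rather check from code than scroll through phpinfo(), a minimal sketch along these lines should do the same job. The helper name gettext_status() is made up for this example; extension_loaded() and function_exists() are standard PHP functions.

```php
<?php
// Report whether the gettext extension is usable from this PHP installation.
// gettext_status() is a hypothetical helper name used only in this sketch.
function gettext_status(): string
{
    if (!extension_loaded('gettext')) {
        return 'gettext extension is NOT loaded';
    }
    if (!function_exists('bindtextdomain')) {
        return 'gettext is loaded, but bindtextdomain() is missing';
    }
    return 'gettext is loaded and ready';
}

echo gettext_status(), PHP_EOL;
```

Run it through the web server rather than the command line if you can, since the two can load different php.ini files.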

Otherwise, you’ll have to install gettext and enable it. If you’re on XAMPP, your PHP installation should already have gettext — you’ll only need to enable it. Find php.ini on XAMPP, and search for this line:

;extension=gettext

Note: For XAMPP users, php.ini should be in C:\xampp\php\php.ini.

You’ll need to uncomment it by removing the ; in front to enable gettext. If you don’t find the line, you’ll need to add it in (preferably to the extensions section).

On Ubuntu (and other Linux-based systems), you’ll also need to check if you have the php-gettext package installed. As different distributions use different package managers, the commands are going to be slightly different amongst them. For Ubuntu 18.04 and 20.04, you’ll use this command:

apt install php-gettext

Set the language for your HTML document

The opening <html> tag is capable of taking a lang attribute, which is meant to specify what language the HTML page is in, e.g. <html lang="en-US">.

For your translated pages, you need to make sure that this lang attribute is specified and updated accordingly.

Language codes are what goes into the attribute. You can read this topic on Stack Overflow to find a pretty comprehensive list. Below is a table with some language codes, as well as the locales they represent.

Language family   Language tag   Language variant
English           en-GB          English (Great Britain)
English           en-US          English (United States)
English           en-CA          English (Canada)
English           en-IN          English (India)
English           en-SG          English (Singapore)
Chinese           zh-CN          Simplified Chinese (China)
Chinese           zh-SG          Simplified Chinese (Singapore)
Chinese           zh-TW          Traditional Chinese (Taiwan)
French            fr-BE          French (Belgium)
French            fr-CH          French (Switzerland)
French            fr-FR          French (France)
Portuguese        pt-PT          Portuguese (Portugal)
Portuguese        pt-BR          Portuguese (Brazil)
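
Note that HTML language tags use a hyphen (zh-CN), while the locale names used with gettext further down use an underscore and may carry a character set extension (zh_CN.utf-8). Here is a rough sketch of converting one into the other, so the lang attribute can follow whatever locale you have selected. The helper name locale_to_html_lang() and the sample value are assumptions made for this example.

```php
<?php
// Convert a gettext-style locale name (e.g. "zh_CN.utf-8") into an HTML
// language tag (e.g. "zh-CN"). locale_to_html_lang() is a hypothetical helper.
function locale_to_html_lang(string $locale): string
{
    // Drop the character set extension (".utf-8") and any modifier ("@euro").
    $base = preg_replace('/[.@].*$/', '', $locale);
    // HTML language tags separate language and region with a hyphen.
    return str_replace('_', '-', $base);
}

$lang = 'zh_CN.utf-8'; // Assumed value, for illustration only.
echo '<html lang="' . htmlspecialchars(locale_to_html_lang($lang)) . '">', PHP_EOL;
```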

Specify your character set using bind_textdomain_codeset()

This is especially important for people using Linux-based distributions like Ubuntu, who have to install the right locales for gettext to work. It’s also mostly unmentioned in a lot of the topics showing you how to set up translated pages.

In the part of your PHP setup code where you call setlocale() and bindtextdomain(), you also need to call bind_textdomain_codeset() to specify the character set that your MO translation file is using.

You should find the snippet below in many guides online, but most of them do not mention the bind_textdomain_codeset() call.

// Fall back to an empty string if no language was requested.
$lang = $_GET['lang'] ?? '';

// Set the language.
if(defined('LC_MESSAGES')) {
	// For Linux-based distributions.
	setlocale(LC_MESSAGES, $lang);
} else {
	// For Windows.
	putenv('LC_ALL=' . $lang);
}

// Sets the folder where the messages textdomain will check for translation files.
bindtextdomain('messages', realpath(APPPATH . 'language'));

// Specifies the character set that the translation file uses.
bind_textdomain_codeset('messages', 'UTF-8');

// Set the text domain to use.
textdomain('messages');
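
One thing the setup above takes for granted is where the MO file lives. gettext looks for it at <folder>/<locale>/LC_MESSAGES/<domain>.mo, where <folder> is the path given to bindtextdomain(). The sketch below builds that path and reports whether the file is there. The helper name expected_mo_path() and the language folder next to the script are assumptions for this example; the locale folder is typically named without the character set extension (e.g. zh_CN).

```php
<?php
// Build the path where gettext will look for a compiled translation file.
// expected_mo_path() is a hypothetical helper; the directory layout
// <base>/<locale>/LC_MESSAGES/<domain>.mo is the one gettext uses.
function expected_mo_path(string $baseDir, string $locale, string $domain): string
{
    return rtrim($baseDir, '/\\') . "/$locale/LC_MESSAGES/$domain.mo";
}

// Assumption: translations are kept in a "language" folder next to this script.
$path = expected_mo_path(__DIR__ . '/language', 'zh_CN', 'messages');
echo $path, is_readable($path) ? ' (found)' : ' (missing)', PHP_EOL;
```

If the file is reported as missing, gettext quietly returns the original, untranslated string, which is easy to mistake for a locale problem.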

XAMPP on Windows

Many people online have run into issues getting gettext to work on XAMPP, apparently because Windows does not natively support gettext. Whatever the reasons are, doing the two things below worked for me:

Make sure gettext and iconv are installed

For gettext to work on Windows, you’ll need to install gettext and iconv for Windows. This comes with the added benefit of installing gettext’s command line binaries onto your device, so you can use them in both the Command Prompt and Windows PowerShell.

Note: You have to install the application with the installer. Just downloading the ZIP file and adding it to your environment paths won’t work.

Set the LANG attribute on XAMPP shell

Finally, to get gettext working in XAMPP, you need to set the LANG environment variable in the XAMPP shell, and then start the XAMPP Control Panel from that same shell. You have to do this every time you want to use XAMPP.

The XAMPP shell is right inside the install folder, which means it should be at C:\xampp\xampp_shell.bat for most people. Run it (preferably as an Administrator), and type the following:

set LANG=YOUR_LANGUAGE_CODE

YOUR_LANGUAGE_CODE needs to be replaced with the appropriate language code of your localisation. For example, if my localisation was for Simplified Chinese, it would look like this:

set LANG=zh_CN

Once done, use the following command to open the XAMPP Control Panel from the shell.

xampp-control

Note: You have to close (i.e. Right-click > Quit) out of all instances of the XAMPP Control Panel before you set LANG in the XAMPP shell. The command does not retroactively apply to instances of XAMPP that are already open.
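
To confirm that the value you set in the shell actually reached PHP, you can print the locale-related environment variables from a test page. This is only a diagnostic sketch: the helper name locale_env() is made up, and which of these variables matter on your system is something I am assuming rather than stating as fact.

```php
<?php
// Collect the locale-related environment variables that PHP can see.
// locale_env() is a hypothetical helper name used only in this sketch.
function locale_env(): array
{
    $result = [];
    foreach (['LANG', 'LANGUAGE', 'LC_ALL', 'LC_MESSAGES'] as $name) {
        $value = getenv($name);
        // getenv() returns false when the variable does not exist.
        $result[$name] = ($value === false) ? '(not set)' : $value;
    }
    return $result;
}

foreach (locale_env() as $name => $value) {
    echo $name, '=', $value, PHP_EOL;
}
```

If LANG shows up as not set, the Control Panel was most likely started from outside the XAMPP shell.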

In my case, setting up XAMPP in this way gets the translations for only one particular language working. I have to quit XAMPP, and then set the LANG variable again on XAMPP shell every time I want to change languages. It’s fine for me because I use XAMPP purely for testing, but if you’re using it to host a live site, you’re obviously going to need to find a way around this.

If you’ve found a solution to this issue, please share it in the comments below. I will appreciate it very much!


Ubuntu (18.04 and 20.04)

On Ubuntu, and probably many other Linux-based distributions too, gettext works reliably well. Where people run into problems is getting the locales of the machine properly set up.

Make sure the correct locales are installed

You’ve probably seen this one in a couple of Stack Overflow topics addressing this problem — for your server to support a particular language, you need to generate the localisation file for that particular language first. So, general advice goes like this:

  1. Use locale -a to see if the language you are supporting is already installed.
  2. If it is not, use locale-gen YOUR_LANGUAGE to generate it.

Hence, if I wanted to add the locale files for Simplified Chinese, I would type in the following commands (the lines starting with $):

$ locale -a
C
C.UTF-8
POSIX
en_US.utf8
$ locale-gen zh_CN
Generating locales (this might take a while)…
zh_CN.GB2312… done
Generation complete.

This generates a locale file for zh_CN, which you can now see in the output of locale -a (the last two entries below):

$ locale -a
C
C.UTF-8
POSIX
en_US.utf8
zh_CN
zh_CN.gb2312
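
The same check can be done from PHP, which is handy when you do not have shell access to the server. The sketch below compares a wanted locale against the output of locale -a, treating UTF-8 and utf8 as the same spelling. Both helper names are invented for this example, and it assumes the locale command exists and that exec() has not been disabled.

```php
<?php
// Normalise a locale name so that "zh_CN.UTF-8" and "zh_CN.utf8" compare equal.
// Both helpers here are hypothetical names used only in this sketch.
function normalise_locale(string $locale): string
{
    $parts = explode('.', $locale, 2);
    if (count($parts) === 1) {
        return $parts[0];
    }
    // Lowercase the character set and drop hyphens: "UTF-8" becomes "utf8".
    return $parts[0] . '.' . str_replace('-', '', strtolower($parts[1]));
}

// Check whether a locale appears in a list such as the output of `locale -a`.
function locale_installed(string $wanted, array $installed): bool
{
    $normalised = array_map('normalise_locale', $installed);
    return in_array(normalise_locale($wanted), $normalised, true);
}

// Assumption: the `locale` command exists and exec() is not disabled.
exec('locale -a', $installed);
var_dump(locale_installed('zh_CN.UTF-8', $installed));
```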

While this generates the correct locale files, it may not be enough for gettext to work. This is because the locale file generated may not match the character set used in your MO file. The zh_CN localisation, for instance, has 4 different variants:

  1. zh_CN (GB2312)
  2. zh_CN.GB18030
  3. zh_CN.GBK
  4. zh_CN.UTF-8

Using locale-gen zh_CN generates the default variant, which uses the GB2312 character set. That means if your…

  1. HTML page;
  2. MO file, and;
  3. bind_textdomain_codeset()

…are not using the GB2312 character set, then your translations will either not display, be intermittently displayed, or be displayed wrongly. Hence, instead of using locale-gen, I suggest using dpkg-reconfigure locales to generate locale files. This gives you a list of all possible character sets a locale can use during installation.

dpkg-reconfigure locales command
It’s cumbersome, but it also offers much more control.

The interface is, admittedly, much more cumbersome than locale-gen. You use the arrow keys to scroll through the options, Space to select or deselect a locale, Enter to finish, and Esc to cancel.

Personally, I always use the UTF-8 character set for all locales I install, because this means I don’t need to worry about changing the character set in my HTML pages and PHP code for different languages. Not all locales use UTF-8 as the default though, which means that I also need to…

Specify the character set extension in setlocale()

This section explores concepts which are a continuation of the section above, so be sure to give it a read first.

When specifying your language using setlocale() in your PHP code, it is best to also specify the character set extension of the locale, like so:

setlocale(LC_MESSAGES, 'en_US.utf-8');

Note that the extensions must be spelled exactly as locale -a displays them, down to the capitalisation of letters.

If you don’t specify the character set extension, then the default character set for that locale name will be used, and that default is not always UTF-8. For example, zh_CN is the same as zh_CN.gb2312, because GB2312 is the default character set for zh_CN. Run locale -a to check which variants are actually installed on your server.
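
If you are unsure which spelling your server expects, setlocale() can also take an array of candidates. It tries them in order and returns the name it ended up using, or false if none of them worked. Here is a small sketch of that; the helper name utf8_candidates() and the list of spellings are my own choices for this example.

```php
<?php
// Build spelling variants of a UTF-8 locale name to try in order.
// utf8_candidates() is a hypothetical helper name used only in this sketch.
function utf8_candidates(string $base): array
{
    return ["$base.utf8", "$base.UTF-8", "$base.utf-8"];
}

// LC_MESSAGES is not defined on every platform, so fall back to LC_ALL.
$category = defined('LC_MESSAGES') ? LC_MESSAGES : LC_ALL;

// setlocale() accepts an array and uses the first locale the system accepts.
$chosen = setlocale($category, utf8_candidates('zh_CN'));
echo ($chosen === false) ? 'No UTF-8 variant of zh_CN is installed' : "Using $chosen", PHP_EOL;
```

Checking the return value is worthwhile in any case, because setlocale() does not raise an error when the locale is missing.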


Conclusion

This article was written because there was implementation information about gettext that was not documented on the net. Hence, if you find any errors, or if there is any missing information, please feel free to contribute using the comments section at the bottom. Your help will be greatly appreciated!

If you’re still stuck after following my guide, feel free to drop a question in the comments too. I’ll help you out as much as I’m able.

As a bonus (and reference material to other developers out there), here is my PHP script for setting up gettext:

// Fall back to an empty string if no language was requested.
$lang = $_GET['lang'] ?? '';

// Listing all the languages I'm supporting.
$lang_arr = array(
	'zh_CN' => 'zh_CN.utf-8',
	'zh_TW' => 'zh_TW.utf-8'
);

// If the language requested is supported...
if(isset($lang_arr[$lang])) {

	// Set the language.
	if(defined('LC_MESSAGES')) {
		// For Linux-based distributions.
		setlocale(LC_MESSAGES, $lang_arr[$lang]);
	} else {
		// For Windows.
		putenv('LC_ALL=' . $lang);
	}

	// Sets the folder where the messages textdomain will check for translation files.
	// APPPATH is assumed to be defined elsewhere as the application's root folder.
	bindtextdomain('messages', realpath(APPPATH . 'language'));

	// Specifies the character set that the translation file uses.
	bind_textdomain_codeset('messages', 'UTF-8');

	// Set the text domain.
	textdomain('messages');

}

Happy coding!
