
    Lesson 34 • Advanced

    Localization & i18n 🌍

    By the end of this lesson you'll build multilingual PHP apps: a translation system with placeholders, correct plurals for every language, locale-aware money and dates with the intl extension, and automatic language detection — without hardcoding a single string.

    What You'll Learn in This Lesson

    • Explain the difference between i18n (internationalization) and l10n (localization)
    • Build a translation lookup with :placeholder interpolation and a fallback locale
    • Format currency and dates per locale with NumberFormatter and IntlDateFormatter
    • Pick the correct plural form in any language using MessageFormatter
    • Detect the visitor's locale from URL, cookie, and the Accept-Language header
    • Keep accents intact with UTF-8 and the mb_ string functions

    1️⃣ Translation Systems: Never Hardcode Text

    The single most important habit in i18n is to separate text from code. Instead of writing echo "Welcome!", you store every user-facing string in a message file under a key like welcome, then look it up: t('welcome'). A placeholder such as :name is a slot you fill at runtime, so "Hello, :name!" becomes "¡Hola, Alice!" in Spanish. The ?? operator below is the null-coalescing operator — "use the left value, or the right one if it's missing" — which gives you a clean fallback chain.

    A PHP translator with placeholders and a fallback
    <?php
    // i18n vs l10n — two words you'll hear constantly:
    //   i18n (internationalization) = making your code CAPABLE of many languages
    //                                 (no hardcoded text, placeholders, plural slots)
    //   l10n (localization)         = the actual translations + locale formatting
    //
    // The golden rule: NEVER hardcode user-facing text. Look it up by a KEY instead.
    
    class Translator
    {
        /** @var array<string, array<string,string>>  locale => (key => text) */
        private array $messages = [];
    
        public function __construct(
            private string $locale,          // the language we're showing now
            private string $fallback = 'en', // used when a key is missing
        ) {}
    
        // Load one language's message file (here just an array).
        public function load(string $locale, array $messages): void
        {
            $this->messages[$locale] = $messages;
        }
    
        public function setLocale(string $locale): void
        {
            $this->locale = $locale;
        }
    
        // Look up $key for the current locale; fall back to English, then the key.
        // Replace :placeholders (e.g. :name) with the values you pass in.
        public function t(string $key, array $params = []): string
        {
            $msg = $this->messages[$this->locale][$key]
                ?? $this->messages[$this->fallback][$key]   // ?? = "use this if null"
                ?? $key;                                     // last resort: show the key
    
            foreach ($params as $name => $value) {
                $msg = str_replace(":{$name}", (string) $value, $msg);
            }
            return $msg;
        }
    }
    
    $trans = new Translator('en');
    
    $trans->load('en', [
        'welcome'  => 'Welcome to our store!',
        'greeting' => 'Hello, :name!',
        'checkout' => 'Proceed to Checkout',
    ]);
    $trans->load('es', [
        'welcome'  => '¡Bienvenido a nuestra tienda!',
        'greeting' => '¡Hola, :name!',
        'checkout' => 'Proceder al pago',
    ]);
    
    foreach (['en', 'es'] as $locale) {
        $trans->setLocale($locale);
        echo "[{$locale}]\n";
        echo "  " . $trans->t('welcome') . "\n";              // no placeholder
        echo "  " . $trans->t('greeting', ['name' => 'Alice']) . "\n"; // :name -> Alice
        echo "  " . $trans->t('checkout') . "\n";
    }
    Output
    [en]
      Welcome to our store!
      Hello, Alice!
      Proceed to Checkout
    [es]
      ¡Bienvenido a nuestra tienda!
      ¡Hola, Alice!
      Proceder al pago
    This is real code. Run it for free at onecompiler.com/php or in your own editor.

    Notice that the code never changed between locales; only the data did. That is i18n (the structure) and l10n (the Spanish array) working together. Real apps load these arrays from per-language files (en.php, es.php, or JSON), which is broadly how frameworks such as Laravel and Symfony organize their translations.
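
    As a rough sketch of the per-language files mentioned above, the snippet below loads lang/<locale>.php files that each return an array. The helper name loadMessages() and the directory layout are assumptions made for this example, not the naming any particular framework uses. The locale is checked against a simple pattern first, so a value taken from user input cannot point at an arbitrary file.

```php
<?php
// Hypothetical helper: load <dir>/<locale>.php, a file that returns [key => text].
function loadMessages(string $dir, string $locale): array
{
    // Accept only simple codes such as "en" or "pt_BR" so that user input
    // can never be used to reach other files through the path.
    if (!preg_match('/^[a-z]{2}(_[A-Z]{2})?$/', $locale)) {
        return [];
    }
    $file = "{$dir}/{$locale}.php";
    if (!is_file($file)) {
        return [];               // unknown locale: caller falls back to 'en'
    }
    $messages = require $file;
    return is_array($messages) ? $messages : [];
}

// Demo: write two tiny message files to a temp directory, then load them.
$dir = sys_get_temp_dir() . '/lang_demo_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
file_put_contents("{$dir}/en.php", "<?php return ['welcome' => 'Welcome to our store!'];");
file_put_contents("{$dir}/es.php", "<?php return ['welcome' => '¡Bienvenido a nuestra tienda!'];");

$en = loadMessages($dir, 'en');
$es = loadMessages($dir, 'es');
echo $en['welcome'], "\n";   // Welcome to our store!
echo $es['welcome'], "\n";   // ¡Bienvenido a nuestra tienda!
```

    The returned arrays can be passed straight to the Translator's load() method from the example above.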

    2️⃣ setlocale, gettext & the UTF-8 Rule

    PHP's classic localization tools are setlocale() and gettext. setlocale(LC_MONETARY, 'de_DE.UTF-8') changes how some built-in functions behave, but it depends on locales being installed on the server, so it's fragile — prefer the intl extension in the next section. gettext is the GNU translation standard: you wrap text in _('Welcome') and ship compiled .mo files per language. Whichever you choose, the non-negotiable rule is UTF-8 everywhere — and to count or cut non-ASCII text you must use the multibyte mb_ functions, because the plain ones work in bytes, not characters.

    setlocale / gettext pattern + why mb_ functions matter
    <?php
    // setlocale() changes how PHP's BUILT-IN functions behave for a category:
    //   LC_MONETARY  -> money,  LC_TIME -> dates,  LC_ALL -> everything.
    // It depends on locales being installed on the server, so prefer intl
    // (next section) for real formatting. This shows the classic approach.
    
    // gettext is the GNU translation standard. You wrap text in _() and ship
    // compiled .mo files per language. The pattern (not run here) looks like:
    //
    //   setlocale(LC_MESSAGES, 'es_ES.UTF-8');
    //   bindtextdomain('messages', __DIR__ . '/locale');
    //   textdomain('messages');
    //   echo _('Welcome');         // -> "Bienvenido" if es_ES is installed
    //   printf(ngettext('%d file', '%d files', $n), $n);  // gettext picks the plural form, printf fills in %d
    
    // Whichever system you use, ALWAYS declare UTF-8 so accents survive:
    header('Content-Type: text/html; charset=UTF-8');         // for web pages
    mb_internal_encoding('UTF-8');                            // multibyte string ops
    
    $word = 'Crème brûlée';
    echo $word . "\n";
    echo "strlen (bytes):   " . strlen($word) . "\n";       // counts BYTES
    echo "mb_strlen (chars): " . mb_strlen($word) . "\n";   // counts CHARACTERS
    Output
    Crème brûlée
    strlen (bytes):   15
    mb_strlen (chars): 12
    This is real code. Run it for free at onecompiler.com/php or in your own editor.

    See the difference? strlen reports 15 because each of the three accented letters (è, û, é) takes two bytes in UTF-8, while mb_strlen correctly reports 12 characters. Get this wrong and you'll slice a character in half and produce mojibake: garbled sequences such as Ã© where é should be.
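
    To make the "slice a character in half" problem concrete, here is a small sketch that cuts the same string by bytes and by characters. It assumes the mbstring extension is loaded and that the source file is saved as UTF-8.

```php
<?php
mb_internal_encoding('UTF-8');

$word = 'Crème brûlée';

// substr() counts bytes: 3 bytes = "Cr" plus only the first half of "è".
$byteCut = substr($word, 0, 3);
// mb_substr() counts characters: 3 characters = "Crè".
$charCut = mb_substr($word, 0, 3);

echo "mb_substr: {$charCut}\n";   // Crè
echo "substr result is valid UTF-8?    "
    . var_export(mb_check_encoding($byteCut, 'UTF-8'), true) . "\n";   // false
echo "mb_substr result is valid UTF-8? "
    . var_export(mb_check_encoding($charCut, 'UTF-8'), true) . "\n";   // true

// Case conversion has the same trap: mb_strtoupper() also uppercases accents.
echo mb_strtoupper($word), "\n";  // CRÈME BRÛLÉE
```

    The byte-cut string is not printed on purpose: it ends in an incomplete character, which is exactly what shows up as garbage in a browser.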

    3️⃣ The intl Extension: Currency, Dates & Locale Detection

    The same number is written completely differently around the world: 1,234,567.89 in the US is 1.234.567,89 in Germany, and Japanese yen has no decimal places at all. PHP's intl extension wraps ICU — the industry-standard Unicode library — so NumberFormatter and IntlDateFormatter apply each locale's real rules for you, including the correct currency symbol and its position. Dates also need a time zone: always store times in UTC, then convert to the visitor's zone when you display them.

    NumberFormatter (currency), IntlDateFormatter & locale detection
    <?php
    // PHP's intl extension wraps ICU — the same Unicode library Chrome and Java use.
    // It formats numbers, currency, dates and percents using each locale's REAL rules.
    
    $amount = 1234567.89;
    $date   = new DateTimeImmutable('2025-03-15 14:30', new DateTimeZone('UTC'));
    
    $locales = [
        'en-US' => 'USD',   // United States dollars
        'de-DE' => 'EUR',   // German euros
        'ja-JP' => 'JPY',   // Japanese yen (no decimal places)
    ];
    
    foreach ($locales as $locale => $currency) {
        // CURRENCY: pass the ISO code; the formatter adds the right symbol & decimals.
        $cur  = new NumberFormatter($locale, NumberFormatter::CURRENCY);
        // PERCENT: 0.1547 becomes "15%" / "15 %" depending on locale.
        $pct  = new NumberFormatter($locale, NumberFormatter::PERCENT);
        // DATES: LONG date, with the visitor's time zone applied.
        $df   = new IntlDateFormatter(
            $locale,
            IntlDateFormatter::LONG,   // e.g. "March 15, 2025"
            IntlDateFormatter::SHORT,  // time part, e.g. "3:30 PM" (14:30 UTC shown in Paris time)
            'Europe/Paris'             // convert the UTC time into this zone
        );
    
        echo "[{$locale}]\n";
        echo "  currency: " . $cur->formatCurrency($amount, $currency) . "\n";
        echo "  percent:  " . $pct->format(0.1547)                     . "\n";
        echo "  date:     " . $df->format($date)                       . "\n";
    }
    
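    // === Detect the visitor's locale (best-to-worst priority) ===
    // $supported is an allow-list (example values): never trust ?lang= or a cookie blindly.
    function detectLocale(array $supported = ['en', 'es', 'fr']): string {
        foreach ([$_GET['lang'] ?? null, $_COOKIE['lang'] ?? null] as $choice) {
            if (is_string($choice) && in_array($choice, $supported, true)) {
                return $choice;                                 // ?lang=fr, then saved cookie
            }
        }
        // Accept-Language header: ICU returns the visitor's top preference,
        // e.g. "fr-CH,fr;q=0.9" -> "fr_CH"; keep only its language part ("fr").
        $header = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '';
        $best   = $header !== '' ? Locale::acceptFromHttp($header) : false;
        if (is_string($best)) {
            $lang = Locale::getPrimaryLanguage($best);
            if (in_array($lang, $supported, true)) return $lang;
        }
        return 'en';                                            // safe default
    }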
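    Needs PHP's intl extension (it ships with PHP but usually has to be enabled, e.g. extension=intl in php.ini). Output is locale-aware, typically: $1,234,567.89 (en-US), 1.234.567,89 € (de-DE), ¥1,234,568 (ja-JP, where the yen sign may be printed as the full-width ￥). Exact symbols and spacing vary by ICU version.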

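    The detectLocale() function shows the priority order professionals use: an explicit choice (?lang=fr or a /fr/ URL prefix) beats a saved cookie, which beats the browser's Accept-Language header, which beats a default. Locale::acceptFromHttp() parses that messy header (fr-CH,fr;q=0.9,en;q=0.8) and returns the visitor's top preference. It does not know which locales your app actually supports, so check its result, and any ?lang= or cookie value, against your own list before using it.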
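
    To make the matching against your own list explicit, here is a minimal sketch in plain PHP that needs no intl. The function name pickSupportedLocale() and the supported list are hypothetical. It honours q-values and falls back from a regional tag (fr-CH) to its language (fr), but it deliberately ignores rarer parts of the header syntax such as the * wildcard. Supported codes are expected in lowercase with hyphens.

```php
<?php
// Hypothetical helper: pick the first supported locale from an Accept-Language header.
function pickSupportedLocale(string $header, array $supported, string $default): string
{
    $prefs = [];
    foreach (explode(',', $header) as $i => $part) {
        $pieces = explode(';', trim($part));
        $tag    = strtolower(trim($pieces[0]));
        if ($tag === '' || $tag === '*') {
            continue;                       // empty entry or wildcard: skip
        }
        $q = 1.0;                           // no q parameter means full weight
        foreach (array_slice($pieces, 1) as $param) {
            if (preg_match('/^\s*q=([0-9.]+)\s*$/', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $prefs[] = ['tag' => $tag, 'q' => $q, 'order' => $i];
    }

    // Highest q first; entries with equal q keep their header order.
    usort($prefs, fn($a, $b) => [$b['q'], $a['order']] <=> [$a['q'], $b['order']]);

    foreach ($prefs as $pref) {
        if ($pref['q'] <= 0) {
            continue;                       // q=0 means "not acceptable"
        }
        $language = explode('-', $pref['tag'])[0];
        foreach ([$pref['tag'], $language] as $candidate) {
            if (in_array($candidate, $supported, true)) {
                return $candidate;
            }
        }
    }
    return $default;
}

echo pickSupportedLocale('fr-CH,fr;q=0.9,en;q=0.8', ['en', 'fr', 'de'], 'en'), "\n"; // fr
echo pickSupportedLocale('ja,ko;q=0.8', ['en', 'fr'], 'en'), "\n";                   // en
```

    With intl available, Locale::lookup() can do a similar job. The hand-written version is shown only so the negotiation steps are visible.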

    4️⃣ Plurals Done Right with MessageFormatter

    "5 items" feels trivial in English, but plural rules vary enormously: English has 2 forms, Polish has 3 (for 1, for 2–4, and for 5+), and Arabic has 6. If you write $n === 1 ? 'item' : 'items' you've baked English grammar into your code and it will be wrong everywhere else. MessageFormatter uses ICU's {count, plural, ...} syntax, reads the locale's plural rules, and picks the right form — the # is replaced by the number, formatted for that locale.

    Correct plurals for English and Polish
    <?php
    // "5 items" is easy in English, but plural RULES differ wildly:
    //   English: 1 = singular, everything else = plural
    //   Polish:  different forms for 1, for numbers ending in 2–4 (except 12–14), and for the rest
    //   Arabic:  SIX forms (zero, one, two, few, many, other)
    // Never build plurals with if/else — use ICU's MessageFormatter {plural}.
    
    $pattern = [
        'en' => '{count, plural, =0 {No items} one {# item} other {# items}}',
        'pl' => '{count, plural, =0 {Brak} one {# produkt} few {# produkty} many {# produktów} other {# produktu}}',
    ];
    
    foreach (['en', 'pl'] as $locale) {
        echo "[{$locale}]\n";
        foreach ([0, 1, 3, 5] as $count) {
            // ICU reads the locale's plural rules and picks the right form.
            // '#' is replaced by the number, formatted for the locale.
            $text = MessageFormatter::formatMessage($locale, $pattern[$locale], ['count' => $count]);
            echo "  count={$count}: {$text}\n";
        }
    }
    Output
    [en]
      count=0: No items
      count=1: 1 item
      count=3: 3 items
      count=5: 5 items
    [pl]
      count=0: Brak
      count=1: 1 produkt
      count=3: 3 produkty
      count=5: 5 produktów
    This is real code. Run it for free at onecompiler.com/php or in your own editor.

    5️⃣ Your Turn: Translate & Format

    Now you drive. The first script is almost complete — fill in each ___ using the 👉 hint, then run it and check it against the Output panel.

    🎯 Your turn: finish the French translation
    <?php
    // 🎯 YOUR TURN — finish this translator so French works too.
    // Fill in each blank marked ___ , then run it and check the Output panel.
    
    $messages = [
        'en' => ['hello' => 'Hello, :name!'],
        'fr' => ['hello' => ___],     // 👉 the French greeting, e.g. 'Bonjour, :name !'
    ];
    
    function t(array $messages, string $locale, string $key, array $params = []): string {
        $msg = $messages[$locale][$key] ?? $messages['en'][$key] ?? $key;
        foreach ($params as $name => $value) {
            // 👉 replace :name etc. with the value. Fill in the search needle:
            $msg = str_replace(___, (string) $value, $msg);   // 👉 e.g. ":{$name}"
        }
        return $msg;
    }
    
    echo t($messages, 'en', 'hello', ['name' => 'Sam']) . "\n";
    echo t($messages, 'fr', 'hello', ['name' => 'Sam']) . "\n";
    
    // ✅ Expected output:
    //    Hello, Sam!
    //    Bonjour, Sam !
    Output
    Hello, Sam!
    Bonjour, Sam !
    Fill in the French greeting and the str_replace needle (":{$name}"), then run it. You should get two greeting lines.

    One more. This time you'll format the same price as two currencies with NumberFormatter. Fill in the locale and the formatter type.

    🎯 Your turn: format currency for two locales
    <?php
    // 🎯 YOUR TURN — show the same price in two currencies using intl.
    // Fill in the ___ blanks, then run it. (Needs the intl extension.)
    
    $price = 49.5;
    
    // 1) A US-dollar formatter for the en-US locale:
    $usd = new NumberFormatter('en-US', NumberFormatter::___);  // 👉 CURRENCY
    echo $usd->formatCurrency($price, 'USD') . "\n";
    
    // 2) A euro formatter for the de-DE locale:
    $eur = new NumberFormatter(___, NumberFormatter::CURRENCY);  // 👉 'de-DE'
    echo $eur->formatCurrency($price, 'EUR') . "\n";
    
    // ✅ Expected output (symbols/spacing may vary by ICU version):
    //    $49.50
    //    49,50 €
    Replace the blanks with CURRENCY and 'de-DE', then run it (needs intl). Expect about $49.50 and 49,50 €.

    Common Errors (and the fix)

    • Hardcoded or concatenated strings: echo "You have " . $count . " items" can't be translated and breaks where word order differs. Move the whole sentence into a message file with a placeholder: t('cart.items', ['count' => $count]).
    • Wrong plural rules: $n === 1 ? 'item' : 'items' bakes in English grammar and fails in Polish, Arabic, Russian, etc. Use MessageFormatter with a {count, plural, ...} pattern so ICU picks the right form.
    • Accents become Ã© or ? (mojibake): an encoding mismatch. Save files as UTF-8, send charset=UTF-8, call mb_internal_encoding('UTF-8'), and use utf8mb4 in MySQL. Count/cut text with mb_strlen / mb_substr, never strlen / substr.
    • "Class 'NumberFormatter' not found" — the intl extension isn't enabled. Enable extension=intl in php.ini (or apt install php-intl); check with php -m | grep intl.
    • Currency formatting with number_format() — it forces you to hardcode the symbol, separators, and decimal count, which you'll get wrong for locales you don't speak (and yen has no decimals). Use $fmt->formatCurrency($amount, 'EUR') instead.
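
    The first two fixes can be combined in one message. The sketch below keeps a whole sentence per locale and lets ICU choose the plural branch. The key cart.items comes from the bullet above, while the cartLine() helper, the sample sentences, and the {name} argument are made up for illustration. It needs the intl extension, so it starts with a guard.

```php
<?php
// Guard: MessageFormatter lives in the intl extension.
if (!extension_loaded('intl')) {
    exit("This sketch needs the intl extension.\n");
}

// One complete sentence per plural branch, so translators can reorder words freely.
$cartItems = [
    'en' => '{count, plural, =0 {Your cart is empty, {name}.} '
          . 'one {You have # item in your cart, {name}.} '
          . 'other {You have # items in your cart, {name}.}}',
    'es' => '{count, plural, =0 {No tienes artículos en tu carrito, {name}.} '
          . 'one {Tienes # artículo en tu carrito, {name}.} '
          . 'other {Tienes # artículos en tu carrito, {name}.}}',
];

// Hypothetical helper for this example only.
function cartLine(array $patterns, string $locale, int $count, string $name): string
{
    $pattern = $patterns[$locale] ?? $patterns['en'];
    $text = MessageFormatter::formatMessage(
        $locale,
        $pattern,
        ['count' => $count, 'name' => $name]
    );
    // formatMessage() returns false on a malformed pattern; show the key instead.
    return $text === false ? 'cart.items' : $text;
}

foreach (['en', 'es'] as $locale) {
    foreach ([0, 1, 3] as $count) {
        echo "[{$locale}] ", cartLine($cartItems, $locale, $count, 'Alice'), "\n";
    }
}
```

    No string is concatenated and no plural is chosen with if/else: both decisions live in the message data.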

    Pro Tips

    • 💡 Use URL prefixes (/fr/products) over ?lang=fr — search engines index each language as its own page, and you can add hreflang tags so Google serves the right one.
    • 💡 Store all times in UTC and convert to the visitor's zone only when displaying. Mixing zones in storage is a classic source of "off by a day" bugs.
    • 💡 Plan for RTL. Arabic, Hebrew, Persian, and Urdu read right-to-left. Set dir="rtl" and lang="ar" on <html>, use CSS logical properties (margin-inline-start, not margin-left), and let the browser mirror the layout.
    • 💡 Always give translators context. A key like button.save beats save — the same English word can need different translations as a noun vs a verb.
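
    The UTC tip is easy to see in a few lines. This sketch stores one instant in UTC and renders it for two visitors; the date and the two zones are arbitrary example values. Note how the Tokyo visitor already sees the next calendar day, which is where "off by a day" bugs come from when zones are mixed in storage.

```php
<?php
// Stored value: always UTC (for example, what you would keep in the database).
$storedUtc = new DateTimeImmutable('2025-03-15 23:30:00', new DateTimeZone('UTC'));

// Display: convert a copy into each visitor's zone. The instant is the same;
// only the wall-clock rendering changes.
$tokyo   = $storedUtc->setTimezone(new DateTimeZone('Asia/Tokyo'));
$newYork = $storedUtc->setTimezone(new DateTimeZone('America/New_York'));

echo $storedUtc->format('Y-m-d H:i T'), "\n"; // 2025-03-15 23:30 UTC
echo $tokyo->format('Y-m-d H:i T'), "\n";     // 2025-03-16 08:30 JST
echo $newYork->format('Y-m-d H:i T'), "\n";   // 2025-03-15 19:30 EDT
```

    For user-facing output, the converted value would normally go through IntlDateFormatter rather than format(), so the month names and ordering follow the visitor's locale too.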

    📋 Quick Reference — Localization

    Tool | Example | What It Does
    NumberFormatter | $f->formatCurrency($n, 'EUR') | Locale-aware money / numbers / percent
    IntlDateFormatter | $d->format($date) | Locale & timezone-aware dates
    MessageFormatter | MessageFormatter::formatMessage($loc, $p, $args) | Correct plurals per language
    Locale | Locale::acceptFromHttp($_SERVER['HTTP_ACCEPT_LANGUAGE']) | Best locale from the browser header
    setlocale / gettext | _('Welcome') | Classic GNU translation system
    mb_strlen / mb_substr | mb_strlen($s) | Count / cut by character, not byte (UTF-8)

    Frequently Asked Questions

    Q: What is the difference between i18n and l10n?

    Internationalization (i18n — 18 letters between the i and the n) is the engineering work that makes your app capable of any language: no hardcoded text, placeholders for names and counts, and slots for plural forms. Localization (l10n) is the per-language work that fills those slots — the actual translations plus locale-aware formatting of dates, numbers, and currency. You do i18n once in the code, then l10n many times, once per market.

    Q: Should I use gettext or array-based message files?

    Both are valid. gettext is the long-standing GNU standard: you wrap text in _(), and translators edit .po files that compile to fast binary .mo files, with built-in plural support via ngettext. Array files (or JSON/YAML loaded into a Translator class) are simpler to set up, easy to version in git, and what most modern frameworks like Laravel and Symfony use under the hood. For a new project, array/JSON files with an intl-based plural layer are usually the least friction; reach for gettext when you already have a translator workflow built around .po files.

    Q: Why use the intl extension instead of number_format() and date()?

    Because intl wraps ICU, the industry-standard Unicode library, so it already knows every locale's real rules. number_format() and date() force you to hardcode the decimal separator, thousands separator, currency symbol, and month names — and you will get them wrong for locales you don't speak. NumberFormatter, IntlDateFormatter, and MessageFormatter handle currency placement, Japanese yen having no decimals, German using a comma for decimals, and correct plural forms automatically. Always prefer intl for user-facing output.

    Q: How do I handle plurals correctly across languages?

    Never build them with if ($count === 1). English has 2 plural forms, Polish has 3 for whole numbers (plus a fourth for fractions), and Arabic has 6 (zero, one, two, few, many, other). Use ICU's MessageFormatter with a {count, plural, ...} pattern, or gettext's ngettext(). ICU reads the locale's plural rules and selects the right form for you, so the same code prints '5 items' in English and '5 produktów' in Polish without any branching in your PHP.

    Q: Why do accented characters turn into question marks or mojibake?

    Almost always an encoding mismatch. Make everything UTF-8 end to end: save your PHP files as UTF-8, send header('Content-Type: text/html; charset=UTF-8'), set mb_internal_encoding('UTF-8'), and configure your database connection to utf8mb4. Also use the mb_ string functions (mb_strlen, mb_substr, mb_strtoupper) for non-ASCII text, because the plain versions count and cut bytes, not characters, and will slice a multi-byte character in half.

    Q: How should I detect which language to show a visitor?

    Check sources in priority order: an explicit ?lang= choice or /fr/ URL prefix, then a saved cookie or session value, then the browser's Accept-Language header via Locale::acceptFromHttp(), and finally a default locale. For SEO, prefer a URL segment like /fr/products over a query parameter so search engines index each language as its own page. Whatever the user actively chooses should win and be remembered in a cookie.

    Mini-Challenge: Order Summary

    No code is filled in this time — just a brief and an outline. Write it yourself, run it on onecompiler.com/php or your own machine, then check your result against the expected output in the comments. This combines a plural phrase and a currency format — exactly the write-run-check loop you'll use on real localized features.

    🎯 Mini-Challenge: build a locale-aware order summary
    <?php
    // 🎯 MINI-CHALLENGE: a locale-aware "order summary" line.
    // No code is filled in — work from the steps, then run it.
    //
    // 1. Set $locale = 'de-DE' and $count = 3 and $total = 1499.0 .
    // 2. Build a plural-correct item phrase with MessageFormatter, e.g.
    //      '{count, plural, one {# Artikel} other {# Artikel}}'
    // 3. Format $total as EUR currency with NumberFormatter(... CURRENCY).
    // 4. echo one line:  "<itemPhrase> — <formattedTotal>"
    //
    // Tip: MessageFormatter::formatMessage($locale, $pattern, ['count' => $count])
    // Tip: don't forget mb_internal_encoding('UTF-8') if you add accents.
    //
    // ✅ Expected output (de-DE, spacing may vary):
    //    3 Artikel — 1.499,00 €
    
    // your code here
    Combine MessageFormatter (plural) and NumberFormatter (EUR) for de-DE, then run it. Your line should read something like 3 Artikel — 1.499,00 €.

    🎉 Lesson Complete!

    • ✅ i18n makes code language-capable; l10n supplies each language's translations and formats
    • ✅ Never hardcode text — look it up by key with :placeholder slots and a fallback locale
    • ✅ Use the intl extension: NumberFormatter for money, IntlDateFormatter for dates (always store UTC)
    • ✅ MessageFormatter picks the correct plural form for any language
    • ✅ Detect locale in priority order (URL → cookie → Accept-Language → default), and plan for RTL
    • ✅ Keep everything UTF-8 and reach for the mb_ string functions
    • Next lesson: Search Features — add full-text search and Elasticsearch to your app
