About Japanese chars (forceutf8 issue, closed)

neitanod commented on August 19, 2024
About Japanese chars

from forceutf8.

Comments (3)

tegansnyder commented on August 19, 2024

Do you know what charset you are working with? I just tried this extension on some Shift-JIS characters and it failed to encode them in UTF-8. I think this extension is not detecting Shift-JIS correctly.

Try doing it manually with iconv.

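// Assumes $string_to_encode really holds Shift-JIS bytes
// (for example, this file is saved as Shift-JIS or the data comes from a Shift-JIS source).
$string_to_encode = '質量';
$encoded_str = iconv('shift-jis', 'utf-8//TRANSLIT', $string_to_encode);
echo $encoded_str;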
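
A minimal sketch of the same manual conversion using mbstring instead of iconv, in case it helps to check the idea independently of the source file's encoding. The helper name `sjisToUtf8` is hypothetical, and the sample Shift-JIS bytes are produced by converting a UTF-8 literal first, so the script assumes it is saved as UTF-8 and that the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: converts bytes that the caller knows to be Shift-JIS into UTF-8.
function sjisToUtf8(string $bytes): string
{
    // Reject input that is not well-formed Shift-JIS instead of guessing.
    if (!mb_check_encoding($bytes, 'SJIS')) {
        throw new InvalidArgumentException('Input is not valid Shift-JIS');
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'SJIS');
}

// This literal is UTF-8 because the file is assumed to be saved as UTF-8.
$utf8 = '質量';

// Build real Shift-JIS bytes to stand in for data read from a Shift-JIS source.
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

// Convert back; the round trip should reproduce the original string.
$roundTrip = sjisToUtf8($sjis);
echo $roundTrip, PHP_EOL;
```

The point of the round trip is that the source encoding is stated explicitly rather than detected, which is what the iconv call above does as well.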


neitanod commented on August 19, 2024

The main function of this package is toUTF8(), which encodes Latin1 to UTF8 but detects characters that are already UTF-8 encoded and keeps them as they are, avoiding the usual problem of double encoding them.

FixUTF8 is an auxiliary function that fixes Latin1 characters in double-encoded UTF8 strings, converting, for instance, "RepÃºblica" back to "República", but losing all non-Latin1 characters.

In other words, do not try to use it with Japanese characters (or any character outside the Latin1 set). It will break them.
Do not use it in production either. It's a hacky and slow function designed to be used in manual, human-assisted batch processes by people who only use characters in the Latin1 set.
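
To illustrate the double-encoding problem described above, here is a rough sketch. It is not forceutf8's implementation: the helper `latin1ToUtf8IfNeeded` is hypothetical and only checks the string as a whole, whereas toUTF8() is described as detecting already-encoded characters within the string. It assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper: a simplified, whole-string take on "convert only if needed".
function latin1ToUtf8IfNeeded(string $s): string
{
    // Already well-formed UTF-8 (this includes plain ASCII): leave it untouched.
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    // Otherwise treat every byte as a Latin1 (ISO-8859-1) character.
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "Rep\xFAblica";                  // "República" as Latin1 bytes
$utf8   = latin1ToUtf8IfNeeded($latin1);   // converted once
$again  = latin1ToUtf8IfNeeded($utf8);     // second call leaves it unchanged

// Converting unconditionally is what produces the double-encoded form.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $utf8, PHP_EOL, $again, PHP_EOL, $double, PHP_EOL;
```

A whole-string check like this cannot handle strings that mix Latin1 and UTF-8 bytes, and a Latin1 string whose bytes happen to form valid UTF-8 would be left unconverted, so treat it only as an illustration of the idea.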


didix16 commented on August 19, 2024

Oh okay, thanks for the explanation! Anyway, I wrote a function to handle the problem. I'm posting it here in case anyone wants to use it or improve it. $nickname is a Unicode string, which means it consists of X pairs of bytes. In my case I'm working with BigEndian format. This function successfully returns any Unicode string by computing the numeric value of each byte pair.

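function fixNick($nickname){
    // Decodes a string made of 2-byte code units into UTF-8, stopping at a NUL unit.
    // Each unit is read low byte first ($nickname[$i]), then high byte ($nickname[$i+1]).
    $fixNick = "";
    $strlen = strlen($nickname);
    // Walk complete byte pairs only, so an empty or odd-length input is safe
    // and the last character is kept even without a trailing NUL.
    for ($i = 0; $i + 1 < $strlen; $i += 2) {
        $unicode = (ord($nickname[$i+1]) << 8) + ord($nickname[$i]);
        if ($unicode === 0) {
            break; // NUL terminator
        }
        // mb_chr() (PHP 7.2+) replaces the '&#'.$unicode HTML-ENTITIES conversion,
        // which is deprecated as of PHP 8.2. Surrogate pairs are not handled.
        $fixNick .= mb_chr($unicode, 'UTF-8');
    }

    return $fixNick;
}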
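
As a possible alternative, mbstring can decode the byte pairs directly. The function above reads the low byte of each pair first, which corresponds to UTF-16LE rather than big-endian, so this sketch defaults to that and takes the byte order as a parameter. The helper name `utf16ToUtf8` and its parameters are hypothetical, and it assumes the input is UTF-16 that may end with a NUL code unit.

```php
<?php
// Hypothetical helper: decodes UTF-16 bytes to UTF-8, stopping at the first NUL code unit.
function utf16ToUtf8(string $bytes, string $byteOrder = 'UTF-16LE'): string
{
    // Ignore a dangling odd byte at the end, if any.
    $len = strlen($bytes) - (strlen($bytes) % 2);

    // Look for a 16-bit NUL unit on an even offset and cut the input there.
    for ($i = 0; $i < $len; $i += 2) {
        if ($bytes[$i] === "\0" && $bytes[$i + 1] === "\0") {
            $len = $i;
            break;
        }
    }

    // Let mbstring do the decoding; unlike a per-unit loop, it also handles surrogate pairs.
    return mb_convert_encoding(substr($bytes, 0, $len), 'UTF-8', $byteOrder);
}

// "A", U+00F1, then a NUL unit; the trailing "B" is ignored.
$nick = utf16ToUtf8("\x41\x00\xF1\x00\x00\x00\x42\x00");
echo $nick, PHP_EOL;

// The same code unit U+8CEA in both byte orders.
$le = utf16ToUtf8("\xEA\x8C");
$be = utf16ToUtf8("\x8C\xEA", 'UTF-16BE');
```

Passing 'UTF-16BE' covers data that really is big-endian, so the byte order becomes an explicit choice instead of being baked into the shift arithmetic.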
