neitanod / forceutf8

PHP Class Encoding featuring popular Encoding::toUTF8() function --formerly known as forceUTF8()-- that fixes mixed encoded strings.

forceutf8's Introduction

forceutf8

PHP Class Encoding featuring popular \ForceUTF8\Encoding::toUTF8() function --formerly known as forceUTF8()-- that fixes mixed encoded strings.

Description

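If you apply the PHP function utf8_encode() to a string that is already UTF8, it returns a garbled UTF8 string: utf8_encode() assumes its input is ISO 8859-1 (Latin1) and re-encodes every byte above 0x7F, including the bytes of existing multi-byte UTF8 characters.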
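To see the effect, here is a minimal sketch of what utf8_encode() does: every input byte is treated as an ISO 8859-1 character, and bytes at or above 0x80 become two-byte UTF-8 sequences. The name latin1ToUtf8 is hypothetical (not part of this library), and the sketch does not call utf8_encode() itself because that function is deprecated as of PHP 8.2.

```php
<?php
declare(strict_types=1);

// Hypothetical re-implementation of utf8_encode(): every byte is assumed
// to be an ISO 8859-1 character, so bytes >= 0x80 become two UTF-8 bytes.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= $bytes[$i];                     // ASCII passes through
        } else {
            $out .= chr(0xC0 | ($b >> 6))           // lead byte (0xC2 or 0xC3)
                  . chr(0x80 | ($b & 0x3F));        // continuation byte
        }
    }
    return $out;
}

$once  = latin1ToUtf8("\xE9");   // Latin-1 byte for "é" -> correct UTF-8 "é"
$twice = latin1ToUtf8($once);    // input was already UTF-8 -> garbled "Ã©"
echo $once, ' ', $twice, "\n";
```

Applying the same function a third time would add yet another layer, which is the kind of string fixUTF8() is meant to repair.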

This class addresses this issue and provides a handy static function called \ForceUTF8\Encoding::toUTF8().

You don't need to know what the encoding of your strings is. It can be Latin1 (ISO 8859-1), Windows-1252 or UTF8, or the string can have a mix of them. \ForceUTF8\Encoding::toUTF8() will convert everything to UTF8.

Sometimes you have to deal with services that are unreliable in terms of encoding, possibly mixing UTF8 and Latin1 in the same string.
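A simplified sketch of the idea behind repairing such mixed input, assuming it mixes only UTF8 and ISO 8859-1: copy every byte run that already forms a valid UTF-8 character, and re-encode every other high byte as if it were Latin1. The function name mixedToUtf8 is hypothetical, the code assumes PHP 8 (for match), and unlike the description above it does not map the Windows-1252 range 0x80-0x9F (for example 0x93 for “), so such bytes come out as C1 control characters.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: keep valid UTF-8 sequences, treat every other
// high byte as ISO 8859-1 and re-encode it as two UTF-8 bytes.
// Windows-1252 extras (0x80-0x9F) are NOT mapped here.
function mixedToUtf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];                          // ASCII
            continue;
        }
        // Expected sequence length for this lead byte (0 = cannot start one).
        $n = match (true) {
            $b >= 0xC2 && $b <= 0xDF => 2,
            $b >= 0xE0 && $b <= 0xEF => 3,
            $b >= 0xF0 && $b <= 0xF4 => 4,
            default => 0,
        };
        if ($n > 0 && $i + $n <= $len) {
            $seq = substr($s, $i, $n);
            if (preg_match('//u', $seq) === 1) {     // complete, valid UTF-8 char
                $out .= $seq;
                $i += $n - 1;
                continue;
            }
        }
        // Not valid UTF-8 here: encode the byte as a Latin-1 character.
        $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// "caf" + Latin-1 é (0xE9) + " cr" + UTF-8 è (0xC3 0xA8) + "me"
echo mixedToUtf8("caf\xE9 cr\xC3\xA8me"), "\n";
```

Checking validity with preg_match() and the u modifier delegates the strict UTF-8 rules (overlong forms, surrogates) to PCRE instead of re-implementing them.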

Update:

I've included another function, \ForceUTF8\Encoding::fixUTF8(), which fixes garbled UTF8 strings that were encoded twice (or more times).
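As a rough illustration of peeling off repeated layers (a sketch, not this library's implementation): undo one Latin1-to-UTF8 layer at a time for as long as the result is still valid UTF8 and actually changes. The function names are hypothetical, every layer is assumed to be plain Latin1 (characters that only Windows-1252 has, such as ƒ from byte 0x83, are not handled), and the stopping rule is a heuristic: genuine text such as "Ã©" would also be "fixed".

```php
<?php
declare(strict_types=1);

// Undo one Latin-1 -> UTF-8 layer. Returns null when $s is not valid UTF-8
// or holds a character above U+00FF (so it cannot be such a layer).
function undoLatin1Layer(string $s): ?string
{
    if (preg_match('//u', $s) !== 1) {
        return null;
    }
    $out = '';
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        if (strlen($ch) === 1) {            // ASCII byte
            $out .= $ch;
            continue;
        }
        if (strlen($ch) !== 2) {            // 3- and 4-byte chars are above U+07FF
            return null;
        }
        $cp = ((ord($ch[0]) & 0x1F) << 6) | (ord($ch[1]) & 0x3F);
        if ($cp > 0xFF) {
            return null;
        }
        $out .= chr($cp);
    }
    return $out;
}

// Peel off layers while the result stays valid UTF-8 and keeps changing.
function fixLayers(string $s): string
{
    while (true) {
        $undone = undoLatin1Layer($s);
        if ($undone === null || $undone === $s || preg_match('//u', $undone) !== 1) {
            return $s;
        }
        $s = $undone;
    }
}

echo fixLayers("F\xC3\x83\xC2\xA9d\xC3\x83\xC2\xA9ration"), "\n"; // input "FÃ©dÃ©ration"
```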

Usage:

use \ForceUTF8\Encoding;

$utf8_string = Encoding::toUTF8($utf8_or_latin1_or_mixed_string);

$latin1_string = Encoding::toLatin1($utf8_or_latin1_or_mixed_string);

also:

$utf8_string = Encoding::fixUTF8($garbled_utf8_string);

Examples:

use \ForceUTF8\Encoding;

echo Encoding::fixUTF8("Fédération Camerounaise de Football\n");
echo Encoding::fixUTF8("FÃ©dÃ©ration Camerounaise de Football\n");
echo Encoding::fixUTF8("FÃƒÂ©dÃƒÂ©ration Camerounaise de Football\n");
echo Encoding::fixUTF8("FÃƒÆ’Ã‚Â©dÃƒÆ’Ã‚Â©ration Camerounaise de Football\n");

will output:

Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football

Options:

By default, Encoding::fixUTF8() uses the Encoding::WITHOUT_ICONV flag, which means iconv is not used to fix garbled UTF8 strings.

The class also provides the Encoding::ICONV_TRANSLIT and Encoding::ICONV_IGNORE flags, which enable the corresponding //TRANSLIT and //IGNORE options when iconv is used. Their behavior is documented in the PHP iconv() documentation.
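In iconv() terms, the choice between these options comes down to the suffix appended to the target charset. A sketch assuming PHP 8 and a hypothetical helper with its own MODE_* constants (stand-ins, not the library's Encoding constants):

```php
<?php
declare(strict_types=1);

// Hypothetical mode constants standing in for the library's flags.
const MODE_PLAIN = 0;
const MODE_TRANSLIT = 1;
const MODE_IGNORE = 2;

// Build the iconv() target charset: //TRANSLIT approximates characters
// missing from Windows-1252, //IGNORE silently drops them.
function win1252Target(int $mode): string
{
    return 'Windows-1252' . match ($mode) {
        MODE_TRANSLIT => '//TRANSLIT',
        MODE_IGNORE   => '//IGNORE',
        default       => '',
    };
}

// Convert UTF-8 text to Windows-1252 bytes; iconv() returns false on failure.
function toWin1252(string $utf8, int $mode): string|false
{
    return iconv('UTF-8', win1252Target($mode), $utf8);
}

echo win1252Target(MODE_TRANSLIT), "\n";
```

Without either suffix, iconv() fails on any character Windows-1252 cannot represent. U+2014 is representable (byte 0x97), so it converts in every mode. Exact transliteration results vary between iconv implementations.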

Examples:

use \ForceUTF8\Encoding;

$str = "Fédération Camerounaise—de—Football\n"; // Contains U+2014 (em dash), which is not in ISO 8859-1 but exists in Win1252
echo Encoding::fixUTF8($str); // Will break U+2014
echo Encoding::fixUTF8($str, Encoding::ICONV_IGNORE); // Will preserve U+2014
echo Encoding::fixUTF8($str, Encoding::ICONV_TRANSLIT); // Will preserve U+2014

will output:

Fédération Camerounaise?de?Football
Fédération Camerounaise—de—Football
Fédération Camerounaise—de—Football

while:

use \ForceUTF8\Encoding;

$str = "čęėįšųūž"; // Uses several characters not present in ISO8859-1 / Win1252
echo Encoding::fixUTF8($str); // Will break invalid characters
echo Encoding::fixUTF8($str, Encoding::ICONV_IGNORE); // Will remove invalid characters, keep those present in Win1252
echo Encoding::fixUTF8($str, Encoding::ICONV_TRANSLIT); // Will transliterate invalid characters, keep those present in Win1252

will output:

????????
šž
ceeišuuž
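Which characters survive depends on what Windows-1252 can represent: ASCII, U+00A0 to U+00FF, and 27 extra characters at bytes 0x80-0x9F (š and ž among them, the other Lithuanian letters not). This pure-PHP sketch uses hypothetical names and mimics only the dropping behavior described for ICONV_IGNORE. Its output stays in UTF8 rather than being converted to Windows-1252 bytes.

```php
<?php
declare(strict_types=1);

// The 27 printable characters Windows-1252 places at 0x80-0x9F,
// where ISO 8859-1 has only C1 control codes.
const WIN1252_EXTRAS = ['€', '‚', 'ƒ', '„', '…', '†', '‡', 'ˆ', '‰', 'Š', '‹', 'Œ', 'Ž',
    '‘', '’', '“', '”', '•', '–', '—', '˜', '™', 'š', '›', 'œ', 'ž', 'Ÿ'];

// True if one UTF-8 character has a Windows-1252 equivalent.
function inWindows1252(string $ch): bool
{
    if (strlen($ch) === 1) {
        return true;                                   // ASCII
    }
    $lead = ord($ch[0]);
    if (strlen($ch) === 2 && ($lead === 0xC3 || ($lead === 0xC2 && ord($ch[1]) >= 0xA0))) {
        return true;                                   // U+00A0..U+00FF
    }
    return in_array($ch, WIN1252_EXTRAS, true);
}

// Drop every character that Windows-1252 cannot represent.
function keepWindows1252(string $utf8): string
{
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    return implode('', array_filter($chars, 'inWindows1252'));
}

echo keepWindows1252("čęėįšųūž"), "\n";
```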

Install via composer:

Edit your composer.json file to include the following:

{
    "require": {
        "neitanod/forceutf8": "~2.0"
    }
}

Tips:

You can tip me with Bitcoin if you want. :)

1Awfu4TZpy99H7Pyzt1mooxU1aP2mJVdHP

forceutf8's People

Contributors

byjg, codelingobot, hugopakula, j03k64, j0k3r, mcuadros, mmarynich, neitanod, neoteknic, pborreli, podolinek, postalservice14, redolent, superhero


forceutf8's Issues

Some string failed to convert

This package was very helpful for me, but there are still some rare cases where it fails to convert to proper UTF8. I'm posting them here in case the author or anyone else is interested in further improving this package someday.
Here are some sample strings I got while reading RSS feeds from some Thai sources and inserting them into a MySQL database.


<p>หลังจากที่ภรรยา ฮารุ คลอดลูกคนที่สาม น้องเฮเดน คุณพ่อลู […]</p>
<p>The post <a rel="nofollow" href="http://www.tvpoolonline.com/content/357922">(คลิป) ฟังไปเสียวไป!!! เมื่อ “กาย รัชชานนท์” เล่าวินาทีทำหมันให้เห็นเป็นภาพ…ปิดอู่ลูก 3 เป็นที่เรียบร้อย</a> appeared first on <a rel="nofollow" href="http://www.tvpoolonline.com/">TV Pool</a>.</p>

and


<p>หลังจากที่ภรรยา <strong>ฮารุ</strong> คลอดลูกคนที่สาม <strong>น้องเฮเดน</strong> คุณพ่อลูกดก<strong> กาย รัชชานนท์</strong> มีแพลนว่าจะทำหมัน และแล้ววันนี้ก็มาถึงเพราะ ล่าสุด<strong> ฮารุ </strong>ได้โพสต์ภาพและคลิปวีดีโอ หลังจากที่ <strong>กาย รัชชานนท์</strong> ทำหมันเสร็วว่าอย่างไรไปฟัง</p>
<p> </p>
<blockquote class="instagram-media" data-instgrm-captioned data-instgrm-version="7" style=" background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:658px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);">
<div style="padding:8px;">
<div style=" background:#F8F8F8; line-height:0; margin-top:40px; padding:50.0% 0; text-align:center; width:100%;">
<div style=" background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACwAAAAsCAMAAAApWqozAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAMUExURczMzPf399fX1+bm5mzY9AMAAADiSURBVDjLvZXbEsMgCES5/P8/t9FuRVCRmU73JWlzosgSIIZURCjo/ad+EQJJB4Hv8BFt+IDpQoCx1wjOSBFhh2XssxEIYn3ulI/6MNReE07UIWJEv8UEOWDS88LY97kqyTliJKKtuYBbruAyVh5wOHiXmpi5we58Ek028czwyuQdLKPG1Bkb4NnM+VeAnfHqn1k4+GPT6uGQcvu2h2OVuIf/gWUFyy8OWEpdyZSa3aVCqpVoVvzZZ2VTnn2wU8qzVjDDetO90GSy9mVLqtgYSy231MxrY6I2gGqjrTY0L8fxCxfCBbhWrsYYAAAAAElFTkSuQmCC); display:block; height:44px; margin:0 auto -44px; position:relative; top:-22px; width:44px;"></div>
</div>
<p style=" margin:8px 0 0 0; padding:0 4px;"> <a href="https://www.instagram.com/p/BQxBd0_jmHX/" style=" color:#000; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px; text-decoration:none; word-wrap:break-word;" target="_blank">นางเล่าเรื่องการทำหมันซะเห็นภาพเลย 5555 part1 (เดี๋ยวมาต่อpart2คลิปต่อไปนะ) <img src="https://s.w.org/images/core/emoji/2.2.1/72x72/1f602.png" alt="😂" class="wp-smiley" style="height: 1em; max-height: 1em;"><img src="https://s.w.org/images/core/emoji/2.2.1/72x72/1f602.png" alt="😂" class="wp-smiley" style="height: 1em; max-height: 1em;"> #ไม่ฝากร้านฝากงานนะจ๊ะ</a></p>
<p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">A post shared by HARU SUPRAKOB (YAMAGUCHI) ❣ (@haruyamaguchi) on <time style=" font-family:Arial,sans-serif; font-size:14px; line-height:17px;" datetime="2017-02-21T07:54:09+00:00">Feb 20, 2017 at 11:54pm PST</time></p>
</div>
</blockquote>
<p></p>
<p> </p>
<p> </p>
<blockquote class="instagram-media" data-instgrm-captioned data-instgrm-version="7" style=" background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:658px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);">
<div style="padding:8px;">
<div style=" background:#F8F8F8; line-height:0; margin-top:40px; padding:50.0% 0; text-align:center; width:100%;">
<div style=" background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACwAAAAsCAMAAAApWqozAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAMUExURczMzPf399fX1+bm5mzY9AMAAADiSURBVDjLvZXbEsMgCES5/P8/t9FuRVCRmU73JWlzosgSIIZURCjo/ad+EQJJB4Hv8BFt+IDpQoCx1wjOSBFhh2XssxEIYn3ulI/6MNReE07UIWJEv8UEOWDS88LY97kqyTliJKKtuYBbruAyVh5wOHiXmpi5we58Ek028czwyuQdLKPG1Bkb4NnM+VeAnfHqn1k4+GPT6uGQcvu2h2OVuIf/gWUFyy8OWEpdyZSa3aVCqpVoVvzZZ2VTnn2wU8qzVjDDetO90GSy9mVLqtgYSy231MxrY6I2gGqjrTY0L8fxCxfCBbhWrsYYAAAAAElFTkSuQmCC); display:block; height:44px; margin:0 auto -44px; position:relative; top:-22px; width:44px;"></div>
</div>
<p style=" margin:8px 0 0 0; padding:0 4px;"> <a href="https://www.instagram.com/p/BQxCJ9vjWt0/" style=" color:#000; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px; text-decoration:none; word-wrap:break-word;" target="_blank">มาฟังต่อ part2 <img src="https://s.w.org/images/core/emoji/2.2.1/72x72/1f4aa.png" alt="💪" class="wp-smiley" style="height: 1em; max-height: 1em;"><img src="https://s.w.org/images/core/emoji/2.2.1/72x72/1f3fb.png" alt="🏻" class="wp-smiley" style="height: 1em; max-height: 1em;">✌<img src="https://s.w.org/images/core/emoji/2.2.1/72x72/1f3fb.png" alt="🏻" class="wp-smiley" style="height: 1em; max-height: 1em;"> ขอบคุณด่าด๊านะที่เสียสละทำหมันให้ แต้งกิ้วนะ love you <img src="https://s.w.org/images/core/emoji/2.2.1/72x72/1f495.png" alt="💕" class="wp-smiley" style="height: 1em; max-height: 1em;"> #ไม่ฝากร้านฝากงานนะจ๊ะ @guyratchanont</a></p>
<p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">A post shared by HARU SUPRAKOB (YAMAGUCHI) ❣ (@haruyamaguchi) on <time style=" font-family:Arial,sans-serif; font-size:14px; line-height:17px;" datetime="2017-02-21T08:00:10+00:00">Feb 21, 2017 at 12:00am PST</time></p>
</div>
</blockquote>
<p></p>
<p> </p>
<p>The post <a rel="nofollow" href="http://www.tvpoolonline.com/content/357922">(คลิป) ฟังไปเสียวไป!!! เมื่อ “กาย รัชชานนท์” เล่าวินาทีทำหมันให้เห็นเป็นภาพ…ปิดอู่ลูก 3 เป็นที่เรียบร้อย</a> appeared first on <a rel="nofollow" href="http://www.tvpoolonline.com/">TV Pool</a>.</p>
  
[2017-02-22 09:48:11] local.ERROR: description: 
<p>หลังจากที่ภรรยา ฮารุ คลอดลูกคนที่สาม น้องเฮเดน คุณพ่อลู […]</p>
<p>The post <a rel="nofollow" href="http://www.tvpoolonline.com/content/357922">(คลิป) ฟังไปเสียวไป!!! เมื่อ “กาย รัชชานนท์” เล่าวินาทีทำหมันให้เห็นเป็นภาพ…ปิดอู่ลูก 3 เป็นที่เรียบร้อย</a> appeared first on <a rel="nofollow" href="http://www.tvpoolonline.com/">TV Pool</a>.</p>

gb2312 encoding not supported

Hey there, great class! But I have problems with a GB2312-encoded document.
Is it right that this isn't supported?
If so, is it possible for you to implement it?

Thanks
Julian

Create a release

It would be really good if you could create a release so that we can target a specific version. I feel uncomfortable targeting 'master'.

Thanks in advance!

Does not handle illegal UTF-8 chars

The Wikipedia article on UTF-8 (as well as other documents around the web) mentions a handful of situations where a structurally valid UTF-8 character should actually be rejected or modified. One example is overlong characters.
Your script doesn't do anything about such cases.
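For reference, PHP's PCRE engine already applies these strict rules when a pattern uses the u modifier: overlong forms, UTF-16 surrogates and code points above U+10FFFF make the subject invalid. A minimal validity check built on that (the function name is hypothetical):

```php
<?php
declare(strict_types=1);

// preg_match() returns false (not 0) when the subject is not valid UTF-8
// under the /u modifier, which includes overlong and surrogate encodings.
function isStrictUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(isStrictUtf8("/"));            // bool(true)
var_dump(isStrictUtf8("\xC0\xAF"));     // bool(false): overlong encoding of "/"
var_dump(isStrictUtf8("\xED\xA0\x80")); // bool(false): UTF-16 surrogate U+D800
```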

Packagist was not updated with the new tags

First of all, thank you for creating the tags.

However, Packagist was not updated with these tags. I realized that the package neitanod/forceutf8 belongs neither to you nor to any forceutf8 developer. I have already contacted the owner of the Packagist package (https://github.com/nidelson), but there has been no answer.

I suggest that you claim ownership of this package so that this doesn't happen again. The Packagist email is [email protected].

Sorry if I am bothering you, but I would prefer the project to be maintained by its creator rather than creating a fork and splitting the code.

No effect

Everyone seems to have this working, but I get the same output before and after:

Before:
96 kr/mån

After:
96 kr/mån

If I run utf8_decode on the string, I get:
96 kr/mån

I tried to find the issue myself, but without result.

License

Hi,

It seems to work just fine, but what license does your library use?

Some UTF8 characters become ? with fixUTF8

Hello neitanod, great lib!
However, some characters are converted into question marks with fixUTF8 method.
For example, the whole Russian alphabet and letters š and ž.

echo ForceUTF8::fixUTF8('hello žš'); //outputs hello ??
echo ForceUTF8::fixUTF8('привет'); //outputs ?????? 

W3C Test strings

I tried some of the strings from http://www.w3.org/2001/06/utf-8-test/UTF-8-demo.html and they failed to convert.

Specifically

Mathematics and Sciences:

  ∮ E⋅da = Q,  n → ∞, ∑ f(i) = ∏ g(i), ∀x∈ℝ: ⌈x⌉ = −⌊−x⌋, α ∧ ¬β = ¬(¬α ∨ β),

  ℕ ⊆ ℕ₀ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ, ⊥ < a ≠ b ≡ c ≤ d ≪ ⊤ ⇒ (A ⇔ B),

Unused local variable $enc

In Encoding.php on line 297 my debugger says "Unused local variable $enc"

$enc = preg_replace('/[^a-zA-Z0-9\s]/', '', $encoding);

Should this be changed to:

$encoding = preg_replace('/[^a-zA-Z0-9\s]/', '', $encoding);

?

Not working

I tried it on this word, but it doesn't work at all. I also tried the example, and it doesn't work either.

Medveđa

how to call the class ?

How can I call the class?
This line gives me an error:
use \ForceUTF8\Encoding;
Any suggestions?
Thanks

Unable to convert string

I am unable to convert the string, no matter which function I try to use.

<?php

require_once 'vendor/forceutf8/src/ForceUTF8/Encoding.php';

$string = "“Grinvich�";

echo "string to convert: {$string}<br>";

$string1 = \ForceUTF8\Encoding::toUTF8($string);

echo "<hr>toUTF8: {$string1}";

$string2 = \ForceUTF8\Encoding::fixUTF8($string);

echo "<hr>fixUTF8: {$string2}";

$string3 = \ForceUTF8\Encoding::UTF8FixWin1252Chars($string);

echo "<hr>UTF8FixWin1252Chars: {$string3}";

$string4 = \ForceUTF8\Encoding::toWin1252($string);

echo "<hr>toWin1252: {$string4}";

Failed to detect 4 bytes chars

I'm passing the text as quoted-printable:

Testing emojis =F0=9F=98=84

That character (F09F9884) isn't detected correctly.
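A quick check of the input itself: quoted_printable_decode() turns the text into the bytes F0 9F 98 84, which form one valid UTF8 character, U+1F604 (😄). Four-byte sequences start with a lead byte in the range F0-F4, so any detection logic needs a branch for that length. The helper below is hypothetical and assumes PHP 8.

```php
<?php
declare(strict_types=1);

// Number of bytes in the UTF-8 sequence that starts with lead byte $b,
// or 0 if $b cannot start a sequence.
function utf8SequenceLength(int $b): int
{
    return match (true) {
        $b < 0x80                => 1,
        $b >= 0xC2 && $b <= 0xDF => 2,
        $b >= 0xE0 && $b <= 0xEF => 3,
        $b >= 0xF0 && $b <= 0xF4 => 4,
        default                  => 0,   // continuation byte, C0/C1 or F5-FF
    };
}

$text  = quoted_printable_decode('Testing emojis =F0=9F=98=84');
$emoji = substr($text, -4);
echo bin2hex($emoji), "\n";                      // f09f9884
echo utf8SequenceLength(ord($emoji[0])), "\n";   // 4
var_dump(preg_match('//u', $text) === 1);        // bool(true)
```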

japanese characters are turned to ?

Hi,

I'm using your library and it works, but I have one problem: when I enter Chinese or Japanese characters, they are converted to "?". Is there any solution for this?

Thanks

The example from the readme does not work anymore

The example from the readme

echo Encoding::fixUTF8("Fédération Camerounaise de Football");
echo Encoding::fixUTF8("FÃédÃération Camerounaise de Football");
echo Encoding::fixUTF8("FÃÃédÃÃération Camerounaise de Football");
echo Encoding::fixUTF8("FÃÃÃédÃÃÃération Camerounaise de Football");

outputs the following for me:

Fédération Camerounaise de Football
FÃédÃération Camerounaise de Football
FÃÃédÃÃération Camerounaise de Football
FÃÃÃédÃÃÃération Camerounaise de Football

Basically, it only seems to run one pass over the string; the first string looks OK, though.
Am I doing something wrong? Perhaps there should be a small test suite?

I'm on debian 7, and this is the relevant PHP data:

PHP 5.4.4-14+deb7u8 (cli) (built: Feb 17 2014 09:18:47)
Copyright (c) 1997-2012 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2012 Zend Technologies

Garbles some characters

The script garbles some characters.
For instance, if applied directly to a garbled source that contains "'â€�'", the script converts it into "â??".
If I first use str_replace to turn "'â€�'" into the correct "”", fixUTF8 then converts the "”" into a plain "?".

The problem is that forceUTF8 turns almost all of these characters into the same representation, which makes it impossible to apply both str_replace and forceUTF8 to the same string.

Here are the characters it doesn't convert correctly:

'“'
'�'
'’'
'‘'
'—'
'–'
'•'
'…'
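For context, each of these sequences is the three UTF8 bytes of a typographic character (E2 80 xx) misread as Windows-1252, giving "â", "€" and a third character. For ” (E2 80 9D), byte 0x9D is undefined in Windows-1252, which is why it shows up as "â€" plus a replacement character. A sketch of a targeted repair, assuming exactly one such layer of damage, uses strtr(): it tries the longest keys first and never rescans text it has already replaced, unlike chained str_replace() calls. The mapping and function name are illustrative.

```php
<?php
declare(strict_types=1);

// Mojibake (UTF-8 bytes misread as Windows-1252) -> intended character.
// ” (E2 80 9D) is left out: 0x9D has no Windows-1252 character, so its
// damaged form depends on the tool that produced it.
const SMART_PUNCT_FIXES = [
    'â€œ' => '“',  // E2 80 9C
    'â€˜' => '‘',  // E2 80 98
    'â€™' => '’',  // E2 80 99
    'â€“' => '–',  // E2 80 93
    'â€”' => '—',  // E2 80 94
    'â€¢' => '•',  // E2 80 A2
    'â€¦' => '…',  // E2 80 A6
];

function fixSmartPunctuation(string $s): string
{
    return strtr($s, SMART_PUNCT_FIXES);
}

echo fixSmartPunctuation('â€œQuotedâ€™s â€” okâ€¦'), "\n";
```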

Please help..

Hello,
I received this error:
XML Parsing Error at line 1:
PCDATA invalid Char value 4.

I'm using Notepad++ for XML files.

How can I find the cause? The XML file is as big as the universe... :( It should be some character written in a bad encoding, I guess, but how do I find it in the whole file? :/

Thank You for your help

toUTF8() return Invalid UTF-8 ?

Hi there
great lib. Works nearly perfectly on my project.
However, it seems that sometimes toUTF8() returns an invalid utf8-encoded string.

See this example:

$utf8_string = \ForceUTF8\Encoding::toUTF8(base64_decode('PGk+PCFbQ0RBVEFbMS4gUGVyZm9ybWFuY2UgYW5kIMxm5mdi9G504WF56atlOiBHZW5yZSwgS25vd2xlZGdlLCBhbmQgUG9saXRpY3MgIDIuIEEgQ3JpdGlxdWUgb2YgWW9y+WLhIEp1ZGdtZW50OiBJbmRpdmlkdWFsIEF1dGhvcml0eSwgQ29tbXVuaXR5IENyZWF0aW9uLCBhbmQgdGhlIEVtYm9kaW1lbnQgb2YgwKv3PEJSPjMuIFdoYXQgTWF0dGVyIFdobyBEYW5jZXM6IFNlbGYtZmFzaGlvbmluZywgKG5vbilTdWJqZWN0cywgYW5kIHRoZSBOYXRpb248QlI+NC4gTm8gVmljdG9yLCBubyBWYW5xdWlzaGVkLCBubyBQYXN0OiBPbGEgUm90aW1pLCBZYWt1YnUgR293b24sIFNhbmkgQWJhY2hhLCBhbmQgICcgJ1RoZSBFbmQgb2YgTmlnZXJpYW4gSGlzdG9yeSAnICc8QlI+NS4gVmFsdWVzIGJleW9uZCBFdGhpY3M6IEZyb20gU3RlbGxhIERpYSBPeWVkZXBvIHRvIFRlc3MgT253dWVtZTxCUj42LiBDb25jbHVzaW9uczogQ2l2aWwgR292ZXJuYW5jZSBhbmQgdGhlIFBvbGl0aWNzIG9mIFlvcvli4SBUaGVhdHJlPEJSPkJpYmxpb2dyYXBoeTxCUj5JbmRleDxCUj4gIF1dPjwvaT4NCg=='));

var_export(mb_check_encoding($utf8_string, 'UTF-8'));

It should print "true" but prints "false".

Any Ideas?

Thank you

Support for  before group B in toUTF8 function

The toUTF8 function is currently unusable for me because of  not supported before group B. The non-breaking space character is alas, widely used in various sites. Would it be possible to add a specific fix for this character when followed by group B characters?

fixUTF8 Problem with certain characters

Hi, I have found that fixUTF8 has issues when the input string contains ligature characters such as Œ: it is converted to a ? sign, even though the input string does not need any fixing.

Input: Café Nöel Œuf Aoüt
Output: Café Nöel ?uf Août

simple quote encoding

Hello, I use forceutf8 to encode some email bodies in PHP IMAP.
I found an issue with encoding single quotes.
They appear as '?'.
Is there any way to fix this?

First example in readme works but not the other 3

Examples:

echo Encoding::fixUTF8("Fédération Camerounaise de Football");
echo Encoding::fixUTF8("FÃédÃération Camerounaise de Football");
echo Encoding::fixUTF8("FÃÃédÃÃération Camerounaise de Football");
echo Encoding::fixUTF8("FÃÃÃédÃÃÃération Camerounaise de Football");

will output:

Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football

The first one works, but the three after it only remove one of the à characters instead of all of them.

script breaks two-byte (e.g. Czech) symbols

$ cat a.php
<?php
require_once('Encoding.php');
use \ForceUTF8\Encoding;
$str = 'šřěī';
$newstr = Encoding::fixUTF8($str);
var_dump($str);
var_dump($newstr);

$ php a.php
string(8) "šřěī"
string(4) "????"

Truncated text when strlen is overloaded by mb_strlen.

Text is truncated when calling Encoding::fixUTF8 if mbstring.func_overload makes strlen behave like mb_strlen. This is because mb_strlen returns the length in characters instead of the length in bytes.

This fixes the issue:

/** Encoding::toUTF8 @line 184 */
if ( function_exists('mb_strlen') && ((int) ini_get('mbstring.func_overload')) & 2) {
  $max = mb_strlen($text,'8bit');
} else {
  $max = strlen($text);
}
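Background for this report: mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so on current PHP strlen() always counts bytes. For older versions, a hedged helper (hypothetical name) that returns the byte length either way:

```php
<?php
declare(strict_types=1);

// Byte length of $s regardless of mbstring.func_overload (PHP < 8.0):
// the '8bit' encoding makes mb_strlen() count bytes.
function byteLength(string $s): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, '8bit');
    }
    return strlen($s);
}

echo byteLength('Fédération'), "\n"; // 12: each "é" is two bytes
```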

Leave alone numbers

Hi.
I'm working with this library in an existing project.
We are having an issue with numbers: they are returned inside quotes, and we'd like them to be left alone.

Something like this would be nice:

if(is_numeric($text)){
   return $text;
}

So I'm wondering whether this chunk was intended to do that:

if(!is_string($text)) {
      return $text;
}

I don't want to touch something that might break something else in the existing code.

Do you remember?

Thanks in advance.
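On the question itself: a !is_string() guard leaves real numbers (int, float) and other non-strings untouched, but a numeric string such as "96" is still a string and still goes through the conversion. A sketch of a wrapper that also skips numeric strings, assuming PHP 8 and a hypothetical fixValue() helper that receives the converter as a callable:

```php
<?php
declare(strict_types=1);

// Apply $fix only to non-numeric strings; ints, floats, numeric strings,
// null and booleans are returned unchanged.
function fixValue(mixed $value, callable $fix): mixed
{
    if (!is_string($value) || is_numeric($value)) {
        return $value;
    }
    return $fix($value);
}

var_dump(fixValue(96, 'strtoupper'));    // int(96)
var_dump(fixValue('96', 'strtoupper'));  // string(2) "96"
var_dump(fixValue('kr', 'strtoupper'));  // string(2) "KR"
```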

About Japanese chars

Hello neitanod! I appreciate your work! But now I'm experiencing an issue: I can't display Japanese characters. Can you tell me how I can display them?

EDIT: Here is the code and the comparison:
fixUTF8() Example 5 not working. -> FAILED 17 tests passed. 1 tests failed.
Test::identical("fixUTF8() Example 5 not working.",
Encoding::fixUTF8("Á0ë0¿0ê0¹0a\n"),
"チルタリス\n");

Thanks in advance!

Special chars

Hello, I'm using this class, and I noticed a problem when I try to convert UTF-8 text with special characters.
The words are converted correctly, but special characters like • are not.

To latin1 didn't work

I've tried with a pure UTF8 string:
$s = 'Những Ca Khúc Nhạc Vàng';

and a mixed UTF8 string:
$s = 'Những Ca Khúc Nhạc Vàng';

converting both to Latin1, but it didn't work.

unable to fix a string

Hi, I'm trying to fix a garbled UTF8 string, but I don't know why it doesn't work:

require_once("./Encoding.php");

echo \ForceUTF8\Encoding::fixUTF8("Fédération Camerounaise de Football\n");

echo \ForceUTF8\Encoding::fixUTF8("Fédération Camerounaise de Football\n");

echo \ForceUTF8\Encoding::fixUTF8("Fédération Camerounaise de Football\n");

echo \ForceUTF8\Encoding::fixUTF8("Fédération Camerounaise de Football\n");

$data = "Année 80 International 587 tubes année par année";

$new = \ForceUTF8\Encoding::UTF8FixWin1252Chars($data);
echo "Test 1\n";
echo $new;
$new = \ForceUTF8\Encoding::fixUTF8($data);
echo "Test 2\n";
echo $new;

It returns this:

Fédération Camerounaise de Football 

Fédération Camerounaise de Football 

Fédération Camerounaise de Football 

Fédération Camerounaise de Football 

Test 1 
Année 80 International 587 tubes année par année

Test 2 
Année 80 International 587 tubes année par année

Thanks for the help

The example does not work

<?php
require_once("ForceUTF8/Encoding.php");

echo \ForceUTF8\Encoding::fixUTF8("Fédération Camerounaise de Football");
echo "<br>";
echo \ForceUTF8\Encoding::fixUTF8("FÃédÃération Camerounaise de Football");
echo "<br>";
echo \ForceUTF8\Encoding::fixUTF8("FÃÃédÃÃération Camerounaise de Football");
echo "<br>";
echo \ForceUTF8\Encoding::fixUTF8("FÃÃÃédÃÃÃération Camerounaise de Football");

output:

Fédération Camerounaise de Football
FÃédÃération Camerounaise de Football
FÃÃédÃÃération Camerounaise de Football
FÃÃÃédÃÃÃération Camerounaise de Football

PCDATA invalid Char value 31

First of all, thanks!!
Your tool helps me a lot with UTF8 string conversion.

I want to save a string in XML, formatted as UTF8,
but I get this error when I try to load the XML after saving it:

DOMDocument::load(): PCDATA invalid Char value 31

I looked for an answer on Google and found this:

http://stackoverflow.com/questions/14463573/php-simplexml-load-file-invalid-character-error

function utf8_for_xml($string)
{
    return preg_replace ('/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}]+/u', ' ', $string);
}

Can you tell me if there is another way to solve it?

Mathieu
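One caveat about the snippet above: its character class stops at U+FFFD, so it also replaces characters from U+10000 to U+10FFFF (emoji, for example), which XML 1.0 allows. A variant of the same preg_replace() approach that follows the XML 1.0 Char production; the function name is hypothetical, and the input must already be valid UTF8.

```php
<?php
declare(strict_types=1);

// Replace every run of characters outside the XML 1.0 Char production
// with a single space. Returns null if $s is not valid UTF-8.
function xmlSafe(string $s): ?string
{
    return preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u',
        ' ',
        $s
    );
}

var_dump(xmlSafe("a\x1Fb"));        // string(3) "a b": char 31 replaced
var_dump(xmlSafe("ok \u{1F604}"));  // emoji kept
```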

convert character à to à problem

Hello

I have a problem when I try to convert a string containing the character "à": it is not converted, so Ã is printed.
Is that normal?

I use fixUTF8() on this string: "La vidéo-surveillance se développe à Chambéry"
Thanks

Translit symbols

Hi

According to the PHP manual, iconv with the //TRANSLIT flag should convert symbols (e.g. € to EUR).

BUT it doesn't work with your version (still € after the function call).

Is it because you are using "windows-1252" instead of the "iso-8859-1" used in the PHP manual's example?

I am using it with Encoding::fixUTF8($a, Encoding::ICONV_TRANSLIT)
with $a = array('symbol' => '€')

Any help on this matter would be appreciated.

Also, isn't it odd not to allow flags for the "toUTF8" and "toLatin1" functions?

Could a new v1.5 release get tagged?

There's a lot of fantastic work happening and I love seeing so much activity. But having a newer tag release would help me feel more comfortable with my build stability.

Not converting ISO-8859-2

As the title says, the toUTF8 method does not properly convert Latin special characters from ISO-8859-2.
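For background: the README describes toUTF8() as handling Latin1, Windows-1252 and UTF8, and the same byte means a different letter in ISO-8859-2. Byte 0xB9, for instance, is "š" in ISO-8859-2 but "¹" in ISO 8859-1 and Windows-1252, so no Latin1-based guess can recover it. When the source charset is known, an explicit conversion is the reliable route. A sketch, assuming the iconv extension is available (the helper name is hypothetical):

```php
<?php
declare(strict_types=1);

// Re-encode one byte as if it were ISO 8859-1 (what a Latin-1 fallback assumes).
function latin1ByteToUtf8(int $b): string
{
    return $b < 0x80 ? chr($b) : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
}

$byte = 0xB9;
$asLatin1 = latin1ByteToUtf8($byte);   // "¹" (U+00B9)

// Explicit conversion when the source charset is known (needs ext/iconv).
$asLatin2 = function_exists('iconv') ? iconv('ISO-8859-2', 'UTF-8', chr($byte)) : null;

echo $asLatin1, ' vs ', $asLatin2 ?? '(iconv not available)', "\n";
```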

Is this project still active?

Hi, this project is very useful for the community, and there are a lot of suggestions and requests, but no answers from the maintainers. Is this project still alive?

Fatal error: Class 'Encoding' not found

hello,
Can you help me with this error?

require_once('config/encoding.php');

function convertUtf8($str)
{
return Encoding::toUTF8($str);
}
echo convertUtf8($field);

Some strings that were failed to Fix

I have some strings with broken encoding:

$testStr1 = <<<TEXT
China. In 1953, Max̥s parents decided
by ÌÕnew-ageÌÒ meditative
Peter Max (American, b.1937) ÌÕFour KennedysÌÒ, screenprint in colors, 1989, signed in white pencil
TEXT;

echo Encoding::UTF8FixWin1252Chars($testStr1), "\n\n", Encoding::fixUTF8($testStr1), "\n\n";

None of them were fixed:

China. In 1953, Max̥s parents decided
by ÌÕnew-ageÌÒ meditative
Peter Max (American, b.1937) ÌÕFour KennedysÌÒ, screenprint in colors, 1989, signed in white pencil

China. In 1953, Max?s parents decided
by ÌÕnew-ageÌÒ meditative
Peter Max (American, b.1937) ÌÕFour KennedysÌÒ, screenprint in colors, 1989, signed in white pencil

Any idea?

Conversion fix

Hello,

I have this string:

...in „Test string�

This string is stored in a database field that previously had the collation latin1_swedish_ci. I have converted it to utf8_general_ci.

The value should be

...in „Test string”

I have tried many solutions, including yours, but none of them seems to work. If you have a quick solution for this, that would be very nice.

Cheers !
