jbroadway / urlify

A fast PHP slug generator and transliteration library that converts non-ASCII characters for use in URLs.

License: BSD 3-Clause "New" or "Revised" License

PHP 100.00%

urlify php slugs transliteration blogging seo slug slugify ascii unicode

urlify's Issues

1.2.4 changed transliteration behaviour

Upgrading from 1.2.3 to 1.2.4 broke our test suite: some characters are now transliterated differently, which breaks our assertions and violates semver.

E.g. we test that това е текст на бълрагски за тест becomes tova-e-tekst-na-blragski-za-test which is true in 1.2.3 and false in 1.2.4.

In 1.2.4 it instead transliterates to tova-e-tekst-na-bielragski-za-test.

urlify version	in	out
1.2.3	бълрагски	blragski
1.2.4	бълрагски	bielragski

I'm sure the dependency has its reasons for this change, but Composer pulled in 1.2.4 automatically and broke our test suites. This should have been a 1.3.0 or 2.0.0 release.

how to reverse url slug

echo URLify::slug('中文简体');
Result: zhong-wen-jian-ti
How can I get the original Chinese text back from the slug?
In other words, how can I reverse the transliteration?
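A slug can't be turned back into the original text: transliteration drops tones, and many different Chinese characters share the same pinyin syllable (中 and 钟 are both "zhong", for example), so zhong-wen-jian-ti matches many possible inputs. The usual workaround is to store the original text alongside the slug, typically in a database column. A minimal in-memory sketch; the SlugRegistry class is hypothetical and not part of URLify:

```php
<?php
declare(strict_types=1);

// Hypothetical registry, not part of URLify: remembers which original
// text produced each slug so it can be looked up later.
final class SlugRegistry
{
    /** @var array<string, string> slug => original text */
    private array $originals = [];

    public function remember(string $slug, string $original): void
    {
        $this->originals[$slug] = $original;
    }

    // Returns the stored original, or null if the slug was never registered.
    public function lookup(string $slug): ?string
    {
        return $this->originals[$slug] ?? null;
    }
}

$registry = new SlugRegistry();
// In a real app the slug would come from URLify::slug('中文简体').
$registry->remember('zhong-wen-jian-ti', '中文简体');

echo $registry->lookup('zhong-wen-jian-ti'), PHP_EOL; // 中文简体
```

In practice the same idea is a unique slug column next to the title column, queried by slug.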

Please retain license

If this is a port of URLify.js, as the README says, please retain the original license; otherwise you don't have the right to port it.

From a quick look, the original license is this one: https://github.com/django/django/blob/master/LICENSE

Copyright (c) Django Software Foundation and individual contributors.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

    1. Redistributions of source code must retain the above copyright notice,
       this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.

    3. Neither the name of Django nor the names of its contributors may be used
       to endorse or promote products derived from this software without
       specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Passing certain characters to add_chars() method causes "preg_match_all(): Unknown modifier ']'"

Consider the following:

URLify::add_chars(['/' => '']);

This raises the PHP error preg_match_all(): Unknown modifier ']', because the / character is used as the regular expression delimiter within the URLify library.

The above example derives from a fairly common and reasonable use-case: I want to remove all illegal characters from a file name, and on UNIX and Windows, / is illegal.

To fix this, PHP's preg_quote() function must be called on the keys in the array argument passed to add_chars().

I'll submit a PR shortly that seeks to fix the issue.
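Escaping each key with preg_quote(), passing the '/' delimiter as its second argument, is enough to make keys like '/' or ']' safe inside a pattern. A minimal sketch, assuming the keys are combined into a single regex; build_char_pattern() and replace_chars() are hypothetical helpers, not URLify's internals:

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not URLify's internals: builds one regex that matches
// any key of $map. preg_quote() escapes regex metacharacters such as ']' and,
// because '/' is passed as its second argument, the '/' delimiter too.
function build_char_pattern(array $map): string
{
    $quoted = array_map(
        static fn ($char): string => preg_quote((string) $char, '/'),
        array_keys($map)
    );

    return '/' . implode('|', $quoted) . '/u';
}

// Replaces every occurrence of a key in $text with its mapped value.
function replace_chars(string $text, array $map): string
{
    if ($map === []) {
        return $text; // an empty pattern would match everywhere
    }

    return preg_replace_callback(
        build_char_pattern($map),
        static fn (array $m): string => (string) $map[$m[0]],
        $text
    ) ?? $text;
}

echo replace_chars('reports/2023', ['/' => '']), PHP_EOL; // reports2023
```

Without preg_quote(), the '/' key ends the pattern early and PCRE reads the remaining characters as modifiers, which is the error described above.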

Lithuanian map :)

    'lithuanian_map' => array (
        'ą' => 'a', 'č' => 'c', 'ę' => 'e', 'ė' => 'e', 'į' => 'i', 'š' => 's', 'ų' => 'u', 'ū' => 'u', 'ž' => 'z',
        'Ą' => 'A', 'Č' => 'C', 'Ę' => 'E', 'Ė' => 'E', 'Į' => 'I', 'Š' => 'S', 'Ų' => 'U', 'Ū' => 'U', 'Ž' => 'Z'
    )
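To check a table like this before proposing it, it can be applied on its own with PHP's built-in strtr(), which, given an array, matches whole UTF-8 byte sequences and swaps in the replacement. A quick sketch; with URLify installed, the same array could instead be registered through URLify::add_chars():

```php
<?php
declare(strict_types=1);

// The Lithuanian map proposed above, applied directly with strtr().
// Each accented letter is replaced by its ASCII base letter.
$lithuanianMap = [
    'ą' => 'a', 'č' => 'c', 'ę' => 'e', 'ė' => 'e', 'į' => 'i',
    'š' => 's', 'ų' => 'u', 'ū' => 'u', 'ž' => 'z',
    'Ą' => 'A', 'Č' => 'C', 'Ę' => 'E', 'Ė' => 'E', 'Į' => 'I',
    'Š' => 'S', 'Ų' => 'U', 'Ū' => 'U', 'Ž' => 'Z',
];

echo strtr('Šiaulių Klaipėda', $lithuanianMap), PHP_EOL; // Siauliu Klaipeda
```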

preserve case feature

Hello,

Could there be an option to preserve case?

Sometimes we want:

"Alfred is good"
to be converted to "Alfred-is-good" instead of all lowercase...
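Conceptually, the requested option is a flag that skips the final lowercasing step. A minimal sketch, assuming ASCII input that has already been transliterated; slugify_ascii() and its $preserveCase parameter are hypothetical and not part of URLify:

```php
<?php
declare(strict_types=1);

// Hypothetical slug helper (not URLify's API) with an opt-in flag that
// keeps the original letter case. Handles ASCII input only.
function slugify_ascii(string $text, bool $preserveCase = false): string
{
    // Collapse every run of characters other than ASCII letters/digits into '-'.
    $slug = (string) preg_replace('/[^A-Za-z0-9]+/', '-', $text);
    $slug = trim($slug, '-');

    return $preserveCase ? $slug : strtolower($slug);
}

echo slugify_ascii('Alfred is good'), PHP_EOL;       // alfred-is-good
echo slugify_ascii('Alfred is good', true), PHP_EOL; // Alfred-is-good
```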

save file extension when $file_name = true

\URLify::filter('abcdefghi.jpg', 6, 'en', true); // returns abcdef

It would be good to preserve the file extension, so that the result in this case is 'abcdef.jpg'.
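One way to get 'abcdef.jpg' is to split off the extension before applying the length limit and append it afterwards. A minimal sketch, assuming the name has already been filtered to ASCII (substr() counts bytes); truncate_keep_extension() is hypothetical and not part of URLify:

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of URLify: truncates only the base name
// and re-attaches the extension.
function truncate_keep_extension(string $fileName, int $maxLength): string
{
    $base = pathinfo($fileName, PATHINFO_FILENAME);
    $ext  = pathinfo($fileName, PATHINFO_EXTENSION);

    // Cast guards against substr() returning false on PHP 7 for empty input.
    $base = (string) substr($base, 0, $maxLength);

    return $ext === '' ? $base : $base . '.' . $ext;
}

echo truncate_keep_extension('abcdefghi.jpg', 6), PHP_EOL; // abcdef.jpg
```

Note that pathinfo() only treats the last dot as the extension, so 'archive.tar.gz' keeps 'gz' and truncates 'archive.tar'.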

Difficulty generating a slug with / and ,

When generating a slug from a string that contains / or , (comma), these characters are removed entirely, when they should be replaced with a separator.

Test string: Bomba Submersa 1/4HP 0,25 110V Lepono
Incorrect: bomba-submersa-14hp-025-110v-lepono
Correct: bomba-submersa-1-4hp-0-25-110v-lepono

I modified the following code snippet:

$string = (string) \preg_replace(
    [
        // 1) remove un-needed chars
        '/[^' . $separatorEscaped . $removePatternAddOn . '\-a-zA-Z0-9\s]/u',
        // 2) convert spaces to $separator
        '/[\s]+/u',
        // 3) remove some extras words
        $removeWordsSearch,
        // 4) remove double $separator's
        '/[' . ($separatorEscaped ?: ' ') . ']+/u',
        // 5) remove $separator at the end
        '/[' . ($separatorEscaped ?: ' ') . ']+$/u',
    ],
    [
        '',
        $separator,
        '',
        $separator,
        '',
    ],
    $string
);

To:

$string = (string) \preg_replace(
    [
        // 1) remove un-needed chars
        '/[^' . $separatorEscaped . $removePatternAddOn . '\-a-zA-Z0-9\s]/u',
        // 2) convert spaces to $separator
        '/[\s]+/u',
        // 3) remove some extras words
        $removeWordsSearch,
        // 4) remove double $separator's
        '/[' . ($separatorEscaped ?: ' ') . ']+/u',
        // 5) remove $separator at the end
        '/[' . ($separatorEscaped ?: ' ') . ']+$/u',
    ],
    [
        $separator, // changed: un-needed chars now become $separator instead of ''
        $separator,
        '',
        $separator,
        '',
    ],
    $string
);

And it worked correctly.
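The effect of the change can be reproduced in a few lines. A simplified sketch, not URLify's code: slug_with_separator() is a hypothetical helper that turns every run of unwanted characters into a single separator, which is what steps 1 and 4 achieve together after the modification:

```php
<?php
declare(strict_types=1);

// Hypothetical, simplified pipeline (not URLify's code): instead of deleting
// unwanted characters, every run of them becomes a single separator.
function slug_with_separator(string $text, string $separator = '-'): string
{
    $slug = (string) preg_replace('/[^A-Za-z0-9]+/', $separator, $text);

    // trim() treats $separator as a character list; fine for a single '-'.
    return strtolower(trim($slug, $separator));
}

echo slug_with_separator('Bomba Submersa 1/4HP 0,25 110V Lepono'), PHP_EOL;
// bomba-submersa-1-4hp-0-25-110v-lepono
```

The trade-off is that every removed character now splits words, so an apostrophe turns "don't" into "don-t"; that may be why the library deletes such characters rather than replacing them.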

Replacing underscores with spaces

Hi. I'm trying to modify the code so that underscores are not treated as spaces. I thought it would be as simple as commenting out that line of code, but that doesn't work. Any ideas why?

Unable to urlify properly

Hi there,

I've been trying to urlify a very simple string, but the last part is being dropped. This is probably intended behaviour, but an option to avoid it would be useful.

My string is "Brazilian Série A" and I want it to become "brazilian-serie-a". Instead, it becomes "brazilian-serie", without the final "-a". Is there any way to do this?

Here is my code:

\URLify::filter('Brazilian Série A') // produces "brazilian-serie"

I also tried:

\URLify::filter('Brazilian Série A', 120, 'en') // produces "brazilian-serie"
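The dropped "A" looks consistent with stop-word removal: URLify can strip a list of common English words from slugs, and "a" is a typical entry. The same explanation would fit the "Missing A char" report further down, where "A" vanishes but "AA" survives. The sketch below is hypothetical (slugify_with_stop_words() and its parameters are not URLify's API) and only illustrates the mechanism; check the filter() signature of your installed version for a parameter that turns word removal off.

```php
<?php
declare(strict_types=1);

// Hypothetical illustration, not URLify's code: a slugger that drops
// stop words. 'a' is in the list, so a lone trailing "A" disappears.
function slugify_with_stop_words(string $text, array $stopWords, bool $removeStopWords = true): string
{
    $text = strtr($text, ['é' => 'e']); // tiny stand-in for real transliteration
    $words = preg_split('/[^A-Za-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    if ($removeStopWords) {
        $words = array_diff($words, $stopWords);
    }

    return implode('-', $words);
}

$stopWords = ['a', 'an', 'the'];
echo slugify_with_stop_words('Brazilian Série A', $stopWords), PHP_EOL;        // brazilian-serie
echo slugify_with_stop_words('Brazilian Série A', $stopWords, false), PHP_EOL; // brazilian-serie-a
```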

Ó => o

This is probably a typo in the code:

The uppercase Ó is converted to lowercase o because of line 72 in URLify.php:

'Ó' => 'o',

correct:

'Ó' => 'O',
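Typos like this are easy to catch mechanically. A small check, hypothetical and not part of URLify's test suite, that flags any single uppercase letter (\p{Lu}) mapped to an all-lowercase replacement (\p{Ll}):

```php
<?php
declare(strict_types=1);

// Hypothetical sanity check for a transliteration map: reports uppercase
// source letters whose replacement is entirely lowercase, like 'Ó' => 'o'.
function find_case_mismatches(array $map): array
{
    $suspicious = [];
    foreach ($map as $from => $to) {
        $from = (string) $from;
        $to = (string) $to;
        if (preg_match('/^\p{Lu}$/u', $from) === 1 && preg_match('/^\p{Ll}+$/u', $to) === 1) {
            $suspicious[] = $from;
        }
    }

    return $suspicious;
}

echo implode(', ', find_case_mismatches(['Ó' => 'o', 'ó' => 'o', 'Á' => 'A'])), PHP_EOL; // Ó
```

Running it over every language map in a unit test would flag the entry as soon as it is introduced.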

stop after & symbol

Hello,
why is nothing after the & symbol in the string transliterated?

Why is $underscoreToSpace removed ?

Hi,

Why was $underscoreToSpace removed from the filter? It was pretty handy for turning underscores into hyphens if you wanted, or into spaces, of course.

I hope there is a good reason for it!

Thanks

Support more characters by default

Had to add the following chars for our transliteration test to pass:

        URLify::add_chars(
            array(
                'Ÿ' => 'Y',
                'µ' => 'u',
                '¥' => 'Y',
                'Ĉ' => 'C',
                'ĉ' => 'c',
                'Ċ' => 'C',
                'ċ' => 'c',
                'Ĝ' => 'G',
                'ĝ' => 'g',
                'Ġ' => 'G',
                'ġ' => 'g',
                'Ĥ' => 'H',
                'ĥ' => 'h',
                'Ħ' => 'H',
                'ħ' => 'h',
                'Ĕ' => 'E',
                'ĕ' => 'e',
                'Ĭ' => 'I',
                'ĭ' => 'i',
                'Ĵ' => 'J',
                'ĵ' => 'j',
                'Ĺ' => 'L',
                'ĺ' => 'l',
                'Ľ' => 'L',
                'ľ' => 'l',
                'Ŀ' => 'L',
                'ŀ' => 'l',
                'ŉ' => 'n',
                'Ō' => 'O',
                'ō' => 'o',
                'Ŏ' => 'O',
                'ŏ' => 'o',
                'Ŕ' => 'R',
                'ŕ' => 'r',
                'Ŗ' => 'R',
                'ŗ' => 'r',
                'Ŝ' => 'S',
                'ŝ' => 's',
                'Ŧ' => 'T',
                'ŧ' => 't',
                'Ŭ' => 'U',
                'ŭ' => 'u',
                'Ŵ' => 'W',
                'ŵ' => 'w',
                'Ŷ' => 'Y',
                'ŷ' => 'y',
                'ſ' => 'i',
                'ƒ' => 'f',
                'Ơ' => 'O',
                'ơ' => 'o',
                'Ư' => 'U',
                'ư' => 'u',
                'Ǎ' => 'A',
                'ǎ' => 'a',
                'Ǐ' => 'I',
                'ǐ' => 'i',
                'Ǒ' => 'O',
                'ǒ' => 'o',
                'Ǔ' => 'U',
                'ǔ' => 'u',
                'Ǖ' => 'U',
                'ǖ' => 'u',
                'Ǘ' => 'U',
                'ǘ' => 'u',
                'Ǚ' => 'U',
                'ǚ' => 'u',
                'Ǜ' => 'U',
                'ǜ' => 'u',
                'Ǻ' => 'A',
                'ǻ' => 'a',
                'Ǿ' => 'O',
                'ǿ' => 'o',
                'Ǽ' => 'Ae',
                'ǽ' => 'ae',
                'Ĳ' => 'IJ',
                'ĳ' => 'ij',
                'J' => 'J',
                'ĸ' => 'k',
                'Ŋ' => 'N',
                'ŋ' => 'n',
                'Ẁ' => 'W',
                'ẁ' => 'w',
                'Ẃ' => 'W',
                'ẃ' => 'w',
                'Ẅ' => 'W',
                'ẅ' => 'w',
            )
        );

Unfortunately, I don't know which languages these characters belong to, so it's difficult for me to provide a PR when the code is organized by language.

Missing A char

Hi, I found a strange bug. Look at the code below (local environment: PHP 5.6 on macOS; dev/prod environment: PHP 5.6 on Ubuntu 16):

var_dump(\URLify::filter('Text sample A')); // text-sample
var_dump(\URLify::filter('Text sample B')); // text-sample-b
var_dump(\URLify::filter('Text sample AA')); // text-sample-aa

In the first var_dump, where did the final "a" go?

Is this package still maintained?

All words lowercase

How can I make all characters lowercase?

Not compatible with Laravel 9

Laravel 9 requires voku/portable-ascii:^2.0, while this repo requires voku/portable-ascii:^1.4, which causes a conflict when running composer update.

Make use of PHP's Transliterator class

We could extend character support by making use of PHP's Transliterator class. May even be faster too.
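A minimal sketch of what that could look like, assuming the intl extension is installed (Transliterator ships with intl, not with core PHP). 'Any-Latin; Latin-ASCII' is a standard ICU transform ID: it converts other scripts to Latin and then folds accented Latin letters to ASCII, which would also cover several characters from the "Support more characters by default" request. to_ascii() is a hypothetical helper name:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: transliterates any script to plain ASCII using ICU.
// Returns null when the intl extension (which provides Transliterator) is
// missing or the transform cannot be created.
function to_ascii(string $text): ?string
{
    if (!class_exists(Transliterator::class)) {
        return null;
    }

    // 'Any-Latin' converts other scripts to Latin; 'Latin-ASCII' then folds
    // accented Latin letters to their closest ASCII equivalents.
    $transliterator = Transliterator::create('Any-Latin; Latin-ASCII');
    if ($transliterator === null) {
        return null;
    }

    $result = $transliterator->transliterate($text);

    return $result === false ? null : $result;
}

echo to_ascii('Ĉ ĥ Ŵ ŷ') ?? '(intl not available)', PHP_EOL; // C h W y (with intl)
```

Because the output comes from ICU's rule data rather than a table in the library, results may change when PHP is built against a different ICU version, which matters if tests pin exact transliterations, as in the 1.2.4 report.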

Wider language support for transliteration?

There's another project, called Unidecode, which appears to have complete transliteration tables for US-ASCII.

Maybe URLify could use them?

Underscores as spaces

I just had URLify take the title _Summer and return the slug _summer, when I was actually expecting just summer. Maybe this is just me though?

Going forward I have my wrapper replace all occurrences of underscores with spaces. This matches at least my own internal logic much better but I wanted to throw it out there and see if maybe someone else also liked this behavior. Then I could roll a PR for it.
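Such a wrapper only needs to swap underscores for spaces before slugging. A minimal sketch; slug_underscores_as_spaces() is hypothetical, and its inner preg_replace() is a stand-in for the real slug call (for example URLify::filter()):

```php
<?php
declare(strict_types=1);

// Hypothetical wrapper (not URLify's API): treat underscores as spaces
// before slugging, so "_Summer" becomes "summer" rather than "_summer".
function slug_underscores_as_spaces(string $text): string
{
    $text = str_replace('_', ' ', $text);

    // Stand-in for the real slug call, e.g. URLify::filter($text).
    $slug = (string) preg_replace('/[^A-Za-z0-9]+/', '-', $text);

    return strtolower(trim($slug, '-'));
}

echo slug_underscores_as_spaces('_Summer'), PHP_EOL; // summer
```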
