jbroadway / urlify

A fast PHP slug generator and transliteration library that converts non-ascii characters for use in URLs.

License: BSD 3-Clause "New" or "Revised" License

urlify's Introduction

URLify for PHP

A fast PHP slug generator and transliteration library, started as a PHP port of URLify.js from the Django project.

Handles symbols from Latin languages, Arabic, Azerbaijani, Bulgarian, Burmese, Croatian, Czech, Danish, Esperanto, Estonian, Finnish, French, Switzerland (French), Austrian (French), Georgian, German, Switzerland (German), Austrian (German), Greek, Hindi, Kazakh, Latvian, Lithuanian, Norwegian, Persian, Polish, Romanian, Russian, Swedish, Serbian, Slovak, Turkish, Ukrainian and Vietnamese, and many others via ASCII::to_transliterate().

Symbols it cannot transliterate can be omitted or replaced with a specified character.

Installation

Install the latest version with:

$ composer require jbroadway/urlify

Usage

First, include Composer's autoloader:

require_once 'vendor/autoload.php';

To generate slugs for URLs:

<?php

echo URLify::slug (' J\'étudie le français ');
// "jetudie-le-francais"

echo URLify::slug ('Lo siento, no hablo español.');
// "lo-siento-no-hablo-espanol"

To generate slugs for file names:

<?php

echo URLify::filter ('фото.jpg', 60, "", true);
// "foto.jpg"

To simply transliterate characters:

<?php

echo URLify::downcode ('J\'étudie le français');
// "J'etudie le francais"

echo URLify::downcode ('Lo siento, no hablo español.');
// "Lo siento, no hablo espanol."

/* Or use transliterate() alias: */

echo URLify::transliterate ('Lo siento, no hablo español.');
// "Lo siento, no hablo espanol."

To extend the character list:

<?php

URLify::add_chars ([
	'¿' => '?', '®' => '(r)', '¼' => '1/4',
	'½' => '1/2', '¾' => '3/4', '¶' => 'P'
]);

echo URLify::downcode ('¿ ® ¼ ½ ¾ ¶');
// "? (r) 1/4 1/2 3/4 P"

To extend the list of words to remove:

<?php

URLify::remove_words (['remove', 'these', 'too']);

To prioritize a certain language map:

<?php

echo URLify::filter ('Ägypten und Österreich besitzen wie üblich ein Übermaß an ähnlich öligen Attachés', 60, 'de');
// "aegypten-und-oesterreich-besitzen-wie-ueblich-ein-uebermass-aehnlich-oeligen-attaches"

echo URLify::filter ('Cağaloğlu, çalıştığı, müjde, lazım, mahkûm', 60, 'tr');
// "cagaloglu-calistigi-mujde-lazim-mahkum"

Please note that the "ü" is transliterated to "ue" in the first case, whereas it results in a simple "u" in the latter.
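
The effect of prioritizing a language map can be sketched without the library. The snippet below is only an illustration under stated assumptions: the constant MAPS and the function transliterate_with_priority() are hypothetical names, the maps are heavily trimmed, and this is not URLify's actual implementation. It shows how putting the preferred map first lets its entry for "ü" win.

```php
<?php

// Hypothetical, heavily trimmed maps; real transliteration tables are far larger.
const MAPS = [
    'de' => ['ü' => 'ue', 'ä' => 'ae', 'ö' => 'oe', 'ß' => 'ss'],
    'tr' => ['ü' => 'u', 'ğ' => 'g', 'ç' => 'c', 'ı' => 'i'],
];

/**
 * Transliterate $text, letting the map for $language win over all others.
 */
function transliterate_with_priority(string $text, string $language): string
{
    // The array union operator keeps the left-hand value for duplicate keys,
    // so starting from the preferred map gives its entries priority.
    $merged = MAPS[$language] ?? [];
    foreach (MAPS as $map) {
        $merged += $map;
    }

    return strtr($text, $merged);
}

echo transliterate_with_priority('üblich', 'de'), "\n"; // "ueblich"
echo transliterate_with_priority('müjde', 'tr'), "\n";  // "mujde"
```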

urlify's People

Contributors

andgrankin, bamse16, drfairy, geminorum, grahamcampbell, jbroadway, jmontoyaa, karptonite, knackebrot, korvinszanto, lux, madman-81, mente, mkraemer, nayjest, nickl-, nikmarchenko, patrickheck, pincombe, plaker, quangbahoa, rinogo, samnela, scorp13, shefi, skyosev, tobiassjosten, tobion, voku, ywarnier

urlify's Issues

Unable to urlify properly

Hi there,

I've been trying to urlify a very simple string, but the last part is being dropped. This is probably intended behaviour, but it would be useful to have an option to avoid it.

My string is "Brazilian Série A" and I want it to become "brazilian-serie-a". It becomes "brazilian-serie" instead without the final "-a" part. Any way I can do this?

Below my code:

\URLify::filter('Brazilian Série A') // produces "brazilian-serie"

Tried also with:

\URLify::filter('Brazilian Série A', 120, 'en') // produces "brazilian-serie"

Support more characters by default

Had to add the following chars for our transliteration test to pass:

        URLify::add_chars(
            array(
                'Ÿ' => 'Y',
                'µ' => 'u',
                '¥' => 'Y',
                'Ĉ' => 'C',
                'ĉ' => 'c',
                'Ċ' => 'C',
                'ċ' => 'c',
                'Ĝ' => 'G',
                'ĝ' => 'g',
                'Ġ' => 'G',
                'ġ' => 'g',
                'Ĥ' => 'H',
                'ĥ' => 'h',
                'Ħ' => 'H',
                'ħ' => 'h',
                'Ĕ' => 'E',
                'ĕ' => 'e',
                'Ĭ' => 'I',
                'ĭ' => 'i',
                'Ĵ' => 'J',
                'ĵ' => 'j',
                'Ĺ' => 'L',
                'ĺ' => 'l',
                'Ľ' => 'L',
                'ľ' => 'l',
                'Ŀ' => 'L',
                'ŀ' => 'l',
                'ʼn' => 'n',
                'Ō' => 'O',
                'ō' => 'o',
                'Ŏ' => 'O',
                'ŏ' => 'o',
                'Ŕ' => 'R',
                'ŕ' => 'r',
                'Ŗ' => 'R',
                'ŗ' => 'r',
                'Ŝ' => 'S',
                'ŝ' => 's',
                'Ŧ' => 'T',
                'ŧ' => 't',
                'Ŭ' => 'U',
                'ŭ' => 'u',
                'Ŵ' => 'W',
                'ŵ' => 'w',
                'Ŷ' => 'Y',
                'ŷ' => 'y',
                'ſ' => 'i',
                'ƒ' => 'f',
                'O' => 'O',
                'o' => 'o',
                'U' => 'U',
                'u' => 'u',
                'Ǎ' => 'A',
                'ǎ' => 'a',
                'Ǐ' => 'I',
                'ǐ' => 'i',
                'Ǒ' => 'O',
                'ǒ' => 'o',
                'Ǔ' => 'U',
                'ǔ' => 'u',
                'Ǖ' => 'U',
                'ǖ' => 'u',
                'Ǘ' => 'U',
                'ǘ' => 'u',
                'Ǚ' => 'U',
                'ǚ' => 'u',
                'Ǜ' => 'U',
                'ǜ' => 'u',
                'Ǻ' => 'A',
                'ǻ' => 'a',
                'Ǿ' => 'O',
                'ǿ' => 'o',
                'Ǽ' => 'Ae',
                'ǽ' => 'ae',
                'IJ' => 'IJ',
                'ij' => 'ij',
                'J' => 'J',
                'ĸ' => 'k',
                'Ŋ' => 'N',
                'ŋ' => 'n',
                'Ẁ' => 'W',
                'ẁ' => 'w',
                'Ẃ' => 'W',
                'ẃ' => 'w',
                'Ẅ' => 'W',
                'ẅ' => 'w',
            )
        );

Unfortunately, since I do not know what language they belong to, I find it difficult to provide a PR when the code is structured based on language.

Not compatible with Laravel 9

Laravel 9 requires voku/portable-ascii:^2.0, while this repo requires voku/portable-ascii:^1.4, which causes a conflict when trying to update with Composer.

Preserve file extension when $file_name = true

\URLify::filter('abcdefghi.jpg', 6, 'en', true); // returns abcdef

It would be good to preserve the file extension, so that the result is 'abcdef.jpg' in this case.
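
Until the library handles this, one possible workaround is to split off the extension, truncate only the base name, and join the two again. This is a minimal sketch: truncate_keep_extension() is a hypothetical helper, not part of URLify, and it assumes the name has already been reduced to ASCII so that byte-based substr() is safe.

```php
<?php

/**
 * Truncate the base name of a file to $maxLength characters and
 * re-attach the original extension. Hypothetical helper, not part of URLify.
 * Assumes $fileName is already ASCII (for example, the output of a slug step).
 */
function truncate_keep_extension(string $fileName, int $maxLength): string
{
    $extension = pathinfo($fileName, PATHINFO_EXTENSION);
    $base = pathinfo($fileName, PATHINFO_FILENAME);
    $base = substr($base, 0, $maxLength);

    // Names without an extension are returned as the truncated base only.
    return $extension === '' ? $base : $base . '.' . $extension;
}

echo truncate_keep_extension('abcdefghi.jpg', 6), "\n"; // "abcdef.jpg"
```

Note that in this sketch the length limit applies to the base name only, so the returned string can be longer than $maxLength by the length of the extension plus the dot.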

Lithuanian map :)

    'lithuanian_map' => array (
        'ą' => 'a', 'č' => 'c', 'ę' => 'e', 'ė' => 'e', 'į' => 'i', 'š' => 's', 'ų' => 'u', 'ū' => 'u', 'ž' => 'z',
        'Ą' => 'A', 'Č' => 'C', 'Ę' => 'E', 'Ė' => 'E', 'Į' => 'I', 'Š' => 'S', 'Ų' => 'U', 'Ū' => 'U', 'Ž' => 'Z'
    )

How to reverse a URL slug

echo URLify::slug('中文简体');
// result: "zhong-wen-jian-ti"

How can I get the original Chinese text back from the slug? In other words, how can the slug transliteration be reversed?
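
Transliteration is generally lossy: many different Chinese characters share the same romanized spelling, so the original text cannot be recomputed from the slug alone. A common approach is to store the original title next to the slug and look it up. The sketch below assumes a hypothetical in-memory SlugIndex class; in an application this would more likely be a database column.

```php
<?php

/**
 * Hypothetical in-memory store that remembers the original title
 * for each slug. Not part of URLify.
 */
final class SlugIndex
{
    /** @var array<string, string> slug => original title */
    private array $titles = [];

    public function remember(string $slug, string $title): void
    {
        $this->titles[$slug] = $title;
    }

    public function titleFor(string $slug): ?string
    {
        return $this->titles[$slug] ?? null;
    }
}

$index = new SlugIndex();

// The slug is the value reported in this issue, hard-coded here rather
// than generated, so the sketch runs without the library installed.
$index->remember('zhong-wen-jian-ti', '中文简体');

echo $index->titleFor('zhong-wen-jian-ti'), "\n"; // "中文简体"
```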

Missing A char

Hi, I found a strange bug. Look at the code below (local env: PHP 5.6 on macOS, dev/prod env: PHP 5.6 on Ubuntu 16):

  • var_dump(\URLify::filter('Text sample A')); // text-sample
  • var_dump(\URLify::filter('Text sample B')); // text-sample-b
  • var_dump(\URLify::filter('Text sample AA')); // text-sample-aa

In the first var_dump, where did the final "a" go?
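
One plausible explanation, not confirmed in this thread, is a stop-word step: the README describes a list of words to remove (see remove_words()), and a standalone "a" would match such a list while "b" and "aa" would not. The sketch below reproduces the pattern with a hypothetical sketch_slug() function and an assumed word list; it is not URLify's implementation.

```php
<?php

/**
 * Minimal stand-in for a slug filter with a stop-word step.
 * The default word list is an assumption made for illustration only.
 *
 * @param string[] $removeWords
 */
function sketch_slug(string $text, array $removeWords = ['a', 'an', 'the']): string
{
    // Lowercase, split on whitespace, then drop any word on the list.
    $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    $kept = array_filter(
        $words,
        fn (string $word): bool => !in_array($word, $removeWords, true)
    );

    return implode('-', $kept);
}

echo sketch_slug('Text sample A'), "\n";     // "text-sample"
echo sketch_slug('Text sample B'), "\n";     // "text-sample-b"
echo sketch_slug('Text sample AA'), "\n";    // "text-sample-aa"
echo sketch_slug('Text sample A', []), "\n"; // "text-sample-a"
```

With an empty word list the trailing "a" survives, which is the behaviour requested here.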

Is this package still maintained?

Ó => o

This is probably a typo in the code:

The uppercase Ó is converted to lowercase o due to line 72 in URLify.php:

'Ó' => 'o',

correct:

'Ó' => 'O',

preserve case feature

Hello,

I think there could be an option to preserve case?

Sometimes we want:

"Alfred is good"
to be converted to "Alfred-is-good" instead of all lowercase...

Why was $underscoreToSpace removed?

Hi,

Why was $underscoreToSpace removed from the filter? It was pretty handy for turning underscores into hyphens if you wanted, or spaces of course.

I hope there is a good reason for it!

Thanks

Difficulty generating a slug with / and ,

When generating a slug from a string that contains "/" and ",", these characters are replaced with nothing, when they should be replaced with a separator.

  • Test string: Bomba Submersa 1/4HP 0,25 110V Lepono
  • Incorrect: bomba-submersa-14hp-025-110v-lepono
  • Correct: bomba-submersa-1-4hp-0-25-110v-lepono

I modified the following code snippet:

$string = (string) \preg_replace(
            [
                // 1) remove un-needed chars
                '/[^' . $separatorEscaped . $removePatternAddOn . '\-a-zA-Z0-9\s]/u',
                // 2) convert spaces to $separator
                '/[\s]+/u',
                // 3) remove some extras words
                $removeWordsSearch,
                // 4) remove double $separator's
                '/[' . ($separatorEscaped ?: ' ') . ']+/u',
                // 5) remove $separator at the end
                '/[' . ($separatorEscaped ?: ' ') . ']+$/u',
            ],
            [
                '',
                $separator,
                '',
                $separator,
                '',
            ],
            $string
        );

To:

$string = (string) \preg_replace(
            [
                // 1) remove un-needed chars
                '/[^' . $separatorEscaped . $removePatternAddOn . '\-a-zA-Z0-9\s]/u',
                // 2) convert spaces to $separator
                '/[\s]+/u',
                // 3) remove some extras words
                $removeWordsSearch,
                // 4) remove double $separator's
                '/[' . ($separatorEscaped ?: ' ') . ']+/u',
                // 5) remove $separator at the end
                '/[' . ($separatorEscaped ?: ' ') . ']+$/u',
            ],
            [
                $separator,
                $separator,
                '',
                $separator,
                '',
            ],
            $string
        );

And it worked correctly.
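
The difference between the two versions can be reproduced in isolation. The following is a simplified sketch of the replacement chain quoted above: sketch_filter() is a hypothetical function, the remove-words step and $removePatternAddOn are left out, and only the first replacement value varies.

```php
<?php

/**
 * Simplified version of the replacement chain quoted above.
 * $unwantedReplacement is what step 1 substitutes for disallowed characters.
 * Hypothetical function for illustration, not URLify's code.
 */
function sketch_filter(string $text, string $unwantedReplacement, string $separator = '-'): string
{
    $sep = preg_quote($separator, '/');

    $result = preg_replace(
        [
            '/[^' . $sep . 'a-zA-Z0-9\s]/u', // 1) characters that are not allowed
            '/[\s]+/u',                       // 2) convert whitespace to $separator
            '/[' . $sep . ']+/u',             // 3) collapse repeated separators
            '/[' . $sep . ']+$/u',            // 4) remove a trailing separator
        ],
        [$unwantedReplacement, $separator, $separator, ''],
        strtolower($text)
    );

    return (string) $result;
}

$title = 'Bomba Submersa 1/4HP 0,25 110V Lepono';

echo sketch_filter($title, ''), "\n";  // "bomba-submersa-14hp-025-110v-lepono"
echo sketch_filter($title, '-'), "\n"; // "bomba-submersa-1-4hp-0-25-110v-lepono"
```

One trade-off in this sketch: every disallowed character becomes a separator, so an apostrophe inside a word would also split it ("j'etudie" becomes "j-etudie").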

1.2.4 changed transliteration behaviour

Upgrading from 1.2.3 to 1.2.4 broke our test suite, in particular some characters are transliterated differently, breaking assertions and semver.

E.g. we test that това е текст на бълрагски за тест becomes tova-e-tekst-na-blragski-za-test which is true in 1.2.3 and false in 1.2.4.

In 1.2.4 it instead transliterates to tova-e-tekst-na-bielragski-za-test.

urlify version   in          out
1.2.3            бълрагски   blragski
1.2.4            бълрагски   bielragski

I'm sure the dependency has its reasons for doing this, but composer pulled in 1.2.4 automatically and broke our test suites; this should have been a 1.3.0 or a 2.0.0 release.

Underscores as spaces

I just had URLify take the title _Summer and return the slug _summer, when I was actually expecting just summer. Maybe this is just me though?

Going forward I have my wrapper replace all occurrences of underscores with spaces. This matches at least my own internal logic much better but I wanted to throw it out there and see if maybe someone else also liked this behavior. Then I could roll a PR for it.

Please retain license

If this is a port of URLify.js, as you write in the README, please retain the original license; otherwise you don't have the right to port it.

From a quick look, the original license is this one: https://github.com/django/django/blob/master/LICENSE

Copyright (c) Django Software Foundation and individual contributors.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

    1. Redistributions of source code must retain the above copyright notice,
       this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.

    3. Neither the name of Django nor the names of its contributors may be used
       to endorse or promote products derived from this software without
       specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Replacing underscores with spaces

Hi. I'm trying to modify the code so that underscores are not treated as spaces. I thought it would be as simple as commenting out that line of code, but that doesn't work. Any ideas why?

Passing certain characters to add_chars() method causes "preg_match_all(): Unknown modifier ']'"

Consider the following:

URLify::add_chars(['/' => '']);

This causes a language exception, preg_match_all(): Unknown modifier ']', because the / character is used as the regular expression delimiter within the URLify library.

The above example derives from a fairly common and reasonable use-case: I want to remove all illegal characters from a file name, and on UNIX and Windows, / is illegal.

To fix this, PHP's preg_quote() function must be called on the keys in the array argument passed to add_chars().

I'll submit a PR shortly that seeks to fix the issue.
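
The failure and the proposed fix can be shown in isolation. In this sketch, build_pattern() is a hypothetical helper that joins the map keys into a character class, which is an assumption about how such a pattern might be assembled rather than URLify's actual code. Without preg_quote(), the "/" key ends the pattern early and "]" is read as a modifier.

```php
<?php

/**
 * Build a character-class pattern from the keys of a replacement map.
 * Hypothetical helper showing the escaping step; not URLify's actual code.
 */
function build_pattern(array $chars, bool $quote): string
{
    $keys = array_keys($chars);
    if ($quote) {
        // Escape regex metacharacters and the "/" delimiter in every key.
        $keys = array_map(
            fn ($key): string => preg_quote((string) $key, '/'),
            $keys
        );
    }

    return '/[' . implode('', $keys) . ']/u';
}

$chars = ['/' => ''];

$unsafe = build_pattern($chars, false); // "/[/]/u": the second "/" ends the pattern early
$safe = build_pattern($chars, true);    // "/[\/]/u"

// @ suppresses the "Unknown modifier ']'" warning so the script keeps running.
var_dump(@preg_match_all($unsafe, 'a/b')); // bool(false)
var_dump(preg_replace($safe, '', 'a/b'));  // string(2) "ab"
```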
