jbroadway / urlify
A fast PHP slug generator and transliteration library that converts non-ASCII characters for use in URLs.
License: BSD 3-Clause "New" or "Revised" License
Upgrading from 1.2.3 to 1.2.4 broke our test suite: some characters are now transliterated differently, which breaks our assertions and violates semver.
For example, we test that това е текст на бълрагски за тест ("this is a Bulgarian text for testing") becomes tova-e-tekst-na-blragski-za-test, which holds in 1.2.3 but not in 1.2.4. In 1.2.4 it is instead transliterated to tova-e-tekst-na-bielragski-za-test.
| urlify version | input | output |
|---|---|---|
| 1.2.3 | бълрагски | blragski |
| 1.2.4 | бълрагски | bielragski |
I'm sure the dependency has its reasons for this change, but Composer pulled in 1.2.4 automatically and broke our test suite; this should have been a 1.3.0 or 2.0.0 release.
echo URLify::slug('中文简体'); // "Simplified Chinese"
// result: zhong-wen-jian-ti
How can I get the original Chinese text back from the slug? In other words, how can I reverse the slug transliteration?
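Transliteration is lossy: tones and the original characters are discarded, and different inputs can produce the same slug, so a slug generally cannot be translated back. A common workaround is to store the original text next to the slug and look it up. The sketch below assumes an in-memory array and a hypothetical `SlugIndex` class (PHP 7.4+); a real application would more likely use a database column with a unique index. The slug is hard-coded rather than produced by URLify.

```php
<?php
// Hypothetical helper: remembers which original text produced each slug.
// Transliteration is lossy, so storing the original is the reliable way back.
final class SlugIndex
{
    /** @var array<string, string> slug => original text */
    private array $bySlug = [];

    public function remember(string $slug, string $original): void
    {
        $this->bySlug[$slug] = $original;
    }

    public function original(string $slug): ?string
    {
        // Returns null when the slug was never recorded.
        return $this->bySlug[$slug] ?? null;
    }
}

$index = new SlugIndex();
// Hard-coded slug; in practice it would come from URLify::slug().
$index->remember('zhong-wen-jian-ti', '中文简体');

echo $index->original('zhong-wen-jian-ti'), PHP_EOL; // 中文简体
```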
If this is a port of URLify.js, as the README states, please retain the original license; otherwise you don't have the right to port it.
From a quick look, the original license is this one: https://github.com/django/django/blob/master/LICENSE
Copyright (c) Django Software Foundation and individual contributors. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of Django nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Consider the following:
URLify::add_chars(['/' => '']);
This triggers a PHP error, preg_match_all(): Unknown modifier ']', because the / character is used as the regular expression delimiter within the URLify library.
The example above comes from a fairly common and reasonable use case: I want to remove all illegal characters from a file name, and / is illegal on both UNIX and Windows.
To fix this, PHP's preg_quote() function must be called, with '/' as its delimiter argument, on the keys of the array passed to add_chars().
I'll submit a PR shortly that seeks to fix the issue.
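A minimal sketch of the proposed fix, assuming (as the report describes) that the map keys are joined into a single `/`-delimited character class. `build_char_pattern()` is a hypothetical name, not URLify's code, and the arrow functions need PHP 7.4+. Passing the delimiter as the second argument to `preg_quote()` is what makes it escape `/`.

```php
<?php
// Hypothetical helper: build a character-class regex from map keys.
// preg_quote($char, '/') escapes regex metacharacters AND the '/' delimiter.
function build_char_pattern(array $map, string $delimiter = '/'): string
{
    $escaped = array_map(
        static fn (string $char): string => preg_quote($char, $delimiter),
        array_keys($map)
    );
    return $delimiter . '[' . implode('', $escaped) . ']' . $delimiter . 'u';
}

$map = ['/' => '', 'ä' => 'ae'];
$pattern = build_char_pattern($map); // '/[\/ä]/u'

// Replace each matched character with its mapping.
$result = preg_replace_callback(
    $pattern,
    static fn (array $m): string => $map[$m[0]],
    'a/b/ä'
);
echo $result, PHP_EOL; // abae
```

Note that a character class only works for single-character keys; multi-character keys would need an alternation such as `(?:key1|key2)` instead.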
'lithuanian_map' => array (
'ą' => 'a', 'č' => 'c', 'ę' => 'e', 'ė' => 'e', 'į' => 'i', 'š' => 's', 'ų' => 'u', 'ū' => 'u', 'ž' => 'z',
'Ą' => 'A', 'Č' => 'C', 'Ę' => 'E', 'Ė' => 'E', 'Į' => 'I', 'Š' => 'S', 'Ų' => 'U', 'Ū' => 'U', 'Ž' => 'Z'
)
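For illustration, the Lithuanian map above can be applied with PHP's built-in `strtr()` instead of URLify; inside URLify it would presumably be registered through `add_chars()`. `strtr()` with an array tries the longest keys first, and because UTF-8 sequences never overlap as prefixes, the two-byte keys are replaced safely.

```php
<?php
// The proposed Lithuanian map, applied directly with strtr().
$lithuanianMap = [
    'ą' => 'a', 'č' => 'c', 'ę' => 'e', 'ė' => 'e', 'į' => 'i',
    'š' => 's', 'ų' => 'u', 'ū' => 'u', 'ž' => 'z',
    'Ą' => 'A', 'Č' => 'C', 'Ę' => 'E', 'Ė' => 'E', 'Į' => 'I',
    'Š' => 'S', 'Ų' => 'U', 'Ū' => 'U', 'Ž' => 'Z',
];

echo strtr('Ąžuolas ir šaltinis', $lithuanianMap), PHP_EOL; // Azuolas ir saltinis
```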
Hello,
I think there should be an option to preserve case.
Sometimes we want "Alfred is good" to be converted to "Alfred-is-good" instead of all lowercase.
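A sketch of what such an option could look like, using a hypothetical `slugify()` with a `$lowerCase` flag. It only handles ASCII letters, digits and whitespace (no transliteration or word removal), so it illustrates the flag rather than reproducing URLify's behaviour.

```php
<?php
// Hypothetical slugify() with a $lowerCase flag (not URLify's API).
function slugify(string $text, bool $lowerCase = true, string $separator = '-'): string
{
    // Drop anything that is not an ASCII letter, digit or whitespace.
    $text = preg_replace('/[^a-zA-Z0-9\s]/', '', $text);
    // Collapse whitespace runs into one separator and trim the ends.
    $text = trim(preg_replace('/\s+/', $separator, $text), $separator);
    // Lowercase only when requested.
    return $lowerCase ? strtolower($text) : $text;
}

echo slugify('Alfred is good'), PHP_EOL;        // alfred-is-good
echo slugify('Alfred is good', false), PHP_EOL; // Alfred-is-good
```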
\URLify::filter('abcdefghi.jpg', 6, 'en', true); // returns abcdef
It would be good to preserve the file extension, so that the result in this case is 'abcdef.jpg'.
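One way to get that result is to truncate only the base name and reattach the extension afterwards. `truncate_keep_extension()` below is a hypothetical helper, not part of URLify; it assumes the name is already ASCII (for example, URLify's output), since `substr()` counts bytes. If the length limit should include the extension, subtract its length from `$maxBaseLength` first.

```php
<?php
// Hypothetical helper: truncate the base name, keep the extension.
function truncate_keep_extension(string $fileName, int $maxBaseLength): string
{
    $info = pathinfo($fileName);
    $base = substr($info['filename'], 0, $maxBaseLength);
    // pathinfo() omits the 'extension' key when the name has no dot.
    return isset($info['extension']) ? $base . '.' . $info['extension'] : $base;
}

echo truncate_keep_extension('abcdefghi.jpg', 6), PHP_EOL; // abcdef.jpg
```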
When generating a slug that contains "/" or ",", these characters are removed instead of being replaced with the separator.
I modified the following code snippet:
$string = (string) \preg_replace(
[
// 1) remove un-needed chars
'/[^' . $separatorEscaped . $removePatternAddOn . '\-a-zA-Z0-9\s]/u',
// 2) convert spaces to $separator
'/[\s]+/u',
// 3) remove some extras words
$removeWordsSearch,
// 4) remove double $separator's
'/[' . ($separatorEscaped ?: ' ') . ']+/u',
// 5) remove $separator at the end
'/[' . ($separatorEscaped ?: ' ') . ']+$/u',
],
[
'',
$separator,
'',
$separator,
'',
],
$string
);
To:
$string = (string) \preg_replace(
[
// 1) replace un-needed chars with $separator (changed)
'/[^' . $separatorEscaped . $removePatternAddOn . '\-a-zA-Z0-9\s]/u',
// 2) convert spaces to $separator
'/[\s]+/u',
// 3) remove some extras words
$removeWordsSearch,
// 4) remove double $separator's
'/[' . ($separatorEscaped ?: ' ') . ']+/u',
// 5) remove $separator at the end
'/[' . ($separatorEscaped ?: ' ') . ']+$/u',
],
[
$separator, // changed: was ''
$separator,
'',
$separator,
'',
],
$string
);
And it worked correctly.
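A self-contained reproduction of the change above, keeping only steps 1, 2, 4 and 5 of the pipeline (the `$removePatternAddOn` and remove-words parts are left out). `clean()` and `$step1Replacement` are hypothetical names. Replacing disallowed characters with the separator is safe because step 4 afterwards collapses any repeated separators.

```php
<?php
// Reduced copy of the preg_replace() pipeline from the snippet above.
function clean(string $string, string $separator, string $step1Replacement): string
{
    $sep = preg_quote($separator, '/');
    return (string) preg_replace(
        [
            '/[^' . $sep . '\-a-zA-Z0-9\s]/u', // 1) disallowed chars
            '/[\s]+/u',                         // 2) whitespace -> separator
            '/[' . $sep . ']+/u',               // 4) collapse repeated separators
            '/[' . $sep . ']+$/u',              // 5) drop trailing separator
        ],
        [$step1Replacement, $separator, $separator, ''],
        $string
    );
}

echo clean('foo/bar,baz', '-', ''), PHP_EOL;  // foobarbaz (original behaviour)
echo clean('foo/bar,baz', '-', '-'), PHP_EOL; // foo-bar-baz (modified behaviour)
```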
Hi. I'm trying to modify the code so that underscores are not treated as spaces. I thought it would be as simple as commenting out that line of code, but that doesn't work. Any ideas why?
Hi there,
I've been trying to urlify a very simple string, but the last part is being dropped. It's probably intended behaviour, but an option to avoid it would be useful.
My string is "Brazilian Série A" and I want it to become "brazilian-serie-a". Instead it becomes "brazilian-serie", without the final "-a". Is there any way to do this?
Here is my code:
\URLify::filter('Brazilian Série A') // produces "brazilian-serie"
I also tried:
\URLify::filter('Brazilian Série A', 120, 'en') // produces "brazilian-serie"
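A plausible explanation, not confirmed in this thread, is that the filter removes common English words such as "a" from the slug by default, so a standalone trailing "A" disappears while "AA" survives. The sketch below shows that mechanism with a hypothetical `remove_words()` and an assumed stop-word list (not URLify's actual list). If that is the cause, check whether your URLify version's `filter()` accepts an argument that disables the word-removal step.

```php
<?php
// Hypothetical sketch of stop-word removal inside a slug pipeline.
// The word list passed in below is an assumption for illustration.
function remove_words(string $slug, array $words, string $separator = '-'): string
{
    $parts = explode($separator, $slug);
    // Keep only the parts that are not exact matches of a stop word.
    $kept = array_filter(
        $parts,
        static fn (string $part): bool => !in_array($part, $words, true)
    );
    return implode($separator, $kept);
}

$stopWords = ['a', 'an', 'the'];
echo remove_words('brazilian-serie-a', $stopWords), PHP_EOL; // brazilian-serie
echo remove_words('brazilian-serie-a', []), PHP_EOL;         // brazilian-serie-a
```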
This is probably a typo in the code: the uppercase Ó is converted to lowercase o because of line 72 in URLify.php:
'Ó' => 'o',
It should be:
'Ó' => 'O',
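A typo like this can be caught with a small consistency check over a transliteration map: flag entries whose key is an uppercase letter but whose replacement starts with a lowercase ASCII letter. `find_case_mismatches()` is a hypothetical helper and requires the mbstring extension; it would need a manual review step, since some mappings may legitimately differ in case.

```php
<?php
// Hypothetical check: uppercase key mapped to a lowercase replacement.
function find_case_mismatches(array $map): array
{
    $mismatches = [];
    foreach ($map as $char => $ascii) {
        // A key counts as uppercase if lowercasing it changes it.
        $keyIsUpper = mb_strtolower((string) $char, 'UTF-8') !== (string) $char;
        $valueIsLower = $ascii !== '' && ctype_lower($ascii[0]);
        if ($keyIsUpper && $valueIsLower) {
            $mismatches[$char] = $ascii;
        }
    }
    return $mismatches;
}

$sample = ['Ó' => 'o', 'ó' => 'o', 'Ñ' => 'N'];
// Only the 'Ó' => 'o' entry is flagged.
print_r(find_case_mismatches($sample));
```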
Hello,
why is nothing transliterated after the & symbol in the string?
Hi,
Why was $underscoreToSpace removed from filter()? It was pretty handy for turning underscores into hyphens if you wanted, or into spaces, of course.
I hope there is a good reason for it!
Thanks
Had to add the following chars for our transliteration test to pass:
URLify::add_chars(
array(
'Ÿ' => 'Y',
'µ' => 'u',
'¥' => 'Y',
'Ĉ' => 'C',
'ĉ' => 'c',
'Ċ' => 'C',
'ċ' => 'c',
'Ĝ' => 'G',
'ĝ' => 'g',
'Ġ' => 'G',
'ġ' => 'g',
'Ĥ' => 'H',
'ĥ' => 'h',
'Ħ' => 'H',
'ħ' => 'h',
'Ĕ' => 'E',
'ĕ' => 'e',
'Ĭ' => 'I',
'ĭ' => 'i',
'Ĵ' => 'J',
'ĵ' => 'j',
'Ĺ' => 'L',
'ĺ' => 'l',
'Ľ' => 'L',
'ľ' => 'l',
'Ŀ' => 'L',
'ŀ' => 'l',
'ŉ' => 'n',
'Ō' => 'O',
'ō' => 'o',
'Ŏ' => 'O',
'ŏ' => 'o',
'Ŕ' => 'R',
'ŕ' => 'r',
'Ŗ' => 'R',
'ŗ' => 'r',
'Ŝ' => 'S',
'ŝ' => 's',
'Ŧ' => 'T',
'ŧ' => 't',
'Ŭ' => 'U',
'ŭ' => 'u',
'Ŵ' => 'W',
'ŵ' => 'w',
'Ŷ' => 'Y',
'ŷ' => 'y',
'ſ' => 's',
'ƒ' => 'f',
'Ơ' => 'O',
'ơ' => 'o',
'Ư' => 'U',
'ư' => 'u',
'Ǎ' => 'A',
'ǎ' => 'a',
'Ǐ' => 'I',
'ǐ' => 'i',
'Ǒ' => 'O',
'ǒ' => 'o',
'Ǔ' => 'U',
'ǔ' => 'u',
'Ǖ' => 'U',
'ǖ' => 'u',
'Ǘ' => 'U',
'ǘ' => 'u',
'Ǚ' => 'U',
'ǚ' => 'u',
'Ǜ' => 'U',
'ǜ' => 'u',
'Ǻ' => 'A',
'ǻ' => 'a',
'Ǿ' => 'O',
'ǿ' => 'o',
'Ǽ' => 'Ae',
'ǽ' => 'ae',
'Ĳ' => 'IJ',
'ĳ' => 'ij',
'J' => 'J',
'ĸ' => 'k',
'Ŋ' => 'N',
'ŋ' => 'n',
'Ẁ' => 'W',
'ẁ' => 'w',
'Ẃ' => 'W',
'ẃ' => 'w',
'Ẅ' => 'W',
'ẅ' => 'w',
)
);
Unfortunately, since I do not know what language they belong to, I find it difficult to provide a PR when the code is structured based on language.
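Since the map is split by language, a language-agnostic way to find gaps is to list the non-ASCII characters in your test input that a map does not cover. `unmapped_chars()` is a hypothetical helper; it assumes valid UTF-8 input, because `preg_split()` with the `u` modifier returns false otherwise. (Several of the letters above, such as Ĉ, Ĝ, Ĥ, Ĵ, Ŝ and Ŭ, are used in Esperanto.)

```php
<?php
// Hypothetical helper: which non-ASCII characters lack a mapping?
function unmapped_chars(string $text, array $map): array
{
    // Split into individual code points (requires valid UTF-8).
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $missing = [];
    foreach ($chars as $char) {
        // Multi-byte UTF-8 sequences are exactly the non-ASCII characters.
        if (strlen($char) > 1 && !isset($map[$char])) {
            $missing[$char] = true; // array keys deduplicate
        }
    }
    return array_keys($missing);
}

print_r(unmapped_chars('Ĉu ŝi ŝatas ĝin?', ['Ĉ' => 'C']));
```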
Hi, I found a strange bug; see the code below (local environment: PHP 5.6 on macOS; dev/prod environment: PHP 5.6 on Ubuntu 16):
var_dump(\URLify::filter('Text sample A'));
// text-sample
var_dump(\URLify::filter('Text sample B'));
// text-sample-b
var_dump(\URLify::filter('Text sample AA'));
// text-sample-aa
In the first var_dump, where is the final "a" character?
Is this package still maintained?
How can I make all characters lowercase?
Since Laravel 9 requires voku/portable-ascii:^2.0 and this repo requires voku/portable-ascii:^1.4, updating via Composer fails with a dependency conflict.
We could extend character support by using PHP's Transliterator class. It may even be faster.
There's another project, Unidecode, which appears to have complete transliteration tables to US-ASCII.
Maybe URLify could use them?
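A minimal sketch of the Transliterator approach, assuming the intl extension is installed. `intl_slug()` is a hypothetical name; `Any-Latin; Latin-ASCII; Lower()` is a chain of standard ICU transforms. Its output is not guaranteed to match URLify's per-language maps, so switching could change existing slugs, and whether it is faster would need benchmarking.

```php
<?php
// Hypothetical slug function built on ICU transliteration (intl extension).
function intl_slug(string $text, string $separator = '-'): string
{
    // Convert any script to Latin, strip accents to ASCII, then lowercase.
    $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $text);
    if ($ascii === false) {
        throw new RuntimeException('Transliteration failed');
    }
    // Turn every run of non-alphanumeric characters into the separator.
    $slug = preg_replace('/[^a-z0-9]+/', $separator, $ascii);
    return trim($slug, $separator);
}

if (function_exists('transliterator_transliterate')) {
    echo intl_slug('Brazilian Série A'), PHP_EOL; // brazilian-serie-a
}
```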
I just had URLify take the title _Summer and return the slug _summer, when I was actually expecting just summer. Maybe this is just me, though?
Going forward I have my wrapper replace all occurrences of underscores with spaces. This matches at least my own internal logic much better but I wanted to throw it out there and see if maybe someone else also liked this behavior. Then I could roll a PR for it.
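The wrapper described above can be sketched as follows. `slugify_ascii()` is a hypothetical stand-in for the real slug call that keeps underscores, mimicking the reported `_summer` result; it is not URLify's code. The wrapper simply replaces underscores with spaces first, so they are trimmed or turned into separators like any other whitespace.

```php
<?php
// Hypothetical stand-in for the slug call: keeps underscores as-is.
function slugify_ascii(string $text): string
{
    $text = strtolower(trim($text));
    // Keep letters, digits, underscores and hyphens; other runs become '-'.
    $text = preg_replace('/[^a-z0-9_-]+/', '-', $text);
    return trim($text, '-');
}

// The wrapper: treat underscores as spaces before slugging.
function slug_treating_underscores_as_spaces(string $text): string
{
    return slugify_ascii(str_replace('_', ' ', $text));
}

echo slugify_ascii('_Summer'), PHP_EOL;                       // _summer
echo slug_treating_underscores_as_spaces('_Summer'), PHP_EOL; // summer
```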