url-detector's People

Contributors

jotomo, kanishkrastogi-lnkd, tzuhanjan

url-detector's Issues

Normalized Url Detected http://null/

Detecting the following URL:

www.foo1111111111aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.com
  • works fine with the plain UrlDetector
  • fails when passed to NormalizedUrl.create, which returns it as "http://null/"

On Android I can't build my project.

Error:PARSE ERROR:
Error:unsupported class file version 52.0
Error:...while parsing com/linkedin/urls/HostNormalizer.class
Error:1 error; aborting
Error:Execution failed for task ':app:transformClassesWithDexForDebug'.

com.android.build.api.transform.TransformException: com.android.ide.common.process.ProcessException: java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException
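
For context, "unsupported class file version 52.0" refers to the class-file format version, and major version 52 corresponds to Java 8. The following is a minimal sketch of how that number is read from a class-file header; ClassFileVersion and its methods are hypothetical names for illustration and are not part of URL-Detector or the Android build tools.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of URL-Detector.
final class ClassFileVersion {

    /** Reads the major version from the first 8 bytes of a class file. */
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("Too short for a class-file header");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {             // magic number
            throw new IllegalArgumentException("Not a class file");
        }
        buf.getShort();                               // minor version, ignored here
        return buf.getShort() & 0xFFFF;               // major version as unsigned 16-bit
    }

    /** Maps a major version to a Java release number; valid for major >= 49 (Java 5). */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        // Header bytes only: magic, minor = 0, major = 52.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        int major = majorVersion(header);
        System.out.println("major " + major + " -> Java " + javaRelease(major));
    }
}
```

Running such a check against HostNormalizer.class from the jar would show which Java release the library was compiled for, which can then be compared with what the Android toolchain in use accepts.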

URL-DETECTOR fails to detect a valid URL

Executing Url.create("http://013.xxx/");
fails with the following error:

java.net.MalformedURLException: We couldn't find any urls in string: http://013.xxx/
	at com.linkedin.urls.Url.create(Url.java:69)

It looks as if the utility treats the xxx part as part of an invalid IP address instead of a valid suffix.
Expected result:
The Url should be created, and the host should be 013.xxx

Using URL-Detector with Maven: jitpack stopgap

While #6 and #2 are still being resolved, you can use JitPack as a temporary workaround:

Add the following to your pom.xml:

...

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

...

<dependency>
    <groupId>com.github.linkedin</groupId>
    <artifactId>URL-Detector</artifactId>
    <version>2a0fede05e</version>
</dependency>

Upload latest version to MavenCentral

Would it be possible to upload your latest version to MavenCentral? In particular, we would like to take advantage of ae214b7.

Our project, which includes URLDetector, has JUnit4 unit tests, and these seem to be running into problems because of testng.

Thanks

StringIndexOutOfBoundsException on particular string

This string (excluding the double quotes) triggers a StringIndexOutOfBoundsException:
"://VIVE MARINE LE PEN//:@."

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
	at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:908) ~[na:1.8.0_60]
	at java.lang.StringBuilder.substring(StringBuilder.java:76) ~[na:1.8.0_60]
	at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:854) ~[na:1.8.0_60]
	at java.lang.StringBuilder.substring(StringBuilder.java:76) ~[na:1.8.0_60]
	at com.linkedin.urls.detection.UrlDetector.readDefault(UrlDetector.java:191) ~[url-detector-0.1.17.jar!/:na]
	at com.linkedin.urls.detection.UrlDetector.detect(UrlDetector.java:142) ~[url-detector-0.1.17.jar!/:na]

Some valid schemes are ignored

Thanks for a very useful library.

I note that the list of valid schemes is fairly small. As a result, a URL with a file: scheme is not parsed correctly and comes back with http as the default scheme. Could you add file: to the list of valid schemes, or perhaps create an option that returns anything that looks like a scheme, possibly together with something like boolean isKnownSchema()?
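
One way to picture the request is the sketch below, which extracts anything that syntactically looks like a scheme and separately reports whether it is on a known list. SchemeProbe, schemeOf, and the contents of KNOWN are assumptions made for illustration; only the name isKnownSchema comes from the request above, and none of this is the library's actual API.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of URL-Detector.
final class SchemeProbe {

    // RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":".
    private static final Pattern SCHEME = Pattern.compile("^([A-Za-z][A-Za-z0-9+.-]*):");

    // Illustrative list only; the library's real list of valid schemes may differ.
    private static final Set<String> KNOWN = Set.of("http", "https", "ftp", "file");

    /** Returns the lower-cased leading scheme-like token, if the text starts with one. */
    static Optional<String> schemeOf(String text) {
        Matcher m = SCHEME.matcher(text);
        return m.find() ? Optional.of(m.group(1).toLowerCase(Locale.ROOT)) : Optional.empty();
    }

    /** True only when the text starts with a scheme that is on the known list. */
    static boolean isKnownSchema(String text) {
        return schemeOf(text).map(KNOWN::contains).orElse(false);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"file:///tmp/a.txt", "gopher://example.org", "linkedin.com"}) {
            System.out.println(s + " scheme=" + schemeOf(s) + " known=" + isKnownSchema(s));
        }
    }
}
```

Note that a purely syntactic match also accepts input such as localhost:8080, where "localhost" is a host rather than a scheme, which is one reason to keep the known-scheme flag separate from the extracted value.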

Cheers.

String: 'http://user:[email protected] host.com' causes exception

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -2
	at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:908)
	at java.lang.StringBuilder.substring(StringBuilder.java:76)
	at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:854)
	at java.lang.StringBuilder.substring(StringBuilder.java:76)
	at com.linkedin.urls.detection.UrlDetector.readDefault(UrlDetector.java:191)
	at com.linkedin.urls.detection.UrlDetector.detect(UrlDetector.java:142)
	at com.mycompany.url.UrlTest.main(UrlTest.java:26)

Japanese Characters cause the entire string to be detected as a URL

If you run the detector on the text below, it thinks the whole text is a URL.

我进入你的主页很卡顿,也许是你的关注人数或者其他数据太多了,其他人主页没有这么卡顿。来自amethyst客户端

The characters 。 and , are each a single character, and this library does not treat them as spaces.
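
A possible caller-side workaround, sketched below, is to replace such punctuation with ASCII spaces before handing the text to the detector. CjkPunctuation and its delimiter list are hypothetical, the list assumes the ideographic and fullwidth forms (for example U+3002 and U+FF0C), and whether this actually avoids the misdetection has not been verified against the library.

```java
// Hypothetical pre-processing helper; not part of URL-Detector.
final class CjkPunctuation {

    // Illustrative subset: ideographic full stop, fullwidth comma, ideographic comma,
    // fullwidth semicolon, fullwidth exclamation mark, fullwidth question mark.
    private static final String DELIMITERS = "\u3002\uFF0C\u3001\uFF1B\uFF01\uFF1F";

    /** Replaces each listed punctuation character with one ASCII space. */
    static String toSpaces(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(DELIMITERS.indexOf(c) >= 0 ? ' ' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String cleaned = toSpaces("\u4E3B\u9875\u3002\u6765\u81EAamethyst");
        System.out.println(cleaned);
    }
}
```

Because every replacement is one character for one character, offsets in the cleaned text still line up with offsets in the original input.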

Default Scheme on url with no scheme

When parsing a URL like "linkedin.com", the Url object adds a default scheme of 'http' if none is detected, and that default is what URL.getScheme() returns.

I can understand why some defaults were included but it would be nice if this behavior could be configured. I need to know whether the original input text contained the scheme.

I can always do something like url.getOriginalUrl().startsWith(url.getScheme()) but I don't want to have to do that everywhere.
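
Until this is configurable, the check can at least live in one place. The sketch below is a hypothetical helper (ExplicitScheme and wasExplicit are not library names) that takes the two strings returned by url.getOriginalUrl() and url.getScheme(). Compared with a plain startsWith, it ignores case and requires the colon, so a host that merely begins with the letters "http" is not mistaken for an explicit scheme.

```java
// Hypothetical helper; not part of URL-Detector.
final class ExplicitScheme {

    /**
     * Returns true when originalUrl begins with scheme followed by ':',
     * compared case-insensitively.
     */
    static boolean wasExplicit(String originalUrl, String scheme) {
        String prefix = scheme + ":";
        return originalUrl.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        System.out.println(wasExplicit("linkedin.com", "http"));
        System.out.println(wasExplicit("HTTP://linkedin.com", "http"));
        System.out.println(wasExplicit("httpbin.org", "http"));
    }
}
```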

Support for local Maven repository installation

In the interim, while issue #2 is being worked on, it would be ideal if it were possible to install the url-detector library in the local Maven repository (typically ~/.m2/repository/), so that other Maven-based build tools can consume the library.

Note that issue #2 is a vastly preferable solution to this problem, but allowing local installation (this issue) provides a short-term workaround.

Long run of periods causes detect() to throw NegativeArraySizeException "Backtracked max amount of characters. Endless loop detected."

String text = ".............:::::::::::;;;;;;;;;;;;;;;::...............................................:::::::::::::::::::::::::::::....................";
UrlDetector d = new UrlDetector(text, UrlDetectorOptions.Default);
d.detect();

Running this will throw

Exception in thread "main" java.lang.NegativeArraySizeException: Backtracked max amount of characters. Endless loop detected. Bad Text: ':...............................................:::::::::::::::::::::::::::::....................'
	at com.linkedin.urls.detection.InputTextReader.checkBacktrackLoop(InputTextReader.java:144)
	at com.linkedin.urls.detection.InputTextReader.seek(InputTextReader.java:120)
	at com.linkedin.urls.detection.UrlDetector.readUserPass(UrlDetector.java:511)
	at com.linkedin.urls.detection.UrlDetector.readScheme(UrlDetector.java:458)
	at com.linkedin.urls.detection.UrlDetector.processColon(UrlDetector.java:293)
	at com.linkedin.urls.detection.UrlDetector.readDefault(UrlDetector.java:253)
	at com.linkedin.urls.detection.UrlDetector.detect(UrlDetector.java:142)
	at Main.main(Main.java:82)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
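
Until the detector handles such input itself, callers processing untrusted text may want to guard the call. The sketch below is a generic, hypothetical wrapper (SafeDetect and orEmpty are not library names) that turns any unchecked exception, including NegativeArraySizeException and StringIndexOutOfBoundsException, into an empty result. It assumes detect() returns a List, and it hides the failure rather than fixing it.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical defensive wrapper; not part of URL-Detector.
final class SafeDetect {

    /** Runs the supplied detection and returns an empty list if it throws at runtime. */
    static <T> List<T> orEmpty(Supplier<List<T>> detection) {
        try {
            return detection.get();
        } catch (RuntimeException e) {
            // Both NegativeArraySizeException and StringIndexOutOfBoundsException
            // are RuntimeExceptions, so they end up here.
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        // Simulated failure standing in for a detector call on pathological input.
        List<String> result = SafeDetect.<String>orEmpty(() -> {
            throw new NegativeArraySizeException("simulated detector failure");
        });
        System.out.println(result);
    }
}
```

With the library on the classpath, the supplier would be something like () -> new UrlDetector(text, UrlDetectorOptions.Default).detect(), using the same constructor and method as the snippet above.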

False alarm detecting URLs

If a text contains 10.00hr, it is treated as a URL:

runTest("10.00hr,", UrlDetectorOptions.Default);
This should return an empty result, but the result is [http://10.00hr]
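
As a stopgap, callers could post-filter detected hosts with a plausibility check. The sketch below is a heuristic, not the library's logic, and HostFilter and looksLikeHost are hypothetical names. It accepts dotted-quad IPv4 addresses and otherwise requires the last label to start with a letter, which rejects 10.00hr while still accepting a host such as 013.xxx.

```java
// Hypothetical post-filter; a heuristic, not part of URL-Detector.
final class HostFilter {

    /** Accepts IPv4 dotted quads, or multi-label hosts whose last label starts with a letter. */
    static boolean looksLikeHost(String host) {
        String[] labels = host.split("\\.", -1); // -1 keeps trailing empty labels
        if (labels.length < 2) {
            return false;
        }
        for (String label : labels) {
            if (label.isEmpty()) {
                return false;
            }
        }
        if (isDottedQuad(labels)) {
            return true;
        }
        String tld = labels[labels.length - 1];
        return Character.isLetter(tld.charAt(0));
    }

    /** True for exactly four numeric labels, each in the range 0..255. */
    private static boolean isDottedQuad(String[] labels) {
        if (labels.length != 4) {
            return false;
        }
        for (String label : labels) {
            if (label.length() > 3) {
                return false;
            }
            for (int i = 0; i < label.length(); i++) {
                char c = label.charAt(i);
                if (c < '0' || c > '9') {
                    return false;
                }
            }
            if (Integer.parseInt(label) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String host : new String[] {"10.00hr", "013.xxx", "linkedin.com", "10.0.0.1"}) {
            System.out.println(host + " -> " + looksLikeHost(host));
        }
    }
}
```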

URL-Detector is abandoned?

There are several issues (including fixes in pull requests) that have gone unaddressed for a long time. Could this be handed over to other maintainers? @tzuhanjan, can you comment?
