
Comments (17)

GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
Thanks for your proposal.

langdetect ships the language profiles separately so that users can select 
only the ones they need.
So to adopt your proposal, I'm afraid langdetect would need to provide two 
jars, one with profiles and one without...

But a library that includes the profiles is easier to use, as you say...

Original comment by nakatani.shuyo on 18 Feb 2011 at 3:42

from language-detection.

GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
Dear Nakatani,

You can still provide langdetect as a single jar, as you do now.

However, I package the language profiles inside a jar in my application, and 
with the code I provided I can access them without unzipping the jar.

So, as you say, your library will be easier to use.

Thanks!

Original comment by [email protected] on 18 Feb 2011 at 10:50


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
I see. Then I'll try to include your proposal. Thanks!

Original comment by nakatani.shuyo on 21 Feb 2011 at 3:16


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
I've added loadProfile(File) to DetectorFactory and committed a new 
langdetect.jar. I'd like the File constructor to be called in only one place.

    http://code.google.com/p/language-detection/source/browse/trunk/lib/langdetect.jar

I think you can do what you described in your report as follows.

    DetectorFactory.loadProfile(new File(MyClass.class.getResource("profiles").toURI()));

Would that work for you?

Original comment by nakatani.shuyo on 24 Feb 2011 at 7:56


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
I'm having trouble loading the profiles. 
In NetBeans, where am I supposed to put the profiles folder? 

Thanks. 

Original comment by [email protected] on 4 May 2011 at 8:49


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
I don't use NetBeans...
Could you pass the absolute path of the profile directory to 
DetectorFactory.loadProfile?
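
A relative path such as "profiles" is resolved against the JVM's working directory, which inside an IDE is not necessarily the project folder. A minimal sketch for checking where a path actually points before handing it to DetectorFactory.loadProfile (the class and method names below are made up for illustration, and "profiles" is only an example path):

```java
import java.io.File;

final class ProfilePathCheck {
    // Returns the absolute form of the path and whether it is an existing directory.
    static String describe(File dir) {
        return dir.getAbsolutePath() + " directory=" + dir.isDirectory();
    }

    public static void main(String[] args) {
        // "profiles" is only an example; pass your own path as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "profiles");
        System.out.println(describe(dir));
        System.out.println("working directory: " + System.getProperty("user.dir"));
    }
}
```

If the output ends with directory=false, the printed absolute path shows where the JVM was looking.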

Original comment by nakatani.shuyo on 6 May 2011 at 3:47


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
Currently if DetectorFactory.loadProfile() of any form has not been called, 
then detector.detect() throws an exception. 

I would suggest having detect() fall back to the default profiles directory 
packaged with the library. It gives excellent results out of the box and is 
suitable for most cases. (Currently I have to copy the profiles directory into 
my project's resources, work out the path, and make sure it stays in place if 
the final artifact is packaged differently.)

The attached Java file contains the proposed modification to 
DetectorFactory.createDetector():

static private Detector createDetector() throws LangDetectException {
    if (instance_.langlist.size()==0) {
        try {
            // Fall back to the default profiles
            loadProfile(new File(Detector.class.getResource("/profiles").toURI()));
        } catch (URISyntaxException e) {
            // Next clause will throw the exception
        }
    }
    if (instance_.langlist.size()==0)
        throw new LangDetectException(ErrorCode.NeedLoadProfileError, "need to load profiles");
    Detector detector = new Detector(instance_);
    return detector;
}

Also I added 2 create() factory methods:

* static public Detector create(String text)
* static public Detector create(Reader reader)

These modifications would allow detecting the language with a single call:

String language = DetectorFactory.create(text).detect();
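
The fallback in createDetector() above boils down to "use the defaults only if nothing was loaded explicitly". A library-independent sketch of that pattern, where LazyProfiles is a hypothetical stand-in for the factory's profile list and not part of langdetect:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the factory's profile list; not part of langdetect.
final class LazyProfiles {
    private final List<String> loaded = new ArrayList<>();
    private final Supplier<List<String>> defaults;

    LazyProfiles(Supplier<List<String>> defaults) {
        this.defaults = defaults;
    }

    // Explicit loading, the counterpart of calling loadProfile() yourself.
    void load(List<String> profiles) {
        loaded.addAll(profiles);
    }

    // Called before detection: fall back to the defaults only if nothing was loaded.
    List<String> profilesForDetection() {
        if (loaded.isEmpty()) {
            loaded.addAll(defaults.get());
        }
        if (loaded.isEmpty()) {
            throw new IllegalStateException("need to load profiles");
        }
        return Collections.unmodifiableList(loaded);
    }
}
```

Explicitly loaded profiles always win, so existing callers of loadProfile() would see no change in behavior.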

Original comment by [email protected] on 7 Jul 2011 at 9:46

Attachments:


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
An alternative way is to leave createDetector() as it is, and add a static 
initialization (attached):

...
    static private DetectorFactory instance_ = new DetectorFactory();
    static {
        try {
            // Load default profiles
            loadProfile(new File(Detector.class.getResource("/profiles").toURI()));
        } catch (URISyntaxException e) {
            // If the default profiles fail to load, other profiles can be loaded later
        } catch (LangDetectException e) {
            // Same as above: ignore and let the caller load profiles explicitly
        }
    }

Original comment by [email protected] on 7 Jul 2011 at 10:16

Attachments:


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
langdetect adopted the current interface for a few reasons.

First, I was not quite satisfied with other libraries that bundle profiles in 
the jar file.
That is why even the default language profiles live outside the jar file.

I also thought it should provide a Java-like interface, so creating an 
instance and detecting languages are separate steps.
But I understand what you want to do, since I also like some functional 
languages, Ruby and so on. :D

Original comment by nakatani.shuyo on 11 Jul 2011 at 11:09


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
Right, good to have an interface implemented (though I see none yet). But I'm 
not trying to change the interface, just adding a couple of create() methods, 
like you have overloaded loadProfile(), nothing more, just one more way of 
doing things. It's all about usability.

To the second part, Java libraries are used outside Java too.

Cheers, and good luck to your project,
Sergei

Original comment by [email protected] on 13 Jul 2011 at 10:05


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
Hi Nakatani-san,

Thank you for the nice software.

I needed to package the langdetect class files, the profiles and some 
additional class files into one jar in order to run a Hadoop job. In that kind 
of scenario, loadProfile(File) isn't enough; a File object cannot refer to a 
file inside a jar.
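
To illustrate the limitation: a resource inside a jar has a "jar:" URI, and java.io.File only accepts hierarchical "file:" URIs. A small sketch that checks the scheme before building a File (ProfileLocation and asDirectory are hypothetical names, and only the simple case of a local file URI is handled):

```java
import java.io.File;
import java.net.URI;
import java.util.Optional;

final class ProfileLocation {
    // Returns the directory if the URI points at the local file system, otherwise empty.
    static Optional<File> asDirectory(URI location) {
        if ("file".equalsIgnoreCase(location.getScheme()) && !location.isOpaque()) {
            return Optional.of(new File(location));
        }
        // e.g. "jar:file:/app.jar!/profiles": new File(location) would throw
        // IllegalArgumentException because the URI is not hierarchical.
        return Optional.empty();
    }
}
```

When the result is empty, the profiles have to be read as streams from the classpath instead.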

(And I understood your policy of not providing both jars with/without a 
profile. That's perfectly fine.)

Given that, I'd like to share a workaround I used for my task:

1. Copy the profiles dir to any directory on the classpath.

Since I'm a Maven user, I put it here: src/main/resources/profiles

2. Add the following two methods to DetectorFactory.

  // Reads the listing returned for a classpath directory, one profile name per line.
  private static List<String> getProfileNames(String resourceName) throws IOException {
    List<String> profileNames = new ArrayList<String>();
    InputStream is = DetectorFactory.class.getResourceAsStream("/" + resourceName);
    if (is == null) {
      throw new IOException("resource not found: " + resourceName);
    }
    BufferedReader br = new BufferedReader(new InputStreamReader(is));
    try {
      String line = null;
      while ((line = br.readLine()) != null) {
        // Skip hidden entries (names starting with a dot).
        if (!line.startsWith(".")) {
          profileNames.add(line);
        }
      }
    } finally {
      br.close(); // also closes the underlying stream
    }
    return profileNames;
  }

  // Mostly the same as the original loadProfile method.
  public static void loadProfileFromClasspath(String resource) throws LangDetectException {
    List<String> profileNames;
    try {
      profileNames = getProfileNames(resource);
    } catch (IOException e) {
      throw new LangDetectException(ErrorCode.NeedLoadProfileError,
              "Not found profile in classpath: " + resource);
    }

    int langsize = profileNames.size(), index = 0;
    for (String profileName : profileNames) {
      InputStream is = null;
      try {
        // The leading "/" makes the lookup absolute, matching getProfileNames.
        is = DetectorFactory.class.getResourceAsStream("/" + resource + "/" + profileName);
        if (is == null)
          throw new IOException("resource not found: " + profileName);
        LangProfile profile = JSON.decode(is, LangProfile.class);
        addProfile(profile, index, langsize);
        ++index;
      } catch (JSONException e) {
        throw new LangDetectException(ErrorCode.FormatError, "profile format error in '"
                + profileName + "'");
      } catch (IOException e) {
        throw new LangDetectException(ErrorCode.FileLoadError, "can't open '" + profileName + "'");
      } finally {
        try {
          if (is != null)
            is.close();
        } catch (IOException e) {
          // Ignore failures while closing.
        }
      }
    }
  }

3. Build and package.

For instance, by running the Maven goals "package assembly:single".

4. Use the new method to load the profiles stored inside the jar (on the 
classpath).

  DetectorFactory.loadProfileFromClasspath( "profiles" );
  Detector detector = DetectorFactory.create();
  detector.append( "Hello world, this is a test." );
  System.out.println( detector.detect() );


I haven't tried the loadProfile(URI) method proposed above, because I wasn't 
aware of this discussion page when I wrote the workaround. Anyway, the goal 
seems to be the same, and I hope this also helps someone!
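
Reading a directory listing through getResourceAsStream is, as far as I know, not guaranteed to behave the same way for every class loader. An alternative sketch that enumerates the jar's entries directly with java.util.jar.JarFile, assuming the profiles sit under one directory inside the jar (JarProfiles and its methods are hypothetical names):

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

final class JarProfiles {
    // Keeps the direct, non-hidden children of dir, e.g. "profiles/en" -> "en".
    static List<String> childNames(List<String> entryNames, String dir) {
        String prefix = dir + "/";
        List<String> names = new ArrayList<>();
        for (String entry : entryNames) {
            if (!entry.startsWith(prefix)) {
                continue;
            }
            String rest = entry.substring(prefix.length());
            if (rest.isEmpty() || rest.contains("/") || rest.startsWith(".")) {
                continue;
            }
            names.add(rest);
        }
        return names;
    }

    // Lists the profile names stored under dir inside the given jar file.
    static List<String> profileNamesInJar(File jarFile, String dir) throws IOException {
        try (JarFile jar = new JarFile(jarFile)) {
            List<String> entries = jar.stream()
                    .map(JarEntry::getName)
                    .collect(Collectors.toList());
            return childNames(entries, dir);
        }
    }
}
```

The returned names could replace the result of getProfileNames() in the workaround above.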

Best,

-Hideki Shima

Original comment by hideki.shima on 17 Nov 2011 at 7:02


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
Thanks for your sample.
I've been asked about the same Hadoop problem several times, so I'm beginning 
to wonder whether I should support a profile-bundled jar... :D

language-detection now supports loadProfile(List<String>) in the trunk of the 
repository.
You might be able to implement this more easily using that method.

http://code.google.com/p/language-detection/issues/detail?id=24
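
A rough sketch of how the JSON strings for that method could be collected from the classpath. It assumes the profiles are UTF-8 files named after their language code under one directory; ClasspathProfiles and its methods are hypothetical helpers, not part of langdetect:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ClasspathProfiles {
    // Reads a whole stream as UTF-8 text and closes it.
    static String readAll(InputStream in) {
        try (InputStream input = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = input.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads dir/<lang> from the classpath for each language code, e.g. "profiles/en".
    static List<String> readProfiles(ClassLoader loader, String dir, List<String> langs) {
        List<String> jsonProfiles = new ArrayList<>();
        for (String lang : langs) {
            String name = dir + "/" + lang;
            InputStream in = loader.getResourceAsStream(name);
            if (in == null) {
                throw new IllegalArgumentException("profile not found on classpath: " + name);
            }
            jsonProfiles.add(readAll(in));
        }
        return jsonProfiles;
    }
}
```

The resulting list could then be handed to the List<String> overload, for example DetectorFactory.loadProfile(ClasspathProfiles.readProfiles(loader, "profiles", langs)). I have not run this against the trunk build.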

Original comment by nakatani.shuyo on 22 Nov 2011 at 7:46


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
OK, it worked, but I needed to include the library jsonic-1.2.0.jar, and I put 
the profiles folder in lib, so I only call NAMECLASS.init("lib/profiles"). I 
think it would be better to bundle the profiles in the library and then remove 
the .loadProfile(String or File) function.
Best, leho

Original comment by [email protected] on 28 Mar 2012 at 5:32

Attachments:


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
If anybody is still wondering what a working example from the most recent maven 
build would look like, please refer to the attached file here.

Original comment by [email protected] on 17 Jan 2013 at 11:11

Attachments:


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
[deleted comment]


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
Here's a solution in Clojure FWIW; though it doesn't autodetect the language 
profiles that are available, I actually like having an easy reference of all of 
the languages that might be detected in the codebase:

(->> #{"af" "ar" "bg" "bn" "cs" "da" "de" "el" "en" "es" "et" "fa" "fi" "fr" "gu"
       "he" "hi" "hr" "hu" "id" "it" "ja" "kn" "ko" "lt" "lv" "mk" "ml" "mr" "ne"
       "nl" "no" "pa" "pl" "pt" "ro" "ru" "sk" "sl" "so" "sq" "sv" "sw" "ta" "te"
       "th" "tl" "tr" "uk" "ur" "vi" "zh-cn" "zh-tw"}
     (map (partial str "profiles/"))
     (map (comp slurp clojure.java.io/resource))
     com.cybozu.labs.langdetect.DetectorFactory/loadProfile)

Original comment by [email protected] on 25 Apr 2013 at 3:04


GoogleCodeExporter avatar GoogleCodeExporter commented on July 30, 2024
The file suggestion from Anto (#14) worked like a charm for profiles inside a 
jar.

Original comment by [email protected] on 3 Dec 2014 at 8:54

