bogdartysh / languagechecker-crawler

A crawler that checks whether an entire site is translated into a specified language

License: Other

Java 99.04% Shell 0.53% Batchfile 0.43%

languagechecker-crawler's Introduction

This crawler is designed for a single purpose: to check that an available web site (or a portion of it) is fully translated from one language to another.

As a user, open the distro folder, find the most recently shipped distro, edit the task.properties file (set task_external_id, start_urls, url_pattern, origin_language_code, shouldbe_language_code, pages_to_fill_all_forms), and run start.sh (or start.bat) from a terminal. Results will be in the results folder.
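
As a rough illustration of the settings named above, the following Java sketch loads task.properties-style text and reports which of the six keys are still unset. The class name TaskSettingsSketch, its helper methods, and every sample value (demo-1, example.com, en, pl) are assumptions made for illustration only. They are not taken from the crawler's code, and the value formats the crawler actually expects are best checked against the examples in the tasks folder.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class TaskSettingsSketch {
    // The six keys listed in the README.
    public static final List<String> KEYS = Arrays.asList(
            "task_external_id", "start_urls", "url_pattern",
            "origin_language_code", "shouldbe_language_code",
            "pages_to_fill_all_forms");

    // Parses standard Java .properties text from any reader.
    public static Properties load(Reader reader) throws IOException {
        Properties settings = new Properties();
        settings.load(reader);
        return settings;
    }

    // Returns the README keys that are absent or blank.
    public static List<String> missingKeys(Properties settings) {
        List<String> missing = new ArrayList<>();
        for (String key : KEYS) {
            String value = settings.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample values; pages_to_fill_all_forms is left out
        // on purpose because its expected format is not described here.
        String sample = "task_external_id=demo-1\n"
                + "start_urls=https://example.com/\n"
                + "url_pattern=https://example.com/.*\n"
                + "origin_language_code=en\n"
                + "shouldbe_language_code=pl\n";
        Properties settings = load(new StringReader(sample));
        // prints: Missing settings: [pages_to_fill_all_forms]
        System.out.println("Missing settings: " + missingKeys(settings));
    }
}
```

To check a real file instead of the inline sample, pass a java.io.FileReader for task.properties to load.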

Developers will need to:
1. install Maven (maven.apache.org) and Java 8+
2. open a terminal and execute
2.1 mvn clean install
2.2 mvn assembly:single
2.3 cd target
2.4 adjust the project settings (file task.properties; examples are in the tasks folder)
2.5 execute: java -Xms1024m -Xmx1024m -jar languagechecker-crawler-1.0-SNAPSHOT-jar-with-dependencies.jar >results.csv
Results will be in results.csv. If errors occur during the run, the CSV may not be readable in OpenOffice, so check it in a text editor first.
Log files are in target/logs.
Dictionary files for your languages should be named by language code (ru.dict, en.dict, pl.dict) and placed in the target/dict folder. These files should be UTF-8 encoded and should contain all words of the chosen language.
For examples, see the en.dict and pl.dict files (extracted from the aspell project).
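
To make the role of the dictionary files concrete, here is a minimal Java sketch of a word-list lookup. It assumes one word per line in each .dict file, which this README does not confirm. The class DictionaryCheckSketch and its methods are hypothetical and do not describe how the crawler itself checks pages.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DictionaryCheckSketch {
    // Reads a word list (assumed: one word per line) into a lower-cased set.
    public static Set<String> loadWords(Reader source) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim().toLowerCase(Locale.ROOT);
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    // Returns the tokens of the text that are not in the dictionary.
    public static List<String> unknownWords(String text, Set<String> dictionary) {
        List<String> unknown = new ArrayList<>();
        // Split on any run of characters that are neither Unicode letters nor apostrophes.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            if (!token.isEmpty() && !dictionary.contains(token)) {
                unknown.add(token);
            }
        }
        return unknown;
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory stand-in for en.dict.
        Set<String> english = loadWords(new StringReader("hello\nworld\n"));
        // prints: [bonjour]
        System.out.println(unknownWords("Hello, bonjour world!", english));
    }
}
```

For a real dictionary, a reader opened with Files.newBufferedReader(Paths.get("dict/en.dict"), StandardCharsets.UTF_8) can be passed to loadWords. Decoding explicitly as UTF-8 matches the encoding requirement stated above.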



For license information, see the license file.
