rdfhdt / hdt-mr
MapReduce-based generation of HDT
License: GNU Lesser General Public License v2.1
The top-level README links to the old Google Code repository for the hdt-java dependency. The link should be updated to the current GitHub repository at https://github.com/rdfhdt/hdt-java.
In addition, the README needs a .md extension so that GitHub's web UI renders it and formats its headers, instead of showing the raw HTML.
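The rename itself is a one-line change. A minimal sketch in a throwaway directory, with a placeholder file standing in for the real README:

```shell
cd "$(mktemp -d)"
printf '<h1>hdt-mr</h1>\n' > README   # placeholder for the actual README contents
mv README README.md                   # the .md extension makes GitHub render it as Markdown
ls
```

In the repository itself this would be done with `git mv README README.md` so the rename is tracked.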
Is hdt-mr only for generating HDT files, or also for querying HDT files?
I got the following exception when I ran the Hadoop MapReduce job:
Sampling started
16/07/14 09:25:29 INFO input.FileInputFormat: Total input paths to process : 0
16/07/14 09:25:29 INFO partition.InputSampler: Using 0 samples
16/07/14 09:25:29 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/07/14 09:25:29 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:340)
at org.rdfhdt.mrbuilder.HDTBuilderDriver.runDictionaryJob(HDTBuilderDriver.java:242)
at org.rdfhdt.mrbuilder.HDTBuilderDriver.main(HDTBuilderDriver.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Here is the code snippet causing the exception:
InputSampler.writePartitionFile(job, new InputSampler.IntervalSampler<Text, Text>(this.conf.getSampleProbability()));
It seems the input files are not found... I created an 'input' directory and put N-Triples '.nt' files in it.
Any idea?
Best,
Gang
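For reference, "Total input paths to process : 0" usually means the path the job was given is empty, or the files were placed on the local filesystem while the job resolves the path against HDFS. A hedged sketch of staging the input, assuming the relative path 'input' resolves to the user's HDFS home directory:

```shell
# Put the N-Triples files into HDFS before running the job.
hdfs dfs -mkdir -p input
hdfs dfs -put *.nt input/
hdfs dfs -ls input   # should now list the .nt files the sampler will read
```

These commands require a running Hadoop cluster; if they list the files and the error persists, the cause lies elsewhere.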
Hello All,
I got a FileNotFoundException when I ran a Hadoop (version 2.8.2) MapReduce program for face recognition, built using Ant. (Everything went fine when I ran a WordCount example.) I have attached the Java code if required. I need help resolving this.
These are the errors I got:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/user/abiodun/lbpcascade_frontalface.xml#lbpcascade_frontalface.xml
at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1440)
at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1433)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1433)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:300)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:179)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:97)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:192)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
at FaceCount.run(FaceCount.java:207)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at FaceCount.main(FaceCount.java:212)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
I would appreciate it if anyone could help fix this problem.
Abi
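For reference, the `#lbpcascade_frontalface.xml` suffix in the failing path is Hadoop's distributed-cache symlink syntax; the exception says the file before the `#` does not exist in HDFS. A sketch of the usual fix, assuming the cascade file sits in the current local directory and the job expects it under /user/abiodun:

```shell
# Upload the cascade file to the HDFS location the job references;
# the #name suffix only controls the symlink name inside task directories.
hdfs dfs -put lbpcascade_frontalface.xml /user/abiodun/
hdfs dfs -ls /user/abiodun/lbpcascade_frontalface.xml
```

These commands assume a running cluster with the namenode at localhost:9000, as in the stack trace.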
$ mvn clean install package
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16.564 s
[INFO] Finished at: 2017-10-27T12:45:58+02:00
[INFO] Final Memory: 13M/142M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project hdt-mr: Could not resolve dependencies for project org.rdfhdt:hdt-mr:jar:2.0: The following artifacts could not be resolved: org.rdfhdt:hdt-api:jar:2.0, org.rdfhdt:hdt-java-core:jar:2.0, com.hadoop.gplcompression:hadoop-lzo:jar:0.4.20-SNAPSHOT: Could not find artifact org.rdfhdt:hdt-api:jar:2.0 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
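The missing org.rdfhdt artifacts are not published to Maven Central, so they must be installed into the local repository before hdt-mr will build. A sketch, assuming hdt-java builds cleanly from its GitHub repository (a tag or version matching 2.0 may be needed, and the hadoop-lzo 0.4.20-SNAPSHOT dependency similarly has to come from a local build or a third-party repository):

```shell
git clone https://github.com/rdfhdt/hdt-java.git
cd hdt-java
mvn clean install -DskipTests   # installs hdt-api and hdt-java-core into ~/.m2
cd ../hdt-mr
mvn clean install package
```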
Hi,
I used Maven to compile the code; here are the Hadoop dependencies from the POM:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>2.6.0</version>
</dependency>
I created an executable jar and put the input N-Triples file in the input folder. Then I ran hadoop:
hadoop jar /home/fug2/hdtrdf/hdt-mr/target/hdt-mr-2.0-jar-with-dependencies.jar -i input
I got a warning message:
WARNING: Only one Reducer. Dictionary creation as a single job is more efficient.
and a FileNotFoundException:
Shared section = dictionary/shared
Exception in thread "main" java.io.FileNotFoundException: File hdfs://mesosdev/user/fug2/dictionary/shared does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:658)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:104)
at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:716)
at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:712)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
at org.rdfhdt.mrbuilder.HDTBuilderDriver.loadFromDir(HDTBuilderDriver.java:646)
at org.rdfhdt.mrbuilder.HDTBuilderDriver.buildDictionary(HDTBuilderDriver.java:382)
at org.rdfhdt.mrbuilder.HDTBuilderDriver.main(HDTBuilderDriver.java:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Does anyone know how to fix this?
Best,
Gang
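For reference, the stack trace shows HDTBuilderDriver.loadFromDir listing dictionary/shared, so the driver apparently expects output from an earlier dictionary job under the user's HDFS home. A sketch of checking what that job actually produced, assuming relative paths resolve to /user/fug2:

```shell
hdfs dfs -ls dictionary          # did the dictionary job write anything at all?
hdfs dfs -ls dictionary/shared   # the exact path the driver is trying to read
```

If the dictionary directory is missing or empty, the earlier job likely failed or wrote its output elsewhere; the single-reducer warning above suggests the job configuration is worth re-checking.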
Any dev on the list? Is this project still active? Or dead?
Hi,
When I built it using Maven, I could not find the org.rdfhdt.hdt.trans package in the git repository, but it is required by two files:
[fug2@cbbdev11 hdt-java]$ grep -r 'org.rdfhdt.hdt.trans' .
./hdt-mr/src/main/java/org/rdfhdt/hdt/dictionary/impl/section/TransientDictionarySection.java:import org.rdfhdt.hdt.trans.TransientElement;
./hdt-mr/src/main/java/org/rdfhdt/mrbuilder/HDTBuilderDriver.java:import org.rdfhdt.hdt.trans.TransientElement;
Any idea about where I can find the package?
Best,
Gang
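For reference, org.rdfhdt.hdt.trans does not appear in the public hdt-java sources, so it may come from a modified hdt-java build. As a last resort, a hypothetical stub like the one below can make the two imports resolve; the name TransientElement is taken from the grep output, but the body is a guess and the real interface may declare methods:

```java
// Hypothetical stub for the missing package; the actual definition may differ.
package org.rdfhdt.hdt.trans;

public interface TransientElement {
    // intentionally empty: the real members are unknown
}
```

If the two importing files call methods on TransientElement, those signatures would have to be added to the stub as well.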