apache / accumulo

Apache Accumulo

Home Page: https://accumulo.apache.org

License: Apache License 2.0

Languages: Shell 0.62%, Java 97.51%, HTML 0.04%, Thrift 0.35%, C++ 0.28%, JavaScript 0.65%, CSS 0.06%, Makefile 0.02%, C 0.02%, FreeMarker 0.45%
Topics: accumulo, big-data, hacktoberfest

accumulo's People

Contributors

adamjshook, alerman, billierinaldi, bimargulies, brianloss, busbey, cjnolet, cshannon, ctubbsii, ddanielr, dependabot[bot], dhutchis, dlmarion, domgarguilo, drewfarris, edcoleman, ericnewton, hkeebler, ivakegg, jmark99, joshelser, keith-turner, lstav, madrob, manno15, mikewalch, milleruntime, mjwall, ohshazbot, wisellama


accumulo's Issues

Tiered Storage Volume Chooser

For systems with a mixture of high density, slower disks and low density, faster disks, provide a volume chooser that can be configured to place more frequently accessed data on the faster disks. This will require partitioning the fast and slow disks into separate volumes, and will also require modifying the _majorCompact code to use the volume chooser to determine where to put new files, instead of relying on srv:dir. The choice will be made based on the tablet being compacted, using a user-defined strategy for deciding which volume to use.

Needed:

  1. A new TieredStorageVolumeChooser
  2. New configuration options for the volume chooser
  3. Updates to the major compaction code that determines the output file location (specifically around line 1945 of Tablet.java)
  4. A user-definable VolumeChooserStrategy that lets users control how new volumes are picked (a rough sketch follows)
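
Below is a rough, hypothetical sketch of items 1 and 4; none of these class or method names (TieredStorageVolumeChooser, VolumeChooserStrategy, preferFastVolume) exist in Accumulo yet, they only illustrate the shape of the proposal.

import java.util.List;
import java.util.Random;

// Item 4: a user-definable strategy for picking between tiers.
interface VolumeChooserStrategy {
  // Decide, for the tablet being compacted, whether its new file
  // should land on a fast volume or a slow one.
  boolean preferFastVolume(String tableId, String endRow);
}

// Item 1: a chooser configured (item 2) with the two partitioned volume lists.
class TieredStorageVolumeChooser {
  private final List<String> fastVolumes; // e.g. volumes backed by low density, fast disks
  private final List<String> slowVolumes; // e.g. volumes backed by high density, slow disks
  private final VolumeChooserStrategy strategy;
  private final Random random = new Random();

  TieredStorageVolumeChooser(List<String> fast, List<String> slow, VolumeChooserStrategy strategy) {
    this.fastVolumes = fast;
    this.slowVolumes = slow;
    this.strategy = strategy;
  }

  // Item 3: the major compaction code would call this instead of relying on srv:dir.
  String choose(String tableId, String endRow) {
    List<String> tier = strategy.preferFastVolume(tableId, endRow) ? fastVolumes : slowVolumes;
    return tier.get(random.nextInt(tier.size()));
  }
}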

Inline BlockFile interfaces

I think we can take a step towards cleaning up RFile, BCFile and CacheableBlockFile by eliminating some of the intermediate interfaces. There are 4 in particular in org.apache.accumulo.core.file.blockfile that are redundant and only add confusion/complexity. I think the classes that implement these interfaces can be inlined.

  • ABlockWriter
  • ABlockReader
  • BlockFileReader
  • BlockFileWriter

Here is an example where one of these interfaces is misleading:

private static class LocalityGroupReader extends LocalityGroup implements FileSKVIterator {
...
  private IndexIterator iiter;
  private int entriesLeft;
  private ABlockReader currBlock;
...
  currBlock = getDataBlock(indexEntry);

The LocalityGroupReader stores an ABlockReader for the currBlock data block, which is actually a BlockRead...

GCS Hadoop Connector can't recover from Tablet Server failure.

I was attempting to use the GCS Connector (https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs) to back Accumulo on GCP.

All pretty straightforward (just pointed instance.volumes to my bucket gs://<bucketname>/accumulo).

One of my Tablet Servers OOMed, and when I try to recover it, I end up getting the following error:

Failed to initiate log sort gs://<bucketname>/accumulo/wal/accumulo-gcs-w-1+9997/aa166493-637e-48e8-a9a6-3655dfb59a6c
	java.lang.IllegalStateException: Don't know how to recover a lease for com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
		at org.apache.accumulo.server.master.recovery.HadoopLogCloser.close(HadoopLogCloser.java:70)
		at org.apache.accumulo.master.recovery.RecoveryManager$LogSortTask.run(RecoveryManager.java:96)
		at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
		at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
		at java.lang.Thread.run(Thread.java:748)

I traced that back to HadoopLogCloser, and since the GoogleHadoopFileSystem is a FileSystem, not a DistributedFileSystem, LocalFileSystem, or RawLocalFileSystem, it bottoms out here:

"Don't know how to recover a lease for " + ns.getClass().getName());

I'm not sure what to do from here. I was also told that Azure should work with Accumulo without problems, but it looks like the NativeAzureFileSystem similarly doesn't implement any of these interfaces, so I would presume it would hit the same issue.
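
For reference, here is a paraphrased sketch of the dispatch described above (an approximation of HadoopLogCloser.close(), not the exact source). Lease recovery is only handled for the HDFS and local filesystem types, so any other FileSystem subclass, such as GoogleHadoopFileSystem or NativeAzureFileSystem, falls through to the IllegalStateException.

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RawLocalFileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

class LeaseRecoverySketch {
  // Approximation of the logic in HadoopLogCloser.close(); not the actual source.
  void close(FileSystem ns, Path source) throws IOException {
    if (ns instanceof DistributedFileSystem) {
      // HDFS supports explicit lease recovery on the WAL before sorting it.
      ((DistributedFileSystem) ns).recoverLease(source);
    } else if (ns instanceof LocalFileSystem || ns instanceof RawLocalFileSystem) {
      // local filesystems have no lease to recover and are handled separately
    } else {
      // GoogleHadoopFileSystem (and NativeAzureFileSystem) land here.
      throw new IllegalStateException(
          "Don't know how to recover a lease for " + ns.getClass().getName());
    }
  }
}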

Add User initiated Compaction Information to IteratorEnvironment


The IteratorEnvironment only indicates that a major compaction is a full compaction if it is in the final round of the full compaction.

For example, let's say there are 19 files in the tablet and the maximum number of files per major compaction is 10. Accumulo will compact those 19 files in two rounds. The first round will merge 10 files into 1, leaving the other 9 untouched. Once that round is complete, there will be 10 remaining files. This is considered a major compaction, but not a full major compaction, according to the IteratorEnvironment.

The second and final round will compact all the remaining files into 1 file. This final round is considered a full and major compaction.

There are some filter use cases that don't need the completely compacted data before they operate, and thus could optimize their work during round 1 of the compaction. There are also use cases that may only run during a user initiated compaction. By exposing this information, filters could be constructed to use both rounds of compaction optimally and save tablet server resources.
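
As a rough illustration, a filter today can only key off the final full round via the existing IteratorEnvironment.isFullMajorCompaction() method; the proposed user-initiated-compaction flag does not exist yet and is only mentioned in a comment below as a hypothetical.

import java.io.IOException;
import java.util.Map;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.IteratorUtil.IteratorScope;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;

public class ExpensiveCleanupFilter extends Filter {

  private boolean fullMajc = false;

  @Override
  public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
      IteratorEnvironment env) throws IOException {
    super.init(source, options, env);
    // Today this is the only signal available: it is true only during the
    // final round of a multi-round major compaction.
    fullMajc = env.getIteratorScope() == IteratorScope.majc && env.isFullMajorCompaction();
    // With the proposed change, something like env.isUserCompaction() (hypothetical)
    // would let the filter also behave differently for user initiated compactions.
  }

  @Override
  public boolean accept(Key k, Value v) {
    if (!fullMajc) {
      return true; // skip the expensive per-entry work during intermediate rounds
    }
    return doExpensiveCheck(k, v);
  }

  private boolean doExpensiveCheck(Key k, Value v) {
    return true; // placeholder for the real filtering logic
  }
}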

Seeing more data loss when running continuous ingest with agitation.

With the fixes for #432 and #441 applied to 1.9.0-SNAPSHOT, I ran continuous ingest for 24 hrs with agitation. I ran the agitator as follows.

nohup ./tserver-agitator.pl 1:10 1:10 1 3 &> logs/tserver-agitator.log &

The above command sleeps 1 to 10 minutes randomly between killing tserver processes and randomly kills 1 to 3 tablet servers. I think in the past I have run with a non-random period, but I am not sure. I suspect the random period may be uncovering some new bugs.

After running verify, I saw the following UNDEFINED count, which indicates data was lost. This count is much lower than what I saw before #441 (~600K). I am going to attempt to track down the cause of this.

        org.apache.accumulo.test.continuous.ContinuousVerify$Counts
                REFERENCED=22559054297
                UNDEFINED=6176
                UNREFERENCED=8007075

Organize Fate Operations

The Repos of every fate operation are all in the org.apache.accumulo.master.tableOps package. It is annoying to track down which Repo goes with which operation. Creating a package in o.a.a.master.tableOps for each operation would organize them nicely. This would have to be done in 2.0 since it will break serialization.

Seeing data loss when running continuous ingest with agitation

I did not get a chance to run continuous ingest with agitation before 1.9.0 was released. I ran it after the release and saw some data loss. I have tracked the bug down: it happens when a closed write ahead log is only referenced by minor compacting tablets. When this happens, the tablet server may prematurely mark the write ahead log as unreferenced in ZooKeeper. The following is an example of this bug:

  1. tserverA creates WAL1
  2. Data is written to tabletX on tserverA
  3. tserverA creates WAL2
  4. Data is written to tabletX on tserverA
  5. Minor compaction of tabletX starts
  6. tserverA marks WAL1 as unreferenced
  7. tserverA is killed
  8. the master adds only WAL2 to tabletX
  9. tabletX is loaded on tserverB and recovers data from WAL2

In the example above the tablet has data in WAL1; however, since the tablet server marked it as unreferenced, the master did not assign it to the tablet. The tablet server should only mark WAL1 as unreferenced after the minor compaction finishes, not after it starts.

This bug probably exists in 1.8.0 and later.

Make bulk import metadata scan split tolerant

This is follow-on work for #436. The metadata scan that is done for loading files should be split tolerant. The scan should check that the metadata table forms a linked list and back up when it does not. The GC scans the metadata table like this.

Token file functionality for mapred code deprecated, but not replaced

As fallout from the recent Connector builder work that @mikewalch did for 2.0, many of the old APIs were deprecated in the MapReduce code, to be replaced with ConnectionInfo. However, the mapreduce APIs for storing an authentication token in a job's distributed cache, and reading it from within the mappers, do not have an equivalent in the new code.

Two things are needed:

  1. The ability to serialize the entire connection info into the job, without exposing the credentials in the Hadoop configuration, and
  2. The ability to serialize arbitrary authentication tokens, not just the known ones that the current Connector builder has special code to handle. (A whole authentication token serializer was previously created for this purpose, but it is not used in the new code; see the sketch below.)
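
For reference, a minimal sketch of using that serializer to round-trip an arbitrary token (assuming the AuthenticationToken.AuthenticationTokenSerializer methods the old mapreduce code relied on; check the exact signatures before reusing this):

import org.apache.accumulo.core.client.security.tokens.AuthenticationToken;
import org.apache.accumulo.core.client.security.tokens.AuthenticationToken.AuthenticationTokenSerializer;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

class TokenRoundTripSketch {
  // Works for any AuthenticationToken implementation, not just the known ones
  // the Connector builder special-cases.
  static byte[] toBytes(AuthenticationToken token) {
    return AuthenticationTokenSerializer.serialize(token);
  }

  static AuthenticationToken fromBytes(String tokenClassName, byte[] bytes) {
    return AuthenticationTokenSerializer.deserialize(tokenClassName, bytes);
  }

  public static void main(String[] args) {
    byte[] bytes = toBytes(new PasswordToken("secret"));
    AuthenticationToken restored = fromBytes(PasswordToken.class.getName(), bytes);
    System.out.println(restored.getClass().getSimpleName());
  }
}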

Vfs2 error in Shell

I changed table.file.compress.type on a table and, after writing to it, saw the vfs2 exception below in the shell:

06:37:55 {master} ~/sw/uno$ accumulo org.apache.accumulo.test.TestIngest -i uno -u root -p secret --rows 3000
2018-04-24 18:38:08,541 [trace.DistributedTrace] INFO : SpanReceiver org.apache.accumulo.tracer.ZooTraceClient was loaded successfully.
       3,000 records written |   10,489 records/sec |    3,087,000 bytes written | 10,793,706 bytes/sec |  0.286 secs   
Exception in thread "Thread-2" java.lang.RuntimeException: No files-cache implementation set.
	at org.apache.commons.vfs2.provider.AbstractFileSystem.getCache(AbstractFileSystem.java:209)
	at org.apache.commons.vfs2.provider.AbstractFileSystem.getFileFromCache(AbstractFileSystem.java:222)
	at org.apache.commons.vfs2.provider.AbstractFileSystem.resolveFile(AbstractFileSystem.java:332)
	at org.apache.commons.vfs2.provider.AbstractFileSystem.resolveFile(AbstractFileSystem.java:317)
	at org.apache.commons.vfs2.provider.AbstractFileObject.resolveFile(AbstractFileObject.java:2007)
	at org.apache.commons.vfs2.provider.AbstractFileObject.resolveFiles(AbstractFileObject.java:2052)
	at org.apache.commons.vfs2.provider.AbstractFileObject.getChildren(AbstractFileObject.java:1222)
	at org.apache.commons.vfs2.impl.DefaultFileMonitor$FileMonitorAgent.checkForNewChildren(DefaultFileMonitor.java:553)
	at org.apache.commons.vfs2.impl.DefaultFileMonitor$FileMonitorAgent.check(DefaultFileMonitor.java:667)
	at org.apache.commons.vfs2.impl.DefaultFileMonitor$FileMonitorAgent.access$200(DefaultFileMonitor.java:423)
	at org.apache.commons.vfs2.impl.DefaultFileMonitor.run(DefaultFileMonitor.java:376)
	at java.lang.Thread.run(Thread.java:748)

06:40:10 {master} ~/sw/uno$ accumulo shell -u root -p secret -e 'flush -t test_ingest'
2018-04-24 18:40:22,106 [trace.DistributedTrace] INFO : SpanReceiver org.apache.accumulo.tracer.ZooTraceClient was loaded successfully.
2018-04-24 18:40:22,387 [shell.Shell] INFO : Flush of table test_ingest initiated...
Exception in thread "Thread-2" java.lang.RuntimeException: No files-cache implementation set.
	at org.apache.commons.vfs2.provider.AbstractFileSystem.getCache(AbstractFileSystem.java:209)
	at org.apache.commons.vfs2.provider.AbstractFileSystem.getFileFromCache(AbstractFileSystem.java:222)
	at org.apache.commons.vfs2.provider.AbstractFileSystem.resolveFile(AbstractFileSystem.java:332)
	at org.apache.commons.vfs2.provider.AbstractFileSystem.resolveFile(AbstractFileSystem.java:317)
	at org.apache.commons.vfs2.provider.AbstractFileObject.resolveFile(AbstractFileObject.java:2007)
	at org.apache.commons.vfs2.provider.AbstractFileObject.resolveFiles(AbstractFileObject.java:2052)
	at org.apache.commons.vfs2.provider.AbstractFileObject.getChildren(AbstractFileObject.java:1222)
	at org.apache.commons.vfs2.impl.DefaultFileMonitor$FileMonitorAgent.checkForNewChildren(DefaultFileMonitor.java:553)
	at org.apache.commons.vfs2.impl.DefaultFileMonitor$FileMonitorAgent.check(DefaultFileMonitor.java:667)
	at org.apache.commons.vfs2.impl.DefaultFileMonitor$FileMonitorAgent.access$200(DefaultFileMonitor.java:423)
	at org.apache.commons.vfs2.impl.DefaultFileMonitor.run(DefaultFileMonitor.java:376)
	at java.lang.Thread.run(Thread.java:748)

The data was written and flushed. Nothing in the tserver logs. Could be a configuration error but I was using Accumulo 1.8.1 and Hadoop 2.9.0.

Investigate ThriftBehaviorIT checking for runtime exception handling

ThriftBehaviorIT should be checking to ensure thrift is behaving as expected with regard to exception handling. However, the test seems to pass whether or not the thrift code is generated with the handle_runtime_exceptions flag. This indicates a potentially broken test, or insufficient test coverage. Either way, it must be investigated prior to releasing 2.0.

Mini Accumulo Cluster class loaders behaviour

Hey all!

I'm trying to use Mini Accumulo Cluster in SBT tests (my first attempt to move off the Accumulo mock client) and am experiencing an issue with this line: https://github.com/apache/accumulo/blob/rel/1.9.1/minicluster/src/main/java/org/apache/accumulo/minicluster/impl/MiniAccumuloClusterImpl.java#L272

The thing is that it loads the following class loaders:

URLClassLoader with NativeCopyLoader with RawResources(
  urls = List(...),
  parent = DualLoader(a = java.net.URLClassLoader@1c345372, b = java.net.URLClassLoader@69ea3742),
  resourceMap = Set(app.class.path, boot.class.path),
  nativeTemp = ...
)
DualLoader(a = java.net.URLClassLoader@1c345372, b = java.net.URLClassLoader@69ea3742)
NullLoader // <-- here is the issue
sun.misc.Launcher$AppClassLoader@75b84c92
sun.misc.Launcher$ExtClassLoader@483f6d77

The error I'm getting is: Unknown classloader type : sbt.internal.inc.classpath.NullLoader
Where the NullLoader itself is:

package sbt.internal.inc.classpath
final class NullLoader() extends java.lang.ClassLoader { ... }

Maybe it makes sense to skip unknown class loaders rather than throw, and only throw an exception if there are no usable class loaders at all? Any workarounds are also welcome, and let me know if it makes more sense to create an issue on the SBT side rather than here.
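
A minimal sketch of the "skip unknown loaders" idea (illustrative only; the actual classpath-building code in MiniAccumuloClusterImpl is more involved): walk the loader hierarchy, collect URLs from the URLClassLoaders it understands, and only fail if nothing usable was found.

import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

class ClasspathFromLoadersSketch {
  static List<URL> collectUrls(ClassLoader start) {
    List<URL> urls = new ArrayList<>();
    for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
      if (cl instanceof URLClassLoader) {
        for (URL u : ((URLClassLoader) cl).getURLs()) {
          urls.add(u);
        }
      }
      // Unknown loader types (e.g. sbt.internal.inc.classpath.NullLoader)
      // are simply skipped instead of triggering an exception.
    }
    if (urls.isEmpty()) {
      throw new IllegalStateException("No usable class loaders found on the chain");
    }
    return urls;
  }
}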

I'd be glad to fix and test this if it's indeed needed.

Update:

I also discovered a workaround on the SBT side (accidentally, right after posting this issue): fork the JVM in tests (SBT setting: fork in Test := true).

NoSuchElementException in AccumuloMonitorAppenderTest

Originally mentioned by @joshelser on https://issues.apache.org/jira/browse/ACCUMULO-4409. @milleruntime told me he also saw it.

It looks like the problem is that the Enumeration in the set is not thread-safe. We could synchronize on updates to the appender's children, but it's only ever updated in a single thread; the only other thread reading this set is the test... so it's easier to just have the test handle the race condition by catching the NoSuchElementException.
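
A minimal sketch of that test-side handling (illustrative only): retry the read if the enumeration races with the updater thread.

import java.util.Enumeration;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

class RetryOnRaceSketch {
  // Re-run the read when the single writer updates the children mid-enumeration.
  static <T> T readWithRetry(Supplier<Enumeration<T>> reader, int attempts) {
    NoSuchElementException last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        Enumeration<T> e = reader.get();
        return e.hasMoreElements() ? e.nextElement() : null;
      } catch (NoSuchElementException race) {
        last = race; // the appender's children changed under us; try again
      }
    }
    if (last != null)
      throw last;
    throw new NoSuchElementException("gave up after " + attempts + " attempts");
  }
}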

Find unused Accumulo properties

Several unused Accumulo properties were found and removed in #447, but there may be more! It would be nice if someone went through Property.java and looked for others.

Exception while running integration tests using SunnyDay profile

Exception occurred in monitor module:

java.lang.RuntimeException: Unable to load category: org.apache.accumulo.test.categories.SunnyDayTests
	at org.apache.maven.surefire.group.match.SingleGroupMatcher.loadGroupClasses(SingleGroupMatcher.java:142)
	at org.apache.maven.surefire.common.junit48.FilterFactory.createGroupFilter(FilterFactory.java:100)
	at org.apache.maven.surefire.junitcore.JUnitCoreProvider.createJUnit48Filter(JUnitCoreProvider.java:279)
	at org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:126)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
Caused by: java.lang.ClassNotFoundException: org.apache.accumulo.test.categories.SunnyDayTests
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.maven.surefire.group.match.SingleGroupMatcher.loadGroupClasses(SingleGroupMatcher.java:138)
	... 7 more

Shell not working

I built the latest master and ran using Uno. Trying to run the Shell prints this error on the command line (with no other errors in the logs):
[shell.Shell] ERROR: Path must not end with / character

Sounds like an error coming from ZooKeeper or an external library because I couldn't find that string in the Accumulo code base.

1.8 Master marks bulk import as failed when it could still be successful

Using the default configuration for bulk.timeout and bulk.retry.max, the master will sometimes mark a bulk import as failed even though it was successful. Sample timings:

11:18:08 master asks tserver1 to import
11:18:08 tserver1 gets request
11:23:08 master marks attempt 1 failed (SocketTimeoutException)
11:23:08 master asks tserver2 to import
11:23:08 tserver2 gets request
11:25:27 tserver1 finishes calculating overlapping tablets
11:25:27 tserver1 completes import
11:28:08 master marks attempt 2 failed (SocketTimeoutException)
11:28:22 tserver2 finishes calculating overlapping tablets
11:28:22 tserver2 successfully completes import
etc., until the import is marked as completely failed.

Include examination of map files in tserver timeout? Check isActive between master reattempts? Other?

Re-implement write ahead log archiving

In the past, Accumulo could archive files and write ahead logs instead of deleting them. This was useful for debugging data loss situations that occurred during testing. It seems the functionality to archive write ahead logs no longer exists; it may be useful to bring it back.

A workaround for not having this functionality during testing is to not run the Accumulo GC process. However, the drawback of this is that the GC is not tested. What if the GC process, had it been running, would have deleted a file that it shouldn't have? It would be nice to be able to see that happen.

Add createIfNotExists method to table/namespace operations

Currently, if you want to create a namespace or table in Accumulo that may already exist, you need to try to create it and ignore the exception:

  try {
    connector.namespaceOperations().create("mynamespace");
  } catch (NamespaceExistsException e) {
    // ignore
  }

  try {
    connector.tableOperations().create("mynamespace.mytable");
  } catch (TableExistsException e) {
    // ignore
  }

It would be nice if a createIfNotExists method was added to the API that ignored this exception. This would simplify the above code to the following:

connector.namespaceOperations().createIfNotExists("mynamespace");
connector.tableOperations().createIfNotExists("mynamespace.mytable");

Can not create Connector from existing Connector

Using only the new Connector builder APIs introduced in ACCUMULO-4784, there is no way to create a Connector from an existing Connector with a different user. I noticed this when working on #410 and wrote a test that creates a user and then creates a Connector for that user. With the old APIs this could be done as follows:

Connector conn = ...
//create connector as another user for same instance.
conn = conn.getInstance().getConnector("user", "pass");

WAL Recovery directories not being removed

While running tests for 1.9.0, I noticed some files in /accumulo/recovery that were a few days old. I investigated this and could not find any code in the garbage collector that actually deletes WALs in the recovery directory. There is code in 1.7 to delete recovered WALs, so I suspect this problem was introduced in 1.8.0 with the change in how WALs are tracked.

I also found that the property master.recovery.max.age has not been used by Accumulo internals since 1.4.

-1 tablet log id used in WAL when table durability is set to None

When a table's durability is set to none, tablets get a tablet log id of -1. The durability can be set per batch writer, and the table setting can be changed. However, when this change in durability happens, the tablet id in the WALs is still -1. This means that a tablet may recover data from other tablets, because multiple tablets would be mapped to the same id of -1.

I discovered this with changes I made for #458. My changes caused an IT to fail with this pre-existing bug. This bug may have existed since 1.7.0. This bug would only be seen when setting table.durability=NONE and then later changing it to something else.

Generate native header files with `-h` javac option instead of `javah`

We currently use javah via the native-maven-plugin to generate our native code JNI headers. This may not be necessary anymore, since https://bugs.openjdk.java.net/browse/JDK-7150368, because we should be able to just use the normal Java compiler (with the -h flag in Java 8+) to output those files during the Java compilation step.

(Also, javah is deprecated in Java 9: http://openjdk.java.net/jeps/313, so we should probably try to figure out the new method of generation.)

New bulk importer fails if directory contains files other than RFiles

This may be intended behavior... I'm not sure.
I tried to convert some bulk import code from AuditMessageIT:

auditConnector.tableOperations().importDirectory(THIRD_TEST_TABLE_NAME, exportDir.toString(), failDir.toString(), false);

to:

auditConnector.tableOperations().addFilesTo(THIRD_TEST_TABLE_NAME).from(exportDir.toString()).load();

The former works fine with the export directory containing:

distcp.txt
exportMetadata.zip
tmp/

However, the latter failed because it did not recognize the zip file format. I'm not sure why this test is trying to bulk import from an exportTable directory, but it works fine using the old method (which seems to silently ignore unrecognized files... although I can't find the relevant filtering code).

Do we want the new bulk importer to ignore unrecognized files like the old method did, or not?

Cache rfile file lengths

RFiles store metadata at the end of the file. To open an RFile, the file length must first be obtained. If file lengths were cached, this could possibly avoid a trip to the namenode. This caching could be done in tservers and in the new bulk import code (#436).
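
A minimal sketch of the kind of cache this could be, assuming Guava's CacheBuilder and Hadoop's FileSystem.getFileStatus(); where the cache lives and how entries get invalidated would still need to be worked out.

import java.io.IOException;
import java.util.concurrent.ExecutionException;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class FileLengthCacheSketch {
  // RFiles are immutable once written, so a cached length stays valid
  // until the file is deleted.
  private final Cache<Path,Long> lengths =
      CacheBuilder.newBuilder().maximumSize(100_000).build();

  long getLength(FileSystem fs, Path file) throws IOException {
    try {
      // Only the first lookup for a file goes to the namenode.
      return lengths.get(file, () -> fs.getFileStatus(file).getLen());
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    }
  }
}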

Utilize NewTableConfiguration in ITs

There are places in our ITs that I think could use NewTableConfiguration when tables are created. This should help make the tests more stable. A minimal usage sketch follows the lists below.

  • MiniAccumuloClusterTest
  • LargeRowIT
  • TabletIT
  • SplitIT
  • RegexGroupBalanceIT
  • SessionDurabilityIT
  • ConfigurableCompactionIT
  • DurabilityIT

Not sure about the replication tests... what if you have to create both tables first, then set the table replication property?

  • ReplicationIT
  • UnorderedWorkAssignerReplicationIT
  • CyclicReplicationIT
  • KerberosReplicationIT
  • MultiInstanceReplicationIT
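
For reference, a minimal usage sketch with the existing NewTableConfiguration API (the property shown is just an example):

import java.util.Collections;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.admin.NewTableConfiguration;

class CreateTableInTestSketch {
  static void createTestTable(Connector conn, String tableName) throws Exception {
    // Setting properties at creation time avoids the race where a test
    // starts using the table before a separately-set property takes effect.
    NewTableConfiguration ntc = new NewTableConfiguration()
        .setProperties(Collections.singletonMap("table.file.replication", "1"));
    conn.tableOperations().create(tableName, ntc);
  }
}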

LogSorter InputStream closes prematurely

Some time ago, LogSorter was modified to handle partial headers, and the FSDataInputStream in sort() was moved to a try-with-resources. It appears this change caused the FSDataInputStream to close prematurely, before the finally can call the close() method. The close method gets bytesCopied before closing the input, so this probably results in bytesCopied in the Monitor always being -1. I think this happens every time a WAL is sorted on recovery, producing this error:

2018-05-23 14:19:55,298 [log.LogSorter] ERROR: Error during cleanup sort/copy 2841a75d-8086-4fdb-a736-5f0bf60ff42e
java.io.IOException: Stream is closed!
        at org.apache.hadoop.fs.BufferedFSInputStream.getPos(BufferedFSInputStream.java:56)
        at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:72)
        at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.close(LogSorter.java:198)
        at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.sort(LogSorter.java:169)
        at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.process(LogSorter.java:96)
        at org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:109)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
        at java.lang.Thread.run(Thread.java:748)

We could move the FSDataInputStream back to a regular try/catch or refactor the sort method to close things properly.
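
A small self-contained sketch of the ordering problem (not the LogSorter code itself): in a try-with-resources statement the resource is closed before the finally block runs, so cleanup in the finally that reads the stream position finds it already closed.

import java.io.Closeable;

class CloseOrderingSketch {
  static class TrackingStream implements Closeable {
    boolean closed = false;
    long getPos() {
      if (closed)
        throw new IllegalStateException("Stream is closed!");
      return 42;
    }
    @Override
    public void close() {
      closed = true;
    }
  }

  public static void main(String[] args) {
    TrackingStream stream = new TrackingStream();
    try (TrackingStream in = stream) {
      // ... copy/sort work happens here ...
    } finally {
      // By the time this runs, in.close() has already executed, which is the
      // same ordering that makes LogProcessor.close() see "Stream is closed!".
      try {
        System.out.println("bytes copied: " + stream.getPos());
      } catch (IllegalStateException e) {
        System.out.println(e.getMessage());
      }
    }
  }
}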

Hadoop2 Metrics does not retry sending if metrics server is down

Original issue: https://issues.apache.org/jira/browse/ACCUMULO-4849

2018-03-16 11:01:14,726 [impl.MetricsSystemImpl] WARN : Error creating sink 'graphite'
org.apache.hadoop.metrics2.impl.MetricsConfigException: Error creating plugin: org.apache.hadoop.metrics2.sink.GraphiteSink
	at org.apache.hadoop.metrics2.impl.MetricsConfig.getPlugin(MetricsConfig.java:203)
	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.newSink(MetricsSystemImpl.java:529)
	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configureSinks(MetricsSystemImpl.java:501)
	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configure(MetricsSystemImpl.java:480)
	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:189)
	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:164)
	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54)
	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50)
	at org.apache.accumulo.server.metrics.MetricsSystemHelper$MetricsSystemHolder.<clinit>(MetricsSystemHelper.java:46)
	at org.apache.accumulo.server.metrics.MetricsSystemHelper.getInstance(MetricsSystemHelper.java:50)
	at org.apache.accumulo.tserver.metrics.TabletServerMetricsFactory.<init>(TabletServerMetricsFactory.java:45)
	at org.apache.accumulo.tserver.TabletServer.<init>(TabletServer.java:401)
	at org.apache.accumulo.tserver.TabletServer.main(TabletServer.java:3086)
	at org.apache.accumulo.tserver.TServerExecutable.execute(TServerExecutable.java:43)
	at org.apache.accumulo.start.Main.lambda$execKeyword$0(Main.java:122)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.metrics2.MetricsException: Error creating connection, localhost:2004
	at org.apache.hadoop.metrics2.sink.GraphiteSink$Graphite.connect(GraphiteSink.java:160)
	at org.apache.hadoop.metrics2.sink.GraphiteSink.init(GraphiteSink.java:64)
	at org.apache.hadoop.metrics2.impl.MetricsConfig.getPlugin(MetricsConfig.java:199)
	... 15 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at java.net.Socket.connect(Socket.java:538)
	at java.net.Socket.<init>(Socket.java:434)
	at java.net.Socket.<init>(Socket.java:211)
	at org.apache.hadoop.metrics2.sink.GraphiteSink$Graphite.connect(GraphiteSink.java:152)
	... 17 more

Monitor 2.0 Bulk Import State is funky

Follow-on for #436 and for the new Monitor. The Monitor previously displayed (or attempted to display) the state of bulk import processes. Hopefully, for Bulk Import 2.0, this won't be needed; if so, the bulk import page could be removed in Monitor 2.0. Otherwise, we need to determine how and what to monitor for the new bulk import.
