palantir / hadoop

This project is a fork of apache/hadoop.

Repository size: 470.08 MB

Mirror of Apache Hadoop

License: Apache License 2.0

Languages: Shell 0.48%, Python 0.07%, XSLT 0.02%, Java 92.65%, HTML 0.17%, CSS 0.07%, CMake 0.12%, Batchfile 0.08%, C 1.83%, C++ 2.99%, JavaScript 1.22%, TeX 0.02%, TLA 0.02%, Dockerfile 0.01%, TSQL 0.02%, SCSS 0.03%, Handlebars 0.21%

hadoop's Introduction

## Archival

This repository has been unused since April 2021 and is archived.

## Original README

For the latest information about Hadoop, please visit our website at:

   http://hadoop.apache.org/

and our wiki, at:

   http://wiki.apache.org/hadoop/

This distribution includes cryptographic software.  The country in 
which you currently reside may have restrictions on the import, 
possession, use, and/or re-export to another country, of 
encryption software.  BEFORE using any encryption software, please 
check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to 
see if this is permitted.  See <http://www.wassenaar.org/> for more
information.

The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity 
Control Number (ECCN) 5D002.C.1, which includes information security
software using or performing cryptographic functions with asymmetric
algorithms.  The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception
ENC Technology Software Unrestricted (TSU) exception (see the BIS 
Export Administration Regulations, Section 740.13) for both object 
code and source code.

The following provides more details on the included cryptographic
software:
  Hadoop Core uses the SSL libraries from the Jetty project written 
by mortbay.org.

hadoop's People

Contributors

aajisaka, acmurthy, anuengineer, arp7, atm, aw-was-here, cmccabe, cnauroth, elicollins, jian-he, jing9, jlowe, junpingdu, kambatla, kihwal, oza, revans2, rohithsharmaks, steveloughran, sunilgovind, szetszwo, toddlipcon, tomwhite, umbrant, vinayakumarb, vinoduec, wangdatan, xiao-chen, xiaoyuyao, yangwwei


hadoop's Issues

actually run hadoop-aws tests

The tests for hadoop-aws require an s3 bucket and credentials to actually run: https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure.

So even though we're running the hadoop-aws tests as part of hadoop-tools, a passing CI build doesn't actually mean the code works.

Ideally we would run the hadoop-aws tests against a bucket in an automated way after every commit. Worst case, someone runs them manually whenever we change s3a and before each release.
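The S3A integration tests look for an auth-keys.xml file on the test classpath and skip themselves when it is absent, which is why they silently pass today. A hedged sketch of enabling them (the bucket name and credential values are placeholders; the file path follows the standard hadoop-aws test layout):

```shell
# Write the auth-keys.xml that the hadoop-aws tests require.
# Bucket and credentials below are placeholders, not real values.
AWS_TEST_CONF="hadoop-tools/hadoop-aws/src/test/resources/auth-keys.xml"
mkdir -p "$(dirname "$AWS_TEST_CONF")"
cat > "$AWS_TEST_CONF" <<'EOF'
<configuration>
  <!-- Bucket the S3A tests run against (placeholder) -->
  <property>
    <name>test.fs.s3a.name</name>
    <value>s3a://example-test-bucket/</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>REPLACE_ME</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>REPLACE_ME</value>
  </property>
</configuration>
EOF

# With the file in place, the module's tests stop being skipped:
#   mvn -pl hadoop-tools/hadoop-aws test
echo "wrote $AWS_TEST_CONF"
```

For automated runs, the credentials would come from CI secrets rather than a checked-in file, since auth-keys.xml must never be committed.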

speed up tests

Even a 4-hour timeout wasn't enough, and https://circleci.com/gh/palantir/hadoop/120 has been running for almost seven hours now.

Some ideas to experiment with:

  • Test the different modules in parallel: have n CircleCI containers, each of which starts by compiling all the code without running any tests, and then runs the tests for only a couple of modules.
  • Skip tests for modules we don't care about, e.g., Azure support and the BookKeeper implementation of shared edit logs.

cc @ash211 @robert3005 in case you guys have more ideas
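The parallelization idea above can be sketched as a round-robin split over CircleCI's built-in node variables. This is a hedged sketch, not the fork's actual CI config: the module list is abbreviated, and the mvn commands are left commented out.

```shell
# Sketch: split Hadoop's test modules across N CircleCI containers.
# Module list is abbreviated and illustrative.
MODULES="hadoop-common-project/hadoop-common
hadoop-hdfs-project/hadoop-hdfs
hadoop-yarn-project
hadoop-mapreduce-project
hadoop-tools"

# Print the modules assigned to container $2 out of $1 total, round-robin.
modules_for_node() {
  total="$1"; index="$2"; i=0
  echo "$MODULES" | while read -r m; do
    if [ $(( i % total )) -eq "$index" ]; then echo "$m"; fi
    i=$(( i + 1 ))
  done
}

# Each container compiles everything once without tests:
#   mvn -q install -DskipTests
# and then runs tests only for its own slice:
for m in $(modules_for_node "${CIRCLE_NODE_TOTAL:-1}" "${CIRCLE_NODE_INDEX:-0}"); do
  echo "would test: $m"
  # mvn -pl "$m" test
done
```

CircleCI sets CIRCLE_NODE_INDEX and CIRCLE_NODE_TOTAL automatically when parallelism is configured, so the same script runs unchanged on every container.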

eventually stop ignoring these tests

Tracking the tests we're ignoring that I'd prefer not to ignore. This does not include the s3n tests, which are quite intentionally ignored, nor the tests that are ignored because we revert HADOOP-13188.

A complete list can be obtained by grepping the repo for @Ignore //palantir-hadoop and then excluding the s3n tests and the results of reverting HADOOP-13188.
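The grep described above might look like the following. This is a self-contained demo: the two sample test files are fabricated for illustration, and the s3n filter is a simple pattern match rather than the exact exclusion logic.

```shell
# Fabricated sample files standing in for the real repo:
tmp="$(mktemp -d)"
cat > "$tmp/TestFoo.java" <<'EOF'
@Ignore //palantir-hadoop flaky timing
public void testFoo() {}
EOF
cat > "$tmp/TestS3NSomething.java" <<'EOF'
@Ignore //palantir-hadoop s3n intentionally unsupported
public void testS3n() {}
EOF

# Find every ignore marker, then drop the intentionally-ignored s3n tests:
grep -rn '@Ignore //palantir-hadoop' --include='*.java' "$tmp" | grep -vi 's3n'
```

Run against the real checkout, "$tmp" would be the repo root, and the HADOOP-13188 reverts would need to be excluded by hand since there is no single pattern for them.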

Tracking test failures on branch-2.8.2

From https://circleci.com/gh/palantir/hadoop/219:

Common

Tests in error: 
  TestZKFailoverController.testGracefulFailoverFailBecomingStandby:515 » ServiceFailed

MR

Tests in error: 
  TestMRTimelineEventHandling.testMapreduceJobTimelineServiceEnabled:174 » IO Jo...

Tools

Failed tests: 
  TestSLSRunner.testSimulatorRunning:60 TestSLSRunner catched exception from child thread (TaskRunner.Task): null
Failed tests: 
  TestIntegration.testUpdateGlobTargetMissingSingleLevel:431->checkResult:577 expected:<4> but was:<5>
  TestIntegration.testGlobTargetMissingMultiLevel:454->checkResult:577 expected:<4> but was:<5>
  TestIntegration.testGlobTargetMissingSingleLevel:408->checkResult:577 expected:<2> but was:<3>
  TestIntegration.testUpdateGlobTargetMissingMultiLevel:478->checkResult:577 expected:<6> but was:<8>
  TestIntegration.testUpdateGlobTargetMissingSingleLevel:431->checkResult:577 expected:<4> but was:<5>
  TestIntegration.testGlobTargetMissingMultiLevel:454->checkResult:577 expected:<4> but was:<5>
  TestIntegration.testGlobTargetMissingSingleLevel:408->checkResult:577 expected:<2> but was:<3>
  TestIntegration.testUpdateGlobTargetMissingMultiLevel:478->checkResult:577 expected:<6> but was:<8>
  TestIntegration.testUpdateGlobTargetMissingSingleLevel:431->checkResult:577 expected:<4> but was:<5>
  TestIntegration.testGlobTargetMissingMultiLevel:454->checkResult:577 expected:<4> but was:<5>
  TestIntegration.testGlobTargetMissingSingleLevel:408->checkResult:577 expected:<2> but was:<3>
  TestIntegration.testUpdateGlobTargetMissingMultiLevel:478->checkResult:577 expected:<6> but was:<8>
  TestDistCpViewFs.testUpdateGlobTargetMissingSingleLevel:326->checkResult:428 expected:<4> but was:<5>
  TestDistCpViewFs.testGlobTargetMissingMultiLevel:346->checkResult:428 expected:<4> but was:<5>
  TestDistCpViewFs.testGlobTargetMissingSingleLevel:306->checkResult:428 expected:<2> but was:<3>
  TestDistCpViewFs.testUpdateGlobTargetMissingMultiLevel:367->checkResult:428 expected:<6> but was:<8>

YARN

Tests in error: 
  TestWebAppProxyServlet.testAppReportForEmptyTrackingUrl:235 »  test timed out ...
Failed tests: 
  TestAbstractYarnScheduler.testResourceRequestRecoveryToTheRightAppAttempt:707 Attempt state is not correct (timedout): expected: SCHEDULED actual: ALLOCATED for the application attempt appattempt_1505097515756_0001_000002
  TestCapacitySchedulerSurgicalPreemption.testSurgicalPreemptionWithAvailableResource:222 expected:<3> but was:<1>
Failed tests: 
  TestAMRMClient.testAMRMClientWithContainerResourceChange:813->doContainerResourceChange:927 expected:<1> but was:<0>
Failed tests: 
  TestDistributedShell.testDSRestartWithPreviousRunningContainers:481 null
  TestDistributedShell.testDSShellWithCustomLogPropertyFile:615->verifyContainerLog:1000 null

Tests in error: 
  TestDistributedShell.testDSShellWithoutDomainV1_5:236->testDSShell:324->Object.wait:-2 » 

HDFS

Failed tests: 
  TestNameNodeMetadataConsistency.testGenerationStampInFuture:127 expected:<18> but was:<0>
  TestUpgradeDomainBlockPlacementPolicy.testPlacement:203 null

Tests in error: 
  TestFSImage.testCompression:71->setCompressCodec:77->testPersistHelper:83 » IO

stop ignoring test failures during CI

We should do #5 and stop telling Maven to never fail. Ignoring test failures is obviously bad practice, and on every CI run someone has to manually inspect the (ignored) failures to make sure none of them look real, which is annoying.
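The change would presumably look something like the following; the exact flag currently in the CI config is an assumption, since both the Surefire property and the Maven -fn flag can produce the "never fail" behavior.

```
# Before (assumed): test failures never fail the build, so CI is always green.
#   mvn test -Dmaven.test.failure.ignore=true
# After: let test failures fail the build, but keep building remaining
# modules so one failing module doesn't hide failures in later ones.
mvn test --fail-at-end
```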

Can we publish sources?

Internally we depend on jars from this fork of Hadoop. It's annoying that no source jars are published alongside them!
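One hedged way to do this is with the standard maven-source-plugin; the goal name comes from that plugin, but wiring it into this repo's particular release process is an assumption.

```
# Attach a -sources.jar to every published artifact during the release:
mvn source:jar-no-fork deploy -DskipTests
```

The jar-no-fork goal is typically bound to the package phase in a release profile so sources are attached on every deploy rather than ad hoc.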

fix or ignore failing tests

Tracking tests that we've seen fail in what appears to be a spurious manner. If a test fails repeatedly, we'll ignore it.

  • HDFS
Failed tests: 
  TestMissingBlocksAlert.testMissingBlocksAlert:119 expected:<2> but was:<4>
Failed tests: 
  TestWrites.testOverlappingWrites:717->waitWrite:457 Write can't finish.
  • YARN
Tests in error: 
  TestWebAppProxyServlet.testAppReportForEmptyTrackingUrl:235 »  test timed out ...
Results :

Failed tests: 
  TestCapacitySchedulerLazyPreemption.testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers:410->waitKillableContainersSize:636 expected:<1> but was:<0>
  TestCapacitySchedulerSurgicalPreemption.testSurgicalPreemptionWithAvailableResource:220 expected:<3> but was:<2>

Tests in error: 
  TestRMWebServices.testDumpingSchedulerLogs:711 » YarnRuntime Appender is alrea...

Tests run: 1504, Failures: 5, Errors: 2, Skipped: 3
Failed tests: 
  TestDistributedShell.testDSShellWithDomain:225->testDSShell:385 expected:<2> but was:<3>

Tests in error: 
  TestDistributedShell.testDSShellWithoutDomain:230->testDSShell:324->Object.wait:-2 » 
Failed tests: 
  TestDistributedShell.testDSShellWithoutDomain:230->testDSShell:385 expected:<2> but was:<3>
  TestDistributedShell.testDSShellWithDomain:225->testDSShell:385 expected:<2> but was:<3>
  TestDistributedShell.testDSRestartWithPreviousRunningContainers:481 null
  TestDistributedShell.testDSShellWithoutDomainV1_5:236->testDSShell:385 expected:<2> but was:<0>

Tests in error: 
  TestDistributedShell.testDSShellWithMultipleArgs:682 »  test timed out after 9...
Failed tests: 
  TestAbstractYarnScheduler.testResourceRequestRecoveryToTheRightAppAttempt:707 Attempt state is not correct (timedout): expected: SCHEDULED actual: ALLOCATED for the application attempt appattempt_1500422137286_0001_000002
  TestCapacityScheduler.testAMLimitUsage:3170->verifyAMLimitForLeafQueue:3308 app shouldn't be null
  TestWorkPreservingRMRestart.testCapacitySchedulerRecovery:679->checkCSLeafQueue:447 expected:<<memory:2048, vCores:2>> but was:<<memory:0, vCores:0>>

Tests in error: 
  TestDelegationTokenRenewer.testCancelWithMultipleAppSubmissions:1255 »  test t...
  TestAMRestart.testAMRestartNotLostContainerCompleteMsg:774 »  test timed out a...
Failed tests: 
  TestAMRMClient.testAMRMClientWithContainerResourceChange:813->doContainerResourceChange:927 expected:<1> but was:<0>

None of these is particularly worrisome given the intent to run this only on clients, but we should still fix or ignore the broken tests.

backport HDFS-11538

Also revert HDFS-11431.

If we do this, then Spark and all other consumers who depend on hadoop-client will only get hadoop-hdfs-client, and not all of hadoop-hdfs. (This is a good thing.)
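For downstream builds the practical effect would look something like this Maven fragment (the version here is illustrative, borrowed from the release naming used elsewhere in this tracker):

```xml
<!-- Illustrative: after the backport, this dependency would transitively
     bring in the slim hadoop-hdfs-client rather than all of hadoop-hdfs. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.8.0-palantir3</version>
</dependency>
```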

release 2.8.0-palantir3

  • Fixes dist names so they can be resolved

  • Catches up to latest RC (and probable actual release) of upstream 2.8.0

    • Includes the fix for HDFS-11431, so Spark can stop depending on hadoop-hdfs
