Comments (15)

anoopsjohn commented on September 28, 2024

There is an open JIRA in HBase to support cross-region transactions for regions on the same RS. Once that is done, this should be rewritten to use it.

from hindex.

chrajeshbabu commented on September 28, 2024

How batchMutateForIndex differs from batchMutate:

  1. No check for resources (the blocking memstore size check) - since we already check the blocking memstore size initially in postStartRegionOperation, having this check should not cause any problem.
  2. No start/closeRegionOperations calls - with these calls we just acquire the lock and release it, so having them should not cause any problems either.

So I am thinking we can use batchMutate directly instead of having a separate method in the kernel. What do you think?

anoopsjohn commented on September 28, 2024

No, there were issues with having a resource check: it was causing a deadlock. I have forgotten the full context now. Ram can tell.

ramkrish86 commented on September 28, 2024

It was something related to a flush and puts happening for the index region. During puts and flush the resource check would be done and it would try to acquire a lock.
If you can find that deadlock issue, we have a supporting test case to prove it. If you revert the change and run that test case it leads to a deadlock, and with the fix the deadlock is avoided. I can check it, but it may take some time.

ramkrish86 commented on September 28, 2024

"If you can find out that deadlock issue"

I mean the JIRA for that deadlock issue.

chrajeshbabu commented on September 28, 2024

I will provide the details here.

chrajeshbabu commented on September 28, 2024

Description of the deadlock issue (won't this be solved by the postStart/CloseRegionOperations hooks?):

A flush of the main table region is going on. As part of the flush we expect the MVCC to be completed.

  // wait for all in-progress transactions to commit to HLog before
  // we can start the flush. This prevents
  // uncommitted transactions from being written into HFiles.
  // We have to block before we start the flush, otherwise keys that
  // were removed via a rollbackMemstore could be written to Hfiles.
  mvcc.waitForRead(w);

At the same time the blocking memstore size is reached, so none of the puts to the index table or the main table get through. Now, with the blocking memstore size reached, the put path runs this:

  // hook to complete the actual put
  if (coprocessorHost != null) {
    List<Mutation> mutations = new ArrayList<Mutation>();
    for (int i = firstIndex; i < lastIndexExclusive; i++) {
      // only for successful puts
      if (batchOp.retCodeDetails[i].getOperationStatusCode() != OperationStatusCode.SUCCESS) {
        continue;
      }
      Mutation m = batchOp.operations[i].getFirst();
      mutations.add(m);
    }
    coprocessorHost.postBatchMutate(mutations, walEdit);
  }

  // ------------------------------------------------------------------
  // STEP 8. Advance mvcc. This will make this put visible to scanners and getters.
  // ------------------------------------------------------------------
  if (w != null) {
    mvcc.completeMemstoreInsert(w);
    w = null;
  }

We call completeMemstoreInsert only after postBatchMutate, and the index put issued from postBatchMutate is itself blocked by the memstore size check. So the flush that was waiting for the MVCC to advance just hangs. This leads to a deadlock.
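The circular wait described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not HBase code: the class name `FlushPutDeadlockSketch`, the `run` method, and both latches are hypothetical stand-ins. One latch models `mvcc.completeMemstoreInsert(w)` and the other models the memstore dropping below the blocking size after a flush. A timeout is used so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class FlushPutDeadlockSketch {

    /**
     * Returns true if both the flusher and the put thread finished,
     * false if they were still waiting on each other when the timeout expired.
     */
    static boolean run(boolean nestedPutChecksResources, long timeoutMs)
            throws InterruptedException {
        // Stand-in for mvcc.completeMemstoreInsert(w) on the main-table put.
        CountDownLatch mvccCompleted = new CountDownLatch(1);
        // Stand-in for the memstore size dropping below the blocking limit.
        CountDownLatch memstoreDrained = new CountDownLatch(1);
        CountDownLatch bothDone = new CountDownLatch(2);

        Thread flusher = new Thread(() -> {
            try {
                mvccCompleted.await();       // like mvcc.waitForRead(w)
                memstoreDrained.countDown(); // the flush frees memstore space
                bothDone.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread putter = new Thread(() -> {
            try {
                // We are "inside postBatchMutate", issuing the index put.
                if (nestedPutChecksResources) {
                    memstoreDrained.await(); // resource check blocks here
                }
                mvccCompleted.countDown();   // STEP 8 runs only after the hook returns
                bothDone.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        flusher.setDaemon(true);
        putter.setDaemon(true);
        flusher.start();
        putter.start();

        boolean finished = bothDone.await(timeoutMs, TimeUnit.MILLISECONDS);
        // Release any thread that is still stuck so the demo can exit cleanly.
        flusher.interrupt();
        putter.interrupt();
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("nested resource check, completed: " + run(true, 500));
        System.out.println("no nested resource check, completed: " + run(false, 500));
    }
}
```

With the nested resource check enabled, neither thread can make progress and `run` reports `false`; without it, the put completes, the MVCC advances, and the flush proceeds.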

chrajeshbabu commented on September 28, 2024

TestForComplexIssues#testHDP3015

ramkrish86 commented on September 28, 2024

@Rajesh
So you suggest that

"So I am thinking we can use batchMutate directly instead of having a separate method in the kernel. What do you think?"

doing this would solve the problem? I mean, my question is: you already knew this was the issue, and the solution you are proposing now will solve it?

ramkrish86 commented on September 28, 2024

@chrajeshbabu
Were you able to reproduce the issue with the test case?

chrajeshbabu commented on September 28, 2024

If we don't have the postStart/CloseRegionOperations hooks, this problem will occur. Presently there is no deadlock issue in the code.
But because of #26 the test case hangs. I corrected that and ran the test case, and it passes.
It also passes if we use batchMutate instead of batchMutateForIndex. That's why I am suggesting we use batchMutate only. Then we can avoid kernel changes.

ramkrish86 commented on September 28, 2024

I will check the code and get back on this.

chrajeshbabu commented on September 28, 2024

OK, Ram.

ramkrish86 commented on September 28, 2024

I can tell you the reason for the fix.
The table region puts have happened and we are in postBatchMutate.
Before the fix, postBatchMutate called batchMutate for the index, which would try to acquire the lock and would also check for resources.
By this time, if the blocking memstore size has been reached, the main table will be trying to flush, and the flush will wait for the MVCC to be completed (this is for the main puts).
So inside the batchMutate that is called for the index, it would wait for the memstore size to come down.
As completeMemstoreInsert comes after postBatchMutate, everything hangs.
If I am explaining what you already know, kindly excuse me.

So the idea of the fix is that, even before you start the mutation for the main and index regions, you acquire the lock and check for resources. This means that once you call batchMutate(), if the resources are available at that time, the current mutation on the index and main regions will not block.
So when postBatchMutate is called now, it still proceeds with the puts for the index region, and once that is done the flush also completes.
We may have more data in the memstore at that point, and the flush completion again makes room for new entries in the memstore.
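A rough illustration of this "check once, up front" idea follows. It is a sketch under assumptions, not the hindex or HBase implementation: `UpFrontResourceCheckSketch`, `Region`, `hasRoom`, and `applyUnchecked` are hypothetical names, and the sketch rejects a batch where the real resource check would block the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UpFrontResourceCheckSketch {

    /** Hypothetical stand-in for a region's memstore; not the HBase HRegion class. */
    static final class Region {
        final long blockingSize;
        long memstoreSize = 0;
        final List<String> rows = new ArrayList<>();

        Region(long blockingSize) {
            this.blockingSize = blockingSize;
        }

        boolean hasRoom() {
            return memstoreSize < blockingSize;
        }

        // Applies a row with no resource check, so the memstore may overshoot its limit.
        void applyUnchecked(String row) {
            rows.add(row);
            memstoreSize += row.length();
        }
    }

    /**
     * Checks resources for BOTH regions before touching either one.
     * Once the check passes, the main put and the nested index put run
     * without any further check, so the nested put can never wait on a flush.
     * Returns false where a real resource check would block the caller.
     */
    static boolean batchMutate(Region main, Region index, List<String> batch) {
        if (!main.hasRoom() || !index.hasRoom()) {
            return false;
        }
        for (String row : batch) {
            main.applyUnchecked(row);
            index.applyUnchecked("idx-" + row); // the "postBatchMutate" index put
        }
        return true;
    }

    public static void main(String[] args) {
        Region main = new Region(10);
        Region index = new Region(10);
        List<String> batch = Arrays.asList("aaaa", "bbbb", "cccc");

        System.out.println("first batch accepted: " + batchMutate(main, index, batch));
        System.out.println("second batch accepted: " + batchMutate(main, index, batch));
    }
}
```

The first batch is accepted and pushes both memstores past their limit, which matches the remark that there may be more data in the memstore by the time the flush completes. The second batch is turned away at the entry point, before any MVCC write would be open.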

anoopsjohn commented on September 28, 2024

The fix is needed. We should have a new batchMutate in HRegion which does not do any resource check or acquire any lock. (Is acquiring the lock again fine?)
This new method will be part of the JIRA raised to support cross-region transactions.
