GithubHelp home page GithubHelp logo

Comments (13)

manuelgdlvh avatar manuelgdlvh commented on June 18, 2024 1

Hi @Michael-A-McMahon , I did the lasts tests using the jvm flag that you said me and now its working properly. I will investigate the interaction between kubernetes - jvm because using the same image in docker works. All information found will be shared with you and community. Thanks for all!

from oracle-r2dbc.

manuelgdlvh avatar manuelgdlvh commented on June 18, 2024 1

Perhaps that is a bug related with the containerized environments and the JVM.
Link to Bug here: https://bugs.openjdk.org/browse/JDK-8279484

from oracle-r2dbc.

Michael-A-McMahon avatar Michael-A-McMahon commented on June 18, 2024

Hi @manuelgdlvh. Thanks for reporting this.

If no database session is created, then it sounds like the driver is stuck at some step when opening a connection. Would it be possible to share a thread dump, such as one from a jstack command? This might tell us where the driver has become stuck.

from oracle-r2dbc.

manuelgdlvh avatar manuelgdlvh commented on June 18, 2024

Hi @Michael-A-McMahon this is the thread dump: threaddump.txt
For all you need, im here!

from oracle-r2dbc.

Michael-A-McMahon avatar Michael-A-McMahon commented on June 18, 2024

Thanks for the thread dump. I can't pin point a root cause just yet, but we do have some information. I have some questions which I think will help us find a resolution:

Have you tried opening a plain JDBC connection in this environment? I wonder if Oracle R2DBC is sending a login request to the database, and simply not receiving a response? If this is the case, then we should be able to see the same behavior with Oracle JDBC.

Are there any errors appearing in logs or stderr? It may be the case that the connection is failing, and the Exception is getting dropped somewhere along the way.

Is it possible that -Djava.util.concurrent.ForkJoinPool.common.parallelism=0? If this system property is set, then Oracle R2DBC/JDBC will have no threads to execute asynchronous callbacks. The driver will submit a task to the common ForkJoinPool, which will then silently fail to execute it.

Below are just my notes about the thread dump:

A thread named "oracle.jdbc.driver.BlockSource.ThreadedCachingBlockSource.BlockReleaser" is present. This thread is used by Oracle JDBC to manage a memory pool for bind data and row data. It is only started when a connection is requested from the driver. So we know that some part of this application, possibly Oracle R2DBC, requested a java.sql.Connection from the Oracle JDBC Driver.

A thread named "oracle.net.nt.TcpMultiplexer" is present. This thread is used by Oracle JDBC to poll for network readiness. It is only started when a method of Oracle JDBC's Reactive Extensions is called. So we know that some part of this application, almost certainly Oracle R2DBC, called a reactive method of Oracle JDBC.

No thread appears to be stuck at oracle.jdbc.driver.RestrictedLock.lock(RestrictedLock.java:247). This was a symptom of deadlocks we saw in earlier releases of Oracle R2DBC. (#48 (comment)). We can rule this out as a root cause.

No thread having a name like "ForkJoinPool.commonPool-worker-" is present. By default, Oracle R2DBC/JDBC use the common ForkJoinPool to execute callbacks for reactive methods. If a callback has ever executed, then we should see at least one common FJP thread.
I did note that the JavaDoc of FJP states "... threads are slowly reclaimed during periods of non-use ...". So could this thread dump have occured after an period of non-use, when all common FJP threads had been stopped? This might explain why we don't see them in the dump.
By default, the number of common FJP threads appears to be processors minus 1. However, if there is just one processor, then there is 1 common FJP thread.
A non-default number of common FJP threads can be configured with -Djava.util.concurrent.ForkJoinPool.common.parallelism. If this system property is set to
0, then tasks submitted to the common FJP are never executed (at least never executed asynchronously).
Another possibility is that Oracle JDBC simply never received a network response from the database. If no response is received, then no async callback is executed, and no common FJP thread ever needs to be created.

from oracle-r2dbc.

manuelgdlvh avatar manuelgdlvh commented on June 18, 2024

Hi @Michael-A-McMahon , the most strange thing is that with jdbc inside k8s cluster works well and with the service dockerized and running outside cluster works with jdbc and r2dbc.

from oracle-r2dbc.

Michael-A-McMahon avatar Michael-A-McMahon commented on June 18, 2024

If Oracle JDBC can connect synchronously, then something specific to its async/reactive code must be causing the failure. Could be something about using java.nio.channels.Selector, or something about using the common ForkJoinPool.
As we don't see any FJP threads in the dump, I can think of two possibilities:

  1. The Selector never sees read-readiness on the socket, meaning that the database never responded to JDBC's connection request.
  2. The common FJP is configured to use 0 threads. I believe that can only happen if java.util.concurrent.ForkJoinPool.common.parallelism=0.

The first possibility seems more likely at this point, as the second would require a non-default system property setting which I think you've already checked for. We could completely rule out the second possibility by testing with:
-Djava.util.concurrent.ForkJoinPool.common.parallelism=1

As an experiment, we could also try updating the ojdbc11 dependency:

      <groupId>com.oracle.database.jdbc</groupId>
      <artifactId>ojdbc11</artifactId>
      <version>23.2.0.0</version>

Oracle R2DBC currently depends on the 21.7.0.0 build. But code for reactive extensions has been significantly overhauled in the 23 version. It's possible we would get a different result with 23.

*Edited the artifactId

from oracle-r2dbc.

Michael-A-McMahon avatar Michael-A-McMahon commented on June 18, 2024

That is interesting! I was convinced that FJP should have at least 1 thread, unless explicitly configured otherwise. But it sounds like you had to explicitly configure parallelism to be 1 or greater. Looking forward to learning more.

from oracle-r2dbc.

Michael-A-McMahon avatar Michael-A-McMahon commented on June 18, 2024

Fascinating. I think Oracle R2DBC could provide a safe guard for this situation, something rooted in this pseudo-ish code:

static final Executor DEFAULT_EXECUTOR;
static {
  int commonFjpParallelism = ForkJoinPool.commonPool().parallelism();
  if (commonFjpParallelism < 1)
    DEFAULT_EXECUTOR = Executors.newSingleThreadExecutor();
  else
    DEFAULT_EXECUTOR = ForkJoinPool.commonPool();
}

Let's leave this issue open so I can look further into a fix. This is a nasty bug, and I don't want other users to hit it.

from oracle-r2dbc.

Michael-A-McMahon avatar Michael-A-McMahon commented on June 18, 2024

Hi again @manuelgdlvh. Could you share the JDK version where this issue occurred?
I'm wondering if it may have been caused by https://bugs.openjdk.org/browse/JDK-8274349

from oracle-r2dbc.

manuelgdlvh avatar manuelgdlvh commented on June 18, 2024

Hi @Michael-A-McMahon,

The JDK version where the issue occurs for me is the one provided by the openjdk17-alpine Docker image.

image

In addition to the solution of adding the JVM flag for parallelism, I also tested setting the minimum CPU and RAM resources in the Kubernetes deployment, and it worked well too. In my test, I set the minimum CPU to 1 and RAM to 1GB.

I believe that when using only Docker, the JVM can effectively read the resources of the host machine. However, in Kubernetes, if we don't provide minimum resource settings, it may not be able to correctly determine the available machine resources.

Anything I can help you with, feel free to contact me.

from oracle-r2dbc.

Michael-A-McMahon avatar Michael-A-McMahon commented on June 18, 2024

Hi @manuelgdlvh. Very sorry I left you hanging on this. I got wrapped up in preparations for CloudWorld 2023.

Your JDK version matches that of the afflicted version in https://bugs.openjdk.org/browse/JDK-8274349

If you update to the latest JDK 17, I think you'll find the issue resolves itself without having to set "java.util.concurrent.ForkJoinPool.common.parallelism", and without having to configure resources of Kubernetes. I could be wrong about this, so it would be good if you have a chance to test and verify.

At this point, I think the only case Oracle R2DBC needs to check for is "java.util.concurrent.ForkJoinPool.common.parallelism=0". That should be the only way we get an FJP with zero threads.
The other way, which I think you had, was a bug in the JDK. If the container has one CPU core, the common FJP would incorrectly configure itself with zero threads. The correct behavior, which is present in later builds of JDK 17, is to have 1 thread.

from oracle-r2dbc.

volcanic-tephra avatar volcanic-tephra commented on June 18, 2024

I've just spent days troubleshooting an AWS Fargate deployment where the request to get a connection would hang. This was because the ForkJoinPool.commonPool (default) had no free threads.

To work around this I supplied the connection factory with a custom executor:

    	var executor = Executors.newFixedThreadPool(threads);
    	
        var factory = ConnectionFactories.get(
                ConnectionFactoryOptions.builder()
                  .option(OracleR2dbcOptions.EXECUTOR, executor)
                  .option(ConnectionFactoryOptions.DRIVER, "pool")
                  .option(ConnectionFactoryOptions.PROTOCOL, "oracle")
                  .option(ConnectionFactoryOptions.HOST, properties.getHost())
                  .option(ConnectionFactoryOptions.PORT, properties.getPort())
                  .option(ConnectionFactoryOptions.DATABASE, properties.getDatabase())
                  .option(ConnectionFactoryOptions.USER, properties.getUser())
                  .option(ConnectionFactoryOptions.CONNECT_TIMEOUT, properties.getConnectionTimeout() == null 
                  													? Duration.ofSeconds(10) : properties.getConnectionTimeout())
                  .option(ConnectionFactoryOptions.STATEMENT_TIMEOUT, properties.getStatementTimeout() == null 
					? Duration.ofSeconds(10) : properties.getStatementTimeout())                  
                  .option(ConnectionFactoryOptions.PASSWORD, properties.getPassword())
                  .build());

I raised this issue in #136

from oracle-r2dbc.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.