Comments (7)

singhravidutt commented on June 15, 2024

It internally calls getItemInfo on objects. Is it possible for itemInfo for existing objects to come back as NULL?
No (unless there is a glitch from GCS).

Alternatively, if overwrite is set to true, should the GCS connector send write generation zero in any case?
Can the condition be changed to something like this?

if (!info.exists()) {
  return overwrite ? 1 : 0;
}

It is important to send 0 because that ensures the object is finalized if and only if no version of that object exists yet (a version can appear if another write was performed and completed in the meantime).
I will try to explain it with an example.

Let's assume this is the timeline of two write requests:
Request 1: (T0)|----------------------|(T3)
Request 2:      (T1)|--------|(T2)

Here T0 < T1 < T2 < T3.

At time T0 there is a request to write object1. The generation for it is 0, so Request 1 carries ifGenerationMatch=0 in its write request.

At time T1 there is another request to write object1. As the object does not exist yet, we attach ifGenerationMatch=0 to Request 2 as well.

At time T2 Request 2 is finalized. object1 now exists with a generation > 0.

At time T3 Request 1 tries to finalize and gets an exception because the generation of object1 is != 0: a version of object1 already exists.
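
The timeline above can be replayed with a small in-memory model. This is only a sketch under stated assumptions: ToyObjectStore, finalizeWrite and writeGeneration are hypothetical names made up for illustration and are not the connector's or the GCS client's API. The overwrite branch (reuse the live generation when the object already exists) is likewise an assumption about how an overwrite would typically be guarded, not something stated in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory stand-in for a bucket. Hypothetical; NOT the GCS client API. */
class ToyObjectStore {
  private final Map<String, Long> generations = new HashMap<>();
  private long nextGeneration = 1;

  /** Returns the live generation of the object, or 0 if it does not exist. */
  synchronized long currentGeneration(String name) {
    return generations.getOrDefault(name, 0L);
  }

  /**
   * Finalizes a write only if the live generation equals ifGenerationMatch.
   * Returns false when the precondition fails (real GCS answers with HTTP 412).
   */
  synchronized boolean finalizeWrite(String name, long ifGenerationMatch) {
    if (currentGeneration(name) != ifGenerationMatch) {
      return false;
    }
    generations.put(name, nextGeneration++);
    return true;
  }
}

class GenerationMatchDemo {
  /** Hypothetical helper: picks the generation to send as the write precondition. */
  static long writeGeneration(ToyObjectStore store, String name, boolean overwrite) {
    long live = store.currentGeneration(name);
    if (live == 0) {
      // Object absent: the write may finalize only if it is still absent by then.
      return 0;
    }
    if (overwrite) {
      // Assumption: an overwrite is guarded by the generation that was observed.
      return live;
    }
    throw new IllegalStateException("Object " + name + " already exists");
  }

  public static void main(String[] args) {
    ToyObjectStore store = new ToyObjectStore();
    long gen1 = writeGeneration(store, "object1", true); // T0: Request 1 starts
    long gen2 = writeGeneration(store, "object1", true); // T1: Request 2 starts
    boolean second = store.finalizeWrite("object1", gen2); // T2: Request 2 finalizes
    boolean first = store.finalizeWrite("object1", gen1); // T3: Request 1 finalizes
    System.out.println("Request 2 finalized: " + second); // true
    System.out.println("Request 1 finalized: " + first); // false
  }
}
```

Both requests observe a missing object and send 0. Whichever finalizes first creates a live generation, so the later finalize fails its precondition instead of silently replacing the newer data. Returning 1 when overwrite is true, as proposed above, would send a generation that a missing object cannot match.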

from hadoop-connectors.

mudit-97 commented on June 15, 2024

@singhravidutt, thanks for the explanation.
I have two follow-up questions on this:

  1. In what cases can GCS glitch and return it as NULL? Are there any known cases?
  2. If itemInfo is returned as NULL, can the connector code have configurable retry support, where getItemInfo is retried a configurable number of times to check this?

singhravidutt commented on June 15, 2024

By "NULL" we mean the item does not exist.

  1. In what cases can GCS glitch and return it as NULL? Are there any known cases?

I haven't come across a scenario where the item existed but GCS reported otherwise.

  2. If itemInfo is returned as NULL, can the connector code have configurable retry support, where getItemInfo is retried a configurable number of times to check this?

We have logic in place to handle the case where GCS throws an exception. The library doesn't report itemInfo as NULL when GCS throws an exception (obviously except for FileNotFoundException).
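
A minimal sketch of that behavior, assuming a lookup where only a not-found error is turned into a "does not exist" result and every other failure is surfaced to the caller. ToyItemInfo, ToyMetadataSource and ItemInfoLookup are hypothetical names used only for illustration; the connector's real classes and its retry handling are not shown here.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical minimal metadata holder; not the connector's real class. */
final class ToyItemInfo {
  static final ToyItemInfo NOT_FOUND = new ToyItemInfo(false, 0L);

  private final boolean exists;
  private final long generation;

  ToyItemInfo(boolean exists, long generation) {
    this.exists = exists;
    this.generation = generation;
  }

  boolean exists() {
    return exists;
  }

  long generation() {
    return generation;
  }
}

/** Hypothetical stand-in for the call that fetches metadata from the backend. */
interface ToyMetadataSource {
  ToyItemInfo fetch(String name) throws IOException;
}

class ItemInfoLookup {
  /**
   * Returns NOT_FOUND only when the backend explicitly says the object is missing.
   * Any other IOException propagates, so a transient failure is never mistaken
   * for "the object does not exist".
   */
  static ToyItemInfo getItemInfo(ToyMetadataSource source, String name) throws IOException {
    try {
      return source.fetch(name);
    } catch (FileNotFoundException e) {
      return ToyItemInfo.NOT_FOUND;
    }
  }
}
```

With this shape, a caller that derives the write generation from exists() sends 0 only after an explicit not-found answer. A timeout or server error fails the call rather than being read as "no object".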

mudit-97 commented on June 15, 2024

@singhravidutt, we have seen cases where the item always exists but the generationMatch value is still passed as 0. In what other scenarios can this happen? We also created a support ticket, and GCP support confirmed they saw some Object Not Found exceptions intermittently, although the object always exists.

jayadeep-jayaraman commented on June 15, 2024

Hi @mudit-97, are you writing concurrently to the same location via multiple jobs, or are you using the spark.streaming.concurrentJobs parameter?

mudit-97 commented on June 15, 2024

@jayadeep-jayaraman, yes, we are using the spark.streaming.concurrentJobs parameter.

jayadeep-jayaraman commented on June 15, 2024

Please check SPARK-21065. spark.streaming.concurrentJobs is not recommended for use in production systems and has been removed in newer versions of Spark.
