
Comments (7)

srib commented on September 17, 2024

Thanks for reporting this issue, @mstormo!

It looks like you are running on Windows.

This Stack Overflow question recommends running the snippet inside an if __name__ == "__main__" guard. Can you please run your code inside that guard and see if it works?
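For readers unfamiliar with the guard, here is a minimal sketch of the pattern using only the standard library. The names make_inputs and main are hypothetical stand-ins and math.sqrt stands in for real work; in the reported case the body of main() would hold the call that triggers the ingestion.

```python
import math
import multiprocessing as mp


def make_inputs(count):
    """Build some stand-in input values (hypothetical; not part of mdio)."""
    return [float(i * i) for i in range(count)]


def main():
    # Creating the pool starts worker processes. On Windows each worker is a
    # fresh interpreter that imports this file again, so this call must not
    # run at import time.
    with mp.Pool(processes=2) as pool:
        roots = pool.map(math.sqrt, make_inputs(5))
    print(roots)


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when a worker
    # process re-imports it.
    main()
```

Without the guard, each worker would reach the pool creation again while importing the file, which is the kind of repeated process start-up described in this thread.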

from mdio-python.

mstormo commented on September 17, 2024

This does work around the issue.

However, it's a highly unexpected requirement, given that my own code does not use any multiprocessing. An imported module forcing this requirement may have significant implications in non-trivial systems, for example when the code runs inside another module where you cannot enforce this kind of isolated entry point.

The Python process was forked over 30 times for this simple example.

Please note that Ctrl+C will not abort the process, since the other forked processes continue running in the background. So the workaround suggested above only hides a larger issue.

  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\site-packages\segyio\trace.py", line 50, in wrapindex
    if not 0 <= i < len(self):
Traceback (most recent call last):
  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\site-packages\mdio\segy\_workers.py", line 271, in trace_worker_map
    return trace_worker(*args)
  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\site-packages\mdio\segy\_workers.py", line 182, in trace_worker
    data_array.set_basic_selection(
  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\site-packages\zarr\core.py", line 1448, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\site-packages\zarr\core.py", line 1748, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\site-packages\zarr\core.py", line 1820, in _set_selection
    self._chunk_setitems(lchunk_coords, lchunk_selection, chunk_values,
  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\site-packages\zarr\core.py", line 2018, in _chunk_setitems
    to_store = {k: self._encode_chunk(v) for k, v in cdatas.items()}
  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\site-packages\zarr\core.py", line 2018, in <dictcomp>
    to_store = {k: self._encode_chunk(v) for k, v in cdatas.items()}
  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\site-packages\zarr\core.py", line 2194, in _encode_chunk
    cdata = self._compressor.encode(chunk)
  File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\site-packages\numcodecs\zfpy.py", line 70, in encode
    return _zfpy.compress_numpy(
KeyboardInterrupt
Ingesting SEG-Y in 24 chunks:  38%|█████████████                           | 9/24 [00:23<00:11,  1.28block/s]


mstormo commented on September 17, 2024

Correction: Ctrl+C kills the current batch of Python processes and forks a new set, which then just hangs at 0% CPU usage.
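As an illustration of how a caller can keep control of a worker pool on Ctrl+C, here is a minimal sketch using the standard library. The helper map_interruptibly is hypothetical and is not how mdio handles its workers; it is a common pattern, not a confirmed fix for the behavior reported here.

```python
import math
import multiprocessing as mp


def map_interruptibly(pool, func, items, poll_seconds=1.0):
    """Map func over items, tearing the pool down on Ctrl+C or any error."""
    try:
        pending = pool.map_async(func, items)
        # Poll with a timeout so the main process wakes up regularly and can
        # react to KeyboardInterrupt instead of blocking indefinitely.
        while not pending.ready():
            pending.wait(poll_seconds)
        results = pending.get()
    except BaseException:
        # Stop the workers immediately instead of leaving them running.
        pool.terminate()
        raise
    else:
        # Normal completion: no more work will be submitted.
        pool.close()
        return results
    finally:
        # Wait for the worker processes to exit in both cases.
        pool.join()


if __name__ == "__main__":
    worker_pool = mp.Pool(processes=2)
    print(map_interruptibly(worker_pool, math.sqrt, [1.0, 4.0, 9.0]))
```

The pool is passed in rather than created inside the helper so that the caller decides where worker processes are started.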


srib commented on September 17, 2024

Thank you for your comments, @mstormo.

I don't think the if __name__ == "__main__" guard is unexpected. The multiprocessing documentation recommends it as part of its programming guidelines.
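Whether the guard is strictly required depends on the multiprocessing start method. A small sketch for checking this on a given machine follows; the helper needs_main_guard is a hypothetical name, and the rule it encodes follows the "spawn and forkserver" section of those programming guidelines.

```python
import multiprocessing as mp


def needs_main_guard(start_method):
    """Return True if workers import the main module again on start-up.

    With "spawn" (the only method on Windows) and "forkserver", each worker
    imports the main module, so unguarded top-level code runs again.
    With "fork" the worker is a copy of the parent and does not re-import.
    """
    return start_method in ("spawn", "forkserver")


if __name__ == "__main__":
    method = mp.get_start_method()
    print("available:", mp.get_all_start_methods())
    print("default:", method)
    print("guard required:", needs_main_guard(method))
```

Even where the default is "fork", keeping the guard makes a script portable to platforms and Python versions that use another default.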

Importing segy_to_mdio from mdio by default uses workers from https://github.com/TGSAI/mdio-python/blob/main/src/mdio/converters/segy.py#L19 which in turn imports multiprocessing.

The alternatives are as follows:

  1. Run as a single process on Windows, which is less desirable.
  2. Allow multiprocessing as an option on Windows.
  3. Specify the usage of the guard in the documentation.

What do you suggest as a fix? Happy to accept a PR.


mstormo commented on September 17, 2024

The "unexpected" statement referred to MDIO as a "black box" for the end-user: the documentation and examples make no mention of the required guard statement, while explicitly mentioning Dask for the purpose of parallel distributed processing.

Note that my example here was a trivial reproducible example just to illustrate the problem. In my own case, the implementation was in a plugin in a larger system, where the main system started several executions as a result. It was rather convoluted to understand what was going on, given that I was not using subprocesses or the multiprocessing library anywhere in my own code.

As such, it's unexpected for the end-user, while not a surprise to you as the implementor, of course.

Given that the process forking makes the end-user lose control of the terminal (hangs on six idle sub-processes of Python), I think the only option is to run it as a single process by default on Windows, with an option to enable it, if desired. (Option 1 & 2, combined)
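A rough sketch of what combining options 1 and 2 could look like is shown below. The names resolve_num_workers and run_tasks are hypothetical and are not part of the mdio API; this is only one way to express "single process by default on Windows, opt in to more".

```python
import multiprocessing as mp
import os
import sys


def resolve_num_workers(requested=None, platform=sys.platform):
    """Pick a worker count: an explicit request wins, Windows defaults to 1."""
    if requested is not None:
        if requested < 1:
            raise ValueError("requested must be at least 1")
        return requested
    if platform.startswith("win"):
        # Option 1: single process unless the user opts in (option 2).
        return 1
    return os.cpu_count() or 1


def run_tasks(func, items, num_workers):
    """Run func over items in-process, or in a pool when num_workers > 1."""
    if num_workers == 1:
        # No worker processes are started, so no main guard is needed.
        return [func(item) for item in items]
    with mp.Pool(processes=num_workers) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    workers = resolve_num_workers()
    print(workers, run_tasks(abs, [-1, -2, -3], workers))
```

With this shape, a Windows user who wants parallel ingestion passes an explicit worker count and takes on the guard requirement knowingly.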

I expect the main-guard to be required on Linux too, so examples would need to be updated to indicate such if you keep it enabled by default.


tasansal commented on September 17, 2024

Hi @mstormo, thanks for letting us know!

We will make updates to the documentation based on your feedback.

As @srib mentioned, when we use multiprocessing in Python, it almost always needs a main guard; by default, the ingestion uses multiprocessing (not Dask). The reading, writing, and export can use Dask if needed, but it's off by default. We have plans to Daskify the ingestion as well. We should clarify this for sure.


tasansal commented on September 17, 2024

https://mdio-python.readthedocs.io/en/latest/reference.html#seismic-data
