
Running compute-intense parts of BigStitcher distributed

License: BSD 2-Clause "Simplified" License


bigstitcher-spark's Introduction


BigStitcher-Spark

Supported by the HHMI Janelia Open Science Software Initiative


This package allows you to run the compute-intense parts of BigStitcher distributed on your workstation, a cluster, or the cloud using Apache Spark. The following modules are currently available in BigStitcher-Spark, listed as JavaClassName/cmd-line-tool-name (documentation follows below, but a good start is also to just check the cmd-line arguments; they mostly follow the BigStitcher GUI, and each module takes an existing XML):

  • SparkResaveN5/resave (resave an XML dataset you defined in BigStitcher - use virtual loading only - into N5 for processing)
  • SparkInterestPointDetection/detect-interestpoints (detect interest points for alignment)
  • SparkGeometricDescriptorMatching/match-interestpoints (perform pair-wise interest point matching)
  • SparkPairwiseStitching/stitching (run pairwise stitching between overlapping tiles)
  • Solver/solver (perform the global solve, works with interest points and stitching)
  • SparkAffineFusion/affine-fusion (fuse the aligned dataset using affine models, including translation)
  • SparkNonRigidFusion/nonrigid-fusion (fuse the aligned dataset using non-rigid models)

Additionally, there are some utility methods:

  • SparkDownsample/downsample (perform downsampling of existing volumes)
  • ClearInterestPoints/clear-interestpoints (clears interest points)
  • ClearRegistrations/clear-registrations (clears registrations)

Note: BigStitcher-Spark is designed to work hand-in-hand with BigStitcher. You can always verify the results of each BigStitcher-Spark step interactively in BigStitcher by simply opening the XML. You can of course also run certain steps in BigStitcher and others in BigStitcher-Spark. Not all functionality is 100% identical between BigStitcher and BigStitcher-Spark; important differences in capabilities are described in the respective module documentation below (typically BigStitcher-Spark supports a specific feature that was hard to implement in BigStitcher and vice-versa).

Content

Overview of the BigStitcher-Spark pipeline

Install and Run

To run it on your local computer

  • Prerequisites: Java and maven must be installed.
  • Clone the repo and cd into BigStitcher-Spark
  • Run the included bash script ./install -t <num-cores> -m <mem-in-GB> specifying the number of cores and available memory in GB for running locally (see the example below). This should build the project and create the executables resave, detect-interestpoints, match-interestpoints, stitching, solver, affine-fusion, nonrigid-fusion, downsample, clear-interestpoints and clear-registrations in the working directory.
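For example, on a workstation with 8 cores and 50 GB of RAM available for Spark, the call might look like this (the numbers are placeholders, not recommendations):

./install -t 8 -m 50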

If you run the code directly from your IDE, you will need to add JVM parameters for the local Spark execution (e.g. 8 cores, 50 GB RAM):

-Dspark.master=local[8] -Xmx50G

To run it on a compute cluster

mvn clean package -P fatjar builds target/BigStitcher-Spark-0.0.1-SNAPSHOT.jar for distribution.

Important: if you use HDF5 as input data in a distributed scenario, you need to set a common path for extracting the HDF5 binaries (see solved issue here), e.g.

--conf spark.executor.extraJavaOptions=-Dnative.libpath.jhdf5=/groups/spruston/home/moharb/libjhdf5.so

Please ask your sysadmin for help with running it on your cluster; below are hopefully helpful tutorials for different kinds of clusters that can help you transfer the setup to your home institution.
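As a rough sketch only (the master URL, memory settings, and all paths are placeholders that must be adapted to your cluster; the class name and fatjar follow the examples elsewhere in this document), a spark-submit call for the affine fusion could look like this:

spark-submit \
  --master spark://<scheduler-host>:7077 \
  --conf spark.executor.memory=16G \
  --class net.preibisch.bigstitcher.spark.AffineFusion \
  target/BigStitcher-Spark-0.0.1-SNAPSHOT.jar \
  -x /path/to/dataset.xml -o /path/to/fused.n5 -d /ch0/s0 \
  --UINT8 --minIntensity 0 --maxIntensity 255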

Instructions and stories about how people set up Spark/BigStitcher-Spark on their respective clusters:

To run it on the cloud

mvn clean package -P fatjar builds target/BigStitcher-Spark-0.0.1-SNAPSHOT.jar for distribution.

For running the fatjar on the cloud check out services such as Amazon EMR. Implementations of image readers and writers that support cloud storage can be found here. Note that running it on the cloud is an ongoing effort with @kgabor, @tpietzsch and the AWS team; it currently works as a prototype and is being further optimized. We will provide updated documentation in due time. Note that some modules support prefetching (--prefetch), which pre-loads all image blocks in parallel before processing and is important for cloud execution because of its access latencies.

Example Datasets

We provide two example datasets (one for interest-point based registration, one that works well with Stitching), which are available for download at several increasing levels of reconstruction so you can test the different modules of BigStitcher-Spark directly. The datasets are linked again throughout the documentation for the individual modules. If you would like to test the entire pipeline, we suggest starting with the RAW datasets and running the entire pipeline. Here is an overview of the two datasets at different stages:

For this tutorial I extracted the Stitching dataset into ~/SparkTest/Stitching and the dataset for experimenting with interest points into ~/SparkTest/IP.

Usage

Resave Dataset

When working with BigStitcher the first step is to define a dataset, where the goal is to provide sufficient meta-data to allow BigStitcher (and BigDataViewer) to load your images. This step is typically done in the BigStitcher GUI, but some people have written scripts to automatically generate XMLs for their datasets (e.g. here). If you want to start testing an entire BigStitcher(-Spark) pipeline from scratch, please use this dataset for stitching or this one using interest points.

After the dataset is defined, one usually re-saves the input data (TIFF, CZI, ...) into a multi-resolution format that makes it possible to interactively display and work with the image at various resolution levels, and that is essential for distributed processing. Right now, we use the N5 format for temporary storage of the input data. This resaving process can take a substantial amount of time if your input is large and can be distributed using Spark. Importantly, you need to define your dataset using the Automatic Loader (Bioformats based) and select to load the data virtually.

For testing the re-saving (and multi-resolution pyramid creation) with Spark, use your defined dataset(s), or download this dataset for stitching or this dataset for interest points.

The command for resaving the stitching dataset could look like this; it will overwrite the input XML (a backup XML is created automatically):

./resave -x ~/SparkTest/Stitching/dataset.xml -xo ~/SparkTest/Stitching/dataset.xml

It is analogous for the interest point dataset:

./resave -x ~/SparkTest/IP/dataset.xml -xo ~/SparkTest/IP/dataset.xml

Please run resave without parameters to get help for all command line arguments. Using --blockSize you can change the block size of the N5, and --blockScale defines how many blocks are processed at once by a Spark job. With -ds you can define your own downsampling steps if the automatic ones are not well suited.
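For example, a variant of the resave calls above with an explicit block size could look like this (the block size value is only an illustration, not a recommendation):

./resave -x ~/SparkTest/Stitching/dataset.xml -xo ~/SparkTest/Stitching/dataset.xml --blockSize 128,128,64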

Note: --dryRun allows the user to test the functionality without writing any data. The Spark implementation parallelizes over user-defined blocks across all input images at once, so even a few very large images will be processed efficiently.

Pairwise Stitching

To perform classical stitching (translation only), pair-wise stitching between all overlapping tiles first needs to be computed. So far we only support standard grouping, where all channels and illuminations of a specific Tile are grouped together as one image and stitching is performed individually per Timepoint and Angle. To run the stitching with default parameters you can run the following command on this example dataset:

./stitching -x ~/SparkTest/Stitching/dataset.xml

The results will be written to the XML; in order to compute transformation models and apply them to the images, you next need to run the solver, which computes a global optimization.

Please run stitching without parameters to get help for all command line arguments. -ds sets the resolution at which cross correlation is performed; 2,2,1 is the default and usually superior to 1,1,1 due to suppressed noise, and even higher downsampling levels typically work well too since by default the peaks are located with subpixel accuracy. --disableSubpixelResolution disables subpixel-accurate shifts. -p sets the number of phase correlation peaks that are checked with cross-correlation (incrementing this number can help with stitching issues). --minR and --maxR are filters that specify the accepted range of cross correlation for any pair of overlapping tiles; reducing --minR may be useful to accept more pairs, and excluding a --maxR of 1 may be useful if you get wrong links with r=1.0. --maxShiftX/Y/Z and --maxShiftTotal set the maximal allowed shift between any pair of images relative to their current location; limiting it if the current position is close to the correct one might be useful. If your dataset contains multiple channels or illumination directions per Tile, you can select how they will be combined for the pair-wise stitching process using --channelCombine and --illumCombine, which can be either AVERAGE or PICK_BRIGHTEST.
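Putting a few of these options together, a hypothetical call with adjusted filters could look like this (the values are illustrative only, not recommendations):

./stitching -x ~/SparkTest/Stitching/dataset.xml -ds 2,2,1 -p 5 --minR 0.3 --maxShiftTotal 100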

You can choose which Tiles --tileId, Channels --channelId, Illuminations --illuminationId, Angles --angleId and Timepoints --timepointId will be processed; a typical choice could be --timepointId 18 --tileId 0,1,2,3,6,7,8 to only process timepoint 18 and selected Tiles. If you would like to choose Views more fine-grained, you can specify their ViewIds directly, e.g. -vi '0,0' -vi '0,1' -vi '1,1' to process ViewIds 0 & 1 of Timepoint 0 and ViewId 1 of Timepoint 1. By default, everything will be processed.

Note: --dryRun allows the user to test the functionality without writing any data. The Spark implementation parallelizes over pairs of images.

Detect Interest Points

Interest-point based registration is generally more reliable and faster than stitching while supporting various transformation models including (regularized) Translation, (regularized) Rigid, (regularized) Affine, and Non-Rigid. At the same time parameter selection is more involved. The first step is to detect interest points in the images. A typical command line call that works well on this example dataset looks as follows:

./detect-interestpoints -x ~/SparkTest/IP/dataset.xml -l beads -s 1.8 -t 0.008 -dsxy 2 --minIntensity 0 --maxIntensity 255

The results will be written to the XML and the interestpoints.n5 directory; in order to compute transformation models and apply them to the images, you need to run matching followed by the solver, which computes a global optimization. If you want to inspect the interest points you can open the XML in BigStitcher, go to multiview mode, right click, start the Interest Point Explorer and click on the number of detections, which will then be overlaid onto the images (see screenshot below).

Please run detect-interestpoints without parameters to get help for all command line arguments. -s and -t define the sigma and threshold of the Difference-of-Gaussian, respectively, and -l specifies the label for the interest points. --minIntensity and --maxIntensity set the intensity range within which all processed blocks are normalized to [0...1]; these values are mandatory since each individual Spark job is unable to figure out the correct min/max values of the images. You can find good guesses for all these values by starting the interactive interest point detection in BigStitcher. -dsxy and -dsz define the downsampling at which interest point detection is performed. Using --localization you can specify the type of subpixel localization, either NONE or QUADRATIC. --type allows setting which type of intensity peaks should be identified: MIN, MAX or BOTH. Finally, --blockSize sets the block size that will be processed in each Spark job.

--overlappingOnly will only identify interest points in areas of each image that are currently overlapping with another image. --storeIntensities will extract the intensity of each interest point and store them in the interestpoints.n5 directory as extra datasets. --prefetch will use parallel threads to pre-load all image data blocks ahead of the computation, which is desirable for cloud execution.
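For example, a variant of the detection call above that restricts detection to overlapping regions and enables prefetching (e.g. for cloud execution) could look like this:

./detect-interestpoints -x ~/SparkTest/IP/dataset.xml -l beads -s 1.8 -t 0.008 -dsxy 2 --minIntensity 0 --maxIntensity 255 --overlappingOnly --prefetch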

You can choose which Tiles --tileId, Channels --channelId, Illuminations --illuminationId, Angles --angleId and Timepoints --timepointId will be processed; a typical choice could be --timepointId 18 --tileId 0,1,2,3,6,7,8 to only process timepoint 18 and selected Tiles. If you would like to choose Views more fine-grained, you can specify their ViewIds directly, e.g. -vi '0,0' -vi '0,1' -vi '1,1' to process ViewIds 0 & 1 of Timepoint 0 and ViewId 1 of Timepoint 1. By default, everything will be processed.

Note: --dryRun allows the user to test the functionality without writing any data. The Spark implementation parallelizes over user-defined blocks across all processed images at once.

Visualizing interest points in BigStitcher

 

Match Interest Points

After interest points are detected, they are matched pair-wise between all views/images (Note: this also works for Stitching, try it out). Several point-cloud matching methods and ways of grouping views are supported, which are explained below. Importantly, matching & solving can be performed once or iteratively; typical workflows that match & solve more than once are 1) to first align each timepoint of a series individually using affine models, followed by registration across time using translation models, or 2) to first align using geometric descriptor matching and then refine the result using Iterative Closest Point (ICP), which only works well once the current transformation is already very good (Note: ICP alignment creates many corresponding interest points, which can be desirable for Non-Rigid fusion where all corresponding interest points are perfectly matched on top of each other).

A typical, simple command line call to register each timepoint individually using this example looks like this:

./match-interestpoints -x ~/SparkTest/IP/dataset.xml -l beads -m FAST_ROTATION --clearCorrespondences

Please run match-interestpoints without parameters to get help for all command line arguments. -l defines the label of the detected interest points used for matching. -tm specifies the transformation model to be used (TRANSLATION, RIGID or (default) AFFINE), -rm defines the regularization model (NONE, IDENTITY, TRANSLATION, (default) RIGID or AFFINE), and --lambda [0..1] is the lambda for the regularization model, which is set to 0.1 by default. -vr defines which views/images will be matched: OVERLAPPING_ONLY or ALL_AGAINST_ALL. --clearCorrespondences removes any existing stored matches between views; if it is not set, the newly identified matches will be added to the existing ones.
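To make the model choices explicit, the matching call above could, for example, be extended like this (the model values shown match the documented defaults; the -vr choice is just an example):

./match-interestpoints -x ~/SparkTest/IP/dataset.xml -l beads -m FAST_ROTATION --clearCorrespondences -tm AFFINE -rm RIGID --lambda 0.1 -vr OVERLAPPING_ONLY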

-m defines the matching method; FAST_ROTATION, FAST_TRANSLATION, PRECISE_TRANSLATION or ICP.

  • FAST_ROTATION is a rotation-invariant method that uses geometric hashing and can find corresponding constellations of points even if they are significantly rotated relative to each other.
    • -s defines the significance during descriptor matching; to establish a correspondence, the best matching descriptor has to be s times better than the second best matching descriptor.
    • -r is the level of redundancy during descriptor matching; it adds extra neighbors to each descriptor and tests all combinations of neighboring points.
  • FAST_TRANSLATION is a translation-invariant method that uses geometric hashing and can find corresponding constellations of points irrespective of their location in the image. It tolerates small rotations of up to a few degrees.
    • supports the same parameters -r, -s as FAST_ROTATION above
  • PRECISE_TRANSLATION is a translation-invariant method that uses exhaustive search to find corresponding constellations of points irrespective of their location in the image. It tolerates small rotations of up to a few degrees.
    • supports the same parameters -r, -s as FAST_ROTATION above, and additionally supports -n to specify the number of neighboring points that are used to build geometric descriptors
  • ICP is a method that iteratively assigns closest pairs of points between two images until convergence and can be used for fine alignment.
    • -ime is the ICP maximum allowed error, -iit defines the number of ICP iterations and --icpUseRANSAC enables RANSAC at every ICP iteration

All methods use RANSAC to robustly identify a set of corresponding points in the set of correspondence candidates (optional for ICP). -rit defines the number of RANSAC iterations (increasing it might help to find more correspondences), -rme the maximum error (epsilon) for RANSAC (increasing it might help to find more correspondences), -rmir the minimum inlier ratio (setting it to 0.0 might help to find more correspondences), and -rmif defines the minimum inlier factor for RANSAC (i.e. how many times the minimal number of inliers required by the transformation model must be found for a pair to be considered valid).
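If too few correspondences are found, a hypothetical call relaxing the RANSAC settings could look like this (the values are chosen only to illustrate the direction of the change):

./match-interestpoints -x ~/SparkTest/IP/dataset.xml -l beads -m FAST_ROTATION --clearCorrespondences -rit 10000 -rme 10 -rmir 0.0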

By default, all views/images are matched individually. However, under certain conditions it may be useful to group certain views together (see illustration below). --splitTimepoints groups all angles/channels/illums/tiles that belong to the same timepoint as one single View, e.g. for stabilization across time. --groupChannels groups all channels that belong to the same angle/illumination/tile/timepoint together as one view, e.g. to register all channels together as one. --groupTiles groups all tiles that belong to the same angle/channel/illumination/timepoint together as one view, e.g. to align across angles. --groupIllums groups all illumination directions that belong to the same angle/channel/tile/timepoint together as one view, e.g. to register illumination directions together. Importantly, interest points in overlapping areas of grouped views need to be merged; --interestPointMergeDistance allows setting the merge distance. Note: You do not need to group views for interest point matching in order to group views during solving; these are two independent operations. However, it is (usually) not advisable to only group during matching.

You can choose which Tiles --tileId, Channels --channelId, Illuminations --illuminationId, Angles --angleId and Timepoints --timepointId will be processed; a typical choice could be --timepointId 18 --tileId 0,1,2,3,6,7,8 to only process timepoint 18 and selected Tiles. If you would like to choose Views more fine-grained, you can specify their ViewIds directly, e.g. -vi '0,0' -vi '0,1' -vi '1,1' to process ViewIds 0 & 1 of Timepoint 0 and ViewId 1 of Timepoint 1. By default, everything will be processed.

Grouping in BigStitcher

 

When performing timeseries alignment, grouping is often a good choice (--splitTimepoints) and further details regarding matching across time need to be specified. Important: if you are running a second (or third) round of matching, you always need to solve in between to bake in the resulting transformations. -rtp defines the type of time series registration; TIMEPOINTS_INDIVIDUALLY (i.e. no registration across time), TO_REFERENCE_TIMEPOINT, ALL_TO_ALL or ALL_TO_ALL_WITH_RANGE. Depending on your choice you may need to define the range of timepoints --rangeTP or the reference timepoint --referenceTP. Below is an example command line call for aligning all views against all across time:

./match-interestpoints -x ~/SparkTest/IP/dataset.xml -l beads -m FAST_ROTATION --clearCorrespondences -rtp ALL_TO_ALL --splitTimepoints

Note: --dryRun allows the user to test the functionality without writing any data. The Spark implementation parallelizes over pairs of images.

Solver

The Solver computes a globally optimized result (one transformation per view/image) using all pairwise matches (interest points or stitching), specifically by minimizing the distance between all corresponding points (pairwise stitching is also expressed as a set of corresponding points) across all images/views. A typical call for running the solver on stitching results is (e.g. this dataset):

./solver -x ~/SparkTest/Stitching/dataset.xml -s STITCHING

and when using matched interestpoints individually per timepoint it is (e.g. this dataset):

./solver -x ~/SparkTest/IP/dataset.xml -s IP -l beads

Please run solver without parameters to get help for all command line arguments. -s switches between STITCHING and IP (interest points) mode, and -l defines the interest point label in the latter case. By default the first view of each timepoint will be fixed; -fv allows specifying certain views to be fixed, and --disableFixedViews will not fix any views (in this case make sure not to use plain affine models). -tm specifies the transformation model to be used (TRANSLATION, RIGID or (default) AFFINE), -rm defines the regularization model (NONE, IDENTITY, TRANSLATION, (default) RIGID or AFFINE), and --lambda [0..1] is the lambda for the regularization model, which is set to 0.1 by default.

--maxError sets the maximum allowed error for the solve (it will iterate at least until it is under that value), --maxIterations defines the maximum number of iterations, and --maxPlateauwidth defines the number of iterations that are used to estimate if the solve converged (and is thus also the minimal number of iterations).

There are several types of solvers available; --method allows choosing ONE_ROUND_SIMPLE, ONE_ROUND_ITERATIVE, TWO_ROUND_SIMPLE or TWO_ROUND_ITERATIVE. The two-round methods handle unconnected tiles (they move them to their approximately correct location based on metadata in a second solve, in which all connected views are grouped together), while the iterative methods try to identify and remove wrong/inconsistent links between pairs of views. The iterative strategies are parameterized by --relativeThreshold (relative error threshold, i.e. how many times worse than the average error a link between a pair of views needs to be) and --absoluteThreshold (absolute error threshold, in pixels, for dropping a link between a pair of views). The error is computed as the difference between the pairwise alignment of two views and their alignment after running the solve.
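As an illustration only, a solver call that makes the method and convergence settings explicit could look like this (all values are examples, not recommendations):

./solver -x ~/SparkTest/IP/dataset.xml -s IP -l beads --method TWO_ROUND_ITERATIVE --maxError 5 --maxIterations 10000 --maxPlateauwidth 200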

You can choose which Tiles --tileId, Channels --channelId, Illuminations --illuminationId, Angles --angleId and Timepoints --timepointId will be processed; a typical choice could be --timepointId 18 --tileId 0,1,2,3,6,7,8 to only process timepoint 18 and selected Tiles. If you would like to choose Views more fine-grained, you can specify their ViewIds directly, e.g. -vi '0,0' -vi '0,1' -vi '1,1' to process ViewIds 0 & 1 of Timepoint 0 and ViewId 1 of Timepoint 1. By default, everything will be processed.

By default, all views/images are matched individually. However, under certain conditions it may be useful to group certain views together (see illustration above). --splitTimepoints groups all angles/channels/illums/tiles that belong to the same timepoint as one single View, e.g. for stabilization across time. --groupChannels groups all channels that belong to the same angle/illumination/tile/timepoint together as one view, e.g. to register all channels together as one. --groupTiles groups all tiles that belong to the same angle/channel/illumination/timepoint together as one view, e.g. to align across angles. --groupIllums groups all illumination directions that belong to the same angle/channel/tile/timepoint together as one view, e.g. to register illumination directions together.

When performing timeseries alignment, grouping is usually a good choice (--splitTimepoints), and further details regarding registration across time need to be specified. Important: if you are running a second (or third) round of solving, you usually need to match the interest points again beforehand. -rtp defines the type of time series registration: TIMEPOINTS_INDIVIDUALLY (i.e. no registration across time), TO_REFERENCE_TIMEPOINT, ALL_TO_ALL or ALL_TO_ALL_WITH_RANGE. Depending on your choice you may need to define the range of timepoints --rangeTP or the reference timepoint --referenceTP.

When using interest points for timeseries alignment with all views of a timepoint grouped together, a typical call could look like this:

./solver -x ~/SparkTest/IP/dataset.xml -s IP -l beads -rtp ALL_TO_ALL_WITH_RANGE --splitTimepoints

Note: --dryRun allows the user to test the functionality without writing any data. The solver currently only runs multi-threaded.

Affine Fusion

Performs fusion using the affine transformation models computed by the solver (including translations) that are stored in the XML (Warning: not tested on 2D). By default the affine fusion will create an output image that contains all transformed input views/images. While this is good in some cases such as tiled stitching tasks, the output volume can be unnecessarily large for e.g. multi-view datasets. Thus, prior to running the fusion it might be useful to define a custom bounding box in BigStitcher.

A typical set of calls (one per channel, since this dataset has three channels) for affine fusion into a multi-resolution ZARR using only translations on the stitching dataset is (e.g. this dataset):

./affine-fusion -x ~/SparkTest/Stitching/dataset.xml -o ~/SparkTest/Stitching/fused.zarr -d /ch0/s0 -s ZARR --multiRes --preserveAnisotropy --UINT8 --minIntensity 0 --maxIntensity 255 --channelId 0

./affine-fusion -x ~/SparkTest/Stitching/dataset.xml -o ~/SparkTest/Stitching/fused.zarr -d /ch1/s0 -s ZARR --multiRes --preserveAnisotropy --UINT8 --minIntensity 0 --maxIntensity 255 --channelId 1

./affine-fusion -x ~/SparkTest/Stitching/dataset.xml -o ~/SparkTest/Stitching/fused.zarr -d /ch2/s0 -s ZARR --multiRes --preserveAnisotropy --UINT8 --minIntensity 0 --maxIntensity 255 --channelId 2

You can open the ZARR in Fiji (File > Import > HDF5/N5/ZARR/OME-NGFF or Plugins > BigDataViewer > HDF5/N5/ZARR/OME-NGFF), using n5-view in the n5-utils package (./n5-view -i ~/SparkTest/Stitching/fused.zarr -d /ch0) or in Napari (simply drag&drop e.g. the ch0 or a s0 folder).

The dataset that was aligned using interest points can be fused in a similar way, except that here we choose to use the bounding box embryo that was specified using BigStitcher, and we choose to save as a BDV/BigStitcher project using N5 as the underlying export data format:

./affine-fusion -x ~/SparkTest/IP/dataset.xml -o ~/SparkTest/IP/fused.n5 -xo ~/SparkTest/IP/dataset-fused.xml -s N5 -b embryo --bdv 18,0 --multiRes --UINT8 --minIntensity 0 --maxIntensity 255 --timepointId 18

./affine-fusion -x ~/SparkTest/IP/dataset.xml -o ~/SparkTest/IP/fused.n5 -xo ~/SparkTest/IP/dataset-fused.xml -s N5 -b embryo --bdv 30,0 --multiRes --UINT8 --minIntensity 0 --maxIntensity 255 --timepointId 30

In addition to the opening methods mentioned above, you can also directly open the dataset-fused.xml in BigStitcher or BigDataViewer; unfortunately, opening N5 in vanilla Napari is not supported.

Note: since both acquisitions have more than one channel or timepoint, it is important to fuse each channel/timepoint into a separate output volume.

Running affine-fusion without parameters lists help for all command line arguments. -o defines the output volume base location and -s the output type N5, ZARR, or HDF5 (the latter only when running on a single computer). Importantly, one can fuse several volumes into the same N5, ZARR or HDF5 container by running the fusion consecutively and specifying different folders or BDV ViewIds. --bdv will create fused volumes together with an XML that can be directly opened by BigStitcher or BigDataViewer, where -xo defines the location of the XML for the fused dataset. Alternatively, you need to specify a dataset location using -d instead. --multiRes will create multiresolution pyramids of the fused image; when using -d the dataset needs to end with s0 in order to be able to create the multiresolution pyramid. -ds allows optionally specifying the downsampling steps for the multiresolution pyramid manually.

You can fuse the image using the datatypes --UINT8 [0..255], --UINT16 [0..65535] or (by default) --FLOAT32. UINT8 and UINT16 require you to set --minIntensity and --maxIntensity, which define the range of intensities that will be mapped to [0..255] or [0..65535], respectively. If you want to specify a bounding box use -b. --preserveAnisotropy will preserve the anisotropy of the input dataset, which is a recommended setting if all views/images are taken in the same orientation, e.g. when processing a tiled dataset.

--blockSize defaults to 128x128x128, which you might want to reduce when using HDF5. --blockScale defines how many blocks to fuse in a single processing step, e.g. 4,4,1 with a blockSize of 128,128,64 means that each Spark task processes a region of 512,512,64 pixels.
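For example, the first fusion call above could be extended with explicit block settings like this (the values are illustrative only):

./affine-fusion -x ~/SparkTest/Stitching/dataset.xml -o ~/SparkTest/Stitching/fused.zarr -d /ch0/s0 -s ZARR --multiRes --preserveAnisotropy --UINT8 --minIntensity 0 --maxIntensity 255 --channelId 0 --blockSize 128,128,128 --blockScale 2,2,1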

You can choose which Tiles --tileId, Channels --channelId, Illuminations --illuminationId, Angles --angleId and Timepoints --timepointId will be processed. For fusion one normally chooses a specific timepoint and channel, e.g. --timepointId 18 --channelId 0 to only fuse timepoint 18 and Channel 0 into a single volume. If you would like to choose Views more fine-grained, you can specify their ViewIds directly, e.g. -vi '0,0' -vi '0,1' to process ViewIds 0 & 1 of Timepoint 0. By default, all images/views will be fused into a single volume, which is usually not desired.

Note: --dryRun allows the user to test the functionality without writing any data. The fusion scales to large datasets as it tests, for each block that is written, which images are overlapping. For cloud execution one can additionally pre-fetch all input data for each compute block in parallel. You need to specify the XML of a BigStitcher project and decide which channels, timepoints, etc. to fuse.

Non-Rigid Fusion

nonrigid-fusion performs non-rigid distributed fusion using net.preibisch.bigstitcher.spark.SparkNonRigidFusion. The arguments are identical to the Affine Fusion, and one additionally needs to define the corresponding interest points, e.g. -ip beads, which will be used to compute the non-rigid transformation.
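A minimal sketch, assuming the interest point label beads and the same bounding box and export conventions as the interest-point fusion example above (the output paths are hypothetical):

./nonrigid-fusion -x ~/SparkTest/IP/dataset.xml -o ~/SparkTest/IP/fused-nonrigid.n5 -xo ~/SparkTest/IP/dataset-fused-nonrigid.xml -s N5 -b embryo --bdv 18,0 --multiRes --UINT8 --minIntensity 0 --maxIntensity 255 --timepointId 18 -ip beads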

bigstitcher-spark's People

Contributors

kgabor, mkitti, mzouink, stephanpreibisch, tpietzsch, trautmane


bigstitcher-spark's Issues

Using the same parameters to process multi-channel data

Hi all!
We used BigStitcher-Spark for data stitching, which works well for single-channel data. However, when using multi-channel data, I process each channel one by one through a loop, and I am not sure if the stitching parameters between different channels are consistent. I want to use one of these channels for data stitching and then apply the stitching parameters of this channel to the other channels. Is there a way to do this?
thanks!

Update build dependency versions

I need to update bigdataviewer-omezarr dependency version, so I'll try to harmonize more or less with the Fiji released versions the other packages in the pom.

Solver does not converge in a detection, matching, solving sequence

If I use IP_N5_XML.zip and do detection, matching, and solving, the solver fails to converge and the solution is unusable. The solver gives residuals on the order of the dimensions of the dataset. However, if I use the IPs from IP_N5_XML_IPs.zip and then do matching and solving, it succeeds.

In terms of the number of IPs and RANSAC matches there are minimal differences; also, in BDV the interest points are visually the same in both cases.

Reproduction

Using an installation of the BigStitcher-Spark main branch as of 2024-04-16 and the IP_N5_XML.zip dataset, the problem can be reproduced with 2 tiles.

Failing scenario

~/BigStitcher-Spark/detect-interestpoints --label=beads -s 1.8 -t 0.008 --xml=dataset.xml --downsampleXY=2 -i0 0 -i1 255 -vi '18,0' -vi '18,1'
~/BigStitcher-Spark/match-interestpoints --clearCorrespondences --label=beads --method=FAST_ROTATION --xml=dataset.xml -vi '18,0' -vi '18,1'
~/BigStitcher-Spark/solver --xml=dataset.xml --sourcePoints=IP --label=beads -vi '18,0' -vi '18,1'

The solver log looks like this:

9997: 378.49471691893393 378.49471691890756
9998: 378.49471691893393 378.49471691890756
9999: 378.49471691893393 378.49471691890756
Concurrent tile optimization loop took 1087 ms, total took 1088 ms
Successfully optimized configuration of 2 tiles after 10000 iterations:
  average displacement: 378.495px
  minimal displacement: 378.495px
  maximal displacement: 378.495px

Succeeding scenario

~/BigStitcher-Spark/detect-interestpoints --label=beads -s 1.8 -t 0.008 --xml=dataset.xml --downsampleXY=2 -i0 0 -i1 255 -vi '18,0' -vi '18,1' --blockSize="1600,1600,1600"
~/BigStitcher-Spark/match-interestpoints --clearCorrespondences --label=beads --method=FAST_ROTATION --xml=dataset.xml -vi '18,0' -vi '18,1'
~/BigStitcher-Spark/solver --xml=dataset.xml --sourcePoints=IP --label=beads -vi '18,0' -vi '18,1'

The solver log looks like this:

200: 0.8092636982485427 0.8092636982485416
201: 0.8092636982485427 0.8092636982485416
Concurrent tile optimization loop took 110 ms, total took 110 ms
Successfully optimized configuration of 2 tiles after 202 iterations:
  average displacement: 0.809px
  minimal displacement: 0.809px
  maximal displacement: 0.809px

Full logs and interestpoint.n5 are available here.

So, the solver succeeds if I use such a large block size in the detection step that all IPs are detected in one Spark task. My hypothesis is that when merging IPs from different blocks, pathological values enter the catalog, perhaps NaN?

Can BigStitcher-Spark do fusion on a downsampled resolution (s1)?

Can BigStitcher-Spark do fusion based on s1 instead of s0 resolution?

For large data (20 tiles; 2 TB), this would be a very useful feature, as in many cases s1, and even s2/s3, are enough for downstream tasks. I know this can be done by first fusing an s0 image and then generating a multi-resolution pyramid, but it is not clear to me whether we can directly generate a downsampled fused image.

Thanks!

Which file formats are supported?

Hi @StephanPreibisch and all,

first of all thanks for making this tool open source!

We are looking into using this tool for registering channels in large light sheet volumes. Before starting with this I wanted to double check which data formats are supported?

I assume [n5-bdv](https://github.com/bigdataviewer/bigdataviewer-core/blob/master/BDV%20N5%20format.md) is supported?
Is any other zarr- or n5-based format supported? (Looks like NGFF support is coming in #9.)
Anything else to be aware of when converting the data?

NoSuchMethodError for Gson library

Hi @kgabor,

You mentioned that after adding zarr export support to AffineFusion, you ran into the following error when running a distributed spark instance:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.gson.reflect.TypeToken.getParameterized(Ljava/lang/reflect/Type;[Ljava/lang/reflect/Type;)Lcom/google/gson/reflect/TypeToken;
        at org.janelia.saalfeldlab.n5.zarr.N5ZarrReader.getZArraryAttributes(N5ZarrReader.java:259)
        ...

I think this problem occurs because the Hadoop libraries used by Spark pull in an ancient version of Gson (likely 2.2.4 or similar) and the n5 zarr library (currently) relies upon Gson 2.8.6. Even though Gson 2.8.6 is bundled in the big-stitcher fat jar, the Hadoop stuff is higher in the classpath when running a Spark cluster - so you end up running with ancient Gson. This post describes the issue very nicely.

As the post mentions, the best way to fix this issue is to force Spark to use a newer Gson library by specifying additional --conf arguments when launching spark-submit like this:

/misc/local/spark-3.0.1/bin/spark-submit 
  --deploy-mode client 
  --master spark://... 
  --conf spark.driver.extraClassPath=/groups/scicompsoft/home/trautmane/bigstitcher/gabor/gson-2.8.6.jar
  --conf spark.executor.extraClassPath=/groups/scicompsoft/home/trautmane/bigstitcher/gabor/gson-2.8.6.jar
  ...

You'll need to:

  • find a gson-2.8.6.jar file - I pulled it from my local maven repo: ${HOME}/.m2/repository/com/google/code/gson/gson/2.8.6/gson-2.8.6.jar,
  • copy it to a network filesystem location that your spark driver and workers can access, and
  • then add the path to the spark.driver.extraClassPath and spark.executor.extraClassPath configuration as I did above.

Give this a shot and let me know if it solves the errors you were getting.
I ran a small test case at Janelia and was able to successfully produce a zarr result - so, I'm hopeful this will work for you.

Finally while debugging this problem, I made a few minor tweaks to your commits here that mean you will need to specify -s ZARR (capitalized) instead of your original lower case version if you pull and run with the latest code.

Let me know how it goes,
Eric

Dependencies need version updates in BigStitcher-Spark

For the records, I'm ticketing our discussion with @tpietzsch about the BigStitcher-Spark dependencies on 2023-03-09.

It seems that the dependencies of BigStitcher-Spark are getting too old... I tried to run a BigStitcher-Spark main checkout as of 2023-03-08 with a bigdataviewer-omezarr main branch build.

java -Xmx94g -cp $(cat cp.txt) -Dspark.master=local net.preibisch.bigstitcher.spark.AffineFusion 
-x /home/jupyter/data/exaSPIM_609281_2022-11-03_proc_2022-11-22/affine_ip_alignment.xml 
-o /home/jupyter/test-spark_2023-03-08 --bdv 0,0 --storage N5 
--xmlout=/home/jupyter/fusion_test_2023-03-08/dataset.xml

where cp.txt is generated from the BigStitcher-Spark build (bigdataviewer-omezarr is added manually)

mvn dependency:build-classpath -Dmdep.outputFile=cp.txt
  • maven builds bigdataviewer-omezarr with jackson 2.12.5, which is incompatible with BigStitcher-Spark
  • the move of SharedQueue from bdv.util.volatiles.SharedQueue to bdv.cache.SharedQueue causes a NoSuchMethodError at bdv.util.volatiles.VolatileViews.wrapAsVolatile(Lnet/imglib2/RandomAccessibleInterval;Lbdv/cache/SharedQueue;)Lnet/imglib2/RandomAccessibleInterval, as we understood, because bigdataviewer-omezarr is already built against the new version while BigStitcher-Spark may use the old one

Preserve original data anisotropy?

Hi all,

We've got this working fairly well locally. Still struggling with our school Hadoop cluster, which I think is a config issue.

A lot of our data has anisotropic xy pixel size vs z steps. What is the best way to get the BigStitcher-Spark affine fusion to act the same way as the "preserve original data anisotropy" setting in BigStitcher?

One thought I had was to edit the XML to change the calibration before fusion.

Thanks!

Fusion fails on local Spark instance

Hi @trautmane,

As requested, here are the details on what we are running into trying to fuse a BDV file using BigStitcher-Spark. The plugin was built with the code changes on main, but not fix_bdv_n5 as I got a conflict when I tried to merge the two branches. I've attached the XML as well.

It wasn't totally clear if the extraJavaOptions should be passed to the driver or the executors when in local mode, so we tried both; the same error message as pasted here pops up. We also tried allocating more RAM to the executors, and the same error message pops up.

The error usually occurs once >6,000 files within the N5 have been written. In this particular case, ~7,200 files were written.

Please let me know what other information I can provide.

Thanks!
Doug

Linux version
Linux qi2labserver 5.4.0-74-generic #83~18.04.1-Ubuntu SMP Tue May 11 16:01:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Maven and java details
Apache Maven 3.6.0
Maven home: /usr/share/maven
Java version: 1.8.0_312, vendor: Private Build, runtime: /usr/lib/jvm/java-8-openjdk-amd64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.4.0-74-generic", arch: "amd64", family: "unix"
Spark details
22/01/04 11:54:33 WARN Utils: Your hostname, qi2labserver resolves to a loopback address: 127.0.1.1; using 10.206.25.77 instead (on interface enp5s0f0)
22/01/04 11:54:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

Using Scala version 2.12.15, OpenJDK 64-Bit Server VM, 1.8.0_312
Branch HEAD
Compiled by user ubuntu on 2021-10-06T12:46:30Z
Revision 5d45a415f3a29898d92380380cfd82bfc7f579ea
Url https://github.com/apache/spark
Type --help for more information.
Call to Spark
spark-submit --master local[32,8] 
--conf spark.driver.memory=100G 
--conf "spark.executor.extraJavaOptions=-XX:ActiveProcessorCount=1" 
--class net.preibisch.bigstitcher.spark.AffineFusion ~/Documents/github/BigStitcher-Spark/target/BigStitcher-Spark-0.0.1-SNAPSHOT.jar 
-x /mnt/opm2/20210924b/deskew_flatfield_output/bdv/AMC_cy7_test_bdv.xml 
-o /mnt/opm2/20210924b/n5/output.n5 
-d /DAPI/s0 
--channelId 0 
--UINT16 
--minIntensity 0
--maxIntensity 65535
Error message
(Tue Jan 04 11:21:37 MST 2022): Requesting Img from ImgLoader (tp=0, setup=7), using level=0, [1.0 x 1.0 x 1.0]
22/01/04 11:21:37 ERROR Executor: Exception in task 28.0 in stage 0.0 (TID 28)
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:717)
        at net.imglib2.cache.queue.FetcherThreads.<init>(FetcherThreads.java:92)
        at net.imglib2.cache.queue.FetcherThreads.<init>(FetcherThreads.java:70)
        at bdv.img.hdf5.Hdf5ImageLoader.open(Hdf5ImageLoader.java:209)
        at bdv.img.hdf5.Hdf5ImageLoader.<init>(Hdf5ImageLoader.java:158)
        at bdv.img.hdf5.Hdf5ImageLoader.<init>(Hdf5ImageLoader.java:144)
        at bdv.img.hdf5.Hdf5ImageLoader.<init>(Hdf5ImageLoader.java:139)
        at bdv.img.hdf5.XmlIoHdf5ImageLoader.fromXml(XmlIoHdf5ImageLoader.java:70)
        at bdv.img.hdf5.XmlIoHdf5ImageLoader.fromXml(XmlIoHdf5ImageLoader.java:49)
        at mpicbg.spim.data.generic.sequence.XmlIoAbstractSequenceDescription.fromXml(XmlIoAbstractSequenceDescription.java:111)
        at mpicbg.spim.data.generic.XmlIoAbstractSpimData.fromXml(XmlIoAbstractSpimData.java:153)
        at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:164)
        at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:52)
        at mpicbg.spim.data.generic.XmlIoAbstractSpimData.load(XmlIoAbstractSpimData.java:95)
        at net.preibisch.bigstitcher.spark.AffineFusion.lambda$call$c48314ca$1(AffineFusion.java:208)
        at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1(JavaRDDLike.scala:352)
        at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1$adapted(JavaRDDLike.scala:352)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
        at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1012)
        at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1012)
        at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
22/01/04 11:21:37 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 28.0 in stage 0.0 (TID 28),5,main]
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:717)
        at net.imglib2.cache.queue.FetcherThreads.<init>(FetcherThreads.java:92)
        at net.imglib2.cache.queue.FetcherThreads.<init>(FetcherThreads.java:70)
        at bdv.img.hdf5.Hdf5ImageLoader.open(Hdf5ImageLoader.java:209)
        at bdv.img.hdf5.Hdf5ImageLoader.<init>(Hdf5ImageLoader.java:158)
        at bdv.img.hdf5.Hdf5ImageLoader.<init>(Hdf5ImageLoader.java:144)
        at bdv.img.hdf5.Hdf5ImageLoader.<init>(Hdf5ImageLoader.java:139)
        at bdv.img.hdf5.XmlIoHdf5ImageLoader.fromXml(XmlIoHdf5ImageLoader.java:70)
        at bdv.img.hdf5.XmlIoHdf5ImageLoader.fromXml(XmlIoHdf5ImageLoader.java:49)
        at mpicbg.spim.data.generic.sequence.XmlIoAbstractSequenceDescription.fromXml(XmlIoAbstractSequenceDescription.java:111)
        at mpicbg.spim.data.generic.XmlIoAbstractSpimData.fromXml(XmlIoAbstractSpimData.java:153)
        at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:164)
        at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:52)
        at mpicbg.spim.data.generic.XmlIoAbstractSpimData.load(XmlIoAbstractSpimData.java:95)
        at net.preibisch.bigstitcher.spark.AffineFusion.lambda$call$c48314ca$1(AffineFusion.java:208)
        at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1(JavaRDDLike.scala:352)
        at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1$adapted(JavaRDDLike.scala:352)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
        at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1012)
        at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1012)
        at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
22/01/04 11:21:37 ERROR Inbox: An error happened while processing message in the inbox for LocalSchedulerBackendEndpoint
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:717)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
        at org.apache.spark.scheduler.TaskResultGetter.enqueueFailedTask(TaskResultGetter.scala:137)
        at org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:817)
        at org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:791)
        at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalSchedulerBackend.scala:71)
        at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
        at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Exception in thread "dispatcher-event-loop-30" java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:717)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
        at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

XML file:
AMC_cy7_test_bdv.zip

OutOfMemoryError caused by creation of too many N5ImageLoader fetcher threads

While working through issue 2 with a larger data set, @boazmohar discovered many OutOfMemoryError: unable to create new native thread exceptions in the worker logs. These exceptions are raised because parallelized RDDs create many N5ImageLoader instances like this one and each N5ImageLoader instance in turn creates Runtime.getRuntime().availableProcessors() fetcher threads.

I reduced some of the fetcher thread creation by reusing loaders in this commit. However, reusing loaders did not completely solve the problem.

I think the best solution is to parameterize the number of fetcher threads in the N5ImageLoader and then explicitly set fetcher thread counts in spark clients. This issue can remain open until that happens or until another long term solution is developed.

In the mean time as a work-around, overriding the default availableProcessors value with a -XX:ActiveProcessorCount=1 JVM directive seems to fix the problem.

More specifically, here are the spark-janelia flintstone.sh environment parameters I used to successfully process @boazmohar 's larger data set:

# --------------------------------------------------------------------
# Default Spark Setup (11 cores per worker)
# --------------------------------------------------------------------
export N_EXECUTORS_PER_NODE=2
export N_CORES_PER_EXECUTOR=5
export N_OVERHEAD_CORES_PER_WORKER=1
# Note: N_CORES_PER_WORKER=$(( (N_EXECUTORS_PER_NODE * N_CORES_PER_EXECUTOR) + N_OVERHEAD_CORES_PER_WORKER ))

# To distribute work evenly, recommended number of tasks/partitions is 3 times the number of cores.
export N_TASKS_PER_EXECUTOR_CORE=3

export N_CORES_DRIVER=1

# setting ActiveProcessorCount to 1 ensures Runtime.availableProcessors() returns 1
export SUBMIT_ARGS="--conf spark.executor.extraJavaOptions=-XX:ActiveProcessorCount=1"

With the limited active processor count and reusing loaders, no OutOfMemory exceptions occur and processing completes much faster. @boazmohar noted that with his original setup, it took 3.5 hours using a Spark cluster with 2011 cores. My run with the parameters above took 7 minutes using 2200 cores (on 200 11-core worker nodes). Boaz's original run might have had other configuration issues, so this isn't necessarily apples-to-apples. Nevertheless, my guess is that his performance was adversely affected by the fetcher thread problem.

Finally, @StephanPreibisch may want to revisit the getTransformedBoundingBox code and any other loading/reading to see if there are other options for reducing/reusing loaded data within the parallelized RDD loops. Broadcast variables might be suitable/helpful for this use case - but I'm not sure.

Error Could not initialize class ch.systemsx.cisd.hdf5.CharacterEncoding on AffineExport on h5 file

Hi @StephanPreibisch,

I am trying to do an AffineExport with spark:

~/spark-janelia/flintstone.sh 4 \
/groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-0.0.2-SNAPSHOT.jar \ 
net.preibisch.bigstitcher.spark.AffineFusion \
-x '/groups/mousebrainmicro/mousebrainmicro/data/Lightsheet/20210812_AG/ML_Rendering-test/aligned_data.xml' \
-o  '/nrs/svoboda/moharb/test_ML.n5' -d '/s0' 

And get this error:

2022-04-21 15:45:37,731 [task-result-getter-0] ERROR [TaskSetManager]: Task 1 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 78, 10.36.107.42, executor 0): java.lang.NoClassDefFoundError: Could not initialize class ch.systemsx.cisd.hdf5.CharacterEncoding
	at ch.systemsx.cisd.hdf5.HDF5BaseReader.<init>(HDF5BaseReader.java:143)
	at ch.systemsx.cisd.hdf5.HDF5BaseReader.<init>(HDF5BaseReader.java:126)
	at ch.systemsx.cisd.hdf5.HDF5ReaderConfigurator.reader(HDF5ReaderConfigurator.java:86)
	at ch.systemsx.cisd.hdf5.HDF5FactoryProvider$HDF5Factory.openForReading(HDF5FactoryProvider.java:54)
	at ch.systemsx.cisd.hdf5.HDF5Factory.openForReading(HDF5Factory.java:55)
	at bdv.img.hdf5.Hdf5ImageLoader.open(Hdf5ImageLoader.java:183)
	at bdv.img.hdf5.Hdf5ImageLoader.getSetupImgLoader(Hdf5ImageLoader.java:381)
	at bdv.img.hdf5.Hdf5ImageLoader.getSetupImgLoader(Hdf5ImageLoader.java:79)
	at net.preibisch.bigstitcher.spark.util.ViewUtil.getTransformedBoundingBox(ViewUtil.java:32)
	at net.preibisch.bigstitcher.spark.AffineFusion.lambda$call$7b7a6284$1(AffineFusion.java:268)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1(JavaRDDLike.scala:351)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1$adapted(JavaRDDLike.scala:351)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:986)
	at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:986)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2139)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

I can open it in Fiji and look at the data with BigStitcher without an issue.
The xml is in:
/groups/mousebrainmicro/mousebrainmicro/data/Lightsheet/20210812_AG/ML_Rendering-test/aligned_data.xml
Any idea what to do?
Found this, might be related.

Thanks,
Boaz

using bigstitcher with slurm

Hi everyone!

I am a sysadmin, trying to help our users run bigstitcher on the HPC cluster. I don't necessarily know how BigStitcher works or what it does, and I am also not too familiar with spark. I was hoping that you could give us a few pointers to how to run this in a distributed mode in slurm?

Here is how I currently run it within a single node

#!/bin/bash
#
#SBATCH --job-name=bs-test # give your job a name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=8
#SBATCH --time=00:30:00
#SBATCH --mem=16GB

module purge
module load bigstitcher-spark/20231220
affine-fusion -x /software/apps/bigstitcher-spark/20231220/example/test/dataset.xml \
		-o ./test-spark.n5 \
		-d '/ch488/s0' \
		--UINT8 \
		--minIntensity 1 \
		--maxIntensity 254 \
		--channelId 0

How can I tell affine-fusion to distribute across multiple compute nodes (once I request multiple nodes from slurm)?

Thanks!

Build failure issue on local server

Hi all,

I just pulled the most recent release (about 10 minutes ago) and tried to build this project on our Linux Mint 19 server.

Running (added flags to get debugging):
mvn -e -X clean package -P fatjar

I get a build error. I have attached the build log at the end of this message.

It looks like it might be a Java version mismatch? Any suggestions on how to correctly build the project? I can change the Java JDK if I know which one to install.

Relevant versions:

mvn -version
Apache Maven 3.6.0
Maven home: /usr/share/maven
Java version: 11.0.13, vendor: Ubuntu, runtime: /usr/lib/jvm/java-11-openjdk-amd64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.4.0-74-generic", arch: "amd64", family: "unix"
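
If it turns out that a different JDK is needed, I assume I can simply point Maven at it before rebuilding, something like this (the java-8 path is only an example for this machine; which version is actually required is what I am asking about):

# select a different JDK for the build (example path, adjust as needed)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
mvn -version                      # should now report the selected JDK
mvn -e clean package -P fatjar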

log.txt

Thanks!

BigStitcher spark Dependency problem

I was running BigStitcher-Spark on a single cluster node. At the end of the first step ("detect-interestpoints") it throws an exception. Any suggestions?

I have pasted logs below:

processing tpId=263 setupId=7, [724, 1536, 0] -> [1191, 1919, 46], dimensions (468, 384, 47) of full interval [724, 0, 0] -> [1191, 1919, 46], dimensions (468, 1920, 47)
Total number of jobs: 44571648
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
24/04/05 01:44:11 INFO SparkContext: Running Spark version 3.3.2
24/04/05 01:44:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/04/05 01:44:12 INFO ResourceUtils: ==============================================================
24/04/05 01:44:12 INFO ResourceUtils: No custom resources configured for spark.driver.
24/04/05 01:44:12 INFO ResourceUtils: ==============================================================
24/04/05 01:44:12 INFO SparkContext: Submitted application: SparkInterestPointDetection
24/04/05 01:44:12 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memunt: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
24/04/05 01:44:12 INFO ResourceProfile: Limiting resource is cpu
24/04/05 01:44:12 INFO ResourceProfileManager: Added ResourceProfile id: 0
24/04/05 01:44:13 INFO SecurityManager: Changing view acls to: salam
24/04/05 01:44:13 INFO SecurityManager: Changing modify acls to: salam
24/04/05 01:44:13 INFO SecurityManager: Changing view acls groups to:
24/04/05 01:44:13 INFO SecurityManager: Changing modify acls groups to:
24/04/05 01:44:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(salam); groups with view permissions: Set(); users with modify permissions: Set(salam); groups with modify permissions: Set()
24/04/05 01:44:13 INFO Utils: Successfully started service 'sparkDriver' on port 37185.
24/04/05 01:44:13 INFO SparkEnv: Registering MapOutputTracker
24/04/05 01:44:13 INFO SparkEnv: Registering BlockManagerMaster
24/04/05 01:44:13 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
24/04/05 01:44:13 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
Exception in thread "main" java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x69ecddc6) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x69ecddc6
at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:114)
at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:353)
at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:290)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:339)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:464)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at net.preibisch.bigstitcher.spark.SparkInterestPointDetection.call(SparkInterestPointDetection.java:276)
at net.preibisch.bigstitcher.spark.SparkInterestPointDetection.call(SparkInterestPointDetection.java:79)
at picocli.CommandLine.executeUserObject(CommandLine.java:2041)
at picocli.CommandLine.access$1500(CommandLine.java:148)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
at picocli.CommandLine.execute(CommandLine.java:2170)
at net.preibisch.bigstitcher.spark.SparkInterestPointDetection.main(SparkInterestPointDetection.java:758)
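
From searching around, this looks like the usual module-access restriction that appears when Spark code runs on JDK 16 or newer without the extra module flags. I have not verified this for BigStitcher-Spark, but the workarounds I found are either running on Java 11 or opening sun.nio.ch to the unnamed module, e.g.:

# workaround sketch (assumption, not verified for this project):
# pass the module flag to every JVM the wrapper scripts start
export JAVA_TOOL_OPTIONS="--add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
detect-interestpoints ...        # re-run with the same options as before

# or, when going through spark-submit, the equivalent would be:
#   --conf spark.driver.extraJavaOptions=--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
#   --conf spark.executor.extraJavaOptions=--add-opens=java.base/sun.nio.ch=ALL-UNNAMED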

Add brightest interestpoint filtering to match-interestpoint step in interestpointsForReg=OVERLAPPING_ONLY mode

In our data, the overlapping tile areas are significantly photobleached by the exposure of the second acquisition.
We observe a factor-of-10 difference in the number of detected interest points (with identical intensity and threshold parameters) in the pairwise overlapping area between the two tiles. This can disrupt the geometric-descriptor-based matching.

Since interest point intensities can be available, we propose introducing a --pairwiseBrightest=N option for the --interestpointsForReg=OVERLAPPING_ONLY case. If N>0, the pairwise matching step would limit the interest points on each tile to the min(N, N_ta, N_tb) brightest ones within the pairwise overlapping area (N_ta and N_tb being the number of points in the overlapping area of tileA and tileB, respectively). That way an equal number of interest points goes into the descriptor-based matching, and the brightest points should largely be the same on both tiles; for example, with N=1000, N_ta=1200 and N_tb=90, both tiles would contribute their 90 brightest points.
(Or maybe min(N, 2*N_ta, 2*N_tb) to allow for some noisy detections.)

Translation only solver crashes on certain datasets

The solver with TRANSLATION model and TWO_ROUND_SIMPLE mode crashes sometimes, depending on the data.

Thu May 16 07:33:34 UTC 2024: Not more than one group left after first round of global opt (all views are connected), this means we are already done.

Final models: 
java.lang.NullPointerException: Cannot invoke "mpicbg.models.Tile.getModel()" because "tile" is null
        at net.preibisch.bigstitcher.spark.Solver.call(Solver.java:297)
        at net.preibisch.bigstitcher.spark.Solver.call(Solver.java:57)
        at picocli.CommandLine.executeUserObject(CommandLine.java:2041)
        at picocli.CommandLine.access$1500(CommandLine.java:148)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
        at picocli.CommandLine.execute(CommandLine.java:2170)
        at net.preibisch.bigstitcher.spark.Solver.main(Solver.java:612)

Full log and example interestpoints.n5 and xml files to reproduce the error.

computational complexity of BigStitcher fusion shows undesirable scaling behaviour with number of tiles

Hi @StephanPreibisch ,

thanks for this new project. Need to set up Spark first, but keen to give this a try.
Saw the announcement on Twitter, but as I don't have a Twitter account I'll ask a related question here:

We ran into issues running fusion with a large number of 2D tiles (not using the Spark version). The fusion step would take many hours when fusing around 700 individual 2D tiles (a mosaic scan of a whole slide). We observed that the scaling behaviour with the number of tiles was very unfortunate (polynomial), whereas I would expect it to grow only approximately linearly with the number of output pixels.

As I had the impression that BigStitcher was primarily developed for light-sheet data (fewer but much larger volume tiles rather than many 2D tiles), this scaling behaviour with the number of tiles might have gone unnoticed?

EDIT to add:

The above behaviour was noticed in the non-Spark version of affine fusion; any chance this has already been fixed in this code?

could not find XmlIoBasicImgLoader implementation for format bdv.n5

Hi @StephanPreibisch,

Tried running this on the Janelia cluster and got this error.

mpicbg.spim.data.SpimDataInstantiationException: could not find XmlIoBasicImgLoader implementation for format bdv.n5

I am using the latest spark-janelia from here

This is how I built the repo and ran it:
  [login1 - moharb@e05u15]~>git clone https://github.com/PreibischLab/BigStitcher-Spark.git
Cloning into 'BigStitcher-Spark'...
remote: Enumerating objects: 181, done.
remote: Counting objects: 100% (181/181), done.
remote: Compressing objects: 100% (104/104), done.
remote: Total 181 (delta 69), reused 108 (delta 22), pack-reused 0
Receiving objects: 100% (181/181), 35.31 KiB | 2.35 MiB/s, done.
Resolving deltas: 100% (69/69), done.
[login1 - moharb@e05u15]~>cd BigStitcher-Spark/
[login1 - moharb@e05u15]~/BigStitcher-Spark>~/apache-maven-3.8.4/bin/mvn clean package
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------< net.preibisch:BigStitcher-Spark >-------------------
[INFO] Building BigStitcher Spark 0.0.1-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:3.1.0:clean (default-clean) @ BigStitcher-Spark ---
[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0-M3:enforce (enforce-rules) @ BigStitcher-Spark ---
[INFO] Adding ignore: module-info
[INFO] Adding ignore: META-INF/versions/*/module-info
[INFO] Adding ignore: com.esotericsoftware.kryo.*
[INFO] Adding ignore: com.esotericsoftware.minlog.*
[INFO] Adding ignore: com.esotericsoftware.reflectasm.*
[INFO] Adding ignore: com.google.inject.*
[INFO] Adding ignore: jnr.ffi.*
[INFO] Adding ignore: org.apache.hadoop.yarn.*.package-info
[INFO] Adding ignore: org.apache.spark.unused.UnusedStubClass
[INFO] Adding ignore: org.hibernate.stat.ConcurrentStatisticsImpl
[INFO] Adding ignore: org.jetbrains.kotlin.daemon.common.*
[INFO] Adding ignore: org.junit.runner.Runner
[INFO] Adding ignore: module-info
[INFO] Adding ignore: module-info
[INFO]
[INFO] --- build-helper-maven-plugin:3.0.0:regex-property (sanitize-version) @ BigStitcher-Spark ---
[INFO]
[INFO] --- build-helper-maven-plugin:3.0.0:regex-property (guess-package) @ BigStitcher-Spark ---
[INFO]
[INFO] --- buildnumber-maven-plugin:1.4:create (default) @ BigStitcher-Spark ---
[INFO] Executing: /bin/sh -c cd '/groups/spruston/home/moharb/BigStitcher-Spark' && 'git' 'rev-parse' '--verify' 'HEAD'
[INFO] Working directory: /groups/spruston/home/moharb/BigStitcher-Spark
[INFO] Storing buildNumber: e2b676364526588195f16931e998a7a756ca778b at timestamp: 1640470297294
[INFO] Storing buildScmBranch: main
[INFO]
[INFO] --- scijava-maven-plugin:2.0.0:set-rootdir (set-rootdir) @ BigStitcher-Spark ---
[INFO] Setting rootdir: /groups/spruston/home/moharb/BigStitcher-Spark
[INFO]
[INFO] --- jacoco-maven-plugin:0.8.6:prepare-agent (jacoco-initialize) @ BigStitcher-Spark ---
[WARNING] The artifact xml-apis:xml-apis:jar:2.0.2 has been relocated to xml-apis:xml-apis:jar:1.0.b2
[INFO] argLine set to -javaagent:/groups/spruston/home/moharb/.m2/repository/org/jacoco/org.jacoco.agent/0.8.6/org.jacoco.agent-0.8.6-runtime.jar=destfile=/groups/spruston/home/moharb/BigStitcher-Spark/target/jacoco.exec
[INFO]
[INFO] --- maven-resources-plugin:3.1.0:resources (default-resources) @ BigStitcher-Spark ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /groups/spruston/home/moharb/BigStitcher-Spark/src/main/resources
[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ BigStitcher-Spark ---
[INFO] Compiling 6 source files to /groups/spruston/home/moharb/BigStitcher-Spark/target/classes
[INFO]
[INFO] --- maven-resources-plugin:3.1.0:testResources (default-testResources) @ BigStitcher-Spark ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /groups/spruston/home/moharb/BigStitcher-Spark/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:testCompile (default-testCompile) @ BigStitcher-Spark ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-surefire-plugin:2.22.2:test (default-test) @ BigStitcher-Spark ---
[INFO] No tests to run.
[INFO]
[INFO] --- maven-jar-plugin:3.2.0:jar (default-jar) @ BigStitcher-Spark ---
[INFO] Building jar: /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark.jar
[INFO]
[INFO] >>> maven-source-plugin:3.2.1:jar (attach-sources-jar) > generate-sources @ BigStitcher-Spark >>>
[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0-M3:enforce (enforce-rules) @ BigStitcher-Spark ---
[INFO] Adding ignore: module-info
[INFO] Adding ignore: META-INF/versions/*/module-info
[INFO] Adding ignore: com.esotericsoftware.kryo.*
[INFO] Adding ignore: com.esotericsoftware.minlog.*
[INFO] Adding ignore: com.esotericsoftware.reflectasm.*
[INFO] Adding ignore: com.google.inject.*
[INFO] Adding ignore: jnr.ffi.*
[INFO] Adding ignore: org.apache.hadoop.yarn.*.package-info
[INFO] Adding ignore: org.apache.spark.unused.UnusedStubClass
[INFO] Adding ignore: org.hibernate.stat.ConcurrentStatisticsImpl
[INFO] Adding ignore: org.jetbrains.kotlin.daemon.common.*
[INFO] Adding ignore: org.junit.runner.Runner
[INFO] Adding ignore: module-info
[INFO] Adding ignore: module-info
[INFO]
[INFO] --- build-helper-maven-plugin:3.0.0:regex-property (sanitize-version) @ BigStitcher-Spark ---
[INFO]
[INFO] --- build-helper-maven-plugin:3.0.0:regex-property (guess-package) @ BigStitcher-Spark ---
[INFO]
[INFO] --- buildnumber-maven-plugin:1.4:create (default) @ BigStitcher-Spark ---
[INFO]
[INFO] --- scijava-maven-plugin:2.0.0:set-rootdir (set-rootdir) @ BigStitcher-Spark ---
[INFO]
[INFO] --- jacoco-maven-plugin:0.8.6:prepare-agent (jacoco-initialize) @ BigStitcher-Spark ---
[INFO] argLine set to -javaagent:/groups/spruston/home/moharb/.m2/repository/org/jacoco/org.jacoco.agent/0.8.6/org.jacoco.agent-0.8.6-runtime.jar=destfile=/groups/spruston/home/moharb/BigStitcher-Spark/target/jacoco.exec
[INFO]
[INFO] <<< maven-source-plugin:3.2.1:jar (attach-sources-jar) < generate-sources @ BigStitcher-Spark <<<
[INFO]
[INFO]
[INFO] --- maven-source-plugin:3.2.1:jar (attach-sources-jar) @ BigStitcher-Spark ---
[INFO] Building jar: /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-sources.jar
[INFO]
[INFO] --- jacoco-maven-plugin:0.8.6:report (jacoco-site) @ BigStitcher-Spark ---
[INFO] Skipping JaCoCo execution due to missing execution data file.
[INFO]
[INFO] --- maven-assembly-plugin:3.1.1:single (make-assembly) @ BigStitcher-Spark ---
[INFO] Building jar: /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar
[INFO]
[INFO] --- maven-jar-plugin:3.2.0:test-jar (default) @ BigStitcher-Spark ---
[INFO] Skipping packaging of the test-jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  32.978 s
[INFO] Finished at: 2021-12-25T17:12:02-05:00
[INFO] ------------------------------------------------------------------------
[login1 - moharb@e05u15]~/BigStitcher-Spark>TERMINATE=1 RUNTIME=8:00 TMPDIR=~/tmp ~/spark-janelia/flintstone.sh 3 \
> ~/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar \
> net.preibisch.bigstitcher.spark.AffineFusion \
> -x '/nrs/svoboda/moharb/dataset.xml' \
> -o  '/nrs/svoboda/moharb/output3.n5' \
> -d '/GFP/s0' \
> --channelId 0 \
> --UINT8 \
> --minIntensity 1 \
> --maxIntensity 254

On e05u15.int.janelia.org with Python 3.6.10 :: Anaconda, Inc., running:

  /groups/spruston/home/moharb/spark-janelia/spark-janelia  --consolidate_logs --nnodes=3 --gb_per_slot=15 --driverslots=32 --worker_slots=32 --minworkers=1 --hard_runtime=8:00 --submitargs=" --verbose --conf spark.default.parallelism=270 --conf spark.executor.instances=6 --conf spark.executor.cores=5 --conf spark.executor.memory=75g --class net.preibisch.bigstitcher.spark.AffineFusion /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar -x /nrs/svoboda/moharb/dataset.xml -o /nrs/svoboda/moharb/output3.n5 -d /GFP/s0 --channelId 0 --UINT8 --minIntensity 1 --maxIntensity 254" generate-and-launch-run


Created:
  /groups/spruston/home/moharb/.spark/20211225_171249/conf
  /groups/spruston/home/moharb/.spark/20211225_171249/logs
  /groups/spruston/home/moharb/.spark/20211225_171249/scripts

Running:
  /groups/spruston/home/moharb/.spark/20211225_171249/scripts/00-queue-lsf-jobs.sh

Sat Dec 25 17:12:49 EST 2021 [e05u15.int.janelia.org] submitting jobs to scheduler
This job will be billed to svoboda
Job <114095184> is submitted to default queue <local>.
This job will be billed to svoboda
Job <114095185> is submitted to default queue <short>.
This job will be billed to svoboda
Job <114095186> is submitted to default queue <local>.
This job will be billed to svoboda
Job <114095189> is submitted to default queue <local>.
This job will be billed to svoboda
Job <114095190> is submitted to default queue <short>.

Queued jobs are:
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
114095184  moharb  PEND  local      e05u15                  *171249_ma Dec 25 17:12
114095185  moharb  PEND  short      e05u15                  *171249_ur Dec 25 17:12
114095186  moharb  PEND  local      e05u15                  *249_wo[1] Dec 25 17:12
114095186  moharb  PEND  local      e05u15                  *249_wo[2] Dec 25 17:12
114095186  moharb  PEND  local      e05u15                  *249_wo[3] Dec 25 17:12
114095189  moharb  PEND  local      e05u15                  *171249_dr Dec 25 17:12
114095190  moharb  PEND  short      e05u15                  *171249_sd Dec 25 17:12


To get web user interface URL after master has started, run:
  grep "Bound MasterWebUI to" /groups/spruston/home/moharb/.spark/20211225_171249/logs/01-master.log
These are the driver logs with the error:
Sat Dec 25 17:13:34 EST 2021 [h07u13] running /misc/local/spark-3.0.1/bin/spark-submit --deploy-mode client --master spark://10.36.107.39:7077  --verbose --conf spark.default.parallelism=270 --conf spark.executor.instances=6 --conf spark.executor.cores=5 --conf spark.executor.memory=75g --class net.preibisch.bigstitcher.spark.AffineFusion /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar -x /nrs/svoboda/moharb/dataset.xml -o /nrs/svoboda/moharb/output3.n5 -d /GFP/s0 --channelId 0 --UINT8 --minIntensity 1 --maxIntensity 254
Using properties file: /groups/spruston/home/moharb/.spark/20211225_171249/conf/spark-defaults.conf
Adding default property: spark.storage.blockManagerHeartBeatMs=30000
Adding default property: spark.driver.maxResultSize=0
Adding default property: spark.kryoserializer.buffer.max=1024m
Adding default property: spark.rpc.askTimeout=300s
Adding default property: spark.driver.memory=479g
Adding default property: spark.submit.deployMode=cluster
Adding default property: spark.rpc.retry.wait=30s
Adding default property: spark.core.connection.ack.wait.timeout=600s
Parsed arguments:
  master                  spark://10.36.107.39:7077
  deployMode              client
  executorMemory          75g
  executorCores           5
  totalExecutorCores      null
  propertiesFile          /groups/spruston/home/moharb/.spark/20211225_171249/conf/spark-defaults.conf
  driverMemory            479g
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            6
  files                   null
  pyFiles                 null
  archives                null
  mainClass               net.preibisch.bigstitcher.spark.AffineFusion
  primaryResource         file:/groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar
  name                    net.preibisch.bigstitcher.spark.AffineFusion
  childArgs               [-x /nrs/svoboda/moharb/dataset.xml -o /nrs/svoboda/moharb/output3.n5 -d /GFP/s0 --channelId 0 --UINT8 --minIntensity 1 --maxIntensity 254]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file /groups/spruston/home/moharb/.spark/20211225_171249/conf/spark-defaults.conf:
  (spark.default.parallelism,270)
  (spark.driver.memory,479g)
  (spark.executor.instances,6)
  (spark.executor.memory,75g)
  (spark.rpc.askTimeout,300s)
  (spark.storage.blockManagerHeartBeatMs,30000)
  (spark.kryoserializer.buffer.max,1024m)
  (spark.submit.deployMode,cluster)
  (spark.core.connection.ack.wait.timeout,600s)
  (spark.driver.maxResultSize,0)
  (spark.rpc.retry.wait,30s)
  (spark.executor.cores,5)

    
2021-12-25 17:13:36,473 [main] WARN [NativeCodeLoader]: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Main class:
net.preibisch.bigstitcher.spark.AffineFusion
Arguments:
-x
/nrs/svoboda/moharb/dataset.xml
-o
/nrs/svoboda/moharb/output3.n5
-d
/GFP/s0
--channelId
0
--UINT8
--minIntensity
1
--maxIntensity
254
Spark config:
(spark.storage.blockManagerHeartBeatMs,30000)
(spark.driver.maxResultSize,0)
(spark.jars,file:/groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar)
(spark.kryoserializer.buffer.max,1024m)
(spark.app.name,net.preibisch.bigstitcher.spark.AffineFusion)
(spark.rpc.askTimeout,300s)
(spark.driver.memory,479g)
(spark.executor.instances,6)
(spark.submit.pyFiles,)
(spark.default.parallelism,270)
(spark.submit.deployMode,client)
(spark.master,spark://10.36.107.39:7077)
(spark.rpc.retry.wait,30s)
(spark.executor.memory,75g)
(spark.executor.cores,5)
(spark.core.connection.ack.wait.timeout,600s)
Classpath elements:
file:/groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar


[-x, /nrs/svoboda/moharb/dataset.xml, -o, /nrs/svoboda/moharb/output3.n5, -d, /GFP/s0, --channelId, 0, --UINT8, --minIntensity, 1, --maxIntensity, 254]
mpicbg.spim.data.SpimDataInstantiationException: could not find XmlIoBasicImgLoader implementation for format bdv.n5
	at mpicbg.spim.data.generic.sequence.ImgLoaders.createXmlIoForFormat(ImgLoaders.java:72)
	at mpicbg.spim.data.generic.sequence.XmlIoAbstractSequenceDescription.fromXml(XmlIoAbstractSequenceDescription.java:110)
	at mpicbg.spim.data.generic.XmlIoAbstractSpimData.fromXml(XmlIoAbstractSpimData.java:153)
	at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:164)
	at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:52)
	at mpicbg.spim.data.generic.XmlIoAbstractSpimData.load(XmlIoAbstractSpimData.java:95)
	at net.preibisch.bigstitcher.spark.AffineFusion.call(AffineFusion.java:94)
	at net.preibisch.bigstitcher.spark.AffineFusion.call(AffineFusion.java:39)
	at picocli.CommandLine.executeUserObject(CommandLine.java:1853)
	at picocli.CommandLine.access$1100(CommandLine.java:145)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2255)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2249)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2213)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2080)
	at picocli.CommandLine.execute(CommandLine.java:1978)
	at net.preibisch.bigstitcher.spark.AffineFusion.main(AffineFusion.java:283)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2021-12-25 17:13:37,007 [Thread-1] INFO [ShutdownHookManager]: Shutdown hook called
2021-12-25 17:13:37,007 [Thread-1] INFO [ShutdownHookManager]: Deleting directory /tmp/spark-fcf876a8-e1e9-4de3-bffe-d05ee3fb0a4a

------------------------------------------------------------
Sender: LSF System <lsfadmin@h07u13>
Subject: Job 114095189: <spark_moharb_20211225_171249_dr> in cluster <Janelia> Exited

Job <spark_moharb_20211225_171249_dr> was submitted from host <e05u15> by user <moharb> in cluster <Janelia> at Sat Dec 25 17:12:49 2021
Job was executed on host(s) <32*h07u13>, in queue <local>, as user <moharb> in cluster <Janelia> at Sat Dec 25 17:13:31 2021
</groups/spruston/home/moharb> was used as the home directory.
</groups/spruston/home/moharb/BigStitcher-Spark> was used as the working directory.
Started at Sat Dec 25 17:13:31 2021
Terminated at Sat Dec 25 17:13:37 2021
Results reported at Sat Dec 25 17:13:37 2021

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
/groups/spruston/home/moharb/.spark/20211225_171249/scripts/04-launch-driver.sh
------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   3.96 sec.
    Max Memory :                                 11 MB
    Average Memory :                             11.00 MB
    Total Requested Memory :                     491520.00 MB
    Delta Memory :                               491509.00 MB
    Max Swap :                                   -
    Max Processes :                              4
    Max Threads :                                5
    Run time :                                   6 sec.
    Turnaround time :                            48 sec.

The output (if any) is above this job summary.

The same happens for the non-rigid fusion; am I doing something wrong?

Thanks!
Boaz
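
EDIT to add: for anyone who runs into the same SpimDataInstantiationException later, one thing worth double-checking (just a guess on my part, not verified) is whether the jar handed to spark-submit was built with the fatjar profile rather than with the plain package goal, since a plain assembly jar can end up missing the service/plugin metadata needed to resolve the bdv.n5 image loader:

# possible fix (untested guess): rebuild with the fatjar profile ...
mvn clean package -P fatjar
# ... and point spark-submit / flintstone.sh at the jar that profile produces
# instead of target/BigStitcher-Spark-jar-with-dependencies.jar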
