
Topaz

A pipeline for particle detection in cryo-electron microscopy images using convolutional neural networks trained from positive and unlabeled examples. Topaz also includes methods for micrograph and tomogram denoising using deep denoising models.

Check out our Discussion section for general help, suggestions, and tips on using Topaz. You can also find our documentation site here.

New in v0.2.5

  • Added Relion integration scripts
  • Topaz extract can now write particle coordinates to one file per input micrograph
  • Added Gaussian filter option for after 3D denoising
  • Added info on Topaz Workshops
  • Topaz GUI update
  • Various bug fixes

New in v0.2.4

  • Added 3D denoising with topaz denoise3d and two pretrained 3D denoising models
  • Added argument for setting number of threads to multithreaded commands
  • Topaz GUI update
  • Various bug fixes

New in v0.2.3

  • Improvements to the pretrained denoising models
  • Topaz now includes pretrained particle picking models
  • Updated tutorials
  • Updated GUI to include denoising commands
  • Denoising paper preprint is available here

New in v0.2.2

  • The Topaz publication is out here
  • Bug fixes and GUI update

New in v0.2.0

  • Topaz now supports the newest versions of pytorch (>= 1.0.0). If you have pytorch installed for an older version of topaz, it will need to be upgraded. See installation instructions for details.
  • Added topaz denoise, a command for denoising micrographs using neural networks.
  • Usability improvements to the GUI.

Prerequisites

  • An Nvidia GPU with CUDA support for GPU acceleration.

  • Basic Unix/Linux knowledge.

Installation

(Recommended) Click here to install using Anaconda

If you do not have the Anaconda python distribution, please install it following the instructions on their website.

We strongly recommend installing Topaz into a separate conda environment. To create a conda environment for Topaz:

conda create -n topaz python=3.6 # or 2.7 if you prefer python 2
source activate topaz # this changes to the topaz conda environment, 'conda activate topaz' can be used with anaconda >= 4.4 if properly configured
# source deactivate # returns to the base conda environment

More information on conda environments can be found here.

Install Topaz

To install the precompiled Topaz package and its dependencies, including pytorch:

conda install topaz -c tbepler -c pytorch

This installs pytorch from the official channel. To install pytorch for specific cuda versions, you will need to add the 'cudatoolkit=X.X' package. E.g. to install pytorch for CUDA 9.0:

conda install cudatoolkit=9.0 -c pytorch

or combined into a single command:

conda install topaz cudatoolkit=9.0 -c tbepler -c pytorch

See here for additional pytorch installation instructions.

That's it! Topaz is now installed in your anaconda environment.

Click here to install using Pip

We strongly recommend installing Topaz into a virtual environment. See installation instructions and user guide for virtualenv.

Install Topaz

To install Topaz for Python 3.X

pip3 install topaz-em

for Python 2.7

pip install topaz-em

See here for additional pytorch installation instructions, including how to install pytorch for specific CUDA versions.

That's it! Topaz is now installed through pip.

Click here to install using Docker

Do you have Docker installed? If not, click here

Linux/MacOS    (command line)

Download and install Docker 1.21 or greater for Linux or MacOS.

Consider using a Docker 'convenience script' to install (search on your OS's Docker installation webpage).

Launch docker according to your Docker engine's instructions, typically docker start.

Note: You must have sudo or root access to install Docker. If you do not wish to run Docker as sudo/root, you need to configure user groups as described here: https://docs.docker.com/install/linux/linux-postinstall/

Windows    (GUI & command line)

Download and install Docker Toolbox for Windows.

Launch Kitematic.

If Kitematic displays a red error on first startup suggesting that you run it using VirtualBox, do so.

Note: Docker Toolbox for MacOS has not yet been tested.

What is Docker?

This tutorial explains why Docker is useful.


A Dockerfile is provided to build images with CUDA support. Build from the github repo:

docker build -t topaz https://github.com/tbepler/topaz.git

or download the source code and build from the source directory

git clone https://github.com/tbepler/topaz
cd topaz
docker build -t topaz .

Click here to install using Singularity

A prebuilt Singularity image for Topaz is available here and can be installed with:

singularity pull shub://nysbc/topaz

Then, you can run topaz from within the singularity image with (paths must be changed appropriately):

singularity exec --nv -B /mounted_path:/mounted_path /path/to/singularity/container/topaz_latest.sif /usr/local/conda/bin/topaz

Click here to install from source

Recommended: install Topaz into a virtual Python environment
See https://conda.io/docs/user-guide/tasks/manage-environments.html or https://virtualenv.pypa.io/en/stable/ for setting one up.

Install the dependencies

Tested with python 3.6 and 2.7

  • pytorch (>= 1.0.0)
  • torchvision
  • pillow (>= 6.2.0)
  • numpy (>= 1.11)
  • pandas (>= 0.20.3)
  • scipy (>= 0.19.1)
  • scikit-learn (>= 0.19.0)

Easy installation of dependencies with conda

conda install numpy pandas scikit-learn
conda install -c pytorch pytorch torchvision

For more info on installing pytorch for your CUDA version see https://pytorch.org/get-started/locally/

Download the source code

git clone https://github.com/tbepler/topaz

Install Topaz

Move to the source code directory

cd topaz

By default, this will be the most recent version of the topaz source code. To install a specific older version, checkout that commit. For example, for v0.1.0 of Topaz:

git checkout v0.1.0

Note that older Topaz versions may have different dependencies. Refer to the README for the specific Topaz version.

Install Topaz into your Python path including the topaz command line interface

pip install .

To install for development use

pip install -e .

Topaz is also available through SBGrid.

Tutorial

The tutorials are presented in Jupyter notebooks. Please install Jupyter following the instructions here.

  1. Quick start guide
  2. Complete walkthrough
  3. Cross validation
  4. Micrograph denoising

The tutorial data can be downloaded here.

To run the tutorial steps on your own system, you will need to install Jupyter and matplotlib, which are used for visualization.

With Anaconda this can be done with:

conda install jupyter matplotlib

If you installed Topaz using anaconda, make sure these are installed into your Topaz environment.

User guide

Click here for a description of the Topaz pipeline and its commands

The command line interface is structured as a single entry command (topaz) with different steps defined as subcommands. A general usage guide is provided below with brief instructions for the most important subcommands in the particle picking pipeline.

To see a list of all subcommands with a brief description of each, run topaz --help

Image preprocessing

Downsampling (topaz downsample)

It is recommended to downsample and normalize images prior to model training and prediction.

The downsample script uses the discrete Fourier transform to reduce the spatial resolution of images. It can be used as

topaz downsample --scale={downsampling factor} --output={output image path} {input image path} 
usage: topaz downsample [-h] [-s SCALE] [-o OUTPUT] [-v] file

positional arguments:
  file

optional arguments:
  -h, --help            show this help message and exit
  -s SCALE, --scale SCALE
                        downsampling factor (default: 4)
  -o OUTPUT, --output OUTPUT
                        output file
  -v, --verbose         print info
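
For intuition, the following is a minimal sketch of Fourier-crop downsampling in numpy. It is not Topaz's exact implementation; the function name and the intensity rescaling at the end are illustrative assumptions.

import numpy as np

def fourier_downsample(img, scale=4):
    # Crop the centered 2D Fourier transform to reduce spatial resolution.
    n, m = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))
    nn, mm = n // scale, m // scale
    ci, cj = n // 2, m // 2
    # keep only the central nn x mm block of low frequencies
    F_crop = F[ci - nn // 2 : ci + (nn - nn // 2), cj - mm // 2 : cj + (mm - mm // 2)]
    small = np.fft.ifft2(np.fft.ifftshift(F_crop)).real
    # rescale so the mean pixel intensity is preserved after cropping
    return small * (nn * mm) / (n * m)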

Normalization (topaz normalize)

The normalize script can then be used to normalize the images. This script fits a two-component Gaussian mixture model with an additional scaling multiplier per image to capture carbon pixels and account for differences in exposure. The pixel values are then adjusted by dividing each image by its scaling factor and then subtracting the mean and dividing by the standard deviation of the dominant Gaussian mixture component. It can be used as

topaz normalize --destdir={directory to put normalized images} [list of image files]
usage: topaz normalize [-h] [-s SAMPLE] [--niters NITERS] [--seed SEED]
                       [-o DESTDIR] [-v]
                       files [files ...]

positional arguments:
  files

optional arguments:
  -h, --help            show this help message and exit
  -s SAMPLE, --sample SAMPLE
                        pixel sampling factor for model fit (default: 100)
  --niters NITERS       number of iterations to run for model fit (default:
                        200)
  --seed SEED           random seed for model initialization (default: 1)
  -o DESTDIR, --destdir DESTDIR
                        output directory
  -v, --verbose         verbose output
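
The idea behind the fit can be sketched as follows. This is a simplified illustration using scikit-learn that ignores the per-image scaling multiplier Topaz also fits; it is not the actual normalize implementation.

import numpy as np
from sklearn.mixture import GaussianMixture

def normalize_image(img, sample=100, seed=1):
    # fit a two-component GMM to a subsample of the pixel values
    x = img.ravel()[::sample].reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=seed).fit(x)
    # treat the dominant (largest-weight) component as the background distribution
    k = int(np.argmax(gmm.weights_))
    mu = gmm.means_[k, 0]
    std = np.sqrt(gmm.covariances_[k, 0, 0])
    # standardize against the dominant component
    return (img - mu) / std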

Single-step preprocessing (topaz preprocess)

Both downsampling and normalization can be performed in one step with the preprocess script.

topaz preprocess --scale={downsampling factor} --destdir={directory to put processed images} [list of image files]
usage: topaz preprocess [-h] [-s SCALE] [-t NUM_WORKERS]
                        [--pixel-sampling PIXEL_SAMPLING] [--niters NITERS]
                        [--seed SEED] -o DESTDIR [-v]
                        files [files ...]

positional arguments:
  files

optional arguments:
  -h, --help            show this help message and exit
  -s SCALE, --scale SCALE
                        rescaling factor for image downsampling (default: 4)
  -t NUM_WORKERS, --num-workers NUM_WORKERS
                        number of processes to use for parallel image
                        downsampling (default: 0)
  --pixel-sampling PIXEL_SAMPLING
                        pixel sampling factor for model fit (default: 100)
  --niters NITERS       number of iterations to run for model fit (default:
                        200)
  --seed SEED           random seed for model initialization (default: 1)
  -o DESTDIR, --destdir DESTDIR
                        output directory
  -v, --verbose         verbose output

Model training

File formats

The training script requires a file listing the image file paths and another listing the particle coordinates. Coordinates index images from the top left. These files should be tab delimited with headers as follows:

image file list

image_name	path
...

particle coordinates

image_name	x_coord	y_coord
...
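
For example, both files can be written with pandas (file names and coordinates below are placeholders):

import pandas as pd

images = pd.DataFrame({
    'image_name': ['mic_0001', 'mic_0002'],
    'path': ['processed/mic_0001.mrc', 'processed/mic_0002.mrc'],
})
images.to_csv('image_list.txt', sep='\t', index=False)

particles = pd.DataFrame({
    'image_name': ['mic_0001', 'mic_0001', 'mic_0002'],
    'x_coord': [1024, 2048, 512],
    'y_coord': [768, 1536, 640],
})
particles.to_csv('particles.txt', sep='\t', index=False)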

Train region classifiers with labeled particles (topaz train)

Models are trained using the topaz train command. For a complete list of training arguments, see

topaz train --help
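
For reference, a typical invocation looks like the following (particle count, worker count, and paths are placeholders modeled on the tutorial-style examples later on this page):

topaz train -n 300 \
    --num-workers 8 \
    --train-images image_list_train.txt \
    --train-targets particles_train.txt \
    --save-prefix saved_models/model \
    -o saved_models/model_training.txt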

Segmentation and particle extraction

Segmentation (topaz segment, optional)

Images can be segmented using the topaz segment command with a trained model.

usage: topaz segment [-h] [-m MODEL] [-o DESTDIR] [-d DEVICE] [-v]
                     paths [paths ...]

positional arguments:
  paths                 paths to image files for processing

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        path to trained classifier
  -o DESTDIR, --destdir DESTDIR
                        output directory
  -d DEVICE, --device DEVICE
                        which device to use, <0 corresponds to CPU (default:
                        GPU if available)
  -v, --verbose         verbose mode
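
For example (model path, output directory, and micrograph paths are placeholders):

topaz segment -m saved_models/model_epoch10.sav -o segmented/ processed/micrographs/*.mrc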

Particle extraction (topaz extract)

Predicted particle coordinates can be extracted directly from saved segmented images (see above) or images can be segmented and particles extracted in one step given a trained model using the topaz extract command.

usage: topaz extract [-h] [-m MODEL] [-r RADIUS] [-t THRESHOLD]
                     [--assignment-radius ASSIGNMENT_RADIUS]
                     [--min-radius MIN_RADIUS] [--max-radius MAX_RADIUS]
                     [--step-radius STEP_RADIUS] [--num-workers NUM_WORKERS]
                     [--targets TARGETS] [--only-validate] [-d DEVICE]
                     [-o OUTPUT]
                     paths [paths ...]

positional arguments:
  paths                 paths to image files for processing

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        path to trained subimage classifier, if no model is
                        supplied input images must already be segmented
  -r RADIUS, --radius RADIUS
                        radius of the regions to extract
  -t THRESHOLD, --threshold THRESHOLD
                        score quantile giving threshold at which to terminate
                        region extraction (default: 0.5)
  --assignment-radius ASSIGNMENT_RADIUS
                        maximum distance between prediction and labeled target
                        allowed for considering them a match (default: same as
                        extraction radius)
  --min-radius MIN_RADIUS
                        minimum radius for region extraction when tuning
                        radius parameter (default: 5)
  --max-radius MAX_RADIUS
                        maximum radius for region extraction when tuning
                        radius parameters (default: 100)
  --step-radius STEP_RADIUS
                        grid size when searching for optimal radius parameter
                        (default: 5)
  --num-workers NUM_WORKERS
                        number of processes to use for extracting in parallel,
                        0 uses main process (default: 0)
  --targets TARGETS     path to file specifying particle coordinates. used to
                        find extraction radius that maximizes the AUPRC
  --only-validate       flag indicating to only calculate validation metrics.
                        does not report full prediction list
  -d DEVICE, --device DEVICE
                        which device to use, <0 corresponds to CPU
  -o OUTPUT, --output OUTPUT
                        file path to write
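
For reference, a typical invocation looks like the following (radius, model path, and file paths are placeholders):

topaz extract -r 14 \
    -m saved_models/model_epoch10.sav \
    -o predicted_particles.txt \
    processed/micrographs/*.mrc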

This script uses the non-maximum suppression algorithm to greedily select particle coordinates and remove nearby coordinates from the candidates list. Two additional parameters are involved in this process.

  • radius: coordinates within this parameter of selected coordinates are removed from the candidates list
  • threshold: specifies the score quantile below which extraction stops

The radius parameter can be tuned automatically given a set of known particle coordinates by finding the radius which maximizes the average precision score. In this case, predicted coordinates must be assigned to target coordinates which requires an additional distance threshold (--assignment-radius).
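
The greedy selection can be sketched as follows. This is a schematic illustration rather than Topaz's implementation, and it applies the threshold directly to scores instead of as a score quantile.

import numpy as np

def greedy_nms(coords, scores, radius, threshold):
    # coords: (N, 2) candidate (x, y) positions sorted by descending score
    # scores: matching (N,) array of scores
    keep = []
    suppressed = np.zeros(len(coords), dtype=bool)
    for i in range(len(coords)):
        if scores[i] < threshold:
            break  # remaining candidates score even lower, so stop
        if suppressed[i]:
            continue
        keep.append(i)
        # remove nearby candidates from further consideration
        d = np.linalg.norm(coords - coords[i], axis=1)
        suppressed |= d < radius
    return coords[keep], scores[keep]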

Choosing a final particle list threshold (topaz precision_recall_curve)

Particles extracted using Topaz still have scores associated with them and a final particle list should be determined by choosing particles above some score threshold. The topaz precision_recall_curve command can facilitate this by reporting the precision-recall curve for a list of predicted particle coordinates and a list of known target coordinates. A threshold can then be chosen to optimize the F1 score or for specific recall/precision levels on a heldout set of micrographs.

usage: topaz precision_recall_curve [-h] [--predicted PREDICTED]
                                    [--targets TARGETS] -r ASSIGNMENT_RADIUS

optional arguments:
  -h, --help            show this help message and exit
  --predicted PREDICTED
                        path to file containing predicted particle coordinates
                        with scores
  --targets TARGETS     path to file specifying target particle coordinates
  -r ASSIGNMENT_RADIUS, --assignment-radius ASSIGNMENT_RADIUS
                        maximum distance between prediction and labeled target
                        allowed for considering them a match
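
As a sketch of how a threshold might then be chosen, the snippet below picks the value maximizing F1 from a parsed precision-recall table; the file name and column names are assumptions about how you store the command's output.

import pandas as pd

pr = pd.read_csv('precision_recall.txt', sep='\t')  # hypothetical parsed output
f1 = 2 * pr['precision'] * pr['recall'] / (pr['precision'] + pr['recall'])
best = pr.loc[f1.idxmax()]
print('best threshold:', best['threshold'], 'F1:', f1.max())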

Click here for a description of the model architectures, training methods, and training radius

Model architectures

Currently, there are several model architectures available for use as the region classifier

  • resnet8 [receptive field = 71]
  • conv127 [receptive field = 127]
  • conv63 [receptive field = 63]
  • conv31 [receptive field = 31]

ResNet8 gives a good balance of performance and receptive field size. Conv63 and Conv31 can be better choices when less complex models are needed.

The number of units in the base layer can be set with the --units flag. ResNet8 always doubles the number of units when the image is strided during processing. Conv31, Conv63, and Conv127 do not by default, but the --unit-scaling flag can be used to set a multiplicative factor on the number of units when striding occurs.

The pooling scheme can be changed for the conv* models. The default is not to perform any pooling, but max pooling and average pooling can be used by specifying "--pooling=max" or "--pooling=avg".

For a detailed layout of the architectures, use the --describe flag.
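
For example, combining these flags with --describe should print the layout of a conv63 model with 64 base units and max pooling (flag combination shown only for illustration):

topaz train --model conv63 --units 64 --pooling max --describe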

Training methods

The PN method option treats every coordinate not labeled as positive (y=1) as negative (y=0) and then optimizes the standard classification objective: $$ \pi E_{y=1}[L(g(x),1)] + (1-\pi)E_{y=0}[L(g(x),0)] $$ where $\pi$ is a parameter weighting the positives and negatives, $L$ is the misclassification cost function, and $g(x)$ is the model output.

The GE-binomial method option instead treats coordinates not labeled as positive (y=1) as unlabeled (y=?) and then optimizes an objective including a generalized expectation criteria designed to work well with minibatch SGD.

The GE-KL method option instead treats coordinates not labeled as positive (y=1) as unlabeled (y=?) and then optimizes the objective: $$ E_{y=1}[L(g(x),1)] + \lambda\,\mathrm{KL}(\pi, E_{y=?}[g(x)]) $$ where $\lambda$ is a slack parameter (--slack flag) that specifies how strongly to weight the KL divergence of the expectation of the classifier over the unlabeled data from $\pi$.
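
As a schematic illustration of the GE-KL idea (not Topaz's training code), the loss below combines a supervised term on labeled positives with the Bernoulli KL divergence between $\pi$ and the mean sigmoid output over an unlabeled minibatch; the function name and default slack value are assumptions.

import torch
import torch.nn.functional as F

def ge_kl_loss(logits_labeled, logits_unlabeled, pi, slack=10.0):
    # supervised term on the labeled positive examples
    pos_loss = F.binary_cross_entropy_with_logits(
        logits_labeled, torch.ones_like(logits_labeled))
    # Bernoulli KL between pi and the expected positive rate on unlabeled data
    q = torch.sigmoid(logits_unlabeled).mean().clamp(1e-6, 1 - 1e-6)
    kl = pi * torch.log(pi / q) + (1 - pi) * torch.log((1 - pi) / (1 - q))
    return pos_loss + slack * kl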

The PU method uses the objective function proposed by Kiryo et al. (2017).

Radius

This sets how many pixels around each particle coordinate are treated as positive, acting as a form of data augmentation. These coordinates follow a distribution that results from which pixel was selected as the particle center when the data was labeled. The radius should be chosen to be large enough that it covers a reasonable region of pixels likely to have been selected but not so large that pixels outside of the particles are labeled as positives.
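
As a rough illustration with numbers from the tutorial output later on this page: with a radius of 3, each labeled particle contributes roughly $\pi r^2 \approx 28$ positive pixels, which matches 1000 labeled particles yielding about 29,000 positive regions in that training log.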

A user guide is also built into the Topaz GUI.

Integration

Topaz also integrates with RELION, CryoSPARC, Scipion, and Appion. You can find information and tutorials here:

RELION: https://github.com/tbepler/topaz/tree/master/relion_run_topaz

CryoSPARC: https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/deep-picking/deep-picking

Scipion: https://github.com/scipion-em/scipion-em-topaz

References

Topaz

Bepler, T., Morin, A., Rapp, M., Brasch, J., Shapiro, L., Noble, A.J., Berger, B. Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. Nat Methods 16, 1153–1160 (2019). https://doi.org/10.1038/s41592-019-0575-8

Bibtex

@Article{Bepler2019,
author={Bepler, Tristan
and Morin, Andrew
and Rapp, Micah
and Brasch, Julia
and Shapiro, Lawrence
and Noble, Alex J.
and Berger, Bonnie},
title={Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs},
journal={Nature Methods},
year={2019},
issn={1548-7105},
doi={10.1038/s41592-019-0575-8},
url={https://doi.org/10.1038/s41592-019-0575-8}
}

Topaz-Denoise

Bepler, T., Kelley, K., Noble, A.J., Berger, B. Topaz-Denoise: general deep denoising models for cryoEM and cryoET. Nat Commun 11, 5208 (2020). https://doi.org/10.1038/s41467-020-18952-1

Bibtex

@Article{Bepler2020_topazdenoise,
author={Bepler, Tristan
and Kelley, Kotaro
and Noble, Alex J.
and Berger, Bonnie},
title={Topaz-Denoise: general deep denoising models for cryoEM and cryoET},
journal={Nature Communications},
year={2020},
issn={2041-1723},
doi={10.1038/s41467-020-18952-1},
url={https://doi.org/10.1038/s41467-020-18952-1}
}

Authors

Tristan Bepler

Alex J. Noble

Topaz Workshop

To request a Topaz Workshop for academic or non-academic purposes, send a request to:

<alexjnoble [at] gmail [dot] com> & <tbepler [at] gmail [dot] com>

License

Topaz is open source software released under the GNU General Public License, Version 3.

Bugs & Suggestions

Please report bugs and make specific feature requests and suggestions for improvements as a Github issue.

For general help, questions, suggestions, tips, and installation/setup assistance, please take a look at our new Discussion section.

topaz's People

Contributors

alexjnoble, biochem-fan, darnellgranberry, fullerjamesr, guillawme, jayjaewonyoo, kephale, tbepler


topaz's Issues

bug in running tutorial

I installed v0.2.3. When I run the tutorial, as well as my own data, it always comes to errors like the ones below:
UserWarning: Couldn't retrieve source code for container of type LinearClassifier. It won't be checked for correctness upon loading.
UserWarning: Couldn't retrieve source code for container of type ResNet8. It won't be checked for correctness upon loading.
UserWarning: Couldn't retrieve source code for container of type Sequential. It won't be checked for correctness upon loading.
UserWarning: Couldn't retrieve source code for container of type BasicConv2d. It won't be checked for correctness upon loading.
UserWarning: Couldn't retrieve source code for container of type Conv2d. It won't be checked for correctness upon loading.
UserWarning: Couldn't retrieve source code for container of type ReLU. It won't be checked for correctness upon loading.
UserWarning: Couldn't retrieve source code for container of type ResidA. It won't be checked for correctness upon loading.
and the program gets stuck.
Please help! Thanks!

Minor documentation

Matplotlib is mentioned as necessary for visualization in the tutorial, but not as a core dependency.

Just running

$ topaz --help

requires matplotlib to be installed.

Not so significant, but I think matplotlib should be listed in the deps for installing from source.

`topaz preprocess --scale 1` produces numpy error

Changing --scale from 1 to 2 solves the problem:

[node48 ~]$ /gpfs/sw/bin/topaz preprocess /my/files/training_images/*mrc --scale 1 --num-workers -1 --format mrc,png --pixel-sampling 25 --niters 200 --seed 1 --verbose --destdir /my/files/training_images/pre

WARNING: Could not find the Nvidia SMI binary to bind into container

[1 of 32] downsampled: gr1_00001gr_00007sq_v02_00006hln_v01_pixelsize63.68

[2 of 32] downsampled: gr1_00021gr_00024sq_v02_00008hln_v01_pixelsize63.68

[3 of 32] downsampled: gr1a_00011gr_00054sq_v02_00003hln_v01_pixelsize63.68

[4 of 32] downsampled: gr1a_00011gr_00055sq_v02_00004hln_v02_pixelsize63.68

[5 of 32] downsampled: gr1a_00017gr_00005sq_v02_00005hln_v01_pixelsize63.68

[6 of 32] downsampled: gr1a_00017gr_00006sq_v02_pixelsize246.24

[7 of 32] downsampled: gr1a_00017gr_00017sq_v01_pixelsize246.24

[8 of 32] downsampled: gr1a_00021gr_00002sq_v02_pixelsize246.24

[9 of 32] downsampled: ps2_00001gr_00021sq_v02_00015hln_pixelsize92.60

[10 of 32] downsampled: ps2_00002gr_pixelsize4069.76

[11 of 32] downsampled: ps2_00006gr_00030sq_v02_00015hln_v01_pixelsize92.60

[12 of 32] downsampled: ps2_00009gr_00015sq_v02_00009hln_pixelsize92.60

[13 of 32] downsampled: ps2_00016gr_00007sq_v01_pixelsize208.88

[14 of 32] downsampled: ps2_00023gr_00022sq_v01_pixelsize208.88

[15 of 32] downsampled: ps2_00023gr_00024sq_pixelsize208.88

[16 of 32] downsampled: ps2_00023gr_pixelsize4069.76

[17 of 32] downsampled: gr9_00030gr_00008sq_v01_00002hl_pixelsize70.68

[18 of 32] downsampled: gr9_00030gr_00011sq_pixelsize370.88

[19 of 32] downsampled: grid1_00009gr_pixelsize4069.76

[20 of 32] downsampled: grid2_00005gr_00001sq_v02_00004hln

[21 of 32] downsampled: grid4_00016gr_00001sq_v02_00002hln_v02_pixelsize130.55

[22 of 32] downsampled: grid4_00018gr_00034sq_pixelsize208.88

[23 of 32] downsampled: grid4_00021gr_00043sq_v01_pixelsize208.88

[24 of 32] downsampled: grid4_00022gr_pixelsize4069.76

[25 of 32] downsampled: grid1_00008gr_pixelsize3327.60

[26 of 32] downsampled: grid1_00012gr_00051sq_pixelsize246.82

[27 of 32] downsampled: grid1_00015gr_00058sq_pixelsize246.82

[28 of 32] downsampled: grid1_00016gr_00002sq_00001hl_pixelsize173.21

[29 of 32] downsampled: grid1_00022gr_00010sq_v01_00002hl_pixelsize_173.21

[30 of 32] downsampled: grid1_00030gr_pixelsize3327.60

[31 of 32] downsampled: gold_00002gr_pixelsize4069.76

[32 of 32] downsampled: negstain_00025gr_pixelsize3240.00

fit scaled GMM, niters=200, sample=25, seed=1

Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/topazenv/bin/topaz", line 11, in <module>
    load_entry_point('topaz==0.1.0', 'console_scripts', 'topaz')()
  File "/usr/local/anaconda3/envs/topazenv/lib/python3.6/site-packages/topaz-0.1.0-py3.6.egg/topaz/main.py", line 144, in main
    args.func(args)
  File "/usr/local/anaconda3/envs/topazenv/lib/python3.6/site-packages/topaz-0.1.0-py3.6.egg/topaz/commands/preprocess.py", line 120, in main
    images, metadata = sgmm_scaling(images, niters, samples, seed, verbose)
  File "/usr/local/anaconda3/envs/topazenv/lib/python3.6/site-packages/topaz-0.1.0-py3.6.egg/topaz/commands/preprocess.py", line 72, in sgmm_scaling
    Xsample = [x.ravel()[::sample] for x in X]
  File "/usr/local/anaconda3/envs/topazenv/lib/python3.6/site-packages/topaz-0.1.0-py3.6.egg/topaz/commands/preprocess.py", line 72, in <listcomp>
    Xsample = [x.ravel()[::sample] for x in X]
AttributeError: 'Image' object has no attribute 'ravel'

[node48 ~]$ /gpfs/sw/bin/topaz preprocess /my/files/training_images/*mrc --scale 2 --num-workers -1 --format mrc,png --pixel-sampling 25 --niters 200 --seed 1 --verbose --destdir /my/files/training_images/pre

WARNING: Could not find the Nvidia SMI binary to bind into container

[1 of 32] downsampled: gr1_00001gr_00007sq_v02_00006hln_v01_pixelsize63.68

[2 of 32] downsampled: gr1_00021gr_00024sq_v02_00008hln_v01_pixelsize63.68

[3 of 32] downsampled: gr1a_00011gr_00054sq_v02_00003hln_v01_pixelsize63.68

[4 of 32] downsampled: gr1a_00011gr_00055sq_v02_00004hln_v02_pixelsize63.68

[5 of 32] downsampled: gr1a_00017gr_00005sq_v02_00005hln_v01_pixelsize63.68

[6 of 32] downsampled: gr1a_00017gr_00006sq_v02_pixelsize246.24

[7 of 32] downsampled: gr1a_00017gr_00017sq_v01_pixelsize246.24

[8 of 32] downsampled: gr1a_00021gr_00002sq_v02_pixelsize246.24

[9 of 32] downsampled: ps2_00001gr_00021sq_v02_00015hln_pixelsize92.60

[10 of 32] downsampled: ps2_00002gr_pixelsize4069.76

[11 of 32] downsampled: ps2_00006gr_00030sq_v02_00015hln_v01_pixelsize92.60

[12 of 32] downsampled: ps2_00009gr_00015sq_v02_00009hln_pixelsize92.60

[13 of 32] downsampled: ps2_00016gr_00007sq_v01_pixelsize208.88

[14 of 32] downsampled: ps2_00023gr_00022sq_v01_pixelsize208.88

[15 of 32] downsampled: ps2_00023gr_00024sq_pixelsize208.88

[16 of 32] downsampled: ps2_00023gr_pixelsize4069.76

[17 of 32] downsampled: gr9_00030gr_00008sq_v01_00002hl_pixelsize70.68

[18 of 32] downsampled: gr9_00030gr_00011sq_pixelsize370.88

[19 of 32] downsampled: grid1_00009gr_pixelsize4069.76

[20 of 32] downsampled: grid2_00005gr_00001sq_v02_00004hln

[21 of 32] downsampled: grid4_00016gr_00001sq_v02_00002hln_v02_pixelsize130.55

[22 of 32] downsampled: grid4_00018gr_00034sq_pixelsize208.88

[23 of 32] downsampled: grid4_00021gr_00043sq_v01_pixelsize208.88

[24 of 32] downsampled: grid4_00022gr_pixelsize4069.76

[25 of 32] downsampled: grid1_00008gr_pixelsize3327.60

[26 of 32] downsampled: grid1_00012gr_00051sq_pixelsize246.82

[27 of 32] downsampled: grid1_00015gr_00058sq_pixelsize246.82

[28 of 32] downsampled: grid1_00016gr_00002sq_00001hl_pixelsize173.21

[29 of 32] downsampled: grid1_00022gr_00010sq_v01_00002hl_pixelsize_173.21

[30 of 32] downsampled: grid1_00030gr_pixelsize3327.60

[31 of 32] downsampled: gold_00002gr_pixelsize4069.76

[32 of 32] downsampled: negstain_00025gr_pixelsize3240.00

fit scaled GMM, niters=200, sample=25, seed=1

[000] logp=-2820459.6133811227

[001] logp=-2806864.78066376

[002] logp=-2806819.454160788

[003] logp=-2806814.8822830096

[004] logp=-2806806.411051859

[005] logp=-2806793.5246113804

[006] logp=-2806787.9846159355

[007] logp=-2806808.928660144

logp tolerance reached

weights: [0.501386273757172, 0.49861372624282796]

means: [4737.694066615208, 4829.18809045471]

variances: [7306609.740475055, 8764106.58159098]

saving: /my/files/training_images/pre/gr1_00001gr_00007sq_v02_00006hln_v01_pixelsize63.68.mrc

saving: /my/files/training_images/pre/gr1_00001gr_00007sq_v02_00006hln_v01_pixelsize63.68.png

saving: /my/files/training_images/pre/gr1_00021gr_00024sq_v02_00008hln_v01_pixelsize63.68.mrc

saving: /my/files/training_images/pre/gr1_00021gr_00024sq_v02_00008hln_v01_pixelsize63.68.png

saving: /my/files/training_images/pre/gr1a_00011gr_00054sq_v02_00003hln_v01_pixelsize63.68.mrc

saving: /my/files/training_images/pre/gr1a_00011gr_00054sq_v02_00003hln_v01_pixelsize63.68.png

saving: /my/files/training_images/pre/gr1a_00011gr_00055sq_v02_00004hln_v02_pixelsize63.68.mrc

saving: /my/files/training_images/pre/gr1a_00011gr_00055sq_v02_00004hln_v02_pixelsize63.68.png

saving: /my/files/training_images/pre/gr1a_00017gr_00005sq_v02_00005hln_v01_pixelsize63.68.mrc

saving: /my/files/training_images/pre/gr1a_00017gr_00005sq_v02_00005hln_v01_pixelsize63.68.png

saving: /my/files/training_images/pre/gr1a_00017gr_00006sq_v02_pixelsize246.24.mrc

saving: /my/files/training_images/pre/gr1a_00017gr_00006sq_v02_pixelsize246.24.png

saving: /my/files/training_images/pre/gr1a_00017gr_00017sq_v01_pixelsize246.24.mrc

saving: /my/files/training_images/pre/gr1a_00017gr_00017sq_v01_pixelsize246.24.png

saving: /my/files/training_images/pre/gr1a_00021gr_00002sq_v02_pixelsize246.24.mrc

saving: /my/files/training_images/pre/gr1a_00021gr_00002sq_v02_pixelsize246.24.png

saving: /my/files/training_images/pre/ps2_00001gr_00021sq_v02_00015hln_pixelsize92.60.mrc

saving: /my/files/training_images/pre/ps2_00001gr_00021sq_v02_00015hln_pixelsize92.60.png

saving: /my/files/training_images/pre/ps2_00002gr_pixelsize4069.76.mrc

saving: /my/files/training_images/pre/ps2_00002gr_pixelsize4069.76.png

saving: /my/files/training_images/pre/ps2_00006gr_00030sq_v02_00015hln_v01_pixelsize92.60.mrc

saving: /my/files/training_images/pre/ps2_00006gr_00030sq_v02_00015hln_v01_pixelsize92.60.png

saving: /my/files/training_images/pre/ps2_00009gr_00015sq_v02_00009hln_pixelsize92.60.mrc

saving: /my/files/training_images/pre/ps2_00009gr_00015sq_v02_00009hln_pixelsize92.60.png

saving: /my/files/training_images/pre/ps2_00016gr_00007sq_v01_pixelsize208.88.mrc

saving: /my/files/training_images/pre/ps2_00016gr_00007sq_v01_pixelsize208.88.png

saving: /my/files/training_images/pre/ps2_00023gr_00022sq_v01_pixelsize208.88.mrc

saving: /my/files/training_images/pre/ps2_00023gr_00022sq_v01_pixelsize208.88.png

saving: /my/files/training_images/pre/ps2_00023gr_00024sq_pixelsize208.88.mrc

saving: /my/files/training_images/pre/ps2_00023gr_00024sq_pixelsize208.88.png

saving: /my/files/training_images/pre/ps2_00023gr_pixelsize4069.76.mrc

saving: /my/files/training_images/pre/ps2_00023gr_pixelsize4069.76.png

saving: /my/files/training_images/pre/gr9_00030gr_00008sq_v01_00002hl_pixelsize70.68.mrc

saving: /my/files/training_images/pre/gr9_00030gr_00008sq_v01_00002hl_pixelsize70.68.png

saving: /my/files/training_images/pre/gr9_00030gr_00011sq_pixelsize370.88.mrc

saving: /my/files/training_images/pre/gr9_00030gr_00011sq_pixelsize370.88.png

saving: /my/files/training_images/pre/grid1_00009gr_pixelsize4069.76.mrc

saving: /my/files/training_images/pre/grid1_00009gr_pixelsize4069.76.png

saving: /my/files/training_images/pre/grid2_00005gr_00001sq_v02_00004hln.mrc

saving: /my/files/training_images/pre/grid2_00005gr_00001sq_v02_00004hln.png

saving: /my/files/training_images/pre/grid4_00016gr_00001sq_v02_00002hln_v02_pixelsize130.55.mrc

saving: /my/files/training_images/pre/grid4_00016gr_00001sq_v02_00002hln_v02_pixelsize130.55.png

saving: /my/files/training_images/pre/grid4_00018gr_00034sq_pixelsize208.88.mrc

saving: /my/files/training_images/pre/grid4_00018gr_00034sq_pixelsize208.88.png

saving: /my/files/training_images/pre/grid4_00021gr_00043sq_v01_pixelsize208.88.mrc

saving: /my/files/training_images/pre/grid4_00021gr_00043sq_v01_pixelsize208.88.png

saving: /my/files/training_images/pre/grid4_00022gr_pixelsize4069.76.mrc

saving: /my/files/training_images/pre/grid4_00022gr_pixelsize4069.76.png

saving: /my/files/training_images/pre/grid1_00008gr_pixelsize3327.60.mrc

saving: /my/files/training_images/pre/grid1_00008gr_pixelsize3327.60.png

saving: /my/files/training_images/pre/grid1_00012gr_00051sq_pixelsize246.82.mrc

saving: /my/files/training_images/pre/grid1_00012gr_00051sq_pixelsize246.82.png

saving: /my/files/training_images/pre/grid1_00015gr_00058sq_pixelsize246.82.mrc

saving: /my/files/training_images/pre/grid1_00015gr_00058sq_pixelsize246.82.png

saving: /my/files/training_images/pre/grid1_00016gr_00002sq_00001hl_pixelsize173.21.mrc

saving: /my/files/training_images/pre/grid1_00016gr_00002sq_00001hl_pixelsize173.21.png

saving: /my/files/training_images/pre/grid1_00022gr_00010sq_v01_00002hl_pixelsize_173.21.mrc

saving: /my/files/training_images/pre/grid1_00022gr_00010sq_v01_00002hl_pixelsize_173.21.png

saving: /my/files/training_images/pre/grid1_00030gr_pixelsize3327.60.mrc

saving: /my/files/training_images/pre/grid1_00030gr_pixelsize3327.60.png

saving: /my/files/training_images/pre/gold_00002gr_pixelsize4069.76.mrc

saving: /my/files/training_images/pre/gold_00002gr_pixelsize4069.76.png

saving: /my/files/training_images/pre/negstain_00025gr_pixelsize3240.00.mrc

saving: /my/files/training_images/pre/negstain_00025gr_pixelsize3240.00.png

saving metadata: /my/files/training_images/pre/metadata.json

[node48 ~]$

Question about cross-validation (choosing 'n' and number of epochs)

I'm trying to find the appropriate parameters for training using the cross-validation walk-through. On a manually picked data set (~3000 picks), I found a clear peak in the plot of AUPRC vs. epoch for various n.

However, for a data set that I am trying to re-pick with a more even particle distribution (it had a severe orientation bias with DoG picks), using ~90,000 picks for training with r=3 on 7.2 A/px micrographs (bin8) of a ~120 A diameter D2-symmetric particle, I get:

[image attached: plot of AUPRC vs. epoch for various n]

Is this hyperbolic character evidence of something that I messed up or that I need to tune a particular parameter?

If it helps, I expect 300-500 particles per micrograph based on manually picking a few.

topaz star_particles_threshold command crashed

I tried to use this command to keep particles that have a score >= -2, but it crashed.

command:

topaz star_particles_threshold -o particles-t-2.star -t -2 predicted_particles_all_upsampled.star

error message:
Traceback (most recent call last):
  File "/home/conda/apps/conda/envs/topaz/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2657, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ParticleScore'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/conda/apps/conda/envs/topaz/bin/topaz", line 11, in <module>
    load_entry_point('topaz==0.1.0', 'console_scripts', 'topaz')()
  File "/home/conda/apps/conda/envs/topaz/lib/python3.6/site-packages/topaz/main.py", line 144, in main
    args.func(args)
  File "/home/conda/apps/conda/envs/topaz/lib/python3.6/site-packages/topaz/commands/star_particles_threshold.py", line 27, in main
    particles['ParticleScore'] = [float(s) for s in particles['ParticleScore']]
  File "/home/conda/apps/conda/envs/topaz/lib/python3.6/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/conda/apps/conda/envs/topaz/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ParticleScore'

Model Training Failed

Dear developers,

Model training has failed every time on one of my data sets. There is no error message, only "Killed" at the end. Here is the complete input and output of model training:

topaz train -n 100 --num-workers=8 --train-images processed/micrographs/ --train-targets particles.txt --save-prefix=save_model/model -o save_model/model_training.txt

Loading model: resnet8

Model parameters: units=32, dropout=0.0, bn=on

Loading pretrained model: resnet8_u32

Receptive field: 71

Using device=0 with cuda=True

Loaded 6038 training micrographs with 1000 labeled particles

source split p_observed num_positive_regions total_regions

0 train 2.17e-05 29000 1339089526

Specified expected number of particle per micrograph = 100.0

With radius = 3

Setting pi = 0.01307619816301961

minibatch_size=256, epoch_size=1000, num_epochs=10

Killed

Any suggestions?

Thanks in advance.

error upon launching topaz

Hi,

I installed using pip. Upon running topaz -h I get the following error:

Traceback (most recent call last):
  File "/lmb/home/palcon/.local/bin//topaz", line 10, in <module>
    sys.exit(main())
  File "/lmb/home/palcon/.local/lib/python2.7/site-packages/topaz/main.py", line 62, in main
    import topaz.commands.extract
  File "/lmb/home/palcon/.local/lib/python2.7/site-packages/topaz/commands/extract.py", line 16, in <module>
    from topaz.algorithms import non_maximum_suppression, match_coordinates
  File "/lmb/home/palcon/.local/lib/python2.7/site-packages/topaz/algorithms.py", line 4, in <module>
    from scipy.optimize import linear_sum_assignment
ImportError: cannot import name linear_sum_assignment

Regards,
Shabih

Request to add CUDA warning

We have Topaz installed on our cluster with Singularity and we ran into an issue that we solved, but it took some time and could have been solved much quicker with a simple warning in Topaz. Our scenario is that we have nodes with different versions of CUDA: some with 8, some with 9, and some with 10. Topaz runs on GPU fine on the nodes with 9 or 10, but when we run it on a node with 8, it silently falls back to CPU. Well, not completely silently. All you see is this:

using device=0 with cuda=False

When it is working, this says:

using device=0 with cuda=True

Could you add a warning for the user so that when device >= 0, but cuda=False, then Topaz tells you that it is falling back to CPU because the CUDA version is likely not 9+?

Thanks!

Error when importing load_image

Hi Tristan,

I have installed Topaz and can't wait to give it a try to analyze my data!

I'm working through the "complete walkthrough". I have an issue when I do:
from topaz.utils.data.loader import load_image

ImportError                               Traceback (most recent call last)
----> 1 from topaz.utils.data.loader import load_image
ImportError: No module named topaz.utils.data.loader

NameError: name 'load_image' is not defined

The following steps work fine until I try to load the micrographs for visualization, since I need to use load_image.

Can you please help me to understand what is missing?

Thanks!

Better RELION integration

@scheres and I are interested in better RELION integration for Topaz.

Several things we would like are:

  • Take a list of micrographs as a file (Issue #47)
    Note that there is a limitation in the length of command line arguments a shell can accept.
  • Input/Output one coordinate STAR file per micrograph
    RELION's ManualPicker writes this format, and the particle display and Extract jobs require it, instead of what topaz convert supports now.
  • Respect the directory structure
    For example, a user might have Dataset1/001.mrc and Dataset2/001.mrc. Currently Topaz only looks at the file name, so these two get mixed up.
  • Process only new files, skipping files when outputs are already present.
    This is useful for an automatic processing loop.

Some of these can be implemented outside Topaz as a separate converter or a wrapper, but I think it is more efficient to have them inside Topaz itself. For example, a wrapper could make a new working directory, make symbolic links to the relevant files, and call Topaz, but this can easily get messy.

@alexjnoble Are you working on any of them? (I saw your tweet: https://twitter.com/alexjamesnoble/status/1267000205838364673) If you are too busy to work on them, I can try myself and send a pull request. Do you have something you don't want to have inside Topaz?

Topaz commands not running.

I installed the topaz environment with Python 3.6. When I try to enter a command, I get the following:

topaz --help
'topaz' is not recognized as an internal or external command,
operable program or batch file.
I checked my paths and I am sure I am running on 3.6 so I am not sure what is going on.
Any help would be nice!

segfault from denoising

I'm getting the following error on 0.2.1:

(topaz) bash-4.2$ topaz denoise MotionCorr/job002/Micrographs/KJ071819_43-1_12_19_43.mrc -o MotionCorr/job002/Micrographs/denoise_all/
# using device=0 with cuda=False
# using model: L2
Segmentation fault (core dumped)

These images were collected on a K3 (which are rectangular instead of square). It worked fine for previous images collected on a K2 and Falcon 3.

Allow 'topaz convert' to set STAR file metadata

E.g. accept --voltage, --detector-pixel-size, --magnification options similar to the old 'coordinates_to_star' command. Also, allow rlnMicrographName column to be set to the full file paths of the micrographs for better compatibility with relion.

topaz tutorial error

Our lab just installed Topaz via SBGrid and I am just starting to learn how to use it. I am following the tutorial data and found that it gives me an error at the train step. I am not sure what caused it (I am just copying and pasting things). Thank you in advance for your help!
Below is the output message:

Loading model: resnet8

Model parameters: units=32, dropout=0.0, bn=on

Receptive field: 71

Using device=0 with cuda=True

Loaded 20 training micrographs with 1000 labeled particles

Loaded 10 test micrographs with 500 labeled particles

source split p_observed num_positive_regions total_regions

0 train 0.00654 29000 4435200

0 test 0.00654 14500 2217600

Specified expected number of particle per micrograph = 300.0

With radius = 3

Setting pi = 0.03923160173160173

minibatch_size=256, epoch_size=5000, num_epochs=10

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument

Error when resizing particles with topaz particle_stack

I tried to make a particle stack from topaz coordinate picks with the --resize flag, but I got the following error:
NameError: name 'downsample' is not defined

I fixed it by adding:
from topaz.utils.image import downsample
to particle_stack.py

Can't open the walkthrough

Hi,

I can't open the walkthrough on GitHub, it's giving me this error:

Sorry, something went wrong. Reload?

And it never loads. I'm reading it as a text file but it's a little annoying. Am I missing something? Does it need Jupyter installed?

Best regards,


Ruda Santos, PhD student
Huilin Li Lab, Van Andel Institute

Converting STAR files to CSV: TypeError: 'NoneType' object has no attribute '__getitem__'

For those looking: while using the topaz star_to_coordinates XXX.star command, I found this little bug while trying to process a STAR file of the following format:

data_
loop_ 
_rlnMicrographName #1
_rlnCoordinateX #2 
_rlnCoordinateY #3 
2100.mrc  435.000000  1925.000000
2100.mrc  485.000000  1615.000000
...
Traceback (most recent call last):
  File "/usr/local/programs/anaconda2/bin/topaz", line 11, in <module>
    sys.exit(main())
  File "/usr/local/programs/anaconda2/lib/python2.7/site-packages/topaz/main.py", line 116, in main
    args.func(args)
  File "/usr/local/programs/anaconda2/lib/python2.7/site-packages/topaz/commands/star_to_coordinates.py", line 74, in main
    table = table[['rlnMicrographName', 'rlnCoordinateX', 'rlnCoordinateY']]
TypeError: 'NoneType' object has no attribute '__getitem__'

I simply edited line 22 in star_to_coordinates.py from

def parse_star(f):
    lines = f.readlines()
    for i in range(len(lines)):
        line = lines[i]
        if line.startswith('data_image'):   # Assumes old format?
            return parse_star_body(lines[i+1:])

to

def parse_star(f):
    lines = f.readlines()
    for i in range(len(lines)):
        line = lines[i]
        if line.startswith('data_'):        # Should fix.
            return parse_star_body(lines[i+1:])

Happy picking!

Is there a way to decrease the restriction distance between predicted particles?

Greetings, I am a researcher working on cryo-EM 3D reconstruction of microtubule proteins, specifically the doublet microtubules from cilia and flagella. We are currently using Topaz to perform automatic picking of filamentous structures. Currently, Topaz is picking very well except that there seems to be a fixed minimum distance between neighboring particles. How can I achieve a denser extraction, and which of the parameters should be changed during extraction?

Extraction command:
topaz extract -r 40 -m saved_models_full_pick/model_epoch10.sav \
    -o data/topaz_full_pick/predicted_particles_all.txt \
    data/processed/training_micrographs/*.mrc

Picking results:
[image attached]

Ground truth:
[image attached]

Training test dataset failed

Hi

I just installed topaz and tried to run it on the test data but ran into problems with training the model. Any help would be greatly appreciated.

Envs

conda create -n topaz python=3.7

Install

conda install topaz -c tbepler -c pytorch

I also tried installing using conda install topaz cudatoolkit=9.1 -c tbepler -c pytorch, but was given package-not-found errors for cudatoolkit=9.1, so I tried 9.0 instead.

Command

topaz train --train-images /home/mqbpkml3/em/topaz/data/EMPIAR-10025/preprocessed/micros --train-targets /home/mqbpkml3/em/topaz/data/EMPIAR-10025/topaz_picks_3Jun2019_19h59m.csv --k-fold 5 --fold 0 --radius 7 --model resnet8 --image-ext .png --units 32 --dropout 0.0 --bn on --unit-scaling 2 --ngf 32 --method GE-binomial --autoencoder 0 --num-particles 300 --l2 0 --learning-rate 0.0002 --minibatch-size 256 --minibatch-balance 0.0625 --epoch-size 5000 --num-epochs 10 --num-workers -1 --test-batch-size 1 --device 1 --save-prefix /home/mqbpkml3/em/topaz/data/EMPIAR-10025/topaz_output/training/model --output /home/mqbpkml3/em/topaz/data/EMPIAR-10025/topaz_output/training/results.txt

Result

# Loading model: resnet8
# Model parameters: units=32, dropout=0.0, bn=on
# Receptive field: 71
# Using device=1 with cuda=True
# Loaded 30 training micrographs with 80 labeled particles
# Split into 24 train and 6 test micrographs
# source split p_observed num_positive_regions total_regions
# 0 train 0.00181 9638 5322240
# 0 test 0.00168 2235 1330560
# Specified expected number of particle per micrograph = 300.0
# With radius = 7
# Setting pi = 0.20156926406926406
# minibatch_size=256, epoch_size=5000, num_epochs=10
Traceback (most recent call last):
  File "/home/mqbpkml3/anaconda3/envs/topaz/bin/topaz", line 11, in <module>
    load_entry_point('topaz==0.2.0', 'console_scripts', 'topaz')()
  File "/home/mqbpkml3/anaconda3/envs/topaz/lib/python3.7/site-packages/topaz/main.py", line 146, in main
    args.func(args)
  File "/home/mqbpkml3/anaconda3/envs/topaz/lib/python3.7/site-packages/topaz/commands/train.py", line 655, in main
    , save_prefix=save_prefix, use_cuda=use_cuda, output=output)
  File "/home/mqbpkml3/anaconda3/envs/topaz/lib/python3.7/site-packages/topaz/commands/train.py", line 548, in fit_epochs
    , use_cuda=use_cuda, output=output)
  File "/home/mqbpkml3/anaconda3/envs/topaz/lib/python3.7/site-packages/topaz/commands/train.py", line 528, in fit_epoch
    metrics = step_method.step(X, Y)
  File "/home/mqbpkml3/anaconda3/envs/topaz/lib/python3.7/site-packages/topaz/methods.py", line 123, in step
    log_binom = scipy.stats.binom.logpmf(np.arange(0,N+1),N,self.pi)
  File "/home/mqbpkml3/anaconda3/envs/topaz/lib/python3.7/site-packages/torch/tensor.py", line 458, in __array__
    return self.numpy()
TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Problem with data loader

I'm working through the "complete walkthrough" and ran into a snag when I got to the training step.

My input

(base) [himesb@london testTopaz]$ topaz train -n 300 \
>             --num-workers=8 \
>             --train-images data/EMPIAR-10025/processed/image_list_train.txt \
>             --train-targets data/EMPIAR-10025/processed/particles_train.txt \
>             --test-images data/EMPIAR-10025/processed/image_list_test.txt \
>             --test-targets data/EMPIAR-10025/processed/particles_test.txt \
>             --save-prefix=saved_models/EMPIAR-10025/model \
>             -o saved_models/EMPIAR-10025/model_training.txt

Output:

# Loading model: resnet8
# Model parameters: units=32, dropout=0.0, bn=on
# Receptive field: 71
# Using device=0 with cuda=True
# Loaded 20 training micrographs with 1000 labeled particles
# Loaded 10 test micrographs with 500 labeled particles
# source        split   p_observed      num_positive_regions    total_regions
# 0     train   0.00654 29000   4435200
# 0     test    0.00654 14500   2217600
# Specified expected number of particle per micrograph = 300.0
# With radius = 3
# Setting pi = 0.03923160173160173
# minibatch_size=256, epoch_size=5000, num_epochs=10

Error Message

Traceback (most recent call last):
  File "/groups/grigorieff/home/himesb/thirdParty/anaconda3/bin/topaz", line 11, in <module>
    load_entry_point('topaz', 'console_scripts', 'topaz')()
  File "/groups/grigorieff/home/himesb/thirdParty/topaz/topaz/main.py", line 144, in main
    args.func(args)
  File "/groups/grigorieff/home/himesb/thirdParty/topaz/topaz/commands/train.py", line 649, in main
    classifier.width, split, args)
  File "/groups/grigorieff/home/himesb/thirdParty/topaz/topaz/commands/train.py", line 471, in make_data_iterators
    , num_workers=num_workers)
  File "/groups/grigorieff/home/himesb/thirdParty/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 805, in __init__
    batch_sampler = BatchSampler(sampler, batch_size, drop_last)
  File "/groups/grigorieff/home/himesb/thirdParty/anaconda3/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 146, in __init__
    .format(sampler))

ValueError: sampler should be an instance of torch.utils.data.Sampler, but got sampler=<topaz.utils.data.sampler.StratifiedCoordinateSampler object at 0x7f6b74093cf8>

(last line pulled from error msg for emphasis)

Can't extract

Hi,

I ran the tutorial without issue. When I run on my data, the training completes successfully and generates models, but when I run the extraction job, nothing happens - by which I mean topaz uses the GPU, sits there for some time, but writes nothing to the log and no files are generated. Am I doing something obviously dense? And is there a way to make the extraction job more verbose (I can't see a -v flag in the help)? This is the command I used:

topaz extract -r 7 -x 16 -m saved_models/model_epoch10.sav -o topaz/predicted_picks_upsampled.txt processed/mics/*mrc >& log &

Cheers
Oli

THCudaCheck FAIL error

Hi,

I'm trying to run Topaz on a test dataset (50 mic subset of EMPIAR-10288), but I am getting an error at the training step (see attached). This is using an RTX-2080, with CUDA 10.1 on Ubuntu 18.04. Topaz was installed using anaconda. It still seems to be running, so I'm wondering whether I can ignore this error, or if not are there any known solutions?

Cheers
Oli

[image attached: training error output]

Training the denoise model

Dear developers:

I noticed that a denoising model was added in v0.2. I would like to train it from scratch using our own data. My idea is to put the first half of the frames of all movie files into directory A and the last half into directory B, then run the training script using topaz denoise -a A -b B --save-prefix out. (I have 342 image pairs for training and 38 image pairs for validation.)

However, the training loss does not decrease much over 100 epochs, and denoising with the final model is much worse than with the pretrained model.

Any suggestions would be highly appreciated. Thanks in advance!

error when using topaz convert

Hi Tristan,

I would like to use topaz convert to convert a particle STAR file to .box files (I want to do a direct comparison with crYOLO using the same training data).

When I run the following command:

topaz convert --from star --to box extracted_particles.star --output boxfiles/

I get this error:

  File "/home/user/software/miniconda2/envs/topaz/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'image_name'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/software/miniconda2/envs/topaz/bin/topaz", line 11, in <module>
    load_entry_point('topaz-em==0.2.3', 'console_scripts', 'topaz')()
  File "/home/user/software/miniconda2/envs/topaz/lib/python3.6/site-packages/topaz/main.py", line 146, in main
    args.func(args)
  File "/home/user/software/miniconda2/envs/topaz/lib/python3.6/site-packages/topaz/commands/convert.py", line 165, in main
    coords = file_utils.read_coordinates(path, format=from_forms[i])
  File "/home/user/software/miniconda2/envs/topaz/lib/python3.6/site-packages/topaz/utils/files.py", line 158, in read_coordinates
    table['image_name'] = table['image_name'].apply(strip_ext)
  File "/home/user/software/miniconda2/envs/topaz/lib/python3.6/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/user/software/miniconda2/envs/topaz/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'image_name'

This is using Topaz 0.2.4.
Here are the first few lines of the input star, converted from cryosparc using csparc2star.py from the pyem package:


loop_
_rlnVoltage #1
_rlnSphericalAberration #2
_rlnAmplitudeContrast #3
_rlnOpticsGroup #4
_rlnImagePixelSize #5
_rlnImageDimensionality #6
300.000000 0.001000 0.100000 0 5.500000 2

data_particles

loop_
_rlnImageName #1
_rlnMicrographName #2
_rlnCoordinateX #3
_rlnCoordinateY #4
_rlnDefocusU #5
_rlnDefocusV #6
_rlnDefocusAngle #7
_rlnPhaseShift #8
_rlnOpticsGroup #9
000001@J968/extract/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted_particles.mrc full_data/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted.mrc 1113 1858 32539.451172 26328.779297 268.115906 0.000000 0
000002@J968/extract/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted_particles.mrc full_data/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted.mrc 1870 1743 32699.949219 26489.277344 268.115906 0.000000 0
000003@J968/extract/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted_particles.mrc full_data/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted.mrc 1051 436 32476.630859 26265.958984 268.115906 0.000000 0
000004@J968/extract/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted_particles.mrc full_data/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted.mrc 644 2095 32523.177734 26312.505859 268.115906 0.000000 0
000005@J968/extract/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted_particles.mrc full_data/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted.mrc 2731 320 32772.179688 26561.505859 268.115906 0.000000 0
000006@J968/extract/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted_particles.mrc full_data/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted.mrc 1767 2652 32519.345703 26308.673828 268.115906 0.000000 0
000007@J968/extract/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted_particles.mrc full_data/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted.mrc 1664 794 32617.072266 26406.400391 268.115906 0.000000 0
000008@J968/extract/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted_particles.mrc full_data/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted.mrc 1309 2327 32522.851562 26312.179688 268.115906 0.000000 0
000009@J968/extract/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted_particles.mrc full_data/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted.mrc 933 1053 32516.515625 26305.843750 268.115906 0.000000 0
000010@J968/extract/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted_particles.mrc full_data/19may15d_00018sq_v03_00003hln_00008enn.frames_patch_aligned_doseweighted.mrc 2911 1842 32656.023438 26445.351562 268.115906 0.000000 0

Does topaz need to be able to locate the particle mrc file for this to work?
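
In case it clarifies what I'm after, this is the table I would build by hand from the star file (plain pandas; the image_name / x_coord / y_coord layout is my guess at a Topaz-style coordinate table, so treat the column choices as assumptions):

import os
import pandas as pd
def read_star_particles(path):
    # Parse only the data_particles loop of a RELION-3.1 style star file.
    names, rows, block = [], [], None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith('data_'):
                block = line
                continue
            if block != 'data_particles' or line == 'loop_':
                continue
            if line.startswith('_rln'):
                names.append(line.split()[0][1:])  # e.g. rlnMicrographName
                continue
            rows.append(line.split())
    return pd.DataFrame(rows, columns=names)
table = read_star_particles('extracted_particles.star')
coords = pd.DataFrame({
    'image_name': table['rlnMicrographName'].map(lambda p: os.path.splitext(os.path.basename(p))[0]),
    'x_coord': table['rlnCoordinateX'].astype(float).round().astype(int),
    'y_coord': table['rlnCoordinateY'].astype(float).round().astype(int),
})
coords.to_csv('particles_from_star.txt', sep='\t', index=False)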

Installation problem and presence of Unet-3d-10a in v0.2.4

Hi Topaz creators and community.

I have just started using Topaz for cryoET and ran into an installation problem.

I installed Topaz into my conda environment and the installation went well, but after I activate my Topaz environment it does not seem to recognize Topaz commands. A screenshot is attached; I guess something is wrong with my conda setup?
[screenshot: Topaz installation problem]

Another question is regarding the presence of Unet-3d-10a in v0.2.4. I guess passing -m unet-3d-10a should suffice, or do we currently need to download this model from somewhere else?

Thanks for the software and your support.

Cheers,

error when reading topaz denoise3d output

Hi authors,

I used topaz denoise3d to denoise a tomogram, and errors occur when I open the output with IMOD's 3dmod.

  1. 3dmod says "ERROR: mrcReadSectionAny - reading data from file. 3dmod: Fatal Error -- while reading image data. System error: Resource temporarily unavailable". And when I try to open the data with the mrcfile package in Python, it says "ValueError: Expected 2229534720 bytes in data block but could only read 2229534453" (see the sketch right after this list for how I compared the header with the file on disk).
  2. The result also looks wrong; the raw tomogram (left) and the denoised result (right) are shown here:
    [image: raw vs. denoised tomogram comparison]
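
This is how I compared what the header promises with what is actually on disk (mrcfile only; the output file name below is just an example):

import os
import mrcfile
path = 'tomogram_denoised.mrc'
mrcfile.validate(path)  # prints what it considers wrong with the file
with mrcfile.open(path, permissive=True, header_only=True) as mrc:
    nx, ny, nz = int(mrc.header.nx), int(mrc.header.ny), int(mrc.header.nz)
    mode = int(mrc.header.mode)
    nsymbt = int(mrc.header.nsymbt)  # size of the extended header in bytes
bytes_per_voxel = {0: 1, 1: 2, 2: 4, 6: 2, 12: 2}.get(mode, 4)
expected = 1024 + nsymbt + nx * ny * nz * bytes_per_voxel
print('header expects', expected, 'bytes; file has', os.path.getsize(path))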

Thank you !

problems with topaz test

Dear developers, I am installing Topaz and testing the software, and I ran into a problem. The initial steps of the tutorial were fine, but it seems to complain about CUDA during training - any ideas?

/home/guillermo/miniconda3/bin/topaz train -n 400 --num-workers=8 --train-images data/EMPIAR-10025/processed/micrographs/ --train-targets data/EMPIAR-10025/processed/particles.txt --save-prefix=saved_models/EMPIAR-10025/model -o saved_models/EMPIAR-10025/model_training.txt

Loading model: resnet8
Model parameters: units=32, dropout=0.0, bn=on
Loading pretrained model: resnet8_u32
Receptive field: 71
Using device=0 with cuda=True
Loaded 30 training micrographs with 1500 labeled particles
source split p_observed num_positive_regions total_regions
0 train 0.00163 43500 26669790
Specified expected number of particle per micrograph = 400.0
With radius = 3
Setting pi = 0.0130484716977524
minibatch_size=256, epoch_size=1000, num_epochs=10

Traceback (most recent call last):
  File "/home/guillermo/miniconda3/bin/topaz", line 11, in <module>
    load_entry_point('topaz-em==0.2.3', 'console_scripts', 'topaz')()
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/topaz/main.py", line 146, in main
    args.func(args)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/topaz/commands/train.py", line 685, in main
    , save_prefix=save_prefix, use_cuda=use_cuda, output=output)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/topaz/commands/train.py", line 572, in fit_epochs
    , use_cuda=use_cuda, output=output)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/topaz/commands/train.py", line 552, in fit_epoch
    metrics = step_method.step(X, Y)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/topaz/methods.py", line 103, in step
    score = self.model(X).view(-1)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/topaz/model/classifier.py", line 28, in forward
    z = self.features(x)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/topaz/model/features/resnet.py", line 54, in forward
    z = self.features(x)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/topaz/model/features/resnet.py", line 335, in forward
    h = self.conv0(x)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guillermo/miniconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

GPU memory usage for denoising

Hi guys,

I'm having trouble optimizing the memory usage on our P100 GPUs during denoising training.

On K3 data (binned, not super-resolution), at 5760 x 4092 in 32-bit real, each file is 90 MB. When using a crop size of 800 and a batch size of 10, I get the following error:

RuntimeError: CUDA out of memory. Tried to allocate 782.00 MiB (GPU 0; 15.90 GiB total capacity; 14.61 GiB already allocated; 579.88 MiB free; 47.48 MiB cached)

I calculate around 730 MB required for 10 images with 800-pixel patches, which agrees with the allocation request in the error message, but clearly the 16 GB of GPU memory is maxed out. To get training to run, I have to drop the batch size to 7 (keeping the 800-pixel crop), but this still uses close to 15.5 GB of memory.

What is using the remainder of the GPU memory if the request is only ~ 800 MB? The --preload option is not available in Topaz 0.2.2 so it cannot be this.
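
For reference, this is how I have been probing the usage on our side (plain PyTorch; the small conv stack is only a stand-in, not the Topaz denoising network):

import torch
import torch.nn as nn
model = nn.Sequential(
    nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 1, 3, padding=1),
).cuda()
x = torch.randn(10, 1, 800, 800, device='cuda')  # 10 crops of 800 x 800
print('input tensor: %.0f MiB' % (x.numel() * 4 / 2**20))  # ~24 MiB
y = model(x)
y.mean().backward()
# Each 64-channel feature map is 10 x 64 x 800 x 800 x 4 bytes (~1.5 GiB) and is
# kept for the backward pass, so activations, not the inputs, dominate the usage.
print('peak allocated: %.0f MiB' % (torch.cuda.max_memory_allocated() / 2**20))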

Any help would be appreciated.

Regards,
Jason
eBIC for Industry
Diamond Light Source

P.S. Loving Topaz! Immensely powerful and fast for difficult projects.

Input micrographs as file

Hi guys, great software :)

It would be great if topaz could take a file with a list of micrographs as input

Is this a possibility at all right now?

Quality of denoising

Hello authors

Many congratulations on producing this impressive denoising pipeline.

I am running into some issues with the quality of my denoising, which has not been nearly as good as I hoped. I am attaching screenshots of a tomographic slice without denoising and after denoising. There may be something I am doing wrong.

I am using the Topaz command as suggested in this wonderfully written website: https://emgweb.nysbc.org/topaz.html
So my command for denoising is as follows:
topaz denoise3d Tomogram_full.rec --model unet-3d-20a --device -2 --patch-size 96 --patch-padding 48 --output ./

where Tomogram_full.rec is the 4X binned tomogram in 16 bit.
I am using unet-3d-20a because it is a 4X binned tomogram.
All the other parameters are the same as suggested.

Please note that I am not applying a Gaussian filter after denoising. This filter, I noticed, is a new feature of Topaz; I still have an older version that does not seem to support the --gaussian flag. Initially, I thought that maybe your trained model is not 'specific' enough for our tomograms, so I trained my own model, but the results with my model are also not as good.

What do you think about my results? Is there something wrong with the way I am doing the denoising?

Thanks and cheers,
Digvijay

[image: 4X binned tomogram after denoising with unet-3d-20a]
[image: 4X binned tomogram before denoising]

Add ETA indicator to topaz preprocess?

Hi,

When running topaz_preprocess to downsample and normalize, it would be nice if it gave an indication of how long it was going to take. I am running it on a large (3300 mic) dataset right now, and with 24 workers it has already taken ~30min, with no indication (apart from running processes) that it is doing anything - no mics written and nothing written to stdout. Is it normal that it does not output the processed micrographs until the end?

Cheers
Oli

Number of threads

Even with topaz preprocess -t 1, the process uses all cores. One has to use OMP_NUM_THREADS to control the actual number of threads. It would be nice if one could specify the number of threads and number of worker processes.

Tutorial 01 is not correct:

-t/--num-workers X sets preprocess to use X threads, this and GPU device are mutually exclusive

Docker build fails

Hi, I tried to build using docker, but got the attached error - thoughts?

Picker looks awesome, would love to give it a try!

Cheers
Oli

[screenshot of the Docker build error attached]

assert positive_fraction <= pi

So I sometimes get the following error when running topaz train:

Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/topazenv/bin/topaz", line 11, in <module>
    load_entry_point('topaz==0.1.0', 'console_scripts', 'topaz')()
  File "/usr/local/anaconda3/envs/topazenv/lib/python3.6/site-packages/topaz/main.py", line 144, in main
    args.func(args)
  File "/usr/local/anaconda3/envs/topazenv/lib/python3.6/site-packages/topaz/commands/train.py", line 643, in main
    , autoencoder=args.autoencoder
  File "/usr/local/anaconda3/envs/topazenv/lib/python3.6/site-packages/topaz/commands/train.py", line 422, in make_training_step_method
    assert positive_fraction <= pi
AssertionError

In older versions (where you specified pi), I would decrease the value of radius and that eliminated the error. In current versions, increasing num-particles did the job, though I'm sure tweaking radius would also do it.

Most recently, I saw the error with a radius of 5 and num-particles of 100. Increasing to 200 or 300 got rid of the error, though I definitely don't expect that many particles per micrograph. I seem to have a way around the issue, but just curious as to what ratio of radius to num-particles generates this error.
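
For my own sanity, here is the back-of-the-envelope I have been using. The formula for pi is my guess, reverse-engineered from the numbers topaz train prints, and the values below are from a tutorial-sized run (30 micrographs, radius 3, 400 expected and 1500 labeled particles), so treat it as an assumption:

radius = 3
# Number of pixels inside a disc of the given radius (the labeled "region" around each particle).
regions_per_particle = sum(1 for dx in range(-radius, radius + 1)
                             for dy in range(-radius, radius + 1)
                             if dx * dx + dy * dy <= radius * radius)  # 29 pixels
num_micrographs = 30
expected_per_micrograph = 400  # the --num-particles argument
labeled_particles = 1500
total_regions = 26669790
pi = expected_per_micrograph * num_micrographs * regions_per_particle / total_regions
positive_fraction = labeled_particles * regions_per_particle / total_regions
print(pi, positive_fraction)  # ~0.0130 vs ~0.0016, so the assertion holds here
# Since both numbers scale with the same region area, the assertion effectively fails once
# the labeled particles per micrograph exceed --num-particles, which is why raising
# --num-particles (rather than changing the radius) makes the error go away.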

Training on multi gpus

Can we train the model on multiple GPUs?
I set

--device=0,1

but that did not work.
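
For what it's worth, what I had in mind is the generic data-parallel pattern from plain PyTorch (illustrative only; I don't know whether Topaz exposes anything like this):

import torch
import torch.nn as nn
# Toy model standing in for the picker; DataParallel splits each minibatch across the listed GPUs.
model = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.Conv2d(32, 1, 3, padding=1))
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1])
model = model.cuda()
x = torch.randn(8, 1, 64, 64, device='cuda')
scores = model(x)  # forward runs on both devices, outputs are gathered on GPU 0
print(scores.shape)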

Thanks

topaz convert fails with large number of box files

When I run topaz convert with a small number of crYOLO box files as input, it works fine. When I run it with a large number (a few particles picked on each of several thousand micrographs, using the following command), I get the attached error message.

topaz convert -s 16 --from box --to star *box -o particles_downscaled.star

Cheers
Oli
[screenshot of the error message attached]
