emory-hiti / niffler

Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.

Home Page: https://emory-hiti.github.io/Niffler/

License: BSD 3-Clause "New" or "Revised" License

JavaScript 3.98% XSLT 2.30% Python 52.35% Java 7.38% Shell 2.30% Gnuplot 0.45% Dockerfile 0.49% CSS 1.77% HTML 27.49% Nextflow 1.49%
machine-learning dicom scanners scanner-utilization pacs pacs-client pacs-connector dicom-server anonymization anonymisation

niffler's People

Contributors

anbhimi, birm, chima20097, chinvib66, dependabot[bot], imiro, jeong-jasonji, mehulsinha73, nishchal-007, nitesh639, pavan-bellam, pradeeban, ramon349, rumbleftw, shazam213, yuvraj-wale, zmz223


niffler's Issues

Niffler commit workflows are failing.

The CI tests that are triggered on commit are failing. For example, see:

https://github.com/Emory-HITI/Niffler/actions/runs/2034155913

and

https://github.com/Emory-HITI/Niffler/runs/5675974447?check_suite_focus=true

It fails at the step:

0s
Install Python3
Run yum update -y -q
yum update -y -q
yum install -q -y python3
yum install -q -y python3-pip unzip
shell: sh -e {0}
Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist
Error: Process completed with exit code 1.

This should be fixed, as it is currently rendering the CI commit workflows useless.
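The "Cannot prepare internal mirrorlist" failure is the well-known symptom of CentOS Linux 8 reaching end of life: mirrorlist.centos.org stopped serving it, and its packages moved to vault.centos.org. A minimal workaround sketch, demonstrated here on a temporary copy of a repo file (the file contents are illustrative); on a real runner the same sed commands would be applied to /etc/yum.repos.d/CentOS-* before running yum:

```shell
# CentOS Linux 8 is EOL, so mirrorlist.centos.org no longer serves it.
# Workaround sketch: comment out mirrorlist= and point baseurl= at the vault.
repo=$(mktemp)
printf 'mirrorlist=http://mirrorlist.centos.org/?release=8\n#baseurl=http://mirror.centos.org/$contentdir/$releasever/\n' > "$repo"
sed -i 's/^mirrorlist=/#mirrorlist=/' "$repo"
sed -i 's|^#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|' "$repo"
cat "$repo"
```

Alternatively, the workflow could move off the EOL base image entirely, which avoids patching repo files on every run.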

Containerize PNG Extraction module

This is an easy candidate for Docker containerization. It will be a helpful feature, since dependencies such as gdcm tend to cause conflicts when installed locally. A containerized module will solve that challenge.

Running install.sh in the container will install the dependencies. Please test on a sample image (such as one obtained from https://www.cancerimagingarchive.net/collections/ ) to confirm it works before committing the changes or creating a pull request. Also, please include README instructions on how to run the module as a container so people can start using it right away.
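A minimal sketch of what such a container could look like (the base image, package names, and paths are all assumptions, not the module's actual layout):

```dockerfile
# Hypothetical png-extraction container sketch; adjust paths to the repo layout.
FROM python:3.8-slim
WORKDIR /opt/niffler/modules/png-extraction
COPY . .
# gdcm installed inside the image, avoiding the local-install conflicts noted above.
RUN pip install --no-cache-dir python-gdcm pydicom pandas
CMD ["python3", "ImageExtractor.py"]
```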

Support both YYYY and YY for accession

PACS stores AccessionNumber as a string with the year in YY format, whereas other clinical systems use the year in YYYY format. This variance in representation raises data quality problems in the queries.

Naturally, Niffler supports only the YY format, since that is the DICOM default. However, supporting both YYYY and YY formats will help with diverse extraction queries.
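One approach is to expand a query value into both spellings and query for either. A sketch of the idea, with a loud assumption: the layout below pretends the accession number begins with the year digits, which real AccessionNumbers may not:

```python
import re

def accession_variants(acc: str) -> set:
    """Return both YY- and YYYY-prefixed spellings of an accession number.

    Assumption (illustrative only): the accession number begins with the year.
    """
    variants = {acc}
    if re.match(r"^(19|20)\d{2}", acc):   # looks YYYY-prefixed
        variants.add(acc[2:])             # drop the century -> YY form
    elif re.match(r"^\d{2}", acc):        # looks YY-prefixed
        variants.add("20" + acc)          # naive century guess -> YYYY form
    return variants

print(accession_variants("2021A123"))
```

Querying with every variant trades a little extra C-FIND traffic for robustness against the YY/YYYY mismatch.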

CommonHeadersOnly is not working as expected.

We rarely get all the headers when we run lots of DICOM files through png-extraction. In contrast, if we run just a couple of files, we get more attributes. Let me give an example:

The fields we got when we ran it against all MR data from June 28th, 2021:

AccessionNumber,Private Creator,InstanceNumber,InstitutionName,IssuerOfPatientID,Manufacturer,ManufacturerModelName,Modality,PatientBirthDate,PatientID,PatientName,PatientSex,Private tag data,[Unknown],ReferringPhysicianName,SOPClassUID,SOPInstanceUID,SeriesDate,SeriesDescription,SeriesInstanceUID,SeriesNumber,SeriesTime,SpecificCharacterSet,StationName,StudyDate,StudyDescription,StudyID,StudyInstanceUID,StudyTime,[Reject Image Flag],[Significant Flag],[Confidential Flag],[Assigning Authority For Patient ID],file,has_pix_array,category,BitsAllocated,BitsStored,Columns,HighBit,ImageOrientationPatient,ImagePositionPatient,ImageType,PatientPosition,PhotometricInterpretation,PixelRepresentation,PixelSpacing,Rows,SamplesPerPixel,SliceThickness

The fields we got when we ran against just 2 DICOM images from the above set:

AccessionNumber,AcquisitionDate,AcquisitionMatrix,AcquisitionNumber,AcquisitionTime,AngioFlag,BitsAllocated,BitsStored,BodyPartExamined,Columns,ContentDate,ContentTime,DeviceSerialNumber,EchoNumbers,EchoTime,EchoTrainLength,EthnicGroup,FillerOrderNumberImagingServiceRequest,FlipAngle,FrameOfReferenceUID,HighBit,ImageOrientationPatient,ImagePositionPatient,ImageType,ImagedNucleus,ImagingFrequency,InPlanePhaseEncodingDirection,InstanceCreationDate,InstanceCreationTime,InstanceNumber,InstitutionAddress,InstitutionName,InstitutionalDepartmentName,IssuerOfPatientID,LargestImagePixelValue,Laterality,MRAcquisitionType,MagneticFieldStrength,Manufacturer,ManufacturerModelName,Modality,NumberOfAverages,NumberOfPhaseEncodingSteps,OtherPatientIDs,PatientAddress,PatientAge,PatientBirthDate,PatientID,PatientName,PatientPosition,PatientSex,PatientSize,PatientWeight,PercentPhaseFieldOfView,PercentSampling,PerformedProcedureStepDescription,PerformedProcedureStepID,PerformedProcedureStepStartDate,PerformedProcedureStepStartTime,PhotometricInterpretation,PhysiciansOfRecord,PixelBandwidth,PixelRepresentation,PixelSpacing,PregnancyStatus,0_ProcedureCodeSequence_CodeMeaning,0_ProcedureCodeSequence_CodeValue,0_ProcedureCodeSequence_CodingSchemeDesignator,0_ProcedureCodeSequence_CodingSchemeVersion,ProtocolName,0_ReferencedImageSequence_ReferencedSOPClassUID,0_ReferencedImageSequence_ReferencedSOPInstanceUID,1_ReferencedImageSequence_ReferencedSOPClassUID,1_ReferencedImageSequence_ReferencedSOPInstanceUID,2_ReferencedImageSequence_ReferencedSOPClassUID,2_ReferencedImageSequence_ReferencedSOPInstanceUID,0_ReferencedPatientSequence_ReferencedSOPClassUID,0_ReferencedPatientSequence_ReferencedSOPInstanceUID,0_ReferencedStudySequence_ReferencedSOPClassUID,0_ReferencedStudySequence_ReferencedSOPInstanceUID,ReferringPhysicianName,RepetitionTime,0_RequestAttributesSequence_RequestedProcedureID,0_RequestedProcedureCodeSequence_CodeMeaning,0_RequestedProcedureCodeSequence_CodeValue,0_RequestedProced
ureCodeSequence_CodingSchemeDesignator,RequestedProcedureDescription,RequestingPhysician,Rows,SAR,SOPClassUID,SOPInstanceUID,SamplesPerPixel,ScanOptions,ScanningSequence,SequenceName,SequenceVariant,SeriesDate,SeriesDescription,SeriesInstanceUID,SeriesNumber,SeriesTime,SliceLocation,SliceThickness,SmallestImagePixelValue,SoftwareVersions,SpacingBetweenSlices,SpecificCharacterSet,StationName,StudyDate,StudyDescription,StudyID,StudyInstanceUID,StudyPriorityID,StudyStatusID,StudyTime,TransmitCoilName,VariableFlipAngleFlag,WindowCenter,WindowCenterWidthExplanation,WindowWidth,dBdt,Private Creator,Private tag data,[Unknown],[CSA Image Header Type],[CSA Image Header Version ??],[SliceMeasurementDuration],[GradientMode],[FlowCompensation],[TablePositionOrigin],[ImaAbsTablePosition],[ImaRelTablePosition],[SlicePosition_PCS],[TimeAfterStart],[SliceResolution],[RealDwellTime],[CSA Image Header Version],[CSA Image Header Info],[CSA Series Header Type],[CSA Series Header Version],[CSA Series Header Info],[Series Workflow Status],[AcquisitionMatrixText],[CoilString],[PATModeText],[PositivePCSDirections],[Reject Image Flag],[Significant Flag],[Confidential Flag],[Assigning Authority For Patient ID],file,has_pix_array,category,AdmissionID,0_AnatomicRegionSequence_CodeMeaning,0_AnatomicRegionSequence_CodeValue,0_AnatomicRegionSequence_CodingSchemeDesignator,BeatRejectionFlag,CardiacNumberOfImages,HeartRate,ImagesInAcquisition,InStackPositionNumber,InversionTime,PerformedLocation,PerformedStationName,ReceiveCoilName,ReconstructionDiameter,0_ReferencedPerformedProcedureStepSequence_ReferencedSOPClassUID,0_ReferencedPerformedProcedureStepSequence_ReferencedSOPInstanceUID,StackID,TriggerWindow,[Suite id],[Product id],[Image actual date],[Service id],[Mobile location number],[Equipment UID],[Actual series data time stamp],[Horiz. 
Frame of ref.],[Series contrast],[Last pseq],[Series plane],[First scan ras],[First scan location],[Last scan ras],[Last scan loc],[Display field of view],[Acquisition Duration],[Second echo],[Number of echoes],[Table delta],[Contiguous],[Peak SAR],[Cardiac repetition time],[Images per cardiac cycle],[Actual receive gain analog],[Actual receive gain digital],[Delay after trigger],[Swappf],[Pause Interval],[Pause Time],[Slice offset on freq axis],[Auto Prescan Center Frequency],[Auto Prescan Transmit Gain],[Auto Prescan Analog receiver gain],[Auto Prescan Digital receiver gain],[Bitmap defining CVs],[Pulse Sequence Mode],[Pulse Sequence Name],[Pulse Sequence Date],[Internal Pulse Sequence Name],[Transmitting Coil Type],[Surface Coil Type],[Extremity Coil flag],[Raw data run number],[Calibrated Field strength],[SAT fat/water/bone],[User data 0],[User data 1],[User data 2],[User data 3],[User data 4],[User data 5],[User data 6],[User data 7],[User data 8],[User data 9],[User data 10],[User data 11],[User data 12],[User data 13],[User data 14],[User data 15],[User data 16],[User data 17],[User data 18],[User data 19],[User data 20],[User data 21],[User data 22],[Projection angle],[Saturation planes],[SAT location R],[SAT location L],[SAT location A],[SAT location P],[SAT location H],[SAT location F],[SAT thickness R/L],[SAT thickness A/P],[SAT thickness H/F],[Phase Contrast flow axis],[Velocity encoding],[Thickness disclaimer],[Prescan type],[Prescan status],[Projection Algorithm],[Fractional echo],[Cardiac phase number],[Variable echoflag],"[Concatenated SAT {# DTI Diffusion Dir., release 9.0 & below}]","[User data 23 {# DTI Diffusion Dir., release 9.0 & below}]","[User data 24 {# DTI Diffusion Dir., release 10.0 & above}]",[Velocity Encode Scale],[Fast phases],[Transmit gain],[Series from which prescribed],[Image from which prescribed],[Screen Format],[Locations in acquisition],[Graphically prescribed],[Rotation from source x rot],[Rotation from source y 
rot],[Rotation from source z rot],[Num 3D slabs],[Locs per 3D slab],[Overlaps],[Image Filtering 0.5/0.2T],[Diffusion direction],[Tagging Flip Angle],[Tagging Orientation],[Tag Spacing],[RTIA_timer],[Fps],[Auto window/level alpha],[Auto window/level beta],[Auto window/level window],[Auto window/level level],[Start time(secs) in first axial],[No. of updates to header],[Indicates study has complete info (DICOM/genesis)],[Last pulse sequence used],[Images in Series],[Landmark Counter],[Number of Acquisitions],[Indicates no. of updates to header],[Series Complete Flag],[Number of images archived],[Last image number used],[Primary Receiver Suite and Host],[Protocol Data Block (compressed)],[Image archive flag],[Scout Type],[Imaging Mode],[Pulse Sequence],[Imaging Options],[Plane Type],[RAS letter of image location],[Image location],[Image dimension - X],[Image dimension - Y],[Number of Excitations],[Lower range of Pixels1],[Upper range of Pixels1],[Lower range of Pixels2],[Upper range of Pixels2],[Version of the hdr struct],[Advantage comp. Overflow],[Advantage comp. Underflow],[Bitmap of prescan options],[Gradient offset in X],[Gradient offset in Y],[Gradient offset in Z],[Number of EPI shots],[Views per segment],"[Respiratory rate, bpm]",[Respiratory trigger point],[Type of receiver used],[DB/dt Peak rate of change of gradient field],[dB/dt Limits in units of percent],[PSD estimated limit],[PSD estimated limit in tesla per second],[Window value],[GE image integrity],[Level value],[Unique image iden],[Histogram tables],[User defined data],[Effective echo spacing],[Filter Mode (String slop field 1 in legacy GE MR],"[Image Type (real, imaginary, phase, magnitude)]",[Vas collapse flag],[Vas flags],[Neg_scanspacing],[Offset Frequency],[User_usage_tag],[User_fill_map_MSW],[User_fill_map_LSW],[User data 25...User data 48 {User48=Effective Resolution for spiral}],[Slop_int_6... 
slop_int_9],[Slop_int_10...slop_int_17],[Scanner Study Entity UID],[Scanner Study ID],[Scanner Table Entry (single gradient coil systems only)/Scanner Table Entry + Gradient Coil Selected],[Recon mode flag word],[Coil ID Data],[GE Coil Name],[System Configuration Information],[Asset R Factors],[Additional Asset Data],"[Governing Body, dB/dt, and SAR definition]",[Private In-Plane Phase Encoding Direction],[SAR Definition],[SAR value],[Prescan Reuse String],[Content Qualification],[Image Filtering Parameters],[Rx Stack Identification]

In both cases, the configuration was set as below:
{
"DICOMHome": "/opt/localdrive/Niffler/modules/cold-extraction/june28",
"OutputDirectory": "June28_2021",
"Depth": 3,
"SplitIntoChunks": 1000,
"PrintImages": false,
"CommonHeadersOnly": false,
"PublicHeadersOnly": false,
"SpecificHeadersOnly": false,
"UseProcesses": 0,
"FlattenedToLevel": "patient",
"is16Bit":true,
"SendEmail": true,
"YourEmail": "[email protected]"
}
{
"DICOMHome": "/Users/pradeeban/Downloads/onedrive",
"OutputDirectory": "/Users/pradeeban/Downloads/onedrive/meta1",
"Depth": 0,
"SplitIntoChunks": 1,
"PrintImages": false,
"CommonHeadersOnly": false,
"PublicHeadersOnly": false,
"SpecificHeadersOnly": false,
"UseProcesses": 0,
"FlattenedToLevel": "patient",
"is16Bit":true,
"SendEmail": true,
"YourEmail": "[email protected]"
}
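One thing worth checking is how the per-chunk metadata frames are merged into the output CSV. A minimal pandas sketch (illustrative, not Niffler's actual code) of the relevant behaviour:

```python
import pandas as pd

# Two chunks whose DICOM files expose different tag sets.
chunk1 = pd.DataFrame([{"PatientID": "p1", "EchoTime": 90.0}])
chunk2 = pd.DataFrame([{"PatientID": "p2", "FlipAngle": 15.0}])

# pd.concat keeps the union of the columns, filling the gaps with NaN.
# If a pipeline instead wrote each chunk's CSV separately and reused the
# first chunk's header, the extra tags would silently disappear.
merged = pd.concat([chunk1, chunk2], ignore_index=True)
print(sorted(merged.columns))
```

If the large run is losing columns, comparing its chunked write path against this union behaviour would be a reasonable first diagnostic step.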

png-extractor module not working in arch linux

I have installed all the Python dependencies and gdcm (yay -S gdcm). Then I placed a DICOM image and modified the config as required. It generates all the folders but produces no PNG file and shows no error. I expect a PNG image in my output directory.
I am attaching ImageExtractor.out, i.e., the log file.
It reports that there are no files in that folder even though files are present.

ImageExtractor.out.txt
ImageExtractor.pickle.txt

The Python version I have is 3.10.4.

"any" and "any_any" cold-extraction modes.

Refine the modes from Niffler-0.7.x to Niffler-0.8.x:
empi: REMAINS THE SAME.
empi_accession: REMAINS THE SAME
accession: REMAINS THE SAME
any_any: CHANGED FROM empi_date (defaults to empi_date in the default config.json)
any: new (defaults to accession in the default config.json)

You will be able to query and retrieve based on "any" single DICOM property. There is no need to use this mode for EMPI-based extractions, as it is inefficient for that. Such extractions do not need FINDSCU (hence the decision to keep "empi" as the first mode). For backward compatibility, we keep the "accession" mode as well.

I tested "any" mode on a given date (AcquisitionDate) and it works.

For any_any:

You will be able to query and retrieve based on any two DICOM properties. No need to use this mode for EMPI-Accession based extractions as this is inefficient for that. Such extractions do not need FINDSCU (hence the decision to keep "empi_accession" as the second mode).

I tested "any_any" mode on a given date (AcquisitionDate) for a given modality (MR) and it works.

I cannot test all the metadata headers. But with time, as you all use Niffler-0.8.x or later, you will see how generic these two modes (any and any_any) actually are.

PNG-Extraction bugs: Missing Imports and Undeclared Variable Errors

Describe the bug
In the experimental slurm based implementation of png-extraction, there are two issues: the script is missing the subprocess import, and the t_start variable used is not declared.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'modules/png-extraction/ImageExtractorSlurm.py'
  2. Scroll down and check out lines 340 and 343
  3. See error

Expected behavior
The script should not contain any errors.

Environment (please complete the following information):

  • OS: MacOS

Additional context
While I was checking out how the module is containerized, I came across these errors.
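A minimal sketch of the fix (the names come from the issue text; the surrounding code is assumed, and the subprocess call below is only illustrative usage):

```python
# Fix sketch for ImageExtractorSlurm.py: add the missing import and declare
# t_start before any timing code reads it.
import subprocess
import time

t_start = time.time()  # declare before the code that references it

# Illustrative usage mirroring the kind of calls the script makes.
result = subprocess.run(["echo", "ok"], capture_output=True, text=True)
elapsed = time.time() - t_start
print(result.stdout.strip(), round(elapsed, 3))
```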

Rethinking how Niffler handles multiple query sources at once

Currently, we support querying multiple sources by running ColdDataRetriever twice (or more times), by changing these parameters in the system.json.

  • NifflerID: The ID of the current execution. The default is 1. Increment it to 2 for the second execution, so that the logs are properly stored in niffler1.log and niffler2.log.

  • MaxNifflerProcesses: How many Niffler processes can run in parallel. Make sure each execution has its own SrcAet properly configured. Each SrcAet can run only once.

But the same Query AET can run multiple queries to multiple source AETs. So, we may rethink our approach and fix the implementation.

This will be especially useful if all the SourceAETs only know and accept that one particular Niffler instance, defined by a hostname, port, and AET (for example, x.x.x.x, 4242, QBNIFFLER).

Anaconda reinstallation

Description
Anaconda is downloaded and installed again when install.sh is run on a system that already has Anaconda installed.

To Reproduce
Steps to reproduce the behavior:

  1. Have Anaconda installed already in the system.
  2. Clone the repository.
  3. Run install.sh

Expected behavior
The install script should skip installing Anaconda.

Environment:

  • OS: Arch Linux x86_64, Kernel: 6.1.12-arch1-1
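A sketch of the guard install.sh could add (the function and messages are illustrative; the demo calls probe `sh`, which exists on any POSIX system, and a deliberately missing tool):

```shell
# Skip the Anaconda download when a conda executable is already on the PATH.
maybe_install_anaconda() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "already installed; skipping"
    else
        echo "not found; installing"
    fi
}

maybe_install_anaconda sh                  # present -> skipped
maybe_install_anaconda no_such_tool_xyz    # missing -> would install
```

In the real script the probe would be `conda` and the else-branch would run the existing download-and-install steps.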

Packaging Niffler with pip

This is slightly tricky since Niffler is not just Python.

Among the modules, the below are in Python:

  • cold-extraction
  • dicom-anonymization
  • meta-extraction
  • nifti-extraction
  • png-extraction
  • rta-extraction
  • suvpar

Although the above could be packaged with pip following [1], there are complex dependencies such as DCM4CHE as storescp for cold-extraction and meta-extraction, mongo for meta-extraction, and gdcm for png-extraction. Any initiative to package Niffler as an installable distribution must take additional care to ensure these dependencies are covered.

[1] https://packaging.python.org/en/latest/tutorials/packaging-projects/
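For the Python-only modules, a starting point could look like the pyproject.toml sketch below (the name, version, and dependency list are illustrative; the non-Python dependencies noted above would still need to be documented or containerized separately):

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "niffler"
version = "0.0.0"        # illustrative
dependencies = [
    "pydicom",           # assumed shared dependency
    "pandas",
]
```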

Running ImageExtractor.py is not finding .dcm files in target directory

Describe the bug
Running ImageExtractor.py is not finding .dcm files in target directory

To Reproduce
Downloaded ImageExtractor.py & config.json, installed all dependencies, configured config.json, and ran ImageExtractor.py

Expected behaviour
Expected it to discover the DICOM files in the target directory and convert them to PNG.

Logs or screenshots

$ ls /home/xxxxxxxx/Desktop/Antares/patient/opt
OP000000.dcm

$ file /home/xxxxxxxx/Desktop/Antares/patient/opt/OP000000.dcm 
/home/xxxxxxxx/Desktop/Antares/patient/opt/OP000000.dcm: DICOM medical imaging data

$ cat config.json 
{
	"DICOMHome": "/home/xxxxxxxx/Desktop/Antares/patient/opt",
	"OutputDirectory": "png",
	"Depth": 0,
	"SplitIntoChunks": 1,
	"PrintImages": true,
	"CommonHeadersOnly": false,
	"PublicHeadersOnly": true,
	"SpecificHeadersOnly": false,
	"UseProcesses": 0,
	"FlattenedToLevel": "patient",
	"is16Bit":false,
	"SendEmail": true,
	"YourEmail": "[email protected]"
}

$ python3 ImageExtractor.py 
$ cat png/ImageExtractor.out
INFO:root:------- Values Initialization DONE -------
INFO:root:Number of dicom files: 0
ERROR:root:There is no file present in the given folder in /home/xxxxxxxx/Desktop/Antares/patient/opt/*.dcm

Environment (please complete the following information):

  • OS: NAME="Linux Mint"
    VERSION="21 (Vanessa)"
  • Python Version - 3.10.6
  • Niffler Version - Niffler-0.9.3
  • Any other dependencies (depending on the module) - All other dependencies are met.
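One classic cause of this symptom is a mismatch between the Depth setting and the actual folder layout. A sketch of how a depth-based glob pattern behaves (assumed semantics for illustration, not necessarily Niffler's exact implementation):

```python
def dcm_glob(dicom_home: str, depth: int) -> str:
    """Build the glob pattern for .dcm files `depth` directories below home."""
    return dicom_home.rstrip("/") + "/" + "*/" * depth + "*.dcm"

# Depth 0 expects the .dcm files directly inside DICOMHome:
print(dcm_glob("/home/user/patient/opt", 0))
# Depth 3 expects them three directories down:
print(dcm_glob("/home/user/patient/opt", 3))
```

Since the log shows the pattern `/.../opt/*.dcm` finding nothing while `ls` sees the file, checking permissions and the exact DICOMHome path (no trailing components, no symlink surprises) would be the next step.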

Random PNG Extraction Errors

The png-extraction module occasionally throws the below errors.

  • Multiprocessing Error

  • No error but metadata file is not created - Process Killed.

  • RuntimeWarning: invalid value encountered in true_divide

They are hard to reproduce, but they are recorded here for future reference.
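The true_divide warning typically comes from a 0/0 (or x/0) during pixel normalization. A small NumPy sketch of the symptom and one defensive pattern (illustrative, not the module's actual code):

```python
import numpy as np

pixels = np.array([0.0, 4.0])
denominator = np.array([0.0, 2.0])  # the zero here triggers the warning

# Suppress the invalid-value warning and map the resulting NaN back to 0.0.
with np.errstate(invalid="ignore"):
    scaled = np.nan_to_num(pixels / denominator)

print(scaled.tolist())
```

A uniform (all-zero) pixel array normalized by its own range is one realistic way this division can occur.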

file column is nan for certain images.

Description

The PNG extraction snippet has a slight issue. Below are lines 182-187 of ImageExtractor.py. It imposes a limit on the number of DICOM tags extracted. A file that exceeds that limit is moved to the failed-cases folder; its metadata is still preserved, but it does not record the 'file' or 'has_pix_array' attributes, which would be useful for backtracking.

if len(kv) > dicom_tags_limit:
    logging.debug(str(len(kv)) + " dicom tags produced by " + ff)
    copyfile(ff, output_directory + '/failed-dicom/5/' + os.path.basename(ff))
else:
    kv.append(('file', f_list_elem[1]))  # adds my custom field with the original filepath
    kv.append(('has_pix_array', c))  # adds my custom field with if file has image

To Reproduce

This would require the extraction of a DCM file whose tag count is over 800.

Possible solution

file and has_pix_array should be stored outside the if/else block

if len(kv) > dicom_tags_limit:
    logging.debug(str(len(kv)) + " dicom tags produced by " + ff)
    copyfile(ff, output_directory + '/failed-dicom/5/' + os.path.basename(ff))

kv.append(('file', f_list_elem[1]))  # adds my custom field with the original filepath
kv.append(('has_pix_array', c))  # adds my custom field with if file has image

Syntax error in ImageExtractorNifti.py, line 206

Describe the bug
An error message pops up ("Statements must be separated by newlines or semicolons") in the ImageExtractorNifti.py file of the nifti-extraction module at line 206.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'modules/nifti-extraction/ImageExtractorNifti.py'
  2. Scroll down to 'line 206'
  3. See error

Expected behavior
There shouldn't be any syntax error

Environment (please complete the following information):

  • OS: MacOS

Better configurations for Destination AE to be different from Query AE

Niffler will need some modification so that when we use an external AE to store the images, storescp does not need to be started locally. I will create a bug report for that. But it shouldn't cause any trouble; it is just Niffler creating a storescp unnecessarily for an iteration where we only want to send the data to another AE's storescp.

A solution would be a system.json parameter, "IsExternalAE": when it is true, the local storescp process is not started.

Support for custom Non-DICOM Time attributes

The following 3 attribute options (Month, Year, Days) are proposed.

  1. StudyMonth: (example, 2012 February) 201202. A for loop of 31.
  2. StudyYear: (example, 2013). 2013. A for loop of 31 * 12.
    The above 2 just assume 31 days for all the months for simplicity. A C-FIND for 20120231 will return nothing. This is a quick and dirty approach. We should probably convert to proper dates and handle this better, especially to support the case below.

And slightly more complex, a study interval to retrieve images, provided as a pair of entries:

  3. StudyStartDate
  4. Days

(example:

StudyStartDate: 20130420,
Days: 15)

This fetches images for the 15 days, starting from (and including) 20130420.

In this issue report, StudyMonth is more important and easier to implement.
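The StudyStartDate + Days pair can be expanded into per-day C-FIND values without the invalid-date problem of the 31-day loop. A sketch (the function name is illustrative):

```python
from datetime import date, timedelta

def study_dates(start_date: str, days: int) -> list:
    """Expand a DICOM-style YYYYMMDD start date into `days` consecutive
    dates, starting from (and including) the start date."""
    d0 = date(int(start_date[:4]), int(start_date[4:6]), int(start_date[6:8]))
    return [(d0 + timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]

dates = study_dates("20130420", 15)
print(dates[0], dates[-1])  # 20130420 ... 20130504
```

The same expansion covers StudyMonth and StudyYear (expand the month or year into its real days), avoiding queries for dates such as 20120231.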

macOS_install.sh script for quick installation of dependencies on macOS

Is your feature request related to a problem? Please describe.
The macOS install script is designed to address the problem of installing the necessary dependencies for the Niffler application on a macOS system quickly and efficiently. Without this script, users may have difficulty installing and configuring the required dependencies.

Describe the solution you'd like
A clear and concise solution would be to create a macOS equivalent of the existing install.sh script (which targets Linux) that automates the installation of all the necessary dependencies for Niffler on macOS. Additionally, it should create and configure the necessary service files (plist files) so that Niffler and its components start automatically on system startup. The end result would be a hassle-free and straightforward installation process for Niffler on macOS.

Describe alternatives you've considered
An alternative solution could be a script that would create an installable package for macOS, as discussed in #374.

Pause and Resume png-extraction

Currently, when the on-demand extraction fails in the middle and we restart it, it starts from the beginning.

This pause-and-resume feature should be for all 3 cases (DICOM pulls, png conversion, and metadata extraction).

In the real-time metadata extraction, we have a pickle file to keep track of the progress. I need to incorporate the same mechanism, as it seems to work well.
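A minimal sketch of that pickle-based resume mechanism applied here (the file name and structure are illustrative, not the meta-extraction module's actual format):

```python
import os
import pickle

PROGRESS_FILE = "extraction_progress.pickle"  # illustrative name

def load_done() -> set:
    """Return the set of items already processed in a previous run."""
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE, "rb") as f:
            return pickle.load(f)
    return set()

def mark_done(done: set, item: str) -> None:
    """Record one completed item so a restart can skip it."""
    done.add(item)
    with open(PROGRESS_FILE, "wb") as f:
        pickle.dump(done, f)

done = load_done()
for dcm in ["a.dcm", "b.dcm"]:
    if dcm in done:
        continue  # resume: already handled in an earlier run
    # ... pull / convert / extract metadata here ...
    mark_done(done, dcm)
```

Writing the progress file after each item (rather than once at the end) is what makes mid-run failure recoverable for all three stages.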

Enable extracting only certain DICOM headers in png-extraction

As of Niffler-0.8.5, png-extraction extracts all the metadata. The only filtering options are getting all the images, getting the common headers, and getting only the public headers, as supported by the config.json options:

"CommonHeadersOnly": false,
"PublicHeadersOnly": true,

Enabling the extraction of only a certain subset of DICOM headers (both private and public), as supported by the meta-extraction module through featureset.txt files such as https://github.com/Emory-HITI/Niffler/blob/dev/modules/meta-extraction/conf/featureset1.txt , is an alternative.

This option makes the png-extraction module more efficient when we just need certain headers stored in the output CSV file.

We add a new header (by default, false):

"SpecificHeadersOnly": false,
"CommonHeadersOnly": false,
"PublicHeadersOnly": true,

You could copy-paste https://github.com/Emory-HITI/Niffler/blob/dev/modules/meta-extraction/conf/featureset1.txt as the default featureset.txt for this module, but also add PhotometricInterpretation into this new featureset.txt.

In this way, there is always a featureset.txt.

We can impose certain requirements on what counts as an accepted featureset.txt. A featureset.txt must always contain the below 4 fields, as they are mandatory for png-extraction.

PhotometricInterpretation
PatientID
StudyInstanceUID
SeriesInstanceUID

Just mention this in the README. That is sufficient.

When you implement the feature, just make sure to test all 3 of the below cases:

1.

"SpecificHeadersOnly": false,
"CommonHeadersOnly": false,
"PublicHeadersOnly": true,

This is the current default. This pulls all the public headers.

2.

"SpecificHeadersOnly": false,
"CommonHeadersOnly": false,
"PublicHeadersOnly": false,

This pulls all the public and private headers.

3.

"SpecificHeadersOnly": true,
"CommonHeadersOnly": false,
"PublicHeadersOnly": false,

This is currently not implemented. When SpecificHeadersOnly is set to true, it ignores the CommonHeadersOnly and PublicHeadersOnly tags and extracts the tags mentioned in featureset.txt (regardless of whether they are public, private, or uncommon).

So, essentially, the below configurations are all equivalent:

"SpecificHeadersOnly": true,
"CommonHeadersOnly": false,
"PublicHeadersOnly": false,

and

"SpecificHeadersOnly": true,
"CommonHeadersOnly": false,
"PublicHeadersOnly": true,

and

"SpecificHeadersOnly": true,
"CommonHeadersOnly": true,
"PublicHeadersOnly": false,

and

"SpecificHeadersOnly": true,
"CommonHeadersOnly": true,
"PublicHeadersOnly": true,
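A sketch of the filtering rule described above (the function and variable names are illustrative, not the module's actual code): when SpecificHeadersOnly is true, keep only the featureset tags, always retaining the 4 mandatory fields:

```python
# The 4 fields this issue declares mandatory for png-extraction.
MANDATORY = {"PhotometricInterpretation", "PatientID",
             "StudyInstanceUID", "SeriesInstanceUID"}

def filter_headers(row: dict, featureset: set) -> dict:
    """Keep only the requested tags, plus the mandatory fields."""
    wanted = set(featureset) | MANDATORY
    return {tag: value for tag, value in row.items() if tag in wanted}

row = {"PatientID": "p1", "EchoTime": 90.0, "StationName": "MR1"}
print(filter_headers(row, {"EchoTime"}))
```

Because the mandatory set is unioned in unconditionally, a featureset.txt that omits one of the 4 fields still produces a usable row.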

MemoryError while extracting metadata using PNG Extraction Module

While extracting metadata using the PNG Extraction Module (CommonHeadersOnly: true), all the DICOM tags are collected and stored in CSV format using pandas.

The memory error occurs because, in some instances, the DICOM images can have a larger number of tags than expected, which creates a sparse CSV file resulting in a large data frame that cannot be saved using pandas.

The error can be fixed by transferring such DICOM files (which have a large number of tags) into a new folder inside the failed-dicom folder in the same module.

A composable workflow module

Currently, the processes are scheduled manually, or the workflows are developed manually for each execution. This workflow module will enable the chaining of processes:

One common workflow is, cold-extraction -> png-extraction, which will retrieve the images and then convert the DICOM images to PNG images and extract metadata.

Another workflow we use is for scanner utilization. It is, cold-extraction -> png-extraction -> scanner-util.

The aim is to reduce the human-in-the-loop effort in several repeated execution patterns we have.

Containerizing Niffler

Currently, Niffler png-extraction is containerized (#262). However, other modules are not containerized. What we propose is a global Niffler container from where different modules can be executed.

While this can be a challenging undertaking (considering Niffler's multiple modules and its dependencies on sibling frameworks such as Eaglescope for visualization), an attempt at this will help us expand the user base and also improve the compactness of the framework itself.

End the Niffler process gracefully upon user error

User errors occur when the user provides a wrong CSV or config.json to Niffler.

Error handling must be improved when that happens. Currently, the Python process hangs when it fails due to a wrong CSV or config.
