Emory-HITI / Niffler
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
Home Page: https://emory-hiti.github.io/Niffler/
License: BSD 3-Clause "New" or "Revised" License
This will help with installing Niffler in new environments and with troubleshooting when a dependency breaks later.
The CI tests that are triggered on commit are failing. For example, see:
https://github.com/Emory-HITI/Niffler/actions/runs/2034155913
and
https://github.com/Emory-HITI/Niffler/runs/5675974447?check_suite_focus=true
It fails at the step:
Install Python3
Run yum update -y -q
yum update -y -q
yum install -q -y python3
yum install -q -y python3-pip unzip
shell: sh -e {0}
Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist
Error: Process completed with exit code 1.
This should be fixed, as it currently renders the CI commit workflows useless.
The output can be written to a file that can be followed with tail -f.
Run it in real time as a scheduled event, perhaps at 10-minute intervals.
Remove the intermediate csv generation process.
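The scheduling idea above can be sketched as a simple fixed-interval loop; the function name `run_extraction` and the cycle count are placeholders, not Niffler's actual implementation.

```python
import time

# Sketch only: run_extraction stands in for the real Niffler task; a
# production run would write progress to a log file viewable with `tail -f`.
runs = []

def run_extraction():
    runs.append(time.time())  # real code would log each cycle here

def schedule_every(interval_seconds, task, cycles):
    """Run task every interval_seconds for a fixed number of cycles."""
    for _ in range(cycles):
        task()
        time.sleep(interval_seconds)

# 600 seconds would give the suggested 10-minute interval; 0.01 keeps the demo fast.
schedule_every(0.01, run_extraction, cycles=3)
```

A cron entry or a systemd timer would be the production equivalent of this loop.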
This is an easy candidate for Docker containerization. It will be a helpful feature since dependencies such as gdcm tend to have conflicts when locally installed. A containerized module will solve that challenge.
Running install.sh in the container will install the dependencies. Please do a test on a sample image (such as one that could be obtained from https://www.cancerimagingarchive.net/collections/ ) to confirm it is working, before committing the changes or creating a pull request. Also, please include README instructions on how this could be run as a container so people could already start using your container.
Develop a UI for the Niffler modules.
The PACS stores AccessionNumber as a string with the year in YY format, whereas other clinical systems use the YYYY format. This variance in representation causes data-quality problems in queries.
Naturally, Niffler supports only YY format since that is the DICOM default. However, supporting both YYYY and YY formats will help with diverse extraction queries.
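One way to support both formats is to expand a YY-form accession into both candidate query strings. This is a sketch under an assumed layout (the accession begins with a 2-digit year) and a simple century-pivot heuristic, not Niffler's actual logic.

```python
def accession_query_variants(accession):
    """Return both the YY and a YYYY form of an AccessionNumber.

    Assumption: the accession starts with a 2-digit year; real PACS
    layouts vary, so this is illustrative only.
    """
    yy = accession[:2]
    century = "20" if int(yy) <= 49 else "19"  # simple pivot heuristic
    return [accession, century + accession]

variants = accession_query_variants("21AB123")
```

Querying with both variants lets one extraction match either representation.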
Currently, the png-extraction module supports extracting only public DICOM attributes. Extending it to support private tags can significantly help research that depends on those tags.
We rarely get all the headers when we run lots of DICOM files through png-extraction. In contrast, if we run just a couple of files, we get more attributes. Let me give an example:
The fields we got when we ran it against June 28th 2021 all MR data:
AccessionNumber,Private Creator,InstanceNumber,InstitutionName,IssuerOfPatientID,Manufacturer,ManufacturerModelName,Modality,PatientBirthDate,PatientID,PatientName,PatientSex,Private tag data,[Unknown],ReferringPhysicianName,SOPClassUID,SOPInstanceUID,SeriesDate,SeriesDescription,SeriesInstanceUID,SeriesNumber,SeriesTime,SpecificCharacterSet,StationName,StudyDate,StudyDescription,StudyID,StudyInstanceUID,StudyTime,[Reject Image Flag],[Significant Flag],[Confidential Flag],[Assigning Authority For Patient ID],file,has_pix_array,category,BitsAllocated,BitsStored,Columns,HighBit,ImageOrientationPatient,ImagePositionPatient,ImageType,PatientPosition,PhotometricInterpretation,PixelRepresentation,PixelSpacing,Rows,SamplesPerPixel,SliceThickness
The fields we got when we ran against just 2 DICOM images from the above set:
AccessionNumber,AcquisitionDate,AcquisitionMatrix,AcquisitionNumber,AcquisitionTime,AngioFlag,BitsAllocated,BitsStored,BodyPartExamined,Columns,ContentDate,ContentTime,DeviceSerialNumber,EchoNumbers,EchoTime,EchoTrainLength,EthnicGroup,FillerOrderNumberImagingServiceRequest,FlipAngle,FrameOfReferenceUID,HighBit,ImageOrientationPatient,ImagePositionPatient,ImageType,ImagedNucleus,ImagingFrequency,InPlanePhaseEncodingDirection,InstanceCreationDate,InstanceCreationTime,InstanceNumber,InstitutionAddress,InstitutionName,InstitutionalDepartmentName,IssuerOfPatientID,LargestImagePixelValue,Laterality,MRAcquisitionType,MagneticFieldStrength,Manufacturer,ManufacturerModelName,Modality,NumberOfAverages,NumberOfPhaseEncodingSteps,OtherPatientIDs,PatientAddress,PatientAge,PatientBirthDate,PatientID,PatientName,PatientPosition,PatientSex,PatientSize,PatientWeight,PercentPhaseFieldOfView,PercentSampling,PerformedProcedureStepDescription,PerformedProcedureStepID,PerformedProcedureStepStartDate,PerformedProcedureStepStartTime,PhotometricInterpretation,PhysiciansOfRecord,PixelBandwidth,PixelRepresentation,PixelSpacing,PregnancyStatus,0_ProcedureCodeSequence_CodeMeaning,0_ProcedureCodeSequence_CodeValue,0_ProcedureCodeSequence_CodingSchemeDesignator,0_ProcedureCodeSequence_CodingSchemeVersion,ProtocolName,0_ReferencedImageSequence_ReferencedSOPClassUID,0_ReferencedImageSequence_ReferencedSOPInstanceUID,1_ReferencedImageSequence_ReferencedSOPClassUID,1_ReferencedImageSequence_ReferencedSOPInstanceUID,2_ReferencedImageSequence_ReferencedSOPClassUID,2_ReferencedImageSequence_ReferencedSOPInstanceUID,0_ReferencedPatientSequence_ReferencedSOPClassUID,0_ReferencedPatientSequence_ReferencedSOPInstanceUID,0_ReferencedStudySequence_ReferencedSOPClassUID,0_ReferencedStudySequence_ReferencedSOPInstanceUID,ReferringPhysicianName,RepetitionTime,0_RequestAttributesSequence_RequestedProcedureID,0_RequestedProcedureCodeSequence_CodeMeaning,0_RequestedProcedureCodeSequence_CodeValue,0_RequestedProced
ureCodeSequence_CodingSchemeDesignator,RequestedProcedureDescription,RequestingPhysician,Rows,SAR,SOPClassUID,SOPInstanceUID,SamplesPerPixel,ScanOptions,ScanningSequence,SequenceName,SequenceVariant,SeriesDate,SeriesDescription,SeriesInstanceUID,SeriesNumber,SeriesTime,SliceLocation,SliceThickness,SmallestImagePixelValue,SoftwareVersions,SpacingBetweenSlices,SpecificCharacterSet,StationName,StudyDate,StudyDescription,StudyID,StudyInstanceUID,StudyPriorityID,StudyStatusID,StudyTime,TransmitCoilName,VariableFlipAngleFlag,WindowCenter,WindowCenterWidthExplanation,WindowWidth,dBdt,Private Creator,Private tag data,[Unknown],[CSA Image Header Type],[CSA Image Header Version ??],[SliceMeasurementDuration],[GradientMode],[FlowCompensation],[TablePositionOrigin],[ImaAbsTablePosition],[ImaRelTablePosition],[SlicePosition_PCS],[TimeAfterStart],[SliceResolution],[RealDwellTime],[CSA Image Header Version],[CSA Image Header Info],[CSA Series Header Type],[CSA Series Header Version],[CSA Series Header Info],[Series Workflow Status],[AcquisitionMatrixText],[CoilString],[PATModeText],[PositivePCSDirections],[Reject Image Flag],[Significant Flag],[Confidential Flag],[Assigning Authority For Patient ID],file,has_pix_array,category,AdmissionID,0_AnatomicRegionSequence_CodeMeaning,0_AnatomicRegionSequence_CodeValue,0_AnatomicRegionSequence_CodingSchemeDesignator,BeatRejectionFlag,CardiacNumberOfImages,HeartRate,ImagesInAcquisition,InStackPositionNumber,InversionTime,PerformedLocation,PerformedStationName,ReceiveCoilName,ReconstructionDiameter,0_ReferencedPerformedProcedureStepSequence_ReferencedSOPClassUID,0_ReferencedPerformedProcedureStepSequence_ReferencedSOPInstanceUID,StackID,TriggerWindow,[Suite id],[Product id],[Image actual date],[Service id],[Mobile location number],[Equipment UID],[Actual series data time stamp],[Horiz. 
Frame of ref.],[Series contrast],[Last pseq],[Series plane],[First scan ras],[First scan location],[Last scan ras],[Last scan loc],[Display field of view],[Acquisition Duration],[Second echo],[Number of echoes],[Table delta],[Contiguous],[Peak SAR],[Cardiac repetition time],[Images per cardiac cycle],[Actual receive gain analog],[Actual receive gain digital],[Delay after trigger],[Swappf],[Pause Interval],[Pause Time],[Slice offset on freq axis],[Auto Prescan Center Frequency],[Auto Prescan Transmit Gain],[Auto Prescan Analog receiver gain],[Auto Prescan Digital receiver gain],[Bitmap defining CVs],[Pulse Sequence Mode],[Pulse Sequence Name],[Pulse Sequence Date],[Internal Pulse Sequence Name],[Transmitting Coil Type],[Surface Coil Type],[Extremity Coil flag],[Raw data run number],[Calibrated Field strength],[SAT fat/water/bone],[User data 0],[User data 1],[User data 2],[User data 3],[User data 4],[User data 5],[User data 6],[User data 7],[User data 8],[User data 9],[User data 10],[User data 11],[User data 12],[User data 13],[User data 14],[User data 15],[User data 16],[User data 17],[User data 18],[User data 19],[User data 20],[User data 21],[User data 22],[Projection angle],[Saturation planes],[SAT location R],[SAT location L],[SAT location A],[SAT location P],[SAT location H],[SAT location F],[SAT thickness R/L],[SAT thickness A/P],[SAT thickness H/F],[Phase Contrast flow axis],[Velocity encoding],[Thickness disclaimer],[Prescan type],[Prescan status],[Projection Algorithm],[Fractional echo],[Cardiac phase number],[Variable echoflag],"[Concatenated SAT {# DTI Diffusion Dir., release 9.0 & below}]","[User data 23 {# DTI Diffusion Dir., release 9.0 & below}]","[User data 24 {# DTI Diffusion Dir., release 10.0 & above}]",[Velocity Encode Scale],[Fast phases],[Transmit gain],[Series from which prescribed],[Image from which prescribed],[Screen Format],[Locations in acquisition],[Graphically prescribed],[Rotation from source x rot],[Rotation from source y 
rot],[Rotation from source z rot],[Num 3D slabs],[Locs per 3D slab],[Overlaps],[Image Filtering 0.5/0.2T],[Diffusion direction],[Tagging Flip Angle],[Tagging Orientation],[Tag Spacing],[RTIA_timer],[Fps],[Auto window/level alpha],[Auto window/level beta],[Auto window/level window],[Auto window/level level],[Start time(secs) in first axial],[No. of updates to header],[Indicates study has complete info (DICOM/genesis)],[Last pulse sequence used],[Images in Series],[Landmark Counter],[Number of Acquisitions],[Indicates no. of updates to header],[Series Complete Flag],[Number of images archived],[Last image number used],[Primary Receiver Suite and Host],[Protocol Data Block (compressed)],[Image archive flag],[Scout Type],[Imaging Mode],[Pulse Sequence],[Imaging Options],[Plane Type],[RAS letter of image location],[Image location],[Image dimension - X],[Image dimension - Y],[Number of Excitations],[Lower range of Pixels1],[Upper range of Pixels1],[Lower range of Pixels2],[Upper range of Pixels2],[Version of the hdr struct],[Advantage comp. Overflow],[Advantage comp. Underflow],[Bitmap of prescan options],[Gradient offset in X],[Gradient offset in Y],[Gradient offset in Z],[Number of EPI shots],[Views per segment],"[Respiratory rate, bpm]",[Respiratory trigger point],[Type of receiver used],[DB/dt Peak rate of change of gradient field],[dB/dt Limits in units of percent],[PSD estimated limit],[PSD estimated limit in tesla per second],[Window value],[GE image integrity],[Level value],[Unique image iden],[Histogram tables],[User defined data],[Effective echo spacing],[Filter Mode (String slop field 1 in legacy GE MR],"[Image Type (real, imaginary, phase, magnitude)]",[Vas collapse flag],[Vas flags],[Neg_scanspacing],[Offset Frequency],[User_usage_tag],[User_fill_map_MSW],[User_fill_map_LSW],[User data 25...User data 48 {User48=Effective Resolution for spiral}],[Slop_int_6... 
slop_int_9],[Slop_int_10...slop_int_17],[Scanner Study Entity UID],[Scanner Study ID],[Scanner Table Entry (single gradient coil systems only)/Scanner Table Entry + Gradient Coil Selected],[Recon mode flag word],[Coil ID Data],[GE Coil Name],[System Configuration Information],[Asset R Factors],[Additional Asset Data],"[Governing Body, dB/dt, and SAR definition]",[Private In-Plane Phase Encoding Direction],[SAR Definition],[SAR value],[Prescan Reuse String],[Content Qualification],[Image Filtering Parameters],[Rx Stack Identification]
In both cases, we set the configurations as below:
{
"DICOMHome": "/opt/localdrive/Niffler/modules/cold-extraction/june28",
"OutputDirectory": "June28_2021",
"Depth": 3,
"SplitIntoChunks": 1000,
"PrintImages": false,
"CommonHeadersOnly": false,
"PublicHeadersOnly": false,
"SpecificHeadersOnly": false,
"UseProcesses": 0,
"FlattenedToLevel": "patient",
"is16Bit":true,
"SendEmail": true,
"YourEmail": "[email protected]"
}
{
"DICOMHome": "/Users/pradeeban/Downloads/onedrive",
"OutputDirectory": "/Users/pradeeban/Downloads/onedrive/meta1",
"Depth": 0,
"SplitIntoChunks": 1,
"PrintImages": false,
"CommonHeadersOnly": false,
"PublicHeadersOnly": false,
"SpecificHeadersOnly": false,
"UseProcesses": 0,
"FlattenedToLevel": "patient",
"is16Bit":true,
"SendEmail": true,
"YourEmail": "[email protected]"
}
Currently, anonymization is part of the png-extraction module: we read the DICOM files, get their metadata as a CSV, and write the files back as PNGs. However, we may need a DICOM -> anonymized DICOM conversion, as elaborated in #105. This conversion option does not currently exist.
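A DICOM -> anonymized DICOM pass could look like the sketch below. For illustration it operates on a plain metadata dict; in Niffler it would act on a pydicom Dataset and re-save with save_as(). The tag lists are illustrative, not a complete de-identification profile.

```python
import hashlib

# Illustrative tag sets, NOT a complete PHI list.
PHI_TAGS = {"PatientName", "PatientBirthDate", "PatientAddress"}
HASH_TAGS = {"PatientID", "AccessionNumber"}  # keep linkability, drop identity

def anonymize(headers):
    """Drop direct identifiers and hash linkable IDs; pass the rest through."""
    out = {}
    for tag, value in headers.items():
        if tag in PHI_TAGS:
            continue  # drop direct identifiers entirely
        if tag in HASH_TAGS:
            out[tag] = hashlib.sha256(value.encode()).hexdigest()[:16]
        else:
            out[tag] = value
    return out

anon = anonymize({"PatientName": "DOE^JANE", "PatientID": "12345",
                  "Modality": "MR"})
```

Hashing (rather than deleting) PatientID keeps studies from the same patient grouped together after anonymization.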
We observed some overlaps for 20200601 and need to find whether the algorithm can be fixed to handle this.
I have installed all the Python dependencies and gdcm (yay -S gdcm). Then I placed a DICOM image and modified the config as required. It generates all the folders but produces no PNG file and shows no error. I expect a PNG image in my output directory.
I am attaching ImageExtractor.out, i.e., the log file. It says there are no files in that folder, though files are present.
ImageExtractor.out.txt
ImageExtractor.pickle.txt
The Python version I have is 3.10.4.
Refine the modes from Niffler-0.7.x to Niffler-0.8.x:
empi: REMAINS THE SAME.
empi_accession: REMAINS THE SAME
accession: REMAINS THE SAME
any_any: CHANGED FROM empi_date (defaults to empi_date in the default config.json)
any: new (defaults to accession in the default config.json)
You will be able to query and retrieve based on "any" single DICOM property. No need to use this mode for EMPI based extractions as this is inefficient for that. Such extractions do not need FINDSCU (hence the decision to keep "empi" as the first mode). For backward compatibility, we keep the "accession" mode as well.
I tested "any" mode on a given date (AcquisitionDate) and it works.
For any_any:
You will be able to query and retrieve based on any two DICOM properties. No need to use this mode for EMPI-Accession based extractions as this is inefficient for that. Such extractions do not need FINDSCU (hence the decision to keep "empi_accession" as the second mode).
I tested "any_any" mode on a given date (AcquisitionDate) for a given modality (MR) and it works.
I cannot test for all the metadata headers. But with time as you all use Niffler-0.8.x or later, you will know how generic these two modes (any and any_any) actually are.
Describe the bug
In the experimental slurm based implementation of png-extraction, there are two issues: the script is missing the subprocess import, and the t_start variable used is not declared.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The script should not contain any errors.
Environment (please complete the following information):
Additional context
While I was checking out how the module is containerized, I came across these errors.
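A minimal sketch of the fix described above (variable and step names are assumed from the report): add the missing subprocess import and declare t_start before it is used.

```python
import subprocess
import sys
import time

t_start = time.time()  # was used for timing but never declared

# Placeholder for the real extraction step, which shells out via subprocess;
# the command here just stands in for the actual Slurm/extraction call.
result = subprocess.run([sys.executable, "-c", "print('chunk processed')"],
                        capture_output=True, text=True)

elapsed = time.time() - t_start  # timing now works because t_start exists
```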
This leads to extracted images not being cleaned up at 23:59 during the cleanup process.
Currently, this could be achieved by a human-in-the-loop. Make this an automated process.
Spotted with DICOMs of 1.1TB and 1000 metadata files (small files).
Currently, we support querying multiple sources by running ColdDataRetriever twice (or more times), by changing these parameters in the system.json.
NifflerID: The ID of the current execution. The default is 1. You must increment it to 2 for the second execution, so that the logs are properly stored in niffler1.log and niffler2.log.
MaxNifflerProcesses: How many Niffler processes can run in parallel. Make sure each execution has its own SrcAet properly configured. Each SrcAet can run only once.
But the same Query AET can run multiple queries to multiple source AETs. So, we may rethink our approach and fix the implementation.
This will be especially useful if all the source AETs only know and accept that one particular Niffler instance, defined by a hostname, port, and AET, such as (x.x.x.x, 4242, QBNIFFLER).
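For illustration, a second execution's system.json could look like the fragment below. Only NifflerID, MaxNifflerProcesses, and SrcAet come from the description above; the SrcAet value format and the remaining values are assumptions, not the actual defaults.

```json
{
    "NifflerID": 2,
    "MaxNifflerProcesses": 2,
    "SrcAet": "SOURCE_PACS2@x.x.x.x:4242"
}
```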
This will serve as a log on which extractions succeeded when the extraction fails in the middle.
Description
Anaconda is downloaded and installed again when install.sh is run on a system that already has Anaconda installed.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The install script should skip installing Anaconda.
Environment:
This is slightly tricky since Niffler is not just Python.
Among the modules, the below are in Python:
Although the above could be packaged with pip following [1], there are complex dependencies such as DCM4CHE as storescp for cold-extraction and meta-extraction, mongo for meta-extraction, and gdcm for png-extraction. Any initiative to package Niffler as an installable distribution must take additional care to ensure these dependencies are covered.
[1] https://packaging.python.org/en/latest/tutorials/packaging-projects/
Describe the bug
Running ImageExtractor.py does not find the .dcm files in the target directory.
To Reproduce
Downloaded ImageExtractor.py & config.json, installed all dependencies, configured config.json, and ran ImageExtractor.py
Expected behaviour
Expect it to discover the DICOM files in the target directory and convert them to PNG.
Logs or screenshots
$ ls /home/xxxxxxxx/Desktop/Antares/patient/opt
OP000000.dcm
$ file /home/xxxxxxxx/Desktop/Antares/patient/opt/OP000000.dcm
/home/xxxxxxxx/Desktop/Antares/patient/opt/OP000000.dcm: DICOM medical imaging data
$ cat config.json
{
"DICOMHome": "/home/xxxxxxxx/Desktop/Antares/patient/opt",
"OutputDirectory": "png",
"Depth": 0,
"SplitIntoChunks": 1,
"PrintImages": true,
"CommonHeadersOnly": false,
"PublicHeadersOnly": true,
"SpecificHeadersOnly": false,
"UseProcesses": 0,
"FlattenedToLevel": "patient",
"is16Bit":false,
"SendEmail": true,
"YourEmail": "[email protected]"
}
$ python3 ImageExtractor.py
$ cat png/ImageExtractor.out
INFO:root:------- Values Initialization DONE -------
INFO:root:Number of dicom files: 0
ERROR:root:There is no file present in the given folder in /home/xxxxxxxx/Desktop/Antares/patient/opt/*.dcm
Environment (please complete the following information):
The png-extraction modules occasionally throw the below errors.
Multiprocessing Error
No error but metadata file is not created - Process Killed.
RuntimeWarning: invalid value encountered in true_divide
They are hard to reproduce, but are recorded here for future reference.
The requests are currently hard-coded in ColdRetriever.py as constants.
Replace them with values read from a file (JSON, CSV, or a conf file).
The output format will be hash(EMPI)/hash(Study)/hash(Series)/*.png
We will be creating a directory tree here - http://www.linfo.org/make_directory_tree.html
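The hash(EMPI)/hash(Study)/hash(Series) tree can be sketched as below; the choice of hash function and the truncation length are assumptions, since the issue does not specify them.

```python
import hashlib
import os
import tempfile

def _h(value):
    # Illustrative hash choice; the issue does not mandate sha256 or length.
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def make_output_dir(root, empi, study_uid, series_uid):
    """Create the hash(EMPI)/hash(Study)/hash(Series) tree under root."""
    path = os.path.join(root, _h(empi), _h(study_uid), _h(series_uid))
    os.makedirs(path, exist_ok=True)  # builds the whole tree in one call
    return path

root = tempfile.mkdtemp()
out = make_output_dir(root, "EMPI001", "1.2.840.1", "1.2.840.1.1")
```

The resulting directory receives the *.png files for that series.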
For example, missing accession in an EMPI-Accession extraction.
The PNG extraction snippet has a slight issue. Below are lines 182-187 of ImageExtractor.py. They impose a limit on the DICOM tags extracted. A file that exceeds that limit is moved to the failed-cases folder; its metadata is still preserved, but it does not record the 'file' or 'has_pix_array' attributes, which would be useful for backtracking.
if len(kv) > dicom_tags_limit:
    logging.debug(str(len(kv)) + " dicom tags produced by " + ff)
    copyfile(ff, output_directory + '/failed-dicom/5/' + os.path.basename(ff))
else:
    kv.append(('file', f_list_elem[1]))  # adds my custom field with the original filepath
    kv.append(('has_pix_array', c))  # adds my custom field with if file has image
Reproducing this would require extracting a DCM file whose tags number over 800.
file and has_pix_array should be stored outside the if/else block
if len(kv) > dicom_tags_limit:
    logging.debug(str(len(kv)) + " dicom tags produced by " + ff)
    copyfile(ff, output_directory + '/failed-dicom/5/' + os.path.basename(ff))
kv.append(('file', f_list_elem[1]))  # adds my custom field with the original filepath
kv.append(('has_pix_array', c))  # adds my custom field with if file has image
Describe the bug
An error message pops up, "Statements must be separated by newlines or semicolons", in the ImageExtractorNifti.py file of the nifti-extraction module at line 206.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
There shouldn't be any syntax errors.
Environment (please complete the following information):
Make the code more readable and easier to run, removing hard-coded values.
Niffler will need some modification so that when we use an external AE to store the images, storescp is not started locally. I will create a bug report for that. It shouldn't cause any trouble; it is just that Niffler unnecessarily creates a storescp for an iteration where we only want to send the data to another AE's storescp.
A solution would be a system.json parameter, "IsExternalAE": if it is true, do not start the local storescp process.
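The proposed flag could gate the storescp startup as in the sketch below; "IsExternalAE" is the proposed parameter from above, while the helper names are hypothetical.

```python
import json

# "IsExternalAE" is the proposed (not yet existing) system.json parameter.
system_conf = json.loads('{"IsExternalAE": true}')

def maybe_start_storescp(conf, start_fn):
    """Skip the local storescp when the receiving AE is external."""
    if conf.get("IsExternalAE", False):
        return None  # external AE runs its own storescp
    return start_fn()  # start_fn is a placeholder for the real launcher

proc = maybe_start_storescp(system_conf, start_fn=lambda: "storescp-started")
local = maybe_start_storescp({"IsExternalAE": False},
                             start_fn=lambda: "storescp-started")
```

Defaulting the flag to false preserves the current behavior for existing deployments.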
Refactor and move hard-coded values to a conf file, following the steps taken for on-demand extraction code refactoring.
The following 3 attribute options (Month, Year, Days) are proposed.
and, slightly more complex, a study interval to retrieve images, provided as a pair of entries:
3) StudyStartDate,
4) Days.
(example: StudyStartDate: 20130420, Days: 15)
This fetches images for the 15 days, starting from (and including) 20130420.
In this issue report, StudyMonth is more important and easier to implement.
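Expanding the StudyStartDate + Days pair into the set of dates to fetch can be sketched directly from the example above (the function name is hypothetical):

```python
from datetime import datetime, timedelta

def study_date_range(study_start_date, days):
    """List DICOM-format dates for `days` days, starting from (and
    including) StudyStartDate."""
    start = datetime.strptime(str(study_start_date), "%Y%m%d")
    return [(start + timedelta(days=i)).strftime("%Y%m%d")
            for i in range(days)]

# The example from the issue: 15 days starting 2013-04-20.
dates = study_date_range(20130420, 15)
```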
Is your feature request related to a problem? Please describe.
The macOS install script is designed to address the problem of installing the necessary dependencies for the Niffler application on a macOS system quickly and efficiently. Without this script, users may have difficulty installing and configuring the required dependencies.
Describe the solution you'd like
A clear and concise solution would be to create a macOS equivalent of the existing install.sh script (which is for Linux) that automates the installation of all the necessary dependencies for the Niffler application on macOS. Additionally, it should create and configure the necessary service files (plist files) so that the Niffler application and its components start automatically on system startup. The end result would be a hassle-free and straightforward installation process for Niffler on macOS.
Describe alternatives you've considered
An alternative solution could be a script that would create an installable package for macOS, as discussed in #374.
When "PublicHeadersOnly": true, it extracts the public headers properly.
But when "PublicHeadersOnly": false, it still extracts only the public headers and fails to extract the private headers.
A service framework deployment might help with more visibility across communities such as Libre Health.
Currently, when the on-demand extraction fails in the middle and we restart it, it starts from the beginning.
This pause-and-resume feature should be for all 3 cases (DICOM pulls, png conversion, and metadata extraction).
In the real-time metadata extraction, we have a pickle file to keep track of the progress. I need to incorporate the same mechanism, as it seems to work well.
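The pickle-based resume mechanism described above can be sketched as follows; the file name and the choice of a set as the progress structure are assumptions mirroring the meta-extraction approach.

```python
import os
import pickle
import tempfile

# Hypothetical progress file; meta-extraction keeps a similar pickle.
progress_file = os.path.join(tempfile.mkdtemp(), "extraction.pickle")

def load_done():
    """Load the set of already-processed items, or start fresh."""
    if os.path.exists(progress_file):
        with open(progress_file, "rb") as f:
            return pickle.load(f)
    return set()

def mark_done(done, item):
    done.add(item)
    with open(progress_file, "wb") as f:
        pickle.dump(done, f)  # persist after every item so a crash loses little

done = load_done()
for accession in ["A1", "A2", "A3"]:
    if accession in done:
        continue  # resume: skip items finished in a previous run
    # ... DICOM pull / PNG conversion / metadata extraction would run here ...
    mark_done(done, accession)

resumed = load_done()
```

Applying the same checkpointing to all three stages (pulls, conversion, metadata) gives the full pause-and-resume behavior.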
As of Niffler-0.8.5, png-extraction extracts all the metadata. The only filtering options are getting all the images, getting only the common headers, and getting only the public headers, as supported by the config.json options:
"CommonHeadersOnly": false,
"PublicHeadersOnly": true,
Enabling extraction of only a certain subset of DICOM headers (both private and public), as supported by the meta-extraction module through featureset.txt files such as https://github.com/Emory-HITI/Niffler/blob/dev/modules/meta-extraction/conf/featureset1.txt, is an alternative.
This option makes the png-extraction module more efficient when we only need certain headers stored in the output CSV file.
We add a new option (false by default):
"SpecificHeadersOnly": false,
"CommonHeadersOnly": false,
"PublicHeadersOnly": true,
You could copy-paste https://github.com/Emory-HITI/Niffler/blob/dev/modules/meta-extraction/conf/featureset1.txt as the default featureset.txt for this module, but also add PhotometricInterpretation into this new featureset.txt.
In this way, there is always a featureset.txt.
We can impose certain requirements on what an accepted featureset.txt is. A featureset.txt must always contain the below 4 fields, as they are mandatory for png-extraction:
PhotometricInterpretation
PatientID
StudyInstanceUID
SeriesInstanceUID
Just mention this in the README. That is sufficient.
When you implement the feature, just make sure to test for all the below 3 cases:
1.
"SpecificHeadersOnly": false,
"CommonHeadersOnly": false,
"PublicHeadersOnly": true,
This is the current default. This pulls all the public headers.
2.
"SpecificHeadersOnly": false,
"CommonHeadersOnly": false,
"PublicHeadersOnly": false,
This pulls all the public and private headers.
3.
"SpecificHeadersOnly": true,
"CommonHeadersOnly": false,
"PublicHeadersOnly": false,
This is currently not implemented. When SpecificHeadersOnly is set to true, it will ignore the CommonHeadersOnly and PublicHeadersOnly tags and extract the tags mentioned in featureset.txt (regardless of whether they are public, private, or uncommon).
So, essentially, the below are all the same:
"SpecificHeadersOnly": true,
"CommonHeadersOnly": false,
"PublicHeadersOnly": false,
and
"SpecificHeadersOnly": true,
"CommonHeadersOnly": false,
"PublicHeadersOnly": true,
and
"SpecificHeadersOnly": true,
"CommonHeadersOnly": true,
"PublicHeadersOnly": false,
and
"SpecificHeadersOnly": true,
"CommonHeadersOnly": true,
"PublicHeadersOnly": true,
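The proposed SpecificHeadersOnly filtering could be sketched as below; the function name is hypothetical, while the four mandatory fields come from the requirement stated above.

```python
# The 4 fields the text above declares mandatory for png-extraction.
MANDATORY = {"PhotometricInterpretation", "PatientID",
             "StudyInstanceUID", "SeriesInstanceUID"}

def filter_headers(kv, featureset):
    """Keep only (tag, value) pairs listed in featureset.txt, always
    including the mandatory fields, regardless of public/private status."""
    wanted = set(featureset) | MANDATORY
    return [(tag, value) for tag, value in kv if tag in wanted]

kv = [("PatientID", "123"), ("Modality", "MR"), ("EchoTime", "90"),
      ("StudyInstanceUID", "1.2.3"), ("SeriesInstanceUID", "1.2.3.4"),
      ("PhotometricInterpretation", "MONOCHROME2")]
filtered = filter_headers(kv, featureset=["Modality"])
```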
While extracting metadata using the PNG extraction module (CommonHeadersOnly: true), all the DICOM tags are collected and stored in CSV format using pandas.
The memory error occurs because, in some instances, DICOM images can have a larger number of tags than expected, which creates a sparse CSV file and results in a large data frame that cannot be saved using pandas.
The error can be fixed by transferring such DICOM files (which have a large number of tags) into a new folder inside the failed-dicom folder in the same module.
Currently, the processes are scheduled manually or the workflows are developed manually per each execution. This workflow module will enable chaining of processes:
One common workflow is, cold-extraction -> png-extraction, which will retrieve the images and then convert the DICOM images to PNG images and extract metadata.
Another workflow we use is for scanner utilization. It is, cold-extraction -> png-extraction -> scanner-util.
The aim is to reduce human-in-the-loop involvement in several repeated execution patterns we have.
Currently, Niffler png-extraction is containerized (#262). However, other modules are not containerized. What we propose is a global Niffler container from where different modules can be executed.
While this can be a challenging undertaking (considering the multiple modules of Niffler and dependencies on our sibling frameworks such as Eaglescope for the visualizations), an attempt at this will help us expand the user base and will also help us make the framework itself more compact.
For example, an email sender.
However, this will fail to send a notification if the VM halts.
Maybe then incorporate the email sender into the mdextractor.service.
The user errors are due to the user providing a wrong CSV or config.json for Niffler.
The error handling must be improved when that happens. Currently, the Python process hangs when it fails due to a wrong CSV or config.
Follow approach similar to #38.