salilab / IHMValidation

Validation software for integrative models deposited to PDB

License: MIT License

Python 0.36% HTML 99.62% CSS 0.02% JavaScript 0.01% Shell 0.01% Dockerfile 0.01%

IHMValidation's People

Contributors: aozalevsky, benmwebb, brindakv, saijananiganesan

IHMValidation's Issues

Missing restraint_types in _ihm_derived_distance_restraint in the PDBDEV_00000054

I believe that with the 1.10 update of the IHM dictionary, explicitly setting _ihm_derived_distance_restraint.restraint_type became mandatory, so the following record now produces an error:

loop_
_ihm_derived_distance_restraint.id
_ihm_derived_distance_restraint.group_id
_ihm_derived_distance_restraint.feature_id_1
_ihm_derived_distance_restraint.feature_id_2
_ihm_derived_distance_restraint.restraint_type
_ihm_derived_distance_restraint.dataset_list_id
1 1 1 2 . 2
2 1 1 3 . 2
/usr/local/lib/python3.8/dist-packages/ihm/reader.py in __call__(self, id, group_id, dataset_list_id, feature_id_1, feature_id_2, restraint_type, group_conditionality, probability, mic_value, distance_lower_limit, distance_upper_limit)
   2105         r.feature2 = self.sysr.features.get_by_id(feature_id_2)
   2106         print(restraint_type)
-> 2107         r.distance = _handle_distance[restraint_type](distance_lower_limit,
   2108                                                       distance_upper_limit,
   2109                                                       self.get_float)

KeyError: None

I guess it should be fixed in the entry? @brindakv

Type mismatches in _ihm_starting_comparative_models section in PDBDEV_00000059

According to the dictionary, the template_seq_id_begin and template_seq_id_end fields have to be integer values. However, they are set to the ? symbol:

#                                                                                                                                                                                                          
loop_                                                                                                                                                                                                      
_ihm_starting_comparative_models.id                                                                                                                                                                        
_ihm_starting_comparative_models.starting_model_id                                                                                                                                                         
_ihm_starting_comparative_models.starting_model_auth_asym_id                                                                                                                                               
_ihm_starting_comparative_models.starting_model_seq_id_begin                                                                                                                                               
_ihm_starting_comparative_models.starting_model_seq_id_end                                                                                                                                                 
_ihm_starting_comparative_models.template_auth_asym_id                                                                                                                                                     
_ihm_starting_comparative_models.template_seq_id_begin                                                                                                                                                     
_ihm_starting_comparative_models.template_seq_id_end                                                                                                                                                       
_ihm_starting_comparative_models.template_sequence_identity                                                                                                                                                
_ihm_starting_comparative_models.template_sequence_identity_denominator                                                                                                                                    
_ihm_starting_comparative_models.template_dataset_list_id                                                                                                                                                  
_ihm_starting_comparative_models.alignment_file_id                                                                                                                                                         
1 3 A 1 51 C ? ? ? ? 3 .                                                                                                                                                                                   
2 4 A 1 51 D ? ? ? ? 3 .                                                                                                                                                                                   
#                            

python-ihm follows the spec and fails with the error:

/usr/local/lib/python3.8/dist-packages/ihm/reader.py in __call__(self, starting_model_id, template_dataset_list_id, alignment_file_id, template_auth_asym_id, starting_model_seq_id_begin, starting_model_seq_id_end, template_seq_id_begin, template_seq_id_end, template_sequence_identity, template_sequence_identity_denominator)
   1538         seq_id_range = (int(starting_model_seq_id_begin),
   1539                         int(starting_model_seq_id_end))
-> 1540         template_seq_id_range = (int(template_seq_id_begin),
   1541                                  int(template_seq_id_end))
   1542         identity = ihm.startmodel.SequenceIdentity(

TypeError: int() argument must be a string, a bytes-like object or a number, not '__UnknownValue'

@brindakv what is the best course of action in this case?

Add more information on how to interpret the report

How should the report be interpreted? What are good and bad values? Add information to the user guide (wherever possible) about which values are good and which are bad.

Good places to start would be in validation_help.html, particularly in the "Model Quality Assessment" and "Fit to Data Used for Modeling Assessment" sections. For example, state for each score whether higher or lower values are "better".

Paths to js/css resources

Some problems in #23 were caused by inaccessible paths to static js/css resources. It looks like there are some duplications and version mismatches that require refactoring. Overall, it would be better to simplify the set of resources and sync it with layout.html.

E.g. about_validation.html and validation_help.html have a somewhat mixed set of links.

 <!-- add Javasscript file from js file -->

            <script type="text/javascript" src="js/jquery.min.js"></script>
            <script type="text/javascript" src="js/bootstrap.min.js"></script>
            <script type="text/javascript" src="js/main.js"></script>
            <script type="text/javascript" src="js/jquery-3.3.1.min.js"></script>
            <script type="text/javascript" src="js/popper1.12.9.min.js"></script>
            <script type="text/javascript" src=".js/bootstrap4.1.3.min.js"></script>
            <script type="text/javascript" src="js/bootstrap3-typeahead.min.js"></script>

Comments on the code so far

A mini-review is below. I'll try to comment on each class of issue only once.

Tests will rot if they're not run periodically. It's pretty straightforward to set them up to run on each push, using GitHub Actions. See for example https://github.com/ihmwg/python-ihm/blob/main/.github/workflows/testpy.yml

Docs can also be auto-built with readthedocs.io. See e.g. https://python-ihm.readthedocs.io/. I can set that up for you if you like.

Exclude __pycache__ with a .gitignore file in https://github.com/salilab/IHMValidation/tree/master/master/pyext/src/validation. There's no point in tracking .pyc files in source control.

You might consider renaming your main branch from 'master' to 'main'. The latter is quickly becoming the standard on GitHub.

Never ever use tabs in Python code. Tabs are evil. (Perhaps configure your editor to insert 4 spaces for each tab keypress.) Run your code through a PEP-8 formatter like autopep8, reindent.py, or black.


Bad practice to import multiple modules on one line (consider running flake8 over your code which will pick up a lot of issues like this; it can be quite educational).

def run_entry_composition(self,Template_Dict:dict)->dict:

Note that using type hints makes your code require a fairly recent version of Python 3. This may be OK. Function arguments should also usually be lowercase (Template_Dict -> template_dict).

Template_Dict['Data']=[i.upper() for i in list(set(self.I.get_dataset_comp()['Dataset type']).difference({'Experimental model','Comparative model'}))]

Sets should already be iterable; the list() here is unnecessary and perhaps inefficient.

filename = os.path.abspath(os.path.join(os.getcwd(), 'static/results/',str(Template_Dict['ID'])+'_temp_mp.txt'))

Using 'static/results/' here rather defeats the point of using os.path.join in the first place. Use 'static', 'results' instead. The os.path.abspath is unnecessary too, since you are not changing directory between when you construct the filename and when you use it to open the file.

global clashscore;global rama;global sidechain

Are you sure you need these variables to be global? You should always carefully check any global usage.


Never use bare except. Always explicitly list the exceptions you want to catch. Otherwise you may catch exceptions that should be fixed (e.g. a syntax error in the try block).

print ("Molprobity cannot be calculated...")

Consider using Python's logging module for these sorts of prints, so they can be turned off if desired.
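A minimal sketch of what that could look like (the message text is taken from the print above; the logger setup itself is an assumption):

import logging

logger = logging.getLogger(__name__)

# emit the same message through logging so callers can silence or redirect it
logger.warning("Molprobity cannot be calculated...")

# e.g. the CLI entry point could then control verbosity globally:
logging.basicConfig(level=logging.INFO)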

class get_input_information(object):

Normally classes are CamelCase, e.g. GetInputInformation.

self.system, = ihm.reader.read(open(self.mmcif_file),

Always better to open the file handle using with (a context manager) and put the read() inside the with body. This ensures the file is closed at the end of the scope.
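A minimal standalone sketch, assuming a single System per file (the filename is just an example):

import ihm.reader

mmcif_file = 'PDBDEV_00000001.cif'  # example input path
with open(mmcif_file, encoding='utf8') as fh:
    # the file handle is closed automatically at the end of the with block
    system, = ihm.reader.read(fh)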

def get_id_from_entry(self)->str:

This seems to be trying to handle the case where _entry.id and _struct.entry_id are different. This shouldn't happen (pretty sure the dictionary requires them to be the same) but if it does, you could ask the python-ihm developer to handle this ;)

aut=cit[0].authors
for ind in range(0,len(aut)):
    if ind==0:
        authors=str(aut[ind])
    else:
        authors+=';'+str(aut[ind])
return authors

This seems unnecessarily verbose. What about return "; ".join(aut for aut in cit[0].authors) ?

mol_name=entities.description

This doesn't look right. Are you sure you tested it? Shouldn't it be entities[0].description perhaps?

"""check resolution of structure,returns 0 if its atomic and 1 if the model is multires"""

Wouldn't True/False be more standard than 1/0 ?

assembly_id=map(int,self.get_assembly_ID_of_models())

It's unusual to see map in modern Python code; it's largely been replaced with comprehensions, e.g. assembly_id = [int(x) for x in self.get_assembly_ID_of_models()]

sampling_comp={'Step number':[], 'Protocol ID':[],'Method name':[],'Method type':[], \

Backslashes are unnecessary within brackets or parentheses.

RB=self.get_empty_chain_dict();RB_nos=[];all_nos=[];flex=self.get_empty_chain_dict()

Usually code is easier to read on multiple lines. Semicolons should generally be avoided.

chain=[]
for el in ass:
    chain.append(el._id)

chain = [el._id for el in ass] would be more concise.

unique=[used.append(x) for x in chain if x not in used]

I'm not sure what you're trying to do here. It's certainly unusual for the left side of a list comprehension to have side effects (used.append()). If it's something clever, add a comment to help the poor reader.
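If the intent is a list of unique chain IDs in order of first appearance, a minimal sketch (assuming that intent) would be:

chain = ['A', 'B', 'A', 'C']  # example input
unique = []
for x in chain:
    if x not in unique:
        unique.append(x)

# or, relying on dicts preserving insertion order (Python 3.7+):
unique = list(dict.fromkeys(chain))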

for _,el in enumerate(self.system.asym_units):

Why are you using enumerate if you're not using the count? Just use for el in self.system.asym_units: instead.


_ is usually used as a "I'm not using this value" placeholder, so this looks weird (and is also hard to read). Use a real variable name, e.g. software, instead.

if str(_.version) == '?':

There's a difference in mmCIF between ? and '?' which you're losing here. It would be more correct to say if _.version == ihm.unknown:

except AttributeError as error:

as error implies you're going to use the error object... and then you don't.


Huh? What's wrong with loc = 'Not listed' ?

if i.data_type =='unspecified':

"unspecified" isn't a valid value according to the IHM dictionary. Do you mean "Other" instead?

if 'CrossLink' in str(i.__class__.__name__):

This is not the right way to do this. Use isinstance instead.

if 'SAS' in str(data_type) and 'SAS' in str(database):
    return True
else:
    return False

return 'SAS' in str(data_type) and 'SAS' in str(database) would be more concise here.

self.nos=get_input_information.get_number_of_models(self)

You can just say self.nos = self.get_number_of_models() here unless you have overridden the method in this class but really want to still call the base class method (which would be confusing).

if linker=='DSS' and dist<=30:
    return 1
elif linker=='EDC' and dist<=20:
    return 1
elif linker=='EDC' and dist>20:
    return 0
elif dist<=30:
    return 1
else:
    return 0

Would perhaps be cleaner to use a dict here, e.g. something like

linkers = {'DSS': 30, 'EDC': 20}
return dist <= linkers.get(linker, 30)

That dict could go in a utility module somewhere so you can use it elsewhere, e.g. in cx_plots.py.

self.filename = os.path.join('Output/images//')

"Joining" one thing is weird.

model_spheres={i+1:[j.x,j.y,j.z,j.radius] for i,j in enumerate(spheres)}
model_spheres_df=pd.DataFrame(model_spheres, index=['X','Y','Z','R'])

You might use a lot of memory doing it this way (since you construct three copies of the coordinates - one in IHM, one in your model_spheres object, and then another in the DataFrame). Assuming you're using pandas 0.13 or later, you can avoid the intermediate model_spheres object by passing a generator to the DataFrame constructor instead of a dict.
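A minimal sketch of that approach, assuming spheres is the iterable of ihm sphere objects used above (note that spheres become rows here, i.e. the transpose of the original layout):

import pandas as pd

# build the DataFrame straight from a generator, skipping the intermediate dict
model_spheres_df = pd.DataFrame(
    ([s.x, s.y, s.z, s.radius] for s in spheres),
    columns=['X', 'Y', 'Z', 'R'])
# use model_spheres_df.T if the old column-per-sphere orientation is needed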

filename = open(os.path.join(os.getcwd(),self.resultpath,self.ID+'_temp_rama.txt'))

os.getcwd() isn't needed here. If you're trying to make the path absolute (although not needed here) os.path.abspath is the way to do it.

f_name_handle=open(f_name,'w+')
with f_name_handle as outfile:

This seems odd. with open(f_name,'w+') as outfile: would be more normal.

clashes_ordered=dict(sorted(clashes.items()))

Relying on dict ordering is fragile: plain dicts only guarantee to preserve insertion order from Python 3.7 onward, so sorting the items before building a plain dict only has the intended effect on recent Pythons.

if len(j.replace(',','').replace(':','').split()[0])>2:

Looks like you repeat j.replace(',','').replace(':','').split() a bunch of times here. Store it in a variable to make it more efficient and easier to read.
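For example, a minimal sketch (j is the line variable from the code above):

# clean and split the line once, then reuse the result
fields = j.replace(',', '').replace(':', '').split()
if len(fields[0]) > 2:
    ...  # same logic as before, now using `fields` everywhere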

dict1['Observed distance (&#8491)'].append(val)

Should add a comment for those of us who haven't memorized all of Unicode. I assume this is the Angstrom symbol (maybe the HTML &Aring; entity would be simpler).

print ("Error....unable to fetch data from SASBDB, please check the entry ID")

Wouldn't raise be more appropriate if this is an error?

for num,key in enumerate(list(data.keys())):

list() is unnecessary here.

list_sort=sorted(list_sub, key=lambda x: x[1])

Rather than lambda x: x[1] use operator.itemgetter(1).

if parameter_table['Estimated volume'] is None:
    parameter_table['Estimated volume'].append('N/A')

How could this ever work? None doesn't have an append method.

if len(sascifline)<3 and len(sascifline)>0 and '_sas_sample.specimen_concentration' in sascifline[0]:

Maybe more concise to say 0 < len(sascifline) < 3

if val_m[1].empty==False:

Would normally be written as if not val_m[1].empty: or perhaps if val_m[1].empty is False: if you also need to check it's not something non-boolean (e.g. None).

val=''
for el in tex:
    for subel in el:
        if subel==el[-1] and el==tex[-1]:
            val+=str(subel)+'. '
        elif subel==el[-1] and el!= tex[-1]:
            val+=str(subel)+', '
        else:
            val+=str(subel)+':'

Strings are immutable, so every time you say val += foo val is destroyed and replaced with a new, longer string object. Whenever you have val += foo inside a for loop, consider replacing with something like val = ''.join(something).
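A minimal join-based sketch of the loop above, assuming tex is non-empty (subelements joined by ':', elements by ', ', with a trailing '. '):

val = ', '.join(':'.join(str(subel) for subel in el) for el in tex) + '. '

This also avoids the by-value comparisons with el[-1] and tex[-1], which can misfire when elements repeat.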

def format_tupple(tex:list)->str:

"tupple" should be "tuple"

sublist=['%s: Chain %s (%d residues)' % (sub_dict['Subunit name'][i],sub_dict['Chain ID'][i],sub_dict['Total residues'][i]) for i in range(0,model_number)]

A first argument to range of 0 is redundant - that's the default.

new_restraints=dict()
for key,val in restraints.items():
    new_restraints[key]=list(set(val))

Seems a good candidate for a dict comprehension, e.g. new_restraints = {key: list(set(val)) for key,val in restraints.items()}


os.listdir('.') would be more concise.

Docker image

Create a Docker image with the right dependencies.

Add link to more detailed reports in "Overall quality" section

Gerado suggests:

Yes I would just add a sentence in the "Overall quality" section that mentions where to find the more detailed reports (i.e. the pdf and the submenus available further up on the page).

Personally I scrolled down the page, saw the "This validation report contains model quality assessments for the structure" sentence and assumed that what followed was the whole report...

Optimize static assets: logon.png

PDB-Dev logo: images/logon.png has a size of 1.1 MB, which is about the same as the whole report for some entries. Optimization with optipng reduces the size by only 8%. A JPEG version is ~400 KB.

If migrating to JPEG, the following files have to be modified:

about_validation.html:                            <img src="images/logon.png" class="float-left" alt="PDBDEV.org" height="100" width="110" style="margin-top: 0px; margin-bottom: 0px" />
templates/layout.html:                            <img src="../../../images/logon.png" class="float-left" alt="PDBDEV.org" height="100" width="110" style="margin-top: 0px; margin-bottom: 0px" />
templates/notformodeling.html:                            <img src="../../static/webimages/logon.png" class="float-left" alt="PDBDEV.org" height="100"  width="110"  style="margin-top: 10px; margin-bottom: 10px" />
templates/introduction.html:                            <img src="../../static/webimages/logon.png" class="float-left" alt="PDBDEV.org" height="100" width="110" style="margin-top: 10px; margin-bottom: 10px" />
validation_help.html:                            <img src="images/logon.png" class="float-left" alt="PDBDEV.org" height="100" width="110" style="margin-top: 0px; margin-bottom: 0px" />

Also, judging from the current code, the file size can be reduced further by shrinking the image dimensions.

Add percentages to all outlier reports

Currently, only the raw numbers are reported for any outliers (clashes, Ramachandran and standard geometry outliers from MolProbity). These cannot be used as an indicator of model quality, since they scale with the size of the model. Report the percentage as well to make this easier to interpret.

(Another suggestion would be to report outliers per 100 residues. A simple percentage is easier to calculate however and reports roughly the same thing. "Number of residues" is a slightly tricky quantity for integrative models, for example if there are parts of the model that are not atomic, or that don't report outliers, for example ligands.)

ihm fails on PDBDEV_00000088

There are multiple points in entry 88 where python-ihm fails during parsing:

Traceback (most recent call last):                                                                    
  File "/IHMValidation/example/../master/pyext/src/validation/__init__.py", line 74, in __init__
    self.system, = ihm.reader.read(fh, model_class=self.model)                                        
  File "/root/miniforge/lib/python3.9/site-packages/ihm/reader.py", line 3298, in read       
    more_data = r.read_file()                                                                         
  File "/root/miniforge/lib/python3.9/site-packages/ihm/format.py", line 594, in read_file
    return self._read_file_c()                                                                        
  File "/root/miniforge/lib/python3.9/site-packages/ihm/format.py", line 645, in _read_file_c  
    eof, more_data = _format.ihm_read_file(self._c_format)                                            
  File "/root/miniforge/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)                                
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 470153: invalid start byte

is the result of the sentence:

Typically, 14<B7>106 to 20<B7>106 photons were recorded at TAC channel-width of 14.1\xa0ps (IBH-5000U) or 8\xa0ps (EasyTau300).

The other error:

  File "/root/miniforge/lib/python3.9/codecs.py", line 322, in decode                                                                                                                                       
    (result, consumed) = self._buffer_decode(data, self.errors, final)                                
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 471889: invalid start byte  

originates from:

Sample conditions for the EPR experiments were 100 <B5>M protein in 100 mM NaCl, 50 mM Tris-HCl, 5 mM MgCl2, pH 7.4 dissolved in D2O with 12.5 % (v/v) glycerol-d8.

And finally, after deleting symbols causing previous errors:

Traceback (most recent call last):
  File "/root/miniforge/lib/python3.9/site-packages/ihm/format.py", line 645, in _read_file_c
    eof, more_data = _format.ihm_read_file(self._c_format)
_format.FileFormatError: Wrong number of data values in loop (should be an exact multiple of the number of keys) at line 1940098

@benmwebb @brindakv I need your help on that.

Refactor utility.dict_to_JSlist

This is a continuation of #38.

Looks like utility.dict_to_JSlist is heavily used throughout the code and performs a lot of iterations and list comprehensions.

if bool(d) and len(list(d.keys())) > 0:
    # add headers for table, which are the keys of the dict
    output_list.append(list(d.keys()))
    # add each row of the table as a list
    target = list(d.values())
    for ind in range(len(target[0])):
        sublist = []
        for el in target:
            el = ['_' if str(i) == '?' else str(i) for i in el]
            sublist.append(str(el[ind]))
        output_list.append(sublist)
return output_list

Although each list comprehension is individually efficient, they are used so heavily here that they cause a sizable delay.
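A minimal sketch of a possible refactor; the behaviour (header row first, then one row per index, with '?' replaced by '_') is assumed from the snippet above, and columns are assumed to be equal-length lists:

def dict_to_JSlist(d: dict) -> list:
    output_list = []
    if d:
        # header row: the dict keys
        output_list.append(list(d.keys()))
        # transpose the column lists into rows in a single pass
        for row in zip(*d.values()):
            output_list.append(['_' if str(i) == '?' else str(i) for i in row])
    return output_list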

Redundant paths definitions

Paths to tools from the MolProbity and ATSAS suites are explicitly defined through per-tool environment variables. I think this is redundant, since there is a generic and OS-independent alternative: relying on the PATH environment variable.

ATSAS="" 
Molprobity_ramalyze=""
Molprobity_molprobity=""
Molprobity_clashscore=""
Molprobity_rotalyze=""
wkhtmltopdf=""

run([config('Molprobity_ramalyze'), self.mmcif_file], stdout=outfile)

run([config('Molprobity_molprobity'), self.mmcif_file,

run([config('Molprobity_clashscore'), self.mmcif_file], stdout=outfile)

run([config('Molprobity_rotalyze'), self.mmcif_file], stdout=outfile)

Moreover, in the case of ATSAS it's a little misleading, since only datcmp is used, not the whole ATSAS package.

run([config('ATSAS'), 'fit1.csv',
     'fit2.csv'], stdout=outfile, shell=False)
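A minimal sketch of the PATH-based alternative; the executable names ('molprobity.clashscore', 'datcmp') are assumptions, not values taken from the current configuration:

import shutil

# look the tools up on PATH instead of per-tool environment variables
clashscore = shutil.which('molprobity.clashscore')
datcmp = shutil.which('datcmp')
if clashscore is None or datcmp is None:
    raise RuntimeError('MolProbity/ATSAS tools were not found on PATH')

# run([clashscore, mmcif_file], stdout=outfile)
# run([datcmp, 'fit1.csv', 'fit2.csv'], stdout=outfile)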

Sync text between entries with and without SAS data

Currently, entries without SAS (or any other additional data) show:

Data quality and fit to model assessments for other datasets and model uncertainty are under development.

while entries with SAS have:

Data quality assessment for SAS datasets and fit to model assessments for SAS datasets is also included in this assessment. Data quality and fit to model assessments for other datasets and model uncertainty are under development.

We should sync/rework text to explicitly show users what types of data there are and what types already have validation implemented.

Hardcoded paths prevent parallel execution

It would be nice to be able to update PDB-Dev in parallel. Together with the recent updates in #38, this would allow rebuilding the whole repo (with recalculated values) in under 2 minutes on a modern 32-128 core node.

So far I identified several places which interfere with parallel execution:

if os.path.isfile('test.cif'):
    os.remove('test.cif')
file_re = open('test.cif', 'w')

uses a hardcoded test.cif as a temporary filename
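A minimal sketch of one way around this, using the standard tempfile module (cleaned_text is a hypothetical variable standing in for whatever is currently written to test.cif):

import tempfile

cleaned_text = ''  # hypothetical content that currently goes into test.cif
with tempfile.NamedTemporaryFile(mode='w', suffix='.cif', delete=False) as file_re:
    file_re.write(cleaned_text)
temp_cif = file_re.name  # pass this unique path downstream; remove it when done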

def clean_all():
    '''
    delete all generated files
    '''
    # dirname_ed = os.getcwd()
    os.listdir('.')
    for item in os.listdir('.'):
        if item.endswith('.txt'):
            os.remove(item)
        if item.endswith('.csv'):
            os.remove(item)
        if item.endswith('.json'):
            os.remove(item)
        if item.endswith('.sascif'):
            os.remove(item)

removes any temp files matching these patterns, including temp files generated for other structures. It specifically hits SASCIF processing (other files do not appear to be reread, at least once MolProbity and excluded-volume results have already been calculated).

with open(code+'.json', 'w') as f:

with open(code+'.sascif', 'w') as f:

fname = key+str(fitnum)+'fit.csv'
with open(fname, 'w') as f:
    f.write(fit.text)

fit_1.to_csv('fit1.csv', header=False, index=False)

fit_2.to_csv('fit2.csv', header=False, index=False)
f1 = open('pval.txt', 'w+')

These are temporary files for SAS processing.

Don't emit "None" in output HTML

There are several places where "None" is written into the output HTML. One example is at https://pdb-dev-beta.wwpdb.org/Validation/PDBDEV_00000009/htmls/data_quality.html where the Dmax error is reported to be None nm. This is likely because the Python None value is being used as-is. "None nm" is obviously nonsensical; this should be reported instead as "0 nm" if there really is no error or, in the much more likely case that the error could not be calculated for some reason, that reason should be stated to the user.
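A minimal sketch of the kind of guard that could be added wherever such values are formatted; the key name and units follow the Dmax example above, and the lookup itself is hypothetical:

sas_metrics = {'Dmax error': None}  # hypothetical dict of SAS-derived values
dmax_error = sas_metrics.get('Dmax error')
if dmax_error is None:
    dmax_error_text = 'could not be estimated from the deposited data'
else:
    dmax_error_text = '%s nm' % dmax_error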

Create milestones

We need to set up a set of milestones to prioritize issues. Especially for the initial SAS release.

Set bounds on bokeh plot ranges

Looks like none of the bokeh plots currently have bounds set on their x/y ranges. This allows them to scroll away from the data or even in some cases results in odd-looking initial plots. For example see the excluded volume plot https://pdb-dev-beta.wwpdb.org/Validation/PDBDEV_00000012/htmls/main.html. There will never be a negative number of violations so it should not be possible to scroll the x range that way. This can be done with something like

p = bokeh.plotting.figure(..., x_range=Range1d(0, xmax, bounds=(0, None)))

Some pages in PDF reports are cut off

Looks like there is a problem with the width settings of some sections. The beginning of the report looks OK; problems start from the Data quality section and continue to the very end.

(Screenshot from 2022-05-09 showing cut-off pages in the PDF report.)

The problem seems to have been there since the beginning; at least it was already present in Ben's update, around the time of commit 9591c5a.

Add more test cases

Add more tests to the tests subdirectory to ensure that things that are fixed stay fixed. Add code coverage with codecov so that we can see where we're still lacking tests. Add these to GitHub Actions so that commits and pull requests are checked for breakage.

Misleading message about the number of bond outliers

Even if no outliers were detected (which means that everything is ok), the following message is printed:

Standard geometry: bond outliers[?]
Bond length outliers can not be evaluated for this model

Also incorrect formatting for the number of angle outliers:

Standard geometry: angle outliers[?]
There are 628 angle outliers in this entry (62800.0% of all angles). A summary is provided below, and a detailed list of outliers can be found

To be fixed here

Entries for testing: 9, 55, 141

After the fix all reports have to be updated.

Add timezone into PDF report

It is always better to state the timezone explicitly, especially when the service targets an international community:

(Screenshot from 2022-03-18 showing the report timestamp without a timezone.)
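A minimal sketch of a timezone-aware timestamp (UTC is chosen here purely as an example):

from datetime import datetime, timezone

timestamp = datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M %Z')
# e.g. '2022-03-18 21:33 UTC'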

Replace JavaScript with Jinja2

The pipeline

  1. extracts information from MolProbity/ATSAS into a Python dict (called Template_Dict in many parts of the code)
  2. uses Jinja2 to substitute this dict into JavaScript in the output HTML
  3. at runtime, relies on the user's browser to execute that JavaScript to fill in the page content, e.g. generating tables on the fly or disabling parts of the page that are not relevant.

Since the output HTML is static, step 3 is redundant. Jinja2 logic can be used instead to generate the final HTML directly in step 2. This would result in much less bulky HTML, and any errors would be detected at build time, rather than at runtime.
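A minimal sketch of the idea; the table key and data are made up for illustration, not taken from the actual Template_Dict:

import jinja2

template = jinja2.Template(
    '<table>\n'
    '{% for row in rows %}<tr>'
    '{% for cell in row %}<td>{{ cell }}</td>{% endfor %}'
    '</tr>\n{% endfor %}</table>')
Template_Dict = {'Subunits': [['A', 51], ['B', 106]]}  # example data
html = template.render(rows=Template_Dict['Subunits'])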

Parsing of PDBDEV_00000013 fails

Full trace:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_2655681/2431891216.py in <module>
      1 fname = '/home/domain/data/silwer/pdb_dev/IHMValidation_aozalevsky/example/PDBDEV_00000013.cif'
      2 with open(fname, encoding='utf8') as f:
----> 3     m, = ihm.reader.read(f, model_class=ihm.model.Model)

/usr/local/lib/python3.8/dist-packages/ihm/reader.py in read(fh, model_class, format, handlers, warn_unknown_category, warn_unknown_keyword, read_starting_model_coord, starting_model_class, reject_old_file, variant)
   3296             ukhandler.add_category_handlers(hs)
   3297         r.category_handler = dict((h.category, h) for h in hs)
-> 3298         more_data = r.read_file()
   3299         for h in hs:
   3300             h.finalize()

/usr/local/lib/python3.8/dist-packages/ihm/format.py in read_file(self)
    587 
    588            :exc:`CifParserError` will be raised if the file cannot be parsed.
--> 589 
    590            :return: True iff more data blocks are available to be read.
    591         """

/usr/local/lib/python3.8/dist-packages/ihm/format.py in _read_file_c(self)
    638         if self.unknown_category_handler is not None:
    639             _format.add_unknown_category_handler(self._c_format,
--> 640                                                  self.unknown_category_handler)
    641         if self.unknown_keyword_handler is not None:
    642             _format.add_unknown_keyword_handler(self._c_format,

/usr/local/lib/python3.8/dist-packages/ihm/reader.py in __call__(self, starting_model_id, asym_id, entity_poly_segment_id, dataset_list_id, starting_model_auth_asym_id, starting_model_sequence_offset, description)
   1500                  starting_model_sequence_offset, description):
   1501         m = self.sysr.starting_models.get_by_id(starting_model_id)
-> 1502         asym = self.sysr.ranges.get(
   1503             self.sysr.asym_units.get_by_id(asym_id), entity_poly_segment_id)
   1504         m.asym_unit = asym

/usr/local/lib/python3.8/dist-packages/ihm/reader.py in get(self, asym_or_entity, range_id)
    190             return asym_or_entity
    191         else:
--> 192             return asym_or_entity(*self._id_map[range_id])
    193 
    194 

KeyError: '1'

I narrowed down the issue to the order of two sections. The code fails on

 1409 loop_                                                                                                                                                                                                
 1410 _ihm_starting_model_details.starting_model_id                                                                                                                                                        
 1411 _ihm_starting_model_details.entity_id                                                                                                                                                                
 1412 _ihm_starting_model_details.entity_description                                                                                                                                                       
 1413 _ihm_starting_model_details.asym_id                                                                                                                                                                  
 1414 _ihm_starting_model_details.entity_poly_segment_id                                                                                                                                                   
 1415 _ihm_starting_model_details.starting_model_source                                                                                                                                                    
 1416 _ihm_starting_model_details.starting_model_auth_asym_id                                                                                                                                              
 1417 _ihm_starting_model_details.starting_model_sequence_offset                                                                                                                                           
 1418 _ihm_starting_model_details.dataset_list_id                                                                                                                                                          
 1419     1  1  CYP199A2    A    1   'experimental model'  A  -13  1                                                                                                                                       
 1420     2  2  HaPux       B    2   'experimental model'  A    0  2   

because the actual _ihm_entity_poly_segment records are defined ~40 lines below:

 1455 loop_                                                                                                                                                                                                
 1456 _ihm_entity_poly_segment.id                                                                                                                                                                          
 1457 _ihm_entity_poly_segment.entity_id                                                                                                                                                                   
 1458 _ihm_entity_poly_segment.seq_id_begin                                                                                                                                                                
 1459 _ihm_entity_poly_segment.seq_id_end                                                                                                                                                                  
 1460 _ihm_entity_poly_segment.comp_id_begin                                                                                                                                                               
 1461 _ihm_entity_poly_segment.comp_id_end                                                                                                                                                                 
 1462 1 1 1 399 SER ALA                                                                                                                                                                                    
 1463 2 2 1 106 PRO THR     

If I swap them with each other, parsing continues. Indeed, according to the schema, the _ihm_entity_poly_segment table should come first. @benmwebb can you check my analysis?

Reduce report generation time

Report generation (with precalculated data) typically takes anything from several minutes up to several hours. That looks unrealistic for a simple rendering task, so there must be some bottlenecks.

Below is a sample profiling log for PDBDEV_00000004:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   1762/1    0.020    0.000  103.081  103.081 {built-in method builtins.exec}
        1    0.005    0.005  103.081  103.081 Execute.py:7(<module>)
        2    0.000    0.000   51.620   25.810 api.py:30(from_file)
        2    0.000    0.000   51.602   25.801 pdfkit.py:160(to_pdf)
        4    0.000    0.000   51.590   12.898 subprocess.py:1090(communicate)
        2    0.000    0.000   51.589   25.794 subprocess.py:1926(_communicate)
      195   51.588    0.265   51.588    0.265 {method 'poll' of 'select.poll' objects}
        2    0.000    0.000   51.587   25.794 selectors.py:403(select)
        1    0.000    0.000   51.281   51.281 Execute.py:157(write_pdf)
        1    0.003    0.003   23.990   23.990 Report.py:117(run_model_quality)
       20    0.310    0.015   23.882    1.194 utility.py:16(dict_to_JSlist)
    31841   23.570    0.001   23.570    0.001 utility.py:40(<listcomp>)
      220    0.003    0.000   18.939    0.086 connectionpool.py:518(urlopen)
      220    0.002    0.000   18.899    0.086 connectionpool.py:357(_make_request)
        1    0.000    0.000   17.385   17.385 Report.py:348(run_sas_validation_plots)
      429    0.001    0.000   14.035    0.033 socket.py:690(readinto)
      220    0.001    0.000   13.262    0.060 client.py:1327(getresponse)
      220    0.001    0.000   13.259    0.060 client.py:312(begin)
      220    0.001    0.000   13.235    0.060 client.py:279(_read_status)
     1457    0.001    0.000   13.234    0.009 {method 'readline' of '_io.BufferedReader' objects}
      193    0.001    0.000   10.929    0.057 webdriver.py:404(execute)
      193    0.001    0.000   10.925    0.057 remote_connection.py:402(execute)
      193    0.002    0.000   10.922    0.057 remote_connection.py:423(_request)
      193    0.000    0.000   10.903    0.056 request.py:58(request)
      193    0.001    0.000   10.902    0.056 poolmanager.py:352(urlopen)
      242   10.803    0.045   10.803    0.045 {method 'recv_into' of '_socket.socket' objects}
      181    0.000    0.000   10.062    0.056 request.py:98(request_encode_body)

After a brief analysis of the calls and code I identified several bottlenecks:

  1. wkhtmltopdf calls:
        2    0.000    0.000   51.602   25.801 pdfkit.py:160(to_pdf)
        4    0.000    0.000   51.590   12.898 subprocess.py:1090(communicate)
        2    0.000    0.000   51.589   25.794 subprocess.py:1926(_communicate)
  2. utility.dict_to_JSlist:
       20    0.310    0.015   23.882    1.194 utility.py:16(dict_to_JSlist)
  3. various GET requests:
      220    0.003    0.000   18.939    0.086 connectionpool.py:518(urlopen)
      220    0.002    0.000   18.899    0.086 connectionpool.py:357(_make_request)

Let this issue be an umbrella issue. I'll open separate issues for individual bottlenecks.

Cache ATSAS outputs

example/Execute.py can take a very long time to run for some PDB-Dev entries. This is likely because it has to recalculate all the various SAS plots. This makes regenerating entries to fix minor typos rather time consuming. Consider caching the outputs of running ATSAS, perhaps in the Validation/results directory, in the same way that MolProbity outputs are cached. Care should be taken though to clear or invalidate the cache if part of the SAS pipeline itself changes.

Don't duplicate HTML templates

There is a great deal of duplication in the HTML templates in the templates directory. This means that changes need to be made in multiple locations and things can get out of sync. More use of Jinja2 blocks and macros and "extends" should be made to reduce this, following on from afb8b3f.

Improve handling of restraints in the summary table

Function get_restraints_info has to be refactored to:

  1. avoid suboptimal formatting:

    elif isinstance(i, ihm.restraint.PredictedContactRestraint):
        restraints_comp['Restraint info'].append('Distance: '+str(i.distance.distance)
        + ' between residues ' +
        str(i.resatom1.seq_id)
        + ' and ' + str(i.resatom2.seq_id))

  2. update the if-tree to support the current IHM specs; for instance, ihm.restraint.PredictedContactRestraint can have multiple types ('lower bound', 'upper bound', 'lower upper bound').

Fix parsing of clash scores for PDB-Dev 62, 63

Execute.py fails for PDB-Dev entries 62 and 63 with

Traceback (most recent call last):
  File "/IHMValidation/example/Execute.py", line 208, in <module>
    template_dict, molprobity_dict, exv_data = report.run_model_quality(
  File "/IHMValidation/example/../master/pyext/src/validation/Report.py", line 240, in run_model_quality
    clashscores, Template_Dict['tot'] = I_mp.clash_summary_table(
  File "/IHMValidation/example/../master/pyext/src/validation/molprobity.py", line 574, in clash_summary_table
    dict1 = self.orderclashdict(dict1)
  File "/IHMValidation/example/../master/pyext/src/validation/molprobity.py", line 584, in orderclashdict
    df = pd.DataFrame(modeldict)
  File "/root/miniforge/lib/python3.9/site-packages/pandas/core/frame.py", line 636, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/root/miniforge/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 502, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "/root/miniforge/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 120, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/root/miniforge/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 674, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length

Looks like for some reason the code is only finding clash scores for 20 models even though the PDB-Dev entry has 25. Either the parsing of the MolProbity output is deficient here, or there really are no clashes for some of the models (in which case empty lists likely need to be returned so that everything works).

SAS processing is unaware of measuring units

It looks like the code is hardcoded for 1/A units and thus fails on files with 1/nm units (related to #53).

There are multiple places with a hardcoded 1/A-to-1/nm conversion:

I_df['Q'] = I_df['Q']*10

I_df['Q'] = I_df['Q']*10

pdf_re['Q'] = pdf_re['Q']*10

G_df_range['Q'] = G_df['Q']*10

G_df_range['Q2A'] = G_df_range['Q2']*100

The information about units is stored in the sascif file:

SASDC29

_sas_scan.unit                      1/A

SASDDD6

_sas_scan.unit                      1/nm
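A minimal sketch of a unit-aware conversion; scan_unit is assumed to hold the _sas_scan.unit value read from the SASCIF file, and the target unit of 1/nm is inferred from the existing *10 factors:

def q_to_inverse_nm(q_values, scan_unit):
    # convert momentum-transfer values to 1/nm based on the declared unit
    if scan_unit == '1/A':
        return q_values * 10
    if scan_unit == '1/nm':
        return q_values
    raise ValueError('Unexpected _sas_scan.unit: %r' % scan_unit)

# I_df['Q'] = q_to_inverse_nm(I_df['Q'], scan_unit)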

Validation pipeline fails on PDB-Dev 8 and 66

Parsing of the PDB-Dev entries 8 and 66 fails with the following trace:

Traceback (most recent call last):
  File "/IHMValidation/example/Execute.py", line 203, in <module>
    report = WriteReport(args.f)
  File "/IHMValidation/example/../master/pyext/src/validation/Report.py", line 26, in __init__
    self.input = GetInputInformation(self.mmcif_file)
  File "/IHMValidation/example/../master/pyext/src/validation/__init__.py", line 32, in __init__
    self.system, = ihm.reader.read(fh, model_class=self.model)
  File "/root/miniforge/lib/python3.9/site-packages/ihm/reader.py", line 3260, in read
    more_data = r.read_file()
  File "/root/miniforge/lib/python3.9/site-packages/ihm/format.py", line 589, in read_file
    return self._read_file_c()
  File "/root/miniforge/lib/python3.9/site-packages/ihm/format.py", line 640, in _read_file_c
    eof, more_data = _format.ihm_read_file(self._c_format)
  File "/root/miniforge/lib/python3.9/site-packages/ihm/reader.py", line 1247, in __call__
    a.append(self.sysr.ranges.get(obj, entity_poly_segment_id))
  File "/root/miniforge/lib/python3.9/site-packages/ihm/reader.py", line 191, in get
    return asym_or_entity(*self._id_map[range_id])
  File "/root/miniforge/lib/python3.9/site-packages/ihm/__init__.py", line 1289, in __call__
    return AsymUnitRange(self, seq_id_begin, seq_id_end)
  File "/root/miniforge/lib/python3.9/site-packages/ihm/__init__.py", line 1198, in __init__
    raise TypeError("Can only create ranges for polymeric entities")
TypeError: Can only create ranges for polymeric entities

The parsing code seems to be quite generic:

self.model = ihm.model.Model
try:
    with open(self.mmcif_file, encoding='utf8') as fh:
        self.system, = ihm.reader.read(fh, model_class=self.model)
except UnicodeDecodeError:
    with open(self.mmcif_file, encoding='ascii', errors='ignore') as fh:
        self.system, = ihm.reader.read(fh, model_class=self.model)

So I presume the problem is indeed in the cif files. PDB-Dev 8 has this in the header:

#
loop_
_entity.id
_entity.type
_entity.src_method
_entity.pdbx_description
_entity.formula_weight
_entity.pdbx_number_of_molecules
_entity.details
1 polymer man chr2L_60-161 ? 1 ?
#
<...>
loop_
_struct_asym.id
_struct_asym.entity_id
_struct_asym.details
A 1 chr2L_60-161
#

Which, I guess, causes the failure at the asym.entity check in python-ihm: https://github.com/ihmwg/python-ihm/blob/0989b68412c01359e9f51aaf8413325532306737/ihm/__init__.py#L1196-L1201

@benmwebb I guess I need your advice on this: is this an actual artifact in the cif, or should it be handled in the code?

Incorrect model number detection for excluded volume data

In

exv_data = {
    'Models': line[0], 'Excluded Volume Satisfaction (%)':
    line[1], 'Number of violations': line[2]}
Template_Dict['NumModels'] = len(exv_data)

and

exv_data = I_ev.run_exc_vol_parallel(model_dict)
Template_Dict['NumModels'] = len(exv_data)

the number of keys in the exv_data dict is returned instead of the actual number of models.
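A minimal sketch of the fix, assuming exv_data['Models'] is the per-model list built above:

# count the models themselves rather than the dict's keys
Template_Dict['NumModels'] = len(exv_data['Models'])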

Potential issue with webdriver

The current HTML export code fails if the path to firefox/geckodriver does not point directly to the binary executable. This is exactly the case for the Conda installation used in the Docker recipe.

The issue was reported here: bokeh/bokeh#10108
As a workaround, the path to conda firefox executable can be hardcoded like this:
export PATH=/root/miniforge/bin/FirefoxApp:${PATH}

I'll update docker and singularity recipes later.
