Processing of the Enamine, DrugBank, PubChem, and ZINC15 datasets in three steps:
1. Convert to canonical SMILES
2. Remove redundant entries (duplicate canonical SMILES)
3. Generate descriptors
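The three steps above can be sketched in Python as follows. This is a minimal sketch under assumptions, not the project's actual code. The README does not name a toolkit, so RDKit (Chem.MolFromSmiles, Chem.MolToSmiles, Descriptors.descList) is assumed for canonicalization and descriptor generation. The function names rdkit_canonicalize, remove_redundancy, and rdkit_descriptors are hypothetical. Redundancy removal here keeps the first molecule ID seen for each canonical SMILES and drops unparsable entries.

```python
"""Hypothetical sketch of the three processing steps (RDKit is an assumption)."""
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A canonicalizer maps a SMILES string to its canonical form, or None if it cannot be parsed.
Canonicalizer = Callable[[str], Optional[str]]


def rdkit_canonicalize(smiles: str) -> Optional[str]:
    """Step 1 (assumed RDKit): return canonical SMILES, or None if unparsable."""
    from rdkit import Chem  # imported lazily so the dedup logic runs without RDKit

    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def remove_redundancy(
    records: Iterable[Tuple[str, str]], canonicalize: Canonicalizer
) -> List[Tuple[str, str]]:
    """Step 2: keep the first (canonical_smiles, mol_id) pair per canonical SMILES.

    `records` yields (smiles, mol_id) pairs; entries that fail to canonicalize are dropped.
    """
    seen = set()
    unique: List[Tuple[str, str]] = []
    for smiles, mol_id in records:
        can = canonicalize(smiles)
        if can is None or can in seen:
            continue
        seen.add(can)
        unique.append((can, mol_id))
    return unique


def rdkit_descriptors(smiles: str) -> Dict[str, float]:
    """Step 3 (assumed RDKit): compute every descriptor listed in Descriptors.descList."""
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"cannot parse SMILES: {smiles!r}")
    return {name: fn(mol) for name, fn in Descriptors.descList}


# Example (assumes RDKit is installed and `records` yields (smiles, mol_id) pairs):
# unique = remove_redundancy(records, rdkit_canonicalize)
# table = {mol_id: rdkit_descriptors(smi) for smi, mol_id in unique}
```

Canonicalizing before deduplicating matters because one molecule can be written as several SMILES strings (for example CCO and OCC both describe ethanol); comparing the raw strings would miss such duplicates.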
Raw file paths:
- Enamine & DrugBank: ena+db.can file in https://app.box.com/folder/105434507153
- PubChem: smiles.pubchem.txt.gz file in /vol/ml/candle_aesp/databases/PubChem
- ZINC15: .smi files in /vol/ml/candle_aesp/databases/ZINC15
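Reading these raw files can be sketched with the Python standard library alone. The file layouts are assumptions, since they are not described here: each line is taken to hold a SMILES string as its first whitespace-separated field, optionally followed by a molecule ID, and a header line whose first field is "smiles" (as ZINC15 .smi downloads typically have) is skipped. Whether ena+db.can follows this layout is unverified. The helper names open_raw and parse_smiles_lines are hypothetical.

```python
"""Hypothetical readers for the raw SMILES files (layout is an assumption)."""
import gzip
from typing import IO, Iterable, Iterator, Tuple


def open_raw(path: str) -> IO[str]:
    """Open a raw file as text, decompressing *.gz files such as smiles.pubchem.txt.gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def parse_smiles_lines(lines: Iterable[str], source: str) -> Iterator[Tuple[str, str]]:
    """Yield (smiles, mol_id) pairs from lines of an assumed 'SMILES [ID]' layout.

    Blank lines and a header whose first field is 'smiles' are skipped.
    Lines without an ID get a synthetic '<source>_<line number>' ID.
    """
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0].lower() == "smiles":
            continue
        mol_id = fields[1] if len(fields) > 1 else f"{source}_{lineno}"
        yield fields[0], mol_id


# Example (assumed layout):
# with open_raw("/vol/ml/candle_aesp/databases/PubChem/smiles.pubchem.txt.gz") as fh:
#     for smiles, mol_id in parse_smiles_lines(fh, "pubchem"):
#         ...
```

Generating synthetic IDs for ID-less lines keeps every record traceable back to its source file and line after deduplication.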