DMR Identification Tool

HOME

HOME (histogram of methylation) is a Python package for differentially methylated region (DMR) identification. The method uses histogram-of-methylation features and a linear Support Vector Machine (SVM) to identify DMRs from whole genome bisulfite sequencing (WGBS) data. HOME can identify both pairwise and time series DMRs, with or without replicates.

Installation

HOME is written for Python 2.7 and tested on Linux. It is recommended to set up a virtual environment for Python 2.7 before installing the HOME package.

Step 1: Create a virtual environment for HOME

virtualenv -p <path_to_python2.7> <env_name>

NOTE : the above command assumes that the virtualenv tool is already installed on your system.

Step 2: Activate the virtual environment

source <env_name>/bin/activate

The name of the activated environment will appear on the left of the prompt.

Step 3: Install the HOME package. Run the command from a location outside the virtual environment directory, but make sure the virtual environment is still active

pip install git+https://github.com/ListerLab/HOME.git

or

git clone https://github.com/ListerLab/HOME.git
cd ./HOME
pip install -r requirements.txt
python setup.py install

For conda users:

git clone https://github.com/ListerLab/HOME.git
cd ./HOME
conda env create    # assuming the conda environment is activated and R is already installed in it
source activate HOMEenv
python setup.py install

Usage

HOME can be run in pairwise mode for two-group comparisons and in time series mode for comparisons of more than two groups. It can also be used for multiple pairwise comparisons with a large number of input samples.

A BSSeeker2 CGmap file can be provided directly as input. Otherwise, the input file must be in the format described below:

Each line gives the chromosome, position, strand, type (CG/CHG/CHH, where H is anything but G), number of methylated reads, and total number of reads. For each sample, this information is saved in a single tab-separated text file without a header, which can be compressed or uncompressed. An example of such a file:

chr1	15814	+	CG	12	14
chr1	15815	-	CG	15	21
chr1	15816	-	CHG	1	9
chr1	15821	-	CHH	7	22
chr1	15823	-	CHH	0	2
chr1	15825	-	CHH	11	19
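
A minimal sketch, not part of HOME, of how one might sanity-check records in this six-column format before a run. The function name `check_methylation_lines` is hypothetical, and the individual checks (allowed strand and context values, methylated reads not exceeding total reads) are assumptions drawn from the description above rather than rules taken from the HOME source.

```python
ALLOWED_CONTEXTS = {"CG", "CHG", "CHH"}


def check_methylation_lines(lines):
    """Parse tab-separated records; raise ValueError on the first bad one."""
    records = []
    for n, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            raise ValueError("line %d: expected 6 fields, got %d" % (n, len(fields)))
        chrom, pos, strand, context, meth, total = fields
        if strand not in ("+", "-"):
            raise ValueError("line %d: unexpected strand %r" % (n, strand))
        if context not in ALLOWED_CONTEXTS:
            raise ValueError("line %d: unexpected context %r" % (n, context))
        pos, meth, total = int(pos), int(meth), int(total)
        if meth > total:
            raise ValueError("line %d: methylated reads exceed total reads" % n)
        records.append((chrom, pos, strand, context, meth, total))
    return records


# Two records copied from the example above.
example = [
    "chr1\t15814\t+\tCG\t12\t14",
    "chr1\t15816\t-\tCHG\t1\t9",
]
records = check_methylation_lines(example)
```

A check like this would also flag space-separated files, since splitting on tabs then yields a single field per line.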

DMR detection for two group comparison: Pairwise

HOME-pairwise 	-t [CG/CHG/CHH/CHN/CNN]	 -i [sample_file_fullpath] 	-o [output_directorypath]

Note: Check the number of cores available and set the number to use with the -npp parameter (default is 8). For non-CG DMR prediction on very large genomes, such as mammalian genomes, also use the -sin parameter.

Example:

HOME-pairwise 	-t CG 	-i ./testcase/sample_file_CG.tsv 	-o ./outputpath 

or, if the inputs are CGmap files from BSSeeker2:

HOME-pairwise 	-t CG 	-i ./testcase/sample_file_CG.tsv 	-o ./outputpath --BSSeeker2

Required arguments:

 -t --type 	        Type of DMRs (CG /CHH/CHG/CHN/CNN) 

 -i --samplefilepath Sample file containing sample names and sample paths for each replicate (TAB sep); each sample info should be in different rows
 
 -o --outputpath 	  Path to the output directory
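
As a rough illustration of the sample file described above, the sketch below builds one TAB-separated row per sample, with the sample name first and the replicate paths after it. The helper name `format_sample_rows`, the sample names, and the paths are all hypothetical.

```python
def format_sample_rows(samples):
    """Return sample-file text: one row per sample, name then replicate paths."""
    rows = []
    for name, replicate_paths in samples:
        rows.append("\t".join([name] + list(replicate_paths)))
    return "\n".join(rows) + "\n"


# Hypothetical samples with two replicates each.
samples = [
    ("control", ["/path/to/control_rep1.tsv", "/path/to/control_rep2.tsv"]),
    ("treated", ["/path/to/treated_rep1.tsv", "/path/to/treated_rep2.tsv"]),
]
sample_file_text = format_sample_rows(samples)
```

The resulting text can be written to a file and passed to -i.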

The default parameters for HOME are relatively permissive. To run HOME with more stringent settings, change the defaults to the values below or higher:

HOME-pairwise 	-t CG 	-i ./testcase/sample_file_CG.tsv -o ./outputpath --delta 0.2 --minc 5

Optional arguments:

Parameter	Default	Description
-sc --scorecutoff	0.1	the score from the classifier for each C position
-p --pruncutoff	0.1	the SVM score checked for consecutive Cs from both ends to refine the boundaries
-npp --numprocess	8	number of cores to be used
-ml --minlength	50	minimum length of DMRs required to be reported
-ncb --numcb	5	minimum number of Cs present between DMRs to keep them separate
-md --mergedist	500	maximum distance allowed between DMRs to merge
-prn --prunningC	3	number of consecutive Cs to be considered for pruning for boundary refinement
-ns --numsamples	all	number of samples to use for DMR calling; default takes all samples in the file
-sp --startposition	1st position	start position of sample in the sample file to use for timeseries DMR calling
-BSSeeker2 --BSSeeker2	False	input CGmap file from BSSeeker2
-mc --minc	3	minimum number of Cs in a DMR
-sin --singlechrom	False	parallel code for single chromosome; npp will be used for parallel run for each chr
-d --delta	0.1	minimum average difference in methylation required in a DMR
-wrt --withrespectto	all	samples to use for DMR calling for pairwise comparisons with respect to specific samples
-Keepall --Keepall	False	keep all cytosine positions present in at least one of the replicates

Parameter -sc

HOME assigns a score from the classifier to each C position based on the histogram features. This score is then used to cluster the C’s and call the DMRs. The user can set a lower score cutoff if DMRs seem to be missing or if DMR boundaries seem to miss differentially methylated cytosines near the boundaries. Similarly, the user can increase the cutoff if the boundaries of DMRs seem to be extended. The score ranges from 0 to 1 and the default is 0.1.

NOTE: the default score was set after rigorous testing on various data sets and does not need to be changed in most cases. The default should only be changed after proper visualization of the DMRs in a genome browser.

Parameter -p

This parameter controls the boundary accuracy of the DMRs. After the DMRs are called, they are pruned based on the scores of the consecutive C’s at both ends. The user can increase the cutoff if the boundaries seem to extend too far. Similarly, the user can lower the cutoff if the DMR boundaries seem to miss differentially methylated C’s. The range is from 0 to 0.5 and the default is 0.1. As with parameter -sc, this parameter does not need to be altered in most cases and should only be changed after proper visual inspection of the DMRs.

Parameter -ml

This parameter sets the minimum length (in base pairs) for a DMR to be reported. DMRs below this length are skipped and not reported in the filtered DMR output file. The default is 50bp.

Parameter -ncb

This parameter controls when smaller DMRs should be merged into one. It sets the minimum number of C’s required between DMRs to keep them as separate DMRs. It works together with parameter -md (described below). The default is 5 C’s. So, if there are fewer than 5 C’s and the distance is less than 500bp (the default for -md) between two consecutive DMRs, they are merged into a single DMR.

Parameter -md

This parameter allows the user to set the merge distance between two consecutive DMRs. The default is 500bp.
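
The merge rule described for -ncb and -md can be sketched as follows. This is an illustration of the rule as worded above, not HOME's actual code: the function names and the data layout (a sorted list of `(start, end)` tuples plus a list with the number of C’s between each pair of neighbours) are assumptions.

```python
def should_merge(gap_bp, cs_between, numcb=5, mergedist=500):
    """Merge when fewer than numcb Cs AND less than mergedist bp separate two DMRs."""
    return cs_between < numcb and gap_bp < mergedist


def merge_dmrs(dmrs, cs_between, numcb=5, mergedist=500):
    """dmrs: sorted (start, end) tuples; cs_between[i]: Cs between dmrs[i] and dmrs[i + 1]."""
    if not dmrs:
        return []
    merged = [dmrs[0]]
    for (start, end), n_cs in zip(dmrs[1:], cs_between):
        prev_start, prev_end = merged[-1]
        if should_merge(start - prev_end, n_cs, numcb, mergedist):
            # Extend the previous DMR instead of starting a new one.
            merged[-1] = (prev_start, end)
        else:
            merged.append((start, end))
    return merged


# Made-up coordinates: the first two DMRs are 50bp apart with 2 Cs in between.
merged_example = merge_dmrs([(100, 200), (250, 400), (1200, 1300)], [2, 1])
```

With the defaults, the first two regions are merged and the third stays separate because it lies more than 500bp away.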

Parameter -npp

This parameter allows the user to set the number of parallel processes to run at a time. The default is 8.

Parameter -mc

This parameter allows the user to set the minimum number of C’s that a DMR must contain to be reported. Any DMR with fewer C’s than the set value will not be reported in the filtered DMR file. The default is 3.

Parameter -d

This parameter sets the minimum average methylation difference required for a DMR to be reported. DMRs with a difference below the set value will not be reported in the filtered DMR file. The default is 0.1.
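
The three reporting filters (-ml, -mc and -d) can be pictured as a single predicate applied to each candidate DMR. The sketch below is an assumption-laden illustration, not HOME's implementation: the function name `passes_filters` and the dictionary keys are hypothetical, the default values are the ones listed in the optional-arguments table, and comparing the absolute value of delta is a guess.

```python
def passes_filters(dmr, minlength=50, minc=3, delta=0.1):
    """Return True if a candidate DMR would survive the three reporting filters."""
    return (
        dmr["len"] >= minlength          # -ml: minimum length in bp
        and dmr["numC"] >= minc          # -mc: minimum number of Cs
        and abs(dmr["delta"]) >= delta   # -d: minimum average methylation difference
    )


# Made-up candidates for illustration only.
candidates = [
    {"len": 104, "numC": 7, "delta": -0.17},
    {"len": 40, "numC": 7, "delta": 0.30},
    {"len": 200, "numC": 2, "delta": 0.30},
]
reported = [d for d in candidates if passes_filters(d)]
```

Raising the thresholds, for example `passes_filters(d, delta=0.2, minc=5)`, mirrors the more stringent command shown earlier.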

Parameter -prn

This parameter is used together with parameter -p (described above). It controls the number of consecutive C’s considered from both ends for boundary refinement. The default is 3. This parameter should only be altered after proper visual inspection of the DMRs.
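
One way to picture how -p and -prn could interact is sketched below. The exact pruning rule is not spelled out here, so this is an assumed rule chosen for illustration: from each end, C’s are dropped until a run of `prunningC` consecutive C’s all score at or above `pruncutoff`. The function names are hypothetical and HOME's real logic may differ.

```python
def _first_good(scores, cutoff, k):
    """Index of the first window of k consecutive scores that are all >= cutoff."""
    for i in range(len(scores) - k + 1):
        if all(s >= cutoff for s in scores[i:i + k]):
            return i
    return None


def prune_boundaries(scores, pruncutoff=0.1, prunningC=3):
    """Return (first, last) inclusive indices of the Cs kept, or None if none qualify."""
    left = _first_good(scores, pruncutoff, prunningC)
    if left is None:
        return None
    # Scan the reversed list to find the matching boundary on the right-hand side.
    right_rev = _first_good(scores[::-1], pruncutoff, prunningC)
    return left, len(scores) - 1 - right_rev


# Made-up per-C scores for one candidate DMR.
scores = [0.02, 0.3, 0.05, 0.4, 0.5, 0.6, 0.7, 0.04, 0.2]
kept = prune_boundaries(scores)
```

Under this assumed rule, a higher cutoff or a longer run trims more from each end, which matches the advice to raise -p when boundaries extend too far.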

Parameter -sin

This parameter parallelizes the run within a single chromosome. The default is False, so the run is parallelized across chromosomes. It should be used with very large chromosomes, for example for non-CG DMR prediction in mammalian genomes. If the genome is small, it is advised not to use it. To turn it on, add -sin to the command line.

Parameter -ns

This parameter limits the number of samples taken from the sample input file. By default it is not set, so all samples in the file are used. It allows you to keep as many samples as you want in your input file while controlling how many are used for DMR calling.

Parameter -sp

This parameter selects where in the sample input file to start taking samples, and is used together with the -ns parameter. By default the code starts from the first sample in the file. It allows you to keep as many samples as you want in your input file while controlling which ones are used for DMR calling.
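
The combined effect of -ns and -sp can be thought of as taking a slice of the rows in the sample file. The sketch below assumes that the start position is 1-based (suggested by the "1st position" default) and uses a hypothetical helper name; it is not taken from the HOME source.

```python
def select_samples(sample_names, numsamples=None, startposition=1):
    """Pick `numsamples` samples starting at the 1-based `startposition`."""
    start = startposition - 1
    if numsamples is None:
        # No -ns given: use everything from the start position onwards.
        return sample_names[start:]
    return sample_names[start:start + numsamples]


# Hypothetical sample names in file order.
names = ["t0", "t1", "t2", "t3", "t4"]
subset = select_samples(names, numsamples=3, startposition=2)
```

Here the call picks the second, third and fourth samples, leaving the rest of the file unused.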

Parameter -BSSeeker2

This parameter is used to provide CGmap files directly. The default is False, so the code expects files in the input format described above. If you have methylation output files from BSSeeker2, they can be provided directly. To turn it on, add -BSSeeker2 to the command line.

Parameter -Keepall

This parameter is used to keep all the positions present in any of the replicates. The default is False, so only positions that are present in all the replicates of a sample are kept. This parameter is especially useful if the dataset contains more than one replicate and has missing positions that are covered in at least one of the replicates. To turn it on, add -Keepall to the command line.
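
In set terms, the default behaviour corresponds to an intersection of the positions across replicates, while -Keepall corresponds to a union. The sketch below illustrates that idea only; the helper name and the `(chromosome, position)` tuples are assumptions, and HOME's own data structures are likely different.

```python
def positions_to_keep(replicates, keepall=False):
    """replicates: one collection of (chrom, pos) tuples per replicate."""
    sets = [set(r) for r in replicates]
    if not sets:
        return set()
    if keepall:
        # -Keepall: present in at least one replicate.
        return set.union(*sets)
    # Default: present in every replicate.
    return set.intersection(*sets)


# Made-up positions for two replicates of one sample.
rep1 = [("chr1", 15814), ("chr1", 15815)]
rep2 = [("chr1", 15815), ("chr1", 15816)]
shared = positions_to_keep([rep1, rep2])
everything = positions_to_keep([rep1, rep2], keepall=True)
```

With these two replicates, only one position is shared, while three are kept when `keepall` is set.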

Parameter -wrt

This parameter is used to run multiple pairwise comparisons with respect to specific samples only, instead of all pairwise combinations. For example, to compare a control (cnt) with all other samples, use the command below:

HOME-pairwise 	-t [CG/CHG/CHH/CHN/CNN]	 -i [sample_file_fullpath] -wrt cnt	-o [output_directorypath]

NOTE: more than one sample name can be provided with -wrt, separated by spaces
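
To see which comparisons -wrt would leave in place, the sketch below enumerates all pairwise combinations and then keeps only those involving a reference sample. It is an illustration of the described behaviour under assumed names (`comparisons`, and sample names other than `cnt`), not HOME's own code.

```python
from itertools import combinations


def comparisons(sample_names, withrespectto=None):
    """All pairs by default; only pairs involving a reference sample with -wrt."""
    pairs = list(combinations(sample_names, 2))
    if not withrespectto:
        return pairs
    refs = set(withrespectto)
    return [p for p in pairs if p[0] in refs or p[1] in refs]


# "cnt" follows the example above; the other names are hypothetical.
names = ["cnt", "treatA", "treatB", "treatC"]
all_pairs = comparisons(names)
cnt_pairs = comparisons(names, withrespectto=["cnt"])
```

For four samples this reduces six comparisons to the three that involve `cnt`.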

Output format

The output format is:

chr	start	end	status	numC	mean_meth1	mean_meth2	delta	Avg_coverage1	Avg_coverage2	len

Here, status refers to the state of the DMR (hyper/hypo). mean_meth1 and mean_meth2 refer to the mean methylation level for sample 1 and sample 2 respectively. delta refers to the difference in mean methylation level between the two samples. Avg_coverage1 and Avg_coverage2 give the mean coverage for the two samples.
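
A small sketch of reading one row in this layout into typed fields is shown below. The column names come from the format line above; the helper name `parse_pairwise_row`, the choice of integer versus float types, and the example row are assumptions made up for illustration.

```python
PAIRWISE_COLUMNS = [
    "chr", "start", "end", "status", "numC", "mean_meth1", "mean_meth2",
    "delta", "Avg_coverage1", "Avg_coverage2", "len",
]
INT_COLUMNS = ("start", "end", "numC", "len")
FLOAT_COLUMNS = ("mean_meth1", "mean_meth2", "delta", "Avg_coverage1", "Avg_coverage2")


def parse_pairwise_row(line):
    """Turn one tab-separated output row into a dict with numeric fields converted."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(PAIRWISE_COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(PAIRWISE_COLUMNS), len(fields)))
    row = dict(zip(PAIRWISE_COLUMNS, fields))
    for key in INT_COLUMNS:
        row[key] = int(row[key])
    for key in FLOAT_COLUMNS:
        row[key] = float(row[key])
    return row


# Made-up row, not real HOME output.
row = parse_pairwise_row("chr1\t1000\t1150\thyper\t8\t0.75\t0.25\t0.5\t12.0\t14.5\t150")
```

Rows parsed this way can then be filtered or sorted, for instance by `delta` or `len`.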

DMR detection for more than two groups: time series

HOME-timeseries 	-t [CG/CHG/CHH/CHN/CNN]	-i [sample_file_fullpath]		-o [output_directorypath]

Example:

HOME-timeseries 	-t CG 	-i ./testcase/sample_file_CG.tsv	-o /outputpath

Required arguments:

-t  --type	            type of DMRs (CG /CHH/CHG/CHN/CNN) 

-i --samplefilepath Sample file containing sample names and sample paths for each replicate (TAB sep); each sample info should be in different rows 

-o  --outputpath 		    path to the output directory

Optional arguments:

Parameter	Default	Description
-sc --scorecutoff	0.5	the score from the classifier for each C position
-npp --numprocess	5	number of cores to be used
-ml --minlength	50	minimum length of DMRs required to be reported
-ns --numsamples	all	number of samples to use for DMR calling; default takes all samples in the file
-sp --startposition	1st position	start position of sample in the sample file to use for timeseries DMR calling
-BSSeeker2 --BSSeeker2	False	input CGmap file from BSSeeker2
-mc --minc	4	minimum number of Cs in a DMR
-d --delta	0.1	minimum average difference in methylation required in a DMR
-sin --singlechrom	False	parallel code for single chromosome; npp will be used for parallel run for each chr
-Keepall --Keepall	False	keep all cytosine positions present in at least one of the replicates

Output format

chr	start	end	numC	len	max_delta	confidence_scores	Comb1-n

Here, max_delta is the maximum average methylation difference among the compared samples. The confidence score takes into account the length, the number of C’s and the SVM score; a higher value denotes a more confident DMR. Comb1-n denotes the pairwise comparisons for each combination of samples. Each column reports start:end:state:delta for that pairwise comparison.
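
A hedged sketch of unpacking one Comb column value follows. It assumes that the four parts are separated by colons exactly as listed above and that comparisons without a call are written as `NA`; the helper name and the example values are hypothetical, so the real output should be checked before relying on this.

```python
def parse_comb_cell(cell):
    """Split a 'start:end:state:delta' cell; return None for NA."""
    cell = cell.strip()
    if cell == "NA":
        return None
    start, end, state, delta = [part.strip() for part in cell.split(":")]
    return {"start": int(start), "end": int(end), "state": state, "delta": float(delta)}


# Made-up cell values for illustration.
called = parse_comb_cell("26732:26836:hyper:0.17")
missing = parse_comb_cell("NA")
```

Applying this to every Comb column of a row gives the subset of sample pairs that actually support the DMR.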

Required tools

Troubleshooting

To stop a HOME run partway through:

Ctrl+C and then
Ctrl+Z

Note: always delete the directories created by the HOME run if it was stopped partway through

Error Exception: File...training_data file does not exist

Please remember to cd into the HOME directory before starting the run

Error while installing dependencies from requirement file: for example "Compilation failed : pyconfig.h: No such file or directory"

sudo apt-get install python-dev

Citation

If you use this software in your work, please cite our paper HOME

home's Issues

HOME_DMR error

Hi Akanksha,
I tried to use HOME to identify DMRs from two samples without replicates. My command is:
HOME-pairwise -t CG -i ~/methylation/findDMR_HOME_input.txt -o ./ -md 200 --BSSeeker2 --delta 0.3 --minc 5 -npp 8
and the .txt is:
ploidy_4 /user/zac/methylation/NovaA-210121/CGmap/4n_merge.CGmap.gz
ploidy_8 /user/zac/methylation/NovaA-210121/CGmap/8n_merge.CGmap.gz
but it returns an error:
The output (if any) follows:

Traceback (most recent call last):
  File "/gss1/home/zqq20190923/miniconda3/bin/HOME-pairwise", line 4, in <module>
    __import__('pkg_resources').run_script('HOME==1.0.0', 'HOME-pairwise')
  File "/gss1/home/zqq20190923/miniconda3/lib/python2.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/gss1/home/zqq20190923/miniconda3/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1469, in run_script
    exec(script_code, namespace, namespace)
  File "/gss1/home/zqq20190923/miniconda3/lib/python2.7/site-packages/HOME-1.0.0-py2.7.egg/EGG-INFO/scripts/HOME-pairwise", line 492, in <module>

NameError: name 'status' is not defined
Temp directory at output path already exist.... please clean up and rerun
Temp directory at output path already exist.... please clean up and rerun

So could you tell me what is wrong with my HOME?
Thank you.

All Comb1-n "NA" for most DMRs in CHH

Hey, I am using HOME-timeseries with 23 samples (no replicates) and I just noticed that most of the DMRs reported by HOME have "NA" in every Comb1-n column, even though max_delta and confidence_scores are reported.

An example looks like this:

chr	start	end	numC	len	max_delta	confidence_scores	HvBS001_VS_HvBS002	HvBS001_VS_HvBS004	HvBS001_VS_HvBS005	HvBS001_VS_HvBS006	HvBS001_VS_HvBS007	HvBS001_VS_HvBS008	HvBS001_VS_HvBS009	HvBS001_VS_HvBS011	HvBS001_VS_HvBS012	HvBS001_VS_HvBS013	HvBS001_VS_HvBS014	HvBS001_VS_HvBS015	HvBS001_VS_HvBS016	HvBS001_VS_HvBS017	HvBS001_VS_HvBS018	HvBS001_VS_HvBS019	HvBS001_VS_HvBS020	HvBS001_VS_HvBS021	HvBS001_VS_HvBS022	HvBS001_VS_HvBS023	HvBS001_VS_HvBS024	HvBS001_VS_HvBS048	HvBS002_VS_HvBS004	HvBS002_VS_HvBS005	HvBS002_VS_HvBS006	HvBS002_VS_HvBS007	HvBS002_VS_HvBS008	HvBS002_VS_HvBS009	HvBS002_VS_HvBS011	HvBS002_VS_HvBS012	HvBS002_VS_HvBS013	HvBS002_VS_HvBS014	HvBS002_VS_HvBS015	HvBS002_VS_HvBS016	HvBS002_VS_HvBS017	HvBS002_VS_HvBS018	HvBS002_VS_HvBS019	HvBS002_VS_HvBS020	HvBS002_VS_HvBS021	HvBS002_VS_HvBS022	HvBS002_VS_HvBS023	HvBS002_VS_HvBS024	HvBS002_VS_HvBS048	HvBS004_VS_HvBS005	HvBS004_VS_HvBS006	HvBS004_VS_HvBS007	HvBS004_VS_HvBS008	HvBS004_VS_HvBS009	HvBS004_VS_HvBS011	HvBS004_VS_HvBS012	HvBS004_VS_HvBS013	HvBS004_VS_HvBS014	HvBS004_VS_HvBS015	HvBS004_VS_HvBS016	HvBS004_VS_HvBS017	HvBS004_VS_HvBS018	HvBS004_VS_HvBS019	HvBS004_VS_HvBS020	HvBS004_VS_HvBS021	HvBS004_VS_HvBS022	HvBS004_VS_HvBS023	HvBS004_VS_HvBS024	HvBS004_VS_HvBS048	HvBS005_VS_HvBS006	HvBS005_VS_HvBS007	HvBS005_VS_HvBS008	HvBS005_VS_HvBS009	HvBS005_VS_HvBS011	HvBS005_VS_HvBS012	HvBS005_VS_HvBS013	HvBS005_VS_HvBS014	HvBS005_VS_HvBS015	HvBS005_VS_HvBS016	HvBS005_VS_HvBS017	HvBS005_VS_HvBS018	HvBS005_VS_HvBS019	HvBS005_VS_HvBS020	HvBS005_VS_HvBS021	HvBS005_VS_HvBS022	HvBS005_VS_HvBS023	HvBS005_VS_HvBS024	HvBS005_VS_HvBS048	HvBS006_VS_HvBS007	HvBS006_VS_HvBS008	HvBS006_VS_HvBS009	HvBS006_VS_HvBS011	HvBS006_VS_HvBS012	HvBS006_VS_HvBS013	HvBS006_VS_HvBS014	HvBS006_VS_HvBS015	HvBS006_VS_HvBS016	HvBS006_VS_HvBS017	HvBS006_VS_HvBS018	HvBS006_VS_HvBS019	HvBS006_VS_HvBS020	HvBS006_VS_HvBS021	HvBS006_VS_HvBS022	HvBS006_VS_HvBS023	HvBS006_VS_HvBS024	HvBS006_VS_HvBS048	HvBS007_VS_HvBS008	HvBS007_VS_HvBS009	
HvBS007_VS_HvBS011	HvBS007_VS_HvBS012	HvBS007_VS_HvBS013	HvBS007_VS_HvBS014	HvBS007_VS_HvBS015	HvBS007_VS_HvBS016	HvBS007_VS_HvBS017	HvBS007_VS_HvBS018	HvBS007_VS_HvBS019	HvBS007_VS_HvBS020	HvBS007_VS_HvBS021	HvBS007_VS_HvBS022	HvBS007_VS_HvBS023	HvBS007_VS_HvBS024	HvBS007_VS_HvBS048	HvBS008_VS_HvBS009	HvBS008_VS_HvBS011	HvBS008_VS_HvBS012	HvBS008_VS_HvBS013	HvBS008_VS_HvBS014	HvBS008_VS_HvBS015	HvBS008_VS_HvBS016	HvBS008_VS_HvBS017	HvBS008_VS_HvBS018	HvBS008_VS_HvBS019	HvBS008_VS_HvBS020	HvBS008_VS_HvBS021	HvBS008_VS_HvBS022	HvBS008_VS_HvBS023	HvBS008_VS_HvBS024	HvBS008_VS_HvBS048	HvBS009_VS_HvBS011	HvBS009_VS_HvBS012	HvBS009_VS_HvBS013	HvBS009_VS_HvBS014	HvBS009_VS_HvBS015	HvBS009_VS_HvBS016	HvBS009_VS_HvBS017	HvBS009_VS_HvBS018	HvBS009_VS_HvBS019	HvBS009_VS_HvBS020	HvBS009_VS_HvBS021	HvBS009_VS_HvBS022	HvBS009_VS_HvBS023	HvBS009_VS_HvBS024	HvBS009_VS_HvBS048	HvBS011_VS_HvBS012	HvBS011_VS_HvBS013	HvBS011_VS_HvBS014	HvBS011_VS_HvBS015	HvBS011_VS_HvBS016	HvBS011_VS_HvBS017	HvBS011_VS_HvBS018	HvBS011_VS_HvBS019	HvBS011_VS_HvBS020	HvBS011_VS_HvBS021	HvBS011_VS_HvBS022	HvBS011_VS_HvBS023	HvBS011_VS_HvBS024	HvBS011_VS_HvBS048	HvBS012_VS_HvBS013	HvBS012_VS_HvBS014	HvBS012_VS_HvBS015	HvBS012_VS_HvBS016	HvBS012_VS_HvBS017	HvBS012_VS_HvBS018	HvBS012_VS_HvBS019	HvBS012_VS_HvBS020	HvBS012_VS_HvBS021	HvBS012_VS_HvBS022	HvBS012_VS_HvBS023	HvBS012_VS_HvBS024	HvBS012_VS_HvBS048	HvBS013_VS_HvBS014	HvBS013_VS_HvBS015	HvBS013_VS_HvBS016	HvBS013_VS_HvBS017	HvBS013_VS_HvBS018	HvBS013_VS_HvBS019	HvBS013_VS_HvBS020	HvBS013_VS_HvBS021	HvBS013_VS_HvBS022	HvBS013_VS_HvBS023	HvBS013_VS_HvBS024	HvBS013_VS_HvBS048	HvBS014_VS_HvBS015	HvBS014_VS_HvBS016	HvBS014_VS_HvBS017	HvBS014_VS_HvBS018	HvBS014_VS_HvBS019	HvBS014_VS_HvBS020	HvBS014_VS_HvBS021	HvBS014_VS_HvBS022	HvBS014_VS_HvBS023	HvBS014_VS_HvBS024	HvBS014_VS_HvBS048	HvBS015_VS_HvBS016	HvBS015_VS_HvBS017	HvBS015_VS_HvBS018	HvBS015_VS_HvBS019	HvBS015_VS_HvBS020	HvBS015_VS_HvBS021	HvBS015_VS_HvBS022	HvBS015_VS_HvBS023	HvBS015_VS_HvBS024	
HvBS015_VS_HvBS048	HvBS016_VS_HvBS017	HvBS016_VS_HvBS018	HvBS016_VS_HvBS019	HvBS016_VS_HvBS020	HvBS016_VS_HvBS021	HvBS016_VS_HvBS022	HvBS016_VS_HvBS023	HvBS016_VS_HvBS024	HvBS016_VS_HvBS048	HvBS017_VS_HvBS018	HvBS017_VS_HvBS019	HvBS017_VS_HvBS020	HvBS017_VS_HvBS021	HvBS017_VS_HvBS022	HvBS017_VS_HvBS023	HvBS017_VS_HvBS024	HvBS017_VS_HvBS048	HvBS018_VS_HvBS019	HvBS018_VS_HvBS020	HvBS018_VS_HvBS021	HvBS018_VS_HvBS022	HvBS018_VS_HvBS023	HvBS018_VS_HvBS024	HvBS018_VS_HvBS048	HvBS019_VS_HvBS020	HvBS019_VS_HvBS021	HvBS019_VS_HvBS022	HvBS019_VS_HvBS023	HvBS019_VS_HvBS024	HvBS019_VS_HvBS048	HvBS020_VS_HvBS021	HvBS020_VS_HvBS022	HvBS020_VS_HvBS023	HvBS020_VS_HvBS024	HvBS020_VS_HvBS048	HvBS021_VS_HvBS022	HvBS021_VS_HvBS023	HvBS021_VS_HvBS024	HvBS021_VS_HvBS048	HvBS022_VS_HvBS023	HvBS022_VS_HvBS024	HvBS022_VS_HvBS048	HvBS023_VS_HvBS024	HvBS023_VS_HvBS048	HvBS024_VS_HvBS048
chr1H	26732	26836	7	104	0.16683316683316682	0.25653578517793324	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

Is this expected behavior? How can this happen?

I am happy to provide more information if necessary. In the meantime I will check whether this also happens for other contexts.
The input files are about 273M (uncompressed), so let me know how to submit them if they are needed.
Thanks a lot in advance.

Segmentation fault of HOME

Hi,
I followed the instructions to install the HOME software, but it gives me a segmentation fault (core dumped) error when using HOME-pairwise or HOME-timeseries, and nothing else.

Nanopore data?

Hi, this looks interesting.

Is there any reason the tool would not work for nanopore data called by Megalodon (rerio) and modbam2bed ?

i.e. is it restricted to illumina bisulfite data?

Thanks,
Colin

SVC attribute error

Hello,

HOME-pairwise has failed with the following exception:

% HOME-pairwise -t CG -i  ../input.tsv  -o ../out -sin -npp 22
Preparing the DMRs from HOME.....
GOOD LUCK !
/wehisan/general/system/bioinf-software/bioinfsoftware/python/python-2.7.11/lib/python2.7/site-packages/statsmodels/stats/weightstats.py:575: RuntimeWarning: invalid value encountered in double_scalars
  zstat = value / std_diff
/wehisan/general/system/bioinf-software/bioinfsoftware/python/python-2.7.11/lib/python2.7/site-packages/sklearn/base.py:315: UserWarning: Trying to unpickle estimator SVC from version pre-0.18 when using version 0.18.1. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
Traceback (most recent call last):
  File "/wehisan/home/allstaff/g/gigante.s/soft/lister_home/bin/HOME-pairwise", line 4, in <module>
    __import__('pkg_resources').run_script('HOME==0.4', 'HOME-pairwise')
  File "/usr/local/bioinfsoftware/python/python-2.7.11/lib/python2.7/site-packages/pkg_resources/__init__.py", line 742, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/bioinfsoftware/python/python-2.7.11/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1510, in run_script
    exec(script_code, namespace, namespace)
  File "/wehisan/home/allstaff/g/gigante.s/soft/lister_home/lib/python2.7/site-packages/HOME-0.4-py2.7.egg/EGG-INFO/scripts/HOME-pairwise", line 452, in <module>

  File "/wehisan/home/allstaff/g/gigante.s/soft/lister_home/lib/python2.7/site-packages/HOME-0.4-py2.7.egg/EGG-INFO/scripts/HOME-pairwise", line 224, in main

Exception: 'SVC' object has no attribute 'decision_function_shape'

My input file is as follows:

$ less ../input.tsv
alt	~/path/to/data/alt.summary.home.tsv
ref	~/path/to/data/ref.summary.home.tsv

and my data files look like this

$ head ~/path/to/data/alt.summary.home.tsv
chr5    149642973       +       CG      34      42
chr9    121883727       +       CG      3       5
chr9    24213925        +       CG      1       3
chr2    180792870       +       CG      5       8
chr1    7779688 +       CG      1       8
chr9    120952088       +       CG      1       2
chr6    97035598        +       CG      2       4
chr3    127312598       +       CG      10      14
chr3    42041023        +       CG      4       7
chr7    74422202        +       CG      5       8

Am I doing something wrong?

Thanks!
Scott

HOME-pairwise for different numbers of replicate

Hi,
I'm trying to use HOME for DMR analysis.
I have 6 replicates for WT and 4 replicates for the mutant.
When I run HOME-pairwise, I get the following error message:

[lee@ko44 HOME]$ ${HOME_DMR}/HOME-pairwise -t CG -npp 16 -i ${data}/HOME_DMR_sample_file_CG.txt -o ${data}/HOME_DMR_gz_out --BSSeeker2 --delta 0.2 --minc 5
Traceback (most recent call last):
  File "/home/lee/NGS/sw/HOME_met/bin/HOME-pairwise", line 4, in <module>
    __import__('pkg_resources').run_script('HOME==1.0.0', 'HOME-pairwise')
  File "/home/lee/NGS/sw/HOME_met/lib/python2.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/lee/NGS/sw/HOME_met/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1469, in run_script
    exec(script_code, namespace, namespace)
  File "/home/lee/NGS/sw/HOME_met/lib/python2.7/site-packages/HOME-1.0.0-py2.7.egg/EGG-INFO/scripts/HOME-pairwise", line 314, in <module>

  File "/home/lee/NGS/sw/HOME_met/lib/python2.7/site-packages/pandas/io/parsers.py", line 498, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/lee/NGS/sw/HOME_met/lib/python2.7/site-packages/pandas/io/parsers.py", line 285, in _read
    return parser.read()
  File "/home/lee/NGS/sw/HOME_met/lib/python2.7/site-packages/pandas/io/parsers.py", line 747, in read
    ret = self._engine.read(nrows)
  File "/home/lee/NGS/sw/HOME_met/lib/python2.7/site-packages/pandas/io/parsers.py", line 1197, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:7988)
  File "pandas/parser.pyx", line 788, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8244)
  File "pandas/parser.pyx", line 842, in pandas.parser.TextReader._read_rows (pandas/parser.c:8970)
  File "pandas/parser.pyx", line 829, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8838)
  File "pandas/parser.pyx", line 1833, in pandas.parser.raise_parser_error (pandas/parser.c:22649)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 5 fields in line 2, saw 7

I set HOME_DMR_sample_file_CG.txt as
mutant /data/path/mutant.r1.CGmap.gz data/path/mutant.r2.CGmap.gz data/path/mutant.r3.CGmap.gz data/path/mutant.r4.CGmap.gz
WT /data/path/WT.r1.CGmap.gz /data/path/WT.r2.CGmap.gz /data/path/WT.r3.CGmap.gz /data/path/WT.r4.CGmap.gz /data/path/WT.r5.CGmap.gz /data/path/WT.r6.CGmap.gz

I suspect that HOME does not support samples with different numbers of replicates.

Do you have any suggestions for this case?

Exception: reduce() of empty sequence with no initial value

I'm trying to run HOME, however I encounter the error below:
HOME-pairwise -t CG -i /data/WGBS/04-HOME/dom1-5.txt -o ./ex1-5-cg -d 0.4 -mc 5

Preparing the DMRs from HOME.....
GOOD LUCK !
Traceback (most recent call last):
  File "/data/software/miniconda3/envs/py2.7/bin/HOME-pairwise", line 4, in <module>
    __import__('pkg_resources').run_script('HOME==1.0.0', 'HOME-pairwise')
  File "/data/software/miniconda3/envs/py2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/data/software/miniconda3/envs/py2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1469, in run_script
    exec(script_code, namespace, namespace)
  File "/data/software/miniconda3/envs/py2.7/lib/python2.7/site-packages/HOME-1.0.0-py2.7.egg/EGG-INFO/scripts/HOME-pairwise", line 513, in <module>

  File "/data/software/miniconda3/envs/py2.7/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
Exception: reduce() of empty sequence with no initial value

Do you have any suggestions for this?
Thanks!

HOME run does not end

Hello,

I'm having problems running HOME, because the runs never finish. It finds the mitochondrial DMRs quite quickly, but after that it does not produce any other result. I have tried it with 48h and 32 processors for 2 samples of Arabidopsis, and the job is always killed because it reaches the time limit.

The command I am using is:
HOME-pairwise -t CG -i ./sample_file_CG_comparison1.txt -o ./output_comparison1_CG -npp 32

This is with the latest version of HOME (0.9).

Thank you for your help,

Núria

NameError: name 'status' is not defined

Hi,

I'm trying to run HOME, HOME-pairwise -t CG -i ./neg_files.txt -o ./results/neg_2, however I encounter the error below:

Traceback (most recent call last):
  File "/scratch/work/malonzm1/.conda_envs/HOME/bin/HOME-pairwise", line 4, in <module>
    __import__('pkg_resources').run_script('HOME==1.0.0', 'HOME-pairwise')
  File "/scratch/work/malonzm1/.conda_envs/HOME/lib/python2.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/scratch/work/malonzm1/.conda_envs/HOME/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1469, in run_script
    exec(script_code, namespace, namespace)
  File "/scratch/work/malonzm1/.conda_envs/HOME/lib/python2.7/site-packages/HOME-1.0.0-py2.7.egg/EGG-INFO/scripts/HOME-pairwise", line 492, in <module>

NameError: name 'status' is not defined

Pls. advise. Thanks!

HOME-timeseries error

Hi,

I'm having an issue with HOME-timeseries. I'm running a comparison over four time-points with two replicates per time-point. The data is formatted like this:

chr1    10708   +       CG      3       3
chr1    10718   +       CG      3       3
chr1    10720   +       CG      3       3
chr1    10723   +       CG      3       3
chr1    10725   +       CG      3       3
chr1    10728   +       CG      3       3
chr1    10731   +       CG      3       3
chr1    10737   +       CG      2       2
chr1    12807   +       CG      1       1
chr1    13079   +       CG      4       7

This is the command I'm using:

HOME-timeseries -t CG -i /jim/HOME_analysis/meta_ct_time.txt -o /jim/HOME_analysis/output_ct_time/ --delta 0.25 --minc 5

This is the error I am getting:

Preparing the DMRs from HOME.....
GOOD LUCK !
Traceback (most recent call last):
  File "/jim/homenvir/bin/HOME-timeseries", line 4, in <module>
    __import__('pkg_resources').run_script('HOME==0.5', 'HOME-timeseries')
  File "/jim/homenvir/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 739, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/jim/homenvir/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1501, in run_script
    exec(script_code, namespace, namespace) 
  File "/jim/homenvir/local/lib/python2.7/site-packages/HOME-0.5-py2.7.egg/EGG-INFO/scripts/HOME-timeseries", line 489, in <module>

  File "/usr/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
Exception: 'numpy.ndarray' object has no attribute 'columns'

All prerequisites are the correct versions and I'm using a virtual environment. HOME-pairwise using the same data works great.

Thanks for your help,

Jim

multiprocessing range parameter

I've been trying to use this for the time-series capabilities but I get this error every time.

Traceback (most recent call last):
  File "/home/bth29393/.local/bin/HOME-timeseries", line 4, in <module>
    __import__('pkg_resources').run_script('HOME==0.7', 'HOME-timeseries')
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 750, in run_script
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1534, in run_script
  File "/home/bth29393/.local/lib/python2.7/site-packages/HOME-0.7-py2.7.egg/EGG-INFO/scripts/HOME-timeseries", line 490, in <module>
    
  File "/usr/local/apps/eb/Python/2.7.14-foss-2018a/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
Exception: range parameter must be finite.

Looking at the code, I don't think this affects the output, but I can't be sure.

Running HOME v0.7 with correct requirements.

Push more

It would be nice to have an updated version for Python 3

training_data_nonCG.txt does not exist

Hello There,

I'm fairly new to informatics, so sorry if this is a naive question that takes up a bit of your time.
I tried running HOME in both pairwise and time series mode for my samples in the CHH and CHG contexts. It gives me no DMRs, along with a note stating that the non-CG data is not trained.

Am I missing something obvious? Please find the error below.

cheers and many thanks,
Ranj.
ranjith.papareddy@dmn3 [~/HOME/tspool]> HOME-pairwise -t CHG -i ~/HOME/tspool/timeseries.tsv -o ~/HOME/tspool/ -npp 32
Preparing the DMRs from HOME.....
GOOD LUCK !
/net/gmi.oeaw.ac.at/software/mendel/29_04_2013/software/statsmodels/0.8.0-foss-2017a-Python-2.7.13/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/stats/weightstats.py:670: RuntimeWarning: invalid value encountered in double_scalars
zstat = value / std_diff
/net/gmi.oeaw.ac.at/software/mendel/29_04_2013/software/statsmodels/0.8.0-foss-2017a-Python-2.7.13/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/stats/weightstats.py:670: RuntimeWarning: invalid value encountered in double_scalars
zstat = value / std_diff
/net/gmi.oeaw.ac.at/software/mendel/29_04_2013/software/statsmodels/0.8.0-foss-2017a-Python-2.7.13/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/stats/weightstats.py:670: RuntimeWarning: invalid value encountered in double_scalars
zstat = value / std_diff
/net/gmi.oeaw.ac.at/software/mendel/29_04_2013/software/statsmodels/0.8.0-foss-2017a-Python-2.7.13/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/stats/weightstats.py:670: RuntimeWarning: invalid value encountered in double_scalars
zstat = value / std_diff
/net/gmi.oeaw.ac.at/software/mendel/29_04_2013/software/statsmodels/0.8.0-foss-2017a-Python-2.7.13/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/stats/weightstats.py:670: RuntimeWarning: invalid value encountered in double_scalars
zstat = value / std_diff
File /home/GMI/ranjith.papareddy/HOME/tspool/training_data/training_data_nonCG.txt does not exist

Installation Error

Hi,

I tried installing HOME and I ran into an error.
I cloned the directory as mentioned, created the environment with Python 2.7.

[juaguila@u05 ~]$ module load miniconda3-23.5.2
[juaguila@u05 ~]$ conda create -n HOME python=2.7
[juaguila@u05 ~]$ cd appz/
[juaguila@u05 appz]$ git clone https://github.com/ListerLab/HOME.git
[juaguila@u05 appz]$ cd ./HOME/

I tried running the installation and got an error:
[juaguila@u05 ~]$ source activate HOME
(HOME) [juaguila@u05 HOME]$ python setup.py install

Processing statsmodels-0.6.1.zip
Writing /tmp/easy_install-rB20mf/statsmodels-0.6.1/setup.cfg
Running statsmodels-0.6.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-rB20mf/statsmodels-0.6.1/egg-dist-tmp-BFMJd_
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
package init file 'statsmodels/tsa/vector_ar/data/init.py' not found (or not a regular file)
warning: no files found matching '.pxi' anywhere in distribution
warning: no previously-included files matching '' found under directory 'build'
warning: no previously-included files matching '' found under directory 'dist'
warning: no previously-included files found matching 'docs/source/generated/'
warning: no previously-included files matching '' found under directory 'docs/build'
warning: no previously-included files matching '' found under directory 'docs/build/htmlhelp'
warning: no files found matching 'statsmodels/statsmodelsdoc.chm'
no previously-included directories found matching '/pycache'
warning: no previously-included files matching '~' found anywhere in distribution
warning: no previously-included files matching '.swp' found anywhere in distribution
warning: no previously-included files matching '.pyc' found anywhere in distribution
warning: no previously-included files matching '.pyo' found anywhere in distribution
warning: no previously-included files matching '.bak' found anywhere in distribution

File "/home/juaguila/.conda/envs/HOME/lib/python2.7/contextlib.py", line 35, in exit
self.gen.throw(type, value, traceback)
File "/home/juaguila/.conda/envs/HOME/lib/python2.7/site-packages/setuptools/sandbox.py", line 195, in setup_context
yield
File "/home/juaguila/.conda/envs/HOME/lib/python2.7/contextlib.py", line 35, in exit
self.gen.throw(type, value, traceback)
File "/home/juaguila/.conda/envs/HOME/lib/python2.7/site-packages/setuptools/sandbox.py", line 166, in save_modules
saved_exc.resume()
File "/home/juaguila/.conda/envs/HOME/lib/python2.7/site-packages/setuptools/sandbox.py", line 141, in resume
six.reraise(type, exc, self._tb)
File "/home/juaguila/.conda/envs/HOME/lib/python2.7/site-packages/setuptools/sandbox.py", line 154, in save_modules
yield saved
File "/home/juaguila/.conda/envs/HOME/lib/python2.7/site-packages/setuptools/sandbox.py", line 195, in setup_context
yield
File "/home/juaguila/.conda/envs/HOME/lib/python2.7/site-packages/setuptools/sandbox.py", line 250, in run_setup
_execfile(setup_script, ns)
File "/home/juaguila/.conda/envs/HOME/lib/python2.7/site-packages/setuptools/sandbox.py", line 45, in _execfile
exec(code, globals, locals)
File "/tmp/easy_install-0bJyLW/scikit-learn-0.16.1/setup.py", line 173, in

File "/tmp/easy_install-0bJyLW/scikit-learn-0.16.1/setup.py", line 165, in setup_package

ImportError: No module named numpy.distutils.core

You mention having R installed, but the error doesn't seem to be related to R, since it mentions numpy, which is a Python package. Could you please tell me what to install in R besides base R?
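
The traceback above ends in `ImportError: No module named numpy.distutils.core`, raised from the scikit-learn 0.16.1 `setup.py`, which suggests that numpy has to be importable in the environment before scikit-learn is built and that R is not involved at this step. A minimal preflight sketch, assuming that a missing numpy is the only blocker (the helper name `missing_modules` is hypothetical and not part of HOME):

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # numpy.distutils.core is the module named in the traceback.
    for name in missing_modules(["numpy", "numpy.distutils.core"]):
        print("not importable: " + name)
```

If `numpy` is reported, running `pip install numpy` inside the activated environment before `python setup.py install` is one thing to try.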

double scalar error

I keep getting this error running timeseries with CHH context. Just wondering if you know what is causing it or how to fix it.

RuntimeWarning: invalid value encountered in double_scalars
  zstat = value / std_diff

Also, can you update the documentation/README to say what NA means in the timeseries combn-1 columns? Current documentation says these cells have start:end:state:delta but most of mine are NA even with high confidence scores. Just trying to understand.
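
The warning points at `zstat = value / std_diff` in statsmodels, so one plausible cause is a zero-over-zero division when the compared groups have no spread, or when there is only one observation per group. The sketch below is not HOME's or statsmodels' code. It is a plain-Python pooled two-sample z statistic (the function name `pooled_z` is hypothetical) that shows where such a division becomes undefined:

```python
import math


def pooled_z(x1, x2):
    """Two-sample z statistic with a pooled variance; NaN when undefined."""
    n1, n2 = len(x1), len(x2)
    m1 = sum(x1) / float(n1)
    m2 = sum(x2) / float(n2)
    dof = n1 + n2 - 2
    if dof <= 0:
        # One observation per group: no variance estimate is possible.
        return float("nan")
    ss = sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)
    std_diff = math.sqrt(ss / dof * (1.0 / n1 + 1.0 / n2))
    value = m1 - m2
    if std_diff == 0.0:
        # 0/0 is undefined (NaN); NumPy reports this case as
        # "invalid value encountered". A nonzero value over 0 diverges.
        if value == 0.0:
            return float("nan")
        return math.copysign(float("inf"), value)
    return value / std_diff
```

Under this assumption, positions where every sample has the same methylation level, or comparisons with a single replicate per group, would yield NaN rather than a usable statistic.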

HOME-timeseries Exception

Hello, I am trying to run HOME-timeseries on a HPC but get the following error. I am not sure what this error means.

(homeenv) [a.ramesh@node01 data_meimp]$ Traceback (most recent call last):
  File "/data/proj2/popgen/a.ramesh/software/HOME/scripts/HOME-timeseries", line 480, in <module>
    main(dx)
  File "/data/proj2/popgen/a.ramesh/software/HOME/scripts/HOME-timeseries", line 263, in main
    raise Exception(e.message)
Exception

This is the command that I typed. I used a virtualenv as recommended.

python HOME/scripts/HOME-timeseries -t CG -i sample_file_CG.txt -o home_ouput --delta 0.2 --minc 5 -npp 1

This is the file containing the sample paths. It is tab separated.

sample9783      /data/proj2/popgen/a.ramesh/projects/methylomes/arabidopsis/data_meimp/9783_home.tsv
sample9794      /data/proj2/popgen/a.ramesh/projects/methylomes/arabidopsis/data_meimp/9794_home.tsv
sample9808      /data/proj2/popgen/a.ramesh/projects/methylomes/arabidopsis/data_meimp/9808_home.tsv
sample9809      /data/proj2/popgen/a.ramesh/projects/methylomes/arabidopsis/data_meimp/9809_home.tsv
sample9810      /data/proj2/popgen/a.ramesh/projects/methylomes/arabidopsis/data_meimp/9810_home.tsv
sample9811      /data/proj2/popgen/a.ramesh/projects/methylomes/arabidopsis/data_meimp/9811_home.tsv
sample9812      /data/proj2/popgen/a.ramesh/projects/methylomes/arabidopsis/data_meimp/9812_home.tsv
sample9813      /data/proj2/popgen/a.ramesh/projects/methylomes/arabidopsis/data_meimp/9813_home.tsv
sample9814      /data/proj2/popgen/a.ramesh/projects/methylomes/arabidopsis/data_meimp/9814_home.tsv
sample9816      /data/proj2/popgen/a.ramesh/projects/methylomes/arabidopsis/data_meimp/9816_home.tsv

And this is some example data that I used

1       109     +       CG      18      18
1       110     -       CG      29      33
1       115     +       CG      17      18
1       116     -       CG      31      35
1       161     +       CG      19      22
1       162     -       CG      24      30
1       310     +       CG      8       14
1       311     -       CG      14      27
1       500     +       CG      5       12
1       501     -       CG      10      19
1       511     +       CG      7       13
1       512     -       CG      10      19
1       642     +       CG      8       10
1       643     -       CG      16      16
1       647     +       CG      9       13
1       648     -       CG      16      16
1       650     +       CG      11      14
1       651     -       CG      13      15
1       790     +       CG      9       9
1       791     -       CG      11      12

Within the output directory, I get HOME_Timeseries_DMRs (empty directory) and temp_HOME that contains

drwxrwx--- 2 a.ramesh users 131 Nov 29 12:36 sample9783_rep1
drwxrwx--- 2 a.ramesh users 131 Nov 29 12:36 sample9794_rep1
drwxrwx--- 2 a.ramesh users 131 Nov 29 12:36 sample9808_rep1
drwxrwx--- 2 a.ramesh users 131 Nov 29 12:36 sample9809_rep1
drwxrwx--- 2 a.ramesh users 131 Nov 29 12:36 sample9810_rep1
drwxrwx--- 2 a.ramesh users 131 Nov 29 12:36 sample9811_rep1
drwxrwx--- 2 a.ramesh users 131 Nov 29 12:36 sample9812_rep1
drwxrwx--- 2 a.ramesh users 131 Nov 29 12:36 sample9813_rep1
drwxrwx--- 2 a.ramesh users 131 Nov 29 12:37 sample9814_rep1
drwxrwx--- 2 a.ramesh users 131 Nov 29 12:37 sample9816_rep1
drwxrwx--- 2 a.ramesh users  10 Nov 29 12:37 chunks
-rw-rw---- 1 a.ramesh users 24M Nov 29 12:37 sample9783_format_1.txt
-rw-rw---- 1 a.ramesh users 24M Nov 29 12:37 sample9794_format_1.txt
-rw-rw---- 1 a.ramesh users 25M Nov 29 12:37 sample9808_format_1.txt
-rw-rw---- 1 a.ramesh users 24M Nov 29 12:37 sample9809_format_1.txt
-rw-rw---- 1 a.ramesh users 24M Nov 29 12:37 sample9810_format_1.txt
-rw-rw---- 1 a.ramesh users 24M Nov 29 12:37 sample9811_format_1.txt
-rw-rw---- 1 a.ramesh users 25M Nov 29 12:38 sample9812_format_1.txt
-rw-rw---- 1 a.ramesh users 24M Nov 29 12:38 sample9813_format_1.txt
-rw-rw---- 1 a.ramesh users 24M Nov 29 12:38 sample9814_format_1.txt
-rw-rw---- 1 a.ramesh users 24M Nov 29 12:38 sample9816_format_1.txt

Do you have any insight into what is going on? I get the same issue without the extra --delta 0.2 --minc 5.

I also get a different error, related to pooling resources, when using -npp >1, so I used just 1.

Thanks!
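
The six-column layout shown in the example data above (chromosome, position, strand, context, methylated reads, total reads) can be sanity-checked before a run. The following is a minimal sketch and not part of HOME; the function name `check_line` and the specific checks are assumptions about what a well-formed row looks like:

```python
def check_line(line):
    """Return None if the row looks well formed, else a short problem description."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        return "expected 6 tab-separated fields, found %d" % len(fields)
    chrom, pos, strand, context, mc, total = fields
    if strand not in ("+", "-"):
        return "strand must be + or -"
    if context not in ("CG", "CHG", "CHH"):
        return "context must be CG, CHG or CHH"
    try:
        pos, mc, total = int(pos), int(mc), int(total)
    except ValueError:
        return "position and read counts must be integers"
    if mc < 0 or mc > total:
        return "methylated reads must be between 0 and total reads"
    return None
```

Running it over the first few hundred lines of each sample file and printing any non-None result is usually enough to catch space-separated rows or swapped count columns.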

Error when finding DMRs with scaffolded genome

Hello,

I am encountering the following error when attempting to find DMRs with a scaffolded genome:

File ".local/bin/HOME-pairwise", line 4, in
import('pkg_resources').run_script('HOME==0.6', 'HOME-pairwise')
File "build/bdist.linux-x86_64/egg/pkg_resources/init.py", line 748, in run_script
File "build/bdist.linux-x86_64/egg/pkg_resources/init.py", line 1524, in run_script
File "/.local/lib/python2.7/site-packages/HOME-0.6-py2.7.egg/EGG-INFO/scripts/HOME-pairwise", line 466, in

File "/usr/local/apps/eb/Python/2.7.14-foss-2016b/lib/python2.7/multiprocessing/pool.py", line 572, in get
raise self._value
Exception: cannot convert the series to <type 'int'>

Any help resolving the issue would be most appreciated!

training data CG and non-CG for arabidopsis

Hi,
I was wondering whether we could use the training data available on HOME GitHub for arabidopsis methylome as well.

Cheers,
Ranj

HOME-pairwise script is missing

Hi,

setup.py in v0.8 references a HOME-pairwise file that does not exist. See https://github.com/ListerLab/HOME/blob/0.8/setup.py

Also the versions are inconsistent:
The Releases page shows 'Release 0.5', which is tagged as 0.8 in git; however, setup.py in that tag shows 0.4.
This makes it very hard to provide installations on shared computer systems.

Best,
Erich

cannot open file './scripts/HOME_R.R': No such file or directory

It seems that if HOME-pairwise isn't run from the HOME installation directory, it cannot find HOME_R.R and in the end stops with
Exception: 'DataFrame' object has no attribute 'p_value'

Issue #20: HOME run doesn’t end

Hi Akanksha,

Just to be sure, is HOME compatible with single-replicate time series (or pairwise) analysis? I have HOME (>0.7) working well for both time series and pairwise analyses when I have multiple replicates, but the job never finishes for single replicates, as Nuria mentioned (with or without the -sin parameter). Do you think this has something to do with single-replicate experiments? I also see the double scalar error in these cases. We use the same HPC.

Cheers,
Ranj

HOME unable to open X server

I have installed HOME after creating an environment for python2.7 as described in the github installation instructions. Upon downloading the testdata (or trying to use my own), I get this error log while running HOME from the testdata directory:

HOME-pairwise -t CG -i ./sample_file_CG.txt -o ./CG -npp 24
/home/pgontarz/HOME_env/local/bin/HOME-pairwise: line 6: $'\nCreated on Mon Apr 16 12:15:36 2018\n\n@author: akanksha\n': command not found
import: unable to open X server ' @ error/import.c/ImportImageCommand/364. import: unable to open X server ' @ error/import.c/ImportImageCommand/364.
import: unable to open X server ' @ error/import.c/ImportImageCommand/364. from: can't read /var/mail/collections import: unable to open X server ' @ error/import.c/ImportImageCommand/364.
import: unable to open X server ' @ error/import.c/ImportImageCommand/364. import: unable to open X server ' @ error/import.c/ImportImageCommand/364.
import: unable to open X server ' @ error/import.c/ImportImageCommand/364. from: can't read /var/mail/collections import: unable to open X server ' @ error/import.c/ImportImageCommand/364.
import: unable to open X server ' @ error/import.c/ImportImageCommand/364. import: unable to open X server ' @ error/import.c/ImportImageCommand/364.
import: unable to open X server ' @ error/import.c/ImportImageCommand/364. from: can't read /var/mail/HOME from: can't read /var/mail/os.path /home/pgontarz/HOME_env/local/bin/HOME-pairwise: line 25: syntax error near unexpected token ('
/home/pgontarz/HOME_env/local/bin/HOME-pairwise: line 25: `def remEmptyDir(mypath):'

I am wondering if there is some issue with my installation of HOME? Building the virtualenv ran with no errors reported, as did source <HOME_env>/bin/activate.

pgontarz@regmedhpc2:$ virtualenv -p /usr/bin/python2.7 HOME_env
Running virtualenv with interpreter /usr/bin/python2.7
New python executable in /home/pgontarz/HOME_env/bin/python2.7
Also creating executable in /home/pgontarz/HOME_env/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
pgontarz@regmedhpc2:$ source HOME_env/bin/activate
(HOME_env) pgontarz@regmedhpc2:$ which python
/home/pgontarz/HOME_env/bin/python
(HOME_env) pgontarz@regmedhpc2:$ which pip
/home/pgontarz/HOME_env/bin/pip
(HOME_env) pgontarz@regmedhpc2:$ pip -V
pip 20.2.3 from /home/pgontarz/HOME_env/local/lib/python2.7/site-packages/pip (python 2.7)
(HOME_env) pgontarz@regmedhpc2:$ pip install git+https://github.com/ListerLab/HOME.git
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
Collecting git+https://github.com/ListerLab/HOME.git
Cloning https://github.com/ListerLab/HOME.git to /tmp/pip-req-build-ljgoAz
Collecting numpy
Using cached numpy-1.16.6-cp27-cp27mu-manylinux1_x86_64.whl (17.0 MB)
Processing ./.cache/pip/wheels/e8/2f/e8/dcb43af3774d04ee19d01758335fc91a72c5074f8437a76dad/pandas-0.17.1-cp27-cp27mu-linux_x86_64.whl
Collecting scipy==0.16.0
Using cached scipy-0.16.0-cp27-cp27mu-manylinux1_x86_64.whl (38.2 MB)
Collecting scikit-learn==0.16.1
Using cached scikit-learn-0.16.1.tar.gz (7.3 MB)
Processing ./.cache/pip/wheels/70/29/19/37b211c5de170f745ec0875af6b3e7cf293a4e70b0491aa455/statsmodels-0.6.1-cp27-cp27mu-linux_x86_64.whl
Collecting pytz>=2011k
Using cached pytz-2020.1-py2.py3-none-any.whl (510 kB)
Collecting python-dateutil
Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting patsy
Using cached patsy-0.5.1-py2.py3-none-any.whl (231 kB)
Collecting six>=1.5
Using cached six-1.15.0-py2.py3-none-any.whl (10 kB)
Building wheels for collected packages: HOME, scikit-learn
Building wheel for HOME (setup.py) ... done
Created wheel for HOME: filename=HOME-1.0.0-py2-none-any.whl size=20971 sha256=3766aa49664dfcd6f6ee969c5500e84be7e2d9906188f1828a10a0bf54d7cd06
Stored in directory: /tmp/pip-ephem-wheel-cache-xvBrvW/wheels/33/77/39/0699ff79d381c958b6df0835f85b05d941395fe1bf6da305cd
Building wheel for scikit-learn (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /home/pgontarz/HOME_env/bin/python2.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-q3xJqc/scikit-learn/setup.py'"'"'; file='"'"'/tmp/pip-install-q3xJqc/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-lpsMHo
cwd: /tmp/pip-install-q3xJqc/scikit-learn/
Complete output (8 lines):
Partial import of sklearn during the build process.
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-install-q3xJqc/scikit-learn/setup.py", line 173, in
setup_package()
File "/tmp/pip-install-q3xJqc/scikit-learn/setup.py", line 165, in setup_package
from numpy.distutils.core import setup
ImportError: No module named numpy.distutils.core

ERROR: Failed building wheel for scikit-learn
Running setup.py clean for scikit-learn
Successfully built HOME
Failed to build scikit-learn
Installing collected packages: numpy, pytz, six, python-dateutil, pandas, scipy, scikit-learn, patsy, statsmodels, HOME
Running setup.py install for scikit-learn ... done
DEPRECATION: scikit-learn was installed using the legacy 'setup.py install' method, because a wheel could not be built for it. pip 21.0 will remove support for this functionality. A possible replacement is to fix the wheel build issue reported above. You can find discussion regarding this at pypa/pip#8368.
Successfully installed HOME-1.0.0 numpy-1.16.6 pandas-0.17.1 patsy-0.5.1 python-dateutil-2.8.1 pytz-2020.1 scikit-learn-0.16.1 scipy-0.16.0 six-1.15.0 statsmodels-0.6.1
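
Messages such as `import: unable to open X server` and `from: can't read /var/mail/collections` are typical of a Python script being interpreted by the shell, where `import` and `from` resolve to unrelated system commands. That usually happens when the installed script has no `#!` interpreter line. A minimal sketch for checking this, assuming the script path is the one printed in the log (the helper `has_python_shebang` is hypothetical):

```python
def has_python_shebang(path):
    """Return True if the first line of `path` is a #! line that mentions python."""
    with open(path, "rb") as handle:
        first = handle.readline()
    return first.startswith(b"#!") and b"python" in first
```

If it returns False for `/home/pgontarz/HOME_env/local/bin/HOME-pairwise`, invoking the script through the interpreter explicitly (`python /home/pgontarz/HOME_env/local/bin/HOME-pairwise -t CG -i ./sample_file_CG.txt -o ./CG -npp 24`) is one workaround to try.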

sample path

Hi,
I found that the sample paths in the sample.tsv input have to be relative to the folders that HOME-pairwise creates, rather than relative to where the sample.tsv file is.
For instance if I call

HOME-pairwise -t CG -i ./wt_vs_mut.txt -o ./wt_vs_mut/

then the command creates the output directory wt_vs_mut/HOME_pairwise_DMRs/wt_VS_mut/
and the input file wt_vs_mut.txt has to be something like:
wt ../../../methylation_data_wt1.txt
mut ../../../methylation_data_mut1.txt

It would be more intuitive if the sample path had to be relative to either the sample.tsv file, or relative to where the command is called.
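
In the meantime, one workaround is to write the sample file with absolute paths so that it no longer matters which directory HOME resolves them from. A minimal sketch, assuming a tab-separated sample file whose first column is the sample name and whose remaining columns are paths (the function name `absolutize_sample_file` is hypothetical, and relative paths are resolved against the directory of the sample file itself):

```python
import os


def absolutize_sample_file(src, dst):
    """Copy sample file `src` to `dst`, making every path column absolute."""
    base = os.path.dirname(os.path.abspath(src))
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fields = line.rstrip("\n").split("\t")
            if not fields[0]:
                continue  # skip blank lines
            name, paths = fields[0], fields[1:]
            resolved = [
                # join() keeps paths that are already absolute unchanged
                os.path.normpath(os.path.join(base, os.path.expanduser(p)))
                for p in paths
            ]
            fout.write("\t".join([name] + resolved) + "\n")
```

For the example above, `absolutize_sample_file("wt_vs_mut.txt", "wt_vs_mut.abs.txt")` would produce a file that can be passed to `-i` in place of the original.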

Index out of range error when using HOME-pairwise

Hello,

I'm having some issues getting HOME to run.

$ HOME-pairwise -t CG -i  ../input.tsv  -o ../out -sin -npp 20
Traceback (most recent call last):
  File "/wehisan/home/allstaff/g/gigante.s/soft/lister_home/bin/HOME-pairwise", line 4, in <module>
    __import__('pkg_resources').run_script('HOME==0.4', 'HOME-pairwise')
  File "/usr/local/bioinfsoftware/python/python-2.7.11/lib/python2.7/site-packages/pkg_resources/__init__.py", line 742, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/bioinfsoftware/python/python-2.7.11/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1510, in run_script
    exec(script_code, namespace, namespace)
  File "/wehisan/home/allstaff/g/gigante.s/soft/lister_home/lib/python2.7/site-packages/HOME-0.4-py2.7.egg/EGG-INFO/scripts/HOME-pairwise", line 395, in <module>

IndexError: list index out of range

My input file is as follows:

$ less ../input.tsv
alt ~/path/to/data/alt.summary.home.tsv
ref ~/path/to/data/ref.summary.home.tsv

and my data files look like this

$ head ~/path/to/data/alt.summary.home.tsv
chr5    149642973       +       CG      34      42
chr9    121883727       +       CG      3       5
chr9    24213925        +       CG      1       3
chr2    180792870       +       CG      5       8
chr1    7779688 +       CG      1       8
chr9    120952088       +       CG      1       2
chr6    97035598        +       CG      2       4
chr3    127312598       +       CG      10      14
chr3    42041023        +       CG      4       7
chr7    74422202        +       CG      5       8

The problem is the same without -sin and/or -npp. Am I doing something wrong?

Thanks!
Scott
