mennodejong1986 / sambar

SambaR: Snp datA Management and Basic Analyses in R

License: MIT License

TeX 46.43% C++ 53.57%

sambar's People

idiltac altingia duydn roseannagg lyl8086 gihandjay bmillerlab

sambar's Issues

calckinship() and pairwise_relatedness being recalculated

Hi, this isn't a big issue, but it does lead to a lot of time being "wasted" by the users of your excellent software!

When I run calckinship(), this is always printed during the analysis:
"Pairwise_relatedness.txt file is not present or does not have the expected number of rows, or flag do_overwrite is set to TRUE. Creating datafile now.
This will take some time, but the next time you run the calckin() function this step will be automatically omitted and then it will be much faster."

I thought that do_overwrite must be set to FALSE by default, based on the way this statement reads, but apparently it is not. I should have realized that sooner, but it took me several runs of this function to finally understand that I need to change the default to stop rerunning this time-consuming step. Part of the problem is that I actually read your manual (very helpful), and there is no "do_overwrite" argument listed in the calckinship() section.

Just thought it might be helpful to others to be aware of this issue.
Best,

PCAdapt do_pcadapt = FALSE not working in getpackages

Dear Menno de Jong,

First of all thanks for this amazing package and for a really great tutorial with human-like messages in the console!

I am having some issues with the error related to the PCAdapt library and just noticed that a new version (4.3.0) has been available since last month. As I was able to install it myself, I tried to modify the getpackages code to override this problem. I must be doing it wrong, as the getpackages function keeps erasing my newest version... Do you think it would be possible to update this?

Thanks in advance!

Maeva

LEAstructure error

Hi Mennodejong,
I am repeatedly getting the following error while running the findstructure command:
Error in LEAstructureplot(mindemes = Kmin, maxdemes = Kmax, export = "eps", :
argument 5 matches multiple formal arguments
Can you suggest how to overcome this?
Also, while loading SNP data for the selection analyses with more than 400,000 SNPs, the process gets killed (with 10 GB RAM and 4 CPU cores).
Thanks in advance.

Columns of genlight obj don't correspond to snps$genlightname column

Hi there,

I'm not sure why I'm getting this error, and the error says to contact a developer of SambaR.

ERROR: column names of genlight object do not correspond with snps$genlightname column. Contact developer of SambaR.

My data (.RAW + .BIM) and pop file are attached. Please let me know if you need more information.

The VCF came from iPyRAD. I filtered it for depth and then removed a few low-quality individuals. I converted the resulting VCF to PED using vcftools, and then to RAW/BIM using Plink2 as instructed. One irregularity I noticed was that the starting VCF (also attached) kept 6138 sites after filtering, but only 5954 sites passed filters and QC when converting to PED. Not sure if that helps in the detective work.

Please let me know if you need more information.

Thanks for your help,

Alex
filtered-vcf-sites-and-inds.bim.gz
filtered-vcf-sites-and-inds.raw.gz
popfile-filtered-vcf.txt.gz
mate-inds-sites-filtered.recode.vcf.gz
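One quick consistency check for a RAW/BIM pair like the one described above is offered here as a sketch, not as the confirmed cause of this error. A PLINK .raw file written with --recode A has six leading columns (FID, IID, PAT, MAT, SEX, PHENOTYPE), followed by one column per SNP. The .bim file has one line per SNP, so the two counts should match. The helper names count_raw_variants and count_bim_variants are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for comparing SNP counts in a PLINK RAW/BIM pair.

# Number of genotype columns in a .raw file written with --recode A:
# all fields on the header line except the six leading sample columns.
count_raw_variants() {
  head -n 1 "$1" | awk '{ print NF - 6 }'
}

# Number of variants in a .bim file: one line per SNP.
count_bim_variants() {
  wc -l < "$1" | tr -d ' '
}
```

For example, after unzipping the attachments, count_raw_variants filtered-vcf-sites-and-inds.raw and count_bim_variants filtered-vcf-sites-and-inds.bim should print the same number. A mismatch would suggest the two files are out of sync. If the .raw file was written with --recode AD instead, it holds two columns per SNP, so the first count would be doubled.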

ERROR: Value of indmiss argument should be between 0 and 1.

When setting indmiss=1 and/or snpmiss=0, filterdata fails with the following error (for example for indmiss):

ERROR: Value of indmiss argument should be between 0 and 1.

I have a situation where I do not want to drop any of my individuals, so I need to set indmiss to 1. This error is due to the following code in SAMBAR_v1.0.4:

	if(indmiss>=1|indmiss<=0)
		{
		return(cat("ERROR: Value of indmiss argument should be between 0 and 1.",sep="\n"))
		}
	if(snpmiss>=1|snpmiss<=0)
		{
		return(cat("ERROR: Value of snpmiss argument should be between 0 and 1.",sep="\n"))
		}

If modified as follows (i.e. the greater/less-than-or-equal comparisons are changed to strict greater/less than):

	if(indmiss>1|indmiss<0)
		{
		return(cat("ERROR: Value of indmiss argument should be between 0 and 1.",sep="\n"))
		}
	if(snpmiss>1|snpmiss<0)
		{
		return(cat("ERROR: Value of snpmiss argument should be between 0 and 1.",sep="\n"))
		}

Then it runs fine. Not sure if there is a reason why it should actually be greater/less than or equal to in this function, so I thought I'd flag it as a potential issue. Thanks for this neat package!

Possible new option?

Hello, I have been using SambaR for the last year and have found it very useful. So first, thanks for this software.

I have been wondering whether it would be hard to make it possible to convert the filtered data produced by SambaR back into a genlight object, essentially a sambar2genlight option. The reason I ask is that I've tried to use other software to reproduce the subset of SNPs I have after running the filterdata() function in SambaR, and I can't replicate the subset. I have tried the exportsambarfiles() option, but I can't correctly get the output back into a genlight or genind format, which is what I actually want.

Thank you again.

Issue with z scores for d statistics

Hello,

I receive the following error when I run the D-statistic analysis in the calcdistance function:

WARNING: Input dataframe does not contain a column named 'sign' for significance. Assigning all data points same pch symbol.
Error in round(mydf$D_Z, 1) :
non-numeric argument to mathematical function
In addition: Warning message:
In fttDescision(ftt, "SIGNIFICANT", TripletCombinations, dna) :

Any help would be appreciated.

genlight to sambaR

Hi,

First of all, thanks for the wonderful package! However, I have an issue with converting a genlight object to a SambaR object. This is what I do:

library(vcfR)  # provides read.vcfR() and vcfR2genlight()
testvcfr <- read.vcfR("path/to/vcf/filtered.vcf", verbose = FALSE)
genl <- vcfR2genlight(testvcfr)
genlight2sambar(genlight_object = "genl" , do_confirm=TRUE)

I get the following message:

Creating inds object...
Creating snps object...

WARNING: only 1 allele type specified to the minor and/or major flag.
SambaR will assume that all SNPs have the same type of minor or major allele (by default A for major and T for minor).
This does not affect subsequent analyses, except for calculation of transition-transversion ratios and GC-content.
Still, if you have the information on allele types (e.g. 1,2,3,4, or alternatively A,C,G,T, why not provide these vectors to the minor and major flags?

ERROR: unexpected allele found in bim file. Alleles should be either 1,2,3,4 or A,C,T,G (character). You can observe the alleles by typing 'myalleles'.

What could be the reason for this behavior?

Many thanks in advance,
Pablo

Error with calcdiversity output

Hello, I've been running SambaR using

source("https://github.com/mennodejong1986/SambaR/raw/master/SAMBAR_v1.08.txt")

I am copying the mypackageslist.txt output file as an attached file, though I don't see any problems there that seem to have anything to do with the error I'm getting.

My dataset imports without any issues and I can run the filterdata and findstructure commands without problems:

importdata(inputprefix="all", do_citations=FALSE, sumstatsfile=FALSE , depthfile=FALSE , samplefile="popmap.txt", colourvector=colvector1, pop_order=poporder)

filterdata(indmiss=0.9,snpmiss=0.1,min_mac=2,dohefilter=TRUE, maxprop_hefilter=0.5, min_spacing=500,nchroms=NULL,TsTvfilter=NULL, F_correct_maf=FALSE)
findstructure(Kmax=10,add_legend=TRUE,legend_pos="right",legend_cex=1, quickrun=TRUE, doBAPT=FALSE)

Problems begin with the calcdiversity command:

calcdiversity(nrsites=NULL,legend_cex=0.5, dohwe=FALSE)
Because the flag do_SFS is set to FALSE (default), SambaR will not generate SFS-vectors.
REMINDER: if you receive an error stating 'cannot open file' (without the addition 'No such file or directory') this is likely because the file is opened in another file viewer.
If so, close the relevant file in the file viewer and try again.
Hardy Weinberg calculations...
If you receive an error after this line, set the flag dohwe to FALSE.
Plotting multilocus heterozygosity...
Column 'MLH' not present in inds dataframe. Setting it equal to inds$hetero_all.
Error in boxplot.default(split(mf[[response]], mf[-response], drop = drop, :
invalid first argument
Called from: boxplot.default(split(mf[[response]], mf[-response], drop = drop,
sep = sep, lex.order = lex.order), xlab = xlab, ylab = ylab,
add = add, ann = ann, horizontal = horizontal, ...)
Browse[1]>

When I look at the output from Browse[1]> it highlights this line:

if (0L == (n <- length(groups)))
stop("invalid first argument")

I am not sure what I need to do to fix this problem, and if I try to continue running the additional SambaR commands, I keep getting errors (though parts of the analysis complete). Is there a problem that I can fix?

Thank you for any help you can offer.

mypackageslist.txt

SambaR error: Structure command will not perform Mantel tests. Error on line #11665

Hello,
I am using SambaR to analyze three different datasets that correspond to three different marine invertebrate species. Each of my populations typically has only one geographic location (I do not have exact longitude and latitude coordinates for each individual sampled). When I run the command "findstructure(Kmax=6,add_legend=TRUE,legend_pos="bottomright",legend_cex=2,pop_order=NULL)" on one of my datasets, I run into this error:

Creating geographical maps with admixture piecharts using LEA output...
No clear agreement between geographic locations of predefined populations and those of the inferred clusters.
Defaulting to random colour assignment.
Only 1 geographical position found for DC population. Skipping.
Only 1 geographical position found for DUR population. Skipping.
Only 1 geographical position found for MAK population. Skipping.
Only 1 geographical position found for NAIN population. Skipping.
Only 1 geographical position found for PAC population. Skipping.
Only 1 geographical position found for DC population. Skipping.
Only 1 geographical position found for DUR population. Skipping.
Only 1 geographical position found for MAK population. Skipping.
Only 1 geographical position found for NAIN population. Skipping.
Only 1 geographical position found for PAC population. Skipping. (REPEATED)
Creating mantel plot...

Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'ncol': object 'combitable' not found

I suspected this may be due to not having individual longitude and latitude coordinates, but I don't get this error on my other two datasets (they run into a different error later on). I'm not sure how to fix the error, since SambaR seems not to create the combitable for this dataset, and I'd really like to have the results of a Mantel test for it. I should also note that when running the filterdata command, I received this error:
Error in stripchart.default(split(mf[[response]], mf[-response]), dlab = dlab, :
invalid first argument

I am not sure if this would have an impact on analyses further down the road.

If you have any insight, it's much appreciated! Thanks for such a wonderful package. It's been very useful so far for exploring my data.

error in data frame

Hi mennodejong,
While running SambaR, we are getting the following error.
Kindly suggest how to get rid of it.
Thanks in advance.
Error in $<-.data.frame(*tmp*, "genlightname", value = c("834340245:13:+_A", :
replacement has 337078 rows, data has 168540

Error in findstructure()

Hello, I've been getting this error after installing the new version of SambaR and am unsure how to fix it. I had been running the previous version (1.05) and never had this error with the same dataset. Is there a way to get the DAPC figures with the group ellipses?

Running DAPC analyses for various number of expected clusters (K).
Observe the screeplot in the 'dapc.summarystatistics'-plot to find the optimal K value (lowest point or flattening of decline).
K = 2
Number of retained PC's is below 4. Skipping dapc plot's for K=4 and higher.
Number of retained PC's is below 4. Skipping dapc plot's for K=4 and higher.
K = 3
Number of retained PC's is below 4. Skipping dapc plot's for K=4 and higher.
Number of retained PC's is below 4. Skipping dapc plot's for K=4 and higher.
K = 4
Number of retained PC's is below 4. Skipping dapc plot's for K=4 and higher.
Number of retained PC's is below 4. Skipping dapc plot's for K=4 and higher.
K = 5
Number of retained PC's is below 4. Skipping dapc plot's for K=4 and higher.
Number of retained PC's is below 4. Skipping dapc plot's for K=4 and higher.
K = 6
Number of retained PC's is below 4. Skipping dapc plot's for K=4 and higher.
Number of retained PC's is below 4. Skipping dapc plot's for K=4 and higher.
Not plotting correspondence between expected and inferred clusters because the flag 'plotheatmap' is set to FALSE.
Calculating overlap between inferred and expected clusters...
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
In addition: Warning messages:
1: In [<-.factor(*tmp*, inds$pop == my_pop, value = 1L) :
invalid factor level, NA generated
2: In [<-.factor(*tmp*, inds$pop == my_pop, value = 2L) :
invalid factor level, NA generated
3: In [<-.factor(*tmp*, inds$pop == my_pop, value = 3L) :
invalid factor level, NA generated
4: In [<-.factor(*tmp*, inds$pop == my_pop, value = 4L) :
invalid factor level, NA generated
5: In [<-.factor(*tmp*, inds$pop == my_pop, value = 5L) :
invalid factor level, NA generated

Can't figure out how to use latitude/longitude because I need to use genlight format to import my data

Hello,
First thanks for all the help so far (with the calcMLH and other problems).

Second, I have a VCF dataset that absolutely won't go into SambaR except through the genlight option (I have checked with my service provider, and I think the problem is with the format of the VCF files output by BBTools). I can get the data into SambaR using the genlight2sambar option, but I can't figure out any way to add data like the "geofile" I previously supplied through the normal importdata() command. I really need to consider latitude/longitude with this data, so is there any way that would be possible?

Thank you!

mergepop() issue

Hi Menno,

First of all, thank you so much for this very useful and time saving package. I'm using SambaR to work with ddRADSeq short reads that are aligned to a reference genome. I managed to import the short read data as instructed in the manual.

However, when filtering the data using the command:
filterdata(indmiss=0.7,snpmiss=0.05,min_mac=2,dohefilter=TRUE,snpdepthfilter=TRUE,min_spacing=500,nchroms=NULL,silent=TRUE)

I have had several populations with no individuals retained after filtering. I figured the filters were applied per population instead of to the reads as a whole, which may be stricter, as I had few individuals per population. SambaR didn't accept populations with no individuals. This was an issue for me because I wish to analyse population structure with as many populations as possible. One of the options SambaR mentioned was to merge populations with no retained individuals into another population. I decided it may be better to merge all populations into one, so that the filters apply to the short reads as a whole instead of by population. Afterwards, I thought I could export PED/MAP files and manually add the population names before going on to the analyses.

I used the mergepop() command to merge all my populations into one population successfully. In the previous version (v1.06), I was able to merge all populations into one and then filter the data using the filterdata() command as above. After this, I was also able (using v1.06) to export PED/MAP files of the filtered data using exportsambarfiles().

However, in v1.07 I noticed that after using mergepop() to merge all populations into one, functions such as exportsambarfiles() and findstructure() later return an error that I had not seen in the previous version.

The error messages I got were as follows:

exportsambarfiles()
Currently only 1 population defined. Not exporting Bayesass input ('Bayesassinput.immanc.txt').
Creating input for Treemix...
Error in colnames<-(*tmp*, value = popnames) :
attempt to set 'colnames' on an object with less than two dimensions
findstructure(Kmax=6,add_legend=TRUE,legend_pos ="bottomright",legend_cex=3,symbol_size=3)
Redefining order of populations as specified by pop_order flag.
Expected population names:
1
ERROR: vector input to pop_order argument is not the same length as vector input to popnames argument.

Perhaps, when SambaR merges populations, there is an issue with updating the vector length that didn't occur in the previous version. Is there any way I can apply the filterdata() function to the short reads as a whole instead of by population, to avoid the need to use mergepop()?

Sorry for the lengthy explanation. Hope this somewhat makes sense of the issue and thank you for your time.

Rico :)

error with output for dapc_ascore in file dapc.K.#.#.scatter.dapc.WITH_prior_popinfo.axis1vs2.dc_score.txt

Hello,
Hopefully I can explain this correctly by submitting some sample files to show what I mean. I've been trying to use SambaR to best advantage to explore structure in my very large SNP dataset: many samples (>300) and many SNPs scored originally, but SambaR has "identified" the most highly significant as ~1200 SNPs (setting indmiss and snpmiss as best I can based on your 2021 paper), so the computations really aren't too bad. I kept thinking I was not understanding the ascore output correctly when I looked at the dc_score file, but today I reran my analysis after removing a few populations that have "dirtier" data, and I finally had an AHA moment while comparing the dc_score.txt files of WITH_prior_popinfo and WITHOUT_prior_popinfo across the different population datasets.

The WITHOUT_prior_popinfo files appear to have totally normal output, where every combination of K and optimum number of PCs has a dc_score that varies as expected (from 0.0 up to >1.0, as clustering becomes more or less meaningful in the DAPC).

But the WITH_prior_popinfo files have exactly the same dc_score in every file, regardless of K and the optimum number of PCs. So there is no real way to decide whether a given K is getting a better (closer to 0) or worse (>1) dc_score to evaluate the DAPC output. Worse for me, I can't meaningfully tell whether I should keep using WITH_prior_popinfo, i.e. whether the dc_score supports a better DAPC analysis when I do. So far, the single WITH_prior_popinfo score that I get repeatedly appears to support using the a priori population information, since it is closer to 0 (about 0.5 for the weighted_meandc, so not really great), but it is the same for every K tested. The WITHOUT_prior_popinfo weighted_meandc is higher, ranging from ~0.8 to over 1.2 depending on the K being tested. But I only have one WITH_prior_popinfo weighted_meandc to compare, and I don't even know if it was calculated correctly for K=2 (I'm guessing the first score is calculated and then K does not change in each additional dc_score calculation, but that is just a guess).

I am finding the output from SambaR very helpful in exploring my hypothesis for my populations for structure overall, but this problem is making the DAPC part confusing. Would it be possible to find out how to fix this problem? Please let me know if I need to give you additional information to get help. Thank you!!
I am adding 5 WITH_prior_popinfo.txt files to show the problem of them being the same. I could add more, but they are all the same ....

dapc.K.2.15.scatter.dapc.WITH_prior_popinfo.axis1vs2.dc_score.txt
dapc.K.3.15.scatter.dapc.WITH_prior_popinfo.axis1vs2.dc_score.txt
dapc.K.4.15.scatter.dapc.WITH_prior_popinfo.axis1vs2.dc_score.txt
dapc.K.5.15.scatter.dapc.WITH_prior_popinfo.axis1vs2.dc_score.txt
dapc.K.6.15.scatter.dapc.WITH_prior_popinfo.axis1vs2.dc_score.txt

Data_quality folder for genlight object

Hi,

Thanks for the package! When following the manual, I found that I couldn't produce the "Data_quality" folder to examine my data for filtering. I guess it's because I converted a genlight object directly to SambaR instead of using importdata() to import RAW/BIM files (my data comes directly from DArT and is in 2row format). Would it be possible for genlight objects to get a Data_quality folder as well?

fix needed for SAMBAR_v1.03.txt

Line 4998 should read:

if(any(snps$maf!=snps$minorcount/snps$nonmissallelecount,na.rm=TRUE))

SambaR not recognizing my .raw file

Hello,

I am trying to import my data from a .raw file. I have set my working directory and the prefix correctly, and yet SambaR is looking for a .bim file. This is not included in the manual. Can you provide some guidance? I have attached a photo below.

problems running getpackages

Hello,

I get the following error when loading packages (since today; yesterday I had no problems):

Package poppr already installed.
Loading package poppr...
Warning: namespace ‘raster’ is not available and has been replaced
by .GlobalEnv when processing object ‘’

After this, R stops loading other packages.
Any advice about how to solve this?

Thanks in advance

optim.a.score n.sim arg

Hello,
On line 12050 of v1.06, within the 'adegenet_dapc' function, 'optim.a.score' needs to specify a higher n.sim than the default (n.sim = 10), e.g.
my_ascore <- optim.a.score(dapc.out, n.sim = 100)
otherwise the optimum number of PCs retained for the ascore jumps around with each run, and consequently there is a different ascore plot in each of the dapc folders. I found n.sim = 50 was OK (the ascore varied by only one or two DCs), and n.sim = 100 was better. This does slow down processing a bit.
Cheers,
Steph

using the selectionanalyses function

Hi there. I'm attempting to run the selection analyses function with the command:

selectionanalyses(do_meta=FALSE,do_pairwise=TRUE,export='pdf', do_fsthet=FALSE, do_pcadapt=FALSE, do_outflank=FALSE, dopiechart=FALSE)

I have to turn off a number of flags (as recommended in the comments while running) to get this to work on my machine. I get output for the first pair of populations, a few of the plots, then this error message:

Error in round(snps$HWEchi2[snps$name %in% myloci], myround) :
non-numeric argument to mathematical function

Here's the output up to that point:

Sc_Sg
Creating output folder called Sc_Sg.
Running genome wide differentiation scan (GWDS)...
Using bonferroni method to correct for multiple testing.
The flag 'dothin' is set to FALSE (default). SambaR will infer neutral distribution from unthinned dataset.
Fitting exponential distribution...
Number of SNPs used to infer neutral distribution: 644806 out of 644806.
Testing which loci are outliers using Bonferroni correction...
Correcting GWDS p-values using the bonferroni method.
Found 0 outlier loci.
Creating Manhattan plot showing -ln(Fisher exact test p-values)...
Creating Manhattan plot showing -log(GWDS p-values)...
Skipping PCadapt because dopcadapt flag is set to FALSE.
Skipping piecharts because the flag 'do_piechart' is set to FALSE.
Creating Venn diagram...
No outliers. Skipping Venn diagram.
2D-plots pairwise comparison...
Executed one selection scan (not multiple selection scans). Skipping 2D plots.
ERROR: scores for OutFLANK not present in SNPs dataframe.
Writing outliers to bed files...
BED-files with outliers have been written to the selection subdirectory
Writing outlier info...
Error in round(snps$HWEchi2[snps$name %in% myloci], myround) :
non-numeric argument to mathematical function

Thanks for any help!
Dave

unable to add Bayescan output

Hi! I'm glad that I stumbled onto this all-in-one R package for population genetics analyses. The user manual is very detailed. Way to go! It would be very convenient if eventually all the bugs were fixed. I hope it will gain more users.

I'm currently running the selection analyses and added the Bayescan output to the SambaR Inputfiles directory as
./myWorkDir/SambaR_output/Inputfiles/pheno.bayescanout.fst

When I run,
selectionanalyses(do_meta=F, do_pairwise=T, do_outflank=F, do_pcadapt=T, do_fsthet=T, add_bayescan=T, export_data=T, export='pdf')

It returns,

ERROR: the inputfiles directory does not contain files with the extension 'bayescanout.fst'.

The pheno.bayescanout.fst looks like

prob log10(PO) qval alpha fst
1 0.00080016 -3.09648 0.782708 -0.00073201 0.23449
2 0.00040008 -3.3977 0.86715 0.00021504 0.23462

Is there anything I can do to fix it?

Thanks,
Ming

OutFLANK problems cause selectionanalyses() to fail

Hello,

I just found your troubleshooting comment and am now running the analysis with do_outflank=FALSE, but I think maybe that should be the default due to the following issues.

Just thought you would like to know that something is going wrong with the OutFLANK GitHub repository. I tried multiple ways to install the package, but none of them work. Here is the most informative error message:

devtools::install_github("whitlock/OutFLANK")
Downloading GitHub repo whitlock/OutFLANK@HEAD
Skipping 1 packages not available: qvalue
Error: Failed to install 'OutFLANK' from GitHub:
Multiple results for CFLAGS found, something is wrong.FALSE

I do have the package "qvalue" installed.

It does not seem like anyone is maintaining the repository, as issues listed there have gone unanswered for at least a couple of years.

ERROR: duplicate sample names found in input data (PED file)

Hi Menno,

I am having trouble loading my input file, which I generated from VCF via bed/map to raw/bim. Here is the error:

Reading PLINK raw format into a genlight object...

Reading loci information...

Reading and converting genotypes...
.
Building final object...

...done.

Creating snps dataframe...
WARNING: input raw/ped file contains duplicated SNP names. Adding numbers to make them unique, in order to avoid errors downstream.
Removing ':+' and ':-' from SNP names...
snps$minor not factor class.
snps$major not factor class.
Creating inds dataframe...
ERROR: duplicate sample names found in input data (PED file).
A list of the duplicated sample name(s) is saved in the vector 'myduplicates'.
Makes changes to filenames in the PED file (second column), convert to raw, and afterwards try running the importdata() function again.

myduplicates
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "1" "2" "3" "4" "5" "6" "7" "8" "9"
[20] "10" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "18" "19" "20" "1"
[39] "2" "3" "4" "5" "6" "7" "8" "9" "10" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21"
[58] "22" "23" "24" "25" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "1" "2" "3"
[77] "4" "5" "6" "7" "8" "9" "10" "11" "12" "005" "006" "008" "009" "010" "011" "012" "013" "014" "016"
[96] "001" "002" "003" "004" "005" "006" "014" "015" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11"
[115] "12" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "1" "2" "3" "4" "5" "6"
[134] "7" "8" "9" "10" "11" "12" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "002"
[153] "003" "004" "005" "006" "007" "008" "009" "010" "011" "012" "013" "014" "015" "016" "017"

I am not sure how to deal with this, since I do have the same sample numbers in different populations (e.g. XX_01, YY_01). I generated the files exactly as you suggested, except that I added --allow-extra-chr when making the PED file. Thank you very much, and I look forward to your suggestions!

Best regards,
Han
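The duplicated IDs above look like the numeric parts of names such as XX_01. This is what you would see if the population prefix ended up in the first PED column (family ID) and only the number in the second column (individual ID). Assuming that is the case, which the thread does not confirm, one way to restore unique names before converting to RAW is to join the two columns. The helper names list_duplicate_ids and make_unique_ids are hypothetical. Whitespace-delimited PED files are assumed. Re-importing the VCF with PLINK's --double-id flag may also keep the full sample names, but that too should be checked against the PLINK documentation.

```shell
#!/bin/sh
# Hypothetical helpers for a PED file whose individual IDs (column 2) are
# only unique within a population, while column 1 holds the population prefix.

# Print each individual ID that occurs more than once.
list_duplicate_ids() {
  awk '{ print $2 }' "$@" | sort | uniq -d
}

# Rewrite column 2 as <column1>_<column2>, e.g. "XX 01 ..." -> "XX XX_01 ...".
# awk re-joins the fields with single spaces, which PLINK accepts.
make_unique_ids() {
  awk '{ $2 = $1 "_" $2; print }' "$@"
}
```

For example, make_unique_ids mydata.ped > mydata_uniq.ped, followed by list_duplicate_ids mydata_uniq.ped, should print nothing before the file is converted to RAW as usual (file names are placeholders). If column 1 already equals column 2, this would produce names like S1_S1, so inspect the first columns before applying it.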

Trying to exclude some populations (which I did before). Error says to send it to you.

Hello,
I have previously excluded populations after importing data with the genlight2sambar() function, and it worked fine. I ran the filterdata() step first and then used the excludepop(do_exclude=c()) option to remove the populations. This is the error I get now:

excludepop(do_exclude=c("SBH","GOL","VH","EUR"))
WARNING: as of 28-10-2021, the flag do_exclude has been replaced by the flag 'popvector'.
The input vector to the popvector flag defines the names of populations to be either excluded or retained, depending on the flag 'retain' (default is FALSE, meaning: exclude).
Renaming do_exclude to popvector and continuing...
Resetting filters...
The flag 'retain' is set to FALSE, meaning that the specified populations will be excluded.
Excluding population: SBH.
Excluding population: GOL.
Excluding population: VH.
Excluding population: EUR.
Updating inds2 dataframe...
ERROR: unexpected number of rows of inds2 dataframe after adding filter1 column. Contact developer of SambaR.

I did try to use popvector(), following the warning message ("WARNING: as of 28-10-2021, the flag do_exclude has been replaced by the flag 'popvector'."), but it returns:
Error in popvector(c("SBH", "GOL", "VH", "EUR")) :
could not find function "popvector"

To work around this, I considered using subset_pop() and approaching the problem from the other direction. I ran a small test that seemed to work, but the output couldn't be run through findstructure():

subset_pop(include_pops=c("PL-A", "PL-Gina"))
Subselecting populations. Note that this function should be executed after the filterdata() function, not before!
Done. Individuals which don't belong to populations defined in the include_pops argument, will be excluded from subsequent analyses.
This smaller dataset contains 41 retained individuals and 42349 polymorphic sites.
Don't run the filterdata() function, because this will partially undo the changes. To undo the changes, rerun the importdata() function.
findstructure(Kmax=6,add_legend=TRUE,legend_pos="right",legend_cex=0.75)
Redefining order of populations as specified by pop_order flag.
Expected population names:
PL-A, PL-Gina
ERROR: vector input to pop_order argument is not the same length as vector input to popnames argument.

I could remove the populations with another tool before loading the data into SambaR. However, based on what the manual says, I think that would mean I can't directly compare the filtered data with and without these populations when running the remaining SambaR functions.

I would appreciate any help you could offer about this problem. Thank you.

Object mypopnind not found in findstructure()

Dear all,

I'm currently discovering SambaR; it seems a great and simple method for population structure analyses!

However, I'm facing an issue with the findstructure() function that I cannot resolve. I succeeded in importing and filtering my data with the importdata() and filterdata() functions, respectively.

I called the findstructure() function as follows:
findstructure(Kmax=6,add_legend=TRUE,legend_pos="right",legend_cex=2,pop_order=NULL, do_subsets=FALSE)

The pipeline runs and produces some plots (Wahlund effect plot, PCoA plot, NJ/UPGMA trees...) until it reaches the DAPC analysis. It performs some calculations for different K values and generates some plots, but it finally fails with an error because an object named "mypopnind" is not found.

Error in findstructure(Kmax = 6, add_legend = TRUE, legend_pos = "right", : object 'mypopnind' not found

Can you please explain the origin of this error and a possible fix?

Thanks a lot !

Asternosis

Return of a problem you solved for someone previously

Hello, I am running the new 1.08 version, but had this problem with 1.07 too. I ran vcftools and then plink to get my data in the right format. Then I got this error:
WARNING: non-defined allele (NA or 0) present in bim-file.
ERROR: encountered unexpected allele in bim file. Alleles should be either 1,2,3,4 or A,C,T,G (character). You can observe the alleles by typing 'myalleles'.

I looked in 'myalleles' and did not see anything odd. I read further in the verbose output (very helpful) and found this:
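Another way to see exactly which SNPs trigger this error is to scan the allele columns (5 and 6) of the .bim file directly. The sketch below is a minimal shell helper, not part of SambaR or PLINK; the function name `check_bim_alleles` is hypothetical, and it assumes a standard whitespace-delimited PLINK 1.x .bim file. It flags any allele code other than 1, 2, 3, 4, A, C, G or T (the values listed in the SambaR error message), which includes PLINK's missing-allele code 0.

```shell
# Hypothetical helper (not part of SambaR or PLINK): print every .bim line
# whose allele columns (5 and 6) contain a code SambaR does not accept.
check_bim_alleles() {
  local bim="$1"
  awk '
    BEGIN {
      # Allowed allele codes, taken from the SambaR error message.
      n = split("1 2 3 4 A C G T", codes, " ")
      for (i = 1; i <= n; i++) ok[codes[i]] = 1
    }
    # Report line number and content when either allele is not allowed
    # (this includes "0", which PLINK uses for a missing allele).
    !($5 in ok) || !($6 in ok) { print NR ": " $0 }
  ' "$bim"
}
```

Usage: `check_bim_alleles yourprefix.bim`. No output means every allele code is one of the accepted values.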

ERROR: encountered unexpected allele in bim file. Alleles should be either 1,2,3,4 or A,C,T,G (character). You can observe the alleles by typing 'myalleles'.

#As mentioned in the SambaR manual, if you convert from vcf to PED/MAP format, the first column of your map file will likely contain zero's only.
#to fix try
#cut -f2 yourfile.map | cut -f1 -d ':' > mycontigs.txt && cut -f2,3,4 yourfile.map > mymap.txt && paste mycontigs.txt mymap.txt > yourfile.map && rm mycontigs.txt mymap.txt
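The `cut`/`paste` one-liner in the message above rebuilds column 1 of the .map file from the SNP IDs in column 2. A minimal awk equivalent is sketched below; it is a hypothetical helper, not SambaR code, and it assumes the IDs in column 2 look like `contig:position` (if they do not, column 1 simply repeats the ID). It keeps a backup and leaves no intermediate files. Column 3 of a PLINK .map file is the genetic distance, which is typically 0 after a vcftools conversion, so zeros in that column (and therefore in `mymap.txt`, which holds columns 2 to 4) are expected; only column 1 is meant to change. PLINK then has to be rerun so the .bim file reflects the new column 1.

```shell
# Hypothetical helper (not part of SambaR): replace column 1 of a PLINK .map
# file with the contig name taken from column 2 ("contig:position").
# Same result as the cut/paste one-liner on tab-delimited input.
fix_map_chrom() {
  local map="$1" tmp status
  cp -- "$map" "$map.bak" || return 1     # keep the original file
  tmp=$(mktemp) || return 1
  awk 'BEGIN { OFS = "\t" }               # write tab-delimited output
       {
         split($2, id, ":")               # id[1] = text before the first ":"
         print id[1], $2, $3, $4          # new col 1, then cols 2-4 unchanged
       }' "$map" > "$tmp"
  status=$?
  # Overwrite in place only if awk succeeded (keeps file permissions).
  [ "$status" -eq 0 ] && cat -- "$tmp" > "$map"
  rm -f -- "$tmp"
  return "$status"
}
```

Usage: `fix_map_chrom yourfile.map`, then `cut -f1 yourfile.map | sort | uniq -c` to confirm column 1 now holds contig names rather than zeros.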

So, I ran the cut command and then reran PLINK. I did not delete the two intermediate files after the first attempt, since the problem persisted after this attempted fix. I then realized I should be using 1.08 and tried again, but I get the same error (though without the information about running "cut" in the output). I would really appreciate any help you can offer.

Oh, I did try using a genlight object to input the data instead, but when I do that I can't get the populations added to "mygenlight" and I really need them.

I am attaching the two files "mycontigs.txt" and "mymap.txt". I can still see the 0s in "mymap.txt".
mycontigs.txt
mymap.txt

SambaR not creating snp and inds dataframe

I have followed the instructions in the manual; however, after importing my data, the only object generated is the mygenlight object. Below I have pasted my exact script. What should I do to get the command to successfully create the other two dataframes? Since it did not work, should I just create a text file that lists my samples and the number of loci (though this will likely differ from the loc.names list)? Please advise!

#POPULATION ANALYSES
#SambaR
#datamm30

#Installing SambaR and setting working directory
source("/Users/taylorburke/Documents/MSc_Thesis/PROGRAMS/SambaR-master/SAMBAR_v1.01.txt")
getpackages(myrepos='http://cran.us.r-project.org',mylib=NULL)
setwd("~/Documents/MSc_Thesis/Data/Data")

#Import Data
importdata(inputprefix='filter6_mm30',geofile="geofile.txt",depthfile = TRUE)

Thank you,
Taylor

Problem creating RAW file

Hello,
I have a problem creating a RAW file with PLINK. I have a VCF.GZ file, which I successfully used to create PED and MAP files.
However, when using PLINK v1.07 on Linux (Fedora 34), I have trouble running the command from the SambaR manual:
plink --file prefix --chr-set 95 --allow-extra-chr --make-bed --recode A --out prefix
There are no such options as --chr-set 95 and --allow-extra-chr; see
http://zzz.bwh.harvard.edu/plink/reference.shtml#options
Moreover, should the option --recode A be --recodeA?
However, no *.raw file is created.
Could you help me with this, please?
Thank you,
Tomas
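As far as I know, `--chr-set`, `--allow-extra-chr` and the two-word `--recode A` are PLINK 1.9 options, while PLINK 1.07 uses `--recodeA`, so the command from the manual most likely needs PLINK 1.9 rather than 1.07. The sketch below is a hypothetical wrapper (the name `make_sambar_input` is not part of SambaR) that runs that same command only when the `plink` on PATH reports version 1.9. It assumes PLINK 1.9 prints a string like `PLINK v1.90b6.21 ...` for `plink --version`; adjust the binary name if your 1.9 install is called something else, e.g. `plink1.9`.

```shell
# Hypothetical wrapper (not part of SambaR): run the PLINK command from the
# SambaR manual, but only if "plink" on PATH reports version 1.9.
# Assumption: PLINK 1.9 prints e.g. "PLINK v1.90b6.21 ..." for --version.
make_sambar_input() {
  local prefix="$1"
  if ! plink --version 2>/dev/null | grep -q 'PLINK v1\.9'; then
    echo "error: this command needs PLINK 1.9 (--chr-set/--allow-extra-chr are not in 1.07)" >&2
    return 1
  fi
  # Reads <prefix>.ped/.map, writes <prefix>.bed/.bim/.fam and <prefix>.raw
  plink --file "$prefix" --chr-set 95 --allow-extra-chr \
        --make-bed --recode A --out "$prefix"
}
```

Usage: `make_sambar_input prefix`, where `prefix` is the shared name of your .ped and .map files.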

object 'NJ' of mode 'function' was not found

I am having an issue with findstructure:

Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'NJ' of mode 'function' was not found
In addition: Warning messages:
1: In bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z$group == :
Outlier (-Inf) in boxplot 2 is not drawn
2: In bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z$group == :
Outlier (-Inf) in boxplot 2 is not drawn
3: In bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z$group == :

When I set do_tree to FALSE, I get an error with nei instead:

Error in if (any(df < 0)) stop("negative entries in table") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z$group == :
Outlier (-Inf) in boxplot 2 is not drawn
2: In bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z$group == :
Outlier (-Inf) in boxplot 2 is not drawn
3: In bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z$group == :

I have run SambaR many times on almost exactly the same dataset, but this problem started after I updated R and some of the packages. Any help would be appreciated.
