biocore / american-gut

American Gut open-access data and IPython notebooks

License: Other

Python 25.22% CSS 0.03% TeX 0.92% Shell 0.01% Jupyter Notebook 73.83%

american-gut's Introduction

American-Gut

American Gut open-access code and IPython notebooks

A note about data

American Gut sequences and metadata are deposited at the European Bioinformatics Institute (EBI) under accession ERP012803.

Bloom sequences found in the data repository are correct and up to date.

The OTU tables and mapping files hosted in this repository reflect the state of the project in May 2015 and earlier. This includes an earlier version of the American Gut survey and dietary questionnaire. The data in GitHub have been scrubbed for PHI. A listing of processed data with the new survey can be found at ftp://ftp.microbio.me/AmericanGut.

The latest OTU tables and precalculated diversity comparisons generated by the primary processing notebook set can be found at ftp://ftp.microbio.me/AmericanGut/latest.


INSTALL

Basics

The American-Gut repository is intended to be used as a project/repo, meaning there is no need to install it (ignore setup.py for the moment).

After cloning the repository, and before using the scripts, users should install the necessary dependencies. Two approaches are currently supported.

Conda based

If your package manager of choice is conda, dependencies can be installed with

$ conda install --file ./conda_requirements.txt
$ pip install -r ./pip_requirements.txt

If you would like to install the dependencies within a conda environment, be sure to switch to the appropriate environment before installing them.

Note: with pip, some libraries will have to be compiled from source, so the appropriate system libraries should be installed before running the pip command. For more details, see the Supported Operating Systems / Distributions section.

Pip based

$ pip install numpy==1.9.2
$ pip install -r ./pip_requirements.txt

If you would like to install the dependencies within a virtualenv environment, be sure to switch to the appropriate environment before installing them.

Note: with pip, some libraries will have to be compiled from source, so the appropriate system libraries should be installed before running the pip command. For more details, see the Supported Operating Systems / Distributions section.

Supported Operating Systems / Distributions

Debian 8

Tested with Debian 8.3.0 (amd64).

To compile the dependencies from source, the required system libraries can be installed (as root or via sudo) with

(root/sudo)$ aptitude install pkg-config libxslt1-dev libxml2 libfreetype6 \
    build-essential python-pip python-dev liblapack-dev liblapack3 \
    libfreetype6-dev libblas-dev libblas3 gfortran libhdf5-serial-dev libsm6

RUN

Basics

Although the American-Gut repo provides standalone scripts (the scripts folder) and a package (the americangut folder), it is primarily intended to be used through the notebooks (the ipynb folder).

There are a few environment variables that can be used to customize a run; a sketch of how the scripts might read them follows this list:

  • AG_TESTING: if set to True, the scripts will not download the American Gut EBI data (ERP012803) but will instead work with test data (a subset of the original EBI data). This is useful for testing.
  • AG_CPU_COUNT: the number of processes to use when parallelizing code (defaults to the number of cores).
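
As a rough sketch, a script might consume these variables like so (the helper is hypothetical; the real scripts may differ):

import multiprocessing
import os

def get_ag_settings():
    # AG_TESTING: use the bundled test data instead of the full EBI download
    testing = os.environ.get('AG_TESTING', '') == 'True'
    # AG_CPU_COUNT: number of worker processes; default to all available cores
    cpus = int(os.environ.get('AG_CPU_COUNT', multiprocessing.cpu_count()))
    return testing, cpus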

To generate the reports (PDFs), a TeX distribution should be installed on the system.

Adjusting environment on POSIX systems

Since the American-Gut repo contains both scripts and a package, PYTHONPATH and PATH need to be adjusted to reflect this. Therefore, prior to working with the notebooks, execute the following from within the American-Gut repo:

$ REPO=`pwd`
$ export PYTHONPATH=$REPO/:$PYTHONPATH
$ export PATH=$REPO/scripts:$PATH

If needed, adjust the AG_* environment variables described in the Basics section.

Run notebooks

Notebooks are written in two formats and therefore require different profiles.

Markdown based notebooks

Markdown based notebooks can be found in the ./ipynb/primary-processing/ folder and have the extension .md. To use these notebooks, we first need to create a profile for ag_ipymd with

$ ipython profile create ag_ipymd

and adjust the newly created /path/to/.ipython/profile_ag_ipymd/ipython_notebook_config.py by adding

#------------------------
# ipymd
#------------------------
c.NotebookApp.contents_manager_class = 'ipymd.IPymdContentsManager'

to the end of the file.

Now, we can start ipython with

$ ipython notebook --profile=ag_ipymd

and visit the newly started notebook server by going to http://localhost:8888

Jupyter/IPython based notebooks

Notebooks in the native notebook format can be found in the ./ipynb/ folder and have the extension .ipynb. To use these notebooks, we first need to create a profile for ag_default with

$ ipython profile create ag_default

Now, we can start ipython with

$ ipython notebook --profile=ag_default

and visit the newly started notebook server by going to http://localhost:8888

american-gut's People

Contributors

adamrp, antgonza, cuttlefishh, eldeveloper, embrietteh, jladau, josenavas, jwdebelius, mortonjt, samfway, squirrelo, teravest, wasade


american-gut's Issues

Processing Notebooks

Updating the processing notebooks

  • Scripts for processing and plotting (diversity_analysis.py; geography_library.py) #125
  • Preprocessing Notebook #126
  • Power Notebook #127
  • Age Notebook
  • Alcohol Notebook
  • Season Notebook
  • Exercise Notebook
  • Sleep Notebook
  • Plants Notebook

Conflicting Python resources and virtualenvs

I tried to run Daniel's notebooks tonight (11/10). The ipymd release I got (ipymd==0.1.1) required IPython 4.0 or greater. The code in mod2_pcoa.py would not run with IPython >= 4.0:

Traceback (most recent call last):
  File "/Users/jwdebelius/.virtualenvs/test_env/bin/mod2_pcoa.py", line 4, in <module>
    __import__('pkg_resources').require('americangut==0.0.1')
  File "/Users/jwdebelius/.virtualenvs/test_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3018, in <module>
    working_set = WorkingSet._build_master()
  File "/Users/jwdebelius/.virtualenvs/test_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 614, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/Users/jwdebelius/.virtualenvs/test_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 627, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/Users/jwdebelius/.virtualenvs/test_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 805, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: IPython<4.0.0

I did downgrade my matplotlib to 1.4, and this is a function which calls seaborn, but the error isn't related to those packages.

test_generate_otu_significance.py fails

python test_generate_otu_signifigance.py
.F.......
======================================================================
FAIL: test_calculate_tax_rank_1 (__main__.GenerateOTUSignifiganceTablesTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_generate_otu_signifigance.py", line 245, in test_calculate_tax_rank_1
    self.assertEqual(known_high_10, test_high_10)
AssertionError: Lists differ: [['k__Bacteria; p__Proteobacte... != [['k__Bacteria; p__Proteobacte...

First differing element 0:
['k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterbacteriaceae', 0.002, 7.6e-05, 26.3158, 1.450729834568669e-22]
['k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterbacteriaceae', 0.002, 7.6e-05, 26.0, 1.4507298345686689e-22]

Diff is 671 characters long. Set self.maxDiff to None to see it.

@jwdebelius can you look into this?
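
For reference, the 26.3158 in the expected value is just the ratio of the two abundances; the observed 26.0 presumably comes from a different rounding path in the code under test (a guess, not confirmed). Plain arithmetic:

ratio = 0.002 / 7.6e-05
print(round(ratio, 4))  # 26.3158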

Add Travis CI

We now have some test code, so it would be good to have Travis, too

Move ipynb/cluster_utils.py to a new repo

These utilities look really useful for other projects outside the American Gut scope.

Also, removing the IPython dependency would help make them more general.

Moving this repository to biocore

AFAIK @meganap created an organization called American Gut. Should we move this repository there, or to biocore? And if we move it to biocore, should we delete the American Gut organization, as it currently has nothing relevant to the project and might only confuse people?

Use un-rarefied OTU table

Can we use the un-rarefied OTU table for the results? Or is there a technical reason in the processing pipeline that does not allow us to do this?

Mislabeled metadata

Under TYPES_OF_PLANTS, there is a single survey result that entered 28 rather than 21-30.

Mind if I submit a PR to fix this? I'm pretty sure it is an error.
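
A sketch of the fix with pandas (the filename is hypothetical; the column name is from the issue):

import pandas as pd

# Remap the stray free-text answer onto the expected categorical bin
md = pd.read_csv('ag_mapping.txt', sep='\t', dtype=str)
md['TYPES_OF_PLANTS'] = md['TYPES_OF_PLANTS'].replace({'28': '21-30'})
md.to_csv('ag_mapping.txt', sep='\t', index=False)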

Help running module2_v1.0 notebook

Hi, I'm trying to run the IPython notebook module2_v1.0 using the test data (debug=True). I can now run the first three blocks of code without errors, but I ran into a few issues:

  1. In the code block that sets up the path for processing (chunk 4), the file BLOOM.fasta is not in the expected location, but copying it into the created americangut_results_r1-14 folder seems to remedy this.
  2. The 4th chunk then continues until it reaches the jobs = [] line, where it runs into the error below.
ValueError                                Traceback (most recent call last)
<ipython-input-7-960ae518d4d2> in <module>()
     21 for f in glob(os.path.join(working_dir, "*.biom.gz")):
     22     jobs.append(submit(scripts['gunzip'] % {'input': f}))
---> 23 res = wait_on(jobs)
     24 
     25 

<ipython-input-2-4e88260b8c1f> in wait_on(jobs_to_monitor, additional_prefix)
    122     sys.stdout.flush()
    123 
--> 124     running_jobs = parse_qstat()
    125     while jobs_to_monitor:
    126         sleep(POLL_INTERVAL)

<ipython-input-2-4e88260b8c1f> in parse_qstat()
     41 
     42     jobs = {}
---> 43     for id_, name, state in lines.grep(user).fields(0,3,9).fields():
     44         job_id = id_.split('.')[0]
     45         jobs[job_id] = {}

ValueError: need more than 2 values to unpack

Is there a version issue with something I've installed, maybe? I installed both the pip and conda requirements packages, and I'm a little lost now. I'm new to Python, so any help is appreciated! I'm running this on OS X 10.11.6 with 16 GB of memory and Python 2.7. Thanks!

conda_req and pip_req differences

Hi,

I'm trying to install americangut and I've stumbled upon some (to me) interesting/strange things. Is there a reason why pip_requirements.txt and conda_requirements.txt differ (e.g. one contains qiime while the other doesn't, and similarly for cython, IPython, pandas, ...)?

Generate ovals dynamically from PCoA data

Currently the ellipses (ovals) bounding the data in the PCoA plots are drawn manually in Illustrator and positioned in LaTeX until they pass the eyeball test. It would be great if the ellipses could be generated dynamically from the first and second coordinates (i.e. flattening the plot into two dimensions). Presumably the major axis of the ellipse would coincide with the linear regression of the points, and the minor axis would bisect it perpendicularly. Then it's a matter of capturing, e.g., 99% of the points. It seems plausible, and it would look better, be more accurate, and save us time down the road.
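
A sketch of one way to generate such an ellipse from the first two coordinates, using the covariance eigenvectors for the axes (a standard confidence-ellipse construction rather than a regression fit; the function name and the 3-standard-deviation capture level are illustrative):

import numpy as np
from matplotlib.patches import Ellipse

def bounding_ellipse(xy, n_std=3.0, **kwargs):
    # xy: (n_samples, 2) array of the first two PCoA coordinates
    center = xy.mean(axis=0)
    # Eigenvectors of the covariance matrix give the ellipse axes
    vals, vecs = np.linalg.eigh(np.cov(xy, rowvar=False))
    angle = np.degrees(np.arctan2(vecs[1, -1], vecs[0, -1]))
    # Axis lengths: n_std standard deviations along each principal axis
    width, height = 2 * n_std * np.sqrt(vals[::-1])
    return Ellipse(xy=center, width=width, height=height, angle=angle, **kwargs)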

test_select_gamma.py fails with ImportError

Looks like test_select_gamma.py tries to import from select_gamma.py, which only exists under the scripts folder. This has to be changed so that the library code being tested lives under the americangut module. Additionally, the test code should not import select_gamma the way it currently does; it should instead use:

from americangut.select_gamma import function_name

@amnona do you think you could take care of this?

The way American-Gut repo was intended to be used?

Hi,

I've been struggling with the American-Gut repo and the way I should use it for the past few days. If I understood correctly, the repo is broken into a package (the americangut dir) and auxiliary files. Some of these files are intended to be used by the package itself, while others are for interactive sessions, e.g. with IPython notebooks.

In #199, @jwdebelius recommends installing the package with pip install -e . --no-deps, so the americangut dir was indeed intended to be used as a package. Still, this will not install the latex and tests folders from package_data, since setup.py seems to be a bit misconfigured (package_data should be part of the package's src dir).

Also, running (e.g.) 01-get_sequences_and_metadata.md fails on study_accessions = agenv.get_study_accessions(), since it calls get_repository_dir (from results_utils.py), which strangely takes part of the full path (outside of the package dir) and tries to find 'data' and 'latex' there. Moreover, 'data' isn't even specified in setup.py.

Therefore, I'm not quite sure how I should use the repo. Should I define PYTHONPATH to include the repo and PATH to include scripts, without installing the package, or should I install the package (as recommended by @jwdebelius)? If I need to install it, what else would I need to adjust to make it work (PATHs, PYTHONPATHs, ...)?

We need a contributing.md document

As we start to expand analyses and contributors, we need contributing guidelines. These probably need to cover the following areas:

  • Requirements for testing code (i.e. separate logic and plotting; plotting does not need to be tested)
  • What data can be hosted on GitHub, and where alternative data should be hosted
  • Structure for IPython Notebooks
  • Whether IPython Notebooks represent a tutorial of how data was generated, or whether they need to reflect all data generated.

Analysis Summary Pipeline

I'd like to create something similar to the primary processing block, #161, for analysis, with the idea that the backend could be transferred over to Bokeh or similar later.

I think the easiest way to accomplish this might be to build a data dictionary backend that would let people operate on the metadata, and then a holding object for the classes able to interact with that object. Tests could go on top. (A minimal sketch of the question objects follows the list below.)

This way, we can have lightweight, individual notebooks for each analysis step, and hopefully just switch out the plotting code at some point.

I see the steps as:

  • Data Dictionary Objects
    • Parent Question Object (#188)
    • Categorical Question Objects (Categorical, Clinical, Frequency); (#192)
    • Boolean Question Objects (Bool, Multiple response); (#193)
    • Continuous Questions (#194)
  • Data handling Object (#195)
  • Data Dictionary for Ag Questions
  • Univariate Alpha diversity notebook with effect size
  • Alpha diversity notebook for Scott Kelley's analysis
  • Univariate Beta diversity notebook with effect size
  • Univariate OTU notebook (differential abundance and differential frequency)
  • Univariate PICRUSt Notebook (have discussed this with @mortonjt)
  • Multivariate alpha diversity
  • Multivariate beta diversity

This may be redundant with other repositories, although I think the first step is unique to American Gut.
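
A minimal sketch of what the parent and categorical question objects might look like (illustrative names only, not a final API):

class AgQuestion(object):
    # Describes one metadata question (one mapping-file column)
    def __init__(self, name, description, dtype):
        self.name = name
        self.description = description
        self.dtype = dtype

class AgCategorical(AgQuestion):
    # A question whose answers come from a fixed, possibly ordered set
    def __init__(self, name, description, order):
        super(AgCategorical, self).__init__(name, description, str)
        self.order = order

    def drop_undefined(self, series):
        # Keep only responses that fall in the allowed groups
        # (series is expected to be a pandas Series of answers)
        return series[series.isin(self.order)]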

Remove any commas from participant's name before creating \yourname{} macro

The way the templates are written in LaTeX, having a comma in the user's name, e.g. DALE EARNHARDT, JR., makes the rendered name slightly taller, which pushes some figures onto a second page. Deleting any commas from the name avoids this. Please do this for both the gut and skin/oral pipelines.

So instead of

\def\yourname{Dale Earnhardt, Jr.}

the macro should be defined as

\def\yourname{Dale Earnhardt Jr.}
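
In whatever code emits the template, something along these lines should suffice (a sketch; the actual template-writing function is not shown in this issue):

def format_yourname(name):
    # Strip commas so multi-part names stay on one line in the report
    return '\\def\\yourname{%s}' % name.replace(',', '')

print(format_yourname('Dale Earnhardt, Jr.'))
# \def\yourname{Dale Earnhardt Jr.}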

Primary processing revitalized

  • set up runipys similar to IAB
  • tie into .travis.yml, and set export AG_TESTING=True
  • get sequence and metadata (#162)
  • resolve filtering notebook (#163)
  • resolve OTU picking notebook (#164)
  • prepare for meta analyses (#165)
  • resolve alpha diversity analysis notebook (#166)
  • resolve beta diversity analysis notebook (#167)
  • resolve taxonomy summaries notebook (#168)
  • resolve categories collapse (#169)
  • resolve generating of participant results notebook (#170)
  • resolve project wide summary

Template does not include barcode

Can we include the barcode somewhere in the template? Some people have multiple fecal samples, and they will have no way to know which report goes with which sample (unless they go online and compare the images themselves). Maybe we should include the barcode after or below their name (or somewhere less prominent).

TaxTree tests are failing

yoshikivazquezbaeza:American-Gut@master$ python tests/test_taxtree.py 
.....F.
======================================================================
FAIL: test_sample_rare_unique (__main__.TaxTreeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_taxtree.py", line 28, in test_sample_rare_unique
    self.assertEqual(sorted(obs), exp)
AssertionError: Lists differ: [('a', None, [['k__1', 'p__x',... != [('a', None, [['k__1', 'p__x',...

First differing element 2:
('c', None, [], [])
('c', None, [['k__1', 'p__y', 'c__']], [])

  [('a',
    None,
    [['k__1', 'p__x', 'c__'], ['k__1', 'p__y', 'c__3']],
    [['k__1', 'p__x', 'c__1'], ['k__1', 'p__x', 'c__2']]),
   ('b', None, [['k__1', 'p__x', 'c__'], ['k__1', 'p__y', 'c__3']], []),
-  ('c', None, [], [])]
+  ('c', None, [['k__1', 'p__y', 'c__']], [])]

----------------------------------------------------------------------
Ran 7 tests in 0.002s

FAILED (failures=1)

Cannot import biom file in R

I am new to biom files and tried to download the biom file here for testing. I clicked and saved the raw file to my hard disk and tried to open it in R using the phyloseq and biom packages, but it returns:

"Error in fromJSON(content, handler, default.size, depth, allowComments, :
invalid JSON input"

I have copied and pasted an example URL (https://github.com/biocore/American-Gut/blob/master/data/HMP/HMPv35_100nt.biom) into some JSON validators on the web, but they also return an error.

Could you point me to some materials I can start with for working with the biom files from here? Thank you so much!

Regards,
Carol
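
The usual culprit here is saving GitHub's HTML page rather than the raw file. A quick way to check the download, and to load it in Python instead (assuming the biom-format package is installed):

from biom import load_table

# A real BIOM 1.0 file begins with '{' (JSON); a saved HTML page begins with '<'
with open('HMPv35_100nt.biom', 'rb') as f:
    print(f.read(1))

table = load_table('HMPv35_100nt.biom')  # raises if the file is not valid BIOM
print(table.shape)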

nbviewer error

The notebook isn't running properly in its Binder environment.
I'm receiving this error:

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-r_uybfy_/scikit-bio/

install trouble - conflicting ipython requirements

I'm trying to install americangut, but I run into this error:

$ sudo python setup.py install --prefix=/home/directory
...
Processing dependencies for americangut==0.0.1
error: ipython 3.2.3 is installed but ipython>=4.0.0 is required by set(['ipykernel'])

But when I install ipython 4.1.1 and re-run the same command, the americangut installer replaces it with ipython 3.2.3. I think this might be because IPython<4.0.0 is listed in the conda requirements.

Any ideas on how to get this installed? Thanks!

Add additional stats to taxonomy summary

A participant requested additional stats. We could add N (for the group), the mean, and the standard error. For these numbers to make sense, though, we will likely need to operate on rarefied data, which is not ideal for the taxonomy summaries.
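
A sketch of the per-group computation with pandas (function and argument names are made up for illustration):

import pandas as pd

def taxon_summary(abundance, groups):
    # abundance: per-sample values for a single taxon (pd.Series)
    # groups: group labels aligned to the same samples (pd.Series)
    grouped = abundance.groupby(groups)
    return pd.DataFrame({'N': grouped.size(),
                         'mean': grouped.mean(),
                         'stderr': grouped.sem()})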
