isa-tools / magetoisaconverter

Converter that can pull from ArrayExpress (by accession number) or read local files and convert them to ISA-Tab. Led by Philippe Rocca-Serra & Eamonn Maguire, University of Oxford

Home Page: http://isa-tools.org

Shell 0.30% Java 99.70%

magetoisaconverter's People

Contributors

eamonnmag, proccaserra

magetoisaconverter's Issues

Identically named attribute columns for same node

(Originally reported in the ISAforum: https://groups.google.com/forum/?fromgroups#!topic/isaforum/ikNZH9Or570)

We found that in some study files (and possibly assay files) that the converter generates from ArrayExpress MAGE-TAB files, multiple identically named attribute columns are created for the same node.

One example for this can be found in the study file generated for E-GEOD-16013: http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-16013

The characteristics "tissue/development stage", "time point" and "tissue" appear twice and are assigned to the source name node. It appears that the MAGE-TAB SDRF file is the source of this duplication.

Our suggested solution is to merge the duplicated columns for each node by concatenating their content.
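The merge suggested above could look roughly like the following. This is only a sketch, not the converter's code: the class and method names are hypothetical, it assumes the headers and values passed in are the attribute columns attached to a single node (for example the Characteristics columns following Source Name), and dropping empty or repeated values before concatenating is an added assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the converter.
public class DuplicateColumnMerger {

    /**
     * Merges identically named columns of one row by concatenating their
     * values with the given separator. Empty values and exact repeats are
     * skipped (an assumption). Column order follows first occurrence.
     */
    public static Map<String, String> mergeRow(List<String> headers,
                                               List<String> values,
                                               String separator) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (int i = 0; i < headers.size(); i++) {
            // Rows shorter than the header line are padded with empty values.
            String value = i < values.size() ? values.get(i).trim() : "";
            List<String> bucket =
                    grouped.computeIfAbsent(headers.get(i), k -> new ArrayList<>());
            if (!value.isEmpty() && !bucket.contains(value)) {
                bucket.add(value);
            }
        }
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            merged.put(entry.getKey(), String.join(separator, entry.getValue()));
        }
        return merged;
    }
}
```

Calling mergeRow once per data row, with the same header list each time, would leave one column per distinct attribute name.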

SDRF File Not Found Error

Hello. I've been using the JAR file on my local machine, and I noticed every so often when I converted files from ArrayExpress, that I would get an error that looked like this:

log4j:WARN No appenders could be found for logger (org.isatools.magetoisatab.io.DownloadUtils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-26565/E-GEOD-26565.idf.txt
Alternative Design Tag found at: Comment[AEExperimentType] ChIP-seq transcription profiling by array
transcription profiling
DNA microarray
transcription profiling
DNA microarray
http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-26565/E-GEOD-26565.sdrf.txt
ERROR: file not found!

When looking at experiments that threw a similar error, it seems that the problem is that for these studies, the SDRF.TXT file is broken up into two files: xxx.HYB.SDRF.TXT and xxx.SEQ.SDRF.TXT.

It looks like line 56 of MAGETabObtain.java currently only allows the converter to find the SDRF file if there is a single file: sdrfUrl = "http://www.ebi.ac.uk/arrayexpress/files/" + accessionNumber + "/" + accessionNumber + ".sdrf.txt";

Could this be fixed please?
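One possible shape for a fix, offered as a sketch rather than as the project's code: build a list of candidate SDRF URLs and try each in turn. The class and method names are hypothetical, and the lowercase ".hyb.sdrf.txt" and ".seq.sdrf.txt" suffixes are an assumption that mirrors the lowercase ".sdrf.txt" in the quoted line.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of MAGETabObtain.java.
public class SdrfUrlCandidates {

    private static final String BASE = "http://www.ebi.ac.uk/arrayexpress/files/";

    /**
     * Returns the SDRF URLs worth trying for an accession: the single-file
     * form first, then the two split forms described in this report.
     * The exact split-file suffixes are assumed, not confirmed.
     */
    public static List<String> candidates(String accessionNumber) {
        String prefix = BASE + accessionNumber + "/" + accessionNumber;
        return Arrays.asList(
                prefix + ".sdrf.txt",
                prefix + ".hyb.sdrf.txt",
                prefix + ".seq.sdrf.txt");
    }
}
```

A caller would attempt each URL and process every one that exists, since a split study presumably needs both of its SDRF files converted.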

E-GEOD-4332

Hi Philippe and Eamonn,

Hope you're well. I'm reporting a couple of bugs.

I converted E-GEOD-4332 to ISA-Tab using the MAGEtoISAconverter and got the following error when loading the files:

"error The field study protocol type is missing from the STUDY PROTOCOLS section of the investigation file"

I manually added "study protocol type" in the appropriate place which fixed my loading problem.

However, the ISA-Tab is still invalid due to incorrect mapping of the "contact" information.

Thank you,
Shannan

Assay file names incorrect

Using the converter on E-GEOD-50653, the resulting assay file name should be "a_E-GEOD-50653_GeneChip_assay.txt" but is instead output as "a_E-GEOD-50653_RNA-Seq_assay.txt", causing errors in ISAcreator. Probably related: the file has a column named "Derived Data File" instead of "Derived Array Data File", which also causes loading in ISAcreator to fail.

clean GEO datasets

The AE parser produces node names by concatenating the GEO accession number with the GEO sample identifier (as in GSE5258GSM\d+).
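A minimal sketch of one way to clean such names, assuming the goal is to keep only the GSM sample identifier. The class name is hypothetical, and the pattern only touches names that consist entirely of a GSE accession followed by a GSM identifier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the converter.
public class GeoNodeNameCleaner {

    // Matches a GSE accession fused directly onto a GSM sample identifier.
    private static final Pattern FUSED = Pattern.compile("^GSE\\d+(GSM\\d+)$");

    /** Returns the GSM part of a fused name, or the name unchanged otherwise. */
    public static String clean(String nodeName) {
        Matcher matcher = FUSED.matcher(nodeName);
        return matcher.matches() ? matcher.group(1) : nodeName;
    }
}
```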

Missing Assay Files

Hello,

I just tried out the new parser, and I've noticed that there are some studies that seem to be successfully converted (no error messages), but have missing assay files. In the investigation file, there will be multiple assay files listed in the Study Assay section, but no corresponding files exist after conversion.

Some examples are E-MTAB-857, E-GEOD-15292, E-GEOD-28833, and E-ERAD-12.

I'm not really sure why this happens.
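Until the cause is found, a check like the following could at least flag affected conversions. It is a sketch under assumptions: the class name is hypothetical, and the caller is expected to have already read the assay file names from the Study Assay section of the investigation file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-conversion check, not part of the converter.
public class AssayFileCheck {

    /**
     * Returns the assay file names that are listed in the investigation file
     * but do not exist as regular files in the output directory.
     */
    public static List<String> missing(Path outputDir, List<String> listedAssayFiles) {
        List<String> missing = new ArrayList<>();
        for (String name : listedAssayFiles) {
            if (!Files.isRegularFile(outputDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

An empty result means every listed assay file was written; anything else names the files to investigate.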

IndexOutOfBounds Error

Hello. I've been running the JAR file on my local machine to convert some ArrayExpress experiments, and many of them had errors. When I looked at the messages printed at run time, most of them were the result of an IndexOutOfBoundsException.

Most of these were thrown by line 125 of MAGETabSDRFLoader.java: Column scanName = columnOrders.remove(Utils.getIndexForValue("Scan Name", columnOrders));

When looking at the files in question, basically the "Scan Name" header didn't exist, causing the error.

I also occasionally saw the exception thrown by line 126 of MAGETabSDRFLoader.java: columnOrders.add(derivedArrayDataFileIndex - 1, scanName);

This was thrown because derivedArrayDataFileIndex had a value of -1 because the "Derived Array Data File" header was missing.

Could this be fixed please?
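A guard along these lines would avoid both exceptions. It is a sketch on a plain list of header strings, not the loader's actual Column type or its Utils.getIndexForValue helper, and it assumes the intent of the two quoted lines is to place "Scan Name" immediately before "Derived Array Data File".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the reordering step in MAGETabSDRFLoader.java.
public class ScanNameReorder {

    /**
     * Moves "Scan Name" to just before "Derived Array Data File".
     * If either header is absent, the order is returned unchanged
     * instead of failing with an IndexOutOfBoundsException.
     */
    public static List<String> moveScanNameBeforeDerivedData(List<String> headers) {
        List<String> result = new ArrayList<>(headers);
        int scanIndex = result.indexOf("Scan Name");
        int derivedIndex = result.indexOf("Derived Array Data File");
        if (scanIndex < 0 || derivedIndex < 0) {
            return result; // nothing to reorder
        }
        String scanName = result.remove(scanIndex);
        // Look the target up again: removing an element may have shifted it.
        result.add(result.indexOf("Derived Array Data File"), scanName);
        return result;
    }
}
```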

ArrayExpress files that didn't work with MAGE2ISA beta

Hello ISAteam

  1. I used the MAGE2ISA converter (online version, http://isatab.sourceforge.net/magetoisa/ ) and got error messages when I tried to load the output files into ISAcreator v1.3.2.

I just double clicked on the archive to unarchive it prior to attempting to load into ISAcreator (is there a better way to unarchive? I'm on a MacBookPro OSX 10.6.7).

Here are the ArrayExpress studies and error messages:

For E-GEOD-9452 and E-GEOD-13367:
error in it's definitions STUDY DESIGN DESCRIPTORS has an error in it's definition
Unrecognised assay in unknown_experiment_design_type

For E-GEOD-10714, E-GEOD-16879, E-GEOD-4183:
INVALID ISATAB FILE
Study Submission Date has an error in it's definition
STUDY DESIGN DESCRIPTORS has an error in it's definition

Unrecognised assay in transcription profiling by array

  2. Is there a way to attach screenshots to these issue reports?

Thank you,
Dorothy Reilly
NIBR
Cambridge MA USA
617-871-3005

Protocol REF names

Hello. I noticed that in the Study and Assay files, the protocol names for the Protocol REF columns are abbreviated to P--#. If I'm understanding this correctly, the names should be the same as the ones in the Investigation file. Since the names are not abbreviated in the SDRF files, could they please be left alone?

Also, sometimes there are names in the Protocol REF columns that don't exist in the Investigation file (e.g. GSM462292/1 LE 1 in E-GEOD-18588 or P-MTAB-3235 in E-MTAB-440). Is there any way to safeguard against this?

Thank you!
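One way to safeguard against the second problem is to compare the names used in the Protocol REF columns with the protocol names declared in the Investigation file. The following is a sketch only: the class name is hypothetical, and both collections are assumed to have been extracted from the files already.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical validation step, not part of the converter.
public class ProtocolRefCheck {

    /**
     * Returns every non-empty Protocol REF value that has no matching
     * protocol name in the Investigation file, in order of first appearance.
     */
    public static Set<String> undeclared(Collection<String> declaredProtocolNames,
                                         Collection<String> protocolRefValues) {
        Set<String> result = new LinkedHashSet<>();
        for (String ref : protocolRefValues) {
            String trimmed = ref.trim();
            if (!trimmed.isEmpty() && !declaredProtocolNames.contains(trimmed)) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

A non-empty result could be logged as a warning, or used to add placeholder protocol entries to the Investigation file; which is preferable is a design decision for the maintainers.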
