Comments (4)
In addition we have a spreadsheet which is almost but not quite the same format as these tsv files. It'd be good to make sure the solution here is also correct for the spreadsheet (or maybe we can get rid of it?)
from idr.openmicroscopy.org.
What does `44 of 54` sets mean?
Part of this is the split between "Plates" and "Datasets". I also often have to figure it out by context. Happy to have the output format from the script be made more explicit.
What is `Bytes`, does that have to be used for `Size (TB)` and `Size`?
`Bytes` from `stats.py` was my first attempt at a size via SQL. It was pointed out that 1) my query was wrong and 2) it doesn't match what `fs usage` was providing. Best option is likely to remove it.
What about this size?
`Size in TB` is just an easier-to-read version of `Size`.
And is the 25 files the # of Files?
Yes.
And how to get Targets?
This is a difficult one; it likely hasn't been maintained, or even defined, since Eleanor left.
But where to get Files (Million) from?
Again, this is just an easier-to-read version of `Files`.
And how to get DB Size (GB)?
I think we have some diversity here. I'd suggest `select pg_database_size('idr')` is the basis for most of the values.
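The derived columns mentioned in these answers (`Size (TB)`, `Files (Million)`, `DB Size (GB)`) are unit conversions of raw values. A minimal sketch of those conversions, assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), which this thread does not confirm; the function names are hypothetical and are not part of `stats.py`:

```python
def size_tb(size_bytes: int) -> float:
    """Convert a raw size in bytes to terabytes (assumes 1 TB = 10**12 bytes)."""
    return size_bytes / 10**12


def files_million(n_files: int) -> float:
    """Convert a raw file count to millions of files."""
    return n_files / 10**6


def db_size_gb(db_bytes: int) -> float:
    """Convert the byte count returned by pg_database_size() to gigabytes
    (assumes 1 GB = 10**9 bytes)."""
    return db_bytes / 10**9
```

If the spreadsheet uses binary units instead (2^40 bytes per TB), only the divisors would change.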
In addition we have a spreadsheet which is almost but not quite the same format as these tsv files. It'd be good to make sure the solution here is also correct for the spreadsheet (or maybe we can get rid of it?)
👍 for having the solution work for both. I still use the spreadsheet, so until we have everything in one place I'd be 👎 for getting rid of it.
A few additional comments:
- I think `xxx of yyy` computes the difference between the number of rows in the filepaths or plates tsv and the actual number of datasets/plates imported in the resource. I'd vote for keeping only the second value as it is the one we are reporting.
- re `Bytes`, as mentioned above `stats.py` returns an estimate of the pixel volume using an OMERO query (`sum(sizeX*sizeY*sizeZ*sizeC*sizeT*2)` currently). The known caveats are the pixel type and resolution handling, and the fact that it returns the byte size of an uncompressed full-resolution 5D volume, which likely explains the huge diff with the current value. I would stick to having `Size` report the file size on disk of the raw data imported into the resource, i.e. the output of `omero fs usage`. Proposing to remove `Bytes` from `stats.py` to reduce the confusion. Maybe rename `Size` as `Raw data size` to be explicit?
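To make the pixel-type caveat concrete, here is a minimal sketch of the same estimate for a single image. The fixed factor of 2 corresponds to 16-bit pixels; the `BYTES_PER_PIXEL` mapping and the function name are illustrative assumptions, not code from `stats.py`:

```python
# Assumed bytes-per-pixel mapping for common pixel types (illustrative only).
BYTES_PER_PIXEL = {
    "int8": 1, "uint8": 1,
    "int16": 2, "uint16": 2,
    "int32": 4, "uint32": 4,
    "float": 4, "double": 8,
}


def estimated_bytes(size_x, size_y, size_z, size_c, size_t, pixel_type="uint16"):
    """Uncompressed, full-resolution 5D volume of one image, in bytes."""
    voxels = size_x * size_y * size_z * size_c * size_t
    return voxels * BYTES_PER_PIXEL[pixel_type]


# The same 1024 x 1024 x 10 x 3 x 5 image, under two pixel types:
as_uint16 = estimated_bytes(1024, 1024, 10, 3, 5)           # factor of 2
as_uint8 = estimated_bytes(1024, 1024, 10, 3, 5, "uint8")   # factor of 1
```

For an 8-bit image the fixed factor of 2 doubles the estimate, and either way the result is an uncompressed volume rather than the size of the files on disk.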
Re `Targets`, this is a metric that is quite valuable but cannot simply be queried for the reasons described above as it requires some knowledge on the study itself. Given it has not been maintained for a while, happy to discuss removing it from the maintained stats format for now until we properly get back to it.
Re csv vs spreadsheet, I am pretty sure the headers were matching when I created the tsv files. If that's not the case, I am all for re-aligning them, as it should work as cut-n-paste.
Proposed actions:
- review and agree on the column names and definitions of `studies.tsv`/`releases.tsv` and the spreadsheet. Candidates to discuss: `Targets`, `Size`, `Files`, anything else?
- review and adjust `stats.py` to produce an output matching the decisions above and which can be used directly and effectively for filling the studies rows in the TSV/spreadsheet. Can we include the output from `omero fs usage` and the average dimension calculation in the output? Can we simply generate the stats for one study (which might reduce the generation time)?
- do we need `stats.py` or another script to create `releases.tsv` from `studies.tsv` with the extra information (database size)? Or work from the spreadsheet?
I think IDR/idr-utils#16 addresses most of the issues raised above related to `studies.tsv`.
For `releases.tsv`, I think most of the columns can be computed from `studies.tsv` except for the release date and the database size. I am erring on the side of a separate small script that will do this calculation and take the additional values as input parameters. Or a subcommand of `stats.py`.
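A rough sketch of what such a small script could look like, assuming `studies.tsv` has numeric `Size` and `Files` columns. The `Study`, `Date` and `Studies` column names, the sample values, and the function name are hypothetical, and the real layout of both files still needs to be agreed:

```python
import csv
import io


def release_row(studies_tsv, release_date, db_size_gb):
    """Sum per-study columns into one release row.

    studies_tsv: an open text file (or any iterable of lines) in TSV format.
    release_date, db_size_gb: the values that cannot be derived from the TSV.
    """
    reader = csv.DictReader(studies_tsv, delimiter="\t")
    totals = {"Studies": 0, "Size": 0, "Files": 0}
    for row in reader:
        totals["Studies"] += 1
        totals["Size"] += int(row["Size"])
        totals["Files"] += int(row["Files"])
    return {"Date": release_date, **totals, "DB Size (GB)": db_size_gb}


# Made-up two-study input, used only to exercise the function.
sample = "Study\tSize\tFiles\nstudy-a\t100\t25\nstudy-b\t300\t75\n"
row = release_row(io.StringIO(sample), "2020-01-01", 12.5)
```

Passing the release date and database size as arguments keeps the aggregation independent of any database connection, so the same function could back either a standalone script or a `stats.py` subcommand.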