tuw-geo / geopathfinder
Querying and searching data on the file system
License: MIT License
Add travis support to run package tests
Check whether files that belong to multiple SmartPath() objects in one SmartTree() are counted only once in the disk usage.
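One way to make sure shared files are counted once is to deduplicate by resolved path before summing sizes. The sketch below is not geopathfinder code: `unique_disk_usage` is a hypothetical helper, and it assumes each path object can provide a plain list of file paths.

```python
import os
import tempfile


def unique_disk_usage(file_lists):
    """Sum file sizes in bytes, counting each physical file only once.

    `file_lists` is an iterable of file-path lists, e.g. one list per path
    object of a tree (hypothetical input format).
    """
    seen = set()
    total = 0
    for files in file_lists:
        for path in files:
            key = os.path.realpath(path)  # normalise so duplicates match
            if key in seen:
                continue
            seen.add(key)
            total += os.path.getsize(key)
    return total


# Two path listings share the file b.tif; it must be counted once.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.tif")
    b = os.path.join(tmp, "b.tif")
    for path, size in ((a, 10), (b, 20)):
        with open(path, "wb") as f:
            f.write(b"\0" * size)
    usage = unique_disk_usage([[a, b], [b]])
```

A test for the issue above could compare such a deduplicated sum with the value the tree reports.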
The current yeoda naming convention is not aligned with our new versioning system. Please adapt yeoda_path so that it takes a new input argument version or data_version instead of version and run_num. This argument should contain everything that is version-related, i.e. the software version + run number, and it needs to be set properly by the workflow or the user, not by geopathfinder. This also gives us more freedom to work with other data sets that have a different versioning scheme.
Please also change the 'version_run_id' field to 'version' or 'data_version' everywhere.
Often when working with file collections from geopathfinder I want to have some meta information, such as the names of the dimensions and which ones are temporal or spatial dimensions, so I can load them into my datacube in a generic way.
Since it is during file loading we have all that information, this probably would be the right place to provide a filename class or something similar containing such meta information.
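As a rough illustration of the requested meta information, a small container could record each dimension's name and kind. Everything here is an assumption for discussion: the classes `DimensionInfo` and `FilenameMeta` and the field names `dtime_1`, `tile_name`, and `var_name` are hypothetical and not part of geopathfinder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DimensionInfo:
    name: str
    kind: str  # "temporal", "spatial" or "other" (hypothetical categories)


@dataclass
class FilenameMeta:
    dimensions: List[DimensionInfo]

    def names(self, kind: Optional[str] = None) -> List[str]:
        """Return dimension names, optionally restricted to one kind."""
        return [d.name for d in self.dimensions
                if kind is None or d.kind == kind]


meta = FilenameMeta([
    DimensionInfo("dtime_1", "temporal"),
    DimensionInfo("tile_name", "spatial"),
    DimensionInfo("var_name", "other"),
])
temporal_dims = meta.names("temporal")
spatial_dims = meta.names("spatial")
```

A datacube loader could then ask for the temporal and spatial dimension names without knowing the naming convention in advance.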
This happens when simultaneous tasks attempt to create different files that should go into the same non-existent folder, e.g. divided tasks on an HPC. One task succeeds, while all others fail with a FileExistsError when trying to create the directory.
File "..../lib/python3.7/site-packages/geopathfinder/folder_naming.py", line 133, in make_dir
os.makedirs(self.directory)
File "..../lib/python3.7/os.py", line 223, in makedirs
os.makedirs(self.directory)
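A common way to avoid this race is to pass `exist_ok=True` to `os.makedirs` (available since Python 3.2), so that a directory created by another task in the meantime is not treated as an error. The sketch below uses a standalone `make_dir` function as a stand-in. How the real `make_dir` method in folder_naming.py should be changed is an assumption.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_dir(directory):
    """Create `directory` (and parents) and tolerate it already existing."""
    os.makedirs(directory, exist_ok=True)
    return os.path.isdir(directory)


# Simulate many tasks creating the same, initially non-existent folder.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "a", "b", "c")
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(make_dir, [target] * 32))
```

With `exist_ok=True`, a FileExistsError is still raised if the target exists but is not a directory, which is usually the desired behaviour.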
File "C:\code\sgrt\geopathfinder\geopathfinder\sgrt_naming.py", line 60, in __init__
super(SgrtFilename, self).__init__(fields, fields_def, ext='.tif')
TypeError: super() argument 1 must be type, not classobj
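This error is typical for Python 2 when `super()` is called with an old-style class, i.e. one whose base hierarchy does not inherit from `object`. A minimal sketch of the usual remedy follows. The class names `FilenameBase` and `SgrtLikeFilename` are hypothetical stand-ins, and it is an assumption that the package's base class lacked the `object` base.

```python
class FilenameBase(object):  # explicit `object` base makes this a new-style class in Python 2
    def __init__(self, fields, fields_def, ext=None):
        self.fields = fields
        self.fields_def = fields_def
        self.ext = ext


class SgrtLikeFilename(FilenameBase):
    def __init__(self, fields):
        # Illustrative field definition only, not the real SGRT layout.
        fields_def = {"var_name": {"len": 9}}
        super(SgrtLikeFilename, self).__init__(fields, fields_def, ext='.tif')


fn = SgrtLikeFilename({"var_name": "SIG0"})
```

In Python 3 all classes are new-style, so the explicit base only matters if Python 2 has to be supported.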
Only integer values are currently valid for the relative orbit number. But for sgrt parameters, where no orbit number is defined in the filename ("---"), the function decode_rel_orbit() in sgrt_naming.py is failing because it tries to cast the string to int.
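A minimal sketch of a more tolerant decoder, assuming that a field consisting only of the "-" placeholder should map to `None`. The function name `decode_rel_orbit_safe` and its signature are hypothetical and do not reproduce the actual code in sgrt_naming.py.

```python
def decode_rel_orbit_safe(string, placeholder="-"):
    """Decode a relative orbit number; return None if the field is empty.

    A field such as "---" (placeholders only) yields None instead of
    raising a ValueError from int().
    """
    stripped = string.strip(placeholder)
    if not stripped:
        return None
    return int(stripped)


orbit = decode_rel_orbit_safe("175")
no_orbit = decode_rel_orbit_safe("---")
```

The matching encoder would then need to write the placeholder again when it receives `None`.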
Remove .pytest_cache and add it to .gitignore
Just ran into installation issues on windows due to very long file-names in the tests folder...
Removing the tests folder from the pypi sources should do the job
(and it would also significantly reduce the size of the library since tests are not required at runtime)
Collecting geopathfinder
Downloading geopathfinder-0.1.4.tar.gz (1.1 MB)
---------------------------------------- 1.1/1.1 MB 9.6 MB/s eta 0:00:00
Pip subprocess error:
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\Users\\---\\AppData\\Local\\Temp\\7\\pip-install-inlh1qw6\\geopathfinder_91a16088922149668c97e906b5085386\\tests/test_data/Sentinel-1_CSAR/IWGRDH/preprocessed/datasets/resampled/A0202/EQUI7_EU500M/E006N006T6/plia/M20160831_163321--_PLIA-----_S1AIWGRDH1--A_175_A0201_EU500M_E006N006T6.tif'
HINT: This error might have occurred since this system does not have Windows Long Path support enabled. You can find information on how to enable this at https://pip.pypa.io/warnings/enable-long-paths
The python-dateutil version is fixed (python-dateutil==2.6.1), but pandas now requires python-dateutil>=2.7.3.
Is it possible to relax the version to python-dateutil>=2.6.1 or upgrade it directly?
Currently, I see several points for improving the class logic and making the package more "pythonic":
__str__ or __add__ (e.g. adding a path to a tree) and properties like n_paths, n_files, or disk_usage to better interact with an object. In particular, replace functions that only print, e.g. print_file_register, with something like https://pypi.org/project/seedir/
self, e.g. tree.filter(level, pattern='..').filter(level, pattern='..').prune(level), and not having all these "collect" functions.
get_disk_usage or search_files_ts)
os.walk and does not utilise parallelisation.
build_smarttree in general - a lot of list appends happen there, even after one knows the "dimensions" of paths and folders.
This should just be the central issue for collecting and discussing improvements or new ideas, which can then be distributed to other issues later on. Please feel free to add your ideas and thoughts - this should be considered a brainstorming. If we come up with a specific set of tasks, we could also ask a student or a new employee to implement them.
And by the way: I did not find a package, which does already similar things - so this might be a huge benefit for the community!
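To make the brainstorming above more concrete, here is a toy sketch of how special methods, properties, and chainable filters could look. `MiniTree` is a hypothetical class operating on plain "/"-separated strings, not the SmartTree API. Its `filter` returns a new object rather than `self`, which is one of several ways to support chaining.

```python
import re


class MiniTree:
    """Toy container for '/'-separated paths (illustration only)."""

    def __init__(self, paths=()):
        self._paths = sorted(set(paths))

    @property
    def n_paths(self):
        return len(self._paths)

    def __add__(self, other):
        # Accept another MiniTree or a single path string.
        extra = other._paths if isinstance(other, MiniTree) else [other]
        return MiniTree(list(self._paths) + list(extra))

    def __str__(self):
        return "\n".join(self._paths)

    def filter(self, level, pattern):
        """Keep paths whose folder at `level` matches the regex `pattern`."""
        regex = re.compile(pattern)
        keep = []
        for path in self._paths:
            parts = path.split("/")
            if len(parts) > level and regex.search(parts[level]):
                keep.append(path)
        return MiniTree(keep)  # a new object, so calls can be chained


tree = MiniTree(["EU/E006N006T6/sig0", "EU/E006N012T6/sig0", "AF/E006N006T6/plia"])
subset = (tree + "EU/E012N006T6/plia").filter(0, "^EU$").filter(2, "sig0")
```

Printing `subset` would list the two remaining sig0 paths, and `subset.n_paths` replaces a dedicated counting function.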
Update license.txt with correct names
e.g. at src/geopathfinder/file_naming.py in Line 267 length = end - start, this can get messy.
what is "compact"? make more clear!
With regard to the yeoda_path convention, the "logfiles" folder is at the same level as the "data_version" level at the moment. The advantage is that the level below "data_version" solely consists of sub-directories in a spatial context, e.g. different Equi7 continents.
However, in the context of job file logging under "logfiles", which are bound to a certain data version, we have the issue that they are hierarchically not connected with the different data versions anymore. This means if someone wants to move data produced with a specific version somewhere else, then it needs to be assured that the respective log files are also moved.
This issue could be solved by either:
when trying to install, it fails because of this error:
unable to create file tests/test_data/Sentinel-1_CSAR/IWGRDH/preprocessed/datasets/resampled/A0202/EQUI7_EU500M/E006N006T6/sig0/qlooks/Q20160831_163321--_SIG0-----_S1AIWGRDH1VVA_175_A0201_EU500M_E006N006T6.tif
please shorten this
In order to automate data updates, some of our data packages (gldas, ecmwf_models, ...) should contain functions to go through existing data structures and determine what data (start date, end date etc.) is already stored and what data is missing. E.g. in the gldas package there are functions for that https://github.com/TUW-GEO/gldas/blob/75ca48f620c1b64d7c6246f081aaa6924834b7ff/gldas/download.py#L43 and https://github.com/TUW-GEO/gldas/blob/75ca48f620c1b64d7c6246f081aaa6924834b7ff/gldas/download.py#L119
Without looking too much into this package now, is this something that could fit here? It would be nice if I don't have to add functions as the ones above to all our packages because that would mean a lot of duplicate code.
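The core of such a check can be written independently of any data package: given the dates that were found on disk, list the ones missing in a period. The sketch below assumes daily data and that the available dates were already parsed from the file names. `missing_dates` is a hypothetical helper, not an existing function in geopathfinder or gldas.

```python
from datetime import date, timedelta


def missing_dates(available, start, end):
    """Return all days in [start, end] that are not in `available`."""
    have = set(available)
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [day for day in all_days if day not in have]


on_disk = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 4)]
gaps = missing_dates(on_disk, date(2020, 1, 1), date(2020, 1, 5))
```

The start and end of the stored record follow directly from `min(on_disk)` and `max(on_disk)`.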
file_num is no longer a keyword in eodr_naming, which breaks functionality. Maybe it was deleted by mistake in the latest changes?
@claxn