Comments (21)
from deeplake.
That's great @sanchitvj. Here's a tutorial for uploading datasets using Hub that might be helpful for you!
from deeplake.
@sanchitvj No, it's not required but feel free to take a look if you ever want to understand how something is working under the hood!
from deeplake.
@sanchitvj did you take a look at the tutorial mentioned above? It has links to a couple of examples that would be helpful.
Here's an example that also includes training: https://github.com/activeloopai/Hub/tree/master/examples/fashion-mnist
Let me know if you have any particular doubts. I'd be happy to help.
from deeplake.
@sanchitvj sorry for getting back to you so late, I somehow missed this.
The purpose of the generator class is to take a single item from a list and return a dictionary of numpy arrays. The dictionary contains a separate key for each feature of the dataset (i.e. one for the images and one for each of the different annotations in MPII). You don't really need to go too deep into how hub collections work for this.
Did you get a chance to go through the tutorial? https://github.com/activeloopai/Hub/discussions/125
Also, take a look at this example: https://github.com/activeloopai/omdena-aerial/blob/master/store_omdena.py. It's a little easier to understand than the COCO example.
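To make the "one item in, one dictionary of arrays out" idea concrete, here is a minimal sketch in plain Python. It deliberately does not use any Hub classes: the class name `MPIIItemGenerator`, the `load_image` argument, and the `fake_loader` helper are all hypothetical, and a real generator would plug into whatever base class the Hub examples use. The field names (`img_paths`, `objpos`, `joint_self`, `scale_provided`) are the ones that appear in the annotation dump further down this thread, and the sample is cut to two joints for brevity.

```python
import numpy as np


class MPIIItemGenerator:
    """Hypothetical stand-in for a generator: one annotation dict in, one dict of arrays out."""

    def __init__(self, load_image):
        # load_image is an assumed callable: file name -> HxWx3 uint8 array
        self.load_image = load_image

    def __call__(self, item):
        # One key per feature; every value is a numpy array.
        return {
            "image": self.load_image(item["img_paths"]),
            "objpos": np.asarray(item["objpos"], dtype="float32"),
            "joint_self": np.asarray(item["joint_self"], dtype="float32"),
            "scale_provided": np.asarray(item["scale_provided"], dtype="float32"),
        }


def fake_loader(file_name):
    # Placeholder so the sketch runs without image files on disk.
    return np.zeros((720, 1280, 3), dtype="uint8")


# A single annotation, shortened to two joints for illustration.
sample = {
    "img_paths": "003353243.jpg",
    "objpos": [984.0, 97.0],
    "joint_self": [[991.0, 109.0, 0.0], [972.0, 101.0, 0.0]],
    "scale_provided": 2.585,
}

gen = MPIIItemGenerator(fake_loader)
out = gen(sample)
for key, value in out.items():
    print(key, value.shape, value.dtype)
```

Printing the shape and dtype of each value like this is a quick way to see what a single generator call produces before storing anything.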
If it's still not clear, do join our dedicated Slack channel and we can set up a call to discuss in detail.
from deeplake.
@kristinagrig06 I would like to work on this issue, please assign me.
from deeplake.
Hi @sanchitvj ! Assigned you to this issue. Thanks for your willingness to contribute! Let me know if you have any questions! :)
from deeplake.
Hi, @sanchitvj! Hope this finds you well. Dropping a note to check in on you and ask if you need a hand with uploading the dataset. Feel free to ask us in the GitHub Discussions (we have beta access!) or our dedicated Slack channel. Thanks a mil!
from deeplake.
I have one question: do I need to know the Hub codebase?
from deeplake.
@AbhinavTuli is there an example available of how to use Hub for loading a dataset, visualizing the data (e.g. seeing what is present in it), and training (using TensorFlow)? The dataset I'm working on is challenging to use, process, and train on.
from deeplake.
@AbhinavTuli Can you explain what the CocoGenerator class is doing? I'm having difficulty understanding it and what its output looks like. The COCO upload example doesn't make this clear, because I can't see what the outputs are. I've done most of the work and just need to sort out the generator. The COCO upload example isn't very useful here because the MPII annotations are not the same as COCO's. Can you guide me on how to write a generator function for this purpose, and tell me which code files from hub collections I should understand to get the basic idea?
from deeplake.
@AbhinavTuli I'm almost done, but how can I see that the output is as expected? Here is my code. When I try to print it, the output is '<hub.collections.dataset.core.Dataset object at 0x7f55ae0aac50>'. So how can I check that it's working correctly?
from deeplake.
Hey @sanchitvj, you can test the code with ds.store("./mpii"). This stores the dataset locally instead of uploading it to Hub, so it should be much faster.
You can then load the saved dataset and try iterating over it:

    import hub
    ds = hub.load("./mpii")
    for item in ds:
        print(item["data"].compute())
        print(item["labels"].compute())

Just replace the keys ("data" and "labels") with your actual ones. Let me know how it goes!
from deeplake.
@AbhinavTuli This is the error I get after calling ds.store(). Can you help me with it?
`0
Traceback (most recent call last):
File "", line 45, in __call__
ds["image"][i] = np.array(Image.open(img_path + all[i]['img_paths']))
KeyError: 0
Stack (most recent call last):
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/dist-packages/distributed/threadpoolexecutor.py", line 55, in _worker
task.run()
File "/usr/local/lib/python3.6/dist-packages/distributed/_concurrent_futures_thread.py", line 65, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.6/dist-packages/distributed/worker.py", line 3411, in apply_function
result = function(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/distributed/worker.py", line 3304, in execute_task
return func(*map(execute_task, args))
File "/usr/local/lib/python3.6/dist-packages/hub/collections/dataset/__init__.py", line 13, in _generate
output = generator(input)
File "", line 65, in __call__
logger.error(e, exc_info=e, stack_info=True)
distributed.worker - WARNING - Compute Failed
Function: execute_task
args: ((<function generate at 0x7fdbcd107f28>, <main.MPIIGenerator object at 0x7fdb0e5c02e8>, (<class 'dict'>, [['dataset', 'MPI'], ['isValidation', 0.0], ['img_paths', '003353243.jpg'], ['img_width', 1280.0], ['img_height', 720.0], ['objpos', [984.0, 97.0]], ['joint_self', [[991.0, 109.0, 0.0], [972.0, 101.0, 0.0], [1040.0, 47.0, 1.0], [1071.0, 116.0, 1.0], [999.0, 222.0, 1.0], [1033.0, 248.0, 0.0], [1056.0, 82.0, 1.0], [942.0, 96.0, 1.0], [937.583, 95.954, 1.0], [851.417, 95.046, 1.0], [962.0, 39.0, 0.0], [0.0, 0.0, 0.0], [926.0, 52.0, 1.0], [957.0, 139.0, 1.0], [980.0, 211.0, 1.0], [926.0, 257.0, 1.0]]], ['scale_provided', 2.585], ['joint_others', [[672.0, 231.0, 1.0], [677.0, 151.0, 1.0], [672.0, 12.0, 1.0], [745.0, 89.0, 0.0], [757.0, 127.0, 1.0], [651.0, 65.0, 0.0], [709.0, 51.0, 0.0], [800.0, 67.0, 0.0], [780.16, 67.863, 1.0], [865.84, 64.137, 1.0], [707.0, 94.0, 1.0], [673.0, 22.0, 1.0], [763.0, 71.0, 1.0], [837.0, 62.0, 0.0], [814.0, 140.0, 1.0], [790.0, 220.0, 1.0]]], ['scale
kwargs: {}
Exception: AttributeError("'NoneType' object has no attribute 'keys'",)`
from deeplake.
I would probably need to look at the code to help you out, but it looks like an issue in how the generator's __call__ method is implemented.
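One plausible reading of the traceback, without having seen the code: the KeyError is caught inside the generator and passed to logger.error, the method then ends without returning anything, and the caller fails later when it asks the resulting None for its keys. The sketch below reproduces that pattern in plain Python. The function names are hypothetical, and the way the KeyError is triggered (indexing a dict with 0) is only a guess at what happened in the original code.

```python
import logging

logger = logging.getLogger(__name__)


def swallowing_call(item):
    """Mimics a __call__ that logs the error and then falls off the end."""
    try:
        # item is a dict, so item[0] raises KeyError: 0
        return {"img_paths": item[0]["img_paths"]}
    except Exception as e:
        logger.error(e)
        # No return and no re-raise here, so the function returns None.


def fixed_call(item):
    """Indexes the dict by field name and lets any real error propagate."""
    return {"img_paths": item["img_paths"]}


item = {"img_paths": "003353243.jpg"}

result = swallowing_call(item)
print(result)  # None

try:
    result.keys()
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'keys'

print(fixed_call(item))
```

If this is what is happening, re-raising inside the except block (or removing the try/except while debugging) would surface the original KeyError instead of the less helpful AttributeError.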
from deeplake.
@AbhinavTuli Here is the code. How long do you think it will take to store this 13 GB of data?
from deeplake.
@AbhinavTuli @kristinagrig06 @davidbuniat the dataset is uploaded. It's visible in the app, and I've loaded it and used it. It's working fine, so can I send the PR now with example code?
from deeplake.
@AbhinavTuli I've sent the PR, but one of the checks is failing. Can you help me understand it?
from deeplake.
@sanchitvj there is a linting error with Black. If you can fix it, we're happy to merge! Thanks for making the dataset!
from deeplake.
@davidbuniat All build checks passed.
from deeplake.
@sanchitvj awesome! Once we've checked that the dataset is working, we will merge the PR! Thanks for the awesome job!
from deeplake.