Comments (7)

ChrisWu1997 commented on June 3, 2024

It looks like an incompatibility between CUDA, cuDNN, and PyTorch. Are you able to run other scripts successfully, like train.py or test.py?
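One quick way to see which versions are actually in play is to print what PyTorch itself reports. This is only a minimal sketch: `report_versions` is a hypothetical helper name (not part of DeepCAD), and it assumes PyTorch may or may not be importable in the active environment.

```python
def report_versions():
    """Collect the PyTorch / CUDA / cuDNN versions visible to this environment."""
    info = {}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds)
    info["cuda_build"] = torch.version.cuda
    # cuDNN version as an integer, or None if cuDNN is unavailable
    info["cudnn"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```

If the CUDA build reported here is older than what the GPU or driver expects, a mismatch like the one suspected above becomes more plausible.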

from deepcad.

gxu-tz commented on June 3, 2024

When I run python train.py --exp_name newDeepCAD -g 0, I got

Traceback (most recent call last):
File "train.py", line 62, in <module>
main()
File "train.py", line 35, in main
outputs, losses = tr_agent.train_func(data)
File "/public1/tz/DeepCAD/trainer/base.py", line 118, in train_func
outputs, losses = self.forward(data)
File "/public1/tz/DeepCAD/trainer/trainerAE.py", line 27, in forward
outputs = self.net(commands, args)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/public1/tz/DeepCAD/model/autoencoder.py", line 154, in forward
z = self.encoder(commands_enc_, args_enc_)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/public1/tz/DeepCAD/model/autoencoder.py", line 74, in forward
src = self.embedding(commands, args, group_mask)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/public1/tz/DeepCAD/model/autoencoder.py", line 32, in forward
self.embed_fcn(self.arg_embed((args + 1).long()).view(S, N, -1)) # shift due to -1 PAD_VAL
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/nn/functional.py", line 1612, in linear
output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

gxu-tz commented on June 3, 2024

I switched the environment to CUDA 11.1, cuDNN 8.0.4, and PyTorch 1.8.0. That solved the previous problem, but a new one emerged.
When I run python pc2cad.py --exp_name pretrained --ae_ckpt 1000 -g 0 --pc_root /public1/tz/DeepCAD/data/pc_cad, I got

Traceback (most recent call last):
File "pc2cad.py", line 244, in <module>
for b, data in enumerate(pbar):
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/tqdm/std.py", line 1107, in __iter__
for obj in iterable:
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 2.
Original Traceback (most recent call last):
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "pc2cad.py", line 186, in __getitem__
return self.__getitem__(index + 1)
File "pc2cad.py", line 186, in __getitem__
return self.__getitem__(index + 1)
File "pc2cad.py", line 186, in __getitem__
return self.__getitem__(index + 1)
File "pc2cad.py", line 183, in __getitem__
data_id = self.all_data[index]
IndexError: list index out of range

ChrisWu1997 commented on June 3, 2024

Did you run python json2pc.py first to get all training and testing point clouds?

gxu-tz commented on June 3, 2024

When I run python json2pc.py, I got this error:

[Parallel(n_jobs=-1)]: Done 126816 tasks | elapsed: 7.0min
convert point cloud failed: 0041/00415456
convert point cloud failed: 0056/00560203
create_CAD failed: 0078/00787173
convert point cloud failed: 0077/00777387
Warning: tmp_out_00336192.stl file already exists and will be replaced
create_CAD failed: 0060/00604130
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
exception calling callback for <Future at 0x14f517023c10 state=finished raised TerminatedWorkerError>
Traceback (most recent call last):
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/parallel.py", line 340, in __call__
self.parallel.dispatch_next()
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/parallel.py", line 769, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/parallel.py", line 835, in dispatch_one_batch
self._dispatch(tasks)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/parallel.py", line 754, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 551, in apply_async
future = self._workers.submit(SafeFunction(func))
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 160, in submit
fn, *args, **kwargs)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1027, in submit
raise self._flags.broken
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGSEGV(-11)}
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
face_normals didn't match triangles, ignoring!
Warning: 1 face has been skipped due to null triangulation
face_normals didn't match triangles, ignoring!
Traceback (most recent call last):
File "json2pc.py", line 84, in <module>
Parallel(n_jobs=-1, verbose=2)(delayed(process_one)(x) for x in all_data["train"])
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/parallel.py", line 1017, in __call__
self.retrieve()
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/parallel.py", line 909, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 562, in wrap_future_result
return future.result(timeout=timeout)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/concurrent/futures/_base.py", line 435, in result
return self.__get_result()
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/parallel.py", line 340, in __call__
self.parallel.dispatch_next()
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/parallel.py", line 769, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/parallel.py", line 835, in dispatch_one_batch
self._dispatch(tasks)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/parallel.py", line 754, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 551, in apply_async
future = self._workers.submit(SafeFunction(func))
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 160, in submit
fn, *args, **kwargs)
File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1027, in submit
raise self._flags.broken
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGSEGV(-11)}

So I didn't get all the point clouds. I think that may have caused the previous problem.
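To confirm that guess, it may help to list which ids have no point-cloud file on disk. A rough sketch under stated assumptions: `find_missing` is a hypothetical helper, and both the `.ply` extension and the `<pc_root>/<data_id>.ply` layout are assumptions that should be adjusted to whatever json2pc.py actually writes.

```python
import os


def find_missing(data_ids, pc_root, ext=".ply"):
    """Return the ids whose point-cloud file is absent under pc_root.

    Assumes each id such as "0041/00415456" maps to <pc_root>/0041/00415456<ext>.
    """
    missing = []
    for data_id in data_ids:
        path = os.path.join(pc_root, data_id + ext)
        if not os.path.exists(path):
            missing.append(data_id)
    return missing
```

Called with the test ids and the `--pc_root` directory, a non-empty result would mean the dataset in pc2cad.py can hit ids with no data behind them.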

ChrisWu1997 commented on June 3, 2024

This segmentation fault is caused by OpenCascade. Some CAD models cannot be converted to point clouds successfully. You can find the problematic data by replacing the Parallel execution with a plain for loop and printing each data_id to see which one triggers the crash, then skip it in the next run. Another quick solution is to simply give up on the unprocessed data (if there are not too many) and replace the following line in pc2cad.py

return self.__getitem__(index + 1)
with

return self.__getitem__(random.randint(0, self.__len__() - 1))
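For illustration, here is a toy version of that fallback. Note that `random.randint(a, b)` includes both endpoints, so the upper bound has to be the last valid index; `random.randrange(n)` avoids the off-by-one by drawing from 0 to n - 1. `SkippingDataset` and its `available` argument are hypothetical names for this sketch, not the actual class in pc2cad.py.

```python
import random


class SkippingDataset:
    """Toy dataset that substitutes a random item when an id has no data."""

    def __init__(self, all_data, available):
        self.all_data = list(all_data)
        # Keep only ids that are both listed and successfully processed
        self.available = set(available) & set(self.all_data)

    def __len__(self):
        return len(self.all_data)

    def __getitem__(self, index):
        if not self.available:
            raise RuntimeError("no usable items in the dataset")
        data_id = self.all_data[index]
        # Redraw until we land on an id that has data; randrange(n) is 0..n-1
        while data_id not in self.available:
            data_id = self.all_data[random.randrange(len(self))]
        return data_id
```

A loop is used here instead of recursion so that a long run of missing items cannot exhaust the recursion limit. The trade-off is that some samples are drawn more than once per epoch.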

gxu-tz commented on June 3, 2024

Thank you for your advice. I wrote a for loop, found the problematic data_id, and successfully ran pc2cad.py.
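For anyone hitting the same crash, the loop can look roughly like the sketch below. It is not the exact code used here: `run_sequentially` and `skip_ids` are hypothetical names, and `process_one` stands for the per-item function that json2pc.py passes to Parallel.

```python
def run_sequentially(data_ids, process_one, skip_ids=()):
    """Process ids one at a time, printing each id before it is handled.

    A segmentation fault cannot be caught from Python, so the last id
    printed before the process dies is the one to add to skip_ids.
    Ordinary Python exceptions are caught and collected instead.
    """
    skip = set(skip_ids)
    failed = []
    for data_id in data_ids:
        if data_id in skip:
            continue
        # flush=True so the id reaches the terminal even if the process crashes
        print("processing:", data_id, flush=True)
        try:
            process_one(data_id)
        except Exception as exc:
            print("failed:", data_id, exc, flush=True)
            failed.append(data_id)
    return failed
```

After each crash, add the last printed id to `skip_ids` and rerun until the loop completes.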
