aipixel / gps-gaussian

[CVPR 2024 Highlight] The official repo for “GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis”

Home Page: https://shunyuanzheng.github.io/GPS-Gaussian

License: MIT License

Python 100.00%

gps-gaussian's Introduction

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Shunyuan Zheng†,1, Boyao Zhou2, Ruizhi Shao2, Boning Liu2, Shengping Zhang*,1,3, Liqiang Nie1, Yebin Liu2

1Harbin Institute of Technology   2Tsinghua University   3Peng Cheng Laboratory
*Corresponding author   †Work done during an internship at Tsinghua University

Introduction

We propose GPS-Gaussian, a generalizable pixel-wise 3D Gaussian representation for synthesizing novel views of any unseen characters instantly without any fine-tuning or optimization.

multi_person_live.mp4

Installation

To deploy and run GPS-Gaussian, run the following commands:

conda env create --file environment.yml
conda activate gps_gaussian

Then, compile diff-gaussian-rasterization from the 3DGS repository:

git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
cd gaussian-splatting/
pip install -e submodules/diff-gaussian-rasterization
cd ..
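
(Optional) As a quick sanity check that the extension was built and installed into the active environment, the module should import cleanly; a minimal check:

# minimal import check, run inside the gps_gaussian environment
import torch
import diff_gaussian_rasterization
print(torch.__version__, 'diff-gaussian-rasterization imported OK')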

(Optional) RAFT-Stereo provides a faster CUDA implementation of the correlation sampler, which speeds up the model without affecting the results:

git clone https://github.com/princeton-vl/RAFT-Stereo.git
cd RAFT-Stereo/sampler && python setup.py install && cd ../..

If you compiled this CUDA implementation, set corr_implementation='reg_cuda' in config/stereo_human_config.py; otherwise, use corr_implementation='reg'.
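
For reference, this flag mirrors RAFT-Stereo's own dispatch between its pure-PyTorch correlation volume and the compiled CUDA sampler; a minimal sketch using RAFT-Stereo's class names (the module path core.corr is from the RAFT-Stereo repo, and how GPS-Gaussian consumes the flag may differ):

def build_corr_block(fmap1, fmap2, corr_implementation='reg', num_levels=4, radius=4):
    # 'reg_cuda' uses the compiled CUDA sampler; 'reg' falls back to pure PyTorch
    if corr_implementation == 'reg_cuda':
        from core.corr import CorrBlockFast1D as CorrBlock
    else:
        from core.corr import CorrBlock1D as CorrBlock
    return CorrBlock(fmap1, fmap2, num_levels=num_levels, radius=radius)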

Run on synthetic human dataset

Dataset Preparation

  • We provide a rendered THuman2.0 dataset for GPS-Gaussian training in a 16-camera setting. Download render_data from Baidu Netdisk or OneDrive and unzip it. Since we recommend rectifying the source images and computing the disparity offline, the saved files plus the downloaded data require around 50 GB of free storage space.
  • To train a more robust model, we recommend collecting more human scans for training (e.g. Twindom, RenderPeople, 2K2K). Then render the training data to match the target scenario, including the number of cameras and the radius of the scene. We provide rendering code to generate training data from human scans; see the data documentation for more details, and the camera-ring sketch after this list for intuition.
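
For intuition, a minimal sketch of placing N cameras on a ring of a given radius around the subject and writing their world-to-camera extrinsics (illustrative only: the camera count, radius, and look_at helper are assumptions, not the provided rendering code):

import numpy as np

def look_at(cam_pos, target=np.zeros(3), up=np.array([0.0, 1.0, 0.0])):
    # build a world-to-camera rotation R and translation T (OpenCV convention assumed)
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    down = np.cross(forward, right)
    R = np.stack([right, down, forward], axis=0)  # rows are the camera axes
    T = -R @ cam_pos
    return R, T

num_cams, radius = 16, 2.0  # assumed values; match them to your capture setup
for i in range(num_cams):
    theta = 2 * np.pi * i / num_cams
    cam_pos = np.array([radius * np.cos(theta), 0.0, radius * np.sin(theta)])
    R, T = look_at(cam_pos)
    extr = np.concatenate([R, T[:, None]], axis=1)  # 3x4 world-to-camera extrinsic
    # np.save(f'parm/extr_{i:02d}.npy', extr)  # hypothetical file layout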

Training

Note: At the first training run, we perform stereo rectification and compute the disparity offline; the processed data is saved to render_data/rectified_local. This process takes several hours but greatly speeds up subsequent training. If you want to skip this pre-processing, set use_processed_data=False in stage1.yaml and stage2.yaml.

  • Stage1: pretrain the depth prediction model. Set data_root in stage1.yaml to the path of the unzipped render_data folder.
python train_stage1.py
  • Stage2: train the full model. Set data_root in stage2.yaml to the path of the unzipped render_data folder, and set stage1_ckpt in stage2.yaml to the path of the pretrained stage 1 model.
python train_stage2.py
  • We provide the pretrained model GPS-GS_stage2_final.pth in Baidu Netdisk and OneDrive for fast evaluation and testing.

Testing

  • Real-world data: download the test data real_data from Baidu Netdisk or OneDrive. Then run the following command to synthesize a fixed novel view between src_view 0 and 1; the position of the novel viewpoint between the source views is controlled by a ratio ranging from 0 to 1 (see the interpolation sketch after this list):
python test_real_data.py \
--test_data_root 'PATH/TO/REAL_DATA' \
--ckpt_path 'PATH/TO/GPS-GS_stage2_final.pth' \
--src_view 0 1 \
--ratio=0.5
  • Free-view rendering: run the following command to interpolate free viewpoints between the source views; modify novel_view_nums to set the number of novel viewpoints.
python test_view_interp.py \
--test_data_root 'PATH/TO/RENDER_DATA/val' \
--ckpt_path 'PATH/TO/GPS-GS_stage2_final.pth' \
--novel_view_nums 5
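
For intuition, the ratio parameterizes a virtual camera between the two source cameras; a minimal sketch of such an interpolation (illustrative only, assuming SciPy is available; the repo's get_novel_calib may differ in details):

import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interp_pose(extr0, extr1, ratio):
    # interpolate two 3x4 world-to-camera extrinsics: slerp the rotations,
    # lerp the camera centers; ratio=0 gives view 0, ratio=1 gives view 1
    R0, t0 = extr0[:, :3], extr0[:, 3]
    R1, t1 = extr1[:, :3], extr1[:, 3]
    c0, c1 = -R0.T @ t0, -R1.T @ t1  # camera centers in world space
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix([R0, R1]))
    R = slerp(ratio).as_matrix()
    c = (1.0 - ratio) * c0 + ratio * c1
    return np.concatenate([R, (-R @ c)[:, None]], axis=1)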

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{zheng2024gpsgaussian,
  title={GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis},
  author={Zheng, Shunyuan and Zhou, Boyao and Shao, Ruizhi and Liu, Boning and Zhang, Shengping and Nie, Liqiang and Liu, Yebin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}

gps-gaussian's People

Contributors

shunyuanzheng

gps-gaussian's Issues

About real-time rendering performance

Hi authors, I tested the rendering speed with the following code, but I only get about 4 fps, far below the 25 fps reported in the paper. Is there something wrong with my procedure, or is it because of a different GPU?

        fps = []
        for idx in tqdm(range(total_frames)):
            item = self.dataset.get_test_item(idx, source_id=view_select)
            data = self.fetch_data(item)

            data = get_novel_calib(data, self.cfg.dataset, ratio=ratio, intr_key='intr_ori', extr_key='extr_ori')
            start_time = time.time() 
            with torch.no_grad():
                data, _, _ = self.model(data, is_train=False) 
                data = pts2render(data, bg_color=self.cfg.dataset.bg_color)
            end_time = time.time()  
            synthesis_time = end_time - start_time
            fps.append(1/synthesis_time)
            render_novel = self.tensor2np(data['novel_view']['img_pred'])
            
            cv2.imwrite(test_path  + '/%s_novel.jpg' % (data['name']), render_novel)

        print(sum(fps)/ len(fps))
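
For what it is worth, a GPU-synchronized timer (a minimal sketch, not the repo's code) gives more reliable wall-clock numbers around asynchronous CUDA calls:

import time
import torch

def timed(fn, *args, **kwargs):
    # synchronize before and after so asynchronous CUDA kernels are fully counted
    torch.cuda.synchronize()
    start = time.time()
    out = fn(*args, **kwargs)
    torch.cuda.synchronize()
    return out, time.time() - start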

Exporting the Gaussians

I am trying to export the Gaussians to a PLY file. When I open the PLY file with a Gaussian viewer (SuperSplat), the Gaussians are too big, as shown in the following image of a partial reconstruction of a person (the blue dots are the centers of the Gaussians):
[image]

The problem seems to be the scale parameters of the Gaussians. If I set them to -5, I get the following result:
[image]

TypeError: rasterize_gaussians(): incompatible function arguments. The following argument types are supported:

Has anybody resolved this bug? In other 3DGS projects I have never met this bug, only in GPS-Gaussian. How can I solve it?

TypeError: rasterize_gaussians(): incompatible function arguments. The following argument types are supported:
1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: torch.Tensor, arg5: torch.Tensor, arg6: torch.Tensor, arg7: torch.Tensor, arg8: float, arg9: torch.Tensor, arg10: torch.Tensor, arg11: torch.Tensor, arg12: float, arg13: float, arg14: int, arg15: int, arg16: torch.Tensor, arg17: int, arg18: torch.Tensor, arg19: bool, arg20: bool) -> Tuple[int, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]

Training with Multiple GPUs

Hi @ShunyuanZheng, thanks for sharing the code. Amazing work! Do you plan to release code for training on multiple GPUs? Training on a single GPU takes a lot of time. I tried to add data parallelism, but bugs appeared.

I would appreciate it if you could provide some comments or suggestions for faster training.

Regarding Camera Coordinate

Hi,

Apologies for reaching out again. Could you please confirm if the coordinates used are in the OpenCV format?

Thank you!

Two-camera setup for inference?

Hello,

If I train with an eight-camera setup and then test using just a two-camera setup, would that be feasible, assuming that the novel viewpoint always lies between the two cameras? It seems the inference code only needs two neighboring views?

Thanks!

Question about training on an 8-camera setting

Thank you for sharing the code and pre-processed dataset.

They are very helpful for my research.

But I have a question about training on an 8-camera setting.

I got the pre-processed dataset from here and merged every two view (camera) folders into one folder to convert the 16-view dataset into 8 views
(e.g. 0004_000, 0004_001 (16 views) --> 0004_000 (8 views)).
But the results were bad when using the 8-view dataset (especially for the depth estimation module in stage 1).

Is there something else I need to do when merging the folders?

Thank you.

Regarding multi GPU training

I apologize for reaching out again, but I've encountered an error in diff-gaussian-rasterization while using multiple GPUs (vanilla PyTorch DDP). It appears that the rasterizer's device may not be correctly configured, resulting in an illegal memory access error on all ranks except rank 0.

Have you experienced this issue before? If so, could you share how you managed to resolve it?

Thank you for your time and assistance!

Error occurs on other data sets.

Hi! Great work! The following error occurs when I run on other datasets. Do you have any suggestions? Thank you!
File "/data4/hule/CODE/GS/GPS-Gaussian/gaussian_renderer/init.py", line 62, in render
cov3D_precomp=None)
File "/data4/hule/opt/anaconda3/envs/gs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data4/hule/opt/anaconda3/envs/gs/lib/python3.7/site-packages/diff_gaussian_rasterization/init.py", line 219, in forward
raster_settings,
File "/data4/hule/opt/anaconda3/envs/gs/lib/python3.7/site-packages/diff_gaussian_rasterization/init.py", line 41, in rasterize_gaussians
raster_settings,
File "/data4/hule/opt/anaconda3/envs/gs/lib/python3.7/site-packages/diff_gaussian_rasterization/init.py", line 90, in forward
raise ex
File "/data4/hule/opt/anaconda3/envs/gs/lib/python3.7/site-packages/diff_gaussian_rasterization/init.py", line 86, in forward
num_rendered, color, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: an illegal memory access was encountered

Question about image matting.

Hello, I saw that your image matting algorithm is very impressive, with high real-time performance and high accuracy. Could you please tell me what algorithm you used?

How to deal with sideways camera shots for new view generation?

First of all, thank you for your excellent work. I am having some problems running your code. Due to field-of-view limitations, we rotate the camera 90 degrees so that the human body appears sideways in the image. In this case, effective novel view synthesis does not seem possible. Is this because the training dataset contains only upright subjects, and where should I change the setup so that it works?

Poor results when using pretrained model in sparse view setting

Hello, thanks for your wonderful work.

We employ the pretrained model offered in BaiduNetDisk without finetuning to predict in a sparse view setting (67.5°).
Input views:
[image: view 0]
[image: view 1]

Interpolated target view:
[image: 0000_000_novel00]
The results are much worse than the sparse view experiment shown in the paper.
The problem is similar to #32 where Gaussian regression network fails to learn how to combine the Gaussian from the left/right views.

It seems that the pretrained model cannot be used directly in sparse settings.
If so, will fine-tuning on sparse-view datasets alleviate the problem?

Thanks for your time!

code release?

Thanks for sharing such wonderful work. I would like to know when you will release the training code.

IndexError: too many indices for tensor of dimension 4

Hi!

I tried your code, but an error occurred.
Could you help me?
Please give me some advice.

$ python train_stage1.py
2023-12-27 09:32:44,487 INFO [human_loader.py:133] Using local data in /home/shi3z/git/GPS-Gaussian/render_data/rectified_local/train ...
2023-12-27 09:32:44,583 INFO [human_loader.py:133] Using local data in /home/shi3z/git/GPS-Gaussian/render_data/rectified_local/val ...
0%| | 0/40000 [00:14<?, ?it/s]
Traceback (most recent call last):
  File "/home/shi3z/git/GPS-Gaussian/train_stage1.py", line 175, in <module>
    trainer.train()
  File "/home/shi3z/git/GPS-Gaussian/train_stage1.py", line 56, in train
    _, flow_loss, metrics = self.model(data)
  File "/home/shi3z/.pyenv/versions/miniconda3-4.7.12/envs/gps_gaussian/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/shi3z/git/GPS-Gaussian/lib/network.py", line 37, in forward
    flow_loss, metrics = sequence_loss(flow_predictions, flow, valid)
  File "/home/shi3z/git/GPS-Gaussian/lib/loss.py", line 15, in sequence_loss
    assert not torch.isinf(flow_gt[valid.bool()]).any()
IndexError: too many indices for tensor of dimension 4

3DoF or 6DoF?

thanks for the nice work!

In my understanding, GPS-Gaussian leverages two adjacent views to synthesize a novel view. If we want a view that is far from all source views, will GPS-Gaussian still work?

Confusion About Pixel Coordinates

I am a little confused about whether pixel coordinates represent the pixel centers or the pixel corners.

  • If the pixel coordinates represent pixel corners, rescaling the image simply amounts to multiplying the first two rows of the intrinsic matrix, as done in human_loader.py
  • But looking at the implementation of depth2pts here shows that pixel coordinates represent the pixel centers. In this case, the pixel coordinates need to be transformed differently, as shown in this Stackoverflow post

I'd appreciate it if you could explain which is the case here.

Best,
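
For context, a minimal sketch of how the two conventions differ when rescaling a 3x3 intrinsic matrix by a factor s (illustrative only, not the repo's code):

import numpy as np

def rescale_intrinsics(K, s, pixel_centers=True):
    # corner convention: pure multiplication of the first two rows;
    # center convention: the principal point also absorbs a half-pixel offset
    K = K.copy()
    if pixel_centers:
        K[0, 0] *= s
        K[1, 1] *= s
        K[0, 2] = (K[0, 2] + 0.5) * s - 0.5
        K[1, 2] = (K[1, 2] + 0.5) * s - 0.5
    else:
        K[:2, :] *= s
    return K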

question about projection_matrix function

Thank you for your excellent work! What does this function do? What do the znear and zfar parameters mean? Do I need to change these values when using a custom dataset?

projection_matrix = getProjectionMatrix(znear=self.opt.znear, zfar=self.opt.zfar, K=intr, h=height, w=width).transpose(0, 1)
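
For reference, znear and zfar are the near and far clipping planes of the view frustum; for human capture they only need to bracket the subject's distance from the camera. A perspective projection built from pinhole intrinsics typically looks like the following sketch (an illustrative OpenGL-style matrix with z pointing forward; the repo's getProjectionMatrix may use a different convention):

import numpy as np

def projection_from_intrinsics(K, h, w, znear=0.01, zfar=100.0):
    # maps camera-space points to clip space; depth in [znear, zfar] maps to [-1, 1]
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    P = np.zeros((4, 4))
    P[0, 0] = 2 * fx / w
    P[1, 1] = 2 * fy / h
    P[0, 2] = 2 * cx / w - 1
    P[1, 2] = 2 * cy / h - 1
    P[2, 2] = (zfar + znear) / (zfar - znear)
    P[2, 3] = -2 * zfar * znear / (zfar - znear)
    P[3, 2] = 1.0
    return P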

Exception in thread Thread-1 (_pin_memory_loop)

After printing "FINISHED TRAINING" in stage 2, the following exception is raised:

Exception in thread Thread-1 (_pin_memory_loop):
Traceback (most recent call last):
  File "/home/cyz/miniconda3/envs/gs/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/cyz/miniconda3/envs/gs/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cyz/miniconda3/envs/gs/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 49, in _pin_memory_loop
    do_one_step()
  File "/home/cyz/miniconda3/envs/gs/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 26, in do_one_step
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/home/cyz/miniconda3/envs/gs/lib/python3.10/multiprocessing/queues.py", line 122, in get

Have you encountered this problem?

Code finish or not?

I wonder whether the code is complete or not. If it is not complete, could you release a TODO list? Thanks a lot.

About batch size

Hello, thank you for your outstanding work! My GPU memory is limited, so during training I set the batch size to 1. Why is the result blurry compared to others using batch size 2 under the same configuration?
With batch size 2, even the first validation round in stage 2 is clear, but with batch size 1 every validation round is blurry, and increasing the number of rounds does not seem to help. Is it necessary to use a batch size >= 2?

Depth map based on THuman2.0

Thank you for your great work. I would like to ask a question. When generating depth maps based on THuman2.0, I set the radius to 0.8 m. In this case the depth values at closer positions are wrong, while the RGB images are correct. What is the reason for this, and how should I handle it? Thanks.

Failure cases in avatar data

Hi, thanks for your work! When I use the default settings on avatar data, I see some failure cases like:

[image: Fail2]
[image: Fail1]

It seems the Gaussian regression network fails to learn how to combine the Gaussians from the left/right views. Also, the stage 2 loss does not seem to converge well.
[image]

I am also not sure whether the stage 1 result is good enough; the final validation EPE is around 1.767.
[image]

Thanks for your time!

scale head

Hello, excellent work! I noticed that you clamp the scale, as shown in the following code. Why is this operation needed?

# scale head
scale_out = torch.clamp_max(self.scale_head(out), 0.01)

How to build a custom dataset?

Thanks for your excellent work!
Hi authors, I would like to ask:

  1. How do I convert my own dataset into the format required for GPS-Gaussian training? Is there any documentation?

The format of my own dataset:

|-image
    |- 00 # cam
        |- 0000.jpg # frame
        |- ...
|- extri.yaml
|- intri.yaml
  2. If I want to test on my own dataset, does the format have to match what is shown in real_data.zip? In real_data/parm/0001, every frame has the intrinsics and extrinsics of all 16 cameras. My understanding is that the cameras are fixed, so couldn't a single set of intrinsics and extrinsics be shared? Why are they duplicated?

Questions about the RAFT-Stereo depth estimation used in GPS-Gaussian

Hi authors, I have some questions about the differences between the RAFT-Stereo algorithm used in GPS-Gaussian and its original version.

About the concatenation of fmap1 and fmap2: from the source code, the main difference from the original RAFT-Stereo lies in the feature maps. Before entering FlowUpdateModule they are concatenated, producing fmap12 and fmap21 in the order of left image (1) and right image (2).

  1. The original RAFT-Stereo takes two images as input and outputs only the flow of the left image, so no concatenation is needed, and FlowUpdateModule receives fmap1 and fmap2 directly?
  2. GPS-Gaussian takes two images as input and wants the flows of both images at the same time, so it concatenates them, feeds fmap12 and fmap21 to FlowUpdateModule, and then splits the result to obtain the flows of the left and right images simultaneously?
  3. Furthermore, fmap12 and fmap21 need to be distinguished because corr(fmap1, fmap2) in CorrBlock1D is not commutative, i.e. torch.einsum('aijk,aijh->ajkh', fmap1, fmap2) and torch.einsum('aijk,aijh->ajkh', fmap2, fmap1) return different results, so the original inputs are concatenated along the batch dimension to form the two samples [(left, right), (right, left)], which makes it possible to output the flows of both the left and right images in the subsequent computation?

Thanks!
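
Regarding point 3, the non-commutativity is easy to verify with a small sketch (illustrative, not the repo's code):

import torch

fmap1 = torch.randn(1, 8, 4, 6)  # (batch, channels, height, width)
fmap2 = torch.randn(1, 8, 4, 6)
corr_12 = torch.einsum('aijk,aijh->ajkh', fmap1, fmap2)
corr_21 = torch.einsum('aijk,aijh->ajkh', fmap2, fmap1)
print(torch.allclose(corr_12, corr_21))                    # False in general
print(torch.allclose(corr_12, corr_21.transpose(-1, -2)))  # True: they are transposes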

Camera parameters in data loading

In human_loader.py:

projection_matrix = getProjectionMatrix(znear=self.opt.znear, zfar=self.opt.zfar, K=intr, h=height, w=width).transpose(0, 1)
world_view_transform = torch.tensor(getWorld2View2(R, T, np.array(self.opt.trans), self.opt.scale)).transpose(0, 1)

Why is world_view_transform the transpose of the world-to-camera transformation matrix here (i.e., why the extra transpose(0, 1))?
projection_matrix is also transposed with transpose(0, 1)?
intr and extr are the camera intrinsics and the world-to-camera extrinsics, right?
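
For context, in the original 3DGS code these matrices are stored transposed so that homogeneous points can be transformed as row vectors multiplied on the right; a minimal sketch of that convention (an assumption carried over from 3DGS, not verified against this repo):

import torch

world_view_transform = torch.eye(4)  # placeholder; normally getWorld2View2(...).transpose(0, 1)
p_world = torch.tensor([[0.1, 0.2, 1.5, 1.0]])  # (N, 4) homogeneous world points
p_cam = p_world @ world_view_transform          # equivalent to (M @ p_world^T)^T with M untransposed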

Why can the pipeline in this paper only reconstruct human bodies?

I'm new to this area and I've just read the paper. I'm wondering why the pipeline in this paper can only reconstruct human bodies. Which component introduces this limitation to the pipeline?
I'm terribly sorry if this question is stupid; I'm a newbie.

The output data

Hi Shunyuan,

Thanks for your nice work! I ran the test code, but the saved data was in JPG format. Could I save the output as 3D data in PLY format? In addition, if we would like to run inference on real people captured by cameras in real time, is any extra processing required?

Thank you😊

Dataset on the applicability of this method

Hello! Thanks for your excellent work! I have some questions about the dataset. Does the method only apply to humans captured in ring-camera scenes, or can it also handle LLFF data? The paper mentions that the proposed method needs to select left and right cameras as source views and relies on binocular depth estimation. Looking forward to your reply! Best wishes!

Questions regarding the performance v.s. conventional point cloud

Dear authors, thanks for releasing this excellent work!
Can I ask a naive question about this line of research on generalizable Gaussians: what are the advantages of generalizable Gaussians over a traditional point cloud? Since generalizable Gaussians usually mean pixel-aligned Gaussian points, the scales assigned to each Gaussian primitive should approach 0 (many are exactly 0 when running the code), which means the Gaussian primitives may degenerate into a point cloud.
Also, according to Fig. 5 in the supplementary, you compared the rendered view with the point-cloud reprojection and showed that GS can perform well in the presence of depth noise by learning the opacity. But when the depth estimation has voids (i.e., the second row in Fig. 5 (c/d) of your supplementary), how can your model complete the missing 3D information and recover the correctly placed legs, given the small scale upper bound of 0.01 set in your code?
Could you advise me on these questions? Thank you!

About the stability of the generated novel view video pictures

Thanks for your excellent work, and I have a problem.
When I generate the real-time novel-view video, the edges of the people and clothes shake, even for a still person. I guess this is because the depth estimation is unstable. How can I reduce the edge shaking?
Thanks for your help!

Bad results on a new dataset without fine-tuning

I directly used your published model weights (GPS-GS_stage2_final.pth) on a new dataset and got bad results.
The command is:
python test_view_interp.py --test_data_root 'data/humanNeRF-data' --ckpt_path 'data/GPS-GS_stage2_final.pth' --novel_view_nums 9
These are the results:
[image: 0029_novel06]
[image: 0039_novel03]

These are the source views:
[image]
Does it need fine-tuning on a new dataset, or maybe the source views are too sparse?

How to achieve 25fps inference speed

Great work!

Here's some profiling on the inference speed on a 3090.

[CUDA Timer] raft_stereo takes 26.7794 ms
[CUDA Timer] flow2gsparms takes 80.8899 ms
[CUDA Timer] .... flow2gsparms/gs_parm_regresser takes 77.3901 ms
[CUDA Timer] render takes 4.7777 ms

With the given real test images, gs_parm_regresser alone takes 77 ms, not to mention other parts like raft_stereo. Could you please give some suggestions on speeding it up?
How was the 25 fps claimed in the paper achieved?

Pre-processing in stage1. EOFError: No data left in file.

Hello, this is great work. You mentioned: "At the first training run, we perform stereo rectification and compute the disparity offline; the processed data is saved to render_data/rectified_local. This process takes several hours but greatly speeds up subsequent training." I did not skip this pre-processing and set data_root in stage1.yaml to the path of the unzipped render_data folder. I had completed 77% of the pre-processing (5245/6816) when the following error occurred:
Traceback (most recent call last): File "/mnt/GPS-Gaussian/train_stage1.py", line 174, in <module> trainer = Trainer(cfg) File "/mnt/GPS-Gaussian/train_stage1.py", line 28, in __init__ self.train_set = StereoHumanDataset(self.cfg.dataset, phase='train') File "/mnt/GPS-Gaussian/lib/human_loader.py", line 129, in __init__ self.save_local_stereo_data() File "/mnt/GPS-Gaussian/lib/human_loader.py", line 134, in save_local_stereo_data view0_data = self.load_single_view(sample_name, self.opt.source_id[0], hr_img=False, File "/mnt/GPS-Gaussian/lib/human_loader.py", line 198, in load_single_view intr, extr = np.load(intr_name), np.load(extr_name) File "/home/pai/envs/gps_gaussian/lib/python3.10/site-packages/numpy/lib/npyio.py", line 436, in load raise EOFError("No data left in file") EOFError: No data left in file
What is the reason for this? If you can help me solve this problem, I would greatly appreciate it.
Snipaste_2024-01-01_15-11-30

Background Color

I have trained GPS-Gaussian with a black background color (in stage 2 I use the default value of bg_color as defined in stereo_human_config.py), and then tested with black and white background colors. I have noticed that when I run GPS-Gaussian with different background colors, I get different foreground colors: the rendered person is brighter with a white background than with a black one. It seems that some Gaussians are transparent, or there are gaps between Gaussians, so the background color is also visible in the foreground.

Here is a rendered person with black and white background colors.
0130_novel_black
0130_novel_white

Do I really need to train GPS-Gaussian with the same background color that is used at inference? Why?
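
For intuition, the effect described above is consistent with standard alpha compositing: any residual transmittance at a foreground pixel lets the background color bleed through. A toy sketch (not the repo's renderer):

import torch

alpha = torch.tensor(0.9)           # accumulated foreground opacity at a pixel
fg = torch.tensor([0.5, 0.4, 0.3])  # blended foreground color
for bg in (torch.zeros(3), torch.ones(3)):  # black vs. white background
    pixel = alpha * fg + (1 - alpha) * bg
    print(pixel)                    # brighter result with a white background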

How can we test with a custom dataset?

What a nice work!

I want to test it with videos captured by multiple cameras; are there any specific data requirements?
I also noticed that the input image size is hard-coded as 1024x1024. Does it only support image sizes of 1024x1024 or 2048x2048?

Normalization of input images

Dear authors,
I wonder whether the input images are standardized during preprocessing (e.g. mean_vals = [0.471, 0.448, 0.408], std_vals = [0.234, 0.239, 0.242]); I can't find the relevant code.
If not, I wonder whether it would be beneficial to do so, since many pretrained models use such normalization.
Thanks

Problems about preparing the data

Hi, wonderful work! I have a few questions:

  1. I have tried to reproduce the results using my own setup, but I found that in your rendering code fx, fy, cx, and cy are set according to the image resolution instead of the intrinsics we calibrated. What is the intuition behind this?

  2. I then tried to adjust my cameras to your 16-view setting, but I only have 5 cameras and the lenses are not wide enough, so I placed them on one side with 22.5 degrees of rotation between neighboring cameras; the pitch is almost 0 (about 1-2 degrees of difference) and the images are 1080*1920.

In this case, do I need to set the base pitch to about 1-2 degrees? I found some flickering in the fingers in my tests, as shown below:

lin423_m.mp4

How to get good calibration

Hi,

When I was reproducing the results with my data, I also found the method very sensitive to the calibrated radius w2c[2,3]. I want to ask

  1. what is a good method/tool to calibrate the cameras?
  2. how many cameras do we need to get a good calibration?

Thanks!!

Question about the experimental setup

Hi authors, in the paper the experiments are set up with 8 cameras for training and the other 8 cameras for inference. But the provided GPS-GS_stage2_final.pth was trained with 16 cameras, right?
image

New Training Dataset

I am preparing a new training dataset. Do I need to run SMPL-X in order to normalize the orientation of the 3D human models?

Is a mask required at test time?

When running inference at test time, is a mask strictly required? Also, why are the mask images in RGB format rather than binary?

Training on Multi-View RGB-D Data and Testing on Single-View RGB-D

Hi there! I'm impressed with your work and had a question about the possibility of training a model on multi-view RGB-D data and then testing it on single-view RGB-D data. This scenario is particularly useful when training in a simulation environment and deploying the model in the real world.

I would greatly appreciate any insights or guidance on how to proceed with this. Thank you for your time and assistance!

Regarding depth map in training

Thanks for your interesting work. I have a question regarding the depth. If, during training, I utilize depth maps obtained directly from MVS rather than generating synthetic depth maps, could this significantly impact the model's performance?

Different sizes of stage1.pth

Excuse me, could you clear up my confusion: the stage1.pth downloaded from OneDrive is 74.23 MB, while the one I trained myself on THuman2.0 is 37 MB, and both of them work. Could you explain the reason for this difference?

Thank you!
