Comments (8)
Hi @libenchong,
This is not really a bug but a feature of PyTorch Lightning. When you keep fast_dev_run=True, you are asking for a quick sanity check ("dev" mode): Lightning feeds the network exactly one batch from the train dataset and one batch from each validation set, just to verify the forward pass works before you commit to a full training run.
In the example above you are using a batch size of 60, which is why r_list contains 60 elements (during validation we process references first, then queries sequentially). PyTorch Lightning fed only one batch and then called the get_recalls method, which of course cannot compute recalls when there are no queries at all.
If you want to start training, just comment out the following line:
fast_dev_run=True # comment if you want to start training the network and saving checkpoints
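For context, the flag is just an argument to the Lightning Trainer; a minimal sketch (assuming a standard pl.Trainer setup, other arguments omitted):

```python
import pytorch_lightning as pl

# fast_dev_run=True pushes a single batch through the train and
# validation loops as a smoke test; no checkpoints are written.
trainer = pl.Trainer(fast_dev_run=True)

# Set fast_dev_run=False (or drop the argument) for a real run.
```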
from gsv-cities.
I'll add a condition in validation_epoch_end to make sure we don't run get_validation_recalls when we are in dev mode.
Thank you
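A minimal sketch of such a guard (the function and argument names here are assumed for illustration, not the actual gsv-cities API):

```python
def safe_get_recalls(r_list, q_list, get_recalls):
    """Skip recall computation when dev mode left us with no queries."""
    if len(q_list) == 0:
        # fast_dev_run only fed one batch, so no query descriptors exist yet
        return None
    return get_recalls(r_list, q_list)
```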
Hi @amaralibey,
Thanks for your guidance. When I comment out the line:
fast_dev_run=True # comment if you want to start training the network and saving checkpoints
I still hit the same bug: the length of feats is far less than num_references, so the divisor in utils.get_validation_recalls() is 0 and the result is NaN.
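The NaN (rather than an exception) is what a zero divisor produces in NumPy float division. A small illustration of the failure mode, not the actual utils code:

```python
import numpy as np

# If feats holds fewer rows than num_references, q_list ends up empty.
num_queries = 0
correct_at_k = np.zeros(3, dtype=float)   # hits at k = 1, 5, 10

with np.errstate(invalid='ignore'):       # silence the 0/0 RuntimeWarning
    recalls = correct_at_k / num_queries  # 0.0 / 0 -> nan for every k

print(recalls)  # [nan nan nan]
```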
This bug appears in the MixVPR code, where fast_dev_run=True is already commented out.
Can you tell me what the output of print(len(val_dataset)) is when you put it below the line feats = torch.concat(val_step_outputs[i], dim=0)?
Hi @amaralibey, for pitts30k: len(val_dataset) = 17608, feats.shape = [8804, 8192]; for msls_val: len(val_dataset) = 19611, feats.shape = [9806, 8192].
Because I'm not familiar with the pytorch_lightning code, I wrote my own code to extract the features with plain PyTorch. Can you check whether my version is correct?
```python
for i, (val_set_name, val_dataset) in enumerate(zip(dm.val_set_names, dm.val_datasets)):
    feats = torch.concat(val_step_outputs[i], dim=0)

    # libenchong: re-extract features with plain PyTorch -- begin
    val_dataloader = DataLoader(val_dataset, batch_size=20, num_workers=4)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    Feat = torch.empty((len(val_dataset), 8192))
    for iteration, (places, labels) in enumerate(val_dataloader, 1):
        places = places.to(device)
        # calculate descriptors and scatter them into Feat by dataset index
        descriptors = self(places)
        Feat[labels.detach(), :] = descriptors.detach().cpu()
    # libenchong: re-extract features -- end

    if 'pitts' in val_set_name:
        # split into references and queries
        num_references = val_dataset.dbStruct.numDb
        num_queries = len(val_dataset) - num_references
        positives = val_dataset.getPositives()
    elif 'msls' in val_set_name:
        # split into references and queries
        num_references = val_dataset.num_references
        num_queries = len(val_dataset) - num_references
        positives = val_dataset.pIdx
    else:
        print(f'Please implement validation_epoch_end for {val_set_name}')
        raise NotImplementedError

    # r_list = feats[:num_references]
    # q_list = feats[num_references:]
    r_list = Feat[:num_references]
    q_list = Feat[num_references:]
    pitts_dict = utils.get_validation_recalls(r_list=r_list,
                                              q_list=q_list,
                                              k_values=[1, 5, 10, 15, 20, 25],
                                              gt=positives,
                                              print_results=True,
                                              dataset_name=val_set_name,
                                              faiss_gpu=self.faiss_gpu)
    del r_list, q_list, feats, num_references, positives, Feat

    self.log(f'{val_set_name}/R1', pitts_dict[1], prog_bar=False, logger=True)
    self.log(f'{val_set_name}/R5', pitts_dict[5], prog_bar=False, logger=True)
    self.log(f'{val_set_name}/R10', pitts_dict[10], prog_bar=False, logger=True)
    print('\n\n')
```
Hi @amaralibey, I think I have found the reason. I ran the code on two GPUs, so the length of feats is half of what it is on a single GPU:
trainer = pl.Trainer(accelerator='gpu', devices=[0, 1], default_root_dir=f'./LOGS/{model.encoder_arch}')
[screenshots: validation recall output for a single GPU vs. two GPUs]
#6 (comment)
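That explanation is consistent with the numbers reported above: with DDP on two GPUs, Lightning wraps each validation dataloader in a DistributedSampler, so each process sees roughly half the samples. A rough sketch of the round-robin split (simplified; the real sampler also pads ranks to equal length):

```python
def shard_indices(dataset_len, world_size, rank):
    # DistributedSampler-style round-robin assignment (padding ignored)
    return list(range(rank, dataset_len, world_size))

# pitts30k: 17608 samples over 2 GPUs -> 8804 on rank 0,
# matching the observed feats.shape[0] = 8804
print(len(shard_indices(17608, 2, 0)))  # 8804
# msls_val: 19611 samples -> 9806 on rank 0, matching 9806
print(len(shard_indices(19611, 2, 0)))  # 9806
```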