
main.py's bug? about gsv-cities HOT 8 CLOSED

libenchong avatar libenchong commented on June 24, 2024
main.py's bug?

from gsv-cities.

Comments (8)

amaralibey avatar amaralibey commented on June 24, 2024

Hi @libenchong,
This is not really a bug; it is a PyTorch Lightning feature. Keeping fast_dev_run=True means you only want to feed the network one batch to check that the forward pass works (they call it dev mode). In this case, PyTorch Lightning passes only one batch from the train dataset and one batch from each validation set, so you can make sure your network works before you start training.
In the example above you are using a batch size of 60, which is why r_list has 60 elements (in validation, the references come first, then the queries). PyTorch Lightning fed only one batch and then called the get_recalls method, which of course cannot compute recalls when there are no queries.

If you want to start training, just comment out the following line:
fast_dev_run=True # comment if you want to start training the network and saving checkpoints
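
To make the explanation above concrete, here is a minimal plain-Python sketch of what happens to the reference/query split when only one batch has been collected. The helper name split_refs_queries and the value of num_references are assumptions for illustration; only the batch size of 60 comes from the thread.

```python
def split_refs_queries(feats, num_references):
    """Split descriptors stored as [references..., queries...]."""
    r_list = feats[:num_references]
    q_list = feats[num_references:]
    return r_list, q_list


# Hypothetical sizes: the thread only gives the batch size of 60.
batch_size = 60
num_references = 10_000  # assumed value, for illustration only

# In dev mode only one validation batch is collected.
feats = list(range(batch_size))  # stand-in for 60 descriptors
r_list, q_list = split_refs_queries(feats, num_references)

print(len(r_list), len(q_list))  # 60 0
```

With no queries left after the split, any recall that divides by the number of queries has nothing to average over.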


amaralibey avatar amaralibey commented on June 24, 2024

I'll add a condition in validation_epoch_end to make sure we don't run get_validation_recalls if we are in dev mode.
Thank you
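
One possible shape for such a condition is sketched below in plain Python. The helper should_compute_recalls is hypothetical and not the code that was added to the repository; inside a LightningModule the flag would presumably be read from self.trainer.fast_dev_run, which is an assumption here.

```python
def should_compute_recalls(num_feats, dataset_len, fast_dev_run=False):
    """Return True only when every validation sample has a descriptor.

    num_feats    -- number of descriptors collected during validation
    dataset_len  -- len(val_dataset)
    fast_dev_run -- True when the trainer runs a single batch (dev mode)
    """
    if fast_dev_run:
        # Only one batch was fed, so the query part is missing.
        return False
    # Also skip when descriptors are missing for any other reason.
    return num_feats == dataset_len


print(should_compute_recalls(60, 17608, fast_dev_run=True))  # False
print(should_compute_recalls(17608, 17608))                  # True
```

Comparing the two lengths also catches cases other than dev mode in which fewer descriptors than samples reach the recall computation.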


libenchong avatar libenchong commented on June 24, 2024

Hi @amaralibey,
Thanks for your guidance. When I comment out the following line:
fast_dev_run=True # comment if you want to start training the network and saving checkpoints
I still get the same bug: the length of feats is far less than num_references, so the divisor in utils.get_validation_recalls() is 0 and the result is NaN.

libenchong avatar libenchong commented on June 24, 2024

This bug appears in the MixVPR code, where fast_dev_run=True is already commented out.


amaralibey avatar amaralibey commented on June 24, 2024

Can you tell me what the output of print(len(val_dataset)) is when you put it below the line feats = torch.concat(val_step_outputs[i], dim=0)?


libenchong avatar libenchong commented on June 24, 2024

Hi @amaralibey, for pitts30k, len(val_dataset) = 17608 and feats.shape = [8804, 8192]; for msls_val, len(val_dataset) = 19611 and feats.shape = [9806, 8192].
Because I'm not familiar with the PyTorch Lightning code, I wrote the feature extraction in plain PyTorch. Can you help me check whether my code is correct?
`for i, (val_set_name, val_dataset) in enumerate(zip(dm.val_set_names, dm.val_datasets)):
    feats = torch.concat(val_step_outputs[i], dim=0)

    # libenchong write code begin
    # (needs `from torch.utils.data import DataLoader` at the top of the file)
    val_dataloader = DataLoader(val_dataset, batch_size=20, num_workers=4)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    Feat = torch.empty((len(val_dataset), 8192))
    for iteration, (places, labels) in enumerate(val_dataloader, 1):
        places = places.to(device)
        # calculate descriptors
        descriptors = self(places)
        # labels are used as row indices, so each descriptor is written to its own row
        Feat[labels.detach(), :] = descriptors.detach().cpu()
    # libenchong write code end

    if 'pitts' in val_set_name:
        # split to ref and queries
        num_references = val_dataset.dbStruct.numDb
        num_queries = len(val_dataset) - num_references
        print("*****")
        positives = val_dataset.getPositives()
    elif 'msls' in val_set_name:
        # split to ref and queries
        num_references = val_dataset.num_references
        num_queries = len(val_dataset) - num_references
        positives = val_dataset.pIdx
    else:
        print(f'Please implement validation_epoch_end for {val_set_name}')
        # NotImplemented is a constant, not an exception; raise the error class
        raise NotImplementedError

    # r_list = feats[ : num_references]
    # q_list = feats[num_references : ]

    r_list = Feat[:num_references]
    q_list = Feat[num_references:]
    pitts_dict = utils.get_validation_recalls(r_list=r_list,
                                              q_list=q_list,
                                              k_values=[1, 5, 10, 15, 20, 25],
                                              gt=positives,
                                              print_results=True,
                                              dataset_name=val_set_name,
                                              faiss_gpu=self.faiss_gpu
                                              )
    del r_list, q_list, feats, num_references, positives, Feat

    self.log(f'{val_set_name}/R1', pitts_dict[1], prog_bar=False, logger=True)
    self.log(f'{val_set_name}/R5', pitts_dict[5], prog_bar=False, logger=True)
    self.log(f'{val_set_name}/R10', pitts_dict[10], prog_bar=False, logger=True)
print('\n\n')`


libenchong avatar libenchong commented on June 24, 2024

[two images]


libenchong avatar libenchong commented on June 24, 2024

Hi @amaralibey, I think I have found the reason. I used two GPUs to run the code, so the length of feats is half of what it is with a single GPU:
trainer = pl.Trainer( accelerator='gpu', devices=[0,1], default_root_dir=f'./LOGS/{model.encoder_arch}'
Single GPU:
[image 1]
Two GPUs:
#6 (comment)
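
The halving reported above is consistent with how data is sharded across devices. The sketch below mimics, in plain Python, the strided split that torch.utils.data.DistributedSampler performs with shuffle=False and drop_last=False. That Lightning wraps the validation dataloader in such a sampler when running on two GPUs is an assumption here, and shard_indices is a hypothetical helper, not part of gsv-cities.

```python
import math


def shard_indices(dataset_len, num_replicas, rank):
    """Indices one device would see under a strided, padded split."""
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad by repeating the first indices so every rank gets the same count.
    indices += indices[: total_size - dataset_len]
    # Each rank takes every num_replicas-th index, starting at its rank.
    return indices[rank:total_size:num_replicas]


print(len(shard_indices(17608, 2, 0)))  # 8804
print(len(shard_indices(19611, 2, 0)))  # 9806
print(shard_indices(10, 2, 0))          # [0, 2, 4, 6, 8]
```

With 17608 and 19611 samples this gives 8804 and 9806 per device, matching the feats shapes reported earlier in the thread. Under this split each device sees every second sample rather than a contiguous half, so slicing feats at num_references would be wrong even if the lengths happened to fit; running validation on a single device is the simplest way around it.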
