richzhang / PerceptualSimilarity
LPIPS metric. pip install lpips
Home Page: https://richzhang.github.io/PerceptualSimilarity
License: BSD 2-Clause "Simplified" License
It seems the code can't compute the metric on images with unequal width and height. Could it be extended to handle images of various sizes?
Thank you for such a nice setup! I managed to run the code with a tiny modification on Ubuntu 19.04. I commented out these two lines in your requirements.txt:
#numpy>=1.14.3
#opencv>=2.4.11
Later, I installed numpy and opencv from Ubuntu's own repositories:
sudo apt-get install python-numpy python-opencv
Lastly, I verified the setup by comparing two sample images, as below:
XXXXX:PerceptualSimilarity$ python compute_dists.py -p0 imgs/ex_ref.png -p1 imgs/ex_p0.png --use_gpu
Setting up Perceptual loss...
Loading model from: /home/XXXXX/PerceptualSimilarity/models/weights/v0.1/alex.pth
...[net-lin [alex]] initialized
...Done
Distance: 0.722
I understand that the model takes as input two images, by design. I would like to know if there is a smart way to use LPIPS metric for image retrieval, other than computing all the pairwise distances.
For information, my dataset of game banners contains about 30k images. In my previous experiments, I extracted image features once, and could then work with this processed data using standard tools for efficient similarity search based on cosine similarity, Minkowski distance, etc.
Thank you for your attention.
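For what it's worth, a sketch of one possible shortcut: because LPIPS is a weighted squared L2 distance between unit-normalized features, each image can be embedded once so that plain squared Euclidean distance between embeddings reproduces the metric, enabling standard nearest-neighbor tools. This relies on internal attributes of the pip lpips package (scaling_layer, net, lins, normalize_tensor), which are implementation details and may change between versions:

import torch
import lpips

loss_fn = lpips.LPIPS(net='alex').eval()

def lpips_embedding(img):
    # img: (N, 3, H, W) in [-1, 1]. Squared Euclidean distance between two
    # embeddings equals their LPIPS distance (for nonnegative learned weights).
    with torch.no_grad():
        feats = loss_fn.net(loss_fn.scaling_layer(img))
        chunks = []
        for kk, f in enumerate(feats):
            f = lpips.normalize_tensor(f)          # unit-normalize each channel vector
            w = loss_fn.lins[kk].model[-1].weight  # learned 1x1 weights, shape (1, C, 1, 1)
            f = f * w.clamp(min=0).sqrt()          # fold the weights into the features
            n, c, h, wd = f.shape
            chunks.append((f / (h * wd) ** 0.5).reshape(n, -1))
        return torch.cat(chunks, dim=1)

# All images must share one resolution. The embeddings are high-dimensional,
# so an ANN library with a squared-L2 index (e.g. FAISS) helps at 30k images.
imgs = torch.rand(8, 3, 64, 64) * 2 - 1
emb = lpips_embedding(imgs)
d01 = ((emb[0] - emb[1]) ** 2).sum()  # matches loss_fn(imgs[0:1], imgs[1:2])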
Hi, how can we interpret the physical meaning of the similarity distance?
For example, in which range does the distance mean the images are very similar, and in which range does it mean they are very different?
I understood that 0 means two pictures are exactly the same. However, what if a value is around 0.5?
Any suggestions?
Thanks.
I want to use this model to measure the similarity of two people's handwritten signatures, so I made a dataset just like the 2AFC one and trained on net-lin + alex:
(ep: 9, it: 20000, t: 0.003[s], ept: 0.16/0.55[h]) loss_total: 0.564, acc_r: 0.680
(ep: 9, it: 25000, t: 0.003[s], ept: 0.20/0.56[h]) loss_total: 0.556, acc_r: 0.700
(ep: 9, it: 30000, t: 0.003[s], ept: 0.25/0.58[h]) loss_total: 0.525, acc_r: 0.780
(ep: 9, it: 35000, t: 0.003[s], ept: 0.30/0.60[h]) loss_total: 0.570, acc_r: 0.720
(ep: 9, it: 40000, t: 0.003[s], ept: 0.35/0.62[h]) loss_total: 0.511, acc_r: 0.800
(ep: 9, it: 45000, t: 0.003[s], ept: 0.42/0.65[h]) loss_total: 0.674, acc_r: 0.660
(ep: 9, it: 50000, t: 0.003[s], ept: 0.48/0.67[h]) loss_total: 0.545, acc_r: 0.700
(ep: 9, it: 55000, t: 0.003[s], ept: 0.54/0.69[h]) loss_total: 0.548, acc_r: 0.720
(ep: 9, it: 60000, t: 0.003[s], ept: 0.62/0.72[h]) loss_total: 0.626, acc_r: 0.660
(ep: 9, it: 65000, t: 0.003[s], ept: 0.69/0.75[h]) loss_total: 0.606, acc_r: 0.640
(ep: 9, it: 70000, t: 0.003[s], ept: 0.77/0.77[h]) loss_total: 0.516, acc_r: 0.720
(ep: 10, it: 5000, t: 0.003[s], ept: 0.03/0.38[h]) loss_total: 0.447, acc_r: 0.800
(ep: 10, it: 10000, t: 0.003[s], ept: 0.05/0.38[h]) loss_total: 0.500, acc_r: 0.800
(ep: 10, it: 15000, t: 0.003[s], ept: 0.10/0.48[h]) loss_total: 0.484, acc_r: 0.840
(ep: 10, it: 20000, t: 0.003[s], ept: 0.15/0.52[h]) loss_total: 0.523, acc_r: 0.760
(ep: 10, it: 25000, t: 0.003[s], ept: 0.20/0.56[h]) loss_total: 0.579, acc_r: 0.700
(ep: 10, it: 30000, t: 0.003[s], ept: 0.25/0.59[h]) loss_total: 0.609, acc_r: 0.620
(ep: 10, it: 35000, t: 0.003[s], ept: 0.31/0.63[h]) loss_total: 0.544, acc_r: 0.760
(ep: 10, it: 40000, t: 0.003[s], ept: 0.38/0.67[h]) loss_total: 0.613, acc_r: 0.660
(ep: 10, it: 45000, t: 0.003[s], ept: 0.45/0.70[h]) loss_total: 0.569, acc_r: 0.700
(ep: 10, it: 50000, t: 0.003[s], ept: 0.52/0.74[h]) loss_total: 0.567, acc_r: 0.660
(ep: 10, it: 55000, t: 0.003[s], ept: 0.59/0.75[h]) loss_total: 0.651, acc_r: 0.600
(ep: 10, it: 60000, t: 0.003[s], ept: 0.66/0.77[h]) loss_total: 0.492, acc_r: 0.780
(ep: 10, it: 65000, t: 0.003[s], ept: 0.73/0.79[h]) loss_total: 0.547, acc_r: 0.720
(ep: 10, it: 70000, t: 0.003[s], ept: 0.81/0.81[h]) loss_total: 0.608, acc_r: 0.660
Hello @richzhang,
In the LPIPS paper, the 1x1 scaling convolution of the difference of the activations is performed before the squaring. In the implementation, however, the difference of the activations is first squared and then scaled:
diffs[kk] = (feats0[kk]-feats1[kk])**2
...
self.lin[kk](diffs[kk])
Is this a mistake? If so, is it in the paper or in the implementation?
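For reference, the two orderings can be written out. Assuming the learned 1x1 weights are nonnegative (a worked note, not an official answer), squaring first and scaling after is the same function up to a reparameterization of the weights:

% Paper, Eq. (1): weight the difference, then take the squared norm, per spatial location:
\| w_l \odot (\hat{y}_l^{hw} - \hat{y}_{0l}^{hw}) \|_2^2 = \sum_c w_{l,c}^2 \, (\hat{y}_{l,c}^{hw} - \hat{y}_{0l,c}^{hw})^2
% Implementation: square the difference, then apply the 1x1 conv with weights v_l:
\sum_c v_{l,c} \, (\hat{y}_{l,c}^{hw} - \hat{y}_{0l,c}^{hw})^2
% The two coincide with v_{l,c} = w_{l,c}^2, achievable whenever v_{l,c} >= 0.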
I am trying to independently replicate the LPIPS metric in Keras, initially focusing on uncalibrated VGG. Following the README, I got test_network.py working, but I am a little confused by the three example images ex_ref.png, ex_p0.png, and ex_p1.png and how they are processed.
Each of these images is 64x64, and in test_network.py they are passed to the VGG network without scaling. But the native input size of VGG is 224x224, and the PyTorch models documentation clearly states that inputs are expected to be that size (or larger):
All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.
Notably, when provided with 224x224 inputs, the layer sizes are:
However, when they are left at 64x64 without scaling, the layer sizes are smaller at each stage:
I'm not familiar with PyTorch internals, so it's not clear to me how to interpret this behaviour when porting this to Keras. So my questions are:
def forward(self, in0, in1):
    in0_sc = (in0 - self.shift.expand_as(in0)) / self.scale.expand_as(in0)
    in1_sc = (in1 - self.shift.expand_as(in0)) / self.scale.expand_as(in0)
    if (self.pnet_tune):
        outs0 = self.net.forward(in0)
        outs1 = self.net.forward(in1)
    else:
        outs0 = self.net[0].forward(in0)
        outs1 = self.net[0].forward(in1)
Why don't you use in0_sc to feed the net? Is it a bug or a feature?
Would it also be possible to compare 2 images with different spatial resolution?
e.g.
img_1 = 256 X 270 X 3
img_2 = 180 X 245 X 3
One hacky way of doing it is to resize both images to the same size and compare. After all, the network gives feature maps of different sizes if the two images' resolutions differ. Just curious whether you have tried something.
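A minimal sketch of that resizing approach, assuming the pip lpips package (bilinear resampling to a common size before comparing):

import torch
import torch.nn.functional as F
import lpips

loss_fn = lpips.LPIPS(net='alex')
img_1 = torch.rand(1, 3, 256, 270) * 2 - 1  # hypothetical inputs in [-1, 1]
img_2 = torch.rand(1, 3, 180, 245) * 2 - 1
size = (256, 256)                           # any common resolution
d = loss_fn(F.interpolate(img_1, size=size, mode='bilinear', align_corners=False),
            F.interpolate(img_2, size=size, mode='bilinear', align_corners=False))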
Hi,
When I run the code, I hit a bug when a function in "networks_basic.py" is called, namely "in_tens.mean([2,3], keepdim=keepdim)". I don't know how to tackle this, so I take the liberty of asking for help. After searching the Internet, I guess the reason may be that the argument "dim" in the first position should be an integer instead of a list "[2,3]". The detailed error message is as follows.
Thanks
"Traceback (most recent call last):
File "compute_dists_pair.py", line 34, in
dist01 = model.forward(img0,img1).item()
File "PerceptualSimilarity-master/models/init.py", line 40, in forward
return self.model.forward(target, pred)
File "PerceptualSimilarity-master/models/dist_model.py", line 116, in forward
return self.net.forward(in0, in1, retPerLayer=retPerLayer)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "PerceptualSimilarity-master/models/networks_basic.py", line 79, in forward
res = [spatial_average(self.lins[kk].model(diffs[kk]), keepdim=True) for kk in range(self.L)]
File "PerceptualSimilarity-master/models/networks_basic.py", line 79, in
res = [spatial_average(self.lins[kk].model(diffs[kk]), keepdim=True) for kk in range(self.L)]
File "PerceptualSimilarity-master/models/networks_basic.py", line 18, in spatial_average
return in_tens.mean([2,3],keepdim=keepdim)
TypeError: mean() received an invalid combination of arguments - got (list, keepdim=bool), but expected one of:
"
When I call lpips_vgg = loss_fn_vgg(a, b), I encounter this error, but I get a correct result in other code.
Traceback (most recent call last):
File "test.py", line 125, in <module>
lpips_vgg_y = loss_fn_vgg(cropped_sr_img_y * 255, cropped_gt_img_y * 255)
File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/lpips/lpips.py", line 87, in forward
in0_input, in1_input = (self.scaling_layer(in0), self.scaling_layer(in1)) if self.version=='0.1' else (in0, in1)
File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/lpips/lpips.py", line 122, in forward
return (inp - self.shift) / self.scale
File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/torch/tensor.py", line 396, in __rsub__
return _C._VariableFunctions.rsub(self, other)
TypeError: rsub() received an invalid combination of arguments - got (Tensor, numpy.ndarray), but expected one of:
* (Tensor input, Tensor other, *, Number alpha)
* (Tensor input, Number other, Number alpha)
So I checked the format and type of the input arguments a and b, and got:
type and shape of sr:torch.FloatTensor torch.Size([3, 472, 312])
type and shape of gt:torch.FloatTensor torch.Size([3, 472, 312])
And the format of the variables in the previous code, which returned the right result, is:
hr_img shape:torch.Size([3, 480, 320]), type:torch.FloatTensor
sr_img shape:torch.Size([3, 480, 320]), type:torch.FloatTensor
lpips:tensor([[[[0.2569]]]])
So I don't know which part I should correct.
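In case it helps, LPIPS expects 4-D torch tensors of shape (N, 3, H, W) scaled to [-1, 1], and the rsub error suggests a numpy array is reaching the scaling layer somewhere. A sketch of the expected preprocessing (a_numpy is a hypothetical stand-in for your data):

import numpy as np
import torch

a_numpy = np.random.rand(3, 472, 312).astype('float32')  # hypothetical, in [0, 1]
a = torch.from_numpy(a_numpy)  # make sure the input is a torch tensor, not numpy
a = (a * 2 - 1).unsqueeze(0)   # rescale to [-1, 1], add batch dim -> (1, 3, 472, 312)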
Hello!
The following doesn't work:
import PerceptualSimilarity.models as psm
I have cloned the repository into my working directory and am attempting to use the similarity function on my pictures.
The import fails with the following error:
ModuleNotFoundError                       Traceback (most recent call last)
in <module>
----> 1 import PerceptualSimilarity.models as psm

~/HDD/works/Skoltech/CapsuleAD/src/PerceptualSimilarity/models/__init__.py in <module>
      9 from torch.autograd import Variable
     10
---> 11 from models import dist_model
     12

ModuleNotFoundError: No module named 'models'
What am I doing wrong?
Could you please look into it?
If the distance is smaller, does it mean the two images are more similar?
Hello, I can't find the file compute_dists_pair.py. Could you release it? Thanks.
Hey, I hope you are well.
In the model options for your loss, we can either go for 'net' (a vanilla pre-trained CNN) or 'net-lin' (which I assume is the one with the learned linear layer). I am interested in the 'tune' and 'scratch' versions of the CNNs for research purposes. Are they available, and how can I obtain them? Thank you for your time.
As the title states.
Hi @richzhang, I notice you adopt an input image size of 64x64. My question is whether an arbitrary input size is OK, or should it be fixed at 64x64?
Hi, I can't find the appendix of your paper. Could you help me find it?
Hi there, I'm currently training an artifact removal / super-resolution model, a multilayer ESPCN, but I'm having this issue after a few iterations of training:
This is how I instantiate the loss:
lpips = lpips.LPIPS(net='vgg')
This is the code for the model:
class ESPCNResBlock(nn.Module):
    def __init__(self, nf=64):
        super(ESPCNResBlock, self).__init__()
        self.conv1 = nn.Conv2d(nf, nf, kernel_size=3, padding=3 // 2)
        self.conv2 = nn.Conv2d(nf, nf, kernel_size=3, padding=3 // 2)

    def forward(self, input):
        x = self.conv1(input)
        x = F.hardtanh(x, min_val=-1, max_val=1.0)
        x = self.conv2(x)
        x = F.hardtanh(x, min_val=-1, max_val=1.0)
        return x + input

class ESPCN(nn.Module):
    def __init__(self, scale_factor=2, n_blocks=4, nf=64, in_channels=3, out_channels=3):
        super(ESPCN, self).__init__()
        self.scale_factor = scale_factor
        layers = [nn.Conv2d(in_channels, nf, kernel_size=5, padding=5 // 2),
                  nn.Hardtanh()]
        for _ in range(n_blocks // 2):
            layers += [ESPCNResBlock()]
        layers += [
            nn.Conv2d(nf, 32, kernel_size=3, padding=3 // 2),
            nn.Hardtanh(),
        ]
        self.first_part = nn.Sequential(*layers)
        self.last_part = nn.Sequential(
            nn.Conv2d(32, out_channels * (scale_factor ** 2), kernel_size=3, padding=3 // 2),
            nn.PixelShuffle(scale_factor) if scale_factor > 1 else nn.Identity(),
            nn.Tanh()
        )
        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if m.in_channels == 32:
                    nn.init.normal_(m.weight.data, mean=0.0, std=0.001)
                    nn.init.zeros_(m.bias.data)
                else:
                    nn.init.normal_(m.weight.data, mean=0.0,
                                    std=math.sqrt(2 / (m.out_channels * m.weight.data[0][0].numel())))
                    nn.init.zeros_(m.bias.data)

    def forward(self, input):
        x = self.first_part(input)
        x = self.last_part(x)
        x = x + F.interpolate(input,
                              scale_factor=self.scale_factor,
                              mode='bilinear')
        x = torch.clamp(x, min=-1, max=1)
        return x
I've localized the error to the normalize function, but I'm still looking for a fix. The model is trained with Adam on batches of 64x64 images.
It looks like target and pred should be replaced with in0 and in1.
PerceptualSimilarity/lpips/lpips.py, line 83 in a1188a3
To reproduce: set normalize=True when calling the loss function.
Hi, do you have a link that I can use to download the dataset? I am asking for Windows.
I find that when I import PerceptualSimilarity as a package, the weights-loading line in dist_model.py fails, because '.' points to the current directory of my calling code rather than the root directory of PerceptualSimilarity.
Here's a patch that fixes the issue:
fix_ps.patch.txt
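For context, the gist of such a fix is to resolve the weights path relative to the module file rather than the caller's working directory; a sketch (variable names are mine, not from dist_model.py):

import os

net = 'alex'  # hypothetical: whichever backbone was requested
weights_path = os.path.join(
    os.path.dirname(os.path.abspath(__file__)),  # package dir, not the caller's cwd
    'weights', 'v0.1', '%s.pth' % net)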
Hello, whatever the inputs are, why is the output always 0 when using the GPU? When using only the CPU, I obtain the scores normally. Thank you!
Dear authors,
equation (1) in the paper states that you are taking the squared Euclidean norm of the weighted differences, something like euclidean_norm(dot(w_l, (y - y_0)))². However, in the implementation you are weighting the squared difference of the Euclidean norms, something like dot(w_l, (euclidean_norm(y) - euclidean_norm(y_0))²), which as far as I am concerned is not the same thing. Or am I missing something here?
Thanks!
When I use sudo pip3 install lpips, then import lpips and call percept = lpips.PerceptualLoss(model='net-lin', net='vgg', use_gpu=True), I get the error: module 'lpips' has no attribute 'PerceptualLoss'.
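For reference, the pip package exposes the metric as lpips.LPIPS (per the README), rather than the older in-repo PerceptualLoss API; a minimal sketch:

import torch
import lpips

loss_fn = lpips.LPIPS(net='vgg')         # 'alex', 'vgg', or 'squeeze'
img0 = torch.rand(1, 3, 64, 64) * 2 - 1  # inputs are expected in [-1, 1]
img1 = torch.rand(1, 3, 64, 64) * 2 - 1
d = loss_fn(img0, img1)                  # move model and tensors to .cuda() for GPU use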
Thank you for your significant contribution!
It seems that the LPIPS loss function cannot be used directly in TensorFlow to train a neural network. What should I do if I want to use it?
I noticed that the data normalization is transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) (for example in twoafc_dataset.py).
The ImageNet normalization coefficients are mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225].
This raises the questions:
In the parameter types / parameters: what is the meaning of pixel-wise (l1) in Loss/Learning?
When using --model net for an off-the-shelf network, train.py breaks here:
PerceptualSimilarity/models/networks_basic.py, line 194 in 2416334
With net-lin models the dimensionality is [50, 1, 1, 1], while this wrapping into arrays doesn't happen when using net models. Why does this differ from the net-lin model?
I am trying to replace standard loss functions like MSE in my autoencoder network with the perceptual similarity metric. I wanted to know whether this would be possible, since instances of the network and specific formatting might be required.
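It is possible since LPIPS is differentiable; a minimal sketch of a training step, with a hypothetical stand-in autoencoder (not from the repo):

import torch
import torch.nn as nn
import lpips

autoencoder = nn.Sequential(               # hypothetical stand-in model
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 3, 3, padding=1), nn.Tanh())
loss_fn = lpips.LPIPS(net='vgg')
for p in loss_fn.parameters():             # keep the metric frozen
    p.requires_grad_(False)

opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-4)
batch = torch.rand(4, 3, 64, 64) * 2 - 1   # LPIPS expects inputs in [-1, 1]
loss = loss_fn(autoencoder(batch), batch).mean()  # one distance per image
opt.zero_grad()
loss.backward()
opt.step()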
Hi, LPIPS helps me a lot in my image translation task! There's one question that I could not figure out by myself. For the feature-map distance, why does the paper compute the L2 distance channel-wise and then average spatially? Could we compute the L2 distance spatially (flatten the feature map for one channel and compute the L2 distance) and then average over channels?
Thanks!
Hi, any chance you can share the code for Split-Brain / BiGAN / Puzzle? Thanks!
First of all - great paper!
I'm trying to run the single-image similarity script and running into this error (for my own input images of size (224, 224, 3)):
RuntimeError: The size of tensor a (255) must match the size of tensor b (55) at non-singleton dimension 3
Any ideas why this could be happening?
Thanks,
I get the following error when running test_network.py:
Traceback (most recent call last):
File "test_network.py", line 11, in <module>
model.initialize(model='net-lin',net='alex',use_gpu=False)
File "/Users/faro/repositories/PerceptualSimilarity/models/dist_model.py", line 38, in initialize
self.net.load_state_dict(torch.load('./weights/%s.pth'%net, map_location=lambda storage, loc: 'cpu'))
File "/Users/faro/repositories/PerceptualSimilarity/.env/lib/python3.6/site-packages/torch/serialization.py", line 261, in load
return _load(f, map_location, pickle_module)
File "/Users/faro/repositories/PerceptualSimilarity/.env/lib/python3.6/site-packages/torch/serialization.py", line 409, in _load
result = unpickler.load()
File "/Users/faro/repositories/PerceptualSimilarity/.env/lib/python3.6/site-packages/torch/_utils.py", line 74, in _rebuild_tensor
module = importlib.import_module(storage.__module__)
AttributeError: 'str' object has no attribute '__module__'
To get there I needed to change a few files and fix some import bugs. One thing that would really help in reproducing the results is if you could specify the requirements (especially the PyTorch version). Maybe consider adding a requirements file like I did in my fork: faroit@4ccefee#diff-b4ef698db8ca845e5845c4618278f29a
Currently, the upsample function is as follows:
def upsample(in_tens, out_HW=(64,64)): # assumes scale factor is same for H and W
    in_H, in_W = in_tens.shape[2], in_tens.shape[3]
    scale_factor_H, scale_factor_W = 1.*out_HW[0]/in_H, 1.*out_HW[1]/in_W
    return nn.Upsample(scale_factor=(scale_factor_H, scale_factor_W), mode='bilinear', align_corners=False)(in_tens)
This ends up failing in the case where the input images being compared are of resolution 800x600. When this is the case, one of the layers passed in as in_tens has shape (1, 1, 149, 199). As a result, in_H * scale_factor_H = 600.0000000000001 and in_W * scale_factor_W = 799.9999999999999. The result of the Upsample is an output tensor of size (1, 1, 600, 799), which leads to an exception when it is added to other tensors of size (1, 1, 600, 800).
Instead of computing the scale_factor, a more robust solution is to just set the size parameter directly:
return nn.Upsample(size=out_HW, mode='bilinear', align_corners=False)(in_tens)
This might also be the cause of this specific comment: #45 (comment)
Is the server hosting the dataset down? I am not able to download it: "ERROR 503: Service Unavailable".
Hello author, can we measure the difference between two grayscale images, or is this metric only used for RGB images? Thank you.
Thanks for publishing the code; I'd appreciate it if you could help me understand this. I trained a WGAN on my own data. Now I am planning to use the generator network's features [weights] to calculate a perceptual similarity score, and I am not quite sure how to do this. If I understood correctly, with either VGG or ResNet we pass the query images (image1, image2) through the network, and for each input image we get all the features and calculate the score from them. But I am not sure how to use the features of a WGAN, since the input to the generator network is noise and the output is the synthetically generated images. How do I pass query images to get those features?
I notice that the model by default assumes images of dimensions 64x64. I'm curious how the model/distance metric performs for higher resolutions like that of ImageNet (224x224).
Is it possible to use this to measure geometric distortion? As in the quality of a retargeting compared to a full reference?
python test_network.py
Model [SSIM] initialized
Distances: (0.262, 0.344)
python test_network.py
Loading model from: /data/sunzhaomang/AdvFeat/PerceptualSimilarity/weights/v0.1/alex.pth
Model [net-lin [alex]] initialized
Distances: (0.034, 0.037)
python test_network.py
Loading model from: /data/sunzhaomang/AdvFeat/PerceptualSimilarity/weights/v0.1/alex.pth
Model [net-lin [alex]] initialized
Distances: (0.041, 0.047)
Using the SSIM and LPIPS metrics, the distance between ex_ref and ex_p0 is smaller than that between ex_ref and ex_p1; that is, ex_p0.png is more similar to ex_ref.png than ex_p1.png, which is contrary to the claim in the paper. How can this be explained?
Nice paper! One question: what's the input image size of AlexNet/VGG/SqueezeNet?
Thanks in advance!
Any idea why importing lpips is causing this error? It seems to be something from the IPython import causing issues. The ipython (and prompt_toolkit) versions are both the latest release. It works fine within an IPython notebook, but importing lpips inside my training job causes this crash.
Traceback (most recent call last):
File "run_train.py", line 8, in <module>
import models
File "/home/timbrooks/code/prototypes/models/__init__.py", line 2, in <module>
from .model import *
File "/home/timbrooks/code/prototypes/models/model.py", line 18, in <module>
import lpips
File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/lpips/__init__.py", line 11, in <module>
from lpips.trainer import *
File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/lpips/trainer.py", line 11, in <module>
from IPython import embed
File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/IPython/__init__.py", line 56, in <module>
from .terminal.embed import embed
File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/IPython/terminal/embed.py", line 16, in <module>
from IPython.terminal.interactiveshell import TerminalInteractiveShell
File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/IPython/terminal/interactiveshell.py", line 21, in <module>
from prompt_toolkit.formatted_text import PygmentsTokens
ModuleNotFoundError: No module named 'prompt_toolkit.formatted_text'
I installed lpips with the command you suggested, but the installed version has an error in the lpips folder's __init__.py: plt is missing in load_image.
I've tried to use lpips in my super-resolution project and it keeps printing:
"Loading model from: C:\Workspace\envs\workplace\lib\site-packages\lpips\weights\v0.1\alex.pth
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]"
Is there any way to turn it off?
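If I remember correctly, recent versions of the pip package accept a verbose flag that silences those messages (worth checking against your installed version):

import lpips

loss_fn = lpips.LPIPS(net='alex', verbose=False)  # suppresses the setup prints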
Is it safe to use lpips for grayscale images? The code does not work for 1-channel images. The hack would be to use 3 identical channels, yet I am not sure what the effect would be within the end-to-end calibrated solution tuned on color images.
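A minimal sketch of the three-identical-channels hack mentioned above; whether the color-calibrated weights stay meaningful for grayscale content is exactly the open question:

import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')
gray0 = torch.rand(1, 1, 64, 64) * 2 - 1  # 1-channel inputs in [-1, 1]
gray1 = torch.rand(1, 1, 64, 64) * 2 - 1
d = loss_fn(gray0.repeat(1, 3, 1, 1), gray1.repeat(1, 3, 1, 1))  # replicate to fake RGB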
I guess there is a bug when you put your model on multiple GPUs.
Reference:
pytorch/pytorch#8637 (comment)