junyangwang0410 / knight

SotA text-only image/video method (IJCAI 2023)

Python 96.65% Jupyter Notebook 3.35%

knight's Issues

about running run_image_captioning.py --dataset coco

Thanks for the amazing work.
When I run
python run_image_captioning.py --dataset coco
an error occurs. The traceback is as follows:
Traceback (most recent call last):
  File "run_image_captioning.py", line 148, in <module>
    main(args)
  File "run_image_captioning.py", line 97, in main
    output = GPT_model(**token, labels = token["input_ids"], prefix = batch_caption_feature)
  File "/home/boyang/anaconda3/envs/knight/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: forward() got an unexpected keyword argument 'prefix'
https://github.com/junyangwang0410/Knight/blob/e03dc2e340abcf418aba711acc300946145a0b08/run_image_captioning.py#LL97C25-L97C25
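A later issue on this page quotes a modeling_gpt2.py whose forward() takes a prefix argument, so one possible cause of this TypeError is that Python is importing a GPT-2 implementation without that change. The sketch below is only a diagnostic idea, not the author's fix: accepts_keyword is a hypothetical helper, and the two dummy functions stand in for the stock and the modified forward().

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter also swallows unknown keywords without a TypeError.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-ins for illustration only (hypothetical, not the real GPT-2 code).
def stock_forward(input_ids=None, labels=None):
    return None


def patched_forward(input_ids=None, labels=None, prefix=None):
    return None


print(accepts_keyword(stock_forward, "prefix"))    # False
print(accepts_keyword(patched_forward, "prefix"))  # True

# With the real model (assumes `transformers` is installed), the same check
# would be: accepts_keyword(GPT2LMHeadModel.forward, "prefix"), and
# inspect.getsourcefile(GPT2LMHeadModel) shows which file was imported.
```

If the check on the real class returns False, the script is probably picking up an unmodified GPT-2 implementation rather than the one the issue below quotes.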

My conda environment is shown as follows:

python run_image_captioning.py --dataset flickr

Hello, thank you for sharing the code.

I get the following error when using Flickr30k on Colab.

!python run_image_captioning.py --dataset flickr
FileNotFoundError: [Errno 2] No such file or directory: './feature/Flickr/nibers.npy'

What does coco_test.txt in the dataset contain? I did not find it in the COCO 2014 image captioning dataset that I downloaded from the official website.

    json_path = "./data/COCO/captions_val2014.json"
    json_labels = json.load(open(json_path,'r'))
    annotations = json_labels["annotations"]
    images = json_labels["images"]
    images_path = "./data/COCO/image/"

    image_dict = dict()
    for image in images:
        image_dict[image["file_name"]] = image["id"]

    with open("./data/COCO/coco_test.txt") as image_names_data:
        image_names = image_names_data.readlines()

    image_features = []
    for image_info in image_names:
        image_file = image_info.split('\n')[0]
        image_id = image_dict[image_file]
        image_path = images_path + image_file
        ori_image = Image.open(image_path)
        image = preprocess(ori_image).unsqueeze(0).to(device)
        image_feature = clip_model.encode_image(image)
        image_features.append(image_feature)
        
    image_features = torch.cat(image_features)
    torch.save(image_features, "./feature/COCO/image_features.pkl")

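Because I do not understand the coco_test.txt file, I could not follow this code. If it is only reading images, it should just need to join file_name with image_folder_name, right?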
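One reading of the quoted code, offered as a guess rather than the author's answer: every line of coco_test.txt is used as a key into image_dict, so each line must be an image file_name that also appears in captions_val2014.json. Under that assumption the file selects a subset of the images instead of encoding all of them. The sketch below replays that logic on made-up records; select_split and the sample file names and ids are hypothetical.

```python
def select_split(images, split_lines, images_path):
    """Map each file name listed in the split file to (image_id, image_path)."""
    # Same lookup table as in the quoted code: file_name -> id.
    image_dict = {image["file_name"]: image["id"] for image in images}
    selected = []
    for line in split_lines:
        image_file = line.strip()  # drop the trailing newline
        if not image_file:
            continue  # ignore blank lines
        selected.append((image_dict[image_file], images_path + image_file))
    return selected


# Made-up stand-ins for json_labels["images"] and the lines of coco_test.txt.
images = [
    {"file_name": "COCO_val2014_000000000042.jpg", "id": 42},
    {"file_name": "COCO_val2014_000000000073.jpg", "id": 73},
    {"file_name": "COCO_val2014_000000000074.jpg", "id": 74},
]
split_lines = [
    "COCO_val2014_000000000073.jpg\n",
    "COCO_val2014_000000000042.jpg\n",
]

for image_id, image_path in select_split(images, split_lines, "./data/COCO/image/"):
    print(image_id, image_path)
```

Only the two listed images are returned, in the order given by the split file, which would also fix the row order of the saved feature tensor.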

A question about the prefix parameter of the generate() method.

def caption_generation(image_feature, model: GPT2LMHeadModel, tokenizer, device):
	text = "prefix prefix prefix prefix prefix:"
	inputs = tokenizer(text, return_tensors="pt")
	output = model.generate(inputs["input_ids"].to(device), 40, prefix = image_feature, do_sample = False, num_beams=5)[0]
	output = tokenizer.decode(output)
	return output.split(':')[1].split('.')[0].lower()

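The model.generate() call in the code above takes a prefix parameter, but I could not find any explanation of a prefix parameter in the Hugging Face documentation.

In modeling_gpt2.py, I found the following code: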

def forward(
        ...
        prefix: Optional[torch.FloatTensor] = None,
    ) -> Union[Tuple, BaseModelOutputWithPastAndCrossAttentions]:
        ...

and:

...
if inputs_embeds is None:
    inputs_embeds = self.wte(input_ids)
if prefix != None:
    prefix = prefix.expand(inputs_embeds.shape[0], 5, inputs_embeds.shape[2])
    inputs_embeds = torch.cat((prefix, inputs_embeds[:, 5:, :]), dim = 1)
position_embeds = self.wpe(position_ids)
hidden_states = inputs_embeds + position_embeds
...

This part was added by the author as a modification, right? I look forward to your reply.
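To make the quoted forward() fragment easier to follow, here is a minimal sketch of what the expand-and-concatenate step appears to do: the first five embedding positions of every sequence are overwritten with the prefix, and the rest is kept. It uses nested lists in place of tensors so it runs without PyTorch; splice_prefix and the toy shapes are assumptions for illustration, not code from the repository.

```python
PREFIX_LEN = 5  # number of positions overwritten in the quoted forward()


def splice_prefix(inputs_embeds, prefix):
    """Replace the first PREFIX_LEN positions of each sequence with `prefix`.

    inputs_embeds: nested list shaped [batch][seq][hidden]
    prefix:        nested list shaped [PREFIX_LEN][hidden], or None
    """
    if prefix is None:
        return inputs_embeds
    # Every batch item receives the same prefix rows (the role of
    # prefix.expand), followed by its own embeddings from position 5 onward
    # (the role of inputs_embeds[:, 5:, :] in the torch.cat call).
    return [
        [list(row) for row in prefix] + [list(row) for row in seq[PREFIX_LEN:]]
        for seq in inputs_embeds
    ]


batch, seq_len, hidden = 2, 7, 3
inputs_embeds = [
    [[float(b * 100 + t)] * hidden for t in range(seq_len)] for b in range(batch)
]
prefix = [[-1.0] * hidden for _ in range(PREFIX_LEN)]

out = splice_prefix(inputs_embeds, prefix)
print(len(out), len(out[0]), len(out[0][0]))  # 2 7 3
```

The output keeps the input shape, which suggests why the prompt in caption_generation starts with five placeholder words: their embeddings would be the ones replaced by the image feature.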