internlm / internlm-xcomposer Goto Github PK

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Python 91.86% Shell 3.36% Jupyter Notebook 4.79%

chatgpt visual-language-learning multi-modality foundation gpt-4 instruction-tuning mllm multimodal vision-language-model language-model

internlm-xcomposer's Introduction

InternLM-XComposer-2.5

InternLM-XComposer2.5 🤗

｜ XComposer2.5 Technical Report 📄

English | 简体中文

Thanks the community for HuggingFace Demo | OpenXLab Demo of InternLM-XComposer-2.5.

👋 join us on Discord and WeChat

Multimodal Projects of Our Team

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

InternLM-XComposer2-: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Models

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

ShareGPT4V: Improving Large Multi-modal Models with Better Captions

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models

InternLM-XComposer-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely 7B LLM backend. IXC-2.5 is trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to perform exceptionally well in tasks requiring extensive input and output contexts.

Ultra-High Resolution Understanding: IXC-2.5 enhances the dynamic resolution solution proposed in IXC2-4KHD with a native 560 × 560 ViT vision encoder, supporting high-resolution images with any aspect ratio.
Fine-Grained Video Understanding: IXC-2.5 treats videos as a ultra-high-resolution composite picture consisting of tens to hundreds of frames, allowing it to capture fine details through dense sampling and higher resolution for each frame.
Multi-Turn Multi-Image Dialogue: IXC-2.5 supports free-form multi-turn multi-image dialogue, allowing it to naturally interact with humans in multi-round conversations.
Webpage Crafting: IXC-2.5 can be readily applied to create webpages by composing source code (HTML, CSS, and JavaScript) following text-image instructions.
Composing High-Quality Text-Image Articles: IXC-2.5 leverages specially designed Chain-of-Thought (CoT) and Direct Preference Optimization (DPO) techniques to significantly enhance the quality of its written content.
Awesome performance: IXC-2.5 has been evaluated on 28 benchmarks, outperforming existing open-source state-of-the-art models on 16 benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks.

Please refer to Technical Report for more details.

Demo Video

🔥 For the best experience, please keep the audio on while enjoying the video.

demo3_en.mp4

Youtube Video

Please refer to Chinese Demo for the demo of the Chinese version.

News and Updates

2024.07.03 🎉🎉🎉 InternLM-XComposer2.5-7B is publicly available.
2024.07.01 🎉🎉🎉 ShareGPT4V is accepted by ECCV2024.
2024.04.22 🎉🎉🎉 The finetune code of InternLM-XComposer2-VL-7B-4KHD-7B are publicly available.
2024.04.09 🎉🎉🎉 InternLM-XComposer2-4KHD-7B and evaluation code are publicly available.
2024.04.09 🎉🎉🎉 InternLM-XComposer2-VL-1.8B is publicly available.
2024.02.22 🎉🎉🎉 We release DualFocus, a framework for integrating macro and micro perspectives within MLLMs to enhance vision-language task performance.

2024.02.06 🎉🎉🎉 InternLM-XComposer2-7B-4bit and InternLM-XComposer-VL2-7B-4bit are publicly available on Hugging Face and ModelScope.

2024.02.02 🎉🎉🎉 The finetune code of InternLM-XComposer2-VL-7B are publicly available.
2024.01.26 🎉🎉🎉 The evaluation code of InternLM-XComposer2-VL-7B are publicly available.
2024.01.26 🎉🎉🎉 InternLM-XComposer2-7B and InternLM-XComposer-VL2-7B are publicly available on Hugging Face and ModelScope.
2024.01.26 🎉🎉🎉 We release a technical report for more details of InternLM-XComposer2 series.
2023.11.22 🎉🎉🎉 We release the ShareGPT4V, a large-scale highly descriptive image-text dataset generated by GPT4-Vision and a superior large multimodal model, ShareGPT4V-7B.
2023.10.30 🎉🎉🎉 InternLM-XComposer-VL achieved the top 1 ranking in both Q-Bench and Tiny LVLM.
2023.10.19 🎉🎉🎉 Support for inference on multiple GPUs. Two 4090 GPUs are sufficient for deploying our demo.
2023.10.12 🎉🎉🎉 4-bit demo is supported, model files are available in Hugging Face and ModelScope.
2023.10.8 🎉🎉🎉 InternLM-XComposer-7B and InternLM-XComposer-VL-7B are publicly available on ModelScope.
2023.9.27 🎉🎉🎉 The evaluation code of InternLM-XComposer-VL-7B are publicly available.
2023.9.27 🎉🎉🎉 InternLM-XComposer-7B and InternLM-XComposer-VL-7B are publicly available on Hugging Face.
2023.9.27 🎉🎉🎉 We release a technical report for more details of our model series.

Model Zoo

Model	Usage	Transformers(HF)	ModelScope(HF)	Release Date
InternLM-XComposer-2.5	Video Understanding, Multi-image Multi-tune Dialog, 4K Resolution Understanding, Web Craft, Article creation, Benchmark	🤗internlm-xcomposer2.5	internlm-xcomposer2.5	2024-07-03
InternLM-XComposer2-4KHD	4K Resolution Understanding, Benchmark, VL-Chat	🤗internlm-xcomposer2-4khd-7b	internlm-xcomposer2-4khd-7b	2024-04-09
InternLM-XComposer2-VL-1.8B	Benchmark, VL-Chat	🤗internlm-xcomposer2-vl-1_8b	internlm-xcomposer2-vl-1_8b	2024-04-09
InternLM-XComposer2	Text-Image Composition	🤗internlm-xcomposer2-7b	internlm-xcomposer2-7b	2024-01-26
InternLM-XComposer2-VL	Benchmark, VL-Chat	🤗internlm-xcomposer2-vl-7b	internlm-xcomposer2-vl-7b	2024-01-26
InternLM-XComposer2-4bit	Text-Image Composition	🤗internlm-xcomposer2-7b-4bit	internlm-xcomposer2-7b-4bit	2024-02-06
InternLM-XComposer2-VL-4bit	Benchmark, VL-Chat	🤗internlm-xcomposer2-vl-7b-4bit	internlm-xcomposer2-vl-7b-4bit	2024-02-06
InternLM-XComposer	Text-Image Composition, VL-Chat	🤗internlm-xcomposer-7b	internlm-xcomposer-7b	2023-09-26
InternLM-XComposer-4bit	Text-Image Composition, VL-Chat	🤗internlm-xcomposer-7b-4bit	internlm-xcomposer-7b-4bit	2023-09-26
InternLM-XComposer-VL	Benchmark	🤗internlm-xcomposer-vl-7b	internlm-xcomposer-vl-7b	2023-09-26

Evaluation

We evaluate InternLM-XComposer-2.5 on 28 multimodal benchmarks, including image benchmarks MMDU, MMStar, RealWorldQA, Design2Code, DocVQA, Infographics VQA, TextVQA, ChartQA, OCRBench, DeepFrom, WTQ, VisualMRC, TabFact, MathVista, MMMU, AI2D, MME, MMBench, MMBench-CN, SEED-Bench, HallusionBench, MM-Vet, and video benchmarks MVBench, MLVU, Video-MME, MMBench-Video, TempCompass

See Evaluation Details here.

Compared with closed-source APIs and previous SOTAs on Video and Structural High-resolution images.

	MVBench	MLVU	MME-Video	MMBench-Video	TempCompass	DocVQA	ChartVQA	InfoVQA	TextVQA	OCRBench	DeepForm	WTQ	VisualMRC	TabFact
	VideoChat2	InternVL1.5	LIVA	InternVL1.5	Qwen-VL	InternVL1.5	InternVL1.5	InternVL1.5	InternVL1.5	GLM-4v	DocOwl 1.5	DocOwl 1.5	DocOwl 1.5	DocOwl 1.5
	7B	26B	34B	26B	7B	26B	26B	26B	26B	9B	8B	8B	8B	8B
	60.4	50.4	59.0	42.0	52.9	90.9	83.8	72.5	80.6	77.6	68.8	40.6	246.4	80.2

GPT-4V	43.5	49.2	59.9	56.0	---	88.4	78.5	75.1	78.0	51.6	---	---	---	---
Gemini-Pro	---	---	75.0	49.3	67.1	88.1	74.1	75.2	74.6	68.0	---	---	---	---
Ours	69.1	58.8	55.8	46.9		90.9	82.2	69.9	78.2	69.0	71.2	53.6	307.5	85.2

Compared with closed-source APIs and previous SOTAs on Multi-Image dialog and General Visual QA Benchmarks.

	MVBench	MLVU	MME-Video	MMBench-Video	TempCompass	DocVQA	ChartVQA	InfoVQA	TextVQA	OCRBench	DeepForm	WTQ	VisualMRC	TabFact
	VideoChat2	InternVL1.5	LIVA	InternVL1.5	Qwen-VL	InternVL1.5	InternVL1.5	InternVL1.5	InternVL1.5	GLM-4v	DocOwl 1.5	DocOwl 1.5	DocOwl 1.5	DocOwl 1.5
	7B	26B	34B	26B	7B	26B	26B	26B	26B	9B	8B	8B	8B	8B
	60.4	50.4	59.0	42.0	58.4	90.9	83.8	72.5	80.6	77.6	68.8	40.6	246.4	80.2

GPT-4V	43.5	49.2	59.9	56.0	---	88.4	78.5	75.1	78.0	51.6	---	---	---	---
Gemini-Pro	---	---	75.0	49.3	70.6	88.1	74.1	75.2	74.6	68.0	---	---	---	---
Ours	69.1	58.8	55.8	46.9	67.1	90.9	82.2	69.9	78.2	69.0	71.2	53.6	307.5	85.2

Requirements

python 3.8 and above
pytorch 1.12 and above, 2.0 and above are recommended
CUDA 11.4 and above are recommended (this is for GPU users)
flash-attention2 is required for high-resolution usage of InternLM-XComposer2.5.

Installation

Before running the code, make sure you have setup the environment and installed the required packages. Make sure you meet the above requirements, and then install the dependent libraries. Please refer to the installation instructions

Quickstart

We provide a simple example to show how to use InternLM-XComposer-2.5 with 🤗 Transformers.

Video Understanding

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Here are some frames of a video. Describe this video in detail'
image = ['./examples/liuxiang.mp4',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
#The video opens with a shot of an athlete, dressed in a red and yellow uniform with the word "CHINA" emblazoned across the front, preparing for a race. 
#The athlete, Liu Xiang, is seen in a crouched position, focused and ready, with the Olympic rings visible in the background, indicating the prestigious setting of the Olympic Games. As the race commences, the athletes are seen sprinting towards the hurdles, their determination evident in their powerful strides. 
#The camera captures the intensity of the competition, with the athletes' numbers and times displayed on the screen, providing a real-time update on their performance. The race reaches a climax as Liu Xiang, still in his red and yellow uniform, triumphantly crosses the finish line, his arms raised in victory. 
#The crowd in the stands erupts into cheers, their excitement palpable as they witness the athlete's success. The video concludes with a close-up shot of Liu Xiang, still basking in the glory of his victory, as the Olympic rings continue to symbolize the significance of the event.

query = 'tell me the athlete code of Liu Xiang'
image = ['./examples/liuxiang.mp4',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, history=his, do_sample=False, num_beams=3, use_meta=True)
print(response)
#The athlete code of Liu Xiang, as displayed on his uniform in the video, is "1363".

Multi-Image Mutli-Tune Dialog

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Image1 <ImageHere>; Image2 <ImageHere>; Image3 <ImageHere>; I want to buy a car from the three given cars, analyze their advantages and weaknesses one by one'
image = ['./examples/cars1.jpg',
        './examples/cars2.jpg',
        './examples/cars3.jpg',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
#To analyze the advantages and disadvantages of each car, we need to consider factors such as brand reputation, performance, design, cost, and maintenance. 
#1. Mercedes-Benz: - Advantages: Known for its luxury and reliability, the Mercedes-Benz brand offers a high level of comfort, advanced technology, and superior craftsmanship. The vehicle in the image appears to be an SUV, which is versatile for both city driving and off-road conditions. - Disadvantages: Typically, Mercedes-Benz vehicles are more expensive compared to other brands, and they may require more frequent maintenance due to their luxury status. 
#2. Bugatti: - Advantages: Bugatti is renowned for producing some of the fastest and most powerful cars in the world. The vehicle in the image is a sports car, likely offering an exhilarating driving experience with its high-performance engine and advanced aerodynamics. - Disadvantages: Bugatti cars are extremely expensive, making them less accessible to the average consumer. They also require specialized knowledge for maintenance and may not be suitable for everyday driving due to their high-performance nature. 
#3. BMW: - Advantages: BMW is a well-respected manufacturer known for its innovative engineering and design. The car in the image appears to be an electric vehicle (EV), which offers advantages such as lower emissions, reduced noise pollution, and potentially lower operating costs. - Disadvantages: EVs like the one in the image may have a shorter range compared to traditional gasoline-powered vehicles, limiting their suitability for long-distance travel. 
#Additionally, the initial cost of purchasing an EV can be higher than that of a conventional car. In conclusion, each car has its own set of advantages and disadvantages. The Mercedes-Benz offers luxury and versatility, the Bugatti provides high performance and exclusivity, and the BMW represents innovation in electric vehicle technology. The choice ultimately depends on the buyer's priorities, such as comfort, performance, environmental impact, and budget.

query = 'Image4 <ImageHere>; How about the car in Image4'
image.append('./examples/cars4.jpg')        
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, history= his, use_meta=True)
print(response)
#The car in Image4 is a red sports car, which appears to be a Ferrari. Ferrari is a renowned Italian manufacturer known for producing some of the most iconic and high-performance sports cars in the world. - Advantages: Ferrari vehicles are synonymous with speed, luxury, and engineering excellence. 
#The car in the image likely offers an exhilarating driving experience with its powerful engine, advanced aerodynamics, and high-quality craftsmanship. The red color adds to the car's aesthetic appeal, making it stand out on the road. - Disadvantages: Ferrari cars are extremely expensive, making them less accessible to the average consumer. 
#They also require specialized knowledge for maintenance and may not be suitable for everyday driving due to their high-performance nature. In conclusion, the Ferrari in Image4 represents a pinnacle of automotive engineering and design, offering unmatched performance and luxury. 
#However, its high cost and specialized maintenance requirements make it less practical for everyday use compared to the other vehicles in the images.

High Resolution Image Understanding

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Analyze the given image in a detail manner'
image = ['./examples/dubai.png']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
#The infographic is a visual representation of various facts about Dubai. It begins with a statement about Palm Jumeirah, highlighting it as the largest artificial island visible from space. It then provides a historical context, noting that in 1968, there were only a few cars in Dubai, contrasting this with the current figure of more than 1.5 million vehicles. 
#The infographic also points out that Dubai has the world's largest Gold Chain, with 7 of the top 10 tallest hotels located there. Additionally, it mentions that the crime rate is near 0%, and the income tax rate is also 0%, with 20% of the world's total cranes operating in Dubai. Furthermore, it states that 17% of the population is Emirati, and 83% are immigrants.
#The Dubai Mall is highlighted as the largest shopping mall in the world, with 1200 stores. The infographic also notes that Dubai has no standard address system, with no zip codes, area codes, or postal services. It mentions that the Burj Khalifa is so tall that its residents on top floors need to wait longer to break fast during Ramadan. 
#The infographic also includes information about Dubai's climate-controlled City, with the Royal Suite at Burj Al Arab costing $24,000 per night. Lastly, it notes that the net worth of the four listed billionaires is roughly equal to the GDP of Honduras.

Instruction to Webpage

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'A website for Research institutions. The name is Shanghai AI lab. Top Navigation Bar is blue.Below left, an image shows the logo of the lab. In the right, there is a passage of text below that describes the mission of the laboratory.There are several images to show the research projects of Shanghai AI lab.'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.write_webpage(query, seed=202, task='Instruction-aware Webpage Generation', repetition_penalty=3.0)
print(response)
# see the Instruction-aware Webpage Generation.html

See the Instruction to Webpage results here.

Resume to Webpage

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

## the input should be a resume in markdown format
query = './examples/resume.md'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.resume_2_webpage(query, seed=202, repetition_penalty=3.0)
print(response)

See the Resume to Webpage results here.

Screenshot to Webpage

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Generate the HTML code of this web image with Tailwind CSS.'
image = ['./examples/screenshot.jpg']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.resume_2_webpage(query, image, seed=202, repetition_penalty=3.0)
print(response)

See the Screenshot to Webpage results here.

Write Article

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = '阅读下面的材料，根据要求写作。 电影《长安三万里》的出现让人感慨，影片并未将重点全落在大唐风华上，也展现了恢弘气象的阴暗面，即旧门阀的资源垄断、朝政的日益衰败与青年才俊的壮志难酬。高适仕进无门，只能回乡>沉潜修行。李白虽得玉真公主举荐，擢入翰林，但他只是成为唐玄宗的御用文人，不能真正实现有益于朝政的志意。然而，片中高潮部分《将进酒》一节，人至中年、挂着肚腩的李白引众人乘仙鹤上天，一路从水面、瀑布飞升至银河进入仙>宫，李白狂奔着与仙人们碰杯，最后大家纵身飞向漩涡般的九重天。肉身的微贱、世路的“天生我材必有用，坎坷，拘不住精神的高蹈。“天生我材必有用，千金散尽还复来。” 古往今来，身处闲顿、遭受挫折、被病痛折磨，很多人都曾经历>了人生的“失意”，却反而成就了他们“诗意”的人生。对正在追求人生价值的当代青年来说，如何对待人生中的缺憾和困顿?诗意人生中又有怎样的自我坚守和自我认同?请结合“失意”与“诗意”这两个关键词写一篇文章。 要求:选准角度，确定>立意，明确文体，自拟标题;不要套作，不得抄袭;不得泄露个人信息;不少于 800 字。'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.write_artical(query, seed=8192)
print(response)
#诗意人生，贵在坚守
#《菜根谭》有云:“闲时要有吃紧的心思,忙里要留吃闲工夫。”人生在世,总有失意之时,当面对缺憾和困顿,诗意地生活着才能为人生增添一抹亮色。何谓诗意地生活? 所谓诗意地生活，便是在于坚守本心、直面遗憾、超越自我,在失意中寻找人生价值。
#诗意地生活,需坚守本心,淡然处之。
#陶渊明曾执意辞去彭泽县令,归隐田园,“采菊东篱下,悠然见南山”,在山水间寄情自娱；王维面对仕途失意,终日沉醉于诗酒之中,“兴来每独往,胜事空自知”,在诗酒中闲逸自如;李白仕途不顺,被赐金放还,但他依旧豪气干云,“天生我才必有用,千金散尽还复来”,在失意中坦然豁达。坚守本心，便能在遭遇失意之时守住自己的精神家园,让生活充满诗意。反之,若不能坚守本心,而只是一味迎合世俗以求得升迁,那纵使身居高位,亦会丧失生活的乐趣。
#诗意地生活,需直面遗憾,超越自我。
#“西塞山前白鹭飞,桃花流水鳜鱼肥。青箬笠,绿柳枝,半斤酒,一纶丝。五湖四海皆如此,何妨到此处归。”白居易的《渔歌子》写出了多少人的愿望:没有权势纷扰,没有贫困凄凉,只有青山绿水、白鹭鸥鸟作伴,如此自由自在的生活令人神往。然而,白居易却并没有因此真的归隐山林,而是直面人生,超越自我,写下了一首首诗意而富有现实关怀的作品。如果白居易只顾逃避人生,那又怎会拥有“大弦嘈嘈如急雨,小弦切切如私语”的绝美比喻呢?如果白居易只顾归隐山林,那又怎会写出“此曲只应天上有,人间哪得配白居易”这样的诗句呢?
#诗意地生活,需直面遗憾,坚守本心。
#李文波患有渐冻症,医生说他活不过五年,但他没有因此放弃对音乐的热爱,而是与病魔作斗争,演奏出美妙的乐曲;孙家林自幼患有脑瘫,但他不甘于命运的捉弄,终成全国最美教师;史铁生饱受疾病折磨,但他仍能发出“我常常在我的心头清点,我有什么?”的叩问,并由此走上文学道路,为后世留下丰厚的文化遗产。这些人没有逃避,而是选择直面人生的缺憾,在坚守本心的同时超越自我,最终实现了自己的价值。
#诗意地生活,是于失意中坚守本心,于缺憾中超越自我。当面对人生的缺憾与挫折,坚守本心、超越自我的同时,也必将书写属于自己的辉煌篇章。
#愿你我都能诗意地生活着!

query = 'Please write a blog based on the title: French Pastries: A Sweet Indulgence'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.write_artical(query, seed=8192)
print(response)
#French Pastries: A Sweet Indulgence
#The French are well known for their love of pastries, and it’s a love that is passed down through generations. When one visits France, they are treated to an assortment of baked goods that can range from the delicate macaron to the rich and decadent chocolate mousse. While there are many delicious types of pastries found in France, five stand out as being the most iconic. Each of these pastries has its own unique qualities that make it special.
#1. Croissant
#One of the most famous pastries from France is the croissant. It is a buttery, flaky pastry that is best enjoyed fresh from the bakery. The dough is laminated with butter, giving it its signature layers. Croissants are typically eaten for breakfast or brunch, often accompanied by coffee or hot chocolate.
#2. Macaron
#The macaron is a small, delicate French confection made from almond flour, powdered sugar, and egg whites. The macaron itself is sandwiched with a ganache or jam filling. They come in a variety of colors and flavors, making them a popular choice for both casual snacking and upscale desserts.
#3. Madeleine
#The madeleine is a small shell-shaped cake that is light and sponge-like. It is often flavored with lemon or orange zest and sometimes dipped in chocolate. Madeleines are perfect for an afternoon snack with tea or coffee.
#4. Éclair
#The éclair is a long, thin pastry filled with cream and topped with chocolate glaze. It is a classic French treat that is both sweet and satisfying. Éclairs can be found in bakeries all over France and are often enjoyed with a cup of hot chocolate.
#5. Tarte Tatin
#The tarte Tatin is an apple tart that is known for its caramelized apples and puff pastry crust. It is named after the Tatin sisters who created the recipe in the late 19th century. Tarte Tatin is best served warm with a scoop of vanilla ice cream.
#These pastries are just a few of the many delicious treats that France has to offer. Whether you are a seasoned traveler or a first-time visitor, indulging in French pastries is a must-do activity. So go ahead, treat yourself—you deserve it!

Inference on Multiple GPUs

If you have multiple GPUs, but the memory size of each GPU is not enough to accommodate the entire model, you can split the model across multiple GPUs. First, install accelerate using the command: pip install accelerate. Then, execute the follows scripts for chat:

# chat with 2 GPUs
python example_code/example_chat.py --num_gpus 2

Inference Acceleration by LMDeploy

Coming Soon

4-Bit Model

Coming Soon

Finetune

Please refer to our finetune scripts.

Gradio Deploy

We provide code for users to build a web UI demo.

Please run the command below for Chat / Composition:

# For Multimodal Chat
python gradio_demo/gradio_demo_chat.py

# For Free-form Text-Image Composition
python gradio_demo/gradio_demo_composition.py

The user guidance of UI demo is given in HERE. If you wish to change the default folder of the model, please use the --code_path=new_folder option.

Citation

If you find our models / code / papers useful in your research, please consider giving ⭐ and citations 📝, thx :)

@article{internlmxcomposer2_4khd,
      title={InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD},
      author={Xiaoyi Dong and Pan Zhang and Yuhang Zang and Yuhang Cao and Bin Wang and Linke Ouyang and Songyang Zhang and Haodong Duan and Wenwei Zhang and Yining Li and Hang Yan and Yang Gao and Zhe Chen and Xinyue Zhang and Wei Li and Jingwen Li and Wenhai Wang and Kai Chen and Conghui He and Xingcheng Zhang and Jifeng Dai and Yu Qiao and Dahua Lin and Jiaqi Wang},
      journal={arXiv preprint arXiv:2404.06512},
      year={2024}
}

@article{internlmxcomposer2,
      title={InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model},
      author={Xiaoyi Dong and Pan Zhang and Yuhang Zang and Yuhang Cao and Bin Wang and Linke Ouyang and Xilin Wei and Songyang Zhang and Haodong Duan and Maosong Cao and Wenwei Zhang and Yining Li and Hang Yan and Yang Gao and Xinyue Zhang and Wei Li and Jingwen Li and Kai Chen and Conghui He and Xingcheng Zhang and Yu Qiao and Dahua Lin and Jiaqi Wang},
      journal={arXiv preprint arXiv:2401.16420},
      year={2024}
}

@article{internlmxcomposer,
      title={InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition},
      author={Pan Zhang and Xiaoyi Dong and Bin Wang and Yuhang Cao and Chao Xu and Linke Ouyang and Zhiyuan Zhao and Shuangrui Ding and Songyang Zhang and Haodong Duan and Wenwei Zhang and Hang Yan and Xinyue Zhang and Wei Li and Jingwen Li and Kai Chen and Conghui He and Xingcheng Zhang and Yu Qiao and Dahua Lin and Jiaqi Wang},
      journal={arXiv preprint arXiv:2309.15112},
      year={2023}
}

License & Contact Us

The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English)/申请表（中文）. For other questions or collaborations, please contact [email protected].

internlm-xcomposer's People

Stargazers

Watchers

Forkers

eltociear tonywhite11 zhangzhuobys chenxwh vansin 2132660698 xinyexun caoliangjie kennymckormick mayi140611 sidd1609 huiguyy joberzheng niubikala keyhsw open-mmlab-12 pan6zhang zebrajack camenduru kp-forks davidalphafox sdc17 navezjt josephrp wayshall 604487475 hsaigroup fanguoshun alvin6233 xiaoachen98 etelis sorokinvld noahclements coldra1n f901107 lcolok techthiyanes zhaoganglxh wanboyang becauseofai hunseto haorenkk123 del-zhenwu stanleyran waywardspooky md-hassan chenkenanalytic yushunxiang kustomzone anthonyyuan yzgrfsy up-pika baggiorobertozoba yuhangzang learn01one valeriawong ndavid1 meigaoms saulocatharino komingshyu hongdangshao jjjymmm bpd1069 weifei7 joeaelkhoury keyman9848 b08240 wp1811983038 yeyimilk yangfukui tfgbestneal jesean jiangzongkang varuy322 haikuoxin jerrywei1985 sssssshf happybuby dlut-lyz e06084 azure-dragon-ai sugary199 segmond apollohuang1 lhagiimn jesusoctavioas jafitz26 qzl164 chriss-0x01 hadryan meitianjinbu hhy5277 automancursor liunix61 wangbindl lvhan028 evdcush kekewind ductai199x zyjwuyan

internlm-xcomposer's Issues

example_demo code contains inter information

Hi, you seem to leave extra information in L51 and L52 in examples/web_demo.py:

 self.llm_model = AutoModel.from_pretrained('/mnt/petrelfs/share_data/dongxiaoyi/share_models/release_chat', trust_remote_code=True)
        tokenizer = AutoTokenizer.from_pretrained('/mnt/petrelfs/share_data/dongxiaoyi/share_models/release_chat', trust_remote_code=True)

in the web_demo.py.
You may want to fix them to avoid the exposure.

What is the difference between InternConvertedInternLMAttention and InternLMAttention?

InternLMAttention is used in huggingface: https://huggingface.co/internlm/internlm-chat-7b/blob/main/modeling_internlm.py#L257
InternConvertedInternLMAttention is used in this repo: https://github.com/InternLM/InternLM-XComposer/blob/main/huggingface/internlm-xcomposer/modeling_InternLM.py#L732

I set intern_converted_llm to false and found that the results were all wrong. What is the difference between InternConvertedInternLMAttention and InternLMAttention?

An example with Transformers to generate text + images

Hi,

I see very interesting the examples detailed for interacting with images with Transformers for VQA etc. However, how can we really generate text + images (with the right history context) with HF transformers?

I cannot see an example with this good feature of your work.

Thanks.

Minimum GPU memory to run example_chat.py

Hello, I am interested in your work and curious about the minimum total GPU memory required to run example_chat.py for testing. I tried it on mine, which has 8GB of memory, clearly not enough. Can you show me the rough range for it?

AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

The code

model = model.eval()
tokenizer = AutoTokenizer.from_pretrained(
    "../internlm-xcomposer-7b-4bit", trust_remote_code=True
)
model.model.tokenizer = tokenizer

met error

File "/home/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b-4bit/tokenization_InternLM_XComposer.py", line 94, in vocab_size
    return self.sp_model.get_piece_size()
AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

CUDA Out of Memory in Multi-GPU Inference

import torch
from transformers import AutoModel, AutoTokenizer
import argparse

def auto_configure_device_map(num_gpus):
    # visual_encoder 算4层
    # internlm_model.model.embed_tokens 占用1层
    # norm 和 lm_head 占用1层
    # transformer.layers 占用 32 层
    # 总共34层分配到num_gpus张卡上
    num_trans_layers = 32
    per_gpu_layers = 38 / num_gpus

    device_map = {
        'visual_encoder': 0,
        'ln_vision': 0,
        'Qformer': 0,
        'internlm_model.model.embed_tokens': 0,
        'internlm_model.model.norm': 0,
        'internlm_model.lm_head': 0,
        'query_tokens': 0,
        'flag_image_start': 0,
        'flag_image_end': 0,
        'internlm_proj.weight': 0,
        'internlm_proj.bias': 0,
    }

    # device_map = {key: 0 for key in device_map.keys()}
    
    used = 6
    gpu_target = 0
    for i in range(num_trans_layers):
        if used >= per_gpu_layers:
            gpu_target += 1
            used = 0
        assert gpu_target < num_gpus
        device_map[f'internlm_model.model.layers.{i}'] = gpu_target
        used += 1

    return device_map

torch.set_grad_enabled(False)

parser = argparse.ArgumentParser()
parser.add_argument("--num_gpus", default=4, type=int)
args = parser.parse_args()

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer-vl-7b', trust_remote_code=True, cache_dir='/storage/internLM/').cuda().eval()
if args.num_gpus > 1:
    from accelerate import dispatch_model
    device_map = auto_configure_device_map(args.num_gpus)
    model = dispatch_model(model, device_map=device_map)

tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer-vl-7b', trust_remote_code=True, cache_dir='/storage/internLM/')
model.tokenizer = tokenizer


# example image
image = 'examples/images/aiyinsitan.jpg'

# Single-Turn Pure-Text Dialogue
text = 'Please introduce Einstein.'
with torch.no_grad():
    with model.maybe_autocast():
        response = model.generate(text)
print(response)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)

什么时候可以把微调代码放出来？

请问图片如何进行微调？

如题，这么模型如何进行文字和图片的微调

Error: click "Insert a fixed number of Images" button error

Traceback (most recent call last):
File "C:\Python\Python310\lib\site-packages\gradio\queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "C:\Python\Python310\lib\site-packages\gradio\route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1437, in process_api
result = await self.call_function(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Python\Python310\lib\site-packages\gradio\utils.py", line 650, in wrapper
response = f(*args, **kwargs)
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 468, in adjust_img
caps = self.generate_loc_cap(idx_text_sections, int(img_num), progress)
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 177, in generate_loc_cap
inject_text, locs = self.generate_loc(text_sections, image_num,
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 132, in generate_loc
for _ in progress.tqdm([1], desc="image spotting"):
TypeError: Progress.tqdm() missing 1 required positional argument: 'iterable'
Traceback (most recent call last):
File "C:\Python\Python310\lib\site-packages\gradio\queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "C:\Python\Python310\lib\site-packages\gradio\route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1437, in process_api
result = await self.call_function(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Python\Python310\lib\site-packages\gradio\utils.py", line 650, in wrapper
response = f(*args, **kwargs)
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 468, in adjust_img
caps = self.generate_loc_cap(idx_text_sections, int(img_num), progress)
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 177, in generate_loc_cap
inject_text, locs = self.generate_loc(text_sections, image_num,
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 132, in generate_loc
for _ in progress.tqdm([1], desc="image spotting"):
TypeError: Progress.tqdm() missing 1 required positional argument: 'iterable'

在vscode中运行internlm-xcomposer-7b模型，无法生成图文的结果

User: Write a popular science article about “Unraveling the Mysteries of Black Holes: A Scientific Overview” with pictures and illustrations.
Bot: I'm sorry, but as an AI language model, I don't have the capability to create visual content such as pictures and illustrations. However, I can provide you with a text-based summary of the popular science article about "Unraveling the Mysteries of Black Holes: A Scientific Overview".

如图，采用论文中提到的prompt，但是没有得到预期的结果，不知道什么原因

No module named 'transformers_modules.internlm/internlm-xcomposer-7b'

下了internlm-xcomposer-7b ，放到internlm/internlm-xcomposer-7b、但是有下列报错

PS E:\InternLM-XComposer> python .\examples\web_demo.py
Traceback (most recent call last):
File "E:\cnai\InternLM-XComposer\examples\web_demo.py", line 816, in
demo_ui = Demo_UI()
File "E:\cnai\InternLM-XComposer\examples\web_demo.py", line 47, in init
self.llm_model = AutoModel.from_pretrained(
File "C:\Python\Python310\lib\site-packages\transformers\models\auto\auto_factory.py", line 456, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "C:\Python\Python310\lib\site-packages\transformers\models\auto\configuration_auto.py", line 953, in from_pretrained
config_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
File "C:\Python\Python310\lib\site-packages\transformers\dynamic_module_utils.py", line 443, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "C:\Python\Python310\lib\site-packages\transformers\dynamic_module_utils.py", line 164, in get_class_in_module
module = importlib.import_module(module_path)
File "C:\Python\Python310\lib\importlib_init_.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.internlm/internlm-xcomposer-7b'

多轮对话网页端和终端结果不符，且终端执行时部分结果为空

请教一个问题：同样的4个question，在网页端可以有比较好的结果，但是在Notebook中执行，第4个answer的结果就为空。尝试其他多轮对话也会出现类似的情况，但不一定是第4个answer，可能是第3个或其他位置为空。
网页端：

终端：

大佬，发现一个examplechat的新问题，api服务单次请求速度提上来了，请求次数多了以后，速度会变慢

如何解决

模型推理速度特别慢和webui速度不一致

text = '图片里面的是谁？'
response, history = model.chat(text=text, image=image, history=None)
使用的是这个推理程序

ModelScope urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>

import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer-7b')
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()

root@autodl-container-9e2911833c-01d8deff:~/autodl-tmp# python download.py 
2023-10-10 21:52:08,079 - modelscope - INFO - PyTorch version 1.11.0+cu113 Found.
2023-10-10 21:52:08,081 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-10-10 21:52:08,119 - modelscope - INFO - Loading done! Current index file version is 1.9.2, with md5 1c9bf186d1e03088e5abfbd8664a1def and a total number of 941 components indexed
2023-10-10 21:52:08,686 - modelscope - WARNING - There is no version specified and there is no version in the model repository,use the master branch, which is fragile, please use it with caution!
2023-10-10 21:52:08,686 - modelscope - INFO - Model revision not specified, use revision: master
Init VIT ... Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 1354, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1298, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1247, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1007, in _send_output
    self.send(msg)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 947, in send
    self.connect()
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1421, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/root/miniconda3/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/root/miniconda3/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/root/miniconda3/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "download.py", line 8, in <module>
    model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
  File "/root/miniconda3/lib/python3.8/site-packages/modelscope/utils/hf_util.py", line 181, in from_pretrained
    module_obj = module_class.from_pretrained(model_dir, *model_args,
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 560, in from_pretrained
    return model_class.from_pretrained(
  File "/root/miniconda3/lib/python3.8/site-packages/modelscope/utils/hf_util.py", line 78, in from_pretrained
    return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3085, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_InternLM_XComposer.py", line 43, in __init__
    self.visual_encoder = create_eva_vit_g()
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_vit.py", line 522, in create_eva_vit_g
    cached_file = download_cached_file(url, check_hash=False, progress=True)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_utils.py", line 44, in download_cached_file
    timm_hub.download_cached_file(url, check_hash, progress)
  File "/root/miniconda3/lib/python3.8/site-packages/timm/models/_hub.py", line 85, in download_cached_file
    download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/hub.py", line 457, in download_url_to_file
    u = urlopen(req)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 1397, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 1357, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>
root@autodl-container-9e2911833c-01d8deff:~/autodl-tmp#

训练lora部分报错

我想做lora训练，只保冻结了其他部分，只保留了lora_A和lora_B,但是反向传播的时候会报错
# 冻结visual_encoder, ln_vision 和 internlm_model 的参数
for param in model.visual_encoder.parameters():
param.requires_grad = False

for param in model.ln_vision.parameters():
    param.requires_grad = False

for param in model.Qformer.parameters():
    param.requires_grad = False

for param in model.internlm_model.parameters():
    param.requires_grad = False

# 解冻需要训练的lora_A和lora_B的参数
for name, param in model.named_parameters():
    if "lora_A" in name or "lora_B" in name:
        param.requires_grad = True

训练代码：
input_ids = data['input_ids'].to(device, dtype=torch.long)
labels = data['labels'].to(device, dtype=torch.long)
attention_mask= data['attention_mask'].to(device, dtype=torch.long)
outputs = model.internlm_model(
input_ids=input_ids,
labels=labels,
attention_mask=attention_mask
)
loss = outputs.loss
# 反向传播，计算当前梯度
loss.backward()

错误如下：
Traceback (most recent call last):
File "/data/xinyuuliu/InternLM-XComposer/train_model/train.py", line 190, in
main()
File "/data/xinyuuliu/InternLM-XComposer/train_model/train.py", line 175, in main
train(epoch, model, device, training_loader, optimizer, gradient_accumulation_steps,model_output_dir)
File "/data/xinyuuliu/InternLM-XComposer/train_model/train.py", line 55, in train
loss.backward()
File "/root/miniconda3/envs/internLM/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/root/miniconda3/envs/internLM/lib/python3.9/site-packages/torch/autograd/init.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/root/miniconda3/envs/internLM/lib/python3.9/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_InternLM.py", line 80, in backward
rotary_emb.apply_rotary(dq1, dq2, rearrange(cos[:seqlen], 's d -> s 1 d'),
NameError: name 'rotary_emb' is not defined

Hi, will you release the fine-tuned code and how long will it take?

as the title.

生成的文本存在重复

体验多模态对话的时候，发现会出现重复的结果文本。

InternLM-XComposer-VL-7B, The chinese ability of the model does not match the demo.

`
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)
model_path = "internlm/internlm-xcomposer-vl-7b"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.tokenizer = tokenizer

image = "./image/aiyinsitan.jpg"
text = '请问这张图片里面的人是谁？并介绍下他。'
response = model.generate(text, image)
print(response)
`

response: albert einstein

I tried a lot of pictures, but the effect of the model is not satisfactory, and the results are basically in English.

可以公布multi-task 训练的代码吗？

Why I meet this problem when I use model to generate?

../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [55,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [56,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [57,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [58,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [59,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [60,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [61,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [62,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.

Where may the problem live? Thanks!

运行web_demo.py，无法初始化模型

Traceback (most recent call last):
File "/home/batch/projects/InternLM-XComposer/examples/web_demo.py", line 816, in
demo_ui = Demo_UI()
File "/home/batch/projects/InternLM-XComposer/examples/web_demo.py", line 47, in init
self.llm_model = AutoModel.from_pretrained(
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 479, in from_pretrained
return model_class.from_pretrained(
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2675, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/home/batch/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-vl-7b/modeling_InternLM_XComposer.py", line 49, in init
self.Qformer, self.query_tokens = self.init_qformer(
File "/home/batch/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-vl-7b/modeling_InternLM_XComposer.py", line 122, in init_qformer
encoder_config = BertConfig.from_pretrained("bert-base-uncased")
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/configuration_utils.py", line 547, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/configuration_utils.py", line 629, in _get_config_dict
resolved_config_file = cached_file(
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/utils/hub.py", line 452, in cached_file
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

Curious about zero-shot performance for the pretrained VLLM

Is the LLM part of internlm-xcomposer-vl-7b the base version (internlm-7b) or the chat version (internlm-chat-7b or internlm-chat-7b-v1.1)?
If the LLM part is the base version, and there is no sft on xcomposer-vl, then how does it have the ability to follow the instructions of benchmarks to get high performance on zero-shot (instead of few-shot) testing?

Where can I find the MMBench-CN and CCBench data?

Where can I find the MMBench-CN and CCBench data? The link in readme just jumps to opencompass.

运行web_demo.py 出现错误

下载的是最新的代码文件，运行时出现如下错误：
正在接受医生的检查，医生在为它量体温。', 10: '一只宠物狗在主人帮助下清理自己的粪便，主人在旁边指导。', 12: '一只宠物狗在主人帮助下处理自己的毛发，主人在旁边指导。'}
https://static.openxlab.org.cn/lingbi/jpg-images/61ec717e9ee8ffd984f79d01838de29e352b6aa9b9a04bb60e56a92f00fa72db.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/78668d1138f169a78284213bb2df7991cf77ff48bbdb6f938a3536d238ccbdc7.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/9db1b1ecdfc698526459d4bb519ebd1dfc3b9be9e4983ff8b862dd970257793a.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/2f3ce59b613d2b7b989a919819ef1aed67dcd56d6381b0ea2af68972c700c367.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/3e4914997caf27f88b1219070586a7aa9ce79c6afacf824a640396abab216230.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/41eb2c71b25921e4ccb423df6e402caf74d71bc981b9bd699a44bc2d66ec0524.jpg
download image with url
image downloaded
model_select_image
Traceback (most recent call last):
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/blocks.py", line 1437, in process_api
result = await self.call_function(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/blocks.py", line 1123, in call_function
prediction = await utils.async_iteration(iterator)
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 512, in async_iteration
return await iterator.anext()
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 505, in anext
return await anyio.to_thread.run_sync(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 488, in run_sync_iterator_async
return next(iterator)
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 638, in gen_wrapper
yield from f(*args, **kwargs)
File "/data/liwx/InternLM-XComposer-main/InternLM-XComposer-main/examples/web_demo.py", line 444, in generate_article
self.selected = self.model_select_image(output_text, caps,
File "/data/liwx/InternLM-XComposer-main/InternLM-XComposer-main/examples/web_demo.py", line 299, in model_select_image
pre_img.append(images[len(pre_img) + ans2idx[answer]].cpu())
KeyError: '<'

不可以单独生成图片吗？

我这边调用的是internlm-xcomposer-7b的模型，然后输入如下命令：

>>> text='请帮我画一张长城的照片'
>>> response, history = model.chat(text=text, image=None, history=None)
>>> print(response)
很抱歉，作为一个语言模型，我并不具备绘画能力，无法为您画一张长城的照片。但是，如果您愿意，我可以为您提供一些关于长城的资料和信息，帮助您更好地了解这座伟大的建筑。
>>>

他是只能生成图文并茂的文章吗？

Suggestion : make examples/web_demo.py more secure

change the last line in examples/web_demo.py to make more securte , Not everyone need expose service to public

if name == "main":
demo.launch(share=True, server_name="0.0.0.0", server_port=11111)

to
demo.launch(share=False, server_name="127.0.0.1", server_port=11111)

请问可以int4量化加载吗

training data?

Will release training data?

question about ShareGPT4V dataset

where can I get the images in ShareGPT4V dataset.

Which model is used as the Share Captioner in ShareGPT4V?

Great work and congrat!
Can I ask which model is used as the captioner in ShareGPT4V data generation? Seems that the paper does not mention it very clearly.

怎样获得模型的参数量和tops数？

感谢这么出色的工作，想咨询一下，如何获取模型的参数量和tops数？

OCR support ?

Is it possible to make it work with ocr capability?

Are the pictures automatically selected after generating the article or do they need to be selected manually?

请问生成文章后的图片是自动选择的还是需要手动选择？

I would like to know how to use GPT4V to build the ShareGPT4V data set. Will the construction code or prompt be open source?

support for multiple GPU inference

Hello, I am interested in your work and I am curious about how to run internlm-xcomposer-7b in an environment that only contains 24GB GPUs. I am looking forward to a new version of inference code that supports multiple gpu inference.

Thank you

Failed to load 4-bits weights from HuggingFace

Description

Unable to load the quantized weights (4 bits) from HuggingFace

Code

The code is a direct copy from the file examples/example_chat_4bit_en.py

import torch
from transformers import AutoModel, AutoTokenizer

import auto_gptq
from auto_gptq.modeling import BaseGPTQForCausalLM

auto_gptq.modeling._base.SUPPORTED_MODELS = ["InternLMXComposer"]

torch.set_grad_enabled(False)


class InternLMXComposerQForCausalLM(BaseGPTQForCausalLM):
    layers_block_name = "internlm_model.model.layers"
    outside_layer_modules = [
        "query_tokens",
        "flag_image_start",
        "flag_image_end",
        "visual_encoder",
        "Qformer",
        "internlm_model.model.embed_tokens",
        "internlm_model.model.norm",
        "internlm_proj",
        "internlm_model.lm_head",
    ]
    inside_layer_modules = [
        ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
        ["self_attn.o_proj"],
        ["mlp.gate_proj"],
        ["mlp.up_proj"],
        ["mlp.down_proj"],
    ]


# init model and tokenizer
model = InternLMXComposerQForCausalLM.from_quantized(
    "internlm/internlm-xcomposer-7b-4bit", trust_remote_code=True, device="cuda:0"
)
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained(
    "internlm/internlm-xcomposer-7b-4bit", trust_remote_code=True
)
model.model.tokenizer = tokenizer

# example image
image = "examples/images/aiyinsitan.jpg"

# Multi-Turn Text-Image Dialogue
# 1st turn
text = 'Describe this image in detial.'
image = "examples/images/aiyinsitan.jpg"
response, history = model.chat(text, image)
print(f"User: {text}")
print(f"Bot: {response}") 
# The image features a black and white portrait of Albert Einstein, the famous physicist and mathematician. 
# Einstein is seated in the center of the frame, looking directly at the camera with a serious expression on his face. 
# He is dressed in a suit, which adds a touch of professionalism to his appearance.

Error

Traceback (most recent call last):
  File "/mnt/bd/dev-pierre-oreistein-st/sandbox/test_internlm_vl/test_internlm_vl_4bits", line 35, in <module>
    model = InternLMXComposerQForCausalLM.from_quantized(
  File "/home/pierre/.pyenv/versions/dev3.9/lib/python3.9/site-packages/auto_gptq/modeling/_base.py", line 847, in from_quantized
    raise FileNotFoundError(f"Could not find a model in {model_name_or_path} with a name in {', '.join(searched_files)}. Please specify the argument model_basename to use a custom file name.")
FileNotFoundError: Could not find a model in internlm/internlm-xcomposer-7b-4bit with a name in gptq_model-4bit-128g.safetensors, model.safetensors. Please specify the argument model_basename to use a custom file name.

Ideas

According to this similar issue I need to specify the model file. However, I was unable to find it on HuggingFace. Could you help me with this?

Thanks in advance for your help!

请问现在支持中文数据SFT吗？

请问现在有SFT的教程嘛？支持中文数据微调么？

internlm-xcomposer-7b-4bit 这个量化模型，运行失败

import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer
import os

torch.set_grad_enabled(False)

# init model and tokenizer
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer-7b-4bit', revision = 'master')
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model.tokenizer = tokenizer

Video QA with InternLM-XComposer and Fine-tuning Code Availability

1、How does InternLM-XComposer handle video QA? The examples seem to only involve images.
2、Will the fine-tuning code be open-sourced soon?

图片未生成

(xcomposer) ➜  InternLM-XComposer git:(main) ✗ python examples/web_demo.py
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.
  warnings.warn(
Init VIT ... Done
Init Perceive Sampler ... Done
Init InternLM ... Done
Loading checkpoint sha
 load model done:  <class 'transformers_modules.internlm-xcomposer-7b.modeling_InternLM_XComposer.InternLMXComposerForCausalLM'>
/cpfs01/user/huwenxing/InternLM-XComposer/examples/web_demo.py:1009: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  chat_textbox = gr.Textbox(
Running on local URL:  http://0.0.0.0:11111
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/chatbot.py:161: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Chatbot(...)` instead of `return gr.Chatbot.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/textbox.py:163: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Textbox(...)` instead of `return gr.Textbox.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.
  warnings.warn(
init
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/markdown.py:92: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Markdown(...)` instead of `return gr.Markdown.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/gallery.py:143: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Gallery(...)` instead of `return gr.Gallery.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/textbox.py:163: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Textbox(...)` instead of `return gr.Textbox.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/helpers.py:818: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Textbox(...)` instead of `return gr.update(...)
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/asyncio/events.py:80: GradioUnusedKwargWarning: You have unused kwarg parameters in Button, please remove them: {'mode': 'static'}
  self._context.run(self._callback, *self._args)

Could not create share link. Missing file: /cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/frpc_linux_amd64_v0.2. 

Please check your internet connection. This can happen if your antivirus software blocks the download of this file. You can install manually by following these steps: 

1. Download this file: https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_linux_amd64
2. Rename the downloaded file to: frpc_linux_amd64_v0.2
3. Move the file to this location: /cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio
<object object at 0x7fb765662e10>
敦煌，位于甘肃省西北部，地处河西走廊西端，是古代丝绸之路上的重要交通枢纽和商埠重镇。它拥有着丰富的历史文化遗产，包括莫高窟、鸣沙山月牙泉、雅丹魔鬼城等著名景点。同时，敦煌也是**历史文化名城之一，有着深厚的文化底蕴和独特的民俗风情。

**一、莫高窟**

莫高窟，又名“千佛洞”，是**四大石窟之一，始建于十六国的前秦时期，距今已有1600多年的历史。它是世界上现存规模最大、内容最丰富的佛教艺术宝库，被誉为“东方卢浮宫”。莫高窟内共有735个洞窟，壁画总面积达45000多平方米，彩塑佛像5000余尊，是世界上最大的佛教艺术中心之一。在这里，游客可以欣赏到精美的壁画、雕塑和音乐表演，感受佛教文化的博大精深。

**二、鸣沙山月牙泉**

鸣沙山月牙泉是一处自然奇观，位于敦煌市西北约40公里处的沙漠中。这里地势平坦，沙丘连绵起伏，形成了一片广袤无垠的沙漠景观。而月牙泉则静静地镶嵌在这片沙漠之中，泉水清澈见底，形状如新月，故称“月牙泉”。每到夜晚，月亮升起时，月牙泉周围会发出阵阵清脆的响声，犹如天籁之音，令人心旷神怡。

**三、雅丹魔鬼城**

雅丹魔鬼城是一座典型的风蚀地貌，位于敦煌市西南约100公里的戈壁滩上。这里的地貌奇特，呈现出一种荒凉、神秘、恐怖的景象。由于长期受到风吹日晒雨淋，这里的岩石表面已经变得凹凸不平，形成了各种形态各异的造型，有的像动物，有的像人物，有的像建筑，让人不禁感叹大自然的鬼斧神工。

**四、其他景点**

除了莫高窟、鸣沙山月牙泉和雅丹魔鬼城之外，敦煌还有许多其他值得一游的景点，如玉门关、阳关、锁阳城、汉长城遗址等。这些景点都具有悠久的历史和文化价值，吸引着众多游客前来参观游览。

**五、特色美食**

敦煌的特色美食也非常丰富，其中最有名的当属驴肉黄面了。驴肉黄面是一道以驴肉为主要食材的面食，味道鲜香可口，深受当地人和游客的喜爱。此外，还有羊肉泡馍、胡羊焖饼、烤全羊等特色美食，都是不容错过的美味佳肴。

**六、旅游小贴士**

1. 敦煌气候干燥，日照强烈，紫外线较强，建议游客做好防晒措施，携带防晒霜、遮阳帽、太阳镜等物品。2. 敦煌属于高原地区，海拔较高，游客应注意休息，避免剧烈运动，以免出现高原反应。3. 敦煌旅游景点较多，游客应提前规划好行程，合理安排时间，避免走马观花，错过重要的景点。4. 在敦煌旅游期间，要注意保护环境，不乱扔垃圾，不破坏文物古迹，做一个文明的游客。总之，敦煌是一座历史悠久、文化底蕴深厚、风景优美的城市，是一个值得一游的好去处。希望这篇文章能够帮助您更好地了解敦煌，为您的旅行提供一些有用的信息。
敦煌，位于甘肃省西北部，地处河西走廊西端，是古代丝绸之路上的重要交通枢纽和商埠重镇。它拥有着丰富的历史文化遗产，包括莫高窟、鸣沙山月牙泉、雅丹魔鬼城等著名景点。同时，敦煌也是**历史文化名城之一，有着深厚的文化底蕴和独特的民俗风情。
**一、莫高窟**
莫高窟，又名“千佛洞”，是**四大石窟之一，始建于十六国的前秦时期，距今已有1600多年的历史。它是世界上现存规模最大、内容最丰富的佛教艺术宝库，被誉为“东方卢浮宫”。莫高窟内共有735个洞窟，壁画总面积达45000多平方米，彩塑佛像5000余尊，是世界上最大的佛教艺术中心之一。在这里，游客可以欣赏到精美的壁画、雕塑和音乐表演，感受佛教文化的博大精深。
**二、鸣沙山月牙泉**
鸣沙山月牙泉是一处自然奇观，位于敦煌市西北约40公里处的沙漠中。这里地势平坦，沙丘连绵起伏，形成了一片广袤无垠的沙漠景观。而月牙泉则静静地镶嵌在这片沙漠之中，泉水清澈见底，形状如新月，故称“月牙泉”。每到夜晚，月亮升起时，月牙泉周围会发出阵阵清脆的响声，犹如天籁之音，令人心旷神怡。
**三、雅丹魔鬼城**
雅丹魔鬼城是一座典型的风蚀地貌，位于敦煌市西南约100公里的戈壁滩上。这里的地貌奇特，呈现出一种荒凉、神秘、恐怖的景象。由于长期受到风吹日晒雨淋，这里的岩石表面已经变得凹凸不平，形成了各种形态各异的造型，有的像动物，有的像人物，有的像建筑，让人不禁感叹大自然的鬼斧神工。
**四、其他景点**
除了莫高窟、鸣沙山月牙泉和雅丹魔鬼城之外，敦煌还有许多其他值得一游的景点，如玉门关、阳关、锁阳城、汉长城遗址等。这些景点都具有悠久的历史和文化价值，吸引着众多游客前来参观游览。
**五、特色美食**
敦煌的特色美食也非常丰富，其中最有名的当属驴肉黄面了。驴肉黄面是一道以驴肉为主要食材的面食，味道鲜香可口，深受当地人和游客的喜爱。此外，还有羊肉泡馍、胡羊焖饼、烤全羊等特色美食，都是不容错过的美味佳肴。
**六、旅游小贴士**
1. 敦煌气候干燥，日照强烈，紫外线较强，建议游客做好防晒措施，携带防晒霜、遮阳帽、太阳镜等物品。2. 敦煌属于高原地区，海拔较高，游客应注意休息，避免剧烈运动，以免出现高原反应。3. 敦煌旅游景点较多，游客应提前规划好行程，合理安排时间，避免走马观花，错过重要的景点。4. 在敦煌旅游期间，要注意保护环境，不乱扔垃圾，不破坏文物古迹，做一个文明的游客。总之，敦煌是一座历史悠久、文化底蕴深厚、风景优美的城市，是一个值得一游的好去处。希望这篇文章能够帮助您更好地了解敦煌，为您的旅行提供一些有用的信息。

how can i install `rotary_emb` ?

from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained('/root/autodl-tmp/models/internlm7bxc', trust_remote_code=True).cuda().eval()
Traceback (most recent call last):
File "", line 1, in
File "/root/miniconda3/envs/llm_chat/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 550, in from_pretrained
model_class = get_class_from_dynamic_module(
File "/root/miniconda3/envs/llm_chat/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 497, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "/root/miniconda3/envs/llm_chat/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 199, in get_class_in_module
module = importlib.import_module(module_path)
File "/root/miniconda3/envs/llm_chat/lib/python3.10/importlib/init.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/root/.cache/huggingface/modules/transformers_modules/internlm7bxc/modeling_InternLM_XComposer.py", line 18, in
from .modeling_InternLM import *
File "/root/.cache/huggingface/modules/transformers_modules/internlm7bxc/modeling_InternLM.py", line 5, in
import rotary_emb

请问可以部署多卡4090的推理吗？

单卡显存不够

All LoRA weights are 0 in InternLM-XComposer-VL?

In paper, InternLM-XComposer-VL is trained by multi-task training in StageB.
It seems training LoRA, but the weights are 0 when loading the released model.

AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

After update web_demo.py ，this error occurred

File "/home/enbo/.cache/huggingface/modules/transformers_modules/internlm-xcomposer/tokenization_InternLM_XComposer.py", line 106, in get_vocab
vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
File "/home/enbo/.cache/huggingface/modules/transformers_modules/internlm-xcomposer/tokenization_InternLM_XComposer.py", line 94, in vocab_size
return self.sp_model.get_piece_size()
AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

No response using model.chat

Hi, I'm using internLM-XComposer to generate some data and I have tried your demo, it works fine when I'm using model.generate.
But when I using model.chat(), model can only reply to the first call, subsequent calls are unresponsive and return empty string.

I'm using:
torch==2.0.1
transformers==4.33.2

My hardware is a single 3090 with 24G GPU memory, so I use 4-bit quantized models and tried your examples/example_chat_4bit.py and find this issue.

Is my environmental issue causing this problem or something else?

internlm / internlm-xcomposer Goto Github PK

internlm-xcomposer's Introduction

Multimodal Projects of Our Team

Demo Video

News and Updates

Model Zoo

Evaluation

Compared with closed-source APIs and previous SOTAs on Video and Structural High-resolution images.

Compared with closed-source APIs and previous SOTAs on Multi-Image dialog and General Visual QA Benchmarks.

Requirements

Installation

Quickstart

Inference on Multiple GPUs

Inference Acceleration by LMDeploy

4-Bit Model

Finetune

Gradio Deploy

Citation

License & Contact Us

internlm-xcomposer's People

Stargazers

Watchers

Forkers

internlm-xcomposer's Issues

Description

Code

Error

Ideas

Recommend Projects

Recommend Topics

Recommend Org

Jobs