
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output


internlm-xcomposer's Introduction

InternLM-XComposer-2.5

InternLM-XComposer2.5 🤗  | XComposer2.5 Technical Report 📄

English | 简体中文

Thanks to the community for the HuggingFace Demo | OpenXLab Demo of InternLM-XComposer-2.5.

👋 join us on Discord and WeChat



Multimodal Projects of Our Team

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Models

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

ShareGPT4V: Improving Large Multi-modal Models with Better Captions

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models


InternLM-XComposer-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V-level capabilities with merely a 7B LLM backend. IXC-2.5 is trained with 24K interleaved image-text contexts and can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to perform exceptionally well in tasks requiring extensive input and output contexts.

  • Ultra-High Resolution Understanding: IXC-2.5 enhances the dynamic resolution solution proposed in IXC2-4KHD with a native 560 × 560 ViT vision encoder, supporting high-resolution images with any aspect ratio.

  • Fine-Grained Video Understanding: IXC-2.5 treats videos as an ultra-high-resolution composite picture consisting of tens to hundreds of frames, allowing it to capture fine details through dense sampling and higher resolution for each frame (an illustrative sketch follows this list).

  • Multi-Turn Multi-Image Dialogue: IXC-2.5 supports free-form multi-turn multi-image dialogue, allowing it to naturally interact with humans in multi-round conversations.

  • Webpage Crafting: IXC-2.5 can be readily applied to create webpages by composing source code (HTML, CSS, and JavaScript) following text-image instructions.

  • Composing High-Quality Text-Image Articles: IXC-2.5 leverages specially designed Chain-of-Thought (CoT) and Direct Preference Optimization (DPO) techniques to significantly enhance the quality of its written content.

  • Awesome performance: IXC-2.5 has been evaluated on 28 benchmarks, outperforming existing open-source state-of-the-art models on 16 benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks.
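To make the video-understanding bullet above concrete, here is a minimal, illustrative sketch of the dense-sampling idea: frames are sampled uniformly and tiled into one large composite image. This is not IXC-2.5's actual preprocessing (which happens inside model.chat when a video path is passed); the frame count and 4×4 grid below are assumptions chosen for illustration.

import cv2
import numpy as np

def video_to_composite(path, num_frames=16, cols=4):
    # uniformly sample `num_frames` frames from the video
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    idxs = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for i in idxs:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    if not frames:
        raise ValueError(f'could not read any frames from {path}')
    # pad with black frames so the grid is always full
    while len(frames) < num_frames:
        frames.append(np.zeros_like(frames[0]))
    # tile the frames into a single ultra-high-resolution composite
    rows = [np.concatenate(frames[r * cols:(r + 1) * cols], axis=1)
            for r in range(num_frames // cols)]
    return np.concatenate(rows, axis=0)

composite = video_to_composite('./examples/liuxiang.mp4')
cv2.imwrite('composite.jpg', composite)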

Please refer to the Technical Report for more details.

Demo Video

🔥 For the best experience, please keep the audio on while enjoying the video.

demo3_en.mp4

Youtube Video

Please refer to the Chinese Demo for the Chinese version.

News and Updates

  • 2024.02.02 🎉🎉🎉 The finetune code of InternLM-XComposer2-VL-7B is publicly available.
  • 2024.01.26 🎉🎉🎉 The evaluation code of InternLM-XComposer2-VL-7B is publicly available.
  • 2024.01.26 🎉🎉🎉 InternLM-XComposer2-7B and InternLM-XComposer2-VL-7B are publicly available on Hugging Face and ModelScope.
  • 2024.01.26 🎉🎉🎉 We release a technical report with more details of the InternLM-XComposer2 series.
  • 2023.11.22 🎉🎉🎉 We release ShareGPT4V, a large-scale, highly descriptive image-text dataset generated by GPT4-Vision, and a superior large multimodal model, ShareGPT4V-7B.
  • 2023.10.30 🎉🎉🎉 InternLM-XComposer-VL ranked first in both Q-Bench and Tiny LVLM.
  • 2023.10.19 🎉🎉🎉 Support for inference on multiple GPUs. Two 4090 GPUs are sufficient for deploying our demo.
  • 2023.10.12 🎉🎉🎉 The 4-bit demo is supported; model files are available on Hugging Face and ModelScope.
  • 2023.10.8 🎉🎉🎉 InternLM-XComposer-7B and InternLM-XComposer-VL-7B are publicly available on ModelScope.
  • 2023.9.27 🎉🎉🎉 The evaluation code of InternLM-XComposer-VL-7B is publicly available.
  • 2023.9.27 🎉🎉🎉 InternLM-XComposer-7B and InternLM-XComposer-VL-7B are publicly available on Hugging Face.
  • 2023.9.27 🎉🎉🎉 We release a technical report with more details of our model series.

Model Zoo

| Model | Usage | Transformers(HF) | ModelScope | Release Date |
| --- | --- | --- | --- | --- |
| InternLM-XComposer-2.5 | Video Understanding, Multi-image Multi-turn Dialogue, 4K Resolution Understanding, Web Craft, Article Creation, Benchmark | 🤗internlm-xcomposer2.5 | internlm-xcomposer2.5 | 2024-07-03 |
| InternLM-XComposer2-4KHD | 4K Resolution Understanding, Benchmark, VL-Chat | 🤗internlm-xcomposer2-4khd-7b | internlm-xcomposer2-4khd-7b | 2024-04-09 |
| InternLM-XComposer2-VL-1.8B | Benchmark, VL-Chat | 🤗internlm-xcomposer2-vl-1_8b | internlm-xcomposer2-vl-1_8b | 2024-04-09 |
| InternLM-XComposer2 | Text-Image Composition | 🤗internlm-xcomposer2-7b | internlm-xcomposer2-7b | 2024-01-26 |
| InternLM-XComposer2-VL | Benchmark, VL-Chat | 🤗internlm-xcomposer2-vl-7b | internlm-xcomposer2-vl-7b | 2024-01-26 |
| InternLM-XComposer2-4bit | Text-Image Composition | 🤗internlm-xcomposer2-7b-4bit | internlm-xcomposer2-7b-4bit | 2024-02-06 |
| InternLM-XComposer2-VL-4bit | Benchmark, VL-Chat | 🤗internlm-xcomposer2-vl-7b-4bit | internlm-xcomposer2-vl-7b-4bit | 2024-02-06 |
| InternLM-XComposer | Text-Image Composition, VL-Chat | 🤗internlm-xcomposer-7b | internlm-xcomposer-7b | 2023-09-26 |
| InternLM-XComposer-4bit | Text-Image Composition, VL-Chat | 🤗internlm-xcomposer-7b-4bit | internlm-xcomposer-7b-4bit | 2023-09-26 |
| InternLM-XComposer-VL | Benchmark | 🤗internlm-xcomposer-vl-7b | internlm-xcomposer-vl-7b | 2023-09-26 |

Evaluation

We evaluate InternLM-XComposer-2.5 on 28 multimodal benchmarks, including the image benchmarks MMDU, MMStar, RealWorldQA, Design2Code, DocVQA, Infographics VQA, TextVQA, ChartQA, OCRBench, DeepForm, WTQ, VisualMRC, TabFact, MathVista, MMMU, AI2D, MME, MMBench, MMBench-CN, SEED-Bench, HallusionBench, and MM-Vet, and the video benchmarks MVBench, MLVU, Video-MME, MMBench-Video, and TempCompass.

See Evaluation Details here.

Compared with closed-source APIs and previous SOTAs on video and structural high-resolution image benchmarks.

| Method | MVBench | MLVU | MME-Video | MMBench-Video | TempCompass | DocVQA | ChartQA | InfoVQA | TextVQA | OCRBench | DeepForm | WTQ | VisualMRC | TabFact |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Previous SOTA | VideoChat2 (7B) 60.4 | InternVL1.5 (26B) 50.4 | LIVA (34B) 59.0 | InternVL1.5 (26B) 42.0 | Qwen-VL (7B) 58.4 | InternVL1.5 (26B) 90.9 | InternVL1.5 (26B) 83.8 | InternVL1.5 (26B) 72.5 | InternVL1.5 (26B) 80.6 | GLM-4v (9B) 77.6 | DocOwl 1.5 (8B) 68.8 | DocOwl 1.5 (8B) 40.6 | DocOwl 1.5 (8B) 246.4 | DocOwl 1.5 (8B) 80.2 |
| GPT-4V | 43.5 | 49.2 | 59.9 | 56.0 | --- | 88.4 | 78.5 | 75.1 | 78.0 | 51.6 | --- | --- | --- | --- |
| Gemini-Pro | --- | --- | 75.0 | 49.3 | 70.6 | 88.1 | 74.1 | 75.2 | 74.6 | 68.0 | --- | --- | --- | --- |
| Ours | 69.1 | 58.8 | 55.8 | 46.9 | 67.1 | 90.9 | 82.2 | 69.9 | 78.2 | 69.0 | 71.2 | 53.6 | 307.5 | 85.2 |


Requirements

  • Python 3.8 or above
  • PyTorch 1.12 or above (2.0 or above is recommended)
  • CUDA 11.4 or above is recommended (for GPU users)
  • flash-attention2 is required for the high-resolution usage of InternLM-XComposer-2.5
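A quick way to verify that your environment meets these requirements is a minimal check like the following (it assumes nothing beyond PyTorch being installed):

import torch

print('torch:', torch.__version__)                  # 1.12+ required, 2.0+ recommended
print('cuda available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('cuda:', torch.version.cuda)              # 11.4+ recommended
    mem_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f'gpu memory: {mem_gb:.1f} GB')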

Installation

Before running the code, make sure you have set up the environment and installed the required packages. Make sure you meet the above requirements, then install the dependent libraries. Please refer to the installation instructions.
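As a rough sketch of the steps (the package list here is our assumption; the exact pinned versions live in the official installation instructions):

pip install torch transformers accelerate
# flash-attention2 for high-resolution usage; building it requires a CUDA toolchain
pip install flash-attn --no-build-isolation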

Quickstart

We provide a simple example to show how to use InternLM-XComposer-2.5 with 🤗 Transformers.

Video Understanding
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Here are some frames of a video. Describe this video in detail'
image = ['./examples/liuxiang.mp4',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
#The video opens with a shot of an athlete, dressed in a red and yellow uniform with the word "CHINA" emblazoned across the front, preparing for a race. 
#The athlete, Liu Xiang, is seen in a crouched position, focused and ready, with the Olympic rings visible in the background, indicating the prestigious setting of the Olympic Games. As the race commences, the athletes are seen sprinting towards the hurdles, their determination evident in their powerful strides. 
#The camera captures the intensity of the competition, with the athletes' numbers and times displayed on the screen, providing a real-time update on their performance. The race reaches a climax as Liu Xiang, still in his red and yellow uniform, triumphantly crosses the finish line, his arms raised in victory. 
#The crowd in the stands erupts into cheers, their excitement palpable as they witness the athlete's success. The video concludes with a close-up shot of Liu Xiang, still basking in the glory of his victory, as the Olympic rings continue to symbolize the significance of the event.

query = 'tell me the athlete code of Liu Xiang'
image = ['./examples/liuxiang.mp4',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, history=his, do_sample=False, num_beams=3, use_meta=True)
print(response)
#The athlete code of Liu Xiang, as displayed on his uniform in the video, is "1363".
Multi-Image Multi-Turn Dialogue
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Image1 <ImageHere>; Image2 <ImageHere>; Image3 <ImageHere>; I want to buy a car from the three given cars, analyze their advantages and weaknesses one by one'
image = ['./examples/cars1.jpg',
        './examples/cars2.jpg',
        './examples/cars3.jpg',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
#To analyze the advantages and disadvantages of each car, we need to consider factors such as brand reputation, performance, design, cost, and maintenance. 
#1. Mercedes-Benz: - Advantages: Known for its luxury and reliability, the Mercedes-Benz brand offers a high level of comfort, advanced technology, and superior craftsmanship. The vehicle in the image appears to be an SUV, which is versatile for both city driving and off-road conditions. - Disadvantages: Typically, Mercedes-Benz vehicles are more expensive compared to other brands, and they may require more frequent maintenance due to their luxury status. 
#2. Bugatti: - Advantages: Bugatti is renowned for producing some of the fastest and most powerful cars in the world. The vehicle in the image is a sports car, likely offering an exhilarating driving experience with its high-performance engine and advanced aerodynamics. - Disadvantages: Bugatti cars are extremely expensive, making them less accessible to the average consumer. They also require specialized knowledge for maintenance and may not be suitable for everyday driving due to their high-performance nature. 
#3. BMW: - Advantages: BMW is a well-respected manufacturer known for its innovative engineering and design. The car in the image appears to be an electric vehicle (EV), which offers advantages such as lower emissions, reduced noise pollution, and potentially lower operating costs. - Disadvantages: EVs like the one in the image may have a shorter range compared to traditional gasoline-powered vehicles, limiting their suitability for long-distance travel. 
#Additionally, the initial cost of purchasing an EV can be higher than that of a conventional car. In conclusion, each car has its own set of advantages and disadvantages. The Mercedes-Benz offers luxury and versatility, the Bugatti provides high performance and exclusivity, and the BMW represents innovation in electric vehicle technology. The choice ultimately depends on the buyer's priorities, such as comfort, performance, environmental impact, and budget.

query = 'Image4 <ImageHere>; How about the car in Image4'
image.append('./examples/cars4.jpg')        
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, history=his, use_meta=True)
print(response)
#The car in Image4 is a red sports car, which appears to be a Ferrari. Ferrari is a renowned Italian manufacturer known for producing some of the most iconic and high-performance sports cars in the world. - Advantages: Ferrari vehicles are synonymous with speed, luxury, and engineering excellence. 
#The car in the image likely offers an exhilarating driving experience with its powerful engine, advanced aerodynamics, and high-quality craftsmanship. The red color adds to the car's aesthetic appeal, making it stand out on the road. - Disadvantages: Ferrari cars are extremely expensive, making them less accessible to the average consumer. 
#They also require specialized knowledge for maintenance and may not be suitable for everyday driving due to their high-performance nature. In conclusion, the Ferrari in Image4 represents a pinnacle of automotive engineering and design, offering unmatched performance and luxury. 
#However, its high cost and specialized maintenance requirements make it less practical for everyday use compared to the other vehicles in the images.
High Resolution Image Understanding
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Analyze the given image in a detailed manner'
image = ['./examples/dubai.png']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
#The infographic is a visual representation of various facts about Dubai. It begins with a statement about Palm Jumeirah, highlighting it as the largest artificial island visible from space. It then provides a historical context, noting that in 1968, there were only a few cars in Dubai, contrasting this with the current figure of more than 1.5 million vehicles. 
#The infographic also points out that Dubai has the world's largest Gold Chain, with 7 of the top 10 tallest hotels located there. Additionally, it mentions that the crime rate is near 0%, and the income tax rate is also 0%, with 20% of the world's total cranes operating in Dubai. Furthermore, it states that 17% of the population is Emirati, and 83% are immigrants.
#The Dubai Mall is highlighted as the largest shopping mall in the world, with 1200 stores. The infographic also notes that Dubai has no standard address system, with no zip codes, area codes, or postal services. It mentions that the Burj Khalifa is so tall that its residents on top floors need to wait longer to break fast during Ramadan. 
#The infographic also includes information about Dubai's climate-controlled City, with the Royal Suite at Burj Al Arab costing $24,000 per night. Lastly, it notes that the net worth of the four listed billionaires is roughly equal to the GDP of Honduras.
Instruction to Webpage
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'A website for Research institutions. The name is Shanghai AI lab. Top Navigation Bar is blue. Below left, an image shows the logo of the lab. On the right, there is a passage of text below that describes the mission of the laboratory. There are several images to show the research projects of Shanghai AI lab.'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.write_webpage(query, seed=202, task='Instruction-aware Webpage Generation', repetition_penalty=3.0)
print(response)
# see the Instruction-aware Webpage Generation.html 

See the Instruction to Webpage results here.
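Since write_webpage returns the page source as a string, one simple way to inspect the result is to save it and open it in a browser (the filename below is illustrative):

# save the generated source so it can be opened in a browser
with open('generated_webpage.html', 'w', encoding='utf-8') as f:
    f.write(response)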

Resume to Webpage
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

## the input should be a resume in markdown format
query = './examples/resume.md'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.resume_2_webpage(query, seed=202, repetition_penalty=3.0)
print(response)

See the Resume to Webpage results here.

Screenshot to Webpage
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Generate the HTML code of this web image with Tailwind CSS.'
image = ['./examples/screenshot.jpg']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.screen_2_webpage(query, image, seed=202, repetition_penalty=3.0)
print(response)

See the Screenshot to Webpage results here.

Write Article
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = '阅读下面的材料,根据要求写作。 电影《长安三万里》的出现让人感慨,影片并未将重点全落在大唐风华上,也展现了恢弘气象的阴暗面,即旧门阀的资源垄断、朝政的日益衰败与青年才俊的壮志难酬。高适仕进无门,只能回乡沉潜修行。李白虽得玉真公主举荐,擢入翰林,但他只是成为唐玄宗的御用文人,不能真正实现有益于朝政的志意。然而,片中高潮部分《将进酒》一节,人至中年、挂着肚腩的李白引众人乘仙鹤上天,一路从水面、瀑布飞升至银河进入仙宫,李白狂奔着与仙人们碰杯,最后大家纵身飞向漩涡般的九重天。肉身的微贱、世路的坎坷,拘不住精神的高蹈。“天生我材必有用,千金散尽还复来。” 古往今来,身处闲顿、遭受挫折、被病痛折磨,很多人都曾经历了人生的“失意”,却反而成就了他们“诗意”的人生。对正在追求人生价值的当代青年来说,如何对待人生中的缺憾和困顿?诗意人生中又有怎样的自我坚守和自我认同?请结合“失意”与“诗意”这两个关键词写一篇文章。 要求:选准角度,确定立意,明确文体,自拟标题;不要套作,不得抄袭;不得泄露个人信息;不少于 800 字。'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.write_artical(query, seed=8192)
print(response)
#诗意人生,贵在坚守
#《菜根谭》有云:“闲时要有吃紧的心思,忙里要留吃闲工夫。”人生在世,总有失意之时,当面对缺憾和困顿,诗意地生活着才能为人生增添一抹亮色。何谓诗意地生活? 所谓诗意地生活,便是在于坚守本心、直面遗憾、超越自我,在失意中寻找人生价值。
#诗意地生活,需坚守本心,淡然处之。
#陶渊明曾执意辞去彭泽县令,归隐田园,“采菊东篱下,悠然见南山”,在山水间寄情自娱;王维面对仕途失意,终日沉醉于诗酒之中,“兴来每独往,胜事空自知”,在诗酒中闲逸自如;李白仕途不顺,被赐金放还,但他依旧豪气干云,“天生我才必有用,千金散尽还复来”,在失意中坦然豁达。坚守本心,便能在遭遇失意之时守住自己的精神家园,让生活充满诗意。反之,若不能坚守本心,而只是一味迎合世俗以求得升迁,那纵使身居高位,亦会丧失生活的乐趣。
#诗意地生活,需直面遗憾,超越自我。
#“西塞山前白鹭飞,桃花流水鳜鱼肥。青箬笠,绿柳枝,半斤酒,一纶丝。五湖四海皆如此,何妨到此处归。”白居易的《渔歌子》写出了多少人的愿望:没有权势纷扰,没有贫困凄凉,只有青山绿水、白鹭鸥鸟作伴,如此自由自在的生活令人神往。然而,白居易却并没有因此真的归隐山林,而是直面人生,超越自我,写下了一首首诗意而富有现实关怀的作品。如果白居易只顾逃避人生,那又怎会拥有“大弦嘈嘈如急雨,小弦切切如私语”的绝美比喻呢?如果白居易只顾归隐山林,那又怎会写出“此曲只应天上有,人间哪得配白居易”这样的诗句呢?
#诗意地生活,需直面遗憾,坚守本心。
#李文波患有渐冻症,医生说他活不过五年,但他没有因此放弃对音乐的热爱,而是与病魔作斗争,演奏出美妙的乐曲;孙家林自幼患有脑瘫,但他不甘于命运的捉弄,终成全国最美教师;史铁生饱受疾病折磨,但他仍能发出“我常常在我的心头清点,我有什么?”的叩问,并由此走上文学道路,为后世留下丰厚的文化遗产。这些人没有逃避,而是选择直面人生的缺憾,在坚守本心的同时超越自我,最终实现了自己的价值。
#诗意地生活,是于失意中坚守本心,于缺憾中超越自我。当面对人生的缺憾与挫折,坚守本心、超越自我的同时,也必将书写属于自己的辉煌篇章。
#愿你我都能诗意地生活着!

query = 'Please write a blog based on the title: French Pastries: A Sweet Indulgence'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.write_artical(query, seed=8192)
print(response)
#French Pastries: A Sweet Indulgence
#The French are well known for their love of pastries, and it’s a love that is passed down through generations. When one visits France, they are treated to an assortment of baked goods that can range from the delicate macaron to the rich and decadent chocolate mousse. While there are many delicious types of pastries found in France, five stand out as being the most iconic. Each of these pastries has its own unique qualities that make it special.
#1. Croissant
#One of the most famous pastries from France is the croissant. It is a buttery, flaky pastry that is best enjoyed fresh from the bakery. The dough is laminated with butter, giving it its signature layers. Croissants are typically eaten for breakfast or brunch, often accompanied by coffee or hot chocolate.
#2. Macaron
#The macaron is a small, delicate French confection made from almond flour, powdered sugar, and egg whites. The macaron itself is sandwiched with a ganache or jam filling. They come in a variety of colors and flavors, making them a popular choice for both casual snacking and upscale desserts.
#3. Madeleine
#The madeleine is a small shell-shaped cake that is light and sponge-like. It is often flavored with lemon or orange zest and sometimes dipped in chocolate. Madeleines are perfect for an afternoon snack with tea or coffee.
#4. Éclair
#The éclair is a long, thin pastry filled with cream and topped with chocolate glaze. It is a classic French treat that is both sweet and satisfying. Éclairs can be found in bakeries all over France and are often enjoyed with a cup of hot chocolate.
#5. Tarte Tatin
#The tarte Tatin is an apple tart that is known for its caramelized apples and puff pastry crust. It is named after the Tatin sisters who created the recipe in the late 19th century. Tarte Tatin is best served warm with a scoop of vanilla ice cream.
#These pastries are just a few of the many delicious treats that France has to offer. Whether you are a seasoned traveler or a first-time visitor, indulging in French pastries is a must-do activity. So go ahead, treat yourself—you deserve it!

Inference on Multiple GPUs

If you have multiple GPUs, but the memory of each GPU is not enough to hold the entire model, you can split the model across multiple GPUs. First, install accelerate: pip install accelerate. Then, run the following script for chat:

# chat with 2 GPUs
python example_code/example_chat.py --num_gpus 2
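If the provided script does not fit your setup, a minimal alternative sketch is to let Accelerate place the weights automatically. device_map='auto' is standard transformers/accelerate behavior, but whether it splits this trust_remote_code model cleanly is an assumption worth verifying against example_code/example_chat.py:

import torch
from transformers import AutoModel, AutoTokenizer

# shard the checkpoint across all visible GPUs (requires `pip install accelerate`)
model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2d5-7b',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map='auto',
).eval()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer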

Inference Acceleration by LMDeploy

Coming Soon

4-Bit Model

Coming Soon

Finetune

Please refer to our finetune scripts.

Gradio Deploy

We provide code for users to build a web UI demo.

Please run the command below for Chat / Composition:

# For Multimodal Chat
python gradio_demo/gradio_demo_chat.py

# For Free-form Text-Image Composition
python gradio_demo/gradio_demo_composition.py

The user guidance for the UI demo is given HERE. If you wish to change the default folder of the model, please use the --code_path=new_folder option.
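For example, pointing the chat demo at a local checkpoint folder (the path below is illustrative):

# run the chat demo with a custom model folder
python gradio_demo/gradio_demo_chat.py --code_path=/path/to/internlm-xcomposer2d5-7b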

Citation

If you find our models / code / papers useful in your research, please consider giving ⭐ and citations 📝, thx :)

@article{internlmxcomposer2_4khd,
      title={InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD},
      author={Xiaoyi Dong and Pan Zhang and Yuhang Zang and Yuhang Cao and Bin Wang and Linke Ouyang and Songyang Zhang and Haodong Duan and Wenwei Zhang and Yining Li and Hang Yan and Yang Gao and Zhe Chen and Xinyue Zhang and Wei Li and Jingwen Li and Wenhai Wang and Kai Chen and Conghui He and Xingcheng Zhang and Jifeng Dai and Yu Qiao and Dahua Lin and Jiaqi Wang},
      journal={arXiv preprint arXiv:2404.06512},
      year={2024}
}
@article{internlmxcomposer2,
      title={InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model},
      author={Xiaoyi Dong and Pan Zhang and Yuhang Zang and Yuhang Cao and Bin Wang and Linke Ouyang and Xilin Wei and Songyang Zhang and Haodong Duan and Maosong Cao and Wenwei Zhang and Yining Li and Hang Yan and Yang Gao and Xinyue Zhang and Wei Li and Jingwen Li and Kai Chen and Conghui He and Xingcheng Zhang and Yu Qiao and Dahua Lin and Jiaqi Wang},
      journal={arXiv preprint arXiv:2401.16420},
      year={2024}
}
@article{internlmxcomposer,
      title={InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition},
      author={Pan Zhang and Xiaoyi Dong and Bin Wang and Yuhang Cao and Chao Xu and Linke Ouyang and Zhiyuan Zhao and Shuangrui Ding and Songyang Zhang and Haodong Duan and Wenwei Zhang and Hang Yan and Xinyue Zhang and Wei Li and Jingwen Li and Kai Chen and Conghui He and Xingcheng Zhang and Yu Qiao and Dahua Lin and Jiaqi Wang},
      journal={arXiv preprint arXiv:2309.15112},
      year={2023}
}

License & Contact Us

The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English)/申请表(中文). For other questions or collaborations, please contact [email protected].


internlm-xcomposer's Issues

example_demo code contains internal information

Hi, you seem to have left internal paths at L51 and L52 in examples/web_demo.py:

 self.llm_model = AutoModel.from_pretrained('/mnt/petrelfs/share_data/dongxiaoyi/share_models/release_chat', trust_remote_code=True)
        tokenizer = AutoTokenizer.from_pretrained('/mnt/petrelfs/share_data/dongxiaoyi/share_models/release_chat', trust_remote_code=True)

You may want to fix them to avoid exposing these paths.

What is the difference between InternConvertedInternLMAttention and InternLMAttention?

InternLMAttention is used in huggingface: https://huggingface.co/internlm/internlm-chat-7b/blob/main/modeling_internlm.py#L257
InternConvertedInternLMAttention is used in this repo: https://github.com/InternLM/InternLM-XComposer/blob/main/huggingface/internlm-xcomposer/modeling_InternLM.py#L732

I set intern_converted_llm to false and found that the results were all wrong. What is the difference between InternConvertedInternLMAttention and InternLMAttention?

An example with Transformers to generate text + images

Hi,

I find the examples detailed for interacting with images with Transformers (VQA etc.) very interesting. However, how can we actually generate text + images (with the right history context) with HF transformers?

I cannot see an example of this nice feature of your work.

Thanks.

Minimum GPU memory to run example_chat.py

Hello, I am interested in your work and curious about the minimum total GPU memory required to run example_chat.py for testing. I tried it on my GPU, which has 8GB of memory: clearly not enough. Could you give me a rough range?

AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

The code

model = model.eval()
tokenizer = AutoTokenizer.from_pretrained(
    "../internlm-xcomposer-7b-4bit", trust_remote_code=True
)
model.model.tokenizer = tokenizer

raises the following error:

File "/home/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b-4bit/tokenization_InternLM_XComposer.py", line 94, in vocab_size
    return self.sp_model.get_piece_size()
AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

CUDA Out of Memory in Multi-GPU Inference

import torch
from transformers import AutoModel, AutoTokenizer
import argparse

def auto_configure_device_map(num_gpus):
    # visual_encoder counts as 4 layers
    # internlm_model.model.embed_tokens takes 1 layer
    # norm and lm_head take 1 layer
    # transformer.layers take 32 layers
    # 38 layers in total, distributed across num_gpus GPUs
    num_trans_layers = 32
    per_gpu_layers = 38 / num_gpus

    device_map = {
        'visual_encoder': 0,
        'ln_vision': 0,
        'Qformer': 0,
        'internlm_model.model.embed_tokens': 0,
        'internlm_model.model.norm': 0,
        'internlm_model.lm_head': 0,
        'query_tokens': 0,
        'flag_image_start': 0,
        'flag_image_end': 0,
        'internlm_proj.weight': 0,
        'internlm_proj.bias': 0,
    }

    # device_map = {key: 0 for key in device_map.keys()}
    
    used = 6
    gpu_target = 0
    for i in range(num_trans_layers):
        if used >= per_gpu_layers:
            gpu_target += 1
            used = 0
        assert gpu_target < num_gpus
        device_map[f'internlm_model.model.layers.{i}'] = gpu_target
        used += 1

    return device_map

torch.set_grad_enabled(False)

parser = argparse.ArgumentParser()
parser.add_argument("--num_gpus", default=4, type=int)
args = parser.parse_args()

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer-vl-7b', trust_remote_code=True, cache_dir='/storage/internLM/').cuda().eval()
if args.num_gpus > 1:
    from accelerate import dispatch_model
    device_map = auto_configure_device_map(args.num_gpus)
    model = dispatch_model(model, device_map=device_map)

tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer-vl-7b', trust_remote_code=True, cache_dir='/storage/internLM/')
model.tokenizer = tokenizer


# example image
image = 'examples/images/aiyinsitan.jpg'

# Single-Turn Pure-Text Dialogue
text = 'Please introduce Einstein.'
with torch.no_grad():
    with model.maybe_autocast():
        response = model.generate(text)
print(response)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)

Error when clicking the "Insert a fixed number of Images" button

Traceback (most recent call last):
File "C:\Python\Python310\lib\site-packages\gradio\queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "C:\Python\Python310\lib\site-packages\gradio\route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1437, in process_api
result = await self.call_function(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Python\Python310\lib\site-packages\gradio\utils.py", line 650, in wrapper
response = f(*args, **kwargs)
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 468, in adjust_img
caps = self.generate_loc_cap(idx_text_sections, int(img_num), progress)
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 177, in generate_loc_cap
inject_text, locs = self.generate_loc(text_sections, image_num,
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 132, in generate_loc
for _ in progress.tqdm([1], desc="image spotting"):
TypeError: Progress.tqdm() missing 1 required positional argument: 'iterable'

Running internlm-xcomposer-7b in VS Code fails to generate interleaved text-image results

User: Write a popular science article about “Unraveling the Mysteries of Black Holes: A Scientific Overview” with pictures and illustrations.
Bot: I'm sorry, but as an AI language model, I don't have the capability to create visual content such as pictures and illustrations. However, I can provide you with a text-based summary of the popular science article about "Unraveling the Mysteries of Black Holes: A Scientific Overview".

As shown above, I used the prompt mentioned in the paper but did not get the expected result. I don't know why.

No module named 'transformers_modules.internlm/internlm-xcomposer-7b'

I downloaded internlm-xcomposer-7b and put it under internlm/internlm-xcomposer-7b, but I get the following error:

PS E:\InternLM-XComposer> python .\examples\web_demo.py
Traceback (most recent call last):
File "E:\cnai\InternLM-XComposer\examples\web_demo.py", line 816, in
demo_ui = Demo_UI()
File "E:\cnai\InternLM-XComposer\examples\web_demo.py", line 47, in init
self.llm_model = AutoModel.from_pretrained(
File "C:\Python\Python310\lib\site-packages\transformers\models\auto\auto_factory.py", line 456, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "C:\Python\Python310\lib\site-packages\transformers\models\auto\configuration_auto.py", line 953, in from_pretrained
config_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
File "C:\Python\Python310\lib\site-packages\transformers\dynamic_module_utils.py", line 443, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "C:\Python\Python310\lib\site-packages\transformers\dynamic_module_utils.py", line 164, in get_class_in_module
module = importlib.import_module(module_path)
File "C:\Python\Python310\lib\importlib_init_.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.internlm/internlm-xcomposer-7b'

ModelScope urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>

import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer-7b')
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
root@autodl-container-9e2911833c-01d8deff:~/autodl-tmp# python download.py 
2023-10-10 21:52:08,079 - modelscope - INFO - PyTorch version 1.11.0+cu113 Found.
2023-10-10 21:52:08,081 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-10-10 21:52:08,119 - modelscope - INFO - Loading done! Current index file version is 1.9.2, with md5 1c9bf186d1e03088e5abfbd8664a1def and a total number of 941 components indexed
2023-10-10 21:52:08,686 - modelscope - WARNING - There is no version specified and there is no version in the model repository,use the master branch, which is fragile, please use it with caution!
2023-10-10 21:52:08,686 - modelscope - INFO - Model revision not specified, use revision: master
Init VIT ... Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 1354, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1298, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1247, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1007, in _send_output
    self.send(msg)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 947, in send
    self.connect()
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1421, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/root/miniconda3/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/root/miniconda3/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/root/miniconda3/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "download.py", line 8, in <module>
    model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
  File "/root/miniconda3/lib/python3.8/site-packages/modelscope/utils/hf_util.py", line 181, in from_pretrained
    module_obj = module_class.from_pretrained(model_dir, *model_args,
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 560, in from_pretrained
    return model_class.from_pretrained(
  File "/root/miniconda3/lib/python3.8/site-packages/modelscope/utils/hf_util.py", line 78, in from_pretrained
    return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3085, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_InternLM_XComposer.py", line 43, in __init__
    self.visual_encoder = create_eva_vit_g()
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_vit.py", line 522, in create_eva_vit_g
    cached_file = download_cached_file(url, check_hash=False, progress=True)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_utils.py", line 44, in download_cached_file
    timm_hub.download_cached_file(url, check_hash, progress)
  File "/root/miniconda3/lib/python3.8/site-packages/timm/models/_hub.py", line 85, in download_cached_file
    download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/hub.py", line 457, in download_url_to_file
    u = urlopen(req)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 1397, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 1357, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>
root@autodl-container-9e2911833c-01d8deff:~/autodl-tmp# 

Error when training the LoRA part

I want to do LoRA training: I froze the other parts and kept only lora_A and lora_B trainable, but backpropagation raises an error.

# Freeze the parameters of visual_encoder, ln_vision, Qformer and internlm_model
for param in model.visual_encoder.parameters():
    param.requires_grad = False

for param in model.ln_vision.parameters():
    param.requires_grad = False

for param in model.Qformer.parameters():
    param.requires_grad = False

for param in model.internlm_model.parameters():
    param.requires_grad = False

# Unfreeze the lora_A and lora_B parameters that need training
for name, param in model.named_parameters():
    if "lora_A" in name or "lora_B" in name:
        param.requires_grad = True

Training code:

input_ids = data['input_ids'].to(device, dtype=torch.long)
labels = data['labels'].to(device, dtype=torch.long)
attention_mask = data['attention_mask'].to(device, dtype=torch.long)
outputs = model.internlm_model(
    input_ids=input_ids,
    labels=labels,
    attention_mask=attention_mask,
)
loss = outputs.loss
# Backward pass to compute the gradients
loss.backward()

The error is as follows:
Traceback (most recent call last):
File "/data/xinyuuliu/InternLM-XComposer/train_model/train.py", line 190, in
main()
File "/data/xinyuuliu/InternLM-XComposer/train_model/train.py", line 175, in main
train(epoch, model, device, training_loader, optimizer, gradient_accumulation_steps,model_output_dir)
File "/data/xinyuuliu/InternLM-XComposer/train_model/train.py", line 55, in train
loss.backward()
File "/root/miniconda3/envs/internLM/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/root/miniconda3/envs/internLM/lib/python3.9/site-packages/torch/autograd/init.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/root/miniconda3/envs/internLM/lib/python3.9/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_InternLM.py", line 80, in backward
rotary_emb.apply_rotary(dq1, dq2, rearrange(cos[:seqlen], 's d -> s 1 d'),
NameError: name 'rotary_emb' is not defined

InternLM-XComposer-VL-7B: the Chinese ability of the model does not match the demo
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)
model_path = "internlm/internlm-xcomposer-vl-7b"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.tokenizer = tokenizer

image = "./image/aiyinsitan.jpg"
text = '请问这张图片里面的人是谁?并介绍下他。'
response = model.generate(text, image)
print(response)

response: albert einstein

I tried a lot of pictures, but the model's results are not satisfactory, and the responses are basically in English.

Why do I meet this problem when I use the model to generate?

../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [55,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [56,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [57,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [58,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [59,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [60,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [61,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [62,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.

Where might the problem be? Thanks!

Running web_demo.py: the model cannot be initialized

Traceback (most recent call last):
File "/home/batch/projects/InternLM-XComposer/examples/web_demo.py", line 816, in
demo_ui = Demo_UI()
File "/home/batch/projects/InternLM-XComposer/examples/web_demo.py", line 47, in init
self.llm_model = AutoModel.from_pretrained(
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 479, in from_pretrained
return model_class.from_pretrained(
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2675, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/home/batch/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-vl-7b/modeling_InternLM_XComposer.py", line 49, in init
self.Qformer, self.query_tokens = self.init_qformer(
File "/home/batch/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-vl-7b/modeling_InternLM_XComposer.py", line 122, in init_qformer
encoder_config = BertConfig.from_pretrained("bert-base-uncased")
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/configuration_utils.py", line 547, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/configuration_utils.py", line 629, in _get_config_dict
resolved_config_file = cached_file(
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/utils/hub.py", line 452, in cached_file
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

Error when running web_demo.py

I downloaded the latest code; the following error occurs when running it:
正在接受医生的检查,医生在为它量体温。', 10: '一只宠物狗在主人帮助下清理自己的粪便,主人在旁边指导。', 12: '一只宠物狗在主人帮助下处理自己的毛发,主人在旁边指导。'}
https://static.openxlab.org.cn/lingbi/jpg-images/61ec717e9ee8ffd984f79d01838de29e352b6aa9b9a04bb60e56a92f00fa72db.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/78668d1138f169a78284213bb2df7991cf77ff48bbdb6f938a3536d238ccbdc7.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/9db1b1ecdfc698526459d4bb519ebd1dfc3b9be9e4983ff8b862dd970257793a.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/2f3ce59b613d2b7b989a919819ef1aed67dcd56d6381b0ea2af68972c700c367.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/3e4914997caf27f88b1219070586a7aa9ce79c6afacf824a640396abab216230.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/41eb2c71b25921e4ccb423df6e402caf74d71bc981b9bd699a44bc2d66ec0524.jpg
download image with url
image downloaded
model_select_image
Traceback (most recent call last):
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/blocks.py", line 1437, in process_api
result = await self.call_function(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/blocks.py", line 1123, in call_function
prediction = await utils.async_iteration(iterator)
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 512, in async_iteration
return await iterator.anext()
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 505, in anext
return await anyio.to_thread.run_sync(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 488, in run_sync_iterator_async
return next(iterator)
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 638, in gen_wrapper
yield from f(*args, **kwargs)
File "/data/liwx/InternLM-XComposer-main/InternLM-XComposer-main/examples/web_demo.py", line 444, in generate_article
self.selected = self.model_select_image(output_text, caps,
File "/data/liwx/InternLM-XComposer-main/InternLM-XComposer-main/examples/web_demo.py", line 299, in model_select_image
pre_img.append(images[len(pre_img) + ans2idx[answer]].cpu())
KeyError: '<'

Can't it generate images by itself?

I'm calling the internlm-xcomposer-7b model and entered the following commands:

>>> text='请帮我画一张长城的照片'
>>> response, history = model.chat(text=text, image=None, history=None)
>>> print(response)
I'm sorry, but as a language model I do not have the ability to draw, so I cannot draw a picture of the Great Wall for you. However, if you like, I can provide some information about the Great Wall to help you better understand this great structure.
>>>

Can it only generate interleaved text-image articles?

Suggestion: make examples/web_demo.py more secure

Change the last lines in examples/web_demo.py to make them more secure; not everyone needs to expose the service to the public:

if __name__ == "__main__":
    demo.launch(share=True, server_name="0.0.0.0", server_port=11111)

to

    demo.launch(share=False, server_name="127.0.0.1", server_port=11111)

OCR support?

Is it possible to make it work with OCR capability?

support for multiple GPU inference

Hello, I am interested in your work and curious about how to run internlm-xcomposer-7b in an environment that only has 24GB GPUs. I am looking forward to a new version of the inference code that supports multi-GPU inference.

Thank you

Failed to load 4-bits weights from HuggingFace

Description

Unable to load the quantized weights (4 bits) from HuggingFace

Code

The code is a direct copy from the file examples/example_chat_4bit_en.py

import torch
from transformers import AutoModel, AutoTokenizer

import auto_gptq
from auto_gptq.modeling import BaseGPTQForCausalLM

auto_gptq.modeling._base.SUPPORTED_MODELS = ["InternLMXComposer"]

torch.set_grad_enabled(False)


class InternLMXComposerQForCausalLM(BaseGPTQForCausalLM):
    layers_block_name = "internlm_model.model.layers"
    outside_layer_modules = [
        "query_tokens",
        "flag_image_start",
        "flag_image_end",
        "visual_encoder",
        "Qformer",
        "internlm_model.model.embed_tokens",
        "internlm_model.model.norm",
        "internlm_proj",
        "internlm_model.lm_head",
    ]
    inside_layer_modules = [
        ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
        ["self_attn.o_proj"],
        ["mlp.gate_proj"],
        ["mlp.up_proj"],
        ["mlp.down_proj"],
    ]


# init model and tokenizer
model = InternLMXComposerQForCausalLM.from_quantized(
    "internlm/internlm-xcomposer-7b-4bit", trust_remote_code=True, device="cuda:0"
)
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained(
    "internlm/internlm-xcomposer-7b-4bit", trust_remote_code=True
)
model.model.tokenizer = tokenizer

# example image
image = "examples/images/aiyinsitan.jpg"

# Multi-Turn Text-Image Dialogue
# 1st turn
text = 'Describe this image in detail.'
image = "examples/images/aiyinsitan.jpg"
response, history = model.chat(text, image)
print(f"User: {text}")
print(f"Bot: {response}") 
# The image features a black and white portrait of Albert Einstein, the famous physicist and mathematician. 
# Einstein is seated in the center of the frame, looking directly at the camera with a serious expression on his face. 
# He is dressed in a suit, which adds a touch of professionalism to his appearance. 

Error

Traceback (most recent call last):
  File "/mnt/bd/dev-pierre-oreistein-st/sandbox/test_internlm_vl/test_internlm_vl_4bits", line 35, in <module>
    model = InternLMXComposerQForCausalLM.from_quantized(
  File "/home/pierre/.pyenv/versions/dev3.9/lib/python3.9/site-packages/auto_gptq/modeling/_base.py", line 847, in from_quantized
    raise FileNotFoundError(f"Could not find a model in {model_name_or_path} with a name in {', '.join(searched_files)}. Please specify the argument model_basename to use a custom file name.")
FileNotFoundError: Could not find a model in internlm/internlm-xcomposer-7b-4bit with a name in gptq_model-4bit-128g.safetensors, model.safetensors. Please specify the argument model_basename to use a custom file name.

Ideas

According to this similar issue, I need to specify the model file. However, I was unable to find it on HuggingFace. Could you help me with this?

Thanks in advance for your help!

The quantized model internlm-xcomposer-7b-4bit fails to run

import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer
import os

torch.set_grad_enabled(False)

# init model and tokenizer
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer-7b-4bit', revision = 'master')
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model.tokenizer = tokenizer
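
One likely cause (a guess, based on the HuggingFace 4-bit example earlier on this page): GPTQ-quantized weights are loaded through auto_gptq's from_quantized rather than plain AutoModel. A sketch reusing the InternLMXComposerQForCausalLM wrapper defined in that example, pointed at the ModelScope snapshot directory:

from modelscope import snapshot_download

# InternLMXComposerQForCausalLM is the BaseGPTQForCausalLM subclass from the
# 4-bit example above; from_quantized also accepts a local directory.
model_dir = snapshot_download(
    'Shanghai_AI_Laboratory/internlm-xcomposer-7b-4bit', revision='master'
)
model = InternLMXComposerQForCausalLM.from_quantized(
    model_dir, trust_remote_code=True, device="cuda:0"
)
model = model.eval()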


Images are not generated

(xcomposer) ➜  InternLM-XComposer git:(main) ✗ python examples/web_demo.py
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.
  warnings.warn(
Init VIT ... Done
Init Perceive Sampler ... Done
Init InternLM ... Done
Loading checkpoint sha
 load model done:  <class 'transformers_modules.internlm-xcomposer-7b.modeling_InternLM_XComposer.InternLMXComposerForCausalLM'>
/cpfs01/user/huwenxing/InternLM-XComposer/examples/web_demo.py:1009: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  chat_textbox = gr.Textbox(
Running on local URL:  http://0.0.0.0:11111
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/chatbot.py:161: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Chatbot(...)` instead of `return gr.Chatbot.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/textbox.py:163: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Textbox(...)` instead of `return gr.Textbox.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.
  warnings.warn(
init
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/markdown.py:92: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Markdown(...)` instead of `return gr.Markdown.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/gallery.py:143: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Gallery(...)` instead of `return gr.Gallery.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/textbox.py:163: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Textbox(...)` instead of `return gr.Textbox.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/helpers.py:818: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Textbox(...)` instead of `return gr.update(...)
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/asyncio/events.py:80: GradioUnusedKwargWarning: You have unused kwarg parameters in Button, please remove them: {'mode': 'static'}
  self._context.run(self._callback, *self._args)

Could not create share link. Missing file: /cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/frpc_linux_amd64_v0.2. 

Please check your internet connection. This can happen if your antivirus software blocks the download of this file. You can install manually by following these steps: 

1. Download this file: https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_linux_amd64
2. Rename the downloaded file to: frpc_linux_amd64_v0.2
3. Move the file to this location: /cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio
<object object at 0x7fb765662e10>
Dunhuang, located in northwestern Gansu Province at the western end of the Hexi Corridor, was a key transport hub and major trading town on the ancient Silk Road. It possesses a rich historical and cultural heritage, including famous sites such as the Mogao Caves, Mingsha Mountain and Crescent Spring, and the Yadan "Devil City". Dunhuang is also one of China's famous historical and cultural cities, with a deep cultural foundation and distinctive folk customs.

**1. The Mogao Caves**

The Mogao Caves, also known as the "Thousand Buddha Grottoes", are one of China's four great grottoes. First excavated under the Former Qin of the Sixteen Kingdoms period, they have a history of more than 1,600 years. They are the largest and richest surviving treasury of Buddhist art in the world and have been called the "Louvre of the East". The site contains 735 caves, with murals covering more than 45,000 square meters and over 5,000 painted Buddhist sculptures, making it one of the largest centers of Buddhist art in the world. Here, visitors can admire exquisite murals, sculptures, and musical performances and experience the depth of Buddhist culture.

**2. Mingsha Mountain and Crescent Spring**

Mingsha Mountain and Crescent Spring form a natural wonder in the desert about 40 kilometers northwest of Dunhuang. The terrain is flat, with rolling dunes stretching into a vast desert landscape. Crescent Spring lies quietly embedded in the sand; its water is crystal clear and shaped like a new moon, hence the name. At night, when the moon rises, crisp sounds echo around the spring like music from the heavens, delighting all who hear them.

**3. The Yadan "Devil City"**

The Yadan "Devil City" is a classic wind-eroded landform on the Gobi about 100 kilometers southwest of Dunhuang. The landscape is strange, presenting a desolate, mysterious, and eerie scene. Long exposure to wind, sun, and rain has worn the rock surfaces uneven, carving shapes of every kind: some resemble animals, some people, some buildings, leaving visitors marveling at nature's craftsmanship.

**4. Other Attractions**

Besides the Mogao Caves, Mingsha Mountain and Crescent Spring, and the Yadan "Devil City", Dunhuang offers many other sights worth visiting, such as Yumen Pass, Yangguan Pass, Suoyang City, and the ruins of the Han Great Wall. All of these have long histories and cultural value and attract many visitors.

**5. Local Specialties**

Dunhuang's local cuisine is also rich. The most famous dish is donkey-meat yellow noodles, a noodle dish with donkey meat as the main ingredient, fragrant and delicious and loved by locals and visitors alike. Other specialties, such as mutton paomo, braised lamb with flatbread, and whole roast lamb, are delicacies not to be missed.

**6. Travel Tips**

1. Dunhuang's climate is dry, with strong sunshine and intense ultraviolet light; visitors should take sun protection and bring sunscreen, a sun hat, and sunglasses.
2. Dunhuang sits at a relatively high altitude; rest well and avoid strenuous exercise to prevent altitude sickness.
3. Dunhuang has many scenic spots; plan the itinerary in advance and allocate time sensibly so as not to rush past the important sights.
4. While traveling in Dunhuang, protect the environment: do not litter or damage cultural relics, and be a considerate tourist.

In short, Dunhuang is a city with a long history, deep cultural heritage, and beautiful scenery, and it is well worth a visit. I hope this article helps you understand Dunhuang better and provides useful information for your trip.

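As an aside, the share-link failure at the end of the console log above is likely unrelated to the missing images; it only affects the public Gradio URL. The manual steps Gradio suggests can be scripted (a sketch using the exact paths from that log; the chmod is an assumption, since frp binaries normally need to be executable):

import os
import urllib.request

url = "https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_linux_amd64"
dest = ("/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/"
        "site-packages/gradio/frpc_linux_amd64_v0.2")
urllib.request.urlretrieve(url, dest)  # download and rename in one step
os.chmod(dest, 0o755)                  # assumption: the binary must be executable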

How can I install `rotary_emb`?

from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained('/root/autodl-tmp/models/internlm7bxc', trust_remote_code=True).cuda().eval()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/miniconda3/envs/llm_chat/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 550, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "/root/miniconda3/envs/llm_chat/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 497, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "/root/miniconda3/envs/llm_chat/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 199, in get_class_in_module
    module = importlib.import_module(module_path)
  File "/root/miniconda3/envs/llm_chat/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/root/.cache/huggingface/modules/transformers_modules/internlm7bxc/modeling_InternLM_XComposer.py", line 18, in <module>
    from .modeling_InternLM import *
  File "/root/.cache/huggingface/modules/transformers_modules/internlm7bxc/modeling_InternLM.py", line 5, in <module>
    import rotary_emb
ModuleNotFoundError: No module named 'rotary_emb'
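
For context: `rotary_emb` is not a standalone PyPI package; it is a CUDA extension that ships with the flash-attention repository. A minimal guard that points at the usual build steps (the build commands are an assumption about your setup; the repository URL is flash-attention's public home):

import importlib.util

# rotary_emb is built from flash-attention's csrc/rotary directory,
# not installed via `pip install rotary_emb`.
if importlib.util.find_spec("rotary_emb") is None:
    raise ImportError(
        "rotary_emb not found. Build it from the flash-attention repo:\n"
        "  git clone https://github.com/Dao-AILab/flash-attention\n"
        "  cd flash-attention/csrc/rotary && pip install ."
    )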

AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

After updating web_demo.py, this error occurred:

File "/home/enbo/.cache/huggingface/modules/transformers_modules/internlm-xcomposer/tokenization_InternLM_XComposer.py", line 106, in get_vocab
vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
File "/home/enbo/.cache/huggingface/modules/transformers_modules/internlm-xcomposer/tokenization_InternLM_XComposer.py", line 94, in vocab_size
return self.sp_model.get_piece_size()
AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'
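
A note on the likely cause (an inference from the traceback, not an official statement): newer transformers releases call get_vocab() inside PreTrainedTokenizer.__init__, before the remote tokenizer code has created self.sp_model. A minimal version check, assuming 4.34 is roughly where the init order changed:

import transformers

# 4.33.x predates the init-order change; the exact boundary is an assumption.
major, minor = (int(x) for x in transformers.__version__.split(".")[:2])
if (major, minor) >= (4, 34):
    print(
        f"transformers {transformers.__version__} may trigger the sp_model error; "
        "consider `pip install transformers==4.33.2`, or move the sp_model "
        "creation above the super().__init__() call in "
        "tokenization_InternLM_XComposer.py."
    )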

No response using model.chat

Hi, I'm using InternLM-XComposer to generate some data. I have tried your demo, and it works fine when I use model.generate().
But when I use model.chat(), the model only replies to the first call; subsequent calls are unresponsive and return an empty string.

I'm using:
torch==2.0.1
transformers==4.33.2

My hardware is a single 3090 with 24 GB of GPU memory, so I use the 4-bit quantized model; I tried your examples/example_chat_4bit.py and ran into this issue.

Is this caused by my environment, or by something else?
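
If it helps the diagnosis: in the 4-bit example earlier on this page, model.chat returns a history object along with the response. A sketch of threading it through later calls (the keyword names are an assumption based on that example; calling chat without the previous history would be consistent with the empty replies you describe):

# First turn: no prior history.
text = "Describe this image in detail."
image = "examples/images/aiyinsitan.jpg"
response, history = model.chat(text, image)
print(response)

# Later turns: pass the returned history back in (assumed keyword name).
text = "Why is he famous?"
response, history = model.chat(text, image=None, history=history)
print(response)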
