Comments (1)
Thank you for providing the detailed error log. The issue appears to be related to the text processing part of the ingestion process. The `yarn run ingest` command runs a script that includes a text splitting step, which might be where the punctuation conversion is happening.
The script uses `RecursiveCharacterTextSplitter` to split the text into chunks. If this splitter is not handling Japanese punctuation correctly, it could be converting it to Korean punctuation during the splitting process.
However, without access to the specific implementation of `RecursiveCharacterTextSplitter` and the rest of the codebase, it's hard to pinpoint the exact cause. It would help if you could share how `RecursiveCharacterTextSplitter` is configured and called in your setup, along with any other part of the code that handles text processing.
In the meantime, check the configuration of the text splitter to see whether a setting is causing this. If the splitter relies on a library or service for text processing, also check that library's documentation for known issues or settings that could cause this.
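One way to narrow this down is to compare the punctuation in the text before and after the splitting step. The sketch below is only an illustration: `punctuationSet` and `diffPunctuation` are hypothetical helper names, not part of the project or of LangChain, and it assumes you can get the source text and the resulting chunks as plain strings.

```typescript
// Hypothetical diagnostic helpers; not part of the project or of LangChain.

// Matches a single Unicode punctuation (P) or symbol (S) character.
// Built with the RegExp constructor so it compiles under older TS targets.
const PUNCTUATION = new RegExp("[\\p{P}\\p{S}]", "u");

// Collects the distinct punctuation/symbol characters found in `text`.
function punctuationSet(text: string): Set<string> {
  const found = new Set<string>();
  for (const ch of Array.from(text)) {
    if (PUNCTUATION.test(ch)) found.add(ch);
  }
  return found;
}

interface PunctuationDiff {
  lost: string[];       // present in the source, absent from every chunk
  introduced: string[]; // present in some chunk, absent from the source
}

// Compares the source text with the chunks produced from it.
// Sets are used instead of counts so chunk overlap does not skew the result.
function diffPunctuation(source: string, chunks: string[]): PunctuationDiff {
  const before = punctuationSet(source);
  const after = punctuationSet(chunks.join(""));
  return {
    lost: Array.from(before).filter((ch) => !after.has(ch)),
    introduced: Array.from(after).filter((ch) => !before.has(ch)),
  };
}
```

If both lists come back empty for the splitter's output, the splitter is probably not the step that changes the punctuation, and the PDF loading or embedding steps would be the next places to look.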
If you're unable to find a solution, consider a different text-splitting method that handles Japanese punctuation correctly. Many libraries and services handle text processing for different languages, so you may find one that works better for your needs.
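As a rough illustration of what such an alternative could look like, here is a minimal, dependency-free sketch that breaks text after Japanese sentence-ending punctuation and packs whole sentences into chunks. The function name `splitJapaneseText`, the `maxChars` parameter, and the chosen terminator set (。！？ and newline) are assumptions made for this example; it does not reproduce how `RecursiveCharacterTextSplitter` works and has no chunk overlap.

```typescript
// Hypothetical helper; not part of the project or of LangChain.
// Splits text after Japanese sentence terminators and groups the sentences
// into chunks of at most `maxChars` UTF-16 code units where possible.
function splitJapaneseText(text: string, maxChars: number): string[] {
  // Each match is either a run of text ending in one or more terminators,
  // or a trailing run with no terminator. No characters are dropped or altered.
  const sentences = text.match(/[^。！？\n]*[。！？\n]+|[^。！？\n]+/g) ?? [];

  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the limit.
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      chunks.push(current);
      current = "";
    }
    // A single sentence longer than maxChars is kept whole rather than cut.
    current += sentence;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Because the function only slices the input and never rewrites characters, joining the chunks should give back the original text, which makes it easy to confirm that punctuation survives this step.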
I hope this helps! If you have any more questions or need further clarification, please don't hesitate to ask.
from gpt4-pdf-chatbot-langchain.