Comments (8)
Just hardcode any path that is valid in WSL.
from gerev.
I will be happy to take this one
from gerev.
Great! When are you expecting to finish this?
from gerev.
I am trying to set up the dev environment right now, but I am running into issues in the following environment:
- OS: WSL (Windows 10)
- Nvidia: No
The following line is giving me the error below:
STORAGE_PATH = Path('/opt/storage/') if IS_IN_DOCKER else Path(f'/home/{os.getlogin()}/.gerev/storage/')
FileNotFoundError: [Errno 2] No such file or directory
After some research, I found that os.getlogin() is the culprit.
If you cannot provide any help with this issue, can you describe a proper environment setup that will be suitable for development?
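For anyone hitting the same error: os.getlogin() looks up the user on the controlling terminal, and the Python docs themselves suggest getpass.getuser() for most purposes. A minimal sketch of a workaround that avoids the call entirely is below. The helper name resolve_storage_path is hypothetical and not part of gerev, and using Path.home() is an assumption about what the project would accept (it resolves to /root rather than /home/root for the root user, for example).

```python
from pathlib import Path


def resolve_storage_path(is_in_docker: bool) -> Path:
    """Hypothetical helper: pick the storage directory without os.getlogin()."""
    if is_in_docker:
        # Same container path as in the original line.
        return Path('/opt/storage/')
    # Path.home() relies on $HOME or the user database, so it does not
    # need a controlling terminal the way os.getlogin() does.
    return Path.home() / '.gerev' / 'storage'


if __name__ == '__main__':
    print(resolve_storage_path(is_in_docker=False))
```

With something like this, STORAGE_PATH could be set to resolve_storage_path(IS_IN_DOCKER), which should also work under WSL.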
from gerev.
Currently, it is possible to parse the entire content of PDF files as text, but as is apparent from your parsers, the program needs it compiled into the following form:
Some title: related text
Some other title: related text
Am I right?
There is already a pull request that parses the entire PDF document as text.
If you have any enhancements or suggestions for that, I'll be more than willing to implement them.
Meanwhile, I am also researching how I can parse PDFs while keeping the hierarchical information intact.
from gerev.
Hey!
Just like I commented on that other PR, it should be pdf -> html,
and then we parse html -> text.
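To make the second step concrete, here is a rough sketch of html -> text that keeps the "Some title: related text" shape discussed earlier in the thread. It uses only the standard library's html.parser. The names SectionParser and html_to_sections are made up for illustration, and it assumes the pdf -> html step (done by whatever converter is chosen) emits headings as h1 to h6 tags, which may not hold for every converter.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionParser(HTMLParser):
    """Collects [title, text] pairs; every heading tag starts a new section."""

    def __init__(self):
        super().__init__()
        self.sections = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
            self.sections.append(["", ""])

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text:
            return
        if not self.sections:
            # Text that appears before any heading goes into an untitled section.
            self.sections.append(["", ""])
        index = 0 if self._in_heading else 1
        current = self.sections[-1]
        current[index] = f"{current[index]} {text}".strip()


def html_to_sections(html):
    """Return one 'title: text' string per section found in the HTML."""
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return [f"{title}: {text}" for title, text in parser.sections]


if __name__ == "__main__":
    sample = "<h1>Intro</h1><p>Hello <b>world</b></p><h2>Setup</h2><p>Run it.</p>"
    for line in html_to_sections(sample):
        print(line)
```

This flattens nested headings into a single level; keeping the full hierarchy would need the heading level tracked as well.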
from gerev.
@rishi003 let's chat on discord! I could guide you a little bit :)
from gerev.
Sure, shall we discuss it on the discuss thread?
from gerev.