danswer-ai / danswer
Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
Home Page: https://docs.danswer.dev/
License: Other
To enable SSO (single sign-on), we would like an option to direct the user to the login provider (i.e. Google or another OIDC provider) instead of showing a login button. First, that button provides little to no value, and second, it breaks single sign-on. Combined with #225 this would enable a re-login without the user noticing (in most cases).
From what I can tell, all this tool does is pull relevant chunks of text from, more often than not, proprietary data and send it to OpenAI's completion API to get a coherent reply.
Issue:
There's currently no way to delete specific indexed files without removing the entire database. As it stands, if there's an error or if a file becomes irrelevant, the only solution is to clear the whole database, which isn't ideal.
Proposed Solution:
The ability to delete individual indexed files. This would mean adding a delete option in the interface that removes the file from the system's database, the vector DB (Qdrant), and the search engine (Typesense).
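A minimal sketch of what a per-document delete could look like, assuming each indexed chunk's payload carries a `document_id` field (a hypothetical key name) and that the collection/index is named `danswer_index` as in the logs elsewhere on this page. The client objects are passed in, so the same logic can be wired to the real Qdrant and Typesense clients:

```python
def qdrant_delete_filter(document_id: str) -> dict:
    # Qdrant can delete points matching a payload filter; the "document_id"
    # payload key is an assumption about how chunks are tagged.
    return {"must": [{"key": "document_id", "match": {"value": document_id}}]}

def typesense_delete_params(document_id: str) -> dict:
    # Typesense supports bulk deletion via a filter_by query string.
    return {"filter_by": f"document_id:={document_id}"}

def delete_document(document_id: str, qdrant, typesense_docs) -> None:
    """Remove every chunk of one document from both stores.

    `qdrant` stands in for a Qdrant client, `typesense_docs` for a
    Typesense documents collection handle.
    """
    qdrant.delete(collection_name="danswer_index",
                  points_selector=qdrant_delete_filter(document_id))
    typesense_docs.delete(typesense_delete_params(document_id))
```

The relational-database row would also need to be removed in the same transaction-like flow, so a half-deleted document can't linger in one store.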
For companies working with the Atlassian tool suite, we need a Bitbucket/Stash connector similar to the GitHub connector.
Things seem to be progressing with the massive download and then I get this error.
https://share.getcloudapp.com/v1uWgO19
I'd love to give danswer a spin, and maybe adopt it at work. But I'd prefer to test with some neutral data before committing and connecting work stuff. It would be awesome if you had your own public instance fed with this GitHub repo and your public Slack (and whatever else compatible sources you might already have). This way, potential users and customers could try danswer out right on your homepage, cross-check with the sources (e.g. by joining Slack and comparing chats to search results), and be more easily convinced to jump right in.
Is this something you'd consider doing?
Chatwoot already connects to various channels, like WhatsApp, and has integrations with chatbots like Dialogflow and Rasa. It could be useful for automating responses on such channels by integrating with the Chatwoot API.
We use Dropbox Paper as our main store of internal documentation, so this would be super great. Thanks for your awesome work so far!
[root@VM-8-13-centos docker_compose]# docker compose -f docker-compose.dev.yml -p danswer-stack up -d --pull always --force-recreate
unknown shorthand flag: 'f' in -f
See 'docker --help'.
I've installed Danswer on localhost, added a connector, and tried searching "what is danswer" with AI search.
Expected output: it uses vector search over the GitHub repo README.md and answers, but unfortunately it doesn't.
The error is "GPT hurt itself in its confusion :(".
Of course, I've added a working OpenAI API key.
logs:
INFO: 100.43.95.255:57824 - "POST /stream-direct-qa HTTP/1.1" 200 OK
07/14/2023 04:55:05 AM search_backend.py 107 : Received QA query: What is danswer ?
07/14/2023 04:55:05 AM timing.py 29 : query_intent took 0.09182906150817871 seconds
07/14/2023 04:55:05 AM timing.py 29 : semantic_retrieval took 0.03994250297546387 seconds
07/14/2023 04:55:06 AM timing.py 29 : semantic_reranking took 0.40392088890075684 seconds
07/14/2023 04:55:06 AM semantic_search.py 86 : Top links from semantic search: https://docs.danswer.dev/introduction, https://docs.danswer.dev/introduction, https://glarity.app/, https://glarity.app/
07/14/2023 04:55:06 AM timing.py 29 : retrieve_ranked_documents took 0.4441695213317871 seconds
INFO: 100.43.95.255:58934 - "GET /users/me HTTP/1.1" 401 Unauthorized
INFO: 100.43.95.255:58940 - "GET /manage/connector HTTP/1.1" 401 Unauthorized
INFO: 100.43.95.255:58940 - "GET /auth/google/authorize HTTP/1.1" 200 OK
INFO: 100.43.95.255:58934 - "GET /users/me HTTP/1.1" 401 Unauthorized
INFO: 100.43.95.255:51906 - "POST /stream-direct-qa HTTP/1.1" 200 OK
07/14/2023 04:55:43 AM search_backend.py 107 : Received QA query: danswer
07/14/2023 04:55:43 AM timing.py 29 : query_intent took 0.12076759338378906 seconds
07/14/2023 04:55:44 AM timing.py 29 : retrieve_keyword_documents took 1.2382943630218506 seconds
INFO: 100.43.95.255:35708 - "GET /health HTTP/1.1" 200 OK
INFO: 100.43.95.255:35720 - "POST /stream-direct-qa HTTP/1.1" 200 OK
07/14/2023 04:57:04 AM search_backend.py 107 : Received QA query: What is danswer?
07/14/2023 04:57:04 AM timing.py 29 : query_intent took 0.11964130401611328 seconds
07/14/2023 04:57:04 AM timing.py 29 : semantic_retrieval took 0.04374074935913086 seconds
07/14/2023 04:57:04 AM timing.py 29 : semantic_reranking took 0.1724071502685547 seconds
07/14/2023 04:57:04 AM semantic_search.py 86 : Top links from semantic search: https://docs.danswer.dev/introduction, https://docs.danswer.dev/introduction, https://glarity.app/
07/14/2023 04:57:04 AM timing.py 29 : retrieve_ranked_documents took 0.21661090850830078 seconds
INFO: 100.43.95.255:56106 - "GET /health HTTP/1.1" 200 OK
INFO: 100.43.95.255:56122 - "GET /manage/admin/connector/indexing-status HTTP/1.1" 200 OK
INFO: 100.43.95.255:56144 - "GET /health HTTP/1.1" 200 OK
INFO: 100.43.95.255:56138 - "GET /manage/credential HTTP/1.1" 200 OK
INFO: 100.43.95.255:58996 - "GET /health HTTP/1.1" 200 OK
INFO: 100.43.95.255:59006 - "GET /health HTTP/1.1" 200 OK
INFO: 100.43.95.255:59032 - "GET /manage/credential HTTP/1.1" 200 OK
INFO: 100.43.95.255:59020 - "GET /manage/admin/connector/indexing-status HTTP/1.1" 200 OK
INFO: 100.43.95.255:33780 - "GET /health HTTP/1.1" 200 OK
INFO: 100.43.95.255:35170 - "GET /health HTTP/1.1" 200 OK
Issue: Not all enterprises use Slack.
Proposal: Write a bot which talks to the Rocket.Chat API and answers questions. Mostly a copy of the Slack bot with some changed API calls. Refactoring out common code would be a bonus.
I have thousands of files to be ingested, but currently it will not process any batch bigger than 1 MB.
How can I increase this limit? Or is there an API for ingesting files?
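Until the limit itself is configurable, one workaround is to split the upload client-side so each request stays under it. A sketch, assuming files arrive as `(name, bytes)` pairs and a 1 MB ceiling:

```python
def batch_files(files, max_batch_bytes=1_000_000):
    """Greedily group (name, payload) pairs so each batch stays under the
    size limit. A single file larger than the limit gets its own batch
    (it would need server-side handling either way)."""
    batches, current, size = [], [], 0
    for name, payload in files:
        n = len(payload)
        if current and size + n > max_batch_bytes:
            batches.append(current)
            current, size = [], 0
        current.append((name, payload))
        size += n
    if current:
        batches.append(current)
    return batches
```

Each resulting batch can then be submitted as its own upload request.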
It might be a good idea to add some metrics about the internals of Danswer to make the system observable. Since this is already Dockerized and has a foot in Kubernetes, Prometheus might be a good choice here.
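For illustration, Prometheus scrapes a plain-text exposition format, so even without pulling in a client library the backend could expose counters like indexed chunks or query latency. A tiny sketch of that format (the metric names are hypothetical examples, not existing Danswer metrics):

```python
def render_prometheus_metrics(metrics: dict) -> str:
    # Prometheus text exposition format: a "# TYPE" hint line followed by
    # one "name value" sample line per metric.
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

In practice the official `prometheus_client` library would do this (plus histograms and labels) with less ceremony; the sketch just shows what a `/metrics` endpoint would serve.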
The latest main code isn't building cleanly for me, even after removing all volumes & containers. Not sure if it's just a me problem.
=> CACHED [web_server builder 4/4] RUN npm run build 0.0s
=> CANCELED [web_server runner 4/6] COPY --from=builder /app/public ./public 0.0s
=> ERROR [web_server runner 5/6] COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./ 0.0s
=> ERROR [web_server runner 6/6] COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static 0.0s
------
> [web_server runner 5/6] COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./:
------
------
> [web_server runner 6/6] COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static:
------
failed to solve: failed to compute cache key: failed to calculate checksum of ref 48a78c11-0415-4a0b-84fe-5522179bfa68::vvvkeyv4m03y3pjgwjywf5yn8: "/app/.next/static": not found
Fixed it by adding the below line to web/Dockerfile
COPY --from=builder --chown=nextjs:nodejs /app/.next ./.next
I'd love to see the addition of a Connector for the Paperless-ngx open-source document management system.
When the frontend receives a 401 from the backend, it should redirect the user to login instead of showing an (unrelated) error such as "no results found". This is usually caused by expiry of the user token (in the case of OIDC/OAuth).
Please set up GitHub Actions to build Docker images so we don't have to build them for each update.
Thanks
EDIT: Would love an AIO linuxserver.io-like image as well.
Would be great to have a GPT4All model option. I'm sure it'll get there, but opening this for tracking :)
It would be nice to be able to give the Google Drive connector a specific folder (whether under My Drive or a Shared Drive), as I'm sure there are certain folders in most cases that should not be indexed.
I started to watch the /var/log/update.log
on danswer/danswer-background
and noticed the following exceptions raised:
07/15/2023 04:23:59 PM update.py 95 : Starting new indexing attempt for connector: 'GoogleDriveConnector', with config: '{}', and with credentials: '[6]'
07/15/2023 04:24:00 PM connector_auth.py 49 : Refreshed Google Drive tokens.
07/15/2023 04:24:01 PM connector.py 75 : Parseable Documents in batch: ['2023 Consolidated SS + Flick - Financial Model', 'XX Cash Flow 2023', '351618_UASD3PGZ (3).pdf', '351618_UASD3PGZ (2).pdf', '351618_UASD3PGZ (1).pdf', '351618_UASD3PGZ.pdf', 'XX_SPL001_waybill_UASD3PGZ_A5 (7).pdf', 'XX_SPL001_waybill_UASD3PGZ_A5 (6).pdf', 'XX_SPL001_waybill_UASD3PGZ_A5 (5).pdf', 'XX_SPL001_waybill_UASD3PGZ_A5 (4).pdf', 'XX_SPL001_waybill_UASD3PGZ_A5 (3).pdf']
07/15/2023 04:24:17 PM store.py 159 : Indexed 13 chunks into Typesense collection 'danswer_index', number failed: 0
07/15/2023 04:24:22 PM timing.py 29 : encode_chunks took 5.2996666431427 seconds
07/15/2023 04:24:22 PM indexing.py 167 : Indexed 13 chunks into Qdrant collection 'danswer_index', status: UpdateStatus.COMPLETED
07/15/2023 04:24:22 PM indexing_pipeline.py 44 : Indexed 0 new documents
07/15/2023 04:24:23 PM connector.py 75 : Parseable Documents in batch: ['XX_SPL001_waybill_UASD3PGZ_A5 (2).pdf', 'XX_SPL001_waybill_UASD3PGZ_A5 (1).pdf', 'XX_SPL001_waybill_UASD3PGZ_A5.pdf', 'Order_351619_waybill (16).pdf', 'Order_351619_waybill (15).pdf', 'Order_351619_waybill (14).pdf', 'Order_351619_waybill (13).pdf', 'Order_351619_waybill (12).pdf', 'Order_351619_waybill (11).pdf', 'Order_351619_waybill (10).pdf', 'Order_351619_waybill (9).pdf', 'Order_351619_waybill (8).pdf']
07/15/2023 04:24:28 PM update.py 176 : Indexing job with id 96 failed due to EOF marker not found
Traceback (most recent call last):
File "/app/danswer/background/update.py", line 155, in run_indexing_jobs
for doc_batch in doc_batch_generator:
File "/app/danswer/connectors/google_drive/connector.py", line 165, in poll_source
yield from self._fetch_docs_from_drive(start, end)
File "/app/danswer/connectors/google_drive/connector.py", line 144, in _fetch_docs_from_drive
text_contents = extract_text(file, service)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/danswer/connectors/google_drive/connector.py", line 101, in extract_text
pdf_reader = PdfReader(pdf_stream)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/PyPDF2/_reader.py", line 319, in __init__
self.read(stream)
File "/usr/local/lib/python3.11/site-packages/PyPDF2/_reader.py", line 1415, in read
self._find_eof_marker(stream)
File "/usr/local/lib/python3.11/site-packages/PyPDF2/_reader.py", line 1471, in _find_eof_marker
raise PdfReadError("EOF marker not found")
PyPDF2.errors.PdfReadError: EOF marker not found
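PyPDF2 raises "EOF marker not found" when the trailing `%%EOF` token is missing from the file's tail, which is typical of truncated or partially synced Drive downloads. One way to keep a single bad PDF from failing the whole indexing job is a cheap pre-check before handing bytes to `PdfReader` (a sketch; the 1024-byte tail window is an assumption mirroring PyPDF2's own backwards scan):

```python
def looks_like_complete_pdf(data: bytes, tail_bytes: int = 1024) -> bool:
    """Heuristic pre-check: a well-formed PDF starts with "%PDF-" and ends
    with an "%%EOF" marker near the tail. Files failing this check can be
    logged and skipped instead of aborting the indexing attempt."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-tail_bytes:]
```

Equivalently, wrapping the `PdfReader(...)` call in `try/except PdfReadError` and continuing with the next file would achieve the same resilience without the heuristic.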
I need to use a proxy for the OpenAI API key to validate.
I modified docker-compose.dev.yml as follows:
environment:
- http_proxy=http://XXXX
- https_proxy=http://XXXX
The above was applied to every image.
However, while I can open the web UI, the backend still reports an error.
How can I set an internet proxy so the OpenAI API key validates correctly?
The Slack API doesn't like 'is_archived' conversations and we should filter them out.
https://api.slack.com/methods/conversations.join
Slack API doc:
Common error response
Typical error response if the conversation is archived and cannot be joined
{
"ok": false,
"error": "is_archived"
}
07/18/2023 03:52:12 PM connector.py 173 : Pulled 22300 documents from slack channel alerts-trials
07/18/2023 03:52:14 PM update.py 176 : Indexing job with id 240 failed due to The request to the Slack API failed. (url: https://www.slack.com/api/conversations.join)
The server responded with: {'ok': False, 'error': 'is_archived'}
Traceback (most recent call last):
File "/app/danswer/background/update.py", line 155, in run_indexing_jobs
for doc_batch in doc_batch_generator:
File "/app/danswer/connectors/slack/connector.py", line 293, in poll_source
for document in get_all_docs(
File "/app/danswer/connectors/slack/connector.py", line 150, in get_all_docs
for message_batch in channel_message_batches:
File "/app/danswer/connectors/slack/connector.py", line 64, in get_channel_messages
client.conversations_join(
File "/usr/local/lib/python3.11/site-packages/slack_sdk/web/client.py", line 2453, in conversations_join
return self.api_call("conversations.join", params=kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/slack_sdk/web/base_client.py", line 156, in api_call
return self._sync_send(api_url=api_url, req_args=req_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/slack_sdk/web/base_client.py", line 187, in _sync_send
return self._urllib_api_call(
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/slack_sdk/web/base_client.py", line 317, in _urllib_api_call
).validate()
^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/slack_sdk/web/slack_response.py", line 199, in validate
raise e.SlackApiError(message=msg, response=self)
slack_sdk.errors.SlackApiError: The request to the Slack API failed. (url: https://www.slack.com/api/conversations.join)
The server responded with: {'ok': False, 'error': 'is_archived'}
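Since `conversations.list` already returns an `is_archived` flag on each channel object, the simplest fix is to drop archived channels before ever calling `conversations.join`. A sketch operating on the channel dicts the Slack SDK returns:

```python
def joinable_channels(channels):
    """Filter out archived conversations: the Slack API rejects
    conversations.join on them with {"ok": false, "error": "is_archived"},
    which currently fails the whole indexing job."""
    return [c for c in channels if not c.get("is_archived")]
```

As a belt-and-braces measure, the `conversations.join` call could additionally catch `SlackApiError` and skip the channel when the error field is `is_archived`, in case a channel gets archived between the list and join calls.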
Does it support private deployment in an offline environment?
I could really use a Zulip connector for this. I wonder if anyone else would like one?
I'd like to disable telemetry for Qdrant:
# Very basic .env file with options that are easy to change. Allows you to deploy everything on a single machine.
# .env is not required unless you wish to change defaults
# Choose between "openai-chat-completion" and "openai-completion"
INTERNAL_MODEL_VERSION=openai-chat-completion
# Use a valid model for the choice above, consult https://platform.openai.com/docs/models/model-endpoint-compatibility
OPENAPI_MODEL_VERSION=gpt-3.5-turbo
# Enable or disable telemetry
QDRANT__TELEMETRY_DISABLED=true
However docker-compose logs still show telemetry being sent:
danswer-stack-vector_db-1 | [2023-06-29T08:46:31.439Z INFO storage::content_manager::consensus::persistent] Loading raft state from ./storage/raft_state
danswer-stack-vector_db-1 | [2023-06-29T08:46:31.444Z INFO qdrant] Distributed mode disabled
danswer-stack-vector_db-1 | [2023-06-29T08:46:31.444Z INFO qdrant] Telemetry reporting enabled, id: 8f118170-18b1-4ed0-b5ae-d4d4d5438b29
danswer-stack-vector_db-1 | [2023-06-29T08:46:31.444Z INFO qdrant::tonic] Qdrant gRPC listening on 6334
danswer-stack-vector_db-1 | [2023-06-29T08:46:31.444Z INFO actix_server::builder] Starting 23 workers
danswer-stack-vector_db-1 | [2023-06-29T08:46:31.444Z INFO actix_server::server] Actix runtime found;
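One likely cause, worth checking: `QDRANT__TELEMETRY_DISABLED` only takes effect if it lands in the Qdrant container's own environment; a `.env` consumed by the backend services won't reach it. A compose fragment showing the guessed wiring (the `vector_db` service name follows the logs above):

```yaml
# docker-compose.dev.yml (fragment, assumed layout): the variable must be set
# on the Qdrant service itself, not only in the stack-level .env file.
vector_db:
  image: qdrant/qdrant
  environment:
    - QDRANT__TELEMETRY_DISABLED=true
```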
Docker Compose version v2.18.1
Client: Docker Engine - Community
Version: 24.0.2
API version: 1.43
Go version: go1.20.4
Git commit: cb74dfc
Built: Thu May 25 21:52:14 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 24.0.2
API version: 1.43 (minimum version 1.12)
Go version: go1.20.4
Git commit: 659604f
Built: Thu May 25 21:52:14 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.21
GitCommit: 3dce8eb055cbb6872793272b4f20ed16117344f8
runc:
Version: 1.1.7
GitCommit: v1.1.7-0-g860f061
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Issue: DevOps organisations do not like to click stuff manually. Danswer partially relies on an admin UI to configure connectors.
Proposal: Implement a way to import configuration. This could be implemented as init container for Kubernetes workloads which uses the API. Another way would be to import configuration (including secrets) on startup.
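To make the idea concrete, an init job could replay a declarative config against the management API. The access logs elsewhere on this page show `/manage/connector` and `/manage/credential` endpoints exist; the exact create route and payload shape below are assumptions based on the admin UI fields, not documented API:

```python
import json
import urllib.request

def connector_payload(name: str, source: str, config: dict,
                      refresh_freq: int = 600) -> dict:
    # Hypothetical body shape mirroring the admin UI's connector form.
    return {
        "name": name,
        "source": source,
        "connector_specific_config": config,
        "refresh_freq": refresh_freq,
    }

def create_connector(base_url: str, payload: dict):
    # POST to the connector management endpoint (route is an assumption).
    req = urllib.request.Request(
        base_url + "/manage/admin/connector",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)
```

A Kubernetes init container would iterate over a checked-in YAML/JSON list of such payloads on startup, making the whole connector setup reproducible.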
As I'm testing the stack, I saw that everyone seems to be an admin. As I played around, I found something called user connectors, but there seems to be no way to add users (or connectors). Am I missing something in the documentation as well?
Looks like there is no arm64 build of GPT4All after 0.1.7. When I downgrade to 0.1.7 it builds, but it's probably going to generate a few issues here, so here's an issue for tracking.
Solution for now: downgrade gpt4all in backend/requirements/default.txt to 0.1.7.
A further issue causes the stack not to start when using 0.1.7; perhaps we could switch off GPT4All in backend/danswer/configs/app_configs.py?
danswer-stack-api_server-1 | Traceback (most recent call last):
danswer-stack-api_server-1 | File "/usr/local/bin/uvicorn", line 8, in <module>
danswer-stack-api_server-1 | sys.exit(main())
danswer-stack-api_server-1 | ^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
danswer-stack-api_server-1 | return self.main(*args, **kwargs)
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
danswer-stack-api_server-1 | rv = self.invoke(ctx)
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
danswer-stack-api_server-1 | return ctx.invoke(self.callback, **ctx.params)
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
danswer-stack-api_server-1 | return __callback(*args, **kwargs)
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/uvicorn/main.py", line 403, in main
danswer-stack-api_server-1 | run(
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/uvicorn/main.py", line 568, in run
danswer-stack-api_server-1 | server.run()
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/uvicorn/server.py", line 59, in run
danswer-stack-api_server-1 | return asyncio.run(self.serve(sockets=sockets))
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
danswer-stack-api_server-1 | return runner.run(main)
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
danswer-stack-api_server-1 | return self._loop.run_until_complete(task)
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
danswer-stack-api_server-1 | return future.result()
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/uvicorn/server.py", line 66, in serve
danswer-stack-api_server-1 | config.load()
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/uvicorn/config.py", line 471, in load
danswer-stack-api_server-1 | self.loaded_app = import_from_string(self.app)
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/uvicorn/importer.py", line 21, in import_from_string
danswer-stack-api_server-1 | module = importlib.import_module(module_str)
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
danswer-stack-api_server-1 | return _bootstrap._gcd_import(name[level:], package, level)
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
danswer-stack-api_server-1 | File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
danswer-stack-api_server-1 | File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
danswer-stack-api_server-1 | File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
danswer-stack-api_server-1 | File "<frozen importlib._bootstrap_external>", line 940, in exec_module
danswer-stack-api_server-1 | File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
danswer-stack-api_server-1 | File "/app/danswer/main.py", line 26, in <module>
danswer-stack-api_server-1 | from danswer.direct_qa import get_default_backend_qa_model
danswer-stack-api_server-1 | File "/app/danswer/direct_qa/__init__.py", line 6, in <module>
danswer-stack-api_server-1 | from danswer.direct_qa.gpt_4_all import GPT4AllChatCompletionQA
danswer-stack-api_server-1 | File "/app/danswer/direct_qa/gpt_4_all.py", line 18, in <module>
danswer-stack-api_server-1 | from gpt4all import GPT4All # type:ignore
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/gpt4all/__init__.py", line 1, in <module>
danswer-stack-api_server-1 | from . import gpt4all # noqa
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/gpt4all/gpt4all.py", line 6, in <module>
danswer-stack-api_server-1 | from . import pyllmodel
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/gpt4all/pyllmodel.py", line 39, in <module>
danswer-stack-api_server-1 | llmodel, llama = load_llmodel_library()
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/site-packages/gpt4all/pyllmodel.py", line 32, in load_llmodel_library
danswer-stack-api_server-1 | llama_lib = ctypes.CDLL(llama_dir, mode=ctypes.RTLD_GLOBAL)
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | File "/usr/local/lib/python3.11/ctypes/__init__.py", line 376, in __init__
danswer-stack-api_server-1 | self._handle = _dlopen(self._name, mode)
danswer-stack-api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1 | OSError: /usr/local/lib/python3.11/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libllama.so: cannot open shared object file: No such file or directory
danswer-stack-api_server-1 exited with code 0
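One way to "switch off" GPT4All without a config flag is to make the import itself optional, so the stack still starts on platforms where the native library is missing. A sketch (the model name is only an example):

```python
# Guarded optional import: on platforms without a prebuilt gpt4all wheel
# (e.g. arm64), loading the native library fails, as in the traceback above.
GPT4ALL_AVAILABLE = True
try:
    from gpt4all import GPT4All  # type: ignore
except (ImportError, OSError):  # OSError covers the missing libllama.so case
    GPT4ALL_AVAILABLE = False

def get_local_model():
    if not GPT4ALL_AVAILABLE:
        return None  # callers fall back to the OpenAI-backed QA models
    return GPT4All("ggml-gpt4all-j-v1.3-groovy")  # model name is an example
```

Callers would then check for `None` and pick the OpenAI path, instead of the whole API server crashing at import time.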
For companies that are on Microsoft, you really need a connector for Teams like the one for Slack.
Something similar to: https://github.com/emptycrown/llama-hub/blob/main/llama_hub/airtable/base.py
Notion gained tons of popularity as a team wiki tool. Would be great to have a Notion connector, a la Notion Loader.
Please add an "advanced settings" section on the document upload/ingestion page exposing the chunking options before tokenizing.
The most important among them:
Chunk size (in characters): New models with better indexing capabilities are appearing, and it's very possible we'll get an upgrade on ada-002 with a higher token limit.
Chunk overlap: Having a small overlapping text between the end of one chunk and the start of the next improves vector DB search results.
Metadata edit: columns per chunk: chunk number, document page, document name and, if possible, an optional field to add some additional info (an alternative document address, for example).
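For reference, the size/overlap knobs requested above could be sketched as a character-based chunker (a simplified illustration, not Danswer's actual chunking code, which is token-aware):

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list:
    """Split text into chunks of `chunk_size` characters, with `overlap`
    characters shared between consecutive chunks; each chunk records its
    chunk number so metadata columns can be attached later."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for i, start in enumerate(range(0, max(len(text), 1), step)):
        piece = text[start:start + chunk_size]
        if not piece:
            break
        chunks.append({"chunk_number": i, "text": piece})
        if start + chunk_size >= len(text):
            break
    return chunks
```

Exposing `chunk_size` and `overlap` in the UI would then just mean passing user-supplied values into a function like this.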
Stack Trace:
failed to solve: process "/bin/sh -c pip install --no-cache-dir --upgrade -r /tmp/requirements.txt" did not complete successfully: exit code: 2
Hi,
Perhaps you could put together a Developer Guide for contributing connectors and add it to https://docs.danswer.dev/connectors/overview?
I just opened a couple of connector issues myself and I see there were a couple of other ones before me. If it's not too difficult this is where the community and open source could really shine.
I think this approach has worked really well for projects like Ruff for example.
All the best!
Similar to #137
Are there any plans to allow contributions of new Connectors, or some other way of contributing to Connector availability?
I can't find this information in the READMEs. An OpenAI API key is required for Danswer to be able to generate answers, which begs the question: which data from connected services is shared with OpenAI?
The quickstart instructions recommend two different approaches:
docker compose -f docker-compose.dev.yml -p danswer-stack up -d --pull always --force-recreate
docker compose -f docker-compose.dev.yml -p danswer-stack up -d --build --force-recreate
These work almost all the way, but get stuck while trying to start danswer-stack-relational_db-1
=> CACHED [web_server runner 5/6] COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./ 0.0s
=> CACHED [web_server runner 6/6] COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static 0.0s
=> [web_server] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:1bf5ef8a1d68f13ee6d7431887c9e38634e0db6ed2a0b0a7df69f9f47c9c63bc 0.0s
=> => naming to docker.io/danswer/danswer-web-server:latest 0.0s
[+] Running 6/7
⠿ Container danswer-stack-relational_db-1 Starting 20.6s
✔ Container danswer-stack-search_engine-1 Started 10.8s
✔ Container danswer-stack-vector_db-1 Started 10.7s
✔ Container danswer-stack-background-1 Recreated 0.1s
✔ Container danswer-stack-api_server-1 Recreated 0.2s
✔ Container danswer-stack-web_server-1 Recreated 0.1s
✔ Container danswer-stack-nginx-1 Recreated 0.1s
There is no printed error, but the container never starts correctly.
Docker version:
Docker version 24.0.2, build cb74dfc
Supporting only .txt files is too limiting now; could PDFs be supported?
Similar to https://docs.danswer.dev/connectors/github
I see this is already on your roadmap on https://docs.danswer.dev/connectors/overview so just putting this here for tracking.
I'm doing a lot of tech support by email and I have a lot of recurrent questions. Is it possible to integrate Gmail as well?
Thanks!
Currently, users have to wait to see search results until the AI has returned something (or timed out), which might take a few seconds.
Proposal: Show the results of semantic/keyword search as soon as they become available; show the AI results later.
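The idea maps naturally onto a streaming response: yield the (fast) retrieval hits as the first event and the (slow) model answer as the second. A minimal sketch, with `retrieve` and `answer` as stand-ins for the real retrieval and LLM calls:

```python
def stream_search(query, retrieve, answer):
    """Yield search hits immediately, then the AI answer last, so the
    frontend can render documents while the model is still generating."""
    hits = retrieve(query)
    yield {"type": "documents", "hits": hits}
    yield {"type": "answer", "text": answer(query, hits)}
```

Since `/stream-direct-qa` already streams (per the logs above), the change may mostly be about event ordering: emit the document event before the model call rather than after.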