Comments (9)

eyurtsev commented on May 10, 2024

What environment are you running the code under?

from kor.

natea commented on May 10, 2024

After running pip install kor, it still complained that the following packages were missing: selenium, unstructured, and markdownify.

So I made a requirements.txt file with the following contents and installed it using pip install -r requirements.txt:

kor
selenium
unstructured
markdownify
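
A quick way to confirm which of these packages are still missing from the active environment is to ask Python whether each module can be found. This is only a minimal sketch: `missing_modules` is a hypothetical helper (not part of kor), and it assumes each package's import name matches the name listed in requirements.txt.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names taken from the requirements.txt above.
    required = ["kor", "selenium", "unstructured", "markdownify"]
    missing = missing_modules(required)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Running this with the same interpreter that runs the example script (here, the pyenv-managed python) helps rule out a mismatch between where pip installed the packages and where the script looks for them.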

Here's the result of that install:

(kor) nateaune@Nates-MBP kor % pip install -r requirements.txt
Collecting kor (from -r requirements.txt (line 1))
  Obtaining dependency information for kor from https://files.pythonhosted.org/packages/ab/91/1b349269b587594461361c60acd62a90bd101ae7aaea709746603ee06326/kor-0.13.0-py3-none-any.whl.metadata
  Using cached kor-0.13.0-py3-none-any.whl.metadata (6.2 kB)
Collecting selenium (from -r requirements.txt (line 2))
  Obtaining dependency information for selenium from https://files.pythonhosted.org/packages/10/56/8288d1813a68c1e0638515dbb777fce6d87d0d240e683216f956145310e6/selenium-4.11.2-py3-none-any.whl.metadata
  Using cached selenium-4.11.2-py3-none-any.whl.metadata (7.0 kB)
Collecting unstructured (from -r requirements.txt (line 3))
  Obtaining dependency information for unstructured from https://files.pythonhosted.org/packages/a7/98/5ccd2b4003c6a38303832c6170bee1c3821202771121abe7af81c2adbe05/unstructured-0.9.2-py3-none-any.whl.metadata
  Downloading unstructured-0.9.2-py3-none-any.whl.metadata (23 kB)
Collecting markdownify (from -r requirements.txt (line 4))
  Using cached markdownify-0.11.6-py3-none-any.whl (16 kB)
Collecting langchain>=0.0.205 (from kor->-r requirements.txt (line 1))
  Obtaining dependency information for langchain>=0.0.205 from https://files.pythonhosted.org/packages/3d/3b/e1b71f46dd68182f781483ec6ec13db1afc359f93cac19dd0accbad536c1/langchain-0.0.262-py3-none-any.whl.metadata
  Downloading langchain-0.0.262-py3-none-any.whl.metadata (15 kB)
Collecting openai<0.28,>=0.27 (from kor->-r requirements.txt (line 1))
  Obtaining dependency information for openai<0.28,>=0.27 from https://files.pythonhosted.org/packages/67/78/7588a047e458cb8075a4089d721d7af5e143ff85a2388d4a28c530be0494/openai-0.27.8-py3-none-any.whl.metadata
  Using cached openai-0.27.8-py3-none-any.whl.metadata (13 kB)
Collecting pandas<2.0.0,>=1.5.3 (from kor->-r requirements.txt (line 1))
  Using cached pandas-1.5.3-cp310-cp310-macosx_10_9_x86_64.whl (12.0 MB)
Collecting urllib3[socks]<3,>=1.26 (from selenium->-r requirements.txt (line 2))
  Obtaining dependency information for urllib3[socks]<3,>=1.26 from https://files.pythonhosted.org/packages/9b/81/62fd61001fa4b9d0df6e31d47ff49cfa9de4af03adecf339c7bc30656b37/urllib3-2.0.4-py3-none-any.whl.metadata
  Downloading urllib3-2.0.4-py3-none-any.whl.metadata (6.6 kB)
Collecting trio~=0.17 (from selenium->-r requirements.txt (line 2))
  Obtaining dependency information for trio~=0.17 from https://files.pythonhosted.org/packages/a3/dd/b61fa61b186d3267ef3903048fbee29132963ae762fb70b08d4a3cd6f7aa/trio-0.22.2-py3-none-any.whl.metadata
  Using cached trio-0.22.2-py3-none-any.whl.metadata (4.7 kB)
Collecting trio-websocket~=0.9 (from selenium->-r requirements.txt (line 2))
  Obtaining dependency information for trio-websocket~=0.9 from https://files.pythonhosted.org/packages/a5/a6/06e2373f95c12e9e8f6b910a76c86e375348ead77ab476230640666310fb/trio_websocket-0.10.3-py3-none-any.whl.metadata
  Using cached trio_websocket-0.10.3-py3-none-any.whl.metadata (4.6 kB)
Collecting certifi>=2021.10.8 (from selenium->-r requirements.txt (line 2))
  Obtaining dependency information for certifi>=2021.10.8 from https://files.pythonhosted.org/packages/4c/dd/2234eab22353ffc7d94e8d13177aaa050113286e93e7b40eae01fbf7c3d9/certifi-2023.7.22-py3-none-any.whl.metadata
  Downloading certifi-2023.7.22-py3-none-any.whl.metadata (2.2 kB)
Collecting chardet (from unstructured->-r requirements.txt (line 3))
  Obtaining dependency information for chardet from https://files.pythonhosted.org/packages/38/6f/f5fbc992a329ee4e0f288c1fe0e2ad9485ed064cac731ed2fe47dcc38cbf/chardet-5.2.0-py3-none-any.whl.metadata
  Using cached chardet-5.2.0-py3-none-any.whl.metadata (3.4 kB)
Collecting filetype (from unstructured->-r requirements.txt (line 3))
  Using cached filetype-1.2.0-py2.py3-none-any.whl (19 kB)
Collecting python-magic (from unstructured->-r requirements.txt (line 3))
  Using cached python_magic-0.4.27-py2.py3-none-any.whl (13 kB)
Collecting lxml (from unstructured->-r requirements.txt (line 3))
  Obtaining dependency information for lxml from https://files.pythonhosted.org/packages/78/8d/96b95d704fab4a95651ceeb6022855ae5a3c631f86c6647749a2e868af92/lxml-4.9.3-cp310-cp310-macosx_11_0_x86_64.whl.metadata
  Using cached lxml-4.9.3-cp310-cp310-macosx_11_0_x86_64.whl.metadata (3.8 kB)
Collecting nltk (from unstructured->-r requirements.txt (line 3))
  Using cached nltk-3.8.1-py3-none-any.whl (1.5 MB)
Collecting tabulate (from unstructured->-r requirements.txt (line 3))
  Using cached tabulate-0.9.0-py3-none-any.whl (35 kB)
Collecting requests (from unstructured->-r requirements.txt (line 3))
  Obtaining dependency information for requests from https://files.pythonhosted.org/packages/70/8e/0e2d847013cb52cd35b38c009bb167a1a26b2ce6cd6965bf26b47bc0bf44/requests-2.31.0-py3-none-any.whl.metadata
  Downloading requests-2.31.0-py3-none-any.whl.metadata (4.6 kB)
Collecting beautifulsoup4<5,>=4.9 (from markdownify->-r requirements.txt (line 4))
  Using cached beautifulsoup4-4.12.2-py3-none-any.whl (142 kB)
Collecting six<2,>=1.15 (from markdownify->-r requirements.txt (line 4))
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting soupsieve>1.2 (from beautifulsoup4<5,>=4.9->markdownify->-r requirements.txt (line 4))
  Using cached soupsieve-2.4.1-py3-none-any.whl (36 kB)
Collecting PyYAML>=5.3 (from langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Obtaining dependency information for PyYAML>=5.3 from https://files.pythonhosted.org/packages/96/06/4beb652c0fe16834032e54f0956443d4cc797fe645527acee59e7deaa0a2/PyYAML-6.0.1-cp310-cp310-macosx_10_9_x86_64.whl.metadata
  Using cached PyYAML-6.0.1-cp310-cp310-macosx_10_9_x86_64.whl.metadata (2.1 kB)
Collecting SQLAlchemy<3,>=1.4 (from langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Obtaining dependency information for SQLAlchemy<3,>=1.4 from https://files.pythonhosted.org/packages/ae/42/101761a65b8d83efa5d87cbb61144dae557ed60087daeae89e965449963f/SQLAlchemy-2.0.19-cp310-cp310-macosx_10_9_x86_64.whl.metadata
  Using cached SQLAlchemy-2.0.19-cp310-cp310-macosx_10_9_x86_64.whl.metadata (9.4 kB)
Collecting aiohttp<4.0.0,>=3.8.3 (from langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Obtaining dependency information for aiohttp<4.0.0,>=3.8.3 from https://files.pythonhosted.org/packages/f3/56/a5a062bc98e8d5848f7790963771f8354f488726a59fd650742ca7391171/aiohttp-3.8.5-cp310-cp310-macosx_10_9_x86_64.whl.metadata
  Downloading aiohttp-3.8.5-cp310-cp310-macosx_10_9_x86_64.whl.metadata (7.7 kB)
Collecting async-timeout<5.0.0,>=4.0.0 (from langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Obtaining dependency information for async-timeout<5.0.0,>=4.0.0 from https://files.pythonhosted.org/packages/a7/fa/e01228c2938de91d47b307831c62ab9e4001e747789d0b05baf779a6488c/async_timeout-4.0.3-py3-none-any.whl.metadata
  Downloading async_timeout-4.0.3-py3-none-any.whl.metadata (4.2 kB)
Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Obtaining dependency information for dataclasses-json<0.6.0,>=0.5.7 from https://files.pythonhosted.org/packages/97/5f/e7cc90f36152810cab08b6c9c1125e8bcb9d76f8b3018d101b5f877b386c/dataclasses_json-0.5.14-py3-none-any.whl.metadata
  Downloading dataclasses_json-0.5.14-py3-none-any.whl.metadata (22 kB)
Collecting langsmith<0.1.0,>=0.0.11 (from langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Obtaining dependency information for langsmith<0.1.0,>=0.0.11 from https://files.pythonhosted.org/packages/a9/37/c07b98cdbf680714bf7fc7fa653cb722eff56a20df4232adc973fa98da30/langsmith-0.0.21-py3-none-any.whl.metadata
  Downloading langsmith-0.0.21-py3-none-any.whl.metadata (10 kB)
Collecting numexpr<3.0.0,>=2.8.4 (from langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Obtaining dependency information for numexpr<3.0.0,>=2.8.4 from https://files.pythonhosted.org/packages/88/3c/8af55554773ff8d5ed344050fb09788966c9a5b63e9d8de28b60f5a04fa8/numexpr-2.8.5-cp310-cp310-macosx_10_9_x86_64.whl.metadata
  Using cached numexpr-2.8.5-cp310-cp310-macosx_10_9_x86_64.whl.metadata (8.0 kB)
Collecting numpy<2,>=1 (from langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Obtaining dependency information for numpy<2,>=1 from https://files.pythonhosted.org/packages/d5/50/8aedb5ff1460e7c8527af15c6326115009e7c270ec705487155b779ebabb/numpy-1.25.2-cp310-cp310-macosx_10_9_x86_64.whl.metadata
  Downloading numpy-1.25.2-cp310-cp310-macosx_10_9_x86_64.whl.metadata (5.6 kB)
Collecting openapi-schema-pydantic<2.0,>=1.2 (from langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Using cached openapi_schema_pydantic-1.2.4-py3-none-any.whl (90 kB)
Collecting pydantic<2,>=1 (from langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Obtaining dependency information for pydantic<2,>=1 from https://files.pythonhosted.org/packages/58/26/ca79779dc217222d308254b4d4312108c4ac334fb63d97596e0ba0982868/pydantic-1.10.12-cp310-cp310-macosx_10_9_x86_64.whl.metadata
  Using cached pydantic-1.10.12-cp310-cp310-macosx_10_9_x86_64.whl.metadata (149 kB)
Collecting tenacity<9.0.0,>=8.1.0 (from langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Using cached tenacity-8.2.2-py3-none-any.whl (24 kB)
Collecting tqdm (from openai<0.28,>=0.27->kor->-r requirements.txt (line 1))
  Obtaining dependency information for tqdm from https://files.pythonhosted.org/packages/00/e5/f12a80907d0884e6dff9c16d0c0114d81b8cd07dc3ae54c5e962cc83037e/tqdm-4.66.1-py3-none-any.whl.metadata
  Downloading tqdm-4.66.1-py3-none-any.whl.metadata (57 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.6/57.6 kB 5.3 MB/s eta 0:00:00
Collecting python-dateutil>=2.8.1 (from pandas<2.0.0,>=1.5.3->kor->-r requirements.txt (line 1))
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pytz>=2020.1 (from pandas<2.0.0,>=1.5.3->kor->-r requirements.txt (line 1))
  Using cached pytz-2023.3-py2.py3-none-any.whl (502 kB)
Collecting charset-normalizer<4,>=2 (from requests->unstructured->-r requirements.txt (line 3))
  Obtaining dependency information for charset-normalizer<4,>=2 from https://files.pythonhosted.org/packages/81/a0/96317ce912b512b7998434eae5e24b28bcc5f1680ad85348e31e1ca56332/charset_normalizer-3.2.0-cp310-cp310-macosx_10_9_x86_64.whl.metadata
  Downloading charset_normalizer-3.2.0-cp310-cp310-macosx_10_9_x86_64.whl.metadata (31 kB)
Collecting idna<4,>=2.5 (from requests->unstructured->-r requirements.txt (line 3))
  Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting attrs>=20.1.0 (from trio~=0.17->selenium->-r requirements.txt (line 2))
  Using cached attrs-23.1.0-py3-none-any.whl (61 kB)
Collecting sortedcontainers (from trio~=0.17->selenium->-r requirements.txt (line 2))
  Using cached sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Collecting outcome (from trio~=0.17->selenium->-r requirements.txt (line 2))
  Using cached outcome-1.2.0-py2.py3-none-any.whl (9.7 kB)
Collecting sniffio (from trio~=0.17->selenium->-r requirements.txt (line 2))
  Using cached sniffio-1.3.0-py3-none-any.whl (10 kB)
Collecting exceptiongroup>=1.0.0rc9 (from trio~=0.17->selenium->-r requirements.txt (line 2))
  Obtaining dependency information for exceptiongroup>=1.0.0rc9 from https://files.pythonhosted.org/packages/fe/17/f43b7c9ccf399d72038042ee72785c305f6c6fdc6231942f8ab99d995742/exceptiongroup-1.1.2-py3-none-any.whl.metadata
  Using cached exceptiongroup-1.1.2-py3-none-any.whl.metadata (6.1 kB)
Collecting wsproto>=0.14 (from trio-websocket~=0.9->selenium->-r requirements.txt (line 2))
  Using cached wsproto-1.2.0-py3-none-any.whl (24 kB)
Collecting pysocks!=1.5.7,<2.0,>=1.5.6 (from urllib3[socks]<3,>=1.26->selenium->-r requirements.txt (line 2))
  Using cached PySocks-1.7.1-py3-none-any.whl (16 kB)
Collecting click (from nltk->unstructured->-r requirements.txt (line 3))
  Obtaining dependency information for click from https://files.pythonhosted.org/packages/1a/70/e63223f8116931d365993d4a6b7ef653a4d920b41d03de7c59499962821f/click-8.1.6-py3-none-any.whl.metadata
  Using cached click-8.1.6-py3-none-any.whl.metadata (3.0 kB)
Collecting joblib (from nltk->unstructured->-r requirements.txt (line 3))
  Obtaining dependency information for joblib from https://files.pythonhosted.org/packages/10/40/d551139c85db202f1f384ba8bcf96aca2f329440a844f924c8a0040b6d02/joblib-1.3.2-py3-none-any.whl.metadata
  Using cached joblib-1.3.2-py3-none-any.whl.metadata (5.4 kB)
Collecting regex>=2021.8.3 (from nltk->unstructured->-r requirements.txt (line 3))
  Obtaining dependency information for regex>=2021.8.3 from https://files.pythonhosted.org/packages/6b/20/8a419181449227182d61908484477d23d01b2b35211a45e838b746da8bb4/regex-2023.8.8-cp310-cp310-macosx_10_9_x86_64.whl.metadata
  Using cached regex-2023.8.8-cp310-cp310-macosx_10_9_x86_64.whl.metadata (40 kB)
Collecting multidict<7.0,>=4.5 (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Using cached multidict-6.0.4-cp310-cp310-macosx_10_9_x86_64.whl (29 kB)
Collecting yarl<2.0,>=1.0 (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Using cached yarl-1.9.2-cp310-cp310-macosx_10_9_x86_64.whl (65 kB)
Collecting frozenlist>=1.1.1 (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Obtaining dependency information for frozenlist>=1.1.1 from https://files.pythonhosted.org/packages/a3/5b/c785feda30d9fda8c1b1a11941e91253f59aeaf13e87ebe908d0f3f6c628/frozenlist-1.4.0-cp310-cp310-macosx_10_9_x86_64.whl.metadata
  Downloading frozenlist-1.4.0-cp310-cp310-macosx_10_9_x86_64.whl.metadata (5.2 kB)
Collecting aiosignal>=1.1.2 (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Using cached aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.6.0,>=0.5.7->langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Obtaining dependency information for marshmallow<4.0.0,>=3.18.0 from https://files.pythonhosted.org/packages/ed/3c/cebfdcad015240014ff08b883d1c0c427f2ba45ae8c6572851b6ef136cad/marshmallow-3.20.1-py3-none-any.whl.metadata
  Using cached marshmallow-3.20.1-py3-none-any.whl.metadata (7.8 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.6.0,>=0.5.7->langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Obtaining dependency information for typing-inspect<1,>=0.4.0 from https://files.pythonhosted.org/packages/65/f3/107a22063bf27bdccf2024833d3445f4eea42b2e598abfbd46f6a63b6cb0/typing_inspect-0.9.0-py3-none-any.whl.metadata
  Using cached typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting typing-extensions>=4.2.0 (from pydantic<2,>=1->langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Obtaining dependency information for typing-extensions>=4.2.0 from https://files.pythonhosted.org/packages/ec/6b/63cc3df74987c36fe26157ee12e09e8f9db4de771e0f3404263117e75b95/typing_extensions-4.7.1-py3-none-any.whl.metadata
  Using cached typing_extensions-4.7.1-py3-none-any.whl.metadata (3.1 kB)
Collecting greenlet!=0.4.17 (from SQLAlchemy<3,>=1.4->langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Using cached greenlet-2.0.2-cp310-cp310-macosx_11_0_x86_64.whl (242 kB)
Collecting h11<1,>=0.9.0 (from wsproto>=0.14->trio-websocket~=0.9->selenium->-r requirements.txt (line 2))
  Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Collecting packaging>=17.0 (from marshmallow<4.0.0,>=3.18.0->dataclasses-json<0.6.0,>=0.5.7->langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Using cached packaging-23.1-py3-none-any.whl (48 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.6.0,>=0.5.7->langchain>=0.0.205->kor->-r requirements.txt (line 1))
  Using cached mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)
Using cached kor-0.13.0-py3-none-any.whl (29 kB)
Using cached selenium-4.11.2-py3-none-any.whl (7.2 MB)
Downloading unstructured-0.9.2-py3-none-any.whl (1.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 57.4 MB/s eta 0:00:00
Using cached certifi-2023.7.22-py3-none-any.whl (158 kB)
Downloading langchain-0.0.262-py3-none-any.whl (1.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 63.4 MB/s eta 0:00:00
Using cached openai-0.27.8-py3-none-any.whl (73 kB)
Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Using cached trio-0.22.2-py3-none-any.whl (400 kB)
Using cached trio_websocket-0.10.3-py3-none-any.whl (17 kB)
Using cached chardet-5.2.0-py3-none-any.whl (199 kB)
Using cached lxml-4.9.3-cp310-cp310-macosx_11_0_x86_64.whl (4.8 MB)
Using cached aiohttp-3.8.5-cp310-cp310-macosx_10_9_x86_64.whl (365 kB)
Downloading async_timeout-4.0.3-py3-none-any.whl (5.7 kB)
Using cached charset_normalizer-3.2.0-cp310-cp310-macosx_10_9_x86_64.whl (126 kB)
Downloading dataclasses_json-0.5.14-py3-none-any.whl (26 kB)
Using cached exceptiongroup-1.1.2-py3-none-any.whl (14 kB)
Downloading langsmith-0.0.21-py3-none-any.whl (32 kB)
Using cached numexpr-2.8.5-cp310-cp310-macosx_10_9_x86_64.whl (101 kB)
Downloading numpy-1.25.2-cp310-cp310-macosx_10_9_x86_64.whl (20.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.8/20.8 MB 56.0 MB/s eta 0:00:00
Using cached pydantic-1.10.12-cp310-cp310-macosx_10_9_x86_64.whl (2.9 MB)
Using cached PyYAML-6.0.1-cp310-cp310-macosx_10_9_x86_64.whl (189 kB)
Using cached regex-2023.8.8-cp310-cp310-macosx_10_9_x86_64.whl (294 kB)
Using cached SQLAlchemy-2.0.19-cp310-cp310-macosx_10_9_x86_64.whl (2.0 MB)
Downloading urllib3-2.0.4-py3-none-any.whl (123 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 123.9/123.9 kB 13.2 MB/s eta 0:00:00
Using cached click-8.1.6-py3-none-any.whl (97 kB)
Using cached joblib-1.3.2-py3-none-any.whl (302 kB)
Downloading tqdm-4.66.1-py3-none-any.whl (78 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.3/78.3 kB 10.7 MB/s eta 0:00:00
Using cached frozenlist-1.4.0-cp310-cp310-macosx_10_9_x86_64.whl (46 kB)
Using cached marshmallow-3.20.1-py3-none-any.whl (49 kB)
Using cached typing_extensions-4.7.1-py3-none-any.whl (33 kB)
Using cached typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
Installing collected packages: sortedcontainers, pytz, filetype, urllib3, typing-extensions, tqdm, tenacity, tabulate, soupsieve, sniffio, six, regex, PyYAML, python-magic, pysocks, packaging, numpy, mypy-extensions, multidict, lxml, joblib, idna, h11, greenlet, frozenlist, exceptiongroup, click, charset-normalizer, chardet, certifi, attrs, async-timeout, yarl, wsproto, typing-inspect, SQLAlchemy, requests, python-dateutil, pydantic, outcome, numexpr, nltk, marshmallow, beautifulsoup4, aiosignal, unstructured, trio, pandas, openapi-schema-pydantic, markdownify, langsmith, dataclasses-json, aiohttp, trio-websocket, openai, langchain, selenium, kor
Successfully installed PyYAML-6.0.1 SQLAlchemy-2.0.19 aiohttp-3.8.5 aiosignal-1.3.1 async-timeout-4.0.3 attrs-23.1.0 beautifulsoup4-4.12.2 certifi-2023.7.22 chardet-5.2.0 charset-normalizer-3.2.0 click-8.1.6 dataclasses-json-0.5.14 exceptiongroup-1.1.2 filetype-1.2.0 frozenlist-1.4.0 greenlet-2.0.2 h11-0.14.0 idna-3.4 joblib-1.3.2 kor-0.13.0 langchain-0.0.262 langsmith-0.0.21 lxml-4.9.3 markdownify-0.11.6 marshmallow-3.20.1 multidict-6.0.4 mypy-extensions-1.0.0 nltk-3.8.1 numexpr-2.8.5 numpy-1.25.2 openai-0.27.8 openapi-schema-pydantic-1.2.4 outcome-1.2.0 packaging-23.1 pandas-1.5.3 pydantic-1.10.12 pysocks-1.7.1 python-dateutil-2.8.2 python-magic-0.4.27 pytz-2023.3 regex-2023.8.8 requests-2.31.0 selenium-4.11.2 six-1.16.0 sniffio-1.3.0 sortedcontainers-2.4.0 soupsieve-2.4.1 tabulate-0.9.0 tenacity-8.2.2 tqdm-4.66.1 trio-0.22.2 trio-websocket-0.10.3 typing-extensions-4.7.1 typing-inspect-0.9.0 unstructured-0.9.2 urllib3-2.0.4 wsproto-1.2.0 yarl-1.9.2

natea commented on May 10, 2024

This is the version of Python that I'm using:

(kor) nateaune@Nates-MBP kor % python example3.py
  File "/Users/nateaune/Documents/code/kor/example3.py", line 100
    document_extraction_results = await extract_from_documents(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: 'await' outside function
(kor) nateaune@Nates-MBP kor % which python
/Users/nateaune/.pyenv/shims/python
(kor) nateaune@Nates-MBP kor % python
Python 3.10.10 (main, Mar 29 2023, 14:29:38) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

eyurtsev commented on May 10, 2024

Ah, this is async code, and the issue is that it's being executed from a synchronous context.

You can run the code in a Jupyter notebook, which supports top-level await by default.

Or else you can do something like this:

>>> async def f(): print('hello')
... 
>>> import asyncio
>>> asyncio.run(f())
hello

Wrap the code that uses await in an async function, and then call asyncio.run to run it.
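
As a script-level sketch of that pattern: `fetch_results` below is a hypothetical stand-in for an awaitable call such as `extract_from_documents(...)`, and `main` is an illustrative name, so treat both as assumptions rather than kor's API.

```python
import asyncio


async def fetch_results():
    # Hypothetical stand-in for an awaitable call such as
    # extract_from_documents(...); it only simulates async work.
    await asyncio.sleep(0)
    return ["result-1", "result-2"]


async def main():
    # `await` is legal here because we are inside an async function.
    results = await fetch_results()
    return results


if __name__ == "__main__":
    # asyncio.run starts an event loop, runs main() to completion,
    # and then closes the loop.
    print(asyncio.run(main()))
```

Note that asyncio.run raises an error if an event loop is already running, so inside a notebook you would write `await main()` directly instead.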

natea commented on May 10, 2024

I was able to avoid the error, but it didn't produce the dataframe results I was expecting. Here is the code I used:

async def extract():
    with get_openai_callback() as cb:
        document_extraction_results = await extract_from_documents(
            chain, split_docs, max_concurrency=5, use_uid=False, return_exceptions=True
        )
        print(f"Total Tokens: {cb.total_tokens}")
        print(f"Prompt Tokens: {cb.prompt_tokens}")
        print(f"Completion Tokens: {cb.completion_tokens}")
        print(f"Successful Requests: {cb.successful_requests}")
        print(f"Total Cost (USD): ${cb.total_cost}")
        return document_extraction_results
    
document_extraction_results = asyncio.run(extract())

validated_data = list(
    itertools.chain.from_iterable(
        extraction["validated_data"] for extraction in document_extraction_results
    )
)
len(validated_data)

#Extraction is not perfect, but you can use a better LLM or provide more examples!

pd.DataFrame(record.dict() for record in validated_data)
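
A side note for anyone running this as a plain script rather than in a notebook: bare expressions such as `len(validated_data)` or a trailing `pd.DataFrame(...)` are echoed only by an interactive shell or notebook, so a script needs an explicit `print` to show them. The sketch below illustrates this with made-up records standing in for the extraction results; the real structure returned by kor may differ.

```python
import itertools

# Made-up stand-in for document_extraction_results (an assumption,
# not real kor output).
document_extraction_results = [
    {"validated_data": [{"name": "Show A"}, {"name": "Show B"}]},
    {"validated_data": []},
    {"validated_data": [{"name": "Show C"}]},
]

# Flatten the per-document lists into one list of records.
validated_data = list(
    itertools.chain.from_iterable(
        extraction["validated_data"] for extraction in document_extraction_results
    )
)

# In a script, results must be printed explicitly to be seen.
print(len(validated_data))
for record in validated_data:
    print(record)
```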

eyurtsev commented on May 10, 2024

What did you expect? What did you get? Was the issue an exception, or bad results?

natea commented on May 10, 2024

I guess I had expected that it would show me the same results as in your example:
https://eyurtsev.github.io/kor/document_extraction.html
[Screenshot: CleanShot 2023-08-17 at 15 04 27@2x]

natea commented on May 10, 2024

I'm seeing a lot of these errors, so maybe there's a problem with the way the ChromeDriver is set up?

Error fetching or processing a, exception: Message: invalid argument
  (Session info: headless chrome=115.0.5790.170)
Stacktrace:
0   chromedriver                        0x0000000103086a6c chromedriver + 4303468
1   chromedriver                        0x000000010307f198 chromedriver + 4272536
2   chromedriver                        0x0000000102cb33ec chromedriver + 291820
3   chromedriver                        0x0000000102c9ac44 chromedriver + 191556
4   chromedriver                        0x0000000102c988c8 chromedriver + 182472
5   chromedriver                        0x0000000102c99310 chromedriver + 185104
6   chromedriver                        0x0000000102cb5594 chromedriver + 300436
7   chromedriver                        0x0000000102d29c80 chromedriver + 777344
8   chromedriver                        0x0000000102d29628 chromedriver + 775720
9   chromedriver                        0x0000000102ce4b40 chromedriver + 494400
10  chromedriver                        0x0000000102ce5988 chromedriver + 498056
11  chromedriver                        0x0000000103047924 chromedriver + 4045092
12  chromedriver                        0x000000010304be68 chromedriver + 4062824
13  chromedriver                        0x0000000103052088 chromedriver + 4087944
14  chromedriver                        0x000000010304c96c chromedriver + 4065644
15  chromedriver                        0x0000000103024e64 chromedriver + 3903076
16  chromedriver                        0x000000010306855c chromedriver + 4179292
17  chromedriver                        0x00000001030686b4 chromedriver + 4179636
18  chromedriver                        0x0000000103078978 chromedriver + 4245880
19  libsystem_pthread.dylib             0x00000001980cbfa8 _pthread_start + 148
20  libsystem_pthread.dylib             0x00000001980c6da0 thread_start + 8

Error fetching or processing r, exception: Message: invalid argument
  (Session info: headless chrome=115.0.5790.170)
Stacktrace:
0   chromedriver                        0x0000000103086a6c chromedriver + 4303468
1   chromedriver                        0x000000010307f198 chromedriver + 4272536
2   chromedriver                        0x0000000102cb33ec chromedriver + 291820
3   chromedriver                        0x0000000102c9ac44 chromedriver + 191556
4   chromedriver                        0x0000000102c988c8 chromedriver + 182472
5   chromedriver                        0x0000000102c99310 chromedriver + 185104
6   chromedriver                        0x0000000102cb5594 chromedriver + 300436
7   chromedriver                        0x0000000102d29c80 chromedriver + 777344
8   chromedriver                        0x0000000102d29628 chromedriver + 775720
9   chromedriver                        0x0000000102ce4b40 chromedriver + 494400
10  chromedriver                        0x0000000102ce5988 chromedriver + 498056
11  chromedriver                        0x0000000103047924 chromedriver + 4045092
12  chromedriver                        0x000000010304be68 chromedriver + 4062824
13  chromedriver                        0x0000000103052088 chromedriver + 4087944
14  chromedriver                        0x000000010304c96c chromedriver + 4065644
15  chromedriver                        0x0000000103024e64 chromedriver + 3903076
16  chromedriver                        0x000000010306855c chromedriver + 4179292
17  chromedriver                        0x00000001030686b4 chromedriver + 4179636
18  chromedriver                        0x0000000103078978 chromedriver + 4245880
19  libsystem_pthread.dylib             0x00000001980cbfa8 _pthread_start + 148
20  libsystem_pthread.dylib             0x00000001980c6da0 thread_start + 8

Watch the trailer for Silo

[Silo

 Latest Episode: Jun 30](/tv/silo)

eyurtsev commented on May 10, 2024

@natea Yeah, this is an issue with the loader (`from langchain.document_loaders import SeleniumURLLoader`). I would look online to see how to resolve it.
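
One thing worth checking, as a guess rather than a confirmed diagnosis: the errors above name `a` and `r` as the items being fetched, which is what you would see if a single URL string were iterated character by character instead of a list of URLs. The sketch below uses a hypothetical helper, `as_url_list`, and a placeholder URL to show the difference; it does not call Selenium.

```python
def as_url_list(urls):
    """Return `urls` as a list, wrapping a lone string so it is not split into characters."""
    if isinstance(urls, str):
        return [urls]
    return list(urls)


# Placeholder URL, used only for illustration.
url = "https://example.com/page"

# Iterating a string yields single characters, not URLs.
print(list(url)[:4])

# Wrapping the string in a list keeps the URL intact.
print(as_url_list(url))
```

If the loader was given a bare string, passing a list containing that string would be the first thing to try.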
