amcat4py's Issues

Authentication method not working if amcat4 is on different machine

I just tried to log into an amcat instance that I set up through Docker on my server. The .env looks like this:

# Host this instance is served at (needed for checking tokens)
amcat4_host=http://192.168.2.180:8069/amcat

# Elasticsearch password. This the password for the 'elastic' user when Elastic xpack security is enabled
#amcat4_elastic_password=

# Elasticsearch host. Default: https://localhost:9200 if elastic_password is set, http://localhost:9200 otherwise
amcat4_elastic_host=http://elastic7:9200

# Elasticsearch verify SSL (only used if elastic_password is set). Default: True unless host is localhost)
amcat4_elastic_verify_ssl=True

# Do we require authorization?
# Valid options:
# - no_auth: everyone (that can reach the server) can do anything they want
# - allow_guests: everyone can use the server, dependent on index-level guest_role authorization settings
# - allow_authenticated_guests: everyone can use the server, if they have a valid middlecat login, and dependent on index-level guest_role authorization settings
# - authorized_users_only: only people with a valid middlecat login and an explicit server role can use the server
amcat4_auth=allow_authenticated_guests

# Middlecat server to trust as ID provider
amcat4_middlecat_url=https://middlecat.up.railway.app

# Email address for a hardcoded admin email (useful for setup and recovery)
[email protected]

# Elasticsearch index to store authorization information in
amcat4_system_index=amcat4_system

When I call amcat.login(), Python gets stuck at "Waiting for authorization in browser...", apparently because the redirect does not work. The address generated by middlecat is:

https://middlecat.up.railway.app/authorize?response_type=code&client_id=amcat4py&redirect_uri=http%3A%2F%2Flocalhost%3A65432%2F&state=Y6igGmWz7aQGsvlHWUd68yV7mK4Ljd&code_challenge=UU5NSzZcSgBVU5g3d4ltDgs4xlhUUODxsFfMwCly538&code_challenge_method=S256&resource=http%3A%2F%2F192.168.2.180%3A8069%2Famcat&refresh_mode=static&session_type=api_key

http://localhost:65432/ is open, but the code apparently is not sent through.
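The authorization URL above can be decoded to see where the browser is sent after login. The sketch below uses only the standard library; the URL is the one from this report with state and code_challenge left out for brevity, and the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# Authorization URL from the report (state and code_challenge omitted)
authorize_url = (
    "https://middlecat.up.railway.app/authorize?response_type=code"
    "&client_id=amcat4py&redirect_uri=http%3A%2F%2Flocalhost%3A65432%2F"
    "&resource=http%3A%2F%2F192.168.2.180%3A8069%2Famcat"
)

# parse_qs percent-decodes the values; each key appears once in this URL
params = {k: v[0] for k, v in parse_qs(urlsplit(authorize_url).query).items()}
redirect = urlsplit(params["redirect_uri"])
resource = urlsplit(params["resource"])

print("redirect target:", redirect.hostname, redirect.port)
print("amcat4 server:  ", resource.hostname, resource.port)
```

The callback goes to localhost on whichever machine runs the browser, independent of where amcat4 is hosted. Whether that explains the hang described here is not established by the report.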

Rename private methods

I don't want to nag, but I think the low-level methods (e.g., put, url) should be private methods (_put, _url). It doesn't make a big difference in Python, but it is probably better to hide these from new users.
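A minimal sketch of such a rename, using a hypothetical stand-in class rather than the real AmcatClient: the helper becomes _url, and the old public name is kept as a thin alias that emits a DeprecationWarning so existing callers keep working for a while. The URL layout is assumed from the paths visible in other reports on this page.

```python
import warnings


class ClientSketch:
    """Hypothetical stand-in for the client; not the real AmcatClient."""

    def __init__(self, host):
        self.host = host.rstrip("/")

    def _url(self, path, index=None):
        # Private helper: build the full URL for an endpoint
        prefix = f"index/{index}/" if index else ""
        return f"{self.host}/{prefix}{path}"

    def url(self, path, index=None):
        # Old public name, kept only as a deprecated alias
        warnings.warn("url() is deprecated and will become private",
                      DeprecationWarning, stacklevel=2)
        return self._url(path, index)
```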

Chunked upload_documents

Extend upload_documents to allow chunked uploads of documents (steal from copy_index.py). Add a progress bar for large uploads, and maybe add some documentation so the user knows right away what the server expects.
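A rough sketch of what chunking could look like. This is not the code from copy_index.py; upload_in_chunks, chunk_size and report are hypothetical names, and the only assumption about the client is that it has an upload_documents(index, documents) method.

```python
from itertools import islice


def chunked(documents, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(documents)
    while chunk := list(islice(it, size)):
        yield chunk


def upload_in_chunks(client, index, documents, chunk_size=100, report=print):
    """Upload documents chunk by chunk, reporting progress after each chunk."""
    uploaded = 0
    for chunk in chunked(documents, chunk_size):
        client.upload_documents(index, chunk)
        uploaded += len(chunk)
        report(f"uploaded {uploaded} documents")
    return uploaded
```

The report callback is where a progress bar could hook in; plain print is used here to avoid an extra dependency.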

Incompatibility with 'cryptography' 40.1

I installed the package via pip install git+https://... but get an error when running:

from amcat4py import AmcatClient
amcat = AmcatClient("http://localhost/amcat")
amcat.login()

---
TypeError                                 Traceback (most recent call last)
Cell In[2], line 2
      1 from amcat4py import AmcatClient
----> 2 amcat = AmcatClient("http://localhost/amcat")
      3 amcat.login()

File ~/anaconda3/envs/opted/lib/python3.11/site-packages/amcat4py/amcatclient.py:44, in AmcatClient.__init__(self, host, ignore_tz)
     42 self.server_config = self.get_server_config()
     43 # If we have a token cached, load it. Otherwise, only log in if explicitly requested
---> 44 self.token = _get_token(self.host, login_if_needed=False)

File ~/anaconda3/envs/opted/lib/python3.11/site-packages/amcat4py/auth.py:132, in _get_token(host, force_refresh, login_if_needed)
    130 file_path = user_cache_dir(CLIENT_ID) + "/" + sha256(host.encode()).hexdigest()
    131 if os.path.exists(file_path) and not force_refresh:
--> 132     token = secret_read(file_path, host)
    133 elif login_if_needed:
    134     token = get_middlecat_token(host)

File ~/anaconda3/envs/opted/lib/python3.11/site-packages/amcat4py/auth.py:176, in secret_read(path, host)
    174 with open(path, "rb") as f:
    175     token_enc = f.read()
--> 176 fernet = Fernet(make_key(host))
    177 return loads(fernet.decrypt(token_enc).decode())

File ~/anaconda3/envs/opted/lib/python3.11/site-packages/amcat4py/auth.py:191, in make_key(key)
    181 """
    182 Helper function to make key for encryption of tokens
    183 :param key: string that is turned into key.
    184 """
    185 kdf = PBKDF2HMAC(
    186     algorithm=sha256(),
    187     length=32,
    188     salt="supergeheim".encode(),
    189     iterations=5,
    190 )
--> 191 return urlsafe_b64encode(kdf.derive(key.encode()))

File ~/anaconda3/envs/opted/lib/python3.11/site-packages/cryptography/hazmat/primitives/kdf/pbkdf2.py:53, in PBKDF2HMAC.derive(self, key_material)
     50     raise AlreadyFinalized("PBKDF2 instances can only be used once.")
     51 self._used = True
---> 53 return rust_openssl.kdf.derive_pbkdf2_hmac(
     54     key_material,
     55     self._algorithm,
     56     self._salt,
     57     self._iterations,
     58     self._length,
     59 )

It works when downgrading cryptography to 40.0.2.
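The traceback shows algorithm=sha256() being passed to PBKDF2HMAC, and the same sha256 name is used hashlib-style on line 130 of auth.py. One plausible reading, not confirmed in this report, is that newer cryptography releases reject a hashlib object where an instance such as hashes.SHA256() from cryptography.hazmat.primitives is expected. The sketch below sidesteps that question by doing the PBKDF2-HMAC-SHA256 derivation with the standard library, using the salt, iteration count and length visible in the traceback. It is an illustration, not the package's actual fix.

```python
from base64 import urlsafe_b64encode
from hashlib import pbkdf2_hmac


def make_key(key: str) -> bytes:
    """Derive a 32-byte key from a string and return it urlsafe-base64 encoded."""
    derived = pbkdf2_hmac(
        "sha256",                # hash is named by string, no hash object needed
        key.encode(),            # key material
        "supergeheim".encode(),  # salt, as shown in the traceback
        5,                       # iterations, as shown in the traceback
        dklen=32,
    )
    return urlsafe_b64encode(derived)
```

A 32-byte urlsafe-base64 value is the key format Fernet accepts, so the result could be passed to Fernet(...) in the same way as before.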

Misleading exception

When I upload a document to amcat4 via amcat4py and forget the required fields (title, text, date), the following exception is raised:

Unexpected exception formatting exception. Falling back to standard exception
Traceback (most recent call last):
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/amcat4py/amcatclient.py", line 84, in _request
    r.raise_for_status()
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: http://localhost/amcat/index/speeches_aut/documents

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3433, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_31037/2743098145.py", line 1, in <module>
    amcat.upload_documents("speeches_aut", speeches_aut.to_dicts())
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/amcat4py/amcatclient.py", line 267, in upload_documents
    self._post("documents", index=index, json=body)
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/amcat4py/amcatclient.py", line 100, in _post
    return self._request("post", url=self._url(url, index), data=data, headers=headers, ignore_status=ignore_status)
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/amcat4py/amcatclient.py", line 86, in _request
    raise AmcatError(e.response, e.request) from e
amcat4py.amcatclient.AmcatError: Error from server (422): [{'loc': ['body', 'documents', 0, 'title']
[...]
'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ['body', 'documents', 98, 'title'], 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ['body', 'documents', 99, 'title'], 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ['body', 'documents', 100, 'title'], 'msg': 'field required', 'type': 'value_error.missing'}]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 2052, in showtraceback
    stb = self.InteractiveTB.structured_traceback(
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1118, in structured_traceback
    return FormattedTB.structured_traceback(
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1012, in structured_traceback
    return VerboseTB.structured_traceback(
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/IPython/core/ultratb.py", line 865, in structured_traceback
    formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/IPython/core/ultratb.py", line 818, in format_exception_as_a_whole
    frames.append(self.format_record(r))
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/IPython/core/ultratb.py", line 736, in format_record
    result += ''.join(_format_traceback_lines(frame_info.lines, Colors, self.has_colors, lvals))
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/stack_data/core.py", line 734, in lines
    pieces = self.included_pieces
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/stack_data/core.py", line 677, in included_pieces
    scope_pieces = self.scope_pieces
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/stack_data/core.py", line 614, in scope_pieces
    scope_start, scope_end = self.source.line_range(self.scope)
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/stack_data/core.py", line 178, in line_range
    return line_range(self.asttext(), node)
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/executing/executing.py", line 428, in asttext
    self._asttext = ASTText(self.text, tree=self.tree, filename=self.filename)
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/asttokens/asttokens.py", line 307, in __init__
    super(ASTText, self).__init__(source_text, filename)
  File "/home/peter/anaconda3/envs/vds/lib/python3.10/site-packages/asttokens/asttokens.py", line 44, in __init__
    source_text = six.ensure_text(source_text)
AttributeError: module 'six' has no attribute 'ensure_text'

The client does not seem to handle the 422 error or forward the server's error message in a readable form.

This differs from amcat4r, which prints The fields title, date, and text are required and can never be NA
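To illustrate what a friendlier message could look like, here is a sketch that collapses the per-document validation errors into one line. It assumes the error list has the shape shown in the traceback above (dicts with loc, msg and type); summarize_missing_fields is a hypothetical helper, not part of amcat4py.

```python
def summarize_missing_fields(detail):
    """Collapse per-document 'field required' errors into one short message."""
    missing = {}
    for err in detail:
        if err.get("type") == "value_error.missing":
            field = err["loc"][-1]  # last element of loc is the field name
            missing[field] = missing.get(field, 0) + 1
    if not missing:
        return "Request rejected by server"
    parts = [f"{name} (in {n} document(s))" for name, n in sorted(missing.items())]
    return "Required fields missing: " + ", ".join(parts)


detail = [
    {"loc": ["body", "documents", 0, "title"], "msg": "field required", "type": "value_error.missing"},
    {"loc": ["body", "documents", 1, "title"], "msg": "field required", "type": "value_error.missing"},
]
print(summarize_missing_fields(detail))
```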

Calls to users endpoint not implemented

Listing, adding, removing and modifying index users is currently not implemented, as far as I can see. That is:

  • GET /index/{ix}/users
  • POST /index/{ix}/users
  • DELETE /index/{ix}/users/{email}
  • PUT /index/{ix}/users/{email}
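The four endpoints listed above map onto a small routing helper. The sketch below only builds the HTTP method and path; the action names are made up for illustration, and request bodies for POST and PUT are left out because their format is not given here.

```python
from urllib.parse import quote


def index_users_request(action, index, email=None):
    """Return (HTTP method, path) for an index-user operation."""
    methods = {"list": "GET", "add": "POST", "remove": "DELETE", "modify": "PUT"}
    if action not in methods:
        raise ValueError(f"unknown action: {action}")
    path = f"index/{quote(index, safe='')}/users"
    if action in ("remove", "modify"):
        if not email:
            raise ValueError(f"{action} requires an email address")
        # Percent-encode the address so it is safe as a single path segment
        path += "/" + quote(email, safe="")
    return methods[action], path
```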

Update PyPI release

I would like to use the query_aggregate function in my current project. Can you update the amcat4py package on PyPI so that I can update my local installation and use the new features? Or is a GitHub installation advised?

amcat = AmcatClient("http://localhost/amcat")
amcat.query_aggregate(...)
---
'AmcatClient' object has no attribute 'query_aggregate'

The error is thrown because the installed package is outdated.
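A quick way to confirm that the installed release is simply too old is to check the installed version and whether the method exists. The sketch uses only the standard library; installed_version and has_feature are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def has_feature(cls, name):
    """True if the class exposes a callable attribute with this name."""
    return callable(getattr(cls, name, None))
```

With amcat4py installed, installed_version("amcat4py") and has_feature(AmcatClient, "query_aggregate") would show which release is present and whether it includes the method.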

HTTP Exceptions not correctly raised

When performing a bad request, the AmcatClient always reports HTTP status code 500, even though the error the server generates is 422 (or another code).

Furthermore, error messages are not forwarded to the user. Performing a bad request with the client raises this exception:

requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://opted.amcat.nl/api/index/wp3/documents

Performing the same request manually raises this:

requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://opted.amcat.nl/api/index/wp3/documents

r.text
'{"detail":[{"loc":["body","documents"],"msg":"field required","type":"value_error.missing"},{"loc":["body","columns"],"msg":"field required","type":"value_error.missing"}]}'
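One way to keep both the status code and the server's message is an exception type that parses the response body. This is a sketch only: ServerResponseError is a hypothetical name, and the body string is the r.text value quoted above.

```python
import json


class ServerResponseError(Exception):
    """Hypothetical error type that keeps the server's status code and detail."""

    def __init__(self, status_code, body_text):
        self.status_code = status_code
        try:
            self.detail = json.loads(body_text).get("detail", body_text)
        except (ValueError, AttributeError):
            # Body was not JSON, or was JSON but not an object
            self.detail = body_text
        super().__init__(f"Error from server ({status_code}): {self.detail}")


body = ('{"detail":[{"loc":["body","documents"],"msg":"field required","type":"value_error.missing"},'
        '{"loc":["body","columns"],"msg":"field required","type":"value_error.missing"}]}')
err = ServerResponseError(422, body)
print(err.status_code, [e["loc"][-1] for e in err.detail])
```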

No way to request a batch?

It looks like getting a batch of, say, 1000 documents requires going through _post directly, as there is no way to stop query() or documents() once they start pulling results:

amcat = AmcatClient("http://localhost/amcat")
body = dict(queries="test", 
            fields=["_id", "text"],
            page=0, 
            per_page=10)
res = amcat._post("query", index="state_of_the_union", json=body, ignore_status=[404]).json()  
len(res['results'])
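If query() returns a lazy generator that only requests the next page once the previous one is exhausted (an assumption, not verified here), itertools.islice would be enough to stop early. The sketch demonstrates the idea with a stand-in generator; fake_query is hypothetical and makes no network calls.

```python
from itertools import islice


def fake_query(log, per_page=10, pages=5):
    """Stand-in for a lazy, paginated query; records each page it fetches."""
    for page in range(pages):
        log.append(page)  # a real client would request the page here
        for i in range(per_page):
            yield {"_id": page * per_page + i}


fetched_pages = []
# Take only the first 25 documents; iteration stops once islice is satisfied
batch = list(islice(fake_query(fetched_pages), 25))
print(len(batch), fetched_pages)
```

With a real client the equivalent would be along the lines of islice(amcat.query(...), 1000), with the query arguments filled in.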
