drgfreeman / dynamo-pandas

Make working with pandas data and AWS DynamoDB easy

Home Page: https://dynamo-pandas.readthedocs.io/en/stable/

License: MIT License

Language: Python 100.00%
Topics: aws, aws-dynamodb, boto3, database, dataframe, deserialization, dynamo-pandas, dynamodb, interface, pandas, serialization

dynamo-pandas's People

Contributors

dependabot[bot], drgfreeman, sreyan-ghosh

dynamo-pandas's Issues

Make boto3 an "extra" requirement

boto3 is currently defined in the install_requires parameter of setup in setup.py. This results in the boto3 and botocore packages being added to lambda layers built using AWS SAM tools. These two packages use about 60 MB of layer storage space, a significant fraction of the 250 MB AWS lambda layer size limit, even though they do not need to be installed in the layer since they are already included in the lambda runtime environment.

Moving boto3 to the extras_require parameter of the setup function would prevent the addition of boto3 and botocore to lambda layers while allowing their installation using the 'boto' extra option.
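A minimal sketch of the corresponding setup.py change (other setup arguments omitted; the extra name "boto" follows the wording above and is not confirmed as the final name):

from setuptools import setup

setup(
    name="dynamo-pandas",
    # boto3 is no longer a hard requirement, so it is not pulled into lambda
    # layers built with AWS SAM tools.
    install_requires=["pandas"],
    # Installable on demand with: pip install "dynamo-pandas[boto]"
    extras_require={"boto": ["boto3"]},
)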

Update the Installation section of the README and the docs to reflect the change in installation options.

Returned unprocessed items have incorrect format

The unprocessed items returned by the _put_items function embedded in the transactions.put_items function are not in the same format as the items passed to the function.

def _put_items(items, table=table):
    response = client.batch_write_item(
        RequestItems={table: [{"PutRequest": {"Item": item}} for item in items]}
    )
    if response["UnprocessedItems"] != {}:
        return response["UprocessedItems"][table]
    else:
        return []

The _put_items function expects a list of item dictionaries serialized with the serde.TypeSerializer.serialize() method, whereas it returns a list of dictionaries in the format {"PutRequest": {"Item": item}}, where item is a serialized item dictionary.

A correct implementation would be:

def _put_items(items, table=table):
    response = client.batch_write_item(
        RequestItems={table: [{"PutRequest": {"Item": item}} for item in items]}
    )
    if response["UnprocessedItems"] != {}:
        return [
            item["PutRequest"]["Item"]
            for item in response["UnprocessedItems"][table]
        ]
    else:
        return []

This bug currently passes unit tests since the handling of unprocessed items is not covered by tests (ref. #43).

Release version 1.2.1

Release version 1.2.1 to make the bug fixes from #45 available on PyPI.

Also add a CHANGELOG.md file to make tracking of changes easier.

Handling of unprocessed items from the client's batch_write_item function is not tested

The handling of the unprocessed items from the client's batch_write_item function called in transactions.put_items is not covered by unit tests. This can lead to bugs like #42 remaining unnoticed.

if response["UnprocessedItems"] != {}:
return response["UprocessedItems"][table]

Investigate whether mocking with moto can be used to return unprocessed items. Otherwise, potentially use a custom mock to return unprocessed items and ensure the whole function is covered by tests.
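If moto cannot be made to return unprocessed items, a custom mock of the low-level client could exercise this branch. A rough sketch using unittest.mock (it assumes the transactions module obtains its client via boto3.client; the exact assertion would depend on how put_items surfaces or retries unprocessed items):

from unittest import mock

from dynamo_pandas import transactions


def test_put_items_handles_unprocessed_items():
    fake_client = mock.Mock()
    # The first call reports one unprocessed item (the shape mirrors a real
    # batch_write_item response); the second call reports none, so the test
    # terminates whether or not put_items retries internally.
    fake_client.batch_write_item.side_effect = [
        {
            "UnprocessedItems": {
                "players": [
                    {"PutRequest": {"Item": {"player_id": {"S": "player_one"}}}}
                ]
            }
        },
        {"UnprocessedItems": {}},
    ]
    with mock.patch("boto3.client", return_value=fake_client):
        transactions.put_items(items=[{"player_id": "player_one"}], table="players")
    # At minimum, the unprocessed-items branch runs without raising; a stronger
    # assertion would check the retried or returned items.
    assert fake_client.batch_write_item.call_count >= 1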

Timedelta string values cannot be converted with the dtype parameter

Timedelta string values stored in a table cannot be converted with the dtype parameter of the get_df and to_df functions or using the dataframe astype method. This is due to a known bug in pandas (ref.: pandas-dev/pandas#38509).

As a result, the unit tests for the dtype parameter of the get_df and to_df functions do not test this conversion. Once the pandas issue is resolved, this conversion can be added to the tests.

As a workaround, the Timedelta columns can be converted using pd.to_timedelta(df.column_name).

Move the keys function to the main module

Move the keys function from the transactions module to the main module.

When using the package with the high level interface functions, a user should not have to import functions from sub-modules. Since the keys function is meant as a helper to keep the interface simple, it makes more sense to have it as part of the main module.

Configure tox

Use tox to run the unit tests on different Python versions, both locally and in CI.

Add high level transaction functions

Add high level transaction functions that integrate conversion and transactions in a single function call (see the usage sketch after the list):

  • put_df(df, table): add or update all items from a dataframe.
  • get_df(keys, table): get specific items (or all items if keys=None) from a table into a dataframe.
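A possible usage sketch of the proposed functions (assuming they are exposed at the package's top level; names and signatures are subject to change):

import pandas as pd

from dynamo_pandas import get_df, put_df

df = pd.DataFrame([{"player_id": "player_one", "bonus_points": 4}])

# Add or update all items from the dataframe.
put_df(df, table="players")

# Get specific items back into a dataframe (or all items with keys=None).
df = get_df(table="players", keys=[{"player_id": "player_one"}])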

Add functions to convert DataFrame and Series to items dict and vice-versa

Add functions to convert pandas DataFrame and Series to items dict and vice-versa.

Examples (subject to modification; a usage sketch follows the list):

  • to_items(df) to convert a dataframe to a list of dictionaries.
  • to_item(obj) to convert a single-row dataframe or a series to a dictionary.
  • to_df(items, dtype=None) to convert a single item or multiple items to a dataframe with optional data types.
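A rough usage sketch of the proposed conversion functions (assuming they are exposed at the package's top level; names subject to modification as noted):

import pandas as pd

from dynamo_pandas import to_df, to_item, to_items

df = pd.DataFrame([
    {"player_id": "player_one", "bonus_points": 4},
    {"player_id": "player_two", "bonus_points": 1},
])

items = to_items(df)  # dataframe -> list of dictionaries, one per row
item = to_item(df.loc[[0]])  # single-row dataframe (or a series) -> dictionary
df2 = to_df(items, dtype={"bonus_points": "Int8"})  # items -> dataframe with optional dtypes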

Unprocessed keys in get_items are not handled correctly

In the transactions.get_items function, the unprocessed keys returned by the boto3.resource().batch_get_item() function are not handled correctly: the function is simply called again with all of the original keys instead of only the unprocessed ones:

while response["UnprocessedKeys"] != {}:
response = resource.batch_get_item(RequestItems=_request(keys))
items.extend(response["Responses"][table])

Also, this block of code is not covered by unit tests, which prevented this bug from being caught by the tests.
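A possible fix (a sketch, not the final implementation) is to retry with the unprocessed keys rather than the original ones. The UnprocessedKeys value returned by DynamoDB has the same structure as the RequestItems parameter, so it can be passed back directly:

while response["UnprocessedKeys"] != {}:
    # Retry only the keys DynamoDB did not process on the previous call.
    response = resource.batch_get_item(RequestItems=response["UnprocessedKeys"])
    items.extend(response["Responses"][table])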

AWS configuration parameters cannot be overwritten

While AWS configuration parameters can be set via a config file or environment variables, there may be cases where these parameters need to be overwritten.

The current put_df, get_df and transactions module functions do not provide a means to pass these parameters.

Adding a **kwargs argument to the different functions and passing it to the underlying boto3.client or boto3.resource function call would provide this functionality.

For example, the get_df function signature would become:

def get_df(*, table, keys=None, attributes=None, dtype=None, **resource_kwargs):
    ...
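With the proposed signature, overriding the endpoint to target a local DynamoDB instance could look like this (illustrative only; the keys helper is the one from the transactions module):

df = get_df(
    table="players",
    keys=keys(player_id=["player_one"]),
    endpoint_url="http://localhost:8000",  # forwarded to boto3.resource(...)
)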

filter with attribute value

Could we get a dataframe from DynamoDB with filtering on an attribute value? I know that right now we can filter on keys, but I am not sure if we can filter on an attribute value. Thank you.

Bad indentation in Overview documentation code example

The indentation of the dtype parameter and closing parenthesis in the get_df with dtype example in docs/overview.rst are incorrect:

df = get_df(
    table="players",
    keys=keys(player_id=["player_two", "player_four"]),
        dtype={
            "bonus_points": "Int8",
            "last_play": "datetime64[ns, UTC]",
            # "play_time": "timedelta64[ns]"  # See note below.
        }
    )

Should read:

df = get_df(
    table="players",
    keys=keys(player_id=["player_two", "player_four"]),
    dtype={
        "bonus_points": "Int8",
        "last_play": "datetime64[ns, UTC]",
        # "play_time": "timedelta64[ns]"  # See note below.
    }
)

Tables with GSI & LSI?

Hi, firstly this package looks like it could really make my life easier, so thanks for putting the time in!
I'm not a DynamoDB expert, so sorry if this is a stupid error on my part.
I'm receiving a client error when using get_df on DynamoDB tables that have either a GSI or an LSI:
"An error occurred (ValidationException) when calling the BatchGetItem operation: The provided key element does not match the schema"

Following your examples, it works for all tables that don't have a GSI or LSI. Should I be using a different "keys"/query structure for those tables?

error when calling get_df()

I defined boto3_args as a dictionary:

boto3_args={}
boto3_args["endpoint_url"] = "http://localhost:8000"
boto3_args["aws_access_key_id"] = "fakeMyKeyId"
boto3_args["aws_secret_access_key"] = "fakeSecretAccessKey"

And tried to execute
df = get_df(table = "Employee", boto3_kwargs = boto3_args)

Error: TypeError: get_df() got an unexpected keyword argument 'boto3_kwargs'

But when I checked the source code, the method signature in dynamo_pandas.py is:
def get_df(*, table, keys=None, attributes=None, dtype=None, boto3_kwargs={}):

This does accept boto3_kwargs as a keyword argument.

Add parameter to select item attributes to get

Add a parameter to select which item attributes to get when calling the following functions:

  • get_df
  • transactions.get_all_items
  • transactions.get_item
  • transactions.get_items

The parameter would take a list of attribute names.

Example

>>> df = get_df(
...     table="players",
...     keys=[{"player_id": "player_three"}, {"player_id": "player_one"}],
...     attributes=["player_id", "play_time"],
... )
>>> print(df)
      player_id        play_time
0  player_three  1 days 14:01:19
1    player_one  2 days 17:41:55

Release version 1.0.0

  • Remove development notices from README.
  • Change version in __init__.py and docs/conf.py.
