Comments (15)
I have two use cases where I'm using this. In my import config I do the following.

Duplicate detection using the default logic of beancount (which compares things like dates, accounts, ...):
from myimporter import MyImporter
from smart_importer import apply_hooks, PredictPostings
from smart_importer.detector import DuplicateDetector
CONFIG = [
apply_hooks(MyImporter(), [DuplicateDetector()]),
]
Duplicate detection using my own logic (based on a reference number I store in the metadata):
from myimporter import MyImporter
from smart_importer import apply_hooks, PredictPostings
from smart_importer.detector import DuplicateDetector
class ReferenceDuplicatesComparator:
def __call__(self, entry1, entry2):
return 'ref' in entry1.meta and 'ref' in entry2.meta and entry1.meta['ref'] == entry2.meta['ref']
CONFIG = [
apply_hooks(MyImporter(), [DuplicateDetector(comparator=ReferenceDuplicatesComparator())]),
]
from smart_importer.
I don't think I understand the problem; can you elaborate, please?
From what I remember, fava does (or did in the past?) include duplicate checks: I remember duplicates being shown in fava's import GUI in a pale green color, with radio buttons on the right-hand side for manually setting the status (duplicate or not). I think the duplicate check worked both with and without smart_importer.
Did fava maybe include bean-extract's duplicate-checking functionality in order to expose it in the import GUI in a usable way? I am just guessing here, but it may be worth investigating before creating a new solution.
If I add the duplicate meta key, the entry shows up as a duplicate, but the default algorithm that bean-extract uses (which basically detects similar transactions) is not in place. So what I did was create a decorator that does exactly this: it runs that detection and adds the duplicate meta key. @yagebu maybe you have some better insights.
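The decorator idea described above can be sketched roughly as follows. This is a hedged, self-contained illustration: mark_duplicates, default_comparator, and the plain-dict entries are stand-ins invented for this sketch, not smart_importer's or beancount's actual API.

```python
# Minimal sketch of duplicate marking: compare each newly extracted entry
# against the existing ledger entries and, on a match, set the
# __duplicate__ meta key that fava's import GUI looks for.
# Entries are stand-in dicts here; real code would use beancount directives.

def default_comparator(entry1, entry2):
    # Stand-in for a similarity heuristic: same date and narration.
    return (entry1["date"] == entry2["date"]
            and entry1["narration"] == entry2["narration"])

def mark_duplicates(new_entries, existing_entries, comparator=default_comparator):
    for entry in new_entries:
        if any(comparator(entry, existing) for existing in existing_entries):
            entry["meta"]["__duplicate__"] = True
    return new_entries

existing = [{"date": "2020-01-02", "narration": "Coffee", "meta": {}}]
new = [
    {"date": "2020-01-02", "narration": "Coffee", "meta": {}},
    {"date": "2020-01-03", "narration": "Lunch", "meta": {}},
]
marked = mark_duplicates(new, existing)
# marked[0]["meta"] now carries __duplicate__: True; marked[1]["meta"] stays empty
```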
See my changes here https://github.com/beancount/smart_importer/compare/feature/duplicates which make this work very nicely with fava.
I'm using the default algorithm in most cases and a different comparator (based on a reference id) in another case and both work perfectly fine with fava now.
Any further thoughts about this? I'm currently using it and it works really well with fava. If you think it's not useful for anyone else, I'll move it to my personal repo.
Could this same approach be used to provide fava with the source metadata key?
I normally set the source metadata directly in the importer, since it is custom to each importer.
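For illustration, attaching a source meta key inside a custom importer's extract() might look like the sketch below. The importer shape is an assumption for this example; a real beancount importer would implement the ImporterProtocol and return beancount directives rather than dicts.

```python
# Sketch of setting a "source" meta key inside an importer's extract()
# method. Rows are stubbed; a real importer would parse them from the file.

class MyImporter:
    def extract(self, filename):
        rows = [{"date": "2020-01-02", "narration": "Coffee"}]  # stubbed parse
        entries = []
        for row in rows:
            entry = dict(row)
            # Record where this entry came from, custom to this importer.
            entry["meta"] = {"source": filename}
            entries.append(entry)
        return entries

entries = MyImporter().extract("bank.csv")
# entries[0]["meta"]["source"] == "bank.csv"
```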
I looked up Fava's source code and found what I suspected already: Fava already includes duplicate-detection functionality. See the following source files:
- https://github.com/beancount/fava/blob/master/fava/core/ingest.py#L83 calls beancount's extract_from_file method. In the method call, fava passes existing entries to allow for duplicate detection.
- https://github.com/beancount/beancount/blob/master/beancount/ingest/extract.py#L46 beancount's extract_from_file method does indeed check for duplicates in existing entries.
- https://github.com/beancount/fava/blob/master/fava/templates/extract.html#L21 fava's GUI displays whether an entry is a (likely) duplicate, based on the __duplicate__ metadata.
In conclusion: Since fava and beancount come with duplicate detection out-of-the-box, I don't quite understand why the DuplicateDetector would be needed.
PS, @tarioch: Does your DuplicateDetector decorator cover use cases that are not covered by beancount's default duplicate-detection mechanism? Is this why you prefer the custom decorator over the default solution?
Actually, extract_from_file does not call the duplicate-detection logic; only extract does (and, as you wrote, fava calls extract_from_file, not extract). So that's what the DuplicateDetector does: it calls that logic and then sets the __duplicate__ metadata so it will be displayed correctly in fava.
The other thing the DuplicateDetector supports is customizing the comparison function (the DuplicateDetector is only a simple wrapper that instantiates/configures the core beancount logic).
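The "configurable comparison function" idea can be sketched as follows: the scan stays fixed while the pairwise comparator is swappable. The find_duplicates helper, the two-day window, and the dict-based entries are assumptions made for this sketch, not beancount's real implementation.

```python
# Sketch of a windowed duplicate scan with a pluggable comparator.
import datetime

def ref_comparator(entry1, entry2):
    # Treat two entries as duplicates when both carry the same 'ref' meta key.
    return ("ref" in entry1["meta"] and "ref" in entry2["meta"]
            and entry1["meta"]["ref"] == entry2["meta"]["ref"])

def find_duplicates(new_entries, existing_entries, comparator, window_days=2):
    window = datetime.timedelta(days=window_days)
    pairs = []
    for new in new_entries:
        for old in existing_entries:
            # Only compare entries whose dates are close, to keep the
            # pairwise comparison cheap.
            if abs(new["date"] - old["date"]) <= window and comparator(new, old):
                pairs.append((new, old))
    return pairs

existing = [{"date": datetime.date(2020, 1, 2), "meta": {"ref": "A1"}}]
new = [{"date": datetime.date(2020, 1, 3), "meta": {"ref": "A1"}},
       {"date": datetime.date(2020, 1, 3), "meta": {"ref": "B2"}}]
dupes = find_duplicates(new, existing, ref_comparator)
# dupes contains exactly the one A1/A1 pair
```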
@tarioch: Sorry for not commenting earlier. On first glance I like it: when restructuring the decorators, I was hoping they would enable this kind of application.
Maybe the class could already take arguments to allow for some basic cases of "custom" duplicate detection (e.g. by a meta key). Right now it just provides the same duplicate detection that beancount provides, right?
Yes, it's just reusing the logic from beancount. The good thing is that this logic was already made configurable. So for the comparison using the key, my implementation is simply:
class ReferenceDuplicatesComparator:
def __call__(self, entry1, entry2):
return 'ref' in entry1.meta and 'ref' in entry2.meta and entry1.meta['ref'] == entry2.meta['ref']
MyImporter = DuplicateDetector(comparator=ReferenceDuplicatesComparator())(OriginalImporter)
OK, I suggest we merge this. It adds extra configurability and showcases another use case for the decorators. Thank you @tarioch!
It would be great to see an example of how to use this.
I have a bog-standard importer and want to add duplicate-check functionality when importing through fava.
Thanks @tarioch, comparing ref metadata is a great idea.
If I want to use both ReferenceDuplicatesComparator and the default behavior of DuplicateDetector, should I pass two instances of DuplicateDetector? That is, something like:
from myimporter import MyImporter
from smart_importer import apply_hooks, PredictPostings
from smart_importer.detector import DuplicateDetector
class ReferenceDuplicatesComparator:
def __call__(self, entry1, entry2):
return 'ref' in entry1.meta and 'ref' in entry2.meta and entry1.meta['ref'] == entry2.meta['ref']
CONFIG = [
apply_hooks(MyImporter(), [DuplicateDetector(), DuplicateDetector(comparator=ReferenceDuplicatesComparator())]),
]
Yes, that will work. In the end, all the duplicate detector does is set a special metadata key called __duplicate__ to True, which is read both by fava and by the standard beancount utils.
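As an alternative to stacking two DuplicateDetector hooks, a single comparator could OR both checks. This is a sketch only: the looks_similar heuristic below is a stand-in for beancount's actual similarity logic, and the dict entries are illustrative.

```python
# One comparator that combines a ref-based check with a similarity-style
# check, so a single DuplicateDetector-like hook could cover both cases.

class CombinedComparator:
    def __call__(self, entry1, entry2):
        return self.same_ref(entry1, entry2) or self.looks_similar(entry1, entry2)

    @staticmethod
    def same_ref(entry1, entry2):
        # Duplicate if both entries carry the same 'ref' meta key.
        return ("ref" in entry1["meta"] and "ref" in entry2["meta"]
                and entry1["meta"]["ref"] == entry2["meta"]["ref"])

    @staticmethod
    def looks_similar(entry1, entry2):
        # Stand-in heuristic: same date and amount.
        return (entry1["date"] == entry2["date"]
                and entry1["amount"] == entry2["amount"])

cmp = CombinedComparator()
a = {"date": "2020-01-02", "amount": "4.50", "meta": {"ref": "A1"}}
b = {"date": "2020-01-05", "amount": "9.00", "meta": {"ref": "A1"}}
c = {"date": "2020-01-02", "amount": "4.50", "meta": {}}
# cmp(a, b) is True via the ref check; cmp(a, c) is True via similarity
```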