Comments (15)

tarioch commented on September 21, 2024

I have two use cases for this.

In my import config I do the following:

Duplicate detection using the default logic of beancount (compares things like dates, accounts, ...)

from myimporter import MyImporter
from smart_importer import apply_hooks
from smart_importer.detector import DuplicateDetector

# With no arguments, DuplicateDetector uses beancount's default comparison.
CONFIG = [
    apply_hooks(MyImporter(), [DuplicateDetector()]),
]

Duplicate detection using my own logic (using a reference number I store in the metadata)

from myimporter import MyImporter
from smart_importer import apply_hooks
from smart_importer.detector import DuplicateDetector

class ReferenceDuplicatesComparator:
    """Treat two entries as duplicates when both carry the same 'ref' meta value."""

    def __call__(self, entry1, entry2):
        return 'ref' in entry1.meta and 'ref' in entry2.meta and entry1.meta['ref'] == entry2.meta['ref']

# Pass an instance, not the class: the comparator is called as comparator(entry1, entry2).
CONFIG = [
    apply_hooks(MyImporter(), [DuplicateDetector(comparator=ReferenceDuplicatesComparator())]),
]

from smart_importer.

johannesjh commented on September 21, 2024

I don't think I understand the problem. Can you elaborate, please?

From what I remember, I think that fava does (or did in the past?) include duplicate checks: I remember duplicates being shown in fava's import GUI in a pale green color, with radio buttons on the right-hand side to manually set the status (duplicate or not). I think the duplicate check worked with and without smart_importers.

Did fava maybe include bean-extract's duplicate checking functionality in order to expose it in the import GUI in a usable way? I am just guessing here, but it may be worth investigation prior to creating a new solution.

tarioch commented on September 21, 2024

If I add the duplicate meta, the entry shows up as a duplicate, but the default algorithm that bean-extract uses (basically, detecting similar transactions) is not applied. What I did now was create a decorator that does exactly this: it runs that algorithm and adds the duplicate meta. @yagebu, maybe you have some better insights.

tarioch commented on September 21, 2024

See my changes here https://github.com/beancount/smart_importer/compare/feature/duplicates which make this work very nicely with fava.
I'm using the default algorithm in most cases and a different comparator (based on a reference id) in another case and both work perfectly fine with fava now.

tarioch commented on September 21, 2024

Any further thoughts about this? I'm currently using this and it works really well with fava. If you think it's not useful for anyone else, I'll move it to my personal repo.

mondjef commented on September 21, 2024

Could this same approach be used to provide fava with the source metadata key?

tarioch commented on September 21, 2024

Normally I set the source metadata directly in the importer, as this is custom to each importer.

johannesjh commented on September 21, 2024

I looked up Fava's source code and found what I suspected already: Fava already includes duplicate detection functionality. See the following source files:

In conclusion: Since fava and beancount come with duplicate detection out-of-the-box, I don't quite understand why the DuplicateDetector would be needed.

PS, @tarioch :
Does your DuplicateDetector decorator cover use cases that are not covered by beancount's default duplicate detection mechanism? Is this why you prefer the custom decorator over the default solution?

tarioch commented on September 21, 2024

Actually, extract_from_file does not call the duplicate detection logic; only extract does (and, as you wrote, fava calls extract_from_file, not extract). So that is what the DuplicateDetector does: it calls that logic and then sets the __duplicate__ metadata so the entry is displayed correctly in fava.

What the DuplicateDetector adds on top is the ability to customize the comparison function (the DuplicateDetector is only a simple wrapper that instantiates and configures the core beancount logic).
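
For readers who want to see the idea in isolation, here is a minimal sketch of "compare, then set the __duplicate__ meta". It is not the actual smart_importer or beancount code: the Entry tuple, the mark_duplicates function, and the window_days parameter are hypothetical stand-ins chosen for illustration, and real entries are beancount directives.

```python
import datetime
from collections import namedtuple

# Hypothetical stand-in for a beancount entry; real entries have more fields.
Entry = namedtuple("Entry", "date meta")

DUPLICATE_META = "__duplicate__"


def mark_duplicates(new_entries, existing_entries, comparator, window_days=2):
    """Return copies of new_entries, flagging those that match an existing entry.

    An entry is flagged when the comparator accepts it against any existing
    entry whose date lies within window_days (an assumed parameter).
    """
    window = datetime.timedelta(days=window_days)
    marked = []
    for entry in new_entries:
        is_duplicate = any(
            abs(entry.date - existing.date) <= window and comparator(entry, existing)
            for existing in existing_entries
        )
        if is_duplicate:
            # Copy the meta dict so the original entry is left untouched.
            entry = entry._replace(meta={**entry.meta, DUPLICATE_META: True})
        marked.append(entry)
    return marked


def same_ref(entry1, entry2):
    """Example comparator: both entries carry the same 'ref' meta value."""
    return "ref" in entry1.meta and entry1.meta.get("ref") == entry2.meta.get("ref")


existing = [Entry(datetime.date(2024, 1, 4), {"ref": "A1"})]
imported = [
    Entry(datetime.date(2024, 1, 5), {"ref": "A1"}),
    Entry(datetime.date(2024, 1, 9), {"ref": "B2"}),
]
marked = mark_duplicates(imported, existing, same_ref)
print([DUPLICATE_META in entry.meta for entry in marked])  # prints [True, False]
```

Swapping the comparator argument is the only change needed to go from one matching rule to another, which is the configurability described above.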

yagebu commented on September 21, 2024

@tarioch: Sorry for not commenting earlier. At first glance I like it. When restructuring the decorators, I was hoping that it would enable this kind of application.

Maybe the class could already take arguments to allow for some basic cases of "custom" duplicate detection (i.e. by a meta key). Right now it just provides the same duplicate detection that Beancount provides, right?

tarioch commented on September 21, 2024

Yes, it's just reusing the logic from Beancount. The good thing is that this logic was already made configurable, so for the comparison using the key, my implementation is simply:

class ReferenceDuplicatesComparator:
    def __call__(self, entry1, entry2):
        return 'ref' in entry1.meta and 'ref' in entry2.meta and entry1.meta['ref'] == entry2.meta['ref']

MyImporter = DuplicateDetector(comparator=ReferenceDuplicatesComparator())(OriginalImporter)
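
The suggestion above of a class that takes the meta key as an argument could look roughly like the following. This is only a sketch: MetaKeyComparator is a hypothetical name, not something smart_importer or beancount provides, and SimpleNamespace objects stand in for real entries.

```python
from types import SimpleNamespace


class MetaKeyComparator:
    """Hypothetical comparator: entries match when they share a meta value.

    The meta key is configurable, so the same class covers 'ref' or any
    other identifier an importer stores in the entry metadata.
    """

    def __init__(self, key):
        self.key = key

    def __call__(self, entry1, entry2):
        value1 = entry1.meta.get(self.key)
        value2 = entry2.meta.get(self.key)
        # A missing key (or an explicit None) never counts as a match.
        return value1 is not None and value1 == value2


by_ref = MetaKeyComparator("ref")
first = SimpleNamespace(meta={"ref": "TX-1"})
second = SimpleNamespace(meta={"ref": "TX-1"})
third = SimpleNamespace(meta={})
print(by_ref(first, second), by_ref(first, third))  # prints True False
```

An instance such as MetaKeyComparator("ref") would be passed wherever ReferenceDuplicatesComparator() is used in this thread.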

johannesjh commented on September 21, 2024

OK, I suggest we merge this. It adds extra configurability and showcases another use case for the decorators. Thank you, @tarioch.

Ramblurr commented on September 21, 2024

It would be great to see an example of how to use this.

I have a bog standard importer and want to add duplicate check functionality when importing through fava.

Ramblurr commented on September 21, 2024

Thanks @tarioch, comparing ref metadata is a great idea.

If I want to use both ReferenceDuplicatesComparator and the default behavior of DuplicateDetector, should I pass two instances of DuplicateDetector?
That is, something like:

from myimporter import MyImporter
from smart_importer import apply_hooks
from smart_importer.detector import DuplicateDetector

class ReferenceDuplicatesComparator:
    """Treat two entries as duplicates when both carry the same 'ref' meta value."""

    def __call__(self, entry1, entry2):
        return 'ref' in entry1.meta and 'ref' in entry2.meta and entry1.meta['ref'] == entry2.meta['ref']

# Two hooks: the default comparison first, then the reference-based one.
# The comparator is passed as an instance, not as the class.
CONFIG = [
    apply_hooks(MyImporter(), [DuplicateDetector(), DuplicateDetector(comparator=ReferenceDuplicatesComparator())]),
]

tarioch commented on September 21, 2024

Yes, that will work. In the end, all the duplicate detector does is set a special metadata key called '__duplicate__' to True. This key is read both by fava and by the standard beancount utils.
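
A possible alternative to chaining two detectors is to combine several comparators into one and pass that to a single detector. The sketch below assumes a hypothetical AnyComparator helper (not part of smart_importer or beancount) and uses SimpleNamespace objects and a same_date function as stand-ins for real entries and for the default comparison.

```python
import datetime
from types import SimpleNamespace


class AnyComparator:
    """Hypothetical helper: entries match if any wrapped comparator says so."""

    def __init__(self, *comparators):
        self.comparators = comparators

    def __call__(self, entry1, entry2):
        return any(compare(entry1, entry2) for compare in self.comparators)


def same_ref(entry1, entry2):
    """Match on a shared 'ref' meta value."""
    return "ref" in entry1.meta and entry1.meta.get("ref") == entry2.meta.get("ref")


def same_date(entry1, entry2):
    """Illustrative stand-in for a second rule; not beancount's default logic."""
    return entry1.date == entry2.date


either = AnyComparator(same_ref, same_date)
day = datetime.date(2024, 1, 5)
a = SimpleNamespace(date=day, meta={"ref": "A1"})
b = SimpleNamespace(date=datetime.date(2024, 2, 1), meta={"ref": "A1"})
c = SimpleNamespace(date=day, meta={})
d = SimpleNamespace(date=datetime.date(2024, 3, 1), meta={"ref": "Z9"})
print(either(a, b), either(a, c), either(a, d))  # prints True True False
```

Both approaches flag an entry when at least one rule matches; the combined comparator simply does so in a single pass.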
