Comments (15)
I have two use cases where I'm using this. In my import config I do the following.

Duplicate detection using the default logic of beancount (which compares things like dates, accounts, ...):
from myimporter import MyImporter
from smart_importer import apply_hooks, PredictPostings
from smart_importer.detector import DuplicateDetector
CONFIG = [
apply_hooks(MyImporter(), [DuplicateDetector()]),
]
Duplicate detection using my own logic (based on a reference number I store in the metadata):
from myimporter import MyImporter
from smart_importer import apply_hooks, PredictPostings
from smart_importer.detector import DuplicateDetector
class ReferenceDuplicatesComparator:
def __call__(self, entry1, entry2):
return 'ref' in entry1.meta and 'ref' in entry2.meta and entry1.meta['ref'] == entry2.meta['ref']
CONFIG = [
apply_hooks(MyImporter(), [DuplicateDetector(comparator=ReferenceDuplicatesComparator())]),
]
from smart_importer.
I don't think I understand the problem; can you elaborate, please?
From what I remember, fava does (or did in the past?) include duplicate checks: I remember duplicates being shown in fava's import GUI in a pale green color, with radio buttons on the right-hand side for manually setting the status (duplicate or not). I think the duplicate check worked both with and without smart_importer.
Did fava maybe include bean-extract's duplicate-checking functionality in order to expose it in the import GUI in a usable way? I am just guessing here, but it may be worth investigating before creating a new solution.
If I add the duplicate meta key, the entry shows up as a duplicate, but the default algorithm that bean-extract uses (which basically detects similar transactions) is not in place. So what I did was create a decorator that does exactly this: it runs that detection and adds the duplicate meta key. @yagebu maybe you have some better insights.
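The decorator idea described above can be sketched roughly as follows. This is a hedged, self-contained illustration: mark_duplicates, default_comparator, and the plain-dict entries are stand-ins invented for this sketch, not smart_importer's or beancount's actual API.

```python
# Minimal sketch of duplicate marking: compare each newly extracted entry
# against the existing ledger entries and, on a match, set the
# __duplicate__ meta key that fava's import GUI looks for.
# Entries are stand-in dicts here; real code would use beancount directives.

def default_comparator(entry1, entry2):
    # Stand-in for a similarity heuristic: same date and narration.
    return (entry1["date"] == entry2["date"]
            and entry1["narration"] == entry2["narration"])

def mark_duplicates(new_entries, existing_entries, comparator=default_comparator):
    for entry in new_entries:
        if any(comparator(entry, existing) for existing in existing_entries):
            entry["meta"]["__duplicate__"] = True
    return new_entries

existing = [{"date": "2020-01-02", "narration": "Coffee", "meta": {}}]
new = [
    {"date": "2020-01-02", "narration": "Coffee", "meta": {}},
    {"date": "2020-01-03", "narration": "Lunch", "meta": {}},
]
marked = mark_duplicates(new, existing)
# marked[0]["meta"] now carries __duplicate__: True; marked[1]["meta"] stays empty
```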
See my changes here https://github.com/beancount/smart_importer/compare/feature/duplicates which make this work very nicely with fava.
I'm using the default algorithm in most cases and a different comparator (based on a reference id) in another case and both work perfectly fine with fava now.
Any further thoughts about this? I'm currently using it and it works really well with fava. If you think it's not useful for anyone else, I'll move it to my personal repo.
Could this same approach be used to provide fava with the source metadata key?
I normally set the source metadata directly in the importer, since it is custom to each importer.
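For illustration, attaching a source meta key inside a custom importer's extract() might look like the sketch below. The importer shape is an assumption for this example; a real beancount importer would implement the ImporterProtocol and return beancount directives rather than dicts.

```python
# Sketch of setting a "source" meta key inside an importer's extract()
# method. Rows are stubbed; a real importer would parse them from the file.

class MyImporter:
    def extract(self, filename):
        rows = [{"date": "2020-01-02", "narration": "Coffee"}]  # stubbed parse
        entries = []
        for row in rows:
            entry = dict(row)
            # Record where this entry came from, custom to this importer.
            entry["meta"] = {"source": filename}
            entries.append(entry)
        return entries

entries = MyImporter().extract("bank.csv")
# entries[0]["meta"]["source"] == "bank.csv"
```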
I looked up Fava's source code and found what I suspected already: Fava already includes duplicate-detection functionality. See the following source files:
- https://github.com/beancount/fava/blob/master/fava/core/ingest.py#L83 calls beancount's extract_from_file method. In the method call, fava passes existing entries to allow for duplicate detection.
- https://github.com/beancount/beancount/blob/master/beancount/ingest/extract.py#L46 beancount's extract_from_file method does indeed check for duplicates in existing entries.
- https://github.com/beancount/fava/blob/master/fava/templates/extract.html#L21 fava's GUI displays whether an entry is a (likely) duplicate, based on the __duplicate__ metadata.
In conclusion: Since fava and beancount come with duplicate detection out-of-the-box, I don't quite understand why the DuplicateDetector would be needed.
PS, @tarioch: Does your DuplicateDetector decorator cover use cases that are not covered by beancount's default duplicate-detection mechanism? Is this why you prefer the custom decorator over the default solution?
Actually, extract_from_file does not call the duplicate-detection logic; only extract does (and, as you wrote, fava calls extract_from_file, not extract). So that's what the DuplicateDetector does: it calls that logic and then sets the __duplicate__ metadata so it will be displayed correctly in fava.
The other thing the DuplicateDetector supports is customizing the comparison function (the DuplicateDetector is only a simple wrapper that instantiates/configures the core beancount logic).
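The "configurable comparison function" idea can be sketched as follows: the scan stays fixed while the pairwise comparator is swappable. The find_duplicates helper, the two-day window, and the dict-based entries are assumptions made for this sketch, not beancount's real implementation.

```python
# Sketch of a windowed duplicate scan with a pluggable comparator.
import datetime

def ref_comparator(entry1, entry2):
    # Treat two entries as duplicates when both carry the same 'ref' meta key.
    return ("ref" in entry1["meta"] and "ref" in entry2["meta"]
            and entry1["meta"]["ref"] == entry2["meta"]["ref"])

def find_duplicates(new_entries, existing_entries, comparator, window_days=2):
    window = datetime.timedelta(days=window_days)
    pairs = []
    for new in new_entries:
        for old in existing_entries:
            # Only compare entries whose dates are close, to keep the
            # pairwise comparison cheap.
            if abs(new["date"] - old["date"]) <= window and comparator(new, old):
                pairs.append((new, old))
    return pairs

existing = [{"date": datetime.date(2020, 1, 2), "meta": {"ref": "A1"}}]
new = [{"date": datetime.date(2020, 1, 3), "meta": {"ref": "A1"}},
       {"date": datetime.date(2020, 1, 3), "meta": {"ref": "B2"}}]
dupes = find_duplicates(new, existing, ref_comparator)
# dupes contains exactly the one A1/A1 pair
```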
@tarioch: Sorry for not commenting earlier. On first glance I like it: when restructuring the decorators, I was hoping they would enable this kind of application.
Maybe the class could already take arguments to allow for some basic cases of "custom" duplicate detection (e.g. by a meta key). Right now it just provides the same duplicate detection that beancount provides, right?
Yes, it's just reusing the logic from beancount. The good thing is that this logic was already made configurable. So for the comparison using the key, my implementation is simply:
class ReferenceDuplicatesComparator:
def __call__(self, entry1, entry2):
return 'ref' in entry1.meta and 'ref' in entry2.meta and entry1.meta['ref'] == entry2.meta['ref']
MyImporter = DuplicateDetector(comparator=ReferenceDuplicatesComparator())(OriginalImporter)
OK, I suggest we merge this. It adds extra configurability and showcases another use case for the decorators. Thank you @tarioch!
It would be great to see an example of how to use this.
I have a bog-standard importer and want to add duplicate-check functionality when importing through fava.
Thanks @tarioch, comparing ref metadata is a great idea.
If I want to use both ReferenceDuplicatesComparator and the default behavior of DuplicateDetector, should I pass two instances of DuplicateDetector? That is, something like:
from myimporter import MyImporter
from smart_importer import apply_hooks, PredictPostings
from smart_importer.detector import DuplicateDetector
class ReferenceDuplicatesComparator:
def __call__(self, entry1, entry2):
return 'ref' in entry1.meta and 'ref' in entry2.meta and entry1.meta['ref'] == entry2.meta['ref']
CONFIG = [
apply_hooks(MyImporter(), [DuplicateDetector(), DuplicateDetector(comparator=ReferenceDuplicatesComparator())]),
]
Yes, that will work. In the end, all the duplicate detector does is set a special metadata key called __duplicate__ to True, which is read both by fava and by the standard beancount utils.
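As an alternative to stacking two DuplicateDetector hooks, a single comparator could OR both checks. This is a sketch only: the looks_similar heuristic below is a stand-in for beancount's actual similarity logic, and the dict entries are illustrative.

```python
# One comparator that combines a ref-based check with a similarity-style
# check, so a single DuplicateDetector-like hook could cover both cases.

class CombinedComparator:
    def __call__(self, entry1, entry2):
        return self.same_ref(entry1, entry2) or self.looks_similar(entry1, entry2)

    @staticmethod
    def same_ref(entry1, entry2):
        # Duplicate if both entries carry the same 'ref' meta key.
        return ("ref" in entry1["meta"] and "ref" in entry2["meta"]
                and entry1["meta"]["ref"] == entry2["meta"]["ref"])

    @staticmethod
    def looks_similar(entry1, entry2):
        # Stand-in heuristic: same date and amount.
        return (entry1["date"] == entry2["date"]
                and entry1["amount"] == entry2["amount"])

cmp = CombinedComparator()
a = {"date": "2020-01-02", "amount": "4.50", "meta": {"ref": "A1"}}
b = {"date": "2020-01-05", "amount": "9.00", "meta": {"ref": "A1"}}
c = {"date": "2020-01-02", "amount": "4.50", "meta": {}}
# cmp(a, b) is True via the ref check; cmp(a, c) is True via similarity
```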