Comments (7)
Great idea, I second you on this. Here are a few points from our IRL discussion:
Since most end users would only use the existing attributes (negated, value, ...) and not write new ones, we should keep a simple notation under the base _ getter, as is currently the case, while limiting the number of attributes. Therefore, as suggested, we could use span._.value for dates, measures, scores, concepts, ..., and span._.negation, span._.rspeech, etc. for more syntax-based attributes.
As the norm attribute is now heavily used throughout the lib as a depolluted ASCII version of the text, changing it would probably mean refactoring most of the code. To generate semantically normalized text, we can use the __str__ and __repr__ methods on the generic value attribute.
Following your dates revamp, the _.value extension could inherit from a pydantic.BaseModel, like:
    from pydantic import BaseModel

    # Common base class for all normalized values
    class EDSValue(BaseModel):
        def __str__(self): ...
        def __repr__(self): ...
        ...

    class Date(EDSValue):
        ...

    class Measure(EDSValue):
        ...

    class Concept(EDSValue):
        ...
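To make the idea of a value whose string form is the normalized text more concrete, here is a minimal sketch. It is not the library's implementation: standard-library dataclasses stand in for pydantic.BaseModel so the example has no dependencies, and the normalized() method as well as the fields of Date and Measure are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EDSValue:
    """Base class for normalized values (stand-in for a pydantic model)."""

    def normalized(self) -> str:
        # Hypothetical hook: each subclass returns its normalized text
        raise NotImplementedError

    def __str__(self) -> str:
        # str(value) yields the semantically normalized text
        return self.normalized()


@dataclass(frozen=True)
class Date(EDSValue):
    # Assumed fields, for illustration only
    year: int
    month: int
    day: int

    def normalized(self) -> str:
        return f"{self.year:04d}-{self.month:02d}-{self.day:02d}"


@dataclass(frozen=True)
class Measure(EDSValue):
    # Assumed fields, for illustration only
    value: float
    unit: str

    def normalized(self) -> str:
        return f"{self.value:g} {self.unit}"


date_text = str(Date(2023, 1, 5))
measure_text = str(Measure(72.5, "kg"))
```

With this layout the norm attribute stays untouched, and any consumer that needs normalized text only has to call str() on the value.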
from edsnlp.
@percevalw, @Thomzoy, @Aremaki, I'd love to get your thoughts on this!
Sounds good! 🎉
As discussed, here is a potential solution that standardizes the current architecture for custom extensions @Thomzoy @aricohen93
Each component can create a Span extension named after the label of the entities it creates:
- eds.adicap creates entities labeled adicap, and adds an ent._.adicap extension containing the decoding information
- eds.tnm creates entities labeled tnm, and adds an ent._.tnm extension
- eds.drugs creates entities labeled drug, and adds an ent._.drug extension
- ...
A specific ._.value extension is defined as an aggregator and retrieves the field associated with the label via a getter such that ent._.value == getattr(ent._, ent.label_). The str representation of ._.value could be the one displayed in the demonstrator.
This way, we can keep a consistent typing of each extension (tnm -> TNMScore, adicap -> AdicapCode, date -> Date, ...), while offering a unique entry point for some use cases via the value extension.
This does not prevent defining other extensions if needed, or keeping the old entity extensions and deprecating them in future versions.
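The aggregator getter described above could look roughly like the sketch below. The function only relies on the label_ attribute and on the has / get methods of span._, so with spaCy it could presumably be registered through Span.set_extension("value", getter=value_getter). To keep the example dependency-free, FakeSpan and FakeUnderscore are hypothetical stand-ins that mimic that interface; they are not part of spaCy or of the library, and the "T2N0M0" value is purely illustrative.

```python
from typing import Any, Dict


def value_getter(span: Any) -> Any:
    """Return the extension named after the span's label, or None if absent."""
    label = span.label_
    # Check registration first instead of assuming get() accepts a default
    return span._.get(label) if span._.has(label) else None


class FakeUnderscore:
    """Hypothetical stand-in for the has/get interface of `span._`."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = data

    def has(self, name: str) -> bool:
        return name in self.data

    def get(self, name: str) -> Any:
        return self.data[name]


class FakeSpan:
    """Hypothetical stand-in for a labeled span carrying extensions."""

    def __init__(self, label_: str, **extensions: Any) -> None:
        self.label_ = label_
        self._ = FakeUnderscore(dict(extensions))


tnm_ent = FakeSpan("tnm", tnm="T2N0M0")  # illustrative value
other_ent = FakeSpan("section")          # no extension named after its label

tnm_value = value_getter(tnm_ent)
other_value = value_getter(other_ent)
```

An entity whose label has no matching extension simply yields None, so downstream code can test the value without knowing which component produced the entity.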
Following the discussion with @Thomzoy, we carry on with the approach commented above:
- each pipe defines the extensions it needs (negation, scores, etc.)
- the extensions related to a normalized value should be named with the label_ of the entity extracted (if any)
- the value extension is defined as the following getter: lambda span: span._.get(span.label_, default=None). Having multiple extensions and an aggregator extension allows multiple pipes to modify a single entity, and to prioritize the normalized value of the entity by setting its label (for instance, to choose between the extraction of eds.drugs, label = drug, and that of eds.umls, label = umls) without losing information between pipes
- the normalized extension can be anything: an int, a bool, a string, an object, depending on the complexity of the extraction, and should implement the equality operator such that any span1._.value == span2._.value test runs
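The two last points (prioritizing by label without losing information, and comparable values) can be illustrated with a simplified model. This is only a sketch under stated assumptions: Entity is a hypothetical plain-Python model of an entity, not a spaCy Span, and the CUI, semantic type and drug name are placeholders rather than real data. UMLSConcept is written as a frozen dataclass so that equality comes for free.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class UMLSConcept:
    id: str   # the CUI (placeholder below)
    sty: str  # the semantic type (placeholder below)


@dataclass
class Entity:
    """Simplified model: a label plus named extensions (not a spaCy Span)."""

    label_: str
    extensions: Dict[str, Any] = field(default_factory=dict)

    @property
    def value(self) -> Any:
        # Aggregator: pick the extension named after the current label
        return self.extensions.get(self.label_)


ent = Entity(label_="drug")
ent.extensions["drug"] = "paracetamol"  # set by a first pipe
ent.extensions["umls"] = UMLSConcept(id="CUI_PLACEHOLDER", sty="STY_PLACEHOLDER")  # set by a second pipe

value_as_drug = ent.value  # the label is "drug", so the drug extension wins

ent.label_ = "umls"        # prioritize the UMLS normalization instead
value_as_umls = ent.value

# Both extensions are still present: switching the label lost nothing
still_has_drug = ent.extensions["drug"]
```

Because the values are either builtins or objects with a well-defined equality, a comparison such as value_as_umls == UMLSConcept(id="CUI_PLACEHOLDER", sty="STY_PLACEHOLDER") runs and returns a meaningful result.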
For instance, the following (non-exhaustive) modifications should be made:
- dates: the dates will be labelled as date, to match the date extension (instead of absolute / relative, since this info is already stored in the span._.date object)
- measurements: the label of the extracted measurements becomes measurement, and the previous label (e.g. eds.weight) is added to the normalized span._.measurement object
- consultation_dates: the label of the consultation_dates spans will become consultation_date and the consultation_date extension will be the extracted date
- tables: labelled as table, and span._.table is a getter to span._.to_pd_table(as_values=True)
- umls: labelled as umls (this is already the case) and change span._.umls to a new UMLSConcept(id=the cui, sty=the semantic type) object
...
These suggestions have been integrated in #213