pilarhidalgo / find_span

This tool groups a set of words (patterns) assigned to a SPAN, or assignable category. It has multiple uses, from identifying synonyms and correcting spelling to tagging words with categories such as Person, Organization, or Place. All you need is a dictionary from which to generate the SPANs.
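As a rough illustration of the dictionary this description refers to, the sketch below turns spreadsheet-like rows into the list of {"label": ..., "pattern": ...} records that spaCy's EntityRuler accepts. The column names label, pattern and pattern.1 come from the code draft in the issue further down; the sample rows and the helper name rows_to_patterns are made up for illustration, plain Python lists stand in for the Excel sheet, and skipping empty cells is an assumption rather than something the draft does.

```python
# Hypothetical sample rows; in the real tool these come from an Excel sheet.
rows = [
    {"label": "ORGANIZACION", "pattern": "banco central", "pattern.1": "bcb"},
    {"label": "LUGAR", "pattern": "la paz", "pattern.1": "lpz"},
]


def rows_to_patterns(rows, pattern_columns=("pattern", "pattern.1")):
    """Flatten each pattern column into one {"label", "pattern"} record per cell."""
    patterns = []
    for column in pattern_columns:
        for row in rows:
            value = row.get(column)
            if value:  # assumption: empty cells are skipped
                patterns.append({"label": row["label"], "pattern": value})
    return patterns


patterns = rows_to_patterns(rows)
for p in patterns:
    print(p)
```

Every alternative spelling of a term ends up as its own record under the same label, which is how one SPAN can cover several patterns.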

License: GNU General Public License v3.0

Jupyter Notebook 19.74% Python 80.26%

find_span's People

Contributors

pilarhidalgo

find_span's Issues

Avoid reading patterns and creating nlp model on each call using OOP

There is an opportunity for a performance boost (~500x faster) by changing from a functional design to an OOP one.
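The reasoning behind this claim is that the expensive setup (reading the patterns and building the nlp pipeline) moves out of the per-sentence call and runs only once. The sketch below illustrates just that pattern with a stand-in setup function. The names build_matcher, find_functional and Finder are hypothetical, a compiled regular expression substitutes for the spaCy pipeline, and the counter only shows how often setup runs. It does not attempt to reproduce the ~500x figure.

```python
import re

BUILD_CALLS = 0  # counts how many times the expensive setup runs


def build_matcher(words):
    """Stand-in for the costly setup (reading patterns, building the pipeline)."""
    global BUILD_CALLS
    BUILD_CALLS += 1
    return re.compile("|".join(re.escape(w) for w in words))


WORDS = ["banco central", "la paz"]


def find_functional(sentence):
    # Functional style: the setup is repeated on every call.
    return build_matcher(WORDS).findall(sentence)


class Finder:
    def __init__(self, words):
        # Object-oriented style: the setup runs once, at construction time.
        self.matcher = build_matcher(words)

    def find(self, sentence):
        return self.matcher.findall(sentence)


sentences = ["vive en la paz", "trabaja en el banco central"] * 50

for s in sentences:
    find_functional(s)
functional_builds = BUILD_CALLS

BUILD_CALLS = 0
finder = Finder(WORDS)
for s in sentences:
    finder.find(s)
oop_builds = BUILD_CALLS

print(functional_builds, oop_builds)
```

The functional version pays the setup cost once per sentence, while the class pays it once per instance; the actual speedup depends on how expensive the real setup is relative to a single lookup.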


This is a code draft for the change:

import pandas as pd
from spacy.lang.es import Spanish
from spacy.pipeline import EntityRuler


class SpanFinder:
    def __init__(self, io, sheet_name):
        self.io = io
        self.sheet_name = sheet_name
        self.patterns = self.read_patterns()
        self.nlp = self.set_nlp()
        self.trans = self.make_trans()
        
    def read_patterns(self):
        # A single Excel file can hold several sheets, each with its own dictionary
        df = pd.read_excel(self.io, sheet_name=self.sheet_name)
        # Build the records in the structure spaCy requires
        df1 = df[['label', 'pattern']]
        df2 = df[['label', 'pattern.1']]
        df2 = df2.rename(columns={'pattern.1': 'pattern'})
        # If a SPAN has more than two pattern columns, build one more list
        # per extra column and add it to the concatenation below
        patterns1 = df1.to_dict(orient='records')
        patterns2 = df2.to_dict(orient='records')
        # patternsn = dfn.to_dict(orient='records')
        patterns = patterns1 + patterns2
        return patterns
    
    def set_nlp(self):
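        # Create the NLP object (spaCy v2 API; in spaCy v3, add_pipe takes the
        # component name instead: ruler = nlp.add_pipe("entity_ruler"))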
        nlp = Spanish()
        ruler = EntityRuler(nlp)
        ruler.add_patterns(self.patterns)
        nlp.add_pipe(ruler)
        return nlp
    
    def make_trans(self):
        # Translation table used to preprocess the input: strips accents
        a, b = 'áéíóúü', 'aeiouu'
        trans = str.maketrans(a, b)
        return trans
    
    def find_span(self, sentence):
        # Normalize: lowercase and strip accents
        sentence = sentence.lower().translate(self.trans)
        # Placeholder values carry no information. The comparison runs after
        # lowercasing, so only lowercase forms are listed. Returning a plain
        # string here avoids the quoting problem.
        if sentence.strip() in ('', 'por definir', 'no hay datos', 'nan'):
            return 'No encontrado'
        doc = self.nlp(sentence)
        # Unique (text, label) pairs for the entities found
        label = list(set((ent.text, ent.label_) for ent in doc.ents))
        # No entity matched
        if not label:
            return 'No encontrado'
        # With several matches, which pair comes first is arbitrary (set order)
        return label[0]
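
The normalization step used by make_trans and find_span can be tried on its own with the standard library. The sketch below reuses the same translation table as the draft; the normalize helper and the sample strings are illustrative only. Characters outside the table, such as ñ, pass through unchanged, so dictionary patterns would presumably need to be stored in lowercase and without accents for matches to succeed.

```python
# Same accent-stripping table as in the draft's make_trans().
TRANS = str.maketrans('áéíóúü', 'aeiouu')


def normalize(sentence):
    """Lowercase first, then map accented vowels to their plain forms."""
    return sentence.lower().translate(TRANS)


samples = ['Organización', 'PINGÜINO', 'Señor Pérez']
normalized = [normalize(s) for s in samples]
print(normalized)
```

Lowercasing has to come before the translation because the table only lists lowercase accented vowels.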
