
Comments (2)

uschtwill commented on July 30, 2024

@isensee-bastian I've been thinking about this, and it could really be the killer feature of the kraken. I think many people must be just as annoyed as we are with all the duplicates on these platforms. A way to cut through that noise would probably add significant value and drive some adoption.

I've also been thinking about how to do it...

  • Given that we do have a real DB (#8)
  • We could actually visit the listing's URL, which so far we only copy (the 'href')
  • We then copy the listing's text (introducing a new per-strategy selector)
  • Load all listings from the last two days into memory (from the database)
  • Calculate text similarity values for the listing at hand versus all the listings in memory
  • If none of the similarity values is higher than the threshold, write it to the DB as a new entry
  • If one passes the threshold, and the listing is thus identified as a duplicate, append it to an "also-seen-on" list/array field on the already existing entry
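
To make the steps above concrete, here is a minimal sketch in Python. It is only an illustration under assumptions, not the kraken's actual code: the `Listing` shape, the `upsert_listing` helper, the in-memory list standing in for the database, the 0.9 threshold and the use of `difflib.SequenceMatcher` as the similarity measure are all hypothetical choices.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9      # assumed value; would need tuning on real listings
LOOKBACK = timedelta(days=2)    # "listings of the last two days"


@dataclass
class Listing:
    """Hypothetical DB record for one listing."""
    url: str
    text: str
    seen_at: datetime
    also_seen_on: list[str] = field(default_factory=list)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def upsert_listing(db: list[Listing], url: str, text: str, now: datetime) -> Listing:
    """Store a new listing, or record its URL on an existing near-duplicate."""
    # Only compare against listings seen within the lookback window.
    recent = [entry for entry in db if now - entry.seen_at <= LOOKBACK]
    best = max(recent, key=lambda entry: similarity(entry.text, text), default=None)

    if best is not None and similarity(best.text, text) >= SIMILARITY_THRESHOLD:
        # Same listing posted elsewhere: keep the extra URL instead of dropping it.
        if url != best.url and url not in best.also_seen_on:
            best.also_seen_on.append(url)
        return best

    # Nothing similar enough: write it as a new entry.
    new_entry = Listing(url=url, text=text, seen_at=now)
    db.append(new_entry)
    return new_entry
```

Comparing every new listing against all recent ones is quadratic in the worst case, which should be fine for two days of listings. A real implementation might prefer a token-based measure over character-level matching if listing texts get long.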

This would be nice for the board notifiers/UIs, because you get a list of the sites where this listing has been posted. You can then go and apply on the site you like most (e.g. freelance.de instead of freelancermap.de). If we just drop duplicates silently, we lose the value of this information.

On the other hand, this is something that only works for the board UIs. We'd have to see how we can wring the most value out of this for notifiers like Telegram or Slack.

from re-employment-kraken.

isensee-bastian commented on July 30, 2024

@uschtwill I totally agree. Manually identifying duplicates requires time and mental energy that could be spent in a better way.

The steps you listed make total sense. I think it is a good blueprint to follow for implementation.

About the tracking of duplicates: You are right about the value of storing multiple URLs per project. It helps to have a choice of where to apply. Moreover, the number of duplicates gives a rough indication of how high or low the chances of winning a project are.
True, we need a different concept for notifying about duplicates in messengers vs. board apps.

