simhash's People

Contributors

mfonda, yukinying

simhash's Issues

Calculate percentage similar

How can I convert the distance into a percentage of similarity? For instance, given this example:

	a := []byte("this is a test for results")
	aHash := simhash.Simhash(simhash.NewWordFeatureSet(a))

	b := []byte("this is a test for cats")
	bHash := simhash.Simhash(simhash.NewWordFeatureSet(b))

	c := simhash.Compare(aHash, bHash)
	fmt.Println(c)

I get an output of 7, but I would like to see that these are, say, 90% similar (or whatever the exact figure is). Thank you.
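One common convention, not something this issue confirms as the library's own definition, is to treat the distance as the number of differing bits in a 64-bit fingerprint and report the share of matching bits. The sketch below assumes that interpretation; `hammingDistance` and `similarityPercent` are hypothetical helper names, not part of the simhash package.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hashBits is the fingerprint width assumed here (64-bit hashes).
const hashBits = 64

// hammingDistance counts the bit positions in which a and b differ.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// similarityPercent maps a bit distance in [0, 64] to a percentage:
// 0 differing bits gives 100, 64 differing bits gives 0.
func similarityPercent(distance int) float64 {
	return float64(hashBits-distance) / hashBits * 100
}

func main() {
	// A distance of 7, as in the example above, leaves 57 of 64 bits equal.
	fmt.Printf("%.2f%%\n", similarityPercent(7)) // prints 89.06%
}
```

Under this assumption a distance of 7 corresponds to roughly 89% similarity. The figure measures agreement between fingerprints, so it is only a proxy for how similar the underlying texts are.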

The results are not accurate

Source strings:

at.nk.tools.iTranslate com.UCMobile.intl com.alkitabku.android com.bca com.bercodingstudio.edukasianakcerdas com.cmcm.uangme com.commsource.beautyplus com.dealjava.android com.dwidasa.bjtm.mb.android com.engloryintertech.caping com.facebook.katana com.facebook.orca com.facemoji.lite.xiaomi com.fundevs.app.mediaconverter com.gojek.app com.grabtaxi.passenger com.instagram.android com.isaku.app com.lazada.android com.maxstream com.mediatek.webview com.mfashiongallery.emag com.mi.global.bbs com.microsoft.office.excel com.microsoft.office.outlook com.microsoft.office.powerpoint com.microsoft.office.word com.miui.android.fashiongallery com.miui.enbbs com.miui.virtualsim com.opera.mini.native com.oyo.consumer com.paypal.android.p2pmobile com.picmix.mobile com.skype.raider com.smartfren com.telkomsel.telkomselcm com.tencent.ibg.joox com.tiket.keretaapi com.tokopedia.tkpd com.uzmafariz.laguanakindonesiainggris com.vidio.android com.whatsapp com.whatsapp.wallpaper devian.tubemate.home dgeo.gizi.pro id.co.shopintar jp.naver.line.android me.msqrd.android ovo.id 0.0.1 1.0.14 1.1.4 1.2.4 1.2.6 1.4.2 1.4.4 1.5.9 1.54 1.7.1 1.8.3 12.9.5.1146 16.0.7830.1012 16.0.8201.1009 16.0.8201.1009 2 2.0.0 2.0.2 2.1.201 2.1.9.4 2.18.380 2.4.1 2.4.9 2.7.1 201.0.0.12.99 205.0.0.27.113 3.16 3.19.2 3.2.3 38.0.2254.134507 4.0.1 4.1.0 4.1.1 4.1.6-global 4.1.6.2 4.5.4 5.0.2 5.1.9 5.27.0 58.0.3029.125 (alps-mp-o1.mp6-1-arm) 6.0.3 6.23.1 7.0.030 7.18.0.507 7.3.1 7.8.2 79.0.0.21.101 8.19.2 M717052399-G-F V6-G-180731
co.modalinlah.md com.apelbox.jirakoyucme com.bukalapak.android com.cmcm.uangme com.dana.cepat.online com.easyuangpro.csdagowmla com.facebook.katana com.facebook.orca com.fintopia.idnEasycash.google com.gojek.app com.google.android.apps.youtube.mango com.google.android.instantapps.supervisor com.grabtaxi.passenger com.jetstartgames.chess com.kampungkredit com.king.candycrushsaga com.kreditqksp.kreditq com.ktakilat.loan com.lazada.android com.modalnasional.fintech com.ovexpn.ddlbyi.ijayaa com.paragisoft.gaple com.rioo.runnersubway com.spotify.music com.stanfordtek.pinjamduit com.tokopedia.tkpd com.traveloka.android com.tunaikita.cashloan com.whatsapp id.co.myhomecredit io.silvrr.installment jsifaxs.snuspohon.jshfpn oc.hbd.bqjp.detisolution phone.cleaner.speed.booster.cache.clean.android.master zid.cgaium.dzdl.kantongsukses 1.0.4 1.1.0 1.1.1 1.1.29 1.1.8 1.11.0 1.144.0.1 1.17.0 1.3 1.41.58 1.5.9 1.7.7 12.0.0 193.0.0.21.98 198.0.0.53.101 2.0.0 2.0.0 2.0.2 2.0.9 2.19.34 2.2.9 2.5.1 2.6.6 2.8.3 3.0.7 3.19 3.22.1 3.7.0 4.09-release-lmp-234792502 4.35.3 5.19.1 50.0.0 6.26.4 7.0.0 8.4.94.817

Hashes:

12638153115695167470
12638153115695167470

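The two hashes are identical, i.e. a 100% match, even though the source strings differ.

Also, is the similarity rate (64 - step) / 64, where step is the distance between the two hashes?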

Unable to calculate Chinese text

My test data:
text1: "没有做苦工的,那么学霸就来做苦工"
text2: "后脚龙族幻想恰烂钱"
Result:
simhash1: ffffffffffffffff
simhash2: ffffffffffffffff
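A plausible explanation, offered as an assumption rather than a confirmed diagnosis, is that the word tokenizer relies on the `\w` character class. In Go's `regexp` package `\w` matches only ASCII letters, digits, and underscore, so Chinese text would yield no features at all, and two empty feature sets would then produce the same fingerprint. The sketch below only demonstrates the tokenization difference; both patterns and the `countTokens` helper are hypothetical and not taken from the library.

```go
package main

import (
	"fmt"
	"regexp"
)

// asciiWords stands in for a tokenizer built on \w, which in Go's
// regexp syntax is ASCII-only: [0-9A-Za-z_].
var asciiWords = regexp.MustCompile(`[\w']+`)

// unicodeWords is a hypothetical alternative that accepts any Unicode
// letter or digit, so Han characters are matched.
var unicodeWords = regexp.MustCompile(`[\p{L}\p{N}']+`)

// countTokens returns how many non-overlapping matches re finds in text.
func countTokens(re *regexp.Regexp, text string) int {
	return len(re.FindAllString(text, -1))
}

func main() {
	text := "后脚龙族幻想恰烂钱"
	fmt.Println(countTokens(asciiWords, text))   // prints 0
	fmt.Println(countTokens(unicodeWords, text)) // prints 1
}
```

Even with the Unicode-aware pattern, unsegmented Chinese text collapses into one token per run of characters, which is a poor feature for simhash. Splitting into overlapping character n-grams, or running a word segmenter first, is a common way to obtain more useful features.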
