mfonda / simhash
Go implementation of the simhash algorithm
License: MIT License
Does it not support Chinese?
How can I take the distance and compute a percentage of similarity? For instance, given this example:
a := []byte("this is a test for results")
aHash := simhash.Simhash(simhash.NewWordFeatureSet(a))
b := []byte("this is a test for cats")
bHash := simhash.Simhash(simhash.NewWordFeatureSet(b))
c := simhash.Compare(aHash, bHash)
fmt.Println(c)
I get an output of 7. But I would like to see that these are 90% similar (or whatever the exact amount is). Thank you.
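One common way to turn the distance into a percentage is to treat it as the number of differing bits out of 64 and report the share of bits that agree. The sketch below assumes the value returned by simhash.Compare is the Hamming distance between two 64-bit hashes; the helper name similarity is hypothetical and not part of the library. Bit-level agreement is only a rough proxy for textual similarity, so read the figure as an indication and not as an exact measure.

```go
package main

import "fmt"

// similarity is a hypothetical helper (not part of the simhash package).
// It converts a Hamming distance between two 64-bit simhashes into the
// fraction of bit positions on which the two hashes agree.
func similarity(distance uint8) float64 {
	const bits = 64
	if distance > bits {
		distance = bits // clamp: a 64-bit hash cannot differ in more than 64 bits
	}
	return float64(bits-int(distance)) / bits
}

func main() {
	// A distance of 7, as in the example above, means 57 of 64 bits agree.
	fmt.Printf("%.2f%%\n", similarity(7)*100) // prints 89.06%
}
```

With the distance of 7 from the example, this gives 57/64, or about 89%.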
source str:
at.nk.tools.iTranslate com.UCMobile.intl com.alkitabku.android com.bca com.bercodingstudio.edukasianakcerdas com.cmcm.uangme com.commsource.beautyplus com.dealjava.android com.dwidasa.bjtm.mb.android com.engloryintertech.caping com.facebook.katana com.facebook.orca com.facemoji.lite.xiaomi com.fundevs.app.mediaconverter com.gojek.app com.grabtaxi.passenger com.instagram.android com.isaku.app com.lazada.android com.maxstream com.mediatek.webview com.mfashiongallery.emag com.mi.global.bbs com.microsoft.office.excel com.microsoft.office.outlook com.microsoft.office.powerpoint com.microsoft.office.word com.miui.android.fashiongallery com.miui.enbbs com.miui.virtualsim com.opera.mini.native com.oyo.consumer com.paypal.android.p2pmobile com.picmix.mobile com.skype.raider com.smartfren com.telkomsel.telkomselcm com.tencent.ibg.joox com.tiket.keretaapi com.tokopedia.tkpd com.uzmafariz.laguanakindonesiainggris com.vidio.android com.whatsapp com.whatsapp.wallpaper devian.tubemate.home dgeo.gizi.pro id.co.shopintar jp.naver.line.android me.msqrd.android ovo.id 0.0.1 1.0.14 1.1.4 1.2.4 1.2.6 1.4.2 1.4.4 1.5.9 1.54 1.7.1 1.8.3 12.9.5.1146 16.0.7830.1012 16.0.8201.1009 16.0.8201.1009 2 2.0.0 2.0.2 2.1.201 2.1.9.4 2.18.380 2.4.1 2.4.9 2.7.1 201.0.0.12.99 205.0.0.27.113 3.16 3.19.2 3.2.3 38.0.2254.134507 4.0.1 4.1.0 4.1.1 4.1.6-global 4.1.6.2 4.5.4 5.0.2 5.1.9 5.27.0 58.0.3029.125 (alps-mp-o1.mp6-1-arm) 6.0.3 6.23.1 7.0.030 7.18.0.507 7.3.1 7.8.2 79.0.0.21.101 8.19.2 M717052399-G-F V6-G-180731
co.modalinlah.md com.apelbox.jirakoyucme com.bukalapak.android com.cmcm.uangme com.dana.cepat.online com.easyuangpro.csdagowmla com.facebook.katana com.facebook.orca com.fintopia.idnEasycash.google com.gojek.app com.google.android.apps.youtube.mango com.google.android.instantapps.supervisor com.grabtaxi.passenger com.jetstartgames.chess com.kampungkredit com.king.candycrushsaga com.kreditqksp.kreditq com.ktakilat.loan com.lazada.android com.modalnasional.fintech com.ovexpn.ddlbyi.ijayaa com.paragisoft.gaple com.rioo.runnersubway com.spotify.music com.stanfordtek.pinjamduit com.tokopedia.tkpd com.traveloka.android com.tunaikita.cashloan com.whatsapp id.co.myhomecredit io.silvrr.installment jsifaxs.snuspohon.jshfpn oc.hbd.bqjp.detisolution phone.cleaner.speed.booster.cache.clean.android.master zid.cgaium.dzdl.kantongsukses 1.0.4 1.1.0 1.1.1 1.1.29 1.1.8 1.11.0 1.144.0.1 1.17.0 1.3 1.41.58 1.5.9 1.7.7 12.0.0 193.0.0.21.98 198.0.0.53.101 2.0.0 2.0.0 2.0.2 2.0.9 2.19.34 2.2.9 2.5.1 2.6.6 2.8.3 3.0.7 3.19 3.22.1 3.7.0 4.09-release-lmp-234792502 4.35.3 5.19.1 50.0.0 6.26.4 7.0.0 8.4.94.817
hash:
12638153115695167470
12638153115695167470
100% match
and
Is the similarity rate (64 - step) / 64?
my test data:
text1: "没有做苦工的,那么学霸就来做苦工"
text2: "后脚龙族幻想恰烂钱"
result:
simhash1:ffffffffffffffff
simhash2:ffffffffffffffff
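A possible explanation, offered as an assumption and not confirmed in this thread: if the word feature set splits text with an ASCII-only pattern such as \w+, Chinese text yields no features at all, so every input collapses to the same hash. In Go's regexp package, \w matches only [0-9A-Za-z_]. The sketch below demonstrates that behaviour and shows one hypothetical workaround, overlapping two-character shingles, whose output could be fed to a hash as features. The names asciiWords and runeBigrams are illustrative and not part of the library.

```go
package main

import (
	"fmt"
	"regexp"
)

// asciiWord mirrors an ASCII-only word pattern; in Go's regexp syntax
// \w matches only [0-9A-Za-z_], never CJK characters.
var asciiWord = regexp.MustCompile(`\w+`)

// asciiWords returns the ASCII "words" found in s (hypothetical helper).
func asciiWords(s string) []string {
	return asciiWord.FindAllString(s, -1)
}

// runeBigrams returns overlapping two-character shingles of s, which is
// one simple way to derive features from text without word separators.
func runeBigrams(s string) []string {
	r := []rune(s)
	if len(r) < 2 {
		return nil
	}
	out := make([]string, 0, len(r)-1)
	for i := 0; i+1 < len(r); i++ {
		out = append(out, string(r[i:i+2]))
	}
	return out
}

func main() {
	text := "后脚龙族幻想恰烂钱"
	fmt.Println(len(asciiWords(text)))  // prints 0: no ASCII word characters
	fmt.Println(len(runeBigrams(text))) // prints 8: 9 characters give 8 bigrams
}
```

Whether shingles of two characters are the right granularity depends on the data; proper word segmentation for Chinese would need a dedicated tokenizer.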
When I try to go get your code, I get this error:
unrecognized import path "golang.org/x/text/unicode/norm"
How can I fix it?