bitlap / geocoding
Geocoding technology that provides address normalization and similarity calculation.
License: MIT License
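For the Maven configuration, see the bitlap/bitlap project.
Could you publish the source code of the Java version? I would like to learn from it.
Hello, while parsing “重庆市开州区南门镇” I found that the address data in the current region.dat is rather old and has no 开州区 district.
Is region.dat a file we are expected to maintain ourselves, or can it be obtained from the internet? If it can be obtained online, could you share the link? Thanks.
Removing the higher-level region information that reappears later in the address greatly improves the similarity. Could the author optimize for this case?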
String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
String t2 = "海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203";
Result:
海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)
addr1 >>>> Address(
provinceId=460000000000, province=海南省,
cityId=460100000000, city=海口市,
districtId=460108000000, district=美兰区,
streetId=460108101000, street=灵山镇,
townId=460108101000, town=灵山镇,
villageId=null, village=null,
road=null,
roadNum=null,
buildingNum=A-32,
text=西片去旧改项目地块11#楼22203栋单元层号
)
>>>>>>>>>>>>>>>>>
海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203
addr2 >>>> Address(
provinceId=460000000000, province=海南省,
cityId=460100000000, city=海口市,
districtId=460108000000, district=美兰区,
streetId=460108101000, street=灵山镇,
townId=460108101000, town=灵山镇,
villageId=null, village=null,
road=海榆大道,
roadNum=4号,
buildingNum=11#楼2单元203,
text=绿地城润园
)
加载扩展词典:dic/region.dic
加载扩展词典:dic/community.dic
加载扩展停止词典:dic/stop.dic
相似度结果分析 >>>>>>>>> MatchedResult(
doc1=Document(terms=[Term(灵山镇), Term(A), Term(32), Term(西片), Term(去), Term(旧), Term(改), Term(项目), Term(地块), Term(11#), Term(楼), Term(22203), Term(栋), Term(单元), Term(层), Term(号)], town=Term(灵山镇), village=null, road=null, roadNum=null, roadNumValue=0),
doc2=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(11), Term(2), Term(203), Term(绿地城), Term(润园)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4),
terms=[io.patamon.geocoding.similarity.MatchedTerm@2cfb4a64],
similarity=0.4886777774252209
)
String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
String t2 = "海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203";
Result:
海南省海口市灵山镇海榆大道4号绿地城.润园灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)
addr1 >>>> Address(
provinceId=460000000000, province=海南省,
cityId=460100000000, city=海口市,
districtId=460108000000, district=美兰区,
streetId=460108101000, street=灵山镇,
townId=460108101000, town=灵山镇,
villageId=null, village=null,
road=海榆大道,
roadNum=4号,
buildingNum=A-32,
text=绿地城润园灵山西片去旧改项目地块11#楼22203栋单元层号
)
>>>>>>>>>>>>>>>>>
海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203
addr2 >>>> Address(
provinceId=460000000000, province=海南省,
cityId=460100000000, city=海口市,
districtId=460108000000, district=美兰区,
streetId=460108101000, street=灵山镇,
townId=460108101000, town=灵山镇,
villageId=null, village=null,
road=海榆大道,
roadNum=4号,
buildingNum=11#楼2单元203,
text=绿地城润园
)
加载扩展词典:dic/region.dic
加载扩展词典:dic/community.dic
加载扩展停止词典:dic/stop.dic
相似度结果分析 >>>>>>>>> MatchedResult(
doc1=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(A), Term(32), Term(绿地城), Term(润园), Term(灵山), Term(西片), Term(去), Term(旧), Term(改), Term(项目), Term(地块), Term(11#), Term(楼), Term(22203), Term(栋), Term(单元), Term(层), Term(号)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4),
doc2=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(11), Term(2), Term(203), Term(绿地城), Term(润园)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4),
terms=[io.patamon.geocoding.similarity.MatchedTerm@4b6995df, io.patamon.geocoding.similarity.MatchedTerm@2fc14f68, io.patamon.geocoding.similarity.MatchedTerm@591f989e, io.patamon.geocoding.similarity.MatchedTerm@66048bfd, io.patamon.geocoding.similarity.MatchedTerm@61443d8f],
similarity=0.7152705001057788
)
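The two runs above differ only in the repeated “海口市” inside t1: with the repeat the similarity is 0.4886777774252209, without it 0.7152705001057788. A minimal pre-processing sketch of that idea follows, assuming the caller already knows which region names were matched. The helper name `strip_repeated_regions` is hypothetical and is not part of the library.

```python
def strip_repeated_regions(text, matched_regions):
    """Keep the first occurrence of each matched region name, drop later ones."""
    for name in matched_regions:
        first = text.find(name)
        if first == -1:
            continue  # this region name is not written in the text at all
        head_end = first + len(name)
        # Everything up to and including the first occurrence is kept untouched;
        # any repeat of the same name after it is removed.
        text = text[:head_end] + text[head_end:].replace(name, "")
    return text


t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)"
cleaned = strip_repeated_regions(t1, ["海南省", "海口市", "灵山镇"])
print(cleaned)
```

The cleaned string is the t1 used in the second run. Blindly removing repeats can also delete a place name that is legitimately part of a POI, so this is only a workaround on the caller's side.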
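Hello,
Thank you for open-sourcing such a useful tool.
In the Geocoding.normalizing API,
after the four administrative levels have been matched, the removeRedundancy()
function, in order to handle addresses where the province, city, and district are written more than once, goes on to remove any province, city, district/county, or town/street it can still resolve, together with the text before it, so that the remaining POI string can be processed on its own. However, when the POI string itself legitimately contains a place name (for example 浙江省杭州市西湖区**建设银河西湖支行), the removeRedundancy()
function wrongly deletes the information in the POI, leaving only “支行”.
For example: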
print(Geocoding.normalizing("浙江省杭州市西湖区**建设银河西湖支行"))
[Out]
Address(
provinceId=330000000000, province=浙江省,
cityId=330100000000, city=杭州市,
districtId=330106000000, district=西湖区,
streetId=null, street=null,
townId=null, town=null,
villageId=null, village=null,
road=null,
roadNum=null,
buildingNum=null,
text=支行
)
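The behavior described above can be reproduced with a toy model. This is only an illustration of the failure mode, not the library's removeRedundancy() implementation; both function names and the alias list are assumptions made for the example.

```python
def remove_redundancy_naive(rest, region_aliases):
    """Drop everything up to and including the last region alias found in rest."""
    cut = 0
    for alias in region_aliases:
        pos = rest.rfind(alias)
        if pos != -1:
            cut = max(cut, pos + len(alias))
    return rest[cut:]


def remove_redundancy_guarded(rest, region_aliases):
    """Strip aliases only while they appear at the very start of rest."""
    changed = True
    while changed:
        changed = False
        for alias in region_aliases:
            if rest.startswith(alias):
                rest = rest[len(alias):]
                changed = True
    return rest


# Longer aliases are listed first so that "杭州市" wins over "杭州".
aliases = ["浙江省", "浙江", "杭州市", "杭州", "西湖区", "西湖"]
poi = "**建设银河西湖支行"  # text left after 浙江省杭州市西湖区 was matched

print(remove_redundancy_naive(poi, aliases))    # the POI is cut down to its tail
print(remove_redundancy_guarded(poi, aliases))  # the POI is left intact
```

Restricting removal to the start of the remaining text is one possible mitigation. It still handles a repeated prefix such as “杭州市西湖区文三路”, but it will not catch a repeat that appears after other text.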
Hello, while using the library I found that Geocoding.similarity(geoAddress1, geoAddress2) seems to have a thread-safety problem?
String geoAddress1 = "xxxx";
String geoAddress2 = "xxxx";
Geocoding.similarity(geoAddress1, geoAddress2)
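One way to check a suspicion like this is to compare concurrent results against a single-threaded baseline. The sketch below is a generic harness under that assumption. `char_jaccard` is a stand-in pure function, not the library's similarity, and `find_mismatches` is a hypothetical helper; to test the real call you would pass in whatever wrapper you use to reach Geocoding.similarity.

```python
from concurrent.futures import ThreadPoolExecutor


def find_mismatches(func, args_list, workers=8, repeats=50):
    """Run func concurrently and report results that differ from a sequential run."""
    expected = [func(*args) for args in args_list]  # single-threaded baseline
    jobs = [(i, args) for _ in range(repeats) for i, args in enumerate(args_list)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda job: (job[0], func(*job[1])), jobs))
    return [(i, got) for i, got in results if got != expected[i]]


def char_jaccard(a, b):
    """Stand-in similarity: Jaccard overlap of the character sets of a and b."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0


pairs = [("abc", "abd"), ("xyz", "xyz")]
print(find_mismatches(char_jaccard, pairs))
```

An empty list means no inconsistency was observed in this run. That does not prove thread safety, while a non-empty list is strong evidence of shared mutable state.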
It seems that a custom address adds a new entry to the dictionary, rather than correcting a wrong address before it is parsed? @IceMimosa
For example:
Geocoding.addRegionEntry(510000000000L, 100000000000L, "四州省", RegionType.Province, "四川")
Geocoding.normalizing("四州省广安市广安区")
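Would this correctly parse the address as 四川省广安市广安区?
After I import the library through Maven, IntelliJ can no longer run the main method; once the dependency is removed, everything works again. What could be the cause?
Is there a corresponding utility method? I would like to generate a copy based on the national standard (GB). Any help is appreciated.
“天津市静海区大丰堆镇齐小王村村委会东100米”
This address is parsed as: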
provinceId=120000000000, province=天津,
cityId=120100000000, city=天津市,
districtId=120223000000, district=静海县,
streetId=120223113000, street=大丰堆镇,
townId=120223113000, town=大丰堆镇,
villageId=null, village=null,
road=null,
roadNum=null,
buildingNum=null,
text=齐小王村村委会东100米
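Adding “静海区” through custom data still does not solve this.
Is this because the address database has not been updated?
Does this need to be fixed by modifying the address database?
Input: 四川省成都市郫都区西源大道1311号3栋4单元1楼102号
Using the segment method with seg_type = 'ik',
the resulting token list is: ['四川省', '成都市', '郫', '都', '西源大道', '1311号', '3栋', '4', '单元', '1楼', '102号']
The expected token list is: ['四川省', '成都市', '郫都区', '西源大道', '1311号', '3栋', '4', '单元', '1楼', '102号']
Is there any way to correct the result? Thanks!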
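Until the dictionary knows the newer district name, one caller-side workaround is to peel known region names off the front of the address before handing the rest to the segmenter. The sketch below assumes you maintain your own list of region names; `split_leading_regions` is a hypothetical helper, not part of the library or of GeocodingCHN.

```python
def split_leading_regions(text, region_names):
    """Greedily match known region names at the start of text, longest first."""
    names = sorted((n for n in region_names if n), key=len, reverse=True)
    found = []
    rest = text
    while True:
        for name in names:
            if rest.startswith(name):
                found.append(name)
                rest = rest[len(name):]
                break  # restart matching on the shortened remainder
        else:
            break  # no known name matches the start any more
    return found, rest


known = ["四川省", "成都市", "郫都区", "郫县"]
regions, remainder = split_leading_regions(
    "四川省成都市郫都区西源大道1311号3栋4单元1楼102号", known
)
print(regions)
print(remainder)
```

The region tokens come from `regions`, and only `remainder` needs to go through the 'ik' segmentation, so the district name is never split.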
Using the normalizing method with the input address 北京市海淀区西北旺东路10号院东区323102, I found that the number 323102 is missing from the result.
Could you please take a look?
Input: 广东省河源市源城区中山大道16号华怡小区
Output:
Address(
provinceId=440000000000, province=广东省,
cityId=442000000000, city=中山市,
districtId=442000000000, district=中山市,
streetId=null, street=null,
townId=null, town=null,
villageId=null, village=null,
road=null,
roadNum=null,
buildingNum=null,
text=大道16号华怡小区
)
Address(
provinceId=110000000000, province=北京,
cityId=110100000000, city=北京市,
districtId=110102000000, district=西城区,
streetId=null, street=null,
townId=null, town=null,
villageId=null, village=null,
road=新康街,
roadNum=2号院,
buildingNum=null,
text=1号楼北侧楼房
)
Addr1:江苏省南京市建邺区庐山路98-1号
Addr2:江苏省南京市庐山路98-1号
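But I got the result 0.0?
I am not sure whether I broke something while tinkering with the code, or whether this is an existing bug.
For example: http://www.stats.gov.cn/sj/tjbz/tjyqhdmhcxhfdm/2023/
or a class parameter hard-coded to accept a year such as 2022, 2023, and so on.
If there is an API that can be called directly, that would be better; otherwise jsoup can be used to crawl the pages.
The deeper the level, the larger the final generated file, so the depth of the address hierarchy needs to be limited, for example 1: province, 2: city, 3: district, 4: street/town, 5: residents' committee.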
json/pb...
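The depth limit could look roughly like the sketch below, which prunes a region tree to a maximum level before serializing it to JSON. The node layout (`id`, `name`, `level`, `children`) and the helper name `prune` are assumptions for illustration; they are not the project's data format.

```python
import json


def prune(node, max_level):
    """Return a copy of node that keeps only descendants with level <= max_level."""
    kept = {"id": node["id"], "name": node["name"], "level": node["level"]}
    children = [
        prune(child, max_level)
        for child in node.get("children", [])
        if child["level"] <= max_level
    ]
    if children:
        kept["children"] = children
    return kept


# Levels: 1 province, 2 city, 3 district, 4 street/town.
tree = {
    "id": 460000000000, "name": "海南省", "level": 1, "children": [
        {"id": 460100000000, "name": "海口市", "level": 2, "children": [
            {"id": 460108000000, "name": "美兰区", "level": 3, "children": [
                {"id": 460108101000, "name": "灵山镇", "level": 4},
            ]},
        ]},
    ],
}

city_level = prune(tree, max_level=2)
output = json.dumps(city_level, ensure_ascii=False)
print(output)
```

The same pruning step would work in front of any other output format, since it only decides which nodes are written.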
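With an empty dat dictionary file, GeocodingX.addRegionEntry does not initialize the children property of RegionEntity, so lower-level administrative regions are not added. In the following code from DefaultRegionCache, the last line does not add the child RegionEntity to the parent RegionEntity when children is uninitialized (null):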
override fun addRegionEntity(entity: RegionEntity) {
    this.loadChildrenInCache(entity)
    this.REGION_CACHE[entity.id] = entity
    this.REGION_CACHE[entity.parentId]?.children?.add(entity)
}
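The effect of the null-safe chain can be shown with a small Python analogue. This is a sketch of the reported behavior and of one possible fix (initializing the list on first use), not the project's code: the class and both functions are simplified stand-ins, and the loadChildrenInCache step is omitted.

```python
class RegionEntity:
    def __init__(self, id, parent_id, name, children=None):
        self.id = id
        self.parent_id = parent_id
        self.name = name
        self.children = children  # None models the uninitialized property


def add_region_entity_buggy(cache, entity):
    """Mirrors `?.children?.add(entity)`: silently skips when children is None."""
    cache[entity.id] = entity
    parent = cache.get(entity.parent_id)
    if parent is not None and parent.children is not None:
        parent.children.append(entity)


def add_region_entity_fixed(cache, entity):
    """Creates the children list on first use, so the child is always attached."""
    cache[entity.id] = entity
    parent = cache.get(entity.parent_id)
    if parent is not None:
        if parent.children is None:
            parent.children = []
        parent.children.append(entity)


buggy_cache = {}
add_region_entity_buggy(buggy_cache, RegionEntity(1, 0, "province"))
add_region_entity_buggy(buggy_cache, RegionEntity(2, 1, "city"))

fixed_cache = {}
add_region_entity_fixed(fixed_cache, RegionEntity(1, 0, "province"))
add_region_entity_fixed(fixed_cache, RegionEntity(2, 1, "city"))

print(buggy_cache[1].children, [c.name for c in fixed_cache[1].children])
```

In the first cache the city is stored but never attached to its parent, which matches the symptom described above.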
Code:
from GeocodingCHN import Geocoding
geocoding = Geocoding()
text = '山东青岛李沧区延川路116号绿城城园东区7号楼2单元802户'
address_nor = geocoding.normalizing(text)
print(address_nor)
Error:
Traceback (most recent call last):
  File "main.py", line 3, in <module>
  File "GeocodingCHN\Geocoding.py", line 61, in __init__
  File "jpype\_jclass.py", line 99, in __new__
TypeError: Class org.bitlap.geocoding.GeocodingX is not found
How do I use the jar produced by mvn install? I am not familiar with Java or Kotlin, sorry.
For example:
val geocoding = GeocodingX("region_2021.dat")
geocoding.addRegionEntry(1,0,"xx")
geocoding.save("xx.dat")
Address(
provinceId=110000000000, province=北京,
cityId=110100000000, city=北京市,
districtId=110102000000, district=西城区,
streetId=null, street=null,
townId=null, town=null,
villageId=null, village=null,
road=新康街,
roadNum=2号院,
buildingNum=null,
text=1号楼北侧楼房
)
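Could the address database be updated? Thanks.
When parsing an address such as “福建福州鼓楼区六一路111号金三桥大厦”, where the road name contains “一”, the “一” is dropped.
For an address such as “广东省广州市天河区越秀北路22号”, “越秀北路” is recognized as “北路” and the district becomes “越秀区”. How should this be resolved?
“解放北路” is recognized as “解放县” in 河北省, and the parent province and city change along with it.
Hello, following the instructions I imported the five-level address database into MySQL and regenerated the dat file, but the normalized address does not resolve precisely down to the fifth level. How should I handle this?
Province / municipality
City / prefecture
County / district
Township / town
Village / community
Can region.dic and community.dic under resources be updated, and how are they generated?
For example, the input “南山区” has two matches, one in 黑龙江省 and one in 广东省, but currently only the first one is returned;
also, if the input is only a town, the result contains nothing but null. How should this be changed?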
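Returning every candidate for an ambiguous name could be sketched as a name-to-list index, as below. The record layout and the helper names are assumptions for illustration and do not reflect how the library stores regions; the two sample records are only meant to show one name mapping to several regions.

```python
from collections import defaultdict


def build_name_index(regions):
    """Map each region name to every region that carries it."""
    index = defaultdict(list)
    for region in regions:
        index[region["name"]].append(region)
    return index


def find_all(index, name):
    """Return all candidates for name instead of only the first one."""
    return list(index.get(name, []))


regions = [
    {"id": 230404000000, "name": "南山区", "parent": "黑龙江省鹤岗市"},
    {"id": 440305000000, "name": "南山区", "parent": "广东省深圳市"},
]
index = build_name_index(regions)
candidates = find_all(index, "南山区")
for candidate in candidates:
    print(candidate["parent"], candidate["name"])
```

The caller can then disambiguate with other parts of the address, or present all candidates when nothing else is known.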
Which pom dependency should be imported?
Requirements:
Downloading the dependency from the GitHub repo as described in the README fails:
Failed to execute goal on project customer-experience-data-factory: Could not resolve dependencies for project com.treeyee.cloud:customer-experience-data-factory:jar:0.0.1-SNAPSHOT: io.patamon.geocoding:geocoding:jar:1.1.6 was not found in https://raw.github.com/icemimosa/maven/release/ during a previous attempt. This failure was cached in the local repository and resolution is not reattempted until the update interval of patamon.release.repository has elapsed or updates are forced -> [Help 1]
Cloning the project, building the jar locally, and adding that jar to my project also fails at runtime (my local project is a Java project):
java.lang.NoClassDefFoundError: kotlin/jvm/internal/Intrinsics
at io.patamon.geocoding.Geocoding.similarity(Geocoding.kt)
Do you know what causes this?