cases's Introduction

cases

Search for China Judgments Online (裁判文书网) documents

Usage

Caution

Requires at least 320G of free disk space; the whole process may take several hours.
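Before starting, it can help to confirm that the target filesystem has enough room. The following is a minimal sketch, not part of the project: the function name has_free_space is hypothetical, and it assumes the 320G figure can be read as GiB.

```shell
# has_free_space DIR REQUIRED_GB
# Succeeds if the filesystem holding DIR has at least REQUIRED_GB GiB available.
# (Hypothetical helper, not shipped with the project.)
has_free_space() {
    dir=$1
    required_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # field 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}
```

For example, `has_free_space /data 320 || echo "not enough space"` checks a data directory, where /data is only a placeholder path.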

Warning

On Linux, if you see IO error: ……Too many open files, raise the file descriptor limit with the command ulimit -n 10000.
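A plain ulimit -n 10000 fails when the hard limit is lower than 10000. The sketch below is one way to handle that case: it raises the soft limit as far as the hard limit allows and never lowers it. The function name raise_nofile is hypothetical.

```shell
# raise_nofile TARGET
# Raise the soft open-file limit of the current shell to TARGET,
# capped at the hard limit. Never lowers an existing limit.
# (Hypothetical helper, not shipped with the project.)
raise_nofile() {
    target=$1
    soft=$(ulimit -Sn)
    hard=$(ulimit -Hn)
    # Nothing to do if the soft limit is already unlimited or high enough
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$target" ]; then
        return 0
    fi
    # An unprivileged process cannot exceed the hard limit, so cap the request
    if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$target" ]; then
        target=$hard
    fi
    ulimit -Sn "$target"
}
```

The limit applies only to the current shell and the processes started from it, so the program has to be launched from the same shell after calling `raise_nofile 10000`.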

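0. Download the program and create a configuration file

Option 1: download a precompiled binary from the releases page (recommended): https://github.com/cncases/cases/releases

Option 2: build from source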

## Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

## Clone this repository
git clone https://github.com/cncases/cases.git

## Build; the resulting programs are in the target/release/ directory
cargo build -r

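For the configuration file, refer to config.toml.

1. Download the raw data (102G)

Method: download via BitTorrent. The torrent file is 810air.torrent, which can be downloaded from this repository or via the link https://files.catbox.moe/810air.torrent

The raw data comes from 马克数据网, contains more than 85 million documents, and is about 102G. After downloading, do not extract the sub-files; set the raw_data_path variable in config.toml to the path of the files.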

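2. Load the data into the rocksdb database

Run convert config.toml. This loads the raw data into a rocksdb database whose path is given by the db variable in config.toml. The converted data is about 200G, and the conversion may take several hours. If it is interrupted, running it again resumes from where it stopped.

3. Create the index

Run index config.toml to build an index over the data in the database; the index is written to the path given by the index_path variable in config.toml. If it is interrupted, delete the files in index_path and run index again. By default, case content is not indexed; the index is about 15.5G and may take several hours to build. To index case content, set index_with_full_text to true in config.toml; this grows the index to about 150G and increases the indexing time to more than ten hours.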
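Because an interrupted index run requires clearing index_path by hand, a guarded helper can reduce the risk of deleting the wrong directory. This is a sketch only: clear_index_dir is a hypothetical name, and it relies on the -mindepth and -delete options of find, which GNU and BSD find provide but POSIX does not require.

```shell
# clear_index_dir DIR
# Remove everything inside DIR but keep DIR itself.
# (Hypothetical helper, not shipped with the project.)
clear_index_dir() {
    dir=$1
    # Refuse an empty argument, "/" or a missing directory so that a
    # misconfigured path cannot wipe unrelated files
    if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
        echo "refusing to clear '$dir'" >&2
        return 1
    fi
    # -mindepth 1 skips DIR itself; -delete removes contents depth-first
    find "$dir" -mindepth 1 -delete
}
```

After it succeeds, index can be run again against the same configuration file. The argument must be the same path that index_path holds in config.toml.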

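4. Run the search service

Run main config.toml, then open the address set in config.toml in a browser to start searching.

Notes

When the program and the configuration file are in the same directory and the configuration file is named config.toml, the configuration file path argument can be omitted.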
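Steps 2 to 4 can be chained so that a failure in one step stops the rest. The sketch below assumes the three binaries convert, index and main sit in one directory; the function name run_steps is hypothetical. The shell default for the config path is only a convenience of this wrapper and mirrors the config.toml default described above.

```shell
# run_steps BIN_DIR [CONFIG]
# Run convert, index and main in order, stopping at the first failure.
# CONFIG defaults to config.toml. (Hypothetical wrapper, not shipped
# with the project.)
run_steps() {
    bin_dir=$1
    config=${2:-config.toml}
    for step in convert index main; do
        echo "running $step $config"
        "$bin_dir/$step" "$config" || {
            echo "$step failed" >&2
            return 1
        }
    done
}
```

Since main is a long-running server, a call such as `run_steps ./target/release` returns only once the server exits.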

cases's Issues

Running `main` fails: Too many open files

Running ./main directly from the command line gives:

$ ./main
2024-01-22T06:56:34.400725Z  INFO main: listening on http://127.0.0.1:8081

But requests to port 8081 get no response.
Running ./main with administrator privileges gives:

$ sudo ./main
2024-01-22T06:56:38.123740Z  INFO main: listening on http://127.0.0.1:8081
thread 'main' panicked at src/bin/main.rs:27:84:
called `Result::unwrap()` on an `Err` value: Error { message: "IO error: While open a file for random read: /data/cases/rocksdb/009237.sst: Too many open files" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[1]    19883 IOT instruction  sudo ./main
  1. Does the service have to run with superuser privileges, or can it run as a non-administrator?
  2. How can the error above be resolved?

IO error when running `convert`

When running the convert command, it fails after running for a while:

2024-01-19T11:57:18.865094Z  INFO convert: inserting 15324160, time: 875
2024-01-19T11:57:18.937344Z  INFO convert: inserting 15325184, time: 875
2024-01-19T11:57:19.012542Z  INFO convert: inserting 15326208, time: 876
2024-01-19T11:57:19.128853Z  INFO convert: inserting 15327232, time: 876
2024-01-19T11:57:19.198966Z  INFO convert: inserting 15328256, time: 876
2024-01-19T11:57:19.267122Z  INFO convert: inserting 15329280, time: 876
2024-01-19T11:57:19.390479Z  INFO convert: inserting 15330304, time: 876
thread 'main' panicked at src/bin/convert.rs:61:45:
called `Result::unwrap()` on an `Err` value: Error { message: "IO error: While open a file for appending: /data/cases/rocksdb/002043.log: Too many open files" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

System information:

$ uname -a 
Linux ubuntu 5.15.0-84-generic #93-Ubuntu SMP Tue Sep 5 17:16:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-m: resident set size (kbytes)      unlimited
-u: processes                       124695
-n: file descriptors                1024
-l: locked-in-memory size (kbytes)  4005456
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 124695
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15:                              unlimited

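Rerunning it fails with the same error at the same point. It looks like an open-file limit problem; is there a way to fix it?

Problem with resuming after an interruption

Run convert config.toml. This loads the raw data into a rocksdb database whose path is given by the db variable in config.toml. The converted data is about 200G, and the conversion may take several hours. If it is interrupted, running it again resumes from where it stopped.

On rerun, it has to skip files for a long time before it continues processing; 96G has been processed so far. Does the run need an extra argument?

The convert binary was compiled from the git clone source.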
