
bloomfilterredis's Introduction

A Redis-Based Bloom Filter

Introduction

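  • BloomFilterRedis: an extensible Bloom filter that uses a Redis Bitmap as its bit array. By default the bit array is 2^23 bits long and eight hash functions are used.
  • orange: a Scrapy project. It is a crawler that scrapes Baidu Baike starting from the "橘子水" (orange juice) entry, and it is configured with a duplicate filter based on BloomFilterRedis.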
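To give a feel for what the default parameters mean, the standard Bloom filter approximation p ≈ (1 - e^(-kn/m))^k can be evaluated for m = 2^23 bits and k = 8 hash functions. The helper below is an illustrative sketch and is not part of BloomFilterRedis; the function name and the item counts are assumptions chosen for the example.

```python
import math


def false_positive_rate(n_items, m_bits=2 ** 23, k_hashes=8):
    """Approximate false-positive probability after n_items insertions.

    Uses the textbook estimate (1 - exp(-k * n / m)) ** k, which assumes
    the hash functions behave independently and uniformly.
    """
    exponent = -float(k_hashes) * n_items / m_bits
    return (1.0 - math.exp(exponent)) ** k_hashes


if __name__ == "__main__":
    # Hypothetical load levels, chosen only to show the trend.
    for n in (100000, 500000, 1000000):
        print("%d items: %.6f" % (n, false_positive_rate(n)))
```

Under these assumptions, one million inserted items gives a false-positive rate of roughly 2%, so a larger bit array may be worth considering for bigger crawls.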

For more on Bitmap and other background, see my blog post "Implementing a Redis-Based Bloom Filter" (基于Redis的布隆过滤器的实现).

Development Environment

  • python 2.7.12
  • Redis 3.2.8
  • python-redis
  • scrapy 1.3.3

Usage

from BloomFilterRedis import BloomFilterRedis

bloomFilterRedis = BloomFilterRedis("bloom")
bloomFilterRedis.do_filter("one item to check")
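The README does not show what do_filter does internally or what it returns. The sketch below illustrates one plausible shape for a Bloom filter backed by a Redis bitmap: hash the item several times, test the corresponding bits, and set them if the item is new. All class and method names here (InMemoryBitmap, SketchBloomFilter, check_and_add) are hypothetical, only three of the eight named hash functions are reproduced, and an in-memory stand-in replaces the Redis client so the example runs without a server.

```python
class InMemoryBitmap(object):
    """Hypothetical stand-in that mimics a Redis client's setbit/getbit."""

    def __init__(self):
        self._bits = {}

    def setbit(self, name, offset, value):
        bits = self._bits.setdefault(name, set())
        previous = 1 if offset in bits else 0
        if value:
            bits.add(offset)
        else:
            bits.discard(offset)
        return previous  # like Redis SETBIT, return the old bit value

    def getbit(self, name, offset):
        return 1 if offset in self._bits.get(name, set()) else 0


def bkdr_hash(text):
    h = 0
    for ch in text:
        h = (h * 131 + ord(ch)) & 0xFFFFFFFF
    return h


def djb_hash(text):
    h = 5381
    for ch in text:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    return h


def sdbm_hash(text):
    h = 0
    for ch in text:
        h = (ord(ch) + (h << 6) + (h << 16) - h) & 0xFFFFFFFF
    return h


class SketchBloomFilter(object):
    """Illustrative Bloom filter; not the project's actual implementation."""

    def __init__(self, client, key="bloom", bit_size=2 ** 23,
                 hash_funcs=(bkdr_hash, djb_hash, sdbm_hash)):
        self.client = client
        self.key = key
        self.bit_size = bit_size
        self.hash_funcs = hash_funcs

    def _offsets(self, item):
        # Map each hash value onto a position in the bit array.
        return [f(item) % self.bit_size for f in self.hash_funcs]

    def contains(self, item):
        return all(self.client.getbit(self.key, o) for o in self._offsets(item))

    def add(self, item):
        for o in self._offsets(item):
            self.client.setbit(self.key, o, 1)

    def check_and_add(self, item):
        """Return True if item was probably seen; otherwise record it."""
        if self.contains(item):
            return True
        self.add(item)
        return False
```

Because the stand-in exposes the same setbit(name, offset, value) and getbit(name, offset) calls as redis-py, a real client such as redis.StrictRedis(host='127.0.0.1', port=6379) could be passed in its place.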

Using It in Scrapy

  1. Copy BloomFilterRedis into the project folder, and copy BloomRedisDupeFilter.py into the same directory as settings.py.
  2. Configure the following fields in settings.py:
# Use the Redis-based Bloom filter as the duplicate filter
DUPEFILTER_CLASS = 'orange.BloomRedisDupeFilter.BloomRedisDupeFilter'
# Key of the bitmap in Redis; defaults to 'bloom'
# BLOOM_REDIS_KEY = 'bloom'
# Redis connection settings; default to the local machine
# BLOOM_REDIS_HOST = '127.0.0.1'
# BLOOM_REDIS_PORT = 6379
# List of hash functions used by the Bloom filter; defaults to these 8, defined in GeneralHashFunctions
# BLOOM_HASH_LIST = ["rs_hash", "js_hash", "pjw_hash", "elf_hash", "bkdr_hash", "sdbm_hash", "djb_hash", "dek_hash"]
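The commented-out settings above fall back to defaults when they are not set. The following sketch shows how a dupefilter's from_settings classmethod might resolve them. It is an assumption about the approach, not the code in BloomRedisDupeFilter.py: BloomSettings and DictSettings are hypothetical names, and DictSettings only imitates the get, getint and getlist accessors of Scrapy's Settings object so the example runs without Scrapy installed.

```python
DEFAULT_HASH_LIST = ["rs_hash", "js_hash", "pjw_hash", "elf_hash",
                     "bkdr_hash", "sdbm_hash", "djb_hash", "dek_hash"]


class DictSettings(object):
    """Hypothetical stand-in for Scrapy's Settings (get/getint/getlist only)."""

    def __init__(self, values=None):
        self._values = dict(values or {})

    def get(self, name, default=None):
        return self._values.get(name, default)

    def getint(self, name, default=0):
        return int(self.get(name, default))

    def getlist(self, name, default=None):
        value = self.get(name, default or [])
        if isinstance(value, str):
            value = value.split(",")
        return list(value)


class BloomSettings(object):
    """Holds the resolved Bloom filter options (illustrative only)."""

    def __init__(self, key, host, port, hash_list):
        self.key = key
        self.host = host
        self.port = port
        self.hash_list = hash_list

    @classmethod
    def from_settings(cls, settings):
        # Each lookup falls back to the default documented in settings.py.
        return cls(
            key=settings.get("BLOOM_REDIS_KEY", "bloom"),
            host=settings.get("BLOOM_REDIS_HOST", "127.0.0.1"),
            port=settings.getint("BLOOM_REDIS_PORT", 6379),
            hash_list=settings.getlist("BLOOM_HASH_LIST", DEFAULT_HASH_LIST),
        )
```

In a real project the classmethod would typically live on a subclass of scrapy.dupefilters.BaseDupeFilter and receive the crawler's actual Settings object.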

bloomfilterredis's People

Contributors

kongtianyi
