This project forked from rd17/ambar

Ambar: Document Search System. Ingest, search and manage your files (SMB, FTP, IMAP, Dropbox)

Home Page: https://ambar.cloud/

License: Other

Ambar: Document Search System

Ambar is an open-source document search and management system with automated crawling, OCR, tagging and instant full-text search.

Ambar defines a new way to manage your documents out of the box:

  • Ingest documents from any source
  • Find documents and images instantly with Google-like search
  • Manage your documents with tags, hide irrelevant search results
  • Download or share links to your documents, even if they've been deleted from the source

Features

Search

Tutorial: Mastering Ambar Search Queries

  • Fuzzy Search (John~3)
  • Phrase Search ("John Smith")
  • Search By Author (author:John)
  • Search By File Path (filename:*.txt)
  • Search By Date (when: yesterday, today, lastweek, etc)
  • Search By Size (size>1M)
  • Search By Tags (tags:ocr)
  • Search As You Type
  • Supported language analyzers: English (ambar_en), Russian (ambar_ru), German (ambar_de), Italian (ambar_it), Polish (ambar_pl), Chinese (ambar_cn), CJK (ambar_cjk)
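
The query operators listed above can be combined programmatically. The snippet below is only a minimal sketch of how such query strings might be assembled: the helper names (`phrase`, `fuzzy`, `build_query`) are hypothetical and not part of Ambar, and it assumes that several operators can be joined with spaces in one query and that the date filter is written without a space after the colon.

```python
# Hypothetical helpers for composing Ambar-style search queries.
# They only build strings; they do not call Ambar or its REST API.

def phrase(text):
    """Wrap text in double quotes for a phrase search, e.g. "John Smith"."""
    return f'"{text}"'


def fuzzy(term, distance):
    """Append a fuzzy-search suffix, e.g. John~3."""
    return f"{term}~{distance}"


def build_query(*terms, author=None, filename=None, when=None, size=None, tags=None):
    """Join free-text terms and field filters into one query string.

    Assumption: operators may be combined in a single query, separated by spaces.
    """
    parts = list(terms)
    if author:
        parts.append(f"author:{author}")
    if filename:
        parts.append(f"filename:{filename}")
    if when:
        parts.append(f"when:{when}")
    if size:
        # size carries its own comparison operator, e.g. ">1M"
        parts.append(f"size{size}")
    if tags:
        parts.append(f"tags:{tags}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(phrase("John Smith"), author="John", size=">1M"))
    print(build_query(fuzzy("John", 3), filename="*.txt", tags="ocr"))
```

The resulting strings can be pasted into the Ambar search box; whether a particular combination of operators is accepted should be checked against the search tutorial.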

Crawling

Content Extraction

  • Extract content from large files (>30M)
  • ZIP archives
  • MS Office documents (Word, Excel, PowerPoint, Visio, Publisher)
  • OCR over images
  • Email messages with attachments
  • Adobe PDF (with OCR)
  • OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld
  • OpenOffice documents
  • RTF, Plaintext
  • HTML / XHTML
  • Multithreaded processing (Only EE)

General

  • Files Tagging
  • Hiding Irrelevant Search Results
  • Files Preview (with Google Docs View)
  • Real-Time Statistics
  • Web UI
  • REST API
  • Multiple user accounts (Only EE)

Editions

There are two editions available: Community and Enterprise. Enterprise Edition is a full-featured document search and management system that can handle terabytes of data.

Community Edition is a scaled-down, single-user version of Enterprise Edition with a limited number of pipelines and crawlers that still preserves the full functionality. You are welcome to use Ambar Community Edition for both personal and commercial purposes, at no cost.

Installation

Installation is straightforward. Turn on your Linux machine and follow our step-by-step installation guide.

Docker images can be found on Docker Hub

How it Works

FAQ

Is it open-source?

Yes, almost every Ambar module is published on GitHub under the Fair Source 1 License.

Is it free?

Yes, Community Edition is free forever. We will NOT charge you a penny to use it.

Does it perform OCR?

Yes, it performs OCR on images (jpg, tiff, bmp, etc.) and PDFs. OCR is performed by the well-known open-source library Tesseract. We tuned it to achieve the best performance and quality on scanned documents. You can easily find all files on which OCR was performed with the tags:ocr query.

Which languages are supported for OCR?

Supported languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld. If your language is missing, please create a new issue and we'll add it ASAP.

Does it support tagging?

Yes!

What about searching in PDF?

Yes, it can search through any PDF, even one that is badly encoded or contains scans. We did our best to make search over any kind of PDF document smooth.

The XXX language analyzer is missing. Can you add it?

Yes, please create an issue on GitHub.

Are you going to add UI localizations?

We're working on it. Be patient.

What is the maximum file size it can handle?

It's limited by the amount of RAM on your machine, typically 500MB. It's an awesome result, as typical document management systems can process files of at most 30MB.

What is the difference between Ambar CE and Ambar EE?

Basically, Ambar CE is a downscaled Ambar EE. Check the comparison on our landing page.

Can anyone else see my documents?

Nope, check our Privacy Policy.

I have a problem. What should I do?

Submit an issue or chat with us on https://ambar.cloud

Change Log

Change Log

Contributors

Privacy Policy

Privacy Policy

License

Fair Source 1 License v0.9
