Search using the CMM crawler, but with an index for each Server Group and with security Inform

New search concept about cammintegrationportal (OPEN)

JochenHWezel commented on September 13, 2024

New search concept

from cammintegrationportal.

Comments (10)

JochenHWezel commented on September 13, 2024

Requires a plugin concept.

JochenHWezel commented on September 13, 2024

Pipeline:

- crawler: scan the sitemap / link list
- crawler: record data
- split the data up into words
- usage by a search form with a logical parser
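A minimal sketch of the middle of this pipeline, assuming the crawler hands over record data as a mapping from URL to plain text. `Records`, `split_into_words` and `build_index` are hypothetical names: the text is split into lower-cased words and stored in an inverted index that a search form could then query.

```python
import re
from collections import defaultdict

# Hypothetical shape of the crawler output: URL -> recorded text.
Records = dict[str, str]

WORD_RE = re.compile(r"\w+", re.UNICODE)


def split_into_words(text: str) -> list[str]:
    """Lower-case the text and split it on non-word characters."""
    return [w.lower() for w in WORD_RE.findall(text)]


def build_index(records: Records) -> dict[str, set[str]]:
    """Inverted index: word -> set of URLs whose record data contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in records.items():
        for word in split_into_words(text):
            index[word].add(url)
    return dict(index)
```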

JochenHWezel commented on September 13, 2024

Apps might provide a crawler setup for the standard search index of a Server Group; they might additionally or alternatively specify further search index names to serve as an in-app search index.
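One way an app could declare this, sketched with hypothetical names (`AppSearchSetup`, `target_indexes`) and an assumed `<server group>:<index>` naming scheme; none of this is an existing cammintegrationportal API.

```python
from dataclasses import dataclass, field


@dataclass
class AppSearchSetup:
    """Hypothetical per-app search configuration (all names are illustrative)."""
    app_name: str
    # Whether the app's pages go into the Server Group's standard index.
    use_standard_index: bool = True
    # Additional index names used for an in-app search index.
    extra_index_names: list[str] = field(default_factory=list)


def target_indexes(setup: AppSearchSetup, server_group: str) -> list[str]:
    """Return the index names the app's crawled data should be written to."""
    names = []
    if setup.use_standard_index:
        names.append(f"{server_group}:standard")
    names.extend(f"{server_group}:{setup.app_name}:{n}" for n in setup.extra_index_names)
    return names
```

With both options set, an app contributes to the shared Server Group index and also gets its own named index.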

JochenHWezel commented on September 13, 2024

Pages being crawled provide the data from the crawler setup as follows:

- sitemap as a link list in a non-visible div
- record data as ...???
  - maybe an additional sitemap URL pointing to a special URL that provides the record data?
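A sketch of how the crawler could read the sitemap link list from a page, using Python's standard `html.parser`. It assumes "non-visible" means a `<div>` with the `hidden` attribute or an inline `display:none` style; the actual marker is not specified in the thread, and `HiddenSitemapParser` / `sitemap_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HiddenSitemapParser(HTMLParser):
    """Collect <a href> links that sit inside a non-visible <div>.

    Assumption: non-visible = `hidden` attribute or inline display:none.
    """

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self._div_depth = 0     # current <div> nesting depth
        self._hidden_at = None  # depth at which the hidden div was opened

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "div":
            self._div_depth += 1
            style = (a.get("style") or "").replace(" ", "").lower()
            if self._hidden_at is None and ("hidden" in a or "display:none" in style):
                self._hidden_at = self._div_depth
        elif tag == "a" and self._hidden_at is not None and a.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, a["href"]))

    def handle_endtag(self, tag):
        if tag == "div":
            if self._hidden_at == self._div_depth:
                self._hidden_at = None  # leaving the hidden div
            self._div_depth -= 1


def sitemap_links(html: str, base_url: str) -> list[str]:
    parser = HiddenSitemapParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```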

JochenHWezel commented on September 13, 2024

The crawler tracks page status: on repeated page errors for a URL, it stops crawling for a configured time.
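A minimal sketch of the status tracking, assuming errors are counted per URL and "repeated" means a configurable number of consecutive failures (`PageStatusTracker`, `max_errors` and `cooldown_seconds` are hypothetical). The clock is injectable so the pause can be tested without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PageStatusTracker:
    """Count consecutive errors per URL and pause that URL after too many."""
    max_errors: int = 3
    cooldown_seconds: float = 3600.0
    clock: Callable[[], float] = time.monotonic
    _errors: dict[str, int] = field(default_factory=dict, init=False, repr=False)
    _paused_until: dict[str, float] = field(default_factory=dict, init=False, repr=False)

    def record(self, url: str, ok: bool) -> None:
        if ok:
            self._errors.pop(url, None)  # a success resets the error streak
            return
        count = self._errors.get(url, 0) + 1
        self._errors[url] = count
        if count >= self.max_errors:
            self._paused_until[url] = self.clock() + self.cooldown_seconds
            self._errors[url] = 0  # start counting afresh after the pause

    def may_crawl(self, url: str) -> bool:
        until = self._paused_until.get(url)
        if until is None:
            return True
        if self.clock() >= until:
            del self._paused_until[url]  # cooldown over
            return True
        return False
```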

JochenHWezel commented on September 13, 2024

A search page can be set up with a parser preset that requires the user's search text to be logically valid, closing open nested logic levels automatically on demand.
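Such a preset could look roughly like this recursive-descent sketch. The query syntax is an assumption: upper-case `AND`, `OR`, `NOT`, parentheses for nesting, and an implicit `AND` between adjacent words. With `auto_close=True` missing closing parentheses are appended; otherwise unbalanced input is rejected. `index` is assumed to map lower-cased words to sets of URLs, and `search` / `QuerySyntaxError` are hypothetical names.

```python
import re

TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


class QuerySyntaxError(ValueError):
    pass


def tokenize(query: str, auto_close: bool = False) -> list[str]:
    tokens = TOKEN_RE.findall(query)
    depth = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                raise QuerySyntaxError("unmatched ')'")
    if depth > 0:
        if not auto_close:
            raise QuerySyntaxError(f"{depth} unclosed '('")
        tokens += [")"] * depth  # close open nesting levels on demand
    return tokens


def search(query: str, index: dict[str, set[str]], auto_close: bool = False) -> set[str]:
    tokens = tokenize(query, auto_close)
    universe = set().union(*index.values())  # needed to evaluate NOT
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise QuerySyntaxError("unexpected end of query")
        pos += 1
        return tok

    def parse_or():  # or_expr := and_expr ("OR" and_expr)*
        result = parse_and()
        while peek() == "OR":
            take()
            result = result | parse_and()
        return result

    def parse_and():  # and_expr := not_expr (["AND"] not_expr)*
        result = parse_not()
        while peek() not in (None, "OR", ")"):
            if peek() == "AND":
                take()
            result = result & parse_not()
        return result

    def parse_not():  # not_expr := "NOT" not_expr | atom
        if peek() == "NOT":
            take()
            return universe - parse_not()
        return parse_atom()

    def parse_atom():  # atom := "(" or_expr ")" | word
        tok = take()
        if tok == "(":
            result = parse_or()
            if take() != ")":
                raise QuerySyntaxError("expected ')'")
            return result
        if tok in ("AND", "OR", ")"):
            raise QuerySyntaxError(f"unexpected {tok!r}")
        return set(index.get(tok.lower(), set()))

    result = parse_or()
    if pos != len(tokens):
        raise QuerySyntaxError(f"unexpected {tokens[pos]!r}")
    return result
```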

JochenHWezel commented on September 13, 2024

The crawler should be able to crawl:

- pdf
- docx
- xlsx (cells as formatted text)
- plain text
- html
- jpg metadata
- etc.
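A sketch of dispatching by file type, limited to formats the Python standard library can handle: plain text, HTML, and docx (a ZIP archive whose body text lives in `word/document.xml`). The `EXTRACTORS` table and function names are hypothetical; pdf, xlsx and jpg metadata would need additional libraries and are not shown.

```python
import io
import os
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Callable

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_plain(data: bytes) -> str:
    return data.decode("utf-8", errors="replace")


class _HtmlText(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def extract_html(data: bytes) -> str:
    parser = _HtmlText()
    parser.feed(data.decode("utf-8", errors="replace"))
    parser.close()
    return " ".join(parser.parts)


def extract_docx(data: bytes) -> str:
    """Join the w:t runs of each w:p paragraph; one line per paragraph."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = ("".join(t.text or "" for t in p.iter(W_NS + "t"))
                  for p in root.iter(W_NS + "p"))
    return "\n".join(text for text in paragraphs if text)


# pdf, xlsx and jpg metadata extractors would be registered here too.
EXTRACTORS: dict[str, Callable[[bytes], str]] = {
    ".txt": extract_plain,
    ".html": extract_html,
    ".htm": extract_html,
    ".docx": extract_docx,
}


def extract_text(filename: str, data: bytes) -> str:
    ext = os.path.splitext(filename)[1].lower()
    if ext not in EXTRACTORS:
        raise ValueError(f"no extractor registered for {ext!r}")
    return EXTRACTORS[ext](data)
```

A lookup table keeps each format independent, so adding a format means registering one more function.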

JochenHWezel commented on September 13, 2024

CrawlerRequestSetup must configure:

- following links for X levels (e.g. as in Xenu)
- following links
  - within the nav URL folder (default)
  - within a custom list of URLs
  - everywhere (up to the limit of X levels)
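A sketch of these options as a breadth-first crawl with a level limit. Only the name `CrawlerRequestSetup` comes from the thread; its fields, `FollowScope`, and the injected `get_links` function (which would fetch a page and return its links) are assumptions. "Nav URL folder" is interpreted here as the same host and a path below the start URL's folder.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Iterable
from urllib.parse import urlsplit


class FollowScope(Enum):
    NAV_FOLDER = "nav_folder"  # default: stay inside the start URL's folder
    URL_LIST = "url_list"      # only URLs under a custom list of prefixes
    EVERYWHERE = "everywhere"  # any URL, still bounded by max_levels


@dataclass
class CrawlerRequestSetup:  # fields are hypothetical
    start_url: str
    max_levels: int
    scope: FollowScope = FollowScope.NAV_FOLDER
    allowed_prefixes: list[str] = field(default_factory=list)


def in_scope(setup: CrawlerRequestSetup, url: str) -> bool:
    if setup.scope is FollowScope.EVERYWHERE:
        return True
    if setup.scope is FollowScope.URL_LIST:
        return any(url.startswith(p) for p in setup.allowed_prefixes)
    start = urlsplit(setup.start_url)
    folder = start.path.rsplit("/", 1)[0] + "/"
    target = urlsplit(url)
    return target.netloc == start.netloc and target.path.startswith(folder)


def crawl(setup: CrawlerRequestSetup,
          get_links: Callable[[str], Iterable[str]]) -> list[str]:
    """Breadth-first crawl returning URLs in visit order; level 0 is the start URL."""
    seen = {setup.start_url}
    queue = deque([(setup.start_url, 0)])
    visited: list[str] = []
    while queue:
        url, level = queue.popleft()
        visited.append(url)
        if level >= setup.max_levels:
            continue  # do not follow links beyond X levels
        for link in get_links(url):
            if link not in seen and in_scope(setup, link):
                seen.add(link)
                queue.append((link, level + 1))
    return visited
```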

JochenHWezel commented on September 13, 2024

The crawler should consider meta robots directives (index/noindex, follow/nofollow), /robots.txt rules, and the sitemap listed in /robots.txt.
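Python's standard `urllib.robotparser` already covers /robots.txt rules and `Sitemap:` lines, and meta robots directives can be read with `html.parser`. A minimal sketch, assuming the crawler has already fetched robots.txt and the page HTML; `load_robots`, `MetaRobotsParser` and `meta_robots` are hypothetical helper names.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that was fetched elsewhere."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = {k: (v or "") for k, v in attrs}
        if a.get("name", "").lower() == "robots":
            self.directives.update(
                d.strip().lower() for d in a.get("content", "").split(",") if d.strip()
            )


def meta_robots(html: str) -> tuple[bool, bool]:
    """Return (may_index, may_follow); "none" means noindex plus nofollow."""
    p = MetaRobotsParser()
    p.feed(html)
    p.close()
    d = p.directives
    return ("noindex" not in d and "none" not in d,
            "nofollow" not in d and "none" not in d)
```

`can_fetch(useragent, url)` answers the robots.txt question per URL, and `site_maps()` (Python 3.8+) returns the listed sitemap URLs, which could seed the sitemap scan.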

JochenHWezel commented on September 13, 2024

It might make sense to let indexing use a second database for big environments / heavy load, and to use the cwm DB only for small environments / light load.

It might also make sense to reuse other techniques/modules, such as Lucene or other engines.
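Keeping the index behind a small interface would allow switching between storage options later. A sketch with hypothetical names: an in-memory store and a separate SQLite database stand in for the light-load and heavy-load cases, and the size threshold is an arbitrary placeholder. A Lucene-based engine could, in principle, sit behind the same two methods.

```python
import sqlite3
from typing import Protocol


class IndexStore(Protocol):
    def add(self, word: str, url: str) -> None: ...
    def lookup(self, word: str) -> set[str]: ...


class InMemoryStore:
    """Light-load stand-in: postings kept in a dict."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = {}

    def add(self, word: str, url: str) -> None:
        self._postings.setdefault(word, set()).add(url)

    def lookup(self, word: str) -> set[str]:
        return set(self._postings.get(word, set()))


class SqliteStore:
    """Heavy-load stand-in: a separate database used only for the index."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS postings "
            "(word TEXT, url TEXT, PRIMARY KEY (word, url))"
        )

    def add(self, word: str, url: str) -> None:
        with self._conn:  # commits the insert
            self._conn.execute("INSERT OR IGNORE INTO postings VALUES (?, ?)", (word, url))

    def lookup(self, word: str) -> set[str]:
        rows = self._conn.execute("SELECT url FROM postings WHERE word = ?", (word,))
        return {url for (url,) in rows}


def choose_store(expected_documents: int, threshold: int = 100_000) -> IndexStore:
    """Pick a backend by expected size; the threshold is a placeholder value."""
    return SqliteStore() if expected_documents > threshold else InMemoryStore()
```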
