Comments (10)
Requires a plugin concept.
from cammintegrationportal.
Pipeline:
- crawler: scan the sitemap link list
- crawler: record data
- split the data into words
- use by the search form with a logical parser
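A minimal sketch of the pipeline stages, assuming a standard sitemaps.org XML sitemap for the scan step, a word-to-URL inverted index for the "split up into words" step, and a plain AND of all query words standing in for the logical parser. All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def scan_sitemap(sitemap_xml: str) -> list[str]:
    """Step 1: return the <loc> URLs listed in a sitemaps.org XML sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]

def split_into_words(text: str) -> list[str]:
    """Step 3: lower-cased word tokens (Unicode-aware \\w runs)."""
    return re.findall(r"\w+", text.lower())

def build_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Steps 2+3: map every word to the set of URLs whose record data contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in records.items():
        for word in split_into_words(text):
            index[word].add(url)
    return index

def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Step 4, simplified: AND of all query words (a real logical parser would replace this)."""
    words = split_into_words(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Using `index.get(...)` instead of `index[...]` in the search step avoids inserting empty entries into the `defaultdict` for unknown query words.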
from cammintegrationportal.
Apps might provide a crawler setup for the standard search index of the server group; additionally or alternatively, they might specify further search index names for an in-app search index.
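One possible shape for such a per-app setup, as a sketch only: the class, its fields, and the `standard:`/`app:` index naming scheme are hypothetical. It enforces "additionally or alternatively" by rejecting a setup that targets no index at all.

```python
from dataclasses import dataclass, field

@dataclass
class AppCrawlerSetup:
    """Hypothetical per-app crawler setup; field names are illustrative, not an existing API."""
    app_name: str
    start_urls: list[str]
    use_standard_index: bool = True  # feed the server group's standard search index
    additional_index_names: list[str] = field(default_factory=list)  # in-app search indexes

    def __post_init__(self) -> None:
        # "additionally or alternatively": at least one target index must be configured
        if not self.use_standard_index and not self.additional_index_names:
            raise ValueError(f"app {self.app_name!r} configures no search index")

    def target_indexes(self, server_group: str) -> list[str]:
        """Return the index names this app's crawl results should be written to."""
        targets = [f"standard:{server_group}"] if self.use_standard_index else []
        targets += [f"app:{self.app_name}:{name}" for name in self.additional_index_names]
        return targets
```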
from cammintegrationportal.
Pages crawled by the crawler provide the data required by the crawler setup as follows:
- sitemap as a link list in a non-visible div
- record data as ...???
- maybe an additional sitemap URL pointing to a special URL that provides the record data?
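Reading the link list from a non-visible div could look like the sketch below, using only the standard-library `html.parser`. The marker `id="crawler-sitemap"` is a hypothetical convention; the parser counts nested divs so that links in inner divs are still collected and links after the marker div closes are not.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class HiddenSitemapParser(HTMLParser):
    """Collects <a href> links inside <div id="crawler-sitemap"> (hypothetical marker id)."""

    def __init__(self, base_url: str, marker_id: str = "crawler-sitemap"):
        super().__init__()
        self.base_url = base_url
        self.marker_id = marker_id
        self.depth = 0  # > 0 while inside the marker div; counts nested divs
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1
            elif attrs.get("id") == self.marker_id:
                self.depth = 1
        elif tag == "a" and self.depth and attrs.get("href"):
            # resolve relative links against the crawled page's URL
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

def sitemap_links(html: str, base_url: str) -> list[str]:
    parser = HiddenSitemapParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```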
from cammintegrationportal.
The crawler tracks page status: on repeated page errors for a URL, it stops crawling that URL for a configured time.
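A minimal sketch of that status tracking, assuming "repeated" means a configurable number of consecutive errors and that a successful fetch resets the count. The class and parameter names are hypothetical; `now` is passed in explicitly (e.g. `time.time()` in practice) so the logic is easy to test.

```python
from dataclasses import dataclass, field

@dataclass
class PageStatusTracker:
    """Hypothetical per-URL error tracker with a timed crawl pause."""
    max_errors: int = 3            # consecutive errors before pausing a URL
    pause_seconds: float = 3600.0  # configured pause length
    _errors: dict[str, int] = field(default_factory=dict, init=False)
    _paused_until: dict[str, float] = field(default_factory=dict, init=False)

    def may_crawl(self, url: str, now: float) -> bool:
        return now >= self._paused_until.get(url, 0.0)

    def record_success(self, url: str) -> None:
        # a good response clears both the error count and any pause
        self._errors.pop(url, None)
        self._paused_until.pop(url, None)

    def record_error(self, url: str, now: float) -> None:
        count = self._errors.get(url, 0) + 1
        if count >= self.max_errors:
            self._paused_until[url] = now + self.pause_seconds
            count = 0  # start counting afresh after the pause
        self._errors[url] = count
```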
from cammintegrationportal.
The search page can be set up with a parser preset that requires the user's search text to be logically valid, or that closes nested logic levels automatically on demand.
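A small sketch of such a preset, assuming parentheses mark the logic levels and double quotes mark phrases (the comment does not fix the query syntax). In strict mode unclosed levels are rejected; with `auto_close=True` missing closers are appended. A stray `)` is always an error, since there is no safe way to guess where its `(` belongs. The function name is hypothetical.

```python
def check_logic_levels(query: str, auto_close: bool = False) -> str:
    """Validate nesting of (...) and "..." in a search query; optionally auto-close it."""
    depth = 0
    in_quotes = False
    for pos, ch in enumerate(query):
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue  # parentheses inside a quoted phrase are literal text
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError(f"unmatched ')' at position {pos}")
            depth -= 1
    if in_quotes:
        if not auto_close:
            raise ValueError("unclosed quoted phrase")
        query += '"'
    if depth:
        if not auto_close:
            raise ValueError(f"{depth} unclosed '(' level(s)")
        query += ")" * depth
    return query
```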
from cammintegrationportal.
The crawler should be able to crawl:
- docx
- xlsx (cells as formatted text)
- plain text
- html
- jpg metadata
- etc.
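These formats map naturally onto a plugin-style table of extractors keyed by file extension. The sketch below covers plain text, HTML, and docx with the standard library only: a docx file is a ZIP package whose body text sits in `w:t` elements of `word/document.xml`. xlsx (which needs shared strings and number formats for "formatted text") and jpg metadata are deliberately left out; the registry and function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from pathlib import PurePosixPath

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""
    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def html_to_text(data: bytes) -> str:
    p = _TextCollector()
    p.feed(data.decode("utf-8", errors="replace"))
    p.close()
    return " ".join(" ".join(p.parts).split())  # collapse whitespace

def docx_to_text(data: bytes) -> str:
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # one output line per paragraph (w:p), concatenating its text runs (w:t)
    return "\n".join("".join(t.text or "" for t in p.iter(W_NS + "t"))
                     for p in root.iter(W_NS + "p"))

def plain_to_text(data: bytes) -> str:
    return data.decode("utf-8", errors="replace")

EXTRACTORS = {".txt": plain_to_text, ".htm": html_to_text,
              ".html": html_to_text, ".docx": docx_to_text}

def extract_text(filename: str, data: bytes) -> str:
    ext = PurePosixPath(filename).suffix.lower()
    if ext not in EXTRACTORS:
        raise NotImplementedError(f"no extractor registered for {filename!r}")
    return EXTRACTORS[ext](data)
```

New formats would be added by registering another entry in `EXTRACTORS`, which is one way the plugin concept mentioned at the top could look.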
from cammintegrationportal.
CrawlerRequestSetup must configure:
- following links for X levels (e.g. as in Xenu)
- following links
  - within the nav URL folder (default)
  - within a custom list of URLs
  - everywhere (up to the limit of X levels)
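A sketch of depth-limited link following with the three scopes, written as a breadth-first search so each URL gets the lowest level at which it is reachable. The mode names, `allowed_prefixes`, and the `get_links` callback (standing in for fetching a page and extracting its links) are hypothetical; "folder" is taken to mean the directory of the start URL on the same scheme and host.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urlsplit

def in_folder(url: str, folder_url: str) -> bool:
    """True if url lies in the directory of folder_url (same scheme and host)."""
    u, f = urlsplit(url), urlsplit(folder_url)
    folder = f.path if f.path.endswith("/") else f.path.rsplit("/", 1)[0] + "/"
    return (u.scheme, u.netloc) == (f.scheme, f.netloc) and u.path.startswith(folder)

def crawl_scope(start_url: str, get_links: Callable[[str], Iterable[str]],
                max_levels: int, mode: str = "folder",
                allowed_prefixes: tuple[str, ...] = ()) -> dict[str, int]:
    """Return {url: level} for every URL reached within max_levels link hops."""
    def allowed(url: str) -> bool:
        if mode == "folder":
            return in_folder(url, start_url)
        if mode == "custom":
            return any(url.startswith(p) for p in allowed_prefixes)
        if mode == "everywhere":
            return True
        raise ValueError(f"unknown mode {mode!r}")

    levels = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if levels[url] >= max_levels:
            continue  # level limit reached: do not expand this page's links
        for link in get_links(url):
            if link not in levels and allowed(link):
                levels[link] = levels[url] + 1
                queue.append(link)
    return levels
```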
from cammintegrationportal.
The crawler should respect meta robots directives (index/noindex, follow/nofollow), /robots.txt rules, and the sitemap declared in /robots.txt.
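Python's standard `urllib.robotparser.RobotFileParser` already covers /robots.txt rules and `Sitemap:` lines (`site_maps()`, Python 3.8+), so a sketch only needs to add the meta robots tag. Note that `RobotFileParser` does plain prefix matching and does not implement wildcard rules. The helper names below are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that was already fetched by the crawler."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp

class _MetaRobots(HTMLParser):
    """Collects directives from <meta name="robots" content="...">."""
    def __init__(self):
        super().__init__()
        self.directives: set[str] = set()
    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            for d in (a.get("content") or "").split(","):
                if d.strip():
                    self.directives.add(d.strip().lower())

def meta_robots(html: str) -> tuple[bool, bool]:
    """Return (may_index, may_follow) for a page; 'none' means noindex + nofollow."""
    p = _MetaRobots()
    p.feed(html)
    p.close()
    may_index = not ({"noindex", "none"} & p.directives)
    may_follow = not ({"nofollow", "none"} & p.directives)
    return may_index, may_follow
```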
from cammintegrationportal.
It might make sense to have indexing use a second database for big environments/heavy load, and to use the CWM database only for small environments/light load.
It might make sense to reuse other technologies/modules such as Lucene or other engines.
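A sketch of the database split, with SQLite standing in for both the CWM database and the separate index database purely for illustration. The table layout, the `heavy_load_threshold` value, and the `open_index_db` factory are all hypothetical; the point is that the index code talks to one small store interface, so the backing database can be chosen per environment.

```python
import sqlite3
from typing import Callable

class IndexStore:
    """Word -> URL postings in a relational table (SQLite stands in for the real DB)."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        with conn:
            conn.execute("CREATE TABLE IF NOT EXISTS search_postings ("
                         "word TEXT NOT NULL, url TEXT NOT NULL, PRIMARY KEY (word, url))")

    def add(self, word: str, url: str) -> None:
        with self.conn:  # commit each insert; duplicates are ignored via the primary key
            self.conn.execute(
                "INSERT OR IGNORE INTO search_postings (word, url) VALUES (?, ?)", (word, url))

    def lookup(self, word: str) -> set[str]:
        rows = self.conn.execute("SELECT url FROM search_postings WHERE word = ?", (word,))
        return {url for (url,) in rows}

def open_index_store(expected_pages: int, cwm_conn: sqlite3.Connection,
                     open_index_db: Callable[[], sqlite3.Connection],
                     heavy_load_threshold: int = 100_000) -> IndexStore:
    """Use a separate index DB for big environments, the CWM DB otherwise."""
    if expected_pages >= heavy_load_threshold:
        return IndexStore(open_index_db())
    return IndexStore(cwm_conn)
```

Reusing an existing engine such as Lucene would replace `IndexStore` entirely rather than plug in behind it.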
from cammintegrationportal.