
alfresco-indexer's Introduction

alfresco-indexer

What is it?

Alfresco Indexer is an API for indexing content stored in Alfresco when you want and how you want, selecting only the content you're interested in.

Compatibility Matrix

Alfresco Indexer version | (shipped with) ManifoldCF version | (tested with) Alfresco edition/version
0.7.x | 1.8.0 to 2.2.0-RC0 | Community 5.0.[a,b,c,d], Enterprise 4.2.x
0.8.x | trunk (master) - WIP | Community 5.0.d, Enterprise 5.0.x

Community 5.1.[a,b,c]-EA is work in progress (add issue link). There may be other permutations that work but haven't been tested.

Run Tests

git clone git@github.com:maoo/alfresco-indexer.git
cd alfresco-indexer
mvn clean install -DskipTests
cd alfresco-indexer-webscripts-war
mvn clean integration-test

To build master and test it against ManifoldCF, follow these instructions.

Project Structure

  • Alfresco Indexer Webscripts - A server-side component (an AMP that needs to be installed in Alfresco) that exposes a set of Webscripts on the Alfresco Repository
  • Alfresco Indexer Client - A Java API that wraps HTTP invocations to Alfresco Indexer Webscripts and exposes a simple client interface to interact with Alfresco content; the most important methods are:
/**
* Fetches nodes from Alfresco which have changed since the provided transaction and ACL changeset ids.
*
* @param lastTransactionId
*         the id of the last transaction already indexed; it can be considered a "startFrom" param
* @param lastAclChangesetId
*         the id of the last ACL changeset already indexed; it can be considered a "startFrom" param
* @param filters
*         the filters to apply to the fetched nodes
* @return an {@link AlfrescoResponse}
* @throws AlfrescoDownException
*/
AlfrescoResponse fetchNodes(long lastTransactionId, long lastAclChangesetId, AlfrescoFilters filters) throws AlfrescoDownException;

/**
* Fetches Node Info from Alfresco for a given node.
* @param nodeUuid the UUID for the node
* @return an {@link AlfrescoResponse}
* @throws AlfrescoDownException
*/
AlfrescoResponse fetchNode(String nodeUuid) throws AlfrescoDownException;

/**
* Fetches metadata from Alfresco for a given node.
* @param nodeUuid
*        the UUID for the node
* @return a map with metadata created from a json object
*/
Map<String, Object> fetchMetadata(String nodeUuid) throws AlfrescoDownException;
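
A consumer of fetchNodes would typically call it in a loop, feeding the last transaction and ACL changeset ids it has seen into the next call. The sketch below illustrates that cursor pattern only. `ChangeSource`, `ChangeBatch` and `IncrementalCrawl` are hypothetical stand-ins, not the library's types, and the stopping rule (stop when neither id advances) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a response; not the library's AlfrescoResponse.
final class ChangeBatch {
    final List<String> nodeUuids;
    final long lastTransactionId;
    final long lastAclChangesetId;

    ChangeBatch(List<String> nodeUuids, long lastTransactionId, long lastAclChangesetId) {
        this.nodeUuids = nodeUuids;
        this.lastTransactionId = lastTransactionId;
        this.lastAclChangesetId = lastAclChangesetId;
    }
}

// Hypothetical stand-in mirroring the shape of fetchNodes (filters omitted).
interface ChangeSource {
    ChangeBatch fetchNodes(long lastTransactionId, long lastAclChangesetId);
}

final class IncrementalCrawl {
    // Pulls batches until neither cursor advances, returning every UUID seen.
    static List<String> drain(ChangeSource source, long startTxnId, long startAclId) {
        List<String> seen = new ArrayList<>();
        long txnId = startTxnId;
        long aclId = startAclId;
        while (true) {
            ChangeBatch batch = source.fetchNodes(txnId, aclId);
            seen.addAll(batch.nodeUuids);
            boolean advanced = batch.lastTransactionId > txnId
                    || batch.lastAclChangesetId > aclId;
            if (!advanced) {
                return seen; // nothing newer on the server side
            }
            txnId = Math.max(txnId, batch.lastTransactionId);
            aclId = Math.max(aclId, batch.lastAclChangesetId);
        }
    }
}
```

A real caller would persist the two ids between runs so that the next crawl starts where the previous one stopped.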

Differences with Alfresco-Solr integration

The software architecture of Alfresco Indexer is the same as that of the Alfresco-Solr integration:

  • A collection of webscripts (accessible via /alfresco/api/solr/* endpoints) that track transaction and ACL change events on the Alfresco side
  • A Java client that interacts with webscripts and updates Apache Solr indexes

Nevertheless, the following differences can be noted:

  • Alfresco Indexer Webscripts are delivered as an AMP; they are not part of the core Alfresco code, unlike the Alfresco-Solr integration
  • Alfresco Indexer is an unsupported, experimental community effort; the Alfresco-Solr integration is stable and supported by Alfresco
  • Alfresco Indexer is agnostic to the search engine you adopt, unlike the Alfresco-Solr integration; the Alfresco ManifoldCF Connector is a great example of how to use Alfresco with other search engines (e.g. Elasticsearch)
  • The Alfresco-Solr integration maintains 2 isolated index structures for transactions and changesets; Alfresco Indexer maintains 1 index structure with one index entry per Alfresco node, containing a list of readable authorities (readableAuthorities); as a result:
    1. The Alfresco-Solr integration is slower at query time, since document index entries must be cross-referenced with ACL index entries to determine which documents are accessible to the current user
    2. Alfresco Indexer triggers a re-indexing of all nodes whose ACL changes; a change to /app:Company_Home would trigger a full re-indexing; on the other hand, it doesn't need complex query logic to implement authorisation query parsers for the search engine of your choice
  • Alfresco Indexer does not provide any search engine query parser, unlike the Alfresco-Solr integration, which delivers CMISQL, FTS and Lucene query predicates to implement advanced query capabilities; this makes Alfresco Indexer unsuitable for integration with Alfresco clients that rely on these search capabilities, such as Alfresco Share

To summarise, advantages of using Alfresco Indexer:

  • A simplified search index structure, which eases integration of Alfresco indexing with existing search engines and index data structures
  • Authorization checks are implemented by query parsers that add security constraints to a given query; no post-processing or data joining is involved during query execution
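
Because each index entry carries its own readableAuthorities list, a security constraint can be a single extra clause appended to the user's query. The sketch below shows one way to build such a clause. The Lucene-style syntax, the class name `AuthorityConstraint` and its methods are assumptions for illustration; the actual syntax depends on the search engine you use.

```java
import java.util.ArrayList;
import java.util.List;

final class AuthorityConstraint {
    // Builds a clause matching entries readable by any of the user's authorities.
    // The field name comes from the index entry; the query syntax is an assumption.
    static String clause(List<String> userAuthorities) {
        if (userAuthorities.isEmpty()) {
            throw new IllegalArgumentException("at least one authority is required");
        }
        List<String> quoted = new ArrayList<>();
        for (String authority : userAuthorities) {
            // Escape backslashes and double quotes so the value stays inside its quotes.
            quoted.add("\"" + authority.replace("\\", "\\\\").replace("\"", "\\\"") + "\"");
        }
        return "readableAuthorities:(" + String.join(" OR ", quoted) + ")";
    }

    // Combines the user's query with the security constraint.
    static String secure(String userQuery, List<String> userAuthorities) {
        return "(" + userQuery + ") AND " + clause(userAuthorities);
    }
}
```

For a user `alice` who also belongs to GROUP_EVERYONE, `secure("title:report", ...)` yields `(title:report) AND readableAuthorities:("GROUP_EVERYONE" OR "alice")`.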

Disadvantages of using Alfresco Indexer:

  • If the ACL of a node changes, all other nodes that inherit from it are re-indexed as well, including node properties and content
  • Alfresco query parsers cannot be used with this solution, so Alfresco Share won't work out of the box
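
To get a feel for the cost of the first disadvantage, the sketch below computes which nodes would need re-indexing after an ACL change. It takes a hypothetical map from each node to the children that inherit its ACL. `AclImpact` and the map structure are illustrative assumptions, not how the webscripts compute the set.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AclImpact {
    // Returns the changed node plus every descendant that inherits its ACL,
    // in breadth-first order.
    static Set<String> nodesToReindex(String changedNode,
                                      Map<String, List<String>> inheritingChildren) {
        Set<String> affected = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changedNode);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (affected.add(node)) { // skip nodes already visited
                queue.addAll(inheritingChildren.getOrDefault(node, Collections.emptyList()));
            }
        }
        return affected;
    }
}
```

The closer the changed node is to the root of the repository, the larger the returned set, which is why a change to /app:Company_Home amounts to a full re-indexing.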

Configuration

Alfresco Indexer Webscripts can be configured to tweak the indexing process; in alfresco-global.properties you can override the following default parameters.

Url Prefixes

indexer.properties.url.prefix = http://localhost:8080/alfresco/service/node/details
indexer.document.url.prefix = http://localhost:8080/alfresco/service/slingshot/node
indexer.content.url.prefix = http://localhost:8080/alfresco/service
indexer.share.url.prefix = http://localhost:8888/share
indexer.preview.url.prefix = http://localhost:8080/alfresco/service
indexer.thumbnail.url.prefix = http://localhost:8080/alfresco/service
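
As an illustration of how such a prefix might be combined with a node reference, the sketch below joins a prefix with the store protocol, store id and node UUID. The `prefix/protocol/storeId/uuid` layout is inferred from the request paths ManifoldCF logs (for example /alfresco/service/node/details/workspace/SpacesStore/...). `NodeUrls` is a hypothetical helper, and the webscripts may assemble URLs differently.

```java
final class NodeUrls {
    // Joins a configured prefix with the store protocol, store id and node UUID.
    // A trailing slash on the prefix is tolerated.
    static String detailsUrl(String prefix, String storeProtocol, String storeId, String uuid) {
        String base = prefix.endsWith("/") ? prefix.substring(0, prefix.length() - 1) : prefix;
        return base + "/" + storeProtocol + "/" + storeId + "/" + uuid;
    }
}
```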

Node Changes paging parameters

indexer.changes.nodesperacl=10
indexer.changes.nodespertxn=10

Node Changes allowed Node Types (whitelist)

indexer.changes.allowedTypes={http://www.alfresco.org/model/content/1.0}content,{http://www.alfresco.org/model/content/1.0}folder

Other examples of allowed types:

{http://www.alfresco.org/model/forum/1.0}topic
{http://www.alfresco.org/model/forum/1.0}post
{http://www.alfresco.org/model/content/1.0}person
{http://www.alfresco.org/model/content/1.0}link
{http://www.alfresco.org/model/calendar}calendar
{http://www.alfresco.org/model/calendar}calendarEvent
{http://www.alfresco.org/model/datalist/1.0}dataList
{http://www.alfresco.org/model/datalist/1.0}dataListItem (includes all sub-types, such as dl:task, dl:event and dl:issue)
{http://www.alfresco.org/model/blogintegration/1.0}blogDetails
{http://www.alfresco.org/model/blogintegration/1.0}blogPost
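
The whitelist value is a comma-separated list of `{namespace}localName` entries. The sketch below shows one way to split and sanity-check such a value before using it. `AllowedTypes` is a hypothetical helper, not part of Alfresco Indexer, and the validation rules are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class AllowedTypes {
    // Splits "{namespace}localName,{namespace}localName" into trimmed, validated entries.
    static List<String> parse(String propertyValue) {
        List<String> types = new ArrayList<>();
        for (String raw : propertyValue.split(",")) {
            String type = raw.trim();
            if (type.isEmpty()) {
                continue; // tolerate stray commas
            }
            int close = type.indexOf('}');
            // Require a non-empty namespace in braces followed by a non-empty local name.
            if (!type.startsWith("{") || close < 2 || close == type.length() - 1) {
                throw new IllegalArgumentException("not a {namespace}localName: " + type);
            }
            types.add(type);
        }
        return types;
    }

    // Returns the part after the closing brace, e.g. "folder".
    static String localName(String qualifiedType) {
        return qualifiedType.substring(qualifiedType.indexOf('}') + 1);
    }
}
```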

Binaries

Alfresco Indexer binaries are available on Maven Central; to use Alfresco Indexer with Apache Maven, simply add the following dependency to your pom.xml file:

  <dependency>
      <groupId>com.github.maoo.indexer</groupId>
      <artifactId>alfresco-indexer-client</artifactId>
      <version>0.8.0</version>
  </dependency>

Release

Before releasing, make sure you can upload artifacts to Maven Central:

mvn deploy -Pgpg

If everything goes fine, make sure you're up to date with git master and run the release commands:

git status
netstat -anl | grep 8080 #make sure local port 8080 is free
mvn clean -Ppurge
mvn release:prepare release:perform

Follow the Sonatype docs to set up your environment.

Credits

This project has been developed by

License

Please see the file LICENSE.md for the copyright licensing conditions attached to this codebase.

alfresco-indexer's People

Contributors

maoo


alfresco-indexer's Issues

Unable to launch alfresco after applying alfresco-indexer-webscripts

Hi,
I cannot properly start the Alfresco server after applying the alfresco-indexer-webscripts AMP file. Alfresco gives the following error:

2015-10-23 10:34:09,115 ERROR [web.context.ContextLoader] [localhost-startStop-1] Context initialization failed
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'indexingSqlSessionFactory' defined in class path resource [alfresco/module/alfresco-indexer-webscripts/context/service-context.xml]: Error setting property values; nested exception is org.springframework.beans.NotWritablePropertyException: Invalid property 'useLocalCaches' of bean class [org.alfresco.ibatis.HierarchicalSqlSessionFactoryBean]: Bean property 'useLocalCaches' is not writable or has an invalid setter method. Does the parameter type of the setter match the return type of the getter?
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1454)

Simply commenting out the 'useLocalCaches' property in service-context.xml fixes this:

<property name="resourceLoader" ref="dialectResourceLoader"/>
<property name="dataSource" ref="dataSource"/>
<property name="configLocation">
  <value>classpath:alfresco/ibatis/alfresco-indexing-SqlMapConfig.xml</value>
</property>

However, I am not sure whether this properly fixes the issue or whether it will introduce new ones.

I am using Alfresco Enterprise Edition 5.0.1.

I would greatly appreciate it if you could provide a proper solution for this issue.

Thanks

Does the Alfresco-indexer plugin in Apache ManifoldCF support Alfresco Community 7.1?

I am using alfresco-indexer-webscripts-0.8.1.amp, which came with apache-manifoldcf-2.21-bin.tar.gz (Jan 3, 2022), downloaded from http://www.apache.org/dyn/closer.lua/manifoldcf/apache-manifoldcf-2.21/apache-manifoldcf-2.21-bin.tar.gz, with an Alfresco 7.1 Docker setup created from https://github.com/Alfresco/alfresco-docker-installer.

I placed alfresco-indexer-webscripts-0.8.1.amp like this:

[screenshots 02 and 03 not available]

Then I built the Alfresco repository with the command docker-compose build alfresco, and everything was OK.

But when I run Alfresco with the command docker-compose up -d, it fails with an error like this:

[screenshots 04 and 05 not available]

I'm not sure whether alfresco-indexer supports Alfresco Community 7.1. Please advise how I can fix this. Thank you.

Changes to document content/properties and filter configuration not working with Alfresco 5.0.d

  1. Changes to document content or properties do not cause the same document to be picked up by the Alfresco connector on the next run
  2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up

IN DETAIL

  1. Failing to pick up modified content

Looking at the log files (which are set to debug) I can see that, upon the first crawl of Alfresco, Manifold sends the following requests:

DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1[\r][\n]"

This picks up all of the content e.g. documents.

Running a second crawl, without any other actions being done, results in the following requests:

DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1[\r][\n]"

So I can see that, in the first instance, we are targeting content directly while, in the second, we are asking for changes. The problem is that no changes are returned from the second set of requests. The response from these calls is:

DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "totalNodes" : "0", [\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "elapsedTime" : "8",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "docs" : [[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " ],[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "last_txn_id" : "352",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "last_acl_changeset_id" : "13",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "store_id" : "SpacesStore",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "store_protocol" : "workspace"[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"

Regardless of what changes I make to a document that I have been using for testing, the document is not updated. The response from the calls for changes (totalNodes) is always ‘0’.
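
For anyone who wants to replay the changes request outside ManifoldCF (for example with curl or a browser), the sketch below rebuilds the request path seen in the log from its parts. `ChangesUrl` is a hypothetical helper written for this purpose, not part of the connector, and the hard-coded workspace/SpacesStore segment is simply copied from the log.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class ChangesUrl {
    // Rebuilds the node-changes request path, URL-encoding the JSON filters.
    static String build(String serviceBase, long lastTxnId, long lastAclChangesetId,
                        String filtersJson) {
        try {
            return serviceBase + "/node/changes/workspace/SpacesStore"
                    + "?lastTxnId=" + lastTxnId
                    + "&lastAclChangesetId=" + lastAclChangesetId
                    + "&indexingFilters=" + URLEncoder.encode(filtersJson, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `build("/alfresco/service", 333, 13, ...)` with the decoded filter JSON `{"siteFilters":["Finance"],"typeFilters":[],"mimetypeFilters":[],"aspectFilters":[],"metadataFilters":{}}` reproduces the path logged above. Lowering lastTxnId by hand is a quick way to check whether the webscript returns anything at all.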

  2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up

Within my test Alfresco environment I have one site set up (Finance). Within the Finance doc library I have three test docs. No other changes have been made to the Alfresco instance.
Running a crawl with no filter configurations set returns 81 items. This is via the URL in a browser.
If I then set the Site Filter configuration to ‘Finance’ and apply, I still get 81 items when I re-run the crawl.
I can see that the term ‘Finance’ is being added to the URL but this does not seem to change the behaviour.

Text Extracting

Hi,

ManifoldCF uses the extract update handler to handle binary content: the binary content is sent to Solr, and Tika tries to extract the text content and some metadata (such as the MIME type).

For the Alfresco connector, Alfresco itself should be used to convert binary content to text, as the official Solr integration does (by calling NodeContentGet), because Alfresco already knows how to convert documents to text.

However, the NodeContentGet webscript is protected by a certificate, so this webscript has to be cloned.

(original issue - philipmeadows/alfresco-webscript-manifold-connector#21 by @alexist )

Unable to use an account that isn't an administrator

Hi,

When connecting to Alfresco from the ManifoldCF interface, it asks for a username/password. Whenever this account isn't a member of the ALFRESCO_ADMINISTRATORS group, the connection fails and the following shows up in the logs:

ERROR 2018-01-08T10:59:03,081 (qtp638169719-446) - Json response is missing username.
com.github.maoo.indexer.client.AlfrescoParseException: Json response is missing username.
        at com.github.maoo.indexer.client.WebScriptsAlfrescoClient.getUsername(WebScriptsAlfrescoClient.java:305) ~[alfresco-indexer-client-0.8.1.jar:?]
        at com.github.maoo.indexer.client.WebScriptsAlfrescoClient.getUser(WebScriptsAlfrescoClient.java:298) ~[alfresco-indexer-client-0.8.1.jar:?]
        at com.github.maoo.indexer.client.WebScriptsAlfrescoClient.userFromHttpEntity(WebScriptsAlfrescoClient.java:289) ~[alfresco-indexer-client-0.8.1.jar:?]
        at com.github.maoo.indexer.client.WebScriptsAlfrescoClient.fetchUserAuthorities(WebScriptsAlfrescoClient.java:352) ~[alfresco-indexer-client-0.8.1.jar:?]
        at org.apache.manifoldcf.crawler.connectors.alfrescowebscript.AlfrescoConnector.check(AlfrescoConnector.java:133) [mcf-alfresco-webscript-connector.jar:?]
        at org.apache.jsp.viewconnection_jsp._jspService(viewconnection_jsp.java:249) [jsp/:?]
        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) [jasper-6.0.35.jar:6.0.35]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388) [jasper-6.0.35.jar:6.0.35]
        at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313) [jasper-6.0.35.jar:6.0.35]
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260) [jasper-6.0.35.jar:6.0.35]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) [jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:595) [jetty-security-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) [jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.Dispatcher.forward(Dispatcher.java:191) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.Dispatcher.forward(Dispatcher.java:72) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.apache.jasper.runtime.PageContextImpl.doForward(PageContextImpl.java:709) [jasper-6.0.35.jar:6.0.35]
        at org.apache.jasper.runtime.PageContextImpl.forward(PageContextImpl.java:680) [jasper-6.0.35.jar:6.0.35]
        at org.apache.jsp.execute_jsp._jspService(execute_jsp.java:368) [jsp/:?]
        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) [jasper-6.0.35.jar:6.0.35]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388) [jasper-6.0.35.jar:6.0.35]
        at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313) [jasper-6.0.35.jar:6.0.35]
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260) [jasper-6.0.35.jar:6.0.35]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) [jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) [jetty-security-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) [jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.Server.handle(Server.java:497) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) [jetty-io-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610) [jetty-util-9.2.3.v20140905.jar:9.2.3.v20140905]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539) [jetty-util-9.2.3.v20140905.jar:9.2.3.v20140905]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

Would it be possible to use another user without administrative access? Where needed, I'm guessing this could be handled with a runAs, for example... Since this is all happening on the Alfresco side anyway, I don't think there would be an issue doing something like that.

Regards,
Morgan

IndexingService: Types Not Reset

Because the IndexingService is a singleton and filters are set per request, they must be reset with each set of requests to the indexing service. 4 of the filters are being reset to empty lists/maps, but the types are not being reset.

Allow for Site or Path filtering

It seems like site filtering is part of the plan. This could be implemented just as easily as path filtering. Either way, you need to trace the ancestral path. Path based filtering would be more powerful for the client too. If they are only looking to index or track certain folders, they may do so.
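
A minimal sketch of the check itself, assuming the ancestral path has already been resolved into a slash-separated string. `PathFilter` and its method are hypothetical names, and an empty filter list is assumed to mean "accept everything".

```java
import java.util.List;

final class PathFilter {
    // True when nodePath equals one of the allowed folders or lies beneath it.
    static boolean accepts(String nodePath, List<String> allowedFolders) {
        if (allowedFolders.isEmpty()) {
            return true; // no path filter configured
        }
        for (String folder : allowedFolders) {
            // Compare on whole path segments so "/a/finance" does not match "/a/finance-old".
            String prefix = folder.endsWith("/") ? folder : folder + "/";
            if ((nodePath + "/").startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Site filtering then becomes a special case in which the allowed folder is the site's root path.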

Test with Alfresco 5.1.c-EA

Currently mvn is failing with compilation errors:

[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /Users/mau/w/alf/alfresco-indexer/alfresco-indexer-webscripts/src/test/java/com/github/maoo/indexer/webscripts/NodeWebScriptTest.java:[4,37] cannot find symbol
  symbol:   class BaseWebScriptTest
  location: package org.alfresco.repo.web.scripts
[ERROR] /Users/mau/w/alf/alfresco-indexer/alfresco-indexer-webscripts/src/test/java/com/github/maoo/indexer/webscripts/NodeWebScriptTest.java:[20,40] cannot find symbol
  symbol: class BaseWebScriptTest
[ERROR] /Users/mau/w/alf/alfresco-indexer/alfresco-indexer-webscripts/src/test/java/com/github/maoo/indexer/webscripts/NodeWebScriptTest.java:[36,37] cannot find symbol
  symbol:   variable super
  location: class com.github.maoo.indexer.webscripts.NodeWebScriptTest
[ERROR] /Users/mau/w/alf/alfresco-indexer/alfresco-indexer-webscripts/src/test/java/com/github/maoo/indexer/webscripts/NodeWebScriptTest.java:[37,47] cannot find symbol
  symbol:   variable super

site level permission changes from public to private not detected

Hi Maurizio,

When we change an initially public site to a private one, the permission change is not properly reflected by the node details webscript. The issue is that, even after the public site is changed to private, Alfresco still grants base.ReadPermissions to GROUP_EVERYONE, whereas site.SiteConsumer, which was originally granted to GROUP_EVERYONE, changes accordingly. This could be an Alfresco issue.

 "readableAuthorities" : [
"GROUP_EVERYONE"
,
"GROUP_site_newsite2_SiteManager"
,
"GROUP_site_newsite2_SiteCollaborator"
,
"GROUP_site_newsite2_SiteContributor"
,
"GROUP_site_newsite2_SiteConsumer"

],

should change to

"readableAuthorities" : [
"GROUP_site_newsite2_SiteManager"
,
"GROUP_site_newsite2_SiteCollaborator"
,
"GROUP_site_newsite2_SiteContributor"
,
"GROUP_site_newsite2_SiteConsumer"

],

Thanks,
Chalitha

IndexingService Not Thread Safe

I am assuming that the Indexing Service DAO is a singleton, but even if it isn't, there is certainly only one instance with respect to the node change webscript. This webscript is using the DAO as a state-machine (set, set, set, set, execute). This means that if there are two concurrent requests, they will stomp on each other.
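
One possible direction, sketched below, is to replace the set, set, set, execute sequence with a single call that takes an immutable request object. A shared instance then holds no per-request state. `ChangeRequest` and `StatelessIndexingDao` are hypothetical names and fields, not the project's classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable request object; field names are illustrative only.
final class ChangeRequest {
    final long lastTxnId;
    final List<String> siteFilters;

    ChangeRequest(long lastTxnId, List<String> siteFilters) {
        this.lastTxnId = lastTxnId;
        // Defensive copy so later changes by the caller cannot leak in.
        this.siteFilters = Collections.unmodifiableList(new ArrayList<>(siteFilters));
    }
}

final class StatelessIndexingDao {
    // All inputs arrive as arguments, so concurrent callers cannot overwrite
    // each other's filters even when this instance is a singleton.
    String describe(ChangeRequest request) {
        return "txn>" + request.lastTxnId + " sites=" + request.siteFilters;
    }
}
```

The same shape would also avoid stale filters left over from a previous request, because nothing has to be reset between calls.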

Check method improvement

We can provide a simple checkConnection in the Alfresco Indexer Client:

/**
* This method verifies that we have an effective connection to Alfresco, throwing an exception with the proper details otherwise
* @return true if the connection is effective
* @throws AlfrescoConnectionException
*/
@Override
public boolean checkConnection() throws AlfrescoConnectionException
{…}

This method can be used by ManifoldCF to provide the repositoryConnector check, keeping track of the error messages.

Wrong authoritiesUrl in WebScriptsAlfrescoClient

On line https://github.com/maoo/alfresco-indexer/blob/master/alfresco-indexer-client/src/main/java/com/github/maoo/indexer/client/WebScriptsAlfrescoClient.java#L90 the URL is

authoritiesUrl = String.format("%s://%s%s/api/node/auth/resolve/", protocol, hostname, endpoint);

whereas it should be

authoritiesUrl = String.format("%s://%s%s/auth/resolve/", protocol, hostname, endpoint);

as specified in https://github.com/maoo/alfresco-indexer/blob/master/alfresco-indexer-webscripts/src/main/amp/config/alfresco/extension/templates/webscripts/com/github/maoo/indexer/webscripts/authresolve.get.desc.xml#L4

Also interesting to notice that one test is updated and another is not

Tests need to be verified too.
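
To make the difference concrete, the sketch below evaluates both format strings with sample values. The class `AuthoritiesUrl` and the sample values (http, localhost:8080, /alfresco/service) are assumptions for illustration.

```java
final class AuthoritiesUrl {
    // Format string as currently found in WebScriptsAlfrescoClient.
    static String current(String protocol, String hostname, String endpoint) {
        return String.format("%s://%s%s/api/node/auth/resolve/", protocol, hostname, endpoint);
    }

    // Format string matching the path declared in authresolve.get.desc.xml.
    static String proposed(String protocol, String hostname, String endpoint) {
        return String.format("%s://%s%s/auth/resolve/", protocol, hostname, endpoint);
    }
}
```

With those sample values, `current` produces http://localhost:8080/alfresco/service/api/node/auth/resolve/ while `proposed` produces http://localhost:8080/alfresco/service/auth/resolve/.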

Commit rights for maoo

I'm not sure whether I have access, but I'd like to ask whoever administers this project to grant me (@maoo) commit rights, so that I can keep merging PRs and continue to fulfil my role in the Apache ManifoldCF project.

Thanks!
