
Comments (3)

themowski commented on June 10, 2024

Thanks for the quick response! Given the dev team's constraints on time & resources, it makes sense, even if it's not what I was hoping to hear. It sounds like a PR for this would be declined, so I'm going to close this issue.

For anyone else who finds this and needs a workaround until the upcoming search overhaul lands: I was able to enable q.op support without modifying the upstream CKAN source, by adding code at the top of my plugin that adds "q.op" to ckan.lib.search.query.VALID_SOLR_PARAMETERS. This isn't particularly robust, but it gets the job done. Here is the complete example plugin code:

import logging
from typing import Any, Dict
from ckan import plugins


logger = logging.getLogger(__name__)

# Add 'q.op' to VALID_SOLR_PARAMETERS
try:
    logger.info("Adding 'q.op' to ckan.lib.search.query.VALID_SOLR_PARAMETERS")
    import ckan.lib.search.query as ckan_query
    ckan_query.VALID_SOLR_PARAMETERS.add("q.op")
    logger.info("Successfully added 'q.op' to VALID_SOLR_PARAMETERS")
except Exception as err:
    raise RuntimeError("Patch to add 'q.op' support to ckan.lib.search.query.VALID_SOLR_PARAMETERS encountered an error") from err

# Plugin code
class MyPlugin(plugins.SingletonPlugin):
    plugins.implements(plugins.IPackageController, inherit=True)

    def before_dataset_search(self, search_params: Dict[str, Any]) -> Dict[str, Any]:
        # Use the Extended DisMax query parser. This combines the Standard
        # parser's syntax with the customizability of the DisMax parser.
        #   https://solr.apache.org/guide/solr/latest/query-guide/edismax-query-parser.html
        search_params["defType"] = "edismax"

        # Set the default query operator (q.op) to OR
        search_params["q.op"] = "OR"
        
        # Override CKAN's overly restrictive default "mm" value by
        # setting it to "0%", which is the default for an "OR" query.
        #   https://solr.apache.org/guide/solr/latest/query-guide/edismax-query-parser.html#extended-dismax-parameters
        search_params["mm"] = "0%"
        
        return search_params


smotornyuk commented on June 10, 2024

No, it wasn't an oversight; this setdefault('q.op') is intentional. CKAN has used this default value of q.op for more than 10 years (possibly ever since Solr support was added). It was initially set to AND because users were more likely to expect results that match all of the terms in their query.
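
To make the mechanism concrete, here is a simplified Python model (not CKAN's actual code) of an allow-list check combined with setdefault('q.op', 'AND'). It assumes, as the workaround comment above implies, that parameters outside VALID_SOLR_PARAMETERS are rejected before the default is applied. VALID_PARAMS, SearchParamError, and prepare_query are hypothetical names, and VALID_PARAMS is only a small illustrative subset.

```python
from typing import Any, Dict, Iterable

# Hypothetical subset of ckan.lib.search.query.VALID_SOLR_PARAMETERS;
# the real set contains more entries.
VALID_PARAMS = {"q", "fq", "rows", "defType", "mm", "df"}


class SearchParamError(ValueError):
    """Raised for parameters outside the allow-list (hypothetical name)."""


def prepare_query(params: Dict[str, Any],
                  valid: Iterable[str] = VALID_PARAMS) -> Dict[str, Any]:
    query = dict(params)  # copy so the caller's dict is not mutated
    invalid = set(query) - set(valid)
    if invalid:
        raise SearchParamError(f"Invalid search parameters: {sorted(invalid)}")
    # setdefault only fills in q.op when the caller did not supply one,
    # but with the default allow-list a caller-supplied q.op is rejected above.
    query.setdefault("q.op", "AND")
    return query


if __name__ == "__main__":
    print(prepare_query({"q": "water quality"})["q.op"])  # AND
    try:
        prepare_query({"q": "water quality", "q.op": "OR"})
    except SearchParamError as err:
        print(err)  # Invalid search parameters: ['q.op']
    # Extending the allow-list (what the plugin patch does) lets q.op through.
    print(prepare_query({"q": "a b", "q.op": "OR"}, VALID_PARAMS | {"q.op"})["q.op"])  # OR
```

In this model, the AND default is never really a "default" from a caller's perspective: it is the only value that can reach Solr until the allow-list is extended.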

If you want to use OR, you are an "advanced" CKAN search user, and CKAN search switches into advanced mode as soon as the query contains at least one ":". So instead of a OR b, you can use text:a OR b. text is the default search field; it contains the combined values of the meaningful dataset fields.
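
A minimal client-side sketch of this workaround, assuming the standard CKAN Action API endpoint /api/3/action/package_search and its q parameter. The helper names or_query and package_search_url and the base URL are hypothetical. The comment above notes that a single ":" is enough to switch modes; the sketch prefixes every term with text: and quotes it, which is more explicit than strictly necessary.

```python
from typing import Iterable
from urllib.parse import urlencode


def or_query(terms: Iterable[str], field: str = "text") -> str:
    """Build an explicit fielded OR query such as text:"a" OR text:"b"."""
    parts = []
    for term in terms:
        # Escape backslashes and double quotes so each term stays one quoted value.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{field}:"{escaped}"')
    return " OR ".join(parts)


def package_search_url(base_url: str, q: str) -> str:
    """Return a package_search URL with q form-encoded (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode({'q': q})}"


if __name__ == "__main__":
    q = or_query(["a", "b"])
    print(q)  # text:"a" OR text:"b"
    print(package_search_url("https://ckan.example.org/", q))
```

Because the resulting q contains a ":", it takes the advanced-mode path described above, and the explicit OR between clauses applies regardless of the q.op default.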

Technically, we could add q.op to the list of allowed parameters, but we are currently working on support for search engines other than Solr, and q.op will not be added to the list until that work is finished. After that, this parameter will probably get a more self-describing name.
So a PR that adds q.op only for it to be removed soon will not be approved, especially since there is a workaround (adding a ":" to q).


themowski commented on June 10, 2024

After playing with this some more, I found that there's a potential bug in the upstream code that prevents q.op = OR from working correctly.

The function _read() in ckan/views/group.py sets up a filter query that is used when viewing the page for an Organization or Group, which normally shows the Org/Group's details and the datasets it contains. However, the filter query ends up being a "complicated Boolean query" (to use the terminology in Solr's documentation), and the clause that actually restricts results to the Org/Group omits the + operator that would make the Org/Group ID required. As a result, when q.op = OR, that clause is merely optional, and all datasets end up being shown. This doesn't happen when q.op = AND, because AND already treats every clause without an operator as required.

Adding the + operator when the filter query is initialized with the Org/Group ID fixes the issue when q.op = OR.

diff --git a/ckan/views/group.py b/ckan/views/group.py
index 3d901d162..36e759470 100644
--- a/ckan/views/group.py
+++ b/ckan/views/group.py
@@ -232,9 +232,9 @@ def _read(id: Optional[str], limit: int, group_type: str) -> dict[str, Any]:
 
     # Search within group
     if g.group_dict.get(u'is_organization'):
-        fq = u' owner_org:"%s"' % g.group_dict.get(u'id')
+        fq = u' +owner_org:"%s"' % g.group_dict.get(u'id')
     else:
-        fq = u' groups:"%s"' % g.group_dict.get(u'name')
+        fq = u' +groups:"%s"' % g.group_dict.get(u'name')
 
     extra_vars["q"] = q
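
The effect of the missing + can be reproduced with a small pure-Python model of Lucene-style Boolean matching (a simplified sketch, not Solr's implementation). It assumes the final filter combines the unprefixed org clause with at least one other +-prefixed clause added elsewhere; the +dataset_type:dataset clause, the org IDs, and the sample documents are hypothetical. The rule being modeled: once a query has a required (MUST) clause, optional (SHOULD) clauses no longer restrict which documents match.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Clause:
    occur: str  # "MUST", "SHOULD", or "MUST_NOT"
    field: str
    value: str


def parse_fq(fq: str, q_op: str = "AND") -> List[Clause]:
    """Parse whitespace-separated field:value clauses (no phrases with spaces)."""
    # Unprefixed clauses take their occurrence from the default operator.
    default = "MUST" if q_op == "AND" else "SHOULD"
    clauses = []
    for token in fq.split():
        occur = default
        if token.startswith("+"):
            occur, token = "MUST", token[1:]
        elif token.startswith("-"):
            occur, token = "MUST_NOT", token[1:]
        field, _, value = token.partition(":")
        clauses.append(Clause(occur, field, value.strip('"')))
    return clauses


def matches(doc: Dict[str, str], clauses: List[Clause]) -> bool:
    """Simplified Boolean semantics with no minimum-should-match."""
    def hit(c: Clause) -> bool:
        return doc.get(c.field) == c.value

    must = [c for c in clauses if c.occur == "MUST"]
    should = [c for c in clauses if c.occur == "SHOULD"]
    must_not = [c for c in clauses if c.occur == "MUST_NOT"]
    if not all(hit(c) for c in must) or any(hit(c) for c in must_not):
        return False
    # With at least one MUST clause, SHOULD clauses only affect scoring.
    if must:
        return True
    return any(hit(c) for c in should)


DOCS = [
    {"name": "in-org", "owner_org": "org-1", "dataset_type": "dataset"},
    {"name": "other-org", "owner_org": "org-2", "dataset_type": "dataset"},
]


def visible(fq: str, q_op: str) -> List[str]:
    return [d["name"] for d in DOCS if matches(d, parse_fq(fq, q_op))]


# Hypothetical final filters before and after the proposed patch.
OLD_FQ = ' owner_org:"org-1" +dataset_type:dataset'
NEW_FQ = ' +owner_org:"org-1" +dataset_type:dataset'

if __name__ == "__main__":
    print(visible(OLD_FQ, "AND"))  # ['in-org']
    print(visible(OLD_FQ, "OR"))   # ['in-org', 'other-org']
    print(visible(NEW_FQ, "OR"))   # ['in-org']
```

Under these assumptions, the unpatched filter only works when q.op = AND, while the + prefix makes the org clause required under either default operator.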

@smotornyuk: Do you agree that this is an actual bug worth fixing, since the current code only works by coincidence and adding the + operator is more technically correct (as would be specifying multiple fq values, though I assume that is not done for performance/cache reasons)? Or will this view function also be reimplemented as part of the upcoming search overhaul?

