These are minor issues, not even bugs; I'm just listing them here in case you want to mimic Fuuka's search behavior as closely as possible.
Because - is a special character, a word containing - can't be searched for if you type it as-is, for example Anon-san or Akiha-sama. The other special characters, |, + and ", don't really pose much of an issue, because people rarely search for words containing those characters and expect results.
So, as a courtesy to the user, Fuuka's _sphinx_escape function (which is used under the same circumstances as your HalfEscapeString method) turns an expression like something-something into "something\-something", so you get the expected results. (Technically this returns the same results as "something something", because - isn't indexed and counts as a word boundary character, but I'm still including the escaped - for the sake of correctness.) That way, the - character is only treated as "exclude the following word or quoted expression from the search results" when it's preceded by at least one space. This matches Google's behavior:
https://www.google.com/search?q=akiha-sama
https://www.google.com/search?q=akiha%20-sama
Example:
http://oldarchive.foolz.us/jp/?task=search&ghost=&search_text=akiha-sama (autocorrected by Fuuka)
http://oldarchive.foolz.us/jp/?task=search&ghost=&search_text=%22akiha%20sama%22 (equivalent search)
http://archive.foolz.us/jp/search/text/akiha-sama/ (broken)
http://archive.foolz.us/jp/search/text/%22akiha%20sama%22/ (equivalent search)
These are the Perl regexes that perform the replacement above. You need the first one so you don't break a search for "akiha-sama" that is already quoted; otherwise the second one would turn it into ""akiha\-sama"":
$query=~ s/\"([^\s]+)-([^\s]*)\"/$1-$2/g;
$query=~ s/([^\s]+)-([^\s]*)/"$1\\-$2"/g;
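If it helps, here's a rough Python port of those two substitutions so the behavior can be checked in isolation. The function name escape_hyphenated_words is my own invention, not something from Fuuka, and this only covers the hyphen handling, not the rest of _sphinx_escape.

```python
import re


def escape_hyphenated_words(query: str) -> str:
    """Rough port of the two Perl substitutions above (hypothetical helper name)."""
    # Step 1: strip the quotes from an already-quoted hyphenated word,
    # so that step 2 does not wrap it in a second pair of quotes.
    query = re.sub(r'"([^\s]+)-([^\s]*)"', r'\1-\2', query)
    # Step 2: quote every hyphenated word and escape its hyphen.
    # A hyphen preceded by whitespace has no word before it, so it does not
    # match and keeps its meaning as the "exclude" operator.
    query = re.sub(
        r'([^\s]+)-([^\s]*)',
        lambda m: '"' + m.group(1) + '\\-' + m.group(2) + '"',
        query,
    )
    return query


if __name__ == "__main__":
    for sample in ('akiha-sama', '"akiha-sama"', 'akiha -sama'):
        print(sample, '->', escape_hyphenated_words(sample))
```

Both akiha-sama and "akiha-sama" come out as "akiha\-sama", while akiha -sama is left untouched, so the exclusion operator still works.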
Another thing I noticed is that a search with invalid syntax returns a 404. Ideally, it should return an error message, so that someone unfamiliar with the search syntax has a chance to figure out and correct their mistake instead of being thoroughly confused. Examples:
http://archive.foolz.us/jp/search/text/-/
http://oldarchive.foolz.us/jp/?task=search&ghost=&search_text=-
http://archive.foolz.us/jp/search/text/%22something/
http://oldarchive.foolz.us/jp/?task=search&ghost=&search_text=%22something
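To illustrate the kind of check I mean, here's a minimal Python sketch that catches the two failing cases above before the query reaches the search backend. Everything in it is an assumption on my part: the function name query_syntax_error, the message wording, and the idea that these two cases (a query made only of operators, and an unclosed quote) are the ones worth reporting. It doesn't try to handle escaped quotes or anything else the real parser might reject.

```python
from typing import Optional

# The special characters mentioned earlier in this post.
OPERATOR_CHARS = '-|+"'


def query_syntax_error(query: str) -> Optional[str]:
    """Return a message for an obviously invalid query, or None if it looks fine.

    Hypothetical helper: it only covers the two example cases from this post.
    """
    stripped = query.strip()
    # An odd number of quotation marks means one was never closed.
    if stripped.count('"') % 2 != 0:
        return 'Unbalanced quotation mark: close every " that you open.'
    # A query made only of operators and spaces has nothing to search for.
    if stripped and all(ch in OPERATOR_CHARS or ch.isspace() for ch in stripped):
        return 'The query contains only operators; add at least one word.'
    return None


if __name__ == "__main__":
    for sample in ('-', '"something', 'akiha-sama'):
        print(repr(sample), '->', query_syntax_error(sample))
```

The message could then be shown on the search page in place of the 404.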