frutik / awesome-search

Awesome Search - this is all about (e-commerce, but not only) search and its awesomeness

search search-engine spelling-correction suggestions ecommerce-search relevant-search synonyms natural-language-processing query-understanding autocomplete-suggestions

awesome-search's Introduction

Hi there 👋

I have been working in the domain of search for e-commerce/marketplaces for almost a decade.

My areas of interest: search, e-commerce, e-commerce search, NLP, ML, cloud technologies, distributed applications, reliability, high availability, Docker, Kubernetes, DevOps/SRE, Python, Ruby, Node.js.

Currently: webshop.nl, shops.ae. Previously: lohika.com, marktplaats.nl, nationalevacaturebank.nl, belvilla.com, kasta.ua

I live in the Netherlands, originally from Ukraine. Software Engineer since 1998. M.S. in Informatics and Physics.

awesome-search's Issues

bert sandbox

BERT for dummies — Step by Step Tutorial - https://towardsdatascience.com/bert-for-dummies-step-by-step-tutorial-fb90890ffe03
DIY Practical guide on Transformer. Hands-on proven PyTorch code for Intent Classification in NLU with BERT fine-tuned.
Simple Transformers — Named Entity Recognition with Transformer Models - https://towardsdatascience.com/simple-transformers-named-entity-recognition-with-transformer-models-c04b9242a2a0
Simple Transformers is the “it just works” Transformer library. Use Transformer models for Named Entity Recognition with just 3 lines of code. Yes, really.
Named Entity Recognition using BERT - https://medium.com/swlh/named-entity-recognition-using-bert-2fb924864d47
If you know SQL, you probably understand Transformer, BERT and GPT. - https://towardsdatascience.com/if-you-know-sql-you-probably-understand-transformer-bert-and-gpt-7b197cb48d24
It’s all about query and retrieval.
Improving sentence embeddings with BERT and Representation Learning - https://towardsdatascience.com/improving-sentence-embeddings-with-bert-and-representation-learning-dfba6b444f6b
BERT is now part of Google Search, so let’s understand how it reasons - https://towardsdatascience.com/how-does-bert-reason-54feb363211
Simple BERT using TensorFlow 2.0 - https://towardsdatascience.com/simple-bert-using-tensorflow-2-0-132cb19e9b22

sandbox Jan 2020

spaCy

A tour of awesome features of spaCy (part 1/2) - https://medium.com/eliiza-ai/a-tour-of-awesome-features-of-spacy-part-1-2-58b32425954f
A tour of awesome features of spaCy (part 2/2) - https://medium.com/eliiza-ai/a-tour-of-awesome-features-of-spacy-part-2-2-d7bd628a81ce
How to Train NER with Custom training data using spaCy - https://medium.com/@manivannan_data/how-to-train-ner-with-custom-training-data-using-spacy-188e0e508c6
Rule-Based Matching with spaCy - https://medium.com/@ashiqgiga07/rule-based-matching-with-spacy-295b76ca2b68
Custom Named Entity Recognition Using spaCy - https://towardsdatascience.com/custom-named-entity-recognition-using-spacy-7140ebbb3718
SpaCy Classifier with pre-train token2vec VS. One without pre-train - https://towardsdatascience.com/nlp-spacy-classifier-with-pre-train-token2vec-vs-one-without-pre-train-2f05d2179290

The biggest problem with search is all the gobbledygook – can word2vec or fastText help? - https://www.linkedin.com/pulse/biggest-problem-search-all-gobbledygook-can-word2vec-help-gr%C3%B6nroos/

Search In Practice — Approximate Nearest Neighbors Using Quantizations - https://medium.com/analytics-vidhya/search-in-practice-approximate-nearest-neighbors-using-quantizations-baaf00ff8e02

Short Text Topic Modeling - https://towardsdatascience.com/short-text-topic-modeling-70e50a57c883

Finding Similar Quora Questions with Word2Vec and Xgboost - https://towardsdatascience.com/finding-similar-quora-questions-with-word2vec-and-xgboost-1a19ad272c0d

content pre-processing sandbox

Detecting Image Duplicates at OLX Scale

Perceptual Hashes

https://zentity.io/

deduplication

Determining compatibility

Knowledge graphs applied in the retail industry

The Ecommerce Knowledge Graph - Semantics3 Labs

Datafari Search Engine

Hi,
You may want to add the Datafari Open Source Enterprise Search Engine to your "product and services" section. The repo is here: https://github.com/francelabs/datafari

Cedric (disclaimer: I work for the company that develops Datafari)

sandbox Dec 2020

why the smartSuggest module might matter to you

Guide the user during the search formulation process to facilitate accurate data entry, encourage exploratory search and boost product discovery.

https://blog.searchhub.io/why-weve-developed-the-searchhub-smartsuggest-module-and-why-it-might-matter-to-you

Query Segmentation and Spelling Correction

https://towardsdatascience.com/query-segmentation-and-spelling-correction-483173008981

ELMo Embedding — The Entire Intent of a Query

https://medium.com/analytics-vidhya/elmo-embedding-the-entire-intent-of-a-query-530b268c4cd

Search to Search recommendations (Collaborative Synonym and Spell corrections)

https://haystackconf.com/europe2019/search-to-search-recommendations/

What is Search in the Omnichannel?

https://opensourceconnections.com/blog/2020/12/18/what-is-search-in-the-omnichannel/

Using approximate nearest neighbor search in real world applications

https://blog.vespa.ai/using-approximate-nearest-neighbor-search-in-real-world-applications/

GPT-3: Demos, Use-cases, Implications

https://towardsdatascience.com/gpt-3-demos-use-cases-implications-77f86e540dc1

Roles in a Data Team

Different roles in a data team and their responsibilities

https://towardsdatascience.com/roles-in-a-data-team-d97a87fdabaa

Search Product Management: The Most Misunderstood Role in Search?

https://jamesrubinstein.medium.com/search-product-management-the-most-misunderstood-role-in-search-2b7569058638

Improving search relevance with data-driven query optimization

https://www.elastic.co/blog/improving-search-relevance-with-data-driven-query-optimization

Using Behavioral Data to Improve Search

https://tech.ebayinc.com/engineering/using-behavioral-data-to-improve-search/

Search Product Manager: Software PM vs. Enterprise PM or What does that * PM do?

https://www2.slideshare.net/jt_kane/search-product-manager-software-pm-vs-enterprise-pm-or-what-does-that-pm-do

Analyzing online search relevance metrics with the Elastic Stack

https://www.elastic.co/blog/analyzing-online-search-relevance-metrics-with-the-elastic-stack

Introducing txtai, an AI-powered search engine built on Transformers

Add Natural Language Understanding to any application

https://towardsdatascience.com/introducing-txtai-an-ai-powered-search-engine-built-on-transformers-37674be252ec

Best Practices for Enterprise Search User Experience (UX)

https://www.searchblox.com/best-practices-for-enterprise-search-user-experience-ux/

The Annual Search Shootout – Testing strategy on 2019’s topics

https://opensourceconnections.com/blog/2020/11/25/the-annual-search-shootout-testing-strategy-on-2019s-topics/

Three Pillars of Search Relevancy. Part 1: Findability

https://blog.searchhub.io/three-pillars-of-search-quality-in-ecommerce-part-1-findability

Use Site Search to Optimize Your Customer Journey

Site search largely remains the neglected stepchild of e-commerce optimization, yet site-search optimization has the potential to catapult your customer journey strategy to a new level.

https://blog.searchhub.io/why-use-site-search-analytics-to-optimize-your-customer-journey

Weighted Quality Score for Ads, Feed, and Search

This is a practical guide for engineers and product managers on how to combine multiple definitions of item quality into a “pretty good” overall quality score using a simple linear model. This isn’t the best way to optimize user experience, but it is easy to implement, understand, and extend, applies to virtually any product, and is time-tested in industry.

https://medium.com/promoted/weighted-quality-score-for-ads-feed-and-search-2fa70ec4f51f
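The linear model described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's code: the signal names (`ctr`, `rating`, `freshness`) and their weights are hypothetical, and each signal is assumed to be min-max normalized to [0, 1] before it is combined.

```python
from typing import Dict, List, Mapping

# Hypothetical signals and weights; real weights would be tuned against business metrics.
WEIGHTS: Dict[str, float] = {"ctr": 0.5, "rating": 0.3, "freshness": 0.2}


def min_max(values: List[float]) -> List[float]:
    """Rescale raw signal values to [0, 1] so that weights are comparable across signals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_quality(signals: Mapping[str, float],
                     weights: Mapping[str, float] = WEIGHTS) -> float:
    """Weighted average of normalized quality signals; a missing signal counts as 0."""
    total = sum(weights.values())
    return sum(w * signals.get(name, 0.0) for name, w in weights.items()) / total


# Usage: rank three (made-up) items by their combined score.
items = {
    "a": {"ctr": 0.2, "rating": 0.9, "freshness": 0.5},
    "b": {"ctr": 0.8, "rating": 0.4, "freshness": 0.1},
    "c": {"ctr": 0.1, "rating": 0.2, "freshness": 0.9},
}
ranked = sorted(items, key=lambda k: weighted_quality(items[k]), reverse=True)
```

Dividing by the sum of the weights keeps the score in [0, 1], so adding a new signal later does not shift the scale of existing scores.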

Unsupervised Attribute Extraction for Online Listings

I will talk about my project on developing an unsupervised approach to extract attributes from online listings, done in collaboration with OLX Group, part of Prosus. The OLX Group operates a network of online trading platforms in over 40 countries, building market-leading classifieds marketplaces that empower millions of people to buy, sell, and create prosperity in local communities.

https://medium.com/prosus-ai-tech-blog/unsupervised-attribute-extraction-for-online-listings-41baa5d2270e

NLP: All the Features. Every Feature That Can Be Extracted From the Text

I will be sharing all the possible NLP features that you can extract from unstructured text for use in downstream tasks. I also list the Python libraries I prefer to use for computing these features.

https://medium.com/swlh/nlp-all-them-features-every-feature-that-can-be-extracted-from-text-7032c0c87dee

Search (Pt 2) — A Semantic Horse Race

Cutting edge NLP vs traditional search

https://towardsdatascience.com/search-pt-2-semantic-horse-race-5128cae7ce8d

Billion-scale semantic similarity search with FAISS+SBERT

Building the prototype for an intelligent search engine

https://towardsdatascience.com/billion-scale-semantic-similarity-search-with-faiss-sbert-c845614962e2

Visualizing 100,000 Amazon Products

Fast sentence embeddings (fse) enables you to compute sentence embeddings for millions of reviews in only a few minutes.

https://towardsdatascience.com/vis-amz-83dea6fcb059

Query Understanding: An efficient way how to deal with long tail queries

Our data shows that when people search for a certain product, their queries average roughly 1.5 words. Such short queries make it hard for full-text search to return relevant results. Filters can help, but there are often so many of them that they become confusing. One way to make search more effective is the ‘learning to rank’ approach, which creates an optimal ranking of results. However, even this machine-learning method is not almighty, and that’s why we’ve come up with Query Understanding, a great companion to ‘learning to rank’.

https://www.luigisbox.com/blog/query-understanding/

Search Optimization 101 – How do you fix a broken search?

https://blog.supahands.com/2020/08/04/search-optimization-101-how-do-you-fix-a-broken-search/

Testing Search for Relevancy and Precision

Despite the fact that site search often receives the most traffic, it’s also the place where the user experience designer has the least influence. Few tools exist to appraise the quality of the search experience, much less to strategize ways to improve it. When it comes to site search, user experience designers are often sidelined like the single person at an old flame’s wedding: everything seems to be moving along without you, and if you slipped out halfway through, chances are no one would notice. But relevancy testing and precision testing offer hope.

https://alistapart.com/article/testing-search-for-relevancy-and-precision/
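Precision testing comes down to checking, for a fixed set of test queries, how many of the top results are actually relevant. A minimal sketch of precision@k averaged over queries, assuming you already have hand-labeled relevant document IDs per query (the queries and IDs below are made up):

```python
from typing import Dict, Sequence, Set, Tuple


def precision_at_k(results: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that appear in the relevant set.

    Divides by k even when fewer than k results came back, so a short
    result list is not rewarded.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in results[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at_k(test_set: Dict[str, Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average precision@k over a fixed set of test queries."""
    if not test_set:
        raise ValueError("test_set is empty")
    scores = [precision_at_k(results, relevant, k) for results, relevant in test_set.values()]
    return sum(scores) / len(scores)


# Hypothetical test queries: (ranked result IDs, hand-labeled relevant IDs).
test_set = {
    "red dress": (["d1", "d7", "d3", "d9"], {"d1", "d3"}),
    "usb cable": (["c4", "c2"], {"c2", "c5"}),
}
score = mean_precision_at_k(test_set, k=4)
```

Re-running the same test set after every relevance change turns "search feels better" into a number you can track over time.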

philosophe.*

Testing Search

https://www.philosophe.com/archived_content/search_topics/search_tests.html

Assumptions About User Search Behavior

https://www.philosophe.com/archived_content/search_topics/user_behavior.html

How not to use BERT for Document Ranking

BERT (Bidirectional Encoder Representations from Transformers) turned two a few days ago, and since its introduction it has been a revolution for Search and Information Retrieval. It has drastically improved accuracy on many different information-seeking tasks, be it answering questions or ranking documents, far beyond what was thought possible just a few years ago. In this blog post I’ll give a quick overview of how to evaluate search ranking models using well-established relevancy datasets, show how using BERT in a way it was not meant to be used leads to terrible ranking results, and give a few good pointers on how to apply BERT to ranking successfully.

https://bergum.medium.com/how-not-to-use-bert-for-search-ranking-4586716428d9
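Evaluating a ranker against a relevancy dataset means scoring its ranked output against graded judgments. As a generic illustration (not necessarily the metric the post reports), here is NDCG@k in Python; the 0 to 3 judgment scale and the document IDs are assumptions for the example, and the exponential gain 2^g - 1 is one common variant among several.

```python
import math
from typing import Mapping, Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: the result at 0-based position i is discounted by log2(i + 2)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranking: Sequence[str], judgments: Mapping[str, int], k: int) -> float:
    """DCG of the top-k ranking divided by the DCG of the ideal ordering of the judged documents."""
    gains = [judgments.get(doc_id, 0) for doc_id in ranking[:k]]  # unjudged docs count as 0
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical graded judgments for one query.
judgments = {"d1": 3, "d2": 2, "d3": 0}
perfect = ndcg_at_k(["d1", "d2", "d3"], judgments, k=3)
reversed_order = ndcg_at_k(["d3", "d2", "d1"], judgments, k=3)
```

Here the ideal ordering scores 1.0 and the reversed ordering scores about 0.61, so a ranker that pushes the best document down is penalized even though it returns the same set of documents.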

“Avacado” or Avocado?

A simple search query correction heuristic for the resource-constrained

https://tech.instacart.com/avacado-or-avocado-4b4b78dc0698
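Instacart's post describes its own heuristic; the sketch below is not that method but a generic, Norvig-style corrector in the same resource-constrained spirit. It generates every string one edit away from the query term and picks the candidate that appears most often in a query-log frequency table (the counts here are hypothetical).

```python
import string
from typing import Mapping, Set


def edits1(word: str) -> Set[str]:
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(term: str, freq: Mapping[str, int]) -> str:
    """Return term if it is known, else its most frequent known neighbour, else term unchanged."""
    if term in freq:
        return term
    candidates = [w for w in edits1(term) if w in freq]
    return max(candidates, key=lambda w: freq[w]) if candidates else term


# Hypothetical query-log counts.
query_freq = {"avocado": 1200, "banana": 3400, "bandana": 15}
```

For example, `correct("avacado", query_freq)` returns `"avocado"`. Common misspellings also show up in real query logs, so in practice you would probably only accept a suggestion that is clearly more frequent than the original term.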

10-step checklist to build a great search

https://medium.com/videdressing-engineering/10-step-checklist-for-building-a-great-search-1c8373a97a87

Fess Search Engine

I just learned about Fess (https://github.com/codelibs/fess) and got to play with it this week. It's basically a clone of GSA (Google Search Appliance) on Elasticsearch, combining a front end (though it's optional) with a backend crawler.

It seems actively maintained. I would have opened a PR, but I'm not quite sure where to put it.

sandbox Jun 2021

sandbox April 2021

Articles

Building Smarter Search Products: 3 Steps for Evaluating Search Algorithms
Clothes in Space: Real-time personalization in less than 100 lines of code
(Definitely Not) Lost in Translation: ‘Translating’ Products for Multi-Brand Personalization
The influence of TF-IDF algorithms in eCommerce search
E-commerce search and recommendation with Vespa.ai
Collaborative Filtering Recommendation with Co-Occurrence Algorithm
Co-occurrence Matrix
Intro to Cooccurrence Recommenders with Spark
How We Built A Context-Specific Bidding System for Etsy Ads
Vinted Search Scaling Chapter 3: Elasticsearch Index Management
A Guide to Better Ecommerce Site Search
Using approximate nearest neighbor search to find similar products
What Is a Judgment List?
Speeding up BERT Search in Elasticsearch
The Search Before the Search: Keyword Foraging
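Several of the articles above build recommendations from item co-occurrence. The core counting step can be sketched as follows, assuming each basket is the set of item IDs seen together in one order or session (the data is made up). Raw counts favor popular items; production systems commonly down-weight them, for example with a log-likelihood ratio test, which this sketch omits.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set


def cooccurrence(baskets: Iterable[Set[str]]) -> Dict[str, Dict[str, int]]:
    """Symmetric co-occurrence counts: counts[a][b] = number of baskets containing both a and b."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(item: str, counts: Dict[str, Dict[str, int]], n: int = 3) -> List[str]:
    """Top-n items that co-occur most often with item (ties broken alphabetically)."""
    neighbours = counts.get(item, {})
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:n]]


# Hypothetical shopping baskets.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter"},
]
counts = cooccurrence(baskets)
```

With this data, `recommend("bread", counts)` returns `["milk", "butter", "eggs"]`. The same pairwise counting is what the Spark article distributes across a cluster; the single-machine version is enough to see how the matrix is built.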

Video

Measuring and Optimizing Findability in e-commerce Search (MICES 2019)
Bias on Search and Recommender Systems

Papers

A Transformer-based Embedding Model for Personalized Product Search
How to Grow a (Product) Tree: Personalized Category Suggestions for eCommerce Type-Ahead
Don’t Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

Click Models for Web Search

https://www.morganclaypool.com/doi/10.2200/S00654ED1V01Y201507ICR043

https://www.youtube.com/watch?time_continue=3&v=33QDCpZmR-E&feature=emb_logo

Datasets: Adding a section for open-source datasets?

I appreciate the collection of awesome search resources! There are many publicly available datasets related to search and relevance, for example the Home Depot Product Search Relevance competition on Kaggle (https://www.kaggle.com/c/home-depot-product-search-relevance) or WANDS (https://github.com/wayfair/WANDS), among many others.

I couldn't find a section on datasets in this repo, and one could be very valuable for anyone interested in pre-training or experimenting with training their own ML models. Is this something that could be added?