frutik / awesome-search

Awesome Search: all about search (e-commerce, but not only) and its awesomeness

Topics: search, search-engine, spelling-correction, suggestions, ecommerce-search, relevant-search, synonyms, natural-language-processing, query-understanding, autocomplete-suggestions

awesome-search's Introduction

Hi there 👋

I have been working in the domain of search for e-commerce/marketplaces for almost a decade.

My areas of interest: search, e-commerce, e-commerce search, NLP, ML, cloud technologies, distributed applications, reliability, high availability, Docker, Kubernetes, DevOps/SRE, Python, Ruby, Node.js.

Currently: webshop.nl, shops.ae. Previously: lohika.com, marktplaats.nl, nationalevacaturebank.nl, belvilla.com, kasta.ua.

I live in the Netherlands, originally from Ukraine. Software Engineer since 1998. M.S. in Informatics and Physics.

awesome-search's People

Contributors

adamdubnytskyy, codebrain, epugh, frutik, gitcommitshow, hemantcs, hurutoriya, jasonbosco, kacperlukawski, mincong-h, nikhilgarg28, ojasaar, otisg, otrosien, radu-gheorghe, stjepanjurekovic, yaro-m

awesome-search's Issues

bert sandbox

sandbox Jan 2020

spacy

The biggest problem with search is all the gobbledygook – can word2vec or fastText help? https://www.linkedin.com/pulse/biggest-problem-search-all-gobbledygook-can-word2vec-help-gr%C3%B6nroos/

Search In Practice — Approximate Nearest Neighbors Using Quantizations https://medium.com/analytics-vidhya/search-in-practice-approximate-nearest-neighbors-using-quantizations-baaf00ff8e02

Short Text Topic Modeling - https://towardsdatascience.com/short-text-topic-modeling-70e50a57c883

Finding Similar Quora Questions with Word2Vec and Xgboost - https://towardsdatascience.com/finding-similar-quora-questions-with-word2vec-and-xgboost-1a19ad272c0d

Vespa sandbox

OpenSource Connections Sandbox

sandbox Dec 2020

why the smartSuggest module might matter to you

Guide the user during the search formulation process to facilitate accurate data entry, encourage exploratory search and boost product discovery.

https://blog.searchhub.io/why-weve-developed-the-searchhub-smartsuggest-module-and-why-it-might-matter-to-you

Query Segmentation and Spelling Correction

https://towardsdatascience.com/query-segmentation-and-spelling-correction-483173008981

ELMo Embedding — The Entire Intent of a Query

https://medium.com/analytics-vidhya/elmo-embedding-the-entire-intent-of-a-query-530b268c4cd

Search to Search recommendations (Collaborative Synonym and Spell corrections)

https://haystackconf.com/europe2019/search-to-search-recommendations/

What is Search in the Omnichannel?

https://opensourceconnections.com/blog/2020/12/18/what-is-search-in-the-omnichannel/

Using approximate nearest neighbor search in real world applications

https://blog.vespa.ai/using-approximate-nearest-neighbor-search-in-real-world-applications/

GPT-3: Demos, Use-cases, Implications

https://towardsdatascience.com/gpt-3-demos-use-cases-implications-77f86e540dc1

Roles in a Data Team

Different roles in a data team and their responsibilities

https://towardsdatascience.com/roles-in-a-data-team-d97a87fdabaa

Search Product Management: The Most Misunderstood Role in Search?

https://jamesrubinstein.medium.com/search-product-management-the-most-misunderstood-role-in-search-2b7569058638

Improving search relevance with data-driven query optimization

https://www.elastic.co/blog/improving-search-relevance-with-data-driven-query-optimization

Using Behavioral Data to Improve Search

https://tech.ebayinc.com/engineering/using-behavioral-data-to-improve-search/

Search Product Manager: Software PM vs. Enterprise PM or What does that * PM do?

https://www2.slideshare.net/jt_kane/search-product-manager-software-pm-vs-enterprise-pm-or-what-does-that-pm-do

Analyzing online search relevance metrics with the Elastic Stack

https://www.elastic.co/blog/analyzing-online-search-relevance-metrics-with-the-elastic-stack

Introducing txtai, an AI-powered search engine built on Transformers

Add Natural Language Understanding to any application

https://towardsdatascience.com/introducing-txtai-an-ai-powered-search-engine-built-on-transformers-37674be252ec

Best Practices for Enterprise Search User Experience (UX)

https://www.searchblox.com/best-practices-for-enterprise-search-user-experience-ux/

The Annual Search Shootout – Testing strategy on 2019’s topics

https://opensourceconnections.com/blog/2020/11/25/the-annual-search-shootout-testing-strategy-on-2019s-topics/

Three Pillars of Search Relevancy. Part 1: Findability

https://blog.searchhub.io/three-pillars-of-search-quality-in-ecommerce-part-1-findability

Use Site Search to Optimize Your Customer Journey

Site search largely remains the neglected stepchild of e-commerce optimization, yet site-search optimization has the potential to catapult your customer journey strategy to a new level.

https://blog.searchhub.io/why-use-site-search-analytics-to-optimize-your-customer-journey

Weighted Quality Score for Ads, Feed, and Search

This is a practical guide for engineers and product managers about how to combine multiple definitions of item quality to form a “pretty good” overall score of quality using a simple linear model. This isn’t the best or optimal way to optimize user experience, but it’s easy to implement, understand, extend, is generally applicable to virtually any product, and is time-tested in industry.

https://medium.com/promoted/weighted-quality-score-for-ads-feed-and-search-2fa70ec4f51f
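The "simple linear model" idea from the description above can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the signal names (click_rate, conversion_rate, rating), the weights, and the helper weighted_quality_score are all hypothetical, and the signals are assumed to be already normalized to a comparable 0..1 range.

```python
from typing import Mapping

def weighted_quality_score(signals: Mapping[str, float],
                           weights: Mapping[str, float]) -> float:
    """Linear combination of quality signals; a missing signal counts as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())

# Hypothetical weights: how much each definition of "quality" matters.
WEIGHTS = {"click_rate": 0.5, "conversion_rate": 0.3, "rating": 0.2}

# Hypothetical items with signals already normalized to the 0..1 range.
ITEMS = {
    "item_a": {"click_rate": 0.10, "conversion_rate": 0.02, "rating": 0.9},
    "item_b": {"click_rate": 0.30, "conversion_rate": 0.01, "rating": 0.5},
}

# Rank items by their combined score, best first.
ranked = sorted(ITEMS,
                key=lambda name: weighted_quality_score(ITEMS[name], WEIGHTS),
                reverse=True)
```

Because the model is linear, adding a new quality definition only means adding one more signal and one more weight.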

Unsupervised Attribute Extraction for Online Listings

I will talk about my project on developing an unsupervised approach to extract attributes from online listings, done in collaboration with OLX Group, part of Prosus. The OLX Group operates a network of online trading platforms in over 40 countries, building market leading classifieds marketplaces that empower millions of people to buy, sell, and create prosperity in local communities.

https://medium.com/prosus-ai-tech-blog/unsupervised-attribute-extraction-for-online-listings-41baa5d2270e

NLP: All the Features. Every Feature That Can Be Extracted From the Text

I will be sharing all the possible NLP features that you can extract from unstructured texts for use in downstream tasks. I also list the Python libraries I prefer to use for computing these features.

https://medium.com/swlh/nlp-all-them-features-every-feature-that-can-be-extracted-from-text-7032c0c87dee

Search (Pt 2) — A Semantic Horse Race

Cutting edge NLP vs traditional search

https://towardsdatascience.com/search-pt-2-semantic-horse-race-5128cae7ce8d

Billion-scale semantic similarity search with FAISS+SBERT

Building the prototype for an intelligent search engine

https://towardsdatascience.com/billion-scale-semantic-similarity-search-with-faiss-sbert-c845614962e2

Visualizing 100,000 Amazon Products

Fast sentence embeddings (fse) enables you to compute sentence embeddings for millions of reviews in only a few minutes.

https://towardsdatascience.com/vis-amz-83dea6fcb059

Query Understanding: An efficient way how to deal with long tail queries

Our data shows that when people search for a certain product, most of them use roughly 1.5 words. These short queries unfortunately make it hard for full-text search to offer them relevant results. While there is improvement to be found in using filters, there are often so many that it can be confusing. One of the ways to make searching more effective is to use the ‘learning to rank’ approach, which creates an optimal ranking of results. However, even this machine-learning method is not almighty – and that’s why we’ve come up with Query Understanding, a great companion to ‘learning to rank’.

https://www.luigisbox.com/blog/query-understanding/

Search Optimization 101 – How do you fix a broken search?

https://blog.supahands.com/2020/08/04/search-optimization-101-how-do-you-fix-a-broken-search/

Testing Search for Relevancy and Precision

Despite the fact that site search often receives the most traffic, it’s also the place where the user experience designer bears the least influence. Few tools exist to appraise the quality of the search experience, much less strategize ways to improve it. When it comes to site search, user experience designers are often sidelined like the single person at an old flame’s wedding: Everything seems to be moving along without you, and if you slipped out halfway through, chances are no one would notice. But relevancy testing and precision testing offer hope.

https://alistapart.com/article/testing-search-for-relevancy-and-precision/
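One common way to put a number on precision testing is precision at k: the share of the top k results that a human judged relevant. The sketch below is an assumption about how such a test might be scripted, not the procedure from the linked article; the function name precision_at_k, the document ids, and the judgments are hypothetical.

```python
from typing import Sequence, Set

def precision_at_k(results: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are judged relevant.

    Divides by k even when fewer than k results come back, so a short
    result list is penalized rather than rewarded.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in results[:k] if doc_id in relevant)
    return hits / k

# Hypothetical ranked results for one query and hand-made relevance judgments.
RESULTS = ["d1", "d7", "d3", "d9", "d2"]
RELEVANT = {"d1", "d3", "d4"}

p_at_1 = precision_at_k(RESULTS, RELEVANT, 1)
p_at_5 = precision_at_k(RESULTS, RELEVANT, 5)
```

Averaging this value over a fixed set of test queries gives a single figure that can be tracked before and after a search change.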

philosophe.*

Testing Search

https://www.philosophe.com/archived_content/search_topics/search_tests.html

Assumptions About User Search Behavior

https://www.philosophe.com/archived_content/search_topics/user_behavior.html

How not to use BERT for Document Ranking

BERT (Bidirectional Encoder Representations from Transformers) turned 2 years old a few days ago, and since its introduction it has been a revolution for Search and Information Retrieval. It has drastically improved the accuracy on many different information seeking tasks, be it answering questions or ranking documents, far beyond what was thought possible just a few years ago. In this blog post I’ll give a quick overview of how to evaluate search ranking models using well established relevancy datasets and how to achieve terrible ranking results using BERT in a way it was not meant to be used, with a few good pointers on how to successfully apply BERT for ranking.

https://bergum.medium.com/how-not-to-use-bert-for-search-ranking-4586716428d9

“Avacado” or Avocado?

A simple search query correction heuristic for the resource-constrained

https://tech.instacart.com/avacado-or-avocado-4b4b78dc0698
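To give a feel for what a lightweight correction heuristic can look like, here is a minimal sketch that snaps unknown query tokens to the closest term in a known vocabulary using Python's standard difflib module. This is not the heuristic from the Instacart post; the vocabulary, the 0.8 cutoff, and the helper correct_query are assumptions made for illustration.

```python
import difflib
from typing import Sequence

def correct_query(query: str, vocabulary: Sequence[str], cutoff: float = 0.8) -> str:
    """Replace each unknown token with its closest vocabulary term, if any."""
    known = set(vocabulary)
    corrected = []
    for token in query.lower().split():
        if token in known:
            corrected.append(token)
            continue
        # get_close_matches returns the best candidates above the similarity cutoff.
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        # Keep the original token when nothing in the vocabulary is close enough.
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)

# Hypothetical vocabulary, e.g. terms taken from product names.
VOCABULARY = ["avocado", "banana", "cilantro", "tomato"]

fixed = correct_query("ripe avacado", VOCABULARY)
```

In practice the vocabulary would likely come from the catalog or from frequent successful queries, and the cutoff would need tuning against real traffic.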

10-step checklist to build a great search

https://medium.com/videdressing-engineering/10-step-checklist-for-building-a-great-search-1c8373a97a87

Fess Search Engine

I just learned about Fess (https://github.com/codelibs/fess) and got to play with it this week. It's basically a clone of GSA on Elasticsearch, combining the front end (though it's optional) with a backend crawler.

It seems actively maintained... I would have opened a PR, but I'm not quite sure where to put it...

sandbox April 2021

Datasets: Adding a section for open-source datasets?

Appreciate the collection of awesome search resources! There are a lot of different publicly available datasets related to search and relevance. For example, Home Depot Product Search Relevance at Kaggle (https://www.kaggle.com/c/home-depot-product-search-relevance) or WANDS (https://github.com/wayfair/WANDS), among many others.

In this repo, I find no section with datasets, which could be very valuable if one is interested in pre-training or playing around with training their own ML models. Is this something that could be added perhaps?

sandbox May 2021
