A project for creating the search backend used on a site hosted by GitHub Pages. For now, the search engine is built using Typesense and hosted on a single free-tier e2-instance on GCP.

License: MIT License


site-search

The purpose of this project is to create a free, lightweight search engine for any Gatsby website hosted on GitHub Pages. For now, the search documents are scraped using Python and stored in an Elasticsearch database hosted on a single free-tier e2-instance on GCP. ReactiveSearch provides pre-built search UI components, which allow users to interact with and query notes and blog posts. In theory, any search database serving a small website should be able to run on the e2-instance; such search engines include Typesense, Lucene, and others.
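
For illustration only, here is a hedged sketch of the kind of request the ReactiveSearch components ultimately send to the search backend; the host placeholder, port, index name (posts), and query term are assumptions for this example and are not taken from this repository.

$ # Hypothetical full-text query against an Elasticsearch index named "posts"
$ # (replace <vm-external-ip> with the e2-instance's external IP address)
$ curl "http://<vm-external-ip>:9200/posts/_search?q=kalman&pretty"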

To create and host a search engine on GCP for a website hosted on GitHub Pages, we need to complete the following steps:

  1. Create a free tier account on GCP
  2. Instantiate a micro e2-instance (with Ubuntu)
  3. Install Docker on the e2-instance
  4. Install and run Elasticsearch on the e2-instance
  5. Run and schedule Python code for scraping our website and ingesting blog posts and notes into Elasticsearch
  6. Implement ReactiveSearch components in our site's code for querying our site's posts and notes saved in Elasticsearch
  7. Configure a firewall on GCP (using UFW on Ubuntu), as sketched in the example after this list
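
The firewall step is not covered by a later section, so here is a minimal, hedged sketch of it. It assumes SSH access should stay open and that the search service listens on port 8108 (the Typesense port used below); substitute 9200 if Elasticsearch's default port is used instead. Traffic also has to be allowed by the GCP VPC firewall rules for the instance.

$ ###
$ ### EXAMPLE UFW RULES (SKETCH)
$ ###
$
$ # Keep SSH reachable
$ sudo ufw allow 22/tcp
$ # Allow the search service port (8108 for Typesense, 9200 for Elasticsearch)
$ sudo ufw allow 8108/tcp
$ # Enable the firewall and confirm the rules
$ sudo ufw enable
$ sudo ufw status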

Installing Docker on an Ubuntu E2-Instance

  1. Update all existing packages on the e2-instance
  2. Install prerequisite packages that let apt use packages over HTTPS
  3. Add the GPG key for the official Docker repository to the system
  4. Add the Docker repository to apt sources
  5. Ensure the installation comes from the official Docker repo, rather than the default Ubuntu repo
  6. Install Docker CE
$ ###
$ ### COMMANDS FOR INSTALLING DOCKER
$ ###
$ 
$ # 1. Update all existing packages on the e2-instance
$ sudo apt update
$ # 2. Install pre-requisite packages
$ sudo apt install apt-transport-https ca-certificates curl software-properties-common
$ # 3. Add the GPG key to the system
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ # 4. Add the Docker repository to `apt` sources
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
$ # 5. Ensure the installation comes from the official Docker repo
$ apt-cache policy docker-ce
$ # 6. Install Docker CE
$ sudo apt install docker-ce
$
$ ###
$ ### OTHER USEFUL COMMANDS
$ ###
$
$ # Check whether the docker service is running
$ sudo systemctl status docker
$ # Stop docker service
$ sudo service docker stop
$ # Start docker service
$ sudo service docker start
$ # Print list of running containers
$ sudo docker ps

For additional information about installing Docker, read the detailed steps and overview found in this article. For the most up-to-date instructions for installing Docker on Ubuntu machines, please refer to the official Docker installation docs.
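
As an optional sanity check (not part of the original walkthrough), Docker ships a hello-world image that confirms the engine can pull and run containers:

$ # Pull and run a throwaway test container
$ sudo docker run hello-world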

Installing and Running Typesense on an Ubuntu E2-Instance

  1. Update all existing packages on the e2-instance
  2. Install Node.js and npm
  3. Verify the installation
  4. Create a directory for the Typesense search service
  5. Run the search service as a Typesense container
  6. Create a shell script for purging logs
  7. Open crontab
  8. Schedule the shell script to run at 2 AM every day
$ ###
$ ### COMMANDS FOR RUNNING TYPESENSE SERVICE
$ ###
$ 
$ # 1. Update all existing packages on the e2-instance
$ sudo apt update
$ # 2. Install Node.js and npm
$ sudo apt install nodejs npm
$ # 3. Verify the installation
$ sudo nodejs --version
$ # 4. Create directory for Typesense search service
$ mkdir /home/dkharazif/typesense-server-data ; cd /home/dkharazif/typesense-server-data
$ # 5. Run search service as Typesense container
$ sudo nohup docker run -i -p 8108:8108 -v /home/dkharazif/typesense-server-data/:/data typesense/typesense:0.15.0 --data-dir /data --api-key=xyz --listen-port 8108 --enable-cors > typesense-server-data.log &
$ # 6. Create shell script for purging logs
$ touch purge-logs.sh
$ # 7. Open crontab
$ crontab -e
$ # 8. Schedule the shell script at 2 AM every day by adding this line to the crontab
$ 0 2 * * * sh /home/dkharazif/typesense-server-data/purge-logs.sh
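
The contents of purge-logs.sh are not shown in this README. As a rough sketch only, a script like the following could keep the nohup log from filling the small e2-instance disk; the log path is an assumption based on the docker run command above.

#!/bin/sh
# Hypothetical purge script: empty the Typesense log produced by the nohup
# redirect above (path assumed, not confirmed by the repo)
truncate -s 0 /home/dkharazif/typesense-server-data/typesense-server-data.log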

$ ###
$ ### OTHER USEFUL COMMANDS
$ ###
$
$ # Verify search engine service is running in docker container
$ sudo docker ps
$ # Verify the docker container is listening on port 8108
$ sudo netstat -nlp | grep 8108

For additional information about installing npm on an Ubuntu system, read the walkthrough outlined in this article. For additional steps for installing Gatsby-related packages and/or Typesense in an Ubuntu environment, please refer to this article.
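
Once the container is up, the Typesense HTTP API can be used to confirm that the service responds. In the sketch below, only the port (8108) and API key (xyz) come from the docker run command above; the collection name (posts) and field (title) are assumptions for illustration.

$ # Check that the Typesense server reports healthy
$ curl "http://localhost:8108/health"
$ # Hypothetical search against a collection named "posts" with a "title" field
$ curl -H "X-TYPESENSE-API-KEY: xyz" "http://localhost:8108/collections/posts/documents/search?q=kalman&query_by=title"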
