URL-lookup

Coding assignment for an interview. Implement a URL lookup service that blocks a request if malware is known to exist at that URL.

System Requirements

  • Go installed
  • A MySQL server running locally that you have the credentials to access
  • I am developing on a Windows dev environment. This code should be compatible across platforms, but how you execute the server file may differ depending on your operating system

Project Design

image

For this API I used Go's built-in net/http library to serve a simple web service that implements a REST API. I implemented a very simple form of authorization: the username and password must be passed to the server in the GET request. I disabled all HTTP request methods other than GET and limited the number of requests per second to reduce risk. A production implementation would need API access tokens instead. I used a MySQL database for URL storage.
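The GET-only and credential checks described above can be sketched as net/http middleware. This is a minimal illustration rather than the repo's actual code: it assumes the credentials arrive as HTTP Basic auth (which is what a browser sends for a user:password@host URL), and the names requireGETAndAuth, statusFor and okHandler are hypothetical.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireGETAndAuth lets only authenticated GET requests reach next.
// wantUser and wantPass stand in for appUsername/appPassword (assumed names).
func requireGETAndAuth(wantUser, wantPass string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			w.Header().Set("Allow", http.MethodGet)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		user, pass, ok := r.BasicAuth()
		// Constant-time comparison avoids leaking how much of a credential matched.
		userOK := subtle.ConstantTimeCompare([]byte(user), []byte(wantUser)) == 1
		passOK := subtle.ConstantTimeCompare([]byte(pass), []byte(wantPass)) == 1
		if !ok || !userOK || !passOK {
			// This header is what makes a browser prompt for credentials.
			w.Header().Set("WWW-Authenticate", `Basic realm="urlinfo"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler is a placeholder for the real lookup handler.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "ok")
})

// statusFor sends one in-memory request through h and returns the status code.
// An empty user means "send no credentials".
func statusFor(h http.Handler, method, user, pass string) int {
	req := httptest.NewRequest(method, "/v1/urlinfo/abc", nil)
	if user != "" {
		req.SetBasicAuth(user, pass)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := requireGETAndAuth("admin", "pass", okHandler)
	fmt.Println(statusFor(h, "GET", "admin", "pass"))  // 200
	fmt.Println(statusFor(h, "POST", "admin", "pass")) // 405
	fmt.Println(statusFor(h, "GET", "admin", "wrong")) // 401
}
```

Wrapping the handler this way keeps the method and credential checks in one place, so the lookup handler itself only has to deal with the URL.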

The API has a single endpoint, /v1/urlinfo/

  • GET - Get info on URL whether it contains malware or not

    /v1/urlinfo/{url}

    How to query this API

    http://user:password@localhost:8080/v1/urlinfo/{url to query}

    Returned value is a JSON object that has information on whether malware is present or not (yes/no) against the requested URL. If the URL is not found in the database, the malware status is returned as unknown.

    {"URL":"abc","Malware":"unknown"}

System Setup

Clone the repo

git clone https://github.com/nhennigan/URL-lookup.git

Edit the credentials.env file to add your own MySQL server details and the username and password you want for the web service. The dbUsername and hostname values shown are the defaults when setting up a MySQL server.

dbPassword=XXXXXXXX
dbUsername=root
hostname =127.0.0.1:3306

appUsername=XXXXXXXXXXX
appPassword=XXXXXXXXXXX

Build the executable

go build

This creates a server.exe file in your local repo. Run it to start the server:

.\server.exe

Open a new tab in your browser and enter the following:

http://admin:pass@127.0.0.1:8080/v1/urlinfo/

This shows the output of the server. admin and pass are the sample username and password I have set in the credentials file. Your browser may prompt you for the username and password of the web service. Once you have authenticated successfully, you can drop the admin:pass@ from the request.

image

To check a URL against the database, append the URL you want to look up to the address in the address bar. For example:

127.0.0.1:8080/v1/urlinfo/wxy.com

image

image

The service checks whether that URL exists in the database and, if it does, whether malware is present.

Task Questions

What are some strategies you might use to update the service with new URLs? Updates may be as many as 5000 URLs a day with updates arriving every 10 minutes

For this system I only created a sample database using MySQL, as that is the database already configured on my development environment. The database is also very simple, with one table of two fields. A relational database such as MySQL can be slower for lookups than a NoSQL store, especially as the data grows in size and the type of data being stored is not uniform. As the database scales, it would make sense to move to NoSQL.

I have implemented a simple ticker that reads the entries.json file and adds new URLs to the database, or updates the malware status of an existing URL if necessary. The idea is that the entries.json file would be controlled by whoever provides the information on whether a URL is safe or not for our DB. We could extend the server to check a given location for new JSON files to read in. As the ticker runs in a goroutine, access to shared state would have to be made safe with something like a sync lock.
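The ticker-plus-lock idea can be sketched as follows. Several things here are assumptions for the sake of a self-contained example: the entries.json layout (a JSON array of URL/Malware objects), the store type, and the use of an in-memory map in place of the MySQL table. The load function is injected so the demo does not need a file on disk; in a real service it might wrap os.ReadFile("entries.json").

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"sync"
	"time"
)

// entry is the assumed shape of one record in entries.json.
type entry struct {
	URL     string
	Malware string
}

// store is an in-memory stand-in for the database, guarded by a lock
// so lookups and updates can run in different goroutines.
type store struct {
	mu     sync.RWMutex
	status map[string]string
}

func newStore() *store { return &store{status: make(map[string]string)} }

// apply inserts new URLs and updates changed ones; it returns how many changed.
func (s *store) apply(data []byte) (int, error) {
	var entries []entry
	if err := json.Unmarshal(data, &entries); err != nil {
		return 0, err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	changed := 0
	for _, e := range entries {
		if s.status[e.URL] != e.Malware {
			s.status[e.URL] = e.Malware
			changed++
		}
	}
	return changed, nil
}

// get returns the stored status, or "unknown" if the URL is absent.
func (s *store) get(url string) string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if v, ok := s.status[url]; ok {
		return v
	}
	return "unknown"
}

// runUpdater calls load on every tick and applies the result until ctx is done.
func runUpdater(ctx context.Context, every time.Duration, load func() ([]byte, error), s *store) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			data, err := load()
			if err != nil {
				log.Printf("load failed: %v", err)
				continue
			}
			if _, err := s.apply(data); err != nil {
				log.Printf("apply failed: %v", err)
			}
		}
	}
}

func main() {
	s := newStore()
	load := func() ([]byte, error) {
		return []byte(`[{"URL":"wxy.com","Malware":"yes"}]`), nil
	}
	// Short interval and timeout for the demo; the task mentions 10-minute updates.
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	runUpdater(ctx, 10*time.Millisecond, load, s)
	fmt.Println(s.get("wxy.com"), s.get("abc"))
}
```

Because apply only counts and writes entries whose status actually differs, re-reading an unchanged file on every tick is cheap, which suits updates arriving every 10 minutes.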

The size of the URL list could grow infinitely, how might you scale this beyond the memory capacity of the system? Bonus if you implement this

Vertically scaling the host where the database is held will allow it to scale beyond the constraints of my development environment. With many cloud providers, this headroom is seemingly endless. Database sharding can also be used to split the database into smaller, more manageable pieces. These shards can be placed across different servers so that the load is shared across more than one machine.
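One common way to decide which shard holds a given URL is to hash the URL and take the result modulo the number of shards. The sketch below shows that idea with the standard library's FNV hash; shardFor is a hypothetical helper, and this is only one possible scheme, not something the repo implements.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a URL to a shard index in [0, shards).
// The same URL always lands on the same shard for a fixed shard count.
func shardFor(url string, shards int) int {
	if shards < 1 {
		shards = 1 // treat a non-positive count as a single shard
	}
	h := fnv.New32a()
	h.Write([]byte(url)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(shards))
}

func main() {
	for _, u := range []string{"wxy.com", "abc", "example.org/path?q=1"} {
		fmt.Printf("%s -> shard %d\n", u, shardFor(u, 4))
	}
}
```

A plain modulo scheme moves most keys when the shard count changes, so a deployment that expects to add shards often would likely look at consistent hashing instead.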

The number of requests may exceed the capacity of this system, how might you solve that? Bonus if you implement this

At the moment I have set the limit to 15 requests per second, a low threshold for testing on my development environment. This could be much higher on more powerful servers, which would be scaling the system vertically by adding more resources. I also set database connection limits. Again, scaling vertically with a more powerful server would allow you to drastically increase these limits.

You could also scale horizontally by running the process in containers/pods that can be duplicated. A service IP would provide a single endpoint to the HTTP proxy, and the service would by default distribute traffic round-robin across however many instances of the URL lookup you want. There would have to be one underlying database that all URL lookup containers/pods can use, and access to it would also have to be safe for concurrent use. This could be done by extracting the ticker (i.e. the database updater) into a separate service that is the only one with write access to the database. A simplified diagram of this setup using Docker Compose would look like:

image

And a similar setup for a Kubernetes cluster:

image

Testing

There are unit tests for all functions in this repo. To run the tests, enter:

go test

For more verbose output, add -v.

Task Definition

Malware URL Lookup Exercise

For this exercise, we would like to see how you solve a coding challenge with architecting for the future in mind.

One of our key values in how we develop new systems is to start with simple implementations and progressively make them more capable, scalable, and reliable. You are encouraged to get something that meets the base requirements working ASAP, and then iterate to improve on it. We ask that you use a Git-based repository (Bitbucket, GitHub, etc.) to commit your updates. It's up to you how frequently you commit and what you decide to include in each push, but we are particularly curious about your development workflow and how you handle revision control. Please also include some unit tests for your project, and detailed instructions on how to get the application up and running. Assume we know nothing about how it needs to be run. You can use any languages/technologies/platforms you like. Here's what we would like you to build:

Malware URL lookup service

We have an HTTP proxy that is scanning traffic, looking for malware URLs. Before allowing HTTP connections to be made, this proxy asks a service that maintains several databases of malware URLs if the resource being requested is known to contain malware.

Write a small web service, preferably in Go or Python, that responds to GET requests where the caller passes in a URL and the service responds with some information about that URL. The GET requests would look like this:

GET /v1/urlinfo/{resource_url_with_query_string}

The caller wants to know if it is safe to access that URL or not. As the implementer, you get to choose the authorization, response format and structure. Please document the API in the README. These lookups are blocking users from accessing the URL until the caller receives a response from your service.

Give some thought to the following. Write-up the design, if you do not have time to code.

  • The size of the URL list could grow infinitely, how might you scale this beyond the memory capacity of the system? Bonus if you implement this.
  • The number of requests may exceed the capacity of this system, how might you solve that? Bonus if you implement this.
  • What are some strategies you might use to update the service with new URLs? Updates may be as many as 5000 URLs a day with updates arriving every 10 minutes
