hdrhbbu1 / marvel-search

This project forked from algolia/marvel-search


Searchable list of all Marvel superheroes and supervillains

Home Page: https://algolia.github.io/marvel-search/



Marvel

This repository holds the list of all the Marvel superheroes and supervillains, in JSON format. It also contains the set of scripts used to get them (by scraping various APIs and websites), as well as a UI for searching through them.

Slides of the demo I presented at a few meetups are available here and here.

Run the demo

Screencast

Just run npm install and then npm run serve. This will open the local demo at http://localhost:5006.

Regenerate the data

Everything is done through npm run scripts. Start with npm install, then run npm run init. This will download the list of all the Marvel characters available on Wikipedia and save it in ./download/urls.
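
For illustration, here is a minimal sketch of what such a download step could look like through the public MediaWiki API. The category name, single-page (no pagination) fetch, and output path are assumptions, not the project's actual init script; it requires Node 18+ for the global fetch:

```js
// Minimal sketch: fetch one page of Marvel character titles from the
// MediaWiki API and save their Wikipedia URLs.
const fs = require('fs');

const api = 'https://en.wikipedia.org/w/api.php';
const params = new URLSearchParams({
  action: 'query',
  list: 'categorymembers',
  cmtitle: 'Category:Marvel Comics superheroes', // assumption
  cmlimit: '500',
  format: 'json',
});

fetch(`${api}?${params}`)
  .then((response) => response.json())
  .then((data) => {
    // Turn each page title into its Wikipedia URL.
    const urls = data.query.categorymembers.map(
      (page) => `https://en.wikipedia.org/wiki/${encodeURIComponent(page.title)}`
    );
    fs.mkdirSync('./download/urls', { recursive: true });
    fs.writeFileSync('./download/urls/urls.json', JSON.stringify(urls, null, 2));
  });
```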

npm run wikidata will query the Wikidata API to get metadata about the various Wikipedia pages. Wikidata essentially provides metadata about the Wikipedia page itself, not about its content, so it is not the best source of data and we do not use it much. All the Wikidata information is saved in ./download/wikidata.
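
As a hedged sketch, this is roughly how a single lookup through Wikidata's wbgetentities endpoint could work; the real script surely batches titles and writes to ./download/wikidata instead of logging (Node 18+ for the global fetch):

```js
// Sketch: look up the Wikidata entity behind one Wikipedia page title.
const params = new URLSearchParams({
  action: 'wbgetentities',
  sites: 'enwiki',
  titles: 'Iron Man', // example title
  props: 'labels|descriptions|aliases',
  format: 'json',
});

fetch(`https://www.wikidata.org/w/api.php?${params}`)
  .then((response) => response.json())
  .then((data) => {
    // Entities are keyed by their Wikidata Q-id.
    for (const entity of Object.values(data.entities)) {
      console.log(entity.labels?.en?.value, entity.descriptions?.en?.value);
    }
  });
```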

npm run dbpedia will get information about our pages from the DBPedia project. DBPedia is an unofficial API for Wikipedia. It contains much more data than Wikidata, including the actual content of the infobox as well as all the page redirects. All DBPedia data is saved in ./download/dbpedia.
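
For reference, DBPedia exposes each resource's data as plain JSON; a minimal sketch of one such fetch, where the resource name is just an example (Node 18+ for the global fetch):

```js
// Sketch: fetch the raw DBPedia data for one resource.
const resource = 'Iron_Man';

fetch(`https://dbpedia.org/data/${resource}.json`)
  .then((response) => response.json())
  .then((data) => {
    // DBPedia keys its triples by subject URI; pick the one for our page.
    const subject = data[`http://dbpedia.org/resource/${resource}`] || {};
    // Each key is a predicate: infobox fields, abstracts, redirects, etc.
    console.log(Object.keys(subject));
  });
```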

npm run infobox will get data from the infoboxes (the box on the right of each Wikipedia page). This should yield the same data as npm run dbpedia, but fresher (DBPedia is only a dump taken at a specific date). The downside is that the data is harder to parse and the results are sometimes not as good. We take data from both sources and merge them, to be sure we get the best of both worlds.
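
The merge itself boils down to a precedence rule: prefer the fresher infobox value, fall back to the DBPedia one. A toy sketch with hypothetical field names, not the project's actual consolidation code:

```js
// Toy sketch of the merge rule: start from the DBPedia dump, then let any
// non-empty value parsed from the live infobox win.
function mergeCharacter(dbpediaData, infoboxData) {
  const merged = { ...dbpediaData };
  for (const [key, value] of Object.entries(infoboxData)) {
    // Live parsing sometimes fails, so only overwrite with real values.
    if (value !== null && value !== undefined && value !== '') {
      merged[key] = value;
    }
  }
  return merged;
}

// DBPedia knows the aliases; the live infobox has a fresher creator list.
const character = mergeCharacter(
  { name: 'Iron Man', aliases: ['Tony Stark'], creators: ['Stan Lee'] },
  { creators: ['Stan Lee', 'Larry Lieber', 'Don Heck', 'Jack Kirby'] }
);
console.log(character);
```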

npm run images will crawl all the Wikipedia pages to get the URL of each character's image. This data is not available in the DBPedia dump. The complete list of images is stored in ./download/images/images.json.
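
Such a crawler can be sketched with cheerio; the '.infobox img' selector is an assumption about Wikipedia's markup, and the real script may be more defensive (Node 18+ for the global fetch):

```js
// Sketch of an image crawler using cheerio (npm install cheerio).
const cheerio = require('cheerio');

async function getImageUrl(wikipediaUrl) {
  const html = await fetch(wikipediaUrl).then((response) => response.text());
  const $ = cheerio.load(html);
  const src = $('.infobox img').first().attr('src');
  // Wikipedia image srcs are protocol-relative (//upload.wikimedia.org/...).
  return src ? `https:${src}` : null;
}

getImageUrl('https://en.wikipedia.org/wiki/Iron_Man').then(console.log);
```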

npm run pageviews will crawl the http://stats.grok.se API to get visit stats for the last 90 days for all the URLs in the original list. The website is quite slow, so this command can take a long time to execute. Also, it does not currently handle multiple URLs redirecting to the same place, so only the most popular one will be taken into account. Pageviews data is stored in ./download/pageviews.
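
A sketch of one such request, taking the endpoint shape (/json/en/latest90/<title> returning a daily_views map) as an assumption about how that API behaved (Node 18+ for the global fetch):

```js
// Sketch: sum the last-90-days views for one title via stats.grok.se.
async function getPageviews(title) {
  const url = `http://stats.grok.se/json/en/latest90/${encodeURIComponent(title)}`;
  const data = await fetch(url).then((response) => response.json());
  // Collapse the per-day counts into one popularity score.
  return Object.values(data.daily_views).reduce((sum, views) => sum + views, 0);
}

getPageviews('Iron Man').then(console.log);
```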

npm run marvel will try to find, in the official Marvel API, all the characters we extracted from Wikipedia. The Marvel API can be unreliable (i.e. down or slow) at times, so the script includes its own "try again until it works" mechanism. The API requires a set of API keys. Those can be passed either as environment variables (MARVEL_API_KEY and MARVEL_API_KEY_PRIVATE) or as files in the root directory (_marvel_api_key and _marvel_api_key_private). The Marvel API gives us access to a nice description as well as a nice picture (we only use the one from Wikipedia as a fallback). This data is stored in ./download/marvel.
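
For illustration, here is a minimal sketch of one lookup with the Marvel API's documented ts/apikey/hash authentication, reading the keys from the environment variables mentioned above; the naive retry loop is an assumption, not the project's actual mechanism (Node 18+ for the global fetch):

```js
// Sketch: one character lookup with Marvel's auth plus a retry loop.
const crypto = require('crypto');

async function findCharacter(name, attempts = 5) {
  const publicKey = process.env.MARVEL_API_KEY;
  const privateKey = process.env.MARVEL_API_KEY_PRIVATE;
  const ts = Date.now().toString();
  // Marvel requires hash = md5(ts + privateKey + publicKey).
  const hash = crypto
    .createHash('md5')
    .update(ts + privateKey + publicKey)
    .digest('hex');
  const params = new URLSearchParams({ name, ts, apikey: publicKey, hash });
  const url = `https://gateway.marvel.com/v1/public/characters?${params}`;

  for (let i = 0; i < attempts; i++) {
    try {
      const response = await fetch(url);
      if (response.ok) {
        const { data } = await response.json();
        return data.results[0] || null;
      }
    } catch (err) {
      // Network error or slow API: loop and try again.
    }
  }
  throw new Error(`Marvel API still failing after ${attempts} attempts`);
}
```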

You can also run npm run download:all to download from all the sources in sequence.

npm run consolidate grabs data from all the previously downloaded JSON files and builds a curated list of records saved in ./records. Then npm run push will push all those records to the Algolia index (and configure it properly).
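
A hedged sketch of what the push step could look like with the algoliasearch v4 client; the credentials, index name, settings, and records path are all placeholders, not the project's actual values:

```js
// Sketch of the push step (npm install algoliasearch).
const algoliasearch = require('algoliasearch');
const records = require('./records/marvel.json'); // hypothetical path

const client = algoliasearch('YOUR_APP_ID', 'YOUR_ADMIN_API_KEY');
const index = client.initIndex('marvel');

// Configure the index first, then upload every record.
index
  .setSettings({ searchableAttributes: ['name', 'aliases', 'description'] })
  .then(() => index.saveObjects(records, { autoGenerateObjectIDIfNotExist: true }))
  .then(() => console.log(`Pushed ${records.length} records`));
```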

Tests

Getting data from various sources and cleaning it is an error-prone process: you can easily break something when fixing something else. That's why this project has so many tests. You can run them all with npm run test, and start TDD-ready live watching with npm run test:watch.
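
To give a flavor, here is the kind of small regression test such a suite is full of, written for a hypothetical cleanup helper under a mocha-style runner (whether npm run test actually wraps mocha is an assumption):

```js
// Flavor of the suite: a regression test for a hypothetical helper,
// using describe/it (mocha-style) with Node's built-in assert.
const assert = require('assert');

// Hypothetical helper: strip Wikipedia footnote markers like "[1]".
function removeFootnotes(text) {
  return text.replace(/\[\d+\]/g, '').trim();
}

describe('removeFootnotes', () => {
  it('strips footnote markers without touching the rest', () => {
    assert.strictEqual(
      removeFootnotes('Tony Stark[1] is Iron Man[2]'),
      'Tony Stark is Iron Man'
    );
  });
});
```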

npm run lint also checks the code against the styleguide.

Front-end

The front-end code uses Brunch. It has everything a front-end build tool should have, including live reload and SCSS/Babel compilation. Just run npm run serve to start the server on http://localhost:5006/. You can also run npm run build manually to populate the ./public directory with the built website.

Following Brunch conventions (a sample config is sketched after this list):

  • ./app/*.js files will be compiled through Babel
  • ./app/styles/*.scss files will be compiled to CSS
  • ./app/assets/* files will be copied without modifications to ./public
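
An illustrative brunch-config.js matching those conventions; the project's actual config may differ in names and plugin options:

```js
// Illustrative Brunch config, not necessarily the project's own.
module.exports = {
  paths: {
    // ./app is watched; built output and copied assets land in ./public.
    watched: ['app'],
    public: 'public',
  },
  files: {
    // ./app/*.js, run through Babel, ends up in a single bundle.
    javascripts: { joinTo: 'app.js' },
    // ./app/styles/*.scss compiles down to one stylesheet.
    stylesheets: { joinTo: 'app.css' },
  },
};
```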

