This project forked from usc-isi-i2/dig-elasticsearch

Code to process datasets for elastic search

License: Apache License 2.0

dig-elasticsearch

Prerequisites:

Steps to load data into Elasticsearch locally:

  1. Create, or have ready, the dataset to be uploaded to Elasticsearch (ES).

  2. Open the Sense extension for Chrome (or Sense in Elasticsearch itself, if you have the Marvel plugin installed locally).

     a. In the server field, enter http://localhost:9200/

     b. Copy the contents of the file at https://github.com/usc-isi-i2/dig-elasticsearch/blob/master/types/webpage/esMapping-dig-Ads.json into the editor area of Sense and send the request.

     c. This creates an index named 'dig' with a document type 'WebPage' on your local machine, with the same settings and mappings as on the Elasticsearch ISI server.

  3. Load data into your Elasticsearch server.

     a. Change directory to <DIG-ES>/types/webpage/scripts

     b. Run python loadDataElasticSearch.py -h to print the help for the script, shown below:

usage: loadDataElasticSearch.py [-h] [-hostname HOSTNAME] [-port PORT]
                                [-mappingFilePath MAPPINGFILEPATH]
                                filepath indexname doctype dataFileType

positional arguments:
   filepath            json file to be loaded in ElasticSearch
   indexname           desired name of the index in ElasticSearch
   doctype             type of the document to be indexed
   dataFileType        Specify '0' if every line in the data file is
                       different json object or '1' otherwise

optional arguments:
   -h, --help                       show this help message and exit
   -hostname HOSTNAME               Elastic Search Server hostname, defaults to 'localhost'
   -port PORT                       Elastic Search Server port,defaults to 9200
   -mappingFilePath MAPPINGFILEPATH mapping/setting file for the index

Execute:

python loadDataElasticSearch.py $FilePath/100kWebPages.json dig WebPage <dataFileType>

Please note that $FilePath is the local path where the JSON file from step 1 is stored. 'dig' and 'WebPage' are the names of the index and document type created in step 2. <dataFileType> is the fourth positional argument listed in the help output: 0 if every line in the data file is a separate JSON object, 1 otherwise. hostname and port default to 'localhost' and 9200, as required in this case, but can be set with the optional -hostname and -port parameters.
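
To make the arguments above more concrete, here is a minimal Python sketch of how a loader with this command line could be structured. It is not the code of loadDataElasticSearch.py. The function names (build_parser, read_documents, bulk_body) are hypothetical, and it assumes that a dataFileType other than '0' means the file is a single JSON value, where a top-level array is treated as a list of documents. The body it builds follows the newline-delimited format of the Elasticsearch _bulk endpoint, with the `_type` field used by older Elasticsearch releases that still support document types.

```python
import argparse
import json


def build_parser():
    """Recreate the command-line interface shown in the help text (a sketch)."""
    parser = argparse.ArgumentParser(prog="loadDataElasticSearch.py")
    parser.add_argument("filepath", help="json file to be loaded in ElasticSearch")
    parser.add_argument("indexname", help="desired name of the index in ElasticSearch")
    parser.add_argument("doctype", help="type of the document to be indexed")
    parser.add_argument("dataFileType",
                        help="'0' if every line is a different json object, '1' otherwise")
    parser.add_argument("-hostname", default="localhost")
    parser.add_argument("-port", type=int, default=9200)
    parser.add_argument("-mappingFilePath", default=None)
    return parser


def read_documents(text, data_file_type):
    """Return the list of JSON documents contained in `text`.

    '0': one JSON object per non-empty line.
    Anything else (assumption): the whole text is one JSON value; a top-level
    array is treated as a list of documents, any other value as one document.
    """
    if data_file_type == "0":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    parsed = json.loads(text)
    return parsed if isinstance(parsed, list) else [parsed]


def bulk_body(documents, index_name, doc_type):
    """Build a newline-delimited request body for the _bulk endpoint."""
    action = json.dumps({"index": {"_index": index_name, "_type": doc_type}})
    lines = []
    for doc in documents:
        lines.append(action)           # action line: index into index_name/doc_type
        lines.append(json.dumps(doc))  # source line: the document itself
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # In-memory stand-in for a line-delimited data file (dataFileType '0').
    sample = '{"url": "http://example.com/a"}\n{"url": "http://example.com/b"}\n'
    args = build_parser().parse_args(["sample.json", "dig", "WebPage", "0"])
    docs = read_documents(sample, args.dataFileType)
    print(bulk_body(docs, args.indexname, args.doctype), end="")
```

The sketch only builds the request body. Sending it would mean an HTTP POST to http://localhost:9200/_bulk (or the host and port given with -hostname and -port), which is left out so the example runs without a server.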

Contributors: dkapoor, philpot, rajagopal067, saggu, szeke, ukby1234
