address-index-data

Purpose

This repository contains the Scala code for an Apache Spark job to create an Elasticsearch index from the AddressBase premium product.

For testing purposes there is a free AddressBase sample available from Ordnance Survey.

Software and Versions

Development Setup (IntelliJ)

  • File, New, Project From Version Control, GitHub
  • Git Repository URL - select https://github.com/ONSdigital/address-index-data
  • Parent Directory: any local drive, typically your IdeaProjects directory
  • Directory Name: address-index-data
  • Clone

Running

To package the project as a runnable fat jar, run the following from the root of the project:

sbt clean assembly

The resulting jar will be located in batch/target/scala-2.10/ons-ai-batch-assembly-version.jar

To run the jar:

java -Dconfig.file=application.conf -jar batch/target/scala-2.10/ons-ai-batch-assembly-version.jar

The target Elasticsearch index can be on a local ES deployment or on an external server; this is configurable. The application.conf file may contain:

addressindex.elasticsearch.nodes="just-the-hostname.com"
addressindex.elasticsearch.pass="password"

These will override the default configuration. The location and names of the input files can also be overridden. Note that these input files are extracted from AddressBase and subject to some pre-processing.
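
As a rough illustration of how such overrides layer on top of defaults, the sketch below uses the Typesafe Config library, which is what the -Dconfig.file system property belongs to. It assumes that library is on the classpath. The default values and the port key are made up for the example and are not taken from this project's configuration.

```scala
import com.typesafe.config.{Config, ConfigFactory}

object ConfigOverrideSketch {
  // Stand-in for the defaults bundled with the jar.
  // These values and the "port" key are assumptions for illustration.
  val defaults: Config = ConfigFactory.parseString(
    """addressindex.elasticsearch.nodes = "localhost"
      |addressindex.elasticsearch.port = "9200"
      |addressindex.elasticsearch.pass = "changeme"
      |""".stripMargin)

  // Stand-in for the external application.conf passed via -Dconfig.file.
  val overrides: Config = ConfigFactory.parseString(
    """addressindex.elasticsearch.nodes = "just-the-hostname.com"
      |addressindex.elasticsearch.pass = "password"
      |""".stripMargin)

  // Keys present in overrides win; everything else falls back to defaults.
  val config: Config = overrides.withFallback(defaults).resolve()

  def main(args: Array[String]): Unit = {
    println(config.getString("addressindex.elasticsearch.nodes"))
    println(config.getString("addressindex.elasticsearch.port"))
  }
}
```

Only the keys listed in the override file change; any key left out keeps its default value.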

The job can also be run from inside IntelliJ. In this case you can run the Main class directly but need to remove lines 40-83 and replace them with:

val indexName = generateIndexName(false, true)
val url = s"http://$nodes:$port/$indexName"
postMapping(indexName, true)
saveHybridAddresses(false, true)

Here the first boolean selects a historic index and the second selects a skinny index.
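
Positional booleans are easy to swap by accident, so Scala's named arguments can make such calls easier to read. The sketch below is a hypothetical stand-in, not the project's code: the names sketchIndexName and sketchUrl, the parameter names, and the naming scheme are all assumptions, and the real generateIndexName in Main may build its name quite differently. Only the URL pattern is taken from the snippet above.

```scala
object IndexFlagsSketch {
  // Hypothetical naming scheme, for illustration only.
  def sketchIndexName(historical: Boolean, skinny: Boolean): String = {
    val hist = if (historical) "-historical" else ""
    val skin = if (skinny) "-skinny" else ""
    s"hybrid$hist$skin"
  }

  // Same URL pattern as in the snippet above: http://nodes:port/indexName
  def sketchUrl(nodes: String, port: String, indexName: String): String =
    s"http://$nodes:$port/$indexName"

  def main(args: Array[String]): Unit = {
    // Named arguments show which flag is which at the call site.
    val name = sketchIndexName(historical = false, skinny = true)
    println(sketchUrl("localhost", "9200", name))
  }
}
```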

Running Tests

Before you can run tests on this project if using Windows you must

The next time you right-click, the green arrow "Run ScalaTests" should be shown.

Note that you can't run the tests using sbt on the command line.

Dependencies

Top-level project dependencies can be found in the build.sbt file.

Contributors

alexflav23, alwestvt, analytically, gaskyk, ivyons, jameshoskisson, mironor, paul-joel, richardsmithons, saniemi, steve-thorne-ons
