This is a simple program, written in Scala, that uses the MapReduce technique to sum the daily new COVID-19 cases for each country from January 2020 to September 2021.
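To make the idea concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce steps in plain Scala. It is not the repository's actual Hadoop code: the object name `CovidCasesSketch`, the helper names, and the input layout (`country,date,new_cases`) are all assumptions chosen for illustration.

```scala
// Minimal in-memory sketch of the map -> shuffle -> reduce flow.
// Assumption: each input line is "country,date,new_cases" (hypothetical layout).
object CovidCasesSketch {

  // Map phase: turn one input line into zero or one (country, newCases) pair.
  def mapLine(line: String): Option[(String, Long)] =
    line.split(",").map(_.trim) match {
      case Array(country, _, cases) if cases.nonEmpty && cases.forall(_.isDigit) =>
        Some((country, cases.toLong))
      case _ => None // skip the header row and malformed rows
    }

  // Reduce phase: sum all values that share the same key.
  def reduceCases(values: Iterable[Long]): Long = values.sum

  def run(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(line => mapLine(line))  // map
      .groupBy(_._1)                   // shuffle: group pairs by country
      .map { case (country, pairs) =>  // reduce
        country -> reduceCases(pairs.map(_._2))
      }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "country,date,new_cases",
      "India,2020-03-01,3",
      "India,2020-03-02,2",
      "Italy,2020-03-01,10"
    )
    // Print one "country<TAB>total" line per country, sorted by name.
    run(sample).toSeq.sortBy(_._1).foreach { case (c, n) => println(s"$c\t$n") }
  }
}
```

In a real Hadoop job the grouping step is performed by the framework between the mapper and the reducer; here `groupBy` stands in for it.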
- sbt 1.5
- Java SDK >= 8
- sbt-assembly 1.1.0
- hadoop-core 1.2.1
- IntelliJ IDEA
- VMware/VirtualBox
- Hortonworks Sandbox VM
-
Download the code:
> git clone https://github.com/gnzeleven/CovidCases-MapReduce.git
-
Launch sbt:
> cd CovidCases-MapReduce
> ./sbt (or sbt.bat for Windows)
This downloads all the dependencies for the project.
-
In the sbt shell:
> clean compile assembly
The assembly task requires the sbt-assembly plugin. The assembly.sbt file must be inside the project folder (at the same level as build.properties), and build.sbt must be configured in the root folder. The assembly.sbt file should contain:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.1.0")
The assembly task creates a fat jar with all the required dependencies at target/scala-2.13/CovidCases-assembly-0.1.jar.
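As a rough sketch of the two files mentioned above, the configuration could look like the following. The plugin line comes from this README; the `build.sbt` values (project name, version, exact Scala 2.13 patch release, and the merge strategy) are assumptions inferred from the jar name and the prerequisites, not the repository's actual settings.

```scala
// ---- project/assembly.sbt (plugin line as given in this README) ----
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.1.0")

// ---- build.sbt (assumed values; adjust to the actual project) ----
name := "CovidCases"
version := "0.1"
scalaVersion := "2.13.6" // assumption: any 2.13.x release should do

libraryDependencies += "org.apache.hadoop" % "hadoop-core" % "1.2.1"

// Assumed merge strategy: resolve duplicate files when dependency jars
// are merged into the fat jar.
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
```

With these assumed `name` and `version` values, sbt-assembly's default naming yields `CovidCases-assembly-0.1.jar`, which matches the jar path given above.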
-
If you are using the Hortonworks Sandbox VM, copy the input file and the jar to the VM:
> scp -P 2222 {path/to/local/file}/filename.txt [email protected]:/home/hdfs/{some folder}
> scp -P 2222 {path/to/local/file}/ProjectName-assembly-0.1.jar [email protected]:/home/hdfs/{some folder}
-
Create an HDFS directory to hold the input file:
> cd /home/hdfs/{some folder}
> hdfs dfs -mkdir /{hadoop_folder}/
Note: {some folder} is the folder on the VM where the input file and the jar file were copied.
You might want to switch to the hdfs user (rather than root or any other user) by running the command su hdfs.
-
Copy the file from the local file system to HDFS:
> hdfs dfs -copyFromLocal {filename} {hadoop_folder/filename}
-
To view the contents of the HDFS folder, run:
> hdfs dfs -ls /{hadoop_folder}/
-
To execute the jar file,
> hadoop jar ProjectName-assembly-0.1.jar {hadoop_folder/filename} {output_folder_path}
Note: The output folder is created at runtime. With Hadoop's standard file output formats, the job fails if the folder already exists.
-
To view the contents of the output folder,
> hdfs dfs -ls {output_folder_path}
-
Check the output using the following command:
> hdfs dfs -text {output_folder_path}/part-r-00000
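If the output needs further processing, the lines printed by the command above can be parsed with a few lines of Scala. This is only a sketch: it assumes each output line has the form `<country><TAB><total>` (the tab is Hadoop's default key/value separator for text output, but the project's actual output format is not confirmed here), and `OutputSketch` and its methods are hypothetical names. It relies on `toLongOption`, which requires Scala 2.13.

```scala
// Sketch: parse reducer output lines of the assumed form "<country>\t<total>".
object OutputSketch {

  // Returns None for lines that do not match the assumed layout.
  def parseLine(line: String): Option[(String, Long)] =
    line.split("\t") match {
      case Array(country, total) =>
        total.trim.toLongOption.map(n => (country, n))
      case _ => None
    }

  // Collect all well-formed lines into a country -> total map.
  def parseAll(lines: Seq[String]): Map[String, Long] =
    lines.flatMap(line => parseLine(line)).toMap
}
```

For example, `OutputSketch.parseLine("India\t5")` yields `Some(("India", 5L))`, while a line without a tab yields `None`.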
Distributed under the Apache License Version 2.0. See LICENSE.txt for more information.