devcenter-square / sum-files-challenge

This repository contains 1,000 files, each containing 100,000 Integers ... your task is to sum all integers in all files

License: MIT License

Shell 7.33% JavaScript 92.67%
algorithms files summation

sum-files-challenge's Introduction

Sum Files Challenge

This repository contains 1,000 files, each containing 100,000 space-separated integers. Your task is to write a program that sums all integers in all files.

Each integer is guaranteed to be between 0 and 1000.
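
As a rough illustration of the task, here is a minimal sketch in Python. It assumes the files sit anywhere under a root directory and that integers are separated by commas or whitespace (the text above says space-separated, while the submissions below split on commas, so the sketch accepts both). The names `sum_files`, `MAX_TOTAL` and `NEEDS_64_BIT` are made up for this example. The bound at the end follows from the stated limits: the total can reach 100,000,000,000, which does not fit in a 32-bit integer, so languages with fixed-width integers need a 64-bit accumulator.

```python
import os


def sum_files(root):
    """Sum every integer in every file under root, reading one line at a time."""
    total = 0  # Python ints are arbitrary precision, so this cannot overflow
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "r") as handle:
                for line in handle:
                    # Treat commas and whitespace alike as separators (assumption).
                    tokens = line.replace(",", " ").split()
                    total += sum(int(token) for token in tokens)
    return total


# Upper bound on the answer: 1,000 files x 100,000 integers x 1,000 (max value).
MAX_TOTAL = 1_000 * 100_000 * 1_000
# True: the bound is larger than the biggest signed 32-bit integer.
NEEDS_64_BIT = MAX_TOTAL > 2**31 - 1
```

Calling `sum_files("files")` from the repository root would walk the whole data set, assuming the data lives in a `files` directory as the submissions below suggest.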

Your submission, if accepted, will be added to the rankings section below.

How to Submit

  • You can host your program code anywhere public.
  • Create an issue on this repo, and give a link to your entry.
  • If you need to, give a command line script for how to run your program.

How we'll judge

  • Execution Time
  • Memory Usage
  • Big O Complexity
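
The first two criteria can be measured in many ways, and the list above does not say which method is used. As one possible sketch, the hypothetical helper `measure` below wraps any function with a wall-clock timer and Python's `tracemalloc`. Note that `tracemalloc` only counts memory allocated by the Python interpreter (not the whole process) and slows the run down, so the numbers are indicative rather than comparable across languages. On the third criterion, any correct solution has to read every integer at least once, so it cannot do better than linear time in the number of integers.

```python
import time
import tracemalloc


def measure(func, *args):
    """Run func(*args) and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    # get_traced_memory() returns (current, peak) in bytes since start().
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    result, elapsed, peak = measure(sum, range(1_000_000))
    print("result:", result)
    print("seconds:", round(elapsed, 3))
    print("peak KiB:", peak // 1024)
```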

Rankings

< No Rankings Yet >

sum-files-challenge's People

Contributors

mykeels

sum-files-challenge's Issues

Submission: C#

dotnet core 2.1.302

  • Time : ~ 15s

Hardware :

  Processor Name: Intel Core i5
  Processor Speed: 1.7 GHz
  Number of Processors: 1
  Total Number of Cores: 2
  L2 Cache (per Core):  512KB
  L3 Cache: 3 MB
  Memory: 6 GB

Program

using System;
using System.IO;

namespace SumNumbers
{
    class Program
    {
        static void Main(string[] args)
        {
            SumIntegers();

        }

        public static void SumIntegers()
        {
            var startTime = DateTime.Now;

            //replace with directory path
            // Path.Combine picks the right separator on Windows, Linux and macOS
            var dirPath = Path.Combine(Directory.GetCurrentDirectory(), "files");

            var folders = Directory.GetDirectories(dirPath);
            long total = 0;

            for (int i = 0; i < folders.Length; i++)
            {
                var folder = folders[i];
                var files = Directory.GetFiles(folder);

                for (int j = 0; j < files.Length; j++)
                {
                    var file = files[j];
                    var fileContent = File.ReadAllLines(file);

                    for (int k = 0; k < fileContent.Length; k++)
                    {
                        var numbers = fileContent[k].Split(',');

                        for (int l = 0; l < numbers.Length; l++)
                        {
                            total += Convert.ToInt64(numbers[l]);
                        }
                    }

                }

            }

            Console.WriteLine("total : " + total);
            Console.WriteLine("Milliseconds : " + (DateTime.Now - startTime).TotalMilliseconds);
        }


    }
}

Output

total : 49947871404
Milliseconds : 15210.5908

Submission: JavaScript (Single-Process)

  • Time ~5137ms
  • Hardware:
      Processor Name: Intel Core i5
      Processor Speed: 2.3 GHz
      Number of Processors: 1
      Total Number of Cores: 2
      L2 Cache (per Core): 256 KB
      L3 Cache: 4 MB
      Memory: 16 GB
  • Program:
const fs = require('fs')
const path = require('path')
const readline = require('readline')
const basePath = '<path-to-files-directory>'

const getFiles = () => {
    return new Promise((resolve, reject) => {
        fs.readdir(basePath, (err, folders) => {
            if (err) {
                reject(err)
            }
            else {
                resolve(
                    Promise.all(folders.map(dir => {
                        const dirPath = path.join(basePath, dir)
                        return new Promise((resolve, reject) => {
                            fs.readdir(dirPath, (err, files) => {
                                if (err) {
                                    reject(err)
                                }
                                else {
                                    resolve(files.map(file => path.join(dirPath, file)))
                                }
                            })
                        })
                    })).then(files => files.reduce((a, b) => a.concat(b)))
                )
            }
        })
    })
}

console.time('total')
let total = 0
getFiles()
    .then(function* (files) {
        for (let file of files) {
            yield new Promise((resolve, reject) => {
                const lineReader = readline.createInterface({
                    input: fs.createReadStream(file)
                })
                
                let fileSum = 0
    
                lineReader.on('line', line => {
                    let lineSum = 0
                    let current = 0
                    for(let i = 0; i < line.length; i++) {
                        let char = line[i]
                        if (char != ',' ) {
                            if (current === 0) {
                                current = +char
                            }
                            else {
                                current = (current * 10) + (+char)
                            }
                        }
                        if ((char === ',') || (i === (line.length - 1))) {
                            lineSum += current
                            current = 0
                        }
                    }
                    fileSum += lineSum
                })
    
                lineReader.on('close', () => {
                    total += fileSum
                    resolve(fileSum)
                })
            })
        }
    })
    .then(sums => {
        return Promise.all(sums)
    })
    .then(() => {
        console.log(total)
        console.timeEnd('total')
    })
    .catch(err => {
        console.error(err)
    })

Submission: Javascript

  • Time ~1968ms
  • Hardware:
Processor Name: Intel Core i7
Processor Speed: 2.5 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB
  • Program:
const fs = require('fs')
const path = require('path')
const pad = require('./utils/pad')
const Promise = require('bluebird')

const dir = path.join(__dirname, '../files')

function readFile(dir) {
  return new Promise((resolve, reject) => {
    fs.readFile(dir, (err, data) => {
      if (err) return reject(err)
      resolve(data)
    })
  })
}

function addFolder(dir, position) {
  let summation = 0
  return new Promise(async (resolve, reject) => {
    try {
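      for (let i = 0; i < 10; i += 1) {
        const buffer = await readFile(dir + `/${pad(position + i, 6)}.csv`)
        const data = buffer.toString()
        let stack = ''

        for (let j = 0; j < data.length; j++) {
          const element = data[j]

          // isNaN(' ') and isNaN('\n') are both false, so test for digits explicitly
          if (element >= '0' && element <= '9') {
            stack += element
          } else if (stack !== '') {
            // any non-digit ends the current number
            summation += Number.parseInt(stack, 10)
            stack = ''
          }
        }
        // flush the last number when the file does not end with a delimiter
        if (stack !== '') summation += Number.parseInt(stack, 10)
      }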
      resolve(summation)
    } catch (error) {
      reject(error)
    }
  })
}

console.time('run')
function getPromises() {
  return new Promise((resolve, reject) => {
    try {
      const promises = []
      // 1,000 files in folders of 10: i runs 0, 10, ..., 990 (i < 100 would cover only 100 files)
      for (let i = 0; i < 1000; i += 10) {
        promises.push(
          addFolder(dir + `/${pad(i + 1, 6)}-${pad(i + 10, 6)}`, i + 1)
        )
      }
      resolve(promises)
    } catch (error) {
      reject(error)
    }
  })
}

getPromises().then((promises) => {
  Promise.all(promises)
    .then((all) => {
      Promise.reduce(
        all,
        (accumulator, current) => {
          return accumulator + current
        },
        0
      ).then((num) => {
        console.log(`Sum: ${num}`)
        console.timeEnd('run')
      })
    })
    .catch((errors) => {
      throw Error(errors)
    })
})

Submission: JavaScript (Multi-Process)

  • Time ~2991.455ms
  • Hardware:
      Processor Name: Intel Core i5
      Processor Speed: 2.3 GHz
      Number of Processors: 1
      Total Number of Cores: 2
      L2 Cache (per Core): 256 KB
      L3 Cache: 4 MB
      Memory: 16 GB
  • Program:
const fs = require('fs')
const path = require('path')
const readline = require('readline')
const basePath = '<path-to-files-directory>'
const workerFarm = require('worker-farm')
let worker = workerFarm(require.resolve('./worker.js'))

const getFiles = function* () {
    for (let dir of fs.readdirSync(basePath)) {
        const dirPath = path.join(basePath, dir)
        yield (() => new Promise((resolve, reject) => {
            fs.readdir(dirPath, function (err, files) {
                if (err) {
                    reject (err)
                }
                else {
                    resolve (files.map(file => path.join(dirPath, file)))
                }
            })
        }))()
    }
}

let total = 0
let count = 0

console.time('total')
for (let promise of getFiles()) {
    promise
        .then(files => {
            files.map(filename => {
                worker(filename, (err, sum) => {
                    if (err) {
                        throw err
                    }
                    else {
                        total += sum
                        count++;
                        if (count === 1000) {
                            console.log(total)
                            console.timeEnd('total')
                            process.exit(0)
                        }
                    }
                })
            })
        })
        .catch(err => console.error(err))
}
  • Worker Script
const fs = require('fs')
const readline = require('readline')

module.exports = function (filename, cb) {
    try {
        const lineReader = readline.createInterface({
            input: fs.createReadStream(filename)
        })
        
        let fileSum = 0

        lineReader.on('line', line => {
            let lineSum = 0
            let current = 0
            for(let i = 0; i < line.length; i++) {
                let char = line[i]
                if (char != ',' ) {
                    if (current === 0) {
                        current = +char
                    }
                    else {
                        current = (current * 10) + (+char)
                    }
                }
                if ((char === ',') || (i === (line.length - 1))) {
                    lineSum += current
                    current = 0
                }
            }
            fileSum += lineSum
        })

        lineReader.on('close', () => {
            cb(null, fileSum)
        })
    }
    catch (err) {
        cb(err)
    }
}

Submission: Java 8

  • Time ~ 1.4 seconds
  • Hardware

Processor Name: Intel Core i7
Processor Speed: 2.9GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256KB
L3 Cache: 8MB
Memory: 16GB

  • External Dependencies

    • JMH (Java Microbenchmark Harness) - used for benchmarking
  • Program

    • SumNumbersProgram.java - can be run standalone, replace pathName
    • BenchmarkSumNumbersProgram.java - runs the SumNumbersProgram using JMH.
  • Results from JMH

Result "com.topriddy.devslack.sumnumberschallenge.BenchmarkSumNumbersProgram.init":
  1.403 ±(99.9%) 0.043 s/op [Average]
  (min, avg, max) = (1.334, 1.403, 1.564), stdev = 0.050
  CI (99.9%): [1.360, 1.446] (assumes normal distribution)


# Run complete. Total time: 00:02:03

Benchmark                        Mode  Cnt  Score   Error  Units
BenchmarkSumNumbersProgram.init  avgt   20  1.403 ± 0.043   s/op
package com.topriddy.devslack.sumnumberschallenge;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import static java.util.Arrays.asList;
import static java.util.function.Function.identity;

public class SumNumbersProgram {
    private final static String pathName = "/Users/topriddy/dev/sum-files-challenge/files";

    public void sumFiles(String dataPath) throws Exception {
        System.out.println("Starting...");
        Long startTime = System.currentTimeMillis();

        List<Path> files = Files.walk(Paths.get(dataPath))
                .filter(p -> !Files.isDirectory(p))
                .collect(Collectors.toList());

        Long sum = files.parallelStream()
                .map(path -> {
                    try {
                        return Files.lines(path);
                    } catch (IOException ioex) {
                        ioex.printStackTrace();
                        return Stream.empty();
                    }
                })
                .map(lines -> lines.map(line -> asList(((String) line).split(",")).parallelStream())
                        .flatMap(identity())
                        .map(value -> Long.valueOf(value))
                )
                .flatMap(identity())
                .mapToLong(v -> v).sum();

        Long endTime = System.currentTimeMillis();

        System.out.println("Sum of numbers in file is : " + sum);
        System.out.printf("\nDuration: %f seconds", (endTime - startTime) / 1000.0);
        System.out.println("\nEnd");
    }

    public static void main(String args[]) throws Exception {
        SumNumbersProgram program = new SumNumbersProgram();
        program.sumFiles(pathName);
        // run second time to gain JVM warm up benefit
        program.sumFiles(pathName);
    }
}
package com.topriddy.devslack.sumnumberschallenge;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;

public class BenchmarkSumNumbersProgram {
    private final static String pathName = "/Users/topriddy/dev/sum-files-challenge/files";

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1, warmups = 1)
    public void init() throws Exception {
        SumNumbersProgram sumNumbersProgram = new SumNumbersProgram();
        sumNumbersProgram.sumFiles(pathName);
    }

    public static void main(String args[]) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}

Submission: Python 3

  • Hardware
Processor Name: Intel® Core™ i5-2540M CPU @ 2.60GHz × 4 
Memory: 8 GB
  • Time ~= 7s

  • Memory ~= 82000 kilobytes

  • Program

class FileNumberSummer:
	def __init__(self, rootDir, start, end):
		self.rootDir = rootDir
		self.start = start
		self.end = end

	def generateFolderName(self,value):
		if value % 10 == 0:
			value = value - 1
		highest = value + (10 - (value % 10))
		lowest = highest - 9
		return "{0:0=6}".format(lowest) + "-" + "{0:0=6}".format(highest)

	def generateFileName(self,value):
		return "{0:0=6}".format(value) + ".csv"

	def sumNumbersInFile(self,fileDir):
		fileContent = open(fileDir, "r")
		sums = 0
		try:
			line = fileContent.readline()
			while line:
				sums += sum(map(int,line.split(',')))
				line = fileContent.readline()
		finally:
			fileContent.close()
		return sums

	def getSum(self):
		sums = 0
		for i in range(self.start, self.end + 1):
			sums += self.sumNumbersInFile(self.rootDir + "/" + self.generateFolderName(i) + "/" + self.generateFileName(i))
		return sums

# main.py
import os
from FileNumberSummer import FileNumberSummer
# 1, 1000 because the files are named from 000001.csv to 001000.csv
FNS = FileNumberSummer(os.getcwd() + "/files", 1, 1000)
print(FNS.getSum())

solution hosted on: https://github.com/the-fanan/sum-files-challenge

  • Command
    pypy python-solution/main.py
  • Output
    49947871404
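
The folder and file names this submission builds (and that the Go and .NET submissions below build as well) follow one pattern: file N lives in a folder named after the block of ten it belongs to, with every number zero-padded to six digits. A compact sketch of that mapping, using the hypothetical helpers `folder_name` and `file_path` and assuming the layout implied by the code above:

```python
def folder_name(index):
    """Folder holding file `index`, e.g. 000001-000010 for files 1 to 10."""
    low = (index - 1) // 10 * 10 + 1  # first file number in the block of ten
    return "{:06d}-{:06d}".format(low, low + 9)


def file_path(root, index):
    """Path of file `index` under root, e.g. files/000001-000010/000001.csv."""
    return "{}/{}/{:06d}.csv".format(root, folder_name(index), index)
```

Integer division replaces the special case for multiples of ten: `folder_name(10)` and `folder_name(1)` both give `000001-000010`.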

Submission: C++ solution

  • Hardware
    Processor Name: Intel® Core™ i5-2540M CPU @ 2.60GHz × 4
    Memory: 8 GB

  • Time ~= 2.5s

  • Program

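#include <atomic>
#include <cstdio>
#include <iostream>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

using namespace std;

string rd = "<ABSOLUTE-PATH-TO-FILES-FOLDER>";
vector<thread> threads;
// atomic: every thread adds to this total, so a plain long would be a data race;
// long long holds the ~5e10 result even where long is 32 bits
atomic<long long> sum(0);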

string generateFolderName(int a)
{
	char fn[24];
	if (a % 10 == 0) {
		a -= 1;
	}
	int h = a + (10 - (a % 10));
	int l = h - 9;
	sprintf(fn, "%06d-%06d", l, h);
	return fn;
}

string generateFileName(int a)
{
	char fn[12];
	sprintf(fn, "%06d.csv", a);
	return fn;
}

long sumNumbersInFile(string fd)
{
	ifstream f;
	f.open(fd);
	long sum = 0;
	int num = 0;
	char c;
	// test the read itself: looping on !f.eof() processes the last character twice
	while (f.get(c)) {
		if (c >= '0' && c <= '9') {
			// accumulate the digits of the current number
			num = (num * 10) + (c - '0');
		} else {
			// any non-digit (',', '\n', '\r') ends the current number
			sum += num;
			num = 0;
		}
	}
	f.close();
	// last number, in case the file does not end with a delimiter
	sum += num;
	return sum;
}

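// void: the result goes into the shared total, nothing is returned
// (falling off the end of a non-void function is undefined behaviour)
void sumNumbers(int i)
{
	string fd;
	fd = rd;
	fd.append(generateFolderName(i)).append("/").append(generateFileName(i));
	sum += sumNumbersInFile(fd);
}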

int main() 
{
	for (int i = 1; i <= 1000; i++) {
		threads.emplace_back(sumNumbers,i);
	}
	
	for (thread & t : threads) {
		t.join();
	}

	cout << sum << '\n';
	// 0 signals success to the shell
	return 0;
}

solution hosted on https://github.com/the-fanan/sum-files-challenge/cpp-solution

  • Command
    From the root directory of the solution run the following commands
  1. g++ -std=c++11 -pthread main.cpp -o main
  2. ./main
  • Output = 49947871404

Submission : Golang

  • Time : ~940ms
  • Hardware :
Processor Name : Intel Core i7
Processor Speed : 2.8 GHz
Memory : 16 GB
  • Program :
package main

import (
	"fmt"
	"os"
	"bufio"
	"strconv"
	"time"
)

func readFile(filename string, separator func(data []byte, atEOF bool) (advance int, token []byte, err error), integerChan chan int, signalChan chan int){
	file, err := os.Open(filename)
	if err != nil {
		// report the error but still signal, so main does not wait forever
		fmt.Print(err)
		integerChan <- 0
		signalChan <- 1
		return
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)

	const capacity = 100000*4
	buf := make([]byte, capacity)
	scanner.Buffer(buf, capacity)

	scanner.Split(separator)
	var sum int
	for scanner.Scan() {
		// empty tokens (e.g. between '\r' and '\n') fail to parse and are skipped
		value, err := strconv.Atoi(scanner.Text())
		if err == nil {
			sum += value
		}
	}
	integerChan <- sum
	signalChan <- 1
}

func main(){	
	start := time.Now()	
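	separatorFunc := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		for i := 0; i < len(data); i++ {
			if data[i] == ',' || data[i] == '\r' || data[i] == '\n' {
				return i + 1, data[:i], nil
			}
		}
		if atEOF && len(data) > 0 {
			// no delimiter left: the remaining bytes are the last number
			return len(data), data, bufio.ErrFinalToken
		}
		// no delimiter yet: ask the scanner to read more data instead of stopping early
		return 0, nil, nil
	}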

	integerChan, signalChan := make(chan int, 200001), make(chan int, 200001)
	for dirIndex := 1; dirIndex <= 991; dirIndex += 10 {
		// folders are named like 000001-000010; %06d zero-pads to six digits
		basePath := fmt.Sprintf("files/%06d-%06d/", dirIndex, dirIndex+9)

		for index := dirIndex; index <= dirIndex+9; index++ {
			// files are named like 000001.csv
			path := fmt.Sprintf("%s%06d.csv", basePath, index)
			go readFile(path, separatorFunc, integerChan, signalChan)
		}
	}

	var sum, signal int 
	for ; signal < 1000 ; signal += <- signalChan {
		sum += <- integerChan
	}	

	finished := time.Now()
	elapsed := finished.Sub(start)
	
	fmt.Printf("The sum is %d gotten after %s", sum, fmt.Sprint(elapsed))

	
}

Submission: .Net Core 3

using System;
using System.Collections.Generic;
using System.IO;
using System.Numerics;
using System.Threading;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace NetCoreSumFiles
{
    class Program
    {
        static async Task Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<FileSummerBenchmarker>();
        }
    }

    [MemoryDiagnoser]
    public class FileSummerBenchmarker
    {
        private readonly FileSummer fileSummer = new FileSummer();
        [Benchmark]
        public async Task FileSum()
        {
            await fileSummer.StartProcess();
        }
    }



    public class FileSummer
    {
        private long TotalSum;
        private static readonly string RootDir = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
        public async Task StartProcess()
        {
            
            var tasks = new Task[1000];
            for (int i = 1; i < 1000; i+=10)
            {
                var currentRoot = $"{i.ToString().PadLeft(6,'0')}-{(i+9).ToString().PadLeft(6,'0')}";
                for (int j = i; j <= i+9; j++)
                {
                    tasks[j-1] = GenerateTask(Path.Combine(RootDir,"files",currentRoot,$"{j.ToString().PadLeft(6,'0')}.csv"));
                }
            }

            await Task.WhenAll(tasks);
        }

        private Task GenerateTask(string filename)=>Task.Run(() => ComputeFile(filename));
            
        

        private void ComputeFile(string fileName)
        {
            long totalSum = 0;
            using (var stream = new StreamReader(File.OpenRead(fileName)))
            {
                var numbers = ReadNumbers(stream);
                foreach (var number in numbers)
                {
                    totalSum += number;
                }
            }
            Interlocked.Add(ref TotalSum, totalSum);
        }
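        public IEnumerable<int> ReadNumbers (TextReader reader)
        {
            var lastVal = -1;
            while (reader.Peek() >= 0)
            {
                char c = (char)reader.Read ();

                if (!char.IsNumber(c))
                {
                    // only yield when digits were read, so "\r\n" does not emit -1
                    if (lastVal >= 0) yield return lastVal;
                    lastVal = -1;
                    continue;
                }

                if (lastVal < 0) lastVal = (int) char.GetNumericValue(c);
                else lastVal = Concatenate(lastVal, (int) char.GetNumericValue(c));
            }
            // emit the final number when the file does not end with a delimiter
            if (lastVal >= 0) yield return lastVal;
        }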
        
        
        private static int Concatenate(int x,int y)
        {
            var pow = 10;
            while(y>=pow) pow*=10;
            return x*pow+y;
        }
    }
}

Method  | Mean     | Error    | StdDev   | Gen 0     | Gen 1 | Gen 2 | Allocated
FileSum | 651.6 ms | 12.91 ms | 17.67 ms | 1000.0000 | -     | -     | 432.92 KB
