SDE Challenge Solution

by Steve Kotsopoulos

Design Solution

My solution to the design question is provided in design/WebsiteAnalytics.docx and design/WebsiteAnalytics.pdf - both documents have the same content.

Coding Solution

All other files are for the coding question.

In order to easily support new requirements with minimal coding, I wanted to base my data structure on the Collections API. I wanted my solution to be performant for inserts and access. We needed to preserve ordering, so we needed a solution that implements the List interface. I considered both LinkedList and ArrayList. Vector didn't make sense because it is synchronized, so it would be slower.

In order to support high-precision arithmetic, elements are stored in BigDecimal.
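As a small illustration of why BigDecimal matters here, the sketch below contrasts binary floating-point addition with BigDecimal addition, and averages a few values at a fixed scale. The class name `PrecisionSketch`, the `average` helper, and the choice of `RoundingMode.HALF_UP` are assumptions made for this example; they are not taken from the project source.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper class, for illustration only.
class PrecisionSketch {

    // Averages the given values, rounding the quotient to `scale` decimal places.
    // The scale and HALF_UP rounding are assumptions of this sketch.
    static BigDecimal average(BigDecimal[] values, int scale) {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal v : values) {
            sum = sum.add(v);
        }
        return sum.divide(BigDecimal.valueOf(values.length), scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // double arithmetic carries binary rounding error
        System.out.println(0.1 + 0.2); // prints 0.30000000000000004

        // BigDecimal built from strings keeps the decimal value exact
        System.out.println(new BigDecimal("0.1").add(new BigDecimal("0.2"))); // prints 0.3
    }
}
```

Note that BigDecimal division needs an explicit scale (or MathContext) whenever the quotient does not terminate, otherwise `divide` throws an ArithmeticException.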

I created 2 implementations. The better one is based on ArrayList, which is backed by a dynamically resizing array and offers amortized O(1) appends and O(1) access by index. The other advantage of ArrayList is that it consumes less memory than a LinkedList implementation, since the backing array is laid out in one contiguous block of memory and doesn't need the auxiliary pointers found in each LinkedList node.

The other implementation is based on LinkedList, which is backed by a doubly-linked list and offers O(1) appends but O(n) access by index. It is provided for illustration and comparison purposes.
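Because both classes implement List, code that only needs to walk the most recent elements can be written once against the interface. The sketch below is one possible way to do that, not necessarily how this project does it: it sums the last n elements with a single ListIterator pass, which avoids repeated `get(i)` calls that would each cost O(n) on a LinkedList. The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical helper class, for illustration only.
class ListTailSumSketch {

    // Sums the last n elements (or all of them if the list is shorter than n).
    // One iterator traversal works for both ArrayList and LinkedList.
    static BigDecimal sumOfLast(List<BigDecimal> elements, int n) {
        int start = Math.max(0, elements.size() - n);
        BigDecimal sum = BigDecimal.ZERO;
        ListIterator<BigDecimal> it = elements.listIterator(start);
        while (it.hasNext()) {
            sum = sum.add(it.next());
        }
        return sum;
    }

    public static void main(String[] args) {
        List<BigDecimal> arrayBacked = new ArrayList<>();
        List<BigDecimal> linked = new LinkedList<>();
        for (int i = 1; i <= 5; i++) {
            arrayBacked.add(BigDecimal.valueOf(i));
            linked.add(BigDecimal.valueOf(i));
        }
        // Both calls sum 3 + 4 + 5
        System.out.println(sumOfLast(arrayBacked, 3)); // prints 12
        System.out.println(sumOfLast(linked, 3));      // prints 12
    }
}
```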

Assumptions

In a given business context, it is normal to have a commonly-used form of moving average. For example, when analyzing stock prices you might be interested in the 5-day moving average. Thus, I decided to make the sampleSize a constructor parameter rather than a parameter on the getMovingAverage() function.
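A minimal sketch of that shape, assuming an ArrayList of BigDecimal behind the scenes, might look like the following. Only `sampleSize` and `getMovingAverage()` are named in this README; the class name, the `add` method, the scale of 10 decimal places, and the behaviour when fewer than `sampleSize` elements exist (average whatever is available) are assumptions of this example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, for illustration only; not the project's actual source.
class ArrayListMovingAverageSketch {
    private final List<BigDecimal> elements = new ArrayList<>();
    private final int sampleSize;

    // The window length is fixed once, at construction time.
    ArrayListMovingAverageSketch(int sampleSize) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        this.sampleSize = sampleSize;
    }

    void add(BigDecimal element) {
        elements.add(element);
    }

    // Averages the most recent sampleSize elements.
    // Assumption: if fewer elements exist, average the ones that do.
    BigDecimal getMovingAverage() {
        int count = Math.min(sampleSize, elements.size());
        if (count == 0) {
            throw new IllegalStateException("no elements added yet");
        }
        BigDecimal sum = BigDecimal.ZERO;
        for (int i = elements.size() - count; i < elements.size(); i++) {
            sum = sum.add(elements.get(i));
        }
        // Scale of 10 is an arbitrary choice for this sketch.
        return sum.divide(BigDecimal.valueOf(count), 10, RoundingMode.HALF_UP);
    }
}
```

With this shape, a caller interested in a 5-day average would construct the object once with a sample size of 5 and then call `getMovingAverage()` with no arguments after each new price.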

Navigating the Source Code

  1. Import the project into your favourite IDE via 'import existing maven project'
  2. Generate javadoc by running mvn javadoc:javadoc
  3. View the generated javadoc under target/site/apidocs/index.html
  4. Run the unit tests found in src/test/java/com/paytm/sdechallenge

Performance

Note that each of the 2 test classes has a testMovingAverageScalability test, which was used to compare the performance of the ArrayList and LinkedList implementations when dealing with large lists. On a MacBook Pro with a 2.5 GHz Core i7 processor, the ArrayList test took 6 seconds, while the LinkedList test took 10 seconds.

The test duration includes both the time to insert elements into the list and the time to calculate the moving average. We insert 9 million elements, and calculate the moving average over the most recent 6 million.

If we only consider the time to calculate the moving average, and ignore the time to insert elements, then both implementations are almost the same, taking about 590 ms on the same hardware. This suggests that inserts are the main reason the LinkedList implementation is slower, most likely because of the memory allocation needed to create each node.

On the assumption that we might request the moving average many times without inserting into the list in between, a trivial performance enhancement to getMovingAverage() was implemented to cache the most recent result. We clear that cached result any time a new element is added to the ArrayList.
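The caching idea can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the class name, the use of `null` to mean "no cached value", the scale of 10, and the `computations` counter (included only so the effect of the cache is observable) are all hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, for illustration only.
class CachedMovingAverageSketch {
    private final List<BigDecimal> elements = new ArrayList<>();
    private final int sampleSize;
    private BigDecimal cachedAverage; // null means "needs recomputing"
    int computations = 0;             // demo-only counter of full recalculations

    CachedMovingAverageSketch(int sampleSize) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        this.sampleSize = sampleSize;
    }

    void add(BigDecimal element) {
        elements.add(element);
        cachedAverage = null; // any insert invalidates the cached result
    }

    BigDecimal getMovingAverage() {
        if (cachedAverage == null) {
            cachedAverage = compute();
        }
        return cachedAverage;
    }

    // Full recalculation over the most recent sampleSize elements.
    private BigDecimal compute() {
        int count = Math.min(sampleSize, elements.size());
        if (count == 0) {
            throw new IllegalStateException("no elements added yet");
        }
        computations++;
        BigDecimal sum = BigDecimal.ZERO;
        for (int i = elements.size() - count; i < elements.size(); i++) {
            sum = sum.add(elements.get(i));
        }
        return sum.divide(BigDecimal.valueOf(count), 10, RoundingMode.HALF_UP);
    }
}
```

Repeated calls to `getMovingAverage()` between inserts return the stored value; only the first call after an `add` pays for the full pass over the window.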

Future Enhancements

Depending on the access pattern of how this interface is used in production, an area to explore would be to re-calculate the moving average incrementally as each element is added.
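One possible form of that incremental approach, which is not implemented in this project, is to keep a running sum of the current window: each insert adds the new element and subtracts the element that just fell out of the window, so reading the average becomes O(1). The class name, the `windowSum` field, and the scale of 10 below are assumptions of this sketch.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, for illustration only.
class IncrementalMovingAverageSketch {
    private final List<BigDecimal> elements = new ArrayList<>();
    private final int sampleSize;
    private BigDecimal windowSum = BigDecimal.ZERO; // sum of the most recent sampleSize elements

    IncrementalMovingAverageSketch(int sampleSize) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        this.sampleSize = sampleSize;
    }

    void add(BigDecimal element) {
        elements.add(element);
        windowSum = windowSum.add(element);
        int size = elements.size();
        if (size > sampleSize) {
            // The element just before the new window start has dropped out.
            windowSum = windowSum.subtract(elements.get(size - sampleSize - 1));
        }
    }

    // O(1): a single division, no pass over the list.
    BigDecimal getMovingAverage() {
        int count = Math.min(sampleSize, elements.size());
        if (count == 0) {
            throw new IllegalStateException("no elements added yet");
        }
        return windowSum.divide(BigDecimal.valueOf(count), 10, RoundingMode.HALF_UP);
    }
}
```

The trade-off is a little extra work on every insert in exchange for constant-time reads, so it pays off mainly when the average is requested after most inserts. The index lookup in `add` is O(1) on ArrayList but would be O(n) on a LinkedList.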
