upmem / dpu_kmeans

Implementation of the K-means algorithm on UPMEM PIM architecture

License: MIT License

Dockerfile 0.32% CMake 1.57% Python 78.07% C 16.09% C++ 3.28% Shell 0.66%

dpu_kmeans's People

Contributors

dependabot[bot], sylvanbrocard

dpu_kmeans's Issues

CPU benchmark error

It looks like part or all of the CPU benchmarks were run with the Elkan algorithm.

To do:

  • Check which benchmarks are affected
  • If necessary, rerun them with Lloyd

Segfault

The script benchmarks/strong_scaling/CPU+DPU.py crashes when run on 2048 DPUs.
The 2048_only script doesn't.
There's probably a memory leak. Check the C arrays.

tasks will not fit in WRAM

This email is about the K-means application https://github.com/upmem/dpu_kmeans. I have attached code that raises the "tasks will not fit in WRAM" error when running K-means on 2546 DPUs. The code runs K-means on 1,000,000 data points of dimension 10. The same code works with 100,000 data points.

import time

import numpy as np
from dpu_kmeans import KMeans as DPUKMeans
from dpu_kmeans import _dimm

# Fails with: "tasks will not fit in WRAM"

if __name__ == "__main__":
    # Names chosen to avoid shadowing the built-ins `input` and `iter`.
    k, dim, num_elements, max_iter = 10, 10, 1000000, 1
    data = np.random.randint(0, 100, size=(num_elements, dim))

    # First k points as initial centroids (computed, but not passed to KMeans below).
    init_centroids = np.zeros((k, dim), dtype=np.int32)
    for i in range(k):
        init_centroids[i] = data[i]
    # _dimm.set_n_dpu(num_dpus)

    dimm_data = _dimm.DIMM_data(data)
    dpu_kmeans = DPUKMeans(k, n_init=1, verbose=False, max_iter=max_iter, tol=0)
    dpu_kmeans.n_iter_ = max_iter

    # Wall-clock time around fit(), in milliseconds.
    start = time.time()
    result, iterations, duration = dpu_kmeans.fit(dimm_data)
    end = time.time()
    t = (end - start) * 1000.0

    print(f"the time consumed is {t} ms")
    print(f"the kernel time consumed is {duration}")
    print(f"the number of iterations {iterations}")
    print("centroids of kmeans: ")
    print(np.rint(result))
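
To put the two cases side by side, here is a rough back-of-the-envelope sketch of the per-DPU load. It assumes the points are split evenly across DPUs and that each feature takes 4 bytes; both are assumptions, not something confirmed by the library, and `per_dpu_load` is a hypothetical helper. It does not identify the cause of the error, it only quantifies how much more data each DPU receives in the failing case.

```python
import math


def per_dpu_load(n_points, n_dims, n_dpus, bytes_per_feature=4):
    """Upper bound on points and bytes per DPU, assuming an even split (assumption)."""
    points = math.ceil(n_points / n_dpus)
    return points, points * n_dims * bytes_per_feature


if __name__ == "__main__":
    for n in (100_000, 1_000_000):
        pts, nbytes = per_dpu_load(n, n_dims=10, n_dpus=2546)
        print(f"{n:>9} points -> at most {pts} points/DPU, {nbytes} bytes/DPU")
```

Under these assumptions the working case gives each DPU at most 40 points (1,600 bytes) and the failing case at most 393 points (15,720 bytes).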

Multi experiment management

To do:

  • Use templating to set the name of the experiment, taking it from params
  • No need to differentiate params.yaml, stage names and output names (it doesn't do anything anyway)
  • Add a top-level key to the metrics to differentiate between kinds of experiments
  • Consider using branches (probably not) if that's the only way to select what to show; check with DVC Studio
  • Use dvc exp show --param-deps to avoid showing everything

Cluster relocation

Right now cluster relocation is done on the host side. It essentially runs one Lloyd iteration to get the labels and then computes all point-to-cluster distances. This is likely slow.

To do:

  • Profile the application to see if this needs to be optimized.
  • If it's slow, either:
    • Move it to the DPU side. Finding the points furthest from all centroids can be done as part of the main loop.
    • Do it with daal4py.
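
As a point of reference for profiling, the host-side step described above can be sketched as follows. This is a minimal illustration in plain Python, not the project's implementation: it assumes relocation means moving each empty cluster onto one of the points farthest from its own nearest centroid, and `assign_labels` and `relocate_empty_clusters` are hypothetical names.

```python
import math


def assign_labels(points, centroids):
    """One assignment pass: nearest centroid index and distance for each point."""
    labels, dists = [], []
    for p in points:
        d = [math.dist(p, c) for c in centroids]
        j = min(range(len(centroids)), key=d.__getitem__)
        labels.append(j)
        dists.append(d[j])
    return labels, dists


def relocate_empty_clusters(points, centroids):
    """Move each empty cluster onto one of the farthest points (assumed policy)."""
    labels, dists = assign_labels(points, centroids)
    populated = set(labels)
    empty = [j for j in range(len(centroids)) if j not in populated]
    # Point indices sorted by distance to their own centroid, farthest first.
    order = sorted(range(len(points)), key=dists.__getitem__, reverse=True)
    new_centroids = [list(c) for c in centroids]
    for j, i in zip(empty, order):
        new_centroids[j] = list(points[i])
    return new_centroids, empty


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0)]
    cents = [(0, 0), (100, 100)]
    print(relocate_empty_clusters(pts, cents))
```

The assignment pass costs one distance per point and centroid, which is the part the issue suspects is slow on the host. The farthest-point search only needs the per-point minimum distance, which is why it could be folded into the main loop.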
