
chebykan's People

Contributors

iiisak, k-h-ismail, synodicmonth


chebykan's Issues

Poor generalization

I tried using ChebyKAN to fit signal waveforms, but it generalizes poorly. What might be the reason?
[figure: fit on the training data]

[figure: fit on the test data]
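For context, a minimal reproduction sketch of this kind of experiment (my own construction, not the issue author's code): fit a toy waveform on one interval and evaluate on a held-out interval. The waveform, intervals, and architecture are all hypothetical; ChebyKANLayer(input_dim, output_dim, degree) is the layer from this repo.

import torch
import torch.nn as nn
from ChebyKANLayer import ChebyKANLayer

torch.manual_seed(0)

# Hypothetical toy setup: a two-layer ChebyKAN fitting a 1-D waveform.
model = nn.Sequential(
    ChebyKANLayer(1, 8, 4),  # (input_dim, output_dim, degree)
    ChebyKANLayer(8, 1, 4),
)

target = lambda x: torch.sin(5 * x)
x_train = torch.linspace(-1, 1, 200).unsqueeze(1)
x_test = torch.linspace(1, 2, 100).unsqueeze(1)  # outside the training range

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x_train), target(x_train))
    loss.backward()
    optimizer.step()

with torch.no_grad():
    print("train MSE:", nn.functional.mse_loss(model(x_train), target(x_train)).item())
    print("test MSE: ", nn.functional.mse_loss(model(x_test), target(x_test)).item())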

ChebyKAN has trouble solving dynamical systems

Hi, ChebyKAN is indeed simple, elegant, and powerful; I believe it can do even more.

So I applied it to solving some models with dynamical systems (economic models, to be precise), where the governing equations are quite similar to PDEs.

The main problem I encountered is that ChebyKAN is more prone to getting "stuck", which prevents training from going any further. Here are two illustrations, with the KAN structure

valuefunction_KAN = KAN(width=[2,5,5,1], grid=5, k=3, grid_eps=1.0, noise_scale_base=0.25)

and ChebyKAN structure

class ChebyKAN(nn.Module):
    def __init__(self):
        super(ChebyKAN, self).__init__()
        # Each layer is ChebyKANLayer(input_dim, output_dim, degree).
        self.chebykan1 = ChebyKANLayer(2, 8, 8)
        self.chebykan2 = ChebyKANLayer(8, 16, 5)
        self.chebykan3 = ChebyKANLayer(16, 1, 5)

    def forward(self, x):
        x = self.chebykan1(x)
        x = self.chebykan2(x)
        x = self.chebykan3(x)
        return x

valuefunction_cheb = ChebyKAN()

The results in the first figure were obtained with LBFGS, and those in the second with Adam at learning rate 1e-2.

[figure 1: results with LBFGS]

[figure 2: results with Adam, lr 1e-2]

I have tested it multiple times, with different input and output dimensions and degrees ranging from 4 to 12; the issue remains.
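For reference, a sketch of the two training setups compared above (my construction; the objective below is a stand-in, since the actual economic-model residual isn't shown in the issue):

import torch
import torch.nn as nn

model = ChebyKAN()  # the class defined above

# Hypothetical stand-in objective; the real dynamical-system residual is not shown.
x = torch.rand(512, 2) * 2 - 1
target = torch.sin(3 * x[:, :1]) * torch.cos(2 * x[:, 1:])

# LBFGS (first figure): requires a closure that re-evaluates the loss on
# every inner iteration.
lbfgs = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=20)

def closure():
    lbfgs.zero_grad()
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    return loss

for _ in range(50):
    lbfgs.step(closure)

# Adam (second figure): plain first-order updates.
# adam = torch.optim.Adam(model.parameters(), lr=1e-2)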

No Dropout found in any experiment

As the title says, I haven't found any regularization method such as dropout in your current implementation and experiments. Is there a specific reason for leaving it out? Perhaps the learned activation function is already very complex and shouldn't be as prone to overfitting as traditional perceptrons?

The experiments in the following repository don't use dropout either: https://github.com/1ssb/torchkan
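For concreteness, here is one way dropout could be inserted between ChebyKAN layers if regularization turned out to matter. This is a sketch of mine, not code from the repo; the architecture mirrors the one from the issue above:

import torch.nn as nn
from ChebyKANLayer import ChebyKANLayer

class ChebyKANWithDropout(nn.Module):
    def __init__(self, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            ChebyKANLayer(2, 8, 8),
            nn.Dropout(p),  # zeroes features between layers during training
            ChebyKANLayer(8, 16, 5),
            nn.Dropout(p),
            ChebyKANLayer(16, 1, 5),
        )

    def forward(self, x):
        return self.net(x)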

ChebyKAN layer is equivalent to custom activation + nn.Linear

Hi, very interesting idea, kudos!

I believe the proposed layer is equivalent to the following combination (I fix the degree to 4 for simplicity):

import torch
import torch.nn as nn

from ChebyKANLayer import ChebyKANLayer

class ChebyActivation(nn.Module):
    def __init__(self, degree):
        assert degree == 4
        super().__init__()

    def forward(self, x):
        # Squash inputs into [-1, 1], the domain of the Chebyshev polynomials.
        x = torch.tanh(x)

        # Concatenate the Chebyshev polynomials of the first kind, T_0..T_4.
        x = torch.cat(
            [
                torch.ones_like(x),       # T_0(x) = 1
                x,                        # T_1(x) = x
                2 * x**2 - 1,             # T_2(x)
                4 * x**3 - 3 * x,         # T_3(x)
                8 * x**4 - 8 * x**2 + 1,  # T_4(x)
            ],
            dim=1,
        )
        return x

input_dim = 128
output_dim = 256
# Compare the two variants across 100 random initializations.
for _ in range(100):
    variant_0 = ChebyKANLayer(input_dim, output_dim, 4)
    variant_1 = nn.Sequential(
        ChebyActivation(4),
        nn.Linear(input_dim * 5, output_dim, bias=False),
    )
    # Copy the Chebyshev coefficients into the linear layer so both variants
    # compute the same function; the permute/flatten matches the degree-major
    # layout produced by ChebyActivation's cat.
    variant_1[1].weight.data.copy_(variant_0.cheby_coeffs.permute(1, 2, 0).flatten(1))
    for _ in range(100):
        x = torch.randn(1234, input_dim)
        res1 = variant_0(x)
        res2 = variant_1(x)

        assert (
            res1 - res2
        ).abs().max() < 1e-6, "Found inconsistency between implementations!"

print("Two implementations are equivalent!")

This makes it a variant of a LAN network (see App. B.2 in the KAN paper), which is nice, but it's a double-edged sword.

On the one hand, with this rewrite you can train it quite efficiently (by checkpointing the ChebyActivation function and using the optimized CUDA Linear kernel).
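To make the checkpointing point concrete, a minimal sketch (my construction, assuming the ChebyActivation class above): the (degree+1)-times-wider expansion is recomputed during the backward pass instead of being stored.

import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedChebyLinear(nn.Module):
    def __init__(self, input_dim, output_dim, degree=4):
        super().__init__()
        self.act = ChebyActivation(degree)  # defined above
        self.linear = nn.Linear(input_dim * (degree + 1), output_dim, bias=False)

    def forward(self, x):
        # Recompute the cheap Chebyshev expansion in the backward pass instead
        # of keeping the 5x-wider activation tensor in memory.
        expanded = checkpoint(self.act, x, use_reentrant=False)
        return self.linear(expanded)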

On the other hand, modern networks like LLaMA 3 already use Gated Linear Unit activations, which should give roughly equivalent representational power (though I'm not 100% sure on this point).
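For comparison, a minimal SwiGLU block of the kind used in LLaMA-style MLPs (dimensions illustrative; this is the standard formulation, not code from either repo):

import torch.nn as nn

class SwiGLU(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # SwiGLU(x) = W_down(SiLU(W_gate x) * (W_up x))
        return self.down(nn.functional.silu(self.gate(x)) * self.up(x))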

Do you think this reasoning is correct, or am I missing something?

Thanks in advance!
