
Comments (5)

Ripper346 commented on July 24, 2024

Hi again. Your code didn't solve the issue, so I worked around it differently, ending up with the class below:

from itertools import repeat

import numpy as np
import torch
from torch_geometric.datasets import TUDataset

# drop_nodes, weighted_drop_nodes, permute_edges, subgraph and mask_nodes
# come from the repo's augmentation module.


class TUDatasetExt(TUDataset):
    def __init__(self,
                 root,
                 name,
                 transform=None,
                 pre_transform=None,
                 pre_filter=None,
                 use_node_attr=False,
                 processed_filename='data.pt',
                 aug="none", aug_ratio=None):
        self.name = name
        self.processed_filename = processed_filename

        self.aug = aug
        self.aug_ratio = aug_ratio

        super(TUDatasetExt, self).__init__(root, self.name, transform, pre_transform,
                                           pre_filter, use_node_attr)
        self.data, self.slices = torch.load(self.processed_paths[0])
        if self.data.x is not None and not use_node_attr:
            self.data.x = self.data.x[:, self.num_node_attributes:]

    @property
    def processed_file_names(self):
        return self.processed_filename

    @property
    def num_node_features(self):
        r"""Returns the number of features per node in the dataset."""
        return self[0][0].num_node_features

    def download(self):
        super().download()

    def get(self, idx):
        data = self.data.__class__()

        if hasattr(self.data, '__num_nodes__'):
            data.num_nodes = self.data.__num_nodes__[idx]

        for key in self.data.keys:
            item, slices = self.data[key], self.slices[key]
            if torch.is_tensor(item):
                s = list(repeat(slice(None), item.dim()))
                s[self.data.__cat_dim__(key,
                                        item)] = slice(slices[idx],
                                                       slices[idx + 1])
            else:
                s = slice(slices[idx], slices[idx + 1])
            data[key] = item[s]

        if self.aug == 'dropN':
            data = drop_nodes(data, self.aug_ratio)
        elif self.aug == 'wdropN':
            # self.npower is not an __init__ argument and must be set externally
            data = weighted_drop_nodes(data, self.aug_ratio, self.npower)
        elif self.aug == 'permE':
            data = permute_edges(data, self.aug_ratio)
        elif self.aug == 'subgraph':
            data = subgraph(data, self.aug_ratio)
        elif self.aug == 'maskN':
            data = mask_nodes(data, self.aug_ratio)
        elif self.aug == 'none':
            data = data
        elif self.aug == 'random4':
            ri = np.random.randint(4)
            if ri == 0:
                data = drop_nodes(data, self.aug_ratio)
            elif ri == 1:
                data = subgraph(data, self.aug_ratio)
            elif ri == 2:
                data = permute_edges(data, self.aug_ratio)
            elif ri == 3:
                data = mask_nodes(data, self.aug_ratio)
            else:
                print('sample augmentation error')
                assert False

        elif self.aug == 'random3':
            ri = np.random.randint(3)
            if ri == 0:
                data = drop_nodes(data, self.aug_ratio)
            elif ri == 1:
                data = subgraph(data, self.aug_ratio)
            elif ri == 2:
                data = permute_edges(data, self.aug_ratio)
            else:
                print('sample augmentation error')
                assert False

        elif self.aug == 'random2':
            ri = np.random.randint(2)
            if ri == 0:
                data = drop_nodes(data, self.aug_ratio)
            elif ri == 1:
                data = subgraph(data, self.aug_ratio)
            else:
                print('sample augmentation error')
                assert False

        else:
            print('augmentation error')
            assert False

        return data
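As a side note, the long `elif` chain in `get` could be collapsed into a dispatch table. A minimal sketch with stand-in augmentation functions (the real `drop_nodes`, `permute_edges`, `subgraph` and `mask_nodes` live in the repo's augmentation module, so the bodies below are placeholders):

```python
import numpy as np

# Placeholder augmentations: each just records which one ran.
def drop_nodes(data, ratio): return ('dropN', ratio)
def permute_edges(data, ratio): return ('permE', ratio)
def subgraph(data, ratio): return ('subgraph', ratio)
def mask_nodes(data, ratio): return ('maskN', ratio)

# Dispatch table replacing the elif chain.
AUGMENTATIONS = {
    'dropN': drop_nodes,
    'permE': permute_edges,
    'subgraph': subgraph,
    'maskN': mask_nodes,
}

def apply_aug(data, aug, aug_ratio, rng=np.random):
    if aug == 'none':
        return data
    # 'randomN' picks uniformly among the first N augmentations.
    if aug == 'random4':
        aug = rng.choice(['dropN', 'subgraph', 'permE', 'maskN'])
    elif aug == 'random3':
        aug = rng.choice(['dropN', 'subgraph', 'permE'])
    elif aug == 'random2':
        aug = rng.choice(['dropN', 'subgraph'])
    try:
        return AUGMENTATIONS[aug](data, aug_ratio)
    except KeyError:
        raise ValueError('unknown augmentation: {}'.format(aug))

print(apply_aug(None, 'dropN', 0.2))  # → ('dropN', 0.2)
```

This keeps the error handling in one place (`ValueError` instead of `print` plus `assert False`) and makes adding a new augmentation a one-line change.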

It can now download the dataset, but it raises the error from this issue again.

Then I tried installing the conda environment of semisupervised_TU first, but it can't resolve some dependencies:

ResolvePackageNotFound:
  - ld_impl_linux-64=2.33.1
  - libffi=3.3
  - readline=8.0
  - libgcc-ng=9.1.0
  - libstdcxx-ng=9.1.0
  - ncurses=6.2
  - libedit=3.1.20191231

I also tried a Docker devcontainer (Python 3.7 on Debian Buster) with these requirements:

decorator==4.4.2
future==0.18.2
isodate==0.6.0
joblib==0.16.0
networkx==2.4
numpy==1.19.0
pandas==1.0.5
pillow==7.2.0
plyfile==0.7.2
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
rdflib==5.0.0
scikit-learn==0.23.1
scipy==1.5.0
six==1.15.0
threadpoolctl==2.1.0

and then installed manually

pip3 install torch==1.4.0 torchvision==0.5.0 -f https://download.pytorch.org/whl/torch_stable.html
pip3 install torch-scatter==1.1.0 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-sparse==0.4.4 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-cluster==1.4.5 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-spline-conv==1.1.0 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-geometric==1.1.0

The installation and run of the original code went fine. I only had to make one adjustment in train_eval.py around line 146: two checks that create the logs and models folders if they don't exist.
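For reference, the folder checks look roughly like this (the folder names `logs` and `models` are from my setup; the exact spot in train_eval.py may differ):

```python
import os

# Create the output folders before the script writes into them;
# exist_ok=True makes this a no-op when a folder is already there.
for folder in ('logs', 'models'):
    os.makedirs(folder, exist_ok=True)
```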

I noticed that the issue starts appearing with torch-geometric 1.4.2; earlier versions don't have that problem.

from graphcl.

yyou1996 commented on July 24, 2024

Hi @Ripper346,

Thanks for your interest, and my apologies for the frustration. These are the solutions I can come up with:

  1. Would you mind sharing your env information so I can double-check? This experiment is built on an old repo https://github.com/chentingpc/gfn#requirements with slightly outdated packages, so I understand you may have installed the required ones, but there might be an oversight.

  2. I notice in the error information that FileNotFoundError: [Errno 2] No such file or directory: 'data\\MUTAG\\MUTAG\\processed\\data_deg+odeg100+ak3+reall.pt'. It looks weird to me that the program concatenates the path as 'data\\MUTAG\\MUTAG\\processed\\data_deg+odeg100+ak3+reall.pt' rather than 'data\MUTAG\MUTAG\processed\data_deg+odeg100+ak3+reall.pt'. Is there any way for you to debug this?


Ripper346 commented on July 24, 2024
  1. I have Python 3.8.8 and use it fine with other projects that use torch and pytorch-geometric, but here are the requirements of my env (a bit long):
alembic==1.5.8
ase==3.21.1
astroid==2.5.1
async-generator==1.10
attrs==20.3.0
autopep8==1.5.5
backcall==0.2.0
bleach==3.3.0
certifi==2020.12.5
chardet==3.0.4
cliff==3.7.0
cmaes==0.8.2
cmd2==1.5.0
colorama==0.4.4
colorlog==4.8.0
control==0.8.4
cvxopt==1.2.6
cycler==0.10.0
Cython==0.29.22
decorator==4.4.2
defusedxml==0.7.0
dgl-cu110==0.6.0
entrypoints==0.3
future==0.18.2
googledrivedownloader==0.4
grakel==0.1.8
graphkit-learn==0.2.0.post1
greenlet==1.0.0
h5py==3.2.0
idna==2.10
ipdb==0.13.5
ipykernel==5.5.0
ipython==7.21.0
ipython-genutils==0.2.0
isodate==0.6.0
isort==5.7.0
jedi==0.18.0
Jinja2==2.11.3
joblib==1.0.1
jsonschema==3.2.0
jupyter-client==6.1.11
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
kiwisolver==1.3.1
lazy-object-proxy==1.5.2
llvmlite==0.35.0
Mako==1.1.4
mariadb==1.0.6
MarkupSafe==1.1.1
matplotlib==3.3.4
mccabe==0.6.1
mistune==0.8.4
Mosek==9.2.38
mysql-connector-python==8.0.23
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.5.1
networkx==2.5
nose==1.3.7
numba==0.52.0
numpy==1.20.1
optuna==2.7.0
packaging==20.9
pandas==1.2.3
pandocfilters==1.4.3
parso==0.8.1
pbr==5.5.1
pickleshare==0.7.5
Pillow==8.1.1
prettytable==2.1.0
prompt-toolkit==3.0.16
protobuf==3.15.4
pycodestyle==2.6.0
Pygments==2.8.0
pylint==2.7.2
pyparsing==2.4.7
pyperclip==1.8.2
pyreadline3==3.3
pyrsistent==0.17.3
python-dateutil==2.8.1
python-editor==1.0.4
python-louvain==0.15
pytz==2021.1
pywin32==300
PyYAML==5.4.1
pyzmq==22.0.3
rdflib==5.0.0
requests==2.25.1
rope==0.18.0
scikit-learn==0.24.1
scipy==1.6.1
seaborn==0.11.1
six==1.15.0
SQLAlchemy==1.4.7
stevedore==3.3.0
tabulate==0.8.9
testpath==0.4.4
threadpoolctl==2.1.0
toml==0.10.2
torch==1.8.0+cu111
torch-cluster==1.5.9
torch-geometric==1.6.3
torch-scatter==2.0.6
torch-sparse==0.6.9
torch-spline-conv==1.2.1
torchaudio==0.8.0
torchvision==0.9.0+cu111
tornado==6.1
tqdm==4.58.0
traitlets==5.0.5
typing-extensions==3.7.4.3
urllib3==1.26.3
wcwidth==0.2.5
webencodings==0.5.1
wrapt==1.12.1
  2. I am on Windows; the doubled \\ is normal — the backslash is just escaped in the error message.
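A quick sketch to illustrate the point: Python's repr (which tracebacks use) escapes each backslash, so the doubled \\ in the error message corresponds to a single path separator in the actual string. Shortened example path for brevity:

```python
path = 'data\\MUTAG\\MUTAG\\processed\\data.pt'  # shortened example path

print(path)        # the actual string, with single backslashes
print(repr(path))  # the traceback form, with each backslash escaped

# The string itself contains exactly four single backslash characters.
assert path.count('\\') == 4
```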


yyou1996 commented on July 24, 2024

Thank you. I see your env, and it is too new for the semi_TU repo (torch_geometric>=1.6.0 rather than the required 1.4.0; please refer to https://github.com/Shen-Lab/GraphCL/tree/master/semisupervised_TU#option-1 for the correct environment).

Another option is to try replacing the __init__ function in tu_dataset with:

    url = ('https://ls11-www.cs.tu-dortmund.de/people/morris/'
           'graphkerneldatasets')

    def __init__(self,
                 root,
                 name,
                 transform=None,
                 pre_transform=None,
                 pre_filter=None,
                 use_node_attr=False,
                 processed_filename='data.pt', aug_ratio=None):
        self.name = name
        self.processed_filename = processed_filename

        self.aug = "none"
        self.aug_ratio = None

        super(TUDatasetExt, self).__init__(root, transform, pre_transform,
                                        pre_filter)
        self.data, self.slices = torch.load(self.processed_paths[0])
        if self.data.x is not None and not use_node_attr:
            self.data.x = self.data.x[:, self.num_node_attributes:]

    @property
    def num_node_labels(self):
        if self.data.x is None:
            return 0
        for i in range(self.data.x.size(1)):
            if self.data.x[:, i:].sum().item() == self.data.x.size(0):
                return self.data.x.size(1) - i
        return 0

    @property
    def num_node_attributes(self):
        if self.data.x is None:
            return 0
        return self.data.x.size(1) - self.num_node_labels

    @property
    def raw_file_names(self):
        names = ['A', 'graph_indicator']
        return ['{}_{}.txt'.format(self.name, name) for name in names]

    @property
    def processed_file_names(self):
        return self.processed_filename

    @property
    def num_node_features(self):
        r"""Returns the number of features per node in the dataset."""
        return self[0][0].num_node_features

which might solve the download issue. A new version of this experiment, adapted to torch_geometric>=1.6.0, will also be released in the coming weeks.
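For context on where that processed-file path comes from: PyG's Dataset joins root/processed with each entry of processed_file_names. A rough sketch of that behavior (an approximation for illustration, not PyG's exact code):

```python
import os.path as osp

def to_list(x):
    # PyG-style helper: wrap a single filename into a list
    return x if isinstance(x, (list, tuple)) else [x]

def processed_paths(root, processed_file_names):
    # Approximation of how torch_geometric.data.Dataset builds
    # self.processed_paths: <root>/processed/<each file name>
    processed_dir = osp.join(root, 'processed')
    return [osp.join(processed_dir, name)
            for name in to_list(processed_file_names)]

print(processed_paths('data/MUTAG/MUTAG', 'data_deg+odeg100+ak3+reall.pt'))
```

So the FileNotFoundError simply means the expected file under root/processed was never written, which points at the process() step rather than the path concatenation.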


Ripper346 commented on July 24, 2024

Ok, thanks. I will try on Monday and keep you updated in this issue. That behavior seems strange to me; maybe I could also look at the differences between torch-geometric 1.4 and 1.6.

