asreview / asreview-datatools

Stars: 17 · Watchers: 5 · Forks: 13 · Size: 126 KB

Tool to preprocess datasets for ASReview

License: MIT License

Python 100.00%
asreview plugin systematic-literature-reviews systematic-reviews utrecht-university

asreview-datatools's People

Contributors

fiobyr, gimoai, j535d165, kequach, laurens88, pablov-1995, peterlombaers, qubixes, rensvandeschoot

asreview-datatools's Issues

Parser of asreview stat does not work correctly

Is your feature request related to a problem? Please describe.
Given a state file sf.h5, suppose I want to know its WSS at 85% recall and its RRF at 1% of the data screened. I would type the following in the command line:

$ asreview stat sf.h5 --wss 85 --rrf 1

This produces the following error:

...
File "path\asreview\analysis\analysis.py", line 239, in wss
    if norm_yr[i] >= val - 1e-6:
TypeError: unsupported operand type(s) for -: 'str' and 'float'

This happens because entrypoint.py does not declare these arguments as floats, so argparse passes them on as strings.

Describe the solution you'd like
Adding type=float to both parameters fixes the bug.
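A minimal sketch of the suggested fix, using Python's standard argparse module. Only the option names --wss and --rrf come from the issue; the parser layout, the positional state_file argument, and the help texts are hypothetical and not taken from entrypoint.py. It also assumes each option takes a single value, as in the command above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for "asreview stat"-style options; not the real entrypoint.py.
    parser = argparse.ArgumentParser(prog="stat")
    parser.add_argument("state_file")
    # type=float makes argparse convert "85" to 85.0 instead of passing the str "85".
    # "%%" is needed because argparse applies %-formatting to help strings.
    parser.add_argument("--wss", type=float, help="Recall level (%%) for WSS.")
    parser.add_argument("--rrf", type=float, help="Share of data screened (%%) for RRF.")
    return parser


args = build_parser().parse_args(["sf.h5", "--wss", "85", "--rrf", "1"])
print(args.wss, args.rrf)  # 85.0 1.0
```

With type=float, an expression such as val - 1e-6 in analysis.py receives a number instead of a string, so the TypeError no longer occurs.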

Documentation for API usage is lacking

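from contextlib import redirect_stdout

from asreviewcontrib.statistics import StateStatistics

# State files of the runs to summarize (hypothetical paths).
files = ["run_1.h5", "run_2.h5"]

# Everything printed inside this block is written to out.txt.
with open('out.txt', 'w') as f:
    with redirect_stdout(f):
        for file in files:
            print(StateStatistics.from_path(file, wss_vals=[95, 100]))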

In this example, I collect the statistics of many runs in one go and write them to a single file.

The API is handy and easy to use, but this usage is not documented anywhere, and the ASReview documentation has no separate page for it. I suggest extending README.md for now and adding a dedicated page later.
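The same collection can also be written without redirect_stdout, which per the Python documentation has a global side effect on sys.stdout. The sketch below passes the output stream to print() explicitly. The helper name write_stats and the stats_for parameter are hypothetical; stats_for stands in for a call such as lambda p: StateStatistics.from_path(p, wss_vals=[95, 100]), so the helper itself does not depend on asreviewcontrib.

```python
from typing import Callable, Iterable, TextIO


def write_stats(paths: Iterable[str], out: TextIO,
                stats_for: Callable[[str], object]) -> None:
    """Write one statistics report per state file to an open text stream.

    stats_for is a stand-in for a StateStatistics call (hypothetical wiring);
    print(..., file=out) writes to the stream without touching sys.stdout.
    """
    for path in paths:
        print(stats_for(path), file=out)
```

Inside with open('out.txt', 'w') as f:, the call write_stats(files, f, lambda p: StateStatistics.from_path(p, wss_vals=[95, 100])) should produce the same file as the snippet above, while the rest of the program keeps printing to the console.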

Update dedup command documentation for synergy dataset

Issue Description

The dedup section of the ASReview datatools README is outdated: its example command, asreview data dedup benchmark:van_de_schoot_2017 -o van_de_schoot_2017_dedup.csv, uses a deprecated dataset reference. Users who follow the current instructions may therefore run into confusion and errors.

Background

I encountered an error when running the dedup command as documented. When I contacted the ASReview support team, they confirmed that the documentation had not been updated to the correct dataset. Following their guidance, I updated datatools and then ran the command successfully on both my own data and the synergy dataset.

Suggested Changes

  • Update the Dataset Reference: The correct command, as informed by the ASReview support team, should be asreview data dedup synergy:van_de_schoot_2018 -o van_de_schoot_2018_dedup.csv.
  • Note on Datatools Version: It might be beneficial to add a note about ensuring that the datatools package is up to date. A simple command like pip install asreview-datatools --upgrade can be suggested to prevent potential issues with executing the dedup command.

Request

I kindly request that the documentation be updated to reflect the correct use of the dedup command with the synergy dataset and to include a reminder for users to ensure their datatools package is current.

ASReview stat for datasets broken

jonathan$ asreview stat raw_data/demo.csv 
Traceback (most recent call last):
  File "/Users/jonathan/.pyenv/versions/asreview-production/bin/asreview", line 10, in <module>
    sys.exit(main())
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreview/__main__.py", line 80, in main
    entry.load()().execute(sys.argv[2:])
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreviewcontrib/statistics/entrypoint.py", line 66, in execute
    with StateStatistics.from_path(
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreviewcontrib/statistics/statistics.py", line 41, in from_path
    stat_inst = cls(path, *args, prefix=prefix, **kwargs)
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreviewcontrib/statistics/statistics.py", line 31, in __init__
    self.analysis = Analysis.from_path(path, prefix=prefix)
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreview/analysis/analysis.py", line 119, in from_path
    return cls.from_file(data_path, key=key)
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreview/analysis/analysis.py", line 107, in from_file
    state = state_from_file(data_fp)
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreview/state/utils.py", line 139, in state_from_file
    raise ValueError(f"Expected ASReview file or file {data_fp} with "
ValueError: Expected ASReview file or file raw_data/demo.csv with extension ['.h5', '.hdf5', '.he5', '.json'].

@PeterLombaers, this might be the result of our changes to the state file reader. Any idea?
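One way to avoid this error is to check the file extension before trying to load a state. The sketch below is a hypothetical helper, not code from asreview or asreview-datatools: it treats the extensions listed in the ValueError as state files and routes every other path (such as raw_data/demo.csv) to dataset statistics. The names STATE_FILE_EXTENSIONS, is_state_file, and stat_mode are illustrative only.

```python
from pathlib import Path

# Extensions copied from the ValueError message in the traceback.
STATE_FILE_EXTENSIONS = {".h5", ".hdf5", ".he5", ".json"}


def is_state_file(path: str) -> bool:
    """Return True if the path looks like an ASReview state file (by extension)."""
    return Path(path).suffix.lower() in STATE_FILE_EXTENSIONS


def stat_mode(path: str) -> str:
    """Hypothetical dispatch: state statistics for state files, dataset statistics otherwise."""
    return "state" if is_state_file(path) else "dataset"
```

Checking the suffix up front keeps state_from_file for actual state files only, so a CSV dataset never reaches the state reader.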

Add API documentation

The datatools package has an intuitive API. At the moment, only the CLI is documented. We welcome contributions to the documentation that describe the API.

Source Code ASReview Datatool Dedup

Hi all, thank you for the new addition to the ASReview datatools set! Could you point me to the source code of the deduplication method? For a deduplication project, we are comparing different deduplication methods, and I would like to include the ASReview datatools dedup and test its performance. TIA
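For a comparison like the one described, a simple exact-match baseline can be a useful reference point. The sketch below is a generic baseline, not the datatools dedup implementation: it assumes records are dicts with hypothetical keys doi, title, and abstract, drops repeated DOIs, and then drops records whose normalized title and abstract were already seen.

```python
import re
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def normalize(text: Optional[str]) -> str:
    """Lowercase and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", (text or "").lower())


def dedup_records(records: List[Record]) -> List[Record]:
    """Keep the first record per DOI and per normalized (title, abstract) pair."""
    seen_doi = set()
    seen_text: set = set()
    kept = []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        key: Tuple[str, str] = (normalize(rec.get("title")), normalize(rec.get("abstract")))
        if doi and doi in seen_doi:
            continue  # duplicate by identifier
        if any(key) and key in seen_text:
            continue  # duplicate by normalized text
        if doi:
            seen_doi.add(doi)
        if any(key):
            seen_text.add(key)
        kept.append(rec)
    return kept
```

Keeping title and abstract as a tuple, rather than concatenating them, avoids treating ("ab", "c") and ("a", "bc") as the same record.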

Datatools compose flags

Compose does not seem to handle multiple datasets passed with the same flag; for example, I cannot combine -l file1 -l file2. I think this is because it would overlap with the functionality of vstack, but I'd like to be able to assign labels to datasets with compose.
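Repeating a flag is something argparse supports directly through action="append". The fragment below is a hypothetical sketch, not the datatools compose parser: only the -l flag comes from the issue, the dest name labeled_files is illustrative, and it assumes each -l takes exactly one dataset path.

```python
import argparse

# Hypothetical parser fragment; only "-l" is taken from the issue text.
parser = argparse.ArgumentParser(prog="compose")
# action="append" collects every occurrence of -l into a list,
# instead of keeping only the last value given.
parser.add_argument("-l", dest="labeled_files", action="append",
                    metavar="FILE", help="Labeled dataset; may be repeated.")

args = parser.parse_args(["-l", "file1", "-l", "file2"])
print(args.labeled_files)  # ['file1', 'file2']
```

When -l is not given at all, labeled_files is None, so calling code would need to treat that as an empty list.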