asreview / asreview-datatools
Tool to preprocess datasets for ASReview
License: MIT License
Is your feature request related to a problem? Please describe.
Given a state file sf.h5, suppose I want to know its WSS at 85% and its RRF at 1% of the data screened. I would run the following on the command line:
$ asreview stat sf.h5 --wss 85 --rrf 1
This is the error I get:
...
File "path\asreview\analysis\analysis.py", line 239, in wss
if norm_yr[i] >= val - 1e-6:
TypeError: unsupported operand type(s) for -: 'str' and 'float'
It happens because the arguments are not specified to be floats in entrypoint.py, so they are read as strings.
Describe the solution you'd like
Adding type=float to both parameters fixes the bug.
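The fix can be illustrated with a minimal, self-contained sketch. The parser below is a hypothetical stand-in, not the actual code in entrypoint.py; only the --wss and --rrf option names and the type=float argument come from the report above. It shows that argparse hands over floats once the type is declared, so an expression like val - 1e-6 no longer fails.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the parser defined in entrypoint.py.
    parser = argparse.ArgumentParser(prog="asreview stat")
    parser.add_argument("path", help="Path to a state file")
    # Without type=float, argparse would pass these values on as strings.
    parser.add_argument("--wss", type=float, default=None, help="WSS percentage")
    parser.add_argument("--rrf", type=float, default=None, help="RRF percentage")
    return parser


args = build_parser().parse_args(["sf.h5", "--wss", "85", "--rrf", "1"])

# The subtraction that raised TypeError on a str now works on a float.
threshold = args.wss - 1e-6
print(type(args.wss).__name__, threshold)
```

With the type declared, argparse also rejects non-numeric input such as --wss abc with a usage error before any analysis code runs.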
from contextlib import redirect_stdout

from asreviewcontrib.statistics import StateStatistics

# `files` is a list of paths to state files, defined elsewhere.
with open('out.txt', 'w') as f:
    with redirect_stdout(f):
        for file in files:
            # Everything printed here ends up in out.txt.
            print(StateStatistics.from_path(file, wss_vals=[95, 100]))
In this example, I collect statistics for many runs simultaneously and store them in a file.
The API can be handy and is very easy to use, but the documentation regarding this usage is non-existent. There is also no separate page in the ASReview documentation. I'd suggest modifying the readme.MD for now, and adding a separate page later.
The documentation for the dedup command in the ASReview datatools readme currently provides outdated information regarding the dataset used for deduplication examples. The command example given is asreview data dedup benchmark:van_de_schoot_2017 -o van_de_schoot_2017_dedup.csv, which is deprecated. This may lead to confusion and errors for users attempting to follow the current instructions.
I encountered an error when attempting to use the dedup command as documented. After reaching out to the ASReview support team, it was clarified that the documentation had not been updated to reflect the correct dataset. Following their guidance, and after updating my version of datatools, I successfully used the command with my own data and with the synergy dataset.
An updated example would be:
asreview data dedup synergy:van_de_schoot_2018 -o van_de_schoot_2018_dedup.csv
Running pip install asreview-datatools --upgrade can be suggested to prevent potential issues with executing the dedup command.
I kindly request that the documentation be updated to reflect the correct use of the dedup command with the synergy dataset and to include a reminder for users to ensure their datatools package is current.
jonathan$ asreview stat raw_data/demo.csv
Traceback (most recent call last):
  File "/Users/jonathan/.pyenv/versions/asreview-production/bin/asreview", line 10, in <module>
    sys.exit(main())
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreview/__main__.py", line 80, in main
    entry.load()().execute(sys.argv[2:])
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreviewcontrib/statistics/entrypoint.py", line 66, in execute
    with StateStatistics.from_path(
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreviewcontrib/statistics/statistics.py", line 41, in from_path
    stat_inst = cls(path, *args, prefix=prefix, **kwargs)
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreviewcontrib/statistics/statistics.py", line 31, in __init__
    self.analysis = Analysis.from_path(path, prefix=prefix)
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreview/analysis/analysis.py", line 119, in from_path
    return cls.from_file(data_path, key=key)
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreview/analysis/analysis.py", line 107, in from_file
    state = state_from_file(data_fp)
  File "/Users/jonathan/.pyenv/versions/3.8.0/envs/asreview-production/lib/python3.8/site-packages/asreview/state/utils.py", line 139, in state_from_file
    raise ValueError(f"Expected ASReview file or file {data_fp} with "
ValueError: Expected ASReview file or file raw_data/demo.csv with extension ['.h5', '.hdf5', '.he5', '.json'].
@PeterLombaers this might be the result of our changes in the state file reader. Any idea?
The datatools package has an intuitive API. At the moment, only the CLI is documented. We welcome contributions to the documentation that describe the API.
Hi all, thank you for the new addition to the ASReview Datatools set! Could you point me to the source code of the deduplication method? For a deduplication project we are looking into different deduplication methods, and I would like to add the ASReview Datatools dedup and test its performance. TIA
Compose does not seem to be able to handle multiple datasets with the same flag, i.e. I cannot combine -l file1 -l file2. I think this is because it would overlap in functionality with vstack, but I'd like to be able to assign labels to datasets with compose.
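One way a command-line parser can accept a repeated flag is argparse's action="append". The sketch below is illustrative only: the parser, the program name, and the destination name labeled_files are hypothetical, and it does not reflect how compose is actually implemented. Only the -l flag and the file1/file2 usage are taken from the request above.

```python
import argparse

# Hypothetical parser, not the actual compose entry point.
parser = argparse.ArgumentParser(prog="compose-sketch")
parser.add_argument(
    "-l",
    dest="labeled_files",  # assumed name, for illustration only
    action="append",       # each occurrence of -l adds one item to the list
    default=None,
    help="Dataset to include under this flag; may be given more than once",
)

args = parser.parse_args(["-l", "file1", "-l", "file2"])

# When the flag is never given, the value stays None, so fall back to an empty list.
labeled_files = args.labeled_files or []
print(labeled_files)
```

An alternative with the same effect for the caller is nargs="+", which would accept -l file1 file2 in a single occurrence of the flag.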