Comments (6)
from bioawk.
OK, I git cloned bioawk to another server today and it works correctly. Closing.
from bioawk.
No, actually.
Cloning worked on two servers (CentOS and Gentoo); however, I got an error when running make on the OpenSUSE server. Installing bison with conda install bison made the error go away and I compiled successfully, but when I then use bioawk the parsing is still wrong and produces the output shown in the examples above. I made sure I'm calling it from the git folder and not from conda, but it still doesn't work.
Maybe there is an issue with yacc?
from bioawk.
I will close this, as I think it's a problem with the parser generator bison.
from bioawk.
Hi again,
I'm still having this issue with pretty much every installation I attempt (I cannot even reproduce correct behaviour with older installs on the servers mentioned in this thread).
Today I cloned this git repo to my new laptop with elementary OS 5.1.7 (based on Ubuntu 18.04 LTS). I compiled it successfully (I just ran sudo apt install bison before I started) and tried it on some example data, and the parsing was wrong.
I'm attaching an example CSV file, bioawk-test.csv, created in a spreadsheet. I tried several commands and got some really strange behaviour:
$ cat bioawk-test.csv
1 1 2 3 4 4
2 10 20 30 0.40000000000000002 0.40000000000000002
3 19 38 57 -3.2000000000000011 3.2000000000000011
4 28 56 84 -6.8000000000000007 6.8000000000000007
5 37 74 111 -10.400000000000002 10.400000000000002
6 46 92 138 -14.000000000000004 14.000000000000004
7 55 110 165 -17.600000000000001 17.600000000000001
8 64 128 192 -21.200000000000003 21.200000000000003
9 73 146 219 -24.800000000000004 24.800000000000004
10 82 164 246 -28.400000000000006 28.400000000000006
## first failed test
$ bioawk '$6<10' bioawk-test.csv
1 1 2 3 4 4
2 10 20 30 0.40000000000000002 0.40000000000000002
## this is also wrong
$ bioawk '$6>10' bioawk-test.csv
3 19 38 57 -3.2000000000000011 3.2000000000000011
4 28 56 84 -6.8000000000000007 6.8000000000000007
5 37 74 111 -10.400000000000002 10.400000000000002
6 46 92 138 -14.000000000000004 14.000000000000004
7 55 110 165 -17.600000000000001 17.600000000000001
8 64 128 192 -21.200000000000003 21.200000000000003
9 73 146 219 -24.800000000000004 24.800000000000004
10 82 164 246 -28.400000000000006 28.400000000000006
## one more test
$ bioawk '$5<-20' bioawk-test.csv
5 37 74 111 -10.400000000000002 10.400000000000002
6 46 92 138 -14.000000000000004 14.000000000000004
7 55 110 165 -17.600000000000001 17.600000000000001
## this is the weirdest one
$ bioawk '$5>-20' bioawk-test.csv
1 1 2 3 4 4
2 10 20 30 0.40000000000000002 0.40000000000000002
3 19 38 57 -3.2000000000000011 3.2000000000000011
4 28 56 84 -6.8000000000000007 6.8000000000000007
8 64 128 192 -21.200000000000003 21.200000000000003
9 73 146 219 -24.800000000000004 24.800000000000004
10 82 164 246 -28.400000000000006 28.400000000000006
It looks to me like bioawk has trouble interpreting numbers with decimal points. Parsing other columns of this file is otherwise fine.
from bioawk.
Real cause: locale
Hi again,
Recently, I encountered a weird issue with the system sort (GNU coreutils): I was numerically sorting a text file on a column of US-style decimal numbers (with a dot as the decimal point), and sort was producing wrong results. After a few minutes of frustration, I found the issue was caused by my Czech locale: running export LC_ALL=C and rerunning the sort command fixed it. I still don't understand why the parsing of such decimals would be wrong, but nevertheless, it is wrong.
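A likely explanation: the Czech locale (cs_CZ) uses a comma as its decimal separator, and C's strtod() (like sort -n) reads the decimal-point character from the current LC_NUMERIC locale. Under cs_CZ, parsing "3.2000000000000011" therefore stops at the dot and yields 3. The sketch below imitates that prefix parsing with a configurable separator; strtod_prefix is a hypothetical, simplified stand-in for the C function (no exponents, hex, inf/nan or leading whitespace), not its real implementation.

```python
from __future__ import annotations

import re


def strtod_prefix(text: str, decimal_point: str = ".") -> tuple[float, str]:
    """Parse the longest leading decimal number, roughly like C strtod().

    Hypothetical simplified sketch: optional sign, digits and one decimal
    separator only. Returns (value, unparsed_rest); if nothing parses it
    returns (0.0, text), mirroring strtod() returning 0 in that case.
    """
    sep = re.escape(decimal_point)
    match = re.match(rf"[+-]?(\d+({sep}\d*)?|{sep}\d+)", text)
    if match is None:
        return 0.0, text
    value = float(match.group(0).replace(decimal_point, "."))
    return value, text[match.end():]


# The "C" locale uses "." as the decimal point; cs_CZ uses ",".
print(strtod_prefix("3.2000000000000011", "."))  # the whole string is consumed
print(strtod_prefix("3.2000000000000011", ","))  # parsing stops at the dot
```

With "," as the separator the value is truncated to 3.0 and ".2000000000000011" is left unparsed, which is why setting LC_ALL=C (dot as the decimal point) makes the numbers parse as intended.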
Anyway, yesterday I wondered whether this issue with bioawk could be caused by the locale too. And it seems it is!
Specifically, I get wrong parsing under the Czech locale with bioawk and mawk 1.3.4, but not with gawk 4.0.2 or Miller. Setting my locale to "C" as above fixes the issue, and bioawk (as well as mawk) starts parsing as expected. Note that my tests in previous comments were testing against gawk, which is not affected by the locale in this way.
My original locale:
$ locale
LANG=cs_CZ.UTF-8
LC_CTYPE="cs_CZ.UTF-8"
LC_NUMERIC=cs_CZ.UTF-8
LC_TIME=cs_CZ.UTF-8
LC_COLLATE="cs_CZ.UTF-8"
LC_MONETARY=cs_CZ.UTF-8
LC_MESSAGES="cs_CZ.UTF-8"
LC_PAPER=cs_CZ.UTF-8
LC_NAME=cs_CZ.UTF-8
LC_ADDRESS=cs_CZ.UTF-8
LC_TELEPHONE=cs_CZ.UTF-8
LC_MEASUREMENT=cs_CZ.UTF-8
LC_IDENTIFICATION=cs_CZ.UTF-8
LC_ALL=
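A plausible model of the bioawk outputs in the earlier comment, assuming (not verified against bioawk's source) that bioawk, like the Brian Kernighan awk it extends, treats a field as a number only when strtod() consumes the whole field, and otherwise compares it as a string against the constant converted to a string. Under a comma-decimal locale, "0.40000000000000002" is then compared as the string "0.40000000000000002" against "10". The helpers is_numeric_string, awk_compare and select_rows are hypothetical names for illustration.

```python
from __future__ import annotations

import re

# Rows of bioawk-test.csv from the earlier comment (whitespace-separated fields).
DATA = """\
1 1 2 3 4 4
2 10 20 30 0.40000000000000002 0.40000000000000002
3 19 38 57 -3.2000000000000011 3.2000000000000011
4 28 56 84 -6.8000000000000007 6.8000000000000007
5 37 74 111 -10.400000000000002 10.400000000000002
6 46 92 138 -14.000000000000004 14.000000000000004
7 55 110 165 -17.600000000000001 17.600000000000001
8 64 128 192 -21.200000000000003 21.200000000000003
9 73 146 219 -24.800000000000004 24.800000000000004
10 82 164 246 -28.400000000000006 28.400000000000006
"""


def is_numeric_string(field: str, decimal_point: str) -> bool:
    """True if a strtod()-like parse would consume the whole field (simplified)."""
    sep = re.escape(decimal_point)
    return re.fullmatch(rf"\s*[+-]?(\d+({sep}\d*)?|{sep}\d+)\s*", field) is not None


def awk_compare(field: str, op: str, constant: int, decimal_point: str) -> bool:
    """Hypothetical model of `$N op constant` for an integer constant."""
    if is_numeric_string(field, decimal_point):
        left, right = float(field.strip().replace(decimal_point, ".")), constant
    else:
        # Not a number under this locale: fall back to string comparison
        # against the constant formatted as a string, e.g. "10" or "-20".
        left, right = field, str(constant)
    if op == "<":
        return left < right
    if op == ">":
        return left > right
    raise ValueError(f"unsupported operator: {op}")


def select_rows(column: int, op: str, constant: int, decimal_point: str) -> list[int]:
    """Return the row number (first field) of each line matching `$column op constant`."""
    rows = []
    for line in DATA.splitlines():
        fields = line.split()
        if awk_compare(fields[column - 1], op, constant, decimal_point):
            rows.append(int(fields[0]))
    return rows


for expr in [(6, "<", 10), (6, ">", 10), (5, "<", -20), (5, ">", -20)]:
    print(expr, "comma locale:", select_rows(*expr, ","), "C locale:", select_rows(*expr, "."))
```

With decimal_point="," this model selects exactly the rows printed in the four failed tests (rows 1-2, 3-10, 5-7, and 1-4 plus 8-10); with "." it gives the numerically correct selections, consistent with LC_ALL=C fixing bioawk.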
from bioawk.