
Possible segfault (fastnumbers, CLOSED)

sethmmorton commented on May 27, 2024
Possible segfault

from fastnumbers.

Comments (33)

SethMMorton commented on May 27, 2024

Thanks, this stack trace was very helpful.

I'm wondering if it would be possible for you to determine which input created this? I have some guesses as to what might cause this, but without being able to replicate it, it will be difficult to fix.

glepore70 commented on May 27, 2024

I'm working on that now. It's a file that's given me trouble before, probably something to do with control characters or something like that. I'll try and narrow it down to a specific line of code.

I have a GUI that reads a delimited file into pandas, then runs various calculations on each column like min/max, frequency count, etc. I use natsort after I've determined that a column contains both numbers and characters to sort it naturally.

glepore70 commented on May 27, 2024

I've been working on this for a while now and it's very frustrating. I've gotten the file that causes the crash down to 122Kb, but I can't get it any smaller. Here's a link:

http://pastebin.com/sMMbGCe0

I've never used pastebin before, so hopefully that works, I don't see a way to add an attachment here.

I also can't reproduce the crash on a smaller program than my full one, which is 500 lines of python, wx, etc. Hopefully looking at the file that causes the crash will help you. Otherwise, I'm stuck.

Thanks again.

P. S. The problem happens 100% of the time on a huge file (63MB), but happens intermittently on the pastebin file.

SethMMorton commented on May 27, 2024

Thanks, I'll take a look at this tonight. For reference, what system are you on?

glepore70 commented on May 27, 2024

Linux lepore-desktop 3.19.0-16-generic #16-Ubuntu SMP Thu Apr 30 16:13:00 UTC 2015 i686 athlon i686 GNU/Linux

Running KDE.

I spun up a Windows 7 virtual machine and did not get the error.

SethMMorton commented on May 27, 2024

I realize that this isn't the question that you asked, but I am finding that the sorting is not working properly because there is a NaN in your data. This confuses Python's sort because both 5 < NaN and 5 > NaN are False. This creates a jump discontinuity in your sorted data (see below). I will update natsort to better handle this case after I solve this issue, but I don't think this is related to the segfault (which I haven't been able to replicate yet, but I'm on a Mac, so it may be machine dependent). I will dig more.

     SERIAL_NUMBER                  NAME
1927             6        APLIN -OR-&  -
3253      33053 06  BALDASANO BENJAMIN M
1412       2919302     ANDERSON ARVINE L
1323       6135134        AMORE ERNEST S
898        6145219          ALLARD LEO L
3873       6149528      BARNEY WILLIAM A
740        6149858       ALDRICH HENRY W
4813       6248805           BECK JOHN C
4889       6865158       BECKLUND EDWARD
4680       6909807    BEARDSLEY HAROLD F
4683       6953423       BEARLEY HARRY L
4686      11110897       BEARSE SELWYN F
4715      13046508     BEATTIE JOHN H JR
4689      15044122     BEASLEY CHARLES P
4708      16006589     BEASTER RICHARD H
4702      17068735      BEASLEY JOSEPH C
4681      20310601        BEARE GEORGE D
4703      20407637      BEASLEY MARVIN J
4682      31309985    BEARISTO WILLIAM E
4711      33393550     BEATTIE CHARLES D
4714      33404711      BEATTIE HERMAN H
4696      33646001       BEASLEY JAMES B
4695      34174220       BEASLEY HENRY L
4698      34426838       BEASLEY JAMES T
4705      34517074       BEASLEY PEARMAN
4699      34538587       BEASLEY JAMES W
4697      34801955       BEASLEY JAMES L
4701      35790825      BEASLEY JOSEPH B
4709      36531700       BEATON ROBBIE R
4693      36737603      BEASLEY FRANK JR
4687      37197229        BEARY MARTIN C
4691      37563286      BEASLEY DONALD L
4700      37611309       BEASLEY JESSE E
4688      37627746     BEASLEY CHARLES A
4690      38107155     BEASLEY CHESTER J
4685      38466544           BEARPAW TOM
4706      38564225  BEASLEY STEWART R SR
4718      39203811      BEATTIE ROBERT J
4717      39342618     BEATTIE KENNETH M
4710      42054165           BEATTIE C W
4712           NaN        BEATTIE EDWARD   # <=== COUNT RESETS STARTING HERE
4757       6262518        BEAULIEU LEO E
2105       6264303        ARMON THEODORE
4492       6269549     BAUMGARTEN OTIS K
674        6271743         ALBIN HENRY D
3766       6277281     BARNES CLARENCE B
4139       6285548      BARTLEY JESSIE B
250        6294035        ADAMS CLAUDE E
3087       6296739      BAKER CLARENCE F
3685       6379336       BARKER ERNEST P
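The comparison behaviour described above can be checked directly. This is a minimal sketch that uses the built-in sorted rather than natsort, and the nan_first key is a hypothetical helper (not natsort's actual fix) showing one way to give NaN a well-defined position:

```python
import math

nan = float("nan")

# Every ordering comparison involving NaN is False, so sort() cannot place it.
print(5 < nan, 5 > nan, nan == nan)

def nan_first(value):
    # Hypothetical key: NaN sorts before all real numbers; others sort by value.
    is_nan = isinstance(value, float) and math.isnan(value)
    return (0, 0.0) if is_nan else (1, value)

data = [6149528, 42054165, nan, 6262518, 6145219]
result = sorted(data, key=nan_first)
print(result)
```

With a key like this the NaN entry lands in one predictable place instead of splitting the sorted output into two runs.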

SethMMorton commented on May 27, 2024

Can you try testing with the development version that I have just pushed? My suspicion is that there was some problem when converting one of your inputs to a char*, and I have switched to the Python C function that does a bit more error checking when doing the string conversion.

glepore70 commented on May 27, 2024

Off on vacation for a week, will test next Thursday. Thanks!

On 05/22/2015 12:22 AM, Seth Morton wrote:

Can you try testing with the development version that I have just
pushed? My suspicion is that there was some problem when converting
one of your inputs to a char*, and I have switched to the Python C
function that does a bit more error checking when doing the string
conversion.



glepore70 commented on May 27, 2024

No luck with the development version, here's the error:

home/lepore/.local/lib/python2.7/site-packages/pkg_resources/__init__.py:1250: UserWarning: /home/lepore/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
warnings.warn(msg, UserWarning)
Skipping line 15027: expected 26 fields, saw 27
Skipping line 18505: expected 26 fields, saw 27

Skipping line 21991: expected 26 fields, saw 31

Skipping line 44022: expected 26 fields, saw 31

[New Thread 0xac0ffb40 (LWP 5978)]
[New Thread 0xb5351b40 (LWP 5958)]
[New Thread 0xb3b50b40 (LWP 5957)]

Program received signal SIGSEGV, Segmentation fault.
fast_atoi (p=0xac940034 <error: Cannot access memory at address 0xac940034>, error=0xbfffcc26, overflow=0xbfffcc27) at src/fast_atoi.c:24
24 while (white_space(*p)) { p += 1; }

SethMMorton commented on May 27, 2024

Can you try using the following function as a key to natsorted? This will print out every input individually to natsorted before fast_int is run on it. The last one printed before the segfault should be the input causing the problem.

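import sys
from natsort import natsorted

def printer(x):
    # Print each input and flush so the last value is visible even if a segfault follows.
    print(x)
    sys.stdout.flush()
    return x

b = natsorted(your_data, key=printer)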

SethMMorton commented on May 27, 2024

Since you have the source code, you can also add the following before line 24 in fast_atoi.c, preferably in conjunction with the printer function suggested.

    fprintf(stdout, "fast_atoi string: %d\n", p);
    while (white_space(*p)) { p += 1; }

This should print out the string right before the problem occurs.

glepore70 commented on May 27, 2024

I think you're making progress. The file that crashes fastnumbers that I posted above no longer crashes it. However, the larger file that the excerpt came from still crashes it. Here are the last values before the segfault:

O&795577
fast_atoi string: -1423753252
fast_atoi string: -1423641420
10305793
fast_atoi string: -1423641548
6132688
fast_atoi string: -1423642156
O&401818
fast_atoi string: -1423753228
fast_atoi string: -1423641292
10300351
fast_atoi string: -1423641420
O&366604
fast_atoi string: -1424162764
Segmentation fault (core dumped)

Thanks for working on this!

SethMMorton commented on May 27, 2024

Great, this helps narrow down the possible problem. I wish that I had given you the right code to add, though. In the C function, can you change it to the following?

fprintf(stdout, "fast_atoi string: ");
fprintf(stdout, "%s\n", p);
while (white_space(*p)) { p += 1; }

I accidentally had you use the %d format, which prints an integer, but what I really need is %s, which prints the string in the character array. I also think it will be helpful to know whether it is now the printing that causes the crash, or whether it is still the search for white space, so I separated the first part of the string from the second.

In the python printer function, can you change print(x), to print(x, repr(x))? This should show any control characters in the string that we aren't thinking about.
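To illustrate why repr helps here, the following is a small sketch assuming Python 3; the sample value is hypothetical and only stands in for a field that contains characters a terminal would not display:

```python
def show(value):
    # repr() escapes non-printable characters instead of sending them to the terminal.
    return "{0!r}".format(value)

sample = "O&\x00699536\t"  # hypothetical value with an embedded NUL and a trailing tab
print(sample)        # the NUL and the tab are easy to miss here
print(show(sample))  # the escapes make them explicit
```

Printing both forms side by side, as suggested above, makes it obvious when the raw text and its escaped form differ.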

Last, if you do this multiple times, does it always crash on the same input, or does it change from run to run?

Sorry to ask you to modify the tests again. I think we are making headway.

glepore70 commented on May 27, 2024

Happy to help! Here's the latest output. It always crashes on this file, but on the smaller version it only crashed most of the time.

(u'16062279', "u'16062279'")
fast_atoi string: 16062279
(u'31129792', "u'31129792'")
fast_atoi string: 31129792
(u'39093001', "u'39093001'")
fast_atoi string: 39093001
(u'37693447', "u'37693447'")
fast_atoi string: 37693447
(u'O&699536', "u'O&699536'")
fast_atoi string: O&
Segmentation fault (core dumped)

Do you need the GDB output?

SethMMorton commented on May 27, 2024

I imagine the GDB output won't tell anything we haven't seen before.

One thing I notice right away from the two runs is that it is not failing on the same input, but they both begin with O&. I wonder what would happen if you didn't let those strings go to fast_int...

Could you let me know if you get a crash doing either of the following?

First, try modifying the printer function to look like this:

def printer(x):
    print(x)
    sys.stdout.flush()
    return '' if x.startswith('O&') else x

This will remove any string beginning with the "bad" characters from the pool. If you don't get any crashes with that, try the following:

def printer(x):
    print(x)
    sys.stdout.flush()
    return x.replace('O&')

This is to see whether we can stop the problem just by removing the leading bad characters.

glepore70 commented on May 27, 2024

Unfortunately removing the bad characters isn't acceptable for my purposes (ditto for the nans). The data that I'm reading and sorting must remain exactly as it's written in the source file. Otherwise the output will not match the inputs. It's a government thing!

Trying either new printer function I get:

Traceback (most recent call last):
  File "daeric2.py", line 375, in readCSV
    result_list = natsorted(result_list, key=self.printer)# if the results are mixed text and numbers, use natural sort
  File "/usr/local/lib/python2.7/dist-packages/natsort-4.0.0-py2.7.egg/natsort/natsort.py", line 234, in natsorted
    return sorted(seq, reverse=reverse, key=natsort_keygen(key, alg=alg))
  File "/usr/local/lib/python2.7/dist-packages/natsort-4.0.0-py2.7.egg/natsort/utils.py", line 294, in _natsort_key
    val = key(val)
TypeError: printer() takes exactly 1 argument (2 given)

SethMMorton commented on May 27, 2024

If you made printer part of a class, you will need to add self as part of the function definition, as in def printer(self, x):, or you should make it a @staticmethod to not need self. I think this is the origin of the new error you are seeing.
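The two options can be sketched as follows. The class name Example and the method names are hypothetical, and the built-in sorted stands in for natsorted so the snippet runs without natsort installed:

```python
import sys

class Example(object):
    def printer(self, x):
        # Bound method: Python passes the instance as `self` automatically.
        print(x)
        sys.stdout.flush()
        return x

    @staticmethod
    def static_printer(x):
        # Static method: no `self`, so it behaves like a plain function.
        print(x)
        sys.stdout.flush()
        return x

    def sort_values(self, values):
        # key=self.printer works because printer accepts (self, x).
        return sorted(values, key=self.printer)

values = ["b2", "a10", "a9"]
ex = Example()
by_method = ex.sort_values(values)
by_static = sorted(values, key=Example.static_printer)
```

Either form gives the sort a callable that takes exactly one argument, which is what the key parameter expects.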

I wasn't suggesting removing the bad stuff for real, just in our debugging.

glepore70 commented on May 27, 2024

Ahh! I see. Would that also apply to the replace code? I think so (was getting TypeError: replace() takes at least 2 arguments (1 given)). I added it there as well and got:

12138003
Traceback (most recent call last):
  File "daeric2.py", line 375, in readCSV
    result_list = natsorted(result_list, key=self.printer)# if the results are mixed text and numbers, use natural sort
  File "/usr/local/lib/python2.7/dist-packages/natsort-4.0.0-py2.7.egg/natsort/natsort.py", line 234, in natsorted
    return sorted(seq, reverse=reverse, key=natsort_keygen(key, alg=alg))
  File "/usr/local/lib/python2.7/dist-packages/natsort-4.0.0-py2.7.egg/natsort/utils.py", line 294, in _natsort_key
    val = key(val)
  File "daeric2.py", line 539, in printer
    return x.replace(self, 'O&')
TypeError: coercing to Unicode: need string or buffer, Example found

SethMMorton commented on May 27, 2024

Sorry, it should be x.replace('O&', ''), since we need to replace the string with something.

glepore70 commented on May 27, 2024

I should have seen that, sorry. I fixed that line and the file processed successfully! So it's something about the O& that's causing the problem?

SethMMorton commented on May 27, 2024

That's what it looks like.

As a temporary workaround, can you try the following?

a = natsorted(your_data, key=lambda x: x.replace("&", "$"))

This will replace all ampersands with dollar signs. These are close to each other in the ASCII table (only % sits between them), so it shouldn't mess up the sort order much, but it might prevent this segfault. This might get you by while I figure out the segfault.
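A quick check of what such a key does, as a sketch with made-up sample strings and the built-in sorted standing in for natsorted. The key only affects how items are compared, so the strings that come back are the originals, which matters when the data has to stay exactly as written in the source file:

```python
data = ["O&795577", "O$123", "O%5", "10305793"]

# The key only influences comparisons; the items returned are the unmodified originals.
result = sorted(data, key=lambda x: x.replace("&", "$"))
print(result)
print(set(result) == set(data))

# Code points of the characters involved.
print(ord("$"), ord("%"), ord("&"))
```

Since % falls between $ and & in ASCII, a value containing % could end up in a different position under this substitution than it would without it.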

glepore70 commented on May 27, 2024

Hmm....

fast_atoi string: O$
fast_atoi string: 795367
fast_atoi string: O$
fast_atoi string: 718174
fast_atoi string: 37490261
fast_atoi string: 37529450
fast_atoi string: 35570246
fast_atoi string: O
fast_atoi string: 1062485
fast_atoi string: 35241067
fast_atoi string: O$
Segmentation fault (core dumped)

glepore70 commented on May 27, 2024

Ran the code in gdb again and got a different segfault:

fast_atoi string: 11082136
fast_atoi string: 37593519
fast_atoi string: 12005032
fast_atoi string: T!
[New Thread 0xa97fab40 (LWP 26397)]
[New Thread 0xb4351b40 (LWP 26386)]
[New Thread 0xb3b50b40 (LWP 26385)]

Program received signal SIGSEGV, Segmentation fault.
0xb7e102f4 in _IO_vfprintf_internal (s=0xb7f85e80 <_IO_2_1_stdout_>, format=, ap=0xbfffcbbc "4\300۬\360\064") at vfprintf.c:2039
2039 vfprintf.c: No such file or directory.

SethMMorton commented on May 27, 2024

Ok, so it is related to having to split the string before sending to fast_int. I will try to get a VM to replicate this. Thanks for your help.

In the meantime, you can uninstall fastnumbers to avoid the segfault.

glepore70 commented on May 27, 2024

No worries, I still have several weeks before initial deployment. Thanks for working so hard on this.

glepore70 commented on May 27, 2024

I installed Kubuntu in a virtualbox (and wasn't that fun) but was unable to reproduce the problem, using the same code and data file as on my machine. The versions of Kubuntu were both 15.04.

SethMMorton commented on May 27, 2024

Huh... that doesn't give me much hope that I will be able to reproduce.

It's not clear to me if the problem is originating from my C code, or if it is originating from something else. Internally, natsort is using re.findall to split your input into numbers and non-numbers, and sending this split list to fast_int from fastnumbers to do the conversion. So, it's not clear to me if the reason for the failure is because I am not handling this input correctly, or if re.findall is giving poorly formed strings to parse. It is also entirely possible that there is some third problem causing this. Without being able to reproduce I am not sure how I will solve the problem.
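The splitting step can be pictured with the rough sketch below. The pattern and the split_key helper are hypothetical (natsort's real regular expression may differ), and the built-in int stands in for fast_int:

```python
import re

# Hypothetical pattern for illustration; natsort's real regular expression may differ.
_CHUNKS = re.compile(r"(\d+)|(\D+)")

def split_key(text):
    # Each match fills exactly one group: a run of digits or a run of non-digits.
    return [int(num) if num else other for num, other in _CHUNKS.findall(text)]

print(split_key("O&699536"))
print(split_key("33053 06"))
```

This matches the debug output earlier in the thread, where "O&" is handed to the conversion routine as its own piece, separate from the digits that follow.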

glepore70 commented on May 27, 2024

Understood. I'll try to re-install everything and see if I can get my system like the virtualbox I set up. I'll let you know what happens. Thanks for working on this.

SethMMorton commented on May 27, 2024

You didn't happen to be using any special arguments to natsort like LOCALE, did you?

glepore70 commented on May 27, 2024

Nothing but:

result_list = natsorted(result_list)

I'll fiddle around some more with this when I get a chance.

glepore70 commented on May 27, 2024

I re-created the crash on a Kubuntu 15.04 virtualbox image. I've saved the box as a .ova file, which you should be able to download and open in virtualbox. Please email me at [email protected] and I will give you the download address of the .ova file and some brief instructions on reproducing the error. Thanks!

SethMMorton commented on May 27, 2024

I would like to award @glepore70 the "Best Bug Reporter" imaginary internet award for taking the time to create a virtual machine image of the system on which the segfault occurs and sending it to me to debug. I don't imagine many users would go through the hassle to fix the problem... they would just uninstall and move on. Thanks so much!

SethMMorton commented on May 27, 2024

The segfault was related to making a bad assumption when dealing with character arrays.

The Python C-API to get a char* from a string/bytes object is varied, but the simplest version looks a bit like the following:

if (PyBytes_Check(input)) {
    str = PyBytes_AS_STRING(input);
}

Note this is just a straight pointer assignment, no strcpy call is done. As long as the input object is not deleted and str is being used as read-only, this is a fairly safe strategy. The problem arises when the input is not string/bytes, but unicode:

if (PyUnicode_Check(input)) {
    temp_bytes = PyUnicode_AsEncodedString(input, "ascii", "strict");
    if (temp_bytes != NULL) {
        str = PyBytes_AS_STRING(temp_bytes);
        Py_DECREF(temp_bytes);   // <-- Uh-Oh!
    }
}

To extract the char*, the unicode object must first be converted to bytes. This bytes object is only temporary, which means that as soon as the object is garbage collected* (i.e. deallocated) the str pointer will not point to anything meaningful. When one tries to access the dangling str, a segfault happens.

The interesting thing is that the memory behind the deallocated temporary bytes object is usually not overwritten or returned to the operating system right away, so most of the time its contents remain readable for the duration of the fastnumbers function call even though its reference count is zero. A segfault only occurs when that memory has actually been released before str is read. Apparently, this is a rare event since I was unable to reproduce the segfault on my machine, and none of my Travis-CI runs had a segfault either.

The solution to this problem is to force fastnumbers to take ownership of the contents of str (i.e. make a strcpy call), and not rely on Python keeping it alive for the duration of the function call:

if (PyBytes_Check(input)) {
    PyBytes_AsStringAndSize(input, &s, &s_len);
    str = malloc((size_t)s_len + 1);
    strcpy(str, s);
} else if (PyUnicode_Check(input)) {
    temp_bytes = PyUnicode_AsEncodedString(input, "ascii", "strict");
    if (temp_bytes != NULL) {
        PyBytes_AsStringAndSize(temp_bytes, &s, &s_len);
        str = malloc((size_t)s_len + 1);
        strcpy(str, s);  // <-- Now I own the contents of str
        Py_DECREF(temp_bytes);  // <-- Now not a problem
    }
}

The only caveat now is that str must be freed at some point, so I had to do a bit of rework of my other code to ensure a free(str) call was made before returning to Python.
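The same ownership idea can be illustrated from Python with ctypes. This is only an analogy to the C fix, not code from fastnumbers, and owned_c_string is a hypothetical helper name:

```python
import ctypes

def owned_c_string(text):
    # Encode to a temporary bytes object, as PyUnicode_AsEncodedString does in C.
    temp_bytes = text.encode("ascii", "strict")
    # create_string_buffer copies the bytes (plus a trailing NUL) into memory
    # owned by the returned buffer, comparable to malloc followed by strcpy.
    owned = ctypes.create_string_buffer(temp_bytes)
    del temp_bytes  # dropping the temporary no longer affects the copy
    return owned

buf = owned_c_string(u"O&699536")
print(buf.value, len(buf))
```

Here the buffer object frees its memory when it is collected, whereas in the C version the explicit free(str) call plays that role.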

I will merge this with master tonight and make an official release to PyPI.


*Calling Py_DECREF reduces the reference count of the object, and when the garbage collector detects that an object has a 0 reference count it will be destroyed (i.e. deallocated).
