mtresearcher / pysuffix
Automatically exported from code.google.com/p/pysuffix
What steps will reproduce the problem?
1. Use the provided source code and the attached file with the versions of
Python and pysuffix listed below. I don't know whether this happens with other
versions, but it is better to be safe than sorry.
2. Compare the output of BWT(text) with bwt(text); BWT(text)==bwt(text) returns
a boolean telling you whether they are equivalent.
3. (This is more of a suggestion.) Play around with the length of the string.
The first 12500 characters of ww2.txt do not work (the last 2 characters are
switched, and anything above that length seems either to do the same or to go
wrong in other ways), but the first 12000 characters do work. Assuming ww2.txt
is in the same directory as the Python file, it is best to use
text=readFile('ww2.txt')[:length] (this function is in my source code) to get
the first part of the document with the indicated length.
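To narrow down where the two outputs diverge, rather than only learning that
they differ, a small helper along these lines could be used. This is a minimal
sketch; first_mismatch is a hypothetical name and is not part of pysuffix or of
the reporter's code.

```python
def first_mismatch(a, b):
    """Return the index of the first differing position, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One sequence is a strict prefix of the other.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


print(first_mismatch("abcd", "abdc"))
```

Calling first_mismatch(BWT(text), bwt(text)) for increasing values of length
would show whether the divergence always sits near the end of the output.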
What is the expected output? What do you see instead?
The expected output is the result of a Burrows-Wheeler transform of the input
text. My code works for shorter strings, but pysuffix seems to get the order
wrong on longer strings (12500 characters or more), and it seems to be the last
part of the output that is wrong; in my example the last 2 characters are
switched. I am using the attached ww2.txt as a test to see whether it is
working, and since it is a 632545-character file I didn't think it was a good
idea to paste it here, so it is attached. Also attached are ww2_bwt.txt, which
is what the output should be (at 12500 characters), and ww2_BWT.txt, which is
what the version with pysuffix gives me (at 12500 characters). I believe this
difference is caused by an error in the returned suffix array. My best guess
for why this is happening is a difference in sorting between unicode and
ascii, but I am probably very wrong. I would like to see this fixed, but I
don't know where to start on my end. You may ask 'you already have a function
to do this, why do you need pysuffix?' Well... my function is very slow, and it
takes an insane amount of memory on longer strings, as its memory usage and
processing time seem to grow far faster than the length of the string. On the
other hand, there is a module (pysuffix) that does the hard work for you and it
is impressively fast (bravo by the way!) :)
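One way to test the unicode-versus-ascii guess is to check whether decoding the
bytes changes their length. If it does, positions in a suffix array built from
the decoded text no longer line up with positions in the original byte string.
This is only a sketch of that check, written for Python 3 (the report uses
Python 2.7.6, where unicode(text, 'utf-8', 'replace') plays the role of
bytes.decode); decoded_length_differs is a hypothetical helper name, and
whether this explains the reported output is an assumption.

```python
def decoded_length_differs(data, encoding="utf-8"):
    """Return True if decoding `data` yields a different number of items.

    A multi-byte UTF-8 sequence decodes to a single character, so the
    decoded text can be shorter than the byte string it came from.
    """
    decoded = data.decode(encoding, "replace")
    return len(decoded) != len(data)


ascii_only = b"plain ascii"
with_accent = "caf\u00e9".encode("utf-8")  # the accented letter takes 2 bytes

print(decoded_length_differs(ascii_only))
print(decoded_length_differs(with_accent))
```

Running the check on readFile('ww2.txt')[:12000] and on
readFile('ww2.txt')[:12500] would show whether the failing prefix contains
bytes that the passing prefix does not.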
What version of the product are you using? On what operating system?
I am using pysuffix v2.1 with Python 2.7.6 on Windows 7.
Please provide any additional information below.
Please email me ([email protected]) for any further questions, or if this
gets resolved, whether it is an error on my part or a new version of pysuffix :)
This is my source code:
    import sys
    # I have the pysuffix files here, and the import works after I add this
    # location to sys.path; if you already have pysuffix installed somewhere
    # else, this is not needed.
    sys.path.append(r'C:\Users\****\Desktop\python\pysuffix')
    from tools_karkkainen_sanders import *

    def BWT(text):  # New Burrows-Wheeler transform using pysuffix
        text += '\0'  # append the sentinel (end-of-text) byte
        def f(x): return text[x-1]  # function for map to return value
        return ''.join(map(f, simple_kark_sort(unicode(text, 'utf-8', 'replace'))[:len(text)]))

    def bwt(text):  # old and slow, but reliable, Burrows-Wheeler transform
        text += '\0'  # append the sentinel (end-of-text) byte
        def perm(x): return text[x:] + text[:x]  # cyclic permutation starting at x
        return ''.join([row[-1:] for row in sorted(map(perm, range(len(text))))])

    def readFile(filename):  # This is the function I am using to read files.
        f = open(filename, 'rb')
        text = f.read()
        f.close()
        return text
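The relation that BWT relies on, namely that the i-th output character is the
one just before the i-th sorted suffix, can be checked without pysuffix on a
small input. The sketch below is written for Python 3 and uses a naive sort in
place of simple_kark_sort, so it says nothing about pysuffix itself; the
function names are hypothetical.

```python
def naive_suffix_array(text):
    """Return the start indices of the suffixes of `text` in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text):
    """Burrows-Wheeler transform via a suffix array, with '\\0' as sentinel."""
    text += "\0"
    # The character preceding each sorted suffix; index -1 wraps to the sentinel.
    return "".join(text[i - 1] for i in naive_suffix_array(text))


def bwt_by_rotations(text):
    """Burrows-Wheeler transform via sorted cyclic rotations."""
    text += "\0"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rotations)


print(bwt_from_suffix_array("banana") == bwt_by_rotations("banana"))
```

Because the sentinel is smaller than every other character, sorting suffixes
and sorting rotations give the same order, so the two functions should agree on
any input that does not itself contain '\0'.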
Thanks :)
Original issue reported on code.google.com by [email protected]
on 21 Apr 2015 at 10:27
Attachments:
What steps will reproduce the problem?
1. Open a command prompt.
2. Navigate to the root of the project.
3. Run this command: find . -name "*.py" | xargs grep \"\"\"
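The grep above only shows where triple quotes occur. To list what is missing a
docstring, something like the following could be used instead. It is a minimal
sketch built on the standard ast module; missing_docstrings is a hypothetical
helper and is not part of pysuffix.

```python
import ast


def missing_docstrings(source, filename="<string>"):
    """Return the names of definitions in `source` that lack a docstring.

    The module itself is reported as "<module>".
    """
    tree = ast.parse(source, filename=filename)
    missing = []
    if ast.get_docstring(tree) is None:
        missing.append("<module>")
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing


example = 'def f():\n    pass\n\nclass C:\n    """Documented."""\n'
print(sorted(missing_docstrings(example)))
```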
What is the expected output? What do you see instead?
I expect to see at least one line of docstring on every module, class, method,
and function. Complex code blocks whose purpose is not immediately obvious
should have more extensive docs. Any function or method that takes a parameter
or returns a value should document that interface, including expected types, in
its docstring. Any function or method that requires context or bibliographical
references should get it.
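As an illustration of the kind of docstring being asked for, here is a sketch
on a deliberately simple function. The function is hypothetical and is not
taken from pysuffix; it only shows the interface, types, and reference being
documented in one place.

```python
def suffix_array(text):
    """Return the suffix array of `text`.

    Args:
        text (str): The string to index.

    Returns:
        list of int: Start positions of all suffixes of `text`, ordered so
        that the suffixes they point to are in ascending lexicographic order.

    Note:
        This is a naive construction that sorts the suffixes directly. For a
        linear-time algorithm see Karkkainen and Sanders, "Simple Linear Work
        Suffix Array Construction" (ICALP 2003).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


print(suffix_array("banana"))
```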
What version of the product are you using? On what operating system?
Version 2.1, Ubuntu Linux, but this is an issue of source code hygiene; it is
not system specific.
Please provide any additional information below.
An open source project will be adopted only if it has good docs.
Original issue reported on code.google.com by [email protected]
on 3 Apr 2012 at 6:05
What steps will reproduce the problem?
1.
2.
3.
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 4 Oct 2011 at 7:01
What steps will reproduce the problem?
1. Open a command prompt.
2. If you're in a virtualenv, run this: easy_install pysuffix
3. If not, run this: sudo easy_install pysuffix
What is the expected output? What do you see instead?
I expect pysuffix to install itself automatically from a known egg server. It
does not. Instead, I get this:
Searching for pysuffix
Reading http://pypi.python.org/simple/pysuffix/
Couldn't find index page for 'pysuffix' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading http://pypi.python.org/simple/
No local packages or download links found for pysuffix
error: Could not find suitable distribution for Requirement.parse('pysuffix')
What version of the product are you using? On what operating system?
Version 2.1, Ubuntu Linux. This is a packaging issue which transcends operating
systems.
Please provide any additional information below.
Please provide for the packaging and distribution of this library. An open
source release will only be adopted if it is easy to install from known
repositories using standard tools.
Original issue reported on code.google.com by [email protected]
on 3 Apr 2012 at 6:10
$ python suffix_array.test.py
[2, 0, 3, 1, 0, 0, 0]
[2, 0, 1, 0]
Traceback (most recent call last):
  File "suffix_array.test.py", line 14, in <module>
    1/0
ZeroDivisionError: integer division or modulo by zero
Original issue reported on code.google.com by [email protected]
on 4 Oct 2011 at 7:02