nphoff / saxpy
Python implementation of Symbolic Aggregate approXimation
License: MIT License
Thanks for sharing this.
I'm not an expert, so I'd like to ask: would it be possible to use your code to search for the top-k most similar substrings?
For instance, given
target = [ sin(x)*2 for x in range(0, 100) ]
query = [ sin(x)*2 for x in range(2, 5) ]
I would like to retrieve the windows from target that are most similar to query.
I guess I should use s.batch_compare(tStrings,qString) somehow, but I'm not sure how.
Thanks a lot!
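A minimal sketch of the kind of top-k window search asked about above. It does not use saxpy at all: it compares raw windows with a z-normalized Euclidean distance instead of SAX strings, and the helper names znorm and top_k_windows are made up for illustration.

```python
import math

def znorm(xs, eps=1e-8):
    """Z-normalize a window; a flat window maps to all zeros."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    if std < eps:
        return [0.0] * len(xs)
    return [(x - mean) / std for x in xs]

def top_k_windows(target, query, k=3):
    """Return the k (distance, start_index) pairs closest to query."""
    m = len(query)
    q = znorm(query)
    scored = []
    for start in range(len(target) - m + 1):
        w = znorm(target[start:start + m])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, w)))
        scored.append((dist, start))
    scored.sort()  # smallest distance first
    return scored[:k]

target = [math.sin(x) * 2 for x in range(0, 100)]
query = [math.sin(x) * 2 for x in range(2, 5)]
best = top_k_windows(target, query, k=3)
for dist, start in best:
    print("start=%d end=%d dist=%.4f" % (start, start + len(query), dist))
```

The window starting at index 2 is the query itself, so it comes back first with distance 0. A SAX-based variant would replace the distance computation with a comparison of SAX words, at the cost of a coarser (lower-bounding) score.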
Can we assume it's MIT or GPL?
Hi,
I read the original SAX paper and got confused about the meaning of scalingFactor in your implementation.
My use case has 7 template signals of different lengths, and the signal being compared has yet another length. When I compare that signal with the templates, scalingFactor does not seem to affect the final result, which is the template with the minimum distance.
Could you please give some hints on how to use the library for my case?
Thanks in advance
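For reference, the distance defined in the SAX paper between two words of length w, built from series of length n, is shown below. Whether scalingFactor in this library corresponds exactly to the leading factor is an assumption; the thread does not confirm it.

```latex
\mathrm{MINDIST}(\hat{Q}, \hat{C})
  = \sqrt{\frac{n}{w}} \,
    \sqrt{\sum_{i=1}^{w} \mathrm{dist}(\hat{q}_i, \hat{c}_i)^2}
```

If the same n and w are used for every comparison, the factor \sqrt{n/w} is one positive constant multiplying all distances, so it cannot change which template has the minimum distance. With templates of different lengths the factor could differ per comparison, depending on how the library sets it.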
ImportError Traceback (most recent call last)
in ()
----> 1 from saxpy import SAX
ImportError: cannot import name 'SAX'
Hi. In the normalize function, if np.nanstd(X) < self.eps: the code goes and constructs the list res with 0's and NaN's as appropriate. However, it doesn't return res, but falls through to the computation which should only be done if np.nanstd(X) >= self.eps. I think this is a bug. I can submit a patch if you like, but it's so simple, I figured it should just be added. Thanks.
Jeff Becker
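A minimal sketch of the fix described above, written as a standalone function rather than the library's method. It uses plain Python in place of np.nanstd and np.nanmean, and the name normalize and the default eps are assumptions for illustration.

```python
import math

def normalize(values, eps=1e-6):
    """Z-normalize a series, ignoring NaNs when computing mean and std."""
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return list(values)  # nothing to normalize
    mean = sum(finite) / len(finite)
    # Population standard deviation (ddof=0), like np.nanstd's default
    std = math.sqrt(sum((v - mean) ** 2 for v in finite) / len(finite))
    if std < eps:
        # Early return: without it, execution would fall through
        # to the division below and divide by a near-zero std.
        return [math.nan if math.isnan(v) else 0.0 for v in values]
    return [(v - mean) / std for v in values]

flat = normalize([5.0, 5.0, 5.0])
mixed = normalize([2.0, math.nan, 2.0])
ramp = normalize([1.0, 2.0, 3.0])
```

Here flat is all zeros, mixed keeps its NaN in place, and ramp is normalized as usual.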
Hi Nathan,
I was trying to use your implementation, but as far as I can tell it contains some bugs.
I have a SAX sequence A,
A: aaaaccccbbaa
and a longer sequence C ("sequence" in the code below). Each output line shows the match score between a subsequence of C and A, the (start, end) indexes of that subsequence, and the subsequence's SAX string:
W: 0.000 (0, 24) bbbbbbbbbbbb
W: 0.000 (2, 26) bbbbbbbbbbbb
W: 0.000 (18, 42) aaaaccbbbbba
W: 0.000 (48, 72) aaaaccccbbaa
W: 0.000 (70, 94) bbbbbbbbbbbb
W: 0.000 (72, 96) bbbbbbbbbbbb
W: 0.000 (74, 98) bbbbbbbbbbbb
W: 0.000 (76, 100) bbbbbbbbbbbb
I think it's a bug that bbbbbbbbbbbb is equal to aaaaccccbbaa, no?
The problem is that compareDict does not make sense (e.g. difference(a,b)=0 and difference(b,c)=0)
e.g. print s.compareDict
{'aa': 0, 'ac': 0.86, 'ab': 0, 'ba': 0, 'bb': 0, 'bc': 0, 'cc': 0, 'cb': 0, 'ca': 0.86}
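For comparison, here is a sketch of the letter-distance lookup table as the SAX paper defines it, built with the standard library only. The function names are hypothetical and this is not the library's code. In the paper's definition, adjacent letters have distance 0 by design, because the measure is a lower bound on the true distance. With an alphabet of size 3, that makes 'b' zero-distance to every letter.

```python
from statistics import NormalDist

def sax_breakpoints(alphabet_size):
    """Breakpoints that cut N(0, 1) into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]

def letter_distance_table(alphabet_size):
    """dist(r, c) = 0 if |r - c| <= 1, else beta[max-1] - beta[min]."""
    beta = sax_breakpoints(alphabet_size)
    letters = [chr(ord('a') + i) for i in range(alphabet_size)]
    table = {}
    for r, lr in enumerate(letters):
        for c, lc in enumerate(letters):
            if abs(r - c) <= 1:
                table[lr + lc] = 0.0  # same or adjacent letters
            else:
                # beta is 0-indexed here, so beta[k] is the paper's beta_{k+1}
                table[lr + lc] = beta[max(r, c) - 1] - beta[min(r, c)]
    return table

table = letter_distance_table(3)
for pair in sorted(table):
    print("%s: %.2f" % (pair, table[pair]))
```

For alphabet size 3 only 'ac' and 'ca' are non-zero (about 0.86), which matches the dictionary printed above. A word made only of 'b' therefore scores 0 against any other word, so a larger alphabet is needed to separate such windows.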
sequence = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 23.0, 73.0, 73.0, 75.0, 30.0, 16.0, 19.0, 27.0, 33.0, 19.0, 5.0, 20.0, 19.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 18.0, 16.0, 11.0, 30.0, 10.0, 39.0, 12.0, 2.0, 15.0, 16.0, 4.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 19.0, 6.0, 39.0, 27.0, 18.0, 20.0, 38.0, 34.0, 33.0, 10.0, 10.0, 15.0, 10.0, 8.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 9.0, 10.0, 10.0, 35.0, 25.0, 24.0, 18.0, 28.0, 18.0, 16.0, 18.0, 31.0, 10.0, 10.0, 15.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 8.0, 30.0, 25.0, 13.0, 13.0, 28.0, 27.0, 20.0, 13.0, 9.0, 11.0, 5.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 11.0, 4.0, 18.0, 26.0, 13.0, 23.0, 16.0, 13.0, 15.0, 12.0, 17.0, 15.0, 24.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 16.0, 0.0, 0.0, 10.0, 9.0, 3.0, 27.0, 15.0, 18.0, 23.0, 25.0, 16.0, 12.0, 23.0, 13.0, 16.0, 10.0, 8.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 13.0, 8.0, 28.0, 25.0, 19.0, 15.0, 23.0, 8.0, 23.0, 30.0, 28.0, 20.0, 25.0, 16.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 16.0, 10.0, 27.0, 24.0, 30.0, 27.0, 28.0, 41.0, 31.0, 25.0, 6.0, 25.0, 9.0, 9.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 15.0, 11.0, 25.0, 28.0, 15.0, 15.0, 23.0, 15.0, 23.0, 26.0, 15.0, 17.0, 12.0, 9.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 18.0, 12.0, 36.0, 28.0, 13.0, 21.0, 15.0, 19.0, 33.0, 36.0, 9.0, 6.0, 10.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 12.0, 42.0, 13.0, 23.0, 23.0, 49.0, 5.0, 6.0, 15.0, 13.0, 13.0, 11.0, 16.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 2.0, 16.0, 
25.0, 17.0, 16.0, 25.0, 18.0, 18.0, 25.0, 17.0, 13.0, 12.0, 4.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 11.0, 3.0, 28.0, 20.0, 24.0, 21.0, 21.0, 21.0, 16.0, 32.0, 28.0, 15.0, 18.0, 15.0, 2.0, 11.0, 23.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 17.0, 13.0, 26.0, 15.0, 18.0, 15.0, 3.0, 0.0, 11.0, 19.0, 11.0, 17.0, 12.0, 4.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 18.0, 16.0, 26.0, 15.0, 19.0, 18.0, 20.0, 26.0, 11.0, 12.0, 10.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 12.0, 10.0, 45.0, 20.0, 15.0, 28.0, 20.0, 24.0, 16.0, 19.0, 20.0, 13.0, 19.0, 15.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 9.0, 4.0, 11.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 2.0, 5.0, 1.0, 15.0, 8.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 15.0, 12.0, 21.0, 25.0, 15.0, 15.0, 26.0, 2.0, 0.0, 2.0, 0.0, 4.0, 12.0, 16.0, 18.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 16.0, 0.0, 10.0, 12.0, 6.0, 20.0, 0.0, 0.0, 1.0, 27.0, 19.0, 25.0, 3.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 24.0, 11.0, 25.0, 17.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 26.0, 26.0, 17.0, 6.0, 18.0, 17.0, 8.0, 17.0, 4.0, 21.0, 12.0, 16.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.0, 0.0, 6.0, 5.0, 16.0, 18.0, 23.0, 32.0, 17.0, 25.0, 5.0, 12.0, 13.0, 0.0, 0.0, 0.0]
from saxpy import SAX  # SAX class from this package

motif = sequence[48:72]
s = SAX(12, 3)
(a_sax, a_indexes) = s.to_letter_rep(motif)
print("a_sax: %s" % a_sax)
# Integer division, matching the behaviour of "/" in the original Python 2 code
(sequence_strings, sequence_indexes) = s.sliding_window(sequence, len(sequence) // len(motif))
x3x2ComparisonScores = s.batch_compare(sequence_strings, a_sax)
count = 0.0
threshold = 0.1
print("A:\t\t_\t%s" % a_sax)
# Print every window whose score against the motif is below the threshold
for i, score in enumerate(x3x2ComparisonScores):
    if score < threshold:
        print("W: %.3f\t%s\t%s" % (score, sequence_indexes[i], sequence_strings[i]))
The choice of input parameters also seems somewhat strange to me. I would expect to pass just the gap_size to sliding_window, and the PAA interval size (e.g. 2 to average every two numbers of the input sequence) to the SAX constructor.
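A small sketch of the parameterization suggested above, assuming a PAA step defined by a fixed interval size and a sliding window defined by a window length and a gap_size. The functions paa and sliding_windows are hypothetical and are not part of the library.

```python
def paa(series, interval):
    """Average consecutive, non-overlapping chunks of `interval` points.

    The last chunk may be shorter if len(series) is not a multiple
    of interval.
    """
    chunks = [series[i:i + interval] for i in range(0, len(series), interval)]
    return [sum(chunk) / len(chunk) for chunk in chunks]

def sliding_windows(series, window, gap_size):
    """Return (start, end) index pairs, advancing by gap_size each step."""
    return [(start, start + window)
            for start in range(0, len(series) - window + 1, gap_size)]

averaged = paa([1, 3, 5, 7], 2)
windows = sliding_windows(list(range(10)), 4, 2)
print(averaged)
print(windows)
```

With this shape of API the window length and step are stated directly, instead of being derived from a window count.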
I'm not sure if you still maintain this code, but it would be nice to have a correct implementation in Python! However, if you choose not to maintain it, I think it would be better to remove it, as it does not seem to work properly.
Kind regards!
Len Feremans