papidb / interesting-papers Goto Github PK
View Code? Open in Web Editor NEWThis project forked from dgryski/interesting-papers
Interesting papers I'd like to implement (or at least have implementations of)
This project forked from dgryski/interesting-papers
Interesting papers I'd like to implement (or at least have implementations of)
This is a list of papers I would like to implement, or would like to have an implementation of. This list is likely to change as my interests change, including deletions. Do not expect this list to remain static. These are in no particular order: SONIK: Efficient In-situ All Item Rank Generation using Bit Operations - https://arxiv.org/abs/1605.06992 CAMP: A Cost Adaptive Multi-Queue Eviction Policy for Key-Value Stores - http://dblab.usc.edu/users/papers/CAMPTR.pdf SimString: A fast and simple algorithm for approximate string matching/retrieval http://www.chokkan.org/software/simstring/ Simpira: cryptographic permutations designed to be fast on modern 64-bit processors, yet provide a comfortable security margin against all currently-known attacks. - http://mouha.be/simpira/ Autoscaling Bloom Filter: Controlling Trade-off Between True and False Positives - https://arxiv.org/abs/1705.03934 Adaptive Cuckoo-Filters - https://arxiv.org/abs/1704.06818 Continuous Top-k Queries over Real-Time Web Streams - https://arxiv.org/abs/1610.06500 A practical index for approximate dictionary matching with few mismatches - https://arxiv.org/abs/1501.04948 Robust benchmarking in noisy environments - https://arxiv.org/abs/1608.04295 Fast intersection of sorted lists with SSE: - https://highlyscalable.wordpress.com/2012/06/05/fast-intersection-sorted-lists-sse/ - Also, https://arxiv.org/abs/1401.6399 PAD: Performance Anomaly Detection in Multi-Server Distributed Systems https://www.microsoft.com/en-us/research/wp-content/uploads/2014/06/PAD-Performance-Anomaly-Detection-in-Multi-Server-Distributed-Systems.pdf Detecting Abnormal Machine Characteristics in Cloud Infrastructures - https://ti.arc.nasa.gov/publications/4268/download/ PerfAugur: Robust Diagnostics for Performance Anomalies in Cloud Services - https://www.microsoft.com/en-us/research/publication/perfaugur-robust-diagnostics-for-performance-anomalies-in-cloud-services/ Statistical Techniques for Online Anomaly Detection in Data Centers - http://www.hpl.hp.com/techreports/2011/HPL-2011-8.pdf Fast table-driven base64 encoding/decoding: - https://github.com/powturbo/TurboBase64/blob/master/turbob64d.c Assembly versions of hash functions / cryptographic algorithms: - t1ha (Go version: https://github.com/dgryski/go-t1ha ) - rc5 / rc6 (Go version: https://github.com/dgryski/go-rc5 / https://github.com/dgryski/go-rc6 ) In-memory data layout for Netflix's Hollow: - http://hollow.how/advanced-topics/#in-memory-data-layout Omnisearch Index Formats - https://blog.twitter.com/2016/omnisearch-index-formats NORX8 and NORX16: Authenticated Encryption for Low-End Systems - https://eprint.iacr.org/2015/1154 LightMAC: A MAC Mode for Lightweight Block Ciphers: - https://eprint.iacr.org/2016/190.pdf Fast Deterministic Selection (adaptive QuickSelect) - https://arxiv.org/abs/1606.00484 A Bloom filter based semi-index on q-grams - https://arxiv.org/abs/1507.02989 Faster Population Counts using AVX2 Instructions - https://arxiv.org/abs/1611.07612 Quasi-Succinct Indices (compressed inverted indexes): - http://vigna.di.unimi.it/ftp/papers/QuasiSuccinctIndices.pdf Efficient Summing over Sliding Windows (stream statistics) - http://arxiv.org/pdf/1604.02450v1.pdf A Novel Technique for Long-Term Anomaly Detection in the Cloud - https://www.usenix.org/system/files/conference/hotcloud14/hotcloud14-vallis.pdf - Twitter's anomaly detection algorithm - related, http://www.ebaytechblog.com/2015/08/19/statistical-anomaly-detection/ - related, http://nerds.airbnb.com/anomaly-detection/ TinySet - An Access Efficient Self Adjusting Bloom Filter Construction - http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2015/CS/CS-2015-03.pdf Detecting Change in Data Streams: - https://cs.uwaterloo.ca/~shai/vldb04.pdf Hierarchical Delta Debugging: - https://blog.acolyer.org/2015/11/17/hierarchical-delta-debugging/ - (to go with https://github.com/dgryski/go-ddmin ) FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space - http://cs.fit.edu/~pkc/papers/tdm04.pdf - many implementations to use as base, for example https://github.com/slaypni/fastdtw/blob/master/fastdtw.py Mining frequent items in the time fading model - http://arxiv.org/pdf/1601.03892v1.pdf Hierarchical Agglomerative Clustering: - http://nlp.stanford.edu/IR-book/html/htmledition/hierarchical-agglomerative-clustering-1.html - needed for https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/rebucket-icse2012.pdf - preliminary implementation of rebucket: https://github.com/dgryski/go-rebucket Balanced Allocation: Patience is not a Virtue (FirstDiff load balancing): - http://arxiv.org/abs/1602.08298 Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream - http://www.cs.ubc.ca/~xujian/paper/quant.pdf The Eternal Sunshine of the Sketch Data Structure - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.146.2889&rep=rep1&type=pdf Copysets and Chainsets: A Better Way to Replicate http://hackingdistributed.com/2014/02/14/chainsets/ A Fast Algorithm for Approximate Quantiles in High Speed Data Streams - http://web.cs.ucla.edu/~weiwang/paper/SSDBM07_2.pdf - this algorithm has haunted me for ages, I could never get my code working - unresponsive authors, details missing from papers, etc - there now appear to be more implementations that could be used as a base
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.