jumble_solver_python's Introduction

Jumble Solver

Given a file with a list of words (e.g. this) and an English word, this program finds all Jumble solutions for that word: anagrams of the full word and of every combination of two or more of its letters.

Usage

python3 jumble.py [-h] word_list word

Positional arguments

  • word_list: File containing a list of words, one per line
  • word: Word used to find Jumble solutions
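The usage line above has the shape that argparse generates, so the argument handling can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name build_parser and the description string are assumptions, while the argument names and help texts come from the list above.

```python
import argparse


def build_parser():
    """Build a parser for: jumble.py [-h] word_list word (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="jumble.py",
        description="Find Jumble solutions for a word.",  # assumed wording
    )
    parser.add_argument("word_list", help="File containing a list of words, one per line")
    parser.add_argument("word", help="Word used to find Jumble solutions")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["words.txt", "dog"])
print(args.word_list, args.word)
```

Both arguments are positional, so they must be given in this order.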

This repository includes an example word list file, taken from this repo.

Example

Using words.txt and dog as arguments, we get the following output:

$ python3 jumble.py words.txt dog
do
od
dg
gd
go
og
god

Implementation

As a reference for the time complexity, I've used this official Python page and these lecture notes. I've separated the code into the following parts:

main

Parses the arguments, calls the parse_word_list_file and jumble functions, then prints the result.

parse_word_list_file

Creates a jumble dictionary from a file containing a list of words. Each key is a word in lowercase with its letters sorted alphabetically, and each value is the list of original words that share that key. Because all anagrams of a word map to the same key, a single Python dict lookup, O(1) on average, finds them.
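A minimal sketch of this structure, assuming the operations listed in the Complexity analysis section (strip, lower, sorted, setdefault, append). The name build_jumble_dict is hypothetical, and skipping blank lines is an assumption. It takes any iterable of lines, so an open file object works as well as a list.

```python
def build_jumble_dict(lines):
    """Map each sorted, lowercased letter sequence to the original words sharing it."""
    jumble_dict = {}
    for line in lines:
        word = line.strip()
        if not word:
            continue  # assumption: blank lines are ignored
        key = "".join(sorted(word.lower()))
        # setdefault creates the list on first sight of the key, then we append.
        jumble_dict.setdefault(key, []).append(word)
    return jumble_dict


example = build_jumble_dict(["god\n", "dog\n", "Go\n"])
print(example)  # {'dgo': ['god', 'dog'], 'go': ['Go']}
```

With a real file, the call would look like `with open(path) as f: build_jumble_dict(f)`.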

jumble

Given the jumble dict and an input word, it searches for all Jumble solutions for that word. Since we also want solutions that use only some of the letters, it first generates the combinations of the input's letters for every size from 2 to the input word length. Each combination is sorted and stripped of whitespace, then looked up in the jumble dictionary. If the sorted key is found, its list of original words is merged into a temporary dict, which we use to avoid duplicates.

itertools.combinations treats elements as unique by position, not by value, so repeated letters yield repeated combinations. For example, the word "robot" gives the combination "ot" twice. However, as shown in the Complexity analysis section, the running time of this code grows exponentially with the length of the input word, while insertion into a dict is constant on average. In terms of time complexity, it is therefore better not to filter the duplicate combinations beforehand and to let the dict absorb them, since it indexes and stores in O(1).
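The search described above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: the signature jumble(jumble_dict, word) and the toy dictionary are made up for the example, and the output is not meant to reproduce the Example section.

```python
from itertools import combinations


def jumble(jumble_dict, word):
    """Return the words whose sorted letters match some combination of `word`."""
    found = {}  # dict used as an insertion-ordered set to absorb duplicates
    for size in range(2, len(word) + 1):
        for combo in combinations(word, size):
            # Lowercase, drop whitespace, then sort to build the lookup key.
            key = "".join(sorted(ch.lower() for ch in combo if not ch.isspace()))
            for original in jumble_dict.get(key, []):
                found[original] = True
    return list(found)


# Toy dictionary (assumed data), keyed by sorted letters.
example_dict = {"dgo": ["god", "dog"], "do": ["do", "od"], "go": ["go"]}
print(jumble(example_dict, "dog"))  # ['do', 'od', 'go', 'god', 'dog']

# combinations works by position, so repeated letters repeat a key:
pairs = ["".join(sorted(c)) for c in combinations("robot", 2)]
print(pairs.count("ot"))  # 2
```

Writing to `found` twice for the same word is harmless, which is why the duplicate combinations are not filtered first.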

Complexity analysis

Let N be the number of words in the file and S the length of a word in that list. In the parse_word_list_file function, we iterate over all words in the file, O(N), and run the following operations on each word, with their respective complexities:

  1. strip: O(S)
  2. lower: O(S)
  3. sorted: O(S log S)
  4. setdefault: O(1)
  5. append: O(1)

Thus, the complexity of the parse_word_list_file function is:

O(N) * (O(S) + O(S) + O(S log S) + O(1) + O(1)) = O(N * (2S + S log S + 2)) = O(2NS + NS log S + 2N) = O(NS log S)

Assuming the list contains English words, whose average length is about 4.7 letters, we can treat S as a constant of about 5:

O(N * 5 log 5) = O(N)

In jumble, let L be the length of the input word, R the combination size, N the number of words in the input file, and C() the choose operator. We then perform the following operations:

  • Loop R from 2 to the input word length: O(L-1) = O(L) iterations
    • Find the combinations of size R and iterate over them: O(C(L, R) * R)
      • For each combination, we lower and sort it; in the worst case it has length L: O(L) + O(L log L)
      • Then we search for the sorted key in the jumble dict: O(1)
      • Append the result to another list: O(1)

For a fixed R, the cost is O(C(L, R)) * (O(L log L) + O(L) + O(1) + O(1)); the O(R) cost of generating each combination is at most O(L) and is absorbed by the per-combination term. The loop over R sums these costs instead of multiplying them. The number of combinations from R=1 to L is 2^L - 1, so from R=2 to L it is 2^L - L - 1 = O(2^L). Simplifying, the complexity of jumble is: O(2^L) * (O(L log L) + O(L) + O(1) + O(1)) = O(2^L * L log L)
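As a quick sanity check on the combination count, the sum of C(L, R) for R from 2 to L can be compared with the closed form 2^L - L - 1. The helper name combination_count is hypothetical, and math.comb requires Python 3.8 or later.

```python
from math import comb


def combination_count(length):
    """Number of combinations of sizes 2..length drawn from `length` positions."""
    return sum(comb(length, size) for size in range(2, length + 1))


# The direct sum matches the closed form 2^L - L - 1 for small L.
for length in range(2, 11):
    assert combination_count(length) == 2**length - length - 1

print(combination_count(5))  # 26
```

Each extra letter roughly doubles the number of lookups, which is the source of the exponential term.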

In main, we iterate over the results from jumble and print them. In the worst case, the results contain every word from the initial file, so printing has O(N) complexity. Finally, adding the complexities of main, parse_word_list_file, and jumble, we get:

O(N) + O(N) + O(2^L * L log L) = O(N) + O(2^L * L log L)

Experiments

To evaluate this result, I've prepared the script complexity_experiment.py, which runs the jumble function on a list of words with an increasing number of letters. The following figure shows the execution time in seconds on the Y-axis against the number of letters on the X-axis. The execution time grows exponentially, as predicted in the Complexity analysis section.

[Figure: execution time in seconds (Y-axis) against number of letters in the input word (X-axis)]

Unit tests

To test this software and allow its performance to be improved safely without breaking the initial requirements, I implemented tests with Python's built-in unittest library. To run them, execute the following command from the project directory:

python3 -m unittest tests.tests.JumbleTests

jumble_solver_python's People

Contributors

lucascoelhof