GithubHelp home page GithubHelp logo

Comments (2)

SpiralAPI avatar SpiralAPI commented on May 26, 2024

I have been experiencing the same thing. For me, it seems to clone the json files contents at the end of the file (basically if you copied everything, and then pasted it again at the end of the json file)

from tinydb.

MrPigss avatar MrPigss commented on May 26, 2024

The problem is that you don't use Locks while reading and writing. Because of this it's possible that one thread or process does a 'seek' operation just before another thread or process wants to read.

For example, below is the code for a read using the JSONStorage.

def read(self) -> Optional[Dict[str, Dict[str, Any]]]:
        # Get the file size by moving the cursor to the file end and reading
        # its location
        self._handle.seek(0, os.SEEK_END)
        size = self._handle.tell()

        if not size:
            # File is empty, so we return ``None`` so TinyDB can properly
            # initialize the database
            return None
        else:
            # Return the cursor to the beginning of the file
            self._handle.seek(0)

            # Load the JSON contents of the file
            return json.load(self._handle)
  1. proces one does a 'read'.
    1.1 proces one does self._handle.seek(0, os.SEEK_END), the cursor is now at the end of the file.
    1.2 proces one does self._handle.seek(0), the cursor is now at the beginning of the file.
  2. proces two start a read.
    2.1 proces two does self._handle.seek(0, os.SEEK_END), the cursor is now at the end of the file.
  3. proces one does a json.load(self._handle) -> you read from the end of a file -> file seems empty

process one set the cursor to the beginning, but proces 2 changed it to the end just before process one wants to read, resulting in an empty str. This is one way things can go wrong but you can imagine that there are a lot of ways that this can fail. Like @SpiralAPI saw there might be a proces that does a self._handle.seek(0, os.SEEK_END) just before a write resulting in appending all the data instead of overwriting.

It's not very usefull to do a search using multiple processes or thread over a single file. Since you would need to use locks every time you read or write, you basically turned it into a synchronous operation.

If you would need to do a CPU-intensive task, it's beter to read everything ahead of time (or at least in chunks) and then pass the data to different processes or threads.

from tinydb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.