Comments (10)

peterjaap commented on September 21, 2024

What would you propose we use for threading? pthreads is only safe in PHP 7.2+, which would restrict usage to 7.2 and later, and that version isn't widely adopted yet.

Another, hackier option would be to let masquerade call itself through Symfony Process plus one of the parallel-process packages.

from masquerade.

tdgroot commented on September 21, 2024

While pthreads is a really nice extension, I think it's better to go for Symfony Process. I've seen several applications where the runtime calls itself to launch subprocesses.

Aside from requiring PHP 7.2+, php-pthreads hasn't been adopted by most distros. I think the extension needs a bit more time to be adopted into the PHP ecosystem.

peterjaap commented on September 21, 2024

I've done some preliminary work in the 24-subprocesses branch. See commit 09ce2ac. I start a parallel process per group here, not per table.

It works, but every running process immediately prints its own progress bar (not to mention the logo and the 'Done anonymizing' message), which makes a mess:

 _  _. _ _.    _ .__. _| _  
| | |(_|_>(_||_|(/_|(_|(_|(/_ 
            |
                   by elgentos
                        v0.1.1
                              
._ _  _. _ _.    _ .__. _| _  
| | |(_|_>(_||_|(/_|(_|(_|(/_ 
            |
                   by elgentos
                        v0.1.1

Updating admin_user
     0/26750 [>---------------------------]   0%                              
._ _  _. _ _.    _ .__. _| _  
| | |(_|_>(_||_|(/_|(_|(_|(/_ 
            |
                   by elgentos
                        v0.1.1

Updating sales_creditmemo

Updating email_contact

Updating newsletter_subscriber
  0/14 [>---------------------------]   0%   0/280 [>---------------------------]   0%     0/11616 [>---------------------------]   0%     0/11670 [>---------------------------]   0%                              
._ _  _. _ _.    _ .__. _| _  
| | |(_|_>(_||_|(/_|(_|(_|(/_ 
            |
                   by elgentos
                        v0.1.1
                              
._ _  _. _ _.    _ .__. _| _  
| | |(_|_>(_||_|(/_|(_|(_|(/_ 
            |
                   by elgentos
                        v0.1.1
                              
._ _  _. _ _.    _ .__. _| _  
| | |(_|_>(_||_|(/_|(_|(_|(/_ 
            |
                   by elgentos
                        v0.1.1

Updating sales_invoice

Updating review_detail
    0 [>---------------------------]
Done anonymizing
     0/32941 [>---------------------------]   0%                              
._ _  _. _ _.    _ .__. _| _  
| | |(_|_>(_||_|(/_|(_|(_|(/_ 
            |
                   by elgentos
                        v0.1.1

Updating sales_order
     0/38379 [>---------------------------]   0%
Updating quote
     0/78297 [>---------------------------]   0%

Done anonymizing
 14/14 [============================] 100%^

tdgroot commented on September 21, 2024

Cool stuff! It might be better to work with return values in the subprocesses, read them in the master process and create progress bars based on that.

Not sure if that's possible, this is new stuff for me haha.
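
A rough shell sketch of that idea, assuming each subprocess can print machine-readable progress lines of the form `<table> <processed> <total>` on stdout (a made-up format, not something masquerade emits). `worker` is a hypothetical stand-in for a real subprocess, and `aggregate` plays the master process: it reads the merged stream and keeps only the latest count per table, which is the state a single progress display could be drawn from. Associative arrays need bash 4 or newer.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for a subprocess: reports progress for one table
# as "<table> <processed> <total>" lines (an assumed format).
worker() {
    local table=$1 total=$2 i
    for ((i = 1; i <= total; i++)); do
        echo "$table $i $total"
    done
}

# Master side: read progress lines from all workers on stdin and remember
# the most recent count per table. A real implementation would redraw its
# progress bars inside the loop; this sketch only prints the final state.
aggregate() {
    declare -A processed totals
    local table count total
    while read -r table count total; do
        processed[$table]=$count
        totals[$table]=$total
    done
    for table in $(printf '%s\n' "${!processed[@]}" | sort); do
        echo "$table ${processed[$table]}/${totals[$table]}"
    done
}

# Two workers run in parallel and share one pipe to the master.
{ worker admin_user 3 & worker sales_order 5 & wait; } | aggregate
# prints:
# admin_user 3/3
# sales_order 5/5
```

Each progress line is one short write, so lines from different workers should interleave without being split, and the master can parse them one at a time.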

peterjaap commented on September 21, 2024

https://github.com/krakjoe/parallel

This is really nice, but unfortunately it's a PECL extension, which limits usage. So it's a no-go.

tdgroot commented on September 21, 2024

FWIW, I wrote a little bash script to make it possible for now:

DATABASE="your_db_name"
MASQUERADE_PLATFORM="magento2"
MASQUERADE_GROUPS=($(bin/masquerade groups --platform "${MASQUERADE_PLATFORM}" | grep '|' | grep -v 'Group' | awk -F'|' '{print $3}' | uniq))

for group in "${MASQUERADE_GROUPS[@]}"
do
    echo "Starting process for group $group"
    screen -d -m -S "anonymize_${group}" bin/masquerade run --platform "${MASQUERADE_PLATFORM}" --database "${DATABASE}" --group "${group}"
done
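
A possible variant that skips `screen`, sketched under the assumption that plain background jobs are acceptable: every group starts as a background job, and the script then waits for each one and reports the groups that exited non-zero. `run_group` is a hypothetical stand-in; its body would be replaced by the real `bin/masquerade run ... --group` call from the script above.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the real call, which would look like:
#   bin/masquerade run --platform "$MASQUERADE_PLATFORM" --database "$DATABASE" --group "$1"
# It fails for a group named "broken" so the error path can be seen.
run_group() {
    echo "anonymizing $1"
    [ "$1" != "broken" ]
}

# Start one background job per group, then wait for each job and
# collect the names of the groups whose job exited non-zero.
run_all() {
    local -a pids=() names=()
    local failed="" group i
    for group in "$@"; do
        echo "Starting process for group $group"
        run_group "$group" &
        pids+=("$!")
        names+=("$group")
    done
    for i in "${!pids[@]}"; do
        wait "${pids[$i]}" || failed="${failed} ${names[$i]}"
    done
    echo "Failed groups:${failed:- none}"
    [ -z "$failed" ]
}

run_all customer order
```

Unlike detached `screen` sessions, this blocks until every group has finished, so the exit status of `run_all` can be checked by a calling script.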

peterjaap commented on September 21, 2024

I created a subprocess per group/table combination.

peterjaap commented on September 21, 2024

@tdgroot could you test it?

You can clone the https://github.com/elgentos/masquerade/tree/24-subprocesses branch and run bin/masquerade to test it.

peterjaap commented on September 21, 2024

This is a nicer package with some more options: https://github.com/graze/parallel-process. Using this, we could do something like:

<?php

namespace Elgentos\Masquerade\Commands;

use Graze\ParallelProcess\Event\RunEvent;
use Graze\ParallelProcess\PriorityPool;
use Graze\ParallelProcess\RunInterface;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Process\Process;

class ExampleCommand extends Command
{
    /**
     * @var OutputInterface
     */
    private OutputInterface $output;

    protected function configure()
    {
        $this
            //
            ->addOption('subprocess', 's', InputOption::VALUE_OPTIONAL, 'Whether the command is run as a subprocess', false);
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $this->output = $output;
        if ($input->getOption('subprocess')) {
            sleep(2);
            $output->write(json_encode(['date' => date('d-m-Y H:i:s')]));
            return 0;
        }

        $pool = new PriorityPool();
        $pool->setMaxSimultaneous(5);
        for ($i = 0; $i < $pool->getMaxSimultaneous() * 4; $i++) {
            $pool->add(new Process(['php', 'application.php', '--subprocess=1']));
        }

        array_map([$this, 'addCallback'], $pool->getAll());

        $pool->run();

        return 0;
    }

    public function addCallback(RunInterface $run)
    {
        $run->addListener(
            RunEvent::SUCCESSFUL,
            function (RunEvent $event) {
                $data = json_decode($event->getRun()->getLastMessage(), true);
                $this->output->writeln('The date is ' . $data['date']);
            }
        );
    }
}

peterjaap commented on September 21, 2024

The main problem this issue tried to solve was speed, and that was fixed in version 0.3.0, so I'm closing this issue.
