Images blacklist (embed, closed)

oscarotero commented on May 18, 2024
Images blacklist

from embed.

Comments (13)

oscarotero commented on May 18, 2024

Hi, that's a great idea, and it's not difficult to implement. How about something like this?

$config = [
    'adapter' => [
        'config' => [
            'minImageWidth' => 16,
            'minImageHeight' => 16,
            'imagesBlacklist' => [
                'http?://ads.com/*',
                '*banners/*',
            ]
        ]
    ]
];

soullivaneuh commented on May 18, 2024

I didn't think about regex, good catch!

But won't regexes be annoying to write, given that special characters like / or ? are common in standard URLs?

Maybe something like this?

$config = [
    'adapter' => [
        'config' => [
            'minImageWidth' => 16,
            'minImageHeight' => 16,
            'imagesBlacklist' => [
                'plain' => [
                    'http://example.com/full/path/to/image.png'
                ],
                'regex' => [
                    'http?://ads.com/*',
                    '*banners/*',
                ]
            ]
        ]
    ]
];

oscarotero commented on May 18, 2024

Well, that's not really a regex. It's a URL string that can contain two special characters, ? and *, and it's converted to a regex automatically by this function:
https://github.com/oscarotero/Embed/blob/master/src/Url.php#L57

This is already used in many places, for example for oembed patterns (https://github.com/oscarotero/Embed/blob/master/src/Providers/OEmbed/Instagram.php#L21), so you can do this:

$config = [
    'adapter' => [
        'config' => [
            'minImageWidth' => 16,
            'minImageHeight' => 16,
            'imagesBlacklist' => [
                'http://example.com/full/path/to/image.png',
                'http?://ads.com/*',
                '*banners/*',
            ]
        ]
    ]
];
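
To make the conversion concrete, here is a minimal sketch of how such a pattern could be turned into a regex. The function names wildcardToRegex and matchesAny are hypothetical, and the semantics (* meaning "one or more characters", ? meaning "the previous character is optional") are an assumption for illustration, not a copy of the code in Url.php.

```php
<?php
// Hypothetical helper: converts a URL pattern with * and ? into a regex.
// Assumed semantics: '*' = one or more characters, '?' = previous char optional.
function wildcardToRegex(string $pattern): string
{
    // Escape all regex metacharacters, including the '|' delimiter used below.
    $quoted = preg_quote($pattern, '|');

    // preg_quote turned '*' into '\*' and '?' into '\?'; restore them as wildcards.
    $regex = str_replace(['\\*', '\\?'], ['.+', '?'], $quoted);

    // Anchor the pattern so it must match the whole URL, case-insensitively.
    return '|^' . $regex . '$|i';
}

// Hypothetical helper: true if the URL matches at least one pattern.
function matchesAny(string $url, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match(wildcardToRegex($pattern), $url) === 1) {
            return true;
        }
    }

    return false;
}
```

Under these assumed semantics, 'http?://ads.com/*' matches http://ads.com/... but not https://ads.com/..., because ? only makes the preceding p optional. A pattern such as 'https?://ads.com/*' would cover both schemes.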

soullivaneuh commented on May 18, 2024

OK, let's do it that way! 👍

Do you have time to do it soon? If you want, I can work on a PR this evening.

oscarotero commented on May 18, 2024

I won't have time today, so pull requests are welcome.
I think the filter can be done in this function https://github.com/oscarotero/Embed/blob/master/src/Adapters/Adapter.php#L305
Thanks 😄

soullivaneuh commented on May 18, 2024

Ok I will take a look, thanks.

soullivaneuh commented on May 18, 2024

I'm working on it.

If I try this var_dump for testing:

    public function getImagesUrls()
    {
        $data = Utils::getData($this->providers, 'imagesUrls', $this->request->url);

        var_dump($data);

        return $data;
    }

I sometimes get this:

array(1) {
  [0] =>
  array(2) {
    'value' =>
    string(0) ""
    'providers' =>
    array(1) {
      [0] =>
      string(9) "opengraph"
    }
  }
}

Is that normal? What am I supposed to do with it? Nothing?

oscarotero commented on May 18, 2024

value should contain the image URL, and providers the list of providers that supplied it (in this case, only opengraph). The value key is empty here. What URL are you using for testing?

soullivaneuh commented on May 18, 2024

Many URLs give this kind of result. But if I dump the getImages data, the empty entries aren't present.

For example:

string(67) "http://rss.lefigaro.fr/~r/lefigaro/laune/~3/FpJOta0Eow0/story01.htm"
string(13) "getImagesUrls"
array(2) {
  [0] =>
  array(2) {
    'value' =>
    string(0) ""
    'providers' =>
    array(1) {
      [0] =>
      string(9) "opengraph"
    }
  }
  [1] =>
  array(2) {
    'value' =>
    string(82) "http://evene.lefigaro.fr/files/imagecache/celebrity_image_full/celebrity/13658.jpg"
    'providers' =>
    array(1) {
      [0] =>
      string(4) "html"
    }
  }
}
string(9) "getImages"
array(1) {
  [0] =>
  array(6) {
    'value' =>
    string(82) "http://evene.lefigaro.fr/files/imagecache/celebrity_image_full/celebrity/13658.jpg"
    'providers' =>
    array(1) {
      [0] =>
      string(4) "html"
    }
    'width' =>
    int(90)
    'height' =>
    int(90)
    'size' =>
    int(8100)
    'mime' =>
    string(10) "image/jpeg"
  }
}

Maybe they are filtered in getImages? I think I should apply the blacklist filter there instead of in getImagesUrls.

oscarotero commented on May 18, 2024

getImages takes the URLs provided by getImagesUrls and makes curl requests to get the image dimensions and MIME types, then removes the invalid images (URLs that don't exist or don't point to real images). I think it's better to filter the images before this, to avoid unnecessary requests. getImagesUrls should remove the empty values and check whether each value is included in the blacklist.

soullivaneuh commented on May 18, 2024

Ok so two goals now for my PR:

  • Remove empty URLs
  • Remove blacklisted URLs

Both in the getImagesUrls function. Are we good?
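
For illustration, the two goals could be sketched roughly as follows. This is not the code from the eventual PR: filterImageUrls is a hypothetical standalone function, the entry shape ('value' plus 'providers') is taken from the var_dump output above, and the wildcard semantics (* as "one or more characters", ? as "previous character optional") are an assumption.

```php
<?php
// Hypothetical sketch: drop entries with an empty URL or a blacklisted URL.
function filterImageUrls(array $images, array $blacklist): array
{
    $kept = [];

    foreach ($images as $image) {
        $url = $image['value'] ?? '';

        // Goal 1: remove empty URLs.
        if ($url === '') {
            continue;
        }

        // Goal 2: remove URLs matching any blacklist pattern.
        foreach ($blacklist as $pattern) {
            // Escape the pattern, then restore '*' and '?' as wildcards (assumed semantics).
            $quoted = preg_quote($pattern, '|');
            $regex = '|^' . str_replace(['\\*', '\\?'], ['.+', '?'], $quoted) . '$|i';

            if (preg_match($regex, $url) === 1) {
                continue 2; // Skip this image and move on to the next one.
            }
        }

        $kept[] = $image;
    }

    return $kept;
}

$images = [
    ['value' => '', 'providers' => ['opengraph']],
    ['value' => 'http://evene.lefigaro.fr/files/imagecache/celebrity_image_full/celebrity/13658.jpg', 'providers' => ['html']],
    ['value' => 'http://ads.com/banner.png', 'providers' => ['html']],
];

// Only the evene.lefigaro.fr entry remains after filtering.
$filtered = filterImageUrls($images, ['http?://ads.com/*', '*banners/*']);
```

Filtering at this stage, before any image is fetched, means blacklisted and empty URLs never trigger a request.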

oscarotero commented on May 18, 2024

👍

soullivaneuh commented on May 18, 2024

I'm closing this to continue in #55.
