Comments (13)
Hi, that's a great idea, and it's not difficult to implement. How about something like this?
$config = [
    'adapter' => [
        'config' => [
            'minImageWidth' => 16,
            'minImageHeight' => 16,
            'imagesBlacklist' => [
                'http?://ads.com/*',
                '*banners/*',
            ]
        ]
    ]
];
from embed.
I didn't think about regex, good catch!
But won't regexes be annoying to write, given that special characters like / or ? are used in standard URLs?
Maybe something like this?
$config = [
    'adapter' => [
        'config' => [
            'minImageWidth' => 16,
            'minImageHeight' => 16,
            'imagesBlacklist' => [
                'plain' => [
                    'http://example.com/full/path/to/image.png'
                ],
                'regex' => [
                    'http?://ads.com/*',
                    '*banners/*',
                ]
            ]
        ]
    ]
];
from embed.
Well, that's not really a regex. It's a URL string that can contain two special characters, ? and *, and it's converted to a regex automatically by this function:
https://github.com/oscarotero/Embed/blob/master/src/Url.php#L57
This is already used in many places, for example for oEmbed patterns (https://github.com/oscarotero/Embed/blob/master/src/Providers/OEmbed/Instagram.php#L21), so you can do this:
$config = [
    'adapter' => [
        'config' => [
            'minImageWidth' => 16,
            'minImageHeight' => 16,
            'imagesBlacklist' => [
                'http://example.com/full/path/to/image.png',
                'http?://ads.com/*',
                '*banners/*',
            ]
        ]
    ]
];
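For illustration, the conversion described above might look roughly like the sketch below. It assumes that * stands for one or more characters and that ? makes the preceding character optional. The helper name urlMatchesPattern is hypothetical, and this is not the code from Url.php.

```php
<?php
// Hypothetical helper, not the library's code: it only illustrates the
// assumed wildcard rules ("*" = one or more characters, "?" = the
// preceding character is optional).
function urlMatchesPattern(string $url, string $pattern): bool
{
    // Escape every regex metacharacter; "|" is used as the delimiter.
    $quoted = preg_quote($pattern, '|');

    // Turn the two escaped wildcards back into regex constructs.
    $regex = str_replace(['\\*', '\\?'], ['.+', '?'], $quoted);

    // Anchor the pattern so it must cover the whole URL, case-insensitively.
    return preg_match('|^' . $regex . '$|i', $url) === 1;
}

// A plain URL has no wildcards, so it only matches itself.
$plain = urlMatchesPattern(
    'http://example.com/full/path/to/image.png',
    'http://example.com/full/path/to/image.png'
);

// "*banners/*" matches any URL with "banners/" somewhere in the middle.
$banner = urlMatchesPattern('http://example.com/banners/top.gif', '*banners/*');
```

Under these assumed rules, http?:// would match http:// but not https://; a pattern meant to cover both schemes would be written https?://.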
from embed.
OK, let's do that then! 👍
Do you have time to do it soon? If you want, I can work on a PR this evening.
from embed.
I won't have time today, so pull requests are welcome.
I think the filter can be done in this function https://github.com/oscarotero/Embed/blob/master/src/Adapters/Adapter.php#L305
Thanks 😄
from embed.
Ok I will take a look, thanks.
from embed.
I'm working on it.
If I add this var_dump for testing:
public function getImagesUrls()
{
    $data = Utils::getData($this->providers, 'imagesUrls', $this->request->url);
    var_dump($data);

    return $data;
}
I sometimes get this:
array(1) {
  [0] =>
  array(2) {
    'value' =>
    string(0) ""
    'providers' =>
    array(1) {
      [0] =>
      string(9) "opengraph"
    }
  }
}
Is that normal? What am I supposed to do with that? Nothing?
from embed.
value should contain the image URL, and providers the list of providers that supply this URL (in this case, only opengraph). The value key is empty; what URL are you using for testing?
from embed.
Many URLs give this kind of result. But if I dump the getImages data, the empty entries aren't present.
For example:
string(67) "http://rss.lefigaro.fr/~r/lefigaro/laune/~3/FpJOta0Eow0/story01.htm"
string(13) "getImagesUrls"
array(2) {
  [0] =>
  array(2) {
    'value' =>
    string(0) ""
    'providers' =>
    array(1) {
      [0] =>
      string(9) "opengraph"
    }
  }
  [1] =>
  array(2) {
    'value' =>
    string(82) "http://evene.lefigaro.fr/files/imagecache/celebrity_image_full/celebrity/13658.jpg"
    'providers' =>
    array(1) {
      [0] =>
      string(4) "html"
    }
  }
}
string(9) "getImages"
array(1) {
  [0] =>
  array(6) {
    'value' =>
    string(82) "http://evene.lefigaro.fr/files/imagecache/celebrity_image_full/celebrity/13658.jpg"
    'providers' =>
    array(1) {
      [0] =>
      string(4) "html"
    }
    'width' =>
    int(90)
    'height' =>
    int(90)
    'size' =>
    int(8100)
    'mime' =>
    string(10) "image/jpeg"
  }
}
Maybe they are filtered in getImages? I think I should filter against the blacklist there instead of in getImagesUrls.
from embed.
getImages takes the URLs provided by getImagesUrls, runs a curl process to get the image dimensions and MIME types, and removes the invalid images (URLs that do not exist or do not point to real images). I think it's better to filter the images before this, to avoid unnecessary requests. getImagesUrls should remove the empty values and check whether each value is included in the blacklist.
from embed.
OK, so two goals now for my PR:
- Remove empty URLs
- Remove blacklisted URLs
Both in the getImagesUrls function. Are we good?
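The two goals above could be sketched as a standalone filter over the value/providers entries shown in the earlier dumps. This is only a rough sketch under stated assumptions: the function names filterImagesUrls and matchesAny are hypothetical, the wildcard rules (* as one or more characters, ? as "previous character optional") are assumed rather than taken from the library, and the ads.com URL is made-up sample data.

```php
<?php
// Hypothetical stand-in for the library's pattern matching. Assumed rules:
// "*" = one or more characters, "?" = the preceding character is optional.
function matchesAny(string $url, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        $regex = str_replace(['\\*', '\\?'], ['.+', '?'], preg_quote($pattern, '|'));
        if (preg_match('|^' . $regex . '$|i', $url) === 1) {
            return true;
        }
    }

    return false;
}

// Hypothetical filter: drops entries whose 'value' is empty and entries
// whose 'value' matches a blacklist pattern, then reindexes the result.
function filterImagesUrls(array $entries, array $blacklist): array
{
    $kept = array_filter($entries, function (array $entry) use ($blacklist): bool {
        $url = $entry['value'] ?? '';

        return $url !== '' && !matchesAny($url, $blacklist);
    });

    return array_values($kept);
}

$entries = [
    ['value' => '', 'providers' => ['opengraph']],
    ['value' => 'http://evene.lefigaro.fr/files/imagecache/celebrity_image_full/celebrity/13658.jpg', 'providers' => ['html']],
    ['value' => 'http://ads.com/banners/top.gif', 'providers' => ['html']], // made-up sample
];

$filtered = filterImagesUrls($entries, ['http?://ads.com/*', '*banners/*']);
```

Running the filter before any image is fetched means the empty and blacklisted entries never trigger a request.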
from embed.
👍
from embed.
I'm closing this to continue on #55.
from embed.