
Research EXIF to inform whether browsers should honor it without opt-in (legacy.httparchive.org, closed)

httparchive commented on May 27, 2024

Research EXIF to inform whether browsers should honor it without opt-in

from legacy.httparchive.org.

Comments (5)

yoavweiss commented on May 27, 2024

@zcorpan - Not sure what you want the project to do here.
The HTTP Archive crawls and gathers raw data; it doesn't do any data analysis in particular. It stores response bodies, but only for HTML/text AFAIK; image response bodies, for example, are not stored.

We can achieve what you're after by downloading the bulk request data, filtering it for image requests, downloading those images, and then running analysis on them.
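As a rough illustration of the filter-then-analyze steps, here is a minimal Python sketch. It assumes each request record is a dict with a `mime_type` key (a hypothetical field name, not the actual HTTP Archive schema) and that the image bytes have already been downloaded separately. It only detects an EXIF APP1 segment in JPEG data; other formats and the actual orientation values are out of scope.

```python
import struct


def is_image_request(record):
    """Return True if a request record looks like an image request.

    `record` is a hypothetical dict; the `mime_type` key is an assumption.
    """
    return str(record.get("mime_type", "")).lower().startswith("image/")


def has_exif(data):
    """Return True if JPEG bytes contain an APP1 segment tagged as EXIF."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False  # not at a marker boundary; give up
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # start of scan or end of image
            return False
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            return False
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False
```

Running `has_exif` over every downloaded body whose record passes `is_image_request` would give a first count of images that carry EXIF data at all.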


andydavies commented on May 27, 2024

WebPagetest already does some image analysis; it might be possible to extend that to produce the relevant data.
(Sent by email on 8 Aug 2014 08:36, in reply to Yoav Weiss's comment above.)


pmeenan commented on May 27, 2024

There are very few (no?) native photos in the websites that we test in the HTTP Archive. The pages are all landing pages for the Alexa top 300k sites, and actually having full-resolution photos on them would be a bad idea. The vast majority of the images are expected to be post-processed, recompressed, etc. (and get flagged if they are not).

I'm not sure what the use case is for EXIF support in the browser, but odds are you're going to have to do a custom crawl or study of some kind to find the photos you are looking for. Actual photo sites like Flickr, Google+, Facebook, etc. all do a bunch of processing as well for presentation in the browser, though you can sometimes get through to the original raw image, and I expect those are the ones you are looking for.


yoavweiss commented on May 27, 2024

It's true that EXIF images are likely to have a much larger presence on long-tail websites than on Alexa's landing pages.
Maybe it's best to try to get data on that using telemetry/use counters.

Regardless, how hard would it be to start storing image response bodies? With that we could get arbitrary stats like that (even if biased towards Alexa sites) by processing the image data itself.


pmeenan commented on May 27, 2024

We store text response bodies, but storing full images is not something we'd be able to do without some serious benefit and justification (it would add about 400 GB per crawl to the data we store).

