
mayoff / arc90labs-readability


Automatically exported from code.google.com/p/arc90labs-readability

ApacheConf 0.02% HTML 2.87% CSS 2.70% PHP 84.31% JavaScript 10.11%

arc90labs-readability's People

Contributors

umbrae


arc90labs-readability's Issues

Cannot process MOSS pages

What steps will reproduce the problem?
1. open www.healthcare4kids.org -> announcements
2. click "readability" tool button
3. nothing appears :-(

What is the expected output? What do you see instead?

rendered page

What version of the product are you using? On what operating system?

IE8 Vista ultimate

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 9 Apr 2009 at 9:46

Cannot Parse Page

What steps will reproduce the problem?
1. Go to http://www.todayonline.com/articles/306771.asp
2. Click on Readability toolbar button
3.

What is the expected output? What do you see instead?
"Sorry, readability was unable to parse this page for content."

What version of the product are you using? On what operating system?
Firefox 3.0 on Mac OSX Tiger

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 11 Mar 2009 at 2:28

Can't read this page in the Readability format

http://www.livemint.com/Articles/PrintArticle.aspx?artid=82A5F85E-1C7F-11DE-94CC-000B5DABF613

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 29 Mar 2009 at 6:10

Article fails in 0.2 but works in 0.1

What steps will reproduce the problem?
1. Visit
http://www.dcexaminer.com/opinion/blogs/beltway-confidential/Limbaugh-My-Ratings-are-Way-Way-Up-40846352.html

Original issue reported on code.google.com by [email protected] on 7 Mar 2009 at 12:30

Introductions don't get parsed

What steps will reproduce the problem?
1. Visit a site with an introduction like
* http://www.br-online.de/wissen/weltraum/weltraumschrott-weltraum-schrott-ID1201709383383.xml
* http://grenzwissenschaft-aktuell.blogspot.com/2009/03/hinweise-auf-eisvulkane-und-wasserozean.html
2. Click on Readability Bookmarklet
3. See that the preface of the page is not there anymore

This happens on almost all pages whose introduction is set in bold.

Original issue reported on code.google.com by [email protected] on 30 Mar 2009 at 3:14

Add support for hAtom microformat

From http://blog.arc90.com/2009/03/shhh_im_trying_to_read.php#comment-37821:
{{{
One question/suggestion: currently, your script looks for the section with
the most paragraph tags, and assumes that is the section to hang onto. That
works fine, but I can see some cases (particularly code-related posts which
may feature a lot of ul's or ol's) where perhaps that may not be entirely
accurate.

How about checking first to see if an article is posted in hAtom format? If
you can locate a div with the class '.hentry' you use that content,
otherwise you can go through and determine which area has the most paragraphs.

A real quick example of how this could work is at:
www.timkadlec.com/readable.js. You can see the changes starting at line 42.
While this wouldn't fix all the remaining 10%, it could certainly help to
close the gap a bit.

Either way, fantastic work on an incredibly useful tool!
}}}

Patch attached. Props Tim Kadlec ( http://www.timkadlec.com/ )
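
The suggestion quoted above can be sketched roughly as follows. This is only an illustration, not the attached patch or Tim Kadlec's script: it works on a hypothetical plain-object tree (`tag`, `className`, `children`) instead of the live DOM, and the names `hasClass`, `walk`, and `pickContent` are made up for the example.

```javascript
// Hypothetical node shape: { tag: "div", className: "hentry", children: [...] }.

// True if the node's class list contains the given class name.
function hasClass(node, name) {
  return (node.className || "").split(/\s+/).includes(name);
}

// Depth-first traversal that calls visit() on every node.
function walk(node, visit) {
  visit(node);
  for (const child of node.children || []) walk(child, visit);
}

// Prefer the first hAtom entry; otherwise fall back to the container
// with the most direct <p> children.
function pickContent(root) {
  let hentry = null;
  let best = null;
  let bestCount = 0;
  walk(root, (node) => {
    if (hentry === null && hasClass(node, "hentry")) hentry = node;
    const count = (node.children || []).filter((c) => c.tag === "p").length;
    if (count > bestCount) {
      best = node;
      bestCount = count;
    }
  });
  return hentry !== null ? hentry : best;
}
```

In a browser the first step would more likely be `document.querySelector(".hentry")`, with the paragraph count kept as the fallback.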

Original issue reported on code.google.com by [email protected] on 5 Mar 2009 at 4:11


Sparknotes.com incompatibility

What steps will reproduce the problem?
1. Go to sparknotes.com
2. Open up a guide
3. Try to use readability

What is the expected output? What do you see instead?
Expected a Readability-formatted page; got the standard Readability error message.

What version of the product are you using? On what operating system?
IE8 on Windows XP

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 11 Mar 2009 at 6:39

No content displayed

What steps will reproduce the problem?
1. http://www.bmp.be/Auteurs/Dirk%20De%20Prins/Dirk%20De%20Prins.html
2. your script
3.

What is the expected output?
DIRK DE PRINS
    Boeken  
        Dirk De Prins (° 1952), gekend van het radioprogramma KOOK (Radio 2) en
hoofdredacteur van het maandblad Ambiance, is met 'Dirk de keukenPrins'
niet aan zijn culinair proefstuk toe. Velen kennen Dirk van het populaire
radioprogramma Bistro & Co … of als 'redder in nood' om je mislukt
kerstmenu toch nog via zijn gekende tips (die je trouwens ook in zijn
nieuwe boek terugvindt) op de radio tot een goed einde te brengen…
Dirk De Prins is kunsthistoricus van beroep en specialiseerde zich in de
iconografie van de Vlaamse kunst van de 16e en 17e eeuw. Hij is al meer dan
20 jaar journalist en is sinds 2000 hoofdredacteur van Ambiance, het
magazine over gastronomie, wijn en toerisme. Hij werkte mee aan het
programma Krokant (einde jaren '80) en specialiseerde zich in de culinaire
geschiedenis. Hij is o.a. auteur van het zeer succesvolle 'Bistro & Co
kookboek' (1994) het prestigieuze boek 'De Belgische Keuken' (samen met
Nest Mertens, auteur van 'Ciao, het beste uit Italië')

What do you see instead?
Untitled Document
Sorry, readability was unable to parse this page for content. If you feel
like it should have been able to, please let us know by submitting an issue.

What version of the product are you using?
Not mentioned

On what operating system?
Linux; openSuse 11

Please provide any additional information below.
Browser: Firefox 3.0.1

Original issue reported on code.google.com by [email protected] on 28 Mar 2009 at 1:59

Sparknotes wouldn't load

What steps will reproduce the problem?
1. http://www.sparknotes.com/poetry/paradiselost/section3.rhtml
2. clicked the bookmarklet
3. got an error message

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 20 Mar 2009 at 4:16

DaringFireball, BBC News and most Wordpress blogs fail

What steps will reproduce the problem?
1. Go to "daringfireball.net"
2. Invoke readability

What is the expected output? What do you see instead?
Expected Daring Fireball rendered in Readability; what I see instead is a random entry from months ago.
The same occurs with WordPress blogs.
The BBC News site shows only usability links, no other data at all.

What version of the product are you using? On what operating system?
Latest version available - Mac OS 10.5

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 5 Mar 2009 at 5:36

HTML cache of PDF files

What steps will reproduce the problem?
1. Open a PDF file that Google's cache has converted to HTML.
2. Try to run the Readability bookmarklet.

What is the expected output? What do you see instead?
I was hoping to see less whitespace, with the text in a regular size
and normal paragraphing. Instead I received the "unable to parse this content"
message.

What version of the product are you using? On what operating system?
Firefox 3.0.7, latest Readability as of the current date (19/3/09), Windows XP
Professional SP3.

Please provide any additional information below.

This is less of a bug and more of an irritating lack of a feature. Sorry if
I have wasted your time.

Original issue reported on code.google.com by [email protected] on 19 Mar 2009 at 5:18

g

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 15 Mar 2009 at 5:47

Only intro isolated

What steps will reproduce the problem?
1. Visit http://is.gd/tCzR
2. Click the Readability bookmarklet
3. Only the intro text is isolated

What is the expected output? What do you see instead?
[see above]

What version of the product are you using? On what operating system?
Mozilla Firefox 3.0.8
Readability bookmarklet [as of today's date]

Original issue reported on code.google.com by pete.fairhurst on 21 Apr 2009 at 8:37

Readability and Lifehacker's Darken bookmarklet are not compatible

What steps will reproduce the problem?
1. Run Readability on an article
2. Darken that page (using Lifehacker's Darken bookmarklet <http://tr.im/jkVT>)

What is the expected output? What do you see instead?
The Readability buttons (refresh, print, etc) are now hidden. Their style is 
overridden by the Darken bookmarklet.

What version of the product are you using? On what operating system?
Google Chrome 2.0.172.6

Original issue reported on code.google.com by [email protected] on 21 Apr 2009 at 4:40

https://share.acrobat.com/adc/adc.do?app=cpdf

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 9 Apr 2009 at 5:14

Suggest users to use Readability on print-friendly pages instead if not working

What steps will reproduce the problem?
1. Visit any page that has multiple pages to it.
2. Click Readability - only page one is cleaned up.

Readability should try to find the link to the print view, probably by
looking for <a>print ...</a> in the markup. A regex is probably ideal here
because the link text varies: "Printer Friendly", "Print", "Print View". The
trick is probably to find <a> tags whose text contains the word "print" and
is no more than 20 characters long ("Printer Friendly" is the longest such
link I can think of).

Also, this doesn't help for sites that use JS or some other way to launch
Print View (the majority don't).
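
A rough sketch of the link search described above, assuming the page markup is available as a string. `findPrintLink` is a hypothetical helper, not part of Readability; the 20-character limit comes from the report. A regex over HTML is fragile, so a DOM walk would probably be sturdier in practice.

```javascript
// Return the href of the first short link whose text contains a word
// starting with "print", or null if there is none.
function findPrintLink(html) {
  // Capture group 1: href value. Capture group 2: everything up to </a>.
  const anchor = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
  let match;
  while ((match = anchor.exec(html)) !== null) {
    // Drop nested tags such as <span> so only the visible text is measured.
    const text = match[2].replace(/<[^>]*>/g, "").trim();
    if (text.length <= 20 && /\bprint/i.test(text)) return match[1];
  }
  return null;
}
```

As the report notes, this finds nothing on sites that open their print view from JavaScript.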


Original issue reported on code.google.com by [email protected] on 7 Mar 2009 at 12:41

Text parsing doesn't work on www.americanthinker.com

What steps will reproduce the problem?
1. http://www.americanthinker.com/2009/03/who_is_barney_frank.html
2.
3.

What is the expected output? What do you see instead?

Text article.

What version of the product are you using? On what operating system?


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 12 Mar 2009 at 5:26

couldn't read page on pharmatimes

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 17 Mar 2009 at 1:31

Discussion boards and Readability

What steps will reproduce the problem?
1. Try to use Readability on a discussion board,
e.g. one powered by vBulletin.
2. Threads are not converted to a nice readable
column of posts.

What is the expected output? What do you see instead?
Posts in a thread should be displayed in a clean
layout, but I see a page with a link to this very page.

What version of the product are you using? On what operating system?
Latest version on XP.

Please provide any additional information below.

I absolutely love your project and appreciate your 
work. Thanks !!! :-)

Original issue reported on code.google.com by [email protected] on 28 Mar 2009 at 10:19

CNN.com article doesn't parse

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 27 Mar 2009 at 8:29

Boing Boing fail on Mac OS 10.5.6 / Firefox 3.0.8

What steps will reproduce the problem?
1. Install Readability
2. Use the Readability bookmarklet on boingboing.net
3. The output of the page re-render contains only one Boing Boing post.

What is the expected output? What do you see instead?

I expect to see all of the front-page posts. Instead, I only see one. It is
neither the first nor last post.

What version of the product are you using? On what operating system?
I don't know which version of Readability I'm using; I downloaded it at 2:00pm
CST, 04/17/2009. I'm using Firefox 3.0.8 on Mac OS 10.5.6.


Please provide any additional information below.

I did a search for boingboing and nothing popped up. 

Original issue reported on code.google.com by [email protected] on 17 Apr 2009 at 7:03

Readability doesn't capture all page content

What steps will reproduce the problem?
1.  Load site http://shadow.foreignpolicy.com/ 
2. Click "Readability" bookmarklet
3.

What is the expected output? What do you see instead?
The home page of this site is a collection of blog articles. I expected to
see all of the articles from the home page in order. What I see instead is
a single article from the middle of the set (but in the correct readability
format). 

What version of the product are you using? On what operating system?
Latest online version on Win XP /Firefox 3.0.8

Please provide any additional information below.

thanks for the tool; it's very useful!

Original issue reported on code.google.com by [email protected] on 6 Apr 2009 at 2:41

Blog not parsed

What steps will reproduce the problem?
1. Open http://www.managersonline.nl/vaknieuws/281/drie-omstandigheden-frustreren-besluitvorming.html
2. Open Readability bookmarklet
3. Error parsing appears

What is the expected output? What do you see instead?
Readable page since this is a blog

What version of the product are you using? On what operating system?
Internet Explorer 7

Please provide any additional information below.


Original issue reported on code.google.com by mackaaij on 23 Mar 2009 at 6:32

Get rid of all the <br/> tags

What steps will reproduce the problem?
1. Visit http://www.bloomberg.com/apps/news?pid=20601091&sid=ayVsI_55G8.0


Note the gap at the top of the page, caused by <br/> tags.
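
One way to handle this, sketched under the assumption that the extracted content is available as an HTML string: drop any breaks at the very start and collapse longer runs into a single break. `tidyBreaks` is a hypothetical name, and this is not necessarily how Readability itself deals with `<br/>` tags.

```javascript
// Remove leading <br> tags and collapse runs of two or more into one.
function tidyBreaks(html) {
  return html
    // Leading breaks (with any whitespace around them) cause the gap up top.
    .replace(/^(?:\s*<br\s*\/?>)+\s*/i, "")
    // Runs of consecutive breaks elsewhere become a single break.
    .replace(/(?:<br\s*\/?>\s*){2,}/gi, "<br/>");
}
```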

Original issue reported on code.google.com by [email protected] on 7 Mar 2009 at 12:35

Readability fails when trying to parse a poetry site

What steps will reproduce the problem?
1. go to http://www.sparknotes.com/poetry/coleridge/section3.rhtml
2. click readability shortcut
3. fail :(

What is the expected output? What do you see instead?
wanted to see the poem without the ads
ended up with an error and a link to this feedback form

What version of the product are you using? On what operating system?
Firefox 3.0.5 mac OSX 10.5.6

Please provide any additional information below.
I like the tool and thought that it could be used for something other than
news (i.e. poetry sites)

Original issue reported on code.google.com by [email protected] on 18 Mar 2009 at 3:34

thetailsection.com shows comments not article

What steps will reproduce the problem?
1. http://www.thetailsection.com/index.php/archives/2009/03/05/fan-theory-pull-lever-in-case-of-paradox/
2. Open Readability


What is the expected output? What do you see instead?
On some blogs where the comments contain more paragraphs than the main
article, Readability displays the comments instead of the main article.
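
One possible mitigation, sketched here as an assumption and not taken from the Readability source: when ranking candidate containers by paragraph count, give no credit to containers whose `id` or `class` looks comment-related. The candidate shape, the `COMMENT_HINT` pattern, and the function names are all hypothetical.

```javascript
// Hypothetical candidate shape: { id, className, children: [{ tag: "p" }, ...] }.

// Labels that suggest a container holds reader comments (an assumed list).
const COMMENT_HINT = /comment|disqus|reply|respond/i;

// Score a container by its direct <p> children, or 0 if it looks like comments.
function scoreCandidate(node) {
  const label = `${node.id || ""} ${node.className || ""}`;
  if (COMMENT_HINT.test(label)) return 0;
  return (node.children || []).filter((c) => c.tag === "p").length;
}

// Return the highest-scoring candidate, or null if none scores above 0.
function pickArticle(candidates) {
  let best = null;
  let bestScore = 0;
  for (const node of candidates) {
    const score = scoreCandidate(node);
    if (score > bestScore) {
      best = node;
      bestScore = score;
    }
  }
  return best;
}
```

This only helps when the comment block is labelled in a recognisable way; unlabelled comment markup would still win on paragraph count.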


What version of the product are you using? On what operating system?
0.1
Mac OS 10.5.6 on Safari 4

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 9 Mar 2009 at 5:44

Style attribute somehow survives

What steps will reproduce the problem?
1. Visit
http://dyn.politico.com/printstory.cfm?uuid=DDF12EFA-18FE-70B2-A878D1086B3EE9DB

Style attribute still in place.
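
For illustration, inline styles could be stripped from an HTML string along these lines. `stripStyleAttributes` is a hypothetical helper and only a sketch; it removes one `style` attribute per tag and does not touch `<style>` blocks or presentational attributes such as `align`.

```javascript
// Remove the style attribute (double-quoted, single-quoted, or unquoted)
// from every opening tag in the given HTML string.
function stripStyleAttributes(html) {
  return html.replace(
    /(<[a-z][^>]*?)\s+style\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/gi,
    "$1"
  );
}
```

With a live DOM, the equivalent would be calling `removeAttribute("style")` on each element.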

Original issue reported on code.google.com by [email protected] on 7 Mar 2009 at 12:28

Support for video?

Would it be possible to use readability to hide everything except videos on 
websites?
Pretty much the same thing it does for text, but for video websites?

YouTube, for example, has an id="watch-this-vid" and it seems like the video is 
in that div.  

Thanks for the great work.

Original issue reported on code.google.com by [email protected] on 22 Apr 2009 at 12:56

Failed to display content (cut a chunk out)

What steps will reproduce the problem?
1. Visit
http://www.stevenberlinjohnson.com/2009/03/the-following-is-a-speech-i-gave-yesterday-at-the-south-by-southwest-interactive-festival-in-austiniif-you-happened-to-being.html
2. Click Readability bookmarklet

What is the expected output? What do you see instead?

The first few paragraphs are cut out. This is because there are two div
containers: one housing the excerpt and another for the remaining content.
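
A sketch of one way to keep both containers, assuming the candidates are sibling `div`s under a common parent: keep every sibling that holds a reasonable share of the paragraphs found in the best one, in document order. The node shape, the `gatherContent` name, and the 20% default ratio are assumptions made for the example, not Readability's behavior.

```javascript
// Hypothetical node shape: { tag, children }.

// Count the direct <p> children of a node.
function directParagraphs(node) {
  return (node.children || []).filter((c) => c.tag === "p").length;
}

// Return all sibling <div>s whose paragraph count is at least `ratio`
// times the best sibling's count (and at least 1), in document order.
function gatherContent(parent, ratio = 0.2) {
  const containers = (parent.children || []).filter((c) => c.tag === "div");
  const counts = containers.map(directParagraphs);
  const best = Math.max(0, ...counts);
  if (best === 0) return [];
  const cutoff = Math.max(1, best * ratio);
  return containers.filter((c, i) => counts[i] >= cutoff);
}
```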

Please use labels and text to provide additional information.


Original issue reported on code.google.com by [email protected] on 18 Mar 2009 at 6:36

Newspaper "Hurriyet", from Turkey

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 2 Apr 2009 at 3:33

Just strip font styling

Thank you, arc90, for this great bookmarklet!

It would be much more useful to me if there were an option to just remove
font-size and font-family styling, instead of assigning new styles from a
few options.

That way I could read pages with my browser's default font settings.
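
As a sketch of the requested option, the font declarations could be filtered out of an inline style string while everything else is left alone. `dropFontDeclarations` is a hypothetical helper; also dropping the `font` shorthand is an assumption on top of the request, since the shorthand sets size and family too. The naive split on `;` would mishandle values that themselves contain semicolons.

```javascript
// Remove font-size, font-family, and the font shorthand from an inline
// style string, keeping all other declarations.
function dropFontDeclarations(style) {
  return style
    .split(";")
    .map((decl) => decl.trim())
    .filter((decl) => decl !== "" && !/^font(-size|-family)?\s*:/i.test(decl))
    .join("; ");
}
```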

Original issue reported on code.google.com by felipevaz on 9 Apr 2009 at 5:38

doesn't parse page

What steps will reproduce the problem?
1. call bookmarklet on page http://www.lib.ru/STRUGACKIE/glubokii.txt

Original issue reported on code.google.com by [email protected] on 13 Mar 2009 at 2:46

  • Merged into: #29

google news

What steps will reproduce the problem?
1. go to news.google.com
2. go to text-only page
3. that's it

What is the expected output? What do you see instead?

just got a could not parse error, nothing else

What version of the product are you using? On what operating system?

FF3/OSX 10.5 newest version as of today (3/28)

Please provide any additional information below.

thanks!

Original issue reported on code.google.com by [email protected] on 28 Mar 2009 at 9:03

Readability broken for heise.de

What steps will reproduce the problem?
1. go to www.heise.de
2. press readability bookmarklet

What is the expected output? What do you see instead?
expected: readability-enhanced version of the web page
actual:  readability-enhanced version of the first advertisement on the web page

What version of the product are you using? On what operating system?
Safari 4 beta on mac os x 10.5.6

Original issue reported on code.google.com by [email protected] on 20 Mar 2009 at 8:22

Remove <font> tags

What steps will reproduce the problem?
1. Visit http://www.cla.wayne.edu/polisci/kdk/general/sources/zinsser.htm
2. Click Readability
3. <font> tags survive, thus allowing old styles to persist.
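
For illustration, `<font>` tags could be unwrapped from an HTML string like this, keeping their contents. `unwrapFontTags` is a hypothetical helper, not Readability's own code.

```javascript
// Delete opening and closing <font> tags but keep whatever they wrap.
function unwrapFontTags(html) {
  return html.replace(/<\/?font\b[^>]*>/gi, "");
}
```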



Original issue reported on code.google.com by [email protected] on 7 Mar 2009 at 12:17

Doesn't work on Sparknotes

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?
Expected a readable view of the chapter, but instead got the message that 
the page couldn't be parsed.

What version of the product are you using? On what operating system?
I'm on IE7.

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 1 Apr 2009 at 10:11

Fails to parse PRE formatted content (especially in TXT files but containing HTML)

What steps will reproduce the problem?
1. Point your browser to http://www.lib.ru/RUFANT/SEMENOWA/wolkodaw4.txt
2. Click the bookmarklet invoking Readability.

Expected to see the book styled for reading. Got a message saying it's not
possible, and that I should leave an issue.

MacOS X 10.5.6 and Safari 4 beta.

I think Readability should style the page even if it can't format it. That
way you get some of the benefits, at least.
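
A minimal sketch of such a fallback, assuming the page is preformatted text whose paragraphs are separated by blank lines: split on blank lines and wrap each chunk in `<p>` so the normal Readability styling could apply. The function names are hypothetical, and the text is treated as plain text, so any HTML embedded in the file is escaped instead of rendered.

```javascript
// Escape the characters that are significant in HTML text content.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Turn blank-line-separated plain text into a series of <p> elements.
function preTextToParagraphs(text) {
  return text
    .split(/\n\s*\n/)                                   // blank lines separate paragraphs
    .map((block) => block.replace(/\s+/g, " ").trim())  // unwrap hard line breaks
    .filter((block) => block.length > 0)
    .map((block) => `<p>${escapeHtml(block)}</p>`)
    .join("\n");
}
```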

Original issue reported on code.google.com by ycherkashin on 22 Mar 2009 at 2:34

gameFAQs guides

What steps will reproduce the problem?
1. Go into a guide such as
http://www.gamefaqs.com/console/psx/file/198265/50263
2. Press the Readability button
3. It won't display at all

What is the expected output? What do you see instead?
I expected to see it with the desired Readability settings; I just got an
error message saying it couldn't be displayed.

What version of the product are you using? On what operating system?
I was using the one from this link:
http://lab.arc90.com/experiments/readability/

Windows Vista


Please provide any additional information below.

Not much to say, that's about all there is to it.


Original issue reported on code.google.com by [email protected] on 18 Apr 2009 at 1:00

readability does not 'grab' the text on a page

What steps will reproduce the problem?
1. Run the bookmarklet on this page: 
http://www.sparknotes.com/philosophy/disciplinepunish/section1.html
2.
3.

What is the expected output? What do you see instead?
I get an error directing me here. I expected to see the text.

What version of the product are you using? On what operating system?
I am using Safari 4 and the newest webkit release.

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 5 Apr 2009 at 10:08

Remove comments from readability page

What steps will reproduce the problem?
1. Click Readability bookmarklet on this page http://tinyurl.com/d95r4s
2. User comments appear after the article.

What is the expected output? What do you see instead?
I don't really care to read the user comments most of the time. An option
to toggle them on or off would be nice, but not necessary since they can be
accessed by reloading the page.

Original issue reported on code.google.com by [email protected] on 24 Mar 2009 at 7:16

Content not shown

What steps will reproduce the problem?
1. Visit http://www.paulgraham.com/13sentences.html
2. Invoke Readability

Readability only shows the title, the (site) logo image and the Readability
footer. The content isn't shown.

What version of the product are you using? On what operating system?
Firefox 3.06, on Windows Vista.

Original issue reported on code.google.com by [email protected] on 5 Mar 2009 at 8:52
