bccampus / pressbooks-openstax-import
[UNMAINTAINED] Pressbooks Plugin for OpenStax Textbook Import
License: GNU General Public License v3.0
Some zip files do not contain index.cnxml.html files - they only have index.cnxml
Expected behaviour
While not as straightforward, the Importer should at least try to bring something in from the xml file.
Actual behaviour
An exception is thrown and it imports a blank page.
try importing any one (or more) chapters from https://cnx.org/contents/[email protected]:YzfkjC2r@17/
returns boolean, must check against what the user selected for import
OpenStax Import plugin times out (error HTTP 500) when trying to import
Expected behaviour
A book can be imported using the link to the zip file from OpenStax. Here is the link address used
https://cnx.org/exports/[email protected]/elementary-algebra-2.49.zip
The book to be imported into the following Pressbook https://pressbooks.bccampus.ca/capilanosandbox/
Actual behaviour
The process timed out and we received the following error: HTTP ERROR 500
Book ID: 349
Book URL: https://pressbooks.bccampus.ca/capilanosandbox/
Book Privacy: Public
Platform: OS X
Browser Name: Chrome
Browser Version: 65.0.3325.162
User Agent String: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36
Network URL: http://pressbooks.bccampus.ca/
Network Type: Subdirectory
Version: 4.9.4
Language: en_US
WP_ENV: Not set
WP_DEBUG: Enabled
Memory Limit: 64M
Version: 5.1.0
Book Theme: Open Textbooks
Book Theme Version: 2.1.1
Root Theme: Aldine
Root Theme Version: 1.1.0
Epubcheck: Installed
Kindlegen: Installed
xmllint: Installed
PrinceXML: Installed
Saxon-HE: Installed
hm-autoloader.php: n/a
BC Post-Secondary Validator: 1.0.0
CC Export for Pressbooks: 0.2.1
f5 Force SSL: 1.0.0
iThemes Security: 6.9.2
Openstax Import for Pressbooks: 1.0.1
Pressbooks: 5.1.0
Pressbooks Stats: 1.4.0
Textbooks for Pressbooks: 4.0.2
WP-Piwik: 1.0.19
Akismet Anti-Spam: 4.0.3
BuddyPress: 2.9.3
H5P: 1.10.1
mPDF for Pressbooks: 3.1.1
WP QuickLaTeX: 3.8.4
PHP Version: 7.1.15
MySQL Version: 5.5.5
Webserver Info: Apache
Safe Mode: Disabled
Memory Limit: 512M
Upload Max Size: 100M
Post Max Size: 100M
Upload Max Filesize: 100M
Time Limit: 60
Max Input Vars: 1000
URL-aware fopen: On (1)
Display Errors: N/A
OPcache: Disabled
XDebug: Disabled
cURL: Supported
cURL Version: 7.19.7
imagick: Not Installed
xsl: Installed
for contextual learning, or just in time learning, add meaningful notification messages at the time of import that:
screenshots are out of date, formatting of some text is misaligned.
Expected behaviour
up to date screenshots (pb5) - clean text
Actual behaviour
the opposite
go here: https://wordpress.org/plugins/presbooks-openstax-import/
transferring large files over http requires a lot of server resources. Will look at chunking in order to get around this limitation.
Using the filter pb_import_file_types at https://github.com/pressbooks/pressbooks/blob/dev/inc/modules/import/class-import.php#L324, add application/zip to the list.
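A minimal sketch of a callback for the filter above. It assumes the filter passes an array keyed by file extension with MIME types as values; that array shape and the function name `poi_add_zip_file_type` are assumptions for illustration, so check them against the linked Pressbooks source before relying on them.

```php
<?php
/**
 * Adds zip archives to the list of importable file types.
 * ASSUMPTION: $types maps extension => MIME type.
 *
 * @param array $types
 * @return array
 */
function poi_add_zip_file_type( array $types ): array {
	$types['zip'] = 'application/zip';
	return $types;
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'pb_import_file_types', 'poi_add_zip_file_type' );
}
```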
Similar to https://github.com/pressbooks/pressbooks/blob/dev/inc/modules/import/epub/class-epub201.php#L176, this function needs to iterate through the manifest col:content in collection.xml and, for entries that are flagged for import, call kneadAndInsert.
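The iteration described above could look roughly like the sketch below. The `col` namespace URI, the `col:module` element, its `document` attribute, and the function name `poi_selected_modules` are assumptions about the collection.xml layout and are not confirmed by this issue. The real importer would call kneadAndInsert where the comment indicates.

```php
<?php
/**
 * Returns the ids of modules listed in collection.xml that the user selected.
 * ASSUMPTIONS: collxml namespace URI, col:module elements, `document` attribute.
 *
 * @param string $collection_xml Raw contents of collection.xml
 * @param array  $selected_ids   Module ids flagged for import
 * @return array
 */
function poi_selected_modules( string $collection_xml, array $selected_ids ): array {
	$xml = simplexml_load_string( $collection_xml );
	if ( false === $xml ) {
		return [];
	}
	$xml->registerXPathNamespace( 'col', 'http://cnx.rice.edu/collxml' );
	$modules = $xml->xpath( '//col:content//col:module' ) ?: [];

	$found = [];
	foreach ( $modules as $module ) {
		$id = (string) $module['document'];
		if ( in_array( $id, $selected_ids, true ) ) {
			// The importer would call kneadAndInsert for this module here.
			$found[] = $id;
		}
	}
	return $found;
}
```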
Importing with the OpenStax plugin - image captions appear as paragraph text above the picture rather than as caption text below the picture. This seems to be related to CSS from the original OpenStax text. When imported from OpenStax the caption code looks like this
must:
collection.xml file, get content from each index.cnxml.html file
collection.xml file
An arbitrary value of [link] is given to all links to Figures. It would be more descriptive if it could be changed to (Figure).
Expected behaviour
links to Figures would be labelled semantically
Actual behaviour
labelling is vague and unhelpful
Import any book with anchor links to Figures
Using the filter pb_initialize_import at https://github.com/pressbooks/pressbooks/blob/dev/inc/modules/import/class-import.php#L372, add the Cnx class to the list of options.
"cnx_astro" appears in the table of contents under each chapter name
cc-by appears beside each chapter in the table of contents
Bringing in the chapter metadata puts this data into the TOC in the PDF output. The more common use case is to transfer the burden of attribution to the author, rather than assuming it programmatically.
Certification server is currently open to the world presenting a large attack surface, especially with over 140 user accounts. This can be mitigated by locking down access to BCcampus IP addresses as is the case already with other certification servers. Users of the site would access the servers via VPN or by being in either one of the offices.
Expected behaviour
I would expect servers that we are working on for internal purposes are available only to our internal network.
Actual behaviour
It's exposed to the world.
Importing this book: https://cnx.org/contents/[email protected]:wsOQ6HtH@8/Preface-to-Pfeiffer-Applied-Pr
There's some weird behaviour in terms of line breaks being added within paragraphs.
Expected behaviour
Markup should be relatively consistent.
Actual behaviour
Line breaks are added.
Before:
<p id="id84958">This is a "first course" in the sense that it presumes no previous course in probability. The units are
modules taken from the unpublished text: Paul E. Pfeiffer, ELEMENTS OF APPLIED PROBABILITY,
USING MATLAB. The units are numbered as they appear in the text, although of course they may
be used in any desired order. For those who wish to use the order of the text, an outline is
provided, with indication of which modules contain the material.</p>
(Note that there appear to be some line breaks within the paragraph content, but they aren't rendered because they aren't actually <br /> tags.)
After:
<p id="id84958">This is a “first course” in the sense that it presumes no previous course in probability. The units are<br>
modules taken from the unpublished text: Paul E. Pfeiffer, ELEMENTS OF APPLIED PROBABILITY,<br>
USING MATLAB. The units are numbered as they appear in the text, although of course they may<br>
be used in any desired order. For those who wish to use the order of the text, an outline is<br>
provided, with indication of which modules contain the material.</p>
I expect this is some ghastly wpautop() behaviour.
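If the extra `<br>` tags do come from a wpautop()-style pass turning newlines into breaks, one possible workaround is to collapse the newlines inside each paragraph before the HTML is inserted. The sketch below is an illustration under that assumption; the function name `poi_collapse_paragraph_newlines` is hypothetical and this is not the plugin's actual fix.

```php
<?php
/**
 * Collapses line breaks inside <p> elements into single spaces, so that a
 * later newline-to-<br> pass finds no newlines within paragraph content.
 * Newlines between paragraphs are left untouched.
 *
 * @param string $html
 * @return string
 */
function poi_collapse_paragraph_newlines( string $html ): string {
	$result = preg_replace_callback(
		'#<p\b[^>]*>.*?</p>#is',
		static function ( array $m ): string {
			// \R matches any newline sequence (\n, \r\n, \r).
			return preg_replace( '/\s*\R\s*/', ' ', $m[0] );
		},
		$html
	);
	// preg_replace_callback returns null on error; fall back to the input.
	return $result ?? $html;
}
```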
evaluate the work already done to incorporate candela attributions plugin to determine if this is a good place to start or if something else needs to be built.
primary responsibility is to insert filtered and prepped HTML into the WP database, similar to https://github.com/pressbooks/pressbooks/blob/dev/inc/modules/import/epub/class-epub201.php#L292
currently it asks users to network activate. This can cause problems for collections of books that were created using PB Latex shortcode.
I'm getting timeout error as shown below.
[10-Nov-2017 16:39:37 UTC] \Pressbooks\Modules\Import::formSubmit html import error, wp_remote_head()cURL error 28: Operation timed out after 5001 milliseconds with 0 bytes received
It sometimes gives me HTTP 500 error.
I have my .htaccess settings to
php_value upload_max_filesize 192M
php_value post_max_size 192M
php_value max_execution_time 1000
php_value max_input_time 1000
So I don't know why it's timing out.
I'm using this book: https://cnx.org/exports/[email protected]/college-algebra-5.57.zip
Expected behaviour
Should import.
Actual behaviour
Should import, but it loads, hangs, and eventually gives me the mentioned error above. It also sometimes gives me HTTP 500 error.
I'm using Ubuntu 16.04, I have pressbooks fully installed with themes and plugins.
Using this url book https://cnx.org/exports/[email protected]/college-algebra-5.57.zip
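The log line reports a timeout after 5001 milliseconds, which is consistent with the default 5-second timeout of the WordPress HTTP API rather than with the PHP limits set in .htaccess. A sketch of raising that timeout through the `http_request_timeout` filter follows; the 60-second value and the function name `poi_extend_http_timeout` are arbitrary choices for illustration.

```php
<?php
/**
 * Ensures WordPress HTTP requests are allowed at least 60 seconds.
 * The 60-second floor is an example value, not a recommendation.
 *
 * @param int|float $seconds Timeout currently in effect
 * @return int
 */
function poi_extend_http_timeout( $seconds ) {
	return max( (int) $seconds, 60 );
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'http_request_timeout', 'poi_extend_http_timeout' );
}
```

Note that this filter applies to every request made through the WordPress HTTP API, not only to the import, so a narrower condition may be preferable in practice.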
When importing a given OpenStax book, all chapters are identified as back matter in the importer UI.
Expected behaviour
Front matter, chapters, and back matter should be correctly identified as such.
Actual behaviour
Chapters are identified as back matter.
Import this book via URL: https://cnx.org/exports/[email protected]/bitcoin-1.1.zip
Observe that chapters are identified as back matter.
Installing pressbooks-openstax-import using composer causes a white screen. The bug is in collizo4sky/persist-admin-notices-dismissal. More info: w3guy/persist-admin-notices-dismissal#14
Expected behaviour
WordPress site loads.
Actual behaviour
White screen.
Images appear smaller and left rather than center justified
Using the filter pb_initialize_import at https://github.com/pressbooks/pressbooks/blob/dev/inc/modules/import/class-import.php#L296, add the Cnx class to the list of import options.
College Physics has Appendix pages and Preface pages that are not included on import.
Expected behaviour
All pages come in
Actual behaviour
Exclusion.
Import https://cnx.org/contents/[email protected]:pFeekPiU@17/Preface and look for Glossary, Preface, Atomic Masses, Selected Radioactive Isotopes
based on #53 implement a solution.
Updating OpenStax Import Plugin requires testing it on the latest version of:
similar to https://github.com/pressbooks/pressbooks/blob/dev/inc/modules/import/epub/class-epub201.php#L149 this will look for and return relevant metadata. (Needs to be verified what is relevant: author, contributor, date created, language, etc)
Using the filter pb_select_import_type, defined at https://github.com/pressbooks/pressbooks/blob/dev/templates/admin/import.php#L19, add our import type to the dropdown of options.
after import provide meaningful error messages that the user can provide to a server administrator:
the objective is to provide enough information to the user that they can pass on to someone else for a remedy
Importing an OpenStax file with WP_DEBUG enabled reveals a PHP notice during the import routine:
Notice: Undefined variable: organizations in pressbooks-openstax-import/inc/import/openstax/class-cnx.php on line 308
Expected behaviour
No PHP notices.
Actual behaviour
Undefined variable.
Import this file: https://cnx.org/exports/[email protected]/bitcoin-1.1.zip
Large OpenStax books present memory challenges for PHP. If the server configuration is insufficiently resourced, the import will fail. Adjusting this value dynamically could diminish the likelihood of this happening. Values to target:
post_max_size
upload_max_filesize
Needed to get past https://github.com/pressbooks/pressbooks/blob/dev/inc/modules/import/class-import.php#L510
Perhaps also look at:
memory_limit
Expected behaviour
Some assurance of reliability for importing large files from URL.
Actual behaviour
Fatal error: Allowed memory size exhausted
https://cnx.org/exports/[email protected]/elementary-algebra-2.49.zip
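Of the values mentioned above, only memory_limit can be changed at runtime with ini_set(); post_max_size and the upload size limit are per-directory settings and have to be set in php.ini, .htaccess, or the server configuration. A sketch of raising memory_limit when it is lower than a target follows. The helper names `poi_ini_bytes` and `poi_raise_memory_limit` and the 512M default are assumptions for illustration.

```php
<?php
/**
 * Converts a php.ini shorthand value such as "64M" to bytes.
 *
 * @param string $value
 * @return int
 */
function poi_ini_bytes( string $value ): int {
	$value  = trim( $value );
	$number = (int) $value;
	switch ( strtolower( substr( $value, -1 ) ) ) {
		case 'g':
			return $number * 1024 ** 3;
		case 'm':
			return $number * 1024 ** 2;
		case 'k':
			return $number * 1024;
	}
	return $number;
}

/**
 * Raises memory_limit to $target if the current limit is lower.
 *
 * @param string $target Example default; tune to the server.
 * @return bool True when the limit is sufficient afterwards.
 */
function poi_raise_memory_limit( string $target = '512M' ): bool {
	$current = poi_ini_bytes( (string) ini_get( 'memory_limit' ) );
	if ( $current < 0 || $current >= poi_ini_bytes( $target ) ) {
		return true; // A negative value (-1) means no limit.
	}
	return false !== ini_set( 'memory_limit', $target );
}
```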
see #50
the mathml to LaTeX conversion requires a LaTeX plugin to be available in the PB instance. While pb-latex is part of core, quick latex has better support for multiline equations and produces svg images that do not have rendering problems upon export.
The activation of this plugin should check for the existence and activation of quick latex plugin. Should provide messaging/opportunity to download the plugin, or activate the plugin.
Images imported from OpenStax are aligned "left"; in the original OpenStax text the images were aligned "center".
requires some investigation into whether injecting html attributes is too prescriptive.
must check that:
collection.xml file
collection.xml file
As a branding/compliance/best-practice consideration, we must update the display names of our plugins in order to clarify the distinction between plugins maintained by the Pressbooks organization vs. us. This follows the naming pattern required by guideline 17 (link above)
As a user, I see a notice in the admin interface stating:
Your Network Administrator has made WP QuickLaTeX available to you from your plugins menu. WP QuickLaTeX supports multiline equations, and svg image exports.
I can click the [x] to dismiss this notice, but when I navigate to a new screen it reappears.
Expected behaviour
The dismissal of the notice persists (it stays gone).
Actual behaviour
It returns.
The notice is back.
equations are no longer centred
large error message in Chapter 5.6 (Radiation and Spectra > The Doppler Effect) in "The Doppler Effect" textbox (see attached image "astr.JPG")
an equation is missing a part: see the screenshots showing how the equation appears in Pressbooks dev and how it appears in the OpenStax book
Equations are missing symbols. For example, Δθ = Δs/r in OpenStax is rendered with symbols missing in Pressbooks
original:
https://cnx.org/contents/[email protected]:RCFtg_5-@7/Rotation-Angle-and-Angular-Vel
similar to https://github.com/pressbooks/pressbooks/blob/dev/inc/modules/import/epub/class-epub201.php#L216 this returns a SimpleXML object if collection.xml exists. Could also check for the existence of cnx attribute(s)
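A sketch of such a check using PHP's ZipArchive and SimpleXML extensions. It assumes collection.xml may sit inside a top-level directory in the archive, which is why the lookup ignores directory components; the function name `poi_load_collection` is hypothetical.

```php
<?php
/**
 * Returns collection.xml from a zip file as a SimpleXMLElement.
 * ASSUMPTION: the file may live in a subdirectory of the archive.
 *
 * @param string $zip_path Path to the downloaded zip file
 * @return SimpleXMLElement|false False if the archive or file is unusable
 */
function poi_load_collection( string $zip_path ) {
	$zip = new ZipArchive();
	if ( true !== $zip->open( $zip_path ) ) {
		return false;
	}
	// FL_NODIR matches on the file name only, ignoring any directory prefix.
	$index    = $zip->locateName( 'collection.xml', ZipArchive::FL_NODIR );
	$contents = ( false === $index ) ? false : $zip->getFromIndex( $index );
	$zip->close();

	return ( false === $contents ) ? false : simplexml_load_string( $contents );
}
```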
I asked Josie to do some testing on the import plugin. Here are her notes
Other issues, these are true for online and digital pdf formats:
lost all chapter/section numbering
none of the internal links work (including same page links and links between chapters)
captions are just in running text above the image, lost all figure numbers
some of the styling is lost (bolded text)
"cnx_astro" appears at the top of each chapter under the chapter title
Print PDF
"cnx_astro" appears in the table of contents under each chapter name
cc-by appears beside each chapter in the table of contents
images left-justified and smaller (images are fine in the online version)
Equations in PDF
math equation errors in the PDF are the same as the online version (If they work, they both work. If they don't work, they both don't work).
Enabling the administration menus will be important if we want admins to be able to activate wp-quicklatex
Expected behaviour
Super Admins would be encouraged/notified to enable administration menus (plugins) for site administrators
Actual behaviour
No prompt is given, and no connection is made between enabling admin menus and the capability we want to encourage for administrators.
not importing from OpenStax into https://opentextbc.ca/
persist admin notices dismissal is currently under version control.
Expected behaviour
Rely on the build process to include deps.
Actual behaviour
deps are included in VCS
Opens a zip file for reading, writing or modifying
could perform a safety dance to verify that we are dealing with an expected file
equations are better rendered by wp-quicklatex
Expected behaviour
admins need the ability to activate wp-quicklatex plugin if it is installed to improve the quality of equations being rendered from an openstax import.
Actual behaviour
as of PB 5.2.1 the list of plugins available to admins is restricted to h5p and hypothesis. PB 5.3.0 includes wp-quicklatex as a plugin available to admins. Since we don't know when v5.3.0 will be released and some of our messaging relies on an admin being able to activate the plugin, I'll look at removing that barrier
similar to https://github.com/pressbooks/pressbooks/blob/dev/inc/modules/import/epub/class-epub201.php#L262 this locates an entry using its name and returns the entry contents
We've made changes to the pb_initialize_import filter. The old one was buggy. It only let one 3rd party importer, the last loaded, work at a time. It would override all the others. Old code:
$importer = apply_filters( 'pb_initialize_import', null );
New code:
As you can hopefully see from the new code, we've done it in a way that if you only have one extra importer on your PB install, everything will work the way it did before. We're essentially supporting broken behaviour to minimize impact on existing installs.
Right now, as I type, any PB with more than one 3rd party importer is broken.
My request is that when PB5 is released, please change your code to:
/**
 * Inserts our OpenStax Class into the mix
 *
 * @param array $a
 * @return OpenStax\Cnx[]
 */
function poi_add_initialize_import( $a ) {
	$a[] = new OpenStax\Cnx();
	return $a;
}
// ...
class Cnx extends Import {
	const TYPE_OF = 'zip';
	// ...
}
Thanks.
based on #54 create copy/messaging for notifications