Comments (7)

harishbalachandran commented on July 29, 2024

@rtibbles @aronasorman Can you please put an ETA on this? It's a blocker for all the MG channels.

from studio.

rtibbles commented on July 29, 2024

Looks like the Magogenie content needs to be run through an HTML-to-Markdown parser before being put in?

This package should do the trick, I think: https://pypi.python.org/pypi/html2text

jamalex commented on July 29, 2024

Note that the ricecooker does not support HTML, only Markdown (with embedded $-delimited LaTeX formulas, as needed). The tool that @rtibbles links to may be helpful for converting HTML to Markdown in the sushi chef.

yogeshmhaskule commented on July 29, 2024

@rtibbles @aronasorman @jamalex To clarify: will this issue be handled in ricecooker, or do we need to handle it in our own code (i.e. the sushi chef)? I remember that @aronasorman and @jayoshih took care of the same issue before.

jayoshih commented on July 29, 2024

@yogeshmhaskule For security reasons, we needed to escape HTML tags to prevent script attacks. To maintain the paragraphs, you'll need to use \n instead.
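
As a rough illustration of the "\n instead of tags" approach, here is a minimal standard-library sketch. The helper name `html_to_plain_paragraphs` is hypothetical and is not part of ricecooker; it also discards `<img>` tags, so it only suits text-only content.

```python
import html
import re

def html_to_plain_paragraphs(source: str) -> str:
    """Hypothetical helper: replace paragraph/line-break tags with newlines
    and drop every other tag, so no HTML reaches the uploader."""
    # Treat </p>, <br>, <br/> and </br> as line breaks.
    text = re.sub(r"</p\s*>|<br\s*/?>|</br\s*>", "\n", source, flags=re.IGNORECASE)
    # Drop any remaining tags such as <p> or <span> (this also drops <img>).
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities such as &amp; and discard empty lines.
    lines = [html.unescape(line).strip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)

print(html_to_plain_paragraphs("<p>First <span>para</span></p><p>Second &amp; last</p>"))
```

Content that mixes text with images needs an HTML-to-Markdown conversion rather than plain tag stripping.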

yogeshmhaskule commented on July 29, 2024

@jayoshih @aronasorman I tried using \n in place of the </p> tag, but then ran into problems with other tags such as <span> and </br>, and with .png and base64 images. To handle those I used the "html2text" Python package, but it removes the 'img' tag of a base64 image and puts '![]' before the base64 image data, and it puts "!\[\]\" before the png image, so the png image fails to download. If you could add more detail about parsing HTML to the "Sample program" of ricecooker, it would help us understand the process and move forward.

I have attached a sample response for a question in this file:
sample_response.txt

Check the format of the answer content, which is similar to the question content. For reference, take a look at the answer content in the file (it's a combination of text, MathML, and base64).

jamalex commented on July 29, 2024

The examples in your sample text contain MathML source, but also include the images that are the rendered version of the MathML, so we can just use the images in this case. The examples you describe (e.g. ![](...)) are valid Markdown, and will work with the ricecooker, even with base64-encoded images. However, the source contains some escaping and newlines that throw the conversion off. The code below shows an example of converting the source you have provided into something that works for the ricecooker. For fuller MathML code (with no image alternative), you'll need to follow the instructions in the other issue.

import json
import requests
import html2text

from ricecooker.classes.nodes import ChannelNode, ExerciseNode
from ricecooker.classes.questions import MultipleSelectQuestion

from le_utils.constants import licenses

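def convert_html_to_markdown(html):
    # Undo the escaped slashes ("\/") and strip newlines before converting,
    # since both break the image sources in the resulting Markdown.
    return html2text.html2text(html.replace("\\/", "/").replace("\n", ""))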

def construct_channel(*args, **kwargs):

    channel = ChannelNode(
        source_domain="test.com",
        source_id="test",
        title="Exercise test",
    )

    exercise = ExerciseNode(source_id="ex1", title="My Ex", license=licenses.CC_BY)
    channel.add_child(exercise)

    question_source = json.loads(requests.get("https://github.com/fle-internal/content-curation/files/958703/sample_response.txt").content.decode())["103898"]

    question = MultipleSelectQuestion(
        id="question",
        question=convert_html_to_markdown(question_source["question"]["content"]),
        correct_answers=[convert_html_to_markdown(a["content"]) for a in question_source["possible_answers"] if a["is_correct"]],
        all_answers=[convert_html_to_markdown(a["content"]) for a in question_source["possible_answers"]],
    )

    exercise.add_question(question)

    return channel
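
To see the cleanup step in isolation, here is a small standard-library sketch. The sample string is made up for illustration (it is not taken from sample_response.txt), and the regex-based `img_tags_to_markdown` is only a hypothetical stand-in for the image handling that html2text performs; it is not how html2text is implemented.

```python
import re

# Matches an <img> tag and captures the value of its double-quoted src attribute.
IMG_TAG = re.compile(r'<img\b[^>]*?\bsrc="([^"]*)"[^>]*>', re.IGNORECASE)

def clean_source(raw: str) -> str:
    # Same cleanup as convert_html_to_markdown: unescape "\/" and drop newlines.
    return raw.replace("\\/", "/").replace("\n", "")

def img_tags_to_markdown(html_text: str) -> str:
    # Hypothetical stand-in: rewrite each <img src="..."> as a Markdown image.
    return IMG_TAG.sub(r"![](\1)", html_text)

# Made-up sample with an escaped slash and a newline inside the base64 data.
sample = 'Pick one: <img src="data:image\\/png;base64,iVBOR\nw0KGgo=" \\/>'
print(img_tags_to_markdown(clean_source(sample)))
```

Without the cleanup, the stray backslash and the newline would end up inside the Markdown image source, which is the kind of breakage described above.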
