jsonize's People

Contributors

cshu, dclub, feeeper, jackwfinlay, tmasternak

jsonize's Issues

Add option to convert back to an HTML page?

Add the ability to convert the generated JSON back to an HTML string.

JsonizeParser parser = new JsonizeParser();
JsonizeSerializer serializer = new JsonizeSerializer();

Jsonizer jsonizer = new Jsonizer(parser, serializer);

jsonizer.ParseToHtmlAsync(JsonizeNode);

Port to .NET 4.6

Need to create another version to work with older versions of .NET. Can probably go further back than 4.6.

JsonizeParserConfiguration uses AngleSharp v1

Assembly 'Jsonize.Parser' with identity 'Jsonize.Parser, Version=3.1.0.0, Culture=neutral, PublicKeyToken=null' uses 'AngleSharp, Version=1.0.0.0, Culture=neutral, PublicKeyToken=e83494dcdc6d31ea' which has a higher version than referenced assembly 'AngleSharp' with identity 'AngleSharp, Version=0.17.1.0, Culture=neutral, PublicKeyToken=e83494dcdc6d31ea'

I get the error above when using JsonizeParserConfiguration.

Not getting root tag text after included tags

The parser does not return text that follows child tags inside a parent tag.
Example HTML:

<p>    Завдання   nj jnjnjk knj ccjnds
<span style="background-color:rgb(97,189,109);">
kjc djck sdjkc 
</span>
dsckj dc dc csd c
</p>

Parsing result:

 {
    "NodeType": "Element",
    "Tag": "p",
    "Text": "Завдання   nj jnjnjk knj ccjnds",
    "Attr": {},
    "Children": [
      {
        "NodeType": "Element",
        "Tag": "span",
        "Text": "kjc djck sdjkc",
        "Attr": {
          "style": "background-color:rgb(97,189,109);"
        },
        "Children": []
      }
    ]
  }

It also trims all leading and trailing spaces inside tags, so the text fragments cannot be recombined into a single value because the separating spaces are gone.
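The behaviour expected in the report above (text before and after a child tag kept as separate, untrimmed nodes) can be illustrated with a minimal sketch. It uses System.Xml.Linq purely as a stand-in, because the sample markup happens to be well-formed XML; it is not Jsonize's parser, and the MixedContentSketch class and its Describe method are hypothetical names introduced only for this illustration.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper: lists the direct child nodes of a root element in
// document order, keeping text that appears before AND after child elements.
public static class MixedContentSketch
{
    public static List<string> Describe(string xml)
    {
        // PreserveWhitespace keeps leading and trailing spaces in text nodes.
        XElement root = XElement.Parse(xml, LoadOptions.PreserveWhitespace);
        var parts = new List<string>();

        foreach (XNode node in root.Nodes())
        {
            if (node is XText text)
            {
                parts.Add("Text:" + text.Value);
            }
            else if (node is XElement element)
            {
                parts.Add("Element:" + element.Name.LocalName);
            }
        }

        return parts;
    }
}
```

For the input `<p>before <span>inner</span> after</p>`, this walk yields three entries: the leading text, the span element, and the trailing text, each with its whitespace intact.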

Node property is null if "EmptyTextNodeHandling" is set to "Ignore"

Why is the "node" property missing from the resulting JSON when I set EmptyTextNodeHandling to EmptyTextNodeHandling.Ignore? Is this expected?

EmptyTextNodeHandling.Include example:

JsonizeConfiguration jsonizeConfiguration = new JsonizeConfiguration
{
    EmptyTextNodeHandling = EmptyTextNodeHandling.Include
};
string html = "<html><head></head><body><form></form><p></p></body></html>";
Jsonize jsonize = new Jsonize(html);
string result = jsonize.ParseHtmlAsJsonString(jsonizeConfiguration);

/*
result:
{
    "node":"Document",
    "child":[
        {
            "node":"Element",
            "tag":"html",
            "child":[
                {
                    "node":"Element",
                    "tag":"head",
                    "text":""
                },
                {
                    "node":"Element",
                    "tag":"body",
                    "child":[
                        {
                            "node":"Element",
                            "tag":"form",
                            "text":""
                        },
                        {
                            "node":"Element",
                            "tag":"p",
                            "text":""
                        }
                    ]
                }
            ]
        }
    ]
}
*/

EmptyTextNodeHandling.Ignore example:

JsonizeConfiguration jsonizeConfiguration = new JsonizeConfiguration
{
    EmptyTextNodeHandling = EmptyTextNodeHandling.Ignore
};
string html = "<html><head></head><body><form></form><p></p></body></html>";
Jsonize jsonize = new Jsonize(html);
string result = jsonize.ParseHtmlAsJsonString(jsonizeConfiguration);
/*
result:
{
  "node": "Document",
  "child": [
    {
      "tag": "html",
      "child": [
        {
          "tag": "head"
        },
        {
          "tag": "body",
          "child": [
            {
              "tag": "form"
            },
            {
              "tag": "p"
            }
          ]
        }
      ]
    }
  ]
}
*/

As far as I can see, the JsonizeNode.Node property is set only if innerText is not empty or if EmptyTextNodeHandling == EmptyTextNodeHandling.Include:

// Jsonize.GetChildren method
// ...
if (_emptyTextNodeHandling == EmptyTextNodeHandling.Include || !String.IsNullOrWhiteSpace(innerText))
{
    if (!htmlNode.HasChildNodes)
    {   
        childJsonizeNode.Text = innerText;
    }

    childJsonizeNode.Node = htmlNode.NodeType.ToString();
    addToParent = true;
}
// ...

Is this a bug or a feature?
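If the omission turns out to be unintended, one possible restructuring is to assign the node type unconditionally and let the setting control only the text. The sketch below is self-contained and does not use the library: the enum is a local stand-in for the library's EmptyTextNodeHandling (assumed to have just these two values), and JsonizeNodeSketch, NodeAssignmentSketch and Build are hypothetical names. It also simplifies by leaving Text as an empty string where the real code would leave it unset.

```csharp
using System;

// Local stand-in for the library's enum of the same name.
public enum EmptyTextNodeHandling { Include, Ignore }

// Minimal stand-in for JsonizeNode with only the two properties discussed.
public sealed class JsonizeNodeSketch
{
    public string Node { get; set; } = string.Empty;
    public string Text { get; set; } = string.Empty;
}

public static class NodeAssignmentSketch
{
    public static JsonizeNodeSketch Build(
        string nodeType,
        string innerText,
        bool hasChildNodes,
        EmptyTextNodeHandling handling)
    {
        var node = new JsonizeNodeSketch();

        // The node type no longer depends on the empty-text setting.
        node.Node = nodeType;

        // Only the text is gated by the setting, as in the original condition.
        bool keepText = handling == EmptyTextNodeHandling.Include
            || !string.IsNullOrWhiteSpace(innerText);

        if (keepText && !hasChildNodes)
        {
            node.Text = innerText;
        }

        return node;
    }
}
```

With this shape, an empty form element processed under Ignore would still report its node type as "Element", and only its text would be left out.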

Set up configuration options as an object.

I want to set up configuration options as an object to be passed to the method so that parameters on the method aren't constantly changing. A good example is Newtonsoft.Json's JsonSerializer class.

Incorrect <form> tag processing

I started working on tests and found a problem (or maybe I misunderstand <form> tag processing).

My test:

Jsonize jsonize = new Jsonize("<html><head></head><body><form></form></body></html>");
var result = jsonize.ParseHtmlAsJsonString(jsonizeConfiguration);

Result JSON string:

{
  "node": "Document",
  "child": [
    {
      "node": "Element",
      "tag": "html",
      "child": [
        {
          "tag": "head"
        },
        {
          "node": "Element",
          "tag": "body",
          "child": [
            {
              "tag": "form"
            },
            {
              "node": "Text",
              "text": "</form>"
            }
          ]
        }
      ]
    }
  ]
}

Am I right that the "Text" node with the text "</form>" is incorrect? I would expect the result to look like this:

{
  "node": "Document",
  "child": [
    {
      "tag": "html",
      "child": [
        {
          "tag": "head"
        },
        {
          "tag": "body",
          "child": [
            {
              "tag": "form"
            }
          ]
        }
      ]
    }
  ]
}

Unable to resolve 'Jsonize' for '.NETCoreApp,Version=v1.0'

I'm getting the above error when running dotnet restore and can't get it to work on either Windows or macOS. Any ideas?

dotnet --info on Windows
.NET Command Line Tools (1.0.0-preview2-003131)

Product Information:
Version: 1.0.0-preview2-003131
Commit SHA-1 hash: (hidden)

Runtime Environment:
OS Name: Windows
OS Version: 10.0.14393
OS Platform: Windows
RID: win10-x64

dotnet --info on macOS
.NET Command Line Tools (1.0.0-preview2-003131)

Product Information:
Version: 1.0.0-preview2-003131
Commit SHA-1 hash: (hidden)

Runtime Environment:
OS Name: Mac OS X
OS Version: 10.12
OS Platform: Darwin
RID: osx.10.12-x64

Port tests from nUnitLite to xUnit

Switched to xUnit in version 2.0.0 (new jsonize-2.0.0 branch) to reduce dependencies and to make use of the debugging and test runner features in Visual Studio and VS Code. The existing tests need to be ported. The tests are primitive, but some of them are useful. Discard any that require manual inspection (primarily the ones that download a string from a URL).

All new tests should have a defined input and assertions that the resulting output matches a given predefined expected output.

Add Exception handling.

Currently all exceptions propagate to the calling program. Expected exceptions should be caught: for example, anything that calls remote data sources should attempt to catch exceptions and resolve them at the first chance. If an exception must be re-thrown, wrap it in a new exception type with the original exception as its inner exception.

Types of Exceptions to create:

  • JsonizeHttpException: Thrown when an HTTP request fails.
  • JsonizeException: Generic.
  • JsonizeHtmlParseException: Thrown when parsing the HTML fails. Should mostly wrap errors from HtmlAgilityPack.
  • JsonizeJsonParseException: Thrown when converting the generated objects to a JSON object/string fails. Should mostly wrap errors from the Newtonsoft.Json package.

Suggestions about implementation and exception types are welcome.
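As a starting point for discussion, the proposal above could be sketched as follows. The four exception names come from the list; making JsonizeException the common base class, the constructor shape, and the JsonizeGuard.WrapHtmlParse helper are assumptions made only for this illustration, not existing library code.

```csharp
using System;

// Assumed design: the generic exception doubles as the base type, so callers
// can catch JsonizeException to handle every library failure in one place.
public class JsonizeException : Exception
{
    public JsonizeException(string message, Exception innerException)
        : base(message, innerException) { }
}

public class JsonizeHttpException : JsonizeException
{
    public JsonizeHttpException(string message, Exception innerException)
        : base(message, innerException) { }
}

public class JsonizeHtmlParseException : JsonizeException
{
    public JsonizeHtmlParseException(string message, Exception innerException)
        : base(message, innerException) { }
}

public class JsonizeJsonParseException : JsonizeException
{
    public JsonizeJsonParseException(string message, Exception innerException)
        : base(message, innerException) { }
}

// Hypothetical helper showing the wrap-and-rethrow pattern.
public static class JsonizeGuard
{
    public static T WrapHtmlParse<T>(Func<T> parse)
    {
        try
        {
            return parse();
        }
        // Exceptions that are already library exceptions pass through unchanged.
        catch (Exception ex) when (!(ex is JsonizeException))
        {
            // The original exception is preserved as InnerException.
            throw new JsonizeHtmlParseException("Failed to parse the HTML.", ex);
        }
    }
}
```

A caller would wrap the parsing call in JsonizeGuard.WrapHtmlParse and catch either the specific type or the JsonizeException base, inspecting InnerException for the underlying parser error.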

Investigate migration to AngleSharp as HTML parsing engine

Investigate whether migration to the AngleSharp HTML parsing engine is feasible.

Points to consider:

  • Faster than HtmlAgilityPack in most cases.
  • HTML5 support.
  • JavaScript execution support built in.
  • Possible breaking changes to API. E.g. EmptyTextNodeHandling, other configuration settings, etc.
  • First class .NET Core support.
  • LINQ!
  • Possible increased configuration surface.
  • AngleSharp is more up-to-date (by several years), but not updated since 2016.
  • Not yet v1.0.0.

Produce tests to increase coverage.

Currently the tests are very basic and don't properly cover the methods being tested: they just run the methods, and the output is inspected manually. Proper test cases need to be set up. We are currently using nUnitLite.

Add option whether to trim inner-text of text nodes.

Currently, the inner text of HTML text nodes is trimmed to remove whitespace, which can be excessive because of file formatting when the HTML is loaded from a URL source. This should be optional through a JsonizeConfiguration setting. The default should be to trim the whitespace.
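A minimal sketch of how such a setting might look is shown below. The TrimInnerText property name, the TextTrimmingOptions class (standing in for the relevant part of JsonizeConfiguration) and the InnerTextHelper class are all hypothetical; only the trim-by-default behaviour comes from the request above.

```csharp
using System;

// Hypothetical stand-in for the part of JsonizeConfiguration that would
// carry the new setting. Trimming stays on by default, as requested.
public sealed class TextTrimmingOptions
{
    public bool TrimInnerText { get; set; } = true;
}

public static class InnerTextHelper
{
    // Returns the inner text, trimmed only when the option is enabled.
    // Null or empty input is normalised to an empty string.
    public static string Normalize(string innerText, TextTrimmingOptions options)
    {
        if (string.IsNullOrEmpty(innerText))
        {
            return string.Empty;
        }

        return options.TrimInnerText ? innerText.Trim() : innerText;
    }
}
```

Existing callers would see no change with the default options, while callers that need the original spacing would set TrimInnerText to false.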
