raymccrae / swift-htmlsaxparser

Swift wrapper around libxml2 HTML Parser to provide SAX style HTML Parsing

License: Apache License 2.0

Objective-C 8.13% Swift 82.36% C 5.76% HTML 3.74%
swift html-parser html-parsing libxml2 sax-parser

swift-htmlsaxparser's Introduction

HTML SAX Parser for Swift 4

HTMLSAXParser is a Swift module that wraps the libxml2 HTMLParser to provide a simple, lightweight SAX parser for HTML content. libxml2 is part of the Mac, iOS and Apple TV SDKs, so if you are developing for those platforms you will not require any additional dependencies. SAX parsers provide an event-based parsing process, where a closure you provide is called with a series of events as the parser moves through the document.

HTMLSAXParser takes inspiration from NSXMLParser; however, it uses enums with associated values for the parsing events rather than a delegate class. A simple example of usage is:

let parser = HTMLSAXParser()
do {
	try parser.parse(string: "<html><body>Some HTML Content</body></html>") { context, event in
		switch event {
			case let .startElement(name, attributes):
				print("Found element : \(name)")
			case let .character(text):
				print("Found character : \(text)")
			default:
				break
		}
	}
}
catch {
	// Handle error
}

This approach lends itself to short inlined processing of HTML without the need for a parser delegate class.

/**
 Example function to extract all the image sources from HTML data. Specifically
 fetching the "src" attribute from all "img" tags.
*/
func imageSources(from htmlData: Data) throws -> [String] {
	var sources: [String] = []
	let parser = HTMLSAXParser()
	try parser.parse(data: htmlData) { context, event in
		switch event {
			case let .startElement(name, attributes) where name == "img":
				if let source = attributes["src"] {
					sources.append(source)
				}
			default:
				break
		}
	}
	return sources
}
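
To make the event model concrete without depending on libxml2, here is a minimal, self-contained sketch of the same closure-plus-enum pattern. Everything in it is a stand-in introduced for illustration: `DemoEvent`, `replayDemoEvents` and `summarizeDemoDocument` are hypothetical names, not part of HTMLSAXParser, and the real module's event cases may differ. The sketch replays a fixed sequence of events in the order a SAX parser would report them, and the handler gathers results in local variables rather than in a delegate object.

```swift
// Stand-in event type for illustration only. The real module defines its own
// event enum, and its exact cases may differ from these.
enum DemoEvent {
    case startElement(name: String, attributes: [String: String])
    case characters(text: String)
    case endElement(name: String)
}

// Hypothetical driver that replays a fixed event sequence, in the order a SAX
// parser would report it for: <p>Hello <a href="/docs">docs</a></p>
func replayDemoEvents(_ handler: (DemoEvent) -> Void) {
    let events: [DemoEvent] = [
        .startElement(name: "p", attributes: [:]),
        .characters(text: "Hello "),
        .startElement(name: "a", attributes: ["href": "/docs"]),
        .characters(text: "docs"),
        .endElement(name: "a"),
        .endElement(name: "p")
    ]
    for event in events {
        handler(event)
    }
}

// Collect link targets and text in a single pass, keeping all state in local
// variables captured by the closure.
func summarizeDemoDocument() -> (links: [String], text: String) {
    var links: [String] = []
    var text = ""
    replayDemoEvents { event in
        switch event {
        case let .startElement(name, attributes) where name == "a":
            if let href = attributes["href"] {
                links.append(href)
            }
        case let .characters(chunk):
            text += chunk
        default:
            break
        }
    }
    return (links, text)
}
```

Because each event carries its own payload as associated values, a `switch` with a `where` clause can select exactly the events of interest and ignore the rest in the `default` branch.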

Installation

Swift Package Manager

Add HTMLSAXParser as a dependency to your project's Package.swift. For example:

// swift-tools-version:4.0
import PackageDescription

let package = Package(
    name: "YourProject",
    dependencies: [
        // Dependencies declare other packages that this package depends on.
        .package(url: "https://github.com/raymccrae/swift-htmlsaxparser.git", .branch("master"))
    ],
    targets: [
        // Targets are the basic building blocks of a package. A target can define a module or a test suite.
        // Targets can depend on other targets in this package, and on products in packages which this package depends on.
        .target(
            name: "YourProject",
            dependencies: ["HTMLSAXParser"]),
    ]
)

Since this module makes use of libxml2, you will need to tell the C compiler where the libxml2 header files are located. If you have Xcode installed (Mac only), you can pass the following additional arguments to the swift build command to point it at the libxml2 headers in the current SDK:

$ swift build -Xcc -I"$(xcrun --show-sdk-path)/usr/include/libxml2"

License

Version

  • Version 0.4
