Google's developer team confirmed this week that it has open sourced the Gumbo HTML parser, a C library that implements the HTML5 parsing algorithm.
NOTE: A parser takes source program instructions, interactive online commands, and other defined sequential inputs (including markup tags) and breaks them down into component parts so that programming engines, such as those inside a compiler, can process them.
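To make the idea of "breaking input down into component parts" concrete, here is a minimal sketch in C that splits a markup string into tag and text pieces. It is a toy illustration only: the names `Token`, `TokenKind`, and `split_markup` are hypothetical, and this is neither Gumbo's API nor the HTML5 parsing algorithm, which handles many more cases (attributes, comments, malformed markup, and so on).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; not part of Gumbo or HTML5. */
typedef enum { TOKEN_TAG, TOKEN_TEXT } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* points into the original input string */
    size_t length;     /* number of bytes covered by this token */
} Token;

/* Splits `input` into at most `max_tokens` tokens and returns how many
 * were written. A '<' followed somewhere by '>' forms a tag token;
 * everything else is grouped into text tokens. */
size_t split_markup(const char *input, Token *tokens, size_t max_tokens) {
    size_t count = 0;
    const char *p = input;

    while (*p != '\0' && count < max_tokens) {
        Token t;
        const char *close = (*p == '<') ? strchr(p, '>') : NULL;

        t.start = p;
        if (close != NULL) {
            /* Tag: consume through the closing '>'. */
            t.kind = TOKEN_TAG;
            p = close + 1;
        } else {
            /* Text: consume at least one byte, then stop at the next '<'. */
            t.kind = TOKEN_TEXT;
            p++;
            while (*p != '\0' && *p != '<') {
                p++;
            }
        }
        t.length = (size_t)(p - t.start);
        tokens[count++] = t;
    }
    return count;
}
```

Calling `split_markup("<p>Hi</p>", tokens, 8)` on a caller-supplied array yields three pieces: the opening tag, the text, and the closing tag. A real parser would go on to assemble such pieces into a tree.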
Google's wider motives with this move are (one hopes) openly philanthropic.
If other browser developers follow Google's approach, we could see all HTML5 code parsed in the same way.
For its part, Google has already explained that one of the big accomplishments of the HTML5 standard was the standardization of the HTML parsing algorithm, which means that all browsers will see the same HTML document in the same way.
"So far, most implementations of this algorithm have either been tied to specific browsers or rendering engines, or they've been written in specific scripting languages. This makes it hard to write quick one-off tools to manipulate and clean up HTML if you don't happen to be working in a language that already has an HTML5-compatible parsing library," said Jonathan Tang, of Google's search features team.
"Gumbo seeks to provide a simple library that can serve as a basic building block for linters, refactoring tools, templating languages, page analysis, and other small programs that need to manipulate HTML. It's written in pure C for ease of interfacing with other languages, and has no outside dependencies. Gumbo was built from the start to support source locations and correlating nodes in the parse tree with positions in the original text," added Tang.
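Tang's point about source locations can be illustrated with a short sketch in C: given the original text and a byte offset where a node begins, compute the line and column a tool such as a linter would report. The `SourcePos` struct and `locate` function are hypothetical names chosen for this example; Gumbo's own position-tracking structures and functions may differ.

```c
#include <stddef.h>

/* Hypothetical position record for this sketch; not Gumbo's own type. */
typedef struct {
    unsigned int line;   /* 1-based line number */
    unsigned int column; /* 1-based column number, counted in bytes */
    size_t offset;       /* 0-based byte offset into the text */
} SourcePos;

/* Walks `text` up to `offset` (or the end of the string, whichever comes
 * first) and returns the line and column of that position. */
SourcePos locate(const char *text, size_t offset) {
    SourcePos pos = {1, 1, 0};
    size_t i;

    for (i = 0; i < offset && text[i] != '\0'; i++) {
        if (text[i] == '\n') {
            /* A newline starts the next line at column 1. */
            pos.line++;
            pos.column = 1;
        } else {
            pos.column++;
        }
    }
    pos.offset = i;
    return pos;
}
```

For the text `"<html>\n  <body>"`, the `<body>` tag starts at byte offset 9, which this sketch maps to line 2, column 3. A parser that records such positions while building the tree lets refactoring tools point back at the exact place in the original file.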