XMLParserHTML
Last updated at 8:50 pm UTC on 11 July 2018
> Sent: Friday, May 04, 2018 at 4:47 AM
> From: "Tobias Pape" <Das.Linux@gmx.de>
> To: "The general-purpose Squeak developers list" <squeak-dev@lists.squeakfoundation.org>
> Subject: Re: [squeak-dev] Please transfer ownership or make me a co-maintainer of the orphaned SM "XPath" project

> > On 04.05.2018, at 06:00, monty <monty2@programmer.net> wrote:
> >
> > For Squeak (and Pharo and GemStone), the only actively-maintained, standards-compliant library I know of is the SmalltalkHub PharoExtras/XPath lib that I maintain, installable from the SM as "XMLParser-XPath".
> > 
>
> I want to stress that monty's implementation (xml and xpath) is very good, feature-, test-, and code-wise.

Thanks
 
> Note that the xml lib is not a drop-in replacement for the XML stuff in Trunk, as some API got straightened out.
>
> Maybe we should adopt the XML/XPath lib for trunk also. It makes talking to web resources that speak XML soooo much easier.

There's also the "XMLParser-HTML" project on SM, which allows you to parse HTML with XMLParser, and it works with related XMLParser libs, including XMLParser-XPath. It's also the fastest HTML parser (by far) available for Squeak (and Pharo and GemStone). I wrote it more as a proof of concept and haven't advertised it much, but I've gotten positive feedback on it from Pharo users.

The package is available on SqueakMap.


https://montyos.wordpress.com/

This is the latest version of the XML/XPath Scraping Booklet. It was written for Pharo, but as mentioned above the code is also maintained for Squeak, so most of the booklet's content applies to Squeak as well:
https://files.pharo.org/books-pdfs/booklet-Scraping/2018-01-07-scrapingbook.pdf