Extract content from wiki pages (titles)
Last updated at 8:10 am UTC on 29 June 2018
To extract the title of this page (page 1190), use the following code snippet:
"Fetch page 1190, split on the title div's opening tag, and keep the text up to the next </div>"
((('http://wiki.squeak.org/squeak/1190' asUrl retrieveContents contents)
	splitBy: '<div style="font-weight: bold; font-size: x-large; padding-top: 1ex">') second
		splitBy: '</div>') first
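The same extraction can also be written as a reusable block that answers nil instead of raising an error when a page has no title div. This is a minimal sketch that assumes the title markup shown above. titleMarker and titleOf are hypothetical names. The block relies only on the standard String methods findString:startingAt: (which answers 0 when the substring is absent) and copyFrom:to:, so it does not depend on splitBy:.

```smalltalk
| titleMarker titleOf |
"Hypothetical helper: the opening tag that precedes a Swiki page title"
titleMarker := '<div style="font-weight: bold; font-size: x-large; padding-top: 1ex">'.
"Answer the text between titleMarker and the next '</div>', or nil if either is missing"
titleOf := [:html | | start stop |
	start := html findString: titleMarker startingAt: 1.
	start = 0
		ifTrue: [nil]
		ifFalse: [
			start := start + titleMarker size.
			stop := html findString: '</div>' startingAt: start.
			stop = 0
				ifTrue: [nil]
				ifFalse: [html copyFrom: start to: stop - 1]]].
"Offline check on a literal string; print-it answers 'Example Title'"
titleOf value: 'x', titleMarker, 'Example Title</div>y'
```

For a live page, pass 'http://wiki.squeak.org/squeak/1190' asUrl retrieveContents contents as the argument instead of the literal string.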
To print the titles of Swiki pages 1 to 50 to the Transcript, one per line:
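1 to: 50 do: [:pageNo | | title |
	"Pages that fail to load or lack the title div are skipped instead of stopping the loop"
	title := [(((('http://wiki.squeak.org/squeak/', pageNo printString) asUrl retrieveContents contents)
			splitBy: '<div style="font-weight: bold; font-size: x-large; padding-top: 1ex">') second
				splitBy: '</div>') first]
		on: Error do: [:ex | nil].
	title ifNotNil: [
		Transcript show: '- ', pageNo printString, '. *', title, '*'; cr]]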
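More: collect the titles of pages that have already been downloaded into a Dictionary and write them to a file. This code expects SqueakWiki to be a Dictionary that maps page numbers to the HTML of each page; it is not defined on this page.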
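A minimal sketch of filling SqueakWiki, assuming it should map page numbers 1 to 50 to the raw HTML of each page. The page range, the use of a workspace variable, and the Transcript message are illustrative choices, not part of the original page.

```smalltalk
"Hypothetical sketch: cache the raw HTML of pages 1 to 50, keyed by page number"
SqueakWiki := Dictionary new.
1 to: 50 do: [:pageNo |
	[SqueakWiki at: pageNo put:
		('http://wiki.squeak.org/squeak/', pageNo printString) asUrl retrieveContents contents]
		"A page that cannot be fetched is reported and left out of the dictionary"
		on: Error do: [:ex | Transcript show: 'Could not fetch page ', pageNo printString; cr]].
```

Caching the pages once lets the title extraction be re-run without downloading everything again.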
SqueakWikiTitles := Dictionary new.
Transcript clear.
"Store the title of every cached page under its page number.
 The title div marker must stay on one line; a line break inside the string would stop it matching."
SqueakWiki keysAndValuesDo: [:aKeyNo :content |
	SqueakWikiTitles at: aKeyNo put:
		((content
			splitBy: '<div style="font-weight: bold; font-size: x-large; padding-top: 1ex">') second
				splitBy: '</div>') first].
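"Write one line per page (page number, '@', title), sorted by title.
 Iterating the associations keeps each title paired with its own page number even when titles repeat."
fileStream := FileStream newFileNamed: 'WikiTitles.csv' asFileName.
[(SqueakWikiTitles associations asSortedCollection: [:a :b | a value <= b value]) do:
	[:assoc | fileStream nextPutAll: assoc key asString, '@', assoc value; cr]]
		ensure: [fileStream close].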
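To load the saved titles back into a Dictionary, here is a minimal sketch, assuming WikiTitles.csv was written as above with a page number, '@', and a title on each line. titles, stream, and line are hypothetical names. Splitting at the first '@' is safe because the page number never contains one, while a title might.

```smalltalk
"Hypothetical sketch: rebuild a Dictionary of page number -> title from WikiTitles.csv"
| titles stream line |
titles := Dictionary new.
stream := FileStream readOnlyFileNamed: 'WikiTitles.csv'.
[[stream atEnd] whileFalse: [
	line := stream nextLine.
	"Text before the first @ is the page number; everything after it is the title"
	titles at: (line copyUpTo: $@) asInteger put: (line copyAfter: $@)]]
		ensure: [stream close].
titles
```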