Extract content from wiki pages - page numbers of changed pages
Last updated at 9:48 am UTC on 30 June 2018
This page shows how to fetch the 'changes' page of this wiki (more precisely, the complete list of changes) and then query the result for the page numbers of the pages that have changed since a particular date.


Steps
1. Get page content of http://wiki.squeak.org/squeak/completeChanges
2. Get the collection of dates on which changes occurred.
3. Get all lines before a certain date
4. Extract the page numbers

Summary (all together as a script)


1. Get page content

Evaluate the following code in a Workspace


| url pageSource contentAfterH2Header |


url := 'http://wiki.squeak.org/squeak/completeChanges' asUrl.

pageSource := url retrieveContents contents.


contentAfterH2Header := (pageSource splitBy: '</h2>') second.


contentAfterH2Header  inspect



The result of this operation is that you get an Inspector window on the result which is a ByteString.
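
To see what the splitBy: step does without fetching anything, the same expression can be tried on a small string. This is only a sketch: the sample HTML is made up for illustration and is merely assumed to resemble the real page source.

```smalltalk
"Sketch on a made-up sample string, not the real page source."
| sampleSource afterHeader |
sampleSource := '<h2>Changes</h2><h3>1 April 2017</h3>'.

"splitBy: cuts the string at every occurrence of the separator;
 the second piece is everything after the first '</h2>'."
afterHeader := (sampleSource splitBy: '</h2>') second.

Transcript show: afterHeader; cr.
```

If the page contained more than one </h2>, the second piece would end at the next one, so the approach relies on there being a single h2 header before the list of changes.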

ByteStringWithHTMLreferencesToChangedSWikiPages.png


SequenceableCollection implements #beginsWith:. ByteString is a subclass of String, which in turn descends from SequenceableCollection (via ArrayedCollection), so the ByteString understands this message.
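
The inheritance remark can be checked directly in a Workspace. The sample line below is made up for illustration; the results are written to the Transcript.

```smalltalk
"Check the class hierarchy and try beginsWith: on a sample line."
| aLine |
aLine := '<h3>31 March 2017</h3>'.

"Is ByteString somewhere below SequenceableCollection?"
Transcript show: (ByteString inheritsFrom: SequenceableCollection) printString; cr.

"Does the line start with the given prefix?"
Transcript show: (aLine beginsWith: '<h3>') printString; cr.
Transcript show: (aLine beginsWith: '<li>') printString; cr.
```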


2. Get the collection of dates on which changes occurred.


The Inspector window from the previous section can be used to get a list of all dates on which changes occurred.

Paste

 (self lines select: [:aLine | aLine beginsWith: '<h3>']) inspect

into the evaluation pane of the Inspector window and execute it. The result is an array of all change dates.

Use this array to pick the date up to which you want to see the changes.
Look into the array with e.g.

 self at: 250

to get an entry such as

 <h3>31 March 2017</h3>
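
Instead of probing indices by hand, the position of a known date heading can also be looked up. The following is a sketch on a made-up array of date lines (the real array is much longer); the variable names are chosen for illustration only.

```smalltalk
"Sketch: locate a date heading in a (made-up) array of date lines."
| dateLines wanted index dateText |
dateLines := #('<h3>2 April 2017</h3>' '<h3>1 April 2017</h3>' '<h3>31 March 2017</h3>').
wanted := '<h3>31 March 2017</h3>'.

"findFirst: answers the index of the first matching element, or 0 if none matches."
index := dateLines findFirst: [:each | each beginsWith: wanted].

"Strip the surrounding tags to get the bare date text."
dateText := (wanted copyReplaceAll: '<h3>' with: '') copyReplaceAll: '</h3>' with: ''.

Transcript show: index printString; cr; show: dateText; cr.
```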


3. Get all lines before a certain date

Then go back to the first Inspector window, the one on the ByteString with the full content, and evaluate

 |copy |
 copy := true.
 (self lines select: [:aLine | (aLine beginsWith: '<h3>31 March 2017</h3>') ifTrue: [copy := false]. 
                                copy ]) 
 inspect


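This gives all the lines that come before the '<h3>31 March 2017</h3>' heading in a new Inspector object. The select: block answers true for every line until that heading is reached and false from then on.

In this Inspector window we now look for lines beginning with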
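
As an alternative to the flag variable, the same cut can be sketched with an index lookup. The lines below are made up for illustration; this is not the approach used on this page, only a different way to get the same result.

```smalltalk
"Sketch: cut a (made-up) list of lines at a date heading by index."
| allLines cutoff index linesBefore |
allLines := #(
	'<h3>1 April 2017</h3>'
	'<li><a href="/squeak/1">A</a>'
	'<h3>31 March 2017</h3>'
	'<li><a href="/squeak/2">B</a>').
cutoff := '<h3>31 March 2017</h3>'.

"Index of the heading line, or 0 if it does not occur."
index := allLines findFirst: [:aLine | aLine beginsWith: cutoff].

"Keep everything before the heading; keep all lines if the heading is absent."
linesBefore := index = 0
	ifTrue: [allLines]
	ifFalse: [allLines first: index - 1].

Transcript show: linesBefore size printString; cr.
```

Like the select: version, this keeps every line when the heading is not found.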

 <li><a href="/squeak/

to get at the page numbers of the changed wiki pages.


So we need to put the following code snippet into the evaluation pane of the Inspector window

 (self select: [:aLine | aLine beginsWith: '<li><a href="/squeak/']) inspect


This gives us another Inspector window, this time only with the lines which actually reference a changed page. Entries with change dates later than 31 March 2017 are in this collection. The object shown is an array with each element referencing a line.


ArrayWithStringsAsElementsShowingChangedSwikiPages.png

4. Extract the page numbers


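In the latest Inspector window (the array of lines referencing changed pages), evaluate the following. Each line is split at the double quotes, and the second piece is the value of the href attribute, i.e. the page reference with the page number at its end.

 self collect: [:aLine | (aLine splitBy: '"') second]

Array_collect_example_SelectingPageReferences.png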
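
To go from the href value to the bare page number, the part after the last slash can be converted to an integer. A minimal sketch, assuming a line of the shape shown above; the page number 1234 and the link text are made up.

```smalltalk
"Sketch: extract href, page number and full URL from one (made-up) line."
| aLine href pageNumber pageUrl |
aLine := '<li><a href="/squeak/1234">Some Page</a>'.

"The second piece between double quotes is the href value."
href := (aLine splitBy: '"') second.

"Everything after the last slash is the page number."
pageNumber := (href copyAfterLast: $/) asInteger.

"Prefix the host to get a complete address."
pageUrl := 'http://wiki.squeak.org', href.

Transcript show: pageNumber printString; cr; show: pageUrl; cr.
```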


Summary (all together as a script)

| url pageSource contentAfterH2Header linesWithDates aDateString copy 
linesBeforeTheDate pageReferences |

url := 'http://wiki.squeak.org/squeak/completeChanges' asUrl.

"Fetch the page source as a String."
pageSource := url retrieveContents contents.

contentAfterH2Header := (pageSource splitBy: '</h2>') second.

"The change dates are the lines starting with <h3>."
linesWithDates := contentAfterH2Header lines select: [:aLine | aLine beginsWith: '<h3>'].

"chooseFrom:values: answers the chosen line itself (nil if cancelled);
 chooseFrom: alone would answer only its index."
aDateString := UIManager default chooseFrom: linesWithDates values: linesWithDates.
aDateString ifNil: [^ nil].

"Keep all lines that come before the chosen date heading."
copy := true.
linesBeforeTheDate := contentAfterH2Header lines select:
                                      [:aLine | (aLine beginsWith: aDateString) ifTrue: [copy := false]. 
	                                           copy ].
"Keep only the lines referencing a page, then extract the href value from each."
pageReferences := linesBeforeTheDate select: [:aLine | aLine beginsWith: '<li><a href="/squeak/' ].	
pageReferences := pageReferences collect: [:aLine | (aLine splitBy: '"') second].

pageReferences inspect.
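
If the same page can appear under several dates (an assumption; a page edited on two different days would presumably be listed twice), the references can be reduced to a sorted list without duplicates. A sketch on made-up references:

```smalltalk
"Sketch: remove duplicate page references and sort them (made-up data)."
| references uniqueReferences |
references := #('/squeak/2' '/squeak/1' '/squeak/2').

"asSet drops duplicates; asSortedCollection orders the remaining strings."
uniqueReferences := references asSet asSortedCollection asArray.

Transcript show: uniqueReferences printString; cr.
```

In the script above, the same expression could be applied to pageReferences before inspecting it.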