Difference between revisions of "SimpleWebKit"

From GNUstepWiki

Revision as of 08:11, 21 January 2008

SimpleWebKit

Background Information

  • originated in mySTEP
  • is completely written in Objective-C (1.0) so that it can be compiled on any system, even with gcc 2.95.3
  • aims at providing the most popular documented methods of Full WebKit for the classes WebView, WebFrame, WebDataSource, etc.
  • aims at rendering (X)HTML as well as possible (but not perfectly)
  • uses NSAttributedStrings passed into NSTextView as the rendering backend
  • already displays many pages
  • is used in the Vespucci.app Web Browser application for GNUstep

Source Code

svn+ssh://user@svn.gna.org/svn/gnustep/libs/simplewebkit/trunk

http://svn.gna.org/viewcvs/gnustep/libs/simplewebkit/trunk

Status

Features of the Subversion (SVN) trunk code:

  • parses (X)HTML into a DOM tree
  • renders approx. 90% of the HTML 4.0 tags in a reasonable way (e.g. <font color="#667788">, <center>, <h2> work)
  • makes <a> links clickable and processes them
  • loads <img>
  • loads <script> etc. asynchronously i.e. loads them as subresources
  • is prepared to handle <frame>
  • is prepared to handle <form>
  • has an ECMAScript engine that parses 90% of the syntax and evaluates expressions (still missing are statements and the native objects, including "document", "window", "event" etc.)

Missing:

  • properly handle <table>, <ul> etc.
  • really process forms and POST results
  • properly compile and run <script>
  • completion of the ECMAScript engine
  • all CSS processing

SWK Browser

SWK Browser is part of the SimpleWebKit project and more or less a test bed for it, although it has most features of a full browser. These are:

  • multiple documents
  • shows (X)HTML, Images etc.
  • can show page source code
  • can show activities (i.e. subresources)
  • can show a DOM Tree inspector
  • has a JavaScript console

We have compiled SimpleWebKit and SWK Browser with Cocoa so that they run natively on a Mac.

Download it here: [1]

Screenshots

Some first screen shots (made from SWK Browser on a Mac)

File:SimpleWebKit Example 1.png

How it Works

1. the WebView

  • is the master view object and there is only one per browser (or browser tab)
  • it holds the mainFrame which represents either the normal <body> or the top level <frame> or <frameset>
  • if there is a <frameset> hierarchy, there are additional child WebFrames

2. the WebFrame

  • is responsible for loading and rendering content from a specific URL
  • it uses a WebDataSource to trigger loading and get callbacks
  • it is also the owner of the DOMDocument tree
  • JavaScript statements are evaluated in a frame context
  • it is also the target of user clicks on links since it knows the base URL (through the WebDataSource)

3. the WebDataSource

  • is responsible for loading data from a URL
  • it may cache data and handle/synchronize loading of subresources (e.g. for an embedded <img> tag)
  • it translates the request and the response URLs
  • it provides an estimated content length (for a progress indicator) and the MIMEType of the incoming data stream
  • as soon as the header comes in, a WebDocumentRepresentation is created and notified of each incoming segment
  • it also collects the incoming data, so that a WebDocumentRepresentation can handle either segments or the collected data
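
The segment/collected-data behaviour described above can be sketched in a few lines. This is a language-neutral illustration in Python, not SimpleWebKit's Objective-C code; the class and method names (CollectingDataSource, receive_header, receive_segment, received_data) are hypothetical.

```python
class CollectingDataSource:
    """Hypothetical stand-in for a data source; not the SimpleWebKit class."""

    def __init__(self):
        self.collected = bytearray()   # everything received so far
        self.representation = None     # created once the header is known

    def receive_header(self, mime_type, representation_factory):
        # The representation is created as soon as the MIME type is known.
        self.representation = representation_factory(mime_type)

    def receive_segment(self, segment):
        # Keep the full data and also hand the new segment to the representation.
        self.collected.extend(segment)
        if self.representation is not None:
            self.representation.received_data(segment, self)


class RecordingRepresentation:
    """Toy representation that only records the segments it was given."""

    def __init__(self, mime_type):
        self.mime_type = mime_type
        self.segments = []

    def received_data(self, segment, data_source):
        self.segments.append(bytes(segment))


source = CollectingDataSource()
source.receive_header("text/html", RecordingRepresentation)
source.receive_segment(b"<html><bo")
source.receive_segment(b"dy></body></html>")
```

A representation that can parse incrementally works from the segments; one that cannot (e.g. an image decoder) can wait and read source.collected once loading has finished.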

4. the WebDocumentRepresentation(s)

  • there is one for each MIME type (the WebView provides a mapping database)
  • it is responsible for parsing the incoming data stream (either completely when finished, or partially)
  • and provides a more suitable representation, e.g. an NSImage or a DOMHTMLTree
  • finally, it creates a WebDocumentView as the child of the WebView and attaches it to the WebFrame as the -webFrameView
  • so, if you want to handle an additional MIME type, write a class that conforms to the WebDocumentRepresentation protocol
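
As a rough illustration of the MIME type mapping, the following Python sketch keeps a registry from MIME type to representation class and feeds a new representation segment by segment. It is an assumption-laden model, not the framework's API: REPRESENTATIONS, register_representation, representation_for and PlainTextRepresentation are invented names.

```python
import codecs

# Hypothetical registry: MIME type -> representation class.
REPRESENTATIONS = {}


def register_representation(mime_type, cls):
    REPRESENTATIONS[mime_type] = cls


def representation_for(mime_type):
    cls = REPRESENTATIONS.get(mime_type)
    if cls is None:
        raise LookupError("no representation registered for " + mime_type)
    return cls()


class PlainTextRepresentation:
    """Toy handler that decodes UTF-8 text as the segments arrive."""

    def __init__(self):
        # An incremental decoder copes with characters split across segments.
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self.text = ""

    def received_data(self, segment):
        self.text += self._decoder.decode(segment)

    def finished_loading(self):
        self.text += self._decoder.decode(b"", final=True)


register_representation("text/plain", PlainTextRepresentation)

rep = representation_for("text/plain")
rep.received_data(b"caf\xc3")   # first byte of a two-byte character
rep.received_data(b"\xa9")      # second byte arrives in the next segment
rep.finished_loading()
```

Supporting another MIME type then means writing one more class with the same two methods and registering it.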

5. NSXMLParser

  • a private variant is used that adds a stalling mechanism and selection of the file encoding
  • recognizes the <?xml> tag for XHTML and has a lazy mode for pure HTML
  • it also has an Entity table that translates the HTML entities to character strings
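
An entity table of this kind can be modelled as a plain dictionary plus a substitution pass. The Python sketch below is only an illustration; the table contents and the name translate_entities are assumptions, and the real parser's table is far larger.

```python
import re

# Tiny excerpt of an entity table; a real one has a few hundred entries.
ENTITY_TABLE = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "nbsp": "\u00a0"}

# Matches &name; as well as decimal (&#65;) and hexadecimal (&#x41;) references.
_ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def translate_entities(text):
    def _replace(match):
        body = match.group(1)
        if body[0] == "#":
            digits = body[1:]
            code = int(digits[1:], 16) if digits[0] in "xX" else int(digits)
            # Leave out-of-range references untouched instead of failing.
            return chr(code) if code <= 0x10FFFF else match.group(0)
        # Unknown names are passed through unchanged.
        return ENTITY_TABLE.get(body, match.group(0))

    return _ENTITY.sub(_replace, text)
```

For example, translate_entities("a &lt; b") yields "a < b", while an unknown reference such as "&bogus;" is left as it is.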

6. the DOMHTMLTree

  • is used only for HTML content
  • is built in WebHTMLDocumentRepresentation by parsing each incoming segment of HTML data
  • any change in the DOMHTMLTree is reported to the WebDocumentView (or one of its subviews) by setNeedsLayout
  • each class of potential DOMHTMLTree records has methods to denote how to handle tags
  • the tag to class mapping is table driven
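
A table-driven tag-to-class mapping can be sketched as follows. This Python model is illustrative only; the class names, the is_block and has_end_tag attributes, and the fallback to a generic element are assumptions rather than a description of the actual DOMHTML classes.

```python
class Element:
    """Generic element used for tags that have no entry in the table."""
    is_block = False      # does the tag start a new block?
    has_end_tag = True    # does the tag expect a closing tag?

    def __init__(self, tag):
        self.tag = tag
        self.children = []


class Paragraph(Element):
    is_block = True


class Anchor(Element):
    pass


class Image(Element):
    has_end_tag = False   # <img> never has content or a closing tag


# The table: adding support for a tag means adding one entry here.
TAG_TABLE = {"p": Paragraph, "a": Anchor, "img": Image}


def element_for_tag(tag):
    name = tag.lower()                      # HTML tag names are case-insensitive
    return TAG_TABLE.get(name, Element)(name)
```

The parser only needs to consult the table and the per-class attributes, so tag-specific behaviour stays inside the class that represents the tag.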

7. the WebDocumentView(s) and their subviews

  • are responsible for displaying the contents of their WebDataRepresentation
  • either HTML, Images, PDF or whatever (e.g. SVG, XML, ...)
  • they get notified about changes either by updates of the WebDataSource (-dataSourceUpdated:) or directly (-setNeedsLayout:)
  • if one needs layout, it must go to the DOM Tree to find out what has changed and update its size, content, children, layout etc.
  • this is a little tricky/risky since the -layout method is called within -drawRect:, so changing e.g. the view frame is critical and may result in drawing glitches
  • for HTML, we do a simple trick: the WebDocumentView is an NSTextView and the DOMHTMLTree objects can be traversed to return an attributedString with embedded Tables and NSTextAttachments
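
The traversal trick for HTML can be illustrated with a small model in which an "attributed string" is just a list of (text, attributes) runs. The Python below is a sketch under that assumption; Node, STYLE and attributed_runs are hypothetical names, and tables and attachments are left out.

```python
class Node:
    """Minimal DOM-like node; plain strings stand in for text nodes."""

    def __init__(self, tag, *children):
        self.tag = tag
        self.children = list(children)


# Attributes each tag contributes to the text below it (illustrative subset).
STYLE = {"b": {"bold": True}, "i": {"italic": True}}


def attributed_runs(node, inherited=None):
    """Flatten a tree into (text, attributes) runs, inheriting attributes downwards."""
    attrs = dict(inherited or {})
    if isinstance(node, str):
        return [(node, attrs)]
    attrs.update(STYLE.get(node.tag, {}))
    runs = []
    for child in node.children:
        runs.extend(attributed_runs(child, attrs))
    return runs


tree = Node("p", "Hello ", Node("b", "bold ", Node("i", "both")), "!")
runs = attributed_runs(tree)
```

Each element only adds its own attributes and delegates to its children, which is what makes it practical to hand the result to a text view for layout.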

8. the JavaScript engine

  • is programmed according to the specification of ECMA-262 [2]
  • uses a simple recursive stateless parser (could be optimized in stack usage and speed by a state-table driven approach)
  • parses the script into a Tree representation in a first step
  • then, evaluates the expressions and statements according to the current environment
  • this allows scripts to be stored in translated form and re-evaluated when needed (e.g. on mouse events)
  • uses Foundation for basic types (string, number, boolean, null)
  • uses WebScriptObject as the base Object representation
  • DOMObjects are a subclass of WebScriptObjects and therefore provide bridging, so that changing a DOMHTML tree element through JavaScript automatically triggers the appropriate WebDocumentView notification
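
The two-step approach (parse into a tree first, evaluate against an environment later) can be shown with a toy recursive-descent parser for arithmetic expressions. This Python sketch covers only a tiny fraction of ECMA-262 and is not the engine's code; the tuple-based tree format and the names parse and evaluate are assumptions made for the example.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def parse(source):
    """Step 1: turn the source text into a tree of nested tuples."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[A-Za-z_$][\w$]*|\S", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def primary():
        token = take()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = additive()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token[0].isdigit():
            return ("num", float(token))
        if token[0].isalpha() or token[0] in "_$":
            return ("var", token)
        raise SyntaxError("unexpected token " + repr(token))

    def multiplicative():
        node = primary()
        while peek() in ("*", "/"):
            node = ("bin", take(), node, primary())
        return node

    def additive():
        node = multiplicative()
        while peek() in ("+", "-"):
            node = ("bin", take(), node, multiplicative())
        return node

    tree = additive()
    if peek() is not None:
        raise SyntaxError("unexpected token " + repr(peek()))
    return tree


def evaluate(tree, env):
    """Step 2: walk the stored tree using the current environment."""
    if tree[0] == "num":
        return tree[1]
    if tree[0] == "var":
        return env[tree[1]]
    _, op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))


handler = parse("width * (2 + scale)")               # parsed once, stored as a tree
first = evaluate(handler, {"width": 10, "scale": 1})
second = evaluate(handler, {"width": 4, "scale": 0.5})
```

Because the tree is independent of the environment, the same parsed handler can be re-evaluated for every event without touching the source text again.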

9. the CSS engine

  • has to be written
  • shall have three components: CSS database, CSS parser, CSS evaluator

Contact

Author: Nikolaus Schaller QuantumSTEP: http://www.quantum-step.com