Revision as of 08:11, 21 January 2008
SimpleWebKit
Background Information
- originated in mySTEP
 - is written entirely in Objective-C (1.0), so it can be compiled on almost any system, even with gcc 2.95.3
 - aims to provide the most popular documented methods of the full WebKit API for the classes WebView, WebFrame, WebDataSource, etc.
 - aims to render (X)HTML as well as possible (though not perfectly)
 - uses NSAttributedStrings passed to an NSTextView as the rendering backend
 - already displays many pages
 - is used in the Vespucci.app Web Browser application for GNUstep
 
Source Code
svn+ssh://user@svn.gna.org/svn/gnustep/libs/simplewebkit/trunk
http://svn.gna.org/viewcvs/gnustep/libs/simplewebkit/trunk
Status
Features of the Subversion (SVN) trunk code:
- parses (X)HTML into a DOM tree
 - renders approx. 90% of the HTML 4.0 tags reasonably well (e.g. <font color="#667788">, <center> and <h2> work)
 - makes <a> links clickable and processes them
 - loads <img>
 - loads <script> etc. asynchronously, i.e. as subresources
 - is prepared to handle <frame>
 - is prepared to handle <form>
 - has an ECMAScript engine that parses 90% of the syntax and evaluates expressions (still missing: statements and the native objects, including "document", "window", "event", etc.)
 
Missing:
- properly handle <table>, <ul>, etc.
 - actually process forms and POST the results
 - properly compile and run <script>
 - complete the ECMAScript engine
 - all CSS processing
 
SWK Browser
SWK Browser is part of the SimpleWebKit project and is more or less a test bed for it, although it has most of the features of a full browser. Its features include:
- multiple documents
 - shows (X)HTML, images, etc.
 - can show page source code
 - can show activities (i.e. subresources)
 - can show a DOM Tree inspector
 - has a JavaScript console
 
We have compiled SimpleWebKit and SWK Browser with Cocoa so that they run natively on a Mac.
Download it here: [1]
Screenshots
Some early screenshots (taken with SWK Browser on a Mac):
[Screenshot: SimpleWebKit Example 1.png]
How it Works
1. the WebView
- is the master view object and there is only one per browser (or browser tab)
 - it holds the mainFrame, which represents either the normal <body> or the top-level <frame> or <frameset>
 - if there is a <frameset> hierarchy, there are additional child WebFrames
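
The frame hierarchy can be pictured as a tree rooted at the main frame. The following Python sketch only illustrates that structure and is not SimpleWebKit's Objective-C code. `Frame`, `View`, `add_child`, `walk` and `find_frame` are hypothetical names.

```python
# Hypothetical sketch of the WebView/WebFrame hierarchy, not SimpleWebKit code.
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Frame:
    """Stand-in for a WebFrame (the <body>, a <frame> or a <frameset>)."""
    name: str
    children: List["Frame"] = field(default_factory=list)
    # Back-pointer to the enclosing frame; excluded from repr/eq to avoid cycles.
    parent: Optional["Frame"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "Frame") -> "Frame":
        child.parent = self
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Frame"]:
        # Depth-first: this frame first, then all of its descendants.
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class View:
    """Stand-in for a WebView: one per browser window or tab."""
    main_frame: Frame = field(default_factory=lambda: Frame("main"))

    def find_frame(self, name: str) -> Optional[Frame]:
        return next((f for f in self.main_frame.walk() if f.name == name), None)


# A <frameset> with a navigation frame and a content frame holding a nested frameset.
view = View()
nav = view.main_frame.add_child(Frame("nav"))
content = view.main_frame.add_child(Frame("content"))
inner = content.add_child(Frame("inner"))
print([f.name for f in view.main_frame.walk()])  # ['main', 'nav', 'content', 'inner']
```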
 
2. the WebFrame
- is responsible for loading and rendering content from a specific URL
 - it uses a WebDataSource to trigger loading and get callbacks
 - it is also the owner of the DOMDocument tree
 - JavaScript statements are evaluated in a frame context
 - it is also the target of user clicks on links since it knows the base URL (through the WebDataSource)
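
Because the WebFrame knows the base URL, a click on a relative link can be resolved to an absolute URL before anything is loaded. The Python sketch below shows that step using the standard library's `urljoin` and `urldefrag`. The function `handle_link_click` is hypothetical, and so is the rule that a fragment-only link scrolls instead of reloading. Neither is documented SimpleWebKit behavior.

```python
# Hypothetical sketch of link resolution in a frame, not SimpleWebKit code.
from typing import Tuple
from urllib.parse import urldefrag, urljoin


def handle_link_click(base_url: str, current_url: str, href: str) -> Tuple[str, str]:
    """Decide what a click on <a href="..."> inside a frame should do.

    Returns ("scroll", url) when the link only points to a fragment of the
    document already shown, otherwise ("load", url) with an absolute URL.
    """
    target = urljoin(base_url, href)           # relative -> absolute
    target_doc, fragment = urldefrag(target)
    current_doc, _ = urldefrag(current_url)
    if fragment and target_doc == current_doc:
        return "scroll", target
    return "load", target


base = "http://www.example.org/docs/index.html"
print(handle_link_click(base, base, "page2.html"))
# ('load', 'http://www.example.org/docs/page2.html')
print(handle_link_click(base, base, "#top"))
# ('scroll', 'http://www.example.org/docs/index.html#top')
```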
 
3. the WebDataSource
- is responsible for loading data from a URL
 - it may cache data and handle/synchronize the loading of subresources (e.g. for an embedded <img> tag)
 - it translates the request and the response URLs
 - it provides an estimated content length (for a progress indicator) and the MIMEType of the incoming data stream
 - as soon as the header comes in, a WebDocumentRepresentation is created and notified of incoming segments
 - it also collects the incoming data, so that a WebDocumentRepresentation can handle either segments or the collected data
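
The data source duties listed above can be sketched in Python: collecting segments, reporting progress against the estimated length, and creating a representation once the header is known. All class and method names are hypothetical. The per-segment callback is only loosely modeled on the `-receivedData:withDataSource:` method of WebKit's WebDocumentRepresentation protocol.

```python
# Hypothetical sketch of a WebDataSource's bookkeeping, not SimpleWebKit code.
from typing import Any, Callable, List, Optional


class DataSourceSketch:
    def __init__(self, make_representation: Callable[[str], Any]) -> None:
        self._make_representation = make_representation
        self.mime_type: Optional[str] = None
        self.expected_length = -1          # -1 means "length unknown"
        self.representation: Any = None
        self._segments: List[bytes] = []
        self._received = 0

    def received_response(self, mime_type: str, expected_length: int = -1) -> None:
        # The header has arrived: remember the metadata and create the representation.
        self.mime_type = mime_type
        self.expected_length = expected_length
        self.representation = self._make_representation(mime_type)

    def received_data(self, segment: bytes) -> None:
        self._segments.append(segment)     # keep everything for later use
        self._received += len(segment)
        if self.representation is not None:
            # Notify the representation so it can parse incrementally.
            self.representation.received_data(segment, self)

    def progress(self) -> Optional[float]:
        """Fraction loaded for a progress indicator, or None if the length is unknown."""
        if self.expected_length <= 0:
            return None
        return min(1.0, self._received / self.expected_length)

    def data(self) -> bytes:
        """All data collected so far, for representations that want the whole body."""
        return b"".join(self._segments)


class CountingRepresentation:
    """Trivial representation that only counts the segments it is notified about."""

    def __init__(self) -> None:
        self.segments_seen = 0

    def received_data(self, segment: bytes, data_source: DataSourceSketch) -> None:
        self.segments_seen += 1


ds = DataSourceSketch(lambda mime_type: CountingRepresentation())
ds.received_response("text/html", expected_length=10)
ds.received_data(b"<html>")
print(ds.progress())  # 0.6
```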
 
4. the WebDocumentRepresentation(s)
- there is one for each MIME type (the WebView provides a mapping database)
 - it is responsible for parsing the incoming data stream (either completely when finished, or partially)
 - and for providing a more suitable representation, e.g. an NSImage or a DOMHTMLTree
 - finally, it creates a WebDocumentView as the child of the WebView and attaches it to the WebFrame as the -webFrameView
 - so, if you want to handle an additional MIME type, write a class that conforms to the WebDocumentRepresentation protocol
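
Here is a minimal Python sketch of the MIME type mapping and of how a new type is added. It assumes the "mapping database" is a plain dictionary. `Representation`, `register_representation` and `representation_for` are hypothetical names, and the CSV handler is just an example of an additional type.

```python
# Hypothetical sketch of a MIME type -> representation registry, not SimpleWebKit code.
from typing import Dict, List, Type


class Representation:
    """Base class standing in for the WebDocumentRepresentation protocol."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def received_data(self, segment: bytes) -> None:
        self.buffer.extend(segment)        # partial data; a subclass could parse here

    def finished_loading(self) -> object:
        raise NotImplementedError


class PlainTextRepresentation(Representation):
    def finished_loading(self) -> str:
        return self.buffer.decode("utf-8")


# The "mapping database": MIME type -> representation class.
REPRESENTATIONS: Dict[str, Type[Representation]] = {"text/plain": PlainTextRepresentation}


def register_representation(mime_type: str, cls: Type[Representation]) -> None:
    REPRESENTATIONS[mime_type.lower()] = cls


def representation_for(content_type: str) -> Representation:
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    try:
        return REPRESENTATIONS[mime_type]()
    except KeyError:
        raise ValueError(f"no representation registered for {mime_type!r}") from None


# Adding a MIME type: write a class for it and register it.
class CSVRepresentation(Representation):
    def finished_loading(self) -> List[List[str]]:
        text = self.buffer.decode("utf-8")
        return [line.split(",") for line in text.splitlines()]


register_representation("text/csv", CSVRepresentation)
rep = representation_for("text/csv; charset=utf-8")
rep.received_data(b"a,b\n1,")
rep.received_data(b"2\n")
print(rep.finished_loading())  # [['a', 'b'], ['1', '2']]
```

Stripping parameters such as `charset` before the lookup keeps exactly one entry per MIME type, which matches the one-representation-per-type rule described above.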
 
5. NSXMLParser
- a private variant is used that adds a stalling mechanism and selection of the file encoding
 - recognizes the <?xml> declaration for XHTML and has a lazy mode for pure HTML
 - it also has an Entity table that translates the HTML entities to character strings
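
Entity translation can be illustrated with a small table-driven function. This Python sketch uses the standard library's HTML 4 entity table `html.entities.name2codepoint` in place of the parser's own table. Leaving unknown entities untouched is an assumption, chosen to match a lenient HTML mode.

```python
# Hypothetical sketch of table-driven entity translation, not SimpleWebKit code.
import re
from html.entities import name2codepoint

# Named (&amp;), decimal (&#65;) and hexadecimal (&#x42;) references, terminated by ';'.
_ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def translate_entities(text: str) -> str:
    def replace(match) -> str:
        ref = match.group(1)
        if ref.startswith("#"):
            code = int(ref[2:], 16) if ref[1] in "xX" else int(ref[1:])
            # Out-of-range code points become U+FFFD REPLACEMENT CHARACTER.
            return chr(code) if 0 < code <= 0x10FFFF else "\ufffd"
        if ref in name2codepoint:
            return chr(name2codepoint[ref])
        return match.group(0)  # unknown entity: leave it as written (lenient mode)

    return _ENTITY.sub(replace, text)


print(translate_entities("Fish &amp; Chips &lt;3 &eacute;t&eacute; &#65;&#x42; &bogus;"))
# Fish & Chips <3 été AB &bogus;
```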
 
6. the DOMHTMLTree
- is used only for HTML content
 - is built in WebHTMLDocumentRepresentation by parsing the segments of HTML data as they come in
 - any change in the DOMHTMLTree is notified to the WebDocumentView (or one of its subviews) by setNeedsLayout
 - each class that can appear in the DOMHTMLTree has methods that define how its tags are handled
 - the tag-to-class mapping is table-driven
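
A table-driven tag-to-class mapping might look like the following Python sketch. Python's incremental `html.parser.HTMLParser` stands in for the private NSXMLParser variant. The node classes, their `is_void`/`closed_by` methods and the `TAG_TO_CLASS` table are hypothetical. They only illustrate how per-class methods can decide how a tag is handled.

```python
# Hypothetical sketch of table-driven DOM tree building, not SimpleWebKit code.
from html.parser import HTMLParser
from typing import Dict, List, Type


class Node:
    """DOM node; subclasses override methods to say how their tag is handled."""

    def __init__(self, tag: str) -> None:
        self.tag = tag
        self.children: List["Node"] = []
        self.text = ""

    def is_void(self) -> bool:
        return False               # normal elements wait for a matching end tag

    def closed_by(self, start_tag: str) -> bool:
        return False               # start tags that implicitly close this element


class VoidNode(Node):              # <br>, <img>, <hr>: never have content
    def is_void(self) -> bool:
        return True


class ParagraphNode(Node):         # a new <p> implicitly ends an open <p>
    def closed_by(self, start_tag: str) -> bool:
        return start_tag == "p"


# Table-driven tag -> class mapping; unknown tags fall back to Node.
TAG_TO_CLASS: Dict[str, Type[Node]] = {
    "br": VoidNode, "img": VoidNode, "hr": VoidNode, "p": ParagraphNode,
}


class TreeBuilder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#document")
        self.stack: List[Node] = [self.root]

    def handle_starttag(self, tag, attrs):
        while len(self.stack) > 1 and self.stack[-1].closed_by(tag):
            self.stack.pop()
        node = TAG_TO_CLASS.get(tag, Node)(tag)
        self.stack[-1].children.append(node)
        if not node.is_void():
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the innermost open element with this tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].text += data


def outline(node: Node, depth: int = 0) -> List[str]:
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines += outline(child, depth + 1)
    return lines


builder = TreeBuilder()
builder.feed("<body><p>one<p>tw")        # data may arrive in segments
builder.feed("o<br>three</body>")
builder.close()
print("\n".join(outline(builder.root)))
```

Feeding the input in two pieces mimics segments arriving from the WebDataSource. The second `<p>` closes the first through `ParagraphNode.closed_by`, and `<br>` is never pushed on the stack because `VoidNode.is_void` returns true.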
 
7. the WebDocumentView(s) and their subviews
- are responsible for displaying the contents of their WebDocumentRepresentation
 - whether HTML, images, PDF or anything else (e.g. SVG, XML, ...)
 - they get notified about changes either by updates of the WebDataSource (-dataSourceUpdated:) or directly (-setNeedsLayout:)
 - if a view needs layout, it must go to the DOM tree to find out what has changed and update its size, content, children, layout, etc.
 - this is a little tricky since the -layout method is called from within -drawRect:, so changing e.g. the view's frame there is risky and may result in drawing glitches
 - for HTML, we use a simple trick: the WebDocumentView is an NSTextView, and the DOMHTMLTree objects can be traversed to return an attributedString with embedded tables and NSTextAttachments
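
The traversal from the DOM tree to an attributed string can be sketched as a recursive walk in which text inherits the attributes of its enclosing tags. Python has no NSAttributedString, so `AttributedString` below is a minimal stand-in made of text plus attribute runs. The tag attribute table is hypothetical, and tables and text attachments are left out.

```python
# Hypothetical sketch of DOM -> attributed string rendering, not SimpleWebKit code.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Element:
    tag: str
    children: List[Union["Element", str]] = field(default_factory=list)


@dataclass
class AttributedString:
    """Minimal stand-in for NSAttributedString: plain text plus attribute runs."""
    text: str = ""
    runs: List[Tuple[int, int, Dict[str, object]]] = field(default_factory=list)

    def append(self, s: str, attrs: Dict[str, object]) -> None:
        start = len(self.text)
        self.text += s
        if s and attrs:
            self.runs.append((start, len(self.text), dict(attrs)))


# Per-tag attribute table and set of block-level tags.
TAG_ATTRIBUTES: Dict[str, Dict[str, object]] = {
    "b": {"bold": True}, "i": {"italic": True}, "h2": {"bold": True, "size": 18},
}
BLOCK_TAGS = {"p", "h2", "div"}


def render(node: Union[Element, str], out: Optional[AttributedString] = None,
           inherited: Optional[Dict[str, object]] = None) -> AttributedString:
    out = AttributedString() if out is None else out
    inherited = {} if inherited is None else inherited
    if isinstance(node, str):
        out.append(node, inherited)           # text inherits its ancestors' attributes
        return out
    attrs = {**inherited, **TAG_ATTRIBUTES.get(node.tag, {})}
    for child in node.children:
        render(child, out, attrs)
    if node.tag in BLOCK_TAGS and not out.text.endswith("\n"):
        out.append("\n", {})                  # block elements end with a line break
    return out


page = Element("body", [
    Element("h2", ["Title"]),
    Element("p", ["Plain ", Element("b", ["bold"]), " text"]),
])
result = render(page)
print(repr(result.text))  # 'Title\nPlain bold text\n'
print(result.runs)        # [(0, 5, {'bold': True, 'size': 18}), (12, 16, {'bold': True})]
```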
 
8. the JavaScript engine
- is implemented according to the ECMA-262 specification [2]
 - uses a simple recursive stateless parser (stack usage and speed could be improved with a state-table-driven approach)
 - in a first step, parses the script into a tree representation
 - then evaluates the expressions and statements according to the current environment
 - this allows scripts to be stored in translated form and re-evaluated when needed (e.g. on mouse events)
 - uses Foundation for basic types (string, number, boolean, null)
 - uses WebScriptObject as the base Object representation
 - DOMObjects are a subclass of WebScriptObject and therefore provide bridging, so that changing a DOMHTML tree element through JavaScript automatically triggers the appropriate WebDocumentView notification
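
The two-phase design can be shown on a tiny arithmetic subset: parse into a tree once, then evaluate that tree against an environment as often as needed. This Python sketch is neither ECMA-262 compliant nor SimpleWebKit's parser. The grammar, the tuple tree format and the names `parse` and `evaluate` are assumptions for illustration. Like the parser described above, it is recursive and keeps no parser object: each function takes a token position and returns the new one.

```python
# Hypothetical sketch of parse-once / evaluate-many, not SimpleWebKit code.
import re
from typing import Dict, List, Tuple, Union

Tree = Union[float, tuple]
_NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")
_NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
_TOKEN = re.compile(r"\s*([0-9]+(?:\.[0-9]+)?|[A-Za-z_$][A-Za-z0-9_$]*|\S)")


def tokenize(src: str) -> List[str]:
    src, tokens, pos = src.rstrip(), [], 0
    while pos < len(src):
        m = _TOKEN.match(src, pos)   # always matches: src has no trailing whitespace
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_expression(tokens: List[str], pos: int) -> Tuple[Tree, int]:
    """expression := term (('+' | '-') term)*"""
    left, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        left = (op, left, right)
    return left, pos


def parse_term(tokens: List[str], pos: int) -> Tuple[Tree, int]:
    """term := factor (('*' | '/') factor)*"""
    left, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        left = (op, left, right)
    return left, pos


def parse_factor(tokens: List[str], pos: int) -> Tuple[Tree, int]:
    """factor := number | name | '(' expression ')'"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of script")
    tok = tokens[pos]
    if tok == "(":
        inner, pos = parse_expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return inner, pos + 1
    if _NUMBER.fullmatch(tok):
        return float(tok), pos + 1           # ECMAScript numbers are doubles
    if _NAME.fullmatch(tok):
        return ("name", tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def parse(src: str) -> Tree:
    tokens = tokenize(src)
    tree, pos = parse_expression(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


# Note: unlike ECMAScript, division by zero raises ZeroDivisionError here.
_OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
        "*": lambda a, b: a * b, "/": lambda a, b: a / b}


def evaluate(tree: Tree, env: Dict[str, float]) -> float:
    if isinstance(tree, float):
        return tree
    if tree[0] == "name":
        return env[tree[1]]
    op, left, right = tree
    return _OPS[op](evaluate(left, env), evaluate(right, env))


handler = parse("price * (1 + tax)")                    # parse once, keep the tree
print(evaluate(handler, {"price": 100.0, "tax": 0.25}))  # 125.0
print(evaluate(handler, {"price": 80.0, "tax": 0.5}))    # 120.0
```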
 
9. the CSS engine
- has yet to be written
 - shall have three components: CSS database, CSS parser, CSS evaluator
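
The CSS engine has not been written yet, so the following Python sketch is purely hypothetical. It only illustrates one way the three planned components could fit together. The rule format, the restriction to simple selectors (tag, `.class`, `#id`) and all names are assumptions, not SimpleWebKit design decisions.

```python
# Hypothetical sketch of a CSS database, parser and evaluator, not SimpleWebKit code.
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Rule:
    selector: str
    declarations: Dict[str, str]
    order: int                      # position in the style sheet, breaks ties


# 1. CSS database: an ordered list of rules.
Database = List[Rule]

# 2. CSS parser: "sel1, sel2 { name: value; ... }" (no comments, @-rules or nesting).
_RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_css(text: str, database: Database) -> None:
    for selectors, body in _RULE.findall(text):
        declarations = {}
        for item in body.split(";"):
            if ":" in item:
                name, value = item.split(":", 1)
                declarations[name.strip().lower()] = value.strip()
        for selector in selectors.split(","):
            if selector.strip():
                database.append(Rule(selector.strip(), declarations, len(database)))


# 3. CSS evaluator: only simple selectors such as p, .note, #main or p.note.
_SIMPLE = re.compile(r"([A-Za-z][\w-]*)?((?:[.#][\w-]+)*)")


def specificity(selector: str) -> Tuple[int, int, int]:
    return (selector.count("#"), selector.count("."), 1 if selector[0].isalpha() else 0)


def matches(selector: str, tag: str, element_id: str, classes: set) -> bool:
    m = _SIMPLE.fullmatch(selector)
    if m is None or (m.group(1) and m.group(1).lower() != tag):
        return False
    for kind, name in re.findall(r"([.#])([\w-]+)", m.group(2)):
        if (kind == "#" and name != element_id) or (kind == "." and name not in classes):
            return False
    return True


def computed_style(database: Database, tag: str, element_id: str = "",
                   classes: Iterable[str] = ()) -> Dict[str, str]:
    class_set = set(classes)
    applicable = [r for r in database if matches(r.selector, tag, element_id, class_set)]
    # More specific rules, and later rules of equal specificity, are applied last and win.
    applicable.sort(key=lambda r: (specificity(r.selector), r.order))
    style: Dict[str, str] = {}
    for rule in applicable:
        style.update(rule.declarations)
    return style


db: Database = []
parse_css("p { color: black; margin: 0 } .note { color: blue } "
          "p.note { font-weight: bold } #main { color: red }", db)
print(computed_style(db, "p", classes=["note"]))
# {'color': 'blue', 'margin': '0', 'font-weight': 'bold'}
```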
 
Contact
Author: Nikolaus Schaller QuantumSTEP: http://www.quantum-step.com