ParseKit Documentation
ParseKit
ParseKit is a Mac OS X Framework written by Todd Ditchendorf in Objective-C 2.0 and released under the MIT Open Source License. ParseKit is suitable for use on Mac OS X Leopard, Snow Leopard or iPhone OS. ParseKit is an Objective-C implementation of the tools described in "Building Parsers with Java" by Steven John Metsker. ParseKit includes additional features beyond the designs from the book and also some changes to match common Cocoa/Objective-C conventions. These changes are relatively superficial, however, and Metsker's book is the best documentation available for ParseKit.
The ParseKit Framework offers 3 basic services of general interest to Cocoa developers:
- String Tokenization via the Objective-C PKTokenizer and PKToken classes.
- High-Level Language Parsing via Objective-C - An Objective-C parser-building API (the PKParser class and subclasses).
- Objective-C Parser Generation via Grammars - Generate an Objective-C parser for your custom language using a BNF-style grammar syntax (similar to yacc or ANTLR). While parsing, the parser will provide callbacks to your Objective-C code.
The ParseKit source code is available on GitHub.
Projects using ParseKit:
- TaskPaper for iPhone: Simple to-do lists app by Jesse Grosjean
- Spike: A Rails log file viewer/analyzer by Matt Mower
- JSTalk: Interprocess Cocoa scripting with JavaScript by Gus Mueller
- Objective-J Port of ParseKit by Ross Boucher
- HTTP Client: HTTP debugging/testing tool
- Fluid: Site-Specific Browser for Mac OS X
- Fluidium: Rich Internet Application Platform for Mac OS X
- Cruz: Social Browser for Mac OS X
- Fake: A Recordable/Automated Browser for Mac OS X
- OkudaKit: Syntax Highlighting Framework for Mac OS X
- Exedore: XPath 1.0 implemented in Cocoa (ported from Saxon)
Xcode Project
The ParseKit Xcode project consists of 6 targets:
- ParseKit : the ParseKit Objective-C framework. The central feature/codebase of this project.
- Tests : a UnitTest Bundle containing hundreds of unit tests (actually, more correctly, interaction tests) for the framework as well as some example classes that serve as real-world uses of the framework.
- DemoApp : A simple Cocoa demo app that gives a visual presentation of the results of tokenizing text using the PKTokenizer class.
- DebugApp : A simple Cocoa app that exists only to run arbitrary test code thru GDB with breakpoints for debugging (I was not able to do that with the UnitTest bundle).
- JSParseKit : A JavaScriptCore-based scripting interface to ParseKit which can be used to expose the entire framework to JavaScript environments.
- JSDemoApp: A simple Cocoa application used for exercising the JavaScript interface provided by JSParseKit. Note that this is the only target which links to the WebKit framework. Neither ParseKit nor JSParseKit requires WebKit.
ParseKit Framework
Tokenization
The API for tokenization is provided by the PKTokenizer class. Cocoa developers will be familiar with the NSScanner class from the Foundation Framework, which provides a similar service. However, the PKTokenizer class is simpler and more powerful for many use cases.
Example usage:
NSString *s = @"\"It's 123 blast-off!\", she said, // watch out!\n"
              @"and <= 3.5 'ticks' later /* wince */, it's blast-off!";
PKTokenizer *t = [PKTokenizer tokenizerWithString:s];
PKToken *eof = [PKToken EOFToken];
PKToken *tok = nil;
while ((tok = [t nextToken]) != eof) {
    NSLog(@" (%@)", tok);
}
outputs:
("It's 123 blast-off!")
(,)
(she)
(said)
(,)
(and)
(<=)
(3.5)
('ticks')
(later)
(,)
(it's)
(blast-off)
(!)
Each token produced is an object of class PKToken. PKTokens have a tokenType (Word, Symbol, Num, QuotedString, etc.) and both a stringValue and a floatValue.
More information about a token can be easily discovered using the -debugDescription method instead of the default -description. Replace the line containing NSLog above with this line:
NSLog(@" (%@)", [tok debugDescription]);
and each token's type will be printed as well:
<Quoted String «"It's 123 blast-off!"»>
<Symbol «,»>
<Word «she»>
<Word «said»>
<Symbol «,»>
<Word «and»>
<Symbol «<=»>
<Number «3.5»>
<Quoted String «'ticks'»>
<Word «later»>
<Symbol «,»>
<Word «it's»>
<Word «blast-off»>
<Symbol «!»>
As you can see from the output, PKTokenizer is configured by default to properly group characters into tokens including:
- single- and double-quoted string tokens
- common multiple character symbols (<=)
- apostrophes, dashes and other symbol chars that should not signal the start of a new Symbol token, but rather be included in the current Word or Num token (it's, blast-off, 3.5)
- silently ignoring C- and C++-style comments
- silently ignoring whitespace
The PKTokenizer class is very flexible, and all of those features are configurable. PKTokenizer may be configured to:
- recognize more (or fewer) multi-char symbols. ex:
[t.symbolState add:@"!="];
allows != to be recognized as a single Symbol token rather than two adjacent Symbol tokens
- add new internal symbol chars to be included in the current Word token OR recognize internal symbols like apostrophe and dash to actually signal a new Symbol token rather than being part of the current Word token. ex:
[t.wordState setWordChars:YES from:'_' to:'_'];
allows Word tokens to contain internal underscores
[t.wordState setWordChars:NO from:'-' to:'-'];
disallows Word tokens from containing internal dashes.
- change which chars signal the start of a token of any given type. ex:
[t setTokenizerState:t.wordState from:'_' to:'_'];
allows Word tokens to start with underscore
[t setTokenizerState:t.quoteState from:'*' to:'*'];
allows Quoted String tokens to start with an asterisk, effectively making * a new quote symbol (like " or ')
- turn off recognition of single-line "slash-slash" (//) comments. ex:
[t setTokenizerState:t.symbolState from:'/' to:'/'];
slash chars now produce individual Symbol tokens rather than causing the tokenizer to strip text until the next newline char, or to begin stripping a multi-line comment where appropriate (/*)
- turn on recognition of "hash" (#) single-line comments. ex:
[t setTokenizerState:t.commentState from:'#' to:'#'];
[t.commentState addSingleLineStartSymbol:@"#"];
- turn on recognition of "XML/HTML" (<!-- -->) multi-line comments. ex:
[t setTokenizerState:t.commentState from:'<' to:'<'];
[t.commentState addMultiLineStartSymbol:@"<!--" endSymbol:@"-->"];
- report (rather than silently consume) Comment tokens. ex:
t.commentState.reportsCommentTokens = YES; // default is NO
- report (rather than silently consume) Whitespace tokens. ex:
t.whitespaceState.reportsWhitespaceTokens = YES; // default is NO
- turn on recognition of any characters (say, digits) as whitespace to be silently ignored. ex:
[t setTokenizerState:t.whitespaceState from:'0' to:'9'];
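As a rough mental model of the two kinds of configuration above (which state starts a token, versus which chars may continue a Word once it has started), here is a framework-free sketch in plain C. It is not ParseKit's source: TokenizerTables, set_start_state, set_word_chars, word_length and the default ranges are all hypothetical, chosen only to mirror the behaviour described in this section and loosely following the per-character state table in Metsker's book.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the tokenizer states named above. */
typedef enum {
    STATE_SYMBOL, STATE_WORD, STATE_NUMBER,
    STATE_QUOTE, STATE_COMMENT, STATE_WHITESPACE
} StateKind;

typedef struct {
    StateKind start_state[256]; /* state that handles a token starting with this byte */
    bool word_char[256];        /* bytes allowed inside a Word once it has started */
} TokenizerTables;

/* Rough analogue of -setTokenizerState:from:to: (inclusive range). */
void set_start_state(TokenizerTables *t, StateKind kind,
                     unsigned char from, unsigned char to) {
    unsigned int c;
    assert(from <= to);
    for (c = from; c <= (unsigned int)to; c++) {
        t->start_state[c] = kind;
    }
}

/* Rough analogue of -setWordChars:from:to: (inclusive range). */
void set_word_chars(TokenizerTables *t, bool allowed,
                    unsigned char from, unsigned char to) {
    unsigned int c;
    assert(from <= to);
    for (c = from; c <= (unsigned int)to; c++) {
        t->word_char[c] = allowed;
    }
}

/* Assumed defaults, picked only to match the behaviour described above. */
void tables_init(TokenizerTables *t) {
    set_start_state(t, STATE_SYMBOL, 0, 255);
    set_start_state(t, STATE_WHITESPACE, 0, ' ');
    set_start_state(t, STATE_WORD, 'a', 'z');
    set_start_state(t, STATE_WORD, 'A', 'Z');
    set_start_state(t, STATE_NUMBER, '0', '9');
    set_start_state(t, STATE_QUOTE, '"', '"');
    set_start_state(t, STATE_QUOTE, '\'', '\'');
    set_start_state(t, STATE_COMMENT, '/', '/');

    set_word_chars(t, false, 0, 255);
    set_word_chars(t, true, 'a', 'z');
    set_word_chars(t, true, 'A', 'Z');
    set_word_chars(t, true, '0', '9');
    set_word_chars(t, true, '-', '-');
    set_word_chars(t, true, '\'', '\'');
}

/* Length of the Word at the start of s, or 0 if s does not start a Word. */
size_t word_length(const TokenizerTables *t, const char *s) {
    size_t n = 1;
    if (s[0] == '\0' || t->start_state[(unsigned char)s[0]] != STATE_WORD) {
        return 0;
    }
    while (s[n] != '\0' && t->word_char[(unsigned char)s[n]]) {
        n++;
    }
    return n;
}
```

With these assumed defaults, word_length(&t, "blast-off!") is 9, and it drops to 5 after set_word_chars(&t, false, '-', '-'), mirroring the dash example above. Making '_' a word char alone is not enough for "_foo" to be a Word: the start state for '_' must also be changed, which is the distinction between the setWordChars and setTokenizerState calls.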
Parsing
ParseKit also includes a collection of token parser subclasses (of the abstract PKParser class) including collection parsers such as PKAlternation, PKSequence, and PKRepetition as well as terminal parsers including PKWord, PKNum, PKSymbol, PKQuotedString, etc. Also included are parser subclasses which work on individual chars such as PKChar, PKDigit, and PKSpecificChar. These char parsers are useful for things like RegEx parsing. Generally speaking though, the token parsers will be more useful and interesting.
The parser classes represent a Composite pattern. Programs can build a composite parser in Objective-C (rather than in a separate language, as with lex and yacc) from a collection of terminal parsers composed into alternations, sequences, and repetitions to represent an infinite number of languages.
Parsers built from ParseKit are non-deterministic, recursive descent parsers, which basically means they trade some performance for ease of user programming and simplicity of implementation.
Here is an example of how one might build a parser for a simple voice-search command language (note: ParseKit does not include any kind of speech recognition technology). The language consists of:
search google for? <search-term>
...
[self parseString:@"search google 'iphone'"];
...
- (void)parseString:(NSString *)s {
    PKSequence *parser = [PKSequence sequence];
    [parser add:[[PKLiteral literalWithString:@"search"] discard]];
    [parser add:[[PKLiteral literalWithString:@"google"] discard]];
    PKAlternation *optionalFor = [PKAlternation alternation];
    [optionalFor add:[PKEmpty empty]];
    [optionalFor add:[PKLiteral literalWithString:@"for"]];
    [parser add:[optionalFor discard]];
    PKParser *searchTerm = [PKQuotedString quotedString];
    [searchTerm setAssembler:self selector:@selector(workOnSearchTermAssembly:)];
    [parser add:searchTerm];
    PKAssembly *result = [parser bestMatchFor:[PKTokenAssembly assemblyWithString:s]];
    NSLog(@" %@", result);
    // output:
    // ['iphone']search/google/'iphone'^
}
...
- (void)workOnSearchTermAssembly:(PKAssembly *)a {
    PKToken *t = [a pop]; // a QuotedString token with a stringValue of 'iphone'
    [self doGoogleSearchForTerm:t.stringValue];
}
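To make the "non-deterministic" composite matching described in this section a little more concrete, here is a small framework-free model in plain C. It is only a sketch under stated assumptions, not ParseKit's implementation: Parser, PositionSet, parser_match, parser_best_match and voice_search_best_match are hypothetical names, tokens are plain C strings, and a bare "number of tokens consumed" stands in for the richer assembly objects (with stacks and callbacks) that the real framework tracks. The idea it illustrates, following Metsker's book, is that each parser maps a set of possible states to a new set of possible states, so every alternative is explored in parallel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 8

typedef enum {
    P_LITERAL,      /* matches one token equal to `literal` */
    P_ANY,          /* matches any single token */
    P_EMPTY,        /* matches without consuming anything */
    P_SEQUENCE,     /* children must match one after another */
    P_ALTERNATION,  /* any one child may match */
    P_REPETITION    /* children[0] may match zero or more times */
} ParserKind;

typedef struct Parser {
    ParserKind kind;
    const char *literal;                          /* used by P_LITERAL only */
    const struct Parser *children[MAX_CHILDREN];  /* used by composites */
    size_t child_count;
} Parser;

/* Bit p set means "some parse path has consumed exactly p tokens". */
typedef unsigned int PositionSet;

PositionSet parser_match(const Parser *p, const char **tokens, size_t count,
                         PositionSet states) {
    PositionSet out = 0;
    size_t i;
    switch (p->kind) {
    case P_LITERAL:
    case P_ANY:
        for (i = 0; i < count; i++) {
            if ((states & (1u << i)) == 0) continue;
            if (p->kind == P_ANY || strcmp(tokens[i], p->literal) == 0) {
                out |= 1u << (i + 1);
            }
        }
        return out;
    case P_EMPTY:
        return states;
    case P_SEQUENCE:
        out = states;
        for (i = 0; i < p->child_count; i++) {
            out = parser_match(p->children[i], tokens, count, out);
        }
        return out;
    case P_ALTERNATION:
        for (i = 0; i < p->child_count; i++) {
            out |= parser_match(p->children[i], tokens, count, states);
        }
        return out;
    case P_REPETITION: {
        PositionSet frontier = states;
        assert(p->child_count == 1);
        out = states;                 /* zero repetitions is always allowed */
        while (frontier != 0) {       /* stop once no new positions appear */
            PositionSet next = parser_match(p->children[0], tokens, count, frontier);
            frontier = next & ~out;
            out |= frontier;
        }
        return out;
    }
    }
    return 0;
}

/* Largest number of tokens any path consumed, or -1 if nothing matched. */
int parser_best_match(const Parser *p, const char **tokens, size_t count) {
    PositionSet out;
    int best = -1;
    size_t i;
    assert(count < 32);               /* positions must fit in the bitset */
    out = parser_match(p, tokens, count, 1u);
    for (i = 0; i <= count; i++) {
        if (out & (1u << i)) best = (int)i;
    }
    return best;
}

/* The voice-search grammar from above: search google for? <search-term> */
int voice_search_best_match(const char **tokens, size_t count) {
    Parser search = { P_LITERAL, "search", { NULL }, 0 };
    Parser google = { P_LITERAL, "google", { NULL }, 0 };
    Parser empty = { P_EMPTY, NULL, { NULL }, 0 };
    Parser for_word = { P_LITERAL, "for", { NULL }, 0 };
    Parser optional_for = { P_ALTERNATION, NULL, { NULL }, 2 };
    Parser term = { P_ANY, NULL, { NULL }, 0 };   /* stand-in for a quoted string */
    Parser command = { P_SEQUENCE, NULL, { NULL }, 4 };
    optional_for.children[0] = &empty;
    optional_for.children[1] = &for_word;
    command.children[0] = &search;
    command.children[1] = &google;
    command.children[2] = &optional_for;
    command.children[3] = &term;
    return parser_best_match(&command, tokens, count);
}
```

For the tokens search, google, for, 'iphone', two paths survive in this model: one where the optional "for" matched and 'iphone' is the search term (4 tokens consumed), and one where the empty alternative was taken and "for" itself was consumed as the term (3 tokens consumed). Keeping both alive and picking the best at the end is the trade of performance for simplicity mentioned above.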