ParseKit: Cocoa Objective-C Framework for parsing, tokenizing and language processing

ParseKit

ParseKit is a Mac OS X Framework written by Todd Ditchendorf in Objective-C and released under the Apache 2 Open Source License. ParseKit is suitable for use on Mac OS X Leopard and later or iOS. ParseKit is heavily influenced by ANTLR by Terence Parr and “Building Parsers with Java” by Steven John Metsker. Also, ParseKit depends on MGTemplateEngine by Matt Gemmell for its templating features.

The ParseKit Framework offers 3 basic services of general interest to Cocoa developers:

  1. String Tokenization via the Objective-C PKTokenizer and PKToken classes.
  2. High-Level Language Parsing via Objective-C – An Objective-C parser-building API (the PKParser class and subclasses).
  3. Objective-C Parser Generation via Grammars – Generate Objective-C source code for a parser for your custom language using a BNF-style grammar syntax (similar to yacc or ANTLR). While parsing, the parser will provide callbacks to your Objective-C code.

The ParseKit source code is available on Github.

More documentation:

  • Instructions for including ParseKit in your iOS app
  • Instructions for including ParseKit in your OS X app
  • Doxygen-generated Header Docs

Projects using ParseKit:

  • SQLite Professional: Mac SQLite tool by Kyle Hankinson
  • Base: Mac SQLite tool by Ben Barnett
  • SQL Client: Microsoft SQL tool for OS X by Kyle Hankinson
  • TaskPaper for iPhone: Simple to-do lists app by Jesse Grosjean
  • Worqshop: Development environment for iOS with GitHub support by Donny Kurniawan
  • JSTalk: Interprocess Cocoa scripting with JavaScript by Gus Mueller
  • Spike: A Rails log file viewer/analyzer by Matt Mower
  • BayesianKit: A Cocoa framework implementing a bayesian classifier by Samuel Mendes
  • Cocoa ODBC Framework: A Cocoa framework for ODBC access by Mikael Hakman
  • Objective-J Port of ParseKit by Ross Boucher
  • HTTP Client: HTTP debugging/testing tool
  • Fluid: Site-Specific Browser for Mac OS X
  • Cruz: Social Browser for Mac OS X
  • Fake: A Recordable/Automated Browser for Mac OS X
  • Shapes: Simple, Elegant Diagramming tool for Mac OS X
  • OkudaKit: Syntax Highlighting Framework for Mac OS X
  • Exedore: XPath 1.0 implemented in Cocoa (ported from Saxon)

Xcode Project

The ParseKit Xcode project consists of 7 targets:

  1. ParseKit : the ParseKit Objective-C framework. The central feature/codebase of this project.
  2. libParseKit : the ParseKit Framework as a static library for Mac OS X applications.
  3. libParseKitMobile : the ParseKit Framework as a static library for iOS applications.
  4. ParserGenApp : a simple Mac app that can convert your ParseKit grammars into Objective-C parser source code.
  5. Tests : a UnitTest Bundle containing hundreds of unit tests (or more correctly, interaction tests) for the framework as well as some example classes that serve as real-world uses of the framework.
  6. DemoApp : a simple Cocoa demo app that gives a visual presentation of the results of tokenizing text using the PKTokenizer class.
  7. DebugApp : a simple Cocoa app that exists only to run arbitrary test code thru GDB with breakpoints for debugging (I was not able to do that with the UnitTest bundle).

ParseKit Framework


Tokenization

The API for tokenization is provided by the PKTokenizer class. Cocoa developers will be familiar with the NSScanner class provided by the Foundation Framework which provides a similar service. However, the PKTokenizer class is simpler and more powerful for many use cases.

Example usage:

NSString *s = @"\"It's 123 blast-off!\", she said, // watch out!\n"
              @"and <= 3.5 'ticks' later /* wince */, it's blast-off!";
PKTokenizer *t = [PKTokenizer tokenizerWithString:s];

PKToken *eof = [PKToken EOFToken];
PKToken *tok = nil;

while ((tok = [t nextToken]) != eof) {
    NSLog(@" (%@)", tok);
}

outputs:

 ("It's 123 blast-off!")
 (,)
 (she)
 (said)
 (,)
 (and)
 (<=)
 (3.5)
 ('ticks')
 (later)
 (,)
 (it's)
 (blast-off)
 (!)

Each token produced is an object of class PKToken. PKTokens have a tokenType (Word, Symbol, Number, QuotedString, etc.) and both a stringValue and a floatValue.
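
As a small illustration of these token properties, the following sketch sums every Number token in a string and collects the Word tokens. It assumes the isNumber and isWord convenience accessors declared in PKToken.h; the helper name SumNumberTokens and the sample input are hypothetical and not part of ParseKit.

```objective-c
#import <Foundation/Foundation.h>
#import <ParseKit/ParseKit.h>

// Hypothetical helper (not part of ParseKit): adds up the floatValue of every
// Number token in `s` and appends the stringValue of every Word token to `words`.
static double SumNumberTokens(NSString *s, NSMutableArray *words) {
    PKTokenizer *t = [PKTokenizer tokenizerWithString:s];
    PKToken *eof = [PKToken EOFToken];
    PKToken *tok = nil;
    double sum = 0.0;

    while ((tok = [t nextToken]) != eof) {
        if (tok.isNumber) {
            sum += tok.floatValue;
        } else if (tok.isWord) {
            [words addObject:tok.stringValue];
        }
        // Symbol and Quoted String tokens are skipped in this sketch.
    }
    return sum;
}
```

With the default tokenizer configuration, SumNumberTokens(@"2 apples and 3.5 pears", words) should return 5.5 and leave apples, and, pears in words.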

More information about a token can be easily discovered using the -debugDescription method instead of the default -description. Replace the line containing NSLog above with this line:

NSLog(@"%@", [tok debugDescription]);

and each token’s type will be printed as well:

 <Quoted String «"It's 123 blast-off!"»>
 <Symbol «,»>
 <Word «she»>
 <Word «said»>
 <Symbol «,»>
 <Word «and»>
 <Symbol «<=»>
 <Number «3.5»>
 <Quoted String «'ticks'»>
 <Word «later»>
 <Symbol «,»>
 <Word «it's»>
 <Word «blast-off»>
 <Symbol «!»>

As you can see from the output, PKTokenizer is configured by default to properly group characters into tokens including:

  • single- and double-quoted string tokens
  • common multiple character symbols (<=)
  • apostrophes, dashes and other symbol chars that should not signal the start of a new Symbol token, but rather be included in the current Word or Number token (it's, blast-off, 3.5)
  • silently ignoring C- and C++-style comments
  • silently ignoring whitespace

The PKTokenizer class is very flexible, and all of those features are configurable. PKTokenizer may be configured to:

  • recognize more (or fewer) multi-char symbols. ex:
    [t.symbolState add:@"!="];

    allows != to be recognized as a single Symbol token rather than two adjacent Symbol tokens

  • add new internal symbol chars to be included in the current Word token OR recognize internal symbols like apostrophe and dash to actually signal a new Symbol token rather than being part of the current Word token. ex:
    [t.wordState setWordChars:YES from:'_' to:'_'];

    allows Word tokens to contain internal underscores

    [t.wordState setWordChars:NO from:'-' to:'-'];

    disallows Word tokens from containing internal dashes.

  • change which chars signal the start of a token of any given type. e.g.:
    [t setTokenizerState:t.wordState from:'_' to:'_'];

    allows Word tokens to start with underscore

    [t setTokenizerState:t.quoteState from:'*' to:'*'];

    allows Quoted String tokens to start with an asterisk, effectively making * a new quote symbol (like " or ')

  • turn off recognition of single-line “slash-slash” (//) comments. ex:
    [t setTokenizerState:t.symbolState from:'/' to:'/'];

    slash chars now produce individual Symbol tokens rather than causing the tokenizer to strip text until the next newline char, or to begin stripping a multi-line comment if appropriate (/*)

  • turn on recognition of “hash” (#) single-line comments. ex:
    [t setTokenizerState:t.commentState from:'#' to:'#'];
    [t.commentState addSingleLineStartSymbol:@"#"];
  • turn on recognition of “XML/HTML” (<!-- -->) multi-line comments. ex:
    [t setTokenizerState:t.commentState from:'<' to:'<'];
    [t.commentState addMultiLineStartSymbol:@"<!--" endSymbol:@"-->"];
  • report (rather than silently consume) Comment tokens. ex:
    t.commentState.reportsCommentTokens = YES; // default is NO
  • report (rather than silently consume) Whitespace tokens. ex:
    t.whitespaceState.reportsWhitespaceTokens = YES; // default is NO
  • turn on recognition of any characters (say, digits) as whitespace to be silently ignored. ex:
    [t setTokenizerState:t.whitespaceState from:'0' to:'9'];
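
Several of the options above can be combined on one tokenizer. The sketch below uses only the configuration calls shown in the list; the helper name TokenStringsForConfigText, the ":=" symbol, and the sample input are hypothetical choices made for illustration.

```objective-c
#import <Foundation/Foundation.h>
#import <ParseKit/ParseKit.h>

// Hypothetical helper (not part of ParseKit): tokenizes `s` with a customized
// PKTokenizer and returns the stringValue of each reported token.
static NSArray *TokenStringsForConfigText(NSString *s) {
    PKTokenizer *t = [PKTokenizer tokenizerWithString:s];

    // Treat ":=" as one Symbol token instead of ':' followed by '='.
    [t.symbolState add:@":="];

    // Allow underscores inside Word tokens (e.g. max_size).
    [t.wordState setWordChars:YES from:'_' to:'_'];

    // Silently strip "#" single-line comments.
    [t setTokenizerState:t.commentState from:'#' to:'#'];
    [t.commentState addSingleLineStartSymbol:@"#"];

    NSMutableArray *strings = [NSMutableArray array];
    PKToken *eof = [PKToken EOFToken];
    PKToken *tok = nil;
    while ((tok = [t nextToken]) != eof) {
        [strings addObject:tok.stringValue];
    }
    return strings;
}
```

For an input such as @"max_size := 10 # upper limit\nname", this configuration is expected to report max_size, :=, 10 and name, dropping the comment and the whitespace.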

Parsing

ParseKit also includes a collection of token parser subclasses (of the abstract PKParser class) including collection parsers such as PKAlternation, PKSequence, and PKRepetition as well as terminal parsers including PKWord, PKNum, PKSymbol, PKQuotedString, etc. Also included are parser subclasses which work on individual chars such as PKChar, PKDigit, and PKSpecificChar. These char parsers are useful for things like RegEx parsing. Generally speaking though, the token parsers will be more useful and interesting.

The parser classes represent a Composite pattern. Programs can build a composite parser, in Objective-C (rather than a separate language like with lex&yacc), from a collection of terminal parsers composed into alternations, sequences, and repetitions to represent an infinite number of languages.

Parsers built from ParseKit are non-deterministic, recursive descent parsers, which basically means they trade some performance for ease of user programming and simplicity of implementation.

Here is an example of how one might build a parser for a simple voice-search command language (note: ParseKit does not include any kind of speech recognition technology). The language consists of:

search google for? <search-term>
...

	[self parseString:@"search google 'iphone'"];
...
	
- (void)parseString:(NSString *)s {
	PKSequence *parser = [PKSequence sequence];

	[parser add:[[PKLiteral literalWithString:@"search"] discard]];
	[parser add:[[PKLiteral literalWithString:@"google"] discard]];

	PKAlternation *optionalFor = [PKAlternation alternation];
	[optionalFor add:[PKEmpty empty]];
	[optionalFor add:[PKLiteral literalWithString:@"for"]];

	[parser add:[optionalFor discard]];

	PKParser *searchTerm = [PKQuotedString quotedString];
	[searchTerm setAssembler:self selector:@selector(workOnSearchTermAssembly:)];
	[parser add:searchTerm];

	PKAssembly *result = [parser bestMatchFor:[PKTokenAssembly assemblyWithString:s]];
	
	NSLog(@" %@", result);

	// output:
	//  ['iphone']search/google/'iphone'^
}

...

- (void)workOnSearchTermAssembly:(PKAssembly *)a {
	PKToken *t = [a pop]; // a QuotedString token with a stringValue of 'iphone'
	[self doGoogleSearchForTerm:t.stringValue];
}
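
The same building blocks compose in other ways. As a sketch, the variation below accepts either of two engine names by placing a PKAlternation inside the PKSequence. The helper name MatchesSearchCommand and the "bing" literal are hypothetical, and the code assumes that bestMatchFor: returns nil when the input does not match.

```objective-c
#import <Foundation/Foundation.h>
#import <ParseKit/ParseKit.h>

// Hypothetical helper (not part of ParseKit): returns YES if `s` begins with
//   search (google | bing) <quoted-search-term>
static BOOL MatchesSearchCommand(NSString *s) {
    PKSequence *parser = [PKSequence sequence];
    [parser add:[[PKLiteral literalWithString:@"search"] discard]];

    // Either engine name is acceptable at this position.
    PKAlternation *engine = [PKAlternation alternation];
    [engine add:[PKLiteral literalWithString:@"google"]];
    [engine add:[PKLiteral literalWithString:@"bing"]];
    [parser add:engine];

    [parser add:[PKQuotedString quotedString]];

    // Assumption: a nil result means no match was found. Tokens left over
    // after the quoted search term are not treated as an error here.
    PKAssembly *result = [parser bestMatchFor:[PKTokenAssembly assemblyWithString:s]];
    return result != nil;
}
```

Under that assumption, MatchesSearchCommand(@"search bing 'iphone'") should return YES, while an input naming an engine outside the alternation, such as @"search yahoo 'iphone'", should return NO.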
