Struct xxcalc::tokenizer::Tokenizer

pub struct Tokenizer { /* fields omitted */ }

Tokenizer performs the very first step of parsing a mathematical expression: splitting it into Tokens. These tokens can then be processed by a TokensProcessor.

Tokenizer is a state machine that can be reused across many inputs. Internally it stores a buffer of Tokens, so repeated calls can reuse that buffer without requesting new memory from the operating system. If a Tokenizer lives long enough, this behaviour can greatly reduce the time spent on memory allocation.
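The buffer reuse described above can be illustrated with a minimal, self-contained sketch. ReusableBuffer and its tokens field are hypothetical names invented for this illustration and are not part of xxcalc; the sketch only shows the general pattern of clearing a Vec between calls so that its allocated capacity survives.

```rust
/// Hypothetical stand-in for a tokenizer that owns its output buffer.
/// This is NOT the xxcalc implementation, only an illustration of the pattern.
struct ReusableBuffer {
    tokens: Vec<(usize, char)>,
}

impl ReusableBuffer {
    /// Pre-allocates room for 10 entries (an assumed starting capacity).
    fn new() -> Self {
        ReusableBuffer { tokens: Vec::with_capacity(10) }
    }

    /// Refills the buffer with one (position, char) entry per
    /// non-whitespace character of `line`.
    fn process(&mut self, line: &str) -> &[(usize, char)] {
        // `clear` drops the old entries but keeps the allocated capacity,
        // so a later call of similar size needs no new allocation.
        self.tokens.clear();
        for (pos, ch) in line.chars().enumerate() {
            if !ch.is_whitespace() {
                self.tokens.push((pos, ch));
            }
        }
        &self.tokens
    }
}
```

Calling process repeatedly on one ReusableBuffer only allocates again when an input needs more entries than any earlier one did.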

Examples

let mut tokenizer = Tokenizer::default();

{
  let tokens = tokenizer.process("2.0+2");
  assert_eq!(tokens[0], (0, Token::Number(2.0)));
  assert_eq!(tokens[1], (3, Token::Operator('+')));
  assert_eq!(tokens[2], (4, Token::Number(2.0)));
}

{
  let tokens = tokenizer.process("x+log10(100)+x");
  assert_eq!(tokens[0], (0, Token::Identifier(0)));
  assert_eq!(tokens.identifiers[0], "x");
  assert_eq!(tokens[1], (1, Token::Operator('+')));
  assert_eq!(tokens[2], (2, Token::Identifier(1)));
  assert_eq!(tokens.identifiers[1], "log10");
  assert_eq!(tokens[3], (7, Token::BracketOpening));
  assert_eq!(tokens[4], (8, Token::Number(100.0)));
  assert_eq!(tokens[5], (11, Token::BracketClosing));
  assert_eq!(tokens[6], (12, Token::Operator('+')));
  assert_eq!(tokens[7], (13, Token::Identifier(0)));
}

Trait Implementations

impl Default for Tokenizer

Creates a new default Tokenizer.

Such a tokenizer is optimized for (but not limited to) values of up to 10 characters and expressions of up to 10 tokens. These are only the default capacities; the buffers grow dynamically when needed.

Returns the "default value" for a type.

impl StringProcessor for Tokenizer

This is the main processing unit of the tokenizer. It takes a string expression and, using a state machine, creates a list of tokens representing that string.

This tokenizer supports floating point numbers in traditional and scientific notation (as well as shorthand point notation), text identifiers, and the operators +, -, *, /, ^ and =. Parentheses () and the comma , are supported too. Whitespace is always skipped, and unrecognized characters are wrapped into an Unknown token.

Signed numbers are detected when the sign cannot be mistaken for the + or - operator. Implicit multiplication before an identifier or an opening parenthesis is replaced with explicit multiplication using the * operator.
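To make the rules above concrete, here is a simplified, self-contained sketch of a scanner with similar behaviour. It is not the xxcalc implementation: the Tok enum and the tokenize function are hypothetical, identifiers are stored as strings rather than indices, and scientific notation and signed numbers are left out. It also assumes that implicit multiplication is only inserted directly after a number, and that the inserted operator takes the position of the token that follows it.

```rust
/// Hypothetical token type for this sketch (not xxcalc's Token).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Number(f64),
    Identifier(String),
    Operator(char),
    BracketOpening,
    BracketClosing,
    Separator,
    Unknown(char),
}

/// Splits `input` into (position, token) pairs.
fn tokenize(input: &str) -> Vec<(usize, Tok)> {
    let chars: Vec<char> = input.chars().collect();
    let mut out: Vec<(usize, Tok)> = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1; // whitespace is always skipped
            continue;
        }
        let start = i;
        let tok = if c.is_ascii_digit() || c == '.' {
            // Consume a run of digits and dots; ".5" is accepted as 0.5.
            while i < chars.len() && (chars[i].is_ascii_digit() || chars[i] == '.') {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            match text.parse::<f64>() {
                Ok(value) => Tok::Number(value),
                // A malformed run such as "1.2.3" is reported by its first char.
                Err(_) => Tok::Unknown(c),
            }
        } else if c.is_alphabetic() {
            // Identifiers start with a letter and may contain digits ("log10").
            while i < chars.len() && chars[i].is_alphanumeric() {
                i += 1;
            }
            Tok::Identifier(chars[start..i].iter().collect())
        } else {
            i += 1;
            match c {
                '+' | '-' | '*' | '/' | '^' | '=' => Tok::Operator(c),
                '(' => Tok::BracketOpening,
                ')' => Tok::BracketClosing,
                ',' => Tok::Separator,
                other => Tok::Unknown(other),
            }
        };
        // Assumed rule: a number directly followed by an identifier or an
        // opening bracket gets an explicit '*' inserted in between.
        let starts_operand = matches!(tok, Tok::Identifier(_) | Tok::BracketOpening);
        let after_number = matches!(out.last(), Some((_, Tok::Number(_))));
        if starts_operand && after_number {
            out.push((start, Tok::Operator('*')));
        }
        out.push((start, tok));
    }
    out
}
```

With this sketch, tokenize("2x") yields a number, an inserted * operator and the identifier x, while tokenize("log10(100)") inserts nothing between the identifier and the opening bracket.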

Extending

New features can be added to the tokenizer in one of two ways: by embedding this tokenizer in a new one that replaces Unknown tokens with other tokens, or by implementing a TokensProcessor that takes the output of this tokenizer and replaces Unknown tokens, or some combination of tokens, with other ones.
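The second approach, rewriting Unknown tokens after tokenization, might look roughly like the sketch below. It deliberately does not use the real TokensProcessor trait, whose exact signature is not shown here; the Token enum is a reduced, hypothetical copy of the token type (assuming Unknown carries the offending character), and promote_percent is an invented helper that turns an unrecognized % into an operator.

```rust
/// Reduced, hypothetical copy of the token type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Operator(char),
    Unknown(char),
}

/// Rewrites every Unknown('%') into Operator('%') in place, leaving
/// positions and all other tokens untouched.
fn promote_percent(tokens: &mut [(usize, Token)]) {
    for (_, token) in tokens.iter_mut() {
        if *token == Token::Unknown('%') {
            *token = Token::Operator('%');
        }
    }
}
```

A later evaluation stage would still have to define what the new % operator means; this step only changes how the character is classified.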

State machine

The complete, hand-designed state machine used by this StringProcessor can be seen in the image below:

Tokenizer State Machine