

Subject: Re: Combining tokens
From: Istvan Sandor
Date: Mon, 15 Mar 2010 11:36:03 +0100

On Mon, Mar 15, 2010 at 10:57:11AM +0100, Søren Andersen wrote:
> Consider a language with all the normal expressions - you can add,
> subtract, multiply, etc.  Now, you'd like for the user to be able to
> define his own operators - for instance, '+?' or something like that.
> In order to help with ambiguities, you decide these user defined
> operators must be at least 2 "elements" long (I'm specifically NOT
> using the word "tokens" here for reasons to become clear).  So, you'll
> allow '++' and '-+', etc.
> Now, the problem is that this still ends in shift / reduce conflicts -
> mainly because if you write this naturally:
> UserOp = PossOp PossOp*; PossOp = '+' | '-' | '*' | ....;

I think your problem is much more easily solved at the lexer level
than at the grammar level. That is, you can handle every user-defined
operator as *one* token at the grammar level, like this:

expr: expr builtin_op expr
    | expr USEROP expr
    ;

builtin_op: '+' | '-' | '*' ;

Here the builtin_ops are the "usual" operators, represented by their
own character values, and user-defined operators are represented by the
USEROP token. You would then set up the lexer so that when it sees a
builtin_op (a single '+', '-' or '*') it returns the character itself,
and when it sees a user-defined operator it returns USEROP. USEROP's
semantic value can then hold the operator's real meaning, for example
as a symbol table pointer:

%token <symptr> USEROP
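To make the lexer side concrete, here is a minimal hand-rolled sketch in C.
A real setup would typically use flex and the Bison-generated token header;
the token codes, the operator-character set, and the buffer handling below
are illustrative assumptions, not anyone's actual implementation:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative token codes; in a real Bison parser these come from the
   generated header, and USEROP's semantic value would be a symbol table
   pointer rather than just the spelling. */
enum { TOK_EOF = 0, TOK_USEROP = 256 };

static const char OP_CHARS[] = "+-*";   /* characters operators are built from */

/* Scan one token from *pp, advancing the pointer.
   A single operator character is returned as itself (like Bison's
   single-character tokens); a run of two or more operator characters
   becomes TOK_USEROP, with the spelling copied into 'spelling'. */
static int next_token(const char **pp, char *spelling, size_t cap)
{
    const char *p = *pp;
    while (isspace((unsigned char)*p)) p++;   /* whitespace separates tokens */
    if (*p == '\0') { *pp = p; return TOK_EOF; }

    if (strchr(OP_CHARS, *p)) {
        size_t n = 0;
        while (*p && strchr(OP_CHARS, *p)) {  /* consume a maximal run */
            if (n + 1 < cap) spelling[n++] = *p;
            p++;
        }
        spelling[n] = '\0';
        *pp = p;
        /* one char: built-in operator, returned as the character itself;
           two or more: user-defined operator, returned as TOK_USEROP */
        return (n == 1) ? (unsigned char)spelling[0] : TOK_USEROP;
    }

    /* anything else (names, digits, ...): return the character itself */
    spelling[0] = *p;
    spelling[1] = '\0';
    *pp = p + 1;
    return (unsigned char)*p;
}
```

With this, the grammar above never sees the individual '+' characters of a
user-defined operator; it only ever sees one USEROP token per operator.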

> I realize that what I'm asking is... Somewhat unorthodox. :) But is it
> possible?

In fact it's not unorthodox at all :-) Many languages allow the
definition of new operators.

> In either Bison or another system? It would seem a
> relatively simple change to make as you basically just need to turn
> whitespace-awareness back on for some rules to disallow whitespace
> inside them?

Whitespace is also usually handled by the lexer; bison doesn't care
about whitespace or non-whitespace characters per se, it cares about
tokens, and tokens are generated by the lexer. Of course you can have
your lexer selectively stop ignoring whitespace for some parts of your
grammar, but I don't think that would be a good solution to this
particular problem.
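The whitespace point can be made concrete: because the lexer skips
whitespace and then consumes a maximal run of operator characters,
'+ +' yields two '+' tokens while '++' yields one user-defined
operator, with no grammar-level whitespace awareness needed. A tiny
sketch of just that counting behaviour (the operator-character set is
an illustrative assumption):

```c
#include <ctype.h>
#include <string.h>

/* Count how many tokens a maximal-munch lexer would produce.
   Whitespace only separates tokens; a maximal run of operator
   characters counts as one token; any other character counts as
   its own token. */
static int count_tokens(const char *s)
{
    int count = 0;
    while (*s) {
        if (isspace((unsigned char)*s)) { s++; continue; }
        if (strchr("+-*", *s)) {
            while (*s && strchr("+-*", *s)) s++;  /* one run = one token */
        } else {
            s++;                                  /* one char = one token */
        }
        count++;
    }
    return count;
}
```

So "++" is a single token here, while "+ +" is two, which is exactly
the distinction the original question wanted the parser to make.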

I hope this short explanation was clear; if not, I can explain in more
detail :-)


