Since writing a compiler is difficult, we need to structure the work. The standard way to do this is to split compilation into several phases with well-defined interfaces. Conceptually, these phases operate in sequence (although in practice they are often interleaved): each phase except the first takes the output of the previous phase as its input. Usually, each phase is handled by a separate module. Some of these modules are written by hand, while others can be generated from specifications. Often, some modules can be shared between multiple compilers. The general division into phases is described below. In some compilers, the order of the phases may be slightly different, some phases may be combined or split into several, or additional phases may be inserted between those mentioned below.
- Lexical analysis. This is the initial part of reading and analyzing the program text: the text is read and divided into tokens, each of which corresponds to a symbol in the programming language, for example, a variable name, keyword, or number.
- Syntax analysis. The list of tokens produced by lexical analysis is arranged into a tree structure (called the syntax tree) that reflects the structure of the program. This phase is often referred to as parsing.
- Type checking. This phase analyzes the syntax tree to determine whether the program violates certain consistency requirements, for example, whether a variable is used but not declared, or is used in a context that makes no sense given its type, such as trying to use a boolean value as a function pointer.
- Intermediate code generation. The program is translated into a simple, machine-independent intermediate language.
- Register allocation. The symbolic variable names used in the intermediate code are translated to numbers, each of which corresponds to a register in the target machine code.
- Machine code generation. The intermediate language is translated into assembly language (a textual representation of machine code) for a specific machine architecture.
- Assembly and linking. The assembly-language code is translated into a binary representation, and the addresses of variables, functions, etc. are determined.
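To make the intermediate-code phase more concrete, here is a minimal sketch (not part of the calculator) that turns a small syntax tree into three-address code, a common form of machine-independent intermediate language. The Node type, the helpers num, bin and gen, and the temporary names t1, t2, ... are hypothetical; in a real compiler the register allocator would later map those temporaries to machine registers.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical syntax-tree node: a number leaf (op == 0) or a binary operator.
struct Node {
    char op = 0;
    int value = 0;
    std::shared_ptr<Node> left, right;
};

std::shared_ptr<Node> num(int value) {
    auto n = std::make_shared<Node>();
    n->value = value;
    return n;
}

std::shared_ptr<Node> bin(char op, std::shared_ptr<Node> l, std::shared_ptr<Node> r) {
    auto n = std::make_shared<Node>();
    n->op = op;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Appends three-address instructions for `n` to `code` (operands first, in
// post-order) and returns the name that holds its value: either a literal
// or a fresh temporary t1, t2, ...
std::string gen(const Node& n, std::vector<std::string>& code, int& lastTemp) {
    if (n.op == 0) return std::to_string(n.value);
    std::string lhs = gen(*n.left, code, lastTemp);
    std::string rhs = gen(*n.right, code, lastTemp);
    std::string temp = "t" + std::to_string(++lastTemp);
    code.push_back(temp + " = " + lhs + " " + n.op + " " + rhs);
    return temp;
}
```

For the tree of 2 + 3 * 4, built as bin('+', num(2), bin('*', num(3), num(4))), the code vector ends up holding t1 = 3 * 4 followed by t2 = 2 + t1, and gen returns t2.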
The first three phases are collectively called the front end of the compiler, and the last three phases are collectively called the back end. In this context, the middle part of the compiler is only the intermediate code generation, but this often includes various optimizations and transformations of the intermediate code. Each phase, through checking and transformation, establishes stronger invariants on the data it passes to the next, so that each subsequent phase is easier to write than if it had to take all the preceding ones into account. For example, a type checker can assume that there are no syntax errors, and code generation can assume that there are no type errors. Assembly and linking are usually done by programs supplied by the vendor of the machine or operating system.
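As an illustration of these invariants, here is a minimal sketch of a separate type-checking pass over a calculator-like expression tree. It assumes a hypothetical Expr type and the helper names literal, binary, absOf, promote and check. The promotion rule (int, then float, then double) mirrors what the calculator grammar later in this article encodes rule by rule, and abs() accepting only integers mirrors its ABS rule. Once check succeeds, a later phase can rely on every node having a known type.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Result types, ordered from narrowest to widest.
enum class Type { Int = 0, Float = 1, Double = 2 };

// Hypothetical expression tree: a typed literal, a binary operator, or abs(x).
struct Expr {
    enum class Kind { Literal, Binary, Abs };
    Kind kind = Kind::Literal;
    Type literalType = Type::Int;   // meaningful for Literal nodes
    char op = 0;                    // meaningful for Binary nodes ('+', '*', ...)
    std::shared_ptr<Expr> lhs;      // operand of Abs, left operand of Binary
    std::shared_ptr<Expr> rhs;      // right operand of Binary
};

std::shared_ptr<Expr> literal(Type t) {
    auto e = std::make_shared<Expr>();
    e->literalType = t;
    return e;
}

std::shared_ptr<Expr> binary(char op, std::shared_ptr<Expr> l, std::shared_ptr<Expr> r) {
    auto e = std::make_shared<Expr>();
    e->kind = Expr::Kind::Binary;
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

std::shared_ptr<Expr> absOf(std::shared_ptr<Expr> arg) {
    auto e = std::make_shared<Expr>();
    e->kind = Expr::Kind::Abs;
    e->lhs = std::move(arg);
    return e;
}

// Mixed operands are promoted to the wider type (int + float gives float, etc.).
Type promote(Type a, Type b) {
    return static_cast<int>(a) >= static_cast<int>(b) ? a : b;
}

// Computes the type of a node from its children, or throws on a type error.
Type check(const Expr& e) {
    switch (e.kind) {
    case Expr::Kind::Literal:
        return e.literalType;
    case Expr::Kind::Binary:
        return promote(check(*e.lhs), check(*e.rhs));
    case Expr::Kind::Abs:
        if (check(*e.lhs) != Type::Int)
            throw std::invalid_argument("abs() expects an integer argument");
        return Type::Int;
    }
    throw std::logic_error("unknown node kind");
}
```

Reporting abs(2.5) as a type error in a dedicated pass, rather than as a syntax error, is exactly the kind of separation of concerns described above.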
In this article we will develop an extended version of a calculator, which will later serve as the basis for a full-fledged compiler front end. For this, we need Linux, flex, bison, a C++ compiler (g++), and make.
Lex is a computer program that generates lexical analyzers (“scanners” or “lexers”). It is commonly used with the yacc parser generator. Lex, originally written by Mike Lesk and Eric Schmidt and described in 1975, is the standard lexical analyzer generator on many Unix systems, and an equivalent tool is specified as part of the POSIX standard. Lex reads an input stream specifying the lexical analyzer and outputs source code implementing the lexer in the C programming language; some old versions of Lex could also generate a lexer in Ratfor. This article uses flex, a free reimplementation of Lex, which with %option c++ can also generate a C++ scanner class.

In computer science, lexical analysis (lexing or tokenization) is the process of converting a sequence of characters, such as a computer program or a web page, into a sequence of tokens (strings with an assigned and thus identified meaning). A program that performs lexical analysis may be called a lexer, tokenizer, or scanner, although “scanner” is also the term for the first stage of a lexer. A lexer is generally combined with a parser; together they analyze the syntax of programming languages, web pages, and so forth.
Lexical analysis is the first phase of a compiler. It takes the source code, possibly already modified by language preprocessors, and breaks it into a series of tokens, removing any whitespace and comments along the way.

The lexical analyzer works closely with the syntax analyzer: it reads the character stream from the source code, checks for legal tokens, and passes them to the syntax analyzer on demand. If it finds an invalid token, it reports an error. A lexeme is the sequence of characters in the source that is matched as a token, for example 42 or sin. Each kind of token has predefined rules, expressed as a pattern, that decide whether a lexeme is a valid instance of it; these patterns are usually written as regular expressions. In a programming language, keywords, constants, identifiers, strings, numbers, operators, and punctuation symbols can all be tokens.
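The way flex chooses between patterns (the longest match wins, and a tie goes to the rule listed first) can be sketched by hand with std::regex. The token names and patterns below are copied from a few of the calculator's scanner rules; Token and tokenize are hypothetical names. Flex itself compiles all patterns into a single DFA instead of trying them one by one, and std::regex's ECMAScript grammar picks the first matching alternative rather than the longest, which happens not to matter for these patterns.

```cpp
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Token {
    std::string kind;    // token name, e.g. "INT"
    std::string lexeme;  // the characters that matched
};

// Splits `input` into tokens using flex-style rule selection: the longest
// match wins, and a tie goes to the rule listed first. Blanks and tabs are skipped.
std::vector<Token> tokenize(const std::string& input) {
    using std::regex;
    static const std::vector<std::pair<std::string, regex>> rules = {
        {"FLOAT",    regex("[0-9]+\\.[0-9]+")},
        {"INT",      regex("[0-9]+")},
        {"DOUBLE",   regex("[0-9]+(\\.([0-9]+)?([eE][-+]?[0-9]+)?|[eE][-+]?[0-9]+)")},
        {"PLUS",     regex("\\+")},
        {"MINUS",    regex("-")},
        {"LEFTPAR",  regex("\\(")},
        {"RIGHTPAR", regex("\\)")},
        {"EQUAL",    regex("=")},
        {"SIN",      regex("sin", regex::icase)},
        {"SINH",     regex("sinh", regex::icase)},
    };
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        if (input[pos] == ' ' || input[pos] == '\t') { ++pos; continue; }
        const std::string* bestKind = nullptr;
        std::size_t bestLength = 0;
        for (const auto& rule : rules) {
            std::smatch match;
            auto start = input.begin() + static_cast<std::ptrdiff_t>(pos);
            // match_continuous: the match must start exactly at `start`.
            if (std::regex_search(start, input.end(), match, rule.second,
                                  std::regex_constants::match_continuous) &&
                static_cast<std::size_t>(match.length(0)) > bestLength) {  // strictly longer only
                bestKind = &rule.first;
                bestLength = static_cast<std::size_t>(match.length(0));
            }
        }
        if (bestKind == nullptr)
            throw std::invalid_argument("invalid character at offset " + std::to_string(pos));
        tokens.push_back({*bestKind, input.substr(pos, bestLength)});
        pos += bestLength;
    }
    return tokens;
}
```

For example, 3.14 matches both FLOAT and DOUBLE with the same length, so FLOAT (listed first) wins, while SINH beats SIN because it is longer, regardless of rule order.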
scanner.l:

```lex
%{
#include <iostream>
#include "scanner.h"
#include "interpreter.h"
#include "parser.hpp"
#include "location.hh"

// Value returned at end of input (used by the <<EOF>> rule below).
#define yyterminate() McubyCalc::Parser::make_END(McubyCalc::location())

// Executed before the action of every rule: advance the interpreter character counter.
#define YY_USER_ACTION driver_.stateLocation(yyleng);
%}

%option nodefault
%option noyywrap
%option c++
%option yyclass="Scanner"
%option prefix="McubyCalc_"

%%

[ \t]   { /* ignore all whitespace */ }

[0-9]+\.[0-9]+ {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner float number: " << yytext << std::endl;
    #endif
    return McubyCalc::Parser::make_FLOAT(std::stof(std::string(yytext)), McubyCalc::location());
}

[0-9]+ {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner integer number: " << yytext << std::endl;
    #endif
    return McubyCalc::Parser::make_INT(std::stoi(std::string(yytext)), McubyCalc::location());
}

[0-9]+(\.([0-9]+)?([eE][-+]?[0-9]+)?|[eE][-+]?[0-9]+) {
    // Numbers with an exponent (1e3, 2.5E-2) or a trailing dot (3.) become DOUBLE;
    // a plain 2.5 ties with the FLOAT rule above, and flex picks the earlier rule.
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner double number: " << yytext << std::endl;
    #endif
    return McubyCalc::Parser::make_DOUBLE(std::stod(std::string(yytext)), McubyCalc::location());
}

\n {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'NEWLINE'" << std::endl;
    #endif
    return McubyCalc::Parser::make_NEWLINE(McubyCalc::location());
}

"=" {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: '='" << std::endl;
    #endif
    return McubyCalc::Parser::make_EQUAL(McubyCalc::location());
}

"+" {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: '+'" << std::endl;
    #endif
    return McubyCalc::Parser::make_PLUS(McubyCalc::location());
}

"-" {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: '-'" << std::endl;
    #endif
    return McubyCalc::Parser::make_MINUS(McubyCalc::location());
}

"*" {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: '*'" << std::endl;
    #endif
    return McubyCalc::Parser::make_MULTIPLICATION(McubyCalc::location());
}

"%" {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: '%'" << std::endl;
    #endif
    return McubyCalc::Parser::make_MOD(McubyCalc::location());
}

"/" {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: '/'" << std::endl;
    #endif
    return McubyCalc::Parser::make_DIVIDE(McubyCalc::location());
}

"(" {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: '('" << std::endl;
    #endif
    return McubyCalc::Parser::make_LEFTPAR(McubyCalc::location());
}

")" {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: ')'" << std::endl;
    #endif
    return McubyCalc::Parser::make_RIGHTPAR(McubyCalc::location());
}

(?i:sin) {
    // computes sine (sin x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'sin'" << std::endl;
    #endif
    return McubyCalc::Parser::make_SIN(McubyCalc::location());
}

(?i:cos) {
    // computes cosine (cos x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'cos'" << std::endl;
    #endif
    return McubyCalc::Parser::make_COS(McubyCalc::location());
}

(?i:tan) {
    // computes tangent (tan x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'tan'" << std::endl;
    #endif
    return McubyCalc::Parser::make_TAN(McubyCalc::location());
}

(?i:asin) {
    // computes arc sine (arcsin x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'asin'" << std::endl;
    #endif
    return McubyCalc::Parser::make_ASIN(McubyCalc::location());
}

(?i:acos) {
    // computes arc cosine (arccos x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'acos'" << std::endl;
    #endif
    return McubyCalc::Parser::make_ACOS(McubyCalc::location());
}

(?i:atan) {
    // computes arc tangent (arctan x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'atan'" << std::endl;
    #endif
    return McubyCalc::Parser::make_ATAN(McubyCalc::location());
}

(?i:atan2) {
    // arc tangent, using signs to determine quadrants
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'atan2'" << std::endl;
    #endif
    return McubyCalc::Parser::make_ATAN2(McubyCalc::location());
}

"^" {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: '^'" << std::endl;
    #endif
    return McubyCalc::Parser::make_POWER(McubyCalc::location());
}

(?i:sqrt) {
    // computes square root (√x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'sqrt'" << std::endl;
    #endif
    return McubyCalc::Parser::make_SQRT(McubyCalc::location());
}

(?i:cbrt) {
    // computes cube root (∛x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'cbrt'" << std::endl;
    #endif
    return McubyCalc::Parser::make_CBRT(McubyCalc::location());
}

(?i:hypot) {
    // computes the square root of the sum of the squares of two or three (C++17)
    // given numbers: √(x² + y²), √(x² + y² + z²)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'hypot'" << std::endl;
    #endif
    return McubyCalc::Parser::make_HYPOT(McubyCalc::location());
}

(?i:exp) {
    // returns e raised to the given power (e^x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'exp'" << std::endl;
    #endif
    return McubyCalc::Parser::make_EXP(McubyCalc::location());
}

(?i:exp2) {
    // returns 2 raised to the given power (2^x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'exp2'" << std::endl;
    #endif
    return McubyCalc::Parser::make_EXP2(McubyCalc::location());
}

(?i:log) {
    // computes natural (base e) logarithm (ln x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'log'" << std::endl;
    #endif
    return McubyCalc::Parser::make_LOG(McubyCalc::location());
}

(?i:log10) {
    // computes common (base 10) logarithm (log₁₀ x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'log10'" << std::endl;
    #endif
    return McubyCalc::Parser::make_LOG10(McubyCalc::location());
}

(?i:log2) {
    // computes base 2 logarithm (log₂ x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'log2'" << std::endl;
    #endif
    return McubyCalc::Parser::make_LOG2(McubyCalc::location());
}

(?i:sinh) {
    // computes hyperbolic sine (sinh x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'sinh'" << std::endl;
    #endif
    return McubyCalc::Parser::make_SINH(McubyCalc::location());
}

(?i:cosh) {
    // computes hyperbolic cosine (cosh x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'cosh'" << std::endl;
    #endif
    return McubyCalc::Parser::make_COSH(McubyCalc::location());
}

(?i:tanh) {
    // computes hyperbolic tangent (tanh x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'tanh'" << std::endl;
    #endif
    return McubyCalc::Parser::make_TANH(McubyCalc::location());
}

(?i:asinh) {
    // computes the inverse hyperbolic sine (arsinh x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'asinh'" << std::endl;
    #endif
    return McubyCalc::Parser::make_ASINH(McubyCalc::location());
}

(?i:acosh) {
    // computes the inverse hyperbolic cosine (arcosh x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'acosh'" << std::endl;
    #endif
    return McubyCalc::Parser::make_ACOSH(McubyCalc::location());
}

(?i:atanh) {
    // computes the inverse hyperbolic tangent (artanh x)
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'atanh'" << std::endl;
    #endif
    return McubyCalc::Parser::make_ATANH(McubyCalc::location());
}

(?i:abs) {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'abs'" << std::endl;
    #endif
    return McubyCalc::Parser::make_ABS(McubyCalc::location());
}

(?i:exit) {
    #ifdef DEBUG_MCUBY_CALC
    std::cout << "Scanner: 'exit'" << std::endl;
    #endif
    return McubyCalc::Parser::make_EXIT(McubyCalc::location());
}

<<EOF>> { return yyterminate(); }

%%
```
GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser (either in C, C++, or Java) which reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar. The generated parsers are portable: they do not require any specific compilers. Bison by default generates LALR(1) parsers but it can also generate canonical LR, IELR(1), and GLR parsers.
Bison is a general-purpose parser generator that converts an annotated context-free grammar into a deterministic LR or generalized LR (GLR) parser employing LALR(1), IELR(1), or canonical LR(1) parser tables. Once you are proficient with Bison, you can use it to develop a wide range of language parsers, from those used in simple desk calculators to complex programming languages. Bison is upward compatible with Yacc: all properly-written Yacc grammars ought to work with Bison with no change. Anyone familiar with Yacc should be able to use Bison with little trouble. You need to be fluent in C, C++, or Java programming in order to use Bison.
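To give a feel for what "bottom-up" means, here is a minimal sketch of a shift-reduce loop for integer arithmetic. It is not what Bison generates: Bison builds LALR(1) tables from the grammar, whereas this sketch hard-codes the shift/reduce decision by comparing operator precedence (an operator-precedence parser), in the spirit of Bison's %left declarations. The functions precedence, reduce and shiftReduce are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Binding strength of each operator, in the spirit of two Bison declarations:
//   %left '+' '-'   (lower)
//   %left '*' '/'   (higher)
int precedence(char op) {
    return (op == '+' || op == '-') ? 1 : 2;
}

// One reduce step: replace <value> <op> <value> on top of the stack by its result.
void reduce(std::vector<long>& values, std::vector<char>& ops) {
    long rhs = values.back(); values.pop_back();
    long lhs = values.back(); values.pop_back();
    char op = ops.back(); ops.pop_back();
    switch (op) {
    case '+': values.push_back(lhs + rhs); break;
    case '-': values.push_back(lhs - rhs); break;
    case '*': values.push_back(lhs * rhs); break;
    default:  values.push_back(lhs / rhs); break;  // '/'
    }
}

// Evaluates non-negative integers combined with + - * / (no parentheses)
// bottom-up: numbers are shifted, and operators trigger reductions.
long shiftReduce(const std::string& input) {
    std::vector<long> values;   // semantic values on the parse stack
    std::vector<char> ops;      // operator symbols on the parse stack
    bool expectNumber = true;   // a number must come first and after each operator
    std::size_t i = 0;
    while (i < input.size()) {
        char c = input[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (expectNumber && std::isdigit(static_cast<unsigned char>(c))) {
            long n = 0;
            while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i])))
                n = n * 10 + (input[i++] - '0');
            values.push_back(n);  // shift
            expectNumber = false;
        } else if (!expectNumber && (c == '+' || c == '-' || c == '*' || c == '/')) {
            // The lookahead is an operator: reduce while the operator already on
            // the stack binds at least as tightly (>= gives left associativity),
            // then shift the new operator.
            while (!ops.empty() && precedence(ops.back()) >= precedence(c))
                reduce(values, ops);
            ops.push_back(c);
            ++i;
            expectNumber = true;
        } else {
            throw std::invalid_argument("syntax error");
        }
    }
    if (expectNumber) throw std::invalid_argument("syntax error");  // empty input or trailing operator
    while (!ops.empty()) reduce(values, ops);  // end of input: reduce what is left
    return values.back();
}
```

For 2 + 3 * 4 the loop shifts 2, +, 3 and *, because * binds more tightly than the + already on the stack; only at the end of input does it reduce 3 * 4 and then 2 + 12.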
Bison was originally written by Robert Corbett; Richard Stallman made it Yacc-compatible, and Wilfred Hansen of Carnegie Mellon University added multi-character string literals and other features. Since then, Bison has grown more robust and gained many new features thanks to the hard work of a long list of volunteers; for details, see the THANKS and ChangeLog files included in the Bison distribution. In POSIX mode, Bison is compatible with Yacc, but it also has several extensions over this earlier program, including:
- generation of counterexamples for conflicts,
- location tracking (e.g., file, line, column),
- rich and internationalized syntax error messages in the generated parsers,
- customizable syntax error generation,
- reentrant parsers,
- push parsers, with autocompletion,
- support for named references,
- several types of reports (graphical, XML) on the generated parser,
- support for several programming languages,
- etc.
Now we need to create the Bison file, parser.y, that describes the calculator's grammar. Instead of building an abstract syntax tree, its semantic actions compute the results directly. There is a subtle difference between using Bison with C and using it with C++: function prototypes. In C++ you have to declare yylex() and the error-reporting function yourself; in this example, yylex() is a static function in the %code top block that forwards to Scanner::get_next_token(), and errors are reported by McubyCalc::Parser::error(). In C you can get away without declaring yylex() and yyerror(), but I don't advise it, because it's bad practice.
Syntactic analysis, or parsing, determines whether a given sequence of tokens is valid in a language, that is, whether the sentence has the right shape. Not every syntactically valid sentence is meaningful, however; further semantic analysis is needed for that. For syntactic analysis itself, context-free grammars and their associated parsing techniques are powerful enough.
In syntactic analysis, parse trees show the structure of a sentence, but they often contain redundant information because some parts are implied (for example, an assignment always contains an assignment operator, so there is no need to store it). Syntax trees, which are a more compact representation, are therefore used instead. Trees are recursive structures, which complement context-free grammars (CFGs) nicely, since these are also recursive (unlike regular expressions).
Unlike lexical analysis, which is built around finite-state automata, parsing can be done with many different algorithms, and they fall into two main classes: top-down and bottom-up parsing. Bison generates bottom-up parsers. Context-free grammars can be written in Backus-Naur Form (BNF), which uses three classes of symbols: nonterminal symbols (phrases) enclosed in angle brackets <>, terminal symbols (tokens) that stand for themselves, and the metasymbol ::=, read as “is defined to be”. Because the same sentence can usually be derived in several different orders, a more abstract structure than a derivation is needed: parse trees generalize derivations and provide the structural information needed by the later stages of compilation.
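A top-down parser can be written by hand directly from a BNF grammar: each nonterminal becomes a function (recursive descent). The sketch below is a minimal illustration, not part of the calculator; it parses integer arithmetic into a syntax tree and evaluates it, and the names RecursiveDescentParser, Node, eval and show are hypothetical. Left-recursive rules such as <expr> ::= <expr> "+" <term> would make a recursive-descent function call itself forever, so they are turned into loops, which also keeps the operators left-associative.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// BNF of the accepted language (left recursion is replaced by loops below):
//   <expr>   ::= <term>   | <expr> "+" <term>   | <expr> "-" <term>
//   <term>   ::= <factor> | <term> "*" <factor> | <term> "/" <factor>
//   <factor> ::= <number> | "(" <expr> ")" | "-" <factor>

// Syntax-tree node; parentheses are not stored because the shape records grouping.
struct Node {
    char op = '#';                      // '#' number, 'n' negation, or + - * /
    long value = 0;                     // used by number leaves
    std::unique_ptr<Node> left, right;
};

class RecursiveDescentParser {
public:
    explicit RecursiveDescentParser(std::string text) : text_(std::move(text)) {}

    std::unique_ptr<Node> parse() {
        auto tree = expr();
        skipSpaces();
        if (pos_ != text_.size()) throw std::invalid_argument("unexpected input");
        return tree;
    }

private:
    std::string text_;
    std::size_t pos_ = 0;

    void skipSpaces() {
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }
    bool accept(char c) {  // consume c if it is the next non-blank character
        skipSpaces();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    static std::unique_ptr<Node> make(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
        auto n = std::make_unique<Node>();
        n->op = op;
        n->left = std::move(l);
        n->right = std::move(r);
        return n;
    }
    std::unique_ptr<Node> expr() {  // <term> { ("+" | "-") <term> }
        auto node = term();
        for (;;) {
            char op;
            if (accept('+')) op = '+'; else if (accept('-')) op = '-'; else return node;
            auto rhs = term();
            node = make(op, std::move(node), std::move(rhs));
        }
    }
    std::unique_ptr<Node> term() {  // <factor> { ("*" | "/") <factor> }
        auto node = factor();
        for (;;) {
            char op;
            if (accept('*')) op = '*'; else if (accept('/')) op = '/'; else return node;
            auto rhs = factor();
            node = make(op, std::move(node), std::move(rhs));
        }
    }
    std::unique_ptr<Node> factor() {
        if (accept('(')) {
            auto inner = expr();
            if (!accept(')')) throw std::invalid_argument("expected ')'");
            return inner;
        }
        if (accept('-')) return make('n', factor(), nullptr);
        skipSpaces();
        if (pos_ == text_.size() || !std::isdigit(static_cast<unsigned char>(text_[pos_])))
            throw std::invalid_argument("expected a number");
        auto leaf = std::make_unique<Node>();
        while (pos_ < text_.size() && std::isdigit(static_cast<unsigned char>(text_[pos_])))
            leaf->value = leaf->value * 10 + (text_[pos_++] - '0');
        return leaf;
    }
};

long eval(const Node& n) {
    switch (n.op) {
    case '#': return n.value;
    case 'n': return -eval(*n.left);
    case '+': return eval(*n.left) + eval(*n.right);
    case '-': return eval(*n.left) - eval(*n.right);
    case '*': return eval(*n.left) * eval(*n.right);
    default:  return eval(*n.left) / eval(*n.right);  // '/'
    }
}

// Prints the tree fully parenthesized, which makes its shape visible.
std::string show(const Node& n) {
    if (n.op == '#') return std::to_string(n.value);
    if (n.op == 'n') return "(-" + show(*n.left) + ")";
    return "(" + show(*n.left) + n.op + show(*n.right) + ")";
}
```

For example, show applied to the tree of 2 + 3 * 4 gives (2+(3*4)), and for 10 - 4 - 3 it gives ((10-4)-3).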
parser.y:

```yacc
%skeleton "lalr1.cc" /* -*- C++ -*- */
%require "3.0.4"
%defines
%define parser_class_name { Parser } /* Obsoleted by api.parser.class */

%define api.token.constructor
%define api.value.type variant
%define parse.assert
%define api.namespace { McubyCalc }

/*
 * Purpose: The unqualified %code or %code requires should usually be more
 * appropriate than %code top. However, occasionally it is necessary to
 * insert code much nearer the top of the parser implementation file.
 */
%code requires
{
    #include <iostream>
    #include <string>
    #include <cstdint>
    #include <cstdlib>
    #include <cmath>

    using namespace std;

    namespace McubyCalc {
        class Scanner;
        class Interpreter;
    }
}

%code top
{
    #include <iostream>
    #include "interpreter.h"
    #include "parser.hpp"
    #include "scanner.h"
    #include "location.hh"

    // The parser calls yylex(scanner, driver); forward the call to the flex scanner.
    static McubyCalc::Parser::symbol_type yylex(McubyCalc::Scanner &scanner, McubyCalc::Interpreter &driver) {
        return scanner.get_next_token();
    }

    using namespace McubyCalc;
}

%lex-param { McubyCalc::Scanner &scanner }
%lex-param { McubyCalc::Interpreter &driver }
%parse-param { McubyCalc::Scanner &scanner }
%parse-param { McubyCalc::Interpreter &driver }
%locations
%define parse.trace
%define parse.error verbose

/* Add a prefix to the token names when generating their definition in the target language. */
%define api.token.prefix {MCUBY_}

%token END 0 "EOF"
%token <int> INT
%token <float> FLOAT
%token <double> DOUBLE
%token MULTIPLICATION PLUS MINUS POWER DIVIDE MOD
%token NEWLINE EXIT
%token SIN COS SQRT EXP COSH SINH ABS TAN ASIN ACOS ATAN ATAN2 CBRT HYPOT EXP2 LOG LOG10 LOG2 TANH ASINH ACOSH ATANH
%token EQUAL
%token LEFTPAR
%token RIGHTPAR

/* Operator precedence, lowest first; POWER binds tightest and groups right to left. */
%left PLUS MINUS
%left MULTIPLICATION DIVIDE MOD
%right POWER

%type<int> expression
%type<float> float_expression
%type<double> double_expression

%start calc

%%

calc:
      /* empty */
    | calc line
    ;

line:
      NEWLINE { }
    | double_expression EQUAL NEWLINE { std::cout << "\tResult: " << $1 << "\n"; }
    | float_expression EQUAL NEWLINE { std::cout << "\tResult: " << $1 << "\n"; }
    | expression EQUAL NEWLINE { std::cout << "\tResult: " << $1 << "\n"; }
    | EXIT NEWLINE { std::cout << "Bye bye!\n"; std::exit(0); }
    ;

double_expression:
      DOUBLE { $$ = $1; }
    | double_expression PLUS double_expression { $$ = $1 + $3; }
    | double_expression PLUS float_expression { $$ = $1 + $3; }
    | double_expression PLUS expression { $$ = $1 + $3; }
    | float_expression PLUS double_expression { $$ = static_cast<double>($1) + $3; }
    | expression PLUS double_expression { $$ = static_cast<double>($1) + $3; }
    | double_expression MINUS double_expression { $$ = $1 - $3; }
    | double_expression MINUS float_expression { $$ = $1 - $3; }
    | double_expression MINUS expression { $$ = $1 - $3; }
    | float_expression MINUS double_expression { $$ = static_cast<double>($1) - $3; }
    | expression MINUS double_expression { $$ = static_cast<double>($1) - $3; }
    | double_expression MULTIPLICATION double_expression { $$ = $1 * $3; }
    | double_expression MULTIPLICATION float_expression { $$ = $1 * $3; }
    | double_expression MULTIPLICATION expression { $$ = $1 * $3; }
    | float_expression MULTIPLICATION double_expression { $$ = static_cast<double>($1) * $3; }
    | expression MULTIPLICATION double_expression { $$ = static_cast<double>($1) * $3; }
    | double_expression DIVIDE double_expression { $$ = $1 / $3; }
    | double_expression DIVIDE float_expression { $$ = $1 / $3; }
    | double_expression DIVIDE expression { $$ = $1 / $3; }
    | float_expression DIVIDE double_expression { $$ = static_cast<double>($1) / $3; }
    | expression DIVIDE double_expression { $$ = static_cast<double>($1) / $3; }
    | double_expression POWER double_expression { $$ = std::pow($1, $3); }
    | double_expression POWER float_expression { $$ = std::pow($1, $3); }
    | double_expression POWER expression { $$ = std::pow($1, $3); }
    | float_expression POWER double_expression { $$ = std::pow(static_cast<double>($1), $3); }
    | expression POWER double_expression { $$ = std::pow(static_cast<double>($1), $3); }
    | double_expression MOD double_expression { $$ = std::fmod($1, $3); }
    | double_expression MOD float_expression { $$ = std::fmod($1, $3); }
    | double_expression MOD expression { $$ = std::fmod($1, $3); }
    | float_expression MOD double_expression { $$ = std::fmod($1, $3); }
    | expression MOD double_expression { $$ = std::fmod($1, $3); }
    | SIN LEFTPAR double_expression RIGHTPAR { $$ = std::sin($3); }
    | COS LEFTPAR double_expression RIGHTPAR { $$ = std::cos($3); }
    | TAN LEFTPAR double_expression RIGHTPAR { $$ = std::tan($3); }
    | ASIN LEFTPAR double_expression RIGHTPAR { $$ = std::asin($3); }
    | ACOS LEFTPAR double_expression RIGHTPAR { $$ = std::acos($3); }
    | ATAN LEFTPAR double_expression RIGHTPAR { $$ = std::atan($3); }
    | SINH LEFTPAR double_expression RIGHTPAR { $$ = std::sinh($3); }
    | COSH LEFTPAR double_expression RIGHTPAR { $$ = std::cosh($3); }
    | TANH LEFTPAR double_expression RIGHTPAR { $$ = std::tanh($3); }
    | ASINH LEFTPAR double_expression RIGHTPAR { $$ = std::asinh($3); }
    | ACOSH LEFTPAR double_expression RIGHTPAR { $$ = std::acosh($3); }
    | SQRT LEFTPAR double_expression RIGHTPAR { $$ = std::sqrt($3); }
    | CBRT LEFTPAR double_expression RIGHTPAR { $$ = std::cbrt($3); }
    | EXP LEFTPAR double_expression RIGHTPAR { $$ = std::exp($3); }
    | EXP2 LEFTPAR double_expression RIGHTPAR { $$ = std::exp2($3); }
    | LOG LEFTPAR double_expression RIGHTPAR { $$ = std::log($3); }
    | LOG10 LEFTPAR double_expression RIGHTPAR { $$ = std::log10($3); }
    | LOG2 LEFTPAR double_expression RIGHTPAR { $$ = std::log2($3); }
    | double_expression NEWLINE { $$ = $1; }
    | MINUS double_expression { $$ = -$2; }
    | LEFTPAR double_expression RIGHTPAR { $$ = $2; }
    ;

float_expression:
      FLOAT { $$ = $1; }
    | MINUS float_expression { $$ = -$2; }
    | LEFTPAR float_expression RIGHTPAR { $$ = $2; }
    | float_expression NEWLINE { $$ = $1; }
    | float_expression PLUS float_expression { $$ = $1 + $3; }
    | float_expression PLUS expression { $$ = $1 + $3; }
    | expression PLUS float_expression { $$ = $1 + $3; }
    | float_expression MINUS float_expression { $$ = $1 - $3; }
    | float_expression MINUS expression { $$ = $1 - $3; }
    | expression MINUS float_expression { $$ = $1 - $3; }
    | float_expression MULTIPLICATION float_expression { $$ = $1 * $3; }
    | float_expression MULTIPLICATION expression { $$ = $1 * $3; }
    | expression MULTIPLICATION float_expression { $$ = $1 * $3; }
    | float_expression DIVIDE float_expression { $$ = $1 / $3; }
    | float_expression DIVIDE expression { $$ = $1 / $3; }
    | expression DIVIDE float_expression { $$ = $1 / $3; }
    | float_expression MOD expression { $$ = std::fmod($1, $3); }
    | float_expression MOD float_expression { $$ = std::fmod($1, $3); }
    | expression MOD float_expression { $$ = std::fmod($1, $3); }
    | float_expression POWER float_expression { $$ = std::pow($1, $3); }
    | float_expression POWER expression { $$ = std::pow($1, $3); }
    | SQRT LEFTPAR float_expression RIGHTPAR { $$ = std::sqrt($3); }
    | CBRT LEFTPAR float_expression RIGHTPAR { $$ = std::cbrt($3); }
    | EXP LEFTPAR float_expression RIGHTPAR { $$ = std::exp($3); }
    | EXP2 LEFTPAR float_expression RIGHTPAR { $$ = std::exp2($3); }
    | LOG LEFTPAR float_expression RIGHTPAR { $$ = std::log($3); }
    | LOG10 LEFTPAR float_expression RIGHTPAR { $$ = std::log10($3); }
    | LOG2 LEFTPAR float_expression RIGHTPAR { $$ = std::log2($3); }
    | SQRT LEFTPAR expression RIGHTPAR { $$ = std::sqrt(static_cast<float>($3)); }
    | CBRT LEFTPAR expression RIGHTPAR { $$ = std::cbrt(static_cast<float>($3)); }
    | EXP LEFTPAR expression RIGHTPAR { $$ = std::exp(static_cast<float>($3)); }
    | EXP2 LEFTPAR expression RIGHTPAR { $$ = std::exp2(static_cast<float>($3)); }
    | LOG LEFTPAR expression RIGHTPAR { $$ = std::log(static_cast<float>($3)); }
    | LOG10 LEFTPAR expression RIGHTPAR { $$ = std::log10(static_cast<float>($3)); }
    | LOG2 LEFTPAR expression RIGHTPAR { $$ = std::log2(static_cast<float>($3)); }
    | SIN LEFTPAR float_expression RIGHTPAR { $$ = std::sin($3); }
    | COS LEFTPAR float_expression RIGHTPAR { $$ = std::cos($3); }
    | TAN LEFTPAR float_expression RIGHTPAR { $$ = std::tan($3); }
    | ASIN LEFTPAR float_expression RIGHTPAR { $$ = std::asin($3); }
    | ACOS LEFTPAR float_expression RIGHTPAR { $$ = std::acos($3); }
    | ATAN LEFTPAR float_expression RIGHTPAR { $$ = std::atan($3); }
    | SINH LEFTPAR float_expression RIGHTPAR { $$ = std::sinh($3); }
    | COSH LEFTPAR float_expression RIGHTPAR { $$ = std::cosh($3); }
    | TANH LEFTPAR float_expression RIGHTPAR { $$ = std::tanh($3); }
    | ASINH LEFTPAR float_expression RIGHTPAR { $$ = std::asinh($3); }
    | ACOSH LEFTPAR float_expression RIGHTPAR { $$ = std::acosh($3); }
    | SIN LEFTPAR expression RIGHTPAR { $$ = std::sin(static_cast<float>($3)); }
    | COS LEFTPAR expression RIGHTPAR { $$ = std::cos(static_cast<float>($3)); }
    | TAN LEFTPAR expression RIGHTPAR { $$ = std::tan(static_cast<float>($3)); }
    | ASIN LEFTPAR expression RIGHTPAR { $$ = std::asin(static_cast<float>($3)); }
    | ACOS LEFTPAR expression RIGHTPAR { $$ = std::acos(static_cast<float>($3)); }
    | ATAN LEFTPAR expression RIGHTPAR { $$ = std::atan(static_cast<float>($3)); }
    | SINH LEFTPAR expression RIGHTPAR { $$ = std::sinh(static_cast<float>($3)); }
    | COSH LEFTPAR expression RIGHTPAR { $$ = std::cosh(static_cast<float>($3)); }
    | TANH LEFTPAR expression RIGHTPAR { $$ = std::tanh(static_cast<float>($3)); }
    | ASINH LEFTPAR expression RIGHTPAR { $$ = std::asinh(static_cast<float>($3)); }
    | ACOSH LEFTPAR expression RIGHTPAR { $$ = std::acosh(static_cast<float>($3)); }
    ;

expression:
      INT { $$ = $1; }
    | expression PLUS expression { $$ = $1 + $3; }
    | expression MINUS expression { $$ = $1 - $3; }
    | expression MULTIPLICATION expression { $$ = $1 * $3; }
    | expression DIVIDE expression { $$ = $1 / $3; }  /* integer (truncating) division */
    | expression MOD expression { $$ = $1 % $3; }
    | expression POWER expression { $$ = std::pow($1, $3); }
    | ABS LEFTPAR expression RIGHTPAR { $$ = std::abs($3); }
    | LEFTPAR expression RIGHTPAR { $$ = $2; }
    | MINUS expression { $$ = -$2; }
    | expression NEWLINE { $$ = $1; }
    ;

%%

// Finally, the error member function reports the errors.
void McubyCalc::Parser::error(const location &loc, const std::string &message) {
    std::cerr << loc << ": " << message << '\n';
}
```
scanner.h:

```cpp
#ifndef SCANNER_H
#define SCANNER_H

// Enables the debug traces printed by the scanner and the interpreter;
// remove this line to silence them.
#define DEBUG_MCUBY_CALC

#if ! defined(yyFlexLexerOnce)
#undef yyFlexLexer
#define yyFlexLexer McubyCalc_FlexLexer
#include <FlexLexer.h>
#endif

// Make flex generate get_next_token(), returning a Bison symbol,
// instead of the default int yylex().
#undef YY_DECL
#define YY_DECL McubyCalc::Parser::symbol_type McubyCalc::Scanner::get_next_token()

#include "parser.hpp"

namespace McubyCalc {

class Interpreter;

class Scanner : public yyFlexLexer {
public:
    Scanner(Interpreter &driver) : driver_(driver) {}
    virtual ~Scanner() {}
    virtual McubyCalc::Parser::symbol_type get_next_token();

private:
    Interpreter &driver_;
};

}

#endif
```
interpreter.h:

```cpp
#ifndef INTERPRETER_H
#define INTERPRETER_H

#include "scanner.h"
#include "parser.hpp"

namespace McubyCalc {

class Interpreter {
public:
    Interpreter();
    int parse();

    friend class Parser;
    friend class Scanner;

private:
    void stateLocation(unsigned int loc);
    unsigned int location() const;

private:
    Scanner scanner_;
    Parser parser_;
    unsigned int location_;
};

}

#endif // INTERPRETER_H
```
interpreter.cpp:

```cpp
#include "interpreter.h"
#include <sstream>

using namespace McubyCalc;

Interpreter::Interpreter() :
    scanner_(*this),
    parser_(scanner_, *this),
    location_(0)
{
}

int Interpreter::parse() {
    location_ = 0;
    return parser_.parse();
}

void Interpreter::stateLocation(unsigned int loc) {
    location_ += loc;
#ifdef DEBUG_MCUBY_CALC
    std::cout << "stateLocation: " << loc << ", location_ = " << location_ << std::endl;
#endif
}

unsigned int Interpreter::location() const {
    return location_;
}
```
main.cpp:

```cpp
#include "scanner.h"
#include "parser.hpp"
#include "interpreter.h"

using namespace McubyCalc;

int main() {
    Interpreter interpreter;
    interpreter.parse();
    return 0;
}
```
Makefile:

```makefile
all:
	flex -o scanner.cpp scanner.l
	bison -o parser.cpp parser.y
	g++ -g main.cpp scanner.cpp parser.cpp interpreter.cpp -o calc

clean:
	rm -rf scanner.cpp
	rm -rf parser.cpp parser.hpp location.hh position.hh stack.hh
	rm -rf calc
```
4 thoughts on “A simple calculator with Bison and Flex.”
Bison programs have (not by coincidence) the same three-part structure as flex programs, with declarations, rules, and C code. The declarations here include C code to be copied to the beginning of the generated C parser, again enclosed in
It compiles fine but how can it be used?
If I type:
2+3 and hit Enter
I get a few prints but no result:
$ ./calc.exe
2+3
stateLocation: 1, location_ = 1
Scanner integer number: 2
stateLocation: 1, location_ = 2
Scanner: ‘+’
stateLocation: 1, location_ = 3
Scanner integer number: 3
stateLocation: 1, location_ = 4
Scanner: ‘NEWLINE’
need to add equals
you need to enter “2 + 3 =” and press Enter