Boost Spirit Karma による printf 互換の浮動小数点出力、符号付き整数出力

by taiji at 2013/06/04 23:50

Boost Karma 事始め

Boost Spirit のパーサ Qi に対応するジェネレータ Karma を使うと、以下のように型安全な printf 風の出力が簡単に行える。

#include <iostream>
#include <cmath>
#include <boost/spirit/include/karma.hpp>

namespace karma = boost::spirit::karma;

int main()
{
  long double v[] = { std::exp(1), std::sqrt(2), };
  std::cout << karma::format(karma::long_double << ',' << karma::long_double, v[0], v[1]) << std::endl;
  return 0;
}
2.718,1.414

それどころか、様々なコンテナに対応しているので、std::vector に格納された要素の出力も極めて簡単である。

#include <iostream>
#include <cmath>
#include <vector>
#include <boost/assign.hpp>
#include <boost/spirit/include/karma.hpp>

namespace karma = boost::spirit::karma;

int main()
{
  std::vector<long double> v = boost::assign::list_of(std::exp(1))(std::sqrt(2));
  std::cout << karma::format(karma::long_double % ',', v) << std::endl;
  return 0;
}
2.718,1.414

Boost Karma による C99 printf double ステージ1

しかし、fprintf のような書式指定を行おうとすると少々厄介なので、ここでは浮動小数点に関して C99 printf 互換のジェネレータを用意してしまおう。

具体的には、以下のような C99 printf 書式指定のような浮動小数点の表示を、段階的に可能としていこう。

        scientific_real
%.3e    1.000e+00
%e      1.000000e+00
%+e     +1.000000e+00
% e      1.000000e+00

        fixed_real
%.3f    1.000
%f      1.000000
%+f     +1.000000
% f      1.000000

        general_real
%.3g    1
%g      1
%+g     +1
% g      1

        alt_general_real
%#.3g   1.00
%#g     1.00000
%#+g    +1.00000
%# g     1.00000

ちなみに、幅指定は最後の最後で対応する。

さて、Karma における浮動小数点の書式指定は real_policies<T> を継承したクラスにて行うものとなっているので、まずはこれらの「ポリシー」を用意しよう。

/*
  boost-spirit-karma-real_generators.hpp

  Copyright (C) 2013 Taiji Yamada <taiji@aihara.co.jp>

  Distributed under the Boost Software License, Version 1.0.
  (See http://www.boost.org/LICENSE_1_0.txt)
*/
#ifndef _boost_spirit_karma_real_generators_hpp_
#define _boost_spirit_karma_real_generators_hpp_

#include <boost/spirit/include/karma.hpp>

namespace boost { namespace spirit { namespace karma {

template <typename T>
struct scientific_real_policies : real_policies<T> {
  static int floatfield(T) { return real_policies<T>::fmtflags::scientific; }
};

template <typename T>
struct fixed_real_policies : scientific_real_policies<T> {
  static int floatfield(T) { return real_policies<T>::fmtflags::fixed; }
};

template <typename T>
struct alt_general_real_policies : fixed_real_policies<T> {
  static int floatfield(T n)
  {
    return real_policies<T>::floatfield(n);
  }
};

template <typename T>
struct general_real_policies : alt_general_real_policies<T> {
};

} } }

#endif

floatfield(T n) が常に real_policies<T>::fmtflags::scientific を返せば %e 相当、常に real_policies<T>::fmtflags::fixed を返せば %f 相当、既定の real_policies<T>::floatfield(n) に従えば %g 相当のようである。

例えばこれは、以下のように利用する。

#include <iostream>
#include <cmath>
#include <boost/spirit/include/karma.hpp>
#include "boost-spirit-karma-real_generators00.hpp"

namespace karma = boost::spirit::karma;
typedef karma::real_generator<long double, karma::scientific_real_policies<long double> > long_double_generator;

int main()
{
  long_double_generator long_double;
  long double v[] = { std::exp(1), std::sqrt(2), };
  std::cout << karma::format(long_double << ',' << long_double, v[0], v[1]) << std::endl;
  return 0;
}

この既定の段階では、以下のような表示形式になっている。

        scientific_real_policies<T>
%e      1.0e00                  T=double

        fixed_real_policies<T>
%f      1.0                     T=double

        general_real_policies<T>
%g      1.0                     T=double

まずは、精度を返す precision(T n) および、符号を正符号においても強制的に印字するか否かを返す force_sign(T n) で利用者がテンプレート引数で指定した属性を返すようにする。

--- boost-spirit-karma-real_generators00.hpp    2013-06-04 16:57:44.000000000 +0900
+++ boost-spirit-karma-real_generators01.hpp    2013-06-04 16:57:44.000000000 +0900
@@ -13,26 +13,28 @@
 
 namespace boost { namespace spirit { namespace karma {
 
-template <typename T>
+template <typename T, unsigned Precision = 6, bool ForceSign = false>
 struct scientific_real_policies : real_policies<T> {
+  static bool force_sign(T) { return ForceSign; }
   static int floatfield(T) { return real_policies<T>::fmtflags::scientific; }
+  static unsigned precision(T) { return Precision; }
 };
 
-template <typename T>
-struct fixed_real_policies : scientific_real_policies<T> {
+template <typename T, unsigned Precision = 6, bool ForceSign = false>
+struct fixed_real_policies : scientific_real_policies<T, Precision, ForceSign> {
   static int floatfield(T) { return real_policies<T>::fmtflags::fixed; }
 };
 
-template <typename T>
-struct alt_general_real_policies : fixed_real_policies<T> {
+template <typename T, unsigned Precision = 6, bool ForceSign = false>
+struct alt_general_real_policies : fixed_real_policies<T, Precision, ForceSign> {
   static int floatfield(T n)
   {
     return real_policies<T>::floatfield(n);
   }
 };
 
-template <typename T>
-struct general_real_policies : alt_general_real_policies<T> {
+template <typename T, unsigned Precision = 6, bool ForceSign = false>
+struct general_real_policies : alt_general_real_policies<T, Precision, ForceSign> {
 };
 
 }}}

Karma の既定の精度が 3 に対して、C99 printf の既定の精度は 6 なので、そのようにした。

この段階で以下のような表示形式になっている。

        scientific_real_policies<T, Precision=6, ForceSign=false>
%.3e    1.0e00                  T=double, Precision=3, ForceSign=false
%e      1.0e00                  T=double, Precision=6, ForceSign=false
%+e     +1.0e00                 T=double, Precision=6, ForceSign=true

        fixed_real_policies<T, Precision=6, ForceSign=false>
%.3f    1.0                     T=double, Precision=3, ForceSign=false
%f      1.0                     T=double, Precision=6, ForceSign=false
%+f     +1.0                    T=double, Precision=6, ForceSign=true

        general_real_policies<T, Precision=6, ForceSign=false>
%.3g    1.0                     T=double, Precision=3, ForceSign=false
%g      1.0                     T=double, Precision=6, ForceSign=false
%+g     +1.0                    T=double, Precision=6, ForceSign=true

キリの良い数字の場合、ゼロが切り詰められているのは、%g 以外では意図した書式ではないので、それに trailing_zeros(T n) で対応しよう。

--- boost-spirit-karma-real_generators01.hpp    2013-06-04 16:57:44.000000000 +0900
+++ boost-spirit-karma-real_generators02.hpp    2013-06-04 16:57:44.000000000 +0900
@@ -16,6 +16,7 @@
 template <typename T, unsigned Precision = 6, bool ForceSign = false>
 struct scientific_real_policies : real_policies<T> {
   static bool force_sign(T) { return ForceSign; }
+  static bool trailing_zeros(T) { return true; }
   static int floatfield(T) { return real_policies<T>::fmtflags::scientific; }
   static unsigned precision(T) { return Precision; }
 };
@@ -27,6 +28,7 @@
 
 template <typename T, unsigned Precision = 6, bool ForceSign = false>
 struct alt_general_real_policies : fixed_real_policies<T, Precision, ForceSign> {
+  static bool trailing_zeros(T) { return true; }
   static int floatfield(T n)
   {
     return real_policies<T>::floatfield(n);
@@ -35,6 +37,7 @@
 
 template <typename T, unsigned Precision = 6, bool ForceSign = false>
 struct general_real_policies : alt_general_real_policies<T, Precision, ForceSign> {
+  static bool trailing_zeros(T) { return false; }
 };
 
 }}}

この段階で以下のような表示形式になる。

        scientific_real_policies<T, Precision=6, ForceSign=false>
%.3e    1.000e00                T=double, Precision=3, ForceSign=false
%e      1.000000e00             T=double, Precision=6, ForceSign=false
%+e     +1.000000e00            T=double, Precision=6, ForceSign=true

        fixed_real_policies<T, Precision=6, ForceSign=false>
%.3f    1.000                   T=double, Precision=3, ForceSign=false
%f      1.000000                T=double, Precision=6, ForceSign=false
%+f     +1.000000               T=double, Precision=6, ForceSign=true

        general_real_policies<T, Precision=6, ForceSign=false>
%.3g    1.0                     T=double, Precision=3, ForceSign=false
%g      1.0                     T=double, Precision=6, ForceSign=false
%+g     +1.0                    T=double, Precision=6, ForceSign=true

        alt_general_real_policies<T, Precision=6, ForceSign=false>
%#.3g   1.000                   T=double, Precision=3, ForceSign=false
%#g     1.000000                T=double, Precision=6, ForceSign=false
%#+g    +1.000000               T=double, Precision=6, ForceSign=true

次に、%e の指数部の符号は常に表示するようにしよう。

--- boost-spirit-karma-real_generators02.hpp    2013-06-04 22:38:15.000000000 +0900
+++ boost-spirit-karma-real_generators03.hpp    2013-06-04 22:38:15.000000000 +0900
@@ -17,10 +17,21 @@
 struct scientific_real_policies : real_policies<T> {
   static bool force_sign(T) { return ForceSign; }
   static bool trailing_zeros(T) { return true; }
   static int floatfield(T) { return real_policies<T>::fmtflags::scientific; }
   static unsigned precision(T) { return Precision; }
+  template <typename CharEncoding, typename Tag, typename OutputIterator>
+  static bool exponent(OutputIterator &sink, long n)
+  {
+    long abs_n = traits::get_absolute_value(n);
+    bool r = char_inserter<CharEncoding, Tag>::call(sink, 'e') &&
+      sign_inserter::call(sink, /*traits::test_zero(n)*/false,
+                          traits::test_negative(n), /*false*/true);
+    if (r && abs_n < 10) // the C99 Standard requires at least two digits in the exponent
+      r = char_inserter<CharEncoding, Tag>::call(sink, '0');
+    return r && int_inserter<10>::call(sink, abs_n);
+  }
 };
 
 template <typename T, unsigned Precision = 6, bool ForceSign = false>
 struct fixed_real_policies : scientific_real_policies<T, Precision, ForceSign> {
   static int floatfield(T) { return real_policies<T>::fmtflags::fixed; }

すると以下のような表示形式になる。

        scientific_real_policies<T, Precision=6, ForceSign=false>
%.3e    1.000e+00               T=double, Precision=3, ForceSign=false
%e      1.000000e+00            T=double, Precision=6, ForceSign=false
%+e     +1.000000e+00           T=double, Precision=6, ForceSign=true

sign_inserter::call の第4引数が強制的に符号を印字するか否かを決定している。ちなみに、第2引数は、真のときに正符号の代わりに「空白」を印字してしまうので、ここでは偽を指定してそれを抑止している。

次に、正符号の代わりに「空白」を印字することを、利用者がテンプレート引数で選べるようにする。

--- boost-spirit-karma-real_generators03.hpp    2013-06-04 16:57:45.000000000 +0900
+++ boost-spirit-karma-real_generators04.hpp    2013-06-04 16:57:45.000000000 +0900
@@ -13,12 +13,18 @@
 
 namespace boost { namespace spirit { namespace karma {
 
-template <typename T, unsigned Precision = 6, bool ForceSign = false>
+template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false>
 struct scientific_real_policies : real_policies<T> {
   static bool force_sign(T) { return ForceSign; }
   static bool trailing_zeros(T) { return true; }
   static int floatfield(T) { return real_policies<T>::fmtflags::scientific; }
   static unsigned precision(T) { return Precision; }
+  template <typename OutputIterator>
+  static bool integer_part(OutputIterator &sink, T n, bool sign, bool force_sign)
+  {
+    return sign_inserter::call(sink, (force_sign && !sign && BlankSign) || traits::test_zero(n), sign, force_sign) &&
+      int_inserter<10>::call(sink, n);
+  }
   template <typename CharEncoding, typename Tag, typename OutputIterator>
   static bool exponent(OutputIterator &sink, long n)
   {
@@ -32,13 +38,18 @@
   }
 };
 
-template <typename T, unsigned Precision = 6, bool ForceSign = false>
-struct fixed_real_policies : scientific_real_policies<T, Precision, ForceSign> {
+template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false>
+struct fixed_real_policies : scientific_real_policies<T, Precision, ForceSign, BlankSign> {
   static int floatfield(T) { return real_policies<T>::fmtflags::fixed; }
+  template <typename OutputIterator>
+  static bool integer_part(OutputIterator &sink, T n, bool sign, bool force_sign)
+  {
+    return real_policies<T>::integer_part(sink, n, sign, force_sign);
+  }
 };
 
-template <typename T, unsigned Precision = 6, bool ForceSign = false>
-struct alt_general_real_policies : fixed_real_policies<T, Precision, ForceSign> {
+template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false>
+struct alt_general_real_policies : fixed_real_policies<T, Precision, ForceSign, BlankSign> {
   static bool trailing_zeros(T) { return true; }
   static int floatfield(T n)
   {
@@ -46,8 +57,8 @@
   }
 };
 
-template <typename T, unsigned Precision = 6, bool ForceSign = false>
-struct general_real_policies : alt_general_real_policies<T, Precision, ForceSign> {
+template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false>
+struct general_real_policies : alt_general_real_policies<T, Precision, ForceSign, BlankSign> {
   static bool trailing_zeros(T) { return false; }
 };
 

ここでは、scientific_real_policies<T> の仮数部を印字する integer_part 内の sign_inserter::call の第2引数は、真のときに正符号の代わりに「空白」を印字する条件を「正のとき空白としての符号が強制されるとき、または、仮数部がゼロのとき」として C99 printf 互換となるようにしている。

また、fixed_real_policies<T> の仮数部を印字する integer_part は、取り敢えず real_policies<T> のものを踏襲している。

この段階で以下のような表示形式になる。

        scientific_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%.3e    1.000e+00               T=double, Precision=3, ForceSign=false, BlankSign=false
%e      1.000000e+00            T=double, Precision=6, ForceSign=false, BlankSign=false
%+e     +1.000000e+00           T=double, Precision=6, ForceSign=true, BlankSign=false
% e      1.000000e+00           T=double, Precision=6, ForceSign=true, BlankSign=true

        fixed_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%.3f    1.000                   T=double, Precision=3, ForceSign=false, BlankSign=false
%f      1.000000                T=double, Precision=6, ForceSign=false, BlankSign=false
%+f     +1.000000               T=double, Precision=6, ForceSign=true, BlankSign=false
% f     +1.000000               T=double, Precision=6, ForceSign=true, BlankSign=true

        general_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%.3g    1.0                     T=double, Precision=3, ForceSign=false, BlankSign=false
%g      1.0                     T=double, Precision=6, ForceSign=false, BlankSign=false
%+g     +1.0                    T=double, Precision=6, ForceSign=true, BlankSign=false
% g     +1.0                    T=double, Precision=6, ForceSign=true, BlankSign=true

        alt_general_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%#.3g   1.000                   T=double, Precision=3, ForceSign=false, BlankSign=false
%#g     1.000000                T=double, Precision=6, ForceSign=false, BlankSign=false
%#+g    +1.000000               T=double, Precision=6, ForceSign=true, BlankSign=false
%# g    +1.000000               T=double, Precision=6, ForceSign=true, BlankSign=true

ここまでで %e については意図した書式になったが、まだまだ C99 printf 互換には程遠い。

次に % f に対応しよう。

--- boost-spirit-karma-real_generators04.hpp    2013-06-04 22:38:15.000000000 +0900
+++ boost-spirit-karma-real_generators05.hpp    2013-06-04 22:38:15.000000000 +0900
@@ -42,11 +42,12 @@
 struct fixed_real_policies : scientific_real_policies<T, Precision, ForceSign, BlankSign> {
   static int floatfield(T) { return real_policies<T>::fmtflags::fixed; }
   template <typename OutputIterator>
   static bool integer_part(OutputIterator &sink, T n, bool sign, bool force_sign)
   {
-    return real_policies<T>::integer_part(sink, n, sign, force_sign);
+    return sign_inserter::call(sink, force_sign && !sign && BlankSign, sign, force_sign) &&
+      int_inserter<10>::call(sink, n);
   }
 };
 
 template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false>
 struct alt_general_real_policies : fixed_real_policies<T, Precision, ForceSign, BlankSign> {

ここでは、fixed_real_policies<T> の仮数部を印字する integer_part 内の sign_inserter::call の第2引数は、真のときに正符号の代わりに「空白」を印字する条件を「正のとき空白としての符号が強制されるとき」として C99 printf 互換となるようにしている。

次に、%g では、キリの良い数字の場合、ゼロだけでなく小数点も不要だ。

--- boost-spirit-karma-real_generators05.hpp    2013-06-04 16:57:45.000000000 +0900
+++ boost-spirit-karma-real_generators06.hpp    2013-06-04 16:57:45.000000000 +0900
@@ -61,6 +61,26 @@
 template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false>
 struct general_real_policies : alt_general_real_policies<T, Precision, ForceSign, BlankSign> {
   static bool trailing_zeros(T) { return false; }
+  static int floatfield(T n)
+  {
+    if (traits::test_zero(n))
+      return real_policies<T>::fmtflags::fixed;
+    return alt_general_real_policies<T, Precision, ForceSign, BlankSign>::floatfield(n);
+  }
+  template <typename OutputIterator>
+  static bool dot(OutputIterator &sink, T n, unsigned precision)
+  {
+    if (traits::test_zero(n))
+      return true;
+    return alt_general_real_policies<T, Precision, ForceSign, BlankSign>::dot(sink, n, precision);
+  }
+  template <typename OutputIterator>
+  static bool fraction_part(OutputIterator &sink, T n, unsigned adjprec, unsigned precision)
+  {
+    if (traits::test_zero(n))
+      return true;
+    return fixed_real_policies<T, Precision, ForceSign, BlankSign>::fraction_part(sink, n, adjprec, precision);
+  }
 };
 
 }}}

小数がゼロの時は real_policies<T>::fmtflags::fixed で印字し、小数点や小数部は何もしない処理をしている。さもなくば、都合の良い継承しているメンバ関数を呼んでいる。

この段階で以下のような表示形式になる。

        scientific_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%.3e    1.000e+00               T=double, Precision=3, ForceSign=false, BlankSign=false
%e      1.000000e+00            T=double, Precision=6, ForceSign=false, BlankSign=false
%+e     +1.000000e+00           T=double, Precision=6, ForceSign=true, BlankSign=false
% e      1.000000e+00           T=double, Precision=6, ForceSign=true, BlankSign=true

        fixed_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%.3f    1.000                   T=double, Precision=3, ForceSign=false, BlankSign=false
%f      1.000000                T=double, Precision=6, ForceSign=false, BlankSign=false
%+f     +1.000000               T=double, Precision=6, ForceSign=true, BlankSign=false
% f      1.000000               T=double, Precision=6, ForceSign=true, BlankSign=true

        general_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%.3g    1                       T=double, Precision=3, ForceSign=false, BlankSign=false
%g      1                       T=double, Precision=6, ForceSign=false, BlankSign=false
%+g     +1                      T=double, Precision=6, ForceSign=true, BlankSign=false
% g      1                      T=double, Precision=6, ForceSign=true, BlankSign=true

        alt_general_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%#.3g   1.000                   T=double, Precision=3, ForceSign=false, BlankSign=false
%#g     1.000000                T=double, Precision=6, ForceSign=false, BlankSign=false
%#+g    +1.000000               T=double, Precision=6, ForceSign=true, BlankSign=false
%# g    +1.000000               T=double, Precision=6, ForceSign=true, BlankSign=true

相当よくなってきたが、致命的な問題が残っている。%#g の精度とは「有効数字」のことであり、他における「小数点以下の桁数」ではない。よって、続けて floatfield(T n)precision(T n) の2箇所修正してみよう。

--- boost-spirit-karma-real_generators06.hpp    2013-06-04 22:38:15.000000000 +0900
+++ boost-spirit-karma-real_generators07.hpp    2013-06-04 22:38:15.000000000 +0900
@@ -52,11 +52,14 @@
 template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false>
 struct alt_general_real_policies : fixed_real_policies<T, Precision, ForceSign, BlankSign> {
   static bool trailing_zeros(T) { return true; }
   static int floatfield(T n)
   {
-    return real_policies<T>::floatfield(n);
+    using namespace std;
+    T abs_n = traits::get_absolute_value(n);
+    return (!(floor(log10(abs_n)) < Precision) || abs_n < 1e-4) ?
+      real_policies<T>::fmtflags::scientific : real_policies<T>::fmtflags::fixed;
   }
 };
 
 template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false>
 struct general_real_policies : alt_general_real_policies<T, Precision, ForceSign, BlankSign> {
--- boost-spirit-karma-real_generators07.hpp    2013-06-04 16:57:45.000000000 +0900
+++ boost-spirit-karma-real_generators08.hpp    2013-06-04 16:57:45.000000000 +0900
@@ -59,6 +59,15 @@
     return (!(floor(log10(abs_n)) < Precision) || abs_n < 1e-4) ?
       real_policies<T>::fmtflags::scientific : real_policies<T>::fmtflags::fixed;
   }
+  static unsigned precision(T n)
+  {
+    using namespace std;
+    T abs_n = traits::get_absolute_value(n), l;
+    if ((l = floor(log10(1./abs_n))) > 1)
+      return Precision + (l - 1);
+    l = floor(log10(abs_n));
+    return Precision > l ? Precision - (l + 1) : Precision - 1;
+  }
 };
 
 template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false>

つまり、precision(T n) は有効桁 Precision から得られる小数点以下の桁数を返すようにし、他所との整合性をとった。

この段階で以下のような表示形式になる。

        scientific_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%.3e    1.000e+00               T=double, Precision=3, ForceSign=false, BlankSign=false
%e      1.000000e+00            T=double, Precision=6, ForceSign=false, BlankSign=false
%+e     +1.000000e+00           T=double, Precision=6, ForceSign=true, BlankSign=false
% e      1.000000e+00           T=double, Precision=6, ForceSign=true, BlankSign=true

        fixed_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%.3f    1.000                   T=double, Precision=3, ForceSign=false, BlankSign=false
%f      1.000000                T=double, Precision=6, ForceSign=false, BlankSign=false
%+f     +1.000000               T=double, Precision=6, ForceSign=true, BlankSign=false
% f      1.000000               T=double, Precision=6, ForceSign=true, BlankSign=true

        general_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%.3g    1                       T=double, Precision=3, ForceSign=false, BlankSign=false
%g      1                       T=double, Precision=6, ForceSign=false, BlankSign=false
%+g     +1                      T=double, Precision=6, ForceSign=true, BlankSign=false
% g      1                      T=double, Precision=6, ForceSign=true, BlankSign=true

        alt_general_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%#.3g   1.00                    T=double, Precision=3, ForceSign=false, BlankSign=false
%#g     1.00000                 T=double, Precision=6, ForceSign=false, BlankSign=false
%#+g    +1.00000                T=double, Precision=6, ForceSign=true, BlankSign=false
%# g     1.00000                T=double, Precision=6, ForceSign=true, BlankSign=true

ほぼ完璧と思ったら大間違い。精度が3で値が -0.0001 のように精度を超えた負値の場合、以下のように符号が消えてしまう。

        fixed_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%.3f    0.000                   T=double, Precision=3, ForceSign=false, BlankSign=false
%#.3f   0.000                   T=double, Precision=3, ForceSign=false, BlankSign=false

いくら精度以下とはいえ、負のほぼ零と正のほぼ零は意味が全く異る。

        fixed_real_policies<T, Precision=6, ForceSign=false, BlankSign=false>
%.3f    -0.000                  T=double, Precision=3, ForceSign=false, BlankSign=false
%#.3f   -0.000                  T=double, Precision=3, ForceSign=false, BlankSign=false

これに対応するには、結局、オリジナルが処理していることと微妙に異なるコードを書く必要があった。

--- boost-spirit-karma-real_generators08.hpp    2013-06-04 16:57:46.000000000 +0900
+++ boost-spirit-karma-real_generators09.hpp    2013-06-04 16:57:46.000000000 +0900
@@ -13,8 +13,74 @@
 
 namespace boost { namespace spirit { namespace karma {
 
-template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false>
+template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false, typename CE = unused_type, typename CC = unused_type>
 struct scientific_real_policies : real_policies<T> {
+  typedef CE CharEncoding;
+  typedef CC Tag;
+  template <typename Inserter, typename OutputIterator, typename Policies>
+  static bool call(OutputIterator &sink, T n, Policies const &p)
+  {
+    bool force_sign = p.force_sign(n);
+    bool sign_val = false;
+    int flags = p.floatfield(n);
+    if (traits::test_negative(n)) {
+      n = -n;
+      sign_val = true;
+    }
+    unsigned precision = p.precision(n);
+    if (std::numeric_limits<T>::digits10) {
+      precision = (std::min)(precision,
+                             (unsigned)std::numeric_limits<T>::digits10 + 1);
+    }
+    using namespace std;
+    T dim = 0;
+    if (0 == (Policies::fmtflags::fixed & flags) && !traits::test_zero(n)) {
+      dim = log10(n);
+      if (dim > 0)
+        n /= spirit::traits::pow10<T>(traits::truncate_to_long::call(dim));
+      else if (n < 1.) {
+        long exp = traits::truncate_to_long::call(-dim);
+        if (exp != -dim)
+          ++exp;
+        dim = -exp;
+        n *= spirit::traits::pow10<T>(exp);
+      }
+    }
+    T integer_part;
+    T precexp = spirit::traits::pow10<T>(precision);
+    T fractional_part = modf(n, &integer_part);
+    fractional_part = floor(fractional_part * precexp + T(0.5));
+    if (fractional_part >= precexp) {
+      fractional_part = floor(fractional_part - precexp);
+      integer_part += 1;
+    }
+    T long_int_part = floor(integer_part);
+    T long_frac_part = fractional_part;
+    unsigned prec = precision;
+    if (!p.trailing_zeros(n)) {
+      T frac_part_floor = long_frac_part;
+      if (0 != long_frac_part) {
+        while (0 != prec &&
+               0 == traits::remainder<10>::call(long_frac_part)) {
+          long_frac_part = traits::divide<10>::call(long_frac_part);
+          --prec;
+        }
+      }
+      else
+        prec = 0;
+      if (precision != prec)
+        long_frac_part = frac_part_floor / spirit::traits::pow10<T>(precision-prec);
+    }
+    if (false &&
+        sign_val && traits::test_zero(long_int_part) && traits::test_zero(long_frac_part))
+      sign_val = false;
+    bool r = p.integer_part(sink, long_int_part, sign_val, force_sign);
+    r = r && p.dot(sink, long_frac_part, precision);
+    r = r && p.fraction_part(sink, long_frac_part, prec, precision);
+    if (r && 0 == (Policies::fmtflags::fixed & flags))
+      return p.template exponent<CharEncoding, Tag>(sink, traits::truncate_to_long::call(dim));
+    return r;
+  }
   static bool force_sign(T) { return ForceSign; }
   static bool trailing_zeros(T) { return true; }
   static int floatfield(T) { return real_policies<T>::fmtflags::scientific; }
@@ -38,8 +104,8 @@
   }
 };
 
-template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false>
-struct fixed_real_policies : scientific_real_policies<T, Precision, ForceSign, BlankSign> {
+template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false, typename CE = unused_type, typename CC = unused_type>
+struct fixed_real_policies : scientific_real_policies<T, Precision, ForceSign, BlankSign, CE, CC> {
   static int floatfield(T) { return real_policies<T>::fmtflags::fixed; }
   template <typename OutputIterator>
   static bool integer_part(OutputIterator &sink, T n, bool sign, bool force_sign)
@@ -49,8 +115,8 @@
   }
 };
 
-template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false>
-struct alt_general_real_policies : fixed_real_policies<T, Precision, ForceSign, BlankSign> {
+template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false, typename CE = unused_type, typename CC = unused_type>
+struct alt_general_real_policies : fixed_real_policies<T, Precision, ForceSign, BlankSign, CE, CC> {
   static bool trailing_zeros(T) { return true; }
   static int floatfield(T n)
   {
@@ -70,8 +136,8 @@
   }
 };
 
-template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false>
-struct general_real_policies : alt_general_real_policies<T, Precision, ForceSign, BlankSign> {
+template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false, typename CE = unused_type, typename CC = unused_type>
+struct general_real_policies : alt_general_real_policies<T, Precision, ForceSign, BlankSign, CE, CC> {
   static bool trailing_zeros(T) { return false; }
   static int floatfield(T n)
   {

call でやっていることは sign_val = false を抑制しているだけで、オリジナルとほぼ一緒。ついでに、大文字の %E, %G に対応させるための変更 typename CE = karma::char_encoding::ascii, typename CC = karma::tag::upper もやっているので注意。

Boost Karma による C99 printf double ステージ2

先の書式指定に加えて、以下のような幅指定及び各種フラグ -0 の追加に対応させよう。ひとつの値だけでも組み合わせはこれだけある。

%-12.3e         1.000e+00   
%-12e           1.000000e+00
%-+12e          +1.000000e+00
%- 12e           1.000000e+00
%-12.3f         1.000       
%-12f           1.000000    
%-+12f          +1.000000   
%- 12f           1.000000   
%-12.3g         1           
%-12g           1           
%-+12g          +1          
%- 12g           1          
%#-12.3g        1.00        
%#-12g          1.00000     
%#-+12g         +1.00000    
%#- 12g          1.00000    
%12.3e             1.000e+00
%12e            1.000000e+00
%+12e           +1.000000e+00
% 12e            1.000000e+00
%12.3f                 1.000
%12f                1.000000
%+12f              +1.000000
% 12f               1.000000
%12.3g                     1
%12g                       1
%+12g                     +1
% 12g                      1
%#12.3g                 1.00
%#12g                1.00000
%#+12g              +1.00000
%# 12g               1.00000
%12.3g                     1
%12g                       1
%+12g                     +1
% 12g                      1
%012.3e         0001.000e+00
%012e           1.000000e+00
%+012e          +1.000000e+00
% 012e           1.000000e+00
%012.3f         00000001.000
%012f           00001.000000
%+012f          +0001.000000
% 012f           0001.000000
%012.3g         000000000001
%012g           000000000001
%+012g          +00000000001
% 012g           00000000001
%#012.3g        000000001.00
%#012g          000001.00000
%#+012g         +00001.00000
%# 012g          00001.00000
%012.3g         000000000001
%012g           000000000001
%+012g          +00000000001
% 012g           00000000001

以上、C99 printf と一致していることは確認した。

--- boost-spirit-karma-real_generators09.hpp    2013-06-04 16:57:46.000000000 +0900
+++ boost-spirit-karma-real_generators10.hpp    2013-06-04 16:57:46.000000000 +0900
@@ -13,14 +13,18 @@
 
 namespace boost { namespace spirit { namespace karma {
 
-template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false, typename CE = unused_type, typename CC = unused_type>
+template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false,
+          int Width = 0, int Align = -1, char Pad = ' ',
+          typename CE = unused_type, typename CC = unused_type>
 struct scientific_real_policies : real_policies<T> {
+  typedef boost::mpl::int_<
+    karma::generator_properties::countingbuffer
+  > properties;
   typedef CE CharEncoding;
   typedef CC Tag;
   template <typename Inserter, typename OutputIterator, typename Policies>
-  static bool call(OutputIterator &sink, T n, Policies const &p)
+  static bool call_n(OutputIterator &sink, T n, Policies const &p, bool force_sign)
   {
-    bool force_sign = p.force_sign(n);
     bool sign_val = false;
     int flags = p.floatfield(n);
     if (traits::test_negative(n)) {
@@ -81,6 +85,43 @@
       return p.template exponent<CharEncoding, Tag>(sink, traits::truncate_to_long::call(dim));
     return r;
   }
+
+  template <typename Inserter, typename OutputIterator, typename Policies>
+  static bool call(OutputIterator &sink, T n, Policies const &p)
+  {
+    if (!Width)
+      return call_n<Inserter, OutputIterator, Policies>(sink, n, p, p.force_sign(n));
+    bool r = false;
+    if (Align < 0) { // left align
+      karma::detail::enable_counting<OutputIterator> counting(sink);
+      r = call_n<Inserter, OutputIterator, Policies>(sink, n, p, p.force_sign(n));
+      while (r && int(counting.count()) < Width) 
+        r = karma::generate(sink, ' ');
+    }
+    else { // right align
+      bool sign_val = false;
+      if (Pad == '0' && traits::test_negative(n)) {
+        n = -n;
+        sign_val = true;
+      }
+      karma::detail::enable_buffering<OutputIterator> buffering(sink, Width);
+      {
+        karma::detail::disable_counting<OutputIterator> nocounting(sink);
+        r = call_n<Inserter, OutputIterator, Policies>(sink, n, p, (Pad == '0' && p.force_sign(n)) ? false : p.force_sign(n));
+      }
+      buffering.disable();
+      karma::detail::enable_counting<OutputIterator> counting(sink, buffering.buffer_size());
+      if (sign_val || (Pad == '0' && p.force_sign(n)))
+        r = sign_inserter::call(sink, (force_sign && !sign_val && BlankSign) || traits::test_zero(n), sign_val, p.force_sign(n));
+      while (r && int(counting.count()) < Width) 
+        r = karma::generate(sink, Pad);
+      if (r) {
+        buffering.buffer_copy();
+      }
+    }
+    return r;
+  }
+
   static bool force_sign(T) { return ForceSign; }
   static bool trailing_zeros(T) { return true; }
   static int floatfield(T) { return real_policies<T>::fmtflags::scientific; }
@@ -104,8 +145,10 @@
   }
 };
 
-template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false, typename CE = unused_type, typename CC = unused_type>
-struct fixed_real_policies : scientific_real_policies<T, Precision, ForceSign, BlankSign, CE, CC> {
+template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false,
+          int Width = 0, int Align = -1, char Pad = ' ',
+          typename CE = unused_type, typename CC = unused_type>
+struct fixed_real_policies : scientific_real_policies<T, Precision, ForceSign, BlankSign, Width, Align, Pad, CE, CC> {
   static int floatfield(T) { return real_policies<T>::fmtflags::fixed; }
   template <typename OutputIterator>
   static bool integer_part(OutputIterator &sink, T n, bool sign, bool force_sign)
@@ -115,8 +158,10 @@
   }
 };
 
-template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false, typename CE = unused_type, typename CC = unused_type>
-struct alt_general_real_policies : fixed_real_policies<T, Precision, ForceSign, BlankSign, CE, CC> {
+template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false,
+          int Width = 0, int Align = -1, char Pad = ' ',
+          typename CE = unused_type, typename CC = unused_type>
+struct alt_general_real_policies : fixed_real_policies<T, Precision, ForceSign, BlankSign, Width, Align, Pad, CE, CC> {
   static bool trailing_zeros(T) { return true; }
   static int floatfield(T n)
   {
@@ -136,8 +181,10 @@
   }
 };
 
-template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false, typename CE = unused_type, typename CC = unused_type>
-struct general_real_policies : alt_general_real_policies<T, Precision, ForceSign, BlankSign, CE, CC> {
+template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false,
+          int Width = 0, int Align = -1, char Pad = ' ',
+          typename CE = unused_type, typename CC = unused_type>
+struct general_real_policies : alt_general_real_policies<T, Precision, ForceSign, BlankSign, Width, Align, Pad, CE, CC> {
   static bool trailing_zeros(T) { return false; }
   static int floatfield(T n)
   {

最後に全体像を示す。Boost Spirit Karma のサンプルは参考にしたが、実用的ではないので、以下のものでよいと思う。

/*
  boost-spirit-karma-real_generators.hpp

  Copyright (C) 2013 Taiji Yamada <taiji@aihara.co.jp>

  Distributed under the Boost Software License, Version 1.0.
  (See http://www.boost.org/LICENSE_1_0.txt)
*/
#ifndef _boost_spirit_karma_real_generators_hpp_
#define _boost_spirit_karma_real_generators_hpp_

#include <boost/spirit/include/karma.hpp>

namespace boost { namespace spirit { namespace karma {

template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false,
          int Width = 0, int Align = -1, char Pad = ' ',
          typename CE = unused_type, typename CC = unused_type>
struct scientific_real_policies : real_policies<T> {
  typedef boost::mpl::int_<
    karma::generator_properties::countingbuffer
  > properties;
  typedef CE CharEncoding;
  typedef CC Tag;
  template <typename Inserter, typename OutputIterator, typename Policies>
  static bool call_n(OutputIterator &sink, T n, Policies const &p, bool force_sign)
  {
    bool sign_val = false;
    int flags = p.floatfield(n);
    if (traits::test_negative(n)) {
      n = -n;
      sign_val = true;
    }
    unsigned precision = p.precision(n);
    if (std::numeric_limits<T>::digits10) {
      precision = (std::min)(precision,
                             (unsigned)std::numeric_limits<T>::digits10 + 1);
    }
    using namespace std;
    T dim = 0;
    if (0 == (Policies::fmtflags::fixed & flags) && !traits::test_zero(n)) {
      dim = log10(n);
      if (dim > 0)
        n /= spirit::traits::pow10<T>(traits::truncate_to_long::call(dim));
      else if (n < 1.) {
        long exp = traits::truncate_to_long::call(-dim);
        if (exp != -dim)
          ++exp;
        dim = -exp;
        n *= spirit::traits::pow10<T>(exp);
      }
    }
    T integer_part;
    T precexp = spirit::traits::pow10<T>(precision);
    T fractional_part = modf(n, &integer_part);
    fractional_part = floor(fractional_part * precexp + T(0.5));
    if (fractional_part >= precexp) {
      fractional_part = floor(fractional_part - precexp);
      integer_part += 1;
    }
    T long_int_part = floor(integer_part);
    T long_frac_part = fractional_part;
    unsigned prec = precision;
    if (!p.trailing_zeros(n)) {
      T frac_part_floor = long_frac_part;
      if (0 != long_frac_part) {
        while (0 != prec &&
               0 == traits::remainder<10>::call(long_frac_part)) {
          long_frac_part = traits::divide<10>::call(long_frac_part);
          --prec;
        }
      }
      else
        prec = 0;
      if (precision != prec)
        long_frac_part = frac_part_floor / spirit::traits::pow10<T>(precision-prec);
    }
    if (false &&
        sign_val && traits::test_zero(long_int_part) && traits::test_zero(long_frac_part))
      sign_val = false;
    bool r = p.integer_part(sink, long_int_part, sign_val, force_sign);
    r = r && p.dot(sink, long_frac_part, precision);
    r = r && p.fraction_part(sink, long_frac_part, prec, precision);
    if (r && 0 == (Policies::fmtflags::fixed & flags))
      return p.template exponent<CharEncoding, Tag>(sink, traits::truncate_to_long::call(dim));
    return r;
  }

  template <typename Inserter, typename OutputIterator, typename Policies>
  static bool call(OutputIterator &sink, T n, Policies const &p)
  {
    if (!Width)
      return call_n<Inserter, OutputIterator, Policies>(sink, n, p, p.force_sign(n));
    bool r = false;
    if (Align < 0) { // left align
      karma::detail::enable_counting<OutputIterator> counting(sink);
      r = call_n<Inserter, OutputIterator, Policies>(sink, n, p, p.force_sign(n));
      while (r && int(counting.count()) < Width) 
        r = karma::generate(sink, ' ');
    }
    else { // right align
      bool sign_val = false;
      if (Pad == '0' && traits::test_negative(n)) {
        n = -n;
        sign_val = true;
      }
      karma::detail::enable_buffering<OutputIterator> buffering(sink, Width);
      {
        karma::detail::disable_counting<OutputIterator> nocounting(sink);
        r = call_n<Inserter, OutputIterator, Policies>(sink, n, p, (Pad == '0' && p.force_sign(n)) ? false : p.force_sign(n));
      }
      buffering.disable();
      karma::detail::enable_counting<OutputIterator> counting(sink, buffering.buffer_size());
      if (sign_val || (Pad == '0' && p.force_sign(n)))
        r = sign_inserter::call(sink, (force_sign && !sign_val && BlankSign) || traits::test_zero(n), sign_val, p.force_sign(n));
      while (r && int(counting.count()) < Width) 
        r = karma::generate(sink, Pad);
      if (r) {
        buffering.buffer_copy();
      }
    }
    return r;
  }

  static bool force_sign(T) { return ForceSign; }
  static bool trailing_zeros(T) { return true; }
  static int floatfield(T) { return real_policies<T>::fmtflags::scientific; }
  static unsigned precision(T) { return Precision; }
  template <typename OutputIterator>
  static bool integer_part(OutputIterator &sink, T n, bool sign, bool force_sign)
  {
    return sign_inserter::call(sink, (force_sign && !sign && BlankSign) || traits::test_zero(n), sign, force_sign) &&
      int_inserter<10>::call(sink, n);
  }
  template <typename CharEncoding, typename Tag, typename OutputIterator>
  static bool exponent(OutputIterator &sink, long n)
  {
    long abs_n = traits::get_absolute_value(n);
    bool r = char_inserter<CharEncoding, Tag>::call(sink, 'e') &&
      sign_inserter::call(sink, /*traits::test_zero(n)*/false,
                          traits::test_negative(n), /*false*/true);
    if (r && abs_n < 10) // the C99 Standard requires at least two digits in the exponent
      r = char_inserter<CharEncoding, Tag>::call(sink, '0');
    return r && int_inserter<10>::call(sink, abs_n);
  }
};

template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false,
          int Width = 0, int Align = -1, char Pad = ' ',
          typename CE = unused_type, typename CC = unused_type>
struct fixed_real_policies : scientific_real_policies<T, Precision, ForceSign, BlankSign, Width, Align, Pad, CE, CC> {
  static int floatfield(T) { return real_policies<T>::fmtflags::fixed; }
  template <typename OutputIterator>
  static bool integer_part(OutputIterator &sink, T n, bool sign, bool force_sign)
  {
    return sign_inserter::call(sink, force_sign && !sign && BlankSign, sign, force_sign) &&
      int_inserter<10>::call(sink, n);
  }
};

template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false,
          int Width = 0, int Align = -1, char Pad = ' ',
          typename CE = unused_type, typename CC = unused_type>
struct alt_general_real_policies : fixed_real_policies<T, Precision, ForceSign, BlankSign, Width, Align, Pad, CE, CC> {
  static bool trailing_zeros(T) { return true; }
  static int floatfield(T n)
  {
    using namespace std;
    T abs_n = traits::get_absolute_value(n);
    return (!(floor(log10(abs_n)) < Precision) || abs_n < 1e-4) ?
      real_policies<T>::fmtflags::scientific : real_policies<T>::fmtflags::fixed;
  }
  static unsigned precision(T n)
  {
    using namespace std;
    T abs_n = traits::get_absolute_value(n), l;
    if ((l = floor(log10(1./abs_n))) > 1)
      return Precision + (l - 1);
    l = floor(log10(abs_n));
    return Precision > l ? Precision - (l + 1) : Precision - 1;
  }
};

template <typename T, unsigned Precision = 6, bool ForceSign = false, bool BlankSign = false,
          int Width = 0, int Align = -1, char Pad = ' ',
          typename CE = unused_type, typename CC = unused_type>
struct general_real_policies : alt_general_real_policies<T, Precision, ForceSign, BlankSign, Width, Align, Pad, CE, CC> {
  static bool trailing_zeros(T) { return false; }
  static int floatfield(T n)
  {
    if (traits::test_zero(n))
      return real_policies<T>::fmtflags::fixed;
    return alt_general_real_policies<T, Precision, ForceSign, BlankSign>::floatfield(n);
  }
  template <typename OutputIterator>
  static bool dot(OutputIterator &sink, T n, unsigned precision)
  {
    if (traits::test_zero(n))
      return true;
    return alt_general_real_policies<T, Precision, ForceSign, BlankSign>::dot(sink, n, precision);
  }
  template <typename OutputIterator>
  static bool fraction_part(OutputIterator &sink, T n, unsigned adjprec, unsigned precision)
  {
    if (traits::test_zero(n))
      return true;
    return fixed_real_policies<T, Precision, ForceSign, BlankSign>::fraction_part(sink, n, adjprec, precision);
  }
};

} } }
#endif

Boost Karma による C99 printf int

さて、Karma における符号付き整数のジェネレータである int_generator<T, Radix=10, force_sign=false> は、right_align(width, pad) ディレクティブとの組み合わせなどで、ほとんどの書式が可能であるが、以下のような書式の実現は困難である。

%04lld  0001
%+04lld +001
% 04lld  001

そして、int_policies のようなポリシークラスがあるわけではないので、書式を拡張するには int_generator そのものを継承する必要がある。

よって、C99 printf 書式指定のような符号付き整数の表示を可能とする c99_int_generator を用意してしまおう。これは以下のように使う。

#include <iostream>
#include <boost/range.hpp>
#include <boost/spirit/include/karma.hpp>
#include "boost-spirit-karma-int_generators.hpp"

namespace karma = boost::spirit::karma;
typedef karma::c99_int_generator<long long, 10, true, true, 6, 1, '0'> long_long_generator;

int main()
{
  long_long_generator long_long;
  long long v[] = { 0, 1, -100, };
  std::cout << karma::format(long_long % karma::lit(','),
                             boost::make_iterator_range(v, v+sizeof(v)/sizeof(v[0]))) << std::endl;
  return 0;
}

right_align(width, pad) を使うとどうしても以下のようになってしまう表示が、

# karma::right_align(6, '0')[karma::int_generator<long long, 10, true>()]
0000 0,0000+1,00-100

以下のように表示できる。

 00000, 00001,-00100

ここで使われているのは、以下の通りである。基本的には、浮動小数点でやったことと同様だ。

/*
  boost-spirit-karma-int_generators.hpp

  Copyright (C) 2013 Taiji Yamada <taiji@aihara.co.jp>

  Distributed under the Boost Software License, Version 1.0.
  (See http://www.boost.org/LICENSE_1_0.txt)
*/
#ifndef _boost_spirit_karma_int_generators_hpp_
#define _boost_spirit_karma_int_generators_hpp_

#include <boost/spirit/include/karma.hpp>

namespace boost { namespace spirit { namespace karma {

template <typename T, unsigned Radix = 10, bool ForceSign = false, bool BlankSign = false,
          int Width = 0, int Align = -1, char Pad = ' ',
          typename CharEncoding = unused_type, typename Tag = unused_type>
struct c99_int_generator : any_int_generator<T, CharEncoding, Tag, Radix, ForceSign> {
  typedef boost::mpl::int_<
    karma::generator_properties::countingbuffer
  > properties;

  template <typename OutputIterator, typename Attribute>
  static bool insert_int(OutputIterator &sink, Attribute const &attr, bool force_sign = ForceSign)
  {
    return sign_inserter::call(sink, (force_sign && !traits::test_negative(attr) && BlankSign),
                               traits::test_negative(attr), force_sign) &&
      int_inserter<Radix, CharEncoding, Tag>::call(sink, traits::get_absolute_value(attr));
  }

  template <typename OutputIterator, typename Context, typename Delimiter, typename Attribute>
  static bool generate(OutputIterator &sink, Context &context, Delimiter const &d, Attribute const &attr)
  {
    if (!traits::has_optional_value(attr))
      return false;
    if (!Width)
      return insert_int(sink, traits::extract_from<T>(attr, context), ForceSign) &&
        delimit_out(sink, d);
    bool r = false;
    if (Align < 0) { // left align
      karma::detail::enable_counting<OutputIterator> counting(sink);
      r = insert_int(sink, traits::extract_from<T>(attr, context), ForceSign) &&
        delimit_out(sink, d);
      while (r && int(counting.count()) < Width)
        r = karma::generate(sink, ' ');
    }
    else { // right align
      bool sign_val = false;
      Attribute n = attr;
      if (Pad == '0' && traits::test_negative(n)) {
        n = -n;
        sign_val = true;
      }
      karma::detail::enable_buffering<OutputIterator> buffering(sink, Width);
      {
        karma::detail::disable_counting<OutputIterator> nocounting(sink);
        r = insert_int(sink, traits::extract_from<T>(n, context), (Pad == '0' && ForceSign) ? false : ForceSign) &&
          delimit_out(sink, d);
      }
      buffering.disable();
      karma::detail::enable_counting<OutputIterator> counting(sink, buffering.buffer_size());
      if (sign_val || (Pad == '0' && ForceSign))
        r = sign_inserter::call(sink, (ForceSign && !sign_val && BlankSign), sign_val, ForceSign);
      while (r && int(counting.count()) < Width)
        r = karma::generate(sink, Pad);
      if (r) {
        buffering.buffer_copy();
      }
    }
    return r;
  }

};

} } }

#endif

符号無し整数ではこうした問題はない。


Boost Spirit V2+Universal Tree における POSIX shell スクリプトパーサの例

by taiji at 2013/05/14 13:22

以前の Boost Spirit V2+Universal Tree における計算機の例 では一行毎の構文解析であったが、複数行に渡る構文解析の例として、シェルスクリプト構文解析を段階的に実装してみよう。但し、先の Boost Spirit V2+Universal Tree における bash スクリプトパーサの例 では pre-skipping を前提としたが、ここでは明示的に skipping を指定しつつ、今度は POSIX shell の BNF を最終的に目指した例を示していく。

Spirit V2+Universal Tree の基本形

複数行に対応した Spirit V2+Universal Tree の基本形で pre-skipping を前提としないものは以下のようになる。

#include <iostream>
#include <string>
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/support_utree.hpp>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;
namespace spirit = boost::spirit;

namespace client {
  template <typename Iterator>
  struct a_parser : qi::grammar<Iterator, spirit::utree()> {
    qi::rule<Iterator, spirit::utree()> top;
    a_parser() : a_parser::base_type(top)
    {
      top = *~qi::lit('\t'); // ここのパーサ規則を書く

      BOOST_SPIRIT_DEBUG_NODE(top);
    }
  };
}
int main()
{
  client::a_parser<std::string::const_iterator> a_parser;
  std::string str;
  std::cin.unsetf(std::ios::skipws);
  std::copy(std::istream_iterator<char>(std::cin), std::istream_iterator<char>(), std::back_inserter(str));
  {
    std::string::const_iterator it = str.begin(), end = str.end();
    spirit::utree ut;
    bool r = phrase_parse(it, end, a_parser, qi::ascii::space/*, qi::skip_flag::dont_postskip*/, ut);
    if (r && it == end) {
      std::cout << "succeeded:\t" << ut << std::endl;
    }
    else {
      std::cout << "failed:\t" << std::string(it, end) << std::endl;
      return 1;
    }
  }
  return 0;
}

これを雛形としてシェルスクリプトの構文解析を肉付けしていこう。以降、Unified diff 形式でコードの変遷を表現するものとする。

コメント処理、コマンド行、変数代入、シングルクォート

例えば、以下のようなスクリプトを受理できるようなパーサ規則を書く。

#!/bin/sh
i=0
message='hello world'!
echo $i $message
LC_ALL=C time
--- a_multiline_parser+utree00.cc       2013-05-14 14:24:45.000000000 +0900
+++ xsh00.cc    2013-05-14 14:24:45.000000000 +0900
@@ -9,11 +9,43 @@
 
 namespace client {
   template <typename Iterator>
-  struct a_parser : qi::grammar<Iterator, spirit::utree()> {
-    qi::rule<Iterator, spirit::utree()> top;
-    a_parser() : a_parser::base_type(top)
+  struct sh_parser : qi::grammar<Iterator, spirit::utree()> {
+    qi::rule<Iterator, spirit::utree()> top,
+      variable_name,
+      single_quoted_string,
+      unquoted_string,
+      command_name,
+      command_arguments,
+      argument,
+      arguments,
+      assignment,
+      assignments,
+      nl,
+      nl_list,
+      comment;
+    sh_parser() : sh_parser::base_type(top)
     {
-      top = *~qi::lit('\t');
+      top = *(*nl_list >>
+              ((qi::omit[*qi::ascii::blank] >>
+                (assignments >> command_arguments | assignments | command_arguments)) %
+               nl_list) >> *nl_list || nl);
+
+      variable_name = qi::as_string[(qi::ascii::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'))];
+      single_quoted_string = qi::as_string[qi::char_("'") >> *~qi::char_("'") >> qi::char_("'")];
+      unquoted_string = qi::as_string[+~qi::char_("\t\n \"#&'();<=>\\`|")];
+
+      command_name = +(single_quoted_string | unquoted_string);
+      command_arguments = command_name >> *(qi::omit[+qi::ascii::blank] >> argument);
+
+      argument = +(single_quoted_string | unquoted_string);
+      arguments = argument >> *(qi::omit[+qi::ascii::blank] >> argument);
+
+      assignment = variable_name >> '=' >> -argument;
+      assignments = assignment >> *(!qi::eps | qi::omit[+qi::ascii::blank] >> assignment);
+
+      nl = qi::omit[*comment] >> qi::eol;
+      nl_list = +(qi::omit[*comment] >> qi::eol);
+      comment = qi::as_string[qi::skip(qi::ascii::blank)['#'] >> *(~qi::char_("\r\n"))];
 
       BOOST_SPIRIT_DEBUG_NODE(top);
     }
@@ -21,20 +53,20 @@
 }
 int main()
 {
-  client::a_parser<std::string::const_iterator> a_parser;
+  client::sh_parser<std::string::const_iterator> sh_parser;
   std::string str;
   std::cin.unsetf(std::ios::skipws);
   std::copy(std::istream_iterator<char>(std::cin), std::istream_iterator<char>(), std::back_inserter(str));
   {
     std::string::const_iterator it = str.begin(), end = str.end();
     spirit::utree ut;
-    bool r = phrase_parse(it, end, a_parser, qi::ascii::space/*, qi::skip_flag::dont_postskip*/, ut);
+    bool r = phrase_parse(it, end, sh_parser, qi::ascii::space/*, qi::skip_flag::dont_postskip*/, ut);
+    std::cout << str << std::endl;
     if (r && it == end) {
       std::cout << "succeeded:\t" << ut << std::endl;
     }
     else {
       std::cout << "failed:\t" << std::string(it, end) << std::endl;
-      return 1;
     }
   }
   return 0;

これでコメント処理、コマンド行、変数代入、シングルクォートされた任意の文字列を含む文字列や、空白区切りによる文字列のリストのパースが可能となっている。

ダブルクォート

次に以下のような、ダブルクォートされた文字列を受理できるようなパーサ規則を書く。

message="hello \"world\"!"
echo "$message"
--- xsh00.cc    2013-05-14 14:24:45.000000000 +0900
+++ xsh01.cc    2013-05-14 14:24:45.000000000 +0900
@@ -13,6 +13,7 @@
     qi::rule<Iterator, spirit::utree()> top,
       variable_name,
       single_quoted_string,
+      double_quoted_string,
       unquoted_string,
       command_name,
       command_arguments,
@@ -32,12 +33,16 @@
 
       variable_name = qi::as_string[(qi::ascii::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'))];
       single_quoted_string = qi::as_string[qi::char_("'") >> *~qi::char_("'") >> qi::char_("'")];
+      double_quoted_string = qi::as_string[qi::char_('"') >> *(
+                                                               (qi::lit('\\') >> qi::char_) | (!qi::lit('\\') >> ~qi::lit('"'))
+                                                               ) >>
+                                           qi::char_('"')];
       unquoted_string = qi::as_string[+~qi::char_("\t\n \"#&'();<=>\\`|")];
 
-      command_name = +(single_quoted_string | unquoted_string);
+      command_name = +(single_quoted_string | double_quoted_string | unquoted_string);
       command_arguments = command_name >> *(qi::omit[+qi::ascii::blank] >> argument);
 
-      argument = +(single_quoted_string | unquoted_string);
+      argument = +(single_quoted_string | double_quoted_string | unquoted_string);
       arguments = argument >> *(qi::omit[+qi::ascii::blank] >> argument);
 
       assignment = variable_name >> '=' >> -argument;

これでバックスラッシュでエスケープされていないダブルクォーテーションでのみ閉じられたダブルクォート文字列のパースが可能となっている。

ヒアドキュメント

次に、以下のようなヒアドキュメントの入力を受理するようなパーサ規則を書くが、まずは終端文字列を「EOH」に固定したものから試みる。

cat <<EOH
usage:
        $0 < filename
EOH

ちなみに、ヒアドキュメントでタブを除去する「<<-」のときには真、さもなくば偽を付加するものとする。

--- xsh01.cc    2013-05-14 14:24:45.000000000 +0900
+++ xsh02.cc    2013-05-14 14:24:45.000000000 +0900
@@ -21,14 +21,19 @@
       arguments,
       assignment,
       assignments,
+      io_here,
       nl,
       nl_list,
       comment;
+    qi::rule<Iterator, spirit::utree()> DLESS, MINUS;
     sh_parser() : sh_parser::base_type(top)
     {
+      DLESS = qi::as<std::string>()["<<"];
+      MINUS = '-';
+
       top = *(*nl_list >>
               ((qi::omit[*qi::ascii::blank] >>
-                (assignments >> command_arguments | assignments | command_arguments)) %
+                (assignments >> command_arguments | assignments | command_arguments) >> -io_here) %
                nl_list) >> *nl_list || nl);
 
       variable_name = qi::as_string[(qi::ascii::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'))];
@@ -48,6 +53,12 @@
       assignment = variable_name >> '=' >> -argument;
       assignments = assignment >> *(!qi::eps | qi::omit[+qi::ascii::blank] >> assignment);
 
+      io_here = qi::skip(qi::ascii::blank)[DLESS] >> qi::matches[MINUS] >>
+        qi::lit("EOH") >> nl >>
+        (*(qi::as_string[+(qi::char_ - (qi::eol >> qi::lit("EOH") >> qi::eol))]) >>
+         qi::omit[qi::eol >> qi::lit("EOH")] |
+         qi::omit[qi::lit("EOH")]);
+
       nl = qi::omit[*comment] >> qi::eol;
       nl_list = +(qi::omit[*comment] >> qi::eol);
       comment = qi::as_string[qi::skip(qi::ascii::blank)['#'] >> *(~qi::char_("\r\n"))];

このままでは「EOH」以外の終端文字列には対応できていない。よって、io_here にローカル変数 qi::locals<std::string> を使用可能にして、それを参照するようにする。

--- xsh02.cc    2013-05-14 14:24:45.000000000 +0900
+++ xsh03.cc    2013-05-14 14:24:45.000000000 +0900
@@ -3,14 +3,15 @@
 //#define BOOST_SPIRIT_DEBUG
 #include <boost/spirit/include/support_utree.hpp>
 #include <boost/spirit/include/qi.hpp>
+#include <boost/spirit/include/phoenix.hpp>
 
 namespace qi = boost::spirit::qi;
 namespace spirit = boost::spirit;
 
 namespace client {
   template <typename Iterator>
-  struct sh_parser : qi::grammar<Iterator, spirit::utree()> {
-    qi::rule<Iterator, spirit::utree()> top,
+  struct sh_parser : qi::grammar<Iterator, spirit::utree(), qi::locals<std::string> > {
+    qi::rule<Iterator, spirit::utree(), qi::locals<std::string> > top,
       variable_name,
       single_quoted_string,
       double_quoted_string,
@@ -53,11 +54,11 @@
       assignment = variable_name >> '=' >> -argument;
       assignments = assignment >> *(!qi::eps | qi::omit[+qi::ascii::blank] >> assignment);
 
-      io_here = qi::skip(qi::ascii::blank)[DLESS] >> qi::matches[MINUS] >>
-        qi::lit("EOH") >> nl >>
-        (*(qi::as_string[+(qi::char_ - (qi::eol >> qi::lit("EOH") >> qi::eol))]) >>
-         qi::omit[qi::eol >> qi::lit("EOH")] |
-         qi::omit[qi::lit("EOH")]);
+      io_here %= qi::skip(qi::ascii::blank)[DLESS] >> qi::matches[MINUS] >>
+        qi::as_string[(+(qi::char_ - qi::ascii::space))][qi::labels::_a = qi::labels::_1] >> nl >>
+        (*(qi::as_string[+(qi::char_ - (qi::eol >> qi::string(qi::labels::_a) >> qi::eol))]) >>
+         qi::omit[qi::eol >> qi::string(qi::labels::_a)] |
+         qi::omit[qi::string(qi::labels::_a)]);
 
       nl = qi::omit[*comment] >> qi::eol;
       nl_list = +(qi::omit[*comment] >> qi::eol);

これで任意の終端文字列でヒアドキュメントが受理できるようになる。さらに、以下のような変数展開を抑止するヒアドキュメントにも対応しよう。

cat <<'EOF'
$1 $2 ...
EOF
--- xsh03.cc    2013-05-14 14:24:45.000000000 +0900
+++ xsh04.cc    2013-05-14 14:24:45.000000000 +0900
@@ -26,6 +26,7 @@
       nl,
       nl_list,
       comment;
+    qi::rule<Iterator, std::string()> here_end, quoted_here_end, unquoted_here_end;
     qi::rule<Iterator, spirit::utree()> DLESS, MINUS;
     sh_parser() : sh_parser::base_type(top)
     {
@@ -54,8 +55,12 @@
       assignment = variable_name >> '=' >> -argument;
       assignments = assignment >> *(!qi::eps | qi::omit[+qi::ascii::blank] >> assignment);
 
+      quoted_here_end = '\'' >> qi::as_string[(+(qi::char_ - '\''))][qi::labels::_val = qi::labels::_1] >> '\'';
+      unquoted_here_end %= qi::as_string[(+(qi::char_ - qi::ascii::space))];
+      here_end %= quoted_here_end | unquoted_here_end;
+
       io_here %= qi::skip(qi::ascii::blank)[DLESS] >> qi::matches[MINUS] >>
-        qi::as_string[(+(qi::char_ - qi::ascii::space))][qi::labels::_a = qi::labels::_1] >> nl >>
+        here_end[qi::labels::_a = qi::labels::_1] >> nl >>
         (*(qi::as_string[+(qi::char_ - (qi::eol >> qi::string(qi::labels::_a) >> qi::eol))]) >>
          qi::omit[qi::eol >> qi::string(qi::labels::_a)] |
          qi::omit[qi::string(qi::labels::_a)]);

ちなみに規則で「=」の代わりに「%=」とすると「_val = _1」を省略できる。

リダイレクション

次に、以下のようなリダイレクションを受理するパーサ規則を書く。

sort < filename
printf '' > filename
--- xsh04.cc    2013-05-14 14:24:45.000000000 +0900
+++ xsh05.cc    2013-05-14 14:24:45.000000000 +0900
@@ -23,19 +23,43 @@
       assignment,
       assignments,
       io_here,
+      filename,
+      io_redirect,
+      io_file,
       nl,
       nl_list,
       comment;
     qi::rule<Iterator, std::string()> here_end, quoted_here_end, unquoted_here_end;
-    qi::rule<Iterator, spirit::utree()> DLESS, MINUS;
+    qi::rule<Iterator, spirit::utree()> IO_NUMBER;
+    qi::rule<Iterator, spirit::utree()> AMP, VLINE, SEMI;
+    qi::rule<Iterator, spirit::utree()> AND_IF, OR_IF, DSEMI;
+    qi::rule<Iterator, spirit::utree()> LESS, GREAT, DLESS, DGREAT, LESSAND, GREATAND, LESSGREAT, CLOBBER, ANDGREAT, MINUS;
     sh_parser() : sh_parser::base_type(top)
     {
+      IO_NUMBER = +qi::digit;
+
+      AMP = '&';
+      VLINE = '|';
+      SEMI = ';';
+
+      AND_IF = qi::as<std::string>()["&&"];
+      OR_IF = qi::as<std::string>()["||"];
+      DSEMI = qi::as<std::string>()[";;"];
+
+      LESS = '<';
+      GREAT = '>';
       DLESS = qi::as<std::string>()["<<"];
+      DGREAT = qi::as<std::string>()[">>"];
+      LESSAND = qi::as<std::string>()["<&"];
+      GREATAND = qi::as<std::string>()[">&"];
+      LESSGREAT = qi::as<std::string>()["<>"];
+      CLOBBER = qi::as<std::string>()[">|"];
+      ANDGREAT = qi::as<std::string>()["&>"];
       MINUS = '-';
 
       top = *(*nl_list >>
               ((qi::omit[*qi::ascii::blank] >>
-                (assignments >> command_arguments | assignments | command_arguments) >> -io_here) %
+                (assignments >> command_arguments | assignments | command_arguments) >> *io_redirect) %
                nl_list) >> *nl_list || nl);
 
       variable_name = qi::as_string[(qi::ascii::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'))];
@@ -65,6 +89,11 @@
          qi::omit[qi::eol >> qi::string(qi::labels::_a)] |
          qi::omit[qi::string(qi::labels::_a)]);
 
+      filename = qi::omit[*qi::ascii::blank] >> argument;
+
+      io_redirect = io_file | io_here;
+      io_file = qi::skip(qi::ascii::blank)[LESS | LESSAND | GREAT | GREATAND | DGREAT | LESSGREAT | CLOBBER] >> filename;
+
       nl = qi::omit[*comment] >> qi::eol;
       nl_list = +(qi::omit[*comment] >> qi::eol);
       comment = qi::as_string[qi::skip(qi::ascii::blank)['#'] >> *(~qi::char_("\r\n"))];

これでリダイレクションの構文を受理できるようになる。

パイプライン

次に、以下のようなパイプラインを受理できるパーサ規則を書く。

echo | cat | sort | uniq
--- xsh05.cc    2013-05-14 14:24:46.000000000 +0900
+++ xsh06.cc    2013-05-14 14:24:46.000000000 +0900
@@ -24,6 +24,7 @@
       assignments,
       io_here,
       filename,
+      pipeline,
       io_redirect,
       io_file,
       nl,
@@ -57,10 +58,7 @@
       ANDGREAT = qi::as<std::string>()["&>"];
       MINUS = '-';
 
-      top = *(*nl_list >>
-              ((qi::omit[*qi::ascii::blank] >>
-                (assignments >> command_arguments | assignments | command_arguments) >> *io_redirect) %
-               nl_list) >> *nl_list || nl);
+      top = *(*nl >> (pipeline % nl_list) >> *nl || nl);
 
       variable_name = qi::as_string[(qi::ascii::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'))];
       single_quoted_string = qi::as_string[qi::char_("'") >> *~qi::char_("'") >> qi::char_("'")];
@@ -91,6 +89,10 @@
 
       filename = qi::omit[*qi::ascii::blank] >> argument;
 
+      pipeline = (qi::omit[*qi::ascii::blank] >>
+                  (assignments >> command_arguments | assignments | command_arguments) >> *io_redirect) %
+        (qi::skip(qi::ascii::blank)[VLINE] >> *nl);
+
       io_redirect = io_file | io_here;
       io_file = qi::skip(qi::ascii::blank)[LESS | LESSAND | GREAT | GREATAND | DGREAT | LESSGREAT | CLOBBER] >> filename;
 

しかしこれでは、以下のようなヒアドキュメント直後のパイプラインには未対応である。

cat <<EOF | sort
        :
EOF

よって、以下のように pipe_here 規則を io_here 規則に挿入する。

--- xsh06.cc    2013-05-14 14:24:46.000000000 +0900
+++ xsh07.cc    2013-05-14 14:24:46.000000000 +0900
@@ -22,6 +22,7 @@
       arguments,
       assignment,
       assignments,
+      pipe_here,
       io_here,
       filename,
       pipeline,
@@ -77,12 +78,14 @@
       assignment = variable_name >> '=' >> -argument;
       assignments = assignment >> *(!qi::eps | qi::omit[+qi::ascii::blank] >> assignment);
 
+      pipe_here = qi::skip(qi::ascii::blank)[VLINE] >> *pipeline;
+
       quoted_here_end = '\'' >> qi::as_string[(+(qi::char_ - '\''))][qi::labels::_val = qi::labels::_1] >> '\'';
       unquoted_here_end %= qi::as_string[(+(qi::char_ - qi::ascii::space))];
       here_end %= quoted_here_end | unquoted_here_end;
 
       io_here %= qi::skip(qi::ascii::blank)[DLESS] >> qi::matches[MINUS] >>
-        here_end[qi::labels::_a = qi::labels::_1] >> nl >>
+        here_end[qi::labels::_a = qi::labels::_1] >> -pipe_here >> nl >>
         (*(qi::as_string[+(qi::char_ - (qi::eol >> qi::string(qi::labels::_a) >> qi::eol))]) >>
          qi::omit[qi::eol >> qi::string(qi::labels::_a)] |
          qi::omit[qi::string(qi::labels::_a)]);

これでパイプラインの構文を受理できるようになる。

リスト

次に、以下のようなリストを受理できるようなパーサ規則を書く。

true && echo yes || echo no
--- xsh07.cc    2013-05-14 14:24:46.000000000 +0900
+++ xsh08.cc    2013-05-14 14:24:46.000000000 +0900
@@ -26,6 +26,7 @@
       io_here,
       filename,
       pipeline,
+      and_or,
       io_redirect,
       io_file,
       nl,
@@ -59,7 +60,7 @@
       ANDGREAT = qi::as<std::string>()["&>"];
       MINUS = '-';
 
-      top = *(*nl >> (pipeline % nl_list) >> *nl || nl);
+      top = *(*nl >> (and_or % nl_list) >> *nl || nl);
 
       variable_name = qi::as_string[(qi::ascii::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'))];
       single_quoted_string = qi::as_string[qi::char_("'") >> *~qi::char_("'") >> qi::char_("'")];
@@ -95,6 +96,7 @@
       pipeline = (qi::omit[*qi::ascii::blank] >>
                   (assignments >> command_arguments | assignments | command_arguments) >> *io_redirect) %
         (qi::skip(qi::ascii::blank)[VLINE] >> *nl);
+      and_or = pipeline % (qi::skip(qi::ascii::blank)[AND_IF | OR_IF] >> *nl);
 
       io_redirect = io_file | io_here;
       io_file = qi::skip(qi::ascii::blank)[LESS | LESSAND | GREAT | GREATAND | DGREAT | LESSGREAT | CLOBBER] >> filename;

さらに、次のようなリストを受理できるようなパーサ規則を書く。

echo & i=0; LC_ALL=C date
--- xsh08.cc    2013-05-14 14:24:46.000000000 +0900
+++ xsh09.cc    2013-05-14 14:24:46.000000000 +0900
@@ -25,8 +25,11 @@
       pipe_here,
       io_here,
       filename,
+      separator_op,
+      separator,
       pipeline,
       and_or,
+      list,
       io_redirect,
       io_file,
       nl,
@@ -60,7 +63,7 @@
       ANDGREAT = qi::as<std::string>()["&>"];
       MINUS = '-';
 
-      top = *(*nl >> (and_or % nl_list) >> *nl || nl);
+      top = *(list || nl);
 
       variable_name = qi::as_string[(qi::ascii::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'))];
       single_quoted_string = qi::as_string[qi::char_("'") >> *~qi::char_("'") >> qi::char_("'")];
@@ -93,10 +96,14 @@
 
       filename = qi::omit[*qi::ascii::blank] >> argument;
 
+      separator_op = qi::skip(qi::ascii::blank)[AMP | SEMI];
+      separator = (separator_op >> *nl | nl_list);
+
       pipeline = (qi::omit[*qi::ascii::blank] >>
                   (assignments >> command_arguments | assignments | command_arguments) >> *io_redirect) %
         (qi::skip(qi::ascii::blank)[VLINE] >> *nl);
       and_or = pipeline % (qi::skip(qi::ascii::blank)[AND_IF | OR_IF] >> *nl);
+      list = and_or % separator_op;
 
       io_redirect = io_file | io_here;
       io_file = qi::skip(qi::ascii::blank)[LESS | LESSAND | GREAT | GREATAND | DGREAT | LESSGREAT | CLOBBER] >> filename;

これでリスト構文を受理できるようになる。

サブシェル、グループ

次に、以下のようなサブシェルやグループを受理できるようなパーサ規則を書く。

true && (echo yes) || (echo no)
true && {
  echo yes
} || {
  echo no
}

また、ついでにリダイレクションにてファイルデスクリプタ番号に対応させておく。

--- xsh09.cc    2013-05-14 14:24:46.000000000 +0900
+++ xsh10.cc    2013-05-14 14:24:46.000000000 +0900
@@ -30,8 +30,11 @@
       pipeline,
       and_or,
       list,
+      term,
+      compound_list,
       io_redirect,
       io_file,
+      command,
       nl,
       nl_list,
       comment;
@@ -40,6 +43,7 @@
     qi::rule<Iterator, spirit::utree()> AMP, VLINE, SEMI;
     qi::rule<Iterator, spirit::utree()> AND_IF, OR_IF, DSEMI;
     qi::rule<Iterator, spirit::utree()> LESS, GREAT, DLESS, DGREAT, LESSAND, GREATAND, LESSGREAT, CLOBBER, ANDGREAT, MINUS;
+    qi::rule<Iterator, spirit::utree()> If, Then, Elif, Else, Fi, For, While, Until, Do, Done, Case, Esac, In, Lparen, Rparen, Lbrace, Rbrace, Parens, Bang;
     sh_parser() : sh_parser::base_type(top)
     {
       IO_NUMBER = +qi::digit;
@@ -63,7 +67,27 @@
       ANDGREAT = qi::as<std::string>()["&>"];
       MINUS = '-';
 
-      top = *(list || nl);
+      If = qi::as<std::string>()["if"];
+      Then = qi::as<std::string>()["then"];
+      Elif = qi::as<std::string>()["elif"];
+      Else = qi::as<std::string>()["else"];
+      Fi = qi::as<std::string>()["fi"];
+      For = qi::as<std::string>()["for"];
+      While = qi::as<std::string>()["while"];
+      Until = qi::as<std::string>()["until"];
+      Do = qi::as<std::string>()["do"];
+      Done = qi::as<std::string>()["done"];
+      Case = qi::as<std::string>()["case"];
+      Esac = qi::as<std::string>()["esac"];
+      In = qi::as<std::string>()["in"];
+      Lparen = '(';
+      Rparen = ')';
+      Lbrace = '{';
+      Rbrace = '}';
+      Parens = qi::as<std::string>()["()"];
+      Bang = '!';
+
+      top = *compound_list;
 
       variable_name = qi::as_string[(qi::ascii::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'))];
       single_quoted_string = qi::as_string[qi::char_("'") >> *~qi::char_("'") >> qi::char_("'")];
@@ -99,15 +123,23 @@
       separator_op = qi::skip(qi::ascii::blank)[AMP | SEMI];
       separator = (separator_op >> *nl | nl_list);
 
-      pipeline = (qi::omit[*qi::ascii::blank] >>
-                  (assignments >> command_arguments | assignments | command_arguments) >> *io_redirect) %
+      pipeline = command %
         (qi::skip(qi::ascii::blank)[VLINE] >> *nl);
       and_or = pipeline % (qi::skip(qi::ascii::blank)[AND_IF | OR_IF] >> *nl);
       list = and_or % separator_op;
+      term = and_or % separator;
+      compound_list = *nl >> term >> -separator;
 
-      io_redirect = io_file | io_here;
+      io_redirect = -IO_NUMBER >> (io_file | io_here);
       io_file = qi::skip(qi::ascii::blank)[LESS | LESSAND | GREAT | GREATAND | DGREAT | LESSGREAT | CLOBBER] >> filename;
 
+      command = qi::omit[*qi::ascii::blank] >>
+        (
+         (assignments >> command_arguments | assignments | command_arguments) |
+         qi::skip(qi::ascii::space)[Lbrace] >> compound_list >> qi::skip(qi::ascii::space)[Rbrace] |
+         qi::skip(qi::ascii::space)[Lparen] >> compound_list >> qi::skip(qi::ascii::space)[Rparen]
+         ) >> *io_redirect;
+
       nl = qi::omit[*comment] >> qi::eol;
       nl_list = +(qi::omit[*comment] >> qi::eol);
       comment = qi::as_string[qi::skip(qi::ascii::blank)['#'] >> *(~qi::char_("\r\n"))];

ここまで受理できるようになったので、残りは if などの制御構文のみとなった。

制御構文

if then elif else fi,

for in do done,

while do done,

until do done,

case in ) ;; esac,

function()

の制御構文を受理できるパーサ規則を書く。

--- xsh10.cc    2013-05-14 14:24:46.000000000 +0900
+++ xsh.cc      2013-05-14 14:24:46.000000000 +0900
@@ -24,9 +24,13 @@
       assignments,
       pipe_here,
       io_here,
+      name,
+      word,
+      wordlist,
       filename,
       separator_op,
       separator,
+      sequential_sep,
       pipeline,
       and_or,
       list,
@@ -34,6 +38,13 @@
       compound_list,
       io_redirect,
       io_file,
+      do_group,
+      for_clause,
+      case_item, case_item_ns, pattern,
+      case_clause,
+      if_clause,
+      compound_command,
+      function_definition,
       command,
       nl,
       nl_list,
@@ -97,7 +108,8 @@
                                            qi::char_('"')];
       unquoted_string = qi::as_string[+~qi::char_("\t\n \"#&'();<=>\\`|")];
 
-      command_name = +(single_quoted_string | double_quoted_string | unquoted_string);
+      command_name = +(single_quoted_string | double_quoted_string | unquoted_string) -
+        (/*Bang | */Lbrace | Rbrace/* | Lparen | Rparen*/ | Case | Do | Done | Elif | Else | Esac | Fi | For | If | In | Then | Until | While);
       command_arguments = command_name >> *(qi::omit[+qi::ascii::blank] >> argument);
 
       argument = +(single_quoted_string | double_quoted_string | unquoted_string);
@@ -118,12 +130,16 @@
          qi::omit[qi::eol >> qi::string(qi::labels::_a)] |
          qi::omit[qi::string(qi::labels::_a)]);
 
+      name = qi::omit[+qi::ascii::blank] >> variable_name;
+      word = qi::omit[+qi::ascii::blank] >> argument;
+      wordlist = qi::omit[+qi::ascii::blank] >> arguments;
       filename = qi::omit[*qi::ascii::blank] >> argument;
 
-      separator_op = qi::skip(qi::ascii::blank)[AMP | SEMI];
+      separator_op = qi::skip(qi::ascii::blank)[AMP | !DSEMI >> SEMI];
       separator = (separator_op >> *nl | nl_list);
+      sequential_sep = qi::skip(qi::ascii::blank)[/*!DSEMI >> */SEMI] >> *nl | nl_list;
 
-      pipeline = command %
+      pipeline = -Bang >> command %
         (qi::skip(qi::ascii::blank)[VLINE] >> *nl);
       and_or = pipeline % (qi::skip(qi::ascii::blank)[AND_IF | OR_IF] >> *nl);
       list = and_or % separator_op;
@@ -133,11 +149,47 @@
       io_redirect = -IO_NUMBER >> (io_file | io_here);
       io_file = qi::skip(qi::ascii::blank)[LESS | LESSAND | GREAT | GREATAND | DGREAT | LESSGREAT | CLOBBER] >> filename;
 
+      do_group = qi::skip(qi::ascii::space)[Do] >> compound_list >> qi::skip(qi::ascii::space)[Done];
+      for_clause = qi::skip(qi::ascii::blank)[For] >> name >>
+        (SEMI >> *nl | *nl >> -(qi::skip(qi::ascii::blank)[In] >> -wordlist >> -sequential_sep)) >> do_group;
+      pattern = argument >> *(qi::omit[VLINE] >> argument);
+
+      case_item = !qi::eps | -qi::skip(qi::ascii::blank)['('] >> pattern >> qi::lit(')') >>
+        (compound_list | *nl) >> qi::skip(qi::ascii::space)[DSEMI] >> *nl;
+      case_item_ns = !qi::eps | -qi::skip(qi::ascii::blank)['('] >> pattern >> qi::lit(')') >>
+        -compound_list >> *nl;
+
+      case_clause = qi::skip(qi::ascii::blank)[Case] >> word >> qi::omit[+qi::ascii::blank] >> *nl >>
+        qi::skip(qi::ascii::space)[In] >> *nl >>
+        *case_item >> -case_item_ns >>
+        qi::skip(qi::ascii::space)[Esac];
+
+      if_clause = qi::skip(qi::ascii::blank)[If] >> compound_list >>
+        qi::skip(qi::ascii::blank)[Then] >> compound_list >>
+        *(
+          qi::skip(qi::ascii::blank)[Elif] >> compound_list >>
+          qi::skip(qi::ascii::blank)[Then] >> compound_list
+          ) >>
+        -(
+          qi::skip(qi::ascii::blank)[Else] >> compound_list
+          ) >>
+        qi::skip(qi::ascii::space)[Fi];
+
+      compound_command = 
+        qi::skip(qi::ascii::space)[Lbrace] >> compound_list >> qi::skip(qi::ascii::space)[Rbrace] |
+        qi::skip(qi::ascii::space)[Lparen] >> compound_list >> qi::skip(qi::ascii::space)[Rparen] |
+        for_clause |
+        case_clause |
+        if_clause |
+        qi::skip(qi::ascii::blank)[While] >> compound_list >> do_group |
+        qi::skip(qi::ascii::blank)[Until] >> compound_list >> do_group;
+      function_definition =
+        variable_name >> *qi::ascii::blank >> Parens >> *nl >> compound_command;
       command = qi::omit[*qi::ascii::blank] >>
         (
+         function_definition |
          (assignments >> command_arguments | assignments | command_arguments) |
-         qi::skip(qi::ascii::space)[Lbrace] >> compound_list >> qi::skip(qi::ascii::space)[Rbrace] |
-         qi::skip(qi::ascii::space)[Lparen] >> compound_list >> qi::skip(qi::ascii::space)[Rparen]
+         compound_command
          ) >> *io_redirect;
 
       nl = qi::omit[*comment] >> qi::eol;

これで POSIX シェルスクリプト構文を受理できるようになった。

そしてこれは Shell Command Language 記載の BNF を Boost Spirit Qi で書き直したものになっている。最後に、全体像としてすべてのコードを載せておく。ちなみに、!qi::eps というのが稀に必要となっている箇所があるが、これがないと属性が重複してしまう現象を防いでいる。

#include <iostream>
#include <string>
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/support_utree.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace qi = boost::spirit::qi;
namespace spirit = boost::spirit;

namespace client {
  template <typename Iterator>
  struct sh_parser : qi::grammar<Iterator, spirit::utree(), qi::locals<std::string> > {
    qi::rule<Iterator, spirit::utree(), qi::locals<std::string> > top,
      variable_name,
      single_quoted_string,
      double_quoted_string,
      unquoted_string,
      command_name,
      command_arguments,
      argument,
      arguments,
      assignment,
      assignments,
      pipe_here,
      io_here,
      name,
      word,
      wordlist,
      filename,
      separator_op,
      separator,
      sequential_sep,
      pipeline,
      and_or,
      list,
      term,
      compound_list,
      io_redirect,
      io_file,
      do_group,
      for_clause,
      case_item, case_item_ns, pattern,
      case_clause,
      if_clause,
      compound_command,
      function_definition,
      command,
      nl,
      nl_list,
      comment;
    qi::rule<Iterator, std::string()> here_end, quoted_here_end, unquoted_here_end;
    qi::rule<Iterator, spirit::utree()> IO_NUMBER;
    qi::rule<Iterator, spirit::utree()> AMP, VLINE, SEMI;
    qi::rule<Iterator, spirit::utree()> AND_IF, OR_IF, DSEMI;
    qi::rule<Iterator, spirit::utree()> LESS, GREAT, DLESS, DGREAT, LESSAND, GREATAND, LESSGREAT, CLOBBER, ANDGREAT, MINUS;
    qi::rule<Iterator, spirit::utree()> If, Then, Elif, Else, Fi, For, While, Until, Do, Done, Case, Esac, In, Lparen, Rparen, Lbrace, Rbrace, Parens, Bang;
    sh_parser() : sh_parser::base_type(top)
    {
      IO_NUMBER = +qi::digit;

      AMP = '&';
      VLINE = '|';
      SEMI = ';';

      AND_IF = qi::as<std::string>()["&&"];
      OR_IF = qi::as<std::string>()["||"];
      DSEMI = qi::as<std::string>()[";;"];

      LESS = '<';
      GREAT = '>';
      DLESS = qi::as<std::string>()["<<"];
      DGREAT = qi::as<std::string>()[">>"];
      LESSAND = qi::as<std::string>()["<&"];
      GREATAND = qi::as<std::string>()[">&"];
      LESSGREAT = qi::as<std::string>()["<>"];
      CLOBBER = qi::as<std::string>()[">|"];
      ANDGREAT = qi::as<std::string>()["&>"];
      MINUS = '-';

      If = qi::as<std::string>()["if"];
      Then = qi::as<std::string>()["then"];
      Elif = qi::as<std::string>()["elif"];
      Else = qi::as<std::string>()["else"];
      Fi = qi::as<std::string>()["fi"];
      For = qi::as<std::string>()["for"];
      While = qi::as<std::string>()["while"];
      Until = qi::as<std::string>()["until"];
      Do = qi::as<std::string>()["do"];
      Done = qi::as<std::string>()["done"];
      Case = qi::as<std::string>()["case"];
      Esac = qi::as<std::string>()["esac"];
      In = qi::as<std::string>()["in"];
      Lparen = '(';
      Rparen = ')';
      Lbrace = '{';
      Rbrace = '}';
      Parens = qi::as<std::string>()["()"];
      Bang = '!';

      top = *compound_list;

      variable_name = qi::as_string[(qi::ascii::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'))];
      single_quoted_string = qi::as_string[qi::char_("'") >> *~qi::char_("'") >> qi::char_("'")];
      double_quoted_string = qi::as_string[qi::char_('"') >> *(
                                                               (qi::lit('\\') >> qi::char_) | (!qi::lit('\\') >> ~qi::lit('"'))
                                                               ) >>
                                           qi::char_('"')];
      unquoted_string = qi::as_string[+~qi::char_("\t\n \"#&'();<=>\\`|")];

      command_name = +(single_quoted_string | double_quoted_string | unquoted_string) -
        (/*Bang | */Lbrace | Rbrace/* | Lparen | Rparen*/ | Case | Do | Done | Elif | Else | Esac | Fi | For | If | In | Then | Until | While);
      command_arguments = command_name >> *(qi::omit[+qi::ascii::blank] >> argument);

      argument = +(single_quoted_string | double_quoted_string | unquoted_string);
      arguments = argument >> *(qi::omit[+qi::ascii::blank] >> argument);

      assignment = variable_name >> '=' >> -argument;
      assignments = assignment >> *(!qi::eps | qi::omit[+qi::ascii::blank] >> assignment);

      pipe_here = qi::skip(qi::ascii::blank)[VLINE] >> *pipeline;

      quoted_here_end = '\'' >> qi::as_string[(+(qi::char_ - '\''))][qi::labels::_val = qi::labels::_1] >> '\'';
      unquoted_here_end %= qi::as_string[(+(qi::char_ - qi::ascii::space))];
      here_end %= quoted_here_end | unquoted_here_end;

      io_here %= qi::skip(qi::ascii::blank)[DLESS] >> qi::matches[MINUS] >>
        here_end[qi::labels::_a = qi::labels::_1] >> -pipe_here >> nl >>
        (*(qi::as_string[+(qi::char_ - (qi::eol >> qi::string(qi::labels::_a) >> qi::eol))]) >>
         qi::omit[qi::eol >> qi::string(qi::labels::_a)] |
         qi::omit[qi::string(qi::labels::_a)]);

      name = qi::omit[+qi::ascii::blank] >> variable_name;
      word = qi::omit[+qi::ascii::blank] >> argument;
      wordlist = qi::omit[+qi::ascii::blank] >> arguments;
      filename = qi::omit[*qi::ascii::blank] >> argument;

      separator_op = qi::skip(qi::ascii::blank)[AMP | !DSEMI >> SEMI];
      separator = (separator_op >> *nl | nl_list);
      sequential_sep = qi::skip(qi::ascii::blank)[/*!DSEMI >> */SEMI] >> *nl | nl_list;

      pipeline = -Bang >> command %
        (qi::skip(qi::ascii::blank)[VLINE] >> *nl);
      and_or = pipeline % (qi::skip(qi::ascii::blank)[AND_IF | OR_IF] >> *nl);
      list = and_or % separator_op;
      term = and_or % separator;
      compound_list = *nl >> term >> -separator;

      io_redirect = -IO_NUMBER >> (io_file | io_here);
      io_file = qi::skip(qi::ascii::blank)[LESS | LESSAND | GREAT | GREATAND | DGREAT | LESSGREAT | CLOBBER] >> filename;

      do_group = qi::skip(qi::ascii::space)[Do] >> compound_list >> qi::skip(qi::ascii::space)[Done];
      for_clause = qi::skip(qi::ascii::blank)[For] >> name >>
        (SEMI >> *nl | *nl >> -(qi::skip(qi::ascii::blank)[In] >> -wordlist >> -sequential_sep)) >> do_group;
      pattern = argument >> *(qi::omit[VLINE] >> argument);

      case_item = !qi::eps | -qi::skip(qi::ascii::blank)['('] >> pattern >> qi::lit(')') >>
        (compound_list | *nl) >> qi::skip(qi::ascii::space)[DSEMI] >> *nl;
      case_item_ns = !qi::eps | -qi::skip(qi::ascii::blank)['('] >> pattern >> qi::lit(')') >>
        -compound_list >> *nl;

      case_clause = qi::skip(qi::ascii::blank)[Case] >> word >> qi::omit[+qi::ascii::blank] >> *nl >>
        qi::skip(qi::ascii::space)[In] >> *nl >>
        *case_item >> -case_item_ns >>
        qi::skip(qi::ascii::space)[Esac];

      if_clause = qi::skip(qi::ascii::blank)[If] >> compound_list >>
        qi::skip(qi::ascii::blank)[Then] >> compound_list >>
        *(
          qi::skip(qi::ascii::blank)[Elif] >> compound_list >>
          qi::skip(qi::ascii::blank)[Then] >> compound_list
          ) >>
        -(
          qi::skip(qi::ascii::blank)[Else] >> compound_list
          ) >>
        qi::skip(qi::ascii::space)[Fi];

      compound_command = 
        qi::skip(qi::ascii::space)[Lbrace] >> compound_list >> qi::skip(qi::ascii::space)[Rbrace] |
        qi::skip(qi::ascii::space)[Lparen] >> compound_list >> qi::skip(qi::ascii::space)[Rparen] |
        for_clause |
        case_clause |
        if_clause |
        qi::skip(qi::ascii::blank)[While] >> compound_list >> do_group |
        qi::skip(qi::ascii::blank)[Until] >> compound_list >> do_group;
      function_definition =
        variable_name >> *qi::ascii::blank >> Parens >> *nl >> compound_command;
      command = qi::omit[*qi::ascii::blank] >>
        (
         function_definition |
         (assignments >> command_arguments | assignments | command_arguments) |
         compound_command
         ) >> *io_redirect;

      nl = qi::omit[*comment] >> qi::eol;
      nl_list = +(qi::omit[*comment] >> qi::eol);
      comment = qi::as_string[qi::skip(qi::ascii::blank)['#'] >> *(~qi::char_("\r\n"))];

      BOOST_SPIRIT_DEBUG_NODE(top);
    }
  };
}
int main()
{
  client::sh_parser<std::string::const_iterator> sh_parser;
  std::string str;
  std::cin.unsetf(std::ios::skipws);
  std::copy(std::istream_iterator<char>(std::cin), std::istream_iterator<char>(), std::back_inserter(str));
  {
    std::string::const_iterator it = str.begin(), end = str.end();
    spirit::utree ut;
    bool r = phrase_parse(it, end, sh_parser, qi::ascii::space/*, qi::skip_flag::dont_postskip*/, ut);
    std::cout << str << std::endl;
    if (r && it == end) {
      std::cout << "succeeded:\t" << ut << std::endl;
    }
    else {
      std::cout << "failed:\t" << std::string(it, end) << std::endl;
    }
  }
  return 0;
}

Boost Spirit V2+Universal Tree における bash スクリプトパーサの例

by taiji at 2013/05/13 20:00

先の Boost Spirit V2+Universal Tree における計算機の例 では一行毎の構文解析であったが、複数行に渡る構文解析の例として、シェルスクリプト構文解析を段階的に実装してみよう。ここでは bash の BNF を最終的に目指した例を示していく。

Spirit V2+Universal Tree の基本形

複数行に対応した Spirit V2+Universal Tree の基本形は以下のようになる。

#include <iostream>
#include <string>
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/support_utree.hpp>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;
namespace spirit = boost::spirit;

namespace client {
  template <typename Iterator>
  struct a_parser : qi::grammar<Iterator, qi::ascii::space_type, spirit::utree()> {
    qi::rule<Iterator, qi::ascii::space_type, spirit::utree()> top;
    a_parser() : a_parser::base_type(top)
    {
      top = *~qi::lit('\t'); // ここのパーサ規則を書く

      BOOST_SPIRIT_DEBUG_NODE(top);
    }
  };
}
int main()
{
  client::a_parser<std::string::const_iterator> a_parser;
  std::string str;
  std::cin.unsetf(std::ios::skipws);
  std::copy(std::istream_iterator<char>(std::cin), std::istream_iterator<char>(), std::back_inserter(str));
  {
    std::string::const_iterator it = str.begin(), end = str.end();
    spirit::utree ut;
    bool r = phrase_parse(it, end, a_parser, qi::ascii::space/*, qi::skip_flag::dont_postskip*/, ut);
    if (r && it == end) {
      std::cout << "succeeded:\t" << ut << std::endl;
    }
    else {
      std::cout << "failed:\t" << std::string(it, end) << std::endl;
      return 1;
    }
  }
  return 0;
}

これを雛形としてシェルスクリプトの構文解析を肉付けしていこう。以降、Unified diff 形式でコードの変遷を表現するものとする。

コメント処理、コマンド行、変数代入、シングルクォート

例えば、以下のようなスクリプトを受理できるようなパーサ規則を書く。

#!/bin/sh
i=0
message='hello world'!
echo $i $message
LC_ALL=C time
--- a_multiline_parser+utree01.cc       2013-05-13 20:16:28.000000000 +0900
+++ xsh_ba00.cc 2013-05-13 20:16:28.000000000 +0900
@@ -9,11 +9,53 @@
 
 namespace client {
   template <typename Iterator>
-  struct a_parser : qi::grammar<Iterator, qi::ascii::space_type, spirit::utree()> {
+  struct sh_parser : qi::grammar<Iterator, qi::ascii::space_type, spirit::utree()> {
     qi::rule<Iterator, qi::ascii::space_type, spirit::utree()> top;
-    a_parser() : a_parser::base_type(top)
+    qi::rule<Iterator, qi::ascii::space_type, spirit::utree()> NEWLINE,
+      INDENT,
+      DELIMITER,
+      variable_name,
+      single_quoted_string,
+      unquoted_string,
+      command_name,
+      command_arguments,
+      argument,
+      arguments,
+      ASSIGNMENT_WORD,
+      assignments,
+      simple_command,
+      command,
+      compound_list,
+      comment;
+    sh_parser() : sh_parser::base_type(top)
     {
-      top = *~qi::lit('\t');
+      top = *(compound_list || NEWLINE);
+
+      NEWLINE = qi::omit[-comment] >> qi::no_skip[qi::eol];
+      INDENT = qi::no_skip[*qi::ascii::blank];
+      DELIMITER = qi::no_skip[+qi::ascii::blank];
+
+      variable_name = qi::as_string[qi::no_skip[(qi::ascii::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'))]];
+      single_quoted_string = qi::no_skip[qi::as_string[qi::char_("'") >> *~qi::char_("'") >> qi::char_("'")]];
+      unquoted_string = qi::no_skip[qi::as_string[+~qi::char_("\t\n \"#&'();<=>\\`|")]];
+
+      command_name = +(single_quoted_string | unquoted_string);
+      command_arguments = command_name >> *(qi::omit[DELIMITER] >> argument);
+
+      argument = +(single_quoted_string | unquoted_string);
+      arguments = argument >> *(qi::omit[DELIMITER] >> argument);
+
+      ASSIGNMENT_WORD = variable_name >> qi::no_skip['='] >> -argument;
+      assignments = ASSIGNMENT_WORD >> *(!qi::eps | qi::omit[DELIMITER] >> ASSIGNMENT_WORD);
+
+      simple_command = assignments >> command_arguments | assignments | command_arguments;
+      command = qi::omit[INDENT] >> simple_command;
+
+      compound_list
+        = *NEWLINE >> command % +NEWLINE >> -NEWLINE
+        ;
+
+      comment = qi::no_skip[*qi::ascii::blank >> qi::as_string['#' >> *(~qi::char_("\r\n"))]];
 
       BOOST_SPIRIT_DEBUG_NODE(top);
     }
@@ -21,20 +63,20 @@
 }
 int main()
 {
-  client::a_parser<std::string::const_iterator> a_parser;
+  client::sh_parser<std::string::const_iterator> sh_parser;
   std::string str;
   std::cin.unsetf(std::ios::skipws);
   std::copy(std::istream_iterator<char>(std::cin), std::istream_iterator<char>(), std::back_inserter(str));
   {
     std::string::const_iterator it = str.begin(), end = str.end();
     spirit::utree ut;
-    bool r = phrase_parse(it, end, a_parser, qi::ascii::space/*, qi::skip_flag::dont_postskip*/, ut);
+    bool r = phrase_parse(it, end, sh_parser, qi::ascii::space/*, qi::skip_flag::dont_postskip*/, ut);
+    std::cout << str << std::endl;
     if (r && it == end) {
       std::cout << "succeeded:\t" << ut << std::endl;
     }
     else {
       std::cout << "failed:\t" << std::string(it, end) << std::endl;
-      return 1;
     }
   }
   return 0;

これでコメント処理、コマンド行、変数代入、シングルクォートされた任意の文字列を含む文字列や、空白区切りによる文字列のリストのパースが可能となっている。

ダブルクォート

次に以下のような、ダブルクォートされた文字列を受理できるようなパーサ規則を書く。

message="hello \"world\"!"
echo "$message"
--- xsh_ba00.cc 2013-05-13 20:16:28.000000000 +0900
+++ xsh_ba01.cc 2013-05-13 20:16:28.000000000 +0900
@@ -16,6 +16,7 @@
       DELIMITER,
       variable_name,
       single_quoted_string,
+      double_quoted_string,
       unquoted_string,
       command_name,
       command_arguments,
@@ -37,12 +38,16 @@
 
       variable_name = qi::as_string[qi::no_skip[(qi::ascii::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'))]];
       single_quoted_string = qi::no_skip[qi::as_string[qi::char_("'") >> *~qi::char_("'") >> qi::char_("'")]];
+      double_quoted_string = qi::no_skip[qi::as_string[qi::char_('"') >> *(
+                                                                           (qi::lit('\\') >> qi::char_) | (!qi::lit('\\') >> ~qi::lit('"'))
+                                                                           ) >>
+                                                       qi::char_('"')]];
       unquoted_string = qi::no_skip[qi::as_string[+~qi::char_("\t\n \"#&'();<=>\\`|")]];
 
-      command_name = +(single_quoted_string | unquoted_string);
+      command_name = +(single_quoted_string | double_quoted_string | unquoted_string);
       command_arguments = command_name >> *(qi::omit[DELIMITER] >> argument);
 
-      argument = +(single_quoted_string | unquoted_string);
+      argument = +(single_quoted_string | double_quoted_string | unquoted_string);
       arguments = argument >> *(qi::omit[DELIMITER] >> argument);
 
       ASSIGNMENT_WORD = variable_name >> qi::no_skip['='] >> -argument;

これでバックスラッシュでエスケープされていないダブルクォーテーションでのみ閉じられたダブルクォート文字列のパースが可能となっている。

ヒアドキュメント

次に、以下のようなヒアドキュメントの入力を受理するようなパーサ規則を書くが、まずは終端文字列を「EOH」に固定したものから試みる。

cat <<EOH
usage:
        $0 < filename
EOH

ちなみに、ヒアドキュメントでタブを除去する「<<-」のときには真、さもなくば偽を付加するものとする。

--- xsh_ba01.cc 2013-05-13 20:16:28.000000000 +0900
+++ xsh_ba02.cc 2013-05-13 20:16:28.000000000 +0900
@@ -24,12 +24,16 @@
       arguments,
       ASSIGNMENT_WORD,
       assignments,
+      io_here,
       simple_command,
       command,
       compound_list,
       comment;
     sh_parser() : sh_parser::base_type(top)
     {
+#define DLESS qi::as_string["<<"]
+#define MINUS qi::as_string['-']
+
       top = *(compound_list || NEWLINE);
 
       NEWLINE = qi::omit[-comment] >> qi::no_skip[qi::eol];
@@ -53,7 +57,13 @@
       ASSIGNMENT_WORD = variable_name >> qi::no_skip['='] >> -argument;
       assignments = ASSIGNMENT_WORD >> *(!qi::eps | qi::omit[DELIMITER] >> ASSIGNMENT_WORD);
 
-      simple_command = assignments >> command_arguments | assignments | command_arguments;
+      io_here %= DLESS >> qi::matches[qi::no_skip[MINUS]] >>
+        qi::no_skip[qi::lit("EOH")] >> qi::omit[-comment] >> qi::no_skip[qi::eol] >>
+        ((qi::as_string[*qi::no_skip[+(qi::char_ - (qi::eol >> qi::lit("EOH") >> qi::eol))]]) >>
+         qi::omit[qi::no_skip[qi::eol] >> qi::lit("EOH")] |
+         qi::omit[qi::lit("EOH")]);
+
+      simple_command = (assignments >> command_arguments | assignments | command_arguments) >> *io_here;
       command = qi::omit[INDENT] >> simple_command;
 
       compound_list
@@ -63,6 +73,9 @@
       comment = qi::no_skip[*qi::ascii::blank >> qi::as_string['#' >> *(~qi::char_("\r\n"))]];
 
       BOOST_SPIRIT_DEBUG_NODE(top);
+
+#undef DLESS
+#undef MINUS
     }
   };
 }

このままでは「EOH」以外の終端文字列には対応できていない。よって、io_here にローカル変数 qi::locals<std::string> を使用可能にして、それを参照するようにする。

--- xsh_ba02.cc 2013-05-13 20:16:28.000000000 +0900
+++ xsh_ba03.cc 2013-05-13 20:16:28.000000000 +0900
@@ -3,14 +3,16 @@
 //#define BOOST_SPIRIT_DEBUG
 #include <boost/spirit/include/support_utree.hpp>
 #include <boost/spirit/include/qi.hpp>
+#include <boost/spirit/include/phoenix.hpp>
 
 namespace qi = boost::spirit::qi;
 namespace spirit = boost::spirit;
 
 namespace client {
   template <typename Iterator>
-  struct sh_parser : qi::grammar<Iterator, qi::ascii::space_type, spirit::utree()> {
-    qi::rule<Iterator, qi::ascii::space_type, spirit::utree()> top;
+  struct sh_parser : qi::grammar<Iterator, qi::ascii::space_type, spirit::utree(), qi::locals<std::string> > {
+    qi::rule<Iterator, qi::ascii::space_type, spirit::utree(), qi::locals<std::string> > top,
+      io_here;
     qi::rule<Iterator, qi::ascii::space_type, spirit::utree()> NEWLINE,
       INDENT,
       DELIMITER,
@@ -24,7 +26,6 @@
       arguments,
       ASSIGNMENT_WORD,
       assignments,
-      io_here,
       simple_command,
       command,
       compound_list,
@@ -58,10 +59,10 @@
       assignments = ASSIGNMENT_WORD >> *(!qi::eps | qi::omit[DELIMITER] >> ASSIGNMENT_WORD);
 
       io_here %= DLESS >> qi::matches[qi::no_skip[MINUS]] >>
-        qi::no_skip[qi::lit("EOH")] >> qi::omit[-comment] >> qi::no_skip[qi::eol] >>
-        ((qi::as_string[*qi::no_skip[+(qi::char_ - (qi::eol >> qi::lit("EOH") >> qi::eol))]]) >>
-         qi::omit[qi::no_skip[qi::eol] >> qi::lit("EOH")] |
-         qi::omit[qi::lit("EOH")]);
+        qi::as_string[qi::no_skip[+(qi::char_ - qi::ascii::space)]][qi::labels::_a = qi::labels::_1] >> qi::omit[-comment] >> qi::no_skip[qi::eol] >>
+        ((qi::as_string[*qi::no_skip[+(qi::char_ - (qi::eol >> qi::string(qi::labels::_a) >> qi::eol))]]) >>
+         qi::omit[qi::no_skip[qi::eol] >> qi::string(qi::labels::_a)] |
+         qi::omit[qi::string(qi::labels::_a)]);
 
       simple_command = (assignments >> command_arguments | assignments | command_arguments) >> *io_here;
       command = qi::omit[INDENT] >> simple_command;

これで任意の終端文字列でヒアドキュメントが受理できるようになる。さらに、以下のような変数展開を抑止するヒアドキュメントにも対応しよう。

cat <<'EOF'
$1 $2 ...
EOF
--- xsh_ba03.cc 2013-05-13 20:16:28.000000000 +0900
+++ xsh_ba04.cc 2013-05-13 20:16:28.000000000 +0900
@@ -30,6 +30,7 @@
       command,
       compound_list,
       comment;
+    qi::rule<Iterator, qi::ascii::space_type, std::string()> quoted_here_end, unquoted_here_end, here_end;
     sh_parser() : sh_parser::base_type(top)
     {
 #define DLESS qi::as_string["<<"]
@@ -58,8 +59,12 @@
       ASSIGNMENT_WORD = variable_name >> qi::no_skip['='] >> -argument;
       assignments = ASSIGNMENT_WORD >> *(!qi::eps | qi::omit[DELIMITER] >> ASSIGNMENT_WORD);
 
+      quoted_here_end = '\'' >> qi::as_string[qi::no_skip[+(qi::char_ - '\'')]][qi::labels::_val = qi::labels::_1] >> '\'';
+      unquoted_here_end %= qi::as_string[qi::no_skip[+(qi::char_ - qi::ascii::space)]];
+      here_end %= qi::as_string[quoted_here_end | unquoted_here_end];
+
       io_here %= DLESS >> qi::matches[qi::no_skip[MINUS]] >>
-        qi::as_string[qi::no_skip[+(qi::char_ - qi::ascii::space)]][qi::labels::_a = qi::labels::_1] >> qi::omit[-comment] >> qi::no_skip[qi::eol] >>
+        here_end[qi::labels::_a = qi::labels::_1] >> qi::omit[-comment] >> qi::no_skip[qi::eol] >>
         ((qi::as_string[*qi::no_skip[+(qi::char_ - (qi::eol >> qi::string(qi::labels::_a) >> qi::eol))]]) >>
          qi::omit[qi::no_skip[qi::eol] >> qi::string(qi::labels::_a)] |
          qi::omit[qi::string(qi::labels::_a)]);

ちなみに規則で「=」の代わりに「%=」とすると「_val = _1」を省略できる。

リダイレクション

次に、以下のようなリダイレクションを受理するパーサ規則を書く。

sort < filename
printf '' > filename
--- xsh_ba04.cc 2013-05-13 20:16:28.000000000 +0900
+++ xsh_ba05.cc 2013-05-13 20:16:28.000000000 +0900
@@ -26,6 +26,9 @@
       arguments,
       ASSIGNMENT_WORD,
       assignments,
+      filename,
+      redirection,
+      redirection_list,
       simple_command,
       command,
       compound_list,
@@ -33,7 +36,25 @@
     qi::rule<Iterator, qi::ascii::space_type, std::string()> quoted_here_end, unquoted_here_end, here_end;
     sh_parser() : sh_parser::base_type(top)
     {
+#define IO_NUMBER qi::as_string[+qi::digit]
+
+#define AMP qi::as_string['&']
+#define VLINE qi::as_string['|']
+#define SEMI qi::as_string[';']
+
+#define AND_IF qi::as_string["&&"]
+#define OR_IF qi::as_string["||"]
+#define DSEMI qi::as_string[";;"]
+
+#define LESS qi::as_string['<']
+#define GREAT qi::as_string['>']
 #define DLESS qi::as_string["<<"]
+#define DGREAT qi::as_string[">>"]
+#define LESSAND qi::as_string["<&"]
+#define GREATAND qi::as_string[">&"]
+#define LESSGREAT qi::as_string["<>"]
+#define CLOBBER qi::as_string[">|"]
+#define ANDGREAT qi::as_string["&>"]
 #define MINUS qi::as_string['-']
 
       top = *(compound_list || NEWLINE);
@@ -69,7 +90,14 @@
          qi::omit[qi::no_skip[qi::eol] >> qi::string(qi::labels::_a)] |
          qi::omit[qi::string(qi::labels::_a)]);
 
-      simple_command = (assignments >> command_arguments | assignments | command_arguments) >> *io_here;
+      filename = qi::omit[*qi::ascii::blank] >> argument;
+
+      redirection =
+        (LESS | GREAT | DLESS | DGREAT | LESSAND | GREATAND | LESSGREAT | CLOBBER) >> filename |
+        io_here |
+        ANDGREAT >> filename;
+      redirection_list = +redirection;
+      simple_command = (assignments >> command_arguments | assignments | command_arguments) >> -redirection_list;
       command = qi::omit[INDENT] >> simple_command;
 
       compound_list
@@ -80,7 +108,25 @@
 
       BOOST_SPIRIT_DEBUG_NODE(top);
 
+#undef IO_NUMBER
+
+#undef AMP
+#undef VLINE
+#undef SEMI
+
+#undef AND_IF
+#undef OR_IF
+#undef DSEMI
+
+#undef LESS
+#undef GREAT
 #undef DLESS
+#undef DGREAT
+#undef LESSAND
+#undef GREATAND
+#undef LESSGREAT
+#undef CLOBBER
+#undef ANDGREAT
 #undef MINUS
     }
   };

これでリダイレクションの構文を受理できるようになる。

パイプライン

次に、以下のようなパイプラインを受理できるパーサ規則を書く。

echo | cat | sort | uniq
--- xsh_ba05.cc 2013-05-13 20:16:29.000000000 +0900
+++ xsh_ba06.cc 2013-05-13 20:16:29.000000000 +0900
@@ -32,6 +32,7 @@
       simple_command,
       command,
       compound_list,
+      pipeline,
       comment;
     qi::rule<Iterator, qi::ascii::space_type, std::string()> quoted_here_end, unquoted_here_end, here_end;
     sh_parser() : sh_parser::base_type(top)
@@ -101,8 +102,9 @@
       command = qi::omit[INDENT] >> simple_command;
 
       compound_list
-        = *NEWLINE >> command % +NEWLINE >> -NEWLINE
+        = *NEWLINE >> pipeline % +NEWLINE >> -NEWLINE
         ;
+      pipeline = command % (VLINE >> *NEWLINE);
 
       comment = qi::no_skip[*qi::ascii::blank >> qi::as_string['#' >> *(~qi::char_("\r\n"))]];
 

しかしこれでは、以下のようなヒアドキュメント直後のパイプラインには未対応である。

cat <<EOF | sort
        :
EOF

よって、以下のように pipe_here 規則を io_here 規則に挿入する。

--- xsh_ba06.cc 2013-05-13 20:16:29.000000000 +0900
+++ xsh_ba07.cc 2013-05-13 20:16:29.000000000 +0900
@@ -26,6 +26,7 @@
       arguments,
       ASSIGNMENT_WORD,
       assignments,
+      pipe_here,
       filename,
       redirection,
       redirection_list,
@@ -81,12 +82,14 @@
       ASSIGNMENT_WORD = variable_name >> qi::no_skip['='] >> -argument;
       assignments = ASSIGNMENT_WORD >> *(!qi::eps | qi::omit[DELIMITER] >> ASSIGNMENT_WORD);
 
+      pipe_here = VLINE >> *pipeline;
+
       quoted_here_end = '\'' >> qi::as_string[qi::no_skip[+(qi::char_ - '\'')]][qi::labels::_val = qi::labels::_1] >> '\'';
       unquoted_here_end %= qi::as_string[qi::no_skip[+(qi::char_ - qi::ascii::space)]];
       here_end %= qi::as_string[quoted_here_end | unquoted_here_end];
 
       io_here %= DLESS >> qi::matches[qi::no_skip[MINUS]] >>
-        here_end[qi::labels::_a = qi::labels::_1] >> qi::omit[-comment] >> qi::no_skip[qi::eol] >>
+        here_end[qi::labels::_a = qi::labels::_1] >> -pipe_here >> qi::omit[-comment] >> qi::no_skip[qi::eol] >>
         ((qi::as_string[*qi::no_skip[+(qi::char_ - (qi::eol >> qi::string(qi::labels::_a) >> qi::eol))]]) >>
          qi::omit[qi::no_skip[qi::eol] >> qi::string(qi::labels::_a)] |
          qi::omit[qi::string(qi::labels::_a)]);

これでパイプラインの構文を受理できるようになる。

リスト

次に、以下のようなリストを受理できるようなパーサ規則を書く。

true && echo yes || echo no
--- xsh_ba07.cc 2013-05-13 20:16:29.000000000 +0900
+++ xsh_ba08.cc 2013-05-13 20:16:29.000000000 +0900
@@ -32,7 +32,10 @@
       redirection_list,
       simple_command,
       command,
+      list,
       compound_list,
+      list1,
+      newline_list,
       pipeline,
       comment;
     qi::rule<Iterator, qi::ascii::space_type, std::string()> quoted_here_end, unquoted_here_end, here_end;
@@ -103,11 +106,14 @@
       redirection_list = +redirection;
       simple_command = (assignments >> command_arguments | assignments | command_arguments) >> -redirection_list;
       command = qi::omit[INDENT] >> simple_command;
-
+      list = newline_list >> list1 >> NEWLINE >> newline_list;
       compound_list
-        = *NEWLINE >> pipeline % +NEWLINE >> -NEWLINE
+        = list
+        | newline_list >> list1
         ;
-      pipeline = command % (VLINE >> *NEWLINE);
+      list1 = pipeline % ((AND_IF | OR_IF | NEWLINE) >> newline_list);
+      newline_list = *NEWLINE;
+      pipeline = command % (VLINE >> newline_list);
 
       comment = qi::no_skip[*qi::ascii::blank >> qi::as_string['#' >> *(~qi::char_("\r\n"))]];
 

さらに、次のようなリストを受理できるようなパーサ規則を書く。

echo & i=0; LC_ALL=C date
--- xsh_ba08.cc 2013-05-13 20:16:29.000000000 +0900
+++ xsh_ba09.cc 2013-05-13 20:16:29.000000000 +0900
@@ -106,12 +106,12 @@
       redirection_list = +redirection;
       simple_command = (assignments >> command_arguments | assignments | command_arguments) >> -redirection_list;
       command = qi::omit[INDENT] >> simple_command;
-      list = newline_list >> list1 >> NEWLINE >> newline_list;
+      list = newline_list >> list1 >> (AMP | SEMI | NEWLINE) >> newline_list;
       compound_list
         = list
         | newline_list >> list1
         ;
-      list1 = pipeline % ((AND_IF | OR_IF | NEWLINE) >> newline_list);
+      list1 = pipeline % ((AND_IF | OR_IF | AMP | SEMI | NEWLINE) >> newline_list);
       newline_list = *NEWLINE;
       pipeline = command % (VLINE >> newline_list);
 

これでリスト構文を受理できるようになる。

サブシェル、グループ

次に、以下のようなサブシェルやグループを受理できるようなパーサ規則を書く。

true && (echo yes) || (echo no)
true && {
  echo yes
} || {
  echo no
}

また、ついでにリダイレクションにてファイルデスクリプタ番号に対応させておく。

--- xsh_ba09.cc 2013-05-13 20:16:29.000000000 +0900
+++ xsh_ba10.cc 2013-05-13 20:16:29.000000000 +0900
@@ -27,15 +27,23 @@
       ASSIGNMENT_WORD,
       assignments,
       pipe_here,
+      number,
       filename,
       redirection,
       redirection_list,
       simple_command,
       command,
+      shell_command,
+      subshell,
+      group_command,
       list,
       compound_list,
+      list0,
       list1,
+      list_terminator,
       newline_list,
+      simple_list1,
+      simple_list,
       pipeline,
       comment;
     qi::rule<Iterator, qi::ascii::space_type, std::string()> quoted_here_end, unquoted_here_end, here_end;
@@ -62,7 +70,29 @@
 #define ANDGREAT qi::as_string["&>"]
 #define MINUS qi::as_string['-']
 
-      top = *(compound_list || NEWLINE);
+#define If qi::as_string["if"]
+#define Then qi::as_string["then"]
+#define Elif qi::as_string["elif"]
+#define Else qi::as_string["else"]
+#define Fi qi::as_string["fi"]
+#define For qi::as_string["for"]
+#define While qi::as_string["while"]
+#define Until qi::as_string["until"]
+#define Do qi::as_string["do"]
+#define Done qi::as_string["done"]
+#define Select qi::as_string["select"]
+#define Case qi::as_string["case"]
+#define Esac qi::as_string["esac"]
+#define In qi::as_string["in"]
+#define Function qi::as_string["function"]
+#define Lparen qi::as_string['(']
+#define Rparen qi::as_string[')']
+#define Lbrace qi::as_string['{']
+#define Rbrace qi::as_string['}']
+#define Parens qi::as_string["()"]
+#define Bang qi::as_string['!']
+
+      top = *compound_list;
 
       NEWLINE = qi::omit[-comment] >> qi::no_skip[qi::eol];
       INDENT = qi::no_skip[*qi::ascii::blank];
@@ -97,22 +127,35 @@
          qi::omit[qi::no_skip[qi::eol] >> qi::string(qi::labels::_a)] |
          qi::omit[qi::string(qi::labels::_a)]);
 
+      number = IO_NUMBER;
       filename = qi::omit[*qi::ascii::blank] >> argument;
 
       redirection =
-        (LESS | GREAT | DLESS | DGREAT | LESSAND | GREATAND | LESSGREAT | CLOBBER) >> filename |
-        io_here |
+        -number >> (LESS | GREAT | DLESS | DGREAT | LESSAND | GREATAND | LESSGREAT | CLOBBER) >> filename |
+        -number >> (LESSAND | GREATAND) >> (number | MINUS) |
+        -number >> io_here |
         ANDGREAT >> filename;
       redirection_list = +redirection;
       simple_command = (assignments >> command_arguments | assignments | command_arguments) >> -redirection_list;
-      command = qi::omit[INDENT] >> simple_command;
-      list = newline_list >> list1 >> (AMP | SEMI | NEWLINE) >> newline_list;
+      command = qi::omit[INDENT] >> (shell_command >> -redirection_list | simple_command);
+      shell_command
+        = subshell
+        | group_command
+        ;
+      subshell = Lparen >> compound_list >> Rparen;
+      group_command = Lbrace >> list >> Rbrace;
+      list = newline_list >> list0;
       compound_list
         = list
         | newline_list >> list1
         ;
+      list0 = list1 >> (AMP | SEMI | NEWLINE) >> newline_list;
       list1 = pipeline % ((AND_IF | OR_IF | AMP | SEMI | NEWLINE) >> newline_list);
+      list_terminator = NEWLINE | SEMI;
       newline_list = *NEWLINE;
+      simple_list = simple_list1 >> -(AMP | SEMI);
+      simple_list1
+        = pipeline % ((AND_IF | OR_IF) >> newline_list | (AMP | SEMI));
       pipeline = command % (VLINE >> newline_list);
 
       comment = qi::no_skip[*qi::ascii::blank >> qi::as_string['#' >> *(~qi::char_("\r\n"))]];
@@ -139,6 +182,28 @@
 #undef CLOBBER
 #undef ANDGREAT
 #undef MINUS
+
+#undef If
+#undef Then
+#undef Elif
+#undef Else
+#undef Fi
+#undef For
+#undef While
+#undef Until
+#undef Do
+#undef Done
+#undef Select
+#undef Case
+#undef Esac
+#undef In
+#undef Function
+#undef Lparen
+#undef Rparen
+#undef Lbrace
+#undef Rbrace
+#undef Parens
+#undef Bang
     }
   };
 }

ここまで受理できるようになったので、残りは if などの制御構文のみとなった。

制御構文

if then elif else fi,

for in do done,

while do done,

until do done,

select do done,

case in ) ;; esac,

[function] f()

の制御構文を受理できるパーサ規則を書く。

--- xsh_ba10.cc 2013-05-13 20:16:30.000000000 +0900
+++ xsh_ba11.cc 2013-05-13 20:16:30.000000000 +0900
@@ -16,6 +16,7 @@
     qi::rule<Iterator, qi::ascii::space_type, spirit::utree()> NEWLINE,
       INDENT,
       DELIMITER,
+      RESERVED_WORDS,
       variable_name,
       single_quoted_string,
       double_quoted_string,
@@ -28,14 +29,28 @@
       assignments,
       pipe_here,
       number,
+      name,
+      word,
+      word_list,
       filename,
       redirection,
       redirection_list,
+      simple_command_element,
       simple_command,
       command,
       shell_command,
+      for_command,
+      select_command,
+      case_command,
+      function_def,
       subshell,
+      if_command,
       group_command,
+      elif_clause,
+      case_clause_sequence,
+      case_clause,
+      pattern,
+      pattern_list,
       list,
       compound_list,
       list0,
@@ -44,6 +59,7 @@
       newline_list,
       simple_list1,
       simple_list,
+      pipeline_command,
       pipeline,
       comment;
     qi::rule<Iterator, qi::ascii::space_type, std::string()> quoted_here_end, unquoted_here_end, here_end;
@@ -97,6 +113,7 @@
       NEWLINE = qi::omit[-comment] >> qi::no_skip[qi::eol];
       INDENT = qi::no_skip[*qi::ascii::blank];
       DELIMITER = qi::no_skip[+qi::ascii::blank];
+      RESERVED_WORDS = /*Bang | */Lbrace | Rbrace/* | Lparen | Rparen*/ | Case | Do | Done | Elif | Else | Esac | Fi | For | If | In | Then | Until | While;
 
       variable_name = qi::as_string[qi::no_skip[(qi::ascii::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'))]];
       single_quoted_string = qi::no_skip[qi::as_string[qi::char_("'") >> *~qi::char_("'") >> qi::char_("'")]];
@@ -106,7 +123,7 @@
                                                        qi::char_('"')]];
       unquoted_string = qi::no_skip[qi::as_string[+~qi::char_("\t\n \"#&'();<=>\\`|")]];
 
-      command_name = +(single_quoted_string | double_quoted_string | unquoted_string);
+      command_name = +(single_quoted_string | double_quoted_string | unquoted_string) - RESERVED_WORDS;
       command_arguments = command_name >> *(qi::omit[DELIMITER] >> argument);
 
       argument = +(single_quoted_string | double_quoted_string | unquoted_string);
@@ -128,6 +145,9 @@
          qi::omit[qi::string(qi::labels::_a)]);
 
       number = IO_NUMBER;
+      name = qi::omit[DELIMITER] >> variable_name;
+      word = qi::omit[DELIMITER] >> argument;
+      word_list = qi::omit[DELIMITER] >> arguments;
       filename = qi::omit[*qi::ascii::blank] >> argument;
 
       redirection =
@@ -135,27 +155,59 @@
         -number >> (LESSAND | GREATAND) >> (number | MINUS) |
         -number >> io_here |
         ANDGREAT >> filename;
+      simple_command_element
+        = qi::omit[DELIMITER] >> ASSIGNMENT_WORD
+        | redirection
+        | qi::omit[DELIMITER] >> argument
+        ;
       redirection_list = +redirection;
-      simple_command = (assignments >> command_arguments | assignments | command_arguments) >> -redirection_list;
+      simple_command = (ASSIGNMENT_WORD | redirection | command_name) >> *simple_command_element;
       command = qi::omit[INDENT] >> (shell_command >> -redirection_list | simple_command);
       shell_command
-        = subshell
+        = for_command
+        | case_command
+        | (While | Until) >> compound_list >> Do >> compound_list >> Done
+        | select_command
+        | if_command
+        | subshell
         | group_command
+        | function_def
         ;
+      for_command
+        = For >> name >> (SEMI          >> newline_list
+                          |                newline_list >> -(In >> -word_list >> list_terminator >> newline_list)
+                          ) >> (Do | Lbrace) >> compound_list >> (Done | Rbrace);
+      select_command
+        = Select >> name >> (SEMI       >> newline_list
+                             |             newline_list >> -(In >> -word_list >> list_terminator >> newline_list)
+                             ) >> (Do | Lbrace) >> list >> (Done | Rbrace);
+      case_command
+        = Case >> word >> newline_list >> In >> (case_clause | -case_clause_sequence >> newline_list)   >> Esac;
+      function_def = (-(Function >> qi::omit[DELIMITER]) >> variable_name >> Parens | Function >> name) >>
+        newline_list >> (group_command | subshell);
       subshell = Lparen >> compound_list >> Rparen;
+      if_command
+        = If >> compound_list >> Then >> compound_list >> -(Else >> compound_list)      >> Fi
+        | If >> compound_list >> Then >> compound_list >> elif_clause                   >> Fi;
       group_command = Lbrace >> list >> Rbrace;
+      elif_clause = *(Elif >> compound_list >> Then >> compound_list) >> -(Else >> compound_list);
+      case_clause = -case_clause_sequence >> pattern_list;
+      pattern_list = newline_list >> qi::omit[INDENT] >> qi::omit[-Lparen] >> pattern >> qi::omit[Rparen] >> (compound_list | newline_list);
+      case_clause_sequence = +(!qi::eps | pattern_list >> DSEMI);
+      pattern = argument % VLINE;
       list = newline_list >> list0;
       compound_list
         = list
         | newline_list >> list1
         ;
-      list0 = list1 >> (AMP | SEMI | NEWLINE) >> newline_list;
-      list1 = pipeline % ((AND_IF | OR_IF | AMP | SEMI | NEWLINE) >> newline_list);
+      list0 = list1 >> (AMP | (!DSEMI >> SEMI) | NEWLINE) >> newline_list;
+      list1 = pipeline_command % ((AND_IF | OR_IF | AMP | SEMI | NEWLINE) >> newline_list);
       list_terminator = NEWLINE | SEMI;
       newline_list = *NEWLINE;
       simple_list = simple_list1 >> -(AMP | SEMI);
       simple_list1
-        = pipeline % ((AND_IF | OR_IF) >> newline_list | (AMP | SEMI));
+        = pipeline_command % ((AND_IF | OR_IF) >> newline_list | (AMP | SEMI));
+      pipeline_command = -Bang >> pipeline;
       pipeline = command % (VLINE >> newline_list);
 
       comment = qi::no_skip[*qi::ascii::blank >> qi::as_string['#' >> *(~qi::char_("\r\n"))]];

これで bash シェルスクリプト構文を受理できるようになった。

そしてこれは Learning the bash Shell, Second Edition, BNF for bash 記載の BNF を Boost Spirit Qi で書き直したものになっている。最後に、全体像としてオリジナルの BNF も添えたコードを載せておく。ちなみに、!qi::eps というのが稀に必要となっている箇所があるが、これがないと属性が重複してしまう現象を防いでいる。

#include <iostream>
#include <string>
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/support_utree.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace qi = boost::spirit::qi;
namespace spirit = boost::spirit;

namespace client {
  template <typename Iterator>
  struct sh_parser : qi::grammar<Iterator, qi::ascii::space_type, spirit::utree(), qi::locals<std::string> > {
    qi::rule<Iterator, qi::ascii::space_type, spirit::utree(), qi::locals<std::string> > top,
      io_here;
    qi::rule<Iterator, qi::ascii::space_type, spirit::utree()> NEWLINE,
      INDENT,
      DELIMITER,
      RESERVED_WORDS,
      variable_name,
      single_quoted_string,
      double_quoted_string,
      unquoted_string,
      command_name,
      command_arguments,
      argument,
      arguments,
      ASSIGNMENT_WORD,
      assignments,
      pipe_here,
      number,
      name,
      word,
      word_list,
      filename,
      redirection,
      redirection_list,
      simple_command_element,
      simple_command,
      command,
      shell_command,
      for_command,
      select_command,
      case_command,
      function_def,
      subshell,
      if_command,
      group_command,
      elif_clause,
      case_clause_sequence,
      case_clause,
      pattern,
      pattern_list,
      list,
      compound_list,
      list0,
      list1,
      list_terminator,
      newline_list,
      simple_list1,
      simple_list,
      pipeline_command,
      pipeline,
      comment;
    qi::rule<Iterator, qi::ascii::space_type, std::string()> quoted_here_end, unquoted_here_end, here_end;
    sh_parser() : sh_parser::base_type(top)
    {
#define IO_NUMBER qi::as_string[+qi::digit]

#define AMP qi::as_string['&']
#define VLINE qi::as_string['|']
#define SEMI qi::as_string[';']

#define AND_IF qi::as_string["&&"]
#define OR_IF qi::as_string["||"]
#define DSEMI qi::as_string[";;"]

#define LESS qi::as_string['<']
#define GREAT qi::as_string['>']
#define DLESS qi::as_string["<<"]
#define DGREAT qi::as_string[">>"]
#define LESSAND qi::as_string["<&"]
#define GREATAND qi::as_string[">&"]
#define LESSGREAT qi::as_string["<>"]
#define CLOBBER qi::as_string[">|"]
#define ANDGREAT qi::as_string["&>"]
#define MINUS qi::as_string['-']

#define If qi::as_string["if"]
#define Then qi::as_string["then"]
#define Elif qi::as_string["elif"]
#define Else qi::as_string["else"]
#define Fi qi::as_string["fi"]
#define For qi::as_string["for"]
#define While qi::as_string["while"]
#define Until qi::as_string["until"]
#define Do qi::as_string["do"]
#define Done qi::as_string["done"]
#define Select qi::as_string["select"]
#define Case qi::as_string["case"]
#define Esac qi::as_string["esac"]
#define In qi::as_string["in"]
#define Function qi::as_string["function"]
#define Lparen qi::as_string['(']
#define Rparen qi::as_string[')']
#define Lbrace qi::as_string['{']
#define Rbrace qi::as_string['}']
#define Parens qi::as_string["()"]
#define Bang qi::as_string['!']

      top = *compound_list;

      NEWLINE = qi::omit[-comment] >> qi::no_skip[qi::eol];
      INDENT = qi::no_skip[*qi::ascii::blank];
      DELIMITER = qi::no_skip[+qi::ascii::blank];
      RESERVED_WORDS = /*Bang | */Lbrace | Rbrace/* | Lparen | Rparen*/ | Case | Do | Done | Elif | Else | Esac | Fi | For | If | In | Then | Until | While;

      variable_name = qi::as_string[qi::no_skip[(qi::ascii::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'))]];
      single_quoted_string = qi::no_skip[qi::as_string[qi::char_("'") >> *~qi::char_("'") >> qi::char_("'")]];
      double_quoted_string = qi::no_skip[qi::as_string[qi::char_('"') >> *(
                                                                           (qi::lit('\\') >> qi::char_) | (!qi::lit('\\') >> ~qi::lit('"'))
                                                                           ) >>
                                                       qi::char_('"')]];
      unquoted_string = qi::no_skip[qi::as_string[+~qi::char_("\t\n \"#&'();<=>\\`|")]];

      command_name = +(single_quoted_string | double_quoted_string | unquoted_string) - RESERVED_WORDS;
      command_arguments = command_name >> *(qi::omit[DELIMITER] >> argument);

      argument = +(single_quoted_string | double_quoted_string | unquoted_string);
      arguments = argument >> *(qi::omit[DELIMITER] >> argument);

      ASSIGNMENT_WORD = variable_name >> qi::no_skip['='] >> -argument;
      assignments = ASSIGNMENT_WORD >> *(!qi::eps | qi::omit[DELIMITER] >> ASSIGNMENT_WORD);

      pipe_here = VLINE >> *pipeline;

      quoted_here_end = '\'' >> qi::as_string[qi::no_skip[+(qi::char_ - '\'')]][qi::labels::_val = qi::labels::_1] >> '\'';
      unquoted_here_end %= qi::as_string[qi::no_skip[+(qi::char_ - qi::ascii::space)]];
      here_end %= qi::as_string[quoted_here_end | unquoted_here_end];

      io_here %= DLESS >> qi::matches[qi::no_skip[MINUS]] >>
        here_end[qi::labels::_a = qi::labels::_1] >> -pipe_here >> qi::omit[-comment] >> qi::no_skip[qi::eol] >>
        ((qi::as_string[*qi::no_skip[+(qi::char_ - (qi::eol >> qi::string(qi::labels::_a) >> qi::eol))]]) >>
         qi::omit[qi::no_skip[qi::eol] >> qi::string(qi::labels::_a)] |
         qi::omit[qi::string(qi::labels::_a)]);

      number = IO_NUMBER;
      name = qi::omit[DELIMITER] >> variable_name;
      word = qi::omit[DELIMITER] >> argument;
      word_list = qi::omit[DELIMITER] >> arguments;
      filename = qi::omit[*qi::ascii::blank] >> argument;

#if 0 /* !use EBNF-like qi */
      redirection
        = LESS >> filename
        | number >> LESS >> filename
        | GREAT >> filename
        | number >> GREAT >> filename
        | DGREAT >> filename
        | number >> DGREAT >> filename
        | LESSAND >> number
        | number >> LESSAND >> number
        | GREATAND >> number
        | number >> GREATAND >> number
        | LESSAND >> filename
        | number >> LESSAND >> filename
        | GREATAND >> filename
        | number >> GREATAND >> filename
        | LESSAND >> MINUS
        | number >> LESSAND >> MINUS
        | GREATAND >> MINUS
        | number >> GREATAND >> MINUS
        | LESSGREAT >> filename
        | number >> LESSGREAT >> filename
        | CLOBBER >> filename
        | number >> CLOBBER >> filename
        | io_here
        | number >> io_here
        | ANDGREAT >> filename
        ;
#else
      redirection =
        -number >> (LESS | GREAT | DLESS | DGREAT | LESSAND | GREATAND | LESSGREAT | CLOBBER) >> filename |
        -number >> (LESSAND | GREATAND) >> (number | MINUS) |
        -number >> io_here |
        ANDGREAT >> filename;
#endif
#if 0 /* use recursive qi */
      simple_command_element
        = command_name
        | ASSIGNMENT_WORD
        | redirection
        ;
      redirection_list
        = redirection
        | redirection_list >> redirection
        ;
      simple_command
        = simple_command_element
        | simple_command >> simple_command_element
        ;
#else
      simple_command_element
        = qi::omit[DELIMITER] >> ASSIGNMENT_WORD
        | redirection
        | qi::omit[DELIMITER] >> argument
        ;
      redirection_list = +redirection;
      simple_command = (ASSIGNMENT_WORD | redirection | command_name) >> *simple_command_element;
#endif
#if 0 /* !use EBNF-like qi */
      command
        = qi::omit[INDENT] >> shell_command >> redirection_list
        | qi::omit[INDENT] >> shell_command
        | qi::omit[INDENT] >> simple_command
        ;
#else
      command = qi::omit[INDENT] >> (shell_command >> -redirection_list | simple_command);
#endif
#if 0 /* !use EBNF-like qi */
      shell_command
        = for_command
        | case_command
        | While >> compound_list >> Do >> compound_list >> Done
        | Until >> compound_list >> Do >> compound_list >> Done
        | select_command
        | if_command
        | subshell
        | group_command
        | function_def
        ;
#else
      shell_command
        = for_command
        | case_command
        | (While | Until) >> compound_list >> Do >> compound_list >> Done
        | select_command
        | if_command
        | subshell
        | group_command
        | function_def
        ;
#endif
#if 0 /* !use EBNF-like qi */
      for_command
        = For >> name                   >> newline_list                                                                 >> Do           >> compound_list >> Done
        | For >> name                   >> newline_list                                                                 >> Lbrace       >> compound_list >> Rbrace
        | For >> name >> SEMI           >> newline_list                                                                 >> Do           >> compound_list >> Done
        | For >> name >> SEMI           >> newline_list                                                                 >> Lbrace       >> compound_list >> Rbrace
        | For >> name                   >> newline_list >> In                   >> list_terminator >> newline_list      >> Do           >> compound_list >> Done
        | For >> name                   >> newline_list >> In                   >> list_terminator >> newline_list      >> Lbrace       >> compound_list >> Rbrace
        | For >> name                   >> newline_list >> In >> word_list      >> list_terminator >> newline_list      >> Do           >> compound_list >> Done
        | For >> name                   >> newline_list >> In >> word_list      >> list_terminator >> newline_list      >> Lbrace       >> compound_list >> Rbrace
        ;
#else
      for_command
        = For >> name >> (SEMI          >> newline_list
                          |                newline_list >> -(In >> -word_list >> list_terminator >> newline_list)
                          ) >> (Do | Lbrace) >> compound_list >> (Done | Rbrace);
#endif
#if 0 /* !use EBNF-like qi */
      select_command
        = Select >> name                >> newline_list                                                         >> Do           >> list >> Done
        | Select >> name                >> newline_list                                                         >> Lbrace       >> list >> Rbrace
        | Select >> name >> SEMI        >> newline_list                                                         >> Do           >> list >> Done
        | Select >> name >> SEMI        >> newline_list                                                         >> Lbrace       >> list >> Rbrace
        | Select >> name                >> newline_list >> In >> word_list >> list_terminator >> newline_list   >> Do           >> list >> Done
        | Select >> name                >> newline_list >> In >> word_list >> list_terminator >> newline_list   >> Lbrace       >> list >> Rbrace
        ;
#else
      select_command
        = Select >> name >> (SEMI       >> newline_list
                             |             newline_list >> -(In >> -word_list >> list_terminator >> newline_list)
                             ) >> (Do | Lbrace) >> list >> (Done | Rbrace);
#endif
#if 0 /* !use EBNF-like qi */
      case_command
        = Case >> word >> newline_list >> In >> case_clause                             >> Esac
        | Case >> word >> newline_list >> In >> case_clause_sequence    >> newline_list >> Esac
        | Case >> word >> newline_list >> In                            >> newline_list >> Esac
        ;
#else
      case_command
        = Case >> word >> newline_list >> In >> (case_clause | -case_clause_sequence >> newline_list)   >> Esac;
#endif
#if 0 /* !use EBNF-like qi */
      function_def
        =       variable_name   >> Parens       >> newline_list >> (group_command | subshell)
        | Function >>   name    >> Parens       >> newline_list >> (group_command | subshell)
        | Function >>   name                    >> newline_list >> (group_command | subshell)
        ;
#else
      function_def = (-(Function >> qi::omit[DELIMITER]) >> variable_name >> Parens | Function >> name) >>
        newline_list >> (group_command | subshell);
#endif
      subshell = Lparen >> compound_list >> Rparen;
#if 0 /* !use EBNF-like qi */
      if_command
        = If >> compound_list >> Then >> compound_list >> Else >> compound_list >> Fi
        | If >> compound_list >> Then >> compound_list >> elif_clause           >> Fi
        | If >> compound_list >> Then >> compound_list                          >> Fi
        ;
#else
      if_command
        = If >> compound_list >> Then >> compound_list >> -(Else >> compound_list)      >> Fi
        | If >> compound_list >> Then >> compound_list >> elif_clause                   >> Fi;
#endif
      group_command = Lbrace >> list >> Rbrace;
#if 0 /* use recursive qi */
      elif_clause
        = Elif >> compound_list >> Then >> compound_list >> Else >> compound_list
        | Elif >> compound_list >> Then >> compound_list >> elif_clause
        | Elif >> compound_list >> Then >> compound_list                        
        ;
#else
      elif_clause = *(Elif >> compound_list >> Then >> compound_list) >> -(Else >> compound_list);
#endif
#if 0 /* !use EBNF-like qi */
      case_clause
        = case_clause_sequence >>       pattern_list
        |                               pattern_list
        ;
      pattern_list
        = newline_list >> qi::omit[INDENT]                      >> pattern >> qi::omit[Rparen] >> compound_list
        | newline_list >> qi::omit[INDENT]                      >> pattern >> qi::omit[Rparen] >> newline_list
        | newline_list >> qi::omit[INDENT] >> qi::omit[Lparen]  >> pattern >> qi::omit[Rparen] >> compound_list
        | newline_list >> qi::omit[INDENT] >> qi::omit[Lparen]  >> pattern >> qi::omit[Rparen] >> newline_list
        ;
#else
      case_clause = -case_clause_sequence >> pattern_list;
      pattern_list = newline_list >> qi::omit[INDENT] >> qi::omit[-Lparen] >> pattern >> qi::omit[Rparen] >> (compound_list | newline_list);
#endif
#if 0 /* use recursive qi */
      case_clause_sequence
        = pattern_list >> DSEMI
        | case_clause_sequence >> pattern_list >> DSEMI
        ;
      pattern
        = word
        | pattern >> VLINE >> word
        ;
#else
      case_clause_sequence = +(!qi::eps | pattern_list >> DSEMI);
      pattern = argument % VLINE;
#endif
      list = newline_list >> list0;
      compound_list
        = list
        | newline_list >> list1
        ;
#if 0 /* !use EBNF-like qi */
      list0
        = list1 >> AMP                  >> newline_list
        | list1 >> (!DSEMI >> SEMI)     >> newline_list
        | list1 >> NEWLINE              >> newline_list
        ;
#if 0 /* use recursive qi */
      list1
        = list1 >> AND_IF       >> newline_list >> list1
        | list1 >> OR_IF        >> newline_list >> list1
        | list1 >> AMP          >> newline_list >> list1
        | list1 >> SEMI         >> newline_list >> list1
        | list1 >> NEWLINE      >> newline_list >> list1
        | pipeline_command
        ;
#else
      list1 = pipeline_command % ((AND_IF | OR_IF | AMP | SEMI | NEWLINE) >> newline_list);
#endif
      list_terminator
        = NEWLINE
        | SEMI
        ;
#else
      list0 = list1 >> (AMP | (!DSEMI >> SEMI) | NEWLINE) >> newline_list;
#if 0 /* use recursive qi */
      list1
        = list1 >> (AND_IF | OR_IF | AMP | SEMI | NEWLINE) >> newline_list >> list1
        | pipeline_command
        ;
#else
      list1 = pipeline_command % ((AND_IF | OR_IF | AMP | SEMI | NEWLINE) >> newline_list);
#endif
      list_terminator = NEWLINE | SEMI;
#endif
#if 0 /* use recursive qi */
      newline_list
        = qi::eps
        | newline_list >> NEWLINE
        ;
#else
      newline_list = *NEWLINE;
#endif
#if 0 /* !use EBNF-like qi */
      simple_list
        = simple_list1
        | simple_list1 >> AMP
        | simple_list1 >> SEMI
        ;
      simple_list1
        = simple_list1 >> AND_IF        >> newline_list >> simple_list1
        | simple_list1 >> OR_IF         >> newline_list >> simple_list1
        | simple_list1 >> AMP                           >> simple_list1
        | simple_list1 >> SEMI                          >> simple_list1
        | pipeline_command
        ;
      pipeline_command
        =               pipeline
        | Bang >>       pipeline
        ;
#else
      simple_list = simple_list1 >> -(AMP | SEMI);
#if 0 /* use recursive qi */
      simple_list1
        = simple_list1 >> ((AND_IF | OR_IF) >> newline_list | (AMP | SEMI)) >> simple_list1
        | pipeline_command
        ;
#else
      simple_list1
        = pipeline_command % ((AND_IF | OR_IF) >> newline_list | (AMP | SEMI));
#endif
      pipeline_command = -Bang >> pipeline;
#endif
#if 0 /* use recursive qi */
      pipeline
        = pipeline >> VLINE >> newline_list >> pipeline
        | command
        ;
#else
      pipeline = command % (VLINE >> newline_list);
#endif

      comment = qi::no_skip[*qi::ascii::blank >> qi::as_string['#' >> *(~qi::char_("\r\n"))]];

      BOOST_SPIRIT_DEBUG_NODE(top);

#undef IO_NUMBER

#undef AMP
#undef VLINE
#undef SEMI

#undef AND_IF
#undef OR_IF
#undef DSEMI

#undef LESS
#undef GREAT
#undef DLESS
#undef DGREAT
#undef LESSAND
#undef GREATAND
#undef LESSGREAT
#undef CLOBBER
#undef ANDGREAT
#undef MINUS

#undef If
#undef Then
#undef Elif
#undef Else
#undef Fi
#undef For
#undef While
#undef Until
#undef Do
#undef Done
#undef Select
#undef Case
#undef Esac
#undef In
#undef Function
#undef Lparen
#undef Rparen
#undef Lbrace
#undef Rbrace
#undef Parens
#undef Bang
    }
  };
}
int main()
{
  client::sh_parser<std::string::const_iterator> sh_parser;
  std::string str;
  std::cin.unsetf(std::ios::skipws);
  std::copy(std::istream_iterator<char>(std::cin), std::istream_iterator<char>(), std::back_inserter(str));
  {
    std::string::const_iterator it = str.begin(), end = str.end();
    spirit::utree ut;
    bool r = phrase_parse(it, end, sh_parser, qi::ascii::space/*, qi::skip_flag::dont_postskip*/, ut);
    std::cout << str << std::endl;
    if (r && it == end) {
      std::cout << "succeeded:\t" << ut << std::endl;
    }
    else {
      std::cout << "failed:\t" << std::string(it, end) << std::endl;
    }
  }
  return 0;
}

Boost Spirit V2+Universal Tree における計算機の例

by taiji at 2013/04/04 12:54

Boost Spirit のサンプルにはいくつもの計算機の例があるが、四則演算以外をサポートした簡潔な好例がなく、ここで実装例を紹介する価値はありそうである。

本稿では Spirit V2+Universal Tree での例を紹介していく。

Spirit V2+Universal Tree の基本形

Spirit V2+Universal Tree の基本形は以下のようになる。

#include <iostream>
#include <string>
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/support_utree.hpp>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;
namespace spirit = boost::spirit;

namespace client {
  template <typename Iterator>
  struct a_parser : qi::grammar<Iterator, spirit::utree()> {
    qi::rule<Iterator, spirit::utree()> top;
    a_parser() : a_parser::base_type(top)
    {
      top = *~qi::lit('\t'); // ここにパーサ規則を書く

      BOOST_SPIRIT_DEBUG_NODE(top);
    }
  };
}
int main()
{
  client::a_parser<std::string::const_iterator> a_parser;
  std::string str;
  while (std::getline(std::cin, str)) {
    std::string::const_iterator it = str.begin(), end = str.end();
    spirit::utree ut;
    bool r = phrase_parse(it, end, a_parser, qi::ascii::space, ut);
    if (r && it == end) {
      std::cout << "succeeded:\t" << ut << std::endl;
    }
    else {
      std::cout << "failed:\t" << std::string(it, end) << std::endl;
      return 1;
    }
  }
  return 0;
}

計算機でサポートする演算子

以下の演算子をサポートする実装例を段階的に示そう。

  1. 単項演算子の単項正負(+ -)、単項論理否定&ビット否定(! ~)、二項演算子の二項加減乗除法(+ - * / %)、関係等号不等号(< <= > >= == !=)、ビットシフト(<< >>)、ビット演算子(& ^ |)、論理積和(&& ||)
  2. 三項演算子の三項条件(?:)
  3. 変数のサポートおよび単項演算子の前後置増減分(++ --)、二項演算子の単純代入(=)、複合代入(+= -+ *= /= %= &= ^= +=)およびカンマ演算子(,)

また、演算は任意の数値型で行えるように実装する。

ここで、int型における1.と2.の演算についてはiexpr、3.についてはicalcというプログラム名をつけるものとする。

ここではセマンティックアクションは使わずに、Universal Treeを用いた AST(Abstract Syntax Tree) の実装を紹介する。

よって、変数のサポートがなされた次節の AST では以下のような代入と三項条件の組み合わせがサポートされる。

i<j ? i++ : i=j

iexpr+utree.cc

int型における1.と2.の演算を行う計算機を Spirit V2+Universal Tree で実装すると以下のようになるだろう。

#include <libgen.h>
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/spirit/include/support_utree.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/qi.hpp>

namespace client {
  namespace qi = boost::spirit::qi;
  namespace ascii = boost::spirit::ascii;
  namespace spirit = boost::spirit;
  namespace phoenix = boost::phoenix;

  template <typename T>
  struct unary_op {
    template <typename T1, typename T2 = void, typename T3 = void>
    struct result { typedef void type; };
    unary_op(unsigned int op) : op(op) {}
    void operator()(T &val, T const &rhs) const
    {
      val.clear();
      val.push_back(op);
      val.push_back(rhs);
    }
    unsigned int op;
  };
  template <typename T>
  struct binary_op : public unary_op<T> {
    binary_op(unsigned int op) : unary_op<T>(op) {}
    void operator()(T &val, T const &rhs) const
    {
      T lhs;
      lhs.swap(val);
      val.push_back(unary_op<T>::op);
      val.push_back(lhs);
      val.push_back(rhs);
    }
  };
  template <typename T>
  struct ternary_op : public unary_op<T> {
    ternary_op(unsigned int op) : unary_op<T>(op) {}
    void operator()(T &val, T const &a, T const &b) const
    {
      T c;
      c.swap(val);
      val.push_back(unary_op<T>::op);
      val.push_back(c);
      val.push_back(a);
      val.push_back(b);
    }
  };
  enum code_type {
    code_negate = 0,
    code_logical_not,
    code_bitwise_not,
    code_plus,
    code_minus,
    code_multiplies,
    code_divides,
    code_modulus,
    code_bitshift_left,
    code_bitshift_right,
    code_less,
    code_less_equal,
    code_greater,
    code_greater_equal,
    code_equal_to,
    code_not_equal_to,
    code_logical_or,
    code_logical_and,
    code_bitwise_or,
    code_bitwise_xor,
    code_bitwise_and,
    code_conditional,
  };
  phoenix::function<unary_op<spirit::utree> > const
    unary_negate(code_negate),
    unary_logical_not(code_logical_not),
    unary_bitwise_not(code_bitwise_not);
  phoenix::function<binary_op<spirit::utree> > const
    binary_plus(code_plus),
    binary_minus(code_minus),
    binary_multiplies(code_multiplies),
    binary_divides(code_divides),
    binary_modulus(code_modulus),
    binary_bitshift_left(code_bitshift_left),
    binary_bitshift_right(code_bitshift_right),
    binary_less(code_less),
    binary_less_equal(code_less_equal),
    binary_greater(code_greater),
    binary_greater_equal(code_greater_equal),
    binary_equal_to(code_equal_to),
    binary_not_equal_to(code_not_equal_to),
    binary_logical_or(code_logical_or),
    binary_logical_and(code_logical_and),
    binary_bitwise_or(code_bitwise_or),
    binary_bitwise_xor(code_bitwise_xor),
    binary_bitwise_and(code_bitwise_and);
  phoenix::function<ternary_op<spirit::utree> > const
    ternary_conditional(code_conditional);

  template <typename Iterator, typename T>
  struct calculator : qi::grammar<Iterator, ascii::space_type, T()> {
    qi::rule<Iterator, ascii::space_type, T()> expr, t[9], term, fctr;
    calculator() : calculator::base_type(expr)
    {
      using qi::uint_;
      using qi::_val;
      using qi::_1;
      using qi::_2;
      using qi::hex;
      using qi::oct;

      expr = t[0][_val = _1] >> *('?' >> expr >>
                                  ':' >> expr)[ternary_conditional(_val, _1, _2)];
      t[0] = t[1][_val = _1] >> *("||"  >> t[1][binary_logical_or(_val, _1)]);
      t[1] = t[2][_val = _1] >> *("&&"  >> t[2][binary_logical_and(_val, _1)]);
      t[2] = t[3][_val = _1] >> *('|'   >> t[3][binary_bitwise_or(_val, _1)]);
      t[3] = t[4][_val = _1] >> *('^'   >> t[4][binary_bitwise_xor(_val, _1)]);
      t[4] = t[5][_val = _1] >> *('&'   >> t[5][binary_bitwise_and(_val, _1)]);
      t[5] = t[6][_val = _1] >> *("<<"  >> t[6][binary_bitshift_left(_val, _1)] |
                                  ">>"  >> t[6][binary_bitshift_right(_val, _1)]);
      t[6] = t[7][_val = _1] >> *('<'   >> t[7][binary_less(_val, _1)] |
                                  '>'   >> t[7][binary_greater(_val, _1)] |
                                  "<="  >> t[7][binary_less_equal(_val, _1)] |
                                  "<="  >> t[7][binary_greater_equal(_val, _1)]);
      t[7] = t[8][_val = _1] >> *("=="  >> t[8][binary_equal_to(_val, _1)] |
                                  "!="  >> t[8][binary_not_equal_to(_val, _1)]);
      t[8] = term[_val = _1] >> *('+'   >> term[binary_plus(_val, _1)] |
                                  '-'   >> term[binary_minus(_val, _1)]);
      term = fctr[_val = _1] >> *('*'   >> fctr[binary_multiplies(_val, _1)] |
                                  '/'   >> fctr[binary_divides(_val, _1)] |
                                  '%'   >> fctr[binary_modulus(_val, _1)]);
      fctr = "0x" >> hex[_val = _1] |
        '0' >> oct[_val = _1] |
        uint_[_val = _1] |
        '(' >> expr[_val = _1] >> ')' |
        '-' >> fctr[unary_negate(_val, _1)] |
        '+' >> fctr[_val = _1] |
        '!' >> fctr[unary_logical_not(_val, _1)] |
        '~' >> fctr[unary_bitwise_not(_val, _1)];
    }
  };
  template <typename T>
  struct calculator_utree_machine {
    calculator_utree_machine() {}
    T execute(const spirit::utree &ut) const
    {
      if (ut.which() != spirit::utree_type::list_type)
        return ut.get<T>();
      typename spirit::utree::const_iterator it=ut.begin();
      switch (it->get<T>()) {
      case code_negate:
        return -execute(*++it);
      case code_logical_not:
        return !execute(*++it);
      case code_bitwise_not:
        return ~execute(*++it);
      case code_plus:
        return execute(*++it) + execute(*++it);
      case code_minus:
        return execute(*++it) - execute(*++it);
      case code_multiplies:
        return execute(*++it) * execute(*++it);
      case code_divides:
        return execute(*++it) / execute(*++it);
      case code_modulus:
        return execute(*++it) % execute(*++it);
      case code_bitshift_left:
        return execute(*++it) << execute(*++it);
      case code_bitshift_right:
        return execute(*++it) >> execute(*++it);
      case code_less:
        return execute(*++it) < execute(*++it);
      case code_less_equal:
        return execute(*++it) <= execute(*++it);
      case code_greater:
        return execute(*++it) > execute(*++it);
      case code_greater_equal:
        return execute(*++it) >= execute(*++it);
      case code_equal_to:
        return execute(*++it) == execute(*++it);
      case code_not_equal_to:
        return execute(*++it) != execute(*++it);
      case code_logical_or:
        return execute(*++it) || execute(*++it);
      case code_logical_and:
        return execute(*++it) && execute(*++it);
      case code_bitwise_or:
        return execute(*++it) | execute(*++it);
      case code_bitwise_xor:
        return execute(*++it) ^ execute(*++it);
      case code_bitwise_and:
        return execute(*++it) & execute(*++it);
      case code_conditional: default:
        if (!execute(*++it)) ++it;
        return execute(*++it);
      }
    }
  };
}

int main(int argc, char *argv[])
{
  int rv = 0;

  typedef std::string::const_iterator iterator_type;
  typedef int value_type;
  client::calculator<iterator_type, boost::spirit::utree> calc;
  std::string str;
  while (std::getline(std::cin, str)) {
    iterator_type it = str.begin(), end = str.end();
    boost::spirit::utree ut;
    bool r = phrase_parse(it, end, calc, boost::spirit::ascii::space, ut);
    client::calculator_utree_machine<value_type> vmachine;
    if (r && it == end) {
      std::cout << "succeeded:\t" << ut << std::endl;
      std::cout << vmachine.execute(ut) << std::endl;
    }
    else {
      std::cout << "failed:\t" << std::string(it, end) << std::endl;
      rv = !0;
    }
  }
  return rv;
}

もし unsigned int 型の演算を行いたければ、main 関数の冒頭を typedef unsigned int value_type; に書き換えれば良い。

クラス utree が使用され、その内部のリスト構造によって AST が構成されていることがわかるだろう。そしてその AST を calculator_utree_machine::execute によって再帰的に演算している。

icalc+utree.cc

さて、iexpr+utree.cc に変数のサポートと代入演算子を導入すると以下のようになるだろう。

#include <libgen.h>
#include <iostream>
#include <iomanip>
#include <string>
#include <map>
#include <boost/spirit/include/support_utree.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/qi.hpp>

namespace client {
  namespace qi = boost::spirit::qi;
  namespace ascii = boost::spirit::ascii;
  namespace spirit = boost::spirit;
  namespace phoenix = boost::phoenix;

  template <typename T>
  struct unary_op {
    template <typename T1, typename T2 = void, typename T3 = void>
    struct result { typedef void type; };
    unary_op(unsigned int op) : op(op) {}
    void operator()(T &val, T const &rhs) const
    {
      val.clear();
      val.push_back(op);
      val.push_back(rhs);
    }
    unsigned int op;
  };
  template <typename T>
  struct binary_op : public unary_op<T> {
    binary_op(unsigned int op) : unary_op<T>(op) {}
    void operator()(T &val, T const &rhs) const
    {
      T lhs;
      lhs.swap(val);
      val.push_back(unary_op<T>::op);
      val.push_back(lhs);
      val.push_back(rhs);
    }
  };
  template <typename T>
  struct ternary_op : public unary_op<T> {
    ternary_op(unsigned int op) : unary_op<T>(op) {}
    void operator()(T &val, T const &a, T const &b) const
    {
      T c;
      c.swap(val);
      val.push_back(unary_op<T>::op);
      val.push_back(c);
      val.push_back(a);
      val.push_back(b);
    }
  };
  enum code_type {
    code_negate = 0,
    code_logical_not,
    code_bitwise_not,
    code_plus,
    code_minus,
    code_multiplies,
    code_divides,
    code_modulus,
    code_bitshift_left,
    code_bitshift_right,
    code_less,
    code_less_equal,
    code_greater,
    code_greater_equal,
    code_equal_to,
    code_not_equal_to,
    code_logical_or,
    code_logical_and,
    code_bitwise_or,
    code_bitwise_xor,
    code_bitwise_and,
    code_assignment_direct,
    code_assignment_summation,
    code_assignment_difference,
    code_assignment_product,
    code_assignment_quotient,
    code_assignment_remainder,
    code_assignment_bitshift_left,
    code_assignment_bitshift_right, 
    code_assignment_bitwise_and,
    code_assignment_bitwise_xor,
    code_assignment_bitwise_or,
    code_prefix_increment,
    code_prefix_decrement,
    code_postfix_increment,
    code_postfix_decrement,
    code_conditional,
    code_binary_comma,
  };
  phoenix::function<unary_op<spirit::utree> > const
    unary_negate(code_negate),
    unary_logical_not(code_logical_not),
    unary_bitwise_not(code_bitwise_not);
  phoenix::function<binary_op<spirit::utree> > const
    binary_plus(code_plus),
    binary_minus(code_minus),
    binary_multiplies(code_multiplies),
    binary_divides(code_divides),
    binary_modulus(code_modulus),
    binary_bitshift_left(code_bitshift_left),
    binary_bitshift_right(code_bitshift_right),
    binary_less(code_less),
    binary_less_equal(code_less_equal),
    binary_greater(code_greater),
    binary_greater_equal(code_greater_equal),
    binary_equal_to(code_equal_to),
    binary_not_equal_to(code_not_equal_to),
    binary_logical_or(code_logical_or),
    binary_logical_and(code_logical_and),
    binary_bitwise_or(code_bitwise_or),
    binary_bitwise_xor(code_bitwise_xor),
    binary_bitwise_and(code_bitwise_and);
  phoenix::function<ternary_op<spirit::utree> > const
    ternary_conditional(code_conditional);
  phoenix::function<binary_op<spirit::utree> > const
    assignment_direct(code_assignment_direct),
    assignment_summation(code_assignment_summation),
    assignment_difference(code_assignment_difference),
    assignment_product(code_assignment_product),
    assignment_quotient(code_assignment_quotient),
    assignment_remainder(code_assignment_remainder),
    assignment_bitshift_left(code_assignment_bitshift_left),
    assignment_bitshift_right(code_assignment_bitshift_right), 
    assignment_bitwise_and(code_assignment_bitwise_and),
    assignment_bitwise_xor(code_assignment_bitwise_xor),
    assignment_bitwise_or(code_assignment_bitwise_or),
    binary_comma(code_binary_comma);
  phoenix::function<unary_op<spirit::utree> > const
    prefix_increment(code_prefix_increment),
    prefix_decrement(code_prefix_decrement),
    postfix_increment(code_postfix_increment),
    postfix_decrement(code_postfix_decrement);

  template <typename Iterator, typename T>
  struct calculator : qi::grammar<Iterator, ascii::space_type, T()> {
    qi::rule<Iterator, ascii::space_type, T()> vars, e[7], expr, t[9], term, fctr;
    calculator() : calculator::base_type(expr)
    {
      using qi::uint_;
      using qi::_val;
      using qi::_1;
      using qi::_2;
      using qi::hex;
      using qi::oct;
      using qi::alpha;
      using qi::alnum;
      using qi::lexeme;
      using qi::as_string;

      vars = as_string[lexeme[(alpha >> *(alnum | '_'))]];
      expr = e[0][_val = _1] >> *(','   >> e[0][binary_comma(_val, _1)]);
      e[0] = e[1] | e[2] | e[3] | e[4] | e[5] | e[6];
      e[1] = vars[_val = _1] >> ('='    >> e[0][assignment_direct(_val, _1)]);
      e[2] = vars[_val = _1] >> ("+="   >> e[0][assignment_summation(_val, _1)] |
                                 "-="   >> e[0][assignment_difference(_val, _1)]);
      e[3] = vars[_val = _1] >> ("*="   >> e[0][assignment_product(_val, _1)] |
                                 "/="   >> e[0][assignment_quotient(_val, _1)] |
                                 "%="   >> e[0][assignment_remainder(_val, _1)]);
      e[4] = vars[_val = _1] >> ("<<="  >> e[0][assignment_bitshift_left(_val, _1)] |
                                 ">>="  >> e[0][assignment_bitshift_right(_val, _1)]);
      e[5] = vars[_val = _1] >> ("&="   >> e[0][assignment_bitwise_and(_val, _1)] |
                                 "^="   >> e[0][assignment_bitwise_xor(_val, _1)] |
                                 "|="   >> e[0][assignment_bitwise_or(_val, _1)]);
      e[6] = t[0][_val = _1] >> *('?' >> e[0] >>
                                  ':' >> e[0])[ternary_conditional(_val, _1, _2)];
      t[0] = t[1][_val = _1] >> *("||"  >> t[1][binary_logical_or(_val, _1)]);
      t[1] = t[2][_val = _1] >> *("&&"  >> t[2][binary_logical_and(_val, _1)]);
      t[2] = t[3][_val = _1] >> *('|'   >> t[3][binary_bitwise_or(_val, _1)]);
      t[3] = t[4][_val = _1] >> *('^'   >> t[4][binary_bitwise_xor(_val, _1)]);
      t[4] = t[5][_val = _1] >> *('&'   >> t[5][binary_bitwise_and(_val, _1)]);
      t[5] = t[6][_val = _1] >> *("<<"  >> t[6][binary_bitshift_left(_val, _1)] |
                                  ">>"  >> t[6][binary_bitshift_right(_val, _1)]);
      t[6] = t[7][_val = _1] >> *('<'   >> t[7][binary_less(_val, _1)] |
                                  '>'   >> t[7][binary_greater(_val, _1)] |
                                  "<="  >> t[7][binary_less_equal(_val, _1)] |
                                  "<="  >> t[7][binary_greater_equal(_val, _1)]);
      t[7] = t[8][_val = _1] >> *("=="  >> t[8][binary_equal_to(_val, _1)] |
                                  "!="  >> t[8][binary_not_equal_to(_val, _1)]);
      t[8] = term[_val = _1] >> *('+'   >> term[binary_plus(_val, _1)] |
                                  '-'   >> term[binary_minus(_val, _1)]);
      term = fctr[_val = _1] >> *('*'   >> fctr[binary_multiplies(_val, _1)] |
                                  '/'   >> fctr[binary_divides(_val, _1)] |
                                  '%'   >> fctr[binary_modulus(_val, _1)]);
      fctr = "0x" >> hex[_val = _1] |
        '0' >> oct[_val = _1] |
        uint_[_val = _1] |
        "++" >> vars[prefix_increment(_val, _1)] |
        "--" >> vars[prefix_decrement(_val, _1)] |
        vars[postfix_increment(_val, _1)] >> "++" |
        vars[postfix_decrement(_val, _1)] >> "--" |
        vars[_val = _1] |
        '(' >> expr[_val = _1] >> ')' |
        '-' >> fctr[unary_negate(_val, _1)] |
        '+' >> fctr[_val = _1] |
        '!' >> fctr[unary_logical_not(_val, _1)] |
        '~' >> fctr[unary_bitwise_not(_val, _1)];
    }
  };
  template <typename T>
  struct calculator_utree_machine {
    std::map<std::string, T> &vars;
    calculator_utree_machine(std::map<std::string, T> &vars) : vars(vars) {}
    T execute(const spirit::utree &ut)
    {
      spirit::utf8_string_range_type r;
      if (ut.which() != spirit::utree_type::list_type) {
        if (ut.which() != spirit::utree_type::int_type) {
          r = ut.get<spirit::utf8_string_range_type>();
          if (vars.find(std::string(r.begin(), r.end())) == vars.end())
            vars[std::string(r.begin(), r.end())] = 0;
          return vars[std::string(r.begin(), r.end())];
        }
        return ut.get<T>();
      }
      typename spirit::utree::const_iterator it=ut.begin();
      switch (it->get<T>()) {
      case code_negate:
        return -execute(*++it);
      case code_logical_not:
        return !execute(*++it);
      case code_bitwise_not:
        return ~execute(*++it);
      case code_plus:
        return execute(*++it) + execute(*++it);
      case code_minus:
        return execute(*++it) - execute(*++it);
      case code_multiplies:
        return execute(*++it) * execute(*++it);
      case code_divides:
        return execute(*++it) / execute(*++it);
      case code_modulus:
        return execute(*++it) % execute(*++it);
      case code_bitshift_left:
        return execute(*++it) << execute(*++it);
      case code_bitshift_right:
        return execute(*++it) >> execute(*++it);
      case code_less:
        return execute(*++it) < execute(*++it);
      case code_less_equal:
        return execute(*++it) <= execute(*++it);
      case code_greater:
        return execute(*++it) > execute(*++it);
      case code_greater_equal:
        return execute(*++it) >= execute(*++it);
      case code_equal_to:
        return execute(*++it) == execute(*++it);
      case code_not_equal_to:
        return execute(*++it) != execute(*++it);
      case code_logical_or:
        return execute(*++it) || execute(*++it);
      case code_logical_and:
        return execute(*++it) && execute(*++it);
      case code_bitwise_or:
        return execute(*++it) | execute(*++it);
      case code_bitwise_xor:
        return execute(*++it) ^ execute(*++it);
      case code_bitwise_and:
        return execute(*++it) & execute(*++it);
      case code_assignment_direct:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return (vars[std::string(r.begin(), r.end())] = execute(*++it));
      case code_assignment_summation:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return (vars[std::string(r.begin(), r.end())] += execute(*++it));
      case code_assignment_difference:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return (vars[std::string(r.begin(), r.end())] -= execute(*++it));
      case code_assignment_product:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return (vars[std::string(r.begin(), r.end())] *= execute(*++it));
      case code_assignment_quotient:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return (vars[std::string(r.begin(), r.end())] /= execute(*++it));
      case code_assignment_remainder:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return (vars[std::string(r.begin(), r.end())] %= execute(*++it));
      case code_assignment_bitshift_left:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return (vars[std::string(r.begin(), r.end())] <<= execute(*++it));
      case code_assignment_bitshift_right:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return (vars[std::string(r.begin(), r.end())] >>= execute(*++it));
      case code_assignment_bitwise_and:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return (vars[std::string(r.begin(), r.end())] &= execute(*++it));
      case code_assignment_bitwise_xor:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return (vars[std::string(r.begin(), r.end())] ^= execute(*++it));
      case code_assignment_bitwise_or:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return (vars[std::string(r.begin(), r.end())] |= execute(*++it));
      case code_prefix_increment:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return ++vars[std::string(r.begin(), r.end())];
      case code_prefix_decrement:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return --vars[std::string(r.begin(), r.end())];
      case code_postfix_increment:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return vars[std::string(r.begin(), r.end())]++;
      case code_postfix_decrement:
        r = (*++it).get<spirit::utf8_string_range_type>();
        return vars[std::string(r.begin(), r.end())]--;
      case code_binary_comma:
        return execute(*++it), execute(*++it);
      case code_conditional: default:
        if (!execute(*++it)) ++it;
        return execute(*++it);
      }
    }
  };
}

int main(int argc, char *argv[])
{
  int rv = 0;

  typedef std::string::const_iterator iterator_type;
  typedef int value_type;
  client::calculator<iterator_type, boost::spirit::utree> calc;
  std::map<std::string, value_type> vars;
  std::string str;
  while (std::getline(std::cin, str)) {
    iterator_type it = str.begin(), end = str.end();
    boost::spirit::utree ut;
    bool r = phrase_parse(it, end, calc, boost::spirit::ascii::space, ut);
    client::calculator_utree_machine<value_type> vmachine(vars);
    if (r && it == end) {
      std::cout << "succeeded:\t" << ut << std::endl;
      std::cout << vmachine.execute(ut) << std::endl;
    }
    else {
      std::cout << "failed:\t" << std::string(it, end) << std::endl;
      rv = !0;
    }
  }
  return rv;
}

同様に、もし unsigned int 型の演算を行いたければ、main 関数の冒頭を typedef unsigned int value_type; に書き換えれば良い。

ちなみに、この実装では代入時に未使用の変数を新たに使用することができ、変数の参照時に未使用の変数を使用すると零が代入されているものとする。

diff -u iexpr+utree.cc icalc+utree.cc

上述の二つのプログラムリストの掲示だけでは、どのように変数のサポートと代入演算子を導入したかが分かり辛いので、これらの Unified diff 形式をみてみよう。

--- iexpr+utree.cc
+++ icalc+utree.cc
@@ -2,6 +2,7 @@
 #include <iostream>
 #include <iomanip>
 #include <string>
+#include <map>
 #include <boost/spirit/include/support_utree.hpp>
 #include <boost/spirit/include/phoenix_operator.hpp>
 #include <boost/spirit/include/qi.hpp>
@@ -72,7 +73,23 @@
     code_bitwise_or,
     code_bitwise_xor,
     code_bitwise_and,
+    code_assignment_direct,
+    code_assignment_summation,
+    code_assignment_difference,
+    code_assignment_product,
+    code_assignment_quotient,
+    code_assignment_remainder,
+    code_assignment_bitshift_left,
+    code_assignment_bitshift_right, 
+    code_assignment_bitwise_and,
+    code_assignment_bitwise_xor,
+    code_assignment_bitwise_or,
+    code_prefix_increment,
+    code_prefix_decrement,
+    code_postfix_increment,
+    code_postfix_decrement,
     code_conditional,
+    code_binary_comma,
   };
   phoenix::function<unary_op<spirit::utree> > const
   unary_negate(code_negate),
@@ -99,10 +116,28 @@
     binary_bitwise_and(code_bitwise_and);
   phoenix::function<ternary_op<spirit::utree> > const
   ternary_conditional(code_conditional);
+  phoenix::function<binary_op<spirit::utree> > const
+    assignment_direct(code_assignment_direct),
+    assignment_summation(code_assignment_summation),
+    assignment_difference(code_assignment_difference),
+    assignment_product(code_assignment_product),
+    assignment_quotient(code_assignment_quotient),
+    assignment_remainder(code_assignment_remainder),
+    assignment_bitshift_left(code_assignment_bitshift_left),
+    assignment_bitshift_right(code_assignment_bitshift_right), 
+    assignment_bitwise_and(code_assignment_bitwise_and),
+    assignment_bitwise_xor(code_assignment_bitwise_xor),
+    assignment_bitwise_or(code_assignment_bitwise_or),
+    binary_comma(code_binary_comma);
+  phoenix::function<unary_op<spirit::utree> > const
+    prefix_increment(code_prefix_increment),
+    prefix_decrement(code_prefix_decrement),
+    postfix_increment(code_postfix_increment),
+    postfix_decrement(code_postfix_decrement);
 
   template <typename Iterator, typename T>
   struct calculator : qi::grammar<Iterator, ascii::space_type, T()> {
-    qi::rule<Iterator, ascii::space_type, T()> expr, t[9], term, fctr;
+    qi::rule<Iterator, ascii::space_type, T()> vars, e[7], expr, t[9], term, fctr;
     calculator() : calculator::base_type(expr)
     {
       using qi::uint_;
@@ -111,9 +146,27 @@
       using qi::_2;
       using qi::hex;
       using qi::oct;
+      using qi::alpha;
+      using qi::alnum;
+      using qi::lexeme;
+      using qi::as_string;
 
-      expr = t[0][_val = _1] >> *('?' >> expr >>
-                                  ':' >> expr)[ternary_conditional(_val, _1, _2)];
+      vars = as_string[lexeme[(alpha >> *(alnum | '_'))]];
+      expr = e[0][_val = _1] >> *(','   >> e[0][binary_comma(_val, _1)]);
+      e[0] = e[1] | e[2] | e[3] | e[4] | e[5] | e[6];
+      e[1] = vars[_val = _1] >> ('='    >> e[0][assignment_direct(_val, _1)]);
+      e[2] = vars[_val = _1] >> ("+="   >> e[0][assignment_summation(_val, _1)] |
+                                 "-="   >> e[0][assignment_difference(_val, _1)]);
+      e[3] = vars[_val = _1] >> ("*="   >> e[0][assignment_product(_val, _1)] |
+                                 "/="   >> e[0][assignment_quotient(_val, _1)] |
+                                 "%="   >> e[0][assignment_remainder(_val, _1)]);
+      e[4] = vars[_val = _1] >> ("<<="  >> e[0][assignment_bitshift_left(_val, _1)] |
+                                 ">>="  >> e[0][assignment_bitshift_right(_val, _1)]);
+      e[5] = vars[_val = _1] >> ("&="   >> e[0][assignment_bitwise_and(_val, _1)] |
+                                 "^="   >> e[0][assignment_bitwise_xor(_val, _1)] |
+                                 "|="   >> e[0][assignment_bitwise_or(_val, _1)]);
+      e[6] = t[0][_val = _1] >> *('?' >> e[0] >>
+                                  ':' >> e[0])[ternary_conditional(_val, _1, _2)];
       t[0] = t[1][_val = _1] >> *("||"  >> t[1][binary_logical_or(_val, _1)]);
       t[1] = t[2][_val = _1] >> *("&&"  >> t[2][binary_logical_and(_val, _1)]);
       t[2] = t[3][_val = _1] >> *('|'   >> t[3][binary_bitwise_or(_val, _1)]);
@@ -135,6 +188,11 @@
       fctr = "0x" >> hex[_val = _1] |
         '0' >> oct[_val = _1] |
         uint_[_val = _1] |
+        "++" >> vars[prefix_increment(_val, _1)] |
+        "--" >> vars[prefix_decrement(_val, _1)] |
+        vars[postfix_increment(_val, _1)] >> "++" |
+        vars[postfix_decrement(_val, _1)] >> "--" |
+        vars[_val = _1] |
         '(' >> expr[_val = _1] >> ')' |
         '-' >> fctr[unary_negate(_val, _1)] |
         '+' >> fctr[_val = _1] |
@@ -144,11 +202,20 @@
   };
   template <typename T>
   struct calculator_utree_machine {
-    calculator_utree_machine() {}
-    T execute(const spirit::utree &ut) const
+    std::map<std::string, T> &vars;
+    calculator_utree_machine(std::map<std::string, T> &vars) : vars(vars) {}
+    T execute(const spirit::utree &ut)
     {
-      if (ut.which() != spirit::utree_type::list_type)
+      spirit::utf8_string_range_type r;
+      if (ut.which() != spirit::utree_type::list_type) {
+        if (ut.which() != spirit::utree_type::int_type) {
+          r = ut.get<spirit::utf8_string_range_type>();
+          if (vars.find(std::string(r.begin(), r.end())) == vars.end())
+            vars[std::string(r.begin(), r.end())] = 0;
+          return vars[std::string(r.begin(), r.end())];
+        }
         return ut.get<T>();
+      }
       typename spirit::utree::const_iterator it=ut.begin();
       switch (it->get<T>()) {
       case code_negate:
@@ -193,6 +260,53 @@
         return execute(*++it) ^ execute(*++it);
       case code_bitwise_and:
         return execute(*++it) & execute(*++it);
+      case code_assignment_direct:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return (vars[std::string(r.begin(), r.end())] = execute(*++it));
+      case code_assignment_summation:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return (vars[std::string(r.begin(), r.end())] += execute(*++it));
+      case code_assignment_difference:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return (vars[std::string(r.begin(), r.end())] -= execute(*++it));
+      case code_assignment_product:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return (vars[std::string(r.begin(), r.end())] *= execute(*++it));
+      case code_assignment_quotient:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return (vars[std::string(r.begin(), r.end())] /= execute(*++it));
+      case code_assignment_remainder:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return (vars[std::string(r.begin(), r.end())] %= execute(*++it));
+      case code_assignment_bitshift_left:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return (vars[std::string(r.begin(), r.end())] <<= execute(*++it));
+      case code_assignment_bitshift_right:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return (vars[std::string(r.begin(), r.end())] >>= execute(*++it));
+      case code_assignment_bitwise_and:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return (vars[std::string(r.begin(), r.end())] &= execute(*++it));
+      case code_assignment_bitwise_xor:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return (vars[std::string(r.begin(), r.end())] ^= execute(*++it));
+      case code_assignment_bitwise_or:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return (vars[std::string(r.begin(), r.end())] |= execute(*++it));
+      case code_prefix_increment:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return ++vars[std::string(r.begin(), r.end())];
+      case code_prefix_decrement:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return --vars[std::string(r.begin(), r.end())];
+      case code_postfix_increment:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return vars[std::string(r.begin(), r.end())]++;
+      case code_postfix_decrement:
+        r = (*++it).get<spirit::utf8_string_range_type>();
+        return vars[std::string(r.begin(), r.end())]--;
+      case code_binary_comma:
+        return execute(*++it), execute(*++it);
       case code_conditional: default:
         if (!execute(*++it)) ++it;
         return execute(*++it);
@@ -208,12 +322,13 @@
   typedef std::string::const_iterator iterator_type;
   typedef int value_type;
   client::calculator<iterator_type, boost::spirit::utree> calc;
+  std::map<std::string, value_type> vars;
   std::string str;
   while (std::getline(std::cin, str)) {
     iterator_type it = str.begin(), end = str.end();
     boost::spirit::utree ut;
     bool r = phrase_parse(it, end, calc, boost::spirit::ascii::space, ut);
-    client::calculator_utree_machine<value_type> vmachine;
+    client::calculator_utree_machine<value_type> vmachine(vars);
     if (r && it == end) {
       std::cout << "succeeded:\t" << ut << std::endl;
       std::cout << vmachine.execute(ut) << std::endl;

変数のサポートのために、std::map<std::string, value_type> vars が導入され、クラス utreespirit::utf8_string_range_type 型として変数が格納されていることがわかるだろう。


Boost Spirit V2 における計算機の例

by taiji at 2013/04/04 12:52

Boost Spirit のサンプルにはいくつもの計算機の例があるが、四則演算以外をサポートした簡潔な好例がなく、ここで実装例を紹介する価値はありそうである。

本稿では Spirit V2 での例を紹介していく。

Spirit V2 の基本形

Spirit V2 の基本形は以下のようになる。

#include <iostream>
#include <string>
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

namespace client {
  template <typename Iterator>
  struct a_parser : qi::grammar<Iterator> {
    qi::rule<Iterator> top;
    a_parser() : a_parser::base_type(top)
    {
      top = *~qi::lit('\t'); // ここにパーサ規則を書く

      BOOST_SPIRIT_DEBUG_NODE(top);
    }
  };
}
int main()
{
  client::a_parser<std::string::const_iterator> a_parser;
  std::string str;
  while (std::getline(std::cin, str)) {
    std::string::const_iterator it = str.begin(), end = str.end();
    bool r = phrase_parse(it, end, a_parser, qi::ascii::space);
    if (r && it == end) {
      std::cout << "succeeded" << std::endl;
    }
    else {
      std::cout << "failed:\t" << std::string(it, end) << std::endl;
      return 1;
    }
  }
  return 0;
}

計算機でサポートする演算子

以下の演算子をサポートする実装例を段階的に示そう。

  1. 単項演算子の単項正負(+ -)、単項論理否定&ビット否定(! ~)、二項演算子の二項加減乗除法(+ - * / %)、関係等号不等号(< <= > >= == !=)、ビットシフト(<< >>)、ビット演算子(& ^ |)、論理積和(&& ||)
  2. 三項演算子の三項条件(?:)
  3. 変数のサポートおよび単項演算子の前後置増減分(++ --)、二項演算子の単純代入(=)、複合代入(+= -+ *= /= %= &= ^= +=)およびカンマ演算子(,)

また、演算は任意の数値型で行えるように実装する。

ここで、int型における1.と2.の演算についてはiexpr、3.についてはicalcというプログラム名をつけるものとする。

ここでは敢えてAST(Abstract Syntax Tree)による例は次節にゆずり、セマンティックアクションで計算機を実装する例を紹介する。セマンティックアクションでは構文解析しながら演算を行うので、例えば、以下のような代入と三項条件の組み合わせについては困難を伴うので、次節の AST でサポートするものとする。

i<j ? i++ : i=j

iexpr.cc

int型における1.と2.の演算を行う計算機を Spirit V2 で実装すると以下のようになるだろう。

#include <iostream>
#include <iomanip>
#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>

namespace client {
  namespace qi = boost::spirit::qi;
  namespace ascii = boost::spirit::ascii;

  template <typename Iterator, typename T>
  struct calculator : qi::grammar<Iterator, T(), ascii::space_type> {
    qi::rule<Iterator, T(), ascii::space_type> expr, t[9], term, fctr;
    calculator() : calculator::base_type(expr)
    {
      using qi::_val;
      using qi::_1;
      using qi::_2;
      using qi::uint_;
      using qi::hex;
      using qi::oct;

#if !0 /* ternary operator */
      expr = t[0][_val = _1] >> *('?' >> expr >>
                                  ':' >> expr)[_val = if_else(_val, _1, _2)];
#else
      expr = t[0][_val = _1];
#endif
      t[0] = t[1][_val = _1] >> *("||"  >> t[1][_val = _val || _1]);
      t[1] = t[2][_val = _1] >> *("&&"  >> t[2][_val = _val && _1]);
      t[2] = t[3][_val = _1] >> *('|'   >> t[3][_val |= _1]);
      t[3] = t[4][_val = _1] >> *('^'   >> t[4][_val ^= _1]);
      t[4] = t[5][_val = _1] >> *('&'   >> t[5][_val &= _1]);
      t[5] = t[6][_val = _1] >> *("<<"  >> t[6][_val <<= _1] |
                                  ">>"  >> t[6][_val >>= _1]);
      t[6] = t[7][_val = _1] >> *('<'   >> t[7][_val = _val < _1] |
                                  '>'   >> t[7][_val = _val > _1] |
                                  "<="  >> t[7][_val = _val <= _1] |
                                  ">="  >> t[7][_val = _val >= _1]);
      t[7] = t[8][_val = _1] >> *("=="  >> t[8][_val = _val == _1] |
                                  "!="  >> t[8][_val = _val != _1]);
      t[8] = term[_val = _1] >> *('+'   >> term[_val += _1] |
                                  '-'   >> term[_val -= _1]);
      term = fctr[_val = _1] >> *('*'   >> fctr[_val *= _1] |
                                  '/'   >> fctr[_val /= _1] |
                                  '%'   >> fctr[_val %= _1]);
      fctr = "0x" >> hex[_val = _1] |
        '0' >> oct[_val = _1] |
        uint_[_val = _1] |
        '(' >> expr[_val = _1] >> ')' |
        '-'     >> fctr[_val = -_1] |
        '+'     >> fctr[_val = _1] |
        '!'     >> fctr[_val = !_1] |
        '~'     >> fctr[_val = ~_1];
    }
  };
}
int main(int argc, char *argv[])
{
  int rv = 0;

  typedef std::string::const_iterator iterator_type;
  typedef int value_type;
  client::calculator<iterator_type, value_type> calc;
  std::string str;
  value_type result;
  while (std::getline(std::cin, str)) {
    std::string::const_iterator it = str.begin(), end = str.end();
    bool r = phrase_parse(it, end, calc, boost::spirit::ascii::space, result);
    if (r && it == end) {
      std::cout << "succeeded:\t" << std::endl;
      std::cout << result << std::endl;
    }
    else {
      std::cout << "failed:\t" << std::string(it, end) << std::endl;
      rv = !0;
    }
  }
  return rv;
}

もし unsigned int 型の演算を行いたければ、main 関数の冒頭を typedef unsigned int value_type; に書き換えれば良い。

さて、これは Boost Phoenix のおかげで極めて簡便に書かれているが、このまま変数のサポートと代入演算子を導入するのは困難である。よって、Boost Variant による AST の実装に切り換える。

iexpr+variant.cc

#include <iostream>
#include <iomanip>
#include <string>
#include <boost/variant.hpp>
#include <vector>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace client {
  struct nulary_op {};
  struct unary_op;
  struct binary_op;
  struct ternary_op;

  struct astree {
    typedef boost::variant<
      nulary_op,
      boost::recursive_wrapper<unary_op>,
      boost::recursive_wrapper<binary_op>,
      boost::recursive_wrapper<ternary_op>,
      int
    > type;
    typedef type::types types;
    enum {
      nulary_op_type,
      unary_op_type,
      binary_op_type,
      ternary_op_type,
      value_type,
    };

    type expr;

    int which() const { return expr.which(); }

    astree()                            : expr(nulary_op()) {}
    astree(unary_op const &expr)        : expr(expr) {}
    astree(binary_op const &expr)       : expr(expr) {}
    astree(ternary_op const &expr)      : expr(expr) {}
    astree(unsigned int expr)           : expr(expr) {}
    astree(type const &expr)            : expr(expr) {}

    astree &ternary_conditional(astree const &a, astree const &b);

    astree &binary_logical_or(astree const &rhs);
    astree &binary_logical_and(astree const &rhs);

    astree &binary_bitwise_or(astree const &rhs);
    astree &binary_bitwise_xor(astree const &rhs);
    astree &binary_bitwise_and(astree const &rhs);
    astree &binary_bitshift_left(astree const &rhs);
    astree &binary_bitshift_right(astree const &rhs);

    astree &binary_less(astree const &rhs);
    astree &binary_less_equal(astree const &rhs);
    astree &binary_greater(astree const &rhs);
    astree &binary_greater_equal(astree const &rhs);
    astree &binary_equal_to(astree const &rhs);
    astree &binary_not_equal_to(astree const &rhs);

    astree &binary_plus(astree const &rhs);
    astree &binary_minus(astree const &rhs);
    astree &binary_multiplies(astree const &rhs);
    astree &binary_divides(astree const &rhs);
    astree &binary_modulus(astree const &rhs);

    astree &unary_negate();
    astree &unary_logical_not();
    astree &unary_bitwise_not();
  };
} // namespace client

namespace boost {
  template <typename T>
  inline
  T get(client::astree const &expr)
  {
    return boost::get<T>(expr.expr);
  }
}

namespace client {
  enum byte_code {
    code_negate = 1,
    code_logical_not,
    code_bitwise_not,
    code_plus,
    code_minus,
    code_multiplies,
    code_divides,
    code_modulus,
    code_bitshift_left,
    code_bitshift_right,
    code_less,
    code_less_equal,
    code_greater,
    code_greater_equal,
    code_equal_to,
    code_not_equal_to,
    code_logical_or,
    code_logical_and,
    code_bitwise_or,
    code_bitwise_xor,
    code_bitwise_and,
    code_ternary_conditional,
    code_value,
  };

  struct ternary_op {
    ternary_op() {}
    ternary_op(int op,
               astree const &c,
               astree const &a,
               astree const &b) : op(op), c(c), a(a), b(b) {}
    int op;
    astree c, a, b;
  };
  struct binary_op {
    binary_op() {}
    binary_op(int op,
              astree const &lhs,
              astree const &rhs) : op(op), lhs(lhs), rhs(rhs) {}
    int op;
    astree lhs, rhs;
  };
  struct unary_op {
    unary_op() {}
    unary_op(int op,
             astree const &rhs) : op(op), rhs(rhs) {}
    int op;
    astree rhs;
  };

  inline
  astree &astree::ternary_conditional(astree const &a, astree const &b)
  { expr = ternary_op(code_ternary_conditional, expr, a, b); return *this; }

  inline
  astree &astree::binary_logical_or(astree const &rhs)
  { expr = binary_op(code_logical_or,   expr, rhs); return *this; }
  inline
  astree &astree::binary_logical_and(astree const &rhs)
  { expr = binary_op(code_logical_and,  expr, rhs); return *this; }

  inline
  astree &astree::binary_bitwise_or(astree const &rhs)
  { expr = binary_op(code_bitwise_or,   expr, rhs); return *this; }
  inline
  astree &astree::binary_bitwise_xor(astree const &rhs)
  { expr = binary_op(code_bitwise_xor,  expr, rhs); return *this; }
  inline
  astree &astree::binary_bitwise_and(astree const &rhs)
  { expr = binary_op(code_bitwise_and,  expr, rhs); return *this; }
  inline
  astree &astree::binary_bitshift_left(astree const &rhs)
  { expr = binary_op(code_bitshift_left,        expr, rhs); return *this; }
  inline
  astree &astree::binary_bitshift_right(astree const &rhs)
  { expr = binary_op(code_bitshift_right,       expr, rhs); return *this; }

  inline
  astree &astree::binary_less(astree const &rhs)
  { expr = binary_op(code_less,         expr, rhs); return *this; }
  inline
  astree &astree::binary_less_equal(astree const &rhs)
  { expr = binary_op(code_less_equal,   expr, rhs); return *this; }
  inline
  astree &astree::binary_greater(astree const &rhs)
  { expr = binary_op(code_greater,      expr, rhs); return *this; }
  inline
  astree &astree::binary_greater_equal(astree const &rhs)
  { expr = binary_op(code_greater_equal,        expr, rhs); return *this; }
  inline
  astree &astree::binary_equal_to(astree const &rhs)
  { expr = binary_op(code_equal_to,     expr, rhs); return *this; }
  inline
  astree &astree::binary_not_equal_to(astree const &rhs)
  { expr = binary_op(code_not_equal_to, expr, rhs); return *this; }

  inline
  astree &astree::binary_plus(astree const &rhs)
  { expr = binary_op(code_plus,         expr, rhs); return *this; }
  inline
  astree &astree::binary_minus(astree const &rhs)
  { expr = binary_op(code_minus,        expr, rhs); return *this; }
  inline
  astree &astree::binary_multiplies(astree const &rhs)
  { expr = binary_op(code_multiplies,   expr, rhs); return *this; }
  inline
  astree &astree::binary_divides(astree const &rhs)
  { expr = binary_op(code_divides,      expr, rhs); return *this; }
  inline
  astree &astree::binary_modulus(astree const &rhs)
  { expr = binary_op(code_modulus,      expr, rhs); return *this; }

  inline
  astree &astree::unary_negate()
  { expr = unary_op(code_negate,        expr); return *this; }
  inline
  astree &astree::unary_logical_not()
  { expr = unary_op(code_logical_not,   expr); return *this; }
  inline
  astree &astree::unary_bitwise_not()
  { expr = unary_op(code_bitwise_not,   expr); return *this; }
}

namespace client {

  namespace qi = boost::spirit::qi;
  namespace ascii = boost::spirit::ascii;

  template <typename Iterator>
  struct calculator : qi::grammar<Iterator, astree(), ascii::space_type> {
    qi::rule<Iterator, astree(), ascii::space_type> expr, t[9], term, fctr;
    calculator() : calculator::base_type(expr)
    {
      using qi::_val;
      using qi::_1;
      using qi::_2;
      using qi::hex;
      using qi::oct;
      using qi::uint_;

      expr = t[0][_val = _1] >> *('?' >> expr >>
                                  ':' >> expr)[_val = bind(&astree::ternary_conditional, _val, _1, _2)];
      t[0] = t[1][_val = _1] >> *("||"  >> t[1][_val = bind(&astree::binary_logical_or, _val, _1)]);
      t[1] = t[2][_val = _1] >> *("&&"  >> t[2][_val = bind(&astree::binary_logical_and, _val, _1)]);
      t[2] = t[3][_val = _1] >> *('|'   >> t[3][_val = bind(&astree::binary_bitwise_or, _val, _1)]);
      t[3] = t[4][_val = _1] >> *('^'   >> t[4][_val = bind(&astree::binary_bitwise_xor, _val, _1)]);
      t[4] = t[5][_val = _1] >> *('&'   >> t[5][_val = bind(&astree::binary_bitwise_and, _val, _1)]);
      t[5] = t[6][_val = _1] >> *("<<"  >> t[6][_val = bind(&astree::binary_bitshift_left, _val, _1)] |
                                  ">>"  >> t[6][_val = bind(&astree::binary_bitshift_right, _val, _1)]);
      t[6] = t[7][_val = _1] >> *('<'   >> t[7][_val = bind(&astree::binary_less, _val, _1)] |
                                  '>'   >> t[7][_val = bind(&astree::binary_greater, _val, _1)] |
                                  "<="  >> t[7][_val = bind(&astree::binary_less_equal, _val, _1)] |
                                  ">="  >> t[7][_val = bind(&astree::binary_greater_equal, _val, _1)]);
      t[7] = t[8][_val = _1] >> *("=="  >> t[8][_val = bind(&astree::binary_equal_to, _val, _1)] |
                                  "!="  >> t[8][_val = bind(&astree::binary_not_equal_to, _val, _1)]);
      t[8] = term[_val = _1] >> *('+'   >> term[_val = bind(&astree::binary_plus, _val, _1)] |
                                  '-'   >> term[_val = bind(&astree::binary_minus, _val, _1)]);
      term = fctr[_val = _1] >> *('*'   >> fctr[_val = bind(&astree::binary_multiplies, _val, _1)] |
                                  '/'   >> fctr[_val = bind(&astree::binary_divides, _val, _1)] |
                                  '%'   >> fctr[_val = bind(&astree::binary_modulus, _val, _1)]);
      fctr = "0x" >> hex[_val = _1] |
        '0' >> oct[_val = _1] |
        uint_[_val = _1] |
        '(' >> expr[_val = _1] >> ')' |
        '-' >> fctr[_val = bind(&astree::unary_negate, _1)] |
        '+' >> fctr[_val = _1] |
        '!' >> fctr[_val = bind(&astree::unary_logical_not, _1)] |
        '~' >> fctr[_val = bind(&astree::unary_bitwise_not, _1)];
    }
  };

  template <typename T>
  struct calculator_astree_machine {
    calculator_astree_machine() {}
    T execute(const astree &ast)
    {
      if (ast.which() == astree::value_type ||
          false) {
        return (T)boost::get<int>(ast); // !
      }
      T lhs, rhs, c, a, b;
      switch (ast.which()) {
      case astree::unary_op_type:
        rhs = execute(boost::get<unary_op>(ast).rhs);
        switch (boost::get<unary_op>(ast).op) {
        case code_negate:       return -rhs;
        case code_logical_not:  return !rhs;
        case code_bitwise_not:  return ~rhs;
        }
        break;
      case astree::binary_op_type:
        lhs = execute(boost::get<binary_op>(ast).lhs);
        rhs = execute(boost::get<binary_op>(ast).rhs);
        switch (boost::get<binary_op>(ast).op) {
        case code_plus:         return lhs + rhs;
        case code_minus:        return lhs - rhs;
        case code_multiplies:   return lhs * rhs;
        case code_divides:      return lhs / rhs;
        case code_modulus:      return lhs % rhs;
        case code_bitshift_left:        return lhs << rhs;
        case code_bitshift_right:       return lhs >> rhs;
        case code_less:         return lhs < rhs;
        case code_less_equal:   return lhs <= rhs;
        case code_greater:      return lhs > rhs;
        case code_greater_equal:        return lhs >= rhs;
        case code_equal_to:     return lhs == rhs;
        case code_not_equal_to: return lhs != rhs;
        case code_logical_or:   return lhs || rhs;
        case code_logical_and:  return lhs && rhs;
        case code_bitwise_or:   return lhs | rhs;
        case code_bitwise_xor:  return lhs ^ rhs;
        case code_bitwise_and:  return lhs & rhs;
        }
        break;
      case astree::ternary_op_type:
        c = execute(boost::get<ternary_op>(ast).c);
        a = execute(boost::get<ternary_op>(ast).a);
        b = execute(boost::get<ternary_op>(ast).b);
        switch (boost::get<ternary_op>(ast).op) {
        case code_ternary_conditional:  return c ? a : b;
        }
        break;
      }
      return T(0);
    }
  };
} // namespace client

int main(int argc, char *argv[])
{
  int rv = 0;

  typedef std::string::const_iterator iterator_type;
  typedef int value_type;
  client::calculator<iterator_type> calc;
  std::string str;
  client::astree result;
  value_type a;
  while (std::getline(std::cin, str)) {
    iterator_type it = str.begin(), end = str.end();
    bool r = phrase_parse(it, end, calc, boost::spirit::ascii::space, result);
    client::calculator_astree_machine<value_type> vmachine;
    a = vmachine.execute(result);
    if (r && it == end) {
      std::cout << "succeeded:\t" << std::endl;
      std::cout << a << std::endl;
    }
    else {
      std::cout << "failed:\t" << std::string(it, end) << std::endl;
      rv = !0;
    }
  }
  return rv;
}

同様に、もし unsigned int 型の演算を行いたければ、main 関数の冒頭を typedef unsigned int value_type; に書き換えれば良い。

クラス astree が導入され、その boost::variant 型のメンバ変数 expr が AST を構成していることがわかるだろう。そしてその AST を calculator_astree_machine::execute によって再帰的に演算している。

icalc+variant.cc

さて、iexpr+variant.cc に変数のサポートと代入演算子を導入すると以下のようになるだろう。

#include <iostream>
#include <iomanip>
#include <string>
#include <boost/variant.hpp>
#include <vector>
#include <map>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace client {
  struct nulary_op {};
  struct unary_op;
  struct binary_op;
  struct ternary_op;

  struct astree {
    typedef boost::variant<
      nulary_op,
      boost::recursive_wrapper<unary_op>,
      boost::recursive_wrapper<binary_op>,
      boost::recursive_wrapper<ternary_op>,
      std::string,
      int
    > type;
    typedef type::types types;
    enum {
      nulary_op_type,
      unary_op_type,
      binary_op_type,
      ternary_op_type,
      string_type,
      value_type,
    };

    type expr;

    int which() const { return expr.which(); }

    astree()                            : expr(nulary_op()) {}
    astree(unary_op const &expr)        : expr(expr) {}
    astree(binary_op const &expr)       : expr(expr) {}
    astree(ternary_op const &expr)      : expr(expr) {}
    astree(unsigned int expr)           : expr(expr) {}
    astree(type const &expr)            : expr(expr) {}

    astree &ternary_conditional(astree const &a, astree const &b);

    astree &assignment_direct(astree const &rhs);
    astree &assignment_summation(astree const &rhs);
    astree &assignment_difference(astree const &rhs);
    astree &assignment_product(astree const &rhs);
    astree &assignment_quotient(astree const &rhs);
    astree &assignment_remainder(astree const &rhs);
    astree &assignment_bitshift_left(astree const &rhs);
    astree &assignment_bitshift_right(astree const &rhs);
    astree &assignment_bitwise_and(astree const &rhs);
    astree &assignment_bitwise_xor(astree const &rhs);
    astree &assignment_bitwise_or(astree const &rhs);

    astree &prefix_increment();
    astree &prefix_decrement();
    astree &postfix_increment();
    astree &postfix_decrement();

    astree &binary_comma(astree const &rhs);

    astree &binary_logical_or(astree const &rhs);
    astree &binary_logical_and(astree const &rhs);

    astree &binary_bitwise_or(astree const &rhs);
    astree &binary_bitwise_xor(astree const &rhs);
    astree &binary_bitwise_and(astree const &rhs);
    astree &binary_bitshift_left(astree const &rhs);
    astree &binary_bitshift_right(astree const &rhs);

    astree &binary_less(astree const &rhs);
    astree &binary_less_equal(astree const &rhs);
    astree &binary_greater(astree const &rhs);
    astree &binary_greater_equal(astree const &rhs);
    astree &binary_equal_to(astree const &rhs);
    astree &binary_not_equal_to(astree const &rhs);

    astree &binary_plus(astree const &rhs);
    astree &binary_minus(astree const &rhs);
    astree &binary_multiplies(astree const &rhs);
    astree &binary_divides(astree const &rhs);
    astree &binary_modulus(astree const &rhs);

    astree &unary_negate();
    astree &unary_logical_not();
    astree &unary_bitwise_not();
  };
} // namespace client

namespace boost {
  template <typename T>
  inline
  T get(client::astree const &expr)
  {
    return boost::get<T>(expr.expr);
  }
}

namespace client {
  enum byte_code {
    code_negate = 1,
    code_logical_not,
    code_bitwise_not,
    code_plus,
    code_minus,
    code_multiplies,
    code_divides,
    code_modulus,
    code_bitshift_left,
    code_bitshift_right,
    code_less,
    code_less_equal,
    code_greater,
    code_greater_equal,
    code_equal_to,
    code_not_equal_to,
    code_logical_or,
    code_logical_and,
    code_bitwise_or,
    code_bitwise_xor,
    code_bitwise_and,
    code_assignment_direct,
    code_assignment_summation,
    code_assignment_difference,
    code_assignment_product,
    code_assignment_quotient,
    code_assignment_remainder,
    code_assignment_bitshift_left,
    code_assignment_bitshift_right,
    code_assignment_bitwise_and,
    code_assignment_bitwise_xor,
    code_assignment_bitwise_or,
    code_prefix_increment,
    code_prefix_decrement,
    code_postfix_increment,
    code_postfix_decrement,
    code_comma,
    code_ternary_conditional,
    code_var,
    code_value,
  };

  struct ternary_op {
    ternary_op() {}
    ternary_op(int op,
               astree const &c,
               astree const &a,
               astree const &b) : op(op), c(c), a(a), b(b) {}
    int op;
    astree c, a, b;
  };
  struct binary_op {
    binary_op() {}
    binary_op(int op,
              astree const &lhs,
              astree const &rhs) : op(op), lhs(lhs), rhs(rhs) {}
    int op;
    astree lhs, rhs;
  };
  struct unary_op {
    unary_op() {}
    unary_op(int op,
             astree const &rhs) : op(op), rhs(rhs) {}
    int op;
    astree rhs;
  };

  inline
  astree &astree::ternary_conditional(astree const &a, astree const &b)
  { expr = ternary_op(code_ternary_conditional, expr, a, b); return *this; }

  inline
  astree &astree::assignment_direct(astree const &rhs)
  { expr = binary_op(code_assignment_direct,    expr, rhs); return *this; }
  inline
  astree &astree::assignment_summation(astree const &rhs)
  { expr = binary_op(code_assignment_summation, expr, rhs); return *this; }
  inline
  astree &astree::assignment_difference(astree const &rhs)
  { expr = binary_op(code_assignment_difference,        expr, rhs); return *this; }
  inline
  astree &astree::assignment_product(astree const &rhs)
  { expr = binary_op(code_assignment_product,   expr, rhs); return *this; }
  inline
  astree &astree::assignment_quotient(astree const &rhs)
  { expr = binary_op(code_assignment_quotient,  expr, rhs); return *this; }
  inline
  astree &astree::assignment_remainder(astree const &rhs)
  { expr = binary_op(code_assignment_remainder, expr, rhs); return *this; }
  inline
  astree &astree::assignment_bitshift_left(astree const &rhs)
  { expr = binary_op(code_assignment_bitshift_left,     expr, rhs); return *this; }
  inline
  astree &astree::assignment_bitshift_right(astree const &rhs)
  { expr = binary_op(code_assignment_bitshift_right,    expr, rhs); return *this; }
  inline
  astree &astree::assignment_bitwise_and(astree const &rhs)
  { expr = binary_op(code_assignment_bitwise_and,       expr, rhs); return *this; }
  inline
  astree &astree::assignment_bitwise_xor(astree const &rhs)
  { expr = binary_op(code_assignment_bitwise_xor,       expr, rhs); return *this; }
  inline
  astree &astree::assignment_bitwise_or(astree const &rhs)
  { expr = binary_op(code_assignment_bitwise_or,        expr, rhs); return *this; }

  inline
  astree &astree::prefix_increment()
  { expr = unary_op(code_prefix_increment,      expr); return *this; }
  inline
  astree &astree::prefix_decrement()
  { expr = unary_op(code_prefix_decrement,      expr); return *this; }
  inline
  astree &astree::postfix_increment()
  { expr = unary_op(code_postfix_increment,     expr); return *this; }
  inline
  astree &astree::postfix_decrement()
  { expr = unary_op(code_postfix_decrement,     expr); return *this; }

  inline
  astree &astree::binary_comma(astree const &rhs)
  { expr = binary_op(code_comma,                expr, rhs); return *this; }

  inline
  astree &astree::binary_logical_or(astree const &rhs)
  { expr = binary_op(code_logical_or,   expr, rhs); return *this; }
  inline
  astree &astree::binary_logical_and(astree const &rhs)
  { expr = binary_op(code_logical_and,  expr, rhs); return *this; }

  inline
  astree &astree::binary_bitwise_or(astree const &rhs)
  { expr = binary_op(code_bitwise_or,   expr, rhs); return *this; }
  inline
  astree &astree::binary_bitwise_xor(astree const &rhs)
  { expr = binary_op(code_bitwise_xor,  expr, rhs); return *this; }
  inline
  astree &astree::binary_bitwise_and(astree const &rhs)
  { expr = binary_op(code_bitwise_and,  expr, rhs); return *this; }
  inline
  astree &astree::binary_bitshift_left(astree const &rhs)
  { expr = binary_op(code_bitshift_left,        expr, rhs); return *this; }
  inline
  astree &astree::binary_bitshift_right(astree const &rhs)
  { expr = binary_op(code_bitshift_right,       expr, rhs); return *this; }

  inline
  astree &astree::binary_less(astree const &rhs)
  { expr = binary_op(code_less,         expr, rhs); return *this; }
  inline
  astree &astree::binary_less_equal(astree const &rhs)
  { expr = binary_op(code_less_equal,   expr, rhs); return *this; }
  inline
  astree &astree::binary_greater(astree const &rhs)
  { expr = binary_op(code_greater,      expr, rhs); return *this; }
  inline
  astree &astree::binary_greater_equal(astree const &rhs)
  { expr = binary_op(code_greater_equal,        expr, rhs); return *this; }
  inline
  astree &astree::binary_equal_to(astree const &rhs)
  { expr = binary_op(code_equal_to,     expr, rhs); return *this; }
  inline
  astree &astree::binary_not_equal_to(astree const &rhs)
  { expr = binary_op(code_not_equal_to, expr, rhs); return *this; }

  inline
  astree &astree::binary_plus(astree const &rhs)
  { expr = binary_op(code_plus,         expr, rhs); return *this; }
  inline
  astree &astree::binary_minus(astree const &rhs)
  { expr = binary_op(code_minus,        expr, rhs); return *this; }
  inline
  astree &astree::binary_multiplies(astree const &rhs)
  { expr = binary_op(code_multiplies,   expr, rhs); return *this; }
  inline
  astree &astree::binary_divides(astree const &rhs)
  { expr = binary_op(code_divides,      expr, rhs); return *this; }
  inline
  astree &astree::binary_modulus(astree const &rhs)
  { expr = binary_op(code_modulus,      expr, rhs); return *this; }

  inline
  astree &astree::unary_negate()
  { expr = unary_op(code_negate,        expr); return *this; }
  inline
  astree &astree::unary_logical_not()
  { expr = unary_op(code_logical_not,   expr); return *this; }
  inline
  astree &astree::unary_bitwise_not()
  { expr = unary_op(code_bitwise_not,   expr); return *this; }
}

namespace client {

  namespace qi = boost::spirit::qi;
  namespace ascii = boost::spirit::ascii;

  template <typename Iterator>
  struct calculator : qi::grammar<Iterator, astree(), ascii::space_type> {
    qi::rule<Iterator, astree(), ascii::space_type> vars, e[7], expr, t[9], term, fctr;
    calculator() : calculator::base_type(expr)
    {
      using qi::_val;
      using qi::_1;
      using qi::_2;
      using qi::hex;
      using qi::oct;
      using qi::uint_;
      using qi::alpha;
      using qi::alnum;
      using qi::lexeme;
      using qi::as_string;

      vars = as_string[lexeme[(alpha >> *(alnum | '_'))]];
      expr = e[0][_val = _1] >> *(','   >> e[0][_val = bind(&astree::binary_comma, _val, _1)]);
      e[0] = e[1] | e[2] | e[3] | e[4] | e[5] | e[6];
      e[1] = vars[_val = _1] >> ('='    >> e[0][_val = bind(&astree::assignment_direct, _val, _1)]);
      e[2] = vars[_val = _1] >> ("+="   >> e[0][_val = bind(&astree::assignment_summation, _val, _1)] |
                                 "-="   >> e[0][_val = bind(&astree::assignment_difference, _val, _1)]);
      e[3] = vars[_val = _1] >> ("*="   >> e[0][_val = bind(&astree::assignment_product, _val, _1)] |
                                 "/="   >> e[0][_val = bind(&astree::assignment_quotient, _val, _1)] |
                                 "%="   >> e[0][_val = bind(&astree::assignment_remainder, _val, _1)]);
      e[4] = vars[_val = _1] >> ("<<="  >> e[0][_val = bind(&astree::assignment_bitshift_left, _val, _1)] |
                                 ">>="  >> e[0][_val = bind(&astree::assignment_bitshift_right, _val, _1)]);
      e[5] = vars[_val = _1] >> ("&="   >> e[0][_val = bind(&astree::assignment_bitwise_and, _val, _1)] |
                                 "^="   >> e[0][_val = bind(&astree::assignment_bitwise_xor, _val, _1)] |
                                 "|="   >> e[0][_val = bind(&astree::assignment_bitwise_or, _val, _1)]);
      e[6] = t[0][_val = _1] >> *('?'   >> e[0] >>
                                  ':'   >> e[0])[_val = bind(&astree::ternary_conditional, _val, _1, _2)];
      t[0] = t[1][_val = _1] >> *("||"  >> t[1][_val = bind(&astree::binary_logical_or, _val, _1)]);
      t[1] = t[2][_val = _1] >> *("&&"  >> t[2][_val = bind(&astree::binary_logical_and, _val, _1)]);
      t[2] = t[3][_val = _1] >> *('|'   >> t[3][_val = bind(&astree::binary_bitwise_or, _val, _1)]);
      t[3] = t[4][_val = _1] >> *('^'   >> t[4][_val = bind(&astree::binary_bitwise_xor, _val, _1)]);
      t[4] = t[5][_val = _1] >> *('&'   >> t[5][_val = bind(&astree::binary_bitwise_and, _val, _1)]);
      t[5] = t[6][_val = _1] >> *("<<"  >> t[6][_val = bind(&astree::binary_bitshift_left, _val, _1)] |
                                  ">>"  >> t[6][_val = bind(&astree::binary_bitshift_right, _val, _1)]);
      t[6] = t[7][_val = _1] >> *('<'   >> t[7][_val = bind(&astree::binary_less, _val, _1)] |
                                  '>'   >> t[7][_val = bind(&astree::binary_greater, _val, _1)] |
                                  "<="  >> t[7][_val = bind(&astree::binary_less_equal, _val, _1)] |
                                  ">="  >> t[7][_val = bind(&astree::binary_greater_equal, _val, _1)]);
      t[7] = t[8][_val = _1] >> *("=="  >> t[8][_val = bind(&astree::binary_equal_to, _val, _1)] |
                                  "!="  >> t[8][_val = bind(&astree::binary_not_equal_to, _val, _1)]);
      t[8] = term[_val = _1] >> *('+'   >> term[_val = bind(&astree::binary_plus, _val, _1)] |
                                  '-'   >> term[_val = bind(&astree::binary_minus, _val, _1)]);
      term = fctr[_val = _1] >> *('*'   >> fctr[_val = bind(&astree::binary_multiplies, _val, _1)] |
                                  '/'   >> fctr[_val = bind(&astree::binary_divides, _val, _1)] |
                                  '%'   >> fctr[_val = bind(&astree::binary_modulus, _val, _1)]);
      fctr = "0x" >> hex[_val = _1] |
        '0' >> oct[_val = _1] |
        uint_[_val = _1] |
        "++" >> vars[_val = bind(&astree::prefix_increment, _1)] |
        "--" >> vars[_val = bind(&astree::prefix_decrement, _1)] |
        vars[_val = bind(&astree::postfix_increment, _1)] >> "++" |
        vars[_val = bind(&astree::postfix_decrement, _1)] >> "--" |
        vars[_val = _1] |
        '(' >> expr[_val = _1] >> ')' |
        '-' >> fctr[_val = bind(&astree::unary_negate, _1)] |
        '+' >> fctr[_val = _1] |
        '!' >> fctr[_val = bind(&astree::unary_logical_not, _1)] |
        '~' >> fctr[_val = bind(&astree::unary_bitwise_not, _1)];
    }
  };

  template <typename T>
  struct calculator_astree_machine {
    std::map<std::string, T> &vars;
    calculator_astree_machine(std::map<std::string, T> &vars) : vars(vars) {}
    T execute(const astree &ast)
    {
      if (ast.which() == astree::value_type ||
          ast.which() == astree::string_type ||
          false) {
        if (ast.which() == astree::string_type) {
          if (vars.find(boost::get<std::string>(ast)) == vars.end())
            vars[boost::get<std::string>(ast)] = 0;
          return vars[boost::get<std::string>(ast)];
        }
        return (T)boost::get<int>(ast); // !
      }
      T lhs, rhs, c, a, b;
      switch (ast.which()) {
      case astree::unary_op_type:
        rhs = execute(boost::get<unary_op>(ast).rhs);
        switch (boost::get<unary_op>(ast).op) {
        case code_negate:       return -rhs;
        case code_logical_not:  return !rhs;
        case code_bitwise_not:  return ~rhs;
        case code_prefix_increment:     return ++vars[boost::get<std::string>(boost::get<unary_op>(ast).rhs)];
        case code_prefix_decrement:     return --vars[boost::get<std::string>(boost::get<unary_op>(ast).rhs)];
        case code_postfix_increment:    return vars[boost::get<std::string>(boost::get<unary_op>(ast).rhs)]++;
        case code_postfix_decrement:    return vars[boost::get<std::string>(boost::get<unary_op>(ast).rhs)]--;
        }
        break;
      case astree::binary_op_type:
        lhs = execute(boost::get<binary_op>(ast).lhs);
        rhs = execute(boost::get<binary_op>(ast).rhs);
        switch (boost::get<binary_op>(ast).op) {
        case code_plus:         return lhs + rhs;
        case code_minus:        return lhs - rhs;
        case code_multiplies:   return lhs * rhs;
        case code_divides:      return lhs / rhs;
        case code_modulus:      return lhs % rhs;
        case code_bitshift_left:        return lhs << rhs;
        case code_bitshift_right:       return lhs >> rhs;
        case code_less:         return lhs < rhs;
        case code_less_equal:   return lhs <= rhs;
        case code_greater:      return lhs > rhs;
        case code_greater_equal:        return lhs >= rhs;
        case code_equal_to:     return lhs == rhs;
        case code_not_equal_to: return lhs != rhs;
        case code_logical_or:   return lhs || rhs;
        case code_logical_and:  return lhs && rhs;
        case code_bitwise_or:   return lhs | rhs;
        case code_bitwise_xor:  return lhs ^ rhs;
        case code_bitwise_and:  return lhs & rhs;
        case code_assignment_direct:    return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] = rhs);
        case code_assignment_summation: return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] += rhs);
        case code_assignment_difference:        return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] -= rhs);
        case code_assignment_product:   return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] *= rhs);
        case code_assignment_quotient:  return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] /= rhs);
        case code_assignment_remainder: return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] %= rhs);
        case code_assignment_bitshift_left:     return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] <<= rhs);
        case code_assignment_bitshift_right:    return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] >>= rhs);
        case code_assignment_bitwise_and:       return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] &= rhs);
        case code_assignment_bitwise_xor:       return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] ^= rhs);
        case code_assignment_bitwise_or:        return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] |= rhs);
        case code_comma:                return lhs, rhs;
        }
        break;
      case astree::ternary_op_type:
        c = execute(boost::get<ternary_op>(ast).c);
        a = execute(boost::get<ternary_op>(ast).a);
        b = execute(boost::get<ternary_op>(ast).b);
        switch (boost::get<ternary_op>(ast).op) {
        case code_ternary_conditional:  return c ? a : b;
        }
        break;
      }
      return T(0);
    }
  };
} // namespace client

int main(int argc, char *argv[])
{
  int rv = 0;

  typedef std::string::const_iterator iterator_type;
  typedef int value_type;
  client::calculator<iterator_type> calc;
  std::map<std::string, value_type> vars;
  std::string str;
  client::astree result;
  value_type a;
  while (std::getline(std::cin, str)) {
    iterator_type it = str.begin(), end = str.end();
    bool r = phrase_parse(it, end, calc, boost::spirit::ascii::space, result);
    client::calculator_astree_machine<value_type> vmachine(vars);
    a = vmachine.execute(result);
    if (r && it == end) {
      std::cout << "succeeded:\t" << std::endl;
      std::cout << a << std::endl;
    }
    else {
      std::cout << "failed:\t" << std::string(it, end) << std::endl;
      rv = !0;
    }
  }
  return rv;
}

同様に、もし unsigned int 型の演算を行いたければ、main 関数の冒頭を typedef unsigned int value_type; に書き換えれば良い。

ちなみに、この実装では代入時に未使用の変数を新たに使用することができ、変数の参照時に未使用の変数を使用すると零が代入されているものとする。

diff -u iexpr+variant.cc icalc+variant.cc

上述の二つのプログラムリストの掲示だけでは、どのように変数のサポートと代入演算子を導入したかが分かり辛いので、これらの Unified diff 形式をみてみよう。

--- iexpr+variant.cc
+++ icalc+variant.cc
@@ -3,6 +3,7 @@
 #include <string>
 #include <boost/variant.hpp>
 #include <vector>
+#include <map>
 #include <boost/spirit/include/qi.hpp>
 #include <boost/spirit/include/phoenix.hpp>
 
@@ -18,6 +19,7 @@
       boost::recursive_wrapper<unary_op>,
       boost::recursive_wrapper<binary_op>,
       boost::recursive_wrapper<ternary_op>,
+      std::string,
       int
     > type;
     typedef type::types types;
@@ -26,6 +28,7 @@
       unary_op_type,
       binary_op_type,
       ternary_op_type,
+      string_type,
       value_type,
     };
 
@@ -42,6 +45,25 @@
 
     astree &ternary_conditional(astree const &a, astree const &b);
 
+    astree &assignment_direct(astree const &rhs);
+    astree &assignment_summation(astree const &rhs);
+    astree &assignment_difference(astree const &rhs);
+    astree &assignment_product(astree const &rhs);
+    astree &assignment_quotient(astree const &rhs);
+    astree &assignment_remainder(astree const &rhs);
+    astree &assignment_bitshift_left(astree const &rhs);
+    astree &assignment_bitshift_right(astree const &rhs);
+    astree &assignment_bitwise_and(astree const &rhs);
+    astree &assignment_bitwise_xor(astree const &rhs);
+    astree &assignment_bitwise_or(astree const &rhs);
+
+    astree &prefix_increment();
+    astree &prefix_decrement();
+    astree &postfix_increment();
+    astree &postfix_decrement();
+
+    astree &binary_comma(astree const &rhs);
+
     astree &binary_logical_or(astree const &rhs);
     astree &binary_logical_and(astree const &rhs);
 
@@ -102,7 +124,24 @@
     code_bitwise_or,
     code_bitwise_xor,
     code_bitwise_and,
+    code_assignment_direct,
+    code_assignment_summation,
+    code_assignment_difference,
+    code_assignment_product,
+    code_assignment_quotient,
+    code_assignment_remainder,
+    code_assignment_bitshift_left,
+    code_assignment_bitshift_right,
+    code_assignment_bitwise_and,
+    code_assignment_bitwise_xor,
+    code_assignment_bitwise_or,
+    code_prefix_increment,
+    code_prefix_decrement,
+    code_postfix_increment,
+    code_postfix_decrement,
+    code_comma,
     code_ternary_conditional,
+    code_var,
     code_value,
   };
 
@@ -136,6 +175,57 @@
   { expr = ternary_op(code_ternary_conditional, expr, a, b); return *this; }
 
+  inline
+  astree &astree::assignment_direct(astree const &rhs)
+  { expr = binary_op(code_assignment_direct,    expr, rhs); return *this; }
+  inline
+  astree &astree::assignment_summation(astree const &rhs)
+  { expr = binary_op(code_assignment_summation, expr, rhs); return *this; }
+  inline
+  astree &astree::assignment_difference(astree const &rhs)
+  { expr = binary_op(code_assignment_difference,        expr, rhs); return *this; }
+  inline
+  astree &astree::assignment_product(astree const &rhs)
+  { expr = binary_op(code_assignment_product,   expr, rhs); return *this; }
+  inline
+  astree &astree::assignment_quotient(astree const &rhs)
+  { expr = binary_op(code_assignment_quotient,  expr, rhs); return *this; }
+  inline
+  astree &astree::assignment_remainder(astree const &rhs)
+  { expr = binary_op(code_assignment_remainder, expr, rhs); return *this; }
+  inline
+  astree &astree::assignment_bitshift_left(astree const &rhs)
+  { expr = binary_op(code_assignment_bitshift_left,     expr, rhs); return *this; }
+  inline
+  astree &astree::assignment_bitshift_right(astree const &rhs)
+  { expr = binary_op(code_assignment_bitshift_right,    expr, rhs); return *this; }
+  inline
+  astree &astree::assignment_bitwise_and(astree const &rhs)
+  { expr = binary_op(code_assignment_bitwise_and,       expr, rhs); return *this; }
+  inline
+  astree &astree::assignment_bitwise_xor(astree const &rhs)
+  { expr = binary_op(code_assignment_bitwise_xor,       expr, rhs); return *this; }
+  inline
+  astree &astree::assignment_bitwise_or(astree const &rhs)
+  { expr = binary_op(code_assignment_bitwise_or,        expr, rhs); return *this; }
+
+  inline
+  astree &astree::prefix_increment()
+  { expr = unary_op(code_prefix_increment,      expr); return *this; }
+  inline
+  astree &astree::prefix_decrement()
+  { expr = unary_op(code_prefix_decrement,      expr); return *this; }
+  inline
+  astree &astree::postfix_increment()
+  { expr = unary_op(code_postfix_increment,     expr); return *this; }
+  inline
+  astree &astree::postfix_decrement()
+  { expr = unary_op(code_postfix_decrement,     expr); return *this; }
+
+  inline
+  astree &astree::binary_comma(astree const &rhs)
+  { expr = binary_op(code_comma,                expr, rhs); return *this; }
+
   inline
   astree &astree::binary_logical_or(astree const &rhs)
   { expr = binary_op(code_logical_or,   expr, rhs); return *this; }
   inline
@@ -211,7 +301,7 @@
 
   template <typename Iterator>
   struct calculator : qi::grammar<Iterator, astree(), ascii::space_type> {
-    qi::rule<Iterator, astree(), ascii::space_type> expr, t[9], term, fctr;
+    qi::rule<Iterator, astree(), ascii::space_type> vars, e[7], expr, t[9], term, fctr;
     calculator() : calculator::base_type(expr)
     {
       using qi::_val;
@@ -220,9 +310,27 @@
       using qi::hex;
       using qi::oct;
       using qi::uint_;
-
-      expr = t[0][_val = _1] >> *('?' >> expr >>
-                                  ':' >> expr)[_val = bind(&astree::ternary_conditional, _val, _1, _2)];
+      using qi::alpha;
+      using qi::alnum;
+      using qi::lexeme;
+      using qi::as_string;
+
+      vars = as_string[lexeme[(alpha >> *(alnum | '_'))]];
+      expr = e[0][_val = _1] >> *(','   >> e[0][_val = bind(&astree::binary_comma, _val, _1)]);
+      e[0] = e[1] | e[2] | e[3] | e[4] | e[5] | e[6];
+      e[1] = vars[_val = _1] >> ('='    >> e[0][_val = bind(&astree::assignment_direct, _val, _1)]);
+      e[2] = vars[_val = _1] >> ("+="   >> e[0][_val = bind(&astree::assignment_summation, _val, _1)] |
+                                 "-="   >> e[0][_val = bind(&astree::assignment_difference, _val, _1)]);
+      e[3] = vars[_val = _1] >> ("*="   >> e[0][_val = bind(&astree::assignment_product, _val, _1)] |
+                                 "/="   >> e[0][_val = bind(&astree::assignment_quotient, _val, _1)] |
+                                 "%="   >> e[0][_val = bind(&astree::assignment_remainder, _val, _1)]);
+      e[4] = vars[_val = _1] >> ("<<="  >> e[0][_val = bind(&astree::assignment_bitshift_left, _val, _1)] |
+                                 ">>="  >> e[0][_val = bind(&astree::assignment_bitshift_right, _val, _1)]);
+      e[5] = vars[_val = _1] >> ("&="   >> e[0][_val = bind(&astree::assignment_bitwise_and, _val, _1)] |
+                                 "^="   >> e[0][_val = bind(&astree::assignment_bitwise_xor, _val, _1)] |
+                                 "|="   >> e[0][_val = bind(&astree::assignment_bitwise_or, _val, _1)]);
+      e[6] = t[0][_val = _1] >> *('?'   >> e[0] >>
+                                  ':'   >> e[0])[_val = bind(&astree::ternary_conditional, _val, _1, _2)];
       t[0] = t[1][_val = _1] >> *("||"  >> t[1][_val = bind(&astree::binary_logical_or, _val, _1)]);
       t[1] = t[2][_val = _1] >> *("&&"  >> t[2][_val = bind(&astree::binary_logical_and, _val, _1)]);
       t[2] = t[3][_val = _1] >> *('|'   >> t[3][_val = bind(&astree::binary_bitwise_or, _val, _1)]);
@@ -244,6 +352,11 @@
       fctr = "0x" >> hex[_val = _1] |
         '0' >> oct[_val = _1] |
         uint_[_val = _1] |
+        "++" >> vars[_val = bind(&astree::prefix_increment, _1)] |
+        "--" >> vars[_val = bind(&astree::prefix_decrement, _1)] |
+        vars[_val = bind(&astree::postfix_increment, _1)] >> "++" |
+        vars[_val = bind(&astree::postfix_decrement, _1)] >> "--" |
+        vars[_val = _1] |
         '(' >> expr[_val = _1] >> ')' |
         '-' >> fctr[_val = bind(&astree::unary_negate, _1)] |
         '+' >> fctr[_val = _1] |
@@ -254,11 +367,18 @@
 
   template <typename T>
   struct calculator_astree_machine {
-    calculator_astree_machine() {}
+    std::map<std::string, T> &vars;
+    calculator_astree_machine(std::map<std::string, T> &vars) : vars(vars) {}
     T execute(const astree &ast)
     {
       if (ast.which() == astree::value_type ||
+          ast.which() == astree::string_type ||
           false) {
+        if (ast.which() == astree::string_type) {
+          if (vars.find(boost::get<std::string>(ast)) == vars.end())
+            vars[boost::get<std::string>(ast)] = 0;
+          return vars[boost::get<std::string>(ast)];
+        }
         return (T)boost::get<int>(ast); // !
       }
       T lhs, rhs, c, a, b;
@@ -269,6 +389,10 @@
         case code_negate:       return -rhs;
         case code_logical_not:  return !rhs;
         case code_bitwise_not:  return ~rhs;
+        case code_prefix_increment:     return ++vars[boost::get<std::string>(boost::get<unary_op>(ast).rhs)];
+        case code_prefix_decrement:     return --vars[boost::get<std::string>(boost::get<unary_op>(ast).rhs)];
+        case code_postfix_increment:    return vars[boost::get<std::string>(boost::get<unary_op>(ast).rhs)]++;
+        case code_postfix_decrement:    return vars[boost::get<std::string>(boost::get<unary_op>(ast).rhs)]--;
         }
         break;
       case astree::binary_op_type:
@@ -293,6 +417,18 @@
         case code_bitwise_or:   return lhs | rhs;
         case code_bitwise_xor:  return lhs ^ rhs;
         case code_bitwise_and:  return lhs & rhs;
+        case code_assignment_direct:    return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] = rhs);
+        case code_assignment_summation: return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] += rhs);
+        case code_assignment_difference:        return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] -= rhs);
+        case code_assignment_product:   return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] *= rhs);
+        case code_assignment_quotient:  return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] /= rhs);
+        case code_assignment_remainder: return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] %= rhs);
+        case code_assignment_bitshift_left:     return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] <<= rhs);
+        case code_assignment_bitshift_right:    return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] >>= rhs);
+        case code_assignment_bitwise_and:       return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] &= rhs);
+        case code_assignment_bitwise_xor:       return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] ^= rhs);
+        case code_assignment_bitwise_or:        return (vars[boost::get<std::string>(boost::get<binary_op>(ast).lhs)] |= rhs);
+        case code_comma:                return lhs, rhs;
         }
         break;
       case astree::ternary_op_type:
@@ -316,13 +452,14 @@
   typedef std::string::const_iterator iterator_type;
   typedef int value_type;
   client::calculator<iterator_type> calc;
+  std::map<std::string, value_type> vars;
   std::string str;
   client::astree result;
   value_type a;
   while (std::getline(std::cin, str)) {
     iterator_type it = str.begin(), end = str.end();
     bool r = phrase_parse(it, end, calc, boost::spirit::ascii::space, result);
-    client::calculator_astree_machine<value_type> vmachine;
+    client::calculator_astree_machine<value_type> vmachine(vars);
     a = vmachine.execute(result);
     if (r && it == end) {
       std::cout << "succeeded:\t" << std::endl;

変数のサポートのために、std::map<std::string, value_type> vars が導入され、クラス astreeboost::variant 型のメンバ変数 exprstd::string が追加されていることがわかるだろう。