Abi-Gag im PG
Heute, am ersten Schultag (*seufz*) nach den Pfingstferien, war also
Abi-Gag im benachbarten Pestalozzi-Gymnasium—kein wohlgehütetes
Geheimnis, ich wusste das schon vor den Ferien.
Sie hatten ein generisches Thema, “Märchenwald”, aber recht nette
Kulissen, einen Dennete-Stand, eine Hüpfburg und eine
Wasserlandschaft, in der die 5. und 6. Klässler ihre
Wet-Shirt-Contests durchführen konnten, zumindest sah es danach aus.
Livemusik gabs auch, der wir durch schlechte Dämmung des
WG-Computerraums teilhaben durften. Sollten meine Schreikrämpfe bei
Verunglimpfung und starker akustischer Umweltverschmutzung als sie
anfingen “Basket Case” (war schlimm), “Teenage Dirtbag” (noch
schlimmer) und last but not least “Highway to Hell” (ich hätte nicht
gedacht, dass es schlimmer kommt als von Corinna May) zu spielen,
jemanden
im Matheunterricht gestört haben, bitte ich um Entschuldigung.
Erstaunlicherweise klang die Band dann danach besser, als wir direkt
vor den Boxen saßen… komisch.
Ich bin dann ja mal gespannt, wann unser Abi-Gag stattfinden wird.
Teile der Kulisse hängen schon zusammengerollt an der Fassade, es wird
wohl diese Woche sein.
Lieber saurer Joster als süße Siebzehn.
Katholisator
[über die Schützenabzeichen:] Merkt man wenigstens nicht, wenn man
versehentlich draufpisst…
NP: Adam Green—Down On The Street
Pulling from Streams
It is interesting that in the “real world”, most things go easier into
one direction than in the other. For example, it is easier to break a
coffee cup than to glue it together again. I think this has something
to do with entropy.
Computer programs are not unaffected by this fact, take for example
the two styles of on-the-fly XML
parsing, Push vs. Pull.
Translating a Pull API into a Pull API is done trivially, such code
usually looks like that:
while parser.pull
case parser.state
when :open
receiver.open parser.text
when :text
receiver.text parser.text
when :close
receiver.close
end
end
Doing the reverse—converting a Push API into a Pull one—looks
impossible at first, but that is not true. It can be done (albeit
only in limited ways in Ruby, see below) using our beloved
continuations.
I have written a small class, Stream2Pull that does this crazy task:
class Stream2Pull
attr_reader :data
def initialize(&block)
@data = nil
@continuation = lambda { instance_eval &block }
end
def method_missing(*arguments)
@data = arguments
@again = true
switch
end
def pull
@again = false
switch
@again
end
def switch
save = @continuation
callcc { |@continuation| save.call }
save = nil
GC.start
end
end
Stream2Pull takes a block it will call on itself on the first
execution of pull. This code is expected to send undefined messages
to the instance. Then the “magic” happens: With the call of switch,
the context of the program changes from method_missing to the call of
pull, which then returns. Now the caller can examine the undefined
message—the event—by looking at data. Let’s wrap REXML’s
StreamParser that way:
s_to_p = Stream2Pull.new {
REXML::Parsers::StreamParser.new(<<XML, self).parse
<?xml version="1.0"?>
<info>abc<em>hehe</em></info>
}
Now, we can use our new object just like a pull parser:
while s_to_p.pull
p s_to_p.data
end
We just repeatedly pull from the parser, thereby always switching
contexts between the inner loop of the StreamParser and ours.
These are the events that are successfully emitted:
[:xmldecl, "1.0", nil, nil]
[:tag_start, "info", {}]
[:text, "abc"]
[:tag_start, "em", {}]
[:text, "hehe"]
[:tag_end, "em"]
[:tag_end, "info"]
[:text, "\n"]
We just have converted a Push API into a Pull API!
Now, we can talk about the dark side of the medal. Ruby’s
continuations are very slow and they eat loads of memory. When I
tried parsing a 1.5 megabyte file—the largest one I could find on my
disk, it is actually the XSL specification—I didn’t yet have the
GC.start inside switch. In about 2 minutes, my Ruby gobbled 400
megabytes of memory! After enforcing garbage-collection, it took less
memory but memory usage was still raising constantly. After 15
minutes and 100 megabytes used I gave up.
These are the things I really want to see in (I hope not too far away)
future Ruby 2 implementations: Efficient tail calls, continuation
passing style, and possibly support for partial
continuations. If you
implement CPS, you can get the others almost for free. Those features
would make above code run at about the speed of a “native” pull
parser. Alternatively, please port the Ruby standard library to
Scheme. }}:-) Just kidding.
If you just want to parse XML, it’s all not too bad, actually. This
is because REXML’s StreamParser *is* actually a wrapper around a
PullParser (REXML::BaseParser)! (Of course, I knew that in advance.)
Therefore, we can easily write:
bp = REXML::Parsers::BaseParser.new(ABOVE_XML)
while bp.has_next?
p bp.pull
end
And it will generate exactly the same output as above. :-) The main
difference is, well, it uses constant memory and only needs 15 seconds
to parse the big XML document. And that’s not too bad for a pure Ruby
implementation.
NP: Bob Marley—Three Little Birds
Parsing Vooly in C
Yesterday I started writing a Vooly
parser in C, mainly for reasons of speed and to provide bindings for
lots of other languages without needing to reimplement the exact (yet
unfinished) specification.
Before I did that, I extended Vooly with a new mechanism for escaping.
Until now, it was not possible to write << and >> as text in
Vooly, because those sequences were delimiters. With the new
variable length terminators, you can easily do that by just using
more angle brackets, for example:
<<note
To bitshift in C, use code like:
<<<code foo = bar >> 2 >>>
or, to shift left:
<<<code foo = bar << 2 >>>.
Pretty easy, no?
>>
Since the code blocks start with <<<, you need >>> to close them
again (and at least <<< to open a new tag inside).
The “fun” thing about this addition was that it totally crushed by old
Ruby parser that was based on StringScanner, and about 30 lines long.
It turns out that the syntax is not regular anymore, and therefore is
hard to parse using ordinary regular expressions.
All my Vooly parsers are pull parsers because these are comparatively
easy to implement (push can be easier, sometimes), but also pleasant to
use and quick.
Now, this is the way I wrote the C implementation: First, I rewrote
the Ruby parser until it could run the old test-suite I had for the
previous parser. Then, I added tests for variable length terminators
and… I created a monster. I actually wrote a parser that would
dynamically generate and compile regular expressions, keep them on a
stack, and match them against the input string.
Seriously, I never wrote such a weird parser. Look at some snippets:
TEXT_REGEXP = Hash.new { |hash, key|
hash[key] = /.+?(?=<{#{key.size}}|(>{#{key.size}}))/m
}
...
elsif @scanner.scan(@openrx.last)
@textrx << TEXT_REGEXP[@scanner[1]]
...
elsif @scanner.scan(@closerx.last)
...
@textrx.pop
elsif @scanner.scan(@textrx.last)
...
@text = @scanner.matched
...
It was among the most craziest code ever, but it told me my idea
worked as I wanted. It was about 60 lines of Ruby. Still, if I were
to implement it like that in C, I’d probably have gone crazy. You
maybe should know that I didn’t do (serious) C in a long time, and
I’m totally spoiled by Ruby’s string handling, garbage-collection etc.
Another, even more serious problem was that my nice parser only read
from Strings (due to StringScanner), but I also wanted to read from
streams (IO). Therefore, rewrite! Just too good I have a fairly
complete test suite…
I reimplemented the parser in Ruby rather quickly; according to
SLOCCount, it is whole 108 lines of Ruby as now, fully featured. I
tried hard to use not very Ruby-specific features; the code doesn’t
use regular expressions for parsing at all, it’s just a simple state
machine (oh I love those :-)) with five states. Also, I don’t use a
lot of OO features, only instance variables. The rest is written in a
plain old procedural style. While it surely isn’t the most elegant
Ruby I ever wrote, this is extremely helpful for the next step.
Let’s see what features and methods of Ruby we use:
- String, Array, Integer, Symbols, true, false, nil
- One user-defined class with instance variables
- String#strip!, String#[], String#empty?, String#<<, one trivial
substitution (
@text.sub! /\s$/, '')
- IO#getc, IO#eof?, IO#ungetc
- Array#last, Array#pop, Array#size, Array#<<
That was it. And, believe me, that part of Ruby was very easy to
write in C. I basically could go over the source and translate it
line by line. Symbols would get enums, my class would get a
struct, the instance variables its fields. String#strip! can be
written in 5 lines of code, String#<< needs a bit help because of the
memory management, but it’s not too bad. The IO methods map directly
to the C standard library, and the Array stuff can be written using a
statically allocated array and a “head pointer”. Just as you’d usually
implement stacks in C.
I think I wrote the guts of the C parser in about one and a half hour.
I ported some test cases over, and they ran fine. I estimate the
total time to port the second Ruby version to C to about five hours,
including documenting the API. The C version is, as of now, 275 lines
of code according to David A. Wheeler’s SLOCCount. Not very much, in the
end.
If I were to write the C parser directly, I fear I would have lost
track early, have no chance to test it easily, and if my idea wouldn’t
have worked, it would have wasted a lot of time. Clearly, I wrote
lots of “superfluous” code during the port, but it helped me getting a
good understanding of the code and its working.
Using a more flexible language just allows for much shorter turnaround
times and makes development a lot more fun. I do rather see 30
“undefined method foo” exceptions than three “segmentation faults” per
hour. Debugging Ruby is just a lot easier. I debugged the Ruby
versions using p and testcases only, but I would have wasted a lot
of time if I didn’t use gdb for checking some things in the C
version. Lucky me, I didn’t need to open gdb at lot.
All in all, I wrote a piece of non-trivial C code for a new idea by
- Implementing/extending a test suite in Ruby.
- Implementing a new parser using complex Ruby without limitations.
- Testing the implementation and extending the testcases for the new
feature.
- Implementing the new feature.
- Rewriting the parser in “low-level” Ruby until it passes the test suite.
- Porting the low-level parser to C.
- Testing and documenting the C parser.
Every step is clearly defined, and the list “just” can be followed
sequentially. Should any unexpected things happen, feel free to
backtrack until you get everything working.
I am very confident with this way of developing efficient and short C
code, and will likely do it that way in the future again, should
(there will…) be need for it.
NP: Dan Bern—Mars
Off to Büsum
Alas, tonight we’re driving to Büsum
for holiday. I expect to be back next Saturday evening. As usual, I
don’t have net access in that time, at least I think so. This means
no blogging and no mail.
If you want to see the house we’ll be residing
in, go ahead.
And please excuse light blogging this week, nothing relevant happened
and I got sidetracked by some things. I expect that to get better
when I’m back again. Who knows what I’ll code on this trip. :-)
Oh my, how will I survive that 10 hour drive?
NP: Pitty Sing—Radio
Lazy Streams for Ruby
On #ruby-lang, we talked about
SICP
yesterday, and that reminded me I didn’t look into this wonderful book
for a long time. I had enough time today to open it…
Quickly skipping over the first two chapters that I remembered fairly
well (and that are “easy”), I started reading chapter 3 Modularity,
Objects, and
State.
The section on
Streams
finally made me switch to my always-open Emacs window and port that
beautiful Scheme into even better Ruby. :-)
In short, streams can be explained like this:
From an abstract point of view, a stream is simply a
sequence. However, we will find that the straightforward
implementation of streams as lists doesn’t fully reveal the power of
stream processing. As an alternative, we introduce the technique of
delayed evaluation, which enables us to represent very large (even
infinite) sequences as streams.
Stream processing lets us model systems that have state without ever
using assignment or mutable data.
What a great target for an functional installment in Ruby! Should you
not be any familiar with streams, I recommend you to read above
chapter now, and come back after. Or just read on, if you want to
bite into the code.
To make use of Ruby’s object-orientation, I decided to make each
string-cons instances of class Stream. A Stream contains two
objects, car and cdr, whereas cdr is evaluated lazily. In Ruby, we
do this using a block:
class Stream
def initialize(car, &cdr)
@car = car
@cdr = cdr
end
def car; @car; end
def cdr; @cdr.call; end
end
Now, we can create a simple cons:
cons = Stream.new(1) { 2 }
[cons.car, cons.cdr] # => [1, 2]
That way of creating new conses is not very beautiful, but everything
we get in Ruby (no macros…).
We will also need a few helper methods:
class Stream
def self.interval(low, high)
if low > high
nil
else
new(low) { interval low+1, high }
end
end
def filter(&pred)
if pred.call car
Stream.new(car) { cdr.filter &pred }
else
cdr.filter &pred
end
end
def take(n)
stream = self
results = []
n.times {
results << stream.car
stream = stream.cdr
}
results
end
end
Stream.interval(1, 100) will return a new stream that will evaluate
in the numbers of 1 to 100.
Stream#filter returns a new stream that only contains elements that
satisfy a given predicate. Lazily of course.
Stream#take finally will evaluate the string n times and return an
Array of the evaluated values.
Now, let’s do an example from SICP,
calculate the third prime between 10000 and 1000000.
If this was evaluated eagerly, you still
wouldn’t see the result. Instead, we simply run:
def prime?(n)
2.upto(Math.sqrt(n).round) { |i| return false if n % i == 0 }
true
end
Stream.interval(10000, 1000000).filter { |s| prime? s }.take(3).last
# => 10037
The answer comes instantly. Why is this? In the end, the stream
behaves just a loop: We loop over all the integers in the range, and,
basically, trackback if the numbers aren’t primes. Now, we do that
only three times, because that often the stream will be evaluated.
Therefore, we only test 37 numbers for being prime, not 999000.
Another nice example are fibonacci numbers, where each element—I
hope you knew that already—is the sum of its successors. But first,
we need to add two streams:
class Stream
def self.fibonacci
new(0) { new(1) { fibonacci.cdr + fibonacci } }
end
def +(other)
Stream.new(self.car + other.car) { self.cdr + other.cdr }
end
end
A quick test, it works:
Stream.fibonacci.take(10)
# => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
Now, by the way, we also can calculate the
golden mean, which is the can be
approximated as the quotient of two successive fibonacci numbers.
n = Stream.fibonacci.take(20).last(2)
n[1] / n[0].to_f # => 1.61803405572755
(Math.sqrt(5.0) + 1) / 2.0 # => 1.61803398874989
Unfortunately, lazyness with lambdas is not exactly a wonder of speed
in Ruby, therefore Streams are, beyond their very interesting
properties, not of much use for the usual Ruby programmer. In purely
functional languages (e.g. Haskell), where lazy evaluation is a common
thing, they are however a very common way of implementing state
without actually changing things.
NP: Jetscreamer—Front Porch
Pfingstparty
So, Fabses
Pfingstparty
war ja ein voller Erfolg. Ich lasse einfach mal Bilder sprechen:






Der Rest (und trotzdem nicht alle, hähä) sind bei Fabse
online.
Andi gibt auch noch seinen Senf
dazu.
Update: Meine Güte, jetzt hab ich doch tatsächlich die beiden
Knüllersprüche des Abends vergessen:
Der Tag hat keine Zukunft
[Um 21:51…]
Das Leben ist kein Ponyhof!
NP: Apollo Sunshine—I Was On The Moon
Introducing Rye
After trying almost every web framework for Ruby out there, I still
could not find one I really liked (Wee was close, but has non-restful
URLs), so I simply decided to grow my own (hey, you know me ;-)),
called Rye.
Rye is a framework of the different kind. I don’t have a lot of code
yet, and lacks a lot of functionality, but I could extract the basic
idea already and will try to explain it here.
Rye provides an RESTful interface to stored objects. So far, my
objects are simply stored as YAML files, but that will not scale for
bigger amounts of data. Rye maps these objects onto URLs like this:
http://<server>/<object-path...>/<method>/<arguments...>?<query>
So far, Rye can handle the two most commonly used (and supported,
sigh) HTTP methods, GET and POST. In the case of GET, Rye will
retrieve/load the object, call the method on it (possibly with the
arguments and the parsed query) and, depending on the return value
of the method, render the generated page or call show, the default
renderer for the object. POST is treated similary, except that the
object is saved to the store again before rendering the site.
Therefore, GET can’t easily change any values and is idempotent, as
required by REST. (Of course, you still can—and are free to—shoot
yourself in the foot by manipulating the storage on your own, but it
will really be your fault.)
The nice benefit of this is that there is not much to do to use Rye:
Usually (so far, at least) you simply define a few YAML specific
methods to store only the required instance variables of the object
and make your class inherit from Rye::Base, which defines a few
helper methods and attributes (maybe this should get a mix-in).
As an example, I developed a very simple to-do list manager:
class ToDoList < Rye::Base
attr_accessor :title
attr_accessor :todo
attr_accessor :done
def initialize(title)
super()
@title = title
@todo, @done = [], []
end
def to_yaml_properties; ["@title", "@todo", "@done"]; end
def show; template; end
def add(params); @todo << params["item"]; self; end
def finish(item)
@done << item if @todo.delete item
self
end
def undo(item)
@todo << item if @done.delete item
self
end
end
As you can see, this is straight-forward Ruby code without any Web
specific stuff—except maybe the show method that calls template,
a helper defined in Rye::Base that renders the object using a
class-specific Kashmir template (of course, you can easily interface
with your favourite templating engine).
If you wonder why finish and undo return self, well, that means
that Rye will redirect to show them, a useful and logic convention.
finish and undo do change data, and therefore need to be called
using POST. As of now, you would call finish("blubb") by writing HTML
like this:
<form action="/myapp.rye/finish/blubb" method="post">
<input type="submit" value="Finish blubb!" />
</form>
If you decide to pass additional values (for example, to call add
using a form) you can do that too, as usual:
<form action="/myapp.rye/add" method="post">
<input type="text" name="item" />
<input type="submit" value="Add!" />
</form>
Since the code essentially only contains the real application logic,
it is very easy to unit test. In fact, you test it just like any other
class. (Nevertheless, I may add a helper library for unit testing
which I dub “ergot” :-))
So far, Rye is only about 150 lines of code, but already covers a lot
of the basic functionality. I’ll need to add a generic store, so one
can use databases (for example with Og) too and isn’t forced to use
YAML. Maybe I get a first release done this week (or possibly not,
who knows).
I’d be interested in hearing your comments on Rye, just comment here
or mail me.
NP: 33hz—Digital Lover
Wie man eine Shisha präpariert...
…in einfachen bebilderten Schritten:
Unsere Wasserpfeife samt Utensilien.
Trichter aufsetzen und mit Tabak befüllen.
Trichter mit Alufolie bespannen.
Alufolie perforieren.
Metallsieb einsetzen.
Holzkohle reinlegen.
Metallverkleidung aufsetzen.
Fast fertig…
… Schläuche nicht vergessen!
NP: Bob Marley—No Woman No Cry
Kühltieftruhe, Teebauern und sinnlose Lebern
Soo, mal wieder ‘ne Ladung Quotes in die Runde geworfen:
Warum eins kaufen, wenn man für’s doppelte Geld zwei bekommt?
Was hat Jesus vor seinem Tod getrunken?
— Kreuzkümmel.
[Der ist richtig schlecht. :-P]
You drive me nuting futs!
Kühltieftruhe
Meine Leber hatte keinen Sinn mehr…
Das hab ich jetzt nicht so gemeint, aber ernsthaft.
We are teh pawnz.
— Juhuu, wir sind Teebauern!
[zum ev. Religionslehrer:] Gibt es auch noch andere heilige
Gegenstände?
— Heilige Gegenstände?!?
— Unsere
Englischlehrerin hat gesagt, unser Grammatikheft ist ein heiliger
Gegenstand…
SHIT — Selbsthilfe in Tübingen.
NP: Bob Dylan—Slow Train
Scripting for Runaways
During our trip over the extended weekend, we quickly decided it would
be easier if different persons buy various stuff everyone needed (say,
group tickets, lunch and booze). So, everyone started to pay lots of
different things with his/her own money, and it seemed like it would
get a big mess.
Ruby to the rescue! I quickly (and I mean it, I coded while the
others made breakfast) came up with a script to calculate who needed
to give what amounts of money to other persons so everyone spent the
same amount of money—all in all.
In the beginning, it was a very easy task. We created a simple file
formatted like this:
# Name Amount Comment
Mr. Foo 72.00 Tickets
Lil' Bar 33.00 Booze
The people that didn’t pay anything just were listed, with empty
amount and comment:
Squarehead
J. Random Hacker
Now, we would run the program and get an output like that:
J. Random Hacker -> Lil' Bar: 8.25
J. Random Hacker -> Mr. Foo: 18.00
Lil' Bar -> Mr. Foo: 9.75
Squarehead -> Lil' Bar: 8.25
Squarehead -> Mr. Foo: 18.00
4 Leute, 26.25 pro Person (Cashflow: 105.00)
Lil' Bar: Booze
Mr. Foo: Tickets
On a glance, you see who needs to pay whom what amount and what for.
Some throwaway results are presented too.
Of course, that script was far too easy (the initial version didn’t
sort the lines, so it was even easier, but I didn’t use VC, so it’s
lost in the mists of time). Have a look at the
source.
Soon, trouble started. One person came on the next day on her own, so
she wouldn’t need to pay the group tickets (that’s only fair), but of
course the other stuff we bought. I simply rewrote the program to
divide the money among the known people at that time, and list
everyone at the beginning and the one that came late just later. It
worked fine and was a quick adjustment.
On the second day, we decided to make the first cashing-up.
Obviously, we wouldn’t want to remove the entries so far (they were
needed for the statistic, too), but they shouldn’t be calculated
anymore. I hacked some code to make --- output the list and delete
the current debts.
Then, there was someone that would leave sooner, and of course didn’t
want to pay things she couldn’t make use of. Another hack, and -Name
lines remove the person from the known people. This messes up the
statistic a bit, but we didn’t really care.
The source code of our final version is
available too.
Now, when I revisit the code, it’s very much like a awk script to
me. You can see the typical elements, no additional classes were
defined, only hashes and arrays store the data. The main program is a
loop to parse, and run stuff at the same time. It’s rather icky with
it’s linear, but messed flow.
But I can do better. Let’s build a domain specific language to manage
group money! Now, the input file (a valid Ruby program, actually)
looks like this:
person :MrFoo
person :LilBar
person :Squarehead
person :JRHacker
MrFoo.pay 72.00, "Tickets"
LilBar.pay 33.00, "Booze"
cashing_up!
The last version (for now, at least) is
only two lines longer than the previous one, but lets Ruby do the
parsing. This is actually very useful. For example, we could now
also write:
EGG = 0.33
JRHacker.pay 12*EGG, "a dozen eggs"
There are further ways to clean up the code. For now, shared state is saved
in constants and globals, we could create a wrapper class to take care
of that (even instance_eval the file inside it, and use $SAFE),
but for now, and especially for such a simple program, this is not
needed.
I have demonstrated the usefulness of a general-purpose scripting
language to ease tasks that do happen on the road and wouldn’t be a
lot of fun to calculate manually (YMMV, but if you have lots of
entries, you need to calculate a lot). Additionally, I showed how a
flexible object-oriented language like Ruby can be used to create
mini-languages to save parsing overhead and make use of the language
features.
I wish you well on your travels
My friends I wish you well along the way
— Dan Bern, Jail
NP: Manu Chao—La Despedida
Vacation
From May 4th to May 8th I’m in Munich with friends for no particular
reason except having free time :-) and therefore unlikely to blog here
or at Anarchaia and to read my mail. While it may be possible I get
WLAN somewhere (Anyone know a nifty, free hotspot? Please mail me.),
don’t count on that.
I’ll write quotes, scribble stuff that happened and make a lot of
pictures though, so wait till I get back and there will be something
to see here.
NP: The Distillers—The Young Crazed Peeling
Jehoovacraft und Bordbordell
Wie üblich, Quotes:
[über den globalen Temperaturanstieg während des zweiten Weltkrieges:]
Wieso sind die dann gestorben, weils in Stalingrad so kalt war?
Methan ist ökumenisch nicht wertvoll!
Bordbordell.
Also 100N ist schon eine ordentliche Kraft auf den Finger…
— So viel wie 100 Tafeln Schokolade!
Hoffentlich spielen sie in der Sportschau besser…
Mädchen kommt von Made, oder?
Willkommen in der Gosse der Erwachsenen!
Jetzt hat’s ihm aber fast den Sack weggerissen!
Ich bau’ jetzt ein Jehoovacraft.
Die ist rumgelaufen und hat die Fisiker gesucht…
Du bläst auf deiner Blase.
NP: Interpol—NYC