25 Jan 2008
Ever since I upgraded to Apple Macintosh OS X 10.5 Leopard, I’ve run into
problems using the DarwinPorts “port” command to install new software.
The problem is that the version of GNU “patch” that I have installed in
/usr/bin/patch is version 2.5.8, and it doesn’t behave the way that
DarwinPorts expects. A typical error message is:
---> Applying patches to erlang
Error: Target org.macports.patch returned: shell command " cd "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_lang_erlang/work/erlang-R12B-0" && patch -p0 < '/opt/local/var/macports/sources/rsync.macports.org/release/ports/lang/erlang/files/patch-toolbar.erl'" returned error 2
Command output: Get file lib/toolbar/src/toolbar.erl from Perforce with lock? [y]
Perforce client error:
Connect to server failed; check $P4PORT.
TCP connect to perforce failed.
perforce: host unknown.
patch: **** Can't get file lib/toolbar/src/toolbar.erl from Perforce
Error: Status 1 encountered during processing.
The work-around is to define the environment variable POSIXLY_CORRECT=1, as in:
POSIXLY_CORRECT=1 sudo port install erlang
My best guess as to why this works: GNU patch has a version-control integration
feature that, when a file to be patched looks like it is under version control,
tries to check the file out first (here, from Perforce, hence the prompt above).
Running patch in strict POSIX mode appears to disable that feature.
Now, I’ve done some web searching, and I haven’t seen anyone else complaining
about this problem, so perhaps there’s something odd about my setup.
07 Jan 2008
Yesterday I wrote a web scraper. A web scraper is a program that crawls over a
set of web pages, following links and collecting data. Another name for this
kind of program is a “spider”, because it “crawls” the web.
In the past I’ve
written scrapers in Java and
F#, with good results. But
yesterday, when I wanted to write a new scraper, I thought I’d try using a
dynamically-typed language instead.
What’s a dynamically-typed language, you ask? Well, computer languages can
generally be divided into two camps,
depending on whether they make you declare the type of data that can be stored
in a variable or not. Declaring the type up front can make the program run
faster, but it’s more work for the developer. Java and F#, the languages I
previously used to write a web scraper, are statically typed languages,
although F# uses type inference so you don’t actually have to declare types
very often – the computer figures it out for you.
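As a tiny made-up illustration of the difference: in a dynamic language like Python you never say what type a variable holds, you just assign to it, whereas Java would make you spell out the list’s element type before you could use it.
# Python 2: no type declarations; "pages" holds whatever we put in it.
pages = []
pages.append("http://example.com/")
for page in pages:
    print page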
In order to scrape HTML you need three things:
- a language
- a library that fetches HTTP pages
- a library that parses the HTML into a tree of HTML tags
Unless you’re using Mono or Microsoft’s Common Language Runtime, the language
you choose will restrict the libraries that you can use.
So, the first thing I
needed to do was choose a dynamic language. Since I just finished reading
“Practical Common Lisp”, an excellent
advanced tutorial on the Lisp language, I thought I’d try using Lisp. But that
didn’t work out very well at all. Lisp has neither a standard implementation
nor a set of standard libraries for downloading web pages and parsing HTML. I
did some Googling to try and find some combination of parts that would work
for me. Unfortunately, it seemed that every web page I visited recommended a
different combination of libraries, and none of the combinations I tried
worked for me. In the end I just gave up in frustration.
Then, I turned to
Python. I had not used Python much, but I knew it
had a reputation as an easy-to-use language with a lot of easy-to-use
libraries. And you know what? It really was easy! I did some web searches,
copied some example code, and voila, I had a working web spider in about an
hour. And the program was easy to write every step of the way. I used the
standard CPython implementation for the language, Python’s built-in urllib2
library to fetch the web data, and the Beautiful
Soup library for parsing the
HTML.
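To give a flavor of how little code this takes, here’s a minimal sketch of that kind of spider (the starting URL, the page limit, and the error handling are simplifications for illustration; my real scraper also collected data from each page):

import urllib2
import urlparse
from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3

def crawl(start_url, max_pages=100):
    seen = set()
    to_visit = [start_url]
    while to_visit and len(seen) < max_pages:
        url = to_visit.pop()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urllib2.urlopen(url).read()
        except IOError:
            continue  # skip pages that fail to download
        soup = BeautifulSoup(html)
        for anchor in soup.findAll('a', href=True):
            # Resolve relative links against the current page's URL.
            to_visit.append(urlparse.urljoin(url, anchor['href']))
    return seen

pages = crawl('http://example.com/')  # hypothetical starting point

That’s essentially the whole spider: a work list, a visited set, and a couple of library calls.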
How does Python compare to Java and F# for web scraping?
Python Benefits:
- Very brief, easy to write code
- Libraries built in or easy to find
- Lots of web examples
- I didn’t have to think: I just used for loops and subroutine calls.
- Very fast turn-around.
- Easy to create and iterate over lists of strings.
Python non-issues for this application:
- Didn’t matter that the language was slow, because this task is totally I/O bound.
- Didn’t matter that the IDE is poor; using print and developing interactively was fine.
F# Benefits:
- Good IDE (Visual Studio)
- Both URL fetching and HTML parsing libraries built in to CLR
F# Drawbacks:
- The CLR libraries for URL fetching and HTML parsing are more difficult to use than Python. It takes more steps to complete similar operations.
- Strong typing gets in the way of writing simple code.
- Odd language syntax compared to Algol-derived languages.
- Hard-to-understand error messages from the compiler.
- Mixed functional/imperative programming is more complicated than just imperative programming.
- The language and library encourages you to use advanced concepts to do simple things. In my web scraper I wrote a lot of classes and had methods that took complicated curried functions as arguments. This made the code hard to debug. In retrospect perhaps I should have just used lists of strings, the same as I did in Python. Since F# supports lists of strings pretty well, maybe this is my problem rather than F#’s. ;-)
Java benefits:
- Good debugger
- Good libraries
- Multithreading
Java drawbacks:
- Very wordy language
- Very wordy libraries
Lisp drawbacks:
- No standard implementation
- No standard libraries
Looking to the future, I’d be interested in writing a web scraper in
IronPython, which has good IDE support, and in C# 3.0, which has some support
for type inference.
In any event, I’m left with a very favorable impression of
Python, and plan to look into it some more. In the past I was put off because
it was slow, but now I see how useful it is when speed doesn’t matter.
[Note: When I first wrote this article I was under the impression that CPython
didn’t support threads. I since discovered (by reading the Python in a
Nutshell book) that it does support threads. Once I knew this, I was able to
easily add multi-threading to the web scraper. CPython’s threads are somewhat
limited: the Global Interpreter Lock allows only one thread to run Python code
at a time. But that’s fine for this application, where the multiple threads
spend most of their time blocked waiting for C-based network I/O.]
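For the curious, the pattern I used is the standard worker-threads-plus-Queue idiom. This is a sketch with placeholder URLs and an arbitrary thread count, not my exact code:

import threading
import Queue
import urllib2

url_queue = Queue.Queue()

def worker():
    while True:
        url = url_queue.get()
        try:
            data = urllib2.urlopen(url).read()
            # ... parse "data" and record results here ...
        except IOError:
            pass  # ignore pages that fail to download
        url_queue.task_done()

# The GIL is released while a thread is blocked in network I/O,
# so a handful of workers really do overlap their downloads.
for _ in range(8):
    t = threading.Thread(target=worker)
    t.setDaemon(True)
    t.start()

for url in ['http://example.com/a', 'http://example.com/b']:
    url_queue.put(url)
url_queue.join()  # block until every queued URL has been processed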
29 Dec 2007
Curious about the internal designs of GPUs, CPUs, and game consoles? Tired of
lame articles full of uninformed speculation and fanboy rants? Then check out
the archives of the “Hot Chips”
conference, an annual conference where computer chip designers get together to
brag about their latest chips. The conference presentation slides are all
online, and they’re full of good technical information on GPUs, CPUs, and even
game consoles. Of course, presenters often gloss over any technical problems
with their chips, so you won’t get the full picture. But these presentations
offer a detailed technical look inside otherwise secret system architectures.
(For what it’s worth the web site is poorly organized, and many links are
broken – you sometimes have to edit the URLs slightly to find the correct
links.)
Some highlights:
- Reality Co-Processor, Ken Hayes (Silicon Graphics, Inc.) - all about the Nintendo 64.
- Gekko: A PowerPC compatible processor supporting high-performance 3D Graphics - the GameCube CPU.
- Multiple Cell papers (PS3).
- Xbox 360 System Architecture.
29 Dec 2007
According to a presentation at the 24th Chaos Communication
Congress,
hackers have apparently been able to defeat the Nintendo Wii game console’s
security system using tweezers to bypass the hardware memory protection.
The
way it works is that the Wii runs in two modes: a GameCube emulation mode,
which has access to just 1/8th of the total memory, and Wii mode, which has
access to all of it.
Hackers had already figured out how to run their own
code in GameCube mode. So the trick was to run their own code in GameCube mode,
then use the tweezers to short out address lines so that their code could reach
parts of the rest of the memory. Shorting different address lines exposed
different portions of memory, and by collecting enough of these fragments they
eventually mapped all of it.
Apparently the Wii operating
system keeps its digital signature keys in this protected memory, and once
those keys were found it was possible to sign and run homebrew code on the Wii.
It is not clear to me whether the attack is a per-machine attack or a
break-once-run-everywhere attack.
29 Dec 2007
I was searching for information on LINJ, a Lisp language that compiles into
human-readable Java code. Unfortunately, the LINJ web site,
http://www.evaluator.pt/linj.html, is currently offline.
Luckily, it turned
out that the Internet Archive Wayback
Machine had cached both that page and the download files it had pointed to.
Very cool!
Similarly, I was
looking for the source to the Windows CE port of Quake 3, and found that the
project’s web site http://www.noctemware.com/q3ce.html had been abandoned and
taken over by spammers. Luckily the Wayback Machine had cached both the
original web page and the downloads.
Let this be a lesson to you aspiring open
source developers out there: It’s better to store small open-source projects
in a large “won’t-ever-go-away” source repository like
SourceForge or Google
Code than to use your own vanity domain hosting. Of
course, even using a large popular repository is not failure-proof. Some large
code repositories from the early days of the Internet, like DEC’s ftp site,
have gone away after their owning company was bought by another company.
Perhaps some sort of distributed system of discoverable git repositories is
the answer.
In the case of Quake 3 for Windows CE, I’ve contacted the author,
Christien Rioux, and with his kind permission I’ve set up a code.google.com
project so that other people can more easily find the sources (and binaries):
http://code.google.com/p/q3ce .
28 Dec 2007
I’ve been thinking about Lisp lately. A powerful language, with many excellent
features, but not hugely successful. And I thought of one reason why:
The excellent book Guns, Germs and Steel hypothesizes that agriculture displaced
hunting and gathering because people who practiced agriculture stayed in one
place, and were able to have one child per year, as opposed to the hunter-
gatherers, who had to wait until their children were old enough to walk before
having another child. This reproductive rate difference was amplified by the
ability for a given unit of land to support more farmers than hunters. As a
result, agriculture displaced hunting even though individual hunters were far
healthier (as evidenced by their skeletal heights) than the farmers who displaced
them.
So it is possible for a poorer technology to displace a better one, if
it has compensating advantages. And I think that’s what’s hit Lisp. C and
Java, which are each less powerful and more wordy than Lisp, are more
successful for reasons other than power and brevity. Perhaps because both
languages allow lots of reusable code to be written by ordinary programmers.
Maybe Lispers are like healthy hunters, being displaced by hordes of sickly
farmers.
Well, no doubt the Lispers will take some comfort in the Java and C
programmers being displaced in turn by whatever language next becomes even
more successful, just as hunter-gatherers may take pleasure in the trend that
farmers are gradually being displaced by urban dwellers. (Come to think of it,
Lispers have already seen their original competitor Fortran displaced by
C/C++, and in turn much of C++ has been displaced by Java and C#.)
I think
something similar is happening in productivity applications, as a generation
of not-very-good-but-web-based productivity applications is displacing the
Microsoft Office suite. (For example, I am typing this blog entry into a
simple and ugly HTML-based web form rather than a beautiful Word document.)
23 Dec 2007
This week I converted my family’s main computer from Vista to OS X. (It’s a
Mac Mini.)
We pretty much use it for web surfing, web email, really old DOS
children’s games, and photo editing. Macs do that pretty well.
I still have a
Vista machine that I use for the excellent Windows Media Center – love the
record-by-keyword feature and the free programming guide.
But for day-to-day
use we’re back to the Mac.
23 Dec 2007
I’ve spent roughly 4 years of midnight-engineering time looking into the cool
languages to see if they would make game programming easier or more fun.
Haskell, OCaml, F#, Erlang, Scheme, Lisp, D, Factor, Scala, Python: I’ve
looked at them all.
F# held my attention for quite a while, but now my
platform-of-choice has moved away from F#’s design center. (I’m into
Linux-based mobile platforms now.) And to be honest, I’m still happier in a C-like
language.
I’m depressed. Sure, I learned a lot about fancy language features,
but I could have written quite a few games in plain-old-C++ (or C#, or Java or
Flash or Basic) in the same time.
P.S. Someone else has done this more
impressively than I have. Do a Google Groups search for Brandon van Every, who
has had a five-year odyssey to find the perfect non-C++ game programming
language. I corresponded with him back when we were both interested in OCaml.
Since then he’s managed to annoy pretty much everyone by harping on their
favorite language’s shortcomings. In the end (at least as of six months ago)
he’d given up and gone back to C++. I look forward to seeing what he does
next.
23 Dec 2007
I’ve been poking around with Lisp and Scheme again, and am reminded of some of
my favorite science fiction books (warning, plot spoilers follow):
- Vernor Vinge’s “A Fire Upon the Deep” begins with a group of scientists mining an ancient civilization’s web archives. They need to build interpreters for the ancient civilization’s programs. All goes well until they reconstitute a malevolent AI that they spend the rest of the book fighting.
- Piers Anthony’s Macroscope involves a group of people trying to decode an extraterrestrial message that other ETs are trying to jam. Over the course of the book, your opinion as to which group of ETs has humanity’s best interests at heart changes back and forth several times.
- Any number of SF stories are set in the far future where people poke around in the ruins of a once-great civilization. (See Gene Wolfe, Cordwainer Smith.)
Working with Lisp reminds me of these books. Lisp’s a seductive, ancient,
powerful language that has been worked on for years by very smart, very
motivated hackers. Pretty much everything one can think of to do with Lisp has
been done, multiple times, by really smart people.
For example, I want to have
a system where I can interactively write a game, changing code on the fly
while the game is running. I want to be able to use macros and garbage
collection and free serialization and inspection and all that cool stuff that
Lisp provides.
And Naughty Dog had all that, in GOAL, for the PS2. Their
implementation was apparently much better (more efficient, less buggy, more
features) than one I could cobble together out of open-source-Lisp parts. And
they had used it to successfully write two or three games. But then they
walked away from it. Gave it up. Went back to C++.
They said it was because
they had a lot of pressure from their parent company to make their engine more
reusable. But I think it’s also because the results they were getting just
weren’t that much better than the results that all the other developers, who
use C++-and-some-cheap-scripting-language-like-Lua, were getting.
I think Lisp
is like a powerful alien technology that may not be in your best interests to
use.
04 Dec 2007
As seen on Lambda the Ultimate (lambda-the-ultimate.org):
Quantifying the Performance of Garbage Collection vs. Explicit Memory Management:
“We compare explicit memory management to both copying and non-copying garbage
collectors across a range of benchmarks, and include real (non-simulated) runs
that validate our results. These results quantify the time-space tradeoff of
garbage collection: with five times as much memory, an Appel-style generational
garbage collector with a non-copying mature space matches the performance of
explicit memory management. With only three times as much memory, it runs on
average 17% slower than explicit memory management. However, with only twice
as much memory, garbage collection degrades performance by nearly 70%. When
physical memory is scarce, paging causes garbage collection to run an order of
magnitude slower than explicit memory management.”