Saturday, 30 August 2008

Single-child families and population decline

There's worry in certain circles right now about population decline in much of Europe. Families are becoming smaller as investment in individual children (supposedly) continues to rise. And if average family size is below, what is it, 2.1, then voila, it's below replacement level.

I wonder if there's another factor that's contributing, making the effect larger and more self-reinforcing. With very few exceptions, all the only-children I know are single in their adult lives, and all those who had at least one brother or sister are now partnered, again with very few exceptions.

If this is true, then as average family size decreases, logically, the number of single-child families increases as a proportion of the whole. If most of those children are not going to reproduce, then you have a runaway decline.

Those enterprising cloud warriors

It seems that no sooner does a technology appear than it is subverted by those who want to abuse it.

A while back I started renting a virtual server to which I was (am) going to move a website that I own. The site is currently on a small hosting account, and it's starting to need a bit more than that. Anyway, I didn't do anything for a while, and just let the virtual server gather dust. Then, the other night, I got a burst of energy and started to look at what I could port over first.

As I was doing that, I was also idly browsing through the log files in /var/log and I see an enormous messages file and an enormous secure file. And their backups were big too... This is a server that, while it's on the internet, didn't even have a domain name until the last few days, it was just a raw ip address. Why were those files so big?

It seems that someone is trying an automated dictionary attack on the server. As far as I can tell, each login attempt, via ssh, is supplying a username but no password. Each name is being tried once, and then on to the next. So it's looking for unprotected accounts rather than trying to guess passwords (I think a "dictionary attack" really refers to when they are using a dictionary to try to guess passwords, but I'll stick with it since they must be storing their list of usernames in a dictionary too.) It seems fairly primitive but it's still immensely worrying, especially since I really don't want to have to become a linux security expert.

Each attack starts about the same time every evening, and lasts about eight minutes. That's it, but in that time it's doing a couple of login attempts every second. Of course, the source ip address of each attempt is logged, so I've been busily adding them to /etc/hosts.deny whenever I see a new one. Last night was quiet, first time in a couple of weeks, apparently. Time will tell whether just adding addresses to hosts.deny will work; in the short run maybe, I rather doubt it in the long run.

So what has any of this got to do with "the cloud"? Just that the attacking ip address sometimes resolves to an Amazon AWS instance, ec2-75-101-154-0.compute-1.amazonaws.com to be exact. Either some nerveless criminal is renting out EC2 instances with the intent of using them specifically to crack whatever insecure hosts they can find or, perhaps more likely, some of the startup images that EC2 instances use — and these images are huge, containing a whole operating system, as well as whatever applications they are going to run — have been compromised.

Looking at the way Amazon run EC2, they provide a number of basic instance images, but there are a lot of others mentioned on their forums, created by "helpful" users and containing just the right applications for people to find them attractive. An obvious security hole you might think, but people must be using them, or they wouldn't exist. Maybe Amazon need to virus-scan their EC2 images before starting them up, what a horrible thought.


Update: I see that at least one other person has noticed the same thing happening too: David in Sweden. I wonder if he ever got an answer out of Amazon?

Saturday, 23 August 2008

Amazon Elastic Block Store

This has been presented as a small, incremental step forward for Amazon's cloud-based computing. It's not. It's a big deal.

First, some grossly-simplified terminology. In the beginning there was Amazon Elastic Compute Cloud (EC2), which is your server(s). There was also Amazon Simple Storage Service (S3) which you can think of as being your network-attached file system. A little later they also brought out (though it's still in beta a.t.m.) Amazon SimpleDB which is your database and Amazon Simple Queue Service which is your messaging infrastructure. So what's the problem? Why do we need EBS?

Well I said "servers" above, but it's more accurate to think of EC2 instances as being like a CPU + RAM, where the RAM has enough space for a virtual disk. Your server does have some disk space attached ("instance storage" in the lingo) to which it can write files, but remember that the whole point of EC2 is that servers can get started automatically in response to demand, and that when demand falls, they can get automatically switched off. At that point, anything they've written to their local disk just disappears. I suppose it's the same as an object going out of scope and getting garbage-collected.

So if your web application, running on your EC2 instance, has done some work and updated some records, then it needs to salt them away somewhere outside the server-instance itself, and that's where S3 and SimpleDB come in. And of course, once you realise that your EC2 instance is probably going to need to save some data, it's but a step to see that it's also probably going to have to initialise itself, once it's up and running, from a data store too. So data passes between EC2 and S3 and SimpleDB in both directions.

So again, where's the problem? Well, I said that S3 was like a file system. It is, it's like some old, clunky PC-DOS 1.x floppy-disk-only file system, before you had directories and certainly before you had subdirectories. Each user account gets I think it's 100 bags (they're the floppies) and each of those bags can hold vast amounts of objects, but all at the top level, and all in the same namespace. If you want to impose any kind of structure on top of that, then you must do it by appropriately decorating the names (keys) that you use. It's as though your whole hard disk could only have a single folder and you had to keep everything organised by using tremendously long filenames.

Again, I said SimpleDB was the database, but it's not an SQL database. In fact it doesn't even have tables. It's more like a set of objects and a set of attributes, and any object can have (or not have) any of those attributes, and you can search on the attributes. So imaging that you could add some kind of metadata tags to S3 files and search on the tags, and that's SimpleDB.

I'm not criticising them, the constraints of replication, timeliness, availability etc. etc. very likely imposed all these limitations. And in an odd kind of way, the objects you get in S3 and SimpleDB remind me more of the sort of constructs you get in programming languages like Java or Javascript, so you can see that as building blocks they might be very powerful and useful.

But it's probably true to say that the software that would run well in this environment would have to be written or adapted specially for it. A lot has been written too, in the couple of years that EC2 and S3 have been available. But there's a whole class of more traditional applications that haven't, and which won't work in this environment, and they represent 99.9999% of all the software that's out there.

So now Amazon introduce EBS, and you can think of that as an infinite array of network-attached hard disks (but this time they are proper ones, with subdirectories and everything :) ). Previously, if I wanted to store my web application's data in say a PostgreSQL database, to get the benefit of SQL based access, I had to put that database on the EC2 instance's hard disk (instance storage) and had to make sure that the data in that database got pickled to S3 before the instance was shut down. Now I can just put the database on the EBS disk and forget about it.

Of course, for the very largest, most fault-tolerant and distributed of applications, EC2 + S3 + SimpleDB + SQS was probably architecturally the right way to go anyway; the sheer size of such applications might make exactly the same constraints necessary. The difference is in the area where I live: the web site that can fit comfortably on a single server, or even a single virtual server, with an SQL database on the same machine or on a different machine on the same network. That's to say, the space where the smallest to the medium sized database driven websites live.

These are well served at the moment by the numerous small hosting providers who'll give you a virtual server or two, or your own box. But if your web site does well, then the upgrade path can be difficult and expensive. And if your web site very suddenly does well then you likely won't be able to resource-up in time and it may fall over. And the pricing models are very coarse-grained in comparison to Amazon's.

For those web sites, Amazon EC2 wasn't an option because they would have had to rewrite their data layer and still end up with something that wasn't as flexible as SQL. But with EBS, those kinds of sites can be ported directly.

Monday, 11 August 2008

South Ossetia

It's anyone's guess what the Russians' ultimate goals are, but I suppose there are three possibilities, in ascending order of seriousness.

First, simply to defend the (independence of the) South Ossetians, which is their ostensible motive. If that's the case, one might reasonably ask if invading Georgia is the right way to go about it. However the ultimate consequences might not be all bad.

Second, to annexe South Ossetia. No doubt pliant S.O. nationals could be found to publically thank them for doing so. This would definitely qualify as adventurism, and its successful accomplishment would likely increase the Kremlin's appetite for more such moves - Abkhazia probably being the first.

Third, a full-scale and prolonged occupation of Georgia, leading eventually perhaps to: the incorporation of large tracts of its territory (South Ossetia, Abkhazia, and the corridor between them) into Russia; the emergence of rump Georgia as a vassal state; or even the complete annexation of Georgia. This would be extremely bad, and would likely lead to instability in the surrounding areas.

Something that would worry me very much if it happened after successful Russian action in S.E. (defined as the second or third possibilities above) would be a coup in Serbia, followed by an invitation to Russia to safeguard Serbian borders. Conflict between Russia and the EC would seem likely at that point, followed almost immediately by a Russian energy embargo and a collapse of European will.

Wednesday, 6 August 2008

Absinthe

Good to see that Absinthe is making a comeback. Interesting that the American market is where all the new action is, and that the products available there are higher quality than in Europe:

To date, there are four brands on US shelves: Lucid (Breaux's formula), Kuebler, Green Moon, and St. George Absinthe Verte. "The US is lucky in that its first absinthes are high-quality products, distilled from whole herbs," Breaux says. "In the European market, 80 to 90 percent is industrial junk."

I want my grave-photo to show me sitting at a table, having just finished an excellent repast, smoking a Cuban cigar and drinking a fine Absinthe.*

* Does one drink Absinthe after dinner or before? Perhaps I could make do with a fine champagne cognac.

Sunday, 27 July 2008

GWT and GAE

One always looks, of course, for the ideal toolkit with which to develop AJAX-style web applications. It's a pastime into which one can put as little or as much effort as one desires, and it rarely seems to result in satisfaction. In that, it resembles the search for happiness... but I digress.

Some toolkits promise much and deliver little. Many seem focused on flashy effects rather than functionality. Most seem to have large and inexplicable gaps ("surely they must have realised they would need to..."). You end up combining, or trying to combine, different kits that are built on different principles and have no unifying structure, and whose subtle interactions lead to difficult bugs.

I'd more or less settled on YUI (The Yahoo! User Interface Library) as having the best functionality for my purposes. It's a bit heavy and a bit clunky, and new developments spend entirely too long in beta, but it's comprehensive and versatile, even powerful. I use it in the editing application for gedsguides.com.

Along the way, I'd had a look, along with everyone else, when Google brought out the Google Web Toolkit. An interesting approach: compile a Java source to a Javascript runtime (the delicious irony didn't go unnoticed — the Java you'd write using the GWT would once upon a time have run natively in a JApplet). The potential was obvious, as were the potential gotchas: differences between the development environment (a Java runtime) and the production environment (in-browser Javascript) might lead to severe debugging problems.

Happily, the quality of engineering that went into GWT seems to have been high enough for it to have garnered a good-sized community, so maybe the fears were overstated. There's even a version of Ext JS available for it, Ext GWT to give it a bit of extra pizzaz in the GUI. The combination seems excellent and I'm nerving myself up to rewriting a page of the GGW Admin Console with it, as a test.

Another area where GWT shows evidence of independent creativity is its approach to cross-browser compatibility. Most other toolkits, insofar as they attempt to solve this problem, approach it by abstracting it away the differences. The result is that they invariably have a base module that has to be loaded by the browser, and that translates their general code into browser specific calls, for whichever browsers are supported. This module needn't be very large, 10 to 20 kilobytes perhaps, but it does have to contain code paths for all supported browsers. And that's a reasonable price to pay for the independence it provides.

The reason all the Javascript toolkits have to adopt this approach is because they are entirely runtime, browser-based affairs. GWT offers the same browser independence to your code, but this is a pre compile-time affair! GWT then produces lots of different run-time support files for your app, one set of files for each browser (even, for each version of each browser). The end result is that the Javascript footprint of your application, when it actually runs in a particular browser, may be much smaller than with other toolkit: each browser has to load only the code that it needs. Extremely smart. The cost of course is that your server has to have room for tons of files; your application takes much more space on your server's disk than with the other approaches; there may also be an impact on your server's cacheing. These are all things that we know how to solve on the server though, and they play to its strengths; all in all, I rather like the balance that GWT has adopted.

Here's the Achilles heel. If your web application needs to contact the server for data (which, given that it's a web app, it's bound to do a lot of!), then the support that GWT provides for RPC on the server is implemented using Java servlets. This means that your server has to be Java based: Tomcat, or Resin, or JBoss or whatever. This is a matter of a particular implementation rather than anything essential of course, though there's a fair bit of work for anyone that wants to re-implement the server-side stuff using something like, say ... Python.

Why would anyone want to re-implement GWT's server side using something like, say ... Python?

Well say hello to Google App Engine! I remember getting very excited when this came out: "run your web applications on Google's infrastructure"! Who wouldn't want to do that? This would be instant death to the thousands of tiny ISPs who host millions of modest LAMP sites. The sort of stuff that uses a MySQL database for small amounts of data. Things like game guild sites with membership lists of a couple of hundred people, raid calendars, little bits of news and so on. (I still think it will impact these things severely, but over a much longer timescale than I first thought.)

The thing is that App Engine is Python-only. Lots of people have been wondering if it's possible to combine GWT, which is mostly a browser-client (AJAX) technology, with GAE, which is mostly a server techology. Trivially it is, since the pages you serve from your GAE web-app can contain anything of course, including GWT applications. But that only works with the most basic of applications. If you need to pass data between the AJAXy GWT application running in your browser and your server, then GWT wants to talk to a Java servlet. Or you could jettison GWT's preferred approach (RPC) and go for something RESTful (there's a JSON example app comes with the GWT SDK). In which case a Python back-end is as good as anything else; it's very doable, but there'd be somewhat more coding for you if you did that.

So Google need to adopt a more joined-up approach to web applications. They could commit to writing a GWT-compatible back end in Python and running on GAE. That's not such a bad idea; it migh even be the least work for them. Make it a compiler option, with a flag saying if its server code should be emitted in Java for Tomcat et al. or in in Python for (Django and) GAE.

Don't look for Protocol Buffers to help here though, even though it's touted as "Google's Data Interchange Format". Although Protocol Buffers already supports both Java and Python (that is to say, the supporting tools can emit code in both those languages) it's a binary wire protocol: suitable for slinging data around on an internal network, but not so good for (typically text-based) conversations between a browser and a web server, especially if there's a firewall in the way.

Update. Note that Google App Engine uses Protocol Buffers internally, to communicate between the application (i.e., your web app) and the App Engine server, and between the App Engine server and the BigTable servers (all internal to Google's private network see? — exactly the use case envisaged in the previous paragraph).

Thursday, 17 July 2008

Different attitudes to what immigrants bring with them

Marginally more interesting than usual (which is to say, ever so slightly interesting) post at language log: When two become four on the subject of attmepts to make English the official language of the United States.

On this question I find my feelings ambiguous. When I was young, and interested in everything, especially languages, I considered it a tragedy if someone who grew up in a second-generation immigrant household didn't speak their parents' mother tongue. What a loss! They would never have that as it were stereoscopic vision of things that even a rough and ready biligualism can bring. As someone who won a small proficiency with French the hard way, I thought people who had rejected, or otherwise been denied, the chance to be bilingual were the most pitiable kind of unfortunates.

Now that I'm getting old, and somewhat tired, I like to be able to understand what's being said to me without having to go to too much effort. In sum, I want it to be presented in my native tongue. The people I feel sorry for now are those who've lived all their lives in one place, and find that that one place has become a foreign country, no longer their home.

But enough about the deficiences of my twilight years. What I think is interesting is not just attitudes to the languages that immigrants bring with them, but the wide differences in attitudes to the different things that they bring with them. so for example, I'd characterise anglo-american attitudes broadly thus:

  • Food: "Yum! More please!"
  • Politics and/or religions: "Please keep it to yourselves."
  • Languages: "Hmm. Not for me thank you."
  • Money: "You can't use that stuff here."

It's the contrast between the first and the last one that's strongest. Most westerners like to try foreign food, and some cuisines get a really rapturous welcome. Nobody in the West, neither the general population nor the state itself, wants foreigners to try to pay for things in anything other than our own currency. Languages — somewhere in between.