Wensheng: June 2008

6/30/08

Can't believe my brain is only 20

So I took this Japanese Flash test for brain age, and it showed my brain age (脳年齢) was 20. I don't know if it helped that I was drinking a beer at the time of testing.

Add: Hmm, Chimps do much better.

Grid/Cloud/Utility computing is expensive!

Utility computing is hot now. The two major players are Amazon EC2 and Google AppEngine. There are many minor players too. It's also evident that some ISPs are jumping onto the bandwagon.
I did some quick research on several vendors and summarized my findings here:


For me, Google AppEngine makes the most sense. You pay nothing for up to 5 million page views per month. The only problem, albeit a major one, is that it's Python only. It's not a problem for me though, since I know Python.

Other providers are way too expensive. Look at engineyard.com: $400 per month gets you a meager 760 MB of RAM and only 250 GB of transfer. Someone must be out of his mind to purchase it.

For those who don't go with AppEngine, I would say it's best to just get a VPS. You can do whatever you want with a VPS, just like a dedicated server. And it's cheap: you can easily find a VPS with 512 MB of RAM and plenty of disk space and bandwidth for less than $50, and that's all you pay. (Just make sure your VPS is Xen based, not Virtuozzo/OpenVZ based. The latter allow RAM bursting, which means you will NOT get your RAM when you really need it, because other users on the same host machine have taken it.)
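As a rough back-of-the-envelope check, here is the RAM price of the two options quoted above. This is only a sketch: it assumes the VPS price is per month like the engineyard.com one, uses $50 as an upper bound for the VPS, and ignores everything except RAM (no transfer, disk, or support). The names `plans` and `dollars_per_mb` are made up for this illustration.

```python
# Prices and RAM sizes as quoted in this post; the VPS figure is an
# upper bound ("less than $50") and is assumed to be a monthly price.
plans = {
    "engineyard.com": (400.0, 760),  # (dollars per month, MB of RAM)
    "512 MB VPS": (50.0, 512),
}


def dollars_per_mb(price, ram_mb):
    """Monthly price divided by RAM, in dollars per MB."""
    return price / ram_mb


for name, (price, ram_mb) in plans.items():
    print(f"{name}: ${dollars_per_mb(price, ram_mb):.2f} per MB of RAM per month")

# How many times more expensive the first plan is per MB of RAM.
ratio = dollars_per_mb(*plans["engineyard.com"]) / dollars_per_mb(*plans["512 MB VPS"])
print(f"ratio: {ratio:.1f}x")
```

By this measure the engineyard.com plan costs roughly $0.53 per MB against about $0.10 per MB for the VPS, a bit more than five times as much.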

6/26/08

So Google thinks I am a robot

Two minutes into testing my AppEngine website pytan.com, I got this page. I have no idea what triggered it; it was probably the AJAX calls to AppEngine when I moved my todo items.

When I tried to access this blog, I got it again.

However, the rest of Google (search, Reader, etc.) worked fine. So apparently Blogger and AppEngine share some of the infrastructure.

So a robot thinks that I, a human, am a robot. Is the tech singularity here already?

6/5/08

The Flip vs Casio Exilim Z80 vs Michael Arrington's Canon

Michael Arrington compared the Flip Mino with his Canon SD750 and concluded that the Flip just doesn't make sense. I reached the same conclusion some time ago when I shopped for a digital camera that shoots good video, and found the Casio Exilim.

Here's the comparison of the Flip vs the Exilim Z80. The Z80 didn't exactly win "across the board" (e.g. battery life), but it's pretty close.

I highlighted the winning points of the Z80: higher video resolution, optical zoom, larger storage/longer recording time, and lower price.
The Z80 is shorter but a little fatter. Overall it's about 8% bigger. But if size really does matter to you, consider the Exilim S10: it's only 0.59" thick and smaller than the Mino overall.

I actually don't care much about the image quality of the Z80. I have a Rebel XTi. I bought the Z80 specifically for its video capability.

There's a red button for one-button video recording. One criticism of using a digital camera for video capture is that you have to set it to movie mode. But with this red video button, the argument goes away. You just turn it on, push the red button, and you are recording. It's that easy. Takes 5 seconds? No, more like 1.5 seconds!

One nagging point of the Z80 is that it outputs video in "mov" format (H.264 encoded). Yes, QuickTime format. I actually don't like it, but it would have been a great plus if it were judged by Michael Arrington. He said "everyone is moving to Quicktime at this point." Not me, I'd rather install "QuickTime Alternative" than install QuickTime/iTunes.

6/2/08

pymmseg, python mmseg

pymmseg is a Python implementation of mmseg. It's my quick-n-dirty Chinese word segmentation program.

I needed a Chinese UTF-8 word segmentation program for some simple stuff. After googling, I found mmseg. But it's Big5 based and written in C. I created a Python version based just on the 'simple algorithm' (just do maximum matching without the other 3 steps) and converted the lexicons to UTF-8.
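To give an idea of what plain maximum matching does, here is a minimal sketch. It is not the pymmseg code itself: the function name `segment`, the `max_len` parameter, and the tiny lexicon are all made up for illustration, and it is written for Python 3, where strings are already Unicode. At each position it takes the longest lexicon word that matches and falls back to a single character when nothing matches.

```python
def segment(text, lexicon, max_len=None):
    """Forward maximum matching: at each position, take the longest lexicon word."""
    if max_len is None:
        # Longest word in the lexicon bounds how far ahead we need to look.
        max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until there is a hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted, so unknown text
            # still gets through (one character at a time).
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon, invented for this example (not the pymmseg lexicons).
lexicon = {"研究", "研究生", "生命", "起源"}
result = segment("研究生命起源", lexicon)
print(" / ".join(result))  # prints: 研究生 / 命 / 起源
```

The toy input also shows the weak spot of the greedy approach: it grabs '研究生' first and leaves '命' stranded, while a human would read '研究 / 生命 / 起源'. A bigger lexicon does not help with that kind of mistake.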

Download

It sorta works, for my purpose anyway, as you can see from the screenshot.

You can see that a lot of words don't exist in the dictionary/lexicons, like '赈灾','板房','两米'. That's because the lexicons were converted directly to simplified Chinese from traditional ones, and they are missing a lot of words. I am sure using a dictionary trained from a simplified source will greatly improve pymmseg's accuracy. But using a better dictionary will not solve ambiguity, such as '兴奋得很晚都睡不着' (should be '兴奋得很晚都睡不着'). For that we will have to use the 'complex algorithm'.

I will create a better dictionary using a simplified Chinese corpus, and also create a version employing the 'complex algorithm' when I have time.

Wensheng