on GFW of China

GFW = Great Firewall.
On my vacation to China this last summer I tried to access this blog from different locations in China: Xi'an, Beijing, Dalian, Shanghai. Never once did I succeed.

I could read Wikipedia, but I heard from other people that it's hit and miss. I had no problem with porn sites, though.
I have always thought this censorship by the Chinese government is really stupid. Today I read an excellent column in the NY Times by Tom Friedman, author of The World Is Flat. A passage towards the end of the article struck a chord with me:
America still has the right stuff to thrive. We still have the most creative, diverse, innovative culture and open society — in a world where the ability to imagine and generate new ideas with speed and to implement them through global collaboration is the most important competitive advantage. China may have great airports, but last week it went back to censoring The New York Times and other Western news sites. Censorship restricts your people’s imaginations. That’s really, really dumb. And that’s why for all our missteps, the 21st century is still up for grabs.

For the benefit of China and the Chinese people's future: Mr. Hu, tear down this great firewall. (Tear down this wall.) 胡哥, 这墙拆了得了. (Brother Hu, just tear this wall down already.)


Emacs utf8 and Chinese on Windows

Download the latest EmacsW32 binary (Emacs 23 + EmacsW32). Then add the following to your .emacs:
(setq locale-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-selection-coding-system 'utf-8)
(prefer-coding-system 'utf-8)

Your problems with viewing and editing Chinese in Emacs on Windows should go away.


Adding Emacs to right click context menu

To add Emacs (or any program) to the right-click context menu for all file types, open regedit and add a key under HKCR\*\shell. Name it whatever you want; I named mine "run emacs".

Then under it, create a new key called "command" and change its data to the path to your Emacs executable, plus a "%1" at the end.

If you want to add Emacs for just a specific type of file, for example ".org" files for Orgmode, it's a little more work.
Create a new key ".org" under HKCR and give it the data value "Orgmode.File". Then create a new key "Orgmode.File" under HKCR. Under it, create the key "Shell", under which create the key "Emacs", under which create the key "command". Change its data to the Emacs path plus "%1".

If you want to add Emacs to the directory context menu, create the same keys under HKCR\Folder.

Of course, all of this applies to any program, not just Emacs.
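
If you don't feel like clicking through regedit, the same keys can be written as a .reg file and imported. Here is a minimal Python sketch that only builds the file text. The function name, the menu name and the Emacs path in the usage line are made up for illustration, and quoting the executable path is my own addition so that paths with spaces work.

```python
def context_menu_reg(menu_name, exe_path, target="*"):
    """Build .reg text adding a right-click entry under HKCR\\<target>\\shell.

    target is "*" for all files, "Folder" for directories, or a file-type
    key such as "Orgmode.File".
    """
    # Inside a .reg string value, backslashes and double quotes are escaped.
    escaped_exe = exe_path.replace("\\", "\\\\")
    command = '\\"{0}\\" \\"%1\\"'.format(escaped_exe)
    key = "HKEY_CLASSES_ROOT\\{0}\\shell\\{1}\\command".format(target, menu_name)
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[{0}]".format(key),
        '@="{0}"'.format(command),  # "@" is the key's default value
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical install path; use your own.
    print(context_menu_reg("run emacs", r"C:\emacs\bin\runemacs.exe"))
```

Save the output as something like emacs.reg and double-click it to import. For the ".org" case you would still need the extra ".org" key pointing at "Orgmode.File", which this sketch does not generate.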


how to make fedora 10 image from old version

Fedora 10 just came out. If you have an older Red Hat-family OS (Fedora or CentOS) and you want a new Fedora 10 image, to be used as a Xen domU, a customized Amazon EC2 AMI, or whatever, here's how to do it:

Step 1: Create a 2 GB sparse file or LVM volume and mount it at /mnt.
For a sparse file:
# dd if=/dev/zero of=f10.img bs=1024k count=1 seek=2000
# mkfs.ext3 f10.img
# mount -o loop f10.img /mnt
For LVM:
# lvcreate -L2G -n f10 vg0 #assumes you have a volume group called vg0
# mkfs.xfs /dev/vg0/f10
# mount /dev/vg0/f10 /mnt
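
If you'd rather script the sparse file from Python than use dd, a minimal sketch could look like this. The function name and defaults are made up for illustration. It gives exactly size_mb MiB rather than the 2001 MiB the dd line above produces, and whether the file really ends up sparse depends on the filesystem.

```python
import os


def make_sparse_image(path, size_mb=2000):
    """Create a file of size_mb MiB without writing its data blocks.

    Extending a file with truncate() leaves a "hole" on filesystems that
    support sparse files, so almost no disk space is used until the image
    is formatted and filled.
    """
    with open(path, "wb") as image:
        image.truncate(size_mb * 1024 * 1024)
    return os.path.getsize(path)


if __name__ == "__main__":
    print(make_sparse_image("f10.img"))
```

After that, the mkfs and mount steps are the same as for the dd version.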

Step 2: edit /etc/yum.repos.d/fedora.repo
Go to /etc/yum.repos.d, create a new directory, and move everything into it (you will move the files back up to /etc/yum.repos.d after creating the Fedora 10 image). Create a new file called fedora.repo with content:
name=Fedora 10 - i386


Step 3: do a "yum groupinstall"
# yum clean all
# yum makecache
# yum --installroot=/mnt -y groupinstall Base
Go have a cup of tea, or read some blogs, or downmod some reddits.
After the installation is complete, you need to create these devices to make the image actually usable:
# MAKEDEV -d /mnt/dev -x console
# MAKEDEV -d /mnt/dev -x null
# MAKEDEV -d /mnt/dev -x zero
And create a root password:
# chroot /mnt
# pwconv
# passwd
# exit
Create some files:
# vi /mnt/etc/fstab
# vi /mnt/etc/sysconfig/network
# vi /mnt/etc/sysconfig/network-scripts/ifcfg-eth0
# vi /mnt/etc/resolv.conf
The contents of these files depend on your settings. Use the files on your current OS as references.

That's it, you're done. Unmount the image. Now you should be able to boot it. After you boot it (e.g. as a Xen domU), do some configuration. This is what I did:
# chkconfig NetworkManager off
# chkconfig network on
# service NetworkManager stop
# service network start #use network instead of NetworkManager
# chkconfig atd off
# chkconfig bluetooth off
# chkconfig cups off
# chkconfig haldaemon off
# chkconfig ip6tables off
# chkconfig mdmonitor off
# chkconfig netfs off
# chkconfig nfslock off
# chkconfig pcscd off
# chkconfig portreserve off
# chkconfig rpcbind off
# chkconfig rpcidmapd off
# chkconfig rpcgssd off #all these services are useless for me
# yum install xfsprogs #my filesystem is xfs
# vi /etc/rc.d/rc.sysinit #comment out a bunch of lines to get rid of false err/warning device-mapper messages during boot-up

Don't forget to delete fedora.repo from /etc/yum.repos.d (on your current OS) and move everything back in there.


Super Easy Python powered Christmas light controller for $40

There are lots of ways to control Christmas lights. The easiest (but most expensive) is just getting a Light-O-Rama. I'm not spending $400 on a controller, so I decided to build my own.

Computer Christmas has many articles showing you how to do this. There's a how-to for a 320 channel controller. All I need is 8 channels of on/off (no dimming), so a simple parallel port relay box is perfect for me. This how-to tells you how to build one. But you don't actually need to build a relay box yourself: there are parallel port relay kits sold online for cheap. You just buy the kit and connect it to an electric box.

So here's how to build a Python powered Xmas light controller for $40 (more or less).

1, Get a parallel port relay kit for about $32, for example: here or here. You can get it assembled or in module form for a few more bucks. I didn't want to spend much time assembling so I got a module.

2, Get an electric box and outlets from Home Depot or Lowe's for $8 total. Again I'm lazy, so I just got a power strip from Walmart for $4, but it only has 7 outlets. I couldn't find one with 8.

3, Wire the kit to your electric box.

4, Connect it to the computer and fire up Python (see my previous post).
Video of testing session.
Here's the source code for the GUI program (in wxPython). Most of the code was generated by wxGlade; I just created the event handlers for the mouse clicks. The speed control (slider) is not implemented yet.
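
To give an idea of what a light sequence boils down to, here is a minimal sketch of how 8 on/off channels can be packed into the single byte that goes out the parallel port, plus a simple "chase" pattern. This is not the wxPython program above; the function names are made up for illustration, and I'm assuming channel n is wired to data bit n.

```python
def channels_to_byte(on_channels):
    """Pack channel numbers (0-7) into one byte, with bit n meaning channel n is on."""
    value = 0
    for channel in on_channels:
        if not 0 <= channel <= 7:
            raise ValueError("channel must be between 0 and 7")
        value |= 1 << channel
    return value


def chase(steps=8):
    """Yield one byte per step, lighting a single channel at a time and wrapping after 7."""
    for step in range(steps):
        yield channels_to_byte([step % 8])


if __name__ == "__main__":
    for value in chase():
        # Each value would be written to the port, followed by a short sleep.
        print(format(value, "08b"))
```

Each byte from chase() is what you would hand to the port-writing call, with a delay between steps setting the speed.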


python parallel port on Windows

I tried different ways of controlling the parallel port with Python on Windows. The easiest way is with Inpout32.dll.

Just download the DLL and put it in the system32 folder. Then from Python:
from ctypes import windll
p = windll.inpout32
p.Inp32(0x378) #read the data port; default is 255 (all high) on my pc
p.Out32(0x378, 0) #drive pins 2-9 all low

The address 0x378 might be different on your machine. Open System->Hardware->Device Manager->Ports->ECP Printer Port->Properties->Resources and use the first number as your address.
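
To switch one pin without disturbing the others, you can read the current byte, flip one bit and write it back. Below is a minimal sketch of that. The helper names are made up for illustration, it assumes reading the data port returns the last value written (true on my understanding of standard ports, but check yours), and bit n corresponds to pin n+2.

```python
DATA_PORT = 0x378  # assumed address; look yours up in Device Manager


def set_bit(state, bit, on):
    """Return state with the given bit (0-7) set or cleared, kept within one byte."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be between 0 and 7")
    mask = 1 << bit
    return (state | mask) if on else (state & ~mask & 0xFF)


def write_bit(port, bit, on, address=DATA_PORT):
    """Read-modify-write one data bit. port is any object with Inp32/Out32."""
    current = port.Inp32(address) & 0xFF
    new_value = set_bit(current, bit, on)
    port.Out32(address, new_value)
    return new_value
```

On Windows you would pass windll.inpout32 as port, for example write_bit(windll.inpout32, 0, True) to drive pin 2 high.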


Create QRcode with google chart API

Google recently added QR Code to its Chart API. This makes QR Code generation a breeze. All you need to do is send a GET request to Google.
QR Code is the most popular 2D barcode. I put up a page that makes it easy to generate QR Codes with the Google Chart API:


To encode a URL, you need to put 'http://' at the beginning. Otherwise the code reader will see it as just text. Likewise, to encode a telephone number, you need to put 'tel:' at the beginning. Otherwise it's just text and you can't dial it from your phone.
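
Building the GET request by hand is simple enough. Here is a minimal sketch that only constructs the URL. The endpoint and the cht/chs/chl parameter names are how I remember the Chart API, so treat them as assumptions, and the service may not respond at that address any more. The function name is made up for illustration.

```python
from urllib.parse import urlencode

# Assumed Chart API endpoint; verify before relying on it.
CHART_ENDPOINT = "http://chart.apis.google.com/chart"


def qr_chart_url(data, size=150):
    """Return a chart URL asking for a size x size QR Code that encodes data.

    Include the 'http://' or 'tel:' prefix in data yourself so the reader
    treats it as a link or a phone number rather than plain text.
    """
    params = [
        ("cht", "qr"),                      # chart type: QR Code
        ("chs", "{0}x{0}".format(size)),    # image size in pixels
        ("chl", data),                      # the text to encode
    ]
    return CHART_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(qr_chart_url("http://example.com"))
    print(qr_chart_url("tel:5551234", size=200))
```

urlencode takes care of escaping the colon and slashes in the data.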

There are a lot of code readers. I have Kaywa Reader on my cellphone, and it works really well.


Can't believe my brain is only 20

So I took this Japanese Flash test for brain age, and it showed my brain age (脳年齢) was 20. I don't know if it helped that I was drinking a beer at the time of testing.

Update: Hmm, chimps do much better.

Grid/Cloud/Utility computing is expensive!

Utility computing is hot now. Two major players are Amazon EC2 and Google AppEngine. There are many minor players too. It's also evident that some ISPs are jumping onto the bandwagon.
I did some quick research on several vendors and summarized my findings here:


For me, Google AppEngine makes the most sense. You pay nothing for up to 5 million page views per month. The only problem, albeit a major one, is that it's Python only. It's not a problem for me, though, since I know Python.

Other providers are way too expensive. Look at engineyard.com: $400 per month gets you a meager 760 MB of RAM and only 250 GB of transfer. Someone must be out of his mind to purchase it.

For those who don't go with AppEngine, I would say it's best to just get a VPS. You can do whatever you want with a VPS, just like a dedicated server. And it's cheap: you can easily find a 512 MB RAM VPS with plenty of disk space and bandwidth for less than $50, and that's all you pay. (Just make sure your VPS is Xen based, not Virtuozzo/OpenVZ based. The latter allow RAM bursting, which means you will NOT get your RAM when you really need it, because other users on the same host machine have taken it.)


So Google thinks I am a robot

Two minutes into testing my AppEngine website pytan.com, I got this page. I have no idea what triggered it. It was probably the ajax calls to AppEngine when I moved my todo items.

When I tried to access this blog, I got it again.

However, the rest of Google (search, Reader, etc.) worked fine. So apparently Blogger and AppEngine share some infrastructure.

So a robot thinks that I, a human, am a robot. Is the tech Singularity here already?


The Flip vs Casio Exilim Z80 vs Michael Arrington's Canon

Michael Arrington compared the Flip Mino with his Canon SD750 and concluded that the Flip just doesn't make sense. I reached the same conclusion some time ago when I shopped for a digital camera that shoots good video, and found the Casio Exilim.

Here's the comparison of the Flip vs the Exilim Z80. The Z80 didn't exactly win "across the board" (e.g. battery life), but it's pretty close.

I highlighted the winning points of the Z80: higher video resolution, optical zoom, larger storage and longer recording time, and lower price.
The Z80 is shorter but a little fatter. Overall it's about 8% bigger. But if size really does matter to you, consider the Exilim S10: it's only 0.59" thin and smaller than the Mino overall.

I actually don't care much about the image quality of the Z80. I have a Rebel XTi. I bought the Z80 specifically for its video capability.

There's a red button for one-button video recording. One criticism of using a digital camera for video capture is that you have to set it to movie mode. But with this red video button, that argument goes away. You just turn it on, push the red button, and you are recording. It's that easy. Takes 5 seconds? No, more like 1.5 seconds!

One nagging point of the Z80 is that it outputs video in "mov" format (H.264 encoded). Yes, QuickTime format. I actually don't like it, but it would have been a great plus if judged by Michael Arrington. He said "everyone is moving to Quicktime at this point." Not me; I'd rather install "QuickTime Alternative" than install QuickTime/iTunes.


pymmseg, python mmseg

pymmseg is a Python implementation of mmseg. It's my quick-n-dirty Chinese word segmentation program.

I needed a Chinese UTF-8 word segmentation program for some simple stuff. After googling I found mmseg, but it's Big5 based and written in C. I created a Python version based on just the 'simple algorithm' (plain maximum matching, without the other 3 steps) and converted the lexicons to UTF-8.


It sorta works, for my purpose anyway, as you can see from the screenshot.

You can see that a lot of words don't exist in the dictionary/lexicons, like '赈灾','板房','两米'. That's because the lexicons were converted directly to simplified Chinese from a traditional one, and a lot of words are missing. I am sure using a dictionary trained from a simplified source would greatly improve pymmseg's accuracy. But a better dictionary will not solve ambiguity, such as '兴奋 得很 晚 都 睡不着' (should be '兴奋 得 很晚 都 睡不着'). For that we will have to use the 'complex algorithm'.
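
To show what plain maximum matching does, and why it trips on that sentence, here is a minimal sketch. It is not the pymmseg source: the function name is made up, the tiny lexicon is a toy one I picked for this example, and unknown characters simply fall through as single-character words.

```python
def max_match(text, lexicon):
    """Forward maximum matching: at each position take the longest word in lexicon."""
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            chunk = text[pos:pos + size]
            if size == 1 or chunk in lexicon:
                words.append(chunk)
                pos += size
                break
    return words


if __name__ == "__main__":
    toy_lexicon = {"兴奋", "得", "得很", "很晚", "晚", "都", "睡不着"}
    print(" ".join(max_match("兴奋得很晚都睡不着", toy_lexicon)))
    # prints: 兴奋 得很 晚 都 睡不着
```

Because '得很' is the longest match at that position, the greedy pass grabs it and never gets the chance to consider '得' followed by '很晚'. No dictionary fixes that; it takes a smarter way of choosing between candidate splits.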

I will create a better dictionary using a simplified Chinese corpus, and also create a version employing the 'complex algorithm', when I have time.


This blog is almost xhtml valid

After some HTML tweaking, this blog is now almost valid XHTML. The HTML validator shows the 2nd page has only 1 warning. It had more than 100 warnings before my tweaking.

The homepage has 4 warnings, 2 of them from the video 'embed'. The others are from the iframe and pagination. The 'navbar' iframe itself has 4 warnings. These are Google's faults, which I can do nothing about.
So I give up.
Not that I think it's important, but it'd be nice if Google generated valid XHTML code.


Welcome to Beijing

Here's an MV called 'Welcome to Beijing'. Lots of famous Chinese singers are in it. I can only name about 2/3 of them.

Apparently it's for the Olympic Games. Speaking of which, I am so glad my folks were able to get tickets for me for 5 different sessions. Now all the tickets are gone.

Today the top news item on the official Beijing Olympic site is: "China to launch second Olympic weather forecasting satellite on May 27". Wow, they are launching 2 satellites just for Olympic weather forecasting?!

Anyway, Welcome to Beijing (in August)


Test video from caipu.org

Testing to see if an embedded video from caipu.org works here.

from here.


decoding dianping.com google map lat/lng

dianping.com now uses Google Maps for its business locations. I'd like to get the latitude and longitude for some of the restaurants I found there.
But "view source" showed that its lat/lng values are encrypted.

I just don't understand these Chinese sites like mapbar, mapabc, and now dianping. Why encrypt them? What's the point?
Because dianping uses gmap, it has to send clear lat/lng to Google. That means the browser has to decode the lat/lng, which means the decoding code is right there in the JavaScript. In this case, the decoding happens in jquery.jmap.min.js.
This js is also obfuscated, but I was able to crack it in a short time.
It appears the JavaScript generates code on the fly and then evaluates the generated code, which decodes the lat/lng and sends them to Google.

Here's my decoding script. It works now but might not in the future, as dianping might change parameters like digi, add, plus and cha.

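Chinese: 点评用谷歌地图,但是给地图坐标加了密。我不知道这样做是为了什么。因为你必须在浏览器里解密,所以研究下代码,很容易就解了密。这里给出解密代码,是用python写的。
(Translation: Dianping uses Google Maps but encrypts the map coordinates. I don't know what the point of that is. Since the decryption has to happen in the browser, a little study of the code makes it easy to decrypt. The decoding code given here is written in Python.)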