8/20/10

Enable chinese IDN .中国 (.xn--fiqs8s) in browsers

(Note this is for English browsers. You don't need to do this in localized browsers.)

Chinese IDN .中国 (puny: .xn--fiqs8s) is live since 8/15. Internet Explorer and Safari (tried on my iphone) support it natively. But in non-localized Firefox and Chrome, it display Punycode instead of unicode characters.

For example, the domain http://你我他。中国 will be displayed as http://xn--8mqxmp29b.xn--fiqs8s/ in Firefox and Chrome.

To fix it, do this in Chrome: In Tools(the wrench)->Options->Under the Hood tab, under "Web Content", click "Change Font and Language settings" button, then select "Languages" tab, click the Add button to add the language Chinese.


For Firefox, someone already filed a bug report, so hopefully it will be fixed in future release soon. But for now, we need to add the Chinese IDN to the whitelist. Type "about:config" in the address bar, click "I'll be careful, I promise!" button, right click, select New->Boolean, enter "network.IDN.whitelist.xn--fiqs8s", click OK, then select "true".

Now you can type Chinese IDN domain name in the address bar, it will be displayed as is and will not be converted to Punycode. In fact, if you type punycode, it will be convert back to IDN.

5/16/10

Creating ooxml word docx file with python and XSLT

The idea is taking an existing Word document, making it a template, and using it with external data to create a new Word document.

If you're dealing with 'doc' file (word2003 and before), you can use Pywin32. There are good information in the Chapter 10 of "Python Programming on Win32" book, in a section called "Automating Word".
But 'doc' file is on the way out. Staring from version 2007, Word is using the new 'docx' format. The good news is that 'docx' use OOXML (Office Open XML), an open industry standard. This means we can, in theory anyway, create Word DOCX file without using Windows. In practice, there's currently no good library that facilitate creating a docx file from xml data from scratch. It's tedious to do so without a good library. Hopefully someone will create one soon.

A short-cut exists though, i.e. create a template docx, create XSLT from it, then use XSLT to transform xml data into a new docx file.

Here's a post that describes this process using CSharp. My code is based on ideas and codes in this post.

Here's how to do it in Python (using the excellent lxml module):

1, Create a template docx file
Just create a regular docx file with Word (example.docx)

2, Create xslt file from docx file.
A docx file is a zip file. You can get "word/document.xml" file by unzip the docx file. But the resulting file has no line breaks. It's very hard to edit it using regular text editor like VIM. (you can use xml editor, but there ain't good one that's free). I have a python program that does this (getxslt.py). All it does is getting 'word/document.xml', prettifying it(adding linebreaks), and adding XSLT header and footer.
We need to use the docx file later, but without 'word/document.xml' file (python's zipfile module can't do file replacement in zip). You can achieve this by "open archive" the docx file with '7-zip' on windows, then delete 'word/document.xml'.

3, Modify xslt file.
Determine where you need to change the data, add things like:
<xsl:value-of select="..."/>
and
<xsl:for-each select="...">
...
</xsl:for-each>
You need to study a little bit xslt to do this, it's not too hard.

4, Do XSLT transform to convert xml data to 'word/document.xml' and add it to new docx file.
import shutil
import zipfile
from lxml import etree

xsl = etree.XSLT(etree.parse("xslt1.xml"))
xml = etree.parse("cddata.xml")
newxml=xsl(xml)

#nodocxml.docx is original docx file without word\document.xml
shutil.copyfile("nodocxml.docx","mycds.docx")

mycds = zipfile.ZipFile("mycds.docx",'a',zipfile.ZIP_DEFLATED)
mycds.writestr('word/document.xml',str(newxml))
mycds.close()

explanation:
'xslt1.xml' is the modified xslt file from the step 3;
'cddata.xml' is xml data file (source: w3schools xslt tutorial );
newxml is the result xml tree of xslt transformation;
'nodocxml.docx' is the same as example.docx, the original docx file, except it doesn't have 'word/document.xml';
'mycds.docx' is our final product.

The example files can be downloaded from here: http://bitbucket.org/wensheng/pydocx/downloads/pydocxml.zip

The example does very few transformation. If you want to more advanced stuff like adding images, you have to be more familiar with OOXML. (Add image to media directory, change relationships xml file, add relationship anchor to document.xml etc.)

4/13/10

Getting back data from lvm on raid1 on a single disk

One of my servers died. The only thing I have left is a hard-drive from that server. But I have important data I need to recover from this disk.

The disk was part of RAID1 (software-raid) disks. It's not boot-able, and only contains LVM volumes.

Here's how I get my data back:

1, Connect the hard-drive to my PC. I used a USB docking station.

2, My OS (Fedora) assigned it to /dev/sdd. It can not mount it since it's raid.

3, make raid node, and attach the partition
mknod /dev/md0 b 9 3
mdadm --create /dev/md0 --level=raid1 --name=0 --auto=md --raid-disks=1 -f /dev/sdd1
"-f" switch is needed to force creating raid1 with only 1 disk.

4, now if I do pvscan, vgscan, and lvscan, it shows my vg (vg1) and lvm's:
pvscan #it shows PV /dev/md0 VG vg1
but I still can not mount them, because no volume group actually existed. Do this:
vgchange -a y vg1
Now /dev/vg1 is activated.

5, Do the mount as usual
mount /dev/vg1/wensheng /mnt

After I got back all my data, I disassemble the hard-drive to get the magnets, and throw the rest to trash. Why? because of this:

from my experience, this drive will die soon anyway.