Sunday, May 01, 2016

rename Chinese filenames to pinyin 中文文件名 重命名 拼音

PHP's rename() may not handle Chinese filenames properly. If that happens, you can convert the filenames as follows.


Before we get started, here's a bash one-liner that cleans the filenames by replacing spaces with underscores:

find . -depth -name '* *' \
| while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done

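This script works recursively: -depth makes find handle a directory's contents before the directory itself, so renaming a directory does not break the paths that are still to be processed.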
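The same cleanup can be sketched in Python, which may be handy on systems without bash. This is only a minimal sketch: the function name replace_spaces is made up for illustration, and it skips a rename when the target name already exists instead of prompting the way mv -i does.

```python
import os

def replace_spaces(root):
    """Rename every file and directory under root so spaces become underscores."""
    renamed = []
    # topdown=False visits the deepest entries first, like `find -depth`,
    # so a directory is renamed only after everything inside it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            if ' ' not in name:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, name.replace(' ', '_'))
            if os.path.exists(dst):
                # Skip instead of overwriting an existing entry.
                print('skipped, target exists:', dst)
                continue
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

Calling replace_spaces('.') would process the current folder and return the list of (old, new) paths it renamed.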

How to rename Unicode Chinese files to Pinyin?

I googled and found no existing tools, which is rare and even a little weird. The only useful piece of information I got was a mapping file that contains a Unicode Pinyin table. So I had to do it myself: write a script that converts the Unicode Chinese file names to Pinyin using the mapping file.
Since I was doing the Python Challenge at the time, naturally I just scripted something in Python to get the job done.
The reason I did that was this. I have an HDTV that can play music from a USB drive. When I wanted to play the songs I had downloaded from The Voice of China, I had a problem: the file names of the songs contained many Unicode Chinese characters. The TV obviously doesn’t support Unicode. It just doesn’t display those Chinese characters at all. For example:
01 04张玮 – High歌.mp3
05 09吉克隽逸 – I Fell Good.mp3
I can only see:
01 04 – High.mp3
05 09 – I Fell Good.mp3
If those above are okay, then the following ones are ridiculous:
11 11 – .mp3
11 12 – .mp3
11 13 – .mp3
11 14 – .mp3
I had no idea what was what when I tried to choose the songs. Their actual filenames are as follows:
11 11大山 – 王妃.mp3
11 12王韵壹 – 你快乐所以我快乐.mp3
11 13金池 – 后知后觉.mp3
11 14吴莫愁 – 痒.mp3
I put the mapping file and the script in one folder and all the Unicode-named files to be renamed in a subfolder “VoC”, then just ran the script. Finally I got all the file names like this. They are not perfect, but I am able to tell what songs they are:
11 11 DaShan – WangFei.mp3
11 12 WangYunYi – NiKuaiLeSuoYiWoKuaiLe.mp3
11 13 JinChi – HouZhiHouJue.mp3
11 14 WuMoChou – Yang.mp3
I hope you find my solution helpful. Here is my Python script.
# renameCH2Pinyin.py
# Rename files from Chinese characters to capitalized pinyin using the
# mapping file, dropping the tone numbers (Python 3)

import os
import re

# File uni2pinyin is a mapping from hex to Pinyin with a tone number
with open('uni2pinyin') as f:
    wf = f.read()  # read the whole mapping file

os.chdir('voc')  # to rename all files in sub folder 'voc'
for x in os.listdir('.'):  # each file name
    filenamePY = ''
    for y in x:  # each character
        if 0x4e00 <= ord(y) <= 0x9fff:  # Chinese character Unicode range
            hexCH = '%X' % ord(y)  # uppercase hex code, no leading '0x'
            # match the pinyin after the hex code, leaving out the tone number
            mp = re.search(hexCH + r'\t([a-z]+)\d*', wf)
            # capitalize the pinyin; keep the character if it is not mapped
            filenamePY += mp.group(1).title() if mp else y
        else:
            filenamePY += y
    print(x)
    print(filenamePY)
    os.rename(x, filenamePY)
os.chdir('..')  # go back to the parent folder
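The script searches the whole mapping text once for every Chinese character. A possible variant is to parse the mapping into a dictionary once and then look characters up directly. The sketch below assumes each mapping line has the form hex code, tab, pinyin plus tone number, which is only inferred from the regular expression in the script. The names load_table and to_pinyin are hypothetical, and the sample table is hand-made for illustration rather than taken from the real mapping file.

```python
import re

# Assumed line format, inferred from the regular expression in the script:
# an uppercase hex code point, a tab, then pinyin followed by a tone number.
LINE = re.compile(r'^([0-9A-F]{4,5})\t([a-z]+)\d*', re.MULTILINE)

def load_table(text):
    """Build {character: capitalized pinyin} from the mapping text."""
    table = {}
    for code, pinyin in LINE.findall(text):
        # Keep the first reading listed for a character.
        table.setdefault(chr(int(code, 16)), pinyin.title())
    return table

def to_pinyin(name, table):
    """Replace each mapped character; leave everything else unchanged."""
    return ''.join(table.get(ch, ch) for ch in name)

# Hand-made sample in the assumed format (not taken from the real file).
sample = ''.join('%X\t%s\n' % (ord(ch), py) for ch, py in
                 [('大', 'da4'), ('山', 'shan1'), ('王', 'wang2'), ('妃', 'fei1')])
table = load_table(sample)
print(to_pinyin('11 11大山 – 王妃.mp3', table))  # 11 11DaShan – WangFei.mp3
```

The conversion is character by character, so no space is added between the track number and the name, and characters missing from the table are left as they are.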
This is the link where I got the mapping file:
