PHP's rename() may not handle Chinese filenames properly; if so, you can convert the filenames as described below.
Before we get started, here is a bash snippet that cleans up the filenames by replacing spaces with underscores:
find . -depth -name '* *' \
| while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done
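Because find descends into subdirectories by default and -depth lists a directory's contents before the directory itself, this also handles nested folders: only the last component of each path is renamed, so the paths still waiting in the pipe stay valid.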
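For readers who prefer to stay in Python, the same cleanup can be sketched with os.walk(topdown=False), which, like find -depth, yields the deepest entries first. The function name underscore_spaces is hypothetical, and rather than prompting the way mv -i does, this sketch simply skips any entry whose underscored name already exists.

```python
import os


def underscore_spaces(root="."):
    """Replace spaces with underscores in every file and directory name under root.

    Walks bottom-up (topdown=False), like `find -depth`, so a directory is
    renamed only after everything inside it has been handled, and the parent
    paths that os.walk already collected stay valid.
    """
    renamed = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            if " " not in name:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, name.replace(" ", "_"))
            if os.path.exists(dst):
                # Never overwrite an existing entry; report it and move on.
                print(f"skipping {src}: {dst} already exists")
                continue
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

Calling underscore_spaces(".") from the folder you want to clean returns the list of (old, new) paths it renamed.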
How to rename Unicode Chinese files to Pinyin?
I searched Google and found no existing tool, which surprised me. The only useful thing I found was a mapping file containing a Unicode-to-Pinyin table. So I had to do it myself and write a script that converts Chinese (Unicode) file names to Pinyin using that mapping file.
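The mapping file can also be loaded once into a dictionary instead of being searched with a regular expression for every character. The sketch below assumes the line format implied by the regex in the script further down: a hexadecimal code point, a tab, lowercase Pinyin letters, then an optional tone digit (for example "5434\twu2"). That format is an inference, not something confirmed by the file's documentation, and load_pinyin_table is a hypothetical helper name.

```python
import re

# Assumed line format (inferred from the script's regex):
#   <hex code point><TAB><lowercase pinyin><tone digit>...
# Only the first reading on each line is kept; the tone digit is dropped.
LINE_RE = re.compile(r"([0-9A-Fa-f]+)\t([a-z]+)\d*")


def load_pinyin_table(lines):
    """Build {code point (int): toneless pinyin} from mapping-file lines."""
    table = {}
    for line in lines:
        m = LINE_RE.match(line)  # match() anchors at the start of the line
        if m:
            table[int(m.group(1), 16)] = m.group(2)
    return table
```

With the real file this would be used as `with open('uni2pinyin', encoding='utf-8') as f: table = load_pinyin_table(f)`; lines that do not fit the assumed format are silently ignored.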
Since I was doing the Python Challenge at the time, I naturally wrote a quick Python script to get the job done.
Here is why I needed it. I have an HDTV that can play music from a USB drive. When I wanted to play the songs I had downloaded from The Voice of China, I ran into a problem: their file names contained many Chinese characters, and the TV apparently cannot display them. It simply leaves those characters out. For example:
01 04张玮 – High歌.mp3
05 09吉克隽逸 – I Fell Good.mp3
I can only see:
01 04 – High.mp3
05 09 – I Fell Good.mp3
Those are still usable, but the following ones are hopeless:
11 11 – .mp3
11 12 – .mp3
11 13 – .mp3
11 14 – .mp3
When choosing songs, I had no idea which was which. Their actual file names are:
11 11大山 – 王妃.mp3
11 12王韵壹 – 你快乐所以我快乐.mp3
11 13金池 – 后知后觉.mp3
11 14吴莫愁 – 痒.mp3
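To see how the two listings relate, here is a small sketch that assumes the TV simply drops every character in the CJK Unified Ideographs block (U+4E00 to U+9FFF, the same range the script below checks). That behaviour is inferred from the listings above, not from the TV's documentation, and tv_display is a hypothetical name.

```python
def is_cjk(ch):
    """True for characters in the CJK Unified Ideographs block."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def tv_display(filename):
    """Approximate what the TV showed: CJK characters silently removed."""
    return "".join(ch for ch in filename if not is_cjk(ch))
```

Under that assumption, tv_display("11 11大山 – 王妃.mp3") yields "11 11 – .mp3", exactly the unreadable name shown earlier.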
Put the mapping file and the script in one folder, put all the files to be renamed in a subfolder named “VoC”, and run the script. I ended up with file names like these; not perfect, but I can tell which songs they are:
11 11 DaShan – WangFei.mp3
11 12 WangYunYi – NiKuaiLeSuoYiWoKuaiLe.mp3
11 13 JinChi – HouZhiHouJue.mp3
11 14 WuMoChou – Yang.mp3
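The per-character substitution the script performs can be isolated into a small function, which makes it easy to try on a single name before touching any files. This is a sketch, not the author's code: to_pinyin is a hypothetical name, and it takes a dictionary from code point to toneless Pinyin (for instance one built from the mapping file) instead of searching the file's text. Like the script, it inserts no separator, so "14吴" becomes "14Wu".

```python
def to_pinyin(filename, table):
    """Replace each CJK ideograph in filename with its capitalized Pinyin.

    table maps a code point (int) to toneless lowercase Pinyin. Characters
    outside the CJK range, or without an entry in table, are kept unchanged.
    """
    out = []
    for ch in filename:
        cp = ord(ch)
        if 0x4E00 <= cp <= 0x9FFF and cp in table:
            out.append(table[cp].title())  # "wu" -> "Wu"
        else:
            out.append(ch)
    return "".join(out)
```

For example, with a table containing entries for 吴, 莫, 愁 and 痒, "11 14吴莫愁 – 痒.mp3" becomes "11 14WuMoChou – Yang.mp3".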
I hope you find my solution helpful. Here is my Python script.
# renameCH2Pinyin.py (updated for Python 3)
# Rename files from Chinese characters to capitalized Pinyin using the
# mapping file, leaving out the tone numbers.
import os
import re

# The file uni2pinyin maps hex code points to Pinyin with a tone number.
# Only its ASCII hex/Pinyin parts are used, so undecodable bytes are replaced.
with open('uni2pinyin', encoding='utf-8', errors='replace') as f:
    wf = f.read()  # read the whole mapping file

os.chdir('VoC')  # rename all files in the subfolder 'VoC'
for x in os.listdir('.'):  # each file name (str is Unicode in Python 3)
    filenamePY = ''
    for y in x:  # each character
        if 0x4e00 <= ord(y) <= 0x9fff:  # CJK Unified Ideographs range
            hexCH = '%X' % ord(y)  # uppercase hex without the leading '0x'
            # Match the code point at the start of a line, then the Pinyin
            # letters; the trailing tone digits stay outside the group.
            mp = re.search('^' + hexCH + r'\t([a-z]+)\d*', wf, re.MULTILINE)
            if mp:
                filenamePY += mp.group(1).title()  # capitalize the Pinyin
            else:
                filenamePY += y  # no entry in the mapping: keep the character
        else:
            filenamePY += y
    print(x)
    print(filenamePY)
    os.rename(x, filenamePY)
os.chdir('..')  # go back to the parent folder
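Because the script renames files immediately, it can be worth previewing the result first. The sketch below is a hypothetical addition (plan_renames and apply_renames are not part of the original script): it computes the (old, new) pairs for a folder using any conversion function you pass in, and refuses to proceed if two files would collide or a target name already exists, since on POSIX systems os.rename silently replaces an existing file.

```python
import os


def plan_renames(folder, convert):
    """Return (old, new) pairs for the files in folder that convert() changes.

    Nothing is renamed here, so the plan can be printed and reviewed first.
    Raises ValueError if two files would get the same new name, or if a new
    name is already taken by a file in the folder.
    """
    names = sorted(os.listdir(folder))
    plan = [(old, convert(old)) for old in names]
    plan = [(old, new) for old, new in plan if new != old]
    targets = [new for _, new in plan]
    if len(set(targets)) != len(targets):
        raise ValueError("two files would get the same new name")
    for _, new in plan:
        if new in names:
            raise ValueError(f"target already exists: {new}")
    return plan


def apply_renames(folder, plan):
    """Carry out a plan produced by plan_renames."""
    for old, new in plan:
        os.rename(os.path.join(folder, old), os.path.join(folder, new))
```

A dry run is then just printing plan_renames('VoC', convert) for whatever convert function you use, and calling apply_renames once the list looks right. The existence check is deliberately conservative: it also rejects rename chains that would technically be safe.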
This is the link where I got the mapping file: