PHP's rename() may not handle Chinese filenames properly. If you run into that, you can convert the filenames as follows.
Before we get started, here's a bash one-liner that cleans the filenames by replacing spaces with underscores:
find . -depth -name '* *' \
| while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done
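The find command above does descend into sub folders, and -depth makes it rename a folder's contents before the folder itself. The Python script further down, by contrast, does not work recursively: it only renames the files directly inside one folder.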
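For readers who would rather stay in Python, the same space-to-underscore cleanup can be sketched with os.walk. This is only a sketch, not part of the original setup: the function name underscore_spaces is made up for illustration, and it chooses to skip a rename when the target name already exists.

```python
import os


def underscore_spaces(root):
    """Replace spaces with underscores in every file and folder name under root."""
    renamed = []
    # topdown=False visits the deepest folders first, so a folder is renamed
    # only after everything inside it has been handled.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            if " " not in name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, name.replace(" ", "_"))
            if os.path.exists(new):
                # Never overwrite an existing entry; leave the original alone.
                continue
            os.rename(old, new)
            renamed.append((old, new))
    return renamed
```

Calling underscore_spaces('.') covers the same ground as the find command; the returned list of (old, new) pairs can be printed to review what changed.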
How to rename Unicode Chinese files to Pinyin?
I googled and found no existing tools, which is rare and even a little weird. The only useful piece of information I got was a mapping file that contains a Unicode-to-Pinyin table. So I had to do it myself and write a script that converts the Unicode Chinese file names to Pinyin using the mapping file.
Since I was doing the Python Challenge at the time, naturally I just scripted something in Python to get the job done.
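To make the idea of the mapping file concrete, here is a minimal Python 3 sketch that turns such a table into a dictionary. It assumes each line holds an uppercase hex code point, a tab, then lowercase pinyin with an optional tone digit; that layout is inferred from the pattern my script searches for, not from any documentation of the file. The name parse_mapping and the two sample lines are made up for illustration.

```python
import re

# Assumed line layout: hex code point, a tab, lowercase pinyin, optional tone digit
LINE_RE = re.compile(r"^([0-9A-F]{4,5})\t([a-z]+)\d*", re.MULTILINE)


def parse_mapping(text):
    """Build a {character: capitalized pinyin} dict from the mapping text."""
    table = {}
    for code, syllable in LINE_RE.findall(text):
        # setdefault keeps the first reading listed for a character
        table.setdefault(chr(int(code, 16)), syllable.title())
    return table


# Hypothetical sample lines, generated from the characters themselves
sample = "".join(
    "%X\t%s\n" % (ord(ch), py) for ch, py in [("大", "da4"), ("山", "shan1")]
)
table = parse_mapping(sample)
print(table["大"] + table["山"])  # DaShan
```

Reading the table once into a dictionary means each character becomes a single lookup instead of a search through the whole file.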
The reason I did this is as follows. I have an HDTV that can play music from a USB drive. When I wanted to play the songs I had downloaded from The Voice of China, I had a problem: the file names of the songs contained many Unicode Chinese characters. The TV obviously doesn’t support Unicode; it just doesn’t display those Chinese characters at all. For example:
01 04张玮 – High歌.mp3
05 09吉克隽逸 – I Fell Good.mp3
I can only see:
01 04 – High.mp3
05 09 – I Fell Good.mp3
If those above are okay, then the following ones are ridiculous:
11 11 – .mp3
11 12 – .mp3
11 13 – .mp3
11 14 – .mp3
I had no idea which was which when I tried to choose the songs. Their actual filenames are as follows:
11 11大山 – 王妃.mp3
11 12王韵壹 – 你快乐所以我快乐.mp3
11 13金池 – 后知后觉.mp3
11 14吴莫愁 – 痒.mp3
I put the mapping file and the script in one folder and all the Unicode-named files to rename in a sub folder “VoC”, then just ran the script. In the end I got file names like these; not perfect, but I am able to tell what songs they are:
11 11 DaShan – WangFei.mp3
11 12 WangYunYi – NiKuaiLeSuoYiWoKuaiLe.mp3
11 13 JinChi – HouZhiHouJue.mp3
11 14 WuMoChou – Yang.mp3
I hope you find my solution helpful. Here is my Python script.
# renameCH2Pinyin.py
# Rename files from Chinese characters to capitalized pinyin using the
# mapping file, taking out the tone numbers
import os
import re

# File uni2pinyin is a mapping from hex to Pinyin with a tone number
f = open('uni2pinyin')
wf = f.read()  # read the whole mapping file
f.close()

os.chdir('voc')  # to rename all files in sub folder 'voc'
myulist = os.listdir(u'.')  # read all file names in unicode mode
for x in myulist:  # each file name
    filenamePY = ''
    for y in x:  # each character
        mp = None
        if 0x4e00 <= ord(y) <= 0x9fff:  # Chinese character Unicode range
            hexCH = hex(ord(y))[2:].upper()  # strip leading '0x', uppercase
            # match the hex code, a tab, then the pinyin letters
            p = re.compile(hexCH + r'\t([a-z]+)[\d]*')
            mp = p.search(wf)
        if mp:
            # get the pinyin without the tone number and capitalize it
            filenamePY += mp.group(1).title()
        else:
            filenamePY += y  # keep characters that have no mapping
    print(x)
    print(filenamePY)
    os.rename(x, filenamePY)
os.chdir('..')  # go back to the parent folder
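One thing worth checking before renaming for real: different Chinese characters can share the same pinyin, so two files may map to the same new name, and on Unix-like systems os.rename replaces an existing file without asking. Below is a rough Python 3 sketch of a dry run that builds the list of renames first and skips clashes. The helper names to_pinyin, plan_renames and apply_plan and the hand-written table are hypothetical, not part of my script.

```python
import os


def to_pinyin(name, table):
    """Swap each character found in table for its pinyin; keep the rest."""
    return "".join(table.get(ch, ch) for ch in name)


def plan_renames(names, table):
    """Return (old, new) pairs, skipping unchanged names and name clashes."""
    taken = set(names)
    plan = []
    for old in names:
        new = to_pinyin(old, table)
        if new == old:
            continue  # nothing to convert
        if new in taken:
            continue  # would collide with another file; skip, don't overwrite
        taken.add(new)
        plan.append((old, new))
    return plan


def apply_plan(folder, plan):
    """Carry out the renames inside folder once the plan looks right."""
    for old, new in plan:
        os.rename(os.path.join(folder, old), os.path.join(folder, new))


# Hypothetical hand-written table covering one file name
table = {"大": "Da", "山": "Shan", "王": "Wang", "妃": "Fei"}
plan = plan_renames(["11 11大山 – 王妃.mp3"], table)
print(plan)
```

Printing the plan first gives a chance to spot odd results; apply_plan('voc', plan) would then do the actual renaming, with plan built from os.listdir('voc').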
This is the link where I got the mapping file: