Unicode file names in Python 2.7

Today I wrote a small script to find and delete duplicate files. To do this task I needed to iterate over files in a specific folder, and calculate md5 checksum for each file:

for folder, subs, files in os.walk(path):
    for filename in files:
        file_path = os.path.join(folder, filename)
	        with open(file_path, 'rb') as fh:
	        	...

If the source folder contains a file or a folder with unicode characters in it, execution of the code results in this bummer:

IOError: [Errno 22] invalid mode ('rb') or filename: 'files\\????????? ????? ????????.txt'

[Read More]