How to display Chinese correctly in Python

Written on April 12, 2020, 2:54 a.m.

There are three ways to make Python read the Chinese characters in a source file correctly. Each one adds a single encoding declaration (PEP 263), which must sit on the first or second line of the file:

#!/usr/bin/python3
#coding:utf-8
print("你好吗")

#!/usr/bin/python3
# -*- coding: utf-8 -*-
print("你好吗")

#!/usr/bin/python3
# vim: set fileencoding=utf-8 :
print("你好吗")
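To check that Python really honours each of these comment lines, you can ask the standard-library function `tokenize.detect_encoding` directly. It applies the PEP 263 rules to at most the first two lines of a file. The sketch below is minimal, and the helper name `declared_encoding` is hypothetical:

```python
import io
import tokenize

# The three encoding declarations shown above, each used as line 2 of a file.
DECLARATIONS = [
    "#coding:utf-8",
    "# -*- coding: utf-8 -*-",
    "# vim: set fileencoding=utf-8 :",
]


def declared_encoding(source: str) -> str:
    """Return the source encoding Python's tokenizer detects for `source`.

    Hypothetical helper: it only wraps tokenize.detect_encoding, which
    looks for a coding declaration in the first two lines.
    """
    readline = io.BytesIO(source.encode("utf-8")).readline
    encoding, _consumed_lines = tokenize.detect_encoding(readline)
    return encoding


for decl in DECLARATIONS:
    src = "#!/usr/bin/python3\n" + decl + '\nprint("你好吗")\n'
    print(f"{decl!r:40} -> {declared_encoding(src)}")

# With no declaration at all, Python 3 falls back to UTF-8 (PEP 3120).
plain = 'print("你好吗")\n'
print(f"{'(no declaration)':40} -> {declared_encoding(plain)}")
```

On Python 3 every line should report `utf-8`. The declaration matters most when a file is saved in another encoding, for example `# coding: gb2312`.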

The GB2312-80 character set is a standard published by the State Bureau of Standardization of the People's Republic of China (PRC) in 1980; it took effect in May 1981. GB2312-80 defines 7,445 characters, including 6,763 Chinese characters.
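Python ships a `gb2312` codec, so you can see GB2312's two-byte encoding directly. The sketch below uses arbitrary sample text. It compares the byte length of the same string in GB2312 and UTF-8, then shows what happens to a character outside GB2312's repertoire (U+3400, chosen only as an example):

```python
text = "你好吗"  # three common simplified characters, all inside GB2312

gb_bytes = text.encode("gb2312")
utf8_bytes = text.encode("utf-8")

print(len(gb_bytes), gb_bytes)      # 2 bytes per Chinese character
print(len(utf8_bytes), utf8_bytes)  # 3 bytes per Chinese character in UTF-8
print(gb_bytes.decode("gb2312"))    # round-trips back to the original text

# U+3400 (CJK Extension A) is not one of GB2312's characters, so encoding fails.
try:
    "\u3400".encode("gb2312")
except UnicodeEncodeError as exc:
    print("not in GB2312:", exc.reason)
```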

Big5 (also written Big-5) is a Chinese character encoding used in Taiwan, Hong Kong, and Macau for traditional Chinese characters. It includes 13,053 Chinese characters.
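Big5 and the GB family assign different byte codes to the same character. As a result, text saved in one and opened as the other turns into mojibake, a common reason Chinese "does not display correctly". Here is a sketch using Python's built-in `big5` and `gbk` codecs; the sample text is arbitrary:

```python
text = "你好嗎"  # traditional form of the greeting, all inside Big5

big5_bytes = text.encode("big5")
gbk_bytes = text.encode("gbk")

print(big5_bytes)  # Big5 byte codes
print(gbk_bytes)   # different byte codes for the same characters

# Decoding with the matching codec restores the text ...
print(big5_bytes.decode("big5"))
# ... decoding with the wrong one garbles it (errors="replace" avoids an exception).
print(big5_bytes.decode("gbk", errors="replace"))
```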

In 2000, GB18030 became the official national standard, replacing GBK 1.0. It contains 27,484 Chinese characters, as well as characters for Tibetan, Mongolian, Uyghur, and other major minority languages. For Chinese characters, GB18030 adds the 6,582 characters of CJK Unified Ideographs Extension A (Unicode code points U+3400 to U+4DB5) to the 20,902 Chinese characters of GB 13000.1, for a total of 27,484.
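The character counts in that paragraph are easy to cross-check. The sketch below recomputes the size of Extension A from its code-point range. It assumes the 20,902 GB 13000.1 characters are the CJK Unified Ideographs U+4E00 to U+9FA5. It then uses Python's `gb18030` and `gbk` codecs to show that an Extension A character such as U+3400 can be encoded in GB18030 but not in GBK:

```python
# CJK Unified Ideographs Extension A, as cited for GB18030: U+3400..U+4DB5.
EXT_A_FIRST, EXT_A_LAST = 0x3400, 0x4DB5
# Assumption: GB 13000.1's 20,902 characters are U+4E00..U+9FA5.
GB13000_HANZI = 0x9FA5 - 0x4E00 + 1

ext_a_count = EXT_A_LAST - EXT_A_FIRST + 1
print(GB13000_HANZI)                # 20902
print(ext_a_count)                  # 6582
print(GB13000_HANZI + ext_a_count)  # 27484

ch = chr(EXT_A_FIRST)  # U+3400
print(ch.encode("gb18030"))  # GB18030 encodes it as a four-byte sequence

try:
    ch.encode("gbk")
except UnicodeEncodeError:
    print("U+3400 is not in GBK")
```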

GB 18030 is a Chinese government standard, titled Information Technology — Chinese coded character set, that defines the language and character support required of software in China. GB18030 is the registered Internet name for the official character set of the People's Republic of China (PRC), superseding GB2312. As a Unicode Transformation Format (i.e., an encoding of all Unicode code points), GB18030 supports both simplified and traditional Chinese characters. It is also compatible with legacy encodings including GB2312, CP936, and GBK 1.0.
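Both properties, full Unicode coverage and GB2312 compatibility, can be observed with Python's `gb18030` codec. In the sketch below, the sample strings are arbitrary picks: simplified, traditional, ASCII, and an emoji outside the Basic Multilingual Plane:

```python
samples = ["你好吗", "你好嗎", "Hello", "\U0001F600"]  # simplified, traditional, ASCII, emoji

for s in samples:
    encoded = s.encode("gb18030")
    assert encoded.decode("gb18030") == s  # lossless round trip for every sample
    print(repr(s), encoded)

# Text that fits in GB2312 produces byte-for-byte identical output in GB18030.
print("你好吗".encode("gb18030") == "你好吗".encode("gb2312"))
```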

Unicode is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard is maintained by the Unicode Consortium, and as of March 2020 the most recent version, Unicode 13.0, contains a repertoire of 143,924 characters (consisting of 143,696 graphic characters, 163 format characters and 65 control characters) covering 154 modern and historic scripts, as well as multiple symbol sets and emoji. The character repertoire of the Unicode Standard is synchronized with ISO/IEC 10646, and both are code-for-code identical.
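In Unicode each character is a single code point, and UTF-8 or UTF-16 are just different byte encodings of it. The sketch below uses the standard `unicodedata` module to print the code point and official name of each character in the sample greeting. It also confirms that the Unicode 13.0 breakdown quoted above adds up to the stated total:

```python
import unicodedata

for ch in "你好吗":
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        len(ch.encode("utf-8")),      # bytes in UTF-8
        len(ch.encode("utf-16-le")),  # bytes in UTF-16 (no BOM)
    )

# Graphic + format + control characters in Unicode 13.0.
print(143_696 + 163 + 65)  # 143924

# Version of the Unicode database bundled with this Python build.
print(unicodedata.unidata_version)
```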

