Effect of GBK or UTF-8 charset on Shanghai Dragon

Figure 2:

Theory needs to be tested in practice, so I ran an experiment on one of my own pages (page address 贵族宝贝sl.zoum5贵族宝贝). The page originally used UTF-8 encoding and was then changed to GBK. On May 5th, a search for the page's keywords showed it included in the first batch of results; today it has disappeared from Shanghai love. Whether changing the encoding affects the weight the page accumulated before the change needs further observation.

I saw someone ask about GBK and UTF-8 for Shanghai Dragon, so here are some personal opinions.

3. While crawling a page, a spider that reads a charset value of GBK can basically treat the site as a Chinese-language website, with no further judgment of the content needed. If the value is UTF-8, further judgment is required (e.g. checking whether the characters in the full text fall within the range of Chinese characters).

2. Open-source programs that use GBK encoding are currently relatively mature.
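The further judgment described for UTF-8 pages can be pictured roughly as follows. This is only a minimal sketch, not how any particular spider works: the function names `cjk_ratio` and `looks_chinese` and the 0.3 threshold are assumptions made for illustration, and the check only covers the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
def cjk_ratio(text: str) -> float:
    """Return the share of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # U+4E00..U+9FFF is the basic CJK Unified Ideographs block.
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def looks_chinese(text: str, threshold: float = 0.3) -> bool:
    """Hypothetical rule: treat the page as Chinese if enough characters are CJK."""
    return cjk_ratio(text) >= threshold
```

With a GBK charset the declared encoding already hints at the language, while with UTF-8 a content check of this kind is needed because UTF-8 can carry text in any language.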

One more note, since GBK and UTF-8 are different encodings: if a website's charset is changed after the site has been indexed, and the spider does not notice the charset change in time while crawling, it will judge the page content to be abnormal, which can lead to the page being K'd (dropped from the index).

Similarly, suppose a page used UTF-8 encoding and has already been included in the search engine. If you then switch it to GBK and the spider does not discover that the value of the charset property has changed, it will keep analyzing the page with the previous encoding. The result will differ greatly from the normal page it indexed before, which can lead to the page being K'd.
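To illustrate why a stale charset makes a page look abnormal, here is a small sketch that decodes the same bytes with the right and the wrong encoding. The sample string and the helper name `decode_as` are assumptions for illustration; a real spider's decoding logic is not described in this post.

```python
def decode_as(raw: bytes, charset: str) -> str:
    """Decode raw page bytes with a given charset, replacing undecodable bytes."""
    return raw.decode(charset, errors="replace")


original = "中文网站"

# The page is now served as GBK, but the spider still assumes UTF-8.
gbk_bytes = original.encode("gbk")
stale_utf8_view = decode_as(gbk_bytes, "utf-8")

# The reverse case: the page is served as UTF-8, but the spider assumes GBK.
utf8_bytes = original.encode("utf-8")
stale_gbk_view = decode_as(utf8_bytes, "gbk")

# Only the matching charset recovers the original text.
correct_view = decode_as(gbk_bytes, "gbk")
```

In both mismatched cases the decoded text no longer equals the original, so a comparison against the previously indexed content would show a large change even though the page itself says the same thing.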

If the site is aimed at Chinese users, I recommend using GBK, for the following reasons:

Take my own forum as an example (the example may not match the actual situation exactly and is only meant as an illustration). As shown in Figure 1, the forum is encoded in GBK and the browser displays it properly.

Figure 1: The charset property
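For readers without the figure, the charset property is usually declared in a `meta` tag in the page head. The following is a minimal sketch of how a crawler might read that value using Python's standard `html.parser`; the class name `CharsetSniffer` and the function `declared_charset` are hypothetical, and real crawlers also consider HTTP headers and byte-level detection.

```python
import re
from html.parser import HTMLParser


class CharsetSniffer(HTMLParser):
    """Remember the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        # HTML5 form: <meta charset="gbk">
        if attr.get("charset"):
            self.charset = attr["charset"].strip().lower()
            return
        # Older form: <meta http-equiv="Content-Type" content="text/html; charset=gbk">
        if (attr.get("http-equiv") or "").lower() == "content-type":
            match = re.search(r"charset=([\w-]+)", attr.get("content") or "", re.I)
            if match:
                self.charset = match.group(1).lower()


def declared_charset(html: str):
    """Return the declared charset in lowercase, or None if none is declared."""
    sniffer = CharsetSniffer()
    sniffer.feed(html)
    return sniffer.charset
```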

If the browser is forced to interpret the page as UTF-8, it looks like Figure 2.

If the site is in a foreign language, choose UTF-8 without hesitation.

By the end of May 13th, the K'd page had been restored to normal.

1. GBK uses two bytes per Chinese character, while UTF-8 uses three. Measured by the number of bytes per Chinese character, GBK therefore saves about one third of the space compared with UTF-8 (put the other way, UTF-8 takes 50% more space than GBK).
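The byte counts can be checked directly. This short sketch uses an arbitrary sample string chosen for illustration; the saving applies only to the Chinese characters themselves, since ASCII characters take one byte in both encodings.

```python
sample = "中文网站"  # four Chinese characters, chosen only as an example

gbk_size = len(sample.encode("gbk"))      # 2 bytes per character
utf8_size = len(sample.encode("utf-8"))   # 3 bytes per character

# Fraction of space saved by GBK relative to UTF-8 for this sample.
saving = 1 - gbk_size / utf8_size

print(f"GBK: {gbk_size} bytes, UTF-8: {utf8_size} bytes, saving: {saving:.0%}")
```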
