Base62 Encoding

February 3, 2016

CPOL

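A Base62 encoding algorithm for Java that reuses the Base64 scheme, with a special escape code for the values that do not fit

Introduction

This article implements a Base62 encoding algorithm that reuses the Base64 scheme.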

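Background

Base62 encoding is usually used for URL shortening, which is a conversion between a base-10 integer and its base-62 representation. Base62 has 62 characters: the 26 uppercase letters A to Z, the 26 lowercase letters a to z, and the 10 digits 0 to 9. It is similar to Base64 except that it excludes +, / and =, which Base64 uses for the values 62 and 63 and for padding.

Most online Base62 resources cover the URL-shortening conversion, which is only a number conversion. This article does not cover that algorithm. Instead, it converts between a binary byte array and its Base62-encoded character array, just as Base64 does.

The Base64 encoding scheme splits every 3 input bytes (24 bits) into four 6-bit groups and maps each group to one character.

Base62 uses the same scheme. Base64 uses 6 bits per group because 6 bits can represent 64 values (0 to 63). Base62 cannot use 6-bit groups directly: 62 lies between 2^5 = 32 and 2^6 = 64, so 62 characters cannot cover all 64 values. To overcome this, one of the 62 code characters is reserved as a prefix flag. Here the last character, '9', is used: it indicates that the character following it represents one of the 6-bit values 61, 62, or 63. In summary, for any 6-bit value in [0, 63]: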

0~60 : One character from "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz012345678"
61   : Two characters '9A'
62   : Two characters '9B'
63   : Two characters '9C'
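
The mapping above can be sketched as a small helper. This is only an illustration: the class name SixBitCode and the method encodeUnit are hypothetical and are not part of the article's Base62 class, although the logic mirrors its Append method.

```java
// Hypothetical helper illustrating the 6-bit value to character mapping.
final class SixBitCode {
    private static final String CODES =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final char FLAG = '9';

    // Returns one character for 0..60 and the flag plus 'A'..'C' for 61..63.
    static String encodeUnit(int value) {
        if (value < 0 || value > 63) {
            throw new IllegalArgumentException("Not a 6-bit value: " + value);
        }
        if (value < 61) {
            return String.valueOf(CODES.charAt(value));
        }
        return "" + FLAG + CODES.charAt(value - 61);
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 60, 61, 62, 63}) {
            System.out.println(v + " -> " + encodeUnit(v));
        }
    }
}
```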

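Encoding binary data as plain characters usually has a side effect: the encoded content is larger in bytes. The reason is that mapping a byte to Base16, Base32, Base64, and so on maps 256 possible values onto an alphabet of only 16, 32, or 64 symbols. For example, representing one byte takes at least two hex digits (Base16).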
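
The size growth can be estimated with a few lines of arithmetic. The sketch below is an illustration only: the names are hypothetical, padding and partial groups are ignored, and the Base62 figure assumes that all 64 six-bit values are equally likely, which real data may not satisfy.

```java
// Illustrative arithmetic only; class and method names are hypothetical.
final class ExpansionRatio {
    // Output characters per input byte for a power-of-two base (16, 32, 64).
    static double charsPerByte(int base) {
        if (base < 2 || Integer.bitCount(base) != 1) {
            throw new IllegalArgumentException("base must be a power of two");
        }
        int bitsPerChar = Integer.numberOfTrailingZeros(base); // log2(base)
        return 8.0 / bitsPerChar;
    }

    // Expected characters per byte for the '9' flag scheme,
    // ASSUMING uniformly distributed 6-bit values.
    static double expectedBase62Ratio() {
        // 61 values cost one character, 3 values cost two characters.
        double charsPerUnit = (61 * 1 + 3 * 2) / 64.0;
        return (4.0 / 3.0) * charsPerUnit;
    }

    public static void main(String[] args) {
        System.out.println("Base16: " + charsPerByte(16));
        System.out.println("Base32: " + charsPerByte(32));
        System.out.println("Base64: " + charsPerByte(64));
        System.out.println("Base62 (uniform input): " + expectedBase62Ratio());
    }
}
```

Under that uniformity assumption the flag scheme costs 67/64 characters per 6-bit unit, so it stays close to Base64's 4/3 characters per byte.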

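To illustrate this, the download package (folder EncodeTest4Bits) includes a 4-bit Base62 encoding test, which produces an encoded plain-text file twice the size of the input. This is the 4-bit Base62 encoding table: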

String[] Base62EncodeTable = {
"00","01","02","03","04","05","06","07","08","09","0a","0b","0c","0d","0e","0f","0g","0h",
"0i","0j","0k","0l","0m","0n","0o","0p","0q","0r","0s","0t","0u","0v","0w","0x","0y","0z",
"0A","0B","0C","0D","0E","0F","0G","0H","0I","0J","0K","0L","0M","0N","0O","0P","0Q","0R",
"0S","0T","0U","0V","0W","0X","0Y","0Z","10","11","12","13","14","15","16","17","18","19",
"1a","1b","1c","1d","1e","1f","1g","1h","1i","1j","1k","1l","1m","1n","1o","1p","1q","1r",
"1s","1t","1u","1v","1w","1x","1y","1z","1A","1B","1C","1D","1E","1F","1G","1H","1I","1J",
"1K","1L","1M","1N","1O","1P","1Q","1R","1S","1T","1U","1V","1W","1X","1Y","1Z","20","21",
"22","23","24","25","26","27","28","29","2a","2b","2c","2d","2e","2f","2g","2h","2i","2j",
"2k","2l","2m","2n","2o","2p","2q","2r","2s","2t","2u","2v","2w","2x","2y","2z","2A","2B",
"2C","2D","2E","2F","2G","2H","2I","2J","2K","2L","2M","2N","2O","2P","2Q","2R","2S","2T",
"2U","2V","2W","2X","2Y","2Z","30","31","32","33","34","35","36","37","38","39","3a","3b",
"3c","3d","3e","3f","3g","3h","3i","3j","3k","3l","3m","3n","3o","3p","3q","3r","3s","3t",
"3u","3v","3w","3x","3y","3z","3A","3B","3C","3D","3E","3F","3G","3H","3I","3J","3K","3L",
"3M","3N","3O","3P","3Q","3R","3S","3T","3U","3V","3W","3X","3Y","3Z","40","41","42","43",
"44","45","46","47"
};
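
The 256 entries listed above appear to be each byte value written as two base-62 digits over the alphabet 0-9, a-z, A-Z. Under that reading, the table could be generated rather than typed out. The sketch below uses hypothetical names (ByteTable, build) and is not taken from the download package.

```java
// Hypothetical generator for the two-character-per-byte table.
final class ByteTable {
    // Digit alphabet inferred from the listed entries: 0-9, then a-z, then A-Z.
    private static final String DIGITS =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Builds one two-digit base-62 string for every byte value 0..255.
    static String[] build() {
        String[] table = new String[256];
        for (int v = 0; v < 256; v++) {
            table[v] = "" + DIGITS.charAt(v / 62) + DIGITS.charAt(v % 62);
        }
        return table;
    }

    public static void main(String[] args) {
        String[] table = build();
        System.out.println(table[0] + " " + table[61] + " " + table[62] + " " + table[255]);
    }
}
```

Because every byte always becomes exactly two characters, the output is exactly twice the size of the input, which matches the doubling described above.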

Using the Code

Encode a byte array:

byte[] buf = new byte[rFileLength];   
String encodedStr = Base62.base62Encode(buf);

Decode a char array:

char[] chars = new char[len];
byte[] decodedArr = Base62.base62Decode(chars);

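Here is the Base62 class that implements the algorithm. The encoding function is mostly taken from Base64 reference code. The differences are that Base62 has no padding and that the special flag character '9' is added as a prefix for the values 61, 62, and 63.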

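The encoding part processes the input bytes sequentially in groups of 3. For the main part of the binary input, it converts each 3 bytes into 4 code units, one for each 6-bit unit in the group (a unit with value 61, 62, or 63 is written as two characters). At the tail of the input the group may be incomplete: with two bytes left, the third unit is the last one; with one byte left, the second unit is the last one.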
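
The grouping step can be shown in isolation. The sketch below only splits bytes into 6-bit units and does not map them to characters. SixBitSplitter and split are hypothetical names; the bit operations follow the same pattern as base62Encode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits bytes into 6-bit units, 3 bytes -> 4 units.
final class SixBitSplitter {
    static List<Integer> split(byte[] in) {
        List<Integer> units = new ArrayList<>();
        for (int i = 0; i < in.length; i += 3) {
            int b0 = in[i] & 0xFF;
            units.add(b0 >> 2);                    // unit 1: top 6 bits of byte 0
            int carry = (b0 & 0x03) << 4;          // low 2 bits of byte 0
            if (i + 1 >= in.length) {
                units.add(carry);                  // tail: unit 2 is the last
                break;
            }
            int b1 = in[i + 1] & 0xFF;
            units.add(carry | (b1 >> 4));          // unit 2
            carry = (b1 & 0x0F) << 2;              // low 4 bits of byte 1
            if (i + 2 >= in.length) {
                units.add(carry);                  // tail: unit 3 is the last
                break;
            }
            int b2 = in[i + 2] & 0xFF;
            units.add(carry | (b2 >> 6));          // unit 3
            units.add(b2 & 0x3F);                  // unit 4: low 6 bits of byte 2
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(split(new byte[] {'M', 'a', 'n'}));
        System.out.println(split(new byte[] {'M', 'a'}));
        System.out.println(split(new byte[] {'M'}));
    }
}
```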

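The decoding part processes the input characters sequentially in groups of 4, not counting the CODEFLAG '9'. For example, if the input begins with '9C9Aj9C', the first 4-character group is 'CAjC'. For the main part of the input, each 4-character group is converted to 3 bytes. After the main loop, the remaining tail of fewer than 4 characters is processed.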
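
The flag handling can be looked at separately from the byte reassembly. The sketch below turns an encoded string into its list of 6-bit values. FlagParser and toUnits are hypothetical names, and unlike the article's decoder this version rejects malformed input, which is an added assumption about the desired behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: resolves the '9' flag and returns the 6-bit values.
final class FlagParser {
    private static final String CODES =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final char FLAG = '9';

    static List<Integer> toUnits(String encoded) {
        List<Integer> units = new ArrayList<>();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c != FLAG) {
                // Regular character: its index in CODES is the value 0..60.
                int idx = CODES.indexOf(c);
                if (idx < 0) {
                    throw new IllegalArgumentException("Invalid character: " + c);
                }
                units.add(idx);
            } else {
                // Flag: the next character must be 'A', 'B' or 'C' (61, 62, 63).
                if (i + 1 >= encoded.length()) {
                    throw new IllegalArgumentException("Flag at end of input");
                }
                char next = encoded.charAt(++i);
                if (next < 'A' || next > 'C') {
                    throw new IllegalArgumentException("Invalid flagged character: " + next);
                }
                units.add(61 + (next - 'A'));
            }
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(toUnits("9C9Aj9C"));
    }
}
```

For the input '9C9Aj9C' this yields four units, which is why those seven characters count as one 4-character group.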

Finally, the famous Lena image is used for testing, to recall the good old days. :)

Base62.Java

import java.util.ArrayList;
import java.util.Map;
import java.util.HashMap;

public class Base62
{    
    private static final String CODES =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    private static final char CODEFLAG = '9';
    
    private static StringBuilder out = new StringBuilder();
    
    private static Map<Character, Integer> CODEMAP = new HashMap<Character, Integer>();        
    
    private static void Append(int b)
    {
        if(b < 61)
        {
            out.append(CODES.charAt(b));
        }
        else
        {            
            out.append(CODEFLAG);
                
            out.append(CODES.charAt(b-61));
        }
    }    
    
    public static String base62Encode(byte[] in)       
    {              
        // Reset output StringBuilder
        out.setLength(0);
        
        // Current 6-bit value to emit
        int b;        
        
        // Loop with 3 bytes as a group
        for (int i = 0; i < in.length; i += 3)  {
            
            // #1 char
            b = (in[i] & 0xFC) >> 2;                
            Append(b);
            
            b = (in[i] & 0x03) << 4;
            if (i + 1 < in.length)      {
                
                // #2 char
                b |= (in[i + 1] & 0xF0) >> 4;
                Append(b);
                
                b = (in[i + 1] & 0x0F) << 2;
                if (i + 2 < in.length)  
                {
                    
                    // #3 char
                    b |= (in[i + 2] & 0xC0) >> 6;
                    Append(b);
                    
                    // #4 char
                    b = in[i + 2] & 0x3F;
                    Append(b);                    
                }
                else  
                {         
                    // #3 char, last char
                    Append(b);                                        
                }
            }
            else
            {      
                // #2 char, last char
                Append(b);                
            }
        }

        return out.toString();
    }
    
    public static byte[] base62Decode(char[] inChars)    {
                
        // Map for special code followed by CODEFLAG '9' and its code index
        CODEMAP.put('A', 61);
        CODEMAP.put('B', 62);
        CODEMAP.put('C', 63);        
        
        ArrayList<Byte> decodedList = new ArrayList<Byte>();
        
        // 6 bits bytes
        int[] unit = new int[4];
        
        int inputLen = inChars.length;
        
        // char counter
        int n = 0;
        
        // unit counter
        int m = 0;
        
        // regular char
        char ch1 = 0;
        
        // special char
        char ch2 = 0;  
        
        Byte b = 0;
        
        while (n < inputLen)
        {            
            ch1 = inChars[n];
            if (ch1 != CODEFLAG)
            {
                // regular code                
                unit[m] = CODES.indexOf(ch1);
                m++;
                n++;
            }
            else
            {
                n++;
                if(n < inputLen)
                {
                    ch2 = inChars[n];
                    if(ch2 != CODEFLAG)
                    {
                        // special code index 61, 62, 63                                  
                        unit[m] = CODEMAP.get(ch2);
                        m++;
                        n++;
                    }
                }
            }        
            
            // Add regular bytes with 3 bytes group composed from 4 units with 6 bits.
            if(m == 4)
            {                
                // Byte.valueOf replaces the deprecated Byte constructor
                b = Byte.valueOf((byte) ((unit[0] << 2) | (unit[1] >> 4)));
                decodedList.add(b);
                b = Byte.valueOf((byte) ((unit[1] << 4) | (unit[2] >> 2)));
                decodedList.add(b);                    
                b = Byte.valueOf((byte) ((unit[2] << 6) | unit[3]));
                decodedList.add(b);
                
                // Reset unit counter
                m = 0;
            }
        }
        
        // Add tail bytes group less than 4 units
        if(m != 0)
        {
            if(m == 1)
            {
                // A single leftover unit holds only 6 bits; base62Encode never produces this case
                b = Byte.valueOf((byte) ((unit[0] << 2) ));
                decodedList.add(b);
            }
            else if(m == 2)
            {
                // Two units: one trailing byte
                b = Byte.valueOf((byte) ((unit[0] << 2) | (unit[1] >> 4)));
                decodedList.add(b);
            }
            else if (m == 3)
            {
                // Three units: two trailing bytes
                b = Byte.valueOf((byte) ((unit[0] << 2) | (unit[1] >> 4)));
                decodedList.add(b);
                b = Byte.valueOf((byte) ((unit[1] << 4) | (unit[2] >> 2)));
                decodedList.add(b);
            }
        }
        
        Byte[] decodedObj = decodedList.toArray(new Byte[decodedList.size()]);
        
        byte[] decoded = new byte[decodedObj.length];

        // Convert object Byte array to primitive byte array
        for(int i = 0; i < decodedObj.length; i++) {
            decoded[i] = (byte)decodedObj[i];
            
        }
        
        return decoded;
    }    
}

Points of Interest

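The current method simply uses the fixed values 61, 62, and 63 as the escaped (two-character) group. This could be improved by applying statistical analysis to the input to find which three 6-bit values occur least often, and then using those three values in place of the current group. That way, the growth in encoded size would come as close as possible to that of Base64.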
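
A rough sketch of that analysis follows. It counts how often each 6-bit value occurs in the input and picks the three rarest. All names here (UnitHistogram, count, threeRarest, encodedLength) are hypothetical, and the sketch leaves open how the chosen values would be communicated to the decoder, which the article does not address.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical analysis helper; not part of the article's Base62 class.
final class UnitHistogram {
    // Counts occurrences of each 6-bit value produced by the 3-byte grouping.
    static int[] count(byte[] in) {
        int[] freq = new int[64];
        for (int i = 0; i < in.length; i += 3) {
            int b0 = in[i] & 0xFF;
            int b1 = (i + 1 < in.length) ? (in[i + 1] & 0xFF) : 0;
            int b2 = (i + 2 < in.length) ? (in[i + 2] & 0xFF) : 0;
            freq[b0 >> 2]++;
            freq[((b0 & 0x03) << 4) | (b1 >> 4)]++;
            if (i + 1 < in.length) freq[((b1 & 0x0F) << 2) | (b2 >> 6)]++;
            if (i + 2 < in.length) freq[b2 & 0x3F]++;
        }
        return freq;
    }

    // Returns the three values with the lowest counts (ties go to smaller values).
    static int[] threeRarest(int[] freq) {
        Integer[] order = new Integer[64];
        for (int v = 0; v < 64; v++) order[v] = v;
        Arrays.sort(order, Comparator.comparingInt((Integer v) -> freq[v]));
        return new int[] {order[0], order[1], order[2]};
    }

    // Encoded length in characters if the given values need two characters each.
    static int encodedLength(int[] freq, int[] escaped) {
        int total = 0;
        for (int f : freq) total += f;
        for (int v : escaped) total += freq[v];
        return total;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        int[] freq = count(data);
        System.out.println("fixed group: " + encodedLength(freq, new int[] {61, 62, 63}));
        System.out.println("rarest group: " + encodedLength(freq, threeRarest(freq)));
    }
}
```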

To verify that the decoded image is identical to the original, I suggest using the Notepad++ HEX-Editor plugin.
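
As an alternative to a manual check, the comparison could also be done in code. This is a minimal sketch, assuming both files fit in memory; FileCompare and sameContent are hypothetical names, and the file paths are supplied by the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: byte-for-byte comparison of two files.
final class FileCompare {
    static boolean sameContent(Path a, Path b) throws IOException {
        // Reads both files fully; suitable for small test images only.
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: FileCompare <original> <decoded>");
            return;
        }
        boolean same = sameContent(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(same ? "identical" : "different");
    }
}
```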

History

  • February 3, 2016: Initial version
