Base62 Encoding





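A Base62 encoding algorithm in Java that uses a special custom prefix code so it can reuse the Base64 scheme
Introduction
This article implements a Base62 encoding algorithm that reuses the Base64 scheme.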
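Background
Base62 encoding is usually used for URL shortening, where it is a conversion between a base-10 integer and its base-62 representation. Base62 has 62 characters: the 26 uppercase letters A to Z, the 26 lowercase letters a to z, and the 10 digits 0 to 9. It is similar to Base64, except that it excludes +, / and =, which Base64 uses for the values 62 and 63 and for padding.
Most online Base62 resources cover the URL-shortening conversion, which is only a number conversion. This article does not address that algorithm. Instead, it converts between a binary byte array and its Base62-encoded character array, just as Base64 does.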
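[Figure: the Base64 encoding scheme]
Base62 uses the same scheme. Base64 works in 6-bit groups because 6 bits hold exactly 64 values (0 to 63). Base62 seemingly cannot use 6-bit groups, because 62 lies between 32 (the number of 5-bit values) and 64 (the number of 6-bit values), so 62 single characters cannot cover every 6-bit value. To overcome this, one of the 62 code characters is reserved as a prefix flag. Here the last character, '9', is used: a '9' indicates that the character following it stands for one of the 6-bit values 61, 62 or 63. In summary, for any 6-bit value in [0, 63]: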
0~60 : one character from "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz012345678"
61 : two characters, '9A'
62 : two characters, '9B'
63 : two characters, '9C'
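The mapping above can be sketched as a small lookup function. This is a minimal illustration, not the article's class: the names `SixBitMapping` and `encodeUnit` are hypothetical, while the alphabet and the '9' flag are the ones listed above.

```java
class SixBitMapping {
    // Alphabet from the article; its last character '9' is reserved as the prefix flag.
    static final String CODES =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    static final char FLAG = '9';

    // Hypothetical helper: returns the one- or two-character code for a 6-bit value.
    static String encodeUnit(int value) {
        if (value < 0 || value > 63) {
            throw new IllegalArgumentException("not a 6-bit value: " + value);
        }
        if (value < 61) {
            // 0..60 map directly to a single character
            return String.valueOf(CODES.charAt(value));
        }
        // 61..63 become the flag followed by 'A', 'B' or 'C'
        return "" + FLAG + CODES.charAt(value - 61);
    }

    public static void main(String[] args) {
        System.out.println(encodeUnit(0));   // A
        System.out.println(encodeUnit(60));  // 8
        System.out.println(encodeUnit(61));  // 9A
        System.out.println(encodeUnit(63));  // 9C
    }
}
```

Because '9' never appears as a single-character code, a decoder can always tell a flagged pair from a regular character.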
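Encoding binary data as plain characters usually has the side effect that the encoded content is larger in bytes. The reason is that mapping a byte to Base16, Base32, Base64 and so on actually maps 256 values onto 16, 32 or 64 values. For example, representing one byte takes at least two hex digits (Base16).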
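To make the size growth concrete, the following sketch computes encoded lengths for an input of n bytes. The class and method names are hypothetical. The Base62 bounds are derived from the prefix scheme described in this article (each 6-bit unit costs one or two characters) and are not taken from the article's code.

```java
import java.util.Base64;

class ExpansionDemo {
    // Base16 emits two hex digits per byte.
    static int hexLength(int n) { return 2 * n; }

    // Padded Base64 emits four characters per (possibly partial) 3-byte group.
    static int base64Length(int n) { return 4 * ((n + 2) / 3); }

    // Number of 6-bit units needed for n bytes without padding: ceil(8n / 6).
    static int sixBitUnits(int n) { return (8 * n + 5) / 6; }

    // With the '9' prefix scheme, every unit costs one character at best...
    static int base62MinLength(int n) { return sixBitUnits(n); }

    // ...and two characters at worst (every unit is 61, 62 or 63).
    static int base62MaxLength(int n) { return 2 * sixBitUnits(n); }

    public static void main(String[] args) {
        int n = 300;
        System.out.println(hexLength(n));       // 600
        System.out.println(base64Length(n));    // 400
        System.out.println(base62MinLength(n)); // 400
        System.out.println(base62MaxLength(n)); // 800
        // Cross-check the Base64 figure against the JDK encoder.
        System.out.println(Base64.getEncoder().encodeToString(new byte[n]).length()); // 400
    }
}
```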
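To illustrate this, the download package (folder EncodeTest4Bits) includes a 4-bit Base62 encoding test, which produces an encoded plain-text file twice the size of the input. This is the 4-bit Base62 encoding table: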
String[] Base62EncodeTable = {
"00","01","02","03","04","05","06","07","08","09","0a","0b","0c","0d","0e","0f","0g","0h",
"0i","0j","0k","0l","0m","0n","0o","0p","0q","0r","0s","0t","0u","0v","0w","0x","0y","0z",
"0A","0B","0C","0D","0E","0F","0G","0H","0I","0J","0K","0L","0M","0N","0O","0P","0Q","0R",
"0S","0T","0U","0V","0W","0X","0Y","0Z","10","11","12","13","14","15","16","17","18","19",
"1a","1b","1c","1d","1e","1f","1g","1h","1i","1j","1k","1l","1m","1n","1o","1p","1q","1r",
"1s","1t","1u","1v","1w","1x","1y","1z","1A","1B","1C","1D","1E","1F","1G","1H","1I","1J",
"1K","1L","1M","1N","1O","1P","1Q","1R","1S","1T","1U","1V","1W","1X","1Y","1Z","20","21",
"22","23","24","25","26","27","28","29","2a","2b","2c","2d","2e","2f","2g","2h","2i","2j",
"2k","2l","2m","2n","2o","2p","2q","2r","2s","2t","2u","2v","2w","2x","2y","2z","2A","2B",
"2C","2D","2E","2F","2G","2H","2I","2J","2K","2L","2M","2N","2O","2P","2Q","2R","2S","2T",
"2U","2V","2W","2X","2Y","2Z","30","31","32","33","34","35","36","37","38","39","3a","3b",
"3c","3d","3e","3f","3g","3h","3i","3j","3k","3l","3m","3n","3o","3p","3q","3r","3s","3t",
"3u","3v","3w","3x","3y","3z","3A","3B","3C","3D","3E","3F","3G","3H","3I","3J","3K","3L",
"3M","3N","3O","3P","3Q","3R","3S","3T","3U","3V","3W","3X","3Y","3Z","40","41","42","43",
"44","45","46","47"
};
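The 256 entries of this table follow a regular pattern: entry i is the two-digit base-62 representation of i, using the digit order 0-9, a-z, A-Z. A minimal sketch that builds the same table programmatically is shown below. The names `FourBitTable`, `DIGITS` and `buildTable` are hypothetical and are not part of the download package.

```java
class FourBitTable {
    // Digit order observed in the table above: 0-9, then a-z, then A-Z.
    static final String DIGITS =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Builds the two-character code for every byte value 0..255.
    static String[] buildTable() {
        String[] table = new String[256];
        for (int i = 0; i < table.length; i++) {
            // High digit is i / 62, low digit is i % 62.
            table[i] = "" + DIGITS.charAt(i / 62) + DIGITS.charAt(i % 62);
        }
        return table;
    }

    public static void main(String[] args) {
        String[] table = buildTable();
        System.out.println(table[0]);   // 00
        System.out.println(table[36]);  // 0A
        System.out.println(table[62]);  // 10
        System.out.println(table[255]); // 47
    }
}
```

Every input byte becomes exactly two characters, which is why the encoded file is twice the size of the input.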
Using the Code
Encode a byte array:
byte[] buf = new byte[rFileLength];
String encodedStr = Base62.base62Encode(buf);
Decode a char array:
char[] chars = new char[len];
byte[] decodedArr = Base62.base62Decode(chars);
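Here is the Base62 class that implements the algorithm. The encoding function is mostly taken from Base64 reference code. The differences are that Base62 has no padding, and that the special flag character '9' is added as a prefix for the values 61, 62 and 63.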
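The encoder processes the input bytes sequentially in 3-byte groups. For the main part of the binary input, it converts each 3 bytes into 4 codes, one per 6-bit unit of the 3-byte group (a code is two characters when the unit is 61, 62 or 63). For the tail of the binary input, the last code is either the third one (2 bytes left) or the second one (1 byte left).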
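The bit arithmetic for one full 3-byte group can be sketched in isolation. The masks and shifts below are the same ones used in the class listing. The names `GroupSplit` and `split` are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;

class GroupSplit {
    // Splits 3 bytes (24 bits) into four 6-bit units, most significant bits first.
    static int[] split(byte b0, byte b1, byte b2) {
        int u0 = (b0 & 0xFC) >> 2;                         // top 6 bits of b0
        int u1 = ((b0 & 0x03) << 4) | ((b1 & 0xF0) >> 4);  // low 2 of b0, top 4 of b1
        int u2 = ((b1 & 0x0F) << 2) | ((b2 & 0xC0) >> 6);  // low 4 of b1, top 2 of b2
        int u3 = b2 & 0x3F;                                // low 6 bits of b2
        return new int[] {u0, u1, u2, u3};
    }

    public static void main(String[] args) {
        // "Man" gives units that are all below 61, so each becomes one character.
        System.out.println(Arrays.toString(split((byte) 'M', (byte) 'a', (byte) 'n')));
        // prints [19, 22, 5, 46]

        // 0xFF 0xD8 0xFF gives three units in 61..63, each of which needs the '9' prefix.
        System.out.println(Arrays.toString(split((byte) 0xFF, (byte) 0xD8, (byte) 0xFF)));
        // prints [63, 61, 35, 63]
    }
}
```

With the article's alphabet, the units 63, 61, 35, 63 are written as '9C', '9A', 'j' and '9C', so this 3-byte group takes seven characters instead of four.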
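The decoder processes the input characters sequentially in groups of 4 codes, where the CODEFLAG '9' itself is not counted. For example, the first 4-code group 'CAjC' comes from the input '9C9Aj9C'. For the main part of the character input, each 4-code group is converted into 3 bytes. After the main loop, the remaining tail of fewer than 4 codes is processed.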
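The way the flag is skipped while collecting 6-bit units can be sketched on its own. The names `UnitReader` and `toUnits` are hypothetical. Unlike the decoder in the class listing, this sketch rejects malformed input with an exception, which is an added assumption and not the article's behavior.

```java
import java.util.ArrayList;
import java.util.List;

class UnitReader {
    static final String CODES =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    static final char FLAG = '9';

    // Converts encoded text into its sequence of 6-bit values.
    static List<Integer> toUnits(String text) {
        List<Integer> units = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == FLAG) {
                // The flag is not a unit itself; the next character selects 61, 62 or 63.
                if (i + 1 >= text.length()) {
                    throw new IllegalArgumentException("dangling flag at end of input");
                }
                int idx = "ABC".indexOf(text.charAt(++i));
                if (idx < 0) {
                    throw new IllegalArgumentException("invalid character after flag");
                }
                units.add(61 + idx);
            } else {
                int idx = CODES.indexOf(c);
                if (idx < 0) {
                    throw new IllegalArgumentException("invalid character: " + c);
                }
                units.add(idx);
            }
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(toUnits("9C9Aj9C")); // [63, 61, 35, 63]
    }
}
```

The seven input characters yield exactly four units, which the decoder then packs into 3 bytes.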
Finally, the famous Lena image is used for testing, in memory of the good old days. :)
Base62.java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class Base62
{
    private static final String CODES =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final char CODEFLAG = '9';

    // Maps the character that follows CODEFLAG '9' to its 6-bit value
    private static final Map<Character, Integer> CODEMAP = new HashMap<Character, Integer>();
    static
    {
        CODEMAP.put('A', 61);
        CODEMAP.put('B', 62);
        CODEMAP.put('C', 63);
    }

    // Appends the code for one 6-bit value: one character for 0..60,
    // CODEFLAG followed by 'A', 'B' or 'C' for 61..63
    private static void Append(StringBuilder out, int b)
    {
        if (b < 61)
        {
            out.append(CODES.charAt(b));
        }
        else
        {
            out.append(CODEFLAG);
            out.append(CODES.charAt(b - 61));
        }
    }

    public static String base62Encode(byte[] in)
    {
        // Local builder, so that concurrent calls do not share state
        StringBuilder out = new StringBuilder();
        int b;
        // Loop with 3 bytes as a group
        for (int i = 0; i < in.length; i += 3) {
            // #1 char
            b = (in[i] & 0xFC) >> 2;
            Append(out, b);
            b = (in[i] & 0x03) << 4;
            if (i + 1 < in.length) {
                // #2 char
                b |= (in[i + 1] & 0xF0) >> 4;
                Append(out, b);
                b = (in[i + 1] & 0x0F) << 2;
                if (i + 2 < in.length)
                {
                    // #3 char
                    b |= (in[i + 2] & 0xC0) >> 6;
                    Append(out, b);
                    // #4 char
                    b = in[i + 2] & 0x3F;
                    Append(out, b);
                }
                else
                {
                    // #3 char, last char
                    Append(out, b);
                }
            }
            else
            {
                // #2 char, last char
                Append(out, b);
            }
        }
        return out.toString();
    }

    // Expects input produced by base62Encode; other characters are not validated
    public static byte[] base62Decode(char[] inChars) {
        ArrayList<Byte> decodedList = new ArrayList<Byte>();
        // 6-bit units of the current group
        int[] unit = new int[4];
        int inputLen = inChars.length;
        // char counter
        int n = 0;
        // unit counter
        int m = 0;
        // regular char
        char ch1 = 0;
        // special char
        char ch2 = 0;
        while (n < inputLen)
        {
            ch1 = inChars[n];
            if (ch1 != CODEFLAG)
            {
                // regular code
                unit[m] = CODES.indexOf(ch1);
                m++;
                n++;
            }
            else
            {
                n++;
                if (n < inputLen)
                {
                    ch2 = inChars[n];
                    if (ch2 != CODEFLAG)
                    {
                        // special code index 61, 62, 63
                        unit[m] = CODEMAP.get(ch2);
                        m++;
                        n++;
                    }
                }
            }
            // Every 4 units of 6 bits form a regular group of 3 bytes
            if (m == 4)
            {
                decodedList.add((byte) ((unit[0] << 2) | (unit[1] >> 4)));
                decodedList.add((byte) ((unit[1] << 4) | (unit[2] >> 2)));
                decodedList.add((byte) ((unit[2] << 6) | unit[3]));
                // Reset unit counter
                m = 0;
            }
        }
        // Add the tail bytes from a group of fewer than 4 units
        if (m == 1)
        {
            // A single leftover unit is never produced by base62Encode;
            // this branch is kept from the original code
            decodedList.add((byte) (unit[0] << 2));
        }
        else if (m == 2)
        {
            decodedList.add((byte) ((unit[0] << 2) | (unit[1] >> 4)));
        }
        else if (m == 3)
        {
            decodedList.add((byte) ((unit[0] << 2) | (unit[1] >> 4)));
            decodedList.add((byte) ((unit[1] << 4) | (unit[2] >> 2)));
        }
        // Convert the Byte list to a primitive byte array
        byte[] decoded = new byte[decodedList.size()];
        for (int i = 0; i < decoded.length; i++) {
            decoded[i] = decodedList.get(i);
        }
        return decoded;
    }
}
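Points of Interest
The current approach simply uses the fixed group of index values 61, 62 and 63 as the special (two-character) codes. This could be improved by applying statistical analysis to the input: find the group of three 6-bit values that occurs least often, then use that group instead of the current one. In this way, the size inflation of the encoding comes as close as possible to that of Base64.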
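A minimal sketch of such an analysis is shown below, assuming it means counting how often each 6-bit value occurs and picking the three rarest. The names `UnitHistogram`, `histogram` and `leastFrequent` are hypothetical. The sketch only performs the counting; a real encoder would also have to tell the decoder which three values were chosen, which the article does not specify.

```java
import java.util.Arrays;
import java.util.Comparator;

class UnitHistogram {
    // Counts how often each 6-bit value occurs when the input is read as a
    // bit stream, with the final partial unit padded with zero bits on the right.
    static int[] histogram(byte[] in) {
        int[] counts = new int[64];
        int acc = 0;   // unconsumed bits
        int bits = 0;  // number of unconsumed bits in acc
        for (byte x : in) {
            acc = (acc << 8) | (x & 0xFF);
            bits += 8;
            while (bits >= 6) {
                bits -= 6;
                counts[(acc >> bits) & 0x3F]++;
            }
            acc &= (1 << bits) - 1; // keep only the leftover bits
        }
        if (bits > 0) {
            counts[(acc << (6 - bits)) & 0x3F]++;
        }
        return counts;
    }

    // Returns the k values with the lowest counts; ties go to the smaller value.
    static int[] leastFrequent(int[] counts, int k) {
        Integer[] idx = new Integer[counts.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingInt((Integer i) -> counts[i])
                                   .thenComparingInt(i -> i));
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            result[i] = idx[i];
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        int[] counts = histogram(data);
        System.out.println(counts[63]);                                // 4
        System.out.println(Arrays.toString(leastFrequent(counts, 3))); // [0, 1, 2]
    }
}
```

For this all-ones input, assigning the two-character codes to 0, 1 and 2 instead of 61, 62 and 63 would avoid every prefix.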
To verify that the decoded image is identical to the original, I suggest using the Notepad++ HEX editor plugin.
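As an alternative to inspecting the files by eye, the comparison can also be done in code. The following is a minimal sketch using only standard JDK calls. The names `FileCompare`, `sameBytes` and `writeTemp` are hypothetical, and the temporary files merely stand in for the original and decoded images.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class FileCompare {
    // Returns true when both files contain exactly the same bytes.
    static boolean sameBytes(Path a, Path b) {
        try {
            return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes data to a temporary file that is deleted on exit.
    static Path writeTemp(byte[] data) {
        try {
            Path p = Files.createTempFile("base62-demo", ".bin");
            p.toFile().deleteOnExit();
            return Files.write(p, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path original = writeTemp(new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});
        Path decoded = writeTemp(new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});
        System.out.println(sameBytes(original, decoded)); // true
    }
}
```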
History
- February 3, 2016: Initial version