C++ - Zero Dynamic Allocation JSON Parser

CaldasGSM

May 9, 2024

CPOL

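This article discusses implementing a zero-allocation JSON parser in C, suitable for systems with limited memory.

Introduction

The goal of this project is to implement a zero-allocation JSON parser, which makes it a good fit for embedded systems or projects with strict memory requirements.

Unlike other libraries that claim zero allocation but really only implement a "push parser" or a tokenizer (leaving the allocation to the caller), this library implements an actual parser: it reinterprets the JSON text as an iterable structure with native types and decoded data.

The library avoids heap allocation by parsing the JSON data into the same buffer that holds the original JSON text, so it reuses the same memory. This does mean the original data is destroyed, so if you want to keep the original text you must copy the buffer before passing it to the parser.

The resulting structure is often smaller than the JSON text, so if the parsed information needs to be kept for a long time it can be copied to a smaller buffer (freeing the original, larger one).

Usage

Parsing a JSON object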

//the parser works in place, so the text must live in a writable buffer (not a string literal)
char sJson[] = "{\"foo\":\"bar\"}";
JsonResult oResult = Json_Parse(sJson);
if (!oResult.Success)
{
    printf("Parsing failed : %s at %i\n", oResult.Error, oResult.Index);
}
else
{
    JsonProperty oProperty = Json_GetPropertyByName(oResult.RootObject, "foo");
    printf("The object has property \"%s\" with value \"%s\"\n", oProperty.Name, oProperty.Value.StringValue);
}

Parsing a JSON array

char sJson[] = "[1,2,3]"; //writable buffer, parsed in place
JsonResult oResult = Json_Parse(sJson);
if (!oResult.Success)
{
    printf("Parsing failed : %s at %i\n", oResult.Error, oResult.Index);
}
else
{
    JsonElement oElement = Json_GetElementAtIndex(oResult.RootObject, 1);
    printf("The array has value %f at index %i\n", oElement.Value.DoubleValue, oElement.Index);
}

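Parsing a JSON scalar

Although the original JSON specification (RFC 4627) only allowed an object or an array as the root, the parser can still produce a value for a scalar root.

char sJson[] = "\"this is just a string\""; //writable buffer, parsed in place
JsonResult oResult = Json_Parse(sJson);
if (!oResult.Success)
{
    printf("Parsing failed : %s at %i\n", oResult.Error, oResult.Index);
}
else
{
    printf("the root is not an object but a string : \"%s\"\n", oResult.RootObject.StringValue);
}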

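Documentation

Parsing API

Element	Description
`enum JsonType`	An enumeration that identifies the type of data a `JsonObject` holds, or `JsonTypeInvalid` if a parsing or enumeration error occurred
`struct JsonObject`	A structure that holds a parsed value and indicates its type, or `JsonTypeInvalid` if a parsing or enumeration error occurred
`struct JsonResult`	The structure returned by `Json_Parse`; it contains the parsing status, the error on failure, or the root `JsonObject` on success
`struct JsonProperty`	The structure returned by the object enumeration functions; it contains the property name and its value (as a `JsonObject`)
`struct JsonElement`	The structure returned by the array enumeration functions; it contains the value (as a `JsonObject`) and the index of the element
`JsonResult Json_Parse(char* pJson)`	Parses JSON text into a serialized native structure, reusing the same buffer. This is a destructive operation: if the original data is needed, the text must be copied before calling this function
`JsonObject Json_Load(char* pJson)`	Loads a previously parsed buffer, for scenarios where it was persisted after parsing
`JsonProperty Json_IterateProperties(JsonObject oJsonObject)`	Returns the first property of the given `JsonObject`, which must be of type `JsonTypeObject`
`JsonProperty Json_NextProperty(JsonProperty oJsonProperty)`	Returns the property that follows the given `JsonProperty`; if the given property is the last one, the returned `JsonProperty` has its members zeroed and its value type is `JsonTypeInvalid`
`JsonProperty Json_GetPropertyByName(JsonObject oJsonObject, char* pName)`	Iterates the object and retrieves the property with the given name from the given `JsonObject`; if no such property is found, the returned `JsonProperty` has its members zeroed and its value type is `JsonTypeInvalid`
`int Json_GetPropertyCount(JsonObject oJsonObject)`	Returns the number of properties of the given `JsonObject`; useful if you want to copy the values into another structure and need to size an allocation
`JsonElement Json_IterateElements(JsonObject oJsonArray)`	Returns the first value of the given `JsonObject`, which must be of type `JsonTypeArray`
`JsonElement Json_NextElement(JsonElement oJsonElement)`	Returns the value that follows the given `JsonElement`; if the given element is the last one, the returned `JsonElement` has its members zeroed and its value type is `JsonTypeInvalid`
`JsonElement Json_GetElementAtIndex(JsonObject oJsonArray, int nIndex)`	Iterates the array and retrieves the value at the given index of the given `JsonObject`; if the index is out of range, the returned `JsonElement` has its members zeroed and its value type is `JsonTypeInvalid`
`int Json_GetElementCount(JsonObject oJsonArray)`	Returns the number of values in the given `JsonObject`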

enum JsonType

The enumeration mirrors the data types found in JSON text, plus a special JsonTypeInvalid that lets the parsing and enumeration functions report failure.

enum JsonType
{
    JsonTypeInvalid, //returned if the parsing or enumeration failed
    JsonTypeNull,    //a literal "null" in the JSON
    JsonTypeBool,    //a literal "true" or "false" in the JSON, the value will be parsed into JsonObject.BoolValue
    JsonTypeNumber,  //an integer or decimal value in the JSON, the value will be parsed into JsonObject.DoubleValue
    JsonTypeString,  //a double quote delimited string in the JSON, the value will be parsed into JsonObject.StringValue
    JsonTypeObject,  //a {} delimited object in the JSON
    JsonTypeArray    //a [] delimited array in the JSON
};

structure JsonResult

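The structure returned by Json_Parse. It holds the parsing status and, depending on that status, either the error details or the root JsonObject.

If JsonResult.Success equals 0, Error points to a string describing the problem and Index is the input index at which parsing failed. For any other value of Success, InitialSize and EndSize contain the size of the initial string and the total size finally used (the remaining bytes up to InitialSize are zeroed). This gives you a rough idea of the "compression" achieved, and EndSize can be used to allocate a new buffer for long-term persistence. RootObject contains the parsed value; check its type before use to make sure it is not JsonTypeInvalid.

After parsing the JSON text, the parser fills the rest of the buffer with \0.

struct JsonResult
{
    int Success;            //0 in case of failure
    //if Success == 0
    char* Error;            //a description in case of error, undefined otherwise
    int Index;              //the index at which the parser failed, undefined otherwise
    //if Success != 0
    int InitialSize;        //the size of the parsed json text
    int EndSize;            //the size of buffer used to hold the parsed data ( will be <= InitialSize )
    JsonObject RootObject;  //the object that was parsed out of the JSON text
};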

structure JsonObject

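This structure represents a JSON value or a JSON structure. JSON defines two kinds of structure: an object is an unordered collection of key/value pairs delimited by {}, and an array is an ordered list of values delimited by []. These are reported as JsonTypeObject and JsonTypeArray respectively. In those cases the JsonObject itself carries no value, and the iteration functions should be used instead.

For the other types, the remaining members contain the parsed value, except for JsonTypeNull (a literal null in the JSON) and JsonTypeInvalid (which signals failure).

struct JsonObject
{
    char* Position;     //used internally, by the iteration functions to locate a referenced object
    JsonType Type;      //the type of the JSON value/structure that this object represents
    char* StringValue;  //a null terminated string in case of "JsonTypeString", undefined otherwise
    double DoubleValue; //a decimal number in case of "JsonTypeNumber", undefined otherwise
    char BoolValue;     //0 or 1 in case of "JsonTypeBool", undefined otherwise
};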

structure JsonProperty

This structure is returned by Json_IterateProperties and Json_GetPropertyByName, and must be passed to Json_NextProperty to continue iterating.

struct JsonProperty
{
    char* Position;     //used internally, by the iteration functions to locate a referenced object
    char* Name;         //the null terminated string that holds the name of the property
    JsonObject Value;   //an object that holds that property value
};

structure JsonElement

This structure is returned by Json_IterateElements and Json_GetElementAtIndex, and must be passed to Json_NextElement to continue iterating.

struct JsonElement
{
    char* Position;     //used internally, by the iteration functions to locate a referenced object
    int Index;          //index of the value in the parent array
    JsonObject Value;   //the value found at the index
};

function Json_Parse

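This function takes a writable, null-terminated string containing valid JSON text and parses it into the same buffer. Be aware that this is a destructive operation: if the original text is needed, it must be copied before calling this function.

JsonResult is returned by this function, and only by this function, to make parsing failures easy to catch.

After parsing the JSON text, the parser fills the rest of the buffer with \0.

Usage

char sJson[] = "{\"foo\":\"bar\"}"; //must be writable, parsing happens in place
JsonResult oResult = Json_Parse(sJson);
if(oResult.Success)
{
    //do something with oResult.RootObject
}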

functions Json_IterateProperties & Json_NextProperty

These functions can be used to "explore" a JSON object of unknown structure by enumerating all of its properties. They behave alike and return the same kind of result; the only real difference is that Json_IterateProperties takes a JsonObject and retrieves its first JsonProperty, while Json_NextProperty retrieves the JsonProperty that follows the given one.

Usage

JsonProperty oFirstProperty = Json_IterateProperties(oObject);
if(oFirstProperty.Value.Type != JsonTypeInvalid)
{
    //the object has at least 1 property
}

JsonProperty oSecondProperty = Json_NextProperty(oFirstProperty);
if(oSecondProperty.Value.Type != JsonTypeInvalid)
{
    //the object has at least 2 properties
}

//let's output them all
for (JsonProperty oProperty = Json_IterateProperties(oObject); oProperty.Value.Type != JsonTypeInvalid; oProperty = Json_NextProperty(oProperty))
    printf("The object has property named \"%s\" \n", oProperty.Name);

function Json_GetPropertyByName

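If the JSON data has a known structure, you can use Json_GetPropertyByName to access a property value directly. If the property does not exist, a value of type JsonTypeInvalid is returned, so check the returned type to prevent errors.

Note that this operation is O(n).

Usage

JsonProperty oUnsureProperty = Json_GetPropertyByName(oObject,"user_name");
if(oUnsureProperty.Value.Type != JsonTypeInvalid)
{
    //the property is present
}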

//get values from a known structure disregarding validations
double nPrice = Json_GetPropertyByName(oObject,"product_price").Value.DoubleValue;
int bActive = Json_GetPropertyByName(oObject,"is_active").Value.BoolValue == 1;

function Json_GetPropertyCount

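This is just a helper function that returns the number of properties of a JsonObject. Since properties cannot be retrieved by index it is of limited use, but it can help if you need to make an allocation proportional to the "size" of the object.

Returns -1 if the given JsonObject is not of type JsonTypeObject.

Note that this operation is O(n).

Usage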

JsonObject oObject; //gotten from a previous parsing

int nTotalProperties = Json_GetPropertyCount(oObject);

functions Json_IterateElements & Json_NextElement

These functions are used to iterate a JSON array. They behave alike and return the same kind of result; the difference is that Json_IterateElements takes a JsonObject and retrieves its first JsonElement, while Json_NextElement retrieves the JsonElement that follows the given one.

Usage

JsonElement oFirstElement = Json_IterateElements(oObject);
if(oFirstElement.Value.Type == JsonTypeInvalid)
    printf("array is empty");

//let's output them all
for (JsonElement oElement = Json_IterateElements(oObject); oElement.Value.Type != JsonTypeInvalid; oElement = Json_NextElement(oElement))
    printf("The array contains \"%s\" \n", oElement.Value.StringValue);

function Json_GetElementCount

This is just a helper function that returns the number of elements in a JsonObject array.

Note that this operation is O(n).

Usage

int nTotalElements = Json_GetElementCount(oObject);

function Json_GetElementAtIndex

Elements of an array can be accessed by index, in any order. The given index must be within the valid range. To get the total number of items, use Json_GetElementCount.

Note that this operation is O(n).

Usage

//if iteration of the full array is needed you should use "Json_IterateElements" and "Json_NextElement"
//as this loop takes O(n) for the count plus O(n^2) for the indexed accesses
int nTotalElements = Json_GetElementCount(oObject);
for(int i = 0 ; i < nTotalElements ; i++)
    printf("%f\n", Json_GetElementAtIndex(oObject,i).Value.DoubleValue);

Examples

A sample file used as reference for the code below.

{
    "name": "my house",
    "location":{
        "lat":123.123,
        "lon":0.9999
    },
    "mixed_array":[
        true,
        false,
        123465,
        "hello",
        {
            "child_object":"inside"
        }
    ]
}

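Parsing the file, validating at every step to avoid any runtime errors.

#include <stdio.h>
#include "json.h"

//load from a file to a "big enough" buffer (this is lazy , bad code, DON'T COPY)
char pBuffer[1024];
FILE* hFile = fopen("sample.json", "r");
size_t nRead = fread(pBuffer, 1, sizeof(pBuffer) - 1, hFile); //fgets would stop at the first line break
pBuffer[nRead] = '\0'; //the parser expects a null terminated string
fclose(hFile);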

//this first example will show some defensive code and how to process the data with the correct validations to avoid runtime errors
JsonResult oResult = Json_Parse(pBuffer);
if (!oResult.Success)
{
    printf("Parsing failed : %s at %i\n", oResult.Error, oResult.Index);
}
else if(oResult.RootObject.Type == JsonTypeObject)
{
    //get the name property
    JsonProperty oName = Json_GetPropertyByName(oResult.RootObject, "name");
    if(oName.Value.Type == JsonTypeString)
        printf("the name of my object : %s \n", oName.Value.StringValue);

    //get the sub object of the "location" property 
    JsonProperty oLocation = Json_GetPropertyByName(oResult.RootObject, "location");
    if(oLocation.Value.Type == JsonTypeObject)
    {
        //iterate its properties
        JsonObject oCoordinates = oLocation.Value;
        for (JsonProperty oDimension = Json_IterateProperties(oCoordinates); oDimension.Value.Type != JsonTypeInvalid; oDimension = Json_NextProperty(oDimension))
        {
            if(oDimension.Value.Type == JsonTypeNumber)
                printf("coordinate \"%s\" is %f \n", oDimension.Name, oDimension.Value.DoubleValue);
            else
                printf("coordinate \"%s\" is not of type number\n", oDimension.Name);
        }
    }

    JsonProperty oMixed = Json_GetPropertyByName(oResult.RootObject, "mixed_array");
    if(oMixed.Value.Type == JsonTypeArray)
    {
        //iterate the items of the array
        JsonObject oTheArray = oMixed.Value;
        for (JsonElement oEntry = Json_IterateElements(oTheArray); oEntry.Value.Type != JsonTypeInvalid; oEntry = Json_NextElement(oEntry))
        {
            //the element wraps the value, so the type and data are read from oEntry.Value
            switch (oEntry.Value.Type)
            {
                case JsonTypeArray:
                    printf("an array with %i items\n", Json_GetElementCount(oEntry.Value));
                    break;
                case JsonTypeBool:
                    printf("a boolean literal: %s \n", oEntry.Value.BoolValue ? "true" : "false");
                    break;
                case JsonTypeNull:
                    printf("literally just a null");
                    break;
                case JsonTypeNumber:
                    printf("a number: %f\n", oEntry.Value.DoubleValue);
                    break;
                case JsonTypeObject:
                    printf("an inner object with %i properties\n", Json_GetPropertyCount(oEntry.Value));
                    break;
                case JsonTypeString:
                    printf("a string containing: %s\n", oEntry.Value.StringValue);
                    break;
                default:
                case JsonTypeInvalid:
                    printf("but it worked on my machine!!!");
                    break;
            }
        }
    }
}

Short and simple sample code. I wish everything could be this simple, but honestly, this one may give you headaches because of the missing validation.

#include "json.h"

//If the previous example seems to have too much code... well...
//If you know the structure of the object and it is guaranteed to be properly constructed
//you can go straight for the prize (at your own risk)

JsonObject oObject = Json_Parse(pBuffer).RootObject;

printf("the name of my object : %s \n", Json_GetPropertyByName(oObject, "name").Value.StringValue);
JsonObject oCoordinates = Json_GetPropertyByName(oObject, "location").Value;
printf("coordinate \"lat\" is %f \n", Json_GetPropertyByName(oCoordinates, "lat").Value.DoubleValue);
printf("coordinate \"lon\" is %f \n", Json_GetPropertyByName(oCoordinates, "lon").Value.DoubleValue);
JsonObject oArray = Json_GetPropertyByName(oObject, "mixed_array").Value;

//the elements live in the array, not in the root object
printf("a string containing: %s\n", Json_GetElementAtIndex(oArray, 3).Value.StringValue);
printf("a boolean literal: %s \n", Json_GetElementAtIndex(oArray, 1).Value.BoolValue ? "true" : "false");

Finally, some code to "explore" and print unknown JSON text.

void dump_object(int nLevel, JsonObject oJson)
{
    switch (oJson.Type)
    {
        case JsonTypeArray:
            printf("%*s array(%i)\n", nLevel * 4, "", Json_GetElementCount(oJson));
            for (JsonElement oElement = Json_IterateElements(oJson); oElement.Value.Type != JsonTypeInvalid; oElement = Json_NextElement(oElement))
            {
                printf("%*s element(%i)\n", (nLevel + 1) * 4, "", oElement.Index);
                dump_object(nLevel + 1, oElement.Value);

            }
            break;
        case JsonTypeBool:
            printf("%*s bool(%s)\n", nLevel * 4, "", oJson.BoolValue ? "true" : "false");
            break;
        case JsonTypeNull:
            printf("%*s null\n", nLevel * 4, "");
            break;
        case JsonTypeNumber:
            printf("%*s number(%f)\n", nLevel * 4, "", oJson.DoubleValue);
            break;
        case JsonTypeObject:
            printf("%*s object(%i)\n", nLevel * 4, "", Json_GetPropertyCount(oJson));
            for (JsonProperty oProperty = Json_IterateProperties(oJson); oProperty.Value.Type != JsonTypeInvalid; oProperty = Json_NextProperty(oProperty))
            {
                printf("%*s property(%s)\n", (nLevel + 1) * 4, "", oProperty.Name);
                dump_object(nLevel + 1, oProperty.Value);

            }
            break;
        case JsonTypeString:
            printf("%*s string(%s)\n", nLevel * 4, "", oJson.StringValue);
            break;
        default:
        case JsonTypeInvalid:
            printf("%*s invalid\n", nLevel * 4, "");
            break;
    }
}

//call the "dump" function that will print everything it finds
JsonResult oResult = Json_Parse(pBuffer);
if (oResult.Success && oResult.RootObject.Type != JsonTypeInvalid)
    dump_object(0, oResult.RootObject);

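Build API

Since working with JSON usually means creating it as well as parsing it, the library also provides functions to write JSON text. In this case dynamic memory allocation is unavoidable, but the allocation and reallocation of the buffer is managed by the library itself. Most notably, the buffer contains valid JSON text at every step, which means some extra memory copying happens on each call.

The build functions take a pointer to a pointer because the buffer address may change when they reallocate the buffer.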
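To illustrate why a pointer to a pointer is needed, here is a minimal sketch of an append helper built on realloc. Sketch_Append is a hypothetical name used only for this illustration; it is not part of the library and says nothing about how the library manages its own buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Appends sText to the heap string *ppBuffer, growing it with realloc.
   Returns 1 on success, 0 on allocation failure (*ppBuffer is left untouched). */
int Sketch_Append(char** ppBuffer, const char* sText)
{
    size_t nOld = *ppBuffer ? strlen(*ppBuffer) : 0;
    size_t nAdd = strlen(sText);

    /* realloc may move the block, so the caller's pointer has to be updated */
    char* pNew = realloc(*ppBuffer, nOld + nAdd + 1);
    if (pNew == NULL)
        return 0;

    memcpy(pNew + nOld, sText, nAdd + 1); /* copies the terminator as well */
    *ppBuffer = pNew;
    return 1;
}
```

Starting from a NULL pointer and calling Sketch_Append(&p, "[") followed by Sketch_Append(&p, "]") leaves p pointing at "[]", possibly at a different address after each call. That is the reason the caller passes &p rather than p.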

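Element	Description
`char* Json_CreateBuffer()`	Allocates a buffer for building JSON text. It can be used anywhere a string can be used, but it cannot be reallocated or freed with the standard functions.
`void Json_ReleaseBuffer(char*)`	Frees the memory of a previously allocated buffer.
`int Json_AddNull(char** pBuffer)`	Writes a literal `null` value at the insertion point of the current `array` scope.
`int Json_AddBool(char** pBuffer, int bValue)`	Adds a literal `true` or `false` value at the insertion point of the current `array` scope.
`int Json_AddString(char** pBuffer, const char* sValue)`	Adds a string value at the insertion point of the current `array` scope.
`int Json_AddNumber(char** pBuffer, double nValue)`	Adds a numeric value at the insertion point of the current `array` scope.
`int Json_AddArray(char** pBuffer)`	Opens an array scope at the insertion point of the current `array` and moves the insertion point inside the newly created array.
`int Json_AddObject(char** pBuffer)`	Opens an object scope at the insertion point of the current `array` and moves the insertion point inside the newly created object.
`int Json_AddPropertyNull(char** pBuffer, const char* sName)`	Writes a property with a `null` value at the insertion point of the current `object` scope.
`int Json_AddPropertyBool(char** pBuffer, const char* sName, int bValue)`	Writes a property with a `true` or `false` value at the insertion point of the current `object` scope.
`int Json_AddPropertyString(char** pBuffer, const char* sName, const char* sValue)`	Writes a property with a string value at the insertion point of the current `object` scope.
`int Json_AddPropertyNumber(char** pBuffer, const char* sName, double nValue)`	Writes a property with a numeric value at the insertion point of the current `object` scope.
`int Json_AddPropertyArray(char** pBuffer, const char* sName)`	Writes a property with an empty array value at the insertion point of the current `object` and moves the insertion point inside the newly created array.
`int Json_AddPropertyObject(char** pBuffer, const char* sName)`	Writes a property with an empty object value at the insertion point of the current `object` and moves the insertion point inside the newly created object.
`int Json_ExitScope(char** pBuffer)`	Moves the insertion point back to the parent scope.
`const char* Json_GetError(char* pBuffer)`	Returns the error message when any of the previous functions returned `0`.
`char* Json_Indent(char*)`	Returns an allocated buffer containing the formatted JSON text (the buffer must be released with free()).
`char* Json_Compress(char* pChar)`	Removes insignificant whitespace from the JSON text (the modification is done in place).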

Example

The following code builds a JSON document step by step.

char* pBuffer = Json_CreateBuffer();
//the root object will be an array (brackets are used only for indentation)
Json_AddArray(&pBuffer);
{
    Json_AddObject(&pBuffer);//this opens a new scope and moves the insertion point to that scope
    {
        //so now we are adding properties to the object
        Json_AddPropertyNull(&pBuffer, "null_property");
        Json_AddPropertyBool(&pBuffer, "bool_property", 1);
        Json_AddPropertyString(&pBuffer, "string_property", "foo\nbar");
        Json_AddPropertyNumber(&pBuffer, "number_property", -111.222);
        Json_AddPropertyObject(&pBuffer, "child_object");//opens a new scope for child object
        {
            Json_AddPropertyArray(&pBuffer, "empty_array");//opens a new scope for an array
            Json_ExitScope(&pBuffer);//exit the newly created array scope (leaving it empty)
        }
        Json_ExitScope(&pBuffer);//exit the child object scope
        Json_AddPropertyString(&pBuffer, "parent_scope", "after the child object");
    }
    Json_ExitScope(&pBuffer);//exits the object scope, meaning we return to the array scope
    
    //these are added to the array
    Json_AddNull(&pBuffer);
    Json_AddBool(&pBuffer, 0);
    Json_AddString(&pBuffer, "hello world");
    Json_AddNumber(&pBuffer, 321.0123);
    Json_AddNumber(&pBuffer, .1);
    Json_AddNumber(&pBuffer, -0.5);
}
//the text was printable at any time, but here is the end result
printf("%s", pBuffer);

char* pPretty = Json_Indent(pBuffer);
//and here it is formatted and indented
printf("%s", pPretty);

//the formatted version must be released with "free"
free(pPretty);
//the construction buffer CAN'T be used with "free", `Json_ReleaseBuffer` MUST be used
Json_ReleaseBuffer(pBuffer);

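How it works

Scoped data

To understand how this technique works, we have to realise that JSON carries extra information that takes up space native data does not need. Take the string "foo" as an example. It occupies 5 bytes, containing the following: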

|0x01|0x02|0x03|0x04|0x05|
|   "|   f|   o|   o|   "|

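To turn this into a native null-terminated C string, we only need to place a null after the last o of the string. There is already a byte there holding a " that we do not need, so we simply change the memory contents to: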

|0x01|0x02|0x03|0x04|0x05|
|   "|   f|   o|   o|  \0|

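Now a pointer to address 0x02 points to a valid null-terminated C string. Because we reused that byte, we are left with one extra byte at the start of the string, which still holds a useless ". We can use it to store a specific byte value that "marks" the start of a string. The actual data in memory then becomes: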

|0x01|0x02|0x03|0x04|0x05|
| str|   f|   o|   o|  \0|

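Of course, the string may need to be unescaped, but that works in our favour because it means it will take up less space. Here is another example, for the string "foo\nbar":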

        |0x01|0x02|0x03|0x04|0x05|0x06|0x07|0x08|0x09|0x0A|0x0B|0x0C|0x0D|0x0E|0x0F|
original|   "|   f|   o|   o|   \|   n|   b|   a|   r|   "|    |    |    |    |    |
  parsed| str|   f|   o|   o|  \n|   b|   a|   r|  \0|    |    |    |    |    |    |
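The in-place rewrite shown in the diagrams above can be sketched roughly as follows. This is only an illustration of the idea, not the library's code: Sketch_UnescapeInPlace and SKETCH_MARKER_STRING are hypothetical names, the marker value is arbitrary, and \u escapes are left out.

```c
#include <stddef.h>

/* Hypothetical marker value for this sketch only */
#define SKETCH_MARKER_STRING 0x01

/* Rewrites the JSON string literal at pText (pointing at the opening quote)
   into [marker][unescaped bytes][\0], reusing the same bytes.
   Returns a pointer to the resulting C string, or NULL on malformed input.
   *pConsumed receives the number of source bytes read, quotes included. */
char* Sketch_UnescapeInPlace(char* pText, size_t* pConsumed)
{
    if (pText[0] != '"')
        return NULL;

    char* pRead = pText + 1;  /* next source byte */
    char* pWrite = pText + 1; /* next destination byte, never ahead of pRead */

    while (*pRead != '"')
    {
        if (*pRead == '\0')
            return NULL; /* unterminated string */

        if (*pRead == '\\')
        {
            pRead++;
            switch (*pRead)
            {
                case 'n':  *pWrite++ = '\n'; break;
                case 't':  *pWrite++ = '\t'; break;
                case 'r':  *pWrite++ = '\r'; break;
                case 'b':  *pWrite++ = '\b'; break;
                case 'f':  *pWrite++ = '\f'; break;
                case '"':  *pWrite++ = '"';  break;
                case '\\': *pWrite++ = '\\'; break;
                case '/':  *pWrite++ = '/';  break;
                default:   return NULL; /* \u and unknown escapes are not handled here */
            }
            pRead++;
        }
        else
        {
            *pWrite++ = *pRead++;
        }
    }

    *pWrite = '\0';                  /* the terminator lands on or before the closing quote */
    pText[0] = SKETCH_MARKER_STRING; /* the opening quote becomes the marker */
    *pConsumed = (size_t)(pRead - pText) + 1;
    return pText + 1;
}
```

Because every escape sequence shrinks when decoded, the write cursor can never overtake the read cursor, which is what makes it safe to reuse the buffer.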

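The same is true for JSON objects, which provide 2 extra bytes in the form of {}, and for arrays, which provide them in the form of [].

An object always consists of key/value pairs, so its contents can be parsed and iterated as pairs, which removes the need for any separators and lets us discard the : and , characters.

The same goes for arrays: until we reach the end of the array, each successive value is an item of the array.

So the object {"foo":"bar"} can be parsed like this: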

        |0x01|0x02|0x03|0x04|0x05|0x06|0x07|0x08|0x09|0x0A|0x0B|0x0C|0x0D|0x0E|0x0F|
original|   {|   "|   f|   o|   o|   "|   :|   "|   b|   a|   r|   "|   }|    |    |
  parsed| obj| str|   f|   o|   o|  \0| str|   b|   a|   r|  \0| end|    |    |    |

An array like ["foo","bar"] is very similar; the only differences are the type "marker" and how we iterate the data.

        |0x01|0x02|0x03|0x04|0x05|0x06|0x07|0x08|0x09|0x0A|0x0B|0x0C|0x0D|0x0E|0x0F|
original|   [|   "|   f|   o|   o|   "|   ,|   "|   b|   a|   r|   "|   ]|    |    |
  parsed| arr| str|   f|   o|   o|  \0| str|   b|   a|   r|  \0| end|    |    |    |
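To make the "iterate as pairs" idea concrete, here is a small sketch that walks a parsed object laid out like the diagram above and looks up a string value by key. The marker values and the names Sketch_SkipString and Sketch_FindString are hypothetical, and the sketch only handles string values whose end is found by scanning for \0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical one-byte markers for this sketch only */
enum { SKETCH_OBJ = 0xA0, SKETCH_STR = 0x80, SKETCH_END = 0xE0 };

/* Skips one string item: the marker, the characters and the terminator */
static const unsigned char* Sketch_SkipString(const unsigned char* pItem)
{
    return pItem + 1 + strlen((const char*)pItem + 1) + 1;
}

/* Walks the key/value pairs of a parsed object and returns the string value
   stored under sName, or NULL if the key is missing or the layout is unexpected */
const char* Sketch_FindString(const unsigned char* pObject, const char* sName)
{
    if (*pObject != SKETCH_OBJ)
        return NULL;

    const unsigned char* pItem = pObject + 1;
    while (*pItem == SKETCH_STR) /* stops at the end marker */
    {
        const char* sKey = (const char*)pItem + 1;
        const unsigned char* pValue = Sketch_SkipString(pItem);

        if (*pValue != SKETCH_STR)
            return NULL; /* this sketch only understands string values */
        if (strcmp(sKey, sName) == 0)
            return (const char*)pValue + 1;

        pItem = Sketch_SkipString(pValue); /* move on to the next pair */
    }
    return NULL;
}
```

No separators are stored: a key is always followed by its value, and the end marker closes the scope. The lookup is linear in the size of the object, which matches the O(n) note on Json_GetPropertyByName.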

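Literals

With the in-memory replacement understood, things get even easier for the literals null, true and false, since they provide 4 or 5 bytes and we only need one byte for a "marker" whose value identifies the type. Here is a simple example for the JSON [true,false,null]: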

        |0x01|0x02|0x03|0x04|0x05|0x06|0x07|0x08|0x09|0x0A|0x0B|0x0C|0x0D|0x0E|0x0F|0x10|0x11|
original|   [|   t|   r|   u|   e|   ,|   f|   a|   l|   s|   e|   ,|   n|   u|   l|   l|   ]|
  parsed| arr| tru| fal| nul| end|    |    |    |    |    |    |    |    |    |    |    |    |
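The compaction in the diagram above can be sketched as follows for the special case of a flat array that contains only literals. The function names are hypothetical, the marker values are one possible choice, and the sketch is lenient about commas; it is not the library's parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical marker values for this sketch only */
enum { SKETCH_NULL = 0x20, SKETCH_TRUE = 0x40, SKETCH_FALSE = 0x60,
       SKETCH_ARRAY = 0xC0, SKETCH_END = 0xE0 };

/* Recognises a literal at pText; returns the bytes consumed, or 0 if none matches */
size_t Sketch_MatchLiteral(const char* pText, unsigned char* pMarker)
{
    if (strncmp(pText, "true", 4) == 0)  { *pMarker = SKETCH_TRUE;  return 4; }
    if (strncmp(pText, "false", 5) == 0) { *pMarker = SKETCH_FALSE; return 5; }
    if (strncmp(pText, "null", 4) == 0)  { *pMarker = SKETCH_NULL;  return 4; }
    return 0;
}

/* Rewrites text such as [true,false,null] into one marker per item, in place.
   Returns the number of bytes used, or 0 on input this sketch does not handle
   (in which case the buffer may already be partially overwritten). */
size_t Sketch_CompactLiteralArray(char* pText)
{
    size_t nLength = strlen(pText);
    const char* pRead = pText;
    unsigned char* pWrite = (unsigned char*)pText;

    if (*pRead != '[')
        return 0;
    pRead++;
    *pWrite++ = SKETCH_ARRAY; /* the [ becomes the array marker */

    while (*pRead != ']')
    {
        unsigned char nMarker;
        size_t nConsumed = Sketch_MatchLiteral(pRead, &nMarker);
        if (nConsumed == 0)
            return 0;
        pRead += nConsumed;
        *pWrite++ = nMarker;  /* 4 or 5 source bytes collapse into 1 */
        if (*pRead == ',')
            pRead++;          /* separators are simply dropped */
    }

    *pWrite++ = SKETCH_END;
    size_t nUsed = (size_t)(pWrite - (unsigned char*)pText);
    memset(pWrite, 0, nLength - nUsed); /* zero the space that was freed */
    return nUsed;
}
```

For the 17-byte text [true,false,null] this leaves 5 bytes in use (array marker, three literal markers, end marker) and zeroes the remaining 12.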

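Numbers

Here we hit an obstacle: unlike scoped data, there are no extra bytes in which to put a "marker", and unlike literals, there is no fixed size.

There are easy cases, such as the number 1234, which takes up 4 bytes and would let us store a 32-bit integer, but that still leaves cases like 9, where only one byte is available. If we use it for the number "marker", we have no room left for the data. If we store the data, we have no "marker" telling us how to interpret it.

However, if we look at the "marker" byte and its values, we can see that so far we have only defined a handful of unique values (7, in fact), and storing 8 distinct values takes only 3 bits. That leaves 5 bits of the "marker" byte that we would never use, so perhaps those bits can hold the actual data.

This puts us on the right track. Not only that, but by tweaking the "marker" bits we can address another problem...

Take the JSON ["One","Two","Three"] as an example. Suppose we have a pointer to the start of the array and want the item at position 1 (the second string). We need to "skip" the first item, but the only way to get to the end of a null-terminated string is to read every byte until we hit \0. If we knew the size of the string, we could jump straight to its end and pick up the next value.

So if we can store the size of an object, random access becomes faster. Of course, JSON allows strings, arrays and objects of arbitrary size. A 32-bit integer would cover all of them, but we have nowhere to put it, so a compromise is needed. We store the size of short strings, arrays and objects in the spare bits of our "marker", and use a different "marker" value for the items that have to be iterated to find their actual size.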
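As a sketch of how a stored size speeds up skipping, the function below jumps over sized strings to reach a given index. It assumes a marker whose two lowest bits are 01 for a short string and whose upper six bits hold the number of bytes that follow the marker, terminator included; that exact meaning of "size", the end marker value and the name Sketch_StringAt are assumptions made for this illustration.

```c
#include <stddef.h>

#define SKETCH_SMALL_STRING 0x01u /* low two bits of a short-string marker (assumed) */
#define SKETCH_END          0xE0u /* hypothetical end-of-sequence marker */

/* Returns the characters of the nIndex-th string in a sequence of sized
   strings, or NULL when the index is out of range or another marker is found */
const char* Sketch_StringAt(const unsigned char* pItems, int nIndex)
{
    while (*pItems != SKETCH_END)
    {
        if ((*pItems & 0x03u) != SKETCH_SMALL_STRING)
            return NULL; /* this sketch only understands short strings */
        if (nIndex == 0)
            return (const char*)(pItems + 1);

        /* jump over the marker and the payload without reading the characters */
        pItems += 1 + (*pItems >> 2);
        nIndex--;
    }
    return NULL;
}
```

For ["One","Two","Three"] the first two markers would be 0x11 (4 payload bytes) and the third 0x19 (6 payload bytes), so reaching index 2 costs two additions instead of a scan over eight characters.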

This leads us to the final version of our "marker" values, which are effectively an encoding of both flags and data for the parsed values.

|   8|   7|   6|   5|   4|   3|   2|   1|
|----|----|----|----|----|----|----|----|
|            6-bit size [0-63]|   0|   1| JsonMarkerSmallString
|            6-bit size [0-63]|   1|   0| JsonMarkerSmallObject
|            6-bit size [0-63]|   1|   1| JsonMarkerSmallArray
|        5-bit int [0-31]|   1|   0|   0| JsonMarkerDecimal
|   4-bit int [0-15]|   1|   0|   0|   0| JsonMarkerDigit
|   3-bit [0-7]|   1|   0|   0|   0|   0| JsonMarkerInt
|   1|   0|   0|   0|   0|   0|   0|   0| JsonMarkerLargeString
|   1|   0|   1|   0|   0|   0|   0|   0| JsonMarkerLargeObject
|   1|   1|   0|   0|   0|   0|   0|   0| JsonMarkerLargeArray
|   1|   1|   1|   0|   0|   0|   0|   0| JsonMarkerSequenceEnd
|   0|   0|   1|   0|   0|   0|   0|   0| JsonMarkerNull
|   0|   1|   0|   0|   0|   0|   0|   0| JsonMarkerTrue
|   0|   1|   1|   0|   0|   0|   0|   0| JsonMarkerFalse
|   0|   0|   0|   0|   0|   0|   0|   0| unused null
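One way to read the table is as a decision chain on the low bits of the marker byte. The sketch below classifies a marker under the assumption that bit 1 in the table is the least significant bit; SketchKind and Sketch_Classify are hypothetical names, and the library's real constants may differ.

```c
/* Hypothetical classification of a marker byte, following the bit table */
typedef enum SketchKind
{
    SketchInvalid,
    SketchSmallString, SketchSmallObject, SketchSmallArray,
    SketchDecimal, SketchDigit, SketchInt,
    SketchLargeString, SketchLargeObject, SketchLargeArray, SketchSequenceEnd,
    SketchNull, SketchTrue, SketchFalse
} SketchKind;

/* Returns the kind of the marker and stores the data bits (size, decimal
   places, digit or integer width code) in *pPayload, or 0 when there are none */
SketchKind Sketch_Classify(unsigned char nMarker, unsigned* pPayload)
{
    *pPayload = 0;

    /* bits 2-1: short scopes carry a 6-bit size in bits 8-3 */
    switch (nMarker & 0x03)
    {
        case 0x01: *pPayload = nMarker >> 2; return SketchSmallString;
        case 0x02: *pPayload = nMarker >> 2; return SketchSmallObject;
        case 0x03: *pPayload = nMarker >> 2; return SketchSmallArray;
        default: break;
    }

    /* bit 3, then bit 4, then bit 5: numbers with progressively fewer data bits */
    if (nMarker & 0x04) { *pPayload = nMarker >> 3; return SketchDecimal; }
    if (nMarker & 0x08) { *pPayload = nMarker >> 4; return SketchDigit; }
    if (nMarker & 0x10) { *pPayload = nMarker >> 5; return SketchInt; }

    /* bits 8-6: the remaining eight combinations */
    switch (nMarker >> 5)
    {
        case 1: return SketchNull;
        case 2: return SketchTrue;
        case 3: return SketchFalse;
        case 4: return SketchLargeString;
        case 5: return SketchLargeObject;
        case 6: return SketchLargeArray;
        case 7: return SketchSequenceEnd;
        default: return SketchInvalid; /* all zero: never a valid marker */
    }
}
```

For example, 0x0D classifies as a short string with a size of 3, 0x98 as the single digit 9, and 0x00 as invalid.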

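If either of the first 2 bits of our "marker" is set, we know we are dealing with a scope, and the other 6 bits hold a value from 0 to 63 giving the size of the scope in bytes. This lets us skip short strings easily, which I believe will be the case for 100% of object property names. Other small objects that tend to appear in bulk, such as {"x":123,"y":321}, are just as easy to skip.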

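If the first 2 bits are 0, we check the next bit. If it is set, an integer follows, but its decimal point must be shifted by n places. n is stored in the remaining 5 bits of the marker, which can hold values from 0 to 31, allowing up to 31 decimal places. If you are wondering whether there is "room" for this "marker", remember that writing anything with decimal places in JSON requires a ., so there is always an extra byte available for the decimal marker.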

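If the 3rd bit is also 0, we check the 4th bit. If it is 1, this is the marker for a single digit; the remaining 4 bits give us values from 0 to 15, which is enough to store '0' through '9', the worst case for numbers.

If the 4th bit is also 0 but the 5th bit is 1, the remaining 3 bits hold an enumerator that tells us the size of the integer that follows.

1 = 8-bit signed integer
2 = 16-bit signed integer
3 = 32-bit signed integer
4 = 64-bit signed integer

The bytes for that integer are guaranteed by the digits in the JSON text, which we can confirm by checking the worst cases.

9 (1 byte) - special case, we use JsonMarkerDigit
99 (2 bytes) - 1 byte "marker" for an 8-bit integer, 1 byte storing a value from -128 to 127
999 (3 bytes) - 1 byte "marker" for a 16-bit integer, 2 bytes storing a value from -32768 to 32767
9999 (4 bytes) - 1 byte "marker" for a 16-bit integer, 2 bytes storing a value from -32768 to 32767 (saves 1 byte)
99999 (5 bytes) - 1 byte "marker" for a 32-bit integer, 4 bytes storing a value from -2147483648 to 2147483647
and so on...

If we prefix any of these numbers with a - sign, we gain even more room.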
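The worst cases listed above can be checked mechanically. The sketch below picks the smallest signed width for a value and tests whether a one-byte marker plus that payload fits in the number of characters the value occupied in the text. Sketch_IntBytes and Sketch_FitsInText are hypothetical helper names for this illustration.

```c
#include <stdint.h>

/* Returns the payload size in bytes (1, 2, 4 or 8) of the smallest signed
   integer type that can hold nValue */
int Sketch_IntBytes(int64_t nValue)
{
    if (nValue >= INT8_MIN && nValue <= INT8_MAX)
        return 1;
    if (nValue >= INT16_MIN && nValue <= INT16_MAX)
        return 2;
    if (nValue >= INT32_MIN && nValue <= INT32_MAX)
        return 4;
    return 8;
}

/* Returns 1 when a one-byte marker followed by the payload fits in the
   nTextLength bytes that the number occupied in the JSON text */
int Sketch_FitsInText(int64_t nValue, int nTextLength)
{
    return 1 + Sketch_IntBytes(nValue) <= nTextLength;
}
```

With these helpers, 99 in 2 bytes, 999 in 3 bytes and 99999 in 5 bytes all fit, while 9 in 1 byte does not, which is why single digits get their own marker with the value stored in the marker itself.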

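Finally, the last 3 bits of our marker provide 8 combinations, enough to define the markers needed for our other value types.

The all-zero combination is deliberately left unused so that a string terminator or the zeroed part of the buffer cannot be mistaken for a value. A 0 encountered where a marker is expected always indicates an error.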
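To round off the number encoding, here is a sketch of how a decimal marker could be built and applied, assuming the layout from the table (bit 3 set, the number of decimal places in the upper 5 bits). The function names are hypothetical, and the integer that follows the marker is passed in directly rather than decoded from a buffer.

```c
/* Builds a decimal marker: bit 3 set, decimal places in bits 8-4 (assumed layout) */
unsigned char Sketch_MakeDecimalMarker(unsigned nPlaces)
{
    return (unsigned char)(((nPlaces & 0x1Fu) << 3) | 0x04u);
}

/* Extracts the number of decimal places from a decimal marker */
unsigned Sketch_DecimalPlaces(unsigned char nMarker)
{
    return nMarker >> 3;
}

/* Shifts the decimal point of the stored integer nPlaces positions to the left */
double Sketch_ApplyDecimals(long long nDigits, unsigned nPlaces)
{
    double fScale = 1.0;
    while (nPlaces-- > 0)
        fScale *= 10.0;
    return (double)nDigits / fScale;
}
```

Under these assumptions the text 123.123 (7 bytes) would be stored as a decimal marker with 3 places followed by the integer 123123, and Sketch_ApplyDecimals(123123, 3) recovers the value as a double.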

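Final thoughts

I came up with this technique and encoding on my own, without any outside inspiration. That does not mean I think it is unique or novel. There are plenty of smart people out there, and I would not be surprised if some other algorithm uses a similar encoding. Either way, I think this code can be useful for projects with very strict memory requirements: its memory footprint is very small, without sacrificing too much performance.

The code is provided as is; use it however you like.

Happy coding.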
