4db: C++ 的动态基于文件的 NoSQL 数据库

Michael Sydney Balloni

5.00/5 (6投票s)

2021年11月22日

Apache

2分钟阅读

17022

317

想为您的 C++ 应用程序添加基本的数据库功能吗？太懒而无法直接使用 SQLite？继续阅读！

下载项目 - 53.8 KB

背景

在本文中，我们将介绍 4db 的背景、接口和实现，4db 是一个用于 C++ 项目的动态文件型 NoSQL 数据库。

4db 的根源可以追溯到多年来被称为元字符串的各种事物。

元字符串在 .NET 中由 GitHub 上的 4db.net 延续，但 4db 是一个新的 C++ 移植版本。它现代、简洁、可移植，并直接构建在 SQLite C API 之上。它应该能很好地服务于需要简单文件型数据库的 C++ 开发人员。

接口

使用 4db 最好通过带注释的示例程序来演示

/// <summary>
/// This program demonstrates creating a 4db database and using all four supported commands
/// to populate, access, and manipulate the data.
/// 1. UPSERT
/// 2. SELECT
/// 3. DELETE
/// 4. DROP
/// </summary>
#include "ctxt.h"
#pragma comment(lib, "4db")

#include <stdio.h>

// used by main()
void addCar(fourdb::ctxt& context, int year, 
            const std::wstring& make, const std::wstring& model);

int main()
{
    // 4db is built on SQLite, so to create a 4db database,
    // we simply need to specify the location of the database file.
    // If the file does not exist, an empty database is automatically created.
    // The ctxt class manages the database connection,
    // provides many useful functions for executing SELECT queries,
    // and implements the UPSERT, DELETE and DROP functions.
    // There are many classes in 4db, but ctxt is the one you deal directly with;
    // you can auto the rest as seen here.
    printf("Opening database...\n");
    fourdb::ctxt context("cars.db");

    // Drop our database table to start things clean.
    printf("Starting up...\n");
    context.drop(L"cars");

    // Pass our context into addCar to add database records...so many cars...
    printf("Adding cars to database...\n");
    addCar(context, 1987, L"Nissan", L"Pathfinder");
    addCar(context, 1998, L"Toyota", L"Tacoma");
    addCar(context, 2001, L"Nissan", L"Xterra");
    //...

    // Select data out of the database using a basic dialect of SQL.
    // Here, we gather the "value" pseudo-column which 
    // is the primary key added by the addCar function.
    // We create a query object with our SELECT query,
    // pass in the value for the @year parameter,
    // and use ctxt::execQuery function to execute the query,
    // handing back a reader object to process the results.
    printf("Getting cars...\n");
    std::vector<fourdb::strnum> oldCarKeys;
    auto select =
        fourdb::sql::parse
        (
            L"SELECT value, year, make, model "
            L"FROM cars "
            L"WHERE year < @year "
            L"ORDER BY year ASC"
        );
    select.addParam(L"@year", 2000);
    auto reader = context.execQuery(select);
    while (reader->read())
    {
        // Collect the primary key ("value") that addCar added
        oldCarKeys.push_back(reader->getString(0));

        // 4db values are either numbers (doubles) or strings
        printf("%d: %S - %S\n", 
               static_cast<int>(reader->getDouble(1)), 
               reader->getString(2).c_str(),
               reader->getString(3).c_str());
    }

    // We use the list of primary keys to delete some rows.
    printf("Deleting old cars... (%u)\n", static_cast<unsigned>(oldCarKeys.size()));
    context.deleteRows(L"cars", oldCarKeys);

    printf("All done.\n");
    return 0;
}

/// <summary>
/// UPSERT a car into our database using the define function.
/// You pass the table name, primary key value, and column data to this function.
/// No need to explicitly create the table, just refer to it by name 
/// and the define function takes care of it.
/// NOTE: The primary key value and column data values
///       can only be strings or numbers.
///       For numbers, they have to be convertible to doubles,
///       and are selected out of the database as doubles.
/// </summary>
/// <param name="context">ctxt for doing database work</param>
/// <param name="year">year of the car</param>
/// <param name="make">make of the car</param>
/// <param name="model">model of the car</param>
void addCar(fourdb::ctxt& context, int year, 
            const std::wstring& make, const std::wstring& model)
{
    std::wstring tableName = L"cars";
    fourdb::strnum primaryKey = fourdb::num2str(year) + L"_" + make + L"_" + model;
    fourdb::paramap columnData
    {
        { L"year", year },
        { L"make", make },
        { L"model", model },
    };
    context.define(tableName, primaryKey, columnData);
}

实现

元字符串的概念一直是通过一个看起来像行列 SQL 接口的东西来呈现的，其实现是在一个“真正的” SQL 数据库中，最初只有 MySQL，现在只有 SQLite。

“虚拟”模式的表被提取到 SQLite 中的单独的真实表中

虚拟模式中所有表的注册表存储在真实的表，嗯，表中
所有表中所有列都在名称表中
数据库中的每个唯一值 - wstring 或 double - 都存储在值表中
虚拟表的每一行都由真实项目表中的一行表示
所有内容都通过 itemnamevalues 表连接在一起，每个数据单元格一行，itemid -> nameid -> valueid

大量开销，可能永远无法实现高性能。但它很简单，并且允许动态模式。并且有大量的用例不需要高性能。我看到了这项技术的光明前景。

musicdb - 一个更大的例子

至少在 Windows 上，iTunes 有一个文件，描述了媒体库，该文件是一个易于解析的 XML 文件。该文件中有每个库中每个曲目的条目。 XML 文件中的每个条目都像这样

<dict>
	<key>Track ID</key><integer>1002</integer>
	<key>Size</key><integer>7973544</integer>
	<key>Total Time</key><integer>242755</integer>
	<key>Disc Number</key><integer>1</integer>
	<key>Disc Count</key><integer>1</integer>
	<key>Track Number</key><integer>1</integer>
	<key>Track Count</key><integer>13</integer>
	<key>Year</key><integer>2012</integer>
	<key>Date Modified</key><date>2016-08-22T01:35:10Z</date>
	<key>Date Added</key><date>2021-03-17T00:41:46Z</date>
	<key>Bit Rate</key><integer>256</integer>
	<key>Sample Rate</key><integer>44100</integer>
	<key>Artwork Count</key><integer>1</integer>
	<key>Persistent ID</key><string>A8CF63F61390C4BC</string>
	<key>Track Type</key><string>File</string>
	<key>File Folder Count</key><integer>5</integer>
	<key>Library Folder Count</key><integer>1</integer>
	<key>Name</key><string>Kryptonite</string>
	<key>Artist</key><string>3 Doors Down</string>
	<key>Album Artist</key><string>3 Doors Down</string>
	<key>Composer</key><string>Matt Roberts</string>
	<key>Album</key><string>The Greatest Hits [+digital booklet]</string>
	<key>Genre</key><string>Alternative Rock</string>
	<key>Kind</key><string>MPEG audio file</string>
	<key>Comments</key><string>Amazon.com Song ID: 233359329</string>
	<key>Sort Album</key><string>Greatest Hits [+digital booklet]</string>
	<key>Location</key><string>file:///C:/Users/ballo/Music/iTunes/iTunes%20Media/
    Music/3%20Doors%20Down/The%20Greatest%20Hits%20%5B+digital%20booklet%5D/
    01-01-%20Kryptonite.mp3</string>
</dict>

musicdb 程序解析整个 XML 文件，并构建库的内存表示。它进行单个 4db 调用，将该表示加载到 4db 数据库中

bool inDict = false;
std::unordered_map<fourdb::strnum, fourdb::paramap> dicts; // accumalate info of all tracks
fourdb::paramap dict; // info about the current track
...
while (xmlFileStream)
{
	std::wstring line;
	std::getline(xmlFileStream, line);
	const wchar_t* tag = wcsstr(line.c_str(), L"<");
	if (tag == nullptr)
		continue;

	if (_wcsicmp(tag, L"<dict>") == 0)
	{
		inDict = true;
		dict.clear();
	}
	else if (_wcsicmp(tag, L"</dict>") == 0)
	{
		inDict = false;
		if (dict.size() == 1)
			continue;

		std::wstring key; // just needs to be unique
		for (const auto& kvp : dict)
		{
			const auto& snum = kvp.second;
			if (snum.isStr())
				key += snum.str();
			else
				key += fourdb::num2str(snum.num());
			key += '|';
		}

		dicts.insert({ key, dict });
		++addedCount;
	}
	else if (inDict && wcsncmp(tag, L"<key>", 5) == 0) // metadata key line
	{
		const wchar_t* closingKey = wcsstr(tag, L"</key>");
		if (closingKey == nullptr)
		{
			printf("Unclosed <key>: %S\n", line.c_str());
			continue;
		}

		std::wstring key(tag + 5, closingKey);
		cleanXmlValue(key); // deal with <, etc.

		const auto& fieldNameIt = fieldNames.find(key); // look column name from iTunes 
                                                        // field name
		if (fieldNameIt == fieldNames.end())
			continue; // not a field we care about
		const auto& fieldName = fieldNameIt->second;

		// It's either integer, string, or date
		const wchar_t* valueTag = nullptr;

		valueTag = wcsstr(closingKey, L"<integer>");
		if (valueTag != nullptr)
		{
			const wchar_t* closingValue = wcsstr(tag, L"</integer>");
			if (closingValue == nullptr)
			{
	#ifdef _DEBUG
				printf("Unclosed <integer>: %S\n", line.c_str());
	#endif
				continue;
			}

			std::wstring valueStr(valueTag + 9, closingValue);
			double valueNum = _wtof(valueStr.c_str());
			dict.insert({ fieldName, valueNum });
			continue;
		}

		valueTag = wcsstr(closingKey, L"<string>");
		if (valueTag != nullptr)
		{
			const wchar_t* closingValue = wcsstr(tag, L"</string>");
			if (closingValue == nullptr)
			{
	#ifdef _DEBUG
				printf("Unclosed <string>: %S\n", line.c_str());
	#endif
				continue;
			}

			std::wstring valueStr(valueTag + 8, closingValue);
			cleanXmlValue(valueStr);
			dict.insert({ fieldName, valueStr });
		}

		valueTag = wcsstr(closingKey, L"<date>");
		... // dates are treated like strings
	}
	...
}

// This one line loads the entire iTunes library into the 4db database "tracks" table
// The lambda / printf business is for pacifying the user during the import process
context.define(L"tracks", dicts, [](const wchar_t* msg) { printf("%S...\n", msg); });

一旦 4db 数据库被填充，您就可以使用基本方言的 SQL SELECT 语句来查询它，并以一种令人愉悦的格式获取查询结果

printf("> ");
std::wstring line;
std::getline(std::wcin, line);
...

auto select = fourdb::sql::parse(line);
auto paramNames = fourdb::extractParamNames(line);
if (!paramNames.empty())
{
	printf("\n");
	printf("Enter values for the parameters in your query;"
		   " put # in front of numeric values:\n");
	printf("\n");
	for (const auto& paramName : paramNames)
	{
		printf("%S: ", paramName.c_str());
		std::getline(std::wcin, line);
		if (!line.empty() && line[0] == '#')
			select.addParam(paramName, _wtof(line.substr(1).c_str()));
		else
			select.addParam(paramName, line);
	}
}

auto reader = context.execQuery(select);
auto colCount = reader->getColCount();

std::vector<std::vector<std::wstring>> matrix;
std::unordered_set<std::wstring> seenRowSummaries;

while (reader->read())
{
	std::vector<std::wstring> newRow;
	for (unsigned col = 0; col < colCount; ++col)
		newRow.push_back(reader->getString(col)); // getString works with all column types

	std::wstring newRowSummary = fourdb::join(newRow, L"\v");
	if (seenRowSummaries.find(newRowSummary) != seenRowSummaries.end())
		continue;

	seenRowSummaries.insert(newRowSummary);
	matrix.push_back(newRow);
}

printf("\n");

printf("Results: %u\n", static_cast<unsigned>(matrix.size()));
if (matrix.empty())
	continue;

printf("\n");

std::vector<std::wstring> headerRow;
for (unsigned col = 0; col < colCount; ++col)
	headerRow.push_back(reader->getColName(col));
matrix.insert(matrix.begin(), headerRow);

std::vector<unsigned> columnWidths;
for (const auto& header : headerRow)
	columnWidths.push_back(static_cast<unsigned>(header.size()));
for (const auto& row : matrix)
{
	for (size_t cellidx = 0; cellidx < columnWidths.size(); ++cellidx)
		columnWidths[cellidx] = std::max(columnWidths[cellidx], row[cellidx].size());
}

for (size_t cellidx = 0; cellidx < columnWidths.size(); ++cellidx)
{
	const auto& header = headerRow[cellidx];
	auto headerWidth = columnWidths[cellidx];
	printf("%S", header.c_str());
	for (size_t s = header.size(); s <= headerWidth; ++s)
		printf(" ");
}
printf("\n");

for (size_t cellIdx = 0; cellIdx < columnWidths.size(); ++cellIdx)
{
	auto headerWidth = columnWidths[cellIdx];
	for (size_t s = 0; s < headerWidth; ++s)
		printf("-");
	printf(" ");
}
printf("\n");

for (size_t rowIdx = 1; rowIdx < matrix.size(); ++rowIdx)
{
	const auto& row = matrix[rowIdx];
	for (size_t cellIdx = 0; cellIdx < columnWidths.size(); ++cellIdx)
	{
		const auto& value = row[cellIdx];
		auto headerWidth = columnWidths[cellIdx];
		printf("%S", value.c_str());
		for (size_t s = value.size(); s <= headerWidth; ++s)
			printf(" ");
	}
	printf("\n");
}

SQLite 包装器

4db 的核心是围绕 SQLite C API 的包装器类。 db 类管理数据库连接并提供用于执行查询的例程。 dbreader 类准备和执行查询并提供对查询结果的访问。

参数使用 paramap 传递，它是 std::unordered_map<std::wstring, strnum> 的一个 typedef。 strnum 是一个类，可以是 wstring 或 double。

学习 SQLite 的 C API 很有趣。包装器类主要独立存在；您可以稍加修改以满足您的目的。

结论

我希望您喜欢了解 4db，并了解使用 4db 处理您的 iTunes 媒体库可以有多么有趣。

历史

2021 年 11 月 22 日：初始版本