4db: C++ 的动态基于文件的 NoSQL 数据库





5.00/5 (6投票s)
想为您的 C++ 应用程序添加基本的数据库功能吗?
背景
在本文中,我们将介绍 4db 的背景、接口和实现,4db 是一个用于 C++ 项目的动态文件型 NoSQL 数据库。
4db 的根源可以追溯到多年来被称为元字符串的各种事物。
元字符串在 .NET 中由 GitHub 上的 4db.net 延续,但 4db 是一个新的 C++ 移植版本。它现代、简洁、可移植,并直接构建在 SQLite C API 之上。它应该能很好地服务于需要简单文件型数据库的 C++ 开发人员。
接口
使用 4db 最好通过带注释的示例程序来演示
/// <summary>
/// This program demonstrates creating a 4db database and using all four supported commands
/// to populate, access, and manipulate the data.
/// 1. UPSERT
/// 2. SELECT
/// 3. DELETE
/// 4. DROP
/// </summary>
#include "ctxt.h"
#pragma comment(lib, "4db")
#include <stdio.h>
// used by main()
void addCar(fourdb::ctxt& context, int year,
const std::wstring& make, const std::wstring& model);
int main()
{
// 4db is built on SQLite, so to create a 4db database,
// we simply need to specify the location of the database file.
// If the file does not exist, an empty database is automatically created.
// The ctxt class manages the database connection,
// provides many useful functions for executing SELECT queries,
// and implements the UPSERT, DELETE and DROP functions.
// There are many classes in 4db, but ctxt is the one you deal directly with;
// you can auto the rest as seen here.
printf("Opening database...\n");
fourdb::ctxt context("cars.db");
// Drop our database table to start things clean.
printf("Starting up...\n");
context.drop(L"cars");
// Pass our context into addCar to add database records...so many cars...
printf("Adding cars to database...\n");
addCar(context, 1987, L"Nissan", L"Pathfinder");
addCar(context, 1998, L"Toyota", L"Tacoma");
addCar(context, 2001, L"Nissan", L"Xterra");
//...
// Select data out of the database using a basic dialect of SQL.
// Here, we gather the "value" pseudo-column which
// is the primary key added by the addCar function.
// We create a query object with our SELECT query,
// pass in the value for the @year parameter,
// and use ctxt::execQuery function to execute the query,
// handing back a reader object to process the results.
printf("Getting cars...\n");
std::vector<fourdb::strnum> oldCarKeys;
auto select =
fourdb::sql::parse
(
L"SELECT value, year, make, model "
L"FROM cars "
L"WHERE year < @year "
L"ORDER BY year ASC"
);
select.addParam(L"@year", 2000);
auto reader = context.execQuery(select);
while (reader->read())
{
// Collect the primary key ("value") that addCar added
oldCarKeys.push_back(reader->getString(0));
// 4db values are either numbers (doubles) or strings
printf("%d: %S - %S\n",
static_cast<int>(reader->getDouble(1)),
reader->getString(2).c_str(),
reader->getString(3).c_str());
}
// We use the list of primary keys to delete some rows.
printf("Deleting old cars... (%u)\n", static_cast<unsigned>(oldCarKeys.size()));
context.deleteRows(L"cars", oldCarKeys);
printf("All done.\n");
return 0;
}
/// <summary>
/// UPSERT a car into our database using the define function.
/// You pass the table name, primary key value, and column data to this function.
/// No need to explicitly create the table, just refer to it by name
/// and the define function takes care of it.
/// NOTE: The primary key value and column data values
/// can only be strings or numbers.
/// For numbers, they have to be convertible to doubles,
/// and are selected out of the database as doubles.
/// </summary>
/// <param name="context">ctxt for doing database work</param>
/// <param name="year">year of the car</param>
/// <param name="make">make of the car</param>
/// <param name="model">model of the car</param>
void addCar(fourdb::ctxt& context, int year,
const std::wstring& make, const std::wstring& model)
{
std::wstring tableName = L"cars";
fourdb::strnum primaryKey = fourdb::num2str(year) + L"_" + make + L"_" + model;
fourdb::paramap columnData
{
{ L"year", year },
{ L"make", make },
{ L"model", model },
};
context.define(tableName, primaryKey, columnData);
}
实现
元字符串的概念一直是通过一个看起来像行列 SQL 接口的东西来呈现的,其实现是在一个“真正的” SQL 数据库中,最初只有 MySQL,现在只有 SQLite。
“虚拟”模式的表被提取到 SQLite 中的单独的真实表中
- 虚拟模式中所有表的注册表存储在真实的 表,嗯,表中
- 所有表中所有列都在 名称 表中
- 数据库中的每个唯一值 -
wstring
或double
- 都存储在 值 表中 - 虚拟表的每一行都由真实 项目 表中的一行表示
- 所有内容都通过 itemnamevalues 表连接在一起,每个数据单元格一行,
itemid
->nameid
->valueid
大量开销,可能永远无法实现高性能。但它很简单,并且允许动态模式。并且有大量的用例不需要高性能。我看到了这项技术的光明前景。
musicdb - 一个更大的例子
至少在 Windows 上,iTunes 有一个文件,描述了媒体库,该文件是一个易于解析的 XML 文件。该文件中有每个库中每个曲目的条目。 XML 文件中的每个条目都像这样
<dict>
<key>Track ID</key><integer>1002</integer>
<key>Size</key><integer>7973544</integer>
<key>Total Time</key><integer>242755</integer>
<key>Disc Number</key><integer>1</integer>
<key>Disc Count</key><integer>1</integer>
<key>Track Number</key><integer>1</integer>
<key>Track Count</key><integer>13</integer>
<key>Year</key><integer>2012</integer>
<key>Date Modified</key><date>2016-08-22T01:35:10Z</date>
<key>Date Added</key><date>2021-03-17T00:41:46Z</date>
<key>Bit Rate</key><integer>256</integer>
<key>Sample Rate</key><integer>44100</integer>
<key>Artwork Count</key><integer>1</integer>
<key>Persistent ID</key><string>A8CF63F61390C4BC</string>
<key>Track Type</key><string>File</string>
<key>File Folder Count</key><integer>5</integer>
<key>Library Folder Count</key><integer>1</integer>
<key>Name</key><string>Kryptonite</string>
<key>Artist</key><string>3 Doors Down</string>
<key>Album Artist</key><string>3 Doors Down</string>
<key>Composer</key><string>Matt Roberts</string>
<key>Album</key><string>The Greatest Hits [+digital booklet]</string>
<key>Genre</key><string>Alternative Rock</string>
<key>Kind</key><string>MPEG audio file</string>
<key>Comments</key><string>Amazon.com Song ID: 233359329</string>
<key>Sort Album</key><string>Greatest Hits [+digital booklet]</string>
<key>Location</key><string>file:///C:/Users/ballo/Music/iTunes/iTunes%20Media/
Music/3%20Doors%20Down/The%20Greatest%20Hits%20%5B+digital%20booklet%5D/
01-01-%20Kryptonite.mp3</string>
</dict>
musicdb
程序解析整个 XML 文件,并构建库的内存表示。它进行单个 4db 调用,将该表示加载到 4db 数据库中
bool inDict = false;
std::unordered_map<fourdb::strnum, fourdb::paramap> dicts; // accumalate info of all tracks
fourdb::paramap dict; // info about the current track
...
while (xmlFileStream)
{
std::wstring line;
std::getline(xmlFileStream, line);
const wchar_t* tag = wcsstr(line.c_str(), L"<");
if (tag == nullptr)
continue;
if (_wcsicmp(tag, L"<dict>") == 0)
{
inDict = true;
dict.clear();
}
else if (_wcsicmp(tag, L"</dict>") == 0)
{
inDict = false;
if (dict.size() == 1)
continue;
std::wstring key; // just needs to be unique
for (const auto& kvp : dict)
{
const auto& snum = kvp.second;
if (snum.isStr())
key += snum.str();
else
key += fourdb::num2str(snum.num());
key += '|';
}
dicts.insert({ key, dict });
++addedCount;
}
else if (inDict && wcsncmp(tag, L"<key>", 5) == 0) // metadata key line
{
const wchar_t* closingKey = wcsstr(tag, L"</key>");
if (closingKey == nullptr)
{
printf("Unclosed <key>: %S\n", line.c_str());
continue;
}
std::wstring key(tag + 5, closingKey);
cleanXmlValue(key); // deal with <, etc.
const auto& fieldNameIt = fieldNames.find(key); // look column name from iTunes
// field name
if (fieldNameIt == fieldNames.end())
continue; // not a field we care about
const auto& fieldName = fieldNameIt->second;
// It's either integer, string, or date
const wchar_t* valueTag = nullptr;
valueTag = wcsstr(closingKey, L"<integer>");
if (valueTag != nullptr)
{
const wchar_t* closingValue = wcsstr(tag, L"</integer>");
if (closingValue == nullptr)
{
#ifdef _DEBUG
printf("Unclosed <integer>: %S\n", line.c_str());
#endif
continue;
}
std::wstring valueStr(valueTag + 9, closingValue);
double valueNum = _wtof(valueStr.c_str());
dict.insert({ fieldName, valueNum });
continue;
}
valueTag = wcsstr(closingKey, L"<string>");
if (valueTag != nullptr)
{
const wchar_t* closingValue = wcsstr(tag, L"</string>");
if (closingValue == nullptr)
{
#ifdef _DEBUG
printf("Unclosed <string>: %S\n", line.c_str());
#endif
continue;
}
std::wstring valueStr(valueTag + 8, closingValue);
cleanXmlValue(valueStr);
dict.insert({ fieldName, valueStr });
}
valueTag = wcsstr(closingKey, L"<date>");
... // dates are treated like strings
}
...
}
// This one line loads the entire iTunes library into the 4db database "tracks" table
// The lambda / printf business is for pacifying the user during the import process
context.define(L"tracks", dicts, [](const wchar_t* msg) { printf("%S...\n", msg); });
一旦 4db 数据库被填充,您就可以使用基本方言的 SQL SELECT
语句来查询它,并以一种令人愉悦的格式获取查询结果
printf("> ");
std::wstring line;
std::getline(std::wcin, line);
...
auto select = fourdb::sql::parse(line);
auto paramNames = fourdb::extractParamNames(line);
if (!paramNames.empty())
{
printf("\n");
printf("Enter values for the parameters in your query;"
" put # in front of numeric values:\n");
printf("\n");
for (const auto& paramName : paramNames)
{
printf("%S: ", paramName.c_str());
std::getline(std::wcin, line);
if (!line.empty() && line[0] == '#')
select.addParam(paramName, _wtof(line.substr(1).c_str()));
else
select.addParam(paramName, line);
}
}
auto reader = context.execQuery(select);
auto colCount = reader->getColCount();
std::vector<std::vector<std::wstring>> matrix;
std::unordered_set<std::wstring> seenRowSummaries;
while (reader->read())
{
std::vector<std::wstring> newRow;
for (unsigned col = 0; col < colCount; ++col)
newRow.push_back(reader->getString(col)); // getString works with all column types
std::wstring newRowSummary = fourdb::join(newRow, L"\v");
if (seenRowSummaries.find(newRowSummary) != seenRowSummaries.end())
continue;
seenRowSummaries.insert(newRowSummary);
matrix.push_back(newRow);
}
printf("\n");
printf("Results: %u\n", static_cast<unsigned>(matrix.size()));
if (matrix.empty())
continue;
printf("\n");
std::vector<std::wstring> headerRow;
for (unsigned col = 0; col < colCount; ++col)
headerRow.push_back(reader->getColName(col));
matrix.insert(matrix.begin(), headerRow);
std::vector<unsigned> columnWidths;
for (const auto& header : headerRow)
columnWidths.push_back(static_cast<unsigned>(header.size()));
for (const auto& row : matrix)
{
for (size_t cellidx = 0; cellidx < columnWidths.size(); ++cellidx)
columnWidths[cellidx] = std::max(columnWidths[cellidx], row[cellidx].size());
}
for (size_t cellidx = 0; cellidx < columnWidths.size(); ++cellidx)
{
const auto& header = headerRow[cellidx];
auto headerWidth = columnWidths[cellidx];
printf("%S", header.c_str());
for (size_t s = header.size(); s <= headerWidth; ++s)
printf(" ");
}
printf("\n");
for (size_t cellIdx = 0; cellIdx < columnWidths.size(); ++cellIdx)
{
auto headerWidth = columnWidths[cellIdx];
for (size_t s = 0; s < headerWidth; ++s)
printf("-");
printf(" ");
}
printf("\n");
for (size_t rowIdx = 1; rowIdx < matrix.size(); ++rowIdx)
{
const auto& row = matrix[rowIdx];
for (size_t cellIdx = 0; cellIdx < columnWidths.size(); ++cellIdx)
{
const auto& value = row[cellIdx];
auto headerWidth = columnWidths[cellIdx];
printf("%S", value.c_str());
for (size_t s = value.size(); s <= headerWidth; ++s)
printf(" ");
}
printf("\n");
}
SQLite 包装器
4db 的核心是围绕 SQLite C API 的包装器类。 db 类管理数据库连接并提供用于执行查询的例程。 dbreader
类准备和执行查询并提供对查询结果的访问。
参数使用 paramap
传递,它是 std::unordered_map<std::wstring, strnum>
的一个 typedef。 strnum
是一个类,可以是 wstring
或 double
。
学习 SQLite 的 C API 很有趣。包装器类主要独立存在;您可以稍加修改以满足您的目的。
结论
我希望您喜欢了解 4db,并了解使用 4db 处理您的 iTunes 媒体库可以有多么有趣。
历史
- 2021 年 11 月 22 日:初始版本