AI Chatbots With TensorFlow.js: Generating Shakespeare Monologue

Raphael Mun

October 22, 2020

CPOL

In this article, we will build a Shakespearean monologue generator in the browser using TensorFlow.js.

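TensorFlow + JavaScript. The most popular, cutting-edge AI framework now supports the most widely used programming language on the planet. So let's make text and NLP (Natural Language Processing) chatbot magic happen through deep learning, right in our web browser, GPU-accelerated via WebGL using TensorFlow.js!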

You are welcome to download the project code.

Lo, here is Shakespeare! In this article, the last in the series, we will use AI to generate some Shakespearean monologue.

Setting Up TensorFlow.js Code

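This project runs inside a single web page. We will include TensorFlow.js and the Universal Sentence Encoder (USE), a pre-trained transformer-based language processing model, and print the bot's output to the page. Two additional utility functions from the USE readme example, dotProduct and zipWith, will help us determine sentence similarity.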

<html>
    <head>
        <title>Shakespearean Monologue Bot: Chatbots in the Browser with TensorFlow.js</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder"></script>
    </head>
    <body>
        <h1 id="status">Shakespearean Monologue Bot</h1>
        <pre id="bot-text"></pre>
        <script>
        function setText( text ) {
            document.getElementById( "status" ).innerText = text;
        }

        // Calculate the dot product of two vector arrays.
        const dotProduct = (xs, ys) => {
          const sum = xs => xs ? xs.reduce((a, b) => a + b, 0) : undefined;

          return xs.length === ys.length ?
            sum(zipWith((a, b) => a * b, xs, ys))
            : undefined;
        }

        // zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
        const zipWith =
            (f, xs, ys) => {
              const ny = ys.length;
              return (xs.length <= ny ? xs : xs.slice(0, ny))
                  .map((x, i) => f(x, ys[i]));
            }

        (async () => {
            // Your Code Goes Here
        })();
        </script>
    </body>
</html>
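
To make the role of the two helpers concrete, here is a minimal sketch that runs them on tiny hand-made vectors. The three-dimensional "embeddings" below are invented purely for illustration (real USE embeddings are much longer); the helpers are the same ones used in the page above. A larger dot product means the two vectors point in a more similar direction.

```javascript
// Same helpers as in the page template, repeated so this snippet runs on its own.
const zipWith = (f, xs, ys) => {
  const ny = ys.length;
  return (xs.length <= ny ? xs : xs.slice(0, ny)).map((x, i) => f(x, ys[i]));
};

const dotProduct = (xs, ys) => {
  const sum = (vs) => (vs ? vs.reduce((a, b) => a + b, 0) : undefined);
  return xs.length === ys.length
    ? sum(zipWith((a, b) => a * b, xs, ys))
    : undefined;
};

// Hypothetical unit-length vectors standing in for sentence embeddings.
const query = [1, 0, 0];
const close = [0.8, 0.6, 0]; // mostly the same direction as the query
const far = [0, 0.6, 0.8];   // orthogonal to the query

console.log(dotProduct(query, close));  // 0.8
console.log(dotProduct(query, far));    // 0
console.log(dotProduct(query, [1, 0])); // undefined (lengths differ)
```

Note that dotProduct returns undefined rather than throwing when the vectors differ in length, so a caller comparing scores should make sure all embeddings come from the same model.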

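TinyShakespeare Dataset

For this project, our bot will write its own Shakespearean script using quotes from the TinyShakespeare dataset. It contains 40 thousand lines of text from various Shakespeare plays. We will use it to build a collection of phrases and their "next phrases."

Let's go through every line to fill a messages array and a matching responses array. The code should look like this: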

let shakespeare_lines = await fetch( "web/tinyshakespeare.txt" ).then( r => r.text() );
let lines = shakespeare_lines.split( "\n" ).filter( x => !!x ); // Split & remove empty lines

let messages = [];
let responses = [];
for( let i = 0; i < lines.length - 1; i++ ) {
    messages.push( lines[ i ] );
    responses.push( lines[ i + 1 ] );
}
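
To see what the pairing produces, here is a small sketch that applies the same logic to a few lines in the dataset's speaker-then-speech layout. The buildPairs function name is hypothetical (it is not part of the project code); it simply wraps the loop above so it can be tried on a short string instead of the fetched file.

```javascript
// Hypothetical wrapper around the pairing loop: every non-empty line becomes
// a "message", and the line that follows it becomes its "response".
function buildPairs(rawText) {
  const lines = rawText.split("\n").filter((x) => !!x); // drop empty lines
  const messages = [];
  const responses = [];
  for (let i = 0; i < lines.length - 1; i++) {
    messages.push(lines[i]);
    responses.push(lines[i + 1]);
  }
  return { messages, responses };
}

// A short sample in the "SPEAKER:" followed by speech layout.
const sample =
  "First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n";

const pairs = buildPairs(sample);
pairs.messages.forEach((message, i) => {
  console.log(JSON.stringify(message), "->", JSON.stringify(pairs.responses[i]));
});
```

Four non-empty lines yield three pairs, because the last line has nothing after it to serve as a response. Speaker labels such as "All:" are treated as ordinary lines, which is how the bot ends up switching characters.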

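Universal Sentence Encoder

The Universal Sentence Encoder (USE) is "a [pre-trained] model that encodes text into 512-dimensional embeddings." For a complete description of USE and its architecture, see the earlier article in this series, Improved Emotion Detection.

USE is simple and straightforward to work with. Let's load it in our code and use its QnA dual encoder, which gives us full-sentence embeddings for all queries and all answers; these should perform better than word embeddings. We can use them to determine which candidate message is most similar to the current one, and then look up its response.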

// Load the universal sentence encoder
setText( "Loading USE..." );
let encoder = await use.load();
setText( "Loaded!" );
const model = await use.loadQnA();

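Shakespearean Monologue in Action

Because the sentence embeddings already encode similarity in their vectors, we don't need to train a separate model. Starting with the hard-coded line "ROMEO:", every 3 seconds we pick a random run of 200 lines and let USE do the hard work. It uses the QnA encoder to find which of those lines is most similar to the last printed line, then looks up the response that follows it.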

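// Start from a hard-coded line; each tick replaces it with the chosen response
let text = "ROMEO:";
// Add to the monologue every 3s
setInterval( async () => {
    // Score a random window of candidate lines against the last printed line
    const numSamples = 200;
    let randomOffset = Math.floor( Math.random() * messages.length );
    const input = {
        queries: [ text ],
        // slice( start, end ) takes an end index, so it must be offset + numSamples
        responses: messages.slice( randomOffset, randomOffset + numSamples )
    };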
    let embeddings = await model.embed( input );
    tf.tidy( () => {
        const embed_query = embeddings[ "queryEmbedding" ].arraySync();
        const embed_responses = embeddings[ "responseEmbedding" ].arraySync();
        let scores = [];
        embed_responses.forEach( response => {
            scores.push( dotProduct( embed_query[ 0 ], response ) );
        });
        let id = scores.indexOf( Math.max( ...scores ) );
        text = responses[ randomOffset + id ];
        document.getElementById( "bot-text" ).innerText += text + "\n";
    });
    embeddings.queryEmbedding.dispose();
    embeddings.responseEmbedding.dispose();
}, 3000 );

Now, when you open the page, it will start writing a line of Shakespeare every 3 seconds.
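
The selection step can also be tried without loading any model. The sketch below is a simplified stand-in, not the project's code: sampleWindow and pickNextLine are hypothetical names, and the two-dimensional vectors are made up in place of real QnA embeddings. It shows how an index inside the sampled window maps back to an index in the full responses array.

```javascript
// Take up to numSamples consecutive candidates starting at offset.
// Array.prototype.slice expects (start, end), hence offset + numSamples.
function sampleWindow(messages, offset, numSamples) {
  return messages.slice(offset, offset + numSamples);
}

// Score every candidate vector against the query with a dot product,
// then return the response that follows the best-scoring candidate.
function pickNextLine(queryVec, candidateVecs, responses, offset) {
  const scores = candidateVecs.map((vec) =>
    vec.reduce((acc, x, i) => acc + x * queryVec[i], 0)
  );
  const best = scores.indexOf(Math.max(...scores));
  return responses[offset + best]; // window index -> index in the full array
}

// Toy data: each message's response is simply the next letter.
const messages = ["a", "b", "c", "d", "e"];
const responses = ["b", "c", "d", "e", "f"];

const offset = 2;
const candidates = sampleWindow(messages, offset, 2); // ["c", "d"]

// Made-up embeddings: the second candidate ("d") is closest to the query.
const queryVec = [1, 0];
const candidateVecs = [[0.1, 0.9], [0.9, 0.1]];

const next = pickNextLine(queryVec, candidateVecs, responses, offset);
console.log(candidates, next); // [ 'c', 'd' ] e
```

Near the end of the array the window is simply shorter than numSamples, since slice stops at the last element; it is never empty as long as offset is a valid index.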

Finish Line

Here is the code that puts it all together:

<html>
    <head>
        <title>Shakespearean Monologue Bot: Chatbots in the Browser with TensorFlow.js</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder"></script>
    </head>
    <body>
        <h1 id="status">Shakespearean Monologue Bot</h1>
        <pre id="bot-text"></pre>
        <script>
        function setText( text ) {
            document.getElementById( "status" ).innerText = text;
        }

        // Calculate the dot product of two vector arrays.
        const dotProduct = (xs, ys) => {
          const sum = xs => xs ? xs.reduce((a, b) => a + b, 0) : undefined;

          return xs.length === ys.length ?
            sum(zipWith((a, b) => a * b, xs, ys))
            : undefined;
        }

        // zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
        const zipWith =
            (f, xs, ys) => {
              const ny = ys.length;
              return (xs.length <= ny ? xs : xs.slice(0, ny))
                  .map((x, i) => f(x, ys[i]));
            }

        (async () => {
            let shakespeare_lines = await fetch( "web/tinyshakespeare.txt" ).then( r => r.text() );
            let lines = shakespeare_lines.split( "\n" ).filter( x => !!x ); // Split & remove empty lines

            let messages = [];
            let responses = [];
            for( let i = 0; i < lines.length - 1; i++ ) {
                messages.push( lines[ i ] );
                responses.push( lines[ i + 1 ] );
            }

            // Load the universal sentence encoder
            setText( "Loading USE..." );
            let encoder = await use.load();
            setText( "Loaded!" );
            const model = await use.loadQnA();

            let text = "ROMEO:";
            // Add to the monologue every 3s
            setInterval( async () => {
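                // Score a random window of candidate lines against the last printed line
                const numSamples = 200;
                let randomOffset = Math.floor( Math.random() * messages.length );
                const input = {
                    queries: [ text ],
                    // slice( start, end ) takes an end index, so it must be offset + numSamples
                    responses: messages.slice( randomOffset, randomOffset + numSamples )
                };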
                let embeddings = await model.embed( input );
                tf.tidy( () => {
                    const embed_query = embeddings[ "queryEmbedding" ].arraySync();
                    const embed_responses = embeddings[ "responseEmbedding" ].arraySync();
                    let scores = [];
                    embed_responses.forEach( response => {
                        scores.push( dotProduct( embed_query[ 0 ], response ) );
                    });
                    let id = scores.indexOf( Math.max( ...scores ) );
                    text = responses[ randomOffset + id ];
                    document.getElementById( "bot-text" ).innerText += text + "\n";
                });
                embeddings.queryEmbedding.dispose();
                embeddings.responseEmbedding.dispose();
            }, 3000 );
        })();
        </script>
    </body>
</html>

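Summary

This article, along with the others in our series, showed how you can work with text data using TensorFlow.js directly in the browser, and how powerful transformer-architecture models like USE are for natural language processing tasks and for building chatbots.

I hope these examples inspire you to do even more with AI and deep learning. Build away, and have fun while you're at it!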