Gibberish
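Generate masses of gibberish from real text.
Introduction
While looking at one of Microsoft's website Starter Kits, I noticed they had a bunch of gibberish sample text, and I wondered who the poor soul was who wrote it all. Then I thought, "they surely have a tool to generate this gibberish text," so I decided to build one myself as a fun DIY project.
Gibberish can read in a large block of text and replace every alphanumeric character in it with a different random character. Alternatively, it can break the text into an array of words and replace the text word by word, which still produces gibberish but is more readable and uses real words.
I took the word list from the 12Dicts collection at http://wordlist.sourceforge.net/. If you prefer, you can also upload your own word list from the form.
It requires only .NET 2.0, and no installation is needed.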
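Background
Getting Gibberish to replace existing characters with random ones was easy. I also needed to skip any special characters in order to preserve the formatting of the text, which wasn't hard either.
Getting Gibberish to replace words with words from a predefined list was a bit harder, because each word may contain several "special" characters. The word has to be split on each of those characters, and each part of the word is replaced with another word. Finally, the parts are put back together with their original "special" characters, formatting, and letter casing.
Example: Hello,Worldly.World's -> Wheel,Mallets.Coons'l
This requires splitting the word into 4 words ("Hello", "Worldly", "World", "s") using 3 different delimiters: `,`, `.`, and `'`.
The real trick is a recursive method that calls back into itself each time it finds a special character in the word or subword being processed.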
Using the Code
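Buttons
The "Make Gibberish" button breaks the entire `string` into a `CharArray` and loops through every character in the array. On each iteration it calls the `SwapCharacters` function and appends the new character to `gibberishString`. Finally, it outputs `gibberishString` to the text box.
The "Change Words" button breaks the text into an array of words rather than an array of characters, and then loops through every word in the array. On each iteration it checks whether the word contains any special characters; if not, it calls the `SwapWord` function, which swaps the word with one from the predefined word list read from a text file during the `form_load` event.
If the word does contain special characters, it calls the `ProcessSpecialWord` function, which breaks the word into an array of subwords to be replaced.
Finally, it appends the new word to `gibberishString`, and at the end it outputs `gibberishString` to the text box.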
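The two button flows described above can be sketched roughly as follows. This is only an illustration, not the article's actual event handlers: the names `MakeGibberish`, `ChangeWords`, `sourceText`, and the three `Stub...` functions are hypothetical, the delimiter set is an assumed subset of `specialCharArray`, and splitting on a single space is an assumption about how the text is broken into words. The stubs are deterministic stand-ins so the output is predictable.

```vbnet
Imports System
Imports System.Text

Module ButtonFlowSketch
    ' Assumed delimiter set; the article does not list specialCharArray in full.
    Private ReadOnly specialChars() As Char = {","c, "."c, "'"c}

    ' "Make Gibberish" flow: visit each character and append its replacement.
    Public Function MakeGibberish(ByVal sourceText As String, ByVal swapCharacter As Converter(Of Char, Char)) As String
        Dim gibberishString As New StringBuilder(sourceText.Length)
        For Each character As Char In sourceText.ToCharArray()
            gibberishString.Append(swapCharacter(character))
        Next
        Return gibberishString.ToString()
    End Function

    ' "Change Words" flow: visit each word and pick the handler based on
    ' whether the word contains a special character.
    Public Function ChangeWords(ByVal sourceText As String, ByVal swapWord As Converter(Of String, String), ByVal processSpecialWord As Converter(Of String, String)) As String
        ' Assumption: words are separated by single spaces.
        Dim words() As String = sourceText.Split(" "c)
        For i As Integer = 0 To words.Length - 1
            If words(i).IndexOfAny(specialChars) >= 0 Then
                words(i) = processSpecialWord(words(i))
            Else
                words(i) = swapWord(words(i))
            End If
        Next
        Return String.Join(" ", words)
    End Function

    ' Hypothetical stand-in for SwapCharacters: upper-cases instead of randomizing.
    Private Function StubSwapCharacter(ByVal character As Char) As Char
        Return Char.ToUpper(character)
    End Function

    ' Hypothetical stand-in for SwapWord: upper-cases the whole word.
    Private Function StubSwapWord(ByVal word As String) As String
        Return word.ToUpper()
    End Function

    ' Hypothetical stand-in for ProcessSpecialWord: brackets the word so the
    ' dispatch is visible in the output.
    Private Function StubProcessSpecialWord(ByVal word As String) As String
        Return "[" & word & "]"
    End Function

    Sub Main()
        ' prints: HELLO, WORLD
        Console.WriteLine(MakeGibberish("Hello, World", AddressOf StubSwapCharacter))
        ' prints: [Hello,] BIG WORLD
        Console.WriteLine(ChangeWords("Hello, big World", AddressOf StubSwapWord, AddressOf StubProcessSpecialWord))
    End Sub
End Module
```

Passing the swap routines in as `Converter(Of TInput, TOutput)` delegates is purely a convenience for this sketch; it keeps the loops testable without the form, the random generator, or the word list.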
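Methods
The `SwapCharacters` function takes a `Char` and determines whether it is upper case, lower case, or a digit. If it is upper or lower case, it grabs a random character from the alphabet, converts it to the matching case as needed, and returns the resulting `Char`. If it is a digit, it returns a random digit from 0 to 9. Any other character is returned unchanged.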
' take a character, swap it with a random character, and return the result
Private Function SwapCharacters(ByVal character As Char) As Char
    If Char.IsUpper(character) Then
        ' get a random letter from the letters array, convert to upper,
        ' and return new character
        ' (the upper bound of Random.Next is exclusive, so pass the array length
        ' to make the last letter reachable)
        Return Char.ToUpper(letterCharArray(rand.Next(0, letterCharArray.Length)))
    ElseIf Char.IsLower(character) Then
        ' get a random letter from the letters array, which is already lowercase,
        ' and return it
        Return letterCharArray(rand.Next(0, letterCharArray.Length))
    ElseIf CStr(numberCharArray).Contains(character) Then
        ' get a random number from 0-9 and return it as the new character
        Return numberCharArray(rand.Next(0, numberCharArray.Length))
    Else ' not an alpha-numeric character, so return whatever it is to whence it came
        Return character
    End If
End Function
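The `SwapWord` function takes a word, `origWord`, and checks whether it contains any special characters; if it does, it simply returns the `origWord` that was passed in.
If the word passed in is only 1 character long, it calls the `SwapCharacters` function and returns the result.
If the word passed in has no special characters and is longer than one character, it creates a new `ArrayList`, loops through every word in the loaded word list, and adds all words with the same length as `origWord` to the `ArrayList`. If the `ArrayList` contains any words, it picks one at random and returns the new word after passing it through the `CopyCase` function. If no word of that length exists, `origWord` is returned unchanged.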
' take the word passed and return a new word of the same length
Private Function SwapWord(ByVal origWord As String) As String
    ' just in case the word dictionary was empty
    If wordListArray Is Nothing Then
        Return origWord
    End If
    ' if the word passed has any special characters in it,
    ' or nothing at all, toss it back
    If origWord.IndexOfAny(specialCharArray) >= 0 OrElse origWord = String.Empty Then
        Return origWord
    ElseIf origWord.Length = 1 Then
        ' if word is only one character, just replace the character
        Return SwapCharacters(origWord.Chars(0)).ToString()
    Else ' no special characters and length > 1,
        ' so swap out the word and return it back
        ' build an ArrayList with all words from the array with the same
        ' length as the word being processed
        Dim wordsWithLength As ArrayList = New ArrayList
        Dim tempWord As String = String.Empty
        ' loop through every word in the wordListArray and store words of
        ' the same length as the original word
        For i As Integer = 0 To wordListArray.Length - 1
            tempWord = wordListArray(i)
            If tempWord.Length = origWord.Length Then
                wordsWithLength.Add(tempWord)
            End If
        Next i
        ' we have all the words with the same length as the one we're processing,
        ' now pick one out at random
        If wordsWithLength.Count = 0 Then
            Return origWord
        Else
            ' the upper bound of Random.Next is exclusive, so pass Count
            ' to make the last word reachable
            Dim newWord As String = wordsWithLength.Item(rand.Next(0, wordsWithLength.Count)).ToString()
            ' return the new word after copying the letter casing from the original word
            Return CopyCase(origWord, newWord)
        End If
    End If
End Function
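Because `SwapWord` scans the whole word list for every word it replaces, one possible performance tweak is to group the list by word length once, right after it is loaded. The following is a minimal sketch of that idea and is not part of the original program; `BuildLengthIndex` and `PickWordOfLength` are hypothetical names, and the three-word list is made up for the demonstration.

```vbnet
Imports System
Imports System.Collections.Generic

Module WordLengthIndexSketch
    ' Groups the words by length once, so later lookups need no full scan.
    Public Function BuildLengthIndex(ByVal words() As String) As Dictionary(Of Integer, List(Of String))
        Dim index As New Dictionary(Of Integer, List(Of String))
        For Each word As String In words
            Dim bucket As List(Of String) = Nothing
            If Not index.TryGetValue(word.Length, bucket) Then
                bucket = New List(Of String)
                index.Add(word.Length, bucket)
            End If
            bucket.Add(word)
        Next
        Return index
    End Function

    ' Returns a random word of the requested length, or Nothing when the
    ' list holds no word of that length.
    Public Function PickWordOfLength(ByVal index As Dictionary(Of Integer, List(Of String)), ByVal length As Integer, ByVal rand As Random) As String
        Dim bucket As List(Of String) = Nothing
        If index.TryGetValue(length, bucket) Then
            ' Random.Next(maxValue) excludes maxValue, so every position
            ' from 0 to Count - 1 can be chosen.
            Return bucket(rand.Next(bucket.Count))
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim index As Dictionary(Of Integer, List(Of String)) = BuildLengthIndex(New String() {"wheel", "coons", "mallets"})
        ' Only one 7-letter word is present, so this prints: mallets
        Console.WriteLine(PickWordOfLength(index, 7, New Random()))
    End Sub
End Module
```

With an index like this, the caller would still need to fall back to the original word when `Nothing` comes back, and pass the result through `CopyCase` as `SwapWord` does.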
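The `CopyCase` function takes `oldWord` and `newWord`, breaks both words into their own `CharArray`, and loops through every position in `oldWordCharArray`. On each iteration, the loop checks whether the `Char` is upper or lower case and converts the corresponding character in the `newWord` array to that case.
It then uses `CStr` to convert the `CharArray` back into a `string` and returns the now correctly cased `newWord`.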
' loops through every letter in the old and new words and matches character cases
Private Function CopyCase(ByVal oldWord As String, ByVal newWord As String) As String
    ' first ensure the two words are the same length
    If oldWord.Length = newWord.Length Then
        ' convert each word into a CharArray so we can loop through each character
        Dim oldWordCharArray() As Char = oldWord.ToCharArray()
        Dim newWordCharArray() As Char = newWord.ToCharArray()
        ' loop through each character of the old word
        For i As Integer = 0 To UBound(oldWordCharArray)
            If Char.IsUpper(oldWordCharArray(i)) Then
                ' original word character was upper case,
                ' so convert new word character to upper case
                newWordCharArray(i) = Char.ToUpper(newWordCharArray(i))
            ElseIf Char.IsLower(oldWordCharArray(i)) Then
                ' original word character was lower case,
                ' so convert new word character to lower case
                newWordCharArray(i) = Char.ToLower(newWordCharArray(i))
            End If
        Next
        ' convert the new CharArray back into a word and return to sender
        Return CStr(newWordCharArray)
    Else
        ' the old and new words should be the same length, if not, just return
        ' the unprocessed newWord
        Return newWord
    End If
End Function
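The `ProcessSpecialWord` function is where I racked my brain. How do I recursively break a word containing several different special characters into an array of individual words, replace them, and then put them all back together while remembering which special characters go where?
After writing and deleting about 1,000 lines of code, I came up with the `ProcessSpecialWord` method, which keeps passing the word back into itself until there are no more special characters. The trick is to split the word on only one special character at a time; if there are more special characters, just pass the piece back into `ProcessSpecialWord` for another round.
Once all parts of the word have been split and replaced, they are all joined back together and the `string` is returned.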
' if the word has a non alpha-numeric letter, it gets passed here
' to be split up and each section of the word is changed to a new word
Private Function ProcessSpecialWord(ByVal word As String) As String
    ' loop through each special character and see if the word contains it
    For Each specialChar As Char In specialCharArray
        If word.Contains(specialChar.ToString()) Then
            ' split the word into an array with the found special character
            Dim subword() As String = SplitWord(word, specialChar)
            ' process each new split word
            For i As Integer = 0 To subword.Length - 1
                If subword(i).IndexOfAny(specialCharArray) >= 0 Then
                    ' the new word still contains special characters.
                    ' pass the word back in to this same method again to get it
                    ' split up again and keep passing it in until it's entirely split up
                    ' and all sections of the word have been changed
                    subword(i) = ProcessSpecialWord(subword(i))
                Else
                    ' the subword has no special characters, so just swap it
                    subword(i) = SwapWord(subword(i))
                End If
            Next
            ' all done swapping words, so join all subwords back together,
            ' exit the loop and return the new joined word back
            word = String.Join(specialChar.ToString(), subword)
            ' we exit the loop because we only want to process one special
            ' character at a time. when there's more than one special character
            ' in the word, it gets thrown back into the function and split up again
            ' using that special character
            Exit For
        End If
    Next specialChar
    Return word
End Function
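The `SplitWord` helper is not shown in the article. The sketch below assumes it is a thin wrapper around `String.Split`, and traces the same recursion on the example from the Background section. Everything here is illustrative: `ProcessWord` and `StubSwapWord` are hypothetical names, the delimiter set is an assumed subset of `specialCharArray`, and the stub upper-cases each piece instead of picking a random word so that the result is repeatable.

```vbnet
Imports System

Module SplitWordSketch
    ' Assumed delimiter set; the article does not list specialCharArray in full.
    Private ReadOnly specialChars() As Char = {","c, "."c, "'"c}

    ' Hypothetical SplitWord: splits on a single delimiter. String.Split keeps
    ' empty pieces, so String.Join can put every delimiter back in place.
    Private Function SplitWord(ByVal word As String, ByVal specialChar As Char) As String()
        Return word.Split(specialChar)
    End Function

    ' Hypothetical stand-in for SwapWord: upper-cases instead of picking a
    ' random word of the same length.
    Private Function StubSwapWord(ByVal word As String) As String
        Return word.ToUpper()
    End Function

    ' Same shape as ProcessSpecialWord: split on the first delimiter found,
    ' recurse into pieces that still contain a delimiter, then re-join.
    Private Function ProcessWord(ByVal word As String) As String
        For Each specialChar As Char In specialChars
            If word.IndexOf(specialChar) >= 0 Then
                Dim parts() As String = SplitWord(word, specialChar)
                For i As Integer = 0 To parts.Length - 1
                    If parts(i).IndexOfAny(specialChars) >= 0 Then
                        parts(i) = ProcessWord(parts(i))
                    Else
                        parts(i) = StubSwapWord(parts(i))
                    End If
                Next
                Return String.Join(specialChar.ToString(), parts)
            End If
        Next
        Return StubSwapWord(word)
    End Function

    Sub Main()
        ' Split on "," first, then "." inside the second piece, then "'".
        ' prints: HELLO,WORLDLY.WORLD'S
        Console.WriteLine(ProcessWord("Hello,Worldly.World's"))
    End Sub
End Module
```

Each level of the recursion only has to remember the one delimiter it split on, which is why every special character ends up back in its original position.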
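Points of Interest
It's fun to see what kinds of word combinations you can come up with, and it may be useful to somebody. I know there are other script-based gibberish generators out there, but the few I looked at couldn't parse multiple special characters and preserve a word's punctuation.
Even if it has no practical use, it is still useful academically, and some people might make use of the beginner-to-intermediate concepts in the code's logic.
Coming up with the recursive `ProcessSpecialWord` function was a small accomplishment, and it was fun to work out.
History
- May 7, 2009: Initial release
Please feel free to leave any comments, feedback, or thoughts on code quality or performance tweaks.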