
by Rambhutan » Sun Dec 05, 2010 10:54 am

by The Rich Port » Wed Dec 08, 2010 8:25 am
Rambhutan wrote: I would like to create a custom dictionary for Word from novels of a particular period, in this case the Sherlock Holmes stories and Kim by Rudyard Kipling. The idea is to create something that could then be spell-checked using the custom dictionary, which would highlight any anachronistic words. How would I go about doing this using a text file of the novels from Project Gutenberg?

by Rambhutan » Wed Dec 08, 2010 10:42 am
The Rich Port wrote: ... Why would you want to do that?

by The Rich Port » Wed Dec 08, 2010 11:04 am

by The Tofu Islands » Thu Dec 09, 2010 5:42 am
ruby -ne 'print if /^[A-Z -]+\r?$/' dict.txt | uniq > words.txt
by Rambhutan » Thu Dec 09, 2010 10:29 am
The Tofu Islands wrote: I'm not sure how Word handles dictionary files, but as far as collecting words goes, I'd suggest hunting around online for a now-public-domain dictionary of the relevant period. Searching Project Gutenberg for dictionaries yields quite a lot of hits, including this one from 1913. It seems to include particles, pronouns, conjugated forms of verbs, and such, so it should work on its own (it is liable to miss plural forms, however).
As for processing it and generating the dictionary file,
ruby -ne 'print if /^[A-Z -]+\r?$/' dict.txt | uniq > words.txt
or something equivalent will generate a file containing every headword in the dictionary, one per line (plus a few lines at the end that are copyright information). I'm not sure how to turn such a file into a Word dictionary, but it's a start. (If you can find a nice spot to upload it, I could throw you the 300 KiB zip that it generates.)
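If getting the list into Word turns out to be awkward, the same list could be used directly to flag suspect words in a draft. The following is only a rough sketch: it assumes the list has one uppercase entry per line, as the one-liner above should produce, and the method names `load_period_words` and `unknown_words` are made up for illustration.

```ruby
require 'set'

# Load a headword list (one entry per line) into a Set of lowercase words.
# Multi-word entries such as "GAS LAMP" are split into their separate words.
def load_period_words(lines)
  words = Set.new
  lines.each do |line|
    line.strip.downcase.split(/\s+/).each { |w| words << w unless w.empty? }
  end
  words
end

# Return the words in `text` that are absent from the period word list,
# in order of first appearance and without repeats.
# Only plain ASCII letters are recognised, with apostrophes and hyphens
# allowed inside a word.
def unknown_words(text, period_words)
  text.downcase
      .scan(/[a-z]+(?:['-][a-z]+)*/)
      .uniq
      .reject { |w| period_words.include?(w) }
end
```

Something like `puts unknown_words(File.read('draft.txt'), load_period_words(File.foreach('words.txt')))` would then print the candidates, where `draft.txt` is a hypothetical file name. Since the dictionary is liable to miss plural forms, expect plurals and other inflected words to show up as false alarms.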
Constructing one based on novel texts would be trickier: it would require writing something that scanned the texts for words, and the result probably wouldn't be as complete.
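For the novel-based route, a word scanner along these lines might be a starting point. It is a sketch under assumptions, not a tested tool: the method names are invented here, and the `*** START OF` / `*** END OF` marker lines it looks for are the ones Project Gutenberg plain-text files usually carry, though the exact wording varies from file to file.

```ruby
# Drop the licence text that surrounds the book in a Project Gutenberg file.
# Assumes marker lines beginning "*** START OF" and "*** END OF"; if a marker
# is missing, the text is left untouched on that side.
def strip_gutenberg_boilerplate(text)
  body = text
  if (m = body.match(/^\*\*\* ?START OF.*$/))
    body = m.post_match
  end
  if (m = body.match(/^\*\*\* ?END OF.*$/))
    body = m.pre_match
  end
  body
end

# Collect the distinct words used in a text, lowercased and sorted.
# Apostrophes and hyphens inside a word are kept ("don't", "to-morrow").
# Only plain ASCII letters and straight apostrophes are recognised.
def period_vocabulary(text)
  text.downcase.scan(/[a-z]+(?:['-][a-z]+)*/).uniq.sort
end
```

Running `period_vocabulary(strip_gutenberg_boilerplate(File.read('kim.txt')))` on each novel (with `kim.txt` standing in for whatever the downloaded file is called) and merging the results would give a one-word-per-line list once joined with newlines. Proper nouns and dialect spellings will end up in the list too, so it would likely need some weeding by hand.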