Creating a custom dictionary

Rambhutan
Negotiator
 
Posts: 5227
Founded: Jul 28, 2004
Ex-Nation

Postby Rambhutan » Sun Dec 05, 2010 10:54 am

I would like to create a custom dictionary for Word from novels of a particular period, in this case the Sherlock Holmes stories and Kim by Rudyard Kipling. The idea is that my own text could then be spell-checked against the custom dictionary, which would highlight any anachronistic words. How would I go about doing this using a text file of the novels from Project Gutenberg?
Are we there yet?

Overherelandistan wrote: I chalange you to find a better one that isnt even worse

The Rich Port
Post Czar
 
Posts: 38094
Founded: Jul 29, 2008
Ex-Nation

Postby The Rich Port » Wed Dec 08, 2010 8:25 am

... Why would you want to do that?

Postby Rambhutan » Wed Dec 08, 2010 10:42 am

Because I am writing a novel set in that era and I want to make sure the language I use does not contain any words that would not have been in use at the time.

Postby The Rich Port » Wed Dec 08, 2010 11:04 am

Oh. I thought it was for some weird obsession. :lol:

Honestly, it's not my area of expertise, so I don't think it's of any merit when I ask if it really matters. Maybe write it so that you INTENTIONALLY avoided using period vocabulary.

The Tofu Islands
Minister
 
Posts: 2872
Founded: Mar 24, 2009
Ex-Nation

Postby The Tofu Islands » Thu Dec 09, 2010 5:42 am

I'm not sure how Word handles dictionary files, but as far as collecting words goes, I'd suggest hunting around online for a now-public-domain dictionary of the relevant period. Searching Project Gutenberg for dictionaries yields quite a lot of hits, including this one from 1913. It seems to include particles, pronouns, conjugated forms of verbs, and such, so it should work on its own (it is liable to miss plural forms, however).

As for processing it and generating the dictionary file,
Code:
ruby -ne 'print if /^[A-Z -]+\r?$/' dict.txt | uniq > words.txt

or something equivalent will generate a file containing every word in the dictionary (plus a few lines at the end that are copyright information). I'm not sure how to turn such a file into a Word dictionary, but it's a start. (If you can find a nice spot to upload it, I could throw you the 300KiB zip that it generates.)
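
If the aim is only to flag words that are missing from the list, words.txt could also be used directly, without going through Word at all. Here is a minimal sketch, assuming words.txt holds one headword per line as produced above. The helper names load_wordlist and unknown_words and the file names in the usage comment are made up for illustration, and plurals or other inflected forms missing from the list will be flagged even though they are fine.

```ruby
require 'set'

# Build a lookup set from word-list lines (one entry per line).
# Each entry is stripped (which also drops a trailing \r) and lowercased.
def load_wordlist(lines)
  lines.map { |line| line.strip.downcase }.reject(&:empty?).to_set
end

# Return the distinct words in `text` that are not in `wordlist`,
# in the order they first appear. A "word" is a run of ASCII letters,
# optionally joined by internal apostrophes or hyphens.
def unknown_words(text, wordlist)
  text.downcase.scan(/[a-z]+(?:['-][a-z]+)*/).uniq.reject { |w| wordlist.include?(w) }
end

if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  # Usage (hypothetical file names): ruby check_draft.rb words.txt draft.txt
  wordlist = load_wordlist(File.readlines(ARGV[0], encoding: 'UTF-8').map(&:scrub))
  puts unknown_words(File.read(ARGV[1], encoding: 'UTF-8').scrub, wordlist)
end
```

Each word it prints is only a candidate for a closer look, not a confirmed anachronism.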

Constructing one based on novel texts would be trickier: it would require writing something that scanned for words, and the result probably wouldn't be as complete.
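
For what it's worth, the "scan for words" step could look roughly like the sketch below, assuming the novels are saved as plain-text files. The method name extract_words and the file names in the usage comment are made up for illustration, and treating a word as a run of letters with optional internal apostrophes or hyphens is only one possible definition.

```ruby
require 'set'

# Collect the distinct words in a text, lowercased.
# A "word" here is a run of ASCII letters, optionally joined by
# internal apostrophes or hyphens (e.g. "don't", "well-known").
def extract_words(text)
  text.downcase.scan(/[a-z]+(?:['-][a-z]+)*/).to_set
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  # Usage (hypothetical file names): ruby extract_words.rb holmes.txt kim.txt > words.txt
  words = Set.new
  ARGV.each do |path|
    # scrub replaces invalid byte sequences so the scan does not raise
    words.merge(extract_words(File.read(path, encoding: 'UTF-8').scrub))
  end
  puts words.sort
end
```

The output would still contain proper names and whatever the Project Gutenberg header and footer text contributes, so it needs trimming by hand.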
In its majestic equality, the law forbids rich and poor alike to sleep under bridges, beg in the streets and steal loaves of bread.

Postby Rambhutan » Thu Dec 09, 2010 10:29 am

Thanks. It was easier when Word dictionaries were just text files.

