Saturday, October 16, 2010

dictionary


The book I am reading frequently sends me to the dictionary. Today dict.org went down, and that gave me all the excuse I needed to write a few simple scripts. The result works in the basic web browser built into my MP3 player.

A shell script downloads a preformatted dictionary and pipes it through an AWK script, which turns the definitions into SQL statements that load them into an SQLite database. A second shell script, run as a CGI program, provides a web interface for searching the database.

Here is the shell script that downloads and imports the definitions.


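#!/bin/sh
# Download the preformatted WordNet 2.0 dictionary from dict.org
# and load it into wn20.sqlite. Stop at the first failed command.
set -e
base=ftp://ftp.dict.org/dict/pre/
fn=dict-wn-2.0-pre.tar.gz
url=$base$fn
# -c resumes a partial download instead of starting over.
wget -c "$url"
tar xzf "$fn"
# The .dz file is in dictzip format, which gzip can decompress.
gzip -dc <wn.dict.dz |
	awk -f load-wn20.awk |
	sqlite3 wn20.sqlite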
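
To spot-check the import, the same exact and prefix queries that the web interface runs can be issued from the command line. The sketch below assumes the sqlite3 command-line shell is installed; check_db is a hypothetical helper name, and the demo builds a throwaway database with made-up rows rather than touching wn20.sqlite.

```shell
#!/bin/sh
# Sketch: print the headwords matched by an exact lookup and by a
# prefix ("wild") lookup, mirroring the queries in the web interface.
# check_db is a hypothetical helper. It assumes the sqlite3 shell is
# installed and that $2 contains only letters and digits.
check_db() {
	w=$(printf '%s\n' "$2" | tr a-z A-Z)
	echo "exact:"
	sqlite3 "$1" "select word from wn20 where word = '$w';"
	echo "prefix:"
	sqlite3 "$1" "select word from wn20 where word like '$w%' order by word limit 5;"
}

# Demo on a throwaway database with made-up rows (not real WordNet data).
tmp=$(mktemp)
sqlite3 "$tmp" "create table wn20 (word text, data text);
insert into wn20 values ('DOG', 'dog: sample entry');
insert into wn20 values ('DOGMA', 'dogma: sample entry');"
check_db "$tmp" dog
rm -f "$tmp"
```

After a real import, running check_db wn20.sqlite dog in the download directory should list DOG under both headings, plus any longer headwords that begin with it under the prefix heading.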


Here is the AWK script that parses the definitions.


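# load-wn20.awk
# Input: preformatted WordNet 2.0 dictionary from dict.org
# Output: SQL statements for the sqlite3 command
#
# A line that starts with a letter or digit begins a new entry, and its
# first field is the headword. Every line, including that one, is added
# to the current entry's definition text.

# Print the entry collected so far as an insert statement.
function printDef() {
	if (length(word) > 0) {
		# Double single quotes so the text is a valid SQL string literal.
		gsub(/'/, "''", word)
		gsub(/'/, "''", data)
		# Strip trailing newlines.
		gsub(/\n\n*$/, "", data)
		printf("insert into wn20 values ('%s', '%s');\n", word, data)
	}
	# Reset here, not inside the if, so any text before the first
	# headword is discarded instead of being prepended to it.
	data = ""
}

BEGIN {
	word = ""
	data = ""
	print "drop table if exists wn20;"
	print "create table wn20 (word text, data text);"
	# One transaction for all inserts is much faster than autocommit.
	print "begin;"
}

/^[0-9A-Za-z]/ {
	printDef()
	# Headwords are stored upper-case; the web interface upper-cases queries.
	word = toupper($1)
}

{
	data = sprintf("%s%s\n", data, $0)
}

END {
	printDef()
	print "commit;"
	# Build the index after loading so the inserts do not have to update it.
	print "create index words on wn20 (word);"
}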
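
The escaping in printDef() is the one step that has to be exactly right, because a single apostrophe in a headword or definition would otherwise end the SQL string early. As a sketch of the same rule in plain shell, the hypothetical sql_quote function below doubles each single quote with sed (the counterpart of gsub(/'/, "''", ...)) and wraps the result in quotes. The sample entry is made up.

```shell
#!/bin/sh
# Sketch: quote a string as an SQL literal the way printDef() does, by
# doubling every single quote and wrapping the result in single quotes.
# sql_quote is a hypothetical helper, not part of load-wn20.awk.
# Trailing newlines in the input are dropped by the command substitution.
sql_quote() {
	printf "'%s'" "$(printf '%s\n' "$1" | sed "s/'/''/g")"
}

# Build one insert statement in the format load-wn20.awk emits.
word="O'CLOCK"
data="o'clock
     adv 1: according to the clock"
printf 'insert into wn20 values (%s, %s);\n' "$(sql_quote "$word")" "$(sql_quote "$data")"
```

The demo prints an insert whose values read 'O''CLOCK' and 'o''clock followed by the indented definition line, which sqlite3 stores back as the original text with single apostrophes.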


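Here is the shell script for the web interface. It runs as a CGI program and looks each word up in two databases in the web server's document root: wn20.sqlite, built by the scripts above, and web1913.sqlite (Webster's 1913 dictionary), which those scripts do not create. A * in the query turns the lookup into a prefix search that returns up to five entries.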


#!/bin/sh

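lookup() {
	# $1: dictionary name, used as both the table name and the file name
	# $2: word to look up (letters and digits only)
	# $3: "wild" for a prefix search, limited to five entries
	db="$DOCUMENT_ROOT/$1.sqlite"
	if [ "$3" = "wild" ]
	then
		query="select data from $1 where word like '$2%' limit 5"
	else
		query="select data from $1 where word = '$2'"
	fi
	# Escape &, < and > so the definitions display as plain text in <pre>.
	# Errors from sqlite3 are shown on the page as well.
	sqlite3 "$db" "$query" 2>&1 |
		sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}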

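# Keep only the leading letters and digits of the q= parameter, which
# makes the word safe to paste into the SQL queries, and upper-case it
# to match the stored headwords.
word=$(printf '%s\n' "$QUERY_STRING" |
	sed -e 's/^q=//' -e 's/[^A-Za-z0-9].*//' |
	tr a-z A-Z)
# A '*' anywhere in the query string requests a prefix search.
isWild=$(printf '%s\n' "$QUERY_STRING" | grep -q '\*' && echo wild)
# Quote the arguments so an empty word cannot shift "wild" into $2.
web=$(lookup web1913 "$word" "$isWild")
wn=$(lookup wn20 "$word" "$isWild")
if [ -z "$web" ] && [ -z "$wn" ]
then
	data="No definition found for word '$word'."
else
	data="<h2>From Webster's 1913 Dictionary:</h2>
<pre id=\"web\">
$web
</pre>
<h2>From WordNet 2.0 Dictionary:</h2>
<pre id=\"wn\">
$wn
</pre>
"
fi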

cat <<__TOP__
Content-type: text/html

<html>
<head>
<title>dict $word</title>
</head>
<body>
<form action="${SCRIPT_URL:-$SCRIPT_NAME}" name="dict">
Word: <input type="text" name="q" />
<input type="submit" />
</form>
__TOP__

if [ ! -z "$word" ]
then
cat <<__MID__
<h1>$word :</h1>
$data
__MID__
fi

cat <<__END__
</body>
</html>
__END__
exit 0
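
The only thing standing between the query string and the SQL is the sed command that cuts the word at its first character that is not a letter or digit. To try that parsing without a web server, it can be wrapped in a function; parse_query is a hypothetical name, and the third call shows an attempted quote being cut off.

```shell
#!/bin/sh
# Sketch: the query-string handling from the CGI script, wrapped in a
# hypothetical function so it can be tried without a web server.
# Prints the normalized word, followed by " wild" if the query has a '*'.
parse_query() {
	# Drop the leading "q=", then cut at the first character that is not
	# a letter or digit; this is what keeps quotes out of the SQL.
	w=$(printf '%s\n' "$1" | sed -e 's/^q=//' -e 's/[^A-Za-z0-9].*//' | tr a-z A-Z)
	# A '*' anywhere in the query string selects a prefix search.
	if printf '%s\n' "$1" | grep -q '\*'
	then
		echo "$w wild"
	else
		echo "$w"
	fi
}

parse_query 'q=dictionary'       # DICTIONARY
parse_query 'q=dict*'            # DICT wild
parse_query "q=x'%20or%201=1"    # X
```

The whole CGI script can be exercised the same way by setting the CGI variables by hand, for example QUERY_STRING='q=dog' DOCUMENT_ROOT=/path/to/databases sh dict.cgi, where dict.cgi stands for whatever name the script was saved under.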


The scripts are in the linked dict.zip file.
