author | Einhard Leichtfuß | 2018-12-27 18:55:44 +0100 |
---|---|---|
committer | Einhard Leichtfuß | 2018-12-27 19:06:40 +0100 |
commit | dd137b37a572f28510be3cd7a74ec538ef692689 (patch) | |
tree | 1b30d22a8f47fe36c257aa456a72bffa23c84789 /fixes.sed | |
parent | 8fe66d46bef882a390ba637a2be9a2172cd3f423 (diff) | |
download | aur-dd137b37a572f28510be3cd7a74ec538ef692689.tar.gz |
Add a few fixes to the dictionary source
A large part is derived from a diff between 0.48 and Debian's 0.48.5.
Also,
- Split the sed script into one to be executed initially and another one
  after webfilter.
- Use Debian's 0.48.5 instead of 0.48.4 (does not change much).
- Correctly set the version of the dictionary (as written by `dict -D').
- Do not patch the Makefile but execute the commands directly.
- Simplify prepare() and build().
- Add a check function.
Diffstat (limited to 'fixes.sed')
-rwxr-xr-x[-rw-r--r--] | fixes.sed | 184 |
1 files changed, 171 insertions, 13 deletions
diff --git a/fixes.sed b/fixes.sed
index b135ecaaffb3..ff8cf356dc40 100644..100755
--- a/fixes.sed
+++ b/fixes.sed
@@ -1,20 +1,178 @@
-s`(<altname>|<contr>)<cref>([^<]*)</cref>`\1\2`g
-s`(<stype>|<prod>)<ecol>([^<]*)</ecol>`\1\2`g
+#!/usr/bin/env -S sed -Ef
-s`<col>([^<]*),? <cd>([^<]*)</col>`<col>\1</col>, <cd>\2`g
+# A large part of the changes is derived from a diff between 0.48 and
+# Debian's 0.48.5, excluding changes included in the new gcide release and
+# those that do not change the final output.
+#
+# Sources:
+# http://archive.debian.org/debian-archive/debian/pool/main/d/dict-gcide/dict-gcide_0.48.orig.tar.gz
+# http://deb.debian.org/debian/pool/main/d/dict-gcide/dict-gcide_0.48.5.tar.xz
+# TODO:
+# * '[<source></source>]'
+# * '</item><item>' (dict -d gcide legislation)
+
+
+## GENERAL
+
+# Remove lines pretending to be in a particular font.
+\`^<p>\s*(<note>\s*)?<hand/\s*(<[^>]*type>\s*)?This\s*line\s*is\s*printed\s*in`, \`^\s*$` d
+
+# Remove book and publ tags in a qau element.
+# <publ> seems to be removed by webfmt, so apparently not necessary to
+# remove here.
 s`(<qau>[^<]*)(<book>|<publ>)([^<]*)(</book>|</publ>)`\1\3`g
-s`<qau>([^<]*)(<break>)`<qau>\1</qau>\2`
-s`^([^<]*)</qau>`\1`
-s`<qau>([^<]*) (\([^)]{20}[^<]*)</qau>`<qau>\1</qau> \2`
-s`<au>([^<]*)<break>`<au>\1</au><break>`
-s`^([^<]*)</au>`\1`
+## CIDE.A
+
+# Typo.
+s`^(<p><q><qex>A priori</qex>, that is,) form (these necessities)`\1 from \2`
+
+# Add some semicolon.
+\`^<mhw>\{ <hw>Ar"que\*bus</hw>, <hw>Ar"que\*buse</hw> \}</mhw>` {
+    s`(<def>A sort of hand gun or firearm) (a contrivance)`\1\; \2`
+}
+
+
+## CIDE.B
+
+# Add a closing paranthis.
+s`\(Thirteenth Greatest of Centuries, 1913\.`&)`
+
+
+## CIDE.C
+
+# Remove empty element.
+s`<stype></stype>``
+
+
+## CIDE.D
+
+# Descartes did not live one and a half millennia.
+\`^<hw>Descartes</hw>` {
+    s`(born) 159, (died)`\1 1596, \2`
+}
+
+# Typo; doubled quote.
+\`^<p><syn><b>Syn\.</b> -- To vary\; disagree\; dissent\; dispute\;` {
+    s`(<xex>)aiffer (with</xex>)`\1differ \2`
+    s`(<rdquo/){2}`\1`
+}
+
+# Remove empty element.
+\`^<p><ent>diploid</ent>`, \`<ent>` {
+    s`^(B: Oh, how I wish.*</q>)\s*<rj><qau></qau></rj>(</p>)`\1\2`
+}
+
+# In 0.48, the 'between' was missing; I prefer Debian's way of solving it.
+\`^<p><ent>Doublet</ent>`, \`<ent>` {
+    \`^<p><sn>4\.</sn>` {
+        s`(with a) (color between them)`\1 layer of \2`
+    }
+}
+
+
+## CIDE.E
+
+# Restrict qau element to the author themself.
+s`^(<qau>Mark Feeney)(<br/)`\1</qau>\2`
+\`^Copyright 1999 Globe Newspaper Company\.` {
+    s`</qau>``
+}
+
+
+## CIDE.F
+
+s`measurments`measurements`
+
+# Fix misattribution.
+s`(Dostoevsky's) (War and Peace)`\1 Crime and Punishment\; or Tolstoy's \2`
+
+s`compIy`comply`
+
+
+## CIDE.I
+
+# Remove qau tags.
+\`^<rj><qau>Dr\. Rod Beavon<br/`, +2 {
+    s`^(<rj>)<qau>(Dr\. Rod Beavon<br/)`\1\2</rj>`
+    s`^\((17 Dean's Yard London SW1P 3PB)\;(<br/)`<rj>\1</rj>\2`
+    s`(e-mail: rod\.beavon@westminster\.org\.uk)</qau>(</rj><br/)`<rj>\1\2`
+}
+
+
+## M
+
+# Remove extraneous ', in'.
+s`^(<qau>Andrew Hood), in`\1`
+
+
+## P
+
+# Restrict col element's content.
+\`^<p><cs><col><b>Park of artillery</b></col>` {
+    s`(<col><b>industrial park</b>) `\1</col>`
+    s`</col>(</cs><br/)$`\1`
+}
+
+
+## R
+
+# Typo.
+\`^<hw>Re\*cu"sant</hw>` {
+    s`\bchurc\b`church`
+}
+
+# Remove text centering around a referenced image.
+# Debian removed the preceding paragraph as well. I do not agree.
+\`<a href="\\cide\\more\\lilac-breasted-roller\.jpg">`, \`zambezi\.co\.uk` d
+
+
+## S
+
+# Fix badly formatted closing tag (<i>(.*)</> -> <i>\1</i>).
+s`<([^><]*)>([^<]*)</>`<\1>\2</\1>`
+
+# If one wanted to fix more than necessary (Debian does):
+#\`^<hw>Ses\*quip"li\*cate</hw>` {
+# s`^`<p>`
+# s`<(/?)i>`<\1xex>`g
+#
+# s`(<xex>)(a|b)(</xex>)<prime/`\1\2\\'\''b7\3`g
+# s`<prime/`\\'\''b7`
+#}
+
+s`\<(something)l\>`\1`
+
+s`rappng`rapping`
+
+
+## T
+
+# Restrict qau to the author themself.
+\`^<rj><qau>Andrew Forbes/CPA`, +1 {
+    s`^<rj><qau>Andrew Forbes/CPA`&</qau>`
+    s`^(\(from.*)</qau>(</rj><br/)$`\1\2`
+}
+
+
+## U
+
+# Restrict au element to the author themself.
+\`^<au>Kari Jensen \(University of Wisconsin`, +1 {
+    s`^<au>Kari Jensen`&</au>`
+    s`^(\[available at.*)</au>(<br/)$`\1\2`
+}
+
+
+## V
+
+# Avoid double empty line in dict's output.
+\`^<p><cs><col><b>Principle of virtual velocities</b>` {
+    s`-(- <col><b>Virtual image</b></col>)`\1`
+}
-s`,? <[^>]*></[^>]*> ?``g
-s`</>``g
-s`(<qau>Andrew Hood), in`\1`
-s`\(Thirteenth Greatest of Centuries, 1913.`&)`
-/<a href="\\cide\\more\\lilac-breasted-roller.jpg">/,/zambezi.co.uk/d
+## W
+s`\<(Where\*?)form\>`\1from`
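
The address-plus-block pattern that the new fixes.sed uses for most entries can be tried in isolation. The following is a minimal sketch, assuming a sed that accepts -E (GNU and BSD sed both do). The two input lines and the function name fix_descartes are made up for illustration and are not taken from the gcide sources. It shows an address written as a backslash followed by a backtick-delimited regex, with the substitution inside the braces applied only to the line that matches that address.

```shell
#!/bin/sh
# Hypothetical sample input (not from the gcide sources): two entries
# that both contain the truncated year, only one of which should change.
input='<hw>Descartes</hw> born 159, died 1650
<hw>Other</hw> born 159, died 1650'

# fix_descartes is an illustrative name, not part of the commit.
# \`regex` is an address using a backtick as delimiter; s`...`...` uses
# the same delimiter, so "/" in closing tags needs no escaping.
fix_descartes() {
    sed -E '\`^<hw>Descartes</hw>` {
        s`(born) 159, (died)`\1 1596, \2`
    }'
}

output=$(printf '%s\n' "$input" | fix_descartes)
printf '%s\n' "$output"
```

Guarding each substitution with an address in this way keeps a fix from touching other entries that happen to contain the same text, which is presumably why the commit wraps most corrections in such blocks.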