Document Date Descriptor (DDD)
There is a need for a concise (but also precise) way of expressing approximate dates and date ranges, especially one suitable for working with medieval documents. My main application area is entering place-name spellings in a database. I am not aware of any standard defined by historians, except the s.xii2 notation, and this is not expressive enough for the demands of place-name research. No information is lost with these DDD encodings, and the outputs will be consistently formatted (and sorted by date, if so desired) whatever the editor wrote. A crucial requirement is that the notation can be parsed by software.
Quick start: some real-world examples
Here the left-hand column contains real examples taken from editions of medieval documents, and the right-hand column contains the same information in DDD notation. Many more examples are at the bottom of this webpage.
early thirteenth century to circa 1240 | e13C-c1240 |
n.d. [?1st ½ 13c.] | nd?1h13C |
possibly 1240 to circa 1255 | p1240-c1255 |
mid 13th century | m13C |
second decade of the 15th century | 1410s |
late 13th century to about 1320 | l13C-?1320 |
first quarter of the 16th century | 1q16C |
1 Henry IV | 1H4 |
The basic elements and their meaning
- ?, p, c, nd: perhaps, probably, circa, no date. No precise meaning is given to the first two, but “probably” is intended to denote greater certainty than “perhaps”.
- <, >: before, after
- e, m, l: early, middle, late (applies to decades and centuries)
- em, ml: early to middle, middle to late
- s following digits: decade
- C following digits: century
- h, t, q: half, third, quarter (applies to centuries; must be preceded by a digit)
- -: to
- 1 Ed 2: typical regnal year specification
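How these elements combine can be illustrated for the single-century case. The following is only a minimal sketch, not the code in date_descriptor_11.py: the function name expand_century and the lookup tables are hypothetical, and the wording is copied from the verbose column of the test cases further down this page.

```python
import re

# Wording taken from the verbose column of the test cases on this page;
# all names below are hypothetical and are not the author's implementation.
EML = {"e": "early", "m": "middle of the", "l": "late",
       "em": "early to middle", "ml": "middle to late"}
NTH = {"1": "first", "2": "second", "3": "third", "4": "fourth"}
FRACTION = {"h": "half", "t": "third", "q": "quarter"}
DOUBT = {"?": "perhaps", "p": "probably"}

# Optional doubt marker, optional circa, optional prefix, then digits + C.
CENTURY = re.compile(
    r"^(\?|p)?(c\.?)?([12]h|[123]t|[1234]q|em|ml|[eml])?(\d\d?)c$",
    re.IGNORECASE)


def ordinal(n):
    """Return e.g. '12th' or '3rd' for a positive integer."""
    if 10 <= n % 100 <= 20:
        return "%dth" % n
    return "%d%s" % (n, {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th"))


def expand_century(text):
    """Expand a single-century descriptor to words, or return None."""
    m = CENTURY.match(text)
    if m is None:
        return None
    doubt, circa, prefix, digits = m.groups()
    words = []
    if doubt:
        words.append(DOUBT[doubt.lower()])
    if circa:
        words.append("circa")
    if prefix:
        prefix = prefix.lower()
        if prefix in EML:
            words.append(EML[prefix])
        else:  # a digit followed by h, t or q
            words.append("%s %s of the" % (NTH[prefix[0]], FRACTION[prefix[1]]))
    words.append(ordinal(int(digits)) + " century")
    return " ".join(words)


print(expand_century("pl12C"))   # probably late 12th century
print(expand_century("2q15C"))   # second quarter of the 15th century
```

The sketch ignores years, decades, ranges, notes and the before/after markers, all of which the full grammar covers.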
General design criteria for the notation
- Quick to type, with no redundant keystrokes required.
- No loss of information, and no extra assumptions imposed.
- Unambiguous, but still with some looseness allowed in the inputs (so one does not have to remember too many rules; see 1W1 examples below).
- Compact, to save space in e.g. long lists of field-names.
- Human-readable (or at least human-understandable with minimal need to refer to reference tables).
- Extractable from free-format text by simple pattern-matching rules.
- Suitable simultaneously for e.g. making quick notes in the record office, for permanent records in databases, and for final use in printed and web publications.
- Machine-readable, in a strict sense, so that invalid specifications will be detected and rejected, and also so that code can be written which “understands” the date specifications and can process them meaningfully.
- Able to express all date information as commonly used in editions of documents (thus not requiring any re-interpretation of what an editor has already decided).
- Automatically expandable to a more readable form (see verbose output below).
- Regnal years handled automatically.
- Covers exact dates as well as approximate ones, and ranges with possibly both ends uncertain.
- Automatically sortable into date order.
- Uncertain dates are sorted by the latest likely date, so that nothing is listed too early.
- Case-insensitive (except for monarchal names in regnal years), though lower case is preferred for the prefixes (e for ‘early’, m for ‘mid’) and upper case for C for ‘century’; the normalized output follows this convention.
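The rule of sorting by the latest likely date can be sketched for parts of centuries. This is an illustration only: the offsets in PART_END are inferred from the sort column of the test cases on this page (for example m12C sorts as 1166 and em13C as 1275), the entries for 2t and 3t are filled in by analogy, and the names are hypothetical rather than taken from date_descriptor_11.py.

```python
import re

# Years into the century at which each part is taken to end.
# Inferred from the test cases on this page; "2t" and "3t" are by analogy.
PART_END = {
    "": 100, "e": 33, "m": 66, "l": 100, "em": 75, "ml": 100,
    "1h": 50, "2h": 100,
    "1t": 33, "2t": 66, "3t": 100,
    "1q": 25, "2q": 50, "3q": 75, "4q": 100,
}

CENTURY_PART = re.compile(
    r"^([12]h|[123]t|[1234]q|em|ml|[eml])?(\d\d?)c$", re.IGNORECASE)


def century_sort_value(text):
    """Return the latest likely year, or -1 if the input is not recognised."""
    m = CENTURY_PART.match(text)
    if m is None:
        return -1
    prefix = (m.group(1) or "").lower()
    century = int(m.group(2))
    # The 13th century starts at 1200, so the base is (13 - 1) * 100.
    return (century - 1) * 100 + PART_END[prefix]


dates = ["l12C", "e13C", "m12C", "1h8C"]
print(sorted(dates, key=century_sort_value))
# ['1h8C', 'm12C', 'l12C', 'e13C']
```

Because each descriptor sorts at the end of the period it names, a document dated only to the ‘late 12th century’ is listed after one dated to the middle of that century and never earlier than it could have been written.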
Typeset example
The screenshot shows the system in action specifying some Dodnash charters for Cattawade in Suffolk, the output of the python code below having been fed through the LaTeX typesetting system.
Software
The following two files, date_descriptor_11.py and regnal_year_03.py, are a complete implementation in Python, a free language available for all operating systems; the code works in the current Python 3.5 versions, and also in the legacy Python 2.7 series. The code is offered without any guarantees. I would appreciate bug reports or other feedback. An extension to regnal years allowing the syntax t.Ed3 for “tempore” will be added soon. It is identical to the standard regnal year syntax, but has t or t. in place of a numeral. The verbose output will expand to the limits of the reign in question, and the sort value will be at the end of the reign.
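The regnal-year and “tempore” behaviour described here can be sketched as follows. This is not the content of regnal_year_03.py: the function regnal_sketch, the three-entry reign table and the wording of the “tempore” output are all assumptions made for illustration. The arithmetic works on whole years only, whereas real regnal years run from the anniversary of the accession, so a reign beginning late in a calendar year would need the exact date.

```python
import re

# Hypothetical, deliberately tiny table: (name, ordinal) -> (first, last year).
REIGNS = {
    ("Henry", 3): (1216, 1272),
    ("Edward", 3): (1327, 1377),
    ("Henry", 4): (1399, 1413),
}
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6, "VII": 7, "VIII": 8}
NUMERAL = dict((value, key) for key, value in ROMAN.items())

# A regnal year number (or t / t. for tempore), a name, and an ordinal.
REGNAL = re.compile(r"^(\d+|t\.?)\s*([A-Za-z]+?)\s*(\d+|[IVX]+)$")


def regnal_sketch(text):
    """Return (sort_year, verbose) or None if the input is not recognised."""
    m = REGNAL.match(text.strip())
    if m is None:
        return None
    count, abbrev, ordinal = m.groups()
    number = int(ordinal) if ordinal.isdigit() else ROMAN.get(ordinal)
    for (name, n), (first, last) in REIGNS.items():
        # Any leading part of the name counts as an abbreviation (assumption).
        if n == number and name.lower().startswith(abbrev.lower()):
            break
    else:
        return None
    label = "%s %s" % (name, NUMERAL[number])
    if count.startswith("t"):
        # Tempore: expand to the limits of the reign, sort at its end.
        return last, "t. %s (%d-%d)" % (label, first, last)
    start = first + int(count) - 1
    if start > last:  # rough whole-year check only
        return None
    end = str(start + 1)
    tail = end[-2:] if start % 10 == 9 else end[-1]
    return start + 1, "%s %s (%d/%s)" % (count, label, start, tail)


print(regnal_sketch("3H4"))    # (1402, '3 Henry IV (1401/2)')
print(regnal_sketch("tHen3"))  # (1272, 't. Henry III (1216-1272)')
```

The sort values agree with the test cases for 3H4, 26Ed3 and tHen3 given later on this page; the reign limits shown in the tempore output follow the description above and are not taken from the program's actual output.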
The DDD grammar: informal description
An instance of DDD (Document Date Descriptor) is parsed by first checking whether it is a regnal year by a collection of special rules. If it is not, it is parsed by the following grammar. A year value for sorting is computed (a value of -1 indicates a syntax error in the input), and both normalized and verbose outputs are also created. The syntax is best understood by looking at the examples below. Note that the traditional a. for ‘ante’ is completely avoided (partly because of the danger of confusion with ‘after’) in favour of <, and the symmetrical > is provided for ‘after’. The regular expression syntax is standard (see e.g. here); for example \d means a single digit, ? means ‘optional’, [] means a single one of the enclosed characters, and $ means the end of the expression. The current version does not check for nonsensical constructions like 1234-1150, but checks could easily be added.
nodate ='(nd)|(n\.\s?d\.)|(no\s?date)'
prenote ='(nodate|\[(.*?)\])?' # arbitrary text in [...]
postnote ='(\[(.*?)\])?' # arbitrary text in [...]
circa ='((c\.?)|(circa))' # "c" or "c." or "circa" for 'circa'
uncertain ='\?|p' # perhaps, probably
ba ='([<>])' # before or after
half ='[12]h'
third ='[123]t'
quarter ='[1234]q'
eml ='(em)|(ml)|[eml]' # e=early, em=early to middle, l=late etc.
prefix ='third|half|quarter|eml'
century ='(\d\d?)[Cc]' # e.g. 12C
decade ='(\d{2,3}0)s' # e.g. 1260s
year ='(\d{1,4})'
simplerange='((1\d\d\d[-]\d)|(1\d\d\d[-]\d\d)$)' # e.g. "1243-7"
oldstyle ='((\d{3}[012345678]/\d)|(\d{3}9/\d{2}))' # e.g. "1355/6"
first ='{uncertain}?{ba}?{circa}?{prefix}?({simplerange}|{oldstyle}|{decade}|{year}|{century})'
second ='{uncertain}?{ba}?{circa}?{prefix}?({oldstyle}|{decade}|{year}|{century})'
dd_grammar =prenote+first+'(([x-]|[-]{2}|–)'+second+')?'+postnote+'$' # the full grammar!
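Individual fragments of the grammar can be tried in isolation, which is a convenient way to see what \d, ?, [] and {} do. The three patterns below are copied from the listing above; the helper function whole is hypothetical and exists only for this illustration.

```python
import re

# Fragments copied from the informal grammar.
century = r'(\d\d?)[Cc]'                                 # e.g. 12C
decade = r'(\d{2,3}0)s'                                  # e.g. 1260s
oldstyle = r'((\d{3}[012345678]/\d)|(\d{3}9/\d{2}))'     # e.g. 1355/6


def whole(pattern, text):
    """Return True if the pattern matches the entire text (hypothetical helper)."""
    return re.match('(?:' + pattern + ')$', text) is not None


print(whole(century, '12C'))       # True
print(whole(decade, '1265s'))      # False: a decade must end in 0
print(whole(oldstyle, '1109/10'))  # True: a year ending in 9 takes two digits
print(whole(oldstyle, '1109/1'))   # False
```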
The full DDD grammar as a Python regular expression
(?P<prenote>((nd)|(n\.\s?d\.)|(no\s?date))|(\[(.*?)\]))?(?P<uncertain0>\?|p)?(?P<ba0>[<>])?(?P<circa0>(c\.?)|(circa))?(?P<prefix0>([12]h)|([123]t)|([1234]q)|((em)|(ml)|[eml]))?((?P<simplerange>(1\d\d\d[-]\d)|(1\d\d\d[-]\d\d)$)|(?P<oldstyle0>(\d{3}[012345678]/\d)|(\d{3}9/\d{2}))|(?P<decade0>\d{2,3}0)s|(?P<year0>\d{1,4})|(?P<century0>\d\d?)[Cc])((?P<rangesep>[x-]|[-]{2}|–)(?P<uncertain1>\?|p)?(?P<ba1>[<>])?(?P<circa1>(c\.?)|(circa))?(?P<prefix1>([12]h)|([123]t)|([1234]q)|((em)|(ml)|[eml]))?((?P<oldstyle1>(\d{3}[012345678]/\d)|(\d{3}9/\d{2}))|(?P<decade1>\d{2,3}0)s|(?P<year1>\d{1,4})|(?P<century1>\d\d?)[Cc]))?(\[(?P<postnote>.*?)\])?$
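As an illustration of how the named groups might be used, the sketch below compiles the expression above (split across several string literals for readability, but otherwise unchanged) and returns the groups that matched. The function parse and the use of re.IGNORECASE are assumptions made here; the real program may handle case differently, and it checks for regnal years before applying this grammar. The sketch targets Python 3.

```python
import re

# The full expression from this page, split into adjacent raw strings.
DDD = re.compile(
    r"(?P<prenote>((nd)|(n\.\s?d\.)|(no\s?date))|(\[(.*?)\]))?"
    r"(?P<uncertain0>\?|p)?(?P<ba0>[<>])?(?P<circa0>(c\.?)|(circa))?"
    r"(?P<prefix0>([12]h)|([123]t)|([1234]q)|((em)|(ml)|[eml]))?"
    r"((?P<simplerange>(1\d\d\d[-]\d)|(1\d\d\d[-]\d\d)$)"
    r"|(?P<oldstyle0>(\d{3}[012345678]/\d)|(\d{3}9/\d{2}))"
    r"|(?P<decade0>\d{2,3}0)s|(?P<year0>\d{1,4})|(?P<century0>\d\d?)[Cc])"
    r"((?P<rangesep>[x-]|[-]{2}|–)"
    r"(?P<uncertain1>\?|p)?(?P<ba1>[<>])?(?P<circa1>(c\.?)|(circa))?"
    r"(?P<prefix1>([12]h)|([123]t)|([1234]q)|((em)|(ml)|[eml]))?"
    r"((?P<oldstyle1>(\d{3}[012345678]/\d)|(\d{3}9/\d{2}))"
    r"|(?P<decade1>\d{2,3}0)s|(?P<year1>\d{1,4})|(?P<century1>\d\d?)[Cc]))?"
    r"(\[(?P<postnote>.*?)\])?$",
    re.IGNORECASE)  # assumption: case-insensitivity via a flag


def parse(text):
    """Return the named groups that matched, or None on a syntax error."""
    m = DDD.match(text)
    if m is None:
        return None
    return dict((k, v) for k, v in m.groupdict().items() if v is not None)


print(parse("e13C-c1240"))
# prefix0='e', century0='13', rangesep='-', circa1='c', year1='1240'
print(parse("hello"))      # None
print(parse("1234-1150"))  # accepted: nonsensical ranges are not rejected
```

Groups ending in 0 describe the first date and groups ending in 1 the second date of a range, so code that computes a sort value or a verbose form only has to inspect this dictionary.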
The test cases
Below is the output from running the Python code on my standard set of test cases DDD_test_cases.txt. Here the first column is input, and the next three columns are output from the Python code. The normalized output is intended to appear in publications derived from the input; this then ensures consistency of layout and formatting. Note that an ordinary hyphen (-) for a range is converted to an en-dash (–) in the normalized output.
input sort normalized output verbose output
# simple cases
1234 1234 1234 1234
1234-5 1235 1234–5 1234-5
1101/2 1102 1101/2 1101/2
1109/10 1110 1109/10 1109/10
# uncertain
c1230 1230 c.1230 circa 1230
c.1230 1230 c.1230 circa 1230
c950 950 c.950 circa 950
?950 950 ?950 perhaps 950
?1289 1290 ?1289 perhaps 1289
p1230s 1240 p1230s probably 1230s
ndcm13C 1266 n.d.c.m13C no date, circa middle of the 13th century
pl12C 1200 pl12C probably late 12th century
# before or after
<1255 1255 <1255 before 1255
>1255 1255 >1255 after 1255
<c1255 1255 <c.1255 before circa 1255
p>1300 1300 p>1300 probably after 1300
l12C 1200 l12C late 12th century
# decades
1260s 1270 1260s 1260s
e1220s 1213 e1220s early 1220s
m1220s 1217 m1220s middle of the 1220s
l1220s 1220 l1220s late 1220s
# centuries and parts of centuries
12C 1200 12C 12th century
13C 1300 13C 13th century
m12C 1166 m12C middle of the 12th century
l12C 1200 l12C late 12th century
e13C 1233 e13C early 13th century
em13C 1275 em13C early to middle 13th century
?em13C 1276 ?em13C perhaps early to middle 13th century
em13C 1275 em13C early to middle 13th century
ml15C 1500 ml15C middle to late 15th century
cM13C 1266 c.m13C circa middle of the 13th century
cm13C 1266 c.m13C circa middle of the 13th century
?M12C 1166 ?m12C perhaps middle of the 12th century
2h14C 1400 2h14C second half of the 14th century
1h8C 750 1h8C first half of the 8th century
1t15C 1433 1t15C first third of the 15th century
2q15C 1450 2q15C second quarter of the 15th century
3q15C 1475 3q15C third quarter of the 15th century
4q15C 1500 4q15C fourth quarter of the 15th century
cm9C 866 c.m9C circa middle of the 9th century
ce16C 1533 c.e16C circa early 16th century
# ranges
975-1016 1016 975–1016 975 to 1016
975x1016 1016 975x1016 975 to 1016
1297-1298 1298 1297–1298 1297 to 1298
1297-98 1298 1297–98 1297-98
1297-8 1298 1297–8 1297-8
1234-5 1235 1234–5 1234-5
1234-50 1250 1234–50 1234-50
1250-c1255 1255 1250–c.1255 1250 to circa 1255
1250-<1255 1255 1250–<1255 1250 to before 1255
1250-<c.1255 1255 1250–<c.1255 1250 to before circa 1255
c1200-c1300 1300 c.1200–c.1300 circa 1200 to circa 1300
c1200xc1300 1300 c.1200xc.1300 circa 1200 to circa 1300
c1200-<c1300 1300 c.1200–<c.1300 circa 1200 to before circa 1300
l12c-e13c 1233 l12C–e13C late 12th century to the early 13th century
1q15C-l16C 1525 1q15C–l16C first quarter of the 15th century to the late 16th century
l15C-1q16C 1525 l15C–1q16C late 15th century to the first quarter of the 16th century
9C-e10C 933 9C–e10C 9th century to the early 10th century
1234-1267 1267 1234–1267 1234 to 1267
1234x1267 1267 1234x1267 1234 to 1267
>1243-<1255 1255 >1243–<1255 after 1243 to before 1255
<1255-c1260 1260 <1255–c.1260 before 1255 to circa 1260
p>1255-c1260 1260 p>1255–c.1260 probably after 1255 to circa 1260
# prenotes and postnotes
[perhaps]1205-l13C 1300 [perhaps]1205–l13C perhaps 1205 to the late 13th century
[n.d., perhaps]c1345 1345 [n.d., perhaps]c.1345 n.d., perhaps circa 1345
[no date, possibly]1345-1355 1355 [no date, possibly]1345–1355 no date, possibly 1345 to 1355
1205[, or later] 1205 1205[, or later] 1205, or later
# insensitivity to case
cl12C 1200 c.l12C circa late 12th century
c.l12C 1200 c.l12C circa late 12th century
CL12C 1200 c.l12C circa late 12th century
E12c 1133 e12C early 12th century
E1260S 1253 e1260s early 1260s
# no date
nd?1250 1250 n.d.?1250 no date, perhaps 1250
n.d.?1250 1250 n.d.?1250 no date, perhaps 1250
no date?1250 1250 n.d.?1250 no date, perhaps 1250
# regnal years
3E6 1549 3 Edward VI 3 Edward VI (1548/9)
3Edw6 1549 3 Edward VI 3 Edward VI (1548/9)
1 Will 1 1067 1 William I 1 William I (1067)
1 Will I 1067 1 William I 1 William I (1067)
1 William I 1067 1 William I 1 William I (1067)
1W I 1067 1 William I 1 William I (1067)
1 W1 1067 1 William I 1 William I (1067)
1W 1 1067 1 William I 1 William I (1067)
1W1 1067 1 William I 1 William I (1067)
1W2 1088 1 William II 1 William II (1087/8)
1 Henry 2 1155 1 Henry II 1 Henry II (1154/5)
3 Henry IV 1402 3 Henry IV 3 Henry IV (1401/2)
3H4 1402 3 Henry IV 3 Henry IV (1401/2)
3Hen4 1402 3 Henry IV 3 Henry IV (1401/2)
3 William II 1090 3 William II 3 William II (1089/90)
3W2 1090 3 William II 3 William II (1089/90)
26E3 1353 26 Edward III 26 Edward III (1352/3)
26Ed3 1353 26 Edward III 26 Edward III (1352/3)
26Edw3 1353 26 Edward III 26 Edward III (1352/3)
26 Edward 3 1353 26 Edward III 26 Edward III (1352/3)
26 Edward III 1353 26 Edward III 26 Edward III (1352/3)
1R1 1190 1 Richard I 1 Richard I (1189/90)
3R2 1380 3 Richard II 3 Richard II (1379/80)
2R3 1485 2 Richard III 2 Richard III (1484/5)
tHen3 1272 t. Henry III t. Henry III
1Ma 1554 1 Mary I 1 Mary I (1553/4)
2My 1555 2 Mary I 2 Mary I (1554/5)
3Mary 1556 3 Mary I 3 Mary I (1555/6)
4Mary 1557 4 Mary I 4 Mary I (1556/7)
1234-50 1250 1234–50 1234-50
1234x50 1250 1234x50 1234 to 50
This page was last modified 2024-01-21 10:57.