SUTime with regular date reading

I'm using SUTime (Stanford NLP), and it's doing a great job, but I can't figure out how to get it to read regular numeric date formats.

for instance:

'we went at 27/10/1988 to the event'

it returns null.

For an expression like 'we went at october 27th 1988 to the event', it works just fine.

any ideas?

cheers

There are 3 answers

pelumi

I'll put this here in case someone finds it useful.

The problem is that some date formats are not supported by the default rules.

If you look at the sutime/english.sutime.txt file, you'll see lines like the ones below. The TODO there suggests that other formats can still be added. I added two more to mine (the last two entries):

  # TODO: Support other timezone formats
  { ruleType: "time", pattern: /yyyy-?MM-?dd-?'T'HH(:?mm(:?ss([.,]S{1,3})?)?)?(Z)?/ }
  { ruleType: "time", pattern: /yyyy-MM-dd/ }
  { ruleType: "time", pattern: /'T'HH(:?mm(:?ss([.,](S{1,3}))?)?)?(Z)?/ }
  #The entries below are newly added to support other time formats.
  { ruleType: "time", pattern: /dd\/MM\/yyyy/ }
  { ruleType: "time", pattern: /dd-MM-yyyy/ }

The newly added entries enable SUTime to correctly identify dates of the form:

20-12-2014 or 28/12/2014

which is the form the OP needs.
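To make it concrete what the two added patterns are meant to pick up, here is a minimal sketch in plain Java. It does not call SUTime at all: it is a stand-in that assumes dd, MM and yyyy mean two-digit day, two-digit month and four-digit year, and the class and method names (DayFirstDateFinder, find) are made up for illustration.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SUTime: finds day-first numeric dates in text.
final class DayFirstDateFinder {
    // dd/MM/yyyy or dd-MM-yyyy; the back-reference \2 requires the same separator twice.
    private static final Pattern DAY_FIRST =
            Pattern.compile("\\b(\\d{2})([/-])(\\d{2})\\2(\\d{4})\\b");

    static List<LocalDate> find(String text) {
        List<LocalDate> dates = new ArrayList<>();
        Matcher m = DAY_FIRST.matcher(text);
        while (m.find()) {
            int day = Integer.parseInt(m.group(1));
            int month = Integer.parseInt(m.group(3));
            int year = Integer.parseInt(m.group(4));
            try {
                dates.add(LocalDate.of(year, month, day));
            } catch (DateTimeException e) {
                // Not a real calendar date (for example 31/02/2014); skip it.
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        // prints [1988-10-27]
        System.out.println(find("we went at 27/10/1988 to the event"));
    }
}
```

Both of the example strings above (20-12-2014 and 28/12/2014) are found by this sketch, while a mixed separator such as 27/10-1988 is not.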

Daniel

I am not experienced with the Stanford temporal package, but it is probably not tuned for that date format.

I suggest you take a look at the Illinois Temporal Extractor: http://cogcomp.cs.illinois.edu/page/software_view/IllinoisTemporalExtractor

It is essentially based on HeidelTime: https://code.google.com/p/heideltime/

gCoh

OK everyone, I think I got it.

In sutime/english.sutime.txt, around line 319, there are a few patterns for US-style (month-first) dates:

{ ruleType: "time", pattern: /yyyy-?MM-?dd-?'T'HH(:?mm(:?ss([.,]S{1,3})?)?)?(Z)?/ }
{ ruleType: "time", pattern: /yyyy-MM-dd/ }
{ ruleType: "time", pattern: /'T'HH(:?mm(:?ss([.,](S{1,3}))?)?)?(Z)?/ }
# Tokenizer "sometimes adds extra slash
{ ruleType: "time", pattern: /yyyy\\?\/MM\\?\/dd/ }
{ ruleType: "time", pattern: /MM?\\?\/dd?\\?\/(yyyy|yy)/ }
{ ruleType: "time", pattern: /MM?-dd?-(yyyy|yy)/ }
{ ruleType: "time", pattern: /HH?:mm(:ss)?/ }
{ ruleType: "time", pattern: /yyyy-MM/ }

You just need to add a few rules of your own so that dates in the order you need (day first) are matched.
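A quick way to see why the order matters is to try both orders on the string from the question. This is only a sketch with java.time standing in for SUTime's own pattern handling (it assumes the pattern letters mean the same thing in both), and the names DateOrderDemo and tryParse are made up for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical demo, not SUTime code: compares month-first and day-first patterns.
final class DateOrderDemo {
    // Returns the parsed date, or empty if the text does not fit the pattern.
    static Optional<LocalDate> tryParse(String text, String pattern) {
        try {
            return Optional.of(LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern)));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String date = "27/10/1988";
        // Month-first: 27 is not a valid month, so nothing is returned.
        System.out.println(tryParse(date, "MM/dd/yyyy")); // prints Optional.empty
        // Day-first: parses as 27 October 1988.
        System.out.println(tryParse(date, "dd/MM/yyyy")); // prints Optional[1988-10-27]
    }
}
```

Note that a date like 05/10/1988 fits both orders, so with a day-first rule added alongside the US ones, which rule wins for such dates is something to check against your own data.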