I want None values for the fields that "dateutil.parser" fails to parse, instead of default values


I'm trying to use dateutil.parser to parse a date in a string.

When the date is complete there are no problems:

I was born October 11th 2002 -> date(2002, 10, 11)

But when the date is incomplete, dateutil.parser "autofills" the missing fields from today's date, like this:

I need documents from 2003 -> date(2003, 2, 23) (today's month and day)

What I want is something like this

I need documents from 2003 -> date(2003, None, None)

I tried passing "default=date(None, None, None)" to the parser, but datetime fields cannot be None, so that is not possible. I could pass a fixed default date instead, but that date is itself a valid date on one day of the year, so there is no way to tell later whether the parse was incomplete or successful.
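
To make the limitation concrete, here is a minimal sketch using only the standard library. `PartialDate` is a hypothetical container invented for illustration (it is not part of dateutil or datetime); it only shows the kind of result I am after, with None marking a field that was not found in the string.

```python
from datetime import date
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    """Hypothetical container: a date whose fields may be unknown (None)."""
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None

    def is_complete(self) -> bool:
        # Complete only when no field is None.
        return None not in (self.year, self.month, self.day)


# datetime.date rejects None for any field, so it cannot express "unknown".
try:
    date(None, None, None)
    date_accepts_none = True
except TypeError:
    date_accepts_none = False

wanted = PartialDate(year=2003)    # what "documents from 2003" should yield
full = PartialDate(2002, 10, 11)   # a fully specified date
```

With a container like this, `wanted.is_complete()` is false and `full.is_complete()` is true, which is exactly the distinction a default date cannot carry.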

"Why do you need to do this?"

Because I need to turn the incomplete dates into intervals, so:

I need documents from 2003 -> date(2003, 1, 1), date(2003, 12, 31)

Dates that match only the month and year are also turned into intervals, but smaller ones:

I need documents from March 2003 -> date(2003, 3, 1), date(2003, 3, 31)

Right now I cannot do this, because after I call dateutil.parser.parse() I do not know whether it was a complete or a partial match, nor which fields were matched.

Fixed. See my answer below.


There are 2 answers

hedfol On

There is a way to accomplish this by using parse() twice but with different default dates.

The trick is to compare the year, month, and day of the two resulting dates. If they all match, return either of the parsed dates; otherwise, determine and return the interval.

from dateutil.parser import parse
from datetime import datetime
from calendar import monthrange

DEF_DATE = datetime(1, 1, 1)
DEF_DATE_ALT = datetime(2, 2, 2)

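def resolve(date_str):
    # Parse twice with defaults that differ in year, month, and day.
    dx = parse(date_str, default=DEF_DATE, fuzzy=True)
    dy = parse(date_str, default=DEF_DATE_ALT, fuzzy=True)
    # A field that is equal in both results came from the string, not from a default.
    is_y = dx.year == dy.year
    is_m = dx.month == dy.month
    is_d = dx.day == dy.day
    if is_y and is_m and is_d:
        return dx

    # Start of the interval: missing fields fall back to their smallest value.
    start = datetime(dx.year if is_y else 1,
                     dx.month if is_m else 1,
                     dx.day if is_d else 1)

    # End of the interval: missing fields fall back to their largest value.
    end_y = dx.year if is_y else datetime.max.year
    end_m = dx.month if is_m else datetime.max.month
    end = datetime(end_y, end_m,
                   dx.day if is_d else monthrange(end_y, end_m)[1])
    return (start, end)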


print(repr(resolve('I was born October 11th 2002')))
print(repr(resolve('I need documents from 2003')))
print(repr(resolve('I need documents from March 2003')))

Output:

datetime.datetime(2002, 10, 11, 0, 0)
(datetime.datetime(2003, 1, 1, 0, 0), datetime.datetime(2003, 12, 31, 0, 0))
(datetime.datetime(2003, 3, 1, 0, 0), datetime.datetime(2003, 3, 31, 0, 0))

And there is no need to worry that "that date will be valid 1 day of the year", since all significant values of the two default dates differ from each other.

Even if those dates are changed to

DEF_DATE = datetime(2024, 2, 23)
DEF_DATE_ALT = datetime(3, 3, 3)

the output is still correct, because the two defaults still differ in year, month, and day.

StrikerOmega On

Finally fixed it by overriding the "parse()" method and adding code to return early, before the unparsed fields are replaced with default values:

from datetime import datetime

import six
from dateutil import parser


class CustomParser(parser.parser):
    """
    Custom parser to handle incomplete dates
    """

    def parse(self, timestr, default=None, ignoretz=False, tzinfos=None, **kwargs):
        """
        Parse a string in one of the supported formats, using the settings of the parser.

        Args:
            (See parser.parse)

        Returns:
            (datetime) The parsed date or (_result) if the date is incomplete.
        """
        if default is None:
            default = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)

        res, skipped_tokens = self._parse(timestr, **kwargs)

        if res is None:
            raise parser.ParserError("Unknown string format: %s", timestr)

        if len(res) == 0:
            raise parser.ParserError("String does not contain a date: %s", timestr)

        # -- Added code to handle incomplete dates
        if any((res.year is None, res.month is None, res.day is None)):
            return res
        # -- End of added code

        try:
            ret = self._build_naive(res, default)
        except ValueError as e:
            six.raise_from(parser.ParserError(str(e) + ": %s", timestr), e)

        if not ignoretz:
            ret = self._build_tzaware(ret, res, tzinfos)

        if kwargs.get('fuzzy_with_tokens', False):
            return ret, skipped_tokens
        else:
            return ret
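
The early return above hands back an object whose year, month, and day may be None. A minimal sketch of turning such an object into the intervals described in the question could look like the following. `to_interval` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins for the partial result, assuming only that it exposes `year`, `month`, and `day` attributes that are None when not parsed.

```python
from calendar import monthrange
from datetime import date
from types import SimpleNamespace


def to_interval(res):
    """Turn an object with year/month/day attributes (None = not parsed)
    into an inclusive (start, end) pair of dates.

    Assumption: the year was parsed; only month and/or day may be missing.
    """
    if res.year is None:
        raise ValueError("cannot build an interval without a year")
    if res.month is None:
        # Only the year is known (any day value is ignored): cover the whole year.
        return date(res.year, 1, 1), date(res.year, 12, 31)
    if res.day is None:
        # Year and month are known: cover the whole month.
        last_day = monthrange(res.year, res.month)[1]
        return date(res.year, res.month, 1), date(res.year, res.month, last_day)
    # Fully specified: a one-day interval.
    d = date(res.year, res.month, res.day)
    return d, d


# Stand-ins for partial results (hypothetical test values).
only_year = SimpleNamespace(year=2003, month=None, day=None)
year_month = SimpleNamespace(year=2003, month=3, day=None)
```

Here `to_interval(only_year)` covers 2003-01-01 through 2003-12-31 and `to_interval(year_month)` covers 2003-03-01 through 2003-03-31. Note that the overridden method relies on private dateutil methods (`_parse`, `_build_naive`, `_build_tzaware`), so it may need adjusting if those change between versions.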