Lookbehind in date regex (python)

python regex date

54 views

2 replies

319 author reputation

I am having some trouble figuring out look-behind in Python. More specifically, I have a piece of text that contains dates in (mm/dd/yyyy) and (mm-dd-yyyy) formats, as well as bare years in (yyyy) format:

Jan-01-2001
Jan 01 2001
2003 2007
The year was 2009 when x decided to work for Google

What is the best way to match and extract only the bare yyyy values? I should be able to extract 2003, 2007 and 2009, but not any part of the other dates such as Jan-01-2001 and Jan 01 2001. I tried the lookbehind operator, and the best I could come up with was ((?<!(-| ))\d{4}). But this selects only 2003, not 2007 or 2009. I also tried using groups to define a date pattern and using them in conjunction with lookbehind, but that did not work. What would be the right and efficient way of doing this with regular expressions in Python?

Asked by N00bsie on December 27, 2017

2 Answers


1

14596 author reputation

Accepted

Brief

This works for the sample strings you've presented and, more generally, wherever a standalone year is not preceded by exactly 2 digits followed by a space or hyphen. It assumes that all dates use a 2-digit number for the day of the month. That assumption is needed because lookbehinds in Python (and in the majority of regex engines) cannot be quantified: the lookbehind pattern must have a fixed width.
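
As a rough illustration of the fixed-width restriction mentioned above, the sketch below checks whether Python's standard `re` module accepts two lookbehind patterns. The helper name `compiles` and the variable-width pattern are made up for this example and are not part of the answer's solution.

```python
import re

def compiles(pattern):
    """Return True if the standard re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised, among other cases, when a lookbehind is not fixed-width.
        return False

# Exactly 2 digits plus one separator: always 3 characters wide.
fixed_width = r"(?<!\d{2}[ -])\d{4}"

# 1 or 2 digits plus one separator: 2 or 3 characters wide (hypothetical variant).
variable_width = r"(?<!\d{1,2}[ -])\d{4}"

print(compiles(fixed_width))
print(compiles(variable_width))
```

This is why the day of the month has to be assumed to be exactly 2 digits: a lookbehind that allowed 1 or 2 digits would be rejected when the pattern is compiled.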


Code

See regex in use here

\b(?<!\b\d{2}[ -])\d{4}\b

Results

Input

Jan-01-2001
Jan 01 2001
2003 2007
The year was 2009 when x decided to work for Google

Output

2003
2007
2009

Explanation

  • \b Assert position as a word boundary
  • (?<!\b\d{2}[ -]) Negative lookbehind ensuring what precedes doesn't match the following
    • \b Assert position as a word boundary
    • \d{2} Match exactly 2 digits
    • [ -] Match either a space or a hyphen (-)
  • \d{4} Match exactly 4 digits
  • \b Assert position as a word boundary
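
For completeness, here is a minimal sketch of applying the pattern above from Python with `re.findall`, using the sample text from the question. The names `YEAR_ONLY`, `sample_text` and `years` are only illustrative.

```python
import re

# The pattern from the answer: a 4-digit word that is not preceded by
# "two digits + space/hyphen" (i.e. not the year part of a full date).
YEAR_ONLY = re.compile(r"\b(?<!\b\d{2}[ -])\d{4}\b")

sample_text = """Jan-01-2001
Jan 01 2001
2003 2007
The year was 2009 when x decided to work for Google"""

# The pattern has no capture groups, so findall returns the matched strings.
years = YEAR_ONLY.findall(sample_text)
print(years)  # ['2003', '2007', '2009']
```

Note that 2007 is still matched even though it follows "03 ": the inner \b in the lookbehind requires the two digits to start a word, and "03" here is the tail of "2003".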
Answered by ctwheels on December 27, 2017

0

624 author reputation

I hope this may help you:

import re

text = """Jan-01-2001
Jan 01 2001
2003 2007
The year was 2009 when x decided to work for Google"""

# Skip any line that starts with a "Mon-dd-yyyy" or "Mon dd yyyy" date,
# then pull every 4-digit run out of the lines that remain.
for line in text.splitlines():
    non_date_line = re.search(r'^(?!\w{3}(?:\s+|-)\d{2}(?:\s+|-)\d{4}).+', line)
    if non_date_line:
        print(re.findall(r'\d{4}', non_date_line.group()))
Answered by Pradam on December 27, 2017