Yet another regex - how to identify a querystring


I am using urlrewriting.net for URL rewriting and need some help with the regex (which I still don't get...).

I would like to match

  • www.mysite.com/restaurant -> match and return "restaurant"
  • www.mysite.com/restaurant?page=1 -> match and return "restaurant"
  • www.mysite.com/restaurant?[SOME_RANDOM_QUERYSTRING] -> match and return "restaurant"
  • www.mysite.com/seattle/restaurant -> match and return "seattle" and "restaurant"
  • www.mysite.com/seattle/restaurant?page=1 -> match and return "seattle" and "restaurant"
  • www.mysite.com/seattle/restaurant?[SOME_RANDOM_QUERYSTRING] -> match and return "seattle" and "restaurant"
  • www.mysite.com/seattle/restaurant-michelangelo -> don't catch
  • www.mysite.com/seattle/restaurant/sushi -> match and return "seattle" and "restaurant" and "sushi"
  • www.mysite.com/seattle/restaurant/sushi?page=1 -> match and return "seattle" and "restaurant" and "sushi"
  • www.mysite.com/seattle/restaurant/sushi?[SOME_RANDOM_QUERYSTRING] -> match and return "seattle" and "restaurant" and "sushi"
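A minimal sketch of a single regex covering the list above, written with Python's `re` module (the constructs used here, `(?:...)`, `[A-Za-z0-9]+`, `\?`, `^` and `$`, behave the same way in .NET regexes). It assumes the rule sees only the path and querystring (e.g. `/seattle/restaurant?page=1`), not the host, and that city and category segments contain only ASCII letters and digits, so a hyphenated name such as `restaurant-michelangelo` is rejected. Both are assumptions about your site and about what urlrewriting.net passes to the rule; the helper name `directory_parts` is hypothetical.

```python
import re

# Hypothetical pattern: one to three letters-and-digits segments, an optional
# trailing slash, then an optional querystring that is matched but not captured.
SEGMENTS = re.compile(
    r"^/([A-Za-z0-9]+)"       # first segment, e.g. "restaurant" or "seattle"
    r"(?:/([A-Za-z0-9]+))?"   # optional second segment, e.g. "restaurant"
    r"(?:/([A-Za-z0-9]+))?"   # optional third segment, e.g. "sushi"
    r"/?"                     # tolerate a trailing slash
    r"(?:\?.*)?$"             # optional querystring, never captured
)


def directory_parts(path_and_query):
    """Return the captured directory segments, or None if the rule should not fire."""
    m = SEGMENTS.match(path_and_query)
    if m is None:
        return None
    # Groups that did not take part in the match are None; drop them.
    return [g for g in m.groups() if g is not None]


for url in [
    "/restaurant",
    "/restaurant?page=1",
    "/seattle/restaurant",
    "/seattle/restaurant?page=1",
    "/seattle/restaurant-michelangelo",
    "/seattle/restaurant/sushi",
    "/seattle/restaurant/sushi?page=1",
]:
    print(url, "->", directory_parts(url))
```

The querystring is consumed by `(?:\?.*)?` so the rule still fires when one is present, but because that group is non-capturing it never shows up in the returned segments.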

The point is that I need the directory parts of the URL, not the querystring parts. My web analytics tool shows that people search using two words. They search for the city (seattle) plus the category (restaurant), e.g. "seattle restaurant", and also for the city (seattle) plus the name of the restaurant (restaurant-michelangelo), e.g. "seattle restaurant-michelangelo". From a structural point of view this is of course a mess, since it is not a hierarchy. In an ideal world the hierarchy would be city -> category -> restaurant, but I would still like to accommodate this search behavior in my URL structure. At the same time, I also have a page listing all restaurants in the country.

I would like help with creating the regexes, and with the most efficient way to write them, since I guess they could become quite expensive.
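On the cost question: a pattern that is anchored with `^` and `$` and avoids nested quantifiers such as `(a+)+` generally does not suffer from catastrophic backtracking, so a rule of that shape should stay cheap even when it runs on every request. For comparison, here is the same check done without any regex, by cutting off the querystring and splitting the path on `/`. This is a swapped-in technique that only helps if the routing can be done in code rather than in rewrite rules (an assumption about your setup); the function name `directory_parts`, the input being the path plus querystring, and the letters-and-digits rule for segments are all hypothetical.

```python
def directory_parts(path_and_query):
    """Return the 1 to 3 directory segments of a path, ignoring any querystring.

    Hypothetical rule: every segment must be ASCII letters and digits only,
    which rejects hyphenated restaurant names such as "restaurant-michelangelo".
    Returns None when the URL should not be handled.
    """
    path, _, _query = path_and_query.partition("?")  # drop everything after "?"
    segments = [s for s in path.split("/") if s]      # skip empty pieces, e.g. from the leading "/"
    if not 1 <= len(segments) <= 3:
        return None
    if not all(s.isascii() and s.isalnum() for s in segments):
        return None
    return segments


print(directory_parts("/seattle/restaurant/sushi?page=1"))
```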

thanks

Thomas



Answer by James L.

Use this:

/\/[A-Za-z0-9]{1,}(?:\/|$|\?)/

This matches a slash, then one or more alphanumeric characters, followed by a slash, the end of the line, or a question mark.
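To see what this pattern does in practice, here it is run with Python's `re` module against a few of the example paths. The surrounding `/.../` are Perl/JavaScript-style delimiters rather than part of the pattern, so they are dropped; testing against paths without the host is an assumption about what the rewriter matches against.

```python
import re

# The answer's pattern with the /.../ delimiters removed.
ANSWER_PATTERN = re.compile(r"/[A-Za-z0-9]{1,}(?:/|$|\?)")

for path in [
    "/restaurant",
    "/restaurant?page=1",
    "/seattle/restaurant/sushi",
    "/seattle/restaurant-michelangelo",
]:
    m = ANSWER_PATTERN.search(path)  # search() stops at the first match
    print(path, "->", m.group(0) if m else None)
```

Each search returns only the first segment together with the character after it: `(?:...)` groups without capturing but still consumes the slash or question mark. It also matches the start of `/seattle/restaurant-michelangelo`, so on its own it does not rule that URL out.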