I was just playing around over the weekend with a Content Source and Search Scope to index a few of my favourite SharePoint blogs and sites.
I set the start URLs to a few blog sites, but then when I added the following start URL:
I got a bit of a shock. I had about 23,000 items indexed from that little beauty. I realised very quickly that the Content Source was set to “Only crawl within the server of each start address”, and that server is http://technet.microsoft.com/.
So, to try and limit the result set to just the SharePoint section, I created two Crawl Rules:
When I run a test of the crawl rules against http://technet.microsoft.com/default.aspx, I get told that the URL will not be included in the index because it matches the exclusion rule. A test on http://technet.microsoft.com/en-us/sharepoint/default.aspx tells me that the URL will be included because it matches the inclusion rule.
After performing a Full Index I found out that, as hoped, I just got pages below the http://technet.microsoft.com/en-us/sharepoint URL path.
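For anyone who wants to reason about how an include rule and an exclude rule interact, here is a rough Python sketch of the matching logic as I understand it. It is not SharePoint code. The two patterns are assumptions chosen to fit the test results described above, the name `crawl_decision` is made up for illustration, and it assumes rules are checked top to bottom with the first match winning.

```python
from fnmatch import fnmatchcase

# Hypothetical rule set (assumed patterns, not copied from the Crawl Rules page).
# Order matters: the narrower include rule sits above the broad exclude rule.
RULES = [
    ("include", "http://technet.microsoft.com/en-us/sharepoint/*"),
    ("exclude", "http://technet.microsoft.com/*"),
]


def crawl_decision(url, rules=RULES):
    """Return 'include' or 'exclude' for a URL, using first-match-wins ordering."""
    lowered = url.lower()
    for action, pattern in rules:
        # fnmatchcase treats '*' as "any run of characters", including '/'.
        # It also gives '?' and '[...]' special meaning, which is a
        # simplification compared with real crawl rule paths.
        if fnmatchcase(lowered, pattern.lower()):
            return action
    # No rule matched: assume the Content Source settings decide, so include.
    return "include"


if __name__ == "__main__":
    for url in (
        "http://technet.microsoft.com/default.aspx",
        "http://technet.microsoft.com/en-us/sharepoint/default.aspx",
    ):
        print(url, "->", crawl_decision(url))
```

With these assumed patterns, a page outside the SharePoint path, such as a made-up http://technet.microsoft.com/en-us/library/example.aspx, falls through to the exclude rule. Swapping the order of the two rules would exclude everything, because the broad rule would match first.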
Unfortunately, the pages I actually wanted were in the TechNet Library, which doesn’t sit under this URL path. That doesn’t change the fact that the above Crawl Rules work if you just want to index under a specific URL segment.