Allowed to Crawl Sites is a list of urls and domains, one per line, that the crawler is allowed to crawl. Only pages that fall under one of the urls or domains listed here will be crawled.

This textarea is only used in determining what can be crawled if Restrict Sites By Url is checked.

A line like:
  http://www.somewhere.com/foo/
would allow the url
  http://www.somewhere.com/foo/goo.jpg
to be crawled.

A line like:
  domain:foo.com
would allow the url
  http://a.b.c.foo.com/blah/
to be crawled.

It is also possible to allow a site using a regular expression. A line like:
  regex:/foo\d+/
would allow any url containing the string "foo" followed by one or more digits.
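
The three kinds of lines described above can be sketched in code. This is only an illustrative approximation, not the crawler's actual implementation: the function name is_allowed is hypothetical, and the handling of the enclosing slashes in a regex: line and the exact-host match for a domain: line are assumptions.

```python
import re
from urllib.parse import urlparse


def is_allowed(url, allowed_lines):
    """Return True if url matches any line of an Allowed to Crawl Sites list.

    Hypothetical helper for illustration; the real crawler may differ.
    """
    host = (urlparse(url).hostname or "").lower()
    for line in allowed_lines:
        rule = line.strip()
        if not rule:
            continue
        if rule.startswith("domain:"):
            # domain:foo.com allows foo.com and any host ending in .foo.com
            # (allowing the bare domain itself is an assumption).
            domain = rule[len("domain:"):].lower()
            if host == domain or host.endswith("." + domain):
                return True
        elif rule.startswith("regex:"):
            pattern = rule[len("regex:"):]
            # Assumption: the surrounding slashes are delimiters, not part
            # of the pattern.
            if len(pattern) >= 2 and pattern[0] == pattern[-1] == "/":
                pattern = pattern[1:-1]
            if re.search(pattern, url):
                return True
        elif url.startswith(rule):
            # A plain url line allows every url that begins with it.
            return True
    return False


rules = [
    "http://www.somewhere.com/foo/",
    "domain:foo.com",
    r"regex:/foo\d+/",
]
print(is_allowed("http://www.somewhere.com/foo/goo.jpg", rules))
print(is_allowed("http://a.b.c.foo.com/blah/", rules))
```

With these rules, both example urls from the text are accepted, while a url that matches none of the lines would be rejected.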