Depth, concurrency, delays, URL exclusion patterns, auth and per-audit overrides.
Crawl rules let you tune how aggressively we crawl, what we skip, and how we authenticate. Settings live in two places: Settings → Crawl (workspace defaults) and the per-audit Advanced panel (one-off overrides).
Depth is the maximum number of link hops from the seed URL. The default is 5.
| Depth | What it covers |
|---|---|
| 1 | Just the seed URL. Useful for spot-checking a single page. |
| 2-3 | Quick overview of a small site. Covers most marketing sites. |
| 4-5 | Full coverage of typical sites under 5k pages. |
| 6-10 | Deep coverage. Use when you have orphan sections that are only reachable through deep navigation. |
| unlimited | Crawl until exhausted. Combining this with a max-pages cap is recommended for very large sites. |
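To make the depth values concrete, here is a minimal sketch of a depth-limited breadth-first walk over an in-memory link graph. It assumes depth 1 means the seed URL alone, as the table states; the function `crawl_order` and the toy `links` graph are hypothetical illustrations, not the crawler's actual implementation.

```python
from collections import deque


def crawl_order(links, seed, max_depth=5):
    """Return URLs in breadth-first order, stopping at max_depth levels.

    Assumption: depth 1 is the seed itself, matching the depth table.
    max_depth=None stands in for "unlimited".
    """
    seen = {seed}
    order = []
    queue = deque([(seed, 1)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if max_depth is not None and depth >= max_depth:
            continue  # at the limit: record the page but do not follow its links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site structure: each key links to the URLs in its list.
links = {
    "/": ["/pricing", "/blog"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-1/comments"],
}

print(crawl_order(links, "/", max_depth=1))  # ['/']
print(crawl_order(links, "/", max_depth=3))  # ['/', '/pricing', '/blog', '/blog/post-1']
```

In this toy graph the comments page sits four levels down, so it only appears once the depth is raised to 4 or more.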
Concurrency controls how many requests we have in flight at once. Delay adds a pause between requests per worker. Both directly affect crawl speed and origin load.
| Setting | Details |
|---|---|
| crawl_max_concurrent | Default 3. Range 1-10. Multi-tenant shared hosts may need 1; high-traffic Cloudflare sites can handle 10. |
| crawl_delay_ms | Default 500. Range 0-10000. Use higher delays for fragile origins. |
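As a rough way to reason about the trade-off, the sketch below estimates crawl duration from these two settings. It assumes each worker alternates between one request and one delay, and that `avg_response_ms` (a hypothetical parameter, not a product setting) approximates your origin's response time. Treat the result as a back-of-envelope figure, not a guarantee.

```python
def estimate_crawl_minutes(pages, max_concurrent=3, delay_ms=500, avg_response_ms=300):
    """Estimate crawl time in minutes under a simple request-then-pause model."""
    if not 1 <= max_concurrent <= 10:
        raise ValueError("crawl_max_concurrent must be between 1 and 10")
    if not 0 <= delay_ms <= 10000:
        raise ValueError("crawl_delay_ms must be between 0 and 10000")
    # Assumption: one worker completes one request every (response + delay) ms.
    per_request_ms = avg_response_ms + delay_ms
    requests_per_second = max_concurrent * 1000 / per_request_ms
    return pages / requests_per_second / 60


# Workspace defaults on a 5,000-page site, assuming 300 ms responses.
print(f"{estimate_crawl_minutes(5000):.1f} min")          # 22.2 min
# Most aggressive settings on the same site.
print(f"{estimate_crawl_minutes(5000, 10, 0):.1f} min")   # 2.5 min
```

Under this model the same 5,000 pages generate roughly nine times the request rate at the aggressive settings, which is the origin load you are trading for speed.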
Exclusion patterns are a comma-separated list of substrings or regex patterns to skip. A plain string is matched as a substring; a pattern wrapped in slashes is treated as a regex.
```text
/wp-admin, /cart, /checkout, ?orderby=, /\/page\/[0-9]+\/?$/
```
Excluded URLs are still discovered (so we can detect orphan-link issues) but not fetched, rendered or graded.
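If you want to check a pattern list before saving it, the following sketch mimics the matching rules described above. It is an approximation under stated assumptions: the helpers `parse_excludes` and `is_excluded` are hypothetical, a pattern counts as a regex only when it both starts and ends with a slash, and the list is split naively on commas, so a regex that itself contains a comma would not survive this parser.

```python
import re


def parse_excludes(spec):
    """Turn a comma-separated exclusion list into a list of matcher functions."""
    matchers = []
    for raw in spec.split(","):
        pattern = raw.strip()
        if not pattern:
            continue
        if len(pattern) > 2 and pattern.startswith("/") and pattern.endswith("/"):
            # Wrapped in slashes: treat the inner text as a regex.
            matchers.append(re.compile(pattern[1:-1]).search)
        else:
            # Plain string: substring match anywhere in the URL.
            matchers.append(lambda url, needle=pattern: needle in url)
    return matchers


def is_excluded(url, matchers):
    """Return True if any matcher hits the URL."""
    return any(match(url) for match in matchers)


matchers = parse_excludes(r"/wp-admin, /cart, /checkout, ?orderby=, /\/page\/[0-9]+\/?$/")

print(is_excluded("https://example.com/blog/page/2/", matchers))        # True
print(is_excluded("https://example.com/shop?orderby=price", matchers))  # True
print(is_excluded("https://example.com/blog/my-post", matchers))        # False
print(is_excluded("https://example.com/cartoons", matchers))            # True
```

The last call shows a common pitfall of substring matching: `/cart` also matches `/cartoons`. A regex is the safer choice when a path prefix is ambiguous.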
Respecting robots.txt is on by default. We obey Disallow/Allow rules that match our user agent. Disable it for diagnostic audits where you want to see everything regardless of robots.txt:
```text
crawl_respect_robots = false
```
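To preview which URLs a robots.txt file would block, a sketch using Python's standard `urllib.robotparser` can help. The user agent name `ExampleAuditBot` and the `should_fetch` helper are placeholders introduced for illustration; substitute the crawler's real user agent. Note that Python's parser applies the first matching rule in file order, which may differ from how a given crawler resolves overlapping Allow/Disallow rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; ExampleAuditBot is a placeholder agent name.
ROBOTS_TXT = """\
User-agent: ExampleAuditBot
Allow: /private/press/
Disallow: /private/

User-agent: *
Disallow: /
"""


def should_fetch(parser, user_agent, url, respect_robots=True):
    """Mirror the setting: with respect_robots=False every URL is fetched."""
    if not respect_robots:
        return True
    return parser.can_fetch(user_agent, url)


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

agent = "ExampleAuditBot"
print(should_fetch(parser, agent, "https://example.com/about"))               # True
print(should_fetch(parser, agent, "https://example.com/private/report"))      # False
print(should_fetch(parser, agent, "https://example.com/private/press/kit"))   # True
print(should_fetch(parser, agent, "https://example.com/private/report",
                   respect_robots=False))                                     # True
```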
Following external links is off by default. When enabled, we follow links to other domains for one level. This is useful for audit-style "what does my outbound link profile look like" reports.
Two authentication methods are supported, both configured under Settings → Crawl → Authentication. For username and password authentication:

```text
crawl_auth_username = admin
crawl_auth_password = ...
```
For cookie authentication, paste the full cookie string (everything after Cookie: in a captured request):

```text
crawl_auth_cookie = session_id=abc123; auth_token=xyz...
```
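If you have a raw request copied from browser dev tools or a proxy, a small helper can pull out the value to paste. This is a sketch only: `cookie_setting_from_request` is a hypothetical helper, and the sample request uses placeholder cookie values.

```python
def cookie_setting_from_request(raw_request):
    """Return everything after 'Cookie:' in a raw HTTP request dump, or None."""
    for line in raw_request.splitlines():
        name, sep, value = line.partition(":")
        # Header names are case-insensitive, so compare in lowercase.
        if sep and name.strip().lower() == "cookie":
            return value.strip()
    return None


# Placeholder capture; real values come from your own logged-in session.
captured = """\
GET /account HTTP/1.1
Host: example.com
Cookie: session_id=abc123; theme=dark
Accept: text/html
"""

print(cookie_setting_from_request(captured))  # session_id=abc123; theme=dark
```

Session cookies expire, so a cookie captured today may stop authenticating on a later audit.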
Audits → New audit → Advanced lets you override depth, exclusion patterns and seed URLs for a single run. Workspace defaults aren't affected.
All crawl rules can be set when starting an audit via API:
```bash
curl -X POST https://api.semoptimiser.com/v1/audits \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "depth": 5,
    "max_concurrent": 5,
    "delay_ms": 200,
    "excludes": ["/wp-admin", "/cart"]
  }'
```
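The same call can be sketched in Python with only the standard library. The endpoint, headers and field names are taken from the curl example; the `build_audit_request` helper is hypothetical, and the response format is not documented here, so the sketch builds the request without sending it or parsing a reply.

```python
import json
import urllib.request

API_URL = "https://api.semoptimiser.com/v1/audits"


def build_audit_request(api_key, url, depth, max_concurrent, delay_ms, excludes=()):
    """Build (but do not send) a POST request that starts an audit."""
    payload = {
        "url": url,
        "depth": depth,
        "max_concurrent": max_concurrent,
        "delay_ms": delay_ms,
        "excludes": list(excludes),
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "sk_live_..." is the elided placeholder from the curl example, not a real key.
request = build_audit_request(
    "sk_live_...",
    "https://example.com",
    depth=5,
    max_concurrent=5,
    delay_ms=200,
    excludes=["/wp-admin", "/cart"],
)

# To actually start the audit, send it:
#     urllib.request.urlopen(request)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to log or inspect the payload before an audit consumes crawl budget.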