Web crawlers are used to index the internet to help people search the web more efficiently.
Prerequisites:

- Basic knowledge of JavaScript and HTML
- Basic knowledge of Treasure Data
- Basic knowledge of the Treasure Data JavaScript SDK
Treasure Data recommends that you verify the implementation of any new feature or functionality on your site with Treasure Data JavaScript SDK version 3 before you use it in production, because version 3 manages cookies differently from earlier versions. When following most of these articles, be aware that you need to define the suggested event collectors and Treasure Data JavaScript SDK version 3 calls in your solution. For example, change `//cdn.treasuredata.com/sdk/2.5/td.min.js` to `//cdn.treasuredata.com/sdk/3.0.0/td.min.js`.
Because the Treasure Data JavaScript SDK tracks all page views, raw data usually contains many accesses from web crawlers. You can use the `td_browser` value to determine whether an access came from a browser or not.
The `td_browser` value is derived from the user agent string on the Treasure Data SDK backend server. The following table shows the `td_browser` value recorded for each Google crawler.
| Crawler | User-agent token | Full HTTP(S) request user-agent string | td_browser |
|---|---|---|---|
| Googlebot (Google Web search) | Googlebot | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | `Googlebot` |
| Googlebot (Google Web search, rarely used variant) | Googlebot | Googlebot/2.1 (+http://www.google.com/bot.html) | `Googlebot` |
| Googlebot News | Googlebot-News (Googlebot) | Googlebot-News | `Other` |
| Googlebot Images | Googlebot-Image (Googlebot) | Googlebot-Image/1.0 | `Other` |
| Googlebot Video | Googlebot-Video (Googlebot) | Googlebot-Video/1.0 | `Other` |
| Google Mobile (feature phone) | Googlebot-Mobile | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html) | `UP.Browser` |
| Google Mobile (feature phone) | Googlebot-Mobile | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html) | `Other` |
| Google Smartphone | Googlebot | Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | `Googlebot` |
| Google Mobile AdSense | Mediapartners-Google | [various mobile device types] (compatible; Mediapartners-Google/2.1; +http://www.google.com/bot.html) | `Other` |
| Google Mobile AdSense | Mediapartners (Googlebot) | [various mobile device types] (compatible; Mediapartners-Google/2.1; +http://www.google.com/bot.html) | `Other` |
| Google AdSense | Mediapartners-Google | Mediapartners-Google | `Other` |
| Google AdSense | Mediapartners (Googlebot) | Mediapartners-Google | `Other` |
| Google AdsBot landing page quality check | AdsBot-Google | AdsBot-Google (+http://www.google.com/adsbot.html) | `Other` |
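As the table shows, only some Google crawlers are reported with `td_browser` set to `Googlebot`; others appear as `Other` or `UP.Browser`, so a filter on `td_browser` alone can leave crawler traffic in your data. The following is a minimal Python sketch, not a Treasure Data API, of one way to combine both signals. It assumes page-view records have been exported as dictionaries and that the raw user agent string is available in a column named `td_user_agent` (an assumption; check the column name in your own table). The helper names `is_google_crawler` and `human_page_views` are hypothetical, and the crawler tokens are taken from the user-agent column of the table above.

```python
from typing import Iterable, Iterator, Mapping

# td_browser values (from the table above) that directly identify a Google crawler.
CRAWLER_BROWSERS = {"Googlebot"}

# Product tokens from the table's user-agent strings, used as a fallback for
# crawlers whose td_browser is "Other" or "UP.Browser".
GOOGLE_CRAWLER_TOKENS = (
    "Googlebot",             # also matches Googlebot-News, -Image, -Video, -Mobile
    "Mediapartners-Google",  # AdSense and Mobile AdSense
    "AdsBot-Google",         # landing page quality check
)


def is_google_crawler(record: Mapping[str, str]) -> bool:
    """Return True if a page-view record looks like an access from a Google crawler."""
    if record.get("td_browser") in CRAWLER_BROWSERS:
        return True
    # Assumed column name for the raw user agent; a missing value counts as empty.
    user_agent = record.get("td_user_agent") or ""
    return any(token in user_agent for token in GOOGLE_CRAWLER_TOKENS)


def human_page_views(records: Iterable[Mapping[str, str]]) -> Iterator[Mapping[str, str]]:
    """Yield only the records that are not attributed to a Google crawler."""
    return (record for record in records if not is_google_crawler(record))


if __name__ == "__main__":
    # Illustrative sample records, not real data.
    sample = [
        {
            "td_browser": "Chrome",
            "td_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                             "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
        },
        {
            "td_browser": "Googlebot",
            "td_user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        },
        {"td_browser": "Other", "td_user_agent": "Googlebot-Image/1.0"},
    ]
    for row in human_page_views(sample):
        print(row["td_browser"])  # prints "Chrome"
```

Substring matching on the tokens is deliberately simple; it catches every user agent listed in the table, but it is only a heuristic and does not verify that a request really came from Google.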