Go (golang) web crawlers
- GoQuery
- colly
- soup
- Pholcus
golang-colly crawler
FAQ
Q: How do I restrict the crawl to the current website only?
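A: Pass the colly.AllowedDomains option to colly.NewCollector, listing the domain(s) to stay on: colly.NewCollector(colly.AllowedDomains("<domain>"))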
Q: How do I download the HTML pages that the crawler visits?
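A: I did not find a dedicated "download page" feature. However, every fetched body is available as r.Body in an OnResponse callback, and colly.Response also has a Save(fileName) method that writes the body to disk.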
Using colly in Go code
Install the package:
```shell
go get -u github.com/gocolly/colly/...
```
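```go
package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links on every fetched page. Request.Visit resolves
	// relative hrefs against the current page URL. Without AllowedDomains,
	// this also follows links that leave go-colly.org.
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Log each URL just before it is requested.
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	// Start crawling. The collector is synchronous by default, so Visit returns after the crawl ends.
	if err := c.Visit("http://go-colly.org/"); err != nil {
		fmt.Println("Visit failed:", err)
	}
}
```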