A quick search of the Internet shows that few recent books have been written on Web bots, and fewer still on designing them, so this book fills the gap nicely. It is well written, has lots of practical examples and does not ignore the larger ethical and legal issues. Recommended. While the expression "Web bot" might suggest some kind of shady software to some readers, this book explains that Web bots are, when used properly, just another, possibly profitable, way of using the Internet. Readers should be aware that the book's code requires the cURL extension, which is not always part of default PHP installs. Those without it might want to explore the Zend Framework's HTTP class or PEAR's equivalent.
The first section explains what Web bots are and what kinds of things you might do with them. For a technical book, there is also a fair amount of discussion of business applications for Web bots, which is good to see. Readers will also notice that most of the development is done with Windows in mind. This makes sense, since early on the author mentions putting older PCs to work running Web bots. This part covers what Web bots are, ideas for projects, how to download a Web page using PHP and cURL, parsing, form submission and managing large amounts of data. One note here: apart from the early examples, most of the code uses the cURL extension, whose installation is only briefly covered. A lot of the code also depends on a set of libraries, provided at the book's Web site, that hide much of the low-level work.
The second section is about projects. Some of the projects demonstrated are: price-monitoring Web bots, image-capture Web bots, link-verification Web bots, anonymous browsing, search ranking, aggregation Web bots, FTP bots, NNTP bots, and bots that read and send e-mail. As the author states in the introduction, he has been writing Web bots for years, and here you get to see his range of experience.
Part three, "Advanced technical considerations", builds on the projects, covering spiders, procurement Web bots and snipers, Web bots and cryptography, authentication, advanced cookie management, and scheduling Web bots and spiders.
The fourth part, "Larger considerations", is the one that some readers might skip. However, since Web bots and spiders operate "in the wild", it is an important part of the book.