Install Nutch on Windows XP
Nutch Windows installation guide
--By Liming Liu
Download and install the latest version of Cygwin; be sure to select GCC when choosing packages.
Download jdk-1_5_0_06-windows-i586-p.exe and install it (by default to C:\Program Files\Java\jdk1.5.0_06).
Set two environment variables:
NUTCH_JAVA_HOME: C:\Program Files\Java\jdk1.5.0_06
JAVA_HOME: C:\Program Files\Java\jdk1.5.0_06
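Before going further, it can help to confirm from a Cygwin shell that both variables are visible and point at a real directory. The following is only a minimal sketch: the helper name check_java_home is hypothetical (it is not part of Nutch or the JDK), and it assumes the shell was started after the variables were set in Windows.

```shell
# Hypothetical helper: verify that a variable holding a JDK path is usable.
# Usage: check_java_home VAR_NAME
check_java_home() {
  var_name="$1"
  # Indirect lookup: read the value of the variable whose name is in $1.
  eval "dir=\${$var_name}"
  if [ -z "$dir" ]; then
    echo "$var_name is not set"
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "$var_name points to a missing directory: $dir"
    return 1
  fi
  echo "$var_name OK: $dir"
}
```

A possible call would be: check_java_home NUTCH_JAVA_HOME && check_java_home JAVA_HOME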
Download apache-tomcat-6.0.13.exe and install it (by default to C:\Program Files\Apache Software Foundation\Tomcat 6.0). Note the port, account name, and password you choose during installation.
Download nutch-0.9.tar.gz and extract it to nutch-0.9 (for example, C:\dev\search\netch\nutch-0.9).
Start the Tomcat service and open http://localhost:8080/manager/html
Go to “WAR file to deploy” and upload the file C:\dev\search\netch\nutch-0.9\nutch-0.9.war.
Stop the Tomcat service. In “C:\Program Files\Apache Software Foundation\Tomcat 6.0\webapps”, rename the directory “ROOT” to “ROOT-backup”, then rename the directory “nutch-0.9” to “ROOT”. (This renaming is optional; you may skip it.)
Create a directory “urls” in “C:\dev\search\netch\nutch-0.9”.
Create a file “testurlfile” in the directory “urls”.
Add the line “http://www.bokee.com” to the file “testurlfile”.
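The three steps above (directory, file, seed line) can also be done in one go from the Cygwin shell. This is a sketch under assumptions: make_seed_file is a hypothetical helper name, and the Nutch directory is passed in as an argument rather than hard-coded.

```shell
# Hypothetical helper: create the seed directory and the seed file.
# Usage: make_seed_file NUTCH_DIR URL
make_seed_file() {
  nutch_dir="$1"
  seed_url="$2"
  mkdir -p "$nutch_dir/urls"
  # One URL per line; printf writes exactly the URL plus a newline,
  # so no stray trailing space ends up in the file.
  printf '%s\n' "$seed_url" > "$nutch_dir/urls/testurlfile"
}
```

With the paths used in this guide, a call might look like: make_seed_file /cygdrive/c/dev/search/netch/nutch-0.9 http://www.bokee.com (under Cygwin's default mount settings, /cygdrive/c is how drive C: appears).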
Open the file “C:\dev\search\netch\nutch-0.9\conf\crawl-urlfilter.txt” and replace “MY.DOMAIN.NAME” with “bokee.com”.
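If you prefer not to edit the filter file by hand, the replacement can be sketched with sed. The helper name set_crawl_domain is hypothetical, and the sketch assumes the file still contains the literal placeholder MY.DOMAIN.NAME.

```shell
# Hypothetical helper: replace the MY.DOMAIN.NAME placeholder in a filter file.
# Usage: set_crawl_domain FILTER_FILE DOMAIN
set_crawl_domain() {
  filter_file="$1"
  domain="$2"
  # Write to a temporary file first, so that a failed sed run
  # does not truncate the original file.
  sed "s/MY\.DOMAIN\.NAME/$domain/g" "$filter_file" > "$filter_file.tmp" &&
    mv "$filter_file.tmp" "$filter_file"
}
```

For this guide the call would be along the lines of: set_crawl_domain conf/crawl-urlfilter.txt bokee.com (run from the nutch-0.9 directory).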
Open the file “C:\dev\search\netch\nutch-0.9\conf\nutch-site.xml” and edit it to read:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>http.agent.name</name>
<value>nutch</value>
<description>HTTP 'User-Agent' request header. MUST NOT be empty -
please set this to a single word uniquely related to your organization.
NOTE: You should also check other related properties:
http.robots.agents
http.agent.description
http.agent.url
http.agent.email
http.agent.version
and set their values appropriately.
</description>
</property>
<property>
<name>http.agent.description</name>
<value>liming agent.description</value>
<description>Further description of our bot- this text is used in
the User-Agent header. It appears in parenthesis after the agent name.
</description>
</property>
<property>
<name>http.agent.url</name>
<value></value>
<description>A URL to advertise in the User-Agent header. This will
appear in parenthesis after the agent name. Custom dictates that this
should be a URL of a page explaining the purpose and behavior of this
crawler.
</description>
</property>
<property>
<name>http.agent.email</name>
<value>agent.email</value>
<description>An email address to advertise in the HTTP 'From' request
header and User-Agent header. A good practice is to mangle this
address (e.g. 'info at example dot com') to avoid spamming.
</description>
</property>
</configuration>
Open the file “C:\Program Files\Apache Software Foundation\Tomcat 6.0\webapps\ROOT\WEB-INF\classes\” and edit it to read:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>searcher.dir</name>
<value>C:\dev\search\netch\nutch-0.9\crawl.demo</value>
</property>
</configuration>
Open the file “C:\Program Files\Apache Software Foundation\Tomcat 6.0\conf\server.xml” and edit the item “<Connector port="8080" …/>” to read:
<Connector port="8080" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8443" acceptCount="100" debug="0" connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="UTF-8"/>
Start the Tomcat service.
Start Cygwin, cd to “C:\dev\search\netch\nutch-0.9”, and run: bin/nutch crawl urls -dir crawl.demo -depth 2 -topN 50
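In the crawl command, -dir names the directory the crawl is written to, -depth is the link depth to follow from the seed page, and -topN is the maximum number of pages fetched at each level. Once the crawl finishes, a quick check of the output directory can save a confusing empty search page later. The sketch below assumes the layout a Nutch 0.9 crawl normally produces (crawldb, linkdb, segments, indexes, index); check_crawl_dir is a hypothetical helper name, not a Nutch command.

```shell
# Hypothetical helper: report which expected crawl subdirectories exist.
# Usage: check_crawl_dir CRAWL_DIR
# Returns 0 if all are present, 1 if any is missing.
check_crawl_dir() {
  crawl_dir="$1"
  missing=0
  for sub in crawldb linkdb segments indexes index; do
    if [ -d "$crawl_dir/$sub" ]; then
      echo "found:   $sub"
    else
      echo "missing: $sub"
      missing=1
    fi
  done
  return "$missing"
}
```

From the nutch-0.9 directory this could be run as: check_crawl_dir crawl.demo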
Open http://localhost:8080 in Internet Explorer; you will see a working search engine.
(Or open http://localhost:8080/nutch)
http://www.javaeye.com/topic/81627 Nutch 0.8 in Practice (1), by X.D.Hua
http://www.ideagrace.com/club/simple/index.php?t312.html Nutch on WinXP, by Kevin
http://blog.csdn.net/pwlazy/archive/2006/08/23/1109868.aspx A First Look at Nutch 0.8 on Windows, by pwlazy
Liming Liu:
Liu Liming (刘黎明), M.S. in Computer Science, University of Science and Technology Beijing, liuliming2008@126.com
- Posted by liming2008, Saturday, June 23, 2007, 21:01