The other day I mentioned the big workshop on the Weblogging Ecosystem.
Today I received an email saying the organizers will be giving out blog data for research:
a huge collection covering nearly 1 million blogs. The details are in the email below.
Announcing: data availability for the 3rd Annual Workshop on the Weblogging Ecosystem
We are happy to announce the public availability of a substantial collection of blog data for research purposes. The data is being made available by Intelliseek/BlogPulse in conjunction with the 3rd Annual Workshop on the Weblogging Ecosystem. A DVD containing full text from nearly 1 million blogs can be requested by filling out the form at the workshop homepage: http://www.blogpulse.com/www2006-workshop/
The release comprises a complete set of weblog posts for three weeks in July 2005 (on the order of 10M posts from 1M weblogs). This data set has been selected as it spans a period of time during which an event of global significance occurred, namely the London bombings. The data set includes the full content of the posts plus metadata in an easy to parse XML format. The metadata fields include: date of posting, time of posting, author name, title of the post, weblog url, permalink, tags/categories, and outlinks classified by type.
Much of the interest in research relating to weblogs involves the analysis of large quantities of data. As part of this workshop, we are very excited to provide a data set to the research community. The aim is to encourage the use of this data to focus the various views and analyses of the blogosphere over a common space. This will provide a unique opportunity to compare different views of the blogosphere and to stimulate interesting discussion and collaboration.
Researchers are welcome to concentrate on whatever aspects of the data they are interested in. Possible topics include:
- Topic detection and tracking
- Relation of blog data to other media
- Social network analysis
- Qualitative analysis of small scale interactions
- Sentiment detection
- Search tools
- Detection of spam blogs
- Correlation of weblog events to "real-world" data (e.g. the stock market)
- Clustering and ontology creation
- Measures of influence
- Visualization and mapping of the blogosphere
Please note that we welcome any submissions to the workshop, not just those making use of the data. Feel free to contact the committee with any questions you may have.
Eytan Adar, University of Washington
Natalie Glance, Intelliseek & BlogPulse
Matthew Hurst, Intelliseek & BlogPulse