GotoKnow

Blog Data for Researchers

The other day I mentioned the big workshop on the Weblogging Ecosystem.
Today I received an email announcing that they will give out blog data for
research: a huge collection of nearly 1 million blogs. The details are in the email below.

Announcing: data availability for the 3rd Annual Workshop
on the Weblogging Ecosystem

We are happy to announce the public availability of a substantial
collection of blog data for research purposes.  The data is being made
available by Intelliseek/BlogPulse in conjunction with the 3rd Annual
Workshop on the Weblogging Ecosystem.  A DVD containing full text from
nearly 1 million blogs can be requested by filling out the form at the
workshop homepage: http://www.blogpulse.com/www2006-workshop/

The release comprises a complete set of weblog posts for three weeks in
July 2005 (on the order of 10M posts from 1M weblogs). This data set has
been selected as it spans a period of time during which an event of
global significance occurred, namely the London bombings.  The data set
includes the full content of the posts plus metadata in an easy-to-parse
XML format. The metadata fields include: date of posting, time of
posting, author name, title of the post, weblog url, permalink,
tags/categories, and outlinks classified by type.
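For anyone planning to work with this data, the XML metadata described above can be read with a few lines of Python. This is only a minimal sketch: the announcement does not give the actual schema, so the element names (`post`, `date`, `weblog_url`, `outlinks`, `link`, the `type` attribute, and so on) and the sample record are hypothetical and would need to be adjusted to match the files on the DVD.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# HYPOTHETICAL record layout: element and attribute names are assumptions
# for illustration only, not the real BlogPulse schema.
SAMPLE = """
<posts>
  <post>
    <date>2005-07-07</date>
    <time>09:15:00</time>
    <author>Jane Doe</author>
    <title>News from London</title>
    <weblog_url>http://example.com/blog/</weblog_url>
    <permalink>http://example.com/blog/2005/07/07/news</permalink>
    <tags><tag>london</tag><tag>news</tag></tags>
    <outlinks>
      <link type="news">http://news.example.org/story</link>
      <link type="blog">http://another.example.net/post</link>
      <link type="news">http://news.example.org/update</link>
    </outlinks>
    <content>Full text of the post...</content>
  </post>
</posts>
"""


def parse_posts(xml_text):
    """Yield one dict of metadata fields per <post> element."""
    root = ET.fromstring(xml_text.strip())
    for post in root.iter("post"):
        yield {
            "date": post.findtext("date"),
            "time": post.findtext("time"),
            "author": post.findtext("author"),
            "title": post.findtext("title"),
            "weblog_url": post.findtext("weblog_url"),
            "permalink": post.findtext("permalink"),
            "tags": [t.text for t in post.findall("tags/tag")],
            # Tally outlinks by their (assumed) "type" attribute.
            "outlinks_by_type": Counter(
                link.get("type") for link in post.findall("outlinks/link")
            ),
        }


if __name__ == "__main__":
    for record in parse_posts(SAMPLE):
        print(record["date"], record["title"], dict(record["outlinks_by_type"]))
```

Counting outlinks per type with `collections.Counter` gives a quick per-post summary that could feed link-based analyses such as the ones suggested in the announcement.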

Much of the interest in research relating to weblogs involves the
analysis of large quantities of data. As part of this workshop, we are
very excited to provide a data set to the research community. The aim is
to encourage the use of this data to focus the various views and
analyses of the blogosphere over a common space. This will provide a
unique opportunity to compare different views of the blogosphere and to
stimulate interesting discussion and collaboration.

Researchers are welcome to concentrate on whatever aspects of the data
they are interested in.  Possible topics include:
-  Topic detection and tracking
-  Relation of blog data to other media
-  Social network analysis
-  Qualitative analysis of small scale interactions
-  Sentiment detection
-  Search tools
-  Detection of spam blogs
-  Correlation of weblog events to "real-world" data (e.g. the stock market)
-  Clustering and ontology creation
-  Measures of influence
-  Visualization and mapping of the blogosphere

Please note that we welcome any submissions to the workshop, not just
those making use of the data.  Feel free to contact the committee with
any questions you may have.

Eytan Adar, University of Washington
Natalie Glance, Intelliseek & BlogPulse
Matthew Hurst, Intelliseek & BlogPulse

This entry was written on GotoKnow by

Keywords: blogresearch
Entry number: 9995
License: All rights reserved
