How to Read All of Wikipedia
Wikipedia pages are information-rich but not easily accessible to data mining, because their content is made uniform only by convention, not by strict input validation. We present an approach to data-mining large volumes of semi-structured text, such as Wikipedia dump files, using open-source tools. We employ compiler-writing and data-visualization tools in such a way that we always "know what we don't know". This turns a mining task into an incremental process we call "Exploratory Parsing", with many applications in our increasingly open-data-rich world.
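
To make the idea of always "knowing what we don't know" concrete, here is a minimal sketch in Python. It is not the toolchain described in the talk, which uses compiler-writing and visualization tools; it only illustrates the incremental loop under stated assumptions. The rule names, the regular expressions, and the helpers `shape` and `explore` are all hypothetical. Each line of input is tried against the rules written so far, and every line that no rule recognizes is tallied by a coarse signature, so the most common unknown forms suggest which rule to write next.

```python
import re
from collections import Counter

# Hypothetical starter rules for a few common wikitext line forms.
# In practice this list grows one rule at a time as unknowns are reviewed.
RULES = [
    ("heading", re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")),
    ("template_field", re.compile(r"^\|\s*[\w ]+?\s*=")),
    ("redirect", re.compile(r"^#REDIRECT\s*\[\[.+?\]\]", re.IGNORECASE)),
]


def shape(line, width=12):
    """Reduce a line to a coarse signature so similar unknown lines group together."""
    prefix = line.strip()[:width]
    prefix = re.sub(r"[A-Za-z]+", "a", prefix)  # collapse runs of letters
    return re.sub(r"\d+", "9", prefix)          # collapse runs of digits


def explore(lines, rules=RULES):
    """Count lines matched by each rule, and tally unmatched lines by shape."""
    recognized = Counter()
    unknown = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        for name, pattern in rules:
            if pattern.match(line):
                recognized[name] += 1
                break
        else:
            # No rule matched: record what we don't know yet.
            unknown[shape(line)] += 1
    return recognized, unknown
```

Calling `explore` on a sample of lines and inspecting `unknown.most_common(10)` gives a ranked list of unrecognized forms. Adding a rule for the top entry and rerunning is one step of the incremental process.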