"The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the Web. Leaving aside the artificial intelligence problem of training machines to behave like people, the Semantic Web approach instead develops languages for expressing information in a machine process-able form" -- Tim Berners-Lee, The Semantic Web Roadmap.
An introduction to Tim Berners-Lee's Semantic Web
For Tim Berners-Lee, whom many recognise as the inventor of the World Wide Web, the Semantic Web has been 15 years in the making.
What is the Semantic Web?
The Semantic Web is the name of a long-term project started by W3C with the stated purpose of realizing the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications (from the W3C Semantic Web Activity Statement). The Semantic Web is layered on top of the existing Web: machine-readable information is included in files without modifying the existing Web structure.
In the Web's current form, raw HTML text and images carry information that is readily understandable by a human, but has little or no meaning to computer programs. For instance, popular search engines can help you locate files containing specific words, but that content may not be what you're looking for: a page can match the words you searched on and still pertain to a different topic than you had in mind. Nor can the search engine connect a result to related content a few steps down the relationship path. The characters 95495 could mean a dryer belt, an American postal code, a street address, or a set of dinosaur slipper socks. Human language operates efficiently when the same term means somewhat different things, but automation does not.
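The 95495 ambiguity can be illustrated with a minimal Python sketch. It is not taken from any Semantic Web specification: the TypedValue class, the kind labels, and both search functions are hypothetical names invented here, and real Semantic Web data would use standard vocabularies rather than ad hoc strings. The point is only that a value carrying an explicit type can be filtered by meaning, while a bare string cannot.

```python
from typing import List, NamedTuple


class TypedValue(NamedTuple):
    """A raw string plus a label saying what the string means (hypothetical)."""
    value: str
    kind: str


# The same characters recorded with three different meanings.
records = [
    TypedValue("95495", "postal_code"),
    TypedValue("95495", "part_number"),
    TypedValue("95495", "street_number"),
]


def plain_search(term: str, items: List[TypedValue]) -> List[TypedValue]:
    """String matching only: every record containing the term is a hit."""
    return [r for r in items if r.value == term]


def typed_search(term: str, kind: str, items: List[TypedValue]) -> List[TypedValue]:
    """Match on the term and on its declared meaning."""
    return [r for r in items if r.value == term and r.kind == kind]


print(len(plain_search("95495", records)))            # all three records match
print(typed_search("95495", "postal_code", records))  # only the postal code
```

The plain search cannot tell the three records apart, which is the situation the paragraph above describes for keyword-based search engines.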
In another example, let's say you were doing research on a CEO named Attilio Russo (fictitious). A standard Web search looks for documents in which the string Attilio Russo occurs (along with some fuzzy logic to find partial matches, etc.). In a Semantic Web model, semantic searches would look for documents on the Web with relationships to that data, then compile and organise those relationships to give you things like a list of previous companies Attilio worked for, the boards of directors of those companies, the companies those board members worked for, and so on. This would allow a computer to form relationships from data on the Web in a way that only humans can currently.
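The relationship-following search described above can be sketched as a walk over subject-predicate-object statements. This is an illustrative Python sketch only: the company names, board members, predicate names (workedFor, hasBoardMember), and the follow function are all invented for the example, and a real system would query published RDF data rather than an in-memory set.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts about the fictitious CEO; every name here is made up.
triples: Set[Triple] = {
    ("Attilio Russo", "workedFor", "ExampleCorp"),
    ("Attilio Russo", "workedFor", "SampleWorks"),
    ("ExampleCorp", "hasBoardMember", "Board Member A"),
    ("SampleWorks", "hasBoardMember", "Board Member B"),
    ("Board Member A", "workedFor", "OtherCo"),
    ("Board Member B", "workedFor", "ExampleCorp"),
}


def objects(subject: str, predicate: str) -> List[str]:
    """Return everything the subject is linked to by the given predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def follow(start: str, path: List[str]) -> List[str]:
    """Follow a chain of predicates outward from a starting entity."""
    frontier = {start}
    for predicate in path:
        frontier = {o for node in frontier for o in objects(node, predicate)}
    return sorted(frontier)


# Companies Attilio worked for.
print(follow("Attilio Russo", ["workedFor"]))
# Board members of those companies.
print(follow("Attilio Russo", ["workedFor", "hasBoardMember"]))
# Companies those board members worked for.
print(follow("Attilio Russo", ["workedFor", "hasBoardMember", "workedFor"]))
```

Each extra predicate in the path is one more step down the relationship chain, which a plain string search has no way to take.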
The Semantic Web is designed to allow reasoning and inference capabilities to be added to the pure descriptions. In its simplest form, this includes stating facts such as "a hex-head bolt is a type of machine bolt," but it extends to the formation of complicated relationships. Features like this allow intelligent software to act on the descriptive information and follow logic paths based on it.
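A minimal Python sketch of this kind of inference follows. Only the first fact, that a hex-head bolt is a type of machine bolt, comes from the text; the other two facts and the function names are assumptions added so that there is a chain to reason over. Real Semantic Web reasoners work from schema and ontology languages, not from a Python dictionary.

```python
from typing import Dict, List

# "X is a type of Y" facts. Only the first entry comes from the article;
# the other two are assumed here to give the inference something to chain.
subclass_of: Dict[str, str] = {
    "hex-head bolt": "machine bolt",
    "machine bolt": "bolt",      # assumption
    "bolt": "fastener",          # assumption
}


def ancestors(term: str) -> List[str]:
    """Walk the 'is a type of' chain upward, nearest category first."""
    result: List[str] = []
    seen = set()
    while term in subclass_of and term not in seen:
        seen.add(term)  # guard against accidental cycles in the facts
        term = subclass_of[term]
        result.append(term)
    return result


def is_a(term: str, category: str) -> bool:
    """True if the term equals the category or inherits from it."""
    return term == category or category in ancestors(term)


print(ancestors("hex-head bolt"))
print(is_a("hex-head bolt", "fastener"))  # never stated directly, but inferred
print(is_a("bolt", "machine bolt"))       # the relation does not run downward
```

No fact says outright that a hex-head bolt is a fastener; the program derives it by chaining the stated facts, which is the "follow logic paths" behaviour the paragraph describes in its simplest form.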







Talkback
The thing that concerns me is legacy data: everything that is available now, all 8bn pages that Google searches. How easily will that be converted into this new Semantic form?