Friday, June 20, 2008

Semantic Web Panel Discussion at LinkedData Conference 2008


I recently attended a really fascinating panel discussion about the Semantic Web which was part of the LinkedData Planet 2008 conference.

Panel Organizer:

Marco Neumann, New York Semantic Web Meetup

Moderators:

* Hank Williams, Founder and CEO, Kloudshare
* Eric Hoffer, Second Integral

Panelists:

* Sir Tim Berners-Lee, W3C, MIT, inventor of the World Wide Web
* Sergey Chernyshev, CTO, Semantic Communities LLC
* Dan Connolly, Research Scientist, W3C (tentative)
* Christine Connors, Global Director, Semantic Technology Solutions, Dow Jones & Company, Inc.
* Taylor Cowan, Emerging Solutions Principal, Sabre Holdings, Travelocity
* Richard Cyganiak, Researcher, DERI and Project Leader, D2RQ (http://www.d2rq.org)
* Nic Fulton PhD, Chief Scientist, Reuters Media
* Marc Hadfield, President and CTO, Alitora
* Savas Parastatidis PhD, Architect, Technical Computing, Microsoft Research


Topics covered during the panel included:

* How can we monetize the semantic web?
* Is the semantic web relevant and ready for what we’re doing today?
* The importance of interoperability and standards.
* Barriers to adoption of semantic technologies and how to encourage adoption.
* Practical insights into the semantic web.
* What technologies and products are available today?

(Disclaimer: My notes are not comprehensive and I apologize in advance for any inaccuracies; they are meant more to spark thoughts and ideas than to be a complete record of the panel discussion. I tried to capture everything being said, but in some cases I may have misheard or substituted some words of my own.)

The answers to the questions below are the combined set of panel responses, except where I have called out TBL specifically. The first question went to Tim Berners-Lee (whose brilliance increasingly shone through as the panel progressed; being from England myself, I was pleasantly surprised that he had an English accent, as for some reason I thought he was American or Swiss, although judging from his suntan I would guess that he probably spends most of his time away from England!).


Q: What's next with W3C initiatives? Should we encourage people to write more code and add more standards?

A’s: TBL - We need to have all these things coming together: open data, consuming data, playing with ideas. It would be a mistake to focus on one thing. Consumption of linked data, people pushing for mashup sites. One thing is certain: there need to be more things out there consuming and using this stuff.

Q: Best reasons for people to adopt these technologies? Can we prove the value? Has anyone seen examples of this being done?

A’s: One of the best reasons is because they solve a problem or business need. We need to help them (companies, enterprises) understand the value of a semantic solution. How can we prove the value of semantic technology?

Within the life science community we seek to answer questions that would otherwise be unanswerable.

It has been shown that companies can cut 6-8 months off the development of a drug if they have access to this sort of information and data integration.

In the financial sector, hedge funds find that linking data sets helps customers make money, and the same goes for legal and financial databases. In business, people want all the information they can have as soon as possible.

At the moment there are massive amounts of data and airline companies don't want people to have that data.

With regards to semantic technologies, and data feeds in particular: SKOS is easier for people to understand and migrate to than OWL and RDF. For people who have been building taxonomies, it lets them identify discrete sets of data. If we can send people feedback like 'this string is... earnings per share', it can have an effect worth billions of dollars on the markets.
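
To make the SKOS point concrete, here is a tiny sketch I put together afterwards (my own illustration, not something shown at the panel) of tagging a raw feed value with a SKOS concept using Python's rdflib. The namespaces and the 'earnings per share' concept URI are entirely hypothetical.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, RDF, SKOS

# Hypothetical namespaces for a publisher's concept scheme and a data feed.
CONCEPTS = Namespace("http://example.org/finance/concepts/")
FEED = Namespace("http://example.org/feeds/acme-q2/")

g = Graph()
g.bind("skos", SKOS)
g.bind("dcterms", DCTERMS)

# Define the concept once, with a human-readable label.
eps = CONCEPTS["earningsPerShare"]
g.add((eps, RDF.type, SKOS.Concept))
g.add((eps, SKOS.prefLabel, Literal("Earnings per share", lang="en")))

# Tag a raw value from the feed: "this string is... earnings per share".
datapoint = FEED["line-42"]
g.add((datapoint, RDF.value, Literal("1.37")))
g.add((datapoint, DCTERMS.subject, eps))

print(g.serialize(format="turtle"))
```

The appeal is that the taxonomy work people have already done maps fairly directly onto SKOS concepts and labels, without requiring a full OWL ontology up front.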

It is more elegant for consumers to tell companies: this is where I want to go and how long I’m going to be there. Then travel bidders can scrape the web, and people/companies can find those who want these offers and 'spam' them. This is a perfect case for the semantic web and for creating semantic data. The content is simply available; consumers don't have to tell us explicitly, and that means less work for them.

Q: How can we facilitate communication about the semantic web?

A’s: There needs to be a community of discussion around describing data on the web, where people can share ideas about how to talk about data; a semantic web SIG (TBL's suggestion).

SWIRC channel, ES (?) wiki. The current tool sets are still not simple enough for the average user, and until we make the tools simple we won't get the adoption we are talking about.

A wiki-type communication environment where people can post and comment, and where things can become standardized.

Q: How do we convert from one microformat into another and support transitioning between microformats?

A’s: By creating ontologies for the life sciences communities, and triple stores. Pharmaceutical companies are doing this, but it's hard to query the data at the moment because it is inefficient.

With regards to data storage, we might want to adopt a service model (like Google's) for storing and processing information and have vendors do this. Since a service offering is an easier sell than an enterprise sale, there will be a trend in the life sciences towards outsourcing these services.

We need a repository for communication where it is easy for developers to build solutions and create a graph, so we can do interesting things such as queries. Microsoft is building RDF and OWL export features into their products to enable object reuse.

Q: How do we process it and make it more efficient?

A’s: Triple stores are not doing that well. If we can map data into a relational schema and still export RDF then we have the best of both worlds: triple stores are flexible, relational stores are efficient.
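
This is roughly the territory D2RQ covers. As a rough, hand-rolled illustration of the idea (the table, column names and URIs below are all made up, and this is nothing like a real D2RQ mapping), here is how rows living in an efficient relational store might be exported as flexible RDF:

```python
import sqlite3
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

# Hypothetical vocabulary and instance namespace for the exported data.
EX = Namespace("http://example.org/airline/")

# A toy relational store: the efficient, query-friendly side.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE passengers (id INTEGER PRIMARY KEY, name TEXT, flight TEXT)")
db.execute("INSERT INTO passengers VALUES (1, 'Alice', 'BA117'), (2, 'Bob', 'BA117')")

# Export the same rows as RDF: the flexible, linkable side.
g = Graph()
for row_id, name, flight in db.execute("SELECT id, name, flight FROM passengers"):
    person = EX[f"passenger/{row_id}"]
    g.add((person, RDF.type, FOAF.Person))
    g.add((person, FOAF.name, Literal(name)))
    g.add((person, EX.onFlight, EX[f"flight/{flight}"]))

print(g.serialize(format="turtle"))
```

The relational schema keeps doing the heavy lifting; the RDF view is just a projection of it that other people can link to and query.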


Q: Should people be rushing to implement RDF and SPARQL to achieve the same standards-based interoperability we have with SQL? What kinds of queries can you express?

A’s: RDF and SPARQL integration to combine XML with relational data. We need to standardize these extensions and have a larger set, as well as including mathematical and aggregate functions in SPARQL, as SQL already does.
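
For context, SPARQL as it stood at the time had no aggregates at all; they only arrived later with SPARQL 1.1. Here is a sketch of the kind of query the panel was asking for, over a made-up flight-delay graph, using a modern version of Python's rdflib (which supports the 1.1 aggregate functions):

```python
from rdflib import Graph

# Tiny hypothetical dataset of flight delays, in Turtle.
g = Graph()
g.parse(data="""
    @prefix ex: <http://example.org/airline/> .
    ex:BA117 ex:delayMinutes 12 .
    ex:BA118 ex:delayMinutes 45 .
    ex:BA119 ex:delayMinutes 3 .
""", format="turtle")

# An aggregate query (COUNT/AVG) of the sort SQL users expect;
# only possible once SPARQL grew aggregate functions in 1.1.
results = g.query("""
    PREFIX ex: <http://example.org/airline/>
    SELECT (COUNT(?flight) AS ?flights) (AVG(?delay) AS ?avgDelay)
    WHERE { ?flight ex:delayMinutes ?delay . }
""")

for row in results:
    print(f"{row.flights} flights, average delay {row.avgDelay} minutes")
```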

Q: Why should web app developers and IT departments care?

A’s: Semantic web technologies do not currently provide that level of control. A SPARQL query can kill servers… there is a project for adding metadata… and an effort to talk to the semantic web via REST.

There are currently not enough applications to consume semantic data for web app developers to put efforts into developing it. It's a chicken and egg situation.

We need more standard data models (e.g. Yelp), standard web ontologies, reuse of components, web services where we can grab components, and a high level of reuse between the services that create them...

There is a lot of hype around the semantic web. It’s really all about structured data...

Could Facebook release its data as RDF triples? When information around data becomes a commodity, people will talk about structuring data more. The question is how much data to structure and how much to share.

Why not give it away in a standard way when it's of interest to others? The structure is already there...

For N-way mashups of content (coming from multiple sources) the value is not currently there. It is easier for developers not to bother at the moment.

Q: What are the barriers to adoption?

A’s: The data parsing process currently takes too long. It is easier at the moment to just give name/value pairs...

Google has taken a copy of the huge global graph, created structure for others to discover, converted it into structured info, and adds value on top of it.

Q: What are some of the myths about the SW?

A’s: It won't happen until we have AI to convert unstructured data into structured data.

People won't have time to mark up their HTML pages with RDF data.

Once we start to correlate data there will be incentives to release it. There are no services right now to consume it and add value to it. There is no monetization value.

(Startups are already doing it... the only question is formats.)

Q: How can people get from the universe they are in now to this other place? What would you advise people to do?

A’s: Facebook and social networking sites, car companies, flights... they could add the menu, for example.

Audience participation

Discussion about mashups and the semantic web:

If all data is labelled correctly then all mashups are just a question of people asking for what they want.
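
A small sketch of that idea (entirely my own, with made-up URIs): if two independent sources publish correctly labelled data, the "mashup" reduces to loading both graphs and asking one question across them.

```python
from rdflib import Graph

g = Graph()

# Two hypothetical, independently published sources using shared labels.
g.parse(data="""
    @prefix ex: <http://example.org/travel/> .
    ex:BA117 ex:departsFrom ex:JFK ; ex:arrivesAt ex:LHR .
""", format="turtle")
g.parse(data="""
    @prefix ex: <http://example.org/travel/> .
    ex:JFK ex:locatedIn ex:NewYork .
    ex:LHR ex:locatedIn ex:London .
""", format="turtle")

# The "mashup" is just a question asked across the merged data.
for row in g.query("""
    PREFIX ex: <http://example.org/travel/>
    SELECT ?flight ?city WHERE {
        ?flight ex:arrivesAt ?airport .
        ?airport ex:locatedIn ?city .
    }
"""):
    print(f"{row.flight} arrives in {row.city}")
```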

One panelist gave the example of SeatGuru (http://www.seatguru.com/), an airline seating site... all mashups will eventually become services, people will mash up the mashups... ideas, knowledge, discovery.

WSDL and REST incorporate ways to label APIs with semantic concepts and roll them up into auto-discovery of API calls, for example for looking up flight numbers, which will be labelled with semantics...

A semantic mashup provider doesn't have to do it all themselves; the annotation of mashups exists in a cloud, so you can ask which mashups out there are labelled with a particular concept. Someone creates it, someone labels it, and someone else consumes it.

Mashups are no more complicated than a spreadsheet.

Q: What needs to be done to encourage adoption?

According to TBL: "RDF specs need trimming"; things need to be thrown out, otherwise they are difficult to read. RDF can't express all things... it's ready but could use a bit of a clean-up.

TBL: Express a literal as a subject... to be able to say that 3 is the negation of -3.

RDF is ready when it is built into a programming language so we don't have to look for toolkits. I want a programming language that incorporates RDF intrinsically and expands to full SPARQL queries.

A few people have tried to use RDF and hack Python underneath.
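
For anyone curious what that looks like in practice, here is a minimal sketch using Python's rdflib (the data and query are invented for illustration), which gives roughly the "RDF plus full SPARQL in the language" workflow the panel was asking for:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/people/")  # hypothetical namespace

g = Graph()
tim = EX["tim"]
g.add((tim, RDF.type, FOAF.Person))
g.add((tim, FOAF.name, Literal("Tim")))
g.add((tim, FOAF.knows, EX["dan"]))

# Triples behave like ordinary Python data structures...
for person in g.objects(tim, FOAF.knows):
    print("knows:", person)

# ...and the same graph answers full SPARQL queries.
for row in g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name WHERE { ?p a foaf:Person ; foaf:name ?name . }
"""):
    print("name:", row.name)
```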

We need a killer app.. written on top of the graph… and have the data available in RDF.

Q: How do we link up the data with descriptions?

A’s: We need tools to create this easily. How do we get data linked with meaningful labels? For data to be labelled with URIs we need an extra URI space which is community-driven, and we need a wiki for ontologies and concepts...

We need a trusted 3rd party and a global community to define this...

TBL added that the W3C was founded for this.

One of the problems is agreeing on and defining ontologies; the most important thing to worry about is getting the data, and then deciding which ontology we can use to map it.

Database schemas... we can find them all over the place.

Q: What vocabulary, what schema? How do we express the most important concepts?

A’s: People, location, with city as a sub-class... regarding ontologies, do we create a new one or reuse an existing one?

The most important process is getting people to agree on it. Getting agreement on vocabulary is hard work.. have to engage stakeholders...

Metaweb is a startup that exports data into a graph model and builds APIs... Build a semantic layer on top of an RDBMS and then we can ask questions on top of that. Each DB could have its own semantic ontology around it, and we could ask questions of all 500 DBs, not just one...

The power of the predicate: we can create a controlled vocabulary of predicates...

Ontologies will become a commodity: we will have average-level (publicly accessible) ontologies, and then the behind-the-firewall stuff.

Outputting data is separate from attaching it to others'... inferencing engines... part of a bigger conversation... How do we annotate other people's ontologies? We cannot plan this for all cases... create one based on the current RDBMS.

Q: The tools are still not simple enough for the average user, and until we make them simple we won't get the adoption we are talking about. Is the degree to which large content organizations can engage restricted by how mature the tools are?

A’s: Tools such as ClearForest exist to parse data into semantic relationships. Artificial intelligence and natural language processing are the answer, but we are not there yet.

(Audience Applause)

Some Afterthoughts: Exciting things are to come with increasingly open data initiatives combined with semantic labeling, increased user adoption, opportunities for monetization, and a technological infrastructure to create, maintain and parse semantic content. I came away from the panel feeling that we are on the verge of being ‘there’, although as with a lot of technologies it will take a while to iron out all the kinks and for widespread adoption to take place. Although the web has come a long way from the early ‘90s, there is still so much more to come, and it left me wondering why the semantic web didn’t evolve sooner, given its usefulness and power. Ideas, thoughts anyone??