Lately I've been thinking about all of the pieces of data I would like to track if I had a good system. As it stands, though, my data is scattered across a bunch of segregated systems and stored in diverse formats.
What if I had one central system to pool all of that data and manipulate it? I think an object-based data environment that allows me to add metadata to objects would be ideal. Of course, there's still the persistence question (how will this be stored on disk?), but I'm still working on that.
My thought right now is that I can mirror objects in other systems (e.g., my Outlook contacts) while still attaching other metadata those systems don't support (or at least don't support easily).
The metadata and object structure would have to be very extensible, with a core flexible object model. XML provides a good framework for this, but I'm looking for something a little more virtual--expressible in XML of course, but not tied to it.
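Just to make that concrete, here's a rough sketch (in Python, purely illustrative; the TrackedObject name and its methods are made up) of the kind of object-plus-metadata model I'm imagining: every object is just an identity plus an open-ended bag of metadata, and it can be rendered as XML without being defined by XML.

    import xml.etree.ElementTree as ET

    class TrackedObject:
        def __init__(self, kind, **metadata):
            self.kind = kind                  # e.g. "contact", mirroring an Outlook contact
            self.metadata = dict(metadata)    # arbitrary, extensible key/value metadata

        def set(self, key, value):
            self.metadata[key] = value        # new metadata never requires a schema change

        def to_xml(self):
            root = ET.Element("object", {"kind": self.kind})
            for key, value in self.metadata.items():
                child = ET.SubElement(root, key)
                child.text = str(value)
            return ET.tostring(root, encoding="unicode")

    contact = TrackedObject("contact", name="Homer Simpson", email="homer@example.com")
    contact.set("met-at", "Springfield bowling league")    # metadata Outlook doesn't track
    print(contact.to_xml())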
The topic of the Semantic Web entered my thoughts during this time. The easiest way to create the Semantic Web (especially with today's hodge-podge Web) is not to require your average web page producer to learn special markup. Someone else can have that job!
No! The way to create the Semantic Web is to create vertical semantic applications that draw a stable community of users to a common schema by virtue of using the same client software. At, say, 10, 100, 500, or 15,000 users, you have a meaningful data schema for that application space.
Capitalism is great; competition is great. But the Semantic Web pines for standardization in the midst of choice. So a standard for trust in schema mappings must be devised. Trusted mappers sign their mappings, and other parties (robots or real people) trust those schema mappings in their own applications.
This allows applications and robots to "intuit" equivalence between schemas based on a system of trust. Tomorrow, anyone can create a brand-new schema, and within their circle of trust, its semantics can be honored. Establish credibility with a major schema authority, and virtually everyone can trust your schema.
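To make the trust idea a bit more concrete, here's a rough sketch of how transitive trust in schema mappings could work: a mapping declares two schemas equivalent and names the authority that signed it, and I trust the mapping if I trust the signer, either directly or through a chain of endorsements from someone I already trust. Everything here (the names, the shapes of the structures) is invented for illustration.

    from dataclasses import dataclass

    @dataclass
    class SchemaMapping:
        source_schema: str
        target_schema: str
        signed_by: str      # the mapping authority that vouches for this equivalence

    def is_trusted(mapping, trusted_authorities, endorsements):
        """Trust the mapping if its signer is already trusted, or is endorsed
        (directly or transitively) by an authority we already trust."""
        trusted = set(trusted_authorities)
        changed = True
        while changed:      # expand the circle of trust until it stops growing
            changed = False
            for endorser, endorsed in endorsements:
                if endorser in trusted and endorsed not in trusted:
                    trusted.add(endorsed)
                    changed = True
        return mapping.signed_by in trusted

    mapping = SchemaMapping("SimpsonGenealogy", "NSFGenealogyLexicon", signed_by="alice")
    # I trust "bob"; "bob" endorses "alice", so alice's mappings become trustworthy too.
    print(is_trusted(mapping, {"bob"}, [("bob", "alice")]))   # True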
So here's the sequence:
A vertical application leads to
A community of users who agree on
A common data schema (probably XML), which becomes
A schema trusted by a schema mapping authority, who publishes
A public schema mapping, which is
Trusted by the "mapping downstream"
So an exchange might go like this:
Client: Hey, Semantic Web source! Do you have any data that adheres to the Simpson Genealogy schema?
SW: Well, not exactly, but I do see that one of my trusted schema mapping authorities believes that the National Science Foundation's Genealogy Lexicon contains some mappable elements in this location. Would you like to try there?
Client: Why yes, that would be fine. Oh, and since you trust the National Science Foundation's Genealogy Lexicon as mappable to the Simpson Genealogy schema, I will add that to my list of trusted mappings because I trust you. I will also include the NSFGL in my list of acceptable result formats for future requests. Thanks again.
Granted, this "conversation" would really be protocol talk over the wire, between client and server, somewhere out in cyberspace.
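Here's a toy version of what that protocol talk might boil down to: the source doesn't hold Simpson Genealogy data itself, but it answers with a trusted mapping to a schema it does know about, and the client adds that mapping to its own trusted list for next time. The schema names, locations, and query/response shapes are all made up.

    def query_source(source, wanted_schema):
        # Exact match first; otherwise see if a trusted mapping leads to data we do hold.
        if wanted_schema in source["data"]:
            return {"schema": wanted_schema, "location": source["data"][wanted_schema]}
        for mapping in source["trusted_mappings"]:
            if mapping["from"] == wanted_schema and mapping["to"] in source["data"]:
                return {"schema": mapping["to"],
                        "location": source["data"][mapping["to"]],
                        "via_mapping": mapping}
        return None

    source = {
        "data": {"NSFGenealogyLexicon": "http://example.org/nsf-genealogy"},
        "trusted_mappings": [{"from": "SimpsonGenealogy", "to": "NSFGenealogyLexicon"}],
    }

    client_trusted_mappings = []
    answer = query_source(source, "SimpsonGenealogy")
    if answer and "via_mapping" in answer:
        # Because the client trusts this source, it adopts the mapping the source
        # vouched for and will accept NSFGenealogyLexicon results in future requests.
        client_trusted_mappings.append(answer["via_mapping"])
    print(answer)
    print(client_trusted_mappings)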
And another thing. In the world of schema trust, we won't have a mega-market giant like Verisign as the main schema mapping trust authority. Well, maybe we will, just de facto, but this will be very grassroots. I myself could choose to be a schema trust authority. Anyone who trusts me and authenticates my schemas (using PKI, of course) can choose to also trust the mappings that I author or endorse.
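Mechanically, setting up shop as a schema trust authority could be as simple as signing the mapping document and letting anyone who holds my public key verify it. A quick sketch, assuming the third-party Python cryptography package; key distribution and revocation are waved away entirely here.

    from cryptography.hazmat.primitives.asymmetric import ed25519
    from cryptography.exceptions import InvalidSignature

    mapping_doc = b'<mapping from="SimpsonGenealogy" to="NSFGenealogyLexicon"/>'

    authority_key = ed25519.Ed25519PrivateKey.generate()   # my authority's private key
    signature = authority_key.sign(mapping_doc)             # publish the doc plus this signature

    # Anyone who trusts me (i.e. has my public key) can check the mapping is really mine.
    public_key = authority_key.public_key()
    try:
        public_key.verify(signature, mapping_doc)
        print("mapping verified; trust it if you trust me")
    except InvalidSignature:
        print("mapping was tampered with; don't trust it")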
So, my vision of the Semantic Web is not one gigantic standardized pool of data that follows good markup because everyone is speaking exactly the official vertical dialect for an application space. Rather, new application spaces pop up all the time. A software developer chooses to play nice by adhering to some standards that make the data created by the application usable by the Semantic Web. Vertical application communities evolve. Circles of trust form. Somebody releases an "open source" schema equivalence mapping. Everyone else trusts the mapping. The Semantic Web bots just skip along from stone to stone across to the other side of the river and back.
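And the stone-skipping itself is really just path-finding over published mappings: given a pile of trusted schema equivalences, a bot can walk from the schema it understands to the schema the data actually uses. A breadth-first sketch, with made-up schema names:

    from collections import deque

    def find_mapping_path(start_schema, goal_schema, mappings):
        """mappings: (schema_a, schema_b) equivalences, usable in either direction."""
        neighbors = {}
        for a, b in mappings:
            neighbors.setdefault(a, []).append(b)
            neighbors.setdefault(b, []).append(a)
        queue = deque([[start_schema]])
        seen = {start_schema}
        while queue:
            path = queue.popleft()
            if path[-1] == goal_schema:
                return path                   # the stones the bot hops across
            for nxt in neighbors.get(path[-1], []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None

    mappings = [("SimpsonGenealogy", "NSFGenealogyLexicon"),
                ("NSFGenealogyLexicon", "FamilyTreeML")]
    print(find_mapping_path("SimpsonGenealogy", "FamilyTreeML", mappings))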
OK, I'm definitely tired right now, and I know I'm rambling, but you'll have to forgive me. At least I'm posting again, right?
I have given some thought to how to prevent SW spam. Think about it: if all it takes is a special set of tags for a certain knowledge space, anyone can plug them in. You remember the META keyword tags, right? So now smutmongers can specifically target you when you're looking for rare bird species.
So the whole "Google magic" might be needed in the SW space as well. Well, Tim (Berners-Lee), that's my two cents for today.