Architectural Building Blocks of Netflix Cloud Platform

Cloud Commandments

The “Cloud” has been the buzzword and flavor of the season in the Silicon Valley tech community. Netflix (where I currently manage the Cloud Platform (Core Infrastructure) team) has been a pioneer and a thought leader among web scale companies in embracing the “Cloud”.

Netflix’s journey into the “Cloud World” started around 2009. The provider of choice was AWS (Amazon EC2). I was very fortunate to be part of the team that worked on the proof of concept. Netflix now relies on the Amazon Public Cloud for almost all its needs, bar a few specific operations that are still hosted in Netflix-managed datacenters.

On this journey, we built the Netflix Cloud Platform, brick by brick, and we learned a few lessons. We definitely benefited from various existing Open Source offerings and the knowledge base on the web. In the spirit of giving back to the community, we now offer a rapidly growing set of these libraries and frameworks as Open Source on GitHub (http://github.com/Netflix).

In addition, many of my colleagues and I participate in meetups and conferences and share the knowledge gained by way of blog posts at http://techblog.netflix.com.

One such talk that I presented was at GitPro. GitPro offers a great opportunity for tech professionals to network and learn from each other.

The slides are available at Slideshare.

The talk mainly covered the following topics:

  • What is a Cloud
  • Netflix’s journey to the Cloud
  • Lessons Learned (Commandments for a Web Scale Cloud Deployment)
  • Netflix Cloud Platform and Open Source offerings
  • Tips and Techniques for the Cloud

If you want to explore this journey and are considering embarking on one of your own, there are a lot of resources that can guide you. Please visit http://techblog.netflix.com for more information. For a broader perspective from other experiences, Quora stands out as a great forum to discuss and learn about various cloud computing challenges.

    Enjoy the journey!

    Posted on July 12th 2012 by stonse

    Filed under Web Architecture, NoSQL, Cloud Computing | No Comments »

    Of Sheep and Wolves

    Sheep and Wolves
    As anyone working in a team setting will attest, the success or failure of a project/task depends on how well the collective team performs.
    This is true for a Startup, a product in a big established company or even a minor league sports team.

    This makes the composition of the team a crucial task and of great import. In fact most leaders/executives are judged by the quality of the team that they build and nurture.

    Recruiting thus makes for a very challenging and important task for the team/manager.

    Which brings me to the subject line of this topic. Sheep and Wolves!

    In a previous company that I worked at, we were in the middle of a recruiting binge. As was the norm in that group, we set out an interview panel consisting of key members from the team. There was a fair amount of debate on what was the right technique to interview candidates and how to rate them.
    One particular member of my team, who at the time played the role of an architect, would come out of an interview and declare “She is a wolf.” Or “He is a sheep”. That’s it. He didn’t believe in a 1 to 5 star rating or an elaborate discourse on the technical merits of the candidate.

    So we cornered him and asked him to explain his non-standard rating system. After all, he was a technical architect and really, his job should be to evaluate the technical capabilities of the candidate in the area of algorithms, data structures etc. What’s with this “She is a Wolf!” line of interviewing?
    He then explained to me his rationale. Yes, he was probing the technical merits of the candidate – not asking psychological or art related questions – but his inclination was to probe whether the candidate would listen to his instructions and how the candidate would engage in a debate when he threw in curve balls in the technical task/question.

    For example, he might say “Never use the synchronized keyword in Java … we never let anyone in our team use it” and wait for the response of the candidate.
    Depending on how the candidate handled this – just accepted it meekly, showed indignation towards the seemingly unmerited dictate or launched into a passionate debate on the locking semantics available in Java – he would rate the candidate to be a “wolf” or a “sheep”.

    Now, why is this important? It turns out that if you just have a “rockstar” team, full of prima donna rockstar members, the team will not function too well. In the real world most projects/teams have challenging problems requiring deep thinking and solutions, but very often it’s also true that there will be a whole bunch of mundane tasks and work required to launch the product.

    A successful team contains folks that are willing to or maybe limited to performing these tasks. A team full of rockstars will fight it out to get the meaty pieces – and no one will be handling the crucial small tasks of dotting the i’s and crossing the t’s required.

    Essentially, a team needs both wolves and sheep.

    Recently, there has been a lot of focus on hiring in the Valley – where the economy does not seem to have been engulfed in a depression – at least from a tech point of view.

    While building a team, it’s necessary to pay attention to many different angles and probe many different requirements. For example:

    • While it’s true that a team full of people with similar backgrounds and history will very likely gel together and work well, it’s critical to have someone in the team who has a different mindset and sees things differently. This prevents a “herd mentality” and aids the project. Sometimes it’s good to have a wolf in sheep’s clothing :-)
    • Some companies believe in only hiring very senior people – the so-called 10x folks (http://www.quora.com/10X-Engineers?q=experienced+10x). Are these companies better off by potentially also hiring a few “normal” or “average” folks in every team?

    Say you are in charge of hiring a software engineering team of 5 to 8 engineers. Further, let’s say this was for building a web e-commerce site. Would you hire all engineers with the same skillset and expect them to contribute equally in all areas? For example, would you hire all generalist engineers who do database programming, server side as well as front-end programming? Or would you hire specialists in each field? (Before you say “it’s obvious, hire specialists”, I do know a few teams that have gone the “all generalist” way). Which model is more efficient and has a higher chance of success?

    Note that in real life most activities that require a “team” are composed of members that have varied skillsets. For example, to build a baseball team, one does not go about hiring all first basemen. You need pitchers, catchers, pinch hitters … To build a team to man a restaurant’s kitchen, you need chefs, sous chefs, supplies in-charge etc.

    Different companies have different hiring techniques and philosophies behind building teams. Netflix hiring is unique and interesting. (NOTE: I currently work at Netflix. The thoughts in this blog are entirely mine and may or may not reflect the company’s thinking) It believes in specifically hiring “rockstars” and has been very successful thus far. (http://account.netflix.com/Jobs)

    There are of course many more angles to ponder.

    Interested in this subject? Here are some nice references and links to discussions in this area:

    Posted on January 3rd 2012 by stonse

    Filed under Engineering Management | 1 Comment »

    The World of NoSQL

    Silicon Valley was abuzz with Cloud Computing as the latest and coolest trend just a while ago … and now, after coming to grips with “on demand instances and on demand services”, the nerdy crowd has moved on to the good old topic of infinitely scalable data persistence.

    The world of data persistence has been through the Mainframe era, and is just now passing the RDBMS age, dipping its toe into the brave new world of “NoSQL”. What is it? Well, for the most part it’s about

    1. Web Scale data - by this we mean millions of users, terabytes of data - a common scenario in most “Social/Web 2.0” companies
    2. A desire to not require a “pre-defined” schema
    3. Horizontally scalable, distributed
    4. Subscribes to the CAP Theorem

    There is now a very bubbly cottage industry of “NoSQL” offerings - a list of which can be found here at http://nosql-database.org/

    At Netflix (where I currently work), we have been at the forefront of the “Cloud Revolution” as a high profile, web scale consumer. It’s now public knowledge (and hence I can blog about it :-)) that Netflix uses the Amazon EC2 infrastructure. Netflix also has to deal with

    1. Heterogeneous Infrastructure (Netflix-leased Data Center and Amazon EC2 in multiple regions)
    2. Heterogeneous Databases (Oracle, MySQL, SimpleDB, Cassandra, MongoDB and HBase to name a few) and how we (Netflix) straddle them
    3. Legacy codebases and integration, and an ever evolving new codebase and deployments that are round the clock (with a single button deployment of 100s of instances :-) )
    4. Rapidly growing subscribers and employees (yes, Netflix is hiring)

    All this makes for a challenging and interesting work day. My teammate, Siddharth Anand, gave a very well received talk at a recent Silicon Valley Meetup about how we as a team and as a company rose to this challenge. It’s an interesting talk for those who are thinking of moving to the “Cloud” and stepping into the Brave New World of NoSQL.

    (The Netflix part of the talk starts at around 0:10 mins into the session …)

    As far as the actual debate on the choice of NoSQL offerings goes, I am in the camp of “Right Tool for the Right Job”.

    One single type of persistence store (say Oracle or MySQL) is the norm at most companies. Any kind of data is stored in this Relational Data Store whether or not the actual data requires ACID or relational semantics. A subset of these companies actually store XML blobs in MySQL columns, effectively nullifying most of the benefits.
    What the NoSQL set of databases offers is a plethora of persistence choices that can be tailored for specific use cases.

    Just like when you use a Data Structure you would use a HashMap, a Set, a List, an Array etc. based on what your use case is (in terms of size/space constraints, performance etc.), you should use the right NoSQL offering based on your particular use case.
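
    To make the data structure half of that analogy concrete, here is a minimal Java sketch. The class and method names are made up for illustration and the sample data is invented; it only shows how the access pattern, not habit, drives the choice of collection, which is the same reasoning I would apply to picking a data store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: these names are hypothetical, not from any Netflix library.
final class RightToolDemo {

    // Lookup by key: a HashMap gives expected constant-time access.
    static Map<String, String> titlesById() {
        Map<String, String> titles = new HashMap<>();
        titles.put("m1", "Movie One");
        titles.put("m2", "Movie Two");
        return titles;
    }

    // Uniqueness plus sorted order: a TreeSet drops duplicates and keeps elements sorted.
    static Set<String> uniqueGenres(List<String> genres) {
        return new TreeSet<>(genres);
    }

    // Ordered history with duplicates allowed: an ArrayList preserves insertion order.
    static List<String> viewingHistory() {
        List<String> history = new ArrayList<>();
        history.add("m1");
        history.add("m2");
        history.add("m1"); // watching the same title twice is legitimate
        return history;
    }

    public static void main(String[] args) {
        System.out.println(titlesById().get("m2"));
        System.out.println(uniqueGenres(Arrays.asList("drama", "comedy", "drama")));
        System.out.println(viewingHistory());
    }
}
```

    Forcing all three use cases into a single collection type would work, but each would pay for properties it does not need. That is roughly what happens when every kind of data lands in one relational store.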

    Just how do you pick the right candidate? Well, that’s fodder for a rather lengthy post, but there are enough benchmarks, comparisons and offerings out there - just Google/Bing it.

    Posted on March 2nd 2011 by stonse

    Filed under Web Architecture, NoSQL, Cloud Computing | No Comments »

    Head in the Clouds

    Cloud Computing

    It seems like the next buzzword in the industry after Web 2.0 slid into the horizon as last year’s prom queen. As with Web 2.0, there are a lot of definitions and assumptions floating around on what a Cloud is exactly. Is it related to Grid Computing? Is it old wine in a new bottle? At this point the hype far outweighs the actual offerings out there. Oracle’s Larry Ellison famously dissed Cloud Computing and likened it to the Fashion industry.
    Let’s dive into this new buzzword and see what it means to the software industry.

    What is a Cloud?

    Of course we all know what a cloud is. But what essentially is Cloud Computing?
    A quick search will yield you plenty of definitions, some succinct and some wildly abstract. So what I offer here is essentially a consensus definition. A Cloud is like the bubble/cloud we engineers draw on a board when we design a complex system. We roughly know the services this “cloud” offers - but are unwilling to delve into or care about the actual hardware/technology that this “cloud” runs on. Hence in this view, Cloud Computing is a concept in which virtual/utility computers/resources are sprung up on demand to perform services required to run an application.
    Of course, other than just computers/resources, you can think of data as being in the “cloud” as well (as in, you don’t know/care how this data is stored - as long as you can get it/store it back on demand).

    Hence, by that definition,
    Cloud Computing = [technology(Infrastructure-ops + Apps)] + [Information(in-cloud + in-between)]

    If you think that the above definition is too abstract, here are a couple of examples.
    Say you want to calculate the number of hyperlinks that are in existence on the web. You would normally need a few machines to crawl the web and perform the calculation. However, this just happens to be a research/fun application and you don’t want to purchase the resources required to perform this calculation. With Cloud computing, you “lease” a set of machines on demand and run your crawlers and aggregators there. And once you have a satisfactory result, you just terminate this lease. For example, using http://80legs.com, that’s exactly what you can do. It runs Java code on hundreds, thousands, or tens of thousands of distributed computers (they use Plura Processing to scale it up to about 50,000 distributed nodes).
    On the other hand, you might have a real web application that needs to be hosted. But the load/traffic on this site is not constant. At certain times, based on an event, the traffic surges - and at other times it’s a trickle. Cloud Computing using, say, Amazon EC2 comes to your rescue here. You essentially create an “Amazon Machine Image” (AMI) - with the code/software required to run your application. You store this “image” on a storage service - such as Amazon S3. Then, you just launch instances from this image as and when you require them. You can have 0 or 1 instance of your application running at any given time and when the traffic flow increases, you deploy a few more on the go. You pay by the amount of time these machines were running and the bandwidth consumed - well, roughly speaking.
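
    The “pay for what you run” arithmetic can be sketched in a few lines of Java. Everything numeric here is an assumption for illustration: the per-instance capacity, the hourly price (not a real AWS rate) and the traffic pattern are invented, and bandwidth charges are ignored.

```java
import java.util.Arrays;

// A back-of-the-envelope sketch; capacity and price figures are invented for illustration.
final class ElasticCostSketch {

    // Instances needed to serve a given load, rounding up.
    static int instancesNeeded(int requestsPerSecond, int capacityPerInstance) {
        if (requestsPerSecond <= 0) {
            return 0; // no traffic, nothing running
        }
        return (requestsPerSecond + capacityPerInstance - 1) / capacityPerInstance;
    }

    // Total instance-hours across a period, given the load observed in each hour.
    static int instanceHours(int[] hourlyLoad, int capacityPerInstance) {
        int total = 0;
        for (int load : hourlyLoad) {
            total += instancesNeeded(load, capacityPerInstance);
        }
        return total;
    }

    public static void main(String[] args) {
        int capacity = 100;         // assumed requests/second one instance can handle
        double pricePerHour = 0.10; // assumed price, not a real AWS rate
        int[] load = new int[24];
        Arrays.fill(load, 50);      // a trickle for most of the day
        load[20] = 950;             // an event-driven surge in one hour
        int hours = instanceHours(load, capacity);
        System.out.printf("instance-hours=%d cost=%.2f%n", hours, hours * pricePerHour);
    }
}
```

    With these made-up numbers the surge hour needs ten instances while every other hour needs one, so you are billed for 33 instance-hours instead of the 240 you would need if you kept peak capacity running all day.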

    Commercial offerings on Cloud computing are still at an early stage and the players involved are still trying to perfect the setup. But it is something that is definitely catching on and hence deserves to be paid close attention to.

    Who are the players?

    Although Grid Computing has been available for a while now, the buzz for Cloud computing just started recently when Web 2.0 companies started hosting their applications in the Cloud. Amazon EC2 currently seems to be the market leader in terms of both buzz and maturity. Other players include Google AppEngine, Sun Microsystems, Microsoft etc. If you are looking for a brief comparison of their offerings, this page could help -> http://weblog.infoworld.com/tcdaily/archives/2008/07/video_tours_of.html?source=fssr.

    Who are the potential consumers?

    • Startups

    Using on-demand cloud hosting is a good idea for startups.
    For one, you don’t have to invest in hardware until you know what your scalability requirements are. And yes, it’s different from leasing a cage at a datacenter where you have total control, but it’s so much better on your wallet.

    • Small Business

    Well, at this point you are more “stable” and know what your load and hosting requirements are. It still might be valuable to host your application in the “cloud”. But your mileage may vary vis-a-vis leasing a cage. Note that this is mostly in terms of the cloud (such as EC2) as a hosting platform.

    • Mid-Large Companies

    For large enterprises that already have an established IT organization, “Cloud Computing” still provides value in 2 circumstances:
    1) As an “Overflow Buffer”: Enterprises no longer need to over-provision their data center for the peak load.
    They just need to install equipment in-house for the average load. When peak load arrives, they automatically provision additional resources from the cloud and redirect the excess load to those extra resources. This means huge equipment cost savings.
    2) As an “Experimental Playground”: Enterprises need a fast turnaround to test out new ideas and customer acceptance. But it is hard to justify equipment purchase before the idea is proven. So they can deploy the new project in the cloud to test out the acceptance. If it doesn’t work, they just tear the project down and no equipment is wasted. If the idea works, they start to purchase in-house equipment and migrate the application from the cloud to the in-house data center.
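
    The “Overflow Buffer” case boils down to a simple split of the incoming load. Here is a minimal Java sketch of that split; the class name and all the numbers are hypothetical, and a real setup would also need provisioning and routing logic that is not shown.

```java
// Sketch of the "overflow buffer" idea; all numbers are hypothetical.
final class OverflowBuffer {

    // Load served by in-house equipment: everything up to its capacity.
    static int servedInHouse(int load, int inHouseCapacity) {
        return Math.min(load, inHouseCapacity);
    }

    // Load redirected to the cloud: only the part exceeding in-house capacity.
    static int redirectedToCloud(int load, int inHouseCapacity) {
        return Math.max(0, load - inHouseCapacity);
    }

    public static void main(String[] args) {
        int inHouseCapacity = 1000; // sized for average load, not peak
        for (int load : new int[] {600, 1000, 2500}) {
            System.out.println("load=" + load
                    + " in-house=" + servedInHouse(load, inHouseCapacity)
                    + " cloud=" + redirectedToCloud(load, inHouseCapacity));
        }
    }
}
```

    Only the third load in the loop spills over, so cloud resources are paid for only during the peak.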

    For these large enterprises, the ultimate operating environment can be a mix of “data-center” and “public cloud”. One major challenge is how to migrate their application components seamlessly and efficiently between the “private” and “public” facilities without sacrificing security, reliability and performance. This is an area that Amazon/Google/Microsoft are not motivated to solve (as Amazon/Google don’t like your datacenter portion and Microsoft won’t take care of your enterprise Java apps). This is an area in which startups can step in.

    • Financial/Private/Secure Companies

    For these institutions, data security and handling are of paramount importance. Amazon EC2, for instance, does not provide SLAs except on a very few AWS APIs. For example, what’s the guarantee on data stored in S3? What is the round trip guarantee for data access from SimpleDB? There are no response time guarantees. This is going to be a challenge. It might make more sense for this sector to invest in “private clouds”.
    Gartner has predicted that “Private cloud networks are the future of corporate IT” at http://www.networkworld.com/news/2008/111208-private-cloud-networks.html

    Now that we have gone through the definition of the cloud, the players and who it can benefit, it’s clear that Cloud Computing is here to stay. Which is to say that there is going to be a definite impact on the Software Industry.
    For example:

    • Licensing model - how does one deal with a “per cpu” license? In the cloud, resources are booted up on demand - sometimes in 100s. What about Open Source licenses?
    • Startup opportunities.
      • There will be opportunity for app/service vendors to build standardization, virtualization and services on top of existing cloud infrastructure.
      • Create Hosting Software Stacks (e.g. AMIs)

    In my next post, I would like to explore what Cloud Computing means for most of us when we architect web applications.

    Posted on January 12th 2009 by stonse

    Filed under Web Architecture | No Comments »

    Structure and Organization

    Org Chart

    If you have ever worked in a fairly large corporate office, you would no doubt have attended numerous “re-org” meetings. They all seem to have a standard theme: “The previous structure/org chart was flawed (besides it was done by my predecessor who is a moron **cough cough**) and hence this brand new structure.”

    A matrix based org structure gives way to a divlet based one which is shoved away in favor of a “line-discipline” organization etc.  ad infinitum …

    “Organization” is also relevant when it comes to seating … do you make all your engineers sit together regardless of what product they work on, and all your QA Engineers in a different area, OR do you make managers, engineers and QA sit together based on what product they are working on? … (ugh, what if I’m working on multiple products?)

    The one that is of interest to me currently is organization and structure for a project developed using Java. With the advent of “layering” and “separation of concerns”, most projects have a controller, model and view layer.

    Now, this brings its own question in terms of package structures. Do you club all your “controllers” together, all your “models” together etc., each in their own package, OR do you put them in a package based on the “vertical/feature” that they are serving - e.g. “shopping”, “reviews” etc.?

    An article at http://www.javapractices.com/topic/TopicAction.do?Id=205 talks about “Package by Feature” v/s “Package by Layer” — and for various reasons prefers the “package by feature” paradigm.

    Some excerpts from that article are posted below:

    Package-by-feature separates the application according to its various features. The package names correspond to important, high-level aspects of the problem domain. A drug prescription application, for example, might have these packages :

    • com.app.doctor
    • com.app.drug
    • com.app.patient
    • and so on…

    Within each package are all (or most) items related to that particular feature : action, model, and data access object classes, for example. (There may be more than one of each class, or one might be missing, according to the needs of the problem).

    Package-by-layer instead reflects the various application layers in the highest level packages, for example :

    • com.app.action
    • com.app.model
    • com.app.dao
    • com.app.util

    Of course, to me, the structure you choose depends on who you are and what your role is. Maybe from a UI engineer’s point of view, it’s better to keep ALL UI related code in one folder. But from a “Vertical Leader” point of view, you would want all code, regardless of what part of MVC it falls under, to belong to one package, i.e. “package by feature”.

    Of course, it’s much more difficult to arrive at a solution when one is dealing with the physical world - as is the case in the “seating” problem above. Software, however, is flexible and amenable towards a solution. Is there a way we can solve the needs of BOTH the parties here?

    Let’s think about the same issue in terms of your favorite Web-based Email product. Most traditional offerings such as Yahoo Mail, Hotmail, AOL Webmail etc. have gone with the old “folder” based approach to organizing emails. This raises a question - where do I store emails that were sent by my wife but were about house bills - under “family” or under “finance”?

    Gmail however has taken the approach of “tag” based organization. Essentially ALL your emails are in the same flat structure (conceptually) and individual mails can be assigned multiple “tags”.
    Based on what you are trying to accomplish at the time, you can arrange and filter your emails based on the tag.

    Is there a way we can introduce the concept of “tag” based “packaging” in Java? For example:

    // @Tags is a hypothetical annotation taking a list of tag names
    @Tags({"controller", "shopping"})
    public class MyShoppingController { }

    @Tags({"util", "shopping", "general", "review"})
    public class MyProductUtil { }

    @Tags({"controller", "review"})
    public class MyReviewController { }

    As shown above, if one were to use Annotations and store ALL Java files in the same folder, but use Tag Annotations to define which “conceptual package” they belong to, can we solve this issue? Of course, we need to think about the Java Language Specification and its bearing on package based scope and access privileges etc., not to mention other issues such as Namespaces and ClassLoaders. But it’s something to think about - well, some day when I really have time :-) (yea, yea, purists of Java - it’s just a wild concept, don’t fret over the actual semantics :-))
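
    For the curious, here is a rough sketch of how far plain Java annotations and reflection can take the idea. Everything here is an assumption on my part: @Tags is a hypothetical annotation I define below (Java has no such built-in), TagBrowser is an invented helper, and the classes to search are passed in by hand because this sketch does not scan the classpath. It does nothing about package scope or access privileges.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: filters a hand-supplied list of classes by tag.
final class TagBrowser {

    // Returns the simple names of the candidate classes carrying the given tag.
    static List<String> withTag(String tag, Class<?>... candidates) {
        List<String> matches = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            Tags tags = candidate.getAnnotation(Tags.class);
            if (tags != null && Arrays.asList(tags.value()).contains(tag)) {
                matches.add(candidate.getSimpleName());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Class<?>[] all = {
            MyShoppingController.class, MyProductUtil.class, MyReviewController.class
        };
        System.out.println(withTag("shopping", all));
        System.out.println(withTag("controller", all));
    }
}

// Hypothetical annotation: not part of the JDK.
@Retention(RetentionPolicy.RUNTIME) // keep the tags visible to reflection
@Target(ElementType.TYPE)
@interface Tags {
    String[] value();
}

@Tags({"controller", "shopping"})
class MyShoppingController { }

@Tags({"util", "shopping", "general", "review"})
class MyProductUtil { }

@Tags({"controller", "review"})
class MyReviewController { }
```

    This gives you the Gmail-style “filter by tag” view over the code, which an IDE or build tool could present as virtual folders. It is only a browsing aid though; the compiler still sees one flat package.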

    Posted on May 16th 2008 by stonse

    Filed under Web Architecture | 1 Comment »

    MashupCamp 2008


    Mashup Camp 2008 just concluded. Mashups seem like soooo last season. If one were to sort recent web technologies chronologically or in terms of buzz worthiness, it would go somewhat like AJAX, Web 2.0, Mashups, Social Networking, (Bubble 2.0??), ….

    Sure, there were a decent number of participants and the usual suspects (AOL, M$, Y!) had their usual booths and swag in addition to their evangelization efforts for their respective Open APIs, but the buzz is clearly over for mashups … unless of course you are mashing up various social networking data :-) (FriendsFeedsFeedAggregator???)

    Traditionally, “mashups” were about aggregating data from various “sources” and overlaying them to produce a usable application. Also traditionally, the data were mostly feeds of pictures, news etc. and most were overlaid on a map. Location based mashups are clearly useful and here to stay.

    iPhoneLocator (http://www.facebook.com/apps/application.php?id=10355565129) of course tries to mash up all possible buzz worthy trends today, namely iPhone, Mashups, Maps, and Facebook! Yahoo announced FireEagle (http://fireeagle.yahoo.net/) (“Fire Eagle is the secure and stylish way to share your location with sites and services online while giving you unprecedented control over your data and privacy.”)

    More recently, mashups have forayed into different data types. For example, voice based mashups, various types of data embedded in video etc. are making some headway. Ribbit (http://www.ribbit.com/) is at the forefront of the Voice API revolution. Tim Burks demoed a different kind of mashup - one that is primarily done in the programming layer. Basically, Tim champions “Nu”, an Object Oriented Programming Language that can “mashup” different programming languages such as Lisp, Ruby and Objective-C. More on this at http://programming.nu/.

    Of course, now with the fad towards various Social Networking sites, there has been a lot of focus on OpenSocial (http://code.google.com/apis/opensocial/) and Shindig.

    The trend clearly is towards “Open”ing of data. And of course AOL has now made AIM “Open” as well. Kapow kind of takes this openness to its literal extreme - by providing means to scrape data that is out in the “open” - clearly something that was not so kosher thus far (and still might not be all that legal).

    All in all, there is plenty of data and means to mashup out there to keep one interested in the area with enough to play around with and tinker with for quite a while.


    Posted on March 21st 2008 by stonse

    Filed under mashup, Social Networking Applications | No Comments »

    Papers and Sites on Personalization and Information Retrieval

    With the advent of web 2.0 and the buzz index of “social networking” going through the roof, a lot of people are paying attention to the various methods of “understanding” the user and means of “tailoring” content to such users.

    Collaborative Filtering

    Since the subject is so vast, here are some of the links that have served me well over the past few months since I got involved with these subjects at work (http://my.aol.com/page/mgnet).

    Posted on January 29th 2008 by stonse

    Filed under Personalization | No Comments »

    The World of Mobile Apps

    After the “Web 2.0” revolution, geeks and marketing nerds needed a new frontier to be buzzed about and march towards. Enter Web 3.0 [http://en.wikipedia.org/wiki/Web_3].

    One of the trends that’s been gathering traction is the usage of Cell Phones (Mobiles for the rest of the English speaking world) in place of the PC to conduct day-to-day activities. With the launch of the iPhone and the announcement of Google Android, a lot of buzz has been generated in this area.

    Just like the Web 2.0 revolution started a trend towards social networking sites and sharing, the Mobile revolution will make it much easier to participate in such activities with your Cell Phone. However, unlike the fairly mature and “open standards” based Web, the Mobile world is a mish-mash of pseudo standards, proprietary networks and “pay per use” nightmares. This, combined with the difficulty involved in implementing an application that can be used by a critical mass of cell phone users, has been the limiting factor in the march towards a mobile world with first class apps.

    Yahoo Mobile (http://m.yahoo.com) is a clear leader in the mobile portal space.

    When it comes to social networking and “sharing” there are a series of startups vying to be the next talked about application.

    One of them, CellFish, announced a “free” web service for sharing multimedia files with their “Add To Phone” offering.

    Mobile Sharing (CellFish)

    Users get an SMS message on their phone with a link to a “locker” site which hosts the multimedia asset that was “shared”.

    For e.g. you can send the Defrag picture above using this button ->

    Some other companies that have similar offerings are

    Mobile apps are still in their infancy (at least in the States) as far as mass acceptance and usage are concerned, but there is no denying that they are the future.

    Posted on November 29th 2007 by stonse

    Filed under Devices, Mobile | No Comments »

    DeFrag Conference at Denver

    Defrag Billboard

    Defrag Conference (http://defragcon.com) was held in Denver with the goal of getting thinkers, users and innovators in the space of Knowledge Management and Social Networking to discuss and brainstorm ideas, issues and trends in this space.

    Every conference has its own characteristics - this one

    • Seems to have had an audience from both the Enterprise/Academic world (British Telecom, UC Berkeley, Siderean etc.) and the Consumer World (Medium, Yahoo, Google, AOL, FeedHub etc.)
    • There were about 150 to 200 attendees
    • The format was a mixture of single speaker sessions, moderated panel talks and free/open discussions on a topic of attendee’s choice.

    Structured Data and Attention

    Alex Iskold talked about “Structured Data and Attention” Key points were

    • Users’ activities (reviewing, tagging, rating etc.) are currently stored in silos. There is no easy way of sharing or migrating this attention data across sites.
    • e.g. A user reviews a book he purchased on Amazon. Amazon benefits from this review: it uses it to attract more users and offer recommendations. These recommendations, however, only work within Amazon.
    • Since attention data is only stored/accessed per site, it poses two problems.
      • It becomes difficult to apply it across other domains - e.g. a site like Netflix could potentially have used the user’s review of the book to recommend movies based on his feelings
      • It makes it difficult to create an aggregated social/collaborative intelligence as the data is spread out and there is no easy way of obtaining/using them.

    Hence he calls for a standard to be developed around this issue in the following areas.

    1. Attention Data Interchange Interfaces/APIs
    2. Attention Data Storage on a Trusted “Third” Party site in a well defined structured format.

    Questions asked: What about APML (http://www.apml.org)? Alex is appreciative of the efforts put into it - it’s a good start; however, he thinks it’s too generically structured (tags). He favors more rigidly defined structures e.g. Books, Shopping Items, Movies etc.

    Attention Data and User Control

    A couple of sessions (featuring Doc Searls, Esther Dyson etc.) talked about the emerging Social Network based applications and the real concerns/issues it brings forth.

    Esther pointed out that a lot of sites are collecting Attention Data - but most are NOT sharing what they collect/disseminate with the user. Esther prefers that a strong “Privacy/Rights” structure be worked around “Attention Data”. Users need to feel comfortable using a service and they trust a service when they can control their “Profile”.

    Doc Searls spoke on “Customer Reach v/s Vendor Grasp”. He lamented the lack/reduction of customer rights (e.g. Verizon’s 10,000 words Service Agreement). What if Consumers stood up for their rights and came up with their own “agreement” and terms/conditions with each of these “Vendors”? Doc is heading a project called Project VRM (http://cyber.law.harvard.edu/projectvrm/Main_Page) which is working towards providing a framework for “Vendor Relationship Management”

    Next Level Discovery/Search Panel

    Moderated by Bradley Horowitz from Yahoo, with participants from Factiva.com, Krugle.com (Steve), Jabber etc.

    The discussion centered on Search queries and results. Bradley started it off with the question “Does Search today suck?” and each of the panelists provided their take on it. The consensus seemed to be that Search has come a long way - but it still has a long way to go, especially in the Enterprise world and in the non-English-speaking world.

    Today’s main search engines are “Keyword Search” based. These don’t always work. Sometimes people would rather use Natural Language based queries, but this method has seen very limited success. Some research shows that finding answers by asking questions (natural language) is, well, “natural”, whereas using keywords and drilling down/honing in on the right result is “learned”.

    Bradley described how Yahoo! came up with Yahoo! Answers to address this problem space. The reasons: people like asking questions and obtaining answers, and Korean users (Korea is where Answers conceptually originated) were somewhat at a disadvantage due to the lack of a significant index of Korean data.

    If querying data was one aspect, consuming the search results was the other end of the problem. Yahoo!, Google et al. have so far stuck to displaying ranked results as a list of links (an Ask.com commercial mocks this lack of a “feature”). Then there are models such as the one eBay Express is experimenting with at http://www.express.ebay.com/ that attempt to provide a “Category/Filter” based approach. Search for “Womens Reebok Shoes” and see how the results are displayed.

    Panelists were also of the opinion that consuming and viewing search results across Timelines, Geography etc. is very valuable, but very few consumer search sites provide this feature. (Some Enterprise Vertical ones do.)

    There were some discussions on how “Vertical Search Tools” solved the problem of “Context”.

    Steve from Krugle.com offered the example of searching for “python”. Krugle, being a code search engine, understands this to be the programming language and need not disambiguate it from the snake.

    Another challenge facing Search today is the enormous amount of un-indexed non-web data out there. Jeremie from Jabber expressed hopes that some day there will be a “Collaborative Search Protocol” which facilitates aggregating and consuming search results from multiple vertical/generic search engines.

    OpenSocial v/s Closed Private

    Google’s Kevin Marks made an unscheduled presentation on OpenSocial. It was mainly in response to some concerns/confusion in the industry about its security, usage model etc. Kevin mostly talked about what OpenSocial is and how to go about using it. The slides can be found at http://docs.google.com/TeamPresent?docid=dfng2zqx_35gq33q7. A day earlier some folks at DeFrag announced “Closed Private” - sort of an antithesis to OpenSocial. The discussion was around the concerns/issues surrounding “promiscuous”, “open” social networking sites.

    The key points were:

    • OpenSocial is still in a very early “Alpha” stage
    • Google likes to deploy products in early stages and collect feedback - this was a collaborative approach (with partners such as LinkedIn) and it was thought beneficial to announce it and work out the kinks in the “Open”
    • The main challenges are in the area of “Identity”. A particular person may have multiple “personas” - s/he may/may not want them to be intermingled.
    • Google is currently relying on Container Namespaced Identity (e.g. username@linked.com) and delegated authentication. Identity is not controlled by Google. (Google Account/Gmail accounts are not necessarily tied to an OpenSocial Identity - though it’s a possibility in the future)
    • The security is left to the individual containers for now (remains to be seen how it works out - we have already seen multiple hacks (http://www.techcrunch.com/2007/11/02/first-opensocial-application-hacked-within-45-minutes/))
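
    The container-namespaced identity point can be sketched in a few lines. This is only an illustration of the “username@container” idea as described in the talk; the parse_identity and same_persona helpers and the example container names are made up here and are not OpenSocial APIs.

```python
from typing import NamedTuple

class ContainerIdentity(NamedTuple):
    username: str
    container: str  # the site (container) that owns and authenticates this identity

def parse_identity(value: str) -> ContainerIdentity:
    """Split 'username@container' into its parts (hypothetical helper)."""
    username, sep, container = value.rpartition("@")
    if not sep or not username or not container:
        raise ValueError(f"expected username@container, got {value!r}")
    # Assumption: container names are compared case-insensitively, like host names.
    return ContainerIdentity(username, container.lower())

def same_persona(a: str, b: str) -> bool:
    """Two identities match only if both the username and the container match."""
    return parse_identity(a) == parse_identity(b)

# The same username on two containers stays two separate personas.
print(same_persona("alice@network-a.example", "alice@network-b.example"))
```

    Keeping the container in the key is what lets a person hold several personas without them being merged by accident; linking them would have to be an explicit choice by the user.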

    I asked Kevin if there were any plans to work on a “Data Format” for Social data interchange - almost like an OPML type. Most “social networking” sites have common types of data, e.g. “Friend/Buddy List”, “Activities”, “Personal Profile Information” etc. Is it beneficial to define a data format (Social Network Markup Language :-))? Kevin responded that the current approach is to use the Atom publishing model and loosely defined “streams” for Activities. There are formats such as hCard for contact info, and then there are Google namespaced schemas for some data types. There were some conversations around Widget specification standardization (Netvibes UWA?).

    It appears that Netvibes has announced its own plans around OpenSocial (http://www.techcrunch.com/2007/11/06/netvibes-wants-to-tap-into-other-social-networks/).

    Information Overload Panel

    Is there an information overload? Or is it just a feeling peculiar to the alpha-geeks? What is the industry doing to solve it? Paul Kedrosky moderated this panel, with participation from Will Morris (VP at AOL), Bradley Allen (Siderean), James Altucher (Stockpickr) and Chris Shipley (Guidewire Group).

    Will thinks Information Overload is definitely a problem, but so is information “underload”.

    Will also believes that serendipitous discovery is quite beneficial. He offered myAOL/Magnet as one of the initiatives that AOL has undertaken in this area (the challenge is in finding the right balance). Will said that the system’s ability to understand his habits and suggest items such as “British runners in New York Marathon” and “Running Cat” was of good use to him.

    (Disclosure: I work on the myAOL/Magnet team at AOL)

    Other panelists such as Chris were of the view that there is no such thing as information overload - it depends on the individual and the context.

    My take on information overload is that we need to differentiate information into “consumable/interesting data” and “actionable data items”. We then have to prioritize and act on the actionable items. Interesting data (including the serendipitously discovered articles) can wait for consumption during leisure.

    For example, the numerous “friend invites” from various social networking sites are “actionable data” - but of a slightly lower priority compared to “finish my design task on creating an OpenSocial based widget”.
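
    As a rough sketch of this triage idea (the Item fields and the numeric priorities are my own illustrative assumptions, not a feature of any product mentioned here):

```python
from dataclasses import dataclass

@dataclass
class Item:
    title: str
    actionable: bool
    priority: int = 0  # higher means more urgent; only meaningful for actionable items

def triage(items):
    """Split items into (to_do, to_read).

    to_do holds actionable items, most urgent first; to_read holds the
    interesting-but-not-actionable items that can wait for leisure time.
    """
    to_do = sorted((i for i in items if i.actionable), key=lambda i: -i.priority)
    to_read = [i for i in items if not i.actionable]
    return to_do, to_read

inbox = [
    Item("Friend invite from a social network", actionable=True, priority=1),
    Item("Serendipitously discovered article", actionable=False),
    Item("Finish design task for OpenSocial widget", actionable=True, priority=5),
]
to_do, to_read = triage(inbox)
for item in to_do:
    print(item.priority, item.title)
```

    With this input the design task is listed ahead of the friend invite, and the article lands in the leisure pile.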



    I visited the booth of Siderean (http://www.siderean.com/) and had a chat with Gary Wright, Director of Western Region Sales. He showed me a demo of how they have helped Oracle use a custom Siderean integration to obtain “structural” information from their internal blogs, calendars, events etc. The results can be navigated by “People, Timeline, Place etc.”, e.g. http://pressroom.oracle.com/prNavigator.jsp


    Medium (https://addons.mozilla.org/en-US/firefox/addon/4365) is a very impressive Firefox plugin that reveals the hidden world of people and activity behind your Firefox browser (impressive in terms of features/concepts - not the UI).

    It’s about “collaborative browsing” - meeting (online) people who happen to be browsing the same site as you are, and then discovering what other/related sites these users are on - all in real time. It then provides a way to share/chat with these users.

    This concept is definitely not new - StumbleUpon, Delicious etc. have already found ways to suggest items based on what users similar to you have discovered, using tags/ratings etc. But Medium’s twist on this is the real-time aspect and the “View” - an AJAX based sidebar that lets you watch this curious browsing behavior in real time.


    Posted on November 7th 2007 by stonse

    Filed under Personalization, Social Networking Applications

    Who are you linked to?

    It appears that it’s all about Links: who you are linked to - and who links to you. Ever since Sergey and Larry came up with the PageRank paper, with backlinks being a major contributor to the rank, the SEO industry has been practically teetering over the edge with plenty of theories and ideas on how to “cheat” the system - or, as in most cases, come out ahead in it for legitimate reasons.

    One such company, www.kango.com, had such a hard time with it, that they came up with a satirical take on what Google might have to do in order to make it into the SEO rankings.

    (Image: Google design satire)

    Although “link madness” is a big menace, and Google’s recent Google Dance with Blog Link Farms notwithstanding, it’s been widely quoted that PageRank and backlinks are just one of the many (reportedly over 500) cascading factors that are taken into account to generate the SERPs.
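
    For readers who have not seen how backlinks feed into a rank, here is a minimal sketch of the simplified, textbook form of PageRank computed by power iteration. It is emphatically not Google’s production ranking (which, as noted, combines many other factors); the damping value of 0.85 is the commonly cited default, and the three-page link graph is invented for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified textbook PageRank via power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of backlinks.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Invented example: "a" and "b" both link to "c", and "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

    In this toy graph “c” ends up ranked highest because it has two backlinks, while “b”, which nobody links to, keeps only the baseline share. That is the property link farms try to exploit.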


    Posted on October 25th 2007 by stonse

    Filed under Personalization
