Byte Rot: October 2012

Sunday 28 October 2012

How limitation can be a source of goodness - NoSQL, REST and more

[Level C3]

It just dawned on me. I came to the startling realisation that limitation, restriction and constraint, words with negative connotations, can (and will) generate creativity and lead to goodness. This is more or less saying "less is more", but looking at it from another angle.

The story of Twitter

I do not know about you, but I think Twitter is one of the biggest inventions of recent decades, somewhere along the lines of Gutenberg's printing press or Priestley's soda water. Regardless of what most people think about the social networking revolution, I believe Twitter is not of the same breed - it is not a compressed Facebook.

Twitter is centred around a stupidly simple idea: you have 140 characters to express yourself. No more. Yes, it also has retweets, follows, favourites, etc., but these are features: if you removed them, Twitter would still more or less be Twitter, although its usefulness would be limited. But if you removed the 140-character limit, it would suddenly not be Twitter anymore. TwitLonger is not Twitter. With all due respect to @daltonc, App.net, with a limit much different from 140 characters, will not make a new Twitter.

But why? Because by limiting you to only 140 characters, Twitter forces you to express yourself more succinctly. It makes you think really hard about what you want to say and remove everything that matters less. The limitation leads you towards the ethos of Twitter. You cannot write a line in Twitter's terms and conditions asking users to write only intelligent tweets, but Twitter has achieved this by enforcing its 140-character limit.

Why NoSQL matters

They say behind every successful man there is a powerful woman. And I would say behind every inflexible and crippling enterprise architecture there is a big legacy database. A database becomes legacy soon after the first release, since the other layers change but the database cannot keep up. By database I mean an RDBMS (SQL Server, Oracle, you name it).

We have cut down our processes, we do agile, we do lean, we have continuous integration, we do unit testing and TDD, we do BDD and continuous deployment, we do ... we have built a process to minimise the risk and impact of change. Yet change is still so difficult, and the hardest thing to change is the database.

The crux of the issue is that business logic creeps into the database. I recently witnessed a complex calculation having to use inaccurate rounding because it had to match the database's calculation logic - yes, there was calculation in the database.

We all believe that we should not put business logic in the database. So why do we keep doing it? Because we can. The problem is that best practices and discipline are difficult to enforce when the project is late, we have a critical bug to fix or we need to make a quick change to the system, and the easiest solution is to change the stored procedure.

As far as SQL Server is concerned, we can have intra-model logic (calculated fields), domain-wide logic (stored procedures and user-defined functions) and cross-boundary logic (Service Broker) - and you can even deploy compiled code (SQL CLR). And since we can, we will.

And here is why I think NoSQL is useful, apart from all the hype around it: you simply cannot put business logic in it, so you don't. And this leads to better design. So I like NoSQL not because of what I can do but because of what I cannot do. I would agree with Stonebraker's article that NoSQL technologies could be low-tech compared to hi-tech RDBMSs, but I cannot abuse them in the same way. NoSQL focuses entirely on storage and retrieval and does not replicate all those features that should not be implemented in a database (XML manipulation, message bus, logic, etc.).

A single SQL table can translate into 5-6 Redis data structures so that you can query and access the data effectively. So there is more work to be done, but I like it, since it makes me really think about what I need to store and what I need to query back (remember Twitter?).
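To make the "one table, several structures" idea concrete, here is a minimal Python sketch. It assumes a hypothetical `customers` table with columns (id, name, city, created) and uses plain dicts and sets as a stand-in for Redis so it runs without a server. The method names mirror Redis commands (INCR, HSET, SADD, ZADD, ZRANGE) but the signatures are simplified and are not those of the redis-py client; the key names are illustrative only.

```python
from collections import defaultdict


class FakeRedis:
    """In-memory stand-in for the few Redis commands used here.

    Method names mirror Redis commands, but the signatures are simplified
    and do NOT match the redis-py client.
    """

    def __init__(self):
        self.strings = {}
        self.hashes = defaultdict(dict)
        self.sets = defaultdict(set)
        self.zsets = defaultdict(dict)  # key -> {member: score}

    def incr(self, key):
        self.strings[key] = self.strings.get(key, 0) + 1
        return self.strings[key]

    def hset(self, key, mapping):
        self.hashes[key].update(mapping)

    def hgetall(self, key):
        return dict(self.hashes[key])

    def sadd(self, key, member):
        self.sets[key].add(member)

    def smembers(self, key):
        return set(self.sets[key])

    def zadd(self, key, member, score):
        self.zsets[key][member] = score

    def zrange(self, key, start, stop):
        """Members ordered by score; stop is inclusive and -1 means 'to the end'."""
        scores = self.zsets[key]
        ordered = sorted(scores, key=scores.get)
        return ordered[start:] if stop == -1 else ordered[start:stop + 1]


def insert_customer(r, name, city, created):
    """One SQL INSERT fans out into one write per access path we plan to query."""
    cid = r.incr("customers:next_id")              # 1. id generator (string)
    r.hset(f"customer:{cid}",                      # 2. the row itself (hash)
           {"name": name, "city": city, "created": created})
    r.sadd("customers:all", cid)                   # 3. every id (set)
    r.sadd(f"customers:city:{city}", cid)          # 4. index on city (set)
    r.zadd("customers:by_created", cid, created)   # 5. ordering by date (sorted set)
    return cid


r = FakeRedis()
insert_customer(r, "Alice", "London", 20121001)
insert_customer(r, "Bob", "Tehran", 20121005)
insert_customer(r, "Carol", "London", 20121003)

# WHERE city = 'London' becomes a set lookup
print(sorted(r.smembers("customers:city:London")))  # [1, 3]
# ORDER BY created becomes a sorted-set range
print(r.zrange("customers:by_created", 0, -1))      # [1, 3, 2]
```

Every query you want to answer needs its own structure, decided up front, which is exactly the "think about what you need to query back" discipline described above.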

Lessons from REST

Regardless of all the bloated hype and endless controversy over how to interpret REST, it works. It just simply works. We have a set of constraints, and if you follow them they lead to goodness. Example? HTTP.

REST is an absolute example of how constraints can lead to a better design.

Limitation in creative arts

I am a big fan of minimalism, and Philip Glass is one of my favourite contemporary composers. In minimalism, a compact musical idea is repeated to create rhythm, melody and ultimately harmony - usually built from layers of repetitive chords. For me, as a lover of minimal music, the opera Satyagraha is the pinnacle of minimal musical expression through a limited set of musical material.

Apart from musical material, limiting the number of instruments is also an important aspect. The string quartet is one of the most expressive forms of music - and I just love it, be it Beethoven or Shostakovich. The rock counterpart of the string quartet is probably the rock trio (counting the singer as an instrument?), which has produced some of the best music ever (from Jimi Hendrix and Cream to Rage Against The Machine and Nirvana).

In social and political terms, we saw an explosion of modern and beautiful art in the Eastern Bloc during the oppression of the communist governments. Composers and film directors had to find their own language to express their art. Since they could no longer look outside for inspiration, they turned inwards, and a new era of creativity and great art flourished: Andrei Tarkovsky, Istvan Szabo, Andrzej Wajda, Larisa Shepitko and many more. Shostakovich arguably produced his best works during the fierce Stalinist oppression. It is interesting that when the restrictions are lifted, artists are no longer able to produce works of the same quality. Wajda's Man of Iron seemed like just a shabby copy of Man of Marble. And Tarkovsky's last two films, made outside the Soviet Union, did not feel like the previous ones.

The same oppression created the Iranian New Wave in cinema, with the likes of Makhmalbaf, Mehrjui, Kiarostami and others.

Now, should we create an oppressive government so that we get great artistic output?! No, but perhaps we can have a 1960s-style drug revolution that does the same :)

Monday 22 October 2012

Media type: how much can you cram into a single token?

[Level C4]

Introduction

This post discusses the problems associated with using a single token as the media type (usually as the main value of the Content-Type header in an HTTP response or the Accept header in a request) to describe all attributes of the content.

Motivation and background

This has been bugging me for a while. But recently I engaged in a discussion on Twitter with Glenn Block (@gblock) and the rest of the REST enthusiast community about the options for versioning RESTful services. There are generally two camps: those advocating content negotiation for versioning (putting the version number in the Content-Type header) and those preferring to stick to classic resource-based versioning (including the version number in the URL). Regardless of which one is better, the media type as currently used lacks the richness required to describe the content, and adding version information to it is not really feasible given its current state.

One of the main problems with media types is that their current implementation in various systems is key based, i.e. matching is all or nothing on the whole media type token. As we will see, this causes considerable problems in consuming media types effectively.
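To make the all-or-nothing problem concrete, here is a deliberately naive matcher in Python. It is a sketch assuming the kind of whole-token comparison described above, not any particular framework's negotiation code; the names `media_range` and `key_based_match` are hypothetical, and q-values are ignored for brevity.

```python
def media_range(value):
    """'Type/Subtype; q=0.5' -> ('type', 'subtype'); parameters and q-values are dropped."""
    essence = value.split(";", 1)[0].strip().lower()
    top_level, _, subtype = essence.partition("/")
    return top_level, subtype


def key_based_match(accept_header, available):
    """Naive all-or-nothing negotiation: return the first offered type that
    matches some Accept entry exactly (or via '*' wildcards), else None.
    q-values are ignored to keep the sketch short."""
    wanted = [media_range(v) for v in accept_header.split(",")]
    for offered in available:
        o_top, o_sub = media_range(offered)
        for w_top, w_sub in wanted:
            if w_top in ("*", o_top) and w_sub in ("*", o_sub):
                return offered
    return None


# The server only offers HAL; a JSON-capable client gets nothing,
# because "hal+json" != "json" as a whole token.
print(key_based_match("application/json", ["application/hal+json"]))  # None
```

The client could perfectly well parse the response as JSON, but nothing in a whole-token comparison lets it discover that.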

Media Type

Media types have been described in various RFCs (the main one being RFC 2046), and historically they were limited to what are known as MIME types. RFC 4288 defines the procedure for registering media types, describing a formal process that must be followed to register one publicly.

Registering a media type for a public API is all well and good, but as described by this book, the use of private APIs far exceeds the use of public ones, and registering every media type exposed by private APIs is impractical and unwarranted.

Also, with the popularity of REST-based APIs, more and more service endpoints are going to be exposed. If all such services define new media types, we would have an explosion of media types, rendering the current implementation of content negotiation unmanageable.

Media type is a case of extreme semantic mix-up: a single token is used to express many different facets of the content. In fact, the semantic space formed by all these axes contains many useful points, yet the industry currently uses only a very sparse set of them as media type values. The rest of this space is unusable, which makes it a very inefficient solution.

We will now look at these facets/axes.

1- Human legibility

This is the lowest and least specific level of semantic definition of a media type. It is very simple: either the content can be read by a human (for example text/plain, application/xml or application/json) or the data is meant for machine comprehension or rendering (for example image/png or video/mpeg).

Having this information separate from the actual media type can help tools such as Fiddler decide whether they can display the content as text when its media type is unknown to the tool. Media types initially used "text" to denote such information (e.g. text/xml or text/javascript), but these have largely been replaced with application/xml and application/javascript.
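Since the token does not state legibility directly, a Fiddler-like tool has to guess it from a dictionary. A minimal Python sketch, assuming a small hypothetical list of textual `application/*` subtypes (not an exhaustive registry) and the `+json`/`+xml` suffix convention:

```python
# Hypothetical dictionary a viewer must maintain, because the token itself
# does not say whether the content is readable as text. Not exhaustive.
TEXTUAL_APPLICATION_SUBTYPES = {"json", "xml", "javascript", "x-www-form-urlencoded"}
TEXTUAL_SUFFIXES = ("+json", "+xml")


def is_human_legible(media_type: str) -> bool:
    """Guess whether content with this media type can be shown as text."""
    essence = media_type.split(";", 1)[0].strip().lower()
    top_level, _, subtype = essence.partition("/")
    if top_level == "text":
        return True
    if top_level == "application":
        # str.endswith accepts a tuple of candidate suffixes
        return subtype in TEXTUAL_APPLICATION_SUBTYPES or subtype.endswith(TEXTUAL_SUFFIXES)
    # image/*, video/*, audio/* and anything unknown: assume not text
    return False


for mt in ("text/plain", "application/json; charset=utf-8", "application/rss+xml",
           "image/png", "application/mydomain.customer.1.1"):
    print(mt, is_human_legible(mt))
```

The last value may well be XML on the wire, but nothing in the token says so, so the guess comes out False.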

2- Formatting

This is the most common and important axis of media type information, telling tools/clients which parser/interpreter/renderer to use to consume the content. text/plain, application/xml, application/json, image/png and video/mpeg are all examples of this use of the media type.

There are several known vendor-specific media types in this space such as application/vnd.ms-excel.
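On its own, the formatting axis is what a client typically uses to pick a parser. A minimal Python sketch, assuming a hypothetical registry keyed on the media type without parameters; `json.loads` and `xml.etree.ElementTree.fromstring` are standard-library parsers:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical registry: the formatting axis alone decides which parser runs.
PARSERS = {
    "application/json": json.loads,
    "application/xml": ET.fromstring,
    "text/plain": lambda body: body,
}


def parse_body(content_type: str, body: str):
    """Pick a parser from the media type essence (parameters such as charset are ignored)."""
    essence = content_type.split(";", 1)[0].strip().lower()
    parser = PARSERS.get(essence)
    if parser is None:
        raise ValueError(f"no parser registered for {essence!r}")
    return parser(body)


print(parse_body("application/json; charset=utf-8", '{"id": 7}'))   # {'id': 7}
print(parse_body("application/xml", "<customer id='7'/>").get("id"))  # 7
```

A vendor-specific type such as application/vnd.ms-excel raises ValueError here until someone adds it to the registry, which is the dictionary burden the rest of this post returns to.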

3- Schema

This is a further specialisation of the formatting. Common examples include application/rss+xml and application/hal+json. Basically these mean that in terms of formatting they are the same as their parent (application/xml or application/json), yet they follow an additional schema on top of it. Use of the + sign - as far as I know - is not canonical and is merely a convention followed by the industry to add a schema to established formats. Understanding this convention would be crucial to interpreting the media type correctly without needing a dictionary of all possible values; however, I believe most tools we have at the moment lack such features.
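A tool that understands the + convention could separate the schema axis from the formatting axis without a dictionary. A minimal Python sketch; the `MediaTypeParts` and `split_suffix` names are hypothetical, and it assumes the text after the last + names an `application/*` format, as it does for +xml and +json:

```python
from typing import NamedTuple, Optional


class MediaTypeParts(NamedTuple):
    top_level: str               # "application"
    subtype: str                 # "hal+json"
    base_format: str             # format to parse with, e.g. "application/json"
    schema: Optional[str]        # schema layered on top, e.g. "hal"; None if absent


def split_suffix(media_type: str) -> MediaTypeParts:
    """Separate the schema axis from the formatting axis using the '+' convention.

    Assumes the text after the last '+' names an application/* format,
    as it does for +xml and +json.
    """
    essence = media_type.split(";", 1)[0].strip().lower()
    top_level, _, subtype = essence.partition("/")
    if "+" in subtype:
        schema, _, suffix = subtype.rpartition("+")
        return MediaTypeParts(top_level, subtype, f"application/{suffix}", schema)
    return MediaTypeParts(top_level, subtype, essence, None)


print(split_suffix("application/hal+json").base_format)  # application/json
print(split_suffix("application/rss+xml").schema)        # rss
```

With this, a client that only knows JSON can fall back on `base_format` instead of rejecting application/hal+json outright.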

4- Domain/Vendor specific

This is where we see most of the expansion in the media type space. Basically, you can output your own media type via your private API. Since you will be the main consumer of the API, integration can be easy, but it is very common for private APIs to go public - especially if they are successful. An example of such media types can be found here.

5- Versioning

Versioning is the highest aspect of a media type and is normally added to domain-specific media types. This is a popular solution to the Web API versioning problem.

For example, you could have application/mydomain.customer.1.1 as opposed to application/mydomain.customer or application/mydomain.customer.1.0
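Recovering the domain and version from such a token means every client must know the naming convention. A minimal Python sketch; the regular expression is a hypothetical pattern tailored to the example names above (application/<domain>[.<major>.<minor>...]), not a standard, and `parse_versioned` is an illustrative name:

```python
import re
from typing import Optional, Tuple

# Assumed naming convention, taken from the example above:
#   application/<domain>[.<major>.<minor>...]
# A hypothetical pattern, not a standard; real vendor types vary widely.
VERSIONED = re.compile(
    r"^application/(?P<domain>[a-z0-9.-]+?)(?:\.(?P<version>\d+(?:\.\d+)*))?$"
)


def parse_versioned(media_type: str) -> Optional[Tuple[str, Optional[Tuple[int, ...]]]]:
    """Recover the domain and version axes that are packed into one token.

    Returns (domain, version_tuple) or None if the token does not follow
    the assumed convention; version_tuple is None when no version is given.
    """
    essence = media_type.split(";", 1)[0].strip().lower()
    match = VERSIONED.match(essence)
    if match is None:
        return None
    version = match.group("version")
    parsed_version = tuple(int(part) for part in version.split(".")) if version else None
    return match.group("domain"), parsed_version


print(parse_versioned("application/mydomain.customer.1.1"))  # ('mydomain.customer', (1, 1))
print(parse_versioned("application/mydomain.customer"))      # ('mydomain.customer', None)
```

Note that none of this tells the client whether the content is XML or JSON; that axis is simply absent from the token.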

So where is the problem?

Basically information gets lost.

The first problem is that clients might be interested in a lower-order aspect of the media type, yet in order to consume the resource they are forced to comprehend the higher orders and extract the axes they are interested in. For example, a tool such as Fiddler could be interested only in whether it can display the information to the end user as plain text. A client capable of consuming XML and deserialising it to objects is only interested in knowing whether the content is XML, while it might be represented with a media type that is essentially XML but has a different value. On the other hand, if a server uses HAL to send domain objects/view models to the client, it has to use either the standard application/hal+json or the domain-level name of the media type (with or without a version), so either the schema or the domain information is lost.

Another problem is that the content negotiation process becomes more complex. In the absence of a standard for defining multi-axial media types, most systems implement dictionary-based content negotiation rules, so maintaining the list of possible content types becomes a burdensome task.

A solution

Basically, I believe we can solve this by keeping the common media types but adding media type parameters (extensions) to the Content-Type header (or the Accept header). For example:

Content-Type: application/xml; human-legible=true; domain-name=customer; domain-version=1.1

This will ensure that existing clients and servers will not break while new clients and servers can use new extensions for content negotiation and more loosely coupled resource consumption. I will try to expand upon this idea in another post.
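A minimal Python sketch of how such parameters could be consumed, using the domain-name and domain-version parameters from the header above (the legibility parameter is left out for brevity). The function names are hypothetical and the parser is deliberately simplified: it does not handle quoted-string values, which a real Content-Type parser must.

```python
from typing import Dict, Optional, Tuple


def parse_content_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split 'type/subtype; k=v; ...' into (essence, {k: v}).

    Simplified: quoted-string values and escaping are not handled.
    """
    essence, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing ';'
        key, _, val = raw.partition("=")
        params[key.strip().lower()] = val.strip()
    return essence.lower(), params


def satisfies(header: str, wanted_essence: str,
              wanted_params: Optional[Dict[str, str]] = None) -> bool:
    """True if the format matches and every requested parameter matches.

    Parameters the caller did not ask about are ignored, which is what
    lets older clients keep working.
    """
    essence, params = parse_content_type(header)
    wanted_params = wanted_params or {}
    return essence == wanted_essence.lower() and all(
        params.get(key) == val for key, val in wanted_params.items()
    )


header = "application/xml; domain-name=customer; domain-version=1.1"
print(satisfies(header, "application/xml"))                             # True: old client
print(satisfies(header, "application/xml", {"domain-version": "1.1"}))  # True: new client
print(satisfies(header, "application/xml", {"domain-version": "1.0"}))  # False
```

An existing XML client matches on the format alone, while a newer client can additionally check the domain and version axes without any dictionary of composite tokens.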

Conclusion

Cramming as much information as possible into a single token and then trying to parse that one token is not a good idea, especially when it comes to the media type, which is the communication bridge between the loosely coupled worlds of HTTP clients and servers.

The media type token covers five different aspects of the resource, and separating these aspects into their own tokens can result in more robust and decoupled systems.

Saturday 20 October 2012

How PayPal is helping the Iranian government's internet censorship

[Level N]

I had difficulty believing what I read on Hacker News around the same time PayPal blocked and closed my account when I tried to pay $3 for an internet proxy to combat the filtering that has rendered the internet pretty much useless in Iran. Did he read my email? I do not know, but it left me frustrated and hopeless at one of the most difficult times of my life.

As some of you might know, I have been going through rough times for the last 18 months. My mother-in-law passed away 3 weeks ago after a long battle, most of which my wife spent with her - which is a consolation.

So when I visited her and the family in Iran (that's where I am from) around two months ago, I realised that internet censorship has become so bad that the internet is really unusable. Sites such as Twitter, which I am addicted to, are obviously blocked, as they played an important role in Iran's suppressed Green Revolution, arguably the world's first Twitter Revolution. In fact, even this blog that you are reading, and anything hosted on Google's Blogspot, is filtered - Iran had the highest number of bloggers in the Middle East and a number of them are in prison. Even Gmail gets its share and is blocked from time to time. According to some reports, 40% of Iranians (30 million) use the internet, which is second in the Middle East after Israel, and as far as I know most of them use internet proxies, anti-filters, anonymisers or VPNs to bypass the censorship. So even if we say only half use anti-censorship tools, we are talking about 15 million people. I had actually set up a dedicated PC in the UK for my wife to use via Remote Desktop Connection (RDP), since these proxies are sometimes found by the government and blocked. But the slow speed of the internet (intentionally kept low) makes it almost useless, since each screen refresh takes a few seconds.

So how does PayPal come into this? Most of these proxy providers only accept PayPal. And PayPal blocks any account once it realises the IP is from Iran, regardless of the amount, who the receiver is, how long the account has been used or how the account has behaved. Why? That is a very good question, but maybe because someone might be trying to purchase nuclear equipment or fund terrorism or ...! Honestly, is that not silly? Paying $3 for an anti-censorship filter is illegal because you are connected from Iran? Anyone who wants to use their PayPal account for illegal activities will surely use a proxy first to mask their IP. This only affects ordinary people like me who are trying to pay for proxies since, with the sanctions, you surely cannot buy anything that ships to Iran.

Now, out of everyone, I am among the people who would least wish to help Iran's government. My family was struck by this very government when my uncle, working as a political analyst for the British Embassy, was arrested by the authorities and charged with spying back in 2009. He was released after months, but due to constant pressure and persecution from the authorities he had to flee Iran and has now resumed his work at the Foreign Office in the UK.

So where does this leave me?

Well, it leaves my account blocked and closed. Now that I am back, I cannot use my account anymore. The emails I have sent have been answered with utter disinterest and a "we don't care" attitude. And I have a really hard time believing that PayPal's CEO reads complaint emails. I might have been rash with my tone in some emails, but my frustration was extreme because of the injustice.

But I am only one of many. The lives of many millions of Iranians have been affected by sanctions. Ordinary people suffer at the hands of a brutal government, yet they find no consolation in the way they are treated outside Iran, especially by PayPal. For them, the internet is the only way out of the oppression, and blocking the purchase of anti-censorship accounts is standing side by side with the Iranian regime. Does this make you happier, Mr. David Marcus?