Friday, March 09, 2012

On Clarity in Papers

Bianca's advisor on draft version 0: "It's not clear. Make it more clear."
On v1: "It's quite vague..."
On v2: "Things are ambiguous..."
On v3: "It's not quite clear..."
On v4: "Things are not well-defined..."
...
On v23: "Things are still fuzzy..."

Bianca: "Did you ever consider getting your eyes checked?"

Thursday, February 23, 2012

Normal Science

Earlier, I had written about how I thought it wrong to force academic researchers in Computer Science to promise technology transfer and collaboration with industry. The crux of my argument was that industry and academia have different goals. I want to elaborate on that some more, especially from an academic perspective.

I think academic researchers are too hard on themselves when they judge the merit of their contributions by adoption among industrial practitioners (or whoever the perceived end users are) or by the amount of money they have saved someone. Those are arguably the standards an industrial software firm would apply to judge the success of its product. Should academic researchers apply the same standards?

I think not.

As Kuhn famously pointed out, the researcher works in a community. The community judges the merit of a researcher's claims based on the paradigm the community follows. The paradigm is everything. The open problems come from the paradigm. The solutions ought to make sense in the paradigm. A solution that makes sense may open up new problems, which then become part of the paradigm. That is what Kuhn calls normal science (as opposed to a paradigm shift, which is not the point of this post). A contribution then amounts to solving a problem within the paradigm.

Clearly, the problems of a paradigm cannot be generic things such as "making a contribution to social welfare" or "having an impact on industry practice" or "fostering end-user adoption". Those have nothing to do with a particular paradigm. The problems of a paradigm are always technical in nature. Why, then, are researchers tempted to judge their contributions by standards that don't apply to them?

You would be surprised how many times a presenter is asked by another researcher in the audience, "...but how much money did you save anyone by applying your techniques?" (I was asked this once many years ago, and even though I was relatively green, the question felt wrong to me even then), and how many times a researcher touts how much he has saved the world. Isn't this anti-intellectualism?

I suspect one reason is the excessive focus on churning out publications. Many researchers have become adept at churning them out. I think in many areas of Computer Science, this has the adverse effect of blurring the paradigm. Few people really know what others are doing, except superficially. (I am pretty sure there are some who don't even know what is in the papers that bear their name.) In this sense, some researchers have unknowingly stopped judging contributions by the criteria of the relevant paradigm. And when that happens, they resort to non-paradigm criteria -- someone else's criteria.

I do not begrudge a researcher his success in commercializing his research or taking it to the masses. In fact, it is to be applauded, for it points to an additional skill set that he has. But let us not make that the criterion for judging someone's research contributions.

I also have nothing against those who collaborate actively with industrial partners. To each his own. But that is no reason to apply non-paradigm criteria to those who don't.

At the beginning of his book "The Tacit Dimension" Polanyi mentions how he was struck by how communism required that scientists turn to solving the problem of the current five-year plan. It makes me wonder if he would have found questionable the constant harping about researchers delivering what the industry wants.

Tuesday, February 21, 2012

Argumentation in Research

There are many ways of supporting a thesis. Formal proofs and empirical evidence gathered from systematic experimentation are two of the most common in computer science. A third one that I find equally attractive is presenting sound arguments for the thesis. However, in computer science, especially software engineering, it is a technique that is seldom used and little respected.

Advancing a thesis by argumentation is an analytical technique. Doing it well is not easy.

One must set up criteria, that is, the values, by which to judge the thesis. Further, the criteria had better have broad appeal, or else you will have to argue forcefully that they ought to be broadly accepted. If you fail to make a case for the criteria, your evaluation of the thesis against them will have little appeal to others.

One must have a clear grasp of the area the thesis concerns, which includes being aware of the minute details of other people's approaches in the area: knowing exactly what they mean by the terms they use, knowing the details of their approaches, and knowing how what they are doing relates to your thesis. If you want to show the novelty of your thesis, you must show that the others fail some of the criteria. Further, no one must be able to reasonably argue that they do meet your criteria. Anticipating and addressing the arguments that others will put forward to reject your thesis is where a significant portion of your energy will be spent. You have to close all potential loopholes, at least in your mind if not in the paper itself.

It is a big mistake to put a well-argued paper in the same category as vision papers, position papers, or "nice to read, but lacks evaluation". A vision paper charts out the world an author would like to see, perhaps listing specific challenges and promising approaches. A position paper just says what the author thinks. Sometimes the author may take a contrarian stance in a position paper; however, no deep analysis of the strength of his position and the weakness of others need be involved. And it is patently wrong to say that a paper that relies solely on arguments to make its points lacks evaluation: doesn't it evaluate the thesis against the criteria (and the criteria themselves)?

A vision paper: Berners-Lee et al.: Scientific publishing on the semantic web

A position paper: Chopra et al.: Research Directions in Multiagent Systems

A well-argued technical paper: Singh: Agent Communication Languages: Rethinking the Principles

Sometimes a good argument is all it takes to show limitations in a whole class of approaches that have been adequately formalized and "proved", implemented, and tested empirically. If instances of the latter were found worthy of publication, isn't the argument that shows their limitations at least as worthy?

Argumentation is a first-class scientific technique for validating claims. Let us not judge a good argument by the criteria of other techniques. Let us not dismiss a well-argued paper outright as unscientific or unsubstantiated. Let us give it a fair evaluation.

There is indescribable beauty in a well-argued paper. Dijkstra said that a formula is worth a thousand pictures. An argument is worth a thousand formulas.


Sunday, February 06, 2011

Professor Jay Misra

Read his speeches.

I had the opportunity to meet Professor Misra when he came to give a talk at NC State in 2008. What I took from his talk was the beauty of minimality: how a computer scientist must strive to find the minimal set of operators or specification constructs and then build everything systematically from those.

Earlier that day, Munindar (my PhD advisor) had taken me to Professor Misra's hotel so we could talk over breakfast. Three generations ate together -- Munindar was once Professor Misra's student at UT Austin. And that day, I observed that Munindar was still his student.

Read Professor Misra's speeches, and you will know why he deserves all our admiration.

Unfortunately for me, I didn't impress him much.

Wednesday, November 17, 2010

Jeff Naughton on some challenges for the community

The presentation may be found here:
http://pages.cs.wisc.edu/~naughton/naughtonicde.pptx

Some of his key points

- Low acceptance rates are a major cause of the current publish-or-perish situation. The solution is to accept more papers, perhaps all papers.

Although I never felt comfortable boasting about the acceptance rate of the venues where my papers have appeared, I never gave it much thought either. Conferences should stop boasting about their selectiveness.

- Three random reviewers simply do not cut it. Reviews should be more discussion-oriented.

- To mitigate the problem of bad reviews, publish the reviews. Let the authors be anonymous, but reveal the reviewers' identity. Let's put our money where our mouths are.

I have seen many many bad reviews. We should not let reviewers get away with bad reviews.

- Let's take advantage of technology to come up with new publishing and dissemination models! Check out the LiquidPub project (http://liquidpub.org/)

We need to think hard about these things. We need courage too.

Friday, April 09, 2010

RIP Future Work

Future work from many years ago remains unscathed. I prefer not to write these sections anymore -- they make me uneasy, especially the work part. However, I cannot jettison the section altogether. A situation where a reviewer says 'It's a solid, conceptually well-founded paper that reconciles some long-standing issues in our field. I feel, however, that lacking a future work section, the next steps are unclear, which will likely plunge our field into chaos. Hence the reject.' is not entirely unimaginable. So I have started using Future directions instead. It is not only noncommittal, it is also in the spirit of spreading enlightenment -- I may follow them, but if you were any good, you'd follow them too.

The possibilities in a finality

In one of the papers I coauthored: "Finally, Section 6 concludes this paper". Maybe the writer considered the possibility that, finally, one may not conclude a paper. Or perhaps, finally, one could conclude some other paper. Or perhaps there is a Section 7, but Section 6 concludes this paper. Or some combination of the above -- Section 6 concludes this paper, but Section 7 some other. Or perhaps the author knows how very tedious the paper is, so he wrote the sentence to reassure the reader that there is an end to the tyranny.

Tuesday, November 10, 2009

Agents exist! They are everywhere!

I was cribbing to a colleague of mine that I had to give an example of a specific kind of multiagent system, a kind that I thought didn't exist. The colleague suggested that it would be impossible to do that because there is not a single multiagent system in the world, let alone one of a specific kind.

He couldn't be more wrong. Multiagent systems are everywhere; it is just that we build these systems without using agent-oriented concepts, so we do not see them as such. The application eBay is a multiagent system: it involves eBay (the organization), bidders, sellers, payment processors, and so on. The application Orbitz is a multiagent system: it involves customers, airlines, Orbitz (the organization), credit card companies, banks, and so on. Your home security system is a multiagent system if it includes active monitoring by a security agency. These are multiagent systems because they involve interactions between autonomous parties, in other words agents.

You and your microwave do not constitute a multiagent system: the microwave is under your control. However, it would be a multiagent system if your microwave could say "No, I don't want to defrost the chicken now". Whereas you and the security agency that installed the sensors constitute a multiagent system, neither two or more sensors by themselves nor the sensors in conjunction with you constitute one. There is no sense in which such a sensor is autonomous in relation to you or to the other sensors.

Every agent is a component, but not every component is an agent. Every multiagent system is necessarily distributed, but not every distributed system is a multiagent system.

The only test of agenthood is autonomy. Something is not an agent because it has a sense, reason, act (SRA) loop. Autonomy can only be understood in relation with other agents; the SRA loop is about the internal construction of the agent. Whether a bidder on eBay is a human or "intelligent" software using an SRA loop or "stupid" software -- for example, one that blindly raises its bid every so often -- does not change the fact that they are all agents.

An agent is not an agent because it is written in JADE or WADE (agent-oriented programming frameworks); it is not one because it was modeled and designed using agent-oriented abstractions. Anything that is autonomous is an agent; anything that is not is not an agent, and labeling such a thing an agent is to abuse the term. In fact, the term "autonomous agent" is redundant in itself.

As multiagent system researchers, our goal is to enable their programming in a way that accommodates autonomy.

Thursday, September 24, 2009

Beyond loose coupling: Completely decoupled agents

In the beginning, a software system was thought of as a process. The process invoked methods to carry out its intended tasks. Both object-oriented and procedural programs are essentially of this nature. The various system components were said to be tightly coupled or integrated.

This view of a software system evolved into one involving multiple communicating processes, typically via message passing. This view reflected a looser coupling between processes and emphasized interoperation between processes via protocols. Traditionally, a protocol specifies the ordering and synchronization constraints upon the sending and receiving of messages by each process.
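To make this concrete, here is a minimal sketch (in Python) of a traditional protocol as ordering constraints. The purchase protocol, its state names, and its messages are entirely hypothetical; the point is only that legality depends on order.

```python
# A traditional protocol as ordering constraints: a tiny, made-up purchase
# protocol specified as allowed (state, message) -> state transitions.
TRANSITIONS = {
    ("start", "request-quote"): "quoted",
    ("quoted", "accept-quote"): "accepted",
    ("accepted", "pay"): "paid",
    ("paid", "ship"): "done",
}

def run(messages, state="start"):
    """Check that a message sequence respects the protocol's ordering."""
    for msg in messages:
        if (state, msg) not in TRANSITIONS:
            return None  # protocol violation: message sent in the wrong order
        state = TRANSITIONS[(state, msg)]
    return state

assert run(["request-quote", "accept-quote", "pay", "ship"]) == "done"
assert run(["request-quote", "pay"]) is None  # paying before accepting is disallowed
```

Notice that the specification says nothing about what the messages mean; it only controls when they may occur.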

It is time for another sea change in software systems modeling. Instead of talking in terms of processes, we need to talk in terms of autonomous agents engaging each other. This is essential given that many of today's Internet applications are services provided by autonomous organizations, and typically involve multiple such organizations. An agent has a legal identity like you and me; properties such as trust, reputation, and responsibility can be attributed to agents. By contrast, a process has no legal identity, and it makes no sense to talk about it in terms of those properties.

The word 'autonomous' has been much used and abused, and tends to invoke skeptical looks from many and even derision from some. Others conflate it with autonomic. I mean autonomous in the specific sense that no agent has control over another. Autonomy for me is strictly an interagent relation, not between agents and their developers, and it most certainly has nothing to do with intelligent behavior.

Traditional protocols are all about control; therefore, they are not suitable for realizing engagements. Engaging means interacting meaningfully. There is nothing meaningful about traditional protocols. The difference between TCP and an e-business protocol such as RosettaNet lies only in their functionality, not in the nature of their specifications.

To be able to engage, agents have to be coupled only to the extent required to interact in a semantically correct manner, and no more. Semantic here means at the level of the application, in general -- whether the application be banking, supply chain, selling books, or anything else.

Commitments yield exactly such a notion of semantic correctness: an agent is interacting correctly as long as it satisfies its commitments. What do we care then whether goods follow payment or payment goods! It's all the same as long as the involved commitments are satisfied!

With commitment-based protocols, agents would be only nominally coupled, for they would be free to act as they please. An agent could send any message at any time. It would even have the choice of violating its commitments (typically, though, there would be penalties for doing so). I prefer to call such agents completely decoupled agents, for they are no more coupled than is absolutely necessary.
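A minimal sketch of the commitment life cycle follows. The three-state model, the party names, and the events are illustrative assumptions; real commitment machinery distinguishes, for instance, conditional and detached commitments, which I elide here.

```python
from dataclasses import dataclass
from enum import Enum, auto

class State(Enum):
    ACTIVE = auto()
    SATISFIED = auto()
    VIOLATED = auto()

@dataclass
class Commitment:
    """C(debtor, creditor, antecedent, consequent): the debtor commits to the
    creditor to bring about the consequent if the antecedent holds."""
    debtor: str
    creditor: str
    antecedent: str
    consequent: str
    state: State = State.ACTIVE

    def observe(self, event: str) -> None:
        # Order of messages does not matter: goods before payment or payment
        # before goods both satisfy the relevant commitments.
        if self.state is State.ACTIVE and event == self.consequent:
            self.state = State.SATISFIED

    def cancel(self) -> None:
        # An autonomous agent may violate its commitment (and face penalties).
        if self.state is State.ACTIVE:
            self.state = State.VIOLATED

# The seller commits to delivering the book once payment is made.
c = Commitment("seller", "buyer", "paid", "book-delivered")
c.observe("book-delivered")
assert c.state is State.SATISFIED
```

Correctness here is judged by whether commitments end up satisfied, not by the order in which messages were sent -- which is exactly the decoupling argued for above.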

Wednesday, July 22, 2009

A Memo on Academic Research

These are tough times for academic research in Computer Science. Multi-institution research, technology transfer, and exploitation in "real-world" scenarios -- the term has come to mean nothing to me, and its usage even makes me cringe -- are increasingly prerequisites to funding. It is simply not enough to publish in peer-reviewed venues. An intellectual proposition is considered to be of little merit unless accompanied by deliverables -- I don't know the history of this term, but I suspect it originated in industry -- in the form of software, hardware, documentation, and demos.

What are my primary responsibilities to society as a researcher in Computer Science? To apply my mind to the challenges people face and to strive for semantic solutions, not ad hoc ones; to disseminate the results of my efforts; and to mentor students.

These responsibilities conflict with much of what currently passes in academia. For example, applying my intellect pretty much rules out multi-institution research because of the bureaucratic commitments such an activity entails and the reduction of any intellectual proposition to the lowest common denominator. Coordinating two institutions fruitfully is hard enough; coordinating a consortium of 15, each of which brings multiple research groups, is an exercise in futility. Coordination should not be confused with dissemination. Disseminating is like spreading spores -- the spores might take root in fertile ground and germinate into something wonderful. By contrast, coordination is like a marriage of persons who speak completely different languages. Most cultures hold the marriage of minds as an ideal; why, then, have we created conditions in academia that encourage institutions to enter into a marriage of convenience?

Some of the things I mentioned above, such as collaboration and technology transfer, are not bad in and of themselves. What is bad is making them integral to research proposals. Research is risky: one might not achieve the results one had hoped for, and it typically takes many years for a line of research to mature. Such being the case, how can every researcher promise a transfer of technology to industry? I would view with suspicion anyone who promises that: either they are being dishonest or there is little research in their proposal. It would be a thoroughly good idea to have separate funding for transfer of technology; that is precisely the role of incubators. (Some universities indeed have incubators set aside for this purpose.)

Clearly, we are accountable to those who fund our research, whether it be the taxpayer or a private entity. But how may we be judged? By peer review -- by more or less the same criteria that are used to judge a PhD dissertation. The number of quality publications, citations, technology transfers, PhD students graduated, software produced, and demos could all be factors in the evaluation. However, the biggest part of the judging will continue to be: did the research lead to significant new results and insights? The judges could include some of the people who approved the proposal, with the others replaced by independent experts in the field -- much like being called for jury duty.

My intention is not to paint the software industry in a malign light. However, we must accept that industry and academia simply have different motivations, different objectives and timescales for achieving those, and different standards of judging achievement. It is best to leave them both to their own devices.

Friday, July 17, 2009

Services and Components, Architecture and Design

I heard a couple of things recently. One, that design subsumes architecture: when designing a system, you design the components and the interconnections between them (the interconnections representing the architecture). Two, that services are merely components, thus implying that SOA is old wine in a new bottle.

A service is a component in the broad sense that it may be independently designed and packaged, and its computational usage is via a published interface. Thus, just as for components, it makes sense to talk about service composition, substitution, interoperability, and so on. However, the engineering of services is a world apart from that of components; engineering services requires fundamentally different abstractions. The differences arise from their pragmatic aspects. Components have no stakeholders; services do. Components have no autonomy; services (via their stakeholders) do. Components do not interact (rather they invoke each other); services interact. There is no question of a component being compliant with respect to other components; the question is fundamental for services given their autonomy. It makes no sense to sanction a component; the risk of sanctions helps keep services compliant.

Moreover, in today's world of services, more so than ever, it pays immensely to treat architecture as an independent artifact of engineering. A service is not a system in the traditional sense; it is simply a participant in an open system (such as the Web, or, to be more specific, the Amazon marketplace). A service's architecture is a description of the service's interactions with other services, all of which may serve the interests of independent stakeholders. At a high level, architecture entails the commitments that a service could be party to, the contextual regulations that are binding upon it (such as HIPAA or Amazon marketplace policies), the monitoring of its compliance, and the sanctions that it may face in case of noncompliance. Where only one stakeholder is concerned with a system in its entirety, such a normative view of affairs is of little value. But when multiple stakeholders are concerned, as is the case in any services application, each stakeholder would want to make sure that the architecture accurately reflects a normative stance compatible with his requirements.

Saturday, December 01, 2007

Using Blogs for Web Site Design

It seems to me that lists are adequate to represent most of the content in our web sites, and blogs serve adequately as lists. In addition, content can be exported in standard formats. Blogs are easy to manipulate, and they make sense by themselves. With processors like Yahoo! Pipes out there now, content from blogs can be processed in useful ways.

My content consists of lists:
--my bio (a singleton)
--my various degrees
--my publications
--work experience
--books I've read
--photos
--movies I've seen
and so on.

I'm going to create a blog corresponding to each list I wish to maintain and use Yahoo! Pipes-like software to create a coherent web site out of them. Why do I need to maintain a resume? I'll just have a processor merge the feeds I need to create a resume, export the result into a format from which I can generate a PDF, and send it around.
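The merging step is simple to sketch. Assuming hypothetical feed entries represented as plain records with a date (a real pipe would fetch these from the actual blogs), a resume is just the merged, reverse-chronological list:

```python
from datetime import date

# Hypothetical local stand-ins for the feeds a pipe would fetch.
publications = [{"title": "Paper A", "date": date(2006, 5, 1)}]
degrees = [{"title": "PhD, Computer Science", "date": date(2009, 8, 1)}]
experience = [{"title": "Research Assistant", "date": date(2004, 9, 1)}]

def merge_feeds(*feeds):
    """Merge entries from several feeds, newest first, as a resume would list them."""
    entries = [e for feed in feeds for e in feed]
    return sorted(entries, key=lambda e: e["date"], reverse=True)

resume = merge_feeds(publications, degrees, experience)
assert resume[0]["title"] == "PhD, Computer Science"  # newest entry first
```

Rendering the merged list into a printable format would then be a separate, equally mechanical step.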

There are various other ways in which others can use my content to create mashups. Why should people keep adding to their HTML pages anymore! Just blog and mash.

Saturday, February 17, 2007

Disseminating Research

One suggestion:
---------------------
It would be awesome if instead of maintaining an HTML publications page, a researcher maintained ATOM or RSS feeds of her publications. Yes, there could be multiple feeds, e.g., by area of research. This way fellow researchers could subscribe to her feeds for updates. They could mashup her feed with others through a service such as Yahoo! pipes to create more interesting feeds. Plus, with the growing set of tools and services around feeds, it should be easier to maintain feeds than HTML pages.
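As a sketch of what such a feed might look like, here is a minimal Atom-style publications feed built with Python's standard library. The author name, entry ids, and dates are made up, and a real feed would carry more elements (links, authors, summaries) than shown here.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

def publications_feed(author, entries):
    """Build a minimal Atom feed of publications (illustrative fields only)."""
    ET.register_namespace("", ATOM)
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = f"Publications of {author}"
    for e in entries:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}title").text = e["title"]
        ET.SubElement(entry, f"{{{ATOM}}}id").text = e["id"]
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = e["updated"]
    return ET.tostring(feed, encoding="unicode")

xml = publications_feed("A. Researcher", [
    {"title": "Agent Communication", "id": "urn:pub:1",
     "updated": "2007-02-17T00:00:00Z"},
])
assert "Agent Communication" in xml
```

Subscribers (or a pipes-style processor) could then filter and combine such feeds without ever scraping an HTML page.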

The other day I tried to create a Yahoo! pipes mashup of the feeds of fellow researchers whose papers I frequently cite. However, since there were no feeds available, I just couldn't do it. Hence, consider this an exhortation upon all researchers to start publishing feeds. They have much to gain from it, and nothing to lose.

One idea:
-------------
Entering bibitems in bibliographies and managing them is tedious and error-prone, don't you think? Well, here is a solution that alleviates this burden to an extent. An author should make the accurate bibtex of her publications dereferenceable by URIs. Then, any other author's local bibliography would logically consist of only id:URI pairs. A bibtex processor should be smart enough to fetch entries using those URIs (and automatically "populate" each bibitem in the bibliography, if needed, for offline use).

A point to note is that the information in a paper's bibtex is a subset of the information in the corresponding entry of the author's publication feed. As long as each entry has its own URI, a bibtex processor can fetch an entry in the feed directly, and process it to extract the relevant elements. Hence, a paper's author has to work no harder to create the bibtex.
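The resolution step the idea calls for might be sketched as follows. The URIs and the hosted bibtex are stand-ins, and the fetcher is a dictionary lookup rather than a real HTTP GET:

```python
# A stand-in for author-hosted bibtex dereferenceable by URI; a real
# processor would perform an HTTP GET instead. The URIs are hypothetical.
HOSTED_BIBTEX = {
    "http://example.org/bib/singh-acl": (
        "@article{singh-acl,\n"
        "  author = {Munindar P. Singh},\n"
        "  title  = {Agent Communication Languages: Rethinking the Principles},\n"
        "}"
    ),
}

def populate(bibliography, fetch=HOSTED_BIBTEX.get):
    """Resolve a local bibliography of {citation-key: URI} pairs into full bibtex."""
    return {key: fetch(uri) for key, uri in bibliography.items()}

# The local bibliography holds only id:URI pairs, as proposed.
local_bib = {"singh-acl": "http://example.org/bib/singh-acl"}
entries = populate(local_bib)
assert entries["singh-acl"].startswith("@article{singh-acl")
```

Caching the fetched entries locally would give the offline "population" mentioned above.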

Authors often move from institution to institution, and therefore the mappings from URIs to URLs could change. How to manage these mappings without imposing any additional burden on authors, I'll need to think about.

Monday, January 23, 2006

Aspect Oriented Programming (AOP)

AOP's value lies in producing uncluttered code through a separation of concerns. Pointcuts and related advice are coded separately from the places where they will apply, and are woven into the instruction stream, typically at runtime. One of the first thoughts I had was about a programmer's understanding of the functionality of a system designed and coded using AOP. How does a programmer understand the functionality and the flow of a program when they are distributed across different files with nothing to relate them? I had raised these questions in a review I once wrote for an AOP-related paper.

The answer is that a programmer doesn't need to understand the functionality of the entire program; if concerns are well separated, then he need only understand the code that implements the functionality he is responsible for, possibly in conjunction with other programmers who are also responsible for that functionality. You will notice that I am, more or less, equating functionality with concern. Advice can be woven into one concern only by another concern. A programmer doesn't need to care what advice is being woven in by other concerns; that is the whole idea behind separation of concerns.

In the Jan/Feb 2006 issue of IEEE Software magazine, there appears an article about AOP in the point-counterpoint section. The counterpoint argues that, unlike hierarchical design, AOP obscures the understanding of a software module. I don't believe that is the case, for the reasons outlined above. I view AOP as sitting on top of hierarchical design with the various aspects as peers. (Perhaps AOP itself can use some hierarchical design concepts as it matures.) Another argument made against AOP is that it is hard to identify non-trivial aspects: most examples describe rather trivial aspects such as logging, profiling, and synchronization. I find this argument valid. However, I believe this to be just an initial hump, which will be overcome as systems designers get used to thinking in terms of aspects. To conclude, I agree with AOP's detractors to the extent that it has yet to prove its value, but I also agree with its supporters that it will see widespread adoption.
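The weaving idea is easy to illustrate without an AOP language. Here is a sketch in Python that applies logging advice to methods matching a pointcut, entirely outside the class's own code; the class, the name-based pointcut, and the advice are illustrative simplifications of what AspectJ-style pointcuts express.

```python
import functools

def logging_advice(fn):
    """Advice: log entry to and exit from the join point."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        print(f"enter {fn.__name__}")
        result = fn(*args, **kwargs)
        print(f"exit {fn.__name__}")
        return result
    return wrapper

def weave(cls, pointcut, advice):
    """Apply advice to every method of cls whose name matches the pointcut."""
    for name in dir(cls):
        attr = getattr(cls, name)
        if callable(attr) and pointcut(name):
            setattr(cls, name, advice(attr))
    return cls

class Account:
    def __init__(self):
        self.balance = 0
    def deposit(self, amount):
        self.balance += amount

# Woven separately from Account's definition, as AOP prescribes: the
# business code above stays uncluttered by the logging concern.
weave(Account, pointcut=lambda name: name == "deposit", advice=logging_advice)

a = Account()
a.deposit(10)
assert a.balance == 10
```

This also illustrates the comprehension question raised above: nothing in Account's source hints that deposit is logged -- one must know which aspects are woven in.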

BTW, the paper I reviewed did not achieve a separation of concerns. In such cases, understanding program behavior would be a challenge second to none.