Monday, June 8, 2009

One-liners from Reviews

As I have mentioned before, the paper that is now an accepted publication at ACL-IJCNLP 2009 went through nine review cycles. In each of those cycles I received some abusive reviews. In some of them I also received wonderfully positive reviews (with a sugar candy attached). In all of them I received neutral reviews (yes, all of them!).

I thought I could share with you some of the most abusive, some of the sweetest, and some stupidly neutral reviews (the kind that help neither acceptance, nor rejection, nor the betterment of the paper!). What follows are quotes from the reviews, and I hope the reviewers themselves never read this some day. ;-) It doesn't matter to me at all: they were all anonymous, so if they flame me, it means they are revealing their identity. :D


Abusive

  • I was, unfortunately, unable to identify the particular contribution that this paper makes to this body of literature.
  • First, I was unable to determine exactly what the authors meant by "query focus" in their discussion.
  • The paper studies the role of query-relevance in multi-document summarization, and makes several findings: (1) ..... (2) ........ (3) .......
    • I find (1) and (2) particularly trivial and not very interesting
    • Finding (3) is perhaps less expected, though still makes quite a bit of sense.
  • I believe that this paper should not be accepted for publication because the experiments described do not support in any way the authors’ aim – which is to determine whether or not query focus affects sentence selection in MDS.
  • The feature suggested is trivial and does not add anything to the set of features already in use in extractive summarisation. The authors ignore decades of research in IR aimed at finding non-obvious matches of query-terms.
  • The observations made in Table 2 are not only trivial and non-surprising, but even absurd.
  • This paper is quite confusing. I was not always entirely sure of the point it was trying to make.
  • Modeling this as a likelihood is interesting, but perhaps complicates the issue somewhat (presumably a simple count of proportions, which can also be tested statistically, would do?)
  • This is not particularly surprising. Some discussion (even speculative) of the implications of this -- i.e. what are the humans doing that the systems are not -- would make this poster much more interesting.
  • I think the fundamental notion behind this paper is sound and interesting, but a lot of the analysis is flawed.
  • Theoretical analyses of data sets used in past work are also interesting, especially if they are thoroughly studied and written up well, but this paper tends to be somewhat hard to read and doesn't (at least to this reviewer) yield an "a ha!" moment at the end.
  • The results seem obvious, and the question does not seem to justify the effort involved in quantifying the differences in query bias between human and automated summaries. Some aspects of the data and the method are questionable.
  • The methodology used in the paper is not wrong, but the outcomes are rather obvious and it is difficult how the findings can prove useful to the research community.
  • One of the conclusions is that computers and humans use different strategies to produce summaries. This is well-known and it is not necessary to use statistics for this. The authors try to suggest that the difference between human summarisation and automatic summarisation is the way a query is handled, but no support for this claim is offered.
  • This paper deals with an interesting issue which is the distinction between query-focused and query-biased summarization. However, it fails for several main reasons:
    • It fails to define the problem and to make the distinction between the query-focused and query-biased summarization. The problem itself is ill-defined.
    • the methods used and the formalization are weak.
    • the justification is irrelevant
    • the results are too lousy and there is no funding at all.

Sweet

  • This is interesting work. You need to stay focused in the conclusions and indicate the significance for automatic text summarisation of feature analysis as a predictor of content.
  • This is an interesting paper. Using the appropriate literature, the paper analyzes the role of query focus in summarization performance on the DUC 2007 task. The analysis is insightful and could become a standard reference, if the presentation of the experiments was clearer and their interpretation less confusing.
  • Sect 6.3: this is the ultimate insight of this paper, please elaborate!
  • Clearly-written paper on an important topic. The gap between human and machine produced summaries is very interesting.
  • The work is interesting, supported with good evidence and is well presented. I support its inclusion in the conference as a short paper.
  • The paper is very well written, and provides a useful comparison between human and machine approaches to query focused automatic summarisation.

Neutral

  • I particularly liked the introduction of an equiprobable summarizer, which allows the impact of query-focus to be observed in a system setting, in addition to observations from the existing data sources.
  • This definitely an interesting and worthwhile topic -- and one that has attracted plenty of attention from researchers working in both the MDS and QA communities.
  • While I commend the authors for noting that their results will not be of use to systems that are performing well, the results seem to me to be too weak in general to be of interest.
  • The investigation reported is interesting, but in some sense it is obvious.
  • The conclusion is interesting, however, not surprising.


I have no complaints (now) about any of the reviews. Some of the abusive reviews were not wrong after all; they are correct from the reviewers' point of view, in their context. Of course, the one I would want to fight against is the last abusive statement above. If the reviewer doesn't understand the problem, then that is partly his problem; he can't have high confidence in his review, can he? Anyway, this post is only meant to show the variation in human perception of what people read. Everybody made their judgment. Everybody gave their verdict. But what is the truth? Whether or not this paper is useful to the research community can be decided a few years from now, based on whether or not someone outside our group cites it. I am patient enough to wait. :-)


PS: There is a marginally related discussion on the natural language processing blog: How to reduce reviewing overhead?

PS2: There is another recent related discussion on the corpora-list, where Adam Kilgarriff raised issues with the current reviewing process and accused it of wasting time. Others responded rather critically, saying "There is no simpler way to get author feedback and no known simple way to get away from awful reviewing!"

PS3: Another post on the art of reviewing seems worthwhile now. Maybe I will come back to it some day.

1 comment:

Mohammed Babji said...

Too much depth ... :) :)
Never imagined that the chosen crystal, before it is worn on the throne, is subjected to pass through so many filters and at last selected for such high regard,
...showing the justification for acceptance.