Monday, November 29, 2010

Hospital Data Mining Hits Paydirt

Hospital Data Mining Hits Paydirt: an interesting article on how hospitals are using data mining to identify revenue opportunities. I was very excited to see this, as it is something I have done for other verticals like transportation and logistics. Good to see good analytics practice moving to newer verticals. A truly exciting time to be in analytics!

-- Datamining_guy

Wednesday, November 24, 2010

Results from Another Datamining Competition

FICO, UCSD Announce Winners of International Predictive Analytics Competition - MarketWatch http://goo.gl/yGIQT

The competition asked participants to predict consumers' future purchases.


The competition was divided into two categories -- one category utilized raw data, and one category utilized transformed data -- and each category had a Graduate and Undergraduate division. The top three finishers in each category and each division shared $10,000 in cash prizes. The winners were:
Undergraduate Division - Raw Data Category
1st place         Shivam Juneja, Institute of Engineering and Technology, Bhaddal
                  (India)
2nd place         Benjamin Hamner, Duke University (USA)
3rd place         Rohan Anil, Birla Institute of Technology and Science, Pilani
                  (India)
Undergraduate Division - Transformed Data Category
1st place         Benjamin Hamner, Duke University (USA)
2nd place         Harsh Pareek, Chiraag Juvekar and Santosh Ananthakrishnan; Indian
                  Institute of Technology, Bombay (India)
3rd place         Rohan Anil, Birla Institute of Technology and Science, Pilani
                  (India)
Graduate Division - Raw Data Category
1st place         Quan Sun, University of Waikato (New Zealand)
2nd place         Alexey Gorodilov, Moscow Institute of Physics and Technology
                  (Russia)
3rd place         Jianfei Wu, North Dakota State University (USA)
Graduate Division - Transformed Data Category
1st place         Santi Villalba, University College Dublin (Ireland)
2nd place         Jaeyong Lee, Pohang University of Science and Technology (Korea)
3rd place         Ilhwan Ko, Pohang University of Science and Technology (Korea)

--datamining_guy

Tuesday, November 23, 2010

Hearst Challenge Update

We are entering the home stretch of the competition. The last time I checked, we had 700 teams registered. Also, after a long time, the leaderboard saw some movement at the top, with "alegro" from Ukraine taking the top spot. One of the most exciting things for me has been the rich diversity of the participants. Here is a recent breakdown by country:

Country                     Number of Participants
United States               309
India                       66
Information not available   41
Canada                      33
Australia                   29
United Kingdom              27
Taiwan                      17
China                       11
Hungary                     9
Spain                       8
New Zealand                 7
Brazil                      6
France                      5
Germany                     5
Netherlands                 5
Poland                      5
Russian Federation          5
South Africa                5
Denmark                     4
Israel                      3
Mexico                      3
Slovenia                    3
Sweden                      3
Austria                     2
Chile                       2
France, Metropolitan        2
Indonesia                   2
Iran                        2
South Korea                 2
Turkey                      2
United Arab Emirates        2
Afghanistan                 1
American Samoa              1
Argentina                   1
Bangladesh                  1
Bosnia and Herzegovina      1
Bulgaria                    1
Colombia                    1
Ecuador                     1
Egypt                       1
Finland                     1
Guatemala                   1
Hong Kong                   1
Japan                       1
Kuwait                      1
Luxembourg                  1
Malaysia                    1
Pakistan                    1
Portugal                    1
Romania                     1
Singapore                   1
Sri Lanka                   1
Sudan                       1
Switzerland                 1
Uganda                      1
Ukraine                     1
Viet Nam                    1
Grand Total                 651


A total of 58 countries! Watch this space for more updates.

--Datamining_guy

Wednesday, November 17, 2010

Dynamic Pricing -- The Customer Antidote!

Because I worked in the transportation industry before getting into consulting, the idea of dynamic pricing has always greatly appealed to me. Space on a flight, truck or train is a perishable commodity and can be priced based on capacity, customers' willingness to pay and market conditions.
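To make the idea concrete, here is a toy Python sketch of pricing perishable capacity: the fare rises as the vehicle fills up and as departure approaches. The function name, the multipliers, the 30-day window and the `demand_index` stand-in for market conditions are all hypothetical assumptions chosen for illustration; this is not how any commercial revenue-management system computes fares.

```python
def dynamic_price(base_fare: float, seats_sold: int, capacity: int,
                  days_to_departure: int, demand_index: float = 1.0) -> float:
    """Toy fare for perishable capacity (all multipliers are hypothetical)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not 0 <= seats_sold <= capacity:
        raise ValueError("seats_sold must be between 0 and capacity")
    # Share of the perishable capacity already sold: scarcity pushes the price up to 2x.
    load_factor = seats_sold / capacity
    scarcity = 1.0 + load_factor
    # Urgency kicks in during the final 30 days and also reaches 2x at departure.
    urgency = 1.0 + max(0, 30 - days_to_departure) / 30
    # demand_index is a stand-in for market conditions / willingness to pay.
    return round(base_fare * scarcity * urgency * demand_index, 2)


print(dynamic_price(100, seats_sold=0, capacity=100, days_to_departure=60))   # empty, far out
print(dynamic_price(100, seats_sold=50, capacity=100, days_to_departure=15))  # half full, two weeks out
print(dynamic_price(100, seats_sold=100, capacity=100, days_to_departure=0))  # full, departing today
```

Multiplying the factors keeps each driver's effect easy to read; real systems forecast demand by fare class rather than applying fixed multipliers.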

Back in 2004 my employer was getting into dynamic pricing, so I spent a fair amount of time understanding PROS and even attended their annual event in Houston. Between them, PROS and SABRE served almost the entire airline industry and supplied it with dynamic pricing capability.

The timing of the transaction has a big role to play in any dynamic pricing scheme, and for a long time buyers of transportation services (especially consumers, but also businesses) have tried to figure out how dynamic pricing schemes work and how to outsmart them.

While I have heard of one-off cases of success at this, I had never seen a systematic effort to understand it until now. Check this out: A datamining approach to outsmarting dynamic pricing. It is now integrated with Bing's travel functionality.
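The simplest possible version of a "buy now or wait" predictor can be sketched in Python: fit a least-squares trend to recent fares and recommend waiting while prices are falling. The function names, the threshold and the trend rule are hypothetical assumptions for illustration; this is not the model behind the Bing feature, whose details are not covered here.

```python
from statistics import mean


def price_trend(prices: list[float]) -> float:
    """Least-squares slope of fare versus observation index (fare change per observation)."""
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two fare observations")
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(prices)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, prices))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def buy_or_wait(prices: list[float], threshold: float = 0.0) -> str:
    """Hypothetical rule: wait while fares fall faster than `threshold` per observation."""
    return "wait" if price_trend(prices) < -threshold else "buy"


# Hypothetical daily fares for one route, oldest first.
print(buy_or_wait([320, 310, 305, 290]))  # falling fares -> "wait"
print(buy_or_wait([250, 260, 275]))       # rising fares -> "buy"
```

A real fare predictor would use many more features (days to departure, route, carrier, seasonality) and a trained model; the linear trend only shows the shape of the decision.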

Now I am waiting for the reaction from the dynamic pricing engines, and the counter-reaction. Let the games begin!

--Datamining_guy

Friday, November 12, 2010

Top 25 Articles in Economics -- By downloads

A nostalgic list for the economist in me:


Rank. Journal Article
   Author(s) (Total File Downloads)

1. The Market for 'Lemons': Quality Uncertainty and the Market Mechanism
   George A. Akerlof (20,316)
2. The Pricing of Options and Corporate Liabilities
   Fischer Black and Myron S. Scholes (18,940)
3. Prospect Theory: An Analysis of Decision under Risk
   Daniel Kahneman and Amos Tversky (13,839)
4. Credit Rationing in Markets with Imperfect Information
   Joseph Stiglitz and Andrew Weiss (11,435)
5. Increasing Returns and Long-run Growth
   Paul Michael Romer (9,187)
6. Co-integration and Error Correction: Representation, Estimation, and Testing
   Robert F. Engle and Clive W. J. Granger (9,145)
7. A Contribution to the Empirics of Economic Growth
   N. Gregory Mankiw, David Romer and David Weil (8,801)
8. Theory of Rational Option Pricing
   Robert C. Merton (8,569)
9. Common risk factors in the returns on stocks and bonds
   Eugene F. Fama and Kenneth French (8,385)
10. Agency Problems and the Theory of the Firm
   Eugene F. Fama (8,053)
11. Corruption and Growth
   Paolo Mauro (7,751)
12. The pyramid of corporate social responsibility: Toward the moral management of organizational stakeholders
   Archie B. Carroll (7,228)
12. Efficient Capital Markets: A Review of Theory and Empirical Work
   Eugene F. Fama (7,228)
14. A Theory of the Term Structure of Interest Rates
   John C. Cox, Jonathan E. Ingersoll and Stephen A. Ross (7,197)
15. Endogenous Technological Change
   Paul Michael Romer (7,176)
16. Finance and Growth: Schumpeter Might Be Right
   Robert King and Ross Levine (6,968)
17. Event Studies in Economics and Finance
   A. Craig MacKinlay (6,823)
18. Time to Build and Aggregate Fluctuations
   Finn E. Kydland and Edward C. Prescott (6,460)
19. A Model of Balance-of-Payments Crises
   Paul Krugman (6,387)
20. Expectations and Exchange Rate Dynamics
   Rudiger Dornbusch (6,380)
21. Rules Rather Than Discretion: The Inconsistency of Optimal Plans
   Finn E. Kydland and Edward C. Prescott (6,262)
22. The Cross-Section of Expected Stock Returns
   Eugene F. Fama and Kenneth French (6,171)
23. The Costs and Benefits of Ownership: A Theory of Vertical and Lateral Integration
   Sanford Jay Grossman and Oliver D. Hart (6,148)
24. Production, Information Costs, and Economic Organization
   Armen A. Alchian and Harold Demsetz (6,115)
25. Economic Growth in a Cross Section of Countries
   Robert J. Barro (6,012)

JMP 9 Tree Functionality

I have been a heavy user of CART (Salford Systems) for the past 6-7 years and really love it. Today I had the opportunity to see the Tree functionality that comes with JMP 9, and I must say I am very impressed!

While the visuals are not very appealing (why? It contradicts the otherwise visually rich feel of JMP), the functionality is very good. I particularly loved the ability to prune and shape any node the way I want and really control the overall tree. I have always maintained that this can be a very dangerous feature, and I would not recommend it for novices, but in the hands of an experienced analyst the inferences are going to be so much richer! Love it.

Now waiting for JMP to incorporate the multi-way splits available in Knowledge Seeker (Angoss)!
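To show what a multi-way split buys you, here is a small Python sketch comparing a CART-style binary split of a categorical variable (the levels grouped into two children) with a multi-way split (one child per level). Both are scored with Gini impurity purely for comparison; the region data are made up, and neither JMP's nor Knowledge Seeker's actual split algorithm is modeled here.

```python
from itertools import combinations


def gini(counts: dict[str, int]) -> float:
    """Gini impurity of one node, given its class counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def merge(nodes) -> dict[str, int]:
    """Pool the class counts of several category levels into one child node."""
    pooled: dict[str, int] = {}
    for node in nodes:
        for label, count in node.items():
            pooled[label] = pooled.get(label, 0) + count
    return pooled


def weighted_gini(children: list[dict[str, int]]) -> float:
    """Size-weighted impurity of a candidate split (lower is purer)."""
    n = sum(sum(child.values()) for child in children)
    return sum(sum(child.values()) / n * gini(child) for child in children)


def best_binary_split(levels: dict[str, dict[str, int]]):
    """CART-style: try every grouping of the levels into exactly two children."""
    names = sorted(levels)
    best = None
    for size in range(1, len(names)):
        for left in combinations(names, size):
            right = [name for name in names if name not in left]
            score = weighted_gini([merge(levels[name] for name in left),
                                   merge(levels[name] for name in right)])
            if best is None or score < best[0]:
                best = (score, set(left), set(right))
    return best


def multiway_split(levels: dict[str, dict[str, int]]) -> float:
    """Multi-way: one child per level, all in a single split."""
    return weighted_gini(list(levels.values()))


# Hypothetical response counts by region, for illustration only.
levels = {
    "north": {"yes": 40, "no": 10},
    "south": {"yes": 10, "no": 40},
    "west": {"yes": 25, "no": 25},
}
score, left, right = best_binary_split(levels)
print(f"binary: {sorted(left)} vs {sorted(right)}, weighted Gini {score:.3f}")
print(f"multi-way: weighted Gini {multiway_split(levels):.3f}")
```

With these counts the best binary split scores 0.410 and the multi-way split 0.380. A multi-way split can never score worse than a binary grouping of the same levels, since pooling levels cannot make a child purer; its practical appeal is that one split separates all three regions, where a binary tree needs a second split on the same variable.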



-- Datamining_guy

Thursday, November 11, 2010

Hearst Challenge Update

Here is the latest on Hearst Challenge:

600+ teams now registered

We have moved the release date of the final evaluation dataset to December 1.

There was some controversy last night when the current leader decided to hang up his boots and posted his code for the world to see. On one hand, sharing knowledge does lead to the development of superior models; but take the thought to the other extreme and it would not be much of a competition if every team were required to share its code and methodology with all the participants. On balance, given the short duration of the competition, I wish the code had not been posted. Anyway, it is water under the bridge and the show goes on :)

-- Datamining_guy

Tuesday, November 9, 2010

The future of Datamining in Medical Diagnosis?

I have long wondered about the relevance of datamining in medical diagnosis. Think about what a doctor does: collects data by examining the patient and through pathological and radiographic tests, and on the basis of these data points makes an inference, or diagnosis, about the ailment afflicting the patient. So much in common with the basic tenets of datamining!

I got a first glimpse of this about 4 years back during the 2006 KDD CUP on pulmonary embolism, sponsored by Siemens. The focus of the competition was to assist in automated detection of the disease, with separate problems related to false positives and false negatives. The Holy Grail of the competition was an algorithm to predict with 100% certainty whether a patient was healthy!
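The false positive and false negative trade-off is usually summarised with a confusion matrix. A minimal Python sketch with made-up counts (not competition data) is below; in these terms, "predicting with 100% certainty that a patient is healthy" corresponds to a negative predictive value (NPV) of 1.0, which requires zero false negatives.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Screening metrics from a 2x2 confusion matrix (all denominators must be non-zero)."""
    return {
        # Share of sick patients the model flags; false negatives pull this down.
        "sensitivity": tp / (tp + fn),
        # Share of healthy patients the model clears; false positives pull this down.
        "specificity": tn / (tn + fp),
        # Of the patients flagged as sick, the share who really are.
        "ppv": tp / (tp + fp),
        # Of the patients declared healthy, the share who really are.
        "npv": tn / (tn + fn),
    }


# Hypothetical counts for 1,000 screened patients.
metrics = diagnostic_metrics(tp=45, fp=50, tn=900, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Pushing NPV to 1.0 usually means flagging more healthy patients as possibly sick, so specificity and PPV tend to fall; that is why the two error types were treated as separate problems.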

The 2008 KDD CUP was again related to this area but dealt with breast cancer.


It seems that, like everything else in analytics, IBM is dabbling in this space as well. Check out this very interesting article in the Atlantic. As the article indicates, a big hurdle here is getting practitioners to embrace the technology.

-- datamining_guy

Tuesday, November 2, 2010

Hearst Challenge Update

The competition at the Hearst Challenge is heating up, and 530+ teams are now registered. A pack of about 5 teams has now separated slightly from the rest of the field; they represent Australia, the US, the UK and China.

Here are the top 5 as of this afternoon:

In other news, we have started planning for next year's competition!

--datamining_guy

Monday, November 1, 2010

Top 10 Things in the IT/Analytics World for 2011!

Here is a very interesting article from Gartner's Symposium earlier this month. Look at numbers 5 and 6 on the list.

Next-generation analytics: basically real-time analytics, where every business function is supported by automated analysis and predictions about the future!

Social analytics: a broad lumping together of social media and social network analytics!

I agree with both, and I think a big application will be the union of the two (see, for example, my earlier posting on FourSquare).

One thing not on the list, but which has the potential to be very big, is speech analytics.

-- Datamining_guy