Hospital Data Mining Hits Paydirt -- interesting article on how hospitals are using data mining to identify revenue opportunities. I was very excited to see this, as it is something I have done for other verticals like transportation and logistics. Good to see good analytics practice moving to newer verticals. A truly exciting time to be in analytics!
-- Datamining_guy
Monday, November 29, 2010
Wednesday, November 24, 2010
results from another datamining competition
FICO, UCSD Announce Winners of International Predictive Analytics Competition - MarketWatch http://goo.gl/yGIQT
The competition asked participants to predict future purchases for consumers.
The competition was divided into two categories -- one category utilized raw data, and one category utilized transformed data -- and each category had a Graduate and Undergraduate division. The top three finishers in each category and each division shared $10,000 in cash prizes. The winners were:
Undergraduate Division - Raw Data Category
1st place Shivam Juneja, Institute of Engineering and Technology, Bhaddal (India)
2nd place Benjamin Hamner, Duke University (USA)
3rd place Rohan Anil, Birla Institute of Technology and Science, Pilani (India)
Undergraduate Division - Transformed Data Category
1st place Benjamin Hamner, Duke University (USA)
2nd place Harsh Pareek, Chiraag Juvekar and Santosh Ananthakrishnan; Indian Institute of Technology, Bombay (India)
3rd place Rohan Anil, Birla Institute of Technology and Science, Pilani (India)
Graduate Division - Raw Data Category
1st place Quan Sun, University of Waikato (New Zealand)
2nd place Alexey Gorodilov, Moscow Institute of Physics and Technology (Russia)
3rd place Jianfei Wu, North Dakota State University (USA)
Graduate Division - Transformed Data Category
1st place Santi Villalba, University College Dublin (Ireland)
2nd place Jaeyong Lee, Pohang University of Science and Technology (Korea)
3rd place Ilhwan Ko, Pohang University of Science and Technology (Korea)
--datamining_guy
Tuesday, November 23, 2010
Hearst Challenge Update
We are entering the home stretch of the competition. The last time I checked we had 700 teams registered. Also, after a long time, the leaderboard saw some movement at the top, with "alegro" from Ukraine taking the top spot. One of the most exciting things for me has been the rich diversity in the participants. Here is a recent breakdown by country:
Country | Number of Participants |
United States | 309 |
India | 66 |
Information not available | 41 |
Canada | 33 |
Australia | 29 |
United Kingdom | 27 |
Taiwan | 17 |
China | 11 |
Hungary | 9 |
Spain | 8 |
New Zealand | 7 |
Brazil | 6 |
France | 5 |
Germany | 5 |
Netherlands | 5 |
Poland | 5 |
Russian Federation | 5 |
South Africa | 5 |
Denmark | 4 |
Israel | 3 |
Mexico | 3 |
Slovenia | 3 |
Sweden | 3 |
Austria | 2 |
Chile | 2 |
France, Metropolitan | 2 |
Indonesia | 2 |
Iran | 2 |
South Korea | 2 |
Turkey | 2 |
United Arab Emirates | 2 |
Afghanistan | 1 |
American Samoa | 1 |
Argentina | 1 |
Bangladesh | 1 |
Bosnia and Herzegovina | 1 |
Bulgaria | 1 |
Colombia | 1 |
Ecuador | 1 |
Egypt | 1 |
Finland | 1 |
Guatemala | 1 |
Hong Kong | 1 |
Japan | 1 |
Kuwait | 1 |
Luxembourg | 1 |
Malaysia | 1 |
Pakistan | 1 |
Portugal | 1 |
Romania | 1 |
Singapore | 1 |
Sri Lanka | 1 |
Sudan | 1 |
Switzerland | 1 |
Uganda | 1 |
Ukraine | 1 |
Viet Nam | 1 |
Grand Total | 651 |
A total of 58 countries! Watch this space for more updates.
--Datamining_guy
Wednesday, November 17, 2010
Dynamic Pricing -- The Customer Antidote!
Having worked in the transportation industry before getting into consulting, I have always found the idea of dynamic pricing very appealing. The space on a flight, truck or train is a perishable commodity, and can be priced based on capacity, customers' willingness to pay and market conditions.
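To make the idea concrete, here is a toy sketch in Python. It is emphatically not how PROS, SABRE or any airline prices a seat; the function name, the markup rule and every parameter are assumptions invented for illustration. It only shows the two levers mentioned above: remaining capacity, and timing as a rough proxy for willingness to pay.

```python
def quote_price(base_price, seats_total, seats_sold, days_to_departure,
                max_markup=1.0, late_window_days=14):
    """Toy price for one seat. All names and rules here are hypothetical."""
    if not 0 <= seats_sold < seats_total:
        raise ValueError("invalid seat count, or nothing left to sell")

    # Capacity: the fuller the vehicle, the scarcer (and dearer) the space.
    load_factor = seats_sold / seats_total
    scarcity = 1.0 + max_markup * load_factor

    # Timing: late bookers are assumed to be willing to pay more.
    urgency = 1.25 if days_to_departure <= late_window_days else 1.0

    return round(base_price * scarcity * urgency, 2)


if __name__ == "__main__":
    print(quote_price(200, 100, 0, 60))   # empty flight, booked early
    print(quote_price(200, 100, 50, 60))  # half full, booked early
    print(quote_price(200, 100, 50, 7))   # half full, booked late
```

A real engine would replace both multipliers with demand forecasts and competitor fares, which is exactly the part buyers try to second-guess.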
Back in 2004 my employer was getting into dynamic pricing, so I spent a fair amount of time understanding PROS and even attended their annual event in Houston. Between them, PROS and SABRE served almost the entire airline industry and supplied it with its dynamic pricing capability.
Timing of the transaction has a big role to play in any dynamic pricing scheme, and for a long time the buyers of transportation services (especially consumers, but also businesses) have tried to understand how the dynamic pricing scheme works and to outsmart it.
While I have heard of one-off cases of success at this, I had never seen a systematic effort to understand it until now. Check this out: A datamining approach to outsmarting dynamic pricing. It is now integrated with Bing's travel functionality.
Now I am waiting for the reaction from the dynamic pricing engines and the counter reaction. Let the games begin!
--Datamining_guy
Friday, November 12, 2010
Top 25 Articles in Economics -- By downloads
A nostalgic list for the Economist in me:
JMP 9 Tree Functionality
I have been a heavy user of CART (Salford Systems) for the past 6-7 years and really love it. Today I had the opportunity to see the Tree functionality that comes with JMP 9, and I must say I am very impressed!
While the visuals are not very appealing (why? This contradicts the otherwise visually rich appeal of JMP), the functionality is very good. I particularly loved the ability to prune and shape any node the way I want to and really control the overall tree. I have always maintained that this can be a very dangerous functionality, and I would not recommend it for novices, but in the hands of an experienced analyst the inferences are going to be so much richer! Love it.
Now waiting for JMP to incorporate the multi-way splits available with Knowledge Seeker (Angoss)!
-- Datamining_guy
Thursday, November 11, 2010
Hearst Challenge Update
Here is the latest on Hearst Challenge:
600+ teams now registered
We have moved the date when the final evaluation dataset will be available to December 1.
There was some controversy last night, when the current leader decided to hang up his boots and posted his code for the world to see. On one hand, it is true that sharing knowledge leads to the development of a superior model, but if you take the thought to the other extreme, it would not be much of a competition if every team was required to share its code and methodology with all the participants. On balance, given the short duration of the competition, I wish the code had not been posted. Anyway, it is water under the bridge and the show goes on --- :)
-- Datamining_guy
Tuesday, November 9, 2010
The future of Datamining in Medical Diagnosis?
I have long wondered about the relevance of datamining in medical diagnosis. Think about what a doctor does: collects data by examining the patient and through pathological and radiographic tests, and on the basis of these data points makes an inference or diagnosis about the ailment afflicting the patient. So much in common with the basic tenets of datamining!
I got a first glimpse of this about 4 years back during the 2006 KDD CUP related to Pulmonary Embolism sponsored by Siemens. The focus of the competition was to assist in automated detection of the disease, with separate problems related to false positives and false negatives. The Holy Grail of the competition was an algorithm to predict with 100% certainty if a patient was healthy!
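For readers who want the false positive / false negative distinction spelled out, here is a small Python sketch. It is my own illustration, not the competition's official scoring, and the function name and toy labels are made up. The "healthy with 100% certainty" goal corresponds to what is usually called negative predictive value: of all the patients the model calls healthy, what fraction really are.

```python
def screening_metrics(y_true, y_pred):
    """Count errors for a binary screen. Labels: 1 = disease, 0 = healthy.

    Hypothetical helper for illustration only.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1  # healthy patient flagged as sick
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1  # sick patient missed

    return {
        "false_positives": fp,
        "false_negatives": fn,
        # Share of truly sick patients the screen catches.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        # Share of "healthy" calls that are actually healthy.
        "npv": tn / (tn + fn) if (tn + fn) else None,
    }


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    calls = [1, 1, 1, 1, 0, 0, 0, 0]  # one false alarm, no misses
    print(screening_metrics(truth, calls))
```

In the toy data the model raises one false alarm but never misses a sick patient, so every "healthy" call can be trusted. That is the trade-off a screening tool would typically aim for.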
The 2008 KDD CUP competition was again related to this area but dealt with breast cancer.
It seems that, like everything else in analytics, IBM is dabbling in this space as well. Check out this very interesting article in the Atlantic. As the article indicates, a big hurdle here is getting practitioners to embrace the technology.
-- datamining_guy
Tuesday, November 2, 2010
Hearst Challenge Update
The competition at the Hearst Challenge is heating up and 530+ teams are now registered. A pack of about 5 teams is now slightly separated from the rest of the field, representing Australia, the US, the UK and China.
Here are the top 5 as of this afternoon:
In other news, we have started planning for next year's competition!
--datamining_guy
Monday, November 1, 2010
Top 10 things in IT/Analytics related world for 2011!
Here is a very interesting article from the Gartner Symposium earlier this month. Look at numbers 5 and 6 on the list.
Next generation Analytics --- Which is basically real-time analytics where every business function is supported by automated analysis and predictions about the future!
Social Analytics --- A broad lumping together of social media and social network related analytics!
I agree with both, and I think a big application would be the union of the two (see, for example, my earlier posting on FourSquare).
One thing not on the list, but which has the potential to be very big, is Speech Analytics.
-- Datamining_guy