Wednesday, December 29, 2010
Wednesday, December 22, 2010
Teradata to acquire Aprimo for $525 million :: BtoB Magazine
Teradata finally catches up with IBM's UNICA acquisition.
Tuesday, December 21, 2010
Using a Computer to Fight Medicare Fraud - WSJ.com
A very interesting article in the WSJ on how data mining techniques are being used to fight Medicare fraud.
The California example in the article is particularly interesting: it is very similar to work done in the credit card space.
Thursday, December 16, 2010
Hearst Challenge Update
The Hearst Challenge session at the NCDM2010 conference went off very well yesterday! The three finalist teams, MIRACLE (Xiaoshi Lu), One Million Monkeys (Eric Jackson) and A^3 (Aleksey Fadeev, Aleksey Ashkimin and Arthur Abdullin), did an excellent job with the presentations! All finalists were presented with beautifully crafted crystal trophies from Tiffany.
Congratulations to A^3 for winning the grand prize of $25,000!
Here are some details on the tools and techniques used by the participants:
Looking forward to next year's competition!
--Datamining_guy
Foursquare looks to bolster digital marketing capabilities with data mining | RICG
Looks like Foursquare is getting into the Amazon/Netflix-type, collaborative-filtering-based recommendations business:
Foursquare looks to bolster digital marketing capabilities with data mining RICG
Friday, December 10, 2010
Civil Liberties and Datamining?
As the use of data mining to make better (or more profitable) policy and business decisions gains ground, an undercurrent of concern about improper use of data is also developing.
I recently posted an article about this on the Analytics Happenings LinkedIn group, which deals with the controversy related to using data mining for medical marketing.
Here is another interesting article from The Constitution Project that echoes some of these concerns.
Basically, there is a call to have civil liberties and privacy law concerns baked into policy-related data mining.
Opposites Agree on Data Mining's Importance and the Need for Controls Security Management
For those working in the lending or insurance industries, the call for these restrictions might not be anything new, as some of them are already in place there.
However, I hope this undercurrent of concern does not slow down or kill the adoption of analytics in newer areas.
Here is the link to the original report:
http://www.constitutionproject.org/pdf/DataMiningPublication.pdf
Thursday, December 9, 2010
Thursday, December 2, 2010
Can Crowdsourcing be an alternative to traditional Consulting?
During the course of the Hearst Challenge, several acquaintances have commented on the beauty of the analytics competition business model. By putting up amounts that are significantly less than normal consulting fees, companies can get a large number of people to work on a problem that is of interest to them.
While a lot of this is true, I do see some drawbacks of the competition or crowdsourcing approach:
- To the extent organizations invest in analytics to gain a competitive advantage, the crowdsourcing approach has the disadvantage that it is harder to keep a secret in a crowd. You might swear the winner to secrecy, but what about the guy who almost won?
- Data confidentiality is another issue. As a consultant, I have always seen clients being very sensitive about giving others access to their data. Therefore, they will be very reluctant to post really sensitive or important data in a public forum.
- Another problem with the crowdsourcing model is that it is a winner-take-all system, putting a lot of risk on the participant. For example, 750 teams participated in the Hearst Challenge and put in 6 weeks of effort, but only 1 will get the $25k prize. Therefore, for someone to be willing to put in that kind of effort, they must be either doing it part time in their spare time, or just starting out. So the majority of full-time participants in these competitions are likely to be students or organizations looking to make a name for themselves. This might limit scalability.
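The risk in the last point can be put into rough numbers. The sketch below uses the figures quoted above (750 teams, 6 weeks, one $25k prize) and assumes, purely for illustration, that every team has an equal chance of winning, which is certainly not true in practice.

```python
# Figures quoted in the post.
TEAMS = 750
PRIZE_USD = 25_000
WEEKS_OF_EFFORT = 6

# Assumption (not from the post): every team is equally likely to win.
win_probability = 1 / TEAMS

# Expected payout for a single team, overall and per week of effort.
expected_prize = PRIZE_USD * win_probability
expected_prize_per_week = expected_prize / WEEKS_OF_EFFORT

# Total effort the sponsor receives, measured in team-weeks.
total_team_weeks = TEAMS * WEEKS_OF_EFFORT

print(f"Expected prize per team: ${expected_prize:.2f}")
print(f"Expected prize per team-week: ${expected_prize_per_week:.2f}")
print(f"Total effort received: {total_team_weeks} team-weeks")
```

Under that equal-odds assumption the expected payout is tiny compared with the effort involved, which fits the point that participants are mostly motivated by something other than the prize money.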
These drawbacks do not mean that this is not a viable way of solving business problems, just that some more thinking and improvisation might be needed to make it scalable. Otherwise it might remain a niche strategy, though definitely a very enticing one.
--Datamining_guy
Monday, November 29, 2010
Hospital Data Mining Hits Paydirt
Hospital Data Mining Hits Paydirt -- interesting article on how hospitals are using data mining to identify revenue opportunities. I was very excited to see this, as it is something I have done for other verticals like transportation and logistics. Good to see good analytics practice moving to newer verticals. A truly exciting time to be in analytics!
-- Datamining_guy
Wednesday, November 24, 2010
Results from another datamining competition
FICO, UCSD Announce Winners of International Predictive Analytics Competition - MarketWatch http://goo.gl/yGIQT
The competition asked participants to predict future purchases for consumers.
The competition was divided into two categories -- one category utilized raw data, and one category utilized transformed data -- and each category had a Graduate and Undergraduate division. The top three finishers in each category and each division shared $10,000 in cash prizes. The winners were:
Undergraduate Division - Raw Data Category
1st place Shivam Juneja, Institute of Engineering and Technology, Bhaddal (India)
2nd place Benjamin Hamner, Duke University (USA)
3rd place Rohan Anil, Birla Institute of Technology and Science, Pilani (India)
Undergraduate Division - Transformed Data Category
1st place Benjamin Hamner, Duke University (USA)
2nd place Harsh Pareek, Chiraag Juvekar and Santosh Ananthakrishnan; Indian Institute of Technology, Bombay (India)
3rd place Rohan Anil, Birla Institute of Technology and Science, Pilani (India)
Graduate Division - Raw Data Category
1st place Quan Sun, University of Waikato (New Zealand)
2nd place Alexey Gorodilov, Moscow Institute of Physics and Technology (Russia)
3rd place Jianfei Wu, North Dakota State University (USA)
Graduate Division - Transformed Data Category
1st place Santi Villalba, University College Dublin (Ireland)
2nd place Jaeyong Lee, Pohang University of Science and Technology (Korea)
3rd place Ilhwan Ko, Pohang University of Science and Technology (Korea)
--datamining_guy
Tuesday, November 23, 2010
Hearst Challenge Update
We are entering the home stretch of the competition. The last time I checked we had 700 teams registered. Also, after a long time the leaderboard saw some movement at the top, with "alegro" from Ukraine taking the top spot. One of the most exciting things for me has been the rich diversity of the participants. Here is a recent breakdown by country:
Country | Number of Participants |
United States | 309 |
India | 66 |
Information not available | 41 |
Canada | 33 |
Australia | 29 |
United Kingdom | 27 |
Taiwan | 17 |
China | 11 |
Hungary | 9 |
Spain | 8 |
New Zealand | 7 |
Brazil | 6 |
France | 5 |
Germany | 5 |
Netherlands | 5 |
Poland | 5 |
Russian Federation | 5 |
South Africa | 5 |
Denmark | 4 |
Israel | 3 |
Mexico | 3 |
Slovenia | 3 |
Sweden | 3 |
Austria | 2 |
Chile | 2 |
France, Metropolitan | 2 |
Indonesia | 2 |
Iran | 2 |
South Korea | 2 |
Turkey | 2 |
United Arab Emirates | 2 |
Afghanistan | 1 |
American Samoa | 1 |
Argentina | 1 |
Bangladesh | 1 |
Bosnia and Herzegovina | 1 |
Bulgaria | 1 |
Colombia | 1 |
Ecuador | 1 |
Egypt | 1 |
Finland | 1 |
Guatemala | 1 |
Hong Kong | 1 |
Japan | 1 |
Kuwait | 1 |
Luxembourg | 1 |
Malaysia | 1 |
Pakistan | 1 |
Portugal | 1 |
Romania | 1 |
Singapore | 1 |
Sri Lanka | 1 |
Sudan | 1 |
Switzerland | 1 |
Uganda | 1 |
Ukraine | 1 |
Viet Nam | 1 |
Grand Total | 651 |
A total of 58 countries! Watch this space for more updates.
--Datamining_guy
Wednesday, November 17, 2010
Dynamic Pricing -- The Customer Antidote!
Having worked in the transportation industry before getting into consulting, I have always found the idea of dynamic pricing greatly appealing. The space on a flight, truck or train is a perishable commodity and can be priced based on capacity, customers' willingness to pay and market conditions.
Back in 2004 my employer was getting into dynamic pricing, so I spent a fair amount of time understanding PROS and even attended their annual event in Houston. Between them, PROS and SABRE served almost the entire airline industry and supplied it with its dynamic pricing capability.
Timing of the transaction has a big role to play in any dynamic pricing scheme, and for a long time the buyers of transportation services (especially consumers, but also businesses) have tried to understand how the dynamic pricing scheme works and to outsmart it.
While I have heard of one-off cases of success at this, I had never seen a systematic effort to understand it until now. Check this out: A datamining approach to outsmarting dynamic pricing. It is now integrated with BING's travel functionality.
Now I am waiting for the reaction from the dynamic pricing engines and the counter reaction. Let the games begin!
--Datamining_guy
Friday, November 12, 2010
Top 25 Articles in Economics -- By downloads
A nostalgic list for the economist in me:
JMP 9 Tree Functionality
I have been a heavy user of CART (SALFORD SYSTEMS) for the past 6-7 years and really love it. Today I had the opportunity to see the Tree functionality that comes with JMP 9, and I must say I am very impressed!
While the visuals are not very appealing (why? This contradicts the otherwise visually rich appeal of JMP), the functionality is very good. I particularly loved the ability to prune and shape any node the way I want to and really control the overall tree. I have always maintained that this can be a very dangerous functionality and would not recommend it for novices, but in the hands of an experienced analyst the inferences are going to be so much richer! Love it.
Now waiting for JMP to incorporate the multi-way splits available with Knowledge Seeker (Angoss)!
-- Datamining_guy
Thursday, November 11, 2010
Hearst Challenge Update
Here is the latest on Hearst Challenge:
600+ teams now registered
We have moved the date when the final evaluation dataset will be available to December 1.
There was some controversy last night, when the current leader decided to hang up his boots and posted his code for the world to see. On one hand, it is true that sharing knowledge leads to the development of a superior model. But if you take the thought to the other extreme, it would not be much of a competition if every team were required to share its code and methodology with all the participants. On balance, given the short duration of the competition, I wish the code had not been posted. Anyway, it is water under the bridge and the show goes on --- :)
-- Datamining_guy
Tuesday, November 9, 2010
The future of Datamining in Medical Diagnosis?
I have long wondered about the relevance of datamining in medical diagnosis. Think about what a doctor does: collects data by examining the patient and through pathological and radiographic tests, and on the basis of these data points makes an inference or diagnosis about the ailment afflicting the patient. So much in common with the basic tenets of datamining!
I got a first glimpse of this about 4 years back during the 2006 KDD CUP related to Pulmonary Embolism sponsored by Siemens. The focus of the competition was to assist in automated detection of the disease, with separate problems related to false positives and false negatives. The Holy Grail of the competition was an algorithm to predict with 100% certainty if a patient was healthy!
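One way to read that "Holy Grail" is as a negative predictive value of 100%: every patient the model calls healthy really is healthy. Here is a minimal sketch of the metric; the counts are made up for illustration and the function name is hypothetical, not the competition's actual scoring code.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """Share of patients flagged healthy who really are healthy."""
    flagged_healthy = true_negatives + false_negatives
    if flagged_healthy == 0:
        raise ValueError("no patients were flagged healthy")
    return true_negatives / flagged_healthy


# Made-up counts for illustration only.
npv_perfect = negative_predictive_value(true_negatives=40, false_negatives=0)
npv_one_miss = negative_predictive_value(true_negatives=40, false_negatives=1)

print(npv_perfect)
print(round(npv_one_miss, 3))
```

A single sick patient wrongly flagged as healthy is enough to pull the value below 100%, which is what makes the target so demanding.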
The 2008 KDD CUP competition was again related to this area, but dealt with breast cancer.
It seems that, like everything else in analytics, IBM is dabbling in this space as well. Check out this very interesting article in the Atlantic. As the article indicates, a big hurdle here is getting practitioners to embrace the technology.
-- datamining_guy
Tuesday, November 2, 2010
Hearst Challenge Update
The competition at the Hearst Challenge is heating up, and 530+ teams are now registered. A pack of about 5 teams is now slightly separated from the rest of the field, representing Australia, the US, the UK and China.
Here are the top 5 as of this afternoon:
In other news, we have started planning for next year's competition!
--datamining_guy
Monday, November 1, 2010
Top 10 things in IT/Analytics related world for 2011!
Here is a very interesting article from Gartner's Symposium earlier this month. Look at numbers 5 and 6 on the list.
Next generation Analytics: basically real-time analytics, where every business function is supported by automated analysis and predictions about the future!
Social Analytics: a broad lumping together of social media and social network related analytics!
I agree with both, and I think a big application would be the union of the two (see, for example, my earlier posting on FourSquare).
One thing not on the list, but which has the potential to be very big, is Speech Analytics.
-- Datamining_guy
Monday, October 25, 2010
In-database Analytics
Teradata just announced a new version of their database that allows organizations to compare current and historical data. They hope to use this functionality to attract Small and Medium Enterprises which otherwise do not have access to a lot of analytics.
Two things that interest me here:
(a) Another instance of somebody using in-database analytics. I am personally not very sure about this. It definitely is useful, but very soon users will be asking for more advanced model-driven features.
(b) Another player positioning itself for the SME space. A really useful (and probably winning) solution in this space will, in my opinion, have black-box, semi-customizable advanced modeling modules for different business decisions like targeting, cross-sell, churn, forecasting etc. Not sure if in-database is the answer here.
-- datamining_guy
Saturday, October 23, 2010
Participant composition at Hearst Challenge
Here is the latest on the Hearst Challenge. We are up to 400+ participants. An interesting question is whether a team from industry will take the title, or will academia prevail?
In terms of participation, the split is approximately 85% non-academic and 15% academic. The academic number might be somewhat underestimated, as those who have registered with gmail/yahoo/hotmail type accounts have been tagged as non-academic.
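For the curious, here is a rough sketch of how such a tagging rule might look. The domain rules below are assumptions for illustration (the actual rule used for the split is not described here), and the addresses are made up.

```python
# Free webmail domains named in the post; always tagged non-academic.
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}


def tag_participant(email: str) -> str:
    """Tag a registration as 'academic' or 'non-academic' from its email domain."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in FREE_MAIL_DOMAINS:
        return "non-academic"
    labels = domain.split(".")
    # Hypothetical rule: a .edu domain, or an 'ac'/'edu' second-level label
    # (as in example.ac.uk or example.edu.au), counts as academic.
    if labels[-1] == "edu":
        return "academic"
    if len(labels) >= 2 and labels[-2] in {"ac", "edu"}:
        return "academic"
    return "non-academic"


# Made-up addresses for illustration only.
registrations = [
    "alice@example.edu",
    "bob@example.ac.uk",
    "carol@gmail.com",
    "dave@example.com",
]
tags = [tag_participant(address) for address in registrations]
academic_share = tags.count("academic") / len(tags)

print(tags)
print(f"Academic share: {academic_share:.0%}")
```

A student who registers with a webmail address lands in the non-academic bucket under any rule of this kind, which is exactly the undercount mentioned above.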
In an interesting development, NUIM is offering extra credit to its students for serious participation in the competition.
Who will win it all?
Watch this space for more.
datamining_guy