SOURCE Data Science Central
ISSAQUAH, Wash., Oct. 29, 2013 /PRNewswire/ -- A mathematical problem related to big data was solved by Jean-Francois Puget, engineer in the Solutions Analytics and Optimization group at IBM France. The problem was first mentioned on Data Science Central, and an award was offered to the first data scientist to solve it.
Bryan Gorman, Principal Physicist, Chief Scientist at Johns Hopkins University Applied Physics Laboratory, made a significant breakthrough in July, and won $500. Jean-Francois Puget completely solved the problem, independently of Bryan, and won a $1,000 award.
The competition was organized and financed by Data Science Central. Participants from around the world submitted a number of interesting approaches. The mathematical question was asked by Vincent Granville, a leading data scientist and co-founder at Data Science Central. Granville initially proposed a solution after performing large-scale Monte Carlo simulations, but his solution turned out to be wrong.
The problem consisted of finding an exact formula for a new type of correlation and goodness-of-fit metric designed specifically for big data. The metric generalizes Spearman's rank coefficient and is especially robust for the unbounded, ordinal data found in large data sets. From a mathematical point of view, the new metric is based on L-1 rather than L-2 theory: in other words, it relies on absolute rather than squared differences. Using squares (or higher powers) is what makes traditional metrics such as R squared notoriously sensitive to outliers, which is why savvy statistical modelers avoid them. In big data, outliers are plentiful and even extreme outliers are not rare. They can render conclusions from a statistical analysis invalid, so this is a critical issue. This outlier issue is sometimes referred to as the curse of big data.
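The difference between absolute and squared differences can be illustrated with a small sketch. This is not the metric from the competition, whose formula is not given here; it only shows, on made-up residuals, how much of the total a single outlier accounts for under each choice. The function name outlier_share and the data are assumptions introduced for illustration.

```python
def outlier_share(residuals, power):
    """Fraction of the total |residual|**power contributed by the largest residual.

    power=1 corresponds to absolute differences (L-1),
    power=2 to squared differences (L-2).
    """
    contributions = [abs(r) ** power for r in residuals]
    return max(contributions) / sum(contributions)


# Hypothetical data: 99 ordinary residuals of size 1 and one outlier of size 50.
residuals = [1.0] * 99 + [50.0]

l1_share = outlier_share(residuals, 1)  # 50 / 149
l2_share = outlier_share(residuals, 2)  # 2500 / 2599

print(f"L-1 share of the outlier: {l1_share:.3f}")
print(f"L-2 share of the outlier: {l2_share:.3f}")
```

In this toy case the single outlier accounts for about a third of the total under absolute differences, but for nearly all of it under squared differences.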
Jean-Francois and Bryan both came up with a new approach: instead of running heavy computations, they used mathematical thinking and leveraged their expertise in mathematical optimization as well as in permutation theory and combinatorics. And they succeeded. This shows that mathematical modeling can sometimes beat even the most powerful system of clustered computers, though the two usually work hand in hand.
Additional details can be found here: http://bit.ly/133S6ns.
Data Science Central is the industry's online resource for big data practitioners. From Analytics to Data Integration to Visualization, Data Science Central provides a community experience that includes a robust editorial platform, social interaction, forum-based technical support, the latest in technology, tools and trends and industry job opportunities.
©2012 PR Newswire. All Rights Reserved.