<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.0.1" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Keith McCormick</title>
	<link>http://keithmccormick.com</link>
	<description></description>
	<pubDate>Fri, 30 Apr 2010 12:47:05 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.1</generator>
	<language>en</language>
			<item>
		<title>Annual Rexer Analytics Survey</title>
		<link>http://keithmccormick.com/?p=122</link>
		<comments>http://keithmccormick.com/?p=122#comments</comments>
		<pubDate>Fri, 30 Apr 2010 12:43:59 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
		
	<category>Uncategorized</category>
		<guid isPermaLink="false">http://keithmccormick.com/?p=122</guid>
		<description><![CDATA[I take part in this each year, and find the results very interesting. Please do participate as well.
Rexer Survey 
]]></description>
			<content:encoded><![CDATA[<p>I take part in this each year, and find the results very interesting. Please do participate as well.</p>
<p><a href="http://www.rexeranalytics.com/Data-Miner-Survey-2010-Intro.html">Rexer Survey </a></p>
]]></content:encoded>
			<wfw:commentRSS>http://keithmccormick.com/?feed=rss2&amp;p=122</wfw:commentRSS>
		</item>
		<item>
		<title>Excel &#8220;Caveats&#8221;</title>
		<link>http://keithmccormick.com/?p=121</link>
		<comments>http://keithmccormick.com/?p=121#comments</comments>
		<pubDate>Fri, 30 Apr 2010 12:34:13 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
		
	<category>Uncategorized</category>
		<guid isPermaLink="false">http://keithmccormick.com/?p=121</guid>
		<description><![CDATA[Recently there was a discussion of the dangers of using Excel for statistics on the SPSSX-L listserv. This U Mass Amherst link was mentioned.
Using Excel for Statistical Data Analysis - Caveats 
]]></description>
			<content:encoded><![CDATA[<p>Recently there was a discussion of the dangers of using Excel for statistics on the SPSSX-L listserv. This U Mass Amherst link was mentioned.</p>
<p><a href="http://www.umass.edu/statdata/software/handouts/excel.html">Using Excel for Statistical Data Analysis - Caveats </a></p>
]]></content:encoded>
			<wfw:commentRSS>http://keithmccormick.com/?feed=rss2&amp;p=121</wfw:commentRSS>
		</item>
		<item>
		<title>ASA&#8217;s Joint Statistical Meetings</title>
		<link>http://keithmccormick.com/?p=120</link>
		<comments>http://keithmccormick.com/?p=120#comments</comments>
		<pubDate>Mon, 26 Apr 2010 13:39:32 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
		
	<category>Uncategorized</category>
		<guid isPermaLink="false">http://keithmccormick.com/?p=120</guid>
		<description><![CDATA[Yet another annual conference that is on my list, but that I haven&#39;t made it to yet. The timing of this one, July 31st to August 5th, and its location, Vancouver, make this very tempting.
JSM 2010 
]]></description>
			<content:encoded><![CDATA[<p>Yet another annual conference that is on my list, but that I haven&#39;t made it to yet. The timing of this one, July 31st to August 5th, and its location, Vancouver, make this very tempting.</p>
<p><a href="http://www.amstat.org/meetings/jsm/2010/">JSM 2010</a> </p>
]]></content:encoded>
			<wfw:commentRSS>http://keithmccormick.com/?feed=rss2&amp;p=120</wfw:commentRSS>
		</item>
		<item>
		<title>Sort Order in K-Means Cluster</title>
		<link>http://keithmccormick.com/?p=117</link>
		<comments>http://keithmccormick.com/?p=117#comments</comments>
		<pubDate>Sun, 25 Apr 2010 15:03:21 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
		
	<category>Programming</category>
	<category>Stats</category>
		<guid isPermaLink="false">http://keithmccormick.com/?p=117</guid>
		<description><![CDATA[I have used and taught K-means cluster for many years. I have never worried too much about sort order, but a recent experience made me revisit it. I was running a simple data set through all the IBM SPSS Statistics (SPSS) Cluster Techniques, and also ran them all through IBM SPSS Modeler (Modeler). The short [...]]]></description>
			<content:encoded><![CDATA[<p>I have used and taught K-means cluster for many years. I have never worried too much about sort order, but a recent experience made me revisit it. I was running a simple data set through all the IBM SPSS Statistics (SPSS) Cluster Techniques, and also ran them all through IBM SPSS Modeler (Modeler). The short version of the story is that the results were not identical.</p>
<p>Consider the following SPSS output with the default 10 iterations:</p>
<p><a href="http://keithmccormick.com/wp-content/uploads/10%20Iterations.png" title="10 Iterations.png"><img src="http://keithmccormick.com/wp-content/uploads/10%20Iterations.png" border="0" alt="10 Iterations.png" /></a></p>
<p>The &quot;Iteration History&quot; (not shown) would indicate that the solution had not converged. It may not be immediately clear what you are seeing here because the clusters are moved around. Upon scrutiny, however, you will find that some of the values simply do not match. Clearly sorting (in this case on ID) changes the results. It is not desirable to have unstable results. </p>
<p>There are three interesting options.</p>
<p>1) Consider running Hierarchical first, calculating the cluster centers, and then supplying those as the initial cluster centers for K-means. I have always been frustrated that Help does not make it clear how to format this information in the required file. While this is still true, the Syntax Reference Guide does make it clear. (Note: Modeler does not support Hierarchical because it is VERY slow on large data sets.) </p>
<p><a href="http://keithmccormick.com/wp-content/uploads/Intial%20Cluster%20Center%20Format.png" title="Intial Cluster Center Format"><img src="http://keithmccormick.com/wp-content/uploads/Intial%20Cluster%20Center%20Format.png" border="0" alt="Intial Cluster Center Format" /></a></p>
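<p>As a rough illustration of the "calculate the cluster centers" step, here is a minimal Python sketch. It is not SPSS syntax and it does not produce the file format shown above. It assumes the hierarchical cluster memberships have already been saved as one label per case; the function name and the toy data are made up for the example.</p>
<pre>
```python
from collections import defaultdict


def cluster_centers(rows, labels):
    """Return the mean of each variable within each cluster label."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    # zip(*members) turns a list of cases into a list of variable columns.
    return {
        label: [sum(col) / len(col) for col in zip(*members)]
        for label, members in sorted(grouped.items())
    }


# Hypothetical data: four cases, two variables, memberships from a hierarchical run.
rows = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0), (12.0, 14.0)]
labels = [1, 1, 2, 2]
centers = cluster_centers(rows, labels)
print(centers)  # {1: [2.0, 3.0], 2: [11.0, 12.0]}
```
</pre>
<p>The resulting means are the values that would then be passed to K-means as the initial cluster centers.</p>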
<p>2) Consider reading up on K-means++ (K++). While not implemented in SPSS or Modeler, there is some interesting information on this new (2007) algorithm, which chooses the initial cluster centers more carefully. One could always consider R or Python for implementation.</p>
<p><a href="http://en.wikipedia.org/wiki/K-means%2B%2B">K++ on Wikipedia </a></p>
<p>For a Python implementation consider the following page I found. I wish I could say I have tried to combine this with syntax to produce a complete solution - I have not.</p>
<p><a href="http://blogs.sun.com/yongsun/entry/k_means_and_k_means">K++ Python </a></p>
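<p>To give a feel for the seeding idea, here is a small standard-library Python sketch. It is only an illustration under my own assumptions, not the code from the linked page and not anything SPSS or Modeler does: the first center is picked at random, and each later center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The function names and the toy data are hypothetical.</p>
<pre>
```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers with K-means++ style seeding (illustrative only).

    Assumes the data contain at least k distinct points.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) != k:
        # Weight each point by its squared distance to the nearest chosen center,
        # so points far from every existing center are more likely to be picked.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Hypothetical toy data: three well-separated groups in two dimensions.
toy_points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0), (9.2, 0.1)]
initial_centers = kmeans_pp_init(toy_points, k=3)
print(initial_centers)
```
</pre>
<p>The centers returned here are only starting values. They would still need to be handed to an ordinary K-means run.</p>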
<p>3) The third option, and definitely the easiest, is to simply increase the number of iterations. See the results using 20 iterations. Notice that all the values are identical.</p>
<p><a href="http://keithmccormick.com/wp-content/uploads/20%20Iterations.png" title="20 Iterations.png"><img src="http://keithmccormick.com/wp-content/uploads/20%20Iterations.png" border="0" alt="20 Iterations.png" /></a> </p>
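<p>The effect can be reproduced in miniature with a few lines of Python. This is a toy sketch, not the SPSS algorithm: it assumes the first k cases are used as the initial centers (which is what makes the sort order matter here), and the data are invented. With one iteration the two sort orders disagree; once the run is allowed to converge, they agree. That holds for this example, but it is not guaranteed for every data set.</p>
<pre>
```python
def kmeans_1d(values, k, max_iter):
    """Plain Lloyd-style k-means on 1-D data, seeded with the first k values."""
    centers = [float(v) for v in values[:k]]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each case to the nearest current center.
            nearest = min(range(k), key=lambda j: (v - centers[j]) ** 2)
            clusters[nearest].append(v)
        # Recompute each center as its cluster mean; keep the old one if empty.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: nothing moved
        centers = new_centers
    return sorted(centers)


# Hypothetical data: the same six cases in ascending and descending order.
ascending = [0, 1, 2, 10, 11, 12]
descending = list(reversed(ascending))

print(kmeans_1d(ascending, 2, max_iter=1))    # [0.0, 7.2]
print(kmeans_1d(descending, 2, max_iter=1))   # [4.8, 12.0]
print(kmeans_1d(ascending, 2, max_iter=10))   # [1.0, 11.0]
print(kmeans_1d(descending, 2, max_iter=10))  # [1.0, 11.0]
```
</pre>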
]]></content:encoded>
			<wfw:commentRSS>http://keithmccormick.com/?feed=rss2&amp;p=117</wfw:commentRSS>
		</item>
		<item>
		<title>What&#8217;s in a Name?</title>
		<link>http://keithmccormick.com/?p=115</link>
		<comments>http://keithmccormick.com/?p=115#comments</comments>
		<pubDate>Sat, 24 Apr 2010 14:14:28 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
		
	<category>SPSS Inc.</category>
	<category>IBM SPSS Statistics</category>
	<category>IBM SPSS Modeler</category>
		<guid isPermaLink="false">http://keithmccormick.com/?p=115</guid>
		<description><![CDATA[I considered making my literary reference more colorful, and thought better of it. There has been some confusion over the last year resulting from some name changes for the products made by SPSS Inc., an IBM Company. This web page should help considerably: Product Naming Guide. Virtually all the products will retain similar names, but [...]]]></description>
			<content:encoded><![CDATA[<p>I considered making my literary reference more colorful, and thought better of it. There has been some confusion over the last year resulting from some name changes for the products made by SPSS Inc., an IBM Company. This web page should help considerably: <a href="http://www.spss.com/software/product-name-guide/">Product Naming Guide</a>. Virtually all the products will retain similar names, but will begin with &quot;IBM SPSS&quot;, as in IBM SPSS Statistics and IBM SPSS Modeler. </p>
<p>The confusion largely stemmed from the fact that the PASW (Predictive Analytics SoftWare) naming system was so short-lived. It may take some time for all materials and software menus to reflect the new names. </p>
]]></content:encoded>
			<wfw:commentRSS>http://keithmccormick.com/?feed=rss2&amp;p=115</wfw:commentRSS>
		</item>
		<item>
		<title>What&#8217;s New in Modeler 14</title>
		<link>http://keithmccormick.com/?p=114</link>
		<comments>http://keithmccormick.com/?p=114#comments</comments>
		<pubDate>Wed, 07 Apr 2010 18:48:16 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
		
	<category>IBM SPSS Modeler</category>
		<guid isPermaLink="false">http://keithmccormick.com/?p=114</guid>
		<description><![CDATA[I am looking forward to starting work in version 14. I think some of the new features are going to save me time when working in the tool. 
The presentation starts with an overview of data mining. If you are pressed for time, the Introducing Modeler 14.0 segment begins about 10 1/2 minutes into the presentation [...]]]></description>
			<content:encoded><![CDATA[<p>I am looking forward to starting work in version 14. I think some of the new features are going to save me time when working in the tool. </p>
<p>The presentation starts with an overview of data mining. If you are pressed for time, the Introducing Modeler 14.0 segment begins about 10 1/2 minutes into the presentation with a discussion of overall capabilities and the two release editions: premium and professional. The New Features part of the show begins about 15 1/2 minutes into the presentation.</p>
<p>Q&amp;A begins about 10 minutes before the presentation ends. </p>
<p><a href="http://www.spss.com/events/event.cfm?E_ID=3233">Modeler 14.0  Webcast</a> </p>
]]></content:encoded>
			<wfw:commentRSS>http://keithmccormick.com/?feed=rss2&amp;p=114</wfw:commentRSS>
		</item>
		<item>
		<title>AVIS Europe Success Story</title>
		<link>http://keithmccormick.com/?p=113</link>
		<comments>http://keithmccormick.com/?p=113#comments</comments>
		<pubDate>Wed, 07 Apr 2010 12:22:50 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
		
	<category>Uncategorized</category>
		<guid isPermaLink="false">http://keithmccormick.com/?p=113</guid>
		<description><![CDATA[AVIS Europe

]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.whatech.com.au/technology-releases/marketing/1274-ibm-spss-predictive-analytics-software-helps-avis-transform-client-campaigns-and-cut-marketing-costs-in-half">AVIS Europe</a>
</p>
]]></content:encoded>
			<wfw:commentRSS>http://keithmccormick.com/?feed=rss2&amp;p=113</wfw:commentRSS>
		</item>
		<item>
		<title>KDD 2010</title>
		<link>http://keithmccormick.com/?p=112</link>
		<comments>http://keithmccormick.com/?p=112#comments</comments>
		<pubDate>Wed, 07 Apr 2010 12:10:51 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
		
	<category>Uncategorized</category>
		<guid isPermaLink="false">http://keithmccormick.com/?p=112</guid>
		<description><![CDATA[The Knowledge Discovery in Databases conference is in Washington, DC this year. This is a rather technical conference, so occasional users of SPSS Modeler may find it a bit intense. I have nearly gone for the last three years. It is unclear if this is the year. The dates are July 25-28. On the 25th [...]]]></description>
			<content:encoded><![CDATA[<p>The Knowledge Discovery in Databases conference is in Washington, DC this year. This is a rather technical conference, so occasional users of SPSS Modeler may find it a bit intense. I have nearly gone for the last three years. It is unclear if this is the year. The dates are July 25-28. On the 25th are workshops including one on Social Networks - the one that intrigues me most. </p>
<p><a href="http://www.kdd.org/kdd2010/">KDD 2010</a></p>
<p>I also love the KDD Cup task this year - predicting student performance using tutoring data. I am not participating, but that data set would have brought me full circle, as my undergraduate thesis was producing a concept for learning-style-based tutorial software.</p>
<p><a href="https://pslcdatashop.web.cmu.edu/KDDCup/">KDD Cup</a> </p>
]]></content:encoded>
			<wfw:commentRSS>http://keithmccormick.com/?feed=rss2&amp;p=112</wfw:commentRSS>
		</item>
		<item>
		<title>Directions 2010 Rome</title>
		<link>http://keithmccormick.com/?p=111</link>
		<comments>http://keithmccormick.com/?p=111#comments</comments>
		<pubDate>Wed, 07 Apr 2010 12:04:20 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
		
	<category>Uncategorized</category>
		<guid isPermaLink="false">http://keithmccormick.com/?p=111</guid>
		<description><![CDATA[Directions Website

]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.spss.com/spssdirections/emea/">Directions Website</a>
</p>
]]></content:encoded>
			<wfw:commentRSS>http://keithmccormick.com/?feed=rss2&amp;p=111</wfw:commentRSS>
		</item>
		<item>
		<title>Interview with Deepak Advani</title>
		<link>http://keithmccormick.com/?p=110</link>
		<comments>http://keithmccormick.com/?p=110#comments</comments>
		<pubDate>Wed, 07 Apr 2010 12:02:45 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
		
	<category>SPSS Inc.</category>
		<guid isPermaLink="false">http://keithmccormick.com/?p=110</guid>
		<description><![CDATA[Advani Interview 

]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.informationweek.com/news/software/open_source/showArticle.jhtml?articleID=223100559&amp;pgno=1&amp;queryText=&amp;isPrev=">Advani Interview </a></p>
]]></content:encoded>
			<wfw:commentRSS>http://keithmccormick.com/?feed=rss2&amp;p=110</wfw:commentRSS>
		</item>
	</channel>
</rss>
