SetonHallPirate 585 Posted March 24 (edited) So, how'd we do? By my count, the abacus (adjusted for the brackets) had 554 points. The count Ching is going to go by is likely a few points lower, however.

Ching 163 Posted March 24 Prelim results are in: WIN Magazine is the 2019 Rank the Rankers champion. Standings:
1. WIN Magazine
2. Flowrestling
3. Intermat
4. Trackwrestling
5. The Open Mat
6. WrestleStat
7. Seeding
8. Dual Impact Index
9. Amateur Wrestling News

nhs67 111 Posted March 24 Interesting to see that WrestleStat had the best result in the most weight classes, despite finishing 6th overall. In fact, it was the only ranker to be best in more than two weight classes, with four. At the other end, Flo, despite having the 2nd-best overall score, had precisely zero weight classes where it had the best rating.

pish6969 249 Posted March 24 Wow. Super close. Congrats WIN.

qc8223 210 Posted March 24 On 3/14/2019 at 7:04 PM, pish6969 said: Can't wait for the results. Flo stuck with Shak at 8 or so. That's gonna hurt Guess it didn't hurt so bad after all. Maybe Pyles was onto something.

pish6969 249 Posted March 25 qc8223 said: Guess it didn't hurt so bad after all. Maybe Pyles was onto something. Or maybe Shak's knee was hurt more than I realized. But touché

qc8223 210 Posted March 25 4 minutes ago, pish6969 said: Or maybe Shak's knee was hurt more than I realized. But touché Probably that. I still think Shak was better than everybody not named Myles. What a bizarre tournament for that whole bracket. Not to pat myself on the back, but I made a prediction about a month ago that 184 would implode at nationals. Didn't consider Martin to be subject to that kind of implosion, though.

Ching 163 Posted March 25 On 3/16/2019 at 2:24 PM, SetonHallPirate said: Thinking we should be allowed to use our algorithm's projected brackets, rather than projected seeds. Mine can be found at https://wrestlingbypirate.wordpress.com/2019/03/16/dual-impact-index-ncaa-championships-bracket-projections/ I would like to do this, but only you and WrestleStat could provide that info. The issue of two top 8s meeting in the second round is definitely a problem, and it's the reason I include the seeds as a player. Also, if you have a wrestler ranked 8th and they go 2-2, you still get 6 points. That is the same number of points you get if they place 5th.

Ching 163 Posted March 25 If any rankers want access to my spreadsheet with breakdowns, DM me your e-mail address. I don't want to share publicly because the rankings are copyrighted.

andegre 112 Posted March 25 (edited) Thanks again Ching! One thing that we/WrestleStat would also like to point out: if you exclude 285, WrestleStat was #1. (I know, picking data to fulfill my argument....) Pretty cool to see, though, that all of the work myself and my algorithm guy did this offseason paid off in a BIG way. (Last year we scored around 515 if I recall.) Thanks algorithm guy ;) Scratch that, leaving out 285 OR 157 and WrestleStat wins... :) edit 2: my "algorithm guy" wants his name changed to "Oregon State Alum who doesn't do social media", so thank you, "Oregon State Alum who doesn't do social media"

SetonHallPirate 585 Posted March 25 Pretty ironic that my abacus did better with the 1-8 rankings than it did optimized for the brackets!

Ching 163 Posted March 25 58 minutes ago, SetonHallPirate said: Pretty ironic that my abacus did better with the 1-8 rankings than it did optimized for the brackets! I actually think it makes sense. By optimizing for the seeds, you are bringing lower-ranked wrestlers higher up, and they are more likely to flame out by going 1-2 or BBQ. The inverse is also true: a higher-ranked wrestler getting pushed down to optimize is more likely to outperform their ranking.

jammen 241 Posted March 25 Now you need to evaluate whether the scores have any validity, i.e., is a score of 544 substantially different from a score of 588? This will depend upon the metrics you used to determine the scores. For instance, if you simply measured the percentage of AAs and exact placements the sites predicted, you would get ranges of roughly 65-75% and 20-25% respectively, based upon previous years' results.

SetonHallPirate 585 Posted March 26 3 hours ago, Ching said: I actually think it makes sense. By optimizing for the seeds, you are bringing lower ranked wrestlers higher up [...] Excellent point. Optimizing for the brackets does make it more likely (i.e., 0.000001% rather than zero percent) that I'd shoot the moon and score 800, though!

Billyhoyle 1,464 Posted March 26 5 hours ago, andegre said: Thanks again Ching! One thing that we/WrestleStat would also like to point out, if you exclude 285, WrestleStat was #1. [...] Is your ranking system just a basic Elo score?

andegre 112 Posted March 26 It's BASED on / ORIGINATED from the Elo algorithm, with many changes to make it work for wrestling. There are approximately 10 different equations/algorithms that go into the whole thing. Then I wrote a simulation engine last off-season that lets me run thousands of simulations to continually optimize it.
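For readers who haven't seen it, the classic Elo update that WrestleStat starts from looks roughly like this in R. This is a sketch of the textbook formula only — the k-factor of 32 is a common default, not WrestleStat's value, and none of the wrestling-specific modifications andegre describes are shown here:

```r
# Classic Elo rating update -- illustration of the starting point only;
# WrestleStat's actual algorithm layers ~10 further equations on top.
elo_update <- function(r_winner, r_loser, k = 32) {
  # Expected probability that the eventual winner wins, given the ratings
  expected <- 1 / (1 + 10^((r_loser - r_winner) / 400))
  delta <- k * (1 - expected)  # winner gains exactly what the loser gives up
  c(winner = r_winner + delta, loser = r_loser - delta)
}

elo_update(1500, 1500)  # evenly matched: winner moves to 1516, loser to 1484
```

The 400 in the denominator just sets the scale: a 400-point rating gap corresponds to roughly 10:1 win odds for the favorite.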

Gallway 0 Posted March 26 (edited) Why not just track the number of times each service's higher-ranked wrestler beats the lower-ranked one? Because of bracket placement, getting a guy to end up in the position you ranked him can vary. But when two guys go head to head, a ranking service is stating which one should win. Just track that.
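Gallway's metric could be computed in a couple of lines. A minimal sketch in R, with made-up illustrative ranks (the function name and data are hypothetical, not anyone's actual system): for every bout where the service ranked both wrestlers, count how often its higher-ranked wrestler (lower number) won:

```r
# Fraction of head-to-head bouts in which a service's higher-ranked
# wrestler beat the lower-ranked one. Inputs are the service's rank
# numbers for each bout's winner and loser; lower number = higher rank.
h2h_accuracy <- function(winner_rank, loser_rank) {
  # Only score bouts where both wrestlers are ranked, and not equally
  decided <- !is.na(winner_rank) & !is.na(loser_rank) &
             winner_rank != loser_rank
  mean(winner_rank[decided] < loser_rank[decided])
}

# Illustrative data: in 4 bouts, the higher-ranked wrestler won 3 times
h2h_accuracy(winner_rank = c(1, 3, 2, 5), loser_rank = c(8, 4, 1, 6))  # 0.75
```

One design question this sidesteps: unranked-vs-ranked bouts are simply dropped here, though a service could also be credited when its ranked wrestler beats an unranked one.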

andegre 112 Posted March 26 On 3/24/2019 at 8:03 PM, Ching said: If any rankers want access to my spreadsheet with breakdowns, DM your e-mail address. I don't want to share publicly because the rankings are copyrighted. Hey @Ching, can you send me a copy of your Excel file? My "Oregon State Alum who doesn't do social media" is already sending me modifications to start running simulations against... we're gonna get first eventually!!!

dakotajudo 5 Posted March 26 19 hours ago, jammen said: Now you need to evaluate whether the scores have any validity. [...] Suppose we assume that each weight class is an independent measure of ranking accuracy. How well do the different rankers predict the outcome of a bracket? If we assume that scores are continuous and that errors are randomly distributed, then we can test this using a simple AOV in R (given that ching.stacked is the long-format version of the table Ching posted):

ching.lm <- lm(Score ~ Source + Weight, data=ching.stacked)
anova(ching.lm)
## Analysis of Variance Table
##
## Response: Score
##           Df Sum Sq Mean Sq F value  Pr(>F)
## Source     8  187.0   23.38  0.7211  0.6722
## Weight     9 9227.1 1025.23 31.6256  <2e-16 ***
## Residuals 72 2334.1   32.42
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Given that the data are sums of scores, we might use a Poisson model:

ching.glm <- glm(Score ~ Source + Weight, family=poisson, data=ching.stacked)
anova(ching.glm, test="LRT")
## Analysis of Deviance Table
##
## Model: poisson, link: log
##
## Response: Score
##
## Terms added sequentially (first to last)
##
##        Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL                      89    218.387
## Source  8    3.283        81    215.104   0.9153
## Weight  9  173.040        72     42.064   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We might continue with a categorical analysis of the table, perhaps Fisher's exact test (here, the data are in the original table form):

fisher.test(ching.table, simulate.p.value=TRUE)
##
## Fisher's Exact Test for Count Data with simulated p-value (based
## on 2000 replicates)
##
## data: ching.table
## p-value = 0.9975
## alternative hypothesis: two.sided

Finally, if we don't assume any distribution, a rank-based test is appropriate:

friedman.test(Score ~ Source | Weight, data=ching.stacked)
##
## Friedman rank sum test
##
## data: Score and Source and Weight
## Friedman chi-squared = 3.4866, df = 8, p-value = 0.9002

friedman.test(Score ~ Weight | Source, data=ching.stacked)
##
## Friedman rank sum test
##
## data: Score and Weight and Source
## Friedman chi-squared = 56.718, df = 9, p-value = 5.722e-09

Long and short: there is more variation in scores between weight classes than there is among the rankers, and the range of differences among rankers is small relative to the error within weight classes. The difference between 544 and 588 is about 4 points per weight class, while the spread between the high and low totals in any single weight class, say 125 (45-65), is larger than that. There may be ways to further decompose the error in the comparisons to distinguish (at least in a statistical sense) among the rankers, but there's not enough information in Ching's table.

andegre 112 Posted March 26 23 minutes ago, dakotajudo said: Suppose we assume that each weight class is an independent measure of ranking accuracy. [...] Waymit! Did my "Oregon State alum who doesn't do social media" just get an account on this forum? lol

Ching 163 Posted March 27 8 hours ago, dakotajudo said: Suppose we assume that each weight class is an independent measure of ranking accuracy. [...] Dude, you should have asked me for the data! I would have saved you from retyping it. Thanks for doing this. If I'm reading your analysis correctly, the RtR has not been proven to not be bullsh!t. I'm going to call that a win until you can definitively say it is bullsh!t. Either way, I will still be crowning a champ every year.

andegre 112 Posted April 29 We're working on implementing the RPI calculations into the WrestleStat rankings algorithm... over 500 simulations have been done so far; I'd say we're making progress!!!

BobDole 726 Posted April 29 On 3/26/2019 at 2:01 PM, dakotajudo said: Suppose we assume that each weight class is an independent measure of ranking accuracy. [...] Every Iowa fan on here is completely lost by this post. Their heads are spinning at a rate that cannot be measured.