Ching

Rank the Rankers - 2019


Interesting to see that WrestleStat had the most weight-class-best results despite finishing 6th overall. In fact, it was the only ranker to have the best score in more than two weight classes, with four.

 

At the other end, Flo, despite having the 2nd-best overall score, had precisely zero weight classes where they had the best rating.

On 3/14/2019 at 7:04 PM, pish6969 said:

Can’t wait for the results. Flo stuck with Shak at 8 or so. That’s gonna hurt

Guess it didn't hurt so badly after all. Maybe Pyles was onto something.

4 minutes ago, pish6969 said:

 


Or maybe Shak's knee was hurt more than I realized.

But touché

 

Probably that. I still think Shak was better than everybody not named Myles. What a bizarre tournament for that whole bracket. Not to pat myself on the back, but I made a prediction about a month ago that 184 would implode at nationals. I didn't consider Martin to be subject to that kind of implosion, though.

On 3/16/2019 at 2:24 PM, SetonHallPirate said:

Thinking we should be allowed to use our algorithm's projected brackets, rather than projected seeds. Mine can be found at https://wrestlingbypirate.wordpress.com/2019/03/16/dual-impact-index-ncaa-championships-bracket-projections/

I would like to do this, but only you or WrestleStat could provide that info. The issue of two top-8 wrestlers meeting in the second round is definitely a problem, and it's the reason I include the seeds as a player. Also, if you have a wrestler ranked 8th and they go 2-2, you still get 6 points. That is the same number of points you get if they place 5th.


Thanks again Ching!

One thing that we/WrestleStat would also like to point out: if you exclude 285, WrestleStat was #1. (I know, picking data to fit my argument....)

Pretty cool to see, though, that all of the work my algorithm guy and I did this offseason paid off in a BIG way. (Last year we scored around 515, if I recall.)

 

Thanks algorithm guy ;)

 

Scratch that: leave out 285 OR 157 and WrestleStat wins... :)

 

edit 2: my “algorithm guy” wants his name changed to: “Oregon State Alum who doesn't do social media”, so thank you, “Oregon State Alum who doesn't do social media”

Edited by andegre

58 minutes ago, SetonHallPirate said:

Pretty ironic that my abacus did better with the 1-8 rankings than it did optimized for the brackets!

I actually think it makes sense. By optimizing for the seeds, you are bringing lower-ranked wrestlers higher up, and they are more likely to flame out with a 1-2 or BBQ. The inverse is also true: a higher-ranked wrestler getting pushed down to optimize is more likely to outperform their ranking.


Now you need to evaluate whether the scores have any validity, i.e., is a score of 544 substantially different from a score of 588? This will depend upon the metrics you used to determine the scores. For instance, if you simply measured the percentage of All-Americans the sites predicted correctly, and the percentage of exact placements, you would get ranges of roughly 65-75% and 20-25% respectively, based upon previous years' results.
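For concreteness, here is a quick R sketch of those two metrics; the preds data frame and its numbers are made up purely for illustration, not any site's actual picks:

# Hypothetical example: one ranker's top 8 at a single weight, with each
# wrestler's predicted placement and actual NCAA placement (NA = did not place).
preds <- data.frame(
  predicted = 1:8,
  actual    = c(1, 3, 2, NA, 5, NA, 8, 6)
)

# Share of predicted All-Americans who actually placed (top 8)
aa_rate <- mean(!is.na(preds$actual) & preds$actual <= 8)

# Share of predictions that hit the exact placement
exact_rate <- mean(!is.na(preds$actual) & preds$predicted == preds$actual)

aa_rate
exact_rate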

3 hours ago, Ching said:

I actually think it makes sense. By optimizing for the seeds, you are bringing lower-ranked wrestlers higher up, and they are more likely to flame out with a 1-2 or BBQ. The inverse is also true: a higher-ranked wrestler getting pushed down to optimize is more likely to outperform their ranking.

Excellent point. Optimizing for the brackets does make it more likely (i.e., 0.000001% rather than zero percent) that I'd shoot the moon and score 800, though!

5 hours ago, andegre said:

Thanks again Ching!

One thing that we/WrestleStat would also like to point out: if you exclude 285, WrestleStat was #1. (I know, picking data to fit my argument....)

Pretty cool to see, though, that all of the work my algorithm guy and I did this offseason paid off in a BIG way. (Last year we scored around 515, if I recall.)

 

Thanks algorithm guy ;)

 

Scratch that: leave out 285 OR 157 and WrestleStat wins... :)

 

edit 2: my “algorithm guy” wants his name changed to: “Oregon State Alum who doesn't do social media”, so thank you, “Oregon State Alum who doesn't do social media”

Is your ranking system just a basic Elo score?


It's BASED on / ORIGINATED from the Elo algorithm, with many changes to make it work for wrestling. There are approximately 10 different equations/algorithms that go into the whole thing. Then I wrote a simulation engine last off-season that lets me run thousands of simulations to continually optimize it.
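For reference, the plain Elo update that this starts from looks roughly like the R sketch below. The K value is just a common default, not the one WrestleStat actually uses, and none of the wrestling-specific modifications are shown here:

# Standard Elo update for a single match between wrestlers A and B.
# K controls how fast ratings move after each result.
elo_update <- function(rating_a, rating_b, a_won, K = 32) {
  # Expected score for A under the Elo logistic curve
  expected_a <- 1 / (1 + 10 ^ ((rating_b - rating_a) / 400))
  # Move each rating toward the observed result; a_won is 1 if A won, 0 if B won
  c(a = rating_a + K * (a_won - expected_a),
    b = rating_b + K * ((1 - a_won) - (1 - expected_a)))
}

# Example: a 1600-rated wrestler beats a 1500-rated wrestler
elo_update(1600, 1500, a_won = 1)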


Why not just track the number of times each service's higher-ranked wrestler beats the lower-ranked one? Because of bracket placement, getting a guy to end up in the position you ranked him can vary. But when two guys go head to head, a ranking service is stating which one should win. Just track that.
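A rough R sketch of that head-to-head tally; the bouts data frame here is invented just to show the bookkeeping, not real results:

# Hypothetical bout list: each row is one NCAA match, with the two wrestlers'
# ranks from a given service and which one won.
bouts <- data.frame(
  rank_a = c(1, 8, 3, 12, 5),
  rank_b = c(16, 9, 6, 4, 33),
  winner = c("a", "b", "a", "b", "b")
)

# Did the service's higher-ranked (numerically lower) wrestler win the bout?
higher_won <- ifelse(bouts$rank_a < bouts$rank_b,
                     bouts$winner == "a",
                     bouts$winner == "b")

# Head-to-head accuracy for this service
mean(higher_won)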

Edited by Gallway

On 3/24/2019 at 8:03 PM, Ching said:

If any rankers want access to my spreadsheet with breakdowns, DM your e-mail address. I don't want to share publicly because the rankings are copyrighted.

Hey @Ching, can you send me a copy of your Excel file? My "Oregon State Alum who doesn't do social media" is already sending me modifications to start running simulations against.... We're gonna get first eventually!!!

19 hours ago, jammen said:

Now you need to evaluate whether the scores have any validity, i.e., is a score of 544 substantially different from a score of 588? This will depend upon the metrics you used to determine the scores. For instance, if you simply measured the percentage of All-Americans the sites predicted correctly, and the percentage of exact placements, you would get ranges of roughly 65-75% and 20-25% respectively, based upon previous years' results.

Suppose we assume that each weight class is an independent measure of ranking accuracy. How well do the different rankers predict the outcome of a bracket? If we assume that scores are continuous and that errors are randomly distributed, then we can test this using a simple AOV in R (ching.stacked below is the long-format version of the table Ching posted):

ching.lm <- lm(Score ~ Source + Weight, data=ching.stacked)
anova(ching.lm)
## Analysis of Variance Table
## 
## Response: Score
##           Df Sum Sq Mean Sq F value Pr(>F)    
## Source     8  187.0   23.38  0.7211 0.6722    
## Weight     9 9227.1 1025.23 31.6256 <2e-16 ***
## Residuals 72 2334.1   32.42                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Given that the data are sums of scores, we might use a Poisson model:

ching.glm <- glm(Score ~ Source + Weight, family=poisson, data=ching.stacked)
anova(ching.glm,test="LRT")
## Analysis of Deviance Table
## 
## Model: poisson, link: log
## 
## Response: Score
## 
## Terms added sequentially (first to last)
## 
## 
##        Df Deviance Resid. Df Resid. Dev Pr(>Chi)    
## NULL                      89    218.387             
## Source  8    3.283        81    215.104   0.9153    
## Weight  9  173.040        72     42.064   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We might continue with a categorical analysis of the table, perhaps Fisher's exact test (here, the data are in the original table form):

fisher.test(ching.table,simulate.p.value = TRUE)
## 
##  Fisher's Exact Test for Count Data with simulated p-value (based
##  on 2000 replicates)
## 
## data:  ching.table
## p-value = 0.9975
## alternative hypothesis: two.sided

Finally, if we don't assume any distribution, a rank-based test is appropriate:

friedman.test(Score ~ Source | Weight, data=ching.stacked)
## 
##  Friedman rank sum test
## 
## data:  Score and Source and Weight
## Friedman chi-squared = 3.4866, df = 8, p-value = 0.9002
friedman.test(Score ~ Weight | Source, data=ching.stacked)
## 
##  Friedman rank sum test
## 
## data:  Score and Weight and Source
## Friedman chi-squared = 56.718, df = 9, p-value = 5.722e-09

Long and short: there is more variation in scores between weight classes than there is among the rankers, and the range of differences among rankers is small relative to the error within weight classes. The difference between 544 and 588 works out to about 4.4 points per weight class, while the gap between the high and low totals within any one weight class, say 125 (45-65), is larger than that.

There may be ways to further decompose the error in the comparisons to distinguish (at least in a statistical sense) among the rankers, but there's not enough information in Ching's table.
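For what it's worth, one quick follow-up within that same model is a pairwise Tukey HSD comparison of the rankers, sketched below with the same ching.stacked data frame; given the Source F-test above, it would almost certainly show no significant pairs:

# Refit the additive model with aov(), then compare rankers pairwise
ching.aov <- aov(Score ~ Source + Weight, data = ching.stacked)
TukeyHSD(ching.aov, which = "Source")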

 

23 minutes ago, dakotajudo said:

Long and short: there is more variation in scores between weight classes than there is among the rankers, and the range of differences among rankers is small relative to the error within weight classes.

 

Waymit! Did my "Oregon State alum who doesn't do social media" just get an account on this forum? lol

8 hours ago, dakotajudo said:

Long and short: there is more variation in scores between weight classes than there is among the rankers, and the range of differences among rankers is small relative to the error within weight classes.

 

Dude, you should have asked me for the data! I would have saved you from retyping it.

Thanks for doing this. If I'm reading your analysis correctly, the RtR has not been proven to not be bullsh!t. I'm going to call that a win until you can definitively say it is bullsh!t. Either way, I will still be crowning a champ every year.

On 3/26/2019 at 2:01 PM, dakotajudo said:

Long and short: there is more variation in scores between weight classes than there is among the rankers, and the range of differences among rankers is small relative to the error within weight classes.

 

Every Iowa fan on here is completely lost by this post. Their heads are spinning at a rate that cannot be measured.
