One of the last things we determined (but which is important enough that I mention it first) is that, in addition to developing a rating system, we must gather enough data on individual matches to analyze the system's results afterward and to test other potential rating systems against the same data.
Anyway,
That change is being applied because in general, lopsided groups are well-balanced in terms of the damage they can do (the larger team's damage is reduced, the smaller team's damage is boosted), as mentioned previously. It is an open question whether grouping with other players of significantly different skill affects the overall skill of your team in a linear fashion the way a calculated mean models it, but this is one of the reasons we'll be tracking results independently of the aggregate rating.
The second modification we're making is that for each member on each team, the score adjustment will be scaled linearly based on the distance of that member's rating from the mean rating of the team: |RA - RA1| * K(SA - EA), where RA is the team's mean rating and RA1 is the member's rating. I'm also keeping in mind scaling the rating delta using a Gaussian distribution as an alternative, but again, we're starting simple.
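To make the per-member scaling concrete, here is a minimal sketch in Python. It assumes the standard Elo expected-score formula (with the usual 400-point scale) for EA, and a default K of 32; neither value is stated above. The function names are made up for illustration. The formula is applied literally, with no normalisation of the distance term.

```python
from statistics import mean


def expected_score(rating_a, rating_b):
    """Standard Elo expected score for side A against side B (assumed here)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def member_deltas(team_a, team_b, score_a, k=32.0):
    """Return one rating delta per member of team_a.

    team_a, team_b: lists of individual member ratings.
    score_a: actual result for team A (1.0 win, 0.5 draw, 0.0 loss).
    k: the K factor; 32 is an assumed placeholder value.
    """
    r_a = mean(team_a)  # RA: mean rating of team A
    r_b = mean(team_b)  # mean rating of the opposing team
    base = k * (score_a - expected_score(r_a, r_b))  # K(SA - EA)
    # Literal reading of |RA - RA1| * K(SA - EA) for each member rating RA1.
    return [abs(r_a - r_member) * base for r_member in team_a]


if __name__ == "__main__":
    print(member_deltas([1400, 1600], [1500, 1500], score_a=1.0))
```

Read literally, a member whose rating equals the team mean gets a delta of zero, and the raw values grow with the rating distance. How the distance term should be normalised is not specified, so the numbers this produces are illustrative only.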
Other modifications under consideration include replacement of K with a function K(GA, GB) that varies the potential size of the delta based on the (average?) number of games played by each side — essentially, the system used by Days of Wonder for Gang of Four to encourage experienced players to play newbies, and to let newbies reach their proper rating more quickly.
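One possible shape for such a K(GA, GB) is sketched below. This is purely hypothetical: it is not the Days of Wonder formula, and the constants (`k_min`, `k_max`, `settle_games`) and the damping rule are all assumptions chosen to show the two stated goals. A side with few games gets a larger K so it converges quickly, and a side facing an unsettled opponent has its K damped so experienced players risk less against newbies.

```python
def k_factor(games_self, games_opp, k_min=16.0, k_max=48.0, settle_games=30):
    """Hypothetical K(GA, GB): size of the rating delta for one side.

    games_self: (average) games played by the side being adjusted.
    games_opp: (average) games played by the opposing side.
    All constants are illustrative assumptions.
    """
    # 1.0 for a brand-new side, falling to 0.0 once it has settle_games played.
    newness = max(0.0, 1.0 - games_self / settle_games)
    k = k_min + (k_max - k_min) * newness
    # Trust in the opponent's rating: 0.0 for a brand-new opponent, 1.0 once settled.
    opp_trust = min(1.0, games_opp / settle_games)
    # Halve the delta against a completely unsettled opponent.
    return k * (0.5 + 0.5 * opp_trust)


if __name__ == "__main__":
    print(k_factor(0, 100))    # new side against veterans
    print(k_factor(100, 0))    # veterans against a new side
    print(k_factor(100, 100))  # two settled sides
```

With these assumed constants, a new side facing veterans gets the largest K and veterans facing a new side get the smallest, which is the asymmetry the paragraph above describes.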
I doubt that these changes are novel or, to be honest, that they add up to the best rating system we could use (even accepting the performance constraint of "use as little CPU as possible"). Instead, they are a first stab at the problem, and one that may prove sufficiently accurate that a more complicated approach simply isn't warranted. The point of maintaining the data, and doing periodic comparisons of the existing rating system with other alternatives, is to figure out how complex the system needs to be before ratings stabilize and player skill is accurately recognized.


