
Splits/Boxes/Gamelog Suggestions

Posted by Sean Forman on March 3, 2010

I've been working on getting the latest and greatest data from RetroSheet onto the site and will be making a few additions to the affected pages as well, which I'll go into more later when they launch.

Building all of this is a five-day process: our server runs continuously, building the 120,000 box scores, the 9m rows of play-by-play, the 5m rows of gamelogs, and the 10m rows of splits. So adding a little thing here and there just isn't worth it. I've got about two windows a year to get things added, and this is one of them. If you want to suggest a split, gamelog, or boxscore feature, now would be a good time to do so.

One idea I've had since we'll be adding a lot of data from 1920-1939 (no pbp, just boxes) is to add a split for vs. RHstarter and vs. LHstarter. We won't know Lou Gehrig's exact splits, but we'll know what he did when a lefty started the game and when a righty started the game.
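A minimal sketch of how such a split could be computed from game-level data alone, assuming each game-log row carries the opposing starter's throwing hand. The row layout and the sample numbers are invented for illustration; they are not the site's schema or real 1920-1939 data.

```python
from collections import defaultdict

# Hypothetical game-log rows: (opposing starter's throwing hand, AB, H).
# The values are made up for illustration only.
game_logs = [
    ("R", 4, 2),
    ("L", 3, 1),
    ("R", 5, 1),
    ("L", 4, 0),
]

def split_by_starter_hand(rows):
    """Sum AB and H per starter hand and derive a batting average."""
    totals = defaultdict(lambda: {"AB": 0, "H": 0})
    for hand, ab, h in rows:
        totals[hand]["AB"] += ab
        totals[hand]["H"] += h
    return {
        hand: {**t, "AVG": round(t["H"] / t["AB"], 3) if t["AB"] else None}
        for hand, t in totals.items()
    }

splits = split_by_starter_hand(game_logs)
```

Because the key is the starter rather than the pitcher actually faced, plate appearances against relievers of the other hand land in the starter's bucket, which is why these are not exact platoon splits.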

Others you would like to see?

Note: We also had a twitter outage after our blog update, but things are back up and running now.

43 Responses to “Splits/Boxes/Gamelog Suggestions”

  1. Dave Says:

    so the game, event, b vs. p, and streak finders will be updated with the 1920-1939 stats (babe ruth's 1927 streaks will be there...)?

  2. Sean Forman Says:

    Yes, that is the plan. It is going to be a bit hard with the 12 year gap, but I think we can make it work.

  3. Sean Forman Says:

    Actually, let me clarify that. We won't have b vs. p or events because we don't have pbp for those years. We will have streak and game finders for those years.

    Splits will also be limited to complete game splits like day/night, home/road, by opp, by lineup slot, etc.

  4. Gary Marbry Says:

    I'd like to see the split "DP Situation" (1st base occupied; less than 2 outs) both in the splits sections and as an option in the event finder.

    And this is probably pie in the sky, but an option on the splits page to display the splits as home only, road only, and perhaps versus lefty or righty. IOW, a double split like "RISP at home" or "After 0-1 counts versus lefties".

  5. Charles Saeger Says:

    1) Make groundball/flyball and power/finesse splits be equal to a set percentage of the league for each split. Say, have the 25% of all pitchers with the most walks and strikeouts per batter faced be power pitchers, and the 25% with the lowest be finesse pitchers, rather than having a set minimum.

    2) For seasons without full PBP data, do display opposition lefty/righty splits on the team level only. It will help to know how many lefties a team faced, even if we can't peg them to a specific pitcher.

    3) Same story with opposition records -- even if we cannot peg a double allowed or a stolen base to a specific pitcher, knowing the team total will be helpful.

    4) For games wherein multiple fielders played a position, Retrosheet doesn't give an innings total for the individual fielders. It would help to have an estimate.

    5) An opponents' fielding line, which would let us know where the errors fell and give us some idea of how often this team hit groundballs.
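Suggestion 1) above can be sketched roughly as follows, assuming the league is available as a mapping from pitcher to (BB, SO, BF). The function name, the "neutral" label, and the sample league are hypothetical; only the 25% share comes from the comment.

```python
def classify_power_finesse(pitchers, share=0.25):
    """Label the top `share` of pitchers by (BB + SO) / BF as power
    and the bottom `share` as finesse; everyone else is neutral."""
    rates = {
        name: (bb + so) / bf
        for name, (bb, so, bf) in pitchers.items()
        if bf > 0
    }
    ranked = sorted(rates, key=rates.get, reverse=True)
    cutoff = int(len(ranked) * share)
    labels = {name: "neutral" for name in ranked}
    for name in ranked[:cutoff]:
        labels[name] = "power"
    for name in ranked[len(ranked) - cutoff:]:
        labels[name] = "finesse"
    return labels

# Hypothetical eight-pitcher league: name -> (BB, SO, BF).
league = {
    "A": (80, 200, 900), "B": (60, 180, 880),
    "C": (50, 120, 850), "D": (70, 100, 900),
    "E": (40, 110, 870), "F": (45, 90, 860),
    "G": (30, 70, 840),  "H": (25, 60, 830),
}
labels = classify_power_finesse(league)
```

Since the cutoffs are relative to the league, a low-strikeout era still ends up with a quarter of its pitchers in each group.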

  6. Zachary Says:

    I'd like to be able to cross-index the various splits. I'd like to be able to see what field players hit to in particular parks, for instance, or be able to divide the opponent splits into home and away. How well has Derek Jeter hit against the Red Sox in Yankee Stadium? How many opposite field home runs did Carlos Delgado hit at Shea?

  7. Greg Rybarczyk Says:

    In Play Index, don't just list the top 8 or 10 choices for a filter, list them all (if they are teams or parks, at least). It's annoying to try to check a player's PA's at another park, and if it's not one of the top ones he played in, you have to worm your way there by a long, difficult route...

  8. JDV Says:

    I've written a couple of times previously about a minor glitch in this site's fielding statistics. I hope this is an appropriate forum to try again.

    Using this link (http://www.baseball-reference.com/leagues/MLB/2009-standard-fielding.shtml) as an example, the problem is this...for multi-team, multi-position players, the season totals aren't all accurate.

    To illustrate, switch the view from the default (alphabetical) to the 'GS' column.

    - The new display will start with Prince Fielder, who was the only major leaguer to start all 162 games in the field.

    - As you scan down the page, you'll find Orlando Cabrera at Line # 8. He played for two teams in the same league, but all at the same position. His entry is correct.

    - Scroll further to Matt Holliday at Line # 20. He played for two teams in different leagues, but all at the same position. His entry is also correct.

    - You'll find Victor Martinez at Line # 45. He played for two teams (same league) at two different positions. His totals are also correct.

    - You'll find Mark DeRosa at Line # 107. He played for two teams in different leagues, and started games at four different positions for each team. Still, his totals are correct.

    - Eventually, you'll find Nyjer Morgan at Line # 261. He played for two teams in the same league, but played two positions for one team and only one position for the other. That may be the key because his totals are wrong. He should be found at Line # 143 with 115 GS. For some reason, only his Pirates totals appear at Line # 261. Morgan later appears at Line # 310, showing his combined total of GS at only one position.

    That was long-winded, but it must be a simple problem with a simple fix.

  9. Ryan Wilkins Says:

    While seconding every request mentioned up to this point, I'd really love to see:

    * All of the pitch-type statistics (e.g., bb-ref.com/leagues/MLB/2009-pitches-pitching.shtml), especially in the player leaderboards, display at least one decimal place (Or at least make that available when you view in CSV or PRE form, if space is a major issue.) B-R offers some really great information on those pages -- information you can't get elsewhere -- but I'd love to see a little more specificity in the rankings and the numbers displayed.

  10. Djibouti Says:

    Not sure if you're touching team pages, but if you are, a few suggestions:
    When you're looking at a team's stats page it would be nice if there was some kind of indicator for which players were traded/acquired that season. Maybe a symbol next to the name like '+' for acquired and '-' for traded/dropped.
    A more convoluted but kind of interesting addition would be a table at the bottom of the page listing player movements. A 3-column table with 'Name', 'Date', and 'Movement'. The movement column would include things like 'picked up in trade', 'lost in trade', 'demoted to minors', 'promoted from minors', 'picked up off of scrap heap for midseason playoff push following injury to useful player', etc.

  11. Johnny Twisto Says:

    I cosign Charles Saeger's first suggestion. The way it's set up now, almost no one from the '50s falls under the "power pitcher" split, because there were many fewer strikeouts then.

  12. Gerry Says:

    I'd like to be able to sort players on Black Ink, Gray Ink, etc. I'd like to be able to sort pitchers on pitching and batting stats simultaneously, e.g., who hit the most home runs of any pitcher who struck out 200 batters in a season. I'd like to be able to sort on single season and career stats simultaneously - what's the record for most hits in a season by a player who had fewer than 1000 hits in his career?

  13. Sean Forman Says:

    @4:Gary

    I've added DP situation. Double splits can be done sort of with the event finders.

    @5:Charlie

    I'll look at the GB/FB suggestion, but that probably isn't going to happen.

    re: 2) do you mean the pitching stats of the LH and RH pitchers facing a team? How are 2 and 3 different?

    4) how would you suggest estimating it?

    5) interesting idea, I'll look at adding a cumulative fielding line for the opponents by position.

    @6: Zach

    The PI event finder will give you the Jeter info.

    @7: Greg, I'll work on that.

    @8 JDV, I'll fix that before the season starts

    @9: Ryan, I'll see about adding a digit.

    @10: not the focus right now, but a good idea

    @12, please see the description of this blog entry. 🙂

  14. Jeff James Says:

    Are there home & away splits, and I'm just blind?

  15. DavidRF Says:

    @12:

    Gerry, I get the following:

    Johnny Hadopp 225/880
    Beau Bell 218/806
    Dale Alexander 215/811
    Dustin Pedroia 213/580
    Hanley Ramirez 212/771
    Benny Kauff 211/961

  16. Raphy Says:

    DavidRF is referring to Johnny Hodapp who had 225 hits in 1930. (I wouldn't ordinarily correct this, but the b-r search tool returns nothing for the name as typed.)
    http://www.baseball-reference.com/players/h/hodapjo01.shtml

  17. DavidRF Says:

    Sorry. Typo. My script returns:

    H_N H_Sum H_Max playerID
    9 880 225 hodapjo01
    7 806 218 bellbe01
    5 811 215 alexada01
    4 580 213 pedrodu01
    5 771 212 ramirha01
    8 961 211 kauffbe01

    ... and I tried to make it more readable. I got a little lysdexic.
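The script itself was not posted, so the following is only a guess at the kind of query behind that output: group season rows by player, keep players whose career hit total is under 1000, and sort by best single season. The column names mirror the header above, with H_N assumed to be the number of seasons; the player IDs and hit totals in the sample are invented.

```python
from collections import defaultdict

# Hypothetical (playerID, season, hits) rows; IDs and totals are invented.
season_hits = [
    ("aaaaa01", 1930, 225), ("aaaaa01", 1931, 150),
    ("bbbbb01", 1930, 210), ("bbbbb01", 1931, 205), ("bbbbb01", 1932, 190),
    ("bbbbb01", 1933, 200), ("bbbbb01", 1934, 215),
    ("ccccc01", 1936, 218), ("ccccc01", 1937, 100),
]

def best_season_for_short_careers(rows, career_cap=1000):
    """Best single-season hit totals among players under `career_cap` career hits."""
    by_player = defaultdict(list)
    for player_id, _season, hits in rows:
        by_player[player_id].append(hits)
    summary = [
        {"H_N": len(h), "H_Sum": sum(h), "H_Max": max(h), "playerID": pid}
        for pid, h in by_player.items()
        if sum(h) < career_cap
    ]
    return sorted(summary, key=lambda r: r["H_Max"], reverse=True)

result = best_season_for_short_careers(season_hits)
```

The same group-then-filter shape covers most "single season and career simultaneously" questions: aggregate per player first, filter on the career aggregate, then rank on the season value.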

  18. Sean Forman Says:

    @14:Jeff

    Splits are linked just above the player and team batting and pitching stats.

  19. Mike Sandler Says:

    I've asked for this a couple of times over the last couple of years, I'll try again. I'd like the ability to sort by Batting Runs. The stat appears on the players stat page, but it's not a sorting option. This is only for the season finder.

  20. Raphy Says:

    Since we're answering Gerry's examples, here are the single-season home run leaders among pitchers with 200+ strikeouts. (I found this using just PI and Excel)
    7 - Jack Stivetts (1890 and 1891), Don Drysdale (1965) & Earl Wilson (1966)
    6 - John Clarkson (1887), Fergie Jenkins (1971) & Carlos Zambrano (2006)
    5 - Jim Whitney (1883) & Bob Gibson (1965 and 1972)

  21. Avoiding the twin killing in the two hole | River Avenue Blues Says:

    [...] exact numbers for GIDP opportunities isn’t available right now, though it could be coming. For now what we can do is work with an estimated number. Clearly, we can narrow down opportunities [...]

  22. Evan Brunell Says:

    -xFIP
    -manager ages
    -able to do year-by-year sorting for a specific team when a player splits a year between teams. I.e., if Player A spends 01-07 with Boston, then 08 split between Boston and Baltimore, I'd like to be able to get the 06-08 Boston numbers. With your AWESOME yearly splits selection tool, I wouldn't be able to get just Boston 08; it'd be the 08 collective.

  23. Gerry Says:

    Thanks Sean, DavidRF, Raphy. I wasn't particularly interested in the exact searches I mentioned; I'd like to be able to do that *kind* of search on my own, instead of having to rely on the kindness of strangers. But I'm sure you knew that.

  24. Charles Saeger Says:

    SF@13: #1 is a bigger deal for power/finesse anyways. Right now, if you move past the steroid era, the numbers are meaningless because so few pitchers qualify to be power pitchers. Really, you're best off just ditching the split as things stand.

    #2 and #3 are related. I'm just making sure there's a full opposition line, and for pitchers' splits as well.

    #4 is going to be a project for someone(s). My solution was to use plate appearances -- the relief fielder gets the minimum number of innings needed to have batted so many times. If that isn't clear, use plate appearances as a split, any excess going to the starter -- in an 8.2-inning game, if both shortstops have 2 PA, we'd grant 5 innings to the starter and 4.2 to the reliever. If someone isn't doing it by hand, which is probably too time consuming, have a computer assign by plate appearances percentage.
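A rough sketch of the "assign by plate appearances percentage" fallback, assuming innings are written in the usual baseball notation where "8.2" means 8 innings plus 2 outs. The function names are hypothetical, and sending the rounding remainder to the starter is one reading of "any excess going to the starter".

```python
def innings_to_outs(innings):
    """Convert notation such as '8.2' (8 innings, 2 outs) to a count of outs."""
    whole, _, frac = str(innings).partition(".")
    return int(whole) * 3 + int(frac or 0)

def outs_to_innings(outs):
    """Convert a count of outs back to baseball innings notation."""
    return f"{outs // 3}.{outs % 3}"

def allocate_innings(game_innings, starter_pa, reliever_pa):
    """Split a position's innings between two fielders by their share of PA.
    Outs lost to rounding go to the starter."""
    total_outs = innings_to_outs(game_innings)
    total_pa = starter_pa + reliever_pa
    if total_pa == 0:
        # No batting evidence at all: credit the whole game to the starter.
        return outs_to_innings(total_outs), "0.0"
    reliever_outs = total_outs * reliever_pa // total_pa
    starter_outs = total_outs - reliever_outs
    return outs_to_innings(starter_outs), outs_to_innings(reliever_outs)

even_split = allocate_innings("8.2", 2, 2)
late_sub = allocate_innings("9.0", 3, 1)
```

A strict percentage split treats equal plate appearances as equal innings, so for the 8.2-inning example it gives each shortstop 4.1 innings rather than favoring the starter the way the by-hand rule does.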

  25. Dave Says:

    The only down side to the "game event" multi-split feature is that you can't get something unless it's in the top amount.
    A player that has 5 of something may not show up in the first list because the larger amounts are listed first (in the NFL TD log, there is a "show full list" message that allows you to expand the list to show all from the first page...not just from the top amounts)

  26. Dave Says:

    How about a season streak feature for players, pitchers, and teams

    (most consecutive years/seasons getting X amount of something...)
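For illustration, a season streak of this kind could be found along these lines, assuming per-season totals for one player or team are available as a year-to-value mapping. The function name and the sample totals are hypothetical.

```python
def longest_season_streak(season_totals, threshold):
    """Return (length, first_year, last_year) for the longest run of
    consecutive years whose value is at least `threshold`."""
    best = (0, None, None)
    run_start = prev = None
    for year in sorted(season_totals):
        if season_totals[year] < threshold:
            prev = None  # a failing season ends the current run
            continue
        if prev is None or year != prev + 1:
            run_start = year  # a missing or failing season starts a new run
        prev = year
        length = year - run_start + 1
        if length > best[0]:
            best = (length, run_start, year)
    return best

# Hypothetical home run totals by season; 2007 is missing on purpose.
hr_by_year = {2001: 30, 2002: 35, 2003: 12, 2004: 31, 2005: 40, 2006: 33, 2008: 36}
streak = longest_season_streak(hr_by_year, 30)
```

Treating a missing season as a break is a design choice; a finder could instead skip seasons in which the player did not appear.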

  27. Charles Saeger Says:

    CS@24 #4: I meant starter gets 5 innings, relief fielder gets 3.2 innings.

    I'm thinking there has to be a reason, but why are there no individual pages showing ballpark data? Say, a page showing the doubles hit in Wrigley Field each year and similar such. Someone down on Tango's site mentioned lefty/righty ballpark data, but unless I'm a complete dipstick, I can't even find a page dedicated to each park.

  28. Charles Saeger Says:

    Now I'm just piling, but didn't James make a similarity score for seasons too? Would that be displayable?

  29. Sean Forman Says:

    Charles,

    You can get ballpark totals on the league splits pages. I hope to add a comprehensive ballpark page at some point.

    I've added manager ages to their outputs.

    Season streaks are a long-time request.

  30. Chris J. Says:

    One semi-related thought:

    Let's say I click on (random example here) 2009 AL offensive splits. Then I click on the red for April batting info for all teams. Boom - all team info for April pops up. Cool.

    Can I click on the header rows so that the info is organized by that column? I used to be able to do that, but now I can't. It comes up by sOPS+ (I think) but I can't get anything else up.

  31. Sean Forman Says:

    Chris,

    You can't sort within a popup, but if you click on the permalink option you can then sort to your heart's content.

  32. Vlad Says:

    Maybe you can already do this, and I just don't know how, but with players who are traded partway through a season, it'd be nice to be able to isolate sub-splits within that season for each team (so that we could check for differing situational usage on the two rosters, for example).

  33. Sean Forman Says:

    Vlad:

    Look for this text
    2008 Season Splits: Season Total / Boston Red Sox / Pittsburgh Pirates

    on this page
    http://www.baseball-reference.com/players/split.cgi?n1=bayja01&year=2008&t=b

  34. Travis G. Says:

    not sure if this has been mentioned:

    Stadium Splits!

    LHP/RHP vs. LHB/RHB in Yankee Stadium, Fenway Park, etc.

  35. Charles Saeger Says:

    SF@29 -- yeah, I was looking for the multi-year data, akin to the current pages from Retrosheet on, well, steroids.

  36. Ryan JL Says:

    1. Batting Runs and Batting Wins added to the PI!

    2. Baserunning information for the event finders (eg so I can find who has stolen home the most times, etc.)

  37. Traco Bucco Says:

    This might be a bit difficult to put together, but what about linking box scores to Google Archives articles about the games? For example, the box for this 1974 Pirates-Cubs game could link to the game's coverage in the Pittsburgh Press.

  38. Gary Marbry Says:

    One last thing I'd like is a better way of distinguishing players with the same name (like the two Alex Gonzalez').

    This may be required soon as (for now) both Ramon Ramirez' are members of the Red Sox!

  39. Norman Says:

    Would it be possible to get the statistics of players for a team and a league on a specific date like June 15, 1930?

    Would it be possible to get statistics for the LAST x games of the season? Presently we can get statistics for players for the Cubs for the first 135 games of the 1969 season, would it be possible to get the last 27?

    Lastly, would it be possible to get LIFETIME statistics for a team instead of just the cumulative totals?
    I know this is not related to games but....

    Keep up the great work sir....

  40. WanderingWinder Says:

    Something I would like to see is a team leaderboard, where you could see things like "most innings pitched by a team in a single season" or "most runs scored by a team in a single season" - this seems pretty basic, but I can't find it anywhere on the site as is.

  41. Charles Saeger Says:

    Parting request: most common lineup/defensive alignment against lefty/righty pitchers. You could also do most common by month, to get an idea how teams shift things around during the year, but it isn't as important.

  42. Gary Marbry Says:

    I swear this is my last suggestion: In the streak finder, can there be some sort of indicator to show that the streak is still "active"?

  43. bobm Says:

    We won't know Lou Gehrig's exact splits, but we'll know what he did when a lefty started the game and when a righty started the game.

    Others you would like to see?

    I know you won't be incorporating pbp data. However, I would be interested in seeing any pbp data available for innings in which Lou Gehrig hit a grand slam. Maybe it could be done as a separate special feature linked to Gehrig's BB-REF player page, akin to the log available for Ripken's streak. Thank you.