Euroleague data scraping

  • Euroleague data scraping

    Would it be interesting for someone to have the script for scraping the data out of Euroleague's website?

  • #2
    OK, I assume someone will need it sooner or later. For anyone who needs the script, I've put it here.
    You need to install Python 2.7.x (I used 2.7.13) with pip (I suggest installing Python with all features enabled), then run these commands from the command prompt:
    > pip install beautifulsoup4
    > pip install requests
    > pip install openpyxl


    and you should be ready to run the script. It generates a data.xlsx file with the season totals for each player currently on the roster of a Euroleague team. I hope it helps someone.



    • #3
      Thanks, I'll check it out. I haven't practiced python in a while.



      • #4
        Originally posted by Oly_fan View Post
        Thanks, I'll check it out. I haven't practiced python in a while.
        Honestly, the code is a bit sloppy; if I had taken more time over it instead of just one afternoon, it would be clearer. I am a C++ developer, so I don't write Python often either, but the script does what it's supposed to do: it parses those pages and creates an .xlsx file.

        P.S. I've added comments to the script so you can understand it better.
        Last edited by unnamed; 12-28-2016, 12:56 AM.



        • #5
          Just a heads-up, the code as it is currently breaks down on Darussafaka because Zizic doesn't have a position assigned to him yet. Very easily fixed and other than that, it works fine. Since it was my first time scraping data from sites I'm glad I learnt something.

          I had this optimistic idea; since euroleague only has shot charts for each game, I thought I could get the datasets behind each chart, aggregate them and produce shot charts for players/teams over whole seasons. However, the actual data seems unavailable? I know nothing about HTML so I could be missing it.



          • #6
            Originally posted by Oly_fan View Post
            Just a heads-up, the code as it is currently breaks down on Darussafaka because Zizic doesn't have a position assigned to him yet. Very easily fixed and other than that, it works fine. Since it was my first time scraping data from sites I'm glad I learnt something.

            I had this optimistic idea; since euroleague only has shot charts for each game, I thought I could get the datasets behind each chart, aggregate them and produce shot charts for players/teams over whole seasons. However, the actual data seems unavailable? I know nothing about HTML so I could be missing it.
            My main goal with this is to calculate PER using Hollinger's formula and then do some kind of analysis (one thing that comes to mind is ANOVA) to find the ratio between NBA and Euroleague players. However, I realized I might need an even more advanced script for PER, because I need to know how every opponent played against the team of the player whose PER I want to calculate.

            Concerning the script, I had several different ideas and decided to use the most straightforward one. I realize that if I had used a dictionary instead of a list, it wouldn't break in the Zizic case as it does now, but then again I thought every player would have a playing position.
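
A minimal sketch of the dictionary idea, not the actual script: the field names (`name`, `position`, `points`), the sample rows, and the helper `row_to_record` are all hypothetical. Looking each field up by key with a default keeps a player without a listed position from breaking the run.

```python
# Hypothetical scraped rows: each player is a list of (label, value) pairs.
# The second player has no position, as in the Zizic case.
scraped_rows = [
    [("name", "Player A"), ("position", "Guard"), ("points", "120")],
    [("name", "Player B"), ("points", "45")],
]

# Column order for the output sheet (assumed, for illustration only).
COLUMNS = ["name", "position", "points"]


def row_to_record(pairs, columns=COLUMNS, missing=""):
    """Turn (label, value) pairs into a fixed-width row, filling gaps."""
    fields = dict(pairs)
    # dict.get returns the default instead of raising when a key is absent,
    # so a missing field leaves an empty cell rather than stopping the script.
    return [fields.get(col, missing) for col in columns]


records = [row_to_record(pairs) for pairs in scraped_rows]
for record in records:
    print(record)
```

Each resulting row has the same number of cells, so it could be appended to a worksheet as-is.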

            Getting the shot chart would require a completely different script. You'd need to go game by game and collect the data. You'll probably need images as well: one for the court, and one each for made and missed attempts. That would require you to install yet another library, Pillow, should you decide to use openpyxl.
            Last edited by unnamed; 12-28-2016, 10:29 PM.



            • #7
              I am an IT goat, but if you are willing to share xls files, I'd thank you
              EUROLEAGUEADDICTED



              • #8
                Originally posted by unnamed View Post
                Getting the shot chart would require a completely different script. You'd need to go game by game and collect the data. You'll probably need images as well: one for the court, and one each for made and missed attempts. That would require you to install yet another library, Pillow, should you decide to use openpyxl.
                I did go game by game but I couldn't find the data. I think, if I had it, I could make the rest using R and shiny; I'm more familiar with those. I saw people sharing python code for making shot charts too if that didn't work out.



                • #9
                  Originally posted by Oly_fan View Post
                  I did go game by game but I couldn't find the data. I think, if I had it, I could make the rest using R and shiny; I'm more familiar with those. I saw people sharing python code for making shot charts too if that didn't work out.
                  If you want to inspect the code, right-click on the page you want to check in the browser and then press 'Q' (this works in Firefox). I found the shot chart under the game options, like here. It could be extracted.



                  • #10
                    Originally posted by unnamed View Post
                    If you want to inspect the code, right-click on the page you want to check in the browser and then press 'Q' (this works in Firefox). I found the shot chart under the game options, like here. It could be extracted.
                    I did that but I couldn't find the coordinates data. Maybe I missed it, I'll have another look when I can.



                    • #11
                      Originally posted by Oly_fan View Post
                      I did that but I couldn't find the coordinates data. Maybe I missed it, I'll have another look when I can.

                      There are a lot of "g" tags, and inside every "g" tag there is a "circle" tag; the coordinates are its cx and cy attributes.



                      • #12
                        ng-attr-cx="{{point.COORD_Y * 776 / 2800 + 56}}" ng-attr-cy="{{point.COORD_X * 416 / 1500 + 218}}"
                        ng-attr-cx="{{(800 - (point.COORD_Y * 776 / 2800 + 56))}}" ng-attr-cy="{{point.COORD_X * 416 / 1500 + 218}}"
                        This is just a general formula to see where each shot goes on this 'court':
                        bg-shooting-md.jpg

                        The point.COORD_X and point.COORD_Y are the actual variables for each shot.
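
As a rough sketch, the two formulas above could be written as one small function. The function name `shot_to_pixels` and the `mirrored` flag are made up for illustration, and reading the second formula as a horizontal flip (with 800 presumably being the image width) is an assumption; only the constants come from the page. Note that cx is computed from COORD_Y and cy from COORD_X.

```python
def shot_to_pixels(coord_x, coord_y, mirrored=False):
    """Map a shot's COORD_X / COORD_Y to (cx, cy) on the court image."""
    cx = coord_y * 776 / 2800 + 56
    cy = coord_x * 416 / 1500 + 218
    if mirrored:
        # Second formula from the page: cx is subtracted from 800.
        cx = 800 - cx
    return cx, cy


# Illustrative input values, not taken from a real game.
print(shot_to_pixels(0, 0))                 # (56.0, 218.0)
print(shot_to_pixels(0, 0, mirrored=True))  # (744.0, 218.0)
```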



                        • #13
                          Tag looks like this:
                          <circle ng-class="selected == point.NUM_ANOT ? 'selected' : ''" ng-attr-cx="{{point.COORD_Y * 776 / 2800 + 56}}" ng-attr-cy="{{point.COORD_X * 416 / 1500 + 218}}" ng-click="setPlayer(point.EQUIPO, point.JUGADOR, point.MINUTO, point.CONSOLA, point.ID_JUGADOR, point.PUNTOS_A, point.PUNTOS_B, point.NUM_ANOT)" r="10" fill="#FFFFFF" stroke="#F7941E" stroke-width="2" cx="101.17428571428572" cy="320.61333333333334"></circle>
                          The attributes I was referring to are cx and cy at the end of the tag. They look like distances in pixels.
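
A possible way to pull these values out of saved page markup, sketched with Python 3's standard-library `html.parser` rather than BeautifulSoup. The class `CircleCollector` and the helper `pixels_to_shot` are hypothetical names, and the inverse covers only the first (unmirrored) formula. The rendered cx/cy values are likely present only after the page's scripts have run, so this assumes markup copied from the browser inspector, not the raw page source.

```python
from html.parser import HTMLParser


class CircleCollector(HTMLParser):
    """Collect the rendered cx/cy attributes of every <circle> tag."""

    def __init__(self):
        super().__init__()
        self.points = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "circle" and "cx" in attributes and "cy" in attributes:
            self.points.append((float(attributes["cx"]), float(attributes["cy"])))


def pixels_to_shot(cx, cy):
    """Invert the page's first formula to recover (COORD_X, COORD_Y)."""
    coord_y = (cx - 56) * 2800 / 776
    coord_x = (cy - 218) * 1500 / 416
    return coord_x, coord_y


# Shortened copy of the tag quoted above (Angular attributes omitted).
markup = '<g><circle r="10" cx="101.17428571428572" cy="320.61333333333334"></circle></g>'

collector = CircleCollector()
collector.feed(markup)
for cx, cy in collector.points:
    coord_x, coord_y = pixels_to_shot(cx, cy)
    print(round(coord_x), round(coord_y))  # 370 163
```

Collecting these pairs game by game would give the raw data needed to aggregate shot charts over a season.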



                          • #14
                            OK, with Chrome I was only getting the general code. I found it using Firefox.



                            • #15
                              Originally posted by radallo View Post
                              I am an IT goat, but if you are willing to share xls files, I'd thank you
                              It's pointless to share an xls file because the data changes as the fixtures progress.
