F1 Car Changes Data Viz

CPSC 436V (2021)

Summary

This project explores Formula1 car improvements from 2000-2020. My team and I look at the improvement (or worsening) of cars over the years through power-to-weight and lap times — this is done through interactive charts which visualizes the change. The project is styled using F1's branding guide. We implemented an interesting view: an animated head to head lap based on sector times for 2014 and 2019.

We also got in the CPSC 436V hall of fame for this project!

Team

  • Shabab Khan
  • Lydia Zheng
  • Robert Mo

My Role

Beyond ideating the initial project and the visualizations, I was also the front end designer, lead programmer, documenter, and helped with data ingest and processing.

Links

view project | source

Tools Used

  • D3.js
  • HTML/CSS
  • MongoDB
  • Python
  • Google Sheets

Brief Explanation

Please note that the brief explanation here skips over finer details, but provides general decision making, and experience. The report is loooong but informative! You can find the full report in PDF / HTML format by clicking on the respective format types.

Visualization Goals and Tasks

Each visualization had a different goal and task.
1. Power and Weight
(aka Mechanical Changes)

  • Overview: This view could show us some interesting and unexpected trends in the power:weight ratio over seasons. The reasons, of course, could be various, and thus this would provide a starting point of investigation.
  • Filtered scatterplot/sub-overview: the viewer may want to see how the cars cluster up in terms of power:weight.
  • Detail view: here, we look at progress over time and get a feel for constructor progress in building a car. Both the power and power:weight charts will show a constructor's progress through the years.

2. Lap Time Progression
(aka LT0 + LT1)

  • Overview: With this, we can identify trends in averaged fastest lap times over years/seasons. We get a gauge for overall car performance that way. LT0 also allows us to look up the fastest overall seasons through theoretical lap times (the calculation can be found in the long form report).
  • Small multiples: We can find the best/worst times, view by year(s), we can compare lap time trends per circuit through the trend line. This visualization provides a leeway for people to browse lap time data and see which tracks were run in which seasons.

3. Lap Time Visual Comparison Tool
(aka LT2)
  • Stacked bar chart: This presents the part-to-whole relationship of sector times to lap times (i.e. the sum of sector times becomes the lap time).
  • Animation: Provides a cool way to visualize differences in sector times/lap times.
  • Being pretty cool to look at.
  • Visualization Interaction

    This section will mostly be animations and short captions. Each sub-section is listed as Name (ABBR). The detailed text is in the report.

    Mechanical Changes (MC)

    This visualization displays all possible years, a set of cars, and information for one car.
    The key interactions are in the upper line chart to scatter plot to lower line chart interaction, and the tool tip that appears on hover.


    Fig. 1. Tool tip and filtering selection on line chart.

    Fig. 2. Tool tip and 2 selections on scatterplot, tool tip on small multiple line charts.
    Lap Time Progression (LT0 + LT1)

    This visualization displays "theoretical lap time progression" a metric we created that indicates changes in car performance, the tracks ran from 2000-2020 and the average times.
    The key interactions are in the upper line chart to lower small multiple line chart interaction, and the tool tip that appears on hover.


    Fig. 3. Tool tips on overview (LT0).

    Fig. 4. Selection on overview (LT0), tool tips and de-selection on small multiples (LT1). Shows the bi-directional relationship between overview (LT0) and small multiples (LT1).

    Fig. 5. Selection and de-selection on small multiples (LT1). Shows the bi-directional relationship between overview (LT0) and small multiples (LT1).

    Fig. 6. Disable/enable points on small multiples (LT1). This lets the viewer get a feel for the general trend.

    Fig. 7. Clear Selection to remove anything already (pre-)selected, and reset button to go back to initial load state.
    Lap Time Visual Comparison Tool (LT2)

    Fig. 8. Tool tips on stacked bar chart. This shows the different sector times for the different years.

    Fig. 9. Radio buttons to change the track selection.

    Fig. 10. Starting the race! A quick glimpse at how the actual race function of this visualization works. Check it out here!

    Description of Data

    We used 3 data sets for the project.

    1. Formula 1 World Championship (1950 - 2020). Missing entries were collected by Lydia from the F1 Wikipedia pages.
    2. F1 Cars (2000 - 2020): This was collected by me from Wikipedia, f1technical, etc.
    3. F1 Sector Times:

    Each data set was filtered before being run through the preprocessing stage. To view the rationale behind our choices in the attributes we kept, as well as their types, cardinalities, meanings, etc., please see the full report (pdf / html).

    Data Preprocessing

    Each data set had some form of preprocessing.

    We started by putting our first data set (the F1 World Championship set) into a MongoDB instance and found it was an immense number. We filtered it down to seasons from 2000 to 2020 and then threw it into a python script that Lydia and I wrote. This python script computed average lap times. We noticed that there was some race data missing between 2000-2003, and so we had to manually fill in that data on a Google docs Sheet (and then re-run our script).

    The F1 car list data set was pre-processed entirely in python that I wrote. It was a scraper that would pull data from a (thankfully) standardized table that each car entry had on Wikipedia. Some data was missing due to the proprietary nature of almost everything in F1, and for that, we tried our best to fill it in reasonably (through searching the internet or using estimated numbers).

    We initially collected data from Kaggle for this, and then we filtered the data so only the tracks we supported (i.e. used between 2000-2020) are left. Furthermore, the data is combined/merged so that a JSON object has all 3 sectors we need for a given track/year. We manually updated the JSON output to include 2014 data from the FIA website; the reason for this is that we could better illustrate the difference (mostly the improvement) of the cars when comparing 2014 and 2019.

    Reflections

    During this project, I learned a lot about project management, development, setting and re-defining goals as necessary.

    Evolving Viz/Technical Goals

    Every visualization went through some level of changes, some more significant.

    Changes to Mechanical Changes Viz

    We expanded this visualization with one more chart. In its original form, this chart was simply a scatterplot of the cars and 2 line charts. The final version has the same overall parts in that it has a scatterplot and line charts, but they now interact in a way such that one of the line charts (power:weight) acts as a filter for the scatterplot (and the vehicles it displays).

    There is also a jitter on the scatter plot. This helped remove some of the occlusion of data in the overview of the scatterplot. It does make the data a little less accurate, but being able to see the actual points was decided to be more important, and the actual numbers are available in the tooltip.

    Changes to LT1 Viz

    In terms of viz design, this was changed from a stacked scatter plot to a small multiples visualization. This change allowed us to remove large amounts of occlusion, create a clean aesthetic look, and visualize missing data in an interesting way.

    Changes to LT2 Viz

    We were originally going to do a standard barchart but we decided to do a stacked barchart to visualize the sector times better; it represents the part-to-whole relation better.

    For Next Time

    Overall, I'm happy with the project. However, the amount of time we spent filling out data... it was... inordinate. So having a well sorted data set would be the main thing to address for the next personal viz project.

    Just to repeat the message above: The brief explanation below skips over finer details, but provides general decision making, and experience. The report is loooong but informative! Check it out in PDF / HTML.