This project explores improvements in Formula 1 cars from 2000 to 2020.
My team and I look at how the cars improved (or worsened) over the years through
power-to-weight ratio and lap times, using interactive charts that visualize
the change. The project is styled using F1's branding guide.
We implemented an interesting view: an animated head-to-head lap based on sector
times for 2014 and 2019.
We also got in the CPSC 436V hall of fame for this project!
Beyond ideating the initial project and the visualizations, I was the front-end designer, lead programmer, and documenter, and I helped with data ingestion and processing.
This section will mostly be animations and short captions. Each sub-section is listed as Name (ABBR). The detailed text is in the report.
This visualization displays all possible years, a set of cars, and information for one car.
The key interactions are the chain from the upper line chart to the scatter plot to the lower line chart,
and the tooltip that appears on hover.
This visualization displays "theoretical lap time progression" (a metric we created that indicates
changes in car performance), the tracks raced from 2000 to 2020, and the average times.
The key interactions are the link from the upper line chart to the lower small-multiples line charts,
and the tooltip that appears on hover.
We used 3 data sets for the project.
Each data set was filtered before being run through the preprocessing stage. To view the rationale behind our choices in the attributes we kept, as well as their types, cardinalities, meanings, etc., please see the full report (pdf / html).
Each data set had some form of preprocessing.
We started by putting our first data set (the F1 World Championship set) into a MongoDB instance and found that it contained an immense number of records. We filtered it down to the seasons from 2000 to 2020 and then fed it into a Python script that Lydia and I wrote, which computed average lap times. We noticed that some race data was missing between 2000 and 2003, so we had to fill in that data manually in a Google Sheet (and then re-run our script).
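As a rough illustration of the filter-then-average step, here is a minimal sketch. It is not our actual script: the record fields (`year`, `circuit`, `milliseconds`) and the function name `average_lap_times` are assumptions made for the example, and the sample laps are placeholder numbers.

```python
from collections import defaultdict
from statistics import mean

def average_lap_times(laps, first_year=2000, last_year=2020):
    """Return {(year, circuit): mean lap time in ms} for seasons in range.

    `laps` is an iterable of dicts with hypothetical keys
    'year', 'circuit', and 'milliseconds'.
    """
    grouped = defaultdict(list)
    for lap in laps:
        # Keep only the seasons the project covers.
        if first_year <= lap["year"] <= last_year:
            grouped[(lap["year"], lap["circuit"])].append(lap["milliseconds"])
    return {key: mean(times) for key, times in grouped.items()}

# Placeholder laps, not real timing data.
sample_laps = [
    {"year": 1999, "circuit": "monza", "milliseconds": 85000},
    {"year": 2004, "circuit": "monza", "milliseconds": 81000},
    {"year": 2004, "circuit": "monza", "milliseconds": 83000},
    {"year": 2019, "circuit": "monza", "milliseconds": 82000},
]
averages = average_lap_times(sample_laps)
```

Grouping by (year, circuit) also makes gaps easy to spot: any race missing from the result is a candidate for manual filling.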
The F1 car list data set was pre-processed entirely by a Python script that I wrote. It was a scraper that pulled data from the (thankfully) standardized table that each car's Wikipedia entry has. Some data was missing due to the proprietary nature of almost everything in F1, and we did our best to fill those gaps in reasonably (by searching the internet or using estimated numbers).
We initially collected data from Kaggle for this, and then filtered it so that only the tracks we supported (i.e. those used between 2000 and 2020) were left. The data was then merged so that each JSON object has all 3 sectors we need for a given track/year. We manually updated the JSON output to include 2014 data from the FIA website, which let us better illustrate the difference (mostly the improvement) between the cars of 2014 and 2019.
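The filter-and-merge step could look something like the sketch below. The row layout (`track`, `year`, `sector`, `seconds`), the `s1`/`s2`/`s3` keys, and the helper name `merge_sectors` are all assumptions for illustration, and the values are placeholders rather than real sector times.

```python
import json

def merge_sectors(rows, supported_tracks):
    """Combine per-sector rows into one object per (track, year)."""
    merged = {}
    for row in rows:
        if row["track"] not in supported_tracks:
            continue  # drop tracks outside the supported set
        key = (row["track"], row["year"])
        entry = merged.setdefault(
            key, {"track": row["track"], "year": row["year"], "sectors": {}}
        )
        entry["sectors"][f"s{row['sector']}"] = row["seconds"]
    # Keep only entries that have all three sectors.
    return [e for e in merged.values() if set(e["sectors"]) == {"s1", "s2", "s3"}]

# Placeholder rows, not real sector times.
rows = [
    {"track": "monza", "year": 2019, "sector": 1, "seconds": 26.0},
    {"track": "monza", "year": 2019, "sector": 2, "seconds": 27.0},
    {"track": "monza", "year": 2019, "sector": 3, "seconds": 26.5},
    {"track": "unsupported", "year": 2019, "sector": 1, "seconds": 30.0},
    {"track": "spa", "year": 2019, "sector": 1, "seconds": 30.0},
]
merged = merge_sectors(rows, supported_tracks={"monza", "spa"})
print(json.dumps(merged, indent=2))
```

In this sketch, incomplete track/year pairs are dropped rather than emitted with missing sectors, since the head-to-head view needs all three.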
During this project, I learned a lot about project management, development, and setting and redefining goals as necessary.
Every visualization went through some level of change, some more significant than others.
We expanded this visualization with one more chart. In its original form, this chart was simply a scatterplot of the cars and 2 line charts. The final version has the same overall parts in that it has a scatterplot and line charts, but they now interact in a way such that one of the line charts (power:weight) acts as a filter for the scatterplot (and the vehicles it displays).
There is also a jitter on the scatter plot. This helped remove some of the occlusion of data in the overview of the scatterplot. It does make the plotted positions a little less accurate, but we decided that being able to see the individual points was more important, and the actual numbers are available in the tooltip.
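The idea behind the jitter can be sketched in a few lines. This is an illustration only, written in Python for consistency with our preprocessing scripts; the function name `jitter`, the uniform noise, and the `amount` parameter are assumptions, not a description of the visualization's actual code.

```python
import random

def jitter(values, amount, seed=0):
    """Return drawing positions offset by up to +/- `amount`.

    The original values are left untouched so a tooltip can still
    report the exact numbers.
    """
    rng = random.Random(seed)  # fixed seed keeps points stable between redraws
    return [v + rng.uniform(-amount, amount) for v in values]

# Three cars sharing the same x value would otherwise overlap completely.
exact_x = [2004, 2004, 2004]
drawn_x = jitter(exact_x, amount=0.3)
```

A small `amount` relative to the axis step keeps each point visually attached to its true value while still separating overlapping marks.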
In terms of viz design, this was changed from a stacked scatter plot to a small multiples visualization. This change allowed us to remove large amounts of occlusion, create a clean aesthetic look, and visualize missing data in an interesting way.
We were originally going to do a standard bar chart, but we decided on a stacked bar chart because it visualizes the sector times better: it represents the part-to-whole relationship more clearly.
Overall, I'm happy with the project. However, the amount of time we spent filling out data... it was... inordinate. So having a well sorted data set would be the main thing to address for the next personal viz project.