F1 Car Changes Full Report.


1) Overview.

1.1) Teaser Image

1.2) Summary

Our project explores Formula 1 car improvements from 2000 to 2020. Specifically, we look at how the cars changed over the years in terms of power-to-weight ratio and lap times. This is done through interactive charts, loosely styled after the Formula 1 branding guide, which visualize the change (both improvement and regression). Our innovative view is an animated head-to-head lap, based on sector times, between 2014 and 2019.

2) Data.

2.1) Description of Data.

Our dataset is threefold; we will discuss them in sections.

2.1.1) Formula 1 World Championship (1950 - 2020).

This is a massive, split, tabular dataset that contains “all information on the Formula 1 races, drivers, constructors, qualifying, circuits, lap times, pit stops, championships from 1950 [to the] 2020 season”, according to its curator. As such, we present it in multiple tables to keep it manageable. The numbers and attributes you see below are not reflective of the final data in the project, because we applied a series of pre-processing steps to these attributes.

Circuits: Circuits are where Formula 1 races are held. There are 9 attributes, but we have chosen 3.

Attribute	Type	Cardinality/Range	Note
circuitId	Categorical	76	id key of circuit, used to cross ref. other tables
circuitRef	Categorical	76	Reference name of circuit (internal code use)
name	Categorical	76	Actual name of circuit (for display purposes)

Constructors: These are the manufacturers of the cars. There are 5 attributes, of which we have chosen 3.

Attribute	Type	Cardinality/Range	Note
constructorId	Categorical	213	id key of constructor, used for cross ref.
constructorRef	Categorical	211	Reference name of constructor
name	Categorical	211	Actual name of constructor (for display purposes)

Qualifying: This is the session in which racers try to set their best times, so they can start further up the field during the actual race. There are 9 attributes, of which we have chosen 6.

Attribute	Type	Cardinality/Range	Note
raceId	Categorical	1047	id key of race, used for cross ref.
constructorId	Categorical	213	id key of constructor, used for cross ref
driverId	Categorical	851	id key of driver, used for cross ref.
q1	Quantitative	(0:53.904, 2:33.885)	Time in Qualifying Session 1
q2	Quantitative	(0:53.647, 2:12.470)	Time in Qualifying Sess. 2
q3	Quantitative	(0:53.377, 2:09.776)	Time in Qualifying Sess. 3

Lap Times: These are the lap times as recorded. Of the 6 attributes, we have chosen the following 3:

Attribute	Type	Cardinality/Range	Note
raceId	Categorical	1047	id key of race, used for cross ref
time	Quantitative	(0:50.404, 9:45.712)	Lap time in minutes
driverId	Categorical	851	id key of driver, used for cross ref.

Races: These are general race information, from the year to what circuit, to the race name. There are 8 attributes, but we have chosen the following 4:

Attribute	Type	Cardinality/Range	Note
raceId	Categorical	32	id key of race, used to cross ref. other tables
year	Ordinal	71	Year of race filtered to 2000-2020 (inclusive)
circuitId	Categorical	76	id key of circuit, used to cross ref. other tables
name	Categorical	76	Actual name of race (for display purposes)

Seasons: These are the years in which F1 takes place (F1 seasons are yearly). Of the 2 attributes, we have chosen 1:

Attribute	Type	Cardinality/Range	Note
year	Ordinal	71	Year of the F1 season

Drivers: These are the drivers who race in F1. Of the 9 attributes, we have chosen 4 of them.

Attribute	Type	Cardinality/Range	Note
driverId	Categorical	851	id key of driver, used to cross ref. other tables
driverRef	Categorical	850	Reference name of driver (internal code use)
forename	Categorical	470	A driver’s first name
surname	Categorical	792	A driver’s last name

These tables from Dataset 1 (F1 1950-2020) are completely unused:

Driver Standings: There are 7 attributes.
Pit Stops: There are 7 attributes.
Constructor Results: There are 5 attributes.
Constructor Standings: There are 7 attributes.
Results: There are 10 attributes.
Status: There are 2 attributes.

2.1.2) F1 Cars (2000 - 2020).

This dataset is a collection of every car that has raced in Formula 1 from 2000 to 2020. It is a dataset which we collected (the process is described below in pre-processing) for the purpose of this project. Each car has the following attributes:

year: The year that the car was made in. Somewhat synonymous with season as cars are almost always new for every season.

Ordinal, Cardinality: 21

constructor: The company (constructor in F1 jargon) that built the car.

Categorical, Cardinality: 46

car: The car's model name.

Categorical, Cardinality: 232

power: Horsepower.

Quantitative, Range: (600, 1100)

weight: Weight in kg.

Quantitative, Range: (600, 743)

powerToWeightRatio (DERIVED): A derived attribute in which we divide power by weight. We derive this in the pre-processing stage.

Quantitative, Range: (0.868, 1.652)
Why: Power-to-weight ratio is a helpful characteristic for two reasons: (1) it gives a general idea of how performant the car is (higher is better), and (2) it is a standard figure that the industry and car enthusiasts like to quote.

group: What constructor ‘lineage’ the selected car is from. See “why” for more explanation.

Categorical, Cardinality: 17
Why: This lets the user know the progress of the underlying team. Team names are not always consistent in racing, and they’re bought and sold surprisingly often. They often change names (even if they are not sold, title sponsors can require this), however in all of this, they are still the same team. For example, Racing Point has been under many different names (Jordan, Midland F1, Spyker, Force India, Aston Martin in 2021), but their main team has had little turnover, e.g. Otmar Szafnauer has been the team principal since 2009.

color: What color the group is.

Categorical, Cardinality: 17
Why: This is specifically used for the Mechanical Changes scatterplot points. The color is chosen and set by the group the current car belongs to, and the colors, for the most part, are chosen based on the newest car from that group. For example, if a constructor stopped racing in 2005, we would use their 2005 livery, or if a constructor changes ownership multiple times (e.g. Jordan Grand Prix to Racing Point), all the Jordan cars will be colored pink as Racing Point’s 2020 car was pink.
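As a rough illustration of how the derived fields above fit together, the following Python sketch adds powerToWeightRatio, group, and color to a single car record. The lookup tables, the function name `enrich_car`, the pink hex value, and the sample power figure are assumptions made for illustration; only the Racing Point lineage and Ferrari's #D40000 come from this report.

```python
# Hypothetical lineage table: maps a constructor name to its group.
# Only the Racing Point lineage is taken from the text above.
GROUP_BY_CONSTRUCTOR = {
    "Jordan": "Racing Point",
    "Midland F1": "Racing Point",
    "Spyker": "Racing Point",
    "Force India": "Racing Point",
    "Racing Point": "Racing Point",
    "Ferrari": "Ferrari",
}

# One color per group. Ferrari's hex is quoted in section 4.1.2;
# the pink hex is a placeholder, not the value used in the project.
COLOR_BY_GROUP = {
    "Ferrari": "#D40000",
    "Racing Point": "#F596C8",  # assumed value
}


def enrich_car(car):
    """Return a copy of `car` with the derived fields added."""
    # Constructors without a known lineage form their own group.
    group = GROUP_BY_CONSTRUCTOR.get(car["constructor"], car["constructor"])
    return {
        **car,
        "powerToWeightRatio": round(car["power"] / car["weight"], 3),
        "group": group,
        "color": COLOR_BY_GROUP.get(group, "#000000"),
    }


# The power figure below is illustrative, not scraped data.
jordan = enrich_car(
    {"year": 2001, "constructor": "Jordan", "car": "EJ11", "power": 800, "weight": 600}
)
```

Keeping the lineage in a single lookup table means a renamed team only needs one new entry to inherit its group's color.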

2.1.3) F1 Sector Times (2014, 2019).

This dataset is a very small collection of qualifying sector times. Formula 1 splits its lap timing into sectors, and each sector is a portion of the race circuit.

It is surprisingly hard to get hold of this data (due to a series of culture issues within the FIA/Formula 1). It was not readily given out for most of the series’ lifespan, and the Formula 1 website itself only has data from 2014 to the current season, which leaves over 60 years of not-so-easily accessible data.

The data is as follows:

year: Year of the F1 season

Ordinal, Cardinality: 2

circuitName: Actual name of circuit (for display purposes)

Categorical, Cardinality: 3

sector1: First sector recorded time.

Quantitative, Range: (18.28, 32.914)

sector2: Second sector recorded time.

Quantitative, Range: (33.096, 41.484)

sector3: Third sector recorded time.

Quantitative, Range: (17.318, 36.994)

Team: The constructor/team the driver raced for

Categorical, Cardinality: 2

Driver: The driver who set the times

Categorical, Cardinality: 4

2.1.4) Calculated Theoretical Lap Time

This is something we derived through a great deal of preprocessing on the main F1 1950-2020 dataset. We will discuss it in the pre-processing section 2.3.1, as it is not necessarily a dataset in and of itself, but rather a JSON file created and pre-processed purely for convenience. Nonetheless, its attributes are as follows:

Year: Ordinal, Cardinality: 21.
Calculated Theoretical Lap Time in Seconds: Quantitative, (81.81, 90.0)

2.2) Data Sources.

We have 3 main datasets. All processed data can be found on our repo: https://github.students.cs.ubc.ca/cpsc436v-2020w-t2/436v-project_c7s1b_v4x0b_w9d1b/tree/data-scraping

Datasets:

Formula 1 World Championship (1950 - 2020): https://www.kaggle.com/rohanrao/formula-1-world-championship-1950-2020 (missing entries were collected by Lydia from the F1 Wikipedia)

F1 Cars (2000 - 2020): This was collected by Shabab from Wikipedia, f1technical, etc.: https://github.students.cs.ubc.ca/cpsc436v-2020w-t2/436v-project_c7s1b_v4x0b_w9d1b/tree/data-scraping
F1 Sector Times:

2.3) Pre-Processing Pipeline.

2.3.1) Pre-Processing Dataset 1 (Formula 1 World Championship (1950-2020)).

For dataset 1, we loaded it into MongoDB and did an immense amount of filtering on the database. We performed multiple joins on the appropriate keys to create a view with the fastest lap times for each race, per season. We filtered the seasons down to 2000 to 2020, inclusive. We exported that view and then did more pre-processing work in Python to compute averages based on this dataset (described below). This brought the total number of rows down to 387. The object structure for the lap time data is as follows (only the new attributes are commented on in the table):

Object = {circuitRef, location, driverId, driverRef, forename, surname, raceId, year, constructorId}

New Attribute	Description
bestLapTime	The best lap time for a given track, for a given year, that we filtered and queried for
circuitName	Actual name of circuit, same as “name” in circuit table, but renamed
laptimeMillis	The best Lap Time converted into milliseconds
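The filtering itself was done in MongoDB. The plain-Python sketch below only illustrates the same idea on in-memory rows: join lap times to races on raceId, keep the 2000 to 2020 seasons, take the fastest lap per circuit per year, and add laptimeMillis. The function names and the input row shapes are assumptions, not the actual query.

```python
def lap_time_to_millis(text):
    """Convert an 'm:ss.sss' lap time string to integer milliseconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60_000 + round(float(seconds) * 1000)


def best_laps(lap_times, races, first_year=2000, last_year=2020):
    """Return one row per (year, circuit) holding the fastest lap."""
    races_by_id = {race["raceId"]: race for race in races}
    seasons = range(first_year, last_year + 1)
    best = {}
    for lap in lap_times:
        race = races_by_id.get(lap["raceId"])  # join on raceId
        if race is None or race["year"] not in seasons:
            continue
        key = (race["year"], race["circuitId"])
        millis = lap_time_to_millis(lap["time"])
        # Keep the row only if it beats the current best for this key.
        if key not in best or best[key]["laptimeMillis"] > millis:
            best[key] = {
                "year": race["year"],
                "circuitId": race["circuitId"],
                "raceId": lap["raceId"],
                "driverId": lap["driverId"],
                "bestLapTime": lap["time"],
                "laptimeMillis": millis,
            }
    return sorted(best.values(), key=lambda row: (row["year"], row["circuitId"]))
```

Storing the time as integer milliseconds next to the display string avoids re-parsing the string every time two laps are compared.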

While doing the initial filtering, we noticed that the dataset was missing quite a bit of race data for the 2000-2003 seasons. We initially left those entries as-is so we could have data to work with. However, as we wanted complete data, we did more pre-processing to fill in the missing entries, cross-referencing with Wikipedia. This was done in a spreadsheet, which was then exported to a Python notebook. There, we set up a calculation to see how much the cars have improved per year, based on a Reddit post whose methodology was convenient enough to apply to our data.

The percent decrease/increase was calculated from the best lap time of each circuit between two subsequent years. From the resulting table of multipliers (1.xx for an increase, i.e. bad; 0.xxx for a decrease, i.e. good), we calculated a theoretical lap time for each year, starting at a theoretical lap time of 90 seconds multiplied by the first year’s (2000) multiplier. Each subsequent multiplier is then applied to the theoretical lap time of the previous year. As such, the theoretical lap time, i.e. a lap time adjusted by the yearly average multipliers, serves as a more accurate overview for our visual that displays the averaged fastest lap time by year (LT0).

This “sub dataset” we produced is an array of the following structure:

[ [Year, Calculated Theoretical Lap Time in Seconds ], [ … ], [ ... ] ]

Where index 0 is the year, and index 1 is the calculated value.
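A minimal Python sketch of the multiplier chain described above, producing the same [year, value] array shape. It assumes the yearly multiplier is the plain mean of the per-circuit ratios and that a year with no comparable circuit (such as the first year) gets a multiplier of 1.0; the function names and the input mapping are hypothetical.

```python
from statistics import mean


def yearly_multipliers(best_ms):
    """best_ms maps (year, circuit) -> best lap in milliseconds.

    Returns {year: mean ratio of this year's best laps to the previous
    year's, over circuits that were run in both years}.
    """
    years = sorted({year for year, _ in best_ms})
    multipliers = {}
    for year in years:
        ratios = [
            millis / best_ms[(year - 1, circuit)]
            for (lap_year, circuit), millis in best_ms.items()
            if lap_year == year and (year - 1, circuit) in best_ms
        ]
        # Assumption: with nothing to compare against, keep the time unchanged.
        multipliers[year] = mean(ratios) if ratios else 1.0
    return multipliers


def theoretical_lap_times(multipliers, start_seconds=90.0):
    """Chain the multipliers, starting from a 90-second theoretical lap."""
    result = []
    current = start_seconds
    for year in sorted(multipliers):
        current *= multipliers[year]
        result.append([year, round(current, 2)])
    return result
```

Because each year's value is built on the previous year's value, a single missing or wrong best lap time shifts every later year, which is why the missing 2000-2003 entries had to be filled in first.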

Further preprocessing went on where we re-named certain very long circuit names to their alternate/more colloquial names, for example Autodromo Internazionale Enzo e Dino Ferrari was renamed to Imola — this is what most people call it and the long name is its formal/official one. For the purposes of this viz, and in general, the colloquial/actually used names are the correct choice as they are known by every fan and are even used by the F1 live commentators.

2.3.2) Pre-Processing Dataset 2 (F1 Cars (2000-2020)).

The pre-processing was done in Python. The scripts are provided in the data-preprocessing branch linked above. First we get the cars by season, and from there we get the URIs inside of the scraped data. The tables are luckily standardized, so we can then scrape each car’s specific page for the constructor, the model name, the amount of power, the weight, and the drivers who drove it. When data was missing, we first filled it with the FIA regulation figure for that year, and then searched the internet for the actual number (as Wikipedia did not have complete data, unfortunately). We add our own field called “group” to the data, which was explained in the previous section. We also apply a unique color per group; this is used in the project’s scatterplot.
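The regulation fallback could look roughly like the sketch below. The table `REGULATION_MIN_WEIGHT_KG` is a placeholder: its two values are the endpoints of the weight range reported in section 2.1.2, and the years they are attached to are an assumption. The `weightSource` field is also hypothetical, added here only to mark which entries still need the real number looked up.

```python
# Placeholder table, NOT the project's actual regulation data.
REGULATION_MIN_WEIGHT_KG = {2000: 600, 2019: 743}


def fill_missing_weight(cars, regulation=REGULATION_MIN_WEIGHT_KG):
    """Fill a missing weight with that year's regulation figure.

    Each returned record carries a hypothetical `weightSource` flag so
    that regulation-filled entries can be replaced by real figures later.
    """
    filled = []
    for car in cars:
        if car.get("weight") is not None:
            filled.append({**car, "weightSource": "scraped"})
            continue
        fallback = regulation.get(car["year"])  # None if the year is unknown
        source = "regulation" if fallback is not None else "missing"
        filled.append({**car, "weight": fallback, "weightSource": source})
    return filled
```

Flagging the source keeps the second pass (searching for the actual number) limited to the entries that were filled from the regulations.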

2.3.3) Pre-Processing Dataset 3 (F1 Sector Times).

We initially collected data from Kaggle for this, and then filtered it so that only the tracks we support are left. Furthermore, the data is merged so that a single JSON object has all 3 sectors we need for a given track/year. We manually updated the JSON output to include 2014 data from the FIA website, so that we could better illustrate the difference (mostly the improvement) of the cars when comparing 2014 and 2019.
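A small sketch of the filter-and-merge step, assuming the raw data arrives as one row per sector with `year`, `circuitName`, `sector`, and `time` fields. That input shape, the function name, and the `lapTime` field are assumptions; the three supported circuits are the ones named in section 3.3.

```python
SUPPORTED_CIRCUITS = {
    "Suzuka Circuit",
    "Circuit de Monaco",
    "Marina Bay Street Circuit",
}


def merge_sectors(rows, supported=SUPPORTED_CIRCUITS):
    """Collapse one-row-per-sector records into one object per track/year."""
    merged = {}
    for row in rows:
        if row["circuitName"] not in supported:
            continue  # drop tracks the visualization does not support
        key = (row["year"], row["circuitName"])
        entry = merged.setdefault(
            key, {"year": row["year"], "circuitName": row["circuitName"]}
        )
        entry[f"sector{row['sector']}"] = row["time"]
    # Keep only objects that have all three sectors, then derive the lap time.
    sector_keys = ("sector1", "sector2", "sector3")
    complete = [e for e in merged.values() if all(k in e for k in sector_keys)]
    for entry in complete:
        entry["lapTime"] = round(sum(entry[k] for k in sector_keys), 3)
    return complete
```

The derived `lapTime` is the sum of the three sectors, which is the part-to-whole relationship the stacked bar chart relies on.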

3) Goals and Tasks.

3.1) Power and Weight (Mechanical Changes).

The reasons users may want to look at this visualization are multifold:

Overview: they may want to view interesting looking data points -- for example, an unexpected trend in the power-to-weight ratio over the years, which could be the starting point of an investigation into the potential causes—whether it be engine changes, regulation, deregulation, or something else.
Filtered Scatterplot (Sub-Overview): they may want to see, in general, how F1 cars tend to cluster up in terms of horsepower to weight. This won’t be the same for every season, or even within-season, but clusters tend to emerge, whether it be through engine choice or regulation.
Detail view: the main task to be done here is viewing progress over time (similar to the overview, but more specific in this view). Both the power and power-to-weight ratio chart will display a constructor’s progress throughout the years. The user will be able to get a gauge of the constructor’s progress towards potentially building a faster (or slower, in some cases) car.

Because power (and related) figures are always of interest to car enthusiasts, the mechanical changes charts should be of interest, if only to see how cars end up clustering in terms of power, or how power levels go up/down. This would allow the user to further search as to why certain cars have gone down in power, or why all cars have gone down (e.g. it could be an engine supplier, or different regulations, etc.).

The overview to “sub-overview” to detail view unidirectional interactions are explained in the section below.

Note: In our write up we will mostly refer to this as Mechanical Changes, as per PM1 and PM2, to avoid confusion. We believe that Power and Weight is an apt title and description—however we realize that throwing around too much similarly-worded jargon may be potentially confusing to non-car enthusiasts.

3.2) Lap Time Progression (LT0, LT1)

There are many reasons why the user would be interested in this visualization, namely:

Overview: The overview allows users to identify trends in the averaged fastest lap times in F1 over the years. This lets us see how overall car performance is changing (better performing cars have faster lap times). For example, there may be a period of time where the lap times run contrary to the prior trend, which in turn may lead users to research the potential cause of this increase in time.

Overview: LT0, the overview, allows users to look up the fastest overall seasons through the theoretical lap times that we calculated for this chart.

Small Multiples: For users who are unfamiliar with the tracks and F1, the small multiples provide a way to browse the lap time data and learn which tracks were run in which seasons.

Small Multiples: Compare the lap time trends per circuit by viewing the trend lines in a given circuit’s chart. We have implemented a “disable small multiple points” button for them to disable the points and view the trend line only, if they so choose—hopefully this allows even easier understanding of general trends.

Small Multiples: A user can find the best/worst times, or they can select whichever year(s) they want and visually compare those.

Small Multiples: Through the use of the interpolated/dashed line, a user can see when F1 does not use a certain track, and then comes back to it.

Small Multiples: With the tool tip, the user can see which driver set the fastest lap and they can potentially see the dominant drivers. See section 6.3 for a note regarding this goal/task.

3.3) Lap Time Visual Comparison Tool (LT2)

Beyond it being pretty cool, some reasons why the user would be interested in this visualization are:

Stacked Bar chart: The stacked bar chart has developed into a view for presenting the part-to-whole relationship of sector times to lap time (i.e. the sum of sector times becomes the lap time). Due to limited data availability and time, we only display 2014 and 2019.

Animation: This is meant to provide an innovative way to visualize sector time differences, which users can compare and contrast. Again, due to time and data limitations we were only able to display a total of 2 years (2014 and 2019), and 3 circuits: Suzuka Circuit, Circuit de Monaco, and Marina Bay Street Circuit. Users can watch a “race” between 2014 and 2019 for their selected track, and do a [comparison] grounded in an animated race between the two lines, visualizing how much faster/slower the sector time and the cars are.

Animation: Users would be able to consume and enjoy the line animation as a simulation of what a head-to-head race between the two fastest cars of the respective year’s grand prix would resemble, if it were to occur in real life. Of course, this is abstracted to the sector times taken between the two years worth of data.
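One way to drive such an animation from sector times alone is sketched below. It assumes each sector occupies an equal share of the animated line and that speed is constant within a sector; neither assumption is stated in this report, and the function name and the sample sector times are illustrative.

```python
def lap_progress(sector_times, elapsed):
    """Fraction of the lap (0.0 to 1.0) completed after `elapsed` seconds.

    Assumes equal-length sectors on screen and constant speed per sector.
    """
    share = 1 / len(sector_times)
    progress = 0.0
    for duration in sector_times:
        if elapsed >= duration:
            progress += share  # this sector is fully completed
            elapsed -= duration
        else:
            progress += share * (elapsed / duration)  # partway through
            break
    return min(progress, 1.0)


# Two illustrative laps: the second is quicker in every sector.
slower_lap = [30.0, 40.0, 20.0]
faster_lap = [28.0, 38.0, 19.0]
gap_at_50s = lap_progress(faster_lap, 50.0) - lap_progress(slower_lap, 50.0)
```

Evaluating both laps at the same elapsed time on every animation frame gives the on-screen gap between the two lines.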

4) Visualization.

The Mechanical Changes section includes three major parts: The overview, sub overview, and detail view.

4.1) Power and Weight (Mechanical Changes).

4.1.1) Design Rationale.

F1 cars are some of the highest horsepower/lowest weight cars in the world, especially in the single-seater, open-cockpit category. As such, this visualization is meant to give an understanding of how the constructors have progressed their cars (over the 2000-2020 seasons).

We split this into multiple linked views because there is a plethora of data to visualize and so it can get convoluted/difficult to view in one or two views. We will explain this further below.

4.1.2) Visual Encoding Choices.

Mechanical Changes Overview

Marks: Point mark for each year’s averaged power-to-weight ratio. Line mark for connection mark between points.

Channels:

Vertical and horizontal position channels encode average power-to-weight ratio and year respectively
Hue is used to differentiate between unselected and selected points. Currently it’s black for unselected data points, and red for the selected. The line remains consistently black.

Mechanical Changes Filtered Scatterplot (aka “Sub-Overview”)

Marks: Point mark for each car’s power-to-weight ratio.

Channels:

Vertical and horizontal position channels encode weight and horsepower respectively

Due to occlusion, we apply a jitter function that alters the vertical and horizontal positions to move overlapped points further apart.
While this decreases the accuracy of data, we believe this improves the readability and distinguishability of the individual points.

Hue encodes the constructor Group points. Currently it is color-coded according to the official F1 branding… though somewhat loosely due to the project spanning 20 seasons, and many constructors using similar hues.

For example, Ferrari is displayed in rosso corsa, the Italian racing red (#D40000).
For some of the groups, we made adjustments to eliminate redundancy in colors.

Opacity channel is used to encode a user’s selection. Opaque points indicate that they are selected, while translucent points indicate that they are unselected.

Combined with jitter, this also helps readability of any occluded points on the scatterplot.
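The jitter mentioned in the channel list above can be sketched as follows. This is an illustration rather than the project's actual function: the `amount` and `seed` parameters and the `x`/`y` field names are assumptions. The sketch stores the displaced position separately so that the true power and weight remain available, e.g. for a tooltip.

```python
import random


def jitter(points, amount=2.0, seed=0):
    """Add a small random offset to each point's plotted position.

    The original `power` and `weight` values are left untouched; only
    the derived `x` and `y` plotting coordinates are displaced.
    """
    rng = random.Random(seed)  # seeded so the layout is stable across redraws
    return [
        {
            **point,
            "x": point["power"] + rng.uniform(-amount, amount),
            "y": point["weight"] + rng.uniform(-amount, amount),
        }
        for point in points
    ]
```

A fixed seed keeps the points from jumping around whenever the chart is re-rendered after a selection change.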

Mechanical Changes Detail View

Marks: Point for each car’s power over the years. Point for each car’s power-to-weight ratio over years. Line mark for connection mark between points.

Channels:

Vertical position encodes power (upper y-axis) and power-to-weight (lower y-axis), and the horizontal position encodes the year attribute.
It may be important to note that color hues encode nothing here.

4.1.3) The Views.

Initially we had decided that a static chart would be sufficient, but after some discussion we expanded this section in Milestone 2 to include 2 line charts on top of the existing scatterplot, which portray the trends of power-to-weight and power of an F1 car over the years (dual y-axis). For Milestone 3, we expanded it further to include the Average Power-to-Weight Ratio over the Years chart as a filter over the years; we did this because there was a great deal of occlusion. This new overview filters the scatterplot (which was formerly the overview).

We purport that this is a filtered, multiform, overview/detail type visualization, where the overview has data that is aggregated per year, and is selectable to show items in its detail view(s). Furthermore, upon no selection in the overview, there is nothing to view in its detail view, the “sub-overview”/filtered scatterplot.

The scatterplot and two line charts (power/power-to-weight) relation is still the same as Milestone 2. The scatterplot has all the appropriate cars and the detail view that shows specific attributes for a subset of data—the subset aspect applies to the overview-sub overview/filtered scatterplot relation as well.

Overview.

In the overview, we have a line chart that displays the trend of power-to-weight ratio, averaged by year. The Y-Axis is averaged power-to-weight ratio, while the X-Axis is years, from 2000-2020. This view helps the user in a few ways: (1) It details a trend of the ratio that gives an executive summary of the power-to-weight changes over the years, (2) acts as a filter for the scatterplot to only include a subset of years that the user is interested in exploring, which subsequently also propagates the filtering to the detailed view. By default, 2010 and 2020’s averaged power-to-weight ratios are selected (which also populates the scatterplot).
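The aggregation and filtering behind this overview can be sketched in Python as below. The function names are hypothetical. Picking the default selection as the years with the minimum and maximum average is one way to arrive at the 2010 and 2020 defaults, assuming those years are the extremes.

```python
from collections import defaultdict
from statistics import mean


def average_ratio_by_year(cars):
    """Average powerToWeightRatio per year, as plotted in the overview."""
    by_year = defaultdict(list)
    for car in cars:
        by_year[car["year"]].append(car["powerToWeightRatio"])
    return {year: mean(values) for year, values in sorted(by_year.items())}


def default_selection(averages):
    """Years with the lowest and highest averaged ratio."""
    return {min(averages, key=averages.get), max(averages, key=averages.get)}


def cars_for_years(cars, selected_years):
    """Subset of cars passed on to the filtered scatterplot."""
    return [car for car in cars if car["year"] in selected_years]
```

Since the scatterplot only ever receives the output of `cars_for_years`, an empty selection naturally yields an empty scatterplot.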

Filtered Scatterplot (“Sub Overview”).

We sometimes call this a sub overview: it is the result of the filtered selection from the (actual) overview, and yet the filtered scatterplot itself acts as an overview for what we call the detail view. In this section, we have a scatterplot that simply displays all the cars, positioned by their power and weight. The Y-Axis is weight, and the X-Axis is power. This view allows the user to see a few things: (1) when combined with the tooltip, where a car lies in terms of number figures in comparison to all the others, and (2) groups/clusters that might pop up. F1 has regulations on power/weight, but they are limits, and not every constructor can always reach those limits; many often do not. By default, and by way of the overview’s default, the scatterplot shows cars from 2010 and 2020, as they have, respectively, the minimum and maximum averaged power-to-weight. None of the points are actually selected, however, which leaves the detail view blank.

Detail View.

In the detail view, we have 2 charts that show a subset of data -- the subset is based on the selected car’s constructor. Horsepower over the Years’ (Chart 1/the upper chart) Y-Axis is that of a horsepower figure for a selected constructor’s cars, and Power-to-Weight over the Years’ (Chart 2/the lower chart) Y-Axis is that of the power-to-weight ratio for that constructor’s cars. Both charts share the same X-Axis, which is years (2000 to 2020), and the data displayed is whatever is available, as not every team has competed in every year (e.g. HAAS F1 is a very new team, starting their run in 2016). By default, the detail view is blank as nothing is selected in the filtered scatterplot.

4.1.4) Interaction Details.

The line chart acts as an overview/filter for the scatterplot and has a unidirectional interaction. Upon selecting a year (or years) from the overview line chart, the associated data for the individual cars from a user’s selection will be displayed in the scatterplot, color coded by the constructor group through hue.

Fig 1. Nothing selected at all (this is not the default, we use this to illustrate our viz).

Fig. 2. A single year is selected in the overview line chart, populating the scatterplot. Note the tooltip.

Fig. 3. More years are selected, and more points are plotted.

The scatterplot acts as a “sub-overview” for the initially empty line graph and has a unidirectional interaction. Upon selecting a car from the scatterplot, the associated line charts display changes over the years for that specific car’s constructor.

Fig. 4. We select the 2015 Mercedes. Notice that its point is now opaque, and the line charts are now populated.

If the user clicks the “Ferrari F2001” point in the scatterplot from 2002 data, then the detail view will show the horsepower and power-to-weight for all the Ferraris that have competed in F1 from 2000 to 2020. If the user then clicks the “Jordan EJ11” point in the scatterplot, the data shown in the detail view will transition to that of all of that constructor’s cars. As mentioned previously (see the ‘group’ attribute explanation), the data shown is that of the constructor group itself, since names change. This means that when we click a Racing Point car, we will also see Jordan and Midland F1 cars, because the team was bought out and the name was changed.

See the following series of images for a walk through of our example.

Fig. 5. (above) Ferrari F2001 is selected from the scatter plot, as 2002 was selected on the overview and thus its data is displayed.

Fig. 6. (below) Jordan EJ11 is clicked, and the line charts change.

Once a point is clicked in the scatterplot, the appropriate cars within that grouping are highlighted by their opaqueness, in contrast to the unselected translucent points. The appropriate cars are those explained above: a given constructor’s cars. That is to say, if we click a Ferrari, all the Ferraris will be highlighted. Because the overview filters by year, sometimes multiple years’ worth of data must be selected to find additional data points that belong to the same constructor group, since there may be few data points per year.
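The highlighting rule can be expressed as a short Python sketch. The opacity values and function names are assumptions; the rule itself (every visible car sharing the clicked car's group becomes opaque, and the detail view draws on that group's cars from all years) follows the description above.

```python
def highlight(visible_cars, selected_car, selected_opacity=1.0, unselected_opacity=0.3):
    """Assign an opacity to every car currently shown in the scatterplot."""
    group = selected_car["group"] if selected_car else None
    return [
        {
            **car,
            "opacity": selected_opacity if car["group"] == group else unselected_opacity,
        }
        for car in visible_cars
    ]


def detail_series(all_cars, selected_car):
    """All cars of the selected group, across every year, ordered by year."""
    same_group = [car for car in all_cars if car["group"] == selected_car["group"]]
    return sorted(same_group, key=lambda car: car["year"])
```

Note that `highlight` only sees the year-filtered cars, while `detail_series` reads the full dataset, which matches the one-way flow from scatterplot to detail view.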

Figs. 7 and 8 show the Ferrari constructor group selected; first with only 2015 data (thus, 1 car) and then with 2001 and 2015 data showing, and so both Ferrari points are highlighted automatically.

When switching from one data point to another, the connection line in the detail view is animated for ease of comparison and viewing pleasure.

There are tooltips on the overview, sub overview, and detail view that appear upon hovering over a data point. They display the averaged power-to-weight ratio and year for the overview, the car’s basic data (its constructor, name, year/season, etc.) in the sub overview, and the same format of basic data (constructor, name, year, weight, power, power-to-weight) in the detail view.

Fig. 9. (Left Top) shows the tool tip on the overview line chart. Fig. 10. (Right Top) shows the tooltip on the “sub overview”/filtered scatterplot. Figs 10 and 11 (bottom left and right) show tooltips for the detail view line charts.

4.2) Lap Time Progression (LT0, LT1).

4.2.1) Design Rationale.

Lap times are one of the most important metrics in any form of racing, and as F1 cars are some of the fastest circuit race cars on the planet, they can produce some ridiculously quick lap times around the circuits they race on. Beyond fast laps being very exciting to watch, lap times in general tell us the overall performance of the car. They are quite possibly the most important thing to have in this project.

We left LT0 the same from PM2, and we left it as a line chart. We chose this because it was a reasonable way to show the trend of the theoretical lap, which in turn allows users to see progression of improvement quickly. LT1 has been changed to small multiples, and this allows users to see the general lap time trends (and their highlighted years as well). Small multiples seemed to be a reasonable way to show multiple trends, for multiple race circuits.

A potentially interesting feature we included is the ability to disable the points in the small multiples, so as to only show the lines. This should allow users to see the trends even more easily.

As the axes for this section are a little different, we will explain them here as a design rationale. For LT0, we have a traditional Y axis label, but no X-axis label — this was left out for a cleaner aesthetic, but also the fact that we believed that given the contextual clues of “Yearly”, the project being based on the years 2000-2020, its x-axis is self-evident.

For LT1 (the small multiples), we have a traditional Y-axis only on the left side (i.e. the row with Monza, Interlagos, etc.); see the image on the left. The reason for this was that having every y-axis label would have (1) caused poor readability, (2) caused higher (passive) cognitive load due to the amount of stuff on screen, and (3) looked visually unappealing. For similar reasons, we display the x-axis only along the bottom-most row. We do keep the ticks on every small multiple, as a visual cue that hopefully acts as a sufficient reminder of the x- and y-axes and their values.

4.2.2) Visual Encoding Choices.

F1 Lap Time Progression From All Races (2000-2020), LT0 Overview

Marks:

Point mark for each year’s theoretical lap time.
Line mark for connection mark between the points.

Channels:

Position on a common scale - Vertical and horizontal position channels encode theoretical lap time (in minutes, the y-axis) and years (2000-2020, the x-axis) respectively.
Upon selection of point, color hue channel encodes the selected year(s) attribute. Red encodes year(s) selected, while the default black hue indicates a point is unselected. This color hue encoding is mirrored in the detail view.

Lap Times for Tracks (Small Multiples), LT1 Detail View

Marks:

Point mark for each track’s best lap times, one per year.
Line mark for connection mark between the points.

Dotted line style expresses interpolated data for certain years of a track, while solid lines are for existing data.

This is because not all tracks were run consistently over the entirety of the year range considered (2000-2020).

Channels:

Horizontal position encodes attribute of year for the given track; Vertical position encodes the best lap time in minutes.
Color hue channel encodes selected year(s). Rationale is the same as LT0 for this encoding, and for consistent association via color hue similarity. Specifically, all red points would encode selection in both LT0 and LT1.

This encoding has changed significantly from our previous milestone since LT1 was changed from a stacked scatter plot into small multiple line charts

Shape, through the use of a dashed line, encodes data that is interpolated—i.e. the track went unused in F1 from years X to years Y, thus instead of providing a solid line (which would mislead the user), we instead use a dashed line to show the fact that this data is technically missing.
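A minimal Python sketch of how the solid/dashed distinction can be derived from the years in which a track has data. The function name and the tuple output format are assumptions: each pair of neighboring data points becomes one line segment, which is solid when the years are consecutive and dashed when one or more seasons were skipped.

```python
def line_segments(years_with_data):
    """Split a track's data years into solid and dashed line segments.

    Returns a list of (start_year, end_year, style) tuples.
    """
    years = sorted(years_with_data)
    segments = []
    for start, end in zip(years, years[1:]):
        # A gap of more than one year means the track went unused in between.
        style = "solid" if end - start == 1 else "dashed"
        segments.append((start, end, style))
    return segments
```

For a track with data in 2000, 2001, 2002, 2005, and 2006, this yields a single dashed segment from 2002 to 2005, with solid segments on either side.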

4.2.3) The Views

The original idea was to have two parts to this view: the overview, which we will refer to as LT0, and the detail view, which we will call LT1. Together, they form a multiform, overview type of visualization. While the general idea was kept consistent, the type of visualization for the detail view (LT1) has been changed from a stacked scatter plot to small multiples of line charts to reduce excessive occlusion.

Overview.

LT0 is our overview. It’s a line chart that displays averaged fastest lap times over the years. The Y-Axis is averaged best theoretical lap time in minutes, and the X-Axis is years from 2000 to 2020. LT0 serves multiple purposes: (1) users will be able to view how the averaged fastest lap times have changed over the years using theoretical time from derived multipliers, and (2) this chart acts as an overview for the small multiples line charts seen below in LT1. For PM3, the Y-Axis has been changed to theoretical laps for a more accurate multiplier-based calculation of the yearly lap time increase/decrease (see the section on the Pre-Processing Pipeline for how the calculations were performed).

Detail View.

In the detail view, we have LT1, a small multiples line chart and a detail view/disaggregation of LT0. The Y-Axis is the fastest qualifying lap time (in minutes), and the X-Axis is the years inspected (2000-2020). Each point on each line chart represents the fastest qualifying lap time for a year, and the line charts are separated into one per F1 circuit. LT1 allows users to: (1) swiftly compare and contrast fastest lap times over the years by track, (2) see the specific active timelines of different circuits, as well as trends across tracks, and (3) see gaps in track usage through the dashed lines for interpolated data.

4.2.4) Interaction Details.

There is a bidirectional interaction between LT0 and LT1. Upon clicking a year point in LT0, the point for that year is highlighted in red on every track in LT1.

Fig. 12. (above) The view for when nothing is selected.

Fig. 13. (below) shows the year 2012 selected in the LT0 overview. This highlighted change is reflected across all 2012 points in the LT1 detail view.

A user can also click a point that is not currently selected on the small multiples line charts (LT1) and see the corresponding year be highlighted in LT0 and other tracks in LT1.

Fig. 14. (above) When an unselected year (2000) is clicked on in the A1-Ring circuit in LT1, the year’s point is highlighted across all other tracks as well as the overview (LT0).

Users may also deselect an already highlighted year from either view: deselecting a point in the small multiples (LT1) deselects the corresponding year in the LT0 overview, and vice versa.

All highlights use the color red so that the selected years stay distinct. Upon re-clicking either the highlighted point in LT0 or any of the highlighted points in LT1, the hue is reset to black, which signifies that the year is unselected.
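The linked behaviour described above can be modelled as a single set of selected years that both views read from and write to. The sketch below is an assumption about how such shared state could look, not the project's actual implementation; toggleYear and pointColor are hypothetical names.

```javascript
// Hypothetical shared selection state: one set of years used by LT0 and LT1.
// Clicking a point in either view toggles that point's year in the set.
function toggleYear(selectedYears, year) {
  const next = new Set(selectedYears);
  if (next.has(year)) {
    next.delete(year);
  } else {
    next.add(year);
  }
  return next;
}

// Both views colour a point from the same set, so they always agree.
function pointColor(selectedYears, year) {
  return selectedYears.has(year) ? 'red' : 'black';
}

let selected = new Set();
selected = toggleYear(selected, 2012); // click 2012 in the LT0 overview
selected = toggleYear(selected, 2000); // click 2000 on any LT1 circuit
selected = toggleYear(selected, 2012); // re-click 2012 in either view
```

Keeping the state keyed by year rather than by individual point is what makes a click on one circuit propagate to all other circuits and to the overview.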

Fig. 15. Clicking on year 2000 when it is already highlighted in LT0 deselects the year from LT0 and all of LT1. Note the change of hue back to black upon deselection.

Fig. 16. Clicking on the 2012 lap time data in Silverstone Circuit results in deselection of 2012 in all other circuits, as well as deselection in the overview.

For example, selecting 2003 in LT0 will highlight the 2003 points across all tracks in LT1, and deselecting the 2003 point on the A1-Ring circuit will deselect 2003 in all the small multiples line charts.

Fig. 17. (above) Selecting 2003 in the LT0 overview highlights all 2003 points in the LT1 small multiples detail view.

Fig. 18. (below) Deselecting from A1-Ring results in deselecting all highlighted points, in both the overview and detail view.

If the user wanted to compare the best time at Suzuka Circuit in 2003 to the best time at Suzuka in 2019, they are able to click on both years and highlight the points of interest in the Suzuka Circuit line chart. This also highlights the years 2003 and 2019 across all circuits. This way, if users wished to compare the Suzuka times to, say, Circuit de Monaco, no additional highlighting is necessary.

Fig. 19. (above) A comparison of Suzuka Circuit times with emphasis on 2003 and 2019. Without any further selection, this comparison can be expanded to other highlighted circuits, such as Circuit de Monaco.

An alternative approach without clicking would be to hunt the points down in LT1 by hovering over the point and looking at the tooltip.

Figs. 20 and 21 (above) The tooltip information for 2003 and 2019 lap times respectively on Suzuka Circuit.

Upon hovering over a data point, LT0 currently displays the average theoretical lap time for the specified year in seconds through the tooltip. LT1’s tooltip currently shows the year, the lap time at a specific track during that year in (MM:SS.s) time format, and the driver who set that lap time on the specified track.
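For reference, a lap time stored in seconds can be converted to the (MM:SS.s) tooltip format along the following lines. This is a small sketch with a hypothetical function name, assuming zero-padded minutes and seconds and one decimal place.

```javascript
// Format a lap time given in seconds as MM:SS.s (one decimal place).
// Rounding is done on tenths of a second first, so that a value such as
// 59.96 s carries over cleanly into the next minute.
function formatLapTime(totalSeconds) {
  const tenths = Math.round(totalSeconds * 10);
  const minutes = Math.floor(tenths / 600);
  const seconds = Math.floor((tenths % 600) / 10);
  const tenth = tenths % 10;
  const mm = String(minutes).padStart(2, '0');
  const ss = String(seconds).padStart(2, '0');
  return `${mm}:${ss}.${tenth}`;
}

console.log(formatLapTime(90)); // prints "01:30.0"
```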

Fig. 22. (above) Tooltip for the overview, LT0. Includes year and theoretical lap time.

Fig. 23. (below) Tooltip for the detail view, LT1. Includes year, best lap time and driver information.

In addition, you can disable points, or clear selection, or reset with the corresponding buttons. Reset displays the default chart, which has 2014 and 2020 arbitrarily highlighted.

Fig. 24. (above) When the disable small multiple points button is pressed, all points in the small multiples are disabled to allow a toggled view for trend searching. Note the button label has changed to allow for enabling the points.

Fig. 25. (above) Clear selection button removes all selected/highlighted points from both views.

Fig. 26. When the reset button is pressed, the default view for LT0 and LT1 is displayed, with 2014 and 2020 highlighted.

4.3) Lap Time Visual Comparison Tool (LT2).

4.3.1) Design Rationale.

F1 cars are constantly evolving. In order to show the lap time difference over the years, we look at Suzuka Circuit, Circuit de Monaco, and Marina Bay Street Circuit and put the best qualifying lap sector times of 2014 and 2019 into a bar chart. We also (and more interestingly) animate said sector times in a head-to-head race. This provides a

4.3.2) Visual Encoding Choices.

Stacked Barchart

Marks: A glyph that is a composite of line marks that encodes lap times for each year/sector

Channels:

Position on common scale for each glyph
1D length encodes lap time
Hue is used to differentiate between sectors
Spatial region

Aligned: the overall glyph, and the sector 1 bar component of the glyph
Unaligned: the other bar components (sector 2, sector 3)

Animation View

Marks:

Line mark encodes the geometric shape of the circuit
Line mark that is used for the overall lap time

Proportion of the line mark encodes the progress of the race

Channels:

Motion encodes the sector time (and thus as a result, the overall lap time)
The different year’s lap/sector time are doubly encoded by:

Spatial position channel
Color hue channel

Black encodes the figure of the map
Other hues encode the different sectors for a given lap

4.3.3) The Views

Our original plan was to support selection of the year (from 2000 to 2020) and animate the whole lap in one constant animation. However, the line would then move at a constant speed and would not show how the pace changes over different portions of the lap. Sector times address this, but we were not able to obtain sector data for all 20 years, so we decided to show only the sector times of 2014 and 2019.

The view is that of a multiform one—both the stacked barchart and the animated view use the same sector time data, but they visualize them in wildly different manners.

Stacked Barchart.

Since we have limited data on sector times, the bar chart displays the best sector times for each track in 2014 and 2019. The Y-Axis is the best sector time, while the X-Axis is the years 2014 and 2019. This view gives the user a better understanding of: (1) how the best lap time changed between the two years on the same track, and (2) how each sector changed.
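A stacked bar for one lap amounts to placing each sector on top of the previous ones. The sketch below computes the start and end of each sector segment from a list of sector times; stackSectors is a hypothetical name and the sector times are made-up values in seconds, used only to show the arithmetic.

```javascript
// Turn a list of sector times into stacked segments.
// Each segment starts where the previous one ended, so the end of the last
// segment equals the overall lap time.
function stackSectors(sectorTimes) {
  let start = 0;
  return sectorTimes.map((time, index) => {
    const segment = { sector: index + 1, start, end: start + time };
    start += time;
    return segment;
  });
}

// Made-up sector times (seconds) for one lap.
const stacked = stackSectors([30, 40, 20]);
const lapTime = stacked[stacked.length - 1].end;
```

Only the first segment shares a common baseline with the other bar, which is why the sector 1 component and the overall bar are the easiest to compare across years.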

Animation.

In the animation, we show all three sectors side by side on a track for both 2014 and 2019. The user gets a real-time feeling for how much faster a 2019 F1 car is compared to a 2014 car, sector by sector.

4.3.4) Interaction Details.

The radio buttons serve as the overall filter by track, which determines the rest of the visualizations displayed in this section.

Fig. 27. (above) Default state for the visualization. Radio button for Suzuka is selected.

While the default state for the animation view automatically selects Suzuka Circuit, upon switching to a different radio button, both the stacked bar chart and the track animation SVG are updated to match the selection. The stacked bar chart provides an overall comparison of the lap times between 2014 and 2019, further separated into sector times.

Fig. 28. (above) Switching selection from Suzuka Circuit to Circuit de Monaco using the radio buttons. The stacked bar chart and circuit shape are updated accordingly for the new circuit.

Upon clicking the start race button, two lines are drawn side by side for the two years’ lap time data, at speeds that correspond to the raw sector time data.

Fig. 29. (above) Start race button is pressed. Animation starts with the comparison of the two years’ sector data, displayed side by side in a racing format. Refer to the legend for distinguishing between the years and sector times.

The line animations are drawn on the geometric circuit shape, with the sectors differentiated by both their position on the circuit map and their color hue. After all 3 sectors are drawn (the overall lap time is the sum of the 3 sector times), the circuit race between the two years is completed, and the lines at full progress remain on the track.

Fig. 30. (above) Circuit animation has reached completion. Geometric shape fully colored.

Selecting a new track will reset the track with no animation playing, while pressing the start race button again will restart the animation.

We have added tooltips to the stacked barcharts, which include time information such as the specific sector time, as well as the overall elapsed lap time for the specified track during that year.

Fig. 31. (above) Tooltip for stacked bar chart. Includes sector time and lap time information.

5) Credits.

Credits and sources will be listed by visualization, and then general.

5.1) Power and Weight (Mechanical Changes)

either https://github.com/UBC-InfoVis/2021-436V-examples/tree/master/d3-interactive-scatter-plot or https://github.com/UBC-InfoVis/2021-436V-examples/tree/master/d3-static-scatter-plot

The structure changed quite a bit—stuff like initData was added for control flow and readability, we also added a jitter function.

https://github.com/UBC-InfoVis/2021-436V-examples/tree/master/d3-interactive-line-chart

Line chart reference: The general structure is still probably familiar, at the least—but specific-to-us implementation details like transitions, variables for connectivity were added.

https://stackoverflow.com/questions/40458363/how-to-add-jitter-to-avoid-overplot-in-d3

This post helped fix the jitter function implementation. Specifically, the random offset is now added to the value (“+ Math.random”); before that it was multiplied (“value * Math.random”), which was very off.
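The difference between the two forms can be seen in a small sketch. The version below adds a random offset centred on zero; the centring, the width parameter, and the injectable random source are assumptions added for illustration and testing, not details of the project's jitter function.

```javascript
// Additive jitter: shift the value by a small random offset in
// [-width / 2, width / 2). The random source is injectable for testing.
function jitter(value, width, random = Math.random) {
  return value + (random() - 0.5) * width;
}

// For contrast, the multiplicative form scales the value by a factor in
// [0, 1), so a point can land anywhere between 0 and its true position.
function brokenJitter(value, random = Math.random) {
  return value * random();
}
```

With the additive form, a point at 700 with a width of 10 stays within 695 to 705, whereas the multiplicative form could place it anywhere from 0 up to 700.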

5.2) LT0/LT1

https://github.com/UBC-InfoVis/2021-436V-examples/tree/master/d3-interactive-line-chart

Referred to this class example for setting up initial LT0

https://github.com/UBC-InfoVis/2021-436V-examples/tree/master/d3-canvas-scatter-plot

Referenced for LT1 initial design

https://stackoverflow.com/questions/66230918/what-is-the-best-way-to-create-small-multiples-in-d3-js-v6

Made small multiples originally using this, major changes were made:

Rewrote it to follow course reusable chart concept
Some modification with regard to data appending (switch to join, appending stuff in the join)
Changed the scale functions
Basically changed most of it and kept some concepts like how they create new SVG objects for each chart group, and calling the render function in a forEach.

https://www.reddit.com/r/formula1/comments/cs1txp/f1_lap_times_by_year_from_20002019/

Calculated average theoretical lap time each year based off this post, minor changes:

We didn’t fully comprehend what they meant by the square root of the two year time difference, so we did not incorporate that portion into our calculations
We dropped tracks that had multiple NaN values or were not used often enough to provide accurate data
Kept most of the rest of the calculations consistent with post, like how we kept the arbitrary 90s as the starting point for theoretical lap time

5.3) LT2

http://zevross.com/blog/2019/08/20/load-external-svgs-with-d3-v5/

How to load svg file in d3.

http://bl.ocks.org/methodofaction/4063326

How to animate a svg path.

https://developer.mozilla.org/en-US/docs/Web/SVG/Tutorial/Paths

SVG Path explained:

Track maps from Wikipedia:

5.4) General

Formula 1 font obtained from reddit.
Titillium Web from Google.
Scrolling behaviour obtained from stackoverflow, scrolling button calculations from a different stackoverflow post

No major changes (that we can remember).

6) Reflection.

6.1) Project evolution and growth

Our project evolved quite a bit from milestone 1. We added more charts, more functionality and flip-flopped between LT2 implementations. Overall, there was more scope creep than there should have been.

For example, our initial plan did not account for how long the data processing portion would take.

6.2) How have your visualization goals changed?

Each visualization has gone through changes, some minor and some drastic.

6.2.1) Mechanical Changes.

This section of the project was expanded to include one more chart. We went from a scatterplot of the cars and 2 line charts, to a line chart of overall power-to-weight ratios per year that acts as a filter for the scatterplot of cars (which now displays based on selected years).

The scatterplot has also been given a jitter. Both the filtering chart and the jitter were implemented because we have a lot of overplotting: due to the nature of our data for this part of the project (race cars whose specifications can be both secretive and at the regulation maximum), many points lie on top of each other. Together, the two changes attempt to solve scatterplot occlusion by putting fewer points onto the scatterplot and by shifting each point’s X and Y coordinates a tiny bit for the sake of readability.

The rest of the mechanical changes view has not changed.

6.2.2) LT0.

This section’s y-axis calculations were altered significantly. Initially, we had a placeholder method for calculating the average lap time per year, which was then replaced by a theoretical lap time in minutes based on yearly multipliers derived in the data processing step. Overall though, the choice of visualization did not change: it remained a line chart with the same intended axes.

6.2.3) LT1.

This visualization was changed from a stacked scatter plot with line connections into small multiples, which removed the occlusion issues we had with the stacked scatterplot (see left image). The result is a set of small multiples that has its labels on the outside left/bottom and is overall much easier to read (see bottom left image). After creating the small multiples, we figured that it would be a good idea to let people know that some data is actually “missing”, since not every track is used in every season of F1. We did this with a dashed line and let people know that it is interpolated data (see bottom right image).

6.2.4) LT2.

Our visualization goals reverted to our initial lofty intentions of having sector times animated. We were mistakenly under the impression that we needed a plethora of tracks to be available, not understanding that this was effectively a (polished) tech demo. This misunderstanding caused us to change our goal of comparing sector times to comparing overall lap times — a less interesting visualization. In the latter half of Milestone 3, the teaching staff alerted us to this fact and we got to work changing the static speed lap viewer to a sector time based lap viewer.

6.3) How have your technical goals changed?

6.3.1) Mechanical Changes.

For mechanical changes, our technical goal developed alongside the visualization, due to issues that arose in the visualization. After consulting the teaching staff and discussing within our team, we ended up expanding our design to include a filter for the old overview, which became a “sub overview”. That is, the scatterplot is still the overview for the power progress/power-to-weight progression line charts as it was before, but it now has its own overview.

This change helped remove some occlusion in the old overview’s (current “sub overview”) scatterplot, where overlap made points very difficult to distinguish in some parts of the graph. This was a decently quick transition, as we relied heavily on how the existing detail view and old overview were set up to instantiate a new line chart overview by year. As expected, this addition did fix our initial issue with occlusion for the car data.

6.3.2) LT0.

In the grand scheme of things, the LT0 visualization has not changed much from our initial concept as an averaged value chart that serves as an overview to the more detailed dive into the specific lap time across different tracks, as covered in LT1.

6.3.3) LT1.

For LT1, our technical goals evolved along with the visualization—or rather, because of the visualization. Due to the occlusion mentioned in the previous section, we discussed some options with the team, conferred with the teaching staff and decided that small multiples were the way to go. This was a learning experience in both getting the small multiples working (though thankfully some good reference did help), and, then afterwards, getting the circles to draw on the line—this had no reference; after speaking with Steve, he correctly (and very quickly) pointed us in the direction of a data-pass issue which did solve our issue.

6.3.4) LT2.

For LT2, we replaced the standard barchart with a stacked barchart to show the sector times, as this gives a better part-to-whole representation. Since we limit the number of tracks to 3, we use radio buttons for track selection instead of a drop-down menu. We also narrowed the years down to 2014 and 2019 only, so the barchart interaction where users could choose between years is disabled.

6.4) Realism of Proposal.

Our proposal was realistic — we implemented our charts as described (for better or for worse). Our changes served to further improve legibility/conform to branding/apply foundational theory.

6.5) Was there anything you wanted to implement that you ultimately couldn't figure out how to do?

This is difficult to measure — we figured out what we wanted to do. However, this does not mean that there weren’t some initial hacks being used (such as a counter to make sure axes only appended N amount of times), before realizing “better” ways to do it.

One thing that we did not have the time to do was connect the dataset of cars and lap times. The preprocessing was getting to be too much—it would have been a nice addition for LT1. However, the overall point of LT1 was viewing the general trend and so the viz keeps its usefulness in that aspect, but it would be a great supplement to be able to see which manufacturers are dominant, or have been setting the fastest laps. This idea alone must be its own visualization however, as there is a lot to do for it, and a lot we can do for it.

6.6) If you were to make the project again from scratch (or any other interactive visualization), what would you do differently?

The data preprocessing was a nightmare for what we picked, so the most pertinent thing that we would do differently is pick a good dataset. Everything else went well and overall it was a good learning experience.

7) Team Management

7.1) Time Estimation.

We did not give estimates per task; instead, we opted for overall hours worked. We will provide estimated hours per task, then tally them up, and compare them to our estimated weekly hours. New items are indicated by (new) on the item listing.

Lydia	Made / setup blank project; Setup eslint with airbnb rules; Made LT0 starter; Pre-processing data; Write up; Git repo maintenance, merging; Massive amount of data pre-processing, collected missing circuit entries (new); Calculating average lap time with multiplier (new); Mechanical change updates (new); Scrolling buttons (new); Write up (new)
	PM1 Estimate for Lydia: {16 hr}. Actual Total from PM3: {30+ hrs}.
Robert	Sector data pre-processing; Svg file searching; Svg file editing so multiple tracks/sectors and background are shown; Animation event handling/cancelling; Radio button (new); Cleanup tool tip; Button styling; Write up
	PM1 Estimate for Robert: {16 hr}. Actual Total from PM3: {28}.
Shabab	viz expansion idea (new); Styling; All of git merging nightmare; Write up; Fine tuning programming aspect; mc update, lt0/lt1 update; lt1 small multiple (new); bossed people around (not new); lt2 sector time barchart, lt2 data sleuthing for sector time (new)
	PM1 Estimate for Shabab: {16 hr}. Actual Total from PM3: {30+ hours}.

So overall our tasks shifted around from person to person, but we got a lot of what we said done, when comparing it to our PM3 work schedules, and we are on track for the most part.

The data had been more of a nightmare than we expected, we spent an inordinate amount of time on it… which was… unfortunate, to say the least.

Also, we are evidently not good at estimating times required, as consistent with our previous milestone.