Baby Names Animated Graph - arrays, objects and CSV files in p5.js
This sketch demonstrates my knowledge of arrays, objects and using CSV files. I always wanted to incorporate user textual input to make something more useful than the usual "fun" things made thus far. The sketch takes in a name and two years from the user's input, and then makes a bar graph showing the number of babies given that name in each year from the first given year through the second, inclusive. The data is gathered from a number of CSV files, each file listing all of the baby names for a single year, the gender of the babies and the total number of babies with that name. For added flair, I wanted the graph to update dynamically from user input and animate the bars, so that they grow instead of blipping into existence. The sketch accomplishes the following:
Clearly demonstrates the use of arrays and CSV files.
Shows how a JavaScript object can be used to effectively control the timing of animation.
Demonstrates how a single function for animation can be applied to a whole array of objects.
Demonstrates drawing text and geometry to the screen based on changes in screen dimensions.
Uses text input fields and detects events from them.
Process:
I chose this data set because it is spread across multiple files, which encouraged me to learn a handy skill: most work involving CSV files involves more than one file.
I first made sure I could extract the data I needed from the CSV files by writing a for loop that stores the data from a given number of files into a table structure. The algorithm loads data from the files for every year between two hard-coded years. These hard-coded years would later be compared against user input to avoid out-of-range lookups (a user wanting data from a file that was never loaded into a table).
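A minimal sketch of that loading loop, under stated assumptions: the helper name `loadYearTables` and its `loadOne` callback are hypothetical and not taken from the sketch. The loader is passed in so the loop itself does not depend on p5.js.

```javascript
// Load one table per year, from firstYear through lastYear inclusive.
// loadOne is a hypothetical callback that returns the table for a given year.
function loadYearTables(firstYear, lastYear, loadOne) {
  const tables = [];
  for (let year = firstYear; year <= lastYear; year++) {
    tables.push(loadOne(year));
  }
  return tables;
}
```

In a p5.js 1.x `preload()`, the callback could be something like `(year) => loadTable("data/" + year + ".csv", "csv")`, where the path is purely illustrative.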
I wrote an algorithm that scales the data so that it fits onto the screen. Because the numbers are so large, drawing the data straight to the screen causes the points to be graphed outside of the canvas. The algorithm sets the highest value to the height of the graph, then proportions the other points accordingly. Later this height was changed to incorporate a buffer between the graph and the edges of the window.
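The scaling step could look roughly like this, assuming the counts for the selected years are already in a plain array. The names `scaleToHeight`, `counts` and `graphHeight` are hypothetical.

```javascript
// Scale raw counts so that the largest one maps to graphHeight pixels.
// Hypothetical helper; the names are illustrative only.
function scaleToHeight(counts, graphHeight) {
  const maxCount = Math.max(...counts, 0);
  // Avoid dividing by zero when every count is 0 (or the array is empty).
  if (maxCount === 0) return counts.map(() => 0);
  return counts.map((count) => (count / maxCount) * graphHeight);
}
```

For example, counts of 50, 100 and 25 on a 200-pixel graph become heights of 100, 200 and 50.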
The buffers are all scaled in relation to the bottom buffer of the graph, which is marked by a black line when the graph is drawn. This line is set to (7/8) of the window height. The top of the graph is then set to the size of the bottom buffer - (6/7) of the window height. The 7 and 8 from the first quantity were eventually made into variables, so I could tinker with the buffer size until it looked right. A buffer of (1/8) of the window height looked the best, so it stuck.
The side margins were calculated in a similar fashion, except it was (1/8) window width.
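A rough sketch of the margin math, assuming the top margin is the same size as the bottom buffer. The function name `graphBounds` and the `divisor` parameter are hypothetical.

```javascript
// Compute the drawable graph area from the window size.
// divisor = 8 gives margins of 1/8 of each dimension, as described above.
function graphBounds(w, h, divisor = 8) {
  const marginX = w / divisor;
  const marginY = h / divisor;
  return {
    left: marginX,
    right: w - marginX,
    top: marginY,          // assumption: top margin equals the bottom buffer
    bottom: h - marginY,   // the baseline, at 7/8 of the window height
    width: w - 2 * marginX,
    height: h - 2 * marginY,
  };
}
```

Calling this again whenever the window is resized keeps the graph proportional to the new dimensions.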
Originally the data for a user-selected name was represented by three line graphs: one for the total number of babies with the name, and one each for the number of female and the number of male babies with the name. Eventually this was changed to a bar graph. For each year, a bar for the total quantity is drawn in the color of the smaller quantity, and a bar for the larger quantity, in its own color, is drawn in front of the "total" bar. What remains showing from the "total" bar represents the smaller quantity for the year. In case that doesn't make any sense: this makes the smaller quantity's bar look like it's sitting on top of the larger quantity's bar, making the combined bar represent the total. Three bars for the price of two.
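The two-bar trick can be sketched as follows. The function name `stackedBar` and the `"female"`/`"male"` labels are hypothetical stand-ins for whatever the sketch uses to pick colors.

```javascript
// Describe the two bars drawn for one year.
// back:  the total, colored as the smaller group (only its top part stays visible).
// front: the larger group's count, drawn over the bottom of the back bar.
function stackedBar(femaleCount, maleCount) {
  const femaleIsLarger = femaleCount >= maleCount;
  return {
    back: {
      value: femaleCount + maleCount,
      group: femaleIsLarger ? "male" : "female",
    },
    front: {
      value: Math.max(femaleCount, maleCount),
      group: femaleIsLarger ? "female" : "male",
    },
  };
}
```

The visible part of the back bar is `back.value - front.value`, which is exactly the smaller group's count.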
I made two states for the sketch: one for "empty" user-input fields and another for populated user-input fields ready to be used for graph generation. The first state also checks to see if the user-fields contain valid data, so that the sketch doesn't try to draw the graph with bad parameters.
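A minimal sketch of the validity check, assuming that "valid" means a non-empty name and two whole-number years inside the loaded range, with the first year not after the second. The function name `readInputs` and its parameters are hypothetical.

```javascript
// Return the parsed inputs, or null if the graph should not be drawn yet.
// firstLoadedYear / lastLoadedYear are the hard-coded bounds of the loaded files.
function readInputs(nameText, startText, endText, firstLoadedYear, lastLoadedYear) {
  const name = nameText.trim();
  const start = Number(startText);
  const end = Number(endText);
  const valid =
    name.length > 0 &&
    startText.trim() !== "" &&
    endText.trim() !== "" &&
    Number.isInteger(start) &&
    Number.isInteger(end) &&
    start >= firstLoadedYear &&
    end <= lastLoadedYear &&
    start <= end;
  return valid ? { name, start, end } : null;
}
```

A `null` result corresponds to the first ("empty") state; anything else means the fields are ready for graph generation.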
To prevent too much CPU use, the graph is only drawn when the input fields detect new input. This was extra helpful when I decided to implement bar animation, so that the animation would only occur when new user-input is received and the graph needs to update.
The text is managed with JavaScript objects. Each object contains the text and position information for all text drawn to the screen. This keeps the code nice and organized, but unfortunately it doesn't do anything practical.
Before a new graph is drawn, the sketch scans through the tables populated during the preload stage. Each table holds all baby names for a single year. Each entry has a name, a gender and the number of babies of that gender who received the name (most names appear for both genders, and each gender has its own entry for the name, thus its own quantity). Whenever the scanner finds the target name, it saves the number of babies with that name to an array corresponding to the gender of the entry. When it has found the name for both genders (or has finished scanning a table), it adds the quantities for the two genders and saves the sum to the "totals" array.
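The scan could be sketched like this, assuming each year's table has already been turned into a plain array of `{ name, gender, count }` rows and that genders are coded as `"F"` and `"M"`. Those field names and codes, and the function name `collectCounts`, are assumptions for illustration.

```javascript
// Collect per-year counts for one name from an array of per-year row arrays.
function collectCounts(tablesByYear, targetName) {
  const female = [];
  const male = [];
  const totals = [];
  for (const rows of tablesByYear) {
    let f = null;
    let m = null;
    for (const row of rows) {
      if (row.name !== targetName) continue;
      if (row.gender === "F") f = row.count;
      else if (row.gender === "M") m = row.count;
      // Stop early once both genders have been found for this year.
      if (f !== null && m !== null) break;
    }
    // A gender that never appears for this name counts as 0.
    female.push(f ?? 0);
    male.push(m ?? 0);
    totals.push((f ?? 0) + (m ?? 0));
  }
  return { female, male, totals };
}
```

Each of the three arrays has one entry per year, in the same order as the tables.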
An example of a practical object is my Timer object. I got tired of figuring out real-world timing for animation, so I decided to make an object with methods that do it for me. Basically, when the object is created, it stores how many milliseconds the program had been running up to that point. When I want to know the current elapsed time in milliseconds, it takes the total program running time and subtracts the previously stored millisecond value from it to give me how much time has elapsed since the object was created. Another method resets this elapsed time by updating the stored millisecond value to the current millisecond value, effectively making the current() method return "0." Now I never have to worry about keeping track of current and total elapsed time!
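A minimal sketch of such a timer. Only `current()` is named above; the class layout, the `reset()` name and the injectable `now` clock are assumptions made so the example runs outside of p5.js.

```javascript
// Stopwatch-style timer. `now` is any function returning milliseconds;
// in a p5.js sketch it could be () => millis().
class Timer {
  constructor(now = () => performance.now()) {
    this.now = now;
    this.startMillis = this.now(); // running time when the timer was created
  }

  // Milliseconds elapsed since creation or the last reset.
  current() {
    return this.now() - this.startMillis;
  }

  // Restart the count, so current() reads 0 right afterwards.
  reset() {
    this.startMillis = this.now();
  }
}
```

Passing the clock in also makes the timer easy to check with a fake clock instead of waiting on real time.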
This timer object is critical for the animation function. The animation function animates a single point drawn on the screen, "tweening" it between two keyframes. Basically, what the animation function does is:
Take in a keyframe to track what "stage" the animation is in.
Also take in the total number of keyframes, the minimum point value (where the point should be at the beginning of the animation), and the maximum point value (where the point should be at the end of the animation).
The animation function returns where the point should be depending on the given keyframe.
Returns keyframe*((maxValue-minValue)/maxKeyframes) + minValue. In English: the difference between the min and max points is divided by how many keyframes there are, giving how much "distance" should be traveled with each keyframe. Multiply by the current keyframe count, and you get how much "distance" was covered by the point by that time. Add that value to the beginning point, and you get the current position of the point.
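The steps above translate almost directly into code. The function name `tweenLinear` is hypothetical; the parameter names follow the formula.

```javascript
// Position of a point at a given keyframe, moving at constant speed
// from minValue (keyframe 0) to maxValue (keyframe maxKeyframes).
function tweenLinear(keyframe, maxKeyframes, minValue, maxValue) {
  return keyframe * ((maxValue - minValue) / maxKeyframes) + minValue;
}
```

Applying the same function to every bar in an array, with each bar's own min and max, animates the whole graph with one piece of code.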
This gives the **linear** animation of the point. If I wanted something smoother and more "natural," I could use this function: ((1 - cos((keyframe/maxKeyframes)*PI))/2)*(maxValue-minValue) + minValue
With this, the rate of change starts at 0, increases to 1, and returns to 0 (sin(x)). To translate this to position data, I took the integral of sin((keyframe/maxKeyframes)*PI), which is -cos((keyframe/maxKeyframes)*PI) up to a constant factor, and added 1 to it, so that the position would start at 0 + minValue instead of minValue - (maxValue-minValue), which would be very wrong. I then divided by two to make sure the upper limit was correct: I want the max value to be maxValue, NOT maxValue*2 - minValue!
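A sketch of the eased variant, reconstructed from the description above (negative cosine, plus 1, divided by two). The function name `tweenEased` is hypothetical.

```javascript
// Position of a point at a given keyframe with a slow start and slow finish.
// progress runs from 0 (keyframe 0) to 1 (keyframe maxKeyframes).
function tweenEased(keyframe, maxKeyframes, minValue, maxValue) {
  const progress = (1 - Math.cos((keyframe / maxKeyframes) * Math.PI)) / 2;
  return progress * (maxValue - minValue) + minValue;
}
```

It takes the same arguments as the linear version, so either function could be swapped in without touching the rest of the animation code.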
The next keyframe is determined by elapsed time, both to make the animation throttle to match hardware speed and to let me time animations in human units, like "seconds," and not "frames." The keyframe is updated with this function: Return currentTime * (maxKeyframes/endTime)
maxKeyframes/endTime is the number of keyframes per millisecond. Multiplying it by the current number of milliseconds gives the current keyframe. To make sure it doesn't animate for longer than anticipated, if currentTime > endTime, keyframe is simply set to maxKeyframes.
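The time-to-keyframe step, including the clamp, can be sketched as below. The function name `keyframeAt` is hypothetical; both times are in milliseconds.

```javascript
// Convert elapsed time into a keyframe, never going past maxKeyframes.
function keyframeAt(currentTime, endTime, maxKeyframes) {
  if (currentTime > endTime) return maxKeyframes;
  return currentTime * (maxKeyframes / endTime);
}
```

Fed with the elapsed time from the timer, the result can be passed straight to the animation function as its keyframe.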