CSSE333 Final Report
Final Report
Baseball Statistics Manager
CSSE333
S2G4
May 17, 2019
Derek Grayless, Eric Kirby, Cherise McMahon
Table of Contents
Executive Summary
Introduction
Problem Description
Solution Description
Frontend
Backend
Features
Key Challenges
Database Design
Security Measures
Integrity Constraints
Stored Procedures
Design Analysis
Appendix
Resources
Relational Schema
Position Player *1
Pitcher *2
Entity Relationship Diagram
Justification of Entity Relationship Diagram
Index
Glossary
Executive Summary
This document describes the goals, background, and implementation of the Baseball Statistics Manager. It includes a brief overview of the project, a summary of the problem, the Entity Relationship Diagram associated with it, and more. The document serves as the skeleton of the project, allowing its execution to be tracked against benchmarks.
Introduction
This document serves as the final report for the Baseball Statistics Manager database. The project team consisted of Derek Grayless, Eric Kirby, and Cherise McMahon. The purpose of this document is to outline the process and end product of the Baseball Statistics Manager, which we created over the last five weeks. It evaluates the successes and hardships of the project and reflects on the final problem statement, the security analysis, and the strengths and weaknesses of the final product.
Problem Description
Since the use of Sabermetrics, an empirical way of evaluating a baseball player's performance, became commonplace in the early 2000s, the amount of data available to Major League Baseball (MLB) teams has exploded. As more data has become available, the way baseball is played has changed drastically. Players and coaches are now expected to understand the complex decisions being made by front offices, so the need for tools that make sense of this vast amount of information is extremely high. Players should be able to spend more time playing than learning.
The goal of this project was to give fans abilities similar to those of front offices: to search, filter, and save their favorite searches, so that the statistics of their choosing are right at their fingertips. The Baseball Statistics Manager seeks to accommodate users both in the stands and on the field.
Solution Description
Our project includes a frontend web application written in Angular using TypeScript, HTML, and CSS. The frontend communicates with a backend composed of a Python program built on the Flask microframework. The backend in turn connects to the SQL Server database, where we query for the data the user has requested. The result then works its way back through the system to the Angular user interface, where the data is formatted and displayed.
Frontend
The frontend is a web-based application written in Angular, which allows for an appealing user interface that can also be scaled to a bigger project in the future. The user interface consists of four tabs: Home, Login, Search, and Favorites. The Home tab contains information about our project and its purpose. The Login tab is where a user can create an account and log in; once logged in, the user can also log out from this tab. The Search tab allows the user to search through almost a hundred years of player, team, and award data. For a Player search, the user can choose between Position Players and Pitchers, the entire MLB or a single league, and the regular season or postseason. Based on these inputs, the user can filter by year, team, and player name. When they click search, the results populate a table below the filters. For a Team search, the user can filter by year and team name. For an Awards search, the user can filter by award name and year. The results of these searches are displayed in the same way as Player search results.
If the user is logged in and has specified search criteria, they can save their search under a unique name. Saved searches appear in a drop-down in the Favorites tab. After selecting a saved search from this drop-down, the user can choose Run Favorite Filter or Delete Favorite Filter. Run Favorite Filter redirects the user to the Search page and automatically retrieves and displays the results of the saved search. Delete Favorite Filter removes the saved search from their list. If the user is not logged in, the Favorites tab displays a message instructing them to log in to access the page.
Backend
The frontend is supported by the backend, a Python program built on the Flask framework that allows the frontend to communicate with the database. Flask provides the routing for the REST API, and these routes use pyodbc to connect to the SQL Server DBMS. The Baseball Stats database holds our data in tables such as Award, League, Pitcher, PositionPlayer, SavedSearches, Stats, Team, and User. The Flask routes call stored procedures to retrieve the data and either send it back to the frontend or catch errors and return helpful messages to the user.
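A minimal sketch of the route pattern described above, assuming a DB-API 2.0 cursor such as the one pyodbc provides. The helper names `build_call` and `call_procedure`, the JSON keys, and the error text are hypothetical, not the project's actual code. It uses the ODBC `{CALL ...}` escape syntax with `?` placeholders and turns any database error into a short message instead of a stack trace.

```python
from typing import Any, Dict, Sequence, Tuple


def build_call(procedure: str, arg_count: int) -> str:
    """Build an ODBC call escape, e.g. '{CALL GetTeams (?, ?)}'."""
    if arg_count == 0:
        return "{CALL " + procedure + "}"
    placeholders = ", ".join(["?"] * arg_count)
    return "{CALL " + procedure + " (" + placeholders + ")}"


def call_procedure(cursor: Any, procedure: str,
                   params: Sequence[Any]) -> Tuple[Dict[str, Any], int]:
    """Run a stored procedure and return (JSON-ready body, HTTP status code).

    `procedure` must come from the backend's own code, never from user input;
    user values travel only through the ? placeholders in `params`.
    """
    try:
        cursor.execute(build_call(procedure, len(params)), list(params))
        # cursor.description holds one tuple per column; item 0 is its name
        columns = [col[0] for col in cursor.description]
        rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
        return {"results": rows}, 200
    except Exception:  # with pyodbc, catching pyodbc.Error would be narrower
        return {"error": "Something went wrong with that search. Please try again."}, 500
```

In a Flask view this could be used as `body, status = call_procedure(conn.cursor(), "GetTeams", [team, year])` followed by `return jsonify(body), status`; the parameter order shown for GetTeams is assumed.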
Features
Feature Number | Feature Name | Feature Description
1 | Web Application | The user interface is available over the internet. The application consists of a front end written using Angular, TypeScript, HTML, and CSS; a back end written in Python using Flask; and SQL Server acting as our DBMS.
3 | Create User | A user can create an account with a valid username and password, which can then be used to log in.
4 | User Login | A user can log in to an account they have created at any time, and information saved to their account, such as favorite searches, is available to the user upon logging in.
5 | User Logout | When a user is done using the Baseball Statistics Manager, they can log out. Upon logging out, the user loses the ability to save searches.
6 | Player Search | Any user, regardless of login status, can use the Player search feature found under the Player tab of the Search page. Players can also be searched for by name; the full name of a player is not required, only a substring of it. Data available to users dates back to the 1920s.
7 | Team Search | Any user, regardless of login status, can use the Team search feature under the Team tab of the Search page. Data available to users dates back to the 1920s.
8 | Awards Search | Any user, regardless of login status, can use the Awards search feature under the Awards tab of the Search page. Data is available for any award a player could have won dating back to the 1920s.
9 | Search Filters | Player, Team, and Awards searches all have search filters that make it easy to narrow down the results of a search.
10 | Save Search | Users who are logged in to their Baseball Statistics Manager account can save a search they have performed. Saved searches are available in the Favorites tab, where they can be run again to return the results of the original search.
11 | Delete Saved Search | Users who are logged in to their Baseball Statistics Manager account can delete a search they have previously saved. Saved searches are deleted from the Favorites tab.
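Feature 6 matches a player on any substring of their name. The sketch below shows one way to do that with a parameterized `LIKE` query, using Python's built-in `sqlite3` as a stand-in for SQL Server; the `Player` table's `Name` column and both function names are assumed for illustration. Wildcard characters typed by the user (`%` and `_`) are escaped so they match literally.

```python
import sqlite3


def like_pattern(fragment: str) -> str:
    """Wrap a name fragment in % wildcards, escaping wildcards the user typed."""
    escaped = (fragment.replace("\\", "\\\\")
                       .replace("%", "\\%")
                       .replace("_", "\\_"))
    return "%" + escaped + "%"


def search_players(conn: sqlite3.Connection, fragment: str) -> list:
    """Return player names that contain `fragment` anywhere in the name."""
    # The pattern is bound as a parameter; ESCAPE '\' marks the escaped wildcards.
    sql = "SELECT Name FROM Player WHERE Name LIKE ? ESCAPE '\\' ORDER BY Name"
    return [row[0] for row in conn.execute(sql, (like_pattern(fragment),))]
```

SQLite's `LIKE` ignores ASCII case by default, so a search for "ruth" finds "Babe Ruth"; in SQL Server, case sensitivity depends on the column's collation.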
Key Challenges
The first challenge we encountered was having to learn Angular, Flask, TypeScript, HTML, and CSS, none of which we had prior experience with. We overcame this lack of knowledge through patience and a lot of research as needed. To get started, a tutor gave the team a walkthrough of Angular and the use of components. From there, online tutorials explained the individual features we had to implement. The group also tried to take advantage of our professor's office hours. One thing we learned is that the quickest way to learn new frameworks and libraries is to jump in and try them out. It does not take long to start learning from your mistakes and recognizing problems you have encountered before and already know how to solve.
An additional challenge was maintaining consistency between the layers of our application, which were developed in different IDEs. Keeping our Eclipse data import code, Flask back end, and Angular front end consistent with every structural change made to the SQL Server database was difficult. To avoid failures caused by mismatches between layers, we adopted a rule of thumb: after changing one layer of the application in its IDE, do not proceed with anything else until the change has been propagated to the other layers. This worked well when we actually put it into practice, but sometimes all but one layer would be updated, or a change would not be saved. Better coding habits can always be worked on!
Finally, as just mentioned, there were many moving parts in this system, which often resulted in errors throughout our program. Our first step in debugging was to run through the frontend and read any errors reported when inspecting the page in the browser. From there, we could find the source of each error and make the necessary changes to the code. Of course, it is impossible to anticipate every error a program will encounter, especially when user input is involved. We also did not want errors to compromise the data integrity of our system by leaving the database in an invalid state. To address this, parameters and operations known to potentially cause errors were wrapped in try-catch blocks throughout our program, giving us full control over how the system responds when an error occurs. After catching an error, we typically set a field in our Angular front end that notifies the user that something went wrong. This worked well and gives the user useful feedback.
Database Design
The database's Entity Relationship Diagram and Relational Schema can be found in the Appendix.
Security Measures
The frontend takes values from the user, such as login credentials and search criteria, and sends them to the backend. The backend then calls a stored procedure depending on the action to be performed. Because users type into input text boxes, SQL injection attacks are a major security risk to the database. This is handled in the Python program: each stored procedure is called using a cursor.execute statement that passes the user's input as parameters, so the database treats that text as data and never executes it as SQL. This approach is much safer than concatenating user input into a query string and protects the database from SQL injection attacks.
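The difference between concatenation and parameter binding can be sketched with Python's built-in `sqlite3` module standing in for SQL Server (pyodbc uses the same `?` placeholder style). The table layout and function names are illustrative only.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the User table (SQL Server usually quotes it as [User])."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "User" (Username TEXT PRIMARY KEY)')
    conn.executemany('INSERT INTO "User" VALUES (?)', [("derek",), ("eric",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Vulnerable: the input is pasted into the SQL text itself.
    sql = "SELECT Username FROM \"User\" WHERE Username = '" + username + "'"
    return conn.execute(sql).fetchall()


def find_user(conn: sqlite3.Connection, username: str) -> list:
    # Safe: the ? placeholder sends the input as a value, never as SQL.
    sql = 'SELECT Username FROM "User" WHERE Username = ?'
    return conn.execute(sql, (username,)).fetchall()
```

With the input `x' OR '1'='1`, the unsafe version builds a query whose `WHERE` clause is always true and returns every user, while the parameterized version simply looks for a user literally named `x' OR '1'='1` and finds none.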
Another concern was user account privacy. To ensure privacy, we needed to protect each user's password in case someone ever gained access to the database. In the Python backend, we run the password through a hashing function and send only the hashed version to the database. When the user logs in, the password they enter is run through the same hashing process and checked against the stored value. This way, even if someone gained access to the database, they would see only hashed values rather than the passwords themselves. One problem with this, however, is that someone could still intercept the password while it is transferred from the frontend to the backend. Securing this transfer was outside the scope of this project but would need to be implemented if the project were expanded.
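The report does not name the hashing function used. As one illustration, the sketch below uses salted PBKDF2 from Python's standard library; the iteration count, the function names, and the `salt$digest` storage format are assumptions. Because a hash cannot be reversed, a login is checked by hashing the entered password again with the stored salt and comparing the results.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 100_000  # assumed work factor; higher is slower for attackers (and for us)


def hash_password(password: str, salt: Optional[bytes] = None) -> str:
    """Return 'salt$digest' in hex, suitable for storing in the User table."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt.hex() + "$" + digest.hex()


def verify_password(password: str, stored: str) -> bool:
    """Re-hash the login attempt with the stored salt and compare the digests."""
    salt_hex, digest_hex = stored.split("$", 1)
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    bytes.fromhex(salt_hex), ITERATIONS)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(candidate.hex(), digest_hex)
```

The random salt means two users with the same password still get different stored values, so a stolen table cannot be attacked with one precomputed lookup list.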
Integrity Constraints
Referential integrity is maintained by the foreign key relationships specified in the Relational Schema in the Appendix. For update and delete queries, the default referential action is used, so any change that would break a foreign key reference is rejected.
The integrity constraints enforced throughout the database are listed below.
● Each entry in the Award, League, Player, Stats, and Team tables has a unique ID assigned to it by the database.
● Each Username must be unique.
● Each SearchName must be unique for a given Username.
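These constraints can be sketched as table definitions. This is a hypothetical, trimmed-down version of the User and SavedSearches tables (column names other than Username and SearchName are assumed), run on SQLite so it is self-contained; SQL Server syntax differs slightly, for example using `IDENTITY` columns and `[User]` quoting. The foreign key uses the default action, so deleting a user who still has saved searches is rejected.

```python
import sqlite3

SCHEMA = """
CREATE TABLE "User" (
    UserID   INTEGER PRIMARY KEY,      -- unique ID assigned by the database
    Username TEXT NOT NULL UNIQUE,     -- each username must be unique
    Password TEXT NOT NULL             -- holds the hash, never plaintext
);
CREATE TABLE SavedSearches (
    SearchID   INTEGER PRIMARY KEY,
    Username   TEXT NOT NULL REFERENCES "User"(Username),  -- default action: reject
    SearchName TEXT NOT NULL,
    UNIQUE (Username, SearchName)      -- a search name is unique per user
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def rejected(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the statement is refused by an integrity constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False
```

Two different users may each have a saved search with the same name; only a repeat of the same (Username, SearchName) pair is refused.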
Stored Procedures
Procedure | Description
create_User | Creates a new user from the provided Username and Password.
GetAwardNames | Returns all of the distinct award names from the Award table.
GetAwards | Returns the top 150 results, each with the award recipient's player name, the award name, and the year, for a given award name and/or year.
GetAwardYears | Returns all of the distinct years from the Award table.
GetTeams | Returns the top 150 results for team name, wins, losses, league, and year for a given team name and/or year.
GetYears | Returns all of the distinct years in which a position player or pitcher played.
PitcherInsert | Creates a new entry in the Pitcher table for current season statistics and also adds an entry to the Stats table.
PositionPlayerInsert | Creates a new entry in the PositionPlayer table for current season statistics and also adds an entry to the Stats table.
PositionPlayerPostInsert | Creates a new entry in the PositionPlayer table for postseason statistics and also adds an entry to the Stats table.
SavedReturn | Returns all saved searches associated with the logged-in user.
SortPlayers | Returns the top 100 player statistics matching the given filters: league, team, player name, year, position, and postseason.
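GetAwards and GetTeams filter on a name and/or a year, meaning either filter may be left blank. A common way to express optional filters in T-SQL is `(@Year IS NULL OR Year = @Year)`. The sketch below shows the same idea with SQLite named parameters; the Award columns and the `get_awards` function are assumptions, and SQL Server would use `SELECT TOP 150` instead of `LIMIT 150`.

```python
import sqlite3
from typing import Optional

# A NULL parameter makes its condition true for every row, i.e. "no filter".
GET_AWARDS = """
SELECT PlayerName, AwardName, Year
FROM Award
WHERE (:award IS NULL OR AwardName = :award)
  AND (:year  IS NULL OR Year = :year)
ORDER BY Year DESC
LIMIT 150
"""


def get_awards(conn: sqlite3.Connection, award: Optional[str] = None,
               year: Optional[int] = None) -> list:
    """Return up to 150 award rows, filtered by award name and/or year."""
    return conn.execute(GET_AWARDS, {"award": award, "year": year}).fetchall()
```

One query then covers all four combinations (no filter, name only, year only, both) instead of needing a separate procedure for each.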
Design Analysis
Strengths
● Users with accounts are able to personalize their frontend while not affecting any of the
statistical data.
● Account passwords are hashed and error checking prevents a user from gaining extra
knowledge about the backend of the system.
● Error checking prevents users from entering poor data into the system.
● Arbitrary ID primary keys allow instances of data with the same association to a team or person to remain separate entries.
○ Foreign keys become significantly simpler.
○ Relationships can be stored as just the related IDs.
● Names of the tables, attributes, and stored procedures are clear.
● The data import functions without issue, even at a massive scale.
● Statements are built as prepared or callable statements, or executed with parameters through the cursor in Python, so the system is not susceptible to SQL injection attacks.
Weaknesses
● Importing the full set of data takes a significant amount of time. This could possibly be optimized, but the CSV files are read line by line and then imported, and no optimization directly comes to mind, as there is no way around reading in each line of our CSV files.
● Passwords are not encrypted between the frontend and the backend. This could lead to privacy issues but was outside the scope of this project. To solve this, we could add SSL/TLS so that the frontend and backend communicate securely.
● Users can only filter searches; they cannot sort the data. The ordering is predetermined by how the data is retrieved in the stored procedure. This reduces usability and could be addressed if the project were expanded, for example by adding a dropdown where the user specifies how they would like the results ordered.
● If a search matches more than 100 results, only the first 100 are returned, so the user may not be able to view all of the information they want. To address this, we could bring the data into the frontend in pages, limiting the amount of data retrieved at a single time. The frontend would store a value indicating which page of data is currently displayed. We could then add a Next button to the interface; clicking it would increment the page value and pass it to the backend and on to the stored procedure that searches for data based on the user's filters. The database could then multiply the page number by the page size to determine which range of rows to return.
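The sorting and paging improvements proposed above can be sketched together, since stable pages need a fixed sort order. Column names cannot be passed as `?` parameters, so the user's sort choice is mapped through a whitelist before it reaches `ORDER BY`. The PositionPlayer column names follow the abbreviations in the Appendix, but the dropdown labels, `fetch_page`, and the SQLite `LIMIT ? OFFSET ?` syntax are illustrative; SQL Server 2012 and later express the same paging as `OFFSET @Offset ROWS FETCH NEXT @PageSize ROWS ONLY`.

```python
import sqlite3

PAGE_SIZE = 100  # the row limit described above

# Hypothetical dropdown labels mapped to trusted column names.
SORTABLE = {"Home Runs": "HR", "Hits": "H", "Runs Batted In": "RBI", "Stolen Bases": "SB"}


def fetch_page(conn: sqlite3.Connection, sort_choice: str, page: int,
               descending: bool = True, page_size: int = PAGE_SIZE) -> list:
    """Return one page (0-based) of (PlayerID, stat) rows in the chosen order."""
    column = SORTABLE.get(sort_choice)
    if column is None:
        raise ValueError(f"Unsupported sort option: {sort_choice!r}")
    if page < 0:
        raise ValueError("page must be non-negative")
    direction = "DESC" if descending else "ASC"
    offset = page * page_size  # the page value times the page-size multiplier
    # PlayerID breaks ties so consecutive pages never overlap or skip rows.
    sql = (f"SELECT PlayerID, {column} FROM PositionPlayer "
           f"ORDER BY {column} {direction}, PlayerID LIMIT ? OFFSET ?")
    return conn.execute(sql, (page_size, offset)).fetchall()
```

For example, `fetch_page(conn, "Home Runs", 1)` would return rows 101 to 200 in descending home run order, and an empty list signals that there is no further page to show.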
Appendix
Resources
Relational Schema
Position Player *1
● G (Games Played)
● AB (At-Bats)
● R (Runs)
● H (Hits)
● 2B (Doubles)
● 3B (Triples)
● HR (Home Runs)
● RBI (Runs Batted In)
● SB (Stolen Bases)
● CS (Caught Stealing)
● BB (Bases on Balls/Walks)
● SO (Strikeouts)
Pitcher *2
● W (Wins)
● L (Losses)
● G (Games Pitched)
● GS (Games Started)
● CG (Complete Games)
● SHO (Shutouts)
● SV (Saves)
● H (Hits Allowed)
● ER (Earned Runs)
● HR (Home Runs Allowed)
● BB (Bases on Balls Allowed/Walks Allowed)
● SO (Strikeouts)
● ERA (Earned Run Average)
● R (Runs Allowed)
Entity Relationship Diagram
The design of this system has two main perspectives: a logged-in user and a logged-out user. A logged-out user can read data, while a logged-in user can also alter their own values in the database. There are three types of searches: player, team, and award. Player searches are centered on a player's PlaysOn relationship, their statistics for their position in a given year, and the team they were on at the time. Team searches are based on the team name and year, and the data reflects the team's results for that season. Finally, award searches reflect the Has relationship, in which a player has an award for a certain year and in a certain league. A User entity has a username and password used to log in, and a logged-in user can save search filters for all search types and delete them.
Index
Angular
Python
Flask
Frontend
Backend
Glossary