17 Jun 2020 Datathon – Data Dive Recap
Last night we hosted our virtual Data Dive event! SmartSA Partners presented their data sets and resources for our upcoming 2020 SmartSA Datathon, and we answered questions submitted by viewers. Read on for some key points from each partner, and watch the video recording on our Facebook page. The timestamp of each partner presentation in the video is noted next to its heading. Applications are due June 24th!
Intro – What is the SmartSA Datathon? (5:46)
CivTechSA and the SmartSA partners have teamed up to share data sets with the public in order to connect communities, ideas, and data to improve the quality of life today as we build a smarter tomorrow. We hope teams will identify opportunities and/or missing data elements in the data sets and explain how those insights could be applied in practice to benefit the community and/or partners in three impact areas: affordability, environmental quality, and access to services.
We encourage all skilled and interested students and professionals, such as developers, engineers, data scientists, and civic technologists, to apply (2-6 people per team). For more information on prizing, or to access the Data Catalog and application, visit civtech-sa.com/datathon/.
Google Cloud Platform Overview with Respec (9:25)
Google Colab is a free, shared, collaborative Python notebook. Data Studio is another free tool that lets you connect to data sets and look at the data across different dimensions. Google is releasing more accessible tools and data-wrangling features on a regular basis, so you can work with data without being an expert.
VIA provided a large range of data, including bus stop locations and schedules for comparison, plus geolocation data, which is new this year. The geolocation data includes street names and zip codes for bus stops, as well as scheduled versus actual miles driven. Because ridership has changed due to COVID-19, VIA has given data scientists access to data going back to February so that a more “normal” set is available.
CPS Energy (19:06)
The meter location data has been scrambled to meet location and privacy requirements. The sets also include detailed weather information, so you can compare usage patterns across San Antonio, see how usage correlates with the weather, and look at emissions. You can also visit the TCEQ website to download more data on emissions. We haven’t yet met our current CO2 emission reduction attainment goals. EV charging station data is included as well, so you can see how the stations are strategically placed and how that placement could help us reach sustainability goals.
The data set provided is residential water consumption by home, measured in gallons, covering about 500,000 active customers at any one time. Consumption rises during the summer months: people are home more often, pool usage goes up, and so on. Leaks can also cause spikes in usage that return to normal over time. A Data Dictionary is provided, along with color coding to help keep things organized.
Much of the data SARA collects focuses on sustainability and green infrastructure. Data sets include Bexar County water quality data, 5-minute rainfall collection data, 24-hour rainfall summaries, and sample impervious cover data for Brooks City Base and the UTSA campus. Impervious cover means hard surfaces such as driveways and sidewalks, where rain runs off because it can’t be absorbed into the ground. There are also links to high-resolution contour data and readings from four gauges along the Mission Reach. You can also go to the SARA website to download more up-to-date sets. The data has been reformatted for this catalog to better describe and present the information. SARA would love to see what else you can find to combine with USGS and other freely available data sets to build a much larger package.
The Edwards Aquifer Authority (EAA) provided a smaller data set focusing on the spring runs where endangered species live; protecting those endangered species is a big reason the EAA exists. The data covers water quality and water quantity in that area. EAA is interested in seeing how people can work with all of these water-related data sets.
The City of San Antonio presented some of its public GIS data, which it has been sharing publicly for the past three years. The Open Gov data page has additional links to public data sets. Categories of data include political boundaries, zoning and land use, public safety, storm water utility, CIP and bond projects, development, parks and recreation, historic preservation, and transportation. Other data sets are also available, such as 311 service calls and building permits. The City is always adding new data and updating the data sets regularly, sometimes weekly, sometimes daily.
Bexar County (45:18)
The data set provided is Bexar County foreclosures. Every time a property goes into foreclosure, it is processed in the Bexar County Clerk’s office, which captures that data. The set goes back to 2007, and the attributes are simple: address, type of foreclosure, school district, and a foreclosure document number. The County hopes teams will find ways to use this important data set to create value for the community, which ties closely to the Datathon impact areas of affordability and access to services.
How will the event run differently this year being virtual?
We’re already virtual with this event! We’ll continue to use Zoom. For the Mentor Day on July 11, we will use a private Zoom meeting with breakout rooms. For the prizing and final pitch day on July 18, we will livestream a Zoom session in which teams pitch to the logged-in judges; the judges will then deliberate offline before we announce the winners.
Who are the judges for this event?
The judges will be representatives from our SmartSA partners, Geekdom, and the City of San Antonio Office of Innovation. They will not be the same individuals who serve as mentors, but they will come from these organizations.
VIA: Will paratransit data be provided by VIA? Will VIA Link program data be provided?
The paratransit data is included in the logged transit file, which has information on all vehicles.
The VIA Link program is relatively new, so we did not provide its data this year. We’d like to in future competitions, but it isn’t available at this time. This is great feedback as we plan future data sets and work to close gaps.
Are there data sets that have taken priority or have more interest due to recent developments with COVID-19?
There has not been a pivot in these data sets or challenges to focus on COVID-19 at this time. However, the City of San Antonio does provide COVID-19 data sets on the COSA Open Data page. So far, only three data sets are available: confirmed cases, zip code, and spatial distribution. The City is working to add cases reported over a 14-day period to the portal, and continues to communicate with Metro Health about providing better information and how to present incoming data, but has to be careful with HIPAA requirements at this time.
Are we free to use any combination of the data sets? Can we use outside data sets?
Yes! We hope you look at multiple data sets for really interesting insights. Using other data sets for cross-referencing or inclusion is encouraged. If you plan to use other data sets, please note in your application where you’re looking and which data sets you might bring in, and specify which impact areas you want to address.
Can you give a success story from a previous Datathon?
The 2018 winning team came up with an idea called Cool Connect, which connects senior citizens with VIA bus stops and local cooling centers in the event of a power outage, extreme weather, or other weather emergencies. A recap of the 2019 Datathon winners can be found in a past blog post.
What is the judging criteria for the event?
When scoring the applications and pitches, we look at feasibility and ease of implementation; how compelling, insightful, and innovative the idea is; the inspiration and fire behind what you want to achieve; and the quality of the presentation.
What does a mentor day look like?
The Mentor Day will be virtual, from 10 a.m. to 2 p.m. SmartSA Mentors will go deeper into the data and will be able to tailor the information to your projects. We’ll use breakout rooms in Zoom so each team meets with each mentor. We find that meeting with all of the mentors helps teams think about things differently, spend time listening and networking, and get very specific insight on their projects. You will also meet with different mentors the morning before your pitches, who can help with your presentation and offer insights on making your project more actionable or implementable.
Can more than one team use each SmartSA partner’s data? (e.g., is there only one team for VIA data, one for SARA?)
ABSOLUTELY! We want you to use multiple partner data sets and challenge areas throughout the program.
I’m not currently part of a team, but want to be. How can I get connected to one?
Leave a comment below and we’ll try to connect you with one, or email us at firstname.lastname@example.org and we can connect you with someone.
I’m not a data scientist, but I have a project management background. Can I be helpful?
YES! Teams are always strongest when they balance expertise across a variety of areas.