Data ingestion/cleaning system using the publish-subscribe design pattern in Python.

Task/Problem Description:

I have a data producer (a module whose name was removed from this listing, with the function get_data) that produces two types of football data: play-by-play and tracking. The play-by-play data is a dictionary, and I'd like you to perform error checking, cleaning, and reformatting on it as follows:

- Rename all camelCase field names, including nested sub-field names, to underscore notation ("playStartTimestamp" becomes "play_start_timestamp").

- Convert all camelCase event names (the value of the "event" field in each element of the "events" list) to underscore notation ("offensiveFormation" becomes "offensive_formation").

- Check that every element of the "events" list occurs at or after the play start time and at or before the play end time. If an event falls outside this window, log it and remove it from the list.

- Remove events "playStart", "reviewStatus", "play_submit", and "playType" by checking the value of field "event" in the "events" list.

- When a field or subfield name refers to an ID, for example "play_id", capitalize "id" as "ID" ("play_ID").

- If any field is not as expected, log it.
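
A minimal sketch of the play-by-play rules above follows. It assumes that the play end time is stored as "playEndTimestamp" and that each event carries its time in a "timestamp" field, both as numeric epoch seconds; these names are hypothetical, since the posting only names "playStartTimestamp". Comparing the removal list after conversion to underscore notation also covers "play_submit", which the posting already spells that way.

```python
import logging
import re
from typing import Any

log = logging.getLogger("play_by_play")

# Events to remove, compared after conversion to underscore notation.
DROPPED_EVENTS = {"play_start", "review_status", "play_submit", "play_type"}


def to_snake(name: str) -> str:
    """camelCase -> snake_case; names already in snake_case pass through."""
    name = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


def field_name(name: str) -> str:
    """snake_case a field name and upper-case any standalone "id" part."""
    parts = to_snake(name).split("_")
    return "_".join("ID" if part == "id" else part for part in parts)


def rename_keys(obj: Any) -> Any:
    """Recursively rename string keys in nested dicts and lists."""
    if isinstance(obj, dict):
        return {
            (field_name(k) if isinstance(k, str) else k): rename_keys(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [rename_keys(item) for item in obj]
    return obj


def is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def clean_play(raw: dict) -> dict:
    play = rename_keys(raw)  # returns new dicts; `raw` is not modified
    start = play.get("play_start_timestamp")
    end = play.get("play_end_timestamp")  # hypothetical field name
    events = play.get("events")
    if not isinstance(events, list):
        log.warning("play %r: 'events' missing or not a list", play.get("play_ID"))
        return play
    kept = []
    for ev in events:
        if not isinstance(ev, dict):
            log.warning("non-dict event removed: %r", ev)
            continue
        name = ev.get("event")
        if isinstance(name, str):
            name = ev["event"] = to_snake(name)
        else:
            log.warning("event without a string 'event' field: %r", ev)
        if isinstance(name, str) and name in DROPPED_EVENTS:
            continue
        t = ev.get("timestamp")  # hypothetical field name
        if is_number(t) and is_number(start) and is_number(end):
            if not start <= t <= end:
                log.warning("event %r outside [%s, %s]; removed", ev, start, end)
                continue
        else:
            log.warning("cannot check the time window for event %r", ev)
        kept.append(ev)
    play["events"] = kept
    return play
```

The two-pass regular expression keeps acronyms together ("playID" becomes "play_id", then "play_ID"), which a naive "underscore before every capital" rule would split into "play_i_d".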

The second type of data is raw tracking location. This comes in the form of a list of lists. Each element in the top-level list corresponds to a single measurement of the location of a single entity (player, referee, object on the field). The sub-list contains the entity identifier, two location coordinates x and y, and a timestamp.

[  # ID, x, y, t
    [41021, 10.1, 121.111, 1445559932.34],
    ['pylon-1', 100, 120, 1445559932.39],
    [1034151, 100, 120, 1445559933.11],
    ...
]

Please reformat it as a dictionary keyed by entity ID:

{
    ID: {'x': [...], 'y': [...], 't': [...]},
    ...
}
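
The regrouping can be sketched as a single pass that appends each measurement to its entity's x, y, and t lists, so the three lists stay index-aligned. This is one possible approach, not the requested implementation; the function name and logger name are hypothetical, and rows that are not four-element lists, or whose ID is neither an int nor a string, are logged and skipped.

```python
import logging
from collections import defaultdict
from typing import Any, Dict, List, Union

log = logging.getLogger("tracking")

EntityID = Union[int, str]


def reformat_tracking(rows: List[Any]) -> Dict[EntityID, Dict[str, list]]:
    """Group [ID, x, y, t] rows into {ID: {'x': [...], 'y': [...], 't': [...]}}."""
    grouped: Dict[EntityID, Dict[str, list]] = defaultdict(
        lambda: {"x": [], "y": [], "t": []}
    )
    for row in rows:
        if not isinstance(row, (list, tuple)) or len(row) != 4:
            log.warning("tracking row skipped, expected [ID, x, y, t]: %r", row)
            continue
        entity, x, y, t = row
        if not isinstance(entity, (int, str)):
            log.warning("tracking row skipped, unusable ID: %r", row)
            continue
        series = grouped[entity]
        series["x"].append(x)
        series["y"].append(y)
        series["t"].append(t)
    return dict(grouped)  # plain dict, so later lookups don't create empty entries
```

Note that when this dictionary is written with the standard json module, integer IDs such as 41021 become string keys ("41021"), because JSON object keys must be strings.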



Both data types are prone to problems that I'd like you to try to correct, when possible. If there are cases when you can't clean the data, please log the bad data when the code encounters it.
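
For the tracking rows, "correct when possible, log otherwise" could look like the following per-row repair step. The specific repairs are assumptions chosen for illustration: stripping whitespace from string IDs, treating digit-only string IDs as the matching integer ID, and coercing numeric strings to floats. Rows with the wrong shape, a missing ID, or non-numeric, NaN, or infinite values are logged and rejected. The function name is hypothetical.

```python
import logging
import math
from typing import Any, Optional, Tuple, Union

log = logging.getLogger("tracking")

Row = Tuple[Union[int, str], float, float, float]


def clean_tracking_row(row: Any) -> Optional[Row]:
    """Repair one [ID, x, y, t] row, or log it and return None if it can't be repaired."""
    if not isinstance(row, (list, tuple)) or len(row) != 4:
        log.warning("unrepairable tracking row (wrong shape): %r", row)
        return None
    entity, *values = row
    if isinstance(entity, str):
        entity = entity.strip()
        # Assumption: a digit-only string ID ("41021") is the same entity as 41021.
        if entity.isdecimal():
            entity = int(entity)
    if isinstance(entity, bool) or not isinstance(entity, (int, str)) or entity == "":
        log.warning("unrepairable tracking row (bad ID): %r", row)
        return None
    numbers = []
    for value in values:
        try:
            if isinstance(value, bool):
                raise ValueError("a boolean is not a coordinate or time")
            number = float(value)  # accepts ints, floats, and strings like " 10.1"
        except (TypeError, ValueError):
            log.warning("unrepairable tracking row (non-numeric %r): %r", value, row)
            return None
        if not math.isfinite(number):
            log.warning("unrepairable tracking row (NaN or inf): %r", row)
            return None
        numbers.append(number)
    x, y, t = numbers
    return entity, x, y, t
```

Running every raw row through a step like this before regrouping keeps the x, y, and t lists numeric and means every rejected row leaves a log entry.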

After performing these transformations, publish the resulting data to two subscribers that simply write the data to JSON files. One subscriber is for play-by-play data and the other subscriber is for tracking data. You may use libraries, but do not use messaging libraries such as zmq.
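
Because messaging libraries are ruled out, the publish-subscribe part can be as small as an in-process broker that maps topic names to callbacks. The sketch below is one such design under that constraint; the class names, topic names, and output file names are hypothetical, and how it is wired to get_data depends on that function's interface, which the posting does not describe. Each file subscriber keeps every message it has received and rewrites the file as one JSON array, so the output is always valid JSON.

```python
from __future__ import annotations

import json
import logging
import os
import tempfile
from collections import defaultdict
from typing import Any, Callable

log = logging.getLogger("pubsub")


class Broker:
    """Minimal in-process publish-subscribe hub: topic name -> list of callbacks."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        callbacks = self._subscribers.get(topic, [])  # .get avoids creating topics
        if not callbacks:
            log.warning("no subscribers for topic %r; message dropped", topic)
        for callback in list(callbacks):
            callback(message)  # delivered synchronously, in subscription order


class JsonFileSubscriber:
    """Collects every message it receives and rewrites them as one JSON array."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.messages: list[Any] = []

    def __call__(self, message: Any) -> None:
        self.messages.append(message)
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.messages, fh, indent=2)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    broker = Broker()
    broker.subscribe("play_by_play", JsonFileSubscriber(os.path.join(out_dir, "play_by_play.json")))
    broker.subscribe("tracking", JsonFileSubscriber(os.path.join(out_dir, "tracking.json")))
    broker.publish("play_by_play", {"play_ID": 1, "events": []})
    broker.publish("tracking", {41021: {"x": [10.1], "y": [121.111], "t": [1445559932.34]}})
    print("wrote JSON files to", out_dir)
```

Rewriting the whole file on every message is simple but grows linearly with the number of messages; for long sessions, writing one JSON document per line (JSON Lines) would be a cheaper alternative.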

I will provide you with the code for the data producers, along with JSON files for the play-by-play and tracking data. If you have any questions about this, you can reach me by email.

Please send me the code and screenshots of the output, along with an explanation of your approach. You can provide the explanation as comments in the code or as a separate document.

Tags: Linux, Python