Data ingestion/cleaning system using the publish-subscribe design pattern in Python.

Task/Problem Description:

I have a data producer (module [url removed, login to view], function get_data) that produces two types of football data: play-by-play and tracking. Each play-by-play record is a dictionary, and I'd like you to do some error checking, cleaning, and reformatting on it:

- Rename all camel-case field names, including nested sub-field names, to underscore notation ("playStartTimestamp" becomes "play_start_timestamp").

- Replace all camel-case event names (the values stored under the field name "event" in each element of the "events" list) with underscore notation ("offensiveFormation" becomes "offensive_formation").

- Check that every element of the "events" list occurs at or after the play start time and at or before the play end time. If an event falls outside that window, log it and remove it from the list.

- Remove events "playStart", "reviewStatus", "play_submit", and "playType" by checking the value of field "event" in the "events" list.

- When a field or subfield name is an ID, for example "play_id", capitalize "id" to "ID".

- If any fields are not as you expect, log them.
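
The play-by-play steps above could be sketched roughly as follows. This is only a minimal illustration, not the required implementation. The posting names "playStartTimestamp", "events" and "event", but the end-time field ("playEndTimestamp") and the per-event time field ("timestamp") are assumptions made for this example, as are the function names. Because the list of events to remove mixes camel case and underscores, the sketch compares names after converting them to underscore notation.

```python
import logging
import re

log = logging.getLogger("pbp_cleaner")  # hypothetical logger name

# Events to drop, compared after conversion to underscore notation.
DROPPED_EVENTS = {"play_start", "review_status", "play_submit", "play_type"}


def to_snake(name):
    """Convert camelCase to underscores: 'playStartTimestamp' -> 'play_start_timestamp'."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def rename_key(name):
    """Convert a field name to underscores, then capitalize any 'id' part to 'ID'."""
    parts = to_snake(name).split("_")
    return "_".join("ID" if part == "id" else part for part in parts)


def rename_keys(obj):
    """Recursively rename the keys of nested dictionaries and lists."""
    if isinstance(obj, dict):
        return {
            rename_key(k) if isinstance(k, str) else k: rename_keys(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [rename_keys(v) for v in obj]
    return obj


def clean_play(play):
    """Return a cleaned copy of one play-by-play dictionary."""
    play = rename_keys(play)
    start = play.get("play_start_timestamp")
    end = play.get("play_end_timestamp")  # assumed field name
    events = play.get("events")
    if not isinstance(events, list):
        log.warning("'events' is missing or not a list: %r", events)
        events = []
    kept = []
    for ev in events:
        if not isinstance(ev, dict) or not isinstance(ev.get("event"), str):
            log.warning("unexpected event entry: %r", ev)
            continue
        ev["event"] = to_snake(ev["event"])
        if ev["event"] in DROPPED_EVENTS:
            continue
        t = ev.get("timestamp")  # assumed field name
        try:
            in_window = start <= t <= end
        except TypeError:  # missing or non-comparable timestamps
            in_window = False
        if not in_window:
            log.warning("event outside play window, removed: %r", ev)
            continue
        kept.append(ev)
    play["events"] = kept
    return play
```

Renaming keys first means the later checks only ever deal with one naming style, which keeps the event filtering simple.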

The second type of data is raw tracking location. This comes in the form of a list of lists. Each element in the top-level list corresponds to a single measurement of the location of a single entity (player, referee, object on the field). The sub-list contains the entity identifier, two location coordinates x and y, and a timestamp.

[ // ID, x, y, t
[41021, 10.1, 121.111, 1445559932.34],
['pylon-1', 100, 120, 1445559932.39],
[1034151, 100, 120, 1445559933.11],
...
]



Please reformat it as:


ID: {'x': [...], 'y': [...], 't': [...]},
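
One possible way to do this regrouping is sketched below. It is an illustration under assumptions, not a definitive implementation: the function name is hypothetical, and the choices to coerce numeric strings to floats and to skip and log rows that are malformed or non-finite are just one reading of "correct when possible".

```python
import logging
import math
from collections import defaultdict

log = logging.getLogger("tracking_cleaner")  # hypothetical logger name


def reformat_tracking(rows):
    """Group [ID, x, y, t] rows into {ID: {'x': [...], 'y': [...], 't': [...]}}."""
    out = defaultdict(lambda: {"x": [], "y": [], "t": []})
    for row in rows:
        if not isinstance(row, (list, tuple)) or len(row) != 4:
            log.warning("bad tracking row (wrong shape): %r", row)
            continue
        entity_id, x, y, t = row
        if not isinstance(entity_id, (int, str)):
            log.warning("bad tracking row (unexpected ID type): %r", row)
            continue
        try:
            # Assumed correction: accept numeric strings such as "10.1".
            x, y, t = float(x), float(y), float(t)
        except (TypeError, ValueError):
            log.warning("bad tracking row (non-numeric value): %r", row)
            continue
        if not all(math.isfinite(v) for v in (x, y, t)):
            log.warning("bad tracking row (non-finite value): %r", row)
            continue
        series = out[entity_id]
        series["x"].append(x)
        series["y"].append(y)
        series["t"].append(t)
    # Note: json.dump writes integer keys as strings.
    return dict(out)
```

For example, calling `reformat_tracking` on the three sample rows above would produce three keys (41021, 'pylon-1' and 1034151), each holding single-element x, y and t lists.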



Both data types are prone to problems that I'd like you to try to correct, when possible. If there are cases when you can't clean the data, please log the bad data when the code encounters it.

After performing these transformations, publish the resulting data to two subscribers that simply write the data to JSON files. One subscriber is for play-by-play data and the other subscriber is for tracking data. You may use libraries, but do not use messaging libraries such as zmq.
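
The publish-subscribe part can be built in plain Python without any messaging library. The sketch below is a minimal in-process version. The class names, the topic strings and the decision to append one JSON document per line are all assumptions made for illustration.

```python
import json
from collections import defaultdict


class Broker:
    """Minimal in-process publish-subscribe broker keyed by topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callable that receives every message published to topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to all subscribers of topic (none is not an error)."""
        for callback in self._subscribers.get(topic, ()):
            callback(message)


class JsonFileSubscriber:
    """Subscriber that appends each received message to a file as one JSON line."""

    def __init__(self, path):
        self.path = path

    def __call__(self, message):
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(message) + "\n")
```

A typical wiring would create one `Broker`, subscribe a `JsonFileSubscriber` to a play-by-play topic and another to a tracking topic, and have the cleaning code call `publish` with the cleaned data. The producer never touches the files directly, so subscribers can be added or swapped without changing the cleaning code.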

I will provide you with the code for the data producers along with the JSON files for play-by-play and tracking. If you have any doubts regarding this, you can reach me by mail.

Please send me back the code and screenshots of the output. Any explanation of your approach can be provided as comments in the code or as a separate document.

Tags: Linux, Python