Data ingestion/cleaning system using the publish-subscribe design pattern in Python.

Task/Problem Description:

I have the data producer (module [url removed, login to view] and function get_data) that produces two types of football data: play-by-play and tracking. The play-by-play is a dictionary, and I'd like you to do some error checking, cleaning, and reformatting.

- Rename all camel case field names and sub-field (nested) names with underscores ("playStartTimestamp" becomes "play_start_timestamp").

- Replace all camel case event names (values of the elements in the "events" list with field name "event") with underscore notation ("offensiveFormation" to "offensive_formation").

- Check that all elements of the list "events" occur before or at the play end time and after or at the play start time. If not true, log it and remove the event from the list.

- Remove events "playStart", "reviewStatus", "play_submit", and "playType" by checking the value of field "event" in the "events" list.

- When a field or subfield name is an ID, for example "play_id", capitalize "id" to "ID".

- If any fields are not as you expect, log it.
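
The play-by-play rules above can be sketched roughly as follows. This is only an illustration, not the required implementation: the field names `playEndTimestamp` and the per-event `timestamp` are assumptions (the task only names `playStartTimestamp`), and the removed event names are compared after conversion to underscore notation.

```python
import logging
import re

log = logging.getLogger("play_by_play")

# Events to drop, compared after conversion to underscore notation.
REMOVED_EVENTS = {"play_start", "review_status", "play_submit", "play_type"}


def to_snake(name):
    """Convert camelCase to snake_case ("playStartTimestamp" -> "play_start_timestamp")."""
    s = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s).lower()


def clean_key(name):
    """snake_case a field name, then capitalize any "id" part ("play_id" -> "play_ID")."""
    if not isinstance(name, str):
        return name
    return "_".join("ID" if part == "id" else part for part in to_snake(name).split("_"))


def rename_keys(obj):
    """Recursively rename dictionary keys, including nested dicts and lists."""
    if isinstance(obj, dict):
        return {clean_key(k): rename_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [rename_keys(v) for v in obj]
    return obj


def clean_play(play):
    play = rename_keys(play)
    # Assumed field names: "play_end_timestamp" and the event-level "timestamp".
    start = play.get("play_start_timestamp")
    end = play.get("play_end_timestamp")
    events = play.get("events")
    if not isinstance(start, (int, float)) or not isinstance(end, (int, float)):
        log.warning("unexpected play timestamps, events left unfiltered: %r", play)
        return play
    if not isinstance(events, list):
        log.warning("'events' is missing or not a list: %r", play)
        return play

    kept = []
    for ev in events:
        if not isinstance(ev, dict) or not isinstance(ev.get("event"), str):
            log.warning("dropping malformed event: %r", ev)
            continue
        name = to_snake(ev["event"])
        if name in REMOVED_EVENTS:
            continue
        t = ev.get("timestamp")
        if not isinstance(t, (int, float)) or not (start <= t <= end):
            log.warning("dropping event outside the play window: %r", ev)
            continue
        kept.append({**ev, "event": name})
    play["events"] = kept
    return play
```

For example, `clean_play({"playId": 7, "playStartTimestamp": 10.0, "playEndTimestamp": 20.0, "events": [...]})` would return a dictionary keyed by `play_ID`, `play_start_timestamp`, and so on, with only the in-window, non-removed events kept.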

The second type of data is raw tracking location. This comes in the form of a list of lists. Each element in the top-level list corresponds to a single measurement of the location of a single entity (player, referee, object on the field). The sub-list contains the entity identifier, two location coordinates x and y, and a timestamp.

[   // ID, x, y, t
    [41021, 10.1, 121.111, 1445559932.34],
    ['pylon-1', 100, 120, 1445559932.39],
    [1034151, 100, 120, 1445559933.11],
    ...
]

Please reformat it as:

{
    ID: {'x': [...], 'y': [...], 't': [...]},
    ...
}
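
A minimal sketch of this reshaping, assuming every row already has exactly four well-formed elements (the function name `reshape_tracking` is only illustrative):

```python
from collections import defaultdict


def reshape_tracking(rows):
    """Group [ID, x, y, t] rows into {ID: {'x': [...], 'y': [...], 't': [...]}}."""
    out = defaultdict(lambda: {"x": [], "y": [], "t": []})
    for entity_id, x, y, t in rows:
        series = out[entity_id]
        series["x"].append(x)
        series["y"].append(y)
        series["t"].append(t)
    return dict(out)
```

Note that the IDs are a mix of integers and strings; when the result is written with `json.dump`, integer keys are converted to strings.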

Both data types are prone to problems that I'd like you to try to correct, when possible. If there are cases when you can't clean the data, please log the bad data when the code encounters it.
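
The kinds of problems are not specified here, so the following is only one possible sketch for the tracking rows. It assumes that numeric strings such as "10.1" can be safely converted to numbers, and that rows with the wrong shape, a non-numeric value, or a NaN/infinite value cannot be repaired and should be logged and skipped.

```python
import logging
import math

log = logging.getLogger("tracking")


def clean_rows(rows):
    """Yield repaired [ID, x, y, t] rows; log and skip rows that cannot be repaired."""
    for row in rows:
        if not isinstance(row, (list, tuple)) or len(row) != 4:
            log.warning("bad tracking row (wrong shape): %r", row)
            continue
        entity_id, *values = row
        if not isinstance(entity_id, (int, str)) or entity_id == "":
            log.warning("bad tracking row (unusable ID): %r", row)
            continue
        try:
            # Correct what can be corrected: numeric strings become floats.
            x, y, t = (float(v) for v in values)
        except (TypeError, ValueError):
            log.warning("bad tracking row (non-numeric value): %r", row)
            continue
        if not all(math.isfinite(v) for v in (x, y, t)):
            log.warning("bad tracking row (NaN or infinite value): %r", row)
            continue
        yield [entity_id, x, y, t]
```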

After performing these transformations, publish the resulting data to two subscribers that simply write the data to JSON files. One subscriber is for play-by-play data and the other subscriber is for tracking data. You may use libraries, but do not use messaging libraries such as zmq.
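
One way to do the publishing without a messaging library is a small in-process broker, sketched below. The class names, the topic strings, and the choice to append one JSON document per line are all assumptions for illustration, not part of the task.

```python
import json
from collections import defaultdict
from pathlib import Path


class Broker:
    """Minimal in-process publish-subscribe hub keyed by topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Topics with no subscribers are silently ignored.
        for callback in self._subscribers.get(topic, []):
            callback(message)


class JsonFileSubscriber:
    """Subscriber that appends each received message to a file as one JSON line."""

    def __init__(self, path):
        self.path = Path(path)

    def __call__(self, message):
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(message) + "\n")
```

Usage would look like `broker.subscribe("play_by_play", JsonFileSubscriber("plays.json"))` and `broker.subscribe("tracking", JsonFileSubscriber("tracking.json"))`, followed by `broker.publish(topic, cleaned_data)` after each transformation. The producer never touches the files directly, which keeps the cleaning code and the output code decoupled.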

I will provide you with the code for the data producers along with JSON files for play-by-play and tracking. If you have any doubts regarding this, you can reach me by mail.

Please send me back the code and screenshots of the output. You can provide any explanation as comments in the code or as a separate document.


Tags: Linux, Python
© 2005 - 2018 getFreeLancer.com