Credit Card Fraud Detection System

Problem Statement

Imagine you are working for a leading credit card company called 'Cred Financials'. The company continuously monitors its customers' credit card transactions, in any part of the world, to discover and decline fraudulent ones.
The company also has a strong support team to address customer issues and queries.
Credit card fraud is defined as a form of identity theft in which an individual uses someone else's credit card information to make purchases or to withdraw funds from the account.
The incidences of such fraudulent transactions have skyrocketed as the world has moved towards a digital era.
With a rising number of fraud cases, the company’s major focus is to provide its customers with a delightful experience while ensuring that security is not compromised.
You, as a big data engineer, must architect and build a solution to cater to the following requirements:

Fraud detection solution: This is a feature to detect fraudulent transactions, wherein once a cardmember swipes his/her card for payment, the transaction should be classified as fraudulent or authentic based on a set of predefined rules.
If a fraud is detected, then the transaction must be declined.
Please note that incorrectly classifying a transaction as fraudulent will incur huge losses to the company and also provoke negative consumer sentiment.
Customers' information: The relevant information about the customers needs to be continuously updated on a platform from where the customer support team can retrieve relevant information in real time to resolve customer complaints and queries.
Data

Now, let's understand the types of data you will deal with.
The following tables containing data come into consideration for this problem:

card_member (the cardholder data is added to/updated in this table by a third-party service):
- card_id – Card number
- member_id – 15-digit member ID of the cardholder
- member_joining_dt – Date of joining of a new member
- card_purchase_dt – Date and time when the card was purchased
- country – Country in which the card was purchased
- city – City in which the card was purchased

card_transactions (all incoming transactions (fraud/genuine) swiped at POS terminals are stored in this table. Earlier, the transactions were classified as fraud or genuine in a traditional way. However, with an explosive surge in the number of transactions, a Big Data solution is needed to authenticate the incoming transactions and enter the transaction data accordingly):
- card_id – Card number
- member_id – 15-digit member ID of the cardholder
- amount – Amount swiped with respect to the card_id
- postcode – Zip code at which this card was swiped (marking the location of an event)
- pos_id – Merchant's POS terminal ID, using which the card has been swiped
- transaction_dt – Date and time of the transaction event
- status – Whether the transaction was approved or not, with Genuine/Fraud values

member_score (the member credit score data is added to/updated in this table by a third-party service):
- member_id – 15-digit member ID who has this card
- score – The score assigned to a member defining his/her credit history, generated by upstream systems

Since the card_member and member_score tables are updated by third-party services, they are stored in a central AWS RDS.
You will be given the already classified card_transactions table data in the form of a CSV file, which you can load in your NoSQL database.
The other type of data is the real-time streaming data generated by the POS (Point of Sale) systems in JSON format.
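As a minimal sketch, a consumer of this stream might deserialise one such payload as follows. The sample values are taken from the example payload in this document; member_id is carried as a string here, which is an assumption, since a bare JSON number cannot preserve the 15-digit ID's leading zeros.

```python
import json

# Sample POS payload as it might arrive from the Kafka topic. member_id is
# quoted as a string (an assumption) so its leading zeros survive parsing.
raw_payload = """{
    "card_id": 348702330256514,
    "member_id": "000037495066290",
    "amount": 9084849,
    "pos_id": 614677375609919,
    "postcode": 33946,
    "transaction_dt": "11-02-2018 00:00:00"
}"""

txn = json.loads(raw_payload)
print(txn["card_id"])   # 348702330256514
print(txn["amount"])    # 9084849
```

In the real pipeline, this deserialisation step would run inside the stream-processing framework's Kafka consumer rather than as a standalone script.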
The streaming data looks like this. Transactional payload (data) attributes sent by the POS terminals' gateway API on to the Kafka topic:
- card_id – Card number
- member_id – 15-digit member ID of the cardholder
- amount – Amount swiped with respect to the card_id
- pos_id – Merchant's POS terminal ID, using which the card has been swiped
- postcode – Zip code at which this card was swiped (marking the location of an event)
- transaction_dt – Date and time of the transaction event

Here is an example of a JSON payload structure that gets produced:

{
  "card_id": 348702330256514,
  "member_id": 000037495066290,
  "amount": 9084849,
  "pos_id": 614677375609919,
  "postcode": 33946,
  "transaction_dt": "11-02-2018 00:00:00"
}

Architecture and Approach

Having understood the various kinds of data involved in this project, it's time to understand how to approach the task of building a solution to this problem statement.
The following diagram will help you understand what the entire architecture should look like.
Read ahead to understand more.
The data from the several POS systems will flow into the architecture through a queuing system like Kafka.
The POS data from Kafka will be consumed by the streaming data processing framework to identify the authenticity of the transactions.
Note: One of the SLAs of the company is to complete the transaction within 1 second. Hence, the framework should be chosen accordingly to meet this SLA.
Once the POS data enters the stream processing layer, it is assessed based on some parameters defined by the rules. Only when the results are positive for all these rules is the transaction allowed to complete.
Now, what are these rules? How can one obtain these parameters and where are they stored? These questions will get answered in some time.
Once the status of a transaction is determined as 'Genuine' or 'Fraudulent', the details of the transaction, along with the status, are stored in the card_transactions table.
Now, let’s understand the various parameters defined by the rules required to determine the authenticity of the transactions.
Here are the three parameters that we will use to detect whether a transaction is fraudulent or not.

1. Upper Control Limit: Every card user has an upper limit on the amount per transaction that is different from the maximum transaction limit on each card.
This parameter is basically an indicator of the transaction pattern associated with a particular customer.
This upper bound, also known as the “Upper Control Limit” or UCL, can be used as a parameter to authenticate a transaction.
Suppose you have a past record of making transactions with an average amount of $20,000, and one day the system observes a transaction of $200,000 through your card. This can be a possible case of fraud.
In such cases, the cardholder receives a call from the credit card company executives to validate the transaction.
UCL is derived using the following formula:

    UCL = Moving Average + 3 × Standard Deviation

The above formula is used to derive the UCL value for each card_id. The moving average and the standard deviation for each card_id are calculated based on the amounts of its last 10 transactions with a 'Genuine' status.
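As a minimal sketch of this rule in plain Python (the ten amounts are made-up sample values; in the actual pipeline they would be the last 10 'Genuine' amounts fetched for a card_id, and the population standard deviation is an assumption here, since the document does not specify which variant to use):

```python
from statistics import mean, pstdev

def ucl(last_10_genuine_amounts):
    """UCL = moving average + 3 x standard deviation of the last 10
    'Genuine' transaction amounts for a card_id.
    pstdev (population std dev) is an assumption; the brief does not say
    whether the population or sample formula is intended."""
    return mean(last_10_genuine_amounts) + 3 * pstdev(last_10_genuine_amounts)

# Hypothetical sample: ten past 'Genuine' amounts for one card_id.
amounts = [200, 220, 180, 210, 190, 205, 215, 195, 225, 160]
limit = ucl(amounts)
print(round(limit, 2))  # 256.12

# The rule itself: an incoming amount above the UCL is flagged.
print(9084849 > limit)  # True - the sample payload's amount would be flagged
```

The point of precomputing this value per card_id, rather than deriving it per event, is discussed further below in the look-up table approach.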
2. Credit score of each member: This is a straightforward rule, where we have a member_score table in which member IDs and their respective scores are available. These scores are updated by a third-party service. If the score is less than 200, that member's transaction is rejected, as he/she could be a defaulter. This rule simply captures the financial reputation of each customer.
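A sketch of this rule, with a plain dict standing in for the member_score table (both member IDs and scores below are made-up illustrative values):

```python
# Stand-in for the member_score table; keys are hypothetical 15-digit member IDs.
member_score = {
    "000037495066290": 340,
    "000012345678901": 150,
}

def score_rule_ok(member_id, threshold=200):
    """Approve only if the member's credit score is at least the threshold."""
    return member_score.get(member_id, 0) >= threshold

print(score_rule_ok("000037495066290"))  # True
print(score_rule_ok("000012345678901"))  # False - score below 200
```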
3. Zip code distance: The whole purpose of this rule is to keep a check on the distance between the card owner's current and last transaction locations with respect to the time elapsed between them. If that distance could not plausibly have been covered in the available time, i.e., the implied travel speed exceeds a particular threshold, then this raises suspicion about the authenticity of the transaction. Suppose at time t = t0 minutes, a transaction is recorded in Mumbai, and at time t = (t0 + 10) minutes, a transaction from the same card_id is recorded in New York. A commercial aircraft cruises at about 900 km/hr, meaning even someone travelling by air covers only about 1 km every 4 seconds; the Mumbai–New York distance cannot be covered in 10 minutes.
This can be a possible case of fraud.
Such cases happen very often when someone acquires your credit card details and makes transactions online using those details. In such cases, the cardholder receives a call from the credit card company executives to validate the transaction.
Use the postcode library (which will be provided ahead) to get the distance between two zip codes.
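A sketch of the speed check, assuming the postcode library's role. Since that library is only provided later, a haversine calculation over two hard-coded approximate coordinates stands in for the zip-code distance lookup here; the zip codes and coordinates are illustrative assumptions.

```python
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

# Stand-in for the postcode library: approximate coordinates for two
# hypothetical zip codes (Mumbai and New York).
COORDS = {
    "400001": (18.94, 72.84),
    "10001": (40.75, -73.99),
}

def distance_km(zip1, zip2):
    """Great-circle distance via the haversine formula (Earth radius 6371 km)."""
    lat1, lon1 = (radians(v) for v in COORDS[zip1])
    lat2, lon2 = (radians(v) for v in COORDS[zip2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def speed_rule_ok(last_zip, last_dt, cur_zip, cur_dt, max_speed_kmph=900):
    """Reject if the implied travel speed exceeds a cruising aircraft's."""
    hours = (cur_dt - last_dt).total_seconds() / 3600
    if hours <= 0:
        return False
    return distance_km(last_zip, cur_zip) / hours <= max_speed_kmph

t0 = datetime(2018, 2, 11, 0, 0, 0)
t1 = datetime(2018, 2, 11, 0, 10, 0)  # 10 minutes later
print(speed_rule_ok("400001", t0, "10001", t1))  # False: ~12,500 km in 10 min
```

The 900 km/hr threshold mirrors the cruising-speed reasoning above; the real system would look up the last 'Genuine' transaction's postcode and timestamp per card_id.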
Now that you know each of the parameters, let’s understand the approach to calculate these.
Let's start with the upper control limit (UCL).
The historical transactional data is stored in the card_transactions table, as defined earlier in the table description.
The UCL value has to be calculated for each card_id over its last 10 transactions.
One approach could be to trigger the computation of this parameter for a card_id every time a transaction occurs.
However, considering the 1 second SLA, this may not be a very good practice as batch jobs are always associated with huge time delays.
Another approach could be to have a lookup table which stores the UCL value based on the moving average and standard deviation of the last 10 transactions of each card_id.
Whenever a transaction occurs, the record corresponding to the card_id can be easily fetched from this lookup table rather than calculating the UCL value at the time of the transaction.
This lookup table needs to be updated at regular intervals by running queries on data stored in the AWS RDS and the NoSQL database.
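Putting the look-up idea together, here is a sketch of the per-transaction check, with a plain dict standing in for the NoSQL look-up table (all row values are illustrative, and the zip-code distance rule is omitted for brevity):

```python
# A plain dict standing in for the NoSQL look-up table, keyed by card_id.
# In the real system these rows are refreshed by periodic batch jobs.
lookup = {
    348702330256514: {
        "ucl": 256.12,           # precomputed moving average + 3 x std dev
        "score": 340,            # latest credit score for the member
        "last_postcode": 33946,  # location of the last 'Genuine' transaction
    },
}

def classify(txn):
    """Fetch precomputed parameters instead of recomputing them per event,
    keeping the per-transaction work within the 1-second SLA."""
    row = lookup.get(txn["card_id"])
    if row is None:
        return "Fraud"  # unknown card: decline (an assumption)
    if txn["amount"] > row["ucl"] or row["score"] < 200:
        return "Fraud"
    return "Genuine"

print(classify({"card_id": 348702330256514, "amount": 120}))      # Genuine
print(classify({"card_id": 348702330256514, "amount": 9084849}))  # Fraud
```

The design choice here is the classic trade-off the text describes: a cheap row-level read per event, paid for by periodic batch recomputation of the table.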
Note: You need to use a NoSQL distributed database to implement the look-up table.
The database must be scalable and consistent.
Use a NoSQL database which gives schema evolution, schema versioning, row-level lookups (efficient reads), and tunable consistency.
For every card_id, this database must store the UCL value.
Use the appropriate ingestion methods available to bring the card_member and member_score data from AWS RDS and the card_transactions data from the NoSQL database into the Hadoop platform. This data is then processed by running batch jobs to fill data in the look-up table. After the initial load, there will be incremental loads that keep the look-up table up to date.