Patterns

Scoring

Guiding star:

How quickly & easily can we accurately answer upcoming business questions?

Why? Lower barriers = rapid iterations = better questions = better insights = more adaptability (our moat).
Tools / abilities (short term)
- Cluster contents based on bag-of-words / n-grams / tf-idf
- Script: need to deal with emails
- Burst relabelling / re-extracting (for past data)
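
A minimal sketch of the clustering idea above, using only the Python standard library. Everything here is an assumption for illustration: the function names, the smoothed idf formula, the greedy "compare against the first member of each cluster" grouping, and the 0.3 similarity threshold. None of it reflects an existing script.

```python
import math
import re
from collections import Counter

def tokenize(text, n=1):
    """Lowercase word tokens joined into n-grams (n=1 is plain bag-of-words)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def tfidf_vectors(docs, n=1):
    """Return one {term: tf-idf weight} dict per document."""
    counts = [Counter(tokenize(d, n)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    total = len(docs)
    vectors = []
    for c in counts:
        length = sum(c.values()) or 1
        vectors.append({
            # Smoothed idf (an assumed variant): ln((1 + N) / (1 + df)) + 1
            term: (k / length) * (math.log((1 + total) / (1 + df[term])) + 1.0)
            for term, k in c.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3, n=1):
    """Greedy grouping: join the first cluster whose first member is similar enough."""
    vectors = tfidf_vectors(docs, n)
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Passing `n=2` switches the same pipeline to bigrams. For larger volumes a proper library and a real clustering algorithm would likely replace the greedy loop.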
Data Building
- Manual regex troops with list of labels/extractions
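
One possible shape for a "list of labels/extractions" driven by regexes. The label names, the patterns, and the `label_text` helper are all hypothetical; the notes do not say which labels or fields are actually needed.

```python
import re

# Hypothetical label -> pattern table; real labels would come from the team's list.
LABEL_PATTERNS = {
    "payment_reminder": re.compile(r"\b(payment|installment)\b.*\bdue\b", re.I),
    "otp": re.compile(r"\b(otp|verification code)\b", re.I),
}

# Hypothetical extraction -> pattern table; group 1 holds the extracted value.
EXTRACTORS = {
    "amount": re.compile(r"\b(?:Rp|IDR)\s?(\d+(?:[.,]\d+)*)", re.I),
}

def label_text(text):
    """Return (labels, extractions) for one piece of text."""
    labels = [name for name, pat in LABEL_PATTERNS.items() if pat.search(text)]
    extractions = {}
    for name, pat in EXTRACTORS.items():
        match = pat.search(text)
        if match:
            extractions[name] = match.group(1)
    return labels, extractions
```

Keeping the patterns in plain tables means a new label or extraction is one added entry, and relabelling past data is a rerun of `label_text` over the stored texts.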

Possible obstacles

Data too big
- Quick win: delete unnecessary emails
- ETL deletion
- We may need to move it to another infra (Hadoop or something like that?)
Regression to make models
(in progress)

1 Define Sample & Performance Window
2 Define Bad Definition (currently 15+ DPD, i.e. days past due)
3 Define Variables to be used
4 Bin Variables (10 equal-width bins)
5 Optimize Binning (regroup those 10 bins by considering the bad rate)
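6 Calculate WOE per bin (ln(%bad / %good)) and IV (sum over bins of (%bad - %good) x ln(%bad / %good)), where %bad and %good are the bin's share of all bads and all goods. Exclude variables with IV < 0.02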
7 WOE Transformation
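
A rough sketch of steps 4, 6 and 7 for a single numeric variable, assuming the bad/good flag already exists per row. It keeps the notes' sign convention (WOE = ln(%bad / %good)). The function names and the 0.5 smoothing constant are assumptions added here to avoid ln(0) in bins with no bads or no goods. Step 5 (regrouping bins by bad rate) is not covered.

```python
import math

IV_THRESHOLD = 0.02  # from step 6: variables below this are excluded

def equal_width_bins(values, n_bins=10):
    """Step 4: assign each value a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant variable -> everything in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def woe_iv(bins, is_bad, smoothing=0.5):
    """Step 6: return ({bin: WOE}, IV) for one binned variable."""
    bad, good = {}, {}
    for b, flag in zip(bins, is_bad):
        target = bad if flag else good
        target[b] = target.get(b, 0) + 1
    all_bins = sorted(set(bins))
    # Smoothing (assumed, not in the notes) keeps every share strictly positive.
    total_bad = sum(bad.values()) + smoothing * len(all_bins)
    total_good = sum(good.values()) + smoothing * len(all_bins)
    woe, iv = {}, 0.0
    for b in all_bins:
        pct_bad = (bad.get(b, 0) + smoothing) / total_bad
        pct_good = (good.get(b, 0) + smoothing) / total_good
        woe[b] = math.log(pct_bad / pct_good)
        iv += (pct_bad - pct_good) * woe[b]
    return woe, iv

def woe_transform(bins, woe):
    """Step 7: replace each row's bin index with that bin's WOE."""
    return [woe[b] for b in bins]

def keep_variable(iv):
    """Apply the IV cut-off from step 6."""
    return iv >= IV_THRESHOLD
```

The transformed column from `woe_transform` is what would feed the regression. With this sign convention, bins with a higher share of bads get a positive WOE.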
So future expansion is simple & easy
- Embedded vars JSON structure
- Caching Approved (+Rejected?) apps in Playground

- Gremlin setup in prod
- Scoring variables: build them separately from scoring
Quicker rescoring
- Rescoring needs to be rapid enough to be finished in just a few hours
Data Sources
- Graphs (Yustian)
- Pefindo
Monitoring Quality
- How to?

Scrape data

Regression

Come up with an idea

Derived vars

Build patterns

Relabel

Check & discuss

Add model to prod
