Athena
insight_finding
Patterns
Scoring
Guiding star:
How quickly & easily can we accurately answer upcoming business questions?
Why? Fewer barriers = rapid repetitions = better questions = better insights = more adaptability (our moat).
Tools / abilities (short term)
- Cluster contents based on bag-of-words / n-grams / tf-idf
- Script: need to deal with emails
- Burst relabelling / re-extracting (for past data)
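The clustering idea above can be sketched in a few lines. This is a minimal illustration using only the standard library, not the actual script: the tokenizer, the smoothed idf formula, the greedy grouping rule and the `threshold` value are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (bag-of-words)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (tf x smoothed idf)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(docs, threshold=0.3):
    """Greedy grouping: join the first cluster whose first member is similar enough."""
    vectors = tfidf_vectors(docs)
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Calling `cluster(list_of_email_bodies)` returns groups of indices into the input list. For n-grams, `tokenize` could emit adjacent word pairs in addition to single words.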
Data Building
- Manual regex troops with list of labels/extractions
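A list of labels/extractions driven by regexes might look like the sketch below. The label names, the patterns and the `Rp` amount format are hypothetical placeholders, not the real list.

```python
import re

# Hypothetical labels: a text gets a label when its pattern matches.
LABEL_PATTERNS = {
    "payment_reminder": re.compile(
        r"\b(payment|installment)\b.*\bdue\b", re.IGNORECASE
    ),
    "otp": re.compile(r"\b(otp|verification code)\b", re.IGNORECASE),
}

# Hypothetical extractions: the named group becomes the extracted field.
EXTRACTION_PATTERNS = {
    "amount": re.compile(r"\bRp\s?(?P<amount>[\d.,]+)", re.IGNORECASE),
}


def label_and_extract(text):
    """Return (sorted list of matching labels, dict of extracted fields)."""
    labels = sorted(
        name for name, pattern in LABEL_PATTERNS.items() if pattern.search(text)
    )
    fields = {}
    for name, pattern in EXTRACTION_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(name)
    return labels, fields
```

Keeping the patterns in plain dictionaries means relabelling past data is just rerunning `label_and_extract` over the stored texts after the list changes.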
Possible obstacles
Data too big
- Quick win: delete unnecessary emails
- ETL deletion
- We may need to move it to another infra (Hadoop or similar?)
Regression to make models (in progress)
1 Define Sample & Performance Window
2 Define Bad Definition (currently 15+ DPD, i.e. days past due)
3 Define Variables to be used
4 Bin Variables (10 equal-width bins)
5 Optimize Binning (group those 10 bins by considering the bad rate)
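6 Calculate WOE per bin (ln(%bad / %good)) and IV (sum over bins of (%bad - %good) x ln(%bad / %good)), where %bad and %good are the bin's share of all bad and all good accounts. Exclude variables with IV < 0.02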
7 WOE Transformation
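Steps 4, 6 and 7 can be sketched as follows, using the same sign convention as above (WOE = ln(%bad / %good)). Step 5 (regrouping bins by bad rate) is not covered. The `smoothing` term is an assumption added here so that a bin with zero bad or zero good accounts does not produce ln(0); set it to 0 to get the raw formula.

```python
import math


def equal_width_bins(values, n_bins=10):
    """Step 4: assign each value a bin index 0..n_bins-1 using equal-width cuts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back when all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def woe_iv(bins, is_bad, smoothing=0.5):
    """Step 6: return ({bin: woe}, iv) with WOE = ln(%bad / %good) per bin."""
    labels = sorted(set(bins))
    bad = {b: 0 for b in labels}
    good = {b: 0 for b in labels}
    for b, flag in zip(bins, is_bad):
        if flag:
            bad[b] += 1
        else:
            good[b] += 1
    total_bad = sum(bad.values()) + smoothing * len(labels)
    total_good = sum(good.values()) + smoothing * len(labels)
    woe, iv = {}, 0.0
    for b in labels:
        pct_bad = (bad[b] + smoothing) / total_bad     # bin's share of all bads
        pct_good = (good[b] + smoothing) / total_good  # bin's share of all goods
        woe[b] = math.log(pct_bad / pct_good)
        iv += (pct_bad - pct_good) * woe[b]
    return woe, iv


def woe_transform(bins, woe):
    """Step 7: replace each bin index with that bin's WOE value."""
    return [woe[b] for b in bins]


def keep_variable(iv, min_iv=0.02):
    """Exclusion rule from step 6: drop variables with IV below 0.02."""
    return iv >= min_iv
```

A typical call chain for one variable would be `bins = equal_width_bins(values)`, then `woe, iv = woe_iv(bins, is_bad)`, then `woe_transform(bins, woe)` if `keep_variable(iv)` is true.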
So future expansion is simple & easy
- Embedded vars JSON structure
- Cache Approved (+Rejected?) apps in Playground
- Gremlin setup in prod
- Scoring variables: their creation is separated from scoring
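One way to picture the separation of variable creation from scoring, with variables stored as an embedded JSON structure, is sketched below. Every field name, variable and weight here is hypothetical, and the linear score is only a stand-in for whatever model is actually used.

```python
import json


def build_variables(application):
    """Variable creation: derive scoring variables from a raw application.

    Returns a nested (embedded) structure that can be serialized to JSON.
    """
    return {
        "income_to_loan": application["income"] / application["loan_amount"],
        "contacts": {"count": len(application["contacts"])},
    }


def flatten(nested, prefix=""):
    """Turn {"a": {"b": 1}} into {"a.b": 1} so weights can address any variable."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def score(vars_json, weights):
    """Scoring: reads only the cached variables, never the raw application."""
    flat = flatten(json.loads(vars_json))
    return sum(weight * flat.get(name, 0.0) for name, weight in weights.items())


def rescore_all(cache, weights):
    """Rescore every cached application without rebuilding its variables."""
    return {app_id: score(vars_json, weights) for app_id, vars_json in cache.items()}
```

Because `rescore_all` only touches the cached JSON, changing the weights does not require recomputing variables from raw data, and a new variable group only adds a key to the nested structure.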
Quicker rescoring
- Rescoring needs to be rapid enough to be finished in just a few hours
Data Sources
- Graphs (Yustian)
- Pefindo
Monitoring Quality
- How to?
Scrape data
Regression
Come up with an idea
Derived vars
Create patterns
Relabel
Check & discuss
Add model to prod