insight_finding | Myndbook

Patterns

Scoring

Guiding star:

How quickly & easily can we accurately answer upcoming business questions?

Why? Lower barriers = faster iterations = better questions = better insights = more adaptability (our moat).

Tools / abilities (short term)

- Cluster contents based on bag-of-words / n-grams / tf-idf

- Script: need to deal with emails

- Burst relabelling / re-extracting (for past data)
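A minimal sketch of the clustering idea above, assuming plain-text contents and using only the Python standard library. The function names, the smoothed idf formula, the greedy single-pass grouping and the 0.3 similarity threshold are all illustrative assumptions, not something these notes specify.

```python
import math
import re
from collections import Counter


def tokenize(text, n=1):
    """Lowercase word tokens plus word n-grams up to length n."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    grams = []
    for size in range(1, n + 1):
        grams += [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]
    return grams


def tfidf_vectors(docs, n=1):
    """Return one sparse {term: weight} dict per document."""
    counts = [Counter(tokenize(d, n)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    total = len(docs)
    vectors = []
    for c in counts:
        length = sum(c.values()) or 1
        vectors.append({
            t: (k / length) * (math.log((1 + total) / (1 + df[t])) + 1)
            for t, k in c.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(docs, threshold=0.3, n=1):
    """Greedy single pass: join the first cluster whose seed document is similar enough."""
    vectors = tfidf_vectors(docs, n)
    clusters = []  # each cluster is a list of document indices; index 0 is the seed
    for i, v in enumerate(vectors):
        for members in clusters:
            if cosine(v, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Calling `cluster(texts, n=2)` adds bigrams to the vocabulary. For larger volumes a library implementation (for example a tf-idf vectorizer plus k-means) would likely be a better fit than this greedy pass.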

Data Building

- Manual regex troops with list of labels/extractions
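One way the list of labels/extractions could be kept as data and re-run over past records, sketched under assumptions: the label names, the patterns and the `relabel_all` helper below are hypothetical placeholders, since the notes do not list the actual rules.

```python
import re

# Hypothetical label -> list of patterns; the real rule list is not in these notes.
LABEL_PATTERNS = {
    "payment_reminder": [r"\boverdue\b", r"\bpayment\s+due\b"],
    "otp": [r"\b(?:otp|verification code)\b"],
}

# Hypothetical extraction rules; the named group "value" is the extracted field.
EXTRACT_PATTERNS = {
    "amount": r"\b(?:rp|idr)\s*(?P<value>[\d.,]*\d)",
}


def label(text):
    """Return every label whose patterns match the text, sorted by name."""
    return sorted(
        name
        for name, patterns in LABEL_PATTERNS.items()
        if any(re.search(p, text, re.IGNORECASE) for p in patterns)
    )


def extract(text):
    """Return {field: first matched value} for every extraction rule that matches."""
    found = {}
    for field, pattern in EXTRACT_PATTERNS.items():
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            found[field] = match.group("value")
    return found


def relabel_all(texts):
    """Re-run the current rule set over stored texts (relabelling of past data)."""
    return [{"text": t, "labels": label(t), "extractions": extract(t)} for t in texts]
```

Keeping the rules in dictionaries means that adding a label is a data change, and old records can be relabelled by calling `relabel_all` again with the updated rules.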

Possible obstacles

Data too big

- Quick win: delete unnecessary emails

- ETL deletion

- We may need to move it to other infrastructure (Hadoop or something similar?)

Regression to make models

(in progress)

1 Define Sample & Performance Window

2 Define the "bad" definition (currently 15+ DPD, i.e. days past due)

3 Define Variables to be used

4 Bin variables (10 equal-width bins)

5 Optimize binning (merge those 10 bins based on their bad rates)

6 Calculate WOE, ln(bad/good), and IV, the sum over bins of (bad - good) x ln(bad/good), where bad and good are each bin's share of all bad and all good accounts. Exclude variables with IV < 0.02

7 WOE Transformation
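Steps 4, 6 and 7 can be sketched as follows, keeping the sign convention used in these notes (WOE = ln(bad share / good share)). The function names and the `smoothing` parameter, which avoids taking ln(0) for bins with no bad or no good accounts, are assumptions for illustration. The bin merging of step 5 is not covered.

```python
import math


def equal_width_bins(values, n_bins=10):
    """Step 4: assign each value a bin index 0..n_bins-1 using equal-width cuts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def woe_iv(bins, is_bad, smoothing=0.5):
    """Step 6: return ({bin: woe}, iv) with WOE = ln(bad share / good share).

    is_bad holds 1 for a bad account and 0 for a good one.
    smoothing is an assumed additive correction, not part of the original notes.
    """
    bad_total = sum(is_bad)
    good_total = len(is_bad) - bad_total
    stats = {}
    for b, y in zip(bins, is_bad):
        bad, good = stats.get(b, (0, 0))
        stats[b] = (bad + y, good + (1 - y))
    k = len(stats)
    woe, iv = {}, 0.0
    for b, (bad, good) in sorted(stats.items()):
        bad_share = (bad + smoothing) / (bad_total + smoothing * k)
        good_share = (good + smoothing) / (good_total + smoothing * k)
        woe[b] = math.log(bad_share / good_share)
        iv += (bad_share - good_share) * woe[b]
    return woe, iv


def woe_transform(bins, woe):
    """Step 7: replace each bin index by its WOE value."""
    return [woe[b] for b in bins]


def keep_variable(iv, threshold=0.02):
    """Apply the exclusion rule from step 6: drop variables with IV below the threshold."""
    return iv >= threshold
```

A typical call chain would be `bins = equal_width_bins(values)`, then `woe, iv = woe_iv(bins, is_bad)`, then `woe_transform(bins, woe)` for variables where `keep_variable(iv)` is true.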

So future expansion is simple & easy

- Embedded vars JSON structure

- Caching Approved (+Rejected?) apps in Playground

- Gremlin setup in prod

- Scoring variables: their creation is separated from scoring
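A rough sketch of how variable creation could be separated from scoring, with the variables stored as nested JSON. Every name here (`build_variables`, `flatten`, `score`, the field names and the linear weights) is a hypothetical stand-in; the notes do not describe the actual structure or model.

```python
import json


def build_variables(application):
    """Stage 1: derive scoring variables once and return them as a nested dict."""
    return {
        "app_id": application["app_id"],
        "vars": {
            "age": application["age"],
            "contacts": {"count": len(application.get("contacts", []))},
        },
    }


def flatten(nested, prefix=""):
    """Turn nested variables into dotted keys, e.g. {"contacts.count": 2}."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def score(variables_json, weights, intercept=0.0):
    """Stage 2: score from the stored JSON only, without touching raw data."""
    variables = flatten(json.loads(variables_json)["vars"])
    return intercept + sum(weights.get(k, 0.0) * v for k, v in variables.items())
```

With this split, a new model only needs a new `weights` mapping applied to the already stored JSON, and a new variable only needs a change in `build_variables`.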

Quicker rescoring

- Rescoring needs to be rapid enough to be finished in just a few hours

Data Sources

- Graphs (Yustian)

- Pefindo

Monitoring Quality

- How to?

Scrape data

Regression

Come up with an idea

Derived vars

Build patterns

Relabel

Check & discuss

Add model to prod