Budget App Scraper
I bought a daily budgeting app last November, and I’ve been entering my expenses into it every day. But when I wanted to process this data with some scripts, I noticed that the app has no export functionality. In hindsight, I should have been more careful when choosing an app. While this was a mistake on my part, it also gave me an opportunity to play around with Python and write about it.
While the app doesn’t have exports, it does have a History page with infinite scroll. Because my phone isn’t rooted, I can’t copy the app’s files directly. That leaves the History page as my only source of data. My plan is to take screenshots of the whole page, stitch them together and run OCR on the result to read the text.
Taking the screenshots
The first part is the easiest. It can be accomplished with adb and some shell scripting. Here’s a script that takes a screenshot of the phone, scrolls it down a little and repeats this process until you stop it.
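#!/bin/sh
# Capture the screen, scroll down a little, and repeat until interrupted (Ctrl+C).
TARGET='/home/leo/screen-images'
num=0

while true; do
    formatted=$(printf '%05d' "$num")
    adb shell screencap -p > "$TARGET/$formatted.png"
    # Swipe from y=600 up to y=400 to scroll the list down.
    adb shell input swipe 500 600 500 400
    # POSIX arithmetic; the original ((num++)) only works in bash.
    num=$((num + 1))
done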
This created exactly 200 images in my target folder before it reached the end of the history. The number will differ depending on the app, the screen size and how many entries you have saved.
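Since the script runs until you stop it, it will likely keep capturing the last screen a few times after the scrolling has ended. Here's a small sketch for trimming those repeats. The helper names (`file_digest`, `drop_trailing_duplicates`) are my own, and it assumes the repeated frames are byte-for-byte identical, which will not hold if something like the status bar clock changes between captures.

```python
import glob
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def drop_trailing_duplicates(paths):
    """Return paths with byte-identical repeats at the end removed.

    The first copy of the final frame is kept; later copies are dropped.
    """
    paths = list(paths)
    while len(paths) >= 2 and file_digest(paths[-1]) == file_digest(paths[-2]):
        paths.pop()
    return paths


if __name__ == "__main__":
    # Same folder the capture script writes to.
    frames = sorted(glob.glob('/home/leo/screen-images/*.png'))
    kept = drop_trailing_duplicates(frames)
    print(f"{len(frames)} captured, {len(kept)} kept")
```

If the clock does get in the way, comparing only a cropped region of each image would be the next thing to try.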
Stitching the images
The next task is to stitch these images together using the parts they have in common. Normally this is a complicated task that requires fancy algorithms. But in our case it is straightforward: these are screenshots, so the pixels only move along one axis and everything is aligned perfectly.
#!/usr/bin/env python3
from PIL import Image
from PIL import ImageChops
import glob

pattern = '/home/leo/screen-images/*.png'
images = sorted(glob.glob(pattern))

# Estimated canvas height: one full 1440px screen plus roughly 260 new
# pixels for every later screenshot.
final_height = 1440 + (len(images) - 1) * 260
main_image = Image.new('RGB', (720, final_height))

# Convert to RGB so the difference check compares colour channels only.
prev_img = Image.open(images[0]).convert('RGB')
main_image.paste(prev_img)


def size(im1, im2):
    # Area of the bounding box around all differing pixels; 0 means identical.
    box = ImageChops.difference(im1, im2).getbbox()
    if box is None:
        return 0
    return (box[2] - box[0]) * (box[3] - box[1])


def find_overlap_y(img1, img2):
    # Look for the bottom 440px strip of img1 inside img2 and return the
    # y coordinate in img2 where the not-yet-seen content starts.
    crop1 = img1.crop((0, 1000, 720, 1440))
    min_y = min(range(500, 1000),
                key=lambda x: size(crop1, img2.crop((0, x, 720, x + 440))))
    return min_y + 440


main_y = 1440

for path in images[1:]:
    print(path)
    screen = Image.open(path).convert('RGB')
    y = find_overlap_y(prev_img, screen)
    # Keep only the part of this screenshot that is new.
    cropped = screen.crop((0, y, 720, 1440))
    main_image.paste(cropped, (0, main_y))
    main_y += 1440 - y
    prev_img = screen

main_image.save('/home/leo/test.png', 'PNG')
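To see the overlap search in isolation, here's a toy version that works on lists of strings instead of images. It's only a sketch: each "screen" is a list of rows, and the names `mismatch` and `find_new_content_start` are made up for this example. It also scores candidates by counting mismatched rows, which is simpler than the bounding-box area used in the script above.

```python
def mismatch(rows_a, rows_b):
    """Count the rows that differ between two equally long row lists."""
    return sum(1 for a, b in zip(rows_a, rows_b) if a != b)


def find_new_content_start(prev, curr, strip):
    """Locate the last `strip` rows of prev inside curr.

    Returns the index in curr of the first row after the best match,
    which is where content not yet seen in prev begins.
    """
    tail = prev[-strip:]
    candidates = range(len(curr) - strip + 1)
    best = min(candidates, key=lambda y: mismatch(tail, curr[y:y + strip]))
    return best + strip


# Two fake 8-row "screens"; the second is the first scrolled by 3 rows.
feed = [f"row {i}" for i in range(11)]
first, second = feed[0:8], feed[3:11]

start = find_new_content_start(first, second, strip=3)
stitched = first + second[start:]
assert stitched == feed
```

The real script does the same thing with 440-pixel strips and only searches y positions 500 to 999, which keeps the number of comparisons per screenshot small.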