Aurebesh OCR with Python
Greetings! Welcome to my post about creating an Aurebesh OCR (Optical Character Recognition) tool using the Python programming language (mostly OpenCV and scikit-image). This has been a fun personal project of mine since early 2023, and I'm still plugging away at it! This is a work in progress.
I am, by education, an artist and a writer who also loves science. In addition to my artsy fartsy classes, I took computer science and systems courses and actually went on to get a Master's in Information Science. But since then, I've been learning to program on my own. There are so many free resources available on the web and great books and communities to boot that anyone can learn how to program if they want to.
If you can write or do art, then you are primed to be a programmer, too. The reason? Artists break down complex forms into simple shapes; writers break down complex ideas into sentences; computer programmers break down complex processes into simple steps. You already know how to think like a programmer!
In order to keep your feed clean, I've hidden the vast majority of the post below the fold. Read on if you dare!
What Is OCR?
Optical character recognition (OCR) is basically teaching a computer to "recognize" letters in images. This feature is readily available in things like Google Lens and many PDF applications wherein the application can take an image and find the letters, making it easy to turn them into actual editable text. If you've ever been given a screenshot of an address and had to type it out by hand, OCR is a tool that would let you copy-paste that bad boi from the image right into your navigational app of choice.
With the English alphabet, this is actually fairly easy these days. One of the main reasons is that most English letters are contiguous, meaning each one is made up of a single connected shape. Letters made up of separate shapes are called noncontiguous; the lowercase "i" and "j" are examples.
When it comes to non-English alphabets, such as Spanish (with its accents and eñes) or Arabic, it becomes much more difficult for the computer to "recognize" a letter as a whole. Based on my research, solving the problem of noncontiguous letters in OCR appears to be an ongoing field of study.
(I like to read scientific research papers to see if anyone has solved the problems I am trying to solve already as well as to learn how to search and name the issues I am facing. And reading scientific papers is relaxing and fun!).
Aurebesh has eight noncontiguous letters, which is nearly 25% of the whole alphabet! (For reference, less than 10% of the lowercase English alphabet is noncontiguous letters). The noncontiguous letters in Aurebesh make OCR difficult, but there are ways to bridge the gap!
How Does OCR Work?
OCR is a type of computer vision, and computer vision is how your phone recognizes your face and unlocks itself. As Scott McCloud discusses in his book about comics, a face is essentially two dots and a line (or a colon and a parenthesis : ) in our case). Computer vision simplifies what the computer is "seeing" in an image and then finds the shapes, like the two dots and a line. But instead of finding faces, OCR focuses entirely on finding letters.
So, an image is just a bunch of tiny squares that are one pixel big, and those pixels contain data. Those pixels are also neighbors with other pixels, and comparing those pixels to each other can tell you a lot about what is going on in an image! But the data in images tends to "overwhelm" computers, so it's better to simplify the image before trying to find the letters.
Step One: Simplify the Image
Let's take this screenshot of the Clone Memorial from The Bad Batch's second episode of season two (whaddup Crosshair and Cody). Even for my human vision, some of these letters are hard to distinguish from the rocky texture of the wall's surface! So, let's simplify the image.
First, we want to remove all the information about color from the image by converting it to greyscale. Then we perform something called "Local Thresholding". This action turns the greyscale image into a binary image, meaning every pixel in the image is now either black or white (hence binary). If you've ever adjusted an image of a painting in Photoshop or the like, perhaps you've messed around with Thresholds before! We are doing something similar here.
Now, the point of using a threshold is to determine at what point a greyscale pixel becomes black or white; setting that threshold lets us transform the image based on where each pixel's value falls.
But remember what I said about neighbors a little earlier? When the computer is deciding whether a pixel should be black or white, it can also take into account the values of that pixel's neighbors to inform its decision.
For example, let's say there's a pixel that's kind of light but it's surrounded by pixels that are rather dark. The computer can decide, based on parameters we set, whether that one light pixel should stay light despite its neighbors being dark or become dark like its neighbors. That's how the relationship between pixels can help the computer figure out what's more likely a contiguous shape and what isn't.
After generating the binary image, it becomes a little easier to see the shapes of the letters, but there's still a lot of noise in the image from that rock wall texture. Luckily, there are ways of tweaking the greyscale image and the parameters of our thresholding to fine-tune the resulting binary image. These parameters include changing the size of the block of pixels the computer is viewing at a time as well as applying mathematical alterations such as a Gaussian blur or using means.
(I'll bet you didn't know there was math behind the trusty Gaussian blur work in your photo editing software 😉).
Here's some tests I ran to figure out which method would be the most conducive to OCR:
The different tests produce slightly different output, and the one that turns out to be the most reliable is actually the standard "Control" method.
Step Two: Clarify the Shapes
Now that we've simplified the image, we want to try and see if we can clear up what we know are letters in the image. One of the best ways to do this that I have found is to perform "Morphological Transformations" on the binary image.
In truth, there are only two types of morphological transformations: erosion and dilation. These methods do exactly what they sound like: erosion erodes away pixels, and dilation dilates shapes by adding pixels.
In addition to using erosion and dilation on their own, we can run them in tandem, one after the other, to perform closing (dilate, then erode) and opening (erode, then dilate). Let's see some morphological transformations in action on our binary image (which has been inverted now):
The closing transformation gives the letters much cleaner edges with fewer pixelated gaps or chinks. The noise wasn't reduced, but the letters themselves look much cleaner.
Step Three: Locate Individual Letters
So, we have a decently cleaned and clarified image! It's time to locate all the shapes in the image and then figure out which of them might actually be the letters for which we are looking.
Thus far, I have had the most success with contouring. Since the binary image contains only black and white pixels, it's possible to follow the edge of a shape that's created where black and white pixels meet, aka contouring.
The green line running around the outside of this shape is the contour that the computer identified. Notice that it only found the outside edge of the shape, but not the inside, which the computer does not yet know how to identify! However, it may not be necessary, and programmers like to be lazy, just like artists. 😉
When we use contouring on the image, the computer identifies every single shape it can find, including all those little blobs left over from the rock wall texture. However, we can use maths to calculate the area of each contoured object and eliminate as a possible letter any shape smaller than a certain area.
Step Four: Match the Shape to the Letter
We've made it to the final step! Woohoo!
Armed with our extracted shapes that are most likely to be letters, we can now try to match those shapes to the known Aurebesh alphabet. There are a few different methods I've tried for this, such as connected component analysis (CCA) and Canny edges, and each has different strengths and weaknesses. The one I use now comes from the OpenCV library and is called, quite literally, matchShapes.
Sometimes it gets an "A" for effort:
And sometimes it totally works:
And I think I screamed in joy when I got this result the first time!
Step Five: Iterate, Iterate, Iterate!
There's so much more I want to do with this tool, like check for corners and maybe even turn it into a web app that anyone can use, but for now that's all!
This concludes my crash course in OCR with Aurebesh using Python. I'll be sure to reblog this post when I have updates! If you made it to the end of this THANK YOU FOR READING!!! And don't be afraid to try new things. ☺
How did you distinguish between lowercase L and capital i? I see that they have slightly different images in your repo, but I'm not sure how you managed to tell them apart in the original image.
exactly that. i took both, classified one as I and the other as l, and checked the result. whichever of the two ways gave me most of the image back (the wrong way actually didn't even give me a valid JPEG header) was the correct one. i just checked both
Ah, I see, so ClearType actually ensures that even the color aliasing artifacts around each letter will be consistent, so that a lowercase L will always be "column of light yellow, column of near black, column of light blue" while uppercase i will always be "column of reddish orange, column of medium blue"?
Does this mean that ANYTHING using ClearType with this font & point size will have the same color patterns? Or is it only guaranteed to be consistent within one particular block of text, with the specific aliasing patterns determined on the fly based on some magic formula?
Looks like the Tough Guy, the godfather of obstacle course racing, really is gone forever: a combination of Mr Mouse's health problems and the fact that you can't maintain a permanent giant wooden obstacle course on two years of no entry fees.
It is insane and delightful to me that the roots of this batshit sport are in a horse sanctuary just outside Wolverhampton. I've done it four times, once each of the January, April, July and October versions. It was gloriously amateurish. The obstacles had names of dubious taste. We signed a disclaimer saying if it killed us it was "my own bloody fault for coming".
My main memories of the Tough Guy are absolute terror, punctuated by the quasi-mystical experiences you get when adrenaline and endorphins are two hells of some drugs. At one point alone in the woodland running section I became convinced I'd been taken by the Fair Folk and a hundred years and a day had passed in the real world - but then I saw the aid station and calmed down a bit. Everything about it was confusing and scary. I made an account in a mud-running forum purely to ask what the hell the instructions meant, only to be told they would never make any sense and I just needed to turn up at 10am and hope for the best. Excuse me I have agoraphobia and OCD, that is not how I roll. That aspect was legit more terrifying than, say, discovering you're not actually that keen on heights when you're out on the ropes at the top of the Behemoth and regretting the life choices that took you to this point (four times). Or the really cold water. Or their enthusiasm for electric obstacles. Or the combined electric and cold water obstacles, like Viagra Falls.
Somehow it went straight through the OTT masculinity of other OCRs and out the other side. Nobody shouted at you or made you do burpees. The course is difficult enough and everyone's having a hard time; no need to make it worse. There was the Ghost Squad, topless pyromaniac drumming Vikings in face paint who ran the warm up and occasionally popped up out of nowhere to assist and/or startle you. There was Mr Mouse in his kilt and moustache like a goddamn celebrity. There was a major gender imbalance. But it somehow didn't feel as macho as some others. It actually felt like we were all in this insanity together.
In his exemplary 1845 work of social investigation, The Condition of the Working Class in England, Friedrich Engels showed how the conditions of capitalism created a social war among the masses, where proletarians in slum conditions are pitted against each other in a battle for survival. As Engels summed up,
Competition is the completest expression of the battle of all against all which rules modern civil society. This battle, a battle for life, for existence, for everything, in case of need a battle of life and death, is fought not between the different classes of society only, but also between the individual members of these classes. Each is in the way of the other, and each seeks to crowd out all who are in his way, and put himself in their place.
Today this social war continues, but with the added dimension that the bourgeoisie has learned well how to exacerbate contradictions among the masses, with deliberate policies pitting one section of the masses against another. In our efforts to bring forward a class-conscious section of the proletariat, we must understand how the battle lines of the social war among the masses are drawn and find ways to contend with the reactionary ideology and politics and the practical antagonisms they foster. Our aim must be to convince the masses to refuse to play the capitalist game of competing with each other, instead developing their understanding of the system behind that game and embracing a communist attitude towards their class sisters and brothers, here and around the world.
rugged maniac OCR 2023
i had the amazing opportunity to join one of my friends for an obstacle course race/mud run for the first time this past weekend! i’ve never run in a race of any kind really, so this was exciting as both a race and an obstacle course! the course was a 5k running course filled with like, 20-30 obstacles including climbing over walls/ladders/fences of many kinds, huge slides, rope net climbing, crawling under barbed wire through mud pits, and so much more.
i stayed with my friends the whole time and i didn’t really train, so we walked a good chunk of it. there were tons of hills, sandy trails, forest paths, and other terrain that made running even harder anyway. safe to say i definitely wasn’t running for time, just for fun!
it was so incredibly fun to push my body and just let myself have fun and go crazy. it was like an adult playground on steroids!! the endorphin rush was incredible and even a few days later i still feel so excited and proud just talking about it. i am 100% doing this again, and would recommend to anyone else who’s thinking about it!