Parsing bytes of a binary file in PHP and translating groups into placeholders


I could use some advice. I'm parsing a binary file in PHP; to be specific, it's a Sega Genesis ROM file. According to the table I have made, certain bytes correspond to characters, while others control various things in the game's text engine.

There are bytes that are used for characters, as well as "controller" bytes for line breaks, conditions, color and a bunch of other things, so a typical sentence will probably look like this:

FC 03 E7 05 D3 42 79 20 64 6F 69 6E 67 20 73 6F 2C BC BE 08 79 6F 75 20 6A 75 73 74 20 61 63 71 75 69 72 65 64 BC BE 04 61 20 74 65 73 74 61 6D 65 6E 74 20 74 6F 20 79 6F 75 72 BC 73 74 61 74 75 73 20 61 73 20 61 20 77 61 72 72 69 6F 72 21 BD BC

which I can translate to:

<FC><03><E7><05><D3>By doing so,<NL><BE><08>you just acquired<NL><BE><04>a testament to your<NL>status as a warrior!<CURSOR>

I want to specify properties for such a controller-byte string, such as its length, and write my own values to certain positions.

See, bytes that translate into characters (00 to 7F) or line breaks (BC) consist of a single byte, while others consist of 2 (BE XX). Conditions (FC) even consist of 5 bytes, i.e. FC followed by four more bytes, written here as FC XX YY (where XX and YY refer to offsets which I need to calculate while I put my translated strings together).
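The length rules described above could be captured in a lookup table that drives a small tokenizer. The following is only a sketch under assumptions: the constant `TOKEN_LENGTHS` and the function `tokenize()` are hypothetical names, the table lists only the two multi-byte codes mentioned here (FC and BE), and every other byte is treated as a single-byte token.

```php
<?php
// Hypothetical length table: opcode byte => total token length in bytes.
// Only the lengths stated in the question are listed; any other byte is
// assumed to be a single-byte token.
const TOKEN_LENGTHS = [
    0xFC => 5, // condition: FC plus four parameter bytes
    0xBE => 2, // BE plus one parameter byte
];

/**
 * Split a raw byte string into tokens according to TOKEN_LENGTHS.
 * Returns a list of ['offset' => int, 'bytes' => string] entries.
 */
function tokenize(string $data): array
{
    $tokens = [];
    $pos = 0;
    $end = strlen($data);
    while ($pos < $end) {
        $opcode = ord($data[$pos]);
        $length = TOKEN_LENGTHS[$opcode] ?? 1;
        // A token truncated by the end of the data is returned as-is.
        $tokens[] = ['offset' => $pos, 'bytes' => substr($data, $pos, $length)];
        $pos += $length;
    }
    return $tokens;
}

// Usage: a shortened version of the example sentence.
$data = hex2bin('FC03E705D34279BCBE0879');
foreach (tokenize($data) as $token) {
    printf("%2d: %s\n", $token['offset'], strtoupper(bin2hex($token['bytes'])));
}
```

Supporting a new control code would then only mean adding one entry to the table, and the recorded offsets give the positions at which parameter bytes could later be rewritten.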

I want my parser to recognize such bytes and let me write XX YY dynamically. Using strtr, I can only replace fixed "groups", e.g. when I put the static byte string into an array.

How would you do this while keeping the parser flexible? Thanks!


There are 2 answers

0
degant On BEST ANSWER

Assuming you have your hex values available as a string, you can use this regex to parse it as you've described. If you identify more rules besides FC**** or BE**, you can add them directly to the regex below so that they are extracted as well.

(?<fc>FC(\w\w){4})|(?<be>BE(\w\w))|(?<any>(\w\w))

The named groups fc, be and any make it easy to pick out each result set from the matches array, e.g. $matches['fc'].

Regex Demo: https://regex101.com/r/kR9kdP/5

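// One alternative per token type; the named groups label each kind of match.
$re = '/(?<fc>FC(\w\w){4})|(?<be>BE(\w\w))|(?<any>(\w\w))/';
$str = 'FC03E705D3FC0006042842616D20626162612062';

// PREG_PATTERN_ORDER groups the results by capture group, e.g. $matches['fc'].
preg_match_all($re, $str, $matches, PREG_PATTERN_ORDER, 0);

// Print the matches of each named group; array_filter drops the empty
// entries left where a different alternative matched.
print_r(array_filter($matches['fc']));  // Returns an array with all FC****
print_r(array_filter($matches['be']));  // Returns an array with all BE**
print_r(array_filter($matches['any'])); // Returns rest **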

PHP Demo: http://ideone.com/qWUaob

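Sample results for the fc and be groups (the BE indices evidently come from a longer input than the $str shown above, which contains no BE bytes):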

Array
(
    [0] => FC03E705D3
    [1] => FC00060428
)
Array
(
    [50] => BE08
    [59] => BE04
    [113] => BE08
    [132] => BE04
)
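To get from these matches to the placeholder text shown in the question, the same alternation could be fed to preg_replace_callback, which visits the tokens in order. This is a sketch under assumptions: `toPlaceholders()` is a hypothetical helper, `\w\w` has been swapped for the stricter `[0-9A-F]{2}`, only BC is mapped to a named placeholder (`<NL>`), and only printable ASCII (20 to 7E) is emitted as a character.

```php
<?php
// Hypothetical helper: turn a hex string into the placeholder notation
// used in the question. The byte meanings are assumptions taken from the
// question's example (BC = line break, printable ASCII = characters).
function toPlaceholders(string $hex): string
{
    $re = '/(?<fc>FC(?:[0-9A-F]{2}){4})|(?<be>BE[0-9A-F]{2})|(?<any>[0-9A-F]{2})/';

    return preg_replace_callback($re, function (array $m): string {
        // Multi-byte control codes: emit one <XX> placeholder per byte.
        if (!empty($m['fc']) || !empty($m['be'])) {
            $token = !empty($m['fc']) ? $m['fc'] : $m['be'];
            return '<' . implode('><', str_split($token, 2)) . '>';
        }
        $byte = hexdec($m['any']);
        if ($byte === 0xBC) {
            return '<NL>';
        }
        // Printable ASCII becomes a character, everything else stays <XX>.
        return ($byte >= 0x20 && $byte <= 0x7E) ? chr($byte) : '<' . $m['any'] . '>';
    }, strtoupper($hex));
}

echo toPlaceholders('FC03E705D34279BCBE0879'), "\n";
// prints: <FC><03><E7><05><D3>By<NL><BE><08>y
```

Because every alternative consumes a whole number of byte pairs, the matching stays aligned on byte boundaries as long as the input is an unbroken hex string with no spaces.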

Hope this helps!

2
Barmar On

You can put hex characters in a regexp by using \x##, where ## is the hex code for the character. So you can match FC XX YY with:

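preg_match('/(?<=\xfc).{4}/s', $bytes, $match);

$match[0] will then contain the 4 bytes after FC (the lookbehind (?<=\xfc) keeps FC itself out of the match, and the s modifier lets . match a newline byte as well). You could split them up into pairs with capture groups:

preg_match('/(?<=\xfc)(..)(..)/s', $bytes, $match);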

$match[1] will contain XX and $match[2] will contain YY.
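Since the question also asks about writing XX and YY back, here is a rough sketch of how the captured positions could be used for that. It rests on assumptions the thread does not confirm: that XX and YY are each a 16-bit big-endian value, and that the first FC byte in the data really starts a condition. `patchFirstCondition()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: overwrite the four bytes following the first FC.
// Assumption: XX and YY are 16-bit big-endian values; change the pack()
// format if the actual layout differs.
function patchFirstCondition(string $bytes, int $xx, int $yy): string
{
    // PREG_OFFSET_CAPTURE returns [text, byte offset] pairs for each group.
    if (!preg_match('/\xFC(..)(..)/s', $bytes, $m, PREG_OFFSET_CAPTURE)) {
        return $bytes; // no FC sequence found, nothing to patch
    }
    // Replace 4 bytes starting at the offset of the first capture group;
    // the overall length of the string does not change.
    return substr_replace($bytes, pack('nn', $xx, $yy), $m[1][1], 4);
}

$bytes = hex2bin('FC03E705D34279BC');
$patched = patchFirstCondition($bytes, 0x0010, 0x0020);
echo strtoupper(bin2hex($patched)), "\n";
// prints: FC001000204279BC
```

One caveat with searching raw data this way: an FC byte that is really the parameter of another code (for example BE FC) would also match, so walking the data token by token may be the safer choice for a full parser.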