Regex to split string by space and number using preg_split in PHP?


I need to split a string on numbers and spaces, but I'm not sure what regex to use. My code is:

$array = preg_split('/[0-9].\s/', $content);

The value of $content is:

Weight 229.6104534866 g
Energy 374.79170898476 kcal
Total lipid (fat) 22.163422468932 g
Carbohydrate, by difference 13.641848209743 g
Sugars, total 4.3691034101428 g
Protein 29.256342349938 g
Sodium, Na 468.99386390008 mg 

Which gives the result:

Array ( [0] => Weight 229.61045348 [1] => g
Energy 374.791708984 [2] => kcal
Total lipid (fat) 22.1634224689 [3] => g
Carbohydrate, by difference 13.6418482097 [4] => g
Sugars, total 4.36910341014 [5] => g
Protein 29.2563423499 [6] => g
Sodium, Na 468.993863900 [7] => mg
) 1

I need to split the text from the number but not sure how, so that:

[0] => Weight
[1] => 229.6104534866
[2] => g

and so on...

I also need it to ignore the commas, brackets and spaces inside the label. When using explode() I found that 'Total lipid (fat)' was split into 3 values instead of staying as one, and I'm not sure how to fix that with regex.

When using explode() I get:

[0] => Total
[1] => lipid
[2] => (fat)

but I need those values kept together as a single label. Is there any way to do that?

Any help is much appreciated!


There are 3 answers

1
Chris Tannetta On

Thanks to everyone for the help. I found that by putting a double space between the label, the number and the unit, and then passing that double space to explode() as the delimiter, the single spaces, commas and brackets inside the labels were left alone.
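A minimal sketch of that approach, assuming the input can be rewritten so that fields are separated by exactly two spaces (the sample data in the question uses single spaces, so the $content string below is a hypothetical re-spaced version of it):

```php
<?php
// Assumption: fields are separated by two spaces; labels keep their
// internal single spaces, commas and brackets.
$content = "Weight  229.6104534866  g\n"
         . "Total lipid (fat)  22.163422468932  g\n"
         . "Sodium, Na  468.99386390008  mg";

$rows = [];
foreach (explode("\n", $content) as $line) {
    // Split on the double-space delimiter only.
    $rows[] = explode('  ', trim($line));
}

print_r($rows);
```

Each element of $rows is then a three-item array of label, number and unit. This only works if no label ever contains two consecutive spaces.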

0
Jan On

Instead of splitting, you might very well match and capture the required parts, e.g. with the following pattern:

^(?P<category>\D+)\s+(?P<value>[\d.]+)\s+(?P<unit>.+)

See a demo on regex101.com.


In PHP this could be

<?php

$data = 'Weight 229.6104534866 g
Energy 374.79170898476 kcal
Total lipid (fat) 22.163422468932 g
Carbohydrate, by difference 13.641848209743 g
Sugars, total 4.3691034101428 g
Protein 29.256342349938 g
Sodium, Na 468.99386390008 mg ';

$pattern = '~^(?P<category>\D+)\s+(?P<value>[\d.]+)\s+(?P<unit>.+)~m';

preg_match_all($pattern, $data, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
print_r($matches);
?>

See a demo on ideone.com.
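To get from the match sets to the label/number/unit layout asked for in the question, the named groups can be copied into plain rows. This is only a sketch built on the same pattern; the $rows variable and the trim() calls are additions for illustration (trim() removes the trailing space that `.+` picks up after "mg" on the last line):

```php
<?php
$data = "Total lipid (fat) 22.163422468932 g\n"
      . "Sodium, Na 468.99386390008 mg ";

$pattern = '~^(?P<category>\D+)\s+(?P<value>[\d.]+)\s+(?P<unit>.+)~m';
preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    // Keep only the three named captures, trimmed of stray whitespace.
    $rows[] = [trim($m['category']), $m['value'], trim($m['unit'])];
}

print_r($rows);
```

Note that the value is still a string here; cast it with (float) if a numeric type is needed.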

0
mickmackusa On

As an alternative to using a preg_ function, sscanf() allows the decimal value to be explicitly typed as a float (if that is valuable).

Unfortunately, because the %[^0-9] placeholder consumes every character up to the first digit, the space between the label and the float value stays attached to the label string. If this is a problem, the label will need to be passed through rtrim().

Code: (Demo)

// $contentLines = file('path/to/content.txt');
$contentLines = [
    'Weight 229.6104534866 g',
    'Energy 374.79170898476 kcal',
    'Total lipid (fat) 22.163422468932 g',
    'Carbohydrate, by difference 13.641848209743 g',
    'Sugars, total 4.3691034101428 g',
    'Protein 29.256342349938 g',
    'Sodium, Na 468.99386390008 mg',
];

var_export(
    array_map(
        fn($line) => sscanf(
            $line,
            '%[^0-9]%f%s',
        ),
        $contentLines
    )
);
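If the trailing space on the label matters, the rtrim() step could look like the following sketch for a single line (the variable names $label, $value and $unit are illustrative, not part of the original code):

```php
<?php
$line = 'Total lipid (fat) 22.163422468932 g';

// With only two arguments, sscanf() returns the parsed values as an array.
[$label, $value, $unit] = sscanf($line, '%[^0-9]%f%s');

// %[^0-9] keeps the space before the number, so strip it from the label.
$label = rtrim($label);

var_export([$label, $value, $unit]);
```

Here $value is a float rather than a string, which is the main reason to prefer sscanf() over the regex approaches.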