Extension Arrays
Lance provides extensions for Arrow arrays and Pandas Series to represent data types for machine learning applications.
BFloat16
BFloat16 is a 16-bit floating point number that is designed for machine learning use cases. Intuitively, it only has 2-3 digits of precision, but it has the same range as a 32-bit float: ~1e-38 to ~1e38. By comparison, a 16-bit float has a range of ~5.96e-8 to 65504.
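The precision and range figures above follow from the bit layout: a bfloat16 value is the top 16 bits of a 32-bit float (1 sign bit, 8 exponent bits, 7 mantissa bits). The following is a minimal standard-library sketch of that relationship, not Lance's implementation. The helper names `round_to_bfloat16` and `bits_to_float` are hypothetical and exist only for this illustration, and the sketch assumes round-to-nearest-even conversion.

```python
import struct


def round_to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value, returned as a Python float.

    Illustration only: NaN inputs are not handled specially.
    """
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits about to be dropped.
    bits += 0x7FFF + ((bits >> 16) & 1)
    # Keep only the top 16 bits: sign, 8 exponent bits, 7 mantissa bits.
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bits_to_float(bits16: int) -> float:
    """Decode a 16-bit bfloat16 pattern by padding it out to a float32."""
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]


rounded = [round_to_bfloat16(v) for v in (1.1, 2.1, 3.4)]
print(rounded)  # [1.1015625, 2.09375, 3.40625]

largest = bits_to_float(0x7F7F)          # largest finite bfloat16
smallest_normal = bits_to_float(0x0080)  # smallest positive normal bfloat16
print(largest, smallest_normal)
```

Because the exponent field is the same width as in a 32-bit float, the representable range matches; only the mantissa is shortened, which is where the 2-3 digits of precision come from.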
Lance provides an Arrow extension array (lance.arrow.BFloat16Array) and a Pandas extension array (lance.pandas.BFloat16Dtype) for BFloat16.
These are compatible with the ml_dtypes bfloat16 NumPy extension array.
If you are using Pandas, you can use the lance.bfloat16 dtype string to create the array:
import pandas as pd
import lance.arrow
series = pd.Series([1.1, 2.1, 3.4], dtype="lance.bfloat16")
series
0 1.1015625
1 2.09375
2 3.40625
dtype: lance.bfloat16
To create an Arrow array, use the lance.arrow.bfloat16_array() function:
from lance.arrow import bfloat16_array
array = bfloat16_array([1.1, 2.1, 3.4])
array
<lance.arrow.BFloat16Array object at 0x.+>
[1.1015625, 2.09375, 3.40625]
Finally, if you have a pre-existing NumPy array, you can convert it into either array type:
import numpy as np
from ml_dtypes import bfloat16
from lance.arrow import PandasBFloat16Array, BFloat16Array
np_array = np.array([1.1, 2.1, 3.4], dtype=bfloat16)
PandasBFloat16Array.from_numpy(np_array)
BFloat16Array.from_numpy(np_array)
<PandasBFloat16Array>
[1.1015625, 2.09375, 3.40625]
Length: 3, dtype: lance.bfloat16
<lance.arrow.BFloat16Array object at 0x.+>
[1.1015625, 2.09375, 3.40625]
When reading, these can be converted back to the NumPy bfloat16 dtype using each array class’s to_numpy method.