crandas.ctypes#
- exception crandas.ctypes.ColumnBoundDerivedWarning#
Bases:
Warning
- class crandas.ctypes.Ctype#
Bases:
object
Ctypes, or “crandas types”, are an extensible client-side type system that allows the user to provide additional type information beyond pandas/numpy dtypes.
Ctypes are represented as class instances, e.g. NullableInteger(). Some classes take arguments in their initialization, like Varchar(max_length=12). Each Ctype also has a string representation, like “varchar[12]”. Either form can be passed to the ctype kwarg of cd.DataFrame, e.g.
>>> cd.DataFrame({"ints": [1, 2, 3], "strings": ["a", "bb", "ccc"]},
...              ctype={"ints": NullableInteger(), "strings": "varchar[5]"})
If a manual ctype is not specified, the appropriate ctype is automatically deduced using the pandas dtype. For details of how this is implemented, see the Ctype.for_series() classmethod.
Internal workings#
Each class (e.g. Integer) has CtypeBase as a base class and is decorated with @Ctype.register, which registers the Ctype’s .dtype and .ctype properties so that the Ctype class can perform automatic ctype inference on pandas.Series objects.
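The registration pattern described above can be sketched in plain Python. This is a minimal illustration and not the crandas implementation: the Registry class, its two lookup tables, and the concrete ctype and dtype strings ("int", "int64", "varchar", "object") are assumptions chosen for the example.

```python
from typing import Dict, List, Type


class CtypeBase:
    """Hypothetical stand-in for the documented base class."""

    ctype: str = ""
    dtype: str = ""
    value_types: List[type] = []


class Registry:
    """Hypothetical registry; illustrates the decorator pattern only."""

    by_ctype: Dict[str, Type[CtypeBase]] = {}
    by_dtype: Dict[str, Type[CtypeBase]] = {}

    @classmethod
    def register(cls, ctype_cls):
        # Index the class under its ctype name and its pandas dtype name,
        # then return it unchanged so the decorator is transparent.
        cls.by_ctype[ctype_cls.ctype] = ctype_cls
        # Assumption: the first class registered for a dtype is its default.
        cls.by_dtype.setdefault(ctype_cls.dtype, ctype_cls)
        return ctype_cls


@Registry.register
class Integer(CtypeBase):
    ctype = "int"          # assumed name, for illustration
    dtype = "int64"        # assumed dtype, for illustration
    value_types = [int]


@Registry.register
class Varchar(CtypeBase):
    ctype = "varchar"
    dtype = "object"
    value_types = [str]
```

With such tables in place, inference for a series can start from a lookup on str(series.dtype) and fall back to the element type when the dtype is ambiguous.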
- classmethod for_series(series, ctype_spec=None)#
Determine the Ctype for a pandas.Series object. The following sources are tried in order: the specified ctype_spec, the series.dtype, the ctype_cls.from_series() function, and the value_type (i.e. the type of next(iter(series))).
- classmethod from_spec(ctype_spec)#
Determine the Ctype based on a specification, which is either a ctype object (i.e. an instance of a subclass of CtypeBase), a string, or a Python type.
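The three kinds of specification accepted by from_spec can be illustrated with a small stand-alone sketch. It is not the crandas code: the Varchar stand-in, the BY_NAME and BY_PYTHON_TYPE tables, and the regular expression are hypothetical, and the sketch assumes that bracketed arguments are comma-separated integers, in line with the description of CtypeBase.args.

```python
import re


class Varchar:
    """Hypothetical stand-in for a ctype class with one integer argument."""

    def __init__(self, max_length=None):
        self.max_length = max_length


# Hypothetical lookup tables; in crandas these would come from registration.
BY_NAME = {"varchar": Varchar}
BY_PYTHON_TYPE = {str: Varchar}

# Matches "name" or "name[arg, ...]".
_SPEC = re.compile(r"^(?P<name>[a-z_][a-z0-9_]*)(?:\[(?P<args>[^\]]*)\])?$")


def from_spec(spec):
    # 1. Already a ctype instance: use it unchanged.
    if isinstance(spec, tuple(BY_NAME.values())):
        return spec
    # 2. A string such as "varchar[12]": name plus optional integer arguments.
    if isinstance(spec, str):
        match = _SPEC.match(spec)
        if match is None or match["name"] not in BY_NAME:
            raise ValueError(f"unknown ctype specification: {spec!r}")
        raw = match["args"]
        args = [int(a) for a in raw.split(",")] if raw else []
        return BY_NAME[match["name"]](*args)
    # 3. A Python type such as str: build the ctype with default arguments.
    if isinstance(spec, type) and spec in BY_PYTHON_TYPE:
        return BY_PYTHON_TYPE[spec]()
    raise TypeError(f"cannot interpret {spec!r} as a ctype")
```

For example, from_spec("varchar[12]") yields a Varchar whose max_length is 12, while from_spec(str) yields a Varchar with no length set.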
- class crandas.ctypes.CtypeBase#
Bases:
object
- ctype: str
name of the ctype; corresponds to the API type communicated to the server
- dtype: str
the ctype corresponds to this pandas dtype
- args: List[str]
names of positional arguments (that are interpreted to be of type int)
- kwargs: List[str]
names of keyword arguments
- value_types: List[object]
this Ctype applies to values of these types (i.e. isinstance(value, obj) where obj in value_types)
- crandas.ctypes.column_crandas_to_pandas(col, elements, modulus, not_null)#
Converts a crandas JSON column to a set of values that can be used in the pandas DataFrame constructor. Takes the column, the unmasked element values and the modulus used for this column.
- Parameters:
col ((JSON-serializable) object) – crandas JSON column
elements (numpy array) – unmasked element values in [0,modulus)
modulus (int) – modulus for the values
not_null (numpy array of bits, or None) – indicator bits, if nullable
- Returns:
values found in col
- Return type:
Set of values (int/str/bit)
- Raises:
RuntimeError – Only works for columns of type int, str or bits
- crandas.ctypes.column_pandas_to_crandas(series, ctype_spec=None, auto_bounds=False)#
Convert a pandas column to a JSON representation for use in the “new” command. This function does not perform masking and instead sets col[“elements”] to an iterable of integers.
- Parameters:
series (pd.Series) – pandas column
auto_bounds (bool, default: False) – if True, do not warn about automatically derived column bounds
- Returns:
if the column is nullable, a 2-tuple is returned where the second item is a column that should be uploaded as the not_null store
- Return type:
1-tuple or 2-tuple of (JSON-serializable) dicts
- Raises:
TypeError – Column type is not supported by crandas
- crandas.ctypes.derive_int_bounds(series, spec_min_value, spec_max_value)#
Derive int bounds from the series and the max/min specification.
If a maximum and/or minimum is specified, this range is used, and it is verified that the values in the series comply with the range.
Otherwise, an integer type is derived according to the following order of preference: uint8, int8, uint16, int16, uint24, int24, uint32, int32. These respective datatypes have ranges [0,255], [-127,127], etc. Note that, for signed types, -2**(bit_length-1) is not included, e.g., int8 does not contain -128. This is done so that the product of two int8 values fits in an int16, etc. In particular, two int16 values can be multiplied with each other while still fitting in an int32, and thus without performing field conversion.
In case an integer type is derived from data, a ColumnBoundDerivedWarning is given.
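The derivation order and the symmetric signed ranges can be made concrete with a short sketch. It is an illustration under stated assumptions, not the library's code: derive_bounds, candidate_types and BoundDerivedWarning are hypothetical names, the return value (type name, minimum, maximum) is chosen for the example, and a bound that is not specified is assumed to fall back to the observed data.

```python
import warnings


class BoundDerivedWarning(Warning):
    """Hypothetical stand-in for ColumnBoundDerivedWarning."""


def candidate_types():
    # Order of preference from the text: narrow first, unsigned before signed.
    for bits in (8, 16, 24, 32):
        yield f"uint{bits}", 0, 2**bits - 1
        # Signed ranges are symmetric: -2**(bits-1) is deliberately excluded.
        half = 2 ** (bits - 1) - 1
        yield f"int{bits}", -half, half


def derive_bounds(values, spec_min=None, spec_max=None):
    values = list(values)
    lo, hi = min(values), max(values)
    if spec_min is not None or spec_max is not None:
        # Assumption: a bound that is not specified falls back to the data.
        lo_bound = lo if spec_min is None else spec_min
        hi_bound = hi if spec_max is None else spec_max
        if lo < lo_bound or hi > hi_bound:
            raise ValueError("values fall outside the specified range")
        return None, lo_bound, hi_bound
    for name, type_min, type_max in candidate_types():
        if type_min <= lo and hi <= type_max:
            warnings.warn(f"bounds derived from data: {name}", BoundDerivedWarning)
            return name, type_min, type_max
    raise ValueError("values do not fit in any candidate type")
```

Because the signed ranges are symmetric, the largest possible int8 product is 127 * 127 = 16129, which is within the int16 range [-32767, 32767]; likewise 32767 * 32767 stays below 2**31 - 1. A series containing -128 therefore derives int16 rather than int8 in this sketch.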
- crandas.ctypes.encode_integers64(elements)#
Convert a set of values from the range (-session.modulus/2, session.modulus/2) to a np.array of type int64.