python - numpy recarray from CSV: dtype has many columns but shape says just one row, why is that?


My CSV has a mix of string and numeric columns. numpy.recfromcsv accurately inferred them (woo-hoo), giving a dtype of

dtype=[('null', 'S7'), ('00', '<f8'), ('nsubj', 'S20'), ('g', 'S1'), ...

So it is a mix of strings and numbers, as you can see. numpy.shape(csv) gives me

(133433,)

which confuses me, since the dtype implied that the array is column-aware. Furthermore, indexing it behaves intuitively:

csv[1] > ('def', 0.0, 'prep_to', 'g', 'query_w', 'indef', 0.0, ... 

But I get the error

cannot perform reduce with flexible type

on operations like .all(), even when using a numeric column. I'm not sure whether I'm working with a table-like entity (two dimensions) or just one list of something. Why is the dtype inconsistent with the shape?

A recarray is an array of records. Each record can have multiple fields. A record is sort of like a struct in C.

If the shape of the recarray is (133433,), then the recarray is a 1-dimensional array of records.

The fields of the recarray may be accessed by name-based indexing. For example, csv['nsubj'] is essentially equivalent to

np.array([record['nsubj'] for record in csv])

This special name-based indexing supports the illusion that a 1-dimensional recarray is a 2-dimensional array: csv[intval] selects rows, csv[fieldname] selects "columns". However, under the hood and strictly speaking, if the shape is (133433,) then the array is 1-dimensional.
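To make the row/"column" picture concrete, here is a minimal sketch. It uses a small made-up structured array in place of the real CSV data; the field names word and score and all the values are hypothetical stand-ins, not taken from the question.

```python
import numpy as np

# Hypothetical stand-in for the array returned by recfromcsv:
# one string field and one float field, three records.
table = np.array(
    [("abc", 1.5), ("def", 0.0), ("ghi", 2.5)],
    dtype=[("word", "U7"), ("score", "<f8")],
)

# The shape counts records only; the fields live in the dtype.
n_dims = table.ndim               # 1
shape = table.shape               # (3,)
field_names = table.dtype.names   # ('word', 'score')

# Integer indexing selects one record (a "row").
second_row = table[1]

# Field-name indexing selects one field across all records (a "column").
scores = table["score"]

# The same values, collected record by record.
scores_by_loop = np.array([record["score"] for record in table])
```

The number of fields never shows up in the shape, which is why a CSV with many columns still reports a one-element shape tuple.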

Note that not all recarrays are 1-dimensional. It is possible to have a higher-dimensional recarray,

In [142]: arr = np.zeros((3,2), dtype=[('foo', 'int'), ('bar', 'float')])

In [143]: arr
Out[143]:
array([[(0, 0.0), (0, 0.0)],
       [(0, 0.0), (0, 0.0)],
       [(0, 0.0), (0, 0.0)]],
      dtype=[('foo', '<i8'), ('bar', '<f8')])

In [144]: arr.shape
Out[144]: (3, 2)

This is a 2-dimensional array whose elements are records.

Here are the bar field values in the arr[:, 0] slice:

In [148]: arr[:, 0]['bar']
Out[148]: array([ 0.,  0.,  0.])

Here are all the bar field values in the 2D array:

In [151]: arr['bar']
Out[151]:
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])

In [160]: arr['bar'].all()
Out[160]: False
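The arr['bar'].all() call above works because a single numeric field is an ordinary float array. As a hedged sketch of how that relates to the "cannot perform reduce with flexible type" error in the question, the snippet below compares a reduction on one field with a reduction on whole records. The array and its field names are hypothetical, and the exact error message may differ between NumPy versions.

```python
import numpy as np

# Hypothetical 1-D structured array with a string field and a float field.
table = np.array(
    [("abc", 1.5), ("def", 0.0)],
    dtype=[("word", "U7"), ("score", "<f8")],
)

# Reducing a single numeric field works: it is a plain float64 array.
all_nonzero = bool(table["score"].all())   # False, because one score is 0.0
total = float(table["score"].sum())        # 1.5

# Reducing whole records is expected to fail, since NumPy has no rule
# for combining mixed-type records into a single value.
try:
    table.all()
    record_reduce_failed = False
except TypeError:
    record_reduce_failed = True
```

So if a reduction raises this kind of error, one thing to check is whether it was applied to the whole record array (or to a string field) rather than to a single numeric field.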

Note that an alternative to using recarrays is Pandas DataFrames. There are a lot more methods available for manipulating DataFrames than recarrays. You might find them more convenient.
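As a rough illustration of the DataFrame route, the sketch below converts a small structured array into a DataFrame. It assumes pandas is installed; the data and field names are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical structured array standing in for the recarray from the CSV.
records = np.array(
    [("abc", 1.5), ("def", 0.0), ("ghi", 2.5)],
    dtype=[("word", "U7"), ("score", "<f8")],
)

# Each field becomes a column, so the frame has a genuinely 2-D shape.
frame = pd.DataFrame.from_records(records)

frame_shape = frame.shape              # (3, 2)
column_names = list(frame.columns)     # ['word', 'score']

# Column-wise operations on the numeric column work as usual.
any_zero_score = bool((frame["score"] == 0).any())
```

Unlike the recarray, the DataFrame reports both rows and columns in its shape, which matches the table-like picture the question was expecting.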

