python - numpy recarray from CSV dtype has many columns but shape says just one row, why is that? -


My CSV has a mix of string and numeric columns. numpy.recfromcsv accurately inferred them (woo-hoo), giving a dtype of

dtype=[('null', 'S7'), ('00', '<f8'), ('nsubj', 'S20'), ('g', 'S1'), ...

So it is a mix of strings and numbers, as you can see. numpy.shape(csv) gives me

(133433,)

which confuses me, since the dtype implied that the array is column-aware. Furthermore, it can be accessed intuitively:

csv[1] > ('def', 0.0, 'prep_to', 'g', 'query_w', 'indef', 0.0, ... 

I also get the error

cannot perform reduce with flexible type

on operations like .all(), even when using a numeric column. I'm not sure whether I'm working with a table-like entity (two dimensions) or just one list of something. Why is the dtype inconsistent with the shape?

A recarray is an array of records. Each record can have multiple fields. A record is sort of like a struct in C.

If the shape of the recarray is (133433,), then the recarray is a 1-dimensional array of records.
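A minimal sketch of that distinction, assuming a small made-up array in place of the real CSV (the name `table` and the record count of 5 are illustrative only): the shape counts records, while the number of fields lives in the dtype.

```python
import numpy as np

# Hypothetical stand-in for the array returned by recfromcsv:
# 5 records, each with the first four fields from the question's dtype.
table = np.zeros(
    5,
    dtype=[("null", "S7"), ("00", "<f8"), ("nsubj", "S20"), ("g", "S1")],
)

print(table.shape)              # (5,): one axis, one entry per record
print(table.ndim)               # 1
print(len(table.dtype.names))   # 4: the "columns" are fields of the dtype
print(table.dtype.names)
```

So the dtype and the shape are not in conflict: they describe different things.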

The fields of the recarray may be accessed by name-based indexing. For example, csv['nsubj'] is essentially equivalent to

np.array([record['nsubj'] for record in csv])

This special name-based indexing supports the illusion that a 1-dimensional recarray is a 2-dimensional array: csv[intval] selects rows, csv[fieldname] selects "columns". However, under the hood and strictly speaking, if the shape is (133433,) then it is 1-dimensional.
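As a rough illustration (the array `data` and its `score` field are invented for this sketch, not taken from the question's file), selecting a field gives an ordinary 1-D array on which reductions such as .all() work. Calling the same reduction on the whole structured array is what is expected to produce the "flexible type" error.

```python
import numpy as np

# Hypothetical data: three records with one string field and one float field.
data = np.array(
    [(b"abc", 0.0), (b"def", 1.5), (b"ghi", 2.0)],
    dtype=[("nsubj", "S20"), ("score", "<f8")],
)

# Field access returns a plain array holding that field's values.
scores = data["score"]
same = np.array([record["score"] for record in data])
print(np.array_equal(scores, same))

# A reduction on a single numeric field is fine.
print(scores.all())

# A reduction on the whole record array has no meaning for mixed fields;
# NumPy is expected to raise a TypeError here (exact message varies by version).
try:
    data.all()
except TypeError as exc:
    print("TypeError:", exc)
```

In other words, pick the numeric field first (csv[fieldname]) and then reduce, rather than reducing over records.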

Note that not all recarrays are 1-dimensional. It is possible to have a higher-dimensional recarray,

In [142]: arr = np.zeros((3,2), dtype=[('foo', 'int'), ('bar', 'float')])

In [143]: arr
Out[143]:
array([[(0, 0.0), (0, 0.0)],
       [(0, 0.0), (0, 0.0)],
       [(0, 0.0), (0, 0.0)]],
      dtype=[('foo', '<i8'), ('bar', '<f8')])

In [144]: arr.shape
Out[144]: (3, 2)

This is a 2-dimensional array whose elements are records.

Here are the bar field values in the arr[:, 0] slice:

In [148]: arr[:, 0]['bar']
Out[148]: array([ 0.,  0.,  0.])

Here are all the bar field values in the 2D array:

In [151]: arr['bar']
Out[151]:
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])

In [160]: arr['bar'].all()
Out[160]: False

Note that an alternative to using recarrays is pandas DataFrames. There are a lot more methods available for manipulating DataFrames than recarrays. You might find them more convenient.
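If you want to try that route, one possible sketch is to hand the record array to pandas.DataFrame.from_records. The array `records` below is made-up sample data, and this assumes pandas is installed.

```python
import numpy as np
import pandas as pd

# Hypothetical record array standing in for the parsed CSV.
records = np.array(
    [(b"abc", 0.0), (b"def", 1.5)],
    dtype=[("nsubj", "S20"), ("score", "<f8")],
)

# Each field of the structured dtype becomes a DataFrame column.
df = pd.DataFrame.from_records(records)

print(df.shape)            # (2, 2): rows and columns are both real axes here
print(list(df.columns))
print(df["score"].all())   # reductions work per column
```

Unlike the 1-dimensional recarray, the DataFrame reports a genuinely two-dimensional shape.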

