`csvsee.utils`

Shared utility functions for the csvsee library.

exception csvsee.utils.NoMatch: Exception raised when no column name matches a given expression.

class csvsee.utils.ProgressBar(end, prefix='', fill='=', units='secs', width=40)

An ASCII command-line progress bar with percentage.

Adapted from Corey Goldberg’s version: http://code.google.com/p/corey-projects/source/browse/trunk/python2/progress_bar.py

update(current): Set the current progress.

csvsee.utils.boring_columns(csvfile): Return a list of column names in csvfile that are “boring”, that is, columns whose data is always the same.
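The idea of a “boring” column can be illustrated with a small standalone sketch. This is not the library's implementation: `boring_columns_sketch` is a hypothetical name, and it assumes the input is any iterable of CSV lines with a header row, rather than the csvfile argument the real function takes.

```python
import csv
import io

def boring_columns_sketch(lines):
    """Return names of columns whose value is identical in every row.

    Hypothetical stand-in for csvsee.utils.boring_columns; `lines` is
    any iterable of CSV text lines that starts with a header row.
    """
    reader = csv.DictReader(lines)
    first_row = {}   # column name -> value seen in the first data row
    boring = []
    for row in reader:
        if not first_row:
            # Every column is a candidate until a different value shows up
            first_row = dict(row)
            boring = list(reader.fieldnames)
            continue
        boring = [name for name in boring if row[name] == first_row[name]]
    return boring

sample = io.StringIO("time,host,load\n1,web1,0.5\n2,web1,0.7\n3,web1,0.2\n")
print(boring_columns_sketch(sample))  # ['host']
```

Such a list is a natural input for the exclude action of filter_csv, since constant columns add nothing to a graph.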

csvsee.utils.column_names(csv_file): Return a list of column names in the given .csv file.

csvsee.utils.filter_csv(csv_infile, csv_outfile, columns, match='regexp', action='include')

Filter csv_infile and write output to csv_outfile.

columns: A list of regular expressions or exact column names
match: regexp to treat each value in columns as a regular expression, exact to match exact literal column names
action: include to keep the specified columns, or exclude to keep all columns except the specified columns
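How the match and action arguments interact can be sketched with the standard csv module. The helper names `select_columns` and `filter_rows` are hypothetical, the sketch works on open file objects rather than whatever csv_infile and csv_outfile are in the real function, and it assumes regular expressions are matched from the start of each column name.

```python
import csv
import io
import re

def select_columns(fieldnames, columns, match="regexp", action="include"):
    """Decide which column names survive filtering (hypothetical helper)."""
    if match == "regexp":
        # Assumption: patterns are anchored at the start of the name
        patterns = [re.compile(expr) for expr in columns]
        chosen = [name for name in fieldnames
                  if any(p.match(name) for p in patterns)]
    else:
        # 'exact': literal comparison against the given names
        chosen = [name for name in fieldnames if name in columns]
    if action == "exclude":
        return [name for name in fieldnames if name not in chosen]
    return chosen

def filter_rows(infile, outfile, columns, match="regexp", action="include"):
    """Copy infile to outfile, keeping only the selected columns."""
    reader = csv.DictReader(infile)
    keep = select_columns(reader.fieldnames, columns, match, action)
    writer = csv.DictWriter(outfile, fieldnames=keep,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)   # columns not in `keep` are ignored

source = io.StringIO("time,cpu_user,cpu_sys,mem\n1,10,2,300\n2,12,3,310\n")
result = io.StringIO()
filter_rows(source, result, ["cpu_.*"], action="exclude")
print(result.getvalue())   # header line is "time,mem"
```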

csvsee.utils.float_or_0(value)

Try to convert value to a floating-point number. If conversion fails, return 0.

Examples:

>>> float_or_0(5)
5.0

>>> float_or_0('5')
5.0

>>> float_or_0('five')
0

csvsee.utils.grep_files(filenames, matches, dateformat='guess', resolution=60, show_progress=True): Search all the given files for matching text, and return a list of (timestamp, counts) pairs, where timestamp is a datetime and counts is a dictionary of {match: count} giving the number of times each match was found during intervals of resolution seconds.
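The interval counting described above can be sketched as follows. This is an illustration under several assumptions, not the library's code: `count_matches` is a hypothetical name, it reads lines from memory instead of files, it treats each match as a plain substring, it counts matching lines rather than individual occurrences, and it assumes every line starts with a timestamp in the fixed layout given by `STAMP_FORMAT` instead of guessing the date format.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"   # assumed timestamp layout
STAMP_WIDTH = 19                      # length of a timestamp in that layout

def count_matches(lines, matches, resolution=60):
    """Return sorted (interval_start, {match: count}) pairs."""
    buckets = {}
    for line in lines:
        try:
            stamp = datetime.strptime(line[:STAMP_WIDTH], STAMP_FORMAT)
        except ValueError:
            continue   # skip lines without a leading timestamp
        # Round the timestamp down to the start of its interval
        since_midnight = stamp.hour * 3600 + stamp.minute * 60 + stamp.second
        start = stamp - timedelta(seconds=since_midnight % resolution)
        counts = buckets.setdefault(start, {m: 0 for m in matches})
        for m in matches:
            if m in line:
                counts[m] += 1
    return sorted(buckets.items())

log = [
    "2010-08-30 13:57:14 ERROR disk full",
    "2010-08-30 13:57:45 WARN slow response",
    "2010-08-30 13:58:02 ERROR timeout",
]
for start, counts in count_matches(log, ["ERROR", "WARN"]):
    print(start, counts)
```

With the default 60-second resolution, the first two lines fall into the 13:57 interval and the third into the 13:58 interval.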

csvsee.utils.line_count(filename): Return the total number of lines in the given file.

csvsee.utils.matching_fields(expr, fields)

Return all fields that match a regular expression expr, or raise a NoMatch exception if no matches are found.

Examples:

>>> matching_fields('a.*', ['apple', 'banana', 'avocado'])
['apple', 'avocado']

>>> matching_fields('a.*', ['peach', 'grape', 'kiwi'])
Traceback (most recent call last):
NoMatch: No matching column found for 'a.*'
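In the first example, 'banana' contains an 'a' but is not returned, which suggests the expression is matched from the start of each field name. A minimal sketch consistent with both examples, assuming `re.match` semantics and using a local stand-in for the NoMatch exception (`matching_fields_sketch` is a hypothetical name):

```python
import re

class NoMatch(Exception):
    """Local stand-in for csvsee.utils.NoMatch."""

def matching_fields_sketch(expr, fields):
    """Return the fields whose names match expr from the start."""
    pattern = re.compile(expr)
    # re.match only succeeds at the beginning of the string, so
    # 'a.*' matches 'apple' but not 'banana' (assumed behavior)
    found = [field for field in fields if pattern.match(field)]
    if not found:
        raise NoMatch("No matching column found for '%s'" % expr)
    return found

print(matching_fields_sketch('a.*', ['apple', 'banana', 'avocado']))

try:
    matching_fields_sketch('a.*', ['peach', 'grape', 'kiwi'])
except NoMatch as error:
    print(error)
```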

csvsee.utils.matching_xy_fields(x_expr, y_exprs, fieldnames, verbose=False)

Match x_expr and y_exprs to all available column names in fieldnames, and return the matched x_column and y_columns.

Example:

>>> matching_xy_fields('x.*', ['y[12]', 'y[ab]'],
...     ['xxx', 'y1', 'y2', 'y3', 'ya', 'yb', 'yc'])
('xxx', ['y1', 'y2', 'ya', 'yb'])

If x_expr is empty, the first column name is used:

>>> matching_xy_fields('', ['y[12]', 'y[ab]'],
...     ['xxx', 'y1', 'y2', 'y3', 'ya', 'yb', 'yc'])
('xxx', ['y1', 'y2', 'ya', 'yb'])

If no match is found for any expression in y_exprs, a NoMatch exception is raised:

>>> matching_xy_fields('', ['y[12]', 'y[jk]'],
...     ['xxx', 'y1', 'y2', 'y3', 'ya', 'yb', 'yc'])
Traceback (most recent call last):
NoMatch: No matching column found for 'y[jk]'

csvsee.utils.read_xy_values(reader, x_column, y_columns, date_format='', gmt_offset=0, zero_time=False)

Read values from a csv.DictReader, and return (x_values, y_values), where x_values is a list of values found in x_column, and y_values is a dictionary of {y_column: [values]} for each column in y_columns.

Arguments:

x_column

Name of the column you want to use as the X axis.

y_columns

Names of columns you want to plot on the Y axis.

date_format

If given, treat values in x_column as timestamps with the given format string.

gmt_offset

Add this many hours to every timestamp. Only useful with date_format.

zero_time

If True, adjust timestamps so the earliest one starts at 00:00 (midnight). Only useful with date_format.
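The effect of date_format, gmt_offset, and zero_time can be sketched as below. This is not the library's implementation: `read_xy_sketch` is a hypothetical name, numeric values are converted with a plain `float()`, and zero_time is assumed to shift all timestamps so the earliest lands on midnight of its own date.

```python
import csv
import io
from datetime import datetime, timedelta

def read_xy_sketch(reader, x_column, y_columns,
                   date_format="", gmt_offset=0, zero_time=False):
    """Return (x_values, y_values) read from a csv.DictReader."""
    x_values = []
    y_values = {name: [] for name in y_columns}
    for row in reader:
        x = row[x_column]
        if date_format:
            # Parse the timestamp, then apply the hour offset
            x = datetime.strptime(x, date_format) + timedelta(hours=gmt_offset)
        else:
            x = float(x)
        x_values.append(x)
        for name in y_columns:
            y_values[name].append(float(row[name]))
    if date_format and zero_time and x_values:
        # Shift every timestamp by the earliest one's time of day
        first = min(x_values)
        midnight = first.replace(hour=0, minute=0, second=0, microsecond=0)
        shift = first - midnight
        x_values = [x - shift for x in x_values]
    return x_values, y_values

data = ("time,cpu,mem\n"
        "2010/08/30 13:57:14,10,300\n"
        "2010/08/30 13:58:14,12,310\n")
xs, ys = read_xy_sketch(csv.DictReader(io.StringIO(data)), "time", ["cpu"],
                        date_format="%Y/%m/%d %H:%M:%S", zero_time=True)
print(xs)   # timestamps shifted so the first is at 00:00:00
print(ys)   # {'cpu': [10.0, 12.0]}
```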

csvsee.utils.strip_prefix(strings)

Strip a common prefix from a sequence of strings. Return (prefix, [stripped]) where prefix is the string that is common (with leading and trailing whitespace removed), and [stripped] is all strings with the prefix removed.

Examples:

>>> strip_prefix(['first', 'fourth', 'fifth'])
('f', ['irst', 'ourth', 'ifth'])

>>> strip_prefix(['spam and eggs', 'spam and potatoes', 'spam and spam'])
('spam and', ['eggs', 'potatoes', 'spam'])
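One way to reproduce the behavior in these examples is with `os.path.commonprefix`, which compares strings character by character. This is a sketch under that assumption, with the hypothetical name `strip_prefix_sketch`; the library may compute the prefix differently.

```python
import os.path

def strip_prefix_sketch(strings):
    """Return (prefix, stripped) for a sequence of strings."""
    strings = list(strings)
    # Longest leading run of characters shared by every string
    prefix = os.path.commonprefix(strings)
    stripped = [s[len(prefix):] for s in strings]
    # The reported prefix has surrounding whitespace removed
    return prefix.strip(), stripped

print(strip_prefix_sketch(['first', 'fourth', 'fifth']))
print(strip_prefix_sketch(['spam and eggs', 'spam and potatoes', 'spam and spam']))
```

Note that the full prefix, including its trailing space, is removed from each string; only the returned prefix is trimmed.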

csvsee.utils.top_by(func, count, y_columns, y_values, drop=0)

Apply func to each column, and return the top count column names. Arguments:

func: A function that takes a list of values and returns a single value. The built-in max and min, or an averaging function, are good examples.
count: How many of the “top” values to keep
y_columns: A list of candidate column names. All of these must exist as keys in y_values
y_values: Dictionary of {column: values} for each y-column. Must have data for each column in y_columns (any extra column data will be ignored).
drop: How many top values to skip before returning the next count top columns
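The ranking described by these arguments can be sketched in a few lines. The names `top_by_sketch` and `average` are hypothetical, and the sketch assumes the result is ordered from highest score to lowest, which the description above does not specify.

```python
def average(values):
    """Arithmetic mean of a non-empty list (hypothetical helper)."""
    return sum(values) / len(values)

def top_by_sketch(func, count, y_columns, y_values, drop=0):
    """Rank y_columns by func(values), skip `drop`, keep `count`."""
    ranked = sorted(y_columns,
                    key=lambda name: func(y_values[name]),
                    reverse=True)   # highest score first (assumed order)
    return ranked[drop:drop + count]

y_values = {
    'a': [1, 2, 3],     # peak 3,  average 2.0
    'b': [10, 0, 0],    # peak 10, average about 3.3
    'c': [4, 4, 4],     # peak 4,  average 4.0
}
columns = ['a', 'b', 'c']
print(top_by_sketch(max, 2, columns, y_values))           # ['b', 'c']
print(top_by_sketch(average, 2, columns, y_values))       # ['c', 'b']
print(top_by_sketch(max, 2, columns, y_values, drop=1))   # ['c', 'a']
```

Passing max corresponds to the idea behind top_by_peak, and passing an averaging function to top_by_average.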

csvsee.utils.top_by_average(count, y_columns, y_values, drop=0): Determine the top count columns based on the average of values in y_values, and return the filtered y_columns names.

csvsee.utils.top_by_peak(count, y_columns, y_values, drop=0): Determine the top count columns based on the peak value in y_values, and return the filtered y_columns names.
