I want to change the indicated part of this code to be more dynamic and specific. I would like to use the first row of each column as a header, in place of the names generated from 'numAtts'. That way, the first row would also no longer appear in the data underneath @data.

Here is my code:

# -*- coding: UTF-8 -*-

import logging
from optparse import OptionParser
import sys

def main():
    LEVELS = {'debug': logging.DEBUG,
              'info': logging.INFO,
              'warning': logging.WARNING,
              'error': logging.ERROR,
              'critical': logging.CRITICAL}

    usage = "usage: arff automate [options]\n ."
    parser = OptionParser(usage=usage, version="%prog 1.0")

    #Defining options
    parser.add_option("-l", "--log", dest="level_name", default="info", help="choose the logging level: debug, info, warning, error, critical")

    #Parsing arguments
    (options, args) = parser.parse_args()

    #Mandatory arguments
    if len(args) != 1:
        parser.error("incorrect number of arguments")

    inputPath = args[0]

    # Start program ------------------
    with open(inputPath, "r") as f:
        strip = str.strip
        split = str.split
        data = [split(strip (line)) for line in f]

    ###############################################################
    ## modify here##
    numAtts = len(data[0])
    logging.info(" Number of attributes : "+str(numAtts) )
    print "@RELATION relationData"
    print ""
    for e in range(numAtts):
        print "@ATTRIBUTE att{0} NUMERIC".format(e)
    ###############################################################

    classSet = set()
    for e in data:
        classSet.add(e[-1])

    print "@ATTRIBUTE class {%s}" % (",".join(classSet))
    print ""
    print "@DATA"
    for item in data:
        print ",".join(item[0:])

if __name__ == "__main__":
    main()

The input file is like this (tab-separated):

F1 F2 F3 F4 F5 F6 STRING
7209 3004 15302 5203 2 1 EXAMPLEA
6417 3984 16445 5546 15 1 EXAMPLEB
8822 3973 23712 7517 18 0 EXPAMPLEC

The output file (actual) is like this:

@RELATION relationData

@ATTRIBUTE att0 NUMERIC
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
@ATTRIBUTE att4 NUMERIC
@ATTRIBUTE att5 NUMERIC
@ATTRIBUTE att6 NUMERIC
@ATTRIBUTE class {EXAMPLEB,STRING,EXPAMPLEC,EXAMPLEA}

@DATA
F1,F2,F3,F4,F5,{0,1},STRING
7209,3004,15302,5203,2,1,EXAMPLEA
6417,3984,16445,5546,15,1,EXAMPLEB
8822,3973,23712,7517,18,0,EXPAMPLEC

The desired output file is like this:

@RELATION relationData

@attribute 'att[F1]' numeric
@attribute 'att[F2]' numeric
@attribute 'att[F3]' numeric
@attribute 'att[F4]' numeric
@attribute 'att[F5]' numeric
@attribute 'att[F6]' {0,1}
@attribute 'class' STRING

@data
7209,3004,15302,5203,2,1,EXAMPLEA
6417,3984,16445,5546,15,1,EXAMPLEB
8822,3973,23712,7517,18,1,EXPAMPLEC

As you can see, my code is almost there, but I am unsure how to treat the first row as a header and start processing the data from row 2.

Thus, my question is: How can I format the output to use the 1st row as a header?

Does anyone have any insight? Thanks!

Solution

Your code never puts the desired title into the output. Here

for e in range(numAtts):
    print "@ATTRIBUTE att{0} NUMERIC".format(e)

you are only formatting the value of e, the column index, into the output. You need to read the name from data[0] instead:

for e in range(numAtts):
    print "@ATTRIBUTE 'att[{0}]' NUMERIC".format(data[0][e])
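
As a minimal sketch of reading the names from the header row, the attribute lines can be built in one pass. The helper name `attribute_lines` is hypothetical, and the sketch assumes the last column holds the class label and every other column is numeric. Looping over every column, as the original script does, also emits an attribute line for the class column (the `att6` line in the actual output), so this version stops one column short. It does not try to detect nominal columns such as the `{0,1}` shown for F6 in the desired output.

```python
def attribute_lines(header):
    """Build one @ATTRIBUTE line per feature column, named after the header row.

    Hypothetical helper: assumes the last column is the class label and
    that every other column is numeric.
    """
    return ["@ATTRIBUTE 'att[{0}]' NUMERIC".format(name) for name in header[:-1]]


header = ["F1", "F2", "F3", "STRING"]
for line in attribute_lines(header):
    print(line)
```

The code is written so that it runs under Python 3 as well as the Python 2 used in the question.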

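Later, when printing the data section, you can use range (or xrange) to skip index 0. Note that the loop has to run over the number of rows, len(data), not over numAtts, which is the number of columns:

for e in range(1, len(data)):
    print ",".join(data[e])

The same applies to the loop that fills classSet: iterate over data[1:] there, otherwise the header word (STRING in your file) is added as a class, as your actual output shows.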
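
An alternative to index arithmetic is to split the parsed rows once into a header and a body, then work only with the body afterwards. This is a sketch on a shortened, made-up version of the sample data; the names `header`, `rows`, `class_set` and `data_lines` are illustrative, not part of the original script.

```python
# Parsed rows as the original script would produce them (shortened sample).
data = [
    ["F1", "F2", "STRING"],
    ["7209", "3004", "EXAMPLEA"],
    ["6417", "3984", "EXAMPLEB"],
]

# The first row names the columns; everything after it is data.
header, rows = data[0], data[1:]

# Collect class labels from the data rows only, so the header word
# ("STRING" here) does not end up among the classes. Sorting is an
# added choice that makes the order repeatable.
class_set = sorted(set(row[-1] for row in rows))

# One comma-separated line per data row, header excluded.
data_lines = [",".join(row) for row in rows]
```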

Also, there is no need to store the str methods in variables; you can chain the method calls to get the same result.

Instead of this:

data = [split(strip (line)) for line in f]

use this:

data = [line.strip().split() for line in f]
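
One thing to keep in mind with either spelling: `split()` without an argument splits on any run of whitespace, not only on tabs. For the sample file that makes no difference, but if a field could ever contain a space, splitting on the tab character explicitly may be safer. The short comparison below uses a made-up value, "EXAMPLE A", purely for illustration.

```python
line = "7209\t3004\tEXAMPLE A\n"

# split() with no argument splits on any run of whitespace, tabs and spaces alike.
fields_any_ws = line.strip().split()

# split("\t") splits on tabs only, so a value containing a space stays whole.
fields_tabs = line.rstrip("\n").split("\t")

print(fields_any_ws)
print(fields_tabs)
```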

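*********** Edited to include this option ***********

next also lets you skip the first row, so that the data section begins with the second. Note that next must be called on the same iterator that the loop then consumes; calling next(iter(data)) on its own builds a throwaway iterator and skips nothing.

rows = iter(data)
header = next(rows)
for item in rows:
    print ",".join(item)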
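
Putting the pieces together, the whole conversion could look roughly like the sketch below, which uses `next` to pull the header off an iterator and then consumes the remaining rows. The function name `to_arff` and its `relation` parameter are hypothetical. The sketch keeps the original script's behaviour of declaring the last column as a nominal class built from the values seen (the desired output in the question declares it as STRING instead), treats all other columns as numeric, and sorts the class values so the result is repeatable.

```python
def to_arff(lines, relation="relationData"):
    """Convert whitespace-separated lines (header first) to ARFF text.

    Hypothetical helper: assumes the last column is the class label and
    that all other columns are numeric.
    """
    # Lazily parse each non-blank line into a list of fields.
    rows = (line.strip().split() for line in lines if line.strip())

    header = next(rows)   # consumes the first row from the iterator
    body = list(rows)     # everything that is left is data

    out = ["@RELATION {0}".format(relation), ""]
    out.extend("@ATTRIBUTE 'att[{0}]' NUMERIC".format(name) for name in header[:-1])

    # Class values come from the data rows only; sorted for a stable order.
    classes = sorted(set(row[-1] for row in body))
    out.append("@ATTRIBUTE class {{{0}}}".format(",".join(classes)))

    out.extend(["", "@DATA"])
    out.extend(",".join(row) for row in body)
    return "\n".join(out)


sample = [
    "F1\tF2\tSTRING\n",
    "7209\t3004\tEXAMPLEA\n",
    "6417\t3984\tEXAMPLEB\n",
]
print(to_arff(sample))
```

Because `to_arff` accepts any iterable of lines, an open file object can be passed to it directly in place of the `sample` list.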
